Prithiv Sakthi

Tweets

Pinned Tweet

Prithiv Sakthi @prithivMLmods

Feb 27

Qwen3-VL-Video-Grounding Demo. Perform point tracking, text-guided detection, and video question answering, all powered by the Qwen3-VL-4B vision-language model with real-time bounding box detection and cross-frame object matching. 🤗 @huggingface Demo in 🧵

0:33

475

50,279

Prithiv Sakthi retweeted

DailyPapers

@HuggingPapers

Jun 14

LabVLA: Grounding VLA models in scientific laboratories. RoboGenesis builds 10K lab scenes across 16 robot types. LabVLA pairs a Qwen3-VL backbone with a DiT flow-matching expert, reaching 71.1% success on LabUtopia and transferring to real Franka robots.

3,779

Prithiv Sakthi retweeted

DailyPapers

@HuggingPapers

Jun 14

SpatialWorld: A new benchmark pushing multimodal agents to navigate, manipulate, and reason in physical 3D spaces. 760 tasks across 8 simulators reveal that even GPT-5 succeeds only 17% of the time.

3,152

Prithiv Sakthi @prithivMLmods

Jun 13

^EVERYDAY

Julien Chaumond

@julien_c

Jun 13

run local models TODAY

Prithiv Sakthi retweeted

merve

@mervenoyann

Jun 12

new transformers tutorials just dropped for vision 🔥
🛰️ segmentation on satellite imagery: fine-tune RF-DETR-Seg segment buildings
📱 object detection on mobile UI: fine-tune RF-DETR on screenshots
runs on toaster, converges fast, give to your agent for your use cases 🫡

189

16,686

Prithiv Sakthi retweeted

Kimi.ai

@Kimi_Moonshot

Jun 12

🌘 Kimi-K2.7-Code, our latest coding model, is now released and open-sourced!
🔷 Improved coding & agent performance over K2.6: 21.8% on Kimi Code Bench v2, 11.0% on Program Bench, and 31.5% on MLS Bench Lite.
🔷 Reasoning efficiency: Less overthinking, with 30% lower reasoning-token usage compared to K2.6.
🔷 Long-horizon coding: Improved instruction following, higher end-to-end coding task success rates.
⚡️ 6x High-Speed Mode coming soon!
🔌 Available today via Kimi API and Kimi Code.
🔗 Kimi Code: kimi.com/code
🔗 API: platform.moonshot.ai

618

1,639

13,708

2,100,673

Prithiv Sakthi retweeted

Google Gemma

@googlegemma

Jun 10

Meet DiffusionGemma! An experimental open model that explores a fast approach to text generation, released under an Apache 2.0 license. Moving beyond sequential, token-by-token processes to generate entire blocks of text simultaneously. Here’s what’s new with DiffusionGemma: 👇

0:05

169

810

5,025

921,821

Prithiv Sakthi retweeted

Weijie Wang @wjwang2003

Jun 9

🚀 Excited to share Mirage: latent spatial memory for video world models. No RGB point-cloud render-and-encode loop. No pixel-space detour. Just store 3D memory directly in latent space. ⚡ Up to 10.57x faster generation and 55x smaller 3D cache. 🧵 🌐 aka.ms/latent-spatial-memory

1:10

510

80,383

Prithiv Sakthi retweeted

DailyPapers

@HuggingPapers

Jun 9

SpatialWorld reveals how poorly multimodal agents reason in 3D space. 760 human-annotated tasks across 8 simulators, from kitchens to city streets. Even GPT-5 only solves 17.4% of them.

2,981

Prithiv Sakthi retweeted

Omar Sanseviero

@osanseviero

Jun 8

llama.cpp just added video input support 👀 You can now enjoy Gemma 4 video understanding capabilities in your chat completions endpoint and via mtmd-cli: github.com/ggml-org/llama.cp…

mtmd : add video input support by ngxson · Pull Request #24269 · ggml-org/llama.cpp

Overview Fix #18389 Goals of this PR: Allow input video file via mtmd-cli and via /chat/completions (which automatically enables it on web ui) Invoke ffmpeg via a subprocessor (NOT pre-bundled, us...

github.com

399

17,615

Prithiv Sakthi retweeted

Omar Sanseviero

@osanseviero

Jun 7

Gemma 4 MTP just got officially merged into llama.cpp. This means you can use Gemma 4 QAT MTP for a lightweight super fast setup. Excited to see what the community builds with it: github.com/ggml-org/llama.cp…

llama : add Gemma4 MTP by am17an · Pull Request #23398 · ggml-org/llama.cpp

Overview This PR adds MTP support for Gemma 4 models. For the MoE model I don't observe a speed-up on my system, but the dense model has on average >2x speedup. Correctness wise I a...

github.com

130

1,229

93,565

Prithiv Sakthi @prithivMLmods

Jun 6

RT @harbhajan_singh: Sad no Rajat Patidar in the indian squad. What else he needs to do ? Scored 501 runs strike rate almost 200 . Unfair 💔…

3,972

Prithiv Sakthi retweeted

DailyPapers

@HuggingPapers

Jun 5

VideoKR: The first dataset for knowledge- and reasoning-intensive video understanding. It curates 315K examples over 145K expert-domain videos with human-in-the-loop generation. The VideoKR-Eval benchmark forces models to perform genuine visual reasoning rather than relying on textual shortcuts.

2,189

Prithiv Sakthi retweeted

NVIDIA AI

@NVIDIAAI

Jun 4

Today we're shipping Nemotron 3 Ultra. A 550B MoE frontier-intelligence open model built for long-running agents. It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models.

2:59

199

461

3,483

1,242,806

Prithiv Sakthi retweeted

DailyPapers

@HuggingPapers

Jun 2

Model: huggingface.co/nvidia/4D-RGP… Paper page: huggingface.co/papers/2512.1…

nvidia/4D-RGPT-8B · Hugging Face

huggingface.co

982

Prithiv Sakthi retweeted

Qwen

@Alibaba_Qwen

Jun 1

👏👏 Introducing Qwen3.7-Plus, a multimodal agent model that unifies vision and language into one versatile agent foundation.
✅ Multimodal interactive hybrid agent: unified GUI & CLI operation across visual and text tasks
✅ Versatile coding agent & productivity assistant with full-modality input
✅ Visual Agent: perception, reasoning, grounding, and search-augmented QA
✅ Cross-harness generalization across diverse agent frameworks
One model. Sees, thinks, codes, acts. 🙌🙌
Now available via API on Alibaba Cloud Model Studio. Try it, and let us know what you build. 😎
🔗🔗⬇️⬇️
Blog: qwen.ai/blog?id=qwen3.7-plus
Qwen Studio: chat.qwen.ai/?models=qwen3.7…
API: modelstudio.console.alibabac…

271

457

3,951

489,760

Prithiv Sakthi @prithivMLmods

Jun 2

Open Source — Qwhen? 👀

Qwen

@Alibaba_Qwen

Jun 1

Prithiv Sakthi retweeted

NVIDIA AI

@NVIDIAAI

Jun 1

Nemotron 3 Ultra is coming this week. ⌛️

2:11

105

353

3,302

389,391

Prithiv Sakthi retweeted

Virat Kohli

@imVkohli

Jun 1

We asked ourselves a question last year- can we go back to back? Here we are again 🏆🏆❤️❤️ @RCBTweets

3,971

29,970

216,732

2,835,678

AK

Prithiv Sakthi retweeted

@_akhaliq

Jun 1

GrepSeek: Training Search Agents for Direct Corpus Interaction

10,549

Prithiv Sakthi retweeted

Loïck BOURDOIS @BdsLoick

May 28

Replying to @IBM @Microsoft @GoogleDeepMind @baai @qwen @orionweller @vllm @Google @Meta @huggingface @Nils_Reimers @jaseweston

Big thanks to my HF Fellows bros for multilingual evaluation @tomaarsen, Bram Vanroy, @christopher, @w00jun_ @mrm8488, @prithivMLmods and to @AI_AlphaEdge for the time dedicated to this project 🙏 Links 👇 Blogpost: huggingface.co/blog/lbourdoi… Models: huggingface.co/spaces/alphae…

Introduction to Trimming ✂

A Blog post by Loïck BOURDOIS on Hugging Face

huggingface.co

364