Joined June 2008
Part 1: NVIDIA's LocateAnything is built for the moment AI stops answering questions and starts pointing, clicking, reading, and acting. Speed isn't a luxury — it's the difference between a useful agent and a confused one. vist.ly/57hdt
The Three DNN Engines of OpenCV 5. The old 4.x DNN engine imported ~22% of ONNX. The new graph-based engine pushes past 80%, fuses MatMul→Softmax→MatMul into one FlashAttention layer, and runs YOLO26n 41% faster than ONNX Runtime — no code changes. Deep dive: vist.ly/57guu #OpenCV #ComputerVision #ONNX
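For anyone wondering what fusing MatMul→Softmax→MatMul buys: below is a minimal pure-Python sketch for a single query vector. It is not OpenCV's code, and the function names are made up for illustration. The first function runs the three steps separately and keeps every score in memory. The second folds them into one pass with a running max and running sum (the online-softmax idea that FlashAttention-style kernels build on), so the full score row is never stored.

```python
import math

def attention_three_ops(q, keys, values):
    """MatMul -> Softmax -> MatMul as three separate steps for one query."""
    scale = 1.0 / math.sqrt(len(q))
    # MatMul 1: scaled dot product of the query with every key.
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    # Softmax over all scores (max subtracted for numerical stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # MatMul 2: weighted sum of the value vectors.
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]

def attention_fused(q, keys, values):
    """Same result in one pass, using a running max and a running sum."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for k, v in zip(keys, values):
        s = scale * sum(a * b for a, b in zip(q, k))
        new_max = max(running_max, s)
        # Rescale everything accumulated so far to the new max.
        correction = math.exp(running_max - new_max)
        p = math.exp(s - new_max)
        running_sum = running_sum * correction + p
        acc = [a * correction + p * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]

q = [1.0, 0.5]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
separate = attention_three_ops(q, keys, values)
fused = attention_fused(q, keys, values)
```

Both functions return the same vector up to floating-point rounding. A real fused kernel works on tiles of keys and values rather than one row at a time, but the bookkeeping is the same.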
OpenCV @ CVPR 2026. See what's new 👉 vist.ly/57de2 #OpenCV #CVPR2026 #ComputerVision #AI
Satya Mallick retweeted
I’m getting increasingly annoyed by young people complaining that they cannot do AI-related research unless they join big industrial labs… well, here is my reply: academia is supposed to work on ideas that money cannot buy!
Generic multi-token prediction is fast but breaks object detection: it doesn't know where one box ends and the next begins. LocateAnything's fix: make the prediction block = the box itself. The block isn't arbitrary. The block IS the geometry.
3 modes:
🐢 Slow: token-by-token, stable
⚡ Fast: parallel blocks, throughput
🔀 Hybrid: fast by default, falls back on hard cases
Don't force geometry into a text generation mode. Part 2 of "Why Can't AI Point to the Exact Pixels Where Objects Are?" vist.ly/56vtn #LocateAnything #ComputerVision #VLM
Satya Mallick retweeted
I want to offer some unsolicited advice to computer vision researchers jumping into robotics. Don't focus too much on VLMs, VLAs etc. That's fine, but the real action is at the sensorimotor level. Most of the open problems in robotics are in manipulation, which is about hand-object interaction, and contacts and forces are central. Proprioception and tactile sensing are as important as vision. Don't get seduced by cherry-picked demos. You can't do robotics without doing robotics.
Most VLMs predict bounding boxes one token at a time — X1, Y1, X2, Y2. But a box isn't text. It's geometry. NVIDIA's LocateAnything predicts the entire box as one atomic unit. Parallel Box Decoding > next-token prediction for spatial outputs. (Part 1 🧵) Breakdown 👇 learnopencv.com #ComputerVision #AI
📣 AMD is now an OpenCV 5 Launch Partner & Gold Sponsor! We're teaming up to bring first-class CPU and GPU acceleration to OpenCV 5, speeding up Vision AI pre- & post-processing across AMD Ryzen™, RDNA™ & ROCm™. Read more: opencv.org/opencv-and-amd-an… #OpenCV #AMD #ComputerVision #AI

Instead of training a new model, what if you could just tell your AI what to look for? That's YOLOE-26 with text prompts. Hand it a list — Person, Helmet, Safety vest — get instance segmentation masks, not just boxes. Full tutorial: vist.ly/56iqy
An AI can tell you there's a cat in the image. Pointing to the exact pixels is the hard part. The reason it's slow: most VLMs spell out a bounding box one coordinate token at a time — some even split "1024" into single digits. But a box's corners are connected. Decode them independently and errors compound. That's the wall the next gen of clicking, navigating AI agents has to break. Full breakdown 👇 vist.ly/56dq3
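To put numbers on the "one coordinate token at a time" point, here is a small sketch that counts sequential decoding steps for one box under three output schemes. The box values and function names are made up for illustration, and the last helper rests on a toy assumption (every step is independently correct with the same probability) that is not a claim about any real model.

```python
def digit_tokens(box):
    """One token per digit, so 1024 becomes '1', '0', '2', '4'."""
    return [ch for coord in box for ch in str(coord)]

def coordinate_tokens(box):
    """One token per coordinate: x1, y1, x2, y2."""
    return [str(coord) for coord in box]

def box_block(box):
    """The whole box emitted as a single unit."""
    return [tuple(box)]

def chance_all_correct(p_step, n_steps):
    """Toy model: probability that every one of n independent steps is right."""
    return p_step ** n_steps

box = (12, 240, 1024, 768)  # hypothetical x1, y1, x2, y2 in pixels
steps = {
    "per digit": len(digit_tokens(box)),
    "per coordinate": len(coordinate_tokens(box)),
    "per box": len(box_block(box)),
}
print(steps)  # {'per digit': 12, 'per coordinate': 4, 'per box': 1}
```

Under the toy model, `chance_all_correct(0.99, steps["per digit"])` is lower than `chance_all_correct(0.99, steps["per box"])`, which is the compounding effect in miniature.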
Weak AI vs Strong AI, in one line: Weak AI recognizes the cat in the photo. Strong AI debates climate change with you. One is already transforming industries. The other could revolutionize everything we know. Full breakdown 👇 vist.ly/569tx #AI #AGI #MachineLearning
Satya Mallick retweeted
Euler went blind in 1771 at 64. He published 1 paper per week in 1775. After dying in 1783 he published 228 more papers. A third of all papers on math, mathematical physics & engineering mechanics in the latter part of the 18th century were his. If he was still being cited, his h-index would be around 850.
YOLOE-26 turns object detection into three ways of saying "find this":
→ Text prompt (name it)
→ Visual prompt (show it)
→ Prompt-free (let the model decide)
Closed-set rigidity → open-vocabulary conversation. Tutorial benchmarks: vist.ly/565gr
Object detection is shifting from "models that recognize fixed categories" to "models that understand concepts described in language." YOLOE delivers open-vocabulary detection at full YOLO speed — text module fused into the head, zero runtime overhead. Full tutorial code: vist.ly/55v2b
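One way to picture "text module fused into the head, zero runtime overhead": the prompts are turned into classifier weights once, and per-region inference is then only dot products. The sketch below is a toy illustration of that idea, not YOLOE's actual head or the Ultralytics API. The embeddings are hand-made, hypothetical numbers; a real model would get them from a text encoder.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

# Hypothetical, hand-made embeddings standing in for a text encoder's output.
TEXT_EMBEDDINGS = {
    "person": [0.9, 0.1, 0.0],
    "helmet": [0.1, 0.9, 0.1],
    "safety vest": [0.0, 0.2, 0.9],
}

def build_head(prompts):
    """Run once per prompt list: turn prompts into fixed classifier weights."""
    return [(name, normalize(TEXT_EMBEDDINGS[name])) for name in prompts]

def classify(region_feature, head):
    """Per-region inference: cosine scores only, no text encoder call."""
    f = normalize(region_feature)
    scores = {name: sum(a * b for a, b in zip(f, w)) for name, w in head}
    return max(scores, key=scores.get), scores

head = build_head(["person", "helmet", "safety vest"])
label, scores = classify([0.2, 1.0, 0.1], head)  # made-up region feature
print(label)  # helmet
```

Changing the prompt list only means calling `build_head` again; `classify` stays the same size and cost per class.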
Satya Mallick retweeted
Replying to @anshulkundaje
Those are orthogonal concepts.
- World models trained on highly diverse data become foundation models: their encoders can be used for a wide variety of downstream tasks.
- "World" refers to two things: (1) predicting the evolution of a complex system or environment, (2) predicting the evolution of a system under control and its effect on the environment (action-conditioned world model) which is a necessary component of planning.
Satya Mallick retweeted
RF-DETR is now available in @huggingface transformers: state of the art in both detection and segmentation, outperforming YOLO architectures.
- checkpoints: huggingface.co/Roboflow/mode…
- demo: huggingface.co/spaces/huggin…
- docs: huggingface.co/collections/m…
This robot's only job is to pretend it's your eyeball 👁️🤖 At Display Week 2026, Dr. Satya Mallick visits Gamma Scientific — the 6-axis robot AR/VR brands use to QA every headset before launch. 18 tests in one rig: contrast, parallax, MTF, color gamut, eye box. The invisible layer behind every Vision Pro. #ARVR #DisplayWeek2026 #Metrology #GammaScientific #Robotics #VisionPro
A $99 hologram. With an AI agent living inside it. Dr. Satya Mallick meets Shawn Frayne (CEO, Looking Glass Factory) at Display Week 2026 for a hands-on with the Looking Glass Go and their new life-size Hololuminescent Display, SID 2026 Display of the Year. The future of display isn't a headset. 🧵 #LookingGlass #Hologram #AI #DisplayWeek2026 #LightField #SpatialComputing #Hololuminescent