Satya Mallick

Tweets

Satya Mallick

@LearnOpenCV

Jun 12

Part 1: NVIDIA's LocateAnything is built for the moment AI stops answering questions and starts pointing, clicking, reading, and acting. Speed isn't a luxury — it's the difference between a useful agent and a confused one. vist.ly/57hdt

1:11

336

Satya Mallick

@LearnOpenCV

Jun 12

The Three DNN Engines of OpenCV 5. The old 4.x DNN engine imported ~22% of ONNX. The new graph-based engine pushes past 80%, fuses MatMul→Softmax→MatMul into one FlashAttention layer, and runs YOLO26n 41% faster than ONNX Runtime — no code changes. Deep dive: vist.ly/57guu #OpenCV #ComputerVision #ONNX

2:45

342

Satya Mallick

@LearnOpenCV

Jun 11

OpenCV @ CVPR 2026. See what's new 👉 vist.ly/57de2 #OpenCV #CVPR2026 #ComputerVision #AI

1:37

624

Satya Mallick retweeted

Yi Ma

@YiMaTweets

Jun 10

I’m getting increasingly annoyed by young people complaining that they cannot do AI-related research unless they join big industrial labs… well, here is my reply: academia is supposed to work on ideas that money cannot buy!

948

77,823

Satya Mallick

@LearnOpenCV

Jun 7

Generic multi-token prediction is fast but breaks object detection: it doesn't know where one box ends and the next begins. LocateAnything's fix: make the prediction block = the box itself. The block isn't arbitrary. The block IS the geometry.
3 modes:
🐢 Slow: token-by-token, stable
⚡ Fast: parallel blocks, throughput
🔀 Hybrid: fast by default, falls back on hard cases
Don't force geometry into a text generation mode. Part 2 of "Why Can't AI Point to the Exact Pixels Where Objects Are?" vist.ly/56vtn #LocateAnything #ComputerVision #VLM

1:57

498

Satya Mallick retweeted

Jitendra MALIK

@JitendraMalikCV

Jun 5

I want to offer some unsolicited advice to computer vision researchers jumping into robotics. Don't focus too much on VLMs, VLAs etc. That's fine, but the real action is at the sensorimotor level. Most of the open problems in robotics are in manipulation, which is about hand-object interaction, and contacts and forces are central. Proprioception and tactile sensing are as important as vision. Don't get seduced by cherry-picked demos. You can't do robotics without doing robotics.

394

3,147

473,523

Satya Mallick

@LearnOpenCV

Jun 5

Most VLMs predict bounding boxes one token at a time — X1, Y1, X2, Y2. But a box isn't text. It's geometry. NVIDIA's LocateAnything predicts the entire box as one atomic unit. Parallel Box Decoding > next-token prediction for spatial outputs. (Part 1 🧵) Breakdown 👇 learnopencv.com #ComputerVision #AI

1:58

549

Satya Mallick retweeted

OpenCV - Open Source Computer Vision Library @opencvlive

Jun 4

📣 AMD is now an OpenCV 5 Launch Partner & Gold Sponsor! We're teaming up to bring first-class CPU and GPU acceleration to OpenCV 5, speeding up Vision AI pre- & post-processing across AMD Ryzen™, RDNA™ & ROCm™. Read more: opencv.org/opencv-and-amd-an… #OpenCV #AMD #ComputerVision #AI

1,989

Satya Mallick

@LearnOpenCV

Jun 4

Instead of training a new model, what if you could just tell your AI what to look for? That's YOLOE-26 with text prompts. Hand it a list — Person, Helmet, Safety vest — get instance segmentation masks, not just boxes. Full tutorial: vist.ly/56iqy

1:33

663

Satya Mallick

@LearnOpenCV

Jun 4

0:59

629

Satya Mallick

@LearnOpenCV

Jun 2

An AI can tell you there's a cat in the image. Pointing to the exact pixels is the hard part. The reason it's slow: most VLMs spell out a bounding box one coordinate token at a time — some even split "1024" into single digits. But a box's corners are connected. Decode them independently and errors compound. That's the wall the next gen of clicking, navigating AI agents has to break. Full breakdown 👇 vist.ly/56dq3

2:16

435

Satya Mallick

@LearnOpenCV

Jun 2

Weak AI vs Strong AI, in one line: Weak AI recognizes the cat in the photo. Strong AI debates climate change with you. One is already transforming industries. The other could revolutionize everything we know. Full breakdown 👇 vist.ly/569tx #AI #AGI #MachineLearning

0:59

443

Satya Mallick retweeted

Peyman Milanfar

@docmilanfar

Jun 2

Euler went blind in 1771 at 64. He published 1 paper per week in 1775. After dying in 1783 he published 228 more papers. A third of all papers on math, mathematical physics & engineering mechanics in the latter part of the 18th century were his. If he was still being cited, his h-index would be around 850.

287

18,359

Satya Mallick

@LearnOpenCV

Jun 1

YOLOE-26 turns object detection into three ways of saying "find this":
→ Text prompt (name it)
→ Visual prompt (show it)
→ Prompt-free (let the model decide)
Closed-set rigidity → open-vocabulary conversation. Tutorial and benchmarks: vist.ly/565gr

1:18

673

Satya Mallick

@LearnOpenCV

May 29

Object detection is shifting from "models that recognize fixed categories" to "models that understand concepts described in language." YOLOE delivers open-vocabulary detection at full YOLO speed: the text module is fused into the head, with zero runtime overhead. Full tutorial and code: vist.ly/55v2b

2:20

565

Satya Mallick retweeted

Yann LeCun

@ylecun

May 28

Replying to @anshulkundaje

Those are orthogonal concepts.
- World models trained on highly diverse data become foundation models: their encoders can be used for a wide variety of downstream tasks.
- "World" refers to two things: (1) predicting the evolution of a complex system or environment, (2) predicting the evolution of a system under control and its effect on the environment (action-conditioned world model) which is a necessary component of planning.

103

1,165

81,398

Satya Mallick retweeted

SkalskiP

@skalskip92

May 27

RF-DETR is now available in @huggingface transformers: state of the art in both detection and segmentation, outperforming YOLO architectures.
- checkpoints: huggingface.co/Roboflow/mode…
- demo: huggingface.co/spaces/huggin…
- docs: huggingface.co/collections/m…

0:11

123

1,172

78,935

Satya Mallick

@LearnOpenCV

May 26

This robot's only job is to pretend it's your eyeball 👁️🤖 At Display Week 2026, Dr. Satya Mallick visits Gamma Scientific — the 6-axis robot AR/VR brands use to QA every headset before launch. 18 tests in one rig: contrast, parallax, MTF, color gamut, eye box. The invisible layer behind every Vision Pro. #ARVR #DisplayWeek2026 #Metrology #GammaScientific #Robotics #VisionPro

2:17

543

Satya Mallick

@LearnOpenCV

May 26

A $99 hologram. With an AI agent living inside it. Dr. Satya Mallick meets Shawn Frayne (CEO, Looking Glass Factory) at Display Week 2026 for a hands-on with the Looking Glass Go and their new life-size Hololuminescent Display, SID 2026 Display of the Year. The future of display isn't a headset. 🧵 #LookingGlass #Hologram #AI #DisplayWeek2026 #LightField #SpatialComputing #Hololuminescent

2:58

378