Sameer goel

Tweets

Sameer goel @sameer_goel

Apr 9

Got my stipend today. Not much yet but working to make it bigger.

Sameer goel @sameer_goel

Mar 31

Non-technical client: We need real-time video processing.
Me: We should use NVIDIA…
Non-technical client: Let’s use TPUs to train and deploy our system. They’re free on Google Colab.
Me: Yeah, but you won’t be able to host it on-prem.
Non-technical client: No, we can. We’ll just buy the TPUs.
Me: Nice.

Sameer goel @sameer_goel

Mar 30

Deployed a computer vision model using ONNX recently. Thought it’d be straightforward. It wasn’t.

First issue: outputs didn’t match PyTorch. Not completely broken, but enough drift to make predictions unreliable. Took a while to realize some ops weren’t translating cleanly during export.

Then hit unsupported operators depending on the opset. Had to experiment with different opset versions and tweak the model a bit just to get a valid graph.

Performance was another surprise: ONNX Runtime on CPU was slower than expected. It only improved after switching execution providers and testing with TensorRT.

Also learned the hard way that dynamic shapes can silently mess things up if you’re not careful.

Ended up using opset tuning, graph simplification, and better runtime configuration to stabilize things.

Main takeaway: training the model is the easy part. Deployment is where things actually break.
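One way to catch the kind of output drift described above is a small parity check run on the same input through both runtimes. The sketch below is only an illustration, not the check used in this deployment: it assumes the PyTorch and ONNX Runtime outputs have already been flattened into plain lists of floats, and the function name `compare_outputs` and the tolerance values are hypothetical.

```python
def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def compare_outputs(reference, candidate, atol=1e-4, rtol=1e-3):
    """Summarize drift between reference (e.g. PyTorch) and candidate (e.g. ONNX) outputs.

    Both arguments are flat lists of floats for the same input.
    The tolerances are illustrative assumptions, not recommended values.
    """
    if not reference or len(reference) != len(candidate):
        raise ValueError("outputs must be non-empty and the same length")

    abs_diffs = [abs(r - c) for r, c in zip(reference, candidate)]

    # Combined absolute/relative tolerance, scaled by the reference value.
    within = all(d <= atol + rtol * abs(r) for d, r in zip(abs_diffs, reference))

    return {
        "max_abs_diff": max(abs_diffs),
        "within_tolerance": within,
        # Small numeric drift matters most when it flips the predicted class.
        "same_top1": argmax(reference) == argmax(candidate),
    }


if __name__ == "__main__":
    torch_out = [0.10, 0.70, 0.20]      # made-up values for illustration
    onnx_out = [0.10, 0.70001, 0.20]    # made-up values for illustration
    print(compare_outputs(torch_out, onnx_out))
```

Running a check like this over a batch of representative inputs, and again for each input size you plan to serve, is one way to surface both export drift and dynamic-shape problems before deployment.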

Sameer goel @sameer_goel

Mar 27

Built a real-time intrusion detection pipeline using YOLOv8n:
• defined an ROI polygon (red zone)
• detected person bboxes per frame
• assigned IDs via simple tracking (centroid/IoU) → green if outside, red if inside
• added basic debounce to avoid alert spam
• added frame skipping for latency
• still tuning for shadows, occlusion, and camera angle edge cases
Check it out below.
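The ROI test and the debounce step above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline's actual code: it assumes boxes arrive as `(x1, y1, x2, y2)` with a track ID already assigned, uses the bottom-center of the box as the person's position, and the names `point_in_polygon`, `foot_point`, and `DebouncedAlert` are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon [(x, y), ...]."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def foot_point(bbox):
    """Bottom-center of an (x1, y1, x2, y2) box (assumed ground position)."""
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2.0, y2)


class DebouncedAlert:
    """Fire once when a track has been inside the ROI for min_frames in a row."""

    def __init__(self, min_frames=5):
        self.min_frames = min_frames
        self.streak = {}  # track_id -> consecutive frames inside the ROI

    def update(self, track_id, inside):
        if not inside:
            self.streak[track_id] = 0
            return False
        self.streak[track_id] = self.streak.get(track_id, 0) + 1
        # True only on the frame where the streak first reaches the threshold.
        return self.streak[track_id] == self.min_frames


if __name__ == "__main__":
    roi = [(0, 0), (10, 0), (10, 10), (0, 10)]   # made-up red zone
    alert = DebouncedAlert(min_frames=3)
    for frame_idx in range(5):
        inside = point_in_polygon(foot_point((2, 2, 6, 8)), roi)
        if alert.update(track_id=1, inside=inside):
            print("ALERT for track 1 at frame", frame_idx)
```

Requiring several consecutive frames trades a little alert latency for fewer false alarms from a single noisy detection; the threshold of 5 is an arbitrary starting point.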

0:14

Sameer goel @sameer_goel

Mar 27

Another test case: people in a bank.

0:05

Sameer goel @sameer_goel

Mar 25

Most drone data is useless. Mine isn’t.

I built a drone-based indexing system that turns raw aerial footage into searchable intelligence.
• BLIP → generates dense image captions
• Custom scoring engine → ranks every keyword by relevance
• spaCy → auto-generates and validates tags
• Geospatial filtering → query data by location, not just text

Stack: Python (Flask), JS, full custom pipeline.

This isn’t a demo. It’s a step toward making unstructured visual data actually usable. If you’re building in CV, geospatial, or defense, this is where things are going.

Check out the project: repo1-production-5fd4.up.rai…
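To make the "query by location, not just text" idea concrete, here is a rough sketch of combining a tag match with a radius filter. It is not the project's implementation: the record layout (`lat`, `lon`, `tags`), the function names, and the use of the haversine great-circle distance are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def search(frames, tag, center, radius_km):
    """Return frames carrying `tag` within `radius_km` of `center`, nearest first.

    `frames` is a list of dicts with hypothetical keys "lat", "lon", and "tags".
    """
    lat0, lon0 = center
    hits = []
    for frame in frames:
        if tag not in frame["tags"]:
            continue
        distance = haversine_km(lat0, lon0, frame["lat"], frame["lon"])
        if distance <= radius_km:
            hits.append((distance, frame))
    hits.sort(key=lambda pair: pair[0])  # sort by distance only
    return [frame for _, frame in hits]


if __name__ == "__main__":
    # Made-up records for illustration only.
    frames = [
        {"id": "a", "lat": 28.6139, "lon": 77.2090, "tags": {"truck", "road"}},
        {"id": "b", "lat": 28.7041, "lon": 77.1025, "tags": {"truck"}},
        {"id": "c", "lat": 19.0760, "lon": 72.8777, "tags": {"truck"}},
    ]
    for frame in search(frames, "truck", (28.6139, 77.2090), radius_km=20):
        print(frame["id"])
```

A linear scan like this is fine for small collections; a larger index would likely want a spatial index or a database with geospatial queries instead.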

6:46

Sameer goel @sameer_goel

Mar 24

Computer vision engineers be like: “Model is 99% accurate.”
Meanwhile, YOLO confidently detects a toaster as a dog because it saw two circles and decided “close enough.”

Sameer goel @sameer_goel

Mar 20

Goodbye Firebase Studio
2025–2027
Classic Google arc: experiment → validate → absorb
Now it folds into Google AI Studio Antigravity
If you built on it, you felt it.

Sameer goel @sameer_goel

Mar 15

Why most Vision-Language Models hallucinate (and how researchers are fixing it)

Lately I’ve been going deep into Vision-Language Models (VLMs). Models like
• GPT‑4V
• GPT‑4o
• Gemini
• LLaVA
look insanely impressive.

But there’s a big problem most people miss: VLMs hallucinate. A lot.

Sameer goel @sameer_goel

Mar 15

The biggest insight after studying VLMs: scaling the model does NOT automatically fix hallucinations.

What matters more:
• grounding
• perception modules
• verification loops
• structured reasoning
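As one possible reading of "verification loops", a VLM's answer can be cross-checked against an independent object detector, with any object the detector did not see flagged for review. The sketch below is a toy version under loud assumptions: the object names mentioned in the answer are already extracted, detections arrive as `(label, confidence)` pairs, matching is by exact lowercase label, and `verify_mentions` is a hypothetical name.

```python
def verify_mentions(mentioned, detections, min_conf=0.5):
    """Split object names mentioned by a VLM into grounded and unsupported.

    mentioned:  list of object names taken from the VLM's answer (assumed input)
    detections: list of (label, confidence) pairs from a separate detector
    min_conf:   illustrative confidence cutoff, not a tuned value
    """
    seen = {label.lower() for label, conf in detections if conf >= min_conf}
    grounded, unsupported = [], []
    for name in mentioned:
        if name.lower() in seen:
            grounded.append(name)
        else:
            unsupported.append(name)
    return grounded, unsupported


if __name__ == "__main__":
    # Made-up values for illustration only.
    mentioned = ["dog", "toaster", "Person"]
    detections = [("person", 0.9), ("toaster", 0.8), ("dog", 0.3)]
    grounded, unsupported = verify_mentions(mentioned, detections)
    print("grounded:", grounded)
    print("needs a second look:", unsupported)
```

Unsupported mentions are not proof of hallucination, since the detector can miss objects too; they are candidates to re-query, re-crop, or drop from the answer.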

Sameer goel @sameer_goel

Mar 15

Curious: If you’re building with VLMs, what techniques are you using to reduce hallucinations? Grounding? Detectors? RAG for images? Drop ideas below