Vidhan Bansal

Vidhan Bansal

2 Photos and videos

Tweets

Thought LLMs were just prompt→response Then I dug deeper • Why are production systems so much faster? • Why memory becomes a bottleneck? → Continuous batching → Paged attention → Speculative decoding Harder problems begin underneath: latency, memory, throughput, serving.

Vidhan Bansal

Vidhan Bansal @mudio66

Jun 11

Thought research reproduction meant: Read paper → run code → verify results. Reality 🥲 Working through BioReason (DNA Qwen GRPO): broken imports, version mismatches, flash-attention/NVCC issues, Slurm debugging, config tuning… Fix one thing → another breaks.

Vidhan Bansal

Vidhan Bansal @mudio66

Jun 2

Coding with AI still feels broken. Repo in mind. Error visible. Yet we stop… to type explaining context again. Why still code like this? Been building Voker: a repo-aware voice coding agent. AI going in wrong direction? Just speak again. No restart. No prompt. No broken flow

0:24

117

Vidhan Bansal

Vidhan Bansal @mudio66

May 17

Went to verify a paper's results, I was like "Yeah! Let's do something fun today🫠" Reality: - random dependency issues - glibc confusion - disk quota errors Now, my hair is messy, eyes are sleepy and here I'm questioning my life choices 😭.

118

Vidhan Bansal

Vidhan Bansal @mudio66

Apr 21

RAG can retrieve “relevant” chunks and still miss the exact answer. Similarity ≠ correct context. Compared vector RAG vs PageIndex (cached indexing → retrieval only) Different retrieval → different answers. Try it: vidhan66-compare-rag.hf.spac…

117

Vidhan Bansal

Vidhan Bansal @mudio66

Apr 12

Everyone’s talking about OpenClaw. I tried building a 2-agent setup with a validator skill. Skill was: - registered - visible But never applied. In LLM frameworks: registration ≠ execution Full version: linkedin.com/posts/vidhan-ba…

Vidhan Bansal

Vidhan Bansal @mudio66

Apr 4

Most people use routing in LLM systems for intent: → query type → pipeline But the real use is when the system isn’t confident: → retry → ask follow-ups → expand retrieval Routing = handling failure states, not just intent.

Vidhan Bansal

Vidhan Bansal @mudio66

Mar 31

Most notes are useless. You read → don’t get it → go to YouTube. I’m building: Upload notes/PDF → AI generates a short whiteboard teaching video. Trying to validate if this is actually useful. 1 min form: forms.gle/NhRa1CELwKTUG4k57 Brutal honesty > fake validation.

AI Tutor

Upload your notes, PDFs, or study material and get a short video where an AI teaches the content step-by-step on a whiteboard—just like a real teacher explaining while writing. The system converts...

docs.google.com

Vidhan Bansal

Vidhan Bansal @mudio66

Mar 28

Most LLM apps fail here: They answer incomplete queries. Better approach: Detect missing fields → ask targeted questions → then answer Not the other way around.

Vidhan Bansal

Vidhan Bansal @mudio66

Mar 21

Why refusal logic is necessary in LLM systems — One thing I noticed while building LLM systems: The model almost always answers even when it shouldn’t. That’s the core issue behind hallucinations.

more replies

Vidhan Bansal

Vidhan Bansal @mudio66

Mar 21

For scaling: - ANN-based indexing becomes important as the dataset grows (I haven’t implemented yet) Takeaway: Hallucination control isn’t just a model problem. It’s a system design problem. If you don’t define when to refuse, your system will always answer even when it’s wrong.

Vidhan Bansal

Vidhan Bansal @mudio66

Mar 21

Next, I’ll share how I handle ambiguity and make LLMs ask better follow up questions instead of answering too early. If you are also into this field, I would love to know your approach and where I can improve it.

Vidhan Bansal

Vidhan Bansal @mudio66

Mar 14

How my first LLM app turned into a maintenance nightmare — At the start everything lived in one file: • prompts • retrieval logic • API calls • tools • database queries It worked… until the system started growing.

more replies

Vidhan Bansal

Vidhan Bansal @mudio66

Mar 14

But prompts were still embedded in code. That made iteration painful. Every prompt change required touching application logic. The final improvement: Externalizing prompts into YAML configs. Now prompt changes didn't require modifying the core system.

Vidhan Bansal

Vidhan Bansal @mudio66

Mar 14

Big realization: LLM apps behave more like systems engineering problems than simple scripts. Modularity becomes important much earlier than expected.