Corporate strategist by day | AI startup dreamer by night | From boardroom to #DeepTech | 🇮🇳 | One AI agent at a time

Joined October 2010
57 Photos and videos
SJ retweeted
Is there a prompt guide for Fable? Fable uses most of the quota in just a few prompts and still feels nerfed. I tried to use Fable for a serious task like product analysis. It gave sharp analysis, but the model seems shy about tool calls. It doesn't want to collect a lot of information, and I had to push it hard to do real analysis. The analysis overall is sharper than Opus, but this still feels like a nerfed model. Is there a prompt guide or direction on how to use this model effectively?
SJ retweeted
I built a small visualization layer on top of a local Qwen3 in pure C to understand LLM output. The image shows why sampling is not greedy decoding: a lower-probability token can still get selected when temperature/top-p keep it inside the candidate pool. I would also love feedback on what would make a visualization like this more useful for learning:
- KV cache view?
- attention heatmaps?
- speculative decoding comparison?
- greedy vs top-p side-by-side?
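A minimal Python sketch of the point about sampling versus greedy decoding. It is not the C visualization from the post: the logits, the temperature, and the top-p value are hypothetical, and the helper names are made up for illustration.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities after temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_pool(probs, top_p):
    """Smallest set of (token_id, prob) pairs whose cumulative prob reaches top_p."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    pool, cumulative = [], 0.0
    for token_id, prob in ranked:
        pool.append((token_id, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    return pool


def sample_top_p(logits, temperature=1.0, top_p=0.9, rng=random):
    """Sample one token id from the top-p candidate pool."""
    pool = top_p_pool(softmax(logits, temperature), top_p)
    token_ids = [token_id for token_id, _ in pool]
    weights = [prob for _, prob in pool]
    return rng.choices(token_ids, weights=weights, k=1)[0]


def greedy(logits):
    """Greedy decoding always takes the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical logits for a 4-token vocabulary.
logits = [2.0, 1.5, 0.5, -1.0]
rng = random.Random(0)
samples = [sample_top_p(logits, temperature=1.0, top_p=0.9, rng=rng) for _ in range(200)]
print("greedy pick:", greedy(logits))
print("tokens seen when sampling:", sorted(set(samples)))
```

With these numbers the pool holds three of the four tokens, so sampling can return a token that greedy decoding would never pick, while the least likely token stays excluded.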
SJ retweeted
Claude Cowork with Blender is so much fun. Still a work in progress; will post the final scene soon. Trying out whether it can build a basic geometry nodes scene like waves hitting a beach 🌊🏖️
SJ retweeted
52% of MCP servers are dead within 90 days. But the median server has 6 commits — lifetime. The protocol works. The logic layer doesn't exist. Content goes stale. Tools stay isolated. Nobody monitors what fails. Full research: fetchlens.ai/research/mcp-se…
SJ retweeted
We are in the “Buy an iPhone or train your own model” era. A full ultra-lean 1B model trained in one day on a budget of 💰1000🫡
Introducing HRM-Text. An ultra-lean 1B-parameter reasoning language model designed to deliver strong general performance with a fraction of the data, compute, and infrastructure. Trained on just 40B structured tokens, HRM-Text achieves competitive performance while using ~1/1000 of the training data of comparable models. The kicker? The full model trains in roughly one day on a $1,000 budget. This opens the door to a new generation of AI that is powerful, accessible, and radically easier to adapt. Theories and research concepts once deemed too expensive to test are officially back in the game. Sapient Intelligence invites you to help us shape a new paradigm for general intelligence.
SJ retweeted
The new UI Preview feature in Claude Code is really great. I gave it a screenshot and asked it to make a navbar prettier. Instead of immediately editing CSS, it first asked me to choose a direction:
- Refined gold pill
- Sparkle prefix
- Glow halo around text
That is the part I found useful. For frontend work, “make it prettier” is not a coding instruction. It is a taste decision. Claude Code did not jump straight from prompt to diff. It stopped at the subjective layer first. The flow felt like: visual context → design options → human choice → code edit. All in a single clean flow.
SJ retweeted
This MTP pull request merge is getting more attention than many model drops. I first noticed MTP while looking at Qwen3.5-0.8B, and now llama.cpp support makes the whole thing more interesting. My current understanding is that MTP mainly improves token generation, not prompt processing. So it helps when the model is writing a lot: chat, coding, long answers, agents, synthetic data, local assistants. But if the workload is mostly a huge prompt with a short answer, then prompt processing is still the bottleneck. People are mentioning around 1.5x to 1.8x faster token generation in some setups. My question is: how useful is this overall in real local AI workflows? Is MTP going to matter mainly for long generation and agent loops, or will it become a default feature people expect in models?
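A back-of-the-envelope sketch of why a generation-only speedup matters more for long outputs than for prompt-heavy work. The latency model (prefill time plus decode time) and every throughput number below are assumptions for illustration, not measurements from llama.cpp or any model.

```python
def total_seconds(prompt_tokens, gen_tokens, prefill_tps, decode_tps, decode_speedup=1.0):
    """Simple latency model: prompt processing time plus token generation time.

    Only the generation term is scaled by decode_speedup, matching the idea
    that MTP helps token generation rather than prompt processing.
    """
    prefill = prompt_tokens / prefill_tps
    decode = gen_tokens / (decode_tps * decode_speedup)
    return prefill + decode


def end_to_end_speedup(prompt_tokens, gen_tokens, prefill_tps, decode_tps, decode_speedup):
    """Ratio of baseline latency to latency with faster generation."""
    base = total_seconds(prompt_tokens, gen_tokens, prefill_tps, decode_tps)
    fast = total_seconds(prompt_tokens, gen_tokens, prefill_tps, decode_tps, decode_speedup)
    return base / fast


# Hypothetical throughputs: 1000 tokens/s prompt processing, 20 tokens/s generation.
long_generation = end_to_end_speedup(200, 2000, 1000.0, 20.0, decode_speedup=1.5)
prompt_heavy = end_to_end_speedup(50_000, 50, 1000.0, 20.0, decode_speedup=1.5)
print(f"long generation workload: {long_generation:.2f}x end to end")
print(f"huge prompt, short answer: {prompt_heavy:.2f}x end to end")
```

Under these assumed numbers, the long-generation workload keeps almost all of the 1.5x gain, while the prompt-heavy one barely moves.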
SJ retweeted
This is one of the most crucial lessons in First Break AI. It teaches you how to navigate @huggingface like a pro. Not just: download model → run notebook → move on. In this lesson, we go deeper. We look at how open model repos are structured, how to read model files, how config.json connects to the actual model class, and how to trace from a Hugging Face model page into the Transformers code that runs the model. We use Qwen3-0.6B as the learning model. We also look at why Markdown matters so much in AI workflows: model cards, GitHub issues, README files, Discord, Cursor, Claude Code, planning docs, and AI-assisted work. Then comes the biggest win: datasets. Working with datasets is a core AI engineering skill. I show 3 ways to analyze datasets on Hugging Face:
- Croissant endpoint
- Data Studio / browser viewer
- load_dataset with Python, pandas, and plots
We inspect dataset structure, categories, response lengths, distribution, short examples, long examples, and how to think about dataset quality before using it for training or fine-tuning. And this sets up the next part: running Qwen3 directly in C, without treating Transformers as magic. Lesson 01: Hugging Face Beyond Upload. Watch: youtu.be/MjZio-A9oUY Free cohort: cohort.bubblnet.com/lessons/…
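A rough sketch of the kind of dataset inspection described above (categories, response lengths, short and long examples). This is not the lesson's code. The rows and the field names `category` and `response` are hypothetical stand-ins for whatever a real dataset exposes, and only the standard library is used so it runs without downloads.

```python
from collections import Counter
from statistics import median

# Hypothetical rows, shaped like records from an instruction-style dataset.
rows = [
    {"category": "qa", "response": "Paris."},
    {"category": "qa", "response": "The answer is forty two."},
    {"category": "summarization", "response": "The report covers revenue, costs, and hiring plans for next year."},
    {"category": "coding", "response": "Use a dictionary to count items."},
]


def summarize(rows, text_key="response", category_key="category"):
    """Return basic structure and length statistics for a list of records."""
    lengths = [len(row[text_key].split()) for row in rows]
    by_length = sorted(rows, key=lambda row: len(row[text_key].split()))
    return {
        "rows": len(rows),
        "categories": dict(Counter(row[category_key] for row in rows)),
        "min_words": min(lengths),
        "median_words": median(lengths),
        "max_words": max(lengths),
        "shortest_example": by_length[0][text_key],
        "longest_example": by_length[-1][text_key],
    }


summary = summarize(rows)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The same function should work on any iterable of dictionaries, so records loaded another way could be passed in after adjusting the two key names.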
SJ retweeted
🚨 Major supply-chain attack: Mini Shai-Hulud is back. Reported impact:
• 170 npm/PyPI packages affected
• 400 malicious package versions
• 42 TanStack packages, including @tanstack/react-router
• UiPath packages
• Mistral AI SDK packages on npm/PyPI
• OpenSearch JS client
• Guardrails AI and others
This one targets developer CI/CD secrets, so teams should check lockfiles, CI logs, npm/PyPI installs, and rotate exposed tokens.
‼️🚨 UPDATE: The TanStack npm attack is now a full campaign. 'Mini' Shai-Hulud has hit:
- OpenSearch
- Mistral AI
- Guardrails AI
- UiPath
- Squawk
packages across npm and PyPI. The malware specifically targets AI developer tooling. It hooks into Claude Code (.claude/settings.json) and VS Code (.vscode/tasks.json) to re-execute on every tool event, long after the infected package is gone. npm uninstall does not fix this.
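A small sketch for locating the two config files named above so a person can review them by hand. It only lists files. It does not detect malware, and it assumes nothing about what an injected entry looks like. Both files exist legitimately in many repositories, so a hit is a prompt to read the file, not a finding. The function name and the default root are hypothetical.

```python
from pathlib import Path

# (parent directory name, file name) pairs taken from the post above.
CONFIG_TARGETS = ((".claude", "settings.json"), (".vscode", "tasks.json"))


def find_hook_configs(root):
    """Return paths under root that match .claude/settings.json or .vscode/tasks.json."""
    root = Path(root)
    hits = []
    for parent_name, file_name in CONFIG_TARGETS:
        for path in root.rglob(file_name):
            if path.is_file() and path.parent.name == parent_name:
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical default: scan the current directory tree.
    for path in find_hook_configs("."):
        print("review manually:", path)
```

Comparing each listed file against version control history is one way to spot entries nobody on the team added.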
SJ retweeted
Where are small models like Qwen3 0.6B and Qwen3.5 0.8B used? Hugging Face shows 2.88 million downloads this month for the small Qwen3.5 model. I tried using the earlier 0.6B model in a deep research workflow, and it was very difficult to get anything done with it. First, these models have a very surface-level understanding of concepts. Poor semantic understanding means they can get confused about the topic or the task. JSON outputs are often broken. Adding a layer of checks on top took much of my time while working with these models. Slow response: this one depends on a lot of factors and can actually be improved, but a slow response is still a buzzkill most of the time. I am very curious how the community is using these models.
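One possible shape for the "layer of checks" on JSON outputs, sketched with the standard library only. This is not the author's code. The required keys and the sample model outputs are hypothetical, and the helper simply reports a reason so the caller can retry.

```python
import json


def parse_model_json(text, required_keys):
    """Extract and validate a JSON object from raw model output.

    Returns (data, None) on success or (None, reason) on failure.
    """
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        return None, "no JSON object found"
    try:
        data = json.loads(text[start:end + 1])
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    missing = [key for key in required_keys if key not in data]
    if missing:
        return None, "missing keys: " + ", ".join(missing)
    return data, None


# Hypothetical outputs from a small model.
outputs = [
    'Sure! {"topic": "MTP", "summary": "Drafts extra tokens."} Hope this helps.',
    '{"topic": "MTP", "summary": ',
    '{"topic": "MTP",}',
    '{"topic": "MTP"}',
]
for raw in outputs:
    data, error = parse_model_json(raw, required_keys=("topic", "summary"))
    print("ok" if error is None else f"retry ({error})")
```

Returning a reason instead of raising keeps the retry loop simple: the reason can be fed back into the next prompt.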
SJ retweeted
I used to get irked by the rituals around me, whether social, religious, or community-based. I did not even value the rituals I had unintentionally created for myself. The passage defines rituals as “time architecture.” We cannot inhabit time if its flow is not shaped or held by something. Today, I see rituals as anchors. They are tools that help us center ourselves, calm the mind, and reduce anxiety, especially during turbulent times.
thinking about rituals, why they matter and what we lose when they’re gone
SJ retweeted
May 9
t=0,d=5e-4,draw=_=>{t++||createCanvas(w=400,w);background(9).stroke(w,96);for(x=y=z=9,i=3e4;i--;point((q=x*(e=sin(t*PI/20-x*x/99+i%9)+1)+89)*cos(k=z/59-e/29+t*PI/480+i%9*8)+200,200-(q+60*cos(k/2))*sin(k)))[x,y,z]=[x+9*(y-x)*d,y+(x*(28-z)-y)*d,z+(x*y-z-z)*d]}//#つぶやきProcessing
SJ retweeted
First Break AI cohort.bubblnet.com/ Cohort: 1 May 2026 — 30 June 2026 (2 months) 3⃣ intuitions that make LLMs click: 🗿The model is a pipeline For Qwen3-0.6B: Input → text embeddings → Qwen3DecoderLayer ×28 → RMSNorm → lm_head → output From far away, it looks simple. Most of the intelligence is inside the repeated decoder layers. 🗿 LLMs generate one token at a time. They are causal autoregressive models. At inference time, the model sees the entire context so far, but it cannot see future tokens. The loop is: current context → predict next token → append token → new context → repeat So the model does not produce the full answer in one shot. It keeps extending the sequence one token at a time. Each new token becomes part of the context for the next prediction. During training, the full sequence can be passed in at once, but a causal mask prevents each token from looking ahead. 🗿 The model does not directly output one word. At every step, it outputs probabilities over the full vocabulary. For Qwen3-0.6B, that vocabulary is 151,936 possible tokens. Decoding then chooses the next token.
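A toy sketch of the loop described above: current context → probabilities over the vocabulary → choose next token → append → repeat. The five-token vocabulary and the `toy_logits` function are hypothetical stand-ins for a real model, and the decoding step here is plain greedy selection.

```python
import math

# Hypothetical 5-token vocabulary; a real model like Qwen3-0.6B has 151,936 tokens.
VOCAB = ["<eos>", "the", "cat", "sat", "down"]


def toy_logits(context):
    """Stand-in for the model: scores every vocabulary entry given the context.

    This toy version simply favors the token that follows the last one.
    """
    favored = (context[-1] + 1) % len(VOCAB)
    return [3.0 if token_id == favored else 0.0 for token_id in range(len(VOCAB))]


def softmax(logits):
    """Convert logits into a probability distribution over the vocabulary."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(prompt_ids, max_new_tokens=10, eos_id=0):
    """Extend the sequence one token at a time until <eos> or the token budget."""
    context = list(prompt_ids)
    for _ in range(max_new_tokens):
        probs = softmax(toy_logits(context))          # one distribution per step
        next_id = max(range(len(probs)), key=probs.__getitem__)  # greedy decoding
        context.append(next_id)                       # new token joins the context
        if next_id == eos_id:
            break
    return context


token_ids = generate([1])
print(" ".join(VOCAB[token_id] for token_id in token_ids))
```

Swapping the greedy line for a sampling step changes which token is chosen, but the surrounding loop stays the same.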
SJ retweeted
Calcutta cabbies stopping on their own and asking kothay jaben is not the poriborton I was hoping for TBH
SJ retweeted
- Gothically gorgeous - Hauntingly beautiful
May 7
a=(x,y,d=mag(k=4*cos(x/21),e=y/8-20))=>circle((q=3*sin(k*2)+.3/k+sin(y/19)*k*(9+2*sin(e*14-d*3+t*2)))+50*cos(c=d-t)+200,q*sin(c)+d*39-475,k*k>15?2:1)
t=0,draw=$=>{t||createCanvas(w=400,w);background(9).noStroke().fill(w,116);for(t+=PI/240,i=1e4;i--;)a(i,i/235)}#つぶやきProcessing
SJ retweeted
👇 Epic prompt for learning: create clean Japanese-style posters. Use Claude to create 3D mockups instantly. The mockups could be better; I am looking for a way to generate better ones. Interestingly, on Qwen 3.5 0.8B I came across an MTP (multi-token prediction) side branch on the mockup. This is used for speculative decoding in models, and apparently other models are shipping this too (DeepSeek, GLM; need to check).
NORMAL PATH: tokens / vision tokens ↓ 24 decoder layers ↓ RMSNorm ↓ tied LM head ↓ predict next token t+1
MTP SIDE PATH: main hidden state + token embedding ↓ fusion projection: fc.weight [1024, 2048] ↓ one small decoder-like layer: mtp.layers.0 ↓ mtp norm ↓ same / tied vocab projection ↓ draft token t+2 / t+3 ...
I will include this topic in our cohort as well cohort.bubblnet.com/
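A shape-level sketch of the fusion projection step in the MTP side path. It assumes the [1024, 2048] weight is stored as [output size, input size] and that the 2048 inputs are the main hidden state concatenated with the token embedding, each of width 1024. That reading is an assumption, not something confirmed by the model code, and the weight values below are made up so the result is easy to check.

```python
HIDDEN = 1024  # assumed hidden size, taken from the first dimension of fc.weight


def fuse(hidden_state, token_embedding, fc_weight):
    """Concatenate two vectors and project them back to the hidden size."""
    fused_input = hidden_state + token_embedding  # list concatenation: 1024 + 1024 = 2048
    if len(fused_input) != len(fc_weight[0]):
        raise ValueError("input width does not match fc_weight columns")
    return [sum(w * x for w, x in zip(row, fused_input)) for row in fc_weight]


# Hypothetical weights: output i averages hidden_state[i] and token_embedding[i].
fc_weight = [[0.0] * (2 * HIDDEN) for _ in range(HIDDEN)]
for i in range(HIDDEN):
    fc_weight[i][i] = 0.5
    fc_weight[i][HIDDEN + i] = 0.5

hidden_state = [1.0] * HIDDEN      # stand-in for the main model's last hidden state
token_embedding = [3.0] * HIDDEN   # stand-in for the embedding of the latest token

fused = fuse(hidden_state, token_embedding, fc_weight)
print("fc.weight shape:", (len(fc_weight), len(fc_weight[0])))
print("fused vector length:", len(fused))
```

The fused vector has the same width as a normal hidden state, which is what would let a single small decoder-like layer and the tied vocabulary projection reuse the main model's dimensions.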
Replying to @BenBackus4
Try this. It will be a bit more scoped. Create a premium English-language poster about [Topic] in a Japanese-inspired modern editorial science-graphic style: off-white textured paper, deep black vertical serif typography, electric cobalt blue technical linework, acid green highlight accents, thin rule lines, boxed annotations, sparse labels, asymmetrical grid, large negative space, subtle risograph grain and halftone texture. Include a dramatic central abstract diagram explaining [Topic], small side-panel mini diagrams, and refined infographic details. Make it feel like a gallery-worthy mathematical design poster: disciplined, minimal, cerebral, intense, poetic, and visually powerful. No Japanese text, no glossy 3D, no cartooniness, no clutter.
SJ retweeted
“Once you start learning about stuff, the density of accessible information increases in an extremely literal sense: You are able to engage with more of the world than you were before, even though the amount of physical world around you has not changed” - Quote from “How I read”. This is a great way to put into words my reaction when I discover a new perspective. One word in Mumbai slang describes this perfectly: “Aaila.” Link to the original post: open.substack.com/pub/intern…

new essay: people are portals certain people increase the resolution of our reality i wrote about borrowed attention and how other people teach us what to notice 🪄🪞 velvetnoise.substack.com/p/p…
SJ retweeted
Context: There is a song called “Mach Chor,” composed by ISF against Soukat Molla, TMC candidate from Bhangar. ISF won against Soukat.

SJ retweeted
Only the 5-hour limits are doubled, which means weekly limits will hit even faster 👽🤣
Usage limits are up, effective today we're: 1) Doubling Claude Code's 5-hour limits for Pro, Max, Team and seat-based Enterprise plans 2) Removing peak hours limit reduction on Claude Code for Pro and Max plans 3) Substantially raising our API rate limits for Opus models
SJ retweeted
Cost/Perf tradeoffs & Evals are the most requested topics for this cohort. I was not expecting these to make top 3. Real life signals are always different from my assumptions.
First Break AI Your first break in AI — a guided journey from first commit to capstone Free, open cohort to upskill in training, inference, and AI product building. cohort.bubblnet.com/ Easy to follow Roadmap & AI Podcast guided journey are up. Weekly office hours (Friday) Join Discord Server: discord.gg/UfwdKvfku