Burn tokens, save lives.

Joined May 2016
Pinned Tweet
I have found myself wondering, "Why do I not derive happiness from the same things as everyone else?" Am I a fool? Is happiness now merely a function of a reality too distant to reach? A man sitting on a gold mine too often perishes because he never starts digging.
"I am more capable than I ever imagined." It is an affirmation parents should make sure their kids know and repeat. People have so much more to offer, yet offer so little out of sheer laziness.
This is why the supporters of @elonmusk work hard to "defend him." Because it's not about Elon. He's a symbol of progress.
Siddhesh retweeted
Replying to @bourscheid
No, you don't get it. He does not have $1 trillion sitting in cash; it is 99% stock in his companies. To make that wealth liquid would mean selling all that stock, which would swiftly destroy *both* the companies (Tesla, SpaceX, others) and the wealth. If he sold it all, he'd end up with maybe $100b max, several hundred thousand people would be out of work, and the companies, along with many of their suppliers, would be ruined. Okay, but now Elon has $100b in cash and can "solve the world's problems". $100b divided by the world's 8 billion people is $12.50 each. If you were in charge, several of the most innovative industrial companies in the world would be destroyed, hundreds of thousands would be out of work, and space would again be closed to human civilization for another generation. But everyone on earth could have one nice meal, and you could revel in your altruism.
Siddhesh retweeted
This paragraph by Haruki Murakami hits very hard: “Once the storm is over, you won’t remember how you made it through, how you managed to survive. You won’t even be sure, whether the storm is really over. But one thing is certain. When you come out of the storm, you won’t be the same person who walked in. That’s what this storm’s all about.”
The danger of deep knowledge is that it locks you into yesterday’s solutions. True intelligence isn’t about how much information you hold, but how quickly you can discard it when the game changes. #intelligence #in #ai
Most discussions are just echo chambers disguised as intellect. If you can’t debate without taking it personally, you’re not conversing; you’re just contributing to the polarization. Separation of ideas from identity is a rare skill. Learn to argue better. #debate
Smarter than most people are drunk.
Do not be attached to things, people, and places. You are fleeting in human existence, and you are not living the life; god lives it through you. Feel everything, but with detachment.
Someone, please give me slower models. I need slow, high-accuracy, cost-effective models. 5-minute response times are okay. Just slash my costs.
Model costs under current token economics have stopped making sense. So much crap is getting built without delivering real value. We need DeepSeekV4-level costs at current productivity levels. #AIInference #costlyAI
Efficiency drops the more things you have on your plate. You slowly burn through the human context, and no amount of AI summarization can replace that. Suddenly you are building 3 projects and 8 side projects, and getting lost in files you can no longer track! #AI
For those talking about Opus with CC and Codex: y'all really haven't tried @grok Build and @Cursor's Composer. No wonder Elon wants to buy them for 60B. Heck, it is daylight robbery at 60B. This is gonna be fun to watch.
Computational drug discovery sounds so interesting. I would love to see any good resources on it. I believe agentic drug discovery is going to be the next great thing, but who brings the connection to life is the question. #StayTunedForMore #Compute
Siddhesh retweeted
SpaceX has almost finished writing V1.0 of an in-house AI training stack in C that exact-maps to 220k GB300s with 800G NICs, making heavy use of pipeline parallelism and getting as close to bare metal as possible. The potential speed improvement vs JAX for large training runs is over an order of magnitude.
I emailed @karpathy about this. Might as well get advice from the GOAT, am I right?
Thinking of hitting the restart button on my career soon at this point. Is it too late to hit restart at 30?
I am never buying a @GooglePixel_US phone again. My phone is messed up and losing connectivity due to software issues. Support said this is not covered, so I went to a repair shop and got an estimate of $550. I used this phone for just 1.5 years. I thought I had a lifelong phone. @Google
Siddhesh retweeted
Today we release Token Superposition Training (TST), a modification to the standard LLM pretraining loop that produces a 2-3× wall-clock speedup at matched FLOPs without changing the model architecture, optimizer, tokenizer, or training data. During the first third of training, the model reads and predicts contiguous bags of tokens, averaging their embeddings on the input side and predicting the next bag with a modified cross-entropy on the output side. For the remainder of the run, it trains normally on next-token prediction. The inference-time model is identical to one produced by conventional pretraining. Validated at 270M, 600M, and 3B dense scales, and at 10B-A1B MoE. The work on TST was led by @bloc97_, @gigant_theo, and @theemozilla.
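The TST announcement compresses the mechanism into a few sentences. As a rough illustration only, here is a framework-free Python sketch of the data and loss side of the idea, under assumptions that are not from the announcement: the "modified cross-entropy" is not defined there, so this sketch assumes it is the mean negative log-likelihood of each token in the next bag under a single softmax; trailing tokens that do not fill a whole bag are dropped; and "the first third of training" is a plain step threshold. All function names are hypothetical, and no model, optimizer, or FLOP matching is included.

```python
import math

# Hypothetical sketch of Token Superposition Training (TST) data handling.
# Assumptions (not from the announcement): list-based embeddings, trailing
# tokens that do not fill a whole bag are dropped, and the "modified
# cross-entropy" is the mean NLL of the next bag's tokens under one softmax.


def split_into_bags(token_ids, bag_size):
    """Split a token sequence into contiguous, non-overlapping bags."""
    if bag_size < 1:
        raise ValueError("bag_size must be >= 1")
    n_bags = len(token_ids) // bag_size  # incomplete trailing bag is dropped
    return [token_ids[i * bag_size:(i + 1) * bag_size] for i in range(n_bags)]


def bag_embeddings(token_ids, embedding, bag_size):
    """Input side: average the embedding vectors of the tokens in each bag."""
    dim = len(embedding[0])
    averaged = []
    for bag in split_into_bags(token_ids, bag_size):
        total = [0.0] * dim
        for token in bag:
            for d in range(dim):
                total[d] += embedding[token][d]
        averaged.append([value / len(bag) for value in total])
    return averaged


def make_bag_pairs(token_ids, bag_size):
    """Pair each bag (model input) with the following bag (prediction target)."""
    bags = split_into_bags(token_ids, bag_size)
    return list(zip(bags[:-1], bags[1:]))


def log_softmax(logits):
    """Numerically stable log-softmax over a list of vocabulary logits."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def bag_cross_entropy(logits, target_bag):
    """Output side (assumed form): mean negative log-likelihood of every
    token in the next bag under a single predicted distribution."""
    log_probs = log_softmax(logits)
    return -sum(log_probs[token] for token in target_bag) / len(target_bag)


def bag_size_for_step(step, total_steps, bag_size):
    """Superposition phase for the first third of the run, then ordinary
    next-token prediction (a bag size of 1)."""
    return bag_size if step < total_steps / 3 else 1
```

For example, `make_bag_pairs([5, 6, 7, 8, 9, 10], 2)` yields the input/target pairs `([5, 6], [7, 8])` and `([7, 8], [9, 10])`. With a bag size of 1, `bag_cross_entropy` reduces to ordinary next-token cross-entropy, which is why this sketch can reuse one loss function for both phases of the run.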
Siddhesh retweeted
Incentives drive outcomes
Elon Musk: "If you punish people too much for failure, then they will respond accordingly, and the innovation you will get will be very incrementalist. Nobody's gonna try anything bold for fear of getting fired or being punished in some way. So risk-reward must be balanced and favor taking bold moves, otherwise it will not happen."
The AI race is optimizing for the wrong frontier. I do not need an LLM to memorize everything. I need it to reason, admit uncertainty, extract signal from missing context, and know when to look things up. Perfect recall is not intelligence. A model that hallucinates with confidence is not advanced; it is expensive autocomplete with bad epistemics.
The real frontier should be math, code, reasoning, planning, debugging, scientific problem-solving, and the ability to decompose vague tasks into executable steps. General knowledge beyond that should be retrieved from trusted sources, not buried inside trillion-parameter weights.
We keep trying to make one model become a programmer, chemist, doctor, therapist, lawyer, teacher, researcher, and encyclopedia at the same time. Why? That is not elegance. That is brute force. We could build much smaller models with stronger reasoning loops, tighter tool integration, calibrated uncertainty, and access to curated expert databases. Let the model think. Let the database remember. Let the system verify.
Coding already proves this. There is no magic “complex code” skill. Good coding is decomposition: define the task, break it into fundamentals, execute, test, inspect, revise, repeat. Intelligence lives in the loop, not in static recall.
The next leap is not just bigger models. It is models that can navigate context, ask better questions, use external knowledge, reason through ambiguity, and then forget what they no longer need. We need global, curated, expert-level wiki databases that LLMs can interact with directly. Free, structured, verifiable, and built for reasoning agents. We need specialized models where specialization matters, and general reasoning models where reasoning matters.
Training trillion-parameter models to memorize the world is inefficient, fragile, and bad for the hype cycle. The better future is smaller, sharper, more honest systems that solve problems instead of pretending to know everything.
Intelligence is not knowing every fact. Intelligence is knowing what matters, what is missing, what to check, and what to do next. #AI #LLM #AIFrontier #Knowledge