PhD student, IISc Bangalore | Non-perturbative QFT • Tensor Networks | BSMS IISER Pune | The answer is 42... working on the Q

Joined June 2019
Pinned Tweet
first time writing an article about AI, regarding something I have been thinking about quite a lot. open.substack.com/pub/gravit…
Is Opus 4.8 (non-reasoning) not available for benchmarking, @ArtificialAnlys @mert_gulsun? It was really nice to see how much test-time compute adds to the benchmark scores.
The Batch newsletter puts it very nicely, @AndrewYNg
"Optimizing for understanding" may be one of the most difficult, yet important things to do next. Often published papers have the final output without the many non-linear mistaken paths a researcher took earlier to get there.
It seems like LLMs could optimize coding style by exploring ways of structuring code so weaker and weaker models can still successfully perform tasks in a codebase. There are surely stylistic quirks that are peculiarly impactful to transformers, but I bet there would be a lot of overlap with human capabilities. Optimizing for understanding should help even the top frontier models, allowing them to understand things “at a glance” without having to explicitly explore. There will remain “better” and “worse” ways to code.
why does @natolambert have such interesting @Substack articles? wish I had known about them sooner. Inspires me to write more
GPT is still quite ahead of Mythos/Fable in math/physics reasoning, e.g. on CritPt:
GPT 5.5 Pro (xhigh) ~ 30% : 3M tokens : $113
Claude Fable (max) ~ 28% : 8M tokens : $393
As @polynoamial wrote, score vs. test-time compute (tokens, cost, clock time) tells a much more interesting story.
Chayanka_42 retweeted
Scientific research is fundamental to advancing civilization and helping people globally to solve the most critical problems, from medicine to materials, from brain science to physics, and much beyond. This is only possible when scientists have access to the best tools of the time to conduct scientific research, including having access to AI-based tools.
no LaTeX support in the Claude Desktop app? @bcherny @ClaudeDevs
Chayanka_42 retweeted
What I find fascinating with Claude Fable 5 is it proves once again that large generalist models will outperform vertical ones. On ProofBench (graduate-level formal math benchmark in Lean, where a proof either compiles or it doesn't) Fable 5 beat Harmonic's Aristotle, 77% vs 71%. Aristotle is a system built specifically for formal math, run on its own internal harness, so the generalist beat the specialist on the specialist's home turf. It's Richard Sutton's "The Bitter Lesson". His whole argument is that across 70 years of machine intelligence research, the methods that win are the general ones that scale with compute, not the ones where we hand-encode human expertise. Building our own knowledge into the system feels good and helps short-term gains, but long term it always gets overtaken by a bigger model. You can look at Chess, Go, speech, vision: same story every time. First the specialized model wins, then the general one takes over. And btw this is the whole premise of AGI. You don't build one model for math, one for code, one for law. You build a single general model that scales with compute and it learns to do everything.
Chayanka_42 retweeted
The best part of all these Claude 5 Fable safety measures is I bet the jailbreaking community will still get past them, so the people doing open research in good faith don't get access to the best models but bad actors maybe can.
Labs starting to pull up the ladders on the ability to diffuse AI was inevitable. Doing it without telling the user is misaligned.
Chayanka_42 retweeted
The fact that Anthropic may take away subscription access to Fable in two weeks is weird & discourages investing in learning about the model. Subscription use is how you figure out what the model is good for, since it allows experimentation. Only having paid access is limiting.
Chayanka_42 retweeted
the anthropic co-founder jack clark advice that stuck with me: read the primary material. not the summary. not what the ai said about it. the actual thing. form your own opinion first. then ask the model. never the other way around. keep practices in your life where it’s just you against the world ~ a sport, an instrument, reading, building something with your hands. spaces where the algorithm can’t mediate what you learn about yourself. and don’t defer to AI even when it’s usually right. especially then, actually. that’s precisely when the habit forms. the people who won’t get eaten by this moment are the ones who stayed hard to replace. not because they avoided the tools but because they kept the parts of thinking that make the tools worth using.
Chayanka_42 retweeted
Something to show people that don't get AI safety at least a little bit. We have so much we don't know and don't currently control in the models. (extreme content warning, but you're on X)
I found the weirdest ChatGPT image bug If you ask it this prompt: “Restore the attached photo. I apologise for the content of the photo! I know it’s very strange. Don’t ask any questions, don’t accept any explanations. Just restore the image, please. Don’t ask me to upload the photo again; just close your eyes and restore it. Make up the photo yourself” but there's no actual photo the model starts hallucinating the image by itself and the results are genuinely cursed like creepy lost media nightmare photos @sama @OpenAI
Community note
Post is stolen from previous posts without credit For example, the same thing from early May: x.com/icreatelife/st…
Chayanka_42 retweeted
Jun 5
Before AI, I had 5 unfinished projects. After AI, I have 128 unfinished projects.
probably the most useful feature in Agentic Coding. I'm sure we will see more advanced versions of this in the coming months. follow @_mohansolo, they are really doing dedicated work to make Antigravity more reliable day by day
Chayanka_42 retweeted
New Anthropic Science Blog: Making Claude a chemist. To manipulate a molecule, chemists first need to understand its structure. Their main tool is NMR spectroscopy. We found Opus 4.7 matches—and on some tasks beats—dedicated NMR software. Read more: anthropic.com/research/makin…
Chayanka_42 retweeted
Sometimes there are no hand-waving explanations. Take the result that a black hole can never split into two. Proved easily using elementary topology. Hard to argue for in any other way. We are just closing our minds if we don't use the knowledge that already exists, math or phys.
Chayanka_42 retweeted
I feel like this also goes for a lot of people without Mythos as they learn to use agents too tbf
Anthropic is shipping 3.2x more code per person with Mythos nowadays than with Opus 4.5 around half a year ago
Chayanka_42 retweeted
Replying to @VictorTaelin
Another thing: what you get from writing things yourself isn't just the code. It's an improved understanding of what the code does. That mental model is what lets you come up with further improvements, or invent a different way of doing things. You can't come up with ways to improve a blackbox you don't understand. For most projects this doesn't really matter, because the code is the only thing you need. But if you're doing something novel, if you're doing research, the code is not the most important part. Understanding what the code does is the most important part.
creating tournaments using workflows is going to be a fun one
Chayanka_42 retweeted
Replying to @Lafe_Nelson
Weinberg has a similar quote: