building the world's best AI scientist

Joined August 2024
632 Photos and videos
shetty retweeted
tbh i never really cared about the agents​.md vs claude​.md stuff because you can just do ln -s agents.md claude​.md and move on with your life
105
57
1,967
134,249
time to lock in
29
343
shetty retweeted
Introducing Harness-1, a 20B search agent trained with a state-externalizing harness. > frontier-level long-horizon search, rivaling Opus-4.6 and outperforming GPT-5.4 > Context-1-level cost and latency > externalizes candidates, evidence, verification, and search history > open-source
90
274
2,959
264,538
shetty retweeted

17
56
980
166,524
man how many times will the TRANSFORMER ERA end lmao
Google has published a paper that might end the transformer era. For the last 7 years, every major AI, ChatGPT, Claude, Gemini, has been built on the exact same architecture: The Transformer. But Transformers have a fatal flaw. To remember context, they have to process every single word against every other word. It’s called quadratic complexity. As your prompt gets longer, the compute cost explodes. The alternative is the old-school RNN (Recurrent Neural Network). RNNs are incredibly cheap and fast, but they have a fixed memory size. If you give them a long document, they get amnesia. Until today. Google researchers published Memory Caching: RNNs with Growing Memory. And it fixes the biggest bottleneck in AI. Instead of an RNN having a fixed, rigid memory that constantly overwrites itself, Google gave it a "save" button. The technique allows the RNN to cache checkpoints of its hidden states as it reads. The memory capacity of the RNN can now dynamically grow as the sequence gets longer. They built four different variants, including sparse selective mechanisms where the AI actively chooses exactly which checkpoints matter most. The results rewrite the rules of efficiency. On long-context understanding and recall-intensive tasks, these new Memory-Cached RNNs closed the gap with Transformers. They achieved competitive accuracy without the explosive, quadratic compute cost. It perfectly bridges the gap between the cheap efficiency of an RNN and the massive capability of a Transformer. We have spent billions scaling Transformers because we thought they were the only way an AI could remember a long conversation. But Google just proved we don't need to process the whole history every single time. We just needed a smarter cache.
2
157
shetty retweeted
Una tenista polaca de 24 años llegó a París la semana pasada clasificada en el puesto 114 del mundo, sin patrocinadores, sin ingresos garantizados y sin certeza siquiera de poder pagar su habitación de hotel. Tuvo que ganar tres partidos de clasificación solo para entrar al cuadro principal del Abierto de Francia. El dinero de los premios solo se paga al final del torneo, así que una marca polaca de bebidas deportivas intervino discretamente y cubrió su factura de hotel. Su nombre es Maja Chwalinska. Y hoy, juega la final del Abierto de Francia. Antes de este torneo, había ganado exactamente un partido de cuadro principal de Grand Slam en toda su carrera. Luchó contra una depresión tan severa que en 2021 no podía levantarse de la cama. Se sometió a una cirugía de rodilla en 2022. Pasó años luchando en torneos menores por toda Europa solo para mantenerse a flote. Luego llegó a París, ganó tres clasificatorios y siguió ganando. Zheng Qinwen. Elise Mertens. Maria Sakkari. Diana Shnaider. Nueve partidos seguidos. Un solo set perdido. Ahora es la primera clasificatoria en la historia del Abierto de Francia en llegar a la final. La última vez que una clasificatoria alcanzó una final de Grand Slam fue Emma Raducanu en el Abierto de EE.UU. de 2021. Raducanu ganó. Simplemente por llegar a la final, Chwalinska ha ganado más dinero en premios que en toda su carrera junta. El cheque por ser subcampeona es de $1.6 millones. Si gana hoy, se lleva $3.25 millones a casa. Hace una semana no podía pagar su habitación de hotel.
488
7,012
38,304
1,564,523
shetty retweeted
A Shenzhen team completed full-parameter post-training of DeepSeek-V4-Pro (1.6 trillion parameters) on a Huawei Ascend 910C cluster. MFU exceeded 30%, and the training ran 1,500 steps without interruption. Two things to keep in mind. This is post-training, not pre-training. The compute requirements are an order of magnitude smaller. And 30% MFU is below what NVIDIA clusters typically achieve (40-50% ). Still, the fact that a third-party team (not Huawei itself) pulled this off on domestic chips is a meaningful data point for the Ascend ecosystem's maturity.
15
39
435
64,112
Same guy btw
139
907
12,807
1,032,615
shetty retweeted
Just learned: Software engineers used to do manual data labeling at Scale AI while Alex Wang was CEO. After he left, new leadership joined, and were HORRIFIED to learn this. Stopped it ASAP Now at Meta, software engineers are assigned manual data labeling... see the pattern?
204
157
5,664
1,075,143
shetty retweeted
Men are so private online. A guy could be moving to another country or having his first child, and he’d still only post a random football score on his story.
1,197
14,198
121,241
2,518,866
shetty retweeted
Hot take the infra stuff around ai is way cooler than model training
58
73
1,511
155,081
me fr
A remote job that pays in USD while I get to spend in INR.
6
290
shetty retweeted
TPOT is not the same anymore @shydev69 is inactive @heyhexadecimal went offline Saleh is no longer on X @Harish_521 doesn’t flex post anymore Everyone who coded and posted like a mad man has stable job, earns well and is no longer active X is not the same anymore 🥀
31
4
106
7,092
6
10
191
5,065
shetty retweeted
Our first model Mac-1 6.6B beating 3 giant models. - Haiku 4.5 - GPT 5.4 mini - Gemini 3 flash Running this model on my Macbook M3 24GB. (model takes only 7GB RAM) It searches web, call tools, ask follow-ups, tell jokes, find contacts, search files, write emails, book events, write notes, set reminders and so much Siri can't do. Read again, a 6.6B model. Will share full 2000 scenario test results & benchmark scores in 2 days.
162
108
1,827
774,143
shetty retweeted
very true. infra engineers will have generational run for years and beyond categories. - model serving / inference infra - gpu / distributed systems infra - cloud kubernetes orchestration - data infra - eval observability infra - agent / runtime infra i still think as ai advances there will be demand for areas which govern bottlenecks underneath like latency, throughput / cost / reliability / data quality and security.
Hot take the infra stuff around ai is way cooler than model training
23
89
1,308
78,342
man you all need to step up we can't let this happen💔
Researchers confirm that Gen Z is the least sexually active young generation on record
6
391
being able to afford anything without having to bother anyone is one of the greatest feelings you'll ever have
1
22
292
shetty retweeted
introducing cursor profiles! go claim your handle at cursor.com/profile
355
96
1,827
703,893