Rhys

Rhys

632 Photos and videos

Tweets

shetty retweeted

Rhys

@RhysSullivan

Jun 5

tbh i never really cared about the agents.md vs claude.md stuff because you can just do ln -s agents.md claude.md and move on with your life

105

1,967

134,249

shetty

shetty @thestoicccoder

Jun 7

time to lock in

343

Patrick Jiang

shetty retweeted

Patrick Jiang

@patpcj

Jun 6

Introducing Harness-1, a 20B search agent trained with a state-externalizing harness. > frontier-level long-horizon search, rivaling Opus-4.6 and outperforming GPT-5.4 > Context-1-level cost and latency > externalizes candidates, evidence, verification, and search history > open-source

0:56

274

2,959

264,538

vivek

shetty retweeted

vivek

@itsreallyvivek

Jun 5

x.com/i/article/206292275075…

980

166,524

shetty

shetty @thestoicccoder

Jun 6

man how many times will the TRANSFORMER ERA end lmao

How To Prompt

@HowToPrompt__

Jun 6

Google has published a paper that might end the transformer era. For the last 7 years, every major AI, ChatGPT, Claude, Gemini, has been built on the exact same architecture: The Transformer. But Transformers have a fatal flaw. To remember context, they have to process every single word against every other word. It’s called quadratic complexity. As your prompt gets longer, the compute cost explodes. The alternative is the old-school RNN (Recurrent Neural Network). RNNs are incredibly cheap and fast, but they have a fixed memory size. If you give them a long document, they get amnesia. Until today. Google researchers published Memory Caching: RNNs with Growing Memory. And it fixes the biggest bottleneck in AI. Instead of an RNN having a fixed, rigid memory that constantly overwrites itself, Google gave it a "save" button. The technique allows the RNN to cache checkpoints of its hidden states as it reads. The memory capacity of the RNN can now dynamically grow as the sequence gets longer. They built four different variants, including sparse selective mechanisms where the AI actively chooses exactly which checkpoints matter most. The results rewrite the rules of efficiency. On long-context understanding and recall-intensive tasks, these new Memory-Cached RNNs closed the gap with Transformers. They achieved competitive accuracy without the explosive, quadratic compute cost. It perfectly bridges the gap between the cheap efficiency of an RNN and the massive capability of a Transformer. We have spent billions scaling Transformers because we thought they were the only way an AI could remember a long conversation. But Google just proved we don't need to process the whole history every single time. We just needed a smarter cache.

157

Adi el Grande

shetty retweeted

Adi el Grande

@icardo8

Jun 6

Una tenista polaca de 24 años llegó a París la semana pasada clasificada en el puesto 114 del mundo, sin patrocinadores, sin ingresos garantizados y sin certeza siquiera de poder pagar su habitación de hotel. Tuvo que ganar tres partidos de clasificación solo para entrar al cuadro principal del Abierto de Francia. El dinero de los premios solo se paga al final del torneo, así que una marca polaca de bebidas deportivas intervino discretamente y cubrió su factura de hotel. Su nombre es Maja Chwalinska. Y hoy, juega la final del Abierto de Francia. Antes de este torneo, había ganado exactamente un partido de cuadro principal de Grand Slam en toda su carrera. Luchó contra una depresión tan severa que en 2021 no podía levantarse de la cama. Se sometió a una cirugía de rodilla en 2022. Pasó años luchando en torneos menores por toda Europa solo para mantenerse a flote. Luego llegó a París, ganó tres clasificatorios y siguió ganando. Zheng Qinwen. Elise Mertens. Maria Sakkari. Diana Shnaider. Nueve partidos seguidos. Un solo set perdido. Ahora es la primera clasificatoria en la historia del Abierto de Francia en llegar a la final. La última vez que una clasificatoria alcanzó una final de Grand Slam fue Emma Raducanu en el Abierto de EE.UU. de 2021. Raducanu ganó. Simplemente por llegar a la final, Chwalinska ha ganado más dinero en premios que en toda su carrera junta. El cheque por ser subcampeona es de $1.6 millones. Si gana hoy, se lleva $3.25 millones a casa. Hace una semana no podía pagar su habitación de hotel.

488

7,012

38,304

1,564,523

Poe Zhao

shetty retweeted

Poe Zhao

@poezhao0605

Jun 6

A Shenzhen team completed full-parameter post-training of DeepSeek-V4-Pro (1.6 trillion parameters) on a Huawei Ascend 910C cluster. MFU exceeded 30%, and the training ran 1,500 steps without interruption. Two things to keep in mind. This is post-training, not pre-training. The compute requirements are an order of magnitude smaller. And 30% MFU is below what NVIDIA clusters typically achieve (40-50% ). Still, the fact that a third-party team (not Huawei itself) pulled this off on domestic chips is a meaningful data point for the Ascend ecosystem's maturity.

435

64,112

International Cyber Digest

shetty retweeted

International Cyber Digest

@IntCyberDigest

Jun 6

Same guy btw

139

907

12,807

1,032,615

Gergely Orosz

shetty retweeted

Gergely Orosz

@GergelyOrosz

Jun 6

Just learned: Software engineers used to do manual data labeling at Scale AI while Alex Wang was CEO. After he left, new leadership joined, and were HORRIFIED to learn this. Stopped it ASAP Now at Meta, software engineers are assigned manual data labeling... see the pattern?

204

157

5,664

1,075,143

Adesewa

shetty retweeted

Adesewa

@t_Duches

Jun 4

Men are so private online. A guy could be moving to another country or having his first child, and he’d still only post a random football score on his story.

1,197

14,198

121,241

2,518,866

shrey birmiwal

shetty retweeted

shrey birmiwal @shreybirmiwal

Jun 5

Hot take the infra stuff around ai is way cooler than model training

1,511

155,081

shetty

shetty @thestoicccoder

Jun 6

me fr

Shravani

@shrav_10

Jun 5

A remote job that pays in USD while I get to spend in INR.

290

Sarthak

shetty retweeted

Sarthak

@Sarthak_Sama

Jun 6

TPOT is not the same anymore @shydev69 is inactive @heyhexadecimal went offline Saleh is no longer on X @Harish_521 doesn’t flex post anymore Everyone who coded and posted like a mad man has stable job, earns well and is no longer active X is not the same anymore 🥀

106

7,092

shetty

shetty @thestoicccoder

Jun 6

191

5,065

CJ Zafir

shetty retweeted

CJ Zafir

@cjzafir

Jun 5

Our first model Mac-1 6.6B beating 3 giant models. - Haiku 4.5 - GPT 5.4 mini - Gemini 3 flash Running this model on my Macbook M3 24GB. (model takes only 7GB RAM) It searches web, call tools, ask follow-ups, tell jokes, find contacts, search files, write emails, book events, write notes, set reminders and so much Siri can't do. Read again, a 6.6B model. Will share full 2000 scenario test results & benchmark scores in 2 days.

162

108

1,827

774,143

himanshu

shetty retweeted

himanshu

@himanshustwts

Jun 5

very true. infra engineers will have generational run for years and beyond categories. - model serving / inference infra - gpu / distributed systems infra - cloud kubernetes orchestration - data infra - eval observability infra - agent / runtime infra i still think as ai advances there will be demand for areas which govern bottlenecks underneath like latency, throughput / cost / reliability / data quality and security.

shrey birmiwal @shreybirmiwal

Jun 5

Hot take the infra stuff around ai is way cooler than model training

1,308

78,342