Training the models. Prior: Saronic, Anduril, Oculus VR, Game Closure, MSEE@GATech

Joined July 2014
Pinned Tweet
Jun 15
Implemented teamcodex, a port of teamclaude to Codex. Just saw it switch seamlessly between two Pro plans when the first one ran out of tokens for the week: github.com/catid/teamcodex
The really interesting thing to me about this new SPS idea is that top@k can be done per token with a shared KV cache by injecting random noise at the input of the prediction latent stream. All the matrix operations can be batched, with no additional KV cache or bandwidth needed.
You get rid of the temperature parameter at inference and instead just choose how much noise to inject and the batch size. It allows models to scale better with FLOPs instead of being limited by memory bandwidth. Exciting to me.
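A toy sketch of the noise-instead-of-temperature idea, not the actual SPS method (which isn't spelled out here). Assumptions: `latent` stands in for the shared prediction latent, `unembed` is a made-up projection to logits, and `sample_with_latent_noise` is a hypothetical helper name. Every noisy copy reuses the same prefix state, and each copy is decoded greedily, so the only knobs are `noise_scale` and `batch_size`.

```python
import random


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def sample_with_latent_noise(latent, unembed, noise_scale, batch_size, seed=0):
    """Return one greedy token id per noisy copy of `latent`.

    Diversity comes from Gaussian noise added to the latent,
    not from a sampling temperature on the logits.
    """
    rng = random.Random(seed)
    tokens = []
    for _ in range(batch_size):
        noisy = [x + rng.gauss(0.0, noise_scale) for x in latent]
        logits = matvec(unembed, noisy)
        tokens.append(max(range(len(logits)), key=logits.__getitem__))
    return tokens


# Hypothetical 3-dim latent and 4-token vocabulary.
latent = [0.5, -0.2, 0.1]
unembed = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.4, 0.4, 0.4],
]

greedy = sample_with_latent_noise(latent, unembed, noise_scale=0.0, batch_size=4)
diverse = sample_with_latent_noise(latent, unembed, noise_scale=1.0, batch_size=8)
```

With `noise_scale=0.0` every copy collapses to the same greedy token; raising it spreads the batch across candidates without touching the shared state.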
Normal top@k requires a different KV cache for each branch the LLM takes, so memory explodes.
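Rough arithmetic for why per-branch caches blow up. This is a back-of-envelope sketch: the model shape below is made up, the helper names are hypothetical, and it assumes the worst case where branches share no cached prefix.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes for one sequence's KV cache (keys + values, fp16 by default)."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


def branching_cache_bytes(branches, **cfg):
    """Worst case: every branch keeps its own full KV cache."""
    return branches * kv_cache_bytes(**cfg)


# Hypothetical model shape, chosen only to make the numbers round.
cfg = dict(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=8192)

one_branch = kv_cache_bytes(**cfg)               # 1 GiB
sixteen = branching_cache_bytes(16, **cfg)       # 16 GiB
```

A shared cache stays at the `one_branch` figure no matter how many candidates are batched per token.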
Hm, does this unify AR and diffusion models?
The answer appears to be yes, and there are also a lot of different options for how to do it.
Adds a 3-layer GELU MLP to predict the residual latents between each token in each SGD batch. Loss function is L1 between the latents, plus KL on the predicted token distribution.
Next-token prediction is myopic. What if transformers learn to predict their own next latent state? 🌠 We present Next-Latent Prediction (NextLat): a self-supervised learning method that teaches transformers to form compact world models for reasoning and planning. It also unlocks up to 3.3x faster inference via self-speculative decoding! 🚀
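A minimal sketch of the loss as summarized in the comment above: a 3-layer GELU MLP predicts a residual on the current latent, scored with L1 against the real next latent plus KL between the token distributions each latent implies. This is a reading of that one-line summary, not the paper's implementation. The residual wiring, the shared `unembed` projection, `kl_weight`, and all function names are assumptions.

```python
import math


def gelu(x):
    """Exact GELU via the Gaussian CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(weights, bias, vec):
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]


def mlp3(params, h):
    """Three linear layers with GELU in between; output is added to h as a residual."""
    (w1, b1), (w2, b2), (w3, b3) = params
    z = [gelu(v) for v in linear(w1, b1, h)]
    z = [gelu(v) for v in linear(w2, b2, z)]
    delta = linear(w3, b3, z)
    return [a + d for a, d in zip(h, delta)]


def l1_loss(pred, target):
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_div(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def next_latent_loss(params, unembed, h_t, h_next, kl_weight=1.0):
    """L1 between predicted and true next latent, plus KL on implied token distributions."""
    pred = mlp3(params, h_t)
    no_bias = [0.0] * len(unembed)
    p_true = softmax(linear(unembed, no_bias, h_next))
    p_pred = softmax(linear(unembed, no_bias, pred))
    return l1_loss(pred, h_next) + kl_weight * kl_div(p_true, p_pred)


# Zero-initialized MLP: the prediction equals the current latent.
zero = ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
params = (zero, zero, zero)
unembed = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical 3-token vocabulary

same = next_latent_loss(params, unembed, [0.3, -0.1], [0.3, -0.1])
moved = next_latent_loss(params, unembed, [0.3, -0.1], [0.9, 0.4])
```

With the zero MLP the loss is zero only when the latent doesn't change between tokens, which is the gap training would have to close.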
catid retweeted
5/ On MIKASA-Robo, success rate jumps 0.42→0.84, and held-out tasks with shared memory structure go 0.07→0.23. On LIBERO it holds at 96.2% - recurrence doesn't hurt when memory isn't needed.
Jun 15
Seems to be about 10x cheaper to do it this way than with the API
Jun 15
I miss Fable. For a few days I was getting so much more done :(
catid retweeted
Jun 13
Intelligence should be open, accessible, and ready to build with, empowering every developer, everywhere. GLM-5.2 is now available to all GLM Coding Plan users, including Lite, Pro, Max, and Team plans. docs.z.ai/devpack/latest-mod… As our new flagship model, GLM-5.2 delivers powerful coding capabilities, usable 1M-context support, and continued strengths in long-horizon tasks. API and Chatbot services will launch next week. The model will also be officially open-sourced next week under the MIT License. The future of AI is open, and it belongs to the people.
Jun 13
Who could have seen this coming oh no
Replying to @AnthropicAI
The state of things:
Jun 13
All models can be jailbroken
catid retweeted
The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: anthropic.com/news/fable-myt…