PhD Student at Stanford working with Tatsu Hashimoto and Jure Leskovec. Previously MIT, Oxford, CERN

Joined March 2012
Marcel Rød retweeted
Excited to release a new paper today: “End-to-End Test-Time Training for Long Context”. Our method, TTT-E2E, enables models to continue learning at test time via next-token prediction on the given context, compressing that context into model weights. For our main result, we extend 3B-parameter models from 8K to 128K tokens of context. TTT-E2E scales with context length like full attention without maintaining keys and values for every token in the sequence. With linear complexity, TTT-E2E is 2.7x faster than full attention at 128K tokens while achieving better performance. Paper: test-time-training.github.io… Code: github.com/test-time-trainin…
Marcel Rød retweeted
Our new paper, “End-to-End Test-Time Training for Long Context,” is a step towards continual learning in language models. We introduce a new method that blurs the boundary between training and inference. At test time, our model continues learning from the given context using the same next-token prediction objective as training. With this end-to-end objective, our model can efficiently compress substantial context into its weights and still use it effectively, unlocking extremely long context windows for complex reasoning and applications in agents and robotics. Paper: test-time-training.github.io… Code: github.com/test-time-trainin…