A few interesting challenges in extending context windows.
A model with a big prompt =/= "infinite context" in my mind. 10M tokens of context is not exactly on the path to infinite context.
Instead, infinite context requires a streaming model that has
- an efficient state with fast incremental state updates
- strong compression & abstraction of multimodal history
- effective reasoning over state to generate future actions
(The model needs to be able to run forever.)
I dislike the phrase "infinite context" as a matter of taste, since it suggests an extension of the retrieval-based paradigm where you reason over context windows in order to answer a query. That's not quite what's going on.
Instead, memory is about building useful abstractions over histories. It's nice to think of it as a state with a choice of method for storing/updating information -- this is more in line with a model that always runs and streams in information. Transformers / SSMs are just different ways to choose how to model the state and update it. There are more abstractions left to uncover.
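To make the "state plus update rule" framing concrete, here is a rough toy sketch (not how any real model is implemented). Everything in it is an assumption made for illustration: the class names `AppendOnlyState` and `RecurrentState`, the `stream` helper, and the fixed `decay` constant are all hypothetical. The first class loosely mimics a Transformer-style cache that keeps every input; the second loosely mimics an SSM-style fixed-size state that folds each input into a lossy summary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AppendOnlyState:
    """Transformer-like choice (toy): keep every past input, as a KV cache would."""
    history: List[List[float]] = field(default_factory=list)

    def update(self, x: List[float]) -> None:
        # State grows by one entry per step, so memory is O(stream length).
        self.history.append(list(x))

    def size(self) -> int:
        return sum(len(h) for h in self.history)


@dataclass
class RecurrentState:
    """SSM-like choice (toy): fold each input into a fixed-size vector."""
    dim: int
    decay: float = 0.9  # hypothetical constant; real models learn their dynamics
    h: List[float] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        self.h = [0.0] * self.dim

    def update(self, x: List[float]) -> None:
        # h <- decay * h + (1 - decay) * x : a lossy, constant-memory summary.
        self.h = [self.decay * hi + (1.0 - self.decay) * xi
                  for hi, xi in zip(self.h, x)]

    def size(self) -> int:
        return len(self.h)


def stream(state, inputs):
    """Feed inputs one at a time; the state is all that persists between steps."""
    for x in inputs:
        state.update(x)
    return state
```

Streaming 1,000 two-dimensional inputs through each, the append-only state ends up holding 2,000 floats while the recurrent state still holds 2. The first remembers everything but cannot run forever; the second can run forever but has to decide what to compress away.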
There are also new data regimes and training algorithms required to produce these kinds of long-term memory models. Interaction is a core part of the way these models operate.
We don't seem to have the right evals: ones that measure useful proxies in line with this notion of memory. Ideally, it's not just about being able to remember facts -- the model should get better at retaining skills (compound knowledge) and at learning new skills over time (learning to learn faster).
This is a very interesting direction that I'm personally super excited to be working on.
Sam Altman: 10m context window in months, infinite context within several years