Been heavily testing MCP with local and proprietary models, and I'm worried about memory complexity.
As long as the KV cache's memory footprint keeps larger context windows out of reach, even the simplest MCP use cases saturate the context window before I can do anything (this also applies to low-context providers such as, ironically, Anthropic).
Either the MCP servers are dumping too much context into my model, which is highly possible, or there's no way anyone can compete with Google's models; the technology is literally unusable if the context window isn't huge.
If this doesn't change, I don't see how Transformer-based models can be of any use when run locally.
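
For a rough sense of scale on the cache side, here's a back-of-the-envelope sketch. It assumes a standard dense-attention KV cache, where every token stores one key and one value vector per layer per KV head, so the size grows linearly with context length. The model shape below (32 layers, 8 KV heads, head dimension 128, fp16) is a made-up illustrative config, not a measurement of any specific model, and `kv_cache_bytes` is just a helper name I picked.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Approximate KV-cache size in bytes for a single sequence.

    Assumes a plain dense-attention cache: one key vector and one value
    vector per token, per layer, per KV head. Ignores weights, activations,
    and any cache compression or quantization.
    """
    # Factor of 2 covers keys plus values.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len


# Hypothetical model shape, chosen only for illustration (fp16 = 2 bytes).
ASSUMED_SHAPE = dict(n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2)

for seq_len in (8_192, 32_768, 131_072):
    gib = kv_cache_bytes(seq_len, **ASSUMED_SHAPE) / 2**30
    print(f"{seq_len:>7} tokens -> {gib:.1f} GiB of KV cache")
```

With those assumed numbers that works out to about 128 KiB per token, so 8K tokens costs roughly 1 GiB and 128K tokens roughly 16 GiB, on top of the model weights. Plug in your own model's layer count, KV heads, and head dimension to see where your hardware tops out.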