Our LTM (Long-Term Memory) mechanism needs >1,000x less compute and memory than Llama 3.1 405B’s attention. Llama 3.1 405B would need 638 H100s *per user* to store a 100M-token KV cache. LTM needs a small fraction of a single H100.
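As a rough sanity check on the KV cache figure, here is a back-of-the-envelope sketch. The inputs are assumptions, not numbers stated in this post: the published Llama 3.1 405B shape (126 layers, 8 KV heads, head dimension 128), 2 bytes per cached value (bf16), and a full 80 GB usable per H100. The function name `kv_cache_bytes` is hypothetical.

```python
def kv_cache_bytes(tokens, n_layers=126, n_kv_heads=8, head_dim=128, bytes_per_value=2):
    """Bytes needed to cache keys and values for `tokens` tokens.

    Defaults are assumed from the public Llama 3.1 405B config with bf16 values.
    """
    # Factor of 2: one key vector and one value vector per layer, per KV head.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return tokens * per_token


H100_BYTES = 80e9  # assumption: all 80 GB of HBM is usable for the cache

total = kv_cache_bytes(100_000_000)  # 100M-token context
gpus = total / H100_BYTES

print(f"{total / 1e12:.1f} TB of KV cache, about {gpus:.0f} H100s")
# prints "51.6 TB of KV cache, about 645 H100s"
```

Under these assumptions the estimate lands in the same range as the 638 quoted above. The exact count depends on the precision and usable memory per GPU, which the post does not spell out.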
SSMs, RNNs, and RAG all exploit weaknesses in evals like Needle In A Haystack, so we made a new eval, HashHop:
1) Incompressible
2) Multi-hop
3) No semantic hints
4) No recency bias
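
To make the four properties concrete, here is a minimal sketch of how a HashHop-style prompt could be generated. It is an illustration under stated assumptions, not the actual eval: the `a = b` line format, the parameter names, and the helpers `make_hashhop_example` and `follow_hops` are all hypothetical.

```python
import random
import string


def make_hashhop_example(n_chains=4, hops=3, hash_len=8, seed=0):
    """Build a shuffled context of hash pairs plus one multi-hop query."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits

    # Random strings: incompressible and free of semantic hints.
    seen, hashes = set(), []
    while len(hashes) < n_chains * (hops + 1):
        h = "".join(rng.choice(alphabet) for _ in range(hash_len))
        if h not in seen:
            seen.add(h)
            hashes.append(h)

    # Split into chains h0 -> h1 -> ... -> h_hops (multi-hop).
    size = hops + 1
    chains = [hashes[i * size:(i + 1) * size] for i in range(n_chains)]
    pairs = [(c[j], c[j + 1]) for c in chains for j in range(hops)]

    # Shuffle so position in the context carries no signal (no recency bias).
    rng.shuffle(pairs)

    target = rng.choice(chains)
    context = "\n".join(f"{a} = {b}" for a, b in pairs)
    return context, target[0], target[-1]


def follow_hops(context, start, hops):
    """Reference solver: follow `hops` links starting from `start`."""
    mapping = dict(line.split(" = ") for line in context.splitlines())
    current = start
    for _ in range(hops):
        current = mapping[current]
    return current


context, query, answer = make_hashhop_example()
assert follow_hops(context, query, hops=3) == answer
```

A model under test would see `context` and `query` and be scored on whether it produces `answer`. Because each link is a random string stored at a random position, the only way to answer is to retrieve every hop from the context.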