🚀 Excited to share our latest paper: L²M: Mutual Information Scaling Law for Long-Context Language Modeling
📌 We identified a new scaling law that emerges as a general pattern in long texts, which leads to important insights for designing more powerful and efficient large language models
🔍 Key Highlights:
· 📈 Discovered a fundamental pattern in how information is shared across text at different scales, showing power-law growth as texts get longer
· 🧠 Proved that LLMs need growing memory capacity to effectively handle longer documents—similar to how humans need to remember more when reading a book versus a sentence
· 🔮 Our findings show that promises of "infinite context length" face natural limits—how well models handle longer texts depends on how their memory capacity scales
· ⚙️ Tested these findings across different model architectures, confirming the link between memory scaling and long-text performance
· 💡 Offers practical insights for building better AI models that can work with everything from short messages to entire books
📝 Read our paper: arxiv.org/abs/2503.04725
🤗 See our paper on Huggingface: huggingface.co/papers/2503.0…
💻 Explore our code: github.com/LSquaredM/mutual_…
With Zhuo Chen, Oriol Mayné i Comas, Zhuotao Jin, Di Luo, Marin Soljačić
#MachineLearning #AI #LanguageModels #LLMs #LongContextModeling #InformationTheory
🚀 Thrilled to share that QuanTA, our innovative approach to highly parameter-efficient high-rank fine-tuning of LLMs, has been accepted to NeurIPS 2024! Looking forward to connecting on December 13!
📄 arxiv.org/abs/2406.00132
💻 github.com/quanta-fine-tunin…
We are excited to present our work on "TENG: Time-Evolving Natural Gradient for Solving PDEs with Deep Neural Nets toward Machine Precision" at #ICML2024! Join us at Hall C 4-9, Poster #210, on July 24th, 11:30 am-1 pm CEST and virtually at Zoom: mit.zoom.us/j/97552446636.
SciCode is our new benchmark that challenges LMs to code solutions for scientific problems from advanced papers. The challenges were crafted by PhDs;
~10% of our benchmark is based on Nobel-winning research.
GPT-4 and Sonnet 3.5 get <5% accuracy.
scicode-bench.github.io/ 🧵 1/6
By leveraging a structure inspired by quantum circuits, our QuanTA approach surpasses the limitations of Low-Rank Adaptation (LoRA), enabling a novel, easy-to-implement, efficient high-rank fine-tuning method for large language models.