Huge congrats to @AIatMeta on the Llama 3.1 release!
A few notes:
Today's 405B model release marks the first time that a frontier-capability LLM is available to everyone to work with and build on. The model appears to be GPT-4 / Claude 3.5 Sonnet grade, and the weights are open and permissively licensed, including for commercial use, synthetic data generation, distillation and finetuning. This is an actual, open, frontier-capability LLM release from Meta. The release includes a lot more, e.g. a 92-page PDF with a lot of detail about the model:
ai.meta.com/research/publica…
The philosophy underlying this release is in this longread from Zuck, well worth reading as it nicely covers all the major points and arguments in favor of the open AI ecosystem worldview:
"Open Source AI is the Path Forward"
facebook.com/4/posts/1011571…
I like to say that it is still very early days, that we are back in the ~1980s of computing all over again, and that LLMs are the next major computing paradigm. Meta is clearly positioning itself to be the open ecosystem leader of it.
- People will prompt and RAG the models.
- People will finetune the models.
- People will distill them into smaller expert models for narrow tasks and applications.
- People will study, benchmark, optimize.
Open ecosystems also self-organize in modular ways into products, apps, and services, where each party can contribute its own unique expertise. One example from this morning is @GroqInc, who built a new chip that runs LLM inference *really fast*. They've already integrated the Llama 3.1 models and appear to be able to run the 8B model ~instantly:
x.com/karpathy/status/181580…
I can't seem to try it yet due to server pressure, but the 405B running on Groq is probably the highest-capability, fastest LLM today (?).
Early model evaluations look good:
ai.meta.com/blog/meta-llama-… x.com/alexandr_wang/status/1…
Still pending is the "vibe check"; look out for that on X / r/LocalLlama over the next few days (hours?).
I expect the closed model players (which imo have a role in the ecosystem too) to give chase soon, and I'm looking forward to that.
There's a lot to like on the technical side too, w.r.t. multilingual support, context length, function calling, multimodality, etc. I'll post some technical notes a bit later, once I make it through all 92 pages of the paper :)