People are just starting to realize the power of local LLMs,
especially with the new Apple chips. It's a game-changer.
Let me show you:
Falcon 180B is a 180B-parameter LLM, about the size of GPT-3 (175B).
How fast would it run on your Mac?
Apple M3 Max: 3.5 tokens/sec
Compare that to 2x A100 (2 x $30k):
2x A100 80GB: 7 tokens/sec
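To get a feel for what these numbers mean in practice, here is a rough sketch that turns tokens/sec into waiting time for one reply. The 500-token reply length is my assumption for illustration; the speeds are the ones quoted above.

```python
# Throughput figures quoted in this thread (tokens per second).
SPEEDS_TOK_PER_SEC = {
    "Falcon 180B on Apple M3 Max": 3.5,
    "Falcon 180B on 2x A100 80GB": 7.0,
}

# Assumption for illustration only: a reply of 500 tokens.
REPLY_TOKENS = 500


def seconds_to_generate(num_tokens: int, tok_per_sec: float) -> float:
    """Return how long generating num_tokens takes at a steady rate."""
    if tok_per_sec <= 0:
        raise ValueError("tok_per_sec must be positive")
    return num_tokens / tok_per_sec


if __name__ == "__main__":
    for name, speed in SPEEDS_TOK_PER_SEC.items():
        secs = seconds_to_generate(REPLY_TOKENS, speed)
        print(f"{name}: {secs:.0f} s for {REPLY_TOKENS} tokens")
```

This ignores prompt processing time and assumes a steady generation rate, so treat it as a back-of-the-envelope estimate. At these rates the Mac takes twice as long as the two A100s for the same reply.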
Now add to this the recent pace of @NousResearch & @teknium with models like OpenHermes,
and the fact that they will soon add vision functionality for sending images:
x.com/Teknium1/status/172661…
And we get real competition to commercial models from OpenAI and Anthropic.
More independent companies are starting to publish their model weights.
Take the 01.ai team from China. It is currently leading the OSS leaderboard in quality & context size.
The 01.ai team has a model with a 200k context window.
That's about 7 copies of Shakespeare's Hamlet fitting into a single prompt.
GPT-4 Turbo has only 128k.
7B models like Mistral are already running at 40 tok/s!
This is insanely fast! I ran a test on my old M1; the GIF below gives you a feel for the speed.
Absolutely stunning work is being done by @ggerganov. Georgi & the llama.cpp community have shaped the progress of open-source models with the llama.cpp project. It lets you run a wide range of open-weight LLMs on your own computer. You can even run Llama on iOS with it.
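If you want to check tok/s numbers like these on your own machine, here is a minimal sketch of how I'd measure it. It times any streaming generator and counts the chunks it yields. `measure_tok_per_sec` and `fake_generate` are hypothetical names of mine, and the fake generator is just a stand-in so the snippet runs without a model; counting one chunk as one token is also an assumption.

```python
import time
from typing import Callable, Iterable


def measure_tok_per_sec(generate: Callable[[str], Iterable[str]], prompt: str) -> float:
    """Consume a streaming generator and return yielded chunks per second."""
    start = time.perf_counter()
    count = 0
    for _ in generate(prompt):
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very coarse clocks.
    return count / max(elapsed, 1e-9)


def fake_generate(prompt: str) -> Iterable[str]:
    """Hypothetical stand-in for a real model: yields one word per 'token'."""
    for word in prompt.split():
        time.sleep(0.001)  # pretend each token takes about 1 ms
        yield word


if __name__ == "__main__":
    rate = measure_tok_per_sec(fake_generate, "the quick brown fox jumps")
    print(f"{rate:.1f} tok/s (fake model)")
```

To measure a real model, swap `fake_generate` for a function that streams from your runtime of choice. With the llama-cpp-python bindings, for example, calling a `Llama` object with `stream=True` returns an iterator of chunks, though how closely one chunk maps to one token is something to verify for your setup.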
I am incredibly bullish on Apple Metal & local compute.
There are many interesting projects rising around it. One that comes to mind is
@opentensor, which treats computing resources as a decentralized commodity, like selling your GPU time to someone.
And I haven't even started on the power of WebGPU & Web LLM.
What other interesting stuff have you seen? Share in the comments.
I should have multimodal vision Hermes next week if all goes well