TLMs: Tiny LLMs and Agents on Edge Devices with @cormacb youtube.com/watch?v=-TiET_K-…
Function Gemma ships at 270 million parameters and prefills at nearly 2,000 tokens per second on a Pixel 7. Out of the box, it hits 46% accuracy on a fixed set of app intents. Fine-tune it on a synthetically generated dataset and accuracy clears 90% on eight of ten functions.
Cormac walks through the two paths developers have for on-device AI. First, a skill harness built on Gemma 4, with a restaurant roulette demo running fully on device. Then Eloquent, a production transcription app built by chaining two sub-billion-parameter models together.
cc @osanseviero
Check out this video on how to run Gemma 4 locally on an iPhone!
It runs completely offline and handles long context, meaning no data plan, no API calls, and no monthly fees required.
Our first successful Gemma 4 Runtime in London with @swyx @patloeber @nick_kango @cormacb and others! 💎 Great to go out for a run and talk about Gemma, agents, evals and more
Here's a ridiculous result from the @OpenAI GPT-2 paper (Table 13) that might get buried --- the model makes up an entire, coherent news article about TALKING UNICORNS, given only 2 sentences of context.
WHAT??!!
We are hiring in Seattle to join the Vertex.ai crew that recently joined us through acquisition. Great opportunity to join a top class team working on one of the new frontiers of CS/ML. lnkd.in/gEaDYxZ
My colleague Raghu has just released "Quantizing deep convolutional networks for efficient inference" - arxiv.org/abs/1806.08342v1
This white paper covers practical quantization approaches for most common CNNs in @TensorFlow, I think it will be super useful!
It is starting to look like deep learning workflows of the future feature autotuned architectures running with autotuned compute schedules across arbitrary backends. I don't know if I should be excited or scared.
I'm on the steering committee of SysML, a new conference that targets work at the intersection of computer systems and machine learning. The inaugural event will be at Stanford, Feb. 15-16, 2018, and submissions are due January 5th. Consider submitting if you have relevant work! x.com/edchi/status/942160617…
SysML is a new conference on systems and machine learning. Non-archival this year. The inaugural conference will take place Feb 15-16, 2018 in Stanford, CA. Consider submitting your work! sysml.cc/#submission #sysml2018
Elon Musk believes we'll have AGI (systems better than humans at *everything*) in 7 years. Guess we should stop partying and get back to work. #NIPS2017