Joined August 2012
505 Photos and videos
Pinned Tweet
during my time learning about contrastive/self-supervised learning, it always felt mythical how exactly it works and what mechanisms it introduces. I created these two blog posts to explain what I learned over the past few years and to simplify the concepts. links below
imo deepmind should ipo too lol
Taha ⵣ retweeted
For the past years my research focus was on unifying models and training paradigms across modalities. Today I'm excited that we're releasing our latest model aligned with this theme: Gemma 4 12B, a dense encoder-free model which processes raw text, image, and audio inputs! 1/
Taha ⵣ retweeted
Meet Gemma 4 12B! A unified, encoder-free multimodal model designed to bring high-performance intelligence directly to your laptop, and released under an Apache 2.0 license. Bridging the gap between edge efficiency and advanced reasoning. Here is what’s new with Gemma 4 12B: 👇
“make the latent space better” is the vaguest advice in ml. but there’s a precise answer, and it predates deep learning by decades. a good latent space is the solution to a sphere-packing problem. the optimum has a name. new post and a runnable jax companion 🧵
Lloyd R. Welch (1974). Lower Bounds on the Maximum Cross Correlation of Signals. IEEE Transactions on Information Theory, 20(3), 397–399.
John J. Benedetto and Matthew Fickus (2003). Finite Normalized Tight Frames. Advances in Computational Mathematics, 18(2–4), 357–385.
Vardan Papyan, X. Y. Han, David L. Donoho (2020). Prevalence of Neural Collapse During the Terminal Phase of Deep Learning Training. Proceedings of the National Academy of Sciences (PNAS), 117(40), 24652–24663.
Tongzhou Wang and Phillip Isola (2020). Understanding Contrastive Representation Learning Through Alignment and Uniformity on the Hypersphere. ICML 2020 (PMLR 119), 9929–9939. arXiv:2005.10242.
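A minimal sketch, not taken from the post or its JAX companion: assuming the "optimum" in the thread refers to the Welch bound and the simplex equiangular tight frame discussed in the references above, the plain-Python snippet below computes the bound and checks that a simplex frame meets it. The function names `welch_bound`, `simplex_etf`, and `coherence` are hypothetical, chosen only for illustration.

```python
import math
from itertools import combinations


def welch_bound(n, d):
    """Welch lower bound on the largest |cosine| among n unit vectors in R^d.

    For n <= d the vectors can be mutually orthogonal, so the bound is 0.
    """
    if n <= d:
        return 0.0
    return math.sqrt((n - d) / (d * (n - 1)))


def simplex_etf(n):
    """Return n unit vectors in R^n whose pairwise cosine is -1/(n-1).

    Each vector is a centered one-hot vector, rescaled to unit norm.
    The vectors sum to zero, so they span only n-1 dimensions. Requires n >= 2.
    """
    scale = math.sqrt(n / (n - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / n) for j in range(n)]
        for i in range(n)
    ]


def coherence(vectors):
    """Largest absolute inner product over all distinct pairs of vectors."""
    return max(
        abs(sum(a * b for a, b in zip(u, v)))
        for u, v in combinations(vectors, 2)
    )


if __name__ == "__main__":
    n = 4  # illustrative number of classes / points (an assumption)
    frame = simplex_etf(n)
    # The frame lives in an (n-1)-dimensional subspace, so compare with d = n-1.
    print(coherence(frame), welch_bound(n, n - 1))
```

With n = 4, both printed values are approximately 1/3, which is what it means for the simplex configuration to attain the bound in n-1 dimensions.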
my fav so far was SigLIP by the legendary team of @giffmana
Taha ⵣ retweeted
Publish your benchmark on Kaggle!
- We take care of infra & running new models
- No cost to you
- We support private holdout sets
We're investing a LOT so expect to see major improvements, integrations, features, etc. in coming weeks & months!
May 29
Replying to @jerryjliu0
We also automatically update the ParseBench leaderboard on Kaggle :) kaggle.com/benchmarks/llamai…
lol if apple designed it
Replying to @Polymarket
Guys... you will not believe this...
a 4h buildthon not for weak vibe coders though
this saturday we are going to host our first virtual vibe coding buildthon in our long-forgotten @mlnomads, where you will need to build, ship, and demo your work in 4h :D places are limited. if you are interested, ping me to register
vibe coding at the speed of a nascar race
can you vibe code an agentic racing coach in 1 day w @antigravity? that's what we stress tested yesterday on Sonoma raceway...aaaaaaand. we didn't crash any car!!!!!