Google Gemma

Tweets

Youssef KH retweeted

Google Gemma

@googlegemma

13h

Want to teach Gemma to master chess? Check out this awesome community project showing how to fine-tune Gemma 4 12B on your own data, 100% locally! Running text, images, and audio on just 8GB VRAM makes custom models more accessible than ever.

0:12

152

1,813

101,770

Youssef KH retweeted

Alok

@analogalok

Jun 14

This is the most hilarious thing I saw and did today.

Ran gemma-4-12B-coder-fable5-composer2.5-v1-GGUF locally with 8 GB VRAM at 20 tok/sec.

Anthropic's Claude Fable 5 launched June 9. By June 12 it was banned. I can't access it. You can't either.

But here's the twist: I'm running a model trained on its chain of thought at 20 tok/s on my RTX 4060 8GB. Locally. Offline. No cloud. No export control.

Enter: Gemma4-12B-Coder GGUF (Q4_K_M)
Base: Google's gemma-4-12B-it
Fine-tuned on verifiable Python CoT data:
- Primary: Composer 2.5 real reasoning traces (only passing solutions kept)
- Auxiliary: Fable 5 used to redo the hard cases Composer missed.

Every training example's reasoning led to code that actually ran. No hallucinated logic.

Llama.cpp flags: -m gemma4-coding-Q4_K_M.gguf -cnv -ngl 44 -c 64000 -v
(huggingface model link in comments)

Flag breakdown:
-ngl 44 → offload 44 layers to GPU (tune this for your VRAM)
-c 64000 → 64K context window
-cnv → conversation/chat mode
-v → verbose output

The irony writes itself. Anthropic spent weeks telling the world Fable 5 (mythos) is too powerful to release. Then released it. Then got banned from serving it, including their own researchers.

Meanwhile: a Gemma 4 12B fine tune, trained on Fable 5's reasoning, runs fully offline on my mid range consumer GPU.

No API. No cloud. Just me and llama.cpp. This is why local AI matters.

Check out the model's link in the comments. How's your experience been with this model?
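The flags quoted in the tweet can be assembled programmatically, which makes it easier to tune the GPU layer count per machine. The following is a minimal sketch, not a tested launch script: the binary name `llama-cli` and the helper `build_llama_args` are assumptions (the tweet only says "Llama.cpp flags"), while the flags and the model filename are taken from the tweet as written.

```python
import shlex


def build_llama_args(model_path, gpu_layers=44, context=64000, binary="llama-cli"):
    """Build an argument list from the flags quoted in the tweet.

    `binary` is an assumed executable name; the tweet does not name one.
    """
    return [
        binary,
        "-m", model_path,         # path to the GGUF model file
        "-cnv",                   # conversation/chat mode
        "-ngl", str(gpu_layers),  # layers to offload to the GPU (tune for your VRAM)
        "-c", str(context),       # context window size in tokens
        "-v",                     # verbose output
    ]


args = build_llama_args("gemma4-coding-Q4_K_M.gguf")
command = shlex.join(args)  # shell-safe string form of the same arguments
print(command)
```

Passing `args` to `subprocess.run` would launch the process; lowering `gpu_layers` is the knob the tweet suggests adjusting when VRAM is tight.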

0:16

Hugging Models

@HuggingModels

Jun 14

Gemma 4 12B Coder is here and it's a game changer for local code generation. This GGUF model packs Google's latest gemma-4 architecture into a compact 12B size, perfect for running on consumer hardware. It's optimized for reasoning and thinking, making it ideal for developers who want fast, private coding assistance without the cloud.

169

1,957

282,715

Youssef KH retweeted

Sharbel

@sharbel

Jun 6

Someone built an AI agent that searches Reddit, X, YouTube, HN, TikTok, Polymarket, and the web in parallel. Scores everything by real upvotes, real likes, and real money. Synthesizes it into one brief. In seconds.

It's called /last30days. 28,700 stars on GitHub.

You type one command. The agent fans out across every platform at once. Reddit threads. X posts. YouTube transcripts. Polymarket odds backed by actual money. HN comments. GitHub commits. It scores each source by what real people engaged with. An AI judge synthesizes the whole thing into one grounded summary of the last 30 days.

Here's what it does:
→ Searches Reddit for top upvoted threads and comments on any topic, person, or company.
→ Pulls X posts and scores them by likes and recency. Not algorithmic feed. Raw signal.
→ Transcribes and searches YouTube videos. Finds what was actually said, not just the title.
→ Reads TikTok engagement. Surfaces what creators and communities are actually talking about.
→ Queries Polymarket odds. Real money bet by real people on what happens next.
→ Searches Hacker News. The technical community's unfiltered take.
→ Searches GitHub commits and PRs. What someone is actually shipping right now.
→ Runs all sources in parallel. Scores them against each other by engagement weight.
→ AI agent judge synthesizes everything into one brief. No raw dump. A grounded summary.
→ Zero config to start. Reddit, HN, Polymarket, and GitHub work immediately.
→ One setup wizard unlocks X, YouTube, TikTok, and more in 30 seconds.
→ Installs into Claude Code, Codex, Cursor, Copilot, Gemini CLI, and 50 agent hosts.

Here's the wildest part: Google doesn't touch Reddit comments or X posts. ChatGPT has a Reddit deal but can't search X or TikTok. Gemini has YouTube but not Reddit. Claude has none of them natively. Every platform is a walled garden with its own API, its own tokens, its own auth. No single AI has access to all of it. Until you bring your own keys and bridge them with an agent.

That's the unlock. Not one better search engine. A dozen disconnected platforms, scored against each other by what real people actually engaged with and bet real money on. Google aggregates editors. /last30days searches people.

Perplexity Pro: $20/month. $240/year.
ChatGPT Plus: $20/month. $240/year.
You(dot)com Pro: $15/month. $180/year.
/last30days: $0.

Unlimited queries. Unlimited topics. Your API keys. Your agent. Forever. 28,700 stars. 2,431 forks. MIT licensed. Self-hosted. Open protocol. Free forever. 100% Open Source.

Github repo: github.com/mvanhorn/last30da…
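The "fan out in parallel, then score by engagement weight" idea described in the tweet can be illustrated roughly as follows. This is a hypothetical sketch only: it is not the /last30days code, the fetcher functions return canned data instead of calling any platform API, and the `WEIGHTS` table is an invented placeholder for whatever weighting the real project uses.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for per-platform searches; real ones would call APIs.
def fetch_reddit(topic):
    return [{"source": "reddit", "title": f"{topic} thread", "engagement": 1200}]


def fetch_hn(topic):
    return [{"source": "hn", "title": f"{topic} discussion", "engagement": 900}]


# Invented per-source weights, purely for illustration.
WEIGHTS = {"reddit": 1.0, "hn": 1.5}


def score(item):
    """Weight raw engagement by the source it came from."""
    return item["engagement"] * WEIGHTS[item["source"]]


def gather(topic, fetchers):
    """Run every fetcher concurrently, then rank all results by weighted score."""
    with ThreadPoolExecutor(max_workers=len(fetchers)) as pool:
        batches = list(pool.map(lambda fetch: fetch(topic), fetchers))
    items = [item for batch in batches for item in batch]
    return sorted(items, key=score, reverse=True)


ranked = gather("gemma", [fetch_reddit, fetch_hn])
for item in ranked:
    print(item["source"], score(item))
```

Threads suit this shape of work because each search is I/O-bound; the ranked list would then be handed to whatever summarization step follows.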

246

2,279

151,270

Youssef KH retweeted

Nex

@NexEcosystem

Jun 14

The Rio 3.5 model broke the internet this week. The plot twist? It’s essentially our open-source model, Nex N2 Pro, wearing a different hat. 🤯

We analyzed the weights, and the recipe is exact:
Rio 3.5 ≈ 0.6 * Nex N2 Pro + 0.4 * Qwen 3.5

It even literally introduces itself as "Nex N2 Pro" if you ask it without initial system prompt! 😂

We are flattered that the City of Rio used our work to achieve SOTA performance. Thanks for the ultimate benchmark validation. 🤝

But in the open-source world, attribution matters. 👇 Full mathematical proof & verify script in the first reply!
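The recipe claimed in the tweet is a linear interpolation of two sets of weights. A minimal sketch of how such a mixing coefficient could be recovered by least squares is shown below, using toy lists rather than real checkpoints. It is not the verify script the tweet refers to; the function names are hypothetical, and it assumes both parent models and the merged model have identically shaped parameters.

```python
def merge(a, b, alpha):
    """Linear interpolation of two weight vectors: alpha*a + (1 - alpha)*b."""
    return [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]


def estimate_alpha(merged, a, b):
    """Least-squares estimate of alpha in merged ≈ alpha*a + (1 - alpha)*b.

    Rearranged, merged - b ≈ alpha * (a - b), so alpha is the projection
    of (merged - b) onto (a - b).
    """
    num = sum((m - y) * (x - y) for m, x, y in zip(merged, a, b))
    den = sum((x - y) ** 2 for x, y in zip(a, b))
    if den == 0:
        raise ValueError("parent weights are identical; alpha is undefined")
    return num / den


# Toy stand-ins for flattened parameter tensors.
parent_a = [1.0, 2.0, -3.0, 0.5]
parent_b = [0.0, -1.0, 4.0, 2.5]

merged = merge(parent_a, parent_b, 0.6)
alpha = estimate_alpha(merged, parent_a, parent_b)
print(round(alpha, 6))
```

On real checkpoints the same estimate would be computed per tensor; a consistent coefficient across layers with a small residual is the kind of evidence a merge claim rests on.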

221

529

5,363

862,130

Youssef KH retweeted

Tech with Mak

@techNmak

Jun 14

A dev got so frustrated watching his AI agent write 500 lines for a 5-line problem that he built a fix. He called it Ponytail. Named after the guy every team has - long ponytail, oval glasses, been there longer than the version control. You show him fifty lines; he looks at them, says nothing, and replaces them with one. Now your agent does the same. Before writing anything, it looks for a reason not to. 80-94% less code. 47-77% cheaper. 3-6x faster. The best code is the code you never wrote. GitHub Repo: github.com/DietrichGebert/po…

203

822

16,131

1,067,464

Youssef KH retweeted

Nebius Token Factory

@nebiustf

Jun 13

Not your weights, not your model.

421

60,442

Youssef KH @ucefkh

Jun 14

What’s the state of blockchain these days? Can’t get testnet tokens from invalid for a week now! Faucets of @base and @0xPolygon not working, what a shameful thing! You don’t even test these? What have we come to!? & @thirdweb want me to pay to get testnet tokens, shame on you!

Youssef KH @ucefkh

Jun 14

It had to happen anytime soon. This is a breakthrough, and more to come, us too.

starmex

@starmexxx

Jun 13

AMD CEO LISA SU HELD A MINI PC ON STAGE THAT RUNS A 235B MODEL AND REPLACES YOUR $440/MONTH AI STACK

amd's ryzen ai max 395 is the first x86 chip that runs a 200 billion parameter model on one piece of silicon. cpu and gpu share 128gb of unified memory, no separate graphics card needed

the gmktec evo-x2 runs qwen3 235b fully, deepseek v3 comfortably and llama 3.3 70b with headroom. on linux you get 110gb of usable vram out of 128gb

amd claimed the chip beat an nvidia rtx 5080 by more than 3x on deepseek r1 inference. a lunchbox sized pc outrunning a $1,000 discrete gpu on a real ai workload

a heavy ai user pays $200 for claude code max, $200 for chatgpt pro, $20 for cursor and $20 for gemini. that's $5,280 a year and the box pays itself off in 9 to 10 months

install ollama, pull the model, point claude code at localhost. same interface, nothing leaves the machine, nothing costs per request

bookmark this and read the article below
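The subscription arithmetic in the tweet can be checked in a few lines. The sketch below uses only the figures quoted in the tweet; the tweet does not state the price of the box, so the "implied price" range is an inference from the 9 to 10 month payback claim, not a quoted number.

```python
# Monthly subscription prices as quoted in the tweet (USD).
subscriptions = {
    "claude code max": 200,
    "chatgpt pro": 200,
    "cursor": 20,
    "gemini": 20,
}

monthly_total = sum(subscriptions.values())
yearly_total = 12 * monthly_total

# The tweet claims payback in 9 to 10 months; the hardware price implied by
# that claim is simply months * monthly spend (an inference, not a quote).
implied_price_low = 9 * monthly_total
implied_price_high = 10 * monthly_total

print(monthly_total, yearly_total, implied_price_low, implied_price_high)
```

The totals match the $440/month and $5,280/year figures in the tweet, and the payback claim only holds if the machine costs roughly what nine or ten months of those subscriptions do.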

1:00

Youssef KH @ucefkh

Jun 14

I swear to god I have set up this architecture with multiple phones as a huge cluster

Google Research

@GoogleResearch

Jun 12

Today on the blog, we discuss a pathway for the second life of phones through the exploration of “phone cluster computing”, which can directly reduce the environmental footprint of computing by avoiding the need for further raw material extraction. More → goo.gle/4aJe5vO

ALT Animation of the construction of a server using smartphones.

Youssef KH @ucefkh

Jun 13

Good to have

Nous Research

@NousResearch

Jun 12

Hermes Agent now has a production-grade WhatsApp Business Cloud integration: use it as a private WhatsApp bot for yourself or your team, or configure it for customer-facing support. Connect an existing WhatsApp Business Cloud number or create one through Meta Business Manager, then run 'hermes whatsapp-cloud' to wire it into Hermes with guided setup, secure webhooks, media/voice support, read receipts, typing indicators, and interactive approval buttons.

0:11

Youssef KH @ucefkh

Jun 13

Brazil spent 20 years trying to fix their defense just to let Morocco slice them like butter and chip Alisson like he’s playing in a Sunday league. Five World Cup trophies but Marquinhos and Gabriel look like they’ve never met before today. 😂🇲🇦🇧🇷 #BRAvsMAR #WorldCup2026

344

Youssef KH retweeted

OpenRouter

@OpenRouter

Jun 13

Introducing the Fusion API, the smartest compound model in the market. Fusion achieves Fable-level intelligence at half the price. How it works 👇

684

1,743

14,699

5,847,614

Youssef KH @ucefkh

Jun 13

Here we go : an amazing experiment

0:50

Youssef KH @ucefkh

Jun 12

Tiny smart idea

Andrew McCalip

@andrewmccalip

Jun 11

Get paid to wait.

The Claude Code spinner might be the most watched line on Earth. So I turned it into an ad marketplace. Advertisers bid on it. You keep 50% of the money.

Install the extension → get cash from ads.

Introducing Kickbacks

0:24

Youssef KH retweeted

Google DeepMind

@GoogleDeepMind

Jun 11

We’re teaming up with @Palmeiras, the first football club to meaningfully build upon TacticAI: our AI system that can help simulate field scenarios and predict open play dynamics up to 8 seconds in advance. ⚽

114

372

3,281

974,427

Youssef KH retweeted

Spencer Baggins

@bigaiguy

Jun 8

A French engineer who lives quietly in Paris has spent 30 years writing software that the entire internet now runs on without knowing his name.

He wrote the code that streams every YouTube video, every Netflix show, every TikTok clip. He wrote the code that runs the virtual servers underneath AWS, Google Cloud, and Microsoft Azure. He calculated more digits of pi than anyone in history. He has no Twitter. He has no marketing. He just keeps shipping.

His name is Fabrice Bellard. Here is the story, because almost nobody outside the systems programming world knows what one man has built.

Fabrice was born in 1972 in Grenoble, France. He studied at École Polytechnique, the top French engineering school. He never went to Silicon Valley. He never built a startup empire. He just wrote code.

In 2000 he started a project called FFmpeg, an open-source multimedia framework for encoding, decoding, and streaming video. He was 28. The project did one thing nobody else had done well. It handled every video and audio format that existed, in one library, on every operating system. He led it himself for years.

Today FFmpeg is the invisible engine of the internet. YouTube uses it. Netflix uses it. VLC uses it. Chrome and Firefox use parts of it. Every Android phone, every iPhone, every smart TV, every video editing tool you have ever touched runs FFmpeg somewhere underneath. If you have watched a video on a screen in the last 20 years, Fabrice's code processed it.

He was not done. In 2003 he started QEMU, a machine emulator and virtualizer. He wrote it solo until version 0.7.1 in 2005. QEMU lets you run any operating system on any other operating system. It became the foundation of modern virtualization. KVM, the Linux kernel hypervisor, runs on top of QEMU. Every major cloud provider, AWS, Google Cloud, Microsoft Azure, IBM Cloud, runs virtual machines on infrastructure built around it. The Quick Emulator is the most cited piece of cloud infrastructure code on Earth.

He kept going. In 2001 he won the International Obfuscated C Code Contest with a small C compiler that grew into TCC, the Tiny C Compiler. TCC can compile and boot a Linux kernel from source in under 15 seconds. In 2004 he calculated the most digits of pi ever computed at the time, using a personal desktop computer and an algorithm he derived himself called Bellard's formula. In 2011 he wrote a complete PC emulator in pure JavaScript that runs Linux in your browser, a project called JSLinux that engineers still cannot believe is real. In 2019 he released QuickJS, a small but complete JavaScript engine that fits where V8 cannot. In 2021 he released NNCP, a neural network based lossless data compressor that immediately took the lead on the Large Text Compression Benchmark.

Then he turned his attention to large language models. He built TextSynth Server, a web server with a REST API for running LLMs locally. He released ts_zip and ts_sms, compression utilities that use language models to compress text and short messages at ratios traditional algorithms cannot reach. He released TSAC, a very low bitrate audio compression system. In December 2025 he released Micro QuickJS, a new JavaScript engine for microcontrollers, separate from QuickJS, designed for environments with almost no memory.

Fabrice co-founded a telecom company called Amarisoft in 2012, where he serves as CTO. Amarisoft builds 4G and 5G base station software used by carriers and labs around the world. He has been running it for over a decade while continuing to ship personal projects from his own home page at bellard dot org.

He has no Twitter. He has no Instagram. He gives almost no interviews. His personal website is a flat list of projects with no styling, no fonts, no marketing copy. Just titles and links.

A quiet French engineer who never moved to Silicon Valley wrote the code that quietly runs the internet. He is still shipping.
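For readers unfamiliar with the "Bellard's formula" named in the tweet, the sketch below evaluates the series as it is commonly published: an alternating sum whose terms shrink by a factor of 2^10 each step. The tweet does not give the formula or describe how it was used, so the exact form here comes from the standard statement of the series, and `bellard_pi` is a hypothetical helper name.

```python
import math
from fractions import Fraction


def bellard_pi(terms):
    """Partial sum of Bellard's series for pi, kept exact with rationals."""
    total = Fraction(0)
    for n in range(terms):
        sign = -1 if n % 2 else 1
        bracket = (
            -Fraction(32, 4 * n + 1)
            - Fraction(1, 4 * n + 3)
            + Fraction(256, 10 * n + 1)
            - Fraction(64, 10 * n + 3)
            - Fraction(4, 10 * n + 5)
            - Fraction(4, 10 * n + 7)
            + Fraction(1, 10 * n + 9)
        )
        # Each term is scaled by (-1)^n / 2^(10n).
        total += Fraction(sign, 2 ** (10 * n)) * bracket
    # The whole sum carries a leading factor of 1 / 2^6.
    return total / 64


approx = float(bellard_pi(6))
error = abs(approx - math.pi)
print(approx, error)
```

Because every term is a rational with power-of-two scaling, the series converges by about three decimal digits per term, which is why a handful of terms already reaches double precision.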

379

4,521

25,170

3,058,146

Youssef KH @ucefkh

Jun 7

Oh shit this is a wild week

Victor M

@victormustar

Jun 5

Before the week ends, let's acknowledge one of the most INSANE weeks ever for open AI, with 25 notable open-weight drops across every modality:

🧠 LLMs
→ NVIDIA Nemotron 3 Ultra: 550B hybrid Mamba-MoE, only 55B active, 1M context, MMLU 89.1. NVFP4 variant claims ~5x throughput on Blackwell. First openly-weighted 550B hybrid Mamba-Transformer, closing the gap with frontier closed models.
→ Google Gemma 4 12B: fully open dense any-to-any (text/image/audio/video), 256k context, encoder-free, 140 languages, AIME 2026 at 77.5. Shipped with a 23-checkpoint QAT wave (mobile ONNX MLX). Most deployable model of the week.
→ StepFun Step-3.7-Flash: 198B sparse MoE VLM, ~11B active, SWE-Bench PRO 56.3. Apache 2.0.
→ Liquid AI LFM2.5-8B-A1B: edge MoE, just 1.5B active, 128k ctx, MATH500 88.8, MLX-ready. Best on-device option this week.
→ JetBrains Mellum2-12B-A2.5B-Thinking: their first open MoE, near-Qwen3-14B coding at 2.5B active. Apache 2.0.

🎨 Image gen (the surprise of the week)
→ Ideogram 4: their FIRST-EVER open weights. 9.3B flow-matching DiT trained from scratch. #2 overall behind GPT Image 2, top open-weight model on Design Arena LMArena. Strongest open checkpoint for text-rich images, full stop. It has taste. Still can't believe this is open weights.

🔊 Audio & Speech (a breakout week for open TTS, 4 labs shipped)
→ Boson Higgs Audio v3 4B: 102 languages, 21 emotions, singing/whispering/shouting, sub-second TTFA.
→ RedNote dots.tts: the only fully continuous (no codec) open TTS pipeline, Apache 2.0.
→ Google Magenta RealTime 2: real-time music gen, <200ms latency, text audio MIDI. multimodalart ported it to PyTorch within hours with live ZeroGPU demos.
→ NVIDIA Nemotron-3.5 ASR: 600M streaming, 17x more concurrent streams vs Parakeet RNNT 1.1B.

👁️ Vision & VLMs
→ PaddleOCR-VL-1.6: SOTA document parsing at 1B params, Apache 2.0.
→ Baidu NAVA: 6.3B joint audio-video gen, best-in-class A/V sync, Apache 2.0.

🎬 Video, 3D & World Models
→ NVIDIA Cosmos3-Super: 64B omnimodal world model coupling action trajectories with video audio gen, for Physical AI.
→ JD JoyAI-Echo: up to 5-min multi-shot text-to-video on LTX-2.3.
→ ByteDance Bernini-R VAST TripoSplat (single-image-to-3D Gaussian splats, MIT).

0:14

Youssef KH retweeted

Patrick Jiang

@patpcj

Jun 6

Introducing Harness-1, a 20B search agent trained with a state-externalizing harness.
> frontier-level long-horizon search, rivaling Opus-4.6 and outperforming GPT-5.4
> Context-1-level cost and latency
> externalizes candidates, evidence, verification, and search history
> open-source

0:56

272

2,958

264,897

Youssef KH retweeted

CJ Zafir

@cjzafir

Jun 5

Our first model Mac-1 6.6B beating 3 giant models.
- Haiku 4.5
- GPT 5.4 mini
- Gemini 3 flash

Running this model on my Macbook M3 24GB. (model takes only 7GB RAM)

It searches the web, calls tools, asks follow-ups, tells jokes, finds contacts, searches files, writes emails, books events, writes notes, sets reminders and so much more that Siri can't do.

Read again, a 6.6B model.

Will share full 2000 scenario test results & benchmark scores in 2 days.

162

108

1,827

774,530

Youssef KH retweeted

shmidt

@shmidtqq

Jun 6

NVIDIA just made paying for AI feel optional. Open model, a million tokens of context, free tier with no per-token cost, runs on your own hardware. Entire codebases, whole data rooms, a year of chat logs, all swallowed in one prompt. No chunking, no RAG, no rate limit theater. The closed-AI premium has 90 days to defend itself. Bookmark this and come back. Open beat closed. Again.

0:17

shmidt

@shmidtqq

Jun 6

x.com/i/article/206325346313…

160

1,325

294,356