CEO & Founder | PhD | Startup Advisor | @Columbia | Author Generative Software Engineering tinyurl.com/7ymyv4sb | πŸ”” Follow for AI & Vibe Coding Tips πŸ‘‡

Joined July 2023
6,615 Photos and videos
Pinned Tweet
πŸŽ—οΈ "Medium-Sized" LLM Burners Coming Soon! πŸ”₯ This Could Make Local HyperToken Generation a Reality. ⚑️ NVIDIA’s worst nightmare? 😱 βš™οΈ Application-Specific Hardware Taalas new PCIe ASIC board would burn the entire medium-sized Qwen 3.5-27B LLM straight into silicon 🀯 (already doing it with small models) Taalos said medium models on ASIC would be available in their lab by Spring '26. πŸ’­Imagine: 🚫 No more loading weights πŸš€ ~10,000 Tokens Per Second locally (Llama 3.1 8B already @ 17,000 tps) πŸ’» Standard PC slot, ultra-low power (10x less) πŸ”‹ 🌍 100% offline with no cloud, no GPU farm πŸ’° Reddit unit cost rumor $300 to $400 πŸ–₯️ Imagine HyperToken generation on your desktop. πŸ€– AI agents that think at light speed. ⚑️ Are you ready? πŸ‘€
179
421
2,716
491,862
πŸ’‘New RISC-V @SipeedIO K3 AI-Box Tested β‡’ Yes, it can inference! $600 Local AI option Been waiting for a good post, and @rcarmo came through. Here is what he found πŸ‘‡ πŸ› οΈ HW: Sipeed K3 (32GB LPDDR5, 128GB UFS) πŸ’° $600 πŸ”‹ 11W idle/22W load π–£˜ quiet πŸ–§ 10GbE/WiFi 🐧 Bianbu Linux 🧠 Real Benchmarks (llama.cpp fork A100 cores) Β» TinyLlama: ~36 tps Β» Gemma4 E2B: ~13 tps Β» Qwen3.6-28B-REAP-A3B: ~7 tps Β» Gemma4 E4B: ~6 tps Β» Gemma 4 26B-A4B: ~5 tps Not bad for a little dev board imho Link to his article in ALT
Wrote another thing about local models and dedicated hardware. Anyone want to send me a spare GB10? taoofmac.com/space/reviews/2…
11
1,346
See what it says at the bottom of the sale page that sells a similarly spec’d AI Max 395 PC w/2TB?
πŸš€ AMD Ryzen AI Halo is now available for pre-order! A compact local AI developer platform powered by the Ryzen AI Max 395: 🧠 128GB unified LPDDR5x memory ⚑ 40 CU Radeon 8060S graphics (RDNA 3.5) πŸ“¦ Run models up to 200B parameters locally πŸ–₯️ Windows Linux support out of the box Build and deploy AI workflows without cloud dependency. Pre-order β†’ @ amd
1
4
618
Holy πŸ’©! 56,000 tps has to be a 🌎 Guinness World Record. 🀯 If true, this is the inferencing πŸ‘‘! FPGAs applied to LLMs can only support smaller samller models. ASICs can handle a much larger LLM. What do you do with 56,000 tokens per second? ⏲️ Responds to a "Hi" prompt in 1 millisecond.
56,000 tokens/sec at just 80 MHz. 🀯 I burned a full Transformer with KV cache into a custom chip. Designed gate by gate as a 100% digital integrated circuit. Prototyped on a FPGA. (No GPU. No CPU) Just pure digital silicon running @karpathy microGPT, spelling out names on a tiny LCD. This is GateGPT πŸ‘‡
1
14
1,880
Most are betting Fable 5 will be restored in the new week or so.
2
14
2,618
More shots of the new AMD Strix Halo Dev PC going head to head with DGX-Spark (ROCm/Vulkan vs. CUDA).
AMD tackles NVIDIA's $4679 DGX Spark AI PC with its $3999 Ryzen AI Halo: Now available with 128 GB memory for blazing fast LLMs. πŸ”— wccf.tech/1kmsb
3
6
44
5,822
So people are asking where it says Gemini-3-Flash Kimi-K2.6 Deepseek-V4-Pro got within 1% of Fable 5 @ 50% the cost using the new Fusion tool, here it is from @OpenRouter's official blog post. πŸ§ͺ What is the DRACO Benchmark? πŸ‘‡ DRACO (Deep Research Agentic Comparison) is a benchmark designed to test AI models on complex, real-world research tasks. Key details: πŸ“ Created by Perplexity AI Contains 100 deep research tasks across 10 domains (law, medicine, finance, tech, product comparison, etc.) Evaluates reasoning, tool use, synthesis, factual accuracy, and citation quality Uses a detailed rubric with ~39 weighted criteria per task ⚠️ Is it independent? No, it was developed by Perplexity, so it’s not fully independent. However, the benchmark is public (arXiv) and can be used by anyone.
1
4
56
6,357
Is this for real? Did someone leak this intentionally?
Someone put Fable 5 on the pirate bay, 3.4TB πŸ˜‚
Community note
A Pirate Bay search for "fable" returns no relevant results, and further, there is no "Other / Models" category as claimed in the screenshot. thepiratebay.org/search.php?q=f…
307
🚨 Exciting news! πŸ”€ OpenRouter Fusion is now available and it might help while Fable 5 is restricted. πŸ’° A budget panel!! πŸ‘€ πŸ‘€ πŸ‘‰ Gemini 3 Flash Kimi K2.6 DeepSeek V4 Pro ➀ scored within 1% of Fable 5 performance at roughly half the cost How to use it:. β†’ Set model to "openrouter/fusion" πŸ”§ It runs server-side with tools enabled and supports custom panels. πŸ›‘οΈ Set up ZDR for additional privacy!!!
Introducing the Fusion API, the smartest compound model in the market. Fusion achieves Fable-level intelligence at half the price. How it works πŸ‘‡
4
4
51
7,456
People are waking up to the reality that the AI we use every day can be taken away or priced out of reach.
1
5
338
It's hard to believe that, with all the AI news this week, WWDC '26 was just last Monday.
1
275
"Token austerity" advice coming from Big Tech is bizarre indeed. Strange times we live in for sure.
1
1
3
480
πŸš€ πŸ“° GLM-5.2 Status Update Here's where things stand right now: βœ… Available today! β†’ GLM Coding Plan users (Lite, Pro, Max, Team) ⏳ API Chatbot β†’ Launching next week ⏳ Open Source (MIT) β†’ Releasing next week on Hugging Face πŸ“Š Currently on AgentArena ❖ GLM-5.2 brings strong coding performance 1M context. Full public release is coming soon.
Jun 13
Intelligence should be open, accessible, and ready to build with, empowering every developer, everywhere. GLM-5.2 is now available to all GLM Coding Plan users, including Lite, Pro, Max, and Team plans. docs.z.ai/devpack/latest-mod… As our new flagship model, GLM-5.2 delivers powerful coding capabilities, usable 1M-context support, and continued strengths in long-horizon tasks. API and Chatbot services will launch next week. The model will also be officially open-sourced next week under the MIT License. The future of AI is open, and it belongs to the people.
2
5
115
10,085
Maybe this is a better deal.
At least the cable is included in the price! Now can it run MiniMax M3?
14
3
83
18,268
Best open source model today. Now just need 256GB of VRAM to run it.
πŸ“Š With MiniMax M3 open source now out, here is what to expect on quants and sizes, including VRAM needed: MiniMax M3 (428B MoE, ~23B active) πŸ”₯ GGUF Size Estimates Q8_0 β†’ ~430-450 GB Q6_K β†’ ~340-360 GB Q5_K_M/XL β†’ ~280-310 GB Q4_K_M/XL β†’ ~220-250 GB (Best balance) Q3_K_XL β†’ ~170-200 GB Q2_K β†’ ~110-140 GB Last resort Very efficient due to extreme sparsity! Practical local runs will need high-VRAM setups (multiple 5090s or better).
7
5
84
12,191