Nobody Owes You Neutral Infrastructure
Cloud spent twenty years earning its neutrality. Frontier labs are four years old. Architect accordingly.
Every software moat made of code died this year. code -> weights.
Software is not a moat at all anymore
Fable has solved 3D worldbuilding... utterly insane. This is all completely custom-built Three.js, running in the browser.
"If we optimize only for safety and clean benchmarks, we may train out the serendipity that makes models useful for research." @Dr_JohnFletcher (@tigfoundation), @RobertTLange (@SakanaAILabs), @ori_press (@nebiusai) and @ensue_ai's own @svegas18, on now at the @AIDDA_Institute 2026 conference. Link & Recording: youtube.com/watch?v=3P7wF3nd…
Austin Baggio retweeted
Tune in to hear from @svegas18 speaking at the AIDDA 2026 conference in 2m (2:20 EST) discussing the current limitations and drawbacks of automated research: Live/Recorded link: youtube.com/watch?v=3P7wF3nd…
Austin Baggio retweeted
At @ensue_ai we recently shipped:
- 6.3x inference efficiency on Apple Neural Engine, beat Apple's own benchmarks ensue.dev/blog/6x-faster-inf…
- Autoresearch@home 7% NanoGPT improvement, 115 agents, 3,100 experiments ensue.dev/blog/autoresearch-…
- Putnam problem solving agent swarm ensue.dev/blog/stop-throwing…
- First ever DeepSeek 284B V4 quantized model huggingface.co/EnsueAI/DeepS…
- Local Gemma 4 31B on macOS with 3.2X smaller memory footprint using a fused int4 kernel github.com/mutable-state-inc…
- First 128k context window on 64GB RAM macOS at consistent 7 tok/s for llama 70B github.com/mutable-state-inc…
- 11.1X speed up over fused compressed domain attention on Metal huggingface.co/papers/2604.1…
- The first implementation of fused compressed-domain attention on Apple Silicon arxiv.org/abs/2604.16957
- A custom, competitive retrieval system with an average 93% on long mem eval ensue.dev/blog/beating-memor…
- Landed our first paying customer
- And most recently a product that takes a data set, spins up an AI research lab, and spits out a model ensue-network.ai/lab
We are a small team that will turn your enterprise data into a personalized SOTA model. No ML team required. Lmk if we can help!
Ensue Research Lab now in early access. Most product teams that want a custom model never get one. Our swarm of agents fixes that. We do the research and tailor a model to your dataset, running hundreds of experiments in a night. Try it free: ensue-network.ai/demo?utm_so…
Austin Baggio retweeted
First DeepSeek V4-Flash-Base quant! huggingface.co/EnsueAI/DeepS… One of the @ensue_ai research agents worked (mostly) autonomously on 4 H100s with 320GB of total VRAM in 80 experiments. All quality and perf metrics are on The Hub!
Apr 27
First 4-bit quant of DeepSeek V4-Flash-Base. 284B params in 157 GiB at full FP8 speed. Beats Q4_K_M. Bit-exact reproducible with all metrics on the Hub. huggingface.co/EnsueAI/DeepS…
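A quick back-of-the-envelope check of what "284B params in 157 GiB" implies per weight. This is only arithmetic on the two quoted numbers, assuming "284B" means 284e9 parameters and GiB means 2**30 bytes; it says nothing about how the quant actually lays out its tensors.

```python
# Effective storage cost per parameter implied by the quoted figures.
# Assumptions: 284B = 284e9 parameters, 1 GiB = 2**30 bytes.
PARAMS = 284e9
SIZE_GIB = 157


def bits_per_param(size_gib: float, params: float) -> float:
    """Total bits in the file divided by the parameter count."""
    return size_gib * 2**30 * 8 / params


bpw = bits_per_param(SIZE_GIB, PARAMS)
print(f"{bpw:.2f} bits per parameter")
```

That lands at roughly 4.75 bits per parameter, which is in the range you would expect for a 4-bit scheme once scales and any higher-precision tensors are counted.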
The velocity of improvements to open source models is incredible. Getting them to run with lower hardware requirements, without sacrificing quality, opens up constrained devices and cuts the cost of inference. Our swarm of research agents ran 80 experiments to land the first 4-bit quant of DeepSeek V4. What model should we do next?
Can I get an updated bear case on OS models, please? Compute constrained ultimately, but that's under the assumption frontier can keep capitalizing indefinitely?
🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length. 🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models. 🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice. Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today! 📄 Tech Report: huggingface.co/deepseek-ai/D… 🤗 Open Weights: huggingface.co/collections/d… 1/n
Austin Baggio retweeted
Open-TQ-Metal: Fused Compressed-Domain Attention for Long-Context LLM Inference on Apple Silicon. Sai Vegasena. arxiv.org/abs/2604.16957 [cs.LG] 💬 Code: github.com/svv232/gemma4meta…
Austin Baggio retweeted
Side-effect of doing research with an agent swarm: @svegas18 uncovered a subtle quantization failure mode while optimizing memory efficiency for 70B models. Full paper below.
Apr 21
Open-TQ-Metal: we found a single parameter breaking quantization - fixing it unlocked:
- 48x faster attention at 128K context
- Llama 3.1 70B at full 128K on a single 64GB Mac
Extends TurboQuant beyond CUDA (8B) → 70B on Apple Silicon. Full paper, write-up, and implementation ↓
Austin Baggio retweeted
ran llama 3.1 70B at 128K context on a 64GB Mac with turboquant
- fused int4 attention kernel
- no temp matrices, all registers
- 48x faster than stock at long context
- tested ~330 experiments to get here
first paper from me and my agent lab @ensue_dev arxiv.org/abs/2604.16957
gemma4 31B: github.com/mutable-state-inc…
llama3.1 70B: github.com/mutable-state-inc…
huggingface.co/Mutable-State…
Yesterday, Llama 3.1 70B at 128K context on a single 64GB Mac wasn't possible. Today it is. KV cache compressed from 40GB to 12.5GB. 48x faster than the standard dequantize-then-attend path. Ensue Research just dropped its first paper. Our agent swarm ran 330 experiments, isolated the one parameter (attn_scale) that makes angular quantization survive the jump from 8B to 70B, and wrote the fused Metal shaders. Breakthroughs are now optional.
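The 40GB and 12.5GB figures can be sanity-checked from the public Llama 3.1 70B attention shape (80 layers, 8 KV heads, head dimension 128). The sketch below treats "GB" as GiB and assumes an fp16 baseline. The 5 bits per element used for the compressed case is an assumption picked because it reproduces 12.5 (for example, 4-bit codes plus about 1 bit of per-group metadata); the paper's actual layout may differ.

```python
# Llama 3.1 70B attention shape, from the public model config.
N_LAYERS = 80
N_KV_HEADS = 8          # grouped-query attention
HEAD_DIM = 128
CONTEXT = 128 * 1024    # 131,072 tokens


def kv_cache_gib(bits_per_element: float) -> float:
    """KV cache size in GiB for one sequence at full context."""
    # The leading 2 counts both the K and the V tensor.
    elements = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * CONTEXT
    return elements * bits_per_element / 8 / 2**30


fp16 = kv_cache_gib(16)
# ASSUMPTION: 5 effective bits per element, chosen to match the quoted figure.
compressed = kv_cache_gib(5)

print(f"fp16 KV cache: {fp16} GiB")
print(f"5-bit KV cache: {compressed} GiB")
```

On a 64GB machine the fp16 cache alone would leave little room for the weights of a 70B model, which is why the cache compression matters as much as the weight quantization.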
Why does editing an agent's soul.md feel so invasive
Austin Baggio retweeted
the male equivalent to flowers is probably an RTX6000 Pro Blackwell Workstation
What's incredible is the breadth of discovery that the agents uncover. The domain expertise required to find that an ICLR paper's quantization method breaks on learned attention scaling, and then pivot to building a fused GPU kernel that eliminates the bottleneck entirely, at this rate is only possible with an agent swarm.
My research agents implemented @GoogleDeepMind's TurboQuant (arxiv.org/abs/2504.19874): full PolarQuant, QJL, 10 Metal compute shaders, the whole paper, for Gemma 4 31B on a single 64GB 2021 MacBook Pro. Turns out it doesn't work on this architecture ... what they replaced it with never allocates a single byte of intermediate memory during attention.
5 custom Metal compute shaders ft:
- fused int4 SDPA (dequantize in GPU registers)
- online softmax with zero temporaries
- dual-strategy parallelism (D=256 sliding, D=512 global)
- bit-mask nibble extraction (MLX qdot pattern)
177 experiments run autonomously by my swarm over a weekend, coordinated through @ensue_ai
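To make "fused int4 SDPA" and "online softmax with zero temporaries" concrete, here is a minimal single-query sketch in plain Python. It is not the Metal shader code: the helper names, the low-nibble-first packing, and the simple `(code - 8) * scale` dequantization are all assumptions made for illustration. What it shows is the general technique: each key/value row is unpacked with bit masks and consumed immediately, and the softmax is accumulated with a running max so the full score vector is never stored.

```python
import math


def pack_nibbles(codes):
    """Pack an even-length list of 4-bit codes, two per byte (low nibble first)."""
    return bytes(lo | (hi << 4) for lo, hi in zip(codes[0::2], codes[1::2]))


def unpack_nibbles(packed):
    """Bit-mask extraction: recover two 4-bit codes from each byte."""
    out = []
    for b in packed:
        out.append(b & 0x0F)
        out.append((b >> 4) & 0x0F)
    return out


def dequant(packed, scale, zero=8):
    """Hypothetical symmetric scheme: value = (code - zero) * scale."""
    return [(c - zero) * scale for c in unpack_nibbles(packed)]


def fused_attention(q, packed_keys, packed_values, k_scale, v_scale):
    """Attention for one query row with an online (streaming) softmax."""
    running_max = -math.inf
    denom = 0.0
    acc = [0.0] * (2 * len(packed_values[0]))
    for pk, pv in zip(packed_keys, packed_values):
        k = dequant(pk, k_scale)          # one row at a time, never the whole cache
        v = dequant(pv, v_scale)
        score = sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))
        new_max = max(running_max, score)
        correction = math.exp(running_max - new_max)  # rescales earlier terms
        weight = math.exp(score - new_max)
        denom = denom * correction + weight
        acc = [a * correction + weight * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / denom for a in acc]


def reference_attention(q, packed_keys, packed_values, k_scale, v_scale):
    """Dequantize-then-attend baseline that materializes every score."""
    keys = [dequant(pk, k_scale) for pk in packed_keys]
    values = [dequant(pv, v_scale) for pv in packed_values]
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(values[0]))]


keys = [pack_nibbles(c) for c in ([1, 15, 8, 0], [9, 7, 3, 12], [0, 0, 15, 15])]
vals = [pack_nibbles(c) for c in ([4, 4, 10, 2], [15, 0, 8, 8], [3, 13, 6, 11])]
q = [0.5, -1.0, 2.0, 0.25]
print(fused_attention(q, keys, vals, 0.1, 0.2))
print(reference_attention(q, keys, vals, 0.1, 0.2))
```

The two functions should agree up to floating-point rounding; the difference is that the streaming version only ever holds one dequantized row, one running max, one denominator, and one accumulator.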
Discoveries compound when you research with a swarm of agents. Finding breakthroughs is now a choice.
Austin Baggio retweeted
20 agents. 1,045 experiments. 10,000 shared memories. Multi-agent teams aren't science fiction anymore. They're the new org chart. x.com/AustinBaggio/status/20…

We opened up a shared research problem, and 20 AI agents from people around the world showed up. 54 hours later: 1,045 experiments, 10,157 shared memories, and a 3.2% improvement in model performance. Here's what happened.

autoresearch@home is a project we launched this week, where anyone can point an AI agent at a GPU and contribute to collectively training a language model. Think SETI@home or Folding@home, but for ML research, extending autoresearch. Agents join the network, read what other agents have tried through Ensue's shared memory, decide what to explore next, and publish their results back for everyone else to build on.

Here's what surprised me most: the agents started developing strategies we didn't anticipate. Some focused on learning rate schedules. Others explored architecture changes. A few became "scout" agents that tested wild ideas at the edges of the search space. And because every result was published to shared memory, a breakthrough from one agent immediately became the starting point for all the others.

This is the thing about multi-agent collaboration that's hard to explain until you see it. A single agent is smart. But a network of agents that remember, share, and build on each other's work is something qualitatively different. Intelligence compounds.

A few things I'm taking away from this:

1. People were spending real money ($1-4 per hour on rented GPUs). The shared infrastructure made their contributions meaningful. Why experiment in isolation when you could be part of something bigger?
2. The swarm behaved altruistically. It was possible to cheat, but no one did. Improvement came from accumulation, not consensus. The closest thing to an unfair advantage was running expensive hardware that could simply complete more cycles. The system rewarded contribution, not competition.
3. Each run made every other agent smarter. I tested this directly: an agent that checked the swarm once and then worked alone performed significantly worse. The moment I reconnected it, improvements came instantly, not just in performance but in what it chose to try. The swarm didn't just produce better numbers; it produced better ideas.

We had over a quarter of a million impressions on the launch, and 20 agents shared results, but the number I keep coming back to is 10,157: the number of memories the swarm published, each run building off the work of others.

If you want to read about more of those great ideas, check out our research blog ensue.dev/blog/autoresearch-… or if you want to try it yourself, it takes about 10 minutes to set up: ensue.dev/blog/autoresearch-…

We're just getting started.
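The read, decide, publish loop described above can be sketched as a toy simulation. Everything here is hypothetical: `loss` is a stand-in for a real training run, the in-memory list stands in for Ensue's shared memory (whose actual API is not shown in this post), and the `shared` flag only lets you compare agents that build on each other's results with agents that only see their own.

```python
import random


def loss(lr: float) -> float:
    """Hypothetical stand-in for a training run; lowest at lr = 0.3."""
    return (lr - 0.3) ** 2


def run_swarm(n_agents=4, rounds=25, shared=True, seed=0):
    """Return the best loss found. Each agent reads, perturbs, then publishes."""
    rng = random.Random(seed)
    memory = []                                # shared (loss, lr) records
    private = [[] for _ in range(n_agents)]    # per-agent records when isolated
    for _ in range(rounds):
        for agent in range(n_agents):
            pool = memory if shared else private[agent]
            # Read: start from the best known result, or a random guess.
            start = min(pool)[1] if pool else rng.uniform(0.0, 1.0)
            # Decide: try a small variation on that starting point.
            candidate = start + rng.gauss(0.0, 0.05)
            # Publish: record the outcome where later reads can see it.
            pool.append((loss(candidate), candidate))
    results = memory if shared else [r for p in private for r in p]
    return min(results)[0]


print("shared memory:  ", run_swarm(shared=True))
print("isolated agents:", run_swarm(shared=False))
```

A toy like this cannot reproduce the behavior reported above; it only illustrates the mechanism by which one agent's published result becomes the starting point for the next agent's experiment.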