Joined January 2023
1,292 Photos and videos
if you had to build and train a large language model from scratch, which domain would you specialize it in, and why?
Ramón retweeted
Jun 13
Intelligence should be open, accessible, and ready to build with, empowering every developer, everywhere. GLM-5.2 is now available to all GLM Coding Plan users, including Lite, Pro, Max, and Team plans. docs.z.ai/devpack/latest-mod… As our new flagship model, GLM-5.2 delivers powerful coding capabilities, usable 1M-context support, and continued strengths in long-horizon tasks. API and Chatbot services will launch next week. The model will also be officially open-sourced next week under the MIT License. The future of AI is open, and it belongs to the people.
Ramón retweeted
Looking for a Fall intern to work with me on agentic kernel generation/optimization for LLM inference on TensorRT-LLM @nvidia. Great fit if you’re into compilers, GPU/ML performance, kernels, or systems. US/Canada.
Apply by email only: fkhoubsirat@nvidia.com
Subject: Fall Intern Candidate - Agentic Kernel Tooling
Please include your resume and any relevant blogs, papers, or open-source work.
Ramón retweeted
I accidentally discovered that Gemma-4-26B-A4B is way better at writing human sounding content than every other model out there - including frontier models like GPT 5.5 and Sonnet 4.6. I'm not sure why this is - it's kind of crazy how slopified these big expensive models are and for some reason, Google's open source model sounds a lot more natural and follows writing instructions better. WTF?
Ramón retweeted
MiniMax M3, Open-Weight, Now On Hugging Face, with only ~428B parameters and ~23B activated parameters
Weights: huggingface.co/MiniMaxAI/Min…
MiniMax Sparse Attention: huggingface.co/papers/2606.1…
Introducing MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities
- Coding & Agentic Frontier: 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Hard, 74.2% MCP Atlas
- MiniMax Sparse Attention scales context to 1M
- Natively Multimodal from Step Zero
API: platform.minimax.io
Token Plan: platform.minimax.io/subscrib…
🚀New! MiniMax Code: code.minimax.io
Weights & Tech Report in ~10 Days
Ramón retweeted
🚀 The MLX team is growing! If you love writing blazing-fast GPU kernels or implementing foundational models in Python & Swift, we want you. Drop a DM or apply below! 👇 #MachineLearning #AppleSilicon jobs.apple.com/en-us/details… github.com/ml-explore
Ramón retweeted
You should basically never use Fable for coding, but instead use it as a planner/orchestrator. Most of today's advanced models can implement a spec perfectly, and once done you can send the work to Fable to review. This has been my most powerful flow so far.
Ramón retweeted
We've reset 5-hour and weekly rate limits for all users. Enjoy Fable 5!
Ramón retweeted
Jun 9
Sovereign AI for all.
North Mini Code is now free on OpenCode 256K Context · fully open source Cohere's first coding model
Ramón retweeted
North Mini Code is now free on OpenCode 256K Context · fully open source Cohere's first coding model
Ramón retweeted
Claude Fable 5 is here. New model generation, new way of working. Here's how to get started in Claude Code and on the Claude Platform: 🧵
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.
Ramón retweeted
BREAKING: Anthropic just dropped Claude Fable 5. This is Mythos, made safe for public release. It is the best coding model in the world.
We've been testing it internally @every for the last week or so across coding, writing, marketing, editing, and more. Here's our vibe check:
- It broke our benchmarks. Fable scored a 91/100 on our Senior Engineer benchmark, which is human senior engineer level. The previous high score was Opus 4.8 at 63. GPT-5.5 is a 62.
- It's a one-shot wonder. You can set it and forget for hours or overnight on huge coding tasks, and come back to completed work. It cleared entire production bug backlogs, built a playable 3D, and even made a 2-minute animated film, all one-shot.
- Taste and attention to detail. In coding and knowledge work tasks, it has much better taste and attention to detail than we've ever seen. It gets subtle things right, adds little features you might not have thought of, and generally understands the assignment in ways that surprised us.
- Great use of context. We set it loose analyzing customer feedback surveys and our website data and it came back with a crisp, clean report that identified a. our biggest problem and b. a concrete testable solution, and then we sent it off to build that.
- It's best for power users. If you're already used to orchestrating multiple agents in your work, this model can do things that you've never seen before. If you're a knowledge worker or vibe coder with a more basic setup, you're not going to notice a huge difference. In fact, it probably isn't the right model for you.
- It's very slow, token-hungry. Using this thing for regular knowledge work is like squashing an ant with a rocket launcher. It also routinely uses 500k to 1M tokens on tasks. That's why it's best for your heaviest jobs, but not as good for tasks like collaborative writing.
- It's expensive. It's about twice as expensive as Opus, and it's also incredibly token hungry, so expect it to be something you'll use sparingly unless your company pays for it.
Overall, I think of it like a warp drive for coding: It can get you across the galaxy in a few hours, when it used to take months or years. But it's not appropriate for getting around town; you need something faster, cheaper, and more maneuverable.
The ceiling is extraordinarily high on this model though. Even our most advanced testers like @kieranklaassen felt like they were only scratching the surface of it.
Want our full vibe check with all of our testing and benchmarks? Read it on @every: every.to/vibe-check/anthropi…
testing fable 5
Ramón retweeted
Here’s your monthly reminder that you shouldn’t be prompting coding agents anymore. You should be designing loops that prompt your agents.
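One way to read "designing loops that prompt your agents" is a driver loop that builds each new prompt from the previous failure. The following is a minimal sketch under stated assumptions, not anyone's actual setup: `run_agent` and `check` are hypothetical callables you would supply yourself (for example, a wrapper around your agent and a test runner), and `max_iters` is an illustrative loop budget.

```python
from typing import Callable, Optional


def agent_loop(
    task: str,
    run_agent: Callable[[str], str],        # hypothetical: prompt in, agent output out
    check: Callable[[str], Optional[str]],  # hypothetical: None if OK, else an error message
    max_iters: int = 5,                     # loop budget so a failing agent cannot run forever
) -> Optional[str]:
    """Re-prompt the agent with its own failure until the check passes."""
    prompt = task
    for _ in range(max_iters):
        output = run_agent(prompt)
        error = check(output)
        if error is None:
            return output  # the check passed, stop looping
        # Build the next prompt from the original task plus the failure details.
        prompt = (
            f"{task}\n\nPrevious attempt:\n{output}\n\n"
            f"It failed with:\n{error}\nFix it."
        )
    return None  # budget exhausted without a passing result
```

The human writes the task and the check once; the loop, not the human, writes every follow-up prompt.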
Ramón retweeted
Anthropic just open-sourced a reference framework for AI-powered vulnerability discovery and remediation 🤖💀
The workflow: Recon → Find → Verify → Triage → Report → Patch
Features:
• Threat modeling
• Autonomous vulnerability hunting
• Crash verification
• Finding deduplication
• Exploitability analysis
• AI-generated patches with validation
Built around Claude Code and sandboxed agents using gVisor.
🔗 github.com/anthropics/defend…
Interesting signal: AI is moving beyond code generation into autonomous security research and vulnerability management.
#CyberSecurity #AppSec #AI #LLM #VulnerabilityManagement #DevSecOps #ClaudeAI
Ramón retweeted
Your agents can bypass logins on any website 🥷
Here's how to use Browser Use Profiles:
> Create a profile and start the setup
> Sync your local browser to Browser Use Cloud
> Spin up a cloud browser with your synced profile
Set up once, stay logged in. Try it now ↓🔗
Ramón retweeted
As an AI Engineer. Please learn
>Harness engineering, not just prompt engineering
>Context engineering, not just long prompts
>Prompt caching vs. semantic caching tradeoffs
>KV cache management, eviction, reuse, and memory pressure at scale
>Prefill vs. decode latency and why they optimize differently
>Continuous batching, paged attention, and throughput optimization
>Speculative decoding vs. quantization vs. distillation tradeoffs
>INT8, INT4, FP8, AWQ, GPTQ, and when quantization hurts quality
>Structured output failures, schema validation, repair loops, and fallback chains
>Function calling reliability, tool contracts, argument validation, and idempotency
>Agent guardrails, loop budgets, tool budgets, and termination conditions
>Model routing, graceful fallback logic, and degraded-mode UX
>RAG architecture: chunking, embeddings, hybrid search, reranking, and freshness
>Retrieval evals: recall, precision, grounding, attribution, and citation quality
>Evals: golden sets, regression tests, adversarial tests, LLM-as-judge, and human evals
>LLM observability as a first-class discipline: traces, spans, tokens, latency, errors, and drift
>Cost attribution per feature, workflow, tenant, and user journey not just per model
>Safety engineering: prompt injection defense, data leakage prevention, and permission boundaries
>Multi-tenant isolation, cache safety, and cross-user context contamination prevention
>Fine-tuning vs. in-context learning vs. RAG vs. distillation and when each is the wrong tool
>Latency, quality, cost, and reliability tradeoffs across the full inference stack
>Production failure modes: hallucinated tool calls, malformed JSON, stale retrieval, runaway agents, and silent eval regressions
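To make one item from the list above concrete ("schema validation, repair loops, and fallback chains"), here is a rough sketch using only the Python standard library. Everything named here is an assumption for illustration: `REQUIRED_KEYS` is a made-up schema, each entry in `models` is a hypothetical callable that takes a prompt and returns raw text, and `max_repairs` is an arbitrary budget.

```python
import json
from typing import Callable, List, Optional, Tuple

# Hypothetical schema for illustration: field name -> required Python type.
REQUIRED_KEYS = {"name": str, "priority": int}


def validate(raw: str) -> Tuple[Optional[dict], Optional[str]]:
    """Return (parsed, None) on success or (None, error message) on failure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "top-level value must be an object"
    for key, typ in REQUIRED_KEYS.items():
        if not isinstance(data.get(key), typ):
            return None, f"field {key!r} must be {typ.__name__}"
    return data, None


def structured_call(
    prompt: str,
    models: List[Callable[[str], str]],  # hypothetical model callables, in fallback order
    max_repairs: int = 2,                # repair budget per model
) -> Optional[dict]:
    for model in models:  # fallback chain: move on when a model keeps failing
        attempt_prompt = prompt
        for _ in range(max_repairs + 1):
            data, error = validate(model(attempt_prompt))
            if data is not None:
                return data
            # Repair loop: feed the validation error back to the same model.
            attempt_prompt = (
                f"{prompt}\n\nYour last reply was rejected ({error}). "
                "Reply with JSON only."
            )
    return None  # every model exhausted its budget; caller handles degraded mode
```

Returning `None` rather than raising leaves the degraded-mode decision to the caller, which is one possible design among several.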
wow lm studio for iOS??? x.com/lmstudio/status/206254…

Today.
Ramón retweeted
Today.
Ramón retweeted
google launches gemma 4 12b – nearly matches the 26b model on benchmarks, sometimes beats it, at less than half the memory footprint
what changed under the hood:
• vision. replaced the encoder with a lightweight embedding module (single matrix multiply positional embedding normalization). the llm backbone now handles visual processing directly
• audio. encoder removed entirely. raw audio signal is projected straight into the same token space as text
• inference. ships with multi-token prediction (mtp) drafters for speculative decoding, cutting latency
benchmarks (gemma 3 27b / gemma 4 12b / gemma 4 26b):
- gpqa diamond: 44 / 78.8 / ~80
- bbeh: 18 / 53 / 62
- mmlu pro: 67 / 77.2 / 78
- livecodebench: 28 / 72 / 76
- docvqa: 83 / 94.9 / 93
- infovqa: 60 / 88.4 / 90
- mmmu pro: 65 / 69.1 / 72
runs locally on consumer laptops with 16gb vram or unified memory – including macbook m-series
demo source: google
follow @thehypedotnews for 24/7 ai news, analysis and breakdowns
Jun 3
Replying to @Google
Gemma 4 12B delivers great performance with a small memory footprint and a novel architecture.
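The "nearly matches the 26b model, sometimes beats it" claim can be checked against the figures quoted in the post above. A small sketch, assuming the quoted numbers are accurate and treating the approximate "~80" GPQA Diamond score as 80; the function names are illustrative only:

```python
# Scores as quoted in the post: (gemma 3 27b, gemma 4 12b, gemma 4 26b).
# The 26b GPQA Diamond figure is given only approximately ("~80").
SCORES = {
    "gpqa diamond": (44, 78.8, 80),
    "bbeh": (18, 53, 62),
    "mmlu pro": (67, 77.2, 78),
    "livecodebench": (28, 72, 76),
    "docvqa": (83, 94.9, 93),
    "infovqa": (60, 88.4, 90),
    "mmmu pro": (65, 69.1, 72),
}


def gaps(scores):
    """Return {benchmark: 12b score minus 26b score}, rounded to one decimal."""
    return {name: round(s12 - s26, 1) for name, (_, s12, s26) in scores.items()}


def wins_for_12b(scores):
    """List the benchmarks where the 12b score is strictly higher than the 26b score."""
    return [name for name, gap in gaps(scores).items() if gap > 0]


if __name__ == "__main__":
    for name, gap in gaps(SCORES).items():
        print(f"{name:14s} {gap:+.1f}")
    print("12b ahead on:", wins_for_12b(SCORES))
```

On these quoted figures the 12b model is ahead only on docvqa; the largest gap in favor of the 26b model is on bbeh.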