Ramón


1,292 Photos and videos

Tweets

Ramón

@learntouseai

50m

if you had to build and train a large language model from scratch, which domain would you specialize it in, and why?

Z.ai

Ramón retweeted

Z.ai

@Zai_org

Jun 13

Intelligence should be open, accessible, and ready to build with, empowering every developer, everywhere. GLM-5.2 is now available to all GLM Coding Plan users, including Lite, Pro, Max, and Team plans. docs.z.ai/devpack/latest-mod… As our new flagship model, GLM-5.2 delivers powerful coding capabilities, usable 1M-context support, and continued strengths in long-horizon tasks. API and Chatbot services will launch next week. The model will also be officially open-sourced next week under the MIT License. The future of AI is open, and it belongs to the people.

How to Switch Models - Overview - Z.AI DEVELOPER DOCUMENT

docs.z.ai

316

918

7,706

1,861,479

faraz

Ramón retweeted

faraz @farazdotai

Jun 12

Looking for a Fall intern to work with me on agentic kernel generation/optimization for LLM inference on TensorRT-LLM @nvidia. Great fit if you’re into compilers, GPU/ML performance, kernels, or systems. US/Canada.
Apply by email only: fkhoubsirat@nvidia.com
Subject: Fall Intern Candidate - Agentic Kernel Tooling
Plz include your resume and any relevant blogs, papers, or open-source work.

262

21,050

Joel - coffee/acc

Ramón retweeted

Joel - coffee/acc

@JoelDeTeves

Jun 12

I accidentally discovered that Gemma-4-26B-A4B is way better at writing human sounding content than every other model out there - including frontier models like GPT 5.5 and Sonnet 4.6. I'm not sure why this is - it's kind of crazy how slopified these big expensive models are and for some reason, Google's open source model sounds a lot more natural and follows writing instructions better. WTF?

800

54,494

MiniMax (official)

Ramón retweeted

MiniMax (official)

@MiniMax_AI

Jun 12

MiniMax M3, Open-Weight, Now On Hugging Face, with only ~428B parameters and ~23B activated parameters
Weights: huggingface.co/MiniMaxAI/Min…
MiniMax Sparse Attention: huggingface.co/papers/2606.1…

MiniMaxAI/MiniMax-M3 · Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

MiniMax (official)

@MiniMax_AI

Jun 1

Introducing MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities
- Coding & Agentic Frontier: 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Hard, 74.2% MCP Atlas
- MiniMax Sparse Attention scales context to 1M
- Natively Multimodal from Step Zero
API: platform.minimax.io
Token Plan: platform.minimax.io/subscrib…
🚀New! MiniMax Code: code.minimax.io
Weights & Tech Report in ~10 Days

114

328

2,775

646,516

Ronan Collobert

Ramón retweeted

Ronan Collobert

@trebolloc

Jun 11

🚀 The MLX team is growing! If you love writing blazing-fast GPU kernels or implementing foundational models in Python & Swift, we want you. Drop a DM or apply below! 👇 #MachineLearning #AppleSilicon jobs.apple.com/en-us/details… github.com/ml-explore

AIML - Machine Learning Engineer for MLX, MLR - Jobs - Careers at Apple

Apply for a AIML - Machine Learning Engineer for MLX, MLR job at Apple. Read about the role and find out if it’s right for you.

jobs.apple.com

422

129,884

Pietro Schirano

Ramón retweeted

Pietro Schirano

@skirano

Jun 11

You should basically never use Fable for coding, but instead use it as a planner/orchestrator. Most of today's advanced models can implement a spec perfectly, and once done you can send the work to Fable to review. This has been my most powerful flow so far.

167

116

3,345

214,858

ClaudeDevs

Ramón retweeted

ClaudeDevs

@ClaudeDevs

Jun 9

We've reset 5-hour and weekly rate limits for all users. Enjoy Fable 5!

1,355

1,818

35,762

2,201,457

Cohere

Ramón retweeted

Cohere

@cohere

Jun 9

Sovereign AI for all.

OpenCode

@opencode

Jun 9

North Mini Code is now free on OpenCode
256K Context · fully open source
Cohere's first coding model

729

55,524

OpenCode

Ramón retweeted

OpenCode

@opencode

Jun 9

North Mini Code is now free on OpenCode
256K Context · fully open source
Cohere's first coding model

2,122

212,329

ClaudeDevs

Ramón retweeted

ClaudeDevs

@ClaudeDevs

Jun 9

Claude Fable 5 is here. New model generation, new way of working. Here's how to get started in Claude Code and on the Claude Platform: 🧵

Claude

@claudeai

Jun 9

Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.

0:20

396

951

12,089

2,008,170

Dan Shipper 📧

Ramón retweeted

Dan Shipper 📧

@danshipper

Jun 9

BREAKING: Anthropic just dropped Claude Fable 5—this is Mythos, made safe for public release. It is the best coding model in the world.
We've been testing it internally @every for the last week or so across coding, writing, marketing, editing, and more—here's our vibe check:
- It broke our benchmarks. Fable scored a 91/100 on our Senior Engineer benchmark—this is human senior engineer level. The previous high score was Opus 4.8 at 63. GPT-5.5 is a 62.
- It's a one-shot wonder. You can set it and forget for hours or overnight on huge coding tasks, and come back to completed work. It cleared entire production bug backlogs, built a playable 3D, and even made a 2-minute animated film—all one-shot.
- Taste and attention to detail. In coding and knowledge work tasks, it has much better taste and attention to detail than we've ever seen. It gets subtle things right, adds little features you might not have thought of, and generally understands the assignment in ways that surprised us.
- Great use of context. We set it loose analyzing customer feedback surveys and our website data and it came back with a crisp, clean report that identified a. our biggest problem and b. a concrete testable solution—and then we sent it off to build that.
- It's best for power users. If you're already used to orchestrating multiple agents in your work, this model can do things that you've never seen before. If you're a knowledge worker or vibe coder with a more basic setup, you're not going to notice a huge difference—in fact, it probably isn't the right model for you.
- It's very slow, token-hungry. Using this thing for regular knowledge work is like squashing an ant with a rocket launcher. It also routinely uses 500k to 1M tokens on tasks. That's why it's best for your heaviest jobs—but not as good for tasks like collaborative writing.
- It's expensive. It's about twice as expensive as Opus, and it's also incredibly token hungry—so expect it to be something you'll use sparingly unless your company pays for it.
Overall, I think of it like a warp drive for coding: It can get you across the galaxy in a few hours, when it used to take months or years. But it's not appropriate for getting around town—you need something faster, cheaper, and more maneuverable.
The ceiling is extraordinarily high on this model though. Even our most advanced testers like @kieranklaassen felt like they were only scratching the surface of it.
Want our full vibe check with all of our testing and benchmarks? Read it on @every: every.to/vibe-check/anthropi…

16:37

172

310

3,530

610,269

Ramón

Ramón

@learntouseai

Jun 9

testing fable 5

Peter Steinberger 🦞

Ramón retweeted

Peter Steinberger 🦞

@steipete

Jun 7

Here’s your monthly reminder that you shouldn’t be prompting coding agents anymore. You should be designing loops that prompt your agents.
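One way to read "designing loops that prompt your agents" is a small driver that re-prompts until an automated check passes or an iteration budget runs out. The sketch below is only an illustration under that assumption: `run_loop`, `fake_agent`, and `fake_check` are hypothetical names, and the stubs stand in for a real coding agent and a real test suite.

```python
from typing import Callable, Optional, Tuple


def run_loop(agent: Callable[[str], str],
             check: Callable[[str], Optional[str]],
             task: str,
             max_iters: int = 5) -> Tuple[str, int]:
    """Prompt `agent` repeatedly, feeding failures from `check` back in.

    `check` returns None when the result is acceptable, otherwise an
    error message. Returns (accepted_result, iterations_used).
    """
    prompt = task
    for i in range(1, max_iters + 1):
        result = agent(prompt)
        error = check(result)
        if error is None:
            return result, i
        # Build the next prompt from the task, the failed attempt, and the error.
        prompt = (f"{task}\n\nPrevious attempt:\n{result}\n\n"
                  f"Problem found:\n{error}\nFix it.")
    # Loop budget exhausted: stop instead of running forever.
    raise RuntimeError(f"no passing result after {max_iters} iterations")


# Hypothetical stand-ins for a real agent and a real verifier.
def fake_agent(prompt: str) -> str:
    return "fixed" if "Problem found" in prompt else "broken"


def fake_check(result: str) -> Optional[str]:
    return None if result == "fixed" else "output is broken"


if __name__ == "__main__":
    print(run_loop(fake_agent, fake_check, "do the task"))
```

The human designs the check and the budget; the loop writes the follow-up prompts.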

1,785

1,373

19,570

8,298,130

Vivek | Cybersecurity

Ramón retweeted

Vivek | Cybersecurity

@VivekIntel

Jun 3

Anthropic just open-sourced a reference framework for AI-powered vulnerability discovery and remediation 🤖💀
The workflow: Recon → Find → Verify → Triage → Report → Patch
Features:
• Threat modeling
• Autonomous vulnerability hunting
• Crash verification
• Finding deduplication
• Exploitability analysis
• AI-generated patches with validation
Built around Claude Code and sandboxed agents using gVisor.
🔗 github.com/anthropics/defend…
Interesting signal: AI is moving beyond code generation into autonomous security research and vulnerability management.
#CyberSecurity #AppSec #AI #LLM #VulnerabilityManagement #DevSecOps #ClaudeAI

GitHub - anthropics/defending-code-reference-harness: Skills for threat modeling, scanning, triage,...

Skills for threat modeling, scanning, triage, patching, plus an autonomous scanning harness you can /customize - anthropics/defending-code-reference-harness

github.com

170

878

67,755

Browser Use

Ramón retweeted

Browser Use

@browser_use

Jun 5

Your agents can bypass logins on any website 🥷
Here's how to use Browser Use Profiles:
> Create a profile and start the setup
> Sync your local browser to Browser Use Cloud
> Spin up a cloud browser with your synced profile
Setup once, stay logged in. Try it now ↓🔗

0:36

488

31,456

diva

Ramón retweeted

diva

@divaagurlxw

Jun 4

As an AI Engineer. Please learn
>Harness engineering, not just prompt engineering
>Context engineering, not just long prompts
>Prompt caching vs. semantic caching tradeoffs
>KV cache management, eviction, reuse, and memory pressure at scale
>Prefill vs. decode latency and why they optimize differently
>Continuous batching, paged attention, and throughput optimization
>Speculative decoding vs. quantization vs. distillation tradeoffs
>INT8, INT4, FP8, AWQ, GPTQ, and when quantization hurts quality
>Structured output failures, schema validation, repair loops, and fallback chains
>Function calling reliability, tool contracts, argument validation, and idempotency
>Agent guardrails, loop budgets, tool budgets, and termination conditions
>Model routing, graceful fallback logic, and degraded-mode UX
>RAG architecture: chunking, embeddings, hybrid search, reranking, and freshness
>Retrieval evals: recall, precision, grounding, attribution, and citation quality
>Evals: golden sets, regression tests, adversarial tests, LLM-as-judge, and human evals
>LLM observability as a first-class discipline: traces, spans, tokens, latency, errors, and drift
>Cost attribution per feature, workflow, tenant, and user journey not just per model
>Safety engineering: prompt injection defense, data leakage prevention, and permission boundaries
>Multi-tenant isolation, cache safety, and cross-user context contamination prevention
>Fine-tuning vs. in-context learning vs. RAG vs. distillation and when each is the wrong tool
>Latency, quality, cost, and reliability tradeoffs across the full inference stack
>Production failure modes: hallucinated tool calls, malformed JSON, stale retrieval, runaway agents, and silent eval regressions
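The "schema validation, repair loops, and fallback chains" item in the list above can be made concrete with a small sketch. Everything named here is an assumption for illustration: the `REQUIRED` schema, `validate`, and `call_with_repair` are hypothetical, and each "model" is just a callable that takes a prompt and returns text.

```python
import json
from typing import Callable, Dict, List, Optional

# Hypothetical schema: field name -> required Python type.
REQUIRED = {"city": str, "days": int}


def validate(raw: str) -> Optional[str]:
    """Return None if `raw` is JSON matching REQUIRED, else an error string."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return f"malformed JSON: {exc.msg}"
    if not isinstance(data, dict):
        return "top-level value must be an object"
    for key, typ in REQUIRED.items():
        if key not in data:
            return f"missing field: {key}"
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, typ):
            return f"wrong type for field: {key}"
    return None


def call_with_repair(models: List[Callable[[str], str]],
                     prompt: str,
                     max_repairs: int = 2) -> Dict:
    """Try each model in order; re-prompt with the validation error on failure."""
    for model in models:                      # fallback chain
        attempt = prompt
        for _ in range(max_repairs + 1):      # repair loop with a fixed budget
            raw = model(attempt)
            error = validate(raw)
            if error is None:
                return json.loads(raw)
            attempt = (f"{prompt}\nYour last output was invalid ({error}). "
                       "Return only valid JSON.")
    raise ValueError("all models failed validation")
```

The repair budget and the ordered model list keep a malformed response from turning into an unbounded retry loop.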

104

491

4,284

239,305

Ramón

Ramón

@learntouseai

Jun 4

wow lm studio for iOS??? x.com/lmstudio/status/206254…

LM Studio

@lmstudio

Jun 4

Today.

LM Studio

Ramón retweeted

LM Studio

@lmstudio

Jun 4

Today.

131

151

2,631

195,389

thehype.

Ramón retweeted

thehype.

@thehypedotnews

Jun 3

google launches gemma 4 12b – nearly matches the 26b model on benchmarks, sometimes beats it, at less than half the memory footprint
what changed under the hood:
• vision. replaced the encoder with a lightweight embedding module (single matrix multiply, positional embedding, normalization). the llm backbone now handles visual processing directly
• audio. encoder removed entirely. raw audio signal is projected straight into the same token space as text
• inference. ships with multi-token prediction (mtp) drafters for speculative decoding, cutting latency
benchmarks (gemma 3 27b / gemma 4 12b / gemma 4 26b):
- gpqa diamond: 44 / 78.8 / ~80
- bbeh: 18 / 53 / 62
- mmlu pro: 67 / 77.2 / 78
- livecodebench: 28 / 72 / 76
- docvqa: 83 / 94.9 / 93
- infovqa: 60 / 88.4 / 90
- mmmu pro: 65 / 69.1 / 72
runs locally on consumer laptops with 16gb vram or unified memory – including macbook m-series
demo source: google
follow @thehypedotnews for 24/7 ai news, analysis and breakdowns
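For readers unfamiliar with the speculative decoding mentioned above, here is a toy sketch of the general greedy variant. It is not Gemma's MTP drafter implementation: `speculative_decode`, `target_next`, and `draft_next` are hypothetical names, and both "models" are plain functions over integer tokens.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def speculative_decode(target_next: NextToken,
                       draft_next: NextToken,
                       prompt: List[int],
                       max_new: int,
                       k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding over toy next-token functions.

    Returns (sequence, verification_rounds). In a real system each round
    is one batched forward pass of the large model over k+1 positions.
    """
    seq = list(prompt)
    rounds = 0
    while len(seq) - len(prompt) < max_new:
        # 1. The cheap drafter proposes k tokens autoregressively.
        draft: List[int] = []
        for _ in range(k):
            draft.append(draft_next(seq + draft))
        # 2. The target checks the proposals in order.
        rounds += 1
        accepted: List[int] = []
        for tok in draft:
            expected = target_next(seq + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # replace the first wrong token
                break
        else:
            # Every draft token matched: the target adds one bonus token.
            accepted.append(target_next(seq + accepted))
        seq.extend(accepted)
    return seq[:len(prompt) + max_new], rounds


def greedy_decode(target_next: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(target_next(seq))
    return seq
```

Because every accepted token is checked against the target's own greedy choice, the output matches plain greedy decoding; the saving comes from needing fewer target rounds when the drafter guesses well.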

0:42

Google

@Google

Jun 3

Replying to @Google

Gemma 4 12B delivers great performance with a small memory footprint and a novel architecture.

A bar chart titled "Gemma 4 12B Benchmarks" comparing performance across three models: Gemma 3 27B (grey), Gemma 4 12B (light blue), and Gemma 4 26B (dark blue). The chart shows Gemma 4 12B significantly outperforming the older, larger Gemma 3 27B across various evaluation benchmarks, including GPQA Diamond (78.8), BBEH (53), MMLU Pro (77.2), LiveCode Bench (72), DocVQA (94.9), InfoVQA (88.4), MMMU Pro (69.1), and MRCR v2 8 needle 128k average (43.4). Gemma 4 26B shows the highest scores across all metrics.


11,022