Raymond Weitekamp

Raymond Weitekamp

Photos and videos

Tweets

Pinned Tweet

Raymond Weitekamp

@raw_works

Jan 22

“Study hard what interests you the most in the most undisciplined, irreverent and original manner possible.” ― Richard Feynmann I am currently studying: ralph wiggum, rlms, dspy, gastown & related agent orchestrators, agent-native apps and orgs How about you?

6,185

Ronnie Stoeferle

Raymond Weitekamp retweeted

Ronnie Stoeferle

@RonStoeferle

Jun 12

Ok, that's really good... ht @kevinmuir and esp. @nntaleb

149

24,855

Chris 🇨🇦

Raymond Weitekamp retweeted

Chris 🇨🇦

@llm_wizard

Jun 13

A bit ago: I told my landlord the other day that he should kick anyone out that was peddling ancient artefacts from his basement. Today: WTF my landlord kicked me out for peddling ancient artefacts from his basement. This is surely a misunderstanding!

2,850

Anthropic

Raymond Weitekamp retweeted

Anthropic

@AnthropicAI

Jun 13

The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: anthropic.com/news/fable-myt…

Statement on the US government directive to suspend access to Fable 5 and Mythos 5

The US government has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States.

anthropic.com

12,393

25,681

87,465

88,036,434

Keenan Crane

Raymond Weitekamp retweeted

Keenan Crane

@keenanisalive

Jun 10

Terrific release from @nvidia and my former PhD student @rohansawhney1: A GPU physics solver for fundamental problems like electrostatics and heat transfer, which handles extremely complex geometry without any mesh generation or basis approximation. Based on Monte Carlo walk on spheres methods developed by our group and others. See this page for lots of background info/tutorials: rohan-sawhney.github.io/mcgp…

Rohan Sawhney @rohansawhney1

Jun 9

Releasing Walk on Spheres Extensions (WoSX): a GPU-accelerated C /Python library for Monte Carlo physics simulation on complex geometry Think path tracing but for physics beyond light transport: heat, electrostatics, potential flow, deformation & more! github.com/nv-tlabs/wosx

111

1,202

103,548

Raymond Weitekamp

Raymond Weitekamp

@raw_works

Jun 11

what do you think this @ExaAILabs agent will come back with?

137

Raymond Weitekamp

Raymond Weitekamp

@raw_works

Jun 11

just sayin'.... if any of the @SpaceX / @xai employees want to become angel investors on friday, i've got the inside scoop on a super hot fire ai startup ;)

171

Raymond Weitekamp

Raymond Weitekamp

@raw_works

Jun 10

And furthermore, the labs should publish the capabilities of these models when used recursively... ...but then the plots will be fractals so that might not become a very popular format since only the RLMs will be able to understand the fractal plots

Noam Brown

@polynoamial

Jun 9

x.com/i/article/205769422698…

725

swyx

Raymond Weitekamp retweeted

swyx

@swyx

Jun 9

just finished rerunning FC Diamond on my historical charts. none of the official tables/charts are capturing the degree of takeoff. x.com/karpathy/status/206440… its this same chart all the way down difficulty classes (below) breaks every curve fit because Fable is a diffferent CLASS of model, with beeeeeg model smell.

Andrej Karpathy

@karpathy

Jun 9

This is a super exciting release - Claude Fable 5 is the same underlying model as Mythos but with added safeguards. The benchmarks are great and it's SOTA on everything by a margin but I'll add that *qualitatively* also, this is a major-version-bump-deserving step change forward (imo of the same order as Claude 4.5 was in November), peaking especially for long problem-solving sessions on very difficult problems. You can give it a lot more ambitious tasks than what you're used to, the model "gets it" and it will just go, and it's never felt this tempting to stop looking at the code at all (but don't do this in prod!). The model still has quirks that people will run into and the safeguards are configured to be a little too trigger happy for launch, which can hopefully be tuned over time. I feel a lot of things changing as working software increasingly comes out on a tap. The Jevon's paradox kicks in and I feel my own demand for software growing substantially. You can ask for anything - explainers, visualizers, dashboards, bespoke single-use apps (e.g. a full wandb that is hyper-specific just for your project), you can 10X your test suite, auto-optimize code, run giant research projects with custom HTML for the results, anything! "Free your mind" (Matrix ref). Really looking forward to all the things people build!

100

36,066

Lance Martin

Raymond Weitekamp retweeted

Lance Martin

@RLanceMartin

Jun 9

x.com/i/article/206438055391…

102

763

6,010

2,171,660

dan

Raymond Weitekamp retweeted

dan

@irl_danB

Jun 9

I spent a lot of the weekend doing recursive dynamic workflows which—while technically blocked by the Claude Code dynamic workflow JS environment in the package itself—can still be accomplished _in effect_ using a technique called trampolining, which allows an outer loop to effectively manage inner recursion out of something like an event queue incidentally, trampolining is how we always got Claude’s agent-subagent architecture (max depth n=1) to implement deeply nested OpenProse blocks I picked this technique up from Spring, would wire together Java calls. better yet this foundation plays especially nicely with middleware (useful in LLM systems!) anyway, dynamic workflows present an even further fertile ground for the trampoline, as you can basically build an entire event loop (in JavaScript!) and manage the inner agent calls as a virtual recursive stack even though the environment permits just a max depth of 1 this is fraught with challenges around shared state, but nothing that hasn’t already been solved

4,735

Raymond Weitekamp

Raymond Weitekamp

@raw_works

Jun 9

for all the people freaking out about loops this week... if you want deterministic code as the outer harness: @DSPyOSS if you want a natural language contract that your agent must fulfill: @OpenProseVM

277

Raymond Weitekamp

Raymond Weitekamp

@raw_works

Jun 9

regarding the latter, i wrote an in depth piece for turing post late last week: turingpost.com/p/openprose-a…

Stop Babysitting Agents, Start Authoring Outcomes

How to turn your best Claude Code and Codex sessions into reusable, reviewable programs written in logical English

turingpost.com

162

Raymond Weitekamp

Raymond Weitekamp

@raw_works

Jun 9

this was a really fun piece to write for @TheTuringPost. i'm very grateful for the opportunity to partner with @Kseniase_ on it. start authoring outcomes!

Turing Post

@TheTuringPost

Jun 8

OpenProse – an open-source "logical English" language that turns your agentic workflows into reusable agent programs. It runs inside the coding agent you already use: Claude Code, Codex, OpenCode, Hermes, Pi – and gives it a structured contract to follow. → The key idea – the coding agent becomes the "compiler". This delivers: - Less babysitting of multi-agent coding sessions - Reviewable .prose.md programs instead of disposable prompts - Explicit skill and tool dependencies - Isolated sub-agent sessions with clean outputs - Run receipts, logs, artifacts, and audit trails - Reuse of sessions on demand So OpenProse extracts the entire workflow: phases, contracts, decision gates, loops, parallel work, mistakes, fixes, and validation evidence. It's like a “git for agent workflows”.

2,737

dan

Raymond Weitekamp retweeted

dan

@irl_danB

Jun 3

everyone is building an agent or a tool you don't want an agent or a tool, you want a reactor I've been working on something cool and I think you'll like it it's simple: an agent session DAG that keeps a declared world-model up to date in an efficient (memoized) render each render node is an agent session: you declare the desired state with OpenProse markdown files once invoked, each agent session acts as the provider. the agent session uses the open source openai-agents-sdk, extensible however you like with any model (I use with opus, sonnet, haiku) the facets of the world-state are memoized, so not every agent has to run on every event, saving you on inference if that sounds a lot like React or dataflow, that's because even in our brave new world the wisdom of the agents holds fast

193

9,544

Shreya Shankar

Raymond Weitekamp retweeted

Shreya Shankar

@sh_reya

Jun 3

It is great that the frontier labs want to support data analytics workflows. It is the #1 thing many enterprises want now. But enterprise data is difficult. Turns out most models are bad at it! They’re ok at writing single SQL queries or Python scripts, thanks to the plethora of text to SQL and data science benchmarks, but they struggle to query, clean, and make sense of data from multiple database systems (relational and non relational). Fortunately, we released a new benchmark to help, the Data Agent Benchmark, with plans to get it into a super well-known benchmark very soon :-) stay tuned! arxiv.org/abs/2603.20576

Can AI Agents Answer Your Data Questions? A Benchmark for Data Agents

Users across enterprises increasingly rely on AI agents to query their data through natural language. However, building reliable data agents remains difficult because real-world data is often...

arxiv.org

OpenAI

@OpenAI

Jun 2

Replying to @OpenAI

Translate data into answers. The data analytics plugin for Codex.

0:44

249

58,737

Vincent Weisser

Raymond Weitekamp retweeted

Vincent Weisser

@vincentweisser

Jun 3

Join us at @PrimeIntellect to build the open stack for self-improving agents Engineering • MTS – Full Stack Software Engineer — SF/Remote, Full time • MTS – GPU Infrastructure — SF/Remote, Full time, Hybrid • MTS – Inference — Remote/SF, Full time, Hybrid • MTS – Sandbox Platform — SF, Full time, On-site • MTS – Security — SF, Full time, On-site • MTS – Training Platform — SF, Full time, On-site Research • Research Engineer – Distributed Training — SF/Remote, Full time • Research Engineer – Reinforcement Learning — SF/Remote, Full time • Research Engineer – RL Infrastructure — SF/Remote, Full time • AI Research Resident – Open Source AGI — Remote, Part time Applied Research • Evals & Data — SF, Full time, Hybrid • Forward-Deployed — SF, Full time, On-site • RL & Agents — SF, Full time Compute / Finance • Head of Compute — SF, Full time • Strategy and Finance Lead, Compute — SF, Full time Finance / Operations • Business Operations Lead — Remote, Full time • Chief of Staff — SF, Full time • Founder's Associate, Business Operations — SF, Full time Growth • Account Executive — SF, Full time • Head of Enterprise — SF, Full time • Head of Growth — SF, Full time • Head of Marketing — SF, Full time • Revenue Operations Lead, AI Infrastructure — Remote, Full time • Solutions Architect – AI Infrastructure — SF, Full time • Technical Account Manager – AI Infrastructure — SF, Full time Legal • Head of Legal — SF, Full time Others • Internship — SF, Full time • Open Application for Unconventional Talent — SF/Remote, Full time

396

44,856

dan

Raymond Weitekamp retweeted

dan

@irl_danB

Jun 2

I was talking with an old friend at dinner last night, he described gpt fatigue I'd examined the whole psychosis thing, but fatigue is different, and this one rings with me. I've been through cycles of this; I'm probably just emerging from my third epoch of gpt fatigue right now it's very bittersweet. the cognitive overhead of context switching across the slop troughs is not as fun as the in-the-zone coding rhythm I enjoyed for the first twelve years of my career I drew the connection to the Reactor Harness I'd been working on and he lit up (yes, this is a launch post, read on). I told him the basic idea: In the same way that React (js) keeps the DOM up to date by rendering declared components efficiently based on upstream events... the Reactor Harness keeps an ideal world-model up to date by rendering declared facets of the world-model efficiently based on upstream events. This world-model can be anything--formally it's a faceted content addressable blob. In practice, it can be a file, a directory, a sqlite table, etc. If you've followed OpenProse since the beginning, you'll guess that the render function is not deterministic byte code, but instead is declarative markdown instructions fulfilled by an agent session. I'm releasing the harness here in beta today. It ships as an SDK so you can plug it into your own projects. We've built an experimental CLI w/ a lightweight server and a devtools package alongside it. Please give it a boost with a Github Star here, then return to read more: github.com/openprose/prose The Reactor Harness is built to run OpenProse, and its design was uniquely informed by the tenets behind OpenProse, such that you can declare your ideal world-model using familiar structured markdown, and you can optionally write imperative fulfillment plans in our original ProseScript. One way of thinking about the Reactor is as a DAG of agent sessions. Every node in the DAG is an agent session tasked with keeping facets of the world model up to date. These are memoized, such that downstream dependent sessions only get re-run when their upstream counterparts change. Over the coming days, I'll be writing more about the harness, how we've been using it, and where I hope we can take this. We are in the process of running benchmarks on it. The goal is that the SDK API itself remains relatively stable while we improve the agent session fulfillment cost, speed, and intelligence under the hood. How did we get here? Today I keep a large share of our company's operations in a directory: tenets, specs, code, analytics, our burn model, our sales lead enrichment CRM. And I find myself having to manually "fast-forward" these models, dropping into context to proactively tell Claude Code to update a spec, add someone to the CRM, or update the financial model based on real-world events that have changed our plans. I'd written many OpenProse scripts to accomplish these things, but I found I wanted this all to run reactively, as an ongoing time-invariant responsibility, rather than proactively as a tool that I continuously return to. Our first attempt at solving this was putting our OpenProse programs on a cron. The problem with that was that it became very expensive very fast. As I started trying to design the OpenProse to work under these constraints, I found myself introducing concepts like memoization to different facets of the maintained state. From there, I decided not reinvent the wheel and to graft the patterns from React more explicitly. It's an analogy, and the analogy breaks down in some places. But it's a reasonable starting point that a lot of people are familiar with. We built the first version in April internally on our hosted service for one of our customers' needs, but I decided that the core of this is interesting enough that it deserves to be in the public sphere, so we spent the intervening six weeks ripping it out and redesigning it as an SDK so people could plug it into their own stacks. I can't emphasize enough how there's nothing new here, we're just applying classical engineering paradigms to our brave new world. We're finding that despite our topsy-turvy reality, the wisdom of the ancients holds fast. The Reactor harness is young, should be used with caution, and has some way to go before it reaches it's ideal form. My ask is that you try it, wire it up to something useful, love it or hate it, and send me honest feedback about your experience. We're always listening and improving. I have a particularly exciting end goal in mind: because the Reactor DAG is itself a world-model, downstream from events in the real world (say, learned event source/frequency/distribution), you could use Reactor itself to implement dynamic Reactor DAGs. This sets us on the path to a true RLM paradigm. I'm most excited about this because I hypothesize that it will yield a simple, elegant property of the Reactor: Inference cost for maintaining a world-model scales with surprise, rather than wall-clock time. It's not there yet, but my expectation is that this is achievable and that it's just a matter of walking up the ladder to get there. In my experience when you make something self-referential too early it can collapse back in on itself, so we're going to step our way there incrementally. When we do get there, my hope is that the meta Reactor DAG can continuously self-calibrate on event source/frequency/distribution to optimize for efficient fulfillment Reactor DAGs, and where the cost of keeping the world-model up to date approaches only the surprise of upstream events in the real world. At dinner last night, a recurring topic was this love-hate relationship we'd developed with the models. I've been building harnesses for these burgeoning minds since my GPT-2 finetune in 2020. In many ways I'm having as much fun as I've ever had with technology. And yet I've come to loathe their impacts in many other regards. In the end, I guess the real goal is that the Reactor Harness lets me unburden the cognitive load at the root of my gpt fatigue, and frees me up for the more interesting forms of gpt psychosis :D thanks for reading and following along, star the repo here and give it a try: github.com/openprose/prose

GitHub - openprose/prose: A new kind of language for a new kind of computer

A new kind of language for a new kind of computer. Contribute to openprose/prose development by creating an account on GitHub.

github.com

128

9,295

kate

Raymond Weitekamp retweeted

kate

@whoiskatrin

Jun 2

🎈 we've just shipped agents sdk v0.14.0 you can now build agents with skills, messengers, schedules, and durable workflows on cloudflare out of the box support for recurring tasks, think workflows, chat recovery, mcp transport improvements, and better client-tool continuations go make cool stuff with it!

386

60,021

Kevin Madura

Raymond Weitekamp retweeted

Kevin Madura

@kmad

Jun 2

So /goal is awesome Over the past few weeks I used @PrimeIntellect to train a 149M late interaction model based on GTE-ModernColBERT-v1 using PyLate, focused on clause extraction from legal contracts. On the MLEB benchmark it does well for its size: it's the best accuracy-per-parameter open model on the task, 3rd of 17 open-source models, ahead of Google's EmbeddingGemma (308M, 0.829) and the same-size legal peer Free Law ModernBERT (0.764), behind only Qwen3-Embedding-4B/8B (which are 27–53× larger). The agents love the prime cli. I only used the UI for paying my bill.

134

10,823

alex zhang

Raymond Weitekamp retweeted

alex zhang

@a1zhang

May 28

In case you're curious about why dynamic workflows are so powerful and the future, read the RLM paper! Opus 4.8 dynamic workflows in Claude Code is perhaps the first instance of a frontier model seriously trained to be an RLM. I suspect within a year they'll just become the standard for nearly all coding agent interactions.

ClaudeDevs

@ClaudeDevs

May 28

New in Claude Code (research preview): dynamic workflows. Claude writes an orchestration script on the fly, then spins up a large fleet of coordinated subagents in parallel to take on your most complex tasks. Use the word "workflow" in a prompt to get started.

168

1,428

295,118