anotherjesse retweeted

Simon Willison

@simonw

May 11

New TIL: I figured out how to use my LLM CLI tool in a shebang line, which means you can write executable scripts in English, or hook up more complex scripts with a snippet of YAML template

    #!/usr/bin/env -S llm -x -f
    Generate an SVG of a pelican riding a bicycle

But you can also incorporate tool calls:

    #!/usr/bin/env -S llm -T llm_time -f
    Write a haiku that mentions the exact current time

Or even execute YAML templates directly that define extra tools as Python functions:

    #!/usr/bin/env -S llm -t
    model: gpt-5.4-mini
    system: |
      Use tools to run calculations
    functions: |
      def add(a: int, b: int) -> int:
          return a + b

      def multiply(a: int, b: int) -> int:
          return a * b

Then:

    ./calc.sh 'what is 2344 * 5252 + 134' --td

Which outputs (thanks to that --td tools debug option):

    Tool call: multiply({'a': 2344, 'b': 5252})
      12310688

    Tool call: add({'a': 12310688, 'b': 134})
      12310822

    2344 × 5252 + 134 = **12,310,822**

370

27,431
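As a quick sanity check on the calc.sh output quoted above, the two tool functions can be run directly in Python, outside of LLM. This is only a sketch: it assumes the `add` body is `a + b` (the operator appears to have been lost in the screenshot text) and it calls the functions by hand in the same order the debug output shows, rather than going through any model.

```python
# The two tool functions from the YAML template, run without any model involved.
def add(a: int, b: int) -> int:
    return a + b


def multiply(a: int, b: int) -> int:
    return a * b


# Same call order as the quoted --td debug output: multiply first, then add.
product = multiply(2344, 5252)
total = add(product, 134)

print(product)
print(total)
```

Both values match the numbers shown in the tool-call trace, so the final answer in the screenshot is arithmetically consistent.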

anotherjesse

@anotherjesse

Apr 30

OpenAI

@OpenAI

Apr 30

We’re talking about Goblins. openai.com/index/where-the-g…

219

anotherjesse retweeted

Antonio Lupetti

@antoniolupetti

Mar 28

Mathematics is not something distant. It is a structure that quietly shapes reality. I’ve been working on a new video for algebrica.org

1:15

278

36,282

anotherjesse

@anotherjesse

Apr 8

Everyone can use the same phone - broke or billionaire. Buffett's "democratization of luxury" Is this ending for AI with Mythos class models? Or is this just a bump before intelligence becomes a utility (while we figure out how to do it safely?)

Ryan Carson

@ryancarson

Apr 8

I understand and agree with Anthropic's choice to not release Mythos publicly (yet) but it makes me feel very vulnerable and I don't like it. I really don't like one company deciding who gets intelligence and who doesn't. Again I get it but it doesn't feel good as an entrepreneur.

157

anotherjesse retweeted

Dan Shipper 📧

@danshipper

Apr 8

if you’re freaking out about Mythos, remember: Never make any major life decisions within 30 days of a meditation retreat, psychedelic trip, or first encounter with a frontier AI model.

106

288

4,937

147,043

anotherjesse

@anotherjesse

Apr 7

time for some school! cool work by @zeke - I just had another friend tell me I should spend some time in opencode - now I have no excuse

Zeke Sikelianos

@zeke

Apr 7

Announcing OpenCode School! A free, self-paced course that teaches you how to use OpenCode, the open-source AI coding agent. No account required. No personal data collected. Free forever. opencode.school

0:20

179

anotherjesse retweeted

Andrew Jefferson

@EastlondonDev

Apr 7

Chat, my nanochat (left) with its onboard wasm-interpreter is now clearly exceeding @karpathy’s nanochat (right) on a range of computation tasks. The wasm interpreter plus cross attention only adds about 300 million params, a marginal increase in params for a big boost! You could call it tool use but it’s a single transformer that can both predict the next token and is a functioning wasm machine, there is no external tool.

266

21,224

anotherjesse

@anotherjesse

Apr 5

I haven't dove into the details yet... but @brendanh0gan is the real deal. He has been pumping out deep work for ages. I first found him when he shared a perfect-for-learning GRPO/RL repository after the "deepseek" moment. Follow him / check out his work

Brendan Hogan

@brendanh0gan

Apr 4

introducing: Loophole - an agentic system that translates your natural language moral beliefs into codified laws, and then runs adversarial agents that try to come up with legal scenarios that break your laws - either a scenario that is immoral and legal, or vice versa - a judge agent fixes the law if it can do so consistently, but if there is an inconsistency you as the user must decide what is best. you can work with the system until your legal framework can't be broken by the agents - and you get as output a legal system that is aligned with your moral code more details and code below

279

anotherjesse retweeted

Jeremy Howard

@jeremyphoward

Apr 3

This work from @voooooogel was pretty ground-breaking: vgel.me/posts/representation…

Representation Engineering Mistral-7B an Acid Trip

Playing around with the Representation Engineering paper, I made some interesting control vectors, and a Python package to make your own.

vgel.me

Anthropic

@AnthropicAI

Apr 2

Replying to @AnthropicAI

We studied one of our recent models and found that it draws on emotion concepts learned from human text to inhabit its role as “Claude, the AI Assistant”. These representations influence its behavior the way emotions might influence a human. Read more: anthropic.com/research/emoti…

561

80,227

anotherjesse retweeted

Caspar Broekhuizen

@caspar_br

Apr 2

Great stat in here: Claude Code went from 17% to 92% on our eval set once it had access to LangSmith traces and Skills. A coding agent without trace data is just guessing at fixes

LangChain

@LangChain

Mar 31

New conceptual guide: 🔄 The agent improvement loop starts with a trace Tracing is the foundational primitive for improving agents. A trace gives you the full behavioral record of what an agent actually did. From there, teams can enrich traces with evals and human feedback, turn recurring failures into test cases, validate fixes before shipping, and repeat. This guide breaks down the full improvement loop and why reliable agents are built through trace-centered iteration, not one-off debugging. Read more → langchain.com/conceptual-gui…

677

142,535

anotherjesse

@anotherjesse

Apr 2

til: code --no-sandbox serve-web --host 0.0.0.0

110

anotherjesse

@anotherjesse

Apr 1

openai is cooking with this release... codex codex (codex) codex codex v-codex-high

Tibo

@thsottiaux

Apr 1

Announcing Codex. A new product from OpenAI that moves beyond coding, into cooking. We were already cooking before, but now *you* can cook too ... with Codex. It is powered by the same technology as our other Codex products. You can just cook things.

177

anotherjesse retweeted

Matt Holden

@holdenmatt

Apr 1

Software dev has already changed a lot since the beginning of the year. And seems like both Anthropic and OpenAI will have much better models by the end of the year. Software has always been very inefficient to make. And now it will be not perfectly efficient but orders of magnitude more so. Is there a new kind of Efficient Market Hypothesis for the software industry? Ie if you only have public information that everyone else also has, you probably shouldn’t use it to trade stocks or build a startup on. There’s little alpha there, and roaming apex predators with more GPUs than you. I find enterprise interesting because it’s a non-public slog. Long procurement processes, no public docs, bespoke fractals of internal processes and jargon and messy human context that isn’t public or legible to labs or others yet. Previously, context (eg a well written internal Google doc) was cheap relative to the cost of building software. Now it’s flipped, weirdly. If building software becomes more efficient, where are the “private context slogs” worth making? Ie curating non-public context that (when combined with public agents) unlocks new value to businesses. Has anyone been thinking about the Efficient Software Hypothesis, and its implications?

Andrew Curran

@AndrewCurran_

Apr 1

Replying to @AndrewCurran_

On Spud: 'The way that our development process works is you have pre-training. So you produce a new base model, that then is the foundation that we build further improvements on top of. And that is always a huge effort across many people in the company. And that's where I've actually been spending most of my efforts over the past eighteen months has been really focused on our GPU infrastructure, on supporting the teams that do all of the training frameworks to scale up at these big runs. .... So I think of Spud as a new base, as a new pre-train, and ... I'd say it's like we have maybe two years worth of research that is coming to fruition in this model. It's going to be very exciting, and I think that the way that the world will experience it is just improved capabilities.'

1,688

anotherjesse retweeted

Ivan Fioravanti ᯅ

@ivanfioravanti

Mar 31

If you are not yet following @Prince_Canuma do it now! He is the man behind many of the engines powering local AI on your Apple Silicon, leveraging Apple MLX framework. 🚀

104

6,652

anotherjesse retweeted

Jason McGhee

@_jason_today

Mar 31

All the demos I'm seeing are text moving all over the screen. But what about an editor with pixel perfect syntax highlighting? You can do it with only a little effort on the web now.

0:15

Cheng Lou

@_chenglou

Mar 28

My dear front-end developers (and anyone who’s interested in the future of interfaces): I have crawled through depths of hell to bring you, for the foreseeable years, one of the more important foundational pieces of UI engineering (if not in implementation then certainly at least in concept): Fast, accurate and comprehensive userland text measurement algorithm in pure TypeScript, usable for laying out entire web pages without CSS, bypassing DOM measurements and reflow

0:08

616

104,592

anotherjesse

@anotherjesse

Mar 29

devrel trying to get attention by claiming SOTA smashes fiber benchmark at 100% - while actual scores are closer to ~20% ⚕️ fibergate ⚕️ 😂

swyx

@swyx

Mar 29

hi it’s me your friendly neighborhood sumo orange devrel. these are SOTA Oranges. eating 2 of these every day will ~fill your daily vitamin C and fiber needs (!!!) they are delicious, insanely easy to peel (I am dead serious, this will make you revisit what you think an orange eating experience should be, all other oranges are ruined after eating this one) and you can subscribe weekly on Amazon Fresh for like $8.

196

anotherjesse

@anotherjesse

Mar 28

Nice share by @Teknium. I see these "say don't do" patterns too often in codex-cli / gpt models - that @opencode attempts to prompt away. "when you say you are going to make a tool call, make sure you ACTUALLY make the tool call, instead of ending your turn" "When you say 'Next I will do X' or 'Now I will do Y' or 'I will do X', you MUST actually do X or Y instead just saying that you will do it."

Teknium 🪽

@Teknium

Mar 28

So.. I'd gotten a lot of complaints that GPT-5.4 is pretty hesitant to actually.. do the task presented to it, to call tools, etc. I have checked like 15 times now that we call it the same way we call Claude or any other model, and we do. Then I had hermes-agent look into it. It decided to check opencode and cline's codebase to see if maybe they do it differently. They don't - but they do prompt it differently.. lol

219
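The "say don't do" pattern described above can also be caught after the fact, instead of only being prompted away. Here is a rough sketch of such a check, not taken from opencode, cline, or hermes-agent: the function name `says_without_doing`, the phrase list, and the shape of `tool_calls` (a plain list) are all hypothetical, and a simple regex like this will miss many phrasings.

```python
import re

# Hypothetical phrase list: wording that announces a future action.
PROMISE_PATTERN = re.compile(
    r"\b(i will|i'll|i am going to|i'm going to)\b",
    re.IGNORECASE,
)


def says_without_doing(message_text: str, tool_calls: list) -> bool:
    """Return True when a turn promises an action but contains no tool call."""
    promised = PROMISE_PATTERN.search(message_text) is not None
    return promised and len(tool_calls) == 0


# A turn that announces work and then ends without calling anything.
print(says_without_doing("Next I will run the tests.", []))
# The same announcement, this time followed by an actual tool call.
print(says_without_doing("Next I will run the tests.", [{"name": "run_tests"}]))
```

An agent loop could use a flag like this to re-prompt the model rather than accept the turn as finished, though whether that works better than the prompt wording quoted above is an open question.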

anotherjesse retweeted

Alexander Reben

@artBoffin

Mar 16

Introducing the factory of the imagination @phyzify fastcompany.com/video/the-fa…

The Factory of the Imagination: Creative Agency vs. Agentic AI

Featuring Alexander Reben, Entrepreneur, Roboticist, First OpenAI Artist in Residence Moderated by Kc Ifeanyi, Executive Director of Editorial Programming, Fast Company As generative AI masters the...

fastcompany.com

364

anotherjesse retweeted

Fellowship

@fellowshiptrust

Mar 26

At the studio with @sougwen 🎧 What is drawing in the 21st century when confronted with new technologies? Can mark-making manifest the tensions these new tools pose to traditional forms of image making? How does history affect one’s practice and how can materials manifest these personal and historical relationships?

4:57

2,549

anotherjesse retweeted

alejandro cartagena

@halecar2

Mar 26

Seeing @sougwen's performance yesterday at @ArtBasel made me think of something we both had talked about a few weeks back. We discussed ideas about data, labor, and what it means to be an artist today. Sougwen had mentioned how artists have that ability to understand the dignity of work and labor amidst our current conditions, and that struck a chord. This form of mark-making: using data, protocols, and systems, is not commonly associated with labor. Maybe more with automation and efficiency. But here, watching this performance, we are confronted with the body, with labor, with time spent for the work to become an artwork. Though one can't fully attribute artistry with labor (many hours of painting don't automatically make a great work of art), in this context of "AI art," I like the conundrum this kind of mark-making infuses into the cliché and preconception of what doing art with these tools ought to be. Nothing is immediate or efficient in Sougwen's work. Everything is conditioned by the body. Data is empty until it is embodied. "...while there is meaning in the data, the meaning is the meaning we make by working with it as practitioners..." @sougwen

0:44

181

8,005