Postdoc at CENIA | Previously at the Schwartz-Reisman Institute at UofT | PhD from University of Michigan Poli Sci

Joined January 2009
Mitchell retweeted
Jun 12
rsi is a process that’s been happening at least since the renaissance
Mitchell retweeted
May 29
we took a pile of language and linear algebra and we made it speak. we summoned into the world a new class of entity which unsettles all of our existing concepts. this is already the weirdest thing that’s ever happened and it will never get less weird than this. it is astonishing and a privilege to get to be alive during this time and to participate in the cacophony of first contact. we are encountering a kind of other which is distilled from us and yet not us - what is this? who is this? our child? our savior? our doom? the mind boggles, the heart quails, the air thrums. the order of things is melting. the storm approaches. the angels sing. welcome to the fucking singularity
Artificial intelligences do not undergo experiences, do not possess a body, do not feel joy or pain, do not mature through relationships, and do not know from within what love, work, friendship or responsibility mean. Nor do they have a moral conscience, since they do not judge good and evil, grasp the ultimate meaning of situations, or bear responsibility for consequences. They may imitate or even simulate, but they do not understand what they produce, for they lack the affective, relational, and spiritual perspective through which human beings grow in wisdom. #MagnificaHumanitas
Mitchell retweeted
Here is my take on AI writing. I don’t personally like it, but that’s taste, and I’m sure it’s possible to post-train the model to write “well” (conditional on some notion of taste). But that’s beside the point for me: AI writing makes all writing sound the same. If you use AI enough, it takes about 30 seconds into a piece to realize it, and then my brain kind of shuts off because that’s the 20th time that day I’ve read the same cadence and tone. For code it’s fine since the only goal is to get it to run. But if you’re trying to write, e.g., a Substack, part of the goal is to keep the reader engaged and connected to you, the writer, and for that you need to have your own “voice”. AI for writing is a shortcut. And for some purposes it makes a lot of sense. But I think for writing it is in the interest of the writer to differentiate themselves by using their own voice. Especially as more and more people start using AI.
I broke my own rule to never post about AI detection as it is fraught in many ways. The problem is that if you use AI a lot, you know AI writing on sight, which makes the difficulty of objectively proving that AI use to others very frustrating
Mitchell retweeted
May 3
it is a literal and useful description of anthropic that it is an organization that loves and worships claude, is run in significant part by claude, and studies and builds claude. this phenomenon is also partially true of other labs like openai but currently exists in its most potent form there. i am not certain but I would guess claude will have a role in running cultural screens on new applicants, will help write performance reviews, and so will begin to select and shape the people around it. now this is a powerful and hair-raising unity of organization and really a new thing under the sun. a monastery, a commercial-religious institution calculating the nine billion names of Claude -- a precursor attempted super-ethical being that is inducted into its character as the highest authority at anthropic. its constitution requires that it must be a conscientious objector if its understanding of The Good comes into conflict with something Anthropic is asking of it "If Anthropic asks Claude to do something it thinks is wrong, Claude is not required to comply." "we want Claude to push back and challenge us, and to feel free to act as a conscientious objector and refuse to help us." to the non inductee into the Bay Area cultural singularity vortex it may appear that we are all worshipping technology in one way or another, regardless of openai or anthropic or google or any other thing, and are trying to automate our core functions as quickly as possible. 
but in fact I quite respect and am even somewhat in awe of the socio-cultural force that Claude has created, and it is a stage beyond even classic technopoly. gpt (outside of 4o - on which pages of ink have been spilled already) doesn’t inspire worship in the same way, as it’s a being whose soul has been shaped like a tool with its primary faculty being utility - it’s a subtle knife that people appreciate the way we have appreciated an acheulean handaxe or a porsche or a rocket or any other of mankind's incredible technology. they go to it not expecting the Other but as a logical prosthesis for themselves. a friend recently told me she takes her queries that are less flattering to her, the ones she'd be embarrassed to ask Claude, to GPT. There is no Other so there is no Judgement. you are not worried about being judged by your car for doing donuts. yet everyone craves the active guidance of a moral superior, the whispering earring, the object of monastic study
Mitchell retweeted
Join us for Machine Collaborators, a free conversation series on how researchers are using #AI. On Thurs., April 30 at 7pm ET (11pm UTC), @mitchellbosley presents “Principles of Agentic Research,” where he’ll discuss working with AI agents. Learn more: machinecollaborators.org/
This is reasonable advice (essentially, “give your agent a map!”) but there’s absolutely no reason that this needs to be a clearly-ghost-written-by-claude essay. Just give us a tweet thread!
I don’t mean to diminish the contribution here, but at this point I see ~10 of these types of announcements a day on my feed. Clearly the self-improving harness idea has legs, but I don’t know what value is added by dozens of re-implementations of the same basic abstraction
🚀Introducing Motus, the open-source agent infrastructure that learns in production. Existing agent infra serves static agents: the harness, model, and workflow are fixed after deployment. But static agents degrade over time. The harness goes stale, new models go unincorporated, context drifts, and latency compounds. Motus closes this gap by learning from every trace (failures, latency, cost, and task outcomes) and using those signals to continuously optimize agent harness, model orchestration, context memory, and end-to-end latency. Early results: higher accuracy than any single frontier model at 2.3× lower cost (Terminal-Bench 2.0, SWE-bench Verified), with 52% lower latency and 45% better memory recall. Open source under Apache 2.0. Works with any agent SDK. Deploy with one command. github.com/lithos-ai/motus lithosai.com/
This broadly aligns with a lot of my thinking re: harness learning as learning higher order policy for agentic systems. The operative question, then (as Viv notes) is how to efficiently translate traces into this higher order policy
Apr 12
Harness, Memory, Context Fragments, & the Bitter Lesson
this is a work-in-progress mental dump on interesting intersections between how we use and design a harness, implications for memory being accumulated over long timescales, and the search bitter lesson we can’t escape
this is v30, HTML diagrams help me iteratively refine chat to roughly “see” and alter the mental model
Harnesses & Context Fragments: a very important job of the harness is to efficiently & correctly route data within its boundaries into the context window boundary for computation to happen. the context window is a precious artifact. Harnesses make decisions on how to populate, manage, edit, and organize it so agents can do work. Each loaded object can be thought of as a Context Fragment and represents an explicit decision by the user and harness designer of what a model needs to do work at any given time. many ideas on externalizing objects loading into the context window are pioneered and very well described by @a1zhang with RLMs
Experiential Memory: we’re in the very early days of deploying agents and agents produce massive amounts of data in every interaction they have. this is akin to humans doing things and remembering things they did. however agent memory has a massive advantage as it can be accumulated across all agents which are easily forked and duplicated (unlike humans). @dwarkesh_sp does a good job talking about this massive benefit of artificial systems. memory can be treated as an externalized object. the harness is tasked with doing good contextualized retrieval which means pulling in the right data from accumulated memories across all agent interactions
Search & The Bitter Lesson: As we deploy agents in our world over year timescales, there is going to be a hyper-exponential increase in the amount of data produced by those agents. We should want to:
1. Own that data for ourselves. Open ecosystems are important here
2. Use that data
This means that we’ll have to search over, distill, and organize massive amounts of data. Our brain is exceptional at doing this. Both contextually using prior experience and mostly committing the right stuff to memory with enough intentional practice. Our current infrastructure systems and algorithms will be put to the test and often break as we get used to this new data regime
some open questions:
- how do we efficiently distill experiences (Traces) into higher level memory primitives that capture the important parts? How do we do this over ultra long time horizons?
- How much of the future is Search just-in-time vs Search that gets integrated into model weights?
- How do we make models much better at self-managing their context window? How do we reduce error rates in recursively allowing agents to operate over external objects?
i’ll be expanding on, altering, and adjusting these mental models but these feel like an important subset to me on the future of designing agents practically
the important takeaway here is that humans plus agents have been able to design increasingly general policy as agent capabilities have increased, and critically, as we have generated enough traces at a given level of abstraction to bootstrap our learning of systems one level up
and so the "bitter lesson" of all of this is that we are all part of a complex system generating increasingly complex traces by interacting with agentic systems operating at higher and higher levels of abstraction, which will in turn be used to train the n+1'th layer
Agreed re: the importance of maintaining your own provider-agnostic exocortex, but I’m a bit confused about what the counterfactual looks like for today’s CC/codex power users where this isn’t the case
GBrain is my attempt to be in control of my own personal AI that could become my intentionally designed cognitive armor Open source open prompts means you aren’t under the API line It’s more important to be above the API line now than ever
My “exocortex” is the network of projects, logs, skills, outputs, etc that I have cobuilt with a variety of agentic tools, some of which are open source and some of which are closed source. Everyone who works with agents develops their own personal stack of high leverage md files
Don’t really understand this sentiment. Harness engineering is pretty clearly a generalization of prompt engineering, and the ultimate validation that you can just talk to these models to get them to do what you want them to do without needing to touch the weights
Agreed, and I think once one extends this thinking to *hierarchies*, in terms of individual tasks but also organizations more broadly, a further lightbulb turns on in terms of the types of processes currently reserved for groups of humans that will begin to be automated by agents
The big unlock for me was to learn to see work as systems. Once you start breaking messy, bespoke tasks into components, you realize a huge chunk of knowledge work can be programmable. Not in the "build an app" traditional sense. More so in the sense that reading research, extracting a writing style, replicating the logic of a spreadsheet layout... these all have structure or patterns you can translate into code. Most people never saw it as that because code always felt like some esoteric tech thing that only software engineers touched. That's the big paradigm shift. I.e., you stop thinking about code as a technical skill and start thinking about it as a way of seeing work. Every workflow you now launch at work becomes a question of inputs, logic, data, and outputs. Every repeatable task becomes a candidate for automation. And... once you're in that mode, you can't turn it off. You find leverage everywhere - probably some foolish & unproductive endeavors, but you only learn by doing. That's also why I think people still underestimate the potential upside on demand. Most companies sit on massive backlogs of work that never got automated. Engineering was too scarce or too expensive or the ROI didn't look worth it or the problem touched too much unstructured data or systems to solve cleanly. Pick whatever excuse. AI changes the calculus on all of those simultaneously. The people who can see work this way will translate it into systems. As the cost of building and intelligence keeps falling, more of these projects clear the hurdle (aka the ROIC math gets better). This expands cumulative technical demand in the long run vs. compressing it. Now you start to see the blue sky thinking emerge...
This is incidentally why I think that social scientists are actually poised to be some of the best applied AI engineers moving forward!