ai agent groupie ⛬ member of technical staff @FactoryAI

Joined October 2021
Building software products efficiently requires new tools and processes to keep up with the pace of this industry, and becoming a software factory is the only way to win. @Factory is here to help your organization embrace this new era by automating processes 24/7 and removing bottlenecks.
Today, we're announcing Factory 2.0: from coding agents to software factories.
Being efficient and cost-effective no longer requires becoming a model sommelier. The Factory Router just works.
Introducing model routing to Factory. Factory Router picks the right model for every task, automatically. Maintain frontier performance while cutting costs by 25%.
Agents are at least as good as an enthusiastic intern. Managers already know how to direct enthusiastic interns. Thus managers shouldn't shy away from delegating tasks to agents, alongside humans. (The reverse logic may apply as well: ICs are already "managing" agents.)
Why does everybody want managers to be ICs? Please someone explain this to me from first principles.
agent crunchwrap retweeted
Big Update🤩: #paperclip now includes full papers from all of arXiv, PubMed Central and 150 million abstracts!🖇️ You can give your LLM all that knowledge in one line—all optimally indexed for AI agents. Much more thorough and ~100x faster than web search, and free.
that electric feeling when @droid release notes are so stacked you need cmd f to find your own changes
The last factory built by us. Then factories build themselves. Join us.
agent crunchwrap retweeted
my agents are working together in a mono repo

agent crunchwrap retweeted
We built Missions at Factory, and I wrote about the architecture that I led the design for to make multi-day autonomous coding reliable. Agents are highly reactive to their context. Every design decision follows from keeping each agent's trajectory focused and directionally consistent.
we can all finally experience what it feels like to be contributing to the linux kernel
After months of research we identified a critical gap in developer tooling. Today we're fixing that. We are open sourcing Cursed Plugins: a suite of tools to deliver candid, expert reviews of your code (and your peers), assess the architectural harmony of your codebase, generate obituaries for your dead code, or convert your entire project to COBOL.
agent crunchwrap retweeted
GStack now supports Factory Droid @FactoryAI. Thanks for getting me to do it @matanSF
agent crunchwrap retweeted
these days i just set a mission for my droids (@FactoryAI) to work on, then i go on walks. this is life
/enter-mission to accomplish your most ambitious vision with laser-focused precision and very little supervision
Droids can now pursue goals autonomously over multi-day horizons. You describe what you want, approve the plan, and come back to finished work. We call these Missions.
agent crunchwrap retweeted
New design work for Factory
make sure to pave your roads before driving your ferraris
Introducing Agent Readiness. AI coding agents are only as effective as the environment in which they operate. Agent Readiness is a framework to measure how well a repository supports autonomous development. Scores across eight axes place each repo at one of five maturity levels.
and frontend/fullstack now also involves cli work
This quote from @GergelyOrosz perfectly captures the culture at @FactoryAI: "I struggle to foresee startups hiring separate frontend and backend devs: they’ll just hire a specialist whom they trust will use AI to unblock themself across the stack."
All our backend engineers ship frontend code and vice versa. How?
- Unified Language: Full-stack TypeScript removes the language barrier.
- Simplified React: A robust component library and the React Compiler mean no useMemo or useCallback. Plus, useEffect is banned.
- Agent-Ready Codebase: Strict type-checking, unit tests, linting, React vitest, Storybook, and Playwright e2e provide automated guardrails.
- Droid Reviews: Automated final safety checks to catch bugs before they merge.
Finally, the 'top-of-funnel': when prompting Droids, they strictly follow our AGENTS.md to ensure every line of code adheres to our standards.
😍
30 Oct 2025
messing around with shaders. hardly understand this math, but i'm so happy with how it turned out. ahhhh it just keeps going
the devs yearn for the factories
24 Oct 2025
managing fleets of agents should be more fun than playing factorio with the UI/UX to boot
babe, wake up! a new karpathy nano repo just dropped!
Excited to release new repo: nanochat! (it's among the most unhinged I've written).
Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script and in as little as 4 hours later you can talk to your own LLM in a ChatGPT-like web UI.
It weighs ~8,000 lines of imo quite clean code to:
- Train the tokenizer using a new Rust implementation
- Pretrain a Transformer LLM on FineWeb, evaluate CORE score across a number of metrics
- Midtrain on user-assistant conversations from SmolTalk, multiple choice questions, tool use.
- SFT, evaluate the chat model on world knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), code (HumanEval)
- RL the model optionally on GSM8K with "GRPO"
- Efficient inference the model in an Engine with KV cache, simple prefill/decode, tool use (Python interpreter in a lightweight sandbox), talk to it over CLI or ChatGPT-like WebUI.
- Write a single markdown report card, summarizing and gamifying the whole thing.
Even for as low as ~$100 in cost (~4 hours on an 8XH100 node), you can train a little ChatGPT clone that you can kind of talk to, and which can write stories/poems, answer simple questions. About ~12 hours surpasses GPT-2 CORE metric. As you further scale up towards ~$1000 (~41.6 hours of training), it quickly becomes a lot more coherent and can solve simple math/code problems and take multiple choice tests. E.g. a depth 30 model trained for 24 hours (this is about equal to FLOPs of GPT-3 Small 125M and 1/1000th of GPT-3) gets into 40s on MMLU and 70s on ARC-Easy, 20s on GSM8K, etc.
My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed). I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it. It is by no means finished, tuned or optimized (actually I think there's likely quite a bit of low-hanging fruit), but I think it's at a place where the overall skeleton is ok enough that it can go up on GitHub where all the parts of it can be improved. Link to repo and a detailed walkthrough of the nanochat speedrun is in the reply.
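The cost figures in the nanochat post above line up with a flat hourly rate for the 8XH100 node. A rough sketch of that arithmetic follows. The rate is an assumption inferred from "~$1000 (~41.6 hours of training)"; the post does not state a rate, and the names below are hypothetical.

```python
# Assumption: a flat node rate inferred from "~$1000 (~41.6 hours)".
# The post itself does not quote an hourly price.
NODE_RATE_USD_PER_HOUR = 1000 / 41.6  # roughly $24 per node-hour
GPUS_PER_NODE = 8  # "8XH100 node"


def run_cost_usd(hours: float) -> float:
    """Estimated cost of renting the whole node for `hours`."""
    return hours * NODE_RATE_USD_PER_HOUR


def per_gpu_rate_usd() -> float:
    """Implied hourly price of a single GPU under the same assumption."""
    return NODE_RATE_USD_PER_HOUR / GPUS_PER_NODE


if __name__ == "__main__":
    # Training durations mentioned in the post.
    for label, hours in [("4-hour run", 4), ("12-hour run", 12), ("41.6-hour run", 41.6)]:
        print(f"{label}: about ${run_cost_usd(hours):.0f}")
```

Under this assumed rate the 4-hour run comes to a little under $100, in line with the post's "~$100", and the implied price is about $3 per GPU-hour.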