Joined August 2015
Anna Mitchell retweeted
I have mentioned this before, but every industry is now going to have to figure out how to value "task complexity" in the age of AI. Healthcare has been doing this for decades with the RVU system, which similarly uses survey methodologies to assign complexity to tasks (and has its own drawbacks). I'm interested to see how different industries come up with their own methodologies for this.
Measuring someone's productivity by their token usage is a horrible idea. Giving everyone the same fixed token budget isn't much better. So what's the right way to roll out AI across your org? We built a system to measure how many productive engineering hours every Devin task is worth, validated against a dataset of real engineers’ time estimates. The goal is to answer the fundamental question that companies are grappling with: how much real value are you getting from each of your agent sessions? On top of that, we're giving an AI productivity guarantee! Now, if Devin delivers less engineering value than you're paying for, we fund your usage until it does. The whole industry needs to move from measuring activity to measuring output. We hope to see more AI companies taking this approach.
Anna Mitchell retweeted
Jun 4
Finally! the first eval ship from cog!!!!!!!!!! 👼🏼

To contextualize: @METR_Evals cap out at ~16 hours. Cog has private enterprise evals up to 100hrs, and is confident enough to put a financial guarantee on it 🤯

METR dataset: ML eng, GPU kernels, cybersecurity
> "METR (2026) used a combination of GPT-4o and GPT-5 to estimate the human-equivalent times from compressed Claude Code transcripts. These transcripts were collected from 7 METR technical staff on 34 sessions labeled on human ground truth".
r_log of 0.83

Cog dataset: real life java/typescript/python/c# feature dev, bugfixes, migrations
> "We collected a ground-truth dataset by asking Devin users to review recent representative sessions, and estimate how long each completed session would have taken without Devin. Our dataset consists of 258 sessions from 126 users across a diverse set of enterprise customers."
r_log of 0.74 on held-out set

this is pioneering real world evals work and part 1 of a broader frontier code evals drop that I'm really looking forward to writing up. huge kudos to @annarmitchell and @ryanbai1412 for leading the unglamorous last mile data collection!!
AI should earn its keep. Introducing the AI Productivity Guarantee. If Devin delivers less engineering value than you’re paying for, Cognition will fund your usage until it does, up to $10 million. It’s time for the AI industry to stop maximizing tokens and start maximizing productive output.
Anna Mitchell retweeted
LLMs when I ask them for a plan: "this 30-min refactor will take 2 days"
LLMs when I give them the full session trace and hill climb the prompt: r_log of 0.74
context and prompting is everything
Everyone's talking about AI waste but almost nobody is actually trying to measure the value of AI. The answer isn't blindly cutting spend - it's improving at evaluating what AI is actually delivering. Customers kept asking about this and we wanted to get more rigorous, so we built a methodology to estimate how many productive human engineering hours each Devin session represents. Productive hours are convertible to dollars, which gets you closer to a $ impact for the agent's work. After validating on a dataset of real engineers' time estimates, we're now offering a financial guarantee to back up the value of Devin's work.
Anna Mitchell retweeted
What’s not shown is the 100-person Slack channel coordinating this stunt across offices on 3 continents, to surprise one person. Love this team
Ryan @ryanbai1412 is known for his iconic red hoodie, so for his birthday, everyone at @cognition got their own.
Anna Mitchell retweeted
We've raised $1B at a $26B valuation.

@cognition is firing on all cylinders, and we are constantly building the future of software engineering. Things are moving so fast that you might have missed some of our latest developments:
- Having an interface that easily manages agents running in parallel
- Choosing the right model for the task, both in accuracy and cost efficiency
- Listening to Slack or other data streams and automatically triaging
- Using auto-fix, bug catcher, and Devin Review to add more layers of security to your SDLC
- Having agents automatically fix issues like bugs and vulnerabilities

The secret isn't just the models, it's a robust infra platform that enables your org to scale to thousands of agents in the cloud, constantly running and accelerating your roadmap while automatically taking care of tech debt and other issues.

It's also a team that gives you outcomes, not dev tools. We work closely with your team on figuring out the biggest problems and solving them together. We genuinely want to help you with your AI transformation in a world where AI form factors change every 3-6 months.

I'm so proud of the team here, and grateful to be along for the journey. We have many more announcements to come, but we can celebrate today's milestone.
1/ We’ve raised over $1B at a $26B valuation, led by @Lux_Capital, @generalcatalyst, and @8vc. Our enterprise usage has grown >10x since the start of this year, and our run-rate revenue grew to $492M. We launched Devin two years ago as the first AI software engineer. Since then, cloud agents have gone from niche to mainstream, and today they are the fastest growing way to create software.
What I love about Cognition is that I'm outside of the Silicon Valley bubble. We work with the organizations that the real world runs on: Citi, Mercedes-Benz, the US Army and Navy. As a marketer, telling the stories of how AI software engineering improves the real world is really motivating.
Anna Mitchell retweeted
Reintroducing @Cognition.
Anna Mitchell retweeted
The Ruddy Kingfisher. What stunning colours.
Anna Mitchell retweeted
Beau Rothrock had been at @AngelList for two months when he walked into a Redshift-to-Snowflake migration in deep trouble, already two months behind schedule. He had a 5-week window to migrate all 14,000 dashboards and reports AngelList runs on. He could've asked for three more engineers and four more months. Instead, he turned to Devin.
Anna Mitchell retweeted
the governmental UFO media drop is the best thing to happen to graphic designers since helvetica
Anna Mitchell retweeted
Security remediation is an engineering capacity problem. AI has collapsed the time to exploit, but defensive tools haven’t kept up. Today we’re introducing Devin for Security: a set of workflows for reducing security debt, securing every release, and accelerating response.
hard
we just dropped a huge @cognition x @MercedesBenz partnership so naturally i made both our teams a celebratory t-shirt
Anna Mitchell retweeted
Devining at the SF Symphony (respectfully, during intermission)
Incredibly excited to partner with one of the most technologically innovative companies in history, Mercedes-Benz.
Cognition is partnering with @MercedesBenz to accelerate software engineering across their global engineering teams, representing one of the most extensive deployments of AI software engineering in the automotive industry to date. @ScottWu46 sat down with Katrin Lehmann, Mercedes-Benz CIO, to discuss the work:
Anna Mitchell retweeted
gpt-5.5 unlocks a new level of possibility:
GPT-5.5 is now available in Devin as an Agent Preview! GPT-5.5 has set a new bar for what's possible with Devin. It runs longer and more autonomously than any GPT model we've tested, surfacing bugs no other model can catch, and investigating and fixing production issues end-to-end.