Specialized AI for Every Job

Joined May 2025
Pinned Tweet
16 May 2025
Introducing CUB: Humanity's Last Exam for Computer and Browser Use Agents
32
38
249
114,620
29 May 2025
Browser agents use computers the same way humans do, unlocking powerful use cases for personal assistants, browsers, and enterprise workflows. After talking to 20 founders in the space, we're excited to put out the definitive market map for browser agents.
27
83
585
103,854
29 May 2025
Why Now? (4/4) AI-first browsers are poised to disrupt the massive web browser market, with highly anticipated releases like Comet from @perplexity_ai on the way. It's yet to be seen how Google integrates Project Mariner and other AI tools within Chrome.
1
1
17
2,701
29 May 2025
For a deeper dive, check out our blog with @SeanZCai: thetasoftware.ai/#/blog/mark…

1
1
16
2,261
Theta retweeted
16 May 2025
The AI labs need better evals and one of my favorite current YC batch companies just released one with a *lot* of headroom
16 May 2025
Introducing CUB: Humanity's Last Exam for Computer and Browser Use Agents
20
26
372
62,086
16 May 2025
Computer/browser use agents still have a long way to go for more complex, end-to-end workflows. Actual task completion is far below our reported numbers: we gave credit for partially correct solutions and reaching key checkpoints. In total, there were fewer than 10 instances across our thousands of runs where an agent successfully completed a full task.
1
19
3,022
16 May 2025
The Theta team started CUB as an internal evalset, but it quickly grew into a full-fledged benchmark over the past month. We're excited to test even more models and frameworks. For more on the benchmark, including examples and a full paper, check out our blog: thetasoftware.ai/#/blog/intr…

1
20
2,272
Theta retweeted
Theta (@trytheta) allows AI agents to learn from their mistakes in real time. Their memory layer has already improved the accuracy of OpenAI Operator by 43% with 7x fewer steps taken. ycombinator.com/launches/NTK… Congrats on the launch, @RayanGarg, @tsha444, and @_gurvir_!
21
41
380
52,579