Kevin Gu

Tweets

Pinned Tweet

Kevin Gu

@kevingu

Apr 9

introducing Dex, the first agent system with full operational context and a self-updating knowledge base
karpathy's llm knowledge base on steroids
Dex ingests raw events from every app in your workspace. every slack message, email, meeting, browser session, task update compounds into one living knowledge base.
background agents continuously monitor and enhance it while you sleep so your agents get smarter every day
try now at joindex [dot] com

Regina Lin

@reggitales

Apr 9

Introducing Dex: the self-driving workspace for operators. Dex is the first agent system with full operational context and a self-updating knowledge base. Every datapoint from your workspace is ingested, synced, and structured into compounding context for agents to take action. Comment "DEX" or tag @dexbythirdlayer for access. First 1,000 sign-ups get 7 days free. After, join our rolling waitlist. Sign up at joindex [dot] com for a fun surprise. How dex works (threads)

110

19,797

Kevin Gu retweeted

Regina Lin

@reggitales

Jun 5

We're hiring a multi-hyphenate product designer with impeccable taste and technical depth. Current UI models no longer work for AI agents. Every interface was designed for a world where humans did the execution. Now that agents handle the execution, interfaces serve as channels for intent and negotiation. Nobody’s designed these primitives yet. If you want to be at the frontier of defining what this next generation of UI actually looks like, DM me with your work.

2,140

Kevin Gu retweeted

Dex

@dexbythirdlayer

Jun 4

One of our users hit inbox zero and cleared 1,900 emails in a single session.
We shipped a full gmail suite that:
- scans your inbox with a personalized triage
- drafts replies in your voice
- auto syncs your CRM
- recovers old leads
- personalizes outreach at scale
- creates action items sent straight to your Slack.
Runs every morning or on-demand. You never have to open your inbox.
New features launching every couple days.

1,107

Kevin Gu retweeted

Dex

@dexbythirdlayer

Jun 3

four weeks before YC demo day, we killed what we'd been building. it had a 50% CMO response rate, but we couldn’t stop thinking about an idea that nobody had a name for yet… 10,000 downloads later, that idea is Dex.

Notable Capital

@notablecap

Jun 3

On the 1st ep of First Commit, @reggitales explains why she dropped out of Harvard, left a YC batch mid-pivot, and is now building Dex — a browser agent helping knowledge workers offload the manual layer of their day.

1,893

Kevin Gu

@kevingu

May 20

speaking at a panel moderated by @BainCapVC this thursday on predictive proactive interfaces with @kneureither, @justoutquan, and Jenning Chen. rsvp in comments

361

Kevin Gu

@kevingu

May 20

luma.com/vs9pfati

Intent: Proactive & Predictive Interfaces · Luma

Most AI products today begin with the same primitive: a chat box. The next generation of interfaces won’t just respond — they’ll anticipate, adapt, and…

luma.com

161

Kevin Gu retweeted

Alex Lieberman

@businessbarista

Apr 16

Someone is going to build a world-class “Brain” for enterprises & make a stupid amount of money. Why? As @da_fant said, “coding w ai is solved bc all context is in the git repo. knowledge work is difficult bc context is spread out. an ai system that creates a git repo w all context for a knowledge worker will be able to 100% automate the work.”
When companies talk about being data ready for AI, this is what they’re implicitly saying. Engineering has been prepared for this moment for a long time because of the deterministic nature of code, the centralization/versioning of data (read: GitHub), and AI tools that are largely built by engineers for engineers. But for the rest of white collar work, there’s a TON of catching up to do to properly harness the power of the technology.
The big challenge here, and the reason no one has truly cracked the code for "an ai system that creates a git repo w all context for a knowledge worker", is that unlike code, most knowledge is 1) distributed, 2) unstructured, and 3) unverifiable.
It's distributed: transcripts live in Granola. Documents in Notion. Customer Data in Hubspot. ERP. Emails. Slack messages. Random spreadsheets. SOP docs. Etc. Etc. Building an ingestion engine that connects to all of your disparate data sources and auto-updates based on the shelf-life of the data is the first, and frankly, easiest step of the process.
Next, it's unstructured: let's say I want to create a proposal for a potential client. To nail the proposal, I want it to pull important information from a variety of sources. The specific asks & background from our initial sales call. Previous proposals to anchor ourselves to a proven format. And completed sprint boards from Linear, so the pricing & timeline in the document is grounded in truth. Whether it's a thoughtful filesystem (a la Obsidian) or an OpenClaw-esque memory structure, the brain needs to be great at self-organizing in a thoughtful schema. This is very hard, especially if you want to build a generalizable brain that can be shaped to an array of different enterprises.
And finally, most knowledge is unverifiable: writing a function, running a unit test, and seeing if the code works is easy. It works or it doesn't. Using AI to accelerate your content creation process is highly subjective. What is a good/bad idea? Is the content in your voice or not? Does it feel like slop or novel? Answering these questions is both difficult and non-verifiable. That same system described above doesn't just have to be great at organizing & forming coherent relationships, but it also has to be great at self-improving based on feedback from the user. Memory systems (like those introduced by OpenClaw) are great to a point, but as you scale the corpus of data within your company's brain, things like compaction and cleaning become wildly important to avoid the needle in the haystack problem.
Someone is going to figure out how to solve this problem, and when they do, not only will they make a shit ton of money, but they'll be robinhood for knowledge workers, enabling non-engineers to enjoy the sort of leverage that only technical folks have felt for the last few years.
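A minimal sketch of the "auto-updates based on the shelf-life of the data" step described in the post above. Everything here is an assumption for illustration: the `Source` type, its `shelf_life` and `last_synced` fields, and the example intervals are hypothetical and not taken from any product mentioned in this feed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Source:
    """Hypothetical record for one connected data source."""
    name: str
    shelf_life: timedelta   # how long a synced copy is assumed to stay fresh
    last_synced: datetime   # when this source was last ingested


def stale_sources(sources, now):
    """Return the names of sources whose last sync is older than their shelf life."""
    return [s.name for s in sources if now - s.last_synced >= s.shelf_life]


# Illustrative values only: chat goes stale in minutes, SOP docs in days.
now = datetime(2025, 1, 1, 12, 0)
sources = [
    Source("slack", timedelta(minutes=5), datetime(2025, 1, 1, 11, 50)),
    Source("sop_docs", timedelta(days=7), datetime(2024, 12, 30, 9, 0)),
]
to_refresh = stale_sources(sources, now)
```

A scheduler could call `stale_sources` on a timer and re-ingest only what it returns, so fast-changing sources are refreshed often and slow-changing ones rarely.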

156

914

206,273

Kevin Gu

@kevingu

Apr 9

technical blog post coming soon

658

Kevin Gu retweeted

Regina Lin

@reggitales

Apr 9

278

67,728

Kevin Gu retweeted

Ejaaz

@cryptopunk7213

Apr 4

whoa this is actually fucking sick, a self-improving ai you can use yourself right now (for any task)
dude created an ai agent that autonomously upgraded itself to #1 across multiple domains in < 24 hours…. then open sourced the entire thing
but here’s why it actually works:
- agents fucking suck, not because of the model, because of their harness (tools, system prompts etc)
- Auto agent creates a Meta agent that tweaks your agent's harness, runs tests, improves it again, until it’s #1 at its goal
- best part: you can set this up for ANY task. in this article he uses it for terminal bench (code) and spreadsheets (financial modelling) - it topped rankings for both :)
- secret sauce: he used THE SAME MODEL to evaluate the agent - claude managing claude = better understanding of why it failed and how to improve it
humans were the fucking bottleneck and this not only saves you a load of time, it’s just a better way to train them for domain specific tasks
seriously check it out

Kevin Gu

@kevingu

Apr 2

x.com/i/article/203980704074…

850

230,229

Kevin Gu retweeted

Zayne Sprague

@ZayneSprague

Apr 4

Neat project post. This line of work is making me think of harnesses as transient reasoning scaffolds. Reminds me of earlier “XoT as intermediate structure” work (SatLM, Program-of-Thought, Tree-of-Thought, etc.), but now in an agentic regime with much more room for optimization. I keep wondering about this idea of meta-optimization: how broad should we expect that optimization to get? Should a meta-agent mostly do local search over prompts/tools/hyperparameters, or should it sometimes pursue riskier, longer-horizon interventions? Wilder example, but if we pointed something like AutoResearch at "coding", should we expect it to rediscover higher-level workflows or abstractions akin to Claude Code? My guess is that “meta” agents, or even deeper recursive optimizers, will tend to favor local improvements over sweeping pipeline redesigns. Very targeted and precise changes, even when broader features might help in the long run. Measuring that “meta-scope”, where do agents spend their time optimizing these harnesses, seems worth studying.

Kevin Gu

@kevingu

Apr 2

x.com/i/article/203980704074…

5,351

Kevin Gu retweeted

Kevin Gu

@kevingu

Apr 2

repo available here: github.com/kevinrgu/autoagen…

GitHub - kevinrgu/autoagent: autonomous harness engineering

autonomous harness engineering. Contribute to kevinrgu/autoagent development by creating an account on GitHub.

github.com

136

31,789

Kevin Gu retweeted

0xMarioNawfal

@RoundtableSpace

Apr 3

Send this article to your agent and thank me later x.com/kevingu/status/2039843…

Kevin Gu

@kevingu

Apr 2

x.com/i/article/203980704074…

252

3,074

846,882

Kevin Gu retweeted

tim ganiev

@postimortem

Apr 3

imo the meta-optimization emergent features are the future of agent engineering. you don't know which concepts the model actually operates on (reminds me of polysemantic neurons), so it ends up improving its harness in ways you'd never hand-engineer pretty cool huh

Kevin Gu

@kevingu

Apr 2

x.com/i/article/203980704074…

2,218

Kevin Gu retweeted

Regina Lin

@reggitales

Apr 3

point AutoAgent at a task domain with evals. 24 hours later it has domain-specific tooling, verification loops, and orchestration logic. all discovered autonomously.

Kevin Gu

@kevingu

Apr 2

x.com/i/article/203980704074…

2,956

Kevin Gu

@kevingu

Apr 3

introducing AutoAgent: an open source library for autonomously improving an agent on any domain
we let an agent optimize for 24 hours. it hit #1 on SpreadsheetBench (96.5%) and #1 GPT-5 score on TerminalBench (55.1%).
every other entry was human-engineered. ours wasn't.

Kevin Gu

@kevingu

Apr 2

x.com/i/article/203980704074…

633

130,478

Kevin Gu

@kevingu

Apr 3

what we learned:
- one agent improving itself doesn't work. being good at a domain ≠ being good at improving at a domain
- traces are everything. scores without trajectories killed improvement rate. meta-agent needs proper tooling to test hypotheses and reason through failures
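The two lessons above (a separate meta-agent, and traces rather than bare scores) can be sketched as a toy loop. This is not AutoAgent's code or API: `Harness`, `Trace`, `run_task_agent`, `propose_edit`, and `optimize` are hypothetical names, and the "agent" is a stub that passes a task only when the harness lists the tool that task needs. A real system would call a model in both roles.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Hypothetical per-task trajectory: what was attempted and why it failed."""
    task: str
    passed: bool
    log: str


@dataclass
class Harness:
    """Hypothetical harness: the parts a meta-agent is allowed to edit."""
    system_prompt: str
    tools: list = field(default_factory=list)


def run_task_agent(harness, tasks):
    """Stub task agent: a task passes only if its required tool is in the harness."""
    traces = []
    for task, needed_tool in tasks:
        passed = needed_tool in harness.tools
        log = "ok" if passed else f"missing tool: {needed_tool}"
        traces.append(Trace(task, passed, log))
    score = sum(t.passed for t in traces) / len(traces)
    return score, traces


def propose_edit(harness, traces):
    """Stub meta-agent: reads the first failing trace and proposes one harness change."""
    for t in traces:
        if not t.passed and t.log.startswith("missing tool: "):
            tool = t.log[len("missing tool: "):]
            return Harness(harness.system_prompt, harness.tools + [tool])
    return harness  # nothing actionable found in the traces


def optimize(harness, tasks, max_rounds=10):
    """Keep a proposed edit only if it raises the eval score; stop when it stops helping."""
    best_score, traces = run_task_agent(harness, tasks)
    for _ in range(max_rounds):
        if best_score == 1.0:
            break
        candidate = propose_edit(harness, traces)
        score, candidate_traces = run_task_agent(candidate, tasks)
        if score <= best_score:
            break
        harness, best_score, traces = candidate, score, candidate_traces
    return harness, best_score
```

Note that `propose_edit` only works because it receives the traces. Given the score alone, it would have no way to tell which tool was missing, which is the point the second bullet makes.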

1,641

Kevin Gu

@kevingu

Apr 3

as agents pass 99th percentile human performance, our intuitions about harness design become the wrong prior. like AlphaZero, they should discover from first principles. AutoAgent is open source: github.com/kevinrgu/autoagen…

1,732