Robert Balicki (👀 @IsographLabs)

1,042 Photos and videos

Tweets

Pinned Tweet

Robert Balicki (👀 @IsographLabs)

@StatisticsFTW

3 Aug 2024

LFG! That's some nice and intuitive DevEx in @isographlabs

Robert Balicki (👀 @IsographLabs) retweeted

Joe Choi-Greene

@JoeChoiGreene

15h

Completely agree. There's an upfront time cost to get a codebase working with cloud agents, but it's easy and worth it. Cloud agents give you so much leverage and time back.

One cursor automation cut down our oncall workload by like 80%. PagerDuty triggers a cloud agent that checks aws logs, posthog, slack, linear, notion, and pylon to gather context and root cause. It generates a report, drafts what to tell affected users, and opens a PR when appropriate.

The PRs have a high acceptance rate. This wasn't always the case. At first it was like 50%, which I thought was really high, but makes sense since paging issues are usually pretty narrowly scoped. But the acceptance rate has gone up to like 80%-90% thanks to a weekly self-improvement automation, which we call the meta bot. The meta bot is also a cloud agent but instead it triggers weekly and is prompted to improve the oncall bot. It checks for recent corrective human actions in slack and rejected PRs. Then it opens a PR to improve the oncall bot's prompt and reports in slack what other context it needs in its setup. Most of the time it's just the prompt. Things like remembering to run /babysit to get all the review agents happy before asking for human attention.

I guess you could call this a self-improving loop. Not sure i really understand the term "loop" but to me it seems like it's just vaguepostism for "a cloud agent with a cron/webhook trigger mcp to complete a task, and another cloud agent to review and improve the first one."

This also accidentally doubled our eng capacity, kinda. I couldn't get the cursor automation to trigger only on pagerduty alerts, so I just set it to trigger on all new messages in our oncall slack channel. Within a week, non-eng teammates began asking questions, then reporting bugs, then kicking off implementations of small customer asks. Very nice to skip the whole triage/intake dance. I get why ppl like devin now.

Cloud agents are good at just "getting it" when they have a dev environment, strong backpressure/CI, and legible company context. I'm a little scared to ask what ppl mean when they say "loops" or whatever but as a dspy stan, self-improving process makes sense to me. So I added another weekly automation that looks back at all the recent automations that led to human follow up touches or rejected PRs and improves the oncall bot's runbooks, prompt, and reports on any missing context or tools. This has increased the success rate of fully-automated PRs over time. Is this a loop? Idk, it's just a cloud agent cron mcp in my mind, but who cares, it's f-ing dope!

Cursor cloud agents are almost perfect for making this stupid easy. Some small things can be better. Video recording is OK but still not great. It doesn't capture how a UI feels, so it's sometimes hard to accept a PR without first pulling it down to try it. I'd much rather use a local browser to access localhost:3000 running on the cloud agent's VM. It'd be sweet to use the cursor browser's component selector tool in the local agents window for a remote session. Actually I bet we can spin up quick session-specific links with something like tailscale or cloudflared or ngrok. Might try that out soon.

Which reminds me of another reason why cloud agents beat local parallel agent worktrees. No more container port conflicts, or having to remember which localhost ports map to which agent session.

Some types of work are still a better local experience than cloud, at least for me, esp. high touch exploratory work. Thankfully, Cursor makes it pretty seamless to move a session between local and cloud. I'd be surprised if any ADE isn't thinking about how to support sandboxed cloud agents asap. Every ADE needs to run or support cloud sandbox infra or they're gonna fall behind as people switch to cloud

Vincent van der Meulen

@vinvan

Jun 6

some reflections from solely using cloud agents this year:

1. every engineer should default to cloud. it completely changes how you view and use agents. if you run a company, it might be worth mandating everyone starts in cloud
2. cloud agent adoption has been much slower than i expected, e.g. looking at a ton of cursor profiles it's clear that majority-cloud usage is still rare
3. getting your dx cloud agent ready still requires creative jiu jitsu. dev infra docs could be much better: "this is how to make our stuff accessible to agents/parallelizable." luckily investments also benefit humans
4. it's still a PITA to setup & manage cloud envs across cursor/devin etc. but i assume it'll get bitter lessoned and we don't need conventions for setup scripts etc.
5. where are the labs?! would love to see codex et al. invest more in their cloud experience. i know they can do it :)
6. it's strange that cursor/devin's investment in mobile apps lags behind their investment in cloud agents. they should go hand in hand. the ability to start agents from slack mobile isn't enough!
7. a cloud agent spinning up other cloud agents (middle manager pattern) is goated. e.g. nice to go for a run, yap for twenty minutes, and end up with parallel agents. only devin supports this well
8. the uis of ADEs have somewhat adapted for cloud agents. but ui patterns for upcoming long running *and* proactive agents are understudied. super excited to see more experiments here (and will contribute)

overall: i freaking love cloud agents. you'll disappoint me personally if next month you still spin up more local agents than cloud. very grateful for cursor and devin for making this technology so easy to use!

Robert Balicki (👀 @IsographLabs)

@StatisticsFTW

14h

I am obsessed w/ gh dash. gh dash custom commands that send info to tmux panes (where there are long-lived agents) make for an amazing workflow:

- L => lgtm, marks PR as ready for review (makes a non-binding note to self that I self-approved the PR)
- C => gets input, instructs an agent to make said Changes, force push, etc. (it also perhaps rebases?)
- X => closes
- Q => asks question about PR

The other thing is that in gh dash, I have precisely defined searches, namely, for draft PRs that target master. Then, I have a Barnum workflow that lists the PRs that I have self-approved and lands them when they're ready. github.com/dlvhdr/gh-dash

GitHub - dlvhdr/gh-dash: A rich terminal UI for GitHub that doesn't break your flow.

Robert Balicki (👀 @IsographLabs)

@StatisticsFTW

14h

The close pr one also restacks the dependent PRs
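A minimal sketch of the "custom command sends info to a tmux pane" idea from the thread above, not the author's actual setup. The key letters come from the tweet; the prompt wording, the `ACTIONS` table, the function names, and the pane target are all hypothetical. The only external call assumed is tmux's `send-keys` with `-t` (target pane) and `-l` (send text literally). A gh dash custom keybinding could shell out to a script like this, passing the PR number.

```python
import subprocess

# Hypothetical key -> prompt templates mirroring the four actions in the thread.
ACTIONS = {
    "L": "Mark PR #{pr} as ready for review and note that it was self-approved.",
    "C": "On PR #{pr}, make these changes, then force push: {text}",
    "X": "Close PR #{pr}.",
    "Q": "Answer this question about PR #{pr}: {text}",
}


def build_send_keys(pane: str, key: str, pr: int, text: str = "") -> list:
    """Build the tmux argv that types an instruction into an agent's pane."""
    if key not in ACTIONS:
        raise ValueError(f"unknown action key: {key!r}")
    message = ACTIONS[key].format(pr=pr, text=text)
    # -l makes tmux treat the message as literal text, not as key names.
    return ["tmux", "send-keys", "-t", pane, "-l", message]


def dispatch(pane: str, key: str, pr: int, text: str = "", run=subprocess.run) -> None:
    """Type the instruction into the pane, then press Enter to submit it."""
    run(build_send_keys(pane, key, pr, text), check=True)
    # Enter is sent separately, without -l, so tmux interprets it as a key name.
    run(["tmux", "send-keys", "-t", pane, "Enter"], check=True)
```

For example, `dispatch("agents:0.1", "C", 42, "rename foo to bar")` would type the change request into pane `agents:0.1` (an assumed pane name) and submit it. Passing `run` as a parameter keeps the sketch testable without a running tmux server.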

Robert Balicki (👀 @IsographLabs) retweeted

James🗳@_fat_ugly_rat_

19h

Replying to @FlipperiPerri

Agriculture is a total flop industry, urban development is a million times more useful and productive.

Robert Balicki (👀 @IsographLabs) retweeted

Jiahan Chen

@jiahan_c

Jun 13

Replying to @isaac_mason_ @Zackary_Chapple @robpalmer2 @rspack_dev

We’re building a new Rust-based compiler & optimizer for TypeScript, using static type info from typescript-go. Think Google Closure Compiler for TS: smaller/faster apps & libs, targeting ~10–20% bundle size reduction. It’s still private, but we plan to open source it.

Robert Balicki (👀 @IsographLabs)

@StatisticsFTW

Jun 13

Like for me, sheer, utter laziness is Claude's biggest flaw

Robert Balicki (👀 @IsographLabs)

@StatisticsFTW

Jun 12

Quote tweeting something controversial with a picture (that contradicts the original tweet) with "to be clear, this is not true" is amazing. Everyone who looks at the combo will assume that you're agreeing with them. But the referent of "this" is ambiguous. 10/10

Robert Balicki (👀 @IsographLabs)

@StatisticsFTW

Jun 12

Now that token-minning is a thing, folks should check out Barnum barnum-circus.github.io/, which lets you do more with cheaper models!

Barnum - The programming language for orchestrating agents

A programming language for asynchronous programming geared towards orchestrating agents.

Robert Balicki (👀 @IsographLabs)

@StatisticsFTW

Jun 12

Refactoring (reordering) React contexts is hard, and IMO it was always obvious that this was a horrible architectural pattern. Bah

Robert Balicki (👀 @IsographLabs)

@StatisticsFTW

Jun 12

I should just throw Fable at this lol

Robert Balicki (👀 @IsographLabs)

@StatisticsFTW

Jun 12

It annoys me to no end that my job is now interacting with an extremely high functioning bot that says things like "...what's the atomic, guaranteed-safe commit? The honest answer: the hoist can't be that commit. But the unification can — and it's separable." GAH

Robert Balicki (👀 @IsographLabs)

@StatisticsFTW

Jun 12

Martin Sheen will play PG in a movie, I'm calling it

Robert Balicki (👀 @IsographLabs)

@StatisticsFTW

Jun 11

Google apparently knows something about me I don't know

Robert Balicki (👀 @IsographLabs)

@StatisticsFTW

Jun 11

When I take a selfie, I stare at myself, and the resulting photo always has me looking not at the camera. IMO when you take a selfie, right before it goes off, the screen should go black, except for an arrow that points you to where you need to stare lol

Robert Balicki (👀 @IsographLabs)

@StatisticsFTW

Jun 11

English is not a good programming language

Robert Balicki (👀 @IsographLabs)

@StatisticsFTW

Jun 10

This is extremely true! But you don't need to spend inordinate amounts of money on it if you use Barnum: barnum-circus.github.io/ The problem with other frameworks is that everything goes through agents, and tokens are expensive!

Walden

@walden_yan

Jun 10

My take 24 hours after Fable 5: Your organization will likely not scale with the exponential curve of AI.

I'll just come out and say it: This should be a wakeup call for engineering teams. Set up your cloud software factories. Now. Models can now fix impossible bugs, UI-test the hardest flows, write extremely good code, etc. I haven't opened Datadog manually as far as I can remember.

AI should be the first-line defense for bugs and feedback. Humans should only look at PRs after an AI has already reviewed them. AI should generate screen recordings of any PR before a human eye even reaches it. The agent should just prompt itself most of the time. Ex. (pictured) our ui feedback channel manages itself, creates tickets, assigns itself automatically.

You might also be worried about cost. Anthropic, OpenAI, and other labs will likely continue to put out bigger and more expensive models. But, we will also continue to get more capable small models. Not everything will need the smartest models. It's about having the organizational harness in place to continue taking advantage of this rising tide. Moreover, if you use Devin, we've already optimized our harness a bit, and Fable is actually only ~40% more expensive in practice (vs the 2x people assume). I'm honestly pleasantly surprised - it might be higher ROI than you think.

Anyway, if you take anything away: engineers shouldn't be manually picking up tickets, humans shouldn't be digging into logs themselves, rethink what you do with your time that shouldn't just be an AI. We need to rethink what humans spend their time doing.

Robert Balicki (👀 @IsographLabs) retweeted

Walden

@walden_yan

Jun 10

Cognition

@cognition

Jun 9

Claude Fable 5 is now available in Devin. Fable 5 earns the #1 spot on FrontierCode, our benchmark for real-world engineering tasks that grades mergeability and quality:

Robert Balicki (👀 @IsographLabs)

@StatisticsFTW

Jun 10

github.com/dlvhdr/gh-dash is a game changer. I can't believe I didn't know about it until now

Robert Balicki (👀 @IsographLabs)

@StatisticsFTW

Jun 10

You can say X was built in, you can say X was a built-in, and you can say X was built into, but you can't say X was a built-into.