Jackson Clark

Tweets

Pinned Tweet

May 11

Can AI resolve production incidents? I'm so excited to share that our preprint is out for SREGym, our benchmark for AI SRE agents. Preprint: arxiv.org/abs/2605.07161 Leaderboard: sregym.com Let's climb this hill! 💪

SREGym: A Live Benchmark for AI SRE Agents with High-Fidelity...

AI agents are increasingly used to diagnose and mitigate failures in production systems, known as agentic Site Reliability Engineering (SRE). Current SRE benchmarks are limited to oversimplistic...

arxiv.org

727

Jackson Clark @HacksonClark

Jun 2

I'm excited to share that I'll be at @bespokelabsai this summer building out some exciting RL environments! Huge thanks to @madiator and @AlexGDimakis for the opportunity. Excited to work with you all! :)

4,855

Jackson Clark retweeted

Yiming Su

@Yiming_Su3

May 28

The SREGym team (me, @HacksonClark, @lilygn6 in person, @SaadMRP remotely) will be giving a demo at CAIS @CAISconf TODAY at 4:30 PM in the San Jose room! Come chat with us about AI SRE, benchmarking, and more! Bonus: if you have a laptop, we have a self-contained demo/artifact for you to try without LLM credits here: github.com/SREGym/SREGym/tre….

942

Jackson Clark retweeted

Yiming Su

@Yiming_Su3

May 27

Arrived at @CAISconf today and started the conference. I got to catch up with a few friends (Hi @LakshyAAAgrawal @melissapan @mertcemri @JonSaadFalcon and other folks!) at the Laude Lounge. It's always been great to sync up on our research progress and exchange ideas about future directions. Huge thanks to @LaudeInstitute for organizing the Laude Lounge again!!! Surprisingly, a few other people here also work in the AI SRE space, and I got to talk to them and share what we do at SREGym. It's so encouraging that people were like "Yes please! Being able to simulate production failures is so cool! What failures do you simulate? How do frontier models perform on it?" I got to meet my hero today: I talked to Dave Patterson about a historical project he worked on, called Recovery-Oriented Computing, which is a key inspiration for our work on Stratus (lnkd.in/gB9ecdMi), and a very personal inspiration for me to become a systems reliability researcher. The papers in the project make me believe that reliability is a design problem, not a {language, framework, architecture, etc.} problem. He shared a few insights into reliability research in the agentic era, which I now have to spend days and nights thinking about. I will be in the Bay Area until Sunday! Drop me a DM if you want to chat/grab coffee together!

1,320

Jackson Clark @HacksonClark

Apr 21

Ever have to switch from Claude Code to Codex because you hit your usage limit? I've had to a lot lately, and one thing that annoyed me was losing my context on prior work. I built handoff to make switching context between coding agents easy: pypi.org/project/handoff-age… Try it out!

handoff-agent

Seamlessly switch between AI coding agents without losing context.

pypi.org

1,100

Jackson Clark retweeted

Yiming Su

@Yiming_Su3

Apr 15

We are happy to share that our demo paper on SREGym got accepted into ACM CAIS '26 as a System Demonstration submission! A huge thank-you to everyone on the team for making this happen. SREGym is an AI SRE benchmark consisting of challenging, high-fidelity SRE incidents to evaluate AI SRE solutions on their diagnosis and mitigation capabilities. We are very, _very_ happy that the reviewers see and appreciate the importance of the _engineering quality and usability_ of SREGym. We want our user experience to be as smooth as possible, and the reviewers confirm and agree with that (thanks!). Positive signals like this encourage us to keep pushing hard on SREGym and the AI SRE frontier. See you in San Jose!

1,318

Jackson Clark @HacksonClark

Apr 15

See you folks in San Jose!

ACM Conference on AI and Agentic Systems

@CAISconf

Apr 15

🚨‼️ DEMO NOTIFICATIONS SENT OUT! All accepted/rejected CAIS'26 demo notifications have gone out. We have 45(!!) accepted demos! Up next: we'll soon post demos to the CAIS website. Stay tuned!

570

Jackson Clark retweeted

Yiming Su

@Yiming_Su3

Apr 7

We are very happy that we got featured on Illinois CS News for SREGym! We are very grateful for being selected by the Laude Institute (@LaudeInstitute) to be a part of the Slingshot program, thank you so much Laude! SREGym aims to be _the_ benchmark for AI SRE agents, and we are actively working towards that. Let's keep pushing, @HacksonClark! Link to post here: siebelschool.illinois.edu/ne…

Laude Institute selects SREGym for Slingshot program

The Laude Institute, an organization focused on accelerating and funding impactful work in CS and AI, has selected a project led by Illinois CS students for the Slingshot program. Their project,...

siebelschool.illinois.edu

972

Jackson Clark retweeted

Sam Lambert

@samlambert

Mar 25

literally anyone can ship quickly if they sacrifice reliability. it’s not in any way impressive.

217

2,282

111,540

Jackson Clark retweeted

Tianyin Xu

@tianyin_xu

Mar 19

Very interesting work on the Self-Defining Operator (SDO), which develops long-term memory to resolve operational issues faster. SDO is one instance of Self-Defining Systems (SDS), led by Vic, @ratulm and colleagues at @uwcse. Love the system-centric vision!

ACM SIGOPS @ACMSIGOPS

Mar 19

New SIGOPS Blog -- "The Long Game: How Agents That Remember Resolve Operational Issues Faster" by Shihang (Vic) Li, Thomas Anderson, Ratul Mahajan, Simon Peter, Luke Zettlemoyer, and the SDS team. sigops.org/2026/the-long-gam…

2,570

Jackson Clark retweeted

Vic Shihang Li

@sudopowr

Mar 19

Early results on microservice benchmarks: architectural context cuts deployment iterations by 2.5x. Seeding the knowledge base with prior post-mortems reduces MTTR by 38% on SREGym.

952

Jackson Clark retweeted

Laude Institute

@LaudeInstitute

Feb 26

SREGym/@HacksonClark, Yiming Su (@UofIllinois) - SRE is where agentic AI gets high-stakes fast: one wrong action can cascade an outage or corrupt data. SREGym introduces safety-first guardrails and realistic benchmarks so AI agents managing production infrastructure are something operators can actually trust.

3,428

Jackson Clark retweeted

Tianyin Xu

@tianyin_xu

Feb 25

Jackson (@HacksonClark) did a qual practice at the Systems Research Seminar (systems-seminar-uiuc.github.…) on his assigned paper, "Intent-Driven Network Management with Multi-Agent LLMs: The Confucius Framework", which is closely related to his research. Great presentation and discussion! We wish him the best for his qual exam.

1,036

Jackson Clark retweeted

NASA

@NASA

Jan 29

For the crew of Artemis II, their mission will soon be reality. Learn about the challenges they face and the teamwork required to fly around the Moon. Episode 2 of Moonbound is live—and free to watch on NASA : go.nasa.gov/4rg0To1

0:44

629

2,602

14,685

3,769,256

Jackson Clark retweeted

Yinfang @yinfang_chen

Jan 25

I am on the job market - Very happy to give a talk at the UIUC SysNet seminar! Had an amazing discussion that reinforced my belief: Code generation is maturing; beyond it, the bigger challenge is to manage running systems reliably and efficiently. 🚀

7,792

Jackson Clark retweeted

Mike A. Merrill

@Mike_A_Merrill

Jan 22

The Terminal-Bench paper is here! Read it to learn where frontier models still fail and the secrets of how we sourced hundreds of high-quality environments from our open source community. 🧵

102

458

103,745

Jackson Clark retweeted

Laude Institute

@LaudeInstitute

19 Dec 2025

Across three days at NeurIPS earlier this month, Laude Lounge became a space for open, working conversations about the future of open frontier AI. We just published a complete digital record of the Lounge, including full-length Laudecast interviews (featuring @JeffDean @istoica05 @YejinChoinka @ml_angelopoulos @Yoshua_Bengio @aza @robertnishihara ), demos from emerging research projects (featuring @lisabdunlap @dilarafsoylu @jyangballin @HacksonClark @ryanmart3n @alexgshaw @wen_kaiyue @HannaHajishirzi @etash_guha @ThomsonYenTY @LakshyAAAgrawal @tyler_griggs_ @NeginRaoof_), panel discussions, photographs, and reflections on what researchers and scientists were actually debating off the main conference floor. Plus, there's a really sweet playlist to check out, if you're hoping to relive the Lounge vibes (or experience them for the first time) 😎 ⬇️ laude.org/updates/neurips-20…

4,827

Jackson Clark retweeted

Lisa Dunlap

@lisabdunlap

15 Dec 2025

🧵Tired of scrolling through your horribly long model traces in VSCode to figure out why your model failed? We made StringSight to fix this: an automated pipeline for analyzing your model outputs at scale. ➡️Demo: stringsight.com ➡️Blog: blog.stringsight.com

1:11

28,246

Jackson Clark retweeted

Laude Institute

@LaudeInstitute

12 Dec 2025

At @NeurIPSConf last week, @AndyKonwinski sat down with researchers and system builders for unscripted conversations about the frontier of AI for a very special Laudecast. This supercut brings together discussions with @JeffDean @YejinChoinka @istoica05 @Yoshua_Bengio @ml_angelopoulos @HannaHajishirzi @Thom_Wolf @aza @robertnishihara @lateinteraction and so many others from three days across the street and above the conference. Here is the full release. What’s missing from the conversation?

1:35:25

7,316

Jackson Clark @HacksonClark

5 Dec 2025

We will be presenting our poster on STRATUS, A Multi-agent System for Autonomous Reliability Engineering of Modern Clouds at NeurIPS tomorrow from 4:30pm-7:30pm. Excited to chat about agents and benchmarks for SRE! neurips.cc/virtual/2025/loc/…

135