Joined December 2022
I'm excited to share that I'll be at @bespokelabsai this summer building out some RL environments! Huge thanks to @madiator and @AlexGDimakis for the opportunity. Looking forward to working with you all! :)
Jackson Clark retweeted
The SREGym team (me, @HacksonClark, @lilygn6 in person, @SaadMRP remotely) will be giving a demo at CAIS @CAISconf TODAY at 4:30 PM in the San Jose room! Come chat with us about AI SRE, benchmarking, and more! Bonus: if you have a laptop, we have a self-contained demo/artifact you can try without LLM credits here: github.com/SREGym/SREGym/tre….
Jackson Clark retweeted
Arrived at @CAISconf today and started the conference. I got to catch up with a few friends (Hi @LakshyAAAgrawal @melissapan @mertcemri @JonSaadFalcon and other folks!) at the Laude Lounge. It's always great to sync up on our research progress and exchange ideas about future directions. Huge thanks to @LaudeInstitute for organizing the Laude Lounge again!!! Surprisingly, a few people here also work in the AI SRE space, and I got to talk with them and share what we do at SREGym. It's so encouraging that people were like "Yes please! Being able to simulate production failures is so cool! What failures do you simulate? How do frontier models perform on it?" I got to meet my hero today: I talked to Dave Patterson about a historical project he worked on, called Recovery-Oriented Computing, which is a key inspiration for our work on Stratus (lnkd.in/gB9ecdMi) and a very personal inspiration for me to be a systems reliability researcher. The papers from that project make me believe that reliability is a design problem, not a {language, framework, architecture, etc.} problem. He shared a few insights into reliability research in the agentic era, which I now have to spend days and nights thinking about. I will be in the Bay Area until Sunday! Drop me a DM if you want to chat or grab coffee!
Ever had to switch from Claude Code to Codex because you hit your usage limit? I've had to a lot lately, but one thing that annoyed me was losing the context on my prior work. I built handoff to make context switching between coding agents easy: pypi.org/project/handoff-age… Try it out!
Jackson Clark retweeted
We are happy to share that our demo paper on SREGym was accepted to ACM CAIS '26 as a System Demonstration! A huge thank-you to everyone on the team for making this happen. SREGym is an AI SRE benchmark consisting of challenging, high-fidelity SRE incidents for evaluating the diagnosis and mitigation capabilities of AI SRE solutions. We are very, _very_ happy that the reviewers recognized the importance of the _engineering quality and usability_ of SREGym. We want our user experience to be as smooth as possible, and the reviewers agreed (thanks!). Positive signals like this encourage us to keep pushing hard on SREGym and the AI SRE frontier. See you in San Jose!
See you folks in San Jose!
šŸšØā€¼ļø DEMO NOTIFICATIONS SENT OUT! All accepted/rejected CAIS'26 demo notifications have gone out We have 45(!!) accepted demos! Up next: we'll soon post demos to the CAIS website. Stay tuned!
Jackson Clark retweeted
We are very happy to be featured on Illinois CS News for SREGym! We are very grateful to have been selected by the Laude Institute (@LaudeInstitute) to be part of the Slingshot program; thank you so much, Laude! SREGym aims to be _the_ benchmark for AI SRE agents, and we are actively working towards that. Let's keep pushing, @HacksonClark! Link to post: siebelschool.illinois.edu/ne…
Jackson Clark retweeted
literally anyone can ship quickly if they sacrifice reliability. it’s not in any way impressive.
Jackson Clark retweeted
Very interesting work on Self-Defining Operator, which develops long-term memory to resolve operational issues faster. SDO is one instance of Self-Defining Systems (SDS), led by Vic, @ratulm and colleagues at @uwcse. Love the system-centric vision!
New SIGOPS Blog -- "The Long Game: How Agents That Remember Resolve Operational Issues Faster" by Shihang (Vic) Li, Thomas Anderson, Ratul Mahajan, Simon Peter, Luke Zettlemoyer, and the SDS team. sigops.org/2026/the-long-gam…
Jackson Clark retweeted
Early results on microservice benchmarks: architectural context cuts deployment iterations by 2.5x. Seeding the knowledge base with prior post-mortems reduces MTTR by 38% on SREGym.
Jackson Clark retweeted
SREGym/@HacksonClark, Yiming Su (@UofIllinois ) - SRE is where agentic AI gets high-stakes fast: one wrong action can cascade an outage or corrupt data. SREGym introduces safety-first guardrails and realistic benchmarks so AI agents managing production infrastructure are something operators can actually trust.
Jackson Clark retweeted
Jackson (@HacksonClark) did a qual practice at the Systems Research Seminar (systems-seminar-uiuc.github.…) on his assigned paper, "Intent-Driven Network Management with Multi-Agent LLMs: The Confucius Framework" which is closely related to his research. Great presentation and discussion! We wish him the best for his qual exam.
Jackson Clark retweeted
Jan 29
For the crew of Artemis II, their mission will soon be reality. Learn about the challenges they face and the teamwork required to fly around the Moon. Episode 2 of Moonbound is live—and free to watch on NASA : go.nasa.gov/4rg0To1
Jackson Clark retweeted
I am on the job market - Very happy to give a talk at the UIUC SysNet seminar! Had an amazing discussion that reinforced my belief: Code generation is maturing; beyond it, the bigger challenge is to manage running systems reliably and efficiently. 🚀
Jackson Clark retweeted
The Terminal-Bench paper is here! Read it to learn where frontier models still fail and the secrets of how we sourced hundreds of high quality environments from our open source community. 🧵
Jackson Clark retweeted
Across three days at NeurIPS earlier this month, Laude Lounge became a space for open, working conversations about the future of open frontier AI. We just published a complete digital record of the Lounge, including full-length Laudecast interviews (featuring @JeffDean @istoica05 @YejinChoinka @ml_angelopoulos @Yoshua_Bengio @aza @robertnishihara ), demos from emerging research projects (featuring @lisabdunlap @dilarafsoylu @jyangballin @HacksonClark @ryanmart3n @alexgshaw @wen_kaiyue @HannaHajishirzi @etash_guha @ThomsonYenTY @LakshyAAAgrawal @tyler_griggs_ @NeginRaoof_), panel discussions, photographs, and reflections on what researchers and scientists were actually debating off the main conference floor. Plus, there's a really sweet playlist to check out, if you're hoping to relive the Lounge vibes (or experience them for the first time) 😎 ⬇️ laude.org/updates/neurips-20…
Jackson Clark retweeted
🧵Tired of scrolling through your horribly long model traces in VSCode to figure out why your model failed? We made StringSight to fix this: an automated pipeline for analyzing your model outputs at scale. ➔ Demo: stringsight.com ➔ Blog: blog.stringsight.com
Jackson Clark retweeted
At @NeurIPSConf last week, @AndyKonwinski sat down with researchers and system builders for unscripted conversations about the frontier of AI for a very special Laudecast. This supercut brings together discussions with @JeffDean @YejinChoinka @istoica05 @Yoshua_Bengio @ml_angelopoulos @HannaHajishirzi @Thom_Wolf @aza @robertnishihara @lateinteraction and so many others from three days across the street and above the conference. Here is the full release. What’s missing from the conversation?
We will be presenting our poster on STRATUS: A Multi-agent System for Autonomous Reliability Engineering of Modern Clouds at NeurIPS tomorrow from 4:30pm-7:30pm. Excited to chat about agents and benchmarks for SRE! neurips.cc/virtual/2025/loc/…