Arrived at @CAISconf today and started the conference. I got to catch up with a few friends (Hi @LakshyAAAgrawal @melissapan @mertcemri @JonSaadFalcon and other folks!) at the Laude Lounge. It's always great to sync up on our research progress and exchange ideas about future directions. Huge thanks to @LaudeInstitute for organizing the Laude Lounge again!!!
Surprisingly, there were a few people who also work in the AI SRE space, and I got to talk with them and share what we do at SREGym. It's so encouraging that people were like "Yes please! Being able to simulate production failures is so cool! What failures do you simulate? How do frontier models perform on it?"
I got to meet my hero today: I talked to Dave Patterson about a historical project he worked on, called Recovery-Oriented Computing, which is a key inspiration for our work on Stratus (lnkd.in/gB9ecdMi) and a very personal inspiration for me to become a systems reliability researcher. The papers from the project made me believe that reliability is a design problem, not a {language, framework, architecture, etc.} problem. He shared a few insights into reliability research in the agentic era, which I will now be spending days and nights thinking about.
I will be in the Bay Area until Sunday! Drop me a DM if you want to chat or grab coffee together!