I'll be at the @RealAAAI Conference in Philadelphia this week, where I am part of two accepted papers:
1. Quantifying Misalignment Between Agents: Towards a Sociotechnical Understanding of Alignment, with @AidanKierans, Hananel Hazan, and @ShirKi. In this work, we introduce a novel mathematical model to measure misalignment between multiple human and AI agents across various problem domains, moving beyond single-agent or monolithic approaches to alignment. Through simulations and case studies we demonstrate how our model captures nuanced aspects of misalignment in complex sociotechnical environments, providing enhanced explanatory power for real-world scenarios where agents may hold conflicting goals.
Come see our poster during the AI Alignment Track on Friday the 28th at 12:30pm!
2. To Err is AI: A Case Study Informing LLM Flaw Reporting Practices, with @seanmcgregor, @ShayneRedford, @comathematician, and others! This paper documents lessons learned from a bug bounty event at DEF CON 2024 where 495 hackers tested the Open Language Model (OLMo) for flaws, revealing challenges in AI safety reporting processes. Through real-time adjudication of 200 submissions, we identify key insights for effective flaw reporting programs, including the need for specialized tooling, clear documentation practices, and proper adjudication expertise, demonstrating how systematic evaluation and coordinated, structured flaw reporting of AI systems can help prevent real-world harms.
See this work presented at IAAI in the "AI Safety, Reliability, and Incident Management" session on Thursday the 27th at 2:30pm!
If you're around and want to chat, hit me up! Let's talk AI, Disclosures, Agents, and more!