Matthew Farrugia-Roberts

Tweets

Matthew Farrugia-Roberts @MatthewFdashR

May 12

Super excited to have been part of preparing and delivering teaching materials for the Singular Learning Theory day. Thanks to Iliad and my coauthors @FurmanZach and Kai Ogden for making this intensive happen.

Leon Lang

@Lang__Leon

May 12

We just released the full course materials of the Iliad Intensive — a month-long, full-time AI alignment course for mathematicians, physicists, and theoretical computer scientists. ~20 contributors, 19 modules, at a depth that doesn't exist elsewhere for most of these topics. 🧵

Matthew Farrugia-Roberts @MatthewFdashR

Jan 8

Have you ever tried to look inside the run folders that W&B makes for every deep learning experiment? Here's a deep dive about how I spent a few weeks freeing 118 GB of experimental archives from undocumented and corrupted binary .wandb files: far.in.net/free-wandb

Matthew Farrugia-Roberts @MatthewFdashR

6 Oct 2025

I'm thrilled to be a part of delivering the first course on AI Safety and Alignment at the University of Oxford! Next week is going to be intense and I'm looking forward to it!

Fazl Barez @FazlBarez

6 Oct 2025

🚨New AI Safety Course @aims_oxford! I’m thrilled to launch a new course called AI Safety & Alignment (AISAA) on the foundations & frontier research of making advanced AI systems safe and aligned at @UniofOxford. What to expect 👇 robots.ox.ac.uk/~fazl/aisaa/

Matthew Farrugia-Roberts @MatthewFdashR

8 Jul 2025

At least for me, the big-picture motivation behind our RLC paper is a research vision for scalable AI alignment via minimax regret autocurricula. Learn about the paper via co-author @Karim_abdelll: 🧵👉x.com/Karim_abdelll/status/1… Learn about why I think this is important work 🧵👇

Karim Abdel Sadek

@Karim_abdelll

8 Jul 2025

*New AI Alignment Paper* 🚨 Goal misgeneralization occurs when AI agents learn the wrong reward function, instead of the human's intended goal. 😇 We show that training with a minimax regret objective provably mitigates it, promoting safer and better-aligned RL policies!

Matthew Farrugia-Roberts @MatthewFdashR

8 Jul 2025

For more complex environments, we still need better UED methods. But UED is young! There are plenty of plausible directions for improving over the methods that have been proposed so far. The question is, is there enough room for improvement for this to help when it counts?

Matthew Farrugia-Roberts @MatthewFdashR

8 Jul 2025

Goal misgeneralisation remains an important risk model for future advanced AI systems. We should continue to research how neural networks choose between different solutions and leverage that understanding into methods of avoiding unintended and dangerous solutions in the future.

Matthew Farrugia-Roberts @MatthewFdashR

8 Jul 2025

Our paper, "Mitigating goal misgeneralization via minimax regret," will appear at RLC 2025! Congratulations to my co-authors @Karim_abdelll, @usmananwar391, @hannaherlebach, @casdewitt, @DavidSKrueger, and @MichaelD1729 🎉 Preprint out now: arxiv.org/abs/2507.03068 Thread soon!

Mitigating Goal Misgeneralization via Minimax Regret

Safe generalization in reinforcement learning requires not only that a learned policy acts capably in new situations, but also that it uses its capabilities towards the pursuit of the designer's...

arxiv.org

Matthew Farrugia-Roberts @MatthewFdashR

5 Jul 2025

There are many important social and ethical issues raised by today’s AI technologies. It's also true that as we project developments in AI technology into the future, we can foresee new and different ethical issues that might arise.

Matthew Farrugia-Roberts @MatthewFdashR

5 Jul 2025

Accordingly, last year, I was invited to give a guest lecture on ethical questions raised by potential future advancements in AI for the final week of @UniMelb's COMP90087 The Ethics of Artificial Intelligence. youtu.be/-DvCQAiX2QA

Ethics and the Future of Intelligence - Lecture at the University of...

COMP90087 The Ethics of Artificial Intelligence is a Master’s subje...

youtube.com
