autoresearch is fascinating. For about a week I had an agent autonomously edit code, run training, monitor metrics, and iterate on Mahjong AI reinforcement learning.
It explored PPO hyperparameters, KL penalties, reward structures, and opponent selection, and rolled back to stable checkpoints whenever metrics degraded.
It couldn't fix a fundamentally misaligned reward signal, but watching it map out the failure landscape was eye-opening.
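
For anyone curious what the try-a-change, watch-the-metric, roll-back-if-worse loop looks like in miniature, here is a rough sketch. It is not the actual autoresearch code: `research_loop`, `toy_propose`, `toy_eval`, and the `kl_coef` key are all hypothetical names, and the "training run" is a toy function standing in for real PPO training.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


@dataclass
class Checkpoint:
    """Last known-good configuration and the metric it achieved."""
    config: Config
    metric: float


def research_loop(
    initial: Config,
    propose: Callable[[Config, int], Config],
    train_and_eval: Callable[[Config], float],
    steps: int,
    tolerance: float = 0.0,
) -> Tuple[Checkpoint, List[str]]:
    """Try one proposed change per step; keep it unless the metric degrades."""
    stable = Checkpoint(dict(initial), train_and_eval(initial))
    log: List[str] = []
    for step in range(steps):
        candidate = propose(stable.config, step)
        metric = train_and_eval(candidate)
        if metric < stable.metric - tolerance:
            # Metric degraded: discard the candidate, stay on the stable checkpoint.
            log.append("rollback")
        else:
            # Metric held or improved: the candidate becomes the new stable checkpoint.
            stable = Checkpoint(candidate, metric)
            log.append("keep")
    return stable, log


# Toy stand-ins (assumptions for illustration only, not real training).
def toy_eval(cfg: Config) -> float:
    # Pretend the best KL coefficient is 0.3; higher metric is better.
    return -abs(cfg["kl_coef"] - 0.3)


def toy_propose(cfg: Config, step: int) -> Config:
    # Alternate between nudging the coefficient up and down.
    delta = 0.1 if step % 2 == 0 else -0.1
    return {**cfg, "kl_coef": cfg["kl_coef"] + delta}


best, log = research_loop({"kl_coef": 0.0}, toy_propose, toy_eval, steps=6)
print(best, log)
```

The catch is the one I hit: a loop like this only ever climbs the metric it is given, so if the reward itself is misaligned, rollbacks keep you stable without getting you anywhere useful.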