🪩 DiscoBench rethinks how we evaluate AI research agents. It offers modular codebases so your agent can focus on the most important discoveries, meta-train/meta-test splits, and a growing set of tasks spanning RL, vision, unlearning, and more.
I will be talking about DiscoBench at the SEA workshop on Dec 7th from 10:30 AM.
Find me at NeurIPS if you want to chat AI for scientific discovery! 🔬
🪩 So excited to reveal DiscoBench: An Open-Ended Benchmark for Algorithm Discovery! 🪩
It addresses the key issues with current evals through its broad task coverage, modular file system, meta-train/meta-test split, and emphasis on open-ended tasks! 🧵