Glad my small contribution could support Ai2's efforts to rigorously benchmark AI agents 🫡
Huge congrats to the team!!! 🎉
As part of Asta, our initiative to accelerate science with trustworthy AI agents, we built AstaBench—the first comprehensive benchmark to compare them. ⚖️