Joined October 2023
Nicholas Edwards retweeted
I’m excited to present this work today at #LREC2026 here in Mallorca, and I’m looking forward to talking to some of you who are around too! #LLMs #nlproc #pragmatics
Nicholas Edwards retweeted
Excited to share that RExBench has been accepted to ACL main! 🎉🎉
RExBench is now available in Terminal Bench (@harborframework)! 🎉 We integrate 2 tasks (cogs, othello) along with a local testing framework so you can test if your agents can autonomously implement novel AI research extensions.
Thanks to @Mike_A_Merrill and @alexgshaw for early discussions, and to @LinShi592021 and the Adapters team for help with integration!
Check out the original RExBench announcement for more details about the benchmark: x.com/yukyunglee_/status/194…

Can coding agents autonomously implement AI research extensions? We introduce RExBench, a benchmark that tests if a coding agent can implement a novel experiment based on existing research and code. Finding: Most agents we tested had a low success rate, but there is promise!
Nicholas Edwards retweeted
Diffusion LLMs can think EoS-by-EoS! The longer the generation length, the better Masked Diffusion LLMs perform, even though they generate the same number of words and only pad them with more and more EoS tokens 👀