The GenBench workshop is back! Do you work on generalisation (benchmarking) in #NLProc? Submit to the 2nd edition (genbench.org/workshop/) co-located with #EMNLP2024. We have a regular track and a ✨collaborative benchmarking task (CBT)✨ that's fully LLM-focused this year (1/6)
so proud of @HayleyRossLing for getting a best paper award at @GenBench this year!! 🎉🪅🎉 I'm sure @TeaAnd_OrCoffee would be too :) check out our paper and share if you think homemade cats are cats!
New paper with @najoungkim and @TeaAnd_OrCoffee testing if LLMs can draw adjective-noun inferences like humans! Turns out they often can, and even generalize to unseen combinations. But they're more optimistic about "artificial intelligence" than humans.
arxiv.org/abs/2410.17482
Did you miss the GenBench poster session? Don't worry, we've got you: here are (nearly all) the posters! 😉 #GenBench2024 #EMNLP2024 Next up: keynote by Sameer Singh at 3!
Continuing with Bastian Bunzeck, presenting:
The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns
aclanthology.org/2024.genben…
Last spotlight presentation:
MMLU-SR: A Benchmark for Stress-Testing Reasoning Capability of Large Language Models
aclanthology.org/2024.genben…
Unfortunately, the authors couldn't make it, so the work is kindly being presented by their colleague Hengyi Wang 🙏
Oral presentation two, with @sagnikrayc:
Investigating the Generalizability of Pretrained Language Models across Multiple Dimensions: A Case Study of NLI and MRC
aclanthology.org/2024.genben…