9 May 2025
The Leaderboard Illusion

Chatbot Arena is a popular leaderboard that compares large language models (LLMs) via anonymous pairwise voting. It plays a growing role in shaping perceptions of model quality, but a detailed audit by researchers from Cohere Labs, Princeton, Stanford, MIT, and others identifies serious structural issues that distort these rankings.

1️⃣ Coordinated Influence Risks: The Arena’s open and anonymous design enables repeated voting, prompt manipulation, and model fingerprinting, which allows ranking manipulation if left unchecked.
2️⃣ Prompt Reuse & Redundancy: Up to 26.5% of prompts are duplicates or near-duplicates, so providers with Arena data access can train on likely future prompts and gain an unfair advantage.
3️⃣ Leaderboard Overfitting: Fine-tuning on Arena-style prompts led to a 112% win-rate increase on ArenaHard, but no improvement (even a slight drop) on general benchmarks like MMLU. This shows leaderboard-specific optimization, not general capability.
4️⃣ Silent Model Deprecation: 205 models were removed without public notice, while only 47 were officially deprecated. Open-weight and open-source models were most affected, which violates the fair sampling assumptions of the ranking model (Bradley-Terry).
5️⃣ Data Access Inequality: OpenAI and Google received ~20% of total Arena data each, while 83 open-weight models shared less than 30%. This fuels a feedback loop: more data → better performance → higher sampling → even more data.

📌 The authors emphasize that Chatbot Arena remains a valuable community asset, but propose five actionable changes to improve evaluation integrity: disclose all scores (even private ones), limit concurrent private submissions, standardize model removal, implement fair sampling, and publish full model removal logs.

👥 Authors: Shivalika Singh, Yiyang Nan, Alex Wang, Daniel D’souza, Sayash Kapoor, Ahmet Üstün, Sanmi Koyejo, Yuntian Deng, Shayne Longpre, Noah Smith, Beyza Ermis, Marzieh Fadaee, Sara Hooker.
Source: arxiv.org/pdf/2504.20879 #ChatbotArena #ArenaHard #LLM #Benchmark #AIevaluation #ModelTransparency #AISafety #ResponsibleAI #OpenSourceAI #DataImbalance #PrincetonAI #StanfordAI #MITAI #WaterlooAI #AI2 #ModelRanking #Leaderboard #AIresearchTools #LLMtesting #AIgovernance
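For readers unfamiliar with the Bradley-Terry model mentioned in the post, here is a minimal sketch of how pairwise votes can be turned into strength scores. It uses the standard minorization-maximization update for Bradley-Terry; the function name `fit_bradley_terry`, the model names, and the vote data are all hypothetical and made up for illustration. This is not the Arena's actual implementation, and it assumes every model both wins and loses at least one battle.

```python
from collections import defaultdict


def fit_bradley_terry(battles, iterations=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Hypothetical helper for illustration only. Returns a dict mapping
    each model name to a strength; strengths sum to 1.
    """
    wins = defaultdict(int)         # total wins per model
    pair_counts = defaultdict(int)  # number of battles per unordered pair
    models = set()
    for winner, loser in battles:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        models.update((winner, loser))

    # Start from equal strengths.
    strength = {m: 1.0 / len(models) for m in models}
    for _ in range(iterations):
        updated = {}
        for m in models:
            # MM update: wins divided by the sum over opponents of
            # battles_with_opponent / (own strength + opponent strength).
            denom = sum(
                n / (strength[m] + strength[other])
                for pair, n in pair_counts.items()
                if m in pair
                for other in pair - {m}
            )
            updated[m] = wins[m] / denom
        total = sum(updated.values())
        strength = {m: s / total for m, s in updated.items()}
    return strength


# Made-up votes: model_a usually beats model_b, model_b usually beats model_c.
example_battles = (
    [("model_a", "model_b")] * 3 + [("model_b", "model_a")]
    + [("model_b", "model_c")] * 3 + [("model_c", "model_b")]
)
example_strengths = fit_bradley_terry(example_battles)
```

Under this model, the estimated probability that one model beats another is its strength divided by the sum of the two strengths. The fit depends on which pairs were sampled and how often, which is why the post treats uneven sampling and silent model removal as a concern.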
Waterloo → Apr 21 AI Tinkerers Meetup 📍 Builders Club | 7–9 PM No decks. Just real demos: 🛠️ Prabal Gupta → personalized outreach 🛠️ Shiva Sankar → security reviews w/ Cody AI 🔗 buff.ly/hAWyjNm #AITinkerers #GenAI #LLMs #WaterlooAI
We said that in relation to #contractcheating and the essay still survived…
Today #UWaterloo was at @CollisionHQ, the biggest tech conference in North America, getting the scoop on AI and more from experts and attendees! #CollisionConf #Collision2023 @WatSPEEDUW @WaterlooMath @WaterlooAI @uwaterlooARTS @uwaterlooalumni
What I really liked about this is that it places such great emphasis on authentic expectations in the professional field by looking at real life examples.
2 May 2022
These people are being held as prisoners of war. They are being asked to pay 20,000 birr to escape from the camp. This was posted by a Fano in Western Tigray.
Rec. today "Building Awareness of Academic Integrity with Badges: Canadian University Context", Alice Schmidt Hanbidge, Tony Tin, Georgina Zaharuk, and Herbert H. Tsang, in Academic Misconduct and Plagiarism, Lexington Books, 2020. @AHanbid @proftsang @WaterlooAI @TrinityWestern
Excellent #AcademicIntegrity resource for students from @WaterlooAI #cdnpse #HigherEd
15 Apr 2021
@HonorableRams @UTM_AIU @AcademicMain @UaeCai @AcademicUcg @WaterlooAI @ENAIntegrity @TweetCAI @AcademikAhlak @AGLOA This #academicintegrity issue needs attention. Please read these blog posts and RT if you think that the Associated Press needs to change its policy. @AP
2 Mar 2021
Thank you @WaterlooAI! #ICAI2021 has been fantastic so far!!! If you're not there #cdnpse you are missing out!
Excellent presenters and experiences! There is so much to learn from the Canadian Consortium! It certainly inspires people to collaborate and advocate for academic integrity!
My full GIF-laden lecture (minus voice-over) docs.google.com/presentation…

AI Seminar Today: Prof. Vijay Ganesh on "Machine Learning and Logic Solvers: The Next Frontier" Oct 29 at 3:30pm via Zoom #waterlooai More info and meeting link on the webpage: uwaterloo.ca/artificial-inte…
Yes, er.... that was a bit awkward, wasn't it? Well, in my defense, it might have been the only bit that came out in full sentences. We filmed outside and it was freezing, so I may not have been terribly eloquent...
Nice report. I'm really not convinced by the benefits of take-home exams over standard coursework (the limited time and additional pressure just fuel academic integrity breaches). Whenever I talk about bodily functions to the media, they always edit those bits out!
Here is the link to yesterday’s interview with @CBCTheNational on #eproctoring and #academicintegrity . Happy to see @Parntherc and Amanda McKenzie from @WaterlooAI interviewed, too. cbc.ca/player/play/181030355…

This week's waterloo.ai seminar features @WaterlooMath's own Prof. Kate Larson, who will talk about Fair Reward Division. #Uwaterloo #WaterlooAI When: Friday, March 13 at 10:30am Where: DC 1302, UWaterloo Campus Who: You! All are welcome! uwaterloo.ca/artificial-inte…

The robots are ready for the holidays. Have a good rest everyone and see you next year! #waterlooAI waterloo.ai
Here is the updated link to download the Integrity Matters app uwaterloo.ca/academic-integr…
