AISecHub

@AISecHub

9 May 2025

The Leaderboard Illusion

Chatbot Arena is a popular leaderboard that compares large language models (LLMs) via anonymous pairwise voting. It plays a growing role in shaping perceptions of model quality, but a detailed audit by researchers from Cohere Labs, Princeton, Stanford, MIT, and others identifies serious structural issues that distort these rankings.

1️⃣ Coordinated Influence Risks: The Arena's open and anonymous design enables repeated voting, prompt manipulation, and model fingerprinting, allowing ranking manipulation if left unchecked.

2️⃣ Prompt Reuse & Redundancy: Up to 26.5% of prompts are duplicates or near-duplicates, enabling providers with Arena data access to train on likely future prompts and gain an unfair advantage.

3️⃣ Leaderboard Overfitting: Fine-tuning on Arena-style prompts led to a 112% win-rate increase on ArenaHard, but no improvement (even a slight drop) on general benchmarks like MMLU. This shows leaderboard-specific optimization, not general capability.

4️⃣ Silent Model Deprecation: 205 models were removed without public notice, while only 47 were officially deprecated. Open-weight and open-source models were most affected, violating the fair sampling assumptions of the ranking model (Bradley-Terry).

5️⃣ Data Access Inequality: OpenAI and Google received ~20% of total Arena data each, while 83 open-weight models shared less than 30%. This fuels a feedback loop: more data → better performance → higher sampling → even more data.

📌 The authors emphasize that Chatbot Arena remains a valuable community asset, but propose five actionable changes to improve evaluation integrity: disclose all scores (even private ones), limit concurrent private submissions, standardize model removal, implement fair sampling, and publish full model removal logs.

👥 Authors: Shivalika Singh, Yiyang Nan, Alex Wang, Daniel D'souza, Sayash Kapoor, Ahmet Üstün, Sanmi Koyejo, Yuntian Deng, Shayne Longpre, Noah Smith, Beyza Ermis, Marzieh Fadaee, Sara Hooker.
Source: arxiv.org/pdf/2504.20879 #ChatbotArena #ArenaHard #LLM #Benchmark #AIevaluation #ModelTransparency #AISafety #ResponsibleAI #OpenSourceAI #DataImbalance #PrincetonAI #StanfordAI #MITAI #WaterlooAI #AI2 #ModelRanking #Leaderboard #AIresearchTools #LLMtesting #AIgovernance
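
For readers unfamiliar with the Bradley-Terry model named in point 4, here is a minimal sketch of how pairwise votes can be turned into strength scores. It is an illustration only, not the Arena's actual implementation: the function name `fit_bradley_terry`, the model labels, and the vote counts are all hypothetical, and the fitting method is the standard iterative (minorization-maximization) update.

```python
from collections import defaultdict


def fit_bradley_terry(wins, iters=200):
    """Fit Bradley-Terry strengths from pairwise win counts.

    wins[(a, b)] is the number of votes in which model a beat model b.
    Returns a dict of strengths normalized to sum to 1, where
    P(a beats b) is modeled as strength[a] / (strength[a] + strength[b]).
    """
    models = sorted({m for pair in wins for m in pair})
    strength = {m: 1.0 for m in models}

    total_wins = defaultdict(float)  # wins per model
    games = defaultdict(float)       # comparisons per unordered pair
    for (a, b), n in wins.items():
        total_wins[a] += n
        games[frozenset((a, b))] += n

    for _ in range(iters):
        updated = {}
        for i in models:
            denom = 0.0
            for j in models:
                if j == i:
                    continue
                n_ij = games.get(frozenset((i, j)), 0.0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            # Keep the old value for a model that has no comparisons at all.
            updated[i] = total_wins[i] / denom if denom else strength[i]
        norm = sum(updated.values())
        strength = {m: v / norm for m, v in updated.items()}
    return strength


# Hypothetical votes: A beat B 70 times, B beat A 30 times.
scores = fit_bradley_terry({("A", "B"): 70, ("B", "A"): 30})
print(scores)  # strengths of roughly 0.7 for A and 0.3 for B
```

The estimate for each model depends on which opponents it was paired with and how often, which is why uneven sampling and unannounced removals matter for the resulting ranking.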

Joe Heitzeberg

@jheitzeb

6 Apr 2025

Waterloo → Apr 21 AI Tinkerers Meetup 📍 Builders Club | 7–9 PM No decks. Just real demos: 🛠️ Prabal Gupta → personalized outreach 🛠️ Shiva Sankar → security reviews w/ Cody AI 🔗 buff.ly/hAWyjNm #AITinkerers #GenAI #LLMs #WaterlooAI

AI Tinkerers - Waterloo April Meetup [AI Tinkerers - Waterloo]

🤖 🔥 Join us for the April meetup of AI Tinkerers! 🔥 🤖

waterloo.aitinkerers.org

Thomas Lancaster

@DrLancaster

17 Oct 2023

We said that in relation to #contractcheating and the essay still survived…

University of Waterloo @UWaterloo

27 Jun 2023

Today #UWaterloo was at @CollisionHQ, the biggest tech conference in North America, getting the scoop on AI and more from experts and attendees! #CollisionConf #Collision2023 @WatSPEEDUW @WaterlooMath @WaterlooAI @uwaterlooARTS @uwaterlooalumni

Prof. Cath Ellis @cathellis13

21 Jun 2022

Replying to @CCguerilla @AICOIntegrity @TEQSAGov @drskeith @Macquarie_Uni

What I really liked about this is that it places such great emphasis on authentic expectations in the professional field by looking at real life examples.

Teklehaymanot G. Weldemichel @TeklehaymanotG

2 May 2022

Replying to @africaupdate @BalsillieSIA @UWaterloo @Laurier @cordnews @uwimprint @CIGIonline

Tigisti @tigistAA

2 May 2022

These people are being held as prisoners of war. They are being asked to pay 20,000 birr to escape from the camp. This was posted by a Fano in Western Tigray.

Herbert H. Tsang @ProfTsang

10 Sep 2021

Rec. today "Building Awareness of Academic Integrity with Badges: Canadian University Context", Alice Schmidt Hanbidge, Tony Tin, Georgina Zaharuk, and Herbert H. Tsang, in Academic Misconduct and Plagiarism, Lexington Books, 2020. @AHanbid @proftsang @WaterlooAI @TrinityWestern

Dr. Sarah Elaine Eaton🇨🇦@DrSarahEaton

16 Apr 2021

Excellent #AcademicIntegrity resource for students from @WaterlooAI #cdnpse #HigherEd

Tom Liston @tliston

15 Apr 2021

@HonorableRams @UTM_AIU @AcademicMain @UaeCai @AcademicUcg @WaterlooAI @ENAIntegrity @TweetCAI @AcademikAhlak @AGLOA This #academicintegrity issue needs attention. Please read these blog posts and RT if you think that the Associated Press needs to change its policy. @AP

AICO @AICOIntegrity

2 Mar 2021

Thank you @WaterlooAI! #ICAI2021 has been fantastic so far!!! If you're not there #cdnpse you are missing out!

Bea (She/Her)@BeaMoyaFigueroa

2 Mar 2021

Excellent presenters and experiences! There is so much to learn from the Canadian Consortium! It certainly inspires people to collaborate and advocate for academic integrity!

Gone to Discord @booksnook

11 Nov 2020

My full GIF-laden lecture (minus voice-over) docs.google.com/presentation…

Waterloo.AI @uwaterlooAI

29 Oct 2020

AI Seminar Today: Prof. Vijay Ganesh on "Machine Learning and Logic Solvers: The Next Frontier" Oct 29 at 3:30pm via Zoom #waterlooai More info and meeting link on the webpage: uwaterloo.ca/artificial-inte…

Dr. Sarah Elaine Eaton🇨🇦@DrSarahEaton

22 Oct 2020

Replying to @DrLancaster @CBCTheNational

Yes, er.... that was a bit awkward, wasn't it? Well, in my defense, it might have been the only bit that came out in full sentences. We filmed outside and it was freezing, so I may not have been terribly eloquent...

Thomas Lancaster

@DrLancaster

22 Oct 2020

Replying to @DrSarahEaton @CBCTheNational

Nice report. I'm really not convinced by the benefits of take-home exams over standard coursework (the limited time and additional pressure just fuels academic integrity breaches). Whenever I talk about bodily functions to the media, they always edit those bits out!

Dr. Sarah Elaine Eaton🇨🇦@DrSarahEaton

22 Oct 2020

Here is the link to yesterday’s interview with @CBCTheNational on #eproctoring and #academicintegrity . Happy to see @Parntherc and Amanda McKenzie from @WaterlooAI interviewed, too. cbc.ca/player/play/181030355…

Dr. Sarah Elaine Eaton🇨🇦@DrSarahEaton

22 Oct 2020

And @Parntherc, too! #academicintegrity

Waterloo.AI @uwaterlooAI

10 Mar 2020

This week's waterloo.ai seminar is @WaterlooMath's own Prof. Kate Larson who will talk about Fair Reward Division. #Uwaterloo #WaterlooAI When: Friday, March 13 at 10:30am Where: DC 1302, UWaterloo Campus Who: You! All are welcome! uwaterloo.ca/artificial-inte…

Waterloo.AI @uwaterlooAI

23 Dec 2019

The robots are ready for the holidays. Have a good rest everyone and see you next year! #waterlooAI waterloo.ai

Alice Schmidt Hanbid @AHanbid

30 Sep 2019

Here is the updated link to download the Integrity Matters app uwaterloo.ca/academic-integr…