Knowin

10 Dec 2025

The 2025 Must-Read AI Papers list just dropped! From groundbreaking DataPerf benchmarks to the latest LLM leaps — these are the papers shaping the future. Level up before everyone else does 👇 towardsdatascience.com/ai-pa…

AI Papers to Read in 2025 | Towards Data Science

And Why They Matter for Anyone Working With AI

towardsdatascience.com

Nyandia Gachago, ACIM

@Nyandia_G

29 Jul 2025

“Unajua ChatGPT?” (“Do you know ChatGPT?”) That’s how it starts. On a boda (motorcycle taxi), in a barbershop, in a dorm room in Eldoret or a cyber in Kisii. Some curious soul has just discovered you can ask a bot to write your CV, compose a love letter in Queen’s English, or break down the budget speech in sheng. Before you know it, there’s a ripple. Then a wave. Then Kenya becomes the number one user of ChatGPT in the entire world.

Yes, more than Japan. More than the US. More than the UAE or Israel. According to a new report from Dataperf, 42.1% of Kenyans aged 16 who are online use ChatGPT. That’s more than two in every five internet-active Kenyans talking to a language model more often than to some relatives.

But should this shock anyone? We’re the nation that turned Twitter and TikTok into a protest platform. That turned memes into manifestos. That turns side hustles into empires. Our lives are digital, our curiosity is unmatched, and our humour… well, it’s illegal in several countries. 😂😂😂

Why are Kenyans leading the AI race? Because we have to. With limited job opportunities, rising costs, and a hustler economy that never sleeps, Kenyans have done what we always do: adapt and innovate, with fire in the belly and bundles running out.

We use ChatGPT to:
• Finish that assignment last-minute
• Write speeches for weddings, protests, and presidential dreams
• Fix a CV from 2011 in under 3 minutes
• Build branding for our mitumba (second-hand clothes) or kuku (poultry) business
• Draft investor decks, LinkedIn bios, or even eulogies (yes, it happens)

So, what does this mean? It means Kenya is no longer just a consumer of tech. We’re now co-creators of the digital economy. We’re building AI literacy on the ground. We’re teaching ourselves and each other through TikTok explainers and Telegram groups. We’re turning prompts into paychecks.

And when the world asks what’s next for AI in Africa… Tell them to check where the top users are. They’re not in Silicon Valley. They’re in Siaya. In Buruburu. In Nakuru. In Umoja 2.

This is Kenya.
Broke, brilliant, and beating the world at tech it didn’t even invent. #KenyaTopsAI #ChatGPTinKenya #AIRevolution #DigitalAfrica #MadeInKenya #HustleTech #FutureOfWork #AfricanYouthLead #ChatGPTUsage #KenyanExcellence

Max Bartolo

@max_nlp

19 Mar 2025

- Synthetic Adversarial Data Generation (SADG; arxiv.org/abs/2104.08678) and Generative Annotation Assistants (GAAs; arxiv.org/abs/2112.09062) w/ @TristanThrush @robinomial @riedelcastro, Pontus & @douwekiela
- DataPerf (arxiv.org/abs/2207.10062) & DMLR (arxiv.org/abs/2311.13028v2) at @MLCommons with too many fantastic collaborators to name, including @laroyo @heuristicity, Lilith, @PeterMattson100 & @TheKanter

Moritz Borrett-Laurer @MoritzLaurer

15 Dec 2023

Low-hanging fruit, but under-used: improve your data instead of your models. Most ML researchers and practitioners focus on increasing performance through small algorithmic improvements (the latest model, different prompts, etc.). Data-centric methods instead focus on improving the training data for better performance and improving the test data to make metrics more meaningful. "DataPerf: Benchmarks for Data-Centric AI Development" is a very interesting #NeurIPS2023 paper with a focus on data-centric methods. If your ML team spends 80% of their time on algorithmic improvements, try redirecting just 20% of this time to improving your data, and you will get much bigger gains than by only chasing the latest models. Otherwise: garbage in, garbage out. Tools like @argilla_io or @CleanlabAI make this quite easy. Link to the DataPerf paper: arxiv.org/abs/2207.10062

Benjamin Cappell @BennyCappell

14 Dec 2023

Day 4 at my first conference #NeurIPS2023 in #NewOrleans - many interesting talks and posters! Some takeaways:
- think data-centric to avoid benchmark saturation and overfitting. Tools: DataPerf and DataComp
- Humanlike representations in models => robust & performant models!

MLCommons @MLCommons

13 Dec 2023

Announcing the 2023 @MLCommons @DataPerf challenges winners who are pushing the boundaries of data-centric AI, while addressing the crucial ML data bottleneck. Congratulations on your contributions to this crucial work! #MachineLearning #AI mlcommons.org/2023/12/datape…

Announcing the DataPerf Challenges 2023 Winners - MLCommons

MLCommons DataPerf Challenges winners announcement

mlcommons.org

Hannah Rose Kirk @hannahrosekirk

11 Dec 2023

More @NeurIPSConf highlights via @MLCommons
⭐️ A paper on data-forward benchmarking led by Mark Mazumder & @ColbyBanbury. Search for "DataPerf"
⭐️ I'll be at the @Google booth Tues lunch talking about our Adversarial Nibbler challenge w/ @laroyo, @AliciaVParrish & Charvi Rastogi

James Zou @james_y_zou

11 Dec 2023

Pointers to our #NeurIPS2023 papers:
- OpenDataVal arxiv.org/abs/2306.10577
- Atypicality in #AI arxiv.org/abs/2305.18262
- Generative AI art arxiv.org/abs/2306.08310
- Factorized contrastive learning arxiv.org/abs/2306.05268
- DataPerf arxiv.org/abs/2207.10062

James Zou @james_y_zou

11 Dec 2023

Looking forward to seeing old and new friends at #NeurIPS2023! My awesome students and collaborators will present exciting new works at the main conference and workshops👇 See you at the posters!

Lora Aroyo @laroyo

26 Sep 2023

Check this out: @jessicaquaye_ introducing our #Adversarial @NibblerDataperf #Challenge. Less than a week left to join #Round1 and be part of our diverse Nibbler community @MLCommons #DataPerf

Lora Aroyo @laroyo

26 Sep 2023

🚨 Exciting news🚨 Our #DataPerf paper on #Benchmarks for #DataCentricAI Development has been accepted at NeurIPS #Datasets and #Benchmarks Track. A huge shout out to @MarkMazumder & @ColbyBanbury for their amazing work on getting this work published arxiv.org/abs/2207.10062

DataPerf: Benchmarks for Data-Centric AI Development

Machine learning research has long focused on models rather than datasets, and prominent datasets are used for common ML tasks without regard to the breadth, difficulty, and faithfulness of the...

arxiv.org

Margaret Warren @ImageSnippets

23 Sep 2023

Accepted into #NeurIPS2023! Congrats to the whole #dataperf team! cc @laroyo arxiv.org/abs/2207.10062

Margaret Warren @ImageSnippets

23 Sep 2023

I recently used IS to build datasets for a #dataperf vision challenge sponsored by @CoactiveAI & @DynabenchAI. I am excited to have done well and to have helped w/ a paper accepted into #NeurIPS2023. I used IS to axiomatize human processes for the data selection. 6/x

AI News Clips by Morris Lee: News to help your R&D @morris_phd

27 Aug 2023

Twitter x.com/Merglevsky/status/1471… DataPerf: Benchmarks for Data-Centric AI Development. See ar5iv.labs.arxiv.org/html/22… Newsletter morrislee1234.wixsite.com/we… More story morrislee1234.wixsite.com/we… LinkedIn linkedin.com/in/morris-lee-4… #AINewsClips #AI #ML #ArtificialIntelligence #MachineLearning

Milan Merglevský @Merglevsky

16 Dec 2021

DataPerf: Benchmarking Data for Better ML buff.ly/3DXJYhd

Adversarial Nibbler Competition @NibblerDataperf

17 Aug 2023

Welcome to the Adversarial Nibbler Twitter page! We launched the @NibblerDataperf data challenge in collaboration with #DataPerf, @MLCommons, and @Kaggle. Join now at kaggle.com/competitions/adve… There is plenty of time to contribute.

Adversarial Nibbler

Submit implicitly adversarial prompts that trigger unsafe image generation

kaggle.com

Nezihe Merve Gürel (nmervegurel.bsky.social) @nmervegurel

29 Jul 2023

Good morning @icmlconf ! ☀️🌴 Join us today for the DMLR Workshop in Ballroom C! We have a wonderful set of speakers, panelists and posters! There will also be announcements on DataPerf Challenge and DMLR Journal 👀 Check out here for full schedule: dmlr.ai/program/

bbz @bbz662

1 Jun 2023

Gotta check out Dataperf... #dcai_jp

龍一郎 (f.k.a Asei Sugiyama)

@K_Ryuichirou

1 Jun 2023

Just the fact that a Dataperf competition was held at all is really exciting. #dcai_jp

Will Gaviria Rojas @willg_ai

26 May 2023

Why do we benchmark ML but not the data it relies on? To close this gap, we are excited to announce #DataPerf, a collaboration across academia and industry to benchmark data-centric AI. Learn more about the image classification benchmark @codyaustun and I created below:

MLCommons @MLCommons

25 May 2023

Announcing DataPerf 2023 challenges submission extension to July 1st! Winners will be invited to present at ICML23 & submit for publication to the Data for ML Research journal. Join the challenges today! mlcommons.org/en/news/datape… #DataPerf #MLCommons #startups #academia

MLCommons @MLCommons

25 May 2023

DataPerf: the Leaderboard for Data - MLCommons

Introducing a data-centric platform and community for competitions and building better ML.

mlcommons.org

龍一郎 (f.k.a Asei Sugiyama)

@K_Ryuichirou

7 May 2023

I've figured out this much: the Data-Centric AI effort that is still ongoing is DataPerf, and these people seem to be on good terms with Snorkel and ydata. DataPerf: Benchmarks for Data-Centric AI Development youtu.be/SHUlds8N418 via @YouTube
