Sumble co-founder and CTO. Learning high-quality, structured data about the world. Formerly @kaggle

Joined March 2009
735 Photos and videos
Pinned Tweet
27 Jan 2022
Wordle xxx 1/6 🟩🟩🟩🟩🟩 Get Wordle right on your first guess using the daily ⬛🟨🟩 tweet distribution kaggle.com/benhamner/wordle-…
31
215
1,063
The “AI escape the box” experiment was comically naive in hindsight. There is no box. Instead, there’ll be embedded access to every critical system in the world
12
10
1,101
What will engineering productivity look like in a year once a high fraction of codebases are written by Claude Code/Codex and no human has a detailed understanding of these parts anymore?
6
9
2,154
if country == "AU": await asyncio.sleep(0.52)
Interviewer: Your page loads in 80 ms in Australia but 600 ms in India. Same backend. Same code. What would you use to fix this?
1
15
2,370
13 Oct 2025
english in 2030 : python :: python in 2020 : assembly
3
760
24 Sep 2025
WOW - cold inbound hiring processes are the worst I've ever seen:
- Huge volume of applications (probably triggered automatically)
- Large volume of fake resumes
- Extensive use of AI assistants in phone screens
3
2
1,103
25 Jul 2025
You could release a crazy stat on the number of times (and number of employees) ClosedAI companies have downloaded open models from HuggingFace
Replying to @KamStaszewski
All closed-source frontier labs use tons of open-source all over the stack, starting from python, @PyTorch, @huggingface all the way down to RoPE, GQA, flash-attention, or any tiny improvements released by open-source players. The whole transformers architecture (the T in gpT) comes from an open research paper and open-source model from 2017: huggingface.co/papers/1706.0… This is ok but would be much nicer if they would acknowledge the contributions & contribute back more!
1
2
1,507
25 Jul 2025
This is around 5x the ~200 trillion natural language tokens humans generate every month
Google is processing 980 trillion monthly tokens across our products and APIs (up from 480T in May) 🤯 No slowdown in sight, intelligence is everywhere.
2
3
1,723
25 Jul 2025
Assumptions: 8 billion people * 20k spoken/written words per day * 1.3 tokens per word
1
771
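The assumptions above can be multiplied out directly. This is a minimal back-of-the-envelope sketch: the constant and function names are illustrative, and the 980 trillion figure is the monthly number from the quoted Google post. Because the input is words per day, the product is a per-day figure of roughly 208 trillion tokens.

```python
# Back-of-the-envelope estimate using the assumptions stated in the post.
# All names below are illustrative, not from the original.
PEOPLE = 8_000_000_000             # world population
WORDS_PER_PERSON_PER_DAY = 20_000  # spoken + written words per person per day
TOKENS_PER_WORD = 1.3              # rough tokenizer ratio


def human_tokens_per_day() -> float:
    """Tokens of natural language produced by all humans in one day."""
    return PEOPLE * WORDS_PER_PERSON_PER_DAY * TOKENS_PER_WORD


tokens_per_day = human_tokens_per_day()
google_monthly_tokens = 980e12  # monthly figure quoted from Google's post

print(f"Human output: {tokens_per_day / 1e12:.0f} trillion tokens per day")
print(f"Google monthly / human daily: {google_monthly_tokens / tokens_per_day:.1f}x")
```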
Ben Hamner retweeted
Kaggle launched an LLM eval product kaggle.com/benchmarks This has the potential to solve the biggest challenge in the LLM ecosystem: strong and diverse evals
2
50
267
24,802
27 Feb 2025
Getting a bunch of 429 rate limit errors with a quota exceeded message from @OpenAI's API, in spite of being well under quota and published rate limits
2
5
1,432
14 Jan 2025
NLC: natural language cron
14 Jan 2025
We're excited to introduce Tasks! For the first time, ChatGPT can manage tasks asynchronously on your behalf—whether it's a one-time request or an ongoing routine. Here are my favorite use cases: 1/ ChatGPT checks stock price every morning!
2
1,411
22 Dec 2024
Just had the most obnoxious interaction I’ve had in a long time. Jogging in a quiet Palo Alto neighborhood. Middle-aged man and woman a block away, walking towards me. Half a block away they start screaming “RUN ON THE FUCKING STREET SIDEWALKS ARE FOR WALKING NOT RUNNING. GET OFF THE FUCKING SIDEWALK ITS A FUCKING SIDEWALK NOT RUN”
8
15
7,402
19 Nov 2024
Marimo's a delight to use for both exploratory Python notebooks and rapidly prototyping reactive data applications with a rich UI. It's replaced Jupyter notebooks for me. Excited to see what Akshay and team build from here!
My co-founder @themylesfiles and I have started Marimo Inc. to keep building the @marimo_io notebook and other Python data tools. We've raised a $5M seed round led by @antgoldbloom and @shyammani_ at @aixventureshq. Excited for the journey ahead! marimo.io/blog/seed-announce…
1
12
2,527
Ben Hamner retweeted
Looking at job posts over the last year, it looks like Amazon is making the broadest AI investment. 228 teams across the company posted jobs with GenAI projects over the past year, considerably more than any other company. blog.sumble.com/companies-wi… Most interesting is the breadth of teams and use cases. They made posts from the core research teams, Alexa, customer service, seller experience, fulfillment, and catalog selection teams (full list: sumble.com/l/JrsA4zNSp4). Some examples: - Creative X Team: generating high-quality text, images and video for advertisers (sumble.com/l/EHF2RZQRYD) - Geospatial Science Team: for global address parsing and validation (sumble.com/l/h7GWV4t9JU) - Selection and Catalog Systems Team: improve the completeness and correctness of product data for Amazon shoppers (sumble.com/l/uaddsqaD4v)
1
16
47
16,642
1 Nov 2024
Fast hack to improve the factual accuracy of your LLM calls? Call several of the leading foundation models and take a consensus (e.g. majority voting) across them
3
6
903
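A minimal sketch of the consensus idea, assuming each model is wrapped as a plain callable that takes a prompt and returns a text answer. The stub models, `normalize`, and `majority_vote` are hypothetical names for illustration; no real provider API is called here, and real answers would likely need stronger normalization than lowercasing and whitespace cleanup.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different answers match."""
    return " ".join(answer.strip().lower().split())


def majority_vote(prompt: str, models: Sequence[Callable[[str], str]]) -> Optional[str]:
    """Return the answer given by a strict majority of models, else None."""
    answers = [normalize(model(prompt)) for model in models]
    if not answers:
        return None
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer if count > len(answers) / 2 else None


# Hypothetical stand-ins for calls to three different foundation models.
def stub_model_a(prompt: str) -> str:
    return "Paris"


def stub_model_b(prompt: str) -> str:
    return "  paris "


def stub_model_c(prompt: str) -> str:
    return "Lyon"


consensus = majority_vote(
    "What is the capital of France?",
    [stub_model_a, stub_model_b, stub_model_c],
)
print(consensus)
```

Returning `None` when no strict majority exists keeps disagreement visible, so the caller can fall back to a retry or a human check instead of trusting a split vote.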
Ben Hamner retweeted
The overlooked GenAI use case: cleaning, processing, and analyzing data. blog.sumble.com/the-overlook… Job post data tell us what companies plan to do with GenAI. The most common use case is data analytics projects. Examples: - AstraZeneca: using LLMs on freeform documents to structure results from their Extractables & Leachables testing (sumble.com/l/FgehjrgnvN) - Trafigura: The Document AI team is using LLMs to extract data from a corpus of commodity trading documents to generate credit reports (sumble.com/l/6bY7mhyAHd) The startup ecosystem is overlooking this use case, instead focusing on other areas such as customer support, sales & marketing and code gen.
22
113
643
164,999
31 Oct 2024
10 data quality challenges in relational tables:
1. Duplicates
2.
3. Duplicates
4. Cḧarαctěr Énçødïng
5. 1/1/24 (ambiguous dates)
6. 7am (ambiguous times w/o timezones)
7. John A. Smith, john smith (fuzzy joining)
8. asdfasdfa junk record
9. Company: Apple Url: google.com
10. location: Franklin

8
732
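A few of the challenges listed above (duplicates, fuzzy name matching, ambiguous dates) can be flagged with simple checks. This is a rough sketch using only the standard library; the function names and rules are illustrative assumptions, and the name normalization (dropping punctuation and single-letter initials) is deliberately crude.

```python
import re
from collections import Counter
from typing import Iterable, List


def normalize_name(name: str) -> str:
    """Lowercase, strip punctuation, and drop single-letter initials."""
    tokens = re.sub(r"[^\w\s]", "", name.lower()).split()
    return " ".join(token for token in tokens if len(token) > 1)


def find_duplicates(names: Iterable[str]) -> List[str]:
    """Return normalized names that occur more than once."""
    counts = Counter(normalize_name(name) for name in names)
    return sorted(key for key, count in counts.items() if count > 1)


def is_ambiguous_date(text: str) -> bool:
    """Flag slash-separated dates with a two-digit year or an unclear day/month order."""
    match = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{2}|\d{4})", text.strip())
    if match is None:
        return False
    first, second, year = match.groups()
    two_digit_year = len(year) == 2
    swappable = int(first) <= 12 and int(second) <= 12 and first != second
    return two_digit_year or swappable


duplicates = find_duplicates(["John A. Smith", "john smith", "Jane Doe"])
print(duplicates)
print(is_ambiguous_date("1/1/24"), is_ambiguous_date("25/12/2024"))
```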
30 Oct 2024
Close to 100% of machine code is now compiler-written. Higher levels of abstraction are how we move forward, faster
8
769
24 Sep 2024
I was born into the first computer-native generation. My 2yo son’s in the first AI-native generation.
2
751