CS prof at Penn. Amazon Scholar at AWS. Author of The Ethical Algorithm (w/ Michael Kearns). I study machine learning, privacy, game theory, and uncertainty.
Modern LLMs are incredibly good compression algorithms, which can shed light on why autonomous data science agents don't overfit as much as you might think.
Reusing a held-out set adaptively should invite overfitting. Yet in ML we reuse benchmarks for years and they stay informative. Why so little overfitting?
By using LLM agents as extreme compression engines, we gain a new understanding of why. 🧵
Joint work w/ Martin Bertran and @Aaroth
In the last 48h:
- Jr researcher asked me whether to use AI when making talks
- Saw two talks, with AI {slop, enhanced} slides
Collected my thoughts and wrote a post. Tl;dr: don't steal your own thinking, don't remove *you* from your talks. Also, give a &#@% about your talks.
AI has now solved a major open problem -- one of the best-known Erdos problems, called the unit distance problem, one of Erdos's favourite questions and one that many mathematicians had tried.
openai.com/index/model-dispr…
Recently we showed that the minimax optimal rate for multicalibration is T^{2/3}. But that doesn't mean you have to do that badly on all instances. We give an algorithm that can adapt to easy instances and get better rates while still being minimax optimal in the worst case.
I just learned about this closely related concurrent paper by Liu, Luo, and Ratliff that went up on arXiv yesterday: arxiv.org/abs/2605.11490 --- it also looks very interesting, check it out!
I've recently been getting invitations to talk about how to use AI tools to assist with TCS research. It's something I've been doing a lot, but I don't have structured thoughts about how to explain the process. But I'm going to try -- first such talk is tomorrow: cics.umass.edu/events/resear…
We updated our paper --- and solved the open problem highlighted in the old version. Now our lower bound construction has only polylog(1/eps) many groups instead of poly(1/eps) many groups. The construction is also simplified.
Excited about a new paper! Multicalibration turns out to be strictly harder than marginal calibration. We prove tight Omega(T^{2/3}) lower bounds for online multicalibration, separating it from online marginal calibration for which better rates were recently discovered.
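A back-of-the-envelope way to connect an online rate to a sample count (arithmetic only, not the papers' argument): a cumulative multicalibration error of order T^{2/3} over T rounds is an average error of order T^{-1/3}, and pushing that below epsilon takes on the order of 1/epsilon^3 rounds. Here c is a hypothetical constant standing in for problem-dependent factors.

$$
\frac{c\,T^{2/3}}{T} = c\,T^{-1/3} \le \varepsilon
\iff
T \ge \left(\frac{c}{\varepsilon}\right)^{3} = \Theta\!\left(\varepsilon^{-3}\right)
$$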
How many samples do you need from an unknown distribution in order to train a model with multicalibration error at most epsilon?
Answer: 1/epsilon^3 samples is both necessary and sufficient.
Some interesting things:
- Multicalibration requires substantially more samples than marginal calibration.
- Unlike marginal calibration, multicalibration is just as hard to obtain in the batch setting as the online setting.
- There is a phase change. If the group family |G| is of constant size, Theta(1/eps^2) samples are necessary and sufficient. But when |G| > polylog(1/eps), Omega(1/eps^3) samples are necessary and remain sufficient for any |G| = poly(1/eps).
- The upper bounds are randomized.
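The thread doesn't pin down a specific definition of multicalibration error. As a rough illustration only, here is a minimal Python sketch of one common variant, assuming predictions in [0, 1], groups given as hypothetical predicate functions, and a hypothetical `n_buckets` discretization of prediction values. For each (group, bucket) pair it sums the residuals (label minus prediction), normalizes by the sample size, and reports the worst case. This is not the estimator or algorithm from the paper.

```python
from collections import defaultdict
from typing import Callable, Sequence


def multicalibration_error(
    preds: Sequence[float],
    labels: Sequence[float],
    xs: Sequence[object],
    groups: Sequence[Callable[[object], bool]],
    n_buckets: int = 10,  # hypothetical discretization of predictions
) -> float:
    """Max over (group, bucket) of |(1/n) * sum of (y - p) over points in both|.

    This is one assumed l_inf-style definition, used here for illustration.
    """
    n = len(preds)
    worst = 0.0
    for g in groups:
        # Residual mass per prediction bucket, restricted to group g.
        residuals = defaultdict(float)
        for p, y, x in zip(preds, labels, xs):
            if g(x):
                # Clamp p = 1.0 into the top bucket.
                b = min(int(p * n_buckets), n_buckets - 1)
                residuals[b] += y - p
        for total in residuals.values():
            worst = max(worst, abs(total) / n)
    return worst


if __name__ == "__main__":
    # Constant 0.5 predictions are calibrated on everyone,
    # but badly miscalibrated on the "even x" subgroup.
    preds = [0.5, 0.5, 0.5, 0.5]
    labels = [1, 0, 1, 0]
    xs = [0, 1, 2, 3]
    groups = [lambda x: True, lambda x: x % 2 == 0]
    print(multicalibration_error(preds, labels, xs, groups))  # 0.25
```

The toy run shows the gap between marginal and multicalibration that the thread mentions: with only the "everyone" group the error is 0.0, and adding one subgroup exposes an error of 0.25.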
FORC 2026 has an excellent set of accepted papers, topics ranging from privacy, fairness, and calibration, to mechanism design, reasoning, and watermarking.
Check 'em out at the conference on June 3 - 5 at Harvard. Registration is open (and free!). Travel support deadline: 4/24