Stephen Bates

Tweets

Pinned Tweet

Stephen Bates @stats_stephen

15 Oct 2021

📰 Excited to share our new work on risk control in prediction! Multiple testing leads to practical calibration algorithms with PAC guarantees for any statistical error rate. Works with any model and data distribution! arxiv.org/abs/2110.01052 #Statistics #MachineLearning

Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control

We introduce a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees. Our calibration algorithms work with any...

arxiv.org

Anastasios Nikolas Angelopoulos

@ml_angelopoulos

15 Oct 2021

Thrilled to share Learn then Test, a tool to calibrate any model to control risk (e.g., IOU, recall in object detection). No assumptions on model/data. See arXiv arxiv.org/abs/2110.01052 Colab colab.research.google.com/gi… ✍️w/@stats_stephen, E.J. Candes, M.I. Jordan, @lihua_lei_stat! 🧵1/n
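
The "calibration by multiple testing" idea in the two tweets above can be illustrated with a minimal sketch. It assumes losses bounded in [0, 1], a Hoeffding p-value for each candidate threshold, and a Bonferroni correction, which is one simple choice. The function names, the grid of four thresholds, and the synthetic losses are hypothetical and are not taken from the paper or its Colab.

```python
import numpy as np

def hoeffding_p_value(losses, alpha):
    """P-value for the null 'risk > alpha', for i.i.d. losses in [0, 1]."""
    n = len(losses)
    gap = max(alpha - float(np.mean(losses)), 0.0)
    return float(np.exp(-2.0 * n * gap ** 2))

def calibrate(loss_table, alpha, delta):
    """Return indices of thresholds certified to have risk <= alpha.

    loss_table[i, j] is the loss of calibration point i at threshold j.
    Bonferroni: null j is rejected when its p-value is at most delta / m,
    so all certified thresholds are valid together with prob. >= 1 - delta.
    """
    n, m = loss_table.shape
    p_values = np.array(
        [hoeffding_p_value(loss_table[:, j], alpha) for j in range(m)]
    )
    return np.flatnonzero(p_values <= delta / m)

# Synthetic calibration data (assumption): 0/1 losses for four thresholds.
rng = np.random.default_rng(0)
true_risks = np.array([0.02, 0.05, 0.30, 0.50])
loss_table = rng.binomial(1, true_risks, size=(2000, 4)).astype(float)

selected = calibrate(loss_table, alpha=0.10, delta=0.10)
print("certified threshold indices:", selected.tolist())
```

Any threshold returned by `calibrate` can be deployed; picking among them (for example, the least conservative one) is a separate design choice that this sketch leaves open.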

Stephen Bates @stats_stephen

Apr 3

Announcing the Statistical Frameworks for Uncertainty in Agentic Systems workshop at ICML '26!

Mahmoud Hegazy @oumatheu

Mar 31

Excited that our ICML 2026 workshop Statistical Frameworks for Uncertainty in Agentic Systems got accepted 🎉 @icmlconf #icml2026 We want to bring together people thinking about uncertainty and agentic systems.

Stephen Bates retweeted

Cai Zhou

@zhuci19

Apr 2

(1/5) Modern reasoning systems rely on test-time scaling: CoT, self-consistency, MCTS... But two challenges remain:
1️⃣ Confidence signals shift across tasks/prompts
2️⃣ Stopping decisions are typically static and heuristic
We ask: Can we adapt confidence within each reasoning trajectory, while still preserving statistical guarantees?
Calibrating LLM reasoning in test-time scaling is not new. But what if calibration itself could adapt online, at test time, to the specific reasoning trajectory of each instance?
Our new paper proposes a Test-Time Training framework for calibrating generalizable LLM reasoning, enabling instance-level adaptation with distribution-level robustness.
Paper: arxiv.org/abs/2604.01170

Online Reasoning Calibration: Test-Time Training Enables...

While test-time scaling has enabled large language models to solve highly difficult tasks, state-of-the-art results come at exorbitant compute costs. These inefficiencies can be attributed to the...

arxiv.org

Stephen Bates retweeted

Anastasios Nikolas Angelopoulos

@ml_angelopoulos

Feb 24

Today I'm sharing a preprint on conformal risk control for non-monotonic losses, a paper three years in the making. The key idea: validity of conformal can be reframed as a consequence of algorithmic stability. Therefore, any stable algorithm inherits a conformal guarantee. 🧵

Stephen Bates @stats_stephen

19 Dec 2025

Postdoc opportunity: if you do ML/stat/applied math/… and want to work at the frontier of biology, come join us! 🤖 🧬

Eric and Wendy Schmidt Center @Schmidt_Center

10 Nov 2025

Interested in pursuing #machinelearning, #appliedmathematics, #statistics, or #computationalresearch to work on biomedical problems at the @broadinstitute? Apply to become a @Schmidt_Center postdoctoral associate: broad.io/ewsc-postdoc

Stephen Bates retweeted

Eric and Wendy Schmidt Center @Schmidt_Center

9 Dec 2025

🎉 Our new machine learning challenge – Obesity ML Competition: Tackling Metabolic Diseases – is officially open! Register, watch our introduction videos and lecture series, and begin coding today: broad.io/MLC-2025 @broadinstitute @crunchDAO

Stephen Bates @stats_stephen

11 Dec 2025

Exciting research internship!

Clara Fannjiang @clara_fannjiang

10 Dec 2025

we're hiring a Ph.D. intern! join us @genentech in South San Francisco for a summer advancing ML & statistical approaches for clinical trial design & analysis 📉💊DMs are open, feel free to reach out! 🔗tinyurl.com/yc3hfndp

Stephen Bates retweeted

Edgar Dobriban @EdgarDobriban

10 Dec 2025

I wrote a review paper about statistical methods in generative AI; specifically, about using statistical tools along with genAI models for making AI more reliable, for evaluation, etc. See here: arxiv.org/abs/2509.07054! I have identified four main areas where statistical thinking can be helpful. These are just a subset of what is out there; other topics have been well covered in other reviews.
1. Designing "statistical wrappers" around a model, for instance, changing the behavior of a trained model (e.g., abstaining) when a score, e.g., an "unsafety score", is too high. The key connection to statistics is to use the quantiles of the loss (on a calibration set) to set the critical threshold, thus enabling conformal-type high probability guarantees.
2. Closely related, methods for uncertainty quantification, which enable the model to express uncertainty in an answer. A crucial component here is "calibration", whereby the uncertainty is required to reflect reality.
3. Statistical methods for AI evaluation: specifically, tools for statistical inference (e.g., confidence intervals) on model performance. Exciting recent work proposes careful statistical models for leveraging a very small high-quality dataset, possibly combined with much larger low-quality datasets, for accurate evaluation.
4. Experiment design and interventions. Careful AI experiments to understand and steer models may require interventions such as modifying experimental settings in a controlled manner. This brings up connections to classical experimental design in statistics. This connection has largely remained implicit so far, and my review aims to make it more explicit, hoping that experimental design principles will become useful here.
This review references the work of many, including @HamedSHassani @obastani @tatsu_hashimoto @yuekai_sun @CsabaSzepesvari @ml_angelopoulos @stats_stephen @yaniv_romano @yaringal @KilianQW @_onionesque, and their teams, and some work that I was also involved in.
Hopefully, my review will be helpful to orient yourself in this exciting area. Nonetheless, since the area is rapidly expanding, it is possible that I missed important references. Please feel free to let me know of anything that I should add/change!
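
As a rough illustration of the first area in the tweet above (a "statistical wrapper" whose threshold comes from a calibration-set quantile), here is a minimal sketch using the standard split-conformal quantile. It assumes the calibration scores are "unsafety scores" of outputs known to be acceptable and that new acceptable outputs are exchangeable with them; under those assumptions a new acceptable output is flagged with probability at most `alpha`. The function names and the toy scores are hypothetical and are not taken from the review.

```python
import math
import numpy as np

def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: never abstain.
        return float("inf")
    return float(np.sort(cal_scores)[k - 1])

def should_abstain(score, threshold):
    """Abstain when the unsafety score exceeds the calibrated threshold."""
    return score > threshold

# Toy calibration scores (assumption): 99 evenly spaced values in (0, 1).
cal_scores = np.arange(1, 100) / 100
threshold = conformal_threshold(cal_scores, alpha=0.1)

for new_score in (0.25, 0.97):
    print(new_score, "abstain" if should_abstain(new_score, threshold) else "answer")
```

The `(n + 1)` correction, rather than a plain empirical quantile, is what gives the finite-sample guarantee; with very small calibration sets the threshold becomes infinite and the wrapper never abstains.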

Stephen Bates retweeted

Aaron Roth @Aaroth

8 Nov 2025

If you work at the intersection of CS and economics (or think your work is of interest to those who do!) consider submitting to the ESIF Economics and AI ML meeting this summer at Cornell: econometricsociety.org/regio…

2026 ESIF Economics and AI ML Meeting - The Econometric Society

2026 ESIF Economics and AI ML Meeting (ESIF-AIML2026) June 16-17, 2026 Cornell University Department...

econometricsociety.org

Stephen Bates retweeted

Cai Zhou

@zhuci19

13 Oct 2025

(1/5) Beyond Next-Token Prediction, introducing Next Semantic Scale Prediction! Our @NeurIPSConf NeurIPS 2025 paper HDLM is out! Check out the new language modeling paradigm: Next Semantic Scale Prediction via Hierarchical Diffusion Language Models. It largely generalizes Masked Diffusion Models (MDM), and provides progressive denoising capability for each token at the semantic level. Minimal computational overhead, much better results! arxiv: arxiv.org/abs/2510.08632 code: github.com/zhouc20/HDLM

Stephen Bates retweeted

Sherrie Wang @sherwang

8 Sep 2025

Happy to share that our paper on how to obtain reliable statistical inferences from satellite-based maps is now published in Remote Sensing of Environment!

Stephen Bates retweeted

U.S. National Science Foundation

@NSF

13 Jun 2025

Today, NSF announced an add’l 500 NSF Graduate Research Fellowship Program awardees for the 2025-2026 cohort, bringing the total to approx 1,500. #NSFGRFP supports grad students as they pursue their dreams, build STEM skills, & become the next generation of innovators & leaders.

Stephen Bates retweeted

Jessica Hullman @JessicaHullman

17 May 2025

📢If you're interested in conformal prediction, algorithms w/predictions, robust stats & connections between them from a theory perspective, join us for a workshop at #COLT2025 in Lyon 🇫🇷 June 30! Submit a poster description by May 25, more here: vaidehi8913.github.io/predic…

Stephen Bates retweeted

Massachusetts Institute of Technology (MIT)

@MIT

15 May 2025

Imagine a world without MIT.

Stephen Bates retweeted

Sharon Li

@SharonYixuanLi

14 Mar 2025

Our paper notifications are out! Congratulations to the authors and look forward to an exciting lineup of discussions. Stay tuned for more details! #ICLR2025

Sharon Li

@SharonYixuanLi

17 Jan 2025

We're organizing the "Quantify Uncertainty and Hallucination in Foundation Models" workshop at #ICLR2025! 📢 Call for Papers: Submit your work by February 2, 2025 (AOE). 🔗 More details: uncertainty-foundation-model… Look forward to seeing your submission and participation in the workshop.

Stephen Bates retweeted

COPSS @COPSSNews

7 Mar 2025

🙌🎉Our 2025 recipient of the COPSS Presidents' Award is Lester Mackey! This award is given annually to a young member of the statistical community in recognition of outstanding contributions to the profession of statistics.

Stephen Bates retweeted

Sherrie Wang @sherwang

12 Feb 2025

📢 We are hiring a postdoc to work on remote sensing of soil carbon and land degradation! 🌱🗺️ The position will be hosted by the Earth Intelligence Lab & @mitenergy, with an earliest start date of April 2025. To apply: forms.gle/9iDJRX4nG7odXJLa9

Stephen Bates retweeted

Aaron Roth @Aaroth

6 Feb 2025

What are prediction sets good for? It turns out just as calibration is the "right" way of quantifying uncertainty for risk-neutral (expectation maximizing) decision makers, prediction sets are the right way of quantifying uncertainty for risk-averse decision makers.

Stephen Bates @stats_stephen

4 Feb 2025

Data sets are often partly made up of machine-learning outputs. E.g., we take satellite images and then use algs to label forests, roads, etc. How can we do statistical analysis with ML outputs? We extend Prediction-Powered Inference to arbitrary patterns of ML imputations👇

Stephen Bates @stats_stephen

4 Feb 2025

We show how to get confidence intervals with a bootstrap algorithm that accounts for the systematic imperfection in the ML outputs and also the statistical uncertainty due to the limited amount of ground-truth. This works for linear reg, logistic reg, and other estimands.

Stephen Bates @stats_stephen

4 Feb 2025

Importantly, the algorithm applies when the ground-truth data is not a uniform random sample, but instead a weighted, stratified, or clustered random sample. Joint work with Dan Kluger, Tijana Zrnic, Kerri Lu, and @sherwang from @MITLIDS @MIT_SCC @MIT @mitidss
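
To make the thread above more concrete, here is a deliberately simplified sketch of a prediction-powered estimate with a percentile bootstrap interval. It is not the authors' algorithm: it assumes the estimand is a plain mean, that the ground-truth subset is a uniform random sample (the thread's weighted, stratified, and clustered cases are not handled), and that predictions are available for both the labeled and the unlabeled points. All function and variable names are hypothetical.

```python
import numpy as np

def ppi_mean(y_lab, pred_lab, pred_unlab):
    """Mean of ML predictions, corrected by their average error on labeled data."""
    return float(np.mean(pred_unlab) + np.mean(y_lab - pred_lab))

def ppi_bootstrap_ci(y_lab, pred_lab, pred_unlab, level=0.9, n_boot=2000, seed=0):
    """Percentile bootstrap interval, resampling labeled and unlabeled points."""
    rng = np.random.default_rng(seed)
    n, big_n = len(y_lab), len(pred_unlab)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        i = rng.integers(0, n, size=n)          # resample ground-truth pairs
        j = rng.integers(0, big_n, size=big_n)  # resample ML-only points
        estimates[b] = ppi_mean(y_lab[i], pred_lab[i], pred_unlab[j])
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(estimates, [tail, 1.0 - tail])
    return float(lo), float(hi)

# Synthetic data (assumption): predictions are noisy and biased upward by 0.5.
rng = np.random.default_rng(1)
truth_lab = rng.normal(2.0, 1.0, size=200)
pred_lab = truth_lab + 0.5 + rng.normal(0.0, 0.3, size=200)
pred_unlab = rng.normal(2.0, 1.0, size=5000) + 0.5 + rng.normal(0.0, 0.3, size=5000)

print("estimate:", ppi_mean(truth_lab, pred_lab, pred_unlab))
print("90% interval:", ppi_bootstrap_ci(truth_lab, pred_lab, pred_unlab))
```

Resampling the labeled pairs captures the uncertainty from the limited ground truth, while the correction term `y_lab - pred_lab` removes the systematic error of the predictions; a non-uniform sampling design would need weights or a design-aware resampling scheme in both places.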
