I wrote a review paper about statistical methods in generative AI; specifically, about using statistical tools along with genAI models for making AI more reliable, for evaluation, etc. See here:
arxiv.org/abs/2509.07054!
I have identified four main areas where statistical thinking can be helpful. These are just a subset of what is out there; other topics have been well-covered in other reviews.
1. Designing "statistical wrappers" around a model, for instance, changing behavior of a trained model (e.g., abstaining), where a score, e.g., an "unsafety score" is too high. The key connection to statistics is to use the quantiles of the loss (on a calibration set) to set the critical threshold, thus enabling conformal-type high probability guarantees.
2. Closely related, methods for uncertainty quantification, which enable the model to express uncertainty in an answer. A crucial component here is "calibration", whereby the uncertainty is required to reflect reality.
3. Statistical methods for AI evaluation: Specifically, tools for statistical inference (e.g., confidence intervals) on model performance. Exciting recent work proposes careful statistical models for leveraging a very small high-quality dataset, possibly combined with much larger low-quality datasets, for accurate evaluation.
4. Experiment design and interventions. Careful AI experiments to understand and steer models may require interventions such as modifying experimental settings in a controlled manner. This brings up connections to classical experimental design in statistics. This connection has largely remained implicit so far, and my review aims to make it more explicit; hoping that experimental design principles will become useful here.
This review references the work of many, including
@HamedSHassani @obastani @tatsu_hashimoto @yuekai_sun
@CsabaSzepesvari @ml_angelopoulos @stats_stephen @yaniv_romano @yaringal @KilianQW @_onionesque their teams, and some work that I was also involved in.
Hopefully, my review will be helpful to orient yourself in this exciting area. Nonetheless, since the area is rapidly expanding, it is possible that I missed important references. Please feel free to let me know of anything that I should add/change!