NEW paper from Microsoft Research.
(bookmark it)
The entire interpretability literature is built around human readers. As more analysis gets delegated to agents, the right target of interpretability shifts. This paper is a recipe for designing tools that agents can actually reason about.
They introduce Agentic-imodels, an autoresearch loop where a coding agent (Claude Code, Codex) iteratively evolves scikit-learn-compatible regressors that are simultaneously accurate AND readable by other LLMs.
Interpretability is measured by whether a small LLM can simulate the fitted model's behavior just by reading its string representation. Predictions, feature effects, counterfactuals, all from the __str__ output alone.
Run on 65 tabular datasets, the discovered models push the Pareto frontier past every classical interpretable baseline (decision trees, GAMs, sparse linear), and improve four downstream agentic data science systems on the BLADE benchmark by 8% to 73%.
Paper:
arxiv.org/abs/2605.03808
Learn to build effective AI agents in our academy:
academy.dair.ai/