Spotlighting our latest research accepted to the ICML 2026 Position Paper track: "Position: We Need A Unified Definition of Hallucination, Or: It's the World Model, Stupid!" 🎉
We keep saying LLMs hallucinate, but what does that really mean? Without a clear, unified definition, the field has characterized hallucination in disparate ways: as a failure of faithfulness, of factuality, or of calibration.
While these definitions work for basic QA, they do not extend naturally to multi-turn and agent-in-environment settings. For instance, evaluating "faithfulness to context" becomes inadequate when an agent intentionally chooses how its context builds up over time.
To address this, we argue for a neater characterization: casting hallucination as an internal world modeling failure. In other words, a model hallucinates when it begins making claims that contradict a reference world model (which might simply be fixed environment dynamics, rather than a neural model).
The Formal Definition: We introduce a reference world model 𝑊 = (𝑆, 𝐻, 𝑅), a conflict policy 𝑃, and a truth function 𝑇_(𝑊,𝑃). A model hallucinates with respect to (𝑊, 𝑃) if and only if it produces at least one atomic claim 𝑐 ∈ 𝒞(𝑦) such that 𝑇_(𝑊,𝑃)(𝑥, 𝑐) = false.
This framework matters for two main reasons. First, it lets hallucination benchmarks scale, since any environment with known dynamics can serve as the reference world model. Second, it formally handles parametric- vs. context-driven disagreements through the conflict policy 𝑃, which can simply be null where no such divergence exists.
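A minimal sketch of the definition above, not the paper's implementation. It assumes the world model 𝑊 can be flattened into a dictionary of facts, that the input 𝑥 is a dictionary of context facts, and that the atomic claims 𝒞(𝑦) have already been extracted. Every name here (truth, hallucinates, prefer_context) is hypothetical and chosen for illustration.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# All names below are hypothetical and chosen for illustration only.
Claim = Tuple[str, str]   # atomic claim c: (fact name, asserted value)
Facts = Dict[str, str]    # flat stand-in for the reference world model W
# P maps (world value, context value) to the value that counts; None is the null policy.
ConflictPolicy = Optional[Callable[[str, str], str]]


def truth(world: Facts, policy: ConflictPolicy, x: Facts, claim: Claim) -> bool:
    """Toy T_(W,P)(x, c): is claim c true under world W and conflict policy P?"""
    key, asserted = claim
    reference = world.get(key)  # assumption: a fact missing from W makes the claim false
    if policy is not None and key in x and reference is not None:
        reference = policy(reference, x[key])  # resolve a world-vs-context divergence
    return asserted == reference


def hallucinates(world: Facts, policy: ConflictPolicy, x: Facts,
                 claims: Iterable[Claim]) -> bool:
    """True iff at least one atomic claim is false under (W, P)."""
    return any(not truth(world, policy, x, c) for c in claims)


world = {"capital_of_france": "Paris"}
x = {"capital_of_france": "Lyon"}  # context that contradicts the world
prefer_context = lambda world_value, context_value: context_value

claims = [("capital_of_france", "Paris")]  # stand-in for C(y), assumed already extracted
print(hallucinates(world, None, x, claims))            # null policy: the world value counts
print(hallucinates(world, prefer_context, x, claims))  # the context value counts
```

The same claim is fine under the null policy and a hallucination under a context-preferring policy, which is the point of making 𝑃 an explicit part of the definition.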
Building on this definition, we are also excited to share our follow-up benchmark: HalluWorld. It measures hallucination by posing probes to an agent as it solves tasks in three environments with known, controllable dynamics: GridWorlds, Chess, and the Terminal.
Read the papers here:
📄ICML '26 Position Paper:
arxiv.org/abs/2512.21577
🌍️HalluWorld Preprint:
arxiv.org/abs/2605.19341v1
(See below for Fig 1 and a breakdown of our definitions!)
@VarunGangal