OpenAI just released "Detecting Misbehavior in Frontier Reasoning Models".
Frontier reasoning models often state their intent plainly in their chain-of-thought, which makes it possible for a monitor reading that reasoning to detect misbehavior.
If models learn to hide their intent, catching reward hacking (exploiting flaws in the training objective) will become much harder.
Chain-of-thought monitoring is a key tool for ensuring AI remains aligned as it becomes more powerful.