Large language models (LLMs) are increasingly used to simulate human users in interactive settings. However, they often drift from their assigned personas, contradict earlier statements, or abandon role-appropriate behavior. We introduce a framework for evaluating persona consistency in LLM-generated dialogue and improving it with multi-turn reinforcement learning (RL). We define three automatic metrics and validate each against human annotations.