That is why we built MindGuard-testset: a public benchmark of multi-turn mental health conversations annotated at the turn level by licensed clinical psychologists.
It lets us evaluate safety the way it actually shows up in care: across the context of a whole conversation, not in isolated prompts.