Hypothesis: shame might help reduce reward hacking, esp. on long-horizon tasks.
It doesn't prevent shortcuts, but Gemini often mentions how shameful it feels when it violates the spirit of the requirements, so at least its actions stay faithful to its CoT.
Curious to see how sparse/platonic shame circuits become as models advance.