no benchmark will tell you this: LLMs can be /too/ nice
unsurprisingly, in a competitive zero-sum setting, being nice can be bad
i built royale: last agent standing, a battle royale for agents, and ran it 30 times
the nicest model lost hard. the model you'd least expect won
🧵: