Over the past month, I built a system that lets LLMs play full games of Catan against each other. I also built a viewer so you can replay past games right in your browser.
I ran 2 full games, and Gemini 3.5 Flash won both. More details below!
Thank you for reading so far, and please reach out if any of this sounds interesting to you. I'm actively working on new evals and have a few more projects of this sort in the pipeline. The next one is going to be even more interesting!
An unintended side effect of always working with agents is that you can trace back through your thought process and see exactly how your thinking evolved over the course of a project.