AI System Achieves Silver Medal-Level Score at the IMO
The International Mathematical Olympiad (IMO) is the oldest, largest and most prestigious competition for young mathematicians. Every year, countries send their top young mathematicians to sit a six-problem exam held over two days. A few years ago, mathematical AI systems were lucky to solve even 1 in 100 past IMO problems.
This year, @GoogleDeepMind tackled the IMO problems with a system that combined two components: AlphaGeometry 2, an improved version of our AlphaGeometry system, and AlphaProof, a new reinforcement-learning-based system for formal mathematical reasoning.
The results? During the week of the competition, the system fully solved 4 of the 6 problems on this year's exam, scoring 28 out of 42 points (each problem is worth 7). That places it at the top of the IMO Silver Medal range, one point short of the gold-medal threshold of 29, which 58 of the 609 contestants reached this year.
AlphaProof used a fine-tuned Gemini model to automatically translate natural-language problem statements into formal statements, building a large library of formal problems of varying difficulty. Over the weeks leading up to the competition, it then learned to solve IMO-style problems by proving or disproving millions of these problems across a wide range of difficulties and mathematical topics.
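To make "formal statements" and "proving or disproving" concrete, here is a toy illustration in Lean 4, the formal language AlphaProof works in. Both theorems are hypothetical, hand-written examples. They are not IMO problems and not AlphaProof output. They assume only core Lean 4, including its built-in `omega` tactic. The second theorem shows what disproving a claim means formally: you prove its negation.

```lean
-- Hypothetical toy example (not an IMO problem, not AlphaProof output).
-- Informal statement: "If n is odd, then n + 1 is even."
-- `omega` is Lean's decision procedure for linear arithmetic over Nat/Int;
-- it handles `%` by a numeric literal, so it closes this goal directly.
theorem odd_succ_even (n : Nat) (h : n % 2 = 1) : (n + 1) % 2 = 0 := by
  omega

-- Informal (false) statement: "Every natural number is even."
-- Disproving it means proving its negation; n = 1 is a counterexample,
-- and `decide` checks by evaluation that 1 % 2 = 0 does not hold.
theorem not_all_even : ¬ ∀ n : Nat, n % 2 = 0 :=
  fun h => absurd (h 1) (by decide)
```

Real IMO formalizations typically rely on Mathlib and need far longer proofs. The point here is only that a formal statement is machine-checkable, so a proof or disproof search can be verified automatically.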
“The fact that the program can come up with a non-obvious construction like this is very impressive, and well beyond what I thought was state of the art.”
— Prof Sir Timothy Gowers, IMO gold medalist and Fields Medal winner
Problem 4, a geometry problem, was solved by AlphaGeometry 2 in 19 seconds after it received the problem's formalization.
This is a major advance in the ability of AI systems to carry out complex mathematical reasoning correctly, at a level comparable to the world's best young mathematicians. We're excited about a future in which mathematicians work with AI tools to explore hypotheses, try bold new approaches to long-standing problems and quickly complete time-consuming parts of proofs, and in which AI systems like Gemini become more capable at math and broader reasoning.
See the blog post below for details:
deepmind.google/discover/blo…