Main reasons why LLMs are bad at multi-layered, abstraction-heavy domains like the Langlands program:
> LLMs can come up with one or two tricks, but they can't build theory in any significant way, and this kind of domain usually requires new methods or settings to begin with.
> Combinatorics and similar domains are definition-light, which makes them entirely about the tricks. Arguments are usually short. Definition-heavy domains require juggling many layers of abstract concepts and moving back and forth, many times over, between the global level (where we are in the overall proof, e.g. we want to prove an equivalence of categories) and local computations (e.g. computing cohomology groups of some ad hoc defined schemes). This is a bit like Go, where you make local plays that are useful globally, but here the board is basically infinite.
I've been thinking about how to overcome this with auto-research methods, and it looks genuinely hard. You could sometimes solve it by throwing far more compute at it (context, Monte Carlo brute-force attempts, AVO-style evolution, etc.), but this is definitely not the right solution. We need a new understanding of what local-global, multi-layer problems involve.
There are math worlds inaccessible to even the best internal AI models.