Boosting LLM’s Molecular Structure Elucidation with Knowledge Enhanced Tree Search Reasoning
1. This work introduces K-MSE, a plug-and-play framework that enhances LLMs for molecular structure elucidation by integrating Monte Carlo Tree Search (MCTS) with domain knowledge and a specialized reward model.
2. K-MSE boosts LLM performance by over 20% on GPT-4o-mini and GPT-4o across multiple metrics, including structural accuracy and chemical fingerprint similarity, enabling more reliable prediction of molecular structures from spectral data.
3. At the heart of K-MSE is a molecular substructure knowledge base that augments LLMs’ understanding of chemical space. It pairs over 500 commonly occurring substructures with natural-language descriptions, expanding the model’s ability to interpret unfamiliar or rare compounds.
4. A key innovation is a specialized molecule-spectrum scorer trained to evaluate how well a predicted molecular structure matches the input spectra (¹³C-NMR and ¹H-NMR). Unlike an LLM critiquing its own answer, this scorer provides precise, quantitative feedback by combining graph-based molecular encoders with fingerprint features.
5. The scorer also serves as a retriever: it links the spectral input to relevant substructures in the knowledge base and supplies these chemically plausible building blocks to the LLM to guide its reasoning.
6. K-MSE uses MCTS to iteratively refine candidate structures through cycles of critique and rewriting. It evaluates each candidate molecule with the scorer and favors high-reward paths, strengthening the model’s ability to correct itself and explore chemical alternatives.
7. Ablation studies show that removing either the scorer or the knowledge base causes large drops in accuracy, confirming that both components are critical. Supplementing the critique phase with molecular images and formulas further improves the model’s reasoning.
8. Compared with existing reasoning frameworks (e.g., Chain-of-Thought (CoT), Self-Refine, and Multi-Agent Debate (MAD)), K-MSE achieves higher accuracy with comparable or lower token usage, making it an efficient test-time enhancement.
9. K-MSE is flexible: it can be integrated with various LLMs and consistently improves performance across models from LLaMA-3 to OpenAI o1, indicating broad applicability without retraining.
10. The work opens a path toward more autonomous chemical analysis systems, suggesting that LLMs paired with domain knowledge and tailored reasoning frameworks can serve as copilots in experimental workflows.
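For readers unfamiliar with the fingerprint-similarity metric in item 2: a common choice (assumed here, since the post does not name the exact metric) is the Tanimoto coefficient, the number of fingerprint bits two molecules share divided by the number of bits set in either one. A minimal sketch, representing each fingerprint as a set of on-bit indices. On real molecules you would compute fingerprints with a cheminformatics toolkit such as RDKit, whose `DataStructs.TanimotoSimilarity` performs this comparison.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 1.0  # convention chosen for this sketch: two empty fingerprints count as identical
    shared = len(fp_a & fp_b)                      # bits set in both fingerprints
    return shared / (len(fp_a) + len(fp_b) - shared)  # shared / bits set in either


print(tanimoto({1, 2, 3}, {2, 3, 4}))  # 0.5
```

A score of 1.0 means identical fingerprints and 0.0 means no shared bits, so averaging it over a test set gives a softer signal than exact-match structural accuracy.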
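Item 5 describes the scorer doubling as a retriever. One plausible reading, sketched under loud assumptions: the scorer embeds the spectrum and each knowledge-base substructure into a shared vector space, and the top-k most similar substructures are handed to the LLM as hints. The embeddings, the `retrieve_substructures` helper, and the toy knowledge base are hypothetical stand-ins, not K-MSE’s actual code.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_substructures(spectrum_emb, knowledge_base, k=2):
    """Hypothetical retriever: rank substructures by similarity to the spectrum embedding."""
    ranked = sorted(knowledge_base,
                    key=lambda entry: cosine(spectrum_emb, entry["embedding"]),
                    reverse=True)
    return [(entry["smiles"], entry["description"]) for entry in ranked[:k]]


# Toy knowledge base with made-up 3-d embeddings (a real encoder would produce these).
kb = [
    {"smiles": "c1ccccc1", "description": "aromatic six-membered ring", "embedding": [0.9, 0.1, 0.0]},
    {"smiles": "C(=O)O", "description": "carboxylic acid group", "embedding": [0.1, 0.9, 0.2]},
    {"smiles": "O", "description": "hydroxyl group", "embedding": [0.0, 0.2, 0.9]},
]
top = retrieve_substructures([0.8, 0.3, 0.1], kb, k=2)
for smiles, description in top:
    print(smiles, "-", description)  # these hints would be added to the LLM's context
```

Pairing each SMILES fragment with its natural-language description mirrors item 3: the LLM receives both a structural building block and an explanation it can reason over.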
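Item 6’s search loop can be pictured with a generic MCTS skeleton. This is a minimal sketch, not the K-MSE implementation: `propose_revisions` is a hypothetical stand-in for the LLM’s critique-and-rewrite step, `score` stands in for the molecule-spectrum scorer, and the UCT selection rule is the textbook default, assumed because the post does not specify one.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    answer: str                          # candidate structure, e.g. a SMILES string
    parent: "Node | None" = None
    children: list = field(default_factory=list)
    visits: int = 0
    value: float = 0.0                   # sum of rewards backed up through this node

    def uct(self, c: float) -> float:
        """Upper Confidence bound for Trees: mean reward plus an exploration bonus."""
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(math.log(self.parent.visits) / self.visits)


def mcts_refine(initial, propose_revisions, score, iterations=20, c=1.4):
    root = Node(initial)
    best = (score(initial), initial)
    for _ in range(iterations):
        # 1. Selection: descend by UCT while the current node is fully expanded.
        node = root
        while node.children and len(node.children) == len(propose_revisions(node.answer)):
            node = max(node.children, key=lambda ch: ch.uct(c))
        # 2. Expansion: add one untried revision (the critique-and-rewrite step).
        tried = {ch.answer for ch in node.children}
        untried = [r for r in propose_revisions(node.answer) if r not in tried]
        if untried:
            child = Node(untried[0], parent=node)
            node.children.append(child)
            node = child
        # 3. Evaluation: reward from the (stand-in) molecule-spectrum scorer.
        reward = score(node.answer)
        best = max(best, (reward, node.answer))
        # 4. Backpropagation: update visit counts and values up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return best[1]


# Toy revision graph and scores (made up for illustration).
revisions = {"CCC": ["CCO", "CCN"], "CCO": ["CC(=O)O", "OCCO"]}
scores = {"CCC": 0.1, "CCO": 0.5, "CCN": 0.3, "CC(=O)O": 0.9, "OCCO": 0.4}
result = mcts_refine("CCC", lambda s: revisions.get(s, []), lambda s: scores[s], iterations=10)
print(result)  # CC(=O)O
```

Returning the best-scoring candidate seen anywhere in the tree, rather than the most-visited child of the root, fits a setting where the scorer acts as the final judge of spectral agreement.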
💻Code:
github.com/HICAI-ZJU/K-MSE
📜Paper:
arxiv.org/abs/2506.23056v1
#AI4Science #LLMs #Chemistry #MolecularStructure #SpectralAnalysis #DrugDiscovery #MonteCarlo #TreeSearch #GPT4 #MachineLearning