Efficient and Programmable Exploration of Synthesizable Chemical Space
1. A new study introduces PrexSyn, a model for molecular discovery that explores chemical space efficiently while ensuring that generated molecules are synthesizable. The authors report near-perfect coverage of synthesizable molecules.
2. PrexSyn uses a decoder-only transformer trained on a billion-scale dataset of synthesizable pathways paired with molecular properties. Training on these pairs lets the model generate molecules from property prompts and supports programmable objectives expressed as logical queries.
3. The model demonstrates state-of-the-art performance in chemical space projection, achieving a 94% reconstruction rate on the Enamine REAL space and a significant improvement in similarity scores compared to previous methods.
4. PrexSyn is sample-efficient when optimizing black-box oracle functions, outperforming both synthesis-agnostic and synthesis-based methods. The authors attribute this to its well-structured query space, which makes optimization more tractable than in discrete molecular graph spaces.
5. PrexSyn also handles composite property queries: users specify complex molecular objectives by combining property conditions with logical operators. This makes the model more flexible and more applicable to real-world drug discovery.
6. PrexSyn's high-throughput data generation engine enables billion-scale training with modest computational resources. This innovation significantly reduces training time and cost while improving model performance.
7. The model's effectiveness is demonstrated in tasks such as scaffold hopping and docking-based molecular optimization, where PrexSyn generates high-scoring molecules with improved properties compared to existing baselines.
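To illustrate the idea of composite queries from point 5, here is a minimal Python sketch of how property conditions might be combined with logical operators. This is not PrexSyn's API: the helper names (`in_range`, `all_of`, `not_`), the property keys, and the candidate values are all hypothetical and chosen only for illustration. The sketch filters a fixed list of candidates, whereas PrexSyn generates molecules from the query.

```python
from typing import Callable, Dict

# A property record maps property names to values (hypothetical keys).
Props = Dict[str, float]
# A query scores a record: 1.0 if satisfied, 0.0 otherwise.
Query = Callable[[Props], float]


def in_range(name: str, low: float, high: float) -> Query:
    """Condition: the named property lies within [low, high]."""
    return lambda props: 1.0 if low <= props[name] <= high else 0.0


def all_of(*queries: Query) -> Query:
    """Logical AND: satisfied only if every sub-query is satisfied."""
    return lambda props: min(q(props) for q in queries)


def not_(query: Query) -> Query:
    """Logical NOT: inverts the score of the sub-query."""
    return lambda props: 1.0 - query(props)


# Made-up candidates with made-up property values (not computed from real molecules).
candidates = {
    "A": {"logp": 2.1, "mw": 320.0},
    "B": {"logp": 2.5, "mw": 610.0},
    "C": {"logp": 4.2, "mw": 300.0},
}

# Composite objective: logP between 1 and 3 AND NOT (molecular weight >= 500).
query = all_of(
    in_range("logp", 1.0, 3.0),
    not_(in_range("mw", 500.0, float("inf"))),
)

matches = [name for name, props in candidates.items() if query(props) == 1.0]
print(matches)
```

Representing each condition as a function keeps the operators composable: any query built this way can itself be nested inside another `all_of` or `not_`.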
📜Paper:
arxiv.org/abs/2512.00384v1
💻Code:
github.com/luost26/PrexSyn
#PrexSyn #MolecularDesign #ChemicalSpace #Synthesizability #AIinChemistry #DrugDiscovery