Empowering Chemical Structures with Biological Insights for Scalable Phenotypic Virtual Screening
1. DECODE introduces a new paradigm that turns raw SMILES into biologically meaningful fingerprints: it learns from paired transcriptomic and morphological profiles during training but requires only chemical structures at inference.
2. The framework treats biological modalities as privileged information: it aligns chemical embeddings with a measurement‑invariant consensus space while ignoring assay‑specific noise, allowing the model to infer functional effects for unseen compounds.
3. A geometric disentanglement module splits each modality into a shared biological signal and an orthogonal, modality‑specific noise component. A contrastive loss then forces the chemical encoder to match the consensus, yielding a fingerprint that is robust to assay noise.
4. In zero‑shot drug retrieval, DECODE identifies functionally equivalent compounds with over 20 % higher top‑5 recall than traditional chemical similarity baselines, correctly clustering drugs that share mechanisms despite divergent scaffolds.
5. For sparse‑label mechanism‑of‑action classification, the method yields a 15–20 % F1‑score boost over expert MLPs, demonstrating that the consensus space filters out conflicting experimental artifacts that degrade standard fusion approaches.
6. A Generate‑Refine‑Enhance pipeline augments virtual screening: synthetic transcriptomic and morphological profiles are generated, refined, and combined with the structural encoding, achieving a six‑fold increase in hit rates for novel anti‑cancer agents compared to structure‑only models.
7. Ablation studies confirm that both the modality‑alignment phase and the orthogonality constraint are essential; removing either leads to significant drops in retrieval, MOA prediction, and hit‑rate performance.
8. Future work will embed context‑aware injection to capture tissue‑specific responses and integrate foundation models for richer biological feature extraction, further tightening the bridge between chemistry and phenotypic biology.
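To make points 3 and 7 concrete, here is a minimal NumPy sketch of the two ingredients named there: an orthogonality penalty between the shared and modality‑specific components, and an InfoNCE‑style contrastive loss that pulls chemical embeddings toward the consensus. This is not the DECODE implementation. The function names, the temperature, the toy data, and the choice to build the consensus by simple averaging are all assumptions made for illustration; see the repository for the actual method.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (eps guards against zero rows)."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def orthogonality_penalty(shared, specific):
    """Mean squared cosine between shared and modality-specific parts.

    Hypothetical form of the orthogonality constraint: 0 when every
    sample's two components are orthogonal, 1 when they are parallel.
    """
    cos = np.sum(l2_normalize(shared) * l2_normalize(specific), axis=1)
    return float(np.mean(cos ** 2))

def info_nce(chem, consensus, temperature=0.1):
    """InfoNCE loss: row i of `chem` should match row i of `consensus`.

    The temperature value is an assumption, not taken from the paper.
    """
    logits = l2_normalize(chem) @ l2_normalize(consensus).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

# Toy data standing in for encoder outputs (purely synthetic).
rng = np.random.default_rng(0)
n, d = 8, 16
shared_rna = rng.normal(size=(n, d))                      # shared part, transcriptomics
shared_img = shared_rna + 0.1 * rng.normal(size=(n, d))   # shared part, morphology
consensus = 0.5 * (shared_rna + shared_img)               # assumed: simple average

# Build a modality-specific component that is orthogonal to the shared
# one by projecting the shared direction out of random noise.
noise = rng.normal(size=(n, d))
unit_shared = l2_normalize(shared_rna)
specific_rna = noise - np.sum(noise * unit_shared, axis=1, keepdims=True) * unit_shared

ortho_ok = orthogonality_penalty(shared_rna, specific_rna)    # near 0
ortho_bad = orthogonality_penalty(shared_rna, shared_rna)     # near 1
aligned_loss = info_nce(consensus, consensus)                 # perfectly aligned encoder
random_loss = info_nce(rng.normal(size=(n, d)), consensus)    # untrained encoder
```

In a training loop the two terms would typically be summed with a weighting coefficient and minimized jointly. Dropping the orthogonality term corresponds to one of the ablations in point 7.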
💻Code:
github.com/lian-xiao/DECODE
📜Paper:
arxiv.org/abs/2603.15006
#DrugDiscovery #Chemoinformatics #PhenotypicScreening #MachineLearning #VirtualScreening #Bioinformatics #AIinMedicine #DeepLearning #CompoundProfiling #MOAPrediction #HitRateImprovement #StructureBasedDesign #PharmaTech #ComputationalBiology #OpenSource