Deep Learning for Biomarker Discovery in Cancer Genomes
1. This study introduces a novel deep learning framework for identifying clinically relevant biomarkers—Microsatellite Instability (MSI) and Homologous Recombination Deficiency (HRD)—directly from somatic mutation data in cancer genomes.
2. Leveraging next-generation sequencing (NGS) data from over 3,000 cancer patients, the proposed method uses an end-to-end attention-based multiple instance learning (attMIL) architecture, outperforming traditional machine learning (ML) approaches.
3. For MSI prediction, the model reaches 98% accuracy, 95% sensitivity, and 100% specificity in external validation, outperforming established ML tools.
4. For HRD prediction, the model reaches 80% accuracy and captures biologically meaningful patterns related to alternative DNA repair pathways such as microhomology-mediated end joining (MMEJ).
5. Unlike traditional methods that rely on manual feature engineering, this deep learning approach processes unfiltered mutation data, reducing information loss and uncovering new genomic insights.
6. The explainability techniques employed—such as attention scoring and clustering—highlight the biological plausibility of the model, aligning predictions with known DNA damage repair signatures.
7. The framework adapts to targeted sequencing panels like FoundationOne CDx and TruSight Oncology, maintaining high performance even with reduced data, showcasing its clinical applicability.
8. This study opens new doors for precision oncology by providing an interpretable, high-performing, and scalable deep learning toolkit for biomarker discovery.
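For readers new to attention-based multiple instance learning, here is a minimal sketch of the pooling step in its generic textbook form. It is not the authors' implementation. Each patient is treated as a "bag" of mutation embeddings, every mutation gets an attention score, and the bag is summarized as the attention-weighted average. The function name `attention_pool`, the parameters `V` and `w`, and the toy numbers are all hypothetical and chosen only for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def attention_pool(
    instances: List[Vector], V: List[Vector], w: Vector
) -> Tuple[Vector, Vector]:
    """Generic attention pooling over a bag of instances.

    instances: K embeddings of dimension d (one per mutation; hypothetical).
    V: hidden-by-d projection matrix (learned in a real model, fixed here).
    w: hidden-dimensional scoring vector (learned in a real model, fixed here).
    Returns (pooled bag embedding, attention weights).
    """
    if not instances:
        raise ValueError("bag must contain at least one instance")

    # Raw attention score per instance: w . tanh(V h)
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(a * b for a, b in zip(w, hidden)))

    # Softmax over the bag, shifted by the max score for numerical stability
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Bag embedding: attention-weighted sum of the instance embeddings
    dim = len(instances[0])
    pooled = [sum(a * h[j] for a, h in zip(weights, instances)) for j in range(dim)]
    return pooled, weights


# Toy bag of three 2-dimensional "mutation embeddings" (made-up values)
bag = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
pooled, weights = attention_pool(bag, V=[[1.0, 0.0]], w=[2.0])
print(weights)  # one attention weight per mutation; the weights sum to 1
```

In a trained model the weights are what make the prediction inspectable: mutations with high attention are the ones the classifier relied on for that patient.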
@jnkath @StefanFrohling @am0ck @VibertJulien @zigutyte @michaela_un
📜Paper:
biorxiv.org/content/10.1101/…
#DeepLearning #CancerGenomics #AIinHealthcare #PrecisionOncology