scHDeepInsight: A Hierarchical Deep Learning Framework for Precise Immune Cell Annotation in Single-Cell RNA-seq Data
1.scHDeepInsight introduces a novel hierarchical deep learning framework for immune cell annotation in scRNA-seq, achieving an average accuracy of 93.2% across seven diverse tissue datasets. It significantly outperforms previous state-of-the-art models, with a 5.1% boost in accuracy compared to scDeepInsight.
2.Unlike traditional flat classifiers, scHDeepInsight preserves the biological hierarchy of immune cells. It uses a two-level prediction strategy (first predicting broad immune cell types, or base-types, then refining predictions to specific subtypes) that reflects known lineage relationships.
3.The method converts gene expression profiles into structured 2D images using DeepInsight, allowing convolutional neural networks (CNNs) to extract both global and fine-grained transcriptomic patterns. This spatial representation helps capture complex gene-gene relationships.
4.A key innovation is the Adaptive Hierarchical Focal Loss (AHFL), which dynamically balances the classification loss at the base-type and subtype levels, adapting training focus based on task difficulty and addressing class imbalance in rare subtypes.
5.scHDeepInsight incorporates STACAS for batch effect correction and applies random masking during training to improve robustness against missing gene features, enabling cross-dataset generalization.
6.The framework includes SHAP-based interpretability, quantifying the contribution of each gene to classification decisions at both base and subtype levels. This reveals both canonical markers (like CD8A for CD8 T cells) and subtle subtype-specific signatures (e.g., IGHA1 for IgA plasma cells).
7.The model also identifies rare and novel immune cell populations. In glioblastoma data, it detected glioma-associated immune cells by recognizing high base-type confidence but low subtype confidence, a pattern that suggests novel states outside the training reference.
8.Comprehensive benchmarking shows that scHDeepInsight consistently outperforms methods like SingleR, Azimuth, CellTypist, GPTCellType, and Garnett across accuracy, precision, F1-score, and AUPRC, even for challenging, closely related immune subtypes.
9.In specific case studies (e.g., the Pranzatelli labial gland dataset), scHDeepInsight distinguished IgA, IgG, and IgM plasma cell subtypes that were grouped together by other models, highlighting its resolution and biological fidelity.
10.The reference atlas used to train the model includes over 460,000 immune cells from 10 public scRNA-seq datasets, spanning 15 major immune lineages and 50 subtypes. This comprehensive dataset provides a robust foundation for hierarchical learning.
11.Future directions for scHDeepInsight include extending the hierarchy to non-immune cells, incorporating multi-omics (e.g., CITE-seq, spatial transcriptomics), and applying self-supervised or transfer learning to improve adaptability to new datasets or species.
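To make the two-level prediction (point 2) and the confidence-based novelty signal (point 7) concrete, here is a minimal sketch in plain Python. It is not the scHDeepInsight implementation: the tiny hierarchy, the function names, the choice to renormalize subtype scores within the predicted base type, and the thresholds `base_min` and `sub_max` are all assumptions made for illustration.

```python
import math

# Hypothetical two-level hierarchy; labels are illustrative, not the paper's atlas.
HIERARCHY = {
    "T cell": ["CD4 T", "CD8 T"],
    "B cell": ["Naive B", "Plasma"],
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def annotate(base_logits, subtype_logits, base_min=0.9, sub_max=0.6):
    """Predict a base type, then the best subtype among its children.

    base_logits:    dict mapping base type -> raw score
    subtype_logits: dict mapping subtype   -> raw score
    base_min / sub_max are assumed thresholds, not values from the paper.
    """
    base_names = list(base_logits)
    base_probs = softmax([base_logits[b] for b in base_names])
    i = max(range(len(base_names)), key=base_probs.__getitem__)
    base, base_conf = base_names[i], base_probs[i]

    # Assumption: subtype scores are renormalized within the chosen base type.
    children = HIERARCHY[base]
    sub_probs = softmax([subtype_logits[s] for s in children])
    j = max(range(len(children)), key=sub_probs.__getitem__)
    subtype, sub_conf = children[j], sub_probs[j]

    # Confident base type plus uncertain subtype: flag as a candidate novel state.
    candidate_novel = base_conf >= base_min and sub_conf <= sub_max
    return {
        "base": base,
        "base_conf": base_conf,
        "subtype": subtype,
        "sub_conf": sub_conf,
        "candidate_novel": candidate_novel,
    }


if __name__ == "__main__":
    cell = annotate(
        {"T cell": 5.0, "B cell": 0.0},
        {"CD4 T": 0.1, "CD8 T": 0.0, "Naive B": 0.0, "Plasma": 3.0},
    )
    print(cell["base"], cell["subtype"], cell["candidate_novel"])
```

In this toy input the base type is confident while the two T cell subtypes are nearly tied, so the cell is flagged for review rather than forced into a subtype.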
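The loss described in point 4 builds on focal loss, which down-weights easy examples so that training concentrates on hard or rare classes. The sketch below shows the standard per-sample focal loss and a simple weighted sum across the two levels. The thread does not give the AHFL formula, so the fixed `base_weight` here is only a stand-in for whatever dynamic balancing the paper uses, and `two_level_loss` is a hypothetical name.

```python
import math


def focal_loss(prob_true, gamma=2.0, eps=1e-12):
    """Standard focal loss for one sample.

    prob_true: predicted probability of the true class.
    gamma:     focusing parameter; gamma=0 reduces to cross-entropy.
    """
    p = min(max(prob_true, eps), 1.0)  # clamp to avoid log(0)
    return -((1.0 - p) ** gamma) * math.log(p)


def two_level_loss(base_prob_true, sub_prob_true, base_weight=0.5, gamma=2.0):
    """Weighted sum of base-type and subtype focal losses.

    Assumption: a fixed weight is used for illustration only; the paper's
    AHFL adapts this balance during training.
    """
    return (
        base_weight * focal_loss(base_prob_true, gamma)
        + (1.0 - base_weight) * focal_loss(sub_prob_true, gamma)
    )


if __name__ == "__main__":
    # An easy base-type call and a hard subtype call: the subtype term dominates.
    print(two_level_loss(0.95, 0.40))
```

With `gamma` above zero, a well-classified sample contributes far less than under cross-entropy, which is why this family of losses is often chosen when rare subtypes are heavily outnumbered.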
💻Code:
github.com/shangruJia/scHDee…
📜Paper:
doi.org/10.1101/2025.06.23.6…
#scRNAseq #DeepLearning #CellTypeAnnotation #Immunology #SingleCell #CNN #Bioinformatics