Enhancing anticancer peptide discovery: A fusion-centric framework with conditional diffusion for prediction and generation
1 UACD-ACPs is a unified pipeline that performs both ACP classification and cancer-type-aware peptide generation. It targets a practical gap in the field: prediction models rarely connect to controllable design or to downstream structural and biophysical validation.
2 The key classification idea is “diffusion-inspired” noise conditioning: instead of running a full diffusion model for classification, the encoder injects a stochastic time-step embedding (plus cancer-type label embeddings) into the fused peptide features, with the aim of learning smoother, more robust decision boundaries, especially under imbalanced multi-class labels.
3 Representation is explicitly fusion-centric: ProtBERT semantic embeddings (1024-d) are concatenated with physicochemical descriptors (430-d; e.g., composition, dipeptides, pI, hydrophobicity), creating a heterogeneous feature space intended to capture both sequence semantics and interpretable biochemical signals.
4 MECS (Multiscale Embedding Compression Strategy) is introduced to reduce redundancy while preserving global and local cues: it combines channel attention (avg/max/median pooling fed through shared MLP-like convolutions) with spatial attention built from multiscale depthwise, asymmetric convolutions (e.g., 1×7, 7×1, 1×11, 11×1, 1×21, 21×1) to highlight functionally important regions.
5 Class imbalance is handled with a pragmatic pairing: SMOTE oversampling at the minibatch level plus class-weighted cross-entropy, reinforced by the noise-conditioned training signal, which acts as a regularizer against overfitting to majority classes.
6 On a 9-class cancer-type ACP dataset (e.g., breast, cervical, colon, gastric, HCC, leukemia, lung, prostate, histiocytic lymphoma), the classifier reports strong multi-class performance: AUC-ROC 0.99 ± 0.01, accuracy 0.94 ± 0.01, F1 0.93 ± 0.01, recall 0.93 ± 0.01, outperforming baselines such as ACPScanner, mACPpred, ANNprob-ACPs, XGBoost, and LightGBM.
7 The generation module uses a conditional denoising diffusion model to synthesize peptide representations that are later decoded into sequences of length 8–50, with outputs organized by cancer type to support targeted downstream screening rather than “one-bucket” generic ACP generation.
8 Two generation-specific fusion blocks are proposed to improve biological fidelity: BFM (Bitemporal Fusion Module), which captures features at multiple receptive fields across temporal states, and TFAM (Temporal Feature Attention Module), which applies channel and spatial attention to two temporal branches and fuses them with softmax-weighted interactions.
9 The paper emphasizes multi-layer validation beyond sequence metrics: BLAST local alignments to check non-trivial similarity patterns, physicochemical profiling (charge, hydrophobicity, instability, aromaticity), disorder prediction (IUPred2A), structure modeling (AlphaFold2; pLDDT > 70 treated as reasonable for short peptides), and secondary-structure annotation (DSSP).
10 Biophysical plausibility is probed with simulations and docking: 100 ns all-atom MD (GROMACS, CHARMM36, DOPC:POPS membrane) to assess the stability of peptide–membrane interactions, plus qualitative HER2 kinase-domain docking (HawkDock/HDOCK) to compare feasible binding poses and residue contacts, framed as prioritization evidence rather than affinity claims.
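To make points 2 and 3 concrete, here is a minimal sketch of the fuse-then-condition step. It is not the authors' code. The 1024-d and 430-d sizes come from the thread; everything else is an assumption for illustration: the sinusoidal form of the time-step embedding, the `MAX_T = 1000` step count, adding (rather than concatenating) the embeddings, and the helper names `sinusoidal_embedding` and `condition_features`.

```python
import math
import random

PROTBERT_DIM = 1024   # ProtBERT embedding size (from the thread)
PHYSCHEM_DIM = 430    # physicochemical descriptor size (from the thread)
NUM_CLASSES = 9       # cancer types in the dataset (from the thread)
MAX_T = 1000          # ASSUMPTION: number of diffusion-style time steps


def sinusoidal_embedding(t, dim):
    """Sinusoidal encoding of an integer time step (ASSUMED form, dim must be even)."""
    half = dim // 2
    emb = []
    for i in range(half):
        freq = math.exp(-math.log(10000.0) * i / half)
        emb.append(math.sin(t * freq))
        emb.append(math.cos(t * freq))
    return emb


def condition_features(protbert_vec, physchem_vec, label_emb, rng):
    """Concatenate both feature views, then add time-step and label embeddings.

    Returns the conditioned vector and the sampled time step.
    """
    fused = list(protbert_vec) + list(physchem_vec)   # 1024 + 430 = 1454 values
    t = rng.randrange(MAX_T)                           # stochastic time step
    t_emb = sinusoidal_embedding(t, len(fused))
    # ASSUMPTION: additive injection; the paper may combine these differently.
    conditioned = [f + te + le for f, te, le in zip(fused, t_emb, label_emb)]
    return conditioned, t
```

In a real model `label_emb` would be a learned row of a 9 × 1454 embedding table, and a fresh `t` would be drawn for every training example so the encoder sees the same peptide under many perturbations.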
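The channel-attention half of MECS (point 4) can be illustrated with a small pure-Python sketch. This is a rough illustration under stated assumptions, not the published architecture: the three pooled descriptors (avg, max, median) are passed through one shared two-layer MLP, the branch outputs are summed, and a sigmoid produces one gate per channel. The summation, the hidden size, and the names `shared_mlp` and `channel_attention` are hypothetical; the multiscale spatial branch is omitted.

```python
import math
import statistics


def shared_mlp(vec, w1, w2):
    """Two-layer MLP with ReLU; w1 is hidden x C, w2 is C x hidden."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(features, w1, w2):
    """Gate each channel of a C x L feature map.

    Returns (gated features, per-channel gates in (0, 1)).
    """
    pooled = [
        [sum(ch) / len(ch) for ch in features],        # average pooling
        [max(ch) for ch in features],                  # max pooling
        [statistics.median(ch) for ch in features],    # median pooling
    ]
    # The same MLP weights are applied to all three pooled descriptors.
    branches = [shared_mlp(p, w1, w2) for p in pooled]
    # ASSUMPTION: branches are summed before the sigmoid.
    gates = [1.0 / (1.0 + math.exp(-sum(vals))) for vals in zip(*branches)]
    gated = [[g * x for x in ch] for g, ch in zip(gates, features)]
    return gated, gates
```

Median pooling is less sensitive to a single outlying position than max pooling, which is one plausible reason to add it alongside the usual avg/max pair.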
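For the class-weighted cross-entropy in point 5, a minimal sketch follows. The thread does not say how the weights are chosen, so the inverse-frequency rule w_c = N / (K · n_c) used here is an assumption, as are the function names. The SMOTE step is left out.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """ASSUMED weighting: w_c = N / (K * n_c); classes absent from `labels` get 0."""
    counts = Counter(labels)
    n = len(labels)
    return [
        n / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def weighted_cross_entropy(logits, labels, weights):
    """Weighted mean of -log softmax(logits)[label], normalised by the weight sum."""
    total, weight_sum = 0.0, 0.0
    for row, y in zip(logits, labels):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += weights[y] * (log_z - row[y])
        weight_sum += weights[y]
    return total / weight_sum
```

With a 3:1 label split, the rare class gets weight 2.0 and the common class about 0.67, so a mistake on a rare-class peptide costs roughly three times as much.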
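The physicochemical profiling in point 9 and the 8–50 length window in point 7 are easy to approximate for a quick screen. The sketch below is an assumption-laden simplification, not the paper's pipeline: net charge is a plain count of K/R minus D/E (ignoring histidine, termini, and pH), hydrophobicity is the mean Kyte-Doolittle value (GRAVY), and aromaticity is the fraction of F/W/Y. The instability index is omitted, and the names `profile` and `passes_length_filter` are hypothetical.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AROMATIC = set("FWY")


def profile(seq):
    """Return simple descriptors for a peptide made of standard residues."""
    seq = seq.upper()
    if not seq or not set(seq) <= set(KYTE_DOOLITTLE):
        raise ValueError("expected a non-empty sequence of standard amino acids")
    n = len(seq)
    positive = sum(seq.count(a) for a in "KR")
    negative = sum(seq.count(a) for a in "DE")
    return {
        "length": n,
        "net_charge": positive - negative,   # crude integer estimate
        "gravy": sum(KYTE_DOOLITTLE[a] for a in seq) / n,
        "aromaticity": sum(a in AROMATIC for a in seq) / n,
    }


def passes_length_filter(seq, lo=8, hi=50):
    """Length window for generated peptides (8-50, as stated in the thread)."""
    return lo <= len(seq) <= hi
```

Calling `profile("KLAKLAKK")` gives a net charge of +4 and zero aromaticity, the kind of cationic profile often associated with membrane-active peptides.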
💻Code:
github.com/yidingneng/ACP-Co…
📜Paper:
doi.org/10.1371/journal.pcbi…
#ComputationalBiology #Bioinformatics #PeptideDesign #AnticancerPeptides #DiffusionModels #ProteinLanguageModels #DeepLearning #DrugDiscovery #MolecularDynamics #StructurePrediction