ProT-GFDM: A Generative Fractional Diffusion Model for Protein Generation
1. ProT-GFDM introduces a novel generative protein design framework that leverages fractional Brownian motion (fBm) in diffusion models to improve the generation of protein backbone structures by capturing long-range dependencies in data.
2. Unlike traditional generative models relying on standard Brownian motion (BM), ProT-GFDM employs a Markov approximation of fBm (MA-fBm), enabling the modeling of temporal memory effects and non-local correlations, which are essential for protein structural coherence.
3. The model is formulated as a continuous-time score-based diffusion process, and samples can be generated by solving either the stochastic differential equation (SDE) or its deterministic counterpart, the probability-flow ODE (PF-ODE), providing flexibility in sampling and inference.
4. ProT-GFDM models protein structures via α-carbon distance maps derived from the Protein Data Bank (PDB), using them as the target representation in a diffusion-based generative pipeline.
5. Experimental evaluations show that ProT-GFDM outperforms classical score-based models (e.g., VP-SDE) with a 7.19% increase in density, 5.66% improvement in coverage, and 1.01% reduction in FID, particularly when using higher Hurst indices and appropriate solvers.
6. The model supports both linear and cosine noise schedules, with the cosine schedule offering better sample fidelity under low Hurst settings and the linear schedule excelling in diversity for high Hurst values (e.g., H=0.8).
7. ProT-GFDM uses a conditional U-Net architecture to estimate the score function and is trained on noise-perturbed data with score-matching techniques, including augmented and sliced score matching.
8. A variety of sampling strategies, including Euler–Maruyama, Langevin-based predictor-corrector (PC) samplers, and classical ODE solvers like RK4, are benchmarked to assess performance tradeoffs in speed, fidelity, and diversity.
9. The use of fractional noise processes enables the generation of protein structures that are both more diverse and structurally coherent, pushing the boundaries of deep generative modeling in structural bioinformatics.
10. ProT-GFDM presents a promising step forward in generative protein modeling, combining theoretical rigor in stochastic dynamics with empirical improvements in sample quality and efficiency for applications in protein design and computational drug discovery.
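To make the target representation in point 4 concrete, here is a minimal sketch of an α-carbon distance map: the symmetric matrix of pairwise Euclidean distances between Cα atoms. The function name `ca_distance_map` and the toy coordinates are assumptions for illustration only. They are not taken from the paper, which may preprocess PDB entries differently (cropping, padding, normalization).

```python
import math


def ca_distance_map(coords):
    """Return the symmetric matrix of pairwise Euclidean distances.

    `coords` is a sequence of (x, y, z) tuples, one per alpha-carbon.
    Hypothetical helper for illustration, not the paper's pipeline.
    """
    n = len(coords)
    dmap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            # A distance map is symmetric, so fill both triangles.
            dmap[i][j] = d
            dmap[j][i] = d
    return dmap


# Toy coordinates in angstroms (made up, not from a real PDB entry).
toy_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
dmap = ca_distance_map(toy_ca)
```

The resulting matrix has a zero diagonal and is invariant to rotating or translating the structure, which is one reason distance maps are a convenient image-like target for a U-Net-based diffusion model.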
📜Paper:
arxiv.org/abs/2504.21092
#ProteinDesign #GenerativeModels #DiffusionModels #FractionalDynamics #Bioinformatics #DeepLearning #ComputationalBiology #ScoreBasedModels #StochasticProcesses #AI4Science