Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation
1. This paper introduces Gumbel-Softmax Flow Matching, a generative framework for controllable biological sequence generation. It addresses limitations of earlier simplex-based discrete models by using a Gumbel-Softmax interpolant with a time-dependent temperature.
2. The method defines a new velocity field that smoothly transports noisy categorical distributions on the simplex to clean, discrete sequences. Working in this continuous space allows the model to generate high-quality, diverse sequences for complex biological tasks.
3. To enhance controllability, the authors propose Straight-Through Guided Flow (STGFlow), a training-free guidance technique that uses pre-trained classifiers to steer the generative model towards desired sequences. This enables efficient optimization of peptide binders, DNA promoters, and protein sequences.
4. Gumbel-Softmax Flow Matching is applied to various biological sequence generation tasks: conditional DNA promoter design, de novo protein sequence generation, and target-specific peptide binder design. The framework demonstrates competitive performance compared to state-of-the-art models in each task.
5. For peptide binder design, Gumbel-Softmax Flow Matching generates novel peptides with high binding affinity to proteins associated with rare diseases, such as Huntington’s Disease-Like 2 and Alexander Disease. The generated peptides outperform existing peptides in both docking scores and structure predictions.
6. The proposed framework provides a scalable and theoretically grounded approach for sequence generation, with potential applications in RNA sequence engineering, regulatory circuit design, and other structured biological design tasks.
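To make points 1 and 2 concrete, here is a minimal NumPy sketch of a Gumbel-Softmax interpolant with a temperature that shrinks over time. It is an illustration under stated assumptions, not the authors' implementation: the exponential schedule, the function names, and the parameters tau_max, decay, and noise_scale are all hypothetical, and the paper's exact parameterization may differ.

```python
import numpy as np

def temperature(t, tau_max=10.0, decay=3.0):
    # Assumed schedule: exponential decay from tau_max at t=0 (hypothetical values).
    return tau_max * np.exp(-decay * t)

def gumbel_softmax_interpolant(x1_onehot, t, rng, noise_scale=1.0):
    # Perturb the clean one-hot tokens with Gumbel(0, 1) noise, then apply a
    # softmax at temperature tau(t). A high temperature (early t) gives a
    # near-uniform point on the simplex; a low temperature (late t) gives a
    # sharper one.
    g = rng.gumbel(size=x1_onehot.shape)
    logits = (x1_onehot + noise_scale * g) / temperature(t)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
tokens = np.array([0, 2, 1, 3])   # toy sequence over a 4-letter alphabet
x1 = np.eye(4)[tokens]            # clean one-hot sequence, shape (4, 4)
for t in (0.0, 0.5, 1.0):
    xt = gumbel_softmax_interpolant(x1, t, rng)
    print(t, xt.max(axis=-1).round(2))
```

Each row of the output is a point on the probability simplex for one sequence position. In this sketch, how close the final state gets to a clean one-hot vector depends on the assumed schedule and noise scale.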
💻Code:
huggingface.co/ChatterjeeLab…
📜Paper:
arxiv.org/abs/2503.17361
#GumbelSoftmaxFlow #STGFlow #BiologicalSequenceGeneration #PeptideDesign #ProteinDesign #DNAEngineering #MachineLearning #Bioinformatics #DeepLearning