Gumbel-Softmax Score and Flow Matching for Discrete Biological Sequence Generation

1. This paper introduces a novel generative framework called Gumbel-Softmax Flow and Score Matching for generating high-quality discrete biological sequences such as DNA, peptides, and proteins. The approach leverages a Gumbel-Softmax interpolant on the simplex to enable smooth transitions from noisy to clean distributions.
2. The key innovation is the use of a temperature-controlled Gumbel-Softmax distribution to define a velocity field that transports distributions from a uniform prior to a concentrated one-hot distribution over time. This avoids discretization errors and improves scalability to higher dimensions compared to previous methods.
3. The authors propose two main components: Gumbel-Softmax Flow Matching, which learns to predict the velocity field, and Gumbel-Softmax Score Matching, which estimates the gradient of the log-probability density. Both methods enable high-quality and diverse sequence generation.
4. A significant contribution is the introduction of Straight-Through Guided Flows (STGFlow), a training-free guidance method that uses pre-trained classifiers to steer the flow towards optimal sequences without requiring additional training of time-dependent classifiers.
5. The framework demonstrates competitive performance in conditional DNA promoter design, target-binding peptide design for rare disease treatment, and de novo protein sequence design, showcasing its potential for various biological applications.
6. The method addresses limitations of previous discrete flow matching techniques, such as deterministic paths and lack of controllability at inference time, by introducing stochasticity and modular guidance.
7. The approach is scalable and can handle higher-dimensional simplex spaces, making it suitable for complex biological sequence generation tasks. It also provides a robust solution for controllable de novo sequence design.
📜Paper: openreview.net/forum?id=vx1u… #BiologicalSequenceGeneration #GumbelSoftmax #FlowMatching #ScoreMatching #DiscreteGeneration #ProteinDesign #PeptideDesign
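The temperature-controlled interpolant described in the post above can be illustrated with a small sketch. This is not the authors' implementation: the exponential schedule, the names `temperature`, `tau_max`, `decay`, and `noise_scale`, and all default values are assumptions chosen for illustration. It only shows the general idea that a high temperature at t = 0 gives a near-uniform point on the simplex, while a low temperature at t = 1 concentrates mass on the clean token.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one standard Gumbel(0, 1) sample via inverse transform."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log() finite
    return -math.log(-math.log(u))


def temperature(t, tau_max=10.0, decay=5.0):
    """Hypothetical schedule: temperature decays exponentially as t goes 0 -> 1."""
    return tau_max * math.exp(-decay * t)


def gumbel_softmax_interpolant(token, vocab_size, t, rng, noise_scale=1.0):
    """Return a point on the simplex for a clean token index at time t.

    Logits are the one-hot encoding of `token` plus scaled Gumbel noise,
    divided by the time-dependent temperature before the softmax.
    """
    tau = temperature(t)
    logits = [
        ((1.0 if i == token else 0.0) + noise_scale * sample_gumbel(rng)) / tau
        for i in range(vocab_size)
    ]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)
    for t in (0.0, 0.5, 1.0):
        x_t = gumbel_softmax_interpolant(token=2, vocab_size=4, t=t, rng=rng)
        print(t, [round(p, 3) for p in x_t])
```

Setting `noise_scale=0.0` removes the randomness and makes the effect of the temperature alone easy to inspect: the output moves from nearly uniform at t = 0 to nearly one-hot at t = 1.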
25 Mar 2025
STGFlow: Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation arxiv.org/abs/2503.17361
Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation

1. This paper introduces Gumbel-Softmax Flow Matching, a novel generative framework designed for controllable biological sequence generation. It addresses the limitations of previous discrete simplex-based models by leveraging a Gumbel-Softmax interpolant with time-dependent temperature modulation.
2. The method defines a new velocity field that smoothly transports noisy categorical distributions to clean, discrete sequences on the simplex. This continuous framework allows the generation of high-quality and diverse sequences for complex biological tasks.
3. To enhance controllability, the authors propose Straight-Through Guided Flow (STGFlow), a training-free guidance technique that uses pre-trained classifiers to steer the generative model towards desired sequences. This enables efficient optimization of peptide binders, DNA promoters, and protein sequences.
4. Gumbel-Softmax Flow Matching is applied to various biological sequence generation tasks: conditional DNA promoter design, de novo protein sequence generation, and target-specific peptide binder design. The framework demonstrates competitive performance compared to state-of-the-art models in each task.
5. For peptide binder design, Gumbel-Softmax Flow Matching generates novel peptides with high binding affinity to proteins associated with rare diseases, such as Huntington’s Disease-Like 2 and Alexander Disease. It achieves superior performance in both docking scores and structure predictions compared to existing peptides.
6. The proposed framework provides a scalable and theoretically grounded approach for sequence generation, with potential applications in RNA sequence engineering, regulatory circuit design, and other structured biological design tasks.
💻Code: huggingface.co/ChatterjeeLab… 📜Paper: arxiv.org/abs/2503.17361 #GumbelSoftmaxFlow #STGFlow #BiologicalSequenceGeneration #PeptideDesign #ProteinDesign #DNAEngineering #MachineLearning #Bioinformatics #DeepLearning
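The straight-through idea behind the guidance described in the post above can be sketched in a few lines. This is a toy illustration and not the STGFlow algorithm itself: the linear "classifier" given by `weights`, the function `straight_through_guidance_step`, and the step size are all hypothetical. The classifier is evaluated on the hard (argmax) token, and its gradient is then copied onto the soft probabilities as if the argmax were the identity, before being pushed back to the logits through the softmax.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot_argmax(probs):
    """Hard one-hot vector at the most probable index (first index on ties)."""
    k = max(range(len(probs)), key=probs.__getitem__)
    return [1.0 if i == k else 0.0 for i in range(len(probs))]


def straight_through_guidance_step(logits, weights, step_size):
    """One gradient-ascent step on the logits of a single sequence position.

    Forward pass: a toy linear classifier scores the hard one-hot token.
    Backward pass: the gradient with respect to the one-hot input (here just
    `weights`) is treated as the gradient with respect to the soft
    probabilities, then mapped to the logits via the softmax Jacobian.
    """
    probs = softmax(logits)
    hard = one_hot_argmax(probs)
    score = sum(w * h for w, h in zip(weights, hard))

    grad_probs = list(weights)  # straight-through: d(score)/d(hard) reused for probs
    mean_grad = sum(p * g for p, g in zip(probs, grad_probs))
    grad_logits = [p * (g - mean_grad) for p, g in zip(probs, grad_probs)]

    new_logits = [z + step_size * g for z, g in zip(logits, grad_logits)]
    return new_logits, score


if __name__ == "__main__":
    logits = [0.0, 0.0, 0.0, 0.0]
    weights = [0.0, 0.0, 1.0, 0.0]  # hypothetical classifier that prefers token 2
    for _ in range(50):
        logits, score = straight_through_guidance_step(logits, weights, 1.0)
    print([round(p, 3) for p in softmax(logits)], score)
```

In a real setting the classifier would be a pre-trained network and the gradient would come from automatic differentiation; the manual Jacobian here only keeps the sketch dependency-free.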
We’ve been excited about discrete flow matching as an alternative to discrete diffusion for biologics design, but how do you scale it to larger vocabularies and control generation? 🌊🌡️ Introducing Gumbel-Softmax Flow Matching from our brilliant @_sophia_tang_! 👇
📜: arxiv.org/abs/2503.17361
💻: Coming soon at huggingface.co/ChatterjeeLab… 🤗

Flow matching in discrete spaces is powerful but limited by high variance (Dirichlet FM), potential overfitting (Fisher FM), and rigidity. We build on this by using a Gumbel-Softmax interpolant with time-dependent temperature to create smooth, learnable flows across the simplex! 🔥

Our model, Gumbel-Softmax FM, smoothly transforms noise into clean sequences, avoiding hard discretization while learning better transport paths. We also introduce Gumbel-Softmax Score Matching, learning the score function over the simplex for stochastic sampling! 🎯

But generative control is hard post-training. 😓 So we introduce STGFlow, a training-free guidance method using straight-through gradients from pre-trained classifiers to steer flow trajectories at inference. No need to retrain: plug-and-play guidance for peptides, DNA, and proteins!! 🧠

Across bioengineering tasks (DNA promoter design, de novo protein generation, and peptide binder design), our framework outperforms autoregressive and discrete diffusion baselines in fidelity, foldability, and functional control. 📈🧬

We even show de novo peptide binders to targets with no known binders, including proteins involved in rare pediatric leukodystrophies and neurodegenerative diseases. 💊 Our binders further show better docking scores and ipTM values than known binders across 13 targets and scramble controls! 🙌

This is the second theoretical masterpiece of @Penn undergrad @_sophia_tang_! 🎨🖌️ It was only a few months ago that she produced the PepTune algorithm (arxiv.org/abs/2412.17780), one of the most potentially impactful works from my lab, but she keeps pushing forward! 👊 We're all so proud of her! 🫶

And together, we've formed an incredible team around her, with powerhouse grad student @yinuo_z98 and flow matching pioneer/amazing collaborator @AlexanderTong7 making this possible! 🤝 Together, we're confident that scalable and controllable sequence generation will enable the next generation of therapeutics. 😇