We’ve been excited about discrete flow matching as an alternative to discrete diffusion for biologics design, but how do you scale it to larger vocabularies and control generation? 🌊🌡️ Introducing Gumbel-Softmax Flow Matching from our brilliant @_sophia_tang_! 👇
📜: arxiv.org/abs/2503.17361
💻: Coming soon at huggingface.co/ChatterjeeLab… 🤗
Flow matching in discrete spaces is powerful but limited by high variance (Dirichlet FM), potential overfitting (Fisher FM), and rigidity. We build on this by using a Gumbel-Softmax interpolant with time-dependent temperature to create smooth, learnable flows across the simplex! 🔥
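To make the interpolant idea concrete, here is a minimal pure-Python sketch under our own assumptions, not the paper's implementation. The target token's logit is set to a hypothetical scale `delta`, Gumbel noise is added, and the result goes through a softmax at a temperature that decays over time via a hypothetical linear schedule (`tau_max`, `tau_min`). At t = 0 the high temperature gives a point near the center of the simplex; as t → 1 the low temperature pushes the point toward a vertex, usually the target one-hot. All function and parameter names are illustrative, and the paper's actual schedule and parameterization may differ.

```python
import math
import random


def gumbel_noise(rng: random.Random) -> float:
    """Sample standard Gumbel noise: -log(-log(U)) with U ~ Uniform(0, 1)."""
    u = rng.random()
    # Clamp away from 0 and 1 so neither log call hits log(0).
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def temperature(t, tau_max=10.0, tau_min=0.01):
    """Hypothetical schedule: linear decay from tau_max at t=0 to tau_min at t=1."""
    return tau_max * (1.0 - t) + tau_min * t


def gumbel_softmax_interpolant(target_idx, vocab_size, t, rng, delta=5.0):
    """Point on the probability simplex at time t, biased toward the target token.

    delta (hypothetical) is the logit given to the target token; every other
    token gets logit 0. Gumbel noise perturbs the logits, and the
    time-dependent temperature controls how sharp the resulting point is.
    """
    logits = [delta if i == target_idx else 0.0 for i in range(vocab_size)]
    tau = temperature(t)
    return softmax([(l + gumbel_noise(rng)) / tau for l in logits])
```

With `delta=5.0`, the target token wins the argmax at t = 1 in most but not all draws, because Gumbel noise can occasionally overturn the logit gap; a larger `delta` makes that rarer.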
Our model, Gumbel-Softmax FM, smoothly transforms noise into clean sequences—avoiding hard discretization while learning better transport paths. We also introduce Gumbel-Softmax Score Matching, learning the score function over the simplex for stochastic sampling! 🎯
But adding generative control after training is hard. 😓 So we introduce STGFlow, a training-free guidance method that uses straight-through gradients from pre-trained classifiers to steer flow trajectories at inference. No need to retrain: plug-and-play guidance for peptides, DNA, and proteins! 🧠
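Here is a toy sketch of the straight-through idea, assuming a hypothetical logistic classifier with per-position weights `W` and bias `b`. All names are illustrative, and the exponentiated-gradient update that keeps each position on the simplex is our own choice, not necessarily STGFlow's update rule. The classifier scores the hard argmax sequence, and its gradient with respect to that one-hot input is applied directly to the soft simplex point, as if discretization had no effect on the gradient.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def hard_one_hot(x):
    """Straight-through forward pass: per-position argmax -> one-hot (first index wins ties)."""
    out = []
    for probs in x:
        k = max(range(len(probs)), key=probs.__getitem__)
        out.append([1.0 if i == k else 0.0 for i in range(len(probs))])
    return out


def classifier_log_prob_and_grad(onehot, W, b):
    """Hypothetical logistic classifier: p = sigmoid(b + sum_{l,v} W[l][v] * onehot[l][v]).

    Returns log p and its gradient with respect to the one-hot input,
    which for this model is (1 - p) * W.
    """
    z = b + sum(W[l][v] * onehot[l][v] for l in range(len(W)) for v in range(len(W[l])))
    log_p = log_sigmoid(z)
    p = math.exp(log_p)
    grad = [[(1.0 - p) * w for w in row] for row in W]
    return log_p, grad


def straight_through_guidance_step(x, W, b, eta=1.0):
    """One guidance step on a soft sequence x (a list of per-position simplex vectors).

    The classifier gradient is computed at the hard one-hot sequence and passed
    straight through to x. The exponentiated-gradient (multiplicative) update
    keeps every position a valid probability vector.
    """
    _, grad = classifier_log_prob_and_grad(hard_one_hot(x), W, b)
    new_x = []
    for probs, g in zip(x, grad):
        unnorm = [p * math.exp(eta * gi) for p, gi in zip(probs, g)]
        s = sum(unnorm)
        new_x.append([u / s for u in unnorm])
    return new_x
```

For example, with `x = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]]`, `W = [[0.0, 0.0, 3.0], [0.0, 0.0, 3.0]]`, and `b = -3.0`, a single step with `eta=1.0` moves the argmax at both positions to token 2, the token the classifier favors. No classifier retraining or generator fine-tuning is involved; only the sample is nudged.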
Across bioengineering tasks—DNA promoter design, de novo protein generation, and peptide binder design—our framework outperforms autoregressive and discrete diffusion baselines in fidelity, foldability, and functional control.📈🧬
We even generate de novo peptide binders to targets with no known binders, including proteins involved in rare pediatric leukodystrophies and neurodegenerative diseases. 💊 Across 13 targets, our binders also achieve better docking scores and ipTM values than known binders and scrambled controls! 🙌
This is the second theoretical masterpiece of @Penn undergrad @_sophia_tang_! 🎨🖌️ It was only a few months ago that she produced the PepTune algorithm (arxiv.org/abs/2412.17780), one of the most potentially impactful works from my lab, but she keeps pushing forward! 👊 We're all so proud of her! 🫶
We've formed an incredible team around her, with powerhouse grad student @yinuo_z98 and flow matching pioneer and amazing collaborator @AlexanderTong7 making this possible! 🤝 Together, we're confident that scalable, controllable sequence generation will enable the next generation of therapeutics. 😇