Reward-Guided Discrete Diffusion via Clean-Sample Markov Chain for Molecule and Biological Sequence Design
1. The paper introduces the CSMC (Clean-Sample Markov Chain) Sampler, a method for reward-guided sampling in discrete diffusion models that operates entirely on clean samples, so it never needs the noisy intermediate rewards that existing approaches depend on.
2. Unlike prior methods such as SMC and SVDD that rely on intermediate rewards computed from approximate x0 predictions, CSMC constructs a Markov chain of fully denoised samples using the Metropolis-Hastings algorithm, enabling accurate reward evaluation at every step.
3. The key innovation lies in a forward-backward proposal distribution: CSMC corrupts a clean sample through the forward diffusion process, then denoises it back to create a candidate sample, making the acceptance probability tractable without requiring the intractable clean sample probability.
4. This matters most for scientific applications such as molecule and DNA sequence design, where reward functions are often non-smooth: a single token change in a SMILES string can drop a drug-likeness score to zero or make the molecule invalid.
5. CSMC demonstrates consistent state-of-the-art performance across four reward functions (QED, ring count, synthetic accessibility, and HepG2 enhancer activity) on QM9, ZINC250K, and MPRA datasets, outperforming Best-of-N, SMC, SVDD, and even training-based methods like D-CFG.
6. The method applies across discrete diffusion frameworks, including masked diffusion models (MDMs), uniform state models (USMs), and continuous-time Markov chain approaches (SEDD-M, SEDD-U), unlike SGDD, which only works with uniform CTMC models.
7. CSMC-B, a batched variant, achieves comparable rewards with significantly reduced wall-clock time (from 3029 s to 334 s), making it practical for large-scale molecular design campaigns without sacrificing sample diversity.
8. The Markov chain exhibits fast mixing with autocorrelation decaying within 2000 iterations, and the method maintains high sample diversity (Tanimoto similarity < 0.2 for molecules, cosine similarity < 0.3 for DNA sequences) despite targeting high-reward regions.
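To make points 2 and 3 concrete, here is a minimal toy sketch of a Metropolis-Hastings chain over clean sequences with a corrupt-then-resample proposal. It is not the paper's implementation: the alphabet, `toy_reward`, `propose`, `accept_prob`, and `run_chain` are all hypothetical names, and the learned denoiser is replaced by uniform resampling. Under that assumption the proposal is symmetric, so the acceptance probability reduces to a reward ratio; the actual CSMC acceptance rule with a trained diffusion model may differ.

```python
import math
import random

VOCAB = "ACGT"  # toy alphabet (assumption, stands in for SMILES or DNA tokens)


def toy_reward(seq):
    """Hypothetical reward: fraction of G/C tokens in the clean sequence."""
    return sum(ch in "GC" for ch in seq) / len(seq)


def propose(seq, k, rng):
    """Corrupt k random positions, then 'denoise' them.

    Uniform resampling stands in for a learned denoiser (assumption),
    which makes this proposal symmetric.
    """
    positions = rng.sample(range(len(seq)), k)
    out = list(seq)
    for i in positions:
        out[i] = rng.choice(VOCAB)
    return "".join(out)


def accept_prob(r_old, r_new, alpha):
    """MH acceptance for a symmetric proposal and target proportional to exp(r / alpha)."""
    return math.exp(min(0.0, (r_new - r_old) / alpha))


def run_chain(length=20, steps=500, k=4, alpha=0.05, seed=0):
    """Run the toy chain; every state is a clean sample, so rewards are exact."""
    rng = random.Random(seed)
    x = "".join(rng.choice(VOCAB) for _ in range(length))
    r = toy_reward(x)
    chain = [x]
    for _ in range(steps):
        cand = propose(x, k, rng)
        r_cand = toy_reward(cand)  # reward evaluated on a fully clean candidate
        if rng.random() < accept_prob(r, r_cand, alpha):
            x, r = cand, r_cand
        chain.append(x)
    return chain


if __name__ == "__main__":
    chain = run_chain()
    print("start:", chain[0], toy_reward(chain[0]))
    print("end:  ", chain[-1], toy_reward(chain[-1]))
```

In this sketch, `k` plays the role of the corruption level and `alpha` the reward temperature; both are illustrative parameters, not values taken from the paper.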
📜Paper:
arxiv.org/abs/2602.09424
#DiscreteDiffusion #MolecularDesign #DrugDiscovery #GenerativeAI #ComputationalBiology #MetropolisHastings #Bioinformatics #DeepLearning