🚀 A closer look at how the Seed team built Seed Diffusion Preview - their bold bet on discrete diffusion as the next-gen LLM core. ⚡️
Discrete diffusion LMs face two major challenges:
1️⃣ Inductive Bias Conflict: the arbitrary token order inherent to diffusion conflicts with the sequential nature of natural language and code, leading to inefficient learning, especially under limited compute.
2️⃣ Inference Efficiency Bottleneck: despite non-autoregressive parallelism, multi-step denoising adds heavy latency, and output quality is highly sensitive to the number of steps.
🧪 So Seed proposes 4 key innovations:
1. Two-Stage Curriculum Learning: Pattern → Logic
🔹Stage 1: Mask-based diffusion
Standard masked token prediction is used - parts of the code are randomly replaced with [MASK] using dynamic noise scheduling. The model learns to restore local context and patterns (like syntax, structure, and token distribution).
Problem: this leads to "spurious correlation" - the model over-trusts the unmasked context.
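A minimal sketch of the Stage 1 masking noise, assuming independent per-token masking and a uniformly sampled noise level per example. The post only says "dynamic noise scheduling", so `sample_noise_level` and `mask_tokens` are hypothetical names and the schedule is a placeholder, not Seed's actual one.

```python
import random

MASK = "[MASK]"

def sample_noise_level(rng):
    # Assumption: draw a fresh noise level in [0, 1) for every training example.
    return rng.random()

def mask_tokens(tokens, t, rng):
    """Replace each token with MASK independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]

rng = random.Random(0)
tokens = "def add ( a , b ) : return a + b".split()
t = sample_noise_level(rng)
noisy = mask_tokens(tokens, t, rng)

# The model is trained to recover the original token at each masked position.
targets = [(i, tok) for i, (tok, n) in enumerate(zip(tokens, noisy)) if n == MASK]
```

Note that every unmasked token in `noisy` is guaranteed to be correct, which is exactly the shortcut the model can learn to over-trust.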
🔹Stage 2: Edit-based diffusion
Adds insert/delete noise across the sequence.
Forces model to re-evaluate all tokens, not just masked ones.
🎯 +4.8% pass@1 on the CanItEdit benchmark over the AR baseline (54.3 vs. 50.5).
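The Stage 2 noise can be sketched in the same spirit. This assumes a fixed number of random insert/delete operations with equal probability; `edit_corrupt`, `vocab`, and `n_edits` are illustrative names, and how Seed actually schedules the amount of edit noise is not stated in the post.

```python
import random

def edit_corrupt(tokens, vocab, n_edits, rng):
    """Return a copy of tokens after n_edits random insert/delete operations."""
    out = list(tokens)
    for _ in range(n_edits):
        if out and rng.random() < 0.5:
            # Delete a random existing token.
            del out[rng.randrange(len(out))]
        else:
            # Insert a random vocabulary token at a random position.
            out.insert(rng.randrange(len(out) + 1), rng.choice(vocab))
    return out

rng = random.Random(0)
tokens = "def add ( a , b ) : return a + b".split()
vocab = ["x", "y", "-", "*", "return", "("]
corrupted = edit_corrupt(tokens, vocab, n_edits=3, rng=rng)
```

Unlike masking, nothing marks which positions were touched, so no visible token can be assumed correct.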
2. Constrained Order Diffusion
Code and natural language have a latent order (e.g., declare before use), which purely random-order generation misses.
Seed distills preferred generation paths (from internal pre-trained models), then uses them to guide diffusion training → Helps model learn real-world dependency structures.
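A toy sketch of the "keep the preferred generation paths" idea. The post says the paths are distilled from internal pre-trained models; here a hand-written dependency check stands in for that model, so `score_order`, `select_preferred`, and `deps` are all assumptions for illustration only.

```python
import random

def score_order(order, deps):
    """Count (use_pos, decl_pos) pairs where the declaration is generated first."""
    rank = {pos: step for step, pos in enumerate(order)}
    return sum(1 for use, decl in deps if rank[decl] < rank[use])

def select_preferred(candidates, deps, keep):
    """Keep the highest-scoring candidate generation orders."""
    return sorted(candidates, key=lambda o: score_order(o, deps), reverse=True)[:keep]

rng = random.Random(0)
positions = list(range(6))
deps = [(3, 0), (5, 2)]  # position 3 uses what position 0 declares, and so on

# Sample random generation orders, then keep only the best ones as training signal.
candidates = [rng.sample(positions, len(positions)) for _ in range(50)]
preferred = select_preferred(candidates, deps, keep=5)
```

The retained orders would then guide training instead of fully random ones.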
3. On-Policy Learning for Fast Decoding
Goal: Fewer denoising steps, without hurting quality
Method:
Train model with its own decoding policy.
Use an external verifier model to ensure output quality.
Optimize generation steps using a surrogate loss based on edit distance.
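The post does not give the exact surrogate loss, so the following only sketches its ingredients: a Levenshtein edit distance, a verifier check on the final output, and a per-trajectory summary. `levenshtein`, `trajectory_summary`, and the toy `verifier` are hypothetical names, and treating mean per-step edit distance as a proxy for step count is an assumption.

```python
MASK = "[MASK]"

def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def trajectory_summary(trajectory, verifier):
    """Summarize one sampled denoising trajectory (list of intermediate states)."""
    steps = len(trajectory) - 1
    dists = [levenshtein(s, t) for s, t in zip(trajectory, trajectory[1:])]
    return {
        "steps": steps,
        "accepted": verifier(trajectory[-1]),   # quality gate on the final output
        "mean_step_distance": sum(dists) / steps,
    }

# Toy trajectory sampled "on-policy", i.e. from the model's own decoding.
trajectory = [
    [MASK, MASK, MASK, MASK],
    ["a", MASK, MASK, "d"],
    ["a", "b", "c", "d"],
]
verifier = lambda out: out == ["a", "b", "c", "d"]
summary = trajectory_summary(trajectory, verifier)
```

Larger edits per step mean fewer steps to reach the same output, which is why an edit-distance quantity is a plausible differentiable-friendly stand-in for the step count.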
4. Engineering: Blockwise Parallel Sampling
Introduce block-level diffusion: preserve causal order between blocks and use KV-cache to reuse past block info as conditions for future ones.
No block-specific training - model remains flexible.
Infrastructure optimized for real-time generation with adjustable block sizes.
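A minimal sketch of block-level sampling, assuming a hypothetical `denoise_step(prefix, block)` callable that stands in for one parallel denoising pass of the model. In a real system the finished blocks would be served from a KV-cache; here they are simply passed along as a list.

```python
MASK = "[MASK]"

def blockwise_sample(denoise_step, total_len, block_size, steps_per_block):
    """Generate left to right in blocks; denoise in parallel inside each block."""
    prefix = []  # finished blocks, the conditioning context for later blocks
    while len(prefix) < total_len:
        size = min(block_size, total_len - len(prefix))
        block = [MASK] * size
        for _ in range(steps_per_block):
            block = denoise_step(prefix, block)
            if MASK not in block:
                break
        prefix = prefix + block
    return prefix

def toy_denoise_step(prefix, block):
    # Toy stand-in for the model: fill up to two masked positions per step.
    out, filled = list(block), 0
    for i, tok in enumerate(out):
        if tok == MASK and filled < 2:
            out[i] = f"tok{len(prefix) + i}"
            filled += 1
    return out

result = blockwise_sample(toy_denoise_step, total_len=10, block_size=4, steps_per_block=2)
```

Because `block_size` is just a sampling argument here, it can be changed per call, which mirrors the "no block-specific training" point above.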
📖 Full deep dive:
zhuanlan.zhihu.com/p/1934569…
👇 Check out the data charts below.
#ByteDanceAI #Seed #CodeGen #LLM #AIresearch
Tech Frontier | Seed Diffusion Preview ⚡
@BytedanceTalk Seed team drops Seed Diffusion Preview - an experimental diffusion language model for code generation. Inference speed hits 2146 tokens/s, 5.4× faster than same-size autoregressive models - over 100 lines of code/sec.
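A quick back-of-the-envelope check of what those numbers imply, assuming "5.4× faster" means 5.4 times the throughput. The derived values are not from the post.

```python
diffusion_tps = 2146   # tokens/s, from the post
speedup = 5.4          # vs. same-size autoregressive models, from the post

# Implied throughput of the autoregressive baseline.
ar_tps = diffusion_tps / speedup

# "Over 100 lines of code/sec" implies at most this many tokens per line on average.
max_tokens_per_line = diffusion_tps / 100

print(f"implied AR baseline: {ar_tps:.0f} tokens/s")
print(f"implied average line length: under {max_tokens_per_line:.1f} tokens")
```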
😎 What's behind the speed? Here's the breakdown from Seed researcher:
👣 Two-stage training
1. Masked Completion: teaches the model to fill in local code snippets.
2. Edit Perturbation: adds insert/delete noise to force global code logic understanding - major boost in code repair ability.
📐 Constrained-order learning
Injects structural priors into training - helps model learn correct code dependencies for more logical output.
🔄 Efficient parallel decoding
Introduces on-policy learning to minimize generation steps, with a verifier model ensuring output quality.
🎯 Performance
- Matches autoregressive models in generation quality
- Outperforms them on edit-based tasks
- Validates diffusion LMs as a viable path to faster LLM inference.
🧠 Join the convo:
zhihu.com/pin/19356812492860…
💡 Try it now:
studio.seed.ai/exp/seed_diff…
#AI4Code #LLM #DiffusionModel #SeedDiffusion #ByteDanceAI