We are excited to announce that we have partnered with @_inception_ai to make Mercury 2 available on Baseten. This makes us the first inference platform to bring Inception’s diffusion LLM to production.
Inception’s dLLM architecture addresses the bottleneck of sequential token generation and can deliver 1,000 tokens/sec on standard NVIDIA GPUs. Early users like @augmentcode have reported an 82% reduction in latency and 90% cost savings while maintaining high quality.