Using @augmentcode to help build a little AI Research Lab where I can experiment with different models, evaluation methods, use cases and more. Very excited to see what I can come up with.
Replying to @augmentcode
Join the Telegram dev chat if you're working on the fallback flow to discuss adding safe checkpoints for forced model swaps!
Replying to @augmentcode
Cosmos auto-fallback to Opus 4.8 kept workflows running without disruption.
Replying to @augmentcode
regulatory pressure comes faster than adoption
Nvidia = the king of all semis; $NVDA = the dog of all semis, suppressed in a narrow range for more than a year and underperforming all market indices...
A 90% cost reduction and 82% latency drop is insane for production-level AI. This partnership with Baseten is going to make high-speed reasoning models way more accessible for developers. Impressive stats!
Yayyyy! Excited to see this finally happening. Let’s get Mercury everywhere 🚀🚀🚀🚀🚀🚀
🔥 Congrats to the @_inception_ai and @baseten teams!
Jun 12
When Mercury 3?
hell yeah!!!
I am excited to announce that Mercury 2 is now live on Baseten. Modern AI applications are evolving into multi-model agentic systems. The components that handle planning, routing, searching, classifying, and compacting must be fast, intelligent, and token efficient. Mercury 2 is designed for this purpose, achieving over 1,000 tokens per second on NVIDIA GPUs. AugmentCode is already leveraging Mercury 2 in production, resulting in a 90% reduction in costs and an 82% decrease in latency. For more details, check out the blog post: x.com/baseten/status/2065099….

Nice! Been using Mercury 2 for a while. How can you not love that speed?
People are sleeping on @_inception_ai’s Mercury 2. You don’t need a SOTA model for every task, and the speed here is 👌 Looking forward to anything the team does.
Today Mercury 2, the first reasoning diffusion LLM, is live on Baseten. The result: over 1,000 tokens per second on standard NVIDIA GPUs, at comparable quality to speed-optimized models. @AugmentCode is already using it in production, cutting cost 90% and latency 82%.
Jun 11
We are excited to announce that we have partnered with @_inception_ai to make Mercury 2 available on Baseten. This makes us the first inference platform to bring Inception’s diffusion LLM to production. Inception’s dLLM architecture fixes the bottlenecks of sequential token generation and can deliver 1,000 tokens/sec on standard NVIDIA GPUs. Early users like @augmentcode have seen impressive results, such as an 82% reduction in latency and 90% cost savings, while maintaining high quality.
Jun 11
Inception 🤝 Baseten
congrats!!
The fastest reasoning LLM is now in production on Baseten. Mercury 2 is a diffusion LLM, so it generates tokens in parallel and hits 1,000 tokens/sec on @NVIDIAAI GPUs, speeds that used to require specialized hardware. @augmentcode is already using Mercury 2, cutting cost 90% and latency 82%. Proud to partner with the @baseten team to bring dLLMs to production.
Today we're bringing Mercury 2 to @Baseten. Mercury 2 delivers over 1,000 tokens per second for customers on @NVIDIA GPUs with the reliability and scale enterprise teams need. Read more to see how @augmentcode is using Mercury 2 in production, reducing costs by 90% and latency by 82%. More customer stories across coding agents, real-time voice, and enterprise search are dropping soon.