I've seen a lot of tweet asking about how this was done, my theory (which is based on the good intention that making it 2.5x fast is actually bringing the price up) is:
Let's say a single replica can handle X amount of users, and they were serving 6k max batch size on that replica, in that case let's assume this produces 100 ms TBT.
Then they would spin up 3 replicas and cap the max batch size to 2k, this will serve the same amount of users but it will bring down the TBT to 40ms (33ms 7ms fixed overhead), and since you're running 3 replicas the price will go up, so they decided to provide this premium service for users for a premium price, so realistically they are getting 2x more profit per token with this strategy, but it limits the amount of users they can serve because users/replica rate is lower.
Our teams have been building with a 2.5x-faster version of Claude Opus 4.6.
We’re now making it available as an early experiment via Claude Code and our API.