Xiaomi's MiMo-V2.5-Pro-UltraSpeed Pushes a 1T Model Past 1000 Tokens/Sec on Commodity GPUs
Through FP4 quantization, block-level speculative decoding, and the TileRT system stack, Xiaomi claims trillion-parameter decode speeds normally reserved for custom silicon — on a single 8-GPU node.
devclubhouse.com