Today we're releasing Paris 2.0, to our knowledge the first decentralized-trained video generation model.
At Bagel Labs, we believe frontier models should not require homogeneous clusters of premium, supply-constrained GPUs. Paris 1.0 proved this for image generation. Paris 2.0 extends the recipe to video generation and lays the substrate for global-scale world models.
To test the approach, we trained two models head-to-head in an iso-FLOP, iso-data comparison. One was a monolithic model trained conventionally, on a single premium GPU cluster. The other was Paris 2.0, trained across an extreme mix of GPU types, generations, and vendors distributed around the globe.
Against the monolithic model under matched data and compute, the results were:
FVD (Fréchet Video Distance, lower is better): 561.04 → 279.01 (a ~2x improvement)
CLIP text-video alignment and aesthetic score both improved.
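As a quick sanity check on the "~2x" figure, here is a minimal sketch that recomputes the ratio from the two FVD values quoted above. The variable names are ours, for illustration only; the only inputs are the two numbers reported in this post.

```python
# FVD values quoted in the post (lower is better).
fvd_monolithic = 561.04
fvd_paris = 279.01

# How many times lower the Paris 2.0 FVD is than the monolithic baseline.
ratio = fvd_monolithic / fvd_paris

# The same gap expressed as a percentage reduction from the baseline.
reduction_pct = 100 * (fvd_monolithic - fvd_paris) / fvd_monolithic

print(f"{ratio:.2f}x lower FVD, {reduction_pct:.1f}% reduction")
# prints "2.01x lower FVD, 50.3% reduction"
```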
To our knowledge, this is the first distributed training architecture to surpass its monolithic counterpart under matched data and compute.
Technical Report:
arxiv.org/abs/2605.26064
Model Weights:
huggingface.co/bageldotcom/p…
We're releasing Paris 2.0, which, to our knowledge, is the world's first decentralized-trained video generation model.
We benchmarked it against a monolithic model trained on the same data and compute budget, and Paris 2.0 outperformed the monolithic model by ~2x on the FVD benchmark.