bagel.com

Pinned Tweet

bagel.com

@bageldotcom

May 28

Today we're releasing Paris 2.0, to our knowledge the first decentralized-trained video generation model. At Bagel Labs, we believe frontier models should not require homogeneous clusters of premium, supply-constrained GPUs. Paris 1.0 proved this for image generation. Paris 2.0 extends the recipe to video generation and lays the substrate for global-scale world models.
To test the approach, we trained two models head-to-head in an iso-FLOP, iso-data comparison. One was a monolithic model trained conventionally on a single premium GPU cluster. The other was Paris 2.0, trained across an extreme mix of GPU types, generations, and vendors distributed around the globe. Against the monolithic model under matched data and compute, the results were:
FVD: 561.04 → 279.01 (lower is better; a ~2x improvement)
CLIP text-video alignment and aesthetic score both improved.
To our knowledge, this is the first distributed training architecture to surpass its monolithic counterpart under matched data and compute.
Technical Report: arxiv.org/abs/2605.26064
Model Weights: huggingface.co/bageldotcom/p…

bidhan

@bidhan

May 28

We're releasing Paris 2.0, which, to our knowledge, is the world's first decentralized-trained video generation model. We benchmarked it against a monolithic model trained on the same data and compute budget, and Paris 2.0 outperformed the monolithic model by ~2x on the FVD benchmark.

bagel.com retweeted

bidhan

@bidhan

May 28

bagel.com

@bageldotcom

Apr 27

bagel.com

@bageldotcom

Mar 15

In town for NVIDIA GTC? If you're building generative world models, or investing in the people who are, we're putting the right people in one room for you tomorrow night in Palo Alto. Co-hosted by Alumni Ventures. Sign-up link below.

bagel.com

@bageldotcom

Mar 15

luma.com/nvidia-gtc-generati…

NVIDIA GTC - Generative World Models for Startups | by Bagel Labs & Alumni Ventures · Luma

NVIDIA GTC – Generative World Modeling for Startups
What does it take to build a startup around generative world models? Join Bagel Labs and Alumni Ventures…

bagel.com retweeted

Gin Jiang @CVPR

@ZhiyingJ

Mar 11

We managed to extend DDM to heterogeneous objectives! One step closer to decentralized AI XD
tl;dr: experts trained in complete isolation, with different objectives and no communication, and mixing them beats making them all train the same way. 20-48 GB of memory per expert.

bidhan

@bidhan

Mar 11

Excited to share that Bagel Labs' paper was accepted at CVPR 2026. Much of the most important diffusion model research has historically stayed inside frontier labs; we're bringing more of it into the open through open science and open infrastructure. In this work, we show the counterintuitive advantage of mixing different training objectives (DDPM and flow matching) in an ensemble of diffusion models. This is one of the first works to successfully combine diffusion models trained with heterogeneous objectives. See details here: blog.bagel.com/p/heterogeneo…

bagel.com retweeted

bidhan

@bidhan

Mar 11

bagel.com

@bageldotcom

Mar 8

Diffusion models are becoming the foundation for image, video, and world models. We are hosting a gathering of founders and investors on that topic during NVIDIA GTC week, co-hosted by our friends at Alumni Ventures. Mar 16, Menlo Park. Sign up below: luma.com/nvidia-gtc-generati…

bagel.com retweeted

bidhan

@bidhan

Feb 6

Being at the frontier, by definition, means creating the frontier. You don't get to the frontier by following someone else. And creating the frontier often means making discoveries that go against established knowledge. We recently made such a discovery about distributed diffusion model training. A common way to optimize diffusion model training is to ensure the numerical stability of the generation paths. We found that this does not hold for the most efficient distributed diffusion model training architecture. We shared what works instead in our blog post below: blog.bagel.com/p/stability-q…

bagel.com retweeted

Deep-ML

@real_deep_ml

Jan 22

Can't wait for the livestream! In honor of it, we just added a new question and an interactive playground based on the paper.

Yacine Mahdid

@yacinelearning

Jan 21

Alright folks, tomorrow (January 22) from 10 AM to 12 PM EST we're going to dive into decentralized diffusion models by reviewing the Paris model from Bagel Labs. I even managed to lock in @bidhan for an interview on the why, the how, and what makes this a good thing to do at all. Tune in!

bagel.com retweeted

Yacine Mahdid

@yacinelearning

Jan 21

Yacine Mahdid

@yacinelearning

Jan 10

This weekend I'll be diving deep into decentralized training for diffusion models with the Paris model and the Bagel team (what a sentence)

bagel.com retweeted

mirian

@mirimayer

Jan 15

No better way to celebrate Bagel Day 🥯

bagel.com retweeted

bidhan

@bidhan

17 Dec 2025

NeurIPS takeaways (better late than never):
1. Real AGI needs real continual learning: models that can keep learning without catastrophic forgetting.
2. Model architectures need to be "stateful" to build accurate world models for games and robotics.
3. Diffusion models are superior for solving both 1 and 2.
4. @bageldotcom's distributed diffusion training architecture is SOTA among both open- and closed-source frontier lab comparables.
5. The age of research is back, and there's no better place to do frontier diffusion model research than Bagel Labs. Join us: jobs.bagel.com

bagel.com retweeted

Tommy

@Shaughnessy119

3 Dec 2025

Congrats to @bageldotcom and @bidhan on Paris! An open, decentralized diffusion model shared at @NeurIPSConf

bagel.com retweeted

bidhan

@bidhan

2 Dec 2025

I’m going to give a talk at NeurIPS on decentralized diffusion models later today. Come by if you’re around!

bagel.com retweeted

Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

22 Nov 2025

Noticing this release now (h/t @yacinelearning). This decentralized training approach is so cool! I will note that I shared this alpha back when Luma released its paper in January. If you're not following me, you're going to miss out on alpha 😄

bagel.com

@bageldotcom

7 Oct 2025

Introducing Paris, the world's first decentralized-trained open-weight diffusion model. We named it Paris after the city that has always been a refuge for those creating without permission. Paris is open for research and commercial use.

bagel.com retweeted

Yacine Mahdid

@yacinelearning

21 Nov 2025

Btw folks, I'm about to dive headfirst into the wondrous world of open-weight diffusion this weekend. Wish me luck!

bagel.com

@bageldotcom

7 Oct 2025

Replying to @bageldotcom

Paris does something that shouldn't work. It combines smaller expert diffusion models, each pre-trained from scratch in complete isolation on different continents, with absolutely zero synchronization between them during training. This zero-communication protocol achieves quality comparable to SOTA distributed approaches while using 14× less data and 16× less compute. How? See our full technical report and model weights below.
Full Technical Report: github.com/bageldotcom/paris…
Model Weights: huggingface.co/bageldotcom/p…

bagel.com

@bageldotcom

25 Oct 2025

Latent Diffusion

bagel.com retweeted

bidhan

@bidhan

23 Oct 2025

See you at @NeurIPSConf

bagel.com

@bageldotcom

9 Oct 2025

Paris: made with ❤️ by Bagel Labs

bagel.com

@bageldotcom

7 Oct 2025