Most diffusion research today asks: How can we sample faster?
But I think another question is equally important: Are we training diffusion models in the right way?
famous-bubbler-dcc.notion.si…
This is my first time turning some of my research thoughts into a blog post, so it may contain errors or unclear arguments. Any suggestions, comments, or feedback would be greatly appreciated.
Today we're releasing ZAYA1-74B-Preview, a major milestone in scaling pretraining on @AMD.
ZAYA1-74B-Preview is a 4B active / 74B total MoE.
This preview model is a strong pre-RL base checkpoint. The final post-trained reasoning model is coming soon. 🧵
Really excited to see Uni-1 out in the world. Our first unified model.
The range of things this model can do is wild: image-to-~100 styles, manga generation, multi-ref with strong identity preservation, temporal storytelling, sketch-to-image, spatial reasoning, multilingual infographics, layering… the capability range is honestly unreal. This is just the start. Check out the blog to learn more: lumalabs.ai/uni-1
Proud of the team and what we're building at @LumaLabsAI
Introducing Uni-1, Luma's first unified understanding and generation model, our next step on the path towards unified general intelligence.
lumalabs.ai/uni-1
ALT Combine the black and white curly-haired dog with pink bandana, the Boston Terrier in plaid harness, and the black-and-white cat into a single scene where they are dressed in academic regalia, standing before a whiteboard filled with scientific diagrams and text, with the Luma AI logo placed in the top-left corner.
I'm currently in transit to San Diego for NeurIPS. If you're also killing time, feel free to check out a 2-minute-30-second horror sci-fi short film Michael and I recently created. We'd love any comments or likes:
devpost.com/software/dreamca…
Looking forward to catching up at the venue!
I feel the debate shouldn't only be about whether DiT is effective, but also about how information preservation is the key to accelerating diffusion training. Our MicroDiT (arxiv.org/abs/2407.15811) paper showed this: by letting masked token info mix into unmasked ones, we can cut down a lot of tokens with only minor performance loss.
Interestingly, two months ago, when I caught up with @StefanABaumann at #CVPR, we discussed how TREAD and MicroDiT are conceptually similar from an information perspective. Maybe it's time to look at diffusion through an information-theoretic lens: from post-training (for better alignment) to latent space curation, I believe this could lead to some really exciting discoveries!
Introducing Look Studio.
Style looks from scratch with 1M products from designer brands - including shoes, multiple layers and more.
Reply for an invite.
Excited to introduce Reka Vision, an agentic visual understanding and search platform. Transform your unstructured multimodal data into insights and actions.
Introducing our V1 Video Model. It's fun, easy, and beautiful. Available at $10/month, it's the first video model for *everyone* and it's available now.
Heading to Nashville for @CVPR (06/11 - 06/16)!
Always excited to catch up with old friends and make new connections. Let's grab a coffee or chat about diffusion models, post-training, or just life!
#CVPR2025 #Diffusion #GenerativeAI #Nashville
📢 Our paper "Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer" has been accepted to #MLSys2025, taking place May 12-15! Excited to share our research at the intersection of machine learning and systems in San Jose, CA.
Check out the full program here: lnkd.in/ecyzGbwJ
#MLSys #MachineLearning #Systems #Conference
Delighted to see the MicroDiffusion paper accepted at CVPR.
Check out the code and models if you are looking for an extremely low-cost setup for latent diffusion models.
Following a fully open-source philosophy, we've released the official training code, data code, and model ckpts for our micro-budget training of diffusion models from scratch (MicroDiTs).
Now anyone can train a Stable Diffusion v1/v2-quality model from scratch in just 2.5 days using 8 H100 GPUs (<$2000 cost).
GitHub: github.com/SonyResearch/micr…
Checkpoints: huggingface.co/VSehwag24/Mic… @SonyAI_global 1/3