Songwei Ge

Pinned Tweet

Songwei Ge

@Songwei_Ge

Jun 3

Reve 2.0 is here! After moving away from the latent autoencoder, we are making yet another bet: moving away from pure text input for image generation!

Reve

@reve

Jun 3

Today, we’re launching Reve 2.0, the best 4K image model in the world. We invented a new way to generate and edit any image using precise layouts. For the first time, it’s possible to create images you can touch.

0:54

183

1,067,177

Songwei Ge retweeted

Design Arena

@Designarena

Jun 10

BREAKING: Reve 2.0 by @reve is now 2nd overall on Image Arena with an Elo of 1354. Reve 2.0 establishes a 34 point Elo gap above GPT-Image 1.5 by @OpenAI in 3rd place. With this release, Reve is now the top independent foundation image model lab. Congratulations to the @reve team on this accomplishment!

192

92,344
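To put the 34-point gap in perspective, here is a minimal sketch that converts a rating gap into an expected head-to-head win rate. It assumes the leaderboard follows the standard Elo logistic curve with a scale of 400; the post does not state which rating formula Design Arena uses, and the function name `elo_expected_score` is only illustrative.

```python
def elo_expected_score(rating_gap: float, scale: float = 400.0) -> float:
    """Expected score of the higher-rated model under the standard Elo curve.

    Assumption: base-10 logistic with a 400-point scale, which is the
    textbook Elo convention, not a confirmed detail of the leaderboard.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


gap = 34  # Elo points between 2nd and 3rd place, as quoted in the post
print(f"{elo_expected_score(gap):.3f}")  # prints 0.549
```

Under that assumption, a 34-point lead corresponds to the higher-rated model being preferred in roughly 55% of pairwise comparisons.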

Songwei Ge retweeted

Hang Gao

@hangg70

Jun 3

we made a new model for text-to-image generation and editing. the results are looking good and the leaderboard is looking strong. it turns out that nano banana 2 is not impossible to beat, which felt like the case at the beginning of the year. there are a lot of great models out there that get released often. why should you care about reve 2.0? to me, there are mainly two reasons. one being that reve is an underdog, reasonably funded but magnitudes less than other big labs, e.g. oai, google, meta, etc. you might be curious about how we managed to make it to the top. two being that reve 2.0 is a decent model, and we as a team are willing to talk openly about some of our learnings and thoughts that could be helpful. in this post, i want to share mine on reve 2.0 and multimodal in general as a person working on it.

first things first, reve 2.0 is a pixel diffusion model with a thing that we call "layout" as the rendering representation. these two things are our research bets that turned out to work amazingly well. pixel diffusion lets us go 4k without sacrificing quality or speed. layout lets us scale better and have better control, which are two sides of the same coin. the field standard has been to use long upsampled prompts for rendering. yet this results in an awkward situation where captioners and users need to describe precise controls with text, which can be inaccurate. this inaccuracy amounts to bad reconstruction and control at test time. it gets worse with scale. and this inherent ambiguity is a curse in current multimodal generators.

so what's a layout? a layout is a css of an image, which can be either defined by humans or learned by models. we end up capitalizing a lot on regions, which are good for 2D space. yet this idea naturally generalizes. it turns out to be a standard VLM mid-training task, and that's solvable in good hands. it also brings many good properties in pretraining and post-training, which i am not going to expand on. ideogram independently verified that layout is useful (released on the same day, congrats!). to be clear, these bets are not novel, but to put together a system that makes them work is (and showing it beats nano banana 2).

second, it's nice that these bets, among others, worked out. however, like in many cases, there was a long time when things were underperforming. our competitor models are great, and most likely didn't make many risky bets. it is a big pipelining and engineering problem. why should we risk it? in retrospect, the culture of our team and leadership helped a lot. our priorities didn't swing and have stayed focused during our development. the idea makes sense, the execution is good, if things don't work out it's a bug, let's go find it and try more things. by and large, reve remains a research lab with big computers. this is rare. let me tag some amazing ppl here: @Taesung @m_gharbi @Songwei_Ge @TianweiY James Hong @dima_smirnov_ @theSidlak, ... the list goes on.

third, we spent most of our time improving text-to-image and didn't do much on editing. and our arena ranks show that. to date, we are #2 on text-to-image yet #9 on image editing. it's honestly a bit embarrassing that we didn't do well in editing, as layout promises to do well. but i am confident that this will improve, as we are juggling bandwidth and resources (we are a small team, and hey, come join us!).

fourth, talking about leaderboards and the state of multimodal, i genuinely feel that the gap between labs is shrinking. compared to LLMs, multimodal gen is at least half a year to a year behind. i am talking about architectures and core pipelines. to do good multimodal, you need to do good LLMs. reve has been helped by the OSS community a lot, but we've realized we need to own our language stack. and scaling follows naturally. leaderboards, in turn, are a noisy approximation and average of the real environments that you care about in deployment. they chase scaling and generalizable post-training. reve 2.0 ended up not being driven much by leaderboard evaluation, but relying on our intuition instead.

finally, how can multimodal be more useful? this is a question that keeps me up at night. coding has found its product-market fit and is driving up societal productivity. how can multimodal do that too? to me, we are nailing a single-round rollout that leads to an infinite one. this infinite rollout will drive our digital interaction and creation. for this rollout to be good, it needs to be precise. otherwise rollout efficiency is too low for either humans or agents. we are making bets and concrete progress towards that goal, such as converting images into a css-like layout. if you are interested in this topic, i recommend @stuffyokodraws's post for a high-level digest: x.com/stuffyokodraws/status/…. the success of multimodal depends on whether or not it can find a good product-market fit. that's the top question to figure out, then it's the model. it's quite non-linear to be honest, as critical pieces are still missing. but to me it's an area worth pouring my thoughts and efforts into.

give our model a spin, try your tasks, move some boxes. in case you find any bugs, please let me know in a reply or DM. hope it can help you.
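To make the "css of an image" idea concrete, here is a minimal sketch of what a region-based layout could look like as a data structure. Everything in it is an assumption for illustration: the `Region` fields, the normalized coordinates, the CSS-like serialization, and the `move` helper are hypothetical and do not describe Reve's actual layout format or API. The post only says that layouts are CSS-like, lean on regions, and let users "move some boxes".

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Region:
    """Hypothetical layout region: a box in normalized [0, 1] image
    coordinates plus a short text description of its content."""
    name: str
    x: float
    y: float
    width: float
    height: float
    description: str


def to_css_like(regions):
    """Serialize regions into CSS-like text (illustrative format only)."""
    blocks = []
    for r in regions:
        blocks.append(
            f".{r.name} {{ left: {r.x:.0%}; top: {r.y:.0%}; "
            f"width: {r.width:.0%}; height: {r.height:.0%}; "
            f'content: "{r.description}"; }}'
        )
    return "\n".join(blocks)


def move(region, dx, dy):
    """Return a shifted copy, clamped so the box stays inside the image."""
    new_x = min(max(region.x + dx, 0.0), 1.0 - region.width)
    new_y = min(max(region.y + dy, 0.0), 1.0 - region.height)
    return replace(region, x=new_x, y=new_y)


layout = [
    Region("sky", 0.0, 0.0, 1.0, 0.5, "overcast sky"),
    Region("dog", 0.1, 0.5, 0.3, 0.4, "a corgi sitting on grass"),
]
# An edit expressed as a box move instead of a rewritten prompt.
layout[1] = move(layout[1], dx=0.5, dy=0.0)
print(to_css_like(layout))
```

The point of the sketch is the contrast drawn in the post: a positional edit becomes an exact change to a few numbers, where a text prompt would have to describe the same change in words and could be read ambiguously.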

Reve

@reve

Jun 3

0:54

318

1,082,236

Songwei Ge retweeted

Reve

@reve

Jun 3

0:54

273

489

5,157

11,887,360

Songwei Ge

@Songwei_Ge

Jun 3

Reve 2.0 is here! After moving away from the latent autoencoder, we are making yet another bet: moving away from pure text input for image generation!

Reve

@reve

Jun 3

0:54

183

1,067,177

Songwei Ge

@Songwei_Ge

Jun 3

3/ Moreover, instead of talking to the model and praying it understands what you truly mean, layout lets everyone touch the pixels!

288

Songwei Ge

@Songwei_Ge

Jun 3

4/ check more details in our blog post! blog.reve.com/posts/the-layo…

Reve: Bring your ideas to life

Reve Image (Halfmoon): A model trained from the ground up to excel at prompt adherence, aesthetics, and typography. Make any image you can imagine with Reve.

blog.reve.com

Songwei Ge retweeted

Arena.ai

@arena

Jun 3

Reve 2.0 has landed #2 in the Text-to-Image Arena! Scoring 1280, this puts the latest model above Nano Banana 2, MAI-Image-2.5, and GPT-Image-1.5-High Fidelity. This is a 125pt improvement over Reve v1.5. Congratulations to the @reve team on this major milestone!

Reve

@reve

Jun 3

0:54

630

50,969

Songwei Ge

@Songwei_Ge

Jun 3

Also @YGandelsman's talk!

Yossi Gandelsman

@YGandelsman

Jun 3

Come to @Taesung’s talk tmrw! You don’t want to miss it!

741

Songwei Ge retweeted

Taesung Park

@Taesung

Jun 3

👀exciting day ahead! I am free in the afternoon at CVPR. Ping me if anyone wants to chat with me about Reve.

4,983

Songwei Ge retweeted

David McAllister

@davidrmcall

Apr 17

We developed a simple, sample-efficient online RL technique for post-training image generation models. We see it as a possible steerable alternative to CFG, driven by any scalar reward, including human preference.

0:27

376

63,623

Songwei Ge retweeted

Tianwei Yin

@TianweiY

Feb 24

Reve V1.5 has officially landed in the Top 3 on @arena! 🥉 I’m incredibly proud of our small, crack team. We’re proving that a lean, focused group can stay right in the race against trillion-dollar giants like Google and OpenAI. This win is built on a foundation of continuous improvements to our data & training infra, alongside architectural leaps—like native 4K generation—and novel pre- and post-training techniques. The momentum here is unmatched. Stay tuned—image editing and new breakthroughs are coming very shortly.🚀

Arena.ai

@arena

Feb 24

Reve V1.5 lands top 3 in Image Arena behind GPT-Image-1.5 and variants of Nano Banana Pro. A strong showing for @Reve’s first Text-to-Image model on Arena which delivers up to 4k output.

Highlights:
- #4, scoring 1177, on par with Grok-Imagine-Image
- Top 5 for categories: Text Rendering, Art and Product, Branding Commercial Design

Congrats to the Reve team on this milestone. 👏

109

17,937

Songwei Ge retweeted

Reve

@reve

Feb 24

Reve v1.5 is here. Our latest image model, now with 4K resolution.

0:24

608

581,739

Songwei Ge

@Songwei_Ge

Feb 24

I'm beyond excited about this native 4k image generation model that we've worked on for months! It feels great that the image can be directly used as a wallpaper.

Taesung Park

@Taesung

Feb 24

Reve's new text-to-image model is here. Really proud of the team to rank at #3 with the big labs. To my knowledge, we are the first lab to use native pixel space diffusion without latent autoencoder at 4k (16MP) resolution for production level image generation.

4,723

Songwei Ge retweeted

Reve

@reve

Jan 13

Introducing Effects. Out now in Reve. Any filter you can imagine, at your fingertips.

0:18

1,143

467,287

Songwei Ge retweeted

Qianqian Wang

@QianqianWang5

3 Dec 2025

I'm recruiting multiple PhD students this cycle to join me at Harvard University and the Kempner Institute! My interests span vision and intelligence, including 3D/4D, active perception, memory, representation learning, and anything you're excited to explore! Deadline: Dec 15th.

151

920

175,969

Songwei Ge

@Songwei_Ge

8 Nov 2025

Check out our cheap and good editing model!

Reve

@reve

7 Nov 2025

Faster models are here. reve-edit-fast and reve-remix-fast are now available, and cheaper than ever.

1,061

Songwei Ge

@Songwei_Ge

25 Oct 2025

Honored to receive the Larry S. Davis Doctoral Dissertation Award! Big thanks to my advisors for the constant support — excited to continue pushing the frontier at @reve!!

Jia-Bin Huang

@jbhuang0604

24 Oct 2025

Proud advisor moment 😊 Congrats @Songwei_Ge for winning the Larry S. Davis Doctoral Dissertation Award @umdcs! Songwei is now cooking as a research scientist at @reve. Looking forward to amazing work!

10,374

Songwei Ge

@Songwei_Ge

21 Oct 2025

Come and join us at the party tomorrow!! I will also be around at the conference and would love to catch up and chat!

Reve

@reve

20 Oct 2025

Calling all #ICCV2025 attendees! We're hosting a Reve social where you can meet our team and unwind on Tuesday evening. RSVP here, limited spots remaining: luma.com/9dy5x3j4

4,337