New DeepSeek open-source model: Janus Pro 7B
- image understanding and generation!
- 0.8 on GenEval
- 84.19 on DPG-Bench
- beats DALL-E 3 and Stable Diffusion 3 Medium
- 72M synthetic images in pretraining
- text understanding
Only studying the theory behind Stable Diffusion 3 felt a bit dull, so I went ahead and tried it out, and sure enough it generates images properly, text included.
"a photo of a cat holding a sign that says I fxxking love gaming"
Reference: Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow
arxiv.org/abs/2209.03003
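For anyone following the theory side, here is a rough sketch of the core idea in the Rectified Flow paper as I understand it: noise x0 and data x1 are joined by a straight line, a velocity model is trained to regress onto x1 - x0, and sampling integrates that velocity with Euler steps. The function names and the toy data are made up for illustration, and the "model" here is just the exact velocity for a single pair, not a trained network and not Stable Diffusion 3's actual code.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight line from x0 (noise) to x1 (data) at time t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Regression target for the velocity model: the constant direction x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]

def euler_sample(x0, velocity_fn, steps=4):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-size Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Toy setup (hypothetical): one noise/data pair, so the ideal velocity is known exactly.
rng = random.Random(0)
x0 = [rng.gauss(0.0, 1.0) for _ in range(3)]  # "noise" sample
x1 = [1.0, -2.0, 0.5]                         # "data" sample
v = target_velocity(x0, x1)

# Stand-in for a trained model: always returns the exact straight-line velocity.
sample = euler_sample(x0, lambda x, t: v, steps=1)
midpoint = interpolate(x0, x1, 0.5)
```

Because the path is a straight line, a single Euler step already lands on x1 in this toy case, which is the intuition behind "flow straight and fast". A real model only approximates the velocity, so more steps are normally used.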
Digital Labyrinth
Prompt share, generated with Stable Diffusion 3
an interpretation of a cyborg woman (brunette short hair) is wearing a black and transparent electronic suit made of electronic pieces intertwined with white light, floating particles of light around, standing in the middle of a futuristic labyrinth (beautiful labyrinth with tall walls made of stars) with a galaxy in the background, in the style of surrealism