Postdoc at @sciencetokyo_en | NLP/ML, Generative models, diffusion LMs, LLM test-time scaling | Ph.D., UTokyo

Joined October 2022
Pinned Tweet
Two papers accepted to #ICLR2026 🇧🇷 (1 first, 1 second author) Huge thanks to my co-authors and collaborators! @Bollegala @MasahiroKaneko_ @chokkanorg @junpeikomiyama @stillpedant More details soon!
1
8
47
6,286
Daisuke OBA retweeted
Congratulations to Google on open-sourcing Gemma Diffusion! I want to give a shout-out to a group of really talented Cornell students who developed in the lab a lot of the new ideas that we see in this model: @mariannearr -- Block diffusion is what enables Gemma Diffusion to generate arbitrary length sequences and support KV caching. @mariannearr @SchiffYair -- Efficient encoder-decoder diffusion (E2D2) extends block diffusion and is part of what makes Gemma really fast, speeding up inference by running a smaller decoder model. @SchiffYair @ssahoo_ @Guanghan__Wang -- Uniform diffusion LMs (UDLMs) are the family of discrete diffusion models that underlie Gemma and define its noise process and training objective. This work builds on our earlier simplified losses in MDLMs. @ssahoo_ -- Uniform diffusion supports built-in error correction and is especially effective with distilled fast samplers like the ones introduced in Duo. This is a great overview of Gemma Diffusion: newsletter.maartengrootendor… Check out the students' papers below:
7
78
600
26,702
Daisuke OBA retweeted
DiffusionGemma is an open, experimental model that brings our text diffusion research to Gemma 4. It’s a racehorse 🏇 achieving up to 4x faster inference by generating entire blocks of text simultaneously rather than predicting output token by token (word by word)!
182
399
3,262
299,690
1/ New preprint: Drifting Objectives for Refining Discrete Diffusion Language Models Can drifting be used beyond continuous generators? We study this in the setting of refining pretrained discrete diffusion language models (DDLMs). Our method, TokenDrift, provides a differentiable soft-token interface that lets feature-space drifting signals update categorical token logits. Main observation: Gen.-PPL improves throughout drifting training at fixed denoising budgets.
2
6
23
2,373
6/ The soft-token part matters. A straight-through hard-token variant still has a surrogate gradient path, but performs much worse and suffers severe entropy collapse. So differentiability alone is not enough: the feature encoder needs to see the model's uncertainty through probability-weighted embeddings (pE).
1
1
2
133
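To make the soft-token vs. hard-token contrast in 6/ concrete, here is a minimal sketch of the forward pass only. It is not the TokenDrift implementation: the toy vocabulary, the embedding table `E`, and both function names are hypothetical, and gradients are not modelled. It only shows that a probability-weighted embedding changes with the model's uncertainty, while the embedding of the argmax token does not.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_token_embedding(logits, embedding_table):
    """Probability-weighted embedding: sum over v of p[v] * E[v]."""
    p = softmax(logits)
    dim = len(embedding_table[0])
    return [
        sum(p[v] * embedding_table[v][d] for v in range(len(p)))
        for d in range(dim)
    ]


def hard_token_embedding(logits, embedding_table):
    """Embedding of the argmax token only; the uncertainty is discarded."""
    best = max(range(len(logits)), key=lambda v: logits[v])
    return list(embedding_table[best])


# Hypothetical toy setup: 3 tokens, 2-dimensional embeddings.
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
confident = [8.0, 0.0, 0.0]   # almost all mass on token 0
uncertain = [0.1, 0.0, 0.0]   # nearly uniform, same argmax

soft_conf = soft_token_embedding(confident, E)
soft_unc = soft_token_embedding(uncertain, E)
hard_conf = hard_token_embedding(confident, E)
hard_unc = hard_token_embedding(uncertain, E)

print("soft:", soft_conf, soft_unc)   # differ: the encoder sees the uncertainty
print("hard:", hard_conf, hard_unc)   # identical: both collapse to token 0
```

Both logit vectors pick the same argmax token, so the hard variant feeds the same vector downstream in either case; only the soft variant passes the difference in confidence on to whatever consumes the embedding.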
Daisuke OBA retweeted
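We have released an MXFP4 version of GPT-OSS-Swallow v0.1. This is an additional release intended to let GPT-OSS-Swallow run with less memory. It should make the model easier to use even in cases where hardware constraints previously made it hard to try. huggingface.co/collections/t…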
2
11
24
2,108
Daisuke OBA retweeted
We propose HATCH🐣, a human-inspired training framework for multi-image spatial reasoning in VLMs 🐤 HATCH improves multi-image spatial reasoning ability while preserving single-image reasoning capabilities 🐓 📚️arxiv.org/abs/2602.08735
Two first-author papers accepted to #ICML2026 🇰🇷 ! - Human-like multi-image spatial reasoning in multimodal LLMs (@silviasetitech @sponddd @dai0NLP Prof. Inoue @chokkanorg) - Autoregressive direct preference optimization (Mahiro Ukai @MasahiroKaneko_ @chokkanorg Prof. Inoue)
6
23
1,731
Daisuke OBA retweeted
Two first-author papers accepted to #ICML2026 🇰🇷 ! - Human-like multi-image spatial reasoning in multimodal LLMs (@silviasetitech @sponddd @dai0NLP Prof. Inoue @chokkanorg) - Autoregressive direct preference optimization (Mahiro Ukai @MasahiroKaneko_ @chokkanorg Prof. Inoue)
1
20
95
22,563
Daisuke OBA retweeted
Our paper was accepted to #ICML2026 🇰🇷 (first author)! This paper is on budget-aligned test-time scaling of LLMs. It is my first ML conference paper! Huge thanks to my co-authors! @dai0NLP @chokkanorg Preprint: arxiv.org/abs/2602.09574 More details soon!
11
74
6,152
Also at #ICLR2026 🇧🇷: Presenting Best-of-∞ on behalf of lead author @jkomiyama_ — principled Bayesian stopping that approximates the N→∞ majority-voting limit, plus optimal LLM-ensemble weights via MILP! 🕓25th April, 10:30 AM 📍Pavilion 4, #4710 w/ @jkomiyama_ @stillpedant
5
21
1,950
Excited to present SureLock at #ICLR2026 🇧🇷 — a principled decoding method that locks converged tokens in Masked Diffusion Language Models, cutting 30–50% FLOPs at same quality! w/ @Bollegala @MasahiroKaneko_ @chokkanorg 🕙 Friday, 24th April, 10:30 AM 📍Pavilion 3 (#826)
11
35
4,080
Daisuke OBA retweeted
🇧🇷 Excited to present our paper "Stopping Computation for Converged Tokens in Masked Diffusion-LM Decoding" at #ICLR2026 in Rio de Janeiro in just two days! 🏖️ iclr.cc/virtual/2026/poster/… (Friday 24th 10:30-13:00 poster session) Masked Diffusion LMs generate sequences via iterative sampling, but they waste significant compute by repeatedly re-evaluating tokens that have already converged. To fix this, we introduce SureLock 🔒: a method that permanently locks stable tokens during decoding. By caching their attention keys/values and skipping their query projection and feed-forward sublayers, we drastically cut down on redundant computation. 🚀 The result? We achieve a 30–50% reduction in algorithmic FLOPs on LLaDA-8B with virtually no loss in generation quality! If you are attending ICLR, come stop by our presentation! w/ @dai0NLP @MasahiroKaneko_ @chokkanorg @LivUni @AmazonScience code/paper: daioba.github.io/surelock/
7
33
3,133
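The locking idea in the SureLock post above can be illustrated with a toy bookkeeping sketch. This is not the authors' code: `step_fn`, `toy_step`, the tolerance `tol`, and the "distribution stopped changing" criterion are all assumptions made for illustration, and the real method works inside the transformer (cached attention keys/values, skipped query projection and feed-forward sublayers) rather than on a Python list. The sketch only shows how skipping locked positions reduces the number of per-token evaluations.

```python
def decode_with_locking(step_fn, num_positions, num_steps, tol=1e-3):
    """Iteratively refine per-position distributions, skipping locked ones.

    step_fn(pos, step) is a hypothetical stand-in for the per-token work of
    one denoising step; it returns a list of probabilities for that position.
    """
    current = [None] * num_positions
    locked = [False] * num_positions
    evaluations = 0
    for step in range(num_steps):
        for pos in range(num_positions):
            if locked[pos]:
                continue  # reuse the stored result instead of recomputing
            new = step_fn(pos, step)
            evaluations += 1
            old = current[pos]
            # Assumed criterion: lock once the distribution stops changing.
            if old is not None and max(abs(a - b) for a, b in zip(old, new)) < tol:
                locked[pos] = True
            current[pos] = new
    return current, locked, evaluations


def toy_step(pos, step):
    """Toy dynamics: position `pos` keeps changing until step `pos`, then is stable."""
    s = min(step, pos)
    q = 1.0 - 0.5 ** (s + 1)
    return [q, 1.0 - q]


final, locked, evaluations = decode_with_locking(toy_step, num_positions=4, num_steps=8)
baseline = 4 * 8  # every position re-evaluated at every step
print(evaluations, "evaluations with locking vs", baseline, "without")
```

In this toy run each position is evaluated once more after it stabilises and is then skipped, so the loop performs 14 evaluations instead of 32. The saving depends entirely on the toy dynamics and says nothing about the 30–50% figure reported in the post.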
Daisuke OBA retweeted
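We have released the Qwen3-Swallow and GPT-OSS-Swallow models. I was in charge of RL training. We see improved performance on Japanese tasks at the reinforcement learning stage as well.
📢 We have released GPT-OSS Swallow and Qwen3 Swallow. We completely overhauled continual pre-training, SFT, and reinforcement learning. These open LLMs, which combine Japanese-language performance with reasoning ability, are available under the Apache 2.0 license. Qwen3 Swallow: swallow-llm.github.io/qwen3-… GPT-OSS Swallow: swallow-llm.github.io/gptoss…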
1
29
155
20,890
Daisuke OBA retweeted
We are thrilled to announce the release of GPT-OSS Swallow and Qwen3 Swallow 🎉 I was involved in evaluation, framework development, and mentoring as a student leader. Leaderboard: swallow-llm.github.io/leader… Swallow-Evaluation-Instruct: github.com/swallow-llm/swall…

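📢 We have released GPT-OSS Swallow and Qwen3 Swallow. We completely overhauled continual pre-training, SFT, and reinforcement learning. These open LLMs, which combine Japanese-language performance with reasoning ability, are available under the Apache 2.0 license. Qwen3 Swallow: swallow-llm.github.io/qwen3-… GPT-OSS Swallow: swallow-llm.github.io/gptoss…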
8
20
7,290
Daisuke OBA retweeted
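📢 We have released GPT-OSS Swallow and Qwen3 Swallow. We completely overhauled continual pre-training, SFT, and reinforcement learning. These open LLMs, which combine Japanese-language performance with reasoning ability, are available under the Apache 2.0 license. Qwen3 Swallow: swallow-llm.github.io/qwen3-… GPT-OSS Swallow: swallow-llm.github.io/gptoss…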
13
341
1,255
238,011
Daisuke OBA retweeted
Two papers accepted to @ICLR 2026 🎉 Congrats and kudos to my amazing collaborators: @dai0NLP @MasahiroKaneko_ @chokkanorg T. Yamamoto, R. Kumon, @verypluming. One paper is on how to make diffusion models efficient, and the other is on proving the existence of culture-specific neurones.
4
28
2,671