Daisuke OBA


Tweets

Pinned Tweet

Daisuke OBA

@dai0NLP

Jan 26

Two papers accepted to #ICLR2026 🇧🇷 (1 first author, 1 second author). Huge thanks to my co-authors and collaborators! @Bollegala @MasahiroKaneko_ @chokkanorg @junpeikomiyama @stillpedant More details soon!


Daisuke OBA retweeted

Volodymyr Kuleshov 🇺🇦

@volokuleshov

Jun 11

Congratulations to Google on open-sourcing Gemma Diffusion! I want to give a shout-out to a group of really talented Cornell students who developed in the lab a lot of the new ideas that we see in this model:

@mariannearr -- Block diffusion is what enables Gemma Diffusion to generate arbitrary length sequences and support KV caching.

@mariannearr @SchiffYair -- Efficient encoder-decoder diffusion (E2D2) extends block diffusion and is part of what makes Gemma really fast, speeding up inference by running a smaller decoder model.

@SchiffYair @ssahoo_ @Guanghan__Wang -- Uniform diffusion LMs (UDLMs) are the family of discrete diffusion models that underlie Gemma and define its noise process and training objective. This work builds on our earlier simplified losses in MDLMs.

@ssahoo_ -- Uniform diffusion supports built-in error correction and is especially effective with distilled fast samplers like the ones introduced in Duo.

This is a great overview of Gemma Diffusion: newsletter.maartengrootendor…

Check out the students' papers below:


Daisuke OBA retweeted

Sundar Pichai

@sundarpichai

Jun 10

DiffusionGemma is an open, experimental model that brings our text diffusion research to Gemma 4. It’s a racehorse 🏇 achieving up to 4x faster inference by generating entire blocks of text simultaneously instead of predicting output token by token (word by word)!


Daisuke OBA

@dai0NLP

Jun 9

1/ New preprint: Drifting Objectives for Refining Discrete Diffusion Language Models

Can drifting be used beyond continuous generators? We study this in the setting of refining pretrained discrete diffusion language models (DDLMs).

Our method, TokenDrift, provides a differentiable soft-token interface that lets feature-space drifting signals update categorical token logits.

Main observation: Gen.-PPL improves throughout drifting training at fixed denoising budgets.


Daisuke OBA

@dai0NLP

Jun 9

6/ The soft-token part matters. A straight-through hard-token variant still has a surrogate gradient path, but performs much worse and suffers severe entropy collapse. So differentiability alone is not enough: the feature encoder needs to see the model's uncertainty through probability-weighted embeddings (pE).
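To make the soft-token versus hard-token distinction in 6/ concrete, here is a minimal forward-pass sketch in plain Python. It is not the TokenDrift implementation: the function names, the toy embedding table `E`, and the example logits are all hypothetical, and gradients (including the straight-through surrogate path) are not modeled. It only shows that a probability-weighted embedding changes with the model's confidence, while a hard argmax embedding does not.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_token_embedding(logits, embedding_table):
    """Probability-weighted embedding: sum over vocab of p(v) * E[v]."""
    probs = softmax(logits)
    dim = len(embedding_table[0])
    return [
        sum(p * row[d] for p, row in zip(probs, embedding_table))
        for d in range(dim)
    ]


def hard_token_embedding(logits, embedding_table):
    """Embedding of the argmax token only; confidence is discarded."""
    best = max(range(len(logits)), key=lambda v: logits[v])
    return list(embedding_table[best])


# Hypothetical 3-token vocabulary with 2-dimensional embeddings.
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
confident = [5.0, 0.0, 0.0]  # model is nearly sure of token 0
uncertain = [0.2, 0.0, 0.0]  # same argmax, but close to uniform

print(hard_token_embedding(confident, E), hard_token_embedding(uncertain, E))
print(soft_token_embedding(confident, E), soft_token_embedding(uncertain, E))
```

Both logit vectors pick token 0, so the hard variant feeds an identical vector to the feature encoder, whereas the soft variant yields two different vectors that reflect how uncertain the model is.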


Daisuke OBA

@dai0NLP

Jun 9

7/ Takeaway: drifting can refine discrete diffusion LMs when feature-space drift is connected to categorical logits through a soft-token interface. Paper: arxiv.org/abs/2605.19470 w/ @frt03_ @chokkanorg

Drifting Objectives for Refining Discrete Diffusion Language Models

Discrete diffusion language models (DDLMs) generate text by iteratively denoising categorical token sequences, while recent drifting methods for continuous generators suggest that part of this...

arxiv.org


Daisuke OBA retweeted

Yukito Tajima @TitaniumJely

May 28

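We have released an MXFP4 version of GPT-OSS-Swallow v0.1. This is an additional release intended to let GPT-OSS-Swallow run with less memory. It should make the model easier to use for anyone who previously found it hard to try because of constraints on their runtime environment. huggingface.co/collections/t…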

GPT-OSS-Swallow-v0.1 - a tokyotech-llm Collection

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co


Daisuke OBA retweeted

Masanari Oi @stjohn2007

May 1

We propose HATCH🐣, a human-inspired training framework for multi-image spatial reasoning in VLMs 🐤 HATCH improves multi-image spatial reasoning ability while preserving single-image reasoning capabilities 🐓 📚️arxiv.org/abs/2602.08735

Masanari Oi @stjohn2007

May 1

Two first-author papers accepted to #ICML2026 🇰🇷 ! - Human-like multi-image spatial reasoning in multimodal LLMs (@silviasetitech @sponddd @dai0NLP Prof. Inoue @chokkanorg) - Autoregressive direct preference optimization (Mahiro Ukai @MasahiroKaneko_ @chokkanorg Prof. Inoue)


Daisuke OBA retweeted

Sora Miyamoto @SoraMiyamo0831

May 1

Our paper was accepted to #ICML2026 🇰🇷 (first author)! This paper is on budget-aligned test-time scaling of LLMs. It is my first ML conference paper! Huge thanks to my co-authors! @dai0NLP @chokkanorg Preprint: arxiv.org/abs/2602.09574 More details soon!


Daisuke OBA

@dai0NLP

Apr 22

Also at #ICLR2026 🇧🇷: Presenting Best-of-∞ on behalf of lead author @jkomiyama_ — principled Bayesian stopping that approximates the N→∞ majority-voting limit, plus optimal LLM-ensemble weights via MILP! 🕓25th April, 10:30 AM 📍Pavilion 4, #4710 w/ @jkomiyama_ @stillpedant
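As a rough illustration of Bayesian stopping for majority voting, the sketch below keeps sampling answers until the current leader is probably ahead of the runner-up. This is not the Best-of-∞ stopping rule: it assumes a simple uniform Beta prior on the leader-versus-runner-up split, and the names `leader_posterior`, `adaptive_majority_vote`, `threshold`, and `max_samples` are hypothetical. The ensemble-weight MILP is not covered.

```python
from collections import Counter
from math import comb


def leader_posterior(lead, runner):
    """P(true share of leader > 1/2) under a Beta(lead + 1, runner + 1) posterior.

    For integer counts the Beta tail equals a binomial sum, so no special
    functions are needed.
    """
    n = lead + runner + 1
    return sum(comb(n, j) for j in range(lead + 1)) / 2 ** n


def adaptive_majority_vote(sample_answer, threshold=0.95, max_samples=64):
    """Draw answers one at a time; stop once the leader is likely the true mode.

    sample_answer: zero-argument callable returning one answer (e.g. one LLM call).
    Returns (majority_answer, number_of_samples_used).
    """
    counts = Counter()
    for n in range(1, max_samples + 1):
        counts[sample_answer()] += 1
        ranked = counts.most_common(2)
        lead = ranked[0][1]
        runner = ranked[1][1] if len(ranked) > 1 else 0
        if leader_posterior(lead, runner) >= threshold:
            return ranked[0][0], n
    # Budget exhausted: fall back to the plain majority answer.
    return counts.most_common(1)[0][0], max_samples


# A sampler that always agrees with itself stops early.
print(adaptive_majority_vote(lambda: "42"))
```

With a sampler that never disagrees, the posterior crosses 0.95 after four draws, so easy questions consume only a small part of the sampling budget while contested ones keep sampling.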


Daisuke OBA

@dai0NLP

Apr 22

Excited to present SureLock at #ICLR2026 🇧🇷 — a principled decoding method that locks converged tokens in Masked Diffusion Language Models, cutting 30–50% FLOPs at same quality! w/ @Bollegala @MasahiroKaneko_ @chokkanorg 🕙 Friday, 24th April, 10:30 AM 📍Pavilion 3 (#826)


Daisuke OBA retweeted

Prof. Danushka Bollegala

@Bollegala

Apr 20

🇧🇷 Excited to present our paper "Stopping Computation for Converged Tokens in Masked Diffusion-LM Decoding" at #ICLR2026 in Rio de Janeiro in just two days! 🏖️ iclr.cc/virtual/2026/poster/… (Friday 24th 10:30-13:00 poster session)

Masked Diffusion LMs generate sequences via iterative sampling, but they waste significant compute by repeatedly re-evaluating tokens that have already converged.

To fix this, we introduce SureLock 🔒: a method that permanently locks stable tokens during decoding. By caching their attention keys/values and skipping their query projection and feed-forward sublayers, we drastically cut down on redundant computation.

🚀 The result? We achieve a 30–50% reduction in algorithmic FLOPs on LLaDA-8B with virtually no loss in generation quality!

If you are attending ICLR, come stop by our presentation!

w/ @dai0NLP @MasahiroKaneko_ @chokkanorg @LivUni @AmazonScience

code/paper: daioba.github.io/surelock/
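The locking idea can be illustrated with a toy decoding loop. This is only a sketch under stated assumptions, not the SureLock code: the convergence test used here (KL divergence between a position's distributions at consecutive steps falling below `tol`) is an assumption, `step_fn` is a hypothetical stand-in for one per-position model evaluation, and the evaluation counter is a crude proxy for FLOPs. Caching keys/values inside attention is not modeled.

```python
import math


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def decode_with_locking(step_fn, num_positions, num_steps, tol=1e-3):
    """Iterative decoding that stops recomputing positions once they converge.

    step_fn(step, pos) -> list of probabilities for that position.
    Returns (final_distributions, number_of_step_fn_calls).
    """
    prev = [None] * num_positions
    locked = [False] * num_positions
    evaluations = 0
    for step in range(num_steps):
        for pos in range(num_positions):
            if locked[pos]:
                continue  # converged: reuse the cached distribution
            cur = step_fn(step, pos)
            evaluations += 1
            # Lock permanently once the distribution has stopped moving.
            if prev[pos] is not None and kl(cur, prev[pos]) < tol:
                locked[pos] = True
            prev[pos] = cur
    return prev, evaluations


def toy_step(step, pos):
    """Position 0 is stable from the start; position 1 keeps changing."""
    if pos == 0:
        return [0.9, 0.1]
    p = 1.0 / (step + 2)
    return [p, 1.0 - p]


final, used = decode_with_locking(toy_step, num_positions=2, num_steps=4)
print(used, "evaluations instead of", 2 * 4)
```

In this toy run the stable position is evaluated twice and then skipped, so six evaluations are spent instead of eight, while the position that is still changing is recomputed at every step.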


Daisuke OBA retweeted

Taishi Nakamura

@taishinakamura_

Feb 20

We have released the Qwen3-Swallow and GPT-OSS-Swallow models. I was in charge of RL training. Performance on Japanese tasks also improved during the reinforcement learning stage.

Naoaki Okazaki @chokkanorg

Feb 20

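📢 We have released GPT-OSS Swallow and Qwen3 Swallow. With a complete overhaul of continual pre-training + SFT + reinforcement learning, these open LLMs combine Japanese-language performance with reasoning ability and are available under the Apache 2.0 license. Qwen3 Swallow: swallow-llm.github.io/qwen3-… GPT-OSS Swallow: swallow-llm.github.io/gptoss…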


Daisuke OBA retweeted

Koshiro Saito @koshiro_sa110

Feb 20

We are thrilled to announce the release of GPT-OSS Swallow and Qwen3 Swallow 🎉 I was involved in evaluation, framework development, and mentoring as a student leader. Leaderboard: swallow-llm.github.io/leader… Swallow-Evaluation-Instruct: github.com/swallow-llm/swall…



Daisuke OBA retweeted

Naoaki Okazaki @chokkanorg

Feb 20

Qwen3 Swallow

A large language model that strengthens Qwen3's Japanese-language and reasoning abilities (8B, 30B-A3B, 32B)

swallow-llm.github.io


Daisuke OBA retweeted

Prof. Danushka Bollegala

@Bollegala

Jan 27

Two papers accepted to @ICLR 2026 🎉 Congrats and kudos to my amazing collaborators: @dai0NLP @MasahiroKaneko_ @chokkanorg T. Yamamoto R. Kumon @verypluming. One paper is on how to make diffusion models efficient, and the other is on proving the existence of culture-specific neurones.
