Research Intern @Apple MLR; PhD ML @UniofOxford; generative modelling; previously at @MSFTResearch, @EPFL, @imperialcollege

Joined May 2024
Pinned Tweet
We were all wondering whether Categorical Flow Maps (CFMs) could scale... 🤔 I couldn't help trying it out... So we scaled CFMs to 1.7B parameters over 2.1T tokens 🚀🔥 Short summary 🧵⬇️
4
32
128
15,813
Oscar Davis retweeted
Protein–ligand cofolding models are getting incredibly powerful… but do they have to be so slow? 🧬🐢💊 Our new preprint introduces a new flow-map framework called DeCAF for fast few-step cofolding: up to 5× faster while preserving sample quality on the SOTA Pearl model and 20× faster than Boltz 1x. ⚡🧵 📜 Blog: genesis.ml/news/genesis-mode… 🔗 arXiv: arxiv.org/abs/2606.08375 Code (coming soon): github.com/genesistherapeuti…
5
24
153
19,556
Oscar Davis retweeted
Meet DiffusionGemma! An experimental open model that explores a fast approach to text generation, released under an Apache 2.0 license. Moving beyond sequential, token-by-token processes to generate entire blocks of text simultaneously. Here’s what’s new with DiffusionGemma: 👇
168
808
5,018
917,156
Oscar Davis retweeted
🔥 New paper: BlockGen: Flexible Blockwise Sequence Modeling with Hybrid Samplers. Are uniform-state diffusion models (USDMs) always stronger than masked ones (MDMs)? Recent work suggests so. However, a few questions remain open 🤔 w/ @caglarml (1/11)
4
21
57
3,298
Oscar Davis retweeted
super excited to share our latest work! are we really tilting? 🤨 tldr: reward guidance for flows and diffusions is supposed to sample from the reward-tilted distribution. we show it doesn’t 😰 and how to (mostly) fix it ✨ plus lots of fun images!! 🖼️ collaboration with the awesome @nmboffi website: sanjitdp.github.io/are-we-re… paper: arxiv.org/abs/2606.02884 code: github.com/sanjitdp/reward-g…
3
17
101
15,657
Oscar Davis retweeted
Why do Gaussian diffusion models fail on text data, and how can we prevent it? ☝️ We find that discrete-like latent spaces are fundamentally bad for continuous diffusions. ☝️ We explain what happens inside, and why self-conditioning and other heuristics improve generation. 🧵👇 1/7
6
35
222
16,801
Oscar Davis retweeted
🧵1/9 Rex has been selected for an oral presentation at #ICML2026! 🎉 We make diffusion/flow model solvers exactly reversible (bijection), for both ODEs and SDEs Last project of my Ph.D. w/ Chen Liu @ClarksonUniv → first project of my postdoc at #AITHYRA!
4
25
125
24,914
Oscar Davis retweeted
My strategy as a new PI: hire people who submit ICML orals before they even unpack their bags. I had absolutely nothing to do with this paper, but I'm thrilled he's my first hire at #AITHYRA. Check out his thread on exact reversibility below.
🧵1/9 Rex has been selected for an oral presentation at #ICML2026! 🎉 We make diffusion/flow model solvers exactly reversible (bijection), for both ODEs and SDEs Last project of my Ph.D. w/ Chen Liu @ClarksonUniv → first project of my postdoc at #AITHYRA!
2
8
129
20,111
Oscar Davis retweeted
Can LLMs reason in superposition? We introduce MUX, a method that turns text CoT into latent continuous reasoning. Instead of one-hot vectors as in CoT, the model now learns to predict weighted averages of several one-hot vectors, which we call multiplexed tokens. These multiplexed tokens can be designed to be lossless, so by predicting them one is essentially doing multi-token prediction (MTP) in superposition. MUX is the best latent reasoning method across 32 math settings spanning 1-8B LLaMA base models, reducing CoT length by 3-6x. Furthermore, it is able to perform parallel search, harnessing a core strength of superposed reasoning. In collaboration with @alperen_gozeten, @mmbronstein, @ismaililkanc, and @jw9730. 1/🧵
6
24
114
19,197
Oscar Davis retweeted
Introducing Strong Stochastic Flow Maps TLDR: Stochastic Flow Maps where we learn the stochastic solution path. Work led by Sam McCallum, @zwblasingame, with Timothy Herschelll, @AlexanderTong7, and @JamesFosterBath Arxiv: arxiv.org/pdf/2606.01086 Code: github.com/sammccallum/ssfm
6
76
362
73,424
Oscar Davis retweeted
Over the weekend, I was using codex to update my homepage and a paper I wrote a year ago on the topic of diffusion LLMs (should be updated on Monday). tsong.me/blog/inference-time… While I did not want to make it too explicit back then, I have argued that discrete diffusion LLMs were not the right thing to do and that, if diffusion ever works on LLMs, continuous dLLMs are the way to go. A year later, we are seeing a lot of cool papers in this space, and I hope the community can push for something practical and scalable.
8
14
172
14,984
Oscar Davis retweeted
Can we guide flow models in just a few steps? 🚀 Flow-based sampling is rapidly moving toward few-step generation. But reward guidance often still requires many steps and costly test-time search. Excited to introduce Flow Map Reward Guidance (FMRG): a training-free framework for few-step guidance with flow maps. FMRG matches or surpasses strong baselines on inverse problems and reward-guided text-to-image generation with: ⚡ as few as 3 NFEs ⚡ up to 10× fewer NFEs on inverse problems ⚡ up to 70× fewer NFEs on reward-guided generation 🧵⬇️
3
15
89
21,238
Oscar Davis retweeted
We are recruiting multiple postdocs at Oxford: cs.ox.ac.uk/news/2520-full.h…
2
19
109
16,038
Oscar Davis retweeted
The way forward for discrete DLM is to turn them into continuous DLM ;)
1/ Non-autoregressive language models promised massive parallel speedups, but aggressive decoding always led to catastrophic quality collapse. Until now. By replacing rigid discrete token choices with soft continuous trajectories, we can now decode >5x faster. 🧵
1
9
60
7,798
Oscar Davis retweeted
[📄preprint] Diffusion models 🤝 MCMC ! Diffusion model samplers are biased due to discretisation 💡The fix: Metropolis-type adjustment on corrector steps ❗️Challenge: no access to the density ratio, only the score 🔑Insight: the score (and some maths) is all you need... [1/3]
5
50
345
18,883
Oscar Davis retweeted
May 21
Guide with examples, not rewards 🐘 Controlling what a pretrained generative model produces is still mostly a choice between three slow options: fine-tune it, attach a reward network, or search at inference. We found flow matching allows a fourth, and it costs almost nothing. In deterministic interpolants, the velocity of the flow is determined by where the trajectory is headed: the endpoint mean. Shift that mean, and the entire flow shifts with it. This turns control into a matter of reference. Change the examples that define the endpoint, and you change the direction the model follows. The examples need not be perfect. They only need to point the flow toward the attribute you want. Color, identity, style, and structure, all controllable through examples. 🧵👇
6
29
169
33,991
Oscar Davis retweeted
Very excited about our work on finding the right drifting direction 🐎 We tackle a core open question in drifting: when does “no drift left” mean the model really matched the data? Kernel-gradient drifting is the answer (with natural extensions to manifolds and discrete data)!
🏎️Drift in the right direction🏎️ Introducing kernel-gradient drifting models: a reformulation of drifting models where the kernel itself defines the direction of motion through its gradient. 📜Paper: arxiv.org/pdf/2605.10727 💾Notebook: tinyurl.com/mv2jhuky
12
81
10,197
We were all wondering whether Categorical Flow Maps (CFMs) could scale... 🤔 I couldn't help trying it out... So we scaled CFMs to 1.7B parameters over 2.1T tokens 🚀🔥 Short summary 🧵⬇️
4
32
128
15,813
We had little time for this work, as I had just arrived at @Apple MLR. Stay tuned for what's coming next! 😉 Thanks a lot to the incredible team that helped make it possible! ❤️ @NasFilippova @PierreAblin @victorturrisi @AmitisShidani1 M. Cuturi and @LouisBAlgue
1
6
742