Opinions are my own. Ph.D. student at @RiceCompSci in @vislang. Previously @RealityLabs @AdobeResearch, @InariAILab, @AdaVivInc.

Joined December 2017
Jefferson Enrique Hernandez Cevallos retweeted
You may have recently heard claims that video generation models are "dumb" about physics, and only "world models" (V-JEPA, specifically) have a valid internal model of physics. This turns out to be false. In a recent paper, researchers show that a LINEAR probe of diffusion videogen models predicts various "physics" very well, significantly better than V-JEPA or VideoMAE (and plain VAE just sucks). This is noteworthy, because a *linear* probe being this accurate shows that the model has a pretty explicit internal representation of the physics!
42
107
1,066
99,765
Jefferson Enrique Hernandez Cevallos retweeted
Jun 6
Replying to @GergelyOrosz
don't agree tbh. data labeling sounds low status but it's actually incredibly valuable work and no one is above it
21
26
667
59,652
Jefferson Enrique Hernandez Cevallos retweeted
This may be a controversial take, but I think it needs to be said: the gap between computer vision research in academia and industry is widening with every conference. A huge fraction of @CVPR papers, especially those that boil down to "we tweaked/fine-tuned/RL'ed large-scale model X to improve on task Y", will become obsolete with the next model release. That's not where academia creates lasting value. PIs should adapt much faster to this changing reality. Academia should focus on fundamentally new ideas, new problem formulations, explaining emergent phenomenology, or uncovering blind spots that industry can later solve with scale, compute, and data.
37
115
1,125
94,357
Jefferson Enrique Hernandez Cevallos retweeted
Scaling laws describe how loss changes with scale. Do neurons inside models change predictably too? We study vision and language models up to 30B params and find systematic scaling in neuron universality, specialization, and selectivity. Paper code: avdravid.github.io/rosetta-n… 1/n
13
83
415
202,304
Jefferson Enrique Hernandez Cevallos retweeted
Can coding agents stay coherent over a 1 billion token budget? Can they build Slack from scratch? Rewrite a JAX codebase in PyTorch? Build a C compiler in Rust? Enter SWE-Marathon: a benchmark for autonomous long-horizon software work.
49
65
680
794,763
Jefferson Enrique Hernandez Cevallos retweeted
super excited to share our latest work! are we really tilting? 🤨 tldr: reward guidance for flows and diffusions is supposed to sample from the reward-tilted distribution. we show it doesn't 😰 and how to (mostly) fix it ✨ plus lots of fun images!! 🖼️ collaboration with the awesome @nmboffi website: sanjitdp.github.io/are-we-re… paper: arxiv.org/abs/2606.02884 code: github.com/sanjitdp/reward-g…
3
17
101
15,642
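For readers unfamiliar with the term in the post above, here is a minimal sketch of reward tilting on a small discrete distribution. It assumes the common textbook definition, q_i ∝ p_i · exp(r_i / beta); the function name `tilt`, the temperature `beta`, and the toy numbers are illustrative assumptions, not taken from the linked paper or code.

```python
import math


def tilt(probs, rewards, beta=1.0):
    """Return the reward-tilted distribution q_i ∝ p_i * exp(r_i / beta).

    `beta` is a hypothetical temperature; smaller values tilt harder.
    """
    if len(probs) != len(rewards):
        raise ValueError("probs and rewards must have the same length")
    scaled = [r / beta for r in rewards]
    # Subtract the max for numerical stability; it cancels when normalizing.
    m = max(scaled)
    weights = [p * math.exp(s - m) for p, s in zip(probs, scaled)]
    total = sum(weights)
    return [w / total for w in weights]


# Toy base distribution and rewards (made up for illustration).
base = [0.5, 0.3, 0.2]
rewards = [0.0, 1.0, 2.0]
tilted = tilt(base, rewards)
print(tilted)  # mass shifts toward the high-reward outcome
```

The post's claim concerns whether guided sampling actually reaches this target for flows and diffusions; the sketch only shows what the target looks like in the simplest discrete case.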
Jefferson Enrique Hernandez Cevallos retweeted
So there was research this interesting. When you prepare two non-overlapping subsets of a dataset and train a separate diffusion model on each, then as you increase the amount of data, the same noise comes to produce similar images. openreview.net/forum?id=ANvm…
2
20
184
15,855
Jefferson Enrique Hernandez Cevallos retweeted
Camera pose matters for video understanding! Today's MLLMs excel at recognizing activities, but still struggle with the underlying space and ego/object dynamics in video. We trace this gap to a missing piece: camera pose. Introducing Cambrian-P: a multimodal LLM natively grounded in camera pose. (1/n)
2
47
277
53,710
Jefferson Enrique Hernandez Cevallos retweeted
One of the hottest terms in AI right now is "On-policy distillation". It is a post-training technique in which a student model, typically an LLM, samples from its current policy and receives a teacher signal for on-policy states. It combines the dense supervision of distillation with the locality of online RL. Now a method on PapersWithCode! Find all 183 papers that cite it, and more here: paperswithcode.co/methods/on…
21
127
1,125
84,431
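The definition in the post above can be sketched with toy next-token distributions. This is a rough illustration under stated assumptions: the per-token teacher signal is taken to be the reverse KL divergence KL(student || teacher), which is one common choice but is not specified in the post, and `student_policy`, `teacher_policy`, and `on_policy_rollout` are hypothetical stand-ins for real models and training code.

```python
import math
import random


def reverse_kl(student, teacher):
    """KL(student || teacher) for two discrete next-token distributions."""
    return sum(s * math.log(s / t) for s, t in zip(student, teacher) if s > 0)


def student_policy(prefix):
    # Hypothetical student: a fixed distribution over a 3-token vocabulary.
    return [0.6, 0.3, 0.1]


def teacher_policy(prefix):
    # Hypothetical teacher: prefers token 1 after an empty prefix, else token 2.
    return [0.2, 0.7, 0.1] if not prefix else [0.2, 0.2, 0.6]


def on_policy_rollout(steps, rng):
    """Sample tokens from the student and score every visited state with the teacher."""
    prefix, losses = [], []
    for _ in range(steps):
        s = student_policy(prefix)
        t = teacher_policy(prefix)
        # Dense signal: one loss per token, computed on a state the student itself reached.
        losses.append(reverse_kl(s, t))
        token = rng.choices(range(len(s)), weights=s)[0]
        prefix.append(token)
    return prefix, losses


tokens, losses = on_policy_rollout(steps=4, rng=random.Random(0))
print(tokens, losses)
```

In a real setup the per-token losses would be backpropagated into the student; here they are only computed, to show where the "on-policy" and "dense" parts of the description come from.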
Jefferson Enrique Hernandez Cevallos retweeted
I trained an autoencoder that reconstructs images with zero reconstruction loss. No MSE. No image space supervision. The only signal: "According to you, does your output look like your input through your own eyes?" It works. Blog link, demo and summary 👇
24
47
614
68,218
Jefferson Enrique Hernandez Cevallos retweeted
The recording of Lucas Beyer's (@giffmana) lecture at @ETH is now live on YouTube for everyone who couldn't join us in person! This past Monday, we had the pleasure of hosting Lucas (@Meta @AIatMeta Superintelligence Labs) for our "Robot Learning: From Fundamentals to Foundation Models" course. He joined us to talk about: "Vision in the Age of LLMs". Drawing from a remarkable track record in computer vision and multimodal AI (ViT, SigLIP, PaliGemma) 🧠, Lucas delivered a masterclass on the frontier of multimodal foundation model training: from pre-training to post-training, where the field stands today, and what comes next 🚀 📽️ YouTube Recording: youtu.be/0XB7fNS_ONg 📚 Course Website: cvg.ethz.ch/lectures/Robot-L…
5
70
672
54,118
Jefferson Enrique Hernandez Cevallos retweeted
🚨Typical RL algorithms and on-policy distillation methods are blind samplers: they use privileged info to score rollouts, but not to *find* them. We ask: can we use privileged info to *actively sample* the rollouts RL wishes it can stumble upon with compute? ⤵️ Pedagogical RL
16
87
498
114,388
Jefferson Enrique Hernandez Cevallos retweeted
Test-time scaling, reasoning, and generally search-like processes clearly drive significant gains in LLMs. Largely owed to the structure of language. One would think the same could apply to non-linguistic domains, like image generation, but that obviously depends on whether the structure of the domain's representation lends itself to search. 1D ordered tokens (e.g., image FlexTok, video FlexTok) seem like a natural fit since they enable a step-by-step coarse-to-fine generation. We investigated that and found they indeed enable search and scale far better with test-time compute than 2D grids. See the visuals on the webpage. Appearing in @icmlconf 2026. 🔗 soto.epfl.ch 📄 arxiv.org/abs/2604.15453
5
31
138
14,835
Jefferson Enrique Hernandez Cevallos retweeted
May 13
2 new OPD survey/analysis papers just dropped
2
16
130
8,043
Jefferson Enrique Hernandez Cevallos retweeted
Introducing Flux Matching, a generative modeling paradigm that generalizes diffusion models to vector fields that need not be the score function. Enables structural priors in the dynamics, faster sampling, interpretable generation, and more! w/ @StefanoErmon @Xiaojie_Qiu 🧵⤵️
21
159
994
144,100
Jefferson Enrique Hernandez Cevallos retweeted
🧵 1/11 Everyone's doing on-policy distillation now (Qwen3, DeepSeek V4, GLM-5). But here's what nobody's asking: at any given token or for a question and a teacher, when does the teacher's guidance actually help, and when does it quietly make things worse? We found a way to answer this. No training needed!
4
51
437
29,660
Jefferson Enrique Hernandez Cevallos retweeted
RLM arXiv paper update: depth>1 results, more comparisons, more training, and more error analysis! We add depth=2/3 experiments, where the RLM now has access to recursive RLM calls. This is also a feature of the open source `rlm` repo. We observe significant performance gains on OOLONG-Pairs and gains on all other benchmarks! We also include various OpenCode and Claude Code comparisons now per popular request. We add a length generalization experiment on MRCRv2 to show more promising training results, add a small prompting case study on OOLONG, and update the error analysis section to discuss the effect of syntax errors, decomposition mistakes, and general observations from the RLM trajectories. The appendix is now also updated with several new experiments and plots!
5
35
233
11,350
Jefferson Enrique Hernandez Cevallos retweeted
"The Truth Lies Somewhere in the Middle (of the Generated Tokens)" In autoregressive language models, mean pooling hidden states across generation yields better representations than any token alone. project page: sophielwang.com/tokens w/ @phillip_isola and @thisismyhat
9
68
471
50,045
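The pooling operation named in the post above is simple to state in code. The sketch below assumes hidden states arrive as one plain list of floats per generated token; `mean_pool` is a hypothetical helper, and the vectors are made-up numbers rather than real model activations.

```python
def mean_pool(hidden_states):
    """Average per-token hidden-state vectors into a single representation."""
    if not hidden_states:
        raise ValueError("need at least one token")
    dim = len(hidden_states[0])
    if any(len(h) != dim for h in hidden_states):
        raise ValueError("all hidden states must have the same dimension")
    n = len(hidden_states)
    # Element-wise mean over the token axis.
    return [sum(h[i] for h in hidden_states) / n for i in range(dim)]


# Three generated tokens, each with a 2-dimensional hidden state (toy values).
states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled = mean_pool(states)
print(pooled)
```

The alternative the post compares against would be picking a single token's vector, for example `states[-1]`, instead of the average.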
Jefferson Enrique Hernandez Cevallos retweeted
Reproducing all of Schmidhuber's papers (1990-2025) using an AI coding assistant. Cool project by @yaroslavvb! It even reproduced the "World Models" paper by me and @SchmidhuberAI with a toy env, with a full VAE RNN world model implementation. Project: github.com/cybertronai/schmi…
44
155
1,089
94,980