UTCS is a recognized leader in creating the scientific knowledge and practical technologies exemplifying the digital revolution that defines the 21st century.

Joined December 2011
Replying to @UTAustin
@UTAustin is launching a new School of Computing in fall 2026! With Information and Statistics & Data Science, we’ll expand student opportunities, accelerate research, and strengthen pathways to high-impact careers and grad study. Read more: utex.as/3OQFrIh
Computer Science at UT Austin retweeted
🚨 Test-time intervention for CUA tasks is hard: history is hard to represent, and actions require visual grounding and verification before execution, not after. HiViG jointly tackles these points, learning to track history and verify actions against the GUI screenshot. As a test-time method, HiViG is compatible w/ open- and closed-source models and is domain- and model-general: we see 5.8-9% accuracy gains across WebArenaLiteV2 (web), AndroidLab (mobile) and WindowsAgentArena (desktop), and across models/model classes (e.g., Qwen3-VL-32B, Gemini-3-Flash), with especially large gains on challenging/long-horizon tasks (19.2% on WebArenaLiteV2 Maps, 18.6% on WindowsAgentArena Office). 🧵👇
🚨 Introducing HiViG, a test-time intervention framework for long-horizon GUI tasks. By tracking history & verifying actions w/ visual grounding, HiViG boosts performance across diverse GUI environments, even for strong policies where existing critics often degrade performance. At test time, HiViG guides the policy in two crucial phases: 1️⃣ Before proposing an action: it provides the policy with an updated summary of past interactions for better history-aware action generation. 2️⃣ After an action is proposed: it evaluates the proposed action using visually grounded reasoning to intercept any flawed action before execution. Across three long-horizon GUI benchmarks with various environments (WebArenaLiteV2 🌐, AndroidLab 📱, WindowsAgentArena 🖥️) on strong base policies (Qwen3-VL-32B-Thinking, Gemini-3-Flash), HiViG improves average overall success rate by 5.8% and 9.0% compared to the strongest critics, showing its effectiveness and generalization across diverse GUI platforms and policies! 💪 🧵👇
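The two phases described in the post above can be sketched as a single guarded step. This is only an illustrative sketch, not HiViG's actual implementation: the names `guided_step`, `propose`, `verify`, and `summarize` are hypothetical, and the retry-on-rejection behavior is an assumption, since the post says only that flawed actions are intercepted before execution.

```python
from typing import Callable, Tuple


def guided_step(
    screenshot: str,
    history_summary: str,
    propose: Callable[[str, str, str], str],
    verify: Callable[[str, str, str], Tuple[bool, str]],
    summarize: Callable[[str, str], str],
    max_retries: int = 2,
) -> Tuple[str, str]:
    """Run one test-time-guided step and return (action, updated summary).

    All callables are hypothetical stand-ins:
      propose(screenshot, summary, feedback) -> action
      verify(screenshot, summary, action)    -> (accepted, feedback)
      summarize(summary, action)             -> new summary
    """
    feedback = ""
    action = ""
    for _ in range(max_retries + 1):
        # Phase 1: the policy proposes an action given the history summary.
        action = propose(screenshot, history_summary, feedback)
        # Phase 2: check the action against the screenshot before executing it.
        accepted, feedback = verify(screenshot, history_summary, action)
        if accepted:
            break
    # Assumption: if every attempt is rejected, the last proposal is returned.
    return action, summarize(history_summary, action)
```

Passing the verifier's feedback back into `propose` is one plausible way to use an interception. A real system might instead resample the policy or fall back to a different one.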
Computer Science at UT Austin retweeted
I’m thrilled to share that I’ll be joining UT Austin Computer Science (@UTCompSci) as an Assistant Professor, starting Jan 2027! I'm recruiting PhD students for Fall 2027, postdocs, and researchers at all levels: forms.gle/WMbb7NYts1fX5BYG7 I’ll continue my work (yudai-tanaka.com) in Human-Computer Interaction (HCI) and start the Symbiotic Interfaces Lab, where we’ll explore how computational systems & humans can co-adapt to augment human sensorimotor and cognitive abilities. 🦾🧠
Computer Science at UT Austin retweeted
🤖 How can robots learn long-horizon object state change tasks like mashing a banana 🥣🍌, spreading ketchup on bread 🍅🍞, or slicing a cucumber 🔪🥒? Introducing SPARTA: object state-change manipulation via visual spatial progress 👇 🌐 vision.cs.utexas.edu/project…
Computer Science at UT Austin retweeted
Can an LLM act as a selective model of a GPU during evolutionary search, by reasoning about and forecasting a kernel’s runtime but deferring to a GPU when unsure? We produced 12k kernel runtimes from evolutionary search, costing 400M reasoning tokens and 600 GPU-hours, to answer this. In our work GPU Forecasters, we study language models as selective surrogates for GPU kernel optimization. 1️⃣ Off-the-shelf LLMs can forecast how a GPU responds to a candidate kernel with non-trivial accuracy. If we rank candidates by these predictions and measure only the top 10% on a GPU, the fastest kernel we find is within 20% of the best in the pool. 2️⃣ We want LLMs to not just be accurate but also calibrated, so that we can use their uncertainty for selective prediction: during search, we should trust only confident forecasts and verify less confident forecasts by sending them to the GPU. 3️⃣ We train an open-weights surrogate (GPT-OSS-20B) with RL to improve both accuracy and calibration. Calibration-shaped rewards improve both confidence reliability and ranking ability, while correctness rewards alone do not. 4️⃣ Inside a real kernel search, the surrogate finds faster kernels than an equal-GPU-budget baseline by considering more candidates per measurement. 5️⃣ We release 12,388 LLM-generated GPU kernels with measured runtimes spanning 118 operations, CUDA and Triton backends, and 3 GPU types, taking 400M tokens and 600 GPU-hours to produce. This dataset can be used for analyzing LLM-driven evolutionary program search dynamics, post-training LLMs for kernel code generation, and things we didn’t get a chance to explore, like training reward models! Thread 🧵👇
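The two ideas in the thread above are selective prediction (trust confident forecasts, send the rest to the GPU) and ranking by forecast (measure only the top slice). A minimal sketch of both follows, with assumptions labeled: `Forecast`, `selective_runtime`, `top_fraction_by_forecast`, and the 0.8 confidence threshold are hypothetical names and values chosen for illustration, not the authors' code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Forecast:
    runtime_ms: float  # surrogate's predicted runtime (hypothetical unit)
    confidence: float  # surrogate's stated confidence in [0, 1]


def selective_runtime(
    kernel: str,
    forecast: Callable[[str], Forecast],
    measure: Callable[[str], float],
    threshold: float = 0.8,  # assumed cutoff, not taken from the thread
) -> Tuple[float, bool]:
    """Return (runtime, used_gpu): trust confident forecasts, else measure."""
    f = forecast(kernel)
    if f.confidence >= threshold:
        return f.runtime_ms, False
    # Low confidence: defer to a real (expensive) GPU measurement.
    return measure(kernel), True


def top_fraction_by_forecast(
    kernels: List[str],
    forecast: Callable[[str], Forecast],
    fraction: float = 0.1,
) -> List[str]:
    """Rank kernels by predicted runtime and keep the fastest fraction."""
    ranked = sorted(kernels, key=lambda k: forecast(k).runtime_ms)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]
```

With `fraction=0.1`, only a tenth of the pool reaches the GPU, which mirrors the "measure only the top 10%" setup in point 1️⃣. The threshold in `selective_runtime` is only meaningful if the surrogate's confidence is calibrated, which is the concern raised in point 2️⃣.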
Computer Science at UT Austin retweeted
Exciting news on GR00T: NVIDIA announces our first open humanoid robot platform, featuring Unitree H2 Plus and Sharpa hands, to accelerate academic research and facilitate cross-institutional collaboration. R&D in humanoid robotics needs broader participation. Open science is how we build the future faster, together.
NVIDIA announces the first open humanoid robot reference design built for robotics research. The NVIDIA Isaac GR00T Reference Humanoid Robot combines the @UnitreeRobotics H2 humanoid robot, @SharpaRobotics Wave five-fingered hands for dexterous manipulation, Jetson Thor onboard compute, and Isaac GR00T open software and models, giving researchers a full-stack platform from data capture to model deployment. Read the #NVIDIAGTC Taipei announcement: nvda.ws/4ef9VOr
Computer Science at UT Austin retweeted
Delighted to finally unveil these results! 🎉 Many congratulations to the team, who worked tirelessly for almost a year to build and evaluate AlphaProof Nexus. We revised many priors during this project — most notably, we discovered that with current frontier models, simple agent loops with compiler feedback can rival more sophisticated systems. We were struck both by the capabilities of our systems and the magnitude of the challenges ahead. I have never been as excited about the potential of formal math to enhance human creativity and bring rigor to AI. Onward! 🚀
AI agents are advancing research-level math. 🚀 I’m thrilled to share @GoogleDeepMind’s AlphaProof Nexus - an agentic framework for formal proof search powered by Gemini. When applied to a set of open formal math problems, our agent autonomously solved: ✅ 9 open Erdős problems (including two open for 56 years!) ✅ 44 Online Encyclopedia of Integer Sequences (OEIS) problems ✅ A 15-year-old open problem in algebraic geometry ✅ A 7-year-old open question in min-max optimization We are collaborating with mathematicians across disciplines - from combinatorics and graph theory to quantum optics. Ultimately, these results show the massive potential of even simple agentic loops powered by Gemini. Read the paper here: arxiv.org/abs/2605.22763v1
Computer Science at UT Austin retweeted
Excited that our paper StreamDiffusionV2 received the Best Research Paper Award at #MLSys26! 🚀 Video generation is quickly moving from demos to production-facing workloads. It should no longer be a turn-based pipeline but a streaming one that interacts with users. 📖 Our project page: streamdiffusionv2.github.io/ and paper: arxiv.org/pdf/2511.07399 👂 Come join the talk if you are interested in streaming video generation. Our talk will be in the Research Track Oral Presentation: Best Paper Session on Tue at 8:45 AM at #MLSys26, where I will talk about how we attacked the efficiency and quality challenges. Hope to see you there! ❤️ Huge thanks to all authors! This work would not have been possible without the incredible effort from the entire team. Big shout out to Tianrui Feng, Zhi Li, @Andy_ShuoYang, @HaochengXiUCB, @lmxyy1999, @lvminzhang, @xiuyu_l, Keting Yang, @ZiqiPeng, @songhan_mit, @magrawala, @KurtKeutzer, and @cumulo_autumn
Computer Science at UT Austin retweeted
Are state-of-the-art AI review systems capable of providing meaningful reviews at an actual AI conference? This paper explains the findings from the AAAI 2026 AI Review Pilot. 1/N
May 7
We are thrilled to present a detailed report describing the system built for the AAAI-26 AI review pilot, the survey results, and a new benchmark that was created to assess the capabilities of the system. Read the full article: arxiv.org/pdf/2604.13940
Computer Science at UT Austin retweeted
Proud of Texas Robotics! Come see our papers at ICRA: robotics.utexas.edu/news/346
Computer Science at UT Austin retweeted
🚨 Excited to share MINTEval, a new benchmark for memory with interference. In real-world settings, agents need to handle continuously changing info (think of all your v2.5_final_final docs). MINTEval tests memory systems on frequent and interfering changes, across challenging question types (long-range lookback/recover, multi-target reasoning) and 4 realistic domains that challenge even the strongest models/agentic memory systems. 🧵👇
LLM agents & memory systems operate in continuously updated environments (Git repos, evolving docs). They must process long contexts, recover earlier information, and reason over many updates that create interference between old and new information. How well do they handle this? We introduce MINTEval: ✅ Frequent context changes & interference (avg. 86 updates) ✅ 5 challenging question types, including long-range lookback & reasoning over multiple targets distributed across context ✅ 4 realistic domains: state tracking, multi-turn dialogue, Wikipedia revisions, GitHub commits ✅ Avg. 138.8k tokens per instance (up to 1.8M) ✅ Human verification on generated QAs = 95.6% 📊 Across 7 representative systems, MINTEval remains difficult, showing an avg. acc of 27.9%, and the best system reaches only 33.4%. 🔎 Our analysis shows: • Memory construction failures cause a 41.7% drop • Memory agents are highly sensitive to design choices • Memory systems have a strong bias toward insertion operations (76.8%) over deletion/update
Computer Science at UT Austin retweeted
Sparse binary rewards bottleneck LLM RL, motivating the use of privileged information in self-distillation as dense teachers. How can we use and balance multiple types of privileged info: leveraging stable cross-view info, while preserving view-specific info? Current on-policy self-distillation methods often condition the teacher on only one type of privileged view: full solution, partial rationale, answer-only, reference code, feedback, etc. This can be suboptimal: 1️⃣ No single privileged view consistently performs best when used as a teacher. 2️⃣ Views can introduce teacher-specific artifacts from information unavailable to the student. 🧠 Adaptive-View Self-Distillation (AVSD) considers multiple privileged views jointly as a teacher family, balancing cross-view consensus and view-specific signals through a token-level gate to construct better dense learning signals. 🧵👇
Congratulations to UT Computer Science Ph.D. student Yeonju Ro @j777ro on being named a 2026 MLCommons ML and Systems Rising Star!
Introducing the 2026 @MLCommons Rising Stars! 🌟 We’ve selected 39 outstanding early-career researchers from 26 global institutions who are shaping the future of ML systems, hardware-software co-design, and trustworthy AI. Meet the cohort: bit.ly/3Ru3ONl #AI #MLCommons
Computer Science at UT Austin retweeted
Yeonju’s research tackles reliability and efficiency in agentic LLM workflows. In her recent work Sherlock, informed verification and speculative execution deliver an 18.3% accuracy gain and up to a 48.7% latency reduction. Worth a read! arxiv.org/abs/2511.00330
Computer Science at UT Austin retweeted
Huge congrats to our very own @j777ro on being named a 2026 @MLCommons ML & Systems Rising Star, one of just 39 awardees this year! 📷 She’ll be heading to the workshop hosted by @AMD in Santa Clara this July. So well deserved! mlcommons.org/about-us/progr…
RT @UT_GameDev: Digital Demo Day brought our community together one more time this year to celebrate the creativity and hard work of studen…
Computer Science at UT Austin retweeted
📢 I’m looking to hire a postdoc to work closely with me and my research group at UT Austin on exciting topics in core PL/FM, as well as applications of PL/FM ideas to other areas. If you are interested, or know someone who might be a great fit, please DM me!
Computer Science at UT Austin retweeted
🚨 Excited to announce Agent-BRACE! LLM agents in long-horizon POMDPs either blow up their context with raw history or summarize it, discarding uncertainty by collapsing belief into a point estimate. Agent-BRACE decouples the agent into belief state and policy models, jointly trained via RL. Key takeaways: 1️⃣ 🎯 The belief state model produces a structured approximation of the belief distribution as a set of atomic natural-language claims with ordinal verbalized certainty labels ranging from certain to unknown. The policy conditions on this compact belief rather than the full history. 2️⃣ 📈 Outperforms strong RL baselines on long-horizon partially observable embodied language environments while maintaining a near-constant context window independent of episode length. 3️⃣ 🔄 The learned belief becomes increasingly calibrated as evidence accumulates, and epistemic belief decreases over time: the proportion of claims that the agent has the strongest level of belief in grows from 21% → 52% over an episode. 👇🧵
Computer Science at UT Austin retweeted
[1/n] Just wrapped up 7 months interning with @pcastr at DeepMind and I'm so excited to share our work: arxiv.org/abs/2602.10324. TLDR: We used LLM-powered program synthesis to automatically model and discover differences between human and LLM strategic behavior
Computer Science at UT Austin retweeted
Congratulations to the Class of 2026! 🎓🧡 We can't wait to see how you change the world 🤘