Computer Science at UT Austin

Tweets

Pinned Tweet

Computer Science at UT Austin @UTCompSci

Feb 19

@UTAustin is launching a new School of Computing in fall 2026! With Information and Statistics & Data Science, we’ll expand student opportunities, accelerate research, and strengthen pathways to high-impact careers and grad study. Read more: utex.as/3OQFrIh

UT Launches New School of Computing, Uniting Computer and Data Science, Statistics, Information...

New school will unite key strengths to establish a center of excellence, strengthening interdisciplinary research and preparing talent for a rapidly changing economy.

news.utexas.edu

Computer Science at UT Austin retweeted

Elias Stengel-Eskin @EliasEskin

Jun 10

🚨 Test-time intervention for CUA tasks is hard: history is hard to represent, and actions require visual grounding and verification before execution, not after. HiViG jointly tackles these points, learning to track history and verify actions against the GUI screenshot. As a test-time method, HiViG is compatible w/ open- and closed-source models and is domain- and model-general: we see 5.8-9% accuracy gains across WebArenaLiteV2 (web), AndroidLab (mobile) and WindowsAgentArena (desktop), and across models/model classes (e.g., Qwen3-VL-32B, Gemini-3-Flash), with especially large gains on challenging/long-horizon tasks (19.2% on WebArenaLiteV2 Maps, 18.6% on WindowsAgentArena Office). 🧵👇

hyunji amy lee @hyunji_amy_lee

Jun 10

🚨 Introducing HiViG, a test-time intervention framework for long-horizon GUI tasks. By tracking history & verifying actions w/ visual grounding, HiViG boosts performance across diverse GUI environments even for strong policies where existing critics often degrade performance.
At test time, HiViG guides the policy in two crucial phases:
1️⃣ Before proposing an action: it provides the policy with an updated summary of past interactions for better history-aware action generation.
2️⃣ After an action is proposed: it evaluates the proposed action using visually grounded reasoning to intercept any flawed action before execution.
Across three long-horizon GUI benchmarks with various environments (WebArenaLitev2 🌐, AndroidLab 📱, WindowsAgentArena 🖥️) on strong base policies (Qwen3-VL-32B-Thinking, Gemini-3-Flash), HiViG improves average overall success rate by 5.8% and 9.0% compared to the strongest critics, showing its effectiveness and generalization across diverse GUI platforms and policies! 💪 🧵👇

Computer Science at UT Austin retweeted

Yudai Tanaka @_yudaitanaka

Jun 8

I’m thrilled to share that I’ll be joining UT Austin Computer Science (@UTCompSci) as an Assistant Professor, starting Jan 2027! I'm recruiting PhD students for Fall 2027, postdocs, and researchers at all levels: forms.gle/WMbb7NYts1fX5BYG7 I’ll continue my work (yudai-tanaka.com) in Human-Computer Interaction (HCI) and start the Symbiotic Interfaces Lab, where we’ll explore how computational systems & humans can co-adapt to augment human sensorimotor and cognitive abilities. 🦾🧠

Computer Science at UT Austin retweeted

Priyanka Mandikal @PrnkMandikal

Jun 3

🤖 How can robots learn long-horizon object state change tasks like mashing a banana 🥣🍌, spreading ketchup on bread 🍅🍞, or slicing a cucumber 🔪🥒? Introducing SPARTA: object state-change manipulation via visual spatial progress 👇 🌐 vision.cs.utexas.edu/project…

Computer Science at UT Austin retweeted

Zaid Khan @codezakh

Jun 2

Can an LLM act as a selective model of a GPU during evolutionary search, by reasoning to forecast a kernel’s runtime but deferring to a GPU when unsure? We produced 12k kernel runtimes from evolutionary search, costing 400M reasoning tokens and 600 GPU-hours, to answer this. In our work GPU Forecasters, we study language models as selective surrogates for GPU kernel optimization.
1️⃣ Off-the-shelf LLMs can forecast how a GPU responds to a candidate kernel with non-trivial accuracy. If we rank candidates by these predictions and measure only the top 10% on a GPU, the fastest kernel we find is within 20% of the best in the pool.
2️⃣ We want LLMs to not just be accurate but also calibrated, so that we can use their uncertainty for selective prediction: during search, we should trust only confident forecasts and verify less confident forecasts by sending them to the GPU.
3️⃣ We train an open-weights surrogate (GPT-OSS-20B) with RL to improve both accuracy and calibration. Calibration-shaped rewards improve both confidence reliability and ranking ability, while correctness rewards alone do not.
4️⃣ Inside a real kernel search, the surrogate finds faster kernels than an equal-GPU-budget baseline by considering more candidates per measurement.
5️⃣ We release 12,388 LLM-generated GPU kernels with measured runtimes spanning 118 operations, CUDA and Triton backends, and 3 GPU types, taking 400M tokens and 600 GPU-hours to produce. This dataset can be used for analyzing LLM-driven evolutionary program search dynamics, post-training LLMs for kernel code generation, and things we didn’t get a chance to explore, like training reward models!
Thread 🧵👇

Computer Science at UT Austin retweeted

Yuke Zhu @yukez

Jun 1

Exciting news on GR00T: NVIDIA announces our first open humanoid robot platform, featuring Unitree H2 Plus and Sharpa hands, to accelerate academic research and facilitate cross-institutional collaboration. R&D in humanoid robotics needs broader participation. Open science is how we build the future faster, together.

NVIDIA Robotics @NVIDIARobotics

Jun 1

NVIDIA announces the first open humanoid robot reference design built for robotics research. The NVIDIA Isaac GR00T Reference Humanoid Robot combines the @UnitreeRobotics H2 humanoid robot, @SharpaRobotics Wave five-fingered hands for dexterous manipulation, Jetson Thor onboard compute, and Isaac GR00T open software and models, giving researchers a full-stack platform from data capture to model deployment. Read the #NVIDIAGTC Taipei announcement: nvda.ws/4ef9VOr

Computer Science at UT Austin retweeted

Swarat Chaudhuri @swarat

May 25

Delighted to finally unveil these results! 🎉 Many congratulations to the team, who worked tirelessly for almost a year to build and evaluate AlphaProof Nexus. We revised many priors during this project — most notably, we discovered that with current frontier models, simple agent loops with compiler feedback can rival more sophisticated systems. We were struck both by the capabilities of our systems and the magnitude of the challenges ahead. I have never been as excited about the potential of formal math to enhance human creativity and bring rigor to AI. Onward! 🚀

Pushmeet Kohli @pushmeet

May 25

AI agents are advancing research-level math. 🚀 I’m thrilled to share @GoogleDeepMind’s AlphaProof Nexus - an agentic framework for formal proof search powered by Gemini. When applied to a set of open formal math problems, our agent autonomously solved:
✅ 9 open Erdős problems (including two open for 56 years!)
✅ 44 Online Encyclopedia of Integer Sequences (OEIS) problems
✅ A 15-year-old open problem in algebraic geometry
✅ A 7-year-old open question in min-max optimization
We are collaborating with mathematicians across disciplines - from combinatorics and graph theory to quantum optics. Ultimately, these results show the massive potential of even simple agentic loops powered by Gemini. Read the paper here: arxiv.org/abs/2605.22763v1

Computer Science at UT Austin retweeted

Chenfeng_X @Chenfeng_X

May 19

Excited that our paper StreamDiffusionV2 received the Best Research Paper Award at #MLSys26! 🚀 Video generation is quickly moving from demos to production-facing workloads. It should no longer be a turn-based pipeline but a streaming pipeline that interacts with users. 📖 Our project page: streamdiffusionv2.github.io/ and paper: arxiv.org/pdf/2511.07399 👂 Come join the talk if you are interested in streaming video generation. Our talk will be at the Research Track Oral Presentation: Best Paper Session on Tue 8:45AM at #MLSys26, where I will talk about how we attacked the efficiency and quality challenges. Hope to see you there! ❤️ Huge thanks to all authors! This work would not have been possible without the incredible effort from the entire team. Big shout out to Tianrui Feng, Zhi Li, @Andy_ShuoYang, @HaochengXiUCB, @lmxyy1999, @lvminzhang, @xiuyu_l, Keting Yang, @ZiqiPeng, @songhan_mit, @magrawala, @KurtKeutzer, and @cumulo_autumn

Computer Science at UT Austin retweeted

Joydeep Biswas @Joydeepb_robots

May 8

Are state-of-the-art AI review systems capable of providing meaningful reviews in an actual AI conference? This paper explains the findings from the AAAI 2026 AI Review Pilot. 1/N

AAAI @RealAAAI

May 7

We are thrilled to present a detailed report describing the system built for the AAAI-26 AI review pilot, the survey results, and a new benchmark that was created to assess the capabilities of the system. Read the full article: arxiv.org/pdf/2604.13940

Computer Science at UT Austin retweeted

Peter Stone @PeterStone_TX

May 26

Proud of Texas Robotics! Come see our papers at ICRA: robotics.utexas.edu/news/346

Computer Science at UT Austin retweeted

Elias Stengel-Eskin @EliasEskin

May 20

🚨 Excited to share MINTEval, a new benchmark for memory with interference. In real-world settings, agents need to handle continuously changing info (think of all your v2.5_final_final docs). MINTEval tests memory systems on frequent and interfering changes, across challenging question types (long-range lookback/recover, multi-target reasoning) and 4 realistic domains that challenge even the strongest models/agentic memory systems. 🧵👇

hyunji amy lee @hyunji_amy_lee

May 20

LLM agents & memory systems operate in continuously updated environments (Git repos, evolving docs). They must process long contexts, recover earlier information, and reason over many updates that create interference between old and new information. How well do they handle this? We introduce MINTEval:
✅ Frequent context changes & interference (avg. 86 updates)
✅ 5 challenging question types, including long-range lookback & reasoning over multiple targets distributed across context
✅ 4 realistic domains: state tracking, multi-turn dialogue, Wikipedia revisions, GitHub commits
✅ Avg. 138.8k tokens per instance (up to 1.8M)
✅ Human verification on generated QAs = 95.6%
📊 Across 7 representative systems, MINTEval remains difficult, showing an avg. acc of 27.9%, and the best system reaches only 33.4%.
🔎 Our analysis shows:
• Memory construction failures cause a 41.7% drop
• Memory agents are highly sensitive to design choices
• Memory systems have a strong bias toward insertion operations (76.8%) over deletion/update

Computer Science at UT Austin retweeted

Duy Nguyen @duynguyen772

May 21

Sparse binary rewards bottleneck LLM RL, motivating the use of privileged information in self-distillation as dense teachers. How can we use and balance multiple types of privileged info: leveraging stable cross-view info, while preserving view-specific info? Current on-policy self-distillation methods often condition the teacher on only one type of privileged view: full solution, partial rationale, answer-only, reference code, feedback, etc. This can be suboptimal:
1️⃣ No single privileged view consistently performs best when used as a teacher.
2️⃣ Views can introduce teacher-specific artifacts from information unavailable to the student.
🧠 Adaptive-View Self-Distillation (AVSD) considers multiple privileged views jointly as a teacher family, balancing cross-view consensus and view-specific signals through a token-level gate to construct better dense learning signals. 🧵👇

Computer Science at UT Austin retweeted

hyunji amy lee @hyunji_amy_lee

May 20

Computer Science at UT Austin @UTCompSci

May 19

Congratulations to UT Computer Science Ph.D. student Yeonju Ro @j777ro on being named a 2026 MLCommons ML and Systems Rising Star!

MLCommons @MLCommons

May 19

Introducing the 2026 @MLCommons Rising Stars! 🌟 We’ve selected 39 outstanding early-career researchers from 26 global institutions who are shaping the future of ML systems, hardware-software co-design, and trustworthy AI. Meet the cohort: bit.ly/3Ru3ONl #AI #MLCommons

Computer Science at UT Austin retweeted

VITA Group @VITAGroupUT

May 2

Yeonju’s research tackles reliability and efficiency in agentic LLM workflows. In her recent work Sherlock 📷, informed verification and speculative execution deliver an 18.3% accuracy gain and up to a 48.7% latency reduction. Worth a read! arxiv.org/abs/2511.00330

Sherlock: Reliable and Efficient Agentic Workflow Execution

With the increasing adoption of large language models (LLM), agentic workflows, which compose multiple LLM calls with tools, retrieval, and reasoning steps, are increasingly replacing traditional...

arxiv.org

Computer Science at UT Austin retweeted

VITA Group @VITAGroupUT

May 2

Huge congrats to our very own @j777ro on being named a 2026 @MLCommons ML & Systems Rising Star, one of just 39 awardees this year! 📷 She’ll be heading to the workshop hosted by @AMD in Santa Clara this July. So well deserved! mlcommons.org/about-us/progr…

Rising Stars Program - MLCommons

MLCommons Systems and ML Rising Stars celebrate up-and-coming researchers. Our goal is to improve AI systems through our collective engineering efforts with industry and academia, by continually...

mlcommons.org

Computer Science at UT Austin @UTCompSci

May 18

RT @UT_GameDev: Digital Demo Day brought our community together one more time this year to celebrate the creativity and hard work of studen…

Computer Science at UT Austin retweeted

Isil Dillig @IsilDillig

May 11

📢 I’m looking to hire a postdoc to work closely with me and my research group at UT Austin on exciting topics in core PL/FM, as well as applications of PL/FM ideas to other areas. If you are interested, or know someone who might be a great fit, please DM me!

Computer Science at UT Austin retweeted

Joykirat @joykiratsingh

May 13

🚨 Excited to announce Agent-BRACE! LLM agents in long-horizon POMDPs either blow up their context with raw history or summarize it, discarding uncertainty by collapsing belief into a point estimate. Agent-BRACE decouples the agent into belief state and policy models, jointly trained via RL. Key takeaways:
1️⃣ 🎯 The belief state model produces a structured approximation of the belief distribution as a set of atomic natural-language claims with ordinal verbalized certainty labels ranging from certain to unknown. The policy conditions on this compact belief rather than the full history.
2️⃣ 📈 Outperforms strong RL baselines on long-horizon partially observable embodied language environments while maintaining a near-constant context window independent of episode length.
3️⃣ 🔄 The learned belief becomes increasingly calibrated as evidence accumulates, and epistemic belief decreases over time: the proportion of claims that the agent has the strongest level of belief in grows from 21% → 52% over an episode.
👇🧵

Computer Science at UT Austin retweeted

Caroline Wang @CarolineWang98

Feb 13

[1/n] Just wrapped up 7 months interning with @pcastr at DeepMind and I'm so excited to share our work: arxiv.org/abs/2602.10324. TLDR: We used LLM-powered program synthesis to automatically model and discover differences between human and LLM strategic behavior

Computer Science at UT Austin retweeted

UT Austin @UTAustin

May 10

Congratulations to the Class of 2026! 🎓🧡 We can't wait to see how you change the world 🤘
