1/ Thrilled to introduce T³: a corpus for RAG over reasoning tasks, built from thinking traces.
We show that, surprisingly, RAG can improve reasoning, given the right corpus.
RAG with Transformed Thinking Traces (T³) improves by up to 43.9% on AIME 2025-2026.
🔗 arxiv.org/abs/2605.03344 🧵
🚀 Beyond excited to share that we're releasing LOTUSPlan, a new API & optimizer for higher-performance LLM-powered data processing, from our team at Berkeley & Stanford.
LOTUS now lets you write your LLM-based queries and optimize them for up to 2.4× lower cost and 4.6× higher accuracy on tasks like agent trace analysis, LLM-judge evals, RAG, document extraction, and deep research.
✨ Check out our new blog: liana313.github.io/blog/lotu…
🧵
The web was never meant to be flattened into text.
Yet most web RAG systems start by parsing HTML, a complex and lossy process.
🔥 Introducing PixelRAG: the first RAG system that retrieves and reads 30M web pages as pixels.
Instead of extracting text, PixelRAG retrieves screenshots and lets a VLM read them directly.
PixelRAG not only preserves visual information, but also outperforms text-based RAG on text-only QA benchmarks by 18.1%.
Why?
(1) HTML-to-text conversion often discards layout, structure, tables, and other useful signals.
(2) We continued pretraining a VLM on web page screenshots and turned it into a surprisingly strong visual retriever.
(3) Recent VLMs are remarkably good at understanding web pages, often with better accuracy and token efficiency than text-only pipelines.
Takeaway: HTML parsing may be one of the biggest self-inflicted bottlenecks in web RAG.
Demo below 👇
Code: github.com/StarTrail-org/Pix…
Paper: github.com/StarTrail-org/Pix…
Playground: pixelrag.ai/
Search is becoming increasingly agentic: systems plan, search, synthesize, cite, and revise. But how should we study and evaluate these systems? 🤔
In TREC RAG 2026, we want to build a reusable collection for this new reality.
We’ve aligned on 4 core directions 🧵👇
Grateful that my PhD thesis was recognized as one of the top dissertations in the 2026 Faculty of Mathematics Doctoral Prize at @UWaterloo! 🎉
And it is always especially nice to hear kind words from your PhD supervisor @claclarke . I guess that feeling never really goes away, even after you graduate. 😊
uwaterloo.ca/computer-scienc…
Happy to share that our @icmlconf paper "Measuring Agents in Production" received an Oral Presentation spot! 🌟
arxiv.org/abs/2512.04123
See you all in Seoul! 🇰🇷
Excited to share: MAP has been accepted as 🌟 ICML Spotlight 🌟
We hope MAP can provide data-driven insights that help the community work on various under-explored research directions around agent systems!
Huge thanks & congrats to my amazing co-authors. See you all in Seoul! 🫡
Excited to share that MAP has been selected for ✨ICML Oral✨
We look forward to sharing the insights in the paper with the community.
And much appreciation to everyone who participated in our study ❤️ MAP wouldn't be possible without your contributions to open science.
5/ Interestingly, RAG over T³ can be cheaper than No RAG.
Retrieved reasoning shifts work from expensive output tokens to cheap input tokens — the model thinks less and reads more.
Think less. Retrieve thinking. 🧠
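A back-of-the-envelope sketch of why this can happen. All token counts and prices below are made-up illustrations (assumptions, not numbers from the paper); the only point is that output tokens are typically priced higher than input tokens, so trading some of one for the other can lower the bill:

```python
def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Dollar cost of one request; prices are per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Hypothetical prices: output tokens cost 5x as much as input tokens.
INPUT_PRICE = 1.0   # $ per 1M input tokens (assumed)
OUTPUT_PRICE = 5.0  # $ per 1M output tokens (assumed)

# No RAG: short prompt, long chain of thought (assumed token counts).
no_rag = request_cost(500, 20_000, INPUT_PRICE, OUTPUT_PRICE)

# RAG: 8,000 extra input tokens of retrieved traces, shorter reasoning (assumed).
with_rag = request_cost(8_500, 12_000, INPUT_PRICE, OUTPUT_PRICE)

print(f"No RAG: ${no_rag:.4f}  RAG: ${with_rag:.4f}")
```

Under these assumed numbers the RAG request is cheaper even though it processes more tokens overall, because the extra tokens land on the cheaper input side.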
In 2007, Mark @IR_oldie and I launched EVIA, a workshop on information access evaluation methods co-located with #ntcir6.
Now in 2026, Negar @NegarEmpr and I are serving as PC co-chairs of #evia2026, which will take place on Day 3 (Dec 10) of #ntcir19.
CFP in preparation.
Call for Papers
The 12th International Workshop on Evaluating Information Access (EVIA 2026)
Submission deadline: September 1, 2026 (AoE)
Workshop date: December 10, 2026 (Japan Time)
Venue: National Institute of Informatics, Tokyo, Japan.
#ntcir #ntcir19 #evia2026
We set out to build a better retriever, so we looked for the hardest IR benchmarks.
For each, we asked how much headroom remained by running oracle reranking with a frontier LLM. Most had little room left!
So we built OBLIQ-Bench to study much harder search queries than before.
Today, we’re releasing Continual Learning Bench 1.0: the first realistic benchmark for measuring how AI systems can improve in online settings.
Benchmarks today assume models are stateless. Each example is independent, and once a system finishes a task, it moves on as if nothing happened.
But deployed AI systems should learn from experience. We tested 10 frontier systems against novel, expert-validated tasks and found there’s still plenty of headroom for learning. (1/n)
Congratulations to the team on the MAP paper being accepted as an ICML Spotlight! A key takeaway from this work is that reliability remains one of the central challenges for production agent systems. Simple yet effective methods continue to dominate in these agent systems for…
So excited to share that my first ever @icmlconf paper has been accepted as a Spotlight! ✨
Grateful, happy, and incredibly excited about this work! See you all in Seoul!🇰🇷