🚀 RAG systems excel at answering questions, but what happens when the corpus has NO answer, or when complex multi-hop reasoning is required?
Moreover, how can we build benchmarks to stress-test RAG systems in such settings in a realistic way?
See our new preprint to find out! 🧵👇
🎉 Delighted to announce that MetaFaith has been accepted to #EMNLP2025 Main! In this work we systematically study how well LLMs can express their internal uncertainty in words, offering a metacognition-inspired way to improve this ability 🧠✨
Check out more details below!👇
I will be presenting our work 𝗠𝗗𝗖𝘂𝗿𝗲 at #ACL2025NLP in Vienna this week! 🇦🇹
Come by if you’re interested in multi-doc reasoning and/or scalable creation of high-quality post-training data!
📍 Poster Session 4 @ Hall 4/5
🗓️ Wed, July 30 | 11-12:30
🔗 aclanthology.org/2025.acl-lo…
🔥Thrilled to introduce MDCure: A Scalable Pipeline for Multi-Document Instruction-Following 🔥
How can we systematically and scalably improve LLMs' ability to handle complex multi-document tasks?
Check out our new preprint to find out!
Details in 🧵 (1/n):
If you use "AI agents" (LLMs that call tools), you need to be aware of the Lethal Trifecta.
Any time you combine access to private data with exposure to untrusted content and the ability to externally communicate, an attacker can trick the system into stealing your data!
[Image: the lethal trifecta, a diagram with three circles: access to private data, ability to externally communicate, exposure to untrusted content]
(14/n) If you're interested in LLM trustworthiness, uncertainty quantification, human-AI collaboration, or even metacognition — give our paper a read and check out MetaFaith! We'd love feedback or questions.
📄 Paper: arxiv.org/abs/2505.24858
🔗 GitHub: github.com/yale-nlp/MetaFait…
🔥 Excited to share MetaFaith: Understanding and Improving Faithful Natural Language Uncertainty Expression in LLMs🔥
How can we make LLMs talk about uncertainty in a way that truly reflects what they internally "know"?
Check out our new preprint to find out!
Details in 🧵(1/n):
A ton of great progress in AI and Robotics this week.
I summarized everything from Unitree, OpenAI, Mirror Me, Microsoft, Physical Intelligence, Luma, Sakana AI and more.
Here's everything you need to know and how to make sense out of it:
📢The problem in model alignment no one talks about — the need for preference data, which costs $$$ and time!
Enter Kahneman-Tversky Optimization (KTO), which matches or exceeds DPO without paired preferences.
And with it, the largest-ever suite of feedback-aligned LLMs. 🧵
Excellent Santa Fe Institute talk by Josh Tenenbaum on the limitations of GPT-4 and better approaches based on probabilistic world models and human-intuitive AI: buff.ly/437slJ8
Only 17 days left in the @SecurityBSides @BSidesLV CFP: get your talk proposals in!
Open tracks:
Breaking Ground
Common Ground
Ground Floor
Ground Truth
Hire Ground
@iamthecavalry
PasswordsCon
Training Ground
Underground
What're all these tracks @Daemontamer? Glad you asked! 🧵
I wanted to give a primer on software architecture at the Harvard systems reading group, so we're studying @lichess, one of the largest chess websites… and I made a code scavenger hunt!
notes.ekzhang.com/events/hsr…
Stop what you’re doing and read this entire post on the GPT-4 code interpreter plugin. This is completely bonkers and is going to change everything. andrewmayneblog.wordpress.co…
Congrats to everyone involved in one of the big announcements today! 🎉
To those who are not: remember that your value is NOT whether you find yourself in one of the few in-groups. A healthy scientific field is not supposed to make you feel left out. You are doing great 🌞
Imagine the power unleashed if we have GPT-3 inference-on-a-chip. But we won't get there based on digital circuits. The next great race may be for low-power analog chip designs that can incorporate weights directly into the inference circuit.
buff.ly/3YOufLM
Reinforcement learning with human feedback is a promising technique that lets machines learn to understand human values. But let's not forget about its social & ethical implications. Read more: buff.ly/3L1iIpf
Original paper: buff.ly/3SU0Z4Z #AI #ethics #rlhf