Joined May 2020
5 Jul 2025
In a world of geopolitical conflicts, how can AI help us navigate? Our #ACL2025-F work studies RAG robustness across 49 languages. TL;DR: 📈 boost robustness w/ multilingual RAG, 🤔 take care w/ low-resource citations 📜arxiv.org/abs/2410.01171 🤗huggingface.co/datasets/bord… 1/4 🧵
28 Jul 2025
I'm in Vienna this week to present our poster on the robustness of RAG systems to multilingual contexts at #ACL2025NLP! 🗓️ Poster Session | Wednesday, July 30, 16:00 - 17:30 📍 Hall 4/5 @aclmeeting
5 Jul 2025
We study cross-lingual robustness over 4 LLMs and 2 IR models. We find A) multilingual RAG performs best; B) LLMs’ citations vary widely across langs. Our further experiments investigate aspects of cross-lingual RAG from IR to LLM explanations. 3/4 🧵
5 Jul 2025
This is the final paper of my PhD! Thanks to my many @upennnlp collaborators: @samarhdr, Chris, and the 7 wonderful students who I was fortunate to mentor. Please look out for our poster at ACL 2025 in Vienna. 4/4 🧵
Bryan Li retweeted
🚀 How well can LLMs know you and personalize their responses? Turns out, not so much! Introducing the PersonaMem Benchmark -- 👩🏻‍💻Evaluates LLMs' ability to understand evolving personas from 180 multi-session user-chatbot conversation histories 🎯Latest models (GPT-4.1, GPT-4.5, o4-mini, Llama-4, Gemini 2.0, DeepSeek-R1, Claude-3.7) all struggle with personalization! 🎨7 personalization skills tested in 15 scenarios 🌟Realistic long-context evaluation up to 1M tokens 👇 Check out what we discovered… (1/6)
11 Mar 2025
Externally retrieving knowledge empowers LLMs for domain-adapted MT ⚖️🩺. But how is knowledge best represented, and how viable is generating it from an LLM itself? Our @GoogleAI paper investigates these questions through a careful experimental setup 📜. arxiv.org/abs/2503.05010

11 Mar 2025
TL;DR: translation pairs > bilingual terminologies, and generation especially boosts translations for small LLMs. Our ablations highlight the need for more challenging domain-adapted MT datasets with modern LLMs. Thanks to collaborators Jiaming, @ebriakou & @ColinCherry!
Bryan Li retweeted
24 Feb 2025
We share Code-Guided Synthetic Data Generation: using LLM-generated code to create multimodal datasets for text-rich images, such as charts📊, documents📄, etc., to enhance Vision-Language Models. Website: yueyang1996.github.io/cosyn/ Dataset: huggingface.co/datasets/alle… Paper: arxiv.org/pdf/2502.14846 Code: github.com/allenai/pixmo-doc…
Bryan Li retweeted
🚨 LLMs must grasp implied language to reason about emotions, social cues, etc. Our @GoogleDeepMind paper presents the Implied NLI dataset. Targeting social norms 🌎 and conversational dynamics 💬, we enhance LLM understanding of real-world implication! arxiv.org/abs/2501.07719
3 Oct 2024
RAG enables LLMs to access external info 📖. But when this info is in multiple languages 🌐, can LLMs reconcile differing viewpoints 🧐? We introduce BordIRlines, a dataset to study the robustness of cross-lingual RAG. 📃arxiv.org/abs/2410.01171 🗃️ huggingface.co/datasets/bord… 1/4 🧵
3 Oct 2024
Using cross-lingually aligned queries, we analyze responses in a RAG setting. Responses can be "flipped" by varying passages' linguistic composition. We thus find these systems to be far from cross-lingually robust, as certain viewpoints can be amplified over others. 3/4 🧵
3 Oct 2024
We'll be presenting this at the NLP for Wikipedia workshop @emnlpmeeting. This is ongoing work, and we'd love to hear feedback from the community! A shout-out to my collaborators Fiona and Adwait for their amazing efforts on their first paper, and to @samarhdr and Chris. 4/4 🧵
27 Jun 2024
Do LLMs' reasoning abilities come from training on code🤔? Many think so, but how does this hold across languages🌐? We study the interplay of code and reasoning in our recent work (#acl2024). 📃arxiv.org/abs/2403.02567 🗃️github.com/amazon-science/xs… 1/6 🧵
27 Jun 2024
Results on BLOOM(Z) show that both techniques in tandem supercharge LLMs' complex reasoning across languages. Also, results on GPT-3 show that our code prompt format alone works well for API-based LLMs. 5/6 🧵
27 Jun 2024
Check out our paper for more details and results, and we invite you to download and work with our xSTREET dataset! A huge thanks to my @AmazonScience collaborators: Tamer, @dbonadim, @nik0spapp, & Saab ~ 6/6 🧵