Research Scientist at Google Translate working on text generation evaluation

Joined September 2012
Pinned Tweet
Excited to receive an Outstanding Paper award for this work at @emnlpmeeting! Thanks to my co-authors George Foster and @markuseful! Updated version available here: aclanthology.org/2023.emnlp-…
LLM-based metrics like GEMBA predict many ties, but the way that ties should be handled in Kendall’s tau for meta-evaluating metrics has been a longstanding issue. We propose an update to the meta-evaluation methodology to handle ties. arxiv.org/pdf/2305.14324.pdf
Dan Deutsch retweeted
Machine translation is tough to evaluate, partly because most of what you throw at it is too easy. That doesn't at all mean that translation is solved; we're just not doing a good job finding interesting inputs.
Dan Deutsch retweeted
19 Nov 2025
Come do a PhD with me at Columbia! My lab tackles basic problems in alignment, interpretability, safety, and capabilities of language systems. If you love adventuring in model internals and behaviors---to understand and improve---let's do it together! pic: a run in central park
Dan Deutsch retweeted
🗺️ Are we making our #LLMs multilingual, or anglocentric? Much work brings languages closer to English, but that comes at the cost of crucial #cultural nuance. @h__j___han tackles this trade-off with surgical steering, adapting LLMs to cultural contexts at inference time.
Lots of work on cross-lingual alignment encourages multilingual LLMs to generalize knowledge across languages. But this push for uniformity creates a tension: what happens to knowledge that should remain local? We look into this trade-off of transfer and cultural erasure:🧵
Dan Deutsch retweeted
Our Google Translate team is bringing a strong presence to #ACL2025 in Vienna this week! 🇦🇹 My group is excited to present several of our latest papers. 👇 Don't miss them!
Dan Deutsch retweeted
Two new datasets from Google Translate targeting high- and low-resource languages! WMT24++: adds 46 new en->xx languages to WMT24, bringing the total to 55. SMOL: 6M tokens for 115 very low-resource languages. WMT24++: huggingface.co/datasets/goog… SMOL: huggingface.co/datasets/goog…
Dan Deutsch retweeted
😼SMOL DATA ALERT! 😼Announcing SMOL, a professionally-translated dataset for 115 very low-resource languages! Paper: arxiv.org/pdf/2502.12301 Huggingface: huggingface.co/datasets/goog…
🚨New machine translation dataset alert! 🚨We expanded the language coverage of WMT24 from 9 to 55 en->xx language pairs by collecting new reference translations for 46 languages in a dataset called WMT24++. Paper: arxiv.org/abs/2502.12404v1 Data: huggingface.co/datasets/goog…
This project was a highly collaborative effort with many people contributing translations, evaluations, analyses, etc., so I want to thank all of my co-authors! @ebriakou @iseeaswell @marafinkels Rebecca Galor @JurikJuraska @gezakovacs Alison Lui @RicardoRei7 @jasonriesa
Dan Deutsch retweeted
Thrilled to share our latest findings on data contamination, from my internship at @Google! We trained almost 90 models at the 1B and 8B scales with various contamination types, using machine translation as our task, and analyzed the impact of contamination. arxiv.org/abs/2501.18771
Dan Deutsch retweeted
🚀 We have just released bfloat16 variants of all 3 MetricX-24 models, offering nearly identical performance to their float32 counterparts, but with a 50% smaller memory footprint. ✨ We hope this makes the XL and XXL models more accessible! 🔗 GitHub: github.com/google-research/m…
🌐 Meet MetricX-24, our SOTA machine translation evaluation metric and a successor to the successful MetricX-23. 🚀 Now open-source in PyTorch/Transformers! 🎉 Ready to take this top performer in the WMT24 Metrics Shared Task for a spin? 🔗 Code: github.com/google-research/m…
Super simple and effective way of significantly increasing the performance of your evaluation metric!
LLMs are typically evaluated w/ automatic metrics on standard test sets, but metrics and test sets are developed independently. This raises a crucial question: Can we design automatic metrics specifically to excel on the test sets we prioritize? Answer: Yes! arxiv.org/abs/2411.15387
New application link! google.com/about/careers/app… I am at EMNLP/WMT this week. Please come find me if you want to learn more about this role!

Interested in doing research on Google Translate and Gemini? Good news! I’m hiring for full-time roles on the Google Translate Research Team! Apply here: google.com/about/careers/app…
Running large-scale experiments for building SOTA MT models arxiv.org/pdf/2309.10966
Please get in contact with me if you have any questions!