Dan Deutsch

Tweets

Pinned Tweet

Dan Deutsch @_danieldeutsch

10 Dec 2023

Excited to receive an Outstanding Paper award for this work at @emnlpmeeting! Thanks to my co-authors George Foster and @markuseful! Updated version available here: aclanthology.org/2023.emnlp-…

Ties Matter: Meta-Evaluating Modern Metrics with Pairwise Accuracy and Tie Calibration

Daniel Deutsch, George Foster, Markus Freitag. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023.

Dan Deutsch @_danieldeutsch

24 May 2023

LLM-based metrics like GEMBA predict many ties, but the way that ties should be handled in Kendall’s tau for meta-evaluating metrics has been a longstanding issue. We propose an update to the meta-evaluation methodology to handle ties. arxiv.org/pdf/2305.14324.pdf

Dan Deutsch retweeted

Vilém Zouhar @zouharvi

Mar 12

Machine translation is tough to evaluate, partly because most of what you throw at it is too easy. That doesn't at all mean that translation is solved; we're just not doing a good job of finding interesting inputs.

Dan Deutsch retweeted

John Hewitt @johnhewtt

19 Nov 2025

Come do a PhD with me at Columbia! My lab tackles basic problems in alignment, interpretability, safety, and capabilities of language systems. If you love adventuring in model internals and behaviors---to understand and improve---let's do it together! pic: a run in Central Park

Dan Deutsch retweeted

Eleftheria Briakou @ebriakou

31 Oct 2025

🗺️ Are we making our #LLMs multilingual, or anglocentric? Much work brings languages closer to English, but that comes at the cost of crucial #cultural nuance. @h__j___han tackles this trade-off with surgical steering, adapting LLMs to cultural contexts at inference time.

HyoJung Han @h__j___han

31 Oct 2025

Lots of work on cross-lingual alignment encourages multilingual LLMs to generalize knowledge across languages. But this push for uniformity creates a tension: what happens to knowledge that should remain local? We look into this trade-off of transfer and cultural erasure:🧵

Dan Deutsch retweeted

Markus Freitag @markuseful

27 Jul 2025

Our Google Translate team is bringing a strong presence to #ACL2025 in Vienna this week! 🇦🇹 My group is excited to present several of our latest papers. 👇 Don't miss them!

Dan Deutsch retweeted

Markus Freitag @markuseful

19 Feb 2025

Two new datasets from Google Translate targeting high- and low-resource languages! WMT24++: adds 46 new en->xx languages to WMT24, bringing the total to 55. SMOL: 6M tokens for 115 very low-resource languages. WMT24++: huggingface.co/datasets/goog… SMOL: huggingface.co/datasets/goog…

google/wmt24pp · Datasets at Hugging Face

Dan Deutsch retweeted

iseeaswell꩜bʂky @iseeaswell

19 Feb 2025

😼 SMOL DATA ALERT! 😼 Announcing SMOL, a professionally translated dataset for 115 very low-resource languages! Paper: arxiv.org/pdf/2502.12301 Huggingface: huggingface.co/datasets/goog…

Dan Deutsch @_danieldeutsch

19 Feb 2025

🚨 New machine translation dataset alert! 🚨 We expanded the language coverage of WMT24 from 9 to 55 en->xx language pairs by collecting new reference translations for 46 languages in a dataset called WMT24++. Paper: arxiv.org/abs/2502.12404v1 Data: huggingface.co/datasets/goog…

Dan Deutsch @_danieldeutsch

19 Feb 2025

This project was a highly collaborative effort with many people contributing translations, evaluations, analyses, etc., so I want to thank all of my co-authors! @ebriakou @iseeaswell @marafinkels Rebecca Galor @JurikJuraska @gezakovacs Alison Lui @RicardoRei7 @jasonriesa

Dan Deutsch @_danieldeutsch

19 Feb 2025

@shrutirij @prk_riley @esalesk @FirasTr88060642 Stephanie Winkler @BZhangGo @markuseful #nlproc #nlp #ai

Dan Deutsch retweeted

Yusuf Kocyigit @mykocyigit

6 Feb 2025

Thrilled to share our latest findings on data contamination, from my internship at @Google! We trained almost 90 models at the 1B and 8B scales with various contamination types, using machine translation as our task, and analyzed the impact of contamination. arxiv.org/abs/2501.18771

Overestimation in LLM Evaluation: A Controlled Large-Scale Study...

Data contamination -- the accidental consumption of evaluation examples within the pre-training data -- can undermine the validity of evaluation benchmarks. In this paper, we present a rigorous...

Dan Deutsch retweeted

Jurik Juraska @JurikJuraska

12 Dec 2024

🚀 We have just released bfloat16 variants of all 3 MetricX-24 models, offering nearly identical performance to their float32 counterparts, but with a 50% smaller memory footprint. ✨ We hope this makes the XL and XXL models more accessible! 🔗 GitHub: github.com/google-research/m…

GitHub - google-research/metricx

Jurik Juraska @JurikJuraska

3 Dec 2024

🌐 Meet MetricX-24, our SOTA machine translation evaluation metric and a successor to the successful MetricX-23. 🚀 Now open-source in PyTorch/Transformers! 🎉 Ready to take this top performer in the WMT24 Metrics Shared Task for a spin? 🔗 Code: github.com/google-research/m…

Dan Deutsch retweeted

Jurik Juraska @JurikJuraska

3 Dec 2024

GitHub - google-research/metricx

Dan Deutsch @_danieldeutsch

26 Nov 2024

Super simple and effective way of significantly increasing the performance of your evaluation metric!

Mara Finkelstein @marafinkels

26 Nov 2024

LLMs are typically evaluated w/ automatic metrics on standard test sets, but metrics and test sets are developed independently. This raises a crucial question: Can we design automatic metrics specifically to excel on the test sets we prioritize? Answer: Yes! arxiv.org/abs/2411.15387

Dan Deutsch @_danieldeutsch

12 Nov 2024

New application link! google.com/about/careers/app… I am at EMNLP/WMT this week. Please come find me if you want to learn more about this role!

Dan Deutsch @_danieldeutsch

18 Oct 2024

Interested in doing research on Google Translate and Gemini? Good news! I’m hiring for full-time roles on the Google Translate Research Team! Apply here: google.com/about/careers/app…

Dan Deutsch @_danieldeutsch

18 Oct 2024

Interested in doing research on Google Translate and Gemini? Good news! I’m hiring for full-time roles on the Google Translate Research Team! Apply here: google.com/about/careers/app…

Dan Deutsch @_danieldeutsch

18 Oct 2024

Running large-scale experiments for building SOTA MT models arxiv.org/pdf/2309.10966

Dan Deutsch @_danieldeutsch

18 Oct 2024

Please get in contact with me if you have any questions!