Researcher in Informatics at the University of Edinburgh. Mainly working on machine translation.

Joined April 2010
Barry Haddow retweeted
📣 Excited to share our latest research: "Demystifying Multilingual Chain-of-Thought in Process Reward Modeling" where we explore process reward models beyond English to improve multi-step reasoning in 11 languages! Link: arxiv.org/abs/2502.12663 Code: github.com/weixuan-wang123/M…
Barry Haddow retweeted
17 Mar 2025
New paper on the HPLT v2 dataset making-of:
- pipeline documentation and code
- extensive analysis of the quality and characteristics
- evaluation of the performance of language models and machine translation systems trained on it
🤓 Happy reading! arxiv.org/pdf/2503.10267
Barry Haddow retweeted
28 Feb 2025
We are happy to announce the second release of HPLT bilingual datasets:
- 50 English-centric language pairs = 380M parallel sentences (HPLT) 🤩
- 1,275 non-English-centric language pairs = 16.7B parallel sentences (MultiHPLT) 😮
Available at the HPLT dataset catalogue and OPUS.
1 Feb 2025
MT Summit 2025 - deadline extended! The deadline for all papers (technical/user/translator/products/projects) has been extended to February 10th. MT Summit will be in Geneva, June 23–27. mtsummit2025.unige.ch/index.…

26 Jan 2025
EAMT best thesis award - closes on January 31st. Completed an MT-related PhD in 2024 in Europe, Africa or the Middle East? Then why not submit your thesis? eamt.org/2024/11/28/the-anth…

Barry Haddow retweeted
8 Jan 2025
🥳 Amazing performance of the #HPLT v2 dataset! The HuggingFace multilingual evaluation and the HPLT English internal evaluation show that HPLT v2 is one of the best datasets to train LLMs. Downloads and more at either HPLT ➡️ hplt-project.org/hplt-v2-dat… or HF ➡️ huggingface.co/datasets/HPLT…
2 Dec 2024
Very exciting to see the 9B EuroLLM model released - made in Europe and supporting all official EU languages. More and bigger models to come ...
Today we release EuroLLM-9B: the best EU-made multilingual LLM of its size! Check the blog post for more info and results: huggingface.co/blog/eurollm-…. Stay tuned for the technical report and bigger and more powerful models!
2 Dec 2024
EAMT Best thesis award - now open! Have you defended an MT-related thesis in 2024, in EMEA? Then why not submit to the prestigious EAMT BTA? eamt.org/2024/11/28/the-anth… . Deadline: 2025-01-31

Barry Haddow retweeted
2 Dec 2024
Join us for a new edition of the Winter School! "Pretraining Data Quality 🧐 and Multilingual Evaluation of LLMs 👀" 🪂 Feb. 3–5, 2025, Norway. More info and registration: wiki.nlpl.eu/Community/train… Jointly organised by @hplt_eu and the Nordic Language Processing Laboratory (NLPL).
Barry Haddow retweeted
The 18th MT Marathon will be held in beautiful Helsinki at the end of August 2025. We invite you to a week-long gathering of researchers, developers and students with lectures, labs and hacking projects. More information will come - stay tuned!
17 Nov 2024
*Update:* Deadline for EAMT project grants is extended by 1 week - to November 25th. Details here: eamt.org/2024/10/21/eamt-spo…

13 Nov 2024
Only 5 days left to apply for EAMT project grants
22 Oct 2024
Have an MT-related idea and looking for funding? EAMT are offering project grants of up to €10k (main track) or €4k (students). Apply by Nov. 18th. eamt.org/2024/10/21/eamt-spo…
Barry Haddow retweeted
📢 MT Summit 2025: Calls for Papers, Workshops and Tutorials are out! You'll find all the details on our website: mtsummit2025.unige.ch/
Deadlines:
📆 WS & Tutorials = 25 Nov 2024
📆 CfP = 27 Jan 2025
#machinetranslation #users #researchers #translators #AI #LLM
2 Oct 2024
New HPLT data release is out!
2 Oct 2024
🚀 INTRODUCING THE LATEST HPLT MONOLINGUAL DATASETS! TL;DR: 🔍 4.5 PB of web crawls 📄 21 billion documents 💝 careful extraction, dedup, annotation and cleaning 💥 193 languages! Explore and download the new HPLT Monolingual Datasets NOW! hplt-project.org/datasets/v2… #HPLT
Barry Haddow retweeted
Have you recently used COMET for MT evaluation? ☄️
- Did you report the specific model? ≥12% of papers don't!
- Did you report the package version? Makes a difference.
- `pip install sacrecomet` generates a nice version model signature.
Not too late for WMT/EMNLP camera-ready!
Barry Haddow retweeted
27 Sep 2024
❗Are We Truly Achieving Multilingualism in LLMs or Just Relying on Translation?❗ Need multilingual instruction data and benchmarks? Just translate from English. LLM multilingualism can be easily solved! If you agree, check out our #EMNLP 2024 paper which says this is sub-optimal. arxiv.org/abs/2406.12822 🧵Below
Barry Haddow retweeted
Today we release the first EuroLLM paper and models: EuroLLM-1.7B and EuroLLM-1.7B-Instruct! The EuroLLM project will develop open-weight multilingual LLMs that understand and generate text in all official EU languages. Stay tuned for the bigger and stronger EuroLLMs (9B, 22B)!
Barry Haddow retweeted
16 Sep 2024
We know LLMs are poor at MT in low-resource languages (LRLs): curious how to adapt them to perform better? 🚀 Our new paper explores the interplay between scale (of MT data) and diversity (of tasks/langs) in instruction tuning in determining LLM-MT performance for LRLs💡 arxiv.org/abs/2408.12780