Joined January 2012
Raphaël Merx retweeted
Did you know? ❌77% of language models on @huggingface are not tagged for any language 📈For 95% of languages, most models are multilingual 🚨88% of models with tags are trained on English In a new blog post, @tylerachang and I dig into these trends and why they matter! 👇
In Vienna for ACL, presenting Tulun, a system for low-resource in-domain translation using LLMs. Working with 2 real use cases: medical translation into Tetun 🇹🇱 and disaster-relief speech translation in Bislama 🇻🇺
Replying to @ivrik
✨TULUN: Transparent and Adaptable Low-resource Machine Translation By: Raphael Merx, Hanna Suominen, Lois Yinghui Hong, Nick Thieberger, Trevor Cohn, Kat Vylomova Paper: aclanthology.org/2025.acl-de… Demo: bislama-trans.rapha.dev #ACL2025NLP #NLProc #ACL2025
Our paper on generating bilingual example sentences with LLMs won the best paper award at @altanlp! arxiv.org/abs/2410.03182 We work with French / Indonesian / Tetun, and find that annotators don't agree on what makes a "good example", but that LLMs can align with a specific annotator.
Raphaël Merx retweeted
Translation is a complex task involving pre-translation research and post-translation stages. Can #LLMs handle this process step-by-step, relying solely on their internal knowledge? ✨We show that decomposing the translation process significantly improves #Gemini translation quality of long-form texts across all #WMT24 languages! 📜arxiv.org/pdf/2409.06790
Life update: after 10 years in industry, I'm going back to school for a PhD at Uni. of Melbourne! Started last week, with lots of work to do, which I'm really looking forward to!
✨A very warm welcome to @RaphaelMerx who is joining #UniMelb #NLProc group to work on enhancing Machine Translation for Medical Education in Timor-Leste! Rapha recently presented his first paper, on MT for the Mambai language: aclanthology.org/2024.eurali… !
Coming to Google Translate: Tetun, Tok Pisin, Balinese, Fijian, Acehnese, and many other languages of SEA & the Pacific! Let's test quality once they're released, but this is potentially a small revolution in MT for our region: blog.google/products/transla…
First paper published! We create a first corpus for the Mambai language (from Timor-Leste), and try to teach an LLM to translate into Mambai using examples selected to match the source sentence, all drawn from a single language manual. arxiv.org/abs/2404.04809
Anyway, this goes to show the importance of working with native speakers in low-resource NLP, especially in the LLM era, when benchmarks are less trustworthy than ever!
Also, the #EURALI folks are very cool and I'm looking forward to more work with them! x.com/sina_ahm/status/179439… #nlproc #lreccoling2024

25 May 2024
Thanks all for attending #EURALI today! Let’s hope that such research communities along with language enthusiasts and linguists change the landscape of #nlproc for under-resourced languages in the near future! 🙂 #lreccoling2024
High internet use but low social media use, hats off to 🇩🇪! Survey by @pewresearch pewresearch.org/short-reads/…
Raphaël Merx retweeted
This #InternationalWomensDay we pay tribute to the wonder women of Catalpa, who are not just smart and skilled but also fearless, funny and feisty! We spoke to 7 women about their jobs, their career paths and their wishes for women and girls on this day. catalpa.io/blog/celebrating-…
🇮🇩 ranks very high on economic optimism. Hope it stays this way #pemilu2024
Woke up to this series of short videos on Papuan languages, very nice: youtube.com/watch?v=1l5jJE-1… Lots of trivia, like Bukiyip having two counting systems: one in base 3 (for coconuts and fish) and one in base 4 (for betel nuts and bananas).