The PILAR dataset (github.com/transducens/PILAR), which our @transducens research group at @UA_Universidad @UA_Universitat collected and released in 2024, was included in the training data of the ALIA machine translation models (langtech-bsc.gitbook.io/alia…) for Aragonese, Aranese, and Asturian, developed by @BSC_CNS and presented last week by the Spanish Government: alia.gob.es/ 🥳
According to their model cards, the translation models were developed by @BSC_CNS as part of the Shared Task on Translation into Low-Resource Languages of Spain, which we successfully organized last year: statmt.org/wmt24/romance-tas…
We have also contributed our own multilingual model, IBRO, designed for translation from Spanish into several Romance languages, including Aragonese, Aranese, Asturian, Catalan (Valencian included), and Galician, as well as from Catalan into Aranese. The 1.3B IBRO model (huggingface.co/Transducens/I…) was publicly funded through the LiLowLa project: transducens.dlsi.ua.es/lilow…
Although the resources invested are significant, greater support from public administrations is essential. ✊🔥 Some measures, such as releasing certain public linguistic resources under open licenses, are not especially costly but could have a major impact.