Excited to present our work at #LREC2026 in Palma! 🇪🇸
📄 A Large and Balanced Multi-Domain Arabic Corpus Annotated for Morphology, Syntax, and Readability
We introduce BAREC-10M, a major expansion of the Balanced Arabic Readability Evaluation Corpus from 1M to 10M words, with rich linguistic annotations and broad multi-domain coverage.
Highlights:
🔹 45 sub-corpora across diverse domains and genres
🔹 Morphological, syntactic, and readability annotations
🔹 Coverage of news, literature, educational and children’s texts, religious discourse, and more
🔹 Balanced resource for studying Arabic variation, style, and complexity
We hope BAREC-10M will support future research in Arabic NLP, readability, education, and linguistic analysis.
Resource:
barec.camel-lab.com/
#LREC2026 #ArabicNLP #ComputationalLinguistics #CorpusLinguistics #NLP @HanadaEducation @CamelNlp