Collect, validate, and benchmark speech data for African languages to build better voice AI. Public Telegram: t.me/dialectralab

Joined October 2022
56 Photos and videos
Pinned Tweet
Today, we’re excited to officially launch our Yoruba speech data campaign on Dialectra.

Over the past two months, we’ve seen contributors across Hausa, Kanuri, and Fulfulde help us build one of the fastest-growing African speech data communities. Now it’s time to expand.

With Yoruba joining Dialectra, contributors can now participate in:
• Corpus script recordings
• Transcription tasks
• Live conversational speech through Dialect Connect

As always, every contribution goes through our transcription, annotation, standardization, and human verification pipeline before becoming training-ready datasets.

Yoruba is one of Africa’s most influential and widely spoken languages, yet high-quality conversational speech infrastructure for it remains limited. We want to help change that.

If you speak Yoruba, you can now join Dialectra, contribute your voice, and help shape the future of African speech AI while earning rewards for your contributions.
65
29
163
5,219
Dialectra retweeted
The biggest mistake many AI startups make is trying to build models too early without infrastructure discipline. Models can be retrained. High-quality African speech infrastructure is much harder to rebuild.
5
4
30
319
We're building AI that understands you in your language. Every recording you submit on Dialectra trains AI to recognize Hausa, Yoruba, Arabic, and more. You're not just a user. You're a builder. Join #Dialectra today: dialectra.io/auth?tab=signup…
12
11
38
572
A quick update on how things are moving at Dialectra.

We’ve officially crossed a few massive milestones:
• 1,166 total conversational calls
• 904 completed sessions
• 142.2 hours of live speech collected
• Active contributors spanning multiple African language communities

We’re also hitting some scaling pains. Some users are currently dealing with dropped calls and connection issues on Dialect Connect. We're already under the hood fixing the infrastructure to smooth this out.

The wild thing is, Dialect Connect started as a pure experiment to see how live conversation stacked up against traditional data transcription workflows. What happened next caught us completely off guard. People didn't just treat it as a data tool; they used it to genuinely connect with each other, have real conversations, and earn rewards naturally. It completely changed how we think about conversational AI infrastructure.

Building this as a bootstrapped founder has meant a lot of late nights, technical headaches, and raw uncertainty. But we didn't build Dialectra to catch a few months of hype and fade away.

We’re still early, still learning, and still fixing bugs daily. But the foundation is laid, we’re here to stay, and the real work begins now.
9
5
22
322
Got questions? Need help with tasks? Join the Dialectra Telegram community for fast support, updates, and discussions with fellow contributors. t.me/dialectralab
1
8
27
258
Dialectra retweeted
Social authentication is live. You can now sign up and sign in using your Gmail account. Next up: X and other social logins. Try it here: Dialectra.io
5
7
31
1,022
Today we officially launched the Dialectra referral campaign.

One thing we’ve learned over the last few months is that communities grow communities. Most of our strongest contributors came from other contributors inviting friends, classmates, family members, and language communities to participate. So we decided to build a proper system around it.

Invite a new contributor to Dialectra. Once they:
• sign up
• complete profile setup
• submit their first approved contribution
rewards unlock.

We’re building this because scaling African speech infrastructure requires real communities, not just platforms.
7
16
47
1,235
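The three referral conditions in the post above can be read as a simple all-of check. The following is a minimal sketch under that reading only; the `Referral` class, its field names, and `reward_unlocked` are hypothetical names chosen for illustration and are not taken from Dialectra's actual system.

```python
from dataclasses import dataclass


@dataclass
class Referral:
    """Hypothetical record of an invited contributor's progress."""
    signed_up: bool = False
    profile_complete: bool = False
    approved_contributions: int = 0  # contributions that passed review


def reward_unlocked(referral: Referral) -> bool:
    """Return True only when all three conditions from the post are met."""
    return (
        referral.signed_up
        and referral.profile_complete
        and referral.approved_contributions >= 1
    )


# A contributor who signed up and finished their profile but has no
# approved contribution yet does not unlock the reward.
pending = Referral(signed_up=True, profile_complete=True)
done = Referral(signed_up=True, profile_complete=True, approved_contributions=1)
print(reward_unlocked(pending), reward_unlocked(done))
```

Requiring an approved contribution, not merely a submitted one, is what ties the reward to data that has passed review.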
Good morning, contributors! ☀️ Have you submitted your tasks today? Ina kwana masu bada gudummawa! 🌍 Kun gabatar da ayyukanku na yau? Ẹ káàárọ̀, àwọn olùkópa! 💜 Ṣé ẹ ti fi iṣẹ́ yín ránṣẹ́ lónìí?
6
10
36
437
For complaints and issues, please join our Telegram group: t.me/dialectralab
1
6
22
726
Dialectra retweeted
Dialectra (@_dialectra) has officially launched its Yoruba speech data campaign. Following strong community participation across Hausa, Kanuri, and Fulfulde, Yoruba contributors can now record speech, transcribe audio, and join live conversations through Dialect Connect. If you speak Yoruba, join the movement. Contribute your voice, earn rewards, and help build the future of African speech AI. Get started today.
6
3
22
881
Dialectra retweeted
Any Yoruba speakers here? 👀 @_dialectra is looking for a native Yoruba speaker with good English skills to volunteer as a judge. Tag someone who'd be a great fit 🇳🇬
6
8
36
688
Dialectra retweeted
Submitted a proof of personhood for an accelerator while in a hospital bed… all for cloud credits to keep building as a bootstrapped founder. We really are building from everywhere 😄 It's been a really interesting journey with @_dialectra
39
34
226
3,070
Dialectra retweeted
Today I'm going to talk about @_dialectra. Not in English, but in Hausa. (Translated from Hausa below.)

As someone who uses AI almost every day and spends a lot of time exploring AI tools and projects, I noticed something interesting.

Most AI voice agents can speak Hausa, but if you look deeper, many of them do not understand how Hausa people actually speak. Kano Hausa is different. Katsina Hausa is different. Sokoto Hausa is different. Even words, proverbs, and pronunciation change from region to region.

This is where Dialectra stands apart. Instead of focusing only on building an AI that talks, they collect authentic voice data from real Hausa speakers. Not just people reading text. They collect how people speak in everyday life, with the pronunciation, proverbs, and dialect differences of different regions.

This matters because AI cannot understand what it has never learned. If the data it was trained on does not represent real Hausa speakers, then even a powerful model will make mistakes when it meets actual users.

What caught my attention most is that Dialectra is building a foundation, not just another voice AI app. Today we talk with @ElevenLabs, @Hailuo_AI, and other voice AI platforms. But have you ever thought about what would happen if major platforms like these got access to the high-quality Hausa data Dialectra is collecting? What would happen if AI could recognize Kano, Katsina, or Sokoto Hausa without getting confused? What would happen if AI could understand how Hausa people actually speak, not the way a textbook writes Hausa?

In my view, this is the main thing that makes Dialectra different. It is not just building AI. It is building the data that will help AI understand Hausa properly. And that could be a big step for Hausa and other African languages in the world of AI.
26
14
72
1,930
A few days ago, we launched Dialect Connect, a simple way for people to have real conversations while contributing to African speech datasets.

Here’s where things stand already:
• 896 total conversation requests
• 703 completed conversations
• 107.4 hours of conversational speech collected
• 12 pending
• 8 active
• 173 rejected

Alongside this, our corpus reading and transcription workflows have now crossed more than 300,000 voice samples collected from Hausa-speaking contributors across our platform.

What matters to us is not just collecting audio. The difficult part is what happens after collection. Every contribution inside Dialectra.io goes through a structured pipeline:
→ Transcription
→ Annotation
→ Standardization
→ Human verification
→ Approval

We built this because raw voice recordings alone are not enough to train reliable speech systems. Models need properly reviewed transcripts, dialect-aware normalization, quality checks, and consistent formatting before the data becomes useful for training.

This is where many African language datasets struggle. A lot of existing datasets are either scraped, weakly labeled, inconsistent, or missing conversational context entirely. We are trying to approach this differently.

Dialectra is focused on building speech datasets that reflect how people actually speak: accents, dialects, pauses, code-switching, natural conversations, and regional differences included.

For voice AI startups and model builders, this matters more than dataset size alone. Better infrastructure produces better models.

We’re still early, but it’s exciting seeing contributors across Hausa-speaking communities helping shape what this can become. More updates soon.
13
12
43
1,197
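The five-stage pipeline listed in the post above (Transcription → Annotation → Standardization → Human verification → Approval) can be pictured as an ordered series of gates that a sample must clear in sequence. The sketch below is only an illustration of that idea: `Stage`, `run_pipeline`, and the per-stage check functions are hypothetical, and the post does not describe how Dialectra implements its pipeline.

```python
from enum import Enum
from typing import Callable, Dict, Optional, Tuple


class Stage(Enum):
    """Pipeline stages in the order named in the post."""
    TRANSCRIPTION = "transcription"
    ANNOTATION = "annotation"
    STANDARDIZATION = "standardization"
    HUMAN_VERIFICATION = "human verification"
    APPROVAL = "approval"


Check = Callable[[dict], bool]


def run_pipeline(sample: dict, checks: Dict[Stage, Check]) -> Tuple[bool, Optional[Stage]]:
    """Run a sample through every stage in order.

    Returns (True, None) if all stages pass, otherwise (False, stage)
    where stage is the first one that failed. A stage with no check
    registered counts as a failure (an assumption made here so that
    nothing becomes training-ready by default).
    """
    for stage in Stage:  # Enum iteration follows definition order
        check = checks.get(stage)
        if check is None or not check(sample):
            return False, stage
    return True, None


# Hypothetical checks: a sample passes a stage when the matching flag is set.
checks = {stage: (lambda s, key=stage.value: bool(s.get(key))) for stage in Stage}

sample = {stage.value: True for stage in Stage}
sample["standardization"] = False  # e.g. spelling not yet normalized
print(run_pipeline(sample, checks))
```

Stopping at the first failed stage keeps unreviewed or inconsistent samples out of the training-ready set, which is the property the post emphasizes.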
Dialectra retweeted
If you want to learn more about Dialectra and why us, don't miss this AMA session.
Clock it, OGs... 🔥
6
10
46
1,028
Dialectra retweeted
We launched "Dialect Connect" yesterday and in just 24 hours, the stats are really impressive.

A few days ago I was thinking about a simple idea: what if we could capture how people actually speak, not just how they read? I then implemented it yesterday as an additional feature for Dialectra.io.

24 hours later:
📞 371 conversation requests
✅ 303 completed conversations
🎙️ 45.1 hours of conversational speech collected
⏳ 5 pending
🟢 2 active
❌ 61 rejected

For years, most speech datasets have been built around scripted recordings. They are useful, but they only tell part of the story. Language lives in conversations. It lives in pauses, interruptions, storytelling, laughter, code-switching, local expressions, and the unique rhythm that makes every dialect different.

The future of voice AI will not be built solely on people reading sentences from a screen. It will be built on authentic human interactions. That is what excites me most about these numbers.

In just 24 hours, hundreds of people chose to connect with complete strangers or friends and simply talk. In doing so, they generated something incredibly valuable: real-world conversational data for African languages and dialects. Every completed conversation moves us closer to a future where AI can understand not only what Africans say, but how we say it.

When we started Dialectra, our mission wasn't just to collect voice data. It was to ensure that African languages, dialects, and identities are represented in the AI systems that will power the next generation of technology.

45.1 hours is a small number compared to where we're going. But it's a reminder that the infrastructure for African voice AI won't be built in a lab alone. It will be built by communities, contributors, and everyday conversations happening across the continent.

We're still very early.
15
21
99
2,487
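The Dialect Connect figures quoted in these posts can be sanity-checked with a little arithmetic: completed, pending, active, and rejected conversations should add up to the total requests, and the hours collected give a rough average conversation length. This is a minimal sketch; `summarize` is a hypothetical helper, and attributing all collected hours to completed conversations is an assumption, not something the posts state.

```python
def summarize(requests: int, completed: int, pending: int,
              active: int, rejected: int, hours: float) -> dict:
    """Check that the status counts add up and derive two simple ratios."""
    accounted = completed + pending + active + rejected
    return {
        # True when every request falls into exactly one status bucket
        "consistent": accounted == requests,
        # Share of all requests that ended as completed conversations
        "completion_rate": completed / requests,
        # Assumption: all collected hours come from completed conversations
        "avg_minutes": hours * 60 / completed,
    }


# Figures from the 24-hour post and from the later update.
day_one = summarize(371, 303, 5, 2, 61, 45.1)
later = summarize(896, 703, 12, 8, 173, 107.4)

for label, stats in (("first 24 hours", day_one), ("later update", later)):
    print(label, stats["consistent"],
          f"{stats['completion_rate']:.1%}", f"{stats['avg_minutes']:.1f} min")
```

In both snapshots the four status counts sum exactly to the reported number of requests, so the published breakdowns are internally consistent.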
Dialectra retweeted
Africa is home to some of the world’s most spoken and culturally influential languages, yet modern AI systems still struggle to understand them accurately. Hausa alone is spoken by an estimated 80 to 100 million people across West and Central Africa, particularly in Nigeria and Niger. Swahili, widely recognized as Africa’s leading lingua franca, connects more than 200 million speakers across East and Central Africa. Arabic, one of the continent’s most dominant languages, is spoken by hundreds of millions across North Africa and parts of the Sahel, shaping commerce, education, religion, and communication throughout the region. Yet despite this enormous linguistic scale, African speech remains heavily underrepresented in global AI systems. That is the gap @_dialectra is stepping in to close, by building the speech infrastructure designed to help artificial intelligence truly understand how Africa speaks.
5
11
43
1,548