Google, working with research institutions across Africa, has introduced WAXAL, an open-access speech dataset designed to improve artificial intelligence technologies for African languages.
The dataset includes voice samples from 21 sub-Saharan African languages, among them Hausa, Yoruba, Igbo, Luganda, Swahili and Acholi. Google said WAXAL is intended to support more than 100 million speakers who have historically been excluded from voice-enabled technologies because of limited high-quality language resources.
Voice-activated assistants, transcription services and other speech-driven technologies are widely used around the world. However, Africa’s more than 2,000 languages have largely been overlooked in AI development because of scarce speech data. This has created a digital gap that limits access to voice-enabled tools in areas such as education, health care and business.
To address this challenge, WAXAL was developed over three years with funding from Google. The dataset contains 1,250 hours of transcribed natural speech, along with more than 20 hours of premium studio recordings that can be used to build more lifelike synthetic voices.
“The real significance of WAXAL lies in empowering communities across Africa,” said Aisha Walcott-Bryant, head of Google Research Africa. She said the dataset provides a critical resource for students, researchers and entrepreneurs to build technology in their native languages, expanding access on their own terms.
Community participation played a central role in the project. African universities and organizations, including Makerere University, the University of Ghana and Digital Umuganda in Rwanda, led data collection efforts with guidance from Google researchers.
Unlike many international datasets, WAXAL leaves ownership of the data with the partner institutions. This structure allows African researchers and students to develop their own applications and tools independently, without relying on external corporations.
“For artificial intelligence to truly serve Africa, it must understand our languages and cultural contexts,” said Joyce Nakatumba-Nabende, a senior lecturer at Makerere University. She said the WAXAL dataset gives researchers access to the quality data needed to develop speech technologies that reflect Africa’s diverse communities.
At the University of Ghana, more than 7,000 volunteers contributed voice recordings to the project. Associate Professor Isaac Wiafe said the initiative is already encouraging innovation in sectors such as health care, education and agriculture.
The WAXAL dataset is now publicly available, offering developers, researchers and startups access to speech data needed to build more inclusive AI solutions across Africa.