Somali Startup Builds Speech-to-Text Tool to Help Organizations Process Community Feedback

Somalia has one of the smallest tech ecosystems in East Africa. Most startups remain small and access to capital is limited. The country has fewer than 500 startups, and only 53 of them have raised funding, totaling $47.8 million across 49 rounds. The 131 companies formed in the past five years have raised just $12.1 million between them. Five startups have been acquired and 95 have shut down. Only eight were founded by women.

Shaqodoon, a nonprofit founded in 2011, has spent years working to strengthen the local tech landscape by building practical digital systems across Somalia and Kenya. Its teams develop mobile learning and feedback tools used by schools, NGOs, and government agencies.

From this work comes NaMaqal, a Somali-language speech-to-text platform that converts spoken Somali into written text. The tool was designed for organizations that receive large volumes of feedback from communities but struggle to respond because most information is stored as audio.

“We noticed that much of the valuable feedback shared by communities, especially in rural and humanitarian contexts, existed only in spoken Somali. Meetings, interviews and consultations were recorded but rarely analyzed because transcribing them manually took days,” said Mustafa Othman, executive director of Shaqodoon.

NaMaqal allows teams to understand and process spoken feedback almost in real time. The platform is currently being used by UN agencies, the Danish Refugee Council and World Vision International.

Processing Real Speech at Scale

NaMaqal builds on years of work on Imaqal, Shaqodoon’s call-based feedback system that allowed people to leave voice messages. As call volume grew to between 1,500 and 3,000 messages daily, staff spent up to 15 hours a day listening, transcribing and translating. NaMaqal moves transcription into the system itself, allowing staff to review output rather than handle the entire process manually.

The workflow begins with raw audio captured through phones or field recorders. Files are cleaned by removing noise and silent sections. The system converts sound waves into features that capture acoustic information and feeds them into neural networks trained on thousands of hours of Somali speech.
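The article does not say how NaMaqal removes silent sections. As a rough illustration only, one common approach is to split the audio into short frames and drop those whose energy falls below a threshold. The function name, frame size and threshold below are assumptions made for the sketch, not details from Shaqodoon.

```python
import math


def trim_silence(samples, frame_size=400, threshold=0.01):
    """Drop frames whose RMS energy is below `threshold`.

    `samples` is a list of floats in [-1.0, 1.0]. A frame_size of 400
    corresponds to 25 ms at a 16 kHz sample rate (an assumed value).
    """
    kept = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        # Root-mean-square energy of the frame.
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= threshold:
            kept.extend(frame)
    return kept


# Synthetic example: silence, a burst of signal, then silence again.
audio = [0.0] * 400 + [0.5] * 400 + [0.0] * 400
voiced = trim_silence(audio)
```

A production system would more likely use a trained voice activity detector, since a fixed energy threshold struggles with the wind and crowd noise typical of field recordings.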

The model predicts phonemes, then words and sentences. A language model checks grammar, context and punctuation, removes fillers and redacts sensitive information. The final text is reviewed by staff. Segments flagged as low confidence are corrected, and new slang or local phrasing is added to improve accuracy. Reviewed data becomes part of future training cycles.
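The review step described above can be pictured as a simple triage: remove filler words, then route each segment either to the accepted output or to a human review queue depending on the model's confidence. This is a minimal sketch under stated assumptions. The `Segment` type, the 0.8 cutoff and the placeholder filler list are all hypothetical, and the example text is in English purely for readability.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    confidence: float  # assumed to be a score between 0.0 and 1.0


# Placeholder filler tokens; a real list would be language-specific.
FILLERS = {"um", "uh"}


def strip_fillers(text):
    """Remove filler tokens from a whitespace-tokenized transcript."""
    return " ".join(w for w in text.split() if w.lower() not in FILLERS)


def triage(segments, min_confidence=0.8):
    """Split segments into (accepted, needs_review) by confidence."""
    accepted, needs_review = [], []
    for seg in segments:
        cleaned = Segment(strip_fillers(seg.text), seg.confidence)
        if seg.confidence >= min_confidence:
            accepted.append(cleaned)
        else:
            needs_review.append(cleaned)
    return accepted, needs_review


segments = [
    Segment("um water shortage reported", 0.95),
    Segment("uh price increase", 0.40),
]
accepted, needs_review = triage(segments)
```

Here the first segment would pass straight through, while the second would land in the queue for a reviewer to correct.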

“Somali brings its own linguistic pressure points where words carry multiple layers of meaning,” Othman said.

Building a Dataset With No Precedent

A major challenge was the absence of large open Somali speech datasets. Shaqodoon collected recordings from radio stations, universities, community media and feedback from Imaqal. Linguists and annotators manually labeled the recordings, noting dialect, tone and speaker variation.

Somali has three major dialect groups: Maxaa, Maay and the coastal dialects. Maxaa is dominant in existing datasets, but leaving out Maay and coastal dialects would limit access for many speakers. Regional balance was a core principle in building the corpus.

Despite progress, Maay accuracy remains around 40% because its vocabulary and grammar differ significantly from Maxaa. Work is ongoing to improve performance.

Rural recordings added complexity due to wind, crowd noise and outdoor settings. Speakers also commonly switch between Somali, Arabic and English within a single sentence, adding further variation.

The platform uses separate acoustic and language models for each dialect group. Reviewers from Maay-speaking regions correct output and provide annotations to improve accuracy. Their input supports periodic retraining.

Platform Design and Use Cases

NaMaqal runs on cloud-based infrastructure designed to handle large audio batches. GPU processing is used during peak periods. Files are encrypted in transit and at rest. Offline caching is available for remote regions.

Somali organizations record large amounts of speech from radio programs, field teams, consultations and call centers. Manual transcription slows response times and makes analysis difficult. NaMaqal provides a searchable record that can be filtered by topic, location or time period.
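To make the idea of a filterable record concrete, the sketch below stores each transcript with a topic, location and date, then filters on any combination of the three. The `Transcript` fields, the function name and the sample data are illustrative assumptions. The article does not describe NaMaqal's actual data model or query interface.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Transcript:
    text: str
    topic: str
    location: str
    recorded_on: date


def filter_transcripts(records, topic=None, location=None, start=None, end=None):
    """Return records matching every filter that is not None.

    `start` and `end` are inclusive date bounds.
    """
    result = []
    for r in records:
        if topic is not None and r.topic != topic:
            continue
        if location is not None and r.location != location:
            continue
        if start is not None and r.recorded_on < start:
            continue
        if end is not None and r.recorded_on > end:
            continue
        result.append(r)
    return result


# Invented sample records, for illustration only.
records = [
    Transcript("sample text A", "water", "Baidoa", date(2024, 1, 10)),
    Transcript("sample text B", "prices", "Baidoa", date(2024, 2, 5)),
    Transcript("sample text C", "water", "Hargeisa", date(2024, 3, 1)),
]
water_in_baidoa = filter_transcripts(records, topic="water", location="Baidoa")
since_february = filter_transcripts(records, start=date(2024, 2, 1))
```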

Staff spend less time on full transcription and instead focus on reviewing uncertain segments and adding notes on regional phrasing. Teams can export final text or embed it into dashboards. When paired with field reports or geographic information, the output can help detect emerging issues, including price changes or local disputes.

Othman did not disclose the number of users but said pricing varies by use case.

Why Somalia Needs Speech Technology

Somalia’s tech ecosystem remains underdeveloped. Many young founders have limited access to funding or early customers. Building speech technology is demanding, which makes products like NaMaqal rare.

Despite these challenges, NaMaqal addresses a clear need. Somali is underrepresented in global speech datasets and commercial platforms do not support it well. Organizations typically rely on manual translation, which cannot keep up with large volumes of community feedback.

“The possibilities go far beyond converting speech to text. Once you can see and search spoken content, it unlocks new capabilities across many sectors, like processing voice feedback from communities to inform rapid response,” Othman said.

