South African Universities Join Forces to Build AI Language Models for isiXhosa, isiZulu and Sepedi

A group of South African researchers is working to ensure that African languages are not left behind as artificial intelligence reshapes how people access information, communicate and learn.

Researchers from the University of Cape Town (UCT) have joined colleagues from three other universities in a national collaboration to develop AI tools that better understand and serve African languages, including isiXhosa, isiZulu and Sepedi. The project is supported by the National Research Foundation and the Telkom Centres of Excellence program, which has funded information and communications technology research in South Africa for more than two decades.

The collaboration is led by Prof. Matthew Adigun and Prof. Alfredo Terzoli of the University of Zululand, Associate Prof. Thipe Modipa of the University of Limpopo, Dr. Phumzile Nomnga of the University of Fort Hare and Associate Prof. Melissa Densmore of UCT. The project runs until 2027 and will fund master’s, doctoral and postdoctoral researchers across all participating institutions.

“It’s one of the first projects where the Centres of Excellence are working across institutions,” Densmore said. “The idea is to build collaboration between universities while developing new innovations and technologies in the ICT sector.”

At the heart of the project is the development of large language models — the AI systems that power chatbots and digital assistants. But building such systems for African languages poses distinct challenges. Most existing AI language models are trained on vast amounts of digital text collected from the internet, and for many African languages, such data is scarce.

“The amount of text available in languages like isiZulu or isiXhosa is much smaller than what exists for English or other widely used languages,” said Dr. Jan Buys, a UCT researcher involved in the project. “So, one of the research challenges is how to develop models that still work effectively, even when the data available is limited.”

To address the data gap, researchers are searching for underutilized sources of language data, including printed materials in libraries and archives that have never been digitized, and exploring new techniques to train language models more efficiently when data is scarce.

The linguistic structure of African languages presents an additional technical challenge. “These languages are morphologically complex,” Buys said. “The structure of the words can be quite intricate, so we need algorithms that can handle that complexity.”

The researchers also stress that building AI systems for African languages raises important ethical and societal questions. The team plans to consult language experts, AI specialists and native speakers about the broader implications of the technology.

“We want to talk to people who speak these languages about the potential impact of AI tools and what the trajectory of this kind of research should look like,” Densmore said. “This is about shaping global AI knowledge rather than just importing and using technologies that have been created in other parts of the world.”

The stakes are significant. Many widely used AI systems struggle to respond accurately when users ask questions in less-resourced languages, with responses that may be poorly translated, poorly framed or simply incorrect. In healthcare, Densmore said, this can have serious consequences. “If someone is looking for health information and the system gives inaccurate or misleading answers — that becomes a real problem from a misinformation standpoint.”

Community involvement is also a key goal. Densmore said her previous research has shown that people want technologies that reflect the languages and dialects they use in daily life. “In one community we worked with, people said they would love to have a chatbot that speaks their local dialect, the language they use at home,” she said. “It would feel more like something that belongs to them.”

Her long-term vision is for communities to be able to build their own digital tools in their own languages. “Whether those are powered by language models or other kinds of AI, the key is that communities have ownership over them,” she said.

