{"id":57580,"date":"2026-04-29T14:53:03","date_gmt":"2026-04-29T14:53:03","guid":{"rendered":"https:\/\/eduzim.co.zw\/news\/?p=57580"},"modified":"2026-04-29T14:53:03","modified_gmt":"2026-04-29T14:53:03","slug":"gsma-and-pleias-launch-open-source-model-to-correctly-identify-61-african-languages-in-ai-systems","status":"publish","type":"post","link":"https:\/\/eduzim.co.zw\/news\/2026\/04\/29\/gsma-and-pleias-launch-open-source-model-to-correctly-identify-61-african-languages-in-ai-systems\/","title":{"rendered":"GSMA and Pleias Launch Open-Source Model to Correctly Identify 61 African Languages in AI Systems"},"content":{"rendered":"<p>\n<\/p>\n<div>\n<p>The GSMA and French AI research company Pleias have released CommonLingua, an open-source language identification model covering 334 languages including 61 African languages, addressing a foundational gap in AI systems that has caused African-language text to be routinely misidentified.<\/p>\n<p>Released under the GSMA\u2019s African AI Languages Model Project, the two-million-parameter model achieves 83% accuracy in identifying African languages \u2014 a significant improvement over existing systems. Leading language identification tools such as fastText, GlotLID and OpenLID were built primarily around European and Asian languages, and African-language text is frequently mislabeled as English or French. Even state-of-the-art AI models lose roughly 30 percentage points in accuracy on African languages compared to major world languages, according to the GSMA.<\/p>\n<p>CommonLingua covers 61 African languages across eight language families: Bantu with 21 languages, Niger-Congo and West African with 18, Afro-Asiatic and Semitic with 7, Cushitic and Chadic with 4, Berber with 3, Nilo-Saharan with 3, and pidgins, creoles and other languages with 5. 
The model operates directly on UTF-8 byte sequences rather than relying on language-specific tokenizers, enabling consistent handling across scripts including Latin, Arabic, Ethiopic, N\u2019Ko and Tifinagh.<\/p>\n<p>Pierre-Carl Langlais, co-founder and chief technology officer at Pleias, said the model addresses the first essential step in building AI infrastructure for African languages. \u201cAfrican languages are not an edge case. They are the working languages of hundreds of millions of people, and they deserve AI infrastructure built with the same care as any other language,\u201d he said. \u201cCommonLingua is deliberately the first brick we are laying: you cannot curate what you cannot identify.\u201d<\/p>\n<p>Louis Powell, director of AI initiatives at the GSMA, said the lack of foundational infrastructure has long held back progress on African-language AI. \u201cCommonLingua addresses this critical gap, enabling the development of richer datasets and more representative AI systems at scale,\u201d he said.<\/p>\n<p>Africa is home to between 2,000 and 3,000 distinct languages. Nigeria alone has more than 500, and South Africa has 11 official languages, yet English, the language that dominates the internet and most AI training data, is spoken at home by only one in 10 South Africans.<\/p>\n<p>The model was trained exclusively on open-licensed and public domain content aggregated through the Common Corpus project, drawing on sources including Wikipedia, scientific publications in OpenAlex, VOA Africa, WaxalNLP, cultural heritage archives and Pralekha. 
All datasets are released under permissive licenses.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>The GSMA and French AI research company Pleias have released CommonLingua, an open-source language identification model covering 334 languages including&hellip;<\/p>\n","protected":false},"author":1,"featured_media":56491,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[32,11],"tags":[301,4999,9129,6348,1798,2903,171,10596,10662,898],"class_list":["post-57580","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-mzansi","category-world","tag-african","tag-correctly","tag-gsma","tag-identify","tag-languages","tag-launch","tag-model","tag-opensource","tag-pleias","tag-systems"],"_links":{"self":[{"href":"https:\/\/eduzim.co.zw\/news\/wp-json\/wp\/v2\/posts\/57580","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/eduzim.co.zw\/news\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/eduzim.co.zw\/news\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/eduzim.co.zw\/news\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/eduzim.co.zw\/news\/wp-json\/wp\/v2\/comments?post=57580"}],"
version-history":[{"count":1,"href":"https:\/\/eduzim.co.zw\/news\/wp-json\/wp\/v2\/posts\/57580\/revisions"}],"predecessor-version":[{"id":57581,"href":"https:\/\/eduzim.co.zw\/news\/wp-json\/wp\/v2\/posts\/57580\/revisions\/57581"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/eduzim.co.zw\/news\/wp-json\/wp\/v2\/media\/56491"}],"wp:attachment":[{"href":"https:\/\/eduzim.co.zw\/news\/wp-json\/wp\/v2\/media?parent=57580"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/eduzim.co.zw\/news\/wp-json\/wp\/v2\/categories?post=57580"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/eduzim.co.zw\/news\/wp-json\/wp\/v2\/tags?post=57580"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}