Edward Sseremba, 46, speaks Ruruuli, the native language of the Baruuli and Banyala peoples, who live in the central Ugandan districts of Nakasongola and Kayunga and in Masindi, in mid-western Uganda.
Sseremba, a teacher, lives with his wife and three children in Kangulumira County in Kayunga.
Recent reports naming his language among 23 Ugandan languages at risk of extinction alarmed Sseremba, but they came as no surprise because, even in his own household, Ruruuli is often eclipsed by Luganda, a more widely spoken local language.
That worrying status quo need not last, however, for Ruruuli or for other endangered African languages such as Hadza, spoken by a group of nomadic hunter-gatherers in northern Tanzania, and N|uu, the language of the San people, the first hunter-gatherers in South Africa.
There is growing optimism within African AI language-model communities that multilingual models, whether internationally developed or locally created, could revolutionize the documentation and digitization of African indigenous language knowledge systems, helping to preserve grammar, vocabulary, and syntax for future generations. Local efforts include Uganda's Sunflower, the African Next Voices Project, and Lelapa AI's InkubaLM, among others. The cases of Maori in New Zealand and Hawaiian in Hawaii, where AI-assisted tools are helping communities to learn and use their languages more actively, offer a precedent.
Professor Engineer Bainomugisha, a Ugandan computer scientist, contends that in cases where only a few speakers of an African language remain, as in the case of the N|uu language, AI language models will be a valuable resource for its revival.
N|uu, now spoken by fewer than a dozen elderly individuals, is classified in UNESCO's "Atlas of the World's Languages in Danger" as critically endangered.
The Atlas indicates that up to 10% of Africa's languages, particularly those spoken by small communities, may disappear within a century.
Dr. Aminah Zawedde, the Permanent Secretary at Uganda's Ministry of ICT and National Guidance, contends that although challenges such as limited technological accessibility in marginalised African communities, limited data availability and ethical concerns over language ownership continue to hinder the broad deployment of AI language model technologies across many African nations, there is no doubt that the continent can safeguard its linguistic diversity and cultural heritage by leveraging the power of AI.
While Bainomugisha agreed that AI language technologies such as large language models, automated transcription and speech recognition have the potential to safeguard Africa's linguistic diversity and cultural heritage, he emphasized that addressing limited data availability and technical hurdles, such as inadequate infrastructure and computing resources, must be a top priority.
"A major challenge lies in the scarcity of high-quality language resources (written documents about the language itself or materials in the language and audio recordings) required to train language models. Developing these resources demands time, funding, and collaboration with native speakers and linguistic experts. In some cases, it may even involve identifying the few remaining speakers of a language and working closely with them to collect authentic data."
"Then there are technical hurdles such as technical infrastructure and computing resources. These require funding. Overcoming these challenges will require partnerships between researchers, communities, and government, as well as open, collaborative initiatives to share language data and tools across institutions and countries," added Bainomugisha, who is also an Associate Professor and Chair of the Department of Computer Science at Uganda's Makerere University.
Dr. John Quinn, a Research Software Engineer who has worked on several large-scale AI projects across the African continent in the fields of remote sensing, speech, and language, observed that while AI could, among other things, create translation tools to help people communicate in endangered languages, it will be critical in the context of Africa to innovate locally.
"AI language systems will only help to preserve Africa's endangered languages if there are local teams working on them. African developers face a lack of access to the same amount of computing resources as developers in other parts of the world, but there are a lot of creative minds on the continent who are setting their wits to work, and sooner or later, they will have the resources they need to bring that much-needed change to bear," said Quinn, who also serves as the Research Director of Sunbird AI, a non-profit organisation that develops artificial intelligence systems for social impact in Africa.
Vukosi Marivate, a professor of computer science at the University of Pretoria in South Africa, noted that the successful fusion of traditional knowledge with contemporary technology is what will determine the future of African languages.
"With that said, there is need on the African continent, which is home to one-third of the world's languages, for increased financial investment in language preservation technology models and expanded community engagement preservation efforts," he said.
In Marivate's reckoning, increased financial investments will complement the $10 million pledged by the Gates Foundation and other donors from the AI for Development Funders Collaborative to ensure AI models are more inclusive of African languages.
AI virtual assistants and translation services: are they a threat to indigenous African languages?
In response to widespread concerns that the growing use of AI-powered virtual digital assistants and multilingual neural machine translation services may put indigenous languages at risk, Dr. Quinn stated that it will be critical to have local AI language model systems that understand the cultures and languages that are important to African communities.
"The largest AI systems are made by companies in the US, so if you ask ChatGPT a question about the traditions or cultures of local communities, it may not know or might give a biased answer. This is why we need local AI systems, which understand the languages and cultures that are important to those communities and would be overlooked by large companies in the US. What we're trying to do at Sunbird AI, and our collaborators at Makerere University in Uganda and elsewhere, is make tools which people find relevant and useful."
Some key African initiatives harnessing AI to preserve African languages
African Next Voices Project: The project has captured 9,000 hours of speech across 18 languages in Nigeria, Kenya, and South Africa, transforming them into digitised datasets that developers can integrate into large language models.
Lelapa AI's InkubaLM: Africa's first multilingual small language model, trained on 2.4 billion tokens in five African languages (Yoruba, Hausa, Swahili, isiZulu, and isiXhosa).
Maseno Centre for Applied AI: A center in Kenya collecting voice data for five languages: Dholuo, Maasai, Kalenjin, Somali, and Kikuyu.
African Universal Dependencies Treebanks: A project that aims to increase the representation of African languages in AI research by creating a quality dataset with consistent syntactic human annotations for eleven typologically diverse African languages.
Data Science Nigeria: An organization collecting speech data in multiple languages, including Bambara, Hausa, Igbo, Nigerian Pidgin, and Yoruba.
Sunflower: Enables users to translate between English and Ugandan languages, improving understanding and vocabulary. Its text-to-speech features can help users practice correct pronunciation, while upcoming speech-to-speech capabilities will allow people, even those who cannot read or write fluently, to engage with technology through spoken language. This is especially useful for young learners, who can develop comprehension skills through interactive engagement. Sunflower can also facilitate the translation of educational materials from English into local languages, enhancing understanding of key concepts and promoting the use of indigenous languages in education.
By all accounts, a version of the Sunflower assistant for Ugandan languages is about to be launched on WhatsApp.
Source: Google.