Some years back, I asked Google Assistant a simple question (I can’t remember what it was) and it returned an unrelated answer. I asked again and got a different answer. In the end, I had to type my question.
My spoken English and accent have improved through personal development and exposure, but I can also tell that many voice recognition apps are adapting better to African speakers' accents. The truth, however, is that voice recognition for Africans still has a long way to go.
I still wonder why we do not have many apps that can be prompted in major local African languages that have over 10 million native speakers. So, I decided to speak with a field linguist and academic researcher about the current state of artificial intelligence and natural language processing in African contexts and languages.
Hi, I am Olanrewaju Samuel.
I am interested in computational phonology, dataset building, annotation and curation, Natural Language Processing and field linguistics.
My primary mentor is
I am not yet strict with my research goals, but I am focused on developing my expertise and exploring my possibilities for now. Not for the certifications per se, but for self-development. So, I am seeking to develop myself while also attempting to complete my programme here and move on to some other things.
I have collaborated with different great individuals to be part of different publications. One of my recent linguistics papers is “
This includes:
I am teaching a course entitled “Natural Language Processing for Linguists”. Basically, I teach natural language processing from a linguistic perspective, within African contexts, in Kigali, Rwanda.
I am tasked with explaining and demonstrating the nuances of building, annotating, curating, analysing and publishing multilingual datasets for different NLP tasks, such as building large language models (LLMs). For us, building a large language model means bringing multiple language systems to function within a single stream. We try to achieve that through lateralization, which is, roughly, training the AI system on a pattern or template. That pattern then becomes the basis for its other applications.
Beyond conversational AI, we are looking at doing something meaningful in the field of generative AI. This still draws on lateralization: the model's ability to permute data and generate results through mathematical computation, such as probability.
NLP has been used in many instances across Africa, including robotics and conversational AI. A typical example of conversational AI is Lagos’ Alaye, which helps natural tourists (Nigerians from other states) find their way around Lagos, a mega-city and state, and identify locations such as restaurants, clubs, and shops, and even check traffic situations, using the popular Nigerian pidgin (Naija pidgin).
We are developing AI models that can be trained to perform tasks: a complex system or process is narrowed down into a simple command string (modelling). That’s the practical application of NLP in robotics in Africa at the moment.
Currently, in linguistics, the application of AI is mostly in automation, although linguistic models are also built into different AI applications such as robots and chatbots, among others.
We have some folks doing really great stuff, like
A major challenge for Africa in finding global relevance in the AI industry is the limited availability of language resources (data). Africa is multilingual, hence, there are
If anything happens in AI, it happens for high-resource languages. Even if it were to happen for African languages, we don’t have the systems to power it. Hence, we are lagging behind because we do not have enough to work with, and the root issue is an almost lifelong problem: our lack of documentation.
Take Nigeria, for example: it has over 200 tribes, yet only three languages dominate. Unlike Yoruba, Igbo, and Hausa, smaller tribes and languages have little data (they are low-resource). That’s what we are trying to do at
AI and NLP technicians are not investing, either because they don't believe in it or because they think there isn't enough data to justify a return on their investment. So, we are hoping our current behind-the-scenes work will be the breakthrough.
Moreover, Africa is marginalised in the global market for linguistic AI and NLP because the most popular search engines are Asian and Western (especially American). Also, for some of our work here, we cannot take credit as Africans because of the sponsorship.
African countries that have made the most impact include South Africa, Kenya, and Rwanda. Those guys are crazy! Nigeria is also trying, but most people who ought to be exploring the space are seeking the gratification of academic certificates rather than development. We value our language(s), but we are not building datasets with them. We would rather speak or privatise our language as a heritage when we ought to be investing in documentation to preserve and protect the language.
Honestly, there isn’t much, other than the business of selling datasets. Even then, those who fund the projects put in a lot of money, but the amount that reaches the field agents is very little compared to what was originally put in.
There is no law against data collection. The most important thing is that the data is collected willingly from the native speakers, and they are rewarded for their time. However, all activities are to be in alignment with the African Union’s
And to your second question, there is nothing anyone can do about the amount of money that eventually reaches the people involved in these fields. The most important thing is that everyone commits to the project willingly. The people are told that they would be recorded and rewarded, and as long as they are okay with the price, there is no “unfairness.”
It is a wide field. Many have foundations already and are in the building stages, but we still have more aspects that are barely foundational. What I’ll recommend for anyone is to get involved with language data collection and analysis. We need data analytics for datasets as much as we need data.
Hence, I’ll recommend joining or volunteering with enthusiastic data-driven groups: volunteer for data collection and analysis, learn the nomenclature, and so on.
Africa continues to be poorly represented in voice recognition software and in the commands or prompts used by different AI and NLP applications. The narrative will change when Africans set out to build datasets, put their languages out there, and continue to invest in documentation. Still, you will be impressed with some of the AI and NLP applications coming out of Africa.
In my research and in following leads, I have seen robots being prompted in local African languages; more local chatbots are being built for different African contexts (tourism, exploration); and some languages are being used in IoT for home appliances. However, I believe we should be doing more, considering the massive AI and NLP revolution going on in the world right now. For now, we have more