
Sign up for the Game: Can AI Make Sports Streams Accessible?

by Roman Garin, September 19th, 2023

Too Long; Didn't Read

Translating into sign languages in real time is a tricky task, even for human interpreters. Using AI to solve this problem is a very interesting challenge.

September 23 is the International Day of Sign Languages, as proclaimed by the United Nations in 2017. This date is a good occasion to dream (or maybe to set a goal) that a day will come when all media and tech products are equally accessible to all people, regardless of their disabilities. I dream that someday all deaf people will be able to watch live sports streams. Translating into sign languages in real time is a tricky task, even for human interpreters. And because skilled interpreters are few and sign languages are many, sports streams cannot become truly universally accessible at the moment. Using Artificial Intelligence (AI) to solve this problem is a very interesting technical challenge and definitely a very good cause. A lot has been done in this field over the past few years, but obstacles persist. In this article, I offer an overview of the latest technology dedicated to this goal and invite you to discuss these findings and contribute to cracking this riddle.

Sport is not for everyone?

Sport is King, period. Since the first ancient Olympics (and probably even before that), it has helped channel the competitive part of human nature into non-violent forms. It has united millions of people across the globe and above political borders. It also rules the modern digital and media universe. According to Research and Markets, the global sports market grew from $486.61 billion in 2022 to $512.14 billion in 2023, a compound annual growth rate (CAGR) of 5.2%, and is expected to reach $623.63 billion in 2027 at a CAGR of 5.0%. That is far faster than global economic growth, which the International Monetary Fund projects to fall from an estimated 3.5% in 2022 to 3.0% in both 2023 and 2024. The global online live video sports streaming market alone was valued at $18.11 billion in 2020 and is expected to reach $87.33 billion by 2028. Further illustrating sports' popularity, a 2022 report by Nielsen Sports revealed that 31% of U.S. linear TV ad revenues depend on live sports programming, even though sports account for only 2.7% of the available broadcast program content.
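For the curious, those CAGR figures are easy to verify yourself; here is a quick back-of-the-envelope check in Python (just the standard CAGR formula, nothing proprietary):

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1

# Sports market: $486.61B (2022) -> $512.14B (2023)
print(f"2022 -> 2023: {cagr(486.61, 512.14, 1):.1%}")  # 5.2%

# Projection: $512.14B (2023) -> $623.63B (2027), 4 years
print(f"2023 -> 2027: {cagr(512.14, 623.63, 4):.1%}")  # 5.0%
```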


However, this huge industry misses (partly or entirely) a significant part of the world's population. UN data suggests there are 70 million deaf people in the world, which is just under 1% of the Earth's 8.05 billion population. And the problem is growing: the World Health Organization expects that by 2050, 2.5 billion people (roughly a quarter of all humans) will experience some degree of hearing loss. Of course, many sports broadcasts have subtitles. But many deaf people have difficulty learning to read and write: in most countries, the illiteracy rate among the deaf is above 75%, a truly staggering figure. Many broadcasts, especially on TV, have live sign language interpreters. But there is a problem here, too. Deaf people across the globe use more than 300 different sign languages, and most of them are mutually unintelligible. It is obviously impossible to hire 300 interpreters to make one broadcast globally accessible. But what if we hire an AI instead?

Sign (language) of life

To fully understand the difficulty of this task, let us take a brief dive into what sign languages actually are. Historically, they were often used as a lingua franca by people blessed with normal hearing but speaking different languages. The best-known example is the sign language of the Plains Indians in 19th-century North America. The languages of the different tribes were dissimilar, but their ways of life and environment were quite alike, which helped them find common symbols. For instance, a circle drawn against the sky meant the moon, or something as pale as the moon. Similar ways to communicate were used by tribes in Africa and Australia.


However, this is not the case with the sign languages used by the deaf. They have developed independently in each region and country, and sometimes they even differ from city to city. For example, American Sign Language (ASL), widely used in the US, is totally different from British Sign Language, even though both countries speak English. Ironically, ASL is much closer to Old French Sign Language (LSF), because a deaf Frenchman, Laurent Clerc, was one of the first teachers of the deaf in the US in the 19th century. Contrary to popular belief, there is no true international sign language. One attempt to create it was Gestuno, now known as International Sign, conceived under the World Federation of the Deaf in 1951. However, just like its analogue for hearing people, Esperanto, it has never become popular enough to be a true solution.


Another important thing to keep in mind when discussing translation into sign languages is that they are independent languages in their own right, completely different from the languages we can hear. A very common misconception is that sign languages mimic those spoken by the hearing. On the contrary, they have a totally different linguistic structure, grammar, and syntax. For instance, ASL has a topic-comment syntax, while English uses subject-verb-object constructions: the English sentence "I am going to the store" would be glossed in ASL roughly as STORE I GO, with the topic placed first. So, in terms of syntax, ASL actually shares more with spoken Japanese than it does with English. There are sign alphabets (see more about them here), but they are used to spell proper names of places and people, not to compose words.

Breaking the barriers

There have been numerous attempts to connect spoken and sign languages using "robotic gloves" for gesture recognition, some dating back to the 1980s. Over time, more sophisticated hardware was added, like accelerometers and all sorts of sensors. However, the success of these attempts was limited at best. And in any case, most of them focused on translating sign languages into spoken languages, not the other way round. Recent developments in computer vision, speech recognition, neural networks, machine learning, and AI give hope that direct translation from spoken to sign languages is also possible.


The most common path is using 3D avatars to display sign language gestures and emotions, using speech and other data as input. A notable feature developed by the Japanese broadcaster NHK translates sports data, such as players' names and scores, into sign language displayed by an animated, cartoon-like avatar. The data received from the event organisers or other entities is interpreted, fitted into templates, and then expressed by the avatar, as sketched below. However, only limited types of data can be translated this way. NHK says it continues to develop the technology so that the avatars can express emotions in a more human manner.
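To make that concrete, here is a minimal sketch of what such a template-based pipeline could look like. To be clear, this is not NHK's actual code; the event types, gloss names, and template shapes are all invented for illustration:

```python
# Hypothetical templates: event type -> ordered gloss slots.
# Each gloss would map to a stored avatar animation clip.
TEMPLATES = {
    "goal":        ["{player}", "GOAL", "SCORE-NOW", "{home}", "{away}"],
    "yellow_card": ["{player}", "WARNING", "YELLOW-CARD"],
}

def event_to_glosses(event_type: str, **slots: str) -> list[str]:
    """Fill a template with structured event data, producing the gloss
    sequence the avatar engine would animate clip by clip."""
    return [part.format(**slots) for part in TEMPLATES[event_type]]

# Example: a goal event arriving from the organiser's data feed.
print(event_to_glosses("goal", player="TANAKA", home="2", away="1"))
# -> ['TANAKA', 'GOAL', 'SCORE-NOW', '2', '1']
```

The appeal of this design is reliability: a structured data feed is unambiguous, so the avatar never mistranslates. The cost, as NHK's experience shows, is that anything outside the templates simply cannot be signed.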


Lenovo and the Brazilian innovation hub CESAR recently announced they are creating an AI-powered sign language translator for hearing people. Similarly, SLAIT (which stands for Sign Language AI Translator) has been developing an educational tool that helps people learn ASL interactively. Although these tasks differ from our scope, the computer vision techniques and AI training models developed by these projects could prove very useful for speech-to-sign translation in the future.


Other startups are getting closer to our topic of discussion. For instance, Signapse came up with a solution that translates text into sign language displayed as a photo-realistic animated avatar. The company uses Generative Adversarial Networks (GANs) and deep learning techniques, as well as a constantly growing video database (more on that in their peer-reviewed article here). However, the platform is aimed mostly at translating public announcements and website texts; in other words, it still seems far from real-time live translation.
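As a toy sketch of the general "gloss lexicon plus video database" idea (this is not Signapse's pipeline; the phrases, glosses, and file paths below are made up), a real system would replace the hard-coded lookup with a learned translation model and use GAN-based rendering to blend clips smoothly:

```python
GLOSS_CLIPS = {  # hypothetical gloss -> pre-rendered video clip
    "TRAIN": "clips/train.mp4",
    "DELAY": "clips/delay.mp4",
    "TEN-MINUTE": "clips/ten_minute.mp4",
}

def text_to_glosses(text: str) -> list[str]:
    """Placeholder for the hard part: a real system needs a learned
    translation model, since sign grammar differs from English."""
    lookup = {"the train is delayed by ten minutes":
              ["TRAIN", "DELAY", "TEN-MINUTE"]}
    return lookup[text.lower()]

def glosses_to_playlist(glosses: list[str]) -> list[str]:
    """Resolve each gloss to a clip; a GAN-based renderer would
    instead synthesise smooth transitions between the signs."""
    return [GLOSS_CLIPS[g] for g in glosses]

print(glosses_to_playlist(
    text_to_glosses("The train is delayed by ten minutes")))
```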


The Israel-based startup CODA took another step toward our goal. It developed an AI-powered audio-to-sign translation tool and claims it works "almost instantly". It currently supports five source languages: English, Hebrew, French, Spanish, and Italian. Next, CODA aims to add the sign languages of highly populous countries such as India and China.
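Conceptually, such a tool chains three stages. The skeleton below is my assumption about the general architecture, not CODA's implementation; every function is a hypothetical placeholder:

```python
from typing import Iterator

def transcribe(audio_chunk: bytes, language: str) -> str:
    """Stage 1: automatic speech recognition. An off-the-shelf model
    (e.g. the open-source Whisper family) could fill this slot."""
    raise NotImplementedError  # placeholder

def translate_to_glosses(text: str, sign_language: str) -> list[str]:
    """Stage 2: the genuinely hard step -- a learned model translating
    source-language text into target sign-language glosses."""
    raise NotImplementedError  # placeholder

def render_avatar(glosses: list[str]) -> Iterator[bytes]:
    """Stage 3: drive the signing avatar, yielding video frames."""
    raise NotImplementedError  # placeholder

def audio_to_sign(chunk: bytes, src: str, dst: str) -> Iterator[bytes]:
    """Compose the stages. Note that adding a new sign language only
    changes the Stage 2 model and the Stage 3 gesture database."""
    return render_avatar(translate_to_glosses(transcribe(chunk, src), dst))
```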


Arguably the closest match to our dream was presented by Baidu AI Cloud on its digital avatar platform, Xiling. The platform was launched to provide hearing-impaired audiences with broadcasts of the Beijing 2022 Paralympic Winter Games. Local media reported it was capable of generating digital avatars for sign language translation and live interpretation "within minutes".

Conclusion

The next step in developing speech-to-sign translation would be expanding the output to as many sign languages as possible and shrinking the translation lag from minutes to seconds. Both tasks represent major challenges. Adding more sign languages to the output feed means creating and continuously expanding extensive databases of hand and body gestures as well as facial expressions. Reducing the time gap is even more important, as sports are all about moments: even a minute-long gap means the stream must be delayed, or else the audience will miss the very essence of the game. Translation time can be reduced by building more extensive hardware infrastructure and by compiling databases of the most typical speech templates, which can be recognised before a phrase is even finished. All of this may sound like a costly venture. But on the one hand, improving the quality of life for millions of people is priceless. On the other hand, this is not just charity: think of the additional audience the broadcasts would gain and the sponsor money in play. All in all, it may well be a win-win game.
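As a toy illustration of that last idea (my own sketch, not any shipping feature), a translator could match the partial transcript against a database of typical commentary phrases and commit to a sign sequence as soon as the prefix becomes unambiguous:

```python
COMMENTARY_TEMPLATES = {  # hypothetical phrase -> gloss sequence
    "what a goal":     ["GOAL", "WOW"],
    "what a save":     ["GOALKEEPER", "SAVE", "WOW"],
    "yellow card for": ["YELLOW-CARD"],
}

def early_match(partial: str) -> list[str] | None:
    """Return a gloss sequence once exactly one template matches the
    partial transcript, letting the avatar start signing early."""
    hits = [g for phrase, g in COMMENTARY_TEMPLATES.items()
            if phrase.startswith(partial.lower())]
    return hits[0] if len(hits) == 1 else None

print(early_match("what a g"))  # ['GOAL', 'WOW'] -- already unambiguous
print(early_match("what a "))   # None -- still ambiguous, keep waiting
```

Every word shaved off the wait this way brings the signed feed closer to the live action.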


It seems the tech majors are also joining the race. Zippia, a career portal, recently reported that Google has been hiring sign language interpreters at more than twice the salary they would normally expect in the United States ($110,734 versus an average of $43,655). At that rate, a sign language interpreter earns about 10% more than the average US software engineer ($100,260). This may well be a hint that a major breakthrough is coming soon…


Please feel free to comment and let us join forces to find the solution!