YouTube is the second-largest social media network and search engine on the planet, with over 2.3 billion active users.
Since its inception in 2005, YouTube has grown into a tech behemoth. It is now a hub for hobby content creators, professional influencers, and social media marketing specialists.
Over 12 million content creators are vying for viewers and competing to grow their channels. One million of them count as professional creators, meaning they make a full-time living off the platform, which now offers various avenues of monetization to users.
According to recent data by Statista, over 500 hours of video content are uploaded to the platform every single minute - that’s 30,000 hours every hour. And this pace is accelerating, with content upload rates increasing by 40% between 2014 and 2020.
To deal with this onslaught of data, YouTube is harnessing artificial intelligence (AI) in a multitude of ways.
From recommending videos that suit individual users’ tastes to flagging inappropriate clips, YouTube’s AI plays a huge part in determining what content users see on the platform. It is often the deciding factor in which videos achieve viral success and which disappear into obscurity or are removed altogether.
In this article, we’ll be taking a closer look at how exactly YouTube harnesses AI, and how this influences the user experience of both viewers and content creators.
To begin with, the most crucial use that YouTube makes of AI and machine learning takes the form of the algorithm that selects search results and recommendations.
The main challenge that this algorithm has to cope with is the constantly changing video database, from which it has to winnow search results and recommendations that match every single user’s tastes.
Generally speaking, the algorithm works in two steps: candidate generation and ranking.
The first step narrows YouTube’s vast video library down to only a few hundred videos, based on user preferences and their demographic characteristics.
Here, it’s noteworthy that the algorithm considers a user’s implicit preferences. That means it takes into account videos that a user has watched, whether they shared, liked, or commented on them, and for how long they watched them. A video that was watched to the end counts much more strongly towards a user’s preferences than a clip they stopped watching after a minute or two.
Once that candidate pool is assembled, the ranking step decides in which order the videos will appear.
Factors like users’ implicit preferences are taken into account here, too. However, at this stage, video metrics - like their number of views, likes, shares, and comments - influence their ranking position heavily.
As a final step, the algorithm injects an element of randomness into the ranking. This ensures that users discover a variety of topics, and that newer channels also see some exposure.
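The two steps and the final shuffle can be illustrated with a minimal sketch. It is not YouTube’s actual implementation: the `Video` fields, the topic-based `watch_history`, the popularity weighting, and the `explore_rate` parameter are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Video:
    video_id: str
    topic: str
    views: int
    likes: int


def generate_candidates(library, watch_history, limit=200):
    """Step 1: narrow the library to videos on topics the user has engaged with.

    watch_history maps a topic to a value between 0 and 1, a hypothetical
    stand-in for implicit signals such as how often videos were watched to the end.
    """
    candidates = [v for v in library if watch_history.get(v.topic, 0.0) > 0.0]
    return candidates[:limit]


def rank(candidates, watch_history, rng, explore_rate=0.1):
    """Step 2: order candidates by preference and popularity, then add randomness."""

    def score(video):
        preference = watch_history.get(video.topic, 0.0)
        popularity = video.views + 10 * video.likes  # hypothetical weighting
        return preference * popularity

    ranked = sorted(candidates, key=score, reverse=True)
    # Final step: occasionally swap a slot with a random one so that
    # lower-scoring videos also get some exposure.
    for i in range(len(ranked)):
        if rng.random() < explore_rate:
            j = rng.randrange(len(ranked))
            ranked[i], ranked[j] = ranked[j], ranked[i]
    return ranked


library = [
    Video("a", "cooking", 1000, 50),
    Video("b", "cooking", 100, 5),
    Video("c", "chess", 5000, 100),
    Video("d", "music", 1_000_000, 1000),
]
history = {"cooking": 0.9, "chess": 0.2}
pool = generate_candidates(library, history)
feed = rank(pool, history, random.Random(0))
```

Note that the hugely popular music video never reaches the ranking step here, because the user has shown no interest in that topic. Setting `explore_rate` to 0 disables the shuffle and yields a purely score-based order.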
The results of this process can be seen across YouTube: in the suggested videos on users’ home pages, up-next sidebar, and video end screens, as well as in automatically generated topic channels and YouTube mixes.
A second field of deployment for YouTube’s AI is the process of flagging content and comments that violate the platform’s rules.
According to YouTube’s Transparency Report, 9,569,641 videos were removed from the platform between January and March 2021 alone. Of these, nearly 9.1 million were flagged automatically.
The algorithm has been trained to recognize scams, misleading information, violent content, and adult content such as nudity and pornography. It goes through each frame of a video and passes it through a convolutional neural network that detects potentially explicit features. Subsequently, each frame is labeled independently. This judgment is based on visual-only information.
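The frame-by-frame labeling can be sketched as follows. The `classify` callable stands in for the convolutional neural network, and the final aggregation into a single per-video flag is an assumption added for illustration. Neither reflects YouTube’s real model or thresholds.

```python
def label_frames(frames, classify, threshold=0.5):
    """Label each frame independently, using only that frame's visual data.

    classify is a placeholder for a trained CNN that returns a score
    between 0 and 1 for a single frame.
    """
    return [classify(frame) >= threshold for frame in frames]


def flag_video(frame_labels, min_flagged_fraction=0.1):
    """Hypothetical aggregation: flag the video if enough frames were labeled explicit."""
    if not frame_labels:
        return False
    return sum(frame_labels) / len(frame_labels) >= min_flagged_fraction


# Toy frames: each "frame" is a short list of pixel intensities,
# and the stand-in classifier simply averages them.
frames = [[0.1, 0.2], [0.9, 0.8], [0.4, 0.4]]
labels = label_frames(frames, classify=lambda f: sum(f) / len(f))
needs_review = flag_video(labels)
```

Because every frame is scored on its own, the labeling step parallelizes easily, but it also ignores audio and context, which matches the visual-only judgment described above.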
Similarly, the algorithm flagged over 1 billion comments as violating YouTube’s community rules in the first three months of 2021. According to Statista, the majority of these comments were removed because their verbal patterns matched those of scams (55%), child safety violations (25%), or cyberbullying and harassment (26%).
Currently, a new feature is being tested that uses AI to preempt offensive comments. Here, the algorithm recognizes a potentially problematic comment as the user is typing it. A pop-up then reminds the user of the platform’s guidelines before they can post the comment.
Similar to the flagging of content that violates YouTube’s platform guidelines, machine learning is also used to detect copyright violations.
YouTube’s Content ID system allows copyright holders to register files for which they hold the copyright. These files are then matched against videos uploaded to the platform in a process similar to the one described above.
Once a copyright holder’s content has been flagged as uploaded by someone else, the copyright holder has several options. They can either completely block the offending video, monetize it through ads for their own benefit, or simply track its viewership statistics.
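The register, match, and act workflow might look roughly like the sketch below. It is a deliberate simplification: it compares exact hashes of fixed-size byte chunks, whereas a production system would need fingerprints that survive re-encoding, cropping, or shifted offsets. The function names, the `registry` layout, and the 0.5 threshold are hypothetical.

```python
import hashlib
from enum import Enum


class Policy(Enum):
    """The three options open to a copyright holder."""
    BLOCK = "block"
    MONETIZE = "monetize"
    TRACK = "track"


def fingerprint(data: bytes, chunk_size: int = 4) -> set:
    """Hash fixed-size chunks of a file (a stand-in for a real media fingerprint)."""
    return {
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


def match_fraction(upload: bytes, reference: bytes, chunk_size: int = 4) -> float:
    """Fraction of the reference file's chunks that also occur in the upload."""
    ref = fingerprint(reference, chunk_size)
    if not ref:
        return 0.0
    return len(ref & fingerprint(upload, chunk_size)) / len(ref)


def find_claims(upload: bytes, registry: dict, threshold: float = 0.5) -> list:
    """Return (owner, policy) for every registered file the upload matches."""
    return [
        (owner, policy)
        for owner, (reference, policy) in registry.items()
        if match_fraction(upload, reference) >= threshold
    ]


registry = {
    "label": (b"ABCDEFGHIJKL", Policy.MONETIZE),
    "studio": (b"ZZZZYYYY", Policy.BLOCK),
}
claims = find_claims(b"ABCDEFGHxxxx", registry)
```

In this toy example, two of the three chunks registered by `label` reappear in the upload, so that owner’s chosen policy (monetizing the video) would apply, while `studio` has no claim.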
Another application of artificial intelligence and machine learning on the platform is to prevent younger users from viewing content that is inappropriate for them, even if it is not against the platform’s guidelines. This process is called age-gating.
While uploaders can set the category of their content to “18+”, few choose to do so. To eliminate the danger of younger viewers seeing content that could be harmful to them in any way, YouTube uses AI to flag such clips. Access is then restricted for all users who are not yet of age, and for anyone who is not logged into the platform.
Age-gating is partly a consequence of guidelines issued by the European Union in the Audiovisual Media Services Directive (AVMSD). As a result, YouTube plans to introduce age verification, and EU users may have to submit additional proof of age before they are granted access to mature content.
In terms of computational processes, age-gating is based on signals such as the title, description, and metadata of a video, as well as visual analysis.
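As a rough sketch of how such signals could be combined, the example below mixes a keyword check over the title, description, and tags with a visual score, then applies the access rule described earlier (restricted for under-age users and for anyone not logged in). The keyword set, the equal weighting, and the threshold are invented for illustration and are not YouTube’s actual criteria.

```python
import re


def age_gate_score(title, description, tags, visual_score, keywords):
    """Combine a text signal with a visual-analysis score (both in the range 0 to 1).

    keywords is a hypothetical set of lowercase terms that hint at mature content.
    """
    text = " ".join([title, description, *tags]).lower()
    words = set(re.findall(r"[a-z']+", text))
    text_score = 1.0 if words & keywords else 0.0
    return 0.5 * text_score + 0.5 * visual_score  # hypothetical equal weighting


def is_restricted(score, viewer_age, threshold=0.5):
    """viewer_age is None for viewers who are not logged in."""
    if score < threshold:
        return False
    return viewer_age is None or viewer_age < 18


score = age_gate_score(
    title="Graphic fight compilation",
    description="",
    tags=["sports"],
    visual_score=0.4,
    keywords={"graphic", "gore"},
)
blocked_for_guest = is_restricted(score, viewer_age=None)
blocked_for_adult = is_restricted(score, viewer_age=30)
```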
In contrast to the flagging and removal processes described above, YouTube also uses AI to make videos more accessible to viewers and increase their chances of being found.
In particular, it does this by adding video captions, translations, and chapters.
Video captions are auto-generated in several languages by a natural language processing (NLP) algorithm. Recently, this feature also became available for live streams on the platform.
In line with YouTube’s community guidelines, the NLP algorithm will intentionally edit language, for example by blanking inappropriate words.
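Blanking words in a generated caption can be sketched with a simple pattern substitution. The blocklist and the placeholder string are assumptions here, and the real system presumably works on the recognizer’s output rather than on finished text.

```python
import re


def blank_words(caption, blocklist, placeholder="[ __ ]"):
    """Replace whole-word, case-insensitive matches of blocklisted words."""
    if not blocklist:
        return caption
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in blocklist) + r")\b",
        re.IGNORECASE,
    )
    # Use a function as the replacement so the placeholder is inserted literally.
    return pattern.sub(lambda match: placeholder, caption)


clean = blank_words("That was a darn good shot, Darn!", ["darn"])
```

The word boundaries (`\b`) keep the filter from blanking harmless words that merely contain a blocklisted term, such as "darning".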
In addition, auto-translation of auto-generated and user-uploaded captions has recently entered testing. Users have reported seeing translation options for titles, descriptions, and captions, both in their browsers and in YouTube’s mobile app. These options included English-Portuguese and English-Turkish.
Most recently, YouTube announced that its algorithm was now also capable of analyzing a video’s structure to the degree of being able to add automatic chapters. So far, users have had to manually list timestamps alongside chapter titles in their video descriptions. For them, this brought the advantage that these “key moments” would appear in Google searches and increase the likelihood of their video being found. Now, these chapters can be generated automatically.
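For comparison, the manual approach is easy to picture in code. The sketch below extracts chapters from a description in which each chapter sits on its own line as a timestamp followed by a title. That line format and the function name are assumptions for illustration, not a documented YouTube parser.

```python
import re

# Matches lines such as "0:00 Intro" or "1:02:30 Deep dive".
TIMESTAMP_LINE = re.compile(r"^\s*(?:(\d+):)?(\d{1,2}):(\d{2})\s+(.+?)\s*$")


def parse_chapters(description):
    """Return (start_in_seconds, title) pairs found in a video description."""
    chapters = []
    for line in description.splitlines():
        match = TIMESTAMP_LINE.match(line)
        if match:
            hours, minutes, seconds, title = match.groups()
            start = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
            chapters.append((start, title))
    return chapters


description = "My video\n0:00 Intro\n2:15 Setup\n1:02:30 Deep dive\nThanks!"
chapters = parse_chapters(description)
```

Automatic chapters remove this manual step: the start times and titles are inferred from the video itself rather than read from the description.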
In addition to YouTube harnessing AI and ML to provide a better user experience to viewers and content creators, the platform also serves as a huge repository of learning data for algorithms.
For example, Google researchers used videos of the mannequin challenge to train the depth perception of models. While user-generated content is “messier” than datasets curated specifically for machine learning, its sheer volume is invaluable. This is especially true because it can help algorithms recognize real-life situations.
Another example dates back to 2019, when Google’s AI researchers used YouTube data to train a neural network to swap backgrounds. The resulting algorithm was capable of exchanging video backgrounds without the need for specialized equipment.
In addition, YouTube is closely tied to Google’s Video AI. It serves both as a source of training data and as an application for features such as face detection, context-sensitive ads, and logo recognition.
Several additional applications of AI and ML on YouTube are currently being tested. By all accounts, their use is likely to expand at a swift pace in the coming months.
For example, in March 2021, YouTube was testing product detection in videos.
The algorithm in question can recognize products that appear in video content. It will then display them, alongside related products, just below the video player.
According to The Verge, the aim of this approach is to give YouTube direct access to the highly lucrative affiliate market. It would establish a direct connection between video uploads and ecommerce.
YouTube’s use of AI is multi-faceted and complex.
Without a doubt, its overall aim is to fulfill user expectations, cater to their tastes, and provide a positive experience. The intricate search algorithm, automatic generation of captions and chapters, and flagging of (age-)inappropriate content contribute to this.
However, other uses of AI on the platform explicitly serve to increase its own capabilities and profitability. Recommendations and “Up next” screens serve to keep viewers engaged, while features such as the product detection algorithm directly aim to increase YouTube’s revenue.