TD

@td1313

YouTube Murals

Visualizing topic change for this video

Painting topic change over time in YouTube videos with Natural Language Processing and Processing.js

Some days after a long week of school (and now work), I used to watch trash TV to give my brain a break. Now that I’m out of school, I’ve never been hungrier for knowledge. Besides reading Medium articles daily, I go on YouTube or TED and watch videos on topics that I’m interested in, past conferences, or just for the “lol’s”. Sometimes these videos can last up to two hours! I’ll admit that I sometimes skim parts of a video to make sure that I get to the root of the topic and that I didn’t fall for clickbait.

During my modern physics class this summer, my group and I did research on the Breakthrough Starshot project. I looked for videos that talked about the project, and thankfully they got the point across in 5 minutes. This video was next on autoplay, and with me being an Elon Musk super fan, not even a long day’s worth of classes could make me shelve it on my infinite display of forgotten Chrome tabs...

I eventually watched the video all the way through in one sitting, but I just wanted to get to the parts I needed. It would be a miracle for me to have a free hour to sit and watch this when I had group projects, my own homework, and an hour-long train ride home with no free wifi.

WHOA COOL! But it’s really long right?

Navigating this video in intervals was exhausting. I got to what I needed eventually, but what if I skipped other relevant parts of the video?

The second time I tried to watch this, I saw that YouTube includes time-encoded transcripts, which you can view in full to the right of the video. Because they’re indexed with timestamps, I can search for specific words or phrases and jump to the parts I need.

Time encoded transcripts (not machine generated)
Time encoded search results for the desired topic

Accessibility Implications

What started off as an idea for a creative coding sketch turned into a new art project with enormous implications for YouTube and human cognition. I had somewhat of an easy way out, since I could hear what was said at a given time. This isn’t the case for many others. YouTube demonstrated its commitment to accessibility with closed captions, yet this is just the beginning. I believe there are more approaches that can contribute to this.

“Everyone should be able to access and enjoy the web. We’re committed to making that a reality.” — Google Accessibility.

Making Topics Visible

A bag o’ words approach sounded straightforward and effective: group words associated with a certain topic at a given point in time. The size of each group can depend on a specified duration or on sentence length. Thankfully, YouTube’s API makes it possible to retrieve captions, and not just noisy auto-generated ones. These come in the form of Timed Text Markup Language (version 1. I promise you’ll thank me later). I’ll demonstrate this using a transcript from Elon Musk’s talk.

TTML snippet from Elon Musk Talk

When run through an XML parser, the transcript came out in the form below, which made processing much easier:

{'start': '75.936', 'dur': '1.302'} 
ELON MUSK: Thank you. Thank you very
{'start': '77.238', 'dur': '1.887'} 
much for having me. I look forward to
{'start': '79.125', 'dur': '3.547'} 
talking about the SpaceX Mars
{'start': '82.672', 'dur': '2.25'} 
architecture. And what I really want to
{'start': '84.922', 'dur': '3.019'} 
achieve here is to make Mars seem
{'start': '87.941', 'dur': '2.949'} 
possible, make it seem as though it's
{'start': '90.89', 'dur': '2.32'} 
something that we can do in our
{'start': '93.21', 'dur': '3.11'} 
lifetimes and that you can go. And is
{'start': '96.32', 'dur': '1.76'} 
there really a way that anyone can go if
{'start': '98.08', 'dur': '1.55'} 
they wanted to?
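Here’s a minimal sketch of that parsing step using Python’s built-in `xml.etree.ElementTree`. The element names (`transcript`, `text`) and the sample string are my assumptions, chosen to match the `start` and `dur` attributes in the output above. Standard TTML files use `<p>` elements with `begin` and `dur` attributes instead, so adjust the names to whatever your caption file actually contains.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the parsed output above: each <text>
# element carries 'start' and 'dur' attributes (in seconds) plus a caption.
SAMPLE_XML = """<transcript>
  <text start="75.936" dur="1.302">ELON MUSK: Thank you. Thank you very</text>
  <text start="77.238" dur="1.887">much for having me. I look forward to</text>
  <text start="79.125" dur="3.547">talking about the SpaceX Mars</text>
</transcript>"""


def parse_captions(xml_string):
    """Return a list of (start_seconds, duration_seconds, text) tuples."""
    root = ET.fromstring(xml_string)
    captions = []
    for node in root.iter("text"):
        start = float(node.attrib["start"])
        dur = float(node.attrib.get("dur", 0.0))  # treat a missing 'dur' as 0
        text = (node.text or "").strip()
        captions.append((start, dur, text))
    return captions


captions = parse_captions(SAMPLE_XML)
for start, dur, text in captions:
    print(start, dur, text)
```

Converting the timestamps to floats right away makes it easy to bucket captions by time later on.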

In Natural Language Processing, it’s important to filter out stop words (common words in a specific language) to reduce noise when clustering words by topic. There’s no complete list of stop words, but I used NLTK’s stop words corpus (encoded as Unicode). In English, these consist mostly of prepositions, articles, pronouns, and auxiliary verbs. I removed these words from the text clusters before modeling topics.

[u'i', u'me', u'my', u'myself', u'we', u'our', u'ours', u'ourselves', u'you', u'your', u'yours', u'yourself', u'yourselves', u'he', u'him', u'his', u'himself', u'she', u'her', u'hers', u'herself', u'it', u'its', u'itself', u'they', u'them', u'their', u'theirs', u'themselves', u'what', u'which', u'who', u'whom', u'this', u'that', u'these', u'those', u'am', u'is', u'are', u'was', u'were', u'be', u'been', u'being', u'have', u'has', u'had', u'having', u'do', u'does', u'did', u'doing', u'a', u'an', u'the', u'and', u'but', u'if', u'or', u'because', u'as', u'until', u'while', u'of', u'at', u'by', u'for', u'with', u'about', u'against', u'between', u'into', u'through', u'during', u'before', u'after', u'above', u'below', u'to', u'from', u'up', u'down', u'in', u'out', u'on', u'off', u'over', u'under', u'again', u'further', u'then', u'once', u'here', u'there', u'when', u'where', u'why', u'how', u'all', u'any', u'both', u'each', u'few', u'more', u'most', u'other', u'some', u'such', u'no', u'nor', u'not', u'only', u'own', u'same', u'so', u'than', u'too', u'very', u's', u't', u'can', u'will', u'just', u'don', u'should', u'now', u'd', u'll', u'm', u'o', u're', u've', u'y', u'ain', u'aren', u'couldn', u'didn', u'doesn', u'hadn', u'hasn', u'haven', u'isn', u'ma', u'mightn', u'mustn', u'needn', u'shan', u'shouldn', u'wasn', u'weren', u'won', u'wouldn', u"'s", u"n't", u"'m", u"'d"]
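A rough sketch of the filtering step might look like this. To keep it runnable without downloading the NLTK corpus, I’m assuming a small hand-copied subset of the list above. The `tokenize` and `remove_stop_words` helpers are hypothetical names for illustration, not part of NLTK.

```python
import re

# A small hand-copied subset of the list above, so the sketch runs without
# downloading the NLTK corpus. Swap in the full list for real use.
STOP_WORDS = {"i", "to", "the", "and", "is", "that", "very", "for", "me",
              "you", "what", "about", "a", "in", "we", "can", "it", "s"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop word set."""
    return [t for t in tokens if t not in stop_words]


line = "talking about the SpaceX Mars architecture. And what I really want to"
filtered = remove_stop_words(tokenize(line))
print(filtered)
```

Using a set for the stop words keeps each lookup cheap, which matters once you run this over an hour-long transcript.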

Latent Dirichlet Allocation

LDA is a common model in Natural Language Processing that discovers topics in a set of texts, whether those are sentences or whole documents. It assumes that you have word counts for each text. It scopes out a piece of text and finds a bunch of key words that it’ll use to learn what the document is about.

Think of harried high school students who’re preparing for a quiz in English class by reading (or skimming) their assignment for the next class, and making sure they understand what’s going on because their teacher’s on to their Spark Notes use. They’ll highlight words, sentences, paragraphs, or 99 percent of the text in a reading assignment. They’ll pick a few topics that they covered in class, and when they read the next assignment they’ll try to figure out why it fits with the topic. They’ll go through the text over and over to make sure the parts they took notes on fit under that topic.

Just like in LDA, a student takes a mental snapshot of his or her reading notes. For each chapter c and topic category t, he or she can see the portion of highlighted sections that falls under each category, which gives a full representation of c. If they happen to be reading something like a Harry Potter book and they’ve finished chapter c, they might see that the topic distribution is 10% friendship, 50% magic, 10% bravery, and 30% family.

After looking over the document repeatedly, the model returned the probabilities of words appearing in Musk’s transcript, assuming it’s looking for 5 topics:

LdaModel(num_terms=1016, num_topics=5, decay=0.5, chunksize=100)
[(0, u'0.041*"engine" + 0.026*"really" + 0.025*"make" + 0.022*"tank" + 0.016*"rocket" + 0.014*"vehicle" + 0.013*"also" + 0.013*"merlin" + 0.013*"capable" + 0.012*"because"'), 
(1, u'0.031*"mars" + 0.028*"use" + 0.025*"mission" + 0.024*"carbon" + 0.022*"fiber" + 0.022*"liquid" + 0.018*"thing" + 0.017*"falcon" + 0.016*"day" + 0.015*"very"'), 
(2, u'0.045*"system" + 0.035*"would" + 0.032*"propel" + 0.028*"go" + 0.026*"time" + 0.024*"solar" + 0.022*"mars" + 0.018*"orbit" + 0.018*"cost" + 0.016*"greater"'), 
(3, u'0.034*"first" + 0.027*"station" + 0.026*"applause" + 0.024*"dragon" + 0.020*"space" + 0.016*"think" + 0.016*"ton" + 0.015*"mars" + 0.014*"launch" + 0.013*"go"'), 
(4, u'0.028*"booster" + 0.028*"spaceship" + 0.025*"get" + 0.023*"land" + 0.022*"like" + 0.021*"really" + 0.021*"maybe" + 0.019*"go" + 0.018*"anywhere" + 0.018*"actual"')]
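To make the “go through the text over and over” idea concrete, here is a small self-contained sketch of LDA. Note the swap: the output above comes from gensim’s `LdaModel`, which uses online variational Bayes, while this sketch uses collapsed Gibbs sampling because it fits in a few dozen lines of plain Python. The function name `lda_gibbs`, the `alpha` and `beta` defaults, and the toy documents are all my assumptions for illustration.

```python
import random


def lda_gibbs(docs, num_topics, iterations=50, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling.

    docs: list of token lists. Returns (doc_topic, topic_word, vocab) where
    doc_topic[d][k] is the share of topic k in document d and
    topic_word[k][w] is the probability of vocab[w] under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]       # topic counts per document
    n_kw = [[0] * V for _ in range(K)]   # word counts per topic
    n_k = [0] * K                        # total words per topic
    z = []                               # topic assignment of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1

    # Each sweep re-reads the whole corpus and resamples each token's topic.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][wid] -= 1
                n_k[k] -= 1
                # Weight each topic by how well it fits this doc and word.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][wid] + beta)
                           / (n_k[t] + V * beta) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][wid] += 1
                n_k[k] += 1

    doc_topic = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha)
                  for k in range(K)] for d, doc in enumerate(docs)]
    topic_word = [[(n_kw[k][w] + beta) / (n_k[k] + V * beta)
                   for w in range(V)] for k in range(K)]
    return doc_topic, topic_word, vocab


# Hypothetical toy "time windows", not the real transcript.
docs = [["engine", "tank", "rocket", "engine"],
        ["mars", "mission", "carbon", "fiber"],
        ["engine", "rocket", "tank"],
        ["mars", "carbon", "mission"]]
doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics=2)
for row in doc_topic:
    print([round(p, 2) for p in row])
```

Each row of `doc_topic` is the topic mixture for one chunk of the transcript, which is the same kind of “10% friendship, 50% magic” breakdown as in the Harry Potter example.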

Now that we have the estimated topic mixtures from the Musk talk after 50 iterations, here is the mural:

Youtube Mural for “Making Humans a Multiplanetary Species”

Mural Samples 🎨

The above mural was painted assuming there were 5 possible topics in the video and that the words are grouped by 60 second intervals. These can be configured to show more or fewer topics, use different time intervals, etc. Below are more murals with different settings based on time intervals, sentence intervals, number of topics, and LDA iterations.
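The time-interval grouping can be sketched roughly like this, assuming captions have already been parsed into `(start_seconds, duration_seconds, text)` tuples. The function name `group_by_interval` and the demo captions are hypothetical.

```python
def group_by_interval(captions, interval_seconds=60.0):
    """Bucket caption text by fixed time windows.

    captions: list of (start_seconds, duration_seconds, text) tuples.
    Returns a list of strings, one per window, in time order. Windows with
    no captions come back as empty strings so positions still map to time.
    """
    if not captions:
        return []
    last_start = max(start for start, _, _ in captions)
    n_windows = int(last_start // interval_seconds) + 1
    windows = [[] for _ in range(n_windows)]
    for start, _, text in captions:
        windows[int(start // interval_seconds)].append(text)
    return [" ".join(parts) for parts in windows]


# Hypothetical captions, not taken from the real transcript.
demo = [(5.0, 2.0, "hello mars"), (58.0, 3.0, "rocket engine"),
        (61.5, 2.0, "carbon fiber tank"), (185.0, 1.0, "applause")]
grouped = group_by_interval(demo, interval_seconds=60.0)
print(grouped)
```

Each string in the result can then be treated as one “document” for the topic model. Changing `interval_seconds` to 5 or 100 gives the other time-based settings listed below.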

10 topics grouped by 60 second intervals after 10 LDA iterations
10 topics grouped by 5 second intervals after 10 LDA iterations
5 topics grouped by 60 second intervals after 10 LDA iterations
5 topics grouped by 60 words per sentence after 10 LDA iterations
10 topics grouped by 60 words per sentence after 10 LDA iterations
10 topics grouped by 100 words per sentence after 10 LDA iterations
10 topics grouped by 100 second intervals after 10 LDA iterations

Each row in the murals represents a different topic, with the color varying based on the distribution of each word. The length of the mural is always constant, with each mark mapped according to duration (words will share topics, especially common words that didn’t appear in the stop-words corpus). Brighter areas in a topic row show where more words are associated with that topic.
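As a rough illustration of that layout, the sketch below turns topic shares into rectangles: one row per topic, widths proportional to duration, and brightness scaled from the topic share. The linear 0 to 255 brightness mapping, the `mural_cells` name, and the pixel sizes are my assumptions here, not the exact rules used for the murals in this post. The drawing itself is left to whatever renderer you like.

```python
def mural_cells(doc_topic, durations, width=600, row_height=20):
    """Turn per-window topic shares into rectangles for a mural.

    doc_topic: list of per-window topic shares (each inner list sums to ~1).
    durations: length in seconds of each window; widths are proportional.
    Returns a list of (x, y, w, h, brightness) with brightness in 0..255.
    """
    total = float(sum(durations))
    cells = []
    x = 0.0
    for shares, dur in zip(doc_topic, durations):
        w = width * dur / total  # constant total width, split by duration
        for topic, share in enumerate(shares):
            brightness = int(round(255 * share))
            cells.append((x, topic * row_height, w, row_height, brightness))
        x += w
    return cells


# Hypothetical shares for three 60-second windows and two topics.
cells = mural_cells([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]], [60, 60, 60])
for cell in cells:
    print(cell)
```

Because the widths are normalized by the total duration, the mural keeps the same length no matter how long the video is.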

5 topics grouped by 60 second intervals after 20 LDA iterations
5 topics grouped by 60 second intervals after 50 LDA iterations
5 topics grouped by 60 second intervals after 200 LDA iterations

Murals painted after more iterations typically have brighter spots. As I mentioned earlier with the high school students example, they tend to go back and double check their text to make sure they’re ready for the quiz the next day. The same thing applies to LDA: the model needs to double check that words are grouped correctly to achieve maximum accuracy.

A YouTube mural used to navigate Musk’s talk.

Conclusions

Natural Language Processing and colorful multimedia interfaces add new methods of accessible interaction. Google’s auto-generated captions certainly made the site more accessible for viewers; the author of this Google blog post shared the impact these had, since he himself is deaf. Sure, it’s a neat way to keep me awake while I navigate lengthy videos after an already long day, but YouTube Murals can also serve as a solution for viewers with disabilities that aren’t accommodated yet.

Color is a powerful stimulus for the brain: that’s what it notices and remembers first. It engages other areas of the brain and allows for improved learning and memory performance. Try watching Musk’s talk now and see if you notice any differences in how you pay attention and remember it. If you did notice improvements, imagine what this would do for class lectures, videos for kids, or for older users with age-related accessibility needs like Alzheimer’s.

There are so many people for whom interacting in the physical world is really tough, yet interacting with an accessible Web is easy, and it will get easier thanks to artificial intelligence.

For further reading:

  1. LDAExplore: Visualizing Topic Models Generated Using Latent Dirichlet Allocation: https://arxiv.org/pdf/1507.06593.pdf
  2. Intro to Latent Dirichlet Allocation: http://blog.echen.me/2011/08/22/introduction-to-latent-dirichlet-allocation/
  3. The Influence of Colour on Memory Performance: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3743993/
