Trend Analysis of TED Talks with Python Codes

by Muammer Hüseyinoğlu (@moimaere), January 18th, 2019


TED is a non-profit organization founded in 1984 by Richard Saul Wurman.

TED began as a conference where Technology, Entertainment and Design converged, and today it covers almost all topics in more than 100 languages. TED’s mission is to “spread ideas” in the form of short, powerful talks.

I have learned a lot from TED Talks about fields I didn’t know the first thing about. With this story, I want to dig into the trends of TED, step by step, with Python code.






# Imports
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime

%matplotlib inline

I will use the TED Talks dataset from Kaggle. Note that the data covers talks published between 2006 and 2017, so the analysis stops roughly a year before the time of writing. You can reach the data from the link below, which also gives more details about its features.


TED Talks: data about TED Talks on the TED.com website until September 21st, 2017 (www.kaggle.com)



# Getting the data
df = pd.read_csv("../input/ted_main.csv")
df.columns






# Setting date format
month_order = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
day_order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

# film_date and published_date are Unix timestamps; convert them to 'dd-mm-YYYY' strings
df['film_date'] = df['film_date'].apply(lambda x: datetime.datetime.fromtimestamp(int(x)).strftime('%d-%m-%Y'))
df['published_date'] = df['published_date'].apply(lambda x: datetime.datetime.fromtimestamp(int(x)).strftime('%d-%m-%Y'))

# The year is the third field of the 'dd-mm-YYYY' string
df["published_year"] = df["published_date"].apply(lambda x: x.split("-")[2])
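The conversion above turns Unix timestamps into date strings using the machine’s local time zone. As a minimal sketch, assuming we would rather get the same result on every machine, the same step can be done in UTC. The helper names `unix_to_date_string` and `unix_to_year` and the example timestamp are illustrative and not part of the original code or dataset.

```python
from datetime import datetime, timezone


def unix_to_date_string(timestamp):
    """Convert a Unix timestamp (seconds) to a 'dd-mm-YYYY' string in UTC."""
    moment = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
    return moment.strftime("%d-%m-%Y")


def unix_to_year(timestamp):
    """Return the calendar year (UTC) of a Unix timestamp as an integer."""
    return datetime.fromtimestamp(int(timestamp), tz=timezone.utc).year


# Illustrative timestamp, not taken from the dataset
example = 1500000000
print(unix_to_date_string(example))  # 14-07-2017
print(unix_to_year(example))         # 2017
```

Returning the year as an integer instead of slicing it out of a string also makes later numeric comparisons between years simpler.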

The 25 Most Viewed Talks of All Time


df = df.sort_values('views', ascending=False)
df[["main_speaker", "title", "views", "published_date"]].head(25)

Ken Robinson’s talk, “Do Schools Kill Creativity?”, is the most popular TED Talk of all time with 47.2 million views. It was published in 2006, which means it is also one of the oldest talks.

The second most viewed talk, “Your body language may shape who you are”, belongs to Amy Cuddy with 43.1 million views, and it was published in 2012! Given its publication date, Amy’s talk has performed very well compared to Ken’s. Further down the list, views decrease dramatically.
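To make the comparison between the two talks concrete, here is a rough sketch that divides total views by the number of years each talk has been online. It uses only the figures quoted above and assumes a snapshot year of 2017, the last year in the data. The function name `views_per_year` and the year-level granularity are my simplifications, not part of the original analysis.

```python
def views_per_year(views, published_year, snapshot_year=2017):
    """Approximate yearly views: total views divided by years online (at least 1)."""
    years_online = max(snapshot_year - published_year, 1)
    return views / years_online


# (total views, published year), as quoted in the text above
talks = {
    "Ken Robinson": (47.2e6, 2006),
    "Amy Cuddy": (43.1e6, 2012),
}

for speaker, (views, year) in talks.items():
    rate = views_per_year(views, year) / 1e6
    print(f"{speaker}: {rate:.2f} million views per year")
```

Under these assumptions, Amy Cuddy’s talk collects views at roughly twice the yearly rate of Ken Robinson’s.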

Years With the Most Content Produced at TED



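plt.figure(figsize=(15, 5))
# Pass the column as x and order the bars chronologically
sns.countplot(x=df["published_year"], order=sorted(df["published_year"].unique()))
plt.show()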

It seems the TED team did well in 2012, with steady growth up to that year. After that, the number of published talks decreases slightly.

What about views by year?

# Sum only the views column, once, instead of summing every column twice
views_by_year = df.groupby("published_year")["views"].sum()
sns.barplot(x=views_by_year.index, y=views_by_year.values)

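By views, TED peaked in 2013 and crashed hard in 2017. You could even say TED returned to where it started. I think the rise in 2012 and 2013 was driven by Amy Cuddy’s talk, with its 43.1 million views, and that this was TED’s most popular period. Keep in mind, though, that the dataset ends on September 21st, 2017, so that year is incomplete and recent talks have had less time to collect views.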
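Total views per year depend on how many talks were published that year as well as on how popular each talk was. As a sketch of how the two effects could be separated, the snippet below reports the talk count, the total views and the mean views per talk for each year. The function name `yearly_view_summary` and the toy numbers are made up for illustration; only the column names `published_year` and `views` come from the code above.

```python
import pandas as pd


def yearly_view_summary(df):
    """Return talk count, total views and mean views per published year."""
    return df.groupby("published_year")["views"].agg(["count", "sum", "mean"])


# Toy data for illustration only, not taken from the TED dataset
toy = pd.DataFrame({
    "published_year": ["2012", "2012", "2013", "2013", "2013", "2017"],
    "views": [100, 300, 200, 200, 500, 50],
})

summary = yearly_view_summary(toy)
print(summary)
```

Running the same function on the real `df` would show whether a drop in total views comes from fewer talks or from fewer views per talk.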

Top 10 Occupations Most Likely to Talk at TED

df["speaker_occupation"].value_counts().head(10)

As the output suggests, writers are ahead by a big gap. TED attaches importance to the intellectual knowledge of its speakers. If you have an achievement in an artistic field, there is no obstacle to speaking either.

Average Views for the Top 5 Occupations





# Average views per talk for each of the top 5 occupations
for occupation in ["Writer", "Designer", "Artist", "Journalist", "Entrepreneur"]:
    talks = df[df["speaker_occupation"] == occupation]
    print(occupation + ": ", int(talks["views"].sum() / len(talks)))

Looking at viewer demand, it seems people like talks on entrepreneurship: with an average of 1.9 million views, entrepreneurs come second, although writers’ talks also perform very well.
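The per-occupation averages above can also be computed in one pass with a group-by, which returns them already sorted. This is a minimal sketch on toy data: the function name `mean_views_by_occupation` and the numbers are illustrative, while the column names `speaker_occupation` and `views` are the ones used above.

```python
import pandas as pd


def mean_views_by_occupation(df, occupations):
    """Mean views per talk for the given occupations, highest first."""
    subset = df[df["speaker_occupation"].isin(occupations)]
    return (
        subset.groupby("speaker_occupation")["views"]
        .mean()
        .sort_values(ascending=False)
    )


# Toy data for illustration only, not taken from the TED dataset
toy = pd.DataFrame({
    "speaker_occupation": ["Writer", "Writer", "Designer", "Entrepreneur", "Chef"],
    "views": [300, 100, 150, 400, 999],
})

averages = mean_views_by_occupation(toy, ["Writer", "Designer", "Entrepreneur"])
print(averages)
```

Passing the index of `df["speaker_occupation"].value_counts().head(5)` as `occupations` would avoid typing the occupation names by hand.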

The 10 Most Used Tags of All Time






tags = []
for tag_string in df["tags"]:
    # Each entry looks like "['tag one', 'tag two']": drop the outer brackets and quotes, then split on commas
    for value in tag_string[2:-2].split(','):
        # Remove the remaining quotes and the space left after each comma
        tags.append(value.replace("'", "").strip())



tags = pd.DataFrame(tags, columns=["tags"])
tags["tags"].value_counts().head(10)

# Name both columns explicitly, since the defaults produced by reset_index() differ between pandas versions
tags = tags["tags"].value_counts().rename_axis("tag").reset_index(name="talks")

plt.figure(figsize=(15, 5))
sns.barplot(x=tags["tag"].head(10), y=tags["talks"].head(10))
plt.xlabel("tags")
plt.ylabel("talks")
plt.show()

Technology is the most used tag of all time, as you might expect. Science follows it.

An idea struck me when I saw that graph: analyzing how tag trends change year by year over the last 3 years.
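The slicing and quote-stripping used for the tag counts suggest that each entry in the tags column is the text of a Python list. Assuming that holds, a minimal sketch of an alternative is to parse each entry with `ast.literal_eval` and count the tags with `collections.Counter`. The function name `count_tags` and the sample strings are illustrative and not taken from the dataset.

```python
import ast
from collections import Counter


def count_tags(tag_strings):
    """Count tags across entries that each hold the text of a Python list."""
    counter = Counter()
    for raw in tag_strings:
        # literal_eval turns "['a', 'b']" into the list ['a', 'b']
        counter.update(tag.strip() for tag in ast.literal_eval(raw))
    return counter


# Illustrative entries in the assumed format
sample = [
    "['technology', 'science']",
    "['technology', 'design']",
]

print(count_tags(sample).most_common(2))  # [('technology', 2), ('science', 1)]
```

Parsing the list this way keeps tags that contain commas or apostrophes intact, which plain string splitting cannot guarantee.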








df2017 = df[df["published_year"] == "2017"]
tags2017 = []
for tag_string in df2017["tags"]:
    # Drop the outer brackets and quotes, split on commas, then clean each tag
    for value in tag_string[2:-2].split(','):
        tags2017.append(value.replace("'", "").strip())
tags2017 = pd.DataFrame(tags2017, columns=["tags"])








df2016 = df[df["published_year"] == "2016"]
tags2016 = []
for tag_string in df2016["tags"]:
    # Drop the outer brackets and quotes, split on commas, then clean each tag
    for value in tag_string[2:-2].split(','):
        tags2016.append(value.replace("'", "").strip())
tags2016 = pd.DataFrame(tags2016, columns=["tags"])








df2015 = df[df["published_year"] == "2015"]
tags2015 = []
for tag_string in df2015["tags"]:
    # Drop the outer brackets and quotes, split on commas, then clean each tag
    for value in tag_string[2:-2].split(','):
        tags2015.append(value.replace("'", "").strip())
tags2015 = pd.DataFrame(tags2015, columns=["tags"])

print("most used tags in 2015: ", "\n","\n",tags2015["tags"].value_counts().head())

print("most used tags in 2016: ", "\n","\n",tags2016["tags"].value_counts().head())

print("most used tags in 2017: ", "\n","\n",tags2017["tags"].value_counts().head())
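The three per-year blocks above follow the same pattern, so they could be folded into a single pass that keeps one counter per year. The sketch below assumes, as before, that each tags entry is the text of a Python list. The function name `top_tags_by_year` and the toy rows are hypothetical and not taken from the dataset.

```python
import ast
from collections import Counter, defaultdict


def top_tags_by_year(rows, n=5):
    """rows: iterable of (published_year, raw_tag_string) pairs.

    Returns a dict mapping each year to its n most common (tag, count) pairs.
    """
    counts = defaultdict(Counter)
    for year, raw in rows:
        counts[year].update(tag.strip() for tag in ast.literal_eval(raw))
    return {year: counter.most_common(n) for year, counter in counts.items()}


# Toy rows for illustration only
rows = [
    ("2016", "['technology', 'society']"),
    ("2016", "['technology', 'science']"),
    ("2017", "['society', 'science']"),
    ("2017", "['society', 'technology']"),
]

for year, top in sorted(top_tags_by_year(rows, n=2).items()):
    print(year, top)
```

With the DataFrame used in this story, the rows could be built with `zip(df["published_year"], df["tags"])`, and any additional year is then covered without another copy of the loop.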

Actually, I think TED has the right idea, even though its views crashed in 2017. Over the last couple of years, I haven’t seen big innovations or technological improvements. The simplest example is mobile phones. Every year the brands present another next-level device, but when you look at the presentations, you notice that they only draw attention to design or performance improvements. There is no real innovation yet, unless you count thinner bezels and an increasing number of cameras on the back of your phone. They try to sell fancy terms like “Artificial Intelligence”, although it isn’t a new concept in technology. I think it too will take a back seat in the capitalist marketing environment over the next couple of years.

It is good that TED is aware of the situation and is focusing on sociological topics.