Last year we published Empath, a tool for text analysis, which was fortunate enough to win a Best Paper award at CHI. Empath allows researchers to analyze text over a much larger set of categories than are available in existing lexicons (for example, “violence”, “depression”, or “politics”), and it can generate new lexicons on demand using a model based on neural embeddings and crowdsourcing.

We’ve since released Empath as an open source Python library, and we’d love to have more researchers apply it to their work. Given recent discussions in the media about the unprecedented language that President Trump used in his inauguration speech, this seemed a good opportunity to demonstrate what Empath can do.

So, how should we analyze an inauguration speech? There are many possible questions we might ask, but I’m going to focus on how President Trump’s inauguration speech differs from President Obama’s, as he began his first term in 2009. In general, adopting a point of comparison makes lexical analyses easier to interpret. For example, consider the claims: “Trump’s speech is angry” and “Trump’s speech is more angry than Obama’s speech”. The threshold for an angry speech is unclear (this is somewhat philosophical: at what point do we consider a speech angry?), but it is simple to determine whether one speech is more angry than another. In this case, our comparison will ask: how do the signals Empath identifies in Trump’s speech compare with the same signals in Obama’s?

To start, I downloaded the transcripts of both inauguration speeches. You can find President Trump’s here, and President Obama’s here. I then wrote a short Python script using the Empath library.

In that script, Empath walks over the words in each speech and counts the number of words that fall into its lexical categories. For example, the word “bleed” would increment the categories for hurt and violence, and the word “hope” would increment the categories for optimism and positive emotion.
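To make the counting step concrete, here is a minimal sketch of the idea in plain Python. It is not the script I ran and it does not use Empath itself: the tiny `TOY_LEXICON`, the `tokenize` helper, and the `category_counts` and `compare` functions are all hypothetical names invented for illustration, and Empath's real categories contain far more words. The sketch counts how many words of a text fall into each category, optionally divides by the total number of words so that speeches of different lengths can be compared, and then takes the per-category difference between two texts.

```python
import re

# Hypothetical toy lexicon, for illustration only. Empath's real
# categories are much larger and come from its own model.
TOY_LEXICON = {
    "hurt": {"bleed", "wound", "pain"},
    "violence": {"bleed", "fight", "carnage"},
    "optimism": {"hope", "promise", "dream"},
    "positive_emotion": {"hope", "happiness", "love"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_counts(text, lexicon=TOY_LEXICON, normalize=False):
    """Count the words of `text` that fall into each lexical category.

    A word may belong to several categories and increments each of them.
    With normalize=True, each count is divided by the total word count.
    """
    tokens = tokenize(text)
    counts = {category: 0 for category in lexicon}
    for token in tokens:
        for category, words in lexicon.items():
            if token in words:
                counts[category] += 1
    if normalize and tokens:
        return {category: n / len(tokens) for category, n in counts.items()}
    return counts


def compare(text_a, text_b, lexicon=TOY_LEXICON):
    """Per-category difference in normalized counts (text_a minus text_b)."""
    a = category_counts(text_a, lexicon, normalize=True)
    b = category_counts(text_b, lexicon, normalize=True)
    return {category: a[category] - b[category] for category in lexicon}


if __name__ == "__main__":
    speech_a = "This carnage stops right here. We will fight."
    speech_b = "We have chosen hope over fear."
    # Positive values mean the category is relatively stronger in speech_a.
    for category, diff in compare(speech_a, speech_b).items():
        print(category, round(diff, 3))
```

With the actual library, the same step is, to the best of my knowledge, a single call to `analyze(text, normalize=True)` on an `Empath()` instance, which returns a dictionary mapping category names to scores. Treat that as an assumption and check it against the library's README.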
I then imported the resulting category counts into Google Docs and, after a bit of data wrangling, came up with the following chart:

Here the x-axis depicts a normalized word count for each category (the number of words that fall into each category, divided by the total number of words in the speech).

So, what do we make of this? My immediate reaction is that, in many ways, these speeches are similar. For example, both Trump and Obama use language that strongly signals government, positive emotion, power, strength, and politics. To a lesser extent, both speeches also convey other signals you would expect to see, such as military, economics, work, or terrorism. There is a certain amount of tradition that underlies an inauguration speech, no matter who is giving it.

But the differences between the speeches are also compelling. While both Presidents adopt language of achievement (e.g., “win” or “accomplish”), Trump uses these words much more often than Obama. Similarly, Obama’s speech contained relatively little language of dispute (e.g., “disagree”, “insist”, or “fight”) or aggression (e.g., “dangerous”, “angry”), whereas Trump’s speech is much stronger in these signals. On the other hand, Obama’s speech contained an enormous amount of optimism, in comparison with Trump’s, despite similar overall signals for positive emotion.

You’ll find more nuanced considerations of individual passages elsewhere, but here is an excerpt from President Trump’s speech, which I’d consider representative of the overall tone:

Mothers and children trapped in poverty in our inner cities; rusted-out factories scattered like tombstones across the landscape of our nation; an education system flush with cash, but which leaves our young and beautiful students deprived of knowledge; and the crime and gangs and drugs that have stolen too many lives and robbed our country of so much unrealized potential.
This American carnage stops right here and stops right now.

And similarly, an excerpt from President Obama’s:

On this day, we gather because we have chosen hope over fear, unity of purpose over conflict and discord. On this day, we come to proclaim an end to the petty grievances and false promises, the recriminations and worn-out dogmas that for far too long have strangled our politics. We remain a young nation. But in the words of Scripture, the time has come to set aside childish things. The time has come to reaffirm our enduring spirit; to choose our better history; to carry forward that precious gift, that noble idea passed on from generation to generation: the God-given promise that all are equal, all are free, and all deserve a chance to pursue their full measure of happiness.

Now, maybe you’ve already read these speeches; maybe you have your own interpretation. But the key benefit of Empath is that you can discover high-level, lexical signals without actually looking closely at the speeches. Here, that may seem lazy: when you are simply concerned with interpreting two speeches, it’s easy enough to read them both, at length. But as the text you are interested in grows larger (millions of comments on Reddit, for example, or every New York Times article ever published), reading and interpreting all the text yourself becomes impossible to do. That’s when a tool like Empath can step in to aid your analysis.