Analyzing Media Polarization: How Broadcast News Language Shapes Online Discourse

Authors: (1) Xiaohan Ding, Department of Computer Science, Virginia Tech, (e-mail: xiaohan@vt.edu); (2) Mike Horning, Department of Communication, Virginia Tech, (e-mail: mhorning@vt.edu); (3) Eugenia H. Rho, Department of Computer Science, Virginia Tech, (e-mail: eugenia@vt.edu).

Table of Links
Abstract and Introduction
Related Work
Study 1: Evolution of Semantic Polarity in Broadcast Media Language (2010-2020)
Study 2: Words that Characterize Semantic Polarity between Fox News & CNN in 2020
Study 3: How Semantic Polarization in Broadcast Media Language Forecasts Semantic Polarity in Social Media Discourse
Discussion and Ethics Statement
Appendix and References

Abstract
With the growth of online news over the past decade, empirical studies on political discourse and news consumption have focused on the phenomena of filter bubbles and echo chambers. Yet scholars have recently found limited evidence of the impact of these phenomena. This has led some to argue that online news consumption alone cannot fully explain partisan segregation across news audiences, and that traditional legacy media may be equally salient in polarizing public discourse around current events. In this work, we expand the scope of analysis to include both online and traditional media by investigating the relationship between broadcast news language and social media discourse. We analyze a decade's worth of closed captions (2.1 million speaker turns) from CNN and Fox News, along with topically corresponding discourse from Twitter. From this data, we provide a novel framework for measuring semantic polarization between America's two major broadcast networks. We use it to demonstrate how semantic polarization between these outlets has evolved (Study 1), peaked (Study 2), and influenced partisan discussions on Twitter (Study 3) across the last decade.
Our results demonstrate a sharp increase in polarization in how the two channels discuss topically important keywords, especially after 2016, with the highest overall peaks occurring in 2020. In 2020, the two stations discuss identical topics in such drastically distinct contexts that there is barely any linguistic overlap in how identical keywords are contextually discussed. Further, we demonstrate at scale how this partisan division in broadcast media language significantly shapes semantic polarity trends on Twitter (and vice versa). This empirically links, for the first time, online discussions to the influence of televised media. We show how the language characterizing opposing media narratives about similar news events on TV can increase levels of partisan discourse online. Our work therefore has implications for understanding how media polarization on TV plays a significant role in impeding, rather than supporting, online democratic discourse.

Introduction
Mass media plays a vital role in democratic processes by influencing how institutions operate, how political leaders communicate, and, most importantly, how citizens engage in politics (McLeod, Scheufele, and Moy 1999). It is no surprise that the two sides of America's political divide speak different languages (Westfall et al. 2015). However, research has also shown that partisan language in news media has sharply increased in recent years, particularly in broadcast news (Horning 2018). This is concerning because news consumption is critical for helping the public understand the events around them. According to Agenda Setting Theory, the language the media use to frame and present current events affects which issues the public perceives as important (McCombs 1997; Russell Neuman et al. 2014).
Some may have the impression that mainstream legacy media is losing relevance amid the explosive growth of online news via websites and social media. Yet American news consumption still comes overwhelmingly from television, which accounts for nearly five times as much consumption as online news across the public (Allen et al. 2020). Watching TV news is often considered more "passive" than reading the news, but research shows that people tend to recall televised news better than online news (Eveland, Seo, and Marton 2002). Further, a recent study comparing TV and internet news consumption found four times as many Americans who are partisan-segregated via TV as via online news. In fact, TV news audiences are several times more likely to maintain their partisan news diets over time, and they draw on much narrower sets of sources. By contrast, even partisan online news readers tend to consume from a variety of sources (Muise et al. 2022). Yet studies on media polarization and the ensuing public discourse are overwhelmingly based on online content (Garimella et al. 2021). For example, even research that analyzes data from traditional news outlets relies solely on tweets from the official Twitter accounts of newspapers, TV shows, and radio programs. It does not use direct transcriptions of content from these legacy media sources (Recuero, Soares, and Gruzd 2020). This is because, unlike online information, legacy media data (e.g., closed captions) are harder to collect and exist in formats ill-suited to quick pre-processing (e.g., SRT files). They are also scattered across institutions that lack incentives to share data with academics. Hence, much about how mainstream legacy media affects online discourse remains unknown.
In that sense, our analysis of a decade's worth of closed captions from 24-hour broadcast TV news programs on America's two largest news stations presents a unique opportunity. It lets us empirically demonstrate how linguistic polarization in broadcast media has evolved over time and how it has affected social media discourse. In this work, we examine three questions:

(1) how semantic differences in broadcast media language between CNN and Fox News have evolved over the last 11 years (Study 1);
(2) which words characterize the semantic polarity peaks in broadcast media language (Study 2); and
(3) whether semantic polarity in TV news language forecasts polarization trends in social media discourse, and how language drives the relational patterns from one to the other (Study 3).

In Study 1, we leverage natural language processing (NLP) techniques to develop a method that quantitatively captures how semantic polarization between CNN and Fox News evolved from 2010 to 2020. We do so by calculating the semantic polarity of how the two channels discuss socially important yet politically divisive topics: racism, Black Lives Matter, police, immigration, climate change, and health care. In Study 2, we use a model interpretation technique from deep learning to linguistically unpack what may be driving these spikes. We extract the contextual tokens that are most predictive of how each station discussed topical keywords in 2020. In Study 3, we investigate whether partisan trends in broadcast media language influence polarity patterns in social media discourse. We use Granger causality to test whether and how semantic polarization between the two TV news stations forecasts polarization across Twitter audiences replying to @CNN and @FoxNews.
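As a rough illustration of the Study 1 idea, the sketch below scores one keyword's yearly polarity as the cosine distance between the two stations' mean contextual embeddings of that keyword. This scoring rule is an assumption made for illustration and is not necessarily the authors' exact formula. In practice, the embeddings would come from a contextual language model (for example, a BERT-style encoder) applied at each keyword occurrence. The toy 3-dimensional vectors, the station labels, and the function names (`mean_vector`, `cosine_distance`, `yearly_polarity`) are all hypothetical.

```python
import math
from collections import defaultdict


def mean_vector(vectors):
    """Element-wise mean of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity: 0 = same direction, 1 = orthogonal, 2 = opposite.
    Assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def yearly_polarity(occurrences):
    """occurrences: iterable of (year, station, embedding) for ONE keyword.

    Hypothetical scoring rule: for each year, average each station's
    embeddings of the keyword, then take the cosine distance between
    the two station means.
    """
    by_year = defaultdict(lambda: defaultdict(list))
    for year, station, emb in occurrences:
        by_year[year][station].append(emb)
    polarity = {}
    for year, stations in sorted(by_year.items()):
        if len(stations) != 2:
            continue  # both stations must mention the keyword that year
        a, b = (mean_vector(vs) for vs in stations.values())
        polarity[year] = cosine_distance(a, b)
    return polarity


# Toy embeddings (hypothetical): similar contexts in 2010, disjoint in 2020.
toy = [
    (2010, "CNN", [1.0, 0.0, 0.0]),
    (2010, "FOX", [1.0, 0.1, 0.0]),
    (2015, "CNN", [0.5, 0.5, 0.0]),  # only one station this year: skipped
    (2020, "CNN", [1.0, 0.0, 0.0]),
    (2020, "CNN", [0.8, 0.0, 0.0]),
    (2020, "FOX", [0.0, 1.0, 0.0]),
]
print(yearly_polarity(toy))
```

A distance near 0 means the two stations use the keyword in nearly the same contexts, and larger values mean less contextual overlap. Averaging before comparing keeps the sketch simple, but it discards within-station variance that a fuller method might model.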
Finally, we examine the language that drives the Granger-causal relations between semantic polarity trends in televised news and those across Twitter users (and vice versa). To do so, we identify the tokens that are most predictive of how topical keywords are discussed on TV vs. Twitter, separated by the lag lengths at which Granger causality is significant. Our contributions are as follows:
• We provide a novel framework for quantifying semantic polarization between two entities that accounts for how semantic polarization evolves over time. Prior research often quantifies polarization as an aggregate measure over a single longitudinal data dump, which leaves out key temporal dynamics and the contexts in which polarity unfolds. Our framework incorporates temporal fluctuations by computing diachronic shifts using contextual word embeddings with temporal features.
• In showing how semantic polarization in broadcast media has evolved over the last 11 years, we go beyond merely quantifying polarization as a metric. We use Integrated Gradients to identify attributive tokens as a proxy for the contextual language that drives the 2020 ascent in semantic polarity between the two news stations.
• We address whether and how polarization in televised news language forecasts semantic polarity trends across Twitter. This provides new evidence of how TV news language shapes the discourse of online audiences, an important link that prior research has not empirically established at scale.
• Finally, we use model interpretation to extract lexical features from different entities. This shows which words drive significant Granger-causal patterns in how broadcast media language shapes Twitter discourse and vice versa, highlighting the key role language plays in driving semantic polarity relations between online discussions and broadcast media language.
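The Granger-causal test used in Study 3 asks whether past values of one series (for example, a TV polarity time series) improve forecasts of another (for example, a Twitter polarity time series) beyond that series' own past. The sketch below computes the standard F statistic for this question. It compares a restricted autoregression with an unrestricted one that adds lags of the candidate cause. The sketch makes several assumptions:

- the series are roughly stationary;
- the fits use NumPy least squares;
- the data are synthetic;
- the names `lagged`, `rss`, `granger_f`, `tv`, and `twitter` are hypothetical.

The paper's actual time granularity, lag range, and preprocessing are not reproduced here. For a ready-made version that also reports p-values, `statsmodels.tsa.stattools.grangercausalitytests` is a common choice.

```python
import numpy as np


def lagged(series, lag):
    """Columns k = 1..lag hold series shifted back k steps, aligned with series[lag:]."""
    n = len(series)
    return np.column_stack([series[lag - k : n - k] for k in range(1, lag + 1)])


def rss(design, target):
    """Residual sum of squares of an OLS fit of target on design."""
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ beta
    return float(resid @ resid)


def granger_f(y, x, lag):
    """F statistic for H0: lags of x add no forecasting power for y.

    Restricted model:   y_t ~ const + y_{t-1..t-lag}
    Unrestricted model: y_t ~ const + y_{t-1..t-lag} + x_{t-1..t-lag}
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    target = y[lag:]
    const = np.ones((len(target), 1))
    restricted = np.hstack([const, lagged(y, lag)])
    unrestricted = np.hstack([restricted, lagged(x, lag)])
    rss_r = rss(restricted, target)
    rss_u = rss(unrestricted, target)
    df_num = lag                                  # number of added x lags
    df_den = len(target) - unrestricted.shape[1]  # residual degrees of freedom
    return ((rss_r - rss_u) / df_num) / (rss_u / df_den)


# Synthetic example (hypothetical data): "tv" leads "twitter" by two steps.
rng = np.random.default_rng(0)
tv = rng.normal(size=300)
twitter = np.zeros(300)
for t in range(2, 300):
    twitter[t] = 0.8 * tv[t - 2] + 0.1 * rng.normal()

for lag in (1, 2, 3):
    # F(tv -> twitter) should be large only once the lag reaches 2.
    print(lag, round(granger_f(twitter, tv, lag), 1), round(granger_f(tv, twitter, lag), 1))
```

The test runs separately for each lag and in both directions. This mirrors the idea of separating results by the lag lengths at which Granger causality is significant. In this synthetic case, the TV-to-Twitter statistic is only large from lag 2 onward. Each F value can be converted to a p-value using an F(lag, df_den) distribution, for example with `scipy.stats.f.sf`.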
Our findings are among the first to quantify how language characterizing opposing media narratives about similar news events on TV can increase levels of partisan discourse online. Results from this work lend support to recent scholarship in communication research. This scholarship theorizes that media and public agendas can influence each other, and that such dynamics can polarize the manner in which the public engages in discourse, thereby influencing democratic decision-making at large.

This paper is available on arxiv under CC 4.0 license.