This paper is available on arXiv under a CC 4.0 license.
Authors:
(1) Jeremiah Milbauer, Carnegie Mellon University, Pittsburgh PA, USA (email: {jmilbaue | sherryw}@cs.cmu.edu);
(2) Ziqi Ding, Carnegie Mellon University, Pittsburgh PA, USA (email: {ziqiding | zhijinw}@andrew.cmu.edu);
(3) Tongshuang Wu, Carnegie Mellon University, Pittsburgh PA, USA.
We have presented a novel framework for sensemaking within a cluster of documents. We applied this framework to news articles, building NewsSense, an interactive tool that links claims within one document to supporting or contradicting evidence across the entire document cluster. NewsSense helps readers understand the connections and perspectives across many documents. Readers can thus attain a more comprehensive understanding of a given subject while avoiding the dangers of information overload. Crucially, NewsSense provides a framework for reference-free fact verification, which is essential in domains such as the news, where events evolve in real time and a knowledge source for factual grounding may not be available.
Our work expands the growing body of literature on natural language processing applications to document-level sensemaking by demonstrating the utility of automatically generated cross-document links, as well as the application of sensemaking tools to the news reading experience.
NewsSense falls within the genre of computer science literature that aims to solve problems such as misinformation. A broad critique of this literature is that it falls within the realm of techno-solutionism, in the sense that we seek to develop technological solutions to problems that are potentially social in origin, and perhaps better solved with a socially-oriented approach.
However, we posit that because the problems of misinformation propagation and news media overload are both enabled by technology, we have a responsibility to explore the ability of technological systems to address these challenges. Unlike approaches that involve traditional fact verification, the reference-free approach of NewsSense does not take on the role of deciding what is true and what is not; it simply helps users understand the context of each claim and make their own decisions.
Beyond this critique, we also understand that there are potential obstacles to the use of a system like NewsSense. The people who choose to use a system such as NewsSense may already be predisposed to consider and critically evaluate diverse perspectives in the news; NewsSense may not be adopted by those who need it most. We also consider that the highlighted links may clutter the reading experience, but we believe this concern is mitigated by the fact that news websites are already quite cluttered (by ads, sponsored links, and article thumbnails) and that users found the highlights helpful in identifying the key components of the articles.