This paper is available on arXiv under a CC 4.0 license.
Authors:
(1) Jeremiah Milbauer, Carnegie Mellon University, Pittsburgh PA, USA (email: {jmilbaue | sherryw}@cs.cmu.edu);
(2) Ziqi Ding, Carnegie Mellon University, Pittsburgh PA, USA (email: {ziqiding | zhijinw}@andrew.cmu.edu);
(3) Tongshuang Wu, Carnegie Mellon University, Pittsburgh PA, USA.
NEWSSENSE provides an intuitive and effective interface for integrating information from a large cluster of news articles into a single, focused reading experience. Although this demo applies it to news articles, the NEWSSENSE framework could just as easily be applied to other types of document clusters. The pipeline itself is highly modular and can readily adopt advances in NLP technology to improve accuracy or reduce processing time.
The generality of the NEWSSENSE framework also introduces a number of opportunities for future development.
Expanding the Scope of NewsReader Articles often contain references to past events. In the future, we would like to explore extending the NEWSSENSE framework beyond the immediate article cluster to include all relevant articles in a timeline of events.
Additionally, as we explored the NEWSSENSE framework, we noticed that the clustering approach we used (Google News Stories) sometimes established associations between source news articles and background primary-source articles. As a result, we would encourage further exploration of the NEWSSENSE framework applied to heterogeneous and primary-source document collections, which might include primary scholarly literature.
NLP Pipeline Improvements Because the NEWSSENSE pipeline is modular, a number of improvements can be explored. For sentence segmentation, methods that use a fine-tuned language model could improve segmentation speed. For sentence filtering, we relied on a pretrained sentence retrieval model; future iterations of NEWSSENSE could instead use a retrieval model fine-tuned on "unrelated" pairs of MNLI sentences. We also found that in many cases the NLI model used for claim linking made mistakes, perhaps because news text does not perfectly match the MNLI data such models are trained on. Other NLI approaches could be explored, such as SeNtLI (Schuster et al., 2022), which is designed to work with both individual premises and longer inputs, and LAIT (Milbauer et al., 2023), which speeds up inference through late interaction.
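As a concrete illustration of the claim-linking stage described above, the following is a minimal sketch of pairwise NLI scoring with an off-the-shelf MNLI checkpoint from the Hugging Face transformers library. The model name (roberta-large-mnli), the link_claims helper, and the confidence threshold are illustrative assumptions, not the exact NEWSSENSE implementation.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed off-the-shelf MNLI checkpoint; the actual NEWSSENSE model may differ.
MODEL_NAME = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def link_claims(sents_a, sents_b, threshold=0.9):
    """Score every sentence pair across two articles and keep the pairs the
    NLI model labels ENTAILMENT or CONTRADICTION with high confidence."""
    links = []
    for sent_a in sents_a:
        for sent_b in sents_b:
            inputs = tokenizer(sent_a, sent_b, return_tensors="pt", truncation=True)
            with torch.no_grad():
                probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
            idx = int(probs.argmax())
            label = model.config.id2label[idx]  # ENTAILMENT / NEUTRAL / CONTRADICTION
            if label != "NEUTRAL" and probs[idx] >= threshold:
                links.append((sent_a, sent_b, label, float(probs[idx])))
    return links
```

Because the pipeline is modular, trying an alternative such as SeNtLI or LAIT would largely amount to replacing this scoring step while leaving the surrounding stages unchanged.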
More Useful Information The final version of NEWSSENSE focused on a relatively pared-down and streamlined interface. However, users suggested that they would like to see article summaries, and we found that key information is often repeated across multiple articles. We would consider adding a way for NEWSSENSE to convey highlights, i.e., key claims from across the article cluster, while a user is reading an article. We also noticed unintended but useful functionality: for example, as stories develop, new facts emerge that may contradict old ones, so newer articles might supersede older ones. Future iterations of NEWSSENSE should help readers understand when a contradiction may be due to an evolving story.
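To make the evolving-story idea concrete, the sketch below shows one way to order contradicting claim links by publication date so the interface could flag contradictions that may reflect story evolution. The flag_evolving_contradictions helper and its input format are hypothetical; NEWSSENSE does not currently implement this step.

```python
def flag_evolving_contradictions(contradiction_links, pub_dates):
    """contradiction_links: (article_a, article_b, sent_a, sent_b) tuples the
    NLI step labeled CONTRADICTION; pub_dates: article id -> datetime.
    Both inputs are assumed to come from earlier pipeline stages."""
    flagged = []
    for art_a, art_b, sent_a, sent_b in contradiction_links:
        older, newer = sorted((art_a, art_b), key=lambda art: pub_dates[art])
        gap_days = (pub_dates[newer] - pub_dates[older]).days
        # A large publication gap suggests the newer claim may supersede the
        # older one (an evolving story) rather than a genuine disagreement.
        flagged.append({
            "older_article": older,
            "newer_article": newer,
            "gap_days": gap_days,
            "claims": (sent_a, sent_b),
        })
    return flagged
```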
Deployment A larger-scale user study would help determine what further improvements should be made to the framework, and our fully interactive interface makes running such a study feasible.