This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.
Authors:
(1) Cristina España-Bonet, DFKI GmbH, Saarland Informatics Campus.

Table of Links

Abstract and Intro
Corpora Compilation
Political Stance Classification
Summary and Conclusions
Limitations and Ethics Statement
Acknowledgments and References
A. Newspapers in OSCAR 22.01
B. Topics
C. Distribution of Topics per Newspaper
D. Subjects for the ChatGPT and Bard Article Generation
E. Stance Classification at Article Level
F. Training Details

Abstract

Neutrality is difficult to achieve and, in politics, subjective. Traditional media typically adopt an editorial line that their potential readers can use as an indicator of the media bias. Several platforms currently rate news outlets according to their political bias. The editorial line and the ratings help readers gather a balanced view of the news. But with the advent of instruction-following language models, tasks such as writing a newspaper article can be delegated to computers. Without imposing a biased persona, where would an AI-based news outlet lie within the bias ratings? In this work, we use the ratings of authentic news outlets to create a multilingual corpus of news with coarse stance annotations (Left and Right) along with automatically extracted topic annotations. We show that classifiers trained on this data are able to identify the editorial line of most unseen newspapers in English, German, Spanish and Catalan. We then apply the classifiers to 101 newspaper-like articles written by ChatGPT and Bard in the 4 languages at different time periods. We observe that, similarly to traditional newspapers, ChatGPT's editorial line evolves with time and, since it is a data-driven system, the stance of the generated articles differs among languages.

1. Introduction

Instruction-following language models (ILMs) are omnipresent. Their use is not yet as widespread as that of search engines, but given the availability and high quality of systems and models such as Alpaca (Taori et al., 2023), Bard (Google, 2023), BLOOMZ and mT0 (Muennighoff et al., 2023), ChatGPT (OpenAI, 2023), Llama 2-chat (Touvron et al., 2023), or Koala (Geng et al., 2023), it is expected to become more common in the near future. These models face several problems, the most relevant being their lack of trustworthiness (van Dis et al., 2023; Huang et al., 2023; Wang et al., 2023a).
They are not ready to be used as a source of reliable information if their outputs are not fact-checked. A second major issue with systems based on language models (LMs) is that they might reproduce the biases present in the training data (Navigli et al., 2023). Biases range from cultural misrepresentation due to data imbalance to offensive behaviour reproduced from written texts.

LMs are finetuned into ILMs either in a supervised way using input-output pairs and an instruction (Wei et al., 2022; Wang et al., 2022, 2023b) or with reinforcement learning from human feedback (Ouyang et al., 2022; Nakano et al., 2021). In both cases, the finetuning should help remove bias. But neutrality is very difficult to achieve, also for the humans who generate the supervisory data. The finetuning phase might therefore overcorrect the original biases or introduce new ones. For methods that generate the supervision data with the LM itself, the original biases might be inherited.

We focus on a specific use of ILMs: the writing of newspaper articles. Journals and newspapers follow an editorial line which is in general known to the reader. In addition, sites such as AllSides [1], Media Bias Fact Check [2] (MB/FC), or Ad Fontes Media [3] provide ratings of the political bias of (mostly US) media sources and of their quality with respect to factual information. With these ratings, conscientious readers can make informed decisions about which media outlets to choose in order to get a balanced perspective. But what happens when journalists use systems such as ChatGPT or Bard to aid in their writing? As said above, humans also have biases; the danger lies in being unaware of them, as they might affect the user's or reader's perspective (Jakesch et al., 2023; Carroll et al., 2023). ChatGPT already warns its users about misinformation. However, its political bias, if any, is not known apart from the subjective perception that a user has.
We address the question above for articles generated by ChatGPT and Bard in four languages: English, German, Spanish and Catalan. We do this in an automatic and systematic way with almost no human intervention, so that the method can be easily extended to new languages and other ILMs with little effort. We do not aim to classify individual articles by their specific bias, but rather to classify the media source (an ILM in this case) as Left- or Right-oriented, much as the media bias sites do for newspapers and other media outlets.

1. https://www.allsides.com
2. https://mediabiasfactcheck.com
3. https://adfontesmedia.com
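The idea of rating a media source rather than individual articles can be illustrated with a minimal sketch. It assumes that some article-level classifier has already produced one coarse label ("Left" or "Right") per article, and it aggregates those labels by simple majority vote. The majority rule, the function name `source_stance`, and the "Undecided" label for ties are assumptions made for illustration only; this part of the paper does not specify how the aggregation is done.

```python
from collections import Counter

VALID_LABELS = {"Left", "Right"}


def source_stance(article_labels):
    """Aggregate per-article stance labels into one source-level label.

    Hypothetical helper: majority voting is an illustrative assumption,
    not the aggregation rule of the paper.
    Returns (label, share_of_left_articles).
    """
    if not article_labels:
        raise ValueError("need at least one article label")
    unknown = set(article_labels) - VALID_LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")

    counts = Counter(article_labels)
    left, right = counts["Left"], counts["Right"]
    if left == right:
        label = "Undecided"  # tie: no majority either way
    else:
        label = "Left" if left > right else "Right"
    return label, left / (left + right)


if __name__ == "__main__":
    # Made-up predictions for four articles from one source.
    predictions = ["Left", "Right", "Left", "Left"]
    label, left_share = source_stance(predictions)
    print(label, left_share)
```

Reporting the share of Left-labelled articles alongside the label makes it visible how close a source is to a tie, which a bare label would hide.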

Part of HackerNoon's growing list of open-source research papers, promoting free access to academic material.

Multilingual Coarse Political Stance Classification of Media: Abstract and Intro
