This paper is available on arXiv under the CC BY-NC-SA 4.0 DEED license.
Authors:
(1) Cristina España-Bonet, DFKI GmbH, Saarland Informatics Campus.
Neutrality is difficult to achieve and, in politics, subjective. Traditional media typically adopt an editorial line that potential readers can use as an indicator of the outlet's bias. Several platforms currently rate news outlets according to their political bias. The editorial line and the ratings help readers gather a balanced view of the news. But with the advent of instruction-following language models, tasks such as writing a newspaper article can be delegated to computers. Without imposing a biased persona, where would an AI-based news outlet lie within the bias ratings? In this work, we use the ratings of authentic news outlets to create a multilingual corpus of news with coarse stance annotations (Left and Right) along with automatically extracted topic annotations. We show that classifiers trained on this data are able to identify the editorial line of most unseen newspapers in English, German, Spanish and Catalan. We then apply the classifiers to 101 newspaper-like articles written by ChatGPT and Bard in the four languages at different time periods. We observe that, similarly to traditional newspapers, ChatGPT's editorial line evolves with time and, being a data-driven system, the stance of the generated articles differs among languages.
Instruction-following language models (ILMs) are omnipresent. Their use is not yet as widespread as that of search engines, but given the availability and high quality of systems and models such as Alpaca (Taori et al., 2023), Bard (Google, 2023), BLOOMZ and mT0 (Muennighoff et al., 2023), ChatGPT (OpenAI, 2023), Llama 2-chat (Touvron et al., 2023), or Koala (Geng et al., 2023), their use is expected to become more common in the near future. These models face several problems, the most relevant being the lack of trustworthiness (van Dis et al., 2023; Huang et al., 2023; Wang et al., 2023a): they are not ready to be used as a source of reliable information if their outputs are not fact-checked. A second major issue with systems based on language models (LMs) is that they might reproduce the biases present in their training data (Navigli et al., 2023). These biases range from cultural misrepresentation due to data imbalance to offensive behaviour reproduced from written texts.

LMs are finetuned into ILMs either in a supervised way, using input-output pairs together with an instruction (Wei et al., 2022; Wang et al., 2022, 2023b), or with reinforcement learning from human feedback (Ouyang et al., 2022; Nakano et al., 2021). In both cases, the finetuning should help remove bias. But neutrality is very difficult to achieve, also for the humans who generate the supervisory data. The finetuning phase might therefore overcorrect the original biases or introduce new ones. For methods that generate the supervision data with the LM itself, the original biases might be inherited.

We focus on a specific use of ILMs: the writing of newspaper articles. Journals and newspapers follow an editorial line which is in general known to the reader. Besides, sites such as AllSides [1], Media Bias Fact Check [2] (MB/FC), or Ad Fontes Media [3] provide ratings of the political bias of (mostly USA) media sources and of their quality with respect to factual information. With these ratings, conscientious readers can make informed decisions about which media outlets to choose in order to get a balanced perspective. But what happens when journalists use systems such as ChatGPT or Bard to aid in their writing? As said above, humans also have biases; the danger lies in being unaware of them, as they might affect the user's/reader's perspective (Jakesch et al., 2023; Carroll et al., 2023). ChatGPT already warns its users about misinformation. However, its political bias, if any, is not known apart from the subjective perception that a user has.
We address the question above for articles generated by ChatGPT and Bard in four languages: English, German, Spanish and Catalan. We do this in an automatic and systematic way with almost no human intervention, so that the method can be easily extended to new languages and other ILMs with little effort. We do not aim to classify individual articles with their specific bias, but to classify the media source (an ILM in this case) as Left- or Right-oriented, in a similar way as the media bias sites do for newspapers and other media outlets.
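To make the source-level framing concrete, the minimal sketch below trains a toy Left/Right article classifier on outlet-rated articles and aggregates its per-article predictions into a single label for a source by majority vote. The TF-IDF plus logistic-regression pipeline and the majority-vote aggregation are our own simplifying assumptions for illustration, not the exact classifiers or decision rule used in the paper.

```python
# Minimal sketch (assumptions: TF-IDF + logistic regression stand in for the
# actual classifiers; majority vote stands in for the source-level decision).
from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: articles from outlets with a known coarse stance rating.
train_texts = [
    "article text from a Left-rated outlet ...",
    "article text from a Right-rated outlet ...",
]
train_labels = ["Left", "Right"]

# Train an article-level stance classifier on outlets with known editorial lines.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(train_texts, train_labels)

def classify_source(articles):
    """Label a media source (e.g. an ILM) by majority vote over its articles."""
    predictions = clf.predict(articles)
    label, _ = Counter(predictions).most_common(1)[0]
    return label

# Usage: feed the newspaper-like articles generated by an ILM in one language.
ilm_articles = [
    "generated newspaper-like article ...",
    "another generated article ...",
]
print(classify_source(ilm_articles))
```

The aggregation step is what distinguishes rating a source from rating a single article: individual predictions may be noisy, but the majority label over many articles approximates the overall editorial line.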