This paper is available on arXiv under a CC 4.0 license.
Authors:
(1) Mykola Makhortykh, Institute of Communication and Media Studies, University of Bern;
(2) Aleksandra Urman, Social Computing Group, University of Zurich;
(3) Roberto Ulloa, GESIS – Leibniz-Institute for the Social Sciences.
Data collection
To investigate how COVID-19 is framed visually via search engines, we collected data from five search engines: Bing, DuckDuckGo, Google, Yandex, and Yahoo. We selected these search engines because they hold the largest shares of the international search market (Statcounter 2020) or dominate major local markets (e.g., Yandex in Russia).
To collect the data, we used a novel algorithmic auditing approach based on large-scale simulation of user browsing behavior via virtual agents (Ulloa, Makhortykh and Urman 2022). Virtual agents are software programs which can mimic user behavior (e.g., by entering URLs in the browser).
Unlike earlier studies (e.g., Unkel and Haim 2019; Hannak et al. 2013) that rely on crowdsourced user data or small-scale simulations of browser behavior, our approach allows scaling the analysis, making it more robust, and permits conducting it in a controlled environment. The latter feature allowed us to investigate how image search results related to COVID-19 are filtered and ranked under default (i.e., non-personalized) conditions.
The data were collected on February 26, 2020, two weeks before the World Health Organization declared COVID-19 a pandemic. We deployed 83 machines from the Amazon Web Services Frankfurt cluster, with each machine hosting two virtual agents (one on Firefox and another on Chrome). The agents imitated three sessions of user browsing behavior and were evenly distributed among the search engines, so each search engine was queried by 33 or 34 agents.
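The arithmetic behind this distribution can be sketched as follows (a minimal illustration; the engine names come from the study, but the assignment logic shown here is our own reconstruction of an even split):

```python
# Reconstruct the agent-to-engine assignment described above.
engines = ["Bing", "DuckDuckGo", "Google", "Yandex", "Yahoo"]
machines = 83
agents = machines * 2  # one Firefox agent and one Chrome agent per machine

# Even split, then hand out the remainder one agent at a time.
assignment = {engine: agents // len(engines) for engine in engines}
for i in range(agents % len(engines)):
    assignment[engines[i]] += 1

print(agents)      # 166
print(assignment)  # four engines get 33 agents, one gets 34
```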
All agents started each session simultaneously and used image search for the term “coronavirus” in English (session 1), Russian (session 2), and Mandarin Chinese (session 3; simplified characters were used). Then, the agents navigated through the result page(s) to retrieve links to the first 50 images from the HTML. Following each session, the browsers were cleaned to prevent previous searches from affecting subsequent queries. We removed data accessible to the search engines’ JavaScript (e.g., cookies) and data stored by the browser (e.g., browsing history).
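The extraction of image links from the result-page HTML can be sketched with Python's standard library (a simplified illustration only: the actual agents drive full browsers, and each engine's result markup differs, so the `<img src=...>` pattern here is an assumption):

```python
from html.parser import HTMLParser

class ImageLinkParser(HTMLParser):
    """Collects the src attribute of every <img> tag on a result page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.links.append(src)

def first_image_links(html, limit=50):
    """Return links to the first `limit` images found in the page HTML."""
    parser = ImageLinkParser()
    parser.feed(html)
    return parser.links[:limit]

# Hypothetical result-page fragment:
page = '<div><img src="virus1.jpg"><img src="virus2.jpg" alt="x"></div>'
print(first_image_links(page))  # ['virus1.jpg', 'virus2.jpg']
```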
Data analysis
To analyze the collected data, we used qualitative framing analysis based on the generic frame typology developed by Semetko and Valkenburg (2000).
The typology includes five types of frames:
1) conflict: frames emphasizing conflict between individuals, groups, or institutions;
2) human interest: frames bringing a human face or an emotional angle to the presentation of an issue/problem;
3) economic consequences: frames reporting an event or problem in terms of its economic consequences;
4) morality: frames putting the event or issue in the context of religious tenets or moral prescriptions;
5) responsibility: frames presenting an issue in such a way as to attribute responsibility for its cause/solution to either the government or to an individual/group.
Following Kee, Faridah and Normah (2010), we adopted the set of 20 questions used by Semetko and Valkenburg (2000) to measure the strength of the frames for each individual image we analyzed. Each question was answered with either “yes” or “no”, and the responses were used to measure the strength of the respective frame.
Frame strength was calculated by summing the “yes” responses to a frame’s questions, with each response weighted proportionally to the number of questions for that frame. For example, the human interest frame comprised five questions, so each “yes” answer added 0.2 to the strength of that frame for the respective image, whereas the economic frame comprised three questions, so each “yes” answer contributed 0.33.
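The calculation above reduces to the share of “yes” answers among a frame's questions, which can be expressed as:

```python
def frame_strength(answers):
    """answers: one boolean per frame question ("yes" = True).
    Each "yes" contributes 1/len(answers), so strength ranges from 0 to 1."""
    return sum(answers) / len(answers)

# Two "yes" answers to the five human interest questions:
print(frame_strength([True, True, False, False, False]))  # 0.4
# One "yes" answer to the three economic questions:
print(frame_strength([True, False, False]))  # 0.333...
```

Normalizing by the question count makes frame strengths comparable across frames with different numbers of questions.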
For the analysis, we examined the 50 most frequent images among the search results acquired by the agents querying the same search engine. These images were coded by the three authors of the chapter. The coding was then checked for consistency by one of the coders, and disagreements between the coders were discussed and consensus-coded.
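The selection of the most frequent images per engine can be sketched as a simple frequency count (the URLs here are placeholders, not data from the study):

```python
from collections import Counter

def top_images(image_urls, k=50):
    """image_urls: every image link collected by agents querying one engine.
    Returns the k most frequently occurring links."""
    return [url for url, _ in Counter(image_urls).most_common(k)]

# Hypothetical pooled results from several agents:
urls = ["a.jpg", "b.jpg", "a.jpg", "c.jpg", "a.jpg", "b.jpg"]
print(top_images(urls, k=2))  # ['a.jpg', 'b.jpg']
```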