Understanding the Privacy Risks of Popular Search Engine Advertising Systems: Abstract and Intro
by @browserology



Too Long; Didn't Read

A new study finds that privacy-focused search engines fail to protect users’ privacy when clicking ads.

This paper is available on arXiv under the CC0 1.0 DEED license.

Authors:

(1) Salim Chouaki, LIX, CNRS, Inria, Ecole Polytechnique, Institut Polytechnique de Paris;

(2) Oana Goga, LIX, CNRS, Inria, Ecole Polytechnique, Institut Polytechnique de Paris;

(3) Hamed Haddadi, Imperial College London, Brave Software;

(4) Peter Snyder, Brave Software.

ABSTRACT

We present the first extensive measurement of the privacy properties of the advertising systems used by privacy-focused search engines. We propose an automated methodology to study the impact of clicking on search ads on three popular private search engines that have advertising-based business models: StartPage, Qwant, and DuckDuckGo, and we compare them to two dominant data-harvesting ones: Google and Bing. We investigate the possibility of third parties tracking users when clicking on ads by analyzing first-party storage, redirection domain paths, and requests sent before, during, and after the clicks.


Our results show that privacy-focused search engines fail to protect users’ privacy when clicking ads. Users’ requests are sent through redirectors on 4% of ad clicks on Bing, 86% of ad clicks on Qwant, and 100% of ad clicks on Google, DuckDuckGo, and StartPage. Even worse, advertising systems collude with advertisers across all search engines by passing unique IDs to advertisers in most ad clicks. These IDs allow redirectors to aggregate users’ activity on ads’ destination websites in addition to the activity they record when users are redirected through them. Overall, we observe that both privacy-focused and traditional search engines engage in privacy-harming behaviors allowing cross-site tracking, even in privacy-enhanced browsers.


CCS CONCEPTS

• Security and privacy → Privacy protections; • Networks → Network measurement.

KEYWORDS

Search engines, advertising systems, cross-site tracking, privacy, measurement.


ACM Reference Format:


Salim Chouaki, Oana Goga, Hamed Haddadi, and Peter Snyder. 2023. Understanding the Privacy Risks of Popular Search Engine Advertising Systems. In Proceedings of the 2023 ACM Internet Measurement Conference (IMC ’23), October 24–26, 2023, Montreal, QC, Canada. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3618257.3624823

1 INTRODUCTION

Privacy-focused search engines such as DuckDuckGo, StartPage, and Qwant [3, 9, 10] promote a strategy of respecting users’ privacy and promise not to track users’ search and browsing behavior, all while delivering relevant search results. However, private search engines rely on advertising for revenue and use traditional advertising platforms to deliver ads: DuckDuckGo and Qwant use Microsoft’s advertising system, while StartPage uses Google’s advertising system. These search engines are often ambiguous about the privacy properties of the ads that appear on their search pages, and, to the best of our knowledge, those properties remain unexplored.


In this work, we aim to fill this gap by conducting the first study of the privacy properties of the advertising systems of three major privacy-focused search engines (DuckDuckGo, StartPage, and Qwant) and how they compare to two more popular search engines, Bing and Google. We investigate the privacy properties of these search engines at three stages: (i) when they present search ads to users, (ii) when a user clicks on an ad, and (iii) when the user lands on the advertiser’s page.


We implement an automated methodology to measure if and how users can be re-identified (and hence have their privacy compromised) when clicking on search ads on each search engine (see Section 3). We build an open-source implementation of this methodology in the form of a Puppeteer-based pipeline that simulates search queries and ad clicks. We apply this crawling methodology to the five search engines, providing a full dataset with visited websites, cookies created, locally stored values, and web requests to search engines’ servers and/or other third parties when clicking ads. We use filter rules from several major open-source lists to detect web requests to online trackers, and we propose a methodology to differentiate user identifiers from non-tracking values in query parameters and cookie values.
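To make the idea of separating user identifiers from non-tracking values concrete, here is a minimal Python sketch. It is not the paper's methodology (that is described in Section 3, which is not part of this excerpt); it assumes a simple cross-profile heuristic in which a value is a candidate identifier only if it differs across independent browser profiles and is long enough to be unique. The function name, the `min_length` threshold, and the sample keys are all hypothetical.

```python
from typing import Dict, List


def candidate_identifiers(profiles: List[Dict[str, str]], min_length: int = 8) -> List[str]:
    """Return keys whose values look like per-user identifiers.

    Assumed heuristic (not taken from the paper): a key is flagged when it
    is present in every profile, its value is different in every profile,
    and every value has at least `min_length` characters.
    """
    if len(profiles) < 2:
        raise ValueError("need at least two independent profiles to compare")

    # Only keys observed in all profiles can be compared.
    shared_keys = set(profiles[0]).intersection(*profiles[1:])

    flagged = []
    for key in sorted(shared_keys):
        values = [profile[key] for profile in profiles]
        all_distinct = len(set(values)) == len(values)
        long_enough = all(len(value) >= min_length for value in values)
        if all_distinct and long_enough:
            flagged.append(key)
    return flagged


# Hypothetical cookie or query-parameter values from two fresh profiles.
profile_a = {"lang": "en-US", "uid": "a81f3c9e27d4", "ad_pos": "1"}
profile_b = {"lang": "en-US", "uid": "5b7e0d12c6aa", "ad_pos": "1"}
flagged_keys = candidate_identifiers([profile_a, profile_b])
```

In this example only `uid` is flagged, because `lang` and `ad_pos` are identical across profiles. A real pipeline would likely need further checks, for instance to rule out timestamps or session-scoped values.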


In Section 4, we then present a systematic analysis of our dataset to investigate privacy harms before clicking an ad, while clicking an ad, and after clicking an ad and reaching the advertiser’s website. We find that users’ privacy is not harmed until users click on an ad. Privacy-focused search engines do not appear to attempt to re-identify users across visits or queries, and they neither include resources from known trackers nor make network requests to them. However, we find that users’ privacy is compromised by all studied search engines in various ways once users click on an ad.


Disappointingly, we find that all search engines record additional information about the user and/or the user’s clicks after the user has clicked on an ad. Private search engines capture data related to the clicked ad, including the ad provider, destination URL, and the ad’s position within the search results page, along with the user’s browsing data, such as the search query, device type, and browser language. Private search engines do not store user identifiers upon ad clicks, in contrast to traditional search engines, which record user-identifying values. Furthermore, we find that all search engines engage in navigation-based tracking. Navigation-based tracking refers to techniques that send users through one or more redirectors when they navigate from one website to another, in order to share user information across sites [33]. Navigation-based tracking does not require third-party cookies and can be used to circumvent browser protections against cross-site tracking that rely on partitioned cookie storage. Alarmingly, we observe that privacy-focused search engines engage in more navigation-based tracking than non-privacy-focused ones: we observe navigational tracking on 4% of ad clicks on Bing, 100% of ad clicks on Google, 100% of ad clicks on DuckDuckGo, 86% of ad clicks on Qwant, and 100% of ad clicks on StartPage.
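The per-engine percentages above are shares of ad clicks whose navigation passed through at least one redirector. The following Python sketch shows one way such a share could be computed from recorded navigation chains. It is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the `.example` URLs are made up, and hosts are compared by full hostname rather than by registrable domain (a real analysis would probably group hosts using the Public Suffix List).

```python
from typing import List
from urllib.parse import urlsplit


def hostname(url: str) -> str:
    """Lower-cased hostname of a URL, or an empty string if it has none."""
    return (urlsplit(url).hostname or "").lower()


def redirector_hosts(chain: List[str]) -> List[str]:
    """Hosts visited between the first and last URL of a navigation chain.

    Hosts equal to the origin or the destination host are ignored, and each
    remaining host is reported once, in order of first appearance.
    """
    if len(chain) < 2:
        return []
    origin, destination = hostname(chain[0]), hostname(chain[-1])
    seen: List[str] = []
    for url in chain[1:-1]:
        host = hostname(url)
        if host and host not in (origin, destination) and host not in seen:
            seen.append(host)
    return seen


def share_with_redirectors(chains: List[List[str]]) -> float:
    """Fraction of chains that pass through at least one redirector host."""
    if not chains:
        return 0.0
    hits = sum(1 for chain in chains if redirector_hosts(chain))
    return hits / len(chains)


# Hypothetical ad clicks: one routed through a redirector, one direct.
bounced = [
    "https://search.example/results?q=shoes",
    "https://ads.tracker.example/click?id=1",
    "https://shop.example/landing",
]
direct = [
    "https://search.example/results?q=shoes",
    "https://shop.example/landing",
]
share = share_with_redirectors([bounced, direct])
```

Here `redirector_hosts(bounced)` yields the single intermediate host and `share` is 0.5, since one of the two clicks was bounced.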


On the destination page, we check whether the search engine requires advertisers to abide by privacy-respecting practices by measuring whether advertisers include trackers or other known privacy-harming resources. We find that 93% of ad destination pages (across all five search engines) include trackers and privacy-harming resources. Finally, we check whether search engines or redirectors aid advertisers in profiling visitors by measuring the data they receive in the form of user-describing query parameters. We find that advertisers receive user identifiers in 68%, 92%, and 53% of cases for DuckDuckGo, StartPage, and Qwant, respectively. This practice, known as UID smuggling, enables redirectors to aggregate more user behavior data if they have scripts on the ads’ destination websites and they store the user-identifying parameters they receive. Notably, in the case of private search engines, the user-identifying parameters are not set by the search engine but by the redirectors encountered between the search engine’s and the advertiser’s sites.
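The observation that identifiers are added by redirectors rather than by the search engine can be illustrated by locating where in the navigation chain each landing-page parameter first shows up. The Python sketch below is a simplified illustration, not the paper's method: the function names, the parameter names, and the `.example` URLs are hypothetical, and it matches on exact name and value pairs only.

```python
from typing import Dict, List, Set, Tuple
from urllib.parse import parse_qsl, urlsplit


def query_pairs(url: str) -> Set[Tuple[str, str]]:
    """Set of (name, value) pairs found in a URL's query string."""
    return set(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def first_hop_carrying(chain: List[str]) -> Dict[str, int]:
    """Map each landing-page query parameter to its earliest hop index.

    The landing page is the last URL in `chain`. Index 0 means the pair was
    already present in the link on the search page; an index k > 0 means it
    first appears in the URL that hop k-1 redirected to, which suggests that
    hop k-1 added it.
    """
    if not chain:
        return {}
    landing = query_pairs(chain[-1])
    per_hop = [query_pairs(url) for url in chain]
    result: Dict[str, int] = {}
    for name, value in sorted(landing):
        for index, pairs in enumerate(per_hop):
            if (name, value) in pairs:
                result[name] = index
                break
    return result


# Hypothetical ad click: the redirector appends click_id before the landing page.
chain = [
    "https://search.example/ad?campaign=spring",
    "https://redirector.example/r?campaign=spring",
    "https://shop.example/landing?campaign=spring&click_id=9f2c41d7e8",
]
origins = first_hop_carrying(chain)
```

In this example `campaign` maps to index 0, so it came from the search page, while `click_id` maps to index 2, so it was introduced by the redirect that `redirector.example` issued. Deciding whether such a value is actually a user identifier is a separate step.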


Our results indicate that private search engines’ privacy protections do not sufficiently cover their advertising systems. Although these search engines refrain from identifying and tracking users and their ad clicks, the presence of ads from Google or Microsoft subjects users to the privacy-invasive practices of these two advertising platforms. When users click on ads on private search engines, they are often identified and tracked by Google, Microsoft, or other third parties through bounce tracking and UID smuggling techniques. In particular, advertisers receive unique user identifiers through query parameters in most ad clicks, which can enable cross-site tracking even in privacy-enhanced browsers that block third-party cookie tracking.