This paper is available on arXiv under the CC0 1.0 DEED license.
Authors:
(1) Salim Chouaki, LIX, CNRS, Inria, Ecole Polytechnique, Institut Polytechnique de Paris;
(2) Oana Goga, LIX, CNRS, Inria, Ecole Polytechnique, Institut Polytechnique de Paris;
(3) Hamed Haddadi, Imperial College London, Brave Software;
(4) Peter Snyder, Brave Software.
Search engines and online tracking have received considerable research attention. We review the studies closest to our work.
Search engines. A first line of work has measured the extent to which personalization can be observed in search engine results [23, 34] and ads [22]. For instance, Hannak et al. [23] developed a methodology for measuring personalization in search results, applied it to Bing, Google, and DuckDuckGo, and found that Bing results are more personalized than Google results, while they observed no personalization on DuckDuckGo. A second line of work has focused on solutions that protect users’ privacy from search engines and prevent web profiling. For instance, Castellà-Roca et al. [12] presented a computationally efficient protocol that provides the search engine with a distorted user profile in order to preserve users’ privacy. Finally, several studies have proposed privacy-preserving approaches to search personalization. Shen et al. [36] analyze various software architectures for personalized search and outline possible strategies for client-side personalization. Xu et al. [40] propose letting users choose the content and level of detail of the profile information that search engines build about them. To the best of our knowledge, no study has investigated the privacy properties of the advertising systems used by private search engines.
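Measuring personalization ultimately comes down to comparing the result lists that different users receive for the same query. The sketch below illustrates two standard comparison metrics: the Jaccard index (overlap of the sets of returned URLs, ignoring order) and the edit distance between the ranked lists. It is a minimal illustration under our own assumptions, not a reproduction of Hannak et al.'s [23] methodology; the function names and example URLs are hypothetical. In such measurements, a difference between a test account and a control account would typically count as personalization only if it exceeds the noise observed between two otherwise identical control accounts.

```python
from typing import Sequence


def jaccard_index(a: Sequence[str], b: Sequence[str]) -> float:
    """Set overlap of two result lists, ignoring rank (1.0 = same URLs)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # two empty result pages are trivially identical
    return len(sa & sb) / len(sa | sb)


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two ranked lists: the minimum number of
    insertions, deletions, and substitutions that turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x by y (free if equal)
            ))
        prev = curr
    return prev[-1]


# Hypothetical result pages for one query, seen by a control and a test account.
control = ["a.example", "b.example", "c.example"]
treatment = ["a.example", "c.example", "d.example"]
print(jaccard_index(control, treatment), edit_distance(control, treatment))
```

For the example lists, the two pages share two of four distinct URLs (Jaccard index 0.5) and need two edits to align (edit distance 2). The Jaccard index captures which results change, while the edit distance also reflects reordering.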
Online tracking. Several works have analyzed the use of cross-site tracking techniques in the wild [15]. Chen et al. [13] propose a data-flow tracking system to measure user tracking performed through first-party cookies. They found that more than 97% of the websites they crawled have first-party cookies set by third-party JavaScript, and that on 57% of them at least one cookie containing a unique user identifier is shared with multiple third parties. Roesner et al. [35] measured how user tracking occurs in the wild. They found that most commercial pages are tracked by multiple parties and estimated that several trackers can each capture more than 20% of a user’s browsing behavior. Koop et al. [26] analyzed a dataset of redirection chains in the wild and found that 11% of websites redirect to the same 100 top redirectors. Moreover, they demonstrated that these top redirectors could identify users on the most visited websites. Randall et al. [33] measured the frequency of UID smuggling in the wild and found that it occurs in more than 8% of all navigations in their dataset. We use a similar method to identify user identifiers among all cookie values and query parameters: automatic filtering followed by manual inspection. All these studies were conducted in the wild, and to the best of our knowledge, no study has focused on navigational tracking techniques performed on search engines.
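The automatic filtering rules are not spelled out here. As a minimal sketch, assuming heuristics commonly used when searching for identifiers (a minimum value length, stability across visits by the same browser profile, variation across independent profiles, and exclusion of timestamp-like values), the automatic step might look like the following. The length threshold, the timestamp pattern, the data layout, and the function name `candidate_uids` are all hypothetical; values that survive the filter would still go to manual inspection.

```python
import re
from typing import Dict, Set, Tuple

# Hypothetical layout: (source, name) -> observed value,
# e.g. ("cookie", "sid") or ("param", "clickid").
Key = Tuple[str, str]
Observations = Dict[Key, str]

MIN_LENGTH = 8  # hypothetical: shorter values rarely have enough entropy
# Hypothetical: Unix time in seconds or milliseconds, roughly 2001 to 2033.
TIMESTAMP_RE = re.compile(r"^1\d{9}(\d{3})?$")


def candidate_uids(profile_a_run1: Observations,
                   profile_a_run2: Observations,
                   profile_b_run1: Observations) -> Set[Key]:
    """Return keys whose values look like user identifiers: long enough,
    not timestamps, identical across two visits by profile A, and different
    for an independent profile B."""
    candidates = set()
    for key, value in profile_a_run1.items():
        if len(value) < MIN_LENGTH or TIMESTAMP_RE.match(value):
            continue  # too short, or a timestamp rather than an identifier
        if profile_a_run2.get(key) != value:
            continue  # changes between visits of the same user
        other = profile_b_run1.get(key)
        if other is None or other == value:
            continue  # absent for profile B, or shared by both users
        candidates.add(key)
    return candidates


# Hypothetical observations from three crawls.
a1 = {("cookie", "sid"): "f3a9c2e17b4d", ("cookie", "lang"): "en-US",
      ("param", "ts"): "1700000000", ("cookie", "theme"): "dark-mode-on"}
a2 = {("cookie", "sid"): "f3a9c2e17b4d", ("cookie", "lang"): "en-US",
      ("param", "ts"): "1700000042", ("cookie", "theme"): "dark-mode-on"}
b1 = {("cookie", "sid"): "09bd77e1aa5c", ("cookie", "lang"): "en-US",
      ("param", "ts"): "1700000100", ("cookie", "theme"): "dark-mode-on"}
print(candidate_uids(a1, a2, b1))  # {('cookie', 'sid')}
```

Requiring the key to appear for both profiles is a conservative choice. A value seen for only one profile cannot be shown to be user-specific by this test, so a real pipeline might route such values to manual review instead of discarding them.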