paint-brush
How We Pay For Free Websites With Our Privacyby@TheMarkup
233 reads

How We Pay For Free Websites With Our Privacy

by The MarkupAugust 28th, 2021
Read on Terminal Reader
Read this story w/o Javascript
tldt arrow

Too Long; Didn't Read

An investigation by The Markup found 21 different ad-tech companies tracked visitors to the site, sending possible signals about people’s gender identities to advertisers. Some site operators are in the dark about who is watching or what marketers do with the data. Blacklight reveals the trackers loading on any site—including methods created to thwart privacy-protection tools or watch your every scroll and click. We found more than 12,000 websites loaded scripts that watch and record user interactions on all pages.

People Mentioned

Mention Thumbnail
Mention Thumbnail

Companies Mentioned

Mention Thumbnail
Mention Thumbnail

Coin Mentioned

Mention Thumbnail
featured image - How We Pay For Free Websites With Our Privacy
The Markup HackerNoon profile picture

Trackers piggybacking on website tools leave some site operators in the dark about who is watching or what marketers do with the data
By Aaron Sankin and Surya Mattu

Kara Zajac said SPART*A, a small nonprofit serving transgender military service members and veterans, helped her begin her transition while in the Navy. To give back, she volunteered to build the group’s website in her spare time after leaving the military—and kept her eye on a key value: privacy.

“I don’t track users,” Zajac said. “Not everyone in the military is wanting to be known for being trans. They might not be out yet. So any time we can protect privacy in that way, we try to do it.”

She said she only allowed three trackers on spartapride.org: cookies from Twitter and Facebook that accompany their “like” buttons on the site, and one from Disqus, a commenting platform she got through a prepackaged website theme she bought off the internet for $59 to build the site.

But when The Markup scanned spartapride.org using our new instant privacy inspector, Blacklight, we found 21 different ad-tech companies tracked visitors to the site, sending possible signals about people’s gender identities to advertisers—without the users’ knowledge or consent.

Among them were the marketing and advertising arms of Google, Amazon, and Oracle’s BlueKai consumer data division, which reported a massive data exposure this summer, leaving billions of records—including personally identifiable information—accessible to the open internet without a password. Oracle did not respond to questions about whether data gathered from spartapride.org’s users was included in the exposure.

The trackers loaded because Disqus sells ads on the free version of its commenting portal, and that ad space comes with third-party tracking. Disqus discloses those trackers on its own website, but the company wouldn’t comment about tracking SPART*A’s users.

Zajac was floored when The Markup showed her how many trackers appeared on the site. She said she learned a hard lesson: “If it’s free, that doesn’t mean it’s free. It just means it doesn’t cost money.” Instead, it costs your website visitors’ privacy.  

SPART*A agreed to a Disqus cookie—but didn’t know about the trackers from 21 other companies

An array of free website-building tools, many offered by ad-tech and ad-funded companies, has led to a dizzying number of trackers loading on users’ browsers, even when they visit sites where privacy would seem paramount, an investigation by The Markup has found. Some load without the website operators’ explicit knowledge—or disclosure to users.

Website operators may agree to set cookies—small strings of text that identify you—from one outside company. But they are not always aware that the code setting those cookies can also load dozens of other trackers along with them, like nesting dolls, each collecting user data.

To investigate the pervasiveness of online tracking, The Markup spent 18 months building a one-of-a-kind free public tool that can be used to inspect websites for potential privacy violations in real time. Blacklight reveals the trackers loading on any site—including methods created to thwart privacy-protection tools or watch your every scroll and click.

We scanned more than 80,000 of the world’s most popular websites with Blacklight and found more than 5,000 were “fingerprinting” users, identifying them even if they block third-party cookies.

We also found more than 12,000 websites loaded scripts that watch and record all user interactions on a page—including scrolls and mouse movements. It’s called “session recording” and we found a higher prevalence of it than researchers had documented before.

Third-party trackers were common on popular websites

More than 200 popular websites used a particularly invasive technique that captures personal information people enter on forms—like names, phone numbers, and passwords—before they hit send. It’s called “key logging” and it’s sometimes done as part of session recording.

One of the websites doing this, SunTrust Bank, sent the user name and password we entered to a third party, Jornaya, which says it encrypts and discards the data it collects. SunTrust spokesman Kyle Tarrance wouldn’t answer questions about the password leak, but insisted that the company keeps “clients’ well-being at the forefront of everything we do.” After we contacted the company, its website stopped sending data to Jornaya.

We scanned hundreds of sensitive sites using Blacklight and found that, even there, tracking was surprisingly common:

  • More than 100 websites serving undocumented immigrants, domestic and sexual abuse survivors, sex workers, and LGBTQ people sent data about their visitors to advertising companies.
  • Eighty U.S. abortion providers loaded third-party trackers on user browsers, some of them sending data to Facebook that ended up in user profiles.
  • Trackers from different companies were communicating with each other to confirm the identity of visitors to a website for victims of sexual violence.
  • Health information websites like Everyday Health and WebMD sent user data about page visits to dozens of marketing companies.
  • The Arizona Department of Child Safety’s page on how to report child abuse sent data about site visitors to six ad tech companies.
  • Various government websites providing information about COVID-19 sent information about the site visitors to advertising companies without users’ knowledge.
  • The Mayo Clinic used key logging to capture information about people’s current medical ailments in pages where they sign up for appointments and clinical trials. Even if people changed their minds and decided not to submit the information, the captured data was still sent to an endpoint on the Mayo Clinic’s server labeled “web forms for marketers/tracking.”

Some of the operators of sensitive sites told The Markup they knew about the tracking, but others said they were unaware of the number of trackers and their pervasiveness—or what happens with the data collected from their users. Most, including The Mayo Clinic, WebMD, and Everyday Health, did not respond to requests for comment.

Some sites’ privacy policies did not disclose the tracking. For instance, the Mayo Clinic did not disclose it was using invasive key logging. And the Arizona Department of Child Safety’s privacy policy said that it doesn’t load cookies to track users—but we found that it did. After we asked about it, the agency added a new link to a “privacy statement” disclosing the cookies.

How Key Logging Captures What You Type

When you type text into a form on the internet, a technique called key logging can scoop it up before you hit send—so it doesn’t matter if you decide not to hit “submit.”

On the left is the information we type into forms, and on the right is a simulation of how Blacklight sees the text being captured by a key logger.

The use of cookies by websites is well known, and most Americans understand how they work. But even some website operators don’t always know how they get there: often from free plug-ins like comments sections, social media sharing buttons, and tools that embed posts from social media—conveniences people have come to expect on the internet but that small website operators don’t have the resources to build themselves.

Marketing and advertising companies are happy to provide these tools for free in exchange for user data, which is used to construct ever-more-refined profiles of internet users.

In other words, website operators are often effectively as blind to exactly what information advertising companies and marketers are collecting from their website visitors—and what they’re doing with the data—as the people browsing the internet are.

“I don’t want to say that the majority of websites don’t fully understand the data they’re collecting, but a large percentage do not,” said Michael Williams, a partner at Clym, a business that brings companies into compliance with online privacy laws like the European Union’s General Data Protection Regulation and the California Consumer Privacy Act.

He said when his firm scans websites, it often finds trackers the website operators did not know existed.

U.K.-based Privacy International found last year that some European mental health websites didn’t always know about the plethora of advertising-related tracking technologies that loaded from their sites onto users’ browsers.

“A lot of the small websites, they just want a website,” said Eliot Bendinelli, a technologist with the group. “They’re just setting up stuff so that people can access information. It might be an intern doing the website or it might be someone who is not aware of all these tracking impressions.”

But even savvy website operators like Zajac, who is studying cybersecurity at George Washington University, can get stung by what they think is a simple add-on, especially when it comes packaged in a suite of products and can be loaded with a few simple clicks.

“It turns it on and you’re like, ‘cool, that worked’ ” she said. “But you don’t realize the implications—of now there being 30 trackers on your website.”

She said whatever data the trackers on Sparta Pride were collecting, the nonprofit was never privy to it. After The Markup showed her the list of trackers that were loading with it, she removed Disqus from the site.

Niveen Saleh, a public relations agent hired by Zeta Global, Disqus’s parent company, said Disqus offers a version without ads or their related trackers to small nonprofits for free. But nowhere on Disqus’s website does it explain how to get it, and neither did Saleh. 

“We do ensure that our publishers have the option to choose to have their data collected,” Saleh said.

Some small website operators say they don’t have much of a choice in the matter. Most of the tools available to build a robust, functional website on the internet have user tracking built into their very functionality. Even giving users the ability to search inside a website comes with strings attached.

“Google Search is a great tool that can be incorporated into a website, but then all searches as conducted by site visitors can be tracked to IP address,” said Fire Erowid of Erowid, the long-running nonprofit psychoactive drug information site. She said her team ended up building a “far worse” search function for the site to protect user privacy.

69% of popular websites loaded Google Analytics trackers

Source: Blacklight scan of more than 80,000 websites ranked by Tranco

Google Analytics trackers loaded on 69 percent of 80,000 popular websites scanned with Blacklight. Google Analytics gives website operators insight into how many people visit a website and which pages. The catch: Google, the world’s largest digital ad seller, also gets the data. The company’s cookie policy allows it to connect that data to the advertising profiles it already has on people, but Google spokesperson Elijah Lawal said it doesn’t do it as a policy unless the website operators agree.

However, in order for website operators to get information from Google Analytics about the demographics of their visitors, they have to allow data collection by Google’s advertising arm, DoubleClick, which adds the information to user profiles.

The second most common tracker we found on popular sites: Facebook. Blacklight found its pixel on a third of popular sites we scanned. Facebook’s trackers can follow you even if you’re not logged in to Facebook and link your browsing history to your profile for ad targeting. Website operators include the pixel to measure clicks from their ads on Facebook’s platforms.

30% of popular websites loaded Facebook trackers

Source: Blacklight scan of more than 80,000 websites ranked by Tranco

One feature commonly available for “free” to website operators shows how an avalanche of trackers can end up on users’ browsers: the suite of social media share buttons offered by AddThis, which was acquired by Oracle in 2016. It allows visitors to websites that load the tool to easily share the page they’re visiting on their own social media feeds and lets site operators track those shares. The company brags on its website that more than 15 million websites have used its free tools and that it reaches “96% of the U.S. web.”

But AddThis isn’t a social media company. It’s a marketing company. The purpose of that free tool is to load cookies and tracking pixels on website visitors’ browsers, sending the data to Oracle’s advertising divisions and dozens of marketing and advertising companies for ad targeting. These load instantly, whether or not the user clicks on the share button. In marketing materials, AddThis says it collects “up to 30 data points per page view” from each website visitor.

AddThis’s privacy policy discloses that the trackers “facilitate online behavioral advertising across the online advertising ecosystem.”

After the European Union implemented its 2018 law requiring informed consent from website visitors before their data can be collected, AddThis said it couldn’t meet that standard and shuttered its audience data business in Europe. AddThis also settled a class action lawsuit in California in 2011 that alleged it was inserting tracking cookies on sites without notifying users. The company agreed to pay a monetary settlement but did not acknowledge wrongdoing.

Using Blacklight, The Markup found AddThis trackers on more than 4,000 popular websites and four states’ coronavirus information pages: Arkansas, California, Louisiana, and Minnesota. None of the states disclosed it in their privacy policies.

Officials from California and Minnesota did not answer questions about what data the trackers were collecting and for what purpose.

Arkansas and Louisiana officials said they used AddThis social share buttons for user convenience. Louisiana’s spokesperson said they were unaware of the additional trackers on the site before The Markup brought it to their attention. Both removed the code after The Markup contacted them for comment.

“Ad trackers like this are not necessary for fulfilling our mission,” said Gavin Lesnick, a spokesperson for the Arkansas Department of Health, when we asked whether sharing visitor data with marketing companies was appropriate.

AddThis’s button wasn’t visible on that site when Blacklight scanned it, and Lesnick said it was part of an expired campaign, but the code containing the trackers had remained as a relic. We found it was still sending data to advertising companies until we contacted him.

The Markup also found AddThis’s trackers on websites for nonprofit groups that would have reason to protect user privacy: those that provide resources to undocumented immigrants, domestic violence survivors, and the LGBTQ community. They all had social share buttons on their sites.

Chad Sniffen of The National Sexual Violence Resource Center said he had no idea his site was loading trackers from AddThis until he was contacted by The Markup.

Sniffen told The Markup that he was only aware of loading a single tracker, from Google Analytics, which he uses to see what content is popular in order to serve people better.

It turns out that the developer hired to build the center’s website incorporated AddThis’s social sharing tool, and Sniffen was unaware of the implications. As a result, his site was loading trackers from 10 online advertising companies without his knowledge.

One of the trackers loaded on the site by AddThis communicates with ad trackers loaded by Google’s advertising arm, a data triangulation that advertising and marketing companies sometimes use to confirm the identity of visitors to a site.

Oracle did not respond to emails asking how it handles data collected through AddThis from sites serving privacy-sensitive populations, like victims of sexual violence.

AddThis Also Added This: 10 Marketing Trackers on the National Sexual Violence Resource Center’sWebsite

Lawal, the Google spokesperson, said in a written statement that “Google Ads does not build advertising profiles based on sensitive categories, and we have strict policies preventing customers from using such data to target ads.” Those categories include “personal hardships” and “identity and beliefs.”

Sniffen initially worried that disentangling AddThis from his group’s website would take resources away from its goal of funding crucial programs, like training counselors. However, he said he found himself with time on his hands during the pandemic and learned how to strip it out himself.

At some point, you just feel uncomfortable looking for information about your own religion or own sexual preferences.

Frederik Zuiderveen Borgesius, a professor at Radboud University in the Netherlands who has written extensively on online privacy, said the pervasiveness of tracking could wreck one of the foundations of the internet: easy access to information, particularly for those who may have no other way to get it.

“Let’s say you’re a Muslim in India, or a Palestinian in Israel, or a homosexual in Poland,” he said. “At some point, you just feel uncomfortable looking for information about your own religion or own sexual preferences. Or you might be too uneasy about looking for information about sexually transmitted diseases because you fear that your behavior is monitored.

Many advertisers and marketers say their profiles of internet users in most cases aren’t connected to names or other “personally identifiable information” such as mailing addresses, but that doesn’t mean they don’t know who you are. 

“It doesn’t really matter if they know your name or not,” Borgesius said. “There are hundreds of thousands of people sharing the same name, so unique identifiers from a cookie are better identifiers.”

Academic research has repeatedly shown that connecting supposedly anonymous marketing data to a name can be done with relative ease.

The operators of some sensitive sites said they knew their sites load marketing trackers—and they’ve made peace with the trade-off.

Domesticshelters.org connects domestic violence survivors with short-term shelters, and it allows Facebook and AddThis trackers on the site because the social sharing tools help raise awareness, said Chris McMurry, a member of the group’s board of directors.

“It’s not good enough to have a website,” he said. “We have to invest in making sure that what’s on our website is seen by those who need it the most.”

The site also sells ad space on its site, which comes with its own trackers, but the revenue helps him provide vital services. When we scanned Domesticshelters.org with Blacklight, we found trackers from 10 companies.

The Markup’s findings underscore how the web’s foundational profit source, the online advertising industry, is trying to make money from every interaction on the internet—not just the obvious clicks, like visiting retailers.

Data collected from your detailed web browsing habits—what specific pages you visited, for how long, what you did there—can be tied to records of products and services you purchased both online and offline and tied to your identity through things like store consumer loyalty cards. This can then be linked to information collected from an app you downloaded on your smartphone or which movie or show you streamed last night. The profiles are filled with data about each visitor, including presumed interests and geographic location.

Companies claim this data allows them to make predictions about who is ready and able to buy certain products and provide those insights to sellers.

The ad-targeting categories offered by marketing companies can be surprising. The list produced by the Interactive Advertising Bureau, a prominent online ad industry trade group, has included things like “Incest/Abuse Support,” “Substance Abuse,” and “AIDS/HIV.”   After this was reported publicly, the group removed the first category, but the others remain.

Many sites don’t load just one or two trackers—they load dozens of them because of a process called real-time bidding, which allows ads on a site to be personalized to whoever visits it.

When a user visits a page offering real-time ads, advertisers compete with each other for the ad space—in some cases tying users to those data-heavy profiles—in the blink of an eye. Regardless of who wins the auction to show the ad, all bidders are told who visited the site.

$28.69 bilion Expected value of the real-time ad bidding industry by 2024

The global real-time bidding industry was valued at $5.79 billion in 2018 and is expected to swell to $28.69 billion by 2024.

“Americans never agreed to be tracked and have their sensitive information sold to anyone with a checkbook,” a group of federal lawmakers wrote in a letter about real-time bidding to the Federal Trade Commission in July. “This outrageous privacy violation must be stopped and companies that are trafficking in Americans’ illicitly obtained private data should be shut down.”

They asked the agency to open an inquiry. FTC officials declined to say whether they have.

Websites serving people in Europe have had to get their affirmative consent before tracking users since 2018, when the European Union’s privacy law went into effect. Ironically, a 2019 study looking at those consent notifications found they are largely structured to encourage users to agree to tracking they otherwise wouldn’t readily allow and that they offer “no meaningful choice to consumers.”

The California Consumer Privacy Act requires large, for-profit companies doing business in the state to disclose the information its website collects, allow users to opt out of collection, and delete users’ data upon request.

The only federal law specifically requiring websites in the U.S. to disclose user tracking applies only to websites serving children, but the Federal Trade Commission has gone after companies for “deceptive” practices for claiming that they don’t track users when in fact they do.

The Markup found even some government websites don’t disclose tracking, including the U.S. Mint and the Small Business Administration, which we found using a technique called canvas fingerprinting, which can track people who block cookies.

How Canvas Fingerprinting Attempts to Uniquely Identify Users

Canvas fingerprinting tries to get around third-party cookie blockers to uniquely identify visitors to a website.

All major internet browsers, except Chrome, try to counter canvas fingerprinting. Read about how you can prevent it here.

The SBA did not respond to our requests for comment. The Mint’s website has stopped using canvas fingerprinting since we reached out to the agency in late July. Mint spokesperson Michael White insisted in an email that it never used the technique, but we have preserved the code that shows it was.

As for the ad industry’s solutions to online privacy concerns, they have largely centered on allowing people to either opt out of tracking or opt out of being served targeted ads related to that tracking. GoogleOracleFacebook, and online advertising industry groups on both sides of the Atlantic offer some version of those options.

To exercise them, people have to ask each online advertising and marketing company individually and install a cookie on their devices reminding the company in question not to track them in the future.  For some opt outs, the companies require requestors to provide their full name, email, and physical address.

Facebook, for instance, continues to collect data on those who have opted out, spokesperson Alex Dziedzan confirmed. He said it does so for “non-ads” purposes like “measurement, security, integrity, etc.”

It’s not impossible to build a tracker-free website. Encrypted email service ProtonMail, the conservative think tank The American Enterprise Institute, a wiki forum about the cryptocurrency Bitcoin, the website for online censorship circumvention tool Lantern—and The Markup’s website—all came up clean during Blacklight scans.

ProtonMail said it has had to build workarounds, including developing its own anti-fraud system to detect potential credit card abuse before sending user card numbers to its payment processor, Stripe. That was how they got around Stripe’s usual process of collecting payers’ IP and email addresses, said Bart Butler, ProtonMail’s chief technology officer.

It came with a lot of effort on our part.

“We deliberately set up our company so it is not an option for us to sell out our users,” he said. “That has come with both sacrifices … in terms of what we can do and what we can’t do and what we refuse to do. Also, it came with a lot of effort on our part.”

To avoid giving website analytics market leader Google data about every visitor to his website, Butler said Protonmail built proprietary analytics software. Most websites can set up Google Analytics in an hour, he said, but ProtonMail’s system took years to build, cost half a million dollars in server hardware costs alone and requires a permanent full-time staff to continue to maintain it.

“It shouldn’t be that you have to roll your own if you want to do this stuff,” Butler said. “Somebody who just cares about privacy and needs privacy, but doesn’t have the resources to develop their own, won’t be able to do it.

“Privacy,” he added, “should be something people can care about without selling a privacy product.”

Originally published as "The High Privacy Cost of a “Free” Website" with the Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.