Intro to Digital Fingerprints: Understanding, Manipulating, and Defending Against Online Tracking

Overview

Digital fingerprinting is a technique for identifying users across different websites based on the characteristics of their device and browser. These characteristics, known as fingerprint parameters, can include software and hardware details (CPU, RAM, GPU, and media devices such as cameras, microphones, and speakers), location, time zone, IP address, screen size and resolution, browser and OS languages, and network and internet-provider attributes, among others.

The combination of these parameters creates a unique identifier, the fingerprint, that can be used to track a user's online activity. Fingerprints play a crucial role in online security, enabling services to identify and authenticate unique users. Manipulating them lets users trick such systems and stay anonymous online. It also means that someone who can manipulate fingerprints can run tens, hundreds, or more accounts that each pose as a unique, authentic user. While this may sound cool, it has serious implications: it makes it possible to build an army of bots that spreads spam and fakes all over the internet, and it can enable fraud.
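To make the idea of combining parameters into one identifier concrete, here is a minimal sketch. It is not how any particular tracker works. It assumes the attributes have already been collected into a plain object; the function names `fnv1a32` and `fingerprintId` and the sample attribute keys are hypothetical. The hash is a 32-bit FNV-1a variant computed over UTF-16 code units, chosen only because it is short; real systems tend to use longer hashes and fuzzy matching.

```javascript
// 32-bit FNV-1a style hash of a string, returned as 8 hex characters.
// It works on UTF-16 code units and is not collision-resistant.
function fnv1a32(text) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return (hash >>> 0).toString(16).padStart(8, '0');
}

// Serialize attributes in sorted key order so the same values always
// produce the same string, regardless of collection order.
function fingerprintId(attributes) {
  const canonical = Object.keys(attributes)
    .sort()
    .map((key) => `${key}=${JSON.stringify(attributes[key])}`)
    .join('|');
  return fnv1a32(canonical);
}

// Hypothetical sample values, for illustration only.
const sampleAttributes = {
  userAgent: 'Mozilla/5.0 (X11; Linux x86_64)',
  timeZone: 'Europe/Oslo',
  languages: ['nb-NO', 'en'],
  screen: '1920x1080x24',
};
console.log(fingerprintId(sampleAttributes));
```

Changing any single attribute changes the resulting identifier, which is why spoofing one parameter is enough to look like a "new" device to a naive matcher.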

Note: Obviously, I won’t discuss here how you can do "bad" things; you must always be careful, stay away from the “dark side,” and avoid committing illegal actions. This article is about the technology behind this, so use the information wisely.

This is the first (and maybe not the last) article in this domain, so it is more of an overview and an intro to the topic. There are dozens of tools and ways to detect, collect, and spoof fingerprints, and there are many different parameters and technologies that can reveal or hide your real identity or the fact that you're spoofing fingerprints (meaning you're not an authentic user).

Parameters in Digital Fingerprints

Let's look at some of them: a few obvious, simple, or well-known parameters, and a few that are rarer, less known, and difficult to spoof.

User-Agent: this string provides information about the user's browser, operating system, and device.
IP: it reveals the user's network and geolocation. Services use IP addresses for security and to prevent malicious activities. If you use one account from different IPs or many different accounts from the same IP, some services may view this activity (in combination with other params) as suspicious and apply some level of bot protection against you. The same can happen if you use IPs that are already in use by others or that appear on some kind of blocklist, as many proxy IPs do.
Browser Plugins and Extensions: Information about installed plugins and extensions can be used to create a unique fingerprint - it helps in identifying users based on their browser's additional functionalities.
Screen Resolution and Color Depth: different users often have different display characteristics.
Timezone and Language Settings: these are important factors in fingerprinting. A random combination, e.g., a Japanese time zone with Norwegian as the language, looks unusual.
Canvas: it involves rendering hidden graphics in the user's browser to gather information about the graphics hardware. This one is quite tricky: if you spoof your hardware info, you can't produce the matching canvas value, because the image is still rendered on your real, different hardware.
WebGL Fingerprinting: it exploits the unique capabilities and limitations of the user's graphics hardware in rendering 3D graphics - additional information about the user's device.
Fonts: the list of fonts installed on a user's system can be used as a fingerprinting parameter. This information is accessible through JavaScript. The list has to be realistic: for example, Windows fonts on macOS, or a list of just one or two fonts, is a very suspicious indicator.
Battery Status API: it allows websites to determine the device's battery level and charging status. The combination of battery attributes can be used for fingerprinting.
Audio Fingerprinting: websites can use the Web Audio API to generate unique audio fingerprints by analyzing the audio processing characteristics of the device.
Hardware Concurrency: the number of logical CPU cores reported by the browser (navigator.hardwareConcurrency). Combined with other details about the software and hardware components of the device, such as the graphics card, network adapter, and operating system, it adds to the uniqueness of the fingerprint.
Network Information: This information includes details about the network connection, such as the IP address, ISP, and DNS server.
Open and used ports: some users run software that listens on specific ports, which can also serve as a signal in certain cases.
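Several items in this list only make sense in combination (fonts versus OS, time zone versus language), so a detector can look for profiles that contradict themselves. The following is a rough sketch of that idea, not a real detection rule set. The function name `findInconsistencies`, the profile shape, the font list, the language mapping, and every threshold are assumptions made for illustration.

```javascript
// Returns a list of human-readable warnings for a collected profile.
// All rules and thresholds here are illustrative assumptions.
function findInconsistencies(profile) {
  const warnings = [];
  const fonts = profile.fonts || [];

  // Very short font lists are unusual on a real desktop system.
  if (fonts.length < 3) {
    warnings.push('font list is suspiciously short');
  }

  // Fonts that typically ship with Windows should not show up on macOS.
  const windowsFonts = ['Segoe UI', 'Marlett'];
  if (profile.platform === 'MacIntel') {
    const leaked = fonts.filter((font) => windowsFonts.includes(font));
    if (leaked.length > 0) {
      warnings.push(`Windows fonts on macOS: ${leaked.join(', ')}`);
    }
  }

  // The User-Agent and the platform should describe the same OS.
  if (/Windows/.test(profile.userAgent || '') && profile.platform === 'MacIntel') {
    warnings.push('User-Agent says Windows but platform says macOS');
  }

  // An unexpected time zone / language pair is only a weak hint.
  const expectedLanguages = {
    'Asia/Tokyo': ['ja', 'en'],
    'Europe/Oslo': ['nb', 'nn', 'no', 'en'],
  };
  const expected = expectedLanguages[profile.timeZone];
  const primary = (profile.language || '').split('-')[0];
  if (expected && !expected.includes(primary)) {
    warnings.push(`language ${profile.language} is unusual for ${profile.timeZone}`);
  }
  return warnings;
}

// Hypothetical spoofed profile that contradicts itself in several ways.
const sampleProfile = {
  platform: 'MacIntel',
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
  fonts: ['Segoe UI'],
  timeZone: 'Asia/Tokyo',
  language: 'nb-NO',
};
console.log(findInconsistencies(sampleProfile));
```

No single warning proves anything. In practice such hints would be weighed together with the other parameters.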

Navigating the Web incognito

Digital fingerprint manipulation is a difficult task that demands a proactive approach to avoid detection. There are lots of strategies and tools to hide your identity (fingerprints) or pretend that you are someone else.

Use privacy-focused browsers

Opting for privacy-focused browsers, such as Brave, Ghostery, Tor, Octo Browser, or Vivaldi with enhanced privacy settings, provides a fundamental defense against common fingerprinting techniques. These browsers prioritize user privacy and incorporate features designed to hide your real fingerprints, making it harder to track your activities. For example, the Tor Browser, grounded in principles of anonymity, routes internet traffic through the Tor network. This strategic routing obscures the user's identity by bouncing connections through a series of volunteer-operated servers, enhancing overall online anonymity.

Browser Extensions

Privacy-centric browser extensions, including AdBlock, uBlock Origin, Privacy Badger, or CanvasBlocker, can act as active defenses against tracking scripts, cookies, and fingerprinting attempts. These tools operate in the background, protecting and preserving user anonymity.

VPN and Proxies

Using a VPN or proxy service adds another layer of protection by masking the user's real IP address and the location derived from it and, in the case of VPNs, by encrypting internet traffic. This is used not only to change your fingerprint but also to provide a more anonymous online presence and a higher level of security.
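One caveat is worth sketching in code: a VPN or proxy changes the network path, but the time zone and language a page can read come from device settings, so they stay the same unless they are changed separately. The snippet below is a minimal illustration, assuming a hypothetical `ipTimeZone` value obtained from a GeoIP lookup done elsewhere; `collectLocaleSignals` and `timeZoneMismatch` are made-up names.

```javascript
// Reads time zone and language signals that come from device settings,
// not from the network path, so a VPN alone leaves them unchanged.
function collectLocaleSignals() {
  const options = Intl.DateTimeFormat().resolvedOptions();
  const hasNavigator = typeof navigator !== 'undefined';
  return {
    timeZone: options.timeZone, // IANA zone name
    utcOffsetMinutes: 0 - new Date().getTimezoneOffset(),
    locale: options.locale,
    languages: hasNavigator && navigator.languages
      ? Array.from(navigator.languages)
      : [],
  };
}

// `ipTimeZone` is assumed to come from a GeoIP lookup done elsewhere.
// Comparing zone names is crude: different names can share an offset.
function timeZoneMismatch(signals, ipTimeZone) {
  return signals.timeZone !== ipTimeZone;
}

const localeSignals = collectLocaleSignals();
console.log(localeSignals, timeZoneMismatch(localeSignals, 'Europe/Oslo'));
```

A mismatch between the two is exactly the kind of unusual combination that can make a spoofed profile stand out.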

Motivations behind fingerprint spoofing

Understanding the Whys

Privacy: Individuals concerned about personal privacy and the growing digital surveillance often spoof fingerprints to shield themselves from relentless online tracking.

Geo-restriction: Spoofing fingerprints proves useful for circumventing geo-restrictions, allowing users to access content restricted to specific regions. VPN and proxy services play a crucial role here by not only hiding the user's identity but also providing access to servers in different geographic locations.

Ads: Avoiding targeted advertising and online profiling serves as a compelling motive for individuals to manipulate their digital fingerprints. Some tools can block 3rd-party tracking scripts and cookies, disrupting the profiling process.

Strategic needs in the digital arena, bot detection, and protection systems: Web scraping, multi-accounting, e-commerce, bounty and airdrop hunting, bonus hunting, social network bots, and affiliate marketing are often sources of income or the basis of medium-sized businesses. Digital agencies, individuals, and influencers engaged in these activities may require fingerprint spoofing to navigate through complicated bot detection systems. Avoiding detection becomes vital for some services, ensuring that legitimate activities are not mistakenly flagged or restricted (though quite often, such activities and accounts are not really legit but pretend to be so). Some of the most popular tools are Multilogin, X-Browser, Octo Browser, AdsPower, Incogniton, Scrapy, Surfsky, Web Scraper.io, ScrapingBee, etc.

Collecting Users' Fingerprints

JavaScript: Websites use JS (obviously) to harvest details about users to construct comprehensive digital portraits. This involves probing screen resolution, device orientation, mouse movements, keystroke dynamics, etc. Sophisticated fingerprinting scripts enumerate a wide range of browser and hardware attributes.
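Keystroke dynamics, one of the behavioral signals mentioned above, can be reduced to simple timing statistics. The sketch below is only an illustration of that idea: `interKeyStats` is a hypothetical helper, and the notion that near-zero variation hints at scripted input is a common heuristic rather than a rule taken from any specific product.

```javascript
// Summarizes the gaps between consecutive key presses.
// `timestamps` are milliseconds, e.g. event.timeStamp from keydown events.
function interKeyStats(timestamps) {
  const gaps = [];
  for (let i = 1; i < timestamps.length; i++) {
    gaps.push(timestamps[i] - timestamps[i - 1]);
  }
  if (gaps.length === 0) {
    return { count: 0, mean: 0, stdDev: 0 };
  }
  const mean = gaps.reduce((sum, gap) => sum + gap, 0) / gaps.length;
  const variance =
    gaps.reduce((sum, gap) => sum + (gap - mean) ** 2, 0) / gaps.length;
  return { count: gaps.length, mean, stdDev: Math.sqrt(variance) };
}

// Browser-only wiring, skipped where there is no DOM (e.g. Node.js).
if (typeof document !== 'undefined') {
  const pressed = [];
  document.addEventListener('keydown', (event) => {
    pressed.push(event.timeStamp);
    console.log(interKeyStats(pressed));
  });
}
```

Human typing produces irregular gaps, while a naive script that fires a key every fixed interval yields a standard deviation close to zero.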

Cookies and local storage: Persistent cookies and data stored in local storage are used to track users across sessions and platforms. Techniques include leveraging browser cookies and storing unique identifiers for user tracking. Depending on your purposes, you need to know when you need an "empty" browser with unique fingerprints and when you need to trick a service by showing that you have particular cookies and the same fingerprints.
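Storing a unique identifier is simple enough to sketch in a few lines. This is a generic illustration, not code from any tracking library: `getOrCreateVisitorId`, `createMemoryStorage`, and the `visitor_id` key are hypothetical names, and the in-memory storage exists only so the sketch also runs outside a browser.

```javascript
// Returns the stored visitor ID, creating one on the first visit.
// `storage` is anything with getItem/setItem, such as window.localStorage.
function getOrCreateVisitorId(storage, key = 'visitor_id') {
  let id = storage.getItem(key);
  if (!id) {
    // Not cryptographically strong; good enough for a demo.
    id = Math.random().toString(36).slice(2) + Date.now().toString(36);
    storage.setItem(key, id);
  }
  return id;
}

// Minimal in-memory stand-in so the sketch also runs outside a browser.
function createMemoryStorage() {
  const items = new Map();
  return {
    getItem: (key) => (items.has(key) ? items.get(key) : null),
    setItem: (key, value) => items.set(key, String(value)),
  };
}

const visitorStorage =
  typeof window !== 'undefined' && window.localStorage
    ? window.localStorage
    : createMemoryStorage();
const firstVisit = getOrCreateVisitorId(visitorStorage);
const laterVisit = getOrCreateVisitorId(visitorStorage);
console.log(firstVisit === laterVisit); // the same ID comes back on later calls
```

Clearing storage gives you the "empty" browser case, while copying the stored value into another profile is the "same cookies" case.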

3rd-party scripts: 3rd-party scripts included for analytics and advertising embed invisible trackers, which are often used to adjust a system's behavior for a specific user based on the collected fingerprints. You can use the aforementioned tools and approaches to selectively block or trick such scripts to get the behavior you need.

Understanding Your Online Identity

Online checkers: Cover Your Tracks (EFF, formerly Panopticlick), Pixelscan, deviceinfo.me, and BrowserLeaks show your browser's fingerprint parameters, such as the User-Agent, canvas fingerprint, and fonts. These checkers provide insight into how unique and how stable your digital fingerprint is across different browsing sessions.
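Some checkers express uniqueness as bits of identifying information: a value shared by one browser in 1,024 carries 10 bits, because 2 to the power of 10 is 1,024. Here is a small sketch of that arithmetic. The function names and the sample shares are made up for illustration, and adding bits across attributes is only valid if the attributes are independent, which real ones usually are not.

```javascript
// Bits of identifying information for a value shared by `fraction`
// of observed browsers (0 < fraction <= 1).
function surprisalBits(fraction) {
  if (!(fraction > 0 && fraction <= 1)) {
    throw new RangeError('fraction must be in (0, 1]');
  }
  return Math.log2(1 / fraction);
}

// Adds up the bits of several attributes. This is only correct when the
// attributes are statistically independent; correlated attributes
// (e.g. OS and fonts) make the real figure differ.
function combinedBitsIfIndependent(fractions) {
  return fractions.reduce((total, f) => total + surprisalBits(f), 0);
}

// Hypothetical shares, for illustration only.
console.log(surprisalBits(1 / 1024));
console.log(combinedBitsIfIndependent([1 / 4, 1 / 16]));
```

Roughly 33 bits would be enough to single out one browser among about 8.6 billion, which is why a handful of moderately rare attributes can already be identifying.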

Browser developer tools: they allow users to inspect network requests, cookies, and other fingerprinting parameters, fostering a deeper understanding of their digital imprint and how websites detect bots and unique legit users and collect fingerprints. Examining the Network and Application tabs in browsers provides a real-time view of the data exchanged between the browser, websites, and servers.

Here are some examples of how you can get some info about users' fingerprints in the browser.

GEO:

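// The browser asks the user for permission before the callback runs.
navigator.geolocation.getCurrentPosition(function (position) {
  var userLocation = position.coords;
  console.log(userLocation.latitude, userLocation.longitude);
});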

Microphone:

// Device labels stay empty until the user grants media permission.
navigator.mediaDevices.enumerateDevices().then(function (devices) {
  var microphones = devices.filter(device => device.kind === 'audioinput');
  console.log(microphones);
});

Camera:

navigator.mediaDevices.enumerateDevices().then(function (devices) {
  var cameras = devices.filter(device => device.kind === 'videoinput');
  console.log(cameras);
});

Speakers:

navigator.mediaDevices.enumerateDevices().then(function (devices) {
  var speakers = devices.filter(device => device.kind === 'audiooutput');
  console.log(speakers);
});

Audio:

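// This only builds the audio graph. A fingerprinting script would go on to
// start the oscillator and hash the sample data it reads back.
var audioContext = new (window.AudioContext || window.webkitAudioContext)();
var oscillator = audioContext.createOscillator();
var analyser = audioContext.createAnalyser();
oscillator.connect(analyser);
analyser.connect(audioContext.destination);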

GPU:

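var canvas = document.createElement('canvas');
var gl = canvas.getContext('webgl') || canvas.getContext('experimental-webgl');
if (gl) {
  // gl.RENDERER is often a generic string; the real GPU name needs this extension.
  var debugInfo = gl.getExtension('WEBGL_debug_renderer_info');
  var renderer = gl.getParameter(debugInfo ? debugInfo.UNMASKED_RENDERER_WEBGL : gl.RENDERER);
  console.log(renderer);
}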

Fonts:

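// document.fonts lists only the web fonts the page itself declares (for
// example via @font-face), not the fonts installed on the system. Installed
// fonts are usually inferred by measuring how test text renders per font.
var fonts = [];
document.fonts.forEach(function (font) {
  fonts.push(font.family);
});
console.log(fonts);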

Canvas (hashing):

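// MurmurHash3 is not built into browsers; it stands for any string hash
// function loaded separately.
var canvas = document.createElement('canvas');
var context = canvas.getContext('2d');
// Draw something first: a blank canvas carries no rendering differences.
context.font = '16px Arial';
context.fillText('fingerprint', 10, 20);
var dataURL = canvas.toDataURL();
var canvasHash = MurmurHash3(dataURL);
console.log(canvasHash);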

RAM and CPU:

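// Number of logical CPU cores, or 0 if the browser does not report it.
function getCPUInfo() {
  return navigator.hardwareConcurrency || 0;
}
// Approximate RAM in GB; navigator.deviceMemory is not available in every browser.
function getDeviceMemory() {
  return navigator.deviceMemory || 0;
}
var cpuInfo = getCPUInfo();
var deviceMemory = getDeviceMemory();
console.log(`CPU Cores: ${cpuInfo}`);
console.log(`Device Memory (GB): ${deviceMemory}`);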

Bot Detection Systems: Defending Against Automation

A couple of examples of such systems:

Distil Networks: A bot detection and mitigation vendor (acquired by Imperva in 2019) that aims to verify that legitimate human users are the ones accessing your website, mobile app, and APIs. For bot detection, it combines several techniques. It studies variables such as cursor movement, click patterns, and web browsing patterns across other websites. It uses device fingerprints and its Are You a Human technology, which checks every visitor against hundreds of different characteristics with a focus on behavior, and it actively pulls additional data from the browser to identify devices with precision. When a browser request comes in, Distil interrogates the headers to see whether the visitor is lying about their identity. The resulting device fingerprint can fully or partially identify an individual device even when cookies cannot be read or stored in the browser, the client IP address is hidden, or the user switches to another browser on the same device. Machine learning is used to detect biometric patterns in mouse activity and scrolling, alongside clues from browsers, devices, and other factors, which lets it catch many simple or medium-complexity bots that run JS from a web page. For resource protection, Distil defends against web scraping, competitive data mining, account takeovers, transaction fraud, unauthorized vulnerability scans, spam, click fraud, denial of service, and API abuse. The vendor claims it can automatically block 99.9% of malicious traffic without impacting legitimate users, and it offers Distil Bot Defense for Web and for API to protect website and API servers respectively.
Imperva: It's a comprehensive cybersecurity platform that integrates fingerprint-based identification to discern legitimate users from potential threats. Utilizing advanced behavioral analysis and anomaly detection, Imperva builds profiles based on a combination of factors, including IP reputation, user-agent, different fingerprint params, and behavioral characteristics. This allows the system to detect suspicious activities indicative of bot traffic and mitigate potential threats effectively.
Akamai: It's a prominent content delivery and cloud service provider that incorporates robust bot detection mechanisms within its security offerings. Leveraging a combination of fingerprint-based detection, behavior analysis, and machine learning, Akamai identifies and mitigates various bots. Akamai's global network allows for real-time threat intelligence, enabling proactive defense against evolving bot tactics.
Cloudflare: It's a widely used content delivery network and security service that employs a multifaceted approach to bot detection. Cloudflare distinguishes between human users and bots by analyzing parameters such as IP reputation, user-agent characteristics, and behavioral patterns. The platform also utilizes threat intelligence and community-driven insights to stay ahead in the battle against emerging bot threats. Cloudflare uses JA3 fingerprinting to profile SSL/TLS clients and block potential bot requests. It also uses HTTP filtering to apply rules and route traffic based on HTTP request information, and data fingerprinting to identify specific files and prevent data loss. Together, these techniques help detect visitors that are not real users.
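JA3, mentioned for Cloudflare above, works below the browser level: it summarizes the TLS ClientHello that a client sends before any HTTP request. The sketch below shows only how the JA3 input string is assembled from already-parsed fields; the final fingerprint is the MD5 hash of that string, which is left out here. Parsing the handshake is out of scope, the `hello` object shape and the function names are assumptions, and this says nothing about how Cloudflare implements it internally.

```javascript
// GREASE values (0x0a0a, 0x1a1a, ... 0xfafa) are placeholders that
// clients insert at random; JA3 ignores them.
function isGrease(value) {
  return (value & 0x0f0f) === 0x0a0a && (value >> 8) === (value & 0xff);
}

// Builds the JA3 input string from fields parsed out of a TLS ClientHello:
// version, cipher suites, extensions, supported groups, EC point formats.
// Values inside a field are joined with "-", fields with ",".
function ja3String(hello) {
  const field = (values) => values.filter((v) => !isGrease(v)).join('-');
  return [
    String(hello.version),
    field(hello.cipherSuites),
    field(hello.extensions),
    field(hello.supportedGroups),
    field(hello.ecPointFormats),
  ].join(',');
}

// Sample ClientHello values in decimal, for illustration only.
const sampleHello = {
  version: 769,
  cipherSuites: [47, 53, 5, 10, 49161, 49162, 49171, 49172, 50, 56, 19, 4],
  extensions: [0, 10, 11],
  supportedGroups: [23, 24, 25],
  ecPointFormats: [0],
};
console.log(ja3String(sampleHello));
```

Because the string depends on the TLS library rather than on anything JavaScript can change, a script that sets a browser User-Agent but connects with a different TLS stack can be spotted by this mismatch.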

Role of Bot Detection Systems:

The global infrastructure of these providers plays a pivotal role in their bot detection capabilities. With servers strategically positioned around the globe, they can leverage geographical insights and real-time threat intelligence to identify and mitigate bot traffic effectively. Their WAFs and bot managers contribute to a comprehensive defense against automated "users."

In this ongoing cat-and-mouse game, bot detection systems continue to evolve, leveraging advanced technologies to stay ahead of fingerprint spoofing. Collaboration between security providers, businesses, and the wider online community remains crucial in defending against automated threats. In a way, though, the same technology is also used against users' privacy, anonymity, and online experience, bringing intrusive ads and sophisticated tracking tools that benefit businesses rather than users. In response, tools for privacy protection, fingerprint hiding and spoofing, and ad and tracker blocking keep developing to evade even the most sophisticated user tracking systems.

Websites' additional verification steps

Suspicious behavior: If a fingerprint appears suspicious, such as frequent changes or unusual combinations of data points, the website may trigger additional verification steps.
Captcha: Websites may present users with captcha challenges to verify that they are human. This involves solving puzzles or identifying distorted text, tasks that are difficult for primitive bots.
Mobile verification: Websites may request users to verify their mobile phone numbers to establish a stronger connection between a unique identity and a real-world entity. This can help prevent fraudulent activities.
ID verification: The idea is the same as with mobile verification, but you need to submit an identity document.
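The escalation described in this list can be sketched as a simple rule: the more a returning visitor's fingerprint has changed, the stronger the verification step. This is a toy model only. The function names are hypothetical and the thresholds are arbitrary placeholders, not values used by any real service.

```javascript
// Counts attributes whose values differ between two visits.
function countChangedAttributes(previous, current) {
  const keys = new Set([...Object.keys(previous), ...Object.keys(current)]);
  let changed = 0;
  for (const key of keys) {
    if (JSON.stringify(previous[key]) !== JSON.stringify(current[key])) {
      changed++;
    }
  }
  return changed;
}

// Maps the amount of change to a verification step.
// The thresholds are arbitrary placeholders for illustration.
function chooseVerificationStep(changedCount) {
  if (changedCount === 0) return 'none';
  if (changedCount <= 2) return 'captcha';
  if (changedCount <= 4) return 'mobile verification';
  return 'ID verification';
}

// Hypothetical fingerprints from two visits by the same account.
const lastVisit = { userAgent: 'UA-1', timeZone: 'Europe/Oslo', screen: '1920x1080' };
const thisVisit = { userAgent: 'UA-1', timeZone: 'Asia/Tokyo', screen: '1920x1080' };
console.log(chooseVerificationStep(countChangedAttributes(lastVisit, thisVisit)));
```

Real systems weigh attributes differently (a new IP is less alarming than a new GPU), so a plain count is only the simplest possible stand-in.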

Examples:

Google typically doesn't require additional verification for new users with clear, consistent fingerprints. If a user's fingerprint changes frequently or resembles those used by known bots (or is simply already known to Google), Google may ask for a mobile phone number or present a captcha.

Financial websites often enforce stricter verification measures due to the sensitive nature of transactions. They may require users to provide additional personal information, verify their identity through secure channels, or pass more complex captcha challenges.

Opinion: spoofing is better than hiding

Hiding fingerprints can enhance privacy, security, and online anonymity, but spoofing them to appear as a unique, legitimate user offers even greater advantages. By blending in with the crowd and avoiding suspicion from systems that detect anti-detection tools, you can maintain credibility and trust. This approach allows you to enjoy the benefits of being perceived as a unique, legitimate user, minimizing the chance of facing additional protective measures or obstacles. And, obviously, it helps prevent your real data and identity from leaking.
