TL;DR
Kupa3 lets you draw connections between the scripts on a specific website. It searches the HTML for JavaScript code or src attributes and crawls them to build a dependency graph. This approach can help bug hunters discover subdomains and examine JavaScript calls, and it can help OSINT researchers check which companies are connected to each other or track advertising companies. At the end, the graph is saved in GEXF format so you can explore it in Gephi.
Script is here → https://github.com/woj-ciech/kupa3
Example graph for Reddit.com
I’m still pretty amazed at how many advertising scripts there are on any website. They track your every click, create a heat map of your mouse movements, and gather information about your plugins, browser, screen resolution and battery, among other things. With enough information about a user it’s easy to deanonymise them based on multiple factors, which are collected with every visit. Cross-referencing with IP addresses additionally allows them to track you regardless of incognito mode or the browser used. Do you know which companies are the biggest fish in the business? What do they do with the collected information? What do their scripts look like?
The tool makes it easier to follow each script on a website along with all its dependencies. Let’s start from the beginning and check how kupa3 can help in your investigation.
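To make the starting point concrete, here is a minimal sketch of the first step: pulling script references out of a page. It is not kupa3’s actual code, just an illustration using Python’s standard html.parser. The ScriptCollector class and the example.com / example.org URLs are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptCollector(HTMLParser):
    """Collect external script URLs and inline script bodies from HTML.

    Hypothetical helper for illustration; not part of kupa3.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.external = []  # absolute URLs taken from <script src="...">
        self.inline = []    # text of <script> blocks that have no src
        self._in_inline = False

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page URL.
            self.external.append(urljoin(self.base_url, src))
        else:
            self._in_inline = True

    def handle_data(self, data):
        if self._in_inline and data.strip():
            self.inline.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_inline = False


# Made-up page used only to exercise the parser.
page = """
<html><head>
<script src="/static/app.js"></script>
<script src="https://cdn.example.org/lib.js"></script>
<script>var endpoint = "https://api.example.com/v1";</script>
</head></html>
"""

collector = ScriptCollector("https://example.com/")
collector.feed(page)
print(collector.external)
print(collector.inline)
```

Both lists matter: the external URLs are the next files to download, while inline blocks can contain further links of their own.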
Note:
Some of the scripts are legitimate: they genuinely improve performance and do not collect information.
Photo by Cris Tagupa on Unsplash
JavaScript code is always interesting from a bug hunter’s point of view. It can contain additional targets such as subdomains, or make internal and external calls. It discloses the libraries in use and, once deobfuscated enough, can reveal tokens, debug options, keys or cloud providers.
As an example, let’s try Vice and their main website.
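As a rough illustration of how subdomains fall out of JavaScript, the sketch below scans a script body for URLs and separates hosts under the target domain from third-party hosts. The regular expression is a deliberately simple assumption (it will miss URLs built by string concatenation), and hosts_in_js plus all the example hostnames are invented for this example.

```python
import re
from urllib.parse import urlparse

# Simplistic URL pattern: stops at whitespace, quotes, angle brackets,
# a closing parenthesis or a backslash. An assumption, not kupa3's regex.
URL_RE = re.compile(r"""https?://[^\s"'<>)\\]+""")


def hosts_in_js(js_source, root_domain):
    """Return (own_hosts, external_hosts) referenced in js_source."""
    own, external = set(), set()
    for url in URL_RE.findall(js_source):
        host = urlparse(url).hostname
        if not host:
            continue
        if host == root_domain or host.endswith("." + root_domain):
            own.add(host)
        else:
            external.add(host)
    return sorted(own), sorted(external)


# Made-up script content for illustration.
sample_js = '''
var api = "https://api.example.com/v1";
fetch('https://www.example.com/api/articles');
loadScript("https://cdn.tracker.example/t.js");
'''

own, external = hosts_in_js(sample_js, "example.com")
print(own)       # hosts under the target domain
print(external)  # third-party hosts
```

The first list is what a bug hunter would add to the target scope; the second is the raw material for tracking who else is involved.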
Vice.com
If we zoom in on one of the key nodes of the graph (the pink one), the connections are clearly visible. The script “common.b0c375050cc69b89e1a0.js” includes many links. It’s worth mentioning that not only JavaScript links are fetched. All URLs are collected, and whenever one turns out to be JavaScript code the crawler goes deeper, until there are no further references.
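The “collect everything, descend only into JavaScript” idea can be sketched in a few lines. This is my own simplified version, not the tool’s implementation: it decides what counts as JavaScript by the .js extension (an assumption), takes the download function as a parameter so it can be tried without network access, and uses a made-up three-file site.

```python
import re
from collections import deque

# Simplistic URL pattern, assumed for this sketch.
URL_RE = re.compile(r"""https?://[^\s"'<>)\\]+""")


def crawl(start_url, fetch, max_nodes=200):
    """Breadth-first walk that records every URL but only re-fetches .js files.

    fetch(url) must return the body as text, or None when the resource
    cannot be retrieved. Returns a list of (source, target) edges.
    """
    edges = []
    seen = {start_url}
    queue = deque([start_url])
    while queue and len(seen) <= max_nodes:
        current = queue.popleft()
        body = fetch(current)
        if body is None:
            continue
        for target in URL_RE.findall(body):
            edges.append((current, target))
            if target not in seen:
                seen.add(target)
                # Only JavaScript is followed; other links stay as leaf nodes.
                if target.split("?")[0].endswith(".js"):
                    queue.append(target)
    return edges


# Made-up site held in memory; track.js points back at common.js
# to show that the "seen" set stops the walk from looping.
FAKE_SITE = {
    "https://example.com/": '<script src="https://example.com/common.js"></script>',
    "https://example.com/common.js": 'load("https://cdn.example.org/track.js"); '
                                     'var img = "https://example.com/logo.png";',
    "https://cdn.example.org/track.js": 'ping("https://example.com/common.js");',
}

edges = crawl("https://example.com/", FAKE_SITE.get)
for source, target in edges:
    print(source, "->", target)
```

In a real run, fetch would wrap an HTTP client. The max_nodes cap is there because large sites can otherwise keep the crawler busy for a very long time.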
Discover subdomain prometheus.vice.com
At first sight it may be a little unreadable. The best way to interpret the graph is the Overview tab in Gephi. You can point at one of the nodes and the rest will be grayed out, which helps you identify connections and read the node labels clearly. We extracted the subdomain prometheus.vice.com and, as you may notice, there are more subdomains included in the JS code.
Let’s jump to the main topic, i.e. tracking advertising companies via their scripts and the connections between them. As previously mentioned, trackers are placed everywhere; whatever site you go to, someone will be tracking you. Before this research I was aware of Google Analytics as a key platform for tracking and serving ads, but it turns out that there are a lot of players in this field. The majority of them are hard to trace to a specific owner or company. Moreover, some of the scripts return 404 or 403 errors, which suggests that they are either not accessible directly or expect a proper cookie or Referer header in the request. Interestingly, some trackers only start tracking you when you add something to your basket or interact with the website in some other significant way. One of the coolest graphs is the one for nike.com. It has so many trackers that it makes a good example.
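The file that Gephi opens is plain XML, so it helps to see how little is needed. Below is a minimal sketch that turns a list of (source, target) pairs into a GEXF 1.2 document using only the standard library. The function name edges_to_gexf and the sample edges are my own; the real tool may well produce the file differently and with more attributes.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"


def edges_to_gexf(edges):
    """Serialize (source, target) pairs as a minimal directed GEXF 1.2 graph."""
    # Give every distinct URL a numeric id, in order of first appearance.
    ids = {}
    for source, target in edges:
        for name in (source, target):
            ids.setdefault(name, str(len(ids)))

    root = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(
        root, "graph", {"mode": "static", "defaultedgetype": "directed"}
    )
    nodes_el = ET.SubElement(graph, "nodes")
    for name, node_id in ids.items():
        ET.SubElement(nodes_el, "node", {"id": node_id, "label": name})
    edges_el = ET.SubElement(graph, "edges")
    for i, (source, target) in enumerate(edges):
        ET.SubElement(
            edges_el,
            "edge",
            {"id": str(i), "source": ids[source], "target": ids[target]},
        )
    return ET.tostring(root, encoding="unicode")


# Made-up edges for illustration.
sample_edges = [
    ("https://example.com/", "https://example.com/common.js"),
    ("https://example.com/common.js", "https://cdn.example.org/track.js"),
]
gexf_text = edges_to_gexf(sample_edges)
print(gexf_text)
```

Writing gexf_text to a file with a .gexf extension should give something Gephi can import, with the URL shown as each node’s label.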
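If you want to check whether a 403 is caused by a missing Referer or cookie, you can replay the request with those headers set. This is only a sketch with urllib from the standard library. The URLs and the helper names build_request and fetch_status are placeholders of mine, and there is no guarantee that any particular tracker will respond differently.

```python
import urllib.error
import urllib.request


def build_request(script_url, page_url, cookie=None):
    """Prepare a request that looks like it came from the embedding page."""
    headers = {
        "Referer": page_url,
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    }
    if cookie:
        headers["Cookie"] = cookie
    return urllib.request.Request(script_url, headers=headers)


def fetch_status(request, timeout=10):
    """Return the HTTP status code, reporting 403/404 instead of raising."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


# Placeholder URLs; nothing is sent until fetch_status(req) is called.
req = build_request("https://tracker.example/t.js", "https://shop.example/")
print(req.get_header("Referer"))
```

Comparing fetch_status for a bare request and for one built this way is a quick way to tell a script that is really gone from one that is just picky about where it is loaded from.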
nike.com graph
One of the first scripts loaded is bot detection; in this case Akamai is used. It is highly obfuscated and makes your eyes bleed at first sight. I’ve actually learned that a small war exists between people who create bots to generate verified Nike+ accounts and the bot detection mechanisms. These accounts can be resold in order to get discounts.
The funny thing was that when I searched for an advertising domain, Google gave me results like “How to remove <domain_name>” or scan results from online antivirus engines. In my opinion, that’s why almost every ad company uses one domain name for serving ads and a different one for content about the actual company.
The link that aroused my suspicion the most was https://gridsumdissector[.]com/js/Clients/GWD-000673-204DB5/gs.js, which does not respond and is loaded from web.nike.com/neo/main/neo.js and nike.com/neo/main/neo.js. It is probably meant only for Chinese citizens.
The main website redirects to an SSO login, and it’s all in Chinese by default. The last announcement on their website is from 2016 and the API documentation is also in Chinese. The domain is registered to Beijing Innovative Linkage Technology Ltd, with Beijing as its location. The registrar has earned its place in fraud reports here: http://fraud-reports.wikia.com/wiki/Beijing_Innovative. One of the subdomains allows directory listing, with a ton of scripts containing references to cntv.cn, which has CCTV International Network Co., Ltd. listed as registrar. https://www.bloomberg.com/research/stocks/private/snapshot.asp?privcapId=99156085 .
Subdomain reveals scripts and links to other company
Moreover, the IP 218.202.xxx.xx was found as a link, and it belongs to China Mobile. With this evidence, we can say that these two companies cooperate with each other. It’s tough to get any information about Chinese companies, especially when it relates to marketing, advertising and tracking.
Google indexed some of the docs from their SSO login page, as well as the theme of the customer dashboard. Thanks to this, we can take a look behind the curtain.
Documentation regarding tracking stats
Tracking dashboard
It’s not the only company that is hard to trace or whose scripts are hard to make full sense of. Other tracking companies that are connected with nike.com:
Do you recognize any of these companies? Do you trust them and allow them to execute code in your browser?
gridsumdissector was used only as an example; there are a lot more with shady connections and adware activity. You can check for yourself by looking into the adrttt[.]com domain (registered in Istanbul). With a little OSINT skill, you can take a look at their infrastructure and linked companies, and then follow the rabbit hole.
The moral of this story is that you never know where your data is going. A lot of companies are engaged in the race for your browsing habits in order to make advertising more and more personal and targeted. Additionally, you can’t be 100% sure what data they collect because the JavaScript code is obfuscated. It can be reversed, but that is very time consuming and not worth the effort. My suggestion is to block them all.