Let's say you're looking for a nice parka for the coming winter season. You go to your preferred e-commerce site and, from the developer tools' Network tab, you see a red line while the home page loads: error 429, Too Many Requests. Kind of strange, isn't it?
Having a closer look at the Network tab, an unusual header caught my eye.
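Just for context, you can reproduce the same observation outside the browser with a plain HTTP client. The snippet below is a minimal sketch in Python: the URL is a placeholder (the article never names the store), and the exact headers you'll see are whatever the anti-bot returns in your own Network tab.

```python
import requests

# Placeholder URL: the real store is not named in the article.
URL = "https://www.example-parka-store.com/"

# A plain request with a generic browser User-Agent
resp = requests.get(URL, headers={"User-Agent": "Mozilla/5.0"})

print(resp.status_code)  # 429 Too Many Requests on the very first hit
for name, value in resp.headers.items():
    print(f"{name}: {value}")  # look for the unusual anti-bot headers here
```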
This is the typical response given by an anti-bot solution called Kasada, one of the newest players in the field.
The main difference between Kasada and other software is stated on their website: "Kasada assumes all requests are guilty until proven innocent by inspecting traffic for automation with a process that’s invisible to humans."
And again "Automated threats are detected on the first request, without ever letting requests into your infrastructure, including new bots never seen before."
So every first request to the website is treated as if it came from a bot, and only if the browser solves a sort of magic trick is it then let through to the actual website. Interesting.
This is a paradigm shift for anti-bot software.
Most of the other solutions take the opposite road: requests are allowed in by default and blocked only once there is evidence of automation.
In Kasada's case, it is more like "block everything unless there's evidence that the requests do not come from a bot".
It's certainly a strong statement, and a strong limitation: restricting bots can also hurt the website's SEO and block legitimate tools used by the company itself. For a public e-commerce site it makes sense on some parts of the website, when adding items to the cart for example, but locking down the whole website can lead to awful results.
While the brand's website performs as expected on Google and DuckDuckGo, sitting at the top of the organic results when searched for, on Bing it is only fourth, behind another brand with a similar name.
The most concerning part is that on Baidu the Chinese version of the website is completely missing from the first page of results, leaving one of the most important markets for luxury goods completely unserved.
Being a nerd passionate about web scraping, I took this website as a challenge.
I knew from my readings that Kasada is one of the toughest anti-bot solutions on the market, due to the peculiarity described above, but I don't give up without even trying.
I brought out the heavy artillery for this battle, going straight for the best weapon in my armory: Playwright plus the Stealth plugin, running Chrome instead of Chromium to simulate a real person browsing the website... and I failed miserably.
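For reference, this is roughly the kind of setup I mean. It's a minimal sketch in Python, assuming the playwright-stealth package (whose API varies by version) and a local Chrome install, with a placeholder URL since the store is not named.

```python
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync  # helper from the playwright-stealth package (API may differ by version)

URL = "https://www.example-parka-store.com/"  # placeholder, not the real store

with sync_playwright() as p:
    # Use the real Chrome channel instead of the bundled Chromium build
    browser = p.chromium.launch(channel="chrome", headless=False)
    context = browser.new_context(
        viewport={"width": 1366, "height": 768},
        locale="en-US",
    )
    page = context.new_page()
    stealth_sync(page)  # patch the most common headless/automation fingerprints
    page.goto(URL, wait_until="networkidle")
    print(page.content()[:500])  # against Kasada, this came back essentially empty
    browser.close()
```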
I still feel the pain looking at this picture
I tried and tried and tried again, with several configuration changes, and the result was always the same: a blank page.
The guys at Kasada really did a great job of blocking bots, I must admit.
But just when I was about to give up and go cry in a dark corner of my room with my broken ego, the SEO issue of this website came back to my mind, and I decided to have a look at its robots.txt file.
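If you want to check a robots.txt programmatically instead of reading it by hand, Python's standard library is enough. A small sketch, again with a placeholder URL:

```python
from urllib.robotparser import RobotFileParser

URL = "https://www.example-parka-store.com/"  # placeholder, not the real store

rp = RobotFileParser(URL + "robots.txt")
rp.read()

# Compare what a generic browser-like agent and the SEO tool are allowed to fetch
for agent in ["Mozilla/5.0", "Screaming Frog SEO Spider"]:
    print(agent, "->", rp.can_fetch(agent, URL))
```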
So, this "Screaming Frog SEO Spider" is allowed to roam all around the website while my Chrome setup is not?
Having a look at this Screaming Frog tool, it's a desktop application that lets the user crawl and download the HTML of a website using, guess what? A headless version of Chromium.
But since it's a desktop application, it's very unlikely that this rule is enforced only for a certain range of IPs, so why don't we try changing the User-Agent to match Screaming Frog's?
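Something along these lines, keeping the same Playwright setup and only overriding the User-Agent. The exact UA string here is an assumption on my side; the point is that it contains the token allowed in robots.txt.

```python
from playwright.sync_api import sync_playwright

URL = "https://www.example-parka-store.com/"  # placeholder, not the real store
# Assumed UA string: what matters is the "Screaming Frog SEO Spider" token from robots.txt
SCREAMING_FROG_UA = "Screaming Frog SEO Spider/16.0"

with sync_playwright() as p:
    browser = p.chromium.launch(channel="chrome", headless=True)
    context = browser.new_context(user_agent=SCREAMING_FROG_UA)
    page = context.new_page()
    page.goto(URL, wait_until="networkidle")
    print(page.title())  # this time the page actually rendered instead of coming back blank
    browser.close()
```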
Yes!
Bingo!
This of course does not mean I've found a way to "hack" the Kasada anti-bot: in this particular case, the solution is simply misconfigured.
It was probably too strict, blocking even some of the company's internal processes, so a back door was left open, giving bots the opportunity to get in.
We don't know many of the details: the number of bots attacking the website before Kasada was implemented, the cost of the solution, or the sales lost because of this SEO gap.
Kasada is certainly doing a great job; maybe it was even too good in this case, so good that some back doors were needed to keep the website operating smoothly.
Generally speaking, when it comes to fraud, DDoS attacks, or other harmful activity, fighting bots is of course a must. And it's difficult to differentiate the bots that just want to scrape some prices from the dangerous ones.
If every e-commerce site agreed to give access to its publicly available data (the product catalog, for example), maybe via an API, most of the bots serving the market intelligence industry would stop scraping the website, and anti-bot solutions could be targeted more efficiently against fraud and DDoS attacks.
Maybe it would help, just my 2 cents.