The ever-evolving landscape of online threats demands constant adaptation from information security teams. We see this in the relentless tactics employed by bad actors, from creating fake profiles for malicious purposes to exploiting vulnerabilities in network infrastructure. As the scale and complexity of these attacks grow, traditional defenses such as rate limiting often reach their limits. This article explores Machine Learning (ML) as a weapon in the information security arsenal, offering practical insights from real-world applications. Drawing on my experience at Meta, where we safeguard billions of users on the WhatsApp platform, we'll look at how ML can be harnessed to identify and neutralize these threats. We'll cover not only the technical aspects of building ML models but also crucial pre-ML strategies to fortify your defenses.
Have you ever wondered how X (formerly known as Twitter) identifies bots that tweet spam? Or how banks identify fraudulent accounts? Or how GitHub identifies faulty servers in its network? Such systems are built by information security teams that monitor and take down abusive activity at scale using AI/ML systems. Where automation can't identify or handle a case, incident response teams take it down manually. Those learnings are then captured and used to train new ML classifiers to identify outliers.
Bad actors have a wide range of sophistication and their intent varies too. Some bad actors create fake profiles that can be used to carry out many different types of abuse: scraping, spamming, fraud, and phishing, among others. To build robust countermeasures against different types of attacks on our platform, a funnel of defenses is built to detect and take down fake accounts at multiple stages.
(1) The attacker first needs to create an account, typically using a tool such as PVAcreator.
(2) Using the automated accounts, the attacker needs to reach the data by navigating through the network and moving laterally.
(3) Once the attacker has access to the data, the attacker needs to exfiltrate this data out of the network.
(1) IP-based rate limiting: Limits the number of requests from a single IP address within a specific timeframe. Helps mitigate DDoS attacks and brute-force attempts.
(2) User-based rate limiting: Sets a cap on the requests a single user can make within a given time window. Guards against abuse and unauthorized access attempts.
(3) Token-based rate limiting: Uses unique tokens or API keys to track and control API requests per token. Secures APIs from misuse and potential data leaks.
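All three limits can be enforced with the same mechanism, keyed by IP, user ID, or API token. Here is a minimal token-bucket sketch (the class name, rates, and the sample IP are illustrative, not a production implementation):

```python
import time
from collections import defaultdict

class TokenBucket:
    """Per-key token bucket: each key (IP, user ID, or API token)
    holds up to `capacity` tokens that refill at `rate` tokens/second."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = defaultdict(lambda: capacity)  # current tokens per key
        self.last = {}                               # last request time per key

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        # Refill tokens accrued since this key's last request
        elapsed = now - self.last.get(key, now)
        self.tokens[key] = min(self.capacity, self.tokens[key] + elapsed * self.rate)
        self.last[key] = now
        if self.tokens[key] >= 1:
            self.tokens[key] -= 1
            return True   # request allowed
        return False      # request throttled

# 5 requests/second steady state, bursts of up to 10, keyed by client IP
limiter = TokenBucket(rate=5, capacity=10)
allowed = [limiter.allow("203.0.113.7", now=0.0) for _ in range(12)]
print(allowed.count(True))  # → 10: burst capacity exhausted, last 2 throttled
```

The same `limiter.allow(...)` call works unchanged whether the key is an IP address, a user ID, or an API token, which is why the three limits above are usually one piece of infrastructure with three key extractors.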
Very soon you will see these rules become quite ineffective as attackers adapt: they rotate IPs, accounts, and tokens, slow down their requests to stay under the limits, and so on.
You need to start collecting logs and turning them into ML features for training. Here are some examples of features that can be built from the traffic logs you see:
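As an illustration of turning raw logs into such features, here is a sketch that aggregates request logs into one feature vector per account. The log schema (`account`, `ip`, `ts`) and the specific features are hypothetical examples, not a real logging format:

```python
from collections import defaultdict

def featurize(logs):
    """Aggregate raw request log entries into one feature vector per account.

    Each entry is a dict like {"account": ..., "ip": ..., "ts": ...};
    these field names are illustrative, not a real log schema.
    """
    by_account = defaultdict(list)
    for entry in logs:
        by_account[entry["account"]].append(entry)

    features = {}
    for account, entries in by_account.items():
        ts = sorted(e["ts"] for e in entries)
        gaps = [b - a for a, b in zip(ts, ts[1:])]
        features[account] = [
            len(entries),                            # request volume
            len({e["ip"] for e in entries}),         # distinct IPs (rotation signal)
            min(gaps) if gaps else 0.0,              # fastest inter-request gap (automation signal)
            sum(gaps) / len(gaps) if gaps else 0.0,  # mean inter-request gap
        ]
    return features

logs = [
    {"account": "a1", "ip": "198.51.100.1", "ts": 0.0},
    {"account": "a1", "ip": "198.51.100.2", "ts": 0.1},
    {"account": "a1", "ip": "198.51.100.3", "ts": 0.2},
    {"account": "a2", "ip": "203.0.113.9", "ts": 0.0},
]
# [request_count, distinct_ips, min_gap, mean_gap] for account "a1"
print(featurize(logs)["a1"])
```

An account that fires many requests from many IPs with machine-regular timing will separate sharply from human traffic in this feature space, which is exactly what the outlier detectors below exploit.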
With such feature vectors, we can build a matrix and start applying outlier detection techniques.
1. PCA: Principal Component Analysis projects the feature matrix onto a small number of principal components; rows that are poorly reconstructed from those components are candidate outliers.
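As a sketch of this reconstruction-error idea: fit the principal components on traffic presumed benign, then score every row by how badly the low-rank projection reconstructs it. All data, dimensions, and thresholds here are synthetic and illustrative:

```python
import numpy as np

def fit_pca(X_train, n_components=2):
    """Learn the top principal components of (presumed benign) traffic."""
    mu = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    return mu, Vt[:n_components]

def outlier_scores(X, mu, V):
    """Reconstruction error after projecting onto the principal subspace.

    Rows the top components can't explain get high scores."""
    Xc = X - mu
    residual = Xc - Xc @ V.T @ V   # part of each row outside the subspace
    return np.linalg.norm(residual, axis=1)

# Synthetic example: 100 benign feature vectors plus one anomalous account
rng = np.random.default_rng(0)
benign = rng.normal(0, 1, size=(100, 5))
anomaly = np.array([[10.0, -10.0, 10.0, -10.0, 10.0]])

mu, V = fit_pca(benign)
scores = outlier_scores(np.vstack([benign, anomaly]), mu, V)
print(int(np.argmax(scores)))  # → 100, the anomalous row scores highest
```

Fitting on presumed-benign traffic matters: if extreme attackers dominate the training matrix, the top components can bend toward them and hide exactly the rows you want to flag.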
2. Auto-encoders: These are, in short, a non-linear version of PCA.
The neural network is constructed so that there is an information bottleneck in the middle. Forcing the data through a small number of nodes makes the network prioritize the most meaningful latent variables, which play a role similar to the principal components in PCA. Here is an example using fast.ai
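To make the bottleneck idea concrete without any framework dependency, here is a tiny auto-encoder written in plain NumPy with manual gradients; fast.ai (on top of PyTorch) would give you the same structure with far less ceremony. The synthetic data, layer sizes, and learning rate are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic traffic features: 200 samples that truly live on a
# 2-D latent structure embedded in 5 dimensions, plus noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 5))

d, k = X.shape[1], 2                      # input dim, bottleneck dim
W1 = rng.normal(0, 0.1, (d, k)); b1 = np.zeros(k)   # encoder weights
W2 = rng.normal(0, 0.1, (k, d)); b2 = np.zeros(d)   # decoder weights

def forward(X):
    H = np.tanh(X @ W1 + b1)   # encoder: squeeze through the bottleneck
    return H, H @ W2 + b2      # decoder: reconstruct the input

losses, lr = [], 0.05
for step in range(500):
    H, X_hat = forward(X)
    losses.append(((X_hat - X) ** 2).sum(axis=1).mean())
    # Backprop of mean squared reconstruction error
    G = 2 * (X_hat - X) / len(X)
    dW2, db2 = H.T @ G, G.sum(axis=0)
    dZ = (G @ W2.T) * (1 - H ** 2)        # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = X.T @ dZ, dZ.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

# As with PCA, per-sample reconstruction error is the outlier score
_, X_hat = forward(X)
scores = ((X_hat - X) ** 2).sum(axis=1)
print(losses[-1] < losses[0])
```

The bottleneck `k=2` plays the role of "number of principal components"; points the trained network reconstructs badly are the ones that don't fit the learned latent structure, i.e. your outlier candidates.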
The fight against online threats is a continuous game of cat and mouse. While advanced ML techniques offer unparalleled capabilities, it's crucial to remember that they are just one piece of the puzzle. By combining pre-emptive measures like rate limiting with robust ML models, information security professionals can build a multi-layered defense system. This article serves as a practical roadmap for leveraging ML in your security strategy, empowering you to identify and thwart even the most sophisticated attacks. Remember, information security is a shared responsibility. By actively implementing these techniques and fostering collaboration within the security community, we can create a safer online environment for everyone.