Password guessing works because …
Humans are predictable
If you ask Mark to set a password, he will simply choose ‘Mark’. When the system tells him that his password must contain numbers, he changes it to ‘Mark123’. You may be smart enough to avoid such insecure passwords, but there are a good number of people who are like Mark.
You can see patterns in leaked passwords. Most of them are a combination of one or more of the following: first name, last name, birth date, or favorite things (the name of a car, bike, actor, actress, etc.).
Password guessing tools like Hashcat or John the Ripper commonly use a dictionary attack with password generation rules to guess the password. A dictionary attack simply tries the words in a dictionary one after the other as password inputs to see which one works.
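To make the idea concrete, here is a minimal sketch of a plain dictionary attack in Python. It assumes the target password is stored as an unsalted SHA-256 hash, which is an assumption made only for this illustration, and the word list and function names are made up. Real tools are far more elaborate than this.

```python
import hashlib


def sha256_hex(text):
    """Return the SHA-256 digest of a string as hex (assumed storage format)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def dictionary_attack(target_hash, wordlist):
    """Try each word in order; return the first one whose hash matches."""
    for word in wordlist:
        if sha256_hex(word) == target_hash:
            return word
    return None  # no word in the dictionary matched


# Hypothetical dictionary and target, for illustration only.
wordlist = ["password", "letmein", "mark", "ford", "dragon"]
target = sha256_hex("ford")

print(dictionary_attack(target, wordlist))  # prints "ford"
```

If the password is not in the dictionary, the function returns None, which is exactly the weakness that generation rules try to address.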
A dictionary attack with password generation rules is prepared by manually going through a list of leaked passwords and writing rules for password generation. For example, if leaked passwords are found to be a combination of an entity and 123, say ‘ford123’, then appending 123 to the dictionary words is a generation rule.
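A generation rule like this can be sketched as a small function that transforms a dictionary word. The snippet below is only an illustration in Python: the rule names and the word list are hypothetical, and this is not the rule syntax that Hashcat or John the Ripper actually use.

```python
def identity(word):
    """Keep the dictionary word unchanged."""
    return word


def append_123(word):
    """The rule from the example above: 'ford' -> 'ford123'."""
    return word + "123"


def capitalize(word):
    """Upper-case the first letter: 'mark' -> 'Mark'."""
    return word.capitalize()


def capitalize_append_123(word):
    """Combine both rules: 'mark' -> 'Mark123'."""
    return word.capitalize() + "123"


# Hypothetical, hand-written rule set.
RULES = [identity, append_123, capitalize, capitalize_append_123]


def generate_candidates(wordlist, rules):
    """Apply every rule to every word, skipping duplicate candidates."""
    seen = set()
    for word in wordlist:
        for rule in rules:
            candidate = rule(word)
            if candidate not in seen:
                seen.add(candidate)
                yield candidate


candidates = list(generate_candidates(["ford", "mark"], RULES))
print(candidates)
# ['ford', 'ford123', 'Ford', 'Ford123', 'mark', 'mark123', 'Mark', 'Mark123']
```

Two words and four rules already give eight guesses, including Mark's ‘Mark123’. Every rule in the list had to be written by a person who spotted the pattern first.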
In password guessing tools these rules are defined manually; in other words, they are human-defined rules. The problem with such rules is that human behaviours and interests change over time, so the rules must be updated often (as more leaks happen). Also, when the list of leaked passwords is huge, finding patterns manually is a difficult process, and there is a good chance that some patterns go unnoticed.
This is where Artificial Intelligence comes in. A few researchers recently trained neural networks on datasets of leaked passwords and were able to generate passwords that outperformed popular tools like Hashcat and John the Ripper.
If the term neural network goes over your head, just think of it as a computer process that replicates how humans learn. Humans learn through modelling and observation. If I show you four photos of a person and tell you he is ‘Mark’, you will easily recognise him the next time. Neural networks help computers do the same.
Researchers used Generative Adversarial Networks (GANs) to implement this. A GAN consists of two neural networks: one generates candidates and the other gives feedback. Think of them as a student and a master. The student tries to do something and the master gives feedback, saying ‘You need to try more’ or ‘You are close’. These two networks run through many iterations until they produce satisfactory results.
This makes the process completely automated. The researchers also observed that even in cases where the neural networks were unable to match the exact password, the generated one looked like it. For example, if the password was ‘AEF@123’, the generated one was ‘A3F@123’.
On the bright side, we can use this to make password-based systems more secure by identifying weak or predictable passwords. If you would like to look into the technical details, here is the research paper: A Deep Learning Approach for Password Guessing.
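As a rough illustration of what identifying a predictable password could look like, here is a small Python sketch. It does not use a neural network. It swaps in a much simpler hand-written check, and it assumes the system already knows a few words tied to the user, such as their name. The function name and the sample words are hypothetical.

```python
import re


def looks_predictable(password, personal_words):
    """Flag a password that is just a known word plus trailing digits or symbols."""
    # Strip any run of digits, symbols, or underscores from the end.
    base = re.sub(r"[\d\W_]+$", "", password).lower()
    known = {word.lower() for word in personal_words}
    return base in known


# Hypothetical words associated with the user.
words = ["Mark", "Ford"]

print(looks_predictable("Mark123", words))  # True
print(looks_predictable("ford123", words))  # True
print(looks_predictable("AEF@123", words))  # False
```

A system could run a check like this when a user sets a password and ask for something less guessable. A trained model could, in principle, play the same role while catching patterns nobody wrote a rule for.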
Follow Hackernoon and me (Febin John James) for more stories. I am also writing a book to raise awareness of the Blue Whale Challenge, which has claimed the lives of many teenagers in several countries. It is intended to help parents understand the threat of the dark web and take action to ensure the safety of their children. The book Fight The Blue Whale is available for pre-order on Amazon. The title will be released on the 25th of this month.