After hunting for security bugs, I’ve realized that the clients I work with are not familiar enough (or at all) with basic “hacking” techniques. API keys, passwords, SSH keys, and certificates are all great protection mechanisms, as long as they are kept secret. Once they’re out in the wild, it doesn’t matter how complex a password is or which algorithm was used to hash it. In this post, I’m going to share the concepts, methods, and tools researchers use both to find secrets and to exploit them. I’ll also list mitigation action items that are simple to implement.
It’s important to mention that the attack-and-defense “game” is not an even one: an attacker needs only one successful attempt to get in, whereas the defender has to succeed 100% of the time. The hard part is knowing where to look. Once you can list the virtual “gates” through which hackers can find their way in, you can protect them with rather simple mechanisms. I believe their simplicity sometimes overshadows their importance and causes many teams to overlook them.
So here’s a quick and simple TL;DR, though not one to overlook: these are the 20% of actions that give 80% of the effect in preventing leaks and access-control holes.
API keys are all over the internet, exposed to the world. This is a fact, and often there’s no good reason for it. Developers leave them lying around everywhere:
Blocks such as this one are all over the internet:
<code>// DEBUG ONLY
// TODO: remove -->
API_KEY=t0psecr3tkey00237948
</code>
While many hackers actually sit and read through JavaScript files, the vast majority will fetch them automatically with tools like meg and then scan the results for strings that match known patterns. Another great tool by the same author that does exactly that is gf, which is essentially a smarter grep.
In this instance, truffleHog (or the trufflehog pattern in gf) can find the high-entropy strings typical of API keys. The same goes for searching for API_KEY as a literal string, which yields results (too) many times.
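To make the “high-entropy string” idea concrete, here is a minimal sketch of the detection technique. The token pattern and the 4.0 bits-per-character threshold are illustrative assumptions of mine, not truffleHog’s actual values; real scanners combine many regexes with entropy checks.

```python
import math
import re

# Candidate tokens: long runs of base64/identifier-like characters.
# This pattern is an illustrative assumption, not truffleHog's.
TOKEN_RE = re.compile(r"[A-Za-z0-9+/=_\-]{16,}")

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character of the string s."""
    if not s:
        return 0.0
    freq = {c: s.count(c) / len(s) for c in set(s)}
    return -sum(p * math.log2(p) for p in freq.values())

def find_high_entropy_tokens(text: str, threshold: float = 4.0):
    """Return tokens whose per-character entropy exceeds the threshold."""
    return [t for t in TOKEN_RE.findall(text) if shannon_entropy(t) > threshold]
```

A random-looking key scores high (many distinct characters), while repetitive strings score near zero, which is why this trick separates real secrets from ordinary identifiers reasonably well.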
Oftentimes, keys have a good reason to appear where they do, but they’re not protected from being used externally. One example is a client I’ve been working with lately who, like many other platforms, uses maps as a third-party service. To fetch and manipulate maps, they would call an API with a key and get the relevant map back. What they forgot to do is configure their map provider to restrict the origins from which requests with that specific key may arrive. It’s not hard to imagine a simple attack that drains their license quota, effectively costing them a lot of money, or “better” yet (in terms of the attack) bringing their map-oriented service down.
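The origin restriction itself is usually a checkbox in the provider’s console, but the same idea can be sketched as a thin server-side allowlist in front of the third-party call. The domain names below are made up, and note the caveat: Origin/Referer headers stop casual abuse from other websites, not a determined non-browser client; provider-side key restrictions and quotas are the real fix.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of origins permitted to use the map key.
ALLOWED_ORIGINS = {"myapp.example.com", "www.myapp.example.com"}

def is_allowed_origin(origin_header: str) -> bool:
    """Accept only requests whose Origin header hostname is allowlisted."""
    if not origin_header:
        return False  # no Origin at all: reject by default
    return urlparse(origin_header).hostname in ALLOWED_ORIGINS
```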
JS files are not used by hackers only to find secrets. These files are your application code, open to any prying eyes. A smart hacker will read the code thoroughly to learn naming conventions and API paths, and to find informational comments. These are then extrapolated into lists of words and paths and loaded into automated scanners. This is what’s referred to as an intelligent automated scan: one where the attacker combines automated processes with gathered organization-specific information.
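The preparation step of such a scan can be sketched in a few lines: pull API-looking paths out of a JavaScript bundle to seed a wordlist. The regex below is a simplistic assumption of mine; dedicated tooling for this job is far more thorough.

```python
import re

# Match quoted paths that start with /api or a version segment like /v3.
# An illustrative assumption; real extractors handle many more shapes.
PATH_RE = re.compile(r"""["'](/(?:api|v\d+)[A-Za-z0-9_/\-.]*)["']""")

def extract_paths(js_source: str):
    """Return unique API-looking paths referenced in the JS source."""
    return sorted(set(PATH_RE.findall(js_source)))
```

The output would then be fed to a fuzzer or combined with the organization’s naming conventions to guess sibling endpoints.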
Here’s a real comment left on a target’s front page, revealing a set of unprotected API endpoints leaking data:
<code>/* Debug ->
domain.com/api/v3 not yet in production
and therefore not using auth guards yet
use only for debugging purposes until approved */
</code>
What should you do then?
They take a look back at the Wayback Machine
The Internet Archive, also known as the “Wayback Machine”, holds periodic snapshots of websites from all over the internet, going back years. This is a gold mine for hackers with a target. With tools like waybackcurls (based on waybackcurls.py), one can scan any target for old files. This means that even if you found and removed a key but did not rotate it, a hacker might still find it in an old version of your website and use it against you.
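Under the hood, tools like waybackcurls query the Internet Archive’s public CDX API for every URL it has ever captured for a domain. A minimal sketch of that query follows; the parameter names are taken from the public CDX API, but verify them against its documentation before relying on this.

```python
from urllib.parse import urlencode
# from urllib.request import urlopen  # uncomment to actually fetch

def cdx_query_url(domain: str) -> str:
    """Build a CDX API query listing archived URLs for a domain."""
    params = urlencode({
        "url": f"{domain}/*",   # everything under the domain
        "output": "text",
        "fl": "original",       # return only the original URL field
        "collapse": "urlkey",   # de-duplicate captures of the same URL
    })
    return f"http://web.archive.org/cdx/search/cdx?{params}"

# for line in urlopen(cdx_query_url("example.com")):
#     print(line.decode().strip())  # grep these for keys and old endpoints
```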
Found a key lying around where it’s not supposed to be?
The Wayback Machine is not only good for finding keys
Old code reveals all kinds of interesting information to exploiters:
GitHub is a goldmine for hackers. A simple search, given the knowledge of where to look, can yield interesting results. If your organization is not enforcing MFA, each and every user in it is a walking security hole. It’s not far-fetched to assume that one of the organization’s collaborators does not use a unique password and that their password was once leaked through another system. A hacker targeting the organization can easily automate such a scan, or even go through it manually. The list of employees can be generated with OSINT, like searching for employees on LinkedIn or in the GitHub public users list.
For example, here’s a good starting point if you’re trying to probe Tesla:
<code>https://api.github.com/orgs/teslamotors/members
</code>
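Automating this enumeration is straightforward: the members endpoint is paginated, so a scan walks pages until an empty response comes back. The sketch below builds the paginated URLs and leaves the actual network calls commented out; unauthenticated requests to this endpoint are rate-limited, so treat the pagination parameters as assumptions to verify against the GitHub API docs.

```python
# from urllib.request import urlopen
# import json

def members_page_url(org: str, page: int) -> str:
    """Build a paginated org-members URL (per_page max is 100)."""
    return f"https://api.github.com/orgs/{org}/members?per_page=100&page={page}"

# def list_members(org: str):
#     page, logins = 1, []
#     while True:
#         batch = json.load(urlopen(members_page_url(org, page)))
#         if not batch:          # empty page: we've seen everyone
#             return logins
#         logins += [m["login"] for m in batch]
#         page += 1
```

The resulting usernames are exactly the OSINT seed described above: each one leads to personal repos, gists, and git histories worth scanning.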
Even if the company doesn’t use GitHub as its git provider, leaks can still end up there. It’s enough for one employee to use GitHub for their personal projects and have a small leak in one of them (or in their git history) to turn it into a breach.
Git’s nature is to track the entire history of changes in every project. In the security context, this fact becomes significant: every line of code ever written (or removed) by any user with current access to any organizational system can jeopardize the company.
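This is why deleting a secret in a later commit isn’t enough: it still shows up in `git log -p`. A toy sketch of scanning history output follows. The credential pattern is an illustrative assumption of mine; real scanners like truffleHog or gitleaks use many patterns plus entropy checks.

```python
import re
# import subprocess  # for the commented-out history walk below

# Lines that look like hard-coded credentials; an illustrative pattern only.
SECRET_RE = re.compile(r"(?i)(api_key|secret|password|token)\s*[:=]\s*\S+")

def find_secret_lines(diff_text: str):
    """Return lines in a diff that look like hard-coded credentials."""
    return [line for line in diff_text.splitlines() if SECRET_RE.search(line)]

# diff = subprocess.run(["git", "log", "-p"],
#                       capture_output=True, text=True).stdout
# for hit in find_secret_lines(diff):
#     print(hit)
```

Note that a removed line (prefixed `-` in the diff) is flagged just like an added one; that is precisely the point about history.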
Why does it happen?
Dorks 101
“Dorks” are search queries that leverage a search engine’s features, with targeted search strings, to pinpoint results. Here’s a fun list of Google dorks from the Exploit DB.
Before giving you the gist of it: if you want to go deep here, and I personally recommend that you do, here’s an invaluable lesson from a talented researcher. He discusses how to scan, how to use dorks, and what to look for (and where) when going through a manual process.
GitHub dorks are less complex than Google’s, simply because GitHub lacks the rich feature set Google offers. Still, searching for the right strings in the right places can do wonders. Just go ahead and search GitHub for any string from the next list; you’re in for a treat:
<code>password
dbpassword
dbuser
access_key
secret_access_key
bucket_password
redis_password
root_password
</code>
If you target the search at interesting files, like filename:.npmrc _auth or filename:.htpasswd, you can filter the type of leak you’re looking for. For further reading, see SecurityTrails’ great post.

Now that we’re generally familiar with dorks, taking them to Google reveals an entirely new field of features. Being the powerful search engine it is, Google offers inclusion and exclusion of strings, file formats, domains, URL paths, etc. Consider this search line:
<code>"MySQL_ROOT_PASSWORD:" "docker-compose" ext:yml
</code>
This targets a specific file format (yml) and a vulnerable file (docker-compose) in which developers tend to store their not-so-unique passwords. Go ahead and run this search; you’ll be surprised by what comes up.
Other interesting lines may target RSA keys or AWS credentials. Here’s another example:
<code>"-----BEGIN RSA PRIVATE KEY-----" ext:key
</code>
The options are endless, and the level of creativity and breadth of familiarity with different systems will determine the quality of the findings. Here’s a large list of dorks if you want to play a little.
When a researcher (or a motivated hacker) gets “involved” with a system, they go deep. They get to know it: API endpoints, naming conventions, interactions, and different versions of systems if those are exposed.
A not-very-good approach to securing systems is introducing complexity and randomness into their access paths instead of real security mechanisms. Security researchers trying to discover vulnerable paths and endpoints use “fuzzing” tools. These tools combine wordlists into candidate system paths and probe them to see whether valid responses come back. Such scanners will never find a completely random set of characters, but they are superb at identifying patterns and extracting endpoints you either forgot about or didn’t know existed.
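The core loop of such a fuzzer fits in a few lines. This is a toy illustration, not how any particular tool is implemented: the probe function is injected so the sketch stays offline, whereas a real run would issue an HTTP request per candidate and filter by status code and response size.

```python
def fuzz_paths(base: str, wordlist, probe):
    """Return candidate URLs for which probe() reports a hit (e.g. HTTP 200)."""
    hits = []
    for word in wordlist:
        url = f"{base.rstrip('/')}/{word}"  # join base URL and wordlist entry
        if probe(url):
            hits.append(url)
    return hits
```

Real fuzzers additionally mutate extensions, recurse into discovered directories, and filter false positives, but the wordlist-driven probing above is the essence.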
Remember: security through obscurity is not a good practice (though don’t ignore it completely).
That’s where the GitHub dorks we discussed earlier come in. Knowing a system’s endpoint naming convention, e.g. api.mydomain.com/v1/payments/..., can be very helpful. Searching the company’s GitHub repos (and its employees’) for the basic API string can often reveal those random endpoint names.
However, random strings still have a place when building systems; they are always a better option than incremental resource IDs, such as sequential user or order numbers.
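A minimal sketch of that point, using Python’s standard library (the function name is mine): if a resource ID must be exposed, an unguessable token beats an incremental integer that invites enumeration (/orders/1, /orders/2, ...). It is, of course, no substitute for actual access control.

```python
import secrets

def new_order_id() -> str:
    """128 bits of URL-safe randomness: infeasible to enumerate by guessing."""
    return secrets.token_urlsafe(16)
```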
Here’s an incredible repo of string lists called “SecLists”. It’s used by almost everyone in the industry, often with a personal twist in the context of the target, and it’s a massive resource. Another powerful tool for leveraging wordlists is ffuf, an ultra-fast fuzzer written in Go.
Security is often taken lightly in startups. Developers and managers tend to prioritize speed and delivery times over quality and security. They end up pushing clear-text secrets to code repos, reusing the same keys across systems, and using access keys where better options exist. These shortcuts can seem faster but prove detrimental down the road.
I’ve tried to show how those strings, which you think are protected by being in a private repo, can easily find their way into a public gist or an employee’s unintentionally public Git clone. If you lay the groundwork for secure work, using password-sharing tools, a central secret store, password policies, and multi-factor authentication, you’ll be able to keep making fast progress without sacrificing security.
“Move fast and break things” is not the best mantra in the context of information protection.
Knowing how hackers work is usually a very good first step toward understanding security and applying it as a defender. Consider the approaches above, and remember that this is a very limited list of the paths hackers take when penetrating systems. A good habit is to keep in mind the security aspects of anything being deployed, regardless of whether it is customer-facing or internal.
Managing security can sometimes be a pain in the ass, but rest assured: the mayhem you avoid by taking care of the very basic elements will keep you safe and sane.
Thank you for reading this far! I hope I’ve helped open some minds to risks that are out there and that we all miss or overlook.
Feel free to reach out with any feedback or questions. Discussion in any shape or form is most welcome!