Never Rely on UUID for Authentication: Generation Vulnerabilities and Best Practices

by Ivan Mochalov · 8m · May 1st, 2024

Too Long; Didn't Read

This article covers the risks of using UUIDs for authentication, the vulnerabilities in how they are generated, and strategies for implementing them securely.

UUID for authentication

There is hardly anyone nowadays who has never clicked that "Recover password" button in deep frustration. Even when the password seemed correct beyond doubt, the recovery itself usually goes smoothly: follow a link from an email and enter a new password (let's not fool anyone; it is hardly new, as you have just typed it three times in step 1 before pressing the obnoxious button).

The logic behind those email links, however, deserves close scrutiny, because generating them insecurely opens the door to unauthorized access to user accounts. Unfortunately, here is one example of a UUID-based recovery URL structure that many have probably encountered and that nevertheless does not follow security guidelines:

https://.../recover/d17ff6da-f5bf-11ee-9ce2-35a784c01695

If such a link is used, it generally means that anyone can reset your password, and it is as simple as that. This article dives into UUID generation methods and highlights selected insecure ways of applying them.

What is UUID

A UUID is a 128-bit label commonly used as a pseudo-random identifier with two valuable attributes: it is complex enough and unique enough. These are the key requirements for an ID that leaves the backend and is shown to the user in the frontend, or is sent over an API where it can be observed. Complexity makes it hard to guess or brute-force compared to id = 123. Uniqueness prevents collisions, where a newly generated ID duplicates one already in use, as would quickly happen with, e.g., a random number from 0 to 1000.
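As a rough illustration of the uniqueness point, the sketch below compares IDs drawn at random from 0 to 1000 with random (version 4) UUIDs. The sample size and the helper name `count_duplicates` are assumptions made for this example only.

```python
import random
import uuid

SAMPLE = 2000  # illustrative sample size, larger than the 1001 possible small IDs

# IDs drawn from 0..1000: with more draws than possible values,
# duplicates are unavoidable.
small_ids = [random.randint(0, 1000) for _ in range(SAMPLE)]

# Random UUIDs: duplicates in a sample this small are practically impossible.
uuid_ids = [uuid.uuid4() for _ in range(SAMPLE)]


def count_duplicates(ids):
    """Return how many entries repeat a value that already appeared."""
    return len(ids) - len(set(ids))


print("duplicates among small ids:", count_duplicates(small_ids))
print("duplicates among UUIDs:   ", count_duplicates(uuid_ids))
```

The first count is always at least 999, since 2000 draws cannot fit into 1001 distinct values, while the second is expected to be zero.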

The "enough" qualifiers have two sources. First, some versions of the Universally Unique IDentifier leave a small possibility of duplicates; this is easily mitigated by additional comparison logic and does not pose a threat, since the conditions for it can hardly be controlled. Second, complexity varies between UUID versions, as discussed in the rest of this article; in general it is assumed to be quite good, apart from the corner cases covered below.

Implementations in backend

Primary keys in database tables appear to rely on the same principles of being complex and unique as UUID does. With built-in generation methods widely available in programming languages and database management systems, UUID is often the first choice for identifying stored data entries and for joining tables, including subtables split by normalization. Sending user IDs straight from the database over an API in response to certain actions is also common practice, since it simplifies data flows: there is no need to generate extra temporary IDs and link them to the ones in production storage.

In the password reset example, the architecture most likely includes a table dedicated to this operation, which receives a new row with a generated UUID every time a user clicks the button. This starts the recovery process: an email is sent to the address associated with the user's user_id, and once the reset link is opened, the identifier in it determines which user's password is reset. There are, however, security guidelines for identifiers that are visible to users, and particular UUID implementations meet them with varying degrees of success.
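To make the flow concrete, here is a minimal in-memory sketch of such a reset table. Everything in it is hypothetical: the `reset_requests` dictionary stands in for a database table, `example.test` is a placeholder domain, and the function names are invented for illustration. It deliberately uses `uuid1()`, the kind of identifier the following sections show to be a poor choice here.

```python
import uuid
from typing import Dict, Optional

# Hypothetical stand-in for a "password reset requests" table:
# maps the generated UUID to the user_id that asked for the reset.
reset_requests: Dict[uuid.UUID, int] = {}


def request_reset(user_id: int) -> str:
    """Insert a row for this user and return the link that would be emailed."""
    token = uuid.uuid1()  # deliberately the weak choice examined in this article
    reset_requests[token] = user_id
    return f"https://example.test/recover/{token}"  # placeholder domain


def resolve_reset(token_text: str) -> Optional[int]:
    """Return the user_id the link belongs to, or None if it is unknown."""
    try:
        token = uuid.UUID(token_text)
    except ValueError:
        return None  # not a well-formed UUID at all
    return reset_requests.get(token)


link = request_reset(user_id=42)
print(link)
print(resolve_reset(link.rsplit("/", 1)[1]))
```

Note that the identifier in the URL is the only thing tying the request to a user, which is why its guessability matters so much.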

Outdated versions

Version 1 of UUID splits its 128 bits into a 48-bit MAC address of the device generating the identifier, a 60-bit timestamp, a 14-bit clock sequence that acts as an incrementing value, and 6 bits for the version and variant. The uniqueness guarantee is thus transferred from rules in code logic to hardware manufacturers, who are supposed to assign a correct value to every new machine they produce. Leaving only 60+14 bits as a changeable payload weakens the identifier, especially with such transparent logic behind it. Let's take a look at a sequence of consecutively generated UUID v1 values:

from uuid import uuid1

for _ in range(8):
    print(uuid1())

d17ff6da-f5bf-11ee-9ce2-35a784c01695
d17ff6db-f5bf-11ee-9ce2-35a784c01695
d17ff6dc-f5bf-11ee-9ce2-35a784c01695
d17ff6dd-f5bf-11ee-9ce2-35a784c01695
d17ff6de-f5bf-11ee-9ce2-35a784c01695
d17ff6df-f5bf-11ee-9ce2-35a784c01695
d17ff6e0-f5bf-11ee-9ce2-35a784c01695
d17ff6e1-f5bf-11ee-9ce2-35a784c01695

As can be seen, the "-f5bf-11ee-9ce2-35a784c01695" part stays the same all the time. The changing part is simply the hexadecimal (base 16) representation of the 32-bit sequence 3514824410 - 3514824417. This is a superficial example: production values are usually generated with larger gaps in time, so more of the 60-bit timestamp changes and a larger part of the identifier looks different across a sample of IDs. The core point stays the same: UUIDv1 is easily guessed, however random it initially appears.
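The transparency of the version 1 layout can be checked with Python's standard `uuid` module, which exposes the timestamp, clock sequence, and node as attributes. The sketch below decodes the first identifier from the list; the helper name `describe_v1` and the returned dictionary keys are assumptions made for this example.

```python
import uuid
from datetime import datetime, timedelta, timezone

# UUID v1 timestamps count 100-nanosecond intervals since 15 October 1582.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def describe_v1(u: uuid.UUID) -> dict:
    """Split a version 1 UUID into its readable components."""
    if u.version != 1:
        raise ValueError("not a version 1 UUID")
    created = GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)
    return {
        "time_low": u.time_low,        # the part that changed in the list above
        "created_utc": created,        # generation time recovered from the ID
        "clock_seq": u.clock_seq,      # 14-bit clock sequence
        "node": f"{u.node:012x}",      # 48-bit node (MAC address) part
    }


info = describe_v1(uuid.UUID("d17ff6da-f5bf-11ee-9ce2-35a784c01695"))
for key, value in info.items():
    print(f"{key}: {value}")
```

Anyone holding a single such link can therefore read off when it was generated and which node produced it.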

Take just the first and last values from the list of 8 IDs above. Since identifiers are generated strictly consecutively, it is clear that only 6 IDs were generated between the two (subtract the hexadecimal changing parts), and their values can be determined exactly. Extrapolating this logic gives the so-called Sandwich attack, which brute-forces a UUID from two known border values. The attack flow is straightforward: the attacker generates UUID A just before the target UUID is generated and UUID B right after. Assuming the same device with a static 48-bit MAC part is responsible for all three, this gives the attacker a sequence of potential IDs between A and B that contains the target UUID. Depending on how close in time the generated IDs are to the target, the range can be small enough to brute-force: check every possible UUID and find the existing one among the empty ones.
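The enumeration step can be sketched in a few lines of Python. This is an illustration under the assumptions stated above (same node and clock sequence for all three identifiers); the function name `v1_candidates_between` is invented for this example, and it only lists candidates without contacting any server.

```python
import uuid
from typing import Iterator


def v1_candidates_between(a: uuid.UUID, b: uuid.UUID) -> Iterator[uuid.UUID]:
    """Yield every version 1 UUID whose timestamp lies strictly between a and b."""
    if a.version != 1 or b.version != 1:
        raise ValueError("both borders must be version 1 UUIDs")
    if (a.node, a.clock_seq) != (b.node, b.clock_seq):
        raise ValueError("borders were not produced by the same generator state")
    for t in range(a.time + 1, b.time):
        yield uuid.UUID(fields=(
            t & 0xFFFFFFFF,                 # time_low
            (t >> 32) & 0xFFFF,             # time_mid
            ((t >> 48) & 0x0FFF) | 0x1000,  # time_hi with version 1 bits
            a.clock_seq_hi_variant,         # copied from the border UUID
            a.clock_seq_low,
            a.node,
        ))


border_a = uuid.UUID("d17ff6da-f5bf-11ee-9ce2-35a784c01695")
border_b = uuid.UUID("d17ff6e1-f5bf-11ee-9ce2-35a784c01695")
candidates = list(v1_candidates_between(border_a, border_b))
print(len(candidates))
for candidate in candidates:
    print(candidate)
```

With the first and last values of the list as borders, this yields exactly the 6 identifiers in between. In practice the gap is measured in 100-nanosecond ticks, so the candidate count grows with the time between A and B.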

For the password recovery endpoint described previously, this translates to sending hundreds or thousands of API requests with consecutive UUIDs until a response indicates that the URL exists. The attacker requests recovery links for two accounts they control, timed as closely as possible around pressing the recovery button for the target account, which they cannot access but whose email or login they know. The emails to the controlled accounts reveal recovery UUIDs A and B, and the link that resets the target account's password can then be brute-forced without access to the actual reset email.

The vulnerability originates from relying solely on UUIDv1 for user authentication. Sending a recovery link that grants the right to reset a password assumes that whoever follows the link is the user who was supposed to receive it. This is where authentication fails: UUIDv1 is exposed to straightforward brute force, in the same way as if someone's door could be opened by knowing what the keys to both neighboring doors look like.

Cryptographically insecure functions

The first version of UUID is mainly considered legacy, partly because its generation logic leaves only a small portion of the identifier hard to predict. Other versions, like v4, address this by keeping as little space as possible for versioning and leaving 122 bits as random payload. This brings the total number of possible values to a whopping 2^122, which for now is considered to satisfy the "enough" part of the uniqueness requirement and thus to meet security standards. A brute-force vulnerability could still appear if a generation implementation significantly reduced the number of bits that are actually random. But surely no production tool or library would do that?
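The 122-bit figure follows from the layout: 128 bits minus 4 version bits and 2 variant bits. A small Python check, using only the standard `uuid` module (the constant names are chosen for this example):

```python
import uuid

TOTAL_BITS = 128
VERSION_BITS = 4   # fixed to 0100 for version 4
VARIANT_BITS = 2   # fixed to 10 for the RFC 4122 variant
RANDOM_BITS = TOTAL_BITS - VERSION_BITS - VARIANT_BITS

sample = uuid.uuid4()

# The fixed fields are the same on every generated value.
print("version:", sample.version)
print("variant:", sample.variant)
print("random bits:", RANDOM_BITS)
print(f"possible values: about {2 ** RANDOM_BITS:.3e}")
```

The count only holds if all 122 bits really are unpredictable, which depends entirely on the random source behind the generator.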

Let's indulge in cryptography a bit and take a close look at a common JavaScript implementation of UUID generation. Here is the core of a randomUUID() function that relies on Math.random() for pseudo-random number generation:

Math.floor(Math.random()*0x10);

And here is the random function itself, shortened to the part relevant to this article:

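// hi and lo together form the generator state (the seed)
hi = 36969 * (hi & 0xFFFF) + (hi >> 16);
lo = 18273 * (lo & 0xFFFF) + (lo >> 16);
// the upper 16 bits of the result come from hi, the lower 16 bits from lo
return ((hi << 16) + (lo & 0xFFFF)) / Math.pow(2, 32);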

Pseudo-random generation requires a seed value as the base on which mathematical operations are performed to produce sequences of random-enough numbers. Such functions depend solely on the seed: if they are reinitialized with the same seed as before, the output sequence will match. The seed in the JavaScript function in question consists of the variables hi and lo, each a 32-bit unsigned integer (0 through 4294967295 decimal). Each output combines bits from both, so reproducing the full sequence in general requires knowing both values.

Two 32-bit integers together give 2^64 possible combinations of hi and lo behind the initialized function producing UUIDs. If the hi and lo values become known, it takes no effort to duplicate the generation function and learn all the values it has produced and will produce in the future. By itself, 64 bits can be considered resistant to brute force within a practical time frame. As always, the issue comes from the specific implementation. Math.random() packs 16 bits from each of hi and lo into a 32-bit result; randomUUID(), however, multiplies that result by 0x10 and applies Math.floor(), which keeps only its top 4 bits, and those come exclusively from hi. This does not affect generation in any way, but it causes the cryptographic assumptions to fall apart: only 2^32 possible combinations remain for the entire seed (there is no need to brute-force both hi and lo, as lo can be set to any value and does not influence the output).
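The claim that lo does not influence the output can be checked with a direct Python port of the snippet shown earlier. This is a sketch of that snippet only, not a verified copy of any particular JavaScript engine; the class name `WeakRandom`, the helper `hex_digits`, and the seed values are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF


class WeakRandom:
    """Python port of the two-state generator from the snippet above."""

    def __init__(self, hi: int, lo: int) -> None:
        self.hi = hi & MASK32
        self.lo = lo & MASK32

    def random(self) -> float:
        self.hi = (36969 * (self.hi & 0xFFFF) + (self.hi >> 16)) & MASK32
        self.lo = (18273 * (self.lo & 0xFFFF) + (self.lo >> 16)) & MASK32
        combined = ((self.hi << 16) & MASK32) + (self.lo & 0xFFFF)
        return combined / 2 ** 32


def hex_digits(rng: WeakRandom, count: int) -> str:
    """Mimic Math.floor(Math.random() * 0x10) for `count` hex digits."""
    return "".join(format(int(rng.random() * 16), "x") for _ in range(count))


# Same hi, completely different lo: the produced digits are identical,
# because floor(x * 16) keeps only the top 4 bits, which come from hi.
digits_a = hex_digits(WeakRandom(hi=0x12345678, lo=1), 32)
digits_b = hex_digits(WeakRandom(hi=0x12345678, lo=0xDEADBEEF), 32)
print(digits_a)
print(digits_b)
print("identical:", digits_a == digits_b)
```

Since the low 16 bits contributed by lo can never carry into the upper half of the 32-bit result, an attacker only has to search over hi.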

The brute-force flow consists of acquiring a single ID and testing the possible hi values that could have generated it. With some optimization and average laptop hardware, this can take just a couple of minutes, and unlike the Sandwich attack it does not require sending lots of requests to the server: all operations are performed offline. The result is a replica of the generation function state used in the backend, which yields all past and future reset links in the password recovery example. Preventing the vulnerability is straightforward and calls for cryptographically secure functions, e.g. crypto.randomUUID().
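For comparison, a Python backend has equivalent options in its standard library. The sketch below is one possible approach rather than a prescribed one: `secrets.token_urlsafe` is designed for security tokens, and CPython's `uuid.uuid4()` draws its bytes from `os.urandom`. The function names and the 32-byte token size are assumptions made for this example.

```python
import secrets
import uuid


def new_reset_token() -> str:
    """Return a URL-safe token built from 32 bytes of OS-level randomness."""
    return secrets.token_urlsafe(32)


def new_random_uuid() -> uuid.UUID:
    """Return a version 4 UUID; CPython fills it from os.urandom."""
    return uuid.uuid4()


print(new_reset_token())
print(new_random_uuid())
```

Neither value is derived from a small reproducible seed, so observing one token gives no practical help in predicting the next.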

Takeaways

UUID is a great concept and makes the lives of data engineers a lot easier in many application areas. However, it should never be used for authentication, as this article brings to light flaws in certain generation techniques. This obviously does not mean that all UUIDs are insecure. Still, the basic approach is to persuade people not to use them for security at all, which is more efficient and, well, more secure than documenting complex limits on which versions to use or how not to generate them for such purposes.

About Author

Ivan Mochalov (@mochalov)

Competitors Analyst Teamlead at Yandex
