What follows is a two-part series on session management, inspired by extensive conversations with over 70 developers and our own intensive research. We will explore different session management practices, identify their issues and converge on a solution to them. Through it all, I hope to leave you with clarity on how to manage user sessions (and auth tokens) for your application. In 20 minutes, we summarise the important information that took us hundreds of hours to obtain and document.
This is part 1 in a two-part series on session management.
Part 1: Introduction to session management, analysis of most commonly used session flows, and best practices
Part 2: Analysis of a new, open source session flow that is secure and easy to integrate into existing systems
Specifically, in part 1, we cover
Note: Do not confuse session management with OAuth, as the latter is a protocol designed only for the purpose of delegation. Session management, for the purpose of this article, is about how auth tokens are handled, stored and changed during an active session — whether it be for OAuth flows, or for server-client session flows.
Session security is an important consideration in the design of any system that requires communication between a server and a client. Improper security can lead to user accounts being vulnerable to unauthorized access. OWASP (Open Web Application Security Project — leading authority for security) considers the improper implementation of authorisation / authentication as the second biggest risk to application security. Several notable hacks illustrate this point:
It is tricky, time-consuming and expensive to implement user session management correctly. According to an operating partner at a16z (a top-tier VC firm) who was formerly CSO (Chief Security Officer) at Box, authentication and authorisation is the single largest item in organisations’ security budgets.
This is the tip of the iceberg but we hope it is enough for anyone to realize that they could be the next Titanic if they do not correct their course.
We’ll briefly explore the two predominant types of tokens that are used in session management. Several of the flows we discuss require an understanding of these tokens.
While these two token types have different properties, theft of either type can lead to unauthorised access to a user’s account.
Auth tokens are stored on the frontend and the backend and are frequently sent over the network (depending on the session flow). As such, they are vulnerable to several types of attacks.
While it may seem that these attacks are unlikely, it is important to take session security seriously and deploy appropriate measures. The vulnerability of the system is based on the cumulative probabilities of all the types of attacks.
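To make the point about cumulative probabilities concrete, here is a minimal sketch. It assumes, purely for illustration, that each attack type succeeds independently with some probability over a given period. The numbers are made up and `breach_probability` is a hypothetical helper, not a figure or method from our research.

```python
from math import prod

def breach_probability(attack_probabilities):
    """Chance that at least one attack succeeds, assuming independence."""
    return 1 - prod(1 - p for p in attack_probabilities)

# Hypothetical per-year success probabilities for four attack types.
risks = [0.01, 0.02, 0.005, 0.01]
combined = breach_probability(risks)
print(f"combined risk: {combined:.2%}")
```

Even when every individual risk looks small, the combined risk is larger than any one of them, which is why each attack surface deserves attention.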
Further on, we discuss how each of these attacks could lead to token theft and we explore best practices to mitigate against these types of attacks.
To keep tokens safe, a system architect should not only prevent tokens from being stolen but, as a fail-safe, also ensure that should token theft occur, the system is able to detect it as quickly as possible. Detection is an important concept to consider and will be explored in the next section.
Prevention is a first line of defense and all attempts should be made to minimize theft. However, auth tokens are fundamentally susceptible to theft because they are transmitted to an untrusted party (the app’s frontend). Hence, detection of token theft has an important role to play in the security of the system. Existing detection methods rely largely on heuristic algorithms such as tracking sudden changes in IP addresses and browser (or mobile) fingerprints and flagging “unusual user behaviour”. Unfortunately, these methods themselves can be inaccurate, easy to spoof and difficult to implement. However, there is a reliable way to integrate detection of theft in the session management flow and in part 2, we propose a flow that does that.
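For reference, the heuristic approach mentioned above can be sketched roughly as follows. This is an illustration only: `ClientSnapshot` and `theft_signals` are hypothetical names, and a real fingerprint would be derived from many more attributes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClientSnapshot:
    ip_address: str
    fingerprint: str  # assumed to be some hash of browser or device attributes

def theft_signals(at_login, current):
    """Return the heuristic warning signs raised by the current request."""
    signals = []
    if current.ip_address != at_login.ip_address:
        signals.append("ip_changed")
    if current.fingerprint != at_login.fingerprint:
        signals.append("fingerprint_changed")
    return signals

at_login = ClientSnapshot("203.0.113.7", "fp-a")
print(theft_signals(at_login, ClientSnapshot("198.51.100.9", "fp-a")))
```

The weaknesses are visible even in this small sketch: a legitimate user who moves from wifi to mobile data raises `ip_changed`, while an attacker who replays the victim's fingerprint from the same network raises nothing.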
On a related note, in cases where session vulnerabilities are publicly exposed, companies may release statements stating that there was no indication that the vulnerability was exploited. However, what they fail to mention is how extensively their system would be able to detect token theft in the first place!
We’ve identified the most commonly used session management flows and classified them into 5 groups.
Flow 1
**Damage Analysis** The critical auth token is perpetually exposed over three attack surfaces: the frontend, during transit and the backend.
_Effect of stolen auth tokens:_ The attacker would have unauthorised access to the victim’s account until the token’s expiry time, which could be weeks or months!
_Detection of theft:_ Token theft may only be detected through the use of heuristic algorithms or if the user notifies the provider/developer of the service.
_Once detected:_ If the flow is implemented using JWTs, it may be difficult to revoke the token. However, stolen opaque access tokens can be easily revoked.
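The difference in revocability can be sketched as follows. This is a simplified illustration, not a real JWT implementation: the signed token below is an HMAC-signed stand-in for a JWT, and all names (`opaque_sessions`, `SIGNING_KEY`, `denylist`) are hypothetical.

```python
import hashlib
import hmac
import secrets
import time

# Opaque tokens: the server keeps state, so revocation is a simple delete.
opaque_sessions = {}  # token -> user id (stand-in for a database table)

def issue_opaque(user_id):
    token = secrets.token_urlsafe(32)
    opaque_sessions[token] = user_id
    return token

def verify_opaque(token):
    return opaque_sessions.get(token)

def revoke_opaque(token):
    opaque_sessions.pop(token, None)

# Signed, stateless tokens (JWT-like): valid until expiry unless the server
# also keeps a denylist, which brings back a lookup on every request.
SIGNING_KEY = secrets.token_bytes(32)  # hypothetical server-side secret
denylist = set()

def _sign(payload):
    return hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()

def issue_signed(user_id, ttl_seconds):
    payload = f"{user_id}.{int(time.time()) + ttl_seconds}"
    return f"{payload}.{_sign(payload)}"

def verify_signed(token):
    payload, _, signature = token.rpartition(".")
    if not hmac.compare_digest(signature.encode(), _sign(payload).encode()):
        return None
    user_id, _, expiry = payload.rpartition(".")
    if int(expiry) < time.time() or token in denylist:
        return None
    return user_id
```

Without the `denylist` check, nothing the server does can invalidate a signed token before its expiry, short of changing the signing key for everyone.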
Flow 2
**Damage Analysis** The critical auth token is perpetually exposed over three attack surfaces: the frontend, during transit and the backend.
_Effect of stolen auth tokens:_ An attacker must constantly renew their token to maintain unauthorised access.
_Detection of theft:_ To stay logged in, both the attacker and the victim need to request a new access token from the server before the current (stolen) token expires. Both would do this using the same access token. If the same token is used twice for this request, the system could deduce that there has been a theft, depending on how the frontend is implemented. A shorter-lived access token would enable quicker detection of theft, but it may also result in a poor user experience due to repeated logouts when there is no theft.
_Once detected:_ The access token associated with this session would need to be revoked. It may be complex to stop the attack if the access token is a JWT.
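The reuse check described for Flow 2 could look roughly like this. It is a minimal in-memory sketch under our own assumptions: `RotatingSessionStore` and `TheftSuspected` are hypothetical names, and the tokens are opaque.

```python
import secrets

class TheftSuspected(Exception):
    """Raised when an already-exchanged token is presented again."""

class RotatingSessionStore:
    def __init__(self):
        self.active = {}   # current token -> session id
        self.retired = {}  # already-exchanged token -> session id

    def login(self, session_id):
        token = secrets.token_urlsafe(32)
        self.active[token] = session_id
        return token

    def is_valid(self, token):
        return token in self.active

    def renew(self, token):
        if token in self.retired:
            # The same token was used twice: treat it as possible theft
            # and revoke every token belonging to that session.
            session_id = self.retired[token]
            self.revoke_session(session_id)
            raise TheftSuspected(session_id)
        session_id = self.active.pop(token, None)
        if session_id is None:
            raise KeyError("unknown token")
        self.retired[token] = session_id
        new_token = secrets.token_urlsafe(32)
        self.active[new_token] = session_id
        return new_token

    def revoke_session(self, session_id):
        self.active = {t: s for t, s in self.active.items() if s != session_id}
```

A real system would also have to tolerate legitimate retries (for example, a renewal response lost on a flaky network), which is one reason the detection depends on how the frontend is implemented.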
Flow 3
**Damage Analysis** The critical auth token is perpetually exposed over three attack surfaces: the frontend, during transit and the backend.
_Effect of stolen auth tokens:_ As long as either the victim or the attacker is active, the attacker would be able to maintain unauthorised access.
_Detection of theft:_ Token theft may only be detected through the use of heuristic algorithms or if the user notifies the provider/developer of the service.
_Once detected:_ The access token associated with this session would need to be revoked. It may be complex to stop the attack if the access token is a JWT.
Flow 4
**Damage Analysis** There are no critical auth tokens in this case. However, this method frequently exposes the user’s credentials during transit, making it susceptible to attack.
_Effect of stolen auth tokens:_ If the token is stolen, the attacker will only be able to do damage for a short period of time.
_Detection of theft:_ Token theft may only be detected through the use of heuristic algorithms or if the user notifies the provider/developer of the service.
_Once detected:_ Access tokens need not be revoked since they are short lived. However, if needed, opaque access tokens can be revoked by removing them from the database.
Flow 5
**Damage Analysis** The critical auth token (the refresh token) is perpetually exposed over two attack surfaces, the frontend and the backend, and is occasionally exposed during transit.
_Effect of stolen auth tokens:_ Access token stolen: The attacker will have unauthorised access for a short period of time (until token expiry).
Refresh token stolen: The attacker can use the stolen refresh token to get new access tokens and have unauthorised access to the victim’s account over a long period of time. In rare scenarios (described below), this theft can be detected and the damage can be minimised.
_Detection of theft:_ Access token stolen: This theft may only be detected through the use of heuristic algorithms or if the user notifies the provider/developer of the service.
Refresh token stolen: Detection of theft is possible in certain scenarios and implementations. For example:
_Once detected:_ Access tokens need not be revoked since they are short lived. However, if needed, opaque access tokens can be revoked easily by removing them from the database.
Refresh tokens can be revoked easily by removing them from the database.
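A minimal sketch of this access/refresh token arrangement follows. The dictionaries stand in for database tables, the 15-minute lifetime is an assumed value, and every function name is hypothetical. Time is passed in explicitly so that the behaviour is easy to follow.

```python
import secrets

ACCESS_TTL_SECONDS = 15 * 60  # assumed lifetime for the short-lived token

refresh_tokens = {}  # refresh token -> user id
access_tokens = {}   # access token -> (user id, expiry timestamp)

def _issue_access(user_id, now):
    token = secrets.token_urlsafe(32)
    access_tokens[token] = (user_id, now + ACCESS_TTL_SECONDS)
    return token

def login(user_id, now):
    refresh = secrets.token_urlsafe(32)
    refresh_tokens[refresh] = user_id
    return _issue_access(user_id, now), refresh

def authenticate(access_token, now):
    entry = access_tokens.get(access_token)
    if entry is None or entry[1] <= now:
        return None  # unknown or expired
    return entry[0]

def refresh_access(refresh_token, now):
    user_id = refresh_tokens.get(refresh_token)
    if user_id is None:
        return None  # revoked or never issued
    return _issue_access(user_id, now)

def revoke_refresh(refresh_token):
    # Revocation is just a delete from the store.
    refresh_tokens.pop(refresh_token, None)
```

Once the refresh token is removed, a thief holding it can no longer obtain new access tokens, and any access token already issued dies at its expiry time.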
These flows are not designed with token theft detection as a requirement. In Part 2, we propose an alternate session flow that we believe would be far more secure. For now, we’ll revisit the types of attacks that sessions are vulnerable to and some steps to mitigate against the risks.
Man in the middle (MITM) attacks are possible in the following scenarios.
When using HTTP or incorrectly implementing HTTPS: If the application does not use HTTPS and secure cookies, an attacker could connect to the same network as the victim, monitor the network packets and see the auth tokens in plain text during transit. Often, even when the application has an SSL certificate, an incorrect implementation can lead to MITM attacks. For example, ESPN.com sends auth cookies over unsecured HTTP (as of 10th May 2019), and this Netcraft article elaborates on the prevalence of incorrectly implemented HTTPS.
When using a proxy: Two of the last three organizations I worked at monitored all the traffic on their network. At workplaces, devices typically use the corporate wifi network. Companies can require connected devices to trust their network proxy as an SSL Certificate Authority as a prerequisite for connecting to the wifi. This would enable them (or a malicious actor) to see auth token information during transmission.
**Methods of prevention:** The easiest way to protect against this type of attack is to use HTTPS and secure cookies throughout your application. However, this doesn’t prevent attacks that result from the use of a proxy. One could take extra precaution by using public/private key pairs that are fixed per device. The frontend and backend would exchange their public keys at the point of initialization (before the user logs in). For subsequent communication, the token data could be encrypted using the public keys. This limits transit attacks to only the initial public key exchange. There is a modification that would enable the prevention of replay attacks, but that is not covered in this blog post. (Feel free to reach out if you would like to know more.) Regardless, some of the described flows (Flow 5 and the proposed flow in Part 2) aim to minimize exposure of the critical token by reducing its frequency of transit.
If an application provides access/refresh tokens to other apps via OAuth, then there is a risk of the main app’s auth tokens being stolen if the other app’s servers are compromised. For reference, see the recent Docker Hub case study mentioned at the start.
The solution to this is to have appropriate measures in place to detect stolen refresh tokens and to use only short-lived access tokens.
In XSS, an attacker can maliciously inject JavaScript code into an application running on the victim’s browser. The injected code reads and transmits auth tokens to the attacker (read more about XSS attacks here).
This can be prevented fairly easily by storing auth tokens in cookies flagged HttpOnly (and Secure, so that they are only sent over HTTPS). Do not use localStorage to store auth tokens, because it is accessible to JavaScript. All described session flows can be protected against this attack by following this recommendation.
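As a rough illustration of the cookie attributes involved, the sketch below builds a `Set-Cookie` header value with Python's standard `http.cookies` module. The cookie name `session_token`, the `SameSite` value and the helper name are our own assumptions. `HttpOnly` hides the cookie from JavaScript, while `Secure` restricts it to HTTPS connections.

```python
from http.cookies import SimpleCookie

def build_session_cookie(token, max_age_seconds):
    """Return a Set-Cookie header value for an auth token."""
    cookie = SimpleCookie()
    cookie["session_token"] = token  # hypothetical cookie name
    morsel = cookie["session_token"]
    morsel["httponly"] = True   # not readable via document.cookie
    morsel["secure"] = True     # only sent over HTTPS
    morsel["samesite"] = "Lax"  # assumed default; also limits cross-site sends
    morsel["path"] = "/"
    morsel["max-age"] = max_age_seconds
    return morsel.OutputString()

header_value = build_session_cookie("abc123", 900)
print("Set-Cookie:", header_value)
```

Most web frameworks expose the same flags through their own response APIs, so in practice you would set them there rather than build the header by hand.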
A CSRF attack is not used to steal auth tokens; instead, it allows an attacker to piggyback on an existing active session (read more here).
Prevention of CSRF attacks typically requires the use of an anti-CSRF token or SameSite cookies.
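A minimal sketch of the anti-CSRF token idea, assuming the common pattern in which the server issues a random token per session and requires the frontend to echo it back (for example in a custom request header). The names below are hypothetical and the dictionary stands in for session storage.

```python
import hmac
import secrets

csrf_tokens = {}  # session id -> anti-CSRF token

def issue_csrf_token(session_id):
    token = secrets.token_urlsafe(32)
    csrf_tokens[session_id] = token
    return token

def is_request_allowed(session_id, submitted_token):
    """Allow a state-changing request only if the echoed token matches."""
    expected = csrf_tokens.get(session_id)
    if expected is None or submitted_token is None:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), submitted_token.encode())
```

A forged cross-site request carries the victim's cookies automatically, but the attacker's page cannot read the anti-CSRF token, so the request fails this check.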
If an attacker manages to access the database/file system (either via database injection attack or actual server access), they could potentially get hold of currently active auth tokens or the JWT / SSL private key (theft of these keys is potentially even worse than stolen passwords). This would enable them to easily hijack sessions — leading to serious security consequences. Do note that the attacker could be an employee within your organisation (especially for high growth startups — are all the proper access controls in place for employee database/server access?).
To control damage caused by unauthorized access to your database or filesystem, you could do the following:
Session fixation may be possible if you have anonymous sessions for your web application (read more here).
The best way to solve this is to generate a new set of auth tokens each time a user logs in and to invalidate the old ones if any. This is done per device and not per user. Doing so will safeguard all described session flows against this attack.
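In code, the recommendation could look something like the sketch below. It is an in-memory illustration with hypothetical names; `sessions` stands in for your session store.

```python
import secrets

sessions = {}  # token -> (user id or None for anonymous, device id)

def start_anonymous_session(device_id):
    token = secrets.token_urlsafe(32)
    sessions[token] = (None, device_id)
    return token

def login(user_id, device_id, pre_login_token=None):
    # Invalidate the pre-login token and any older tokens that this user
    # holds on this device, then issue a fresh one.
    stale = [
        t for t, (u, d) in sessions.items()
        if t == pre_login_token or (u == user_id and d == device_id)
    ]
    for t in stale:
        del sessions[t]
    token = secrets.token_urlsafe(32)
    sessions[token] = (user_id, device_id)
    return token

def current_user(token):
    entry = sessions.get(token)
    return entry[0] if entry else None
```

Because the token that existed before login is never upgraded to an authenticated one, a token planted by an attacker is worthless once the victim logs in. Sessions on the user's other devices are left untouched.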
An attacker with sufficient resources can incessantly ‘guess’ auth tokens until one of the attempts proves successful. This would provide them with all the access the stolen token confers.
The best way to prevent this is to use long auth tokens with high entropy.
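A minimal sketch of generating such tokens with Python's standard `secrets` module. The 32-byte size is an assumption on our part, and `single_guess_hit_probability` is a hypothetical helper for reasoning about guessing odds.

```python
import secrets

TOKEN_BYTES = 32  # 256 bits of randomness; an assumed size

def new_auth_token():
    # secrets draws from the operating system's cryptographic random source.
    return secrets.token_urlsafe(TOKEN_BYTES)

def single_guess_hit_probability(token_bytes, active_tokens):
    """Chance that one random guess matches any currently active token."""
    return active_tokens / 2 ** (8 * token_bytes)

print(new_auth_token())
print(single_guess_hit_probability(TOKEN_BYTES, 1_000_000))
```

Avoid building tokens from predictable inputs such as timestamps, user ids or a non-cryptographic random generator, since those shrink the space an attacker has to search.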
An attacker with physical access to a victim’s device can steal auth tokens in multiple ways.
Both of the above issues are even more probable if an app is being used on a public computer — which has to be factored in.
The only way to really fix this problem is to have token theft detection in place and to enable users to log out of all devices. This would mean being able to revoke all refresh and access tokens for that user. Flows that rely on long-lived JWT access tokens might find this difficult to do.
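With tokens stored server-side, "log out of all devices" can be a single delete. The sketch below uses an in-memory SQLite database; the table layout and function names are our own assumptions.

```python
import secrets
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE tokens ("
    "token TEXT PRIMARY KEY, user_id TEXT NOT NULL, "
    "device_id TEXT NOT NULL, kind TEXT NOT NULL)"  # kind: access or refresh
)

def issue_token(user_id, device_id, kind):
    token = secrets.token_urlsafe(32)
    db.execute(
        "INSERT INTO tokens VALUES (?, ?, ?, ?)",
        (token, user_id, device_id, kind),
    )
    return token

def is_valid(token):
    row = db.execute("SELECT 1 FROM tokens WHERE token = ?", (token,)).fetchone()
    return row is not None

def logout_everywhere(user_id):
    """Revoke every access and refresh token the user holds, on all devices."""
    cursor = db.execute("DELETE FROM tokens WHERE user_id = ?", (user_id,))
    return cursor.rowcount  # number of tokens revoked
```

A stateless JWT never appears in such a table, which is why flows built on long-lived JWT access tokens need extra machinery (such as a denylist or key rotation) to achieve the same effect.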
This wraps up the best practices to prevent common types of attacks and this section of the post. We hope it helps and provides the answers you were looking for. Please do leave any comments you have.
Studying all these session flows enabled us to conceptualize a flow (inspired by IETF RFC 6819) which enables greater security and detection of theft. We subsequently built the flow for our own service (Qually.com) and, on request of the developer community, decided to open source our code. Click the button below to navigate to a post which discusses this flow and has links to the GitHub repo — should you be interested. Do check it out and let us know what you think!