Our team recently implemented an internal static website that allows employees to download technical reports. Since we're heavy AWS users, we naturally decided to host it on AWS S3, which provides a dedicated feature to build static websites (S3 static website hosting).

Very quickly, however, we ran into an issue: AWS S3 does not provide any native, out-of-the-box authentication/authorization process. Because it was an internal-only website, we needed some kind of authorization mechanism to prevent non-authorized users from accessing our website and reports.

We needed to find a solution to secure our static website on AWS S3.

Discovering the Solution With CloudFront and Lambda@Edge

We use Okta for all Identity and User Management, so whatever solution we found had to plug in with Okta.

Okta has several authentication/authorization flows, all of which require the application to perform a back-end check, such as verifying that the response/token returned by Okta is legit. So we needed to find a way to carry out these checks/actions on a static website which uses a back end that we don't control.

That's when we learned about AWS Lambda@Edge, which lets you run Lambda Functions at different stages of a request and response to and from CloudFront.

We can trigger a Lambda Function at four different stages:

- When the request enters CloudFront (viewer-request)
- When the request goes out to the origin (origin-request)
- When the response is returned from the origin (origin-response)
- When the response is returned from CloudFront (viewer-response)

We saw a solution to our original issue: trigger a Lambda at the viewer-request stage that would check if the user is authorized, with two possible outcomes:

- If the user is authorized, let the request continue and return the restricted content
- If the user is not authorized, send an HTTP response to redirect them to a login page

Implementing the Lambda@Edge function

We'll cover here the key elements and main issues we faced.
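To make the viewer-request idea concrete, here is a minimal sketch of such a handler in Node.js. It is not the code from our repository: the login URL, the cookie name and the `isAuthorized` check are placeholders we made up for illustration, and the real authorization check is more involved than testing that a cookie exists. Only the event and response shapes follow the CloudFront Lambda@Edge format.

```javascript
// Hypothetical values, for illustration only.
const LOGIN_URL = 'https://example.okta.com/oauth2/v1/authorize';
const SESSION_COOKIE = 'session';

// CloudFront passes headers as { name: [{ key, value }, ...] }.
// Returns the value of the named cookie, or null if it is absent.
function getCookie(request, name) {
  const cookieHeaders = (request.headers && request.headers.cookie) || [];
  for (const header of cookieHeaders) {
    for (const part of header.value.split(';')) {
      const [key, ...rest] = part.trim().split('=');
      if (key === name) return rest.join('=');
    }
  }
  return null;
}

// Placeholder check (assumption): a real implementation must validate the
// cookie content, not merely test that the cookie is present.
function isAuthorized(request) {
  return getCookie(request, SESSION_COOKIE) !== null;
}

// Either let the request through or answer with a redirect to the login page.
function handleViewerRequest(request) {
  if (isAuthorized(request)) {
    return request; // Returning the request lets CloudFront continue.
  }
  return {
    status: '302',
    statusDescription: 'Found',
    headers: { location: [{ key: 'Location', value: LOGIN_URL }] },
  };
}

// Lambda@Edge entry point for the viewer-request trigger.
exports.handler = async (event) => handleViewerRequest(event.Records[0].cf.request);
```

Returning the request object unchanged tells CloudFront to keep processing it, while returning a response object short-circuits the request and sends the redirect straight back to the viewer.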
The complete code is available here. Feel free to use it in your project!

Lambda@Edge restrictions and caveats

During the development of the solution, we ran into several restrictions and caveats of Lambda@Edge.

1 – Environment variables

Lambda@Edge Functions cannot use environment variables. That meant that we needed to find another way to pass data to our function. We opted for SSM parameters and templated parameter names in the Node.js code (we use Terraform to render the template when deploying the Lambda Function).

2 – Lambda package size limit

For viewer events (reminder: we use the viewer-request event), the Lambda package can be 1 MB at most. One MB is pretty small considering that it includes all dependencies (except of course the runtime/standard library) of your Lambda Function.

That's why we had to rewrite our Lambda in Node.js instead of the original Python, because the Python package with its dependencies exceeded the 1 MB limit.

3 – Lambda region

Lambda@Edge functions can only be created in the us-east-1 region. It's not a big issue but it means you'll need to:

- Provision your AWS resources in that region to make things easier
- Have a separate AWS provider in Terraform to access the bucket you want to protect if it's not in us-east-1

4 – Lambda role permission

The IAM execution role associated with the Lambda@Edge functions must allow the principal service edgelambda.amazonaws.com in addition to the usual lambda.amazonaws.com. See AWS - Setting IAM permissions and roles for Lambda@Edge.

Authorization mechanism with Okta

Once we managed the above restrictions and caveats, we focused on the authentication/authorization.

Okta offers several ways to authenticate and authorize users. We decided to go with OAuth2, the industry-standard protocol for authorization.

Note: Okta implements the OpenID Connect (OIDC) standard, which adds a thin authentication layer on top of OAuth2 (that's the purpose of the ID token mentioned hereafter).
Our solution would also work with pure OAuth2 with minimal modifications (removal of the ID token use in the code).

OAuth2 itself offers several authorization flows depending on the kind of application using it. In our case, we need the Authorization Code flow.

developer.okta.com provides a complete diagram of the Authorization Code flow that shows how it works.

To summarize the flow:

1. Our Lambda Function redirects the user to Okta, where they will be prompted to log in
2. Okta redirects the user to our website/Lambda Function with a code
3. Our Lambda Function checks if the code is legit and exchanges it for access and ID tokens by sending a request to Okta
4. Depending on the result returned by Okta, we:
   - Allow or deny access to the restricted content
   - If access is allowed, save the access and ID tokens in a cookie to avoid having to re-authorize the user on every page

Using JSON Web Tokens to store authorization result

So far we have a working authorization process; however, we need to check the access/ID token on every request (a malicious user could forge an invalid cookie/tokens). Checking the tokens means sending a request to Okta and waiting for the response on every page the user visits, which slows down the loading times significantly and is clearly sub-optimal.

Note: While local verification of the Okta token is theoretically possible, as of this writing the SDK provided by Okta uses an LRU (in-memory) cache when fetching the keys used to check the tokens. Because we're using AWS Lambda, and memory/state of the program isn't kept between invocations, the SDK is useless to us: it would still send one HTTP request to Okta for every user request, to retrieve the JWKs (JSON Web Keys). Worse, there is a limitation of 10 JWK requests per minute, which would make our solution stop working if there are more than 10 requests per minute.

We decided to use JSON Web Tokens to work around this.
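As a rough illustration of why a signed token removes the per-request call to Okta, the sketch below signs and verifies a JWT using only Node.js's built-in `crypto` module. It is hand-rolled for readability and is not the code from our repository; the RS256 algorithm, the function names and the key pair generated at load time are all assumptions made for this example (a real deployment would load a persistent key, and would likely rely on a maintained JWT library).

```javascript
const crypto = require('crypto');

// Demo-only key pair (assumption): a real deployment would load a persistent
// key instead of generating a new one every time the code is loaded.
const { privateKey, publicKey } = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });

const toBase64Url = (text) => Buffer.from(text).toString('base64url');

// Wrap the Okta tokens in a signed JWT that expires after ttlSeconds.
function createSessionJwt(tokens, ttlSeconds, nowSeconds = Math.floor(Date.now() / 1000)) {
  const header = toBase64Url(JSON.stringify({ alg: 'RS256', typ: 'JWT' }));
  const payload = toBase64Url(
    JSON.stringify({ ...tokens, iat: nowSeconds, exp: nowSeconds + ttlSeconds })
  );
  const signingInput = `${header}.${payload}`;
  const signature = crypto
    .sign('sha256', Buffer.from(signingInput), privateKey)
    .toString('base64url');
  return `${signingInput}.${signature}`;
}

// Check signature and expiration locally, without any network call.
// Returns the claims when the token is valid, null otherwise.
function verifySessionJwt(jwt, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = String(jwt).split('.');
  if (parts.length !== 3) return null;
  const [header, payload, signature] = parts;
  try {
    const { alg } = JSON.parse(Buffer.from(header, 'base64url').toString('utf8'));
    if (alg !== 'RS256') return null; // Reject unexpected algorithms.
    const signatureOk = crypto.verify(
      'sha256',
      Buffer.from(`${header}.${payload}`),
      publicKey,
      Buffer.from(signature, 'base64url')
    );
    if (!signatureOk) return null;
    const claims = JSON.parse(Buffer.from(payload, 'base64url').toString('utf8'));
    if (typeof claims.exp !== 'number' || claims.exp <= nowSeconds) return null;
    return claims;
  } catch (err) {
    return null; // Malformed token.
  }
}
```

For example, `verifySessionJwt(createSessionJwt({ access_token: '...', id_token: '...' }, 300))` returns the claims for the next five minutes and `null` afterwards, and any change to the payload invalidates the signature.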
The initial authorization process is the same except that, instead of saving the access/ID tokens into a cookie, we create a JWT containing these tokens, and then save the JWT into a cookie.

Since the JWT is cryptographically signed:

- A malicious actor cannot forge one (they would need the private key used to sign them)
- The checking step required on every request is fast: we traded a long and I/O expensive HTTP request for a quick cryptographic check

Note on JWT expiration and renewal

The JWT has a pre-defined expiration time which should be reasonably short, to avoid having a valid JWT containing expired or revoked access/ID tokens. Another option would be to check the access/ID tokens regularly and revoke the associated JWT if needed, but then we would need a revocation mechanism, which would make things more complex.

Finally, as suggested above, the tokens provided by Okta have an expiration time. It is possible to transparently renew them using a refresh token (so the user doesn't have to re-login when the tokens expire) but we didn't implement that.

Conclusion

While adding OAuth2 authentication to an S3 static bucket with Okta (or any other OAuth2 provider) is possible in an AWS-integrated and secure manner, it's certainly not straightforward.

It requires writing a middleware between AWS and the OAuth2 provider (Okta in our case) using Lambda@Edge. We had to do the following ourselves:

- Validate the user authentication
- Remember the user authentication
- Refresh the user authentication (not implemented in our solution)
- Revoke the user authentication (TTL is implemented, but revocation before the end of the TTL is not)

Finally, a bunch of AWS resources must be created to glue everything together and make it work.

All this is worth the effort, because it works and our website is now more secure.
You can find the code of the Lambda@Edge as well as the infrastructure (Terraform) here: https://github.com/GuiTeK/aws-s3-oauth2-okta

Author: Guillaume Truchot, Site Reliability Engineer at Algolia, GitHub


How to Add OAuth2 Authentication to an S3 Static Bucket With Okta
