AWS S3 Static Website Hosting
It is cheap, scalable, and “performant”, especially when it tag-teams with CloudFront.
This is a documentation of how to host a Single Page Application (React in this case) on AWS S3, with SSL over CloudFront, using this pet project of mine as an example.
It is a simple static site, so no redux is used, though this setup would also work with redux. It is going to be react and react router mainly. Here are the specifics: the bundler I am using is webpack: ^3.5.5.
Apart from plain storage, S3 can also host static websites.
Note that each bucket is meant for only 1 website; that is, you cannot have a bucket called my-static-websites and have each directory host 1 website. No. It is going to be one bucket per website.
Set up the static website hosting configuration as such for the bucket. Take note of the Endpoint.
This setup is saying: serve index.html as the index document. So when we upload the react project into the bucket, navigating to /something instead of / will show you a blank screen, or the error.html page if one was set up :( What is happening? Well, the /something path is looking for a matching file in the S3 bucket, but it was not to be found. Since this is a Single Page Application, there is only 1 html file, 1 GOD html file.
So here is the challenge: we need to map all paths to the index.html file.
Since this is a react project, we do not need to map each path to a specific html page like a typical website; the index.html will load the javascript bundle, and react router will get to work to show users the correct page based on the path.
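If you prefer the command line over the console, the same static website hosting configuration can be applied with the AWS CLI. A sketch, with my-static-site as a placeholder bucket name and the error document optional:

```shell
# enable static website hosting on the bucket, serving index.html as the index document
# (my-static-site is a placeholder bucket name)
aws s3 website s3://my-static-site/ --index-document index.html --error-document error.html
```

The bucket's website Endpoint will then be shown in the console under the static website hosting settings.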
Hygiene pages
Not sure if this is the correct term for the sitemap.xml and robots.txt files, but yea, you'll need these files for SEO. They go into the root directory of your bucket as siblings to the index.html file, and the urls to them are, eventually, https://www.yourdomain.com/robots.txt and https://www.yourdomain.com/sitemap.xml respectively.
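For reference, a minimal robots.txt that allows all crawlers and points them at the sitemap could look like this (the domain is a placeholder):

```
# robots.txt: allow all crawlers and advertise the sitemap
User-agent: *
Allow: /
Sitemap: https://www.yourdomain.com/sitemap.xml
```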
CloudFront is the CDN of AWS. It can handle the mapping of the routes, on top of caching the site.
Start off by creating a web distribution. The key configurations I would like to mention are:
- Origin Domain Name: the Endpoint of the static-website-hosting-enabled S3 bucket, as mentioned in the previous section.
- Viewer Protocol Policy: Redirect HTTP to HTTPS, to ensure your website is always viewed over HTTPS and there is no duplicate instance under the HTTP protocol accessible by the public.
- Cache Based on Selected Request Headers: Whitelist, and add in the Origin header. This is to avoid any CORS related errors.
- SSL Certificate: Custom SSL Certificate, and upload your own ssl certificate, along with the private key and CA bundle, via AWS Certificate Manager.
- Compress Objects Automatically: Yes. CloudFront will automatically compress your uncompressed assets from S3 and improve your page speed, by Google standards. Exchange all the deep apache/nginx/IIS setups for just a radio button; that's like trading a wight for a dragon.

Create the CloudFront distribution and wait for it to get deployed. Take note of the distribution's Domain Name.
After creating the CloudFront distribution, while its status is In Progress, proceed to the Error Pages tab. Handle response codes 404 and 403 with Customize Error Response.
Google recommends 1 week or 604800 seconds of caching.
What we are doing here is setting up CloudFront to handle missing html pages, which typically occurs when a user enters an invalid path or, in particular, refreshes a path other than the root path.
When that happens, S3 responds with an error code, and CloudFront intercepts it and returns the index.html page instead, for the case of a Single Page Application like this project example.
Why do we need to handle 403 as well? It is because this response code, instead of 404, is returned by Amazon S3 for assets that are not present. For instance, a url of https://yourdomain.com/somewhere will be looking for a file called somewhere (without extension) that does not exist.
PS. It used to return 404, but it seems to return 403 now; either way, it is best to handle both response codes.
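The same two rules can be expressed in a CloudFront DistributionConfig. This is a sketch of the CustomErrorResponses fragment as the CloudFront API shapes it, with the TTL matching the 604800 seconds mentioned above:

```json
"CustomErrorResponses": {
  "Quantity": 2,
  "Items": [
    {
      "ErrorCode": 403,
      "ResponsePagePath": "/index.html",
      "ResponseCode": "200",
      "ErrorCachingMinTTL": 604800
    },
    {
      "ErrorCode": 404,
      "ResponsePagePath": "/index.html",
      "ResponseCode": "200",
      "ErrorCachingMinTTL": 604800
    }
  ]
}
```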
I intend to use the www version of the domain.
Go to the DNS zone file and set it up as such.
This setup indicates:
- domain.com will be redirected to www.domain.com
- http will be redirected to https

I am using namecheap.com as my DNS service provider, and they come with an option to redirect the non-www domain, over either https or http, to the https www domain at the DNS level.
However, if your DNS service provider does not provide this function, you can use AWS S3 to do the redirect instead. Create another bucket with these settings, and set the DNS A record of the root domain to the endpoint of this bucket.
What will be achieved is that all non-www requests will be directed to this bucket. This bucket will in turn redirect the request to the www domain, which points to the bucket where the files are. And yes, it will be a 301 redirect. In case you are wondering, this is the significance of a 301 redirect.
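If you would rather script this redirect bucket than click through the console, the AWS CLI can set the same website configuration (bucket and domain names are placeholders):

```shell
# turn the root-domain bucket into a pure redirect to the https www domain
aws s3api put-bucket-website --bucket yourdomain.com \
  --website-configuration '{"RedirectAllRequestsTo":{"HostName":"www.yourdomain.com","Protocol":"https"}}'
```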
Conversion of http to https will be handled by the CloudFront configuration (Viewer Protocol Policy) that was set up previously.
At this point, you should be able to access your site like a normal website. Refreshing at a path other than the root path should also work.
All non-https requests will be redirected under the https protocol.
All non-www requests will be redirected to the www domain, under the https protocol as well.
Bots and crawlers should be able to access your robots.txt and sitemap.xml files as usual.
Pros
Cons
- With the long caching period, users may keep being served a stale index.html file after a redeployment. Alternatively, you can give a lower caching period for only the index.html file.

In this section of the article, I will be documenting how to automate the deployment process of a site in such a setup, from just the command line.
To start off, you will need to create an IAM user and give it the necessary S3 permissions. Note the access key id and the secret access key, as well as the User ARN.
IAM users are access control configurations in your AWS account, principally to answer the question of who can do what to which of the services under your account.
Let's call this user iam_user.
Change the bucket policy to allow this iam_user to make changes to the bucket.
{
  "Version": "2012-10-17",
  "Id": "someID",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789:user/iam_user"
      },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::bucket-name",
        "arn:aws:s3:::bucket-name/*"
      ]
    }
  ]
}
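Note that object-level actions like s3:PutObject apply to the bucket's objects (arn:aws:s3:::bucket-name/*), not just the bucket itself, so the policy should list both resources. If you would rather apply the policy from the command line than the console, something like this should work, assuming the policy above is saved as policy.json:

```shell
# attach the bucket policy granting iam_user access to the bucket and its objects
aws s3api put-bucket-policy --bucket bucket-name --policy file://policy.json
```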
As this is a simple, mostly static, website, there are no testing scripts or any CI server set up for the deployment procedure. It will just be a simple task of uploading new files to the correct bucket in S3 using the AWS CLI.
Cleanup
But before uploading, make sure you clean up the distribution folder where you build your files for the production environment. Since I use webpack as my bundler, I utilise the clean-webpack-plugin to help me dispose of old files before building new ones. This is to prevent uploading the same old assets again to the bucket.
# webpack.config
const CleanWebpackPlugin = require('clean-webpack-plugin')
const HtmlWebpackPlugin = require('html-webpack-plugin')

const pathsToClean = ["dist"]
const cleanOptions = {}
...
output: {
  path: path.resolve(__dirname, "dist", "assets"), // all files are bundled into the dist/assets sub-directory
  publicPath: '/assets/',
  filename: 'bundle.js'
},
...
plugins: [
  ...,
  new CleanWebpackPlugin(pathsToClean, cleanOptions), // cleanup the whole "dist" folder
  new HtmlWebpackPlugin({
    template: "./src/index.production.html",
    filename: "../index.html" // index.html is placed 1 directory up, in the dist directory itself
  }),
...]
Uploading
Now to upload the files to S3.
To prevent any Tom, Dick, and Harry from being able to do so, authentication is required. This is where all the work on IAM comes into play.
We will use a script to do the uploading, with custom configuration to authenticate the request.
You can use the --dryrun flag to test your script before actually doing the upload. This is the final version of my script, with the --dryrun flag still in place:
aws s3 cp ./dist s3://better-cover-letter --recursive --exclude "*.DS_Store" --acl public-read --cache-control public,max-age=604800 --dryrun --profile iam_user
The --exclude flag is to prevent the upload of the irritating, ever-present .DS_Store file in macOS.
The --acl flag will set the access control level of the files. Make them publicly readable so people can access your site; otherwise they will be slapped with a 403 Forbidden message.
The --cache-control flag adds the cache-control header to the S3 objects when CloudFront calls for them. These cache control headers will be passed on to the browser to leverage browser caching and thereby increase page speed. 604800 is 1 week in seconds, so this max-age value will cache these assets for a week.
[Google] recommend[s] a minimum cache time of one week and preferably up to one year for static assets, or assets that change infrequently
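As a quick sanity check on that number, one week in seconds works out as follows:

```shell
# 7 days x 24 hours x 60 minutes x 60 seconds
week_in_seconds=$((7 * 24 * 60 * 60))
echo "$week_in_seconds"   # 604800, the max-age value used above
```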
The --profile flag is used to set the specific IAM user credential to authenticate this operation. As I am using this same macbook pro for both my work and my personal projects, I have multiple AWS accounts to handle, thus the need for this flag to differentiate the different IAM users. Check out AWS CLI named profiles for more information. These are my config and credentials files for your reference.
# ~/.aws/config
[default]
region=us-west-2
output=text

# ~/.aws/credentials
[iam_user]
aws_access_key_id=something
aws_secret_access_key=something

[company_user]
aws_access_key_id=something_else
aws_secret_access_key=something_else
The aws_access_key_id and aws_secret_access_key are specific to the iam_user that was created.
Once you are ready, you can remove the --dryrun flag and do a test run to ensure that your files are indeed uploaded to the correct bucket. Yes, a test run. It is not the end of the deployment step; we can go further and completely automate the whole process.
NOTE: AWS S3 does not charge for data transfer in to the bucket, only out. So feel free to spam deployments. (In fact, S3 does not charge for data transfer out to CloudFront either.)
Combine the Steps
As it stands now, we have to build our site first using webpack -p --config webpack.config.js to generate the files, then upload the files using the aws s3 cp command.
To make our lives better, we can create a new script command to run these commands one after another, without us having to wait for the first command to finish and then manually execute the other.
# package.json
...
"scripts": {
  ...
  "deploy": "webpack -p --config webpack.config.prod.js && aws s3 cp ./dist s3://better-cover-letter --recursive --exclude '*.DS_Store' --acl public-read --cache-control public,max-age=604800 --dryrun --profile iam_user"
  ...
}
So just run npm run deploy and these will happen in chronological order:
1. webpack builds the production files into the dist folder (based on my webpack config file)
2. the aws s3 cp command uploads the contents of the dist folder to the S3 bucket

There it is, the fully automated process for uploading the static website.
If you are bundling your javascript files with a hash like me, you will find your S3 bucket accumulating old js files instead of having them replaced by the new ones, since they are different files by virtue of the hash in their file names, e.g. bundle-0af19d01880334b789.js. Not so if you are uploading just bundle.js, which will replace any bundle.js present in the bucket.
Since storing files in S3 is not free, albeit not that expensive either, it is still wise to remove files that you will never be using again.
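To see why the hashed bundles pile up, note that any change in content produces a different hash and therefore a different file name. This sketch uses md5sum purely for illustration; webpack's actual hashing differs, but the effect is the same:

```shell
# two different bundle contents yield two different hashed file names
hash_v1=$(printf 'console.log("v1")' | md5sum | cut -c1-18)
hash_v2=$(printf 'console.log("v2")' | md5sum | cut -c1-18)
echo "bundle-$hash_v1.js"
echo "bundle-$hash_v2.js"
# after a redeploy, the old bundle would linger in the bucket alongside the new one
```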
So we can use the AWS CLI again to remove these old js files before uploading (note: I am leaving the files in the root directory of the bucket untouched, just cleaning up the assets folder).
aws s3 rm s3://better-cover-letter/assets --recursive --profile iam_user --dryrun
Once again, combine them in the deploy script.
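The combined deploy script might then look like this, with the same bucket and profile as the earlier examples (keep the --dryrun flag on both aws commands until you have verified the behaviour):

```
# package.json: build, clear out the old hashed assets, then upload the fresh build
"deploy": "webpack -p --config webpack.config.prod.js && aws s3 rm s3://better-cover-letter/assets --recursive --profile iam_user && aws s3 cp ./dist s3://better-cover-letter --recursive --exclude '*.DS_Store' --acl public-read --cache-control public,max-age=604800 --profile iam_user"
```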