Building a Highly Scalable Imgur Clone with Lambda and S3

Written by elliot_f | Published 2018/02/14


So my previous 2 attempts at becoming a millionaire overnight have been resounding flops. Sure, I’ve managed to drum up a bit of excitement, but I’ve not yet got a Ferrari sitting in the driveway of my own place…

This third attempt will surely be a winner, right?

I’m going to build a website that is able to rival popular image hosting site Imgur. This will be a highly scalable, resilient and low cost solution that will leverage serverless technology in order to become a resounding success. More specifically, I’m going to use S3 to host the single page application and AWS Lambda to run the series of endpoints that will make up the site.

In terms of monetization? I guess I’ll wait until I see the investments rolling in tomorrow from the hordes of angel investors looking for a bargain! I’ll be launching my own car into orbit soon enough…

(Image: traffic projections.)

Requirements

Ok, as always, we need to define a series of requirements that our project must adhere to in order for this to be considered a success.

  • The site must be highly scalable from the outset. This will ensure that when it goes viral, it won’t crash. Thanks to our endpoints being based on Lambda, we don’t have to worry too much about scalability.
  • The site must have some form of Authentication/Authorisation so that people can’t just start hitting an unauthenticated endpoint and uploading a million images and ruining our fun.
  • The site must not cost me an arm and a leg to host!

These are fairly achievable for a small project and shouldn’t take too long to implement, whilst also showing how powerful a combination of Cognito, Lambda and S3 could be.

Stretch Tasks

In future articles I’ll be extending this so that I can play about with services such as Rekognition and DynamoDB as practice for certification exams.

  • In a future article we’ll be looking at how I can use Rekognition to extract a series of tags from any images uploaded. These tags will then be stored within DynamoDB and allow people to view certain categories of images.
  • I also want to implement a comment and an upvote/downvote system so that users can comment and vote on their favorite content on the site.

The Frontend in Vue.js

The frontend of my latest million dollar idea is going to be built using Vue.js 2. This will allow users to log in, register and upload new images to the site. It will also display any and all images available on the site in true Imgur style.

Version 0.01 will look like this:

(Screenshot: our simple frontend.)

Implementation

Our frontend will utilise the amazon-cognito-identity-js and aws-sdk node modules in order to communicate with our AWS Cognito service. We’ll need to pass the following information into our frontend’s config file in order to connect to Amazon’s Cognito service:

    export default {
      region: 'eu-west-1',
      IdentityPoolId: 'eu-west-1_9IBAarCx9',
      UserPoolId: 'eu-west-1:853957954650',
      ClientId: '43duengi4ldb6jel18p84sgq22',
      s3SignedUrl: 'https://rvv1a9to8j.execute-api.eu-west-1.amazonaws.com/dev/upload-node'
    }

At first, the thought of having these included in my frontend code was somewhat terrifying. If you were like me then you may have thought — “If I expose these then people will be able to pretend they are me and rack up millions of dollars of expenses”.

However, these values are identifiers rather than secrets. They are only used to hit Cognito’s unauthenticated endpoints, such as sign up and sign in, so exposing them does not hand anyone my AWS credentials.

Our Cognito Service

With the amazon-cognito-identity-js library we can easily create signup, authentication and verification functions. I won’t post the full cognito service file; the configure and signup functions are the parts worth looking at.

The full source file for this can be found here: https://github.com/elliotforbes/imgur-clone/blob/master/src/imgur-frontend/src/cognito/cognito.js
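
As a rough idea of what such a service can look like, here is a minimal sketch and not the code from the repository. The function names and signatures are my own assumptions for illustration. The library calls themselves (CognitoUserPool, CognitoUserAttribute, CognitoUser, AuthenticationDetails, signUp and authenticateUser) come from amazon-cognito-identity-js, and the module is passed in as a parameter purely to keep the helpers easy to test.

```javascript
// Sketch only: function and parameter names are illustrative, not the repo's code.
// `cognito` is the amazon-cognito-identity-js module, e.g.
//   const cognito = require('amazon-cognito-identity-js')

// Build the user pool handle from the frontend config object.
function configure(cognito, config) {
  return new cognito.CognitoUserPool({
    UserPoolId: config.UserPoolId,
    ClientId: config.ClientId
  })
}

// Register a new user; resolves with the (still unconfirmed) CognitoUser.
function signup(cognito, userPool, username, email, password) {
  const attributes = [
    new cognito.CognitoUserAttribute({ Name: 'email', Value: email })
  ]
  return new Promise((resolve, reject) => {
    userPool.signUp(username, password, attributes, null, (err, result) => {
      if (err) return reject(err)
      resolve(result.user)
    })
  })
}

// Sign in and resolve with the ID token (a JWT string).
function authenticate(cognito, userPool, username, password) {
  const user = new cognito.CognitoUser({ Username: username, Pool: userPool })
  const details = new cognito.AuthenticationDetails({
    Username: username,
    Password: password
  })
  return new Promise((resolve, reject) => {
    user.authenticateUser(details, {
      onSuccess: session => resolve(session.getIdToken().getJwtToken()),
      onFailure: reject
    })
  })
}
```

The token returned by authenticate is what the frontend would later send in the Authorization header when calling the protected endpoints.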

With a little bit of setup in the AWS console, and 140-ish lines of code, we have a fully working user profile system. This is awesome when you consider how resilient and fully fleshed out this is with very little work. I’ve not had to define a schema, set up a database, ensure database resiliency or anything like that. I simply configured a user pool and I was good to go.

Over the next few days I’ll be doing an in-depth tutorial on my YouTube channel, https://www.youtube.com/tutorialedge, on how you can implement your own Cognito user management system.

Our Lambda Functions

Now that we’ve got a simple frontend sorted, we need our lambda functions that will allow us to do cool things like upload files, and retrieve the links to all of the images in our bucket.

In order for this to work, we’ll need 2 lambda functions to get us started.

  • A function that will return a signed url that will allow us to upload to our s3 bucket
  • A function that will return a simple JSON list of all of the items within our bucket.

In order to deploy our functions we will again be using serverless.com’s cli.

File Upload Lambda Function

So our “upload” function will not actually be the one performing the upload of the image to our s3 bucket. It will essentially just fetch a signed url that will then be hit with a PUT HTTP request in order to upload to S3.
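
A function along these lines might look like the sketch below. It is an approximation under stated assumptions and not the repository’s code: makeUploadHandler, the bucket name and the name/type query parameters are all made up for the example. The only AWS SDK (v2) call used is s3.getSignedUrl('putObject', params, callback), and the S3 client is passed in so the handler can be exercised without AWS.

```javascript
// Sketch only. In a real deployment the client would come from the AWS SDK v2:
//   const AWS = require('aws-sdk')
//   module.exports.handler = makeUploadHandler(new AWS.S3(), 'my-image-bucket')
// `makeUploadHandler`, the bucket name and the `name`/`type` query parameters
// are assumptions made for this example.
function makeUploadHandler(s3, bucket) {
  return (event, context, callback) => {
    const query = event.queryStringParameters || {}
    if (!query.name || !query.type) {
      return callback(null, {
        statusCode: 400,
        body: JSON.stringify({ error: 'name and type are required' })
      })
    }

    const s3Params = {
      Bucket: bucket,
      Key: query.name,         // the file name
      ContentType: query.type, // should match the header sent with the PUT
      Expires: 60,             // seconds the signed URL stays valid
      ACL: 'public-read'       // lets anyone view the uploaded image
    }

    s3.getSignedUrl('putObject', s3Params, (err, uploadURL) => {
      if (err) return callback(err)
      callback(null, {
        statusCode: 200,
        headers: { 'Access-Control-Allow-Origin': '*' },
        body: JSON.stringify({ uploadURL })
      })
    })
  }
}
```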

A few key things to note here. We are creating an s3Params object that includes the bucket name we want to upload to, the key, which will be the file name, the content type, how long the signed url will remain valid for and the ACL. This ACL is the access control list and we need to set this to public-read in order for people to be able to view the images within the bucket.

This then calls s3.getSignedUrl() which returns the URL we subsequently upload to. Nice and simple.
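
On the frontend, the two-step flow (ask for a signed URL, then PUT the file to it) could be sketched roughly as follows. This is a hedged illustration: uploadImage, the uploadURL response field and the name/type query parameters are assumptions of mine, and fetchFn simply defaults to the browser’s fetch so it can be swapped out in tests.

```javascript
// Sketch only: `uploadImage`, the `uploadURL` field and the query parameter
// names are assumptions, not the repo's actual frontend code.
// `file` is a browser File/Blob with `name` and `type` properties.
function uploadImage(signedUrlEndpoint, idToken, file, fetchFn = fetch) {
  const query = '?name=' + encodeURIComponent(file.name) +
                '&type=' + encodeURIComponent(file.type)

  // Step 1: ask the protected Lambda endpoint for a signed URL.
  return fetchFn(signedUrlEndpoint + query, {
    headers: { Authorization: idToken }
  })
    .then(res => {
      if (!res.ok) throw new Error('Could not get a signed URL: ' + res.status)
      return res.json()
    })
    // Step 2: PUT the file straight to S3 using that URL.
    .then(({ uploadURL }) =>
      fetchFn(uploadURL, {
        method: 'PUT',
        headers: { 'Content-Type': file.type },
        body: file
      })
    )
    .then(res => {
      if (!res.ok) throw new Error('Upload failed: ' + res.status)
      return res
    })
}
```

The Content-Type sent with the PUT should match the one the URL was signed with. Depending on how the URL was signed, S3 may also expect an x-amz-acl header on the PUT, so treat the header set here as a starting point.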

List All Lambda Function

Our list all lambda function will do the job of querying all of the objects within our s3 bucket and returning them as a json response.
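
A minimal sketch of such a function is shown below, again under assumptions rather than copied from the repository: makeListHandler, the bucket name and the shape of the JSON response are illustrative. It uses the AWS SDK v2 call s3.listObjectsV2, which returns at most 1,000 objects per request, so a bigger bucket would need pagination.

```javascript
// Sketch only. In a real deployment:
//   const AWS = require('aws-sdk')
//   module.exports.handler = makeListHandler(new AWS.S3(), 'my-image-bucket')
// `makeListHandler`, the bucket name and the response shape are assumptions.
function makeListHandler(s3, bucket) {
  return (event, context, callback) => {
    s3.listObjectsV2({ Bucket: bucket }, (err, data) => {
      if (err) return callback(err)

      // Keep only the fields the frontend needs to show each image.
      const images = (data.Contents || []).map(item => ({
        key: item.Key,
        size: item.Size,
        lastModified: item.LastModified
      }))

      callback(null, {
        statusCode: 200,
        headers: { 'Access-Control-Allow-Origin': '*' },
        body: JSON.stringify(images)
      })
    })
  }
}
```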

We could extend this further by associating all uploaded images with a key and a location in DynamoDB and then have the function return a paginated list of results from that, but in terms of a minimum viable product, this will do for now.

Serverless Deployments

When it came to deploying my lambda functions, I could do so with ease using the serverless cli.

Within the serverless.yml file I can define the IAM permissions as well as the authorizers used to protect my lambda functions. These authorizers ensure that people can’t just script something up and hit these endpoints 5,000 times per second: callers need a registered account and a valid token in the Authorization header.
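
To give a feel for it, a serverless.yml for two functions like these could look roughly like the fragment below. This is a sketch and not the project’s actual file: the service name, bucket name, handler paths, the list path, the runtime and the ACCOUNT_ID/USER_POOL_ID parts of the ARN are all placeholders or assumptions.

```yaml
# Sketch only: service, bucket, handler and ARN values are placeholders.
service: imgur-clone

provider:
  name: aws
  runtime: nodejs6.10   # assumption: a Node.js runtime available in early 2018
  region: eu-west-1
  iamRoleStatements:
    # Needed to sign PUT requests that also set a public-read ACL.
    - Effect: Allow
      Action:
        - s3:PutObject
        - s3:PutObjectAcl
      Resource: "arn:aws:s3:::my-image-bucket/*"
    # Needed to list the objects in the bucket.
    - Effect: Allow
      Action:
        - s3:ListBucket
      Resource: "arn:aws:s3:::my-image-bucket"

functions:
  upload:
    handler: upload.handler
    events:
      - http:
          path: upload-node
          method: get
          cors: true
          authorizer:
            arn: arn:aws:cognito-idp:eu-west-1:ACCOUNT_ID:userpool/USER_POOL_ID
  list:
    handler: list.handler
    events:
      - http:
          path: list
          method: get
          cors: true
          authorizer:
            arn: arn:aws:cognito-idp:eu-west-1:ACCOUNT_ID:userpool/USER_POOL_ID
```

Pointing the authorizer at the Cognito user pool ARN is what makes API Gateway reject requests that arrive without a valid token, before the function is ever invoked.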

Whenever I make a change to a function, I call serverless deploy and it deploys my serverless empire and provides me with the API endpoints that I can now hit with any HTTP requests. This deployment takes all of about 10 seconds for both of my functions so it is fairly quick.

The Joys of Serverless

One of the main tools underlying this entire project was the serverless cli. This greatly improved the way I was able to write my lambdas and subsequently deploy them. If you are interested in learning more about how you can manage all of your lambdas then I recommend checking out my other article titled, Managing your Lambda Empire with Serverless:

Managing Your Lambda Empire with Serverless (hackernoon.com)

Full Source Code

If you want to try this out yourself, the full source code for this project can be found here:

https://github.com/elliotforbes/imgur-clone

Demo Link

You can find the final product at this url: [POSSIBLY? NSFW] http://imgur-serverless-clone.s3-website-eu-west-1.amazonaws.com/#/

Test it out, register for an account and upload something appropriate! Bear in mind though that I haven’t had the time to implement small things like password resets.

Conclusion

This fairly simple project has a fully functioning user account system with email verification. It is also resilient and highly scalable. If you were to try and engineer something like this on dedicated servers, you would require multiple servers in multiple data centers, and you would have to set up load balancers and bake resiliency into your underlying systems.

By leveraging serverless and the Cognito service, we are able to build a system that would have taken months or even years of development work to bring to this same standard on legacy infrastructure.

Hopefully, you found this article entertaining and educational! If you liked this then feel free to let me know in the comments section below or by tweeting me Elliot Forbes. I am also on LinkedIn should you wish to connect!


Published by HackerNoon on 2018/02/14