In one of our applications we added the possibility for our users to upload files they want to keep a registry of. At this first stage we are storing them in our own file system, but sooner or later (better sooner than later), as users upload more and more files, we will need to move them to a storage service that specialises in exactly that. With this in mind we came to the conclusion that a good way to go is to upload the files to Amazon S3. It is this new functionality that this blog post will focus on, but first let's put on a sheep's fur and take part in selling Amazon S3 to you.
Why S3?
Amazon S3 is designed so that developers don't have to worry about how they will store their data, whether it will be secure, or whether they'll have enough storage available. S3 frees developers from the cost of setting up their own object storage solution, as well as the cost of maintaining and scaling their servers, so they can focus their time on innovating with the data they have instead of figuring out how to store it.
Amazon S3 is built to fulfil these design requirements:
It gives customers full control over who has access to their data, as well as the possibility to secure their data both in transit and at rest.
You can store data with 99.999999999% durability and 99.99% availability. All failures in the system must be tolerated or repaired without any downtime.
Amazon S3 can scale along several dimensions: storage, request rates and number of users.
It must be fast enough that server-side latency is insignificant compared with Internet latency.
One of the most important design requirements is that Amazon S3 must support the functional needs of both Amazon's internal applications and external developers' applications of any kind. This means it has to be fast and reliable enough to run Amazon's own websites, while being flexible enough that any developer can use it for any data storage need.
Some specifications
In Amazon S3 you can store objects from 1 byte to 5 terabytes each, and you can store an unlimited number of them. Objects live in containers called buckets, and from a bucket you can only retrieve objects via a unique, developer-assigned key. A bucket is created in one of several regions across the globe, so you can choose the region that suits you best to optimise latency and minimise costs.
Amazon provides authentication mechanisms to ensure that your data is secure from unauthorised access: you can make objects private or public, and even grant or revoke rights on them to specific users.
You have the option to upload to or download from Amazon S3 over a secure channel, and you can also encrypt your data as an additional protection. For developers, Amazon S3 exposes REST and SOAP interfaces, so it can work with any Internet development toolkit.
OK, now that we've talked a little about the insides of S3, let's watch some kittens try to catch a red dot… ahahahahah damn cats. Oops, sorry, got distracted there. Where were we? Oh yes… So now we're gonna protect your data.
Protecting your data
In Amazon S3, only the bucket and object owners have access to the data they create. It supports multiple access control mechanisms, as well as encryption both for secure transit and for secure storage on disk. You have several mechanisms to control who can access your data, as well as how, when and even where they can access it. Amazon S3 gives you four different access control mechanisms:
IAM (Identity and Access Management) policies: with this mechanism an organisation with several employees is able to create and manage multiple users under a single AWS account, and to grant those users fine-grained control over your Amazon S3 buckets or objects.
ACLs (Access Control Lists): this mechanism is used to grant certain permissions on individual objects.
Bucket policies: these are used to add or deny permissions on a certain bucket, applying them to all or just some of the objects within that bucket.
Query string authentication: with this mechanism the URLs that you share are valid only for a predefined expiration time.
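As an illustration of that last mechanism, here is a minimal sketch of generating an expiring URL with version 1 of the aws-sdk gem; the bucket and key names are made up:

```ruby
require 'aws-sdk' # aws-sdk v1

s3 = AWS::S3.new(
  access_key_id: ENV['AWS_ACCESS_KEY_ID'],
  secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
)

# 'my-uploads' and 'uploads/report.pdf' are hypothetical names
object = s3.buckets['my-uploads'].objects['uploads/report.pdf']

# Generate a temporary read URL that expires in 10 minutes
puts object.url_for(:read, expires: 10 * 60)
```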
After you have secured access to your data you'll probably want to secure the communication between you and Amazon S3. To do this you can make your upload/download requests via the SSL-encrypted endpoints, using the HTTPS protocol. As an additional security measure you can have Amazon S3 manage the encryption and decryption of your data: if you prefer that Amazon S3 manage your encryption keys, you can use Amazon S3 Server Side Encryption (SSE); if, on the contrary, you'd rather manage your own encryption keys, you can use Amazon S3 Server Side Encryption with Customer-Provided Keys (SSE-C). With both options Amazon S3 will encrypt your data on write and decrypt it on retrieval.
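To give an idea of what SSE looks like in practice, here is a minimal sketch with version 1 of the aws-sdk gem; the bucket name and key are made up:

```ruby
require 'aws-sdk' # aws-sdk v1

s3 = AWS::S3.new # credentials are picked up from the environment

# 'my-uploads' is a hypothetical bucket name
bucket = s3.buckets['my-uploads']

# Ask S3 to encrypt the object at rest (SSE with AES-256);
# S3 manages the keys and decrypts transparently on read
bucket.objects.create('uploads/secret.txt', 'sensitive data',
                      server_side_encryption: :aes256)
```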
And with this we end the selling part, so let's get our hands dirty with some code!
Uploading to Amazon S3
When we started to research how uploads to Amazon S3 are done, we discovered a Railscasts episode (“Episode 383 — Uploading to Amazon S3”) where the author does exactly that, but using the carrierwave gem to upload the files; we use the dropzone JavaScript plugin together with the paperclip gem for the image processing part. Even so, we used a helper that Ryan Bates created to build the upload form.
The fields that matter to Amazon S3 are the ones returned by the ‘fields’ method. These are placed within the form as hidden fields that Amazon S3 will use to verify, process and respond to your request. They are as follows:
acl: this field represents the ACL mechanism applied to your objects.
policy: the policy field is a Base64-encoded document whose fields represent restrictions to be applied to the request that is going to be made. For example, the expiration field indicates for how long the form is valid to make requests (a sketch of how it can be generated follows this list).
bucket: identifies the bucket against which the request is going to be made.
success_action_status: represents the HTTP status code that is going to be used in the response.
key: here you specify the path where the file is going to be stored.
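To make the policy field less mysterious, here is a minimal sketch of how such a policy and its signature can be generated (assuming AWS signature version 2, which browser POST uploads used at the time; the bucket name, key prefix and values are made up):

```ruby
require 'base64'
require 'openssl'
require 'json'

# Build the Base64-encoded policy document with the restrictions
# S3 should enforce on the incoming POST request
def upload_policy
  Base64.strict_encode64({
    expiration: (Time.now + 10 * 3600).utc.strftime('%Y-%m-%dT%H:%M:%SZ'),
    conditions: [
      { bucket: 'my-uploads' },              # hypothetical bucket name
      { acl: 'public-read' },
      ['starts-with', '$key', 'uploads/'],   # hypothetical key prefix
      { success_action_status: '201' }
    ]
  }.to_json)
end

# Sign the policy with your AWS secret key (signature version 2)
def upload_signature(policy)
  Base64.strict_encode64(
    OpenSSL::HMAC.digest(OpenSSL::Digest.new('sha1'),
                         ENV['AWS_SECRET_ACCESS_KEY'], policy)
  )
end
```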
You can use this helper from your view by simply doing something like this (the sketch below assumes the helper and option names from the Railscasts episode; the callback URL and parameter are made up):
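```erb
<%= s3_uploader_form callback_url: images_url,
                     callback_param: "image[direct_upload_url]" do %>
  <%= file_field_tag :file, multiple: true %>
<% end %>
```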
One thing to keep in mind is that you'll need to configure your Amazon S3 bucket with the right CORS configuration to be able to make requests against that bucket from the browser. One example of such a configuration is shown below.
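This example is deliberately permissive (it allows any origin); in production you'll want to restrict AllowedOrigin to your own domain:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <CORSRule>
    <AllowedOrigin>*</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <AllowedMethod>POST</AllowedMethod>
    <AllowedMethod>PUT</AllowedMethod>
    <MaxAgeSeconds>3000</MaxAgeSeconds>
    <AllowedHeader>*</AllowedHeader>
  </CORSRule>
</CORSConfiguration>
```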
After you have successfully uploaded your files to Amazon S3, you may want to access your buckets and objects directly from your code. For this you have the ‘aws-sdk’ gem; an example of its usage is as follows:
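A minimal sketch using version 1 of the gem (the bucket and key names are made up):

```ruby
require 'aws-sdk' # aws-sdk v1

s3 = AWS::S3.new(
  access_key_id: ENV['AWS_ACCESS_KEY_ID'],
  secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
)

# 'my-uploads' and the keys below are hypothetical
bucket = s3.buckets['my-uploads']
object = bucket.objects['uploads/photo.jpg']

# Copy the object from one path to another within the bucket
object.copy_to('archive/photo.jpg')

# Delete the original object
object.delete

# List the remaining objects under a prefix
bucket.objects.with_prefix('archive/').each do |obj|
  puts obj.key
end
```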
As you can see, with this gem you can easily perform operations on your buckets, such as deleting your objects or copying an object from one path to another.
Well, that wasn't that hard, was it? With this simple introduction you'll be able to upload your files to Amazon S3, and you can proudly say that you're using the cloud.