You've deployed your app on an EC2 instance, and there's a file in an S3 bucket that you need to access from the app. You created a public S3 bucket and uploaded the file, and it works! But then you read somewhere that keeping your private files in a public S3 bucket is a bad idea, so you set out to fix it.
Here's the initial setup, and you can deploy it here:
This is what it looks like before the solution:
This is what it looks like with the solution:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAccessToSpecificBucket",
      "Principal": "*",
      "Action": "s3:*",
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::REPLACE_BUCKET_NAME",
        "arn:aws:s3:::REPLACE_BUCKET_NAME/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:sourceVpc": "REPLACE_VPC_ID"
        }
      }
    }
  ]
}
{
  "Version": "2012-10-17",
  "Id": "Policy1415115909153",
  "Statement": [
    {
      "Sid": "Access-only-from-SimpleAWSVPC",
      "Effect": "Deny",
      "Principal": "*",
      "Action": [
        "s3:PutObject",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::REPLACE_BUCKET_NAME",
        "arn:aws:s3:::REPLACE_BUCKET_NAME/*"
      ],
      "Condition": {
        "StringNotEquals": {
          "aws:SourceVpce": "REPLACE_VPC_ENDPOINT_ID"
        }
      }
    },
    {
      "Sid": "Access-from-everywhere",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::REPLACE_BUCKET_NAME",
        "arn:aws:s3:::REPLACE_BUCKET_NAME/*"
      ]
    }
  ]
}
Go back to the browser tab where you pasted the public IP address of the instance and refresh the page.
Before deleting the CloudFormation stack, you'll need to empty the S3 bucket! The Node.js app puts a file in there.
First of all, you'll notice that a VPC Endpoint is for one specific service, S3 in this case. If you wanted to connect to other services you'd need to create a separate VPC Endpoint for each different service.
The second thing you'll notice is that there are 2 types of endpoints: Interface and Gateway. Gateway endpoints are only for S3 and DynamoDB, while Interface endpoints are for nearly everything. Gateway endpoints are simpler, so use them when you can (except if you're writing a newsletter and want to show a few things about Interface endpoints).
Interface endpoints work by creating an Elastic Network Interface (ENI) in every subnet where you deploy them, and automatically routing to that ENI the traffic that's addressed to the public endpoint of the service. That way, you don't need to make any changes to the code. This only works if you check the "Enable DNS name" option when you create the endpoint.
The existing policy is a Full Access policy, which is the default policy when a VPC endpoint is created. It allows all actions on the S3 service from anyone.
Instead of that, we're setting up a more restrictive policy, which only allows access to our specific bucket, and denies access to all other buckets.
VPC Endpoint policies are IAM resource policies, and as such, anything that's not explicitly allowed is implicitly denied.
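To make the "implicitly denied" part concrete, here's a minimal sketch in Python that evaluates requests against the endpoint policy from this article. It's a toy model and not how AWS actually evaluates policies: it only understands Allow statements, wildcard matching, and the one StringEquals condition used here. The bucket name, the VPC ID, and the is_allowed helper are all hypothetical, made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical values standing in for REPLACE_BUCKET_NAME and REPLACE_VPC_ID.
BUCKET = "my-example-bucket"
VPC_ID = "vpc-0123456789abcdef0"

# The endpoint policy from this article, with the placeholders filled in.
ENDPOINT_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAccessToSpecificBucket",
            "Principal": "*",
            "Action": "s3:*",
            "Effect": "Allow",
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            "Condition": {"StringEquals": {"aws:sourceVpc": VPC_ID}},
        }
    ],
}


def as_list(value):
    """Policy fields can be a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def is_allowed(policy, action, resource, source_vpc):
    """Toy check: True only if some Allow statement matches the whole request."""
    for stmt in policy["Statement"]:
        if stmt["Effect"] != "Allow":
            continue
        action_ok = any(fnmatchcase(action, p) for p in as_list(stmt["Action"]))
        resource_ok = any(fnmatchcase(resource, p) for p in as_list(stmt["Resource"]))
        wanted_vpc = stmt.get("Condition", {}).get("StringEquals", {}).get("aws:sourceVpc")
        vpc_ok = wanted_vpc is None or wanted_vpc == source_vpc
        if action_ok and resource_ok and vpc_ok:
            return True
    # No statement matched, so the request is implicitly denied.
    return False
```

Calling is_allowed with an object in the example bucket and the matching VPC ID returns True, while the same call against any other bucket returns False, even though the policy has no Deny statement anywhere.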
Bucket policies are another type of IAM resource policy, and this one only applies to our S3 bucket. It's important to add it because, while we've restricted what the VPC Endpoint can be used for, the S3 bucket can still be accessed from outside the VPC (e.g. from the public internet). This bucket policy is the one that's going to prevent that: it denies s3:GetObject and s3:PutObject for any request that doesn't arrive through the VPC Endpoint.
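Here's a rough sketch of how the two statements in the bucket policy interact, again as a toy model in Python and not the real AWS evaluation logic. The rule it illustrates is that a matching Deny always beats a matching Allow. The bucket name, the endpoint ID, and the evaluate and matches helpers are hypothetical. A request that doesn't come through a VPC Endpoint is modeled with source_vpce=None.

```python
from fnmatch import fnmatchcase

# Hypothetical values standing in for the REPLACE_... placeholders.
BUCKET = "my-example-bucket"
VPCE_ID = "vpce-0123456789abcdef0"
RESOURCES = [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"]

# The bucket policy from this article, with the placeholders filled in.
BUCKET_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Access-only-from-SimpleAWSVPC",
            "Effect": "Deny",
            "Principal": "*",
            "Action": ["s3:PutObject", "s3:GetObject"],
            "Resource": RESOURCES,
            "Condition": {"StringNotEquals": {"aws:SourceVpce": VPCE_ID}},
        },
        {
            "Sid": "Access-from-everywhere",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": RESOURCES,
        },
    ],
}


def as_list(value):
    """Policy fields can be a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def matches(stmt, action, resource, source_vpce):
    """True if the statement applies to this request (toy matching only)."""
    if not any(fnmatchcase(action, p) for p in as_list(stmt["Action"])):
        return False
    if not any(fnmatchcase(resource, p) for p in as_list(stmt["Resource"])):
        return False
    expected = stmt.get("Condition", {}).get("StringNotEquals", {}).get("aws:SourceVpce")
    # StringNotEquals fails only when the request came through that exact endpoint.
    if expected is not None and source_vpce == expected:
        return False
    return True


def evaluate(policy, action, resource, source_vpce=None):
    """Any matching Deny wins; otherwise a matching Allow; otherwise implicit deny."""
    allowed = False
    for stmt in policy["Statement"]:
        if not matches(stmt, action, resource, source_vpce):
            continue
        if stmt["Effect"] == "Deny":
            return "explicit deny"
        allowed = True
    return "allow" if allowed else "implicit deny"
```

In this model, an s3:GetObject on the bucket with no endpoint comes back as an explicit deny, and the same request with the matching endpoint ID is allowed. It's also a quick way to see which actions the Deny statement covers: only the two listed actions are denied, and everything else falls through to the Allow statement.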
In this case, I kept internet access for the VPC and for the EC2 instance itself, just to make it easier to trigger the code with an HTTP request. This solution is a good idea in these cases because traffic to S3 doesn't go over the public internet, but admittedly, the public internet is a viable alternative.
Where this solution matters more is when you don't have access to the internet. Sure, adding it is rather simple, but you're either exposing yourself unnecessarily by giving your instances a public IP address they don't need or you're paying for a NAT Gateway. In those cases, VPC Endpoints are a much simpler, safer, and cheaper solution.
Conceptually, you can think of this as giving the S3 service a private IP address inside your VPC. In reality, what you're doing is creating a private IP address in your VPC that leads to the S3 service, so that conception is pretty accurate! Behind the scenes (and you can see this easily), the VPC service creates an Elastic Network Interface (ENI) in every subnet where you deploy the VPC Endpoint. Those ENIs will forward the traffic to the S3 service endpoints that are private to the AWS network.
Also behind the scenes, there's a Route 53 Private Hosted Zone that resolves the S3 address to the private IPs of those ENIs, instead of to the public IPs of the public endpoints. That's why you don't need to change the code: your code depends on the address of the S3 service, and that private hosted zone takes care of resolving it to a different address. You can't see this private hosted zone: it's managed by AWS and hidden from users.
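If you want to check this from the instance, one option is to resolve the S3 hostname and look at whether the answers are private addresses. The sketch below uses only the Python standard library. The function names are made up for this example, and the hostname in the comment is an assumption: it's the regional S3 endpoint for us-east-1, the region used in the CLI command in this article.

```python
import ipaddress
import socket


def is_private_address(ip):
    """True if the IP belongs to a private (non-internet-routable) range."""
    return ipaddress.ip_address(ip).is_private


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses the local resolver gives."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def classify(ips):
    """Label each IP address as 'private' or 'public'."""
    return {ip: ("private" if is_private_address(ip) else "public") for ip in ips}


# Example usage, to be run on the EC2 instance (it needs working DNS):
#   print(classify(resolve_ipv4("s3.us-east-1.amazonaws.com")))
```

With the Interface endpoint and its DNS option enabled, you'd expect the addresses to be private IPs from your subnets. Without it, the same hostname should resolve to public IPs.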
aws s3api get-object --bucket 12ewqaewr2qqq --key thankyou.txt thankyou.txt --region us-east-1