Engineering teams that build, scale, and manage cloud-based applications on AWS know that at some point, their applications and infrastructure will come under attack. But as applications expand and new features are added, securing workloads in AWS becomes an increasingly complex task.
To add visibility and auditability, AWS CloudTrail tracks the who, what, where, and when of activity that occurs in your AWS environment and records this activity in the form of audit logs. Accordingly, CloudTrail audit logs contain information that is key to monitoring the actions performed across your AWS accounts, identifying possible malicious behavior, and surfacing parts of your infrastructure that might not be configured properly.
It's crucial to safeguard CloudTrail logs and enhance their security, both to meet regulatory compliance requirements and to serve internal business needs. In this post, I'll show what I did to achieve a better level of security in the way I store, process, and analyze audit logs in AWS.
Having one bucket per account is a pain: I'd have to manage multiple storage locations, and I'd have no single place to set up alerts for all accounts (among many other problems).
So, I decided to use one S3 bucket for all CloudTrail logs: a central place where I store and manage all the data that comes from the different accounts. In terms of security, managing all logging data in one place is the best approach; I have more control over who has access to the logs and what they can do with them (permissions). Also, if I need to do some data transformation or aggregation, I can do it in just one place.
Fortunately, CloudTrail's logging configuration lets us centralize the data. In every trail, I need to set the destination storage where logs should be sent; in this case, the storage is an S3 bucket in a separate central account. I also have to allow the CloudTrail service to put data in the bucket. I used this tutorial to create a bucket policy for a multi-account setup: I just created a bucket policy and added all the accounts I manage, as simple as that.
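To make the policy concrete, here's a sketch of how I generate it; the bucket name and account IDs below are placeholders, and the two statements follow the standard CloudTrail pattern (an ACL check plus a write permission scoped to each account's prefix):

```python
def cloudtrail_bucket_policy(bucket, account_ids):
    """Build the S3 bucket policy that lets CloudTrail deliver logs
    from several AWS accounts into one central bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AWSCloudTrailAclCheck",
                "Effect": "Allow",
                "Principal": {"Service": "cloudtrail.amazonaws.com"},
                "Action": "s3:GetBucketAcl",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Sid": "AWSCloudTrailWrite",
                "Effect": "Allow",
                "Principal": {"Service": "cloudtrail.amazonaws.com"},
                "Action": "s3:PutObject",
                # one resource entry per account that sends logs here
                "Resource": [
                    f"arn:aws:s3:::{bucket}/AWSLogs/{acct}/*"
                    for acct in account_ids
                ],
                "Condition": {
                    "StringEquals": {
                        "s3:x-amz-acl": "bucket-owner-full-control"
                    }
                },
            },
        ],
    }
```

Adding a new account is then just one more ID in the list, instead of hand-editing JSON in the console.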
A separate AWS account lets me restrict and isolate access to the logs, keeping them in secure storage that only security people can access, and nobody else in the company.
I created individual IAM users for the people and applications that strictly need access to the logs for their daily tasks. Each user has their own policy granting them access to the logs.
If you need to comply with a security standard, like PCI DSS for example, it's easier to show the auditors that you store all security-related logs in a separate account, and that only the people and applications that need to consume the logs have users in the security account.
Fewer users to audit === easier for the auditor to give their approval.
Most companies use multiple AWS accounts to delegate and separate teams, business units, or whatever they want to segregate.
In the past, configuring each account's logging trails was a waste of time; I usually automated all the account security setup with CloudFormation or Terraform.
With the help of AWS Organizations, I don't need to do that anymore.
I can create an organization trail that every account belonging to my organization will have. It applies automatically every time I add a new account, without needing to deploy anything.
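A sketch of the boto3 parameters for such a trail (the trail and bucket names are made up, and the call has to run from the organization's management account with trusted access for CloudTrail enabled):

```python
def org_trail_params(trail_name, central_bucket):
    """Parameters for cloudtrail.create_trail(**params): a multi-region
    organization trail that sends every member account's logs to one
    central bucket."""
    return {
        "Name": trail_name,
        "S3BucketName": central_bucket,
        "IsMultiRegionTrail": True,       # cover every region
        "IsOrganizationTrail": True,      # cover every member account
        "EnableLogFileValidation": True,  # produce digest files too
    }
```

One call, and every current and future member account is covered.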
From a security perspective, it has special features that make it very attractive (blue team members will understand).
Account users will not have permissions to modify, delete or stop the logging activity.
Pretty good right?
This is the best way to ensure that all accounts, in every region, log all their activity to my S3 bucket, without worrying about malicious users trying to stop logging or delete the trail. It also means I don't need to implement anything extra to keep the logging compliant, because Organizations does that task for me.
There are various IT industry standards that require companies to retain logging data for a certain number of years. In these circumstances, you need a way to ensure all of your data is backed up in some storage layer.
My first question was...
How much will it cost to store logs for years?
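One way I keep that cost down (my own choice, not something any standard mandates) is an S3 lifecycle rule that moves older logs to Glacier and expires them once the retention window is over. A sketch of the configuration shape expected by `s3.put_bucket_lifecycle_configuration`, with the prefix and day counts as assumptions:

```python
def log_lifecycle(prefix="AWSLogs/", glacier_after_days=90,
                  expire_after_days=7 * 365):
    """LifecycleConfiguration for put_bucket_lifecycle_configuration:
    archive logs to Glacier after a while, expire them when the
    retention period (seven years here, as an example) ends."""
    return {
        "Rules": [
            {
                "ID": "archive-cloudtrail-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }
```

Glacier storage is a fraction of the price of S3 Standard, and audit logs older than a few months are almost never read, so the retrieval delay rarely matters.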
By default, CloudTrail logs are encrypted with S3's own encryption key. But I wanted another layer of protection, so I decided to use a customer master key (CMK) in KMS to manage the encryption key. I get more control over who can decrypt the logs, and I added another layer of protection: users and apps that want to read the logs need S3 permissions (as before) plus KMS permissions to use the decryption key.
The decryption process is seamless through S3. When authorized users of the key read CloudTrail log files, S3 handles the decryption, and they see the log files in unencrypted form. So I don't need to take care of encrypting or decrypting log files myself, because AWS does that for me.
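For the CMK to work, its key policy has to let the CloudTrail service generate data keys. A sketch of the two statements involved, following the pattern from the CloudTrail documentation (the account IDs are placeholders, and reader decrypt permissions are granted separately):

```python
def cloudtrail_kms_statements(account_ids):
    """Key-policy statements letting CloudTrail encrypt log files with
    the CMK. The encryption-context condition restricts use of the key
    to trails belonging to the listed accounts."""
    return [
        {
            "Sid": "AllowCloudTrailEncrypt",
            "Effect": "Allow",
            "Principal": {"Service": "cloudtrail.amazonaws.com"},
            "Action": "kms:GenerateDataKey*",
            "Resource": "*",
            "Condition": {
                "StringLike": {
                    "kms:EncryptionContext:aws:cloudtrail:arn": [
                        f"arn:aws:cloudtrail:*:{acct}:trail/*"
                        for acct in account_ids
                    ]
                }
            },
        },
        {
            "Sid": "AllowCloudTrailDescribe",
            "Effect": "Allow",
            "Principal": {"Service": "cloudtrail.amazonaws.com"},
            "Action": "kms:DescribeKey",
            "Resource": "*",
        },
    ]
```

These statements get merged into the key policy alongside the usual key-administrator and key-user statements.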
How can I make sure that nobody has modified my logs?
Fortunately, CloudTrail provides us with log file validation. Every hour, CloudTrail creates one digest file per region in my S3 bucket, containing the hashes of all the log files delivered in the past hour. CloudTrail also signs the digest using the RSA algorithm with a private key, and stores the signature in the metadata of the digest file itself.
With this mechanism, I can validate two important things about my log files: that no log file was modified or deleted after CloudTrail delivered it, and that the digest files themselves haven't been tampered with.
I don't need to implement the validation on my own (though I could if I wanted); it's easiest to use the AWS CLI, with this command:
aws cloudtrail validate-logs --trail-arn <trailARN> --start-time <start-time> [--end-time <end-time>] [--s3-bucket <bucket-name>] [--s3-prefix <prefix>] [--verbose]
I can also use this command's output as evidence in compliance audits.
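Under the hood, the per-file part of the validation is just a hash comparison (verifying the digest's RSA signature is a separate step). A minimal sketch, where `digest_entry` is an assumed shape mirroring one item of the digest file's `logFiles` array:

```python
import hashlib

def log_file_hash(log_bytes):
    """CloudTrail records the SHA-256 of each delivered log file in the
    hourly digest; recomputing it is the core of the integrity check."""
    return hashlib.sha256(log_bytes).hexdigest()

def file_unmodified(log_bytes, digest_entry):
    # digest_entry is a dict with a "hashValue" key, like the entries
    # in a real digest file's "logFiles" list
    return log_file_hash(log_bytes) == digest_entry["hashValue"]
```

If even one byte of a log file changes, the recomputed hash diverges from the recorded one and validation fails.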
In newly created trails, log file validation is enabled by default, so you don't need to do anything. But for older trails, you may need to activate this feature in the console or via the API.
The digest files are put into a folder separate from the log files. This separation of digest files and log files enables you to enforce granular security policies.
A very important step I took was separating the permissions for log files from the permissions for digest files. I don't want anybody to be able to touch these files, because if someone deleted them, I couldn't validate any log files in the bucket. So I decided to enforce this control in the bucket policy: I denied all users permission to delete digest files, and limited the ability to edit the bucket policy to just one user in the account.
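The deny statement I added looks roughly like this; the bucket name is a placeholder, and the prefix relies on CloudTrail's convention of delivering digests under `AWSLogs/<account>/CloudTrail-Digest/`:

```python
def protect_digest_statement(bucket,
                             digest_prefix="AWSLogs/*/CloudTrail-Digest/*"):
    """Bucket-policy statement denying deletion of digest objects for
    every principal. An explicit Deny beats any Allow, so only editing
    the bucket policy itself could lift it."""
    return {
        "Sid": "DenyDigestDeletion",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:DeleteObject",
        "Resource": f"arn:aws:s3:::{bucket}/{digest_prefix}",
    }
```

Since an explicit deny wins over any allow in IAM evaluation, even an administrator in the account can't delete a digest without first changing the policy, which leaves its own CloudTrail record.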
Apart from that, I enabled MFA Delete on the bucket to protect digest objects from accidental deletion or from malicious activity by a compromised user.
PII is a good example of sensitive data that could end up inside logs. Suppose you tag resources with employees' information, like full names and email addresses. When something happens, CloudTrail will log all the activity, including the tags associated with your resources, and the PII they contain.
If this is your case, and you have to make sure that nothing sensitive ends up inside the logs, you can send the logging data to Kinesis to strip highly sensitive information from the logs. CloudTrail doesn't support native Kinesis integration, but you can configure a Lambda trigger on the S3 bucket for every object created, and the Lambda function will send it to Kinesis for further processing.
Why send logs to Kinesis instead of processing them directly in Lambda?
Simply because if something happened to that Lambda function and it stopped processing my logs, I wouldn't know where to resume. By sending the data to Kinesis instead, the Lambda function that forwards the data stays simple and fast. After that, I can strip, aggregate, or transform my logs with a more complex Lambda function integrated into the Kinesis flow.
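A minimal sketch of that forwarding function; the stream name, bucket, and keys are hypothetical, and I've made the Kinesis client a parameter so the logic can be tested locally (inside Lambda you'd use the standard `(event, context)` signature and a `boto3.client("kinesis")`):

```python
import json

def forward_to_kinesis(event, kinesis_client,
                       stream_name="cloudtrail-processing"):
    """Relay each S3 object reference from a bucket-trigger event to
    Kinesis. Only pointers are forwarded; the heavy parsing/stripping
    happens downstream, so this function stays small and fast."""
    records = []
    for rec in event.get("Records", []):
        pointer = {
            "bucket": rec["s3"]["bucket"]["name"],
            "key": rec["s3"]["object"]["key"],
        }
        records.append({
            "Data": json.dumps(pointer).encode("utf-8"),
            "PartitionKey": pointer["key"],
        })
    if records:
        kinesis_client.put_records(StreamName=stream_name, Records=records)
    return len(records)
```

Because the function only relays object pointers, a failure is easy to recover from: Kinesis keeps the records in the stream, and the slow, complex transformation logic lives in a separate consumer.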
With rule matching, I can get alerts about activity in my environment and react quickly to respond to incidents and stop them.
CloudTrail has a feature to send logs directly to CloudWatch; I just need to specify the CloudWatch log group in the trail configuration. After that, inside CloudWatch, I can set up pattern-matching rules for the logs in the log group, like detecting root account usage, or instances launched in a region my company doesn't allow.
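For example, here's a sketch of the parameters for `logs.put_metric_filter` that flags root account usage; the log group, filter, and metric names are placeholders, and the pattern follows the commonly used CIS-benchmark style filter:

```python
def root_activity_filter(log_group, metric_namespace="CloudTrailMetrics"):
    """Parameters for logs.put_metric_filter(**params): emit a metric
    datapoint whenever the root account performs an action, so a
    CloudWatch alarm can fire on it."""
    pattern = (
        '{ $.userIdentity.type = "Root" '
        '&& $.userIdentity.invokedBy NOT EXISTS '
        '&& $.eventType != "AwsServiceEvent" }'
    )
    return {
        "logGroupName": log_group,
        "filterName": "root-account-usage",
        "filterPattern": pattern,
        "metricTransformations": [
            {
                "metricName": "RootAccountUsage",
                "metricNamespace": metric_namespace,
                "metricValue": "1",
            }
        ],
    }
```

An alarm on the `RootAccountUsage` metric with a threshold of 1 then pages the team the moment root credentials are used.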
It's the simplest way to set up alerts, but I have to watch the CloudWatch storage costs; it could get very expensive if I kept the logs there for a long time. So, to avoid a bill of thousands of dollars (and probably getting fired for it), I decided to delete the logs after a short period. The reason is that I only need the logs in CloudWatch for the alerts; if nothing matches, I can delete them, because I have a copy of the logs stored in S3.
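That cleanup doesn't need a cron job; a retention policy on the log group does it. A sketch of the parameters for `logs.put_retention_policy`, with the log group name and the seven-day window as my own assumptions:

```python
def short_retention(log_group, days=7):
    """Parameters for logs.put_retention_policy(**params). CloudWatch
    only accepts specific day values (1, 3, 5, 7, 14, 30, ...); a week
    is plenty here, since S3 holds the permanent copy."""
    return {"logGroupName": log_group, "retentionInDays": days}
```

CloudWatch then expires old events automatically, and the bill stays proportional to one week of log volume instead of growing forever.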
Is that possible? Are you kidding? Millions of logs won't fit in a relational DB.
Yes, you're right: that's too much volume for a relational DB. But with Athena, I'm not using a relational DB; I'm using S3 as my storage, and Presto-as-a-service (Athena) to query the logs stored in S3.
If I centralize the logs in one bucket and partition the data in a way that keeps it separable (for example, one table per account), I can query millions of logs in a matter of minutes. Athena also lets me run multiple queries in parallel, which is useful when you're running incident-response playbooks while, at the same time, a Lambda function for detecting anomalous activity is running its own queries.
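That partitioning works because CloudTrail delivers logs under a predictable key layout. A small sketch that builds the S3 location for one account/region/day, which is exactly what you point an Athena partition at (the bucket name is a placeholder):

```python
def cloudtrail_prefix(bucket, account_id, region, year, month, day):
    """S3 prefix where CloudTrail delivers one day of logs for one
    account and region; usable as an Athena partition LOCATION."""
    return (f"s3://{bucket}/AWSLogs/{account_id}/CloudTrail/"
            f"{region}/{year:04d}/{month:02d}/{day:02d}/")
```

With partitions registered per account and date, a query scoped to one account and a few days scans only those prefixes instead of the whole bucket, which is what keeps Athena fast and cheap at this volume.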
I consider this alternative the best fit for incident response, and also for alerting when you have millions of logs and can't manage them with CloudWatch.
In the future, I'll publish an entire article dedicated to this topic; it's fascinating and has a lot of complexity of its own.
S3 has an interesting feature called Object Lock. It prevents objects from being deleted from your bucket, even by someone with the permissions to execute the delete action. It works at the object-version level, which means I can only lock a specific version of an object.
There are two lock modes: governance and compliance. The main difference between them is that compliance mode is stricter than governance: nobody can deactivate or bypass the lock, not even the root account.
I can enable Object Lock as a bucket default configuration, which means every new object put in the bucket will be locked, no matter who put it there.
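A sketch of the configuration shape for `s3.put_object_lock_configuration` (the mode and the one-year retention are example choices, and note that Object Lock has to be enabled on the bucket itself first):

```python
def default_lock_config(mode="COMPLIANCE", years=1):
    """ObjectLockConfiguration for put_object_lock_configuration: every
    new object version is locked by default for the retention period."""
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError("mode must be GOVERNANCE or COMPLIANCE")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": mode, "Years": years}},
    }
```

Compliance mode is the natural choice for audit logs: for the retention period, not even the bucket owner can shorten or remove the lock.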
It's a good weapon against log tampering in your environment. It's very common for malicious actors to modify or delete your logs to hide their activity, but with this feature you're protecting your logs from deletion.
I hope you've learned something new from my post, and if that's the case, I encourage you to become a member of my awesome Telegram channel.