This article is part of the Enterprise Cloud Security Series. Part I: Introduction introduces the space and how it differs from on-premise security. Part II covers the security considerations for building the cloud foundation.
Enterprise cloud infrastructure can primarily be split into two pieces: foundation and application landing zone. The foundation refers to infrastructure setup typically performed by the cloud platform team within the enterprise. The application landing zone is the part of the infrastructure provided to developers with appropriate guardrails in place to deploy applications and associated infrastructure.
Please note that, depending on the size and maturity of the organization, its multi-cloud strategy, and its hybrid design, there may be landing zones specific to a business unit, geolocation, and/or privacy or sensitivity level to enable the organization to meet its organizational or regulatory needs.
This article focuses on the role of security in various components that form part of the cloud foundation.
An important part of any cloud deployment within the enterprise is the need to organize the resources. Depending on the size of the organization, cloud service providers recommend different approaches.
Multi-account model
There has been significant interest in using AWS accounts, Azure subscriptions, and GCP projects to create landing zones for applications or higher organizational components.
This approach, if implemented correctly, can be an important part of the strategy to control the “blast radius” by providing micro-segmentation across identity and network, which is an important tenet of zero-trust architecture. Most standard cloud models use one or more of the following types of landing zones.
Folders
Most cloud service providers provide the capability to build a hierarchical model that enables organizations to structure a very large set of such landing zones into more manageable chunks. AWS organizational units (Control Tower), Azure management groups, and GCP folders can be used to structure landing zones based on the organizational structure, environments, security, and privacy considerations.
In addition, the ability to apply a specific set of organizational policies (AWS Config, AWS service control policies, Azure Policy, GCP organization policies) based on specific criteria (e.g. non-prod vs. prod, accounts with PII data) can also play a role in designing this model. I have seen the organizational hierarchy, where supported, used to provide access to a large number of resources (e.g. database administrators who need access to all databases across accounts), which may be a questionable use of this functionality.
Given that a lot of these features are currently evolving, it will be interesting to see whether organizations go through multiple phases of re-organization of management structure to better reflect and align with their requirements or use new capabilities that Cloud Service Providers introduce in this space.
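The way policies attached to a hierarchy flow down to landing zones can be sketched in a few lines of Python. This is only an illustration: the `Node` class, the policy keys, and the rule that a setting made higher in the hierarchy wins on conflict are assumptions made for this sketch. Each provider has its own inheritance and evaluation semantics, so treat this as a mental model and not a description of any specific service.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A folder-like container or a landing zone (hypothetical model)."""
    name: str
    parent: Optional["Node"] = None
    policies: Dict[str, str] = field(default_factory=dict)


def effective_policies(node: Node) -> Dict[str, str]:
    """Merge policies along the path to the root.

    Assumption for this sketch: on conflict, the setting made closest
    to the root wins, so a child cannot relax a guardrail set above it.
    """
    chain: List[Node] = []
    current: Optional[Node] = node
    while current is not None:
        chain.append(current)
        current = current.parent

    merged: Dict[str, str] = {}
    # chain runs from the node up to the root; applying updates in that
    # order means the root is applied last and overrides its descendants.
    for item in chain:
        merged.update(item.policies)
    return merged


# Hypothetical hierarchy: organization -> prod folder -> PII landing zone
org = Node("org", policies={"allowed_regions": "eu-only"})
prod = Node("prod", parent=org, policies={"public_ip": "deny"})
pii_zone = Node(
    "payments-prod",
    parent=prod,
    policies={"allowed_regions": "any", "encryption": "required"},
)

if __name__ == "__main__":
    print(effective_policies(pii_zone))
```

In this example the landing zone tries to widen `allowed_regions`, but the organization-level value is the one that takes effect, while the zone's own `encryption` setting is added on top.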
Tagging
Tagging provides an orthogonal mechanism to the hierarchical model described above. Tagging is an important part of overall resource organization strategy and, if used correctly, can play an important part in security classification.
Tags can be used to apply security facets such as privacy and criticality across resources instead of making these aspects part of the hierarchical model. However, the lack of consistent features such as tag inheritance, prevention of tag overriding, and the ability to reference tags in policies and access rules has made this mechanism a challenge to use when designing management models within the enterprise.
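Where the platform does not enforce tag rules natively, some of this can be approximated in deployment tooling. The following is a minimal sketch, assuming a hypothetical tag schema (`data-sensitivity`, `criticality`) and hypothetical helper names. It checks that a resource carries the required security tags and emulates inheritance without overriding by letting the parent's values win.

```python
from typing import Dict, List, Set

# Hypothetical tag schema: required keys and their allowed values.
REQUIRED_TAGS: Dict[str, Set[str]] = {
    "data-sensitivity": {"public", "internal", "confidential", "pii"},
    "criticality": {"low", "medium", "high"},
}


def tag_violations(tags: Dict[str, str]) -> List[str]:
    """Return a list of problems found in a resource's tags."""
    problems: List[str] = []
    for key, allowed in REQUIRED_TAGS.items():
        value = tags.get(key)
        if value is None:
            problems.append(f"missing tag: {key}")
        elif value not in allowed:
            problems.append(f"invalid value for {key}: {value}")
    return problems


def effective_tags(parent_tags: Dict[str, str],
                   resource_tags: Dict[str, str]) -> Dict[str, str]:
    """Emulate tag inheritance where parent values cannot be overridden."""
    merged = dict(resource_tags)
    merged.update(parent_tags)  # parent values replace conflicting ones
    return merged


if __name__ == "__main__":
    landing_zone = {"data-sensitivity": "pii", "criticality": "high"}
    vm = {"data-sensitivity": "public", "owner": "team-a"}
    tags = effective_tags(landing_zone, vm)
    print(tags, tag_violations(tags))
```

A check like this would typically run in a deployment pipeline before resources are created, so that security classification does not depend on each team remembering the convention.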
Identity forms one of the core foundations of cloud platforms. It is very important to ensure that the identity layer is built to handle different use-cases and associated access models.
Account store
An important initial design decision for the cloud foundation is which user account store the cloud platform should use for each type of account. If the cloud’s internal account store (e.g. Azure AD Users, AWS IAM Users) is used, then each user account’s lifecycle must be managed within the cloud.
In addition, the internal store should support MFA and password management capabilities. Alternatively, by leveraging an external store and integrating through federated SSO, the user can be mapped during the authentication process (for example, through a SAML assertion or JWT) to an access role or profile in the internal store. This removes the need to manage the account in the cloud account store.
Please note that there will still be some basic accounts, such as tenant owner, break-glass, and service accounts, that are needed for technical reasons. These should be created in the cloud’s internal account store and managed through applicable privileged account management processes.
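The federated mapping described above can be illustrated with a small sketch. It assumes the SAML assertion or JWT has already been validated (signature, issuer, audience, expiry) and decoded into a dictionary of claims. The `groups` claim name, the group names, and the role names are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical mapping from identity-provider groups to cloud roles.
GROUP_TO_ROLE: Dict[str, str] = {
    "grp-cloud-platform": "foundation-network-admin",
    "grp-app-payments-dev": "payments-developer",
    "grp-security-ops": "security-reader",
}


def roles_for_assertion(claims: Dict[str, object]) -> Set[str]:
    """Map the group claims of an already-validated assertion to roles.

    Unknown groups are ignored, so a user with no mapped group ends up
    with no role at all (deny by default).
    """
    groups = claims.get("groups", [])
    if isinstance(groups, str):
        # Some identity providers send a single group as a plain string.
        groups = [groups]
    return {GROUP_TO_ROLE[g] for g in groups if g in GROUP_TO_ROLE}


if __name__ == "__main__":
    claims = {"sub": "alice", "groups": ["grp-security-ops", "grp-unknown"]}
    print(roles_for_assertion(claims))
```

Because the role is derived from the assertion at sign-in, removing the user from the group in the external store removes the cloud access as well, with no account to clean up in the cloud account store.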
Authentication
The cloud identity system should typically support user authentication against the internal store and through federated SSO mechanisms like SAML and OpenID Connect. It is important that the authentication process is flexible enough to support different types of authentication mechanisms for different types of accounts across all the access interfaces, i.e. portal/web console, command line interface (CLI), and REST API. Lack of support for these mechanisms across all the access points has been one of the significant challenges to automation in the past.
Types of accounts
The cloud should be designed to support access from the following types of users.
Access Model
Most cloud services support a role-based access control (RBAC) model that associates permissions with a role. The access model should follow the principle of least privilege to ensure that roles are defined in alignment with specific use-cases and then assigned to the users through a group.
It is important to go through the access model development for each and every service being used to ensure that permissions are classified into at least a core operational set associated with the cloud foundation (e.g. creation of a VPC or VNet) and an application development specific set (e.g. creation of VMs within a specific subnet). Depending on specific use-cases and the operational model, additional roles may be created for DevOps operations (e.g. continuous deployment).
Access Model for Azure RBAC
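The split between foundation roles and application roles, assigned to users through groups, can be sketched in a provider-agnostic way. The role, group, user, and permission names below are hypothetical and do not correspond to any provider's built-in roles or permission strings.

```python
from typing import Dict, Set

# Hypothetical roles, split into foundation, application, and pipeline sets.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "foundation-network-admin": {"network:create_vnet", "network:create_subnet"},
    "app-developer": {"compute:create_vm", "compute:delete_vm"},
    "cd-pipeline": {"compute:create_vm"},
}

# Roles are assigned to groups, never directly to users.
GROUP_ROLES: Dict[str, Set[str]] = {
    "platform-team": {"foundation-network-admin"},
    "payments-devs": {"app-developer"},
}

USER_GROUPS: Dict[str, Set[str]] = {
    "alice": {"platform-team"},
    "bob": {"payments-devs"},
}


def is_allowed(user: str, permission: str) -> bool:
    """Allow only if a role held through one of the user's groups grants it."""
    for group in USER_GROUPS.get(user, set()):
        for role in GROUP_ROLES.get(group, set()):
            if permission in ROLE_PERMISSIONS.get(role, set()):
                return True
    return False  # deny by default


if __name__ == "__main__":
    print(is_allowed("alice", "network:create_vnet"))  # platform team member
    print(is_allowed("bob", "network:create_vnet"))    # application developer
```

Keeping network creation out of the developer role is what lets the platform team own the foundation while application teams work within it.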
In addition to that, the following practices should be evaluated while designing an access control model.
Two aspects have a very significant impact on how most enterprises design the cloud:
Most enterprises use a hub-spoke model with defence in depth while designing network architecture even though that may not be the most appropriate model for zero-trust architecture.
A simple hub-spoke model is shown below.
Most earlier hub-spoke models were built with the hub located in a partner colocation facility or datacenter to ensure that all the traffic could pass through on-premise security appliances. Over the past few years, there has been significant growth in the availability of cloud-based network appliances for routing, next-generation firewall, Intrusion Detection System (IDS), Intrusion Prevention System (IPS), malware scanning, content monitoring, and data loss prevention (DLP). This has enabled more efficient connectivity between regions or across clouds without the need to route the traffic through a datacenter unless required by organizational policy.
In addition, you can use intermediate networks between the hub and the application network with static routing to isolate sensitive workloads.
Some on-premise approaches, like multi-homed VMs with a dedicated NIC for administration and backup services, are typically not replicated in the cloud. Identify such practices and plan for alternative designs that can scale in the cloud.
Ensure that you plan for the presence of shared services like DNS, NTP, vulnerability scanning, and EDR in each cloud and region to reduce the need to communicate with on-premise infrastructure. Collect logs in local storage for analysis to reduce egress charges.
Ensure adequate sizing of IP networks for workloads of different sensitivity to enable simple firewall and route rules and to avoid leaking traffic across sensitivity and criticality boundaries. Where available, use named IP collections or tags to create rules to simplify rule updates across complex network architectures.
Where possible, route calls to the cloud control plane and data plane over a private network (e.g. Azure Private Link, AWS VPC endpoints) to reduce the flow of data over public networks.
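The IP sizing guidance above can be illustrated with Python's standard `ipaddress` module. The sketch below carves one contiguous block per sensitivity tier out of a spoke's address space, so that a firewall or route rule can refer to a single prefix per tier. The address range, tier names, and prefix lengths are made-up examples.

```python
import ipaddress
from typing import Dict, Optional


def carve_address_space(supernet: str,
                        tiers: Dict[str, int]) -> Dict[str, ipaddress.IPv4Network]:
    """Give each sensitivity tier one contiguous, non-overlapping block.

    `tiers` maps a tier name to the prefix length it needs.
    """
    space = ipaddress.IPv4Network(supernet)
    next_address = int(space.network_address)
    plan: Dict[str, ipaddress.IPv4Network] = {}
    # Allocate the largest blocks (shortest prefixes) first so that every
    # block starts on a boundary that is valid for its own size.
    for tier, prefix in sorted(tiers.items(), key=lambda item: item[1]):
        block = ipaddress.IPv4Network((next_address, prefix))
        if not block.subnet_of(space):
            raise ValueError(f"{supernet} is too small for tier {tier!r}")
        plan[tier] = block
        next_address = int(block.broadcast_address) + 1
    return plan


def tier_of(address: str,
            plan: Dict[str, ipaddress.IPv4Network]) -> Optional[str]:
    """Return the tier whose block contains the address, if any."""
    ip = ipaddress.ip_address(address)
    for tier, block in plan.items():
        if ip in block:
            return tier
    return None


if __name__ == "__main__":
    plan = carve_address_space(
        "10.20.0.0/16",
        {"restricted": 20, "internal": 18, "public-facing": 22},
    )
    for tier, block in plan.items():
        print(tier, block, block.num_addresses)
    print(tier_of("10.20.70.5", plan))
```

With one prefix per tier, a rule such as "deny public-facing to restricted" needs a single source and destination range, and address space left unallocated in the supernet leaves room for growth without renumbering.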
An important lesson to take away is the intricate mesh that foundation technologies like identity, network, and management structures form with each other, and how the security implementation percolates through these technologies. Besides these foundational technologies, there are additional considerations that should be kept in mind while building the cloud foundation.
Resiliency
Resiliency is an important pillar of building a foundation: applications can only build failover and disaster recovery on a platform that itself provides resiliency across identity, network, and shared services. This is typically achieved through the right combination of cloud-native capabilities, like global services, paired regions, and geo-replication, and shared services designed to be resilient across all the active regions and geographies.
Shared services
Most cloud foundations are developed with a few shared services like DNS, NTP, etc. These services may be either a built-in cloud capability or deployed as infrastructure to achieve integration with on-premise infrastructure where such integration is not possible with the built-in capability. It is very important to ensure that all the controls expected to secure any other application are also applied to these services, including but not limited to
Log aggregation
One of the shared services that forms an important part of security operations is log aggregation infrastructure. This capability enables collection of logs across different landing zones, services, platforms, and networks into one or more aggregation storage sites, like an AWS S3 bucket, Azure Log Analytics Workspace, or GCP Cloud Storage, for further analysis. The log aggregation platform should be able to handle the following requirements in addition to the regular security requirements identified above for shared services.
This article tries to cover various security considerations while building the cloud foundation within an enterprise. This is an on-going exercise that I will try to continuously improve upon.
Also published at https://medium.com/jhash/enterprise-cloud-security-foundation-f2cdeb0c84a4