Hello readers! Today, we're going to walk you through creating an Azure Kubernetes Service (AKS) cluster within a Virtual Network (VNET) in Azure using Terraform.

Terraform is a popular Infrastructure as Code (IaC) tool that allows you to provision and manage resources in your cloud environment. AKS, on the other hand, is a managed container service that simplifies Kubernetes deployment and operations.

What are we going to create?

In this blog, we will look at how to create the following resources in Azure Cloud.

- Resource group
- Virtual Network (VNET)
- Subnet
- AKS Cluster with default system nodepool
- Optionally create worker nodepools
- Connect to the AKS Cluster and validate the functionality by installing the nginx Helm chart

We will skip going into detail on terraform modules as we have already covered those in detail in our blog Create EKS cluster within its VPC. The complete terraform code for what we will discuss below is in this repository.

Prerequisites

- Basic understanding of Azure, Terraform, and Kubernetes.
- An active Azure account. If you don't have one, you can create a free account and an Azure subscription where you want to create the resources.
- Azure CLI installed.
- Terraform installed.
- kubectl compatible with the AKS version you are installing.
- terraform-docs if you want to auto-generate the documentation, and tfswitch to manage multiple versions of terraform.
- helm, a package manager for Kubernetes manifests. We will use it to install the nginx helm chart once the cluster is created.

Content Overview

- Modular Structure
- Terraform Modules
- Setting Up the Environment
- Deploy and Validate
- Clean Up
- Conclusion
- References
- Author Notes

Modular structure

The following is the module structure we have used to structure the terraform modules. You can simply clone this repository with all the terraform manifests to create an AKS cluster within its VNET.
Please refer to the blog on how to Create EKS cluster within its VPC to understand the terraform modularization and the terraform file structure below. We have explained the main.tf, variables.tf, outputs.tf and .tfvars files in detail there.

    my-aks-tf/ # root directory
    .
    ├── cluster # scaffold module which invokes aks and vnet_and_subnets module
    │   ├── main.tf
    │   ├── outputs.tf
    │   └── variables.tf
    ├── modules
    │   ├── aks # module to create k8s cluster and worker nodepools
    │   │   ├── main.tf
    │   │   ├── outputs.tf
    │   │   └── variables.tf
    │   └── vnet_and_subnets # module to create resource group, vnet and subnet
    │       ├── main.tf
    │       ├── outputs.tf
    │       └── variables.tf
    ├── main.tf # invokes cluster module to create aks cluster in its vnet
    ├── outputs.tf
    ├── sample.tfvars # sample variables file
    └── variables.tf

Terraform Modules

The following are the terraform modules we will create in the my-aks-tf directory. You can refer to the section above for the directory structure. We will look at the respective terraform files below. Please note the terraform files may have been abbreviated for brevity; the complete code is available in this repository.

modules

These are the APIs created by the Platform team. In the real world, these modules can also be separated out into their own dedicated repository and referenced remotely by the modules that users prepare to claim the infrastructure.

vnet_and_subnets

This is an opinionated module created by the Platform team to create an Azure Resource Group, Azure Virtual Network, and Azure Subnet. Create the following files under the modules/vnet_and_subnets directory.

The main.tf file below locks down the Azure provider version we have validated this module with and also externalizes vars like the names, address_space and region where the resources need to be created.

The following file may have been abbreviated for brevity.
The complete working code can be found here.

    # setup azure terraform provider
    terraform {
      required_providers {
        azurerm = {
          source  = "hashicorp/azurerm"
          version = "=3.65.0"
        }
      }
    }

    # azurerm_resource_group to create azure resource group
    # official documentation https://registry.terraform.io/providers/hashicorp/azurerm/3.65.0/docs/resources/resource_group
    resource "azurerm_resource_group" "az_rg" {
      name     = var.resource_group_name
      location = var.region
      tags     = merge(var.tags, var.additional_resource_group_tags)
    }

    # azurerm_virtual_network to create the azure vnet in the azure resource group
    # official documentation https://registry.terraform.io/providers/hashicorp/azurerm/3.65.0/docs/resources/virtual_network
    resource "azurerm_virtual_network" "az_vnet" {
      name                = var.vnet_name
      location            = azurerm_resource_group.az_rg.location
      resource_group_name = azurerm_resource_group.az_rg.name
      address_space       = var.address_space
      tags                = merge(var.tags, var.additional_vnet_tags)
    }

    # azurerm_subnet to create the azure subnet in the azure vnet
    # official documentation https://registry.terraform.io/providers/hashicorp/azurerm/3.65.0/docs/resources/subnet
    resource "azurerm_subnet" "az_subnet" {
      name                 = var.subnet_name
      resource_group_name  = azurerm_resource_group.az_rg.name
      virtual_network_name = azurerm_virtual_network.az_vnet.name
      address_prefixes     = var.subnet_address_prefix
      service_endpoints    = var.service_endpoints
    }

The variables.tf file lists the variables accepted as inputs from the user, which you can see being referred to with the var. prefix in the main.tf file above.

The following variables.tf may have been abbreviated for brevity.

    variable "resource_group_name" {
      type        = string
      description = "The Name for this Resource Group. Changing this forces a new Resource Group to be created."
    }

    variable "vnet_name" {
      type        = string
      description = "The name of the virtual network. Changing this forces a new resource to be created."
    }

    variable "subnet_name" {
      type        = string
      description = "The name of the subnet. Changing this forces a new resource to be created."
    }

    variable "region" {
      type        = string
      description = "The location/region where the resource group is created. Changing this forces a new resource to be created. We will create the vnet and subnets in the same location/region where the resource group is."
    }

    variable "address_space" {
      type        = list(string)
      description = "The address space that is used by the virtual network. You can supply more than one address space but for our module implementation we are limiting it to 1 address space only."
      default     = ["10.1.0.0/16"]

      validation {
        condition     = length(var.address_space) == 1
        error_message = "Only a single address space can be set. Please check your address space."
      }
    }

    variable "subnet_address_prefix" {
      type        = list(string)
      description = "The address prefixes to use for the subnet. Currently only a single address prefix can be set as the Multiple Subnet Address Prefixes Feature is not yet in public preview or general availability."
      default     = ["10.1.0.0/16"]

      validation {
        condition     = length(var.subnet_address_prefix) == 1
        error_message = "Only a single address prefix can be set. Please check your subnet address prefixes."
      }
    }

    variable "service_endpoints" {
      type        = list(string)
      description = "The list of Service endpoints to associate with the subnet. Possible values include: Microsoft.AzureActiveDirectory, Microsoft.AzureCosmosDB, Microsoft.ContainerRegistry, Microsoft.EventHub, Microsoft.KeyVault, Microsoft.ServiceBus, Microsoft.Sql, Microsoft.Storage, Microsoft.Storage.Global and Microsoft.Web."
      default     = []
    }

    variable "tags" {
      type        = map(any)
      description = "common tags to be assigned to all the resources"
      default     = {}
    }

    variable "additional_vnet_tags" {
      type        = map(any)
      description = "additional tags for vnet"
      default     = {}
    }

    variable "additional_resource_group_tags" {
      type        = map(any)
      description = "additional tags for resource group"
      default     = {}
    }

The outputs.tf file will output the necessary ids and names the user of this module might need to consume and probably use as input to other modules. For example, we will need resource_group_name and subnet_id as input to the aks module below.

The following file may have been abbreviated for brevity.

    output "az_rg_id" {
      description = "The ID of the resource group"
      value       = azurerm_resource_group.az_rg.id
    }

    output "az_rg_name" {
      description = "The name of the resource group"
      value       = azurerm_resource_group.az_rg.name
    }

    output "az_vnet_id" {
      description = "The ID of the vnet"
      value       = azurerm_virtual_network.az_vnet.id
    }

    output "az_subnet_id" {
      description = "The ID of the subnet"
      value       = azurerm_subnet.az_subnet.id
    }

aks

This is an opinionated module to create an AKS Cluster with a default nodepool, along with the optional ability to create more worker nodepools. Create the following files under the modules/aks directory.

The main.tf file below locks down the azure provider version we have validated this module with and also externalizes vars like cluster_name, k8s_version, nodepools config etc. to create the AKS cluster.

The following file may have been abbreviated for brevity. The complete working code can be found
here.

    # setup azure terraform provider
    terraform {
      required_providers {
        azurerm = {
          source  = "hashicorp/azurerm"
          version = "=3.65.0"
        }
      }
    }

    # azurerm_kubernetes_cluster to create k8s cluster
    # official documentation https://registry.terraform.io/providers/hashicorp/azurerm/3.65.0/docs/resources/kubernetes_cluster
    resource "azurerm_kubernetes_cluster" "k8s" {
      name                = var.cluster_name
      location            = var.region
      resource_group_name = var.resource_group_name
      dns_prefix          = var.dns_prefix
      kubernetes_version  = var.k8s_version
      node_resource_group = "aks_${var.cluster_name}_${var.region}"
      tags                = var.aks_tags

      default_node_pool {
        name                         = "system"
        type                         = "VirtualMachineScaleSets"
        node_count                   = 1
        vm_size                      = "Standard_DS2_v2"
        zones                        = [1, 2, 3]
        vnet_subnet_id               = var.az_subnet_id
        only_critical_addons_enabled = true
        node_labels = {
          "worker-name" = "system"
        }
      }

      identity {
        type = "SystemAssigned"
      }

      network_profile {
        network_plugin = var.network_plugin
      }

      # enable workload identity
      oidc_issuer_enabled       = true
      workload_identity_enabled = true
    }

    # azurerm_kubernetes_cluster_node_pool to create k8s workers
    # official documentation https://registry.terraform.io/providers/hashicorp/azurerm/3.65.0/docs/resources/kubernetes_cluster_node_pool
    resource "azurerm_kubernetes_cluster_node_pool" "k8s-worker" {
      for_each = var.nodepools

      name                  = each.value.name
      kubernetes_cluster_id = azurerm_kubernetes_cluster.k8s.id
      vm_size               = each.value.vm_size
      min_count             = each.value.min_count
      max_count             = each.value.max_count
      enable_auto_scaling   = each.value.enable_auto_scaling
      enable_node_public_ip = each.value.enable_node_public_ip
      zones                 = each.value.zones
      vnet_subnet_id        = var.az_subnet_id
      tags                  = each.value.tags
      node_labels           = each.value.node_labels
    }

The variables.tf file allows the user to configure the subnet where the aks cluster and nodepools need to be created, along with configurations for the nodepools. These configurations are referred to in main.tf with the var. prefix.

The following file may have been abbreviated for brevity.
    variable "cluster_name" {
      type        = string
      description = "aks cluster name"
    }

    variable "k8s_version" {
      type        = string
      description = "kubernetes version"
      default     = "1.26"
    }

    variable "region" {
      type        = string
      description = "azure region where the aks cluster must be created, this region should match where you have created the resource group, vnet and subnet"
    }

    variable "resource_group_name" {
      type        = string
      description = "azure resource group name where the aks cluster should be created"
    }

    variable "dns_prefix" {
      type        = string
      description = "DNS prefix specified when creating the managed cluster. Possible values must begin and end with a letter or number, contain only letters, numbers, and hyphens and be between 1 and 54 characters in length. Changing this forces a new resource to be created."
      default     = "platformwale"
    }

    variable "az_subnet_id" {
      type        = string
      description = "azure subnet id where the nodepools and aks cluster need to be created"
    }

    variable "network_plugin" {
      type        = string
      description = "Network plugin to use for networking. Currently supported values are azure, kubenet and none. Changing this forces a new resource to be created."
      default     = "azure"
    }

    variable "aks_tags" {
      type        = map(any)
      description = "tags for the aks cluster"
      default     = {}
    }

    variable "nodepools" {
      description = "Nodepools for the Kubernetes cluster"
      type = map(object({
        name                  = string
        zones                 = list(number)
        vm_size               = string
        min_count             = number
        max_count             = number
        enable_auto_scaling   = bool
        enable_node_public_ip = bool
        tags                  = map(string)
        node_labels           = map(string)
      }))
      default = {
        worker = {
          name                  = "worker"
          zones                 = [1, 2, 3]
          vm_size               = "Standard_D2_v2"
          min_count             = 1
          max_count             = 100
          enable_auto_scaling   = true
          enable_node_public_ip = true
          tags = {
            worker_name = "worker"
          }
          node_labels = {
            "worker-name" = "worker"
          }
        }
      }
    }

The outputs.tf file will output the values which may be useful to the end user.
You may observe that the client_certificate and kube_config outputs are marked as sensitive = true, which prevents terraform from printing any sensitive information on stdout, though it doesn't prevent it from being stored in the tfstate file.

The following file may have been abbreviated for brevity.

    output "cluster_id" {
      description = "The Kubernetes Managed Cluster ID."
      value       = azurerm_kubernetes_cluster.k8s.id
    }

    output "client_certificate" {
      description = "Base64 encoded public certificate used by clients to authenticate to the Kubernetes cluster."
      value       = azurerm_kubernetes_cluster.k8s.kube_config[0].client_certificate
      sensitive   = true
    }

    output "kube_config" {
      description = "Raw Kubernetes config to be used by kubectl and other compatible tools."
      value       = azurerm_kubernetes_cluster.k8s.kube_config_raw
      sensitive   = true
    }

    output "oidc_issuer_url" {
      description = "The OIDC issuer URL that is associated with the cluster"
      value       = azurerm_kubernetes_cluster.k8s.oidc_issuer_url
    }

    output "node_resource_group" {
      description = "The auto-generated Resource Group which contains the resources for this Managed Kubernetes Cluster."
      value       = azurerm_kubernetes_cluster.k8s.node_resource_group
    }

    output "node_resource_group_id" {
      description = "The ID of the auto-generated Resource Group which contains the resources for this Managed Kubernetes Cluster."
      value       = azurerm_kubernetes_cluster.k8s.node_resource_group_id
    }

cluster modules

In the sections above we created the modules/APIs; now it's time to invoke these modules in a consolidated module named cluster. You can imagine this module being written by a client of the platform team, which can be any application team wanting to claim infrastructure resources. This module is further opinionated, catering to the needs of the application team.

The main.tf below accepts cluster_name as input and uses the same name for resource_group_name, vnet_name, subnet_name and cluster_name. Similarly, the same address_space (CIDR block) is used for both the subnet and the vnet.
You can also observe that the aks_with_node_group module uses az_rg_name (resource group name) and az_subnet_id (subnet id) from the vnet_with_subnets module's outputs, which puts an implicit dependency on the vnet_with_subnets module. This means the aks_with_node_group module will wait for the vnet_with_subnets module to finish before executing.

The following file may have been abbreviated for brevity.

    # invoking vnet and subnets modules
    module "vnet_with_subnets" {
      # invoke vnet_and_subnets module under modules directory
      source = "../modules/vnet_and_subnets"

      # create resource group, vnet and subnet with the same name as cluster name
      resource_group_name = var.cluster_name
      vnet_name           = var.cluster_name
      subnet_name         = var.cluster_name

      # location where the resources need to be created
      region = var.region

      address_space         = var.address_space
      subnet_address_prefix = var.address_space
    }

    # invoking aks module to create aks cluster and node group
    module "aks_with_node_group" {
      # invoke aks module under modules directory
      source = "../modules/aks"

      cluster_name = var.cluster_name
      k8s_version  = var.k8s_version
      region       = var.region
      dns_prefix   = var.cluster_name

      resource_group_name = module.vnet_with_subnets.az_rg_name
      az_subnet_id        = module.vnet_with_subnets.az_subnet_id

      nodepools = var.nodepools
    }

The variables.tf file accepts fewer parameters than what we saw in the vnet and aks modules earlier. As you can see, the main.tf above is written in an opinionated manner catering to the needs of a team. Each team can write their own version of the module.

The following file may have been abbreviated for brevity.
    variable "cluster_name" {
      type        = string
      description = "resource group, vnet, subnet and aks cluster name"
    }

    variable "k8s_version" {
      type        = string
      description = "kubernetes version"
      default     = "1.26"
    }

    variable "region" {
      type        = string
      description = "azure region where the aks cluster must be created, this region should match where you have created the resource group, vnet and subnet"
    }

    variable "address_space" {
      type        = list(string)
      description = "The address space that is used by the virtual network. You can supply more than one address space but for our module implementation we are limiting it to 1 address space only."
      default     = ["10.1.0.0/16"]
    }

    variable "nodepools" {
      description = "Nodepools for the Kubernetes cluster"
      type = map(object({
        name                  = string
        zones                 = list(number)
        vm_size               = string
        min_count             = number
        max_count             = number
        enable_auto_scaling   = bool
        enable_node_public_ip = bool
        tags                  = map(string)
        node_labels           = map(string)
      }))
      default = {
        worker = {
          name                  = "worker"
          zones                 = [1, 2, 3]
          vm_size               = "Standard_D2_v2"
          min_count             = 1
          max_count             = 100
          enable_auto_scaling   = true
          enable_node_public_ip = true
          tags = {
            worker_name = "worker"
          }
          node_labels = {
            "worker-name" = "worker"
          }
        }
      }
    }

The outputs.tf file only exposes the values the team may need.

The following file may have been abbreviated for brevity.

    output "cluster_id" {
      description = "The Kubernetes Managed Cluster ID."
      value       = module.aks_with_node_group.cluster_id
    }

    output "client_certificate" {
      description = "Base64 encoded public certificate used by clients to authenticate to the Kubernetes cluster."
      value       = module.aks_with_node_group.client_certificate
      sensitive   = true
    }

    output "kube_config" {
      description = "Raw Kubernetes config to be used by kubectl and other compatible tools."
      value       = module.aks_with_node_group.kube_config
      sensitive   = true
    }

    output "oidc_issuer_url" {
      description = "The OIDC issuer URL that is associated with the cluster"
      value       = module.aks_with_node_group.oidc_issuer_url
    }

    output "node_resource_group" {
      description = "The auto-generated Resource Group which contains the resources for this Managed Kubernetes Cluster."
      value       = module.aks_with_node_group.node_resource_group
    }

    output "node_resource_group_id" {
      description = "The ID of the auto-generated Resource Group which contains the resources for this Managed Kubernetes Cluster."
      value       = module.aks_with_node_group.node_resource_group_id
    }

    output "az_rg_id" {
      description = "The ID of the resource group"
      value       = module.vnet_with_subnets.az_rg_id
    }

    output "az_rg_name" {
      description = "The name of the resource group"
      value       = module.vnet_with_subnets.az_rg_name
    }

    output "az_vnet_id" {
      description = "The ID of the vnet"
      value       = module.vnet_with_subnets.az_vnet_id
    }

    output "az_subnet_id" {
      description = "The ID of the subnet"
      value       = module.vnet_with_subnets.az_subnet_id
    }

prepare to invoke the cluster module

Now we are at the final stage, where members of the team may want to invoke the cluster module for various use cases. For example, we may want to create dev, stage and prod aks clusters.

The main.tf below only overrides the cluster_name, k8s_version and region vars of the cluster module we created above, and uses the default values for everything else.

Along with that, it sets the terraform backend to store the tfstate file in s3. This backend is configured at the time of initializing using terraform init in the section below. We have explained this in our earlier blog on how to Create EKS cluster within its VPC.

This also configures the azure provider. You will see in the section below that we override the required parameters by setting some environment variables, to make sure that terraform creates the resources in the desired azure account/subscription.
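The empty s3 backend block described above relies on what Terraform calls partial backend configuration: the bucket, key and region are supplied only when terraform init runs. As a minimal sketch, the same settings could also be kept in a small file rather than passed as individual flags. The file name backend.hcl and every value in it are placeholders assumed for illustration; they are not taken from the repository.

```hcl
# backend.hcl (hypothetical file name, all values are placeholders)

# s3 bucket that already exists and will hold the tfstate file
bucket = "my-tfstate-bucket"

# object key (file name) of the tfstate inside the bucket
key = "aks-dev"

# aws region where the bucket lives
region = "us-east-1"
```

Under these assumptions, the file would be passed as terraform init -backend-config=backend.hcl, and Terraform merges its contents into the empty backend block. Keeping one such file per environment is one way to give dev, stage and prod clusters separate state files.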
You will also notice that the source of the cluster module invocation points to the cluster module we created in the section above.

    # to use s3 backend
    # s3 bucket is configured at command line
    terraform {
      backend "s3" {}
    }

    provider "azurerm" {
      # The AzureRM Provider supports authenticating via the Azure CLI, a Managed Identity
      # and a Service Principal. More information on the authentication methods supported by
      # the AzureRM Provider can be found here:
      # https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs#authenticating-to-azure

      # The features block allows changing the behaviour of the Azure Provider, more
      # information can be found here:
      # https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/guides/features-block
      features {}
    }

    # invoke cluster module which creates resource group, vnet, subnets and aks cluster with a default nodepool
    # by default cluster module also creates a nodepool named worker
    module "cluster" {
      source = "./cluster"

      region       = var.region
      cluster_name = var.cluster_name
      k8s_version  = var.k8s_version
    }

The variables.tf and outputs.tf files are as follows. The actual files are here - variables.tf and outputs.tf.

    variable "region" {
      type        = string
      description = "azure region where the resources are being created"
    }

    variable "cluster_name" {
      type        = string
      description = "aks cluster name, same name is used for resource group, vnet and subnets"
      default     = "platformwale"
    }

    variable "k8s_version" {
      type        = string
      description = "k8s version"
      default     = "1.26"
    }

    output "kube_config" {
      description = "Raw Kubernetes config to be used by kubectl and other compatible tools."
      value       = module.cluster.kube_config
      sensitive   = true
    }

    output "oidc_issuer_url" {
      description = "The OIDC issuer URL that is associated with the cluster"
      value       = module.cluster.oidc_issuer_url
    }

Now we also need to create a .tfvars file.
You can imagine this as the input file used while invoking the module; this way you can have different behaviors based on your requirements. For example, as discussed earlier, you may have dev.tfvars, stage.tfvars and prod.tfvars for environment-specific clusters with distinct configurations.

The following is the sample.tfvars which we will use in the sections below for provisioning the infrastructure. The complete code can be found here.

    # azure region
    region = "westus2"

    # aks cluster name, this is the same name used to create the resource group as well as vnet
    # hence this name must be unique
    cluster_name = "platformwale"

With all these modules in place, we are all set to see the infrastructure for the AKS cluster come to life. Please refer to the sections below for further instructions.

Setting Up the Environment

Before we dive into creating our resources, let's authenticate Azure CLI with our Azure account:

    az login

Set the following environment variables to prepare to create the AKS cluster in a designated subscription in your azure account.

    export ARM_CLIENT_ID="The Client ID which should be used."
    export ARM_CLIENT_SECRET="The Client Secret which should be used."
    export ARM_SUBSCRIPTION_ID="The Subscription ID which should be used."
    export ARM_TENANT_ID="The Tenant ID which should be used."

Deploy and Validate

In this section, we will look at how to execute the terraform modules we prepared above to create the AKS cluster within its VNET, connect to the cluster, and deploy the nginx helm chart to validate the functionality of the cluster.

Create an s3 bucket to store the tfstate file.

    # for regions other than us-east-1, also pass --create-bucket-configuration LocationConstraint="your-aws-region"
    aws s3api create-bucket --bucket "your-bucket-name" --region "your-aws-region"

Initialize the terraform module. Run this from the root of my-aks-tf, where you have prepared the terraform files to invoke the cluster module.

    # tfstate file name
    tfstate_file_name="<some name e.g.
    aks-1111111111>"

    # tfstate s3 bucket name, this will hold the tfstate file which you can use for further runs of this terraform module
    # for example to upgrade k8s version or add new node pools etc.
    # The bucket name must be unique as s3 bucket names are global. Terraform does not create
    # the backend bucket, so use the bucket you created in the step above
    tfstate_bucket_name="unique s3 bucket name you created above e.g. my-tfstate-<myname>"

    # initialize the terraform module, the region must be the region of the s3 bucket
    terraform init -backend-config "key=${tfstate_file_name}" -backend-config "bucket=${tfstate_bucket_name}" -backend-config "region=us-east-1"

Retrieve the terraform plan, a preview of what will happen when you apply this terraform module. This is a best practice to understand the change.

    terraform plan -var-file="path/to/your/terraform.tfvars"

    # example
    terraform plan -var-file="sample.tfvars"

If you are satisfied with the plan above, the final step is to apply the terraform and wait for the resources to be created. It will take about 20 minutes for all the resources to be created.

    terraform apply -var-file="path/to/your/terraform.tfvars"

    # example
    terraform apply -var-file="sample.tfvars"

After successful cluster creation, retrieve the kubeconfig, connect to the AKS cluster and validate that the kubeconfig context is now pointing to the new cluster.

    az aks get-credentials --resource-group "<my resource group name>" --name "<my aks cluster name>" --subscription "<subscription where the resources are created>"

    # as per the sample.tfvars parameters
    az aks get-credentials --resource-group "platformwale" --name "platformwale" --subscription "${ARM_SUBSCRIPTION_ID}"

    # validate that the kubeconfig context is pointing to the new cluster
    kubectl config current-context

Install the nginx helm chart. This creates a load balancer service, and the nginx pods coming up successfully proves the functionality of the AKS cluster.
    helm repo add bitnami https://charts.bitnami.com/bitnami
    helm install -n default nginx bitnami/nginx

    # validate nginx pod and load balancer service
    kubectl get pods -n default
    kubectl get svc -n default

    # example output of the commands above
    $ kubectl get pods -n default
    NAME                     READY   STATUS    RESTARTS   AGE
    nginx-7c8ff57685-ck9pn   1/1     Running   0          3m31s

    $ kubectl get svc -n default nginx
    NAME    TYPE           CLUSTER-IP   EXTERNAL-IP    PORT(S)        AGE
    nginx   LoadBalancer   10.0.80.50   XX.XXX.XXX.X   80:30149/TCP   77s

You can now open http://<EXTERNAL-IP>:80 in a browser to see the nginx welcome page.

Clean Up

When you're done with your resources, you can destroy them with the following commands. This is an extremely important step, as otherwise you will see unexpected costs for the resources in your account.

    # uninstall nginx helm chart to make sure load balancer is deleted
    helm uninstall -n default nginx

    # destroy infrastructure
    terraform destroy -var-file="sample.tfvars"

Conclusion

There you have it! You've successfully created an AKS cluster within a VNET using Terraform. With the power of IaC, you can easily manage, replicate, and version control your infrastructure. Happy Terraforming!

References

- Terraform Documentation
- Terraform Azure Documentation
- Official Azure Documentation
- Kubernetes Documentation

Please note that this tutorial is a basic guide, and best practices such as state management, data security, and others are not covered here. We recommend further study to understand and implement these practices for production-level projects.

Author Notes

Feel free to reach out with any concerns or questions you have, either on the GitHub repository or directly on this blog. I will make every effort to address your inquiries and provide resolutions. Stay tuned for the upcoming blog in this series dedicated to Platformwale (Engineers who work on Infrastructure Platform teams).

Originally published on July 15, 2023.
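As a follow-up to the nodepools variable of the cluster module: the root main.tf shown earlier does not pass it, so only the default worker nodepool gets created. A minimal sketch of adding a second nodepool from the root module could look like the following. The batch nodepool, its sizes and its labels are hypothetical values chosen for illustration. Note that setting nodepools replaces the whole default map, so the worker entry is repeated here to keep it.

```hcl
# root main.tf, cluster module invocation extended with an extra nodepool (sketch)
module "cluster" {
  source = "./cluster"

  region       = var.region
  cluster_name = var.cluster_name
  k8s_version  = var.k8s_version

  nodepools = {
    # same values as the module default, repeated because the map is replaced as a whole
    worker = {
      name                  = "worker"
      zones                 = [1, 2, 3]
      vm_size               = "Standard_D2_v2"
      min_count             = 1
      max_count             = 100
      enable_auto_scaling   = true
      enable_node_public_ip = true
      tags                  = { worker_name = "worker" }
      node_labels           = { "worker-name" = "worker" }
    }

    # hypothetical second nodepool, every value below is an assumption
    batch = {
      name                  = "batch"
      zones                 = [1, 2, 3]
      vm_size               = "Standard_D2_v2"
      min_count             = 1
      max_count             = 3
      enable_auto_scaling   = true
      enable_node_public_ip = false
      tags                  = { worker_name = "batch" }
      node_labels           = { "worker-name" = "batch" }
    }
  }
}
```

Because the aks module creates nodepools with for_each over this map, running terraform plan after such a change should show one additional azurerm_kubernetes_cluster_node_pool resource keyed by batch, while the existing worker nodepool stays untouched.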