Perform Canary Deployments with AWS App Mesh on Amazon ECS Fargate

by Yi Ai, July 31st, 2019

In this article, I will walk you through all the steps required to perform canary deployments on Amazon ECS / Fargate with AWS App Mesh.

Canary deployments are a pattern for rolling out releases to a subset of users or servers. In this way, new features and other updates can be tested before they go live for the entire user base.

In this example, we are going to deal with a Flask RESTful API. Once the API application is signed off for a new release, only a few users are routed to the new version. If no errors are reported, we will roll out the new version to the rest of the users.
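The rollout logic described above can be sketched as a small decision loop. This is only an illustration: the step percentages, the error_rate callback and the threshold are assumptions made for the example, not part of the sample project.

```python
from typing import Callable, Sequence

# Hypothetical rollout steps: percentage of traffic sent to the new version.
ROLLOUT_STEPS = (5, 25, 50, 100)


def run_canary(error_rate: Callable[[int], float],
               steps: Sequence[int] = ROLLOUT_STEPS,
               max_error_rate: float = 0.01) -> int:
    """Walk through the rollout steps and return the final canary percentage.

    error_rate(percent) is an assumed callback that reports the observed
    error rate of the new version while it receives `percent` of traffic.
    """
    for percent in steps:
        if error_rate(percent) > max_error_rate:
            # Errors reported: send all traffic back to the old version.
            return 0
    # No step exceeded the threshold: the new version takes all traffic.
    return steps[-1]


print(run_canary(lambda percent: 0.0))  # healthy release
print(run_canary(lambda percent: 0.2))  # failing release
```

In the rest of this walkthrough the "steps" are applied by hand, by editing the route weights in App Mesh.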

File Structure

  • /api
     - api handler
  • /api-gateway
     - api gateway

Prerequisites

  • Basic understanding of Docker
  • Basic understanding of CloudFormation
  • Set up an AWS account
  • Install the latest aws-cli
  • Configure aws-cli to support App Mesh APIs
  • jq installed

Creating VPC

First, we will set up a VPC with public subnets. If you would like to run ECS tasks in a VPC with private subnets and a NAT gateway, please read this tutorial from the AWS team.

Now, let’s start with a CloudFormation template, ecs-vpc.yaml:

Description: >
  A stack for deploying containerized applications in AWS Fargate.
  This stack runs containers in public VPC subnet, and includes a
  public facing load balancer to register the services in.
Parameters:
  EnvironmentName:
    Description: An environment name that will be prefixed to resource names
    Type: String
    Default: flask

  ECSServiceLogGroupRetentionInDays:
    Type: Number
    Default: 30

  ECSServicesDomain:
    Type: String
    Description: "Domain name registered under Route 53 that will be used for Service Discovery"
    Default: flask.sample

Mappings:
  SubnetConfig:
    VPC:
      CIDR: "10.0.0.0/16"
    PublicOne:
      CIDR: "10.0.0.0/24"
    PublicTwo:
      CIDR: "10.0.1.0/24"

Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      EnableDnsSupport: true
      EnableDnsHostnames: true
      CidrBlock: !FindInMap ["SubnetConfig", "VPC", "CIDR"]

  PublicSubnetOne:
    Type: AWS::EC2::Subnet
    Properties:
      AvailabilityZone:
        Fn::Select:
          - 0
          - Fn::GetAZs: { Ref: "AWS::Region" }
      VpcId: !Ref "VPC"
      CidrBlock: !FindInMap ["SubnetConfig", "PublicOne", "CIDR"]
      MapPublicIpOnLaunch: true

  PublicSubnetTwo:
    Type: AWS::EC2::Subnet
    Properties:
      AvailabilityZone:
        Fn::Select:
          - 1
          - Fn::GetAZs: { Ref: "AWS::Region" }
      VpcId: !Ref "VPC"
      CidrBlock: !FindInMap ["SubnetConfig", "PublicTwo", "CIDR"]
      MapPublicIpOnLaunch: true

  InternetGateway:
    Type: AWS::EC2::InternetGateway
  GatewayAttachement:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref "VPC"
      InternetGatewayId: !Ref "InternetGateway"
  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref "VPC"
  PublicRoute:
    Type: AWS::EC2::Route
    DependsOn: GatewayAttachement
    Properties:
      RouteTableId: !Ref "PublicRouteTable"
      DestinationCidrBlock: "0.0.0.0/0"
      GatewayId: !Ref "InternetGateway"
  PublicSubnetOneRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnetOne
      RouteTableId: !Ref PublicRouteTable
  PublicSubnetTwoRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnetTwo
      RouteTableId: !Ref PublicRouteTable

  ECSCluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: !Ref EnvironmentName

  FargateContainerSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Access to the Fargate containers
      VpcId: !Ref "VPC"

  EcsSecurityGroupIngressFromPublicALB:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      Description: Ingress from the public ALB
      GroupId: !Ref "FargateContainerSecurityGroup"
      IpProtocol: -1
      SourceSecurityGroupId: !Ref "PublicLoadBalancerSG"

  EcsSecurityGroupIngressFromSelf:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      Description: Ingress from other containers in the same security group
      GroupId: !Ref "FargateContainerSecurityGroup"
      IpProtocol: -1
      SourceSecurityGroupId: !Ref "FargateContainerSecurityGroup"

  PublicLoadBalancerSG:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Access to the public facing load balancer
      VpcId: !Ref "VPC"
      SecurityGroupIngress:
        - CidrIp: 0.0.0.0/0
          IpProtocol: -1
  PublicLoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Scheme: internet-facing
      LoadBalancerAttributes:
        - Key: idle_timeout.timeout_seconds
          Value: "30"
      Subnets:
        - !Ref PublicSubnetOne
        - !Ref PublicSubnetTwo
      SecurityGroups: [!Ref "PublicLoadBalancerSG"]

  TargetGroupPublic:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      TargetType: ip
      HealthCheckIntervalSeconds: 6
      HealthCheckPath: /ping
      HealthCheckProtocol: HTTP
      HealthCheckTimeoutSeconds: 5
      HealthyThresholdCount: 2
      Name: "api"
      Port: 3000
      Protocol: HTTP
      UnhealthyThresholdCount: 2
      VpcId: !Ref "VPC"

  PublicLoadBalancerListener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    DependsOn:
      - PublicLoadBalancer
    Properties:
      DefaultActions:
        - TargetGroupArn: !Ref "TargetGroupPublic"
          Type: "forward"
      LoadBalancerArn: !Ref "PublicLoadBalancer"
      Port: 80
      Protocol: HTTP

  TaskIamRole:
    Type: AWS::IAM::Role
    Properties:
      Path: /
      AssumeRolePolicyDocument: |
        {
            "Statement": [{
                "Effect": "Allow",
                "Principal": { "Service": [ "ecs-tasks.amazonaws.com" ]},
                "Action": [ "sts:AssumeRole" ]
            }]
        }
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/CloudWatchFullAccess
        - arn:aws:iam::aws:policy/AWSXRayDaemonWriteAccess

  TaskExecutionIamRole:
    Type: AWS::IAM::Role
    Properties:
      Path: /
      AssumeRolePolicyDocument: |
        {
            "Statement": [{
                "Effect": "Allow",
                "Principal": { "Service": [ "ecs-tasks.amazonaws.com" ]},
                "Action": [ "sts:AssumeRole" ]
            }]
        }
      Policies:
        - PolicyName: AmazonECSTaskExecutionRolePolicy
          PolicyDocument:
            Statement:
              - Effect: Allow
                Action:
                  - "ecr:GetAuthorizationToken"
                  - "ecr:BatchCheckLayerAvailability"
                  - "ecr:GetDownloadUrlForLayer"
                  - "ecr:GetRepositoryPolicy"
                  - "ecr:DescribeRepositories"
                  - "ecr:ListImages"
                  - "ecr:DescribeImages"
                  - "ecr:BatchGetImage"
                  - "logs:CreateLogStream"
                  - "logs:PutLogEvents"
                Resource: "*"

  ECSServiceLogGroup:
    Type: "AWS::Logs::LogGroup"
    Properties:
      RetentionInDays:
        Ref: ECSServiceLogGroupRetentionInDays

  ECSServiceDiscoveryNamespace:
    Type: AWS::ServiceDiscovery::PrivateDnsNamespace
    Properties:
      Vpc: !Ref "VPC"
      Name: { Ref: ECSServicesDomain }

Outputs:
  Cluster:
    Description: A reference to the ECS cluster
    Value: !Ref ECSCluster
    Export:
      Name: !Sub "${EnvironmentName}:ECSCluster"
  ECSServiceDiscoveryNamespace:
    Description: A SDS namespace that will be used by all services in this cluster
    Value: !Ref ECSServiceDiscoveryNamespace
    Export:
      Name: !Sub "${EnvironmentName}:ECSServiceDiscoveryNamespace"
  ECSServiceLogGroup:
    Description: Log group for services to publish logs
    Value: !Ref ECSServiceLogGroup
    Export:
      Name: !Sub "${EnvironmentName}:ECSServiceLogGroup"
  PublicLoadBalancerSG:
    Description: Security group for the public load balancer
    Value: !Ref PublicLoadBalancerSG
    Export:
      Name: !Sub "${EnvironmentName}:PublicLoadBalancerSG"
  FargateContainerSecurityGroup:
    Description: Security group to be used by all services in the cluster
    Value: !Ref FargateContainerSecurityGroup
    Export:
      Name: !Sub "${EnvironmentName}:FargateContainerSecurityGroup"
  TaskExecutionIamRoleArn:
    Description: Task execution IAM role used by ECS tasks
    Value: { "Fn::GetAtt": TaskExecutionIamRole.Arn }
    Export:
      Name: !Sub "${EnvironmentName}:TaskExecutionIamRoleArn"
  TaskIamRoleArn:
    Description: IAM role to be used by ECS task
    Value: { "Fn::GetAtt": TaskIamRole.Arn }
    Export:
      Name: !Sub "${EnvironmentName}:TaskIamRoleArn"
  PublicListener:
    Description: The ARN of the public load balancer's Listener
    Value: !Ref PublicLoadBalancerListener
    Export:
      Name: !Sub "${EnvironmentName}:PublicListener"
  VPCId:
    Description: The ID of the VPC that this stack is deployed in
    Value: !Ref "VPC"
    Export:
      Name: !Sub "${EnvironmentName}:VPCId"
  PublicSubnetOne:
    Description: Public subnet one
    Value: !Ref "PublicSubnetOne"
    Export:
      Name: !Sub "${EnvironmentName}:PublicSubnetOne"
  PublicSubnetTwo:
    Description: Public subnet two
    Value: !Ref "PublicSubnetTwo"
    Export:
      Name: !Sub "${EnvironmentName}:PublicSubnetTwo"
  TargetGroupPublic:
    Description: ALB public target group
    Value: !Ref "TargetGroupPublic"
    Export:
      Name: !Sub "${EnvironmentName}:TargetGroupPublic"
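Before deploying, it can be worth checking that the CIDR blocks in the SubnetConfig mapping fit together. The following is a minimal sketch using only Python's standard ipaddress module; the check_subnets helper is a hypothetical name introduced for this example.

```python
import ipaddress

# CIDR blocks copied from the SubnetConfig mapping in ecs-vpc.yaml.
VPC_CIDR = "10.0.0.0/16"
SUBNET_CIDRS = {"PublicOne": "10.0.0.0/24", "PublicTwo": "10.0.1.0/24"}


def check_subnets(vpc_cidr, subnet_cidrs):
    """Return a list of problems; an empty list means the layout is valid."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = {name: ipaddress.ip_network(cidr)
               for name, cidr in subnet_cidrs.items()}
    problems = []
    # Every subnet must sit inside the VPC range.
    for name, net in subnets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} ({net}) is outside the VPC range {vpc}")
    # No two subnets may share addresses.
    names = sorted(subnets)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if subnets[first].overlaps(subnets[second]):
                problems.append(f"{first} overlaps {second}")
    return problems


print(check_subnets(VPC_CIDR, SUBNET_CIDRS))
```

An empty list means both public subnets are inside the VPC range and do not overlap.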

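Then run the aws cloudformation create-stack command to create a stack. Because the template creates IAM roles, the command needs the --capabilities CAPABILITY_IAM option:

$ aws cloudformation create-stack --stack-name flask-sample --template-body file://ecs-vpc.yaml --capabilities CAPABILITY_IAM --profile YOUR_PROFILE --region YOUR_REGION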

Creating an App Mesh

AWS App Mesh is a service mesh that provides application-level networking to make it easy for your services to communicate with each other across multiple types of compute infrastructure. App Mesh standardizes how your services communicate, giving you end-to-end visibility and helping to ensure high availability for your applications.

App Mesh perspective of the Flask api sample

The following CloudFormation template, app-mesh.yaml, will be used to create a mesh, a virtual service, a virtual router with its route, and the virtual nodes for our api application:

Parameters:
  EnvironmentName:
    Type: String
    Description: Environment name that joins all the stacks
    Default: flask

  ServicesDomain:
    Type: String
    Description: DNS namespace used by services e.g. default.svc.cluster.local
    Default: flask.sample

  AppMeshMeshName:
    Type: String
    Description: Name of mesh
    Default: flask-mesh

Resources:
  Mesh:
    Type: AWS::AppMesh::Mesh
    Properties:
      MeshName: !Ref AppMeshMeshName

  ApiV1VirtualNode:
    Type: AWS::AppMesh::VirtualNode
    Properties:
      MeshName: !GetAtt Mesh.MeshName
      VirtualNodeName: api-vn
      Spec:
        Listeners:
          - PortMapping:
              Port: 3000
              Protocol: http
            HealthCheck:
              Protocol: http
              Path: "/ping"
              HealthyThreshold: 2
              UnhealthyThreshold: 2
              TimeoutMillis: 2000
              IntervalMillis: 5000
        ServiceDiscovery:
          DNS:
            Hostname: !Sub "api.${ServicesDomain}"

  ApiV2VirtualNode:
    Type: AWS::AppMesh::VirtualNode
    Properties:
      MeshName: !GetAtt Mesh.MeshName
      VirtualNodeName: api-v2-vn
      Spec:
        Listeners:
          - PortMapping:
              Port: 3000
              Protocol: http
            HealthCheck:
              Protocol: http
              Path: "/ping"
              HealthyThreshold: 2
              UnhealthyThreshold: 2
              TimeoutMillis: 2000
              IntervalMillis: 5000
        ServiceDiscovery:
          DNS:
            Hostname: !Sub "api-v2.${ServicesDomain}"

  ApiVirtualRouter:
    Type: AWS::AppMesh::VirtualRouter
    Properties:
      MeshName: !GetAtt Mesh.MeshName
      VirtualRouterName: api-vr
      Spec:
        Listeners:
          - PortMapping:
              Port: 3000
              Protocol: http

  ApiRoute:
    Type: AWS::AppMesh::Route
    DependsOn:
      - ApiVirtualRouter
      - ApiV1VirtualNode
      - ApiV2VirtualNode
    Properties:
      MeshName: !Ref AppMeshMeshName
      VirtualRouterName: api-vr
      RouteName: api-route
      Spec:
        HttpRoute:
          Action:
            WeightedTargets:
              - VirtualNode: api-vn
                Weight: 1
              - VirtualNode: api-v2-vn
                Weight: 0
          Match:
            Prefix: "/"

  ApiVirtualService:
    Type: AWS::AppMesh::VirtualService
    DependsOn:
      - ApiVirtualRouter
    Properties:
      MeshName: !GetAtt Mesh.MeshName
      VirtualServiceName: !Sub "api.${ServicesDomain}"
      Spec:
        Provider:
          VirtualRouter:
            VirtualRouterName: api-vr

  ApiGatewayVirtualNode:
    Type: AWS::AppMesh::VirtualNode
    DependsOn:
      - ApiVirtualService
    Properties:
      MeshName: !GetAtt Mesh.MeshName
      VirtualNodeName: gateway-vn
      Spec:
        Listeners:
          - PortMapping:
              Port: 3000
              Protocol: http
        ServiceDiscovery:
          DNS:
            Hostname: !Sub "gateway.${ServicesDomain}"
        Backends:
          - VirtualService:
              VirtualServiceName: !Sub "api.${ServicesDomain}"

Run the aws cloudformation create-stack command to create the mesh stack:

$ aws cloudformation create-stack --stack-name flask-app-mesh --template-body file://app-mesh.yaml --profile YOUR_PROFILE --region YOUR_REGION
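To build some intuition for what the WeightedTargets in the ApiRoute resource do, here is a simplified model of weighted routing. It is a sketch of the idea only, not how Envoy implements it; pick_target is a hypothetical helper name.

```python
import random
from collections import Counter

# Weights copied from the ApiRoute resource in app-mesh.yaml.
WEIGHTED_TARGETS = {"api-vn": 1, "api-v2-vn": 0}


def pick_target(weighted_targets, rng):
    """Pick one virtual node with probability proportional to its weight."""
    names = list(weighted_targets)
    weights = [weighted_targets[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(42)  # fixed seed so the run is repeatable
tally = Counter(pick_target(WEIGHTED_TARGETS, rng) for _ in range(1000))
print(tally)
```

With a weight of 0 on api-v2-vn, every simulated request lands on api-vn, which matches the initial state of the mesh: version 2 is deployed but receives no traffic yet.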

Pushing images to ECR

Before we can deploy our services, we need to push the Docker images to ECR repositories so that ECS can pull them when it runs the task definitions.

Go to the api/ directory and create a bash script, setup-ecr.sh, to push the api image to the ECR api repository:

#!/bin/bash

set -ex
AWS_DEFAULT_REGION="YOUR_REGION"
AWS_PROFILE="YOUR_PROFILE"

docker build -t flask-api .

API_IMAGE="$( aws ecr create-repository --repository-name flask-api \
              --region ${AWS_DEFAULT_REGION} --profile ${AWS_PROFILE} \
              --query '[repository.repositoryUri]' --output text || aws ecr describe-repositories --repository-name flask-api \
              --region ${AWS_DEFAULT_REGION} --profile ${AWS_PROFILE} \
              --query '[repositories[0].repositoryUri]' --output text)"

docker tag flask-api ${API_IMAGE}

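# Note: "aws ecr get-login" exists only in AWS CLI v1. With AWS CLI v2, use
# "aws ecr get-login-password" piped into "docker login --username AWS --password-stdin" instead.
$(aws ecr get-login --no-include-email --region ${AWS_DEFAULT_REGION}  --profile ${AWS_PROFILE})

docker push ${API_IMAGE}

$ ./api/setup-ecr.sh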

Then move to the api-gateway/ directory and create a similar bash script, setup-ecr.sh, to push the gateway image to the ECR gateway repository:

#!/bin/bash

set -ex
AWS_DEFAULT_REGION="YOUR_REGION"
AWS_PROFILE="YOUR_PROFILE"

docker build -t flask-gateway .

GATEWAY_IMAGE="$( aws ecr create-repository --repository-name flask-gateway \
              --region ${AWS_DEFAULT_REGION} --profile ${AWS_PROFILE} \
              --query '[repository.repositoryUri]' --output text || aws ecr describe-repositories --repository-name flask-gateway \
              --region ${AWS_DEFAULT_REGION} --profile ${AWS_PROFILE} \
              --query '[repositories[0].repositoryUri]' --output text)"

echo ${GATEWAY_IMAGE}

docker tag flask-gateway ${GATEWAY_IMAGE}

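# Note: "aws ecr get-login" exists only in AWS CLI v1. With AWS CLI v2, use
# "aws ecr get-login-password" piped into "docker login --username AWS --password-stdin" instead.
$(aws ecr get-login --no-include-email --region ${AWS_DEFAULT_REGION}  --profile ${AWS_PROFILE})

docker push ${GATEWAY_IMAGE}

$ ./api-gateway/setup-ecr.sh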
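The repositoryUri values captured by the scripts above follow the pattern account.dkr.ecr.region.amazonaws.com/repository. As a small sketch, the parts can be pulled out with a regular expression; the account ID in the example is a placeholder and parse_ecr_uri is a hypothetical helper.

```python
import re

# Hypothetical example URI; the account ID below is a placeholder.
EXAMPLE_URI = "123456789012.dkr.ecr.ap-southeast-2.amazonaws.com/flask-api"

ECR_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:@]+)(?::(?P<tag>[^@]+))?$"
)


def parse_ecr_uri(uri):
    """Split an ECR image URI into account, region, repository and tag."""
    match = ECR_URI.match(uri)
    if match is None:
        raise ValueError(f"not an ECR image URI: {uri}")
    parts = match.groupdict()
    # Docker treats a missing tag as "latest".
    parts["tag"] = parts["tag"] or "latest"
    return parts


print(parse_ecr_uri(EXAMPLE_URI))
```

Since the scripts tag the images without an explicit tag, both images are pushed as latest.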

Creating Task definition

A task definition is a blueprint that describes how a Docker container should launch. We need to create ECS task definitions for our gateway and api handlers, and make the tasks compatible with App Mesh and X-Ray.

Below is an example Amazon ECS task definition for the Flask gateway, saved as task-definition-gateway.json. The $-prefixed values are jq variables that are filled in by the script shown after it:

{
  "family": "gateway",
  "proxyConfiguration": {
    "type": "APPMESH",
    "containerName": "envoy",
    "properties": [
      {
        "name": "IgnoredUID",
        "value": "1337"
      },
      {
        "name": "ProxyIngressPort",
        "value": "15000"
      },
      {
        "name": "ProxyEgressPort",
        "value": "15001"
      },
      {
        "name": "AppPorts",
        "value": "3000"
      },
      {
        "name": "EgressIgnoredIPs",
        "value": "169.254.170.2,169.254.169.254"
      }
    ]
  },
  "containerDefinitions": [
    {
      "name": "app",
      "image": $APP_IMAGE,
      "portMappings": [
        {
          "containerPort": 3000,
          "hostPort": 3000,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "API_ENDPOINT",
          "value": "http://api.flask.sample"
        },
        {
          "name": "SERVER_PORT",
          "value": "3000"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": $SERVICE_LOG_GROUP,
          "awslogs-region": "ap-southeast-2",
          "awslogs-stream-prefix": "gateway"
        }
      },
      "essential": true,
      "dependsOn": [
        {
          "containerName": "envoy",
          "condition": "HEALTHY"
        }
      ]
    },
    {
      "name": "envoy",
      "image": "111345817488.dkr.ecr.us-west-2.amazonaws.com/aws-appmesh-envoy:v1.9.1.0-prod",
      "user": "1337",
      "essential": true,
      "ulimits": [
        {
          "name": "nofile",
          "hardLimit": 15000,
          "softLimit": 15000
        }
      ],
      "portMappings": [
        {
          "containerPort": 9901,
          "hostPort": 9901,
          "protocol": "tcp"
        },
        {
          "containerPort": 15000,
          "hostPort": 15000,
          "protocol": "tcp"
        },
        {
          "containerPort": 15001,
          "hostPort": 15001,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "APPMESH_VIRTUAL_NODE_NAME",
          "value": "mesh/flask-mesh/virtualNode/gateway-vn"
        },
        {
          "name": "ENVOY_LOG_LEVEL",
          "value": $ENVOY_LOG_LEVEL
        },
        {
          "name": "ENABLE_ENVOY_XRAY_TRACING",
          "value": "1"
        },
        {
          "name": "ENABLE_ENVOY_STATS_TAGS",
          "value": "1"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": $SERVICE_LOG_GROUP,
          "awslogs-region": "ap-southeast-2",
          "awslogs-stream-prefix": "gateway-envoy"
        }
      },
      "healthCheck": {
        "command": [
          "CMD-SHELL",
          "curl -s http://localhost:9901/server_info | grep state | grep -q LIVE"
        ],
        "interval": 5,
        "timeout": 2,
        "retries": 3
      }
    },
    {
      "name": "xray-daemon",
      "image": "amazon/aws-xray-daemon",
      "user": "1337",
      "essential": true,
      "cpu": 32,
      "memoryReservation": 256,
      "portMappings": [
        {
          "hostPort": 2000,
          "containerPort": 2000,
          "protocol": "udp"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": $SERVICE_LOG_GROUP,
          "awslogs-region": "ap-southeast-2",
          "awslogs-stream-prefix": "gateway-xray"
        }
      }
    }
  ],
  "taskRoleArn": $TASK_ROLE_ARN,
  "executionRoleArn": $EXECUTION_ROLE_ARN,
  "requiresCompatibilities": ["FARGATE", "EC2"],
  "networkMode": "awsvpc",
  "cpu": "256",
  "memory": "512"
}

Now, we need to create a bash script, setup-task-def.sh, that registers the api gateway task definition, and then run it with ./setup-task-def.sh:

#!/bin/bash
set -ex
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null && pwd )"
AWS_DEFAULT_REGION="YOUR_REGION"
AWS_PROFILE="YOUR_PROFILE"
cluster_stack_output=$(aws --profile "${AWS_PROFILE}" --region "${AWS_DEFAULT_REGION}" \
    cloudformation describe-stacks --stack-name "flask-sample" \
    | jq '.Stacks[].Outputs[]')
task_role_arn=($(echo $cluster_stack_output \
    | jq -r 'select(.OutputKey == "TaskIamRoleArn") | .OutputValue'))
echo ${task_role_arn}
execution_role_arn=($(echo $cluster_stack_output \
    | jq -r 'select(.OutputKey == "TaskExecutionIamRoleArn") | .OutputValue'))
ecs_service_log_group=($(echo $cluster_stack_output \
    | jq -r 'select(.OutputKey == "ECSServiceLogGroup") | .OutputValue'))
envoy_log_level="debug"
GATEWAY_IMAGE="$( aws ecr describe-repositories \
 --repository-name flask-gateway --region ${AWS_DEFAULT_REGION} \
  --profile ${AWS_PROFILE} --query '[repositories[0].repositoryUri]' --output text)"
#Gateway Task Definition
task_def_json=$(jq -n \
    --arg APP_IMAGE $GATEWAY_IMAGE \
    --arg SERVICE_LOG_GROUP $ecs_service_log_group \
    --arg TASK_ROLE_ARN $task_role_arn \
    --arg EXECUTION_ROLE_ARN $execution_role_arn \
    --arg ENVOY_LOG_LEVEL $envoy_log_level \
    -f "${DIR}/task-definition-gateway.json")
task_def_arn=$(aws --profile "${AWS_PROFILE}" --region "${AWS_DEFAULT_REGION}" \
    ecs register-task-definition \
    --cli-input-json "${task_def_json}" \
    --query '[taskDefinition.taskDefinitionArn]' --output text
    )
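The jq -n --arg ... -f step in the script turns the template into valid JSON by replacing each $NAME variable with a quoted string. The same idea can be sketched in Python as follows. The template here is a shortened stand-in for task-definition-gateway.json, and the image name and role ARN are placeholders.

```python
import json
import re

# A shortened stand-in for task-definition-gateway.json.
TEMPLATE = """
{
  "family": "gateway",
  "containerDefinitions": [
    {"name": "app", "image": $APP_IMAGE},
    {"name": "envoy", "environment": [
      {"name": "ENVOY_LOG_LEVEL", "value": $ENVOY_LOG_LEVEL}
    ]}
  ],
  "taskRoleArn": $TASK_ROLE_ARN
}
"""


def render(template, values):
    """Replace each $NAME placeholder with its value encoded as a JSON string."""
    def substitute(match):
        # A missing value raises KeyError instead of leaving a placeholder behind.
        return json.dumps(values[match.group(1)])
    return json.loads(re.sub(r"\$([A-Z_]+)", substitute, template))


task_def = render(TEMPLATE, {
    "APP_IMAGE": "example/flask-gateway",  # placeholder image name
    "ENVOY_LOG_LEVEL": "debug",
    "TASK_ROLE_ARN": "arn:aws:iam::123456789012:role/example",  # placeholder ARN
})
print(task_def["containerDefinitions"][0]["image"])
```

Parsing the result with json.loads doubles as a check that no placeholder was left unresolved. Unlike jq, this naive regular expression would also touch a $NAME that appears inside a string value, so it is only suitable for templates shaped like the one above.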

The task definitions for api v1 and v2 are very similar to the one above. You can find the bash scripts in the GitHub repo under api/.

Creating Services that run the Task Definition

The command to create an ECS service takes quite a few parameters, so it is easier to use a CloudFormation template as input. Let’s create an ecs-services.yaml file with the following:

Parameters:
  EnvironmentName:
    Type: String
    Description: Environment name that joins all the stacks
    Default: flask

  AppMeshMeshName:
    Type: String
    Description: Name of mesh
    Default: flask-mesh

  ECSServicesDomain:
    Type: String
    Description: DNS namespace used by services e.g. default.svc.cluster.local
    Default: flask.sample

  GatewayTaskDefinition:
    Type: String
    Description: Task definition for Gateway Service

  ApiV1TaskDefinition:
    Type: String
    Description: Task definition for Api v1

  ApiV2TaskDefinition:
    Type: String
    Description: Task definition for Api v2

  VpcCIDR:
    Description: Please enter the IP range (CIDR notation) for this VPC
    Type: String
    Default: 10.0.0.0/16

Resources:
  ECSServiceSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: "Security group for the service"
      VpcId:
        "Fn::ImportValue": !Sub "${EnvironmentName}:VPCId"
      SecurityGroupIngress:
        - CidrIp: !Ref VpcCIDR
          IpProtocol: -1

  ApiV1ServiceDiscoveryRecord:
    Type: "AWS::ServiceDiscovery::Service"
    Properties:
      Name: "api"
      DnsConfig:
        NamespaceId:
          "Fn::ImportValue": !Sub "${EnvironmentName}:ECSServiceDiscoveryNamespace"
        DnsRecords:
          - Type: A
            TTL: 300
      HealthCheckCustomConfig:
        FailureThreshold: 1

  ApiV1Service:
    Type: "AWS::ECS::Service"
    Properties:
      ServiceName: "api"
      Cluster:
        "Fn::ImportValue": !Sub "${EnvironmentName}:ECSCluster"
      DeploymentConfiguration:
        MaximumPercent: 200
        MinimumHealthyPercent: 100
      DesiredCount: 1
      LaunchType: FARGATE
      ServiceRegistries:
        - RegistryArn:
            "Fn::GetAtt": ApiV1ServiceDiscoveryRecord.Arn
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: ENABLED
          SecurityGroups:
            - !Ref ECSServiceSecurityGroup
          Subnets:
            - "Fn::ImportValue": !Sub "${EnvironmentName}:PublicSubnetOne"
            - "Fn::ImportValue": !Sub "${EnvironmentName}:PublicSubnetTwo"
      TaskDefinition: { Ref: ApiV1TaskDefinition }

  ApiV2ServiceDiscoveryRecord:
    Type: "AWS::ServiceDiscovery::Service"
    Properties:
      Name: "api-v2"
      DnsConfig:
        NamespaceId:
          "Fn::ImportValue": !Sub "${EnvironmentName}:ECSServiceDiscoveryNamespace"
        DnsRecords:
          - Type: A
            TTL: 300
      HealthCheckCustomConfig:
        FailureThreshold: 1

  GatewayServiceDiscoveryRecord:
    Type: "AWS::ServiceDiscovery::Service"
    Properties:
      Name: "gateway"
      DnsConfig:
        NamespaceId:
          "Fn::ImportValue": !Sub "${EnvironmentName}:ECSServiceDiscoveryNamespace"
        DnsRecords:
          - Type: A
            TTL: 300
      HealthCheckCustomConfig:
        FailureThreshold: 1

  ApiV2Service:
    Type: "AWS::ECS::Service"
    Properties:
      ServiceName: "api-v2"
      Cluster:
        "Fn::ImportValue": !Sub "${EnvironmentName}:ECSCluster"
      DeploymentConfiguration:
        MaximumPercent: 200
        MinimumHealthyPercent: 100
      DesiredCount: 1
      LaunchType: FARGATE
      ServiceRegistries:
        - RegistryArn:
            "Fn::GetAtt": ApiV2ServiceDiscoveryRecord.Arn
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: ENABLED
          SecurityGroups:
            - !Ref ECSServiceSecurityGroup
          Subnets:
            - "Fn::ImportValue": !Sub "${EnvironmentName}:PublicSubnetOne"
            - "Fn::ImportValue": !Sub "${EnvironmentName}:PublicSubnetTwo"
      TaskDefinition: { Ref: ApiV2TaskDefinition }

  GatewayService:
    Type: "AWS::ECS::Service"
    Properties:
      ServiceName: "gateway"
      Cluster:
        "Fn::ImportValue": !Sub "${EnvironmentName}:ECSCluster"
      DeploymentConfiguration:
        MaximumPercent: 200
        MinimumHealthyPercent: 100
      DesiredCount: 1
      LaunchType: FARGATE
      ServiceRegistries:
        - RegistryArn:
            "Fn::GetAtt": GatewayServiceDiscoveryRecord.Arn
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: ENABLED
          SecurityGroups:
            - !Ref ECSServiceSecurityGroup
          Subnets:
            - "Fn::ImportValue": !Sub "${EnvironmentName}:PublicSubnetOne"
            - "Fn::ImportValue": !Sub "${EnvironmentName}:PublicSubnetTwo"
      TaskDefinition: { Ref: GatewayTaskDefinition }
      LoadBalancers:
        - ContainerName: app
          ContainerPort: 3000
          TargetGroupArn:
            "Fn::ImportValue": !Sub "${EnvironmentName}:TargetGroupPublic"

Next, create a bash script, ecs-services-stack.sh:

#!/bin/bash

set -ex

AWS_DEFAULT_REGION="YOUR_REGION"
AWS_PROFILE="YOUR_PROFILE"

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null && pwd )"

task_api_arn=$(aws ecs list-task-definitions --family-prefix api \
--region ${AWS_DEFAULT_REGION} --profile ${AWS_PROFILE} \
--sort DESC \
--query '[taskDefinitionArns[0]]' --output text)

task_api_v2_arn=$(aws ecs list-task-definitions --family-prefix api-v2 \
 --region ${AWS_DEFAULT_REGION} --profile ${AWS_PROFILE} \
 --sort DESC \
 --query '[taskDefinitionArns[0]]' --output text)

task_gateway_arn=$(aws ecs list-task-definitions --family-prefix gateway \
--region ${AWS_DEFAULT_REGION} --profile ${AWS_PROFILE} \
--sort DESC \
--query '[taskDefinitionArns[0]]' --output text)

aws cloudformation --region ${AWS_DEFAULT_REGION} --profile ${AWS_PROFILE} \
    deploy --stack-name "flask-ecs-service" \
    --capabilities CAPABILITY_IAM \
    --template-file "${DIR}/ecs-services.yaml"  \
    --parameter-overrides \
    GatewayTaskDefinition="${task_gateway_arn}" \
    ApiV1TaskDefinition="${task_api_arn}" \
    ApiV2TaskDefinition="${task_api_v2_arn}"

and run the following command:

$ ./ecs-services-stack.sh
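The script relies on --sort DESC to pick the most recent revision of each task definition family. As a sketch of what that selection amounts to, the revision can be read from the end of each ARN and compared numerically. The ARNs below are made-up placeholders and both helper names are hypothetical.

```python
def parse_task_definition_arn(arn):
    """Return (family, revision) from a task definition ARN."""
    resource = arn.split(":task-definition/", 1)[1]
    family, revision = resource.rsplit(":", 1)
    return family, int(revision)


def latest_arn(arns, family):
    """Return the ARN with the highest revision for exactly this family."""
    candidates = [arn for arn in arns
                  if parse_task_definition_arn(arn)[0] == family]
    return max(candidates, key=lambda arn: parse_task_definition_arn(arn)[1])


# Placeholder ARNs for illustration only.
PREFIX = "arn:aws:ecs:ap-southeast-2:123456789012:task-definition/"
ARNS = [PREFIX + "api:2", PREFIX + "api:10", PREFIX + "api-v2:3"]

print(latest_arn(ARNS, "api"))
```

Comparing revisions as integers matters here: as plain strings, "api:2" would sort after "api:10".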

Now we have set up everything we need, and we can go to the AWS console to review what we have created.

CloudFormation Console

ECS Task Definition

ECS cluster and services

Verify the Fargate deployment

Once we have deployed our api application, we can curl the frontend service (gateway) to test it. To get the endpoint, open the Amazon EC2 console, choose Load Balancers under LOAD BALANCING in the navigation pane, and select the load balancer we just created. Its DNS name is the endpoint. Then run the curl command:

$ curl flask-Publi-xxxxx-xxxxx.ap-southeast-2.elb.amazonaws.com/todos/todo1
{
"todo": {
   "task": "build an API"
},
   "version": "1"
}

Notice that all the services of the application are reflecting version 1. Now, it’s time for us to perform a canary deployment of api v2.
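A single curl only shows one response. To see how responses are distributed across versions, we can send a batch of requests and count the version field. The sketch below keeps the HTTP call behind a fetch callable so it can run without a deployed stack; fake_fetch is a stand-in that returns the response shown above.

```python
import json
from collections import Counter


def tally_versions(fetch, requests_count):
    """Call fetch() repeatedly and count the "version" field of each JSON reply."""
    counts = Counter()
    for _ in range(requests_count):
        body = json.loads(fetch())
        counts[body.get("version", "unknown")] += 1
    return counts


def fake_fetch():
    # Stand-in for an HTTP call; returns the response shown above.
    return '{"todo": {"task": "build an API"}, "version": "1"}'


print(tally_versions(fake_fetch, 20))
```

Against the real endpoint, fetch could be something like lambda: urllib.request.urlopen(url).read(), with url set to the load balancer DNS name plus /todos/todo1. At this stage every response should report version 1.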

Canary Deployment of Api v2

We can manage the target weights (WeightedTargets) in the ApiRoute resource of app-mesh.yaml as below:

  ApiRoute:
    Type: AWS::AppMesh::Route
    DependsOn:
      - ApiVirtualRouter
      - ApiV1VirtualNode
      - ApiV2VirtualNode
    Properties:
      MeshName: !Ref AppMeshMeshName
      VirtualRouterName: api-vr
      RouteName: api-route
      Spec:
        HttpRoute:
          Action:
            WeightedTargets:
              - VirtualNode: api-vn
                Weight: 2
              - VirtualNode: api-v2-vn
                Weight: 1
          Match:
            Prefix: "/"

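and re-deploy the template (for example with aws cloudformation update-stack). With weights of 2 and 1, roughly one third of the requests go to api v2. Alternatively, we can open the AWS App Mesh console, choose the mesh we created, select Virtual Routers, open the route for editing, and set the traffic weight of each target in the Targets section.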
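Route weights are relative, so it helps to convert them into percentages when planning a canary step. A minimal sketch, with traffic_share as a hypothetical helper name:

```python
def traffic_share(weights):
    """Convert relative route weights into percentages of traffic."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be greater than zero")
    return {node: 100.0 * weight / total for node, weight in weights.items()}


# Weights from the updated ApiRoute resource.
shares = traffic_share({"api-vn": 2, "api-v2-vn": 1})
print({node: round(share, 1) for node, share in shares.items()})
```

The same function shows that the initial weights of 1 and 0 send all traffic to api-vn, and that swapping them to 0 and 1 completes the rollout.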

Integrating AWS X-Ray with AWS App Mesh

AWS X-Ray helps us to monitor and analyze distributed microservice applications through request tracing, providing an end-to-end view of requests traveling through the application so we can identify the root cause of errors and performance issues. We’ll use X-Ray to provide a visual map of how App Mesh is distributing traffic and inspect traffic latency through our routes.

When setting up the task definitions, we already defined an X-Ray daemon container. However, X-Ray can only run in local mode with Fargate, so we need to create X-Ray segments manually to track traffic between the gateway and api v1 and v2.

We will manually create a trace ID and a segment, then pass the trace ID and the segment ID (as the parent ID) to the api handlers. The api-gateway/main.py file should look like:

from flask import Flask
from flask_restful import Resource, Api, reqparse
from aws_xray_sdk.core import xray_recorder
from aws_xray_sdk.core import patch_all
from aws_xray_sdk.core.models.traceid import TraceId
from aws_xray_sdk.core.models import http
import requests
import os
import json
import logging

patch_all()
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)
API_ENDPOINT =  os.environ['API_ENDPOINT']
SERVER_PORT =  os.environ['SERVER_PORT']
# xray_recorder.configure(
#     sampling=False,
#     context_missing='LOG_ERROR',
#     plugins=('EC2Plugin', 'ECSPlugin'),
#     service='Flask Gateway'
# )
app = Flask(__name__)
api = Api(app)
traceid = TraceId().to_id()
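parser = reqparse.RequestParser()
# Declare the "task" field so parse_args() returns it for POST and PUT requests.
parser.add_argument('task')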

class Ping(Resource):
    def get(self):
        return {'response': 'ok'}

class TodoList(Resource):
    def __init__(self):
        self.segment = xray_recorder.begin_segment('gateway_todoS',traceid = traceid, sampling=1)
        self.segment.put_http_meta(http.URL, 'gateway.flask.sample')
        logger.info("Request todos from gateway")

    def __del__(self):
        xray_recorder.end_segment()

    def get(self):
        logger.info("Request todo from gateway, parentid is %s"%(self.segment.id))
        r = requests.get(url = '%s:%s/todos'%(API_ENDPOINT,SERVER_PORT), headers={'x-traceid': traceid, 'x-parentid':self.segment.id})
        return r.json()

    def post(self):
        args = parser.parse_args()
        r = requests.post(url = '%s:%s/todos'%(API_ENDPOINT,SERVER_PORT), json=args,headers={'x-traceid': traceid,'x-parentid':self.segment.id})
        return r.json(), 201

class Todo(Resource):
    def __init__(self):
        self.segment = xray_recorder.begin_segment('gateway_todo',traceid = traceid, sampling=1)
        self.segment.put_http_meta(http.URL, 'gateway.flask.sample')
        logger.info("Request todo from gateway")

    def __del__(self):
        xray_recorder.end_segment()


    def get(self, todo_id):
        r = requests.get(url = '%s:%s/todos/%s'%(API_ENDPOINT,SERVER_PORT,todo_id),headers={'x-traceid': traceid,'x-parentid':self.segment.id})
        return r.json()

    def delete(self, todo_id):
        r = requests.delete(url = '%s:%s/todos/%s'%(API_ENDPOINT,SERVER_PORT,todo_id),headers={'x-traceid': traceid,'x-parentid':self.segment.id})
        return r.json(), 204

    def put(self, todo_id):
        args = parser.parse_args()
        task = {'task': args['task']}
        r = requests.put(url = '%s:%s/todos/%s'%(API_ENDPOINT,SERVER_PORT,todo_id), json=task,headers={'x-traceid': traceid,'x-parentid':self.segment.id})
        return r.json(), 201

api.add_resource(TodoList, '/todos')
api.add_resource(Todo, '/todos/<todo_id>')
api.add_resource(Ping, '/ping')

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=3000)
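The x-traceid and x-parentid headers used above carry an X-Ray trace ID and a segment ID. A trace ID has the form 1-, eight hexadecimal digits of epoch time, a dash, then 24 random hexadecimal digits; a segment ID is 16 hexadecimal digits. The following standard-library sketch builds values of that shape to make the format concrete. The helper names are hypothetical, and in the gateway itself the SDK's TraceId class and the segment object provide these values.

```python
import os
import time


def new_trace_id(now=None):
    """Build an X-Ray style trace ID: version, epoch seconds in hex, 96 random bits."""
    epoch = int(time.time() if now is None else now)
    return "1-%08x-%s" % (epoch, os.urandom(12).hex())


def new_segment_id():
    """Build a 64-bit segment ID as 16 hexadecimal digits."""
    return os.urandom(8).hex()


def tracing_headers(trace_id, parent_id):
    """Headers the gateway forwards so the api handler can join the same trace."""
    return {"x-traceid": trace_id, "x-parentid": parent_id}


trace_id = new_trace_id()
print(tracing_headers(trace_id, new_segment_id()))
```

Note that main.py creates traceid once at import time, so every request handled by that process shares one trace. Creating a new trace ID per request would be one possible refinement if separate traces are preferred.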

After all services are deployed successfully, we can open the AWS X-Ray console and monitor the traffic we are sending to the application frontend (gateway) when we request the api application on the /todos route.

Clean it all up

It is quickest to use the CloudFormation Console to delete the following stacks:

  • flask-ecs-service
  • flask-app-mesh
  • flask-sample
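The order matters because CloudFormation will not delete a stack while another stack still imports its exports. The ECS services stack imports values from flask-sample, so it has to go first. A small sketch of that ordering, using the stack names from the create-stack and deploy commands earlier; the IMPORTS table is written by hand for this example and deletion_order is a hypothetical helper.

```python
# Each stack maps to the stacks whose exports it imports.
IMPORTS = {
    "flask-ecs-service": {"flask-sample"},
    "flask-app-mesh": set(),
    "flask-sample": set(),
}


def deletion_order(imports):
    """Order stacks so that importing stacks come before the stacks they import from."""
    remaining = {stack: set(sources) for stack, sources in imports.items()}
    order = []
    while remaining:
        # A stack can be deleted once no remaining stack imports from it.
        still_imported = set().union(*remaining.values())
        deletable = sorted(s for s in remaining if s not in still_imported)
        if not deletable:
            raise ValueError("circular imports between stacks")
        for stack in deletable:
            order.append(stack)
            del remaining[stack]
    return order


for stack in deletion_order(IMPORTS):
    print(f"aws cloudformation delete-stack --stack-name {stack}")
```

The printed commands still need the same --profile and --region options as the other commands in this walkthrough, and each deletion should finish before the next one starts.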

That’s about it! I hope you have found this walkthrough useful. You can find the complete project in my GitHub repo.