This is a hands-on course on how to deploy a fully Serverless web app using the AWS CDK. You will learn how to:

- Structure **CDK Stacks** to deploy an application from end to end
- Deploy a REST API integrated with AWS Lambda for dynamic request processing
- Store data in a fast and cost-effective way with DynamoDB
- Use DynamoDB streams as a source for Lambda in an event-driven architecture
- Ingest and manipulate loads of data streams with Kinesis Firehose
- Deploy and query a Data Lake with Athena, S3 and Glue
- Monitor your entire application health in a single place using Dashbird

You can use the resources declared in this demo application as a starting point to mix and adapt to your own architectures later, which should save you quite some time.

## The App and Architecture

The demo app is a public blog where anyone can read, publish and like posts. It's available on this link. Go ahead and publish something in the top-left corner (yellow button) and also "like" articles already published. Check out the codebase on this repo. The architecture has three parts:

- Frontend
- Backend
- Data Lake and Analytical Querying

## What is the AWS CDK?

CDK stands for Cloud Development Kit. Think of it as CloudFormation (CF) in your preferred language (Python, TypeScript, C#, etc). Roughly speaking, it works like this:

1. You declare cloud resources using classes provided by the CDK libraries. Example:

```python
from aws_cdk import aws_s3

my_bucket = aws_s3.Bucket(self, 'MyBucket')
```

2. Run `cdk deploy`.
3. CDK translates this to a CloudFormation template and deploys it on AWS for you.

In case you would like to dig deeper, AWS also has a workshop that will get your basics started. I also strongly recommend reading the official CDK documentation.

## Advantages of using the AWS CDK

- Use languages that are more expressive than YAML or JSON, for instance.
- Less - much less! - verbose than CloudFormation templates.
- Easier to apply reusability and inheritance principles to infrastructure code.
- Better integration with IDEs for code completion, IntelliSense, etc.
- Possible to test your infra code, just as any other software.
- Portable: since it's just a wrapper around CF, we can easily port it to JSON or YAML.

## Disadvantages of using the AWS CDK

- Although released as a stable project by the AWS team, many parts (a lot of the good ones) are still experimental and APIs may change in backwards-incompatible ways.
- It's under constant development. During the preparation of this course, I had to upgrade my libraries three times.
- Documentation is still lacking in some parts and you will occasionally need to look at the CDK source code to understand how to declare certain things.

## Deploy it yourself

Although we have provided an online demo, you can also deploy this app in your own AWS account:

1. Clone the repo: `git clone git@github.com:byrro/serverless-website-demo.git sls-demo; cd sls-demo`
2. Set up your virtual environment: `virtualenv -p /usr/bin/python3.8 .env; source .env/bin/activate; pip install -r requirements.txt`
3. Specify an AWS account ID: `export AWS_ACCOUNT_ID=1234567890`*
4. Deploy all three stacks: `cdk deploy sls-blog; cdk deploy sls-blog-api; cdk deploy sls-blog-analytical`

\* You can also hard-code your Account ID in the CDK project, as I'll show in a minute.

When starting a new project from scratch, you would run `cdk init --language [python|typescript|...]`. This is not necessary for this demo, since the project is already created.
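To make steps 3 and 4 concrete, below is a minimal sketch of how `app.py` could resolve the account ID, either from the `AWS_ACCOUNT_ID` environment variable exported above or from a hard-coded fallback. The fallback value and variable names here are illustrative assumptions, not the demo repo's exact code.

```python
# app.py - illustrative sketch, not the demo repo's exact code
import os

from aws_cdk import core

app = core.App()

# Prefer the AWS_ACCOUNT_ID exported in the shell; fall back to a
# hard-coded placeholder value (assumption for illustration).
account = os.environ.get('AWS_ACCOUNT_ID', '1234567890')

env = core.Environment(account=account, region='us-east-1')

# Stacks declared in this app would receive env=env (shown below).
app.synth()
```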
## Monitoring

Deploying this architecture in the cloud and blindly believing it will work flawlessly is not reasonable. We want to be the first to know when something is not right, so we can act on it as quickly as possible.

In this project, I used Dashbird for its ease of use and seamless integration. Instead of having to deploy an agent inside my code, Dashbird plugs into my Stacks through a CloudFormation template that I can deploy with the effort of one click. It not only monitors Lambda function errors, but also other resources that we're using, such as DynamoDB tables. They even suggest insights for architectural improvements, cross-referenced against industry best practices.

Finally, Dashbird offers a free-forever plan. It's a no-brainer to try it out by registering for free.

## How a CDK project is structured

A CDK project creates an "Application". This app may have one or more "Stacks". A Stack is a group of cloud resources (Lambda functions, S3 buckets, etc) that are instantiated using CDK classes. It's also possible to have multiple applications in a single CDK project.

### App Object

Creating a CDK app is as simple as:

```python
app = core.App()
```

When you run `cdk init --language [language]`, an initial application with basic boilerplate code is created for you in the project root, under `app.py`.

The next thing we need is an environment, which is composed of an AWS Account ID and Region:

```python
env = core.Environment(
    account='1234567890',
    region='us-east-1',
)
```

Declaring an environment is not required (CDK can infer it from your AWS credentials), but it is good practice. Most of us work with multiple AWS accounts, and it's easy to mix up projects, accounts and credentials. When we explicitly set the environment in the CDK app, it's locked in, which prevents mistaken deployments.

Now we declare our stacks:

```python
from my_project.my_project_stack import MyStack

my_stack = MyStack(
    app,
    'my-stack',
    env=env,
)
```

This is how we *instantiate* our stacks for deployment. In the next section we'll see how to *declare* those stacks.

### Stack Object

The Stack object is where we *declare* our AWS resources. It inherits from the `core.Stack` CDK class and accepts a scope - which is our app object -, a string identifier and an environment.

```python
class MyStack(core.Stack):
    def __init__(
        self,
        scope: core.Construct,
        id: str,
        env: core.Environment,
        **kwargs,
    ) -> None:
        super().__init__(scope, id, **kwargs)

        # Declare AWS resources here
```

### Declaring AWS resources

To declare AWS resources, we need a specific library for each service. Here's a list of all Python libraries and their Typescript counterparts. Other flavors are Java and .NET.

Let's see how a basic REST API would be declared (typing expressions were removed for readability purposes):

```python
from aws_cdk import (
    core,
    aws_apigateway,
    aws_lambda,
)


class MyStack(core.Stack):
    def __init__(self, scope, id, env, **kwargs):
        super().__init__(scope, id, **kwargs)

        my_lambda = aws_lambda.Function(
            self,
            'MyLambda',
            runtime=aws_lambda.Runtime.PYTHON_3_8,
            code=aws_lambda.Code.asset('my_lambda_folder'),
            handler='my_lambda.handler',
        )

        aws_apigateway.LambdaRestApi(
            self,
            'sls-blog-rest-api-gateway',
            handler=my_lambda,
        )
```

We first declare a Lambda function, `my_lambda`. We point its code to the `my_lambda_folder`. Inside this folder, there should be a `my_lambda.py` file containing a function called `handler`. This handler function should accept Lambda invocations normally (an `event` and a `context` object).

Next, a `LambdaRestApi` is declared, using `my_lambda` as the handler (not to be confused with the Lambda's handler function). This will create a new API Gateway REST API integrated with `my_lambda` using an `AWS_PROXY` integration type. All HTTP requests will be routed to the Lambda function.
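For reference, here's a minimal sketch of what `my_lambda_folder/my_lambda.py` might contain. The response shape follows the Lambda proxy integration contract that `AWS_PROXY` expects; the body content is just an illustration.

```python
# my_lambda_folder/my_lambda.py - illustrative handler sketch
import json


def handler(event, context):
    # With AWS_PROXY integration, `event` carries the full HTTP request
    # (path, httpMethod, headers, queryStringParameters, body, etc.)
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps({'message': 'Hello from my_lambda'}),
    }
```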
## Our Project App & Stacks

This project comprises one application with three Stacks. They're all declared in the `app.py` and `sls_website_stack.py` files.

Below we'll walk through all Stacks at a high level. I encourage you to inspect the stacks file to learn how these resources are declared and integrated. For example: a Kinesis Firehose is created in one Stack and referenced in another to include its name as an environment variable for the Lambda function that will interact with it.

Except for the frontend static Stack - which is small - you will notice that resources are initialized with a `None` (null) value in the beginning. The reason is that, even though the CDK is generally more succinct than CloudFormation, it can still be lengthy enough to clutter the view of the entire Stack. By declaring each resource first in one line, I can provide a short summary of everything that's in the Stack and then instantiate the CDK classes in other methods.

```python
class SlsBlogApiStack(core.Stack):
    def __init__(self, scope, id, env, static_stack, **kwargs):
        super().__init__(scope, id, **kwargs)

        self.static_stack = static_stack

        # SQS Queues
        self.queue_ddb_streams_dlq = None  # Dead-letter-queue for DDB streams

        # DynamoDB Tables
        self.ddb_table_blog = None  # Single-table for all blog content

        # DynamoDB Event Sources
        self.ddb_source_blog = None  # Blog table streams source

        # DynamoDB Indexes
        self.ddb_gsi_latest = None  # GSI ordering articles by timestamp

        # Lambda Functions
        self.lambda_blog = None  # Serves requests to the blog public API
        self.lambda_stream_reader = None  # Processes DynamoDB streams

        # Continues with other resources...
```

Notice it takes another Stack object (`static_stack`) as an argument to its initialization. In the `app.py` file, you can see that the `SlsBlogApiStack` is initialized with the `SlsBlogStack` passed as an argument.

We use it to reference the CloudFront distribution domain (`d1qmte5oc6ndq5.cloudfront.net`) in the Lambda environment variables. This variable can be used to customize the `Access-Control-Allow-Origin` HTTP response header to comply with CORS standards. This illustrates one way to easily integrate and reference information from one Stack into another within a CDK project.

At the end of the initialization, another method is called to instantiate the CDK classes for each resource and configure their parameters:

```python
self.create_cdk_resources()
```

Next we'll walk through each of our project's Stacks.

## Stack 1: static website

Our focus is on the Serverless backend, so the frontend here is terribly rough and simple. It's stored in an S3 Bucket and distributed through a CloudFront CDN.

CDK has a helpful class called `BucketDeployment`. It takes the contents of a directory and syncs them to an S3 bucket. In this case, we stored the frontend code in the `website_static` folder.

```python
aws_s3_deployment.BucketDeployment(
    self,
    'SlsBlogStaticS3Deployment',
    sources=[aws_s3_deployment.Source.asset('website_static')],
    destination_bucket=static_bucket,
    distribution=cdn,
)
```
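The `static_bucket` and `cdn` arguments above are declared elsewhere in the Stack. As a rough, hedged sketch in CDK v1 Python (identifiers and settings are my assumptions, not the repo's exact declarations), they could look like this:

```python
# Illustrative sketch of the bucket and CDN referenced by BucketDeployment
static_bucket = aws_s3.Bucket(
    self,
    'SlsBlogStaticBucket',  # hypothetical identifier
    removal_policy=core.RemovalPolicy.DESTROY,
)

cdn = aws_cloudfront.CloudFrontWebDistribution(
    self,
    'SlsBlogStaticCdn',  # hypothetical identifier
    origin_configs=[
        aws_cloudfront.SourceConfiguration(
            s3_origin_source=aws_cloudfront.S3OriginConfig(
                s3_bucket_source=static_bucket,
            ),
            behaviors=[aws_cloudfront.Behavior(is_default_behavior=True)],
        ),
    ],
)
```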
## Stack 2: API/backend

Our backend consists of an API Gateway (REST) connected to a monolithic Lambda function. Microservices receive a lot of press, but you probably shouldn't always break your applications into several functions. A Monolith is just fine - and sometimes recommended -, really.

This API & Lambda support a single endpoint (with GET and POST methods) with a queryString `action`, which takes one of three values:

- `get-latest-articles`: populates the latest blog posts
- `like-article`: triggered when someone likes an article
- `publish-article`: posts a new blog article

Here's the power of the CDK model. We can create a REST API with 10 lines of code:

```python
rest_api_blog = aws_apigateway.LambdaRestApi(
    self,
    'sls-blog-rest-api-gateway',
    handler=lambda_blog,  # Previously declared Lambda function
    deploy_options=aws_apigateway.StageOptions(
        stage_name='api',
        throttling_rate_limit=lambda_param_max_concurrency,
        logging_level=aws_apigateway.MethodLoggingLevel('INFO'),
    ),
)
```

One nice little thing is that Lambda memory is used as a cache for the latest articles. We load the cache container outside the Lambda handler function. It remains in memory even after an invocation ends and is available for subsequent requests. Learn more here about how to use Lambda as a cache mechanism.

```python
import time
from typing import Dict, Union

MAX_CACHE_AGE: int = 120  # In seconds
CACHE_LATEST_ARTICLES: Dict[str, Union[int, list]] = {
    'last_update': time.time(),
    'articles': [],
}
```

All the data is stored in DynamoDB (DDB) using a single-table design, in on-demand mode. The site only shows the latest blog articles, and items get auto-deleted by DDB after a few days by setting a time-to-live attribute.

```python
self.ddb_attr_time_to_live = 'time-to-live'
self.ddb_param_max_parallel_streams = 5

self.ddb_table_blog = aws_dynamodb.Table(
    self,
    'sls-blog-dynamo-table',
    partition_key=aws_dynamodb.Attribute(
        name='id',
        type=aws_dynamodb.AttributeType.STRING,
    ),
    billing_mode=aws_dynamodb.BillingMode.PAY_PER_REQUEST,
    point_in_time_recovery=True,
    removal_policy=core.RemovalPolicy.DESTROY,
    time_to_live_attribute=self.ddb_attr_time_to_live,
    stream=aws_dynamodb.StreamViewType.NEW_AND_OLD_IMAGES,
)
```

The DDB table also has a GSI (Global Secondary Index) that makes it easier to retrieve articles ordered by date for the site:

```python
self.ddb_table_blog.add_global_secondary_index(
    index_name='latest-blogs',
    partition_key=aws_dynamodb.Attribute(
        name='item-type',
        type=aws_dynamodb.AttributeType.STRING,
    ),
    sort_key=aws_dynamodb.Attribute(
        name='publish-timestamp',
        type=aws_dynamodb.AttributeType.NUMBER,
    ),
    projection_type=aws_dynamodb.ProjectionType.ALL,
)
```

Modifications to DDB items generate streams that are processed by a second Lambda function. These streams are then repackaged and sent to a Kinesis Firehose stream processor.

DDB doesn't provide the flexibility that SQL databases offer, and many choose Aurora Serverless, for example. Although Aurora is a great service, I personally prefer DDB for its simplicity and reliable, consistent performance. But sometimes we do need to run analytical queries, those with aggregations and on-the-fly filters. For that, we'll be using Athena (more in the next Stack).

## Stack 3: analytical querying

A Kinesis Firehose Stream is responsible for batching data inserted/modified in DDB, converting it to Apache Parquet format and storing it in dedicated S3 buckets.

From S3, we create a Data Lake with AWS Glue (used to declare our data schemas) and Athena (used to query the data).

Athena is extremely powerful. We can use SQL SELECT statements (with some limitations) to query terabytes of data and pay on demand ($0.005 per GB of data scanned). Using Parquet not only improves query speed, but also reduces cost by minimizing the amount of data Athena needs to scan for each query.

Queries that are impossible or expensive/slow in DynamoDB, such as aggregations and JOINs, are fast and cheap in Athena. The two services complement each other perfectly, so that your application has optimized transactional storage and flexible analytical querying capabilities.

We can use Athena to query all articles ever published and cross-reference them with likes and HTTP metadata (source IP address, country, device type, etc). Even articles that have already expired through the DynamoDB TTL (time-to-live) continue to be available in the Data Lake.

For example, which countries are liking the most articles? In the AWS Console, we get something like this (see the screenshot in the original post).

Queries can also be executed programmatically with the Athena API or AWS SDKs (e.g. Python's boto3) to integrate anywhere we need this data.
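As a hedged sketch of that programmatic route (the database, table, column and bucket names below are assumptions for illustration, not the project's actual schema), a boto3 call could look like this:

```python
# Illustrative boto3 sketch; database, table and bucket names are assumptions
import time

import boto3

athena = boto3.client('athena')

# Example: which countries are liking the most articles?
query = """
    SELECT country, COUNT(*) AS likes
    FROM likes
    GROUP BY country
    ORDER BY likes DESC
"""

execution = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={'Database': 'sls_blog'},  # hypothetical database
    ResultConfiguration={'OutputLocation': 's3://my-athena-results/'},  # hypothetical bucket
)
query_id = execution['QueryExecutionId']

# Athena runs asynchronously: poll for completion, then fetch the rows
while True:
    state = athena.get_query_execution(
        QueryExecutionId=query_id,
    )['QueryExecution']['Status']['State']
    if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
        break
    time.sleep(1)

if state == 'SUCCEEDED':
    rows = athena.get_query_results(
        QueryExecutionId=query_id,
    )['ResultSet']['Rows']
```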
Athena also supports JOINs. Here's an example joining articles and HTTP metadata to analyze the most popular authors among readers of a particular country (the query is shown in the original post).

## Deployment

CDK can deploy one Stack at a time. Since we have three, it's necessary to specify which one when running the `cdk deploy` command. We do that by passing the Stack ID as a CLI argument. For example, the following command will deploy the `SlsBlogApiStack` (id: `sls-blog-api`):

```
cdk deploy sls-blog-api
```

Since all Stacks involve some type of permission granting, CDK asks for confirmation before deploying those resources. You can review the permissions requested and hit `y` when it's good to go.

## Wrapping up

We've covered how to structure CDK apps and add a bunch of AWS resources to deploy with a simple `cdk deploy` command.

If you're new to the CDK - and as suggested earlier in the article - it's strongly recommended to follow the AWS CDK workshop and documentation.

Keep an eye on future publications as well, as Dashbird is releasing other examples and tutorials to help you reap the most out of AWS serverless services with the power of infrastructure automation, with the CDK or otherwise.

Previously published at https://dashbird.io/blog/crash-course-aws-cdk-serverless-rest-api-data-lake-analytical-querying/