Running any application in production assumes that reliable monitoring is already in place, and 'serverless' applications are no exception. As modern cloud applications become more distributed and complex, monitoring availability, performance and cost becomes increasingly difficult. Unfortunately, cloud providers don’t offer much out of the box. Although you can’t fully understand what’s happening with CloudWatch alone, it is a great place to start and a good first line of defense for ensuring service availability and performance. Let’s explore the basics, as well as some more complex use cases, for monitoring your Lambda functions with CloudWatch.

CloudWatch Metrics You Can Gather

CloudWatch gathers basic metrics that let you observe how your system is performing. For Lambda functions, the gathered metrics include errors, invocations, throttles, concurrent executions and duration (latency). Memory usage is not one of the standard metrics; Lambda reports it in the REPORT line it writes to CloudWatch Logs at the end of each invocation.

Since it’s unlikely that you’ll happen to check your metrics at exactly the right time when something goes wrong (or is about to go wrong), it’s good to configure alarms that notify you through various channels when an unexpected threshold or condition is met.

How to Set up CloudWatch Metric Alarms

You can configure a CloudWatch alarm to notify an SNS topic when a predefined condition is met. That SNS topic can then invoke a Lambda function, which takes action to either notify you or possibly fix the situation.

To be alerted on specific errors, you will need a CloudWatch Logs subscription filter that matches log entries against a specific error pattern. This way you can automate error notifications rather than manually parsing through countless rows of logs.

Figure: AWS CloudWatch Alarm Solution Architecture, source: Amazon

The solution works like this: you define the errors you wish to be alerted on, CloudWatch Logs catches those errors and invokes a Lambda function, and that function processes each error and alerts you via an Amazon SNS topic.
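As a minimal boto3 sketch of the two alerting paths described above, the code below only assembles the keyword arguments for CloudWatch's `put_metric_alarm` (an alarm on the Lambda `Errors` metric that notifies an SNS topic) and CloudWatch Logs' `put_subscription_filter` (which forwards matching log events to an alerting function). The function name, ARNs, resource names, the 5-minute period and the `?ERROR ?5xx` filter pattern are hypothetical placeholders chosen for illustration, not values from the article. Because the builders return plain dictionaries, you can inspect them without AWS access; only `create_alerting` actually calls AWS.

```python
# Sketch only: names, ARNs, thresholds and the filter pattern are hypothetical.


def lambda_error_alarm_params(function_name: str, topic_arn: str) -> dict:
    """Keyword arguments for CloudWatch put_metric_alarm: go into ALARM when
    the function reports at least one error within a 5-minute period."""
    return {
        "AlarmName": f"{function_name}-errors",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 300,  # seconds
        "EvaluationPeriods": 1,
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Periods with no data (e.g. no invocations) are not treated as errors.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],  # SNS topic notified on ALARM
    }


def error_subscription_params(function_name: str, alert_function_arn: str,
                              pattern: str = "?ERROR ?5xx") -> dict:
    """Keyword arguments for CloudWatch Logs put_subscription_filter: forward
    log events containing any of the ?-prefixed terms to the alert function."""
    return {
        "logGroupName": f"/aws/lambda/{function_name}",
        "filterName": f"{function_name}-error-filter",  # hypothetical name
        "filterPattern": pattern,
        "destinationArn": alert_function_arn,
    }


def create_alerting(function_name: str, topic_arn: str,
                    alert_function_arn: str) -> None:
    """Create both resources. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builders above work without boto3

    boto3.client("cloudwatch").put_metric_alarm(
        **lambda_error_alarm_params(function_name, topic_arn))
    boto3.client("logs").put_subscription_filter(
        **error_subscription_params(function_name, alert_function_arn))
```

Keeping the configuration in plain dictionaries separates what is configured from the API calls, which makes it easier to keep alarms consistent as the number of functions grows. Before the subscription filter can be created, the alerting function must also allow CloudWatch Logs to invoke it.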
Let’s configure a basic alarm for when a Lambda function fails for any reason. Here is a simple guide to deploying the solution above:

1. Create an SNS topic and configure an email subscription for it.
2. Create an IAM role and a policy that lets the alerting function publish to the topic.
3. Create a Lambda function that alerts you via SNS (sample code below). It reads the SNS topic ARN from the snsARN environment variable.

```python
# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
#     http://aws.amazon.com/apache2.0/
#
# or in the "license" file accompanying this file.
# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
# either express or implied. See the License for the specific language governing permissions
# and limitations under the License.
# Description: This Lambda function sends an email notification to a given AWS SNS topic when a particular
#              pattern is matched in the logs of a selected Lambda function. The email subject is
#              Execution error for Lambda-<insert Lambda function name>.
#              The JSON message body of the SNS notification contains the full event details.
# Author: Sudhanshu Malhotra

import base64
import boto3
import gzip
import json
import logging
import os

from botocore.exceptions import ClientError

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def logpayload(event):
    # CloudWatch Logs delivers subscription data base64-encoded and gzip-compressed.
    logger.setLevel(logging.DEBUG)
    logger.debug(event['awslogs']['data'])
    compressed_payload = base64.b64decode(event['awslogs']['data'])
    uncompressed_payload = gzip.decompress(compressed_payload)
    log_payload = json.loads(uncompressed_payload)
    return log_payload


def error_details(payload):
    # Pull the log group, log stream and matched messages out of the payload.
    error_msg = ""
    log_events = payload['logEvents']
    logger.debug(payload)
    loggroup = payload['logGroup']
    logstream = payload['logStream']
    # Lambda log groups are named /aws/lambda/<function-name>, so index 3 is the name.
    lambda_func_name = loggroup.split('/')
    logger.debug(f'LogGroup: {loggroup}')
    logger.debug(f'Logstream: {logstream}')
    logger.debug(f'Function name: {lambda_func_name[3]}')
    logger.debug(log_events)
    for log_event in log_events:
        error_msg += log_event['message']
    logger.debug('Message: %s' % error_msg.split("\n"))
    return loggroup, logstream, error_msg, lambda_func_name


def publish_message(loggroup, logstream, error_msg, lambda_func_name):
    sns_arn = os.environ['snsARN']  # Getting the SNS Topic ARN passed in by the environment variables.
    snsclient = boto3.client('sns')
    try:
        # Build a plain-text summary of the error for the notification body.
        message = ""
        message += "\nLambda error summary" + "\n\n"
        message += "##########################################################\n"
        message += "# LogGroup Name:- " + str(loggroup) + "\n"
        message += "# LogStream:- " + str(logstream) + "\n"
        message += "# Log Message:- " + "\n"
        message += "# \t\t" + str(error_msg.split("\n")) + "\n"
        message += "##########################################################\n"

        # Sending the notification...
        snsclient.publish(
            TargetArn=sns_arn,
            Subject=f'Execution error for Lambda - {lambda_func_name[3]}',
            Message=message
        )
    except ClientError as e:
        logger.error("An error occurred: %s" % e)


def lambda_handler(event, context):
    pload = logpayload(event)
    lgroup, lstream, errmessage, lambdaname = error_details(pload)
    publish_message(lgroup, lstream, errmessage, lambdaname)
```

Code Source: Amazon, Sudhanshu Malhotra

How to Create a CloudWatch Log Trigger and Set a Filter

If you need an error-generating Lambda function to test with, here’s one from Amazon that you can use:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)


def lambda_handler(event, context):
    # Emit one message at each log level so you can check which lines your filter pattern matches.
    logger.setLevel(logging.DEBUG)
    logger.debug("This is a sample DEBUG message.. !!")
    logger.error("This is a sample ERROR message.... !!")
    logger.info("This is a sample INFO message.. !!")
    logger.critical("This is a sample 5xx error message.. !!")
```
Code Source: Amazon

Best Practices for Setting Metric Alerting

So when should you configure a metric alarm? In general, you only want to receive alerts in cases that require your attention. If you receive alerts too frequently and responding to them is optional, it won’t be long until you miss a critical alert in the noise or, worse yet, start ignoring alerts altogether.

For example, you can ask yourself these questions:

- Is it okay if 1% of all requests fail for a specific Lambda function?
- Is it important that requests take less than 1 second?
- Do you need to know when your Lambdas are reaching the account-wide concurrency limit? You probably do.

The settings are individual for every application and usually take some time and iteration to get right.

The other thing to think about is whether you should configure alerts that are preventive by nature, so that they trigger when something hasn’t failed yet but might very soon.

Setting Custom Metrics on CloudWatch

Once you’ve defined your requirements for metrics, you can start setting them up one by one. This can be done through CloudWatch as well. Amazon shares some examples you can follow, but it is quite a tedious task not only to configure them correctly but also to make sure everything stays up to date and in working order as your application grows.

Going Further and Scaling

Using CloudWatch alarms is a great first line of defense, but debugging applications through CloudWatch alone is hard and time-consuming, especially when your functions have a non-trivial number of invocations.

As you can see from the above, creating alarms for even the most basic metric is quite an annoying task, and building alarms for custom metrics is a ton of work as well. An easier and better solution is Dashbird’s automated, preconfigured alarms.
Dashbird’s automated alarms listen to events from logs and metrics, catching code exceptions, slow API responses, failed database requests and slow queues, and notify you instantly via Slack, email, SNS or webhooks. If anything is about to break, you can quickly jump in and fix it before it starts affecting your customers. No extra instrumentation is needed, so you can start using it right away without redeploying any of your Lambda functions.

Furthermore, Dashbird sets up metrics and alerts for all supported AWS resources, so you don’t have to. These are based on years of experience monitoring serverless systems for Dashbird customers, who have over 5,000 AWS accounts connected and ingesting monitoring data. According to their website: "Dashbird not only detects failures, it also points you to the exact request, shows you logs, X-ray traces, and relevant metadata for that invocation."

Also published on: https://dashbird.io/blog/configuring-cloudwatch-alarms-with-aws-lambda/