When building cloud-based systems, and serverless systems in particular, it’s crucial to stay on top of what your system is doing. Your infrastructure runs far away from you, not on a device you hold in your hands as it does when you build a frontend. That’s why adding a monitoring solution that offers pre-configured serverless failure detection should be one of the first decisions you make.
When choosing a monitoring solution, it’s a good idea to check whether it offers adequate alerting features. With serverless, most of the time between a new incident and its fix is spent finding the problem, not troubleshooting it. This means a good monitoring solution can shrink the biggest part of that time.
Serverless is all about outsourcing non-differentiating work to managed services. If you aren’t in the business of selling authentication software, don’t build authentication software. Buy it from someone who makes their money from it: chances are they have one or more teams working on it, and their solution will outperform yours in a heartbeat.
As mentioned before, the challenge here is that you can’t look inside these services. You use SQS, DynamoDB, or API Gateway, but you can’t directly monitor the servers they run on, let alone SSH into them to debug them. AWS has its own logging and tracing services for this, CloudWatch and X-Ray, so you need to extract the data from them and set your alarms there.
The problem with the AWS-provided monitoring services is that they’re general-purpose solutions, which makes them hard to use. CloudWatch doesn’t just log Lambda or SQS data; it logs data from all AWS services. This can make it fiddly to set the right alerts for each of your services. And because a serverless system is in constant flux as you add features, ship updates, and refactor, you’ll have to adjust these settings constantly.
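To illustrate why this gets fiddly, here is a minimal sketch of keeping a baseline set of CloudWatch alarms in code so they can be regenerated whenever a function is added or changed. The `AWS/Lambda` namespace and the `Errors`, `Throttles`, and `Duration` metrics are standard CloudWatch metrics for Lambda, and the dictionary keys match the parameters of CloudWatch’s `PutMetricAlarm` API; the function name, SNS topic ARN, and thresholds are hypothetical placeholders.

```python
from typing import Any, Dict, List

# Hypothetical baseline applied to every function: alarm as soon as a single
# error or throttle occurs within the period. Tune per service in practice.
BASELINE = [
    # (metric name, statistic, threshold)
    ("Errors", "Sum", 1.0),
    ("Throttles", "Sum", 1.0),
]


def lambda_alarm_params(function_name: str, timeout_ms: float,
                        sns_topic_arn: str) -> List[Dict[str, Any]]:
    """Return PutMetricAlarm keyword arguments for one Lambda function."""
    # Also alarm when the slowest invocation reaches 80% of the timeout
    # (the Duration metric is reported in milliseconds).
    rules = BASELINE + [("Duration", "Maximum", 0.8 * timeout_ms)]
    alarms = []
    for metric, statistic, threshold in rules:
        alarms.append({
            "AlarmName": f"{function_name}-{metric.lower()}",
            "Namespace": "AWS/Lambda",
            "MetricName": metric,
            "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
            "Statistic": statistic,
            "Period": 300,  # seconds
            "EvaluationPeriods": 1,
            "Threshold": threshold,
            "ComparisonOperator": "GreaterThanOrEqualToThreshold",
            "TreatMissingData": "notBreaching",
            "AlarmActions": [sns_topic_arn],  # notify this SNS topic
        })
    return alarms
```

Each dictionary could then be passed to `boto3.client("cloudwatch").put_metric_alarm(**params)`. Even with a generator like this, you still have to pick thresholds for every metric and every service yourself, which is exactly the maintenance burden described above.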
Also, AWS monitoring solutions aren’t particularly frugal about what data they log. In a serverless system, your transactions usually span multiple services, all emitting large numbers of log lines. Combing through that much data costs time and, in turn, money. And more often than not, the AWS Console just isn’t enough for serverless teams, especially as they scale.
Built specifically for serverless failure detection and debugging, Dashbird collects all the logs your AWS services write, with no instrumentation needed, and sorts them into categories such as configuration errors, timeouts, and out-of-memory events. Because Dashbird gathers all your logs, it can present them to you in an easily understandable way. These events are also published in the Event Library with short explanations of their causes and fixes. If you’ve ever looked at the CloudWatch logs for a single Lambda invocation, you know what a chore it can be to find the right line.
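The idea of sorting raw log lines into categories can be sketched in a few lines. This is not Dashbird’s implementation, only a minimal illustration assuming you already have your functions’ log lines as strings. The timeout pattern matches the line Lambda writes when an invocation times out; the other regular expressions are examples that depend on the runtime and are not an exhaustive or authoritative list.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Illustrative patterns, checked in order. "Task timed out after N seconds" is the
# line Lambda writes on a timeout; the other patterns are assumptions and depend
# on the runtime (Python, Node.js, ...) and its version.
PATTERNS = {
    "timeout": re.compile(r"Task timed out after [\d.]+ seconds"),
    "out_of_memory": re.compile(r"MemoryError|JavaScript heap out of memory"),
    "configuration": re.compile(r"Runtime\.ImportModuleError|Runtime\.HandlerNotFound"),
}


def classify(line: str) -> Optional[str]:
    """Return the first matching category for a log line, or None."""
    for category, pattern in PATTERNS.items():
        if pattern.search(line):
            return category
    return None


def summarize(lines: Iterable[str]) -> Counter:
    """Count categorized events, skipping lines that match no pattern."""
    return Counter(c for c in map(classify, lines) if c is not None)
```

A real classifier needs far more patterns and has to keep up with runtime changes, which is the kind of upkeep a dedicated tool takes on for you.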
Generic logging solutions have to be configured correctly. They offer much more flexibility, but getting everything set up right can be expensive. And cost doesn’t just mean license fees; there are excellent open-source solutions. It means the time it takes to get all these configurations right, and it means you’ll probably miss a few errors until you’ve finally fine-tuned them.
Dashbird also takes the pain out of configuring important alerts. This doesn’t just mean that its UI is easier to understand than what AWS offers; it means that Dashbird comes with pre-configured alerts right from the start. Dashbird understands AWS services; it’s not just a generic monitoring solution you staple onto your Lambda functions. These alerts include Dashbird’s know-how and suggestions on how to improve your system’s health and performance, gathered and built over the years from monitoring thousands of serverless systems running in production.
The combination of a monitoring solution tailored to specific serverless AWS services and the know-how gained from production systems makes Dashbird more comprehensive than other monitoring systems. It also means that Dashbird needs less manual configuration than a generic solution that has to be fitted to each service by hand.
Dashbird doesn’t just provide simple serverless failure detection; it also alerts you when your services are about to fail. This way, you can start working on a fix before an incident even happens.
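One simple way to warn before a failure is to compare a function’s worst recent duration and memory usage with its configured limits. This is an assumption about how such a check could work, not Dashbird’s actual logic. Lambda reports `Duration` and `Max Memory Used` in the `REPORT` log line of every invocation; the `Invocation` type and the default 80% ratio below are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Invocation:
    # Values as reported in a Lambda REPORT log line.
    duration_ms: float
    max_memory_used_mb: int


def early_warnings(invocations: Sequence[Invocation], timeout_ms: float,
                   memory_size_mb: int, ratio: float = 0.8) -> List[str]:
    """Warn when recent invocations get close to the timeout or memory limit."""
    if not invocations:
        return []
    warnings = []
    worst_duration = max(i.duration_ms for i in invocations)
    worst_memory = max(i.max_memory_used_mb for i in invocations)
    # Close to the timeout: one slow downstream call may push it over.
    if worst_duration >= ratio * timeout_ms:
        warnings.append(
            f"duration: {worst_duration:.0f} ms of {timeout_ms:.0f} ms timeout "
            f"({worst_duration / timeout_ms:.0%})"
        )
    # Close to the memory limit: a larger payload may cause an out-of-memory error.
    if worst_memory >= ratio * memory_size_mb:
        warnings.append(
            f"memory: {worst_memory} MB of {memory_size_mb} MB "
            f"({worst_memory / memory_size_mb:.0%})"
        )
    return warnings
```

A lower `ratio` warns earlier at the price of more false positives, so the right value differs from function to function.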
Alerting your developers through an appropriate channel and in an appropriate format is the other side of the coin, which makes integration with the services they use every day important. Sometimes an email is enough, but Dashbird also offers Slack, AWS SNS, and webhook integrations.
This way, an alert will find developers where they are currently active, and they can respond right away, not just when they check their email two hours after a problem arises.
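As a minimal sketch of routing alerts to where developers are active, the code below builds HTTP requests for a Slack incoming webhook (which accepts a JSON body with a `text` field) and for a generic webhook. The severity levels, routing table, and alert fields are hypothetical, and nothing goes over the network until `send` is called.

```python
import json
import urllib.request
from typing import Dict, List

# Hypothetical routing table: which channels each severity is sent to.
ROUTES = {
    "critical": ["slack", "webhook"],
    "warning": ["slack"],
    "info": [],  # informational events stay in the dashboard
}


def _post_json(url: str, payload: Dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def route(alert: Dict, endpoints: Dict[str, str]) -> List[urllib.request.Request]:
    """Build one POST request per channel configured for the alert's severity."""
    requests = []
    for channel in ROUTES[alert["severity"]]:
        if channel == "slack":
            # Slack incoming webhooks accept a JSON body with a "text" field.
            text = f"[{alert['severity'].upper()}] {alert['function']}: {alert['message']}"
            requests.append(_post_json(endpoints["slack"], {"text": text}))
        else:
            # A generic webhook gets the raw alert so the receiver can act on it.
            requests.append(_post_json(endpoints[channel], alert))
    return requests


def send(requests: List[urllib.request.Request]) -> None:
    """Deliver the requests; network errors propagate to the caller."""
    for req in requests:
        with urllib.request.urlopen(req, timeout=5) as response:
            response.read()
```

For AWS SNS, the equivalent delivery step would be a call such as `boto3.client("sns").publish(TopicArn=..., Message=...)`.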
You don’t want to pay for monitoring only to hear from a customer that something is wrong because nobody noticed an alert.
Integration also allows you to automate responses to specific problems. As the creator of your architecture, you know best what to do when your traffic spikes. Maybe you need to provision more capacity, but maybe you just need to tell a customer that they’ve reached their quota for the month and will now be throttled.
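If alerts arrive through a webhook, your own code can decide how to react. Below is a minimal sketch of the decision described above, assuming a hypothetical alert payload with `type` and `customer_id` fields and hypothetical per-plan monthly quotas. Actually provisioning capacity or throttling the customer is left to your own infrastructure.

```python
from enum import Enum
from typing import Dict


class Action(Enum):
    IGNORE = "ignore"
    PROVISION_CAPACITY = "provision_capacity"
    THROTTLE_AND_NOTIFY = "throttle_and_notify"


# Hypothetical monthly request quotas per plan.
MONTHLY_QUOTA = {"free": 10_000, "pro": 1_000_000}


def decide(alert: Dict, usage: Dict[str, int], plans: Dict[str, str]) -> Action:
    """Choose a response to an incoming alert."""
    if alert.get("type") != "traffic_spike":
        return Action.IGNORE
    customer = alert["customer_id"]
    # A customer over their quota is throttled and told why, instead of
    # the system scaling up to absorb traffic beyond their plan.
    if usage.get(customer, 0) >= MONTHLY_QUOTA[plans[customer]]:
        return Action.THROTTLE_AND_NOTIFY
    # Otherwise treat the spike as legitimate growth and add capacity.
    return Action.PROVISION_CAPACITY
```

Keeping the decision in a small pure function like this makes it easy to test separately from the code that performs the action.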
When choosing a monitoring solution for your serverless architecture, it’s of utmost importance to focus on its alerting features. In recent years, function-as-a-service offerings like AWS Lambda were sold as the central aspect of building serverless systems, but serverless is so much more. Managed services like S3, DynamoDB, Cognito, Aurora, and SQS help you reduce the time and personnel needed to get new features to market frequently.
But these managed services don’t come without a cost. On the one hand, you don’t have to maintain your servers anymore; on the other hand, you can’t SSH into these services and install whatever monitoring client you want.
Managed services are also an opportunity for monitoring providers. If the infrastructure you build on can be broken down into a few concrete, well-known services, the help a monitoring provider offers can be concrete rather than abstract. You don’t have to think about all the possible ways your system can, and probably will, fail; instead, you can rely on the know-how of a monitoring provider like Dashbird to catch problems for you, without the painful learning experience usually connected with serverless failure detection and monitoring.
Previously published at https://dashbird.io/blog/failure-and-threat-detection-serverless/