Node.js Monitoring Done Right

Written by RisingStack | Published 2016/11/07
Tech Story Tags: devops | microservices | nodejs | monitoring | web-development


Node.js monitoring is essential for companies building competitive products with a great user experience, and the goal of this article is to discuss why. I’ll also argue that monitoring can save a tremendous amount of money and helps maintain a strong reputation.

This article focuses on monitoring your product at the application level, mainly from the developer’s, manager’s, and customer’s point of view. We’ll talk solely about software issues, not hardware anomalies.

To help you get started, I’ll provide you with a list of the 11 essentials of application performance monitoring.

What is monitoring?

According to the Oxford Dictionary, monitoring means

To observe and check the progress or quality of (something) over a period of time; keep under systematic review.

Ok, we get it. But what is Application Performance Monitoring/Management (APM)?

Application Performance Management (APM) is the monitoring and management of performance and availability of software applications.

Does this mean that applications have qualitative and quantitative metrics that we can observe, compare, and even get notified about when something changes? Yes, it does. Why is this important for me, and why should it be important for you as well? Let’s see.

Why should I care?

If you are a manager or a developer at your company, you would probably like to know when your application stops or when it doesn’t serve your customers’ needs. To get hold of this information, you’ll need to monitor service downtimes and set up a proper alerting system. That part is easy, and thankfully, most companies already handle it.

When you dig a little deeper and start caring about more than just keeping your application running, you’ll probably become interested in meeting user expectations and minimizing the number of bugs in your production system. That’s great, and you have a very good reason to do so: when your users face bugs and slowdowns, they will leave your application and look for another one that delivers the same benefits without your errors. This is how errors, bugs, and slowdowns lead to lower conversion rates and a drop in your business’s income.

With a good monitoring solution, you can catch more than just bugs and slow response times! You can also discover performance bottlenecks at the network and code level before they cost you money.

Talking about losing money: I’m sure you already know that using more resources than you need is expensive. Spending time on inefficient debugging also affects your deadlines and wastes development time.

So what is monitoring?

Monitoring is a way to avoid losing resources and customers, and to stop wasting time. Monitoring is essential for not wasting money.

I’m sure you agree with this, but you might not know which elements of your application you must keep an eye on. Let me help:

What should be monitored?

In the next part of the article, I’ll give you a list of what you should monitor and why.

1. Keep your eye on service downtimes

This is the simplest thing that you can monitor.

It’s easy to understand that when your application doesn’t work, your customers cannot spend money on your site. Other than that, they will also be disappointed. If this issue is frequently present, or it happens during the customer acquisition process, you may lose users forever.

Set up your monitoring solution so that it instantly notifies the engineering team when a service goes down.
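
To make the idea concrete, here is a minimal sketch of downtime detection, not a production setup. It assumes Node.js 18 or newer (for the global `fetch` and `AbortSignal.timeout`). The `/health` URL, the threshold of three failed checks, and the `onDown` / `onRecovered` callbacks are all hypothetical names chosen for illustration.

```javascript
// Hypothetical sketch: decide when a run of failed health checks counts as downtime.
// The threshold and the callbacks are illustrative assumptions.
function createDowntimeDetector({ failureThreshold = 3, onDown, onRecovered }) {
  let consecutiveFailures = 0;
  let isDown = false;

  // Feed in the result of each health check (true = healthy).
  return function record(healthy) {
    if (healthy) {
      consecutiveFailures = 0;
      if (isDown) {
        isDown = false;
        onRecovered();
      }
      return;
    }
    consecutiveFailures += 1;
    // Notify once, when the threshold is first reached.
    if (!isDown && consecutiveFailures >= failureThreshold) {
      isDown = true;
      onDown(consecutiveFailures);
    }
  };
}

// Probe a URL; a network error, a timeout, or a non-2xx status counts as unhealthy.
async function probe(url, timeoutMs = 2000) {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    return res.ok;
  } catch (err) {
    return false;
  }
}

// Usage (nothing is started automatically):
// const record = createDowntimeDetector({
//   onDown: (n) => console.error(`service down after ${n} failed checks`),
//   onRecovered: () => console.log('service recovered'),
// });
// setInterval(async () => record(await probe('http://localhost:3000/health')), 10000);
```

Requiring several consecutive failures before notifying is a design choice: it trades a short delay for fewer false alarms caused by a single dropped request.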

2. Slow response times impact conversion rates — Don’t let them!

You can also monitor your services’ response times. A slow service is less of a money bleeder in the short term, but it leads to a broken user experience.

Research has shown that customers prefer faster websites and products. Customers also sometimes cite speed as the main reason they switched to a competitor’s product.

Slower response times have also been shown to hurt conversion rates. The correlation between response time and conversion rate is not always obvious, but one thing is for sure: a slow service always underperforms an equivalent but faster one.

Make sure you use a monitoring solution that can tell you the exact response time for each of your application’s services, and that lets you compare it with your historical data.
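
As a rough illustration of what measuring response time per request can look like, the sketch below wraps a plain Node.js `(req, res)` handler and summarizes the collected samples with a percentile. The names `withTiming`, `percentile`, and `onSample` are my own for this example, and an APM agent would normally do this for you.

```javascript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples, p) {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Wraps a (req, res) handler and reports how long each response took, in milliseconds.
function withTiming(handler, onSample) {
  return function timedHandler(req, res) {
    const start = process.hrtime.bigint();
    // 'finish' fires once the response has been handed off to the operating system.
    res.once('finish', () => {
      const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
      onSample({ method: req.method, url: req.url, elapsedMs });
    });
    return handler(req, res);
  };
}

// Usage with the built-in http module:
// const samples = [];
// http.createServer(withTiming(myHandler, (s) => samples.push(s.elapsedMs))).listen(3000);
// console.log('p95 ms:', percentile(samples, 95));
```

Percentiles such as p95 are usually more telling than averages here, because a handful of very slow requests can hide behind a healthy-looking mean.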

(Image: Monitor slow response times)

3. Monitor error rates and crashes to find out about bad code quality

Broken functionality may prevent your customers from registering or spending money on your site. To find out about these issues, you should monitor the number and type of error status codes and application crashes.
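
A minimal sketch of counting status codes might look like the following. The tracker and its method names are hypothetical, and `reportCrash` in the closing comment stands in for whatever ships errors to your monitoring backend.

```javascript
// Sketch: count responses by status code and compute the share of server errors.
function createErrorTracker() {
  const counts = { total: 0, clientErrors: 0, serverErrors: 0 };
  const byStatus = new Map();

  return {
    record(statusCode) {
      counts.total += 1;
      byStatus.set(statusCode, (byStatus.get(statusCode) || 0) + 1);
      if (statusCode >= 500) counts.serverErrors += 1;
      else if (statusCode >= 400) counts.clientErrors += 1;
    },
    // Fraction of all responses that were 5xx (0 when nothing was recorded).
    serverErrorRate() {
      return counts.total === 0 ? 0 : counts.serverErrors / counts.total;
    },
    snapshot() {
      return { ...counts, byStatus: Object.fromEntries(byStatus) };
    },
  };
}

// Crashes can be observed through Node's process-level events before exiting:
// process.on('uncaughtException', (err) => { reportCrash(err); process.exit(1); });
// `reportCrash` is a hypothetical function, not part of Node.js.
```

Keeping 4xx and 5xx separate matters: a spike in 404s usually points at clients or broken links, while 5xx responses point at your own code.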

A high number of errors can also be a sign of something important, namely that you are facing code quality issues. Proper monitoring leads you to ask these questions as a manager:

  • Do I give enough time to my developers to ship quality features?
  • Do I have good test coverage?
  • Do I have QA?

Bad code quality always requires more developer time to debug and fix broken things.

4. Don’t waste your money on unnecessary resources

A good APM solution lets you monitor your application’s resource usage.

Inefficient services use more resources — and they cost more money to run. Finding performance bottlenecks helps you to scale your software in an efficient way and to optimize your spending.

Don’t waste your money on unnecessary hardware because of the bad code you wrote — instead find a tool that will let you know where you should improve your application.

To do so, look for solutions that offer CPU and memory profiling. CPU profiling allows you to analyze which functions were running while your application slowed down, while memory heap dumps are useful for finding memory leaks.
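
Before reaching for a full profile, a crude leak signal can be built from `process.memoryUsage()`. The sketch below is only an assumption-laden illustration: the 50 MB growth threshold and the "every sample larger than the last" rule are arbitrary choices of mine, and real heap usage fluctuates with garbage collection.

```javascript
// Returns true when every sample is larger than the previous one
// and total growth is at least `minGrowthBytes` (an illustrative threshold).
function looksLikeLeak(heapSamples, minGrowthBytes = 50 * 1024 * 1024) {
  if (heapSamples.length < 2) return false;
  for (let i = 1; i < heapSamples.length; i += 1) {
    if (heapSamples[i] <= heapSamples[i - 1]) return false;
  }
  return heapSamples[heapSamples.length - 1] - heapSamples[0] >= minGrowthBytes;
}

// Keeps the last `windowSize` heapUsed readings (in bytes).
function createHeapSampler(windowSize = 10) {
  const samples = [];
  return {
    sample() {
      samples.push(process.memoryUsage().heapUsed);
      if (samples.length > windowSize) samples.shift();
      return samples.slice();
    },
  };
}

// Usage (nothing is started automatically):
// const sampler = createHeapSampler();
// setInterval(() => {
//   if (looksLikeLeak(sampler.sample())) console.warn('heap keeps growing');
// }, 60000);
```

When a signal like this fires, newer Node.js versions can produce the actual artifacts: `v8.writeHeapSnapshot()` writes a heap dump, and the `--cpu-prof` command-line flag records a CPU profile.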

If you find and fix the performance bottlenecks in your system, you don’t have to scale up when something goes wrong.

(Image: Finding bottlenecks using profiles)

5. Examine your 3rd party API(s) and services

Do you use lots of external APIs? I guess so. You’re paying for them as well, right?

If you pay a lot for external services, it makes sense to monitor their quality and availability. For example, you can monitor the error rate or response time of your third-party payment provider. Maybe the issue is not in your service. A proper APM tool will help you find the low-performing external parts of your application and save you money on them.
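
One way to picture this is a small wrapper around every outgoing call that records latency and failures per provider. This is a sketch under my own naming (`createDependencyMonitor`, `track`, `report`); `chargeCard` in the usage comment is a hypothetical function standing in for your payment provider's client.

```javascript
// Sketch: wrap any promise-returning call to a third-party service
// and record its latency and outcome per provider.
function createDependencyMonitor() {
  const stats = new Map();

  function statsFor(name) {
    if (!stats.has(name)) stats.set(name, { calls: 0, errors: 0, totalMs: 0 });
    return stats.get(name);
  }

  return {
    // `name` labels the provider; `call` is a function returning a promise.
    async track(name, call) {
      const entry = statsFor(name);
      const start = Date.now();
      entry.calls += 1;
      try {
        return await call();
      } catch (err) {
        entry.errors += 1;
        throw err; // the caller still sees the original failure
      } finally {
        entry.totalMs += Date.now() - start;
      }
    },
    report(name) {
      const { calls, errors, totalMs } = statsFor(name);
      return {
        calls,
        errorRate: calls === 0 ? 0 : errors / calls,
        avgMs: calls === 0 ? 0 : totalMs / calls,
      };
    },
  };
}

// Usage:
// const monitor = createDependencyMonitor();
// const receipt = await monitor.track('payments', () => chargeCard(order));
// console.log(monitor.report('payments'));
```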

6. Know the quality and performance of your releases

Your performance and code quality can differ with each version (deploy) of your application. Make sure that you continuously monitor your services and that you have a good way to compare your metrics across deploys and releases. If you do, you’ll instantly see when a revert is needed to restore the previously good performance and user experience.

New is not always better. Not knowing whether new is better is even worse!
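
As a sketch of what comparing two releases could mean in practice, the function below flags a candidate release whose metrics got noticeably worse than the baseline. The metric names (`p95Ms`, `errorRate`) and the 20% tolerance are illustrative assumptions, not recommended values.

```javascript
// Sketch: compare a candidate release against a baseline release.
function compareReleases(baseline, candidate, tolerance = 0.2) {
  const regressions = [];
  for (const metric of ['p95Ms', 'errorRate']) {
    const before = baseline[metric];
    const after = candidate[metric];
    // Flag the metric when it got worse by more than `tolerance` (relative change).
    if (after > before * (1 + tolerance)) {
      regressions.push({ metric, before, after });
    }
  }
  return { shouldRevert: regressions.length > 0, regressions };
}

// Usage:
// compareReleases({ p95Ms: 200, errorRate: 0.01 }, { p95Ms: 300, errorRate: 0.01 });
```

Note that with a baseline value of 0, any increase counts as a regression, which may or may not be what you want for error rates.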

(Image: Monitor using custom metrics)

7. Correlate application performance with your business metrics

Business metrics and performance metrics can correlate, which will lead you to valuable insights. It’s usually worth checking some of your key business metrics together with your performance metrics.

Let’s see an example: you notice that whenever a lot of new users register, your response time gets higher than usual.

In this case, it’s quite possible that the response times are higher because your service sends a welcome email to your new users, and the email rendering blocks your event loop, so your service cannot fulfill requests as fast as usual.

When your APM is able to collect and correlate data, you will hunt down performance issues more easily, because you can see the business context behind them.
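
To show what correlating two series means numerically, here is a small Pearson correlation sketch. The idea of feeding it per-minute sign-up counts and response times is my assumption for the registration example; a value close to 1 would suggest the two rise together, though correlation alone does not prove the cause.

```javascript
// Pearson correlation coefficient of two equally long numeric series (-1 to 1).
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error('need two series of equal length with at least 2 points');
  }
  const mean = (values) => values.reduce((sum, v) => sum + v, 0) / values.length;
  const meanX = mean(xs);
  const meanY = mean(ys);
  let covariance = 0;
  let varianceX = 0;
  let varianceY = 0;
  for (let i = 0; i < xs.length; i += 1) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    covariance += dx * dy;
    varianceX += dx * dx;
    varianceY += dy * dy;
  }
  // A constant series has no defined correlation.
  if (varianceX === 0 || varianceY === 0) return NaN;
  return covariance / Math.sqrt(varianceX * varianceY);
}

// Usage with hypothetical per-minute data:
// const signups = [3, 5, 40, 42, 4];
// const responseMs = [110, 120, 480, 510, 115];
// console.log(pearson(signups, responseMs));
```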

8. Follow distributed transactions through your system

If you are building a distributed system (one with a microservices architecture), you probably know how challenging it can be to track down issues when they happen in a call chain with multiple services. What’s just a stack trace on a lower level may cause an unknown anomaly on higher levels.

Microservices need slightly different tools for monitoring and debugging. Luckily, distributed transaction tracing can help you catch issues between services.

Distributed tracing is a method first mentioned in the Google Dapper whitepaper, which allows you to monitor complex, large-scale distributed systems.

With an APM capable of distributed tracing, you can visualize your service calls, including calls to databases and external APIs. So instead of extracting the logs from all of the participating services, you can see your whole transaction on a timeline visualization, with the issues highlighted.
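
The core mechanism behind such a timeline is passing a trace ID from service to service. The sketch below illustrates it with made-up header names (`x-trace-id`, `x-parent-span-id`); they are assumptions for this example and not the format of Dapper or of any particular APM product.

```javascript
const { randomBytes } = require('crypto');

// Illustrative header names, not a standard or a vendor format.
const TRACE_HEADER = 'x-trace-id';
const PARENT_HEADER = 'x-parent-span-id';

// Read the trace context from incoming headers, or start a new trace.
// Node's http module lowercases incoming header names, so lowercase keys work.
function startSpan(incomingHeaders, name) {
  return {
    traceId: incomingHeaders[TRACE_HEADER] || randomBytes(16).toString('hex'),
    parentSpanId: incomingHeaders[PARENT_HEADER] || null,
    spanId: randomBytes(8).toString('hex'),
    name,
    startedAt: Date.now(),
  };
}

// Headers to attach to outgoing calls so the next service joins the same trace.
function outgoingHeaders(span) {
  return { [TRACE_HEADER]: span.traceId, [PARENT_HEADER]: span.spanId };
}

// Close the span; the result is what a service would report to the tracing backend.
function finishSpan(span) {
  return { ...span, durationMs: Date.now() - span.startedAt };
}
```

Every service reports its finished spans, and because they share one `traceId` and point at their parent span, a backend can reassemble them into a single call tree.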

(Image: Find issues in distributed systems using distributed tracing)

9. Don’t let security vulnerabilities ruin your day

Most services are built using lots of packages from npm or other package managers. These modules can, and often do, contain security vulnerabilities.

It’s good to know when we should upgrade our dependencies to prevent security breaches. Some monitoring tools also provide alerting for security issues, so make sure you use one that does. There’s nothing more embarrassing than leaking user data to malicious attackers.

10. Do alerting right

It’s not enough to have a place where we can check all of these important metrics since we cannot watch them 24/7; that’s why a good monitoring solution provides an alerting feature as well.

Be the first who knows about issues

Issues should be found as soon as possible; don’t wait for your biggest customer to find them.

To stay up to date on ongoing incidents in your application, set up alerts for them. If you are building a critical service, you should be alerted even in the middle of the night.

Always keep in mind: production issues do not respect your working hours.

Prevent noise

Receiving the wrong alerts doesn’t just waste time; it also makes it harder to catch critical issues. When you configure an alert, always make sure that you set it with the right criteria and send it to the right channel.

It’s also good practice to use a channel like PagerDuty or Opsgenie to notify the right people in the right way (email, SMS, phone, etc.) at the right time. Sending alerts via email may just spam your mailbox.
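
What "the right criteria" might look like can be sketched as a rule that fires only when a metric stays above its threshold for a while, and then stays quiet during a cooldown. The parameter names and the `notify` callback are hypothetical; in practice `notify` would hand the alert to a channel such as the ones mentioned above.

```javascript
// Sketch: fire an alert only when a metric stays above its threshold for a
// sustained period, and suppress repeats during a cooldown.
function createAlertRule({ threshold, sustainMs, cooldownMs, notify }) {
  let breachStartedAt = null;
  let lastNotifiedAt = null;

  // Call with every new metric value; `now` is a timestamp in milliseconds.
  return function evaluate(value, now = Date.now()) {
    if (value <= threshold) {
      breachStartedAt = null; // back to normal, reset the breach window
      return false;
    }
    if (breachStartedAt === null) breachStartedAt = now;

    const sustained = now - breachStartedAt >= sustainMs;
    const coolingDown = lastNotifiedAt !== null && now - lastNotifiedAt < cooldownMs;
    if (sustained && !coolingDown) {
      lastNotifiedAt = now;
      notify({ value, since: breachStartedAt });
      return true;
    }
    return false;
  };
}

// Usage: alert when response time stays above 500 ms for a minute,
// at most once every five minutes.
// const evaluate = createAlertRule({
//   threshold: 500, sustainMs: 60000, cooldownMs: 300000,
//   notify: (info) => console.error('slow responses', info),
// });
```

The sustain window filters out short spikes, and the cooldown prevents one ongoing incident from paging the same person over and over.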

11. Monitoring is not easy, so don’t build your own tool!

A perfect monitoring tool saves money and makes it easy and quick to hunt down complex issues. As a result, your service will be healthier and faster, and your customers will appreciate it.

Building a proper monitoring tool takes a lot of time, knowledge, and effort, and it’s like a separate product in your company. Instead of creating your own solution, I highly recommend choosing a third-party APM and spending your time on making your product awesome and your customers happier.

Getting Started with Node.js Monitoring

Nowadays, most monitoring tools are available as a SaaS solution. This means that you don’t have to spend time setting them up and maintaining them; you can just use them and enjoy the extra time and money you saved.

As a software engineer working on a distributed tracing application, I can only recommend embracing a reliable and insightful monitoring solution. The benefits of using one will be measurable. So, how about starting to save some money today?

About the author: Peter Marton (@slashdotpeter) is the co-founder and CTO of RisingStack, currently working on Trace, a Node.js performance monitoring and debugging tool designed for understanding and troubleshooting microservices-based applications on a process level.


Published by HackerNoon on 2016/11/07