Disclaimer: I am the co-founder of Assertible. This is a success story about building and dogfooding a product that solves my own problems.

Today there was a brief outage in one of my APIs. The series of events that led me to identify the issue made me realize just how important effective notifications are in an API monitoring tool. I wanted to outline what happened, and how Assertible helped me identify the problem more quickly than Pingdom.

This particular web service is one that I would consider critical; users rely on the availability of this service for their services. Uptime monitoring is set up with Pingdom, and more in-depth validations are running on Assertible. Both services are continuously checking the API every few minutes, and are set up to alert the team if any of those checks fail.

So here’s what happened

A pull request was merged, and CircleCI started building the app and preparing it for deployment. This is a routine process that happens several times every day. I stepped out of the office, but (fortunately) had my phone with me to receive alerts.

At 1:22 PM, I received a notification from Pingdom that the service was DOWN. The default alerts sent by Pingdom don’t provide any meaningful information about the outage, and definitely not enough to act on. Thanks, Pingdom. What the heck?

This is when I first knew something was wrong with my API. In a panic, I pulled up the AWS console app on my phone, but before it even loaded I received a second downtime alert from Assertible:

[Image: Assertible’s failure alert, within 1 minute of Pingdom’s]

Bingo! From the Assertible failure alert, I immediately knew the issue was somewhere in the AWS deployment. We’ve seen 503 status codes on numerous occasions during deployments. Although extremely inconvenient, this wasn't a rare occurrence.

I monitored the AWS events as it repaired and re-deployed the failing instances on its own. After just a few minutes, the API was back up and everything was healthy. I can breathe.

When I got back to the office, I was able to corroborate what I derived from the Assertible alert by looking at the AWS event log. AWS had failed to deploy the new application. By this time everything was operating normally, so I didn’t have to take any action.

The moral of the story here is that, sometimes, a simple ping is not enough. Web services are complicated beasts, and each one has its own unique way of behaving. They should be continuously validated for the business logic they’re built to provide.

Three things were the key factors in finding the root cause of this issue in under 3 minutes: Assertible was running health checks on a schedule, it was set up to send API failure alerts, and it had HTTP assertions to validate expected status codes. Context is key, and the default Pingdom alerts do not provide that.

Don’t get me wrong: I will continue using Pingdom. But Assertible will always be running alongside, doing more in-depth checks and validation on my web services.

I’m happy that I built a tool that solves my own problems, and I hope other people will find this useful in determining what’s important in web service monitoring.

:: @CodyReichert
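To make the difference between a ping and an HTTP assertion concrete, here is a minimal Python sketch of a status-code check that puts the expected and actual status into the alert text. This is not how Assertible or Pingdom is implemented; the names `CheckResult`, `fetch_status`, `run_check`, and `format_alert` are hypothetical, and the URL in the usage note is a placeholder. Scheduling (for example, running this every few minutes from cron) and the actual delivery of the alert are assumed to happen elsewhere.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class CheckResult:
    url: str
    expected_status: int
    actual_status: Optional[int]  # None when no HTTP response was received
    error: Optional[str] = None

    @property
    def passed(self) -> bool:
        return self.error is None and self.actual_status == self.expected_status


def fetch_status(url: str, timeout: float = 10.0) -> Tuple[Optional[int], Optional[str]]:
    """Send a GET request and return (status_code, error_message)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, None
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses still carry a status code worth reporting.
        return exc.code, None
    except OSError as exc:
        # Covers URLError, connection failures, and timeouts.
        return None, str(exc)


def run_check(
    url: str,
    expected_status: int = 200,
    fetch: Callable[[str], Tuple[Optional[int], Optional[str]]] = fetch_status,
) -> CheckResult:
    """Run one HTTP assertion: does `url` answer with `expected_status`?"""
    status, error = fetch(url)
    return CheckResult(url, expected_status, status, error)


def format_alert(result: CheckResult) -> str:
    """Build an alert message that says what failed, not just that it failed."""
    if result.passed:
        return f"OK: {result.url} returned {result.actual_status}"
    if result.actual_status is None:
        return f"FAILED: {result.url} gave no HTTP response ({result.error})"
    return (
        f"FAILED: {result.url} returned {result.actual_status}, "
        f"expected {result.expected_status}"
    )
```

Calling `format_alert(run_check("https://api.example.com/health"))` against a service that answers with a 503 would produce a message naming both the 503 and the expected 200, which is the kind of context that pointed me at the deployment. The `fetch` parameter exists only so the check can be exercised without a network call.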