(Disclaimer: The Author is the Head of Developer Relations at n8n)
I wanted to see what automating a minimalist incident response playbook would look like, so I decided to try it out with three of my favorite tools: n8n, PagerDuty and Mattermost. Here’s a quick introduction to the three tools, in case you aren’t familiar with them:
To avoid panic during an incident, a lot of companies have an incident response playbook. I created a minimalist six-step playbook for this tutorial. Whenever a service goes down or something unexpected happens, the on-call team would follow this high-level protocol:
We will automate this playbook with three workflows in n8n, and this is what the result will look like once we are done.
Our first workflow will cover the first three steps of the playbook. Whenever a service goes down and creates an incident report on PagerDuty, we want the workflow to automate the following tasks for us:
Let’s get started with the nodes of the first workflow. I have also submitted Workflow 1 on n8n.io, in case you’d like to skim through this workflow. Please note that you’ll still need to configure a couple of things like your credentials, the channels on Mattermost, and the settings of the nodes. You can find information on how to set up n8n in the documentation.
First of all, we need to pull in the new incident reports from PagerDuty. To do that, start n8n with the tunnel parameter:
n8n start --tunnel
Note: Make sure that you don’t forget to add the --tunnel parameter.
Add a new node by clicking on the + button on the top right of the Editor UI. Select the Webhook node under the Triggers section.
In the Node Editor view, set the HTTP method to POST. For the Path, I have entered webhook, but feel free to use something else here according to your preferred convention. Now, you’ll need to save the workflow. I named it “Incident Response Workflow”. Once the workflow is saved, click on Webhook URLs, select Test, and then click on the URL to copy it to the clipboard.
Note: Don’t forget to save the workflow before copying the Webhook URLs.
Here’s a GIF of me following the steps mentioned above.
Now that we have our Webhook node ready on n8n, we’ll need to configure the settings on PagerDuty, so that it sends the new incident reports to the webhook.
If your team doesn’t already use PagerDuty, you can create a free trial account. If you are creating a new account, you’ll also have to create a service that PagerDuty will be monitoring.
PagerDuty integrates with a lot of services so that it can monitor them and raise an incident when something goes wrong. Once you have created your service, let’s configure the webhooks for the service.
To do that, select the Configuration menu at the top and click on Services. Click on the More button on the right side of the service that you want to configure the webhook for, and select View Integrations from the menu. Now, under the section called Extensions, click on the New Extension button and select “Generic V2 Webhook” as the Extension Type. I entered n8n as the name and pasted the URL that I copied from the Webhook node. Click on the Save button and we are done!
Here’s a GIF of me following the steps mentioned above.
Now, click on the Execute Workflow button to register the webhook. Once you’ve done that, you can create a new incident on PagerDuty. Your Webhook node will receive all the details. Keep in mind that the Test webhooks are only valid for 120 seconds. It should look something like the following image.
At times, when you send too many requests from PagerDuty, it will disable the webhook. You’ll have to re-enable it by going to the list of extensions and clicking on the Re-enable button.
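For orientation, here is a rough sketch in Python of the part of the incoming data that the rest of this tutorial reads. The shape is inferred from the Variable Selector paths and expressions used in this tutorial, the values are made up, and the real payload contains many more fields. The `dig` helper is a hypothetical name that simply walks the same path you click through in the Variable Selector.

```python
from typing import Any

# Trimmed, made-up example of the data the Webhook node outputs for a
# PagerDuty "Generic V2 Webhook" extension. Only fields used in this
# tutorial are shown.
SAMPLE_WEBHOOK_ITEM = {
    "body": {
        "messages": [
            {
                "id": "b0a1c2d3-0000-0000-0000-000000000000",
                "incident": {
                    "id": "PABC123",
                    "summary": "Checkout service is down",
                    "html_url": "https://example.pagerduty.com/incidents/PABC123",
                },
                "log_entries": [
                    {
                        "incident": {
                            "id": "PABC123",
                            "summary": "Checkout service is down",
                            "html_url": "https://example.pagerduty.com/incidents/PABC123",
                        }
                    }
                ],
            }
        ]
    }
}


def dig(data: Any, *path: Any) -> Any:
    """Follow a sequence of dict keys and list indexes, one level per step."""
    for key in path:
        data = data[key]
    return data


# Same path as: body > messages > [Item: 0] > log_entries > [Item: 0] > incident > summary
summary = dig(SAMPLE_WEBHOOK_ITEM, "body", "messages", 0, "log_entries", 0, "incident", "summary")
print(summary)
```

Comparing your own webhook output against a small sample like this makes it easier to spot a wrong index or a misspelled key in an expression.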
Now, we need to create a Mattermost node that will create an auxiliary channel so that the on-call team can coordinate on a fix for the incident.
To do that, click on the + button and click on the Mattermost node. In the Node Editor, enter your Mattermost credentials. Here’s some detailed information on how to create an access token for the credentials. I have used an access token from a bot account, but you can also use the access token from your account.
Note: Throughout the tutorial, please make sure that the nodes are connected properly before you start the configuration in the Node Editor. If you don’t do this, the variables mentioned in the tutorial might not be visible to you.
Once you are all sorted out with the credentials, select “Channel” as the Resource in the Node Editor. Now select your team as the Team ID (in case you are unable to find it, please check with your system admin). We now need to enter a Display Name for the channel. Since this is a dynamic piece of information, click on the gears icon next to the field and select Add Expression. Select the following in the Variable Selector:
Nodes > Webhook > Output Data > JSON > body > messages > [Item: 0] > log_entries > [Item: 0] > incident > summary
Quite some nesting, I know! This makes sure that the display name of the channel is the same as the incident summary on PagerDuty, which keeps things coherent. Now you need to enter a Name. This needs to be a unique value, so we’ll select the id from the incident report. Click on Add Expression and select the following in the Variable Selector:
Nodes > Webhook > Output Data > JSON > body > messages > [Item: 0] > id
Perfect, now click on Execute Node and this will create an auxiliary channel on Mattermost. Here’s a GIF of me following the steps mentioned above.
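If you are curious how the two expressions fit together, here is a minimal sketch in Python. It assumes the Mattermost REST API’s channel creation endpoint (POST /api/v4/channels) with its team_id, name, display_name and type fields. The function name and sample values are hypothetical, and this is not necessarily what the n8n node does internally.

```python
def build_create_channel_body(team_id, webhook_body):
    """Map the PagerDuty webhook body onto a Mattermost 'create channel' request body."""
    message = webhook_body["messages"][0]
    return {
        "team_id": team_id,
        # Unique value, same path as: body > messages > [Item: 0] > id
        "name": message["id"],
        # Same path as: body > messages > [Item: 0] > log_entries > [Item: 0] > incident > summary
        "display_name": message["log_entries"][0]["incident"]["summary"],
        # "O" is a public channel, "P" a private one
        "type": "O",
    }


# Made-up sample data, trimmed to the fields used above
sample_body = {
    "messages": [
        {
            "id": "b0a1c2d3-0000-0000-0000-000000000000",
            "log_entries": [{"incident": {"summary": "Checkout service is down"}}],
        }
    ]
}
print(build_create_channel_body("hypothetical-team-id", sample_body))
```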
Once the auxiliary channel has been created, we need to make sure that all the on-call team members have been added to the channel. For now, however, we’ll add a single user to the channel.
To do that, create another Mattermost node. Select the credentials that you entered earlier. Select “Channel” as the Resource and “Add User” as the Operation. Now we have to specify the Channel ID where the user should be added. Since this is another dynamic piece of information, click on Add Expression and in the Variable Selector, select the following:
Nodes > Mattermost > Output Data > JSON > id
Now we will specify a user by selecting ourselves from the dropdown list for User ID. Click on the Execute Node button and you will see that you have been added to the channel. This node ensures that the specified user is always added to the auxiliary channel created by the workflow.
Here’s a GIF of me following the steps mentioned above.
As an exercise, try using the PagerDuty API to pull a list of the email IDs of the people who are on-call and add them to the auxiliary channel in Mattermost. Feel free to pick this up once you are finished with the tutorial.
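If you want a head start on this exercise, here is a rough sketch in Python. It assumes the PagerDuty REST API’s GET /oncalls endpoint with the include[]=users query parameter and token-based authentication, and it assumes each on-call entry carries a user object with an email field. The function names are hypothetical, and nothing is actually sent here. Please double-check the details against the PagerDuty API reference.

```python
import urllib.parse
import urllib.request

PAGERDUTY_API = "https://api.pagerduty.com"


def build_oncalls_request(api_token):
    """Prepare (but do not send) a request for the current on-call entries."""
    query = urllib.parse.urlencode({"include[]": "users"})
    return urllib.request.Request(
        f"{PAGERDUTY_API}/oncalls?{query}",
        headers={
            "Authorization": f"Token token={api_token}",
            "Accept": "application/vnd.pagerduty+json;version=2",
        },
    )


def extract_oncall_emails(response_body):
    """Collect unique email addresses from the parsed JSON response, in order."""
    emails = []
    for oncall in response_body.get("oncalls", []):
        email = oncall.get("user", {}).get("email")
        if email and email not in emails:
            emails.append(email)
    return emails


# Made-up response, trimmed to the assumed shape
sample_response = {
    "oncalls": [
        {"user": {"email": "alice@example.com"}},
        {"user": {"email": "bob@example.com"}},
        {"user": {"email": "alice@example.com"}},
    ]
}
print(extract_oncall_emails(sample_response))
```

Keep in mind that the Mattermost node expects user IDs, so the email addresses would still need to be mapped to Mattermost users before they can be added to the channel.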
Since the playbook specifies that the issue should also be triaged in Jira, we’ll need to add a node that creates a ticket in Jira. To do that, create a Jira node by clicking on the + button on the top right.
In the Node Editor, enter the Credentials for Jira. Here’s detailed information on how you can create a new API Token for the credentials.
Once you are sorted out with the Credentials, select the Project where the tickets should be created. I selected a test project that I created specifically for this tutorial. For the Issue Type, I selected “Story”, but feel free to select “Bug” or something else. Summary is a dynamic piece of information, so select Add Expression and pick the summary variable, just like you did for the Display Name while configuring the Mattermost node to create a channel.
Click on Execute Node and this will create a Jira ticket for you. Here’s a GIF of me following the steps mentioned above.
The next thing that needs to be done is to post the details of the incident in the Incidents channel. We will need to share the following information in the channel:
Summary of the incident
Link to the Auxiliary channel
Link to the PagerDuty incident
Link to the Jira ticket
Sharing these pieces of information ensures that anyone outside of the on-call team who wants to know what is going on can find it in the Incidents channel.
To do this, create a new Mattermost node. In the Node Editor, select your Credentials. Now we need to enter the Channel ID. Since this is not a dynamic piece of information (the Incidents channel would always be there and hence, the ID will remain the same), we need to grab its Channel ID.
If you don’t already have a channel like this for the tutorial, you can manually create a new channel on Mattermost. To get its ID, click on the down arrow next to the channel name and click on the View Info option. This will reveal the ID of the channel. You can then copy and paste that in the Channel ID field in the node. In the message section, I entered the following expression to include the information that we mentioned in the list above.
🚨 New incident: {{$node["Webhook"].json["body"]["messages"][0]["incident"]["summary"]}}
Auxiliary Channel -> https://mattermost.internal.n8n.io/test/channels/{{$node["Mattermost"].json["name"]}}
PagerDuty Incident -> {{$node["Webhook"].json["body"]["messages"][0]["incident"]["html_url"]}}
Jira Issue -> https://n8n.atlassian.net/browse/{{$node["Jira Software"].json["key"]}}
Finally, click on the Execute Node button to send this information to your Incidents channel. Here’s a GIF of me following the steps mentioned above.
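To make it clearer what the expression resolves to, here is a small sketch in Python that assembles the same four lines from the outputs of the earlier nodes. The function name, the base URLs and the sample values are placeholders that I made up for illustration, so swap in your own Mattermost and Jira addresses.

```python
def render_incident_message(webhook_body, channel_name, jira_key,
                            mattermost_team_url, jira_base_url):
    """Build the text that is posted to the Incidents channel."""
    incident = webhook_body["messages"][0]["incident"]
    lines = [
        f"New incident: {incident['summary']}",
        f"Auxiliary Channel -> {mattermost_team_url}/channels/{channel_name}",
        f"PagerDuty Incident -> {incident['html_url']}",
        f"Jira Issue -> {jira_base_url}/browse/{jira_key}",
    ]
    return "\n".join(lines)


# Made-up sample values standing in for the Webhook, Mattermost and Jira node outputs
sample_body = {
    "messages": [
        {
            "incident": {
                "summary": "Checkout service is down",
                "html_url": "https://example.pagerduty.com/incidents/PABC123",
            }
        }
    ]
}
print(render_incident_message(
    sample_body,
    channel_name="b0a1c2d3-0000-0000-0000-000000000000",
    jira_key="OPS-1",
    mattermost_team_url="https://mattermost.example.com/test",
    jira_base_url="https://example.atlassian.net",
))
```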
As the last step of this workflow, we need to post the information from the previous node in the auxiliary channel as well. Moreover, we will need to provide two buttons in the channel: Acknowledge and Resolve.
To do this, create a new Mattermost node and connect it to the Jira node. This will ensure that this and the previous Mattermost node can run in parallel. In the Node Editor, select your Credentials. Next, you’ll need to enter the Channel ID of the auxiliary channel. You can follow the steps mentioned in Workflow 1, Step 3 to do that. In the Message section, I entered the following expression (this is quite similar to the Message from the previous node):
⚠️ {{$node["Webhook"].json["body"]["messages"][0]["log_entries"][0]["incident"]["summary"]}}
PagerDuty incident: {{$node["Webhook"].json["body"]["messages"][0]["log_entries"][0]["incident"]["html_url"]}}
Jira issue: https://n8n.atlassian.net/browse/{{$node["Jira Software"].json["key"]}}
Now, we need to create the buttons that will trigger the actions we talked about. To do that, under Attachments, click on the Add attachment button, click on Add attachment item, and select Actions. Then click on the Add Actions button and name the action Acknowledge.
Now click on the Add Integration button. This lets us provide the URL of the webhook that the button will trigger when clicked. We’ll leave this empty for now.
We’ll also need to send details to the next workflow about which PagerDuty incident to update when the button is clicked. To do that, click on the Add Context to Integration button under the Context section. We’ll enter pagerduty_incident as the Property Name. Since the Property Value is a dynamic piece of information, click on Add Expression. In the Variable Selector, select the following:
Nodes > Webhook > Output Data > JSON > body > messages > [Item: 0] > incident > id
Now, add another button called Resolve by following the same steps mentioned above. For this button, we’ll need to add the context of both the PagerDuty incident and the Jira ticket key. I’ll leave this as an exercise for you. For the sake of uniformity, you can use jira_key as the Property Name.
In case you were wondering, it is important to send the context with the buttons, as there might be multiple auxiliary channels at any given time and multiple people clicking on different Acknowledge and Resolve buttons. We need the correct context so that we don’t close the wrong PagerDuty incidents and Jira tickets by mistake.
Click on the Execute Node button to send all this information to the auxiliary channel. Here’s a GIF of me following the steps mentioned above.
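To picture what the node settings add up to, here is a rough sketch in Python of an attachment with the two buttons and their context. It assumes Mattermost’s interactive message format, where each action has a name and an integration with a url and a context. The function names and the example URLs are hypothetical, and the exact structure that the n8n node sends may differ.

```python
def build_action(name, webhook_url, context):
    """One interactive button: its label, the webhook it calls, and the data it sends along."""
    return {
        "name": name,
        "integration": {"url": webhook_url, "context": dict(context)},
    }


def build_incident_attachment(ack_url, resolve_url, pagerduty_incident, jira_key):
    """Attachment holding the Acknowledge and Resolve buttons for a single incident."""
    return {
        "actions": [
            build_action("Acknowledge", ack_url, {"pagerduty_incident": pagerduty_incident}),
            build_action(
                "Resolve",
                resolve_url,
                {"pagerduty_incident": pagerduty_incident, "jira_key": jira_key},
            ),
        ]
    }


# Made-up values; the URLs stand in for the webhooks of Workflow 2 and Workflow 3
attachment = build_incident_attachment(
    ack_url="https://n8n.example.com/webhook/acknowledge",
    resolve_url="https://n8n.example.com/webhook/resolve",
    pagerduty_incident="PABC123",
    jira_key="OPS-1",
)
print(attachment)
```

Because every button carries its own context, a click in one auxiliary channel can only ever touch the incident and ticket that belong to that channel.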
Our second workflow will cover the fourth step of the playbook. Once all the people responsible have been notified that an incident has occurred, we need to make sure that there is a quick and easy way to acknowledge the incident, so that it is clear that someone in the on-call team has got it.
Let’s get started with the nodes of the second workflow. I have also submitted Workflow 2 on n8n.io, in case you’d like to skim through this workflow. Please note that you’ll still need to configure a couple of things like your credentials as well as the settings of the nodes.
We now need to set up a Webhook node that listens to the event when somebody clicks on the Acknowledge button in the auxiliary channel.
Create a Webhook node the same way you did in Workflow 1, Step 1. Now copy the link of the Test webhook from this Webhook node, go to the node from Workflow 1, Step 6 and paste it in the URL field in the Integration section of the Acknowledge button under Actions.
Once you are done with that, click on the Execute Node button to register the webhook and test it by clicking on the Acknowledge button in the auxiliary channel. Here’s a GIF of me following the steps mentioned above.
Now we need to get the ID of the incident from the webhook node to know which incident to mark as acknowledged. We get this information from the context that we added to the Integration of the button.
Add a PagerDuty node by clicking on the + button on the right side. In the Node Editor view, first of all, you’ll have to enter the Credentials for PagerDuty. Here’s detailed information on how you can create a new API Token for the credentials. Once you are done with that, select “Update” as the Operation. Since the Incident ID is a dynamic piece of information, click on Add Expression and select the following in the Variable Selector:
Nodes > Webhook > Output Data > JSON > body > context > pagerduty_incident
In the Email field, I have just entered my email. In the Update Fields section, click on the Add Field button and select Status. From the dropdown list in the Status field, select “Acknowledged”. Now, click on the Execute Workflow button. Go to the auxiliary channel and click on the Acknowledge button. This will change the status of your incident report from “Triggered” to “Acknowledged”. Here’s a GIF of me following the steps mentioned above.
Now we just need to confirm the change of status of the PagerDuty incident by sending a message to the auxiliary channel. I’ll leave this as an exercise for you. In case you run into any trouble, here’s a GIF of me creating this node.
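For reference, here is a rough sketch in Python of what this step boils down to: read the incident ID from the button’s context and prepare an update request for PagerDuty. It assumes the PagerDuty REST API’s PUT /incidents/{id} endpoint, token-based authentication and a From header carrying the email address. The function name and sample values are hypothetical, the request is only built and never sent, and the n8n node may do things differently under the hood.

```python
import json
import urllib.request


def build_acknowledge_request(click_body, api_token, from_email):
    """Prepare (but do not send) a request that marks the clicked incident as acknowledged."""
    # Same path as: body > context > pagerduty_incident
    incident_id = click_body["context"]["pagerduty_incident"]
    payload = {"incident": {"type": "incident_reference", "status": "acknowledged"}}
    return urllib.request.Request(
        f"https://api.pagerduty.com/incidents/{incident_id}",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Token token={api_token}",
            "Accept": "application/vnd.pagerduty+json;version=2",
            "Content-Type": "application/json",
            "From": from_email,
        },
    )


# Made-up body of the request that the Acknowledge button sends to the webhook
sample_click = {"context": {"pagerduty_incident": "PABC123"}}
request = build_acknowledge_request(sample_click, "secret-token", "me@example.com")
print(request.get_method(), request.full_url)
```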
Our third workflow will cover the sixth step of the playbook. Once the issue has been fixed, we need to make sure that the incident on PagerDuty has been marked as “Resolved” and the ticket on Jira has been marked as “Done”. We also need to ensure that everyone in the Incidents channel and the auxiliary channel is aware of the resolution.
Let’s get started with the nodes of the third workflow. The nodes of this workflow have been left as an exercise for you. I have added GIFs for the nodes and have also submitted Workflow 3 on n8n.io, in case you run into any trouble. Please note that you’ll still need to configure a couple of things like your credentials as well as the settings of the nodes.
Just like in the last workflow, we need a Webhook node that listens to the event when somebody clicks on the Resolve button in the auxiliary channel. Here’s a GIF of me creating this node.
Now we need to change the status of the PagerDuty incident from “Acknowledged” to “Resolved”. This is very similar to Workflow 2, Step 2. Here’s a GIF of me creating this node.
Now we need to update the status of the Jira ticket to “Done”. Here’s a GIF of me creating this node.
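Lastly, we need to create two Mattermost nodes: one that announces the resolution in the auxiliary channel and one that announces it in the Incidents channel.
Here’s a GIF of me creating these nodes.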
Congratulations, you successfully built an automated incident response workflow using n8n, PagerDuty and Mattermost!
Let’s run the whole system end to end. First of all, you’ll have to click on the Execute Workflow button on all three workflows to register the Webhook nodes. Go ahead and get started by creating a new incident on PagerDuty.
Now, to make sure that the workflows run permanently without you having to press the Execute Workflow button on all three workflows before each incident is created, we’ll need to use the Production webhook.
To do that, you’ll need to get the Production webhook URL from each of the Webhook nodes, update the URLs on PagerDuty and in the Mattermost node from Workflow 1, Step 6, save the workflows, and finally activate them. This will make your workflows ready to use.
Note: When working with a Production webhook, please ensure that you have saved and activated the workflow. Don’t forget that the data flowing through the webhook won’t be visible in the Editor UI with the Production webhook.
Today we created an automated incident response workflow using a variety of n8n nodes. The first-class support for webhooks and APIs allows n8n to integrate a very wide array of services and products and to create powerful workflows in a simple way. This was an example of automating a minimalist incident response playbook. Which other services are you using for managing incidents in your organization? In case you have created other workflows with n8n that use different nodes, I’d love to check them out. Please consider sharing those workflows with the community.
In case you’ve run into an issue while following the tutorial, feel free to reach out to me on Twitter or ask for help on our forum.