
How to Create an Actionable SEO Dashboard with SEMRush and Python

by Marc van Herrikhuijzen, October 14th, 2020

Too Long; Didn't Read

Python can be a good way to analyze a lot of data, run custom calculations on it and plot it just the way you want. In this article I describe a step-by-step process for creating your own dashboard using Python's Pandas and (in my case) SEMrush exports. The dashboard is set up in 4 main steps: collect and clean the exports from SEMrush (or any other ranking export file) into a useful work file, add a visibility calculation and plot it per category, plot rankings over time per category and keyword, and create a pivot dashboard for specific insights.


For most SEO specialists it is mandatory to keep track of search visibility and keyword rankings in search engines. There are a lot of tools that provide this, such as AWR Cloud, STAT, Ahrefs, SEMrush and many more. This data can be overwhelming when you track a lot of keywords and want to compare current rankings with past rankings for both you and your main competitors.

Also, this amount of information (together with server speed) can take the fun out of analyzing when the information you want does not load instantly.

Python can be a good way to analyze a lot of data, run some custom calculations on it and plot it just the way you want!

In this article I will describe a step-by-step process for creating your own dashboard using Python's Pandas and (in my case) SEMrush exports. As said, there are a lot of keyword tracking tools, so any other export works as well, but it will need some customization tweaks.

General idea and workings of the dashboard

The dashboard is set up in 4 main steps:

  1. Collect and clean the exports from SEMrush (or any other ranking export file) and create a useful work file to continue with
  2. Add a visibility calculation to the data and plot this over categories
  3. Plot rankings over time per category and keyword
  4. Create a pivot dashboard for specific insights

The result will be a good understanding of ranking and visibility fluctuations, plus the ability to find issues per category and even per keyword group. This can give you a very specific idea of where to optimize.

Let’s get started

For my company we track 571 keywords in SEMrush across 88 categories (tags). For this article I have collected weekly SEMrush exports of mobile rankings. I also track the rankings of the 4 main industry competitors.

1: Collect and clean the exports from SEMrush (or any other ranking export file) and create a useful work file to continue with

Now that we have some background information, open a Jupyter Notebook (recommended because of the visualizations) and import the main libraries we will use.

#import all libraries and widgets
import win32com.client as win32  #Windows-only; not used in the snippets below
import os
import pandas as pd
import numpy as np
import re
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
from ipywidgets import interactive, interact, interact_manual
import ipywidgets as widgets
from IPython.core.display import display, HTML
from pivottablejs import pivot_ui

#and let’s ignore warnings
import warnings
warnings.simplefilter('ignore')

Now it is time to ask Pandas for help and make some fine tweaks to the data. Notice the ‘header=7’ in the script below. It looks weird, and I think it is weird. But that is how SEMrush creates its exports. Perfectly fine. But weird 😊.

#read .xlsx into dataframe
df_20200930 = pd.read_excel("./rankings_converted/20200930.xlsx", header=7)
#replace empty landing pages (assign the result; .replace() does not chain with inplace)
df_20200930 = df_20200930.replace(r'^\s*$', np.nan, regex=True)
#add date column, the date should be changed manually per export
df_20200930['Datum'] = '2020-09-30'
#to avoid a warning
pd.options.mode.chained_assignment = None
#date to datetime64 dtype
df_20200930['Datum'] = pd.to_datetime(df_20200930['Datum'], format='%Y-%m-%d')
#create more readable names for the columns
df_20200930.rename(columns={
    'tags': 'Tags',
    '*.expert.nl/*_20200930': 'Rankings_Expert',
    '*.expert.nl/*_20200930_landing': 'Expert_url',
    '*.bcc.nl/*_20200930': 'Rankings_BCC',
    '*.bcc.nl/*_20200930_landing': 'BCC_url',
    '*.coolblue.nl/*_20200930': 'Rankings_Coolblue',
    '*.coolblue.nl/*_20200930_landing': 'Coolblue_url',
    '*.mediamarkt.nl/*_20200930': 'Rankings_Mediamarkt',
    '*.mediamarkt.nl/*_20200930_landing': 'Mediamarkt_url',
    '*.wehkamp.nl/*_20200930': 'Rankings_Wehkamp',
    '*.wehkamp.nl/*_20200930_landing': 'Wehkamp_url',
    'Search Volume': 'Search_Volume'
}, inplace=True)
#define which columns will be used
use_cols_20200930 = df_20200930[['Datum','Keyword','Tags','Rankings_Expert','Expert_url','Rankings_BCC','BCC_url','Rankings_Coolblue','Coolblue_url','Rankings_Mediamarkt','Mediamarkt_url','Rankings_Wehkamp','Wehkamp_url','CPC','Search_Volume']]
#check if it works
df_20200930.head()

After that we can save this to a .xlsx file with only the customized columns. For these files I have created a new folder: [./use_cols].
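Note that to_excel will fail if the target folder does not exist yet, so create it first (a one-time step; os is already imported above):

#create the output folder if it does not exist yet
os.makedirs('./use_cols', exist_ok=True)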

#save to new .xlsx file (index=False keeps the DataFrame index out of the file)
use_cols_20200930.to_excel('./use_cols/20200930.xlsx', index=False)

We need to do this for all periods. So, just find and replace the dates while editing and saving. If anyone has a clue on how to automate this process, please let me know!
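One possible approach, sketched under the assumption that every raw export lives in ./rankings_converted/ and follows the YYYYMMDD.xlsx naming used above, is to derive the date from each filename and build the rename mapping dynamically:

#hypothetical automation sketch: process every weekly export in one loop
sites = {'expert': 'Expert', 'bcc': 'BCC', 'coolblue': 'Coolblue',
         'mediamarkt': 'Mediamarkt', 'wehkamp': 'Wehkamp'}
for file in os.listdir('./rankings_converted/'):
    if not file.endswith('.xlsx'):
        continue
    date = file.replace('.xlsx', '')  #filenames are assumed to be YYYYMMDD.xlsx
    df = pd.read_excel('./rankings_converted/' + file, header=7)
    df = df.replace(r'^\s*$', np.nan, regex=True)
    df['Datum'] = pd.to_datetime(date, format='%Y%m%d')
    #build the date-dependent column renames dynamically
    renames = {'tags': 'Tags', 'Search Volume': 'Search_Volume'}
    for domain, name in sites.items():
        renames['*.{}.nl/*_{}'.format(domain, date)] = 'Rankings_{}'.format(name)
        renames['*.{}.nl/*_{}_landing'.format(domain, date)] = '{}_url'.format(name)
    df = df.rename(columns=renames)
    use_cols = ['Datum', 'Keyword', 'Tags'] + \
               [c for c in df.columns if c.startswith('Rankings_') or c.endswith('_url')] + \
               ['CPC', 'Search_Volume']
    df[use_cols].to_excel('./use_cols/' + file, index=False)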

Because we need to work with all these configured Excel files, it is time to gather all the DataFrames and combine all files from the folder.

#gather all weekly files (one DataFrame per export date,
#equivalent to the long list of one read_excel call per week)
weekly_data = {
    file.replace('.xlsx', ''): pd.read_excel('./use_cols/' + file)
    for file in os.listdir('./use_cols/')
    if file.endswith('.xlsx')
}

The sad part for me is that we don't use the SEMrush API, so this needs to be updated manually every week. Boehhh!

#combine all files from the directory into one work file
files = os.listdir('./use_cols/')
workfile = pd.concat(
    [pd.read_excel('./use_cols/' + file) for file in files],
    ignore_index=True
)
#check results
workfile.info()

And let's not forget to categorize the keywords! In a subfolder ([./workfile]) I created a .txt file with the tags of the keywords. An example of the tags.txt can be found here.
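For reference, a hypothetical tags.txt could look like the snippet below: semicolon-separated, with a Tags column to merge on and a Tag2 column holding the category name (which is renamed to Categorie later on). The values here are made up; the linked example shows the real structure.

Tags;Tag2
wasmachine;Witgoed
koelkast;Witgoed
smart tv;Beeld & Geluid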

#categorize all keywords from the txt file
tags = pd.read_csv('./workfile/tags.txt', sep=';')
#merge the category tags onto the work file, check and export to a new .csv (which will be the work file)
output = workfile.merge(tags, on='Tags')
output.head()
output.to_csv('./workfile/output.csv', index=False)

2: Add a visibility calculation to the data and plot this over categories

Now that we have our work file, it is time for some fun stuff: actual plots and useful insights.

#open work file and get rid of the leftover index column, if any
df = pd.read_csv('./workfile/output.csv')
df = df.drop(columns='Unnamed: 0', errors='ignore')
#replace empty landing pages (assign the result; .replace() does not chain with inplace)
df = df.replace(r'^\s*$', np.nan, regex=True)

For a visibility report, some calculations need to be done. Ranking at position one in organic search earns you a well-deserved 100% visibility score; the score decreases for lower rankings, down to position 30 where the visibility percentage stops. An example can be downloaded here.
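If you want to generate such a lookup file yourself instead of downloading the example, a minimal sketch could look like this. The linear decay from 100% at position 1 to roughly 3% at position 30 is an assumption; the weights in the downloadable example may differ.

#hypothetical generator for one visibility lookup table, assuming a linear decay
os.makedirs('./visibility_competitors', exist_ok=True)
positions = np.arange(1, 31)
vis_expert = pd.DataFrame({
    'Rankings_Expert': positions,
    'Visibility_Expert': np.round(100 * (31 - positions) / 30, 1)  #100% at position 1, ~3% at 30
})
vis_expert.to_excel('./visibility_competitors/vis_expert.xlsx', index=False)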

#create a visibility dataframe for each website and merge them in one loop
sites = ['Expert', 'BCC', 'Coolblue', 'Mediamarkt', 'Wehkamp']
df_vis_calc = df.copy()
for site in sites:
    df_vis = pd.read_excel('./visibility_competitors/vis_{}.xlsx'.format(site.lower()))
    df_vis_calc = df_vis_calc.merge(df_vis, on='Rankings_{}'.format(site), how='outer')
    df_vis_calc['Visibility_{}'.format(site)] = df_vis_calc['Visibility_{}'.format(site)].astype(float)

#rename and do some editing
df_workfile = df_vis_calc.dropna(subset=['Keyword'])
df_workfile.rename(columns={'Tag2':'Categorie'}, inplace=True)
#reset index (assign the result, otherwise nothing happens)
df_workfile = df_workfile.reset_index(drop=True)
#check if it works
df_workfile.head()
#create a visibility dataframe for all websites and export to .csv
visibility = df_workfile[['Datum','Keyword','Tags','Categorie','Visibility_Expert','Visibility_Coolblue','Visibility_Mediamarkt','Visibility_Wehkamp','Visibility_BCC']]
visibility.fillna(0, inplace=True)
visibility.to_csv('./workfile/visibility.csv', index=False)

Notice that we have also created a DataFrame with visibility scores for all websites and categories. In step 3 we are going to plot rankings per category as well, so why not create a DataFrame for that purpose right now with a very similar line of code.

#we are going to use the following export later on
ranking = df_workfile[['Datum','Keyword','Search_Volume','Tags','Categorie','Rankings_Expert','Rankings_Coolblue','Rankings_Mediamarkt','Rankings_Wehkamp','Rankings_BCC']]
ranking.fillna(0, inplace=True)
ranking.to_csv('./workfile/ranking.csv', index=False)

A final tweak and we are ready for some plotting! Just to keep things organized, a new DataFrame is created for plotting visibility.

#new dataframe
dataset = pd.read_csv('./workfile/visibility.csv')
#with some edits
dataset.fillna(0, inplace=True)
dataset['Datum'] = pd.to_datetime(dataset['Datum'])
#check results
dataset.info()

Plot it like it's hot!

pivot = dataset.groupby(['Datum']).mean()
plt.style.use('Solarize_Light2')
pivot.plot(kind='line', title='Visibility Over Time')
plt.xlabel('Period', fontsize=14)
plt.ylabel('Visibility in Google Organic Search', fontsize=14)
plt.show()

This is already a nice-looking chart, but it shows the visibility score across all categories. Let's create an interactive widget to filter the data per category!

tags = widgets.Dropdown(
    options=['All'] + list(dataset['Categorie'].unique()),
    value='All',
    description='Category:',
)

def plotit(tags):
    dataset2 = dataset.copy()
    if tags != 'All':
        dataset2 = dataset2[dataset2.Categorie == tags]

    if len(dataset2) > 0:
        plt.style.use('Solarize_Light2')
        dataset2.groupby(['Datum']).mean().plot(kind='line', figsize=(14,4))
        plt.xlabel('Period', fontsize=14)
        plt.ylabel('Visibility in Google Organic Search', fontsize=14)
        plt.title('Search Visibility in Google for {}'.format(tags), fontsize=20)
        plt.show()
    else:
        print('Nothing to show')
#personally, I like showing this full-width with some styling
display(HTML("<style>.container { width:100% !important; }</style>"))
#and call the interactive widget
interactive(plotit, tags=tags)

Let me add some screenshots of the output.

3: Plot rankings over time per category and keyword

The Search Visibility per category can pinpoint categories where action may be required (or just show off how good things are going 😊). But, especially in the case of ‘action required’, you need some more information. Which keywords are losing their rankings? Over what time period?

In the previous step we already created a dataset for the rankings. It is time to get this data into a DataFrame and make some final tweaks.

rnk = pd.read_csv('./workfile/ranking.csv')
rnk['Datum'] = pd.to_datetime(rnk['Datum'])
#set non-rankings (0) back to NaN so they don't distort the averages
rnk[rnk.eq(0)] = np.nan
rnk.info()

Also, with keyword rankings, a similar widget can be very useful.

tags = widgets.Dropdown(
    options=['All'] + list(rnk['Categorie'].unique()),
    value='All',
    description='Category:',
)

def plotit(tags):
    dataset2 = rnk.copy()
    if tags != 'All':
        dataset2 = dataset2[dataset2.Categorie == tags]

    if len(dataset2) > 0:
        dataset2.drop('Search_Volume', axis=1, inplace=True)
        plt.style.use('Solarize_Light2')
        dataset2.groupby(['Datum']).mean().plot(kind='line', figsize=(14,4))
        plt.xlabel('Period', fontsize=14)
        plt.ylabel('Rankings in Google', fontsize=14)
        plt.gca().invert_yaxis()
        plt.title('Rankings in Google for category {}'.format(tags), fontsize=20)
        plt.show()
    else:
        print('Nothing to show')

interactive(plotit, tags=tags)

Again, here are some examples of the output.

If you notice some improvements or drops in rankings per category, a deep dive per keyword can be very interesting!

kws = widgets.Dropdown(
    options=['All'] + list(rnk['Keyword'].unique()),
    value='All',
    description='Keyword:',
)

def plotit2(kws):
    dataset2 = rnk.copy()
    if kws != 'All':
        dataset2 = dataset2[dataset2.Keyword == kws]

    if len(dataset2) > 0:
        dataset2.drop('Search_Volume', axis=1, inplace=True)
        plt.style.use('Solarize_Light2')
        dataset2.groupby(['Datum']).mean().plot(kind='line', figsize=(14,4))
        plt.xlabel('Period', fontsize=14)
        plt.ylabel('Rankings in Google', fontsize=14)
        plt.gca().invert_yaxis()
        plt.title('Rankings in Google for keyword [{}]'.format(kws), fontsize=20)
        plt.show()
    else:
        print('Nothing to show')

interactive(plotit2, kws=kws)

Two screenshots of rankings per keyword over time.

4: Create a pivot dashboard for specific insights

The final step is to use the pivot method to create nice overviews per keyword group per website. The fun part of this is that it gives very specific information about keyword groups. For example: in which period did my keywords lose rankings? Or: in what time period does an improvement show after optimization X or Y? This can be a useful reporting method when a feature has been released or a link-building campaign has been rolled out. Or… whatever you want to keep track of.

pivot_df = pd.read_csv('./workfile/ranking.csv')
#cast the search volume to int first; doing it after the NaN replacement below would fail
pivot_df["Search_Volume"] = pivot_df["Search_Volume"].fillna(0).astype(int)
#set non-rankings (0) in the ranking columns back to NaN
rank_cols = [c for c in pivot_df.columns if c.startswith('Rankings_')]
pivot_df[rank_cols] = pivot_df[rank_cols].replace(0, np.nan)
pivot_df['Datum'] = pd.to_datetime(pivot_df['Datum'])

Because, in this case, I am mostly interested in the performance of BCC's rankings, I only need to call the rankings for this website.

piv = pivot_df.pivot_table(index=['Keyword','Categorie','Search_Volume'], columns=['Datum'], values=['Rankings_BCC'])
rankings = pivot_df[['Datum', 'Keyword', 'Categorie', 'Search_Volume', 'Tags', 'Rankings_BCC']]
rankings = rankings.sort_values(by='Datum')  #assign the result, sort_values is not in-place

Finally, we can create an interactive pivot table inline. There is also an option to pop the pivot table out into a new window with a very nice overview of the information you have collected. The URL of that window will be something like >> http://localhost:{YOUR_PORT}/files/{YOUR_LOCATION}/reporting/pivottablejs.html.
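The inline pivot table itself is a single call to pivot_ui (imported at the top of the notebook):

#render the interactive pivot table inline in the notebook
pivot_ui(rankings)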

Playing a bit with the filters, it becomes clear that some fluctuations are going on in the washing machine category for BCC, which can lead to actionable things to work on or focus on!

To wrap up

If you are using any kind of rank tracker to monitor keyword rankings in Google, it can be very useful to get instant, actionable information and keep track of the effect of changes that have been made. Python in combination with the Pandas and Matplotlib libraries can help with this. A lot of rank trackers can already show this kind of information, but I think it is more fun to create it with Python. The most important reasons for me:

  1. Understand the data / rankings better
  2. Get used to working with Python and visualization
  3. Customize visualizations
  4. Instant speed
  5. Make a shareable dashboard for stakeholders

The last point is an interesting one, because not everyone in your company has access to third-party SEO tools (in my case SEMrush). When putting this script and pivot table on the public (or local) web, everyone with the dashboard URL has instant access. And because SEO can be very difficult for some people to understand, sharing the performance and effect of optimizations helps create support among different stakeholders and the board.
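One way to set that up, as a sketch (the ./reporting/ path is just an example matching the URL mentioned above), is to let pivot_ui write its self-contained HTML file to a folder your web server exposes:

#write the pivot table to a self-contained, shareable HTML file
os.makedirs('./reporting', exist_ok=True)
pivot_ui(rankings, outfile_path='./reporting/pivottablejs.html')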

The full script can be downloaded from my Github >> https://github.com/marcvh80/seo-dashboard.

Also published at https://medium.com/how-to-create-your-own-actionable-seo-dashboard/how-to-create-your-own-actionable-seo-dashboard-with-python-86d5767dbd90