This article will touch on how data science can be used in SEO and look at how correlative analysis should be used during the content creation process. For those not familiar with these topics, there will be examples and pictures, but, as with any complicated topic, the scope of the article is limited to its main purpose.
It always bothers me when I see the recent trend to present SEO using the term 'scientific search engine optimisation'. I understand that it's a great way to allay the fears of CEOs and business owners, but it does rankle.
All digital marketing efforts should be based on some hard data, and SEOs have been doing this since the year dot. From pushing the limits of keyword densities back in the day to enumerating backlink profiles to judge competitiveness, these were always data-driven approaches.
There should be no place for the ‘publish and pray’ method that most within the industry will have encountered at some stage. Rant over.
As search engine optimisation specialists, we have a huge amount of data at our fingertips, whether it's pulled from Google dorks, from proprietary data sources such as Ahrefs and Majestic, or from a privately calculated metric of our own. I'm thinking back to 'thirst' as a way of judging whether a keyword was worth pursuing, which relied on Moz's link data, keyword volume and Keyword Planner's CPC.
The point being that a data-driven approach to SEO, and digital marketing in general, will improve your decision-making and your allocation of resources.
These are only some of the benefits, and the list could be much longer. The point is that if you are not taking a data-driven approach to your decisions and allocation of resources, you are going to lose to the little guy with a smaller budget and more R&R time.
The last two years have seen a huge compression in the SERPs, where many of the articles for highly competitive keywords all look pretty similar and cover the topic in the same way.
Why is this?
Cue the rise in correlative analysis tools.
We've always matched our content to the search intent Google shows in the SERPs and then tailored content creation around what was already ranking. But correlative analysis tools for SEO have taken this arduous process and simplified it.
Correlative analysis tools allow us to scrape the top 100 results for a search term and see which factors correlate to higher rankings. The simplest example of this is 'word count'.
If Google is rewarding articles with a word count of between 1000 and 1200 words, then this is where we need to aim with our article.
Likewise, if Google is rewarding sites that load in under a second, or that have a keyword density of 3.2% or …
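If you want to see how the underlying arithmetic works, a minimal sketch in Python might look like the one below. It assumes you've already scraped the results and recorded each page's position and word count (the figures here are made up purely for illustration):

```python
# Minimal sketch: does word count correlate with position for one SERP?
# Assumes you've already scraped the top results and recorded, for each URL,
# its position and word count - the scraping step itself isn't shown here.
from scipy.stats import spearmanr

results = [
    {"position": 1, "word_count": 1150},
    {"position": 2, "word_count": 1210},
    {"position": 3, "word_count": 1080},
    {"position": 4, "word_count": 990},
    {"position": 5, "word_count": 1020},
    {"position": 6, "word_count": 870},
    {"position": 7, "word_count": 930},
    {"position": 8, "word_count": 760},
    {"position": 9, "word_count": 810},
    {"position": 10, "word_count": 640},
]

positions = [r["position"] for r in results]
word_counts = [r["word_count"] for r in results]

# Spearman suits ordinal data like rank and doesn't assume a linear relationship.
# A negative rho here means longer articles tend to sit in better positions.
rho, p_value = spearmanr(word_counts, positions)
print(f"Spearman rho: {rho:.2f} (p = {p_value:.3f})")
```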
To understand how deep into this you can go, take a look at a small part of a CoraSEO report below. This is from some months ago, and the developer adds new metrics all the time.
The key is not to have everything perfectly lined up, but to hit the important notes. Like everything it’s a case of time/cost vs reward.
Whilst CoraSEO is great for folks who want to get at the data directly, you'll need a fair amount of experience to interpret it correctly.
There are a number of other tools on the market that hide a lot of the data and only present the user with limited highlighted insights. I’m only going to cover the ones that I’ve used to give a flavour of what's out there.
You wouldn’t use a hammer to cut down a tree, nor a saw to mend a fence. Some of these tools are suited to certain workflows and SEO goals such as quickly optimising existing content or giving your content writers the best briefs possible, whilst others will help you build the perfect page structure for your PWA’s landing pages.
Set up by brothers from a little Polish town, SurferSEO has a very intuitive interface that allows you to focus on the main aspects of page optimisation. There are a large number of options still available under the hood, but Surfer does an excellent job of keeping things simple. If you need your writers to access the tool, they won’t be lost and they've recently added a content writing module.
The page audit tool is great at highlighting where you can improve and bring your page into line with the competition, covering the basics like the keyword in the title, URL and meta description; exact and partial keywords in your H tags, content and codebase; and more technical aspects such as Time To First Byte (TTFB) and schema.
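As a rough illustration, the kinds of checks such an audit performs could be sketched as follows. The URL and keyword are invented, requests and beautifulsoup4 are assumed to be installed, and the elapsed time is only a crude stand-in for TTFB:

```python
# Simplified sketch of basic page audit checks: keyword placement, a rough TTFB
# proxy and the presence of schema markup. Not a substitute for a real audit tool.
import requests
from bs4 import BeautifulSoup

def audit_page(url: str, keyword: str) -> dict:
    resp = requests.get(url, timeout=10)
    soup = BeautifulSoup(resp.text, "html.parser")

    title = soup.title.get_text(strip=True) if soup.title else ""
    meta = soup.find("meta", attrs={"name": "description"})
    meta_desc = meta.get("content", "") if meta else ""
    h1 = soup.h1.get_text(" ", strip=True) if soup.h1 else ""

    kw = keyword.lower()
    return {
        "keyword_in_title": kw in title.lower(),
        "keyword_in_url": kw.replace(" ", "-") in url.lower(),
        "keyword_in_meta_description": kw in meta_desc.lower(),
        "keyword_in_h1": kw in h1.lower(),
        "has_schema": bool(soup.find("script", type="application/ld+json")),
        # requests' elapsed covers the whole response, so treat it as a rough proxy.
        "approx_ttfb_seconds": resp.elapsed.total_seconds(),
    }

# Hypothetical page and keyword, purely for illustration.
print(audit_page("https://example.com/basketball-court-hire-london/", "basketball court hire"))
```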
It’s perhaps important to note that SurferSEO uses something called True Density to calculate the optimal occurrences of words and phrases on a page rather than the more common TF-IDF method. They’ve done a great job of explaining the difference on their blog.
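True Density is Surfer's own method, so I won't try to reproduce it here, but the TF-IDF baseline it is being contrasted with is simple enough to sketch with scikit-learn. The page text below is invented purely for illustration:

```python
# Sketch of the common TF-IDF approach: weight terms across the competing pages
# and surface the ones that carry the most weight. Text is invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer

competitor_pages = [
    "rent a basketball court in london book indoor and outdoor courts by the hour",
    "basketball court hire london compare venues prices and availability near you",
    "find and book basketball courts court hire made simple for teams and leagues",
]

vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
matrix = vectorizer.fit_transform(competitor_pages)

# Average TF-IDF weight of each term across the competition,
# then list the heaviest terms - candidates for your own brief.
mean_weights = matrix.mean(axis=0).A1
terms = vectorizer.get_feature_names_out()
for term, weight in sorted(zip(terms, mean_weights), key=lambda t: -t[1])[:10]:
    print(f"{term}: {weight:.3f}")
```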
If the other two tools on this list have opted for a streamlined approach that takes care and hides the data behind carefully designed interfaces, Cora takes the opposite route.
It's a self-hosted Windows application that can take 30 minutes to run, covering almost every aspect of a page that you could want to analyse. At last count it compared over 400 factors and dumped everything out into an Excel spreadsheet so that you could examine the data yourself.
To be clear, these are not the types of spreadsheets that you can send off to a writer; the data has to be interpreted, but it can uncover important insights. It calculates correlation using two methods and then ranks the signals for you, giving you actionable information.
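Cora's exact methodology isn't public, but one plausible way to rank signals like this is to correlate each factor with SERP position using two methods, say Pearson and Spearman, and sort by their combined strength. The figures below are invented for illustration:

```python
# Hypothetical sketch of ranking signals by correlation with position using two
# methods, in the spirit of a Cora-style report. All numbers are invented.
import pandas as pd
from scipy.stats import pearsonr, spearmanr

serp = pd.DataFrame({
    "position":   [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "word_count": [1180, 1150, 1090, 1230, 980, 940, 870, 760, 720, 650],
    "ttfb_ms":    [220, 310, 280, 450, 390, 510, 480, 620, 700, 660],
    "h2_count":   [6, 5, 7, 4, 5, 3, 4, 2, 3, 2],
})

rows = []
for signal in serp.columns.drop("position"):
    pearson, _ = pearsonr(serp[signal], serp["position"])
    spearman, _ = spearmanr(serp[signal], serp["position"])
    rows.append({
        "signal": signal,
        "pearson": round(pearson, 2),
        "spearman": round(spearman, 2),
        # Rank signals by the average absolute correlation across both methods.
        "strength": round((abs(pearson) + abs(spearman)) / 2, 2),
    })

print(pd.DataFrame(rows).sort_values("strength", ascending=False))
```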
Kyle Roof’s leaked test showing how he ranked a local site by only using 'lorem ipsum' content and keywords sprinkled precisely into the right places confirmed what many of us believed for years.
Google cannot parse your content and assign it some sort of quality score. The amount of processing power required to do this just doesn't make it feasible on any large scale. That's not to say that they don't have some algo that they run on highly competitive key phrases, but when you consider the breadth of content published all the time, even Google isn't capable of something like this.
Kyle Roof capitalised on his fame and brought POP to market. I got in at the beta and saw some great results using it, but I don't want to comment on it too much, as I moved to SurferSEO. Kyle's team have continued to update its features, including various options for the aggressiveness of optimisation, a content editor and much more.
There are other tools out there. SEO PowerSuite has a module for TF-IDF content optimisation and there are a couple of others, but these don't offer the same breadth of data that the three above do.
Depending on your site(s), your workflow and your goal, one of these tools may be more suited than the others. Below I've pulled out three use cases, but your mileage may vary.
The optimisation of an existing page is probably the most common task I've used correlation analysis tools for. The main reason for this is simple: you can get much quicker results working with an existing piece of content than by publishing a new article.
If you just need to check keyword densities and TF-IDF, then text-tools.net is a great and affordable tool for the task. You add your keyword and target URL, and it breaks down how you stack up against each competitor and highlights any common phrases missing from your content.
Perhaps the one caveat with this simple tool is that your article does have to be published and available to be crawled for it to do its thing, so it can’t be used earlier in the content creation process.
NB - Whilst the prevailing wisdom is to keep your keyword density below 2.5%, like most SEO ideas, this is rehashed wisdom from 2014 that doesn't stand up to real-world investigation. You'll quickly find that it depends on your niche and that many of your successful competitors are a lot more aggressive.
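Keyword density itself is trivial to measure, so rather than leaning on a fixed threshold, it's worth running the numbers on your own copy and a few competitors. A quick sketch (conventions vary, and the example text is invented):

```python
# Quick keyword density check: occurrences of the exact phrase over total words.
# Conventions vary - some tools also multiply by the number of words in the phrase.
import re

def keyword_density(text: str, phrase: str) -> float:
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = phrase.lower().split()
    if not words:
        return 0.0
    hits = sum(
        1 for i in range(len(words) - len(target) + 1)
        if words[i:i + len(target)] == target
    )
    return 100 * hits / len(words)

# Invented snippet, purely for illustration.
copy = ("Basketball court hire in London made simple. Compare basketball court "
        "hire prices, availability and indoor or outdoor courts across the city.")
print(f"{keyword_density(copy, 'basketball court hire'):.2f}%")
```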
Page Optimizer Pro and SurferSEO are equally apt for revising and updating content. They give you a much better and deeper analysis of things to improve on your page, but at an additional cost.
Producing high quality content briefs for your writers is essential. If you leave the entire process up to the writer, you run the risk of getting back a poorly researched article that might not match the target search intent. It should all be a system, quality inputs produce quality outputs.
Giving your writers a topical structure, including key phrases you want mentioned, will always lead to better articles, particularly if you have a high churn of writers who need to become familiar with your site's audience, tone and your structural expectations of the content.
Both Page Optimizer Pro and SurferSEO have features that can help with the process without you getting too bogged down in the weeds.
Both Surfer and POP have a content creation tool baked in to assist your writers in ticking the needed boxes as they write. Alternatively, you can review the lexical recommendations and include what you want in your briefs.
If you are building a progressive web application with hundreds or thousands of landing pages for different target phrases or geographic locations, you need to have the optimal structure across all of these endpoints.
To give you a better understanding, let me give you the example of playfinder.com, who are targeting the following types of keywords across the UK and Ireland:
Rugby Pitches & Clubs in {CITY/TOWN} (and the derivatives which would include rent, pitch etc)
Basketball Court Hire in {CITY/TOWN} (and the derivatives)
{CITY/TOWN} Sports Facility Hire
You get the idea: each of their circa 9,000 pages is hyper-targeted to a set of keywords and a geo location.
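Under the hood, pages like these are generated from a template plus a location list rather than written one by one. The sketch below shows the general idea; the templates and field names are my own guesses, not Playfinder's actual setup:

```python
# Simplified sketch of stamping out hyper-targeted landing page metadata from a
# template and a location list. Templates and field names are illustrative only.
from itertools import product

categories = ["Basketball Court Hire", "Rugby Pitch Hire", "5-a-side Football Pitches"]
locations = ["London", "Manchester", "Dublin"]

def build_page(category: str, town: str) -> dict:
    slug_cat = category.lower().replace(" ", "-")
    return {
        "url": f"/{slug_cat}/{town.lower()}/",
        "title": f"{category} in {town} | Book Online",
        "h1": f"{category} in {town}",
        "meta_description": (
            f"Find and book {category.lower()} in {town}. "
            "Compare venues, prices and availability."
        ),
    }

pages = [build_page(c, t) for c, t in product(categories, locations)]
print(f"{len(pages)} pages generated")
print(pages[0])
```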
Some things to consider when building your landing page template would be;
The process for building out a template for this type of situation might work like this.
Obviously, the greater the number of locations you use to build your data set, the better the results will be across all locations. And the better your page template, the more organic visitors you'll get, with fewer links and less need to rebuild the pages in the future.
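One plausible way to turn that per-location research into template-wide targets is to run the same analysis for a handful of sample locations and take the median of each metric, so the template isn't over-fitted to a single city. The figures below are invented for illustration:

```python
# Hypothetical sketch: per-location SERP analysis results collapsed into median
# targets for the page template. All numbers are invented for illustration.
from statistics import median

per_location = {
    "London":  {"word_count": 950, "h2_count": 6, "exact_keyword_mentions": 4},
    "Leeds":   {"word_count": 780, "h2_count": 5, "exact_keyword_mentions": 3},
    "Bristol": {"word_count": 820, "h2_count": 5, "exact_keyword_mentions": 5},
    "Glasgow": {"word_count": 900, "h2_count": 7, "exact_keyword_mentions": 4},
    "Dublin":  {"word_count": 700, "h2_count": 4, "exact_keyword_mentions": 3},
}

metrics = next(iter(per_location.values())).keys()
template_targets = {
    metric: median(loc[metric] for loc in per_location.values())
    for metric in metrics
}
print(template_targets)  # {'word_count': 820, 'h2_count': 5, 'exact_keyword_mentions': 4}
```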
NB/TL;DR - Avoid changing your URL, page content or codebase for low-volume, low-competition keywords; it's always better to do it right the first time. Whilst Google does reward some types of queries for freshness, it's gotten much better at discerning which types of queries should get a bump when you update or add content. Changing your content or codebase will often result in a temporary drop in rankings until Googlebot comes back and spends the processing power to re-evaluate your page.
What is SEO data?
SEO data can be defined as any data that can lead to actionable insights to improve your site’s position in Google’s organic index.
How is data science used in SEO and digital marketing?
Data science is an integral part of SEO and digital marketing, and all decisions and allocation of resources should be based on a data-driven approach.
How do you analyze SEO data?
Analyzing SEO data and using it to improve your organic position can be done in many ways. The first step is to identify the key data points and compare your site with your competitors to find where you need to improve.
Which is better data science or digital marketing?
If you are looking for a great salary and love crunching numbers, data science is a career path with lots of opportunities. If you are more creative, digital marketing might be a better choice.
How is machine learning used in digital marketing?
Machine learning has been touted as a way to improve conversions for advertising accounts, to offer visitors of e-commerce stores items they may wish to purchase based on their previous spending habits, and to improve UI/UX designs.
But behind the scenes, Google uses machine learning to analyse user behaviour and reward sites and pages that fulfil the user's search intent.
Will AI take over digital marketing?
Yes, AI will take over digital marketing. In fact, Skynet has already composed chart-topping music, created paintings that have sold for thousands of dollars and optimised AdWords accounts to creatively move money out of advertisers' accounts and into Google's coffers. The end is nigh for digital marketing.
Murrough Foley
You can find Murrough Foley on LinkedIn.