In a personalized world: How cookies make your performance tests more reliable

Written by larskjensen | Published 2017/06/14
Tech Story Tags: web-development | web-performance | performance-testing | personalization

“Warm up” your test tool and get better performance tests. Here is how we did it with WebPageTest.

Part of my job at Ekstra Bladet Udvikling/Development is to monitor web site performance at one of Denmark’s biggest and busiest websites.

When you test and measure a website’s performance, it’s important that your tool resembles a real user’s visit as closely as possible. The performance tools we currently use at Ekstra Bladet Development can do this, but it requires a little fiddling. Read on to see how it works and what you can do to get started.

🕐 The Challenge

Performance is a huge factor in the user’s experience. This is especially true if you have a website that (like ekstrabladet.dk) contains a lot of ads, all of which affect the technical performance.

Back in July 2016, I summed up our work with performance and published (among other articles) ‘Your ads performance is your performance’ here on Medium. Here I wrote about how much of our performance is affected by technology that is out of our control.

Besides that, a lot has happened in advertising technology, and ekstrabladet.dk is no exception. Advertisers increasingly prefer to reach users who they hope will be receptive to their message. They do this through a range of technology partners that promise to help, and that try to match the content with the users based on, among other things, their cookies.

When so much has happened in the value chain from advertiser to user it seems strange that performance is still widely measured by throwing a bunch of URLs into a tool, which then visits and measures them — and then shows us the results. Many ads (and the technology matching them with users) depend on cookies and “cold browsers” like the ones in performance tools don’t have cookies.

Therefore it would be pretty neat if you could “warm up” the test browser, thereby making it more attractive to various technologies and ads that live on a website but won’t show up until there is something to target.

🕑 The Concept

A while ago I read an article (which I regrettably can’t find despite numerous attempts…otherwise I would have linked to it) where a company behind a performance tool suggested a solution. They let their tool visit a series of URLs before visiting the URL where the actual performance test takes place.

This has the advantage that the test browser builds a cookie profile (from the places it has visited), which all of a sudden makes it more interesting to various personalization technologies.

In the article the company described how the new way of measuring showed that a website was in fact slower than they had thought. In other words, the website took longer to load for a real user than for the cold test browsers that a lot of people rely on when they measure performance.

🕒 The Tools

I decided that I wanted to conduct the same experiment with ekstrabladet.dk. We are quite well off in the sense that the tools we use for performance testing can already do this.

WebPageTest is (part of) the beating heart in SpeedCurve.

We use SpeedCurve for the continuous performance tests while we use WebPageTest for our ‘ad hoc’ (from time to time) tests. SpeedCurve is built on WebPageTest so actually we are only using one tool, deep down. And the type of scripting I have used in WebPageTest (which I’ll get back to in a moment) is supported in the Enterprise edition of SpeedCurve which we have. Hooray.

As recommended by Steve Souders from SpeedCurve, I decided to test it in some manual WebPageTest tests. It is the results of these tests I want to share with you in this article.

🕓 The Setup

🕔 Find the URLs

The first step was to get some good URLs I could get WebPageTest to visit to warm up the test browser. I talked with some of my colleagues in our Sales/BackOffice/AdOps team and they handed me the following addresses which they believe could be interesting:

Here we are using five URLs. Possibly because I asked for “a handful”, so the number 5 has no significance in this context.

🕕 Write the script

Now we want WebPageTest to visit these URLs before it visits the ekstrabladet.dk article we want to measure. For this we use the scripting interface that is built into WebPageTest.

Luckily scripting is a part of the WebPageTest documentation so it’s easy to get started.

On the documentation page you’ll find this example:

logData	0

// put any urls you want to navigate
navigate	www.aol.com
navigate	news.aol.com

logData	1

// this step will get recorded
navigate	news.aol.com/world

WebPageTest scripting works by issuing various commands (in this case “navigate” and “logData”), each followed by one or more parameters (in this case 0/1 or a URL). The two must be separated by a tab. That is important to remember.

The example above starts out by visiting ‘www.aol.com’ and then goes to ‘news.aol.com’. But because the test browser is instructed not to save any data on the performance test (“logData” is set to 0), it doesn’t take any notes, so to speak.

It does that, however, when it navigates to ‘news.aol.com/world’ because “logData” is being set to 1. It’s actually quite logical.

In this case the purpose could be to measure what I call the “cache win”: how much easier/faster a page loads when the user has previously visited another page that uses some of the same resources (images, CSS, JS files etc.). This can also be measured by visiting the same URL twice and only saving the performance test from the last visit.

We can use this function for what we want to achieve. Not because we want to measure the cache win (the five advertiser websites probably don’t share resources with ekstrabladet.dk) but because the data/information that is loaded on a page is stored in the browser cache and as cookies.

If we want to write a script that visits the five URLs and then does a performance test on the front page of ekstrabladet.dk, it will look like this:

logData	0
navigate	http://superbrugsen.dk/tilbudsavis/
navigate	https://www.alka.dk/bilforsikring
navigate	http://www.nykredit.dk/dit-liv/bolig/ny-bolig
navigate	https://danskespil.dk/oddset?intcmp=top_menu_oddset_brand
navigate	http://www.circlek.dk/dk_DK/pg1334082175653/privat/extraClub.html
logData	1
navigate	http://ekstrabladet.dk

If you need a copy-paste version of the script, I have uploaded it as a .txt file 😉
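
If the list of warm-up URLs changes from time to time, the script can also be generated instead of typed by hand. The following is a minimal Python sketch, not part of the original setup; the function name `build_warmup_script` is made up for illustration. It joins each command and its parameter with a tab, as described above.

```python
def build_warmup_script(warmup_urls, target_url):
    """Build a WebPageTest script: visit warmup_urls unrecorded, then record target_url."""
    lines = ["logData\t0"]                                # stop recording results
    lines += [f"navigate\t{url}" for url in warmup_urls]  # warm-up visits
    lines.append("logData\t1")                            # start recording again
    lines.append(f"navigate\t{target_url}")               # the page being measured
    return "\n".join(lines)


# The five warm-up URLs from the script above.
warmup_urls = [
    "http://superbrugsen.dk/tilbudsavis/",
    "https://www.alka.dk/bilforsikring",
    "http://www.nykredit.dk/dit-liv/bolig/ny-bolig",
    "https://danskespil.dk/oddset?intcmp=top_menu_oddset_brand",
    "http://www.circlek.dk/dk_DK/pg1334082175653/privat/extraClub.html",
]

script = build_warmup_script(warmup_urls, "http://ekstrabladet.dk")
print(script)
```

Swapping the target URL for an article URL gives the script for each of the article tests, without touching the warm-up list.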

🕖 Define your measuring range

For these tests I have chosen to focus on the performance of articles. So far a lot of our performance work has focused on the front page, but we are currently rolling out a new article design across our website — and a part of our way of working is ‘mobile first’, so it made sense to look at the mobile edition/version of articles. Therefore I made WebPageTest emulate an iPhone 6.

I chose to use the WebPageTest server based in Ireland. There are others close to Denmark, Germany for example — but I’ve been using the one in Ireland for a long time, so for the sake of comparison I stuck with that.

Note that when you test/measure performance from another country or a far-away location, you shouldn’t get too attached to the actual values. A load time of 9 seconds isn’t necessarily 9 seconds just because some test from Ireland says so. On the other hand, you can trust comparable measurements/tests. If you change something on your website and the load time drops from 10 to 5 seconds, it has still been cut in half, as long as the before/after tests are done from the same location and in the same way, of course. And comparable tests are exactly what we want to do here.

I decided on articles in our entertainment section (“flash!”), since it is one of the sections that has the new design and the accompanying functionality.

I have tested 10 articles. Both the times of publication and of test were spread out across a couple of days:

  1. Ingen skilsmissekommentarer fra Aqua-Lene
  2. Dansk grandprixvinder er blevet gift
  3. Line Baun: Derfor flyttede jeg hjemmefra
  4. Dronningen om sin barndom: Én sætning foragtede jeg
  5. ‘Paradise’-deltager ville være politiker: Derfor er planen droppet
  6. Efter Facebook-fight: — Jeg føler mig som en beskidt luder
  7. Her er verdens bedst betalte kendis
  8. Smukke Helenas hund passer på missen
  9. Ærlig Line Baun om sit livs kiks: Jeg ville da ønske, jeg aldrig havde sagt det
  10. Ærlig Mascha Vang: Sådan påvirker terroren mig

🕗 Remember the general WebPageTest tips

When you are using WebPageTest on a website containing third party content/technology, it is important to get all of that into the load being tested. Some tech providers will hide ads, for instance, if they can see it’s a test browser, so as not to waste precious ad displays on a machine. Therefore it is important to check the ‘Preserve original User Agent string’ setting, found under ‘Advanced’ in ‘Advanced Settings’:

‘PTST’ is the WebPageTest test browser’s way of identifying itself as a test browser. This can cause some issues so I always check this box ↑
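
To illustrate why that marker matters, here is a hypothetical Python sketch of how a third party could spot a test browser. The function name and the sample user agent strings are made up for illustration; real ad technology may use entirely different checks.

```python
def looks_like_webpagetest(user_agent: str) -> bool:
    """Return True if the user agent string carries the 'PTST' marker."""
    return "PTST" in user_agent


# Made-up user agent strings, only meant to show the difference.
with_marker = "Mozilla/5.0 (example device) ExampleBrowser/1.0 PTST/123"
without_marker = "Mozilla/5.0 (example device) ExampleBrowser/1.0"

for ua in (with_marker, without_marker):
    verdict = "test browser, might skip ads" if looks_like_webpagetest(ua) else "treated as a normal visitor"
    print(f"{ua!r}: {verdict}")
```

With the box checked, the test browser sends the second kind of string, so a check like this one no longer singles it out.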

You should also watch ‘Velocity 2014 — WebPagetest Power Users — Part 1’ on YouTube. It is the first part of a presentation given by Patrick Meenan from Google’s Chrome team (and the guy behind WebPageTest) at the performance conference ‘Velocity’ in 2014.

Among other things, he recommends an odd number of runs, since WebPageTest picks its representative run by choosing the median one. He also stresses that you should have more than one run, since the first run warms up the DNS cache, server, database etc.

I have chosen to have five runs in my test. Besides this, I run every test 10 times (meaning 50 runs all in all) which I then gather in a spreadsheet and find the average.
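
The arithmetic behind this can be sketched in a few lines of Python. This is only an illustration: the numbers are placeholders rather than measurements, and it assumes that the median of each test is what ends up being averaged, which is just one way to read the spreadsheet step.

```python
from statistics import mean, median


def median_of_runs(run_values):
    """Return the median run value; insist on an odd count so it is a real run."""
    if len(run_values) % 2 == 0:
        raise ValueError("use an odd number of runs so the median is an actual run")
    return median(run_values)


# Placeholder Speed Index values (milliseconds) for two tests of five runs each.
tests = [
    [9100, 8400, 8900, 8600, 8700],
    [8800, 9000, 8500, 9300, 8600],
]

medians = [median_of_runs(runs) for runs in tests]
average = mean(medians)
print("median per test:", medians)
print("average of medians:", average)
```

With an even number of runs the median would fall between two runs, which is why an odd number is the safer choice.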

🕘 The results

It can all get a little abstract if we don’t have the same data to look at. Therefore I’ve uploaded my spreadsheet to Google Docs so you can have a look. There you will also find links to every WebPageTest test I have used.

There is nothing secret in these tests/measurements. Everyone can measure the performance of ekstrabladet.dk articles using WebPageTest. I have merely copied in the results (and converted the number format), calculated the average and compared across tests with and without the five URLs. That’s it.

You can see the spreadsheet here

The most interesting part, of course, is the comparisons, so here they are. Forgive me for just pasting screenshots from Excel; I just didn’t feel like spending half a day pasting data into HTML tables (note: ‘Ny’ is Danish for ‘New’):

A quick glance across the numbers shows that the Speed Index value is where the action is. Speed Index is (besides being documented) an expression of how quickly the first viewport is ready. It is, in other words, an attempt to measure the perceived performance, or a large part of it, at least.

To get a more general impression of the results we can look at the average percentage change. Note that this is not really a sound method, since there might be big differences between the articles (and there are after all only 10 of them), but it can give a broader view and maybe help identify certain tendencies:

This clearly shows that especially Speed Index is improved by including the five URLs.

Here we can see that Speed Index improves by 30 percent, which is significant.
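
For readers who want to redo this kind of comparison on their own data, here is a minimal Python sketch of the calculation. The article names and values are placeholders, not the figures from the spreadsheet; a negative change means a lower (better) Speed Index.

```python
from statistics import mean


def percent_change(before, after):
    """Percentage change from before to after; negative means the value dropped."""
    return (after - before) * 100 / before


# Placeholder Speed Index values: (cold browser, warmed-up browser).
speed_index = {
    "article A": (5000, 3500),
    "article B": (4000, 3000),
}

changes = {name: percent_change(cold, warm) for name, (cold, warm) in speed_index.items()}
average_change = mean(changes.values())

for name, change in changes.items():
    print(f"{name}: {change:+.1f} %")
print(f"average: {average_change:+.1f} %")
```

Averaging percentages like this weighs every article equally, which is part of why the result should be read as a tendency rather than a precise figure.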

This graph (which might receive a ‘World’s Ugliest Graph’ nomination) illustrates the drop in Speed Index; the articles are paired by color:

Speed Index for the 10 different articles, with and without the new URLs in the test tool.

🕙 The Conclusion

Despite the fact that the other values aren’t really affected, it does look like the first view/viewport is noticeably faster when the test tool (WebPageTest) has visited the five URLs and built an (admittedly quite poor) cookie profile prior to visiting the ekstrabladet.dk article.

As such, this doesn’t really mean anything for the user experience at ekstrabladet.dk, nor do we need to change anything on our website following this discovery.

Yet it is still important. It indicates that our site may perform better (at least when it comes to Speed Index) for our users than our tools show, because a test browser with a cookie profile is more similar to a real user.

Therefore we need to consider changing our setup for performance testing and measuring; I will get back to that in a short while.

At the same time it’s interesting that there is only a very small change in the load-time on the articles. The article itself doesn’t load faster for the user (or, test browser, to be honest), but the first viewport is ready much faster. That could indicate that some elements/requests are being fetched in another order, so that the top elements on the page are loaded first — which makes great sense seen from a performance point of view.

I also find it interesting that we see the exact opposite of what I was expecting. I thought we would make the same discovery as the people in the article I can no longer find: that our site is actually slower for real users than in a test tool.

Instead we see that at least Speed Index is improved when we start imitating real browser behavior. That is good news (although it might be the other way around with other or more URLs), but how can it be like this?

🕚 Why (maybe) — and the future

These tests (even though there is a considerable number of runs) do not reveal why the articles in the test have a better perceived performance if the test browser is equipped with cookies.

A theory which I share with a colleague in our BackOffice/AdOps team is that the profile is simply becoming more “attractive” for the various technologies. A lot of ads are after all trying to get to the right users by using things like cookies, so that conclusion is right there in front of us.

Now we need to dig into it and find out whether this is a binary state (where the big difference lies in ‘cookies’/‘no cookies’) or if it’s more of a gradual thing, where we might see even better performance if we have 25 URLs instead of 5.

It may be a glitch in WebPageTest, of course: the test browser might for some reason become faster at loading the first viewport when it has visited other sites prior to the article. I don’t think so, but even in that case it doesn’t change the fact that there are certain uncertainties regarding the tools we use to measure and test performance.

With these 10 articles we saw a drop in Speed Index. Next time it might be something different. For example, I just did a control test on another article from the same section; here Speed Index dropped by about 20 percent, while the load time dropped by more than 10 percent.

So, what happens now? First of all, we need to do more testing.

As I mention above we need to find out whether there is a difference in the number of URLs being visited before the performance test. This isn’t really something we can use on our website — but we can use it to build better performance tests and measurements.

When the user experience is as dynamic as it is nowadays, it’s extremely important that our performance tests are as similar as possible to real browser visits.

I would also like to do tests where the ads (and similar third party technology) are kept out of the equation. My hypothesis is that the difference in Speed Index will vanish, but that needs to be confirmed or ruled out through tests.

When we have found the best possible setup we need to change our setup in SpeedCurve, which we use for the continuing performance tests. Here we will face an interesting discussion: Should we also change the way we test the other websites we use for benchmarking? And if so, would it be the same X number of URLs; or are other URLs better suited for other sites?

Here we will also need to decide whether this setup needs to be maintained, whereby we from time to time add/edit the URLs on the list.

If you work with the performance of a website with ads or other personalization technology, I recommend you do something similar. Talk with others in your organization about which URLs could be interesting to use in your tests.

It may well be that you find there is no difference, but then you’ll know. And remember to keep the tests alive and repeat them so you are always making decisions on the right basis.

🕛 What is real?

There can be no doubt that performance testing is a very…lively field where you can never be sure that you are actually measuring what the users are experiencing.

As Morpheus tells Neo in ‘The Matrix’:

What is real? How can you define real?

These tests underline a point others have made many times; that personalization causes each of us to experience our own World Wide Web.

This makes it impossible to define what a ‘real’ user experience is, and it means that performance at a website like ours will always be a general expression of how the site is probably experienced by as many users as possible.

This article was originally posted (in Danish) at our Ekstra Bladet Development blog →


Published by HackerNoon on 2017/06/14