Believe it or not, one of the greatest hurdles facing researchers and students is simply getting access to PDFs: they can waste hours and significant mental energy going down rabbit holes before they can even read the content they’re interested in. Kopernio is a web browser extension that shortcuts this tedious process so that they can spend their time on more important things. As such, it interfaces with a big portion of the academic Web: publisher pages (of which there are thousands), institutional login portals, and various open access sources of PDFs.
Not only is this a huge surface area to provide test coverage for, but there’s no guarantee that any of it is going to look the same, provide reliable service, or that we’d even know in advance when there’ll be downtime. This poses several problems:

1. The surface area we need test coverage for is enormous.
2. Websites and APIs can change their appearance or behaviour without warning.
3. There’s no guarantee of reliable service.
4. Downtime can strike at any moment, with no advance notice.

Problems 1, 3 and 4 make my life as a test engineer harder. If only there were a way to hold these services constant by recording the responses we get from them, so they can be replayed whenever we want without touching the internet. Even problem 2 would be far easier to debug if we had a pre-recorded older version of a website or API to run tests against, to compare its results with those from the live version.
Hoverfly is a tool for simulating APIs, and comes in two flavours: a cloud-based solution, and a standalone HTTP server written in Go. The latter is what I use for testing Kopernio, so I’ll be diving into that. Broadly speaking, it receives HTTP requests and either forwards them to their original recipient and relays the real responses (much like a proxy server), or serves canned responses that you configure yourself. Its functionality is encapsulated by three modes:
1. Capture - Hoverfly will act as a proxy server and record every request made, along with the response to it from the target. At any time these request-response pairs can be exported as a “simulation” in JSON format. Here’s a stripped-down example:
{
  "data": {
    "pairs": [
      {
        "request": {
          "path": [
            {
              "matcher": "exact",
              "value": "/r/bouldering"
            }
          ],
          "method": [
            {
              "matcher": "exact",
              "value": "GET"
            }
          ],
          "destination": [
            {
              "matcher": "exact",
              "value": "reddit.com"
            }
          ],
          "scheme": [
            {
              "matcher": "exact",
              "value": "http"
            }
          ]
        },
        "response": {
          "encodedBody": false,
          "status": 502,
          "body": "foo",
          "templated": false,
          "headers": {
            "Access-Control-Allow-Origin": [
              "*"
            ]
          }
        }
      }
    ]
  }
}
When in this mode you can specify which request headers to record. Once the simulation is exported you can add, remove or modify any fields as required.
2. Simulate - Hoverfly will attempt to match any inbound request to a response, based on the request-response pairs found in an imported simulation file. For a successful match, every field present in a pair’s request matcher must match the incoming request. This means that the more fields you omit, the more permissive the matching will be, and vice versa (there’s a minimal sketch of a very permissive pair just after this list).
If a request cannot be matched, Hoverfly will return a 502 error with a message containing the pair that matched most closely. Note: there’s nothing stopping you from running this mode without importing a simulation, but this would result in every request raising an error.
There is also a “stateful” option which, roughly speaking, matches on the order of requests as well as their content. This means that if a request arrives out of sequence - say its only matching pair was recorded earlier than the point the replay has reached - Hoverfly will still return an error.
3. Spy - like “simulate”, but any request that can’t be matched will be relayed to the remote target, with Hoverfly behaving like a proxy server. In our case this is really useful because the plugin makes requests to our backend, which we want to exercise for real as an integration test rather than simulate in any way.
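To make that permissiveness concrete, here’s a minimal sketch (in Python, since our tests are pytest-based) of a simulation containing a single pair whose request matcher only constrains the destination, so any request to that host - whatever its path, method or scheme - would be matched. The host and file name are placeholders, and the JSON is stripped down in the same way as the example above; a real export also carries a meta section.

import json

# A deliberately permissive pair: only the destination is constrained, so any
# request to example.org (any path, method or scheme) will match it.
simulation = {
    "data": {
        "pairs": [
            {
                "request": {
                    "destination": [
                        {"matcher": "exact", "value": "example.org"}
                    ]
                },
                "response": {
                    "status": 200,
                    "body": "Hello from the simulation",
                    "encodedBody": False,
                    "templated": False,
                    "headers": {"Content-Type": ["text/plain"]},
                },
            }
        ]
    }
}

with open("permissive-simulation.json", "w") as f:
    json.dump(simulation, f, indent=2)

Adding the path or method matchers back in tightens the match again.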
Setting this up for a quick proof-of-concept is incredibly simple using the hoverctl CLI tool: hoverctl start starts Hoverfly listening on port 8500 (its proxy port; the admin API listens on port 8888 by default), hoverctl mode <mode> sets the mode, and hoverctl <export/import> <path> creates or loads simulation files. Under the hood, this sends HTTP requests to the Hoverfly server’s REST API, which can be used instead - and is necessary if you want to run Hoverfly with a custom configuration, e.g. using non-standard ports. Hoverfly’s API documentation is incomplete, but you can reverse engineer the gaps by using a network traffic monitoring tool, like Wireshark, to see what requests hoverctl is sending.
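To give a flavour of what that looks like, here’s a rough Python sketch of driving the admin API directly with the requests library. It assumes Hoverfly is running with its default ports (proxy on 8500, admin on 8888) and uses the /api/v2 mode and simulation endpoints; if your configuration differs, Wireshark will tell you what hoverctl is actually sending.

import json
import requests

ADMIN = "http://localhost:8888/api/v2"  # Hoverfly's default admin port

# Put Hoverfly into capture mode ("simulate" and "spy" work the same way)
requests.put(f"{ADMIN}/hoverfly/mode", json={"mode": "capture"}).raise_for_status()

# ...run the traffic you want to record through the proxy on port 8500...

# Export the recorded pairs as a simulation file
simulation = requests.get(f"{ADMIN}/simulation").json()
with open("article-page.json", "w") as f:
    json.dump(simulation, f, indent=2)

# Later: re-import the simulation and switch to simulate mode
with open("article-page.json") as f:
    requests.put(f"{ADMIN}/simulation", json=json.load(f)).raise_for_status()
requests.put(f"{ADMIN}/hoverfly/mode", json={"mode": "simulate"}).raise_for_status()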
We have a test that opens an open access article page to verify that the plugin can retrieve a PDF and display it to the user. The first time it runs, we set up Hoverfly in capture mode to record real responses from the website. If the test passes, we export a simulation. If it doesn’t, we try again.
Thereafter, any time the test runs again, we import the simulation, and when the article page is opened it gets the originally recorded response. The website could go down for all we care, or throw up a paywall for that particular article (which Kopernio might be able to help with, but that changes the parameters of the test), but Hoverfly will still serve exactly the same web page, every time. Indeed, one of the external services we use during testing became unavailable one Friday afternoon and I didn’t even notice until I tried to access it outside of my tests, because Hoverfly was completely faking it during test runs.
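In our tests the recorded response reaches a real browser that has been configured to use Hoverfly as its proxy, but the mechanism is easier to see with a bare HTTP client. A sketch, with a made-up URL - note that HTTPS traffic would additionally need Hoverfly’s generated certificate to be trusted (or verification turned off):

import requests

# Hoverfly's proxy listens on port 8500 by default; in simulate mode anything
# sent through it is answered from the imported simulation, in capture mode it
# is recorded, and in spy mode unmatched requests fall through to the internet.
HOVERFLY_PROXY = {
    "http": "http://localhost:8500",
    "https": "http://localhost:8500",
}

# Hypothetical article URL, purely for illustration. In simulate mode this
# returns the recorded page even if the real site is down.
response = requests.get("http://example.org/open-access-article", proxies=HOVERFLY_PROXY)
print(response.status_code, len(response.text))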
This has the added bonus of not even needing an internet connection if you have a simulation prepared! Very handy if you’re doing a bit of work on the London Underground.
Conversely, we can also use Hoverfly to simulate service outages on demand, to test how Kopernio handles them. This requires a little more thought, but boils down to crafting JSON from scratch or modifying an existing simulation to change 200 status codes to 50x’s.
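A small script along these lines does the job - it takes a previously captured simulation (the file names are placeholders) and downgrades every successful response to a 503, ready to be imported before the outage test runs:

import json

with open("article-page.json") as f:
    simulation = json.load(f)

# Turn every successful response into a 503 so the test can exercise how
# Kopernio behaves when the service is down.
for pair in simulation["data"]["pairs"]:
    if pair["response"]["status"] == 200:
        pair["response"]["status"] = 503
        pair["response"]["body"] = "Service Unavailable"
        pair["response"]["encodedBody"] = False

with open("article-page-outage.json", "w") as f:
    json.dump(simulation, f, indent=2)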
We also have tests which involve the plugin making requests to a public API. There are two versions of this test: one that receives real responses, and one that receives a simulated response from Hoverfly. If the latter succeeds but the former fails without any obvious server errors, it’s a good indication that the API has changed and we need to update our code accordingly.
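In pytest terms the pattern boils down to running the same assertions with and without the Hoverfly proxy; the endpoint and the expected field below are invented purely for illustration, and in our framework this is wrapped up in fixtures rather than written out by hand:

import pytest
import requests

HOVERFLY_PROXY = {"http": "http://localhost:8500", "https": "http://localhost:8500"}

# Run the same check twice: straight to the real API, and through the Hoverfly
# simulation. If only the real variant fails (without obvious server errors),
# the API has probably changed underneath us.
@pytest.mark.parametrize("proxies", [{}, HOVERFLY_PROXY], ids=["real", "simulated"])
def test_public_api_shape(proxies):
    response = requests.get("http://api.example.org/v1/articles/42", proxies=proxies)
    assert response.status_code == 200
    assert "title" in response.json()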
Hoverfly also doubles up as a network monitoring tool: it stores a “journal” containing all the requests and responses it has served. Even if the tests don’t use simulations, we set Hoverfly to spy mode and boom, network logs. There are other proxy-based solutions that might be better suited to this purpose, but it’s nice to keep things simple and avoid chaining proxies.
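The journal lives behind the same admin API, so after a test run we can pull it down and dump a crude network log; the exact field names can vary between Hoverfly versions, so treat this as a sketch against the defaults:

import requests

ADMIN = "http://localhost:8888/api/v2"

# Fetch everything Hoverfly has served so far and print one line per entry.
journal = requests.get(f"{ADMIN}/journal").json()
for entry in journal.get("journal", []):
    req, resp = entry["request"], entry["response"]
    print(entry.get("mode"), req["method"], req["destination"] + req["path"], resp["status"])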
What’s more, this is a platform-agnostic solution: we have tests for both Firefox and Chrome, but as of writing only the former provides network logs that encompass both web page and browser extension activity.
As mentioned previously, it’s important to make sure that we simulate requests to certain services, but if the matching fails for any reason and we’re in spy mode, Hoverfly will silently proxy the request, which is a bit of a problem. The last thing we want is to accidentally spam a website by running a test 1000 times under the assumption that requests to it would never hit the real website!
Hoverfly will set a “Hoverfly” header on all responses it serves, regardless of their origin, but it’s often useful to know whether a given response came from a simulation or a real request made to a third party. The solution we’ve employed is to modify the simulations to include an extra header that proxied responses won’t have; we can then look for that header in the journal.
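In practice that’s just one more pass over the simulation file before importing it. The header name below is our own convention rather than anything Hoverfly defines; any journal entry whose response lacks it must therefore have been proxied to the real service:

import json

MARKER = "X-Simulated-Response"  # our own marker header, not something Hoverfly defines

with open("article-page.json") as f:
    simulation = json.load(f)

# Tag every simulated response so it can be told apart from proxied ones
# when inspecting the journal after a test run.
for pair in simulation["data"]["pairs"]:
    pair["response"].setdefault("headers", {})[MARKER] = ["true"]

with open("article-page.json", "w") as f:
    json.dump(simulation, f, indent=2)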
The request-matching logic can be brittle if you record cookies in simulations: if a website has some JavaScript that sets a different cookie each time the page is loaded, requests from the browser will never be matched, because the cookie differs from the one captured when the simulation was first created. In our tests we remedy this by editing the simulation to ignore the cookie, provided the simulation remains useful without it.
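The edit itself can be sketched as another small pass over the simulation, stripping the Cookie matcher out of every recorded request (the exact shape of the headers block depends on how the capture was configured):

import json

with open("article-page.json") as f:
    simulation = json.load(f)

# Remove the Cookie header matcher from every request so that a browser
# arriving with a freshly minted cookie still matches the recorded pair.
for pair in simulation["data"]["pairs"]:
    pair["request"].get("headers", {}).pop("Cookie", None)

with open("article-page.json", "w") as f:
    json.dump(simulation, f, indent=2)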
Hoverfly has become an essential tool in my arsenal when writing tests for Kopernio to make sure we don’t introduce regressions against external services. As we reach out to more and more users this surface area is only going to increase, so I’m very glad to have discovered it sooner rather than later. There are some cool features that I haven’t touched upon here, like the ability to introduce delays before responses.
If you are interested in seeing how you can integrate Hoverfly into your tests, download and install the pytest plugin that we use in our own framework: it provides the extra functionality mentioned in this article (like adding extra headers for determining response origin).
For other usage, head over to Hoverfly’s impressive documentation and try it for yourself - or get in touch!