If you looked up optimisation (or optimization for our American readers) in the dictionary it would say something along the lines of Making the most of what we’ve got.
A worthy endeavour and challenge accepted.
In 2016 my whole team got a chance to spend some time spiking out our proposed upgrade to our codebase and this really got me thinking about how we should build our front end. Pretty quickly I realised that the browser would be a large factor in our approach and equally large bottleneck in our knowledge.
Since then my team have had the privilege of putting a lot of this theory in to practice, giving us lots of learnings along the way.
I’ve also interacted with lots of other developers online and at talks I’ve given about optimisation so I want to pass my knowledge on to you.
- Users are happier because web pages load faster and are tailored to give a better UX.
- Developers are happier because your codebase and architecture follow best practices.
- The business is happier because of potentially lower bounce rates and higher conversions.
- You can future proof your code for when speed and optimisation affects SEO rankings more.
- And you can help make the web a better place.
We can’t control the browser or do much to change the way that it behaves, but we can understand how it works so that we can optimise the payload we deliver.
Luckily, the fundamentals of browser behaviour are pretty stable, well documented and unlikely to change significantly for a long time.
So this at least gives us a goal to aim for.
Code, stack, architecture and patterns on the other hand are something we can control. They’re more flexible, change at a more rapid pace and provide us with more options at our end.
I decided to work from the outside in, figuring out what the end result of our code should be, and then form an opinion on writing that code. In this first blog we’re going to focus on all we need to know about the browser.
What the browser does
Let’s build up some knowledge. Here’s some trivial HTML we’ll expect our browser to run.
How the browser renders the page
When the browser receives our HTML it parses it, breaking it down in to a vocabulary it understands, which is kept consistent in all browsers thanks to the HTML5 DOM Specification. It then runs through a series of steps to construct and render the page. Here’s the very high level overview.
- Use the HTML to create the Document Object Model (DOM).
- Use the CSS to create the CSS Object Model (CSSOM).
- Execute the Scripts on the DOM and CSSOM.
- Combine the DOM and CSSOM to form the Render Tree.
- Use the Render Tree to Layout the size and position of all elements.
- Paint in all the pixels.
Step one — HTML
The browser starts reading the markup from top to bottom and uses it to create the DOM by breaking it down in to Nodes.
HTML delivery optimisation strategies
- Styles at the top and scripts at the bottom
While there are exceptions and nuances to this rule, the general idea is to load styles as early as possible and scripts as late as possible. The reason for this is that scripts require the HTML and CSS to have finished parsing before they execute, therefore we put styles up high so they have ample time to compute before we compile and execute our scripts at the bottom.
Further on we investigate how to tweak this while optimising.
- Minification and compression
Minification removes any redundant characters, including whitespace, comments, extra semicolons, etc.
Compression, such as GZip, replaces data in code or assets that is repeated by instead assigning it a pointer back to the original instance. Massively compressing the size of downloads and instead relying on the client to unpack the files.
By doing both you could potentially thin your payload down by 80 or 90%. E.g. Save 87% on bootstrap alone.
- Preload, Prefetch, Prerender, Preconnect
These four html attributes allow us to preemptively load resources or make network connections that our user may need in the future.
preload attribute allows us to preload resources we will need during the current page navigation.
prefetch attribute allows us to prefetch resources and network requests for future navigations.
prerender attribute fully renders a page in the background.
preconnect attribute anticipates future network requests and conducts the DNS lookups, TLS negotiations, TCP handshakes.
While this won’t make your page download any faster it will drastically increase the satisfaction of impaired users. Make sure to provide for everyone! Use
aria labels on elements, provide
alt text on images and all that other nice stuff.
Use tools like WAVE to identify where you could improve with accessibility.
Step two — CSS
When it finds any style related Nodes, i.e. external, internal or inline styles, it stops rendering the DOM and uses these Nodes to create the CSSOM. That’s why they call CSS “Render Blocking”. Here are some pros and cons of the different types of styles.
The CSSOM nodes get created just like the DOM nodes, later they will be combined but here’s what they look like for now.
Construction of the CSSOM blocks the rendering of the page so we want to load styles as early as possible in the tree, make them as lightweight as possible, and defer loading of them where efficient.
CSS delivery optimisation strategies
- Use media attributes.
Media attributes specify a condition that has to be met for the styles to load, e.g. Is there a max or min resolution? Is it for a screen reader?
Desktops are very powerful but mobile devices aren’t, so we want to give them the lightest payload possible. We could hypothetically deliver only the mobile styles first, then put a media conditional on desktop styling, while this won’t stop it downloading it will stop it blocking loading of our page and using up valuable resources.
- Defer loading of CSS
If you have styling that can wait to be loaded and computed until after the first meaningful paint, e.g. stuff that appears below the fold, or stuff that is not essential until after the page becomes responsive. You can use a script to wait for page to load before it appends the styling.
- Less specificity
There’s one obvious drawback in there being physically more data to transfer with more elements chained together, thus enlarging the CSS file, but there’s also a client-side computational drain on calculating styles with higher specificity.
- Only deliver what you need
This may sound silly or patronising, but if you’ve worked on the front end for any length of time you’ll know one of the big problems with CSS is the unpredictability of deleting stuff. By design it’s cursed to keep growing and growing.
To trim CSS down as much as possible use tools like the uncss package or if you want a web alternative then look around, there’s a lot of choice.
Because our script may need to access or manipulate prior HTML or Styles, we must wait for them to all be built.
Browsers have something called a “Preload Scanner” that will scan the DOM for scripts and begin pre-loading them, but scripts will execute in order only after prior CSS Nodes have been constructed.
If this was our script:
Then this would be the effect on our DOM and CSSOM:
Optimising our scripts is one of the most important things we can do and equally one of the things that most websites do worst.
- Load scripts asynchronously
By using an
async attribute on our script we can tell the browser to go ahead and download it with another thread on low priority but don’t block the rest of the page loading. As soon as it’s finished downloading it will execute.
This means it could execute at any time, which leads to two obvious issues. First that it could execute long after the page loads, so if we’re relying on it to do something for the UX then we may give our user a sub optimal experience. Second, if it happens to execute before the page finishes loading we can’t predict it will have access to the right DOM/CSSOM elements and might break.
async is great for scripts that don’t affect our DOM or CSSOM, and definitely great for external scripts that require no knowledge of our code and are not essential to the UX, such as analytics or tracking. But if you find any good use case for it then use it.
- Defer loading of scripts
defer is very similar to
async in that it will not block loading of our page, however, it will wait to execute until after our HTML has been parsed and will execute in order of appearance.
This is a really good option for scripts that will act on our Render Tree but are not vital to loading the above the fold content of our page, or that need prior scripts to have run already.
defer do not work on inline scripts since browsers by default will compile and execute them as soon as it has them. When they’re inlined in HTML they are immediately run, by using the above two attributes on external resources we’re merely abstracting away or delaying the release of our scripts to our DOM/CSSOM.
- Cloning nodes before manipulation
If and only if you’re seeing unwanted behaviour when performing multiple changes to the DOM then try this.
It may be more efficient to first clone the entire DOM Node, make the changes to the clone, then replace the original with it so you avoid multiple redraws and lower the CPU and memory load. It also prevents ‘jittery’ changes to your page and Flashes Of Unstyled Content (FOUC).
Just be careful when cloning because it doesn’t clone event listeners. Sometimes this can actually be exactly what you want. In the past we have used this method to reset event listeners when they weren’t calling named functions and we didn’t have JQuery’s
.off() methods available.
Step four — Render Tree
Once all Nodes have been read and the DOM and CSSOM are ready to be combined, the browser constructs the Render Tree. If we think of Nodes as words, and the Object Models as sentences, then the Render Tree is a full page. Now the browser has everything it needs to render the page.
Step five — Layout
Then we enter the Layout phase, determining the size and position of all elements on the page.
Step six — Paint
And finally we enter the Paint phase where we actually rasterize pixels on the screen, ‘painting’ the page for our user.
And all that usually happens in seconds or tenths of a second. Our job is to do it quicker.
By now you know enough to appreciate this talk by Tali Garsiel. It’s from 2012 but the information still holds true. Her comprehensive paper on the subject can be read here.
If you’re loving what you’ve read so far but still hunger to know more low level technical stuff then your oracle of all knowledge is the HTML5 Specification.
We’re nearly there, just stay with me a bit longer! Now we discover why we need to know all of the above.
How the browser makes network calls
In this section we’ll understand how we can most efficiently transfer to the browser the data required to render our page.
When the browser makes a request to a URL our server responds with some HTML. We’ll start off super small and slowly increment the complexity.
Let’s say this is the HTML for our page.
We need to learn a new term, Critical Rendering Path (CRP). It just means the number of steps the browser has to take to render the page. Here’s what our CRP diagram would look like right now.
Critical Path Length
The first of three CRP metrics is path length. We want this to be as low as it can be.
The browser makes one round trip to the server to retrieve the HTML it needs to render our page, and that’s all it needs. Therefore our Critical Path Length is 1, perfect.
If we examine our CRP diagram we should see a couple of changes.
We have two extra steps, build CSSOM and execute scripts. This is because our HTML has internal styles and scripts that need computing. However, because no external requests had to be made, they do not add anything to our Critical Path Length, yippie!
But hold on, not so fast. Notice also that our HTML increased in size to 2kb, so we have to take a hit somewhere.
That’s where the second of our three metrics comes in, Critical Bytes. This measures how many bytes have to be transferred to render the page. Not all the bytes that will ever download on the page, but just what is needed to actually render our page and make it responsive to the user.
Needless to say we want to minimise this as well.
If you think that’s great and who needs external resource anyway, then you would be wrong. While this looks tempting it isn’t workable at scale. In reality if my team had to deliver one of our pages with everything it needed internally or inline it would be massive. And the browser is not built to handle such a load.
Look at this interesting article on the effect of page load while inlining all styles like React recommends. The DOM quadruples in size, takes twice as long to mount and 50% longer to become responsive. Quite unacceptable.
Also consider the fact that external resources can be cached, therefore on return visits to our page, or visits to other pages that use the same resources (e.g.
my-global.css) the browser will not make a network call and instead use its cached version, thereby being an even bigger win for us.
Here’s what our CRP diagram looks like now.
It has picked up our
app.js and carves another Critical Path to go fetch them. It didn’t pick up our
analytics.js because we gave it the
async attribute. The browser will still download it with another thread on low priority but because it doesn’t block rendering of our page it does not factor in to the Critical Path. This is exactly how Google’s own optimisation algorithms rank sites.
async script doesn’t count. Smaller is better, of course.
Back to the Critical Path Length
HTTP1 file limit
Once again, life isn’t that simple. Thanks to the HTTP1 protocol our browser will only simultaneously download a set maximum number of files from one domain at a time. It ranges from 2 in really old browsers to 10 in Edge, Chrome handles 6.
You can view how many max concurrent files your user’s browsers can request from your domain here.
You can and should get around this by serving some resources off of shadow domains to maximise your optimisation potential.
Warning: Do not serve critical CSS from anything but your root domain, the DNS lookup and latency alone will negate any possible benefit in any possible scenario.
If your site is HTTP2 and your user’s browser is compatible then you can totally obliterate this limit. But those wonderful happenings aren’t too common yet.
You can test your site’s HTTP2'ness here.
TCP round trip limit
Another enemy approaches!
The maximum amount of data that can be transferred in any one round trip is 14kb (it can vary up to 16kb), this is true for all web requests including HTML, CSS and Scripts. This comes from a TCP Specification that prevents network congestion and packet loss.
If our HTML or any accumulation of resources in a request is larger than 14kb then we need to make additional round trips to fetch them. So, yes, those massive resources do add many paths to our CRP.
The Big Kahuna
Now let’s go all out with our massive web page.
Our entire HTML is nicely minified and GZipped and weighs in at 2kb, well under the 14kb limit, so it comes back to us just fine in one round trip of the CRP and the browser dutifully begins constructing our DOM with the 1 critical file, our HTML.
CRP Metrics: Length 1, Files 1, Bytes 2kb
CRP Metrics: Length 2, Files 2, Bytes 16kb
It then carries on downloading resources. The remainder weigh in under 14kb so could make it in one round trip but there’s 7 of them and because our website isn’t on HTTP2 yet, and we’re using Chrome, we can only download 6 files in this round trip.
CRP Metrics: Length 3, Files 8, Bytes 28kb
Now we can finally download the final file and render the DOM.
CRP Metrics: Length 4, Files 9, Bytes 30kb
Bringing our CRP to a total of 30kb of critical resources in 9 critical files and 4 critical paths. With this information and some knowledge of the latency in a connection you can actually begin to make really accurate estimates about performance of a page for a given user.
Browser networking optimisation strategies
- Pagespeed Insights
Use Insights to identify performance issues. There’s also an
audit tab in Chrome DevTools.
- Get good at Chrome Developer Tools
DevTools are sooo amazing. I could write a whole book on them alone, but there’s already a lot of resources out there to help you. Here’s a really useful read to start interpreting network resources.
- Build in a good environment, test in a bad one
Of course you want to build on your Macbook Pro with 1tb SSD and 32gb of RAM, but for performance testing you should head over to the
network tab in Chrome, if that’s what you use, and simulate a low-bandwidth, throttled-CPU connection to really get some useful insight.
- Concatenate resources/files together
- Internal Styles for Above the fold content in the header
But there’s a very good argument for having internal styles for your essential above the fold content, meaning that you can avoid fetching resources to do your first meaningful paint.
- Minify/shrink images
Pretty straight forward concept with a multitude of implementations, pick your favourite.
- Defer loading images until after page load
Everything you should know about image optimisation strategies can be found here in this excellent free ebook from Addy Osmani.
- Load fonts asynchronously
Fonts are very expensive to load and if you can you should use a webfont with fallbacks and then progressively render your fonts and icon sheets. This may not seem great but the other option is that your page loads with no text at all if the font hasn’t loaded, this is called a Flash Of Invisible Text (FOIT).
There is a superb dev blog from eBay on a good custom font delivery strategy.
Well, do you? Answer me! Is there a native HTML element that can produce the behaviour you’re using scripts for? Is there styling or icons that can be created inline instead of internal/external? E.g. Inline an SVG.
Content Delivery Networks (CDNs) can be used to serve your assets from a physically closer and lower latency location to your user, thereby lowering load times.
- Server-side optimisation
For every request your server is going to have to get and respond with your code, this causes latency and wait times in your network requests. There’s lots of different ways to optimise here and it’s probably a whole book’s worth so I can’t elaborate too much. But you probably want to look in to things like server side caching of static html, code caching, and memoisation.
Now you’re tickled pink and know enough to go out there and discover more on this subject yourself. I would recommend doing this free Udacity course and reading Google’s own documentation on optimisation.
If you yearn for even more low level knowledge then this free eBook from Ilya Grigorik on High Performance Browser Networking is a good place to start.
The Critical Render Path is all important, it gives us some pretty solid and logical rules to optimise our web site. The 3 metrics that matter are:
1 — Number of critical bytes.
2 — Number of critical files.
3 — Number of critical paths.
What I’ve written about here should give you enough to get a grasp on the fundamentals and help you interpret what Google Pagespeed Insights has to say about your performance.
The application of best practice will come with the meshing together of a good DOM structure, network optimisation and the various strategies we have available to minimise our CRP metrics. Making the user happier, and making Google’s search engine happier.
In any enterprise scale website this will be a monumental task but you’ll have to do it sooner or later so stop incurring even more technical debt and start investing in solid performance optimisation strategies.
Thank you so much for reading. Now you have a solid overview you can go out there and become as obsessed with performance optimisation as I am, good luck have fun!