I’ve been working on a small application that overlays a map with thousands of complex features, each polygon feature made up of tens to hundreds of individual points. I had gotten the basic app framework and logic coded up, including having real-time editing “just work” out of the box by building on an operational transform library I built that I’ve mentioned before.
But the core user experience depends on the performance of the map and overlay rendering, and with that number of individual features it was just painful. I am using the OpenLayers API, which provides a lot of plug-and-play integration with third-party map services.
I knew that it was possible to get this to work. There are existence proofs of these kinds of interfaces out on the web, and the core OpenLayers API defines an alternate layering technique, a tiling overlay approach, different from the one I had first integrated with. That approach is more complicated to integrate but provides the control I would need to make this work. With tiling, you provide just the subset of features within the viewable area of the map, and you can provide a different set of features for each zoom level. At lower zoom levels (zoomed out, so you see more of the map and more features) you can both cull features that collapse to single points and reduce the complexity of the remaining features (fewer points, so they render faster). The combination of tiling, feature culling and point reduction should, in theory, make it all work.
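To make the idea concrete, here is a minimal, hypothetical sketch in plain JavaScript. This is not OpenLayers code and not how any particular library implements it; the feature shape, the helper names and the naive distance-based point thinning are all illustrative assumptions, just to show what culling sub-pixel features and reducing point counts per tile means.

```js
// Hypothetical sketch only. Features are plain { points: [[x, y], ...] } rings in
// map units, tileBbox is [minX, minY, maxX, maxY], and unitsPerPixel is the ground
// resolution at the current zoom level.

function bboxOf(points) {
  let minX = Infinity, minY = Infinity, maxX = -Infinity, maxY = -Infinity;
  for (const [x, y] of points) {
    if (x < minX) minX = x;
    if (y < minY) minY = y;
    if (x > maxX) maxX = x;
    if (y > maxY) maxY = y;
  }
  return [minX, minY, maxX, maxY];
}

function intersects(a, b) {
  return a[0] <= b[2] && a[2] >= b[0] && a[1] <= b[3] && a[3] >= b[1];
}

// Naive point reduction: drop consecutive points closer together than `tolerance`.
// Real libraries use smarter simplification; this just illustrates the effect.
function thin(points, tolerance) {
  if (points.length < 3) return points;
  const out = [points[0]];
  for (const p of points.slice(1)) {
    const last = out[out.length - 1];
    if (Math.hypot(p[0] - last[0], p[1] - last[1]) >= tolerance) out.push(p);
  }
  return out;
}

function featuresForTile(features, tileBbox, unitsPerPixel) {
  return features
    // Keep only features that actually fall within this tile.
    .filter((f) => intersects(bboxOf(f.points), tileBbox))
    // Cull features that would collapse to roughly a single pixel at this zoom.
    .filter((f) => {
      const [minX, minY, maxX, maxY] = bboxOf(f.points);
      return maxX - minX > unitsPerPixel || maxY - minY > unitsPerPixel;
    })
    // Reduce the point count of everything that survives.
    .map((f) => ({ ...f, points: thin(f.points, unitsPerPixel) }));
}
```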
The nice thing about working with popular APIs and formats is that you can type a search string like “displaying thousands of geojson features in openlayers” into a search engine and immediately have the second or third result be “Rendering big geodata on the fly with GeoJSON-VT” on Points of interest, a blog post by Mapbox engineer Vladimir Agafonkin.
The post linked above describes geojson-vt, a JavaScript library written by Agafonkin, easily installable with npm, that does precisely what I needed. This was one of those rare “perfect dependencies”. The code applies some well-known graphics techniques in a clever way. It annotates the geometry once in an up-front analysis pass and then uses that simplification data to generate each tile, already simplified and culled for its zoom level, on the fly in a blazingly fast pass. It also applies detailed knowledge of JavaScript performance techniques to optimize its memory use and allocation behavior.
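The basic shape of the library’s API is roughly this (a sketch based on geojson-vt’s documented usage; the option values are just plausible placeholders, and myGeoJSON stands in for the app’s own feature collection):

```js
import geojsonvt from 'geojson-vt';

// The app's own polygon features (placeholder here).
const myGeoJSON = {
  type: 'FeatureCollection',
  features: [/* thousands of polygon features */],
};

// One-time analysis pass: index and pre-simplify the whole feature set.
const tileIndex = geojsonvt(myGeoJSON, {
  maxZoom: 14,   // highest zoom level at which full detail is preserved
  tolerance: 3,  // simplification tolerance (higher means simpler geometry)
  extent: 4096,  // tile extent in integer tile coordinates
  buffer: 64,    // padding around each tile so shapes don't clip at the edges
});

// Individual tiles are then generated on the fly, already culled and simplified
// for that zoom level; getTile returns null when a tile contains no features.
const tile = tileIndex.getTile(12, 1205, 1539); // (z, x, y) of the requested tile
```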
The code is beautiful — clean, simple and limited to just what is necessary to accomplish this integration. There is no clutter of optional features, no over-ambitious infrastructure to do things I don’t need. Integrating it was simple and quick. It exposes a simple JSON object interface that is easy to tie in to what OpenLayers expects, while maintaining isolation and without creating unnecessary dependencies or assumptions.
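For the curious, the wiring looks something like the sketch below. This is a hedged approximation rather than my code verbatim: tileIndex is the geojson-vt index from the snippet above, toGeoJSONFeature is a hypothetical, simplified converter that only handles plain polygon features (geojson-vt returns its own compact tile format, not GeoJSON), and 'EPSG:3857' stands in for whatever projection the map view actually uses.

```js
import GeoJSON from 'ol/format/GeoJSON';
import Projection from 'ol/proj/Projection';
import VectorTileLayer from 'ol/layer/VectorTile';
import VectorTileSource from 'ol/source/VectorTile';

// Hypothetical converter: geojson-vt emits compact tile features (numeric type
// codes, tile-pixel coordinates); this turns a plain polygon back into GeoJSON.
function toGeoJSONFeature(vtFeature) {
  return {
    type: 'Feature',
    geometry: { type: 'Polygon', coordinates: vtFeature.geometry },
    properties: vtFeature.tags,
  };
}

// geojson-vt hands back coordinates in tile-pixel units, so the format is told
// to read them that way rather than as map coordinates.
const format = new GeoJSON({
  dataProjection: new Projection({
    code: 'TILE_PIXELS',
    units: 'tile-pixels',
    extent: [0, 0, 4096, 4096],
  }),
});

const vectorSource = new VectorTileSource({
  // Encode the tile coordinate as a pseudo URL so the loader can recover it.
  tileUrlFunction: (tileCoord) => JSON.stringify(tileCoord),
  tileLoadFunction: (tile, url) => {
    const tileCoord = JSON.parse(url);
    const data = tileIndex.getTile(tileCoord[0], tileCoord[1], tileCoord[2]);
    const features = format.readFeatures(
      {
        type: 'FeatureCollection',
        features: data ? data.features.map(toGeoJSONFeature) : [],
      },
      {
        extent: vectorSource.getTileGrid().getTileCoordExtent(tileCoord),
        featureProjection: 'EPSG:3857', // the map view's projection
      }
    );
    tile.setFeatures(features);
  },
});

const overlayLayer = new VectorTileLayer({ source: vectorSource });
```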
The effect seemed magical — the map looked identical to my previous version, but now zooming and panning were rapid and seamless. A task that had looked daunting, and that would gate whether the app was even usable, was done in an hour or two.
This is also an interesting example of the realities of time/space performance tradeoffs. The library leans heavily on pre-computing data for the culling computation, the classic “use space to save time” tradeoff. But any experienced programmer knows that heavy memory use tends to map directly into a time cost.
This package is careful about recognizing what information it can throw away to minimize memory use and what information it needs to keep around so that it can provide predictable performance when generating tiles on the fly. The technique is common across lots of applications. For example, a word processor does detailed layout to determine object positions but then retains only paragraph heights. Detailed character runs are regenerated only as needed, for example for rendering or hit testing on the limited set of paragraphs visible on the screen.
Often you can abstractly define an application’s algorithms and data structures, but actually delivering a sweet, high-performance application involves “hollowing out” the data you retain and focusing on where you can regenerate data on demand with predictable performance. Agafonkin did a really nice job of that here.