Locality Simplifies — Office 2016 Examples

by Terry Crowley, December 20th, 2017

There are certain design features that are so core that trying to fit them into the structure of a “design pattern” trivializes them. They end up intersecting and influencing many different patterns, and experienced engineers look for opportunities to leverage them. One of these is optimizing for locality.

Whether it is packing network communication into a small number of coalesced requests, grouping IO to account for the properties of spinning disks or the very different physics of reading and writing blocks on flash storage, or tuning how the processor accesses memory to make the best use of a complex memory hierarchy, all of these techniques have similar properties: pack data together into a bounded dataset and operate on it with temporal and spatial locality. While technologies have changed radically over the last five decades, I could have written that same guidance when I started programming 40 years ago.
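
To make the memory-hierarchy case concrete, here is a minimal sketch (illustrative only, not code from Office): the same sum computed over a row-major matrix in storage order and against it. The only difference is traversal order, but the first loop touches memory sequentially while the second strides a full row between accesses, defeating spatial locality.

```cpp
#include <cstddef>
#include <vector>

// Same sum, two traversal orders over a row-major matrix stored as one
// contiguous vector. Only the loop nesting differs.
double sumInStorageOrder(const std::vector<double>& m,
                         std::size_t rows, std::size_t cols) {
    double total = 0.0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];  // sequential: each cache line used fully
    return total;
}

double sumAgainstStorageOrder(const std::vector<double>& m,
                              std::size_t rows, std::size_t cols) {
    double total = 0.0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];  // stride of `cols`: poor spatial locality
    return total;
}
```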

The value of optimizing for locality is grounded in the physical reality of three dimensions and the speed of light. It will continue to be as true going forward as it has been in the past.

If you can guarantee good locality, you can also often simplify your algorithms and data structures. Office 2016 shipped with a couple of interesting examples of how locality was leveraged.

Office shipped with a new data visualization framework, codenamed IVY. It provides an example of designing a component to achieve good locality and then leveraging that locality to simplify how the component is used and integrated. The IVY component takes a dataset, a chart specification, and a target viewport and resolution, and generates the annotated geometry that describes the chart. The annotated geometry is the set of lines, curves, text, and filled regions that represents the visualized chart, with sufficient annotation metadata to support features like accessibility, theming, animation, and interactivity.
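
A hypothetical sketch of what the shape of such an interface might look like; the names and types here are invented for illustration and are not the actual IVY API. The essential contract is that an arbitrarily large dataset goes in and a bounded block of annotated geometry comes out:

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct DataSet   { std::vector<double> values; };   // placeholder input
struct ChartSpec { std::string chartType; };        // placeholder specification
struct Viewport  { float width = 0, height = 0, dpi = 96; };

struct GeometryFeature {
    enum class Kind { Line, Curve, Text, FilledRegion };
    Kind kind;
    std::uint32_t seriesId;   // annotation: theming, animation pairing
    std::uint32_t dataIndex;  // annotation: hit-testing, selection
    std::string label;        // annotation: accessibility text
    // ...geometric payload (points, paths) elided...
};

// The generated result: one compact, self-contained block of features.
struct GeometryBlock { std::vector<GeometryFeature> features; };

class ChartEngine {
public:
    // Walks the dataset once and emits geometry culled to the viewport,
    // so the output size is bounded regardless of the input size.
    GeometryBlock Generate(const DataSet& data, const ChartSpec& spec,
                           const Viewport& viewport);
};
```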

The component is designed to be used in a wide range of environments, so it tries to minimize assumptions and avoid constraining host behavior such as threading models or rendering technology.

The core engine is responsible for walking over the input dataset and generating this geometry. Once the geometry is generated, a wide range of processing needs to be supported: rendering, accessibility hooks, hit-testing over the geometry for interactive features like selection and hover effects, or animating between geometries. The engine is designed so this geometry and the algorithms that operate over it have great locality. While the input dataset can be very large, the engine provides a robust culling strategy to take advantage of the limited screen viewport. Obscured features or features that degenerate at the target scale (e.g. single-point line segments) can be removed from the generated geometry. This approach allows an unbounded input dataset to generate a set of features that is guaranteed to be well-bounded.
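
A sketch of what such a culling pass could look like, assuming simple line-segment features (invented types, not IVY's code). The output is bounded by what can be visible in the viewport rather than by the size of the input:

```cpp
#include <cmath>
#include <vector>

struct Point   { float x, y; };
struct Segment { Point a, b; };
struct Rect    { float left, top, right, bottom; };

// Coarse bounding-box test of a segment against the viewport.
bool Intersects(const Rect& vp, const Segment& s) {
    float minX = std::fmin(s.a.x, s.b.x), maxX = std::fmax(s.a.x, s.b.x);
    float minY = std::fmin(s.a.y, s.b.y), maxY = std::fmax(s.a.y, s.b.y);
    return maxX >= vp.left && minX <= vp.right &&
           maxY >= vp.top  && minY <= vp.bottom;
}

std::vector<Segment> CullSegments(const std::vector<Segment>& in,
                                  const Rect& viewport, float pixelSize) {
    std::vector<Segment> out;
    for (const Segment& s : in) {
        // Drop features that degenerate to under a pixel at the target scale.
        if (std::hypot(s.b.x - s.a.x, s.b.y - s.a.y) < pixelSize) continue;
        // Drop features entirely outside the viewport.
        if (!Intersects(viewport, s)) continue;
        out.push_back(s);
    }
    return out;
}
```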

When designing for interactive applications, the place in your architecture where an unbounded dataset can be converted into a form with clear constraints and bounds is always a key element of your design.

Other techniques, like supporting an abstract iterator over the geometry, allow charts to further compact the generated result set by synthesizing it on demand (e.g. encoding a series of boxes for a scatter chart as a single description of the box and then encoding only a single point of the box for each data point). The engine also tries to allocate the geometry as a single compact block of memory. This compact approach means most charts easily fit in the L1 cache.
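
A minimal sketch of the on-demand synthesis idea, again with invented types: a scatter series stores one prototype marker plus one anchor point per datum, and an iterator materializes each translated marker lazily rather than storing every box:

```cpp
#include <cstddef>
#include <vector>

struct Point { float x, y; };
struct Rect  { float left, top, right, bottom; };

class MarkerIterator {
public:
    MarkerIterator(Rect prototype, const std::vector<Point>& anchors)
        : proto_(prototype), anchors_(anchors) {}

    // Yields the next marker box, synthesized by translating the prototype.
    bool Next(Rect* out) {
        if (index_ >= anchors_.size()) return false;
        const Point& p = anchors_[index_++];
        *out = { proto_.left + p.x, proto_.top + p.y,
                 proto_.right + p.x, proto_.bottom + p.y };
        return true;
    }

private:
    Rect proto_;                          // encoded once for the whole series
    const std::vector<Point>& anchors_;   // one point per data value
    std::size_t index_ = 0;
};
```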

Most processing can be implemented as a simple linear walk through the streamed geometry. Because the generated geometry is guaranteed to be bounded and compact, hosts integrating with the component can use simple techniques like cloning the geometry to leverage multiple threads (e.g. handing off to a renderer). The team building IVY invested in the engineering around the component to ensure these characteristics are maintained over time as features are added and new visualizations defined (e.g. building unit tests that ensure peak and steady-state memory use stays well-bounded as dataset sizes vary). This is an especially good lesson: maintaining the integrity of a design over time is surprisingly challenging. Unit tests and other validation both provide the engineering infrastructure for maintaining the integrity of the original design principles and continually reinforce to new developers on the project what those key principles are.
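
The kind of regression test described might look something like this sketch (reusing the illustrative types from above; the helper function and the budget constant are invented): as the dataset grows by orders of magnitude, the generated feature count must plateau under a fixed, viewport-derived budget.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kMaxFeaturesForViewport = 50'000;  // invented budget
DataSet MakeSyntheticData(std::size_t n);                // hypothetical helper

void TestGeometryStaysBounded() {
    ChartEngine engine;
    const Viewport vp{800.0f, 600.0f, 96.0f};
    const ChartSpec spec{"scatter"};
    for (std::size_t n = 1'000; n <= 1'000'000; n *= 10) {
        GeometryBlock g = engine.Generate(MakeSyntheticData(n), spec, vp);
        // Peak output must plateau under a fixed budget as the input grows.
        assert(g.features.size() <= kMaxFeaturesForViewport);
    }
}
```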

This is a great example of a component where there is much sophistication in the design and engineering but in service of a core structuring principle that can be very simply defined.

Another interesting example from the Office 2016 project was the expanded use of Direct2D command lists. D2D command lists allow you to execute your normal rendering code, but rather than rendering to the screen or some full-resolution surface, the rendering commands are captured and stored in a compact buffer that can be used for later playback (not unlike Windows metafiles).
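
The basic record-and-replay pattern uses the real Direct2D API (ID2D1DeviceContext::CreateCommandList and friends); a condensed sketch, with device setup and HRESULT checking elided:

```cpp
#include <d2d1_1.h>
#include <d2d1helper.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void RecordAndReplay(ID2D1DeviceContext* dc, ID2D1Image* screenTarget,
                     ID2D1SolidColorBrush* brush) {
    // 1. Record: point the context at a command list and draw as usual.
    ComPtr<ID2D1CommandList> commands;
    dc->CreateCommandList(&commands);
    dc->SetTarget(commands.Get());
    dc->BeginDraw();
    dc->DrawRectangle(D2D1::RectF(10.f, 10.f, 200.f, 120.f), brush, 2.0f);
    dc->EndDraw();
    commands->Close();  // seal the list: now immutable and cheap to cache

    // 2. Replay: the recorded commands can be played back later, onto a
    //    different target, or from whichever thread owns the context.
    dc->SetTarget(screenTarget);
    dc->BeginDraw();
    dc->DrawImage(commands.Get());
    dc->EndDraw();
}
```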

This compact representation provides a number of benefits in design flexibility. The actual rendering code can be executed more quickly and predictably, since it does not have to walk over a possibly high-resolution output surface that might vary significantly across execution environments. This means the code inherently runs with better locality (interactions with the GPU and potentially asynchronous rendering make the full consequences complicated, but this remains generally true). The compact buffer is cheaper to cache, so the application can do speculative pre-rendering of regions outside the direct viewport at a lower cost.

Since these command lists have no external dependencies, it is simple to hand them off to another component to manage. For example, a thread doing touch panning that needs guaranteed high responsiveness can render them with predictable performance costs, rather than using a callback mechanism into uncontrolled client code. The costs scale predictably (or not at all) across a wide range of screen sizes and resolutions, which became especially important as support was extended to screen resolutions varying from 100 to 500 DPI. Rendering is faster (and more predictable) because the code is not executing complex application-specific rendering algorithms that need to walk over a data model tuned for application-specific manipulation rather than for rendering. So there is better locality both when the instructions are generated and later when they are executed to actually complete rendering to a surface.

This approach of producing a compact representation that can be operated on with high locality and performance, and reasoned about simply because of its limited dependencies, provides great design flexibility. In both these cases, “locality” also means locality of dependencies, which is a common characteristic of implementations that follow this pattern.

Barry Allyn, the principal architect of the IVY charting component described above, commented on this discussion:

When I started my career as Win3.0 16-bit app developer in 1993, we were all forced to structure data and code to fit in 64K data and code segments. As you’re obviously aware, this required data structures and usage patterns to be highly optimized for size simply to avoid the cost of FAR and HUGE pointers as well as to fit on to floppy disks ($$). We didn’t care nor fully appreciate the virtues of locality, it was all about optimizing for size in order to simply fit through the door and save $$ for the company. Visio 1.0 started development in 1990, which meant their data model (DML) and ShapeSheet recalc engine (SSE) also had to fit within the same constraints. This turned out to be a huge blessing and is a big reason why Visio can open and edit massive AutoCad drawings without tipping over. Anyway, I liken devs who’ve written Win16 code to those before us who went through The Depression. The scars have forced a permanent mindset of savings, be it memory or money. My grandmother passed away before the 2008 stock market crash, but I can guarantee you she would’ve survived it even with what little savings she had left in her 90’s.