Announcing BuildInfer for C++

Analyze, Visualize and Migrate Between Build-systems

Is your C++ build slow or unintelligible? We are currently looking for more case studies. If you are interested in improving your Linux-based build (commercial or open-source), please get in touch!
https://buildinfer.loopperfect.com/

The C++ community is fragmented due to the variety of build-systems used. This fragmentation makes it difficult to:

  • Understand how a third-party library works
  • Integrate two libraries together
  • Optimize across libraries (e.g. LTO)
  • Build tooling for C++ source-code (where are the headers?)
  • Implement artefact caching
  • Identify which steps in the build process are slowing you down

For the C++ community this means:

  • Time wasted gluing projects together
  • Time wasted waiting for slow builds
  • Time wasted rewriting code that already exists, but is difficult to integrate

Wouldn’t it be great if we could extract a readable build description of any C++ project, regardless of the build-system used?

Announcing BuildInfer

By recording the build process at the system level, we can infer high-level information about the project structure. Because BuildInfer records at this low level, the technique works for any C++ build system.
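To make the idea concrete, here is a minimal sketch of how recorded tool invocations might be turned into a file-level build graph. The recording format (one command line per observed process), the sample commands and the function name `infer_edges` are all assumptions made for illustration; this is not BuildInfer's actual implementation or output format.

```python
import shlex
from collections import defaultdict

# Hypothetical recording: one command line per process observed during the build.
RECORDED = [
    "g++ -Iinclude -c src/a.cpp -o obj/a.o",
    "g++ -Iinclude -c src/b.cpp -o obj/b.o",
    "ar rcs lib/libfoo.a obj/a.o obj/b.o",
    "g++ src/main.cpp lib/libfoo.a -o bin/app",
]

# File extensions treated as build inputs in this sketch.
INPUT_EXTS = (".c", ".cc", ".cpp", ".o", ".a")

def infer_edges(commands):
    """Map each output file to the input files that produced it."""
    graph = defaultdict(list)
    for line in commands:
        argv = shlex.split(line)
        tool, args = argv[0], argv[1:]
        if tool == "ar":
            # ar <flags> <archive> <members...>
            output, inputs = args[1], args[2:]
        else:
            # Compiler driver: the output follows -o; inputs are the
            # remaining arguments that look like source or object files.
            output = args[args.index("-o") + 1]
            inputs = [a for i, a in enumerate(args)
                      if a.endswith(INPUT_EXTS) and args[i - 1] != "-o"]
        graph[output].extend(inputs)
    return dict(graph)

graph = infer_edges(RECORDED)
for output, inputs in graph.items():
    print(output, "<-", inputs)
```

The build-system that issued these commands never appears in the sketch. Only the commands themselves are inspected, which is why the approach does not depend on whether they came from Make, CMake, SCons or anything else.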

Once we have a high-level build description, we can visualize, transform and even port complex build-systems to more powerful ones, such as Buck and Bazel.
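As a rough illustration of the porting step, the sketch below renders a Buck `cxx_library` rule from a small, hand-written build description. The helper name `render_cxx_library`, the file paths and the dependency label are hypothetical; a real port would derive them from the recorded build and would need to handle flags, platforms and generated files.

```python
def render_cxx_library(name, srcs, headers, deps=()):
    """Render a Buck cxx_library rule as BUCK-file text."""
    def fmt(items):
        # Sort entries so the generated file is stable between runs.
        return "[" + ", ".join(f'"{i}"' for i in sorted(items)) + "]"
    return (
        "cxx_library(\n"
        f'  name = "{name}",\n'
        f"  srcs = {fmt(srcs)},\n"
        f"  exported_headers = {fmt(headers)},\n"
        f"  deps = {fmt(deps)},\n"
        ")\n"
    )

# Hypothetical inputs, standing in for an inferred build description.
rule = render_cxx_library(
    "foo",
    srcs=["src/b.cpp", "src/a.cpp"],
    headers=["include/foo.hpp"],
    deps=["//third_party:zlib"],
)
print(rule)
```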

Initial Findings

We have already had success porting the following projects to Buck:

Furthermore, we discovered that none of these projects ship with a reproducible build-system! By porting them to Buck, we can guarantee this.

Reproducible builds are crucial for security, cache performance and debugging. For more information, see:

Mapnik

Mapnik is an open source mapping toolkit for desktop- and server-based map rendering, written in C++.
  • Porting Mapnik from SCons to Buck reduces build-times from 30 minutes to 6 minutes.
  • We estimate that enabling precompiled-headers will improve build-times by a further 10%.
  • Mapnik does not use version-scripts and requires -fvisibility=inline and shared builds to prevent symbol clashes. Using BuildInfer’s output, we identified that the core issue is non-static definitions within this header-file.
  • We generated a graph showing the interaction of individual file groups and executables:

The graph tells us the following:

  • Mapnik doesn’t use version-scripts
  • Some object-files have a *.os extension
  • Several translation units are generated by scons/scons.py

LLVM & Clang

The LLVM compiler infrastructure project is a “collection of modular and reusable compiler and toolchain technologies” used to develop compiler front ends and back ends.
  • By default, LLVM enforces a “super-project” structure that forces you to lay out your project in a particular way. Using BuildInfer, LLVM can be refactored into many small modules.
  • Straight-line build times are similar, but incremental builds are massively improved.
  • We can cache LLVM Tablegen artefacts — this is not possible with CCache.
  • We estimate that the straight-line build-time can be improved by 20% using precompiled headers. This is much easier to implement using the information extracted by BuildInfer.
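Caching the output of a generator step such as TableGen means caching an arbitrary build step, not only a compiler invocation. One common way to do this is to key each artefact on a hash of the command line and the contents of its inputs. The sketch below illustrates the idea only; the function name `cache_key`, the sample command and the in-memory inputs are assumptions, and this is not a description of how Buck or BuildInfer compute their keys.

```python
import hashlib

def cache_key(command, inputs):
    """Derive a cache key from a command line and its input file contents.

    command: list of argv strings for the build step.
    inputs:  dict mapping input path -> file contents as bytes.
    """
    h = hashlib.sha256()
    for arg in command:
        h.update(arg.encode() + b"\0")
    # Sort by path so the key does not depend on dictionary order.
    for path in sorted(inputs):
        h.update(path.encode() + b"\0")
        h.update(hashlib.sha256(inputs[path]).digest())
    return h.hexdigest()

# Illustrative command and inputs; the file contents are made up.
cmd = ["llvm-tblgen", "-gen-instr-info", "X86.td", "-o", "X86GenInstrInfo.inc"]
key_1 = cache_key(cmd, {"X86.td": b"def ADD;"})
key_2 = cache_key(cmd, {"X86.td": b"def ADD;"})
key_3 = cache_key(cmd, {"X86.td": b"def SUB;"})

print(key_1 == key_2)  # same command and inputs give the same key
print(key_1 == key_3)  # a changed input gives a different key
```

If the key matches a stored artefact, the step can be skipped and the stored output reused. This only works when the list of inputs is complete, which is the kind of information a recorded build can supply.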

We also performed a simple analysis of the build-times in the context of the dependency graph. These tables show the estimated cost of changing various files in LLVM, calculated by summing the time taken to build each file plus all of its dependents (the files that directly or transitively depend on it).

These files are “hot-spots” for incremental builds, and might be good candidates for refactoring.

Please note that these numbers assume a single-threaded build; a real build would take some scale-factor of these times.
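The calculation described above can be sketched in a few lines. The file names, timings and the function name `change_costs` below are invented for illustration; they are not taken from the LLVM measurements.

```python
from collections import defaultdict

def change_costs(build_seconds, depends_on):
    """Estimate the single-threaded rebuild cost of touching each file.

    build_seconds: file -> seconds spent compiling that file (headers omitted).
    depends_on:    file -> files it directly includes or depends on.
    """
    # Invert the edges: for each file, which files depend on it directly?
    dependents = defaultdict(set)
    for f, deps in depends_on.items():
        for d in deps:
            dependents[d].add(f)

    def affected(f):
        # The file itself plus everything that transitively depends on it.
        seen, stack = {f}, [f]
        while stack:
            for nxt in dependents.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    files = set(build_seconds) | set(depends_on) | set(dependents)
    return {f: sum(build_seconds.get(a, 0) for a in affected(f))
            for f in files}

# Made-up example: three translation units and two headers.
build_seconds = {"a.cpp": 10, "b.cpp": 20, "c.cpp": 5}
depends_on = {
    "a.cpp": ["config.h", "util.h"],
    "b.cpp": ["util.h"],
    "c.cpp": ["config.h"],
    "util.h": ["config.h"],
}

costs = change_costs(build_seconds, depends_on)
hot_spots = sorted(costs, key=costs.get, reverse=True)
print(hot_spots[0], costs[hot_spots[0]])
```

In this toy graph, `config.h` is the most expensive file to touch because every translation unit depends on it, directly or through `util.h`.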

10 Most Impactful Headers
10 Most Impactful Translation-units

The header llvm-config.h is particularly interesting. It defines some constants referenced throughout the project:

So every time the version string or default target is changed, LLVM incurs a (scale-factor of) 15,264 second build! These values could potentially be refactored into a translation-unit:

We also generated a graph showing the interaction of individual file groups and executables:

The graph shows us that:

  • version-scripts are generated via Bash and called *.export
  • *.td files are used by tblgen to generate *.inc header files.
  • Clang and LLVM each have their own tblgen (how do they differ?)
  • Many files are generated by CMake directly.

OpenCV

OpenCV is a library of programming functions mainly aimed at real-time computer vision.
  • OpenCV can actually be split into multiple self-contained modules.
  • BuildInfer identified that one of these modules is dependent on an implementation detail of another module, so we submitted a patch.
  • Using Buck, OpenCV’s incremental build-time can be improved by disabling precompiled-headers after the full build has completed.

We also generated a graph showing the interaction of individual file groups and executables. The graph shows us that:

  • *.cl files are used by cl2cpp.cmake to generate *.cpp files
  • A pkg-config is generated by OpenCVGenPkgconfig.cmake
  • Precompiled headers (*.gch) are used, and the header entrypoint is not generated.
  • OpenCV’s build system uses Prolog!

Intrigued?

Is your C++ build slow or unintelligible? We are currently looking for more case studies. If you are interested in improving your Linux-based build (commercial or open-source), please get in touch!
