Building a Distribution of LLVM

Introduction

This document is geared toward people who want to build and package LLVM and any combination of LLVM sub-project tools for distribution. It covers useful features of the LLVM build system as well as best practices and general information about packaging LLVM.

If you are new to CMake, you may find the Building LLVM with CMake or CMake Primer documentation useful. Some of the things covered in this document are the inner workings of the builds described in the Advanced Build Configurations document.

General Distribution Guidance

When building a distribution of a compiler, it is generally advised to perform a bootstrap build of the compiler. That means building a “stage 1” compiler with your host toolchain, then building the “stage 2” compiler using the “stage 1” compiler. This is done so that the compiler you distribute benefits from all the bug fixes, performance optimizations and general improvements provided by the new compiler.

In deciding how to build your distribution, there are a few trade-offs that you will need to evaluate. The big two are:

  • Compile time of the distribution against performance of the built compiler
  • Binary size of the distribution against performance of the built compiler

The guidance for maximizing performance of the generated compiler is to use LTO, PGO, and statically link everything. This will result in an overall larger distribution, and it will take longer to generate, but it provides the most opportunity for the compiler to optimize.

The guidance for minimizing distribution size is to dynamically link LLVM and Clang libraries into the tools to reduce code duplication. This will come at a substantial performance penalty to the generated binary, both because it reduces optimization opportunity and because dynamic linking requires resolving symbols at process launch time, which can be very slow for C++ code.

Warning

One very important note: Distributions should never be built using the BUILD_SHARED_LIBS CMake option. That option exists for optimizing developer workflow only. Due to design and implementation decisions, LLVM relies on global data which can end up being duplicated across shared libraries, resulting in bugs. As such, this is not a safe way to distribute LLVM or LLVM-based tools.

The simplest example of building a distribution with reasonable performance is captured in the DistributionExample CMake cache file located at clang/cmake/caches/DistributionExample.cmake. The following commands will perform and install the distribution build:

  $ cmake -G Ninja -C <path to clang>/cmake/caches/DistributionExample.cmake <path to LLVM source>
  $ ninja stage2-distribution
  $ ninja stage2-install-distribution

Difference between install and install-distribution

One subtle but important thing to note is the difference between the install and install-distribution targets. The install target is expected to install every part of LLVM that your build is configured to generate except the LLVM testing tools. Alternatively, the install-distribution target, which is recommended for building distributions, only installs specific parts of LLVM as specified at configuration time by LLVM_DISTRIBUTION_COMPONENTS.

Additionally, by default the install target will install the LLVM testing tools as the public tools. This can be changed by setting LLVM_INSTALL_TOOLCHAIN_ONLY to On. The LLVM tools are intended for development and testing of LLVM, and should only be included in distributions that support LLVM development.

When building with LLVM_DISTRIBUTION_COMPONENTS, the build system also generates a distribution target which builds all the components specified in the list. This is a convenience build target to allow building just the distributed pieces without needing to build all configured targets.
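As a minimal sketch of the configuration described above, a CMake cache file might list the components to ship. The file name and the particular component names (clang, lld, clang-resource-headers) are illustrative assumptions; substitute the build targets your distribution actually needs.

```cmake
# Hypothetical cache file, e.g. MyDistribution.cmake (the name is an assumption).
set(LLVM_ENABLE_PROJECTS "clang;lld" CACHE STRING "")
set(CMAKE_BUILD_TYPE Release CACHE STRING "")

# Only these components are built by the `distribution` target and
# installed by the `install-distribution` target. Each entry must match
# the name of a build system target; these three are examples only.
set(LLVM_DISTRIBUTION_COMPONENTS
  clang
  lld
  clang-resource-headers
  CACHE STRING "")
```

Passing such a file to cmake with -C, then running ninja distribution followed by ninja install-distribution, should build and install just those pieces in a single-stage build.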

Special Notes for Library-only Distributions

One of the most powerful features of LLVM is its library-first design mentality and the way you can compose a wide variety of tools using different portions of LLVM. Even in this situation, using BUILD_SHARED_LIBS is not supported. If you want to distribute LLVM as a shared library for use in a tool, the recommended method is using LLVM_BUILD_LLVM_DYLIB, and you can use LLVM_DYLIB_COMPONENTS to configure which LLVM components are part of libLLVM.

Note: LLVM_BUILD_LLVM_DYLIB is not available on Windows.
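A library-only configuration along these lines could be sketched as follows. The component selection is purely illustrative: Support and Demangle are library names with the LLVM prefix removed, and Native is one of the special purpose names described under Relevant CMake Options.

```cmake
# Sketch of a cache-file fragment for shipping libLLVM as a shared library.
# Build the single libLLVM shared library (not available on Windows).
set(LLVM_BUILD_LLVM_DYLIB ON CACHE BOOL "")

# Restrict libLLVM to a subset of components. This particular list is an
# assumption chosen for illustration; a real tool will need whichever
# components it actually links against.
set(LLVM_DYLIB_COMPONENTS "Support;Demangle;Native" CACHE STRING "")
```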

Options for Optimizing LLVM

There are four main build optimizations that our CMake build system supports. When performing a bootstrap build, it is not beneficial to do anything other than setting CMAKE_BUILD_TYPE to Release for the stage-1 compiler. This is because the more intensive optimizations are expensive to perform and the stage-1 compiler is thrown away. All of the further options described should be set on the stage-2 compiler either using a CMake cache file, or by prefixing the option with BOOTSTRAP_.
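To illustrate the BOOTSTRAP_ prefix, here is a hedged sketch of a stage-1 cache file. It assumes the CLANG_ENABLE_BOOTSTRAP option covered in the Advanced Build Configurations document, and uses LLVM_ENABLE_LTO only as an example of an option forwarded to stage 2.

```cmake
# Stage 1: a plain Release build of a throwaway compiler.
set(LLVM_ENABLE_PROJECTS "clang" CACHE STRING "")
set(CMAKE_BUILD_TYPE Release CACHE STRING "")

# Ask the build system to set up a stage-2 build using the stage-1 clang.
set(CLANG_ENABLE_BOOTSTRAP ON CACHE BOOL "")

# Options prefixed with BOOTSTRAP_ are applied to the stage-2 configuration,
# so the expensive optimization is paid for only on the compiler that ships.
set(BOOTSTRAP_CMAKE_BUILD_TYPE Release CACHE STRING "")
set(BOOTSTRAP_LLVM_ENABLE_LTO Thin CACHE STRING "")
```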

The first and simplest to use is the compiler optimization level, controlled by the CMAKE_BUILD_TYPE option. The main values of interest are Release or RelWithDebInfo. By default the Release option uses the -O3 optimization level, and RelWithDebInfo uses -O2. If you want to generate debug information and use -O3, you can override the CMAKE_<LANG>_FLAGS_RELWITHDEBINFO option for C and CXX. DistributionExample.cmake does this.
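Such an override might look like the fragment below. The exact flag string is an assumption for illustration and is not copied from DistributionExample.cmake; consult that file for the flags it actually uses.

```cmake
# Keep debug information but raise the optimization level from -O2 to -O3.
set(CMAKE_BUILD_TYPE RelWithDebInfo CACHE STRING "")

# Illustrative flag strings (assumed); -DNDEBUG disables assertions as in
# CMake's default release-style flags.
set(CMAKE_C_FLAGS_RELWITHDEBINFO "-O3 -g -DNDEBUG" CACHE STRING "")
set(CMAKE_CXX_FLAGS_RELWITHDEBINFO "-O3 -g -DNDEBUG" CACHE STRING "")
```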

Another easy-to-use option is Link-Time-Optimization. You can set the LLVM_ENABLE_LTO option on your stage-2 build to Thin or Full to enable building LLVM with LTO. Either setting will significantly increase link time for the binaries in the distribution, but it will produce much faster binaries. This option should not be used if your distribution includes static archives, as the objects inside the archive will be LLVM bitcode, which is not portable.

The Advanced Build Configurations documentation describes the built-in tooling for generating LLVM profiling information to drive Profile-Guided-Optimization. The in-tree profiling tests are very limited, and generating the profile takes a significant amount of time, but it can result in a significant improvement in the performance of the generated binaries.

In addition to PGO profiling, we also have limited in-tree support for generating linker order files. These files provide the linker with a suggested ordering for functions in the final binary layout. This can measurably speed up clang by physically grouping functions that are called temporally close to each other. The current tooling is only available on Darwin systems with dtrace(1). It is worth noting that dtrace is non-deterministic, and so the order file generation using dtrace is also non-deterministic.

Options for Reducing Size

Warning

Any steps taken to reduce the binary size will come at a cost of runtime performance in the generated binaries.

The simplest and least significant way to reduce binary size is to set the CMAKE_BUILD_TYPE variable to MinSizeRel, which will set the compiler optimization level to -Os, which optimizes for binary size. This will have both the least benefit to size and the least impact on performance.

The most impactful way to reduce binary size is to dynamically link LLVM into all the tools. This reduces code size by decreasing duplication of common code between the LLVM-based tools. This can be done by setting the following two CMake options to On: LLVM_BUILD_LLVM_DYLIB and LLVM_LINK_LLVM_DYLIB.
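Taken together, the two size-reduction approaches above could be combined in a cache-file fragment like this one. It is a sketch only; whether to use MinSizeRel alongside dynamic linking is a judgment call for each distribution.

```cmake
# Optimize for size (-Os) rather than speed.
set(CMAKE_BUILD_TYPE MinSizeRel CACHE STRING "")

# Build libLLVM as one shared library and link the tools against it,
# so common code is not duplicated in every tool binary.
set(LLVM_BUILD_LLVM_DYLIB ON CACHE BOOL "")
set(LLVM_LINK_LLVM_DYLIB ON CACHE BOOL "")

# BUILD_SHARED_LIBS is deliberately left unset; it is not supported for
# distributions.
```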

Warning

Distributions should never be built using the BUILD_SHARED_LIBS CMake option. (See the warning above for more explanation.)

Relevant CMake Options

This section provides documentation of the CMake options that are intended to help construct distributions. This is not an exhaustive list, and many additional options are documented in the Building LLVM with CMake page. Some key options that are already documented include: LLVM_TARGETS_TO_BUILD, LLVM_ENABLE_PROJECTS, LLVM_BUILD_LLVM_DYLIB, and LLVM_LINK_LLVM_DYLIB.

  • LLVM_ENABLE_RUNTIMES:STRING
    When building a distribution that includes LLVM runtime projects (e.g. libcxx, compiler-rt, libcxxabi, libunwind…), it is important to build those projects with the just-built compiler.
  • LLVM_DISTRIBUTION_COMPONENTS:STRING
    This variable can be set to a semicolon-separated list of LLVM build system components to install. All LLVM-based tools are components, as well as most of the libraries and runtimes. Component names match the names of the build system targets.
  • LLVM_RUNTIME_DISTRIBUTION_COMPONENTS:STRING
    This variable can be set to a semicolon-separated list of runtime library components. This is used in conjunction with LLVM_ENABLE_RUNTIMES to specify components of runtime libraries that you want to include in your distribution. Just like with LLVM_DISTRIBUTION_COMPONENTS, component names match the names of the build system targets.
  • LLVM_DYLIB_COMPONENTS:STRING
    This variable can be set to a semicolon-separated list of LLVM library components. LLVM library components are either library names with the LLVM prefix removed (e.g. Support, Demangle…), LLVM target names, or special purpose component names. The special purpose component names are:

    • all - All available LLVM component libraries
    • Native - The LLVM target for the native system
    • AllTargetsAsmParsers - All the included target ASM parser libraries
    • AllTargetsDescs - All the included target description libraries
    • AllTargetsDisassemblers - All the included target disassembler libraries
    • AllTargetsInfos - All the included target info libraries
  • LLVM_INSTALL_TOOLCHAIN_ONLY:BOOL
    This option defaults to Off. When set to On, it removes many of the LLVM development and testing tools, as well as component libraries, from the default install target. Including the development tools is not recommended for distributions, as many of the LLVM tools are only intended for development and testing use.
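As a rough sketch of how some of these options might appear together in a cache file, consider the fragment below. The project and runtime selections are assumptions for illustration; LLVM_RUNTIME_DISTRIBUTION_COMPONENTS is omitted because its values depend on the runtime build targets in your tree.

```cmake
# Tools to build as part of the main LLVM build (illustrative selection).
set(LLVM_ENABLE_PROJECTS "clang;lld" CACHE STRING "")

# Runtimes listed here are built with the just-built compiler rather than
# the host toolchain.
set(LLVM_ENABLE_RUNTIMES "compiler-rt;libcxx;libcxxabi;libunwind" CACHE STRING "")

# Keep development and testing tools out of the default install target.
set(LLVM_INSTALL_TOOLCHAIN_ONLY ON CACHE BOOL "")
```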