How To Build Clang and LLVM with Profile-Guided Optimizations

Introduction

PGO (Profile-Guided Optimization) allows your compiler to better optimize code for how it actually runs. Users report that applying this to Clang and LLVM can decrease overall compile time by 20%.

This guide walks you through how to build Clang with PGO, though it also applies to other subprojects, such as LLD.

Using the script

We have a script at utils/collect_and_build_with_pgo.py. This script is tested on a few Linux flavors, and requires a checkout of LLVM, Clang, and compiler-rt. Despite the name, it performs four clean builds of Clang, so it can take a while to run to completion. Please see the script’s --help for more information on how to run it, and the different options available to you. If you want to get the most out of PGO for a particular use-case (e.g. compiling a specific large piece of software), please do read the section below on ‘benchmark’ selection.

Please note that this script is only tested on a few Linux distros. Patches to add support for other platforms, as always, are highly appreciated. :)

This script also supports a --dry-run option, which causes it to print important commands instead of running them.
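As a rough sketch of a first run, the snippet below locates the script and asks it for its help text. The LLVM_SRC location is an assumption about where your checkout lives; only the script path and the --help and --dry-run options come from this guide, and any other options the script needs should be taken from its help output.

```shell
# Assumption: LLVM_SRC is the llvm/ directory of your checkout; adjust as needed.
LLVM_SRC="${LLVM_SRC:-$HOME/llvm-project/llvm}"
PGO_SCRIPT="$LLVM_SRC/utils/collect_and_build_with_pgo.py"

if [ -f "$PGO_SCRIPT" ]; then
  # List the available options before committing to four full builds.
  python3 "$PGO_SCRIPT" --help
else
  echo "collect_and_build_with_pgo.py not found under $LLVM_SRC/utils" >&2
fi
```

Once you have settled on your options, adding --dry-run to the same command line prints the important commands instead of running them, which is a cheap way to check the plan first.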

Selecting ‘benchmarks’

PGO does best when the profiles gathered represent how the user plans to use the compiler. Notably, highly accurate profiles of llc building x86_64 code aren’t incredibly helpful if you’re going to be targeting ARM.

By default, the script above does two things to get solid coverage. It:

  • runs all of Clang and LLVM’s lit tests, and
  • uses the instrumented Clang to build Clang, LLVM, and all of the other LLVM subprojects available to it.

Together, these should give you:

  • solid coverage of building C++,
  • good coverage of building C,
  • great coverage of running optimizations,
  • great coverage of the backend for your host’s architecture, and
  • some coverage of other architectures (if other arches are supported backends).

Altogether, this should cover a diverse set of uses for Clang and LLVM. If you have very specific needs (e.g. your compiler is meant to compile a large browser for four different platforms, or similar), you may want to do something else. This is configurable in the script itself.

Building Clang with PGO

If you prefer to not use the script, this briefly goes over how to build Clang/LLVM with PGO.

First, you should have at least LLVM, Clang, and compiler-rt checked out locally.

Next, at a high level, you’re going to need to do the following:

  1. Build a standard Release Clang and the relevant libclang_rt.profile library.
  2. Build Clang using the Clang you built above, but with instrumentation.
  3. Use the instrumented Clang to generate profiles, which consists of two steps:
     • Running the instrumented Clang/LLVM/lld/etc. on tasks that represent how users will use said tools.
     • Using a tool to convert the “raw” profiles generated above into a single, final PGO profile.
  4. Build a final release Clang (along with whatever other binaries you need) using the profile collected from your benchmark.

In more detailed steps:

  1. Configure a Clang build as you normally would. It’s highly recommended that you use the Release configuration for this, since it will be used to build another Clang. Because you need Clang and supporting libraries, you’ll want to build the all target (e.g. ninja all or make -j4 all).

  2. Configure a Clang build as above, but add the following CMake args:
     • -DLLVM_BUILD_INSTRUMENTED=IR - This causes us to build everything with instrumentation.
     • -DLLVM_BUILD_RUNTIME=No - A few projects have bad interactions when built with profiling, and aren’t necessary to build. This flag turns them off.
     • -DCMAKE_C_COMPILER=/path/to/stage1/clang - Use the Clang we built in step 1.
     • -DCMAKE_CXX_COMPILER=/path/to/stage1/clang++ - Same as above.

     In this build directory, you simply need to build the clang target (and whatever supporting tooling your benchmark requires).

  3. As mentioned above, this has two steps: gathering profile data, and then massaging it into a useful form:

     • Build your benchmark using the Clang generated in step 2. The ‘standard’ benchmark recommended is to run check-clang and check-llvm in your instrumented Clang’s build directory, and to do a full build of Clang/LLVM using your instrumented Clang. So, create yet another build directory, with the following CMake arguments:

        • -DCMAKE_C_COMPILER=/path/to/stage2/clang - Use the Clang we built in step 2.
        • -DCMAKE_CXX_COMPILER=/path/to/stage2/clang++ - Same as above.

       If your users are fans of debug info, you may want to consider using -DCMAKE_BUILD_TYPE=RelWithDebInfo instead of -DCMAKE_BUILD_TYPE=Release. This will grant better coverage of debug info pieces of clang, but will take longer to complete and will result in a much larger build directory.

       It’s recommended to build the all target with your instrumented Clang, since more coverage is often better.

     • You should now have a few .profraw files in path/to/stage2/profiles/. You need to merge these using llvm-profdata (even if you only have one! The profile merge transforms profraw into actual profile data, as well). This can be done with /path/to/stage1/llvm-profdata merge -output=/path/to/output/profdata.prof path/to/stage2/profiles/*.profraw.

  4. Now, build your final, PGO-optimized Clang. To do this, you’ll want to pass the following additional arguments to CMake.

     • -DLLVM_PROFDATA_FILE=/path/to/output/profdata.prof - Use the PGO profile from the previous step.
     • -DCMAKE_C_COMPILER=/path/to/stage1/clang - Use the Clang we built in step 1.
     • -DCMAKE_CXX_COMPILER=/path/to/stage1/clang++ - Same as above.

     From here, you can build whatever targets you need.

Note

You may see warnings about a mismatched profile in the build output. These are generally harmless. To silence them, you can add -DCMAKE_C_FLAGS='-Wno-backend-plugin' -DCMAKE_CXX_FLAGS='-Wno-backend-plugin' to your CMake invocation.

Congrats! You now have a Clang built with profile-guided optimizations, and you can delete all but the final build directory if you’d like.
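The manual steps above can be strung together roughly as follows. This is a sketch under stated assumptions rather than a tested recipe: the SRC and ROOT locations, the stage1 through stage4 directory names, the Ninja generator, and the bin/ subdirectory for built tools are illustrative choices, not something this guide prescribes. By default the script only prints each command; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch of the four-stage PGO build. Prints commands unless DRY_RUN=0.
# Assumptions (hypothetical, adjust freely): SRC, ROOT, the stageN directory
# names, the Ninja generator, and tools living under <build>/bin.
# Add whatever options you normally use to enable Clang and compiler-rt
# to each configure line.
DRY_RUN="${DRY_RUN:-1}"
SRC="${SRC:-$HOME/llvm-project/llvm}"
ROOT="${ROOT:-$HOME/pgo-build}"
PROFDATA="$ROOT/profdata.prof"

# Print a command, and execute it unless this is a dry run.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@" || exit 1
  fi
}

pgo_plan() {
  # Step 1: a standard Release build, including libclang_rt.profile.
  run cmake -G Ninja -S "$SRC" -B "$ROOT/stage1" -DCMAKE_BUILD_TYPE=Release
  run ninja -C "$ROOT/stage1" all

  # Step 2: an instrumented Clang, compiled by the stage 1 Clang.
  run cmake -G Ninja -S "$SRC" -B "$ROOT/stage2" \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_BUILD_INSTRUMENTED=IR \
    -DLLVM_BUILD_RUNTIME=No \
    -DCMAKE_C_COMPILER="$ROOT/stage1/bin/clang" \
    -DCMAKE_CXX_COMPILER="$ROOT/stage1/bin/clang++"
  run ninja -C "$ROOT/stage2" clang

  # Step 3, part one: run the 'standard' benchmark with the instrumented
  # Clang. Note that a failing lit test stops the script when DRY_RUN=0.
  run ninja -C "$ROOT/stage2" check-llvm check-clang
  run cmake -G Ninja -S "$SRC" -B "$ROOT/stage3" \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_C_COMPILER="$ROOT/stage2/bin/clang" \
    -DCMAKE_CXX_COMPILER="$ROOT/stage2/bin/clang++"
  run ninja -C "$ROOT/stage3" all

  # Step 3, part two: merge the raw profiles into a single profile.
  run "$ROOT/stage1/bin/llvm-profdata" merge -output="$PROFDATA" \
    "$ROOT/stage2"/profiles/*.profraw

  # Step 4: the final, PGO-optimized build. The -Wno-backend-plugin flags
  # are optional and only silence the mismatched-profile warnings.
  run cmake -G Ninja -S "$SRC" -B "$ROOT/stage4" \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_PROFDATA_FILE="$PROFDATA" \
    -DCMAKE_C_FLAGS=-Wno-backend-plugin \
    -DCMAKE_CXX_FLAGS=-Wno-backend-plugin \
    -DCMAKE_C_COMPILER="$ROOT/stage1/bin/clang" \
    -DCMAKE_CXX_COMPILER="$ROOT/stage1/bin/clang++"
  run ninja -C "$ROOT/stage4" clang
}

pgo_plan
```

Printing first and executing second mirrors the --dry-run idea from the script: it lets you review paths and flags before spending hours on four builds. The final ninja target is only an example; build whatever targets you need.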

If this worked well for you and you plan on doing it often, there’s a slight optimization that can be made: LLVM and Clang have a tool called tblgen that’s built and run during the build process. While it’s potentially nice to build this for coverage as part of step 3, none of your other builds should benefit from building it. You can pass the CMake options -DCLANG_TABLEGEN=/path/to/stage1/bin/clang-tblgen -DLLVM_TABLEGEN=/path/to/stage1/bin/llvm-tblgen to steps 2 and onward to avoid these useless rebuilds.
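One way to apply this is a small wrapper that adds the two tblgen options to any later-stage configure. The sketch below is illustrative only: the helper name configure_reusing_tblgen, the SRC and STAGE1 placeholders, and the Ninja generator are assumptions, and the command is printed rather than executed unless DRY_RUN=0.

```shell
# Sketch: reuse stage 1's tblgen binaries in a later-stage configure.
# Assumptions: SRC and STAGE1 are placeholder paths; Ninja is the generator.
DRY_RUN="${DRY_RUN:-1}"
SRC="${SRC:-/path/to/llvm}"
STAGE1="${STAGE1:-/path/to/stage1}"

# Print a command, and execute it unless this is a dry run.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@" || exit 1
  fi
}

# Usage: configure_reusing_tblgen BUILD_DIR [other cmake args for that stage...]
configure_reusing_tblgen() {
  build_dir="$1"
  shift
  run cmake -G Ninja -S "$SRC" -B "$build_dir" \
    -DCLANG_TABLEGEN="$STAGE1/bin/clang-tblgen" \
    -DLLVM_TABLEGEN="$STAGE1/bin/llvm-tblgen" \
    "$@"
}

# Example: the final configure. The compiler options for that stage are
# elided here and would be passed the same way.
configure_reusing_tblgen /path/to/stage4 \
  -DLLVM_PROFDATA_FILE=/path/to/output/profdata.prof
```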