Fuzzing LLVM libraries and tools

Introduction

The LLVM tree includes a number of fuzzers for various components. These are built on top of LibFuzzer. In order to build and run these fuzzers, see Configuring LLVM to Build Fuzzers.

Available Fuzzers

clang-fuzzer

A generic fuzzer that tries to compile textual input as C++ code. Some of the bugs this fuzzer has reported are on bugzilla and on OSS Fuzz’s tracker.

clang-proto-fuzzer

A libprotobuf-mutator based fuzzer that compiles valid C++ programs generated from a protobuf class that describes a subset of the C++ language.

This fuzzer accepts clang command line options after -ignore_remaining_args=1. For example, the following command will fuzz clang with a higher optimization level:

% bin/clang-proto-fuzzer <corpus-dir> -ignore_remaining_args=1 -O3

clang-format-fuzzer

A generic fuzzer that runs clang-format on C++ text fragments. Some of the bugs this fuzzer has reported are on bugzilla and on OSS Fuzz’s tracker.

llvm-as-fuzzer

A generic fuzzer that tries to parse text as LLVM assembly. Some of the bugs this fuzzer has reported are on bugzilla.

llvm-dwarfdump-fuzzer

A generic fuzzer that interprets inputs as object files and runs llvm-dwarfdump on them. Some of the bugs this fuzzer has reported are on OSS Fuzz’s tracker.

llvm-demangle-fuzzer

A generic fuzzer for the Itanium demangler used in various LLVM tools. We’ve fuzzed __cxa_demangle to death, so why not fuzz LLVM’s implementation of the same function?

llvm-isel-fuzzer

A structured LLVM IR fuzzer aimed at finding bugs in instruction selection.

This fuzzer accepts flags after -ignore_remaining_args=1. The flags match those of llc, and the triple is required. For example, the following command would fuzz AArch64 with Global Instruction Selection:

% bin/llvm-isel-fuzzer <corpus-dir> -ignore_remaining_args=1 -mtriple aarch64 -global-isel -O0

Some flags can also be specified in the binary name itself in order to support OSS Fuzz, which has trouble with required arguments. To do this, you can copy or move llvm-isel-fuzzer to llvm-isel-fuzzer--x-y-z, separating options from the binary name using “--”. The valid options are architecture names (aarch64, x86_64), optimization levels (O0, O2), or specific keywords, like gisel for enabling global instruction selection. In this mode, the same example could be run like so:

% bin/llvm-isel-fuzzer--aarch64-O0-gisel <corpus-dir>
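The name-encoding scheme can be illustrated with a small, self-contained sketch. The helper decodeExecNameOpts below is hypothetical and is not the code LLVM uses. It assumes that any token other than gisel or a two-character optimization level is an architecture name, and it maps tokens to llc-style flags in the simplest way that matches the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not LLVM's implementation): expands options encoded
// in a binary name such as "llvm-isel-fuzzer--aarch64-O0-gisel" into
// llc-style flags.
std::vector<std::string> decodeExecNameOpts(const std::string &ExecName) {
  assert(!ExecName.empty() && "binary name must not be empty");
  std::vector<std::string> Flags;

  // Everything after the first "--" is the encoded option list.
  const std::size_t Sep = ExecName.find("--");
  if (Sep == std::string::npos)
    return Flags;
  const std::string Rest = ExecName.substr(Sep + 2);

  // Split the option list on single dashes.
  std::size_t Start = 0;
  while (Start <= Rest.size()) {
    std::size_t End = Rest.find('-', Start);
    if (End == std::string::npos)
      End = Rest.size();
    const std::string Opt = Rest.substr(Start, End - Start);
    Start = End + 1;
    if (Opt.empty())
      continue;

    if (Opt == "gisel")
      Flags.push_back("-global-isel"); // Keyword for global isel.
    else if (Opt.size() == 2 && Opt[0] == 'O')
      Flags.push_back("-" + Opt); // Optimization level, e.g. O0 or O2.
    else
      Flags.push_back("-mtriple=" + Opt); // Assumed to be an architecture.
  }
  return Flags;
}
```

Under these assumptions, the name llvm-isel-fuzzer--aarch64-O0-gisel decodes to the flags -mtriple=aarch64, -O0 and -global-isel, while a name without “--” yields no flags.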

llvm-opt-fuzzer

A structured LLVM IR fuzzer aimed at finding bugs in optimization passes.

It receives an optimization pipeline and runs it on each fuzzer input.

The interface of this fuzzer almost directly mirrors llvm-isel-fuzzer. Both the mtriple and passes arguments are required. Passes are specified in a format suitable for the new pass manager. You can find some documentation about this format in the doxygen for PassBuilder::parsePassPipeline.

% bin/llvm-opt-fuzzer <corpus-dir> -ignore_remaining_args=1 -mtriple x86_64 -passes instcombine

As with llvm-isel-fuzzer, arguments for some predefined configurations can be embedded directly into the binary file name:

% bin/llvm-opt-fuzzer--x86_64-instcombine <corpus-dir>

llvm-mc-assemble-fuzzer

A generic fuzzer that fuzzes the MC layer’s assemblers by treating inputs as target-specific assembly.

Note that this fuzzer has an unusual command line interface which is not fully compatible with all of libFuzzer’s features. Fuzzer arguments must be passed after --fuzzer-args, and any llc flags must use two dashes. For example, to fuzz the AArch64 assembler you might use the following command:

% bin/llvm-mc-assemble-fuzzer --triple=aarch64-linux-gnu --fuzzer-args -max_len=4

This scheme will likely change in the future.

llvm-mc-disassemble-fuzzer

A generic fuzzer that fuzzes the MC layer’s disassemblers by treating inputs as assembled binary data.

Note that this fuzzer has an unusual command line interface which is not fully compatible with all of libFuzzer’s features. See the notes above about llvm-mc-assemble-fuzzer for details.

Mutators and Input Generators

The inputs for a fuzz target are generated via random mutations of a corpus. There are a few options for the kinds of mutations that a fuzzer in LLVM might want.

Generic Random Fuzzing

The most basic form of input mutation is to use the built-in mutators of LibFuzzer. These simply treat the input corpus as a bag of bits and make random mutations. This type of fuzzer is good for stressing the surface layers of a program, and works well for testing things like lexers, parsers, or binary protocols.

Some of the in-tree fuzzers that use this type of mutator are clang-fuzzer, clang-format-fuzzer, llvm-as-fuzzer, llvm-dwarfdump-fuzzer, llvm-mc-assemble-fuzzer, and llvm-mc-disassemble-fuzzer.
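As a rough illustration of this style of fuzzing, the sketch below shows a minimal fuzz target. LLVMFuzzerTestOneInput is libFuzzer’s standard entry point; the toy parensBalanced routine is a hypothetical stand-in for the lexer or parser under test and is not taken from any in-tree fuzzer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical code under test: accepts inputs in which every ')' closes an
// earlier '(' and no '(' is left open. All other bytes are ignored.
static bool parensBalanced(const uint8_t *Data, size_t Size) {
  size_t Depth = 0;
  for (size_t I = 0; I < Size; ++I) {
    if (Data[I] == '(') {
      ++Depth;
    } else if (Data[I] == ')') {
      if (Depth == 0)
        return false; // Unmatched ')'.
      --Depth;
    }
  }
  return Depth == 0;
}

// libFuzzer calls this once per generated input. The bytes are arbitrary, so
// the target must tolerate any content, including an empty buffer.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  size_t Parens = 0;
  for (size_t I = 0; I < Size; ++I)
    Parens += (Data[I] == '(' || Data[I] == ')');

  // Invariant: an accepted input contains an even number of parentheses.
  if (parensBalanced(Data, Size))
    assert(Parens % 2 == 0);
  return 0;
}
```

When such a file is built with clang’s -fsanitize=fuzzer option, libFuzzer supplies main and drives the function with mutated inputs. A crash or failed assertion is reported together with the input that triggered it.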

Structured Fuzzing using libprotobuf-mutator

We can use libprotobuf-mutator in order to perform structured fuzzing and stress deeper layers of programs. This works by defining a protobuf class that translates arbitrary data into structurally interesting input. Specifically, we use this to work with a subset of the C++ language and perform mutations that produce valid C++ programs in order to exercise parts of clang that are more interesting than parser error handling.
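The idea can be sketched without protobuf. In the following toy example, the Statement struct is a hypothetical stand-in for a protobuf message, and toSource plays the role of the converter that turns any field values into a well-formed C++ function. None of these names come from clang-proto-fuzzer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a protobuf message: one assignment of a constant
// to one of four function parameters.
struct Statement {
  unsigned VarIndex; // Any value; reduced modulo 4 when rendered.
  int Value;         // Any value; rendered as an integer literal.
};

// Renders a list of statements as C++ source. Whatever the field values are,
// the output is a syntactically valid function, so mutating the fields never
// produces input that is rejected by the parser.
std::string toSource(const std::vector<Statement> &Stmts) {
  std::string Src = "void f(int x0, int x1, int x2, int x3) {\n";
  for (const Statement &S : Stmts) {
    Src += "  x" + std::to_string(S.VarIndex % 4) + " = " +
           std::to_string(S.Value) + ";\n";
  }
  Src += "}\n";
  return Src;
}
```

A mutator then only has to change VarIndex or Value, or add and remove statements. Every mutation yields a program that gets past parsing, so the fuzzer spends its time in the later stages of the compiler.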

To build this kind of fuzzer you need protobuf and its dependencies installed, and you need to specify some extra flags when configuring the build with CMake. For example, clang-proto-fuzzer can be enabled by adding -DCLANG_ENABLE_PROTO_FUZZER=ON to the flags described in Configuring LLVM to Build Fuzzers.

The only in-tree fuzzer that uses libprotobuf-mutator today is clang-proto-fuzzer.

Structured Fuzzing of LLVM IR

We also use a more direct form of structured fuzzing for fuzzers that take LLVM IR as input. This is achieved through the FuzzMutate library, which was discussed at EuroLLVM 2017.

The FuzzMutate library is used to structurally fuzz backends in llvm-isel-fuzzer.

Building and Running

Configuring LLVM to Build Fuzzers

Fuzzers will be built and linked to libFuzzer by default as long as you build LLVM with sanitizer coverage enabled. You would typically also enable at least one sanitizer to find bugs faster. The most common way to build the fuzzers is by adding the following two flags to your CMake invocation: -DLLVM_USE_SANITIZER=Address -DLLVM_USE_SANITIZE_COVERAGE=On.

Note

If you have compiler-rt checked out in an LLVM tree when building with sanitizers, you’ll want to specify -DLLVM_BUILD_RUNTIME=Off to avoid building the sanitizers themselves with sanitizers enabled.

Note

You may run into issues if you build with BFD ld, which is the default linker on many unix systems. These issues are being tracked in https://llvm.org/PR34636.

Continuously Running and Finding Bugs

There used to be a public buildbot running LLVM fuzzers continuously, and while this did find issues, it didn’t have a very good way to report problems in an actionable way. Because of this, we’re moving towards using OSS Fuzz more instead.

You can browse the LLVM project issue list for the bugs found by LLVM on OSS Fuzz. These are also mailed to the llvm-bugs mailing list.

Utilities for Writing Fuzzers

There are some utilities available for writing fuzzers in LLVM.

Some helpers for handling the command line interface are available in include/llvm/FuzzMutate/FuzzerCLI.h, including functions to parse command line options in a consistent way and to implement standalone main functions so your fuzzer can be built and tested when not built against libFuzzer.
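To give a sense of what a standalone driver does, here is a minimal sketch. It is not the FuzzerCLI.h implementation; runTargetOnFiles and its return convention are hypothetical, and only the fuzz target signature follows libFuzzer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

// Same shape as libFuzzer's LLVMFuzzerTestOneInput.
using FuzzerTestFun = int (*)(const uint8_t *Data, size_t Size);

// Hypothetical standalone driver: reads each file and passes its bytes to
// the fuzz target once. Returns the number of files that could not be read.
int runTargetOnFiles(const std::vector<std::string> &Paths,
                     FuzzerTestFun TestOne) {
  assert(TestOne && "fuzz target must not be null");
  int Unreadable = 0;
  for (const std::string &Path : Paths) {
    std::ifstream In(Path, std::ios::binary);
    if (!In) {
      std::cerr << "cannot open " << Path << "\n";
      ++Unreadable;
      continue;
    }
    // Read the whole file into memory as raw bytes.
    std::vector<uint8_t> Bytes((std::istreambuf_iterator<char>(In)),
                               std::istreambuf_iterator<char>());
    std::cerr << "Running: " << Path << " (" << Bytes.size() << " bytes)\n";
    TestOne(Bytes.data(), Bytes.size());
  }
  return Unreadable;
}
```

A main function built around a driver like this can replay a saved crash input or a regression corpus through the target without linking against libFuzzer.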

There is also some handling of the CMake config for fuzzers, where you should use add_llvm_fuzzer to set up fuzzer targets. This function works similarly to functions such as add_llvm_tool, but it takes care of linking to LibFuzzer when appropriate and can be passed the DUMMY_MAIN argument to enable standalone testing.