The PDB File Format

Introduction

PDB (Program Database) is a file format invented by Microsoft that contains debug information for consumption by debuggers and other tools. Since officially supported APIs exist on Windows for querying debug information from PDBs even without the user understanding the internals of the file format, a large ecosystem of tools has been built for Windows to consume this format. In order for Clang to be able to generate programs that can interoperate with these tools, it is necessary for us to generate PDB files ourselves.

At the same time, LLVM has a long history of being able to cross-compile from any platform to any platform, and we wish for the same to be true here. It is therefore necessary for us to understand the PDB file format at the byte level so that we can generate PDB files entirely on our own.

This manual describes what we know about the PDB file format today: the layout of the file, the various streams contained within it, the format of individual records within those streams, and more.

We would like to extend our heartfelt gratitude to Microsoft, without whom we would not be where we are today. Much of the knowledge contained within this manual was learned through reading code published by Microsoft on their GitHub repo.

File Layout

Important

Unless otherwise specified, all numeric values are encoded in little endian. If you see a type such as uint16_t or uint64_t going forward, always assume it is little endian!
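As a small illustration of the little-endian rule, the following Python sketch decodes a uint16_t and a uint64_t from raw bytes using the standard struct module. The byte values are arbitrary examples chosen for this sketch and are not taken from any real PDB file.

```python
import struct

# Eight example bytes as they might appear on disk. The values are arbitrary
# and chosen only for illustration.
raw = bytes([0x34, 0x12, 0xEF, 0xCD, 0xAB, 0x89, 0x67, 0x45])

# "<" selects little-endian byte order with standard sizes.
# "H" is an unsigned 16-bit integer (uint16_t).
# "Q" is an unsigned 64-bit integer (uint64_t).
(u16,) = struct.unpack_from("<H", raw, 0)
(u64,) = struct.unpack_from("<Q", raw, 0)

print(hex(u16))  # 0x1234
print(hex(u64))  # 0x456789abcdef1234
```

The least significant byte comes first in the file, so the bytes 34 12 decode to 0x1234 rather than 0x3412.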

The MSF Container

A PDB file is an MSF (Multi-Stream Format) file. An MSF file is a “file system within a file”. It contains multiple streams (aka files) which can represent arbitrary data, and these streams are divided into blocks which are not necessarily laid out contiguously within the MSF container file. Additionally, the MSF contains a stream directory (aka MFT) which describes how the streams (files) are laid out within the MSF.

For more information about the MSF container format, stream directory, and block layout, see The MSF File Format.
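To make the idea of non-contiguous blocks concrete, here is a minimal Python sketch that reassembles one stream from a container. The function name read_stream and its parameters are hypothetical and exist only for this example. It assumes the block size, the ordered list of block indices, and the stream length are already known; in a real file these come from the MSF header and stream directory, whose layout is not covered here.

```python
from typing import Sequence


def read_stream(container: bytes, block_size: int,
                block_indices: Sequence[int], stream_length: int) -> bytes:
    """Concatenate the listed blocks in order, then trim to the stream length.

    All parameters are assumed to be known already; this sketch does not
    parse the MSF header or the stream directory.
    """
    parts = []
    for index in block_indices:
        start = index * block_size
        block = container[start:start + block_size]
        if len(block) != block_size:
            raise ValueError(f"block {index} lies outside the container")
        parts.append(block)
    data = b"".join(parts)
    if stream_length > len(data):
        raise ValueError("stream length exceeds the listed blocks")
    # The last block is usually only partially used, so drop the padding.
    return data[:stream_length]


# Toy container made of four 4-byte blocks. The 5-byte stream lives in
# block 3 followed by block 1, so it is neither contiguous nor in file order.
container = b"AAAA" + b"oXXX" + b"CCCC" + b"Hell"
stream = read_stream(container, block_size=4, block_indices=[3, 1], stream_length=5)
print(stream)  # b'Hello'
```

The order of the block indices matters: the same blocks listed as [1, 3] would yield different stream contents.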

Streams

The PDB format contains a number of streams which describe various information such as the types, symbols, source files, and compilands (e.g. object files) of a program. Some additional streams contain hash tables that are used by debuggers and other tools to provide fast lookup of records and types by name. Others hold various information about how the program was compiled, such as the specific toolchain used. A summary of streams contained in a PDB file is as follows:

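+--------------------+------------------------------+-------------------------------------------+
| Name               | Stream Index                 | Contents                                  |
+====================+==============================+===========================================+
| Old Directory      | - Fixed Stream Index 0       | - Previous MSF Stream Directory           |
+--------------------+------------------------------+-------------------------------------------+
| PDB Stream         | - Fixed Stream Index 1       | - Basic File Information                  |
|                    |                              | - Fields to match EXE to this PDB         |
|                    |                              | - Map of named streams to stream indices  |
+--------------------+------------------------------+-------------------------------------------+
| TPI Stream         | - Fixed Stream Index 2       | - CodeView Type Records                   |
|                    |                              | - Index of TPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| DBI Stream         | - Fixed Stream Index 3       | - Module/Compiland Information            |
|                    |                              | - Indices of individual module streams    |
|                    |                              | - Indices of public / global streams      |
|                    |                              | - Section Contribution Information        |
|                    |                              | - Source File Information                 |
|                    |                              | - References to streams containing        |
|                    |                              |   FPO / PGO Data                          |
+--------------------+------------------------------+-------------------------------------------+
| IPI Stream         | - Fixed Stream Index 4       | - CodeView Type Records                   |
|                    |                              | - Index of IPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| /LinkInfo          | - Contained in PDB Stream    | - Unknown                                 |
|                    |   Named Stream map           |                                           |
+--------------------+------------------------------+-------------------------------------------+
| /src/headerblock   | - Contained in PDB Stream    | - Summary of embedded source file content |
|                    |   Named Stream map           |   (e.g. natvis files)                     |
+--------------------+------------------------------+-------------------------------------------+
| /names             | - Contained in PDB Stream    | - PDB-wide global string table used for   |
|                    |   Named Stream map           |   string de-duplication                   |
+--------------------+------------------------------+-------------------------------------------+
| Module Info Stream | - Contained in DBI Stream    | - CodeView Symbol Records for this module |
|                    | - One for each compiland     | - Line Number Information                 |
+--------------------+------------------------------+-------------------------------------------+
| Public Stream      | - Contained in DBI Stream    | - Public (Exported) Symbol Records        |
|                    |                              | - Index of Public Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| Global Stream      | - Contained in DBI Stream    | - Single combined master symbol-table     |
|                    |                              | - Index of Global Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| TPI Hash Stream    | - Contained in TPI Stream    | - Hash table for looking up TPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+
| IPI Hash Stream    | - Contained in IPI Stream    | - Hash table for looking up IPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+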
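The five fixed stream indices from the summary above can be captured in a small enumeration. This is only a sketch for illustration: the names FixedStream and describe are hypothetical, and only the numeric values come from the summary. Every other stream has no fixed index and must be located through the PDB Stream's named stream map or through the DBI, TPI, or IPI streams.

```python
from enum import IntEnum


class FixedStream(IntEnum):
    """Streams that always live at the same index (values from the summary)."""
    OLD_DIRECTORY = 0
    PDB = 1
    TPI = 2
    DBI = 3
    IPI = 4


def describe(index: int) -> str:
    """Return the fixed stream's name, or a note that the index is not fixed."""
    try:
        return FixedStream(index).name
    except ValueError:
        # Any other index must be discovered by reading another stream.
        return "not a fixed stream"


for i in range(6):
    print(i, describe(i))
```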

More information about the structure of each of these can be found on the following pages:

CodeView

CodeView is another format which comes into the picture. While MSF defines the structure of the overall file, and PDB defines the set of streams that appear within the MSF file and the format of those streams, CodeView defines the format of symbol and type records that appear within specific streams. Refer to the pages on CodeView Symbol Records and CodeView Type Records for more information about the CodeView format.