Dependence Graphs in LLVM

Dependence Graphs in LLVM

Introduction

Dependence graphs are useful tools in compilers for analyzing relationshipsbetween various program elements to help guide optimizations. The ideasbehind these graphs are described in papers [1] and [2].

The implementation of these ideas in LLVM may be slightly different thanwhat is mentioned in the papers. These differences are documented inthe implementation details.

Data Dependence Graph

In its simplest form the Data Dependence Graph (or DDG) represents datadependencies between individual instructions. Each node in such a graphrepresents a single instruction and is referred to as an “atomic” node.It is also possible to combine some atomic nodes that have a simpledef-use dependency between them into larger nodes that contain multiple-instructions.

As described in [1] the DDG uses graph abstraction to group nodesthat are part of a strongly connected component of the graphinto special nodes called pi-blocks. pi-blocks represent cycles of datadependency that prevent reordering transformations. Since any stronglyconnected component of the graph is a maximal subgraph of all the nodesthat form a cycle, pi-blocks are at most one level deep. In other words,no pi-blocks are nested inside another pi-block, resulting in ahierarchical representation that is at most one level deep.

For example, consider the following:

for (int i = 1; i < n; i++) {
  b[i] = c[i] + b[i-1];
}

This code contains a statement that has a loop carried dependence onitself creating a cycle in the DDG. The figure bellow illustrateshow the cycle of dependency is carried through multiple def-use relationsand a memory access dependency.The DDG corresponding to this example would have a pi-block that containsall the nodes participating in the cycle, as shown bellow:

Program Dependence Graph

The Program Dependence Graph (or PDG) has a similar structure as theDDG, but it is capable of representing both data dependencies andcontrol-flow dependencies between program elements such asinstructions, groups of instructions, basic blocks or groups ofbasic blocks.

High-Level Design

The DDG and the PDG are both directed graphs and they extend theDirectedGraph class. Each implementation extends its correspondingnode and edge types resulting in the inheritance relationship depictedin the UML diagram bellow:

Graph Construction

The graph build algorithm considers dependencies between elements ofa given set of instructions or basic blocks. Any dependencies cominginto or going out of instructions that do not belong to that rangeare ignored. The steps in the build algorithm for the DDG are verysimilar to the steps in the build algorithm for the PDG. As such,one of the design goals is to reuse the build algorithm code toallow creation of both DDG and PDG representations while allowingthe two implementations to define their own distinct and independentnode and edge types. This is achieved by using the well-known builderdesign pattern to isolate the construction of the dependence graphfrom its concrete representation.

The following UML diagram depicts the overall structure of the designpattern as it applies to the dependence graph implementation.Notice that the common code for building the two types of graphs areprovided in the DependenceGraphBuilder class, while the DDGBuilderand PDGBuilder control some aspects of how the graph is constructedby the way of overriding virtual methods defined in DependenceGraphBuilder.

Note also that the steps and the names used in this diagram are forillustrative purposes and may be different from those in the actualimplementation.

Design Trade-offs

Advantages:

Builder allows graph construction code to be reused for DDG and PDG.
Builder allows us to create DDG and PDG as separate graphs.
DDG nodes and edges are completely disjoint from PDG nodes and edges allowing them to change easily and independently.

Disadvantages:

Builder may be perceived as over-engineering at first.
There are some similarities between DDG nodes and edges compared to PDG nodes and edges, but there is little reuse of the class definitions.
This is tolerable given that the node and edge types are fairly simple and there is little code reuse opportunity anyway.

Implementation Details

The current implementation of DDG differs slightly from the dependencegraph described in [1] in the following ways:

The graph nodes in the paper represent three main program components, namely assignment statements, for loop headers and while loop headers. In this implementation, DDG nodes naturally represent LLVM IR instructions. An assignment statement in this implementation typically involves a node representing the store instruction along with a number of individual nodes computing the right-hand-side of the assignment that connect to the store node via a def-use edge. The loop header instructions are not represented as special nodes in this implementation because they have limited uses and can be easily identified, for example, through LoopAnalysis.
The paper describes five types of dependency edges between nodes namely loop dependency, flow-, anti-, output-, and input- dependencies. In this implementation memory edges represent the flow-, anti-, output-, and input- dependencies. However, loop dependencies are not made explicit, because they mainly represent association between a loop structure and the program elements inside the loop and this association is fairly obvious in LLVM IR itself.
The paper describes two types of pi-blocks; recurrences whose bodies are SCCs and IN nodes whose bodies are not part of any SCC. In this implementation, pi-blocks are only created for recurrences. IN nodes remain as simple DDG nodes in the graph.

References

[1]	(1, 2, 3) “D. J. Kuck, R. H. Kuhn, D. A. Padua, B. Leasure, and M. Wolfe (1981). DEPENDENCE GRAPHS AND COMPILER OPTIMIZATIONS.”

[2]	“J. FERRANTE (IBM), K. J. OTTENSTEIN (Michigan Technological University) and JOE D. WARREN (Rice University), 1987. The Program Dependence Graph and Its Use in Optimization.”