Stack maps and patch points in LLVM

Stack maps and patch points in LLVM

Definitions

In this document we refer to the “runtime” collectively as allcomponents that serve as the LLVM client, including the LLVM IRgenerator, object code consumer, and code patcher.

A stack map records the location of live values at a particularinstruction address. These live values do not refer to all theLLVM values live across the stack map. Instead, they are only thevalues that the runtime requires to be live at this point. Forexample, they may be the values the runtime will need to resumeprogram execution at that point independent of the compiled functioncontaining the stack map.

LLVM emits stack map data into the object code within a designatedStack Map Section. This stack map data contains a record foreach stack map. The record stores the stack map’s instruction addressand contains a entry for each mapped value. Each entry encodes avalue’s location as a register, stack offset, or constant.

A patch point is an instruction address at which space is reserved forpatching a new instruction sequence at run time. Patch points lookmuch like calls to LLVM. They take arguments that follow a callingconvention and may return a value. They also imply stack mapgeneration, which allows the runtime to locate the patchpoint andfind the location of live values at that point.

Motivation

This functionality is currently experimental but is potentially usefulin a variety of settings, the most obvious being a runtime (JIT)compiler. Example applications of the patchpoint intrinsics areimplementing an inline call cache for polymorphic method dispatch oroptimizing the retrieval of properties in dynamically typed languagessuch as JavaScript.

The intrinsics documented here are currently used by the JavaScriptcompiler within the open source WebKit project, see the FTL JIT, but they are designed to beused whenever stack maps or code patching are needed. Because theintrinsics have experimental status, compatibility across LLVMreleases is not guaranteed.

The stack map functionality described in this document is separatefrom the functionality described inComputing stack maps. GCFunctionMetadata provides the location ofpointers into a collected heap captured by the GCRoot intrinsic,which can also be considered a “stack map”. Unlike the stack mapsdefined above, the GCFunctionMetadata stack map interface does notprovide a way to associate live register values of arbitrary type withan instruction address, nor does it specify a format for the resultingstack map. The stack maps described here could potentially providericher information to a garbage collecting runtime, but that usagewill not be discussed in this document.

Intrinsics

The following two kinds of intrinsics can be used to implement stackmaps and patch points: llvm.experimental.stackmap andllvm.experimental.patchpoint. Both kinds of intrinsics generate astack map record, and they both allow some form of code patching. Theycan be used independently (i.e. llvm.experimental.patchpointimplicitly generates a stack map without the need for an additionalcall to llvm.experimental.stackmap). The choice of which to usedepends on whether it is necessary to reserve space for code patchingand whether any of the intrinsic arguments should be lowered accordingto calling conventions. llvm.experimental.stackmap does notreserve any space, nor does it expect any call arguments. If theruntime patches code at the stack map’s address, it will destructivelyoverwrite the program text. This is unlikellvm.experimental.patchpoint, which reserves space for in-placepatching without overwriting surrounding code. Thellvm.experimental.patchpoint intrinsic also lowers a specifiednumber of arguments according to its calling convention. This allowspatched code to make in-place function calls without marshaling.

Each instance of one of these intrinsics generates a stack map recordin the Stack Map Section. The record includes an ID, allowingthe runtime to uniquely identify the stack map, and the offset withinthe code from the beginning of the enclosing function.

‘llvm.experimental.stackmap’ Intrinsic

Syntax:

declare void
  @llvm.experimental.stackmap(i64 <id>, i32 <numShadowBytes>, ...)

Overview:

The ‘llvm.experimental.stackmap’ intrinsic records the location ofspecified values in the stack map without generating any code.

Operands:

The first operand is an ID to be encoded within the stack map. Thesecond operand is the number of shadow bytes following theintrinsic. The variable number of operands that follow are the livevalues for which locations will be recorded in the stack map.

To use this intrinsic as a bare-bones stack map, with no code patchingsupport, the number of shadow bytes can be set to zero.

Semantics:

The stack map intrinsic generates no code in place, unless nops areneeded to cover its shadow (see below). However, its offset fromfunction entry is stored in the stack map. This is the relativeinstruction address immediately following the instructions thatprecede the stack map.

The stack map ID allows a runtime to locate the desired stack maprecord. LLVM passes this ID through directly to the stack maprecord without checking uniqueness.

LLVM guarantees a shadow of instructions following the stack map’sinstruction offset during which neither the end of the basic block noranother call to llvm.experimental.stackmap orllvm.experimental.patchpoint may occur. This allows the runtime topatch the code at this point in response to an event triggered fromoutside the code. The code for instructions following the stack mapmay be emitted in the stack map’s shadow, and these instructions maybe overwritten by destructive patching. Without shadow bytes, thisdestructive patching could overwrite program text or data outside thecurrent function. We disallow overlapping stack map shadows so thatthe runtime does not need to consider this corner case.

For example, a stack map with 8 byte shadow:

call void @runtime()
call void (i64, i32, ...)* @llvm.experimental.stackmap(i64 77, i32 8,
                                                       i64* %ptr)
%val = load i64* %ptr
%add = add i64 %val, 3
ret i64 %add

May require one byte of nop-padding:

0x00 callq _runtime
0x05 nop                <--- stack map address
0x06 movq (%rdi), %rax
0x07 addq $3, %rax
0x0a popq %rdx
0x0b ret                <---- end of 8-byte shadow

Now, if the runtime needs to invalidate the compiled code, it maypatch 8 bytes of code at the stack map’s address at follows:

0x00 callq _runtime
0x05 movl  $0xffff, %rax <--- patched code at stack map address
0x0a callq *%rax         <---- end of 8-byte shadow

This way, after the normal call to the runtime returns, the code willexecute a patched call to a special entry point that can rebuild astack frame from the values located by the stack map.

‘llvm.experimental.patchpoint.*’ Intrinsic

Syntax:

declare void
  @llvm.experimental.patchpoint.void(i64 <id>, i32 <numBytes>,
                                     i8* <target>, i32 <numArgs>, ...)
declare i64
  @llvm.experimental.patchpoint.i64(i64 <id>, i32 <numBytes>,
                                    i8* <target>, i32 <numArgs>, ...)

Overview:

The ‘llvm.experimental.patchpoint.*’ intrinsics creates a functioncall to the specified <target> and records the location of specifiedvalues in the stack map.

Operands:

The first operand is an ID, the second operand is the number of bytesreserved for the patchable region, the third operand is the targetaddress of a function (optionally null), and the fourth operandspecifies how many of the following variable operands are consideredfunction call arguments. The remaining variable number of operands arethe live values for which locations will be recorded in the stackmap.

Semantics:

The patch point intrinsic generates a stack map. It also emits afunction call to the address specified by <target> if the addressis not a constant null. The function call and its arguments arelowered according to the calling convention specified at theintrinsic’s callsite. Variants of the intrinsic with non-void returntype also return a value according to calling convention.

On PowerPC, note that <target> must be the ABI function pointer for theintended target of the indirect call. Specifically, when compiling for theELF V1 ABI, <target> is the function-descriptor address normally used asthe C/C++ function-pointer representation.

Requesting zero patch point arguments is valid. In this case, allvariable operands are handled just likellvm.experimental.stackmap.*. The difference is that space willstill be reserved for patching, a call will be emitted, and a returnvalue is allowed.

The location of the arguments are not normally recorded in the stackmap because they are already fixed by the calling convention. Theremaining live values will have their location recorded, whichcould be a register, stack location, or constant. A special callingconvention has been introduced for use with stack maps, anyregcc,which forces the arguments to be loaded into registers but allowsthose register to be dynamically allocated. These argument registerswill have their register locations recorded in the stack map inaddition to the remaining live values.

The patch point also emits nops to cover at least <numBytes> ofinstruction encoding space. Hence, the client must ensure that<numBytes> is enough to encode a call to the target address on thesupported targets. If the call target is constant null, then there isno minimum requirement. A zero-byte null target patchpoint isvalid.

The runtime may patch the code emitted for the patch point, includingthe call sequence and nops. However, the runtime may not assumeanything about the code LLVM emits within the reserved space. Partialpatching is not allowed. The runtime must patch all reserved bytes,padding with nops if necessary.

This example shows a patch point reserving 15 bytes, with one argumentin $rdi, and a return value in $rax per native calling convention:

%target = inttoptr i64 -281474976710654 to i8*
%val = call i64 (i64, i32, ...)*
         @llvm.experimental.patchpoint.i64(i64 78, i32 15,
                                           i8* %target, i32 1, i64* %ptr)
%add = add i64 %val, 3
ret i64 %add

May generate:

0x00 movabsq $0xffff000000000002, %r11 <--- patch point address
0x0a callq   *%r11
0x0d nop
0x0e nop                               <--- end of reserved 15-bytes
0x0f addq    $0x3, %rax
0x10 movl    %rax, 8(%rsp)

Note that no stack map locations will be recorded. If the patched codesequence does not need arguments fixed to specific calling conventionregisters, then the anyregcc convention may be used:

%val = call anyregcc @llvm.experimental.patchpoint(i64 78, i32 15,
                                                   i8* %target, i32 1,
                                                   i64* %ptr)

The stack map now indicates the location of the %ptr argument andreturn value:

Stack Map: ID=78, Loc0=%r9 Loc1=%r8

The patch code sequence may now use the argument that happened to beallocated in %r8 and return a value allocated in %r9:

0x00 movslq 4(%r8) %r9              <--- patched code at patch point address
0x03 nop
...
0x0e nop                            <--- end of reserved 15-bytes
0x0f addq    $0x3, %r9
0x10 movl    %r9, 8(%rsp)

Stack Map Format

The existence of a stack map or patch point intrinsic within an LLVMModule forces code emission to create a Stack Map Section. Theformat of this section follows:

Header {
  uint8  : Stack Map Version (current version is 3)
  uint8  : Reserved (expected to be 0)
  uint16 : Reserved (expected to be 0)
}
uint32 : NumFunctions
uint32 : NumConstants
uint32 : NumRecords
StkSizeRecord[NumFunctions] {
  uint64 : Function Address
  uint64 : Stack Size
  uint64 : Record Count
}
Constants[NumConstants] {
  uint64 : LargeConstant
}
StkMapRecord[NumRecords] {
  uint64 : PatchPoint ID
  uint32 : Instruction Offset
  uint16 : Reserved (record flags)
  uint16 : NumLocations
  Location[NumLocations] {
    uint8  : Register | Direct | Indirect | Constant | ConstantIndex
    uint8  : Reserved (expected to be 0)
    uint16 : Location Size
    uint16 : Dwarf RegNum
    uint16 : Reserved (expected to be 0)
    int32  : Offset or SmallConstant
  }
  uint32 : Padding (only if required to align to 8 byte)
  uint16 : Padding
  uint16 : NumLiveOuts
  LiveOuts[NumLiveOuts]
    uint16 : Dwarf RegNum
    uint8  : Reserved
    uint8  : Size in Bytes
  }
  uint32 : Padding (only if required to align to 8 byte)
}

The first byte of each location encodes a type that indicates how tointerpret the RegNum and Offset fields as follows:

Encoding	Type	Value	Description
0x1	Register	Reg	Value in a register
0x2	Direct	Reg + Offset	Frame index value
0x3	Indirect	[Reg + Offset]	Spilled value
0x4	Constant	Offset	Small constant
0x5	ConstIndex	Constants[Offset]	Large constant

In the common case, a value is available in a register, and theOffset field will be zero. Values spilled to the stack are encodedas Indirect locations. The runtime must load those values from astack address, typically in the form [BP + Offset]. If analloca value is passed directly to a stack map intrinsic, thenLLVM may fold the frame index into the stack map as an optimization toavoid allocating a register or stack slot. These frame indices will beencoded as Direct locations in the form BP + Offset. LLVM mayalso optimize constants by emitting them directly in the stack map,either in the Offset of a Constant location or in the constantpool, referred to by ConstantIndex locations.

At each callsite, a “liveout” register list is also recorded. Theseare the registers that are live across the stackmap and therefore mustbe saved by the runtime. This is an important optimization when thepatchpoint intrinsic is used with a calling convention that by defaultpreserves most registers as callee-save.

Each entry in the liveout register list contains a DWARF registernumber and size in bytes. The stackmap format deliberately omitsspecific subregister information. Instead the runtime must interpretthis information conservatively. For example, if the stackmap reportsone byte at %rax, then the value may be in either %al or%ah. It doesn’t matter in practice, because the runtime willsimply save %rax. However, if the stackmap reports 16 bytes at%ymm0, then the runtime can safely optimize by saving only%xmm0.

The stack map format is a contract between an LLVM SVN revision andthe runtime. It is currently experimental and may change in the shortterm, but minimizing the need to update the runtime isimportant. Consequently, the stack map design is motivated bysimplicity and extensibility. Compactness of the representation issecondary because the runtime is expected to parse the dataimmediately after compiling a module and encode the information in itsown format. Since the runtime controls the allocation of sections, itcan reuse the same stack map space for multiple modules.

Stackmap support is currently only implemented for 64-bitplatforms. However, a 32-bit implementation should be able to use thesame format with an insignificant amount of wasted space.

Stack Map Section

A JIT compiler can easily access this section by providing its ownmemory manager via the LLVM C APILLVMCreateSimpleMCJITMemoryManager(). When creating the memorymanager, the JIT provides a callback:LLVMMemoryManagerAllocateDataSectionCallback(). When LLVM createsthis section, it invokes the callback and passes the section name. TheJIT can record the in-memory address of the section at this time andlater parse it to recover the stack map data.

For MachO (e.g. on Darwin), the stack map section name is“llvm_stackmaps”. The segment name is “LLVM_STACKMAPS”.

For ELF (e.g. on Linux), the stack map section name is“.llvm_stackmaps”. The segment name is “__LLVM_STACKMAPS”.

Stack Map Usage

The stack map support described in this document can be used toprecisely determine the location of values at a specific position inthe code. LLVM does not maintain any mapping between those values andany higher-level entity. The runtime must be able to interpret thestack map record given only the ID, offset, and the order of thelocations, records, and functions, which LLVM preserves.

Note that this is quite different from the goal of debug information,which is a best-effort attempt to track the location of namedvariables at every instruction.

An important motivation for this design is to allow a runtime tocommandeer a stack frame when execution reaches an instruction addressassociated with a stack map. The runtime must be able to rebuild astack frame and resume program execution using the informationprovided by the stack map. For example, execution may resume in aninterpreter or a recompiled version of the same function.

This usage restricts LLVM optimization. Clearly, LLVM must not movestores across a stack map. However, loads must also be handledconservatively. If the load may trigger an exception, hoisting itabove a stack map could be invalid. For example, the runtime maydetermine that a load is safe to execute without a type check giventhe current state of the type system. If the type system changes whilesome activation of the load’s function exists on the stack, the loadbecomes unsafe. The runtime can prevent subsequent execution of thatload by immediately patching any stack map location that lies betweenthe current call site and the load (typically, the runtime wouldsimply patch all stack map locations to invalidate the function). Ifthe compiler had hoisted the load above the stack map, then theprogram could crash before the runtime could take back control.

To enforce these semantics, stackmap and patchpoint intrinsics areconsidered to potentially read and write all memory. This may limitoptimization more than some clients desire. This limitation may beavoided by marking the call site as “readonly”. In the future we mayalso allow meta-data to be added to the intrinsic call to expressaliasing, thereby allowing optimizations to hoist certain loads abovestack maps.

Direct Stack Map Entries

As shown in Stack Map Section, a Direct stack map locationrecords the address of frame index. This address is itself the valuethat the runtime requested. This differs from Indirect locations,which refer to a stack locations from which the requested values mustbe loaded. Direct locations can communicate the address if an alloca,while Indirect locations handle register spills.

For example:

entry:
  %a = alloca i64...
  llvm.experimental.stackmap(i64 <ID>, i32 <shadowBytes>, i64* %a)

The runtime can determine this alloca’s relative location on thestack immediately after compilation, or at any time thereafter. Thisdiffers from Register and Indirect locations, because the runtime canonly read the values in those locations when execution reaches theinstruction address of the stack map.

This functionality requires LLVM to treat entry-block allocasspecially when they are directly consumed by an intrinsics. (This isthe same requirement imposed by the llvm.gcroot intrinsic.) LLVMtransformations must not substitute the alloca with any interveningvalue. This can be verified by the runtime simply by checking that thestack map’s location is a Direct location type.

Supported Architectures

Support for StackMap generation and the related intrinsics requiressome code for each backend. Today, only a subset of LLVM’s backendsare supported. The currently supported architectures are X86_64,PowerPC, and Aarch64.