Design and Usage of the InAlloca Attribute

Introduction

The inalloca attribute is designed to allow taking the address of an aggregate argument that is being passed by value through memory. Primarily, this feature is required for compatibility with the Microsoft C++ ABI. Under that ABI, class instances that are passed by value are constructed directly into argument stack memory. Prior to the addition of inalloca, calls in LLVM were indivisible instructions. There was no way to perform intermediate work, such as object construction, between the first stack adjustment and the final control transfer. With inalloca, all arguments passed in memory are modelled as a single alloca, which can be stored to prior to the call. Unfortunately, this complicated feature comes with a large set of restrictions designed to bound the lifetime of the argument memory around the call.

For now, it is recommended that frontends and optimizers avoid producing this construct, primarily because it forces the use of a base pointer. This feature may grow in the future to allow general mid-level optimization, but for now, it should be regarded as less efficient than passing by value with a copy.

Intended Usage

The example below is the intended LLVM IR lowering for some C++ code that passes two default-constructed Foo objects to g in the 32-bit Microsoft C++ ABI.

// Foo is non-trivial.
struct Foo { int a, b; Foo(); ~Foo(); Foo(const Foo &); };
void g(Foo a, Foo b);
void f() {
  g(Foo(), Foo());
}

%struct.Foo = type { i32, i32 }
declare void @Foo_ctor(%struct.Foo* %this)
declare void @Foo_dtor(%struct.Foo* %this)
declare void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)
declare i8* @llvm.stacksave()
declare void @llvm.stackrestore(i8*)
 
define void @f() {
entry:
  %base = call i8* @llvm.stacksave()
  %memargs = alloca inalloca <{ %struct.Foo, %struct.Foo }>
  %b = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 1
  call void @Foo_ctor(%struct.Foo* %b)
 
  ; If a's ctor throws, we must destruct b.
  %a = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 0
  invoke void @Foo_ctor(%struct.Foo* %a)
      to label %invoke.cont unwind label %invoke.unwind
 
invoke.cont:
  call void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)
  call void @llvm.stackrestore(i8* %base)
  ...
 
invoke.unwind:
  call void @Foo_dtor(%struct.Foo* %b)
  call void @llvm.stackrestore(i8* %base)
  ...
}

To avoid stack leaks, the frontend saves the current stack pointer with a call to llvm.stacksave. Then, it allocates the argument stack space with alloca and calls the default constructor. The default constructor could throw an exception, so the frontend has to create a landing pad. The frontend has to destroy the already constructed argument b before restoring the stack pointer. If the constructor does not unwind, g is called. In the Microsoft C++ ABI, g will destroy its arguments, and then the stack is restored in f.

Design Considerations

Lifetime

The biggest design consideration for this feature is object lifetime. We cannot model the arguments as static allocas in the entry block, because all calls need to use the memory at the top of the stack to pass arguments. We cannot vend pointers to that memory at function entry because after code generation they will alias.

The rule against allocas between argument allocations and the call site avoids this problem, but it creates a cleanup problem. Cleanup and lifetime are handled explicitly with stack save and restore calls. In the future, we may want to introduce a new construct such as freea or afree to make it clear that this stack adjusting cleanup is less powerful than a full stack save and restore.
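The aliasing problem can be illustrated with a toy model of the stack. This is only a sketch: SimStack and call_with_memargs are hypothetical names invented for this illustration, not LLVM APIs, and the stack is reduced to a single byte offset that grows downward.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of a downward-growing machine stack; not an LLVM API.
struct SimStack {
    std::size_t sp = 1024;  // current stack pointer, as a byte offset

    // Models llvm.stacksave: remember the current stack level.
    std::size_t save() const { return sp; }

    // Models a dynamic alloca: carve n bytes off the top of the stack and
    // return the address of the new block.
    std::size_t alloca_bytes(std::size_t n) {
        assert(n <= sp && "simulated stack overflow");
        sp -= n;
        return sp;
    }

    // Models llvm.stackrestore: release everything allocated since save().
    void restore(std::size_t saved) { sp = saved; }
};

// Models one call site: save, allocate the argument block at the top of the
// stack, perform the "call", then restore. Returns the argument address.
std::size_t call_with_memargs(SimStack &s, std::size_t arg_bytes) {
    std::size_t base = s.save();
    std::size_t memargs = s.alloca_bytes(arg_bytes);
    // ... arguments would be constructed into memargs and control
    // transferred to the callee here ...
    s.restore(base);
    return memargs;
}
```

In this model, two consecutive call sites in the same function receive the same argument address, which is why pointers to argument memory cannot be handed out once at function entry.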

Nested Calls and Copy Elision

We also want to be able to support copy elision into these argument slots. This means we have to support multiple live argument allocations.

Consider the evaluation of:

// Foo is non-trivial.
struct Foo { int a; Foo(); Foo(const Foo &); ~Foo(); };
Foo bar(Foo b);
int main() {
  bar(bar(Foo()));
}

In this case, we want to be able to elide copies into bar’s argument slots. That means we need to have more than one set of argument frames active at the same time. First, we need to allocate the frame for the outer call so we can pass it in as the hidden struct return pointer to the middle call. Then we do the same for the middle call, allocating a frame and passing its address to Foo’s default constructor. By wrapping the evaluation of the inner bar with stack save and restore, we can have multiple overlapping active call frames.
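The effect of eliding copies into argument slots can be observed at the source level. The sketch below instruments Foo with counters. The counters, the body of bar, and the nested_call wrapper are assumptions added for illustration. It shows what a C++ compiler does when elision is enabled and says nothing about how a particular backend lays out the frames.

```cpp
#include <cassert>

// Hypothetical instrumentation counters; not part of the original example.
static int default_ctors = 0;
static int copy_ctors = 0;

// Foo is non-trivial, as in the text; the constructor bodies are assumed.
struct Foo {
    int a;
    Foo() : a(0) { ++default_ctors; }
    Foo(const Foo &o) : a(o.a) { ++copy_ctors; }
    ~Foo() {}
};

// Returning a by-value parameter cannot be elided, so each call to bar
// performs exactly one copy to produce its return value.
Foo bar(Foo b) { return b; }

// Evaluates the nested call from the text and discards the result.
void nested_call() { bar(bar(Foo())); }
```

With elision in effect (guaranteed since C++17, and the default behaviour of common compilers in earlier modes), one call to nested_call runs the default constructor once and the copy constructor twice. Both copies come from the return statements; none comes from passing an argument, because the temporary and the inner return value are constructed directly in the callee's parameter.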

Callee-cleanup Calling Conventions

Another wrinkle is the existence of callee-cleanup conventions. On Windows, all methods and many other functions adjust the stack to clear the memory used to pass their arguments. In some sense, this means that the allocas are automatically cleared by the call. However, LLVM models this as a write of undef to all of the inalloca values passed to the call rather than as a stack adjustment. Frontends should still restore the stack pointer to avoid a stack leak.

Exceptions

There is also the possibility of an exception. If argument evaluation or copy construction throws an exception, the landing pad must do cleanup, which includes adjusting the stack pointer to avoid a stack leak. This means the cleanup of the stack memory cannot be tied to the call itself. There needs to be a separate IR-level instruction that can perform independent cleanup of arguments.
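The source-level behaviour that the landing pad has to reproduce can be sketched in C++. The counters, the throw-on-second-construction policy, and the try/catch wrapper in f are assumptions made for this demonstration. It does not depend on which argument the compiler evaluates first.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical instrumentation; not part of the original example.
static int constructed = 0;
static int destroyed = 0;
static bool g_called = false;

struct Foo {
    int a, b;
    // Assumed policy for the demo: the second default construction fails.
    Foo() : a(0), b(0) {
        if (constructed == 1) throw std::runtime_error("second Foo() failed");
        ++constructed;
    }
    Foo(const Foo &o) : a(o.a), b(o.b) { ++constructed; }
    ~Foo() { ++destroyed; }
};

void g(Foo, Foo) { g_called = true; }

// Returns true if argument evaluation threw before g could be called.
bool f() {
    try {
        g(Foo(), Foo());
    } catch (const std::runtime_error &) {
        return true;
    }
    return false;
}
```

When f runs, whichever argument is evaluated first is fully constructed and then destroyed during unwinding, the second never completes construction, and g is never entered. Because the call never happens, this cleanup has to be performed independently of it.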

Efficiency

Eventually, it should be possible to generate efficient code for this construct. In particular, using inalloca should not require a base pointer. If the backend can prove that all points in the CFG only have one possible stack level, then it can address the stack directly from the stack pointer. While this is not yet implemented, the plan is that the inalloca attribute should not change much, but the frontend IR generation recommendations may change.