Design
Implementation
- Polymorphic signatures
- Monomorphization
Annotations
Variant Groups
The "Collection Hack"

The analysis::ownership module implements a pointer analysis for inferringownership information in code using raw pointers. The goal is to take codethat has been automatically translated from C, and thus uses only raw pointers,and infer which of those raw pointers should be changed to safe &, &mut, orBox pointers. Pointers can appear in a number of places in the inputprogram, but this analysis focuses mainly on function signatures and structfield types.

Design

The goal of the analysis is to assign to each raw pointer type constructor apermission value, one of READ, WRITE, and MOVE, corresponding to the Rustpointer types &, &mut, and Box. These permissions form a triviallattice, where READ < WRITE < MOVE. The READ permission indicates that thepointed-to data may be read, the WRITE permission indicates that the pointed-todata may be modified, and the MOVE permission indicates that the pointed-todata may be "moved", or consumed in a linear-typed fashion. The MOVEpermission also includes the ability to free the pointed-to data, which amounsto "moving to nowhere".

Here is a simple example to illustrate the major features of the analysis:

struct Array {
    data: *mut i32,
}
unsafe fn new_array(len: usize) -> *mut Array {
    let data = malloc(size_of::<i32>() * len);
    let arr = malloc(size_of::<Array>());
    (*arr).data = data;
    array
}
unsafe fn delete_array(arr: *mut Array) {
    free((*arr).data);
    free(arr);
}
unsafe fn element_ptr(arr: *mut Array, idx: usize) -> *mut i32 {
    (*arr).data.offset(idx)
}
unsafe fn get(arr: *mut Array, idx: usize) -> i32 {
    let elt: *mut i32 = element_ptr(arr, idx);
    *elt
}
unsafe fn set(arr: *mut Array, idx: usize, val: i32) {
    let elt: *mut i32 = element_ptr(arr, idx);
    *elt = val;
}

The analysis infers pointer permissions by observing how pointers are used, andapplying the rules of the Rust reference model. For instance, the setfunction's elt pointer must have permission WRITE (or higher), because thereis a write to the pointed-to data. Similarly, delete_array's first call tofree requires that the pointer in the Array::data field must havepermission MOVE. Furthermore, the first free also requires arr to havepermission MOVE, because consuming the pointer (arr).data constitutes a moveout of arr. (In general, the pointer permission sets an upper bound on thepermissions of all pointers within the pointed-to data. For example, if arrhas permission READ, then (arr).data can only be read, not written ormoved.)

The element_ptr function presents an interesting case for analysis, becauseit is used polymorphically: in get, we would like element_ptr to take aREAD mut Array and return a READ mut i32, whereas in set we would likethe same function to take and return WRITE pointers. In strictly const-correctC code, get and set would respectively call separate const and non-constvariants of element_ptr, but a great deal of C code is not const-correct.

This analysis handles functions like elementptr by allowing inferredfunction signatures to be _permission polymorphic. Signatures may includepermission parameters, which can be instantiated separately at each call site,subject to a set of constraints. For example, here is the inferred polymorphicsignature of element_ptr, with permission annotations written in comments(since there is no Rust syntax for them):

fn element_ptr /* <s0, s1> */ (arr: /* s0 */ *mut Array,
                               idx: usize)
                               -> /* s1 */ *mut i32
    /* where s1 <= s0 */;

The function has two permission parameters, s0 and s1, which are thepermissions of the argument and return pointers respectively. The signatureincludes the constraint s1 <= s0, indicating that the output pointer'spermission is no higher than that of the input pointer. The function is calledin get with permission arguments s0 = s1 = READ and in set with s0 = s1 = WRITE.

Rust does not support any analogue of the permission polymorphism used in thisanalysis. To make the results useful in actual Rust code, the analysisincludes a monomorphization step, which chooses a set of concreteinstantiations for each polymorphic function, and selects an instantiation touse for each call site. In the example above, element_ptr would have bothREAD, READ and WRITE, WRITE instantiations, with the first being used forthe callsite in get and the second at the callsite in set.

Implementation

The analysis first computes a polymorphic signature for each function, thenmonomorphizes to produce functions that can be handled by Rust's type system.

Both parts of the analysis operate on constraint sets, which containconstraints of the form p1 <= p2. The permissions p1, p2 can be concretepermissions (READ, WRITE, MOVE), permission variables, or expressions of theform min(p1, p2) denoting the less-permissive of two permission values.

Permission variables appear on pointer type constructors in the types of staticvariables and struct fields ("static" variables), in the types within functionsignatures ("sig"), in the types of temporaries and local variables ("local"),and at callsites for instantiating a permission polymorphic function ("inst").Variables are marked with their origin, as variable from different locationsare handled in different phases of the analysis.

The overall goal of the analysis is to produce assignments to static and sigvariables that satisfy all the relevant constraints (or multiple assignments,when monomorphizing polymorphic functions).

Polymorphic signatures

The permission variables of each function's polymorphic signature are easilydetermined: for simplicity, the analysis introduces one variable for eachoccurrence of a pointer type constructor in the function signature. Cases thatmight otherwise involve a single variable appearing at multiple locations inthe signature are instead handled by adding constraints between the variables.The main task of the first part of the analysis is to compute the constraintsover the signature variables of each function. This part of the analysis mustalso build an assignment of permission values to all static vars, which are notinvolved in any sort of polymorphism.

Constraints arise mainly at assignments and function call expressions.

At assignments, the main constraint is that, if the assigned value has apointer type, the permission on the LHS pointer type must be no greater thanthe permission on the RHS pointer type (lhs <= rhs). In other words, anassignment of a pointer may downgrade the permission value of that pointer, butmay never upgrade it. In non-pointer types, and in the pointed-to type of anoutermost pointer type, all permission values occurring in the two types mustbe equal (lhs <= rhs and rhs <= lhs).

Assignments also introduce two additional constraints, both relating to pathpermissions. The path permission for an expression is the minimum of thepermission values on all pointers dereferenced in the expression. For example,in (x).f, the path permission is the minimum of the permission on the localvariable x and the permission on the struct field f. The calculation ofpath permissions reflects the transitive nature of access restrictions in Rust:for example, if a struct field x.f has type &mut T, but x is an immutablereference (&S), then only immutable access is allowed to *x.f.

The two additional constraints introduced by assigments are (1) the pathpermission of the LHS must be no lower than WRITE, and (2) the path permissionof the RHS must be no lower than the permission of the LHS pointer type.Constraint (1) prevents writing through a READ pointer, or through any pathcontaining a READ pointer. Constraint (2) prevents assigning a WRITE pointeraccessed through a READ path (or a MOVE pointer accessed through a WRITE orREAD path) to a WRITE pointer variable, which would allow bypassing the READrestriction.

Function calls require additional work. At each call site, the analysiscopies in the callee's constraints, substituting a fresh "instantiation"("inst") variable for each variable in the callee's signature. It then linksthe new inst variables to the surrounding locals by processing a"pseudo-assignment" from each argument expression to the corresponding formalparameter type in the substituted signature, and from the return type to thelvalue expression where the result is to be stored. The effect is to allow theanalysis to "reason through" the function call, relating the (local) returnvalue to the caller's argument expressions. Copying the constraints instead ofrelying on a concrete instantiation permits precise reasoning about polymorphicfunctions that call other polymorphic functions.

The final step for each function is to simplify the constraint set byeliminating "local", "inst", and "static" permission variables. Localvariables have no connection to types outside the current function, and can besimplified away without consequence. Eliminating static and instantiationvariables requires fixed-point iteration, which is described below. The resultof the simplification is a set of constraints over only the function's sigvariables, which is suitable for use as the constraint portion of the functionsignature.

Since each function's signature depends on the signatures of its callees, andfunctions may be recursive, a fixed-point iteration step is required to computethe final constraint set for each function. To simplify the implementation,the polymorphic signature construction part of the analysis is split into twophases. The intraprocedural phase visits every function once and generatesconstraints for that function, but doesn't copy in constraints from callees,which may not have been processed yet. This phase records details of each callsite for later use. The intraprocedural phase eliminates local variables atthe end of each function, but it does not have enough information to safelyeliminate static and inst variables. The interprocedural phase updates eachfunction in turn, substituting in callees' sig constraints and simplifying awaystatic and inst variables to produce a new, more accurate set of sigconstraints for the current function, and iterates until it reaches a fixedpoint. The interprocedural phase also computes an assignment of concretepermission values to static variables, during the process of removing staticvariables from functions' constraint sets.

Monomorphization

The first part of the analysis infers a permission polymorphic signature foreach function, but Rust does not support this form of polymorphism. To makethe analysis results applicable to actual Rust code, the analysis must provideenough information to allow monomorphizing functions - that is, producingmultiple copies of each function with different concrete instantiations of thepermission variables.

Monomorphization begins by collecting all "useful" monomorphic signatures foreach function. The analysis identifies all signature variables that appear inoutput positions (in the return type, or behind a pointer whose permissionvalue is always at least WRITE), then enumerates all assignments to thoseoutput variables that are allowed by the function's constraints. For eachcombination of outputs, it finds the least-restrictive valid assignment ofpermissions to the remaining (input) variables. For example, given thisfunction:

fn element_ptr /* <s0, s1> */ (arr: /* s0 */ *mut Array,
                               idx: usize)
                               -> /* s1 */ *mut i32
    /* where s1 <= s0 */;

The only output variable is s1, which appears in the return type. Themonomorphization step will try each assignment to s1 that is allowed by theconstraints. Since the only constraint is s1 <= s0, READ, WRITE, andMOVE are all valid. For each of these, it finds the least restrictiveassignment to s0 that is compatible with the assignment to s0. Forexample, when s1 = MOVE, only s0 = MOVE is valid, so the analysis recordsMOVE, MOVE as a monomorphization for the element_ptr function. When s1 = WRITE, both s0 = MOVE and s0 = WRITE satisfy the constraints, but s0 = WRITE is less restrictive - it allows calling the function with both MOVEand WRITE pointers, while setting s0 = MOVE allows only MOVE pointers.So the analysis records arguments WRITE, WRITE as another monomorphization,and by similar logic records READ, READ as the final one.

The next step of monomorphization is to select a monomorphic variant to call ateach callsite of each monomorphized function. Given a pair of functions:

fn f /* <s0, s1> */ (arr: /* s0 */ *mut Array) -> /* s1 */ *mut i32
        /* where s1 <= s0 */ {
    g(arr)
}
fn g /* <s0, s1> */ (arr: /* s0 */ *mut Array) -> /* s1 */ *mut i32
        /* where s1 <= s0 */ {
    ...
}

For pointer permissions to line up properly, a monomorphic variant of fspecialized to READ, READ will need to call a variant of g also specializedto READ, READ, and a variant of f specialized to WRITE, WRITE will needto call a WRITE, WRITE variant of g.

To infer this information, the analysis separately considers each monomorphicsignature of each function. It performs a backtracking search to select, foreach callsite in the function, a monomorphic signature of the callee, such thatall of the calling function's constraints are satisfied, including constraintssetting the caller's sig variables equal to the concrete permissions in themonomorphic signature. The table of callee monomorphization selections isincluded in the analysis results so that callsites can be updated appropriatelywhen splitting functions for monomorphization.

Annotations

The ownership analysis supports annotations to specify the permission types offunctions and struct fields. These annotations serve two purposes. First, theuser can annotate functions to provide custom signatures for functions on whichthe analysis produces inaccurate results. Signatures provided this way will bepropagated throughout the analysis, so manually correcting a singlewrongly-inferred function can fix the inference results for its callers aswell. Second, the ownership system provides an ownership_annotate commandthat adds annotations to functions reflecting their inferred signatures. Theuser can then read the generated annotations to check the analysis results,and optionally edit them to improve precision, before proceeding with furthercode transformations.

There are four annotation types currently supported by the ownership system.

#[ownership_static(<perms>)] provides concrete permission values for allpointer types in a static declaration or struct field. The perms argumentis a comma-separated sequence of concrete permission tokens (READ, WRITE,MOVE). The given permission values will be applied to the pointers in thestatic or field type, following a preorder traversal of the type. Forexample:

struct S {
    #[ownership_static(READ, WRITE, MOVE)]
    f: *mut (*mut u8, *mut u16)
}

Here the outermost pointer will be given permission READ, the pointer tou8 will be given permission WRITE, and the pointer to u16 will be givenpermission MOVE.

#[ownership_constraints(<constraints>) provides the signature constraintsfor the annotated function, overriding polymorphic signature inference. Theargument constraints is a comma-separated sequence of constraints of theform le(<perm1>, <perm2>), each representing a single constraint perm1 <= perm2. The permissions used in each constraint may be any combination ofconcrete permissions (READ, WRITE, MOVE), permission variables (_0,_1, …), or expressions of the form min(p1, p2, …). (The permissionsyntax is limited by the requirement for compatibility with Rust's attributesyntax.)

The permission variables used in constraints always refer to signaturevariables of the annotated function. A signature variable is introduced foreach pointer type constructor in the function's signature, and they arenumbered according to a preorder traversal of each node in the argument andreturn types of the function. This example shows location of each variablein a simple signature:

fn get_err(arr: /* _0 */ *mut Array,
           element_out: /* _1 */ *mut /* _2 */ *mut i32)
           -> /* _3 */ *const c_char;

#[ownership_mono(<suffix>, <perms>)] supplies a monomorphic signature to beused for the annotated function. The suffix argument is a quoted string,which (if non-empty) will be used when splitting polymorphic functions intomonomorphic variants to construct a name for the monomorphized copy of thefunction. The perms argument is a comma-separated list of concretepermission tokens, giving the permissions to be used in the functionsignature in this monomorphization.

The ownership_mono annotation can appear multiple times on a singlefunction to provide multiple monomorphic signatures. However, if it appearsat all, monomorphization inference will be completely overriden for theannotated function, and only the provided signatures will be used in calleeargument inference and later transformations.

Example:

#[ownership_mono("mut", WRITE, WRITE)]
#[ownership_mono("", READ, READ)]
fn first(arr: *mut Array) -> *mut i32;

This function will have two monomorphic variants, one where both pointers'permission values are WRITE and one where both are READ. When theownership_split_variants command splits the function into its monomorphicvariants, the WRITE variant will be named first_mut and the READvariant will keep the original name first.

#[ownership_variant_of(<name>)] is used to combine source-level functionsinto variant groups. See the section on variant groups for details.

Variant Groups

The "variant group" mechanism allows combining several source-level functionsinto a single logical function for purposes of the analysis. This is usefulfor combining a function that was previously split into monomorphic variantsback into a single logical function. This allows for a sort of "modularrefactoring", in which the user focuses on one module at a time, analyzing,annotating, and splitting variants in only that module before moving on toanother.

As a concrete example of the purpose of this feature, consider the followingcode:

fn f(arr: *mut Array) -> *mut i32 { ... g(arr) ... }
fn g(arr: *mut Array) -> *mut i32 { ... }

The user works first on (the module containing) g, resulting in splitting ginto two variants:

fn f(arr: *mut Array) -> *mut i32 { ... g_mut(arr) ... }
fn g(arr: *mut Array) -> *mut i32 { ... }
fn g_mut(arr: *mut Array) -> *mut i32 { ... }

Note that, because there is still only one variant of f, the transformationmust choose a single g variant for f to call. In this case, it chose theg_mut variant.

Later, the user works on f. If g and g_mut are treated as separatefunctions, then there are two possibilities. First, if the constraints ong_mut are set up (or inferred) to require WRITE permission for arr, thenonly a WRITE variant of f will be generated. Or second, if the constraintsare relaxed, then f may get both READ and WRITE variants, but both will(wrongly) call g_mut.

Treating g and g_mut as two variants of a single function allows theanalysis to switch between g variants in the different variants of f,resulting in correct code like the following:

fn f(arr: *mut Array) -> *mut i32 { ... g(arr) ... }
fn f_mut(arr: *mut Array) -> *mut i32 { ... g_mut(arr) ... }
fn g(arr: *mut Array) -> *mut i32 { ... }
fn g_mut(arr: *mut Array) -> *mut i32 { ... }

The ownership_split_variants automatically annotates the split functions sothey will be combined into a variant group during further analysis. Variantgroups can also be constructed manually using the#[ownership_variant_of(<name>)] annotation, where name is an arbitraryquoted string. All source-level functions bearing an ownership_variant_ofannotation with the same name will form a single variant group, which will betreated as a single function throughout the analysis. However, signatureinference for the variants themselves is not well supported. Thus, eachvariant must have an ownership_mono annotation, and exactly one function ineach variant group must also have an ownership_constraints annotation.Together, these provide enough information that inference is not required.Note that unlike non-variant functions, variants may not have multipleownership_mono annotations, as each variant is expected to correspond to asingle monomorphization of the original function.

The "Collection Hack"

The analysis as described so far tries to mimic the Rust ownership model asimplemented in the Rust compiler. However, collection data structures in Rustoften use unsafe code to bypass parts of the ownership model. A particularlycommon case is in removal methods, such as Vec::pop:

impl<T> Vec<T> {
    fn pop(&mut self) -> Option<T> { ... }
}

This method moves a T out of self's internal storage, but only takes selfby mutable reference. Under the "normal" rules, this is impossible, and theanalysis described above will infer a stricter signature for the raw pointerequivalent:

fn pop(this: /* MOVE */ *mut Vec) -> /* MOVE */ *mut c_void { ... }

The analysis as implemented includes a small adjustment (the "collection hack")to let it infer the correct signature for such methods.

The collection hack is this: when handling a pointer assignment, instead ofconstraining the path permission of the RHS to be at least the permission ofthe LHS, we constraint it to be at least min(lhs_perm, WRITE). The result isthat it becomes possible to move a MOVE pointer out of a struct when onlyWRITE permission is available for the pointer to that struct. Then theanalysis will infer the correct type for pop:

fn pop(this: /* WRITE */ *mut Vec) -> /* MOVE */ *mut c_void { ... }