Many refactoring commands in c2rust refactor are designed to work only on selected portions of the crate, rather than affecting the entire crate uniformly. To support this, c2rust refactor has a mark system, which allows marking AST nodes (such as functions, expressions, or type annotations) with simple string labels. Certain commands add or remove marks, while others check the existing marks to identify nodes to transform.

For example, in a program containing several byte string literals, you can use select to mark a specific one:

  select target 'item(B2); desc(expr);'

[Figure 1]

Then, you can use bytestr_to_str to change only the marked byte string to an ordinary string literal, leaving the others unaffected:

  bytestr_to_str

[Figure 2]

This ability to limit transformations to specific parts of the program is useful for refactoring a large codebase incrementally, on a module-by-module or function-by-function basis.

The remainder of this tutorial describes select and related mark-manipulation commands. For details of how marks affect various transformation commands, see the command documentation or read about the marked! pattern for rewrite_expr and other pattern-matching commands.

Marks

A "mark" is a short string label that is associated with a node in the AST. Marks can be applied to nodes of most kinds, including items, expressions, patterns, type annotations, and so on. The mark string can be any valid Rust identifier, though most commands that process marks use short words such as target, dest, or new. It's possible to apply multiple distinct marks to the same node, and it's also possible to mark children of marked nodes separately from their parents (for example, to mark an expression and one of its subexpressions).

Here are some examples.

  select target 'crate; desc(match_expr(2 + 2));'

[Figure 3]

The ▶ … ◀ indicators in the diff show that the expression 2 + 2 has been marked. Hover over the indicators for more details, such as the label of the added mark.

As mentioned above, most kinds of nodes can be marked, not only expressions. Here we mark a function, a pattern, and a type annotation:

  select a 'item(f);' ;
  select b 'item(g); desc(match_ty(i32));' ;
  select c 'item(g); desc(match_pat(Some(x)));' ;

[Figure 4]

As mentioned above, it's possible to mark the same node twice with different labels. (Marking it twice with the same label is no different from marking it once.) Here's an example of marking a function multiple times:

  select a 'item(f);' ;
  select a 'item(f);' ;
  select b 'item(f);' ;

[Figure 5]

As you can see by hovering over the indicators, labels a and b were both added to the function f.
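One way to picture this behavior is to treat the marks as a set of (node, label) pairs. The sketch below is only an illustrative model, not c2rust's implementation: `NodeId`, `Marks`, and `mark_f_three_times` are hypothetical names, and the integer `1` simply stands in for the function `f`.

```rust
use std::collections::HashSet;

/// Hypothetical node identifier; the real tool has its own AST node IDs.
pub type NodeId = u32;

/// A mark store modelled as a set of (node, label) pairs.
#[derive(Default)]
pub struct Marks {
    pairs: HashSet<(NodeId, String)>,
}

impl Marks {
    /// Adds `label` to `node`. Returns false if the pair was already present.
    pub fn add(&mut self, node: NodeId, label: &str) -> bool {
        self.pairs.insert((node, label.to_string()))
    }

    /// Returns the labels on `node`, sorted for stable output.
    pub fn labels(&self, node: NodeId) -> Vec<String> {
        let mut out: Vec<String> = self
            .pairs
            .iter()
            .filter(|(n, _)| *n == node)
            .map(|(_, l)| l.clone())
            .collect();
        out.sort();
        out
    }
}

/// Replays the three `select` commands above on a node standing in for `f`.
pub fn mark_f_three_times() -> Vec<String> {
    let f: NodeId = 1;
    let mut marks = Marks::default();
    marks.add(f, "a");
    marks.add(f, "a"); // no effect: the pair (f, "a") already exists
    marks.add(f, "b");
    marks.labels(f)
}
```

Under this model, marking twice with the same label is a no-op because a set cannot hold the same pair twice, while distinct labels on one node coexist as separate pairs.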

Marks on a node have no connection to marks on its parent or child nodes. We can, for example, mark an expression like 2 + 2, then separately mark its subexpressions with either the same or different labels:

  select a 'item(f); desc(match_expr(2 + 2));' ;
  select a 'item(f); desc(match_expr(2)); first;' ;
  select b 'item(f); desc(match_expr(2)); last;' ;

[Figure 6]

Hovering over the mark indicators shows precisely what has happened: we marked both 2 + 2 and the first 2 with the label a, and marked the second 2 with the label b.

The select command

The select command provides a simple scripting language for applying marks to specific nodes. The basic syntax of the command is:

  select LABEL SCRIPT

select runs a SCRIPT (written in the language described below) to obtain a set of AST nodes, then marks every node in the set with LABEL, which should be a single identifier such as target.

More concretely, when running the script, select maintains a "current selection", which is a set of AST nodes. Script operations (described below) can extend or modify the current selection. At the end of the script, select marks every node in the current selection with LABEL.

We next describe a few common select script patterns, followed by details on the available operations and filters.

Common patterns


Selecting an item by path

For items such as functions, type declarations, or traits, the item(path) operation selects the item by its path:

  select target 'item(f);' ;
  select target 'item(T);' ;
  select target 'item(S);' ;
  select target 'item(m::g);' ;

[Figure 7]

Note that this only works for the kinds of items that can be imported via use. It doesn't handle other kinds of item-like nodes, such as impl methods, which cannot be imported directly.

Selecting all nodes matching a filter

The operations crate; desc(filter); together select all nodes (or, equivalently, all descendants of the crate) that match a filter. For example, we can select all expressions matching the pattern 2 + 2 using a match_expr filter:

  select target 'crate; desc(match_expr(2 + 2));'

[Figure 8]

Here we see that crate; desc(filter); can find matching items anywhere in the crate: inside function bodies, constant declarations, and even inside the length expression of an array type annotation.

Selecting filtered nodes inside a parent node

In the previous example, crate; desc(filter); is made up of two separate script operations. crate selects the entire crate:

  select target 'crate;'

[Figure 9]

Then desc(filter) looks for descendants of selected nodes that match filter, and replaces the current selection with the nodes it finds:

  clear_marks ;
  select target 'crate; desc(match_expr(2 + 2));'

[Figure 10]

(Note: we use clear_marks here only for illustration purposes, to make the diff clearly show the changes between the old and new versions of our select command.)

Combining desc with operations other than crate allows selecting descendants of only specific nodes. For example, we can find expressions matching 2 + 2, but only within the function f:

  select target 'item(f); desc(match_expr(2 + 2));'

[Figure 11]

In a more complex example, we can use multiple desc calls to target an expression inside of a specific method (recall that methods can't be selected directly with item(path)). We first select the module containing the impl:

  select target 'item(m);'

[Figure 12]

Then we select the method of interest, using the name filter (described below):

  clear_marks ;
  select target 'item(m); desc(name("f"));'

[Figure 13]

And finally, we select the expression inside the method:

  clear_marks ;
  select target 'item(m); desc(name("f")); desc(match_expr(2 + 2));'

[Figure 14]

Combined with some additional filters described below, this approach is quite effective for marking nodes that can't be named with an ordinary import path, such as impl methods or items nested inside functions.

Script operations

A select script can consist of any number of operations, which will be run in order to completion. (There is no control flow in select scripts.) Each operation ends with a semicolon, much like Rust statements.

The remainder of this section documents each script operation.

crate

crate (which takes no arguments) adds the root node of the entire crate to the current selection. All functions, modules, and other declarations are descendants of this single root node.

Example:

  select target 'crate;'

[Figure 15]

item

item(p) adds the item identified by the path p to the current selection. The provided path is handled like in Rust's use declarations (except that only plain paths are supported, not wildcards or curly-braced blocks).

  select target 'item(m::S);'

[Figure 16]

Because the item operation only adds to the current selection (as opposed to replacing the current selection with a set containing only the identified item), we can run item multiple times to select several different items at once:

  select target 'item(f); item(m::S); item(m);'

[Figure 17]

child

child(f) checks each child of each currently selected node against the filter f, and replaces the current selection with the set of matching children.

This can be used, for example, to select a static's type annotation without selecting type annotations that appear inside its initializer:

  select target 'item(S); child(kind(ty));'

[Figure 18]

To illustrate how this works, here is the AST for the static S item:

  • item static S
    • identifier S (the name of the static)
    • type i32 (the type annotation of the static)
    • expression 123_u8 as i32 (the initializer of the static)
      • expression 123_u8 (the input of the cast expression)
      • type i32 (the target type of the cast expression)

The static's type annotation is a direct child of the static (and has kind ty, matching the kind(ty) filter), so the type annotation is selected by the example command above. The target type for the cast is not a direct child of the static - rather, it's a child of the initializer expression, which is a child of the static - so it is ignored.

desc

desc(f) ("descendant") checks each descendant of each currently selected node against the filter f, and replaces the current selection with the set of matching descendants. This is similar to child, but checks for matching descendants at any depth, not only matching direct children.

Using the same example as for child, we see that desc selects more nodes:

  select target 'item(S); desc(kind(ty));'

[Figure 19]

Specifically, it selects both the type annotation of the static and the target type of the cast expression, as both are descendants of the static (though at different depths). Of course, it still does not select the type annotation of the const C, which is not a descendant of static S at any depth.

Note that desc only considers the strict descendants of selected nodes - that is, it does not consider a node to be a "depth-zero" descendant of itself. So, for example, the following command selects nothing:

  select target 'item(S); desc(item_kind(static));'

[Figure 20]

S itself is a static, but contains no additional statics inside of it, and desc does not consider S itself when looking for item_kind(static) descendants.
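The difference between child and desc, and the "strict descendants" rule, can be sketched on a toy tree shaped like the static S AST listed earlier. This is a simplified model under stated assumptions: the `Node` struct, the string kind tags, and the helper names are hypothetical, and filters are reduced to a plain kind comparison.

```rust
/// A toy AST node: a kind tag, a display label, and child nodes.
pub struct Node {
    pub kind: &'static str,
    pub text: &'static str,
    pub children: Vec<Node>,
}

fn node(kind: &'static str, text: &'static str, children: Vec<Node>) -> Node {
    Node { kind, text, children }
}

/// Builds a tree mirroring the AST listed for `static S`.
pub fn static_s() -> Node {
    node("item", "static S", vec![
        node("ident", "S", vec![]),
        node("ty", "i32 (annotation)", vec![]),
        node("expr", "123_u8 as i32", vec![
            node("expr", "123_u8", vec![]),
            node("ty", "i32 (cast target)", vec![]),
        ]),
    ])
}

/// Models `child(kind(k))`: direct children of `n` whose kind is `k`.
pub fn child<'a>(n: &'a Node, k: &str) -> Vec<&'a Node> {
    n.children.iter().filter(|c| c.kind == k).collect()
}

/// Models `desc(kind(k))`: strict descendants of `n` whose kind is `k`.
/// `n` itself is never tested against the filter.
pub fn desc<'a>(n: &'a Node, k: &str) -> Vec<&'a Node> {
    let mut out = Vec::new();
    for c in &n.children {
        if c.kind == k {
            out.push(c);
        }
        out.extend(desc(c, k));
    }
    out
}
```

In this model, `child(&s, "ty")` finds only the annotation, `desc(&s, "ty")` also finds the cast target, and `desc(&s, "item")` is empty because the root is not its own descendant.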

filter

filter(f) checks each currently selected node against the filter f, and replaces the current selection with the set of matching nodes. Equivalently, filter(f) removes from the current selection any nodes that don't match f.

Most uses of the filter operation can be replaced by passing a more appropriate filter expression to desc or child, so the examples in this section are somewhat contrived. (filter can still be useful in combination with marked, described below, or in more complex select scripts.)

Here is a slightly roundabout way to select all items named f. First, we select all items:

  select target 'crate; desc(kind(item));'

[Figure 21]

Then, we use filter to keep only items named f:

  clear_marks ;
  select target 'crate; desc(kind(item)); filter(name("f"));'

[Figure 22]

With this command, only descendants of crate matching both filters kind(item) and name("f") are selected. (This could be written more simply as crate; desc(kind(item) && name("f"));.)

first and last

first replaces the current selection with a set containing only the first selected node. last does the same with the last selected node. "First" and "last" are determined by a postorder traversal of the AST, so sibling nodes are ordered as expected, and a parent node comes "after" all of its children.
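To make the ordering concrete, here is a small sketch of a postorder traversal over a toy tree for the expression 2 + 2. The `Node` type, the node names (`lhs`, `rhs`, `add`), and the `first_last` helper are hypothetical, and the sketch assumes node names are unique.

```rust
/// A toy AST node identified by a unique name.
pub struct Node {
    pub name: &'static str,
    pub children: Vec<Node>,
}

/// Appends node names in postorder: children left to right, then the parent.
pub fn postorder(n: &Node, out: &mut Vec<&'static str>) {
    for c in &n.children {
        postorder(c, out);
    }
    out.push(n.name);
}

/// Returns the (first, last) selected names by postorder position,
/// or None if nothing in `selected` occurs in the tree.
pub fn first_last(root: &Node, selected: &[&str]) -> Option<(&'static str, &'static str)> {
    let mut order = Vec::new();
    postorder(root, &mut order);
    let hits: Vec<&'static str> = order
        .into_iter()
        .filter(|n| selected.contains(n))
        .collect();
    Some((*hits.first()?, *hits.last()?))
}

/// The expression `2 + 2`: an `add` node with two literal children.
pub fn two_plus_two() -> Node {
    Node {
        name: "add",
        children: vec![
            Node { name: "lhs", children: vec![] },
            Node { name: "rhs", children: vec![] },
        ],
    }
}
```

With all three nodes selected, the first node is the left operand and the last is the addition itself, since a parent follows its children in postorder. With only the two operands selected, first and last pick the left and right 2 respectively.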

The first and last operations are most useful for finding places to insert new nodes (such as with the create_item command) while ignoring details such as the specific names or kinds of the nodes around the insertion point. For example, we can use last to easily select the last item in a module. First, we select all the module's items:

  select target 'item(m); child(kind(item));'

[Figure 23]

Then we use last to select only the last such child:

  clear_marks ;
  select target 'item(m); child(kind(item)); last;'

[Figure 24]

Now we could use create_item to insert a new item after the last existing one.

marked

marked(l) adds all nodes marked with label l to the current selection. This is useful for more complex marking operations, since (together with the delete_marks command) it allows using temporary marks to manipulate multiple sets of nodes simultaneously.

For example, suppose we wish to select both the first and the last item in a module. Normally, this would require duplicating the select command, since both first and last replace the entire current selection with the single first or last item. This would be undesirable if the operations for setting up the initial set of items were fairly complex. But with marked, we can save the selection before running first and restore it afterward.

We begin by selecting all items in the module and saving that selection by marking it with the tmp_all_items label:

  select tmp_all_items 'item(m); child(kind(item));'

[Figure 25]

Next, we use marked to retrieve the tmp_all_items set and take the first item from it. This reduces the current selection to only a single item, but the tmp_all_items marks remain intact for later use.

  select target 'marked(tmp_all_items); first;'

[Figure 26]

We do the same to mark the last item with target:

  select target 'marked(tmp_all_items); last;'

[Figure 27]

Finally, we clean up, removing the tmp_all_items marks using the delete_marks command:

  delete_marks tmp_all_items

[Figure 28]

Now the only marks remaining are the target marks on the first and last items of the module, as we originally intended.

reset

reset clears the current selection. This is only useful in combination with mark and unmark, as otherwise the operations before a reset have no effect.

mark and unmark

These operations allow select scripts to manipulate marks directly, rather than relying solely on the automatic marking of selected nodes at the end of the script. mark(l) marks all nodes in the current selection with label l (immediately, rather than waiting until the select command is finished), and unmark(l) removes label l from all selected nodes.

mark, unmark, and reset can be used to effectively combine multiple select commands in a single script. Here's the "first and last" example from the marked section, using only a single select command:

  select _dummy '
      item(m); child(kind(item)); mark(tmp_all_items); reset;
      marked(tmp_all_items); first; mark(target); reset;
      marked(tmp_all_items); last; mark(target); reset;
      marked(tmp_all_items); unmark(tmp_all_items); reset;
  '

[Figure 29]

Note that we pass _dummy as the LABEL argument of select, since the desired target marks are applied using the mark operation, rather than relying on the implicit marking done by select.
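The interplay between the current selection and the mark set in this script can be traced with a small model. The sketch below is an approximation under stated assumptions: `State` and its methods are hypothetical, nodes are plain integers, and numeric order stands in for the AST traversal order used by first and last.

```rust
use std::collections::BTreeSet;

#[derive(Default)]
pub struct State {
    /// Current selection; numeric order stands in for traversal order.
    pub selection: BTreeSet<u32>,
    /// Marks as (node, label) pairs.
    pub marks: BTreeSet<(u32, String)>,
}

impl State {
    /// Adds nodes to the selection, as `item` or `crate` would.
    pub fn add(&mut self, nodes: &[u32]) {
        self.selection.extend(nodes.iter().copied());
    }
    pub fn mark(&mut self, l: &str) {
        for &n in &self.selection {
            self.marks.insert((n, l.to_string()));
        }
    }
    pub fn unmark(&mut self, l: &str) {
        for &n in &self.selection {
            self.marks.remove(&(n, l.to_string()));
        }
    }
    pub fn reset(&mut self) {
        self.selection.clear();
    }
    /// Adds every node carrying label `l` to the selection.
    pub fn marked(&mut self, l: &str) {
        let hits: Vec<u32> = self
            .marks
            .iter()
            .filter(|(_, m)| m.as_str() == l)
            .map(|(n, _)| *n)
            .collect();
        self.selection.extend(hits);
    }
    pub fn first(&mut self) {
        let keep = self.selection.iter().next().copied();
        self.selection = keep.into_iter().collect();
    }
    pub fn last(&mut self) {
        let keep = self.selection.iter().next_back().copied();
        self.selection = keep.into_iter().collect();
    }
}

/// Replays the four script lines above; `items` stands in for the
/// result of `item(m); child(kind(item));`.
pub fn first_and_last(items: &[u32]) -> Vec<(u32, String)> {
    let mut s = State::default();
    s.add(items); s.mark("tmp_all_items"); s.reset();
    s.marked("tmp_all_items"); s.first(); s.mark("target"); s.reset();
    s.marked("tmp_all_items"); s.last(); s.mark("target"); s.reset();
    s.marked("tmp_all_items"); s.unmark("tmp_all_items"); s.reset();
    // The selection is empty here, so the implicit `_dummy` marking adds nothing.
    s.marks.into_iter().collect()
}
```

For a module with items 10, 11, and 12, the model ends with target on 10 and 12 and no leftover tmp_all_items or _dummy marks.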

unmark is also useful in combination with marked to interface with non-select mark manipulation commands. For example, suppose we want to mark all occurrences of 2 + 2 that are passed as arguments to a function f. One option is to do this using the mark_arg_uses command, with additional processing by select before and after. Here we start by marking the function f:

  select target 'item(f);'

[Figure 30]

Next, we run mark_arg_uses to replace the mark on f with a mark on the first argument expression (index 0) passed at each call to f:

  mark_arg_uses 0 target

[Figure 31]

And finally, we use select again to mark only those arguments that match 2 + 2:

  select target 'marked(target); unmark(target); filter(match_expr(2 + 2));'

[Figure 32]

Beginning the script with marked(target); unmark(target); copies the set of target-marked nodes into the current selection, then removes the existing marks. The remainder of the script can then operate as usual, manipulating only the current selection with no need to worry about additional marks being already present.

Filters


Boolean operators

Filter expressions can be combined using the boolean operators &&, ||, and !. A node matches the filter f1 && f2 only if it matches f1 and also matches f2, and so on.

kind

kind(k) matches AST nodes whose node kind is k. The supported node kinds are:

  • item - a top-level item, as in struct Foo { … } or fn foo() { … }. Includes both items in modules and items defined inside functions or other blocks, but does not include "item-like" nodes inside traits, impls, or extern blocks.
  • trait_item - an item inside a trait definition, such as a method or associated type declaration
  • impl_item - an item inside an impl block, such as a method or associated type definition
  • foreign_item - an item inside an extern block ("foreign module"), such as a C function or static declaration
  • stmt
  • expr
  • pat - a pattern, including single-ident patterns like foo in let foo = …;
  • ty - a type annotation, such as Foo in let x: Foo = …;
  • arg - a function or method argument declaration
  • field - a struct, enum variant, or union field declaration
  • itemlike - matches nodes whose kind is any of item, trait_item, impl_item, or foreign_item
  • any - matches any node

The node kind k can be used alone as shorthand for kind(k). For example, the operation desc(item); is the same as desc(kind(item));.

item_kind

item_kind(k) matches itemlike AST nodes whose subkind is k. The itemlike subkinds are:

  • extern_crate
  • use
  • static
  • const
  • fn
  • mod
  • foreign_mod
  • global_asm
  • ty - type alias definition, as in type Foo = Bar;
  • existential - existential type definition, as in existential type Foo: Bar;. Note that existential types are currently an unstable language feature.
  • enum
  • struct
  • union
  • trait - ordinary trait Foo { … } definition, including unsafe trait
  • trait_alias - trait alias definition, as in trait Foo = Bar;. Note that trait aliases are currently an unstable language feature.
  • impl - including both trait and inherent impls
  • mac - macro invocation. Note that select works on the macro-expanded AST, so macro invocations are never present under normal circumstances.
  • macro_def - 2.0/decl_macro-style macro definition, as in macro foo(…) { … }. Note that 2.0-style macro definitions are currently an unstable language feature.

Note that a single item_kind filter can match multiple distinct node kinds, as long as the subkind is correct. For example, item_kind(fn) will match fn items, method trait_items and impl_items, and fn declarations inside extern blocks (foreign_items). Similarly, item_kind(ty) matches ordinary type alias definitions, associated type declarations (in traits) and definitions (in impls), and foreign type declarations inside extern blocks.

item_kind filters match only those nodes that also match kind(itemlike), as other node kinds have no itemlike subkind.

The itemlike subkind k can be used alone as shorthand for item_kind(k). For example, the operation desc(fn); is the same as desc(item_kind(fn));.
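The way boolean operators compose with kind and item_kind can be sketched as a small filter tree evaluated against a node. This is an illustrative model only: the `Kind`, `Node`, and `Filter` types are hypothetical, only a subset of node kinds is included, and subkinds are plain strings.

```rust
/// A subset of the node kinds listed above.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum Kind {
    Item,
    TraitItem,
    ImplItem,
    ForeignItem,
    Expr,
}

pub struct Node {
    pub kind: Kind,
    /// Itemlike subkind such as "fn" or "static"; None for other nodes.
    pub subkind: Option<&'static str>,
}

pub enum Filter {
    Kind(Kind),
    Itemlike,
    ItemKind(&'static str),
    And(Box<Filter>, Box<Filter>),
    Or(Box<Filter>, Box<Filter>),
    Not(Box<Filter>),
}

impl Filter {
    pub fn matches(&self, n: &Node) -> bool {
        match self {
            Filter::Kind(k) => n.kind == *k,
            Filter::Itemlike => matches!(
                n.kind,
                Kind::Item | Kind::TraitItem | Kind::ImplItem | Kind::ForeignItem
            ),
            // item_kind only matches nodes that are also itemlike.
            Filter::ItemKind(k) => Filter::Itemlike.matches(n) && n.subkind == Some(*k),
            Filter::And(a, b) => a.matches(n) && b.matches(n),
            Filter::Or(a, b) => a.matches(n) || b.matches(n),
            Filter::Not(f) => !f.matches(n),
        }
    }
}

/// Models `item_kind(fn) && !kind(item)`: functions that are not plain
/// items, such as methods or foreign function declarations.
pub fn fn_but_not_item() -> Filter {
    Filter::And(
        Box::new(Filter::ItemKind("fn")),
        Box::new(Filter::Not(Box::new(Filter::Kind(Kind::Item)))),
    )
}
```

In this model, item_kind(fn) alone matches both a free function and an impl method, while the combined filter keeps the method and rejects the free function.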

pub and mut

pub matches any item, impl item, or foreign item whose visibility is pub. It currently does not support struct fields, even though they can also be declared pub.

mut matches static mut items, static mut foreign item declarations, and mutable binding patterns such as the mut foo in let mut foo = …;.

name

name(re) matches itemlikes, arguments, and fields whose name matches the regular expression re. For example, name("[fF].*") matches fn f() { … } and struct Foo { … }, but not trait Bar { … }. It currently does not support general binding patterns, aside from those in function arguments.

path and path_prefix

path(p) matches itemlikes and enum variants whose absolute path is p.

path_prefix(n, p) is similar to path(p), but drops the last n segments of the node's path before comparing to p.
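The comparison performed by path_prefix can be sketched with string paths. This is a rough model, not the tool's implementation: the helper names are hypothetical, paths are represented as `::`-separated strings, and the sketch assumes a path with fewer than n segments never matches (the documentation does not say what happens in that case).

```rust
/// Splits a path such as "m::S::f" into its segments.
fn segments(path: &str) -> Vec<&str> {
    path.split("::").filter(|s| !s.is_empty()).collect()
}

/// Models `path(p)`: the node's full path equals `p`.
pub fn path_matches(node_path: &str, p: &str) -> bool {
    segments(node_path) == segments(p)
}

/// Models `path_prefix(n, p)`: drop the last `n` segments of the node's
/// path, then compare what remains to `p`.
/// Assumption: a path with fewer than `n` segments never matches.
pub fn path_prefix_matches(node_path: &str, n: usize, p: &str) -> bool {
    let segs = segments(node_path);
    match segs.len().checked_sub(n) {
        Some(keep) => segs[..keep] == segments(p)[..],
        None => false,
    }
}
```

Under this model, a method at m::S::f matches path_prefix(1, m::S) and path_prefix(2, m), which is one way to pick out everything nested directly under a given parent path.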

has_attr

has_attr(a) matches itemlikes, exprs, and field declarations that have an attribute named a.

match_*

match_expr(e) uses rewrite_expr-style AST matching to compare exprs to e, and matches any node where AST matching succeeds. For example, match_expr(__e + 1) matches the expressions 1 + 1, x + 1, and f() + 1, but not 2 + 2.

match_pat, match_ty, and match_stmt are similar, but operate on pat, ty, and stmt nodes respectively.
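The __e + 1 example can be reproduced with a very small structural matcher. This sketch is far simpler than the real rewrite_expr matcher: the `Expr` type and `expr_matches` function are hypothetical, and it assumes only that a pattern identifier beginning with `__` matches any expression, as `__e` does in the example above.

```rust
#[derive(Debug, PartialEq, Clone)]
pub enum Expr {
    Lit(i64),
    Var(&'static str),
    /// A call with no arguments, such as `f()`.
    Call(&'static str),
    Add(Box<Expr>, Box<Expr>),
}

/// Convenience constructor for `l + r`.
pub fn add(l: Expr, r: Expr) -> Expr {
    Expr::Add(Box::new(l), Box::new(r))
}

/// Returns true if `e` matches `pat`. A `Var` in the pattern whose name
/// starts with `__` is treated as a wildcard (an assumption of this sketch).
pub fn expr_matches(pat: &Expr, e: &Expr) -> bool {
    match (pat, e) {
        (Expr::Var(name), _) if name.starts_with("__") => true,
        (Expr::Add(pl, pr), Expr::Add(el, er)) => {
            expr_matches(pl, el) && expr_matches(pr, er)
        }
        // Everything else must be structurally identical.
        _ => pat == e,
    }
}
```

With the pattern `add(Expr::Var("__e"), Expr::Lit(1))`, the matcher accepts 1 + 1, x + 1, and f() + 1, and rejects 2 + 2 because the right operand is not the literal 1.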

marked

marked(l) matches nodes that are marked with the label l.

any_child, all_child, any_desc, and all_desc

any_child(f) matches nodes that have a child that matches f. all_child(f) matches nodes where all children of the node match f.

any_desc and all_desc are similar, but consider all descendants instead of only direct children.
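These four filters behave like the usual "exists" and "for all" quantifiers over a node's children or descendants. The sketch below models them on a toy tree; the `Node` type and function signatures are hypothetical, and the treatment of leaf nodes (where all_child and all_desc hold vacuously) is an assumption of the sketch rather than documented behavior.

```rust
pub struct Node {
    pub kind: &'static str,
    pub children: Vec<Node>,
}

/// Some direct child matches `f`.
pub fn any_child(n: &Node, f: &dyn Fn(&Node) -> bool) -> bool {
    n.children.iter().any(|c| f(c))
}

/// Every direct child matches `f` (vacuously true for leaves here).
pub fn all_child(n: &Node, f: &dyn Fn(&Node) -> bool) -> bool {
    n.children.iter().all(|c| f(c))
}

/// Some strict descendant matches `f`.
pub fn any_desc(n: &Node, f: &dyn Fn(&Node) -> bool) -> bool {
    n.children.iter().any(|c| f(c) || any_desc(c, f))
}

/// Every strict descendant matches `f` (vacuously true for leaves here).
pub fn all_desc(n: &Node, f: &dyn Fn(&Node) -> bool) -> bool {
    n.children.iter().all(|c| f(c) && all_desc(c, f))
}

/// A tree shaped like `static S: i32 = 123_u8 as i32;`.
pub fn sample() -> Node {
    let leaf = |kind| Node { kind, children: vec![] };
    Node {
        kind: "item",
        children: vec![
            leaf("ident"),
            leaf("ty"),
            Node { kind: "expr", children: vec![leaf("expr"), leaf("ty")] },
        ],
    }
}
```

On the sample tree, the root has a ty child but not only ty children, so any_child holds and all_child does not; the inner literal expression is found by any_desc but not by any_child.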

Other commands

In addition to select, c2rust refactor contains a number of other mark-manipulation commands. A few of these can be replicated with appropriate select scripts (though using the command is typically easier), but some are more complex.

copy_marks

copy_marks OLD NEW adds a mark with label NEW to every node currently marked with OLD.

delete_marks

delete_marks OLD removes the label OLD from every node that is currently marked with it.

rename_marks

rename_marks OLD NEW behaves like copy_marks OLD NEW followed by delete_marks OLD: it adds a mark with label NEW to every node marked with OLD, then removes OLD from each such node.
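Viewed as operations on a set of (node, label) pairs, the three commands above are straightforward. The following is a minimal model, not the tool's code: the `Marks` alias and the helper functions `marks_from` and `to_vec` are hypothetical, and nodes are plain integers.

```rust
use std::collections::BTreeSet;

/// Marks modelled as (node, label) pairs.
pub type Marks = BTreeSet<(u32, String)>;

/// Builds a mark set from literal pairs (helper for the examples).
pub fn marks_from(pairs: &[(u32, &str)]) -> Marks {
    pairs.iter().map(|(n, l)| (*n, l.to_string())).collect()
}

/// Lists the marks in sorted order (helper for the examples).
pub fn to_vec(marks: &Marks) -> Vec<(u32, String)> {
    marks.iter().cloned().collect()
}

/// Models `copy_marks OLD NEW`.
pub fn copy_marks(marks: &mut Marks, old: &str, new: &str) {
    let nodes: Vec<u32> = marks
        .iter()
        .filter(|(_, l)| l.as_str() == old)
        .map(|(n, _)| *n)
        .collect();
    for n in nodes {
        marks.insert((n, new.to_string()));
    }
}

/// Models `delete_marks OLD`.
pub fn delete_marks(marks: &mut Marks, old: &str) {
    marks.retain(|(_, l)| l.as_str() != old);
}

/// Models `rename_marks OLD NEW` as a copy followed by a delete.
pub fn rename_marks(marks: &mut Marks, old: &str, new: &str) {
    copy_marks(marks, old, new);
    delete_marks(marks, old);
}
```

Marks with other labels are untouched by all three operations, so renaming one label never disturbs unrelated marks on the same nodes.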

mark_uses

mark_uses LABEL transfers LABEL marks from definitions to uses. That is, it finds each definition marked with LABEL, marks each use of such a definition with LABEL, then removes LABEL from the definitions. For example, if a static FOO: … = … is marked with target, then mark_uses target will add a target mark to every expression FOO that references the marked definition and then remove target from FOO itself.

For the purposes of this command, a "use" of a definition is a path or identifier that resolves to that definition. This includes expressions (both paths and struct literals), patterns (paths to constants, structs, and enum variants), and type annotations. When a function definition is marked, only the function path itself (the foo::bar in foo::bar(x)) is considered a use, not the entire call expression. Method calls (whether using dotted or UFCS syntax) normally can't be handled at all, as their resolution is "type-dependent" (however, the mark_callers command can sometimes work when mark_uses does not).
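The transfer performed by mark_uses can be sketched as follows, assuming name resolution has already produced a table of (use node, definition node) pairs. That table, the integer node IDs, and the function names are all hypothetical; the real command obtains resolution results from the compiler.

```rust
use std::collections::BTreeSet;

/// Moves `label` from marked definitions to their uses.
/// `uses` lists (use_node, definition_node) pairs, standing in for
/// the results of name resolution.
pub fn mark_uses(marks: &mut BTreeSet<(u32, String)>, uses: &[(u32, u32)], label: &str) {
    // Collect the definitions currently carrying the label.
    let defs: Vec<u32> = marks
        .iter()
        .filter(|(_, l)| l.as_str() == label)
        .map(|(n, _)| *n)
        .collect();
    // Remove the label from the definitions...
    for &d in &defs {
        marks.remove(&(d, label.to_string()));
    }
    // ...and add it to every use that resolves to one of them.
    for &(use_node, def) in uses {
        if defs.contains(&def) {
            marks.insert((use_node, label.to_string()));
        }
    }
}

/// Node 1 plays the role of `static FOO`; nodes 10 and 11 are
/// expressions referring to it, and node 12 refers to something else.
pub fn example() -> Vec<(u32, String)> {
    let mut marks = BTreeSet::new();
    marks.insert((1, "target".to_string()));
    mark_uses(&mut marks, &[(10, 1), (11, 1), (12, 2)], "target");
    marks.into_iter().collect()
}
```

After the call, only the two referring expressions carry target; the definition itself and the unrelated use are unmarked.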

mark_callers

mark_callers LABEL transfers LABEL marks from function or method definitions to uses. That is, it works like mark_uses, but is specialized to functions and methods. mark_callers uses a more sophisticated means of name resolution that allows it to detect uses via type-dependent method paths, which mark_uses cannot handle.

For purposes of mark_callers, a "use" is a function call (foo::bar()) or method call (x.foo()) expression where the function or method being called is one of the marked definitions.

mark_arg_uses

mark_arg_uses INDEX LABEL transfers LABEL marks from function or method definitions to the argument in position INDEX at each use. That is, it works like mark_callers, but marks the expression passed as argument INDEX instead of the entire call site.

INDEX is zero-based. However, the self/receiver argument of a method call counts as the first argument (index 0), with the first argument in parentheses having index 1 (arg0.f(arg1, arg2)). For ordinary function calls (including UFCS method calls), the first argument has index 0 (f(arg0, arg1, arg2)).
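The indexing rule can be summarized in a few lines of code. The `Call` type and `arg_at` function below are hypothetical illustrations, with argument expressions represented as strings; returning `None` for an out-of-range index is an assumption of the sketch, since the documentation does not describe that case.

```rust
pub enum Call {
    /// `f(args...)`, including UFCS method calls.
    Function { args: Vec<&'static str> },
    /// `receiver.f(args...)`.
    Method { receiver: &'static str, args: Vec<&'static str> },
}

/// Returns the expression that `mark_arg_uses INDEX` would pick at this
/// call site, or None if the index is out of range (an assumption).
pub fn arg_at(call: &Call, index: usize) -> Option<&'static str> {
    match call {
        Call::Function { args } => args.get(index).copied(),
        Call::Method { receiver, args } => match index {
            // The receiver counts as argument 0 of a method call.
            0 => Some(*receiver),
            i => args.get(i - 1).copied(),
        },
    }
}
```

For `arg0.f(arg1, arg2)` index 0 yields the receiver and index 1 yields the first parenthesized argument, while for `f(arg0, arg1, arg2)` index 0 yields the first parenthesized argument directly.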