Macros!

CIS 198 Lecture 13


What Are Macros?

  • In C, a macro looks like this:

    #define FOO 10 // untyped integral constant
    #define SUB(x, y) ((x) - (y)) // parentheses are important!
    #define BAZ a // relies on there being an `a` in context!
    int a = FOO;
    short b = FOO;
    int c = -SUB(2, 3 + 4);
    int d = BAZ;
  • The preprocessor runs before the compiler, producing:

    int a = 10; // = 10
    short b = 10; // = 10
    int c = -((2) - (3 + 4)); // = -(2 - 7) = 5
    int d = a; // = 10

Why C Macros Suck¹

  • C does a direct token-level substitution.

    • The preprocessor has no idea what variables, types, operators, numbers, or anything else actually mean.
  • Say we had defined SUB like this:

    #define SUB(x, y) x - y
    int c = -SUB(2, 3 + 4);
  • This would break terribly! After preprocessing:

    int c = -2 - 3 + 4; // = -1, not 5.

¹ GCC Docs: Macro Pitfalls


Why C Macros Suck

  • Further suppose we decided to rename a:

    #define FOO 10
    #define BAZ a // relies on there being an `a` in context!
    int a_smith = FOO;
    int d = BAZ;
  • Now, the preprocessor produces garbage!

    int a_smith = 10; // = 10
    int d = a; // error: `a` is undeclared

Why C Macros Suck

  • Since tokens are substituted directly, results can be surprising:

    #define SWAP(x, y) do { \
        (x) = (x) ^ (y); \
        (y) = (x) ^ (y); \
        (x) = (x) ^ (y); \
    } while (0) // Also, creating multiple statements is weird.
    int x = 10;
    SWAP(x, x); // `x` is now 0 instead of 10
  • And arguments can be executed multiple times:

    #define DOUBLE(x) ((x) + (x))
    int x = DOUBLE(foo()); // `foo` gets called twice

Why C Macros Suck

  • C macros also can’t be recursive:

    #define foo (4 + foo)
    int x = foo;
  • This expands to

    int x = (4 + foo);
    • The inner foo is not expanded again, so it is left behind as an undeclared identifier. (This particular example is silly, but recursion is useful.)

Why C Macros Suck

  • In C, the preprocessor is also used to include headers (to use code from other files):

    #include <stdio.h>
    • Since this just dumps stdio.h into this file, each file now gets bigger and bigger with additional includes.
    • This is a major contributor to long build times in C/C++ (especially in older compilers).

Rust Macros from the Bottom Up


Rust Syntax Extensions

  • Rust has a generalized system called syntax extensions. Anytime you see one of these, it means a syntax extension is in use:

    • #[foo] and #![foo]
    • foo! arg
      • Always foo!(...), foo![...], or foo!{...}
      • Sometimes means foo is a macro.
    • foo! arg arg
      • Used only by macro_rules! name { definition }
  • The foo! arg form is the one used by macros, which are a special kind of syntax extension defined within a Rust program.

  • Syntax extensions can also be implemented by compiler plugins, which have much more power than macros.
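
As a rough illustration of the forms listed above, the sketch below uses each one once. The names Point, origin!, and forms_demo are made up for this example.

```rust
// `#[foo]` form: an attribute syntax extension (here, the built-in derive).
#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// `foo! arg arg` form: only `macro_rules! name { definition }` looks like this.
macro_rules! origin {
    () => {
        Point { x: 0, y: 0 }
    };
}

// `foo! arg` form: invoking the macro defined above, inside the built-in `format!`.
fn forms_demo() -> String {
    format!("{:?}", origin!())
}
```

Calling forms_demo() formats the Point built by origin!() using the derived Debug implementation.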


Rust Macros

  • A Rust macro looks like this:

    macro_rules! incr { // define the macro
        // syntax pattern => replacement code
        ( $x:ident ) => { $x += 1; };
    }
    let mut x = 0;
    incr!(x); // invoke a syntax extension (or macro)
  • So… this is totally foreign. The heck’s going on?
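
To make the snippet above concrete, here is a minimal sketch that wraps it in a function. The helper name incr_twice is an assumption for this example, not part of the lecture.

```rust
macro_rules! incr {
    // Matches a single identifier and expands to an increment of it.
    ( $x:ident ) => { $x += 1; };
}

// Hypothetical helper: bumps a counter twice using the macro.
fn incr_twice(start: i32) -> i32 {
    let mut x = start;
    incr!(x); // expands to `x += 1;`
    incr!(x);
    x
}
```

Each incr!(x) is replaced at compile time by the statement on the right-hand side of the rule, with x substituted for $x.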


Rust Syntax - Token Streams

  • Before we dive in, we need to know a little about Rust’s lexer and parser.

    • A lexer is a compiler stage which turns the original source (a string) into a stream of tokens.
    • A string of code like ((2 + 3) + 10) will turn into a stream like
      • '(' '(' '2' '+' '3' ')' '+' '10' ')'.
  • Tokens can be:

    • Identifiers: foo, Bambous, self, we_can_dance, …
    • Integers: 42, 72u32, 0_______0, …
    • Keywords: _, fn, self, match, yield, macro, …
    • Lifetimes: 'a, 'b, …
    • Strings: "", "Leicester", r##"venezuelan beaver"##, …
    • Symbols: [, :, ::, ->, @, <-, …
  • In C, macros see the token stream as input.


Rust Syntax - Token Trees

  • After lexing, a small amount of parsing is done to turn it into a token tree.

    • This isn’t a full-fledged AST (abstract syntax tree).
    • For the token stream '(' '(' '2' '+' '3' ')' '+' '10' ')', the token tree looks like this:

      Parens
      ├─ Parens
      │  ├─ '2'
      │  ├─ '+'
      │  └─ '3'
      ├─ '+'
      └─ '10'
  • In Rust, macros see one token tree as input.

    • When you do println!("{}", (5+2)), the "{}", (5+2) will get parsed into a token tree, but not fully parsed into an AST.
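
One way to see token trees in action is to count them. The sketch below assumes a small recursive helper macro, count_tts! (a name chosen for this example), that peels one tt off the front of its input at a time. A parenthesized group counts as a single token tree no matter what is inside it.

```rust
// Counts the token trees in its input: take one `tt`, recurse on the rest.
macro_rules! count_tts {
    () => { 0usize };
    ( $_head:tt $( $tail:tt )* ) => { 1usize + count_tts!( $( $tail )* ) };
}

// `2`, `+`, `3`: three leaf tokens.
const FLAT: usize = count_tts!(2 + 3);
// `(2 + 3)`, `+`, `10`: the parenthesized group is one tree.
const GROUPED: usize = count_tts!((2 + 3) + 10);
// The whole input is a single parenthesized group.
const WRAPPED: usize = count_tts!(((2 + 3) + 10));
```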

Rust Syntax - AST

  • The AST (abstract syntax tree) is the fully-parsed tree.

    • All syntax extension (and macro) invocations are expanded, then parsed into sub-ASTs after the initial AST construction.
      • Syntax extensions must output valid, contextually-correct Rust.
  • Syntax extension calls can appear in place of the following syntax kinds, by outputting a valid AST of that kind:

    • Patterns (e.g. in a match or if let).
    • Statements (e.g. let x = 4;).
    • Expressions (e.g. x * (y + z)).
    • Items (e.g. fn, struct, impl, use).
  • They cannot appear in place of:

    • Identifiers, match arms, or struct fields. (Types were also excluded when this lecture was written; newer compilers accept macros in type position.)
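
The four allowed positions can be sketched as follows. All macro and function names here (zero!, four!, let_ten!, make_getter!, describe, positions_demo) are invented for the illustration.

```rust
// Pattern position: expands to the literal pattern `0`.
macro_rules! zero { () => { 0 }; }
// Expression position: expands to the expression `1 + 3`.
macro_rules! four { () => { 1 + 3 }; }
// Statement position: declares a variable whose name the caller supplies.
macro_rules! let_ten { ($name:ident) => { let $name = 10; }; }
// Item position: defines a function returning the given value.
macro_rules! make_getter {
    ($name:ident, $value:expr) => { fn $name() -> i32 { $value } };
}

make_getter!(seven, 7); // item

fn describe(n: i32) -> &'static str {
    match n {
        zero!() => "zero", // pattern
        _ => "nonzero",
    }
}

fn positions_demo() -> i32 {
    let_ten!(ten);          // statement
    ten + four!() + seven() // expression
}
```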

Macro Expansion

  • Let’s parse this Rust code into an AST:

    let eight = 2 * four!();

    Let { name: eight
          init: BinOp { op: Mul
                        lhs: LitInt { val: 2 }
                        rhs: Macro { name: four      /* macro */
                                     body: () } } }  /* input */
  • If four!() is defined to expand to 1 + 3, this expands to:

    Let { name: eight
          init: BinOp { op: Mul
                        lhs: LitInt { val: 2 }
                        rhs: BinOp { op: Add                       /* macro output */
                                     lhs: LitInt { val: 1 }
                                     rhs: LitInt { val: 3 } } } }

    let eight = 2 * (1 + 3);

Macro Rules

  • Put simply, a macro is just a compile-time pattern match:

    macro_rules! mymacro {
        ($pattern1) => {$expansion1};
        ($pattern2) => {$expansion2};
        // ...
    }
  • The four! macro is simple:

    macro_rules! four {
        // For empty input, produce `1 + 3` as output.
        () => {1 + 3};
    }
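
A small check of the expansion described on the previous slide: because the macro's output replaces a single expression node, 2 * four!() behaves like 2 * (1 + 3) rather than a textual paste. The constant names are assumptions made for this sketch.

```rust
macro_rules! four {
    // For empty input, produce `1 + 3` as output.
    () => { 1 + 3 };
}

// The expansion is inserted as one expression node, so this is 2 * (1 + 3).
const EIGHT: i32 = 2 * four!();
// Writing the same tokens out by hand parses differently: (2 * 1) + 3.
const FIVE: i32 = 2 * 1 + 3;
```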

Macro Rules

  • Any valid Rust tokens can appear in the match:

    macro_rules! imaginary {
        (twentington) => {"20ton"};
        (F00 & nee) => {"f0e"};
    }
    imaginary!(twentington);
    imaginary!(F00&nee);
    imaginary!(schinty six); // won't compile: no rule matches these tokens
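
As a runnable variant of the example above, the sketch below returns the two strings so they can be inspected. The wrapper function imaginary_demo is an assumption for this example.

```rust
macro_rules! imaginary {
    (twentington) => { "20ton" };
    (F00 & nee) => { "f0e" };
}

fn imaginary_demo() -> (&'static str, &'static str) {
    // Whitespace between tokens does not matter: `F00&nee` lexes to the
    // same three tokens as `F00 & nee`.
    (imaginary!(twentington), imaginary!(F00&nee))
}
```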

Macro Rules - Captures

  • Portions of the input token tree can be captured:

    macro_rules! sub {
        ($e1:expr, $e2:expr) => { ... };
    }
  • Captures are always written as $name:kind.

    • Possible kinds are:
      • item: an item, like a function, struct, module, etc.
      • block: a block (i.e. { some; stuff; here })
      • stmt: a statement
      • pat: a pattern
      • expr: an expression
      • ty: a type
      • ident: an identifier
      • path: a path (e.g. foo, ::std::mem::replace, …)
      • meta: a meta item; the things that go inside #[...]
      • tt: a single token tree

Macro Rules - Captures

  • Captures can be substituted back into the expanded tree

    macro_rules! sub {
        ( $e1:expr , $e2:expr ) => { $e1 - $e2 };
    }
  • A capture will always be inserted as a single AST node.

    • For example, expr will always mean a valid Rust expression.
    • This means we’re no longer vulnerable to C’s substitution problem (the invalid order of operations).
    • Multiple expansions will still cause multiple evaluations:

      macro_rules! twice {
          ( $e:expr ) => { { $e; $e } }
      }
      fn foo() { println!("foo"); }
      twice!(foo()); // expands to { foo(); foo() }: prints twice
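
Both points can be checked with a short sketch. It reuses sub! and twice! from the slides; the functions sub_demo and twice_demo and the call counter are assumptions added for the illustration.

```rust
use std::cell::Cell;

macro_rules! sub {
    ( $e1:expr , $e2:expr ) => { $e1 - $e2 };
}

macro_rules! twice {
    ( $e:expr ) => { { $e; $e } };
}

fn sub_demo() -> i32 {
    // `3 + 4` stays one expression node: -(2 - (3 + 4)) = 5,
    // unlike the unparenthesized C macro.
    -sub!(2, 3 + 4)
}

fn twice_demo() -> u32 {
    let calls = Cell::new(0u32);
    let bump = || calls.set(calls.get() + 1);
    twice!(bump()); // `bump()` appears twice in the expansion
    calls.get()
}
```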

Macro Rules - Repetitions

  • If we want to match a list, a variable number of arguments, etc., we can’t do this with the rules we’ve seen so far.
    • Repetitions allow us to define repeating subpatterns.
    • These have the form $ ( ... ) sep rep.
      • $ is a literal dollar token.
      • ( ... ) is the paren-grouped pattern being repeated.
      • sep is an optional separator token.
        • Usually, this will be , or ;.
      • rep is the required repeat control. This can be either:
        • * zero or more repeats.
        • + one or more repeats.
    • The same pattern is used in the output arm.
      • The separator doesn’t have to be the same.
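
A minimal sketch of a repetition whose input and output separators differ. The macro name sum! is invented for this example: the input is comma-separated, while the expansion repeats `+ $x` with no separator at all.

```rust
macro_rules! sum {
    // Input: zero or more expressions separated by commas.
    ( $( $x:expr ),* ) => {
        // Output: the same repetition with no separator; each rep emits `+ $x`.
        0 $( + $x )*
    };
}

fn sum_demo() -> i32 {
    // Expands to 0 + 1 + 2 + (3 * 4).
    sum!(1, 2, 3 * 4)
}
```

With zero repetitions, sum!() expands to just 0.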

Macro Rules - Repetitions

  • We can use these to reimplement our own vec! macro:
    macro_rules! myvec {
        ( $(              // Start a repetition
            $elem:expr    // Each repetition matches one expr
        ) ,               // Separated by commas
        *                 // Zero or more repetitions
        ) => {
            {             // Braces so we output only one AST (block kind)
                let mut v = Vec::new();
                $(                   // Expand a repetition
                    v.push($elem);   // Expands once for each input rep
                ) *                  // No sep; zero or more reps
                v                    // Return v from the block.
            }
        }
    }
    println!("{:?}", myvec![3, 4]);

Macro Rules - Repetitions

  • Condensed:
    macro_rules! myvec {
        ( $( $elem:expr ),* ) => {
            {
                let mut v = Vec::new();
                $( v.push($elem); )*
                v
            }
        }
    }
    println!("{:?}", myvec![3, 4]);

Macro Rules - Matching

  • Macro rules are matched in order.
  • The parser can never backtrack. Say we have:

    macro_rules! dead_rule {
        ($e:expr) => { ... };
        ($i:ident +) => { ... };
    }
  • If we call it as dead_rule!(x +);, it will actually fail.

    • x + isn’t a valid expression, so we might think it would fail on the first match and then try again on the second.
    • This doesn’t happen!
    • Instead, since it starts out looking like an expression, it commits to that match case.
      • When it turns out not to work, it can’t backtrack on what it’s parsed already, to try again. Instead it just fails.

Macro Rules - Matching

  • To solve this, we need to put more specific rules first:

    macro_rules! dead_rule {
        ($i:ident +) => { ... };
        ($e:expr) => { ... };
    }
  • Now, when we call dead_rule!(x +);, the first case will match.

  • If we called dead_rule!(x + 2);, we can now fall through to the second case.
    • Why does this work?
    • Because matching $i:ident and the literal + only inspects individual tokens. When the leftover 2 fails to match the end of the first rule, that rule is simply rejected without a parse error, so the second case gets its turn.
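
The reordered rules can be tried out with a sketch like the one below. The macro classify! and its string results are assumptions made for the example; the rule patterns are the ones from the slide.

```rust
macro_rules! classify {
    // Specific rule first: an identifier followed by a literal `+`.
    ( $i:ident + ) => { "ident-plus" };
    // General rule last: any expression.
    ( $e:expr ) => { "expr" };
}

fn classify_demo() -> (&'static str, &'static str) {
    // `x` is only matched syntactically and never emitted in the
    // expansion, so it does not need to be a declared variable.
    (classify!(x +), classify!(x + 2))
}
```

Swapping the two rules would make classify!(x +) a compile error, as described on the previous slide.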

Macro Expansion - Hygiene

  • In C, we talked about how a macro can implicitly use (or conflict with) an identifier name in the calling context (#define BAZ a).

  • Rust macros are partially hygienic.

    • Hygienic with regard to most identifiers.
      • These identifiers get a special context internal to the macro expansion.
    • NOT hygienic: generic types (<T>), lifetime parameters (<'a>).

      macro_rules! using_a {
          // Note the extra inner braces: the expansion is a single block.
          ($e:expr) => { { let a = 42; $e } }
      }
      let four = using_a!(a / 10); // this won't compile - nice!
    • We can imagine that this expands to something like:

      let four = { let using_a_1232424_a = 42; a / 10 };

Macro Expansion - Hygiene

  • But if we want to bind a new variable, it’s possible.

    • If a token comes in as an input to the macro, then it is part of the caller’s context, not the macro’s context.

      macro_rules! using_a {
          // Note the extra inner braces: the expansion is a single block.
          ($a:ident, $e:expr) => { { let $a = 42; $e } }
      }
      let four = using_a!(a, a / 10); // compiles!
    • This expands to:

      let four = { let a = 42; a / 10 };

Macro Expansion - Hygiene

  • It’s also possible to create identifiers that will be visible outside of the macro call.

    • This won’t work due to hygiene:

      macro_rules! let_four {
          // No extra braces: the expansion is a bare statement.
          () => { let four = 4; }
      }
      let_four!();
      println!("{}", four); // `four` not declared
    • But this will:

      macro_rules! let_four {
          // No extra braces here either.
          ($i:ident) => { let $i = 4; }
      }
      let_four!(myfour);
      println!("{}", myfour); // works!
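
The hygiene rules from the last three slides can be combined into one sketch. The macro names shadow_free! and bind! and the function hygiene_demo are invented for this example.

```rust
macro_rules! shadow_free {
    // The macro's own `a` lives in the macro's syntax context, so it does
    // not shadow an `a` that appears inside the caller's expression.
    ( $e:expr ) => { { let a = 42; let _ = a; $e } };
}

macro_rules! bind {
    // The caller supplies the name, so the binding is visible to the caller.
    ( $name:ident, $value:expr ) => { let $name = $value; };
}

fn hygiene_demo() -> (i32, i32) {
    let a = 7;
    let doubled = shadow_free!(a * 2); // uses the caller's `a`, not 42
    bind!(b, 5);                       // `b` is usable after this line
    (doubled, b)
}
```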

Nested and Recursive Macros

  • If a macro calls another macro (or itself), this is fine:

    macro_rules! each_tt {
        () => {};
        ( $_tt:tt $($rest:tt)* ) => { each_tt!( $($rest)* ); };
    }
    • The compiler will keep expanding macros until there are none left in the AST (or the recursion limit is hit).

    • The compiler’s recursion limit can be changed with #![recursion_limit="64"] at the crate root.

      • 64 was the default when this lecture was written; newer compilers use a larger default.
      • This applies to all recursive compiler operations, including auto-dereferencing and macro expansion.
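
A recursive macro that produces a value may make this easier to picture. The sketch below assumes a hypothetical max_of! macro: the first rule is the base case and the second recurses on the remaining arguments.

```rust
macro_rules! max_of {
    // Base case: a single expression is its own maximum.
    ( $x:expr ) => { $x };
    // Recursive case: compare the head with the maximum of the rest.
    ( $x:expr , $( $rest:expr ),+ ) => {
        {
            let head = $x;
            let tail = max_of!( $( $rest ),+ );
            if head > tail { head } else { tail }
        }
    };
}

fn max_demo() -> i32 {
    max_of!(3, 9, 4)
}
```

Each level of recursion adds one nested macro invocation, so a very long argument list would eventually run into the recursion limit.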

Macro Debugging

  • Rust has an unstable feature for debugging macro expansion.

    • Especially recursive macro expansions.

      #![feature(trace_macros)]
      macro_rules! each_tt {
          () => {};
          ( $_tt:tt $($rest:tt)* ) => { each_tt!( $($rest)* ); };
      }
      trace_macros!(true);
      each_tt!(spim wak plee whum);
      trace_macros!(false);
    • This will cause the compiler to print:

      each_tt! { spim wak plee whum }
      each_tt! { wak plee whum }
      each_tt! { plee whum }
      each_tt! { whum }
      each_tt! { }
  • More tips on macro debugging are in The Little Book of Rust Macros (TLBORM), section 2.3.4.


Macro Scoping

  • Macro scoping is unlike everything else in Rust.

    • Macros are immediately visible in submodules:

      macro_rules! X { () => {}; }
      mod a { // Or `mod a` could be in `a.rs`.
          X!(); // valid
      }
    • Macros are only defined after they appear in a module:

      mod a { /* X! undefined here */ }
      mod b {
          /* X! undefined here */
          macro_rules! X { () => {}; }
          X!(); // valid
      }
      mod c { /* X! undefined */ } // They don't leak between mods.

Macro Scoping

  • Macros can be exported from modules:

    #[macro_use] // outside of the module definition
    mod b {
        macro_rules! X { () => {}; }
    }
    mod c {
        X!(); // valid
    }
    • Or from crates, using #[macro_export] in the crate.
  • There are a few other weirdnesses of macro scoping.

  • In general, to avoid too much scope weirdness:

    • Put your crate-wide macros at the top of your root module (lib.rs or main.rs).
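
A compact sketch of the #[macro_use] export described above. The module names helpers and consumer, the macro answer!, and the function scoping_demo are all assumptions for this example.

```rust
// `#[macro_use]` lets macros defined in `helpers` stay in scope
// after the module ends.
#[macro_use]
mod helpers {
    macro_rules! answer {
        () => { 42 };
    }
}

// Declared after `helpers`, so `answer!` is visible here.
mod consumer {
    pub fn get() -> i32 {
        answer!()
    }
}

fn scoping_demo() -> i32 {
    consumer::get()
}
```

Without the #[macro_use] attribute, or with consumer declared before helpers, answer!() would not resolve.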

Rust Macros Design Patterns


Macro Callbacks

  • Because of the way macros are expanded, “obviously correct” macro invocations like this won’t actually work:

    macro_rules! expand_to_larch {
        () => { larch };
    }
    macro_rules! recognise_tree {
        (larch) => { println!("larch") };
        (redwood) => { println!("redwood") };
        ($($other:tt)*) => { println!("dunno??") };
    }
    recognise_tree!(expand_to_larch!());
    • This will be expanded like so:

      -> recognise_tree!{ expand_to_larch ! ( ) };
      -> println!("dunno??");
    • Which will match the third pattern, not the first.


Macro Callbacks

  • This can make it hard to split a macro into several parts.

    • This isn’t always a problem - expand_to_larch ! ( ) won’t match an ident, but it will match an expr.
  • The problem can be worked around by using a callback pattern:

    macro_rules! call_with_larch {
        ($callback:ident) => { $callback!(larch) };
    }
    call_with_larch!(recognise_tree);
    • This expands like this:

      -> call_with_larch! { recognise_tree }
      -> recognise_tree! { larch }
      -> println!("larch");
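
To compare the direct call with the callback, the sketch below changes recognise_tree! to return strings instead of printing, which is an assumption made so the two results can be checked. The wrapper functions direct and via_callback are also invented for the example.

```rust
// Never actually expanded in this sketch: the outer macro sees only its tokens.
#[allow(unused_macros)]
macro_rules! expand_to_larch {
    () => { larch };
}

macro_rules! recognise_tree {
    (larch) => { "larch" };
    (redwood) => { "redwood" };
    ($($other:tt)*) => { "dunno??" };
}

macro_rules! call_with_larch {
    // Pass the macro to call as an identifier, then invoke it with `larch`.
    ($callback:ident) => { $callback!(larch) };
}

fn direct() -> &'static str {
    // The input tokens are `expand_to_larch ! ( )`, which hit the catch-all rule.
    recognise_tree!(expand_to_larch!())
}

fn via_callback() -> &'static str {
    call_with_larch!(recognise_tree)
}
```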

Macro TT Munchers

  • This is one of the most powerful and useful macro design patterns. It allows for parsing fairly complex grammars.

  • A tt muncher is a macro which matches a bit at the beginning of its input, then recurses on the remainder of the input.

    • ( $some_stuff:expr , $( $tail:tt )* ) =>
      • The comma matters: an expr capture may only be followed by =>, a comma, or a semicolon.
    • Usually needed for any kind of actual language grammar.
    • Can only match against literals and grammar constructs which can be captured by macro_rules!.
    • Cannot match unbalanced groups.

Macro TT Munchers

    macro_rules! mixed_rules {
        () => {}; // Base case
        (trace $name:ident ; $( $tail:tt )*) => {
            {
                println!(concat!(stringify!($name), " = {:?}"), $name);
                mixed_rules!($($tail)*); // Recurse on the tail of the input
            }
        };
        (trace $name:ident = $init:expr ; $( $tail:tt )*) => {
            {
                let $name = $init;
                println!(concat!(stringify!($name), " = {:?}"), $name);
                mixed_rules!($($tail)*); // Recurse on the tail of the input
            }
        };
    }
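
The same munching shape can also compute a value. The sketch below is a separate, hypothetical tally! macro (not from the lecture) that parses a tiny command language of inc, dec, and add statements, consuming one statement per step and recursing on the tail.

```rust
macro_rules! tally {
    // Base case: nothing left to munch.
    () => { 0i32 };
    // `inc;` adds one to the tally of the remaining input.
    ( inc ; $( $tail:tt )* ) => { 1 + tally!( $( $tail )* ) };
    // `dec;` subtracts one from the tally of the remaining input.
    ( dec ; $( $tail:tt )* ) => { tally!( $( $tail )* ) - 1 };
    // `add <expr>;` adds an arbitrary expression.
    ( add $n:expr ; $( $tail:tt )* ) => { $n + tally!( $( $tail )* ) };
}

fn tally_demo() -> i32 {
    // 1 + (1 + (((2 * 3) + 0) - 1))
    tally!(inc; inc; dec; add 2 * 3;)
}
```

The literal keywords inc, dec, and add come first in each rule, so the expr parser only runs once the input is already known to start with add.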

Macros Rule! Mostly!

  • Macros are pretty great - but not perfect.
    • Macro hygiene isn’t perfect.
    • The scope of where you can use a macro is weird.
    • Handling crates inside of exported macros is weird.
    • It’s impossible to construct entirely new identifiers (e.g. by concatenating two other identifiers).
  • A new, incompatible macro system may appear in future Rust.
    • This would be a new syntax for writing syntax extensions.