Structured Data

CIS 198 Lecture 2


Structured Data

  • Rust has two simple ways of creating structured data types:

    • Structs: C-like structs to hold data.
    • Enums: OCaml-like; data that can be one of several variants.
  • Structs and enums may have one or more implementation blocks (impls) which define methods for the data type.


Structs

  • A struct declaration:
    • Fields are declared with name: type.
    struct Point {
        x: i32,
        y: i32,
    }
  • By convention, structs have CamelCase names, and their fields have snake_case names.
  • Structs may be instantiated with fields assigned in braces.
    let origin = Point { x: 0, y: 0 };

Structs

  • Struct fields may be accessed with dot notation.
  • Structs may not be partially-initialized.
    • You must assign all fields upon creation, or declare the binding without a value (let p: Point;) and assign a complete struct to it later.
    let mut p = Point { x: 19, y: 8 };
    p.x += 1;
    p.y -= 1;
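A minimal runnable sketch of these rules. It assumes Point derives Debug and PartialEq so that results can be compared. moved_point and deferred_point are hypothetical helper names, not part of the lecture.

```rust
#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// All fields are supplied at creation; the `mut` binding then allows
// updating individual fields with dot notation.
fn moved_point() -> Point {
    let mut p = Point { x: 19, y: 8 };
    p.x += 1;
    p.y -= 1;
    p
}

// Hypothetical helper: the binding is declared first and given a complete
// struct later. Every path must assign the whole struct before `p` is used.
fn deferred_point(positive: bool) -> Point {
    let p: Point;
    if positive {
        p = Point { x: 1, y: 1 };
    } else {
        p = Point { x: -1, y: -1 };
    }
    p
}
```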

Structs

  • Structs do not have field-level mutability.
  • Mutability is a property of the variable binding, not the type.
  • Field-level mutability (interior mutability) can be achieved via Cell types.
    • More on these very soon.
    struct Point {
        x: i32,
        mut y: i32, // Illegal!
    }
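This is only a preview, since the lecture returns to Cell types later. The sketch assumes a hypothetical variant of Point whose y field is a std::cell::Cell<i32>. Through a binding that is not mut, y can still be changed but x cannot.

```rust
use std::cell::Cell;

// Hypothetical variant of Point: `y` is wrapped in a Cell, so it can be
// changed even through a binding that is not declared `mut`.
struct Point {
    x: i32,
    y: Cell<i32>,
}

fn bump_y() -> (i32, i32) {
    let p = Point { x: 1, y: Cell::new(2) }; // note: no `mut`
    p.y.set(p.y.get() + 10);                  // allowed: interior mutability
    // p.x += 1;                              // error: `p` is not mutable
    (p.x, p.y.get())
}
```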

Structs

  • Structs are namespaced by the module they are declared in.
    • The fully qualified name of Point below is foo::Point.
  • Struct fields are private by default.
    • They may be made public with the pub keyword.
  • Private fields may only be accessed from within the module where the struct is declared.
    mod foo {
        pub struct Point {
            pub x: i32,
            y: i32,
        }
    }
    fn main() {
        let b = foo::Point { x: 12, y: 12 };
        //      ^~~~~~~~~~~~~~~~~~~~~~~~~~~
        // error: field `y` of struct `foo::Point` is private
    }

Structs

    mod foo {
        pub struct Point {
            pub x: i32,
            y: i32,
        }
        // Creates and returns a new point
        pub fn new(x: i32, y: i32) -> Point {
            Point { x: x, y: y }
        }
    }
  • new is inside the same module as Point, so accessing private fields is allowed.
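Here is a sketch of how code outside foo might use this, assuming a hypothetical accessor y_of added to foo. Callers build the point through foo::new and read the public x directly. They can reach the private y only through a function defined inside foo.

```rust
mod foo {
    pub struct Point {
        pub x: i32,
        y: i32,
    }

    // Lives in the same module as `Point`, so it may set the private `y`.
    pub fn new(x: i32, y: i32) -> Point {
        Point { x: x, y: y }
    }

    // Hypothetical accessor: the only way for outside code to read `y`.
    pub fn y_of(p: &Point) -> i32 {
        p.y
    }
}

fn make_point() -> (i32, i32) {
    let p = foo::new(12, 7);     // OK: no private field is named here
    // let q = foo::Point { x: 12, y: 7 }; // error: field `y` is private
    (p.x, foo::y_of(&p))         // `x` is pub; `y` goes through `foo`
}
```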

Struct matching

  • Destructure structs with match expressions.
    pub struct Point {
        x: i32,
        y: i32,
    }
    let p = Point { x: 1, y: 2 };
    match p {
        Point { x, y } => println!("({}, {})", x, y)
    }

Struct matching

  • Some other tricks for struct matches:
    match p {
        Point { y: y1, x: x1 } => println!("({}, {})", x1, y1)
    }
    match p {
        Point { y, .. } => println!("{}", y)
    }
  • Fields do not need to be in order.
  • List fields inside braces to bind struct members to those variable names.
    • Use struct_field: new_var_binding to change the variable it’s bound to.
  • Omit fields: use .. to ignore all unnamed fields.
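The sketch below combines these tricks on the same Point type. describe and sum_fields are hypothetical helpers. The y: 0 and x: 0 sub-patterns are literal patterns, a small extension beyond the slide, and they keep the arms distinct.

```rust
struct Point {
    x: i32,
    y: i32,
}

fn describe(p: Point) -> String {
    match p {
        // Fields in any order; `y: 0` only matches when y is zero, and
        // `x: x1` binds the x field to a new name.
        Point { y: 0, x: x1 } => format!("on the x-axis at {}", x1),
        // Check `x` only; `..` ignores the remaining fields.
        Point { x: 0, .. } => String::from("on the y-axis"),
        // Bind every field to a variable of the same name.
        Point { x, y } => format!("({}, {})", x, y),
    }
}

// Destructuring also works in `let`, since this pattern cannot fail.
fn sum_fields(p: Point) -> i32 {
    let Point { x, y } = p;
    x + y
}
```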

Struct Update Syntax

  • A struct initializer can contain .. s to copy some or all fields from s.
  • Any fields you don’t specify in the initializer get copied (or moved, for non-Copy fields) from s.
  • s must have the same type as the struct being created.
    • No copying same-type fields from different-type structs!
    struct Foo { a: i32, b: i32, c: i32, d: i32, e: i32 }
    let mut x = Foo { a: 1, b: 1, c: 2, d: 2, e: 3 };
    let x2 = Foo { e: 4, .. x };
    // Useful to update multiple fields of the same struct:
    x = Foo { a: 2, b: 2, e: 2, .. x };
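Below is a runnable version of the example above. It assumes Foo derives Debug and PartialEq so the results can be checked, and update is a hypothetical wrapper. Every field is an i32, which is a Copy type, so x stays usable after .. x.

```rust
#[derive(Debug, PartialEq)]
struct Foo { a: i32, b: i32, c: i32, d: i32, e: i32 }

// Hypothetical wrapper returning both structs so they can be inspected.
fn update() -> (Foo, Foo) {
    let mut x = Foo { a: 1, b: 1, c: 2, d: 2, e: 3 };
    // `e` is given explicitly; a, b, c, d are copied from `x`.
    let x2 = Foo { e: 4, ..x };
    // Reassign several fields at once; c and d come from the old `x`.
    x = Foo { a: 2, b: 2, e: 2, ..x };
    (x, x2)
}
```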

Tuple Structs

  • Variant on structs that has a name, but no named fields.
  • Have numbered field accessors, like tuples (e.g. x.0, x.1, etc).
  • Tuple structs can also be destructured in match patterns.
    struct Color(i32, i32, i32);
    let mut c = Color(0, 255, 255);
    c.0 = 255;
    match c {
        Color(r, g, b) => println!("({}, {}, {})", r, g, b)
    }

Tuple Structs

  • Helpful if you want to create a new type that’s not just an alias.
    • This is often referred to as the “newtype” pattern.
  • These two types are structurally identical, but not equatable.
    // Not equatable
    struct Meters(i32);
    struct Yards(i32);
    // May be compared using `==`, added with `+`, etc.
    type MetersAlias = i32;
    type YardsAlias = i32;
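This sketch shows the difference, using hypothetical helpers alias_sum and yards_to_meters. The aliases mix freely because both are just i32. The newtypes have to be converted explicitly.

```rust
struct Meters(i32);
struct Yards(i32);

type MetersAlias = i32;
type YardsAlias = i32;

// Aliases are only other names for i32, so adding them type-checks.
fn alias_sum(m: MetersAlias, y: YardsAlias) -> i32 {
    m + y
}

// Meters and Yards are distinct types: `Meters(1) == Yards(1)` or passing
// a Meters here would not compile. Conversion must be written out.
// 1 yard = 0.9144 m; integer division rounds toward zero (illustrative only).
fn yards_to_meters(y: Yards) -> Meters {
    Meters(y.0 * 9144 / 10000)
}
```

The integer rounding is a simplification to keep the example small.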

Unit Structs (Zero-Sized Types)

  • Structs can be declared to have zero size.
    • This struct has no fields!
  • We can still instantiate it.
  • It can be used as a “marker” type on other data structures.
    • Useful to indicate, e.g., the type of data a container is storing.
    struct Unit;
    let u = Unit;
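A common way to use a unit struct as a marker is std::marker::PhantomData, which records a type parameter without storing a value of it. This sketch assumes hypothetical marker types Celsius and Fahrenheit and a generic Temperature<U>. Generics are covered later in the course.

```rust
use std::marker::PhantomData;

// Two hypothetical zero-sized marker types.
struct Celsius;
struct Fahrenheit;

// A reading tagged with its unit at the type level. PhantomData<U> records
// the marker type without storing a value of it, so it adds no size.
struct Temperature<U> {
    degrees: f64,
    unit: PhantomData<U>,
}

fn boiling_c() -> Temperature<Celsius> {
    Temperature { degrees: 100.0, unit: PhantomData }
}

fn boiling_f() -> Temperature<Fahrenheit> {
    Temperature { degrees: 212.0, unit: PhantomData }
}
```

Temperature<Celsius> and Temperature<Fahrenheit> are different types, so passing one where the other is expected is a compile-time error. Neither marker adds any size at runtime.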

Enums

  • An enum, or “sum type”, is a way to express some data that may be one of several things.
  • Much more powerful than in Java, C, C++, C#…
  • Each enum variant can have:
    • no data (unit variant)
    • named data (struct variant)
    • unnamed ordered data (tuple variant)
    enum Resultish {
        Ok,
        Warning { code: i32, message: String },
        Err(String)
    }

Enums

  • Enum variants are namespaced by their enum type: Resultish::Ok.
    • You can import all variants with use Resultish::*.
  • Enums, much as you’d expect, can be matched on like any other data type.
    match make_request() {
        Resultish::Ok =>
            println!("Success!"),
        Resultish::Warning { code, message } =>
            println!("Warning: {}!", message),
        Resultish::Err(s) =>
            println!("Failed with error: {}", s),
    }
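The slides leave make_request undefined. This runnable sketch assumes a hypothetical make_request that takes a status number and picks a variant from it. It also assumes a report function that returns the message instead of printing it.

```rust
enum Resultish {
    Ok,
    Warning { code: i32, message: String },
    Err(String),
}

// Hypothetical stand-in for the slides' make_request.
fn make_request(status: i32) -> Resultish {
    match status {
        200 => Resultish::Ok,
        300 => Resultish::Warning { code: 300, message: String::from("redirected") },
        _ => Resultish::Err(format!("status {}", status)),
    }
}

// Each arm handles one variant and binds that variant's data.
fn report(r: Resultish) -> String {
    match r {
        Resultish::Ok => String::from("Success!"),
        Resultish::Warning { code, message } => format!("Warning {}: {}!", code, message),
        Resultish::Err(s) => format!("Failed with error: {}", s),
    }
}
```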

Enums

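  • Constructors of tuple variants, like Resultish::Err, can be used as functions (Resultish::Err has type fn(String) -> Resultish). Unit variants like Resultish::Ok are plain values, not functions.
  • This is not very useful yet, but will become so when we cover closures & iterators.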
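Here is a small preview using hypothetical functions make_err and wrap_errors. The tuple variant Resultish::Err is used directly as a function value. wrap_errors relies on Iterator::map, which the lecture covers later.

```rust
#[derive(Debug, PartialEq)]
enum Resultish {
    Ok,
    Warning { code: i32, message: String },
    Err(String),
}

// The tuple variant's constructor stored as a plain function pointer.
fn make_err(msg: &str) -> Resultish {
    let ctor: fn(String) -> Resultish = Resultish::Err;
    ctor(msg.to_string())
}

// Passed straight to `map` instead of writing `|s| Resultish::Err(s)`.
fn wrap_errors(messages: Vec<String>) -> Vec<Resultish> {
    messages.into_iter().map(Resultish::Err).collect()
}
```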

Recursive Types

  • You might think to create a nice functional-style List type:
    enum List {
        Nil,
        Cons(i32, List),
    }

Recursive Types

  • Such a definition would have infinite size at compile time!
  • Structs & enums are stored inline by default, so they may not be recursive.
    • i.e. elements are not stored by reference, unless explicitly specified.
  • The compiler tells us how to fix this, but what’s a box?
    enum List {
        Nil,
        Cons(i32, List),
    }
    // error: invalid recursive enum type
    // help: wrap the inner value in a box to make it representable

Boxes, Briefly

  • A box (lowercase) is a general term for one of Rust’s ways of allocating data on the heap.
  • A Box<T> (uppercase) is a heap pointer with exactly one owner.
    • A Box owns its data (the T) uniquely; it can’t be aliased.
  • Boxes are automatically freed (dropped) when they go out of scope.
  • Create a Box with Box::new():
    let boxed_five = Box::new(5);
    enum List {
        Nil,
        Cons(i32, Box<List>), // OK!
    }
  • We’ll cover these in greater detail when we talk more about pointers.
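With the Box in place, the type compiles. The sketch below walks such a list using a hypothetical sum function. Matching on &List binds value as &i32 and rest as &Box<List>. The recursive call then coerces rest to &List.

```rust
enum List {
    Nil,
    Cons(i32, Box<List>),
}

// Hypothetical helper: recursively adds up the elements.
fn sum(list: &List) -> i32 {
    match list {
        List::Nil => 0,
        // `value: &i32`, `rest: &Box<List>` (deref-coerced to &List below)
        List::Cons(value, rest) => value + sum(rest),
    }
}
```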

Methods

    impl Point {
        pub fn distance(&self, other: Point) -> f32 {
            let (dx, dy) = (self.x - other.x, self.y - other.y);
            ((dx.pow(2) + dy.pow(2)) as f32).sqrt()
        }
    }
    fn main() {
        let p = Point { x: 1, y: 2 };
        let q = Point { x: 4, y: 6 };
        let d = p.distance(q); // distance needs the other point; d is 5.0
    }
  • Methods can be implemented for structs and enums in an impl block.
  • Like fields, methods may be accessed via dot notation.
  • Can be made public with pub.
    • impl blocks themselves are never marked pub; visibility is set on each item inside them.
  • Work for enums in exactly the same way they do for structs.

Methods

  • The first argument to a method, named self, determines what kind of ownership the method requires.
  • &self: the method borrows the value.
    • Use this unless you need a different ownership model.
  • &mut self: the method mutably borrows the value.
    • The function needs to modify the struct it’s called on.
  • self: the method takes ownership.
    • The function consumes the value and may return something else.

Methods

    impl Point {
        fn distance(&self, other: Point) -> f32 {
            let (dx, dy) = (self.x - other.x, self.y - other.y);
            ((dx.pow(2) + dy.pow(2)) as f32).sqrt()
        }
        fn translate(&mut self, x: i32, y: i32) {
            self.x += x;
            self.y += y;
        }
        fn mirror_y(self) -> Point {
            Point { x: -self.x, y: self.y }
        }
    }
  • distance needs to access but not modify fields.
  • translate modifies the struct fields.
  • mirror_y returns an entirely new struct, consuming the old one.
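This sketch shows which binding each receiver needs. It assumes Point derives Debug and PartialEq, and walk is a hypothetical driver function.

```rust
#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

impl Point {
    fn distance(&self, other: Point) -> f32 {
        let (dx, dy) = (self.x - other.x, self.y - other.y);
        ((dx.pow(2) + dy.pow(2)) as f32).sqrt()
    }
    fn translate(&mut self, x: i32, y: i32) {
        self.x += x;
        self.y += y;
    }
    fn mirror_y(self) -> Point {
        Point { x: -self.x, y: self.y }
    }
}

// Hypothetical driver exercising each kind of receiver.
fn walk() -> (Point, f32) {
    let mut p = Point { x: 1, y: 2 };
    p.translate(2, 2);                        // &mut self: needs `mut p`
    let d = p.distance(Point { x: 0, y: 0 }); // &self: p is only borrowed
    let m = p.mirror_y();                     // self: p is moved here
    // p.translate(1, 1);                     // error: use of moved value `p`
    (m, d)
}
```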

Associated Functions

    impl Point {
        fn new(x: i32, y: i32) -> Point {
            Point { x: x, y: y }
        }
    }
    fn main() {
        let p = Point::new(1, 2);
    }
  • Associated function: like a method, but does not take self.
    • This is called with namespacing syntax: Point::new().
      • Not Point.new().
    • Like a “static” method in Java.
  • A constructor-like function is usually named new.
    • No inherent notion of constructors, no automatic construction.
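This sketch of associated functions adds a hypothetical origin helper alongside new. Rust has no overloading, so a second constructor-like function needs its own name.

```rust
#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

impl Point {
    // Associated functions: no `self` parameter, called as `Point::name(..)`.
    fn new(x: i32, y: i32) -> Point {
        Point { x: x, y: y }
    }

    // Hypothetical second "constructor" with its own name.
    fn origin() -> Point {
        Point::new(0, 0)
    }
}
```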

Implementations

  • Methods, associated functions, and functions in general may not be overloaded.
    • Instead, alternatives get distinct names: e.g. Vec::new() and Vec::with_capacity(capacity: usize) are both constructors for Vec.
  • Methods may not be inherited.
    • Rust structs & enums must be composed instead.
    • However, traits (coming soon) have basic inheritance.
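This sketch shows composition in place of inheritance, with hypothetical Engine and Car types. Car holds an Engine as a field and forwards calls to it explicitly.

```rust
// Hypothetical types: a Car contains an Engine rather than inheriting from it.
struct Engine {
    horsepower: u32,
}

impl Engine {
    fn describe(&self) -> String {
        format!("{} hp engine", self.horsepower)
    }
}

struct Car {
    engine: Engine,
    wheels: u32,
}

impl Car {
    // Delegation: reuse Engine's behavior by calling it on the field.
    fn describe(&self) -> String {
        format!("{} wheels, {}", self.wheels, self.engine.describe())
    }

    fn horsepower(&self) -> u32 {
        self.engine.horsepower
    }
}
```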

Patterns

  • Use ..= to specify an inclusive range of values. Useful for numerics and chars.
    • Older Rust accepted ... for this; it is deprecated and rejected in the 2021 edition.
  • Use _ to match any value without binding it.
    let x = 17;
    match x {
        0..=5 => println!("zero through five (inclusive)"),
        _ => println!("You still lose the game."),
    }
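This sketch applies range patterns to chars through a hypothetical classify function. Each ..= range includes both of its endpoints.

```rust
// Hypothetical helper: buckets an ASCII character by kind.
fn classify(c: char) -> &'static str {
    match c {
        'a'..='z' => "lowercase",
        'A'..='Z' => "uppercase",
        '0'..='9' => "digit",
        _ => "other", // anything not covered above
    }
}
```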

match: References

  • Get a reference to a variable by asking for it with ref.
    let x = 17;
    match x {
        ref r => println!("Of type &i32: {}", r),
    }
  • And get a mutable reference with ref mut.
    • Only if the variable was declared mut.
    let mut x = 17;
    match x {
        ref r if x == 5 => println!("{}", r),
        ref mut r => *r = 5
    }
  • ref works the same way in let bindings: let ref r = x; is equivalent to let r = &x;.
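This sketch shows both forms with hypothetical helpers clamp_to_five and let_ref_demo. The ref mut arm writes through r into x itself. let ref only borrows, so the String can still be moved out afterwards.

```rust
// Hypothetical helper: values above 5 are overwritten through `ref mut`.
fn clamp_to_five(mut x: i32) -> i32 {
    match x {
        ref r if *r <= 5 => println!("{} is already small", r), // r: &i32
        ref mut r => *r = 5,                                     // r: &mut i32
    }
    x
}

// Hypothetical helper: `let ref` borrows instead of moving.
fn let_ref_demo() -> (usize, String) {
    let s = String::from("hello");
    let ref r = s;  // equivalent to `let r = &s;`
    let len = r.len();
    (len, s)        // `s` was only borrowed, so it can still be moved out
}
```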

if-let Statements

  • If you only need a single match arm, it often makes more sense to use Rust’s if-let construct.
  • For example, given the Resultish type we defined earlier:
    enum Resultish {
        Ok,
        Warning { code: i32, message: String },
        Err(String),
    }

if-let Statements

  • Suppose we want to report an error, but simply print “ok.” for Warnings and Oks.
    match make_request() {
        Resultish::Err(_) => println!("Total and utter failure."),
        _ => println!("ok."),
    }
  • We can simplify this with an if-let expression, which can also bind the error message s:
    let result = make_request();
    if let Resultish::Err(s) = result {
        println!("Total and utter failure: {}", s);
    } else {
        println!("ok.");
    }
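Here is a runnable sketch of the if-let version. It assumes a hypothetical summarize function that receives the result directly instead of calling make_request.

```rust
enum Resultish {
    Ok,
    Warning { code: i32, message: String },
    Err(String),
}

// Only the Err case is singled out; everything else takes the else branch.
fn summarize(result: Resultish) -> String {
    if let Resultish::Err(s) = result {
        format!("Total and utter failure: {}", s)
    } else {
        String::from("ok.")
    }
}
```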

while-let Statement

  • There’s also a similar while-let loop, which works like if-let but repeats until the pattern fails to match.
    while let Resultish::Err(s) = make_request() {
        println!("Total and utter failure: {}", s);
    }
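The slides' make_request is undefined, so this sketch assumes a hypothetical queue of responses held in a Vec. Vec::pop returns an Option, so the pattern nests Resultish::Err inside Some. The loop ends at the first response that is not an error, or when the Vec is empty.

```rust
enum Resultish {
    Ok,
    Warning { code: i32, message: String },
    Err(String),
}

// Hypothetical helper: collects errors from the end of the queue until
// a non-error response (or an empty queue) stops the loop.
fn drain_errors(mut responses: Vec<Resultish>) -> Vec<String> {
    let mut failures = Vec::new();
    while let Some(Resultish::Err(s)) = responses.pop() {
        failures.push(s);
    }
    failures
}
```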

Inner Bindings

  • With more complicated data structures, use @ to bind a name to a value while also matching it against a pattern, including inner elements.
    #[derive(Debug)]
    enum A { None, Some(B) }
    #[derive(Debug)]
    enum B { None, Some(i32) }
    fn foo(x: A) {
        match x {
            a @ A::None => println!("a is A::{:?}", a),
            ref a @ A::Some(B::None) => println!("a is A::{:?}", *a),
            A::Some(b @ B::Some(_)) => println!("b is B::{:?}", b),
        }
    }
    foo(A::None);             // ==> a is A::None
    foo(A::Some(B::None));    // ==> a is A::Some(None)
    foo(A::Some(B::Some(5))); // ==> b is B::Some(5)
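@ also works with range patterns, binding the matched value while testing it. Here is a sketch with a hypothetical bucket function.

```rust
// Hypothetical helper: `n` is bound only if the value falls in the range.
fn bucket(age: u32) -> String {
    match age {
        n @ 0..=12 => format!("child ({})", n),
        n @ 13..=19 => format!("teen ({})", n),
        n => format!("adult ({})", n),
    }
}
```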

Lifetimes

  • There’s one more piece to the ownership puzzle: Lifetimes.
  • Lifetimes generally have a pretty steep learning curve.
    • We may cover them again later on in the course under a broader scope if necessary.
  • Don’t worry if you don’t understand these right away.

Lifetimes

  • Imagine this:
    1. I acquire a resource.
    2. I lend you a reference to my resource.
    3. I decide that I’m done with the resource, so I deallocate it.
    4. You still hold a reference to the resource, and decide to use it.
    5. You crash 😿.
  • We’ve already said that Rust makes this scenario impossible, but glossed over how.
  • We need to prove to the compiler that step 3 will never happen before step 4.

Lifetimes

  • Ordinarily, references have an implicit lifetime that we don’t need to care about:
      fn foo(x: &i32) {
          // ...
      }
  • However, we can explicitly provide one instead:

      fn bar<'a>(x: &'a i32) {
          // ...
      }
  • 'a, pronounced “tick-a” or “the lifetime a”, is a named lifetime parameter.

    • <'a> declares generic parameters, including lifetime parameters.
    • The type &'a i32 is a reference to an i32 that lives at least as long as the lifetime 'a.



Lifetimes

  • Thanks to lifetime elision, the compiler can infer 'a above on its own, but this isn’t always possible.
  • Scenarios that involve multiple references or returning references often require explicit lifetimes.
    • Speaking of which…

Multiple Lifetime Parameters

    fn borrow_x_or_y<'a>(x: &'a str, y: &'a str) -> &'a str;
  • In borrow_x_or_y, all input/output references have the same lifetime.
    • x and y are borrowed (the reference is alive) as long as the returned reference exists.
    fn borrow_p<'a, 'b>(p: &'a str, q: &'b str) -> &'a str;
  • In borrow_p, the output reference has the same lifetime as p.
    • q has a separate lifetime with no constrained relationship to p.
    • p is borrowed as long as the returned reference exists.
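The slides give only these two signatures, so the bodies in this sketch are assumptions. outlives_q is a hypothetical demo. borrow_p ties its result only to p, so the result can outlive q.

```rust
// Assumed body: returns the longer input, so the result may borrow from
// either `x` or `y`, which is why both share 'a.
fn borrow_x_or_y<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() { x } else { y }
}

// Assumed body: strips leading copies of `q` from `p`. The result points
// into `p` only, so `q` can have an unrelated lifetime 'b.
fn borrow_p<'a, 'b>(p: &'a str, q: &'b str) -> &'a str {
    p.trim_start_matches(q)
}

// Hypothetical demo: `result` stays valid after `q` is dropped.
fn outlives_q() -> String {
    let p = String::from("--flag");
    let result;
    {
        let q = String::from("-");
        result = borrow_p(&p, &q); // `result` is tied only to `p`
    } // `q` is dropped here
    result.to_string()
}
```

Swapping borrow_p for borrow_x_or_y inside outlives_q would be rejected by the compiler, because the result would then also be tied to q.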

Lifetimes

  • Okay, great, but what does this all mean?
    • If a reference R has a lifetime 'a, it is guaranteed that it will not outlive the owner of its underlying data (the value at *R)
    • If a reference R has a lifetime of 'a, anything else with the lifetime 'a is guaranteed to live as long as R.
  • This will probably become more clear the more you use lifetimes yourself.

Lifetimes - structs

  • Structs (and struct members) can have lifetime parameters.
    struct Pizza(Vec<i32>);
    struct PizzaSlice<'a> {
        pizza: &'a Pizza, // <- references in structs must
        index: u32,       //    ALWAYS have explicit lifetimes
    }
    let p1 = Pizza(vec![1, 2, 3, 4]);
    {
        let s1 = PizzaSlice { pizza: &p1, index: 2 }; // this is okay
    }
    let s2;
    {
        let p2 = Pizza(vec![1, 2, 3, 4]);
        s2 = PizzaSlice { pizza: &p2, index: 2 };
        // no good - why?
    }
    println!("{}", s2.pizza.0[s2.index as usize]); // s2 is used here



Lifetimes - structs

  • Lifetimes can be constrained to “outlive” others.
    • Same syntax as a bound on a type parameter: <'b: 'a> means “'b outlives 'a”.
    struct Pizza(Vec<i32>);
    struct PizzaSlice<'a> { pizza: &'a Pizza, index: u32 }
    struct PizzaConsumer<'a, 'b: 'a> { // says "b outlives a"
        slice: PizzaSlice<'a>, // <- currently eating this one
        pizza: &'b Pizza,      // <- so we can get more pizza
    }
    fn get_another_slice(c: &mut PizzaConsumer, index: u32) {
        c.slice = PizzaSlice { pizza: c.pizza, index: index };
    }
    let p = Pizza(vec![1, 2, 3, 4]);
    {
        let s = PizzaSlice { pizza: &p, index: 1 };
        let mut c = PizzaConsumer { slice: s, pizza: &p };
        get_another_slice(&mut c, 2);
    }

Lifetimes - 'static

  • There is one reserved, special lifetime, named 'static.
  • 'static means that a reference may be kept (and will be valid) for the lifetime of the entire program.
    • i.e. the data referred to will never go out of scope.
  • All string literals have the type &'static str.
    let s1: &str = "Hello";
    let s2: &'static str = "World";
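This sketch uses hypothetical helpers to show two consequences of 'static. A function with no reference inputs can still return a &'static str. A 'static reference also stays usable after the variable that held it goes out of scope.

```rust
// A string literal is stored in the program binary, so it is valid for
// the whole run and can be returned from a function with no inputs.
fn greeting() -> &'static str {
    "Hello"
}

// Hypothetical helper: every arm returns a literal, so the result is 'static.
fn count_word(n: u32) -> &'static str {
    match n {
        0 => "zero",
        1 => "one",
        _ => "many",
    }
}

// Hypothetical helper: the variable `literal` goes away, the data does not.
fn held_past_scope() -> &'static str {
    let s;
    {
        let literal: &'static str = "World";
        s = literal;
    }
    s
}
```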

Structured Data With Lifetimes

  • Any struct or enum that contains a reference must declare an explicit lifetime parameter for it.
  • Normal lifetime rules otherwise apply.
    struct Foo<'a, 'b> {
        v: &'a Vec<i32>,
        s: &'b str,
    }
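The same rule applies to enums. This sketch assumes a hypothetical Token<'a> whose Word variant borrows from the input string.

```rust
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Word(&'a str), // borrows part of the input
    Number(i64),   // owns its data; no lifetime involved
}

// Hypothetical helper: the returned Token may borrow from `input`,
// so both share the lifetime 'a.
fn first_token<'a>(input: &'a str) -> Token<'a> {
    let word = input.split_whitespace().next().unwrap_or("");
    match word.parse::<i64>() {
        Ok(n) => Token::Number(n),
        Err(_) => Token::Word(word),
    }
}
```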

Lifetimes in impl Blocks

  • Implementing methods on the Foo struct requires lifetime annotations too!
  • You can read this block as “the implementation using the lifetimes 'a and 'b for the struct Foo using the lifetimes 'a and 'b.”
    impl<'a, 'b> Foo<'a, 'b> {
        fn new(v: &'a Vec<i32>, s: &'b str) -> Foo<'a, 'b> {
            Foo {
                v: v,
                s: s,
            }
        }
    }
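This sketch extends the impl with hypothetical first_two and label methods. Each returns a reference carrying one of the struct’s own lifetimes, so the result may outlive the Foo value itself.

```rust
struct Foo<'a, 'b> {
    v: &'a Vec<i32>,
    s: &'b str,
}

impl<'a, 'b> Foo<'a, 'b> {
    fn new(v: &'a Vec<i32>, s: &'b str) -> Foo<'a, 'b> {
        Foo { v: v, s: s }
    }

    // Hypothetical: returns up to the first two numbers. The slice borrows
    // from the vector itself (lifetime 'a), not from `self`.
    fn first_two(&self) -> &'a [i32] {
        let v: &'a Vec<i32> = self.v; // copy the reference out of `self`
        &v[..v.len().min(2)]
    }

    // Hypothetical: hands back the stored string slice with lifetime 'b.
    fn label(&self) -> &'b str {
        self.s
    }
}
```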