Strings

The two most commonly used string types in Rust are String and &str.

A String is stored as a vector of bytes (Vec<u8>), but guaranteed to
always be a valid UTF-8 sequence. String is heap allocated, growable and not
null terminated.

&str is a slice (&[u8]) that always points to a valid UTF-8 sequence, and
can be used to view into a String, just like &[T] is a view into Vec<T>.

    fn main() {
        // (all the type annotations are superfluous)
        // A reference to a string allocated in read-only memory
        let pangram: &'static str = "the quick brown fox jumps over the lazy dog";
        println!("Pangram: {}", pangram);

        // Iterate over words in reverse, no new string is allocated
        println!("Words in reverse");
        for word in pangram.split_whitespace().rev() {
            println!("> {}", word);
        }

        // Copy chars into a vector, sort and remove duplicates
        let mut chars: Vec<char> = pangram.chars().collect();
        chars.sort();
        chars.dedup();

        // Create an empty and growable `String`
        let mut string = String::new();
        for c in chars {
            // Insert a char at the end of string
            string.push(c);
            // Insert a string at the end of string
            string.push_str(", ");
        }

        // The trimmed string is a slice to the original string, hence no new
        // allocation is performed
        let chars_to_trim: &[char] = &[' ', ','];
        let trimmed_str: &str = string.trim_matches(chars_to_trim);
        println!("Used characters: {}", trimmed_str);

        // Heap allocate a string
        let alice = String::from("I like dogs");
        // Allocate new memory and store the modified string there
        let bob: String = alice.replace("dog", "cat");

        println!("Alice says: {}", alice);
        println!("Bob says: {}", bob);
    }
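
To make the relationship between String and &str more concrete, here is a minimal sketch of borrowing views from an owned String. The helper `first_word` is a hypothetical name introduced only for this illustration; everything else is from the standard library.

```rust
// Hypothetical helper: returns the first whitespace-separated word of `text`
// as a borrowed slice, or an empty slice if there is none.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

fn main() {
    let owned: String = String::from("I like dogs");

    // `&owned` coerces from `&String` to `&str`; no bytes are copied
    let word: &str = first_word(&owned);
    println!("First word: {}", word);

    // Slicing by byte range also yields a `&str` view into the same buffer
    let tail: &str = &owned[7..];
    println!("Tail: {}", tail);

    // Going the other way allocates a new `String`
    let copy: String = tail.to_string();
    println!("Owned copy: {}", copy);
}
```

Note that range slicing works on byte offsets, not characters, and panics if an offset falls inside a multi-byte character. The example sticks to ASCII, where every character is one byte.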

More str/String methods can be found under the std::str and std::string modules.

Literals and escapes

There are multiple ways to write string literals with special characters in them.
All result in a similar &str, so it’s best to use whichever form is most
convenient to write. Similarly, there are multiple ways to write byte string literals,
which all result in &[u8; N].

Generally special characters are escaped with a backslash character: \.
This way you can add any character to your string, even unprintable ones
and ones that you don’t know how to type. If you want a literal backslash,
escape it with another one: \\

String or character literal delimiters occurring within a literal must be escaped: "\"", '\''.

    fn main() {
        // You can use escapes to write bytes by their hexadecimal values...
        let byte_escape = "I'm writing \x52\x75\x73\x74!";
        println!("What are you doing\x3F (\\x3F means ?) {}", byte_escape);

        // ...or Unicode code points.
        let unicode_codepoint = "\u{211D}";
        let character_name = "\"DOUBLE-STRUCK CAPITAL R\"";

        println!("Unicode character {} (U+211D) is called {}",
                 unicode_codepoint, character_name);

        let long_string = "String literals
                            can span multiple lines.
                            The linebreak and indentation here ->\
                            <- can be escaped too!";
        println!("{}", long_string);
    }
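
The delimiter and backslash escapes can be checked in isolation. The following is a small sketch; `escaped_sample` is a hypothetical helper name used only for this illustration.

```rust
// Hypothetical helper: builds a string holding one escaped double quote,
// one escaped single quote and one escaped backslash.
fn escaped_sample() -> String {
    let double_quote: &str = "\"";
    let single_quote: char = '\'';
    let backslash: &str = "\\";
    format!("{}{}{}", double_quote, single_quote, backslash)
}

fn main() {
    let sample = escaped_sample();
    // Each escape sequence produces exactly one character
    println!("{} has {} characters", sample, sample.chars().count());

    // Hexadecimal escapes spell out the same bytes as the plain text
    assert_eq!("\x52\x75\x73\x74", "Rust");

    // A Unicode escape yields one char, which may take several bytes in UTF-8
    assert_eq!("\u{211D}".chars().count(), 1);
    assert_eq!("\u{211D}".len(), 3);
}
```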

Sometimes there are just too many characters that need to be escaped or it’s just
much more convenient to write a string out as-is. This is where raw string literals come into play.

```rust, editable
fn main() {
    let raw_str = r"Escapes don't work here: \x3F \u{211D}";
    println!("{}", raw_str);

    // If you need quotes in a raw string, add a pair of #s
    let quotes = r#"And then I said: "There is no escape!""#;
    println!("{}", quotes);

    // If you need "# in your string, just use more #s in the delimiter.
    // You can use up to 255 #s.
    let longer_delimiter = r###"A string with "# in it. And even "##!"###;
    println!("{}", longer_delimiter);
}
```
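
One way to see what a raw string does is to compare it with the escaped literal that produces the same text. This is a minimal sketch; the helper names `raw_path` and `escaped_path`, and the path itself, are made up for illustration.

```rust
// Hypothetical helper: a Windows-style path written as a raw string literal.
fn raw_path() -> &'static str {
    r"C:\temp\new"
}

// Hypothetical helper: the same path written with escaped backslashes.
fn escaped_path() -> &'static str {
    "C:\\temp\\new"
}

fn main() {
    // Both literals produce the same `&str`
    assert_eq!(raw_path(), escaped_path());

    // In a raw string, \x3F stays as four separate characters...
    assert_eq!(r"\x3F".len(), 4);
    // ...while in a normal string it is the single character '?'
    assert_eq!("\x3F", "?");

    // The # delimiters let quotes appear without escaping
    assert_eq!(r#"say "hi""#, "say \"hi\"");

    println!("{}", raw_path());
}
```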

Want a string that's not UTF-8? (Remember, `str` and `String` must be valid UTF-8.)
Or maybe you want an array of bytes that's mostly text? Byte strings to the rescue!

```rust, editable
use std::str;

fn main() {
    // Note that this is not actually a &str
    let bytestring: &[u8; 20] = b"this is a bytestring";

    // Byte arrays don't have Display so printing them is a bit limited
    println!("A bytestring: {:?}", bytestring);

    // Bytestrings can have byte escapes...
    let escaped = b"\x52\x75\x73\x74 as bytes";
    // ...but no unicode escapes
    // let escaped = b"\u{211D} is not allowed";
    println!("Some escaped bytes: {:?}", escaped);

    // Raw bytestrings work just like raw strings
    let raw_bytestring = br"\u{211D} is not escaped here";
    println!("{:?}", raw_bytestring);

    // Converting a byte array to str can fail
    if let Ok(my_str) = str::from_utf8(raw_bytestring) {
        println!("And the same as text: '{}'", my_str);
    }

    let _quotes = br#"You can also use "fancier" formatting, \
                    like with normal raw strings"#;

    // Bytestrings don't have to be UTF-8
    let shift_jis = b"\x82\xe6\x82\xa8\x82\xb1\x82"; // "ようこそ" in SHIFT-JIS

    // But then they can't always be converted to str
    match str::from_utf8(shift_jis) {
        Ok(my_str) => println!("Conversion successful: '{}'", my_str),
        Err(e) => println!("Conversion failed: {:?}", e),
    };
}
```

For conversions between character encodings check out the encoding crate.
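
The standard library does not transcode between encodings, but when the goal is only to display bytes that may not be valid UTF-8, `String::from_utf8_lossy` can stand in for a failed `str::from_utf8`. The sketch below shows that fallback; `to_text_lossy` is a hypothetical wrapper name, and this is not a replacement for a real encoding conversion.

```rust
use std::borrow::Cow;

// Hypothetical helper: decodes bytes as UTF-8, substituting U+FFFD
// (the replacement character) for anything that is not valid UTF-8.
fn to_text_lossy(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}

fn main() {
    // Valid UTF-8 is borrowed as-is, with no allocation
    let ok = to_text_lossy(b"plain ASCII");
    println!("{}", ok);

    // The byte 0xFF never appears in valid UTF-8, so it gets replaced
    let lossy = to_text_lossy(b"bad byte: \xff");
    println!("{}", lossy);
}
```

The original bytes cannot be recovered from the lossy result, so this only suits logging and display.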

A more detailed listing of the ways to write string literals and escape characters
is given in the ‘Tokens’ chapter of the Rust Reference.