Reading Strings


Now we need to add support for parsing strings. As usual, this requires adding a new grammar rule called string and adding it to our parser.

The rule we'll use to represent a string is the same as for C-style strings. This means a string is essentially a series of escape sequences or normal characters between two quotation marks "". We can specify this as a regular expression inside our grammar string as follows.

  string : /\"(\\\\.|[^\"])*\"/ ;

This looks complicated but makes a lot more sense when explained in parts. Because the rule is written inside a C string, each " appears as \" and the regular expression \\ for a literal backslash appears as \\\\. It reads like this: a string is a " character, followed by zero or more of either a backslash \\\\ followed by any other character ., or anything that isn’t a " character [^\"]. Finally it ends with another " character.
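To get a feel for what the rule accepts, the pattern can be tried outside of the parser. The following is a minimal sketch that assumes a POSIX system and uses regcomp and regexec from regex.h as a stand-in for mpc's own regular expression engine, so small differences in behaviour are possible. The helper name is_string_literal is hypothetical, and the ^ and $ anchors are added here so the whole input has to match.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if s matches the string rule from start
   to end, and 0 otherwise. Uses POSIX regex, not mpc. */
int is_string_literal(const char* s) {
  assert(s != NULL);

  /* Same pattern as the grammar rule, anchored with ^ and $.
     In C source, \" is a quote and \\\\ is the regex \\,
     which matches one literal backslash. */
  const char* pattern = "^\"(\\\\.|[^\"])*\"$";

  regex_t re;
  if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) { return 0; }

  /* regexec returns 0 when the input matches */
  int matched = (regexec(&re, s, 0, NULL, 0) == 0);
  regfree(&re);
  return matched;
}
```

With this helper, inputs such as "hello", "hello\n" and "hello\"" (written as they would be typed at the prompt) are accepted, while an unterminated "hello or a bare hello is rejected.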

We also need to add a case to deal with this in the lval_read function.

  if (strstr(t->tag, "string")) { return lval_read_str(t); }

Because the string arrives from the parser in escaped form, we need a function lval_read_str to deal with it. This function is a little tricky because it has to do a few tasks. First it must strip the " characters from either side of the input. Then it must unescape the string, converting escape sequences such as \n to the characters they encode. Finally it has to create a new lval and clean up anything allocated along the way.

  lval* lval_read_str(mpc_ast_t* t) {
    /* Cut off the final quote character */
    t->contents[strlen(t->contents)-1] = '\0';
    /* Copy the string missing out the first quote character */
    char* unescaped = malloc(strlen(t->contents+1)+1);
    strcpy(unescaped, t->contents+1);
    /* Pass through the unescape function */
    unescaped = mpcf_unescape(unescaped);
    /* Construct a new lval using the string */
    lval* str = lval_str(unescaped);
    /* Free the string and return */
    free(unescaped);
    return str;
  }
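To make the strip and unescape steps more concrete, here is a rough, self-contained sketch of the same idea without mpc. The function read_quoted is hypothetical and is not how mpcf_unescape is implemented. The set of escape sequences it decodes is an assumption and may differ from what mpc supports. Unlike lval_read_str, it leaves its input untouched and returns a plain C string rather than an lval.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the strip and unescape steps. Takes a quoted
   literal, quotes included, and returns a newly allocated string with the
   quotes removed and escapes decoded. The caller frees the result.
   Returns NULL if allocation fails. */
char* read_quoted(const char* quoted) {
  assert(quoted != NULL);
  size_t len = strlen(quoted);
  assert(len >= 2 && quoted[0] == '"' && quoted[len-1] == '"');

  /* At most len-2 characters plus the terminator */
  char* out = malloc(len - 1);
  if (out == NULL) { return NULL; }

  size_t j = 0;
  /* Walk the characters strictly between the two quotes */
  for (size_t i = 1; i < len - 1; i++) {
    char c = quoted[i];
    if (c == '\\' && i + 1 < len - 1) {
      i++;
      switch (quoted[i]) {
        case 'n':  c = '\n'; break;
        case 't':  c = '\t'; break;
        case 'r':  c = '\r'; break;
        case '\\': c = '\\'; break;
        case '"':  c = '"';  break;
        /* Unknown escape: keep the character after the backslash */
        default:   c = quoted[i]; break;
      }
    }
    out[j++] = c;
  }
  out[j] = '\0';
  return out;
}
```

Given the seven characters "hello" (quotes included), read_quoted returns the five characters hello, and the two-character sequence \n inside the quotes becomes a single newline character in the result.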

If this all works we should be able to play around with strings in the prompt. Next we’ll add functions which can actually make use of them.

  lispy> "hello"
  "hello"
  lispy> "hello\n"
  "hello\n"
  lispy> "hello\""
  "hello\""
  lispy> head {"hello" "world"}
  {"hello"}
  lispy> eval (head {"hello" "world"})
  "hello"
  lispy>