Reading Strings


When we read in strings things get a little more complicated. This is because of escaping. We've covered this a little in earlier chapters, but now we're going to have to deal with it head on.

Escaping is the name we give to the way we let users enter special characters using combinations of symbols starting with a backslash \. The most famous of these is the newline character, which we encode using \n. Converting from the multi-character representation to the real character is called unescaping, and converting from the real character to the two-character encoding is called escaping.

When we read in user strings we’ll need to convert these pairs of characters into the single special character they represent. If we ignore the leading backslash we can make a function in C that tells us this mapping.

  /* Function to unescape characters */
  char lval_str_unescape(char x) {
    switch (x) {
      case 'a':  return '\a';
      case 'b':  return '\b';
      case 'f':  return '\f';
      case 'n':  return '\n';
      case 'r':  return '\r';
      case 't':  return '\t';
      case 'v':  return '\v';
      case '\\': return '\\';
      case '\'': return '\'';
      case '\"': return '\"';
    }
    return '\0';
  }

It is also going to be useful to list all the possible unescapable characters so we can check if we have one.

  /* Possible unescapable characters */
  char* lval_str_unescapable = "abfnrtv\\\'\"";
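The membership check itself can be done with `strchr`, as the reading code does later on. The sketch below is only an illustration: `char_in_set` is a hypothetical helper that is not part of the interpreter. It is included mainly to point out one detail of `strchr`: the terminating `'\0'` counts as part of the string, so searching for `'\0'` always succeeds.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not in the original code): returns 1 if c appears
   in set, 0 otherwise. strchr treats the terminating '\0' as part of the
   string, so that case is ruled out before calling it. */
int char_in_set(const char* set, char c) {
  assert(set != NULL);
  if (c == '\0') { return 0; }
  return strchr(set, c) != NULL;
}
```

Called as `char_in_set(lval_str_unescapable, 'n')` this would return 1, while passing `'q'` or `'\0'` would return 0.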

We can write similar functions to do the conversion in the other direction.

  /* List of possible escapable characters */
  char* lval_str_escapable = "\a\b\f\n\r\t\v\\\'\"";

  /* Function to escape characters */
  char* lval_str_escape(char x) {
    switch (x) {
      case '\a': return "\\a";
      case '\b': return "\\b";
      case '\f': return "\\f";
      case '\n': return "\\n";
      case '\r': return "\\r";
      case '\t': return "\\t";
      case '\v': return "\\v";
      case '\\': return "\\\\";
      case '\'': return "\\\'";
      case '\"': return "\\\"";
    }
    return "";
  }
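To see how the list and the function might work together, here is a minimal sketch that escapes a whole string. The helper `escape_string` is hypothetical and not part of the interpreter; the list and the escape function are repeated (declared `static` and `const` here) only so that the sketch compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Repeated from the text so the sketch is self-contained */
static const char* lval_str_escapable = "\a\b\f\n\r\t\v\\\'\"";

static const char* lval_str_escape(char x) {
  switch (x) {
    case '\a': return "\\a";
    case '\b': return "\\b";
    case '\f': return "\\f";
    case '\n': return "\\n";
    case '\r': return "\\r";
    case '\t': return "\\t";
    case '\v': return "\\v";
    case '\\': return "\\\\";
    case '\'': return "\\\'";
    case '\"': return "\\\"";
  }
  return "";
}

/* Hypothetical helper: returns a newly allocated, escaped copy of s,
   or NULL if allocation fails. The caller frees the result. */
char* escape_string(const char* s) {
  assert(s != NULL);
  /* Worst case: every character becomes two, plus the terminator */
  char* out = malloc(strlen(s) * 2 + 1);
  if (!out) { return NULL; }
  size_t n = 0;
  for (size_t i = 0; s[i] != '\0'; i++) {
    if (strchr(lval_str_escapable, s[i])) {
      /* Every escape sequence here is exactly two characters long */
      const char* e = lval_str_escape(s[i]);
      out[n++] = e[0];
      out[n++] = e[1];
    } else {
      out[n++] = s[i];
    }
  }
  out[n] = '\0';
  return out;
}
```

Given a string holding a, a newline, b, a double quote and c, this returns the seven characters a, \, n, b, \, ", c.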

With these we can begin to write our functions for reading strings. First we allocate a temporary string and while we’re not reading the terminal " character we’re going to process the incoming characters.

  int lval_read_str(lval* v, char* s, int i) {
    /* Allocate empty string */
    char* part = calloc(1,1);
    while (s[i] != '"') {
      char c = s[i];

First we need to check for the end of the input. If we've reached it, the input contains a string literal which doesn't terminate. In this case we free the temporary string we allocated and return an error.

      /* If end of input then there is an unterminated string literal */
      if (c == '\0') {
        lval_add(v, lval_err("Unexpected end of input at string literal"));
        free(part);
        return strlen(s);
      }

We then check whether the current character is a backslash. If it is, we need to unescape the character that follows it. Given the functions we've already defined this is easy. If that character is unescapable then we unescape it; otherwise we report an error.

      /* If backslash then unescape character after it */
      if (c == '\\') {
        i++;
        /* Check next character is unescapable. The end of input is ruled
           out first because strchr also matches the terminating '\0'. */
        if (s[i] != '\0' && strchr(lval_str_unescapable, s[i])) {
          c = lval_str_unescape(s[i]);
        } else {
          /* Report the offending character, not the backslash held in c */
          lval_add(v, lval_err("Invalid escape character %c", s[i]));
          free(part);
          return strlen(s);
        }
      }

Given either the unescaped character, or a normal character which is part of the string, we simply append it to our temporary string. Once we are done consuming characters we convert the temporary string into an lval, add it to the function argument v, free the temporary string we allocated, and return the position just past the closing quote.

      /* Append character to string */
      part = realloc(part, strlen(part)+2);
      part[strlen(part)+1] = '\0';
      part[strlen(part)+0] = c;
      i++;
    }
    /* Add lval and free temp string */
    lval_add(v, lval_str(part));
    free(part);
    return i+1;
  }
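To experiment with this loop outside the interpreter, the following is a minimal standalone sketch of the same steps. It assumes there is no lval type available, so the hypothetical function `read_string_literal` hands the result back through an output pointer and signals errors with -1 instead of adding an error value. It also keeps track of the length in a variable rather than calling strlen on every append, and checks the results of calloc and realloc; these are design choices of the sketch, not of the original code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same mapping as lval_str_unescape, repeated so the sketch stands alone */
static char unescape(char x) {
  switch (x) {
    case 'a':  return '\a';
    case 'b':  return '\b';
    case 'f':  return '\f';
    case 'n':  return '\n';
    case 'r':  return '\r';
    case 't':  return '\t';
    case 'v':  return '\v';
    case '\\': return '\\';
    case '\'': return '\'';
    case '\"': return '\"';
  }
  return '\0';
}

/* Hypothetical standalone variant. s[i] must be the first character after
   the opening quote. On success *out receives a malloc'd string and the
   index just past the closing quote is returned. On error -1 is returned
   and *out is set to NULL. */
int read_string_literal(const char* s, int i, char** out) {
  assert(s != NULL && out != NULL);
  size_t len = 0;
  char* part = calloc(1, 1);
  *out = NULL;
  if (!part) { return -1; }
  while (s[i] != '"') {
    char c = s[i];
    /* Unterminated string literal */
    if (c == '\0') { free(part); return -1; }
    if (c == '\\') {
      i++;
      /* End of input is checked first: strchr would match the '\0' */
      if (s[i] == '\0' || !strchr("abfnrtv\\\'\"", s[i])) {
        free(part);
        return -1;
      }
      c = unescape(s[i]);
    }
    /* Grow by one character, keeping room for the terminator */
    char* grown = realloc(part, len + 2);
    if (!grown) { free(part); return -1; }
    part = grown;
    part[len++] = c;
    part[len] = '\0';
    i++;
  }
  *out = part;
  return i + 1;
}
```

For the input text "hi\n" rest (with the quotes included, and the backslash and n as two separate characters), calling the function with i set to 1 stores the three characters h, i and a newline in *out and returns 6, the index of the space after the closing quote.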