Coding Grammars


So what does code that looks like a grammar…look like? Let us take a look at mpc by trying to write code for a grammar that recognizes the language of Shiba Inu. More colloquially known as Doge. This language we are going to define as follows.

› An Adjective is either “wow”, “many”, “so” or “such”.

› A Noun is either “lisp”, “language”, “c”, “book” or “build”.

› A Phrase is an Adjective followed by a Noun.

› A Doge is zero or more Phrases.

We can start by trying to define Adjective and Noun. To do this we create two new parsers, represented by the type mpc_parser_t*, and we store them in the variables Adjective and Noun. We use the function mpc_or to create a parser where one of several options should be used, and the function mpc_sym to wrap our initial strings.

If you squint you could attempt to read the code as if it were the rules we specified above.

  1. /* Build a parser 'Adjective' to recognize descriptions */
  2. mpc_parser_t* Adjective = mpc_or(4,
  3. mpc_sym("wow"), mpc_sym("many"),
  4. mpc_sym("so"), mpc_sym("such")
  5. );
  6. /* Build a parser 'Noun' to recognize things */
  7. mpc_parser_t* Noun = mpc_or(5,
  8. mpc_sym("lisp"), mpc_sym("language"),
  9. mpc_sym("book"),mpc_sym("build"),
  10. mpc_sym("c")
  11. );

How can I access these mpc functions?

For now don’t worry about compiling or running any of the sample code in this chapter. Just focus on understanding the theory behind grammars. In the next chapter we’ll get set up with mpc and use it for a language closer to our Lisp.

To define Phrase we can reference our existing parsers. We need to use the function mpc_and, that specifies one thing is required then another. As input we pass it Adjective and Noun, our previously defined parsers. This function also takes the arguments mpcf_strfold and free, which say how to join or delete the results of these parsers. Ignore these arguments for now.

  1. mpc_parser_t* Phrase = mpc_and(2, mpcf_strfold,
  2. Adjective, Noun, free);

To define Doge we must specify that zero or more of some parser is required. For this we need to use the function mpc_many. As before, this function requires the special variable mpcf_strfold to say how the results are joined together, which we can ignore.

  1. mpc_parser_t* Doge = mpc_many(mpcf_strfold, Phrase);

By creating a parser that looks for zero or more occurrences of another parser an interesting thing has happened. Our Doge parser accepts inputs of any length. This means its language is infinite. Here are just some examples of possible strings Doge could accept. Just as we discovered in the first section of this chapter we have used a finite number of re-write rules to create an infinite language.

  1. "wow book such language many lisp"
  2. "so c such build such language"
  3. "many build wow c"
  4. ""
  5. "wow lisp wow c many language"
  6. "so c"

If we use more mpc functions, we can slowly build up parsers that parse more and more complicated languages. The code we use sort of reads like a grammar, but becomes much more messy with added complexity. Due to this, taking this approach isn’t always an easy task. A whole set of helper functions that build on simple constructs to make frequent tasks easy are all documented on the mpc repository. This is a good approach for complicated languages, as it allows for fine-grained control, but won’t be required for our needs.