Natural Grammars


mpc lets us write grammars in a more natural form too. Rather than using C functions that look less like a grammar, we can specify the whole thing in one long string. When using this method we don’t have to worry about how to join or discard inputs, with functions such as mpcf_strfold, or free. All of that is done automatically for us.

Here is how we would recreate the previous examples using this method.

  1. mpc_parser_t* Adjective = mpc_new("adjective");
  2. mpc_parser_t* Noun = mpc_new("noun");
  3. mpc_parser_t* Phrase = mpc_new("phrase");
  4. mpc_parser_t* Doge = mpc_new("doge");
  5. mpca_lang(MPCA_LANG_DEFAULT,
  6. " \
  7. adjective : \"wow\" | \"many\" \
  8. | \"so\" | \"such\"; \
  9. noun : \"lisp\" | \"language\" \
  10. | \"book\" | \"build\" | \"c\"; \
  11. phrase : <adjective> <noun>; \
  12. doge : <phrase>*; \
  13. ",
  14. Adjective, Noun, Phrase, Doge);
  15. /* Do some parsing here... */
  16. mpc_cleanup(4, Adjective, Noun, Phrase, Doge);

Without having an exact understanding of the syntax for that long string, it should be obvious how much clearer the grammar is in this format. If we learn what all the special symbols mean we barely need to squint.

Another thing to notice is that the process is now in two steps. First we create and name several rules using mpc_new and then we define them using mpca_lang.

The first argument to mpca_lang are the options flags. For this we just use the defaults. The second is a long multi-line string in C. This is the grammar specification. It consists of a number of re-write rules. Each rule has the name of the rule on the left, a colon :, and on the right its definition terminated with a semicolon ;.

The special symbols used to define the rules on the right hand side work as follows.

“ab”The string ab is required.
‘a’The character a is required.
‘a’ ‘b’First ‘a’ is required, then ‘b’ is required.
‘a’ | ‘b’Either ‘a’ is required, or ‘b’ is required.
‘a’*Zero or more ‘a’ are required.
‘a’+One or more ‘a’ are required.
<abba>The rule called abba is required.

Sounds familiar…

Did you notice that the description of what the input string to mpca_lang should look like sounded like I was specifying a grammar? That’s because it was. mpc uses itself internally to parse the input you give it to mpca_lang. It does it by specifying a grammar in code using the previous method. How neat is that?

Using the table described above verify that what I’ve written above is equal to what we specified in code.

This method of specifying a grammar is what we are going to use in the following chapters. It might seem overwhelming at first. Grammars can be difficult to understand. But as we continue you will become much more familiar with how to create and edit them.

This chapter is about theory, so if you are going to try some of the bonus tasks, don’t worry too much about correctness. Thinking in the right mindset is more important. Feel free to invent symbols and notation for certain concepts to make them simpler to write down. Some of the bonus task also might require cyclic or recursive grammar structures, so don’t worry if you have to use these!