Trees


Now we can read input, and we have it structured internally, but we are still unable to evaluate it. In this chapter we add the code that evaluates this structure and actually performs the computations encoded within.

This internal structure is what we saw printed out by the program in the previous chapter. It is called an Abstract Syntax Tree, and it represents the structure of the program based on the input entered by the user. At the leaves of this tree are numbers and operators - the actual data to be processed. At the branches are the rules used to produce this part of the tree - the information on how to traverse and evaluate it.

tree

Abstract Christmas Tree • A seasonal variation

Before working out exactly how we are going to do this traversal, let’s see exactly how this structure is defined internally. If we peek inside mpc.h we can have a look at the definition of mpc_ast_t, which is the data structure we got from the parse.

  1. typedef struct mpc_ast_t {
  2. char* tag;
  3. char* contents;
  4. mpc_state_t state;
  5. int children_num;
  6. struct mpc_ast_t** children;
  7. } mpc_ast_t;

This struct has a number of fields we can access. Let’s take a look at them one by one.

The first field is tag. When we printed out the tree this was the information that preceded the contents of the node. It was a string containing a list of all the rules used to parse that particular item. For example expr|number|regex.

This tag field is going to be important as it lets us see what parse rules have been used to create the node.

The second field is contents. This will contain the actual contents of the node such as '*', '(' or '5'. You’ll notice for branches this is empty, but for leaves we can use it to find the operator or number to use.

The next field is state. This contains information about what state the parser was in when it found this node, such as the line and column number. We won’t make use of this in our program.

Finally we see two fields that are going to help us traverse the tree. These are children_num and children. The first field tells us how many children a node has, and the second is an array of these children.

The type of the children field is mpc_ast_t**. This is a double pointer type. It isn’t as scary as it looks and will be explained in greater detail in later chapters. For now you can think of it as a list of the child nodes of this tree.

We can access a child node by accessing this field using array notation. This is done by writing the field name children and suffixing it with square brackets containing the index of the child to access. For example to access the first child of the node we can use children[0]. Notice that C counts its array indices from 0.

Because the type mpc_ast_t* is a pointer to a struct, there is a slightly different syntax to access its fields. We need to use an arrow -> instead of a dot .. There is no fundamental reason for this switch in operators, so for now just remember that field access of pointer types uses an arrow.

  1. /* Load AST from output */
  2. mpc_ast_t* a = r.output;
  3. printf("Tag: %s\n", a->tag);
  4. printf("Contents: %s\n", a->contents);
  5. printf("Number of children: %i\n", a->children_num);
  6. /* Get First Child */
  7. mpc_ast_t* c0 = a->children[0];
  8. printf("First Child Tag: %s\n", c0->tag);
  9. printf("First Child Contents: %s\n", c0->contents);
  10. printf("First Child Number of children: %i\n",
  11. c0->children_num);