layout: post
title: “Building a useful set of parser combinators”
description: “15 or so combinators that can be combined to parse almost anything”
categories: [Combinators,Patterns]
seriesId: “Understanding Parser Combinators”

seriesOrder: 2

UPDATE: Slides and video from my talk on this topic

In this series, we are looking at how applicative parsers and parser combinators work.

  • In the first post, we created the foundations of a parsing library.
  • In this post, we’ll build out the library with many other useful combinators.
    The combinator names will be copied from those used by FParsec, so that you can easily migrate to it.

1. map — transforming the contents of a parser

When parsing, we often we want to match a particular string, such as a reserved word like “if” or “where”. A string is just a sequence of characters,
so surely we could use the same technique that we used to define anyOf in the first post, but using andThen instead of orElse?

Here’s a (failed) attempt to create a pstring parser using that approach:

  1. let pstring str =
  2. str
  3. |> Seq.map pchar // convert into parsers
  4. |> Seq.reduce andThen

This doesn’t work, because the output of andThen is different from the input (a tuple, not a char) and so the reduce approach fails.

In order to solve this, we’ll need to use a different technique.

To get started, let’s try just matching a string of a specific length.
Say, for example, that we want to match a three digits in a row. Well, we can do that using andThen:

  1. let parseDigit =
  2. anyOf ['0'..'9']
  3. let parseThreeDigits =
  4. parseDigit .>>. parseDigit .>>. parseDigit

If we run it like this:

  1. run parseThreeDigits "123A"

then we get the result:

  1. Success ((('1', '2'), '3'), "A")

It does work, but the result contains a tuple inside a tuple (('1', '2'), '3') which is fugly and hard to use.
It would be so much more convenient to just have a simple string ("123").

But in order to turn ('1', '2'), '3') into "123", we’ll need a function that can reach inside of the parser and transform the result using an arbitrary passed in function.

Of course, what we need is the functional programmer’s best friend, map.

To understand map and similar functions, I like to think of there being two worlds: a “Normal World”, where regular things live, and “Parser World”, where Parsers live.

You can think of Parser World as a sort of “mirror” of Normal World because it obeys the following rules:

  • Every type in Normal World (say char) has a corresponding type in Parser World (Parser<char>).

Building a useful set of parser combinators - 图1

And:

  • Every value in Normal World (say "ABC") has a corresponding value in Parser World (that is, some Parser<string> that returns "ABC").

And:

  • Every function in Normal World (say char -> string) has a corresponding function in Parser World (Parser<char> -> Parser<string>).

Building a useful set of parser combinators - 图2

Using this metaphor then, map transforms (or “lifts”) a function in Normal World into a function in Parser World.

Building a useful set of parser combinators - 图3

And by the way, if you like this metaphor, I have a whole series of posts that develop it further.

So that’s what map does; how do we implement it?

The logic is:

  • Inside the innerFn, run the parser to get the result.
  • If the result was a success, apply the specified function to the success value to get a new, transformed value, and…
  • …return the new, mapped, value instead of the original value.

Here’s the code (I’ve named the map function mapP to avoid confusion with other map functions):

  1. let mapP f parser =
  2. let innerFn input =
  3. // run parser with the input
  4. let result = run parser input
  5. // test the result for Failure/Success
  6. match result with
  7. | Success (value,remaining) ->
  8. // if success, return the value transformed by f
  9. let newValue = f value
  10. Success (newValue, remaining)
  11. | Failure err ->
  12. // if failed, return the error
  13. Failure err
  14. // return the inner function
  15. Parser innerFn

If we look at the signature of mapP:

  1. val mapP :
  2. f:('a -> 'b) -> Parser<'a> -> Parser<'b>

we can see that it has exactly the signature we want, transforming a function 'a -> 'b into a function Parser<'a> -> Parser<'b>.

It’s common to define an infix version of map as well:

  1. let ( <!> ) = mapP

And in the context of parsing, we’ll often want to put the mapping function after the parser, with the parameters flipped.
This makes using map with the pipeline idiom much more convenient:

  1. let ( |>> ) x f = mapP f x

Parsing three digits with mapP

With mapP available, we can revisit parseThreeDigits and turn the tuple into a string.

Here’s the code:

  1. let parseDigit = anyOf ['0'..'9']
  2. let parseThreeDigitsAsStr =
  3. // create a parser that returns a tuple
  4. let tupleParser =
  5. parseDigit .>>. parseDigit .>>. parseDigit
  6. // create a function that turns the tuple into a string
  7. let transformTuple ((c1, c2), c3) =
  8. String [| c1; c2; c3 |]
  9. // use "map" to combine them
  10. mapP transformTuple tupleParser

Or, if you prefer a more compact implementation:

  1. let parseThreeDigitsAsStr =
  2. (parseDigit .>>. parseDigit .>>. parseDigit)
  3. |>> fun ((c1, c2), c3) -> String [| c1; c2; c3 |]

And if we test it, we get a string in the result now, rather than a tuple:

  1. run parseThreeDigitsAsStr "123A" // Success ("123", "A")

We can go further, and map the string into an int:

  1. let parseThreeDigitsAsInt =
  2. mapP int parseThreeDigitsAsStr

If we test this, we get an int in the Success branch.

  1. run parseThreeDigitsAsInt "123A" // Success (123, "A")

Let’s check the type of parseThreeDigitsAsInt:

  1. val parseThreeDigitsAsInt : Parser<int>

It’s a Parser<int> now, not a Parser<char> or Parser<string>.
The fact that a Parser can contain any type, not just a char or string, is a key feature that will be very valuable when we need to build more complex parsers.

2. apply and return — lifting functions to the world of Parsers

To achieve our goal of creating a parser that matches a list of characters, we need two more helper functions which I will call returnP and applyP.

  • returnP simply transforms a normal value into a value in Parser World
  • applyP transforms a Parser containing a function (Parser< 'a->'b >) into a function in Parser World (Parser<'a> -> Parser<'b >)

Here’s a diagram of returnP:

Building a useful set of parser combinators - 图4

And here is the implementation of returnP:

  1. let returnP x =
  2. let innerFn input =
  3. // ignore the input and return x
  4. Success (x,input )
  5. // return the inner function
  6. Parser innerFn

The signature of returnP is just as we want:

  1. val returnP :
  2. 'a -> Parser<'a>

Now here’s a diagram of applyP:

Building a useful set of parser combinators - 图5

And here is the implementation of applyP, which uses .>>. and map:

  1. let applyP fP xP =
  2. // create a Parser containing a pair (f,x)
  3. (fP .>>. xP)
  4. // map the pair by applying f to x
  5. |> mapP (fun (f,x) -> f x)

The infix version of applyP is written as <*>:

  1. let ( <*> ) = applyP

Again, the signature of applyP is just as we want:

  1. val applyP :
  2. Parser<('a -> 'b)> -> Parser<'a> -> Parser<'b>

Why do we need these two functions? Well, map will lift functions in Normal World into functions in Parser World, but only for one-parameter functions.

What’s great about returnP and applyP is that, together, they can lift any function in Normal World into a function in Parser World, no matter how many parameters it has.

For example, we now can define a lift2 function that will lift a two parameter function into Parser World like this:

  1. // lift a two parameter function to Parser World
  2. let lift2 f xP yP =
  3. returnP f <*> xP <*> yP

The signature of lift2 is:

  1. val lift2 :
  2. f:('a -> 'b -> 'c) -> Parser<'a> -> Parser<'b> -> Parser<'c>

Here’s a diagram of lift2:

Building a useful set of parser combinators - 图6

If you want to know more about how this works, check out my “man page” post on lift2 or my explanation that involves the “Monadster”.

Let’s see some examples of using lift2 in practice. First, lifting integer addition to addition of Parsers:

  1. let addP =
  2. lift2 (+)

The signature is:

  1. val addP :
  2. Parser<int> -> Parser<int> -> Parser<int>

which shows that addP does indeed take two Parser<int> parameters and returns another Parser<int>.

And here’s the startsWith function being lifted to Parser World:

  1. let startsWith (str:string) prefix =
  2. str.StartsWith(prefix)
  3. let startsWithP =
  4. lift2 startsWith

Again, the signature of startsWithP is parallel to the signature of startsWith, but lifted to the world of Parsers.

  1. val startsWith :
  2. str:string -> prefix:string -> bool
  3. val startsWithP :
  4. Parser<string> -> Parser<string> -> Parser<bool>

3. sequence — transforming a list of Parsers into a single Parser

We now have the tools we need to implement our sequencing combinator! The logic will be:

  • Start with the list “cons” operator. This is the two-parameter function that prepends a “head” element onto a “tail” of elements to make a new list.
  • Lift cons into the world of Parsers using lift2.
  • We now have a a function that prepends a head Parser to a tail list of Parsers to make a new list of Parsers, where:
    • The head Parser is the first element in the list of parsers that has been passed in.
    • The tail is generated by calling the same function recursively with the next parser in the list.
  • When the input list is empty, just return a Parser containing an empty list.

Here’s the implementation:

  1. let rec sequence parserList =
  2. // define the "cons" function, which is a two parameter function
  3. let cons head tail = head::tail
  4. // lift it to Parser World
  5. let consP = lift2 cons
  6. // process the list of parsers recursively
  7. match parserList with
  8. | [] ->
  9. returnP []
  10. | head::tail ->
  11. consP head (sequence tail)

The signature of sequence is:

  1. val sequence :
  2. Parser<'a> list -> Parser<'a list>

which shows that the input is a list of Parsers and the output is a Parser containing a list of elements.

Let’s test it by creating a list of three parsers, and then combining them into one:

  1. let parsers = [ pchar 'A'; pchar 'B'; pchar 'C' ]
  2. let combined = sequence parsers
  3. run combined "ABCD"
  4. // Success (['A'; 'B'; 'C'], "D")

As you can see, when we run it we get back a list of characters, one for each parser in the original list.

Implementing the pstring parser

At last, we can implement the parser that matches a string, which we’ll call pstring.

The logic is:

  • Convert the string into a list of characters.
  • Convert each character into a Parser<char>.
  • Use sequence to convert the list of Parser<char> into a single Parser<char list>.
  • And finally, use map to convert the Parser<char list> into a Parser<string>.

Here’s the code:

  1. /// Helper to create a string from a list of chars
  2. let charListToStr charList =
  3. String(List.toArray charList)
  4. // match a specific string
  5. let pstring str =
  6. str
  7. // convert to list of char
  8. |> List.ofSeq
  9. // map each char to a pchar
  10. |> List.map pchar
  11. // convert to Parser<char list>
  12. |> sequence
  13. // convert Parser<char list> to Parser<string>
  14. |> mapP charListToStr

Let’s test it:

  1. let parseABC = pstring "ABC"
  2. run parseABC "ABCDE" // Success ("ABC", "DE")
  3. run parseABC "A|CDE" // Failure "Expecting 'B'. Got '|'"
  4. run parseABC "AB|DE" // Failure "Expecting 'C'. Got '|'"

It works as expected. Phew!

4. many and many1 — matching a parser multiple times

Another common need is to match a particular parser as many times as you can. For example:

  • When matching an integer, you want to match as many digit characters as you can.
  • When matching a run of whitespace, you want to match as many whitespace characters as you can.

There are slightly different requirements for these two cases.

  • When matching whitespace, it is often optional, so we want a “zero or more” matcher, which we’ll call many.
  • On the other hand, when matching digits for an integer, you want to match at least one digit, so we want a “one or more” matcher, which we’ll call many1.

Before creating these, we’ll define a helper function which matches a parser zero or more times. The logic is:

  • Run the parser.
  • If the parser returns Failure (and this is key) just return an empty list. That is, this function can never fail!
  • If the parser succeeds:
    • Call the function recursively to get the remaining values (which could also be an empty list).
    • Then combine the first value and the remaining values.

Here’s the code:

  1. let rec parseZeroOrMore parser input =
  2. // run parser with the input
  3. let firstResult = run parser input
  4. // test the result for Failure/Success
  5. match firstResult with
  6. | Failure err ->
  7. // if parse fails, return empty list
  8. ([],input)
  9. | Success (firstValue,inputAfterFirstParse) ->
  10. // if parse succeeds, call recursively
  11. // to get the subsequent values
  12. let (subsequentValues,remainingInput) =
  13. parseZeroOrMore parser inputAfterFirstParse
  14. let values = firstValue::subsequentValues
  15. (values,remainingInput)

With this helper function, we can easily define many now — it’s just a wrapper over parseZeroOrMore:

  1. /// match zero or more occurences of the specified parser
  2. let many parser =
  3. let rec innerFn input =
  4. // parse the input -- wrap in Success as it always succeeds
  5. Success (parseZeroOrMore parser input)
  6. Parser innerFn

The signature of many shows that the output is indeed a list of values wrapped in a Parser:

  1. val many :
  2. Parser<'a> -> Parser<'a list>

Now let’s test many:

  1. let manyA = many (pchar 'A')
  2. // test some success cases
  3. run manyA "ABCD" // Success (['A'], "BCD")
  4. run manyA "AACD" // Success (['A'; 'A'], "CD")
  5. run manyA "AAAD" // Success (['A'; 'A'; 'A'], "D")
  6. // test a case with no matches
  7. run manyA "|BCD" // Success ([], "|BCD")

Note that in the last case, even when there is nothing to match, the function succeeds.

There’s nothing about many that restricts its use to single characters. For example, we can use it to match repetitive string sequences too:

  1. let manyAB = many (pstring "AB")
  2. run manyAB "ABCD" // Success (["AB"], "CD")
  3. run manyAB "ABABCD" // Success (["AB"; "AB"], "CD")
  4. run manyAB "ZCD" // Success ([], "ZCD")
  5. run manyAB "AZCD" // Success ([], "AZCD")

Finally, let’s implement the original example of matching whitespace:

  1. let whitespaceChar = anyOf [' '; '\t'; '\n']
  2. let whitespace = many whitespaceChar
  3. run whitespace "ABC" // Success ([], "ABC")
  4. run whitespace " ABC" // Success ([' '], "ABC")
  5. run whitespace "\tABC" // Success (['\t'], "ABC")

Defining many1

We can also define the “one or more” combinator many1, using the following logic:

  • Run the parser.
  • If it fails, return the failure.
  • If it succeeds:
    • Call the helper function parseZeroOrMore to get the remaining values.
    • Then combine the first value and the remaining values.
  1. /// match one or more occurences of the specified parser
  2. let many1 parser =
  3. let rec innerFn input =
  4. // run parser with the input
  5. let firstResult = run parser input
  6. // test the result for Failure/Success
  7. match firstResult with
  8. | Failure err ->
  9. Failure err // failed
  10. | Success (firstValue,inputAfterFirstParse) ->
  11. // if first found, look for zeroOrMore now
  12. let (subsequentValues,remainingInput) =
  13. parseZeroOrMore parser inputAfterFirstParse
  14. let values = firstValue::subsequentValues
  15. Success (values,remainingInput)
  16. Parser innerFn

Again, the signature of many1 shows that the output is indeed a list of values wrapped in a Parser:

  1. val many1 :
  2. Parser<'a> -> Parser<'a list>

Now let’s test many1:

  1. // define parser for one digit
  2. let digit = anyOf ['0'..'9']
  3. // define parser for one or more digits
  4. let digits = many1 digit
  5. run digits "1ABC" // Success (['1'], "ABC")
  6. run digits "12BC" // Success (['1'; '2'], "BC")
  7. run digits "123C" // Success (['1'; '2'; '3'], "C")
  8. run digits "1234" // Success (['1'; '2'; '3'; '4'], "")
  9. run digits "ABC" // Failure "Expecting '9'. Got 'A'"

As we saw in an earlier example, the last case gives a misleading error. It says “Expecting ‘9’” when it really should say “Expecting a digit”.
In the next post we’ll fix this.

Parsing an integer

Using many1, we can create a parser for an integer. The implementation logic is:

  • Create a parser for a digit.
  • Use many1 to get a list of digits.
  • Using map, transform the result (a list of digits) into a string and then into an int.

Here’s the code:

  1. let pint =
  2. // helper
  3. let resultToInt digitList =
  4. // ignore int overflow for now
  5. String(List.toArray digitList) |> int
  6. // define parser for one digit
  7. let digit = anyOf ['0'..'9']
  8. // define parser for one or more digits
  9. let digits = many1 digit
  10. // map the digits to an int
  11. digits
  12. |> mapP resultToInt

And let’s test it:

  1. run pint "1ABC" // Success (1, "ABC")
  2. run pint "12BC" // Success (12, "BC")
  3. run pint "123C" // Success (123, "C")
  4. run pint "1234" // Success (1234, "")
  5. run pint "ABC" // Failure "Expecting '9'. Got 'A'"

5. opt — matching a parser zero or one time

Sometimes we only want to match a parser zero or one time. For example, the pint parser above does not handle negative values.
To correct this, we need to be able to handle an optional minus sign.

We can define an opt combinator easily:

  • Change the result of a specified parser to an option by mapping the result to Some.
  • Create another parser that always returns None.
  • Use <|> to choose the second (“None”) parser if the first fails.

Here’s the code:

  1. let opt p =
  2. let some = p |>> Some
  3. let none = returnP None
  4. some <|> none

Here’s an example of it in use — we match a digit followed by an optional semicolon:

  1. let digit = anyOf ['0'..'9']
  2. let digitThenSemicolon = digit .>>. opt (pchar ';')
  3. run digitThenSemicolon "1;" // Success (('1', Some ';'), "")
  4. run digitThenSemicolon "1" // Success (('1', None), "")

And here is pint rewritten to handle an optional minus sign:

  1. let pint =
  2. // helper
  3. let resultToInt (sign,charList) =
  4. let i = String(List.toArray charList) |> int
  5. match sign with
  6. | Some ch -> -i // negate the int
  7. | None -> i
  8. // define parser for one digit
  9. let digit = anyOf ['0'..'9']
  10. // define parser for one or more digits
  11. let digits = many1 digit
  12. // parse and convert
  13. opt (pchar '-') .>>. digits
  14. |>> resultToInt

Note that the resultToInt helper function now needs to handle the sign option as well as the list of digits.

And here it is in action:

  1. run pint "123C" // Success (123, "C")
  2. run pint "-123C" // Success (-123, "C")

6. Throwing results away

We often want to match something in the input, but we don’t care about the parsed value itself. For example:

  • For a quoted string, we need to parse the quotes, but we don’t need the quotes themselves.
  • For a statement ending in a semicolon, we need to ensure the semicolon is there, but we don’t need the semicolon itself.
  • For whitespace separators, we need to ensure the whitespace is there, but we don’t need the actual whitespace data.

To handle these requirements, we will define some new combinators that throw away the results of a parser:

  • p1 >>. p2 will apply p1 and p2 in sequence, just like .>>., but throw away the result of p1 and keep the result of p2.
  • p1 .>> p2 will apply p1 and p2 in sequence, just like .>>., but keep the result of p1 and throw away the result of p2.

These are easy to define — just map over the result of .>>., which is a tuple, and keep only one element of the pair.

  1. /// Keep only the result of the left side parser
  2. let (.>>) p1 p2 =
  3. // create a pair
  4. p1 .>>. p2
  5. // then only keep the first value
  6. |> mapP (fun (a,b) -> a)
  7. /// Keep only the result of the right side parser
  8. let (>>.) p1 p2 =
  9. // create a pair
  10. p1 .>>. p2
  11. // then only keep the second value
  12. |> mapP (fun (a,b) -> b)

These combinators allow us to simplify the digitThenSemicolon example shown earlier:

  1. let digit = anyOf ['0'..'9']
  2. // use .>> below
  3. let digitThenSemicolon = digit .>> opt (pchar ';')
  4. run digitThenSemicolon "1;" // Success ('1', "")
  5. run digitThenSemicolon "1" // Success ('1', "")

You can see that the result now is the same, whether or not the semicolon was present.

How about an example with whitespace?

The following code creates a parser that looks for “AB” followed by one or more whitespace chars, followed by “CD”.

  1. let whitespaceChar = anyOf [' '; '\t'; '\n']
  2. let whitespace = many1 whitespaceChar
  3. let ab = pstring "AB"
  4. let cd = pstring "CD"
  5. let ab_cd = (ab .>> whitespace) .>>. cd
  6. run ab_cd "AB \t\nCD" // Success (("AB", "CD"), "")

The result contains “AB” and “CD” only. The whitespace between them has been discarded.

Introducing between

A particularly common requirement is to look for a parser between delimiters such as quotes or brackets.

Creating a combinator for this is trivial:

  1. /// Keep only the result of the middle parser
  2. let between p1 p2 p3 =
  3. p1 >>. p2 .>> p3

And here it is in use, to parse a quoted integer:

  1. let pdoublequote = pchar '"'
  2. let quotedInteger = between pdoublequote pint pdoublequote
  3. run quotedInteger "\"1234\"" // Success (1234, "")
  4. run quotedInteger "1234" // Failure "Expecting '"'. Got '1'"

7. Parsing lists with separators

Another common requirement is parsing lists, seperated by something like commas or whitespace.

To implement a “one or more” list, we need to:

  • First combine the separator and parser into one combined parser, but using >>. to throw away the separator value.
  • Next, look for a list of the separator/parser combo using many.
  • Then prefix that with the first parser and combine the results.

Here’s the code:

  1. /// Parses one or more occurrences of p separated by sep
  2. let sepBy1 p sep =
  3. let sepThenP = sep >>. p
  4. p .>>. many sepThenP
  5. |>> fun (p,pList) -> p::pList

For the “zero or more” version, we can choose the empty list as an alternate if sepBy1 does not find any matches:

  1. /// Parses zero or more occurrences of p separated by sep
  2. let sepBy p sep =
  3. sepBy1 p sep <|> returnP []

Here’s some tests for sepBy1 and sepBy, with results shown in the comments:

  1. let comma = pchar ','
  2. let digit = anyOf ['0'..'9']
  3. let zeroOrMoreDigitList = sepBy digit comma
  4. let oneOrMoreDigitList = sepBy1 digit comma
  5. run oneOrMoreDigitList "1;" // Success (['1'], ";")
  6. run oneOrMoreDigitList "1,2;" // Success (['1'; '2'], ";")
  7. run oneOrMoreDigitList "1,2,3;" // Success (['1'; '2'; '3'], ";")
  8. run oneOrMoreDigitList "Z;" // Failure "Expecting '9'. Got 'Z'"
  9. run zeroOrMoreDigitList "1;" // Success (['1'], ";")
  10. run zeroOrMoreDigitList "1,2;" // Success (['1'; '2'], ";")
  11. run zeroOrMoreDigitList "1,2,3;" // Success (['1'; '2'; '3'], ";")
  12. run zeroOrMoreDigitList "Z;" // Success ([], "Z;")

What about bind?

One combinator that we haven’t implemented so far is bind (or >>=).

If you know anything about functional programming, or have seen my talk on FP patterns, you’ll know that bind
is a powerful tool that can be used to implement many functions.

Up to this point, I thought that it would be better to show implementations for combinators such as map and .>>. that were explicit and thus, hopefully, easier to understand.

But now that we have have some experience, let’s implement bind and see what we can do with it.

Here’s the implementation of bindP (as I’ll call it)

  1. /// "bindP" takes a parser-producing function f, and a parser p
  2. /// and passes the output of p into f, to create a new parser
  3. let bindP f p =
  4. let innerFn input =
  5. let result1 = run p input
  6. match result1 with
  7. | Failure err ->
  8. // return error from parser1
  9. Failure err
  10. | Success (value1,remainingInput) ->
  11. // apply f to get a new parser
  12. let p2 = f value1
  13. // run parser with remaining input
  14. run p2 remainingInput
  15. Parser innerFn

The signature of bindP is:

  1. val bindP :
  2. f:('a -> Parser<'b>) -> Parser<'a> -> Parser<'b>

which conforms to a standard bind signature. The input f is a “diagonal” function ('a -> Parser<'b>) and the output is a “horizontal” function (Parser<'a> -> Parser<'b>).
See this post for more details on how bind works.

The infix version of bind is >>=. Note that the parameters are flipped: f is now the second parameter which makes it more convenient for F#’s pipeline idiom.

  1. let ( >>= ) p f = bindP f p

Reimplementing other combinators with bindP and returnP

The combination of bindP and returnP can be used to re-implement many of the other combinators. Here are some examples:

  1. let mapP f =
  2. bindP (f >> returnP)
  3. let andThen p1 p2 =
  4. p1 >>= (fun p1Result ->
  5. p2 >>= (fun p2Result ->
  6. returnP (p1Result,p2Result) ))
  7. let applyP fP xP =
  8. fP >>= (fun f ->
  9. xP >>= (fun x ->
  10. returnP (f x) ))
  11. // (assuming "many" is defined)
  12. let many1 p =
  13. p >>= (fun head ->
  14. many p >>= (fun tail ->
  15. returnP (head::tail) ))

Note that the combinators that check the Failure path can not be implemented using bind. These include orElse and many.

Review

We could keep building combinators for ever, but I think we have everything we need to build a JSON parser now, so let’s stop and review what we have done.

In the previous post we created these combinators:

  • .>>. (andThen) applies the two parsers in sequence and returns the results in a tuple.
  • <|> (orElse) applies the first parser, and if that fails, the second parsers.
  • choice extends orElse to choose from a list of parsers.

And in this post we created the following additional combinators:

  • bindP chains the result of a parser to another parser-producing function.
  • mapP transforms the result of a parser.
  • returnP lifts an normal value into the world of parsers.
  • applyP allows us to lift multi-parameter functions into functions that work on Parsers.
  • lift2 uses applyP to lift two-parameter functions into Parser World.
  • sequence converts a list of Parsers into a Parser containing a list.
  • many matches zero or more occurences of the specified parser.
  • many1 matches one or more occurences of the specified parser.
  • opt matches an optional occurrence of the specified parser.
  • .>> keeps only the result of the left side parser.
  • >>. keeps only the result of the right side parser.
  • between keeps only the result of the middle parser.
  • sepBy parses zero or more occurrences of a parser with a separator.
  • sepBy1 parses one or more occurrences of a parser with a separator.

I hope you can see why the concept of “combinators” is so powerful; given just a few basic functions, we have built up a library of useful functions quickly and concisely.

Listing of the parser library so far

Here’s the complete listing for the parsing library so far — it’s about 200 lines of code now!

The source code displayed below is also available at this gist.

  1. open System
  2. /// Type that represents Success/Failure in parsing
  3. type Result<'a> =
  4. | Success of 'a
  5. | Failure of string
  6. /// Type that wraps a parsing function
  7. type Parser<'T> = Parser of (string -> Result<'T * string>)
  8. /// Parse a single character
  9. let pchar charToMatch =
  10. // define a nested inner function
  11. let innerFn str =
  12. if String.IsNullOrEmpty(str) then
  13. Failure "No more input"
  14. else
  15. let first = str.[0]
  16. if first = charToMatch then
  17. let remaining = str.[1..]
  18. Success (charToMatch,remaining)
  19. else
  20. let msg = sprintf "Expecting '%c'. Got '%c'" charToMatch first
  21. Failure msg
  22. // return the "wrapped" inner function
  23. Parser innerFn
  24. /// Run a parser with some input
  25. let run parser input =
  26. // unwrap parser to get inner function
  27. let (Parser innerFn) = parser
  28. // call inner function with input
  29. innerFn input
  30. /// "bindP" takes a parser-producing function f, and a parser p
  31. /// and passes the output of p into f, to create a new parser
  32. let bindP f p =
  33. let innerFn input =
  34. let result1 = run p input
  35. match result1 with
  36. | Failure err ->
  37. // return error from parser1
  38. Failure err
  39. | Success (value1,remainingInput) ->
  40. // apply f to get a new parser
  41. let p2 = f value1
  42. // run parser with remaining input
  43. run p2 remainingInput
  44. Parser innerFn
  45. /// Infix version of bindP
  46. let ( >>= ) p f = bindP f p
  47. /// Lift a value to a Parser
  48. let returnP x =
  49. let innerFn input =
  50. // ignore the input and return x
  51. Success (x,input)
  52. // return the inner function
  53. Parser innerFn
  54. /// apply a function to the value inside a parser
  55. let mapP f =
  56. bindP (f >> returnP)
  57. /// infix version of mapP
  58. let ( <!> ) = mapP
  59. /// "piping" version of mapP
  60. let ( |>> ) x f = mapP f x
  61. /// apply a wrapped function to a wrapped value
  62. let applyP fP xP =
  63. fP >>= (fun f ->
  64. xP >>= (fun x ->
  65. returnP (f x) ))
  66. /// infix version of apply
  67. let ( <*> ) = applyP
  68. /// lift a two parameter function to Parser World
  69. let lift2 f xP yP =
  70. returnP f <*> xP <*> yP
  71. /// Combine two parsers as "A andThen B"
  72. let andThen p1 p2 =
  73. p1 >>= (fun p1Result ->
  74. p2 >>= (fun p2Result ->
  75. returnP (p1Result,p2Result) ))
  76. /// Infix version of andThen
  77. let ( .>>. ) = andThen
  78. /// Combine two parsers as "A orElse B"
  79. let orElse p1 p2 =
  80. let innerFn input =
  81. // run parser1 with the input
  82. let result1 = run p1 input
  83. // test the result for Failure/Success
  84. match result1 with
  85. | Success result ->
  86. // if success, return the original result
  87. result1
  88. | Failure err ->
  89. // if failed, run parser2 with the input
  90. let result2 = run p2 input
  91. // return parser2's result
  92. result2
  93. // return the inner function
  94. Parser innerFn
  95. /// Infix version of orElse
  96. let ( <|> ) = orElse
  97. /// Choose any of a list of parsers
  98. let choice listOfParsers =
  99. List.reduce ( <|> ) listOfParsers
  100. /// Choose any of a list of characters
  101. let anyOf listOfChars =
  102. listOfChars
  103. |> List.map pchar // convert into parsers
  104. |> choice
  105. /// Convert a list of Parsers into a Parser of a list
  106. let rec sequence parserList =
  107. // define the "cons" function, which is a two parameter function
  108. let cons head tail = head::tail
  109. // lift it to Parser World
  110. let consP = lift2 cons
  111. // process the list of parsers recursively
  112. match parserList with
  113. | [] ->
  114. returnP []
  115. | head::tail ->
  116. consP head (sequence tail)
  117. /// (helper) match zero or more occurences of the specified parser
  118. let rec parseZeroOrMore parser input =
  119. // run parser with the input
  120. let firstResult = run parser input
  121. // test the result for Failure/Success
  122. match firstResult with
  123. | Failure err ->
  124. // if parse fails, return empty list
  125. ([],input)
  126. | Success (firstValue,inputAfterFirstParse) ->
  127. // if parse succeeds, call recursively
  128. // to get the subsequent values
  129. let (subsequentValues,remainingInput) =
  130. parseZeroOrMore parser inputAfterFirstParse
  131. let values = firstValue::subsequentValues
  132. (values,remainingInput)
  133. /// matches zero or more occurences of the specified parser
  134. let many parser =
  135. let rec innerFn input =
  136. // parse the input -- wrap in Success as it always succeeds
  137. Success (parseZeroOrMore parser input)
  138. Parser innerFn
  139. /// matches one or more occurences of the specified parser
  140. let many1 p =
  141. p >>= (fun head ->
  142. many p >>= (fun tail ->
  143. returnP (head::tail) ))
  144. /// Parses an optional occurrence of p and returns an option value.
  145. let opt p =
  146. let some = p |>> Some
  147. let none = returnP None
  148. some <|> none
  149. /// Keep only the result of the left side parser
  150. let (.>>) p1 p2 =
  151. // create a pair
  152. p1 .>>. p2
  153. // then only keep the first value
  154. |> mapP (fun (a,b) -> a)
  155. /// Keep only the result of the right side parser
  156. let (>>.) p1 p2 =
  157. // create a pair
  158. p1 .>>. p2
  159. // then only keep the second value
  160. |> mapP (fun (a,b) -> b)
  161. /// Keep only the result of the middle parser
  162. let between p1 p2 p3 =
  163. p1 >>. p2 .>> p3
  164. /// Parses one or more occurrences of p separated by sep
  165. let sepBy1 p sep =
  166. let sepThenP = sep >>. p
  167. p .>>. many sepThenP
  168. |>> fun (p,pList) -> p::pList
  169. /// Parses zero or more occurrences of p separated by sep
  170. let sepBy p sep =
  171. sepBy1 p sep <|> returnP []

Summary

In this post, we have built on the basic parsing code from last time to create a library of a 15 or so combinators that can be combined to parse almost anything.

Soon, we’ll use them to build a JSON parser, but before that, let’s pause and take time to clean up the error messages.
That will be the topic of the next post.

The source code for this post is available at this gist.