layout: post
title: “Writing a JSON parser from scratch”
description: “In 250 lines of code”
categories: [Combinators,Patterns]
seriesId: “Understanding Parser Combinators”

seriesOrder: 4

UPDATE: Slides and video from my talk on this topic

In this series, we are looking at how applicative parsers and parser combinators work.

  • In the first post, we created the foundations of a parsing library.
  • In the second post, we built out the library with many other useful combinators.
  • In the third post, we improved the error messages.
  • In this last post, we’ll use the library we’ve written to build a JSON parser.

First, before we do anything else, we need to load the parser library script that we developed over the last few posts, and then open the ParserLibrary namespace:

  1. #load "ParserLibrary.fsx"
  2. open System
  3. open ParserLibrary

You can download ParserLibrary.fsx from here.

1. Building a model to represent the JSON spec

The JSON spec is available at json.org. I’ll paraphase it here:

  • A value can be a string or a number or a bool or null or an object or an array.
    • These structures can be nested.
  • A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes.
  • A number is very much like a C or Java number, except that the octal and hexadecimal formats are not used.
  • A boolean is the literal true or false
  • A null is the literal null
  • An object is an unordered set of name/value pairs.
    • An object begins with { (left brace) and ends with } (right brace).
    • Each name is followed by : (colon) and the name/value pairs are separated by , (comma).
  • An array is an ordered collection of values.
    • An array begins with [ (left bracket) and ends with ] (right bracket).
    • Values are separated by , (comma).
  • Whitespace can be inserted between any pair of tokens.

In F#, this definition can be modelled naturally as:

  1. type JValue =
  2. | JString of string
  3. | JNumber of float
  4. | JBool of bool
  5. | JNull
  6. | JObject of Map<string, Json>
  7. | JArray of Json list

So the goal of our JSON parser is:

  • Given a string, we want to output a JValue value.

2. Getting started with Null and Bool

Let’s start with the simplest tasks — parsing the literal values for null and the booleans.

Parsing Null

Parsing the null literal is trivial. The logic will be:

  • Match the string “null”.
  • Map the result to the JNull case.

Here’s the code:

  1. let jNull =
  2. pstring "null"
  3. |>> (fun _ -> JNull) // map to JNull
  4. <?> "null" // give it a label

Note that we don’t actually care about the value returned by the parser because we know in advance that it is going to be “null”!

This is a common situation, so let’s write a little utility function, >>% to make this look nicer:

  1. // applies the parser p, ignores the result, and returns x.
  2. let (>>%) p x =
  3. p |>> (fun _ -> x)

Now we can rewrite jNull as follows:

  1. let jNull =
  2. pstring "null"
  3. >>% JNull // using new utility combinator
  4. <?> "null"

Let’s test:

  1. run jNull "null"
  2. // Success: JNull
  3. run jNull "nulp" |> printResult
  4. // Line:0 Col:3 Error parsing null
  5. // nulp
  6. // ^Unexpected 'p'

That looks good. Let’s try another one!

Parsing Bool

The bool parser will be similar to null:

  • Create a parser to match “true”.
  • Create a parser to match “false”.
  • And then choose between them using <|>.

Here’s the code:

  1. let jBool =
  2. let jtrue =
  3. pstring "true"
  4. >>% JBool true // map to JBool
  5. let jfalse =
  6. pstring "false"
  7. >>% JBool false // map to JBool
  8. // choose between true and false
  9. jtrue <|> jfalse
  10. <?> "bool" // give it a label

And here are some tests:

  1. run jBool "true"
  2. // Success: JBool true
  3. run jBool "false"
  4. // Success: JBool false
  5. run jBool "truX" |> printResult
  6. // Line:0 Col:0 Error parsing bool
  7. // truX
  8. // ^Unexpected 't'

Note that the error is misleading due to the backtracking issue discussed in the previous post. Since “true” failed,
it is trying to parse “false” now, and “t” is an unexpected character.

3. Parsing String

Now for something more complicated — strings.

The spec for string parsing is available as a “railway diagram” like this:

Writing a JSON parser from scratch - 图1

All diagrams sourced from json.org.

To build a parser from a diagram like this, we work from the bottom up, building small “primitive” parsers which we then combine into larger ones.

Let’s start with “any unicode character other than quote and backslash”. We have a simple condition to test, so we can just use the satisfy function:

  1. let jUnescapedChar =
  2. let label = "char"
  3. satisfy (fun ch -> ch <> '\\' && ch <> '\"') label

We can test it immediately:

  1. run jUnescapedChar "a" // Success 'a'
  2. run jUnescapedChar "\\" |> printResult
  3. // Line:0 Col:0 Error parsing char
  4. // \
  5. // ^Unexpected '\'

Ok, good.

Escaped characters

Now what about the next case, the escaped characters?

In this case we have a list of strings to match ("\"", "\n", etc) and for each of these, a character to use as the result.

The logic will be:

  • First define a list of pairs in the form (stringToMatch, resultChar).
  • For each of these, build a parser using pstring stringToMatch >>% resultChar).
  • Finally, combine all these parsers together using the choice function.

Here’s the code:

  1. /// Parse an escaped char
  2. let jEscapedChar =
  3. [
  4. // (stringToMatch, resultChar)
  5. ("\\\"",'\"') // quote
  6. ("\\\\",'\\') // reverse solidus
  7. ("\\/",'/') // solidus
  8. ("\\b",'\b') // backspace
  9. ("\\f",'\f') // formfeed
  10. ("\\n",'\n') // newline
  11. ("\\r",'\r') // cr
  12. ("\\t",'\t') // tab
  13. ]
  14. // convert each pair into a parser
  15. |> List.map (fun (toMatch,result) ->
  16. pstring toMatch >>% result)
  17. // and combine them into one
  18. |> choice
  19. <?> "escaped char" // set label

And again, let’s test it immediately:

  1. run jEscapedChar "\\\\" // Success '\'
  2. run jEscapedChar "\\t" // Success '\009'
  3. run jEscapedChar "a" |> printResult
  4. // Line:0 Col:0 Error parsing escaped char
  5. // a
  6. // ^Unexpected 'a'

It works nicely!

Unicode characters

The final case is the parsing of unicode characters with hex digits.

The logic will be:

  • First define the primitives for backslash, u and hexdigit.
  • Combine them together, using four hexdigits.
  • The output of the parser will be a nested, ugly tuple, so we need a helper function to convert the
    digits to an int, and then a char.

Here’s the code:

  1. /// Parse a unicode char
  2. let jUnicodeChar =
  3. // set up the "primitive" parsers
  4. let backslash = pchar '\\'
  5. let uChar = pchar 'u'
  6. let hexdigit = anyOf (['0'..'9'] @ ['A'..'F'] @ ['a'..'f'])
  7. // convert the parser output (nested tuples)
  8. // to a char
  9. let convertToChar (((h1,h2),h3),h4) =
  10. let str = sprintf "%c%c%c%c" h1 h2 h3 h4
  11. Int32.Parse(str,Globalization.NumberStyles.HexNumber) |> char
  12. // set up the main parser
  13. backslash >>. uChar >>. hexdigit .>>. hexdigit .>>. hexdigit .>>. hexdigit
  14. |>> convertToChar

And let’s test with a smiley face — \u263A.

  1. run jUnicodeChar "\\u263A"

The complete String parser

Putting it all together now:

  • Define a primitive for quote
  • Define a jchar as a choice between jUnescapedChar, jEscapedChar, and jUnicodeChar.
  • The whole parser is then zero or many jchar between two quotes.
  1. let quotedString =
  2. let quote = pchar '\"' <?> "quote"
  3. let jchar = jUnescapedChar <|> jEscapedChar <|> jUnicodeChar
  4. // set up the main parser
  5. quote >>. manyChars jchar .>> quote

One more thing, which is to wrap the quoted string in a JString case and give it a label:

  1. /// Parse a JString
  2. let jString =
  3. // wrap the string in a JString
  4. quotedString
  5. |>> JString // convert to JString
  6. <?> "quoted string" // add label

Let’s test the complete jString function:

  1. run jString "\"\"" // Success ""
  2. run jString "\"a\"" // Success "a"
  3. run jString "\"ab\"" // Success "ab"
  4. run jString "\"ab\\tde\"" // Success "ab\tde"
  5. run jString "\"ab\\u263Ade\"" // Success "ab?de"

4. Parsing Number

The “railway diagram” for Number parsing is:

Writing a JSON parser from scratch - 图2

Again, we’ll work bottom up. Let’s start with the most primitive components, the single chars and digits:

  1. let optSign = opt (pchar '-')
  2. let zero = pstring "0"
  3. let digitOneNine =
  4. satisfy (fun ch -> Char.IsDigit ch && ch <> '0') "1-9"
  5. let digit =
  6. satisfy (fun ch -> Char.IsDigit ch ) "digit"
  7. let point = pchar '.'
  8. let e = pchar 'e' <|> pchar 'E'
  9. let optPlusMinus = opt (pchar '-' <|> pchar '+')

Now let’s build the “integer” part of the number. This is either:

  • The digit zero, or,
  • A nonZeroInt, which is a digitOneNine followed by zero or more normal digits.
  1. let nonZeroInt =
  2. digitOneNine .>>. manyChars digit
  3. |>> fun (first,rest) -> string first + rest
  4. let intPart = zero <|> nonZeroInt

Note that, for the nonZeroInt parser, we have to combine the output of digitOneNine (a char) with manyChars digit (a string)
so a simple map function is needed.

The optional fractional part is a decimal point followed by one or more digits:

  1. let fractionPart = point >>. manyChars1 digit

And the exponent part is an e followed by an optional sign, followed by one or more digits:

  1. let exponentPart = e >>. optPlusMinus .>>. manyChars1 digit

With these components, we can assemble the whole number:

  1. optSign .>>. intPart .>>. opt fractionPart .>>. opt exponentPart
  2. |>> convertToJNumber
  3. <?> "number" // add label

We haven’t defined convertToJNumber yet though. This function will take the four-tuple output by the parser and convert it
into a float.

Now rather than writing custom float logic, we’re going to be lazy and let the .NET framework to the conversion for us!
That is, each of the components will be turned into a string, concatenated, and the whole string parsed into a float.

The problem is that some of the components (like the sign and exponent) are optional. Let’s write a helper that converts
an option to a string using a passed in function, but if the option is None return the empty string.

I’m going to call it |>? but it doesn’t really matter because it is only used locally within the jNumber parser.

  1. // utility function to convert an optional value to a string, or "" if missing
  2. let ( |>? ) opt f =
  3. match opt with
  4. | None -> ""
  5. | Some x -> f x

Now we can create convertToJNumber:

  • The sign is converted to a string.
  • The fractional part is converted to a string, prefixed with a decimal point.
  • The exponent part is converted to a string, with the sign of the exponent also being converted to a string.
  1. let convertToJNumber (((optSign,intPart),fractionPart),expPart) =
  2. // convert to strings and let .NET parse them! - crude but ok for now.
  3. let signStr =
  4. optSign
  5. |>? string // e.g. "-"
  6. let fractionPartStr =
  7. fractionPart
  8. |>? (fun digits -> "." + digits ) // e.g. ".456"
  9. let expPartStr =
  10. expPart
  11. |>? fun (optSign, digits) ->
  12. let sign = optSign |>? string
  13. "e" + sign + digits // e.g. "e-12"
  14. // add the parts together and convert to a float, then wrap in a JNumber
  15. (signStr + intPart + fractionPartStr + expPartStr)
  16. |> float
  17. |> JNumber

It’s pretty crude, and converting things to strings can be slow, so feel free to write a better version.

With that, we have everything we need for the complete jNumber function:

  1. /// Parse a JNumber
  2. let jNumber =
  3. // set up the "primitive" parsers
  4. let optSign = opt (pchar '-')
  5. let zero = pstring "0"
  6. let digitOneNine =
  7. satisfy (fun ch -> Char.IsDigit ch && ch <> '0') "1-9"
  8. let digit =
  9. satisfy (fun ch -> Char.IsDigit ch ) "digit"
  10. let point = pchar '.'
  11. let e = pchar 'e' <|> pchar 'E'
  12. let optPlusMinus = opt (pchar '-' <|> pchar '+')
  13. let nonZeroInt =
  14. digitOneNine .>>. manyChars digit
  15. |>> fun (first,rest) -> string first + rest
  16. let intPart = zero <|> nonZeroInt
  17. let fractionPart = point >>. manyChars1 digit
  18. let exponentPart = e >>. optPlusMinus .>>. manyChars1 digit
  19. // utility function to convert an optional value to a string, or "" if missing
  20. let ( |>? ) opt f =
  21. match opt with
  22. | None -> ""
  23. | Some x -> f x
  24. let convertToJNumber (((optSign,intPart),fractionPart),expPart) =
  25. // convert to strings and let .NET parse them! - crude but ok for now.
  26. let signStr =
  27. optSign
  28. |>? string // e.g. "-"
  29. let fractionPartStr =
  30. fractionPart
  31. |>? (fun digits -> "." + digits ) // e.g. ".456"
  32. let expPartStr =
  33. expPart
  34. |>? fun (optSign, digits) ->
  35. let sign = optSign |>? string
  36. "e" + sign + digits // e.g. "e-12"
  37. // add the parts together and convert to a float, then wrap in a JNumber
  38. (signStr + intPart + fractionPartStr + expPartStr)
  39. |> float
  40. |> JNumber
  41. // set up the main parser
  42. optSign .>>. intPart .>>. opt fractionPart .>>. opt exponentPart
  43. |>> convertToJNumber
  44. <?> "number" // add label

It’s a bit long-winded, but each component follows the spec, so I think it is still quite readable.

Let’s start testing it:

  1. run jNumber "123" // JNumber 123.0
  2. run jNumber "-123" // JNumber -123.0
  3. run jNumber "123.4" // JNumber 123.4

And what about some failing cases?

  1. run jNumber "-123." // JNumber -123.0 -- should fail!
  2. run jNumber "00.1" // JNumber 0 -- should fail!

Hmm. Something went wrong! These cases should fail, surely?

Well, no. What’s happening in the -123. case is that the parser is consuming everything up the to decimal point and then stopping,
leaving the decimal point to be matched by the next parser! So, not an error.

Similarly, in the 00.1 case, the parser is consuming only the first 0 then stopping,
leaving the rest of the input (0.4) to be matched by the next parser. Again, not an error.

To fix this properly is out of scope, so let’s just add some whitespace to the parser to force it to terminate.

  1. let jNumber_ = jNumber .>> spaces1

Now let’s test again:

  1. run jNumber_ "123" // JNumber 123.0
  2. run jNumber_ "-123" // JNumber -123.0
  3. run jNumber_ "-123." |> printResult
  4. // Line:0 Col:4 Error parsing number andThen many1 whitespace
  5. // -123.
  6. // ^Unexpected '.'

and we find the error is being detected properly now.

Let’s test the fractional part:

  1. run jNumber_ "123.4" // JNumber 123.4
  2. run jNumber_ "00.4" |> printResult
  3. // Line:0 Col:1 Error parsing number andThen many1 whitespace
  4. // 00.4
  5. // ^Unexpected '0'

and the exponent part now:

  1. // exponent only
  2. run jNumber_ "123e4" // JNumber 1230000.0
  3. // fraction and exponent
  4. run jNumber_ "123.4e5" // JNumber 12340000.0
  5. run jNumber_ "123.4e-5" // JNumber 0.001234

It’s all looking good so far. Onwards and upwards!

5. Parsing Array

Next up is the Array case. Again, we can use the railway diagram to guide the implementation:

Writing a JSON parser from scratch - 图3

We will start with the primitives again. Note that we are adding optional whitespace after each token:

  1. let jArray =
  2. let left = pchar '[' .>> spaces
  3. let right = pchar ']' .>> spaces
  4. let comma = pchar ',' .>> spaces
  5. let value = jValue .>> spaces

And then we create a list of values separated by a comma, with the whole list between the left and right brackets.

  1. let jArray =
  2. ...
  3. // set up the list parser
  4. let values = sepBy1 value comma
  5. // set up the main parser
  6. between left values right
  7. |>> JArray
  8. <?> "array"

Hold on — what is this jValue?

  1. let jArray =
  2. ...
  3. let value = jValue .>> spaces // <=== what is "jValue"?
  4. ...

Well, the spec says that an Array can contain a list of values, so we’ll assume that we have a jValue parser that can parse them.

But to parse a JValue, we need to parse a Array first!

We have hit a common problem in parsing — mutually recursive definitions. We need a JValue parser to build an Array, but we need an Array parser to build a JValue.

How can we deal with this?

Forward references

The trick is to create a forward reference, a dummy JValue parser that we can use right now to define the Array parser,
and then later on, we will fix up the forward reference with the “real” JValue parser.

This is one time where mutable references come in handy!

We will need a helper function to assist us with this, and the logic will be as follows:

  • Define a dummy parser that will be replaced later.
  • Define a real parser that forwards the input stream to the dummy parser.
  • Return both the real parser and a reference to the dummy parser.

Now when the client fixes up the reference, the real parser will forward the input to the new parser that has replaced the dummy parser.

Here’s the code:

  1. let createParserForwardedToRef<'a>() =
  2. let dummyParser=
  3. let innerFn input : Result<'a * Input> = failwith "unfixed forwarded parser"
  4. {parseFn=innerFn; label="unknown"}
  5. // ref to placeholder Parser
  6. let parserRef = ref dummyParser
  7. // wrapper Parser
  8. let innerFn input =
  9. // forward input to the placeholder
  10. runOnInput !parserRef input
  11. let wrapperParser = {parseFn=innerFn; label="unknown"}
  12. wrapperParser, parserRef

With this in place, we can create a placeholder for a parser of type JValue:

  1. let jValue,jValueRef = createParserForwardedToRef<JValue>()

Finishing up the Array parser

Going back to the Array parser, we can now compile it successfully, using the jValue placeholder:

  1. let jArray =
  2. // set up the "primitive" parsers
  3. let left = pchar '[' .>> spaces
  4. let right = pchar ']' .>> spaces
  5. let comma = pchar ',' .>> spaces
  6. let value = jValue .>> spaces
  7. // set up the list parser
  8. let values = sepBy1 value comma
  9. // set up the main parser
  10. between left values right
  11. |>> JArray
  12. <?> "array"

If we try to test it now, we get an exception because we haven’t fixed up the reference:

  1. run jArray "[ 1, 2 ]"
  2. // System.Exception: unfixed forwarded parser

So for now, let’s fix up the reference to use one of the parsers that we have already created, such as jNumber:

  1. jValueRef := jNumber

Now we can successfully test the jArray function, as long as we are careful to only use numbers in our array!

  1. run jArray "[ 1, 2 ]"
  2. // Success (JArray [JNumber 1.0; JNumber 2.0],
  3. run jArray "[ 1, 2, ]" |> printResult
  4. // Line:0 Col:6 Error parsing array
  5. // [ 1, 2, ]
  6. // ^Unexpected ','

6. Parsing Object

The parser for Object is very similar to the one for Array.

First, the railway diagram:

Writing a JSON parser from scratch - 图4

Using this, we can create the parser directly, so I’ll present it here without comment:

  1. let jObject =
  2. // set up the "primitive" parsers
  3. let left = pchar '{' .>> spaces
  4. let right = pchar '}' .>> spaces
  5. let colon = pchar ':' .>> spaces
  6. let comma = pchar ',' .>> spaces
  7. let key = quotedString .>> spaces
  8. let value = jValue .>> spaces
  9. // set up the list parser
  10. let keyValue = (key .>> colon) .>>. value
  11. let keyValues = sepBy1 keyValue comma
  12. // set up the main parser
  13. between left keyValues right
  14. |>> Map.ofList // convert the list of keyValues into a Map
  15. |>> JObject // wrap in JObject
  16. <?> "object" // add label

A bit of testing to make sure it works (but remember, only numbers are supported as values for now).

  1. run jObject """{ "a":1, "b" : 2 }"""
  2. // JObject (map [("a", JNumber 1.0); ("b", JNumber 2.0)]),
  3. run jObject """{ "a":1, "b" : 2, }""" |> printResult
  4. // Line:0 Col:18 Error parsing object
  5. // { "a":1, "b" : 2, }
  6. // ^Unexpected ','

7. Putting it all together

Finally, we can combine all six of the parsers using the choice combinator, and we can assign this to the JValue parser reference that we created earlier:

  1. jValueRef := choice
  2. [
  3. jNull
  4. jBool
  5. jNumber
  6. jString
  7. jArray
  8. jObject
  9. ]

And now we are ready to rock and roll!

Testing the complete parser: example 1

Here’s an example of a JSON string that we can attempt to parse:

  1. let example1 = """{
  2. "name" : "Scott",
  3. "isMale" : true,
  4. "bday" : {"year":2001, "month":12, "day":25 },
  5. "favouriteColors" : ["blue", "green"]
  6. }"""
  7. run jValue example1

And here is the result:

  1. JObject
  2. (map
  3. [("bday", JObject(map
  4. [("day", JNumber 25.0);
  5. ("month", JNumber 12.0);
  6. ("year", JNumber 2001.0)]));
  7. ("favouriteColors", JArray [JString "blue"; JString "green"]);
  8. ("isMale", JBool true);
  9. ("name", JString "Scott")
  10. ])

Testing the complete parser: example 2

Here’s one from the example page on json.org:

  1. let example2= """{"widget": {
  2. "debug": "on",
  3. "window": {
  4. "title": "Sample Konfabulator Widget",
  5. "name": "main_window",
  6. "width": 500,
  7. "height": 500
  8. },
  9. "image": {
  10. "src": "Images/Sun.png",
  11. "name": "sun1",
  12. "hOffset": 250,
  13. "vOffset": 250,
  14. "alignment": "center"
  15. },
  16. "text": {
  17. "data": "Click Here",
  18. "size": 36,
  19. "style": "bold",
  20. "name": "text1",
  21. "hOffset": 250,
  22. "vOffset": 100,
  23. "alignment": "center",
  24. "onMouseUp": "sun1.opacity = (sun1.opacity / 100) * 90;"
  25. }
  26. }} """
  27. run jValue example2

And here is the result:

  1. JObject(map
  2. [("widget",JObject(map
  3. [("debug", JString "on");
  4. ("image",JObject(map
  5. [("alignment", JString "center");
  6. ("hOffset", JNumber 250.0); ("name", JString "sun1");
  7. ("src", JString "Images/Sun.png");
  8. ("vOffset", JNumber 250.0)]));
  9. ("text",JObject(map
  10. [("alignment", JString "center");
  11. ("data", JString "Click Here");
  12. ("hOffset", JNumber 250.0);
  13. ("name", JString "text1");
  14. ("onMouseUp", JString "sun1.opacity = (sun1.opacity / 100) * 90;");
  15. ("size", JNumber 36.0);
  16. ("style", JString "bold");
  17. ("vOffset", JNumber 100.0)]));
  18. ("window",JObject(map
  19. [("height", JNumber 500.0);
  20. ("name", JString "main_window");
  21. ("title", JString "Sample Konfabulator Widget");
  22. ("width", JNumber 500.0)]))]))]),

Complete listing of the JSON parser

Here’s the complete listing for the JSON parser — it’s about 250 lines of useful code.

The source code displayed below is also available at this gist.

  1. #load "ParserLibrary.fsx"
  2. open System
  3. open ParserLibrary
  4. (*
  5. // --------------------------------
  6. JSON spec from http://www.json.org/
  7. // --------------------------------
  8. The JSON spec is available at [json.org](http://www.json.org/). I'll paraphase it here:
  9. * A `value` can be a `string` or a `number` or a `bool` or `null` or an `object` or an `array`.
  10. * These structures can be nested.
  11. * A `string` is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes.
  12. * A `number` is very much like a C or Java number, except that the octal and hexadecimal formats are not used.
  13. * A `boolean` is the literal `true` or `false`
  14. * A `null` is the literal `null`
  15. * An `object` is an unordered set of name/value pairs.
  16. * An object begins with { (left brace) and ends with } (right brace).
  17. * Each name is followed by : (colon) and the name/value pairs are separated by , (comma).
  18. * An `array` is an ordered collection of values.
  19. * An array begins with [ (left bracket) and ends with ] (right bracket).
  20. * Values are separated by , (comma).
  21. * Whitespace can be inserted between any pair of tokens.
  22. *)
  23. type JValue =
  24. | JString of string
  25. | JNumber of float
  26. | JBool of bool
  27. | JNull
  28. | JObject of Map<string, JValue>
  29. | JArray of JValue list
  30. // ======================================
  31. // Forward reference
  32. // ======================================
  33. /// Create a forward reference
  34. let createParserForwardedToRef<'a>() =
  35. let dummyParser=
  36. let innerFn input : Result<'a * Input> = failwith "unfixed forwarded parser"
  37. {parseFn=innerFn; label="unknown"}
  38. // ref to placeholder Parser
  39. let parserRef = ref dummyParser
  40. // wrapper Parser
  41. let innerFn input =
  42. // forward input to the placeholder
  43. runOnInput !parserRef input
  44. let wrapperParser = {parseFn=innerFn; label="unknown"}
  45. wrapperParser, parserRef
  46. let jValue,jValueRef = createParserForwardedToRef<JValue>()
  47. // ======================================
  48. // Utility function
  49. // ======================================
  50. // applies the parser p, ignores the result, and returns x.
  51. let (>>%) p x =
  52. p |>> (fun _ -> x)
  53. // ======================================
  54. // Parsing a JNull
  55. // ======================================
  56. let jNull =
  57. pstring "null"
  58. >>% JNull // map to JNull
  59. <?> "null" // give it a label
  60. // ======================================
  61. // Parsing a JBool
  62. // ======================================
  63. let jBool =
  64. let jtrue =
  65. pstring "true"
  66. >>% JBool true // map to JBool
  67. let jfalse =
  68. pstring "false"
  69. >>% JBool false // map to JBool
  70. // choose between true and false
  71. jtrue <|> jfalse
  72. <?> "bool" // give it a label
  73. // ======================================
  74. // Parsing a JString
  75. // ======================================
  76. /// Parse an unescaped char
  77. let jUnescapedChar =
  78. satisfy (fun ch -> ch <> '\\' && ch <> '\"') "char"
  79. /// Parse an escaped char
  80. let jEscapedChar =
  81. [
  82. // (stringToMatch, resultChar)
  83. ("\\\"",'\"') // quote
  84. ("\\\\",'\\') // reverse solidus
  85. ("\\/",'/') // solidus
  86. ("\\b",'\b') // backspace
  87. ("\\f",'\f') // formfeed
  88. ("\\n",'\n') // newline
  89. ("\\r",'\r') // cr
  90. ("\\t",'\t') // tab
  91. ]
  92. // convert each pair into a parser
  93. |> List.map (fun (toMatch,result) ->
  94. pstring toMatch >>% result)
  95. // and combine them into one
  96. |> choice
  97. /// Parse a unicode char
  98. let jUnicodeChar =
  99. // set up the "primitive" parsers
  100. let backslash = pchar '\\'
  101. let uChar = pchar 'u'
  102. let hexdigit = anyOf (['0'..'9'] @ ['A'..'F'] @ ['a'..'f'])
  103. // convert the parser output (nested tuples)
  104. // to a char
  105. let convertToChar (((h1,h2),h3),h4) =
  106. let str = sprintf "%c%c%c%c" h1 h2 h3 h4
  107. Int32.Parse(str,Globalization.NumberStyles.HexNumber) |> char
  108. // set up the main parser
  109. backslash >>. uChar >>. hexdigit .>>. hexdigit .>>. hexdigit .>>. hexdigit
  110. |>> convertToChar
  111. /// Parse a quoted string
  112. let quotedString =
  113. let quote = pchar '\"' <?> "quote"
  114. let jchar = jUnescapedChar <|> jEscapedChar <|> jUnicodeChar
  115. // set up the main parser
  116. quote >>. manyChars jchar .>> quote
  117. /// Parse a JString
  118. let jString =
  119. // wrap the string in a JString
  120. quotedString
  121. |>> JString // convert to JString
  122. <?> "quoted string" // add label
  123. // ======================================
  124. // Parsing a JNumber
  125. // ======================================
  126. /// Parse a JNumber
  127. let jNumber =
  128. // set up the "primitive" parsers
  129. let optSign = opt (pchar '-')
  130. let zero = pstring "0"
  131. let digitOneNine =
  132. satisfy (fun ch -> Char.IsDigit ch && ch <> '0') "1-9"
  133. let digit =
  134. satisfy (fun ch -> Char.IsDigit ch ) "digit"
  135. let point = pchar '.'
  136. let e = pchar 'e' <|> pchar 'E'
  137. let optPlusMinus = opt (pchar '-' <|> pchar '+')
  138. let nonZeroInt =
  139. digitOneNine .>>. manyChars digit
  140. |>> fun (first,rest) -> string first + rest
  141. let intPart = zero <|> nonZeroInt
  142. let fractionPart = point >>. manyChars1 digit
  143. let exponentPart = e >>. optPlusMinus .>>. manyChars1 digit
  144. // utility function to convert an optional value to a string, or "" if missing
  145. let ( |>? ) opt f =
  146. match opt with
  147. | None -> ""
  148. | Some x -> f x
  149. let convertToJNumber (((optSign,intPart),fractionPart),expPart) =
  150. // convert to strings and let .NET parse them! - crude but ok for now.
  151. let signStr =
  152. optSign
  153. |>? string // e.g. "-"
  154. let fractionPartStr =
  155. fractionPart
  156. |>? (fun digits -> "." + digits ) // e.g. ".456"
  157. let expPartStr =
  158. expPart
  159. |>? fun (optSign, digits) ->
  160. let sign = optSign |>? string
  161. "e" + sign + digits // e.g. "e-12"
  162. // add the parts together and convert to a float, then wrap in a JNumber
  163. (signStr + intPart + fractionPartStr + expPartStr)
  164. |> float
  165. |> JNumber
  166. // set up the main parser
  167. optSign .>>. intPart .>>. opt fractionPart .>>. opt exponentPart
  168. |>> convertToJNumber
  169. <?> "number" // add label
  170. // ======================================
  171. // Parsing a JArray
  172. // ======================================
  173. let jArray =
  174. // set up the "primitive" parsers
  175. let left = pchar '[' .>> spaces
  176. let right = pchar ']' .>> spaces
  177. let comma = pchar ',' .>> spaces
  178. let value = jValue .>> spaces
  179. // set up the list parser
  180. let values = sepBy1 value comma
  181. // set up the main parser
  182. between left values right
  183. |>> JArray
  184. <?> "array"
  185. // ======================================
  186. // Parsing a JObject
  187. // ======================================
  188. let jObject =
  189. // set up the "primitive" parsers
  190. let left = pchar '{' .>> spaces
  191. let right = pchar '}' .>> spaces
  192. let colon = pchar ':' .>> spaces
  193. let comma = pchar ',' .>> spaces
  194. let key = quotedString .>> spaces
  195. let value = jValue .>> spaces
  196. // set up the list parser
  197. let keyValue = (key .>> colon) .>>. value
  198. let keyValues = sepBy1 keyValue comma
  199. // set up the main parser
  200. between left keyValues right
  201. |>> Map.ofList // convert the list of keyValues into a Map
  202. |>> JObject // wrap in JObject
  203. <?> "object" // add label
  204. // ======================================
  205. // Fixing up the jValue ref
  206. // ======================================
  207. // fixup the forward ref
  208. jValueRef := choice
  209. [
  210. jNull
  211. jBool
  212. jNumber
  213. jString
  214. jArray
  215. jObject
  216. ]

Summary

In this post, we built a JSON parser using the parser library that we have developed over the previous posts.

I hope that, by building both the parser library and a real-world parser from scratch, you have gained a good appreciation for how parser combinators work,
and how useful they are.

I’ll repeat what I said in the first post: if you are interesting in using this technique in production,
be sure to investigate the FParsec library for F#, which is optimized for real-world usage.

And if you are using languages other than F#, there is almost certainly a parser combinator library available to use.

Thanks!

The source code for this post is available at this gist.