layout: post
title: "Worked example: Parsing command line arguments"
description: "Pattern matching in practice"
nav: thinking-functionally
seriesId: "Expressions and syntax"
seriesOrder: 11

categories: [Patterns, Worked Examples]

Now that we’ve seen how the match expression works, let’s look at some examples in practice. But first, a word about the design approach.

Application design in F#

We’ve seen that a generic function takes input and emits output. But in a sense, that approach applies at any level of functional code, even at the top level.

In fact, we can say that a functional application takes input, transforms it, and emits output:

[Figure 1: a functional application takes input, transforms it, and emits output]

Now ideally, the transformations work within the pure type-safe world that we create to model the domain, but unfortunately, the real world is untyped!
That is, the input is likely to be simple strings or bytes, and so is the output.

How can we work with this? The obvious solution is to have a separate stage to convert the input to our pure internal model, and then another separate stage to convert from the internal model to the output.

[Figure 2: the input is converted into the internal model, transformed, and then converted into the output]

In this way, we can hide the messiness of the real world from the core of the application. This “keep your model pure” approach is similar to the “Hexagonal Architecture” concept in the large, or the MVC pattern in the small.
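
To make the three stages concrete, here is a minimal sketch. The `Temperature` type and the `parseInput`, `toFahrenheit`, `formatOutput`, and `run` functions are hypothetical names chosen only for illustration; the point is that only the middle stage works with the typed model, while the outer stages deal with raw strings.

```fsharp
open System
open System.Globalization

// Hypothetical internal model: temperatures with explicit units
type Temperature =
    | Celsius of float
    | Fahrenheit of float

// Input stage: untyped string -> typed model (None if the input is not a number)
let parseInput (raw: string) : Temperature option =
    match Double.TryParse(raw, NumberStyles.Float, CultureInfo.InvariantCulture) with
    | true, value -> Some (Celsius value)
    | false, _ -> None

// Core: a pure transformation that only ever sees the typed model
let toFahrenheit temp =
    match temp with
    | Celsius c -> Fahrenheit (c * 9.0 / 5.0 + 32.0)
    | Fahrenheit _ -> temp

// Output stage: typed model -> untyped string
let formatOutput temp =
    match temp with
    | Celsius c -> sprintf "%.1fC" c
    | Fahrenheit f -> sprintf "%.1fF" f

// Wire the three stages together
let run raw =
    raw
    |> parseInput
    |> Option.map toFahrenheit
    |> Option.map formatOutput
```

Here `run "100"` gives `Some "212.0F"`, while `run "abc"` gives `None`: invalid input is rejected at the edge and never reaches the core.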

In this post and the next, we’ll see some simple examples of this.

Example: parsing a command line

We talked about the match expression in general in the previous post, so let’s look at a real example where it is useful, namely parsing a command line.

We’ll design and implement two slightly different versions: one with a basic internal model, and a second one with some improvements.

Requirements

Let’s say that we have three command-line options: “verbose”, “subdirectories”, and “orderby”.
“Verbose” and “subdirectories” are flags, while “orderby” has two choices: “by size” and “by name”.

So the command-line syntax would look like this:

```
MYAPP [/V] [/S] [/O order]
    /V    verbose
    /S    include subdirectories
    /O    order by. Parameter is one of
            N - order by name.
            S - order by size
```

First version

Following the design rule above, we can see that:

  • the input will be an array (or list) of strings, one for each argument.
  • the internal model will be a set of types that model the (tiny) domain.
  • the output is out of scope in this example.
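
In a compiled program the arguments usually arrive as a `string[]` (for example through a `main` function marked `[<EntryPoint>]`), while the list patterns used below, such as `"/v"::xs`, need an F# list. A minimal sketch of that input step, using a hypothetical `toArgList` helper:

```fsharp
// Hypothetical input-stage helper: convert the raw argument array
// into an F# list so that it can be matched with head::tail patterns
let toArgList (argv: string[]) : string list =
    argv |> List.ofArray

// What the parser would receive for "MYAPP /v /o S"
let exampleArgs = toArgList [| "/v"; "/o"; "S" |]
```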

So we’ll start by creating the internal model of the parameters, and then look at how we can parse the input into the types used in that model.

Here’s a first stab at the model:

```fsharp
// constants used later
let OrderByName = "N"
let OrderBySize = "S"

// set up a type to represent the options
type CommandLineOptions = {
    verbose: bool;
    subdirectories: bool;
    orderby: string;
    }
```

Ok, that looks alright. Now let’s parse the arguments.
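
The parser builds up this record one field at a time using F#'s copy-and-update expression, `{ record with field = value }`. A quick check (a sketch reusing the model above, with hypothetical value names `original` and `updated`) shows that it returns a new record and leaves the original untouched, which is what makes it safe to pass `optionsSoFar` from one step to the next:

```fsharp
// Same model as above
let OrderByName = "N"
let OrderBySize = "S"

type CommandLineOptions = {
    verbose: bool
    subdirectories: bool
    orderby: string
    }

let original = { verbose = false; subdirectories = false; orderby = OrderByName }

// copy-and-update: builds a NEW record; `original` is not modified
let updated = { original with verbose = true }
```

After this runs, `original.verbose` is still `false`, `updated.verbose` is `true`, and every other field of `updated` is copied from `original`.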

The parsing logic is very similar to the loopAndSum example in the previous post.

  • We create a recursive loop on the list of arguments.
  • Each time through the loop, we parse one argument.
  • The options parsed so far are passed into each loop as a parameter (the “accumulator” pattern).

```fsharp
let rec parseCommandLine args optionsSoFar =
    match args with

    // empty list means we're done.
    | [] ->
        optionsSoFar

    // match verbose flag
    | "/v"::xs ->
        let newOptionsSoFar = { optionsSoFar with verbose=true}
        parseCommandLine xs newOptionsSoFar

    // match subdirectories flag
    | "/s"::xs ->
        let newOptionsSoFar = { optionsSoFar with subdirectories=true}
        parseCommandLine xs newOptionsSoFar

    // match orderBy flag
    | "/o"::xs ->
        //start a submatch on the next arg
        match xs with
        | "S"::xss ->
            let newOptionsSoFar = { optionsSoFar with orderby=OrderBySize}
            parseCommandLine xss newOptionsSoFar
        | "N"::xss ->
            let newOptionsSoFar = { optionsSoFar with orderby=OrderByName}
            parseCommandLine xss newOptionsSoFar
        // missing or invalid orderBy argument: complain and keep looping
        | _ ->
            eprintfn "OrderBy needs a second argument"
            parseCommandLine xs optionsSoFar

    // handle unrecognized option and keep looping
    | x::xs ->
        eprintfn "Option '%s' is unrecognized" x
        parseCommandLine xs optionsSoFar
```

This code is straightforward, I hope.

Each match consists of an option::restOfList pattern.
If the option is matched, a new optionsSoFar value is created and the loop repeats with the remaining list, until the list becomes empty,
at which point we can exit the loop and return the optionsSoFar value as the final result.

There are two special cases:

  • Matching the “orderBy” option starts a submatch that looks at the next item in the list. If that item is missing or is not a valid order, it complains about a missing second argument.
  • The very last match on the main match..with is not a wildcard, but a “bind to value”. Like a wildcard, this will always succeed (the empty list has already been handled), but because we have bound to the value, it allows us to print the offending unmatched argument.

Note that for printing errors we use eprintfn rather than printfn. This writes to STDERR rather than STDOUT.
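
Stripped of the option-specific details, the recursive loop above has a simple shape: match the head of the list, compute a new accumulator, and recurse on the tail. Here is that shape on its own, as a minimal sketch using a hypothetical `countFlags` function (not part of the parser) that counts how many arguments start with "/":

```fsharp
// Hypothetical example: same head::tail + accumulator shape as the parser
let rec countFlags args countSoFar =
    match args with
    // empty list means we're done: return the accumulator
    | [] -> countSoFar
    // head looks like a flag: bump the accumulator and recurse on the tail
    | (x: string) :: xs when x.StartsWith("/", System.StringComparison.Ordinal) ->
        countFlags xs (countSoFar + 1)
    // anything else: keep the accumulator as-is and recurse on the tail
    | _ :: xs -> countFlags xs countSoFar
```

For example, `countFlags ["/v"; "xyz"; "/s"] 0` returns `2`. Like `parseCommandLine`, it needs an initial accumulator value on every call.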

So now let’s test this:

```fsharp
parseCommandLine ["/v"; "/s"]
```

Oops! That didn’t work: we also need to pass in an initial optionsSoFar argument. Without it, F# simply returns a partially applied function rather than a result. Let’s try again:

```fsharp
// define the defaults to pass in
let defaultOptions = {
    verbose = false;
    subdirectories = false;
    orderby = OrderByName
    }

// test it
parseCommandLine ["/v"] defaultOptions
parseCommandLine ["/v"; "/s"] defaultOptions
parseCommandLine ["/o"; "S"] defaultOptions
```

Check that the output is what you would expect.

And we should also check the error cases:

```fsharp
parseCommandLine ["/v"; "xyz"] defaultOptions
parseCommandLine ["/o"; "xyz"] defaultOptions
```

You should see the error messages in these cases now.

Before we finish this implementation, let’s fix something annoying.
We are passing in these default options every time — can we get rid of them?

This is a very common situation: you have a recursive function that takes an “accumulator” parameter, but you don’t want to be passing initial values all the time.

The answer is simple: just create another function that calls the recursive function with the defaults.

Normally, this second one is the “public” one and the recursive one is hidden, so we will rewrite the code as follows:

  • Rename parseCommandLine to parseCommandLineRec. There are other naming conventions you could use as well, such as parseCommandLine' with a tick mark, or innerParseCommandLine.
  • Create a new parseCommandLine that calls parseCommandLineRec with the defaults.

```fsharp
// create the "helper" recursive function
let rec parseCommandLineRec args optionsSoFar =
    // implementation as above

// create the "public" parse function
let parseCommandLine args =
    // create the defaults
    let defaultOptions = {
        verbose = false;
        subdirectories = false;
        orderby = OrderByName
        }

    // call the recursive one with the initial options
    parseCommandLineRec args defaultOptions
```

In this case the helper function can stand on its own. But if you really want to hide it, you can put it as a nested subfunction in the definition of parseCommandLine itself.

```fsharp
// create the "public" parse function
let parseCommandLine args =
    // create the defaults
    let defaultOptions =
        // implementation as above

    // inner recursive function
    let rec parseCommandLineRec args optionsSoFar =
        // implementation as above

    // call the recursive one with the initial options
    parseCommandLineRec args defaultOptions
```

In this case, I think it would just make things more complicated, so I have kept them separate.
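
The nested form is easier to appreciate on a smaller function. Here is a minimal sketch of the same wrapper-plus-helper shape applied to a hypothetical `sumList` function, with the accumulator hidden inside so that callers never see it:

```fsharp
// "Public" function: callers pass only the list
let sumList (xs: int list) =
    // hidden helper: the accumulator is an implementation detail
    let rec loop remaining sumSoFar =
        match remaining with
        | [] -> sumSoFar
        | x :: rest -> loop rest (sumSoFar + x)
    // start the loop with the initial accumulator value
    loop xs 0
```

Callers simply write `sumList [1; 2; 3]` and get `6`, with no initial value to remember.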

So, here is all the code at once, wrapped in a module:

```fsharp
module CommandLineV1 =

    // constants used later
    let OrderByName = "N"
    let OrderBySize = "S"

    // set up a type to represent the options
    type CommandLineOptions = {
        verbose: bool;
        subdirectories: bool;
        orderby: string;
        }

    // create the "helper" recursive function
    let rec parseCommandLineRec args optionsSoFar =
        match args with

        // empty list means we're done.
        | [] ->
            optionsSoFar

        // match verbose flag
        | "/v"::xs ->
            let newOptionsSoFar = { optionsSoFar with verbose=true}
            parseCommandLineRec xs newOptionsSoFar

        // match subdirectories flag
        | "/s"::xs ->
            let newOptionsSoFar = { optionsSoFar with subdirectories=true}
            parseCommandLineRec xs newOptionsSoFar

        // match orderBy flag
        | "/o"::xs ->
            //start a submatch on the next arg
            match xs with
            | "S"::xss ->
                let newOptionsSoFar = { optionsSoFar with orderby=OrderBySize}
                parseCommandLineRec xss newOptionsSoFar
            | "N"::xss ->
                let newOptionsSoFar = { optionsSoFar with orderby=OrderByName}
                parseCommandLineRec xss newOptionsSoFar
            // missing or invalid orderBy argument: complain and keep looping
            | _ ->
                eprintfn "OrderBy needs a second argument"
                parseCommandLineRec xs optionsSoFar

        // handle unrecognized option and keep looping
        | x::xs ->
            eprintfn "Option '%s' is unrecognized" x
            parseCommandLineRec xs optionsSoFar

    // create the "public" parse function
    let parseCommandLine args =
        // create the defaults
        let defaultOptions = {
            verbose = false;
            subdirectories = false;
            orderby = OrderByName
            }

        // call the recursive one with the initial options
        parseCommandLineRec args defaultOptions

// happy path
CommandLineV1.parseCommandLine ["/v"]
CommandLineV1.parseCommandLine ["/v"; "/s"]
CommandLineV1.parseCommandLine ["/o"; "S"]

// error handling
CommandLineV1.parseCommandLine ["/v"; "xyz"]
CommandLineV1.parseCommandLine ["/o"; "xyz"]
```

Second version

In our initial model we used bool and string to represent the possible values.

```fsharp
type CommandLineOptions = {
    verbose: bool;
    subdirectories: bool;
    orderby: string;
    }
```

There are two problems with this:

  • It doesn’t really represent the domain. For example, can orderby really be any string? Would my code break if I set it to “ABC”?

  • The values are not self-documenting. For example, the verbose value is a bool. We only know that the bool represents the “verbose” option because of the context (the field named verbose) in which it is found.
    If we passed that bool around and took it out of context, we would not know what it represented. I’m sure we have all seen C# methods with many boolean parameters, like this:

```csharp
myObject.SetUpComplicatedOptions(true,false,true,false,false);
```

Because the bool doesn’t represent anything at the domain level, it is very easy to make mistakes.

The solution to both these problems is to be as specific as possible when defining the domain, typically by creating lots of very specific types.
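
To see the difference at a call site, here is a minimal sketch with a hypothetical function written twice, once with bare bools (`describeWithBools`) and once with specific types (`describe`). The two union types are the same ones that the new model introduces.

```fsharp
// Hypothetical function taking two bare bools: the call site says nothing
let describeWithBools verbose subdirectories =
    sprintf "verbose=%b subdirectories=%b" verbose subdirectories

// Swapping the arguments compiles fine and silently means something else
let oops = describeWithBools false true

// The same idea using specific types instead of bools
type SubdirectoryOption = IncludeSubdirectories | ExcludeSubdirectories
type VerboseOption = VerboseOutput | TerseOutput

let describe (verbose: VerboseOption) (subdirectories: SubdirectoryOption) =
    sprintf "%A %A" verbose subdirectories

// The call site documents itself, and swapping the arguments is a compile error
let ok = describe TerseOutput IncludeSubdirectories
```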

So here’s a new version of CommandLineOptions:

```fsharp
type OrderByOption = OrderBySize | OrderByName
type SubdirectoryOption = IncludeSubdirectories | ExcludeSubdirectories
type VerboseOption = VerboseOutput | TerseOutput

type CommandLineOptions = {
    verbose: VerboseOption;
    subdirectories: SubdirectoryOption;
    orderby: OrderByOption
    }
```

A couple of things to notice:

  • There are no bools or strings anywhere.
  • The names are quite explicit. This acts as documentation when a value is taken in isolation,
    and it also means that each name is unique, which helps type inference and in turn lets you avoid explicit type annotations.
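
For instance, in this hypothetical `orderToFlag` helper (a sketch, not part of the parser) there is no type annotation at all. The unique case names in the patterns are enough for the compiler to infer that `order` is an `OrderByOption`:

```fsharp
type OrderByOption = OrderBySize | OrderByName

// No annotation needed: the case names tell the compiler the parameter type
let orderToFlag order =
    match order with
    | OrderBySize -> "S"
    | OrderByName -> "N"
```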

Once we have made the changes to the domain, it is easy to fix up the parsing logic.

So, here is all the revised code, wrapped in a “v2” module:

```fsharp
module CommandLineV2 =

    type OrderByOption = OrderBySize | OrderByName
    type SubdirectoryOption = IncludeSubdirectories | ExcludeSubdirectories
    type VerboseOption = VerboseOutput | TerseOutput

    type CommandLineOptions = {
        verbose: VerboseOption;
        subdirectories: SubdirectoryOption;
        orderby: OrderByOption
        }

    // create the "helper" recursive function
    let rec parseCommandLineRec args optionsSoFar =
        match args with

        // empty list means we're done.
        | [] ->
            optionsSoFar

        // match verbose flag
        | "/v"::xs ->
            let newOptionsSoFar = { optionsSoFar with verbose=VerboseOutput}
            parseCommandLineRec xs newOptionsSoFar

        // match subdirectories flag
        | "/s"::xs ->
            let newOptionsSoFar = { optionsSoFar with subdirectories=IncludeSubdirectories}
            parseCommandLineRec xs newOptionsSoFar

        // match sort order flag
        | "/o"::xs ->
            //start a submatch on the next arg
            match xs with
            | "S"::xss ->
                let newOptionsSoFar = { optionsSoFar with orderby=OrderBySize}
                parseCommandLineRec xss newOptionsSoFar
            | "N"::xss ->
                let newOptionsSoFar = { optionsSoFar with orderby=OrderByName}
                parseCommandLineRec xss newOptionsSoFar
            // missing or invalid orderBy argument: complain and keep looping
            | _ ->
                eprintfn "OrderBy needs a second argument"
                parseCommandLineRec xs optionsSoFar

        // handle unrecognized option and keep looping
        | x::xs ->
            eprintfn "Option '%s' is unrecognized" x
            parseCommandLineRec xs optionsSoFar

    // create the "public" parse function
    let parseCommandLine args =
        // create the defaults
        let defaultOptions = {
            verbose = TerseOutput;
            subdirectories = ExcludeSubdirectories;
            orderby = OrderByName
            }

        // call the recursive one with the initial options
        parseCommandLineRec args defaultOptions

// ==============================
// tests

// happy path
CommandLineV2.parseCommandLine ["/v"]
CommandLineV2.parseCommandLine ["/v"; "/s"]
CommandLineV2.parseCommandLine ["/o"; "S"]

// error handling
CommandLineV2.parseCommandLine ["/v"; "xyz"]
CommandLineV2.parseCommandLine ["/o"; "xyz"]
```
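
With the typed model in place, code that consumes the options can pattern match on them directly. The output stage is out of scope for this example, but as a rough sketch, here is a hypothetical `describeOptions` function (not part of the original code) that turns a `CommandLineOptions` value into a summary string. Because each match is exhaustive, adding a new case to any of the option types would make the compiler warn here until the case is handled.

```fsharp
type OrderByOption = OrderBySize | OrderByName
type SubdirectoryOption = IncludeSubdirectories | ExcludeSubdirectories
type VerboseOption = VerboseOutput | TerseOutput

type CommandLineOptions = {
    verbose: VerboseOption
    subdirectories: SubdirectoryOption
    orderby: OrderByOption
    }

// Hypothetical output-stage helper: typed options -> human-readable summary
let describeOptions (options: CommandLineOptions) =
    let verbosity =
        match options.verbose with
        | VerboseOutput -> "verbose"
        | TerseOutput -> "terse"
    let scope =
        match options.subdirectories with
        | IncludeSubdirectories -> "including subdirectories"
        | ExcludeSubdirectories -> "top level only"
    let order =
        match options.orderby with
        | OrderBySize -> "by size"
        | OrderByName -> "by name"
    sprintf "%s, %s, ordered %s" verbosity scope order
```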

Using fold instead of recursion?

We said in the previous post that it is good to avoid recursion where possible and use the built-in functions in the List module, such as map and fold.

So can we take this advice here and fix up this code to do this?

Unfortunately, not easily. The problem is that the list functions generally work on one element at a time, while the “orderby” option requires a “lookahead” argument as well.

To make this work with something like fold, we need to create a “parse mode” flag to indicate whether we are in lookahead mode or not.
This is possible, but I think it just adds extra complexity compared to the straightforward recursive version above.

And in a real-world situation, anything more complicated than this would be a signal that you need to switch to a proper parsing system such as FParsec.

However, just to show you it can be done with fold:

```fsharp
module CommandLineV3 =

    type OrderByOption = OrderBySize | OrderByName
    type SubdirectoryOption = IncludeSubdirectories | ExcludeSubdirectories
    type VerboseOption = VerboseOutput | TerseOutput

    type CommandLineOptions = {
        verbose: VerboseOption;
        subdirectories: SubdirectoryOption;
        orderby: OrderByOption
        }

    type ParseMode = TopLevel | OrderBy

    type FoldState = {
        options: CommandLineOptions ;
        parseMode: ParseMode;
        }

    // parse the top-level arguments
    // return a new FoldState
    let parseTopLevel arg optionsSoFar =
        match arg with

        // match verbose flag
        | "/v" ->
            let newOptionsSoFar = {optionsSoFar with verbose=VerboseOutput}
            {options=newOptionsSoFar; parseMode=TopLevel}

        // match subdirectories flag
        | "/s"->
            let newOptionsSoFar = { optionsSoFar with subdirectories=IncludeSubdirectories}
            {options=newOptionsSoFar; parseMode=TopLevel}

        // match sort order flag: the next argument should be the order
        | "/o" ->
            {options=optionsSoFar; parseMode=OrderBy}

        // handle unrecognized option and keep looping
        | x ->
            eprintfn "Option '%s' is unrecognized" x
            {options=optionsSoFar; parseMode=TopLevel}

    // parse the orderBy arguments
    // return a new FoldState
    let parseOrderBy arg optionsSoFar =
        match arg with
        | "S" ->
            let newOptionsSoFar = { optionsSoFar with orderby=OrderBySize}
            {options=newOptionsSoFar; parseMode=TopLevel}
        | "N" ->
            let newOptionsSoFar = { optionsSoFar with orderby=OrderByName}
            {options=newOptionsSoFar; parseMode=TopLevel}
        // invalid orderBy argument: complain and return to top-level mode
        | _ ->
            eprintfn "OrderBy needs a second argument"
            {options=optionsSoFar; parseMode=TopLevel}

    // create a helper fold function
    let foldFunction state element  =
        match state with
        | {options=optionsSoFar; parseMode=TopLevel} ->
            // return new state
            parseTopLevel element optionsSoFar
        | {options=optionsSoFar; parseMode=OrderBy} ->
            // return new state
            parseOrderBy element optionsSoFar

    // create the "public" parse function
    let parseCommandLine args =

        let defaultOptions = {
            verbose = TerseOutput;
            subdirectories = ExcludeSubdirectories;
            orderby = OrderByName
            }

        let initialFoldState =
            {options=defaultOptions; parseMode=TopLevel}

        // call fold with the initial state
        let finalState = args |> List.fold foldFunction initialFoldState

        // return just the options; the parse mode is an internal detail
        finalState.options

// ==============================
// tests

// happy path
CommandLineV3.parseCommandLine ["/v"]
CommandLineV3.parseCommandLine ["/v"; "/s"]
CommandLineV3.parseCommandLine ["/o"; "S"]

// error handling
CommandLineV3.parseCommandLine ["/v"; "xyz"]
CommandLineV3.parseCommandLine ["/o"; "xyz"]
```

By the way, can you see a subtle change of behavior in this version?

In the previous versions, if the argument after “/o” was not a valid order (“S” or “N”), the recursive loop would still parse that token as a top-level option on its next pass.
But in the “fold” version, this token is swallowed and lost.

To see this, compare the two implementations:

```fsharp
// verbose set
CommandLineV2.parseCommandLine ["/o"; "/v"]

// verbose not set!
CommandLineV3.parseCommandLine ["/o"; "/v"]
```

Fixing this would take even more work. Again, this argues for the second (recursive) implementation as the easiest to debug and maintain.
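
For the curious, here is a sketch of what that extra work might look like, under a hypothetical module name, `CommandLineV3Fixed`. When the token after `/o` is not a valid order, `parseOrderBy` hands it back to `parseTopLevel` instead of dropping it, and the public function also reports a trailing `/o` with nothing after it. Notice that the two parse functions are now coupled, which is exactly the kind of added complexity that makes the recursive version easier to follow.

```fsharp
// Hypothetical variation on CommandLineV3, for illustration only
module CommandLineV3Fixed =
    type OrderByOption = OrderBySize | OrderByName
    type SubdirectoryOption = IncludeSubdirectories | ExcludeSubdirectories
    type VerboseOption = VerboseOutput | TerseOutput
    type CommandLineOptions = {
        verbose: VerboseOption
        subdirectories: SubdirectoryOption
        orderby: OrderByOption
        }
    type ParseMode = TopLevel | OrderBy
    type FoldState = { options: CommandLineOptions; parseMode: ParseMode }

    let parseTopLevel arg optionsSoFar =
        match arg with
        | "/v" -> { options = { optionsSoFar with verbose = VerboseOutput }; parseMode = TopLevel }
        | "/s" -> { options = { optionsSoFar with subdirectories = IncludeSubdirectories }; parseMode = TopLevel }
        | "/o" -> { options = optionsSoFar; parseMode = OrderBy }
        | x ->
            eprintfn "Option '%s' is unrecognized" x
            { options = optionsSoFar; parseMode = TopLevel }

    let parseOrderBy arg optionsSoFar =
        match arg with
        | "S" -> { options = { optionsSoFar with orderby = OrderBySize }; parseMode = TopLevel }
        | "N" -> { options = { optionsSoFar with orderby = OrderByName }; parseMode = TopLevel }
        | other ->
            eprintfn "OrderBy needs a second argument"
            // the change: re-parse the token as a top-level option instead of dropping it
            parseTopLevel other optionsSoFar

    let foldFunction (state: FoldState) element =
        match state.parseMode with
        | TopLevel -> parseTopLevel element state.options
        | OrderBy -> parseOrderBy element state.options

    let parseCommandLine args =
        let defaultOptions =
            { verbose = TerseOutput; subdirectories = ExcludeSubdirectories; orderby = OrderByName }
        let finalState =
            args |> List.fold foldFunction { options = defaultOptions; parseMode = TopLevel }
        // the end of the input needs handling too: "/o" may be the last token
        if finalState.parseMode = OrderBy then
            eprintfn "OrderBy needs a second argument"
        finalState.options
```

With this change, `CommandLineV3Fixed.parseCommandLine ["/o"; "/v"]` reports the missing order and still sets `verbose` to `VerboseOutput`, matching the behavior of `CommandLineV2`.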

Summary

In this post we’ve seen how to apply pattern matching to a real-world example.

More importantly, we’ve seen how easy it is to create a properly designed internal model for even the smallest domain, and how that internal model provides more type safety and documentation than primitive types such as string and bool.

In the next example, we’ll do even more pattern matching!