8.11 Generalized algebraic datatypes

(Introduced in OCaml 4.00)

constr-decl::=
constr-name : [ constr-args -> ] typexpr
type-param::=
[variance] _

Generalized algebraic datatypes, or GADTs, extend usual sum types intwo ways: constraints on type parameters may change depending on thevalue constructor, and some type variables may be existentiallyquantified.Adding constraints is done by giving an explicit return type(the rightmost typexpr in the above syntax), where type parametersare instantiated.This return type must use the same type constructor as the type beingdefined, and have the same number of parameters.Variables are made existential when they appear inside a constructor’sargument, but not in its return type.

Since the use of a return type often eliminates the need to name typeparameters in the left-hand side of a type definition, one can replacethem with anonymous types _ in that case.

The constraints associated to each constructor can be recoveredthrough pattern-matching.Namely, if the type of the scrutinee of a pattern-matching containsa locally abstract type, this type can be refined according to theconstructor used.These extra constraints are only valid inside the corresponding branchof the pattern-matching.If a constructor has some existential variables, fresh locallyabstract types are generated, and they must not escape thescope of this branch.

Recursive functions

Here is a concrete example:

  1. type _ term =
  2. | Int : int -> int term
  3. | Add : (int -> int -> int) term
  4. | App : ('b -> 'a) term * 'b term -> 'a term
  5. let rec eval : type a. a term -> a = function
  6. | Int n -> n (* a = int *)
  7. | Add -> (fun x y -> x+y) (* a = int -> int -> int *)
  8. | App(f,x) -> (eval f) (eval x)
  9. (* eval called at types (b->a) and b for fresh b *)
  10.  
  1. let two = eval (App (App (Add, Int 1), Int 1))
  2. val two : int = 2

It is important to remark that the function eval is using thepolymorphic syntax for locally abstract types. When defining a recursivefunction that manipulates a GADT, explicit polymorphic recursion shouldgenerally be used. For instance, the following definition fails with atype error:

  1. let rec eval (type a) : a term -> a = function
  2. | Int n -> n
  3. | Add -> (fun x y -> x+y)
  4. | App(f,x) -> (eval f) (eval x)
  5. Error: This expression has type ($App_'b -> a) term
  6. but an expression was expected of type 'a
  7. The type constructor $App_'b would escape its scope

In absence of an explicit polymorphic annotation, a monomorphic typeis inferred for the recursive function. If a recursive call occursinside the function definition at a type that involves an existentialGADT type variable, this variable flows to the type of the recursivefunction, and thus escapes its scope. In the above example, this happensin the branch App(f,x) when eval is called with f as an argument.In this branch, the type of f is ($App 'b-> a). The prefix $ in$App 'b denotes an existential type named by the compiler(see 8.11). Since the type of eval is'a term -> 'a, the call eval f makes the existential type $App_'bflow to the type variable 'a and escape its scope. This triggers theabove error.

Type inference

Type inference for GADTs is notoriously hard.This is due to the fact some types may become ambiguous when escapingfrom a branch.For instance, in the Int case above, n could have either type intor a, and they are not equivalent outside of that branch.As a first approximation, type inference will always work if apattern-matching is annotated with types containing no free typevariables (both on the scrutinee and the return type).This is the case in the above example, thanks to the type annotationcontaining only locally abstract types.

In practice, type inference is a bit more clever than that: typeannotations do not need to be immediately on the pattern-matching, andthe types do not have to be always closed.As a result, it is usually enough to only annotate functions, as inthe example above. Type annotations arepropagated in two ways: for the scrutinee, they follow the flow oftype inference, in a way similar to polymorphic methods; for thereturn type, they follow the structure of the program, they are spliton functions, propagated to all branches of a pattern matching,and go through tuples, records, and sum types.Moreover, the notion of ambiguity used is stronger: a type is onlyseen as ambiguous if it was mixed with incompatible types (equated byconstraints), without type annotations between them.For instance, the following program types correctly.

  1. let rec sum : type a. a term -> _ = fun x ->
  2. let y =
  3. match x with
  4. | Int n -> n
  5. | Add -> 0
  6. | App(f,x) -> sum f + sum x
  7. in y + 1
  8. val sum : 'a term -> int = <fun>

Here the return type int is never mixed with a, so it is seen asnon-ambiguous, and can be inferred.When using such partial type annotations we strongly suggestspecifying the -principal mode, to check that inference isprincipal.

The exhaustiveness check is aware of GADT constraints, and canautomatically infer that some cases cannot happen.For instance, the following pattern matching is correctly seen asexhaustive (the Add case cannot happen).

  1. let get_int : int term -> int = function
  2. | Int n -> n
  3. | App(_,_) -> 0
  4.  
Refutation cases

(Introduced in OCaml 4.03)

Usually, the exhaustiveness check only tries to check whether thecases omitted from the pattern matching are typable or not.However, you can force it to try harder by adding refutation cases:

matching-case::=pattern [when expr] -> expr
pattern -> .

In presence of a refutation case, the exhaustiveness check will firstcompute the intersection of the pattern with the complement of thecases preceding it. It then checks whether the resulting patterns canreally match any concrete values by trying to type-check them.Wild cards in the generated patterns are handled in a special way: iftheir type is a variant type with only GADT constructors, then thepattern is split into the different constructors, in order to check whetherany of them is possible (this splitting is not done for arguments of theseconstructors, to avoid non-termination). We also split tuples andvariant types with only one case, since they may contain GADTs inside.For instance, the following code is deemed exhaustive:

  1. type _ t =
  2. | Int : int t
  3. | Bool : bool t
  4. let deep : (char t * int) option -> char = function
  5. | None -> 'c'
  6. | _ -> .
  7.  

Namely, the inferred remaining case is Some , which is split intoSome (Int, ) and Some (Bool, _), which are both untypable becausedeep expects a non-existing char t as the first element of the tuple.Note that the refutation case could be omitted here, because it isautomatically added when there is only one case in the patternmatching.

Another addition is that the redundancy check is now aware of GADTs: acase will be detected as redundant if it could be replaced by arefutation case using the same pattern.

Advanced examples

The term type we have defined above is an indexed type, wherea type parameter reflects a property of the value contents.Another use of GADTs is singleton types, where a GADT valuerepresents exactly one type. This value can be used as runtimerepresentation for this type, and a function receiving it can have apolytypic behavior.

Here is an example of a polymorphic function that takes theruntime representation of some type t and a value of the same type,then pretty-prints the value as a string:

  1. type _ typ =
  2. | Int : int typ
  3. | String : string typ
  4. | Pair : 'a typ * 'b typ -> ('a * 'b) typ
  5. let rec to_string: type t. t typ -> t -> string =
  6. fun t x ->
  7. match t with
  8. | Int -> Int.to_string x
  9. | String -> Printf.sprintf "%S" x
  10. | Pair(t1,t2) ->
  11. let (x1, x2) = x in
  12. Printf.sprintf "(%s,%s)" (to_string t1 x1) (to_string t2 x2)
  13.  

Another frequent application of GADTs is equality witnesses.

  1. type (_,_) eq = Eq : ('a,'a) eq
  2. let cast : type a b. (a,b) eq -> a -> b = fun Eq x -> x
  3.  

Here type eq has only one constructor, and by matching on it oneadds a local constraint allowing the conversion between a and b.By building such equality witnesses, one can make equal types whichare syntactically different.

Here is an example using both singleton types and equality witnessesto implement dynamic types.

  1. let rec eq_type : type a b. a typ -> b typ -> (a,b) eq option =
  2. fun a b ->
  3. match a, b with
  4. | Int, Int -> Some Eq
  5. | String, String -> Some Eq
  6. | Pair(a1,a2), Pair(b1,b2) ->
  7. begin match eq_type a1 b1, eq_type a2 b2 with
  8. | Some Eq, Some Eq -> Some Eq
  9. | _ -> None
  10. end
  11. | _ -> None
  12. type dyn = Dyn : 'a typ * 'a -> dyn
  13. let get_dyn : type a. a typ -> dyn -> a option =
  14. fun a (Dyn(b,x)) ->
  15. match eq_type a b with
  16. | None -> None
  17. | Some Eq -> Some x
  18.  
Existential type names in error messages

(Updated in OCaml 4.03.0)

The typing of pattern matching in presence of GADT can generate manyexistential types. When necessary, error messages refer to theseexistential types using compiler-generated names. Currently, thecompiler generates these names according to the following nomenclature:

  • First, types whose name starts with a $ are existentials.
  • $Constr_'a denotes an existential type introduced for the typevariable 'a of the GADT constructor Constr:
  1. type any = Any : 'name -> any
  2. let escape (Any x) = x
  3. Error: This expression has type $Any_'name
  4. but an expression was expected of type 'a
  5. The type constructor $Any_'name would escape its scope
  • $Constr denotes an existential type introduced for an anonymous type variable in the GADT constructor Constr:
  1. type any = Any : _ -> any
  2. let escape (Any x) = x
  3. Error: This expression has type $Any but an expression was expected of type
  4. 'a
  5. The type constructor $Any would escape its scope
  • $'a if the existential variable was unified with the type variable 'a during typing:
  1. type ('arg,'result,'aux) fn =
  2. | Fun: ('a ->'b) -> ('a,'b,unit) fn
  3. | Mem1: ('a ->'b) * 'a * 'b -> ('a, 'b, 'a * 'b) fn
  4. let apply: ('arg,'result, _ ) fn -> 'arg -> 'result = fun f x ->
  5. match f with
  6. | Fun f -> f x
  7. | Mem1 (f,y,fy) -> if x = y then fy else f x
  8. Error: This pattern matches values of type
  9. ($'arg, 'result, $'arg * 'result) fn
  10. but a pattern was expected which matches values of type
  11. ($'arg, 'result, unit) fn
  12. The type constructor $'arg would escape its scope
  • $n (n a number) is an internally generated existential which could not be named using one of the previous schemes.As shown by the last item, the current behavior is imperfectand may be improved in future versions.
Equations on non-local abstract types

(Introduced in OCaml4.04)

GADT pattern-matching may also add type equations to non-localabstract types. The behaviour is the same as with local abstracttypes. Reusing the above eq type, one can write:

  1. module M : sig type t val x : t val e : (t,int) eq end = struct
  2. type t = int
  3. let x = 33
  4. let e = Eq
  5. end
  6. let x : int = let Eq = M.e in M.x
  7.  

Of course, not all abstract types can be refined, as this wouldcontradict the exhaustiveness check. Namely, builtin types (thosedefined by the compiler itself, such as int or array), andabstract types defined by the local module, are non-instantiable, andas such cause a type error rather than introduce an equation.