Algebraic Data Types and Pattern Matching

What may not be clear to readers in a lot of the previous discussions is the use of Algebraic Data Types in combination with patterm matching to define functions. It’s really quite simple, conceptually (implementation may be a different matter, we’ll see.) Here’s an example we’ve seen before, I’ll just be more descriptive this time:

This declaration achieves two things:

  1. It defines a type  list(t)  (list of t) where  t is a type variable that can stand for any type.
  2. It creates two constructor functions, called  cons and null, that accept arguments of the specified types (none in the case of null,) and return data of type list(t).

Reading it aloud, it says define a type list of some unspecified type t which is either a cons of a  t and a  list of t, or a null.

Once defined, we use these type costructors to create lists of a concrete type:

After the above definition, a has type list(bool). The following, on the other hand, would fail to type check:

It fails because:

  • cons('x', null)  is of type list(char) .
  • The outer cons expects arguments  <t>  and list(<t>) , but it gets  bool  and list(char) .
  • The outer cons cannot reconcile  <t> = bool  with  <t> = char  so the type check fails.

That’s all very nice, but how can we use Algeraic Data Types? It turns out that they become very useful in combination with pattern matching in case statements. Consider:

In that case statement, a must match either  cons(head, tail)  or null. Now if it matches cons(head, tail), the (normal) variables  head and  tail are automatically created and instantiated as the relevant components of the  cons in the body of the case statement. This kind of behaviour is so commonplace in languages like ML that special syntax for functions has evolved, which I’m borrowing for F♮:

This version of length, instead of having a single formal argument list outside the body, has alternative formal argument lists inside the body, with mini bodies of their own, just like a case statement. It’s functionally identical to the previous version, but a good deal more concise and readable.

One thing to bear in mind, in both versions, is that  length  has type list(t) int. That is to say, each of the formal argument lists inside the body of a function, or the alternative cases in a case statement, must agree in the number and types of the arguments, and must return the same type of result.

Now, it becomes obvious that, just as we can rewrite a  let to be a lambda, this case statement is in fact just syntactic sugar for an anonymous function call. The earlier definition of  length  above, using a case statement, can be re-written as:

so we get case statements: powerful, pattern matching ones, allowing more than one argument, for free if we take this approach.

length is polymorphic. It does not do anything to the value of head so does not care about its type. Therefore the type of length, namely  list(t) int actually contains a type variable t.

Here’s a function that does care about the type of the list:

Assuming strlen has type string int, that would constrain  sum_strlen to have type list(string) int. Of course that’s a rather silly function, we would be better passing in a function like this:

That would give sum a type:

and we could call it like:

or even, with a Curried application:

This is starting to look like map-reduce. More on that later.

Real-World Applications

Algebraic Data Types really come in to their own when it comes to tree walking. Consider the following definitions:

Given that, we can write an evaluator for arithmetic expressions very easily:

So eval has type expr(int) int . We can call it like:

to get 17.

Pattern matching not only covers variables and type constructors, it can also cope with constants. For example here’s a definition offactorial:

For this and other examples to work, the cases must be checked in order and the first case that matches is selected. so the argument to  factorial  would only match  n  if it failed to match .

As another example, here’s member:

Here I’m using F♮’s built-in list type constructors @, (pronounced cons,) and  [] (pronounced null,) and a wildcard  _ to indicate a don’t care variable that always unifies, but apart from that it’s just the same as the  cons and  null constructors. Anyway, the cases say:

  • member(item, list)  is  true if  item is at the head of the list.
  • member(item, list) is  true if item is a member of the tail of the list.
  • item is not a member of the empty list.

Problems and Solutions

You’ve probably realised that given a type like  list(t) above, it’s not possible to directly create lists of mixed type. That is because it is usually a very bad idea to do so. However if you need to do so, you can get around the restriction without breaking any rules, as follows:

  1. Create a container type for your mixed types:
  2. Create lists of that type:

After the above definition, a has type list(either(string, int)), and you can’t get at the data without knowing its type:

Here,  sum_numbers has type [either(<t>, int)] int. e.g. it doesn’t care what type  first holds. We could have written  first(s) instead of first(_), but the use of a wildcard  _explicitly says we don’t care, stops any potential warnings about unused variables, and is more efficient.

Leave a Reply

Your email address will not be published. Required fields are marked *