Haskell type-level functions shenanigansAn introduction to some useful language extensions
Last Wednesday, on Twitter, @TechnoEmpress posted a purposefully horrible solution to the following problem: given two types
b, how does one test that type
a is contained in type
b? Can we write a function
contains such that:
The answer is: of course we can; the real question is how. Looking into it, I found several possible implementations, that relied on parts of the language I wasn’t very familiar with. I thought it’d be interesting to go over them, and highlight why each extension is required, step by step. So… let’s go!
The easiest approach relies on a
base library: Data.Typeable, which provides functions that, to a given runtime value, associate a corresponding
TypeRep value that represents the value’s type. Using
typeRepArgs on that representation, one can obtain the list of the type arguments of a given type: from the representation of
Either String Int, we can get
[String, Int]. A first approach could therefore look like this:
There are several ways to improve on this: first, we can use
typeRep instead of
typeOf: its argument is a proxy type, such as
Proxy from Data.Proxy: a type that does not contain any data and is only used as a way to convey type information. This allows us to call
contains even if we do not have actual values of our types.
But we can go one step further: what if our function had no runtime argument? After all, while we need a proxy to call
typeRep, we can create it in the function itself, it doesn’t need to be passed as an argument. Ideally, we’d like to define our function as such:
However… This definition doesn’t compile: the compiler will not be able to identify what the types of
b are from the arguments to the function, given that there’s no longer any argument! Calls to that function will therefore be ambiguous, and the compiler rejects it. The solution to this problem is provided by our first language extension: AllowAmbiguousTypes. It simply disables that check in the compiler. We can now implement our function:
But that still does not compile! The problem is that the bindings in our
where block are independent from our function’s signature. So when we say
Proxy a, it could be any
a: it is understood to be generic, as it would be in a top-level function’s signature. What we would need is a way to tell the compiler that the
a in the where block is the same as the one in the function’s signature, that they’re in the same scope. Introducing ScopedTypeVariables; thanks to it, type variables introduced with the
forall syntax will, well, be scoped: the compiler will now know that the
a in that
Proxy a is indeed the same one as in the function’s signature.
But now, how do we actually call that function? We have disabled the ambiguity check, but we need to resolve that ambiguity at the call site. And, of course, one more extension solves this problem: thanks to TypeApplications, we can use the
@Type syntax to specify what
You can find the code of this version in this gist.
But… we can do better. Type information is only relevant at compilation time, and is (mostly) elided at runtime. It should be possible to solve this entirely at compile time… With that, we could also extend our function to work on any type, not just the ones that define a
If we are to write a solution at compile-time, we can’t write it as a function. Our input is going to be a type but… our output is also going to be a type. There’s no way to write a mapping from type to type in Haskell 98 (AFAIK); to do so, we’ll need our first language extension.
TypeFamilies allows us to add type mappings to regular typeclasses. For instance, consider the class
IsList (defined as part of another language extension): types that are not generic over their content can still be valid instances of
IsList, and the typeclass therefore includes a type mapping, so that each instance can specify what the type of the contained items is:
There’s several different ways to declare a type family; one that’s going to be useful for a us is a top-level closed type family, in which all instances are given together alongside the declaration. There’s additionally one very interesting property of those closed families, which solves the problem of overlapping instances:
The advantage of a closed family is that its equations are tried in order, similar to a term-level function definition.
But first: how are we going to represent our result?
We could use some arbitrary new abstract types to represent truthiness:
but there’s of course a better solution. But first, we need to talk about
parallel universes kinds. Kinds are, in essence, the type of types.
Int has kind
Maybe requires a type argument to yield a type, such as in
Maybe Int, and has therefore kind
Type -> Type. DataKinds is an extension that allows to “lift things one level up”. Types are allowed to be used at the kind level, which means that their constructors are allowed to be used at the type level. Which means that here, we can use good ol’
Bool type as a kind, and use
False as “types”. Syntax-wise, we have to prefix a constructor with a single quote to use it at the type level.
With this, and with the fact that patterns are tried in order, we can write our first attempt:
What we have here is a very naive version of type equality: if the compiler can match the first line, it will associate the “type”
'True to the type application of contains, otherwise fall back to the second instance, which is defined for all possible types. The only thing left to do is for us to handle recursion: we’ll need to decompose more complex types and recurse over them:
Contains Int (Maybe Int) = Contains Int Int = 'True. We don’t match the recursive case first to avoid decomposing our type when we have a match:
Contains [IO Int] [IO Int] should return
True without matching the recursive case.
But… we’re not out of the woods yet:
The problem lies in our recursive match:
f x. In the case of
Either Int Char,
Char, a type, and
Either Int, a type function of kind
Type -> Type. In our recursion, we only check
x, but we would need to decompose
A naive solution would be to manually decompose up to as many arguments as we deem reasonable, but that’s very verbose, not complete, and quite unsatisfying:
A better solution would be, in the case of
f x, to recurse on both parts: is
a contained in
f, or is
a contained in
x? This poses two problems; the first is: how can we recurse on
The problem we face is that we have explicitly declared the second argument of
Contains to be of kind
Type. To be able to recurse on
f, we’d need a similar type family, in which the second argument would be of kind
Type -> Type. And, in the recursion case of that second type family, we’d need to delegate the recursion to a third type family for types of kind
Type -> Type -> Type… and so on.
What we need is instead to define our family so that it is generic over kinds. That’s what we can do with PolyKinds: we can now declare that the kind of our
b parameter is generic: any kind
k will do, which allows us to recurse on
The only thing that’s left is to combine our two recursive calls: what we’ll need is a boolean “or” at the type-level! It’s thankfully pretty straightforward, now, with all our extensions:
Putting it together:
But as you might have guessed, this doesn’t compile. GHC really wants to guarantee that instance resolution terminates in finite time; for that reason, it forbids several patterns that it deems “dangerous”, such as nested type family applications, which is precisely what we’re trying to do here! A dangerous extension, UndecidableInstances, disables some of those checks at our own risk, and allows our solution to compile.
While our solution is now correct, we can make it better. One last thing we can do is use an existing implementation of type-level boolean
or, rather than reimplementing our own: there is one already in Data.Type.Bool. You’ll notice however that it is defined as an operator! As you will have guessed, this is what is allowed by TypeOperators.
With that, we can finally settle on one final version:
Coming to terms with our solution
For convenience, and without any extension, we can define a simple typeclass to bring our type-level booleans back to term-level:
We can now finally test it:
And… that’s it! You can find this version of the code in this other gist. This was a silly but fun exercise, I learned a ton, hopefully this will be useful to you too!
Happy new year, and all that sort of things. :)