Haskell type-level functions shenanigans
An introduction to some useful language extensions2020-12-27
Last Wednesday, on Twitter, @TechnoEmpress posted a purposefully horrible solution to the following problem: given two types a
and b
, how does one test that type a
is contained in type b
? Can we write a function contains
such that:
The answer is: of course we can; the real question is how. Looking into it, I found several possible implementations, that relied on parts of the language I wasn’t very familiar with. I thought it’d be interesting to go over them, and highlight why each extension is required, step by step. So… let’s go!
Runtime solution
The easiest approach relies on a base
library: Data.Typeable, which provides functions that, to a given runtime value, associate a corresponding TypeRep
value that represents the value’s type. Using typeRepArgs
on that representation, one can obtain the list of the type arguments of a given type: from the representation of Either String Int
, we can get [String, Int]
. A first approach could therefore look like this:
There are several ways to improve on this: first, we can use typeRep
instead of typeOf
: its argument is a proxy type, such as Proxy
from Data.Proxy: a type that does not contain any data and is only used as a way to convey type information. This allows us to call contains
even if we do not have actual values of our types.
> AllowAmbiguousTypes
But we can go one step further: what if our function had no runtime argument? After all, while we need a proxy to call typeRep
, we can create it in the function itself, it doesn’t need to be passed as an argument. Ideally, we’d like to define our function as such:
However… This definition doesn’t compile: the compiler will not be able to identify what the types of a
and b
are from the arguments to the function, given that there’s no longer any argument! Calls to that function will therefore be ambiguous, and the compiler rejects it. The solution to this problem is provided by our first language extension: AllowAmbiguousTypes. It simply disables that check in the compiler. We can now implement our function:
> ScopedTypeVariables
But that still does not compile! The problem is that the bindings in our where
block are independent from our function’s signature. So when we say Proxy a
, it could be any a
: it is understood to be generic, as it would be in a top-level function’s signature. What we would need is a way to tell the compiler that the a
in the where block is the same as the one in the function’s signature, that they’re in the same scope. Introducing ScopedTypeVariables; thanks to it, type variables introduced with the forall
syntax will, well, be scoped: the compiler will now know that the a
in that Proxy a
is indeed the same one as in the function’s signature.
> TypeApplications
But now, how do we actually call that function? We have disabled the ambiguity check, but we need to resolve that ambiguity at the call site. And, of course, one more extension solves this problem: thanks to TypeApplications, we can use the @Type
syntax to specify what a
and b
are.
You can find the code of this version in this gist.
But… we can do better. Type information is only relevant at compilation time, and is (mostly) elided at runtime. It should be possible to solve this entirely at compile time… With that, we could also extend our function to work on any type, not just the ones that define a Typeable
instance.
Compile-time solution
If we are to write a solution at compile-time, we can’t write it as a function. Our input is going to be a type but… our output is also going to be a type. There’s no way to write a mapping from type to type in Haskell 98 (AFAIK); to do so, we’ll need our first language extension.
> TypeFamilies
TypeFamilies allows us to add type mappings to regular typeclasses. For instance, consider the class IsList
(defined as part of another language extension): types that are not generic over their content can still be valid instances of IsList
, and the typeclass therefore includes a type mapping, so that each instance can specify what the type of the contained items is:
There’s several different ways to declare a type family; one that’s going to be useful for a us is a top-level closed type family, in which all instances are given together alongside the declaration. There’s additionally one very interesting property of those closed families, which solves the problem of overlapping instances:
The advantage of a closed family is that its equations are tried in order, similar to a term-level function definition.
But first: how are we going to represent our result?
> DataKinds
We could use some arbitrary new abstract types to represent truthiness:
but there’s of course a better solution. But first, we need to talk about parallel universes kinds. Kinds are, in essence, the type of types. Int
has kind Type
; Maybe
requires a type argument to yield a type, such as in Maybe Int
, and has therefore kind Type -> Type
. DataKinds is an extension that allows to “lift things one level up”. Types are allowed to be used at the kind level, which means that their constructors are allowed to be used at the type level. Which means that here, we can use good ol’ Bool
type as a kind, and use True
and False
as “types”. Syntax-wise, we have to prefix a constructor with a single quote to use it at the type level.
With this, and with the fact that patterns are tried in order, we can write our first attempt:
What we have here is a very naive version of type equality: if the compiler can match the first line, it will associate the “type” 'True
to the type application of contains, otherwise fall back to the second instance, which is defined for all possible types. The only thing left to do is for us to handle recursion: we’ll need to decompose more complex types and recurse over them:
Recursively: Contains Int (Maybe Int) = Contains Int Int = 'True
. We don’t match the recursive case first to avoid decomposing our type when we have a match: Contains [IO Int] [IO Int]
should return True
without matching the recursive case.
But… we’re not out of the woods yet:
The problem lies in our recursive match: f x
. In the case of Either Int Char
, x
is Char
, a type, and f
is Either Int
, a type function of kind Type -> Type
. In our recursion, we only check x
, but we would need to decompose f
too…
A naive solution would be to manually decompose up to as many arguments as we deem reasonable, but that’s very verbose, not complete, and quite unsatisfying:
A better solution would be, in the case of f x
, to recurse on both parts: is a
contained in f
, or is a
contained in x
? This poses two problems; the first is: how can we recurse on f
?
> PolyKinds
The problem we face is that we have explicitly declared the second argument of Contains
to be of kind Type
. To be able to recurse on f
, we’d need a similar type family, in which the second argument would be of kind Type -> Type
. And, in the recursion case of that second type family, we’d need to delegate the recursion to a third type family for types of kind Type -> Type -> Type
… and so on.
What we need is instead to define our family so that it is generic over kinds. That’s what we can do with PolyKinds: we can now declare that the kind of our b
parameter is generic: any kind k
will do, which allows us to recurse on f
too:
The only thing that’s left is to combine our two recursive calls: what we’ll need is a boolean “or” at the type-level! It’s thankfully pretty straightforward, now, with all our extensions:
> UndecidableInstances
Putting it together:
But as you might have guessed, this doesn’t compile. GHC really wants to guarantee that instance resolution terminates in finite time; for that reason, it forbids several patterns that it deems “dangerous”, such as nested type family applications, which is precisely what we’re trying to do here! A dangerous extension, UndecidableInstances, disables some of those checks at our own risk, and allows our solution to compile.
> TypeOperators
While our solution is now correct, we can make it better. One last thing we can do is use an existing implementation of type-level boolean or
, rather than reimplementing our own: there is one already in Data.Type.Bool. You’ll notice however that it is defined as an operator! As you will have guessed, this is what is allowed by TypeOperators.
With that, we can finally settle on one final version:
Coming to terms with our solution
For convenience, and without any extension, we can define a simple typeclass to bring our type-level booleans back to term-level:
We can now finally test it:
And… that’s it! You can find this version of the code in this other gist. This was a silly but fun exercise, I learned a ton, hopefully this will be useful to you too!
Happy new year, and all that sort of things. :)