Haskell type-level functions shenanigans
An introduction to some useful language extensions2020-12-27
Last Wednesday, on Twitter, @TechnoEmpress posted a purposefully horrible solution to the following problem: given two types a
and b
, how does one test that type a
is contained in type b
? Can we write a function contains
such that:
contains @Int @(Either (Maybe [IO Int]) String) => True
contains @[IO Int] @(Either (Maybe [IO Int]) String) => True
contains @(Maybe Int) @(Either (Maybe [IO Int]) String) => False
The answer is: of course we can; the real question is how. Looking into it, I found several possible implementations, that relied on parts of the language I wasn’t very familiar with. I thought it’d be interesting to go over them, and highlight why each extension is required, step by step. So… let’s go!
Runtime solution
The easiest approach relies on a base
library: Data.Typeable, which provides functions that, to a given runtime value, associate a corresponding TypeRep
value that represents the value’s type. Using typeRepArgs
on that representation, one can obtain the list of the type arguments of a given type: from the representation of Either String Int
, we can get [String, Int]
. A first approach could therefore look like this:
contains :: (Typeable a, Typeable b) => a -> b -> Bool
contains a b = containsA typeB
where
typeA = typeOf a
typeB = typeOf b
containsA x = x == typeA || any containsA (typeRepArgs x)
There are several ways to improve on this: first, we can use typeRep
instead of typeOf
: its argument is a proxy type, such as Proxy
from Data.Proxy: a type that does not contain any data and is only used as a way to convey type information. This allows us to call contains
even if we do not have actual values of our types.
contains
:: (Typeable a, Typeable b) => Proxy a -> Proxy b -> Bool
contains a b = containsA typeB
where
typeA = typeRep a
typeB = typeRep b
containsA x = x == typeA || any containsA (typeRepArgs x)
> contains (Proxy :: Proxy Int) (Proxy :: Proxy (Maybe Int))
True
> contains (Proxy :: Proxy Char) (Proxy :: Proxy (Maybe Int))
False
> AllowAmbiguousTypes
But we can go one step further: what if our function had no runtime argument? After all, while we need a proxy to call typeRep
, we can create it in the function itself, it doesn’t need to be passed as an argument. Ideally, we’d like to define our function as such:
contains :: (Typeable a, Typeable b) => Bool
However… This definition doesn’t compile: the compiler will not be able to identify what the types of a
and b
are from the arguments to the function, given that there’s no longer any argument! Calls to that function will therefore be ambiguous, and the compiler rejects it. The solution to this problem is provided by our first language extension: AllowAmbiguousTypes. It simply disables that check in the compiler. We can now implement our function:
contains :: (Typeable a, Typeable b) => Bool
contains = containsA typeB
where
typeA = typeRep (Proxy :: Proxy a)
typeB = typeRep (Proxy :: Proxy b)
containsA x = x == typeA || any containsA (typeRepArgs x)
> ScopedTypeVariables
But that still does not compile! The problem is that the bindings in our where
block are independent from our function’s signature. So when we say Proxy a
, it could be any a
: it is understood to be generic, as it would be in a top-level function’s signature. What we would need is a way to tell the compiler that the a
in the where block is the same as the one in the function’s signature, that they’re in the same scope. Introducing ScopedTypeVariables; thanks to it, type variables introduced with the forall
syntax will, well, be scoped: the compiler will now know that the a
in that Proxy a
is indeed the same one as in the function’s signature.
contains :: forall a b. (Typeable a, Typeable b) => Bool
contains = containsA typeB
where
typeA = typeRep (Proxy :: Proxy a)
typeB = typeRep (Proxy :: Proxy b)
containsA x = x == typeA || any containsA (typeRepArgs x)
> TypeApplications
But now, how do we actually call that function? We have disabled the ambiguity check, but we need to resolve that ambiguity at the call site. And, of course, one more extension solves this problem: thanks to TypeApplications, we can use the @Type
syntax to specify what a
and b
are.
> contains @Int @(Either (Maybe [IO Int]) String)
True
> contains @[IO Int] @(Either (Maybe [IO Int]) String)
True
> contains @(Maybe Int) @(Either (Maybe [IO Int]) String)p
False
You can find the code of this version in this gist.
But… we can do better. Type information is only relevant at compilation time, and is (mostly) elided at runtime. It should be possible to solve this entirely at compile time… With that, we could also extend our function to work on any type, not just the ones that define a Typeable
instance.
Compile-time solution
If we are to write a solution at compile-time, we can’t write it as a function. Our input is going to be a type but… our output is also going to be a type. There’s no way to write a mapping from type to type in Haskell 98 (AFAIK); to do so, we’ll need our first language extension.
> TypeFamilies
TypeFamilies allows us to add type mappings to regular typeclasses. For instance, consider the class IsList
(defined as part of another language extension): types that are not generic over their content can still be valid instances of IsList
, and the typeclass therefore includes a type mapping, so that each instance can specify what the type of the contained items is:
class IsList l where
type Item l
fromList :: [Item l] -> l
toList :: l -> [Item l]
There’s several different ways to declare a type family; one that’s going to be useful for a us is a top-level closed type family, in which all instances are given together alongside the declaration. There’s additionally one very interesting property of those closed families, which solves the problem of overlapping instances:
The advantage of a closed family is that its equations are tried in order, similar to a term-level function definition.
But first: how are we going to represent our result?
> DataKinds
We could use some arbitrary new abstract types to represent truthiness:
data BTrue
data BFalse
type family Contains a b where
Contains Int (Maybe Int) = BTrue
Contains Char (Maybe Int) = BFalse
-- TODO: make this more generic
but there’s of course a better solution. But first, we need to talk about parallel universes kinds. Kinds are, in essence, the type of types. Int
has kind Type
; Maybe
requires a type argument to yield a type, such as in Maybe Int
, and has therefore kind Type -> Type
. DataKinds is an extension that allows to “lift things one level up”. Types are allowed to be used at the kind level, which means that their constructors are allowed to be used at the type level. Which means that here, we can use good ol’ Bool
type as a kind, and use True
and False
as “types”. Syntax-wise, we have to prefix a constructor with a single quote to use it at the type level.
With this, and with the fact that patterns are tried in order, we can write our first attempt:
type family Contains (a :: Type) (b :: Type) :: Bool where
Contains a a = 'True
Contains _ _ = 'False
> :kind! Contains Int Int
'True
> :kind! Contains Char Int
'False
What we have here is a very naive version of type equality: if the compiler can match the first line, it will associate the “type” 'True
to the type application of contains, otherwise fall back to the second instance, which is defined for all possible types. The only thing left to do is for us to handle recursion: we’ll need to decompose more complex types and recurse over them:
type family Contains (a :: Type) (b :: Type) :: Bool where
Contains a a = 'True
Contains a (f x) = Contains a x
Contains _ _ = 'False
> :kind! Contains Int (Maybe Int)
'True
> :kind! Contains Int (Maybe (Maybe Int))
'True
> :kind! Contains Char (Maybe Int)
'False
Recursively: Contains Int (Maybe Int) = Contains Int Int = 'True
. We don’t match the recursive case first to avoid decomposing our type when we have a match: Contains [IO Int] [IO Int]
should return True
without matching the recursive case.
But… we’re not out of the woods yet:
> :kind! Contains Int (Int, Char)
'False
> :kind! Contains Int (Either Int Char)
'False
The problem lies in our recursive match: f x
. In the case of Either Int Char
, x
is Char
, a type, and f
is Either Int
, a type function of kind Type -> Type
. In our recursion, we only check x
, but we would need to decompose f
too…
A naive solution would be to manually decompose up to as many arguments as we deem reasonable, but that’s very verbose, not complete, and quite unsatisfying:
type family Contains (a :: Type) (b :: Type) :: Bool where
Contains a (_ a _ _ _) = 'True
Contains a (_ _ a _ _) = 'True
Contains a (_ _ _ a _) = 'True
Contains a (_ _ _ _ a) = 'True
Contains a (_ a _ _) = 'True
Contains a (_ _ a _) = 'True
Contains a (_ _ _ a) = 'True
Contains a (_ a _) = 'True
Contains a (_ _ a) = 'True
Contains a (_ a) = 'True
Contains a a = 'True
Contains _ _ = 'False
A better solution would be, in the case of f x
, to recurse on both parts: is a
contained in f
, or is a
contained in x
? This poses two problems; the first is: how can we recurse on f
?
> PolyKinds
The problem we face is that we have explicitly declared the second argument of Contains
to be of kind Type
. To be able to recurse on f
, we’d need a similar type family, in which the second argument would be of kind Type -> Type
. And, in the recursion case of that second type family, we’d need to delegate the recursion to a third type family for types of kind Type -> Type -> Type
… and so on.
What we need is instead to define our family so that it is generic over kinds. That’s what we can do with PolyKinds: we can now declare that the kind of our b
parameter is generic: any kind k
will do, which allows us to recurse on f
too:
type family Contains (a :: Type) (b :: k) :: Bool where
Contains a a = 'True
Contains a (f x) = Contains a f ??? Contains a x
Contains _ _ = 'False
The only thing that’s left is to combine our two recursive calls: what we’ll need is a boolean “or” at the type-level! It’s thankfully pretty straightforward, now, with all our extensions:
type family Or (a :: Bool) (b :: Bool) :: Bool where
Or 'True _ = 'True
Or 'False b = b
> UndecidableInstances
Putting it together:
type family Contains (a :: Type) (b :: k) :: Bool where
Contains a a = 'True
Contains a (f x) = Or (Contains a f) (Contains a x)
Contains _ _ = 'False
But as you might have guessed, this doesn’t compile. GHC really wants to guarantee that instance resolution terminates in finite time; for that reason, it forbids several patterns that it deems “dangerous”, such as nested type family applications, which is precisely what we’re trying to do here! A dangerous extension, UndecidableInstances, disables some of those checks at our own risk, and allows our solution to compile.
> TypeOperators
While our solution is now correct, we can make it better. One last thing we can do is use an existing implementation of type-level boolean or
, rather than reimplementing our own: there is one already in Data.Type.Bool. You’ll notice however that it is defined as an operator! As you will have guessed, this is what is allowed by TypeOperators.
With that, we can finally settle on one final version:
type family Contains (a :: Type) (b :: k) :: Bool where
Contains a a = 'True
Contains a (f x) = Contains a f || Contains a x
Contains _ _ = 'False
Coming to terms with our solution
For convenience, and without any extension, we can define a simple typeclass to bring our type-level booleans back to term-level:
class ToBool (b :: Bool) where
toBool :: Bool
instance ToBool 'True where toBool = True
instance ToBool 'False where toBool = False
We can now finally test it:
> toBool @(Contains Int (Either Int String))
True
> toBool @(Contains Int (Either Int))
True
And… that’s it! You can find this version of the code in this other gist. This was a silly but fun exercise, I learned a ton, hopefully this will be useful to you too!
Happy new year, and all that sort of things. :)