We use cookies and other tracking technologies to improve your browsing experience on our site, analyze site traffic, and understand where our audience is coming from. To find out more, please read our privacy policy.

By choosing 'I Accept', you consent to our use of cookies and other tracking technologies.

We use cookies and other tracking technologies to improve your browsing experience on our site, analyze site traffic, and understand where our audience is coming from. To find out more, please read our privacy policy.

By choosing 'I Accept', you consent to our use of cookies and other tracking technologies. Less

We use cookies and other tracking technologies... More

# Login or registerto boost this post!

Show some love to the author of this blog by giving their post some rocket fuel 🚀.

# Login to see the application

Engineers who find a new job through Blockchain Works average a 15% increase in salary 🚀 # TypeRep - Map step by step

Veronika Romashkina 11 July, 2018 | 14 min read

Recently, I have been working on a very interesting and sophisticated project called `typerep-map`. A lot of advanced features and tricks were used during the development process and I have discovered many amusing and new sides of Haskell. So, I decided to share the ideas, steps, issues, etc. in this blog post.

If you want to skip all the funny parts, here is the link to the code itself:

The basic idea behind `typerep-map` is to have a data structure like `Map`, but where types serve as keys, and values stored in the map are of the type specified in the corresponding key.

An example image of this structure:

``````Int: 42
Bool: True
``````

There can be only one key-value pair for each type.

And here is an example written in pseudo-code for better understanding:

``````insert (42 :: Int) empty = [(Int, 42)]
^     ^
key   value

insert (42 :: Int) (insert True empty) = [(Int, 42), (Bool, True)]
^     ^      ^     ^
key   value  key  value
``````

We also want our values to be indexed by a polymorphic type, but that will be explained later.

### Existing Solutions

There already exist some libraries that implement ideas similar to `typerep-map`:

• `type-map` appears to resemble our project, however the interface is different. They track the elements in the types and don't provide the desired parametrization.
• `dependent-map` is closer to our goal in terms of the interface but the package has a complete reimplementation of `Data.Map.Lazy`, and the goal of the `typerep-map` project is to have an efficient implementation based on primitive unboxed arrays.

### Motivation

You might wonder what `typerep-map` brings to the table if there are other packages that aim to fulfil the same purpose. The primary goal is to use it in the `caps` library instead of the `DMap` type from `dependent-map` parametrized by `TypeRep`.

In `caps` the performance of lookups is extremely important so it makes sense to prioritize its performance above that of other functions.

## Implementing `TypeRepMap`

Sections below describe the details of the implementation phases and the general concepts.

NOTE: in this blog post I am talking about `ghc-8.0.2` or higher.

### Extensions used

The code snippets in this blog post assume that the following extensions are enabled:

• `-XTypeApplications`
• `-XScopedTypeVariables`
• `-XGADTs`
• `-XTypeInType`
• `-XAllowAmbiguousTypes`

## Map-based implementation

The reference implementation is more or less straightforward. It uses a lazy `Map` from `containers` as an internal representation of the desired data type.

### Keys

Normally, types in Haskell are only present at compile-time: after type-checking, they are completely erased from the program. And yet we want to use types as keys in a map. This requires a runtime representation for types. Luckily, the standard library provides a runtime representation for types in the form of `TypeRep`. But there are actually two different definitions of `TypeRep` in `base`:

The one in `Type.Reflection` was introduced in GHC 8.2 and the old one was rewritten to be based on it. `Type.Reflection.TypeRep` has kind `TypeRep :: k -> *` while the old one has kind `TypeRep :: *`.

To have the basic idea of what actually `TypeRep` is, you can think of the old `TypeRep` as an infinite ADT with all types enumerated as tag constructors:

`````` data TypeRep = Int | Bool | Char | String | ...
``````

and the new `TypeRep` is an equivalent to the infinite GADT:

`````` data TypeRep (a :: k) where
Int    :: TypeRep Int
Bool   :: TypeRep Bool
Char   :: TypeRep Char
String :: TypeRep String
...
``````

If you are interested in the actual difference between old and new versions of the `TypeRep` and motivation for this change, here is a nice ICFP video by Simon Peyton Jones:

I use the old `TypeRep` that comes from `Data.Typeable`. And I have an explanation for that: there is a limitation in regular `Map` that all keys must be of the same type and this is not possible to achieve with parameterized `TypeRep`. Also, the old `TypeRep` will never be deprecated (from `8.2` it is just a different interface to the new `TypeRep`, so it's not obsolete), and it is sufficient for our goal to support older GHC versions.

Here is a usage example of basic `TypeRep` interface:

``````ghci> :t typeRep
typeRep :: Typeable a => proxy a -> TypeRep

ghci> typeRep (Proxy @Int)
Int

ghci> :t it
it :: TypeRep
``````

### Values

For the first prototype, I decided to use `Dynamic` as values in our `TypeRepMap`.

``````ghci> :t toDyn
toDyn :: Typeable a => a -> Dynamic

ghci> toDyn True
<<Bool>>

ghci> fromDynamic (toDyn "Haskell") :: Maybe String
``````

So we've got:

``````newtype TypeRepMap = TypeRepMap { unMap :: Map TypeRep Dynamic }
``````

and the initial interface looks like this:

``````insert :: forall a . Typeable a => a -> TypeRepMap -> TypeRepMap
insert val = TypeRepMap . LMap.insert (typeRep (Proxy @a)) (toDyn val) . unMap

lookup :: forall a . Typeable a => TypeRepMap -> Maybe a
lookup = fromDynamic <=< LMap.lookup (typeRep \$ Proxy @a) . unMap
``````

When looking at the `Dynamic` data type implementation

``````data Dynamic = Dynamic TypeRep Obj
type Obj = Any
``````

you can notice that it already stores `TypeRep` inside, so it seems like it's a bit suboptimal decision due to redundancy. And we can safely use `Any` as our value type.

According to the `Dynamic` implementation, we can use `unsafeCoerce` function for the conversion to `Any` and from `Any`.

So we get:

``````newtype TypeRepMap = TypeRepMap { unMap :: LMap.Map TypeRep Any }

insert val = TypeRepMap
. LMap.insert (typeRep \$ Proxy @a) (unsafeCoerce val)
. unMap

lookup = fmap unsafeCoerce . LMap.lookup (typeRep \$ Proxy @a) . unMap

``````

Let's check how it's all working:

``````ghci> let x = lookup \$ insert (11 :: Int) empty

ghci> x :: Maybe Int
Just 11

ghci> x :: Maybe ()
Nothing
``````

All right, we have a simple working version. But there are ways to improve it.

### Parameterization

The next step is to parametrize our data type by type variable `f` with kind `f :: k -> *`. This `f` will be the interpretation of our keys. Such parameterization allows us to encode additional structure common between all elements, making it possible to use `TypeRepMap` to model a variety of things from extensible records to monadic effects. This sort of parametrization may be familiar to users of `vinyl` records.

Note that the input kind is `k` — we want to support arbitrary kinds as well. Since `TypeRep` is poly-kinded, the interpretation can use any kind for the keys (see some examples in documentation).

``````newtype TypeRepMap (f :: k -> *) = TypeRepMap
{ unMap :: LMap.Map TypeRep Any
}
``````

The implementation of the functions stays the same, but the types are different:

``````insert :: forall a f . Typeable a => f a -> TypeRepMap f -> TypeRepMap f
lookup :: forall a f . Typeable a => TypeRepMap f -> Maybe (f a)
``````

Our previous implementation is just equivalent to `TypeRepMap Identity` in the context of the described design.

NOTE: Another reason to get rid of the `Dynamic`: if we keep it then we have to specify `Typeable (f a)` constraint instead of `Typeable a` in the type declarations. And having `Typeable a` constraint would let us implement the following function efficiently:

``````hoist :: (forall x. f x -> g x) -> TypeRepMap f -> TypeRepMap g
``````

## Vector-based implementation

The next step is to write an alternative implementation based on unboxed vectors, which is expected to be faster.

We want to use `Vector (TypeRep, Any)` instead of our lazy map. This vector is going to be sorted by `TypeRep`. `insert`/`lookup` algorithms should be implemented manually in the following way:

• `insert`: allocate a new vector of `n + 1` element, copy everything from the initial vector adding the new element and don't forget to keep the sorting.
• `lookup`: the simple binary search.

The point of the unboxed vector is that it helps to get rid of the pointer indirection. If we take just `Vector` we will observe this picture (`Ty` stands for `TypeRep` and `El` stands for an element):

``````  [   Pair₁,      Pair₂,      Pair₃,      Pair₄   ]
/   \       /   \       /   \       /   \
Ty₁   El₁   Ty₂   El₂   Ty₃   El₃   Ty₄   El₄
``````

Instead of this what we would like to see is:

``````    [ Ty₁, El₁, Ty₂, El₂, Ty₃, El₃, Ty₄, El₄ ]
``````

In this way, as the result, the access to the `Ty` or `El` is shorter for exactly one pointer dereference.

However, turned out that it's more efficient to store keys and values in separate vectors under corresponding indices:

``````    [ Ty₁,   Ty₂,    Ty₃,    Ty₄ ]
|      |       |       |
[ El₁,   El₂,    El₃,    El₄ ]
``````

Unfortunately, `TypeRep` doesn't have the `Unbox` instance and it looks like it's not possible to write it. So instead of storing `TypeRep` we will be storing a `Fingerprint`. Basically, `Fingerprint` is the hash for `TypeRep`, so it makes sense to move in this direction.

``````ghci> :t typeRepFingerprint
typeRepFingerprint :: TypeRep -> Fingerprint

ghci> typeRepFingerprint \$ typeRep \$ Proxy @Int
b1460030427ac0fa458cbf347f168b53

ghci> typeRepFingerprint \$ typeRep \$ Proxy @Bool
ebf3a8541b05453b8bcac4a38e8b80a4
``````

`TypeRep` from `Data.Typeable` module is defined as

``````type TypeRep = SomeTypeRep
``````

If we take a look at the `Ord` instance of `SomeTypeRep` in `base` we'll see that it's confirmed that `Fingerprint` is unique for each `TypeRep`. That means it's okay to use `Fingerprint` as a key instead of `TypeRep`.

### Vector

This is initial vector-based implementation:

``````data TypeRepMap (f :: k -> *) = TypeRepMap
{ fingerprints :: Vector Fingerprint
, anys         :: Vector Any
}
``````

We want to use unboxed vector as a type for the `fingerprints` field of `TypeRepMap`.

Every unboxed vector is the newtype wrapper over some primitive vector. In order to use an unboxed vector of `Fingerprint` we need to implement an instance of the `Prim` typeclass from the `primitive` package for `Fingerprint`. It was proposed to add this instance under this issue in `primitive` library (having this instance inside library would simplify implementation a lot):

As the reference for `Prim` instance implementation, we can use the `Storable` type class which contains similar functions. There is already the instance `Storable` for `Fingerprint`. An assumption is that there is no significant difference between `Storable` and `Prim` for our `lookup` checks and we can use storable vector instead of unboxed one. For more information about the difference between those typeclasses see this SO answer.

Though our initial assumptions were false and turned out that `Storable` doesn't give the desired performance boost as shown with benchmarks.

### Optimal Vector

According to the source code, `Fingerprint` is a pair of `(Word64, Word64)`. So instead of having a single vector of `Fingerprint`s we can have a vector of `Word64` where `Fingerprint` with number `i` stored on `2 * i` and `2 * i + 1` indices.

But actually, it’s better to split it into two separate vectors of `Word64` where one vector stores the first halves of `Fingerprint` and the other one stores the second halves correspondingly. It makes the implementation easier and also faster (checked with benchmarks) because of the assumption that it should be almost always enough to compare only the first part and it makes key comparison faster.

After all described optimizations were done our structure took the following form:

``````data TypeRepMap (f :: k -> *) = TypeRepMap
{ fingerprintAs :: Unboxed.Vector Word64
, fingerprintBs :: Unboxed.Vector Word64
, anys          :: Boxed.Vector Any
}
``````

And the `lookup` function was implemented like this:

``````lookup :: forall a f . Typeable a => TypeRepVector f -> Maybe (f a)
lookup tVect =  fromAny . (anys tVect V.!)
<\$> binarySearch (typeRepFingerprint \$ typeRep \$ Proxy @a)
(fingerprintAs tVect)
(fingerprintBs tVect)
``````

It uses a manually implemented version of the binary search algorithm optimized for unboxed vectors. The algorithm initially performs a binary search using the `fingerprintAs` vector only. And then, after finding the first half, walks through the `fingerprintBs` vector.

At first, a simple naive binary search was implemented but later it was rewritten into a cache-optimized binary search ([see the description here](http://bannalia.blogspot.ru/2015/06/cache-friendly-binary-search.html )) which boosted the performance significantly.

## Array-based implementation

But that’s not all. Later we noticed that every vector has the following definition:

``````data Vector a = Vector {-# UNPACK #-} !Int
{-# UNPACK #-} !Int
{-# UNPACK #-} !(Array a)
``````

As you can see it contains two `Int` fields. So we can make our representation more optimal by using `Array` instead of boxed vector and `PrimArray` instead of unboxed vector directly in the `TypeRepMap` data type.

After all optimizations the final shape of the `TypeRepMap` is following:

``````data TypeRepMap (f :: k -> Type) = TypeRepMap
{ fingerprintAs :: {-# UNPACK #-} !(PrimArray Word64)
, fingerprintBs :: {-# UNPACK #-} !(PrimArray Word64)
, anys          :: {-# UNPACK #-} !(Array Any)
}
``````

## Benchmarking

Initially, I was frustrated about this part because I had no idea how to create the `Map` of 1000 elements as that means I needed to somehow generate 1000 types. But there was actually a pretty elegant solution for this puzzle — polymorphic recursion.

Let's introduce the following data types as type-level natural numbers:

``````data Z
data S a
``````

Using these data types we can now implement the function which builds `TypeRepMap` of the desired size.

``````buildBigMap :: forall a . Typeable a
=> Int
-> Proxy a
-> TypeRepMap Proxy
-> TypeRepMap Proxy
``````

so when I run `buildBigMap` with size `n` and `Proxy a`, it calls itself recursively with `n - 1` and `Proxy (S a)` at each step, so the types are growing on each step.

But this wasn’t the only challenge in benchmarking `TypeRepMap`. There were also a few interesting things with benchmarks to keep in mind:

• We should force maps to normal form before benchmarking.
• We can't use `rnf` function. Deriving `NFData` instance for `TypeRepMap` is not possible because there can be no `NFData` for `Any`. We won't be able to use `rnf` because it would try to force both the keys and the values, as our values are `Any` (can't force them), but since evaluating the values is not important at all for the benchmark, we could try to define a function like `rnf` but without touching the values.

For `Map`-based implementation we need to benchmark the `lookup` function on different depths of our tree (as `Map` is internally a tree). But the key can be very close to the root so our benchmarks won’t be honest enough. Thus we need to test on different `Proxy`s with different types.

Here is the diagram of how the tree’s been constructed. You can notice that the `Char` element is the direct child of the root:

``````size: 16
tree:
+--Proxy * (S (S (S (S (S (S (S (S (S (S (S (S (S (S Z))))))))))))))
|
+--Char
|  |
|  |  +--Proxy * (S (S (S (S (S (S Z))))))
|  |  |
|  +--Proxy * (S (S (S (S (S (S (S (S (S (S (S Z)))))))))))
|     |
|     +--|
|
Proxy * (S (S (S (S (S (S (S (S (S (S (S (S (S Z)))))))))))))
|
|     +--Proxy * (S (S (S (S (S (S (S (S (S (S Z))))))))))
|     |
|  +--Proxy * (S (S (S (S (S (S (S (S (S Z)))))))))
|  |  |
|  |  +--Proxy * (S (S (S (S (S (S (S (S Z))))))))
|  |
+--Proxy * (S (S (S (S (S (S (S Z)))))))
|
|     +--Proxy * (S Z)
|     |
|  +--Proxy * (S (S (S Z)))
|  |  |
|  |  +--Proxy * (S (S (S (S (S (S (S (S (S (S (S (S Z))))))))))))
|  |
+--Proxy * (S (S (S (S Z))))
|
|  +--Proxy * (S (S Z))
|  |
+--Proxy * (S (S (S (S (S Z)))))
|
+--Proxy * Z
``````

Since we can't predict how `Ord` on `TypeRep` will behave we need to select a `Proxy` from our range randomly, however, because our types were huge we introduced the following type family to solve that issue:

``````type family BigProxy (n :: Nat) :: * where
BigProxy 0 = Z
BigProxy n = S (BigProxy (n - 1))
``````

While running this version of benchmarks it turned out that `rnf` function was taking a lot of time mostly on normalisation of the enormous `TypeRep` keys which consisted of tall nested types like `S (S (S ...))`.

So, eventually, I end up using the ghc plugin ghc-typelits-knownnat and the type of the `buildBigMap` became:

``````buildBigMap :: forall (a :: Nat) . KnownNat a
=> Int
-> Proxy a
-> TypeRepMap (Proxy :: Nat -> *)
-> TypeRepMap (Proxy :: Nat -> *)
``````

In order to benchmark `lookup` function we implemented a special function `fromList` to use in place of the bunch of inserts, so we will be able to see the real time measures of `lookup` operation itself.

``````data TF f where
TF :: Typeable a => f a -> TF f

fromList :: [TF f] -> TypeRepMap f
``````

Now the `buildBigMap` function will have type

``````buildBigMap :: forall (a :: Nat) . KnownNat a
=> Int
-> Proxy a
-> [TF (Proxy :: Nat -> *)]
-> [TF (Proxy :: Nat -> *)]
``````

Benchmarks make 10 lookups to collect average performance statistics:

``````tenLookups :: TypeRepMap (Proxy :: Nat -> *)
-> ( Proxy 10, Proxy 20, Proxy 30, Proxy 40
, Proxy 50, Proxy 60, Proxy 70, Proxy 80
)
tenLookups tmap = (lp, lp, lp, lp, lp, lp, lp, lp)
where
lp :: forall (a :: Nat) . Typeable a => Proxy a
lp = fromJust \$ lookup tmap
``````

and compare the work of map-based implementation with optimal array-based implementation. Here are the achieved results:

NOTE: time in the report is for 10 lookups. To get the average time of single `lookup` you need to divide time by 10.

• Benches GHC-8.4.3
``````benchmarking map-based/lookup
time                 2.198 μs   (2.195 μs .. 2.202 μs)
1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.196 μs   (2.193 μs .. 2.199 μs)
std dev              10.46 ns   (8.436 ns .. 12.67 ns)

benchmarking dependent map/lookup
time                 819.0 ns   (810.7 ns .. 829.1 ns)
0.999 R²   (0.999 R² .. 1.000 R²)
mean                 815.8 ns   (812.1 ns .. 822.5 ns)
std dev              16.11 ns   (9.371 ns .. 23.09 ns)

benchmarking vector-binary-search/lookup
time                 370.7 ns   (368.9 ns .. 372.2 ns)
1.000 R²   (1.000 R² .. 1.000 R²)
mean                 368.9 ns   (368.2 ns .. 369.7 ns)
std dev              2.512 ns   (1.938 ns .. 3.474 ns)

benchmarking array-cache-optimized-binary-search/lookup
time                 183.5 ns   (183.2 ns .. 183.8 ns)
1.000 R²   (1.000 R² .. 1.000 R²)
mean                 183.6 ns   (183.3 ns .. 184.4 ns)
std dev              1.535 ns   (958.3 ps .. 2.631 ns)
``````

## Conclusion

In this blog post, I wanted to show the difficulties, tricks, and useful information which I personally learned during the implementation of an optimized version of `TypeRepMap`. Also, I needed to somehow structure the knowledge I’ve gained while working on this project. You can say that some parts of the post can be skipped or might be irrelevant but I wrote it in such a way on purpose to highlight the topics that I find very hard to find and understand quickly. So I hope you too will find this knowledge useful!

## Acknowledgments

Many thanks to Vladislav Zavialov (\@int-index) for mentoring this project! It was the great experience for me.

## Bonus

A few more challenges on the way to the release `typerep-map`:

### KindOf

During interface enhancement I ran into some weird issue described below.

It’s nice to have the `member` function and it makes sense to implement it using already written lookup function:

``````member :: forall a f . Typeable a => TypeRepMap f -> Bool
member trMap = case lookup @a trMap of
Nothing -> False
Just _  -> True
``````

Type of the `lookup` function is the following:

``````lookup :: forall a f . Typeable a => TypeRepMap f -> Maybe (f a)
``````

Unfortunately, this implementation of `member` doesn’t compile! The problem is in the fact that the compiler can’t infer that type variable `a` and the argument to `f` have the same kind. These two functions have the following type with implicitly inferred kinds:

``````lookup :: forall {k} (a :: k) (f :: k -> *) . Typeable a
=> TypeRepMap f -> Maybe (f a)
member :: forall {k1} {k2} (a :: k1) (f :: k2 -> *) . Typeable a
=> TypeRepMap f -> Bool
``````

After this ghc proposal is implemented, it should be possible to write such type signatures directly in code. The current workaround is to use this trick with `KindOf` type:

``````type KindOf (a :: k) = k

member :: forall a (f :: KindOf a -> Type) . Typeable a => TypeRepMap f -> Bool
``````

### New `TypeRep` performance

During benchmarking the `Map`-based implementation of `TypeRepMap`, very perceptible performance degradation was noticed. Here is the comparison table with the results we have with our `Map`-based implementation.

``````ghc version  containers  performance
8.0.2        0.5.7.1     556.9 ns
8.2.2        0.5.10.2    2.076 μs
8.4.3        0.5.11.0    2.464 μs
``````

We didn’t observe this performance degradation when we used `Fingerprint` as keys, so it's probably an issue with the new `TypeRep`.

### KnownNat and Typeable

Initial version of `buildBigMap` function had this type signature:

``````buildBigMap :: forall (a :: Nat) . (Typeable a, KnownNat a) => ...
``````

But, unfortunately, it became broken on GHC-8.4.3! Turned out that `Typeable` and `KnownNat` constraints don’t play well together. This observation resulted in the following ghc ticket with quite an interesting discussion:

Originally published on kowainik.github.io ## Related Issues

cosmos / gaia
• Started
• 0
• 6
• Intermediate
• Go
cosmos / gaia
• Started
• 0
• 3
• Intermediate
• Go
cosmos / ibc
• Started
• 0
• 2
• Intermediate
• TeX
cosmos / ibc
• Open
• 0
• 0
• Intermediate
• TeX

### Get hired!

#### Sign up now and apply for roles at companies that interest you. Engineers who find a new job through Blockchain Works average a 15% increase in salary.