Serokell’s Work on GHC: Dependent Types, Part 5
This article continues the fine tradition of Serokell’s GHC team sharing their progress on bringing dependent types to Haskell. A lot has happened since the last report, and there is plenty to cover.
In this edition, Vladislav Zavialov presents three major contributions and a host of smaller improvements that push Dependent Haskell closer to becoming a practical reality.
Summary
The highlights of this report are:
- Visible
forallin GADTs - Namespace-specified imports
- Type instances in kind checking
After that, we are going to go through a number of other improvements:
- Progress on unifying
HsTypeandHsExpr - The star kind syntax in required type arguments
- Pun detection in required type arguments
- New type families:
Tuple,Constraints,Tuple#,Sum# - Rework of name resolution for built-in and punned names
Visible forall in GADTs
The design of dependent types for Haskell, as described by GHC Proposal #378 “Design for Dependent Types”, includes support for at least 6 quantifiers:
Quantifier Dependence Visibility Erasure
------------------------------------------------------------
forall a. ty Dependent Invisible Erased
forall a -> ty Dependent Visible Erased
foreach a. ty Dependent Invisible Retained
foreach a -> ty Dependent Visible Retained
Eq a => ty Non-dependent Invisible Retained
t1 -> t2 Non-dependent Visible Retained
The one we all care about is foreach a -> ty, also known as the dependent product, dependent function, or Π-type (all three are synonymous). But before we can tackle something so ambitious, it helps to deal with the other quantifiers, such as forall a -> ty, often referred to as the visible forall or VDQ (visible dependent quantification).
The majority of design questions for VDQ were resolved when GHC Proposal #281 “Visible forall in types of terms” was accepted back in 2021, and we have been relentlessly chipping away at its implementation ever since, one engineering challenge at a time.
The latest advancement in this direction is the implementation of VDQ in GADTs. Starting with GHC 9.14, the RequiredTypeArguments extension allows declarations such as the following:
data T a where
Typed :: forall a -> a -> T a
In this example, the Typed data constructor takes two visible arguments: a type, and then a value of that type, e.g. Typed Int 42 or Typed String "hello". This surely has a dependently-typed look to it! Here are some examples of what it might look like in various contexts:
-- Expressions
t1 = Typed Int 42
t2 = Typed String "hello"
t3 = Typed (Int -> Bool) even
-- Patterns
f1 (Typed a x) = x :: a
f2 (Typed Int n) = n*2
f3 (Typed ((->) w Bool) g) = not . g
-- Types (with DataKinds)
type T1 = Typed Nat 42
type T2 = Typed Symbol "hello"
type T3 = Typed (Type -> Constraint) Num
One thing to keep in mind, and why this doesn’t really count as dependent types, is that the type argument is guaranteed to be erased, i.e. has no effect on how data is represented on the heap, and consequently can’t be pattern matched on:
f4 (Typed a x) =
case a of -- Nuh-uh! This is a compile-time error
Int -> negate x
Bool -> not x
_ -> x
Nonetheless, allowing this form of quantification comes with its own technical challenges, which are now behind us. This means that when it comes to adding proper dependent types, we won’t have to worry about syntactic trivia.
Here is what it took to make this work.
The first challenge was that GHC’s AST for constructor patterns MkE @tp1 @tp2 p1 p2 kept the type arguments and term arguments in entirely separate lists. We refactored it to use a mixed-list representation:
ConPat "MkE" [tp1, tp2] [p1, p2] -- old
ConPat "MkE" [InvisP tp1, InvisP tp2, p1, p2] -- new
The immediate effect of the new representation was an improvement to error messages. Consider the pattern Con x @t y. Previously it resulted in a parse error because @t could not occur after x, and now it is reported as [GHC-14964].
More interestingly, the form Con x @t y has become, in principle, representable, so it has become possible to handle it properly down the compilation pipeline.
The second challenge was to allow visible forall in constructor signatures. It took a few tries to get this right, as the forall-or-nothing rule that governs implicit quantification meant that some quantifiers had to be kept in a separate field. The type checker was also updated to no longer assume that all subpatterns without the @ herald were value arguments. Indeed, some or all of them may very well turn out to be required type arguments.
The third challenge was to update the Core representation of data constructors to allow foralls of varying visibility to occur in the list of quantifiers:
- dcUserTyVarBinders :: [InvisTVBinder]
+ dcUserTyVarBinders :: [TyVarBinder]
This change not only necessitated mechanical changes scattered throughout the GHC codebase, but also resulted in cryptic Core Lint errors. The patch had been shelved for a few months until @mniip helped to debug this issue at ZuriHac 2025. As it turned out, the predicate that determines whether to introduce a data constructor wrapper was returning a false negative.
With all three challenges resolved, GHC can now handle the following example (constructed to stress-test the feature rather than demonstrate a realistic use case):
newtype Checker a b c where
C :: forall a. forall b -> forall c. (a -> (c, b)) -> Checker a b c
cDouble = C Double (length &&& read)
-- ghci> map (test cDouble) ["1.5", "0.5", "1.05", "0.05"]
-- [(3,True),(3,True),(4,True),(4,False)]
test :: Show b => Checker String b c -> String -> (c, Bool)
test (C t f) s = fmap (r -> show @t r == s) (f s)
And there we have it: we are one step closer to freely mixing terms and types in our Haskell programs. Future work in that direction includes nested quantification in GADTs (#18389) and VDQ in pattern synonyms (#23704).
Namespace-specified imports
Haskell has two namespaces, as GHC Proposal #581 explains:
- the type namespace, including the names of type constructors, type synonyms, type families and type classes;
- the data namespace, including the names of term-level values, functions, data constructors and pattern synonyms.
This separation allows us to define data constructors and type constructors whose names coincide:
data T = T
At use sites, GHC infers which T is referred to from the context. For example, in t :: T the occurrence of T resolves to the type constructor, whereas in t = T to the data constructor.
The reliance on context to select the namespace creates ambiguities when it comes to import/export lists, fixity declarations, pragmas, TH name quotation, and most notably when mixing terms and types with DataKinds and RequiredTypeArguments.
The easiest way to sidestep these issues is to avoid introducing type and data constructor names that coincide, and this is what dependent-types-flavored Haskell tends to do. However, we can’t expect all libraries to adopt such a convention, so we still need ways to disambiguate T against T in a context-independent manner.
After a number of false starts, with no fewer than three proposals from various authors, the consensus was to introduce namespace-specified imports looking like this:
import Data.Monoid as M.Type (type ..)
import Data.Monoid as M (data ..)
Notice the new .. syntax; it is a wildcard that stands for all names in the specified namespace.
Recall that the Data.Monoid module exports the following newtypes:
newtype All = All {getAll :: Bool}
newtype Alt f a = Alt {getAlt :: f a}
newtype Any = Any {getAny :: Bool}
newtype Ap f a = Ap {getAp :: f a}
newtype Dual a = Dual {getDual :: a}
newtype Endo a = Endo {appEndo :: a -> a}
newtype First a = First {getFirst :: Maybe a}
newtype Last a = Last {getLast :: Maybe a}
newtype Product a = Product {getProduct :: a}
newtype Sum a = Sum {getSum :: a}
With imports written as above, one can refer to the type constructors as M.Type.Dual, M.Type.Endo, M.Type.Product, and so on, and to the data constructors as M.Dual, M.Endo, and M.Product respectively.
At the outset, this seemed like a straightforward feature to implement: a new form of import/export item that simply selects names according to the namespace specifiers. However, as we started to tackle this problem, we found a surprising number of pre-existing bugs that had to be fixed first:
- In subordinate import and export lists, the use of the
typenamespace specifier used to be silently ignored (#12488, #22581) - In subordinate import lists within a
hidingclause, non-existent items led to a poor warning message with-Wdodgy-imports(#25983) - In subordinate import lists within a
hidingclause, non-existent items resulted in the entire import declaration being discarded (#25984) - In subordinate import lists, it was not possible to refer to a class
method if there was an associated type of the same name (#25991)
With so many corner cases discovered, it was no longer clear how many other bugs were lurking in the import/export logic, but we decided to proceed cautiously with preparations:
-
We introduced the option to use the
datakeyword with individual import/export items, e.g.import Data.Proxy as D (data Proxy). -
In accordance with the proposal, we deprecated the
patternnamespace specifier and introduced the-Wpattern-namespace-specifierwarning to aid migration to the newdatasyntax. -
We increased the test coverage of
-Wduplicate-exportsand-Wdodgy-exports, which revealed some typos and dead code.
At last, it seemed like the foundation was solid enough to tackle the actual feature. The next step was to add support for top-level namespace-specified wildcards type .. and data .. to import and export lists:
import M (type ..) -- imports all type and class constructors from M
import M (data ..) -- imports all data constructors and terms from M
module M (type .., f) where
-- exports all type and class constructors defined in M,
-- plus the function 'f'
We then carried out a refactoring, after which the implementation resumed and reached another milestone: support for subordinate namespace-specified wildcards X(type ..) and X(data ..) in import and export lists:
import M (Cls(type ..)) -- imports Cls and all its associated types
import M (Cls(data ..)) -- imports Cls and all its methods
module M (R(data ..), C(type ..)) where
-- exports R and all its data constructors and record fields;
-- exports C and all its associated types, but not its methods
At this point, GHC could handle all examples from the proposal, so we might as well declare victory. Studying the spec revealed a few corner cases that are still being worked on (#27268), but those are unlikely to arise in practice.
Et voilà, Haskellers have another tool to deal with the Dreaded Namespace Problem.
Type instances in kind checking
If you ever wrote $(return []) in your module to help GHC kind-check your code, you know what this section is going to be about. If not, read on: this one is a heavy hitter.
We are excited to announce that, after 10 years of futile attempts, we finally taught GHC to find open type family instances during kind checking, regardless of the order in which they are written in the source file. Consider the following program:
type family Open a
type family F a :: Open a
type instance F Int = True
type instance Open Int = Bool
When kind-checking the F Int = True instance, we need to know that Open Int = Bool, so we’d better check the other type instance first, otherwise we’ll see the following error:
error: [GHC-83865]
• Expected kind ‘Open Int’, but ‘True’ has kind ‘Bool’
• In the type ‘True’
In the type instance declaration for ‘F’
But look again at this line:
type instance F Int = True
It mentions F, Int, and True. Nothing points to Open! So how can we know to kind-check the other instance first?
We tried various heuristics, but every time, we found corner cases that couldn’t be handled. In the meantime, some variation of this issue was reported at the rate of about once a year:
- #12088 “Type/data family instances in kind checking” (May 20, 2016)
- #12239 “Dependent type family does not reduce” (June 29, 2016)
- #13790 “GHC doesn’t reduce type family in kind signature unless its arm is twisted” (June 5, 2017)
- #14668 “Ordering of declarations can cause typechecking to fail” (January 14, 2018)
- #15561 “Type error conditioned on ordering of GADT and type family definitions” (August 24, 2018)
- #16410 “Order of declarations matters” (March 8, 2019)
- #16448 “Unresolved type family kind bug” (March 16, 2019)
- #16693 “Order of declarations affects which programs are accepted” (May 25, 2019)
- #19611 “Typechecking of dependently-kinded type family instances doesn’t have enough equalities in scope” (March 28, 2021)
- #20875 “Declaration order of aliases and type families” (December 26, 2021)
- #21172 “No reduction of kind families” (March 5, 2022)
- #22257 “Dependent type families cannot be separated from their instances” (October 5, 2022)
- #25238 “Kind reduction with open type families depends on data type declaration order” (September 5, 2024)
- #25834 “Splices seem to break type inference for type families” (March 8, 2025)
Haskellers are phenomenal at writing programs that break the compiler, and thanks to the bug reports, we have accumulated an excellent set of test cases.
Let’s now dive into the technical details.
After renaming type, class, and instance declarations in a module, GHC does dependency analysis on the renamed declarations to figure out in which order to kind-check them. The dependency analysis returns a topologically sorted list of SCCs (strongly-connected components), where an SCC represents a set of mutually recursive declarations.
If the dependency analysis were complete, i.e. if it were able to discover all dependencies between all declarations, then GHC could simply kind-check SCCs in order. Unfortunately, this is not the case, because dependencies come in two varieties:
-
Lexical dependencies arise when
XmentionsYby name:data X (a :: Y) = MkX -- depends on Y data Y = MkY -
Non-lexical dependencies arise when an instance must be in the typing environment:
type family F x data X (a :: F Int) = MkX a -- depends on (F Int ~ Type) type instance F x = Type
Non-lexical dependencies can’t be discovered by looking at the free variables of a declaration, and attempts to find a good heuristic did not bear fruit. As a consequence, the order of SCCs coming out of the renamer is determined solely by lexical dependencies.
In other words, the SCCs are arranged in lexical dependency order, meaning:
- definitely in dependency order if all dependencies are lexical
- possibly not in dependency order if there are non-lexical dependencies
Here is another example of how type checking declarations might go wrong due to non-lexical dependencies:
type family F a
type instance F Int = Bool
data R = MkR (F Int)
type S = MkR True
For S to kind-check, we need to know that (F Int) ~ Bool. But we won’t know that unless we’ve looked at the type instance declaration for F before kind-checking S.
The solution we ended up adopting is to discover the correct kind-checking order by trial and error. The algorithm works as follows:
-
Perform the dependency analysis on declarations without instances and considering
lexical dependencies only. The result is a topologically sorted list of SCCs. -
Create one singleton “SCC” per instance and put them at the end.
-
Check all SCCs in order, skipping any that are blocked (free variables not in the environment), flawed (unusable unpack pragmas), or failing (type errors)
- (3a) if none were skipped, we are done
- (3b) if all were skipped and none are flawed, we are stuck; report errors and exit
- (3c) if all were skipped and some are flawed, redo the pass allowing flawed SCCs
- (3d) if some were skipped and some weren’t, we’ve made progress; iterate
In the common case of lexical dependencies only, the algorithm is linear in the number of groups: it completes in one pass if the program is kind-correct, or two passes if there are kind errors.
In the less common case of non-lexical dependencies, the algorithm is worst-case quadratic in the number of groups: if each pass manages to check only one group, we end up doing a pass per group.
Now, regarding the “flawed” groups. These are the ones where the programmer used the {-# UNPACK #-} pragma on a field, yet we could not unpack. One possible reason for this is that we lack a data instance in the environment that would allow for the field to be unpacked, so it is beneficial to treat this like a kind error: skip the flawed group and retry it in a later pass, when we might have more data instances in the environment. However, if the only reason a pass gets stuck is due to flawed groups, then we can make progress by treating unpacking failure as a warning. This way, we maximize unpacking with explicit {-# UNPACK #-} pragmas. Later we might check SCCs for other “flaws”, but for now the property is just about unusable unpack pragmas.
Finally, a short comment on why it is necessary to check whether a group is ready (all free variables are in the environment) or blocked (some free variables are not in the environment). One might expect this check to be redundant, as the SCCs come in lexical dependency order. However, as soon as we skip a group, the rest of the pass can no longer rely on this property, hence the check. It is rare to encounter this problem in a kind-correct program, but we managed to construct a test case.
That is all for the technical details. It took many iterations to arrive at this algorithm. Each previous attempt failed on some corner case or another, which is, of course, what made this a 10-year problem to begin with. If you are one of the many Haskellers who ran into this bug and found yourself reordering declarations or reaching for the $(return []) workaround, you can now drop it. GHC 9.14 handles it automatically.
Beyond the immediate quality-of-life improvement, this fix unblocks exciting developments such as singletonisation of GADTs.
Progress on unifying HsType and HsExpr
Internally, GHC’s frontend uses two distinct types to represent types and terms: HsType and HsExpr respectively. A dependently typed language would normally give uniform treatment to types and terms, so one of our goals is a refactoring to use one representation for both, arriving at the following declaration in GHC:
type HsType = HsExpr
The rationale for this is to increase code reuse between the term- and type-level pipelines in the compiler front-end (AST, parser, renamer, type checker).
Given that types and terms already share a number of similarities, one would expect this to be a straightforward refactoring. For example, the following constructs are found in both:
- Variables
a - Constructors
Con - Prefix applications
f a - Infix applications
a # b - Type applications
f @t - Literals
42,"hello" - Signatures
a :: t - Lists
[a, b, c] - Tuples
(a, b, c) - Parentheses
(a) - Holes
_
However, as it turned out, the representations for those constructs in HsExpr and HsType differ in various ways, leading to what one might call “death by a thousand cuts”.
We expect that it will take a good number of patches to handle each discrepancy individually and work out its user-facing consequences. For example, when we dealt with a difference in how infix operator applications were represented, we ended up allowing infix holes a _` b` in types, following the precedent set by term-level expressions.
The issue is further exacerbated by the fact that each construct actually has three variants depending on the compilation phase: the AST is either parsed, renamed, or typechecked, and each construct can have phase-specific extensions.
To make the effort more systematic, we created an automated test case that tracks the current status of the refactoring. If you open testsuite/tests/ghc-api/T25121_status.stdout in GHC’s sources, you will find a detailed report on which representations match or mismatch.
The goal is to make that test case report “match” for every constructor; then we will be able to merge HsType with HsExpr. To that end, here are some constructs that we have already updated for improved uniformity:
- Literals (dropped
HsTyLitin favor ofHsLit) - Operators (added infix holes in types)
- Wildcards (refactored
HsWildCardTyto useHoleKind) - Tuples and sums (updated exact-print annotations)
We are looking forward to making more changes in that direction.
The star kind syntax in required type arguments
The StarIsType extension allows the programmer to write * instead of Type:
Maybe :: * -> *
Although it is eventually going to be deprecated per GHC Proposal #143 “Remove the * kind syntax”, there are currently no plans to fully remove it. Indeed, the proposal uses the following phrasing:
the
-XStarIsTypeextension may be removed from GHC to simplify the internals
And “may be removed” only hints at the possibility of removal; given the magnitude of the potential breakage, this is not going to happen without friction.
Looking at this from the perspective of unifying term- and type-level parsers, this means we need to parse * in term syntax somehow. The problem is that it conflicts with the multiplication operator! This is a known problem and the raison d’être of the aforementioned proposal.
Back in 2020, we prototyped a solution to this. With a carefully crafted set of edits to the Haskell grammar, it turned out to be possible to add limited support for the * syntax where it does not conflict with multiplication.
Indeed, consider the common kinds:
Maybe :: * -> *Either :: * -> * -> *Monad :: (* -> *) -> Constraint
None of them actually involve the a * b ambiguity.
The prototype that implemented this was actually shelved, because it was difficult to motivate the change to the grammar. But now, with the introduction of RequiredTypeArguments, we can allow the following code:
{-# LANGUAGE RequiredTypeArguments, StarIsType #-}
x1 = f (* -> * -> *)
x2 = f (forall k. k -> *)
x3 = f ((* -> *) -> Constraint)
So we dusted off the old patch, rebased and refined it, and now we finally have a sound backwards compatibility story for *. Merging term and type syntax no longer requires its complete removal.
Pun detection in required type arguments
The initial implementation of RequiredTypeArguments left out pun checking due to engineering difficulties. Consider:
x = 42
f, g :: forall a -> ...
f (type x) = g x
In accordance with the specification, the g x function call is renamed as a term, so x refers to the top-level binding x = 42, not to the type variable binding type x as one might expect.
This is somewhat counterintuitive because g expects a type argument. Forbidding puns in required type arguments allows us to produce a helpful error message:
error: [GHC-09591]
Illegal punned variable occurrence in a required type argument.
The name ‘x’ could refer to:
‘x’ defined at Test.hs:3:1
‘x’ bound at Test.hs:5:9
Unfortunately, the initial attempt to introduce this check was stalled, as the necessary information was not available in the type-checking phase.
Luckily, in an effort to improve error messages, @sheaf corrected the architectural flaw in GHC that was blocking the punning check, and we implemented the check.
New type families: Tuple, Constraints, Tuple#, Sum#
GHC Proposal #145 “Non-punning list and tuple syntax” describes the NoListTuplePuns extension, which removes the overloading of (a, b): with the extension enabled, (a, b) always refers to the data constructor, never the type constructor. The corresponding tuple type can still be written as Tuple2 a b, but the Tuple type family offers better notation by reusing the familiar (a, b) syntax:
Tuple (Int, Bool) = Tuple2 Int Bool
Constraints (Show a, Eq a) = CTuple2 (Show a) (Eq a)
Tuple# (Int#, Float#) = Tuple2# Int# Float#
Sum# (Int#, Float#) = Sum2# Int# Float#
When the extension was first implemented by @tek, these type families had to be left out because GHC’s type inference was not yet powerful enough to handle them. The missing piece was more aggressive injectivity analysis for closed type families, which @rae proposed and, three years later, @simonpj implemented. With that in place, we were able to add the type families.
As part of this work, we also bumped the maximum sum arity from 63 to 64. Adding the Sum64# constructor had previously been blocked by a separate issue, which @luite resolved.
We also submitted a proposal amendment, as the type family definitions had to be simplified to a form that GHC’s injectivity checker could handle.
Rework of name resolution for built-in and punned names
While working on the name resolution component of GHC, we stumbled upon some outdated documentation and technical debt. Further inspection revealed four lurking bugs:
- #25174 Template Haskell mistakenly assumes
"FUN"is built-in syntax - #25179
mkName(template-haskell) ignoresNoListTuplePuns - #25180 Valid hole fits don’t suggest tuple constructors
- #25182
MkSoloandMkSolo#are mistakenly classified asBuiltInSyntax
Fixing them in a principled way called for a complete overhaul of the relevant logic. We cleaned up which names truly qualify as built-in syntax, unified the treatment of tuples of all arities, made the implementation aware of NoListTuplePuns, and updated the internal documentation.
Previous updates on dependent types
- Serokell’s Work on GHC: Dependent Types 4
- Serokell’s Work on GHC: Dependent Types 3
- Serokell’s Work on GHC: Dependent Types 2
- Serokell’s Work on GHC: Dependent Types
- GHC+DH Weekly Update #7, 2023-06-14 (Discourse)
- GHC+DH Weekly Update #6, 2023-06-07 (Discourse)
- GHC+DH Weekly Update #5, 2023-01-25 (Discourse)
- GHC+DH Weekly Update #4, 2023-01-18 (Discourse)
- GHC+DH Weekly Update #3, 2023-01-11 (Discourse)
- GHC+DH Weekly Update #2, 2022-12-21 (Discourse)
- GHC+DH Weekly Update #1, 2022-12-07 (Discourse)
Conclusion
This was the fifth blog post of our series “Work on GHC: Dependent Types”. In this installment, we covered visible forall in GADTs, namespace-specified imports, and the long-awaited fix for type instance ordering in kind checking. We also shared incremental progress on unifying HsType and HsExpr, and a handful of smaller but meaningful improvements throughout the compiler.
We are committed to our vision of dependently-typed programming in Haskell, so stay tuned for future updates!