24ddb4f56fd5e7a561da85a1e28c9f8bf45d265b
[ghc.git] / docs / backpack / backpack-impl.tex
1 \documentclass{article}
2
3 \usepackage{pifont}
4 \usepackage{graphicx} %[pdftex] OR [dvips]
5 \usepackage{fullpage}
6 \usepackage{wrapfig}
7 \usepackage{float}
8 \usepackage{titling}
9 \usepackage{hyperref}
10 \usepackage{tikz}
11 \usepackage{color}
12 \usepackage{footnote}
13 \usepackage{float}
14 \usepackage{algorithm}
15 \usepackage{algpseudocode}
16 \usetikzlibrary{arrows}
17 \usetikzlibrary{positioning}
18 \setlength{\droptitle}{-6em}
19
20 \input{commands-new-new.tex}
21
22 \newcommand{\nuAA}{\nu_\mathit{AA}}
23 \newcommand{\nuAB}{\nu_\mathit{AB}}
24 \newcommand{\nuGA}{\nu_\mathit{GA}}
25 \newcommand{\nuGB}{\nu_\mathit{GB}}
26 \newcommand{\betaPL}{\beta_\mathit{PL}}
27 \newcommand{\betaAA}{\beta_\mathit{AA}}
28 \newcommand{\betaAS}{\beta_\mathit{AS}}
29 \newcommand{\thinandalso}{\hspace{.45cm}}
30 \newcommand{\thinnerandalso}{\hspace{.38cm}}
31
32 \input{commands-rebindings.tex}
33
34 \newcommand{\var}[1]{\textsf{#1}}
35
36 \newcommand{\ghcfile}[1]{\textsl{#1}}
37
38 \title{Implementing Backpack}
39
40 \begin{document}
41
42 \maketitle
43
44 The purpose of this document is to describe an implementation path
45 for Backpack in GHC\@.
46
47 We start off by outlining the current architecture of GHC, ghc-pkg and Cabal,
48 which constitute the existing packaging system. We then state what our subgoals
49 are, since there are many similar sounding but different problems to solve. Next,
50 we describe the ``probably correct'' implementation plan, and finish off with
51 some open design questions. This is intended to be an evolving design document,
52 so please contribute!
53
54 \tableofcontents
55
56 \section{Current packaging architecture}
57
58 The overall architecture is described in Figure~\ref{fig:arch}.
59
60 \begin{figure}[H]
61 \center{\scalebox{0.8}{\includegraphics{arch.png}}}
62 \label{fig:arch}\caption{Architecture of GHC, ghc-pkg and Cabal. Green bits indicate additions from upcoming IHG work, red bits indicate additions from Backpack. Orange indicates a Haskell library.}
63 \end{figure}
64
65 Here, arrows indicate dependencies from one component to another. Color
66 coding is as follows: orange components are libaries, green components
67 are to be added with the IHG work, red components are to be added with
68 Backpack. (Thus, black and orange can be considered the current)
69
70 \subsection{Installed package database}
71
72 Starting from the bottom, we have the \emph{installed package database}
73 (actually a collection of such databases), which stores information
74 about what packages have been installed are thus available to be
75 compiled against. There is both a global database (for the system
76 administrator) and a local database (for end users), which can be
77 updated independently. One way to think about the package database
78 is as a \emph{cache of object code}. In principle, one could compile
79 any piece of code by repeatedly recompiling all of its dependencies;
80 the installed package database describes when this can be bypassed.
81
82 \begin{figure}[H]
83 \center{\scalebox{0.8}{\includegraphics{pkgdb.png}}}
84 \label{fig:pkgdb}\caption{Anatomy of a package database.}
85 \end{figure}
86
87 In Figure~\ref{fig:pkgdb}, we show the structure of a package database.
88 The installed package are created from a Cabal file through the process
89 of dependency resolution and compilation. In database terms, the primary key
90 of a package database is the InstalledPackageId
91 (Figure~\ref{fig:current-pkgid}). This ID uniquely identifies an
92 instance of an installed package. The PackageId omits the ABI hash and
93 is used to qualify linker exported symbols: the current value of this
94 parameter is communicated to GHC using the \verb|-package-id| flag.
95
96 In principle, packages with different PackageIds should be linkable
97 together in the same compiled program, whereas packages with the same
98 PackageId are not (even if they have different InstalledPackageIds). In
99 practice, GHC is currently only able to select one version of a package,
100 as it clears out all old versions of the package in
101 \ghcfile{compiler/main/Package.lhs}:applyPackageFlag.
102
103 \begin{figure}
104 \center{\begin{tabular}{r l}
105 PackageId & package name, package version \\
106 InstalledPackageId & PackageId, ABI hash \\
107 \end{tabular}}
108 \label{fig:current-pkgid}\caption{Current structure of package identifiers.}
109 \end{figure}
110
111 The database entry itself contains the information from the installed package ID,
112 as well as information such as what dependencies it was linked against, where
113 its compiled code and interface files live, its compilation flags, what modules
114 it exposes, etc. Much of this information is only relevant to Cabal; GHC
115 uses a subset of the information in the package database.
116
117 \subsection{GHC}
118
119 The two programs which access the package database directly are GHC
120 proper (for compilation) and ghc-pkg (which is a general purpose
121 command line tool for manipulating the database.) GHC relies on
122 the package database in the following ways:
123
124 \begin{itemize}
125 \item It imports the local and global package databases into
126 its runtime database, and applies modifications to the exposed
127 and trusted status of the entries via the flags \verb|-package|
128 and others (\ghcfile{compiler/main/Packages.lhs}). The internal
129 package state can be seen at \verb|-v4| or higher.
130 \item It uses this package database to find the location of module
131 interfaces when it attempts to load the module info of an external
132 module (\ghcfile{compiler/iface/LoadIface.hs}).
133 \end{itemize}
134
135 GHC itself performs a type checking phase, which generates an interface
136 file representing the module (so that later invocations of GHC can load the type
137 of a module), and then after compilation projects object files and linked archives
138 for programs to use.
139
140 \paragraph{Original names} Original names are an important design pattern
141 in GHC\@.
142 Sometimes, a name can be exposed in an hi file even if its module
143 wasn't exposed. Here is an example (compiled in package R):
144
145 \begin{verbatim}
146 module X where
147 import Internal (f)
148 g = f
149
150 module Internal where
151 import Internal.Total (f)
152 \end{verbatim}
153
154 Then in X.hi:
155
156 \begin{verbatim}
157 g = <R.id, Internal.Total, f> (this is the original name)
158 \end{verbatim}
159
160 (The reason we refer to the package as R.id is because it's the
161 full package ID, and not just R).
162
163 \subsection{hs-boot}
164
165 \verb|hs-boot| is a special mechanism used to support recursive linking
166 of modules within a package, today. Suppose I have a recursive module
167 dependency between modules and A and B. I break one of\ldots
168
169 (ToDo: describe how hs-boot mechanism works)
170
171 \subsection{Cabal}
172
173 Cabal is the build system for GHC, we can think of it as parsing a Cabal
174 file describing a package, and then making (possibly multiple)
175 invocations to GHC to perform the appropriate compilation. What
176 information does Cabal pass onto GHC\@? One can get an idea for this by
177 looking at a prototypical command line that Cabal invokes GHC with:
178
179 \begin{verbatim}
180 ghc --make
181 -package-name myapp-0.1
182 -hide-all-packages
183 -package-id containers-0.9-ABCD
184 Module1 Module2
185 \end{verbatim}
186
187 There are a few things going on here. First, Cabal has to tell GHC
188 what the name of the package it's compiling (otherwise, GHC can't appropriately
189 generate symbols that other code referring to this package might generate).
190 There are also a number of commands which configure its in-memory view of
191 the package database (GHC's view of the package database may not directly
192 correspond to what is on disk). There's also an optimization here: in principle,
193 GHC can compile each module one-by-one, but instead we use the \verb|--make| flag
194 because this allows GHC to reuse some data structures, resulting in a nontrivial
195 speedup.
196
197 (ToDo: describe cabal-install/sandbox)
198
199 \section{Goals}
200
201 Here are some of the high-level goals which motivate our improvements to
202 the module system.
203
204 \begin{itemize}
205 \item Solve \emph{Cabal hell}, a situation which occurs when conflicting
206 version ranges on a wide range of dependencies leads to a situation
207 where it is impossible to satisfy the constraints. We're seeking
208 to solve this problem in two ways: first, we want to support
209 multiple instances of containers-2.9 in the database which are
210 compiled with different dependencies (and even link them
211 together), and second, we want to abolish (often inaccurate)
212 version ranges and move to a regime where packages depend on
213 signatures. Version ranges may still be used to indicate important
214 semantic changes (e.g., bugs or bad behavior on the part of package
215 authors), but they should no longer drive dependency resolution
216 and often only be recorded after the fact.
217
218 \item Support \emph{hermetic builds with sharing}. A hermetic build
219 system is one which simulates rebuilding every package whenever
220 it is built; on the other hand, actually rebuilding every time
221 is extremely inefficient (but what happens in practice with
222 Cabal sandboxes). We seek to solve this problem with the IHG work,
223 by allowing multiple instances of a package in the database, where
224 the only difference is compilation parameters. We don't care
225 about being able to link these together in a single program.
226
227 \item Support \emph{module-level pluggability} as an alternative to
228 existing (poor) usage of type classes. The canonical example are
229 strings, where a library might want to either use the convenient
230 but inefficient native strings, or the efficient packed Text data
231 type, but would really like to avoid having to say \verb|StringLike s => ...|
232 in all of their type signatures. While we do not plan on supporting
233 separate compilation, Cabal should understand how to automatically
234 recompile these ``indefinite'' packages when they are instantiated
235 with a new plugin.
236
237 \item Support \emph{separate modular development}, where a library and
238 an application using the library can be developed and type-checked
239 separately, intermediated by an interface. The canonical example
240 here is the \verb|ghc-api|, which is a large, complex API that
241 the library writers (GHC developers) frequently change---the ability
242 for downstream projects to say, ``Here is the API I'm relying on''
243 without requiring these projects to actually be built would greatly
244 assist in keeping the API stable. This is applicable in
245 the pluggability example as well, where we want to ensure that all
246 of the $M \times N$ configurations of libraries versus applications
247 type check, by only running the typechecker $M + N$ times. A closely
248 related concern is related toolchain support for extracting a signature
249 from an existing implementation, as current Haskell culture is averse
250 to explicitly writing separate signature files.
251
252 \item Subsume existing support for \emph{mutually recursive modules},
253 without the double vision problem.
254 \end{itemize}
255
256 A \emph{non-goal} is to allow users to upgrade upstream libraries
257 without recompiling downstream. This is an ABI concern and we're not
258 going to worry about it.
259
260 \clearpage
261
262 \section{Module and package identity}
263
264 \begin{figure}[H]
265 \begin{tabular}{p{0.45\textwidth} p{0.45\textwidth}}
266 \begin{verbatim}
267 package p where
268 A :: [ data X ]
269 P = [ import A; data Y = Y X ]
270 package q where
271 A1 = [ data X = X1 ]
272 A2 = [ data X = X2 ]
273 include p (A as A1, P as P1)
274 include p (A as A2, P as P2)
275 \end{verbatim}
276 &
277 \begin{verbatim}
278 package p where
279 A :: [ data X ]
280 P = [ data T = T ] -- no A import!
281 package q where
282 A1 = [ data X = X1 ]
283 A2 = [ data X = X2 ]
284 include p (A as A1, P as P1)
285 include p (A as A2, P as P2)
286 \end{verbatim}
287 \\
288 (a) Type equality must consider holes\ldots &
289 (b) \ldots but how do we track dependencies? \\
290 \end{tabular}
291 \caption{Two similar examples}\label{fig:simple-ex}
292 \end{figure}
293
294 One of the central questions one encounters when type checking Haskell
295 code is: \emph{when are two types equal}? In ordinary Haskell, the
296 answer is simple: ``They are equal if their \emph{original names} (i.e.,
297 where they were originally defined) are the same.'' However, in
298 Backpack, the situation is murkier due to the presence of \emph{holes}.
299 Consider the pair of examples in Figure~\ref{fig:simple-ex}.
300 In Figure~\ref{fig:simple-ex}a, the types \m{B1}.\verb|Y| and \m{B2}.\verb|Y| should not be
301 considered equal, even though na\"\i vely their original names are
302 \pname{p}:\m{B}.\verb|Y|, since their arguments are different \verb|X|'s!
303 On the other hand, if we instantiated \pname{p} twice with the same \m{A}
304 (e.g., change the second include to \texttt{include p (A as A1, P as P2)}),
305 we might consider the two resulting \verb|Y|'s
306 equal, an \emph{applicative} semantics of identity instantiation. In
307 Figure~\ref{fig:simple-ex}b, we see that even though \m{A} was instantiated differently,
308 we might reasonably wonder if \texttt{T} should still be considered the same,
309 since it has no dependence on the actual choice of \m{A}.
310
311 In fact, there are quite a few different choices that can be made here.
312 Figures~\ref{fig:applicativity}~and~\ref{fig:granularity} summarize the various
313 choices on two axes: the granularity of applicativity (under what circumstances
314 do we consider two types equal) and the granularity of dependency (what circumstances
315 do we consider two types not equal)? A \ding{52} means the design we have chosen
316 answers the question affirmatively---\ding{54}, negatively---but all of these choices
317 are valid points on the design space.
318
319 \subsection{The granularity of applicativity}
320
321 An applicative semantics of package instantiation states that if a package is
322 instantiated with the ``same arguments'', then the resulting entities it defines
323 should also be considered equal. Because Backpack uses \emph{mix-in modules},
324 it is very natural to consider the arguments of a package instantiation as the
325 modules, as shown in Figure~\ref{fig:applicativity}b: the same module \m{A} is
326 linked for both instantiations, so \m{P1} and \m{P2} are considered equal.
327
328 However, we consider the situation at a finer granularity, we might say, ``Well,
329 for a declaration \texttt{data Y = Y X}, only the definition of type \verb|X| matters.
330 If they are the same, then \verb|Y| is the same.'' In that case, we might accept
331 that in Figure~\ref{fig:applicativity}a, even though \pname{p} is instantiated
332 with different modules, at the end of the day, the important component \verb|X| is
333 the same in both cases, so \verb|Y| should also be the same. This is a sort of
334 ``extreme'' view of modular development, where every declaration is desugared
335 into a separate module. In our design, we will be a bit more conservative, and
336 continue with module level applicativity, in the same manner as Paper Backpack.
337
338 \paragraph{Implementation considerations}
339 Compiling Figure~\ref{fig:applicativity}b to dynamic libraries poses an
340 interesting challenge, if every package compiles to a dynamic library.
341 When we compile package \pname{q}, the libraries we end up producing are \pname{q}
342 and an instance of \pname{p} (instantiated with \pname{q}:\m{A}). Furthermore,
343 \pname{q} refers to code in \pname{p} (the import in \m{Q}), and vice versa (the usage
344 of the instantiated hole \m{A}). When building static libraries, this circular
345 dependency doesn't matter: when we link the executable, we can resolve all
346 of the symbols in one go. However, when the libraries in question are
347 dynamic libraries \verb|libHSq.so| and \verb|libHSp(q:A).so|, we now have
348 a \emph{circular dependency} between the two dynamic libraries, and most dynamic
349 linkers will not be able to load either of these libraries.
350
351 To break the circularity in Figure~\ref{fig:applicativity}b, we have to \emph{inline}
352 the entire module \m{A} into the instance of \pname{p}. Since the code is exactly
353 the same, we can still consider the instance of \m{A} in \pname{q} and in \pname{p}
354 type equal. However, in Figure~\ref{fig:applicativity}c, applicativity has been
355 done at a coarser level: although we are using Backpack's module mixin syntax,
356 morally, this example is filling in the holes with the \emph{package} \pname{a}
357 (rather than a module). In this case, we can achieve code sharing, since
358 \pname{p} can refer directly to \pname{a}, breaking the circularity.
359
360 \newcolumntype{C}{>{\centering\arraybackslash}p{0.3\textwidth}}
361 \begin{savenotes}
362 \begin{figure}
363 \begin{tabular}{C C C}
364 \begin{verbatim}
365 package q where
366 A = [ data X = X ]
367 A1 = [ import A; x = 0 ]
368 A2 = [ import A; x = 1 ]
369 include p (A as A1, P as P1)
370 include p (A as A2, P as P2)
371 Q = [ import P1; ... ]
372 \end{verbatim}
373 &
374 \begin{verbatim}
375 package q where
376 A = [ data X = X ]
377
378
379 include p (A, P as P1)
380 include p (A, P as P2)
381 Q = [ import P1; ... ]
382 \end{verbatim}
383 &
384 \begin{verbatim}
385 package a where
386 A = [ data X = X ]
387 package q where
388 include a
389 include p (A, P as P1)
390 include p (A, P as P2)
391 Q = [ import P1; ... ]
392 \end{verbatim}
393 \\
394 (a) Declaration applicativity \ding{54} &
395 (b) Module applicativity \ding{52} &
396 (c) Package applicativity \ding{52} \\
397 \end{tabular}
398 \caption{Choices of granularity of applicativity on \pname{p}: given \texttt{data Y = Y X}, is \m{P1}.\texttt{Y} equal to \m{P2}.\texttt{Y}?}\label{fig:applicativity}
399 \end{figure}
400 \end{savenotes}
401
402 \subsection{The granularity of dependency}
403
404 \begin{savenotes}
405 \newcolumntype{C}{>{\centering\arraybackslash}p{0.3\textwidth}}
406 \begin{figure}
407 \begin{tabular}{C C C}
408 \begin{verbatim}
409 package p(A,P) where
410 A :: [ data X ]
411 P = [
412 import A
413 data T = T
414 data Y = Y X
415 ]
416 \end{verbatim}
417 &
418 \begin{verbatim}
419 package p(A,P) where
420 A :: [ data X ]
421 B = [ data T = T ]
422 C = [
423 import A
424 data Y = Y X
425 ]
426 P = [
427 import B
428 import C
429 ]
430 \end{verbatim}
431 &
432 \begin{verbatim}
433 package b where
434 B = [ data T = T ]
435 package c where
436 A :: [ data X ]
437 C = [
438 import A
439 data Y = Y X
440 ]
441 package p(A,P) where
442 include b; include c
443 P = [ import B; import C ]
444 \end{verbatim}
445 \\
446 (a) Declaration granularity \ding{54} &
447 (b) Module granularity \ding{54} &
448 (c) Package granularity \ding{52} \\
449 \end{tabular}
450 \caption{Choices of granularity for dependency: is the identity of \texttt{T} independent of how \m{A} is instantiated?}\label{fig:granularity}
451 \end{figure}
452
453 \end{savenotes}
454
455 In the previous section, we considered \emph{what} entities may be considered for
456 computing dependency; in this section we consider \emph{which} entities are actually
457 considered as part of the dependencies for the declaration/module/package we're writing.
458 Figure~\ref{fig:granularity} contains a series of examples which exemplify the choice
459 of whether or not to collect dependencies on a per-declaration, per-module or per-package basis:
460
461 \begin{itemize}
462 \item Package-level granularity states that the modules in a package are
463 considered to depend on \emph{all} of the holes in the package, even if
464 the hole is never imported. Figure~\ref{fig:granularity}c is factored so that
465 \verb|T| is defined in a distinct package \pname{b} with no holes, so no matter
466 the choice of \m{A}, \m{B}.\verb|T| will be the same. On the other hand, in
467 Figure~\ref{fig:granularity}b, there is a hole in the package defining \m{B},
468 so the identity of \verb|T| will depend on the choice of \m{A}.
469
470 \item Module-level granularity states that each module has its own dependency,
471 computed by looking at its import statements. In this setting, \verb|T| in Figure~\ref{fig:granularity}b
472 is independent of \m{A}, since the hole is never imported in \m{B}. But once again, in
473 Figure~\ref{fig:granularity}a, there is an import in the module defining \verb|T|,
474 so the identity of \verb|T| once again depends on the choice of \m{A}.
475
476 \item Finally, at the finest level of granularity, one could chop up \pname{p} in
477 Figure~\ref{fig:granularity}a, looking at the type declaration-level dependency
478 to suss out whether or not \verb|T| depends on \m{A}. It doesn't refer to
479 anything in \m{A}, so it is always considered the same.
480 \end{itemize}
481
482 It is well worth noting that the system described by Paper Backpack tracks dependencies per module;
483 however, we have decided that we will implement tracking per package instead:
484 a coarser grained granularity which accepts less programs.
485
486 Is a finer form of granularity \emph{better?} Not necessarily! For
487 one, we can always split packages into further subpackages (as was done
488 in Figure~\ref{fig:granularity}c) which better reflect the internal hole
489 dependencies, so it is always possible to rewrite a program to make it
490 typecheck---just with more packages. Additionally, the finer the
491 granularity of dependency, the more work I have to do to understand what
492 the identities of entities in a module are. In Paper Backpack, I have
493 to understand the imports of all modules in a package; with
494 declaration-granularity, I have to understand the entire code. This is
495 a lot of work for the developer to think about; a more granular model is
496 easier to remember and reason about. Finally, package-level granularity
497 is much easier to implement, as it preserves the previous compilation
498 model, \emph{one library per package}. At a fine level of granularity, we
499 may end up repeatedly compiling a module which actually should be considered
500 ``the same'' as any other instance of it.
501
502 Nevertheless, finer granularity can be desirable from an end-user perspective.
503 Usually, these circumstances arise when library-writers are forced to split their
504 components into many separate packages, when they would much rather have written
505 a single package. For example, if I define a data type in my library, and would
506 like to define a \verb|Lens| instance for it, I would create a new package just
507 for the instance, in order to avoid saddling users who aren't interested in lenses
508 with an extra dependency. Another example is test suites, which have dependencies
509 on various test frameworks that a user won't care about if they are not planning
510 on testing the code. (Cabal has a special case for this, allowing the user
511 to write effectively multiple packages in a single Cabal file.)
512
513 \subsection{Summary}
514
515 We can summarize all of the various schemes by describing the internal data
516 types that would be defined by GHC under each regime. First, we have
517 the shared data structures, which correspond closely to what users are
518 used to seeing:
519
520 \begin{verbatim}
521 <pkg-name> ::= containers, ...
522 <pkg-version ::= 1.0, ...
523 <pkg-id> ::= <pkg-name>-<pkg-version>
524 <mod-name> ::= Data.Set, ...
525 <occ> ::= empty, ...
526 \end{verbatim}
527
528 Changing the \textbf{granularity of applicativity} modifies how we represent the
529 list of dependencies associated with an entity. With module applicativity,
530 we list module identities (not yet defined); with declaration applicativity
531 we actually list the original names (i.e., ids).
532
533 \begin{verbatim}
534 <deps> ::= <id>, ... # Declaration applicativity
535 <deps> ::= <module>, ... # Module applicativity
536 \end{verbatim}
537
538 Changing the \textbf{granularity of dependency} affects how we compute
539 the lists of dependencies, and what entities are well defined:
540
541 \begin{verbatim}
542 # Package-level granularity
543 <pkg-key> ::= hash(<pkg-id> + <deps for pkg>)
544 <module> ::= <pkg-key> : <mod-name>
545 <id> ::= <module> . <occ>
546
547 # Module-level granularity
548 <pkg-key> not defined
549 <module> ::= hash(<pkg-id> : <mod-name> + <deps for mod>)
550 <id> ::= <module-key> . <occ>
551
552 # Declaration-level granularity
553 <pkg-key> not defined
554 <module> not defined
555 <id> ::= hash(<pkg-id> : <mod-name> . <occ> + <deps for decl>)
556 \end{verbatim}
557
558 Notice that as we increase the granularity, the notion of a ``package'' and a ``module''
559 become undefined. This is because, for example, with module-level granularity, a single
560 ``package'' may result in several modules, each of which have different sets of
561 dependencies. It doesn't make much sense to refer to the package as a monolithic entity,
562 because the point of splitting up the dependencies was so that if a user relies only
563 on a single module, it has a correspondingly restricted set of dependencies.
564 \subsection{The new scheme, formally}
565
566 \begin{wrapfigure}{R}{0.5\textwidth}
567 \begin{myfig}
568 \[
569 \begin{array}{@{}lr@{\;}c@{\;}l@{}}
570 \text{Package Names (\texttt{PkgName})} & P &\in& \mathit{PkgNames} \\
571 \text{Module Path Names (\texttt{ModName})} & p &\in& \mathit{ModPaths} \\
572 \text{Module Identity Vars} & \alpha,\beta &\in& \mathit{IdentVars} \\
573 \text{Package Key (\texttt{PackageKey})} & \K &::=& P(\vec{p\mapsto\nu}) \\
574 \text{Module Identities (\texttt{Module})} & \nu &::=&
575 \alpha ~|~
576 \K\colon\! p \\
577 \text{Module Identity Substs} & \phi,\theta &::=&
578 \{\vec{\alpha \coloneqq \nu}\} \\
579 \end{array}
580 \]
581 \caption{Module Identities}
582 \label{fig:mod-idents}
583 \end{myfig}
584 \end{wrapfigure}
585
586 In this section, we give a formal treatment of our choice in the design space, in the
587 same style as the Backpack paper, but omitting mutual recursion, as it follows straightforwardly.
588 Physical module
589 identities $\nu$, the \texttt{Module} component of \emph{original names} in GHC, are either (1) \emph{variables} $\alpha$, which are
590 used to represent holes\footnote{In practice, these will just be fresh paths in a special package key for variables.} or (2) a concrete module $p$ defined in package
591 $P$, with holes instantiated with other module identities (might be
592 empty)\footnote{In Paper Backpack, we would refer to just $P$:$p$ as the identity
593 constructor. However, we've written the subterms specifically next to $P$ to highlight the semantic difference of these terms.}.
594
595 As in traditional Haskell, every package contains a number of module
596 files at some module path $p$; within a package these paths are
597 guaranteed to be unique.\footnote{In Paper Backpack, the module expressions themselves are used to refer to globally unique identifiers for each literal. This makes the metatheory simpler, but for implementation purposes it is convenient to conflate the \emph{original} module path that a module is defined at with its physical identity.} When we write inline module definitions, we assume
598 that they are immediately assigned to a module path $p$ which is incorporated
599 into their identity. A module identity $\nu$ simply augments this
600 with subterms $\vec{p\mapsto\nu}$ representing how \emph{all} holes in the package $P$
601 were instantiated.\footnote{In Paper Backpack, we do not distinguish between holes/non-holes, and we consider all imports of the \emph{module}, not the package.} This naming is stable because the current Backpack surface syntax does not allow a logical path in a package
602 to be undefined. A package key is $P(\vec{p\mapsto\nu})$.
603
604 Here is the very first example from
605 Section 2 of the original Backpack paper, \pname{ab-1}:
606
607 \begin{example}
608 \Pdef{ab-1}{
609 \Pmod{A}{x = True}
610 \Pmod{B}{\Mimp{A}; y = not x}
611 % \Pmodd{C}{\mname{A}}
612 }
613 \end{example}
614
615 The identities of \m{A} and \m{B} are
616 \pname{ab-1}:\mname{A} and \pname{ab-1}:\mname{B}, respectively.\footnote{In Paper Backpack, the identity for \mname{B} records its import of \mname{A}, but since it is definite, this is strictly redundant.} In a package with holes, each
617 hole (within the package definition) gets a fresh variable as its
618 identity, and all of the holes associated with package $P$ are recorded. Consider \pname{abcd-holes-1}:
619
620 \begin{example}
621 \Pdef{abcd-holes-1}{
622 \Psig{A}{x :: Bool} % chktex 26
623 \Psig{B}{y :: Bool} % chktex 26
624 \Pmod{C}{x = False}
625 \Pmodbig{D}{
626 \Mimpq{A}\\
627 \Mimpq{C}\\
628 % \Mexp{\m{A}.x, z}\\
629 z = \m{A}.x \&\& \m{C}.x
630 }
631 }
632 \end{example}
633
634 The identities of the four modules
635 are, in order, $\alpha_a$, $\alpha_b$, $\pname{abcd-holes-1}(\alpha_a,\alpha_b)$:\mname{C}, and
636 $\pname{abcd-holes-1}(\alpha_a,\alpha_b)$:\mname{D}.\footnote{In Paper Backpack, the granularity is at the module level, so the subterms of \mname{C} and \mname{D} can differ.} We include both $\alpha_a$ and $\alpha_b$ in both \mname{C} and \mname{D}, regardless of the imports. When we link the package against an implementation of the hole, these variables are replaced with the identities of the modules we linked against.
637
638 Shaping proceeds in the same way as in Paper Backpack, except that the
639 shaping judgment must also accept the package key
640 $P(\vec{p\mapsto\alpha})$ so we can create identifiers with
641 \textsf{mkident}. This implies we must know ahead of time what the holes
642 of a package are.
643
644 \paragraph{A full Backpack comparison}
645 If you're curious about how the rest of the Backpack examples translate,
646 look no further than this section.
647
648 First, consider the module identities in the \m{Graph} instantiations in
649 \pname{multinst}, shown in Figure 2 of the original Backpack paper.
650 In the definition of \pname{structures}, assume that the variables for
651 \m{Prelude} and \m{Array} are $\alpha_P$ and $\alpha_A$ respectively.
652 The identity of \m{Graph} is $\pname{structures}(\alpha_P, \alpha_A)$:\m{Graph}. Similarly, the identities of the two array implementations
653 are $\nu_{AA} = \pname{arrays-a}(\alpha_P)$:\m{Array} and
654 $\nu_{AB} = \pname{arrays-b}(\alpha_P)$:\m{Array}.\footnote{Notice that the subterms coincide with Paper Backpack! A sign that module level granularity is not necessary for many use-cases.}
655
656 The package \pname{graph-a} is more interesting because it
657 \emph{links} the packages \pname{arrays-a} and \pname{structures}
658 together, with the implementation of \m{Array} from \pname{arrays-a}
659 \emph{instantiating} the hole \m{Array} from \pname{structures}. This
660 linking is reflected in the identity of the \m{Graph} module in
661 \pname{graph-a}: whereas in \pname{structures} it was $\nu_G =
662 \pname{structures}(\alpha_P, \alpha_A)$:\m{Graph}, in \pname{graph-a} it is
663 $\nu_{GA} = \nu_G[\nu_{AA}/\alpha_A] = \pname{structures}(\alpha_P, \nu_{AA})$:\m{Graph}. Similarly, the identity of \m{Graph} in
664 \pname{graph-b} is $\nu_{GB} = \nu_G[\nu_{AB}/\alpha_A] =
665 \pname{structures}(\alpha_P, \nu_{AB})$:\m{Graph}. Thus, linking consists
666 of substituting the variable identity of a hole by the concrete
667 identity of the module filling that hole.
668
669 Lastly, \pname{multinst} makes use of both of these \m{Graph}
670 modules, under the aliases \m{GA} and \m{GB}, respectively.
671 Consequently, in the \m{Client} module, \code{\m{GA}.G} and
672 \code{\m{GB}.G} will be correctly viewed as distinct types since they
673 originate in modules with distinct identities.
674
675 As \pname{multinst} illustrates, module identities effectively encode
676 dependency graphs at the package level.\footnote{In Paper Backpack, module identities
677 encode dependency graphs at the module level. In both cases, however, what is being
678 depended on is always a module.} Like in Paper Backpack, we have an \emph{applicative}
679 semantics of instantiation, and the applicativity example in Figure 3 of the
680 Backpack paper still type checks. However, because we are operating at a coarser
681 granularity, modules may have spurious dependencies on holes that they don't
682 actually depend on, which means less type equalities may hold.
683
684
685 \subsection{Cabal dependency resolution}
686
687 Currently, when we compile a Cabal
688 package, Cabal goes ahead and resolves \verb|build-depends| entries with actual
689 implementations, which we compile against. A planned addition to the package key,
690 independent of Backpack, is to record the transitive dependency tree selected
691 during this dependency resolution process, so that we can install \pname{libfoo-1.0}
692 twice compiled against different versions of its dependencies.
693 What is the relationship to this transitive dependency tree of \emph{packages},
694 with the subterms of our package identities which are \emph{modules}? Does one
695 subsume the other? In fact, these are separate mechanisms---two levels of indirections,
696 so to speak.
697
698 To illustrate, suppose I write a Cabal file with \verb|build-depends: foobar|. A reasonable assumption is that this translates into a
699 Backpack package which has \verb|include foobar|. However, this is not
700 actually a Paper Backpack package: Cabal's dependency solver has to
701 rewrite all of these package references into versioned references
702 \verb|include foobar-0.1|. For example, this is a pre-package:
703
704 \begin{verbatim}
705 package foo where
706 include bar
707 \end{verbatim}
708
709 and this is a Paper Backpack package:
710
711 \begin{verbatim}
712 package foo-0.3[bar-0.1[baz-0.2]] where
713 include bar-0.1[baz-0.2]
714 \end{verbatim}
715
716 This tree is very similar to the one tracking dependencies for holes,
717 but we must record this tree \emph{even} when our package has no holes.
718 % As a final example, the full module
719 % identity of \m{B1} in Figure~\ref{fig:granularity} may actually be $\pname{p-0.9(q-1.0[p-0.9]:A1)}$:\m{B}.
720
721
722 \subsection{Implementation}
723
724 In GHC's current packaging system, a single package compiles into a
725 single entry in the installed package database, indexed by the package
726 key. This property is preserved by package-level granularity, as we
727 assign the same package key to all modules. Package keys provide an
728 easy mechanism for sharing to occur: when an indefinite package is fully
729 instantiated, we can check if we already have its package key installed
730 in the installed package database. (At the end of this section, we'll
731 briefly discuss some of the problems actually implementing Paper Backpack.)
732 It is also important to note that we are \emph{willing to duplicate code};
733 processes like this already happen in other parts of the compiler
734 (such as inlining.)
735
736 \paragraph{Relaxing package selection restrictions} As mentioned
737 previously, GHC is unable to select multiple packages with the same
738 package name (but different package keys). This restriction needs to be
739 lifted. We should add a new flag \verb|-package-key|. GHC also knows
740 about version numbers and will mask out old versions of a library when
741 you make another version visible; this behavior needs to be modified.
742
743 \paragraph{Linker symbols} As we increase the amount of information in
744 PackageId, it's important to be careful about the length of these IDs,
745 as they are used for exported linker symbols (e.g.
746 \verb|base_TextziReadziLex_zdwvalDig_info|). Very long symbol names
747 hurt compile and link time, object file sizes, GHCi startup time,
748 dynamic linking, and make gdb hard to use. As such, the current plan is
749 to do away with full package names and versions, and instead use just a
750 base-62 encoded hash, perhaps with the first four characters of the package
751 name for user-friendliness.
752
753 \section{Shapeless Backpack}\label{sec:simplifying-backpack}
754
755 Backpack as currently defined always requires a \emph{shaping} pass,
756 which calculates the shapes of all modules defined in a package.
757 The shaping pass is critical to the solution of the double-vision problem
758 in recursive module linking, but it also presents a number of unpalatable
759 implementation problems:
760
761 \begin{itemize}
762
763 \item \emph{Shaping is a lot of work.} A module shape specifies the
764 providence of all data types and identifiers defined by a
765 module. To calculate this, we must preprocess and parse all
766 modules, even before we do the type-checking pass. (Fortunately,
767 shaping doesn't require a full parse of a module, only enough
768 to get identifiers. However, it does have to understand import
769 statements at the same level of detail as GHC's renamer.)
770
771 \item \emph{Shaping must be done upfront.} In the current Backpack
772 design, all shapes must be computed before any typechecking can
773 occur. While performing the shaping pass upfront is necessary
774 in order to solve the double vision problem (where a module
775 identity may be influenced by later definitions), it means
776 that GHC must first do a shaping pass, and then revisit every module and
777 compile them proper. Nor is it (easily) possible to skip the
778 shaping pass when it is unnecessary, as one might expect to be
779 the case in the absence of mutual recursion. Shaping is not
780 a ``pay as you go'' language feature.
781
782 \item \emph{GHC can't compile all programs shaping accepts.} Shaping
783 accepts programs that GHC, with its current hs-boot mechanism, cannot
784 compile. In particular, GHC requires that any data type or function
785 in a signature actually be \emph{defined} in the module corresponding
786 to that file (i.e., an original name can be assigned to these entities
787 immediately.) Shaping permits unrestricted exports to implement
788 modules; this shows up in the formalism as $\beta$ module variables.
789
790 \item \emph{Shaping encourages inefficient program organization.}
791 Shaping is designed to enable mutually recursive modules, but as
792 currently implemented, mutual recursion is less efficient than
793 code without recursive dependencies. Programmers should avoid
794 this code organization, except when it is absolutely necessary.
795
796 \item \emph{GHC is architecturally ill-suited for directly
797 implementing shaping.} Shaping implies that GHC's internal
798 concept of an ``original name'' be extended to accommodate
799 module variables. This is an extremely invasive change to all
800 aspects of GHC, since the original names assumption is baked
801 quite deeply into the compiler. Plausible implementations of
802 shaping requires all these variables to be skolemized outside
803 of GHC\@.
804
805 \end{itemize}
806
807 To be clear, the shaping pass is fundamentally necessary for some
808 Backpack packages. Here is the example which convinced Simon:
809
810 \begin{verbatim}
811 package p where
812 A :: [data T; f :: T -> T]
813 B = [export T(MkT), h; import A(f); data T = MkT; h x = f MkT]
814 A = [export T(MkT), f, h; import B; f MkT = MkT]
815 \end{verbatim}
816
817 The key to this example is that B \emph{may or may not typecheck} depending
818 on the definition of A. Because A reexports B's definition T, B will
819 typecheck; but if A defined T on its own, B would not typecheck. Thus,
820 we \emph{cannot} typecheck B until we have done some analysis of A (the
821 shaping analysis!)
822
823 Thus, it is beneficial (from an optimization point of view) to
824 consider a subset of Backpack for which shaping is not necessary.
825 Here is a programming discipline which does just that, which we will call the \textbf{linking restriction}: \emph{Module implementations must be declared before
826 signatures.} Formally, this restriction modifies the rule for merging
827 polarized module shapes ($\widetilde{\tau}_1^{m_1} \oplus \widetilde{\tau}_2^{m_2}$) so that
828 $\widetilde{\tau}_1^- \oplus \widetilde{\tau}_2^+$ is always undefined.\footnote{This seemed to be the crispest way of defining the restriction, although this means an error happens a bit later than I'd like it to: I'd prefer if we errored while merging logical contexts, but we don't know what is a hole at that point.}
829
830 Here is an example of the linking restriction. Consider these two packages:
831
832 \begin{verbatim}
833 package random where
834 System.Random = [ ... ].hs
835
836 package monte-carlo where
837 System.Random :: ...
838 System.MonteCarlo = [ ... ].hs
839 \end{verbatim}
840
841 Here, random is a definite package which may have been compiled ahead
842 of time; monte-carlo is an indefinite package with a dependency
843 on any package which provides \verb|System.Random|.
844
845 Now, to link these two applications together, only one ordering
846 is permissible:
847
848 \begin{verbatim}
849 package myapp where
850 include random
851 include monte-carlo
852 \end{verbatim}
853
854 If myapp wants to provide its own random implementation, it can do so:
855
856 \begin{verbatim}
857 package myapp2 where
858 System.Random = [ ... ].hs
859 include monte-carlo
860 \end{verbatim}
861
862 In both cases, all of \verb|monte-carlo|'s holes have been filled in by the time
863 it is included. The alternate ordering is not allowed.
864
865 Why does this discipline prevent mutually recursive modules? Intuitively,
866 a hole is the mechanism by which we can refer to an implementation
867 before it is defined; otherwise, we can only refer to definitions which
868 preceed our definition. If there are never any holes \emph{which get filled},
869 implementation links can only go backwards, ruling out circularity.
870
871 It's easy to see how mutual recursion can occur if we break this discipline:
872
873 \begin{verbatim}
874 package myapp2 where
875 include monte-carlo
876 System.Random = [ import System.MonteCarlo ].hs
877 \end{verbatim}
878
879 \subsection{Typechecking of definite modules without shaping}
880
881 If we are not carrying out a shaping pass, we need to be able to calculate
882 $\widetilde{\Xi}_{\mathsf{pkg}}$ on the fly. In the case that we are
883 compiling a package---there will be no holes in the final package---we
884 can show that shaping is unnecessary quite easily, since with the
885 linking restriction, everything is definite from the get-go.
886
887 Observe the following invariant: at any given step of the module
888 bindings, the physical context $\widetilde{\Phi}$ contains no
889 holes. We can thus conclude that there are no module variables in any
890 type shapes. As the only time a previously calculated package shape can
891 change is due to unification, the incrementally computed shape is in
892 fact the true one.
893
894 As far as the implementation is concerned, we never have to worry
895 about handling module variables; we only need to do extra typechecks
896 against (renamed) interface files.
897
898 \subsection{Compiling definite packages}\label{sec:compiling}
899
900 % New definitions
901 \algnewcommand\algorithmicswitch{\textbf{switch}}
902 \algnewcommand\algorithmiccase{\textbf{case}}
903 \algnewcommand\algorithmicassert{\texttt{assert}}
904 % New "environments"
905 \algdef{SE}[SWITCH]{Switch}{EndSwitch}[1]{\algorithmicswitch\ #1\ \algorithmicdo}{\algorithmicend\ \algorithmicswitch}%
906 \algdef{SE}[CASE]{Case}{EndCase}[1]{\algorithmiccase\ ``#1''}{\algorithmicend\ \algorithmiccase}%
907 \algtext*{EndSwitch}%
908 \algtext*{EndCase}%
909
910 \begin{algorithm}
911 \caption{Compilation of definite packages (assume \texttt{-hide-all-packages} on all \texttt{ghc} invocations)}\label{alg:compile}
912 \begin{algorithmic}
913 \Procedure{Compile}{\textbf{package} $P$ \textbf{where} $\vec{B}$, $flags_H$}
914 \State$flags\gets \emptyset$
915 \State$\mathcal{K}$ $\gets$ \Call{Hash}{$P + \textsc{HoleInsts}(\vec{B}, flags_H)$}
916 \State$cwd\gets \textrm{fresh working directory for compilation}$
917 \For{$B$ \textbf{in} $\vec{B}$}
918 \Case{$p$ $\cc$ $p$\texttt{.hsig}}
919 \State\Call{Exec}{\texttt{ghc -sig} $p$\texttt{.hsig} $flags$ \texttt{-hole-flags} ``$flags_H$''}
920 \EndCase%
921 \Case{$p = p\texttt{.hs}$}
922 \State\Call{Exec}{\texttt{ghc -c} $p$\texttt{.hs} \texttt{-package-name} $\mathcal{K}$ $flags$}
923 \Comment{}Nota bene\@: no $flags_H$!
924 \EndCase%
925 \Case{$p = p'$}
926 \State\Call{Exec}{\texttt{ghc -alias} $p$ $p'$}
927 \EndCase%
928 \Case{\Cinc{$P'$}}
929 \State$\mathcal{K}'\gets$ \Call{Compile}{$P'$, $flags_H$ $flags$ \texttt{-package-loc} $(\mathcal{K}\to cwd)$ \texttt{-package} $\mathcal{K}$}
930 \State$flags\gets flags$ \texttt{-package} $\mathcal{K}'$
931 \State\Call{Exec}{\texttt{ghc -check} $flags$}
932 \EndCase%
933 \Case{\Cinc{$P'$} $\langle \vec{p_H\mapsto p'_H}, \vec{p\mapsto p'} \rangle$}
934 \State$\mathcal{K}'\gets$ \Call{Compile}{$P'$, $flags_H$ $flags$ \texttt{-package-loc} $(\mathcal{K}\to cwd)$ \texttt{-reset-module-env} $\vec{p_H\mapsto p'_H}$}
935 \State$flags\gets flags$ \texttt{-package} $\mathcal{K}'$ $\langle\vec{p\mapsto p'}\rangle$
936 \State\Call{Exec}{\texttt{ghc -check} $flags$}
937 \EndCase%
938 \EndFor%
939 \State\Call{Exec}{\texttt{ghc-pkg copy \&\& ghc-pkg register}} \Comment{Records if $P$ exports a thinning}
940 \State\Return$\mathcal{K}$
941 \EndProcedure%
942 \end{algorithmic}
943 \end{algorithm}
944
945 The full recursive procedure for compiling a Backpack package is given
946 in Figure~\ref{alg:compile}. We recursively walk through Backpack descriptions,
947 processing each line by invoking GHC and/or modifying our package state.
948 Here is a more in-depth description of the algorith, line-by-line:
949
950 \paragraph{The parameters} To compile a package description for package
951 $P$, we need to know a list of $flags_H$, which are flags that are
952 passed to \texttt{ghc} to let it know where to find the instantiated
953 holes. Logically, these flags determine a mapping from each hole's
954 logical name $p_H$ to a physical module identity $\nu$; we accordingly
955 use these flags at the very beginning of compilation to calculate the
956 package key $\mathcal{K}$ of the package being compiled, using the
957 pre-pass \textsc{HoleInsts}, which finds all of the holes defined in the
958 package and looks up what would have been selected for each of them
959 based on $flags_H$. Why don't we pass the actual mapping? It's more
960 convenient to use flags, because the flags may be much more compact than
961 a full mapping. (Of course, for the package key, we have to compute the
962 true mapping, because the flags may contain spurious dependencies.) We
963 distinguish hole flags ($flags_H$), which are passed in as arguments,
964 from flags ($flags$), which are accumulated as we include packages,
965 although the reason for this distinction will not be clear until we
966 consider the commands themselves.
967
968 \subsection{Compiling signatures}
969
970 Signature compilation uses a new compilation mode \verb|-sig|. This
971 mode is similar to the behavior of how we process \verb|hs-boot| files,
972 but with one difference: we first check the environment as specified by
973 $flags$ and $flags_H$ to see if there is already an \verb|hi| file
974 (i.e., from an implementation) with the same logical name of this
975 signature, e.g., as would occur compiling the marked signature in this
976 example:
977
978 \begin{verbatim}
979 package p where
980 A :: ... -- *
981 B = [ import A; ... ]
982 package q where
983 A = [ ... ]
984 include p
985 \end{verbatim}
986
987 If there is, we check the implementation to ensure that it is compatible
988 with the signature. If the implementation was found in $flags_H$, we
989 also output a \texttt{hisig} file which, for all declarations the
990 signature exposes, forwards their definitions to the original
991 implementation file. The intent is that any code in the current package
992 which compiles against this signature will use this \texttt{hisig} file,
993 not the original one \texttt{hi} file.
994
995 \paragraph{Sometimes \texttt{hisig} is unnecessary}
996 In the following package:
997
998 \begin{verbatim}
999 package p where
1000 P = ...
1001 P :: ...
1002 \end{verbatim}
1003
1004 Paper Backpack specifies that we check the signature \m{P} against implementation
1005 \m{P}, but otherwise no changes are made (i.e., the signature does not narrow
1006 the implementation.) In this case, no \texttt{hisig} file is not necessary.
1007
1008 \paragraph{Absence of an \texttt{hi} file}
1009 By default, if we find an appropriate \texttt{hi} file, we'll use it
1010 (even if there are other \texttt{hisig} files in the search path), but
1011 sometimes there won't be any immediate \texttt{hi}, as in this case:
1012
1013 \begin{verbatim}
1014 package p where
1015 P :: [ x :: Int ]
1016 A = P
1017 package q where
1018 P :: [ x :: Int ]
1019 include p
1020 A :: [ x :: Int, y :: Int ] -- *
1021 package r where
1022 P = [ x = 1; y = 1 ]
1023 include q
1024 \end{verbatim}
1025
1026 When compiling the marked signature, we have $flags_H =
1027 \texttt{-package-loc}\ (\pname{r} \to cwd_r)$ and $flags =
1028 \texttt{-package}$ \pname{p$(\pname{m} \to \pname{r}$:$\m{P})$}. The
1029 only interface files for \m{A} in these two environments is the
1030 \texttt{hisig} from package \pname{p}. However, if we use that
1031 \texttt{hisig}, we will falsely conclude that \m{A} is not sufficiently
1032 implemented, when in fact it is. The answer to this conundrum? When
1033 only \texttt{hisig} files are found, we lookup the \emph{original}
1034 \texttt{hi} file, and typecheck against that. An invariant that should
1035 hold is that all such \texttt{hisig} files should have the same original
1036 \texttt{hi} file (if this invariant is violated, then we attempt to link
1037 two physical module implementations together, which is a type error!)
1038
1039 \paragraph{Imports are resolved using only $flags$}
1040 Another important detail is that $flags_H$ is \emph{only} used to tell
1041 if there is already a module with the logical name: if the signature
1042 file contains an import statement, this lookup should be done with
1043 respect to just $flags$. As an example:
1044
1045 \begin{verbatim}
1046 package p where
1047 A :: [ x :: Int ]
1048 B :: [ import A; y :: Int ]
1049 C = [ import B; z = y ]
1050 package q where
1051 A = [ x = 0; y = 0 ]
1052 B = [ y = 1 ]
1053 include p
1054 \end{verbatim}
1055
1056 If we processed the import in \m{B} using $flags_H$, we might
1057 accidentally pick up the implementation of \m{A} from package \pname{q},
1058 where we actually wanted to use the forwarding signature built when
1059 processing signature \m{A}. Notice that this is opposite of what occurs
1060 when we are looking for an implementation to check consistency against!
1061 These imports work the same way as resolving imports for compiled implementations;
1062 in particular, see an important detail in the next section.
1063
1064 \paragraph{Multiple signatures} As a simplification, we assume that there
1065 is only one signature per logical name in a package. (This prevents
1066 us from expressing mutual recursion in signatures, but let's not worry
1067 about it for now.)
1068
1069 \paragraph{Restricted recursive modules ala hs-boot}\label{sec:hs-boot-restrict}
1070 It should be possible to support GHC-style mutual recursion using the
1071 Backpack formalism immediately using hs-boot files. However, to avoid
1072 the need for a shaping pass, we must adopt an existing constraint that
1073 already applies to hs-boot files: \emph{at the time we define a signature,
1074 we must know what the original name for all data types is}. In practice,
1075 GHC enforces this by stating that: (1) an hs-boot file must be
1076 accompanied with an implementation, and (2) the implementation must
1077 in fact define (and not reexport) all of the declarations in the signature.
1078 We can discover if a signature is intended to break a recursive module loop
1079 when we discover that $p\notin flags_H$; in this case, we fallback to the
1080 old hs-boot behavior. (Alternatively, the user can explicitly ask for it.)
1081
1082 Why does this not require a shaping pass? The reason is that the
1083 signature is not really polymorphic: we require that the $\alpha$ module
1084 variable be resolved to a concrete module later in the same package, and
1085 that all the $\beta$ module variables be unified with $\alpha$. Thus, we
1086 know ahead of time the original names and don't need to deal with any
1087 renaming.\footnote{This strategy doesn't completely resolve the problem
1088 of cross-package mutual recursion, because we need to first compile a
1089 bit of the first package (signatures), then the second package, and then
1090 the rest of the first package.}
1091
1092 Compiling packages in this way gives the tantalizing possibility
1093 of true separate compilation: the only thing we don't know is what the actual
1094 package name of an indefinite package will be, and what the correct references
1095 to have are. This is a very minor change to the assembly, so one could conceive
1096 of dynamically rewriting these references at the linking stage. But
1097 separate compilation achieved in this fashion would not be able to take
1098 advantage of cross-module optimizations.
1099
1100 \subsection{Compiling implementations}
1101
1102 Compiling a module is the same as usual; however, we \emph{omit}
1103 $flags_H$ from the environment. This is to force compilation to use any
1104 \emph{locally} compiled \texttt{hisig} files from signature compilation,
1105 so we see an appropriately thinned version of the module.
1106
1107 \paragraph{Multiple signatures} When there are multiple signatures
1108 defined for a logical name, we must merge them together. Although we
1109 assume that a single package doesn't have multiple signatures, we could
1110 bring in a signature from an include:
1111
1112 \begin{verbatim}
1113 package p where
1114 A :: [ x :: Int ]
1115 package q
1116 include p
1117 A :: [ y :: Int ]
1118 B = [ import A; z = x + y ] -- *
1119 package r where
1120 A = [ x = 0; y = 0 ]
1121 include q
1122 \end{verbatim}
1123
1124 In the compilation of \texttt{B}, there is no one \texttt{hisig} found
1125 in $flags$ (remember that \pname{r}:\m{A} is not findable from $flags$!)
1126 which provides all the necessary identifiers. Instead, we have to look
1127 in both \texttt{hisig} files. Internally, when we lookup the module, we
1128 discover there are distinct interface files for the module; however,
1129 they point to the same original interface file, so the module finder
1130 loads all of them, merges them together, and uses that as the interface.
1131
1132 An alternate scheme is to perform signature merging in the \verb|-check| step,
1133 and have some way of telling what the ``latest'' \verb|hi|/\verb|hisig| file is.
1134
1135 \subsection{Compiling aliases}
1136
1137 Aliasing is a new compilation mode designated by \verb|-alias|. In
1138 general, in the alias $p = p'$, what is done will vary depending on how
1139 $p$ and $p'$ are defined in the $flags$ environments.
1140 (Due to the linking restriction, both are guaranteed to be findable in
1141 one or the other.) Here, we say $p\in flags$ to mean that given the
1142 configuration of the module database with $flags$, $p$ can be found.
1143
1144 \paragraph{$p\in flags$, $p'\not\in flags$ } This is a type error, $p'$ is
1145 not findable in the environment, so there's nothing to link against. (Sure,
1146 it might be in $flags_H$, but these things are not visible in subpackages
1147 unless we declare a signature.) Example (marked line fails):
1148
1149 \begin{verbatim}
1150 package p where
1151 A :: ...
1152 A = B -- *
1153 package q where
1154 A = ...
1155 B = ...
1156 include p
1157 \end{verbatim}
1158
1159 This is a side effect of the fact that Backpack aliases are oriented;
1160 one could imagine an alternate scheme where the aliases are symmetric.
1161
1162 \paragraph{$p\not\in flags$, $p'\in flags$ } Here,
1163 $p$ is completely undefined, so we are simply defining it from $p'$, no
1164 linking occurs. If $p'$ is an \texttt{hisig}, we copy it; if it is an \texttt{hi},
1165 we create a forwarding \texttt{hisig} to it which reexports \emph{all} of
1166 its entities. Both examples are presented below:
1167
1168 \begin{verbatim}
1169 -- A.hisig available in flags
1170 package p where
1171 A :: ...
1172 B = A -- *
1173 package q where
1174 A = ...
1175 include p
1176
1177 -- A.hi available in flags
1178 package p where
1179 A = ...
1180 B = A -- *
1181 \end{verbatim}
1182
1183 Especially for the first example, we do \emph{not} consider \texttt{B} as
1184 part of the holes of package \texttt{p} (only explicit signatures count.)
1185 This means that even if $p\in flags_H$, causing a problem, we will discover
1186 it only when we go back to checking the parent package, as seen in this
1187 example:
1188
1189 \begin{verbatim}
1190 package p where
1191 A :: ...
1192 B = A -- this is fine
1193 package q where
1194 A = ...
1195 B = ...
1196 include p -- this is not (A != B)
1197 \end{verbatim}
1198
1199 \paragraph{$p\in flags$, $p'\in flags$ }
1200 This is the most interesting situation: we'll have to do some sort of linking,
1201 and it will depend on what we find in $flags$ for both modules. For all cases
1202 we need to check whatever the \texttt{hi}/\texttt{hisig} files are, their backing
1203 implementations are the same (remember the linking restriction!) as in these
1204 examples:
1205
1206 \begin{verbatim}
1207 package p where
1208 A = ...
1209 B = A
1210 B = A -- succeeds
1211 package q where
1212 A1 = ...
1213 A2 = ...
1214 B = A1
1215 B = A2 -- fails
1216 \end{verbatim}
1217
1218 Furthermore, if only an \texttt{hisig} file is available for $p$ in $flags$, Paper
1219 Backpack instructs us to extend its signature to include everything from $p'$.
1220
1221 \begin{verbatim}
1222 package p where
1223 A :: [ x :: Bool ]
1224 B :: [ y :: Bool ]
1225 B = A
1226 C = [ import B; z = x + y ] -- typechecks
1227 D = [ import A; z = x + y ] -- doesn't typecheck
1228 \end{verbatim}
1229
1230 But this might be a bit of a bother to implement if we don't have a way of dealing
1231 with multiple \texttt{hisig} files per module name\ldots
1232
1233 \subsection{Compiling includes}
1234
1235 We consider two cases separately; first the simple case with no
1236 thinning/renaming, and then the case with renaming (for compactness of
1237 presentation, we assume thinning is represented as a set of identity
1238 renamings).
1239
1240 In both cases, we need to compute a set of flags which will be
1241 sufficient to determine where the implementations of holes in the
1242 subpackage are. In the case where there is no renaming, the logical
1243 names that could be linked against are precisely the ones available in
1244 the current environment: implementations from holes, implementation
1245 which were defined from includes, and even the modules from the
1246 in-progress compilation of the current package. All of these components
1247 are incorporated into the $flags_H$ parameter of the recursive call. In
1248 particular, the new \texttt{-package-loc} lets us define a temporary
1249 mapping from a package key to a working directory to do lookups, since
1250 $\mathcal{K}$ will generally not be registered in the package database,
1251 but we still need to find the relevant \texttt{hi} files.
1252
1253 Once we've compiled the subpackage $\mathcal{K}'$, we need to do two things.
1254 First, we need to add the package to our $flags$, so that later modules we
1255 compile are able to see the modules (and signatures) which were defined.
1256 Second, we need to make sure that the subpackage is \emph{consistent} with
1257 our current environment. For example:
1258
1259 \begin{verbatim}
1260 package p where
1261 A = ...
1262 package q where
1263 A = ...
1264 include p
1265 \end{verbatim}
1266
1267 \pname{p} is a perfectly good package, but we can't include in \pname{q} because
1268 the identities of the two \m{A}'s don't match up. Of course, if they were the
1269 same \m{A}, this would be fine.
1270
1271 If we adopt the system where we merge signature files as we go along, this checking
1272 phase would also be a good time to merge any \texttt{hisig} files from the subpackage
1273 with any in the current $flags$ environment.
1274
1275 \paragraph{With renaming} Renaming poses an interesting challenge because \emph{holes}
1276 can be renamed:
1277
1278 \begin{verbatim}
1279 package p where
1280 A :: ...
1281 package q where
1282 B = ...
1283 include p (A as B)
1284 \end{verbatim}
1285
1286 This renaming makes it possible for the include statement to explicitly say how
1287 holes should be implemented, but it also means that \emph{just} stuffing the current set of flags
1288 is unlikely to work, because everything has been renamed. At the same time, we can't
1289 drop the $flags_H$, because it may contain information about how to find packages which
1290 are not registered in package database yet (\verb|-package-loc|).
1291
1292 Instead, what we do is pass the same set of flags as before, but use these package
1293 locations \emph{solely} to resolve the mapping in \verb|-reset-module-env| from holes
1294 to their implementations: this mapping replaces the old $flags_H$, except for these
1295 explicitly instantiated modules.
1296
1297 \emph{ToDo: maybe something better can be done here}
1298
1299 \subsection{Commentary}
1300
1301 \paragraph{So how the heck are \texttt{flags} processed} Previously, the purpose of
1302 \verb|-package| flags was to modify the exposed flag on entries in the in-memory package
1303 database, which itself was used to create a map from modules to all packages which
1304 defined them.
1305
1306 The existence of \verb|-hole-flags| means we need to track a second package state just for
1307 the lookup needed when we're compiling an \verb|hsig| file and want to know if it is
1308 implemented in $flags_H$. Additionally, \verb|-package-loc| and \verb|-reset-module-env|
1309 mean that not all modules in the database are necessarily in the package database. (It's worth
1310 remarking why we can't use \verb|-I|: the package key is not right, since \verb|-I| assumes
1311 that you are adding extra include directories for the compilation of the \emph{current} package
1312 key, not necessarily one of, say, a parent package.) Finally, \verb|-package| arguments
1313 are now augmented with explicit thinning and renaming.
1314
1315 Thus, the new model is to do away with ``exposed packages'', and instead have a package
1316 database mapping package keys to packages (when we need to find original names), and then
1317 a mapping of modules to their respective packages which is updated by the various commands.
1318
1319 \begin{itemize}
1320 \item \verb|-package| adds the package to the package key map, and
1321 dumps all of the exposed modules of the package into the module
1322 map. If it has renamings, the entries placed in the module map
1323 are tweaked accordingly.
1324 \item \verb|-package-loc| adds the location to the package key map,
1325 and scans the directory for all interface files and puts them in
1326 the module map (alternately, this could be done lazily, or the
1327 modules could be explicitly provided.)
1328 \item \verb|-reset-module-env| performs some module resolutions based
1329 on the current module map, and then replaces the map with the result
1330 of the resolution. (The package key map is left alone.)
1331 \end{itemize}
1332
1333 \paragraph{Just because it compiled, doesn't mean the individual packages type check}
1334 Here is a simple example:
1335
1336 \begin{verbatim}
1337 package p where
1338 A :: [ data T = T ]
1339 B :: [ data T = T ]
1340 C = [
1341 import A
1342 import B
1343 x = A.T :: B.T
1344 ]
1345 package q where
1346 A = [ data T = T ]
1347 B = A
1348 include p
1349 \end{verbatim}
1350
1351 Here, we incorrectly decide \m{A}\verb|.T| and \m{B}\verb|.T| are type
1352 equal when typechecking \m{C}, because the \verb|hisig| files we
1353 generate for them all point to the same original implementation. However,
1354 \pname{p} should not typecheck.
1355
1356 The problem here is that type checking asks ``does it compile with respect
1357 to all possible instantiations of the holes'', whereas compilation asks
1358 ``does it compile with respect to this particular instantiation of holes.''
1359 It's a bit unavoidable, really.
1360
1361 \section{Shaped Backpack}
1362
1363 Despite the simplicity of shapeless Backpack with the linking
1364 restriction in the absence of holes, we will find that when we have
1365 holes, it will be very difficult to do type-checking without
1366 some form of shaping. This section is very much a work in progress,
1367 but the ability to typecheck against holes, even with the linking restriction,
1368 is a very important part of modular separate development, so we will need
1369 to support it at some point.
1370
1371 \subsection{Efficient shaping}
1372
1373 (These are Edward's opinion, he hasn't convinced other folks that this is
1374 the right way to do it.)
1375
1376 In this section, I want to argue that, although shaping constitutes
1377 a pre-pass which must be run before compilation in earnest, it is only
1378 about as bad as the dependency resolution analysis that GHC already does
1379 in \verb|ghc -M| or \verb|ghc --make|.
1380
1381 In Paper Backpack, what information does shaping compute? It looks at
1382 exports, imports, data declarations and value declarations (but not the
1383 actual expressions associated with these values.) As a matter of fact,
1384 GHC already must look at the imports associated with a package in order
1385 to determine the dependency graph, so that it can have some order to compile
1386 modules in. There is a specialized parser which just parses these statements,
1387 and then ignores the rest of the file.
1388
1389 A bit of background: the \emph{renamer} is responsible for resolving
1390 imports and figuring out where all of these entities actually come from.
1391 SPJ would really like to avoid having to run the renamer in order to perform
1392 a shaping pass.
1393
1394 \paragraph{Is it necessary to run the Renamer to do shaping?}
1395 Edward and Scott believe the answer is no, well, partially.
1396 Shaping needs to know the original names of all entities exposed by a
1397 module/signature. Then it needs to know (a) which entities a module/signature
1398 defines/declares locally and (b) which entities that module/signature exports.
1399 The former, (a), can be determined by a straightforward inspection of a parse
1400 tree of the source file.\footnote{Note that no expression or type parsing
1401 is necessary. We only need names of local values, data types, and data
1402 constructors.} The latter, (b), is a bit trickier. Right now it's the Renamer
1403 that interprets imports and exports into original names, so we would still
1404 rely on that implementation. However, the Renamer does other, harder things
1405 that we don't need, so ideally we could factor out the import/export
1406 resolution from the Renamer for use in shaping.
1407
1408 Unfortunately the Renamer's import resolution analyzes \verb|.hi| files, but for
1409 local modules, which haven't yet been typechecked, we don't have those.
1410 Instead, we could use a new file format, \verb|.hsi| files, to store the shape of
1411 a locally defined module. (Defined packages are bundled with their shapes,
1412 so included modules have \verb|.hsi| files as well.) (What about the logical
1413 vs.~physical distinction in file names?) If we refactor the import/export
1414 resolution code, could we rewrite it to generically operate on both
1415 \verb|.hi| files and \verb|.hsi| files?
1416
1417 Alternatively, rather than storing shapes on a per-source basis, we could
1418 store (in memory) the entire package shape. Similarly, included packages
1419 could have a single shape file for the entire package. Although this approach
1420 would make shaping non-incremental, since an entire package's shape would
1421 be recomputed any time a constituent module's shape changes, we do not expect
1422 shaping to be all that expensive.
1423
1424 \subsection{Typechecking of indefinite modules}\label{sec:typechecking-indefinite}
1425
1426 Recall in our argument in the definite case, where we showed there are
1427 no holes in the physical context. With indefinite modules, this is no
1428 longer true. While (with the linking restriction) these holes will never
1429 be linked against a physical implementation, they may be linked against
1430 other signatures. (Note: while disallowing signature linking would
1431 solve our problem, it would disallow a wide array of useful instances of
1432 signature reuse, for example, a package mylib that implements both
1433 mylib-1x-sig and mylib-2x-sig.)
1434
1435 With holes, we must handle module variables, and we sometimes must unify them:
1436
1437 \begin{verbatim}
1438 package p where
1439 A :: [ data A ]
1440 package q where
1441 A :: [ data A ]
1442 package r where
1443 include p
1444 include q
1445 \end{verbatim}
1446
1447 In this package, it is not possible to a priori assign original names to
1448 module A in p and q, because in package r, they should have the same
1449 original name. When signature linking occurs, unification may occur,
1450 which means we have to rename all relevant original names. (A similar
1451 situation occurs when a module is typechecked against a signature.)
1452
1453 An invariant which would be nice to have is this: when typechecking a
1454 signature or including a package, we may apply renaming to the entities
1455 being brought into context. But once we've picked an original name for
1456 our entities, no further renaming should be necessary. (Formally, in the
1457 unification for semantic object shapes, apply the unifier to the second
1458 shape, but not the first one.)
1459
1460 However, there are plenty of counterexamples here:
1461
1462 \begin{verbatim}
1463 package p where
1464 A :: [ data A ]
1465 B :: [ data A ]
1466 M = ...
1467 A = B
1468 \end{verbatim}
1469
1470 In this package, does module M know that A.A and B.A are type equal? In
1471 fact, the shaping pass will have assigned equal module identities to A
1472 and B, so M \emph{equates these types}, despite the aliasing occurring
1473 after the fact.
1474
1475 We can make this example more sophisticated, by having a later
1476 subpackage which causes the aliasing; now, the decision is not even a
1477 local one (on the other hand, the equality should be evident by inspection
1478 of the package interface associated with q):
1479
1480 \begin{verbatim}
1481 package p where
1482 A :: [ data A ]
1483 B :: [ data A ]
1484 package q where
1485 A :: [ data A ]
1486 B = A
1487 package r where
1488 include p
1489 include q
1490 \end{verbatim}
1491
1492 Another possibility is that it might be acceptable to do a mini-shaping
1493 pass, without parsing modules or signatures, \emph{simply} looking at
1494 names and aliases. But logical names are not the only mechanism by
1495 which unification may occur:
1496
1497 \begin{verbatim}
1498 package p where
1499 C :: [ data A ]
1500 A = [ data A = A ]
1501 B :: [ import A(A) ]
1502 C = B
1503 \end{verbatim}
1504
1505 It is easy to conclude that the original names of C and B are the same. But
1506 more importantly, C.A must be given the original name of p:A.A. This can only
1507 be discovered by looking at the signature definition for B. In any case, it
1508 is worth noting that this situation parallels the situation with hs-boot
1509 files (although there is no mutual recursion here).
1510
1511 The conclusion is that you will probably, in fact, have to do real
1512 shaping in order to typecheck all of these examples.
1513
1514 \paragraph{Hey, these signature imports are kind of tricky\ldots}
1515
1516 When signatures and modules are interleaved, the interaction can be
1517 complex. Here is an example:
1518
1519 \begin{verbatim}
1520 package p where
1521 C :: [ data A ]
1522 M = [ import C; ... ]
1523 A = [ import M; data A = A ]
1524 C :: [ import A(A) ]
1525 \end{verbatim}
1526
1527 Here, the second signature for C refers to a module implementation A
1528 (this is permissible: it simply means that the original name for p:C.A
1529 is p:A.A). But wait! A relies on M, and M relies on C. Circularity?
1530 Fortunately not: a client of package p will find it impossible to have
1531 the hole C implemented in advance, since they will need to get their hands on module
1532 A\ldots but it will not be defined prior to package p.
1533
1534 In any case, however, it would be good to emit a warning if a package
1535 cannot be compiled without mutual recursion.
1536
1537 \subsection{Rename on entry}
1538
1539 Consider the following example:
1540
1541 \begin{verbatim}
1542 package p where
1543 A :: [ data T = T ]
1544 B = [ import A; x = T ]
1545 package q where
1546 C :: ...
1547 A = [ data T = T ]
1548 include p
1549 D = [
1550 import qualified A
1551 import qualified B
1552 import C
1553 x = B.T :: A.T
1554 ]
1555 \end{verbatim}
1556
1557 We are interested in type-checking \pname{q}, which is an indefinite package
1558 on account of the uninstantiated hole \m{C}. Furthermore, let's suppose that
1559 \pname{p} has already been independently typechecked, and its interface files
1560 installed in some global location with $\alpha_A$ used as the module identity
1561 of \m{A}. (To simplify this example, we'll assume $\beta_{AT}=\alpha_A$.)
1562
1563 The first three lines of \pname{q} type check in the normal way, but \m{D}
1564 now poses a problem: if we load the interface file for \m{B} the normal way,
1565 we will get a reference to type \texttt{T} with the original name $\alpha_A$.\texttt{T},
1566 whereas from \m{A} we have an original name \pname{q}:\m{A}.\texttt{T}.
1567
1568 Let's suppose that we already have the result of a shaping pass, which
1569 maps our identity variables to their true identities.
1570 Let's consider the possible options here:
1571
1572 \begin{itemize}
1573 \item We could re-typecheck \pname{p}, feeding it the correct instantiations
1574 for its variables. However, this seems wasteful: we typechecked the
1575 package already, and up-to-renaming, the interface files are exactly
1576 what we need to type check our application.
1577 \item We could make copies of all the interface files, renamed to have the
1578 right original names. This also seems wasteful: why should we have to
1579 create a new copy of every interface file in a library we depend on?
1580 \item When \emph{reading in} the interface file to GHC, we could apply the
1581 renaming according to the shaping pass and store that in memory.
1582 \end{itemize}
1583
1584 That last solution is pretty appealing, however, there are still circumstances
1585 we need to create new interface files; these exactly mirror the cases described
1586 in Section~\ref{sec:compiling}.
1587
1588 \subsection{Incremental typechecking}
1589 We want to typecheck modules incrementally, i.e., when something changes in
1590 a package, we only want to re-typecheck the modules that care about that
1591 change. GHC already does this today.%
1592 \footnote{\url{https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/RecompilationAvoidance}}
1593 Is the same mechanism sufficient for Backpack? Edward and Scott think that it
1594 is, mostly. Our conjecture is that a module should be re-typechecked if the
1595 existing mechanism says it should \emph{or} if the logical shape
1596 context (which maps logical names to physical names) has changed. The latter
1597 condition is due to aliases that affect typechecking of modules.
1598
1599 Let's look again at an example from before:
1600 \begin{verbatim}
1601 package p where
1602 A :: [ data A ]
1603 B :: [ data A ]
1604 M = [ import A; import B; ... ]
1605 \end{verbatim}
1606 Let's say that \verb|M| is typechecked successfully. Now we add an alias binding
1607 at the end of the package, \verb|A = B|. Does \verb|M| need to be
1608 re-typechecked? Yes! (Well, it seems so, but let's just assert ``yes'' for now.
1609 Certainly in the reverse case---if we remove the alias and then ask---this
1610 is true, since \verb|M| might have depended on the two \verb|A| types
1611 being the same.)
1612 The logical shape context changed to say that \verb|A| and
1613 \verb|B| now map to the same physical module identity. But does the existing
1614 recompilation avoidance mechanism say that \verb|M| should be re-typechecked?
1615 It's unclear. The \verb|.hi| file for \verb|M| records that it imported \verb|A| and
1616 \verb|B| with particular ABIs, but does it also know about the physical module
1617 identities (or rather, original module names) of these modules?
1618
1619 Scott thinks this highlights the need for us to get our story straight about
1620 the connection between logical names, physical module identities, and file
1621 names!
1622
1623
1624 \subsection{Installing indefinite packages}\label{sec:installing-indefinite}
1625
1626 If an indefinite package contains no code at all, we only need
1627 to install the interface file for the signatures. However, if
1628 they include code, we must provide all of the
1629 ingredients necessary to compile them when the holes are linked against
1630 actual implementations. (Figure~\ref{fig:pkgdb})
1631
1632 \paragraph{Source tarball or preprocessed source?} What is the representation of the source that is saved is. There
1633 are a number of possible choices:
1634
1635 \begin{itemize}
1636 \item The original tarballs downloaded from Hackage,
1637 \item Preprocessed source files,
1638 \item Some sort of internal, type-checked representation of Haskell code (maybe the output of the desugarer).
1639 \end{itemize}
1640
1641 Storing the tarballs is the simplest and most straightforward mechanism,
1642 but we will have to be very certain that we can recompile the module
1643 later in precisely the same we compiled it originally, to ensure the hi
1644 files match up (fortunately, it should be simple to perform an optional
1645 sanity check before proceeding.) The appeal of saving preprocessed
1646 source, or even the IRs, is that this is conceptually this is exactly
1647 what an indefinite package is: we have paused the compilation process
1648 partway, intending to finish it later. However, our compilation strategy
1649 for definite packages requires us to run this step using a \emph{different}
1650 choice of original names, so it's unclear how much work could actually be reused.
1651
1652 \section{Surface syntax}
1653
1654 In the Backpack paper, a brand new module language is presented, with
1655 syntax for inline modules and signatures. This syntax is probably worth implementing,
1656 because it makes it easy to specify compatibility packages, whose module
1657 definitions in general may be very short:
1658
1659 \begin{verbatim}
1660 package ishake-0.12-shake-0.13 where
1661 include shake-0.13
1662 Development.Shake.Sys = Development.Shake.Cmd
1663 Development.Shake = [ (**>) = (&>) ; (*>>) = (|*>)]
1664 Development.Shake.Rule = [ defaultPriority = rule . priority 0.5 ]
1665 include ishake-0.12
1666 \end{verbatim}
1667
1668 However, there are a few things that are less than ideal about the
1669 surface syntax proposed by Paper Backpack:
1670
1671 \begin{itemize}
1672 \item It's completely different from the current method users
1673 specify packages. There's nothing wrong with this per se
1674 (one simply needs to support both formats) but the smaller
1675 the delta, the easier the new packaging format is to explain
1676 and implement.
1677
1678 \item Sometimes order matters (relative ordering of signatures and
1679 module implementations), and other times it does not (aliases).
1680 This can be confusing for users.
1681
1682 \item Users have to order module definitions topologically,
1683 whereas in current Cabal modules can be listed in any order, and
1684 GHC figures out an appropriate order to compile them.
1685 \end{itemize}
1686
1687 Here is an alternative proposal, closely based on Cabal syntax. Given
1688 the following Backpack definition:
1689
1690 \begin{verbatim}
1691 package libfoo(A, B, C, Foo) where
1692 include base
1693 -- renaming and thinning
1694 include libfoo (Foo, Bar as Baz)
1695 -- holes
1696 A :: [ a :: Bool ].hsig
1697 A2 :: [ b :: Bool ].hsig
1698 -- normal module
1699 B = [
1700 import {-# SOURCE #-} A
1701 import Foo
1702 import Baz
1703 ...
1704 ].hs
1705 -- recursively linked pair of modules, one is private
1706 C :: [ data C ].hsig
1707 D = [ import {-# SOURCE #-} C; data D = D C ].hs
1708 C = [ import D; data C = C D ].hs
1709 -- alias
1710 A = A2
1711 \end{verbatim}
1712
1713 We can write the following Cabal-like syntax instead (where
1714 all of the signatures and modules are placed in appropriately
1715 named files):
1716
1717 \begin{verbatim}
1718 package: libfoo
1719 ...
1720 build-depends: base, libfoo (Foo, Bar as Baz)
1721 holes: A A2 -- deferred for now
1722 exposed-modules: Foo B C
1723 aliases: A = A2
1724 other-modules: D
1725 \end{verbatim}
1726
1727 Notably, all of these lists are \emph{insensitive} to ordering!
1728 The key idea is use of the \verb|{-# SOURCE #-}| pragma, which
1729 is enough to solve the important ordering constraint between
1730 signatures and modules.
1731
1732 Here is how the elaboration works. For simplicity, in this algorithm
1733 description, we assume all packages being compiled have no holes
1734 (including \verb|build-depends| packages). Later, we'll discuss how to
1735 extend the algorithm to handle holes in both subpackages and the main
1736 package itself.
1737
1738 \begin{enumerate}
1739
1740 \item At the top-level with \verb|package| $p$ and
1741 \verb|exposed-modules| $ms$, record \verb|package p (ms) where|
1742
1743 \item For each package $p$ with thinning/renaming $ms$ in
1744 \verb|build-depends|, record a \verb|include p (ms)| in the
1745 Backpack package. The ordering of these includes does not
1746 matter, since none of these packages have holes.
1747
1748 \item Take all modules $m$ in \verb|other-modules| and
1749 \verb|exposed-modules| which were not exported by build
1750 dependencies, and create a directed graph where hs and hs-boot
1751 files are nodes and imports are edges (the target of an edge is
1752 an hs file if it is a normal import, and an hs-boot file if it
1753 is a SOURCE import). Topologically sort this graph, erroring if
1754 this graph contains cycles (even with recursive modules, the
1755 cycle should have been broken by an hs-boot file). For each
1756 node, in this order, record \verb|M = [ ... ]| or \verb|M :: [ ... ]|
1757 depending on whether or not it is an hs or hs-boot. If possible,
1758 sort signatures before implementations when there is no constraint
1759 otherwise.
1760
1761 \end{enumerate}
1762
1763 Here is a simple example which shows how SOURCE can be used to disambiguate
1764 between two important cases. Suppose we have these modules:
1765
1766 \begin{verbatim}
1767 -- A1.hs
1768 import {-# SOURCE #-} B
1769
1770 -- A2.hs
1771 import B
1772
1773 -- B.hs
1774 x = True
1775
1776 -- B.hs-boot
1777 x :: Bool
1778 \end{verbatim}
1779
1780 Then we translate the following packages as follows:
1781
1782 \begin{verbatim}
1783 exposed-modules: A1 B
1784 -- translates to
1785 B :: [ x :: Bool ]
1786 A1 = [ import B ]
1787 B = [ x = True ]
1788 \end{verbatim}
1789
1790 but
1791
1792 \begin{verbatim}
1793 exposed-modules: A2 B
1794 -- translates to
1795 B = [ x = True ]
1796 B :: [ x :: Bool ]
1797 A2 = [ import B ]
1798 \end{verbatim}
1799
1800 The import controls placement between signature and module, and in A1 it
1801 forces B's signature to be sorted before B's implementation (whereas in
1802 the second section, there is no constraint so we preferentially place
1803 the B's implementation first)
1804
1805 \paragraph{Holes in the database} In the presence of holes,
1806 \verb|build-depends| resolution becomes more complicated. First,
1807 let's consider the case where the package we are building is
1808 definite, but the package database contains indefinite packages with holes.
1809 In order to maintain the linking restriction, we now have to order packages
1810 from step (2) of the previous elaboration. We can do this by creating
1811 a directed graph, where nodes are packages and edges are from holes to the
1812 package which implements them. If there is a cycle, this indicates a mutually
1813 recursive package. In the absence of cycles, a topological sorting of this
1814 graph preserves the linking invariant.
1815
1816 One subtlety to consider is the fact that an entry in \verb|build-depends|
1817 can affect how a hole is instantiated by another entry. This might be a
1818 bit weird to users, who might like to explicitly say how holes are
1819 filled when instantiating a package. Food for thought, surface syntax wise.
1820
1821 \paragraph{Holes in the package} Actually, this is quite simple: the
1822 ordering of includes goes as before, but some indefinite packages in the
1823 database are less constrained as they're ``dependencies'' are fulfilled
1824 by the holes at the top-level of this package. It's also worth noting
1825 that some dependencies will go unresolved, since the following package
1826 is valid:
1827
1828 \begin{verbatim}
1829 package a where
1830 A :: ...
1831 package b where
1832 include a
1833 \end{verbatim}
1834
1835 \paragraph{Multiple signatures} In Backpack syntax, it's possible to
1836 define a signature multiple times, which is necessary for mutually
1837 recursive signatures:
1838
1839 \begin{verbatim}
1840 package a where
1841 A :: [ data A ]
1842 B :: [ import A; data B = B A ]
1843 A :: [ import B; data A = A B ]
1844 \end{verbatim}
1845
1846 Critically, notice that we can see the constructors for both module B and A
1847 after the signatures are linked together. This is not possible in GHC
1848 today, but could be possible by permitting multiple hs-boot files. Now
1849 the SOURCE pragma indicating an import must \emph{disambiguate} which
1850 hs-boot file it intends to include. This might be one way of doing it:
1851
1852 \begin{verbatim}
1853 -- A.hs-boot2
1854 data A
1855
1856 -- B.hs-boot
1857 import {-# SOURCE hs-boot2 #-} A
1858
1859 -- A.hs-boot
1860 import {-# SOURCE hs-boot #-} B
1861 \end{verbatim}
1862
1863 \paragraph{Explicit or implicit reexports} One annoying property of
1864 this proposal is that, looking at the \verb|exposed-modules| list, it is
1865 not immediately clear what source files one would expect to find in the
1866 current package. It's not obvious what the proper way to go about doing
1867 this is.
1868
1869 \paragraph{Better syntax for SOURCE} If we enshrine the SOURCE import
1870 as a way of solving Backpack ordering problems, it would be nice to have
1871 some better syntax for it. One possibility is:
1872
1873 \begin{verbatim}
1874 abstract import Data.Foo
1875 \end{verbatim}
1876
1877 which makes it clear that this module is pluggable, typechecking against
1878 a signature. Note that this only indicates how type checking should be
1879 done: when actually compiling the module we will compile against the
1880 interface file for the true implementation of the module.
1881
1882 It's worth noting that the SOURCE annotation was originally made a
1883 pragma because, in principle, it should have been possible to compile
1884 some recursive modules without needing the hs-boot file at all. But if
1885 we're moving towards boot files as signatures, this concern is less
1886 relevant.
1887
1888 \section{Type classes and type families}
1889
1890 \subsection{Background}
1891
1892 Before we talk about how to support type classes in Backpack, it's first
1893 worth talking about what we are trying to achieve in the design. Most
1894 would agree that \emph{type safety} is the cardinal law that should be
1895 preserved (in the sense that segfaults should not be possible), but
1896 there are many instances of ``bad behavior'' (top level mutable state,
1897 weakening of abstraction guarantees, ambiguous instance resolution, etc)
1898 which various Haskellers may disagree on the necessity of ruling out.
1899
1900 With this in mind, it is worth summarizing what kind of guarantees are
1901 presently given by GHC with regards to type classes and type families,
1902 as well as characterizing the \emph{cultural} expectations of the
1903 Haskell community.
1904
1905 \paragraph{Type classes} When discussing type class systems, there are
1906 several properties that one may talk about.
1907 A set of instances is \emph{confluent} if, no matter what order
1908 constraint solving is performed, GHC will terminate with a canonical set
1909 of constraints that must be satisfied for any given use of a type class.
1910 In other words, confluence says that we won't conclude that a program
1911 doesn't type check just because we swapped in a different constraint
1912 solving algorithm.
1913
1914 Confluence's closely related twin is \emph{coherence} (defined in ``Type
1915 classes: exploring the design space''). This property states that
1916 ``every different valid typing derivation of a program leads to a
1917 resulting program that has the same dynamic semantics.'' Why could
1918 differing typing derivations result in different dynamic semantics? The
1919 answer is that context reduction, which picks out type class instances,
1920 elaborates into concrete choices of dictionaries in the generated code.
1921 Confluence is a prerequisite for coherence, since one
1922 can hardly talk about the dynamic semantics of a program that doesn't
1923 type check.
1924
1925 In the vernacular, confluence and coherence are often incorrectly used
1926 to refer to another related property: \emph{global uniqueness of instances},
1927 which states that in a fully compiled program, for any type, there is at most one
1928 instance resolution for a given type class. Languages with local type
1929 class instances such as Scala generally do not have this property, and
1930 this assumption is frequently used for abstraction.
1931
1932 So, what properties does GHC enforce, in practice?
1933 In the absence of any type system extensions, GHC's employs a set of
1934 rules (described in ``Exploring the design space'') to ensure that type
1935 class resolution is confluent and coherent. Intuitively, it achieves
1936 this by having a very simple constraint solving algorithm (generate
1937 wanted constraints and solve wanted constraints) and then requiring the
1938 set of instances to be \emph{nonoverlapping}, ensuring there is only
1939 ever one way to solve a wanted constraint. Overlap is a
1940 more stringent restriction than either confluence or coherence, and
1941 via the \verb|OverlappingInstances| and \verb|IncoherentInstances|, GHC
1942 allows a user to relax this restriction ``if they know what they're doing.''
1943
1944 Surprisingly, however, GHC does \emph{not} enforce global uniqueness of
1945 instances. Imported instances are not checked for overlap until we
1946 attempt to use them for instance resolution. Consider the following program:
1947
1948 \begin{verbatim}
1949 -- T.hs
1950 data T = T
1951 -- A.hs
1952 import T
1953 instance Eq T where
1954 -- B.hs
1955 import T
1956 instance Eq T where
1957 -- C.hs
1958 import A
1959 import B
1960 \end{verbatim}
1961
1962 When compiled with one-shot compilation, \verb|C| will not report
1963 overlapping instances unless we actually attempt to use the \verb|Eq|
1964 instance in C.\footnote{When using batch compilation, GHC reuses the
1965 instance database and is actually able to detect the duplicated
1966 instance when compiling B. But if you run it again, recompilation
1967 avoidance skips A, and it finishes compiling! See this bug:
1968 \url{https://ghc.haskell.org/trac/ghc/ticket/5316}} This is by
1969 design\footnote{\url{https://ghc.haskell.org/trac/ghc/ticket/2356}}:
1970 ensuring that there are no overlapping instances eagerly requires
1971 eagerly reading all the interface files a module may depend on.
1972
1973 We might summarize these three properties in the following manner.
1974 Culturally, the Haskell community expects \emph{global uniqueness of instances}
1975 to hold: the implicit global database of instances should be
1976 confluent and coherent. GHC, however, does not enforce uniqueness of
1977 instances: instead, it merely guarantees that the \emph{subset} of the
1978 instance database it uses when it compiles any given module is confluent and coherent. GHC does do some
1979 tests when an instance is declared to see if it would result in overlap
1980 with visible instances, but the check is by no means
1981 perfect\footnote{\url{https://ghc.haskell.org/trac/ghc/ticket/9288}};
1982 truly, \emph{type-class constraint resolution} has the final word. One
1983 mitigating factor is that in the absence of \emph{orphan instances}, GHC is
1984 guaranteed to eagerly notice when the instance database has overlap.\footnote{Assuming that the instance declaration checks actually worked\ldots}
1985
1986 Clearly, the fact that GHC's lazy behavior is surprising to most
1987 Haskellers means that the lazy check is mostly good enough: a user
1988 is likely to discover overlapping instances one way or another.
1989 However, it is relatively simple to construct example programs which
1990 violate global uniqueness of instances in an observable way:
1991
1992 \begin{verbatim}
1993 -- A.hs
1994 module A where
1995 data U = X | Y deriving (Eq, Show)
1996
1997 -- B.hs
1998 module B where
1999 import Data.Set
2000 import A
2001
2002 instance Ord U where
2003 compare X X = EQ
2004 compare X Y = LT
2005 compare Y X = GT
2006 compare Y Y = EQ
2007
2008 ins :: U -> Set U -> Set U
2009 ins = insert
2010
2011 -- C.hs
2012 module C where
2013 import Data.Set
2014 import A
2015
2016 instance Ord U where
2017 compare X X = EQ
2018 compare X Y = GT
2019 compare Y X = LT
2020 compare Y Y = EQ
2021
2022 ins' :: U -> Set U -> Set U
2023 ins' = insert
2024
2025 -- D.hs
2026 module Main where
2027 import Data.Set
2028 import A
2029 import B
2030 import C
2031
2032 test :: Set U
2033 test = ins' X $ ins X $ ins Y $ empty
2034
2035 main :: IO ()
2036 main = print test
2037
2038 -- OUTPUT
2039 $ ghc -Wall -XSafe -fforce-recomp --make D.hs
2040 [1 of 4] Compiling A ( A.hs, A.o )
2041 [2 of 4] Compiling B ( B.hs, B.o )
2042
2043 B.hs:5:10: Warning: Orphan instance: instance [safe] Ord U
2044 [3 of 4] Compiling C ( C.hs, C.o )
2045
2046 C.hs:5:10: Warning: Orphan instance: instance [safe] Ord U
2047 [4 of 4] Compiling Main ( D.hs, D.o )
2048 Linking D ...
2049 $ ./D
2050 fromList [X,Y,X]
2051 \end{verbatim}
2052
2053 Locally, all type class resolution was coherent: in the subset of
2054 instances each module had visible, type class resolution could be done
2055 unambiguously. Furthermore, the types of \verb|ins| and \verb|ins'|
2056 discharge type class resolution, so that in \verb|D| when the database
2057 is now overlapping, no resolution occurs, so the error is never found.
2058
2059 It is easy to dismiss this example as an implementation wart in GHC, and
2060 continue pretending that global uniqueness of instances holds. However,
2061 the problem with \emph{global uniqueness of instances} is that they are
2062 inherently nonmodular: you might find yourself unable to compose two
2063 components because they accidentally defined the same type class
2064 instance, even though these instances are plumbed deep in the
2065 implementation details of the components.
2066
2067 As it turns out, there is already another feature in Haskell which
2068 must enforce global uniqueness, to prevent segfaults.
2069 We now turn to type classes' close cousin: type families.
2070
2071 \paragraph{Type families} With type families, confluence is the primary
2072 property of interest. (Coherence is not of much interest because type
2073 families are elaborated into coercions, which don't have any
2074 computational content.) Rather than considering what the set of
2075 constraints we reduce to, confluence for type families considers the
2076 reduction of type families. The overlap checks for type families
2077 can be quite sophisticated, especially in the case of closed type
2078 families.
2079
2080 Unlike type classes, however, GHC \emph{does} check the non-overlap
2081 of type families eagerly. The analogous program does \emph{not} type check:
2082
2083 \begin{verbatim}
2084 -- F.hs
2085 type family F a :: *
2086 -- A.hs
2087 import F
2088 type instance F Bool = Int
2089 -- B.hs
2090 import F
2091 type instance F Bool = Bool
2092 -- C.hs
2093 import A
2094 import B
2095 \end{verbatim}
2096
2097 The reason is that it is \emph{unsound} to ever allow any overlap
2098 (unlike in the case of type classes where it just leads to incoherence.)
2099 Thus, whereas one might imagine dropping the global uniqueness of instances
2100 invariant for type classes, it is absolutely necessary to perform global
2101 enforcement here. There's no way around it!
2102
2103 \subsection{Local type classes}
2104
2105 Here, we say \textbf{NO} to global uniqueness.
2106
2107 This design is perhaps best discussed in relation to modular type
2108 classes, which shares many similar properties. Instances are now
2109 treated as first class objects (in MTCs, they are simply modules)---we
2110 may explicitly hide or include instances for type class resolution (in
2111 MTCs, this is done via the \verb|using| top-level declaration). This is
2112 essentially what was sketched in Section 5 of the original Backpack
2113 paper. As a simple example:
2114
2115 \begin{verbatim}
2116 package p where
2117 A :: [ data T = T ]
2118 B = [ import A; instance Eq T where ... ]
2119
2120 package q where
2121 A = [ data T = T; instance Eq T where ... ]
2122 include p
2123 \end{verbatim}
2124
2125 Here, \verb|B| does not see the extra instance declared by \verb|A|,
2126 because it was thinned from its signature of \verb|A| (and thus never
2127 declared canonical.) To declare an instance without making it
2128 canonical, it must placed in a separate (unimported) module.
2129
2130 Like modular type classes, Backpack does not give rise to incoherence,
2131 because instance visibility can only be changed at the top level module
2132 language, where it is already considered best practice to provide
2133 explicit signatures. Here is the example used in the Modular Type
2134 Classes paper to demonstrate the problem:
2135
2136 \begin{verbatim}
2137 structure A = using EqInt1 in
2138 struct ...fun f x = eq(x,x)... end
2139 structure B = using EqInt2 in
2140 struct ...val y = A.f(3)... end
2141 \end{verbatim}
2142
2143 Is the type of f \verb|int -> bool|, or does it have a type-class
2144 constraint? Because type checking proceeds over the entire program, ML
2145 could hypothetically pick either. However, ported to Haskell, the
2146 example looks like this:
2147
2148 \begin{verbatim}
2149 EqInt1 :: [ instance Eq Int ]
2150 EqInt2 :: [ instance Eq Int ]
2151 A = [
2152 import EqInt1
2153 f x = x == x
2154 ]
2155 B = [
2156 import EqInt2
2157 import A hiding (instance Eq Int)
2158 y = f 3
2159 ]
2160 \end{verbatim}
2161
2162 There may be ambiguity, yes, but it can be easily resolved by the
2163 addition of a top-level type signature to \verb|f|, which is considered
2164 best-practice anyway. Additionally, Haskell users are trained to expect
2165 a particular inference for \verb|f| in any case (the polymorphic one).
2166
2167 Here is another example which might be considered surprising:
2168
2169 \begin{verbatim}
2170 package p where
2171 A :: [ data T = T ]
2172 B :: [ data T = T ]
2173 C = [
2174 import qualified A
2175 import qualified B
2176 instance Show A.T where show T = "A"
2177 instance Show B.T where show T = "B"
2178 x :: String
2179 x = show A.T ++ show B.T
2180 ]
2181 \end{verbatim}
2182
2183 In the original Backpack paper, it was implied that module \verb|C|
2184 should not type check if \verb|A.T = B.T| (failing at link time).
2185 However, if we set aside, for a moment, the issue that anyone who
2186 imports \verb|C| in such a context will now have overlapping instances,
2187 there is no reason in principle why the module itself should be
2188 problematic. Here is the example in MTCs, which I have good word from
2189 Derek does type check.
2190
2191 \begin{verbatim}
2192 signature SIG = sig
2193 type t
2194 val mk : t
2195 end
2196 signature SHOW = sig
2197 type t
2198 val show : t -> string
2199 end
2200 functor Example (A : SIG) (B : SIG) =
2201 let structure ShowA : SHOW = struct
2202 type t = A.t
2203 fun show _ = "A"
2204 end in
2205 let structure ShowB : SHOW = struct
2206 type t = B.t
2207 fun show _ = "B"
2208 end in
2209 using ShowA, ShowB in
2210 struct
2211 val x = show A.mk ++ show B.mk
2212 end : sig val x : string end
2213 \end{verbatim}
2214
2215 The moral of the story is, even though in a later context the instances
2216 are overlapping, inside the functor, the type-class resolution is unambiguous
2217 and should be done (so \verb|x = "AB"|).
2218
2219 Up until this point, we've argued why MTCs and this Backpack design are similar.
2220 However, there is an important sociological difference between modular type-classes
2221 and this proposed scheme for Backpack. In the presentation ``Why Applicative
2222 Functors Matter'', Derek mentions the canonical example of defining a set:
2223
2224 \begin{verbatim}
2225 signature ORD = sig type t; val cmp : t -> t -> bool end
2226 signature SET = sig type t; type elem;
2227 val empty : t;
2228 val insert : elem -> t -> t ...
2229 end
2230 functor MkSet (X : ORD) :> SET where type elem = X.t
2231 = struct ... end
2232 \end{verbatim}
2233
2234 This is actually very different from how sets tend to be defined in
2235 Haskell today. If we directly encoded this in Backpack, it would
2236 look like this:
2237
2238 \begin{verbatim}
2239 package mk-set where
2240 X :: [
2241 data T
2242 cmp :: T -> T-> Bool
2243 ]
2244 Set :: [
2245 data Set
2246 empty :: Set
2247 insert :: T -> Set -> Set
2248 ]
2249 Set = [
2250 import X
2251 ...
2252 ]
2253 \end{verbatim}
2254
2255 It's also informative to consider how MTCs would encode set as it is written
2256 today in Haskell:
2257
2258 \begin{verbatim}
2259 signature ORD = sig type t; val cmp : t -> t -> bool end
2260 signature SET = sig type 'a t;
2261 val empty : 'a t;
2262 val insert : (X : ORD) => X.t -> X.t t -> X.t t
2263 end
2264 struct MkSet :> SET = struct ... end
2265 \end{verbatim}
2266
2267 Here, it is clear to see that while functor instantiation occurs for
2268 implementation, it is not occuring for types. This is a big limitation
2269 with the Haskell approach, and it explains why Haskellers, in practice,
2270 find global uniqueness of instances so desirable.
2271
2272 Implementation-wise, this requires some subtle modifications to how we
2273 do type class resolution. Type checking of indefinite modules works as
2274 before, but when go to actually compile them against explicit
2275 implementations, we need to ``forget'' that two types are equal when
2276 doing instance resolution. This could probably be implemented by
2277 associating type class instances with the original name that was
2278 utilized when typechecking, so that we can resolve ambiguous matches
2279 against types which have the same original name now that we are
2280 compiling.
2281
2282 As we've mentioned previously, this strategy is unsound for type families.
2283
2284 \subsection{Globally unique}
2285
2286 Here, we say \textbf{YES} to global uniqueness.
2287
2288 When we require the global uniqueness of instances (either because
2289 that's the type class design we chose, or because we're considering
2290 the problem of type families), we will need to reject declarations like the
2291 one cited above when \verb|A.T = B.T|:
2292
2293 \begin{verbatim}
2294 A :: [ data T ]
2295 B :: [ data T ]
2296 C :: [
2297 import qualified A
2298 import qualified B
2299 instance Show A.T where show T = "A"
2300 instance Show B.T where show T = "B"
2301 ]
2302 \end{verbatim}
2303
2304 The paper mentions that a link-time check is sufficient to prevent this
2305 case from arising. While in the previous section, we've argued why this
2306 is actually unnecessary when local instances are allowed, the link-time
2307 check is a good match in the case of global instances, because any
2308 instance \emph{must} be declared in the signature. The scheme proceeds
2309 as follows: when some instances are typechecked initially, we type check
2310 them as if all of variable module identities were distinct. Then, when
2311 we perform linking (we \verb|include| or we unify some module
2312 identities), we check again if to see if we've discovered some instance
2313 overlap. This linking check is akin to the eager check that is
2314 performed today for type families; it would need to be implemented for
2315 type classes as well: however, there is a twist: we are \emph{redoing}
2316 the overlap check now that some identities have been unified.
2317
2318 As an implementation trick, one could deferring the check until \verb|C|
2319 is compiled, keeping in line with GHC's lazy ``don't check for overlap
2320 until the use site.'' (Once again, unsound for type families.)
2321
2322 \paragraph{What about module inequalities?} An older proposal was for
2323 signatures to contain ``module inequalities'', i.e., assertions that two
2324 modules are not equal. (Technically: we need to be able to apply this
2325 assertion to $\beta$ module variables, since \verb|A != B| while
2326 \verb|A.T = B.T|). Currently, Edward thinks that module inequalities
2327 are only marginal utility with local instances (i.e., not enough to
2328 justify the implementation cost) and not useful at all in the world of
2329 global instances!
2330
2331 With local instances, module inequalities could be useful to statically
2332 rule out examples like \verb|show A.T ++ show B.T|. Because such uses
2333 are not necessarily reflected in the signature, it would be a violation
2334 of separate module development to try to divine the constraint from the
2335 implementation itself. I claim this is of limited utility, however, because,
2336 as we mentioned earlier, we can compile these ``incoherent'' modules perfectly
2337 coherently. With global instances, all instances must be in the signature, so
2338 while it might be aesthetically displeasing to have the signature impose
2339 extra restrictions on linking identities, we can carry this out without
2340 violating the linking restriction.
2341
2342 \section{Bits and bobs}
2343
2344 \subsection{Abstract type synonyms}
2345
2346 In Paper Backpack, abstract type synonyms are not permitted, because GHC doesn't
2347 understand how to deal with them. The purpose of this section is to describe
2348 one particularly nastiness of abstract type synonyms, by way of the occurs check:
2349
2350 \begin{verbatim}
2351 A :: [ type T ]
2352 B :: [ import qualified A; type T = [A.T] ]
2353 \end{verbatim}
2354
2355 At this point, it is illegal for \verb|A = B|, otherwise this type synonym would
2356 fail the occurs check. This seems like pretty bad news, since every instance
2357 of the occurs check in the type-checker could constitute a module inequality.
2358
2359 \section{Open questions}\label{sec:open-questions}
2360
2361 Here are open problems about the implementation which still require
2362 hashing out.
2363
2364 \begin{itemize}
2365
2366 \item In Section~\ref{sec:simplifying-backpack}, we argued that we
2367 could implement Backpack without needing a shaping pass. We're
2368 pretty certain that this will work for typechecking and
2369 compiling fully definite packages with no recursive linking, but
2370 in Section~\ref{sec:typechecking-indefinite}, we described some
2371 of the prevailing difficulties of supporting signature linking.
2372 Renaming is not an insurmountable problem, but backwards flow of
2373 shaping information can be, and it is unclear how best to
2374 accommodate this. This is probably the most important problem
2375 to overcome.
2376
2377 \item In Section~\ref{sec:installing-indefinite}, a few choices for how to
2378 store source code were pitched, however, there is not consensus on which
2379 one is best.
2380
2381 \item What is the impact of the multiplicity of PackageIds on
2382 dependency solving in Cabal? Old questions of what to prefer
2383 when multiple package-versions are available (Cabal originally
2384 only needed to solve this between different versions of the same
2385 package, preferring the oldest version), but with signatures,
2386 there are more choices. Should there be a complex solver that
2387 does all signature solving, or a preprocessing step that puts
2388 things back into the original Cabal version. Authors may want
2389 to suggest policy for what packages should actually link against
2390 signatures (so a crypto library doesn't accidentally link
2391 against a null cipher package).
2392
2393 \end{itemize}
2394
2395 \end{document}