[backpack] Rewrite compilation to be cleaner.
[ghc.git] / docs / backpack / backpack-impl.tex
1 \documentclass{article}
2
3 \usepackage{pifont}
4 \usepackage{graphicx} %[pdftex] OR [dvips]
5 \usepackage{fullpage}
6 \usepackage{wrapfig}
7 \usepackage{float}
8 \usepackage{titling}
9 \usepackage{hyperref}
10 \usepackage{tikz}
11 \usepackage{color}
12 \usepackage{footnote}
13 \usepackage{float}
14 \usepackage{algorithm}
15 \usepackage{algpseudocode}
16 \usetikzlibrary{arrows}
17 \usetikzlibrary{positioning}
18 \setlength{\droptitle}{-6em}
19
20 \input{commands-new-new.tex}
21
22 \newcommand{\nuAA}{\nu_\mathit{AA}}
23 \newcommand{\nuAB}{\nu_\mathit{AB}}
24 \newcommand{\nuGA}{\nu_\mathit{GA}}
25 \newcommand{\nuGB}{\nu_\mathit{GB}}
26 \newcommand{\betaPL}{\beta_\mathit{PL}}
27 \newcommand{\betaAA}{\beta_\mathit{AA}}
28 \newcommand{\betaAS}{\beta_\mathit{AS}}
29 \newcommand{\thinandalso}{\hspace{.45cm}}
30 \newcommand{\thinnerandalso}{\hspace{.38cm}}
31
32 \input{commands-rebindings.tex}
33
34 \newcommand{\var}[1]{\textsf{#1}}
35
36 \newcommand{\ghcfile}[1]{\textsl{#1}}
37
38 \title{Implementing Backpack}
39
40 \begin{document}
41
42 \maketitle
43
44 The purpose of this document is to describe an implementation path
45 for Backpack in GHC\@.
46
47 We start off by outlining the current architecture of GHC, ghc-pkg and Cabal,
48 which constitute the existing packaging system. We then state what our subgoals
49 are, since there are many similar sounding but different problems to solve. Next,
50 we describe the ``probably correct'' implementation plan, and finish off with
51 some open design questions. This is intended to be an evolving design document,
52 so please contribute!
53
54 \tableofcontents
55
56 \section{Current packaging architecture}
57
58 The overall architecture is described in Figure~\ref{fig:arch}.
59
60 \begin{figure}[H]
61 \center{\scalebox{0.8}{\includegraphics{arch.png}}}
62 \label{fig:arch}\caption{Architecture of GHC, ghc-pkg and Cabal. Green bits indicate additions from upcoming IHG work, red bits indicate additions from Backpack. Orange indicates a Haskell library.}
63 \end{figure}
64
65 Here, arrows indicate dependencies from one component to another. Color
66 coding is as follows: orange components are libaries, green components
67 are to be added with the IHG work, red components are to be added with
68 Backpack. (Thus, black and orange can be considered the current)
69
70 \subsection{Installed package database}
71
72 Starting from the bottom, we have the \emph{installed package database}
73 (actually a collection of such databases), which stores information
74 about what packages have been installed are thus available to be
75 compiled against. There is both a global database (for the system
76 administrator) and a local database (for end users), which can be
77 updated independently. One way to think about the package database
78 is as a \emph{cache of object code}. In principle, one could compile
79 any piece of code by repeatedly recompiling all of its dependencies;
80 the installed package database describes when this can be bypassed.
81
82 \begin{figure}[H]
83 \center{\scalebox{0.8}{\includegraphics{pkgdb.png}}}
84 \label{fig:pkgdb}\caption{Anatomy of a package database.}
85 \end{figure}
86
87 In Figure~\ref{fig:pkgdb}, we show the structure of a package database.
88 The installed package are created from a Cabal file through the process
89 of dependency resolution and compilation. In database terms, the primary key
90 of a package database is the InstalledPackageId
91 (Figure~\ref{fig:current-pkgid}). This ID uniquely identifies an
92 instance of an installed package. The PackageId omits the ABI hash and
93 is used to qualify linker exported symbols: the current value of this
94 parameter is communicated to GHC using the \verb|-package-id| flag.
95
96 In principle, packages with different PackageIds should be linkable
97 together in the same compiled program, whereas packages with the same
98 PackageId are not (even if they have different InstalledPackageIds). In
99 practice, GHC is currently only able to select one version of a package,
100 as it clears out all old versions of the package in
101 \ghcfile{compiler/main/Package.lhs}:applyPackageFlag.
102
103 \begin{figure}
104 \center{\begin{tabular}{r l}
105 PackageId & package name, package version \\
106 InstalledPackageId & PackageId, ABI hash \\
107 \end{tabular}}
108 \label{fig:current-pkgid}\caption{Current structure of package identifiers.}
109 \end{figure}
110
111 The database entry itself contains the information from the installed package ID,
112 as well as information such as what dependencies it was linked against, where
113 its compiled code and interface files live, its compilation flags, what modules
114 it exposes, etc. Much of this information is only relevant to Cabal; GHC
115 uses a subset of the information in the package database.
116
117 \subsection{GHC}
118
119 The two programs which access the package database directly are GHC
120 proper (for compilation) and ghc-pkg (which is a general purpose
121 command line tool for manipulating the database.) GHC relies on
122 the package database in the following ways:
123
124 \begin{itemize}
125 \item It imports the local and global package databases into
126 its runtime database, and applies modifications to the exposed
127 and trusted status of the entries via the flags \verb|-package|
128 and others (\ghcfile{compiler/main/Packages.lhs}). The internal
129 package state can be seen at \verb|-v4| or higher.
130 \item It uses this package database to find the location of module
131 interfaces when it attempts to load the module info of an external
132 module (\ghcfile{compiler/iface/LoadIface.hs}).
133 \end{itemize}
134
135 GHC itself performs a type checking phase, which generates an interface
136 file representing the module (so that later invocations of GHC can load the type
137 of a module), and then after compilation projects object files and linked archives
138 for programs to use.
139
140 \paragraph{Original names} Original names are an important design pattern
141 in GHC\@.
142 Sometimes, a name can be exposed in an hi file even if its module
143 wasn't exposed. Here is an example (compiled in package R):
144
145 \begin{verbatim}
146 module X where
147 import Internal (f)
148 g = f
149
150 module Internal where
151 import Internal.Total (f)
152 \end{verbatim}
153
154 Then in X.hi:
155
156 \begin{verbatim}
157 g = <R.id, Internal.Total, f> (this is the original name)
158 \end{verbatim}
159
160 (The reason we refer to the package as R.id is because it's the
161 full package ID, and not just R).
162
163 \subsection{hs-boot}
164
165 \verb|hs-boot| is a special mechanism used to support recursive linking
166 of modules within a package, today. Suppose I have a recursive module
167 dependency between modules and A and B. I break one of\ldots
168
169 (ToDo: describe how hs-boot mechanism works)
170
171 \subsection{Cabal}
172
173 Cabal is the build system for GHC, we can think of it as parsing a Cabal
174 file describing a package, and then making (possibly multiple)
175 invocations to GHC to perform the appropriate compilation. What
176 information does Cabal pass onto GHC\@? One can get an idea for this by
177 looking at a prototypical command line that Cabal invokes GHC with:
178
179 \begin{verbatim}
180 ghc --make
181 -package-name myapp-0.1
182 -hide-all-packages
183 -package-id containers-0.9-ABCD
184 Module1 Module2
185 \end{verbatim}
186
187 There are a few things going on here. First, Cabal has to tell GHC
188 what the name of the package it's compiling (otherwise, GHC can't appropriately
189 generate symbols that other code referring to this package might generate).
190 There are also a number of commands which configure its in-memory view of
191 the package database (GHC's view of the package database may not directly
192 correspond to what is on disk). There's also an optimization here: in principle,
193 GHC can compile each module one-by-one, but instead we use the \verb|--make| flag
194 because this allows GHC to reuse some data structures, resulting in a nontrivial
195 speedup.
196
197 (ToDo: describe cabal-install/sandbox)
198
199 \section{Goals}
200
201 Here are some of the high-level goals which motivate our improvements to
202 the module system.
203
204 \begin{itemize}
205 \item Solve \emph{Cabal hell}, a situation which occurs when conflicting
206 version ranges on a wide range of dependencies leads to a situation
207 where it is impossible to satisfy the constraints. We're seeking
208 to solve this problem in two ways: first, we want to support
209 multiple instances of containers-2.9 in the database which are
210 compiled with different dependencies (and even link them
211 together), and second, we want to abolish (often inaccurate)
212 version ranges and move to a regime where packages depend on
213 signatures. Version ranges may still be used to indicate important
214 semantic changes (e.g., bugs or bad behavior on the part of package
215 authors), but they should no longer drive dependency resolution
216 and often only be recorded after the fact.
217
218 \item Support \emph{hermetic builds with sharing}. A hermetic build
219 system is one which simulates rebuilding every package whenever
220 it is built; on the other hand, actually rebuilding every time
221 is extremely inefficient (but what happens in practice with
222 Cabal sandboxes). We seek to solve this problem with the IHG work,
223 by allowing multiple instances of a package in the database, where
224 the only difference is compilation parameters. We don't care
225 about being able to link these together in a single program.
226
227 \item Support \emph{module-level pluggability} as an alternative to
228 existing (poor) usage of type classes. The canonical example are
229 strings, where a library might want to either use the convenient
230 but inefficient native strings, or the efficient packed Text data
231 type, but would really like to avoid having to say \verb|StringLike s => ...|
232 in all of their type signatures. While we do not plan on supporting
233 separate compilation, Cabal should understand how to automatically
234 recompile these ``indefinite'' packages when they are instantiated
235 with a new plugin.
236
237 \item Support \emph{separate modular development}, where a library and
238 an application using the library can be developed and type-checked
239 separately, intermediated by an interface. The canonical example
240 here is the \verb|ghc-api|, which is a large, complex API that
241 the library writers (GHC developers) frequently change---the ability
242 for downstream projects to say, ``Here is the API I'm relying on''
243 without requiring these projects to actually be built would greatly
244 assist in keeping the API stable. This is applicable in
245 the pluggability example as well, where we want to ensure that all
246 of the $M \times N$ configurations of libraries versus applications
247 type check, by only running the typechecker $M + N$ times. A closely
248 related concern is related toolchain support for extracting a signature
249 from an existing implementation, as current Haskell culture is averse
250 to explicitly writing separate signature files.
251
252 \item Subsume existing support for \emph{mutually recursive modules},
253 without the double vision problem.
254 \end{itemize}
255
256 A \emph{non-goal} is to allow users to upgrade upstream libraries
257 without recompiling downstream. This is an ABI concern and we're not
258 going to worry about it.
259
260 \clearpage
261
262 \section{Module and package identity}
263
264 \begin{figure}[H]
265 \begin{tabular}{p{0.45\textwidth} p{0.45\textwidth}}
266 \begin{verbatim}
267 package p where
268 A :: [ data X ]
269 P = [ import A; data Y = Y X ]
270 package q where
271 A1 = [ data X = X1 ]
272 A2 = [ data X = X2 ]
273 include p (A as A1, P as P1)
274 include p (A as A2, P as P2)
275 \end{verbatim}
276 &
277 \begin{verbatim}
278 package p where
279 A :: [ data X ]
280 P = [ data T = T ] -- no A import!
281 package q where
282 A1 = [ data X = X1 ]
283 A2 = [ data X = X2 ]
284 include p (A as A1, P as P1)
285 include p (A as A2, P as P2)
286 \end{verbatim}
287 \\
288 (a) Type equality must consider holes\ldots &
289 (b) \ldots but how do we track dependencies? \\
290 \end{tabular}
291 \caption{Two similar examples}\label{fig:simple-ex}
292 \end{figure}
293
294 One of the central questions one encounters when type checking Haskell
295 code is: \emph{when are two types equal}? In ordinary Haskell, the
296 answer is simple: ``They are equal if their \emph{original names} (i.e.,
297 where they were originally defined) are the same.'' However, in
298 Backpack, the situation is murkier due to the presence of \emph{holes}.
299 Consider the pair of examples in Figure~\ref{fig:simple-ex}.
300 In Figure~\ref{fig:simple-ex}a, the types \m{B1}.\verb|Y| and \m{B2}.\verb|Y| should not be
301 considered equal, even though na\"\i vely their original names are
302 \pname{p}:\m{B}.\verb|Y|, since their arguments are different \verb|X|'s!
303 On the other hand, if we instantiated \pname{p} twice with the same \m{A}
304 (e.g., change the second include to \texttt{include p (A as A1, P as P2)}),
305 we might consider the two resulting \verb|Y|'s
306 equal, an \emph{applicative} semantics of identity instantiation. In
307 Figure~\ref{fig:simple-ex}b, we see that even though \m{A} was instantiated differently,
308 we might reasonably wonder if \texttt{T} should still be considered the same,
309 since it has no dependence on the actual choice of \m{A}.
310
311 In fact, there are quite a few different choices that can be made here.
312 Figures~\ref{fig:applicativity}~and~\ref{fig:granularity} summarize the various
313 choices on two axes: the granularity of applicativity (under what circumstances
314 do we consider two types equal) and the granularity of dependency (what circumstances
315 do we consider two types not equal)? A \ding{52} means the design we have chosen
316 answers the question affirmatively---\ding{54}, negatively---but all of these choices
317 are valid points on the design space.
318
319 \subsection{The granularity of applicativity}
320
321 An applicative semantics of package instantiation states that if a package is
322 instantiated with the ``same arguments'', then the resulting entities it defines
323 should also be considered equal. Because Backpack uses \emph{mix-in modules},
324 it is very natural to consider the arguments of a package instantiation as the
325 modules, as shown in Figure~\ref{fig:applicativity}b: the same module \m{A} is
326 linked for both instantiations, so \m{P1} and \m{P2} are considered equal.
327
328 However, we consider the situation at a finer granularity, we might say, ``Well,
329 for a declaration \texttt{data Y = Y X}, only the definition of type \verb|X| matters.
330 If they are the same, then \verb|Y| is the same.'' In that case, we might accept
331 that in Figure~\ref{fig:applicativity}a, even though \pname{p} is instantiated
332 with different modules, at the end of the day, the important component \verb|X| is
333 the same in both cases, so \verb|Y| should also be the same. This is a sort of
334 ``extreme'' view of modular development, where every declaration is desugared
335 into a separate module. In our design, we will be a bit more conservative, and
336 continue with module level applicativity, in the same manner as Paper Backpack.
337
338 \paragraph{Implementation considerations}
339 Compiling Figure~\ref{fig:applicativity}b to dynamic libraries poses an
340 interesting challenge, if every package compiles to a dynamic library.
341 When we compile package \pname{q}, the libraries we end up producing are \pname{q}
342 and an instance of \pname{p} (instantiated with \pname{q}:\m{A}). Furthermore,
343 \pname{q} refers to code in \pname{p} (the import in \m{Q}), and vice versa (the usage
344 of the instantiated hole \m{A}). When building static libraries, this circular
345 dependency doesn't matter: when we link the executable, we can resolve all
346 of the symbols in one go. However, when the libraries in question are
347 dynamic libraries \verb|libHSq.so| and \verb|libHSp(q:A).so|, we now have
348 a \emph{circular dependency} between the two dynamic libraries, and most dynamic
349 linkers will not be able to load either of these libraries.
350
351 To break the circularity in Figure~\ref{fig:applicativity}b, we have to \emph{inline}
352 the entire module \m{A} into the instance of \pname{p}. Since the code is exactly
353 the same, we can still consider the instance of \m{A} in \pname{q} and in \pname{p}
354 type equal. However, in Figure~\ref{fig:applicativity}c, applicativity has been
355 done at a coarser level: although we are using Backpack's module mixin syntax,
356 morally, this example is filling in the holes with the \emph{package} \pname{a}
357 (rather than a module). In this case, we can achieve code sharing, since
358 \pname{p} can refer directly to \pname{a}, breaking the circularity.
359
360 \newcolumntype{C}{>{\centering\arraybackslash}p{0.3\textwidth}}
361 \begin{savenotes}
362 \begin{figure}
363 \begin{tabular}{C C C}
364 \begin{verbatim}
365 package q where
366 A = [ data X = X ]
367 A1 = [ import A; x = 0 ]
368 A2 = [ import A; x = 1 ]
369 include p (A as A1, P as P1)
370 include p (A as A2, P as P2)
371 Q = [ import P1; ... ]
372 \end{verbatim}
373 &
374 \begin{verbatim}
375 package q where
376 A = [ data X = X ]
377
378
379 include p (A, P as P1)
380 include p (A, P as P2)
381 Q = [ import P1; ... ]
382 \end{verbatim}
383 &
384 \begin{verbatim}
385 package a where
386 A = [ data X = X ]
387 package q where
388 include a
389 include p (A, P as P1)
390 include p (A, P as P2)
391 Q = [ import P1; ... ]
392 \end{verbatim}
393 \\
394 (a) Declaration applicativity \ding{54} &
395 (b) Module applicativity \ding{52} &
396 (c) Package applicativity \ding{52} \\
397 \end{tabular}
398 \caption{Choices of granularity of applicativity on \pname{p}: given \texttt{data Y = Y X}, is \m{P1}.\texttt{Y} equal to \m{P2}.\texttt{Y}?}\label{fig:applicativity}
399 \end{figure}
400 \end{savenotes}
401
402 \subsection{The granularity of dependency}
403
404 \begin{savenotes}
405 \newcolumntype{C}{>{\centering\arraybackslash}p{0.3\textwidth}}
406 \begin{figure}
407 \begin{tabular}{C C C}
408 \begin{verbatim}
409 package p(A,P) where
410 A :: [ data X ]
411 P = [
412 import A
413 data T = T
414 data Y = Y X
415 ]
416 \end{verbatim}
417 &
418 \begin{verbatim}
419 package p(A,P) where
420 A :: [ data X ]
421 B = [ data T = T ]
422 C = [
423 import A
424 data Y = Y X
425 ]
426 P = [
427 import B
428 import C
429 ]
430 \end{verbatim}
431 &
432 \begin{verbatim}
433 package b where
434 B = [ data T = T ]
435 package c where
436 A :: [ data X ]
437 C = [
438 import A
439 data Y = Y X
440 ]
441 package p(A,P) where
442 include b; include c
443 P = [ import B; import C ]
444 \end{verbatim}
445 \\
446 (a) Declaration granularity \ding{54} &
447 (b) Module granularity \ding{54} &
448 (c) Package granularity \ding{52} \\
449 \end{tabular}
450 \caption{Choices of granularity for dependency: is the identity of \texttt{T} independent of how \m{A} is instantiated?}\label{fig:granularity}
451 \end{figure}
452
453 \end{savenotes}
454
455 In the previous section, we considered \emph{what} entities may be considered for
456 computing dependency; in this section we consider \emph{which} entities are actually
457 considered as part of the dependencies for the declaration/module/package we're writing.
458 Figure~\ref{fig:granularity} contains a series of examples which exemplify the choice
459 of whether or not to collect dependencies on a per-declaration, per-module or per-package basis:
460
461 \begin{itemize}
462 \item Package-level granularity states that the modules in a package are
463 considered to depend on \emph{all} of the holes in the package, even if
464 the hole is never imported. Figure~\ref{fig:granularity}c is factored so that
465 \verb|T| is defined in a distinct package \pname{b} with no holes, so no matter
466 the choice of \m{A}, \m{B}.\verb|T| will be the same. On the other hand, in
467 Figure~\ref{fig:granularity}b, there is a hole in the package defining \m{B},
468 so the identity of \verb|T| will depend on the choice of \m{A}.
469
470 \item Module-level granularity states that each module has its own dependency,
471 computed by looking at its import statements. In this setting, \verb|T| in Figure~\ref{fig:granularity}b
472 is independent of \m{A}, since the hole is never imported in \m{B}. But once again, in
473 Figure~\ref{fig:granularity}a, there is an import in the module defining \verb|T|,
474 so the identity of \verb|T| once again depends on the choice of \m{A}.
475
476 \item Finally, at the finest level of granularity, one could chop up \pname{p} in
477 Figure~\ref{fig:granularity}a, looking at the type declaration-level dependency
478 to suss out whether or not \verb|T| depends on \m{A}. It doesn't refer to
479 anything in \m{A}, so it is always considered the same.
480 \end{itemize}
481
482 It is well worth noting that the system described by Paper Backpack tracks dependencies per module;
483 however, we have decided that we will implement tracking per package instead:
484 a coarser grained granularity which accepts less programs.
485
486 Is a finer form of granularity \emph{better?} Not necessarily! For
487 one, we can always split packages into further subpackages (as was done
488 in Figure~\ref{fig:granularity}c) which better reflect the internal hole
489 dependencies, so it is always possible to rewrite a program to make it
490 typecheck---just with more packages. Additionally, the finer the
491 granularity of dependency, the more work I have to do to understand what
492 the identities of entities in a module are. In Paper Backpack, I have
493 to understand the imports of all modules in a package; with
494 declaration-granularity, I have to understand the entire code. This is
495 a lot of work for the developer to think about; a more granular model is
496 easier to remember and reason about. Finally, package-level granularity
497 is much easier to implement, as it preserves the previous compilation
498 model, \emph{one library per package}. At a fine level of granularity, we
499 may end up repeatedly compiling a module which actually should be considered
500 ``the same'' as any other instance of it.
501
502 Nevertheless, finer granularity can be desirable from an end-user perspective.
503 Usually, these circumstances arise when library-writers are forced to split their
504 components into many separate packages, when they would much rather have written
505 a single package. For example, if I define a data type in my library, and would
506 like to define a \verb|Lens| instance for it, I would create a new package just
507 for the instance, in order to avoid saddling users who aren't interested in lenses
508 with an extra dependency. Another example is test suites, which have dependencies
509 on various test frameworks that a user won't care about if they are not planning
510 on testing the code. (Cabal has a special case for this, allowing the user
511 to write effectively multiple packages in a single Cabal file.)
512
513 \subsection{Summary}
514
515 We can summarize all of the various schemes by describing the internal data
516 types that would be defined by GHC under each regime. First, we have
517 the shared data structures, which correspond closely to what users are
518 used to seeing:
519
520 \begin{verbatim}
521 <pkg-name> ::= containers, ...
522 <pkg-version ::= 1.0, ...
523 <pkg-id> ::= <pkg-name>-<pkg-version>
524 <mod-name> ::= Data.Set, ...
525 <occ> ::= empty, ...
526 \end{verbatim}
527
528 Changing the \textbf{granularity of applicativity} modifies how we represent the
529 list of dependencies associated with an entity. With module applicativity,
530 we list module identities (not yet defined); with declaration applicativity
531 we actually list the original names (i.e., ids).
532
533 \begin{verbatim}
534 <deps> ::= <id>, ... # Declaration applicativity
535 <deps> ::= <module>, ... # Module applicativity
536 \end{verbatim}
537
538 Changing the \textbf{granularity of dependency} affects how we compute
539 the lists of dependencies, and what entities are well defined:
540
541 \begin{verbatim}
542 # Package-level granularity
543 <pkg-key> ::= hash(<pkg-id> + <deps for pkg>)
544 <module> ::= <pkg-key> : <mod-name>
545 <id> ::= <module> . <occ>
546
547 # Module-level granularity
548 <pkg-key> not defined
549 <module> ::= hash(<pkg-id> : <mod-name> + <deps for mod>)
550 <id> ::= <module-key> . <occ>
551
552 # Declaration-level granularity
553 <pkg-key> not defined
554 <module> not defined
555 <id> ::= hash(<pkg-id> : <mod-name> . <occ> + <deps for decl>)
556 \end{verbatim}
557
558 Notice that as we increase the granularity, the notion of a ``package'' and a ``module''
559 become undefined. This is because, for example, with module-level granularity, a single
560 ``package'' may result in several modules, each of which have different sets of
561 dependencies. It doesn't make much sense to refer to the package as a monolithic entity,
562 because the point of splitting up the dependencies was so that if a user relies only
563 on a single module, it has a correspondingly restricted set of dependencies.
564 \subsection{The new scheme, formally}
565
566 \begin{wrapfigure}{R}{0.5\textwidth}
567 \begin{myfig}
568 \[
569 \begin{array}{@{}lr@{\;}c@{\;}l@{}}
570 \text{Package Names (\texttt{PkgName})} & P &\in& \mathit{PkgNames} \\
571 \text{Module Path Names (\texttt{ModName})} & p &\in& \mathit{ModPaths} \\
572 \text{Module Identity Vars} & \alpha,\beta &\in& \mathit{IdentVars} \\
573 \text{Package Key (\texttt{PackageKey})} & \K &::=& P(\vec{p\mapsto\nu}) \\
574 \text{Module Identities (\texttt{Module})} & \nu &::=&
575 \alpha ~|~
576 \K\colon\! p \\
577 \text{Module Identity Substs} & \phi,\theta &::=&
578 \{\vec{\alpha \coloneqq \nu}\} \\
579 \end{array}
580 \]
581 \caption{Module Identities}
582 \label{fig:mod-idents}
583 \end{myfig}
584 \end{wrapfigure}
585
586 In this section, we give a formal treatment of our choice in the design space, in the
587 same style as the Backpack paper, but omitting mutual recursion, as it follows straightforwardly.
588 Physical module
589 identities $\nu$, the \texttt{Module} component of \emph{original names} in GHC, are either (1) \emph{variables} $\alpha$, which are
590 used to represent holes\footnote{In practice, these will just be fresh paths in a special package key for variables.} or (2) a concrete module $p$ defined in package
591 $P$, with holes instantiated with other module identities (might be
592 empty)\footnote{In Paper Backpack, we would refer to just $P$:$p$ as the identity
593 constructor. However, we've written the subterms specifically next to $P$ to highlight the semantic difference of these terms.}.
594
595 As in traditional Haskell, every package contains a number of module
596 files at some module path $p$; within a package these paths are
597 guaranteed to be unique.\footnote{In Paper Backpack, the module expressions themselves are used to refer to globally unique identifiers for each literal. This makes the metatheory simpler, but for implementation purposes it is convenient to conflate the \emph{original} module path that a module is defined at with its physical identity.} When we write inline module definitions, we assume
598 that they are immediately assigned to a module path $p$ which is incorporated
599 into their identity. A module identity $\nu$ simply augments this
600 with subterms $\vec{p\mapsto\nu}$ representing how \emph{all} holes in the package $P$
601 were instantiated.\footnote{In Paper Backpack, we do not distinguish between holes/non-holes, and we consider all imports of the \emph{module}, not the package.} This naming is stable because the current Backpack surface syntax does not allow a logical path in a package
602 to be undefined. A package key is $P(\vec{p\mapsto\nu})$.
603
604 Here is the very first example from
605 Section 2 of the original Backpack paper, \pname{ab-1}:
606
607 \begin{example}
608 \Pdef{ab-1}{
609 \Pmod{A}{x = True}
610 \Pmod{B}{\Mimp{A}; y = not x}
611 % \Pmodd{C}{\mname{A}}
612 }
613 \end{example}
614
615 The identities of \m{A} and \m{B} are
616 \pname{ab-1}:\mname{A} and \pname{ab-1}:\mname{B}, respectively.\footnote{In Paper Backpack, the identity for \mname{B} records its import of \mname{A}, but since it is definite, this is strictly redundant.} In a package with holes, each
617 hole (within the package definition) gets a fresh variable as its
618 identity, and all of the holes associated with package $P$ are recorded. Consider \pname{abcd-holes-1}:
619
620 \begin{example}
621 \Pdef{abcd-holes-1}{
622 \Psig{A}{x :: Bool} % chktex 26
623 \Psig{B}{y :: Bool} % chktex 26
624 \Pmod{C}{x = False}
625 \Pmodbig{D}{
626 \Mimpq{A}\\
627 \Mimpq{C}\\
628 % \Mexp{\m{A}.x, z}\\
629 z = \m{A}.x \&\& \m{C}.x
630 }
631 }
632 \end{example}
633
634 The identities of the four modules
635 are, in order, $\alpha_a$, $\alpha_b$, $\pname{abcd-holes-1}(\alpha_a,\alpha_b)$:\mname{C}, and
636 $\pname{abcd-holes-1}(\alpha_a,\alpha_b)$:\mname{D}.\footnote{In Paper Backpack, the granularity is at the module level, so the subterms of \mname{C} and \mname{D} can differ.} We include both $\alpha_a$ and $\alpha_b$ in both \mname{C} and \mname{D}, regardless of the imports. When we link the package against an implementation of the hole, these variables are replaced with the identities of the modules we linked against.
637
638 Shaping proceeds in the same way as in Paper Backpack, except that the
639 shaping judgment must also accept the package key
640 $P(\vec{p\mapsto\alpha})$ so we can create identifiers with
641 \textsf{mkident}. This implies we must know ahead of time what the holes
642 of a package are.
643
644 \paragraph{A full Backpack comparison}
645 If you're curious about how the rest of the Backpack examples translate,
646 look no further than this section.
647
648 First, consider the module identities in the \m{Graph} instantiations in
649 \pname{multinst}, shown in Figure 2 of the original Backpack paper.
650 In the definition of \pname{structures}, assume that the variables for
651 \m{Prelude} and \m{Array} are $\alpha_P$ and $\alpha_A$ respectively.
652 The identity of \m{Graph} is $\pname{structures}(\alpha_P, \alpha_A)$:\m{Graph}. Similarly, the identities of the two array implementations
653 are $\nu_{AA} = \pname{arrays-a}(\alpha_P)$:\m{Array} and
654 $\nu_{AB} = \pname{arrays-b}(\alpha_P)$:\m{Array}.\footnote{Notice that the subterms coincide with Paper Backpack! A sign that module level granularity is not necessary for many use-cases.}
655
656 The package \pname{graph-a} is more interesting because it
657 \emph{links} the packages \pname{arrays-a} and \pname{structures}
658 together, with the implementation of \m{Array} from \pname{arrays-a}
659 \emph{instantiating} the hole \m{Array} from \pname{structures}. This
660 linking is reflected in the identity of the \m{Graph} module in
661 \pname{graph-a}: whereas in \pname{structures} it was $\nu_G =
662 \pname{structures}(\alpha_P, \alpha_A)$:\m{Graph}, in \pname{graph-a} it is
663 $\nu_{GA} = \nu_G[\nu_{AA}/\alpha_A] = \pname{structures}(\alpha_P, \nu_{AA})$:\m{Graph}. Similarly, the identity of \m{Graph} in
664 \pname{graph-b} is $\nu_{GB} = \nu_G[\nu_{AB}/\alpha_A] =
665 \pname{structures}(\alpha_P, \nu_{AB})$:\m{Graph}. Thus, linking consists
666 of substituting the variable identity of a hole by the concrete
667 identity of the module filling that hole.
668
669 Lastly, \pname{multinst} makes use of both of these \m{Graph}
670 modules, under the aliases \m{GA} and \m{GB}, respectively.
671 Consequently, in the \m{Client} module, \code{\m{GA}.G} and
672 \code{\m{GB}.G} will be correctly viewed as distinct types since they
673 originate in modules with distinct identities.
674
675 As \pname{multinst} illustrates, module identities effectively encode
676 dependency graphs at the package level.\footnote{In Paper Backpack, module identities
677 encode dependency graphs at the module level. In both cases, however, what is being
678 depended on is always a module.} Like in Paper Backpack, we have an \emph{applicative}
679 semantics of instantiation, and the applicativity example in Figure 3 of the
680 Backpack paper still type checks. However, because we are operating at a coarser
681 granularity, modules may have spurious dependencies on holes that they don't
682 actually depend on, which means less type equalities may hold.
683
684
685 \subsection{Cabal dependency resolution}
686
687 Currently, when we compile a Cabal
688 package, Cabal goes ahead and resolves \verb|build-depends| entries with actual
689 implementations, which we compile against. A planned addition to the package key,
690 independent of Backpack, is to record the transitive dependency tree selected
691 during this dependency resolution process, so that we can install \pname{libfoo-1.0}
692 twice compiled against different versions of its dependencies.
693 What is the relationship to this transitive dependency tree of \emph{packages},
694 with the subterms of our package identities which are \emph{modules}? Does one
695 subsume the other? In fact, these are separate mechanisms---two levels of indirections,
696 so to speak.
697
698 To illustrate, suppose I write a Cabal file with \verb|build-depends: foobar|. A reasonable assumption is that this translates into a
699 Backpack package which has \verb|include foobar|. However, this is not
700 actually a Paper Backpack package: Cabal's dependency solver has to
701 rewrite all of these package references into versioned references
702 \verb|include foobar-0.1|. For example, this is a pre-package:
703
704 \begin{verbatim}
705 package foo where
706 include bar
707 \end{verbatim}
708
709 and this is a Paper Backpack package:
710
711 \begin{verbatim}
712 package foo-0.3[bar-0.1[baz-0.2]] where
713 include bar-0.1[baz-0.2]
714 \end{verbatim}
715
716 This tree is very similar to the one tracking dependencies for holes,
717 but we must record this tree \emph{even} when our package has no holes.
718 % As a final example, the full module
719 % identity of \m{B1} in Figure~\ref{fig:granularity} may actually be $\pname{p-0.9(q-1.0[p-0.9]:A1)}$:\m{B}.
720
721
722 \subsection{Implementation}
723
724 In GHC's current packaging system, a single package compiles into a
725 single entry in the installed package database, indexed by the package
726 key. This property is preserved by package-level granularity, as we
727 assign the same package key to all modules. Package keys provide an
728 easy mechanism for sharing to occur: when an indefinite package is fully
729 instantiated, we can check if we already have its package key installed
730 in the installed package database. (At the end of this section, we'll
731 briefly discuss some of the problems actually implementing Paper Backpack.)
732 It is also important to note that we are \emph{willing to duplicate code};
733 processes like this already happen in other parts of the compiler
734 (such as inlining.)
735
736 \paragraph{Relaxing package selection restrictions} As mentioned
737 previously, GHC is unable to select multiple packages with the same
738 package name (but different package keys). This restriction needs to be
739 lifted. We should add a new flag \verb|-package-key|. GHC also knows
740 about version numbers and will mask out old versions of a library when
741 you make another version visible; this behavior needs to be modified.
742
743 \paragraph{Linker symbols} As we increase the amount of information in
744 PackageId, it's important to be careful about the length of these IDs,
745 as they are used for exported linker symbols (e.g.
746 \verb|base_TextziReadziLex_zdwvalDig_info|). Very long symbol names
747 hurt compile and link time, object file sizes, GHCi startup time,
748 dynamic linking, and make gdb hard to use. As such, the current plan is
749 to do away with full package names and versions, and instead use just a
750 base-62 encoded hash, perhaps with the first four characters of the package
751 name for user-friendliness.
752
753 \section{Shapeless Backpack}\label{sec:simplifying-backpack}
754
755 Backpack as currently defined always requires a \emph{shaping} pass,
756 which calculates the shapes of all modules defined in a package.
757 The shaping pass is critical to the solution of the double-vision problem
758 in recursive module linking, but it also presents a number of unpalatable
759 implementation problems:
760
761 \begin{itemize}
762
763 \item \emph{Shaping is a lot of work.} A module shape specifies the
764 providence of all data types and identifiers defined by a
765 module. To calculate this, we must preprocess and parse all
766 modules, even before we do the type-checking pass. (Fortunately,
767 shaping doesn't require a full parse of a module, only enough
768 to get identifiers. However, it does have to understand import
769 statements at the same level of detail as GHC's renamer.)
770
771 \item \emph{Shaping must be done upfront.} In the current Backpack
772 design, all shapes must be computed before any typechecking can
773 occur. While performing the shaping pass upfront is necessary
774 in order to solve the double vision problem (where a module
775 identity may be influenced by later definitions), it means
776 that GHC must first do a shaping pass, and then revisit every module and
777 compile them proper. Nor is it (easily) possible to skip the
778 shaping pass when it is unnecessary, as one might expect to be
779 the case in the absence of mutual recursion. Shaping is not
780 a ``pay as you go'' language feature.
781
782 \item \emph{GHC can't compile all programs shaping accepts.} Shaping
783 accepts programs that GHC, with its current hs-boot mechanism, cannot
784 compile. In particular, GHC requires that any data type or function
785 in a signature actually be \emph{defined} in the module corresponding
786 to that file (i.e., an original name can be assigned to these entities
787 immediately.) Shaping permits unrestricted exports to implement
788 modules; this shows up in the formalism as $\beta$ module variables.
789
790 \item \emph{Shaping encourages inefficient program organization.}
791 Shaping is designed to enable mutually recursive modules, but as
792 currently implemented, mutual recursion is less efficient than
793 code without recursive dependencies. Programmers should avoid
794 this code organization, except when it is absolutely necessary.
795
796 \item \emph{GHC is architecturally ill-suited for directly
797 implementing shaping.} Shaping implies that GHC's internal
798 concept of an ``original name'' be extended to accommodate
799 module variables. This is an extremely invasive change to all
800 aspects of GHC, since the original names assumption is baked
801 quite deeply into the compiler. Plausible implementations of
802 shaping requires all these variables to be skolemized outside
803 of GHC\@.
804
805 \end{itemize}
806
807 To be clear, the shaping pass is fundamentally necessary for some
808 Backpack packages. Here is the example which convinced Simon:
809
810 \begin{verbatim}
811 package p where
812 A :: [data T; f :: T -> T]
813 B = [export T(MkT), h; import A(f); data T = MkT; h x = f MkT]
814 A = [export T(MkT), f, h; import B; f MkT = MkT]
815 \end{verbatim}
816
817 The key to this example is that B \emph{may or may not typecheck} depending
818 on the definition of A. Because A reexports B's definition T, B will
819 typecheck; but if A defined T on its own, B would not typecheck. Thus,
820 we \emph{cannot} typecheck B until we have done some analysis of A (the
821 shaping analysis!)
822
823 Thus, it is beneficial (from an optimization point of view) to
824 consider a subset of Backpack for which shaping is not necessary.
825 Here is a programming discipline which does just that, which we will call the \textbf{linking restriction}: \emph{Module implementations must be declared before
826 signatures.} Formally, this restriction modifies the rule for merging
827 polarized module shapes ($\widetilde{\tau}_1^{m_1} \oplus \widetilde{\tau}_2^{m_2}$) so that
828 $\widetilde{\tau}_1^- \oplus \widetilde{\tau}_2^+$ is always undefined.\footnote{This seemed to be the crispest way of defining the restriction, although this means an error happens a bit later than I'd like it to: I'd prefer if we errored while merging logical contexts, but we don't know what is a hole at that point.}
829
830 Here is an example of the linking restriction. Consider these two packages:
831
832 \begin{verbatim}
833 package random where
834 System.Random = [ ... ].hs
835
836 package monte-carlo where
837 System.Random :: ...
838 System.MonteCarlo = [ ... ].hs
839 \end{verbatim}
840
841 Here, random is a definite package which may have been compiled ahead
842 of time; monte-carlo is an indefinite package with a dependency
843 on any package which provides \verb|System.Random|.
844
845 Now, to link these two applications together, only one ordering
846 is permissible:
847
848 \begin{verbatim}
849 package myapp where
850 include random
851 include monte-carlo
852 \end{verbatim}
853
854 If myapp wants to provide its own random implementation, it can do so:
855
856 \begin{verbatim}
857 package myapp2 where
858 System.Random = [ ... ].hs
859 include monte-carlo
860 \end{verbatim}
861
862 In both cases, all of \verb|monte-carlo|'s holes have been filled in by the time
863 it is included. The alternate ordering is not allowed.
864
865 Why does this discipline prevent mutually recursive modules? Intuitively,
866 a hole is the mechanism by which we can refer to an implementation
867 before it is defined; otherwise, we can only refer to definitions which
868 preceed our definition. If there are never any holes \emph{which get filled},
869 implementation links can only go backwards, ruling out circularity.
870
871 It's easy to see how mutual recursion can occur if we break this discipline:
872
873 \begin{verbatim}
874 package myapp2 where
875 include monte-carlo
876 System.Random = [ import System.MonteCarlo ].hs
877 \end{verbatim}
878
879 \subsection{Typechecking of definite modules without shaping}
880
881 If we are not carrying out a shaping pass, we need to be able to calculate
882 $\widetilde{\Xi}_{\mathsf{pkg}}$ on the fly. In the case that we are
883 compiling a package---there will be no holes in the final package---we
884 can show that shaping is unnecessary quite easily, since with the
885 linking restriction, everything is definite from the get-go.
886
887 Observe the following invariant: at any given step of the module
888 bindings, the physical context $\widetilde{\Phi}$ contains no
889 holes. We can thus conclude that there are no module variables in any
890 type shapes. As the only time a previously calculated package shape can
891 change is due to unification, the incrementally computed shape is in
892 fact the true one.
893
894 As far as the implementation is concerned, we never have to worry
895 about handling module variables; we only need to do extra typechecks
896 against (renamed) interface files.
897
898 \subsection{Compiling definite packages}\label{sec:compiling}
899
900 % New definitions
901 \algnewcommand\algorithmicswitch{\textbf{switch}}
902 \algnewcommand\algorithmiccase{\textbf{case}}
903 \algnewcommand\algorithmicassert{\texttt{assert}}
904 % New "environments"
905 \algdef{SE}[SWITCH]{Switch}{EndSwitch}[1]{\algorithmicswitch\ #1\ \algorithmicdo}{\algorithmicend\ \algorithmicswitch}%
906 \algdef{SE}[CASE]{Case}{EndCase}[1]{\algorithmiccase\ ``#1''}{\algorithmicend\ \algorithmiccase}%
907 \algtext*{EndSwitch}%
908 \algtext*{EndCase}%
909
910 \begin{algorithm}
911 \caption{Compilation of definite packages (assume \texttt{-hide-all-packages} on all \texttt{ghc} invocations)}\label{alg:compile}
912 \begin{algorithmic}
913 \Procedure{Compile}{\textbf{package} $P$ \textbf{where} $\vec{B}$, $H$, $flags_P$}\Comment{}$H$ maps hole module names to identities
914 \State$flags\gets flags_P$
915 \State$\mathcal{K}$ $\gets$ \Call{Hash}{$P + H$}
916 \State$cwd\gets \textrm{fresh working directory for compilation}$
917 \For{$B$ \textbf{in} $\vec{B}$}
918 \Case{$p = p\texttt{.hs}$}
919 \State\Call{Exec}{\texttt{ghc -c} $p$\texttt{.hs} \texttt{-package-name} $\mathcal{K}$ $flags$}
920 \EndCase%
921 \Case{$p$ $\cc$ $p$\texttt{.hsig}}
922 \State\Call{Exec}{\texttt{ghc -c} $p$\texttt{.hsig} \texttt{--sig-of} $H(p)$ $flags$}
923 \EndCase%
924 \Case{$p = p'$}
925 \State$flags\gets flags$ \texttt{-alias} $p$ $p'$
926 \State\Call{Exec}{\texttt{ghc --check} $flags$}
927 \EndCase%
928 \Case{\Cinc{$P'$} $\langle\vec{p_H\mapsto p_H'}, \vec{p\mapsto p'} \rangle$}
929 \State\textbf{let} $H'(p_H) = $ \Call{Exec}{\texttt{ghc --resolve-module} $p_H'$ $flags$}
930 \State$\mathcal{K}'\gets$ \Call{Compile}{$P'$, $H'$, $flags_P$ \texttt{-package-dir} $\mathcal{K}$ $cwd$ $\langle\rangle$}\Comment{}Nota bene: not $flags$
931 \State$flags\gets flags$ \texttt{-package} $\mathcal{K}'$ $\langle\vec{p\mapsto p'}\rangle$
932 \State\Call{Exec}{\texttt{ghc --check} $flags$}
933 \EndCase%
934 \EndFor%
935 \State\Call{Exec}{\texttt{ghc-pkg copy \&\& ghc-pkg register}} \Comment{Records if $P$ exports a thinning}
936 \State\Return$\mathcal{K}$
937 \EndProcedure%
938 \end{algorithmic}
939 \end{algorithm}
940
941 The full recursive procedure for compiling a Backpack package is given
942 in Figure~\ref{alg:compile}. We recursively walk through Backpack descriptions,
943 processing each line by invoking GHC and/or modifying our package state.
944 Here is a more in-depth description of the algorith, line-by-line:
945
946 \paragraph{The parameters} To compile a package description for package
947 $P$, we need to know $H$, the mapping of holes $p_H$ in package $P$ to
948 physical module identities $\nu$ which are implementing them; this
949 mapping is used to calculate the package key $\mathcal{K}$ for the
950 package in question. Furthermore, because we are running an actual
951 compiler, and some of these implementations may not be in the package
952 database, we also need an extra set of flags $flags_P$ which tells us
953 where to find the interface and object files of any parent packages
954 which are currently compiling.
955
956 \subsection{Compiling implementations}
957
958 We compile modules in the same way we do today, but with some extra
959 package visibility $flags$, which let GHC know how to resolve imports
960 and look up original names. We'll describe what the new flags are
961 and also discuss some subtleties with module lookup.
962
963 \paragraph{New package resolution algorithm} Currently, GHC has a
964 complicated mechanism for processing \texttt{-package} (and related
965 flags), motivated by the fact that without any packages, GHC would like
966 try to make available \emph{all} packages in the package database. Here
967 is a major simplification we can make: let's assume that \emph{no
968 packages are available by default.} Now process each package flag in
969 turn, building up a set of selected packages which is initially empty.
970
971 For \texttt{-package} and \texttt{-package-id}, get the set of installed
972 packages which match the flag. (If multiple package names apply, process
973 each in turn as if it were a separate flag.) Compute the transitive
974 closure of dependencies for all of them, and filter out all choices
975 which have dependencies which are inconsistent with the current set of
976 selected packages. In the current GHC world, a dependency is consistent
977 with the selected packages if there is no package with the same package
978 name but different installed package database in the selected package
979 database. Thus, in \texttt{-package} \pname{foo-0.2} \texttt{-package} \pname{bar},
980 selections of \pname{bar} which depended on \pname{foo-0.1} would be excluded.
981 In Backpack, we will make the model is a little more sophisticated.
982
983 If there is still more than one choice, tiebreak by version, which
984 database and time of install. (The latter tiebreak should not be
985 necessary unless there are multiple instances of a package with the same
986 package ID, as might be the case when their dependencies differ.)
987 If there are no choices, error, unless the particular package that was
988 asked for is an older version of a package already in the set
989 (e.g., \texttt{-package} \pname{containers-0.9} \texttt{-package} \pname{containers-0.8}).
990
991 \paragraph{Thinning and renaming on packages} We augment the syntax of
992 \texttt{-package} to also accept a list of thinnings and renamings, e.g.
993 \texttt{-package} \pname{containers} $\langle\m{Data.Set}, \m{Data.Map}\mapsto \m{Map}\rangle$ says to make visible for import \m{Data.Set} and
994 \m{Map} (which is \m{Data.Map} renamed.)
995 With thinning and renaming, we have to slightly augment the package resolution
996 algorithm in two ways. First, as we are processing \texttt{-package} flags,
997 we now need to concurrently maintain a mapping of visible module bindings.
998 We either put all exposed modules in these mapping, or just the ones mentioned
999 by the thinning and the renaming.
1000
1001 Furthermore, it makes sense to refine our notion of conflict to apply at
1002 the module level. Rather than maintain a mapping of package names to
1003 package choices, we instead look at whether or not there are any
1004 \emph{module binding} conflicts. Thus, it would make sense to say
1005 \texttt{-package} \pname{containers-0.9} $\langle\m{Data.Set}\mapsto
1006 \m{Set09}\rangle$ \texttt{-package} \pname{containers-0.8}
1007 $\langle\m{Data.Set}\mapsto \m{Set08}\rangle$ and then use both modules
1008 concurrently, since the modules no longer are mapped onto the same name.
1009 This is different from existing GHC behavior, where two packages which
1010 export the same module can be loaded, but now GHC gives an ambiguous import
1011 error if you try to import the module. (Perhaps this could still be done
1012 here, by reporting errors lazily.)
1013
1014 Additionally, it's important to note that two packages exporting the
1015 same module do not \emph{necessarily} cause a conflict; the modules
1016 may link together. For example, \texttt{-package} \pname{containers} $\langle\m{Data.Set}\rangle$
1017 \texttt{-package} \pname{containers} $\langle\m{Data.Set}\rangle$ is fine, because
1018 precisely the same implementation of \m{Data.Set} is loaded in both cases.
1019 A similar situation can occur with signatures:
1020
1021 \begin{verbatim}
1022 package p where
1023 A :: [ x :: Int ]
1024 package q
1025 include p
1026 A :: [ y :: Int ]
1027 B = [ import A; z = x + y ] -- *
1028 package r where
1029 A = [ x = 0; y = 0 ]
1030 include q
1031 \end{verbatim}
1032
1033 Here, both \pname{p} and \pname{q} are visible when compiling the starred
1034 module, which compiles with the flags \texttt{-package} \pname{p}, but there
1035 are two interface files available: one available locally, and one from \pname{p}.
1036 Both of these interface files are \emph{forwarding} to the original implementation
1037 \pname{r} (more on this in the ``Compiling signatures'' file), so rather than
1038 reporting an ambiguous import, we instead have to merge the two interface files
1039 together and use the result as the interface for the module. (This could be done
1040 on the fly, or we could generate merged interface files as we go along.)
1041
1042 Note that we need to merge signatures with an implementation, just use the
1043 implementation interface. E.g.
1044
1045 \begin{verbatim}
1046 package p where
1047 A :: ...
1048 package q where
1049 A = ...
1050 include p
1051 B = [ import A ] -- *
1052 \end{verbatim}
1053
1054 Here, \m{A} is available both from \pname{p} and \pname{q}, but the use in the
1055 starred module should be done with respect to the full implementation.
1056
1057 \paragraph{The \texttt{-alias} flag} We introduce a new flag
1058 \texttt{-alias} for aliasing modules today. Aliasing is analogous to
1059 the merging that can occur when we include packages, but it also applies
1060 to modules which are locally defined. When we alias a module $p$ with
1061 $p'$, we require that $p'$ exists in the current module mapping, and then
1062 we attempt to add an entry for it at entry $p$. If there is no mapping for
1063 $p$, this succeeds; otherwise, we apply the same conflict resolution algorithm.
1064
1065 \paragraph{The \texttt{-package-dir} flag} We introduce a new flag \texttt{-package-dir}
1066 which takes a package key $\mathcal{K}$, a directory and a list of thinning/renamings
1067 as before. \texttt{-package-dir} works nearly the same as a \texttt{-package} flag,
1068 except that the package in question does \emph{not} have to already be installed in the
1069 package database. Instead, the files associated with the package are checked for in
1070 the directory in question. We will use this flag to refer to files in partially compiled
1071 packages which have not been installed to the package database.
1072
1073 \subsection{Compiling signatures}
1074
1075 Signature compilation is triggered when we compile a signature file.
1076 This mode similar to how we process \verb|hs-boot| files, except
1077 we pass an extra flag \verb|--sig-of| which specifies what the
1078 actual implementation of the signature is (according to our $H$
1079 mapping). This is guaranteed to exist, due to the linking
1080 restriction. (Additionally, the implementation will probably by find by looking
1081 up the package key against a \texttt{-package-dir} flag.) In this case, we output an \texttt{hisig} file which, for all declarations the
1082 signature exposes, forwards their definitions to the original
1083 implementation file. The intent is that any code in the current package
1084 which compiles against this signature will use this \texttt{hisig} file,
1085 not the original one \texttt{hi} file.
1086 For example, the \texttt{hisig} file produced when compiling the starred interface
1087 points to the implementation in package \pname{q}.
1088
1089 \begin{verbatim}
1090 package p where
1091 A :: ... -- *
1092 B = [ import A; ... ]
1093 package q where
1094 A = [ ... ]
1095 include p
1096 \end{verbatim}
1097
1098 \paragraph{Sometimes \texttt{hisig} is unnecessary}
1099 In the following package:
1100
1101 \begin{verbatim}
1102 package p where
1103 P = ...
1104 P :: ...
1105 \end{verbatim}
1106
1107 Paper Backpack specifies that we check the signature \m{P} against implementation
1108 \m{P}, but otherwise no changes are made (i.e., the signature does not narrow
1109 the implementation.) In this case, it is not necessary to generate an \texttt{hisig} file.
1110
1111 \paragraph{Multiple signatures} As a simplification, we assume that there
1112 is only one signature per logical name in a package. (This prevents
1113 us from expressing mutual recursion in signatures, but let's not worry
1114 about it for now.)
1115
1116 \paragraph{Restricted recursive modules ala hs-boot}\label{sec:hs-boot-restrict}
1117 When we compile an \texttt{hsig} file without any \texttt{--sig-of} flag (because
1118 no implementation is known), we fall back to old-style GHC mutual recursion.
1119 Na\"\i vely, a shaping pass would be necessary;
1120 so we adopt an existing constraint that
1121 already applies to hs-boot files: \emph{at the time we define a signature,
1122 we must know what the original name for all data types is}. In practice,
1123 GHC enforces this by stating that: (1) an hs-boot file must be
1124 accompanied with an implementation, and (2) the implementation must
1125 in fact define (and not reexport) all of the declarations in the signature.
1126 We can discover if a signature is intended to break a recursive module loop
1127 when we discover that $p\notin flags_H$; in this case, we fallback to the
1128 old hs-boot behavior. (Alternatively, the user can explicitly ask for it.)
1129
1130 Why does this not require a shaping pass? The reason is that the
1131 signature is not really polymorphic: we require that the $\alpha$ module
1132 variable be resolved to a concrete module later in the same package, and
1133 that all the $\beta$ module variables be unified with $\alpha$. Thus, we
1134 know ahead of time the original names and don't need to deal with any
1135 renaming.\footnote{This strategy doesn't completely resolve the problem
1136 of cross-package mutual recursion, because we need to first compile a
1137 bit of the first package (signatures), then the second package, and then
1138 the rest of the first package.}
1139
1140 Compiling packages in this way gives the tantalizing possibility
1141 of true separate compilation: the only thing we don't know is what the actual
1142 package name of an indefinite package will be, and what the correct references
1143 to have are. This is a very minor change to the assembly, so one could conceive
1144 of dynamically rewriting these references at the linking stage. But
1145 separate compilation achieved in this fashion would not be able to take
1146 advantage of cross-module optimizations.
1147
1148 \subsection{Compiling aliases}
1149
1150 Aliasing simply adds an extra \texttt{-alias} flag to the compilation flags. To
1151 eagerly report errors, we run a special command \texttt{ghc --check} which checks
1152 to make sure $flags$ is consistent (e.g., no linking conflicts.)
1153
1154 \subsection{Compiling includes}
1155
1156 Includes are the most interesting part of the compilation process, as we have
1157 calculate how the holes of the subpackage we are filling in are compiled $H'$
1158 and modify our flags to make the exports of the include visible to subsequently
1159 compiled modules. We consider the case with renaming, since includes with
1160 no renaming are straightforward.
1161
1162 First, we assume that we know \emph{a priori} what the holes of a
1163 package $p_H$ are (either by some sort of pre-pass, or explicit
1164 declaration.) For each of their \emph{renamed targets} $p'_H$, we look
1165 up the module in the current $flags$ environment, retrieving the
1166 physical module identity by consulting GHC with the
1167 \texttt{--resolve-module} flag and storing it in $H'$. (This can be done in batch.)
1168 For example:
1169
1170 \begin{verbatim}
1171 package p where
1172 A :: ...
1173 ...
1174 package q where
1175 A = [ ... ]
1176 B = [ ... ]
1177 include p (A as B)
1178 \end{verbatim}
1179
1180 When computing the entry $H(\pname{A})$, we run the command \texttt{ghc --resolve-module} \pname{B}.
1181
1182 Next, we recursively call \textsc{Compile} with the computed $H'$,
1183 but registering the current working directory as the place to look
1184 for any original names containing the parent package key $\mathcal{K}$.
1185 For example, in this situation:
1186
1187 \begin{verbatim}
1188 package p where
1189 B :: ...
1190 C = [ import B; ... ]
1191 package q where
1192 A = [ ... ]
1193 B = [ import A; ... ]
1194 include p
1195 D = [ import C; ... ]
1196 \end{verbatim}
1197
1198 When we recursively process package \pname{p}, $H$ will refer to \pname{q}:\m{B}, and we need
1199 to know where to find it (\pname{q} is only partially processed and so not installed
1200 in the package database.) Furthermore, the interface file for \m{B} may refer to \pname{q}:\m{A},
1201 and thus we likewise need to know how to find its interface file.
1202
1203 Note that we do \emph{not} import any of the modules (the
1204 thinning/renaming is empty, masking all modules); we only want to
1205 register the package key in the in-memory database. Otherwise, this
1206 example would improperly compile:
1207
1208 \begin{verbatim}
1209 package p where
1210 B = [ import A; ... ]
1211 package q where
1212 A = ...
1213 include p
1214 \end{verbatim}
1215
1216 \pname{p} does not compile on its own, so it should not compile if it is recursively
1217 invoked from \pname{q}. However, if we add \texttt{-package-dir} \pname{q} $cwd$ to
1218 the flags without masking out all modules, \m{A} is now suddenly resolvable.
1219
1220 Finally, once the subpackage is compiled, we can add it to our $flags$ so later
1221 modules we compile see its (appropriately thinned and renamed) modules, and like
1222 aliasing, we eagerly check it for consistency. This consistency check would catch
1223 the ambiguity in this example:
1224
1225 \begin{verbatim}
1226 package p where
1227 A = ...
1228 package q where
1229 A = ...
1230 include p
1231 \end{verbatim}
1232
1233 (Note that \pname{p} is a perfectly good package on its own, it is only its inclusion in \pname{q} which is problematic.)
1234
1235 If we adopt the system where we merge signature files as we go along, this checking
1236 phase would also be a good time to merge any \texttt{hisig} files from the subpackage
1237 with any in the current $flags$ environment.
1238
1239 \paragraph{Absence of an \texttt{hi} file}
1240 It is important that \texttt{--resolve-module} truly looks up the \emph{implementor}
1241 of a module, and not just a signature which is providing it at some name.
1242 Sometimes, a little extra work is necessary to compute this, for example:
1243
1244 \begin{verbatim}
1245 package p where
1246 A :: [ y :: Int ]
1247 package q where
1248 A :: [ x :: Int ]
1249 include p -- *
1250 package r where
1251 A = [ x = 0; y = 1 ]
1252 include q
1253 \end{verbatim}
1254
1255 When computing $H'$ for the starred include, our $flags$ only include
1256 \texttt{-package-dir} \pname{r} $cwd_r$ $\langle\rangle$: with a thinning
1257 that excludes all modules! The only interface file we can pick up with these
1258 $flags$ is the local definition of \m{A}. However, we \emph{absolutely}
1259 should set $H'(\m{A})=\pname{q}:\m{A}$; if we do so, then we will incorrectly
1260 conclude when compiling the signature in \pname{p} that the implementation
1261 doesn't export enough identifiers to fulfill the signature (\texttt{y} is not
1262 available from just the signature in \pname{q}). Instead, we have to look
1263 up the original implementor of \m{A} in \pname{r}, and use that in $H'$.
1264
1265 \subsection{Commentary}
1266
1267 \paragraph{Just because it compiled, doesn't mean the individual packages type check}
1268 The compilation mechanism described is slightly more permissive than vanilla Backpack.
1269 Here is a simple example:
1270
1271 \begin{verbatim}
1272 package p where
1273 A :: [ data T = T ]
1274 B :: [ data T = T ]
1275 C = [
1276 import A
1277 import B
1278 x = A.T :: B.T
1279 ]
1280 package q where
1281 A = [ data T = T ]
1282 B = A
1283 include p
1284 \end{verbatim}
1285
1286 Here, we incorrectly decide \m{A}\verb|.T| and \m{B}\verb|.T| are type
1287 equal when typechecking \m{C}, because the \verb|hisig| files we
1288 generate for them all point to the same original implementation. However,
1289 \pname{p} should not typecheck.
1290
1291 The problem here is that type checking asks ``does it compile with respect
1292 to all possible instantiations of the holes'', whereas compilation asks
1293 ``does it compile with respect to this particular instantiation of holes.''
1294 It's a bit unavoidable, really.
1295
1296 \section{Shaped Backpack}
1297
1298 Despite the simplicity of shapeless Backpack with the linking
1299 restriction in the absence of holes, we will find that when we have
1300 holes, it will be very difficult to do type-checking without
1301 some form of shaping. This section is very much a work in progress,
1302 but the ability to typecheck against holes, even with the linking restriction,
1303 is a very important part of modular separate development, so we will need
1304 to support it at some point.
1305
1306 \subsection{Efficient shaping}
1307
1308 (These are Edward's opinion, he hasn't convinced other folks that this is
1309 the right way to do it.)
1310
1311 In this section, I want to argue that, although shaping constitutes
1312 a pre-pass which must be run before compilation in earnest, it is only
1313 about as bad as the dependency resolution analysis that GHC already does
1314 in \verb|ghc -M| or \verb|ghc --make|.
1315
1316 In Paper Backpack, what information does shaping compute? It looks at
1317 exports, imports, data declarations and value declarations (but not the
1318 actual expressions associated with these values.) As a matter of fact,
1319 GHC already must look at the imports associated with a package in order
1320 to determine the dependency graph, so that it can have some order to compile
1321 modules in. There is a specialized parser which just parses these statements,
1322 and then ignores the rest of the file.
1323
1324 A bit of background: the \emph{renamer} is responsible for resolving
1325 imports and figuring out where all of these entities actually come from.
1326 SPJ would really like to avoid having to run the renamer in order to perform
1327 a shaping pass.
1328
1329 \paragraph{Is it necessary to run the Renamer to do shaping?}
1330 Edward and Scott believe the answer is no, well, partially.
1331 Shaping needs to know the original names of all entities exposed by a
1332 module/signature. Then it needs to know (a) which entities a module/signature
1333 defines/declares locally and (b) which entities that module/signature exports.
1334 The former, (a), can be determined by a straightforward inspection of a parse
1335 tree of the source file.\footnote{Note that no expression or type parsing
1336 is necessary. We only need names of local values, data types, and data
1337 constructors.} The latter, (b), is a bit trickier. Right now it's the Renamer
1338 that interprets imports and exports into original names, so we would still
1339 rely on that implementation. However, the Renamer does other, harder things
1340 that we don't need, so ideally we could factor out the import/export
1341 resolution from the Renamer for use in shaping.
1342
1343 Unfortunately the Renamer's import resolution analyzes \verb|.hi| files, but for
1344 local modules, which haven't yet been typechecked, we don't have those.
1345 Instead, we could use a new file format, \verb|.hsi| files, to store the shape of
1346 a locally defined module. (Defined packages are bundled with their shapes,
1347 so included modules have \verb|.hsi| files as well.) (What about the logical
1348 vs.~physical distinction in file names?) If we refactor the import/export
1349 resolution code, could we rewrite it to generically operate on both
1350 \verb|.hi| files and \verb|.hsi| files?
1351
1352 Alternatively, rather than storing shapes on a per-source basis, we could
1353 store (in memory) the entire package shape. Similarly, included packages
1354 could have a single shape file for the entire package. Although this approach
1355 would make shaping non-incremental, since an entire package's shape would
1356 be recomputed any time a constituent module's shape changes, we do not expect
1357 shaping to be all that expensive.
1358
1359 \subsection{Typechecking of indefinite modules}\label{sec:typechecking-indefinite}
1360
1361 Recall in our argument in the definite case, where we showed there are
1362 no holes in the physical context. With indefinite modules, this is no
1363 longer true. While (with the linking restriction) these holes will never
1364 be linked against a physical implementation, they may be linked against
1365 other signatures. (Note: while disallowing signature linking would
1366 solve our problem, it would disallow a wide array of useful instances of
1367 signature reuse, for example, a package mylib that implements both
1368 mylib-1x-sig and mylib-2x-sig.)
1369
1370 With holes, we must handle module variables, and we sometimes must unify them:
1371
1372 \begin{verbatim}
1373 package p where
1374 A :: [ data A ]
1375 package q where
1376 A :: [ data A ]
1377 package r where
1378 include p
1379 include q
1380 \end{verbatim}
1381
1382 In this package, it is not possible to a priori assign original names to
1383 module A in p and q, because in package r, they should have the same
1384 original name. When signature linking occurs, unification may occur,
1385 which means we have to rename all relevant original names. (A similar
1386 situation occurs when a module is typechecked against a signature.)
1387
1388 An invariant which would be nice to have is this: when typechecking a
1389 signature or including a package, we may apply renaming to the entities
1390 being brought into context. But once we've picked an original name for
1391 our entities, no further renaming should be necessary. (Formally, in the
1392 unification for semantic object shapes, apply the unifier to the second
1393 shape, but not the first one.)
1394
1395 However, there are plenty of counterexamples here:
1396
1397 \begin{verbatim}
1398 package p where
1399 A :: [ data A ]
1400 B :: [ data A ]
1401 M = ...
1402 A = B
1403 \end{verbatim}
1404
1405 In this package, does module M know that A.A and B.A are type equal? In
1406 fact, the shaping pass will have assigned equal module identities to A
1407 and B, so M \emph{equates these types}, despite the aliasing occurring
1408 after the fact.
1409
1410 We can make this example more sophisticated, by having a later
1411 subpackage which causes the aliasing; now, the decision is not even a
1412 local one (on the other hand, the equality should be evident by inspection
1413 of the package interface associated with q):
1414
1415 \begin{verbatim}
1416 package p where
1417 A :: [ data A ]
1418 B :: [ data A ]
1419 package q where
1420 A :: [ data A ]
1421 B = A
1422 package r where
1423 include p
1424 include q
1425 \end{verbatim}
1426
1427 Another possibility is that it might be acceptable to do a mini-shaping
1428 pass, without parsing modules or signatures, \emph{simply} looking at
1429 names and aliases. But logical names are not the only mechanism by
1430 which unification may occur:
1431
1432 \begin{verbatim}
1433 package p where
1434 C :: [ data A ]
1435 A = [ data A = A ]
1436 B :: [ import A(A) ]
1437 C = B
1438 \end{verbatim}
1439
1440 It is easy to conclude that the original names of C and B are the same. But
1441 more importantly, C.A must be given the original name of p:A.A. This can only
1442 be discovered by looking at the signature definition for B. In any case, it
1443 is worth noting that this situation parallels the situation with hs-boot
1444 files (although there is no mutual recursion here).
1445
1446 The conclusion is that you will probably, in fact, have to do real
1447 shaping in order to typecheck all of these examples.
1448
1449 \paragraph{Hey, these signature imports are kind of tricky\ldots}
1450
1451 When signatures and modules are interleaved, the interaction can be
1452 complex. Here is an example:
1453
1454 \begin{verbatim}
1455 package p where
1456 C :: [ data A ]
1457 M = [ import C; ... ]
1458 A = [ import M; data A = A ]
1459 C :: [ import A(A) ]
1460 \end{verbatim}
1461
1462 Here, the second signature for C refers to a module implementation A
1463 (this is permissible: it simply means that the original name for p:C.A
1464 is p:A.A). But wait! A relies on M, and M relies on C. Circularity?
1465 Fortunately not: a client of package p will find it impossible to have
1466 the hole C implemented in advance, since they will need to get their hands on module
1467 A\ldots but it will not be defined prior to package p.
1468
1469 In any case, however, it would be good to emit a warning if a package
1470 cannot be compiled without mutual recursion.
1471
1472 \subsection{Rename on entry}
1473
1474 Consider the following example:
1475
1476 \begin{verbatim}
1477 package p where
1478 A :: [ data T = T ]
1479 B = [ import A; x = T ]
1480 package q where
1481 C :: ...
1482 A = [ data T = T ]
1483 include p
1484 D = [
1485 import qualified A
1486 import qualified B
1487 import C
1488 x = B.T :: A.T
1489 ]
1490 \end{verbatim}
1491
1492 We are interested in type-checking \pname{q}, which is an indefinite package
1493 on account of the uninstantiated hole \m{C}. Furthermore, let's suppose that
1494 \pname{p} has already been independently typechecked, and its interface files
1495 installed in some global location with $\alpha_A$ used as the module identity
1496 of \m{A}. (To simplify this example, we'll assume $\beta_{AT}=\alpha_A$.)
1497
1498 The first three lines of \pname{q} type check in the normal way, but \m{D}
1499 now poses a problem: if we load the interface file for \m{B} the normal way,
1500 we will get a reference to type \texttt{T} with the original name $\alpha_A$.\texttt{T},
1501 whereas from \m{A} we have an original name \pname{q}:\m{A}.\texttt{T}.
1502
1503 Let's suppose that we already have the result of a shaping pass, which
1504 maps our identity variables to their true identities.
1505 Let's consider the possible options here:
1506
1507 \begin{itemize}
1508 \item We could re-typecheck \pname{p}, feeding it the correct instantiations
1509 for its variables. However, this seems wasteful: we typechecked the
1510 package already, and up-to-renaming, the interface files are exactly
1511 what we need to type check our application.
1512 \item We could make copies of all the interface files, renamed to have the
1513 right original names. This also seems wasteful: why should we have to
1514 create a new copy of every interface file in a library we depend on?
1515 \item When \emph{reading in} the interface file to GHC, we could apply the
1516 renaming according to the shaping pass and store that in memory.
1517 \end{itemize}
1518
1519 That last solution is pretty appealing, however, there are still circumstances
1520 we need to create new interface files; these exactly mirror the cases described
1521 in Section~\ref{sec:compiling}.
1522
1523 \subsection{Incremental typechecking}
1524 We want to typecheck modules incrementally, i.e., when something changes in
1525 a package, we only want to re-typecheck the modules that care about that
1526 change. GHC already does this today.%
1527 \footnote{\url{https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/RecompilationAvoidance}}
1528 Is the same mechanism sufficient for Backpack? Edward and Scott think that it
1529 is, mostly. Our conjecture is that a module should be re-typechecked if the
1530 existing mechanism says it should \emph{or} if the logical shape
1531 context (which maps logical names to physical names) has changed. The latter
1532 condition is due to aliases that affect typechecking of modules.
1533
1534 Let's look again at an example from before:
1535 \begin{verbatim}
1536 package p where
1537 A :: [ data A ]
1538 B :: [ data A ]
1539 M = [ import A; import B; ... ]
1540 \end{verbatim}
1541 Let's say that \verb|M| is typechecked successfully. Now we add an alias binding
1542 at the end of the package, \verb|A = B|. Does \verb|M| need to be
1543 re-typechecked? Yes! (Well, it seems so, but let's just assert ``yes'' for now.
1544 Certainly in the reverse case---if we remove the alias and then ask---this
1545 is true, since \verb|M| might have depended on the two \verb|A| types
1546 being the same.)
1547 The logical shape context changed to say that \verb|A| and
1548 \verb|B| now map to the same physical module identity. But does the existing
1549 recompilation avoidance mechanism say that \verb|M| should be re-typechecked?
1550 It's unclear. The \verb|.hi| file for \verb|M| records that it imported \verb|A| and
1551 \verb|B| with particular ABIs, but does it also know about the physical module
1552 identities (or rather, original module names) of these modules?
1553
1554 Scott thinks this highlights the need for us to get our story straight about
1555 the connection between logical names, physical module identities, and file
1556 names!
1557
1558
1559 \subsection{Installing indefinite packages}\label{sec:installing-indefinite}
1560
1561 If an indefinite package contains no code at all, we only need
1562 to install the interface file for the signatures. However, if
1563 they include code, we must provide all of the
1564 ingredients necessary to compile them when the holes are linked against
1565 actual implementations. (Figure~\ref{fig:pkgdb})
1566
1567 \paragraph{Source tarball or preprocessed source?} What is the representation of the source that is saved is. There
1568 are a number of possible choices:
1569
1570 \begin{itemize}
1571 \item The original tarballs downloaded from Hackage,
1572 \item Preprocessed source files,
1573 \item Some sort of internal, type-checked representation of Haskell code (maybe the output of the desugarer).
1574 \end{itemize}
1575
1576 Storing the tarballs is the simplest and most straightforward mechanism,
1577 but we will have to be very certain that we can recompile the module
1578 later in precisely the same we compiled it originally, to ensure the hi
1579 files match up (fortunately, it should be simple to perform an optional
1580 sanity check before proceeding.) The appeal of saving preprocessed
1581 source, or even the IRs, is that this is conceptually this is exactly
1582 what an indefinite package is: we have paused the compilation process
1583 partway, intending to finish it later. However, our compilation strategy
1584 for definite packages requires us to run this step using a \emph{different}
1585 choice of original names, so it's unclear how much work could actually be reused.
1586
1587 \section{Surface syntax}
1588
1589 In the Backpack paper, a brand new module language is presented, with
1590 syntax for inline modules and signatures. This syntax is probably worth implementing,
1591 because it makes it easy to specify compatibility packages, whose module
1592 definitions in general may be very short:
1593
1594 \begin{verbatim}
1595 package ishake-0.12-shake-0.13 where
1596 include shake-0.13
1597 Development.Shake.Sys = Development.Shake.Cmd
1598 Development.Shake = [ (**>) = (&>) ; (*>>) = (|*>)]
1599 Development.Shake.Rule = [ defaultPriority = rule . priority 0.5 ]
1600 include ishake-0.12
1601 \end{verbatim}
1602
1603 However, there are a few things that are less than ideal about the
1604 surface syntax proposed by Paper Backpack:
1605
1606 \begin{itemize}
1607 \item It's completely different from the current method users
1608 specify packages. There's nothing wrong with this per se
1609 (one simply needs to support both formats) but the smaller
1610 the delta, the easier the new packaging format is to explain
1611 and implement.
1612
1613 \item Sometimes order matters (relative ordering of signatures and
1614 module implementations), and other times it does not (aliases).
1615 This can be confusing for users.
1616
1617 \item Users have to order module definitions topologically,
1618 whereas in current Cabal modules can be listed in any order, and
1619 GHC figures out an appropriate order to compile them.
1620 \end{itemize}
1621
1622 Here is an alternative proposal, closely based on Cabal syntax. Given
1623 the following Backpack definition:
1624
1625 \begin{verbatim}
1626 package libfoo(A, B, C, Foo) where
1627 include base
1628 -- renaming and thinning
1629 include libfoo (Foo, Bar as Baz)
1630 -- holes
1631 A :: [ a :: Bool ].hsig
1632 A2 :: [ b :: Bool ].hsig
1633 -- normal module
1634 B = [
1635 import {-# SOURCE #-} A
1636 import Foo
1637 import Baz
1638 ...
1639 ].hs
1640 -- recursively linked pair of modules, one is private
1641 C :: [ data C ].hsig
1642 D = [ import {-# SOURCE #-} C; data D = D C ].hs
1643 C = [ import D; data C = C D ].hs
1644 -- alias
1645 A = A2
1646 \end{verbatim}
1647
1648 We can write the following Cabal-like syntax instead (where
1649 all of the signatures and modules are placed in appropriately
1650 named files):
1651
1652 \begin{verbatim}
1653 package: libfoo
1654 ...
1655 build-depends: base, libfoo (Foo, Bar as Baz)
1656 holes: A A2 -- deferred for now
1657 exposed-modules: Foo B C
1658 aliases: A = A2
1659 other-modules: D
1660 \end{verbatim}
1661
1662 Notably, all of these lists are \emph{insensitive} to ordering!
1663 The key idea is use of the \verb|{-# SOURCE #-}| pragma, which
1664 is enough to solve the important ordering constraint between
1665 signatures and modules.
1666
1667 Here is how the elaboration works. For simplicity, in this algorithm
1668 description, we assume all packages being compiled have no holes
1669 (including \verb|build-depends| packages). Later, we'll discuss how to
1670 extend the algorithm to handle holes in both subpackages and the main
1671 package itself.
1672
1673 \begin{enumerate}
1674
1675 \item At the top-level with \verb|package| $p$ and
1676 \verb|exposed-modules| $ms$, record \verb|package p (ms) where|
1677
1678 \item For each package $p$ with thinning/renaming $ms$ in
1679 \verb|build-depends|, record a \verb|include p (ms)| in the
1680 Backpack package. The ordering of these includes does not
1681 matter, since none of these packages have holes.
1682
1683 \item Take all modules $m$ in \verb|other-modules| and
1684 \verb|exposed-modules| which were not exported by build
1685 dependencies, and create a directed graph where hs and hs-boot
1686 files are nodes and imports are edges (the target of an edge is
1687 an hs file if it is a normal import, and an hs-boot file if it
1688 is a SOURCE import). Topologically sort this graph, erroring if
1689 this graph contains cycles (even with recursive modules, the
1690 cycle should have been broken by an hs-boot file). For each
1691 node, in this order, record \verb|M = [ ... ]| or \verb|M :: [ ... ]|
1692 depending on whether or not it is an hs or hs-boot. If possible,
1693 sort signatures before implementations when there is no constraint
1694 otherwise.
1695
1696 \end{enumerate}
1697
1698 Here is a simple example which shows how SOURCE can be used to disambiguate
1699 between two important cases. Suppose we have these modules:
1700
1701 \begin{verbatim}
1702 -- A1.hs
1703 import {-# SOURCE #-} B
1704
1705 -- A2.hs
1706 import B
1707
1708 -- B.hs
1709 x = True
1710
1711 -- B.hs-boot
1712 x :: Bool
1713 \end{verbatim}
1714
1715 Then we translate the following packages as follows:
1716
1717 \begin{verbatim}
1718 exposed-modules: A1 B
1719 -- translates to
1720 B :: [ x :: Bool ]
1721 A1 = [ import B ]
1722 B = [ x = True ]
1723 \end{verbatim}
1724
1725 but
1726
1727 \begin{verbatim}
1728 exposed-modules: A2 B
1729 -- translates to
1730 B = [ x = True ]
1731 B :: [ x :: Bool ]
1732 A2 = [ import B ]
1733 \end{verbatim}
1734
1735 The import controls placement between signature and module, and in A1 it
1736 forces B's signature to be sorted before B's implementation (whereas in
1737 the second section, there is no constraint so we preferentially place
1738 the B's implementation first)
1739
1740 \paragraph{Holes in the database} In the presence of holes,
1741 \verb|build-depends| resolution becomes more complicated. First,
1742 let's consider the case where the package we are building is
1743 definite, but the package database contains indefinite packages with holes.
1744 In order to maintain the linking restriction, we now have to order packages
1745 from step (2) of the previous elaboration. We can do this by creating
1746 a directed graph, where nodes are packages and edges are from holes to the
1747 package which implements them. If there is a cycle, this indicates a mutually
1748 recursive package. In the absence of cycles, a topological sorting of this
1749 graph preserves the linking invariant.
1750
1751 One subtlety to consider is the fact that an entry in \verb|build-depends|
1752 can affect how a hole is instantiated by another entry. This might be a
1753 bit weird to users, who might like to explicitly say how holes are
1754 filled when instantiating a package. Food for thought, surface syntax wise.
1755
1756 \paragraph{Holes in the package} Actually, this is quite simple: the
1757 ordering of includes goes as before, but some indefinite packages in the
1758 database are less constrained as they're ``dependencies'' are fulfilled
1759 by the holes at the top-level of this package. It's also worth noting
1760 that some dependencies will go unresolved, since the following package
1761 is valid:
1762
1763 \begin{verbatim}
1764 package a where
1765 A :: ...
1766 package b where
1767 include a
1768 \end{verbatim}
1769
1770 \paragraph{Multiple signatures} In Backpack syntax, it's possible to
1771 define a signature multiple times, which is necessary for mutually
1772 recursive signatures:
1773
1774 \begin{verbatim}
1775 package a where
1776 A :: [ data A ]
1777 B :: [ import A; data B = B A ]
1778 A :: [ import B; data A = A B ]
1779 \end{verbatim}
1780
1781 Critically, notice that we can see the constructors for both module B and A
1782 after the signatures are linked together. This is not possible in GHC
1783 today, but could be possible by permitting multiple hs-boot files. Now
1784 the SOURCE pragma indicating an import must \emph{disambiguate} which
1785 hs-boot file it intends to include. This might be one way of doing it:
1786
1787 \begin{verbatim}
1788 -- A.hs-boot2
1789 data A
1790
1791 -- B.hs-boot
1792 import {-# SOURCE hs-boot2 #-} A
1793
1794 -- A.hs-boot
1795 import {-# SOURCE hs-boot #-} B
1796 \end{verbatim}
1797
1798 \paragraph{Explicit or implicit reexports} One annoying property of
1799 this proposal is that, looking at the \verb|exposed-modules| list, it is
1800 not immediately clear what source files one would expect to find in the
1801 current package. It's not obvious what the proper way to go about doing
1802 this is.
1803
1804 \paragraph{Better syntax for SOURCE} If we enshrine the SOURCE import
1805 as a way of solving Backpack ordering problems, it would be nice to have
1806 some better syntax for it. One possibility is:
1807
1808 \begin{verbatim}
1809 abstract import Data.Foo
1810 \end{verbatim}
1811
1812 which makes it clear that this module is pluggable, typechecking against
1813 a signature. Note that this only indicates how type checking should be
1814 done: when actually compiling the module we will compile against the
1815 interface file for the true implementation of the module.
1816
1817 It's worth noting that the SOURCE annotation was originally made a
1818 pragma because, in principle, it should have been possible to compile
1819 some recursive modules without needing the hs-boot file at all. But if
1820 we're moving towards boot files as signatures, this concern is less
1821 relevant.
1822
1823 \section{Type classes and type families}
1824
1825 \subsection{Background}
1826
1827 Before we talk about how to support type classes in Backpack, it's first
1828 worth talking about what we are trying to achieve in the design. Most
1829 would agree that \emph{type safety} is the cardinal law that should be
1830 preserved (in the sense that segfaults should not be possible), but
1831 there are many instances of ``bad behavior'' (top level mutable state,
1832 weakening of abstraction guarantees, ambiguous instance resolution, etc)
1833 which various Haskellers may disagree on the necessity of ruling out.
1834
1835 With this in mind, it is worth summarizing what kind of guarantees are
1836 presently given by GHC with regards to type classes and type families,
1837 as well as characterizing the \emph{cultural} expectations of the
1838 Haskell community.
1839
1840 \paragraph{Type classes} When discussing type class systems, there are
1841 several properties that one may talk about.
1842 A set of instances is \emph{confluent} if, no matter what order
1843 constraint solving is performed, GHC will terminate with a canonical set
1844 of constraints that must be satisfied for any given use of a type class.
1845 In other words, confluence says that we won't conclude that a program
1846 doesn't type check just because we swapped in a different constraint
1847 solving algorithm.
1848
1849 Confluence's closely related twin is \emph{coherence} (defined in ``Type
1850 classes: exploring the design space''). This property states that
1851 ``every different valid typing derivation of a program leads to a
1852 resulting program that has the same dynamic semantics.'' Why could
1853 differing typing derivations result in different dynamic semantics? The
1854 answer is that context reduction, which picks out type class instances,
1855 elaborates into concrete choices of dictionaries in the generated code.
1856 Confluence is a prerequisite for coherence, since one
1857 can hardly talk about the dynamic semantics of a program that doesn't
1858 type check.
1859
1860 In the vernacular, confluence and coherence are often incorrectly used
1861 to refer to another related property: \emph{global uniqueness of instances},
1862 which states that in a fully compiled program, for any type, there is at most one
1863 instance resolution for a given type class. Languages with local type
1864 class instances such as Scala generally do not have this property, and
1865 this assumption is frequently used for abstraction.
1866
1867 So, what properties does GHC enforce, in practice?
1868 In the absence of any type system extensions, GHC's employs a set of
1869 rules (described in ``Exploring the design space'') to ensure that type
1870 class resolution is confluent and coherent. Intuitively, it achieves
1871 this by having a very simple constraint solving algorithm (generate
1872 wanted constraints and solve wanted constraints) and then requiring the
1873 set of instances to be \emph{nonoverlapping}, ensuring there is only
1874 ever one way to solve a wanted constraint. Overlap is a
1875 more stringent restriction than either confluence or coherence, and
1876 via the \verb|OverlappingInstances| and \verb|IncoherentInstances|, GHC
1877 allows a user to relax this restriction ``if they know what they're doing.''
1878
1879 Surprisingly, however, GHC does \emph{not} enforce global uniqueness of
1880 instances. Imported instances are not checked for overlap until we
1881 attempt to use them for instance resolution. Consider the following program:
1882
1883 \begin{verbatim}
1884 -- T.hs
1885 data T = T
1886 -- A.hs
1887 import T
1888 instance Eq T where
1889 -- B.hs
1890 import T
1891 instance Eq T where
1892 -- C.hs
1893 import A
1894 import B
1895 \end{verbatim}
1896
1897 When compiled with one-shot compilation, \verb|C| will not report
1898 overlapping instances unless we actually attempt to use the \verb|Eq|
1899 instance in C.\footnote{When using batch compilation, GHC reuses the
1900 instance database and is actually able to detect the duplicated
1901 instance when compiling B. But if you run it again, recompilation
1902 avoidance skips A, and it finishes compiling! See this bug:
1903 \url{https://ghc.haskell.org/trac/ghc/ticket/5316}} This is by
1904 design\footnote{\url{https://ghc.haskell.org/trac/ghc/ticket/2356}}:
1905 ensuring that there are no overlapping instances eagerly requires
1906 eagerly reading all the interface files a module may depend on.
1907
1908 We might summarize these three properties in the following manner.
1909 Culturally, the Haskell community expects \emph{global uniqueness of instances}
1910 to hold: the implicit global database of instances should be
1911 confluent and coherent. GHC, however, does not enforce uniqueness of
1912 instances: instead, it merely guarantees that the \emph{subset} of the
1913 instance database it uses when it compiles any given module is confluent and coherent. GHC does do some
1914 tests when an instance is declared to see if it would result in overlap
1915 with visible instances, but the check is by no means
1916 perfect\footnote{\url{https://ghc.haskell.org/trac/ghc/ticket/9288}};
1917 truly, \emph{type-class constraint resolution} has the final word. One
1918 mitigating factor is that in the absence of \emph{orphan instances}, GHC is
1919 guaranteed to eagerly notice when the instance database has overlap.\footnote{Assuming that the instance declaration checks actually worked\ldots}
1920
1921 Clearly, the fact that GHC's lazy behavior is surprising to most
1922 Haskellers means that the lazy check is mostly good enough: a user
1923 is likely to discover overlapping instances one way or another.
1924 However, it is relatively simple to construct example programs which
1925 violate global uniqueness of instances in an observable way:
1926
1927 \begin{verbatim}
1928 -- A.hs
1929 module A where
1930 data U = X | Y deriving (Eq, Show)
1931
1932 -- B.hs
1933 module B where
1934 import Data.Set
1935 import A
1936
1937 instance Ord U where
1938 compare X X = EQ
1939 compare X Y = LT
1940 compare Y X = GT
1941 compare Y Y = EQ
1942
1943 ins :: U -> Set U -> Set U
1944 ins = insert
1945
1946 -- C.hs
1947 module C where
1948 import Data.Set
1949 import A
1950
1951 instance Ord U where
1952 compare X X = EQ
1953 compare X Y = GT
1954 compare Y X = LT
1955 compare Y Y = EQ
1956
1957 ins' :: U -> Set U -> Set U
1958 ins' = insert
1959
1960 -- D.hs
1961 module Main where
1962 import Data.Set
1963 import A
1964 import B
1965 import C
1966
1967 test :: Set U
1968 test = ins' X $ ins X $ ins Y $ empty
1969
1970 main :: IO ()
1971 main = print test
1972
1973 -- OUTPUT
1974 $ ghc -Wall -XSafe -fforce-recomp --make D.hs
1975 [1 of 4] Compiling A ( A.hs, A.o )
1976 [2 of 4] Compiling B ( B.hs, B.o )
1977
1978 B.hs:5:10: Warning: Orphan instance: instance [safe] Ord U
1979 [3 of 4] Compiling C ( C.hs, C.o )
1980
1981 C.hs:5:10: Warning: Orphan instance: instance [safe] Ord U
1982 [4 of 4] Compiling Main ( D.hs, D.o )
1983 Linking D ...
1984 $ ./D
1985 fromList [X,Y,X]
1986 \end{verbatim}
1987
1988 Locally, all type class resolution was coherent: in the subset of
1989 instances each module had visible, type class resolution could be done
1990 unambiguously. Furthermore, the types of \verb|ins| and \verb|ins'|
1991 discharge type class resolution, so that in \verb|D| when the database
1992 is now overlapping, no resolution occurs, so the error is never found.
1993
1994 It is easy to dismiss this example as an implementation wart in GHC, and
1995 continue pretending that global uniqueness of instances holds. However,
1996 the problem with \emph{global uniqueness of instances} is that they are
1997 inherently nonmodular: you might find yourself unable to compose two
1998 components because they accidentally defined the same type class
1999 instance, even though these instances are plumbed deep in the
2000 implementation details of the components.
2001
2002 As it turns out, there is already another feature in Haskell which
2003 must enforce global uniqueness, to prevent segfaults.
2004 We now turn to type classes' close cousin: type families.
2005
2006 \paragraph{Type families} With type families, confluence is the primary
2007 property of interest. (Coherence is not of much interest because type
2008 families are elaborated into coercions, which don't have any
2009 computational content.) Rather than considering what the set of
2010 constraints we reduce to, confluence for type families considers the
2011 reduction of type families. The overlap checks for type families
2012 can be quite sophisticated, especially in the case of closed type
2013 families.
2014
2015 Unlike type classes, however, GHC \emph{does} check the non-overlap
2016 of type families eagerly. The analogous program does \emph{not} type check:
2017
2018 \begin{verbatim}
2019 -- F.hs
2020 type family F a :: *
2021 -- A.hs
2022 import F
2023 type instance F Bool = Int
2024 -- B.hs
2025 import F
2026 type instance F Bool = Bool
2027 -- C.hs
2028 import A
2029 import B
2030 \end{verbatim}
2031
2032 The reason is that it is \emph{unsound} to ever allow any overlap
2033 (unlike in the case of type classes where it just leads to incoherence.)
2034 Thus, whereas one might imagine dropping the global uniqueness of instances
2035 invariant for type classes, it is absolutely necessary to perform global
2036 enforcement here. There's no way around it!
2037
2038 \subsection{Local type classes}
2039
2040 Here, we say \textbf{NO} to global uniqueness.
2041
2042 This design is perhaps best discussed in relation to modular type
2043 classes, which shares many similar properties. Instances are now
2044 treated as first class objects (in MTCs, they are simply modules)---we
2045 may explicitly hide or include instances for type class resolution (in
2046 MTCs, this is done via the \verb|using| top-level declaration). This is
2047 essentially what was sketched in Section 5 of the original Backpack
2048 paper. As a simple example:
2049
2050 \begin{verbatim}
2051 package p where
2052 A :: [ data T = T ]
2053 B = [ import A; instance Eq T where ... ]
2054
2055 package q where
2056 A = [ data T = T; instance Eq T where ... ]
2057 include p
2058 \end{verbatim}
2059
2060 Here, \verb|B| does not see the extra instance declared by \verb|A|,
2061 because it was thinned from its signature of \verb|A| (and thus never
2062 declared canonical.) To declare an instance without making it
2063 canonical, it must placed in a separate (unimported) module.
2064
2065 Like modular type classes, Backpack does not give rise to incoherence,
2066 because instance visibility can only be changed at the top level module
2067 language, where it is already considered best practice to provide
2068 explicit signatures. Here is the example used in the Modular Type
2069 Classes paper to demonstrate the problem:
2070
2071 \begin{verbatim}
2072 structure A = using EqInt1 in
2073 struct ...fun f x = eq(x,x)... end
2074 structure B = using EqInt2 in
2075 struct ...val y = A.f(3)... end
2076 \end{verbatim}
2077
2078 Is the type of f \verb|int -> bool|, or does it have a type-class
2079 constraint? Because type checking proceeds over the entire program, ML
2080 could hypothetically pick either. However, ported to Haskell, the
2081 example looks like this:
2082
2083 \begin{verbatim}
2084 EqInt1 :: [ instance Eq Int ]
2085 EqInt2 :: [ instance Eq Int ]
2086 A = [
2087 import EqInt1
2088 f x = x == x
2089 ]
2090 B = [
2091 import EqInt2
2092 import A hiding (instance Eq Int)
2093 y = f 3
2094 ]
2095 \end{verbatim}
2096
2097 There may be ambiguity, yes, but it can be easily resolved by the
2098 addition of a top-level type signature to \verb|f|, which is considered
2099 best-practice anyway. Additionally, Haskell users are trained to expect
2100 a particular inference for \verb|f| in any case (the polymorphic one).
2101
2102 Here is another example which might be considered surprising:
2103
2104 \begin{verbatim}
2105 package p where
2106 A :: [ data T = T ]
2107 B :: [ data T = T ]
2108 C = [
2109 import qualified A
2110 import qualified B
2111 instance Show A.T where show T = "A"
2112 instance Show B.T where show T = "B"
2113 x :: String
2114 x = show A.T ++ show B.T
2115 ]
2116 \end{verbatim}
2117
2118 In the original Backpack paper, it was implied that module \verb|C|
2119 should not type check if \verb|A.T = B.T| (failing at link time).
2120 However, if we set aside, for a moment, the issue that anyone who
2121 imports \verb|C| in such a context will now have overlapping instances,
2122 there is no reason in principle why the module itself should be
2123 problematic. Here is the example in MTCs, which I have good word from
2124 Derek does type check.
2125
2126 \begin{verbatim}
2127 signature SIG = sig
2128 type t
2129 val mk : t
2130 end
2131 signature SHOW = sig
2132 type t
2133 val show : t -> string
2134 end
2135 functor Example (A : SIG) (B : SIG) =
2136 let structure ShowA : SHOW = struct
2137 type t = A.t
2138 fun show _ = "A"
2139 end in
2140 let structure ShowB : SHOW = struct
2141 type t = B.t
2142 fun show _ = "B"
2143 end in
2144 using ShowA, ShowB in
2145 struct
2146 val x = show A.mk ++ show B.mk
2147 end : sig val x : string end
2148 \end{verbatim}
2149
2150 The moral of the story is, even though in a later context the instances
2151 are overlapping, inside the functor, the type-class resolution is unambiguous
2152 and should be done (so \verb|x = "AB"|).
2153
2154 Up until this point, we've argued why MTCs and this Backpack design are similar.
2155 However, there is an important sociological difference between modular type-classes
2156 and this proposed scheme for Backpack. In the presentation ``Why Applicative
2157 Functors Matter'', Derek mentions the canonical example of defining a set:
2158
2159 \begin{verbatim}
2160 signature ORD = sig type t; val cmp : t -> t -> bool end
2161 signature SET = sig type t; type elem;
2162 val empty : t;
2163 val insert : elem -> t -> t ...
2164 end
2165 functor MkSet (X : ORD) :> SET where type elem = X.t
2166 = struct ... end
2167 \end{verbatim}
2168
2169 This is actually very different from how sets tend to be defined in
2170 Haskell today. If we directly encoded this in Backpack, it would
2171 look like this:
2172
2173 \begin{verbatim}
2174 package mk-set where
2175 X :: [
2176 data T
2177 cmp :: T -> T-> Bool
2178 ]
2179 Set :: [
2180 data Set
2181 empty :: Set
2182 insert :: T -> Set -> Set
2183 ]
2184 Set = [
2185 import X
2186 ...
2187 ]
2188 \end{verbatim}
2189
2190 It's also informative to consider how MTCs would encode set as it is written
2191 today in Haskell:
2192
2193 \begin{verbatim}
2194 signature ORD = sig type t; val cmp : t -> t -> bool end
2195 signature SET = sig type 'a t;
2196 val empty : 'a t;
2197 val insert : (X : ORD) => X.t -> X.t t -> X.t t
2198 end
2199 struct MkSet :> SET = struct ... end
2200 \end{verbatim}
2201
2202 Here, it is clear to see that while functor instantiation occurs for
2203 implementation, it is not occuring for types. This is a big limitation
2204 with the Haskell approach, and it explains why Haskellers, in practice,
2205 find global uniqueness of instances so desirable.
2206
2207 Implementation-wise, this requires some subtle modifications to how we
2208 do type class resolution. Type checking of indefinite modules works as
2209 before, but when go to actually compile them against explicit
2210 implementations, we need to ``forget'' that two types are equal when
2211 doing instance resolution. This could probably be implemented by
2212 associating type class instances with the original name that was
2213 utilized when typechecking, so that we can resolve ambiguous matches
2214 against types which have the same original name now that we are
2215 compiling.
2216
2217 As we've mentioned previously, this strategy is unsound for type families.
2218
2219 \subsection{Globally unique}
2220
2221 Here, we say \textbf{YES} to global uniqueness.
2222
2223 When we require the global uniqueness of instances (either because
2224 that's the type class design we chose, or because we're considering
2225 the problem of type families), we will need to reject declarations like the
2226 one cited above when \verb|A.T = B.T|:
2227
2228 \begin{verbatim}
2229 A :: [ data T ]
2230 B :: [ data T ]
2231 C :: [
2232 import qualified A
2233 import qualified B
2234 instance Show A.T where show T = "A"
2235 instance Show B.T where show T = "B"
2236 ]
2237 \end{verbatim}
2238
2239 The paper mentions that a link-time check is sufficient to prevent this
2240 case from arising. While in the previous section, we've argued why this
2241 is actually unnecessary when local instances are allowed, the link-time
2242 check is a good match in the case of global instances, because any
2243 instance \emph{must} be declared in the signature. The scheme proceeds
2244 as follows: when some instances are typechecked initially, we type check
2245 them as if all of variable module identities were distinct. Then, when
2246 we perform linking (we \verb|include| or we unify some module
2247 identities), we check again if to see if we've discovered some instance
2248 overlap. This linking check is akin to the eager check that is
2249 performed today for type families; it would need to be implemented for
2250 type classes as well: however, there is a twist: we are \emph{redoing}
2251 the overlap check now that some identities have been unified.
2252
2253 As an implementation trick, one could deferring the check until \verb|C|
2254 is compiled, keeping in line with GHC's lazy ``don't check for overlap
2255 until the use site.'' (Once again, unsound for type families.)
2256
2257 \paragraph{What about module inequalities?} An older proposal was for
2258 signatures to contain ``module inequalities'', i.e., assertions that two
2259 modules are not equal. (Technically: we need to be able to apply this
2260 assertion to $\beta$ module variables, since \verb|A != B| while
2261 \verb|A.T = B.T|). Currently, Edward thinks that module inequalities
2262 are only marginal utility with local instances (i.e., not enough to
2263 justify the implementation cost) and not useful at all in the world of
2264 global instances!
2265
2266 With local instances, module inequalities could be useful to statically
2267 rule out examples like \verb|show A.T ++ show B.T|. Because such uses
2268 are not necessarily reflected in the signature, it would be a violation
2269 of separate module development to try to divine the constraint from the
2270 implementation itself. I claim this is of limited utility, however, because,
2271 as we mentioned earlier, we can compile these ``incoherent'' modules perfectly
2272 coherently. With global instances, all instances must be in the signature, so
2273 while it might be aesthetically displeasing to have the signature impose
2274 extra restrictions on linking identities, we can carry this out without
2275 violating the linking restriction.
2276
2277 \section{Bits and bobs}
2278
2279 \subsection{Abstract type synonyms}
2280
2281 In Paper Backpack, abstract type synonyms are not permitted, because GHC doesn't
2282 understand how to deal with them. The purpose of this section is to describe
2283 one particularly nastiness of abstract type synonyms, by way of the occurs check:
2284
2285 \begin{verbatim}
2286 A :: [ type T ]
2287 B :: [ import qualified A; type T = [A.T] ]
2288 \end{verbatim}
2289
2290 At this point, it is illegal for \verb|A = B|, otherwise this type synonym would
2291 fail the occurs check. This seems like pretty bad news, since every instance
2292 of the occurs check in the type-checker could constitute a module inequality.
2293
2294 \section{Open questions}\label{sec:open-questions}
2295
2296 Here are open problems about the implementation which still require
2297 hashing out.
2298
2299 \begin{itemize}
2300
2301 \item In Section~\ref{sec:simplifying-backpack}, we argued that we
2302 could implement Backpack without needing a shaping pass. We're
2303 pretty certain that this will work for typechecking and
2304 compiling fully definite packages with no recursive linking, but
2305 in Section~\ref{sec:typechecking-indefinite}, we described some
2306 of the prevailing difficulties of supporting signature linking.
2307 Renaming is not an insurmountable problem, but backwards flow of
2308 shaping information can be, and it is unclear how best to
2309 accommodate this. This is probably the most important problem
2310 to overcome.
2311
2312 \item In Section~\ref{sec:installing-indefinite}, a few choices for how to
2313 store source code were pitched, however, there is not consensus on which
2314 one is best.
2315
2316 \item What is the impact of the multiplicity of PackageIds on
2317 dependency solving in Cabal? Old questions of what to prefer
2318 when multiple package-versions are available (Cabal originally
2319 only needed to solve this between different versions of the same
2320 package, preferring the oldest version), but with signatures,
2321 there are more choices. Should there be a complex solver that
2322 does all signature solving, or a preprocessing step that puts
2323 things back into the original Cabal version. Authors may want
2324 to suggest policy for what packages should actually link against
2325 signatures (so a crypto library doesn't accidentally link
2326 against a null cipher package).
2327
2328 \end{itemize}
2329
2330 \end{document}