Add versioning section to Backpack docs.
[ghc.git] / docs / backpack / algorithm.tex
1 \documentclass{article}
2
3 \usepackage{pifont}
4 \usepackage{graphicx} %[pdftex] OR [dvips]
5 \usepackage{fullpage}
6 \usepackage{wrapfig}
7 \usepackage{float}
8 \usepackage{titling}
9 \usepackage{hyperref}
10 \usepackage{tikz}
11 \usepackage{color}
12 \usepackage{footnote}
13 \usepackage{float}
14 \usepackage{algorithm}
15 \usepackage{algpseudocode}
16 \usepackage{bigfoot}
17 \usepackage{amssymb}
18 \usepackage{framed}
19
20 % Alter some LaTeX defaults for better treatment of figures:
21 % See p.105 of "TeX Unbound" for suggested values.
22 % See pp. 199-200 of Lamport's "LaTeX" book for details.
23 % General parameters, for ALL pages:
24 \renewcommand{\topfraction}{0.9} % max fraction of floats at top
25 \renewcommand{\bottomfraction}{0.8} % max fraction of floats at bottom
26 % Parameters for TEXT pages (not float pages):
27 \setcounter{topnumber}{2}
28 \setcounter{bottomnumber}{2}
29 \setcounter{totalnumber}{4} % 2 may work better
30 \setcounter{dbltopnumber}{2} % for 2-column pages
31 \renewcommand{\dbltopfraction}{0.9} % fit big float above 2-col. text
32 \renewcommand{\textfraction}{0.07} % allow minimal text w. figs
33 % Parameters for FLOAT pages (not text pages):
34 \renewcommand{\floatpagefraction}{0.7} % require fuller float pages
35 % N.B.: floatpagefraction MUST be less than topfraction !!
36 \renewcommand{\dblfloatpagefraction}{0.7} % require fuller float pages
37
38 % remember to use [htp] or [htpb] for placement
39
40 \newcommand{\I}[1]{\ensuremath{\mathit{#1}}}
41
42 \newcommand{\optionrule}{\noindent\rule{1.0\textwidth}{0.75pt}}
43
44 \newenvironment{aside}
45 {\begin{figure}\def\FrameCommand{\hspace{2em}}
46 \MakeFramed{\advance\hsize-\width}\optionrule\small}
47 {\par\vskip-\smallskipamount\optionrule\endMakeFramed\end{figure}}
48
49 \setlength{\droptitle}{-6em}
50
51 \newcommand{\Red}[1]{{\color{red} #1}}
52
53 \title{The Backpack algorithm}
54
55 \begin{document}
56
57 \maketitle
58
59 This document describes the Backpack shaping and typechecking
60 passes, as we intend to implement it.
61
62 \section{Changelog}
63
64 \paragraph{April 28, 2015} A signature declaration no longer provides
65 a signature in the technical shaping sense; the motivation for this change
66 is explained in \textbf{In-scope signatures are not provisions}. The simplest
67 consequence of this is that all requirements are importable (Derek has stated that he doesn't
68 think this will be too much of a problem in practice); it is also possible to
69 extend shape with a \verb|signatures| field, although some work has to be
70 done specifying coherence conditions between \verb|signatures| and \verb|requirements|.
71
72 \section{Front-end syntax}
73
74 \begin{figure}[htpb]
75 $$
76 \begin{array}{rcll}
77 p,q,r && \mbox{Package names} \\
78 m,n && \mbox{Module names} \\[1em]
79 \multicolumn{3}{l}{\mbox{\bf Packages}} \\
80 \I{pkg} & ::= & \verb|package|\; p\; [\I{provreq}]\; \verb|where {| d_1 \verb|;| \ldots \verb|;| d_n \verb|}| \\[1em]
81 \multicolumn{3}{l}{\mbox{\bf Declarations}} \\
82 d & ::= & \verb|module|\; m \; [exports]\; \verb|where|\; \I{body} \\
83 & | & \verb|signature|\; m \; [exports]\; \verb|where|\; \I{body} \\
84 & | & \verb|include|\; p \; [provreq] \\[1em]
85 \multicolumn{3}{l}{\mbox{\bf Provides/requires specification}} \\
86 \I{provreq} & ::= & \verb|(| \, \I{rns} \, \verb|)| \;
87 [ \verb|requires(|\, \I{rns} \, \verb|)| ] \\
88 \I{rns} & ::= & \I{rn}_0 \verb|,| \, \ldots \verb|,| \, \I{rn}_n [\verb|,|] & \mbox{Renamings} \\
89 \I{rn} & ::= & m\; \verb|as| \; n & \mbox{Renaming} \\[1em]
90 \multicolumn{3}{l}{\mbox{\bf Haskell code}} \\
91 \I{exports} & & \mbox{A Haskell module export list} \\
92 \I{body} & & \mbox{A Haskell module body} \\
93 \end{array}
94 $$
95 \caption{Syntax of Backpack} \label{fig:syntax}
96 \end{figure}
97
98 The syntax of Backpack is given in Figure~\ref{fig:syntax}.
99 See the ``Backpack manual'' for more explanation about the syntax. It
100 is slightly simplified here by removing any constructs which are easily implemented as
101 syntactic sugar (e.g., a bare $m$ in a renaming is simply $m\; \verb|as|\; m$.)
102
103 \newpage
104 \section{Shaping}
105
106 \begin{figure}[htpb]
107 $$
108 \begin{array}{rcll}
109 \I{Shape} & ::= & \verb|provides:|\; m \; \verb|->|\; \I{Module}\; \verb|{|\, \I{Name} \verb|,|\, \ldots \, \verb|};| \ldots \\
110 & & \verb|requires:| \; m \; \verb|->|\; \textcolor{white}{\I{Module}}\; \verb|{| \, \I{Name} \verb|,| \, \ldots \, \verb|}| \verb|;| \ldots \\
111 \I{PkgKey} & ::= & p \verb|(| \, m \; \verb|->| \; \I{Module} \verb|,|\, \ldots\, \verb|)| \\
112 \I{Module} & ::= & \I{PkgKey} \verb|:| m \\
113 \I{Name} & ::= & \I{Module} \verb|.| \I{OccName} \\
114 \I{OccName} & & \mbox{Unqualified name in a namespace}
115 \end{array}
116 $$
117 \caption{Semantic entities in Backpack} \label{fig:semantic}
118 \end{figure}
119
120 Shaping computes a \I{Shape}, whose form is described in Figure~\ref{fig:semantic}.
121 A shape describes what modules a package implements and exports (the \emph{provides})
122 and what signatures a package needs to have filled in (the \emph{requires}). Both
123 provisions and requires can be imported after a package is included.
124
125 We incrementally build a shape by starting with an empty
126 shape context and adding to it as follows:
127
128 \begin{enumerate}
129 \item Calculate the shape of a declaration, with respect to the
130 current shape context. (e.g., by renaming a module/signature,
131 or using the shape from an included package.)
132 \item Merge this shape into the shape context.
133 \end{enumerate}
134
135 The final shape context is the shape of the package as a whole.
136 Optionally, we can also compute the renamed syntax trees of
137 modules and signatures.
138
139 % (There is a slight
140 % technical difficulty here, where to do shaping, we actually need an \texttt{AvailInfo},
141 % so we can resolve \texttt{T(..)} style imports.)
142
143 % One variation of shaping also computes the renamed version of a package,
144 % i.e., where each identifier in the module and signature is replaced with
145 % the original name (equivalently, we record the output of GHC's renaming
146 % pass). This simplifies type checking because you no longer have to
147 % recalculate the set of available names, which otherwise would be lost.
148 % See more about this in the type checking section.
149
150 In the description below, we'll assume \verb|THIS| is the package key
151 of the package being processed.
152
153 \begin{aside}
154 \textbf{\textit{OccName} is implied by \textit{Name}.}
155 In Haskell, the following is not valid syntax:
156
157 \begin{verbatim}
158 import A (foobar as baz)
159 \end{verbatim}
160 In particular, a \I{Name} which is in scope will always have the same
161 \I{OccName} (even if it may be qualified.) You might imagine relaxing
162 this restriction so that declarations can be used under different \I{OccName}s;
163 in such a world, we need a different definition of shape:
164
165 \begin{verbatim}
166 Shape ::=
167 provided: ModName -> { OccName -> Name }
168 required: ModName -> { OccName -> Name }
169 \end{verbatim}
170 Presently, however, such an \I{OccName} annotation would be redundant: it can be inferred from the \I{Name}.
171 \end{aside}
172
173 \begin{aside}
174 \textbf{Holes of a package are a mapping, not a set.} Why can't the \I{PkgKey} just record a
175 set of \I{Module}s, e.g. $\I{PkgKey}\;::= \; \I{SrcPkgKey} \; \verb|{| \; \I{Module} \; \verb|}|$? Consider:
176
177 \begin{verbatim}
178 package p (A) requires (H1, H2) where
179 signature H1(T) where
180 data T
181 signature H2(T) where
182 data T
183 module A(A(..)) where
184 import qualified H1
185 import qualified H2
186 data A = A H1.T H2.T
187
188 package q (A12, A21) where
189 module I1(T) where
190 data T = T Int
191 module I2(T) where
192 data T = T Bool
193 include p (A as A12) requires (H1 as I1, H2 as I2)
194 include p (A as A21) requires (H1 as I2, H2 as I1)
195 \end{verbatim}
196 With a mapping, the first instance of \verb|p| has key \verb|p(H1 -> q():I1, H2 -> q():I2)|
197 while the second instance has key \verb|p(H1 -> q():I2, H2 -> q():I1)|; with
198 a set, both would have the key \verb|p(q():I1, q():I2)|.
199 \end{aside}
200
201
202 \begin{aside}
203 \textbf{Signatures can require a specific entity.}
204 With requirements like \verb|A -> { HOLE:A.T, HOLE:A.foo }|,
205 why not specify it as \verb|A -> { T, foo }|,
206 e.g., \verb|required: { ModName -> { OccName } }|? Consider:
207
208 \begin{verbatim}
209 package p () requires (A, B) where
210 signature A(T) where
211 data T
212 signature B(T) where
213 import T
214 \end{verbatim}
215 The requirements of this package specify that \verb|A.T| $=$ \verb|B.T|; this
216 can be expressed with \I{Name}s as
217
218 \begin{verbatim}
219 A -> { HOLE:A.T }
220 B -> { HOLE:A.T }
221 \end{verbatim}
222 But, without \I{Name}s, the sharing constraint is impossible: \verb|A -> { T }; B -> { T }|. (NB: \verb|A| and \verb|B| don't have to be implemented with the same module.)
223 \end{aside}
224
225 \begin{aside}
226 \textbf{The \textit{Name} of a value is used to avoid
227 ambiguous identifier errors.} We state that two types
228 are equal when their \I{Name}s are the same; however,
229 for values, it is less clear why we care. But consider this example:
230
231 \begin{verbatim}
232 package p (A) requires (H1, H2) where
233 signature H1(x) where
234 x :: Int
235 signature H2(x) where
236 import H1(x)
237 module A(y) where
238 import H1
239 import H2
240 y = x
241 \end{verbatim}
242 The reference to \verb|x| in \verb|A| is unambiguous, because it is known
243 that \verb|x| from \verb|H1| and \verb|x| from \verb|H2| are the same (have
244 the same \I{Name}.) If they were not the same, it would be ambiguous and
245 should cause an error. Knowing the \I{Name} of a value distinguishes
246 between these two cases.
247 \end{aside}
248
249 \begin{aside}
250 \textbf{Holes are linear}
251 Requirements do not record what \I{Module} represents
252 the identity of a requirement, which means that it's not possible to assert
253 that hole \verb|A| and hole \verb|B| should be implemented with the same module,
254 as might occur with aliasing:
255
256 \begin{verbatim}
257 signature A where
258 signature B where
259 alias A = B
260 \end{verbatim}
261 %
262 The benefit of this restriction is that when a requirement is filled,
263 it is obvious that this is the only requirement that is filled: you won't
264 magically cause some other requirements to be filled. The downside is
265 it's not possible to write a package which looks for an interface it is
266 looking for in one of $n$ names, accepting any name as an acceptable linkage.
267 If aliasing was allowed, we'd need a separate physical shaping context,
268 to make sure multiple mentions of the same hole were consistent.
269
270 \end{aside}
271
272 %\newpage
273
274 \subsection{\texttt{module M}}
275
276 A module declaration provides a module \verb|THIS:M| at module name \verb|M|. It
277 has the shape:
278
279 \begin{verbatim}
280 provides: M -> THIS:M { exports of renamed M under THIS:M }
281 requires: (nothing)
282 \end{verbatim}
283 Example:
284
285 \begin{verbatim}
286 module A(T) where
287 data T = T
288
289 -- provides: A -> THIS:A { THIS:A.T }
290 -- requires: (nothing)
291 \end{verbatim}
292
293 \newpage
294 \subsection{\texttt{signature M}}
295
296 A signature declaration creates a requirement at module name \verb|M|. It has the shape:
297
298 \begin{verbatim}
299 provides: (nothing)
300 requires: M -> { exports of renamed M under HOLE:M }
301 \end{verbatim}
302
303 \noindent Example:
304
305 \begin{verbatim}
306 signature H(T) where
307 data T
308
309 -- provides: H -> (nothing)
310 -- requires: H -> { HOLE:H.T }
311 \end{verbatim}
312
313 \begin{aside}
314 \textbf{In-scope signatures are not provisions}. We enforce the invariant that
315 a provision is always (syntactically) a \verb|module| and a requirement
316 is always a \verb|signature|. This means that if you have a requirement
317 and a provision of the same name, the requirement can \emph{always} be filled
318 with the provision. Without this invariant, it's not clear if a provision
319 will actually fill a signature. Consider this example, where
320 a signature is required and exposed:
321
322 \begin{verbatim}
323 package a-sigs (A) requires (A) where -- ***
324 signature A where
325 data T
326
327 package a-user (B) requires (A) where
328 signature A where
329 data T
330 x :: T
331 module B where
332 ...
333
334 package p where
335 include a-sigs
336 include a-user
337 \end{verbatim}
338 %
339 When we consider merging in the shape of \verb|a-user|, does the
340 \verb|A| provided by \verb|a-sigs| fill in the \verb|A| requirement
341 in \verb|a-user|? It \emph{should not}, since \verb|a-sigs| does not
342 actually provide enough declarations to satisfy \verb|a-user|'s
343 requirement: the intended semantics \emph{merges} the requirements
344 of \verb|a-sigs| and \verb|a-user|.
345
346
347
348 \begin{verbatim}
349 package a-sigs (M as A) requires (H as A) where
350 signature H(T) where
351 data T
352 module M(T) where
353 import H(T)
354 \end{verbatim}
355 %
356 We rightly should error, since the provision is a module. And in this situation:
357
358 \begin{verbatim}
359 package a-sigs (H as A) requires (H) where
360 signature H(T) where
361 data T
362 \end{verbatim}
363 %
364 The requirements should be merged, but should the merged requirement
365 be under the name \verb|H| or \verb|A|?
366
367 It may still be possible to use the \verb|(A) requires (A)| syntax to
368 indicate exposed signatures, but this would be a mere syntactic
369 alternative to \verb|() requires (exposed A)|.
370 \end{aside}
371 %
372
373 \newpage
374 \subsection{\texttt{include pkg (X) requires (Y)}}
375
376 We merge with the transformed shape of package \verb|pkg|, where this
377 shape is transformed by:
378
379 \begin{itemize}
380 \item Renaming and thinning the provisions according to \verb|(X)|
381 \item Renaming requirements according to \verb|(Y)| (requirements cannot
382 be thinned, so non-mentioned requirements are implicitly passed through.)
383 For each renamed requirement from \verb|Y| to \verb|Y'|,
384 substitute \verb|HOLE:Y| with \verb|HOLE:Y'| in the
385 \I{Module}s and \I{Name}s of the provides and requires.
386 \end{itemize}
387 %
388 If there are no thinnings/renamings, you just merge the
389 shape unchanged! Here is an example:
390
391 \begin{verbatim}
392 package p (M) requires (H) where
393 signature H where
394 data T
395 module M where
396 import H
397 data S = S T
398
399 package q (A) where
400 module X where
401 data T = T
402 include p (M as A) requires (H as X)
403 \end{verbatim}
404 %
405 The shape of package \verb|p| is:
406
407 \begin{verbatim}
408 requires: M -> { p(H -> HOLE:H):M.S }
409 provides: H -> { HOLE:H.T }
410 \end{verbatim}
411 %
412 Thus, when we process the \verb|include| in package \verb|q|,
413 we make the following two changes: we rename the provisions,
414 and we rename the requirements, substituting \verb|HOLE|s.
415 The resulting shape to be merged in is:
416
417 \begin{verbatim}
418 provides: A -> { p(H -> HOLE:X):M.S }
419 requires: X -> { HOLE:X.T }
420 \end{verbatim}
421 %
422 After merging this in, the final shape of \verb|q| is:
423
424 \begin{verbatim}
425 provides: X -> { q():X.T } -- from shaping 'module X'
426 A -> { p(H -> q():X):M.S }
427 requires: (nothing) -- discharged by provided X
428 \end{verbatim}
429
430 \newpage
431
432 \subsection{Merging}
433
434 The shapes we've given for individual declarations have been quite
435 simple. Merging combines two shapes, filling requirements with
436 implementations, unifying \I{Name}s, and unioning requirements; it is
437 the most complicated part of the shaping process.
438
439 The best way to think about merging is that we take two packages with
440 inputs (requirements) and outputs (provisions) and ``wiring'' them up so
441 that outputs feed into inputs. In the absence
442 of mutual recursion, this wiring process is \emph{directed}: the provisions
443 of the first package feed into the requirements of the second package,
444 but never vice versa. (With mutual recursion, things can go in the opposite
445 direction as well.)
446
447 Suppose we are merging shape $p$ with shape $q$ (e.g., $p; q$). Merging
448 proceeds as follows:
449
450 \begin{enumerate}
451 \item \emph{Fill every requirement of $q$ with provided modules from
452 $p$.} For each requirement $M$ of $q$ that is provided by $p$ (in particular,
453 all of its required \verb|Name|s are provided),
454 substitute each \I{Module} occurrence of \verb|HOLE:M| with the
455 provided $p\verb|(|M\verb|)|$, unify the names, and remove the requirement from $q$.
456 If the names of the provision are not a superset of the required names, error.
457 \item If mutual recursion is supported, \emph{fill every requirement of $p$ with provided modules from $q$.}
458 \item \emph{Merge leftover requirements.} For each requirement $M$ of $q$ that is not
459 provided by $p$ but required by $p$, unify the names, and union them together to form the new requirement. (It's not
460 necessary to substitute \I{Module}s, since they are guaranteed to be the same.)
461 \item \emph{Add provisions of $q$.} Union the provisions of $p$ and $q$, erroring
462 if there is a duplicate that doesn't have the same identity.
463 \end{enumerate}
464 %
465 To unify two sets of names, find each pair of names with matching \I{OccName}s $n$ and $m$ and do the following:
466
467 \begin{enumerate}
468 \item If both are from holes, pick a canonical representative $m$ and substitute $n$ with $m$.
469 \item If one $n$ is from a hole, substitute $n$ with $m$.
470 \item Otherwise, error if the names are not the same.
471 \end{enumerate}
472 %
473 It is important to note that substitutions on \I{Module}s and substitutions on
474 \I{Name}s are disjoint: a substitution from \verb|HOLE:A| to \verb|HOLE:B|
475 does \emph{not} substitute inside the name \verb|HOLE:A.T|.
476
477 Since merging is the most complicated step of shaping, here are a big
478 pile of examples of it in action.
479
480 \subsubsection{A simple example}
481
482 In the following set of packages:
483
484 \begin{verbatim}
485 package p(M) requires (A) where
486 signature A(T) where
487 data T
488 module M(T, S) where
489 import A(T)
490 data S = S T
491
492 package q where
493 module A where
494 data T = T
495 include p
496 \end{verbatim}
497
498 When we \verb|include p|, we need to merge the partial shape
499 of \verb|q| (with just provides \verb|A|) with the shape
500 of \verb|p|. Here is each step of the merging process:
501
502 \begin{verbatim}
503 shape 1 shape 2
504 --------------------------------------------------------------------------------
505 (initial shapes)
506 provides: A -> THIS:A { q():A.T } M -> p(A -> HOLE:A) { HOLE:A.T, p(A -> HOLE:A).S }
507 requires: (nothing) A -> { HOLE:A.T }
508
509 (after filling requirements)
510 provides: A -> THIS:A { q():A.T } M -> p(A -> THIS:A) { q():A.T, p(A -> THIS:A).S }
511 requires: (nothing) (nothing)
512
513 (after adding provides)
514 provides: A -> THIS:A { q():A.T }
515 M -> p(A -> THIS:A) { q():A.T, p(A -> THIS:A).S }
516 requires: (nothing)
517 \end{verbatim}
518
519 Notice that we substituted \verb|HOLE:A| with \verb|THIS:A|, but \verb|HOLE:A.T| with \verb|q():A.T|.
520
521 \subsubsection{Requirements merging can affect provisions}
522
523 When a merge results in a substitution, we substitute over both
524 requirements and provisions:
525
526 \begin{verbatim}
527 signature H(T) where
528 data T
529 module A(T) where
530 import H(T)
531 module B(T) where
532 data T = T
533
534 -- provides: A -> THIS:A { HOLE:H.T }
535 -- B -> THIS:B { THIS:B.T }
536 -- requires: H -> { HOLE:H.T }
537
538 signature H(T, f) where
539 import B(T)
540 f :: a -> a
541
542 -- provides: A -> THIS:A { THIS:B.T } -- UPDATED
543 -- B -> THIS:B { THIS:B.T }
544 -- requires: H -> { THIS:B.T, HOLE:H.f } -- UPDATED
545 \end{verbatim}
546
547 \subsubsection{Sharing constraints}
548
549 Suppose you have two signature which both independently define a type,
550 and you would like to assert that these two types are the same. In the
551 ML world, such a constraint is known as a sharing constraint. Sharing
552 constraints can be encoded in Backpacks via clever use of reexports;
553 they are also an instructive example for signature merging.
554
555 \begin{verbatim}
556 signature A(T) where
557 data T
558 signature B(T) where
559 data T
560
561 -- requires: A -> { HOLE:A.T }
562 B -> { HOLE:B.T }
563
564 -- the sharing constraint!
565 signature A(T) where
566 import B(T)
567 -- (shape to merge)
568 -- requires: A -> { HOLE:B.T }
569
570 -- (after merge)
571 -- requires: A -> { HOLE:A.T }
572 -- B -> { HOLE:A.T }
573 \end{verbatim}
574 %
575 \Red{I'm pretty sure any choice of \textit{Name} is OK, since the
576 subsequent substitution will make it alpha-equivalent.}
577
578 % \subsubsection{Leaky requirements}
579
580 % Both requirements and provisions can be imported, but requirements
581 % are always available
582
583 %\Red{How to relax this so hs-boot works}
584
585 %\Red{Example of how loopy modules which rename requirements make it un-obvious whether or not
586 %a requirement is still required. Same example works declaration level.}
587
588 %\Red{package p (A) requires (A); the input output ports should be the same}
589
590 % We figure out the requirements (because no loopy modules)
591 %
592 % package p (A, B) requires (B)
593 % include base
594 % sig B(T)
595 % import Prelude(T)
596 %
597 % requirement example
598 %
599 % mental model: you start with an empty package, and you start accreting
600 % things on things, merging things together as you discover that this is
601 % the case.
602
603 %\newpage
604
605 \subsection{Export declarations}
606
607 If an explicit export declaration is given, the final shape is the
608 computed shape, minus any provisions not mentioned in the list, with the
609 appropriate renaming applied to provisions and requirements. (Requirements
610 are implicitly passed through if they are not named.)
611 If no explicit export declaration is given, the final shape is
612 the computed shape, including only provisions which were defined
613 in the declarations of the package.
614
615 \begin{aside}
616 \textbf{Signature visibility, and defaulting}
617 The simplest formulation of requirements is to have them always be
618 visible. Signature visibility could be controlled by associating
619 every requirement with a flag indicating if it is importable or
620 not: a signature declaration sets a requirement to be visible, and
621 an explicit export list can specify if a requirement is to be visible
622 or not.
623
624 When an export list is absent, we have to pick a default visibility
625 for a signature. If we use the same behavior as with modules,
626 a strange situation can occur:
627
628 \begin{verbatim}
629 package p where -- S is visible
630 signature S where
631 x :: True
632
633 package q where -- use defaulting
634 include p
635 signature S where
636 y :: True
637 module M where
638 import S
639 z = x && y -- OK
640
641 package r where
642 include q
643 module N where
644 import S
645 z = y -- OK
646 z = x -- ???
647 \end{verbatim}
648 %
649 Absent the second signature declaration in \verb|q|, \verb|S.x| clearly
650 should not be visible in \verb|N|. However, what ought to occur when this signature
651 declaration is added? One interpretation is to say that only some
652 (but not all) declarations are provided (\verb|S.x| remains invisible);
653 another interpretation is that adding \verb|S| is enough to treat
654 the signature as ``in-line'', and all declarations are now provided
655 (\verb|S.x| is visible).
656
657 The latter interpretation avoids having to keep track of providedness
658 per declarations, and means that you can always express defaulting
659 behavior by writing an explicit provides declaration on the package.
660 However, it has the odd behavior of making empty signatures semantically
661 meaningful:
662
663 \begin{verbatim}
664 package q where
665 include p
666 signature S where
667 \end{verbatim}
668 \end{aside}
669 %
670 % SPJ: This would be too complicated (if there's yet a third way)
671
672 \subsection{Package key}
673
674 What is \verb|THIS|? It is the package name, plus for every requirement \verb|M|,
675 a mapping \verb|M -> HOLE:M|. Annoyingly, you don't know the full set of
676 requirements until the end of shaping, so you don't know the package key ahead of time;
677 however, it can be substituted at the end easily.
678
679 \clearpage
680 \newpage
681
682 \section{Type constructor exports}
683
684 In the previous section, we described the \I{Name}s of a
685 module as a flat namespace; but actually, there is one level of
686 hierarchy associated with type-constructors. The type:
687
688 \begin{verbatim}
689 data A = B { foo :: Int }
690 \end{verbatim}
691 %
692 brings three \I{OccName}s into scope, \verb|A|, \verb|B|
693 and \verb|foo|, but the constructors and record selectors are
694 considered \emph{children}
695 of \verb|A|: in an import list, they can be implicitly brought
696 into scope with \verb|A(..)|, or individually brought into
697 scope with \verb|foo| or \verb|pattern B| (using the new \verb|PatternSynonyms|
698 extension). Symmetrically, a module may export only \emph{some}
699 of the constructors/selectors of a type; it may not even
700 export the type itself!
701
702 We \emph{absolutely} need this information to rename a module or
703 signature, which means that there is a little bit of extra information
704 we have to collect when shaping. What is this information? If we take
705 GHC's internal representation at face value, we have the more complex
706 semantic representation seen in Figure~\ref{fig:semantic2}:
707
708 \begin{figure}[htpb]
709 $$
710 \begin{array}{rcll}
711 \I{Shape} & ::= & \verb|provides:|\; m \; \verb|->|\; \I{Module}\; \verb|{|\, \I{AvailInfo} \verb|,|\, \ldots \, \verb|};| \ldots \\
712 & & \verb|requires:| \; m \; \verb|->|\; \textcolor{white}{\I{Module}}\; \verb|{| \, \I{AvailInfo} \verb|,| \, \ldots \, \verb|}| \verb|;| \ldots \\
713 \I{AvailInfo} & ::= & \I{Name} & \mbox{Plain identifiers} \\
714 & | & \I{Name} \, \verb|{| \, \I{Name}_0\verb|,| \, \ldots\verb|,| \, \I{Name}_n \, \verb|}| & \mbox{Type constructors} \\
715 \end{array}
716 $$
717 \caption{Enriched semantic entities in Backpack} \label{fig:semantic2}
718 \end{figure}
719 %
720 For type constructors, the outer \I{Name} identifies the \emph{parent}
721 identifier, which may not necessarily be in scope (define this to be the \texttt{availName}); the inner list consists
722 of the children identifiers that are actually in scope. If a wildcard
723 is written, all of the child identifiers are brought into scope.
724 In the following examples, we've ensured that
725 types and constructors are unambiguous, although in Haskell proper they
726 live in separate namespaces; we've also elided the \verb|THIS| package
727 key from the identifiers.
728
729 \begin{verbatim}
730 module M(A(..)) where
731 data A = B { foo :: Int }
732 -- M.A{ M.A, M.B, M.foo }
733
734 module N(A) where
735 data A = B { foo :: Int }
736 -- N.A{ N.A }
737
738 module O(foo) where
739 data A = B { foo :: Int }
740 -- O.A{ O.foo }
741
742 module A where
743 data T = S { bar :: Int }
744 module B where
745 data T = S { baz :: Bool }
746 module C(bar, baz) where
747 import A(bar)
748 import B(baz)
749 -- A.T{ A.bar }, B.T{ B.baz }
750 -- NB: it would be illegal for the type constructors
751 -- A.T and B.T to be both exported from C!
752 \end{verbatim}
753 %
754 Previously, we stated that we simply merged \I{Name}s based on their
755 \I{OccName}s. We now must consider what it means to merge \I{AvailInfo}s.
756
757 \subsection{Algorithm}
758
759 Our merging algorithm takes two sets of \I{AvailInfo}s and merges them
760 into one set. In the degenerate case where every \I{AvailInfo} is a
761 $Name$, this algorithm operates the same as the original algorithm.
762 Merging proceeds in two steps: unification and then simple union.
763
764 Unification proceeds as follows: for each pair of \I{Name}s with
765 matching \I{OccName}s, unify the names. For each pair of $\I{Name}\, \verb|{|\,
766 \I{Name}_0\verb|,|\, \ldots\verb|,|\, \I{Name}_n\, \verb|}|$, where there
767 exists some pair of child names with matching \I{OccName}s, unify the
768 parent \I{Name}s. (A single \I{AvailInfo} may participate in multiple such
769 pairs.) A simple identifier and a type constructor \I{AvailInfo} with
770 overlapping in-scope names fails to unify. After unification,
771 the simple union combines entries with matching \verb|availName|s (parent
772 name in the case of a type constructor), recursively unioning the child
773 names of type constructor \I{AvailInfo}s.
774
775 Unification of \I{Name}s results in a substitution, and a \I{Name} substitution
776 on \I{AvailInfo} is a little unconventional. Specifically, substitution on $\I{Name}\, \verb|{|\,
777 \I{Name}_0\verb|,|\, \ldots\verb|,|\, \I{Name}_n\, \verb|}|$ proceeds specially:
778 a substitution from \I{Name} to $\I{Name}'$ induces a substitution from
779 \I{Module} to $Module'$ (as the \I{OccName}s of the \I{Name}s are guaranteed
780 to be equal), so for each child $\I{Name}_i$, perform the \I{Module}
781 substitution. So for example, the substitution \verb|HOLE:A.T| to \verb|THIS:A.T|
782 takes the \I{AvailInfo} \verb|HOLE:A.T { HOLE:A.B, HOLE:A.foo }| to
783 \verb|THIS:A.T { THIS:A.B, THIS:A.foo }|. In particular, substitution
784 on children \I{Name}s is \emph{only} carried out by substituting on the outer name;
785 we will never directly substitute children.
786
787 \subsection{Examples}
788
789 Unfortunately, there are a number of tricky scenarios:
790
791 \paragraph{Merging when type constructors are not in scope}
792
793 \begin{verbatim}
794 signature A1(foo) where
795 data A = A { foo :: Int, bar :: Bool }
796
797 signature A2(bar) where
798 data A = A { foo :: Int, bar :: Bool }
799 \end{verbatim}
800 %
801 If we merge \verb|A1| and \verb|A2|, are we supposed to conclude that
802 the types \verb|A1.A| and \verb|A2.A| (not in scope!) are the same?
803 The answer is no! Consider these implementations:
804
805 \begin{verbatim}
806 module A1(A(..)) where
807 data A = A { foo :: Int, bar :: Bool }
808
809 module A2(A(..)) where
810 data A = A { foo :: Int, bar :: Bool }
811
812 module A(foo, bar) where
813 import A1(foo)
814 import A2(bar)
815 \end{verbatim}
816
817 Here, \verb|module A1| implements \verb|signature A1|, \verb|module A2| implements \verb|signature A2|,
818 and \verb|module A| implements \verb|signature A1| and \verb|signature A2| individually
819 and should certainly implement their merge. This is why we cannot simply
820 merge type constructors based on the \I{OccName} of their top-level type;
821 merging only occurs between in-scope identifiers.
822
823 \paragraph{Does merging a selector merge the type constructor?}
824
825 \begin{verbatim}
826 signature A1(A(..)) where
827 data A = A { foo :: Int, bar :: Bool }
828
829 signature A2(A(..)) where
830 data A = A { foo :: Int, bar :: Bool }
831
832 signature A2(foo) where
833 import A1(foo)
834 \end{verbatim}
835 %
836 Does the last signature, which is written in the style of a sharing constraint on \verb|foo|,
837 also cause \verb|bar| and the type and constructor \verb|A| to be unified?
838 Because a merge of a child name results in a substitution on the parent name,
839 the answer is yes.
840
841 \paragraph{Incomplete data declarations}
842
843 \begin{verbatim}
844 signature A1(A(foo)) where
845 data A = A { foo :: Int }
846
847 signature A2(A(bar)) where
848 data A = A { bar :: Bool }
849 \end{verbatim}
850 %
851 Should \verb|A1| and \verb|A2| merge? If yes, this would imply
852 that data definitions in signatures could only be \emph{partial}
853 specifications of their true data types. This seems complicated,
854 which suggests this should not be supported; however, in fact,
855 this sort of definition, while disallowed during type checking,
856 should be \emph{allowed} during shaping. The reason that the
857 shape we abscribe to the signatures \verb|A1| and \verb|A2| are
858 equivalent to the shapes for these which should merge:
859
860 \begin{verbatim}
861 signature A1(A(foo)) where
862 data A = A { foo :: Int, bar :: Bool }
863
864 signature A2(A(bar)) where
865 data A = A { foo :: Int, bar :: Bool }
866 \end{verbatim}
867
868 \subsection{Subtyping record selectors as functions}
869
870 \begin{verbatim}
871 signature H(A, foo) where
872 data A
873 foo :: A -> Int
874
875 module M(A, foo) where
876 data A = A { foo :: Int, bar :: Bool }
877 \end{verbatim}
878 %
879 Does \verb|M| successfully fill \verb|H|? If so, it means that anywhere
880 a signature requests a function \verb|foo|, we can instead validly
881 provide a record selector. This capability seems quite attractive,
882 although in practice record selectors rarely seem to be abstracted this
883 way: one reason is that \verb|M.foo| still \emph{is} a record selector,
884 and can be used to modify a record. (Many library authors find this
885 suprising!)
886
887 Nor does this seem to be an insurmountable instance of the avoidance
888 problem:
889 as a workaround, \verb|H| can equivalently be written as:
890
891 \begin{verbatim}
892 signature H(foo) where
893 data A = A { foo :: Int, bar :: Bool }
894 \end{verbatim}
895 %
896 However, you might not like this, as the otherwise irrelevant \verb|bar| must be mentioned
897 in the definition.
898
899 In any case, actually implementing this `subtyping' is quite complicated, because we can no
900 longer assume that every child name is associated with a parent name.
901 The technical difficulty is that we now need to unify a plain identifier
902 \I{AvailInfo} (from the signature) with a type constructor \I{AvailInfo}
903 (from a module.) It is not clear what this should mean.
904 Consider this situation:
905
906 \begin{verbatim}
907 package p where
908 signature H(A, foo, bar) where
909 data A
910 foo :: A -> Int
911 bar :: A -> Bool
912 module X(A, foo) where
913 import H
914 package q where
915 include p
916 signature H(bar) where
917 data A = A { foo :: Int, bar :: Bool }
918 module Y where
919 import X(A(..)) -- ???
920 \end{verbatim}
921
922 Should the wildcard import on \verb|X| be allowed?
923 This question is equivalent to whether or not shaping discovers
924 whether or not a function is a record selector and propagates this
925 information elsewhere.
926 If the wildcard is not allowed, here is another situation:
927
928 \begin{verbatim}
929 package p where
930 -- define without record selectors
931 signature X1(A, foo) where
932 data A
933 foo :: A -> Int
934 module M1(A, foo) where
935 import X1
936
937 package q where
938 -- define with record selectors (X1s unify)
939 signature X1(A(..)) where
940 data A = A { foo :: Int, bar :: Bool }
941 signature X2(A(..)) where
942 data A = A { foo :: Int, bar :: Bool }
943
944 -- export some record selectors
945 signature Y1(bar) where
946 import X1
947 signature Y2(bar) where
948 import X2
949
950 package r where
951 include p
952 include q
953
954 -- sharing constraint
955 signature Y2(bar) where
956 import Y1(bar)
957
958 -- the payload
959 module Test where
960 import M1(foo)
961 import X2(foo)
962 ... foo ... -- conflict?
963 \end{verbatim}
964
965 Without the sharing constraint, the \verb|foo|s from \verb|M1| and \verb|X2|
966 should conflict. With it, however, we should conclude that the \verb|foo|s
967 are the same, even though the \verb|foo| from \verb|M1| is \emph{not}
968 considered a child of \verb|A|, and even though in the sharing constraint
969 we \emph{only} unified \verb|bar| (and its parent \verb|A|). To know that
970 \verb|foo| from \verb|M1| should also be unified, we have to know a bit
971 more about \verb|A| when the sharing constraint performs unification;
972 however, the \I{AvailInfo} will only tell us about what is in-scope, which
973 is \emph{not} enough information.
974
975 %\newpage
976
977 \section{Type checking}
978
979 \begin{figure}[htpb]
980 $$
981 \begin{array}{rcll}
982 \I{PkgType} & ::= & \I{ModIface}_0 \verb|;|\, \ldots\verb|;|\, \I{ModIface}_n \\[1em]
983 \multicolumn{3}{l}{\mbox{\bf Module interface}} \\
984 \I{ModIface} & ::= & \verb|module| \; \I{Module} \; \verb|(| \I{mi\_exports} \verb|)| \; \verb|where| \\
985 & & \qquad \I{mi\_decls} \\
986 & & \qquad \I{mi\_insts} \\
987 & & \qquad \I{dep\_orphs} \\
988 \I{mi\_exports} & ::= & \I{AvailInfo}_0 \verb|,|\, \ldots \verb|,|\, \I{AvailInfo}_n & \mbox{Export list} \\
989 \I{mi\_decls} & ::= & \I{IfaceDecl}_0 \verb|;|\, \ldots \verb|;|\, \I{IfaceDecl}_n & \mbox{Defined declarations} \\
990 \I{mi\_insts} & ::= & \I{IfaceClsInst}_0 \verb|;|\, \ldots \verb|;|\, \I{IfaceClsInst}_n & \mbox{Defined instances} \\
991 \I{dep\_orphs} & ::= & \I{Module}_0 \verb|;|\, \ldots \verb|;|\, \I{Module}_n & \mbox{Transitive orphan dependencies} \\[1em]
992 \multicolumn{3}{l}{\mbox{\bf Interface declarations}} \\
993 \I{IfaceDecl} & ::= & \I{OccName} \; \verb|::| \; \I{IfaceId} \\
994 & | & \verb|data| \; \I{OccName} \; \verb|=| \;\ \I{IfaceData} \\
995 & | & \ldots \\
996 \I{IfaceClsInst} & & \mbox{A type-class instance} \\
997 \I{IfaceId} & & \mbox{Interface of top-level binder} \\
998 \I{IfaceData} & & \mbox{Interface of type constructor} \\
999 \end{array}
1000 $$
1001 \caption{Module interfaces in GHC} \label{fig:typecheck}
1002 \end{figure}
1003
1004 In general terms,
1005 type checking an indefinite package (a package with holes) involves
1006 calculating, for every module, a \I{ModIface} representing the
1007 type/interface of the module in question (which is serialized
1008 to disk). The general form of these
1009 interface files are described in Figure~\ref{fig:typecheck}; notably,
1010 the interfaces \I{IfaceId}, \I{IfaceData}, etc. contain \I{Name} references,
1011 which must be resolved by
1012 looking up a \I{ModIface} corresponding to the \I{Module} associated
1013 with the \I{Name}. (We will say more about this lookup process shortly.)
1014 For example, given:
1015
1016 \begin{verbatim}
1017 package p where
1018 signature H where
1019 data T
1020 module A(S, T) where
1021 import H
1022 data S = S T
1023 \end{verbatim}
1024 %
1025 the \I{PkgType} is:
1026
1027 \begin{verbatim}
1028 module HOLE:H (HOLE:H.T) where
1029 data T -- abstract type constructor
1030 module THIS:A (THIS:A.S, HOLE:H.T) where
1031 data S = S HOLE:H.T
1032 -- where THIS = p(H -> HOLE:H)
1033 \end{verbatim}
1034 %
1035 However, while it is true that the \I{ModIface} is the final result
1036 of type checking, we actually are conflating two distinct concepts: the user-visible
1037 notion of a \I{ModuleName}, which, when imported, brings some \I{Name}s
1038 into scope (or could trigger a deprecation warning, or pull in some
1039 orphan instances\ldots), versus the actual declarations, which, while recorded
1040 in the \I{ModIface}, have an independent existence: even if a declaration
1041 is not visible for an import, we may internally refer to its \I{Name}, and
1042 need to look it up to find out type information. (A simple case when
1043 this can occur is if a module exports a function with type \verb|T -> T|,
1044 but doesn't export \verb|T|).
1045
1046 \begin{figure}[htpb]
1047 $$
1048 \begin{array}{rcll}
1049 \I{ModDetails} & ::= & \langle\I{md\_types} \verb|;|\; \I{md\_insts}\rangle \\
1050 \I{md\_types} & ::= & \I{TyThing}_0 \verb|,|\, \ldots\verb|,|\, \I{TyThing}_n \\
1051 \I{md\_insts} & ::= & \I{ClsInst}_0 \verb|,|\, \ldots\verb|,|\, \I{ClsInst}_n \\[1em]
1052 \multicolumn{3}{l}{\mbox{\bf Type-checked declarations}} \\
1053 \I{TyThing} & & \mbox{Type-checked thing with a \I{Name}} \\
1054 \I{ClsInst} & & \mbox{Type-checked type class instance} \\
1055 \end{array}
1056 $$
1057 \caption{Semantic objects in GHC} \label{fig:typecheck-more}
1058 \end{figure}
1059
1060 Thus, a \I{ModIface} can be type-checked into a \I{ModDetails}, described in
1061 Figure~\ref{fig:typecheck-more}. Notice that a \I{ModDetails} is just
1062 a bag of type-checkable entities which GHC knows about. We
1063 define the \emph{external package state (EPT)} to
1064 simply be the union of the \I{ModDetails}
1065 of all external modules.
1066
1067 Type checking is a delicate balancing act between module
1068 interfaces and our semantic objects. A \I{ModIface} may get
1069 type-checked multiple times with different hole instantiations
1070 to provide multiple \I{ModDetails}.
1071 Furthermore complicating matters
1072 is that GHC does this resolution \emph{lazily}: a \I{ModIface}
1073 is only converted to a \I{ModDetails} when we are looking up
1074 the type of a \I{Name} that is described by the interface;
1075 thus, unlike usual theoretical treatments of type checking, we can't
1076 eagerly go ahead and perform substitutions on \I{ModIface}s when
1077 they get included.
1078
1079 In a separate compiler like GHC, there are two primary functions we must provide:
1080
1081 \paragraph{\textit{ModuleName} to \textit{ModIface}} Given a \I{ModuleName} which
1082 was explicitly imported by a user, we must produce a \I{ModIface}
1083 that, among other things, specifies what \I{Name}s are brought
1084 into scope. This is used by the renamer to resolve plain references
1085 to identifiers to real \I{Name}s. (By the way, if shaping produced
1086 renamed trees, it would not be necessary to do this step!)
1087
1088 \paragraph{\textit{Module} to \textit{ModDetails}/EPT} Given a \I{Module} which may be
1089 a part of a \I{Name}, we must be able to type check it into
1090 a \I{ModDetails} (usually by reading and typechecking the \I{ModIface}
1091 associated with the \I{Module}, but this process is involved). This
1092 is used by the type checker to find out type information on things. \\
1093
1094 There are two points in the type checker where these capabilities are exercised:
1095
1096 \paragraph{Source-level imports} When a user explicitly imports a
1097 module, the \textit{ModuleName} is mapped to a \textit{ModIface}
1098 to find out what exports are brought into scope (\I{mi\_exports})
1099 and what orphan instances must be loaded (\I{dep\_orphs}). Additionally,
1100 the \textit{Module} is loaded to the EPT to bring instances from
1101 the module into scope.
1102
1103 \paragraph{Internal name lookup} During type checking, we may have
1104 a \I{Name} for which we need type information (\I{TyThing}). If it's not already in the
1105 EPT, we type check and load
1106 into the EPT the \I{ModDetails} of the \I{Module} in the \I{Name},
1107 and then check the EPT again. (\verb|importDecl|)
1108
1109 \subsection{\textit{ModName} to \textit{ModIface}}
1110
1111 In all cases, the \I{mi\_exports} can be calculated directly from the
1112 shaping process, which specifies exactly for each \I{ModName} in scope
1113 what will be brought into scope.
1114
1115 \paragraph{Modules} Modules are straightforward, as for any
1116 \I{Module} there is only one possibly \I{ModIface} associated
1117 with it (the \I{ModIface} for when we type-checked the (unique) \verb|module|
1118 declaration.)
1119
1120 \paragraph{Signatures} For signatures, there may be multiple \I{ModIface}s
1121 associated with a \I{ModName} in scope, e.g. in this situation:
1122
1123 \begin{verbatim}
1124 package p where
1125 signature S where
1126 data A
1127 package q where
1128 include p
1129 signature S where
1130 data B
1131 module M where
1132 import S
1133 \end{verbatim}
1134 %
1135 Each literal \verb|signature| has a \I{ModIface} associated with it; and
1136 the import of \verb|S| in \verb|M|, we want to see the \emph{merged}
1137 \I{ModIface}s. We can determine the \I{mi\_exports} from the shape,
1138 but we also need to pull in orphan instances for each signature, and
1139 produce a warning for each deprecated signature.
1140
1141 \begin{aside}
1142 \textbf{Does hiding a signature hide its orphans.} Suppose that we have
1143 extended Backpack to allow hiding signatures from import.
1144
1145 \begin{verbatim}
1146 package p requires (H) where -- H is hidden from import
1147 module A where
1148 instance Eq (a -> b) where -- orphan
1149 signature H {-# DEPRECATED "Don't use me" #-} where
1150 import A
1151
1152 package q where
1153 include p
1154 signature H where
1155 data T
1156 module M where
1157 import H -- warn deprecated?
1158 instance Eq (a -> b) -- overlap?
1159 \end{verbatim}
1160
1161 It is probably the most consistent to not pull in orphan instances
1162 and not give the deprecated warning: this corresponds to merging
1163 visible \I{ModIface}s, and ignoring invisible ones.
1164 \end{aside}
1165
1166 \subsection{\textit{Module} to \textit{ModDetails}}
1167
1168 \paragraph{Modules} For modules, we have a \I{Module} of
1169 the form $\I{p}\verb|(|m\; \verb|->|\; \I{Module}\verb|,|\, \ldots\verb|)|$,
1170 and we also have a unique \I{ModIface}, where each hole instantiation
1171 is $\verb|HOLE:|m$.
1172
1173 To generate the \I{ModDetails} associated with the specific instantiation,
1174 we have to type-check the \I{ModIface} with the following adjustments:
1175
1176 \begin{enumerate}
1177 \item Perform a \I{Module} substitution according to the instantiation
1178 of the \I{ModIface}'s \I{Module}. (NB: we \emph{do}
1179 substitute \verb|HOLE:A.x| to \verb|HOLE:B.x| if we instantiated
1180 \verb|A -> HOLE:B|, \emph{unlike} the disjoint
1181 substitutions applied by shaping.)
1182 \item Perform a \I{Name} substitution as follows: for any name
1183 with a package key that is a $\verb|HOLE|$,
1184 substitute with the recorded \I{Name} in the requirements of the shape.
1185 Otherwise, look up the (unique) \I{ModIface} for the \I{Module},
1186 and subsitute with the corresponding \I{Name} in the \I{mi\_exports}.
1187 \end{enumerate}
1188
1189 \paragraph{Signatures} For signatures, we have a \I{Module} of the form
1190 $\verb|HOLE:|m$. Unlike modules, there are multiple \I{ModIface}s associated with a hole.
1191 We distinguish each separate \I{ModIface} by considering the full \I{PkgKey}
1192 it was defined in, e.g. \verb|p(A -> HOLE:C, B -> q():B)|; call this
1193 the hole's \emph{defining package key}; the set of \I{ModIface}s for a hole
1194 and their defining package keys can easily be calculated during shaping.
1195
1196 To generate the \I{ModDetails} associated with a hole, we type-check each
1197 \I{ModIface}, with the following adjustments:
1198
1199 \begin{enumerate}
1200 \item Perform a \I{Module} substitution according to the instantiation
1201 of the defining package key. (NB: This may rename the hole itself!)
1202 \item Perform a \I{Name} substitution as follows, in the same manner
1203 as would be done in the case of modules.
1204 \item When these \I{ModDetails} are merged into the EPT, some merging
1205 of duplicate types may occur; a type
1206 may be defined multiple times, in which case we check that each
1207 definition is compatible with the previous ones. A concrete
1208 type is always compatible with an abstract type.
1209 \end{enumerate}
1210
1211 \paragraph{Invariants} When we perform \I{Name} substitutions, we must be
1212 sure that we can always find out the correct \I{Name} to substitute to.
1213 This isn't obviously true, consider:
1214
1215 \begin{verbatim}
1216 package p where
1217 signature S(foo) where
1218 data T
1219 foo :: T
1220 module M(bar) where
1221 import S
1222 bar = foo
1223 package q where
1224 module A(T(..)) where
1225 data T = T
1226 foo = T
1227 module S(foo) where
1228 import A
1229 include p
1230 module A where
1231 import M
1232 ... bar ...
1233 \end{verbatim}
1234 %
1235 When we type check \verb|p|, we get the \I{ModIface}s:
1236
1237 \begin{verbatim}
1238 module HOLE:S(HOLE:S.foo) where
1239 data T
1240 foo :: HOLE:S.T
1241 module THIS:M(THIS:M.bar) where
1242 bar :: HOLE:S.T
1243 \end{verbatim}
1244 %
1245 Now, when we type check \verb|A|, we pull on the \I{Name} \verb|p(S -> q():S):M.bar|,
1246 which means we have to type check the \I{ModIface} for \verb|p(S -> q():S):M|.
1247 The un-substituted type of \verb|bar| has a reference to \verb|HOLE:S.T|;
1248 this should be substituted to \verb|q():S.T|. But how do we discover this?
1249 We know that \verb|HOLE:S| was instantiated to \verb|q():S|, so we might try
1250 and look for \verb|q():S.T|. However, this \I{Name} does not exist because
1251 the \verb|module S| reexports the selector from \verb|A|! Nor can we consult
1252 the (unique) \I{ModIface} for the module, as it doesn't reexport the relevant
1253 type.
1254
1255 The conclusion, then, is that a module written this way should be disallowed.
1256 Specifically, the correctness condition for a signature is this: \emph{Any \I{Name}
1257 mentioned in the \I{ModIface} of a signature must either be from an external module, or be
1258 exported by the signature}.
1259
1260 \begin{aside}
1261 \textbf{Special case export rule for record selectors.} Here is the analogous case for
1262 record selectors:
1263 \begin{verbatim}
1264 package p where
1265 signature S(foo) where
1266 data T = T { foo :: Int }
1267 module M(bar) where
1268 import S
1269 bar = foo
1270 package q where
1271 module A(T(..)) where
1272 data T = T { foo :: Int }
1273 module S(foo) where
1274 import A
1275 include p
1276 module A where
1277 import M
1278 ... bar ...
1279 \end{verbatim}
1280
1281 We could reject this, but technically we can find the right substitution
1282 for \verb|T|, because the export of \verb|foo| is an \I{AvailTC} which
1283 does mention \verb|T|.
1284 \end{aside}
1285
1286 \section{Cabal}
1287
1288 Design goals:
1289
1290 \begin{itemize}
1291 \item Backpack files are user-written. (In an earlier design, we had
1292 the idea that Cabal would generate Backpack files; however, we've
1293 since made Backpack files more user-friendly and reasonable to
1294 write by hand.)
1295
1296 \item Backpack files are optional. A package can add a Backpack file
1297 to replace some (but not all) of the fields in a Cabal description.
1298
1299 \item Backpack files can be compiled without GHC, if it is self-contained
1300 with respect to all the indefinite packages it includes. To include
1301 an indefinite package which is not locally defined but installed
1302 to the package database, you must use Cabal.
1303
1304 \item Backpack packages are \emph{unversioned}; you never see a version
1305 number in a Backpack package.
1306 \end{itemize}
1307
1308 \subsection{Versioning}
1309
1310 In this section, we discuss how Cabal's version numbers factor into
1311 Backpack, namely how we specify \I{PkgKey}s.
1312
1313 \paragraph{History}
1314 Prior to GHC 7.10, GHC has allowed an arbitrary combination of libraries
1315 to be linked together, assuming that the package IDs (e.g.
1316 \verb|foo-0.1|) were all unique. Cabal enforces a stronger restriction,
1317 which is that there exists some unique mapping from package name to
1318 package version which is consistent with all transitive dependencies.
1319
1320 \paragraph{Design goals}
1321 Here are some design goals for versioning:
1322
1323 \begin{enumerate}
1324 \item GHC only tests for equality on versioning; Cabal is
1325 responsible for determining the version of a package. For example,
1326 pre-7.10 the linker symbols were prefixed using a package name and
1327 version, but GHC simply represented this internally as an opaque
1328 string. As another example, package qualified imports only allow
1329 qualification by package name, and not by version.
1330
1331 \item Cabal only tests for equality on package keys; GHC is
1332 responsible for calculating the package key of a package. (This is
1333 because GHC must be able to maintain a mapping between the unhashed
1334 and hashed versions of a key, and the hashing process must be
1335 deterministic.) If Cabal needs to generate a new package key, it
1336 must do so through GHC.
1337
1338 \item Our design should, in principle, support mutual recursion
1339 between packages, even if the implementation does not (presently).
1340
1341 \item GHC should not lose functionality, i.e. it should still be
1342 possible to link together the same package with different versions;
1343 however, Cabal may arrange for this to not occur by default unless a
1344 user explicitly asks for it.
1345 \end{enumerate}
1346
1347 These goals imply a few things:
1348
1349 \begin{enumerate}
1350 \item Backpack files should not contain any version numbers,
1351 and should be agnostic to versioning.
1352
1353 \item Package keys must record versioning information, otherwise
1354 we can't link together two different versions of the same package.
1355 \end{enumerate}
1356
1357 \paragraph{Package keys}
1358
1359 Earlier, we specified \I{PkgKey} as a package name $p$ and then a list
1360 of hole instantiations. To allow linking together multiple versions of
1361 the same package, we must record versioning information into the
1362 \I{PkgKey}. To do this, we include in the \I{PkgKey} a \I{VersionHash}.
1363 Cabal is responsible for defining \I{VersionHash}, but we give two possible
1364 definitions in Figure~\ref{fig:version}.
1365
1366 \begin{figure}[htpb]
1367 $$
1368 \begin{array}{rcll}
1369 p && \mbox{Package name} \\
1370 v && \mbox{Version number} \\[1em]
1371 \I{VersionHash} & ::= & p \verb|-| v\; \verb|{| \, p_0 \; \verb|->| \; \I{VersionHash}_0 \verb|,|\, \ldots\, p_n \; \verb|->| \; \I{VersionHash}_n \, \verb|}| & \mbox{Full version hash} \\
1372 \I{VersionHash'} & ::= & p \; \verb|{| \, p_0\verb|-|v_0 \verb|,|\, \ldots\, p_n\verb|-|v_n \, \verb|}| & \mbox{Simplified version hash} \\
1373 \I{PkgKey} & ::= & \I{VersionHash} \verb|(| \, m \; \verb|->| \; \I{Module} \verb|,|\, \ldots\, \verb|)| \\
1374 \end{array}
1375 $$
1376 \caption{Version hash} \label{fig:version}
1377 \end{figure}
1378
1379 The difference between a full version hash and a simplified version hash
1380 is what linking restrictions they impose on programs: the full version
1381 hash supports linking arbitrary versions of packages with arbitrary
1382 other versions, whereas the simplified hash has a Cabal-style requirement
1383 that there be some globally consistent mapping from package name to version.
1384
1385 The full version hash has some subtleties:
1386
1387 \begin{itemize}
1388 \item Each sub-\I{VersionHash} recorded in a \I{VersionHash} is
1389 identified by a package name, which may not necessarily equal the
1390 package name in the \I{VersionHash}. This permits us to calculate
1391 a \I{VersionHash} for a package like:
1392 \begin{verbatim}
1393 package p where
1394 include network (Network)
1395 include network-old (Network as Network.Old)
1396 ...
1397 \end{verbatim}
1398 if we want \verb|network| to refer to \verb|network-2.0| and
1399 \verb|network-old| to refer to \verb|network-1.0|. Without
1400 identifying each subdependency by package name, we wouldn't know
1401 what \verb|network-old| would refer to.
1402
1403 \item If a package is locally specified in a Backpack
1404 file, it does not occur in the \I{VersionHash}. This is because
1405 we always refer to the same package; there are no different versions!
1406
1407 \item You might wonder why we need a \I{VersionHash} as well as a \I{PkgKey};
1408 why not just specify \I{PkgKey} as $p-v \; \verb|{| \, p \; \verb|->| \; \I{PkgKey} \verb|,|\, \ldots\, \verb|}| \verb|(| \, m \; \verb|->| \; \I{Module} \verb|,|\, \ldots\, \verb|)|$? However, there is ``too much'' information in the \I{PkgKey}, causing the scheme to not work with mutual recursion:
1409
1410 \begin{verbatim}
1411 package p where
1412 module M
1413 include q
1414 \end{verbatim}
1415
1416 To specify the package key of \verb|p|, we need the package key of \verb|q|; to
1417 specify the package key of \verb|q|, we need the module identifier of \verb|M|
1418 which contains the package key of \verb|p|: circularity! (The simplified
1419 version hash does not have this problem as it is not recursive.)
1420 \end{itemize}
1421
1422 \paragraph{Cabal to GHC}
1423
1424 Prior to GHC-7.10, Cabal passed versioning information to GHC using the
1425 \verb|-package-name| flag. In GHC 7.10, this flag was renamed to
1426 \verb|-this-package-key|. We propose that this flag be renamed once
1427 again to \verb|-this-version-hash|, to which Cabal passes a hash (or string)
1428 describing the versioning of the package which is then incorporated
1429 into the package key. Cabal no longer needs to calculate package keys.
1430 In the absence of Backpack, there will be no semantic difference if we
1431 switch to full version hashes.
1432
1433 \end{document} % chktex 16