588f10e8ccb5a9a6a726258c5e903745cc1700b6
[ghc.git] / docs / backpack / algorithm.tex
1 \documentclass{article}
2
3 \usepackage{pifont}
4 \usepackage{graphicx} %[pdftex] OR [dvips]
5 \usepackage{fullpage}
6 \usepackage{wrapfig}
7 \usepackage{float}
8 \usepackage{titling}
9 \usepackage{hyperref}
10 \usepackage{tikz}
11 \usepackage{color}
12 \usepackage{footnote}
13 \usepackage{float}
14 \usepackage{algorithm}
15 \usepackage{algpseudocode}
16 \usepackage{bigfoot}
17 \usepackage{amssymb}
18 \usepackage{framed}
19
20 % Alter some LaTeX defaults for better treatment of figures:
21 % See p.105 of "TeX Unbound" for suggested values.
22 % See pp. 199-200 of Lamport's "LaTeX" book for details.
23 % General parameters, for ALL pages:
24 \renewcommand{\topfraction}{0.9} % max fraction of floats at top
25 \renewcommand{\bottomfraction}{0.8} % max fraction of floats at bottom
26 % Parameters for TEXT pages (not float pages):
27 \setcounter{topnumber}{2}
28 \setcounter{bottomnumber}{2}
29 \setcounter{totalnumber}{4} % 2 may work better
30 \setcounter{dbltopnumber}{2} % for 2-column pages
31 \renewcommand{\dbltopfraction}{0.9} % fit big float above 2-col. text
32 \renewcommand{\textfraction}{0.07} % allow minimal text w. figs
33 % Parameters for FLOAT pages (not text pages):
34 \renewcommand{\floatpagefraction}{0.7} % require fuller float pages
35 % N.B.: floatpagefraction MUST be less than topfraction !!
36 \renewcommand{\dblfloatpagefraction}{0.7} % require fuller float pages
37
38 % remember to use [htp] or [htpb] for placement
39
40 \newcommand{\I}[1]{\ensuremath{\mathit{#1}}}
41
42 \newcommand{\optionrule}{\noindent\rule{1.0\textwidth}{0.75pt}}
43
44 \newenvironment{aside}
45 {\begin{figure}\def\FrameCommand{\hspace{2em}}
46 \MakeFramed{\advance\hsize-\width}\optionrule\small}
47 {\par\vskip-\smallskipamount\optionrule\endMakeFramed\end{figure}}
48
49 \setlength{\droptitle}{-6em}
50
51 \newcommand{\Red}[1]{{\color{red} #1}}
52
53 \title{The Backpack algorithm}
54
55 \begin{document}
56
57 \maketitle
58
59 \noindent
60 In this document, we look at the compilation of a Backpack unit.
61 Here are the steps a unit goes through:
62
63 \begin{enumerate}
64 \item The \textbf{unit renamer} takes the Backpack file consisting
65 of units transforms each unit name to an indefinite unit ID \I{IndefUnitId}.
66 In particular, it associates each unit name with its local binding
67 site (unit declaration), or an external unit declaration from
68 the indefinite unit database.
69
70 \item The \textbf{dependency solver} takes a \I{unit}
71 and converts it into a
72 directed acyclic graph representing the
73 dependency structure of the declarations in a unit.
74 It also computes the \I{Module} of each module/signature
75 declaration, the \I{UnitKey} of each include, and the overall
76 requirements of the unit.
77
78 \item The \textbf{shaping pass} takes the DAG
79 and computes its \I{Shape}. A \I{Shape} describes precisely what
80 a unit provides and requires at the Haskell declaration level (\I{AvailInfo}).
81
82 \item The \textbf{indefinite pipeline} takes the DAG and the shape
83 and typechecks each module and signature against the indefinite unit
84 database. The type checking results are saved in the indefinite
85 unit database under the \I{IndefiniteUnitId}.
86
87 \item The \textbf{definite pipeline} takes the DAG as well as a
88 \emph{hole mapping} specifying how each requirement in the unit
89 is to be filled, and type-checks and compiles the unit against the
90 installed unit database. The type checking results and object files
91 are saved to the installed unit database under the \I{InstalledUnitId}.
92 \end{enumerate}
93
94 \section{Front-end syntax}
95
96 \begin{figure}[htpb]
97 $$
98 \begin{array}{rcll}
99 p,q,r && \mbox{Unit names} \\
100 m,n && \mbox{Module names} \\[1em]
101 \multicolumn{3}{l}{\mbox{\bf Units}} \\
102 \I{unit} & ::= & \verb|unit|\; p\; [\I{provreq}]\; \verb|where {| d_1 \verb|;| \ldots \verb|;| d_n \verb|}| \\[1em]
103 \multicolumn{3}{l}{\mbox{\bf Declarations}} \\
104 d & ::= & \verb|module|\; m \; [exports]\; \verb|where|\; \I{body} \\
105 & | & \verb|signature|\; m \; [exports]\; \verb|where|\; \I{body} \\
106 & | & \verb|include|\; p \; [provreq] \\[1em]
107 \multicolumn{3}{l}{\mbox{\bf Provides/requires specification}} \\
108 \I{provreq} & ::= & \verb|(| \, \I{rns} \, \verb|)| \;
109 [ \verb|requires(|\, \I{rns} \, \verb|)| ] \\
110 \I{rns} & ::= & \I{rn}_0 \verb|,| \, \ldots \verb|,| \, \I{rn}_n [\verb|,|] & \mbox{Renamings} \\
111 \I{rn} & ::= & m\; \verb|as| \; n & \mbox{Renaming} \\[1em]
112 \multicolumn{3}{l}{\mbox{\bf Haskell code}} \\
113 \I{exports} & & \mbox{A Haskell module export list} \\
114 \I{body} & & \mbox{A Haskell module body} \\
115 \end{array}
116 $$
117 \caption{Syntax of Backpack} \label{fig:syntax}
118 \end{figure}
119
120 A (slightly simplified) syntax of Backpack is given in Figure~\ref{fig:syntax}.
121
122 \newpage
123 \section{Unit renamer}
124
125 \begin{figure}[htpb]
126 $$
127 \begin{array}{rcll}
128 \I{InstalledPackageId} & & \mbox{Installed package IDs} \\
129 \I{IndefiniteUnitId} & ::= & \I{InstalledPackageId}\, \verb|-|\, p \\
130 \I{InstalledUnitId} & ::= & \I{IndefiniteUnitId} \verb|(| \, m \; \verb|->| \; \I{Module} \verb|,|\, \ldots\, \verb|)| & \mbox{Also known as \I{UnitKey}} \\
131 \I{Module} & ::= & \I{InstalledUnitId} \verb|:| m \\
132 \end{array}
133 $$
134 \caption{Unit identification} \label{fig:ids}
135 \end{figure}
136
137 The unit renamer is responsible for transforming unit names $p$ into \I{IndefiniteUnitId}s, given some current \I{InstalledPackageId} (\I{ThisInstalledPackageId}) and a mapping from $p$ to
138 \I{IndefiniteUnitId} (\I{UnitNameMap}). Its operation on a Backpack file (collection of units) is very simple:
139
140 \begin{itemize}
141 \item Every unit declaration $\verb|unit|\; p$ is renamed to $\I{ThisInstalledPackageId}\, \verb|-|\, p$
142 \item Every unit include $\verb|include|\; p$ is renamed to $\I{ThisInstalledPackageId}\, \verb|-|\, p$ if $p$ was declared in the Backpack file; otherwise it is renamed according to \I{UnitNameMap}.
143 \end{itemize}
144 %
145 The purpose of an \emph{IndefiniteUnitId} is to uniquely identify the results of typechecking
146 an indefinite unit; similarly, an \emph{InstalledUnitId} uniquely identifies
147 the results of compiling a unit with all its holes
148 filled. It thus records a \emph{hole mapping} which specifies how each
149 hole was filled.
150
151 An \emph{installed package ID} (IPID) is an opaque string provided by
152 Cabal which uniquely identifies an installed package. A recipe for
153 computing an IPID would incorporate both the source info, such as the hash of the
154 source code distribution tarball, as well as build info, such
155 as the selected Cabal and GHC flags, what the provided mapping from
156 $p$ to $IndefiniteUnitId$ was, etc.
157
158 \paragraph{The difference between units and packages}
159 Cabal packages are:
160
161 \begin{itemize}
162 \item The unit of distribution
163 \item The unit that Hackage handles
164 \item The unit of versioning
165 \item The unit of ownership (who maintains it etc)
166 \end{itemize}
167
168 Backpack units are the building blocks of modular development;
169 there may be multiple units per a Cabal package. While in theory
170 Cabal could do sophisticated things with multiple units in a
171 package, we expect Cabal
172 to pick a distinguished unit (with the same unit name $p$ as
173 the package) which serves as the publically visible unit.
174
175 \paragraph{Notational convention}
176 In the rest of this document, when it is unambiguous, we will use $p$, $q$ and $r$
177 interchangeably with \I{IndefiniteUnitId}, as after unit renaming, there
178 are no occurrences of unit names.
179
180 \newpage
181 \section{Dependency solver}
182
183 \begin{figure}[htpb]
184 $$
185 \begin{array}{rcll}
186 \tilde{d} & ::= & \verb|module|\; Module \; [exports]\; \verb|where|\; \I{body} \\
187 & | & \verb|signature|\; \I{Module} \; [exports]\; \verb|where|\; \I{body} \\
188 & | & \verb|include|\; p\,\verb|(|\, m\; \verb|->|\; \I{Module}\,\verb|,|\;\ldots\,\verb|)| \\
189 & | & \verb|merge|\; \I{Module}\; \verb|of|\; (\I{Module}, \I{IsSource?}) \ldots
190 \end{array}
191 $$
192 \caption{Resolved declarations} \label{fig:resolved}
193 \end{figure}
194
195 The first phase of compilation is defines a directed acyclic graph on
196 the source syntax representing the dependency structure of the
197 modules/signatures/includes in the unit. This DAG has a node per:
198
199 \begin{itemize}
200 \item Each source-level module, signature and include, and
201 \item Each unfilled requirement (called a ``signature merge'' node).
202 \end{itemize}
203 %
204 Each module, signature and signature merge node can be identified
205 with the tuple $\left(\I{Module}, \I{IsSource?}\right)$, where \I{IsSource?}
206 is true for signatures and false for modules and signature merges.
207 The four nodes are described in Figure~\ref{fig:resolved}.
208
209 The edges of the directed graph signify a ``depends on'' relation, and are
210 defined as follows:
211
212 \begin{itemize}
213 \item A module/signature $m$ depends on $\verb|include|\; p$ if $m$ imports a module provided by $p$.
214 \item A module/signature $m$ depends on a module/signature merge $n$ if $m$ imports $n$.
215 \item A module/signature $m$ depends on a signature $n$ if $m$ \verb|{-# SOURCE #-}| imports $n$.
216 \item A module/signature merge $m$ depends on a local signature $m$ (if it exists).
217 \item A module/signature merge $m$ depends on a $\verb|include|\; p$, if the include requires $m$.
218 \end{itemize}
219 %
220 For compilation, these extra edges can also be defined if they
221 do not introduce a cycle:
222
223 \begin{itemize}
224 \item An $\verb|include|\; p$ depends on $\verb|include|\; q$ if, for some module name $m$, $p$ requires $m$ and $q$ provides $m$.
225 \item An $\verb|include|\; p$ depends on a module $m$ if $p$ requires a module named $m$.
226 \end{itemize}
227 %
228 If the resulting graph has a cycle, this is an error.
229
230 \paragraph{Computing unfilled requirements} To compute unfilled requirements,
231 maintain two sets of module names: the provisions $P$ and the possible requirements $R'$. For each declaration:
232
233 \begin{itemize}
234 \item $\verb|include|\; p$: union provisions with $P$ and requirements with $R'$.
235 \item $\verb|module|\; m$: add $m$ to $P$
236 \item $\verb|signature|\; m$: add $m$ to $R'$
237 \end{itemize}
238 %
239 The unfilled requirements $R=R'-P$.
240
241 \paragraph{Computing the \I{Module} of declarations}
242 The \I{Module} of any declaration $m$ in a unit $p$ is simply
243 \verb|p(A -> HOLE:A, ...):m|, where the hole map is a map
244 from each unfilled requirement $n$ to \verb|HOLE:n|.
245
246 \paragraph{Computing the hole mapping of includes} In absence of mutual
247 recursion of includes, the DAG is acyclic with include-include edges. Process includes
248 in this topological order, maintaining a mapping of provided modules $\Gamma$, accumulating
249 provisions of includes as we go along. For each $\verb|include|; p$, the hole map is
250 simply the requirements of $p$, mapping $m$ to $\Gamma(m)$ if it is defined, and \verb|HOLE:m| otherwise.
251
252 With mutual recursion, we have to use the regular tree unification algorithm described in the Backpack paper. We omit it from here for now.
253
254 \newpage
255 \section{Shaping}
256
257 \begin{figure}[htpb]
258 $$
259 \begin{array}{rcll}
260 \I{Shape} & ::= & \verb|provides:|\; m \; \verb|->|\; \I{Module}\; \verb|{|\, \I{AvailInfo} \verb|,|\, \ldots \, \verb|};| \ldots \\
261 & & \verb|requires:| \; m \; \verb|->|\; \textcolor{white}{\I{Module}}\; \verb|{| \, \I{AvailInfo} \verb|,| \, \ldots \, \verb|}| \verb|;| \ldots \\
262 \I{AvailInfo} & ::= & \I{Name} & \mbox{Plain identifiers} \\
263 & | & \I{Name} \, \verb|{| \, \I{Name}_0\verb|,| \, \ldots\verb|,| \, \I{Name}_n \, \verb|}| & \mbox{Type constructors} \\
264 \I{Name} & ::= & \I{Module} \verb|.| \I{OccName} \\
265 \I{OccName} & & \mbox{Unqualified name in a namespace}
266 \end{array}
267 $$
268 \caption{Shaping} \label{fig:semantic}
269 \end{figure}
270
271 Shaping computes a \I{Shape}, whose form is described in Figure~\ref{fig:semantic}.
272 A shape describes what modules a unit implements and exports (the \emph{provides})
273 and what signatures a unit needs to have filled in (the \emph{requires}). Both
274 provisions and requires are available for import by units which include this
275 unit.
276
277 We incrementally build a shape by starting with an empty
278 shape context and adding to it as follows:
279
280 \begin{enumerate}
281 \item Calculate the shape of a declaration, with respect to the
282 current shape context. (e.g., by renaming a module/signature,
283 or using the shape from an included unit.)
284 \item Merge this shape into the shape context.
285 \end{enumerate}
286
287 The final shape context is the shape of the unit as a whole.
288 Optionally, we can also compute the renamed syntax trees of
289 modules and signatures.
290
291 In the description below, we'll assume \verb|THIS| is the unit key
292 of the unit being processed.
293
294 \begin{aside}
295 \textbf{\textit{OccName} is implied by \textit{Name}.}
296 In Haskell, the following is not valid syntax:
297
298 \begin{verbatim}
299 import A (foobar as baz)
300 \end{verbatim}
301 In particular, a \I{Name} which is in scope will always have the same
302 \I{OccName} (even if it may be qualified.) You might imagine relaxing
303 this restriction so that declarations can be used under different \I{OccName}s;
304 in such a world, we need a different definition of shape:
305
306 \begin{verbatim}
307 Shape ::=
308 provided: ModName -> { OccName -> Name }
309 required: ModName -> { OccName -> Name }
310 \end{verbatim}
311 Presently, however, such an \I{OccName} annotation would be redundant: it can be inferred from the \I{Name}.
312 \end{aside}
313
314 \begin{aside}
315 \textbf{Holes of a unit are a mapping, not a set.} Why can't the \I{UnitKey} just record a
316 set of \I{Module}s, e.g. $\I{UnitKey}\;::= \; \I{SrcUnitKey} \; \verb|{| \; \I{Module} \; \verb|}|$? Consider:
317
318 \begin{verbatim}
319 unit p (A) requires (H1, H2) where
320 signature H1(T) where
321 data T
322 signature H2(T) where
323 data T
324 module A(A(..)) where
325 import qualified H1
326 import qualified H2
327 data A = A H1.T H2.T
328
329 unit q (A12, A21) where
330 module I1(T) where
331 data T = T Int
332 module I2(T) where
333 data T = T Bool
334 include p (A as A12) requires (H1 as I1, H2 as I2)
335 include p (A as A21) requires (H1 as I2, H2 as I1)
336 \end{verbatim}
337 With a mapping, the first instance of \verb|p| has key \verb|p(H1 -> q():I1, H2 -> q():I2)|
338 while the second instance has key \verb|p(H1 -> q():I2, H2 -> q():I1)|; with
339 a set, both would have the key \verb|p(q():I1, q():I2)|.
340 \end{aside}
341
342
343 \begin{aside}
344 \textbf{Signatures can require a specific entity.}
345 With requirements like \verb|A -> { HOLE:A.T, HOLE:A.foo }|,
346 why not specify it as \verb|A -> { T, foo }|,
347 e.g., \verb|required: { ModName -> { OccName } }|? Consider:
348
349 \begin{verbatim}
350 unit p () requires (A, B) where
351 signature A(T) where
352 data T
353 signature B(T) where
354 import T
355 \end{verbatim}
356 The requirements of this unit specify that \verb|A.T| $=$ \verb|B.T|; this
357 can be expressed with \I{Name}s as
358
359 \begin{verbatim}
360 A -> { HOLE:A.T }
361 B -> { HOLE:A.T }
362 \end{verbatim}
363 But, without \I{Name}s, the sharing constraint is impossible: \verb|A -> { T }; B -> { T }|. (NB: \verb|A| and \verb|B| don't have to be implemented with the same module.)
364 \end{aside}
365
366 \begin{aside}
367 \textbf{The \textit{Name} of a value is used to avoid
368 ambiguous identifier errors.} We state that two types
369 are equal when their \I{Name}s are the same; however,
370 for values, it is less clear why we care. But consider this example:
371
372 \begin{verbatim}
373 unit p (A) requires (H1, H2) where
374 signature H1(x) where
375 x :: Int
376 signature H2(x) where
377 import H1(x)
378 module A(y) where
379 import H1
380 import H2
381 y = x
382 \end{verbatim}
383 The reference to \verb|x| in \verb|A| is unambiguous, because it is known
384 that \verb|x| from \verb|H1| and \verb|x| from \verb|H2| are the same (have
385 the same \I{Name}.) If they were not the same, it would be ambiguous and
386 should cause an error. Knowing the \I{Name} of a value distinguishes
387 between these two cases.
388 \end{aside}
389
390 \begin{aside}
391 \textbf{Holes are linear}
392 Requirements do not record what \I{Module} represents
393 the identity of a requirement, which means that it's not possible to assert
394 that hole \verb|A| and hole \verb|B| should be implemented with the same module,
395 as might occur with aliasing:
396
397 \begin{verbatim}
398 signature A where
399 signature B where
400 alias A = B
401 \end{verbatim}
402 %
403 The benefit of this restriction is that when a requirement is filled,
404 it is obvious that this is the only requirement that is filled: you won't
405 magically cause some other requirements to be filled. The downside is
406 it's not possible to write a unit which looks for an interface it is
407 looking for in one of $n$ names, accepting any name as an acceptable linkage.
408 If aliasing was allowed, we'd need a separate physical shaping context,
409 to make sure multiple mentions of the same hole were consistent.
410
411 \end{aside}
412
413 %\newpage
414
415 \subsection{\texttt{module M}}
416
417 A module declaration provides a module \verb|THIS:M| at module name \verb|M|. It
418 has the shape:
419
420 \begin{verbatim}
421 provides: M -> THIS:M { exports of renamed M under THIS:M }
422 requires: (nothing)
423 \end{verbatim}
424 Example:
425
426 \begin{verbatim}
427 module A(T) where
428 data T = T
429
430 -- provides: A -> THIS:A { THIS:A.T }
431 -- requires: (nothing)
432 \end{verbatim}
433
434 \newpage
435 \subsection{\texttt{signature M}}
436
437 A signature declaration creates a requirement at module name \verb|M|. It has the shape:
438
439 \begin{verbatim}
440 provides: (nothing)
441 requires: M -> { exports of renamed M under HOLE:M }
442 \end{verbatim}
443
444 \noindent Example:
445
446 \begin{verbatim}
447 signature H(T) where
448 data T
449
450 -- provides: H -> (nothing)
451 -- requires: H -> { HOLE:H.T }
452 \end{verbatim}
453
454 \begin{aside}
455 \textbf{In-scope signatures are not provisions}. We enforce the invariant that
456 a provision is always (syntactically) a \verb|module| and a requirement
457 is always a \verb|signature|. This means that if you have a requirement
458 and a provision of the same name, the requirement can \emph{always} be filled
459 with the provision. Without this invariant, it's not clear if a provision
460 will actually fill a signature. Consider this example, where
461 a signature is required and exposed:
462
463 \begin{verbatim}
464 unit a-sigs (A) requires (A) where -- ***
465 signature A where
466 data T
467
468 unit a-user (B) requires (A) where
469 signature A where
470 data T
471 x :: T
472 module B where
473 ...
474
475 unit p where
476 include a-sigs
477 include a-user
478 \end{verbatim}
479 %
480 When we consider merging in the shape of \verb|a-user|, does the
481 \verb|A| provided by \verb|a-sigs| fill in the \verb|A| requirement
482 in \verb|a-user|? It \emph{should not}, since \verb|a-sigs| does not
483 actually provide enough declarations to satisfy \verb|a-user|'s
484 requirement: the intended semantics \emph{merges} the requirements
485 of \verb|a-sigs| and \verb|a-user|.
486
487
488
489 \begin{verbatim}
490 unit a-sigs (M as A) requires (H as A) where
491 signature H(T) where
492 data T
493 module M(T) where
494 import H(T)
495 \end{verbatim}
496 %
497 We rightly should error, since the provision is a module. And in this situation:
498
499 \begin{verbatim}
500 unit a-sigs (H as A) requires (H) where
501 signature H(T) where
502 data T
503 \end{verbatim}
504 %
505 The requirements should be merged, but should the merged requirement
506 be under the name \verb|H| or \verb|A|?
507
508 It may still be possible to use the \verb|(A) requires (A)| syntax to
509 indicate exposed signatures, but this would be a mere syntactic
510 alternative to \verb|() requires (exposed A)|.
511 \end{aside}
512 %
513
514 \newpage
515 \subsection{\texttt{include pkg (X) requires (Y)}}
516
517 We merge with the transformed shape of unit \verb|pkg|, where this
518 shape is transformed by:
519
520 \begin{itemize}
521 \item Renaming and thinning the provisions according to \verb|(X)|
522 \item Renaming requirements according to \verb|(Y)| (requirements cannot
523 be thinned, so non-mentioned requirements are implicitly passed through.)
524 For each renamed requirement from \verb|Y| to \verb|Y'|,
525 substitute \verb|HOLE:Y| with \verb|HOLE:Y'| in the
526 \I{Module}s and \I{Name}s of the provides and requires.
527 \end{itemize}
528 %
529 If there are no thinnings/renamings, you just merge the
530 shape unchanged! Here is an example:
531
532 \begin{verbatim}
533 unit p (M) requires (H) where
534 signature H where
535 data T
536 module M where
537 import H
538 data S = S T
539
540 unit q (A) where
541 module X where
542 data T = T
543 include p (M as A) requires (H as X)
544 \end{verbatim}
545 %
546 The shape of unit \verb|p| is:
547
548 \begin{verbatim}
549 requires: M -> { p(H -> HOLE:H):M.S }
550 provides: H -> { HOLE:H.T }
551 \end{verbatim}
552 %
553 Thus, when we process the \verb|include| in unit \verb|q|,
554 we make the following two changes: we rename the provisions,
555 and we rename the requirements, substituting \verb|HOLE|s.
556 The resulting shape to be merged in is:
557
558 \begin{verbatim}
559 provides: A -> { p(H -> HOLE:X):M.S }
560 requires: X -> { HOLE:X.T }
561 \end{verbatim}
562 %
563 After merging this in, the final shape of \verb|q| is:
564
565 \begin{verbatim}
566 provides: X -> { q():X.T } -- from shaping 'module X'
567 A -> { p(H -> q():X):M.S }
568 requires: (nothing) -- discharged by provided X
569 \end{verbatim}
570
571 \newpage
572
573 \subsection{Merging}
574
575 The shapes we've given for individual declarations have been quite
576 simple. Merging combines two shapes, filling requirements with
577 implementations, unifying \I{Name}s, and unioning requirements; it is
578 the most complicated part of the shaping process.
579
580 The best way to think about merging is that we take two units with
581 inputs (requirements) and outputs (provisions) and ``wiring'' them up so
582 that outputs feed into inputs. In the absence
583 of mutual recursion, this wiring process is \emph{directed}: the provisions
584 of the first unit feed into the requirements of the second unit,
585 but never vice versa. (With mutual recursion, things can go in the opposite
586 direction as well.)
587
588 Suppose we are merging shape $p$ with shape $q$ (e.g., $p; q$). Merging
589 proceeds as follows:
590
591 \begin{enumerate}
592 \item \emph{Fill every requirement of $q$ with provided modules from
593 $p$.} For each requirement $M$ of $q$ that is provided by $p$ (in particular,
594 all of its required \verb|Name|s are provided),
595 substitute each \I{Module} occurrence of \verb|HOLE:M| with the
596 provided $p\verb|(|M\verb|)|$, unify the names, and remove the requirement from $q$.
597 If the names of the provision are not a superset of the required names, error.
598 \item If mutual recursion is supported, \emph{fill every requirement of $p$ with provided modules from $q$.}
599 \item \emph{Merge leftover requirements.} For each requirement $M$ of $q$ that is not
600 provided by $p$ but required by $p$, unify the names, and union them together to form the new requirement. (It's not
601 necessary to substitute \I{Module}s, since they are guaranteed to be the same.)
602 \item \emph{Add provisions of $q$.} Union the provisions of $p$ and $q$, erroring
603 if there is a duplicate that doesn't have the same identity.
604 \end{enumerate}
605 %
606 To unify two sets of names, find each pair of names with matching \I{OccName}s $n$ and $m$ and do the following:
607
608 \begin{enumerate}
609 \item If both are from holes, pick a canonical representative $m$ and substitute $n$ with $m$.
610 \item If one $n$ is from a hole, substitute $n$ with $m$.
611 \item Otherwise, error if the names are not the same.
612 \end{enumerate}
613 %
614 It is important to note that substitutions on \I{Module}s and substitutions on
615 \I{Name}s are disjoint: a substitution from \verb|HOLE:A| to \verb|HOLE:B|
616 does \emph{not} substitute inside the name \verb|HOLE:A.T|.
617
618 Since merging is the most complicated step of shaping, here are a big
619 pile of examples of it in action.
620
621 \subsubsection{A simple example}
622
623 In the following set of units:
624
625 \begin{verbatim}
626 unit p(M) requires (A) where
627 signature A(T) where
628 data T
629 module M(T, S) where
630 import A(T)
631 data S = S T
632
633 unit q where
634 module A where
635 data T = T
636 include p
637 \end{verbatim}
638
639 When we \verb|include p|, we need to merge the partial shape
640 of \verb|q| (with just provides \verb|A|) with the shape
641 of \verb|p|. Here is each step of the merging process:
642
643 \begin{verbatim}
644 shape 1 shape 2
645 --------------------------------------------------------------------------------
646 (initial shapes)
647 provides: A -> THIS:A { q():A.T } M -> p(A -> HOLE:A) { HOLE:A.T, p(A -> HOLE:A).S }
648 requires: (nothing) A -> { HOLE:A.T }
649
650 (after filling requirements)
651 provides: A -> THIS:A { q():A.T } M -> p(A -> THIS:A) { q():A.T, p(A -> THIS:A).S }
652 requires: (nothing) (nothing)
653
654 (after adding provides)
655 provides: A -> THIS:A { q():A.T }
656 M -> p(A -> THIS:A) { q():A.T, p(A -> THIS:A).S }
657 requires: (nothing)
658 \end{verbatim}
659
660 Notice that we substituted \verb|HOLE:A| with \verb|THIS:A|, but \verb|HOLE:A.T| with \verb|q():A.T|.
661
662 \subsubsection{Requirements merging can affect provisions}
663
664 When a merge results in a substitution, we substitute over both
665 requirements and provisions:
666
667 \begin{verbatim}
668 signature H(T) where
669 data T
670 module A(T) where
671 import H(T)
672 module B(T) where
673 data T = T
674
675 -- provides: A -> THIS:A { HOLE:H.T }
676 -- B -> THIS:B { THIS:B.T }
677 -- requires: H -> { HOLE:H.T }
678
679 signature H(T, f) where
680 import B(T)
681 f :: a -> a
682
683 -- provides: A -> THIS:A { THIS:B.T } -- UPDATED
684 -- B -> THIS:B { THIS:B.T }
685 -- requires: H -> { THIS:B.T, HOLE:H.f } -- UPDATED
686 \end{verbatim}
687
688 \subsubsection{Sharing constraints}
689
690 Suppose you have two signature which both independently define a type,
691 and you would like to assert that these two types are the same. In the
692 ML world, such a constraint is known as a sharing constraint. Sharing
693 constraints can be encoded in Backpacks via clever use of reexports;
694 they are also an instructive example for signature merging.
695
696 \begin{verbatim}
697 signature A(T) where
698 data T
699 signature B(T) where
700 data T
701
702 -- requires: A -> { HOLE:A.T }
703 B -> { HOLE:B.T }
704
705 -- the sharing constraint!
706 signature A(T) where
707 import B(T)
708 -- (shape to merge)
709 -- requires: A -> { HOLE:B.T }
710
711 -- (after merge)
712 -- requires: A -> { HOLE:A.T }
713 -- B -> { HOLE:A.T }
714 \end{verbatim}
715 %
716 \Red{I'm pretty sure any choice of \textit{Name} is OK, since the
717 subsequent substitution will make it alpha-equivalent.}
718
719 \subsection{Export declarations}
720
721 If an explicit export declaration is given, the final shape is the
722 computed shape, minus any provisions not mentioned in the list, with the
723 appropriate renaming applied to provisions and requirements. (Requirements
724 are implicitly passed through if they are not named.)
725 If no explicit export declaration is given, the final shape is
726 the computed shape, including only provisions which were defined
727 in the declarations of the unit.
728
729 \begin{aside}
730 \textbf{Signature visibility, and defaulting}
731 The simplest formulation of requirements is to have them always be
732 visible. Signature visibility could be controlled by associating
733 every requirement with a flag indicating if it is importable or
734 not: a signature declaration sets a requirement to be visible, and
735 an explicit export list can specify if a requirement is to be visible
736 or not.
737
738 When an export list is absent, we have to pick a default visibility
739 for a signature. If we use the same behavior as with modules,
740 a strange situation can occur:
741
742 \begin{verbatim}
743 unit p where -- S is visible
744 signature S where
745 x :: True
746
747 unit q where -- use defaulting
748 include p
749 signature S where
750 y :: True
751 module M where
752 import S
753 z = x && y -- OK
754
755 unit r where
756 include q
757 module N where
758 import S
759 z = y -- OK
760 z = x -- ???
761 \end{verbatim}
762 %
763 Absent the second signature declaration in \verb|q|, \verb|S.x| clearly
764 should not be visible in \verb|N|. However, what ought to occur when this signature
765 declaration is added? One interpretation is to say that only some
766 (but not all) declarations are provided (\verb|S.x| remains invisible);
767 another interpretation is that adding \verb|S| is enough to treat
768 the signature as ``in-line'', and all declarations are now provided
769 (\verb|S.x| is visible).
770
771 The latter interpretation avoids having to keep track of providedness
772 per declarations, and means that you can always express defaulting
773 behavior by writing an explicit provides declaration on the unit.
774 However, it has the odd behavior of making empty signatures semantically
775 meaningful:
776
777 \begin{verbatim}
778 unit q where
779 include p
780 signature S where
781 \end{verbatim}
782 \end{aside}
783 %
784 % SPJ: This would be too complicated (if there's yet a third way)
785
786 \clearpage
787 \newpage
788
789 \subsection{Merging AvailInfos}
790
791 We describe how to take two sets of \I{AvailInfo}s and merges them
792 into one set. In the degenerate case where every \I{AvailInfo} is a
793 $Name$, this algorithm operates the same as the original algorithm.
794 Merging proceeds in two steps: unification and then simple union.
795
796 Unification proceeds as follows: for each pair of \I{Name}s with
797 matching \I{OccName}s, unify the names. For each pair of $\I{Name}\, \verb|{|\,
798 \I{Name}_0\verb|,|\, \ldots\verb|,|\, \I{Name}_n\, \verb|}|$, where there
799 exists some pair of child names with matching \I{OccName}s, unify the
800 parent \I{Name}s. (A single \I{AvailInfo} may participate in multiple such
801 pairs.) A simple identifier and a type constructor \I{AvailInfo} with
802 overlapping in-scope names fails to unify. After unification,
803 the simple union combines entries with matching \verb|availName|s (parent
804 name in the case of a type constructor), recursively unioning the child
805 names of type constructor \I{AvailInfo}s.
806
807 Unification of \I{Name}s results in a substitution, and a \I{Name} substitution
808 on \I{AvailInfo} is a little unconventional. Specifically, substitution on $\I{Name}\, \verb|{|\,
809 \I{Name}_0\verb|,|\, \ldots\verb|,|\, \I{Name}_n\, \verb|}|$ proceeds specially:
810 a substitution from \I{Name} to $\I{Name}'$ induces a substitution from
811 \I{Module} to $Module'$ (as the \I{OccName}s of the \I{Name}s are guaranteed
812 to be equal), so for each child $\I{Name}_i$, perform the \I{Module}
813 substitution. So for example, the substitution \verb|HOLE:A.T| to \verb|THIS:A.T|
814 takes the \I{AvailInfo} \verb|HOLE:A.T { HOLE:A.B, HOLE:A.foo }| to
815 \verb|THIS:A.T { THIS:A.B, THIS:A.foo }|. In particular, substitution
816 on children \I{Name}s is \emph{only} carried out by substituting on the outer name;
817 we will never directly substitute children.
818
819 Unfortunately, there are a number of tricky scenarios:
820
821 \paragraph{Merging when type constructors are not in scope}
822
823 \begin{verbatim}
824 signature A1(foo) where
825 data A = A { foo :: Int, bar :: Bool }
826
827 signature A2(bar) where
828 data A = A { foo :: Int, bar :: Bool }
829 \end{verbatim}
830 %
831 If we merge \verb|A1| and \verb|A2|, are we supposed to conclude that
832 the types \verb|A1.A| and \verb|A2.A| (not in scope!) are the same?
833 The answer is no! Consider these implementations:
834
835 \begin{verbatim}
836 module A1(A(..)) where
837 data A = A { foo :: Int, bar :: Bool }
838
839 module A2(A(..)) where
840 data A = A { foo :: Int, bar :: Bool }
841
842 module A(foo, bar) where
843 import A1(foo)
844 import A2(bar)
845 \end{verbatim}
846
847 Here, \verb|module A1| implements \verb|signature A1|, \verb|module A2| implements \verb|signature A2|,
848 and \verb|module A| implements \verb|signature A1| and \verb|signature A2| individually
849 and should certainly implement their merge. This is why we cannot simply
850 merge type constructors based on the \I{OccName} of their top-level type;
851 merging only occurs between in-scope identifiers.
852
853 \paragraph{Does merging a selector merge the type constructor?}
854
855 \begin{verbatim}
856 signature A1(A(..)) where
857 data A = A { foo :: Int, bar :: Bool }
858
859 signature A2(A(..)) where
860 data A = A { foo :: Int, bar :: Bool }
861
862 signature A2(foo) where
863 import A1(foo)
864 \end{verbatim}
865 %
866 Does the last signature, which is written in the style of a sharing constraint on \verb|foo|,
867 also cause \verb|bar| and the type and constructor \verb|A| to be unified?
868 Because a merge of a child name results in a substitution on the parent name,
869 the answer is yes.
870
871 \paragraph{Incomplete data declarations}
872
873 \begin{verbatim}
874 signature A1(A(foo)) where
875 data A = A { foo :: Int }
876
877 signature A2(A(bar)) where
878 data A = A { bar :: Bool }
879 \end{verbatim}
880 %
881 Should \verb|A1| and \verb|A2| merge? If yes, this would imply
882 that data definitions in signatures could only be \emph{partial}
883 specifications of their true data types. This seems complicated,
884 which suggests this should not be supported; however, in fact,
885 this sort of definition, while disallowed during type checking,
886 should be \emph{allowed} during shaping. The reason that the
887 shape we abscribe to the signatures \verb|A1| and \verb|A2| are
888 equivalent to the shapes for these which should merge:
889
890 \begin{verbatim}
891 signature A1(A(foo)) where
892 data A = A { foo :: Int, bar :: Bool }
893
894 signature A2(A(bar)) where
895 data A = A { foo :: Int, bar :: Bool }
896 \end{verbatim}
897
898 \subsection{Subtyping record selectors as functions}
899
900 \begin{verbatim}
901 signature H(A, foo) where
902 data A
903 foo :: A -> Int
904
905 module M(A, foo) where
906 data A = A { foo :: Int, bar :: Bool }
907 \end{verbatim}
908 %
909 Does \verb|M| successfully fill \verb|H|? If so, it means that anywhere
910 a signature requests a function \verb|foo|, we can instead validly
911 provide a record selector. This capability seems quite attractive,
912 although in practice record selectors rarely seem to be abstracted this
913 way: one reason is that \verb|M.foo| still \emph{is} a record selector,
914 and can be used to modify a record. (Many library authors find this
915 suprising!)
916
917 Nor does this seem to be an insurmountable instance of the avoidance
918 problem:
919 as a workaround, \verb|H| can equivalently be written as:
920
921 \begin{verbatim}
922 signature H(foo) where
923 data A = A { foo :: Int, bar :: Bool }
924 \end{verbatim}
925 %
926 However, you might not like this, as the otherwise irrelevant \verb|bar| must be mentioned
927 in the definition.
928
929 In any case, actually implementing this `subtyping' is quite complicated, because we can no
930 longer assume that every child name is associated with a parent name.
931 The technical difficulty is that we now need to unify a plain identifier
932 \I{AvailInfo} (from the signature) with a type constructor \I{AvailInfo}
933 (from a module.) It is not clear what this should mean.
934 Consider this situation:
935
936 \begin{verbatim}
937 unit p where
938 signature H(A, foo, bar) where
939 data A
940 foo :: A -> Int
941 bar :: A -> Bool
942 module X(A, foo) where
943 import H
944 unit q where
945 include p
946 signature H(bar) where
947 data A = A { foo :: Int, bar :: Bool }
948 module Y where
949 import X(A(..)) -- ???
950 \end{verbatim}
951
952 Should the wildcard import on \verb|X| be allowed?
953 This question is equivalent to whether or not shaping discovers
954 whether or not a function is a record selector and propagates this
955 information elsewhere.
956 If the wildcard is not allowed, here is another situation:
957
958 \begin{verbatim}
959 unit p where
960 -- define without record selectors
961 signature X1(A, foo) where
962 data A
963 foo :: A -> Int
964 module M1(A, foo) where
965 import X1
966
967 unit q where
968 -- define with record selectors (X1s unify)
969 signature X1(A(..)) where
970 data A = A { foo :: Int, bar :: Bool }
971 signature X2(A(..)) where
972 data A = A { foo :: Int, bar :: Bool }
973
974 -- export some record selectors
975 signature Y1(bar) where
976 import X1
977 signature Y2(bar) where
978 import X2
979
980 unit r where
981 include p
982 include q
983
984 -- sharing constraint
985 signature Y2(bar) where
986 import Y1(bar)
987
988 -- the payload
989 module Test where
990 import M1(foo)
991 import X2(foo)
992 ... foo ... -- conflict?
993 \end{verbatim}
994
995 Without the sharing constraint, the \verb|foo|s from \verb|M1| and \verb|X2|
996 should conflict. With it, however, we should conclude that the \verb|foo|s
997 are the same, even though the \verb|foo| from \verb|M1| is \emph{not}
998 considered a child of \verb|A|, and even though in the sharing constraint
999 we \emph{only} unified \verb|bar| (and its parent \verb|A|). To know that
1000 \verb|foo| from \verb|M1| should also be unified, we have to know a bit
1001 more about \verb|A| when the sharing constraint performs unification;
1002 however, the \I{AvailInfo} will only tell us about what is in-scope, which
1003 is \emph{not} enough information.
1004
1005 %\newpage
1006
1007 \section{Type checking}
1008
1009 \begin{figure}[htpb]
1010 $$
1011 \begin{array}{rcll}
1012 \I{PkgType} & ::= & \I{ModIface}_0 \verb|;|\, \ldots\verb|;|\, \I{ModIface}_n \\[1em]
1013 \multicolumn{3}{l}{\mbox{\bf Module interface}} \\
1014 \I{ModIface} & ::= & \verb|module| \; \I{Module} \; \verb|(| \I{mi\_exports} \verb|)| \; \verb|where| \\
1015 & & \qquad \I{mi\_decls} \\
1016 & & \qquad \I{mi\_insts} \\
1017 & & \qquad \I{dep\_orphs} \\
1018 \I{mi\_exports} & ::= & \I{AvailInfo}_0 \verb|,|\, \ldots \verb|,|\, \I{AvailInfo}_n & \mbox{Export list} \\
1019 \I{mi\_decls} & ::= & \I{IfaceDecl}_0 \verb|;|\, \ldots \verb|;|\, \I{IfaceDecl}_n & \mbox{Defined declarations} \\
1020 \I{mi\_insts} & ::= & \I{IfaceClsInst}_0 \verb|;|\, \ldots \verb|;|\, \I{IfaceClsInst}_n & \mbox{Defined instances} \\
1021 \I{dep\_orphs} & ::= & \I{Module}_0 \verb|;|\, \ldots \verb|;|\, \I{Module}_n & \mbox{Transitive orphan dependencies} \\[1em]
1022 \multicolumn{3}{l}{\mbox{\bf Interface declarations}} \\
1023 \I{IfaceDecl} & ::= & \I{OccName} \; \verb|::| \; \I{IfaceId} \\
1024 & | & \verb|data| \; \I{OccName} \; \verb|=| \;\ \I{IfaceData} \\
1025 & | & \ldots \\
1026 \I{IfaceClsInst} & & \mbox{A type-class instance} \\
1027 \I{IfaceId} & & \mbox{Interface of top-level binder} \\
1028 \I{IfaceData} & & \mbox{Interface of type constructor} \\
1029 \end{array}
1030 $$
1031 \caption{Module interfaces in GHC} \label{fig:typecheck}
1032 \end{figure}
1033
1034 In general terms,
1035 type checking an indefinite unit (a unit with holes) involves
1036 calculating, for every module, a \I{ModIface} representing the
1037 type/interface of the module in question (which is serialized
1038 to disk). The general form of these
1039 interface files are described in Figure~\ref{fig:typecheck}; notably,
1040 the interfaces \I{IfaceId}, \I{IfaceData}, etc. contain \I{Name} references,
1041 which must be resolved by
1042 looking up a \I{ModIface} corresponding to the \I{Module} associated
1043 with the \I{Name}. (We will say more about this lookup process shortly.)
1044 For example, given:
1045
1046 \begin{verbatim}
1047 unit p where
1048 signature H where
1049 data T
1050 module A(S, T) where
1051 import H
1052 data S = S T
1053 \end{verbatim}
1054 %
1055 the \I{PkgType} is:
1056
1057 \begin{verbatim}
1058 module HOLE:H (HOLE:H.T) where
1059 data T -- abstract type constructor
1060 module THIS:A (THIS:A.S, HOLE:H.T) where
1061 data S = S HOLE:H.T
1062 -- where THIS = p(H -> HOLE:H)
1063 \end{verbatim}
1064 %
1065 However, while it is true that the \I{ModIface} is the final result
1066 of type checking, we actually are conflating two distinct concepts: the user-visible
1067 notion of a \I{ModuleName}, which, when imported, brings some \I{Name}s
1068 into scope (or could trigger a deprecation warning, or pull in some
1069 orphan instances\ldots), versus the actual declarations, which, while recorded
1070 in the \I{ModIface}, have an independent existence: even if a declaration
1071 is not visible for an import, we may internally refer to its \I{Name}, and
1072 need to look it up to find out type information. (A simple case when
1073 this can occur is if a module exports a function with type \verb|T -> T|,
1074 but doesn't export \verb|T|).
1075
1076 \begin{figure}[htpb]
1077 $$
1078 \begin{array}{rcll}
1079 \I{ModDetails} & ::= & \langle\I{md\_types} \verb|;|\; \I{md\_insts}\rangle \\
1080 \I{md\_types} & ::= & \I{TyThing}_0 \verb|,|\, \ldots\verb|,|\, \I{TyThing}_n \\
1081 \I{md\_insts} & ::= & \I{ClsInst}_0 \verb|,|\, \ldots\verb|,|\, \I{ClsInst}_n \\[1em]
1082 \multicolumn{3}{l}{\mbox{\bf Type-checked declarations}} \\
1083 \I{TyThing} & & \mbox{Type-checked thing with a \I{Name}} \\
1084 \I{ClsInst} & & \mbox{Type-checked type class instance} \\
1085 \end{array}
1086 $$
1087 \caption{Semantic objects in GHC} \label{fig:typecheck-more}
1088 \end{figure}
1089
1090 Thus, a \I{ModIface} can be type-checked into a \I{ModDetails}, described in
1091 Figure~\ref{fig:typecheck-more}. Notice that a \I{ModDetails} is just
1092 a bag of type-checkable entities which GHC knows about. We
1093 define the \emph{external package state (EPT)} to
1094 simply be the union of the \I{ModDetails}
1095 of all external modules.
1096
1097 Type checking is a delicate balancing act between module
1098 interfaces and our semantic objects. A \I{ModIface} may get
1099 type-checked multiple times with different hole instantiations
1100 to provide multiple \I{ModDetails}.
1101 Furthermore complicating matters
1102 is that GHC does this resolution \emph{lazily}: a \I{ModIface}
1103 is only converted to a \I{ModDetails} when we are looking up
1104 the type of a \I{Name} that is described by the interface;
1105 thus, unlike usual theoretical treatments of type checking, we can't
1106 eagerly go ahead and perform substitutions on \I{ModIface}s when
1107 they get included.
1108
1109 In a separate compiler like GHC, there are two primary functions we must provide:
1110
1111 \paragraph{\textit{ModuleName} to \textit{ModIface}} Given a \I{ModuleName} which
1112 was explicitly imported by a user, we must produce a \I{ModIface}
1113 that, among other things, specifies what \I{Name}s are brought
1114 into scope. This is used by the renamer to resolve plain references
1115 to identifiers to real \I{Name}s. (By the way, if shaping produced
1116 renamed trees, it would not be necessary to do this step!)
1117
1118 \paragraph{\textit{Module} to \textit{ModDetails}/EPT} Given a \I{Module} which may be
1119 a part of a \I{Name}, we must be able to type check it into
1120 a \I{ModDetails} (usually by reading and typechecking the \I{ModIface}
1121 associated with the \I{Module}, but this process is involved). This
1122 is used by the type checker to find out type information on things. \\
1123
1124 There are two points in the type checker where these capabilities are exercised:
1125
1126 \paragraph{Source-level imports} When a user explicitly imports a
1127 module, the \textit{ModuleName} is mapped to a \textit{ModIface}
1128 to find out what exports are brought into scope (\I{mi\_exports})
1129 and what orphan instances must be loaded (\I{dep\_orphs}). Additionally,
1130 the \textit{Module} is loaded to the EPT to bring instances from
1131 the module into scope.
1132
1133 \paragraph{Internal name lookup} During type checking, we may have
1134 a \I{Name} for which we need type information (\I{TyThing}). If it's not already in the
1135 EPT, we type check and load
1136 into the EPT the \I{ModDetails} of the \I{Module} in the \I{Name},
1137 and then check the EPT again. (\verb|importDecl|)
1138
1139 \subsection{\textit{ModName} to \textit{ModIface}}
1140
1141 In all cases, the \I{mi\_exports} can be calculated directly from the
1142 shaping process, which specifies exactly for each \I{ModName} in scope
1143 what will be brought into scope.
1144
1145 \paragraph{Modules} Modules are straightforward, as for any
1146 \I{Module} there is only one possibly \I{ModIface} associated
1147 with it (the \I{ModIface} for when we type-checked the (unique) \verb|module|
1148 declaration.)
1149
1150 \paragraph{Signatures} For signatures, there may be multiple \I{ModIface}s
1151 associated with a \I{ModName} in scope, e.g. in this situation:
1152
1153 \begin{verbatim}
1154 unit p where
1155 signature S where
1156 data A
1157 unit q where
1158 include p
1159 signature S where
1160 data B
1161 module M where
1162 import S
1163 \end{verbatim}
1164 %
1165 Each literal \verb|signature| has a \I{ModIface} associated with it; and
1166 the import of \verb|S| in \verb|M|, we want to see the \emph{merged}
1167 \I{ModIface}s. We can determine the \I{mi\_exports} from the shape,
1168 but we also need to pull in orphan instances for each signature, and
1169 produce a warning for each deprecated signature.
1170
1171 \begin{aside}
1172 \textbf{Does hiding a signature hide its orphans.} Suppose that we have
1173 extended Backpack to allow hiding signatures from import.
1174
1175 \begin{verbatim}
1176 unit p requires (H) where -- H is hidden from import
1177 module A where
1178 instance Eq (a -> b) where -- orphan
1179 signature H {-# DEPRECATED "Don't use me" #-} where
1180 import A
1181
1182 unit q where
1183 include p
1184 signature H where
1185 data T
1186 module M where
1187 import H -- warn deprecated?
1188 instance Eq (a -> b) -- overlap?
1189 \end{verbatim}
1190
1191 It is probably the most consistent to not pull in orphan instances
1192 and not give the deprecated warning: this corresponds to merging
1193 visible \I{ModIface}s, and ignoring invisible ones.
1194 \end{aside}
1195
1196 \subsection{\textit{Module} to \textit{ModDetails}}
1197
1198 \paragraph{Modules} For modules, we have a \I{Module} of
1199 the form $\I{p}\verb|(|m\; \verb|->|\; \I{Module}\verb|,|\, \ldots\verb|)|$,
1200 and we also have a unique \I{ModIface}, where each hole instantiation
1201 is $\verb|HOLE:|m$.
1202
1203 To generate the \I{ModDetails} associated with the specific instantiation,
1204 we have to type-check the \I{ModIface} with the following adjustments:
1205
1206 \begin{enumerate}
1207 \item Perform a \I{Module} substitution according to the instantiation
1208 of the \I{ModIface}'s \I{Module}. (NB: we \emph{do}
1209 substitute \verb|HOLE:A.x| to \verb|HOLE:B.x| if we instantiated
1210 \verb|A -> HOLE:B|, \emph{unlike} the disjoint
1211 substitutions applied by shaping.)
1212 \item Perform a \I{Name} substitution as follows: for any name
1213 with a unit key that is a $\verb|HOLE|$,
1214 substitute with the recorded \I{Name} in the requirements of the shape.
1215 Otherwise, look up the (unique) \I{ModIface} for the \I{Module},
1216 and subsitute with the corresponding \I{Name} in the \I{mi\_exports}.
1217 \end{enumerate}
1218
1219 \paragraph{Signatures} For signatures, we have a \I{Module} of the form
1220 $\verb|HOLE:|m$. Unlike modules, there are multiple \I{ModIface}s associated with a hole.
1221 We distinguish each separate \I{ModIface} by considering the full \I{UnitKey}
1222 it was defined in, e.g. \verb|p(A -> HOLE:C, B -> q():B)|; call this
1223 the hole's \emph{defining unit key}; the set of \I{ModIface}s for a hole
1224 and their defining unit keys can easily be calculated during shaping.
1225
1226 To generate the \I{ModDetails} associated with a hole, we type-check each
1227 \I{ModIface}, with the following adjustments:
1228
1229 \begin{enumerate}
1230 \item Perform a \I{Module} substitution according to the instantiation
1231 of the defining unit key. (NB: This may rename the hole itself!)
1232 \item Perform a \I{Name} substitution as follows, in the same manner
1233 as would be done in the case of modules.
1234 \item When these \I{ModDetails} are merged into the EPT, some merging
1235 of duplicate types may occur; a type
1236 may be defined multiple times, in which case we check that each
1237 definition is compatible with the previous ones. A concrete
1238 type is always compatible with an abstract type.
1239 \end{enumerate}
1240
1241 \paragraph{Invariants} When we perform \I{Name} substitutions, we must be
1242 sure that we can always find out the correct \I{Name} to substitute to.
1243 This isn't obviously true, consider:
1244
1245 \begin{verbatim}
1246 unit p where
1247 signature S(foo) where
1248 data T
1249 foo :: T
1250 module M(bar) where
1251 import S
1252 bar = foo
1253 unit q where
1254 module A(T(..)) where
1255 data T = T
1256 foo = T
1257 module S(foo) where
1258 import A
1259 include p
1260 module A where
1261 import M
1262 ... bar ...
1263 \end{verbatim}
1264 %
1265 When we type check \verb|p|, we get the \I{ModIface}s:
1266
1267 \begin{verbatim}
1268 module HOLE:S(HOLE:S.foo) where
1269 data T
1270 foo :: HOLE:S.T
1271 module THIS:M(THIS:M.bar) where
1272 bar :: HOLE:S.T
1273 \end{verbatim}
1274 %
1275 Now, when we type check \verb|A|, we pull on the \I{Name} \verb|p(S -> q():S):M.bar|,
1276 which means we have to type check the \I{ModIface} for \verb|p(S -> q():S):M|.
1277 The un-substituted type of \verb|bar| has a reference to \verb|HOLE:S.T|;
1278 this should be substituted to \verb|q():S.T|. But how do we discover this?
1279 We know that \verb|HOLE:S| was instantiated to \verb|q():S|, so we might try
1280 and look for \verb|q():S.T|. However, this \I{Name} does not exist because
1281 the \verb|module S| reexports the selector from \verb|A|! Nor can we consult
1282 the (unique) \I{ModIface} for the module, as it doesn't reexport the relevant
1283 type.
1284
1285 The conclusion, then, is that a module written this way should be disallowed.
1286 Specifically, the correctness condition for a signature is this: \emph{Any \I{Name}
1287 mentioned in the \I{ModIface} of a signature must either be from an external module, or be
1288 exported by the signature}.
1289
1290 \begin{aside}
1291 \textbf{Special case export rule for record selectors.} Here is the analogous case for
1292 record selectors:
1293 \begin{verbatim}
1294 unit p where
1295 signature S(foo) where
1296 data T = T { foo :: Int }
1297 module M(bar) where
1298 import S
1299 bar = foo
1300 unit q where
1301 module A(T(..)) where
1302 data T = T { foo :: Int }
1303 module S(foo) where
1304 import A
1305 include p
1306 module A where
1307 import M
1308 ... bar ...
1309 \end{verbatim}
1310
1311 We could reject this, but technically we can find the right substitution
1312 for \verb|T|, because the export of \verb|foo| is an \I{AvailTC} which
1313 does mention \verb|T|.
1314 \end{aside}
1315
1316 \end{document} % chktex 16