Big batch of Backpack documentation edits.
[ghc.git] / docs / backpack / algorithm.tex
1 \documentclass{article}
2
3 \usepackage{pifont}
4 \usepackage{graphicx} %[pdftex] OR [dvips]
5 \usepackage{fullpage}
6 \usepackage{wrapfig}
7 \usepackage{float}
8 \usepackage{titling}
9 \usepackage{hyperref}
10 \usepackage{tikz}
11 \usepackage{color}
12 \usepackage{footnote}
13 \usepackage{float}
14 \usepackage{algorithm}
15 \usepackage{algpseudocode}
16 \usepackage{bigfoot}
17 \usepackage{amssymb}
18 \usepackage{amsmath}
19 \usepackage{framed}
20
21 % Alter some LaTeX defaults for better treatment of figures:
22 % See p.105 of "TeX Unbound" for suggested values.
23 % See pp. 199-200 of Lamport's "LaTeX" book for details.
24 % General parameters, for ALL pages:
25 \renewcommand{\topfraction}{0.9} % max fraction of floats at top
26 \renewcommand{\bottomfraction}{0.8} % max fraction of floats at bottom
27 % Parameters for TEXT pages (not float pages):
28 \setcounter{topnumber}{2}
29 \setcounter{bottomnumber}{2}
30 \setcounter{totalnumber}{4} % 2 may work better
31 \setcounter{dbltopnumber}{2} % for 2-column pages
32 \renewcommand{\dbltopfraction}{0.9} % fit big float above 2-col. text
33 \renewcommand{\textfraction}{0.07} % allow minimal text w. figs
34 % Parameters for FLOAT pages (not text pages):
35 \renewcommand{\floatpagefraction}{0.7} % require fuller float pages
36 % N.B.: floatpagefraction MUST be less than topfraction !!
37 \renewcommand{\dblfloatpagefraction}{0.7} % require fuller float pages
38
39 % remember to use [htp] or [htpb] for placement
40
41 \newcommand{\I}[1]{\ensuremath{\mathit{#1}}}
42
43 \newcommand{\optionrule}{\noindent\rule{1.0\textwidth}{0.75pt}}
44
45 \newenvironment{aside}
46 {\begin{figure}\def\FrameCommand{\hspace{2em}}
47 \MakeFramed{\advance\hsize-\width}\optionrule\small}
48 {\par\vskip-\smallskipamount\optionrule\endMakeFramed\end{figure}}
49
50 \setlength{\droptitle}{-6em}
51
52 \newcommand{\Red}[1]{{\color{red} #1}}
53
54 \title{The Backpack algorithm}
55
56 \begin{document}
57
58 \maketitle
59
60 \noindent
61 In this document, we look at the compilation of a Backpack unit.
62 Here are the steps a unit goes through:
63
64 \begin{enumerate}
65 \item The \textbf{unit renamer} takes the Backpack file consisting
66 of units transforms each unit name to an indefinite unit ID \I{IndefUnitId}.
67 In particular, it associates each unit name with its local binding
68 site (unit declaration), or an external unit declaration from
69 the indefinite unit database.
70
71 \item The \textbf{dependency solver} takes a unit
72 and converts it into a directed acyclic graph representing the
73 dependency structure of the declarations in a unit.
74 It also computes the \I{Module} of each module/signature
75 declaration, the \I{UnitKey} of each include, and the overall
76 requirements of the unit.
77
78 \item The \textbf{shaping pass} takes the DAG
79 and computes its \I{Shape}. A \I{Shape} describes precisely what
80 a unit provides and requires at the Haskell declaration level (\I{AvailInfo}).
81
82 \item The \textbf{indefinite type checker} takes the DAG and the shape
83 and typechecks each module and signature against the indefinite unit
84 database. The type checking results are saved in the indefinite
85 unit database under the \I{IndefUnitId}.
86
87 \item The \textbf{definite type checker and compiler} takes the DAG as well as a
88 \emph{hole mapping} specifying how each requirement in the unit
89 is filled, and type-checks and compiles the unit against the
90 installed unit database. The results are saved to the installed unit database
91 under the \I{InstalledUnitId}.
92 \end{enumerate}
93
94 \section{Front-end syntax}
95
96 \begin{figure}[htpb]
97 $$
98 \begin{array}{rcll}
99 p,q,r && \mbox{Unit names} \\
100 m,n && \mbox{Module names} \\[1em]
101 \multicolumn{3}{l}{\mbox{\bf Units}} \\
102 \I{unit} & ::= & \verb|unit|\; p\; [\I{provreq}]\; \verb|where {| d_1 \verb|;| \ldots \verb|;| d_n \verb|}| \\[1em]
103 \multicolumn{3}{l}{\mbox{\bf Declarations}} \\
104 d & ::= & \verb|module|\; m \; [exports]\; \verb|where|\; \I{body} \\
105 & | & \verb|signature|\; m \; [exports]\; \verb|where|\; \I{body} \\
106 & | & \verb|include|\; p \; [provreq] \\[1em]
107 \multicolumn{3}{l}{\mbox{\bf Provides/requires specification}} \\
108 \I{provreq} & ::= & \verb|(| \, \I{rns} \, \verb|)| \;
109 [ \verb|requires(|\, \I{rns} \, \verb|)| ] \\
110 \I{rns} & ::= & \I{rn}_0 \verb|,| \, \ldots \verb|,| \, \I{rn}_n [\verb|,|] & \mbox{Renamings} \\
111 \I{rn} & ::= & m\; \verb|as| \; n & \mbox{Renaming} \\[1em]
112 \multicolumn{3}{l}{\mbox{\bf Haskell code}} \\
113 \I{exports} & & \mbox{A Haskell module export list} \\
114 \I{body} & & \mbox{A Haskell module body} \\
115 \end{array}
116 $$
117 \caption{Syntax of Backpack} \label{fig:syntax}
118 \end{figure}
119
120 A (slightly simplified) syntax of Backpack is given in Figure~\ref{fig:syntax}.
121
122 \newpage
123 \section{Unit databases and the unit renamer}
124
125 \begin{figure}[htpb]
126 $$
127 \begin{array}{rcll}
128 \multicolumn{3}{l}{\mbox{\bf Identifiers}} \\
129 \I{InstalledPackageId} & & \mbox{Opaque unique identifier} \\
130 \I{IndefUnitId} & ::= & \I{InstalledPackageId}\, \verb|-|\, p \\
131 \I{UnitKey} & ::= & \I{IndefUnitId} \verb|(|\, \I{HoleMap}\, \verb|)| \\
132 & | & \verb|HOLE| \\
133 \I{HoleMap} & ::= & \I{ModuleName}\; \verb|->|\; \I{Module}\; \verb|,|\, \ldots \\
134 \I{Module} & ::= & \I{UnitKey} \verb|:| \I{ModuleName} \\
135 \I{InstalledUnitId} & ::= & \I{UnitKey} \quad \mbox{(with no occurrence of \texttt{HOLE})} \\[1em]
136 \multicolumn{3}{l}{\mbox{\bf Unit databases}} \\
137 \I{IndefUnitDb} & ::= & \I{IndefUnitId} \; \verb|->| \; \I{IndefUnitRecord} \verb|,|\, \ldots \\
138 \I{InstalledUnitDb} & ::= & \I{InstalledUnitId} \; \verb|->| \; \I{InstalledUnitRecord} \verb|,|\, \ldots \\
139 \end{array}
140 $$
141 \caption{Unit identification} \label{fig:ids}
142 \end{figure}
143
144 \subsection{The unit databases}
145
146 To install a package so that it is available when other packages are compiled,
147 we must record it in some sort of database (which the compiler will query later).
148 The obvious design for such a database is for it to record \emph{installed packages},
149 each of which has a collection of object files and interfaces from a compilation.
150 However, with Backpack, we have to refine this picture in two ways:
151
152 \begin{enumerate}
153 \item A package is always a unit of distribution: something that has
154 single authorship and is uploaded to Hackage. It would seriously hamper
155 modular programming in the small, however, if you always had to create
156 a new package to abstract over some requirements. Thus, while packages
157 continue to be a unit of distribution, the unit of modularity that the
158 compiler deals with is a unit; a package can contain multiple units.
159 Consequently, the database is a database of \emph{units}, not packages.
160 \item In Backpack, it may not be possible to \emph{compile} a unit
161 at install time: it may depend on some (as yet) unspecified holes.
162 Thus, we must maintain two databases: the \emph{indefinite unit
163 database}, recording type-checked but uncompiled units, and the
164 \emph{installed unit database}, recording compiled units. unit
165 database.
166 \end{enumerate}
167 %
168 Thus, there are four closely related types of identifier to be aware of:
169
170 \begin{description} \item[Installed package IDs] Although the unit
171 databases record compile products per unit, it is still useful to be
172 able to assign an identifier to an installed package as a whole. Thus,
173 the \I{InstalledPackageId} is a hash of the source code, as well as
174 build info for the package, such as the Cabal and GHC flags, the
175 resolution of textual dependencies, etc. The IPID is opaque to GHC
176 and selected by Cabal, which operates at the level of packages (rather
177 than units).
178
179 \item[Indefinite unit IDs] A package contains multiple units; so if an
180 installed package ID represents a package with all of its dependencies
181 resolved, an indefinite unit ID simply singles out a specific unit in
182 this installed package. Indefinite unit IDs identify entries in the
183 \textbf{indefinite unit database}, which contain the results of
184 type-checking a unit, but no actual object code. Instead, it has enough
185 information so that the indefinite unit can be built into actual code
186 when a requirement is filled (which means it includes source code and
187 build flags as well.)
188
189 \item[Unit keys] A unit key is an indefinite unit ID, but augmented
190 with a \I{HoleMap}, which says how the requirements of the unit are
191 instantiated. Unit keys are assigned when typecheckign both indefinite
192 definite units: a fictitious unit \verb|HOLE| is specified to define
193 all unfilled requirements in the unit. Units instantiated with holes
194 are never installed; they are strictly for type-checking (although
195 we could generate code for them, which could be linked if the
196 \verb|HOLE| symbols are rewritten to their true destinations).
197
198 \item[Installed unit IDs] An installed unit ID is a unit key which has
199 no \verb|HOLE|s; it identifies a unit that can be compiled. The
200 \textbf{installed unit database} caches the compilation results of these
201 units, so that if a unit is compiled multiple times with the same
202 instantiation, this code can be reused. (This database most closely
203 resembles the existing installed package database with GHC today.)
204 \end{description}
205
206 \subsection{The unit renamer}
207
208 The unit renamer is responsible for transforming unit names in a
209 Backpack file into \I{IndefUnitId}s, so that they can be uniquely
210 identified in the indefinite unit database.
211 Given the \I{InstalledPackageId} of the package we are compiling (\I{ThisInstalledPackageId})
212 and a mapping from $p$ to \I{IndefUnitId} (\I{UnitNameMap}), we rename
213 as follows:
214
215 \begin{itemize}
216 \item Every unit declaration $\verb|unit|\; p$ is renamed to $\I{ThisInstalledPackageId}\, \verb|-|\, p$
217 \item Every unit include $\verb|include|\; p$ is renamed to $\I{ThisInstalledPackageId}\, \verb|-|\, p$ if $p$ was declared in the Backpack file; otherwise it is renamed according to \I{UnitNameMap}.
218 \end{itemize}
219
220 The \I{UnitNameMap} is entirely user specified, so there is a great deal
221 of flexibility on how it can be created, but the convention we expect to
222 be used by Cabal is that a unit name $p$ corresponds to the same-named
223 unit in a \emph{package} named $p$. Packages that don't use Backpack
224 only have on unit, which has the same name as package.
225
226 \paragraph{Notational conventions}
227 In the rest of this document, when it is unambiguous, we will use $p$, $q$ and $r$
228 interchangeably with \I{IndefUnitId}, as after unit renaming, there
229 are no occurrences of unit names.
230
231 \newpage
232 \section{Dependency solver}
233
234 \begin{figure}[htpb]
235 $$
236 \begin{array}{rcll}
237 \tilde{d} & ::= & \verb|module|\; Module \; [exports]\; \verb|where|\; \I{body} \\
238 & | & \verb|signature|\; \I{Module} \; [exports]\; \verb|where|\; \I{body} \\
239 & | & \verb|merge|\; \I{Module} \\
240 & | & \verb|include|\; \I{UnitKey} \\
241 \I{IndefUnitRecord}^{\mathsf{dep}} & ::= & \verb|provides:|\; m\; \verb|->|\; \I{Module}\verb|,|\, \ldots\\
242 & & \verb|requires:|\; m\verb|,|\, \ldots
243 \end{array}
244 $$
245 \caption{Resolved declarations} \label{fig:resolved}
246 \end{figure}
247
248 The dependency solver computes the unfilled requirements of a unit, a
249 dependency DAG on the modules/signatures in the unit, and a dependency
250 DAG on the includes in the unit. We assume every referenced $p$ in the
251 unit must be recorded in the indefinite unit database, such that we can
252 look up $\I{IndefUnitRecord}^{\mathsf{dep}}$.
253
254 \paragraph{Computing unfilled requirements} The unfilled requirements are $R-P$, where $R$ and $P$ are sets of module names computed from the declarations in the following manner:
255
256 \begin{itemize}
257 \item $\verb|include|\; p$: union the (domain of the) provisions with $P$ and the requirements with $R$.
258 \item $\verb|module|\; m$: add $m$ to $P$.
259 \item $\verb|signature|\; m$: add $m$ to $R$.
260 \end{itemize}
261
262 \paragraph{Declaration dependency graph}
263 We define a graph where the nodes are as described in Figure~\ref{fig:resolved}:
264 there is a node per
265 for each module and signature, as well as an extra ``merge'' node for
266 every unfilled requirement, merges the interfaces of a local signature and
267 any requirements brought in from includes.
268 %
269 Each node is identified by the tuple $\left(\I{Module}, \I{IsSource?}\right)$, where
270 the \I{Module} of a declaration $m$ in unit $p$ is \verb|p(H):m|, where $H$ maps
271 each unfilled requirement $n$ to \verb|HOLE:n|, and \I{IsSource?} is true only for signatures.
272
273 The edges of the directed graph signify a ``depends on'' relation, and are
274 defined as follows:
275
276 \begin{itemize}
277 \item A module/signature $m$ depends on a module/signature merge $n$ if $m$ imports $n$.
278 \item A module/signature $m$ depends on a signature $n$ if $m$ \verb|{-# SOURCE #-}| imports $n$.
279 \item A module/signature merge $m$ depends on a local signature $m$ (if it exists).
280 \end{itemize}
281 %
282 If the resulting graph has a cycle, this is an error.
283
284 \paragraph{Include dependency graph} We define an dependency graph
285 between includes, where an $\verb|include|\; p$ depends on
286 $\verb|include|\; q$ if, for some module name $m$, $p$ requires $m$ and
287 $q$ provides $m$. If there is a cyclic, then there is cross-unit
288 mutual recursion: for now, this is an error.
289
290 Assuming an acyclic graph, we can compute the \I{UnitKey} of each
291 key as follows. Initialize $\Gamma$, a substitution from holes to \I{Module},
292 to the identity substitution. For each $\verb|include|\; p$ in topological
293 order, define its \I{UnitKey} to be $p$ with the mapping $\Gamma$ with its
294 domain restricted to the requirements of $p$. Then, for each provision
295 $m\; \verb|->|\; \I{Module}$, update $\Gamma$ so that
296 $\Gamma(m) = \operatorname{subst} (\Gamma, \I{Module})$
297 (where $\operatorname{subst}$ recursively applies the substitution $\Gamma$ in \I{Module}).
298
299 During compilation, the include dependency graph is useful for
300 determining a compilation order for included units.
301
302 \newpage
303 \section{Shaping pass}
304
305 \begin{figure}[htpb]
306 $$
307 \begin{array}{rcll}
308 \I{Shape} & ::= & \verb|provides:|\; m \; \verb|->|\; \I{Module}\; \I{ModShape} \verb|;|\; \ldots \\
309 & & \verb|requires:| \; m \; \verb|->|\; \textcolor{white}{\I{Module}}\; \I{ModShape} \verb|;|\; \ldots \\
310 \I{ModShape} & ::= & \I{AvailInfo}_0 \verb|,|\, \ldots \verb|,|\, \I{AvailInfo}_n \\
311 \I{AvailInfo} & ::= & \I{Name} & \mbox{Plain identifiers} \\
312 & | & \I{Name} \, \verb|{| \, \I{Name}_0\verb|,| \, \ldots\verb|,| \, \I{Name}_n \, \verb|}| & \mbox{Type constructors} \\
313 \I{Name} & ::= & \I{Module} \verb|.| \I{OccName} \\
314 \I{OccName} & & \mbox{Unqualified name in a namespace} \\
315 \I{IndefUnitRecord}^{\mathsf{shape}} & ::= & \I{Shape}
316 \end{array}
317 $$
318 \caption{Shaping} \label{fig:shaping}
319 \end{figure}
320
321 The shaping pass computes the export \I{AvailInfo}s for each node in
322 the dependency graph; collectively, these form the \I{Shape} of
323 the unit described in Figure~\ref{fig:shaping}. Equivalently,
324 the \I{Shape} of unit specifies what a unit requires and provides
325 at the Haskell declaration level.
326
327 An \I{AvailInfo} names a Haskell declaration that may be exported.
328 It may be a plain identifier \I{Name}, or it may be a type constructor,
329 in which case it has children \I{Name}s representing the names of the
330 data constructors, record selectors, etc. This level of hierarchy
331 makes it possible to use ellipses in an import list, e.g. \verb|TyCon(..)|,
332 to selectively import just the logical children of a type constructor.
333 Children names have the invariant that they have the same \I{Module} as the parent name.
334 In a \I{ModShape}/export list,
335 the \I{OccName}s of the plain identifier \I{AvailInfo}s and the \emph{children}
336 of type constructor are unique
337 (although the top-level \I{Name}s may not have unique \I{OccName}s).
338
339 The compilation of every node is associated with a ``shape context'',
340 which represents the modules which are transitively depended upon. Let the
341 environment shape context is the merge of the shapes of all includes; to
342 shape a node:
343
344 \begin{enumerate}
345 \item Merge the environment shape contexts with the shape contexts
346 of all direct dependencies, resulting in the initial shape context.
347 \item Rename the module/signature according to the initial shape context,
348 getting a \I{ModShape}. Importantly, when renaming the signature \verb|M|,
349 any declarations defined in the signature are assigned a \I{Name}s with the \I{Module} \verb|HOLE:M| (rather than a \I{Module} based on the current unit $p$).
350 \item Merge this \I{ModShape} into the initial shape context
351 (modules go in provisions while signatures go in provisions), the
352 result defining the shape context of this node.
353 \end{enumerate}
354
355 We now elaborate on these steps in more detail.
356
357 \subsection{Shapes of includes}
358 Given an \verb|include p (X) requires (Y)|, we can look up the shape
359 for $p$ from the indefinite package database. However, an include can
360 also rename provisions and requires (where $X$, $Y$ are partial maps
361 from module name to module name), which requires transforms the shape
362 in the following way:
363
364 \begin{itemize}
365 \item For each original provision $m\; \verb|->|\; \ldots$, provide
366 $X (m)\; \verb|->|\; \ldots$ if $X (m)$ is defined.
367 \item For each original requirement $m\; \verb|->|\; \ldots$, require
368 $Y (m)\; \verb|->|\; \ldots$ if $Y (m)$ is defined, and $m$ if it is not.
369 (Non-mentioned requirements are always passed through).
370 \item For each requirement renaming from \verb|M| to \verb|N| in $Y$, substitute
371 all occurrences of \verb|HOLE:M| to \verb|HOLE:N| in the \I{ModShape}
372 of all provisions and requirements.
373 \end{itemize}
374
375 \subsection{Shape merging}
376 Before specifying the how to merge shapes algorithm, we must define some subprocedures
377 for unifying and merging lower-level entities such as \I{AvailInfo}s and \I{Name}s,
378 which produce \I{Name} substitution that are applied to shapes.
379
380 \begin{description}
381 \item[Unify two \textit{Name}s]
382 (produces a \I{Name} and a \I{Name} substitution) \\
383 Error if the names do not have matching \I{OccName}s. Error if neither name
384 is a hole name. Otherwise, without loss of generality let $m$ be the hole name
385 and $n$ the other name, return $n$ and the substitution of $m$ to $n$.
386 \item[Merge two sets of \textit{Name}s]
387 (produces a set of \I{Name}s and a \I{Name} substitution) \\
388 Let two \I{Name}s be related if they have the same \I{OccName}.
389 Union the two sets, unifying related names.
390 \item[Unify two \textit{AvailInfo}s]
391 (produces an \I{AvailInfo} and a \I{Name} substitution) \\
392 If both \I{AvailInfo}s are simply
393 a \I{Name}, unify the two \I{Name}s. If both \I{AvailInfo}s are
394 $\I{Name}\, \verb|{|\, \I{Name}_0\verb|,|\, \ldots\verb|,|\, \I{Name}_n\, \verb|}|$,
395 unify the top-level \I{Name}, apply the substitution to both \I{AvailInfo}s,
396 and return the unified \I{Name} with the union of the child names of the
397 substituted \I{AvailInfo}s.
398 Otherwise, error.
399 \item[Merge two sets of \textit{AvailInfo}s]
400 (produces a set of \I{AvailInfo}s and a \I{Name} substitution) \\
401 Let two \I{AvailInfo}s be related if they both are of the form
402 \I{Name} and have matching \I{OccName}s, or if they both are of
403 the form
404 $\I{Name}\, \verb|{|\, \I{Name}_0\verb|,|\, \ldots\verb|,|\, \I{Name}_n\, \verb|}|$
405 and there exists a child name in each which have matching \I{OccName}s.
406 Union the two sets, unifying related \I{AvailInfo}s.
407 \item[Apply a name substitution on an \textit{AvailInfo}] (produces an \I{AvailInfo}) \\
408 Substitute the top-level \I{Name}, which induces a substitution from
409 \I{Module} to $\I{Module}'$. Apply this module substitution to each child
410 \I{Name} in the \I{AvailInfo}.
411
412 \end{description}
413 %
414 Shape merging takes two units with inputs (requirements) and outputs
415 (provisions) and ``wires'' them up so that outputs feed into inputs. To
416 merge the shape of $p$ with the shape of $q$:
417
418 \begin{enumerate}
419 \item \emph{Fill every requirement of $q$ with provided modules from
420 $p$.} For each requirement $M$ of $q$ that is provided by $p$,
421 substitute each \I{Module} occurrence of \verb|HOLE:M| with the
422 provided $p\verb|(|M\verb|)|$ (however, do \textbf{NOT} substitute the
423 top-level \I{Module} in a \I{Name}s), merge the \I{AvailInfo}s and apply
424 the resulting \I{Name} substitution, and
425 remove the requirement from $q$. If the \I{AvailInfo}s of the
426 provision are not a superset of the required \I{AvailInfo}s,
427 error.
428 \item If mutual recursion is supported, \emph{fill every requirement
429 of $p$ with provided modules from $q$.}
430 \item \emph{Merge leftover requirements.} For each requirement $M$
431 of $q$ that is not provided by $p$ but required by $p$, and let
432 the new requirement be the merge of
433 \I{AvailInfo}s, applying the resulting \I{Name} substitution.
434 \item \emph{Add provisions of $q$.} Union the provisions of $p$ and $q$, (lazily) erroring
435 if there is a duplicate that doesn't have the same \I{Module}.
436 \end{enumerate}
437
438 \newpage
439 \section{Indefinite type checker}
440
441 \begin{figure}[htpb]
442 $$
443 \begin{array}{rcll}
444 \I{IndefUnitRecord} & ::= & \I{ModIface}_0 \verb|;|\, \ldots\verb|;|\, \I{ModIface}_n \\[1em]
445 \multicolumn{3}{l}{\mbox{\bf Module interface}} \\
446 \I{ModIface} & ::= & \verb|module| \; \I{Module} \; \verb|(| \I{mi\_exports} \verb|)| \; \verb|where| \\
447 & & \qquad \I{mi\_decls} \\
448 & & \qquad \I{mi\_insts} \\
449 & & \qquad \I{dep\_orphs} \\
450 \I{mi\_exports} & ::= & \I{AvailInfo}_0 \verb|,|\, \ldots \verb|,|\, \I{AvailInfo}_n & \mbox{Export list} \\
451 \I{mi\_decls} & ::= & \I{IfaceDecl}_0 \verb|;|\, \ldots \verb|;|\, \I{IfaceDecl}_n & \mbox{Defined declarations} \\
452 \I{mi\_insts} & ::= & \I{IfaceClsInst}_0 \verb|;|\, \ldots \verb|;|\, \I{IfaceClsInst}_n & \mbox{Defined instances} \\
453 \I{dep\_orphs} & ::= & \I{Module}_0 \verb|;|\, \ldots \verb|;|\, \I{Module}_n & \mbox{Transitive orphan dependencies} \\[1em]
454 \multicolumn{3}{l}{\mbox{\bf Interface declarations}} \\
455 \I{IfaceDecl} & ::= & \I{OccName} \; \verb|::| \; \I{IfaceId} \\
456 & | & \verb|data| \; \I{OccName} \; \verb|=| \;\ \I{IfaceData} \\
457 & | & \ldots \\
458 \I{IfaceClsInst} & & \mbox{A type-class instance} \\
459 \I{IfaceId} & & \mbox{Interface of top-level binder} \\
460 \I{IfaceData} & & \mbox{Interface of type constructor} \\
461 \end{array}
462 $$
463 \caption{Module interfaces in GHC} \label{fig:typecheck}
464 \end{figure}
465
466 \Red{This needs updating.}
467
468 In general terms,
469 type checking an indefinite unit (a unit with holes) involves
470 calculating, for every module, a \I{ModIface} representing the
471 type/interface of the module in question (which is serialized
472 to disk). The general form of these
473 interface files are described in Figure~\ref{fig:typecheck}; notably,
474 the interfaces \I{IfaceId}, \I{IfaceData}, etc. contain \I{Name} references,
475 which must be resolved by
476 looking up a \I{ModIface} corresponding to the \I{Module} associated
477 with the \I{Name}. (We will say more about this lookup process shortly.)
478 For example, given:
479
480 \begin{verbatim}
481 unit p where
482 signature H where
483 data T
484 module A(S, T) where
485 import H
486 data S = S T
487 \end{verbatim}
488 %
489 the \I{PkgType} is:
490
491 \begin{verbatim}
492 module HOLE:H (HOLE:H.T) where
493 data T -- abstract type constructor
494 module THIS:A (THIS:A.S, HOLE:H.T) where
495 data S = S HOLE:H.T
496 -- where THIS = p(H -> HOLE:H)
497 \end{verbatim}
498 %
499 However, while it is true that the \I{ModIface} is the final result
500 of type checking, we actually are conflating two distinct concepts: the user-visible
501 notion of a \I{ModuleName}, which, when imported, brings some \I{Name}s
502 into scope (or could trigger a deprecation warning, or pull in some
503 orphan instances\ldots), versus the actual declarations, which, while recorded
504 in the \I{ModIface}, have an independent existence: even if a declaration
505 is not visible for an import, we may internally refer to its \I{Name}, and
506 need to look it up to find out type information. (A simple case when
507 this can occur is if a module exports a function with type \verb|T -> T|,
508 but doesn't export \verb|T|).
509
510 \begin{figure}[htpb]
511 $$
512 \begin{array}{rcll}
513 \I{ModDetails} & ::= & \langle\I{md\_types} \verb|;|\; \I{md\_insts}\rangle \\
514 \I{md\_types} & ::= & \I{TyThing}_0 \verb|,|\, \ldots\verb|,|\, \I{TyThing}_n \\
515 \I{md\_insts} & ::= & \I{ClsInst}_0 \verb|,|\, \ldots\verb|,|\, \I{ClsInst}_n \\[1em]
516 \multicolumn{3}{l}{\mbox{\bf Type-checked declarations}} \\
517 \I{TyThing} & & \mbox{Type-checked thing with a \I{Name}} \\
518 \I{ClsInst} & & \mbox{Type-checked type class instance} \\
519 \end{array}
520 $$
521 \caption{Semantic objects in GHC} \label{fig:typecheck-more}
522 \end{figure}
523
524 Thus, a \I{ModIface} can be type-checked into a \I{ModDetails}, described in
525 Figure~\ref{fig:typecheck-more}. Notice that a \I{ModDetails} is just
526 a bag of type-checkable entities which GHC knows about. We
527 define the \emph{external package state (EPT)} to
528 simply be the union of the \I{ModDetails}
529 of all external modules.
530
531 Type checking is a delicate balancing act between module
532 interfaces and our semantic objects. A \I{ModIface} may get
533 type-checked multiple times with different hole instantiations
534 to provide multiple \I{ModDetails}.
535 Furthermore complicating matters
536 is that GHC does this resolution \emph{lazily}: a \I{ModIface}
537 is only converted to a \I{ModDetails} when we are looking up
538 the type of a \I{Name} that is described by the interface;
539 thus, unlike usual theoretical treatments of type checking, we can't
540 eagerly go ahead and perform substitutions on \I{ModIface}s when
541 they get included.
542
543 In a separate compiler like GHC, there are two primary functions we must provide:
544
545 \paragraph{\textit{ModuleName} to \textit{ModIface}} Given a \I{ModuleName} which
546 was explicitly imported by a user, we must produce a \I{ModIface}
547 that, among other things, specifies what \I{Name}s are brought
548 into scope. This is used by the renamer to resolve plain references
549 to identifiers to real \I{Name}s. (By the way, if shaping produced
550 renamed trees, it would not be necessary to do this step!)
551
552 \paragraph{\textit{Module} to \textit{ModDetails}/EPT} Given a \I{Module} which may be
553 a part of a \I{Name}, we must be able to type check it into
554 a \I{ModDetails} (usually by reading and typechecking the \I{ModIface}
555 associated with the \I{Module}, but this process is involved). This
556 is used by the type checker to find out type information on things. \\
557
558 There are two points in the type checker where these capabilities are exercised:
559
560 \paragraph{Source-level imports} When a user explicitly imports a
561 module, the \textit{ModuleName} is mapped to a \textit{ModIface}
562 to find out what exports are brought into scope (\I{mi\_exports})
563 and what orphan instances must be loaded (\I{dep\_orphs}). Additionally,
564 the \textit{Module} is loaded to the EPT to bring instances from
565 the module into scope.
566
567 \paragraph{Internal name lookup} During type checking, we may have
568 a \I{Name} for which we need type information (\I{TyThing}). If it's not already in the
569 EPT, we type check and load
570 into the EPT the \I{ModDetails} of the \I{Module} in the \I{Name},
571 and then check the EPT again. (\verb|importDecl|)
572
573 \subsection{\textit{ModName} to \textit{ModIface}}
574
575 In all cases, the \I{mi\_exports} can be calculated directly from the
576 shaping process, which specifies exactly for each \I{ModName} in scope
577 what will be brought into scope.
578
579 \paragraph{Modules} Modules are straightforward, as for any
580 \I{Module} there is only one possibly \I{ModIface} associated
581 with it (the \I{ModIface} for when we type-checked the (unique) \verb|module|
582 declaration.)
583
584 \paragraph{Signatures} For signatures, there may be multiple \I{ModIface}s
585 associated with a \I{ModName} in scope, e.g. in this situation:
586
587 \begin{verbatim}
588 unit p where
589 signature S where
590 data A
591 unit q where
592 include p
593 signature S where
594 data B
595 module M where
596 import S
597 \end{verbatim}
598 %
599 Each literal \verb|signature| has a \I{ModIface} associated with it; and
600 the import of \verb|S| in \verb|M|, we want to see the \emph{merged}
601 \I{ModIface}s. We can determine the \I{mi\_exports} from the shape,
602 but we also need to pull in orphan instances for each signature, and
603 produce a warning for each deprecated signature.
604
605 \begin{aside}
606 \textbf{Does hiding a signature hide its orphans.} Suppose that we have
607 extended Backpack to allow hiding signatures from import.
608
609 \begin{verbatim}
610 unit p requires (H) where -- H is hidden from import
611 module A where
612 instance Eq (a -> b) where -- orphan
613 signature H {-# DEPRECATED "Don't use me" #-} where
614 import A
615
616 unit q where
617 include p
618 signature H where
619 data T
620 module M where
621 import H -- warn deprecated?
622 instance Eq (a -> b) -- overlap?
623 \end{verbatim}
624
625 It is probably the most consistent to not pull in orphan instances
626 and not give the deprecated warning: this corresponds to merging
627 visible \I{ModIface}s, and ignoring invisible ones.
628 \end{aside}
629
630 \subsection{\textit{Module} to \textit{ModDetails}}
631
632 \paragraph{Modules} For modules, we have a \I{Module} of
633 the form $\I{p}\verb|(|m\; \verb|->|\; \I{Module}\verb|,|\, \ldots\verb|)|$,
634 and we also have a unique \I{ModIface}, where each hole instantiation
635 is $\verb|HOLE:|m$.
636
637 To generate the \I{ModDetails} associated with the specific instantiation,
638 we have to type-check the \I{ModIface} with the following adjustments:
639
640 \begin{enumerate}
641 \item Perform a \I{Module} substitution according to the instantiation
642 of the \I{ModIface}'s \I{Module}. (NB: we \emph{do}
643 substitute \verb|HOLE:A.x| to \verb|HOLE:B.x| if we instantiated
644 \verb|A -> HOLE:B|, \emph{unlike} the disjoint
645 substitutions applied by shaping.)
646 \item Perform a \I{Name} substitution as follows: for any name
647 with a unit key that is a $\verb|HOLE|$,
648 substitute with the recorded \I{Name} in the requirements of the shape.
649 Otherwise, look up the (unique) \I{ModIface} for the \I{Module},
650 and subsitute with the corresponding \I{Name} in the \I{mi\_exports}.
651 \end{enumerate}
652
653 \paragraph{Signatures} For signatures, we have a \I{Module} of the form
654 $\verb|HOLE:|m$. Unlike modules, there are multiple \I{ModIface}s associated with a hole.
655 We distinguish each separate \I{ModIface} by considering the full \I{UnitKey}
656 it was defined in, e.g. \verb|p(A -> HOLE:C, B -> q():B)|; call this
657 the hole's \emph{defining unit key}; the set of \I{ModIface}s for a hole
658 and their defining unit keys can easily be calculated during shaping.
659
660 To generate the \I{ModDetails} associated with a hole, we type-check each
661 \I{ModIface}, with the following adjustments:
662
663 \begin{enumerate}
664 \item Perform a \I{Module} substitution according to the instantiation
665 of the defining unit key. (NB: This may rename the hole itself!)
666 \item Perform a \I{Name} substitution as follows, in the same manner
667 as would be done in the case of modules.
668 \item When these \I{ModDetails} are merged into the EPT, some merging
669 of duplicate types may occur; a type
670 may be defined multiple times, in which case we check that each
671 definition is compatible with the previous ones. A concrete
672 type is always compatible with an abstract type.
673 \end{enumerate}
674
675 \paragraph{Invariants} When we perform \I{Name} substitutions, we must be
676 sure that we can always find out the correct \I{Name} to substitute to.
677 This isn't obviously true, consider:
678
679 \begin{verbatim}
680 unit p where
681 signature S(foo) where
682 data T
683 foo :: T
684 module M(bar) where
685 import S
686 bar = foo
687 unit q where
688 module A(T(..)) where
689 data T = T
690 foo = T
691 module S(foo) where
692 import A
693 include p
694 module A where
695 import M
696 ... bar ...
697 \end{verbatim}
698 %
699 When we type check \verb|p|, we get the \I{ModIface}s:
700
701 \begin{verbatim}
702 module HOLE:S(HOLE:S.foo) where
703 data T
704 foo :: HOLE:S.T
705 module THIS:M(THIS:M.bar) where
706 bar :: HOLE:S.T
707 \end{verbatim}
708 %
709 Now, when we type check \verb|A|, we pull on the \I{Name} \verb|p(S -> q():S):M.bar|,
710 which means we have to type check the \I{ModIface} for \verb|p(S -> q():S):M|.
711 The un-substituted type of \verb|bar| has a reference to \verb|HOLE:S.T|;
712 this should be substituted to \verb|q():S.T|. But how do we discover this?
713 We know that \verb|HOLE:S| was instantiated to \verb|q():S|, so we might try
714 and look for \verb|q():S.T|. However, this \I{Name} does not exist because
715 the \verb|module S| reexports the selector from \verb|A|! Nor can we consult
716 the (unique) \I{ModIface} for the module, as it doesn't reexport the relevant
717 type.
718
719 The conclusion, then, is that a module written this way should be disallowed.
720 Specifically, the correctness condition for a signature is this: \emph{Any \I{Name}
721 mentioned in the \I{ModIface} of a signature must either be from an external module, or be
722 exported by the signature}.
723
724 \section{Appendix: Shaping}
725
726 This section clarifies some subtle aspects about shaping.
727
728 \subsection{\textit{OccName} is implied by \textit{Name}}
729 In Haskell, the following is not valid syntax:
730
731 \begin{verbatim}
732 import A (foobar as baz)
733 \end{verbatim}
734 In particular, a \I{Name} which is in scope will always have the same
735 \I{OccName} (even if it may be qualified.) You might imagine relaxing
736 this restriction so that declarations can be used under different \I{OccName}s;
737 in such a world, we need a different definition of shape:
738
739 \begin{verbatim}
740 Shape ::=
741 provided: ModName -> Module { OccName -> Name }
742 required: ModName -> { OccName -> Name }
743 \end{verbatim}
744 Presently, however, such an \I{OccName} annotation would be redundant: it can be inferred from the \I{Name}.
745
746 \subsection{Holes of a unit are a mapping, not a set.}
747
748 Why can't the \I{UnitKey} just record a
749 set of \I{Module}s, e.g. $\I{UnitKey}\;::= \; p \; \verb|{| \; \I{Module} \; \verb|}|$? Consider:
750
751 \begin{verbatim}
752 unit p (A) requires (H1, H2) where
753 signature H1(T) where
754 data T
755 signature H2(T) where
756 data T
757 module A(A(..)) where
758 import qualified H1
759 import qualified H2
760 data A = A H1.T H2.T
761
762 unit q (A12, A21) where
763 module I1(T) where
764 data T = T Int
765 module I2(T) where
766 data T = T Bool
767 include p (A as A12) requires (H1 as I1, H2 as I2)
768 include p (A as A21) requires (H1 as I2, H2 as I1)
769 \end{verbatim}
770 With a mapping, the first instance of \verb|p| has key \verb|p(H1 -> q():I1, H2 -> q():I2)|
771 while the second instance has key \verb|p(H1 -> q():I2, H2 -> q():I1)|; with
772 a set, both would have the key \verb|p{q():I1, q():I2}| and be
773 indistinguishable.
774
775 \subsection{Signatures can require a specific entity.}
776 With requirements like \verb|A -> { HOLE:A.T, HOLE:A.foo }|,
777 why not specify it as \verb|A -> { T, foo }|,
778 e.g., \verb|required: { ModName -> { OccName } }|? Consider:
779
780 \begin{verbatim}
781 unit p () requires (A, B) where
782 signature A(T) where
783 data T
784 signature B(T) where
785 import T
786 \end{verbatim}
787 The requirements of this unit specify that \verb|A.T| $=$ \verb|B.T|; this
788 can be expressed with \I{Name}s as
789
790 \begin{verbatim}
791 A -> { HOLE:A.T }
792 B -> { HOLE:A.T }
793 \end{verbatim}
794 But, without \I{Name}s, the sharing constraint is impossible: \verb|A -> { T }; B -> { T }|. (NB: \verb|A| and \verb|B| could be filled with different modules, they just have
795 to both export the same \verb|T|.)
796
797 \subsection{The \textit{Name} of a value is used to avoid
798 ambiguous identifier errors.}
799 We state that two types
800 are equal when their \I{Name}s are the same; however,
801 for values, it is less clear why we care. But consider this example:
802
803 \begin{verbatim}
804 unit p (A) requires (H1, H2) where
805 signature H1(x) where
806 x :: Int
807 signature H2(x) where
808 import H1(x)
809 module A(y) where
810 import H1
811 import H2
812 y = x
813 \end{verbatim}
814 The reference to \verb|x| in \verb|A| is unambiguous, because it is known
815 that \verb|x| from \verb|H1| and \verb|x| from \verb|H2| are the same (have
816 the same \I{Name}.) If they were not the same, it would be ambiguous and
817 should cause an error. Knowing the \I{Name} of a value distinguishes
818 between these two cases.
819
820 \subsection{Holes are linear}
821 Requirements do not record what \I{Module} represents
822 the identity of a requirement, which means that it's not possible to assert
823 that hole \verb|A| and hole \verb|B| should be implemented with the same module,
824 as might occur with aliasing:
825
826 \begin{verbatim}
827 signature A where
828 signature B where
829 alias A = B
830 \end{verbatim}
831 %
832 The benefit of this restriction is that when a requirement is filled,
833 it is obvious that this is the only requirement that is filled: you won't
834 magically cause some other requirements to be filled. The downside is
835 it's not possible to write a unit which looks for an interface it is
836 looking for in one of $n$ names, accepting any name as an acceptable linkage.
837 If aliasing was allowed, we'd need a separate physical shaping context,
838 to make sure multiple mentions of the same hole were consistent.
839
840 \subsection{A unit does not ``provide'' its signatures}
841 We enforce the invariant that
842 a provision is always (syntactically) a \verb|module| and a requirement
843 is always a \verb|signature|. This means that if you have a requirement
844 and a provision of the same name, the requirement can \emph{always} be filled
845 with the provision.
846
847 The alternate design, where a unit both requires and provides
848 its signatures, makes it unclear if a provision
849 will actually fill a signature. Consider this example, where
850 a signature is required and exposed:
851
852 \begin{verbatim}
853 unit a-sigs (A) requires (A) where -- ***
854 signature A where
855 data T
856
857 unit a-user (B) requires (A) where
858 signature A where
859 data T
860 x :: T
861 module B where
862 ...
863
864 unit p where
865 include a-sigs
866 include a-user
867 \end{verbatim}
868 %
869 When we consider merging in the shape of \verb|a-user|, does the
870 \verb|A| provided by \verb|a-sigs| fill in the \verb|A| requirement
871 in \verb|a-user|? It \emph{should not}, since \verb|a-sigs| does not
872 actually provide enough declarations to satisfy \verb|a-user|'s
873 requirement: the intended semantics \emph{merges} the requirements
874 of \verb|a-sigs| and \verb|a-user|.
875
876 What about this example?
877
878 \begin{verbatim}
879 unit a-sigs (M as A) requires (H as A) where
880 signature H(T) where
881 data T
882 module M(T) where
883 import H(T)
884 \end{verbatim}
885 %
886 We rightly should error, since the provision is a module. And in this situation:
887
888 \begin{verbatim}
889 unit a-sigs (H as A) requires (H) where
890 signature H(T) where
891 data T
892 \end{verbatim}
893 %
894 The requirements should be merged, but should the merged requirement
895 be under the name \verb|H| or \verb|A|?
896
897 It may still be possible to use the \verb|(A) requires (A)| syntax to
898 indicate exposed signatures, but this would be a mere syntactic
899 alternative to \verb|() requires (exposed A)|.
900
901 \subsection{Signature visibility, and defaulting}
902 The simplest formulation of requirements is to have them always be
903 importable. One proposed enhancement, however, is to allow some
904 requirements to be ``non-importable''; that is, they are not visible
905 to people who include packages.
906
907 One simple way of modeling this is to associate each required module
908 with a flag indicating whether or not it is importable or not. Then, we
909 might imagine that an explicit export list could be used to toggle
910 whether or not a requirement is visible or not.
911
912 However, when an export list is absent, we have to pick a default
913 visibility for a signature. If we use the same behavior as with
914 modules, a strange situation can occur:
915
916 \begin{verbatim}
917 unit p where -- S is visible
918 signature S where
919 x :: True
920
921 unit q where -- use defaulting
922 include p
923 signature S where
924 y :: True
925 module M where
926 import S
927 z = x && y -- OK
928
929 unit r where
930 include q
931 module N where
932 import S
933 z = y -- OK
934 z = x -- ???
935 \end{verbatim}
936 %
937 Absent the second signature declaration in \verb|q|, \verb|S.x| clearly
938 should not be visible in \verb|N|. However, what ought to occur when this signature
939 declaration is added? One interpretation is to say that only some
940 (but not all) declarations are provided (\verb|S.x| remains invisible);
941 another interpretation is that adding \verb|S| is enough to treat
942 the signature as ``in-line'', and all declarations are now provided
943 (\verb|S.x| is visible).
944
945 The latter interpretation avoids having to keep track of providedness
946 per declarations, and means that you can always express defaulting
947 behavior by writing an explicit provides declaration on the unit.
948 However, it has the odd behavior of making empty signatures semantically
949 meaningful:
950
951 \begin{verbatim}
952 unit q where
953 include p
954 signature S where
955 \end{verbatim}
956 %
957 % SPJ: This would be too complicated (if there's yet a third way)
958
959 This is pretty complicated, so signature visibility is not currently
960 planned to be implemented.
961
962 \subsection{Tricky \textit{AvailInfo} merging scenarios}
963
964 \paragraph{Merging when type constructors are not in scope}
965
966 \begin{verbatim}
967 signature A1(foo) where
968 data A = A { foo :: Int, bar :: Bool }
969
970 signature A2(bar) where
971 data A = A { foo :: Int, bar :: Bool }
972 \end{verbatim}
973 %
974 If we merge \verb|A1| and \verb|A2|, are we supposed to conclude that
975 the types \verb|A1.A| and \verb|A2.A| (not in scope!) are the same?
976 The answer is no! Consider these implementations:
977
978 \begin{verbatim}
979 module A1(A(..)) where
980 data A = A { foo :: Int, bar :: Bool }
981
982 module A2(A(..)) where
983 data A = A { foo :: Int, bar :: Bool }
984
985 module A(foo, bar) where
986 import A1(foo)
987 import A2(bar)
988 \end{verbatim}
989
990 Here, \verb|module A1| implements \verb|signature A1|, \verb|module A2| implements \verb|signature A2|,
991 and \verb|module A| implements \verb|signature A1| and \verb|signature A2| individually
992 and should certainly implement their merge. This is why we cannot simply
993 merge type constructors based on the \I{OccName} of their top-level type;
994 merging only occurs between in-scope identifiers.
995
996 \paragraph{Does merging a selector merge the type constructor?}
997
998 \begin{verbatim}
999 signature A1(A(..)) where
1000 data A = A { foo :: Int, bar :: Bool }
1001
1002 signature A2(A(..)) where
1003 data A = A { foo :: Int, bar :: Bool }
1004
1005 signature A2(foo) where
1006 import A1(foo)
1007 \end{verbatim}
1008 %
1009 Does the last signature, which is written in the style of a sharing constraint on \verb|foo|,
1010 also cause \verb|bar| and the type and constructor \verb|A| to be unified?
1011 Because a merge of a child name results in a substitution on the parent name,
1012 the answer is yes.
1013
1014 \paragraph{Incomplete data declarations}
1015
1016 \begin{verbatim}
1017 signature A1(A(foo)) where
1018 data A = A { foo :: Int }
1019
1020 signature A2(A(bar)) where
1021 data A = A { bar :: Bool }
1022 \end{verbatim}
1023 %
1024 Should \verb|A1| and \verb|A2| merge? If yes, this would imply
1025 that data definitions in signatures could only be \emph{partial}
1026 specifications of their true data types. This seems complicated,
1027 which suggests this should not be supported; however, in fact,
1028 this sort of definition, while disallowed during type checking,
1029 should be \emph{allowed} during shaping. The reason that the
1030 shape we abscribe to the signatures \verb|A1| and \verb|A2| are
1031 equivalent to the shapes for these which should merge:
1032
1033 \begin{verbatim}
1034 signature A1(A(foo)) where
1035 data A = A { foo :: Int, bar :: Bool }
1036
1037 signature A2(A(bar)) where
1038 data A = A { foo :: Int, bar :: Bool }
1039 \end{verbatim}
1040
1041 \subsection{Subtyping record selectors as functions}
1042
1043 \begin{verbatim}
1044 signature H(A, foo) where
1045 data A
1046 foo :: A -> Int
1047
1048 module M(A, foo) where
1049 data A = A { foo :: Int, bar :: Bool }
1050 \end{verbatim}
1051 %
1052 Does \verb|M| successfully fill \verb|H|? If so, it means that anywhere
1053 a signature requests a function \verb|foo|, we can instead validly
1054 provide a record selector. This capability seems quite attractive,
1055 although in practice record selectors rarely seem to be abstracted this
1056 way: one reason is that \verb|M.foo| still \emph{is} a record selector,
1057 and can be used to modify a record. (Many library authors find this
1058 suprising!)
1059
1060 Nor does this seem to be an insurmountable instance of the avoidance
1061 problem:
1062 as a workaround, \verb|H| can equivalently be written as:
1063
1064 \begin{verbatim}
1065 signature H(foo) where
1066 data A = A { foo :: Int, bar :: Bool }
1067 \end{verbatim}
1068 %
1069 However, you might not like this, as the otherwise irrelevant \verb|bar| must be mentioned
1070 in the definition.
1071
1072 In any case, actually implementing this `subtyping' is quite complicated, because we can no
1073 longer assume that every child name is associated with a parent name.
1074 The technical difficulty is that we now need to unify a plain identifier
1075 \I{AvailInfo} (from the signature) with a type constructor \I{AvailInfo}
1076 (from a module.) It is not clear what this should mean.
1077 Consider this situation:
1078
1079 \begin{verbatim}
1080 unit p where
1081 signature H(A, foo, bar) where
1082 data A
1083 foo :: A -> Int
1084 bar :: A -> Bool
1085 module X(A, foo) where
1086 import H
1087 unit q where
1088 include p
1089 signature H(bar) where
1090 data A = A { foo :: Int, bar :: Bool }
1091 module Y where
1092 import X(A(..)) -- ???
1093 \end{verbatim}
1094
1095 Should the wildcard import on \verb|X| be allowed?
1096 This question is equivalent to whether or not shaping discovers
1097 whether or not a function is a record selector and propagates this
1098 information elsewhere.
1099 If the wildcard is not allowed, here is another situation:
1100
1101 \begin{verbatim}
1102 unit p where
1103 -- define without record selectors
1104 signature X1(A, foo) where
1105 data A
1106 foo :: A -> Int
1107 module M1(A, foo) where
1108 import X1
1109
1110 unit q where
1111 -- define with record selectors (X1s unify)
1112 signature X1(A(..)) where
1113 data A = A { foo :: Int, bar :: Bool }
1114 signature X2(A(..)) where
1115 data A = A { foo :: Int, bar :: Bool }
1116
1117 -- export some record selectors
1118 signature Y1(bar) where
1119 import X1
1120 signature Y2(bar) where
1121 import X2
1122
1123 unit r where
1124 include p
1125 include q
1126
1127 -- sharing constraint
1128 signature Y2(bar) where
1129 import Y1(bar)
1130
1131 -- the payload
1132 module Test where
1133 import M1(foo)
1134 import X2(foo)
1135 ... foo ... -- conflict?
1136 \end{verbatim}
1137
1138 Without the sharing constraint, the \verb|foo|s from \verb|M1| and \verb|X2|
1139 should conflict. With it, however, we should conclude that the \verb|foo|s
1140 are the same, even though the \verb|foo| from \verb|M1| is \emph{not}
1141 considered a child of \verb|A|, and even though in the sharing constraint
1142 we \emph{only} unified \verb|bar| (and its parent \verb|A|). To know that
1143 \verb|foo| from \verb|M1| should also be unified, we have to know a bit
1144 more about \verb|A| when the sharing constraint performs unification;
1145 however, the \I{AvailInfo} will only tell us about what is in-scope, which
1146 is \emph{not} enough information.
1147
1148 \subsection{Some examples}
1149
1150 \subsubsection{A simple example}
1151
1152 In the following set of units:
1153
1154 \begin{verbatim}
1155 unit p(M) requires (A) where
1156 signature A(T) where
1157 data T
1158 module M(T, S) where
1159 import A(T)
1160 data S = S T
1161
1162 unit q where
1163 module A where
1164 data T = T
1165 include p
1166 \end{verbatim}
1167
1168 When we \verb|include p|, we need to merge the partial shape
1169 of \verb|q| (with just provides \verb|A|) with the shape
1170 of \verb|p|. Here is each step of the merging process:
1171
1172 \begin{verbatim}
1173 shape 1 shape 2
1174 --------------------------------------------------------------------------------
1175 (initial shapes)
1176 provides: A -> THIS:A { q():A.T } M -> p(A -> HOLE:A) { HOLE:A.T, p(A -> HOLE:A).S }
1177 requires: (nothing) A -> { HOLE:A.T }
1178
1179 (after filling requirements)
1180 provides: A -> THIS:A { q():A.T } M -> p(A -> THIS:A) { q():A.T, p(A -> THIS:A).S }
1181 requires: (nothing) (nothing)
1182
1183 (after adding provides)
1184 provides: A -> THIS:A { q():A.T }
1185 M -> p(A -> THIS:A) { q():A.T, p(A -> THIS:A).S }
1186 requires: (nothing)
1187 \end{verbatim}
1188
1189 Notice that we substituted \verb|HOLE:A| with \verb|THIS:A|, but \verb|HOLE:A.T| with \verb|q():A.T|.
1190
1191 \subsubsection{Requirements merging can affect provisions}
1192
1193 When a merge results in a substitution, we substitute over both
1194 requirements and provisions:
1195
1196 \begin{verbatim}
1197 signature H(T) where
1198 data T
1199 module A(T) where
1200 import H(T)
1201 module B(T) where
1202 data T = T
1203
1204 -- provides: A -> THIS:A { HOLE:H.T }
1205 -- B -> THIS:B { THIS:B.T }
1206 -- requires: H -> { HOLE:H.T }
1207
1208 signature H(T, f) where
1209 import B(T)
1210 f :: a -> a
1211
1212 -- provides: A -> THIS:A { THIS:B.T } -- UPDATED
1213 -- B -> THIS:B { THIS:B.T }
1214 -- requires: H -> { THIS:B.T, HOLE:H.f } -- UPDATED
1215 \end{verbatim}
1216
1217 \subsubsection{Sharing constraints}
1218
1219 Suppose you have two signature which both independently define a type,
1220 and you would like to assert that these two types are the same. In the
1221 ML world, such a constraint is known as a sharing constraint. Sharing
1222 constraints can be encoded in Backpacks via clever use of reexports;
1223 they are also an instructive example for signature merging.
1224
1225 \begin{verbatim}
1226 signature A(T) where
1227 data T
1228 signature B(T) where
1229 data T
1230
1231 -- requires: A -> { HOLE:A.T }
1232 B -> { HOLE:B.T }
1233
1234 -- the sharing constraint!
1235 signature A(T) where
1236 import B(T)
1237 -- (shape to merge)
1238 -- requires: A -> { HOLE:B.T }
1239
1240 -- (after merge)
1241 -- requires: A -> { HOLE:A.T }
1242 -- B -> { HOLE:A.T }
1243 \end{verbatim}
1244 %
1245 \Red{I'm pretty sure any choice of \textit{Name} is OK, since the
1246 subsequent substitution will make it alpha-equivalent.}
1247
1248 \subsection{Export declarations}
1249
1250 If an explicit export declaration is given, the final shape is the
1251 computed shape, minus any provisions not mentioned in the list, with the
1252 appropriate renaming applied to provisions and requirements. (Requirements
1253 are implicitly passed through if they are not named.)
1254 If no explicit export declaration is given, the final shape is
1255 the computed shape, including only provisions which were defined
1256 in the declarations of the unit.
1257
1258
1259 \section{Cabal}
1260
1261
1262 % \I{InstalledUnitId} & ::= & \I{IndefUnitId} \verb|(| \, m \; \verb|->| \; \I{Module} \verb|,|\, \ldots\, \verb|)| & \mbox{Also known as \I{UnitKey}} \\
1263 % \I{Module} & ::= & \I{InstalledUnitId} \verb|:| m \\
1264
1265 \paragraph{Indefinite versus installed units}
1266 The purpose of an \I{IndefUnitId} is to uniquely identify the results of \textbf{typechecking}
1267 an indefinite unit; whereas an \I{InstalledUnitId} uniquely identifies
1268 the results of \textbf{compiling} a unit with all its holes
1269 filled. Thus, an \I{InstalledUnitId} also records a \emph{hole mapping}
1270 which specifies how each hole was filled. If an \I{InstalledUnitId}
1271 is only partially filled, we may refer to it as a \I{UnitKey} (as these
1272 are never installed.)
1273
1274 \paragraph{Units versus packages}
1275
1276 Cabal packages are:
1277
1278 \begin{itemize}
1279 \item The unit of distribution
1280 \item The unit that Hackage handles
1281 \item The unit of versioning
1282 \item The unit of ownership (who maintains it etc)
1283 \end{itemize}
1284
1285 Backpack units are the building blocks of modular development;
1286 there may be multiple units per a Cabal package. While in theory
1287 Cabal could do sophisticated things with multiple units in a
1288 package, we expect Cabal
1289 to pick a distinguished unit (with the same unit name $p$ as
1290 the package) which serves as the publically visible unit.
1291
1292
1293 %\newpage
1294
1295 \end{document} % chktex 16