1 \newif\ifhaskellworkshop\haskellworkshoptrue
2
3 \iffalse
4
5 Payload: alternatives for functions polymorphic in node type
6
7 (n C O -> a, n C O -> b, n C O -> c)
8
9 vs
10
11 (forall e x . n e x -> ShapeTag e x, forall e x . n e x -> result e x)
12
where

  data ShapeTag e x where
    First  :: ShapeTag C O
    Middle :: ShapeTag O O
    Last   :: ShapeTag O C

  result C O = a
  result O O = b
  result O C = c
23
24 The function returning ShapeTag can be passed in a type-class
25 dictionary.
26
27
28
29
30 Say something about the proliferation of heavyweight type signatures
31 required for GADT pattern matches. When the signature is three times
32 the size of the function, something is wrong...
33
34
35 I'm going to leave this point out, because in all the client code
36 we've written, we are (a) matching only on a node, and (b) matching on
37 all cases. In this scenario the GADT exhaustiveness checker provides
38 no additional benefits. Indeed, GADTs can be an irritant to the
39 client: in many pattern matches, GADTs make it impossible to default
40 cases.
41
42
43 I was thinking again about unwrapped nodes, cons-lists, snoc-lists,
44 and tree fringe. I think there's an analogy with ordinary lists, and
45 the analogy is 'concatMap' vs 'fold'. Just as with lists, the common
46 case of 'concatMap (\a -> [a])' does a lot of unnecessary wrapping and
47 unwrapping of elements. We nevertheless prefer 'concatMap' because it
48 provides better separation of concerns. But this analogy suggests
49 several ideas:
50
51 - Can block processing be written in higher-order fashion?
52
53 - Can it be done in both styles (concatMap *and* fold)?
54
55 - Once we have framed the problem in these terms, can we write
fold-style code that is not too hard to understand and maintain?
57
58 - Finally, is there an analog of stream fusion that would work for
59 control-flow graphs, enabling us to write some graph combinators
that are both perspicuous and efficient?
61
62 These are good observations for the paper and for future work.
63
64
65 ----------------------------------------------------------------
66
67
68 P.S. The three of us should have a nice little Skype chat about
69 higher-rank types. A lot of the questions still at issue boil down to
70 the following observations:
71
72 - It's really convenient for the *implementation* of Hoopl to put
73 forall to the left of an arrow. Polymorphic functions as
74 arguments are powerful and keen, and they make it really easy for
75 Hoopl to do its job.
76
- It's a bit inconvenient for a *client* of Hoopl to be forced to
78 supply functions that are polymorphic in shape. All constructors
79 have to be written out by hand; defaulting is not possible. (Or
80 at least John and I haven't figured out how to do it.)
81
82 - Using higher-rank types in Hoopl's interface makes some very
83 desirable things downright impossible:
84
85 - One can't write a generic dominator analysis with type
86
87 dom :: NonLocal n => FwdPass n Dominators
88
89 - One can't write a generic debugging wrapper of type
90
91 debug :: (Show n, Show f)
92 => TraceFn -> WhatToTrace -> FwdPass n f -> FwdPass n f
93
94 - One can't write a combinator of type
95
96 product :: FwdPass n f -> FwdPass n f' -> FwdPass n (f, f')
97
98 I submit that these things are all very desirable.
99
100 I'm therefore strongly in favor of removing the *requirement* that a
101 FwdPass include transfer and rewrite functions that are polymorphic in
102 shape. It will be a bit tedious to return to triples of functions,
103 but for those lucky clients who *can* write polymorphic functions and
104 who wish to, we can provide injection functions.
105
106 I should stress that I believe the tedium can be contained within
107 reasonable bounds---for example, the arfXXX functions that are
108 internal to Hoopl all remain polymorphic in shape (although not
109 higher-rank any longer).
110
111 \fi
112
113
114 %
115 % TODO
116 %
117 %
118 % AGraph = graph under construction
119 % Graph = replacement graph
120 %
121 %%% Hoopl assumes that the replacement graph for a node open on exit
122 %%% doesn't contain any additional exits
123 %%%
124 %%% introduce replacement graph in section 3, after graph.
125 %%% it has arbitrarily complicated internal control flow, but only
126 %%% one exit (if open on exit)
127 %%%
%%% a requirement on the client that is not checked statically.
129
130
131 \input{dfoptdu.tex}
132
133 \newif\ifpagetuning \pagetuningtrue % adjust page breaks
134
135 \newif\ifnoauthornotes \noauthornotesfalse
136
137 \newif\iftimestamp\timestamptrue % show MD5 stamp of paper
138
139 \timestampfalse % it's post-submission time
140
141 \IfFileExists{timestamp.tex}{}{\timestampfalse}
142
143 \newif\ifcutting \cuttingfalse % cutting down to submission size
144
145
146 \newif\ifgenkill\genkillfalse % have a section on gen and kill
147 \genkillfalse
148
149
150 \newif\ifnotesinmargin \notesinmarginfalse
151 \IfFileExists{notesinmargin.tex}{\notesinmargintrue}{\relax}
152
153 \documentclass[blockstyle,natbib]{sigplanconf}
154
155 \newcommand\ourlib{Hoopl}
156 % higher-order optimization library
157 % ('Hoople' was taken -- see hoople.org)
158 \let\hoopl\ourlib
159
160 \newcommand\ag{\ensuremath{\mathit{ag}}}
161 \renewcommand\ag{\ensuremath{g}} % not really seeing that 'ag' is helpful here ---NR
162 \newcommand\rw{\ensuremath{\mathit{rw}}}
163
164 % l2h substitution ourlib Hoopl
165 % l2h substitution hoopl Hoopl
166
167 \newcommand\fs{\ensuremath{\mathit{fs}}} % dataflow facts, possibly plural
168
169 \newcommand\vfilbreak[1][\baselineskip]{%
170 \vskip 0pt plus #1 \penalty -200 \vskip 0pt plus -#1 }
171
172 \usepackage{alltt}
173 \usepackage{array}
174 \usepackage{afterpage}
175 \newcommand\lbr{\char`\{}
176 \newcommand\rbr{\char`\}}
177
178 \clubpenalty=10000
179 \widowpenalty=10000
180
181 \usepackage{verbatim} % allows to define \begin{smallcode}
182 \newenvironment{smallcode}{\par\unskip\small\verbatim}{\endverbatim}
183 \newenvironment{fuzzcode}[1]{\par\unskip\hfuzz=#1 \verbatim}{\endverbatim}
184 \newenvironment{smallfuzzcode}[1]{\par\unskip\small\hfuzz=#1 \verbatim}{\endverbatim}
185 \newenvironment{smallttcode}{\par\unskip\small\alltt}{\endalltt}
186
187 \newcommand\smallverbatiminput[1]{{\small\verbatiminput{#1}}}
188
189 \newcommand\lineref[1]{line~\ref{line:#1}}
190 \newcommand\linepairref[2]{lines \ref{line:#1}~and~\ref{line:#2}}
191 \newcommand\linerangeref[2]{\mbox{lines~\ref{line:#1}--\ref{line:#2}}}
192 \newcommand\Lineref[1]{Line~\ref{line:#1}}
193 \newcommand\Linepairref[2]{Lines \ref{line:#1}~and~\ref{line:#2}}
194 \newcommand\Linerangeref[2]{\mbox{Lines~\ref{line:#1}--\ref{line:#2}}}
195
196 \makeatletter
197
198 \let\c@table=
199 \c@figure % one counter for tables and figures, please
200
201 \newcommand\setlabel[1]{%
202 \setlabel@#1!!\@endsetlabel
203 }
204 \def\setlabel@#1!#2!#3\@endsetlabel{%
205 \ifx*#1*% line begins with label or is empty
206 \ifx*#2*% line is empty
207 \verbatim@line{}%
208 \else
209 \@stripbangs#3\@endsetlabel%
210 \label{line:#2}%
211 \fi
212 \else
213 \@stripbangs#1!#2!#3\@endsetlabel%
214 \fi
215 }
216 \def\@stripbangs#1!!\@endsetlabel{%
217 \verbatim@line{#1}%
218 }
219
220
221 \verbatim@line{hello mama}
222
223 \newcommand{\numberedcodebackspace}{0.5\baselineskip}
224
225 \newcounter{codeline}
226 \newenvironment{numberedcode}
227 {\endgraf
228 \def\verbatim@processline{%
229 \noindent
230 \expandafter\ifx\expandafter+\the\verbatim@line+ % blank line
231 %{\small\textit{\def\rmdefault{cmr}\rmfamily\phantom{00}\phantom{: \,}}}%
232 \else
233 \refstepcounter{codeline}%
234 {\small\textit{\def\rmdefault{cmr}\rmfamily\phantom{00}\llap{\arabic{codeline}}: \,}}%
235 \fi
236 \expandafter\setlabel\expandafter{\the\verbatim@line}%
237 \expandafter\ifx\expandafter+\the\verbatim@line+ % blank line
238 \vspace*{-\numberedcodebackspace}\par%
239 \else
240 \the\verbatim@line\par
241 \fi}%
242 \verbatim
243 }
244 {\endverbatim}
245
246 \makeatother
247
248 \newcommand\arrow{\rightarrow}
249
250 \newcommand\join{\sqcup}
251 \newcommand\slotof[1]{\ensuremath{s_{#1}}}
252 \newcommand\tempof[1]{\ensuremath{t_{#1}}}
253 \let\tempOf=\tempof
254 \let\slotOf=\slotof
255
256 \makeatletter
257 \newcommand{\nrmono}[1]{%
258 {\@tempdima = \fontdimen2\font\relax
259 \texttt{\spaceskip = 1.1\@tempdima #1}}}
260 \makeatother
261
262 \usepackage{times} % denser fonts
263 \renewcommand{\ttdefault}{aett} % \texttt that goes better with times fonts
264 \usepackage{enumerate}
265 \usepackage{url}
266 \usepackage{graphicx}
267 \usepackage{natbib} % redundant for Simon
268 \bibpunct();A{},
269 \let\cite\citep
270 \let\citeyearnopar=\citeyear
271 \let\citeyear=\citeyearpar
272
273 \usepackage[ps2pdf,bookmarksopen,breaklinks,pdftitle=dataflow-made-simple]{hyperref}
274 \usepackage{breakurl} % enables \burl
275
276 \newcommand\naive{na\"\i ve}
277 \newcommand\naively{na\"\i vely}
278 \newcommand\Naive{Na\"\i ve}
279
280 \usepackage{amsfonts}
281 \newcommand\naturals{\ensuremath{\mathbb{N}}}
282 \newcommand\true{\ensuremath{\mathbf{true}}}
283 \newcommand\implies{\supseteq} % could use \Rightarrow?
284
285 \newcommand\PAL{\mbox{C{\texttt{-{}-}}}}
286 \newcommand\high[1]{\mbox{\fboxsep=1pt \smash{\fbox{\vrule height 6pt
287 depth 0pt width 0pt \leavevmode \kern 1pt #1}}}}
288
289 \usepackage{tabularx}
290
291 %%
292 %% 2009/05/10: removed 'float' package because it breaks multiple
293 %% \caption's per {figure} environment. ---NR
294 %%
295 %% % Put figures in boxes --- WHY??? --NR
296 %% \usepackage{float}
297 %% \floatstyle{boxed}
298 %% \restylefloat{figure}
299 %% \restylefloat{table}
300
301
302
303 % ON LINE THREE, set \noauthornotestrue to suppress notes (or not)
304
305 %\newcommand{\qed}{QED}
306 \ifnotesinmargin
307 \long\def\authornote#1{%
308 \ifvmode
309 \marginpar{\raggedright\hbadness=10000
310 \parindent=8pt \parskip=2pt
311 \def\baselinestretch{0.8}\tiny
312 \itshape\noindent #1\par}%
313 \else
314 \unskip\raisebox{-3.5pt}{\rlap{$\scriptstyle\diamond$}}%
315 \marginpar{\raggedright\hbadness=10000
316 \parindent=8pt \parskip=2pt
317 \def\baselinestretch{0.8}\tiny
318 \itshape\noindent #1\par}%
319 \fi}
320 \else
321 % Simon: please set \notesinmarginfalse on the first line
322 \long\def\authornote#1{{\em #1\/}}
323 \fi
324 \ifnoauthornotes
325 \def\authornote#1{\unskip\relax}
326 \fi
327
328 \long\def\newtext#1{{\mbox{$\clubsuit$}}{\slshape\ignorespaces#1}{\mbox{$\clubsuit$}}}
329 \newenvironment{ntext}{\mbox{$\clubsuit$}\slshape\ignorespaces}{\mbox{$\clubsuit$}}
330
331 \newcommand{\simon}[1]{\authornote{SLPJ: #1}}
332 \newcommand{\norman}[1]{\authornote{NR: #1}}
333 \let\remark\norman
334 \def\finalremark#1{\relax}
335 % \let \finalremark \remark % uncomment after submission
336 \newcommand{\john}[1]{\authornote{JD: #1}}
337 \newcommand{\todo}[1]{\textbf{To~do:} \emph{#1}}
338 \newcommand\delendum[1]{\relax\ifvmode\else\unskip\fi\relax}
339
340 \newcommand\secref[1]{Section~\ref{sec:#1}}
341 \newcommand\secreftwo[2]{Sections \ref{sec:#1}~and~\ref{sec:#2}}
342 \newcommand\seclabel[1]{\label{sec:#1}}
343
344 \newcommand\figref[1]{Figure~\ref{fig:#1}}
345 \newcommand\figreftwo[2]{Figures \ref{fig:#1}~and~\ref{fig:#2}}
346 \newcommand\figlabel[1]{\label{fig:#1}}
347
348 \newcommand\tabref[1]{Table~\ref{tab:#1}}
349 \newcommand\tablabel[1]{\label{tab:#1}}
350
351
352 \newcommand{\CPS}{\textbf{StkMan}} % Not sure what to call it.
353
354
355 \usepackage{code} % At-sign notation
356
357 \IfFileExists{timestamp.tex}{\input{timestamp}}{}
358 \iftimestamp
359 \preprintfooter{\mdfivestamp}
360 \fi
361
362 \hyphenation{there-by}
363
364 \renewcommand{\floatpagefraction}{0.9} % must be less than \topfraction
365 \renewcommand{\topfraction}{0.95}
366 \renewcommand{\textfraction}{0.05}
367
368 \begin{document}
369 %\title{\ourlib: Dataflow Optimization Made Simple}
370 \title{\ourlib: A Modular, Reusable Library for\\ Dataflow Analysis and Transformation}
371 %\title{Implementing Dataflow Analysis and Optimization by Lifting Node Functions to Basic Blocks and Control-Flow Graphs}
372 %\subtitle{Programming pearl}
373
374 \titlebanner{\textsf{\mdseries\itshape
375 Under consideration for publication in the ACM Haskell Symposium.
376 Comments
377 are welcome; please identify this version by the words
378 \textbf{\mdfivewords}.
379 }}
380
381 \conferenceinfo{Haskell'10,} {September 30, 2010, Baltimore, Maryland, USA.}
382 \CopyrightYear{2010}
383 \copyrightdata{978-1-4503-0252-4/10/09}
384
385 \ifnoauthornotes
386 \makeatletter
387 \let\HyPsd@Warning=
388 \@gobble
389 \makeatother
390 \fi
391
392 % João
393
394
395 \authorinfo{Norman Ramsey}{Tufts University}{nr@cs.tufts.edu}
396 \authorinfo{Jo\~ao Dias}{Tufts University}{dias@cs.tufts.edu}
397 \authorinfo{Simon Peyton Jones}{Microsoft Research}{simonpj@microsoft.com}
398
399
400 \maketitle
401
402 \begin{abstract}
403 \iffalse % A vote for Simon's abstract
404 \remark{I have replaced a good abstract of the POPL submission with a
405 bad abstract of \emph{this} submission.}
406 We present \ourlib, a Haskell library that makes it easy for a
407 compiler writer
408 to implement program transformations based on dataflow analyses.
409 A~client of \ourlib\ defines a representation of
410 logical assertions,
411 a transfer function that computes outgoing assertions from incoming
412 assertions,
413 and a rewrite function that improves code when improvements are
414 justified by the assertions.
415 \ourlib\ does the actual analysis and transformation.
416
\ourlib\ implements state-of-the-art algorithms:
418 Lerner, Grove, and Chambers's
419 \citeyearpar{lerner-grove-chambers:2002}
420 composition of simple analyses and transformations, which achieves
421 the same precision as complex, handwritten
422 ``super-analyses;''
423 and Whalley's \citeyearpar{whalley:isolation} dynamic technique for
424 isolating bugs in a client's code.
425 \ourlib's implementation is unique in that unlike previous
426 implementations,
427 it carefully separates the tricky
428 elements of each of these algorithms, so that they can be examined and
429 understood independently.
430
431
432 \simon{Here is an alternative abstract based on the four-sentence model.}
433 \remark{Four-sentence model? You must teach me\ldots}
434 \fi
435 Dataflow analysis and transformation of control-flow graphs is
436 pervasive in optimizing compilers, but it is typically tightly
437 interwoven with the details of a \emph{particular} compiler.
438 We~describe \ourlib{}, a reusable Haskell library that makes it
439 unusually easy to define new analyses and
440 transformations for \emph{any} compiler.
441 \ourlib's interface is modular and polymorphic,
442 and it offers unusually strong static guarantees.
443 The implementation encapsulates
444 state-of-the-art algorithms (interleaved analysis and rewriting,
445 dynamic error isolation), and it cleanly separates their tricky elements
446 so that they can be understood independently.
447 %
448 %\ourlib\ will be the workhorse of a new
449 %back end for the Glasgow Haskell Compiler (version~6.14, forthcoming).
450
451 %% \emph{Reviewers:} code examples are indexed at {\small\url{http://bit.ly/jkr3K}}
452 %%% Source: http://www.cs.tufts.edu/~nr/drop/popl-index.pdf
453 \end{abstract}
454
455 \makeatactive % Enable @foo@ notation
456
457 \section{Introduction}
458
459 A mature optimizing compiler for an imperative language includes many
460 analyses, the results of which justify the optimizer's
461 code-improving transformations.
462 Many of the most important analyses and transformations---constant
463 propagation, live-variable analysis, inlining, sinking of loads,
464 and so on---should be regarded as particular cases of
465 a single general problem: \emph{dataflow analysis and optimization}.
466 %% \remark{I do not feel compelled to cite Muchnick (or anyone else) here}
467 Dataflow analysis is over thirty years old,
468 but a recent, seminal paper by \citet{lerner-grove-chambers:2002} goes further,
469 describing a powerful but subtle way to
470 \emph{interleave} analysis and transformation so that each
471 piggybacks on the other.
472
473 Because optimizations based on dataflow analysis
474 share a common intellectual framework, and because that framework is
subtle, it is tempting to
476 try to build a single reusable library that embodies the
477 subtle ideas, while
478 making it easy for clients to instantiate the library for different
479 situations.
480 % Tempting, but difficult.
481 Although such libraries exist, as we discuss
482 in \secref{related}, they have complex APIs and implementations,
483 and none implements the Lerner/Grove/Chambers technique.
484
485 In this paper we present \ourlib{} (short for ``higher-order
486 optimization library''), a new Haskell library for dataflow analysis and
487 optimization. It has the following distinctive characteristics:
488
489 \begin{itemize}
490 \item
491 \ourlib\ is purely functional.
492 Although pure functional languages are not obviously suited to writing
493 standard algorithms that
494 manipulate control-flow graphs,
495 the pure functional code is actually easier to write, and far easier
496 to write correctly, than code that is mostly functional but uses a mutable
497 representation of graphs
498 \cite{ramsey-dias:applicative-flow-graph}.
499 When analysis and rewriting
500 are interleaved, so that rewriting must be done \emph{speculatively},
501 without knowing whether
502 the result of the rewrite will be retained or discarded,
503 the benefit of a purely functional style is intensified.
504
505 \item
506 \ourlib\ is polymorphic. Just as a list library is
507 polymorphic in the list elements, so is \ourlib{} polymorphic, both in
508 the nodes that inhabit graphs and in the dataflow facts that
509 analyses compute over these graphs (\secref{using-hoopl}).
510
511 \item
512 The paper by Lerner, Grove, and Chambers is inspiring but abstract.
513 We articulate their ideas in a concrete, simple API,
514 which hides
515 a subtle implementation
516 (\secreftwo{graph-rep}{using-hoopl}).
517 You provide a representation for assertions,
518 a transfer function that transforms assertions across a node,
519 and a rewrite function that uses an assertion to
520 justify rewriting a node.
521 \ourlib\ ``lifts'' these node-level functions to work over
522 control-flow graphs, solves recursion equations,
523 and interleaves rewriting with analysis.
524 Designing good APIs is surprisingly
525 hard; we have been through over a dozen significantly different
526 iterations, and we offer our API as a contribution.
527
528 \item
529 Because the client
530 can perform very local reasoning (``@y@ is live before
531 @x:=y+2@''),
532 \seclabel{liveness-mention}
533 analyses and transformations built on \ourlib\
534 are small, simple, and easy to get right.
535 Moreover, \ourlib\ helps you write correct optimizations:
536 it~statically rules out transformations that violate invariants
of the control-flow graph (\secreftwo{graph-rep}{rewrites}),
538 and dynamically it can help find the first transformation that introduces a fault
539 in a test program (\secref{fuel}).
540 \finalremark{SLPJ: I wanted to write more about open/closed,
541 but I like this sentence with its claim to both static and dynamic assistance,
542 and maybe the open/closed story is hard to understand here.}
543
544 % \item \ourlib{} makes use of GADTS and type functions to offer unusually
545 % strong static guarantees. In particular, nodes, basic blocks, and
546 % graphs are all statically typed by their open or closedness on entry, and
547 % their open or closedness on exit (\secref{graph-rep}). For example, an add instruction is
548 % open on entry and exit, while a branch is open on entry and closed on exit.
549 % Using these types we can statically guarantee that, say, an add instruction
550 % is rewritten to a graph that is also open at both entry and exit; and
551 % that the user cannot construct a block where an add instruction follows an
552 % unconditional branch. We know of no other system that offers
553 % static guarantees this strong.
554
555 \item \ourlib{} implements subtle algorithms, including
556 (a)~interleaved analysis and rewriting, (b)~speculative rewriting,
557 (c)~computing fixed points, and (d)~dynamic fault isolation.
558 Previous implementations of these algorithms---including three of our
559 own---are complicated and hard to understand, because the tricky pieces
560 are implemented all together, inseparably.
561 In~this paper, each tricky piece is handled in just
562 one place, separate from all the others (\secref{engine}).
563 We~emphasize this implementation as an object of interest in
564 its own right.
565 \end{itemize}
566 Our work bridges the gap between abstract, theoretical presentations
567 and actual compilers.
568 \ourlib{} is available from
569 \burl{http://ghc.cs.tufts.edu/hoopl} and also from Hackage (version~3.8.3.0).
570 One of \hoopl's clients
571 is the Glasgow Haskell Compiler, which uses \hoopl\ to optimize
572 imperative code in GHC's back end.
573
574 The API for \ourlib{} requires
575 relatively sophisticated aspects of Haskell's type system, such
576 as higher-rank polymorphism, GADTs, and type functions.
577 \ourlib{} therefore may also serve as a case study in the utility
578 of these features.
579
580
581 \section{Dataflow analysis {\&} transformation by \texorpdfstring{\rlap{example}}{example}}
582 \seclabel{overview}
583 \seclabel{constant-propagation}
584 \seclabel{example:transforms}
585 \seclabel{example:xforms}
586
587 %% We begin by setting the scene, introducing some vocabulary, and
588 %% showing a small motivating example.
589 A control-flow graph, perhaps representing the body of a procedure,
590 is a collection of \emph{basic blocks}---or just ``blocks.''
591 Each block is a sequence of instructions,
592 beginning with a label and ending with a
593 control-transfer instruction that branches to other blocks.
594 % Each block has a label at the beginning,
595 % a sequence of
596 % -- each of which has a label at the
597 % beginning. Each block may branch to other blocks with arbitrarily
598 % complex control flow.
599 The goal of dataflow optimization is to compute valid
600 \emph{assertions} (or \emph{dataflow facts}),
601 then use those assertions to justify code-improving
602 transformations (or \emph{rewrites}) on a \emph{control-flow graph}.
603
604 As~a concrete example, we consider constant propagation with constant
605 folding.
606 On the left we have a basic block; in the middle we have
607 facts that hold between statements (or \emph{nodes})
608 in the block; and at
609 the right we have the result of transforming the block
610 based on the assertions:
\begin{verbatim}
    Before         Facts           After
 ------------{}-------------
 x := 3+4                    x := 7
 ----------{x=7}------------
 z := x>5                    z := True
 -------{x=7, z=True}-------
 if z                        goto L1
   then goto L1
   else goto L2
\end{verbatim}
622 Constant propagation works
623 from top to bottom. We start with the empty fact.
Given the empty fact and the node @x:=3+4@, can we make a (constant-folding)
625 transformation?
626 Yes: we can replace the node with @x:=7@.
627 Now, given this transformed node,
628 and the original fact, what fact flows out of the bottom of
629 the transformed node?
630 The~fact \{@x=7@\}.
631 Given the fact \{@x=7@\} and the node @z:=x>5@, can we make a
632 transformation? Yes: constant propagation can replace the node with @z:=7>5@.
633 Now, can we make another transformation? Yes: constant folding can
634 replace the node with @z:=True@.
635 The~process continues to the end of the block, where we
636 can replace the conditional branch with an unconditional one, @goto L1@.
637
638 The example above is simple because the program has only straightline code;
639 when programs have loops, dataflow analysis gets more complicated.
640 For example,
641 consider the following graph,
642 where we assume @L1@ is the entry point:
643 \begin{verbatim}
644 L1: x=3; y=4; if z then goto L2 else goto L3
645 L2: x=7; goto L3
646 L3: ...
647 \end{verbatim}
648 Because control flows to @L3@ from two places (@L1@~and~@L2@),
649 we must \emph{join} the facts coming from those two places.
650 All paths to @L3@ produce the fact @y=@$4$,
651 so we can conclude that this fact holds at~@L3@.
But depending on the path to~@L3@, @x@~may have different
653 values, so we conclude ``@x=@$\top$'',
654 meaning that there is no single value held by~@x@ at~@L3@.
655 %\footnote{
656 %In this example @x@ really does vary at @L3@, but in general
657 %the analysis might be conservative.}
658 The final result of joining the dataflow facts that flow to~@L3@
659 is the new fact $\mathord{@x=@\top\!} \land \mathord{@y=4@} \land \mathord{@z=@\top}$.
660
661 \seclabel{const-prop-example}
662
663 \paragraph{Forwards and backwards.}
664 Constant propagation works \emph{forwards}, and a fact is often an
665 assertion about the program state (such as ``variable~@x@ holds value~@7@'').
666 Some useful analyses work \emph{backwards}.
667 A~prime example is live-variable analysis, where a fact takes the~form
668 ``variable @x@ is live'' and is an assertion about the
669 \emph{continuation} of a program point. For example, the fact ``@x@~is
670 live'' at a program point P is an assertion that @x@ is used on some program
671 path starting at \mbox{P}. % TeXbook, exercise 12.6
672 The accompanying transformation is called dead-code elimination;
673 if @x@~is not live, this transformation
674 replaces the node @x:=e@ with a no-op.
675
676 \seclabel{simple-tx}
677 \paragraph{Interleaved transformation and analysis.}
678 Our first example \emph{interleaves} transformation and analysis.
679 Interleaving makes it far easier to write effective analyses.
680 If, instead, we \emph{first} analyzed the block
681 and \emph{then} transformed it, the analysis would have to ``predict''
682 the transformations.
683 For example, given the incoming fact \{@x=7@\}
684 and the instruction @z:=x>5@,
685 a pure analysis could produce the outgoing fact
686 \{@x=7@, @z=True@\} by simplifying @x>5@ to @True@.
687 But the subsequent transformation must perform
688 \emph{exactly the same simplification} when it transforms the instruction to @z:=True@!
689 If instead we \emph{first} rewrite the node to @z:=True@,
690 and \emph{then} apply the transfer function to the new node,
691 the transfer function becomes laughably simple: it merely has to see if the
right-hand side is a constant (you can see actual code in \secref{const-prop-client}).
693
694 Another example is the interleaving of liveness analysis and dead-code elimination.
695 As mentioned in \secref{liveness-mention},
696 it is sufficient for the analysis to~say ``@y@ is live before @x:=y+2@''.
697 It is not necessary to have the more complex rule
698 ``if @x@~is live after @x:=y+2@ then @y@ is live before~it,''
699 because if @x@~is \emph{not} live after @x:=y+2@,
700 the assignment @x:=y+2@ will be eliminated.
701 If~there are a number of interacting
702 analyses and/or transformations,
703 the benefit of interleaving them is even more compelling; for more
704 substantial examples, consult \citet{lerner-grove-chambers:2002}.
705 \newtext{
706 But the compelling benefits come at a cost:
707 in~programs that have loops, it is impossible to analyze every block
708 after all its predecessors, and it is possible that the fact
709 justifying a program transformation in a block~@L@ may be invalidated
710 by later analysis of one of @L@'s predecessors.
For~this reason, transformations must be
712 \emph{speculative}.
713 \hoopl~manages speculation on the client's behalf (\secref{speculative-transform}).
714 }
715
716
717
718 \section{Representing control-flow graphs} \seclabel{graph-rep}
719
720 \ourlib{} is a library that makes it easy to define dataflow analyses,
721 and transformations driven by these analyses, on control-flow graphs.
722 Graphs are composed from smaller units, which we discuss from the
723 bottom up:\finalremark{something about replacement graphs?}
724 \begin{itemize}
725 \item A \emph{node} is defined by \ourlib's client;
726 \ourlib{} knows nothing about the representation of nodes (\secref{nodes}).
727 \item A basic \emph{block} is a sequence of nodes (\secref{blocks}).
728 \item Control-flow \emph{edges} connect nodes (\secref{edges}).
729 \item A \emph{graph} is an arbitrarily complicated control-flow graph,
730 composed from basic blocks (\secref{graphs}).
731 \end{itemize}
732
733 \subsection{Shapes: Open and closed}
734
735 Nodes, blocks, and graphs share important properties in common.
736 In particular, each is \emph{open or closed on entry}
737 and \emph{open or closed on exit}.
738 An \emph{open} point is one at which control may implicitly ``fall through;''
739 to transfer control at a \emph{closed} point requires an explicit
740 control-transfer instruction to a named label.
741 For example,
742 \begin{itemize}
743 \item A shift-left instruction is open on entry (because control can fall into it
744 from the preceding instruction), and open on exit (because control falls through
745 to the next instruction).
746 \item An unconditional branch is open on entry, but closed on exit (because
747 control cannot fall through to the next instruction).
748 \item A label is closed on entry (because in \ourlib{} we do not allow
749 control to fall through into a branch target), but open on exit.
750 \item
751 A~function call should be closed on exit,
752 because control can flow from a call site to multiple points:
753 for example, a~return continuation or an exception handler.
754 (And~after optimization, distinct call sites may share a return
755 continuation.)
756 \end{itemize}
757 % This taxonomy enables \ourlib{} to enforce invariants:
758 % only nodes closed on entry can be the targets of branches, and only nodes closed
759 % on exits can transfer control (see also \secref{nonlocal-class}).
760 % As~a consequence, all control transfers originate at control-transfer
761 % instructions and terminated at labels; this invariant dramatically
762 % simplifies analysis and transformation.
763 These examples concern nodes, but the same classification applies
to blocks and graphs. For example, the block
765 \begin{code}
766 x:=7; y:=x+2; goto L
767 \end{code}
768 is open on entry and closed on exit.
769 This is the block's \emph{shape}, which we often abbreviate
770 ``open/closed;''
771 we may refer to an ``open/closed block.''
772
773 The shape of a thing determines that thing's control-flow properties.
774 In particular, whenever E is a node, block, or graph,
775 % : \simon{Removed the claim about a unique entry point.}
776 \begin{itemize}
777 \item
778 If E is open on entry, it has a unique predecessor;
779 if it is closed, it may have arbitrarily many predecessors---or none.
780 \item
781 If E is open on exit, it has a unique successor;
782 if it is closed, it may have arbitrarily many successors---or none.
783 \end{itemize}
784 %%%%
785 %%%%
786 %%%% % \item Regardless of whether E is open or closed,
787 %%%% % it has a unique entry point where execution begins.
788 %%%% \item If E is closed on exit, control leaves \emph{only}
789 %%%% by explicit branches from closed-on-exit nodes.
790 %%%% \item If E is open on exit, control \emph{may} leave E
791 %%%% by ``falling through'' from a distinguished exit point.
792 %%%% \remark{If E is a node or block, control \emph{only} leaves E by
793 %%%% falling through, but this isn't so for a graph. Example: a body of a
794 %%%% loop contains a \texttt{break} statement} \simon{I don't understand.
795 %%%% A break statement would have to be translated as a branch, no?
796 %%%% Can you give a an example? I claim that control only leaves an
797 %%%% open graph by falling through.}
798 %%%% \end{itemize}
799
800
801 \subsection{Nodes} \seclabel{nodes}
802
803 The primitive constituents of a \ourlib{} control-flow graph are
804 \emph{nodes}, which are defined by the client.
805 Typically, a node might represent a machine instruction, such as an
806 assignment, a call, or a conditional branch.
807 But \ourlib{}'s graph representation is \emph{polymorphic in the node type},
808 so each client can define nodes as it likes.
809 Because they contain nodes defined by the client,
810 graphs can include arbitrary client-specified data, including
811 (say) method calls, C~statements, stack maps, or
812 whatever.
813
814
815 \begin{figure}
816 \begin{fuzzcode}{0.98pt}
817 data `Node e x where
818 Label :: Label -> Node C O
819 `Assign :: Var -> Expr -> Node O O
820 `Store :: Expr -> Expr -> Node O O
821 `Branch :: Label -> Node O C
822 `CondBranch :: Expr -> Label -> Label -> Node O C
823 ... more constructors ...
824 \end{fuzzcode}
825 \caption{A typical node type as it might be defined by a client}
826 \figlabel{cmm-node}
827 \end{figure}
828
829 \ourlib{} knows \emph{at compile time} whether a node is open or
830 closed on entry and exit:
831 the type of a node has kind @*->*->*@, where the two type parameters
832 are type-level flags, one for entry and one for exit.
833 Such a type parameter may be instantiated only with type @O@~(for
834 open) or type~@C@ (for closed).
835
836 As an example,
837 \figref{cmm-node} shows a typical node type as it might be written by
838 one of \ourlib's {clients}.
839 The type parameters are written @e@ and @x@, for
840 entry and exit respectively.
841 The type is a generalized algebraic data type;
842 the syntax gives the type of each constructor.
843 %%% \cite{peyton-jones:unification-based-gadts}.
844 For example, constructor @Label@
845 takes a @Label@ and returns a node of type @Node C O@, where
846 the~``@C@'' says ``closed on entry'' and the~``@O@'' says ``open on exit''.
847 The types @Label@, @O@, and~@C@ are
848 defined by \ourlib{} (\figref{graph}).
849 As another example, constructor @Assign@ takes a variable and an expression, and it
850 returns a @Node@ open on both entry and exit; constructor @Store@ is
851 similar.
852 Types @`Var@ and @`Expr@ are private to the client, and
853 \ourlib{} knows nothing of them.
854 Finally, control-transfer nodes @Branch@ and @CondBranch@ are open on entry
855 and closed on exit.
856
857 Nodes closed on entry are the only targets of control transfers;
858 nodes open on entry and exit never perform control transfers;
859 and nodes closed on exit always perform control transfers\footnote{%
860 To obey these invariants,
861 a node for
862 a conditional-branch instruction, which typically either transfers control
863 \emph{or} falls through, must be represented as a two-target
864 conditional branch, with the fall-through path in a separate block.
865 This representation is standard \cite{appel:modern},
866 and it costs nothing in practice:
867 such code is easily sequentialized without superfluous branches.
868 %a late-stage code-layout pass can readily reconstruct efficient code.
869 }.
870 Because of the position each type of node occupies in a
871 basic block,
872 we~often call them \emph{first}, \emph{middle}, and \emph{last} nodes
873 respectively.
874
875 \subsection{Blocks} \seclabel{blocks}
876
877 \ourlib\ combines the client's nodes into
878 blocks and graphs, which, unlike the nodes, are defined by \ourlib{}
879 (\figref{graph}).
880 A~@Block@ is parameterized over the node type~@n@
881 as well as over the same flag types that make it open or closed at
882 entry and exit.
883
884 The @BFirst@, @BMiddle@, and @BLast@ constructors create one-node
885 blocks.
886 Each of these constructors is polymorphic in the node's \emph{representation}
887 but monomorphic in its \emph{shape}.
888 Why not use a single constructor
889 of type \mbox{@n e x -> Block n e x@}, which would be polymorphic in a
890 node's representation \emph{and}
891 shape?
892 Because by making the shape known statically, we simplify the
893 implementation of analysis and transformation in
894 \secref{dfengine}.
895 \finalremark{SLPJ: Do we intend that clients
896 can see the representation of blocks?
897 Answer: At present the representation is partially exposed, and we
898 have no principled answer to the question. ---NR
899 OK; but what do we tell our readers?
900 }
901
902 The @BCat@ constructor concatenates blocks in sequence.
903 It~makes sense to concatenate blocks only when control can fall
904 through from the first to the second; therefore,
905 two blocks may be concatenated {only} if each block is open at
906 the point of concatenation.
907 This restriction is enforced by the type of @BCat@, whose first
908 argument must be open on exit, and whose second argument must be open on entry.
909 It~is statically impossible, for example, to concatenate a @Branch@
910 immediately before an
911 @Assign@.
912 Indeed, the @Block@ type statically guarantees that any
913 closed/closed @Block@---which compiler writers normally
914 call a ``basic block''---consists of exactly one first node
915 (such as @Label@ in \figref{cmm-node}),
916 followed by zero or more middle nodes (@Assign@ or @Store@),
917 and terminated with exactly one
918 last node (@Branch@ or @CondBranch@).
919 Using GADTs to enforce these invariants is one of
920 \ourlib{}'s innovations.
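For example, a client could assemble the basic block
\mbox{@L1: x:=e; goto L2@} from the nodes of \figref{cmm-node} as
follows (a~sketch for illustration only; the labels, variable, and
expression are assumed to be supplied by the caller):
\begin{code}
`exampleBlock :: Label -> Var -> Expr -> Label
              -> Block Node C C
exampleBlock l1 x e l2 =
  BCat (BFirst (Label l1))
       (BCat (BMiddle (Assign x e))
             (BLast (Branch l2)))
\end{code}
The shapes work out: the inner @BCat@ joins an open/open block to an
open/closed one, the outer @BCat@ prepends the closed/open label, and
any other ordering of the nodes would be rejected by the type checker.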
921
922
923 \subsection{Control-flow edges and program points} \seclabel{edges}
924
925 In a block, a control-flow edge is implicit in every application of
926 the @BCat@ constructor.
927 An~implicit edge originates in a first node or a middle node and flows
928 to a middle node or a last node.
929 \emph{Explicit} edges between blocks are represented by the client;
930 an~explicit edge originates in a last node and flows to a (labelled)
931 first node.
932 \hoopl\ discovers explicit edges by using the @successors@ and
933 @entryLabel@ functions of the @NonLocal@ class.
934 Any~edge, whether implicit or explicit, is considered a program point,
935 and an analysis written using \hoopl\ computes a dataflow fact at each
936 such point.
937
938 \begin{figure}
939 \begin{fuzzcode}{0.98pt}
940 data `O -- Open
941 data `C -- Closed
942
943 data `Block n e x where
944 `BFirst :: n C O -> Block n C O
945 `BMiddle :: n O O -> Block n O O
946 `BLast :: n O C -> Block n O C
947 `BCat :: Block n e O -> Block n O x -> Block n e x
948
949 data `Graph n e x where
950 `GNil :: Graph n O O
951 `GUnit :: Block n O O -> Graph n O O
952 `GMany :: MaybeO e (Block n O C)
953 -> LabelMap (Block n C C)
954 -> MaybeO x (Block n C O)
955 -> Graph n e x
956
957 data `MaybeO ^ex t where
958 `JustO :: t -> MaybeO O t
959 `NothingO :: MaybeO C t
960
961 newtype `Label -- abstract type
962
963 class `NonLocal n where
964 `entryLabel :: n C x -> Label
965 `successors :: n e C -> [Label]
966 \end{fuzzcode}
967 \caption{The block and graph types defined by \ourlib}
968 \figlabel{graph} \figlabel{edges}
969 \end{figure}
970 % omit MaybeC :: * -> * -> *
971
972
973 \subsection{Graphs} \seclabel{graphs}
974
975 \ourlib{} composes blocks into graphs, which are also defined
976 in \figref{graph}.
977 Like @Block@, the data type @Graph@ is parameterized over
978 both nodes @n@ and its open/closed shape (@e@ and @x@).
979 It has three constructors. The first two
980 deal with the base cases of open/open graphs:
981 an empty graph is represented by @GNil@ while a single-block graph
982 is represented by @GUnit@.
983
984 More general graphs are represented by @GMany@, which has three
985 fields: an optional entry sequence, a body, and an optional exit
986 sequence.
987 \begin{itemize}
988 \item
989 If the graph is open on entry, it contains an \emph{entry sequence} of type
990 @Block n O C@.
991 We could represent this sequence as a value of type
992 @Maybe (Block n O C)@, but we can do better:
993 a~value of @Maybe@ type requires a \emph{dynamic} test,
994 but we know \emph{statically}, at compile time, that the sequence is present if and only
995 if the graph is open on entry.
996 We~express our compile-time knowledge by using the type
997 @MaybeO e (Block n O C)@, a type-indexed version of @Maybe@
998 which is also defined in \figref{graph}:
999 the type @MaybeO O a@ is isomorphic to~@a@, while
1000 the type @MaybeO C a@ is isomorphic to~@()@.
1001 \item
1002 The \emph{body} of the graph is a collection of closed/closed blocks.
1003 To~facilitate traversal of the graph, we represent a body as a finite
1004 map from label to block.
1005 \item
1006 The \emph{exit sequence} is dual to the entry sequence, and like the entry
1007 sequence, its presence or absence is deducible from the static type of the graph.
1008 \end{itemize}
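The~@MaybeO@ type is doing real work here.
For example (a~sketch; the function names are ours, and we elide how
the body's @LabelMap@ is built), an open/closed graph must carry an
entry sequence and no exit sequence, while a closed/closed graph
carries neither:
\begin{code}
`openOnEntry :: Block n O C -> LabelMap (Block n C C)
             -> Graph n O C
openOnEntry entry body = GMany (JustO entry) body NothingO

`closedGraph :: LabelMap (Block n C C) -> Graph n C C
closedGraph body = GMany NothingO body NothingO
\end{code}
Supplying @NothingO@ for the entry sequence of an open-on-entry graph
is a static type error, not a dynamic one.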
1009
1010 \seclabel{gSplice}
1011
1012 Graphs can be spliced together nicely; the cost is logarithmic in the
1013 number of closed/closed blocks.
1014 Unlike blocks, two graphs may be spliced together
1015 not only when they are both open
at the splice point but also
1017 when they are both closed---and not in the other two cases:
1018 \begingroup
1019 \newcommand\cnl{\\[-5pt]}%
1020 \begin{smallttcode}
`gSplice :: NonLocal n
         => Graph n e a -> Graph n a x -> Graph n e x
1022 gSplice GNil g2 = g2
1023 gSplice g1 GNil = g1\cnl
1024 gSplice (GUnit ^b1) (GUnit ^b2) = GUnit (b1 `BCat` b2)\cnl
1025 gSplice (GUnit b) (GMany (JustO e) bs x)
1026 = GMany (JustO (b `BCat` e)) bs x\cnl
1027 gSplice (GMany e ^bs (JustO x)) (GUnit b2)
1028 = GMany e bs (JustO (x `BCat` b2))\cnl
1029 gSplice (GMany e1 ^bs1 (JustO x1)) (GMany (JustO e2) ^bs2 x2)
1030 = GMany e1 (bs1 `mapUnion` (b `addBlock` bs2)) x2
1031 where b = x1 `BCat` e2\cnl
1032 gSplice (GMany e1 bs1 NothingO) (GMany NothingO bs2 x2)
1033 = GMany e1 (bs1 `mapUnion` bs2) x2
1034 \end{smallttcode}
1035 \endgroup
1036 % omit mapUnion :: LabelMap a -> LabelMap a -> LabelMap a
1037 % omit addBlock :: NonLocal n => Block n C C -> LabelMap (Block n C C) -> LabelMap (Block n C C)
1038 This definition illustrates the power of GADTs: the
1039 pattern matching is exhaustive, and all the open/closed invariants are
1040 statically checked. For example, consider the second-last equation for @gSplice@.
1041 Since the exit sequence of the first argument is @JustO x1@,
1042 we know that type parameter~@a@ is~@O@, and hence the entry sequence of the second
1043 argument must be @JustO e2@.
1044 Moreover, block~@x1@ must be
1045 closed/open, and block~@e2@ must be open/closed.
1046 We can therefore concatenate them
1047 with @BCat@ to produce a closed/closed block, which is
1048 added to the body of the result.
1049
1050 The representation of @Graph@s is exposed to \hoopl's clients.
1051 \finalremark{This \P\ is a candidate to be cut.}
1052 We~have carefully crafted the types so that if @BCat@
1053 is considered as an associative operator,
1054 every graph has a unique representation.\finalremark{So what? (Doumont)
1055 Need some words about how this is part of making the API simple for
1056 the client---I've added something to the end of the paragraph, but I'm
1057 not particularly thrilled.}
1058 %% \simon{Well, you were the one who was so keen on a unique representation!
1059 %% And since we have one, I think tis useful to say so. Lastly, the representation of
1060 %% singleton blocks is not entirely obvious.}
1061 %%%%
1062 %%%% An empty open/open graph is represented
1063 %%%% by @GNil@, while a closed/closed one is @gNilCC@:
1064 %%%% \par {\small
1065 %%%% \begin{code}
1066 %%%% gNilCC :: Graph C C
1067 %%%% gNilCC = GMany NothingO BodyEmpty NothingO
1068 %%%% \end{code}}
1069 %%%% The representation of a @Graph@ consisting of a single block~@b@
1070 %%%% depends on the shape of~@b@:\remark{Does anyone care?}
1071 %%%% \par{\small
1072 %%%% \begin{code}
1073 %%%% gUnitOO :: Block O O -> Graph O O
1074 %%%% gUnitOC :: Block O C -> Graph O C
1075 %%%% gUnitCO :: Block O C -> Graph C O
1076 %%%% gUnitCC :: Block O C -> Graph C C
1077 %%%% gUnitOO b = GUnit b
1078 %%%% gUnitOC b = GMany (JustO b) BodyEmpty NothingO
1079 %%%% gUnitCO b = GMany NothingO BodyEmpty (JustO b)
1080 %%%% gUnitCC b = GMany NothingO (BodyUnit b) NothingO
1081 %%%% \end{code}}
1082 %%%% Multi-block graphs are similar.
1083 %%%% From these definitions
1084 To guarantee uniqueness, @GUnit@ is restricted to open/open
1085 blocks.
1086 If~@GUnit@ were more polymorphic, there would be
1087 more than one way to represent some graphs, and it wouldn't be obvious
1088 to a client which representation to choose---or if the choice made a difference.
1089
1090
1091 \subsection{Labels and successors} \seclabel{nonlocal-class}
1092
1093
1094 Although \ourlib{} is polymorphic in the type of nodes,
1095 it~still needs to know how a node may transfer control from one
1096 block to another.
\hoopl\ also
needs to know the @Label@ of the first node in a block.
If \hoopl\ is polymorphic in the node type,
how can it know these things?
1101 \hoopl\ requires the client to make the node type an instance of
1102 \ourlib's @NonLocal@ type class, which is defined in \figref{edges}.
1103 The @entryLabel@ method takes a first node (one closed on entry, \secref{nodes})
1104 and returns its @Label@;
1105 the @successors@ method takes a last node (closed on exit) and returns
1106 the @Label@s to
1107 which it can transfer control.
1108 A~middle node, which is open on both entry and exit, transfers control
1109 only locally, to its successor within a basic block,
1110 so no corresponding interrogation function is needed.
1111
1112 %% A node type defined by a client must be an instance of @NonLocal@.
1113 In~\figref{cmm-node},
1114 the client's instance declaration for @Node@ would be
1115 \begin{code}
1116 instance NonLocal Node where
1117 entryLabel (Label l) = l
1118 successors (Branch b) = [b]
1119 successors (CondBranch e b1 b2) = [b1, b2]
1120 \end{code}
1121 Again, the pattern matching for both functions is exhaustive, and
1122 the compiler statically checks this fact.
1123 Here, @entryLabel@
1124 cannot be applied to an @Assign@ or @Branch@ node,
1125 and any attempt to define a case for @Assign@ or @Branch@ would result
1126 in a type error.
1127
1128 While the client provides this information about
1129 nodes, it is convenient for \ourlib\ to get the same information
1130 about blocks.
1131 Internally,
1132 \ourlib{} uses this instance declaration for the @Block@ type:
1133 \begin{code}
1134 instance NonLocal n => NonLocal (Block n) where
1135 entryLabel (BFirst n) = entryLabel n
1136 entryLabel (BCat b _) = entryLabel b
1137 successors (BLast n) = successors n
1138 successors (BCat _ b) = successors b
1139 \end{code}
1140 Because the functions @entryLabel@ and @successors@ are used to track control
1141 flow \emph{within} a graph, \ourlib\ does not need to ask for the
1142 entry label or successors of a @Graph@ itself.
1143 Indeed, @Graph@ \emph{cannot} be an instance of @NonLocal@, because even
1144 if a @Graph@ is closed on entry, it need not have a unique entry label.
1145 %
1146 % A slight infelicity is that we cannot make @Graph@ an instance of @NonLocal@,
1147 % because a graph closed on entry has no unique label. Fortunately, we never
1148 % need @Graph@s to be in @NonLocal@.
1149 %
1150 % ``Never apologize. Never confess to infelicity.'' ---SLPJ
1151
1152
1153
1154
1155 \begin{table}
1156 \centerline{%
\begin{tabular}{@{}>{\raggedright\arraybackslash}p{1.03in}%
    >{\scshape}c>{\scshape}c%
    >{\raggedright\arraybackslash}p{1.29in}@{}}
1160 &\multicolumn1{r}{\llap{\emph{Specified}}\hspace*{-0.3em}}&
1161 \multicolumn1{l}{\hspace*{-0.4em}\rlap{\emph{Implemented}}}&\\
1162 \multicolumn1{c}{\emph{Part of optimizer}}
1163 &\multicolumn1{c}{\emph{by}}&
1164 \multicolumn1{c}{\emph{by}}&
1165 \multicolumn1{c}{\emph{How many}}%
1166 \\[5pt]
1167 Control-flow graphs& Us & Us & One \\
1168 Nodes in a control-flow graph & You & You & One type per intermediate language \\[3pt]
1169 Dataflow fact~$F$ & You & You & One type per logic \\
1170 Lattice operations & Us & You & One set per logic \\[3pt]
1171 Transfer functions & Us & You & One per analysis \\
1172 Rewrite functions & Us & You & One per \rlap{transformation} \\[3pt]
1173 Solve-and-rewrite functions & Us & Us & Two (forward, backward) \\
1174 \end{tabular}%
1175 }
1176 \ifpagetuning\vspace*{0.5\baselineskip}\fi
1177 \caption{Parts of an optimizer built with \ourlib}
1178 \tablabel{parts}
1179 \end{table}
1180
1181
1182
1183 \section {Using \ourlib{} to analyze and transform graphs} \seclabel{using-hoopl}
1184 \seclabel{making-simple}
1185 \seclabel{create-analysis}
1186 \seclabel{api}
1187
1188 Now that we have graphs, how do we optimize them?
1189 \ourlib{} makes it easy to build a new dataflow analysis and optimization.
1190 A~client must supply the following pieces:
1191 \begin{itemize}
1192 \item A \emph{node type} (\secref{nodes}).
1193 \ourlib{} supplies the @Block@ and @Graph@ types
1194 that let the client build control-flow graphs out of nodes.
1195 \item A \emph{data type of facts} and some operations over
1196 those facts (\secref{facts}).
1197 Each analysis uses facts that are specific to that particular analysis,
1198 which \ourlib{} accommodates by being polymorphic in
1199 the fact type.
1200 \item A \emph{transfer function} that takes a node and returns a
1201 \emph{fact transformer}, which takes a fact flowing into the node and
1202 returns the transformed fact that flows out of the node (\secref{transfers}).
1203 \item A \emph{rewrite function} that takes a node and an input fact,
1204 and which returns either @Nothing@
1205 or @Just g@,
1206 where @g@~is a graph that should
1207 replace the node
1208 (\secreftwo{rewrites}{shallow-vs-deep}).
1209 The ability to replace a \emph{node} by a \emph{graph}
1210 % that may include internal control flow
1211 is crucial for many code-improving transformations.
1212 %We discuss the rewrite function
1213 %in Sections \ref{sec:rewrites} and \ref{sec:shallow-vs-deep}.
1214 \end{itemize}
1215 These requirements are summarized in \tabref{parts}.
1216 Because facts, transfer functions, and rewrite functions work closely together,
1217 we represent their combination
1218 as a single record of type @FwdPass@ (\figref{api-types}).
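For example, a forward pass might be assembled as follows
(a~sketch; @lat@, @transfer@, and @rewrite@ stand for client-supplied
values, and the constructor functions @mkFTransfer@ and @mkFRewrite@
appear in \figref{api-types}):
\begin{code}
`mkPass :: FuelMonad m
        => DataflowLattice f
        -> (forall e x . n e x -> f -> Fact x f)
        -> (forall e x . n e x -> f -> m (Maybe (Graph n e x)))
        -> FwdPass m n f
mkPass lat transfer rewrite =
  FwdPass { fp_lattice = lat
          , fp_transfer = mkFTransfer transfer
          , fp_rewrite = mkFRewrite rewrite }
\end{code}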
1219
1220
1221
1222 Given a node type~@n@ and a @FwdPass@,
1223 a client can ask \ourlib{}\ to analyze and rewrite a
1224 graph.
1225 \hoopl\ provides a fully polymorphic interface, but for purposes of
1226 exposition, we present a function that is specialized to a
1227 closed/closed graph:
1228 \begin{code}
1229 `analyzeAndRewriteFwdBody
1230 :: ( CkpointMonad m -- Roll back speculative actions
1231 , NonLocal n ) -- Extract non-local flow edges
1232 => FwdPass m n f -- Lattice, transfer, rewrite
1233 -> [Label] -- Entry point(s)
1234 -> Graph n C C -- Input graph
1235 -> FactBase f -- Input fact(s)
1236 -> m ( Graph n C C -- Result graph
1237 , FactBase f ) -- ... and its facts
1238 \end{code}
1239 Given a @FwdPass@ and a list of entry points, the
1240 analyze-and-rewrite function transforms a graph into
1241 an optimized graph.
1242 As its type shows, this function
1243 is polymorphic in the types of nodes~@n@ and facts~@f@;
1244 these types are determined by the client.
The type of the monad~@m@ is also determined by the client;
as the signature shows, @m@~must satisfy the @CkpointMonad@
constraint (\secref{ckpoint-monad}), and the rewrite functions impose
the further constraint @FuelMonad@ (\secref{fuel-monad}).
1248 %%
1249 %% offer at least the operations of the @FuelMonad@ class (\figref{api-types}),
1250 %% namely: the ability to allocate fresh labels (\secref{rewrites}) and to
1251 %% manage ``optimisation fuel'' (\secref{fuel-monad}).
1252
1253 As well as taking and returning a graph, the
1254 function also takes input facts (the @FactBase@) and produces output facts.
1255 A~@FactBase@ is simply a finite mapping from @Label@ to facts;
1256 if~a @Label@ is not in the domain of the @FactBase@, its fact is the
1257 bottom element of the lattice.
1258 For example, in our constant-propagation example from \secref{const-prop-example},
1259 if the graph
1260 represents the body of a procedure
1261 with parameters $x,y,z$, we would map the entry @Label@ to a fact
1262 $@x=@\top \land @y=@\top \land @z=@\top$, to specify that the procedure's
1263 parameters are not known to be constants.
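In~code, such an input might be built as follows (a~sketch;
@mkFactBase@ is a \hoopl\ function of type
@[(Label, f)] -> FactBase f@, and @top@ is whatever value the
client's fact type uses for~$\top$):
\begin{code}
`initFacts :: Label -> f -> FactBase f
initFacts entry top = mkFactBase [(entry, top)]
\end{code}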
1264 %%
1265 %% The
1266 %% output @FactBase@ maps each @Label@ in the body to its fact
1267
1268 The client's model of how @analyzeAndRewriteFwdBody@ works is as follows:
1269 \ourlib\ walks forward over each block in the graph.
1270 At each node, \ourlib\ applies the
1271 rewrite function to the node and the incoming fact. If the rewrite
1272 function returns @Nothing@, the node is retained as part of the output
1273 graph, the transfer function is used to compute the downstream fact,
1274 and \ourlib\ moves on to the next node.
But if the rewrite function returns @Just g@,
indicating that it wants to rewrite the node to the replacement graph~@g@, then
\ourlib\ recursively analyzes and rewrites~@g@ before moving on to
the next node.
1279 A~node following a rewritten node sees
1280 \emph{up-to-date} facts; that is, its input fact is computed by
1281 analyzing the replacement graph.
1282
1283 \newtext{The rewrite function may take any action that is justified by
1284 the incoming fact.
1285 If~further analysis turns out to invalidate the fact, \hoopl\ rolls
1286 back the action.
1287 \hoopl\ can roll back the graph easily enough because the graph is a
1288 purely functional data structure.
1289 But in addition to changing the graph,
1290 a~rewrite function may take action in an arbitrary monad defined by
1291 the client.\remark{This is the key bit that is not explained---why?
1292 One reasonable example might be that the client wishes to log how
1293 often a transformation is applied, or in which functions, or in a
1294 combined dataflow pass what transformations precede what others.
1295 Another example is that we don't want to mandate how fresh names are
1296 constructed, and we don't want to limit clients to a single supply of
1297 fresh names.
1298 On the {\PAL} project, we learned a painful lesson:
1299 no~matter how well designed the interface,
if it does not accommodate design choices already embodied in existing
1301 compilers, it won't be used.
1302 }
1303 To~enable \hoopl\ to roll back any actions in this monad, the monad
must satisfy the @CkpointMonad@ constraint and its associated algebraic law, as
1305 described in \secref{ckpoint-monad} below.
1306 }
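As~a concrete illustration (a~sketch of ours, not part of \hoopl),
a~pure state monad satisfies @CkpointMonad@ by taking the checkpoint
to be the entire state (@State@, @get@, and @put@ come from the
standard monad library):
\begin{code}
instance CkpointMonad (State s) where
  type Checkpoint (State s) = s
  checkpoint = get -- save the whole state
  restart = put -- roll back to the saved state
\end{code}
For this instance the algebraic law is immediate: taking a
checkpoint, running any computation, and then restarting leaves the
state exactly as it was.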
1307
1308 Below we flesh out the
1309 interface to @analyzeAndRewriteFwdBody@, leaving the implementation for
1310 \secref{engine}.
1311 %{\hfuzz=7.2pt\par}
1312
1313
1314 \begin{figure}
1315 \begin{code}
1316 data `FwdPass m n f
1317 = FwdPass { `fp_lattice :: DataflowLattice f
1318 , `fp_transfer :: FwdTransfer n f
1319 , `fp_rewrite :: FwdRewrite m n f }
1320
1321 ------- Lattice ----------
1322 data `DataflowLattice a = DataflowLattice
1323 { `fact_bot :: a
1324 , `fact_join :: OldFact a -> NewFact a
1325 -> (ChangeFlag, a) }
1326
1327 data `ChangeFlag = `NoChange | `SomeChange
1328 newtype `OldFact a = OldFact a
1329 newtype `NewFact a = NewFact a
1330
1331 ------- Transfers ----------
1332 newtype `FwdTransfer n f -- abstract type
1333 `mkFTransfer
1334 :: (forall e x . n e x -> f -> Fact x f)
1335 -> FwdTransfer n f
1336
1337 ------- Rewrites ----------
1338 newtype `FwdRewrite m n f -- abstract type
1339 `mkFRewrite :: FuelMonad m
1340 => (forall e x . n e x -> f -> m (Maybe (Graph n e x)))
1341 -> FwdRewrite m n f
1342 thenFwdRw :: FwdRewrite m n f -> FwdRewrite m n f
1343 -> FwdRewrite m n f
1344 iterFwdRw :: FwdRewrite m n f -> FwdRewrite m n f
1345 noFwdRw :: Monad m => FwdRewrite m n f
1346
1347 ------- Fact-like things, aka "fact(s)" -----
1348 type family `Fact x f :: *
1349 type instance Fact O f = f
1350 type instance Fact C f = FactBase f
1351
1352 ------- FactBase -------
1353 type `FactBase f = LabelMap f
1354 -- A finite mapping from Labels to facts f
1355
1356 ------- Monadic rewrites, speculatively ----
class Monad m => `CkpointMonad m where
  type `Checkpoint m
  `checkpoint :: m (Checkpoint m)
  `restart :: Checkpoint m -> m ()
1361
1362 ------- Optimization fuel ----
1363 type `Fuel = Int
1364 class Monad m => `FuelMonad m where
1365 `getFuel :: m Fuel
1366 `setFuel :: Fuel -> m ()
1367 \end{code}
1368 \caption{\ourlib{} API data types}
1369 \figlabel{api-types}
1370 \figlabel{lattice-type} \figlabel{lattice}
1371 \figlabel{transfers} \figlabel{rewrites}
1372 \end{figure}
1373 % omit mkFactBase :: [(Label, f)] -> FactBase f
1374 %%%% \simon{We previously renamed @fact\_join@ to @fact\_extend@ because it really
1375 %%%% is not a symmetrical join; we're extending an old fact with a new one.
1376 %%%% NR: Yes, but the asymmetry is now explicit in the \emph{type}, so it
1377 %%%% needn't also be reflected in the \emph{name}.}
1378
1379 \subsection{Dataflow lattices} \seclabel{lattices} \seclabel{facts}
1380
1381 For each analysis or transformation, the client must define a type
1382 of dataflow facts.
1383 A~dataflow fact often represents an assertion
1384 about a program point,
1385 but in general, dataflow analysis establishes properties of \emph{paths}:
1386 \begin{itemize}
1387 \item An assertion about all paths \emph{to} a program point is established
1388 by a \emph{forward analysis}. For example the assertion ``$@x@=@3@$'' at point P
1389 claims that variable @x@ holds value @3@ at P, regardless of the
1390 path by which P is reached.
1391 \item An assertion about all paths \emph{from} a program point is
1392 established by a \emph{backward analysis}. For example, the
1393 assertion ``@x@ is dead'' at point P claims that no path from P uses
1394 variable @x@.
1395 \end{itemize}
1396
1397 A~set of dataflow facts must form a lattice, and \ourlib{} must know
1398 (a)~the bottom element of the lattice and (b)~how to take
1399 the least upper bound (join) of two elements.
1400 To ensure that analysis
1401 terminates, it is enough if every fact has a finite number of
1402 distinct facts above it, so that repeated joins
1403 eventually reach a fixed point.
1404 %% \simon{There is no mention here of the @OldFact@ and @NewFact@ types.
1405 %% Shall we nuke them for the purposes of the paper?
1406 %% NR: They should definitely not be nuked; they're needed if the reader
1407 %% wants to understand the asymmetrical behavior of @fact\_join@.
1408 %% I'm~also mindful that this paper will be the primary explanation of
1409 %% \hoopl\ to its users, and I don't want to hide this part of the
1410 %% interface.
1411 %% As for mentions in the text,
1412 %% if you look carefully, you'll
1413 %% see that I've changed the subscripts in the text to ``old'' and
1414 %% ``new''.
1415 %% Together with the code, I believe that's enough to get the point across.
1416 %% SLPJ -- OK. I don't agree that it's enough, but we have more important fish to fry.
1417 %% }
1418
1419 In practice, joins are computed at labels.
1420 If~$f_{\mathit{old}}$ is the fact currently associated with a
1421 label~$L$,
1422 and if a transfer function propagates a new fact~$f_{\mathit{new}}$
1423 into label~$L$,
1424 the dataflow engine replaces $f_{\mathit{old}}$ with
1425 the join $f_{\mathit{old}} \join f_{\mathit{new}}$.
1426 Furthermore, the dataflow engine wants to~know
1427 \relax % the tie (~) above prevents a really wacky-looking paragraph
1428 if $f_{\mathit{old}} \join f_{\mathit{new}} = f_{\mathit{old}}$,
1429 because if not, the analysis has not reached a fixed point.
1430
1431 The bottom element and join operation of a lattice of facts of
1432 type~@f@ are stored in a value of type @DataflowLattice f@
1433 (\figref{lattice}).
1434 %%%% \simon{Can we shorten ``@DataflowLattice@'' to just
1435 %%%% ``@Lattice@''?} % bigger fish ---NR
1436 %%Such a value is part of the @FwdPass n f@ that is passed to
1437 %%@analyzeAndRewriteFwd@ above.
1438 As noted in the previous paragraph,
1439 \ourlib{} needs to know when the result of a join is equal
1440 to the old fact.
1441 Because this information can often be computed cheaply together
1442 with the join, \ourlib\ does not
1443 require a separate equality test on facts (which might be expensive).
1444 Instead, \ourlib\ requires that @fact_join@ return a @ChangeFlag@ as
1445 well as the least upper bound.
1446 The @ChangeFlag@ should be @NoChange@ if
1447 the result is the same as the old fact, and
1448 @SomeChange@ if the result differs.
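For example, a lattice of sets ordered by inclusion can compute the
flag by comparing sizes.
The following sketch is ours, not part of \hoopl's API, and analysis
over it terminates only if finitely many elements ever appear:
\begin{smallcode}
import qualified Data.Set as Set

`setLattice :: Ord a => DataflowLattice (Set.Set a)
setLattice = DataflowLattice
  { fact_bot  = Set.empty
  , fact_join = join }
 where join (OldFact old) (NewFact new) = (flag, result)
        where result = old `Set.union` new
              flag | Set.size result == Set.size old = NoChange
                   | otherwise                       = SomeChange
\end{smallcode}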
1449
1450 \seclabel{WithTop}
1451
1452 To help clients create lattices and join functions,
1453 \hoopl\ includes functions and constructors that can extend a type~@a@
1454 with top and bottom elements.
1455 In this paper, we use only type @`WithTop@, which comes with value
1456 constructors that have these types:
1457 \begin{code}
1458 `PElem :: a -> WithTop a
1459 `Top :: WithTop a
1460 \end{code}
1461 \hoopl\ provides combinators which make it easier to create join
1462 functions that use @Top@.
1463 The most useful is @extendJoinDomain@:
1464 \begin{smallcode}
1465 `extendJoinDomain
1466 :: (OldFact a -> NewFact a -> (ChangeFlag, WithTop a))
1467 -> (OldFact (WithTop a) -> NewFact (WithTop a)
1468 -> (ChangeFlag, WithTop a))
1469 \end{smallcode}
1470 A~client can write a join function that \emph{consumes} only facts of
1471 type~@a@, but may produce @Top@ (as well as a fact of type~@a@)---as
1472 in the example of \figref{const-prop} below.
1473 Calling @extendJoinDomain@ extends the client's function to a proper
1474 join function on the type @WithTop a@,
1475 and @extendJoinDomain@ makes sure that joins
1476 involving @Top@ obey the appropriate algebraic laws.
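For intuition, @extendJoinDomain@ behaves along these lines (a~sketch,
not necessarily \hoopl's exact code): joins involving @Top@ are
resolved here, and the client's function is consulted only when both
facts are @PElem@s.
\begin{smallcode}
extendJoinDomain fn (OldFact old) (NewFact new) =
  case (old, new) of
    (Top,     _)       -> (NoChange,   Top)
    (_,       Top)     -> (SomeChange, Top)
    (PElem o, PElem n) -> fn (OldFact o) (NewFact n)
\end{smallcode}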
1477
1478 \hoopl\ also provides a value constructor @`Bot@ and type constructors
1479 @`WithBot@ and @`WithTopAndBot@, along with similar functions.
1480 Constructors @Top@ and @Bot@ are polymorphic, so for example,
1481 @Top@~also has type @WithTopAndBot a@.
1482
1483
1484
1485 \subsection{The transfer function} \seclabel{transfers}
1486
1487 A~forward transfer function is presented with the dataflow fact
1488 coming
1489 into a node, and it computes dataflow fact(s) on the node's outgoing edge(s).
1490 In a forward analysis, the dataflow engine starts with the fact at the
1491 beginning of a block and applies the transfer function to successive
1492 nodes in that block until eventually the transfer function for the last node
1493 computes the facts that are propagated to the block's successors.
1494 For example, consider doing constant propagation (\secref{constant-propagation}) on
1495 the following graph, with entry at @L1@:
1496 \begin{code}
1497 L1: x=3; goto L2
1498 L2: y=x+4; x=x-1;
1499 if x>0 then goto L2 else return
1500 \end{code}
1501 Forward analysis starts with the bottom fact \{\} at every label.
1502 Analyzing @L1@ propagates this fact forward, by applying the transfer
1503 function successively to the nodes
1504 of @L1@, emerging with the fact \{@x=3@\} for @L2@.
1505 This new fact is joined with the existing (bottom) fact for @L2@.
1506 Now the analysis propagates @L2@'s fact forward, again using the transfer
1507 function, this time emerging with a new fact \mbox{\{@x=2@, @y=7@\}} for @L2@.
Again, the new fact is joined with the existing fact for @L2@,
giving \mbox{\{@x=Top@, @y=7@\}}, and the process
is iterated until the facts for each label reach a fixed point,
here \mbox{\{@x=Top@, @y=Top@\}} at @L2@.
1510
1511 %%% But wait! % Too breathless ---NR
1512 A~transfer function has an unusual sort of type:
1513 not quite a dependent type, but not a bog-standard polymorphic type either.
1514 The result type of the transfer function is \emph{indexed} by the shape (i.e.,
1515 the type) of the node argument:
1516 If the node is open on exit, the transfer function produces a single fact.
1517 But if the node is \emph{closed} on exit,
1518 the transfer function
1519 produces a collection of (@Label@,fact) pairs, one for each outgoing edge.
1520 %
1521 The indexing is expressed by Haskell's (recently added)
1522 \emph{indexed type families}.
1523 The relevant part of \ourlib's interface is given in \figref{transfers}.
1524 A~forward transfer function supplied by a client,
1525 which would be passed to @mkFTransfer@,
1526 is typically a function polymorphic in @e@ and @x@.
1527 It~takes a
1528 node of type \mbox{@n@ @e@ @x@}
1529 and it returns a \emph{fact transformer} of type
1530 @f -> Fact x f@.
1531 Type constructor @Fact@
1532 should be thought of as a type-level function: its signature is given in the
1533 @type family@ declaration, while its definition is given by two @type instance@
1534 declarations. The first declaration says that a @Fact O f@, which
1535 comes out of a node
1536 \emph{open} on exit, is just a fact~@f@.
1537 The second declaration says that a @Fact C f@, which comes out of a
1538 node \emph{closed} on exit, is a mapping from @Label@ to facts.
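To~make the indexing concrete, here is a hedged sketch with a
hypothetical node type; @Var@, @Expr@, and the helper
\mbox{@mkFactBase :: [(Label, f)] -> FactBase f@} are assumed for
illustration only.
\begin{smallcode}
data `Insn e x where
  ILabel  :: Label       -> Insn C O
  IAssign :: Var -> Expr -> Insn O O
  IBranch :: Label       -> Insn O C

-- A toy fact that counts assignments; only the shape
-- indexing of the result type matters here.
`countAssigns :: Insn e x -> Int -> Fact x Int
countAssigns (ILabel _)    k = k      -- Fact O Int = Int
countAssigns (IAssign _ _) k = k + 1  -- Fact O Int = Int
countAssigns (IBranch l)   k =        -- Fact C Int = FactBase Int
  mkFactBase [(l, k)]

`countTransfer :: FwdTransfer Insn Int
countTransfer = mkFTransfer countAssigns
\end{smallcode}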
1539
1540 %%
1541 %% \begin{code}
1542 %% `transfer_fn :: forall e x . n e x -> f -> Fact x f
1543 %% `node :: n e x
1544 %% \end{code}
1545 %% then @(transfer_fn node)@ is a fact transformer:
1546 %% \begin{code}
1547 %% transfer_fn node :: f -> Fact x f
1548 %% \end{code}
1549 %%
1550
1551
1552 \subsection{The rewrite function and the client's monad}
1553 \seclabel{rewrites}
1554 \seclabel{example-rewrites}
1555
1556
1557 We compute dataflow facts in order to enable code-improving
1558 transformations.
1559 In our constant-propagation example,
1560 the dataflow facts may enable us
1561 to simplify an expression by performing constant folding, or to
1562 turn a conditional branch into an unconditional one.
1563 Similarly, a liveness analysis may allow us to
1564 replace a dead assignment with a no-op.
1565
1566 A @FwdPass@ therefore includes a \emph{rewrite function}, whose
1567 type, @FwdRewrite@, is abstract (\figref{api-types}).
1568 A~programmer creating a rewrite function chooses the type of a node~@n@ and
1569 a dataflow fact~@f@.
1570 A~rewrite function might also want access to fresh
1571 names (e.g., to label new blocks) or to other state (e.g., a~mapping
1572 indicating which loops a block is a part of).
1573 So~that a rewrite function may have access to such state, \hoopl\
1574 requires that a programmer creating a rewrite function also choose a
1575 monad~@m@.\remark{Looking for better/more in this~\P}
1576 \newtext{So that \hoopl\ may roll back speculative rewrites, the monad
1577 must satisfy the constraint @CkpointMonad m@.}
1578 The programmer may write code that works with any such monad,
1579 may create a monad just for the client,
1580 or may use a monad supplied by \hoopl.
1581
1582 When these choices are made, the easy way to create a rewrite
1583 function is to call the function @mkFRewrite@ in \figref{api-types}.
1584 The client supplies a function that is specialized to a particular
1585 node, fact, and (possibly) monad, but is polymorphic in the
1586 \emph{shape} of the node to be rewritten.
1587 The function, which we will call~$r$, takes a node and a fact and returns a monadic
1588 computation, but what is the result of that computation?
1589 One might
1590 expect that the result should be a new node, but that is not enough:
1591 in~general, it must be possible for rewriting to result in a graph.
1592 For example,
1593 we might want to remove a node by returning the empty graph,
1594 or more ambitiously, we might want to replace a high-level operation
1595 with a tree of conditional branches or a loop, which would entail
1596 returning a graph containing new blocks with internal control flow.
1597
1598
1599 It~must also be possible for a rewrite function to decide to do nothing.
1600 The result of the monadic computation returned by~$r$ may therefore be
1601 @Nothing@, indicating that the node should
1602 not be rewritten,
1603 or $@Just@\;\ag$, indicating that the node should
1604 be replaced with~\ag: the replacement graph.
1605 %%The additional value $\rw$ tells \hoopl\ whether
1606 %%and how the replacement graph~$\ag$ should be analyzed and rewritten
1607 %%further;
1608 %%we explain~$\rw$ in \secref{shallow-vs-deep}.
1609
1610
1611 The type of @mkFRewrite@ in \figref{api-types} guarantees
1612 that
1613 the replacement graph $\ag$ has
1614 the \emph{same} open/closed shape as the node being rewritten.
1615 For example, a branch instruction can be replaced only by a graph
1616 closed on exit.
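As a tiny hedged example, here is a rewrite function that deletes
self-assignments by returning the empty graph.
It~assumes the @Insn@ type sketched in \secref{transfers}, an @Expr@
constructor \mbox{@Var :: Var -> Expr@}, an @Eq@ instance for @Var@,
and \hoopl's @emptyGraph :: Graph n O O@:
\begin{smallcode}
`nopElim :: forall m f . FuelMonad m => FwdRewrite m Insn f
nopElim = mkFRewrite rw
 where rw :: Insn e x -> f -> m (Maybe (Graph Insn e x))
       rw (IAssign x (Var y)) _
         | x == y  = return (Just emptyGraph)
       rw _ _      = return Nothing
\end{smallcode}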
1617 %% Moreover, because only an open/open graph can be
1618 %% empty---look at the type of @GNil@ in \figref{graph}---the type
1619 %% of @FwdRewrite@
1620 %% guarantees, at compile time, that no head of a block (closed/open)
1621 %% or tail of a block (open/closed) can ever be deleted by being
1622 %% rewritten to an empty graph.
1623
1624 \subsection{Shallow rewriting, deep rewriting, and rewriting combinators}
1625
1626 \seclabel{shallow-vs-deep}
1627
1628 When a node is rewritten, the replacement graph~$g$
1629 must itself be analyzed, and its nodes may be further rewritten.
1630 \hoopl\ can make a recursive call to
1631 @analyzeAndRewriteFwdBody@---but what @FwdPass@ should it use?
1632 There are two common situations:
1633 \begin{itemize}
1634 \item Sometimes we want to analyze and transform the replacement graph
1635 with an unmodified @FwdPass@, thereby further rewriting the replacement graph.
1636 This procedure is
1637 called \emph{deep rewriting}.
1638 When deep rewriting is used, the client's rewrite function must
1639 ensure that the graphs it produces are not rewritten indefinitely
1640 (\secref{correctness}).
1641 \item
1642 A client may want to analyze the
1643 replacement graph \emph{without} further rewriting.
1644 This procedure is called \emph{shallow rewriting}.
1645 \end{itemize}
1646 Deep rewriting is essential to achieve the full benefits of
1647 interleaved analysis and transformation
1648 \citep{lerner-grove-chambers:2002}.
1649 But shallow rewriting can be vital as well;
1650 for example, a backward dataflow pass that inserts
1651 a spill before a call must not rewrite the call again, lest it attempt
1652 to insert infinitely many spills.
1653
1654
1655
1656 An~innovation of \hoopl\ is to build the choice of shallow or deep
1657 rewriting into each rewrite function, through the use of combinators.
1658 A~function returned by @mkFRewrite@ is shallow by default, but it can
1659 be made deep by
1660 \begin{code}
1661 `iterFwdRw :: FwdRewrite m n f -> FwdRewrite m n f
1662 \end{code}
1663 Another useful combinator enables \hoopl\ to try any number of functions to
1664 rewrite a node, provided only that they all use the same fact:
1665 \verbatiminput{comb1}
1666 % defn thenFwdRw
1667 Rewrite function @r1 `thenFwdRw` r2@ first does the rewrites of~@r1@,
1668 then the rewrites of~@r2@.
1669 %%
1670 %%
1671 %% by using
1672 %% Rewrite functions are potentially more plentiful than transfer
1673 %% functions, because
1674 %% a~single dataflow fact might justify more than one kind of rewrite.
1675 %% \hoopl\ makes it easy for a client to combine multiple rewrite
1676 %% functions that use the same fact:
1677
1678 These combinators satisfy the algebraic law
1679 \begin{code}
1680 iterFwdRw r = r `thenFwdRw` iterFwdRw r
1681 \end{code}
1682 but unfortunately, this equation cannot be used to \emph{define}
1683 @iterFwdRw@:
1684 if~we used this equation to define @iterFwdRw@, then when @r@ declined
1685 to rewrite, @iterFwdRw r@ would diverge.
1686
1687
1688
1689 %%%% \hoopl\ provides
1690 %%%% a function that makes a shallow rewrite deep:\finalremark{algebraic
1691 %%%% law wanted!}
1692 %%%% \remark{Do we really care about @iterFwdRw@ or can we just drop it?}
1693 %%%% \verbatiminput{iterf}
1694 %%%% % defn iterFwdRw
1695
1696
1697
1698 \subsection{When the type of nodes is not known}
1699
1700 We note above (\secref{transfers}) that
1701 the type of the transfer function's result
1702 depends on the argument's shape on exit.
1703 It~is easy for a client to write a type-indexed transfer function,
1704 because the client defines the constructor and shape for each node.
1705 The client's transfer functions discriminate on the constructor
1706 and so can return a result that is indexed by each node's shape.
1707
1708 What if you want to write a transfer function that
1709 {does \emph{not} know the type of the node}?
1710 %
1711 For example, a dominator analysis need not scrutinize nodes;
1712 it needs to know only about
1713 labels and edges in the graph.
1714 Ideally, a dominator analysis would work
1715 with \emph{any} type of node~@n@,
1716 provided only that @n@~is an
1717 instance of the @NonLocal@ type class.
1718 But if we don't know
1719 the type of~@n@,
1720 we can't write a function of type
1721 @n e x -> f -> Fact x f@,
1722 because the only way to get the result type right is to
1723 scrutinize the constructors of~@n@.
1724
1725 \seclabel{triple-of-functions}
1726
1727 There is another way;
1728 instead of requiring a single function that is polymorphic in shape,
1729 \hoopl\ will also accept a triple of functions, each of which is
1730 polymorphic in the node's type but monomorphic in its shape:
1731 \begin{code}
1732 `mkFTransfer3 :: (n C O -> f -> Fact O f)
1733 -> (n O O -> f -> Fact O f)
1734 -> (n O C -> f -> Fact C f)
1735 -> FwdTransfer n f
1736 \end{code}
We have used this interface to write a number of functions that are
1738 polymorphic in the node type~@n@:
1739 \begin{itemize}
1740 \item
1741 A function that takes a @FwdTransfer@ and wraps it in logging code, so
1742 an analysis can be debugged by watching the facts flow through the
1743 nodes
1744 \item
1745 A pairing function that runs two passes interleaved, not sequentially,
1746 potentially producing better results than any sequence:
1747 \verbatiminput{pairf}
1748 % defn pairFwd
1749 \item
1750 An efficient dominator analysis
1751 in the style of
1752 \citet{cooper-harvey-kennedy:simple-dominance},
1753 whose transfer function is implemented
using only the
functions in the \texttt{NonLocal} type class (a sketch follows
this list)
1756 \end{itemize}
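To~give the flavor of the last item, here is a hedged sketch of a
dominator-style transfer function (not the shipped analysis), in
which @DomFact@ and
\mbox{@mkFactBase :: [(Label, f)] -> FactBase f@}
are illustrative assumptions, and the matching join (intersection of
paths, with @Bot@ as the bottom element) is not shown:
\begin{smallcode}
type `DomFact = WithBot [Label]
  -- labels on every path from the entry

`domTransfer :: NonLocal n => FwdTransfer n DomFact
domTransfer = mkFTransfer3 first middle last
 where first n f  = f `extend` entryLabel n
       middle _ f = f
       last n f   = mkFactBase [ (l, f) | l <- successors n ]
       extend Bot        l = PElem [l]
       extend (PElem ls) l = PElem (ls ++ [l])
\end{smallcode}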
1757
1758
1759 %% Or, you might want to write a combinator that
1760 %% computes the pairwise composition of any two analyses
1761 %% defined on any node type.
1762 %%
1763 %% You cannot write either of these examples
1764 %% using the polymorphic function described in \secref{transfers}
1765 %% because these examples require the ability
1766 %% to inspect the shape of the node.
1767 %%
1768 %%
1769 %% \subsection{Stuff cut from 4.2 that should disappear when the previous section is finished}
1770 %%
1771 %% So much for the interface.
1772 %% What about the implementation?
1773 %% The way GADTs work is that the compiler uses the value constructor for
1774 %% type \mbox{@n@ @e@ @x@} to determine whether @e@~and~@x@ are open or
1775 %% closed.
1776 %% But we want to write functions that are \emph{polymorphic} in the node
1777 %% type~@n@!
1778 %% Such functions include
1779 %% \begin{itemize}
1780 %% \item
1781 %% A function that takes a pair of @FwdTransfer@s for facts @f@~and~@f'@,
1782 %% and returns a single @FwdTransfer@ for the fact @(f, f')@
1783 %% \end{itemize}
1784 %% \simon{These bullets are utterly opaque. They belong in 4.5. The rest of this section
1785 %% is also very hard to understand without more explanation. See email. }
1786 %% Such functions may also be useful in \hoopl's \emph{clients}:
1787 %% % may need
1788 %% %functions that are polymorphic in the node type~@n@:
1789 %% \begin{itemize}
1790 %% \item
1791 %% A dominator analysis in the style of
1792 %% \citet{cooper-harvey-kennedy:simple-dominance} requires only the
1793 %% functions in the \texttt{NonLocal} type class;
1794 %% we have written such an analysis using transfer functions that are
1795 %% polymorphic in~@n@.
1796 %% \end{itemize}
1797 %% Because the mapping from
1798 %% value constructors to shape is different for each node type~@n@, transfer
1799 %% functions cannot be polymorphic in both
1800 %% the representation and the shape of nodes.
1801 %% Our implementation therefore sacrifices polymorphism in shape:
1802 %% internally, \hoopl\ represents
1803 %% a~@FwdTransfer n f@ as a \emph{triple} of functions,
1804 %% each polymorphic in~@n@ but monomorphic in shape:
1805 %% \begin{code}
1806 %% newtype FwdTransfer n f
1807 %% = FwdTransfers ( n C O -> f -> f
1808 %% , n O O -> f -> f
1809 %% , n O C -> f -> FactBase f
1810 %% )
1811 %% \end{code}
1812 %% \simon{I argue strongly that the implementation should be
1813 %% polymorphic too, using a shape classifier where necessary.
1814 %% Regardless of how this debate pans out, though, I think it's
1815 %% a bad idea to introduce triples here. They are simply confusing.}
1816 %% Such triples can easily be composed and wrapped without requiring a
1817 %% pattern match on the value constructor of an unknown node
1818 %% type.\remark{Need to refer to this junk in the conclusion}
1819 %% Working with triples is tedious, but only \hoopl\ itself is forced to
1820 %% work with triples; each client, which knows the node type, may supply
1821 %% a triple,
1822 %% but it typically supplies a single function polymorphic in the shape
1823 %% of the (known) node.
1824 %%
1825 %%
1826
1827 %% \subsection{Composing rewrite functions and dataflow passes} \seclabel{combinators}
1828 %%
1829 %% Requiring each rewrite to return a new rewrite function has another
1830 %% advantage, beyond controlling the shallow-vs-deep choice: it
1831 %% enables a variety of combinators over rewrite functions.
1832 %% \remark{This whole subsection needs to be redone in light of the new
1833 %% (triple-based) representation. It's not pretty any more.}
1834 %% For example, here is a function
1835 %% that combines two rewrite functions in sequence:
1836 %% \remark{This code must be improved}
1837 %% What a beautiful type @thenFwdRw@ has!
1838 %% \remark{And what an ugly implementation! Implementations must go.}
1839 %% It tries @rw1@, and if @rw1@
1840 %% declines to rewrite, it behaves like @rw2@. But if
1841 %% @rw1@ rewrites, returning a new rewriter @rw1a@, then the overall call also
1842 %% succeeds, returning a new rewrite function obtained by combining @rw1a@
1843 %% with @rw2@. (We cannot apply @rw1a@ or @rw2@
1844 %% directly to the replacement graph~@g@,
1845 %% because @r1@~returns a graph and @rw2@~expects a node.)
1846 %% The rewriter @noFwdRw@ is the identity of @thenFwdRw@.
1847 %% Finally, @thenFwdRw@ can
1848 %% combine a deep-rewriting function and a shallow-rewriting function,
1849 %% to produce a rewrite function that is a combination of deep and shallow.
1850 %% %%This is something that the Lerner/Grove/Chambers framework could not do,
1851 %% %%because there was a global shallow/deep flag.
1852 %% %% Our shortsightedness; Lerner/Grove/Chambers is deep only ---NR
1853 %%
1854 %%
1855 %% A shallow rewrite function can be made deep by iterating
1856 %% it:\remark{Algebraic law wanted!}
1857 %% If we have shallow rewrites $A$~and~$B$ then we can build $AB$,
1858 %% $A^*B$, $(AB)^*$,
1859 %% and so on: sequential composition is @thenFwdRw@ and the Kleene star
1860 %% is @iterFwdRw@.\remark{Do we still believe this claim?}
1861 %% \remark{We can't define @iterFwdRew@ in terms of @thenFwdRew@ because
1862 %% the second argument to @thenFwdRew@ would have to be the constant
1863 %% nothing function when applied but would have to be the original triple
1864 %% when passed to @thenFwdRew@ as the second argument (in the recursive
1865 %% call).}
1866 %%
1867 %%
1868 %% The combinators above operate on rewrite functions that share a common
1869 %% fact type and transfer function.
1870 %% It~can also be useful to combine entire dataflow passes that use
1871 %% different facts.
1872 %% We~invite you to write one such combinator, with type
1873 %% \begin{code}
1874 %% `pairFwd :: Monad m
1875 %% => FwdPass m n f1
1876 %% -> FwdPass m n f2
1877 %% -> FwdPass m n (f1,f2)
1878 %% \end{code}
1879 %% The two passes run interleaved, not sequentially, and each may help
1880 %% the other,
1881 %% yielding better results than running $A$~and then~$B$ or $B$~and then~$A$
1882 %% \citep{lerner-grove-chambers:2002}.
1883 %% %% call these passes. ``super-analyses;''
1884 %% %% in \hoopl, construction of super-analyses is
1885 %% %% particularly concrete.
1886
1887 \subsection{Example: Constant propagation and constant folding}
1888 \seclabel{const-prop-client}
1889
1890 % omit Binop :: Operator -> Expr -> Expr -> Expr
1891 % omit Add :: Operator
1892
1893 \begin{figure}
1894 \smallverbatiminput{cprop}
1895 % local node
1896 % defn ConstFact
1897 % defn constLattice
1898 % defn constFactAdd
1899 % defn varHasLit
1900 % local ft
1901 % defn constProp
1902 % local cp
1903 % local lookup
1904 % defn simplify
1905 % local simp
1906 % local s_node
1907 % local s_exp
1908 % defn constPropPass
1909 \caption{The client for constant propagation and constant folding\break (extracted automatically from code distributed with Hoopl)}
1910 \figlabel{const-prop}
1911 \ifpagetuning\vspace*{-1.0\baselineskip}\fi
1912 \end{figure}
1913
1914
1915 \figref{const-prop} shows client code for
1916 constant propagation and constant folding.
1917 For each variable, at each program point, the analysis concludes one
1918 of three facts:
1919 the variable holds a constant value of type~@Lit@,
1920 the variable might hold a non-constant value,
1921 or what the variable holds is unknown.
1922 We~represent these facts using a finite map from a variable to a
1923 fact of type @WithTop Lit@ (\secref{WithTop}).
1924 % Any one procedure has only
1925 % finitely many variables; only finitely many facts are computed at any
1926 % program point; and in this lattice any one fact can increase at most
1927 % twice. These properties ensure that the dataflow engine will reach a
1928 % fixed point.
1929 A~variable with a constant value maps to @Just (PElem k)@, where
1930 @k@~is the constant value;
1931 a variable with a non-constant value maps to @Just Top@;
1932 and a variable with an unknown value maps to @Nothing@ (it is not
1933 in the domain of the finite map).
1934
1935 % \afterpage{\clearpage}
1936
1937 The definition of the lattice (@constLattice@) is straightforward.
1938 The bottom element is an empty map (nothing is known about what
1939 any variable holds).
1940 %
1941 The join function is implemented with the help of combinators provided
1942 by \hoopl.
1943 The client writes a simple function, @constFactAdd@, which
1944 compares two values of type @Lit@ and returns a result of type
1945 @WithTop Lit@.
1946 The client uses @extendJoinDomain@ to lift @constFactAdd@ into a join
1947 function on @WithTop Lit@, then uses
1948 @stdMapJoin@ to lift \emph{that} join function up to the map
1949 containing facts for all variables.
1950 \finalremark{We really should work @stdMapJoin@ into the API somewhere}
1951
1952 % omit stdMapJoin :: Ord k => JoinFun v -> JoinFun (Map.Map k v)
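A~hedged sketch of that assembly follows; the definitive code is in
\figref{const-prop}, and we take as given
\mbox{@stdMapJoin :: Ord k => JoinFun v -> JoinFun (Map.Map k v)@},
where @JoinFun v@ abbreviates the type of a join function at~@v@.
\begin{smallcode}
import qualified Data.Map as Map

type ConstFact = Map.Map Var (WithTop Lit)

constFactAdd :: OldFact Lit -> NewFact Lit
             -> (ChangeFlag, WithTop Lit)
constFactAdd (OldFact old) (NewFact new)
  | old == new = (NoChange,   PElem new)
  | otherwise  = (SomeChange, Top)

constLattice :: DataflowLattice ConstFact
constLattice = DataflowLattice
  { fact_bot  = Map.empty
  , fact_join = stdMapJoin (extendJoinDomain constFactAdd) }
\end{smallcode}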
1953
1954 The forward transfer function @varHasLit@ is defined using the
1955 shape-polymorphic auxiliary function~@ft@.
1956 For most nodes~@n@, @ft n@ simply propagates the input fact forward.
1957 But for an assignment node, if a variable~@x@ gets a constant value~@k@,
1958 @ft@ extends the input fact by mapping @x@ to~@PElem k@.
1959 And if a variable~@x@ is assigned a non-constant value,
1960 @ft@ extends the input fact by mapping @x@ to~@Top@.
1961 There is one other interesting case:
1962 a conditional branch where the condition
1963 is a variable.
1964 If the conditional branch flows to the true successor,
1965 the variable holds @True@, and similarly for the false successor,
1966 \emph{mutatis mutandis}.
1967 Function @ft@ updates the fact flowing to each successor accordingly.
1968 \finalremark{Because function @ft@ scrutinizes a GADT, it cannot
1969 default the uninteresting cases.}
1970
1971 The transfer function need not consider complicated cases such as
1972 an assignment @x:=y@ where @y@ holds a constant value~@k@.
1973 Instead, we rely on the interleaving of transformation
1974 and analysis to first transform the assignment to @x:=k@,
1975 which is exactly what our simple transfer function expects.
1976 As we mention in \secref{simple-tx},
1977 interleaving makes it possible to write
1978 very simple transfer functions, without missing
1979 opportunities to improve the code.
1980
1981 \figref{const-prop}'s rewrite function for constant propagation, @constProp@,
1982 rewrites each use of a variable to its constant value.
1983 The client has defined auxiliary functions that may change expressions
1984 or nodes:
1985 \begin{smallcode}
1986 type `MaybeChange a = a -> Maybe a
1987 `mapVE :: (Var -> Maybe Expr) -> MaybeChange Expr
1988 `mapEE :: MaybeChange Expr -> MaybeChange Expr
1989 `mapEN :: MaybeChange Expr -> MaybeChange (Node e x)
1990 `mapVN :: (Var -> Maybe Expr) -> MaybeChange (Node e x)
1991 `nodeToG :: Node e x -> Graph Node e x
1992 \end{smallcode}
1993 The client composes @mapXX@ functions
1994 to apply @lookup@ to each use of a variable in each kind of node;
1995 @lookup@ substitutes for each variable that has a constant value.
1996 Applying @liftM nodeToG@ lifts the final node, if present, into a~@Graph@.
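Putting these pieces together, @constProp@ might read roughly as
follows (a~sketch; the definitive version is in \figref{const-prop},
with @ConstFact@ and the @Map@ import as sketched above, and an @Expr@
constructor \mbox{@Lit :: Lit -> Expr@}):
\begin{smallcode}
constProp :: FuelMonad m => FwdRewrite m Node ConstFact
constProp = mkFRewrite cp
 where cp node f = return $ liftM nodeToG $
                   mapVN (lookup f) node
       lookup f x = case Map.lookup x f of
                      Just (PElem v) -> Just (Lit v)
                      _              -> Nothing
\end{smallcode}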
1997
1998
1999 \figref{const-prop} also gives another, distinct function
2000 for constant
2001 folding: @simplify@.
2002 This function
2003 rewrites constant expressions to their values,
2004 and it rewrites a conditional branch on a
2005 boolean constant to an unconditional branch.
2006 To~rewrite constant expressions,
2007 it runs @s_exp@ on every subexpression.
2008 Function @simplify@ does not check whether a variable holds a
2009 constant value; it relies on @constProp@ to have replaced the
2010 variable by the constant.
2011 Indeed, @simplify@ does not consult the
2012 incoming fact, so it is polymorphic in~@f@.
2013
2014
2015 The @FwdRewrite@ functions @constProp@ and @simplify@
2016 are useful independently.
2017 In this case, however, we
2018 want \emph{both} of them,
2019 so we compose them with @thenFwdRw@.
2020 The composition, along with the lattice and the
2021 transfer function,
2022 goes into @constPropPass@ (bottom of \figref{const-prop}).
2023 Given @constPropPass@, we can
2024 improve a graph~@g@ by passing @constPropPass@ and~@g@
2025 to
2026 @analyzeAndRewriteFwdBody@.
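The assembled pass then reads as follows (again a sketch of the bottom
of \figref{const-prop}):
\begin{smallcode}
constPropPass :: FuelMonad m => FwdPass m Node ConstFact
constPropPass = FwdPass
  { fp_lattice  = constLattice
  , fp_transfer = varHasLit
  , fp_rewrite  = constProp `thenFwdRw` simplify }
\end{smallcode}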
2027
2028
2029
2030 %%%% \subsection{Fixed points and speculative rewrites} \seclabel{fixpoints}
2031 %%%%
2032 %%%% Are rewrites sound, especially when there are loops?
2033 %%%% Many analyses compute a fixed point starting from unsound
2034 %%%% ``facts''; for example, a live-variable analysis starts from the
2035 %%%% assumption that all variables are dead. This means \emph{rewrites
2036 %%%% performed before a fixed point is reached may be unsound, and their results
2037 %%%% must be discarded}. Each iteration of the fixed-point computation must
2038 %%%% start afresh with the original graph.
2039 %%%%
2040 %%%%
2041 %%%% Although the rewrites may be unsound, \emph{they must be performed}
2042 %%%% (speculatively, and possibly recursively),
2043 %%%% so that the facts downstream of the replacement graphs are as accurate
2044 %%%% as possible.
2045 %%%% For~example, consider this graph, with entry at @L1@:
2046 %%%% \par{\small
2047 %%%% \begin{code}
2048 %%%% L1: x=0; goto L2
2049 %%%% L2: x=x+1; if x==10 then goto L3 else goto L2
2050 %%%% \end{code}}
2051 %%%% The first traversal of block @L2@ starts with the unsound ``fact'' \{x=0\};
2052 %%%% but analysis of the block propagates the new fact \{x=1\} to @L2@, which joins the
2053 %%%% existing fact to get \{x=$\top$\}.
2054 %%%% What if the predicate in the conditional branch were @x<10@ instead
2055 %%%% of @x==10@?
2056 %%%% Again the first iteration would begin with the tentative fact \{x=0\}.
2057 %%%% Using that fact, we would rewrite the conditional branch to an unconditional
2058 %%%% branch @goto L3@. No new fact would propagate to @L2@, and we would
2059 %%%% have successfully (and soundly) eliminated the loop.
2060 %%%% This example is contrived, but it illustrates that
2061 %%%% for best results we should
2062 %%%% \begin{itemize}
2063 %%%% \item Perform the rewrites on every iteration.
2064 %%%% \item Begin each new iteration with the original, virgin graph.
2065 %%%% \end{itemize}
2066 %%%% This sort of algorithm is hard to implement in an imperative setting, where rewrites
2067 %%%% mutate a graph in place.
2068 %%%% But with an immutable graph, implementing the algorithm
2069 %%%% is trivially easy: we simply revert to the original graph at the start
2070 %%%% of each fixed-point iteration.
2071
2072 \subsection{Checkpointing the client's monad}
2073
2074 \seclabel{ckpoint-monad} \seclabel{checkpoint-monad}
2075
2076 \begin{ntext}
2077 In the presence of loops, a rewrite function could make a change
2078 that later has to be rolled back.
2079 For example, consider constant propagation in this loop, which
2080 computes factorial:
2081 \begin{code}
2082 i = 1; prod = 1;
2083 L1: if (i >= n) goto L3 else goto L2;
2084 L2: i = i + 1; prod = prod * i;
2085 goto L1;
2086 L3: ...
2087 \end{code}
2088 Function @analyzeAndRewriteFwdBody@ iterates through this graph until
2089 the dataflow facts stop changing.
2090 On~the first iteration, the assignment @i = i + 1@ will
be analyzed with an incoming fact @i=1@, and the assignment will be
rewritten to the graph @i = 2@.
2093 But~on a later iteration, the incoming fact will increase to @i=Top@,
2094 and the rewrite will no longer be justified.
2095 After each iteration, \hoopl\ starts the next iteration with
2096 \emph{new} facts but with the \emph{original} graph---simply by virtue
2097 of using purely functional data structures, rewrites from
2098 previous iterations are automatically ``rolled back.''
2099 \emph{But a rewrite function doesn't only produce new graphs.}
2100 A~rewrite function can also take a \emph{monadic action}, such as
2101 acquiring a fresh name.
2102 These actions must also be rolled back, and because the client chooses
2103 the monad in which the actions take place, the client must provide the
2104 means to roll them back.
2105
2106 \hoopl\ defines the \emph{interface} that a client must use to roll
2107 back monadic actions.
2108 The~interface is defined by the type class @CkpointMonad@ in
\figref{api-types}:
2110 \begin{code}
2111 class Monad m => `CkpointMonad m where
2112 type `Checkpoint m
2113 `checkpoint :: m (Checkpoint m)
2114 `restart :: Checkpoint m -> m ()
2115 \end{code}
2116 \hoopl\ calls the @checkpoint@ method at the beginning of an
2117 iteration, then calls the @restart@ method if another iteration is
2118 necessary.
2119 These operations must obey the following algebraic law:
2120 \begin{code}
2121 do { s <- checkpoint; m; restart s } == return ()
2122 \end{code}
2123 where @m@~represents any combination of monadic actions that might be
2124 taken by rewrite functions.
2125 (The safest course is to make sure the law holds throughout the entire
2126 monad.)
The~type of the saved checkpoint~@s@ is entirely up to the client;
it is given by the \emph{associated type} @Checkpoint m@ declared in
the class, another use of Haskell's indexed type families.
2130 \end{ntext}
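As a hedged illustration (our sketch, not part of \hoopl), consider a
client monad whose only effect is to draw names from a supply;
checkpointing saves the supply, and restarting restores it, so the law
holds for any action built from @freshName@:
\begin{smallcode}
import Control.Monad.State

type `FreshM = State [String]  -- an infinite supply of names

`freshName :: FreshM String
freshName = do { ns <- get; put (tail ns); return (head ns) }

instance CkpointMonad (State [String]) where
  -- needs TypeFamilies and FlexibleInstances
  type Checkpoint (State [String]) = [String]
  checkpoint = get
  restart    = put
\end{smallcode}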
2131
2132
2133
2134 \subsection{Correctness} \seclabel{correctness}
2135
2136 % Facts computed by @analyzeAndRewriteFwdBody@ depend on graphs produced by the rewrite
2137 Facts computed by the transfer function depend on graphs produced by the rewrite
2138 function, which in turn depend on facts computed by the transfer function.
How~do we know that this algorithm is sound, or that it terminates?
2140 A~proof requires a POPL paper
2141 \cite{lerner-grove-chambers:2002};
2142 here we merely state the conditions for correctness as applied to \hoopl:
2143 \begin{itemize}
2144 \item
2145 The lattice must have no \emph{infinite ascending chains}; that is,
2146 every sequence of calls to @fact_join@ must eventually return @NoChange@.
2147 \item
2148 The transfer function must be
2149 \emph{monotonic}: given a more informative fact in,
2150 it must produce a more informative fact out.
2151 \item
2152 The rewrite function must be \emph{sound}:
2153 if it replaces a node @n@ by a replacement graph~@g@, then @g@~must be
2154 observationally equivalent to~@n@ under the
2155 assumptions expressed by the incoming dataflow fact~@f@.
2156 %%\footnote{We do not permit a transformation to change
2157 %% the @Label@ of a node. We have not found any optimizations
2158 %% that are prevented (or even affected) by this restriction.}
2159 %
2160 Moreover, analysis of~@g@ must produce output fact(s)
2161 that are at least as informative as the fact(s) produced by applying
2162 the transfer function to~@n@.
2163 %% \item
2164 %% The rewrite function must be \emph{consistent} with the transfer function;
2165 %% that is, \mbox{@`transfer n f@ $\sqsubseteq$ @transfer g f@}.
2166 For example, if the transfer function says that @x=7@ after the node~@n@,
2167 then after analysis of~@g@,
2168 @x@~had better still be~@7@.
2169 \item
2170 % To ensure termination,
2171 A transformation that uses deep rewriting
2172 must not return a replacement graph which
2173 contains a node that could be rewritten indefinitely.
2174 \end{itemize}
2175 %% Without the conditions on monotonicity and consistency,
2176 %% our algorithm will terminate,
2177 %% but there is no guarantee that it will compute
2178 %% a fixed point of the analysis. And that in turn threatens the
2179 %% soundness of rewrites based on possibly bogus ``facts''.
2180 Under these conditions, the algorithm terminates and is
2181 sound.
2182 %%
2183 %% \begin{itemize}
2184 %% \item
2185 %% The algorithm terminates. The fixed-point loop must terminate because the
2186 %% lattice has no infinite ascending chains. And the client is responsible
2187 %% for avoiding infinite recursion when deep rewriting is used.
2188 %% \item
2189 %% The algorithm is sound. Why? Because if each rewrite is sound (in the sense given above),
2190 %% then applying a succession of rewrites is also sound.
2191 %% Moreover, a~sound analysis of the replacement graph
2192 %% may generate only dataflow facts that could have been
2193 %% generated by a more complicated analysis of the original graph.
2194 %% \end{itemize}
2195 %%
2196 %% \finalremark{Doesn't the rewrite have to be have the following property:
2197 %% for a forward analysis/transform, if (rewrite P s) = Just s',
2198 %% then (transfer P s $\sqsubseteq$ transfer P s').
2199 %% For backward: if (rewrite Q s) = Just s', then (transfer Q s' $\sqsubseteq$ transfer Q s).
2200 %% Works for liveness.
2201 %% ``It works for liveness, so it must be true'' (NR).
2202 %% If this is true, it's worth a QuickCheck property!
2203 %% }%
2204 %% \finalremark{Version 2, after further rumination. Let's define
2205 %% $\scriptstyle \mathit{rt}(f,s) = \mathit{transform}(f, \mathit{rewrite}(f,s))$.
2206 %% Then $\mathit{rt}$ should
2207 %% be monotonic in~$f$. We think this is true of liveness, but we are not sure
2208 %% whether it's just a generally good idea, or whether it's actually a
2209 %% precondition for some (as yet unarticulated) property of \ourlib{} to hold.}%
2210
2211 %%%% \simon{The rewrite functions must presumably satisfy
2212 %%%% some monotonicity property. Something like: given a more informative
2213 %%%% fact, the rewrite function will rewrite a node to a more informative graph
2214 %%%% (in the fact lattice.).
2215 %%%% \textbf{NR}: actually the only obligation of the rewrite function is
2216 %%%% to preserve observable behavior. There's no requirement that it be
2217 %%%% monotonic or indeed that it do anything useful. It just has to
2218 %%%% preserve semantics (and be a pure function of course).
2219 %%%% \textbf{SLPJ} In that case I think I could cook up a program that
2220 %%%% would never reach a fixed point. Imagine a liveness analysis with a loop;
2221 %%%% x is initially unused anywhere.
2222 %%%% At some assignment node inside the loop, the rewriter behaves as follows:
2223 %%%% if (and only if) x is dead downstream,
2224 %%%% make it alive by rewriting the assignment to mention x.
2225 %%%% Now in each successive iteration x will go live/dead/live/dead etc. I
2226 %%%% maintain my claim that rewrite functions must satisfy some
2227 %%%% monotonicity property.
2228 %%%% \textbf{JD}: in the example you cite, monotonicity of facts at labels
2229 %%%% means x cannot go live/dead/live/dead etc. The only way we can think
2230 %%%% of not to terminate is infinite ``deep rewriting.''
2231 %%%% }
2232
2233
2234
2235
2236 \section{\ourlib's implementation}
2237 \seclabel{engine}
2238 \seclabel{dfengine}
2239
2240 \secref{making-simple}
2241 gives a client's-eye view of \hoopl, showing how to use
2242 it to create analyses and transformations.
2243 \hoopl's interface is simple, but
2244 the \emph{implementation} of interleaved analysis and rewriting is~not.
2245 \citet{lerner-grove-chambers:2002}
2246 do not describe their implementation. We have written
2247 at least three previous implementations, all of which
2248 were long and hard to understand, and only one of which
2249 provided compile-time guarantees about open and closed shapes.
2250 We are not confident that any of our previous implementations are correct.
2251
2252 In this paper we describe a new implementation. It is elegant and short
2253 (about a third of the size of our last attempt), and it offers
2254 strong compile-time guarantees about shapes.
2255 \finalremark{Wanted: enumerate the critical components and give each one's size}
2256
2257 We describe the implementation of \emph{forward}
2258 analysis and transformation.
2259 The implementations of backward analysis and transformation are
2260 exactly analogous and are included in \hoopl.
2261
2262
2263
\begin{ntext}
\subsection{Rewrite functions}

When a rewrite function elects to rewrite a node, the replacement
graph is not the only result,
as expressed by the
@FwdRew@ type returned by a @FwdRewrite@ (\figref{api-types}).
The first component of the @FwdRew@ is the replacement graph, as discussed earlier.
The second component, $\rw$, is a
\emph{new rewrite function} to use when recursively processing
the replacement graph.
For shallow rewriting this new function is
the constant @Nothing@ function; for deep rewriting it is the original
rewrite function.
While @mkFRewrite@ allows for general rewriting, most clients will
take advantage of \hoopl's support for these two common cases:
\begin{smallcode}
`deepFwdRw, `shallowFwdRw
  :: Monad m
  => (forall e x . n e x -> f -> m (Maybe (Graph n e x)))
  -> FwdRewrite m n f
\end{smallcode}
Before any rewrite is carried out, its result is threaded through
the function
\begin{code}
`withFuel :: FuelMonad m => Maybe a -> m (Maybe a)
\end{code}
which consults the supply of ``optimization fuel'' and may suppress
the rewrite; we explain the details in \secref{fuel}.
\end{ntext}
2292
2293
2294
2295 \subsection{Overview}
2296
2297
2298 %% We on @analyzeAndRewriteFwd@, whose type is more general
2299 %% than that of @analyzeAndRewriteFwdBody@:
2300 %% \begin{smallcode}
2301 %% `analyzeAndRewriteFwd
2302 %% :: forall m n f e x. (FuelMonad m, NonLocal n)
2303 %% => FwdPass m n f -- lattice, transfers, rewrites
2304 %% -> MaybeC e [Label] -- entry points for a closed graph
2305 %% -> Graph n e x -- the original graph
2306 %% -> Fact e f -- fact(s) flowing into the entry/entries
2307 %% -> m (Graph n e x, FactBase f, MaybeO x f)
2308 %% \end{smallcode}
2309 %% We analyze graphs of all shapes; a single @FwdPass@ may be used with
2310 %% multiple shapes.
2311 %% If a graph is closed on entry, a list of entry points must be
2312 %% provided;
2313 %% if the graph is open on entry, it must be the case that the
2314 %% implicit entry point is the only entry point.
2315 %% The fact or set of facts flowing into the entries is similarly
2316 %% determined by the shape of the entry point.
2317 %% Finally, the result is a rewritten graph, a @FactBase@ that gives a
2318 %% fixed point of the analysis on the rewritten graph, and if the graph
2319 %% is open on exit, an ``exit fact'' flowing out.
2320
2321 Instead of the interface function @analyzeAndRewriteFwdBody@, we present
2322 the private function @arfGraph@ (short for ``analyze and rewrite
2323 forward graph''):
2324 \begin{smallcode}
2325 `arfGraph
2326 :: forall m n f e x. (FuelMonad m, NonLocal n)
2327 => FwdPass m n f -- lattice, transfers, rewrites
2328 -> MaybeC e [Label] -- entry points for a closed graph
2329 -> Graph n e x -- the original graph
2330 -> Fact e f -- fact(s) flowing into the entry/entries
2331 -> m (DG f n e x, Fact x f)
2332 \end{smallcode}
2333 Function @arfGraph@ has a more general type than
2334 the function @analyzeAndRewriteFwdBody@ % praying for a decent line break
2335 because @arfGraph@ is used recursively
2336 to analyze graphs of all shapes.
2337 If a graph is closed on entry, a list of entry points must be
2338 provided;
2339 if the graph is open on entry,
2340 the graph's entry sequence must be the only entry point.
2341 The~graph's shape on entry also determines the type of fact or facts
2342 flowing in.
2343 Finally, the result is a ``decorated graph''
2344 @DG f n e x@,
2345 and if the graph
2346 is open on exit, an ``exit fact'' flowing out.
2347
2348 %% \simon{I suggest (a) putting the paragraph break one sentence earlier,
2349 %%% so that this para is all about DGs.}
2350 %%% NR: previous para is about the type of arfGraph; I don't want to
2351 %%% leave the result type dangling. I hope the opening sentence of
2352 %%% this para suggests that the para is all about DGs.
2353 A~``decorated graph'' is one in which each block is decorated with the
2354 fact that holds at the start of the block.
2355 @DG@ actually shares a representation with @Graph@,
2356 which is possible because the definition of
2357 @Graph@ in \figref{graph} contains a white lie: @Graph@~is a type
2358 synonym for an underlying type @`Graph'@, which takes the type of block
2359 as an additional parameter.
2360 (Similarly, function @gSplice@ in \secref{gSplice} is actually a
2361 higher-order function that takes a block-concatenation function as a
2362 parameter.)
2363 The truth about @Graph@ and @DG@ is as follows:
2364 \verbatiminput{dg}
2365 % defn DG
2366 % defn DBlock
2367 Type~@DG@ is internal to \hoopl; it is not seen by any client.
2368 To~convert a~@DG@ to the @Graph@ and @FactBase@
2369 that are returned by the API function @analyzeAndRewriteFwdBody@,
2370 we use a 12-line function:
2371 \begin{smallcode}
2372 `normalizeGraph
2373 :: NonLocal n => DG f n e x -> (Graph n e x, FactBase f)
2374 \end{smallcode}
2375
2376 Function @arfGraph@ is implemented as follows:
2377 \begingroup
2378 \newcommand\cnl{\\[-6pt]}%
2379 \begin{smallttcode}
2380 arfGraph pass entries = graph
2381 where\cnl
2382 node :: forall e x . (ShapeLifter e x)
2383 => n e x -> f -> m (DG f n e x, Fact x f)\cnl
2384 block:: forall e x .
2385 Block n e x -> f -> m (DG f n e x, Fact x f)\cnl
2386 body :: [Label] -> LabelMap (Block n C C)
2387 -> Fact C f -> m (DG f n C C, Fact C f)\cnl
2388 `graph:: Graph n e x -> Fact e f -> m (DG f n e x, Fact x f)\cnl
2389 ... definitions of 'node', 'block', 'body', and 'graph' ...
2390 \end{smallttcode}
2391 The four auxiliary functions help us separate concerns: for example, only
2392 \endgroup
2393 @node@ knows about rewrite functions;
2394 and only @body@ knows about fixed points.
2395 %% All four functions have access to the @FwdPass@ and any entry points
2396 %% that are passed to @arfGraph@.
2397 %% These functions also have access to type variables bound by
2398 %% @arfGraph@:
2399 %% @n@~is the type of nodes; @f@~is the type of facts;
2400 %% @m@~is the monad used in the rewrite functions of the @FwdPass@;
2401 %% and
2402 %% @e@~and~@x@ give the shape of the graph passed to @arfGraph@.
2403 %% The types of the inner functions are
2404 %% \begin{smallcode}
2405 %% \end{smallcode}
2406 Each auxiliary function works the same way: it~takes a ``thing'' and
2407 returns an \emph{extended fact transformer}.
2408 An~extended fact transformer takes dataflow fact(s) coming into
2409 the ``thing,'' and it returns an output fact.
2410 It~also returns a decorated graph representing the (possibly
2411 rewritten) ``thing''---that's the \emph{extended} part.
2412 Finally, because a rewrite may require fresh names provided
2413 by the client,
2414 may wish to consult state provided by the client,
2415 or may consume ``optimization fuel'' (\secref{fuel}),
2416 every extended fact transformer is monadic.
2417
2418 %%%% \begin{figure}
2419 %%%% SIMON HAS ASKED IF TYPE SYNONYMS MIGHT IMPROVE THINGS FOR EXTENDED
2420 %%%% FACT TRANSFORMERS. JUDGE FOR YOURSELF.
2421 %%%% FIRST, SOMETHING THAT IS SOMEWHAT READABLE BUT IS NOT LEGAL HASKELL:
2422 %%%% \begin{smallcode}
2423 %%%% type EFFX ipt e x = ipt -> m (DG f n e x, Fact x f)
2424 %%%% -- extended forward fact transformer
2425 %%%%
2426 %%%% node :: forall e x . (ShapeLifter e x)
2427 %%%% => n e x -> EFFX f e x
2428 %%%% block :: forall e x .
2429 %%%% Block n e x -> EFFX f e x
2430 %%%% body :: [Label] -> LabelMap (Block n C C)
2431 %%%% -> EFFX (Fact C f) C C
2432 %%%% graph :: Graph n e x -> EFFX (Fact e f) e x
2433 %%%% \end{smallcode}
2434 %%%% IF WE MAKE IT LEGAL HASKELL, IT BECOMES COMPLETELY HOPELESS:
2435 %%%% \begin{smallcode}
2436 %%%% type EFFX m n f ipt e x = ipt -> m (DG f n e x, Fact x f)
2437 %%%% -- extended forward fact transformer
2438 %%%%
2439 %%%% node :: forall e x . (ShapeLifter e x)
2440 %%%% => n e x -> EFFX m n f f e x
2441 %%%% block :: forall e x .
2442 %%%% Block n e x -> EFFX m n f f e x
2443 %%%% body :: [Label] -> LabelMap (Block n C C)
2444 %%%% -> EFFX m n f (Fact C f) C C
2445 %%%% graph :: Graph n e x -> EFFX m n f (Fact e f) e x
2446 %%%% \end{smallcode}
2447 %%%% \caption{EXPERIMENTS WITH TYPE SYNONYMS}
2448 %%%% \end{figure}
2449 %%%%
2450
2451
2452 The types of the four extended fact transformers are not quite
2453 identical:
2454 \begin{itemize}
2455 \item
2456 Extended fact transformers for nodes and blocks have the same type;
2457 like forward transfer functions,
2458 they expect a fact~@f@ rather than the more general @Fact e f@
2459 required for a graph.
2460 Because a node or a block has
2461 exactly one fact flowing into the entry, it is easiest simply to pass
2462 that fact.
2463 \item
2464 Extended fact transformers for graphs have the most general type,
2465 as expressed using @Fact@:
2466 if the graph is open on entry, its fact transformer expects a
2467 single fact;
2468 if the graph is closed on entry, its fact transformer expects a
2469 @FactBase@.
2470 \item
2471 Extended fact transformers for bodies have the same type as
2472 extended fact transformers for closed/closed graphs.
2473 \end{itemize}
2474
2475
2476 Function @arfGraph@ and its four auxiliary functions comprise a cycle of
2477 mutual recursion:
2478 @arfGraph@ calls @graph@;
2479 @graph@ calls @body@ and @block@;
2480 @body@ calls @block@;
2481 @block@ calls @node@;
2482 and
2483 @node@ calls @arfGraph@.
2484 These five functions do three different kinds of work:
2485 compose extended fact transformers, analyze and rewrite nodes, and compute
2486 fixed points.
2487
2488
2489
2490 \subsection{Analyzing blocks and graphs by composing extended fact transformers}
2491 \seclabel{block-impl}
2492
2493 Extended fact transformers compose nicely.
2494 For example, @block@ is implemented thus:
2495 \ifpagetuning\par\vfilbreak[1in]\fi % horrible page break
2496 \begin{code}
2497 `block :: forall e x .
2498 Block n e x -> f -> m (DG f n e x, Fact x f)
2499 block (BFirst n) = node n
2500 block (BMiddle n) = node n
2501 block (BLast n) = node n
2502 block (BCat b1 b2) = block b1 `cat` block b2
2503 \end{code}
2504 The composition function @cat@ feeds facts from one extended fact
2505 transformer to another, and it splices decorated graphs.
2506 It~has a very general type:
2507 \begin{code}
2508 cat :: forall m e a x f f1 f2. Monad m
2509 => (f -> m (DG f n e a, f1))
2510 -> (f1 -> m (DG f n a x, f2))
2511 -> (f -> m (DG f n e x, f2))
2512 `cat ft1 ft2 f = do { (g1,f1) <- ft1 f
2513 ; (g2,f2) <- ft2 f1
2514 ; return (g1 `dgSplice` g2, f2) }
2515 \end{code}
2516 (Function @`dgSplice@ is the same splicing function used for an ordinary
2517 @Graph@, but it uses a one-line block-concatenation function suitable
2518 for @DBlock@s.)
2519 The name @cat@ comes from the concatenation of the decorated graphs,
2520 but it is also appropriate because the style in which it is used is
2521 reminiscent of @concatMap@, with the @node@ and @block@ functions
2522 playing the role of @map@.
2523
2524 Function @graph@ is much like @block@, but it has more cases.
2525
2526
2527 %%%%
2528 %%%% \begin{itemize}
2529 %%%% \item
2530 %%%% The @arfNode@ function processes nodes (\secref{arf-node}).
2531 %%%% It handles the subtleties of interleaved analysis and rewriting,
2532 %%%% and it deals with fuel consumption. It calls @arfGraph@ to analyze
2533 %%%% and transform rewritten graphs.
2534 %%%% \item
2535 %%%% Based on @arfNode@ it is extremely easy to write @arfBlock@, which lifts
2536 %%%% the analysis and rewriting from nodes to blocks (\secref{arf-block}).
2537 %%%%
2538 %%%%
2539 %%%%
2540 %%%% \item
2541 %%%% Using @arfBlock@ we define @arfBody@, which analyzes and rewrites a
2542 %%%% @Body@: a~group of closed/closed blocks linked by arbitrary
2543 %%%% control flow.
2544 %%%% Because a @Body@ is
2545 %%%% always closed/closed and does not take shape parameters, function
2546 %%%% @arfBody@ is less polymorphic than the others; its type is what
2547 %%%% would be obtained by expanding and specializing the definition of
2548 %%%% @ARF@ for a @thing@ which is always closed/closed and is equivalent to
2549 %%%% a @Body@.
2550 %%%%
2551 %%%% Function @arfBody@ takes care of fixed points (\secref{arf-body}).
2552 %%%% \item
2553 %%%% Based on @arfBody@ it is easy to write @arfGraph@ (\secref{arf-graph}).
2554 %%%% \end{itemize}
2555 %%%% Given these functions, writing the main analyzer is a simple
2556 %%%% matter of matching the external API to the internal functions:
2557 %%%% \begin{code}
2558 %%%% `analyzeAndRewriteFwdBody
2559 %%%% :: forall n f. NonLocal n
2560 %%%% => FwdPass n f -> Body n -> FactBase f
2561 %%%% -> FuelMonad (Body n, FactBase f)
2562 %%%%
2563 %%%% analyzeAndRewriteFwdBody pass ^body facts
2564 %%%% = do { (^rg, _) <- arfBody pass body facts
2565 %%%% ; return (normalizeBody rg) }
2566 %%%% \end{code}
2567 %%%%
2568 %%%% \subsection{From nodes to blocks} \seclabel{arf-block}
2569 %%%% \seclabel{arf-graph}
2570 %%%%
2571 %%%% We begin our explanation with the second task:
2572 %%%% writing @arfBlock@, which analyzes and transforms blocks.
2573 %%%% \begin{code}
2574 %%%% `arfBlock :: NonLocal n => ARF (Block n) n
2575 %%%% arfBlock pass (BUnit node) f
2576 %%%% = arfNode pass node f
2577 %%%% arfBlock pass (BCat b1 b2) f
2578 %%%% = do { (g1,f1) <- arfBlock pass b1 f
2579 %%%% ; (g2,f2) <- arfBlock pass b2 f1
2580 %%%% ; return (g1 `DGCatO` g2, f2) }
2581 %%%% \end{code}
2582 %%%% The code is delightfully simple.
2583 %%%% The @BUnit@ case is implemented by @arfNode@.
2584 %%%% The @BCat@ case is implemented by recursively applying @arfBlock@ to the two
2585 %%%% sub-blocks, threading the output fact from the first as the
2586 %%%% input to the second.
2587 %%%% Each recursive call produces a rewritten graph;
2588 %%%% we concatenate them with @DGCatO@.
2589 %%%%
2590 %%%% Function @arfGraph@ is equally straightforward:
2591 %%%% XXXXXXXXXXXXXXX
2592 %%%% The pattern is the same as for @arfBlock@: thread
2593 %%%% facts through the sequence, and concatenate the results.
2594 %%%% Because the constructors of type~@DG@ are more polymorphic than those
2595 %%%% of @Graph@, type~@DG@ can represent
2596 %%%% graphs more simply than @Graph@; for example, each element of a
2597 %%%% @GMany@ becomes a single @DG@ object, and these @DG@ objects are then
2598 %%%% concatenated to form a single result of type~@DG@.
2599 %%%%
2600
2601 \subsection{Analyzing and rewriting nodes} \seclabel{arf-node}
2602
2603 The @node@ function is where we interleave analysis with rewriting:
2604 \begin{smallfuzzcode}{10.5pt}
2605 node :: forall e x . (ShapeLifter e x, FuelMonad m)
2606 => n e x -> f -> m (DG f n e x, Fact x f)
2607 node n f
 = do { frew <- frewrite pass n f
      ; rew  <- withFuel frew
2609 ; case rew of
2610 Nothing -> return (singletonDG f n,
2611 ftransfer pass n f)
2612 Just (FwdRew g rw) ->
2613 let pass' = pass { fp_rewrite = rw }
2614 f' = fwdEntryFact n f
2615 in arfGraph pass' (fwdEntryLabel n) g f' }
2616
2617 class ShapeLifter e x where
2618 singletonDG :: f -> n e x -> DG f n e x
2619 fwdEntryFact :: NonLocal n => n e x -> f -> Fact e f
2620 fwdEntryLabel :: NonLocal n => n e x -> MaybeC e [Label]
2621 ftransfer :: FwdPass m n f -> n e x -> f -> Fact x f
2622 frewrite :: FwdPass m n f -> n e x
2623 -> f -> m (Maybe (FwdRew m n f e x))
2624 \end{smallfuzzcode}
2625 Function @node@ uses @frewrite@ to extract the rewrite function from
2626 @pass@,
2627 and applies the rewrite function to the node~@n@ and the incoming fact~@f@.
2628 The result of the rewrite is passed to @withFuel@, but for now,
2629 pretend @withFuel@ is ``@return@;'' we present
2630 the details below
2631 in \secref{fuel}.
2632 The result from @withFuel@, @rew@, is
2633 scrutinized by the @case@ expression.
2634
2635 In the @Nothing@ case, no rewrite takes place.
2636 We~return node~@n@ and its incoming fact~@f@
2637 as the decorated graph @singletonDG f n@.
2638 To produce the outgoing fact, we apply the transfer function
2639 @ftransfer pass@ to @n@~and~@f@.
2640
2641 In the @Just@ case, we receive a replacement
2642 graph~@g@ and a new rewrite function~@rw@.
2643 We~recursively analyze @g@ with @arfGraph@.
2644 This analysis uses @pass'@, which contains the original lattice and transfer
2645 function from @pass@, together with the new rewrite function~@rw@.
Function @fwdEntryFact@ converts the fact~@f@ from plain type~@f@,
which @node@ expects, to type @Fact e f@, which @arfGraph@ expects.
2648
2649 As you see, several functions called in @node@ are overloaded over a
2650 (private) class @ShapeLifter@, because their implementations depend
2651 on the open/closed shape of the node.
2652 By design, the shape of a node is known statically everywhere @node@
2653 is called, so
2654 this use of @ShapeLifter@ is specialized
2655 away by the compiler.
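To make the shape-dependence concrete, here is a sketch (not
\hoopl's actual code) of what the instance for first nodes, which
are closed on entry and open on exit, might look like.
The sketch assumes that a @FwdPass@ stores shape-polymorphic
transfer and rewrite functions in fields @fp_transfer@ and
@fp_rewrite@, and it uses two hypothetical helpers:
@dgUnit@, which builds a one-node decorated graph, and
@mkFactBase@, which builds a @FactBase@ from label-fact pairs.
\begin{smallcode}
instance ShapeLifter C O where
  singletonDG f n    = dgUnit f n
  fwdEntryFact n f   = mkFactBase [(entryLabel n, f)]
                       -- Fact C f = FactBase f
  fwdEntryLabel n    = JustC [entryLabel n]
  ftransfer pass n f = fp_transfer pass n f
  frewrite  pass n f = fp_rewrite  pass n f
\end{smallcode}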
2656
2657 %% And that's it! If~the client wanted deep rewriting, it is
2658 %% implemented by the call to @arfGraph@;
2659 %% if the client wanted
2660 %% shallow rewriting, the rewrite function will have returned
2661 %% @noFwdRw@ as~@rw@, which is implanted in @pass'@
2662 %% (\secref{shallow-vs-deep}).
2663
2664 \subsection{Throttling rewriting using ``optimization fuel''}
2665 \seclabel{vpoiso}
2666 \seclabel{fuel}
2667
2668 In function @node@, the call to
2669 @withFuel@ may prevent a node from
2670 being rewritten.
2671 Function @withFuel@ inspects a supply of \emph{optimization fuel},
2672 which is stored in a @FuelMonad@ (\figref{api-types}).
2673 If~@withFuel@ is passed a @Just@, a rewrite is being requested, and
2674 if~fuel is available, @withFuel@ @return@s the @Just@,
2675 reducing the supply of fuel by one unit.
2676 In~all other cases, including when fuel is exhausted, @withFuel@
2677 has no effect on the @FuelMonad@ and @return@s @Nothing@.
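The behavior just described takes only a few lines to implement.
Here is a minimal sketch, assuming that a @FuelMonad@ provides
@getFuel@ and @setFuel@ operations:
\begin{smallcode}
withFuel :: FuelMonad m => Maybe a -> m (Maybe a)
withFuel Nothing  = return Nothing
withFuel (Just a) = do { fuel <- getFuel
                       ; if fuel == 0
                         then return Nothing
                         else do { setFuel (fuel-1)
                                 ; return (Just a) } }
\end{smallcode}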
2678
2679
2680 Optimization fuel is used to debug the optimizer:
2681 when optimization produces a faulty program,
2682 we use Whalley's
2683 \citeyearpar{whalley:isolation} technique to find the fault.
2684 Given a program that fails when compiled with optimization,
2685 a binary search on the amount of
2686 optimization fuel
2687 finds an~$n$ such that the program works correctly after $n-1$ rewrites
2688 but fails
2689 after $n$~rewrites.
2690 The $n$th rewrite is faulty.
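The search itself is an ordinary bisection.
In this sketch, the test @worksWith@ is hypothetical; it stands for
compiling the program with the given fuel supply and checking
whether the result runs correctly:
\begin{smallcode}
-- Invariant: program works with lo units of fuel, fails with hi
faultyRewrite :: (Int -> IO Bool) -> Int -> Int -> IO Int
faultyRewrite worksWith lo hi
  | lo + 1 == hi = return hi  -- rewrite hi is the faulty one
  | otherwise    = do { ok <- worksWith mid
                      ; if ok
                        then faultyRewrite worksWith mid hi
                        else faultyRewrite worksWith lo mid }
  where mid = (lo + hi) `div` 2
\end{smallcode}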
2691
2692
2693
2694 \seclabel{fuel-monad}
2695
2696
2697
2698 \subsection{Fixed points} \seclabel{arf-body}
2699
2700 The fourth and final auxiliary function of @arfGraph@ is
2701 @body@, which iterates to a fixed point.
2702 This part of the implementation is the only really tricky part, and it is
2703 cleanly separated from everything else:
2704 \smallverbatiminput{bodyfun}
2705 % defn body
2706 % local do_block
2707 % local blocks
2708 % local lattice
2709 % local entryFact
2710 % local entries
2711 % local init_fbase
2712 % local blockmap
2713 Function @getFact@ looks up a fact by its label.
2714 If the label is not found,
2715 @getFact@ returns
2716 the bottom element of the lattice:
2717 \begin{smallcode}
2718 `getFact :: DataflowLattice f -> Label -> FactBase f -> f
2719 \end{smallcode}
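Its implementation is a one-line case analysis on @lookupFact@;
this sketch assumes that the lattice's bottom element is stored in
a field @fact_bot@:
\begin{smallcode}
getFact lat l fb = case lookupFact fb l of
                     Just f  -> f
                     Nothing -> fact_bot lat
\end{smallcode}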
2720 Function @forwardBlockList@ takes a list of possible entry points and
2721 a finite map from labels to blocks.
2722 It returns a list of
2723 blocks, sorted into an order that makes forward dataflow efficient.\footnote
2724 {The order of the blocks does not affect the fixed point or any other
2725 part of the answer; it affects only the number of iterations needed to
2726 reach the fixed point.}
2727 \begin{smallcode}
2728 `forwardBlockList
2729 :: NonLocal n
2730 => [Label] -> LabelMap (Block n C C) -> [Block n C C]
2731 \end{smallcode}
2732 For
2733 example, if the entry point is at~@L2@, and the block at~@L2@
2734 branches to~@L1@, but not vice versa, then \hoopl\ will reach a fixed point
2735 more quickly if we process @L2@ before~@L1@.
2736 To~find an efficient order, @forwardBlockList@ uses
2737 the methods of the @NonLocal@ class---@entryLabel@ and @successors@---to
2738 perform a reverse postorder depth-first traversal of the control-flow graph.
2739 %%
2740 %%The @NonLocal@ type-class constraint on~@n@ propagates to all the
2741 %%@`arfThing@ functions.
2742 %% paragraph carrying too much freight
2743 %%
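Such a traversal fits in a dozen lines.
In this sketch (again, not \hoopl's actual code), @mapLookup@ is a
hypothetical lookup on @LabelMap@, and the @LabelSet@ operations
have the obvious meanings:
\begin{smallcode}
forwardBlockList entries bmap
  = fst (dfs entries ([], emptyLabelSet))
  where -- accumulated list is in reverse postorder
    dfs []     acc = acc
    dfs (l:ls) (done, seen)
      | l `elemLabelSet` seen = dfs ls (done, seen)
      | otherwise
      = case mapLookup l bmap of
          Nothing -> dfs ls (done, extendLabelSet seen l)
          Just b  ->
            let (done', seen') = dfs (successors b)
                                     (done, extendLabelSet seen l)
            in dfs ls (b : done', seen')
\end{smallcode}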
2744
2745 The rest of the work is done by @fixpoint@, which is shared by
2746 both forward and backward analyses:
2747 \smallverbatiminput{fptype}
2748 % defn Direction
2749 % defn Fwd
2750 % defn Bwd
2751 Except for the @Direction@ passed as the first argument,
2752 the type signature tells the story.
2753 The third argument is an extended fact transformer for a single block;
2754 @fixpoint@ applies that function successively to each block in the list
2755 passed as the fourth argument.
2756 The result is an extended fact transformer for the list.
2757
2758 The extended fact transformer returned by @fixpoint@
2759 maintains a
2760 ``current @FactBase@''
2761 which grows monotonically:
2762 as each block is analyzed,
2763 the block's input fact is taken from
2764 the current @FactBase@,
2765 and the current @FactBase@
2766 is
2767 augmented with the facts that flow out of the block.
2768 %
2769 The initial value of the current @FactBase@ is the input @FactBase@,
2770 and
2771 the extended fact transformer
2772 iterates over the blocks until the current @FactBase@
2773 stops changing.
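The key step of that iteration merges one block's outgoing fact
into the current @FactBase@ and records whether anything changed.
Simplified (the real code also tracks which labels belong to the
original blocks), the step looks like this:
\begin{smallcode}
update :: DataflowLattice f -> (Label, f)
       -> (ChangeFlag, FactBase f) -> (ChangeFlag, FactBase f)
update lat (lbl, new) (cha, fbase)
  = case lookupFact fbase lbl of
      Nothing  -> (SomeChange, extendFactBase fbase lbl new)
      Just old ->
        case fact_extend lat old new of
          (NoChange,   _)      -> (cha, fbase)
          (SomeChange, joined) ->
              (SomeChange, extendFactBase fbase lbl joined)
\end{smallcode}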
2774
2775
2776
2777 Implementing @fixpoint@ requires about 90 lines,
2778 formatted narrowly for display in one column.
2779 %%
2780 %% for completeness, it
2781 %% appears in Appendix~\ref{app:fixpoint}.
2782 The~code is mostly straightforward, although we try to be clever
2783 about deciding when a new fact means that another iteration over the
2784 blocks will be required.
2785 \finalremark{Rest of this \S\ is a candidate to be cut}
2786 There is one more subtle point worth mentioning, which we highlight by
2787 considering a forward analysis of this graph, where execution starts at~@L1@:
2788 \begin{code}
2789 L1: x:=3; goto L4
2790 L2: x:=4; goto L4
2791 L4: if x>3 goto L2 else goto L5
2792 \end{code}
2793 Block @L2@ is unreachable.
2794 But if we \naively\ process all the blocks (say in
2795 order @L1@, @L4@, @L2@), then we will start with the bottom fact for @L2@, propagate
2796 \{@x=4@\} to @L4@, where it will join with \{@x=3@\} to yield
2797 \{@x=@$\top$\}.
2798 Given @x=@$\top$, the
2799 conditional in @L4@ cannot be rewritten, and @L2@~seems reachable. We have
2800 lost a good optimization.
2801
2802 Function @fixpoint@ solves this problem
2803 by analyzing a block only if the block is
2804 reachable from an entry point.
2805 This trick is safe only for a forward analysis, which
2806 is why
2807 @fixpoint@ takes a @Direction@ as its first argument.
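In code, the trick is a single reachability test applied before
each block is analyzed; in this sketch, a block is deemed reachable
in a forward analysis exactly when some fact has already flowed to
its label:
\begin{smallcode}
reachable :: Direction -> Label -> FactBase f -> Bool
reachable Fwd lbl fbase = lbl `elemFactBase` fbase
reachable Bwd _   _     = True  -- trick is unsafe backwards
\end{smallcode}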
2808
2809 %% Although the trick can be implemented in just a couple of lines of
2810 %% code, the reasoning behind it is quite subtle---exactly the sort of
2811 %% thing that should be implemented once in \hoopl, so clients don't have
2812 %% to worry about it.
2813
2814 \section {Related work} \seclabel{related}
2815
2816 While there is a vast body of literature on
2817 dataflow analysis and optimization,
2818 relatively little can be found on
2819 the \emph{design} of optimizers, which is the topic of this paper.
2820 We therefore focus on the foundations of dataflow analysis
2821 and on the implementations of some comparable dataflow frameworks.
2822
2823 \paragraph{Foundations}
2824
2825 When transfer functions are monotone and lattices are finite in height,
2826 iterative dataflow analysis converges to a fixed point
2827 \cite{kam-ullman:global-iterative-analysis}.
2828 If~the lattice's join operation distributes over transfer
2829 functions,
2830 this fixed point is equivalent to a join-over-all-paths solution to
2831 the recursive dataflow equations
2832 \cite{kildall:unified-optimization}.\footnote
2833 {Kildall uses meets, not joins.
2834 Lattice orientation is conventional, and conventions have changed.
2835 We use Dana Scott's
2836 orientation, in which higher elements carry more information.}
\citet{kam-ullman:monotone-flow-analysis} generalize this result to
monotone (but not necessarily distributive) transfer functions.
2839 Each~client of \hoopl\ must guarantee monotonicity.
2840
2841 \ifcutting
2842 \citet{cousot:abstract-interpretation:1977}
2843 \else
2844 \citet{cousot:abstract-interpretation:1977,cousot:systematic-analysis-frameworks}
2845 \fi
2846 introduce abstract interpretation as a technique for developing
2847 lattices for program analysis.
2848 \citet{steffen:data-flow-analysis-model-checking:1991} shows that
2849 a dataflow analysis can be implemented using model checking;
2850 \citet{schmidt:data-flow-analysis-model-checking}
2851 expands on this~result by showing that
2852 an all-paths dataflow problem can be viewed as model checking an
2853 abstract interpretation.
2854
2855 \citet{marlowe-ryder:properties-data-flow-frameworks}
2856 present a survey of different methods for performing dataflow analyses,
2857 with emphasis on theoretical results.
2858 \citet{muchnick:compiler-implementation}
2859 presents many examples of both particular analyses and related
2860 algorithms.
2861
2862
2863 The soundness of interleaving analysis and transformation,
2864 even when not all speculative transformations are performed on later
2865 iterations, was shown by
2866 \citet{lerner-grove-chambers:2002}.
2867
2868
2869 \paragraph{Frameworks}
2870 Most dataflow frameworks support only analysis, not transformation.
2871 The framework computes a fixed point of transfer functions, and it is
2872 up to the client of
2873 the framework to use that fixed point for transformation.
2874 Omitting transformation makes it much easier to build frameworks,
2875 and one can find a spectrum of designs.
2876 We~describe two representative
2877 designs, then move on to the prior frameworks that support interleaved
2878 analysis and transformation.
2879
2880 The Soot framework is designed for analysis of Java programs
2881 \cite{hendren:soot:2000}.\finalremark{This citation is probably the
2882 best for Soot in general, but there doesn't appear
2883 to be any formal publication that actually details the dataflow
2884 framework part. ---JD}
2885 While Soot's dataflow library supports only analysis, not
2886 transformation, we found much
2887 to admire in its design.
2888 Soot's library is abstracted over the representation of
2889 the control-flow graph and the representation of instructions.
2890 Soot's interface for defining lattice and analysis functions is
2891 like our own,
2892 although because Soot is implemented in an imperative style,
2893 additional functions are needed to copy lattice elements.
2894
2895
The CIL toolkit \cite{necula:cil:2002}\finalremark{No good citation
for same reason as Soot above ---JD}
2898 supports both analysis and rewriting of C~programs,
2899 but rewriting is clearly distinct from analysis:
2900 one runs an analysis to completion and then rewrites based on the
2901 results.
2902 The framework is limited to one representation of control-flow graphs
2903 and one representation of instructions, both of which are provided by
2904 the framework.
2905 The~API is complicated;
2906 much of the complexity is needed to enable the client to
2907 affect which instructions
2908 the analysis iterates over.
2909
2910
2911 \finalremark{FYI, LLVM has Pass Managers that try to control the order of passes,
2912 but I'll be darned if I can find anything that might be termed a dataflow framework.}
2913
2914 The Whirlwind compiler contains the dataflow framework implemented
2915 by \citet{lerner-grove-chambers:2002}, who were the first to
2916 interleave analysis and transformation.
2917 Their implementation is much like our early efforts:
2918 it is a complicated mix of code that simultaneously manages interleaving,
2919 deep rewriting, and fixed-point computation.
2920 By~separating these tasks,
2921 our implementation simplifies the problem dramatically.
2922 Whirlwind's implementation also suffers from the difficulty of
2923 maintaining pointer invariants in a mutable representation of
2924 control-flow graphs, a problem we have discussed elsewhere
2925 \cite{ramsey-dias:applicative-flow-graph}.
2926
2927 Because speculative transformation is difficult in an imperative setting,
2928 Whirlwind's implementation is split into two phases.
2929 The first phase runs the interleaved analyses and transformations
2930 to compute the final dataflow facts and a representation of the transformations
2931 that should be applied to the input graph.
2932 The second phase executes the transformations.
2933 In~\hoopl, because control-flow graphs are immutable, speculative transformations
2934 can be applied immediately, and there is no need
2935 for a phase distinction.
2936
2937 %%% % repetitious...
2938 %%%
2939 %%% \ourlib\ also improves upon Whirlwind's dataflow framework by providing
2940 %%% new support for the optimization writer:
2941 %%% \begin{itemize}
2942 %%% \item Using static type guarantees, \hoopl\ rules out a whole
2943 %%% class of possible bugs: transformations that produced malformed
2944 %%% control-flow graphs.
2945 %%% \item Using dynamic testing,
2946 %%% we can isolate the rewrite that transforms a working program
2947 %%% into a faulty program,
2948 %%% using Whalley's \citeyearpar{whalley:isolation} fault-isolation technique.
2949 %%% \end{itemize}
2950
2951 %% what follows now looks redundant with discussion below ---NR
2952
2953 %% In previous work \cite{ramsey-dias:applicative-flow-graph}, we
2954 %% described a zipper-based representation of control-flow
2955 %% graphs, stressing the advantages
2956 %% of immutability.
2957 %% Our new representation, described in \secref{graph-rep}, is a significant improvement:
2958 %% \begin{itemize}
2959 %% \item
2960 %% We can concatenate nodes, blocks, and graphs in constant time.
2961 %% %Previously, we had to resort to Hughes's
2962 %% %\citeyearpar{hughes:lists-representation:article} technique, representing
2963 %% %a graph as a function.
2964 %% \item
2965 %% We can do a backward analysis without having
2966 %% to ``unzip'' (and allocate a copy of) each block.
2967 %% \item
2968 %% Using GADTs, we can represent a flow-graph
2969 %% node using a single type, instead of the triple of first, middle, and
2970 %% last types used in our earlier representation.
2971 %% This change simplifies the interface significantly:
2972 %% instead of providing three transfer functions and three rewrite
2973 %% functions per pass---one for
2974 %% each type of node---a client of \hoopl\ provides only one transfer
2975 %% function and one rewrite function per pass.
2976 %% \item
2977 %% Errors in concatenation are ruled out at
2978 %% compile-compile time by Haskell's static
2979 %% type system.
2980 %% In~earlier implementations, such errors were not detected until
2981 %% the compiler~ran, at which point we tried to compensate
2982 %% for the errors---but
2983 %% the compensation code harbored subtle faults,
2984 %% which we discovered while developing a new back end
2985 %% for the Glasgow Haskell Compiler.
2986 %% \end{itemize}
2987 %%
2988 %% The implementation of \ourlib\ is also much better than
2989 %% our earlier implementations.
2990 %% Not only is the code simpler conceptually,
2991 %% but it is also shorter:
2992 %% our new implementation is about a third as long
2993 %% as the previous version, which is part of GHC, version 6.12.
2994
2995
2996
2997
2998 \section{Performance considerations}
2999
3000 Our work on \hoopl\ is too new for us to be able to say much
3001 about performance.
3002 It's~important to know how well \hoopl\ performs, but the
3003 question is comparative, and there isn't another library we can compare
3004 \hoopl\ with.
3005 For example, \hoopl\ is not a drop-in replacement for an existing
3006 component of GHC; we introduced \hoopl\ to GHC as part of a
3007 major refactoring of GHC's back end.
3008 The version of GHC with \hoopl\ seems about 15\%~slower than
3009 the previous version, and we
3010 don't know what portion of the slowdown can be attributed to the
3011 optimizer.
3012 %
3013 What we can say is that the costs of using \hoopl\ seem reasonable;
3014 there is no ``big performance hit.''
3015 And~a somewhat similar library, written in an \emph{impure} functional
3016 language, actually improved performance in an apples-to-apples
3017 comparison with a library using a mutable control-flow graph
3018 \cite{ramsey-dias:applicative-flow-graph}.
3019
3020 Although a~thorough evaluation of \hoopl's performance must await
3021 future work, we can identify some design decisions that affect
3022 performance.
3023 \begin{itemize}
3024 \item
3025 In \figref{graph}, we show a single concatenation operator for blocks.
3026 Using this representation, a block of $N$~nodes is represented using
3027 $2N-1$ heap objects.
We~have also implemented a representation of blocks that includes
3029 ``cons-like'' and ``snoc-like'' constructors;
3030 this representation requires only $N+1$ heap objects.
3031 We~don't know what difference this choice makes to performance.
3032 \item
3033 In \secref{engine}, the @body@ function analyzes and (speculatively)
3034 rewrites the body of a control-flow graph, and @fixpoint@ iterates
3035 this analysis until it reaches a fixed point.
3036 Decorated graphs computed on earlier iterations are thrown away.
3037 But~for each decorated graph of $N$~nodes, it is necessary to allocate
3038 at least $2N-1$~thunks, which correspond to applications of
3039 @singletonDG@ in~@node@ and of @dgSplice@ in~@cat@.
3040 In~an earlier version of \hoopl, this overhead was
3041 eliminated by splitting @arfGraph@ into two very similar functions: one to compute the
3042 fixed point, and the other to produce the rewritten graph.
3043 Having a single version of @arfGraph@ is simpler and easier
3044 to maintain, but we don't know the cost of allocating the
3045 additional thunks.
3046 \item
3047 The representation of a forward-transfer function is private to
3048 \hoopl.
Two representations are possible, and both are sketched after this
list:
we may store a triple of functions, one for each shape a node may
have, or we may store a single, polymorphic function.
3053 If~we use triples throughout, the costs are straightforward, but the
3054 code is complex.
3055 If~we use a single, polymorphic function, we sometimes have to use a
3056 \emph{shape classifier} (supplied by the client) when composing
3057 transfer functions.
3058 Using a shape classifier may introduce extra @case@ discriminations
3059 every time a transfer function or rewrite function is applied to a
3060 node.
3061 We~don't know how these extra discriminations might affect
3062 performance.
3063 \end{itemize}
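For concreteness, the two candidate representations of a
forward-transfer function can be sketched as follows; the
constructor names are hypothetical, since the real definitions are
private to \hoopl:
\begin{smallcode}
-- One shape-polymorphic function:
newtype FwdTransfer n f
  = FwdTransfer (forall e x . n e x -> f -> Fact x f)

-- One function per shape; because Fact O f = f and
-- Fact C f = FactBase f, the result types can be written out:
newtype FwdTransfer3 n f
  = FwdTransfer3 ( n C O -> f -> f
                 , n O O -> f -> f
                 , n O C -> f -> FactBase f )
\end{smallcode}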
3064 In summary, \hoopl\ performs well enough for use in~GHC,
3065 but there is much we don't know. Systematic investigation
3066 is indicated.
3067
3068
3069
3070 \section{Discussion}
3071
3072 We built \hoopl\ in order to combine three good ideas (interleaved
3073 analysis and transformation, optimization fuel, and an applicative
3074 control-flow graph) in a way that could easily be reused by many, many
3075 compiler writers.
3076 To~evaluate how well we succeeded, we examine how \hoopl\ has been
3077 used,
3078 we~examine the API, and we examine the implementation.
3079
3080 \paragraph{Using \hoopl}
3081
3082 As~suggested by the constant-propagation example in
3083 \figref{const-prop}, \hoopl\ makes it easy to implement many standard
3084 dataflow analyses.
3085 Students using \hoopl\ in a class at Tufts were able to implement
3086 such optimizations as lazy code motion \cite{knoop:lazy-code-motion}
3087 and induction-variable elimination
3088 \cite{cocke-kennedy:operator-strength-reduction} in just a few weeks.
3089 Students at Yale and at Portland State have also succeeded in building
3090 a variety of optimizations.
3091
3092 \hoopl's data types can support optimizations beyond classic
3093 dataflow.
3094 For example, within GHC, \hoopl's graphs are used
3095 to implement optimizations based on control flow,
3096 such as eliminating branch chains.
3097
3098 \hoopl\ has not yet been used to implement SSA optimizations.
3099 \hoopl\ is SSA-neutral:
3100 although we know of no attempt to use
3101 \hoopl\ to establish or enforce SSA~invariants,
3102 \hoopl\ makes it easy to include $\phi$-functions in the
3103 representation of first nodes,
3104 and if a transformation preserves SSA~invariants, it will continue to do
3105 so when implemented in \hoopl.
3106
3107 \paragraph{Examining the API}
3108
We hope our presentation of the API in \secref{api} speaks for
3110 itself,
3111 but there are a couple of properties we think are worth highlighting.
3112 First, it's a good sign that the API provides many higher-order
3113 combinators that make it easier to write client code. % with simple, expressive types.
3114 We~have had space to mention only a few:
3115 @extendJoinDomain@,
3116 @thenFwdRw@, @deepFwdRw@, @shallowFwdRw@, and @pairFwd@.
3117
3118 Second,
3119 the static encoding of open and closed shapes at compile time worked
3120 out well.
3121 % especially because it applies equally to nodes, blocks, and graphs.
3122 Shapes may
3123 seem like a small refinement, but they helped eliminate a number of
3124 bugs from GHC, and we expect them to help other clients too.
3125 GADTs are a convenient way to express shapes, and for clients
3126 written in Haskell, they are clearly appropriate.
3127 If~one wished to port \hoopl\ to a language without GADTs,
3128 many of the benefits could be realized by making the shapes phantom
3129 types, but without GADTs, pattern matching would be significantly more
3130 tedious and error-prone.
3131
3132
3133 % An~advantage of our ``shapely'' node API is that a client can
3134 % write a \emph{single} transfer function that is polymorphic in shape.
3135 % To~make this design work, however, we \emph{must} have
3136 % the type-level function
3137 % @Fact@ (\figref{api-types}), to express how incoming
3138 % and outgoing facts depend on the shape of a node.
3139 % Without type-level functions, we would have had to force clients to
3140 % use \emph{only} the triple-of-functions interface described in
3141 % \secref{triple-of-functions}.
3142
3143 \paragraph{Examining the implementation}
3144
3145 If you are thinking of adopting \hoopl, you had better consider not
3146 only whether you like the API, but whether, in case of emergency, you
3147 could maintain the implementation.
3148 We~believe that \secref{dfengine} sketches enough to show that \hoopl's
3149 implementation is a clear improvement over previous implementations
3150 of similar ideas.
3151 % The implementation is more difficult to evaluate than the~API.
3152 % Previous implementations of similar ideas have rolled the problems
3153 % into a big ball of mud.
3154 By~decomposing our implementation into @node@, @block@, @body@,
3155 @graph@, @cat@, @withFuel@, and @fixpoint@, we have clearly separated
3156 multiple concerns:
3157 interleaving analysis with rewriting,
3158 throttling rewriting using optimization fuel,
3159 and
3160 computing a fixed point using speculative rewriting.
3161 Because of this separation of concerns,
3162 we believe our implementation will be much easier to maintain than
3163 anything that preceded it.
3164
3165 Another good sign is that we have been able to make substantial
3166 changes in the implementation without changing the API.
3167 For example, in addition to ``@concatMap@ style,'' we have also
3168 implemented @arfGraph@ in ``fold style'' and in continuation-passing
3169 style.
3170 Which style is preferred is a matter of taste, and possibly
3171 a matter of performance.
3172
3173
3174
3175
3176 \iffalse
3177
3178 (We have also implemented a ``fold style,'' but because the types are
3179 a little less intuitive, we are sticking with @concatMap@ style for now.)
3180
3181
3182 > Some numbers, I have used it nine times, and would need the general fold
3183 > once to define blockToNodeList (or CB* equivalent suggested by you).
3184 > (We are using it in GHC to
3185 > - computing hash of the blocks from the nodes
3186 > - finding the last node of a block
3187 > - converting block to the old representation (2x)
3188 > - computing interference graph
3189 > - counting Area used by a block (2x)
3190 > - counting stack high-water mark for a block
3191 > - prettyprinting block)
3192
3193
3194 type-parameter hell, newtype hell, typechecking hell, instance hell,
3195 triple hell
3196
3197 \fi
3198
3199
3200 % We have spent six years implementing and reimplementing frameworks for
3201 % dataflow analysis and transformation.
3202 % This formidable design problem taught us
3203 % two kinds of lessons:
3204 % we learned some very specific lessons about representations and
3205 % algorithms for optimizing compilers,
3206 % and we were forcibly reminded of some very general, old lessons that are well
3207 % known not just to functional programmers, but to programmers
3208 % everywhere.
3209
3210
3211
3212 %%%% \remark{Orphaned: but for transfer functions that
3213 %%%% approximate weakest preconditions or strongest postconditions,
3214 %%%% monotonicity falls out naturally.}
3215 %%%%
3216 %%%%
3217 %%%% In conclusion we offer the following lessons from the experience of designing
3218 %%%% and implementing \ourlib{}.
3219 %%%% \begin{itemize}
3220 %%%% \item
3221 %%%% Although we have not stressed this point, there is a close connection
3222 %%%% between dataflow analyses and program logic:
3223 %%%% \begin{itemize}
3224 %%%% \item
3225 %%%% A forward dataflow analysis is represented by a predicate transformer
3226 %%%% that is related to \emph{strongest postconditions}
3227 %%%% \cite{floyd:meaning}.\footnote
3228 %%%% {In Floyd's paper the output of the predicate transformer is called
3229 %%%% the \emph{strongest verifiable consequent}, not the ``strongest
3230 %%%% postcondition.''}
3231 %%%% \item
3232 %%%% A backward dataflow analysis is represented by a predicate transformer
3233 %%%% that is related to \emph{weakest preconditions} \cite{dijkstra:discipline}.
3234 %%%% \end{itemize}
3235 %%%% Logicians write down the predicate transformers for the primitive
3236 %%%% program fragments, and then use compositional rules to ``lift'' them
3237 %%%% to a logic for whole programs. In the same way \ourlib{} lets the client
3238 %%%% write simple predicate transformers,
3239 %%%% and local rewrites based on those assertions, and ``lifts'' them to entire
3240 %%%% function bodies with arbitrary control flow.
3241
3242 \iffalse
3243
3244
3245 Reuse requires abstraction, and as is well known,
3246 designing good abstractions is challenging.
3247 \hoopl's data types and the functions over those types have been
3248 through {dozens} of revisions.
3249 \remark{dozens alert}
3250 As~we were refining our design, we~found it invaluable to operate in
3251 two modes:
3252 In the first mode, we designed, built, and used a framework as an
3253 important component of a real compiler (first Quick~{\PAL}, then GHC).
3254 In the second mode, we designed and built a standalone library, then
3255 redesigned and rebuilt it, sometimes going through several significant
3256 changes in a week.
3257 Operating in the first mode---inside a live compiler---forced us to
3258 make sure that no corners were cut, that we were solving a real
3259 problem, and that we did not inadvertently
3260 cripple some other part of the compiler.
3261 Operating in the second mode---as a standalone library---enabled us to
3262 iterate furiously, trying out many more ideas than would have
3263 been possible in the first mode.
3264 Alternating between these two modes has led to a
3265 better design than operating in either mode alone.
3266
3267 %% We were forcibly reminded of timeless truths:
3268 It is a truth universally acknowledged that
3269 interfaces are more important than implementations and data
3270 is more important than code.
3271 This truth is reflected in this paper, in which
3272 we
3273 have given \hoopl's API three times as much space as \hoopl's implementation.
3274
3275 We have evaluate \hoopl's API through small, focused classroom
3276 projects and by using \hoopl\ in the back end of the Glasgow Haskell
3277 Compiler.
3278
3279
3280
3281 We were also reminded that Haskell's type system (polymorphism, GADTs,
3282 higher-order functions, type classes, and so on) is a remarkably
3283 effective
3284 language for thinking about data and code---and that
3285 Haskell lacks a language of interfaces (like ML's signatures) that
3286 would make it equally effective for thinking about APIs at a larger scale.
3287 Still, as usual, the types were a remarkable aid to writing the code:
3288 when we finally agreed on the types presented above, the
3289 code almost wrote itself.
3290
3291 Types are widely appreciated at ICFP, but here are three specific
3292 examples of how types helped us:
3293 \begin{itemize}
3294 \item
3295 Reuse is enabled by representation-independence, which in a functional
3296 language is
3297 expressed through parametric polymorphism.
3298 Making \ourlib{} polymorphic in the nodes
3299 made the code simpler, easier to understand, and easier to maintain.
3300 In particular, it forced us to make explicit \emph{exactly} what
3301 \ourlib\ must know about nodes, and to embody that knowledge in the
3302 @NonLocal@ type class (\secref{nonlocal-class}).
3303 \item
3304 \remark{too much? Simon: better?}
3305 %
3306 % this paper is just not about run-time performance ---NR
3307 %
3308 %%%% Moreover, the implementation is faster than it would otherwise be,
3309 %%%% because, say, a @(Fact O f)e@ is known to be just an @f@ rather than
3310 %%%% being a sum type that must be tested (with a statically known outcome!).
3311 %
3312 Giving the \emph{same} shapes
3313 to nodes, blocks, and graphs helped our
3314 thinking and helped to structure the implementation.
3315 \item
3316 \end{itemize}
3317
3318 \fi
3319
3320 \paragraph{Final remarks}
3321
3322 Dataflow optimization is usually described as a way to improve imperative
3323 programs by mutating control-flow graphs.
3324 Such transformations appear very different from the tree rewriting
3325 that functional languages are so well known for, and that makes
3326 \ifhaskellworkshop
3327 Haskell
3328 \else
3329 functional languages
3330 \fi
3331 so attractive for writing other parts of compilers.
3332 But even though dataflow optimization looks very different from
3333 what we are used to,
3334 writing a dataflow optimizer
3335 in
3336 \ifhaskellworkshop
3337 Haskell
3338 \else
3339 a pure functional language
3340 \fi
3341 was a win:
3342 % We could not possibly have conceived \ourlib{} in C++.
3343 we had to make every input and output explicit,
3344 and we had a strong incentive to implement things compositionally.
3345 Using Haskell helped us make real improvements in the implementation of
3346 some very sophisticated ideas.
3347 % %%
3348 % %%
3349 % %% In~a pure functional language, not only do we know that
3350 % %% no data structure will be unexpectedly mutated,
3351 % %% but we are forced to be
3352 % %% explicit about every input and output,
3353 % %% and we are encouraged to implement things compositionally.
3354 % %% This kind of thinking has helped us make
3355 % %% significant improvements to the already tricky work of Lerner, Grove,
3356 % %% and Chambers:
3357 % %% per-function control of shallow vs deep rewriting
3358 % %% (\secref{shallow-vs-deep}),
3359 % %% optimization fuel (\secref{fuel}),
3360 % %% and transparent management of unreachable blocks (\secref{arf-body}).
3361 % We~trust that the improvements are right only because they are
3362 % implemented in separate
3363 % parts of the code that cannot interact except through
3364 % explicit function calls.
3365 % %% %%
3366 % %% %%An ancestor of \ourlib{} is in the Glasgow Haskell Compiler today,
3367 % %% %%in version~6.12.
3368 % %% With this new, improved design in hand, we are now moving back to
3369 % %% live-compiler mode, pushing \hoopl\ into version
3370 % %% 6.13 of the Glasgow Haskell Compiler.
3371
3372
3373 \acks
3374
3375 Brian Huffman and Graham Hutton helped with algebraic laws.
3376 Several anonymous reviewers helped improve the
3377 presentation, especially
3378 reviewer~C, who suggested better language with which to describe our
3379 work.
3380
3381 The first and second authors were funded
3382 by a grant from Intel Corporation and
3383 by NSF awards CCF-0838899 and CCF-0311482.
3384 These authors also thank Microsoft Research Ltd, UK, for funding
3385 extended visits to the third author.
3386
3387
3388 \makeatother
3389
3390 \providecommand\includeftpref{\relax} %% total bafflement -- workaround
3391 \IfFileExists{nrbib.tex}{\bibliography{cs,ramsey}}{\bibliography{dfopt}}
3392 \bibliographystyle{plainnatx}
3393
3394
3395 \clearpage
3396
3397 \appendix
3398
3399 % omit LabelSet :: *
3400 % omit LabelMap :: *
3401 % omit delFromFactBase :: FactBase f -> [(Label,f)] -> FactBase f
3402 % omit elemFactBase :: Label -> FactBase f -> Bool
3403 % omit elemLabelSet :: Label -> LabelSet -> Bool
3404 % omit emptyLabelSet :: LabelSet
3405 % omit factBaseLabels :: FactBase f -> [Label]
3406 % omit extendFactBase :: FactBase f -> Label -> f -> FactBase f
3407 % omit extendLabelSet :: LabelSet -> Label -> LabelSet
3408 % omit lookupFact :: FactBase f -> Label -> Maybe f
3409 % omit factBaseList :: FactBase f -> [(Label, f)]
3410
3411 %% \section{Code for \textmd{\texttt{fixpoint}}}
3412 %% \label{app:fixpoint}
3413 %%
3414 %% {\def\baselinestretch{0.95}\hfuzz=20pt
3415 %% \begin{smallcode}
3416 %% data `TxFactBase n f
3417 %% = `TxFB { `tfb_fbase :: FactBase f
3418 %% , `tfb_rg :: DG n f C C -- Transformed blocks
3419 %% , `tfb_cha :: ChangeFlag
3420 %% , `tfb_lbls :: LabelSet }
3421 %% -- Set the tfb_cha flag iff
3422 %% -- (a) the fact in tfb_fbase for or a block L changes
3423 %% -- (b) L is in tfb_lbls.
3424 %% -- The tfb_lbls are all Labels of the *original*
3425 %% -- (not transformed) blocks
3426 %%
3427 %% `updateFact :: DataflowLattice f -> LabelSet -> (Label, f)
3428 %% -> (ChangeFlag, FactBase f)
3429 %% -> (ChangeFlag, FactBase f)
3430 %% updateFact ^lat ^lbls (lbl, ^new_fact) (^cha, fbase)
3431 %% | NoChange <- ^cha2 = (cha, fbase)
3432 %% | lbl `elemLabelSet` lbls = (SomeChange, new_fbase)
3433 %% | otherwise = (cha, new_fbase)
3434 %% where
3435 %% (cha2, ^res_fact)
3436 %% = case lookupFact fbase lbl of
3437 %% Nothing -> (SomeChange, new_fact)
3438 %% Just ^old_fact -> fact_extend lat old_fact new_fact
3439 %% ^new_fbase = extendFactBase fbase lbl res_fact
3440 %%
3441 %% fixpoint :: forall n f. NonLocal n
3442 %% => Bool -- Going forwards?
3443 %% -> DataflowLattice f
3444 %% -> (Block n C C -> FactBase f
3445 %% -> FuelMonad (DG n f C C, FactBase f))
3446 %% -> FactBase f -> [(Label, Block n C C)]
3447 %% -> FuelMonad (DG n f C C, FactBase f)
3448 %% fixpoint ^is_fwd lat ^do_block ^init_fbase ^blocks
3449 %% = do { ^fuel <- getFuel
3450 %% ; ^tx_fb <- loop fuel init_fbase
3451 %% ; return (tfb_rg tx_fb,
3452 %% tfb_fbase tx_fb `delFromFactBase` blocks) }
3453 %% -- The outgoing FactBase contains facts only for
3454 %% -- Labels *not* in the blocks of the graph
3455 %% where
3456 %% `tx_blocks :: [(Label, Block n C C)]
3457 %% -> TxFactBase n f -> FuelMonad (TxFactBase n f)
3458 %% tx_blocks [] tx_fb = return tx_fb
3459 %% tx_blocks ((lbl,blk):bs) tx_fb = tx_block lbl blk tx_fb
3460 %% >>= tx_blocks bs
3461 %%
3462 %% `tx_block :: Label -> Block n C C
3463 %% -> TxFactBase n f -> FuelMonad (TxFactBase n f)
3464 %% tx_block ^lbl ^blk tx_fb@(TxFB { tfb_fbase = fbase
3465 %% , tfb_lbls = lbls
3466 %% , tfb_rg = ^blks
3467 %% , tfb_cha = cha })
3468 %% | is_fwd && not (lbl `elemFactBase` fbase)
3469 %% = return tx_fb -- Note [Unreachable blocks]
3470 %% | otherwise
3471 %% = do { (rg, ^out_facts) <- do_block blk fbase
3472 %% ; let (^cha', ^fbase')
3473 %% = foldr (updateFact lat lbls) (cha,fbase)
3474 %% (factBaseList out_facts)
3475 %% ; return (TxFB { tfb_lbls = extendLabelSet lbls lbl
3476 %% , tfb_rg = rg `DGCatC` blks
3477 %% , tfb_fbase = fbase'
3478 %% , tfb_cha = cha' }) }
3479 %%
3480 %% loop :: Fuel -> FactBase f -> FuelMonad (TxFactBase n f)
3481 %% `loop fuel fbase
3482 %% = do { let ^init_tx_fb = TxFB { tfb_fbase = fbase
3483 %% , tfb_cha = NoChange
3484 %% , tfb_rg = DGNil
3485 %% , tfb_lbls = emptyLabelSet}
3486 %% ; tx_fb <- tx_blocks blocks init_tx_fb
3487 %% ; case tfb_cha tx_fb of
3488 %% NoChange -> return tx_fb
3489 %% SomeChange -> setFuel fuel >>
3490 %% loop fuel (tfb_fbase tx_fb) }
3491 %% \end{smallcode}
3492 %% \par
3493 %% } % end \baselinestretch
3494
3495
3496 \section{Index of defined identifiers}
3497
3498 This appendix lists every nontrivial identifier used in the body of
3499 the paper.
3500 For each identifier, we list the page on which that identifier is
3501 defined or discussed---or when appropriate, the figure (with line
3502 number where possible).
3503 For those few identifiers not defined or discussed in text, we give
3504 the type signature and the page on which the identifier is first
3505 referred to.
3506
3507 Some identifiers used in the text are defined in the Haskell Prelude;
3508 for those readers less familiar with Haskell (possible even at the
3509 Haskell Symposium!), these identifiers are
3510 listed in Appendix~\ref{sec:prelude}.
3511
3512 \newcommand\dropit[3][]{}
3513
3514 \newcommand\hsprelude[2]{\noindent
3515 \texttt{#1} defined in the Haskell Prelude\\}
3516 \let\hsprelude\dropit
3517
3518 \newcommand\hspagedef[3][]{\noindent
3519 \texttt{#2} defined on page~\pageref{#3}.\\}
3520 \newcommand\omithspagedef[3][]{\noindent
3521 \texttt{#2} not shown (but see page~\pageref{#3}).\\}
3522 \newcommand\omithsfigdef[3][]{\noindent
3523 \texttt{#2} not shown (but see Figure~\ref{#3} on page~\pageref{#3}).\\}
3524 \newcommand\hsfigdef[3][]{%
3525 \noindent
3526 \ifx!#1!%
3527 \texttt{#2} defined in Figure~\ref{#3} on page~\pageref{#3}.\\
3528 \else
3529 \texttt{#2} defined on \lineref{#1} of Figure~\ref{#3} on page~\pageref{#3}.\\
3530 \fi
3531 }
3532 \newcommand\hstabdef[3][]{%
3533 \noindent
3534 \ifx!#1!
3535 \texttt{#2} defined in Table~\ref{#3} on page~\pageref{#3}.\\
3536 \else
3537 \texttt{#2} defined on \lineref{#1} of Table~\ref{#3} on page~\pageref{#3}.\\
3538 \fi
3539 }
3540 \newcommand\hspagedefll[3][]{\noindent
3541 \texttt{#2} {let}- or $\lambda$-bound on page~\pageref{#3}.\\}
3542 \newcommand\hsfigdefll[3][]{%
3543 \noindent
3544 \ifx!#1!%
3545 \texttt{#2} {let}- or $\lambda$-bound in Figure~\ref{#3} on page~\pageref{#3}.\\
3546 \else
3547 \texttt{#2} {let}- or $\lambda$-bound on \lineref{#1} of Figure~\ref{#3} on page~\pageref{#3}.\\
3548 \fi
3549 }
3550
3551 \newcommand\nothspagedef[3][]{\notdefd\ndpage{#1}{#2}{#3}}
3552 \newcommand\nothsfigdef[3][]{\notdefd\ndfig{#1}{#2}{#3}}
3553 \newcommand\nothslinedef[3][]{\notdefd\ndline{#1}{#2}{#3}}
3554
3555 \newcommand\ndpage[3]{\texttt{#2}~(p\pageref{#3})}
3556 \newcommand\ndfig[3]{\texttt{#2}~(Fig~\ref{#3},~p\pageref{#3})}
3557 \newcommand\ndline[3]{%
3558 \ifx!#1!%
3559 \ndfig{#1}{#2}{#3}%
3560 \else
3561 \texttt{#2}~(Fig~\ref{#3}, line~\lineref{#1}, p\pageref{#3})%
3562 \fi
3563 }
3564
3565 \newif\ifundefinedsection\undefinedsectionfalse
3566
3567 \newcommand\notdefd[4]{%
3568 \ifundefinedsection
3569 , #1{#2}{#3}{#4}%
3570 \else
3571 \undefinedsectiontrue
3572 \par
3573 \section{Undefined identifiers}
3574 #1{#2}{#3}{#4}%
3575 \fi
3576 }
3577
3578 \begingroup
3579 \raggedright
3580
3581 \input{defuse}%
3582 \ifundefinedsection.\fi
3583
3584 \undefinedsectionfalse
3585
3586
3587 \renewcommand\hsprelude[2]{\noindent
3588 \ifundefinedsection
3589 , \texttt{#1}%
3590 \else
3591 \undefinedsectiontrue
3592 \par
3593 \section{Identifiers defined in Haskell Prelude or a standard library}\label{sec:prelude}
3594 \texttt{#1}%
3595 \fi
3596 }
3597 \let\hspagedef\dropit
3598 \let\omithspagedef\dropit
3599 \let\omithsfigdef\dropit
3600 \let\hsfigdef\dropit
3601 \let\hstabdef\dropit
3602 \let\hspagedefll\dropit
3603 \let\hsfigdefll\dropit
3604 \let\nothspagedef\dropit
3605 \let\nothsfigdef\dropit
3606 \let\nothslinedef\dropit
3607
3608 \input{defuse}
3609 \ifundefinedsection.\fi
3610
3611
3612
3613 \endgroup
3614
3615
3637
3638
3639
3640 \end{document}
3641
3642
3643
3644
3645 THE FUEL PROBLEM:
3646
3647
3648 Here is the problem:
3649
3650 A graph has an entry sequence, a body, and an exit sequence.
3651 Correctly computing facts on and flowing out of the body requires
3652 iteration; computation on the entry and exit sequences do not, since
3653 each is connected to the body by exactly one flow edge.
3654
3655 The problem is to provide the correct fuel supply to the combined
3656 analysis/rewrite (iterator) functions, so that speculative rewriting
3657 is limited by the fuel supply.
3658
3659 I will number iterations from 1 and name the fuel supplies as
3660 follows:
3661
3662 f_pre fuel remaining before analysis/rewriting starts
3663 f_0 fuel remaining after analysis/rewriting of the entry sequence
3664 f_i, i>0 fuel remaining after iteration i of the body
3665 f_post fuel remaining after analysis/rewriting of the exit sequence
3666
3667 The issue here is that only the last iteration of the body 'counts'.
3668 To formalize, I will name fuel consumed:
3669
3670 C_pre fuel consumed by speculative rewrites in entry sequence
3671 C_i fuel consumed by speculative rewrites in iteration i of body
3672 C_post fuel consumed by speculative rewrites in exit sequence
3673
3674 These quantities should be related as follows:
3675
f_0 = f_pre - C_pre
3677 f_i = f_0 - C_i where i > 0
3678 f_post = f_n - C_post where iteration converges after n steps
3679
3680 When the fuel supply is passed explicitly as parameter and result, it
3681 is fairly easy to see how to keep reusing f_0 at every iteration, then
3682 extract f_n for use before the exit sequence. It is not obvious to me
3683 how to do it cleanly using the fuel monad.
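One possible shape for the code (a sketch; 'once' makes a single
pass over the body and reports whether any fact changed):

  iterateBody once fbase0
    = do { f_0 <- getFuel                -- after entry sequence
         ; let loop fbase
                 = do { setFuel f_0      -- restart from f_0
                      ; (fbase', cha) <- once fbase
                      ; case cha of
                          NoChange   -> return fbase'  -- fuel = f_n
                          SomeChange -> loop fbase' }
         ; loop fbase0 }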
3684
3685
3686 Norman