\iffalse

Payload: alternatives for functions polymorphic in node type

  (n C O -> a, n C O -> b, n C O -> c)

vs

  (forall e x . n e x -> ShapeTag e x, forall e x . n e x -> result e x)

where

  data ShapeTag e x where
    First  :: ShapeTag C O
    Middle :: ShapeTag O O
    Last   :: ShapeTag O C

  result C O = a
  result O O = b
  result O C = c

The function returning ShapeTag can be passed in a type-class
dictionary.




Say something about the proliferation of heavyweight type signatures
required for GADT pattern matches. When the signature is three times
the size of the function, something is wrong...


I'm going to leave this point out, because in all the client code
we've written, we are (a) matching only on a node, and (b) matching on
all cases. In this scenario the GADT exhaustiveness checker provides
no additional benefits. Indeed, GADTs can be an irritant to the
client: in many pattern matches, GADTs make it impossible to default
cases.


I was thinking again about unwrapped nodes, cons-lists, snoc-lists,
and tree fringe. I think there's an analogy with ordinary lists, and
the analogy is 'concatMap' vs 'fold'. Just as with lists, the common
case of 'concatMap (\a -> [a])' does a lot of unnecessary wrapping and
unwrapping of elements. We nevertheless prefer 'concatMap' because it
provides better separation of concerns. But this analogy suggests
several ideas:

  - Can block processing be written in higher-order fashion?

  - Can it be done in both styles (concatMap *and* fold)?

  - Once we have framed the problem in these terms, can we write
    fold-style code that is not too hard to understand and maintain?

  - Finally, is there an analog of stream fusion that would work for
    control-flow graphs, enabling us to write some graph combinators
    that are both perspicuous and efficient?

These are good observations for the paper and for future work.


----------------------------------------------------------------


P.S. The three of us should have a nice little Skype chat about
higher-rank types. A lot of the questions still at issue boil down to
the following observations:

  - It's really convenient for the *implementation* of Hoopl to put
    forall to the left of an arrow. Polymorphic functions as
    arguments are powerful and keen, and they make it really easy for
    Hoopl to do its job.

  - It's a bit inconvenient for a *client* of Hoopl to be forced to
    supply functions that are polymorphic in shape. All constructors
    have to be written out by hand; defaulting is not possible. (Or
    at least John and I haven't figured out how to do it.)

  - Using higher-rank types in Hoopl's interface makes some very
    desirable things downright impossible:

      - One can't write a generic dominator analysis with type

          dom :: NonLocal n => FwdPass n Dominators

      - One can't write a generic debugging wrapper of type

          debug :: (Show n, Show f)
                => TraceFn -> WhatToTrace -> FwdPass n f -> FwdPass n f

      - One can't write a combinator of type

          product :: FwdPass n f -> FwdPass n f' -> FwdPass n (f, f')

I submit that these things are all very desirable.

I'm therefore strongly in favor of removing the *requirement* that a
FwdPass include transfer and rewrite functions that are polymorphic in
shape. It will be a bit tedious to return to triples of functions,
but for those lucky clients who *can* write polymorphic functions and
who wish to, we can provide injection functions.

I should stress that I believe the tedium can be contained within
reasonable bounds---for example, the arfXXX functions that are
internal to Hoopl all remain polymorphic in shape (although not
higher-rank any longer).

\fi


%
% TODO
%
%
% AGraph = graph under construction
% Graph  = replacement graph
%
%%% Hoopl assumes that the replacement graph for a node open at the exit
%%% doesn't contain any additional exits
%%%
%%% introduce replacement graph in section 3, after graph.
%%% it has arbitrarily complicated internal control flow, but only
%%% one exit (if open at the exit)
%%%
%%% a requirement on the client that is not checked statically.


\input{dfoptdu.tex}

\newif\ifpagetuning \pagetuningtrue    % adjust page breaks

\newif\ifnoauthornotes \noauthornotesfalse

\newif\iftimestamp\timestamptrue   % show MD5 stamp of paper

% \timestampfalse  % it's post-submission time

\IfFileExists{timestamp.tex}{}{\timestampfalse}

\newif\ifcutting \cuttingfalse   % cutting down to submission size


\newif\ifgenkill\genkillfalse   % have a section on gen and kill
\genkillfalse


\newif\ifnotesinmargin \notesinmarginfalse
\IfFileExists{notesinmargin.tex}{\notesinmargintrue}{\relax}

\documentclass[blockstyle,preprint,natbib,nocopyrightspace]{sigplanconf}

\newcommand\ourlib{Hoopl}
% higher-order optimization library
% ('Hoople' was taken -- see hoople.org)
\let\hoopl\ourlib

\newcommand\ag{\ensuremath{\mathit{ag}}}
\renewcommand\ag{\ensuremath{g}} % not really seeing that 'ag' is helpful here ---NR
\newcommand\rw{\ensuremath{\mathit{rw}}}

% l2h substitution ourlib Hoopl
% l2h substitution hoopl Hoopl

\newcommand\fs{\ensuremath{\mathit{fs}}} % dataflow facts, possibly plural

\newcommand\vfilbreak[1][\baselineskip]{%
  \vskip 0pt plus #1 \penalty -200 \vskip 0pt plus -#1 }

\usepackage{alltt}
\usepackage{array}
\usepackage{afterpage}
\newcommand\lbr{\char`\{}
\newcommand\rbr{\char`\}}

\clubpenalty=10000
\widowpenalty=10000

\usepackage{verbatim} % allows one to define \begin{smallcode}
\newenvironment{smallcode}{\par\unskip\small\verbatim}{\endverbatim}
\newenvironment{fuzzcode}[1]{\par\unskip\hfuzz=#1 \verbatim}{\endverbatim}
\newenvironment{smallfuzzcode}[1]{\par\unskip\small\hfuzz=#1 \verbatim}{\endverbatim}

\newcommand\smallverbatiminput[1]{{\small\verbatiminput{#1}}}

\newcommand\lineref[1]{line~\ref{line:#1}}
\newcommand\linepairref[2]{lines \ref{line:#1}~and~\ref{line:#2}}
\newcommand\linerangeref[2]{\mbox{lines~\ref{line:#1}--\ref{line:#2}}}
\newcommand\Lineref[1]{Line~\ref{line:#1}}
\newcommand\Linepairref[2]{Lines \ref{line:#1}~and~\ref{line:#2}}
\newcommand\Linerangeref[2]{\mbox{Lines~\ref{line:#1}--\ref{line:#2}}}

\makeatletter

\let\c@table=
\c@figure % one counter for tables and figures, please

\newcommand\setlabel[1]{%
  \setlabel@#1!!\@endsetlabel
}
\def\setlabel@#1!#2!#3\@endsetlabel{%
  \ifx*#1*% line begins with label or is empty
    \ifx*#2*% line is empty
      \verbatim@line{}%
    \else
      \@stripbangs#3\@endsetlabel%
      \label{line:#2}%
    \fi
  \else
    \@stripbangs#1!#2!#3\@endsetlabel%
  \fi
}
\def\@stripbangs#1!!\@endsetlabel{%
  \verbatim@line{#1}%
}


\verbatim@line{hello mama}

\newcommand{\numberedcodebackspace}{0.5\baselineskip}

\newcounter{codeline}
\newenvironment{numberedcode}
  {\endgraf
   \def\verbatim@processline{%
     \noindent
     \expandafter\ifx\expandafter+\the\verbatim@line+ % blank line
       %{\small\textit{\def\rmdefault{cmr}\rmfamily\phantom{00}\phantom{: \,}}}%
     \else
       \refstepcounter{codeline}%
       {\small\textit{\def\rmdefault{cmr}\rmfamily\phantom{00}\llap{\arabic{codeline}}: \,}}%
     \fi
     \expandafter\setlabel\expandafter{\the\verbatim@line}%
     \expandafter\ifx\expandafter+\the\verbatim@line+ % blank line
       \vspace*{-\numberedcodebackspace}\par%
     \else
       \the\verbatim@line\par
     \fi}%
   \verbatim
  }
  {\endverbatim}

\makeatother

\newcommand\arrow{\rightarrow}

\newcommand\join{\sqcup}
\newcommand\slotof[1]{\ensuremath{s_{#1}}}
\newcommand\tempof[1]{\ensuremath{t_{#1}}}
\let\tempOf=\tempof
\let\slotOf=\slotof

\makeatletter
\newcommand{\nrmono}[1]{%
  {\@tempdima = \fontdimen2\font\relax
   \texttt{\spaceskip = 1.1\@tempdima #1}}}
\makeatother

\usepackage{times} % denser fonts
\renewcommand{\ttdefault}{aett} % \texttt that goes better with times fonts
\usepackage{enumerate}
\usepackage{url}
\usepackage{graphicx}
\usepackage{natbib} % redundant for Simon
\bibpunct();A{},
\let\cite\citep
\let\citeyearnopar=\citeyear
\let\citeyear=\citeyearpar

\usepackage[ps2pdf,bookmarksopen,breaklinks,pdftitle=dataflow-made-simple]{hyperref}
\usepackage{breakurl}  % enables \burl

\newcommand\naive{na\"\i ve}
\newcommand\naively{na\"\i vely}
\newcommand\Naive{Na\"\i ve}

\usepackage{amsfonts}
\newcommand\naturals{\ensuremath{\mathbb{N}}}
\newcommand\true{\ensuremath{\mathbf{true}}}
\newcommand\implies{\supseteq} % could use \Rightarrow?

\newcommand\PAL{\mbox{C{\texttt{-{}-}}}}
\newcommand\high[1]{\mbox{\fboxsep=1pt \smash{\fbox{\vrule height 6pt
  depth 0pt width 0pt \leavevmode \kern 1pt #1}}}}

\usepackage{tabularx}

%%
%% 2009/05/10: removed 'float' package because it breaks multiple
%% \caption's per {figure} environment. ---NR
%%
%% % Put figures in boxes --- WHY??? --NR
%% \usepackage{float}
%% \floatstyle{boxed}
%% \restylefloat{figure}
%% \restylefloat{table}



% ON LINE THREE, set \noauthornotestrue to suppress notes (or not)

%\newcommand{\qed}{QED}
\ifnotesinmargin
\long\def\authornote#1{%
  \ifvmode
    \marginpar{\raggedright\hbadness=10000
    \parindent=8pt \parskip=2pt
    \def\baselinestretch{0.8}\tiny
    \itshape\noindent #1\par}%
  \else
    \unskip\raisebox{-3.5pt}{\rlap{$\scriptstyle\diamond$}}%
    \marginpar{\raggedright\hbadness=10000
    \parindent=8pt \parskip=2pt
    \def\baselinestretch{0.8}\tiny
    \itshape\noindent #1\par}%
  \fi}
\else
% Simon: please set \notesinmarginfalse on the first line
\long\def\authornote#1{{\em #1\/}}
\fi
\ifnoauthornotes
\def\authornote#1{\unskip\relax}
\fi

\newcommand{\simon}[1]{\authornote{SLPJ: #1}}
\newcommand{\norman}[1]{\authornote{NR: #1}}
\let\remark\norman
\def\finalremark#1{\relax}
% \let \finalremark \remark % uncomment after submission
\newcommand{\john}[1]{\authornote{JD: #1}}
\newcommand{\todo}[1]{\textbf{To~do:} \emph{#1}}
\newcommand\delendum[1]{\relax\ifvmode\else\unskip\fi\relax}

\newcommand\secref[1]{Section~\ref{sec:#1}}
\newcommand\secreftwo[2]{Sections \ref{sec:#1}~and~\ref{sec:#2}}
\newcommand\seclabel[1]{\label{sec:#1}}

\newcommand\figref[1]{Figure~\ref{fig:#1}}
\newcommand\figreftwo[2]{Figures \ref{fig:#1}~and~\ref{fig:#2}}
\newcommand\figlabel[1]{\label{fig:#1}}

\newcommand\tabref[1]{Table~\ref{tab:#1}}
\newcommand\tablabel[1]{\label{tab:#1}}


\newcommand{\CPS}{\textbf{StkMan}} % Not sure what to call it.


\usepackage{code}  % At-sign notation

\iftimestamp
\input{timestamp}
\preprintfooter{\mdfivestamp}
\fi

\hyphenation{there-by}

\renewcommand{\floatpagefraction}{0.9} % must be less than \topfraction
\renewcommand{\topfraction}{0.95}
\renewcommand{\textfraction}{0.05}
\begin{document}
%\title{\ourlib: Dataflow Optimization Made Simple}
\title{\ourlib: A Modular, Reusable Library for\\ Dataflow Analysis and Transformation}
%\title{Implementing Dataflow Analysis and Optimization by Lifting Node Functions to Basic Blocks and Control-Flow Graphs}
%\subtitle{Programming pearl}

%% \titlebanner{\textsf{\mdseries\itshape
%% Under consideration for publication in the ACM Haskell Symposium.
%% Comments
%% are welcome; please identify this version by the words
%% \textbf{\mdfivewords}.
%% }}

\ifnoauthornotes
\makeatletter
\let\HyPsd@Warning=
\@gobble
\makeatother
\fi

% João


\authorinfo{Norman Ramsey}{Tufts University}{nr@cs.tufts.edu}
\authorinfo{Jo\~ao Dias}{Tufts University}{dias@cs.tufts.edu}
\authorinfo{Simon Peyton Jones}{Microsoft Research}{simonpj@microsoft.com}


\maketitle

\begin{abstract}
\iffalse % A vote for Simon's abstract
\remark{I have replaced a good abstract of the POPL submission with a
bad abstract of \emph{this} submission.}
We present \ourlib, a Haskell library that makes it easy for a
compiler writer
to implement program transformations based on dataflow analyses.
A~client of \ourlib\ defines a representation of
logical assertions,
a transfer function that computes outgoing assertions from incoming
assertions,
and a rewrite function that improves code when improvements are
justified by the assertions.
\ourlib\ does the actual analysis and transformation.

\ourlib\ implements state-of-the-art algorithms:
Lerner, Grove, and Chambers's
\citeyearpar{lerner-grove-chambers:2002}
composition of simple analyses and transformations, which achieves
the same precision as complex, handwritten
``super-analyses;''
and Whalley's \citeyearpar{whalley:isolation} dynamic technique for
isolating bugs in a client's code.
\ourlib's implementation is unique in that, unlike previous
implementations,
it carefully separates the tricky
elements of each of these algorithms, so that they can be examined and
understood independently.


\simon{Here is an alternative abstract based on the four-sentence model.}
\remark{Four-sentence model? You must teach me\ldots}
\fi
Dataflow analysis and transformation of control-flow graphs is
pervasive in optimizing compilers, but it is typically tightly
interwoven with the details of a \emph{particular} compiler.
We~describe \ourlib{}, a reusable Haskell library that makes it
unusually easy to define new analyses and
transformations for \emph{any} compiler.
\ourlib's interface is modular and polymorphic,
and it offers unusually strong static guarantees.
The implementation
is also far from routine: it encapsulates
state-of-the-art algorithms (interleaved analysis and rewriting,
dynamic error isolation), and it cleanly separates their tricky elements
so that they can be understood independently.
%
%\ourlib\ will be the workhorse of a new
%back end for the Glasgow Haskell Compiler (version~6.14, forthcoming).

%% \emph{Reviewers:} code examples are indexed at {\small\url{http://bit.ly/jkr3K}}
%%% Source: http://www.cs.tufts.edu/~nr/drop/popl-index.pdf
\end{abstract}

\makeatactive % Enable @foo@ notation

\section{Introduction}

A mature optimizing compiler for an imperative language includes many
analyses, the results of which justify the optimizer's
code-improving transformations.
Many of the most important analyses and transformations---constant
propagation, live-variable analysis, inlining, sinking of loads,
and so on---should be regarded as particular cases of
a single general problem: \emph{dataflow analysis and optimization}.
%% \remark{I do not feel compelled to cite Muchnick (or anyone else) here}
Dataflow analysis is over thirty years old,
but a recent, seminal paper by \citet{lerner-grove-chambers:2002} goes further,
describing a powerful but subtle way to
\emph{interleave} analysis and transformation so that each
piggybacks on the other.

Because optimizations based on dataflow analysis
share a common intellectual framework, and because that framework is
subtle, it is tempting to
try to build a single reusable library that embodies the
subtle ideas, while
making it easy for clients to instantiate the library for different
situations.
Tempting, but difficult.
Although some such frameworks exist, as we discuss
in \secref{related}, they have complex APIs and implementations,
and none implements the Lerner/Grove/Chambers technique.

In this paper we present \ourlib{} (short for ``higher-order
optimization library''), a new Haskell library for dataflow analysis and
optimization. It has the following distinctive characteristics:

\begin{itemize}
\item
\ourlib\ is purely functional.
Perhaps surprisingly, code that
manipulates control-flow graphs is easier to write, and far easier
to write correctly, when written in a purely functional style
\cite{ramsey-dias:applicative-flow-graph}.
When analysis and rewriting
are interleaved, so that rewriting must be done \emph{speculatively},
without knowing whether
the result of the rewrite will be retained or discarded,
the benefit of a purely functional style is intensified
(Sections \ref{sec:overview} and \ref{sec:fixpoints}).

\item
\ourlib\ is polymorphic. Just as a list library is
polymorphic in the list elements, so is \ourlib{} polymorphic, both in
the nodes that inhabit graphs, and in the dataflow facts that
analyses compute over these graphs (\secref{using-hoopl}).

\item The paper by Lerner, Grove, and Chambers is inspiring but abstract.
We articulate their ideas in a concrete but simple API that hides
a subtle implementation (Sections \ref{sec:graph-rep} and \ref{sec:using-hoopl}).
You provide a representation for assertions,
a transfer function that transforms assertions across a node,
and a rewrite function that uses an assertion to
justify rewriting a node.
\ourlib\ ``lifts'' these node-level functions to work over
control-flow graphs, sets up and solves recursion equations,
and interleaves rewriting with analysis.
Designing good abstractions (data types, APIs) is surprisingly
hard; we have been through over a dozen significantly different
iterations, and we offer our API as a contribution.
% ``in its own right'' -- there's an echo in here...

\item
Because the client
can perform very local reasoning (``@y@ is live before
@x:=y+2@''),%
\footnote
{Using \hoopl, it is not necessary to have the more complex rule
``if @x@~is live after @x:=y+2@ then @y@ is live before~it,''
because if @x@~is \emph{not} live after @x:=y+2@, the assignment
@x:=y+2@ will be eliminated.}
analyses and transformations built on \ourlib\
are small, simple, and easy to get right.
Moreover, \ourlib\ helps you write correct optimizations:
it~statically rules out transformations that violate invariants
of the control-flow graph (Sections \ref{sec:graph-rep} and \ref{sec:rewrites}),
and dynamically it can help find the first transformation that introduces a fault
in a test program (\secref{fuel}).
\finalremark{SLPJ: I wanted to write more about open/closed,
but I like this sentence with its claim to both static and dynamic assistance,
and maybe the open/closed story is hard to understand here.}

% \item \ourlib{} makes use of GADTs and type functions to offer unusually
%   strong static guarantees. In particular, nodes, basic blocks, and
%   graphs are all statically typed by their open or closedness on entry, and
%   their open or closedness on exit (\secref{graph-rep}). For example, an add instruction is
%   open on entry and exit, while a branch is open on entry and closed on exit.
%   Using these types we can statically guarantee that, say, an add instruction
%   is rewritten to a graph that is also open on both entry and exit; and
%   that the user cannot construct a block where an add instruction follows an
%   unconditional branch. We know of no other system that offers
%   static guarantees this strong.

\item \ourlib{} implements subtle algorithms, including
(a)~interleaved analysis and rewriting, (b)~speculative rewriting,
(c)~computing fixed points, and (d)~dynamic fault isolation.
Previous implementations of these algorithms---including three of our
own---are complicated and hard to understand, because the tricky pieces
are implemented all together, inseparably.
A~significant contribution of this paper is
a new way to structure the implementation so that each tricky piece
is handled in just
one place, separate from all the others (\secref{engine}).
%\remark{This is a very important claim---is
%it substantiated in \secref{engine}? And shouldn't the word
%\textbf{contribution} appear in this~\P?}
% \simon{Better?} % yes thanks ---NR
The result is sufficiently elegant that
we emphasize the implementation as an object of interest in
its own right.
\end{itemize}
A working prototype of \ourlib{} is available from
\burl{http://ghc.cs.tufts.edu/hoopl} and also from Hackage.
It is no toy: an ancestor of this library is
part of the Glasgow Haskell Compiler, where it optimizes the
imperative {\PAL} code in GHC's back end. The new design is far
nicer, and it will be in GHC shortly.

The API for \ourlib{} seems quite natural, but it requires
relatively sophisticated aspects of Haskell's type system, such
as higher-rank polymorphism, GADTs, and type functions.
As such, \ourlib{} offers a compelling case study in the utility
of these features.


\section{Dataflow analysis {\&} transformation by \texorpdfstring{\rlap{example}}{example}}
\seclabel{overview}
\seclabel{constant-propagation}
\seclabel{example:transforms}
\seclabel{example:xforms}

We begin by setting the scene, introducing some vocabulary, and
showing a small motivating example.
A control-flow graph, perhaps representing the body of a procedure,
is a collection of \emph{basic blocks}---or just ``blocks''.
Each block is a sequence of instructions,
beginning with a label and ending with a
control-transfer instruction that branches to other blocks.
% Each block has a label at the beginning,
% a sequence of
% -- each of which has a label at the
% beginning. Each block may branch to other blocks with arbitrarily
% complex control flow.
The goal of dataflow optimization is to compute valid
\emph{assertions} (or \emph{dataflow facts}),
then use those assertions to justify code-improving
transformations (or \emph{rewrites}) on a \emph{control-flow graph}.

Consider a concrete example: constant propagation with constant folding.
On the left we have a basic block; in the middle we have
facts that hold between statements (or \emph{nodes})
in the block; and on
the right we have the result of transforming the block
based on the assertions:
\begin{verbatim}
 Before          Facts           After
        ------------{}-------------
 x := 3+4                        x := 7
        ----------{x=7}------------
 z := x>5                        z := True
        -------{x=7, z=True}-------
 if z                            goto L1
   then goto L1
   else goto L2
\end{verbatim}
Constant propagation works
from top to bottom. We start with the empty fact.
Given the empty fact and the node @x:=3+4@, can we make a (constant-folding)
transformation?
Yes! We can replace the node with @x:=7@.
Now, given this transformed node,
and the original fact, what fact flows out of the bottom of
the transformed node?
The~fact \{@x=7@\}.
Given the fact \{@x=7@\} and the node @z:=x>5@, can we make a
transformation? Yes: constant propagation can replace the node with @z:=7>5@.
Now, can we do another transformation? Yes: constant folding can
replace the node with @z:=True@.
And so the process continues to the end of the block, where we
can replace the conditional branch with an unconditional one, @goto L1@.

The example above is simple because the program has only straight-line code;
when programs have loops, dataflow analysis gets more complicated.
For example,
consider the following graph,
where we assume @L1@ is the entry point:
\begin{verbatim}
L1: x=3; y=4; if z then goto L2 else goto L3
L2: x=7; goto L3
L3: ...
\end{verbatim}
Because control flows to @L3@ from two places,
we must \emph{join} the facts coming from those two places.
All paths to @L3@ produce the fact @y=@$4$,
so we can conclude that this fact holds at @L3@.
But depending on the path to @L3@, @x@ may have different
values, so we conclude ``@x=@$\top$'',
meaning that there is no single value held by @x@ at @L3@.%
\footnote{
In this example @x@ really does vary at @L3@, but in general
the analysis might be conservative.}
The final result of joining the dataflow facts that flow to @L3@
is the new fact $@x=@\top \land @y=4@ \land @z=@\top$.

\seclabel{const-prop-example}

\seclabel{simple-tx}
\paragraph{Interleaved transformation and analysis.}
Our example \emph{interleaves} transformation and analysis.
Interleaving makes it far easier to write effective analyses.
If, instead, we \emph{first} analyzed the block
and \emph{then} transformed it, the analysis would have to ``predict''
the transformations.
For example, given the incoming fact \{@x=7@\}
and the instruction @z:=x>5@,
a pure analysis could produce the outgoing fact
\{@x=7@, @z=True@\} by simplifying @x>5@ to @True@.
But the subsequent transformation must perform
\emph{exactly the same simplification} when it transforms the instruction to @z:=True@!
If instead we \emph{first} rewrite the node to @z:=True@,
and \emph{then} apply the transfer function to the new node,
the transfer function becomes laughably simple: it merely has to see if the
right-hand side is a constant (you can see actual code in \secref{const-prop-client}).
The gain is even more compelling if there are a number of interacting
analyses and/or transformations; for more substantial
examples, consult \citet{lerner-grove-chambers:2002}.

\paragraph{Forwards and backwards.}
Constant propagation works \emph{forwards}, and a fact is typically an
assertion about the program state (such as ``variable~@x@ holds value~@7@'').
Some useful analyses work \emph{backwards}.
A~prime example is live-variable analysis, where a fact takes the~form
``variable @x@ is live'' and is an assertion about the
\emph{continuation} of a program point. For example, the fact ``@x@~is
live'' at a program point P is an assertion that @x@ is used on some program
path starting at \mbox{P}.  % TeXbook, exercise 12.6
The accompanying transformation is called dead-code elimination;
if @x@~is not live, this transformation
replaces the node @x:=e@ with a no-op.

% ----------------------------------------
\section{Representing control-flow graphs} \seclabel{graph-rep}

\ourlib{} is a library that makes it easy to define dataflow analyses,
and transformations driven by these analyses, on control-flow graphs.
Graphs are composed from smaller units, which we discuss from the
bottom up:\finalremark{something about replacement graphs?}
\begin{itemize}
\item A \emph{node} is defined by \ourlib's client;
\ourlib{} knows nothing about the representation of nodes (\secref{nodes}).
\item A basic \emph{block} is a sequence of nodes (\secref{blocks}).
\item A \emph{graph} is an arbitrarily complicated control-flow graph,
composed from basic blocks (\secref{graphs}).
\end{itemize}

\subsection{Shapes: Open and closed}

Nodes, blocks, and graphs share important properties.
In particular, each can be \emph{open or closed at entry}
and \emph{open or closed at exit}.
An \emph{open} point is one at which control may implicitly ``fall through;''
to transfer control at a \emph{closed} point requires an explicit
control-transfer instruction to a named label.
For example,
\begin{itemize}
\item A shift-left instruction is open on entry (because control can fall into it
from the preceding instruction), and open on exit (because control falls through
to the next instruction).
\item An unconditional branch is open on entry, but closed on exit (because
control cannot fall through to the next instruction).
\item A label is closed on entry (because in \ourlib{} we do not allow
control to fall through into a branch target), but open on exit.
\end{itemize}
% This taxonomy enables \ourlib{} to enforce invariants:
% only nodes closed at entry can be the targets of branches, and only nodes closed
% at exits can transfer control (see also \secref{edges}).
% As~a consequence, all control transfers originate at control-transfer
% instructions and terminated at labels; this invariant dramatically
% simplifies analysis and transformation.
These examples concern nodes, but the same classification applies
to blocks and graphs. For example, the block
\begin{code}
x:=7; y:=x+2; goto L
\end{code}
is open on entry and closed on exit.
This is the block's \emph{shape}, which we often abbreviate
``open/closed;''
we may refer to an ``open/closed block.''

The shape of a thing determines that thing's control-flow properties.
In particular, whenever E is a node, block, or graph,
% : \simon{Removed the claim about a unique entry point.}
\begin{itemize}
\item
If E is open at the entry, it has a unique predecessor;
if it is closed, it may have arbitrarily many predecessors---or none.
\item
If E is open at the exit, it has a unique successor;
if it is closed, it may have arbitrarily many successors---or none.
\end{itemize}
%%%%
%%%%
%%%% % \item Regardless of whether E is open or closed,
%%%% %       it has a unique entry point where execution begins.
%%%% \item If E is closed at exit, control leaves \emph{only}
%%%%       by explicit branches from closed-on-exit nodes.
%%%% \item If E is open at exit, control \emph{may} leave E
%%%%       by ``falling through'' from a distinguished exit point.
%%%%       \remark{If E is a node or block, control \emph{only} leaves E by
%%%%       falling through, but this isn't so for a graph. Example: a body of a
%%%%       loop contains a \texttt{break} statement} \simon{I don't understand.
%%%%       A break statement would have to be translated as a branch, no?
%%%%       Can you give a an example? I claim that control only leaves an
%%%%       open graph by falling through.}
%%%% \end{itemize}


\subsection{Nodes} \seclabel{nodes}

The primitive constituents of a \ourlib{} control-flow graph are
\emph{nodes}, which are defined by the client.
Typically, a node might represent a machine instruction, such as an
assignment, a call, or a conditional branch.
But \ourlib{}'s graph representation is polymorphic in the node type,
so each client can define nodes as it likes.
Because they contain nodes defined by the client,
graphs can include arbitrary client-specified data, including
(say) C~statements, method calls in an object-oriented language, or
whatever.


\begin{figure}
\begin{fuzzcode}{0.98pt}
data `Node e x where
  Label       :: Label ->                   Node C O
  `Assign     :: Var  -> Expr ->            Node O O
  `Store      :: Expr -> Expr ->            Node O O
  `Branch     :: Label ->                   Node O C
  `CondBranch :: Expr -> Label -> Label ->  Node O C
  -- ... more constructors ...
\end{fuzzcode}
\caption{A typical node type as it might be defined by a client}
\figlabel{cmm-node}
\end{figure}

\ourlib{} knows \emph{at compile time} whether a node is open or
closed at entry and exit:
the type of a node has kind @*->*->*@, where the two type parameters
are type-level flags, one for entry and one for exit.
Such a type parameter may be instantiated only with type @O@~(for
open) or type~@C@ (for closed).
As an example,
\figref{cmm-node} shows a typical node type as it might be written by
one of \ourlib's {clients}.
The type parameters are written @e@ and @x@, for
entry and exit respectively.
The type is a generalized algebraic data type;
the syntax gives the type of each constructor.
%%% \cite{peyton-jones:unification-based-gadts}.
For example, constructor @Label@
takes a @Label@ and returns a node of type @Node C O@, where
the~``@C@'' says ``closed at entry'' and the~``@O@'' says ``open at exit''.
The types @Label@, @O@, and~@C@ are
defined by \ourlib{} (\figref{graph}).

Similarly, an @Assign@ node takes a variable and an expression, and
returns a @Node@ open at both entry and exit; the @Store@ node is
similar. The types @`Var@ and @`Expr@ are private to the client, and
\ourlib{} knows nothing of them.
Finally, the control-transfer nodes @Branch@ and @CondBranch@ are open at entry
and closed at exit.

Nodes closed on entry are the only targets of control transfers;
nodes open on entry and exit never perform control transfers;
and nodes closed on exit always perform control transfers\footnote{%
To obey these invariants,
a node for
a conditional-branch instruction, which typically either transfers control
\emph{or} falls through, must be represented as a two-target
conditional branch, with the fall-through path in a separate block.
This representation is standard \cite{appel:modern},
and it costs nothing in practice:
such code is easily sequentialized without superfluous branches.
%a late-stage code-layout pass can readily reconstruct efficient code.
}.
Because of the positions these three kinds of nodes occupy in a
basic block,
we~often call them \emph{first}, \emph{middle}, and \emph{last} nodes,
respectively.

\subsection{Blocks} \seclabel{blocks}

\begin{figure}
\begin{fuzzcode}{0.98pt}
data `O        -- Open
data `C        -- Closed

data `Block n e x where
  `BFirst  :: n C O -> Block n C O
  `BMiddle :: n O O -> Block n O O
  `BLast   :: n O C -> Block n O C
  `BCat    :: Block n e O -> Block n O x -> Block n e x

data `Graph n e x where
  `GNil  :: Graph n O O
  `GUnit :: Block n O O -> Graph n O O
  `GMany :: MaybeO e (Block n O C)
         -> LabelMap (Block n C C)
         -> MaybeO x (Block n C O)
         -> Graph n e x

data `MaybeO ^ex t where
  `JustO    :: t -> MaybeO O t
  `NothingO ::      MaybeO C t

newtype `Label = Label Int

class `NonLocal n where
  `entryLabel :: n C x -> Label
  `successors :: n e C -> [Label]
\end{fuzzcode}
\caption{The block and graph types defined by \ourlib}
\figlabel{graph} \figlabel{edges}
\end{figure}

\ourlib\ combines the client's nodes into
blocks and graphs, which, unlike the nodes, are defined by \ourlib{}
(\figref{graph}).
A~@Block@ is parameterized over the node type~@n@
as well as over the same flag types that make it open or closed at
entry and exit.

The @BFirst@, @BMiddle@, and @BLast@ constructors create one-node
blocks.
Each of these constructors is polymorphic in the node's \emph{representation}
but monomorphic in its \emph{shape}.
An~earlier representation of blocks used a single constructor
of type \mbox{@n e x -> Block n e x@}, polymorphic in a node's representation \emph{and}
shape.
The representation of blocks in \figref{graph} has more constructors, but it
uses the power of GADTs to ensure that the shape of every node is
known statically.
This property makes no difference to clients, but it significantly
simplifies the implementation of analysis and transformation in
\secref{monomorphic-shape-outcome}.

The @BCat@ constructor concatenates blocks in sequence.
It~makes sense to concatenate blocks only when control can fall
through from the first to the second; therefore,
two blocks may be concatenated {only} if each block is open at
the point of concatenation.
This restriction is enforced by the type of @BCat@, whose first
argument must be open on exit, and whose second argument must be open on entry.
It~is statically impossible, for example, to concatenate a @Branch@
immediately before an
@Assign@.
Indeed, the @Block@ type statically guarantees that any
closed/closed @Block@---which compiler writers normally
call a ``basic block''---consists of exactly one closed/open node
(such as @Label@ in \figref{cmm-node}),
followed by zero or more open/open nodes (@Assign@ or @Store@),
and terminated with exactly one
open/closed node (@Branch@ or @CondBranch@).
Using GADTs to enforce these invariants is one of
\ourlib{}'s innovations.
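
To see the invariants at work, here is a sketch of ours (not part of
\ourlib's API) of a well-typed basic block, built from the
constructors of \figref{graph} and the node type of \figref{cmm-node}:
\begin{code}
basicBlock :: Label -> Var -> Expr -> Label
           -> Block Node C C
basicBlock l x e target =
  BFirst (Label l) `BCat` BMiddle (Assign x e)
                   `BCat` BLast (Branch target)
\end{code}
Reversing the order, so that the @Branch@ came first, would be
rejected at compile time, because @BCat@ requires its left argument to
be open at exit.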
% Also notice that the definition of @Block@ guarantees that every @Block@
% has at least one node.
% SPJ: I think we've agreed to stick with PNil.
% \remark{So what? Who cares? Why is it important
% that no block is empty? (And if we had an empty block, we could ditch
% @PNil@ from @PGraph@, which would be an improvement.)}

\subsection{Graphs} \seclabel{graphs}

\ourlib{} composes blocks into graphs, which are also defined
in \figref{graph}.
Like @Block@, the data type @Graph@ is parameterized over
both nodes @n@ and its open/closed shape (@e@ and @x@).
It has three constructors. The first two
deal with the base cases of open/open graphs:
an empty graph is represented by @GNil@ while a single-block graph
is represented by @GUnit@.

More general graphs are represented by @GMany@, which has three
fields: an optional entry sequence, a body, and an optional exit
sequence.
\begin{itemize}
\item
If the graph is open at the entry, it contains an entry sequence of type
@Block n O C@.
We could represent this sequence as a value of type
@Maybe (Block n O C)@, but we can do better:
a~value of @Maybe@ type requires a \emph{dynamic} test,
but we know \emph{statically}, at compile time, that the sequence is present if and only
if the graph is open at the entry.
We~express our compile-time knowledge by using the type
@MaybeO e (Block n O C)@, a type-indexed version of @Maybe@
which is also defined in \figref{graph}:
the type @MaybeO O a@ is isomorphic to~@a@, while
the type @MaybeO C a@ is isomorphic to~@()@.
\item
The body of the graph is a collection of closed/closed blocks.
To~facilitate traversal of the graph, we represent a body as a finite
map from label to block.
\item
The exit sequence is dual to the entry sequence, and like the entry
sequence, its presence or absence is deducible from the static type of the graph.
\end{itemize}
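
For example, a whole procedure is closed at both entry and exit, so it
has neither an entry nor an exit sequence; a sketch of its
representation, given a body of labeled blocks, is
\begin{code}
procedure :: LabelMap (Block n C C) -> Graph n C C
procedure body = GMany NothingO body NothingO
\end{code}
where the two @NothingO@s record, in the types @e@ and @x@, that both
optional sequences are absent.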

\seclabel{gSplice}

We can splice graphs together nicely; the cost is logarithmic in the
number of closed/closed blocks.
Unlike blocks, two graphs may be spliced together
not only when they are both open
at the splice point but also
when they are both closed---and not in the other two cases:
\begin{smallfuzzcode}{10.8pt}
`gSplice :: Graph n e a -> Graph n a x -> Graph n e x
gSplice GNil g2 = g2
gSplice g1 GNil = g1

gSplice (GUnit ^b1) (GUnit ^b2) = GUnit (b1 `BCat` b2)

gSplice (GUnit b) (GMany (JustO e) bs x)
  = GMany (JustO (b `BCat` e)) bs x

gSplice (GMany e ^bs (JustO x)) (GUnit b2)
  = GMany e bs (JustO (x `BCat` b2))

gSplice (GMany e1 ^bs1 (JustO x1)) (GMany (JustO e2) ^bs2 x2)
  = GMany e1 (bs1 `mapUnion` (b `addBlock` bs2)) x2
  where b = x1 `BCat` e2

gSplice (GMany e1 bs1 NothingO) (GMany NothingO bs2 x2)
  = GMany e1 (bs1 `mapUnion` bs2) x2
\end{smallfuzzcode}
This definition illustrates the power of GADTs: the
pattern matching is exhaustive, and all the open/closed invariants are
statically checked. For example, consider the second-last equation for @gSplice@.
Since the exit link of the first argument is @JustO x1@,
we know that type parameter~@a@ is~@O@, and hence the entry link of the second
argument must be @JustO e2@.
Moreover, block~@x1@ must be
closed/open, and block~@e2@ must be open/closed.
We can therefore concatenate them
with @BCat@ to produce a closed/closed block, which is
added to the body of the result.

We~have carefully crafted the types so that if @BCat@ and the
body-union operation
are considered associative operators,
every graph has a unique representation.\finalremark{So what? (Doumont)
Need some words about how this is part of making the API simple for
the client---I've added something to the end of the paragraph, but I'm
not particularly thrilled.}
%% \simon{Well, you were the one who was so keen on a unique representation!
%% And since we have one, I think it's useful to say so. Lastly, the representation of
%% singleton blocks is not entirely obvious.}
%%%%
%%%% An empty open/open graph is represented
%%%% by @GNil@, while a closed/closed one is @gNilCC@:
%%%% \par {\small
%%%% \begin{code}
%%%% gNilCC :: Graph C C
%%%% gNilCC = GMany NothingO BodyEmpty NothingO
%%%% \end{code}}
%%%% The representation of a @Graph@ consisting of a single block~@b@
%%%% depends on the shape of~@b@:\remark{Does anyone care?}
%%%% \par{\small
%%%% \begin{code}
%%%% gUnitOO :: Block O O -> Graph O O
%%%% gUnitOC :: Block O C -> Graph O C
%%%% gUnitCO :: Block O C -> Graph C O
%%%% gUnitCC :: Block O C -> Graph C C
%%%% gUnitOO b = GUnit b
%%%% gUnitOC b = GMany (JustO b) BodyEmpty NothingO
%%%% gUnitCO b = GMany NothingO BodyEmpty (JustO b)
%%%% gUnitCC b = GMany NothingO (BodyUnit b) NothingO
%%%% \end{code}}
%%%% Multi-block graphs are similar.
%%%% From these definitions
To guarantee uniqueness, @GUnit@ is restricted to open/open
blocks.
If~@GUnit@ were more polymorphic, there would be
more than one way to represent some graphs, and it wouldn't be obvious
to a client which representation to choose---or if the choice made a difference.


\subsection{Labels and successors} \seclabel{edges}

Although \ourlib{} is polymorphic in the type of nodes,
it~still needs to know how nonlocal control is transferred from one
block to another:
\hoopl\ must know where
a control transfer can go,
and it must know what @Label@ is at the start of a block.
The client must provide this information by placing its node type into
\ourlib's @NonLocal@ type class (\figref{edges}).
The @entryLabel@ method takes a first node (one closed on entry, \secref{nodes})
and returns its @Label@;
the @successors@ method takes a last node (closed on exit) and returns
the @Label@s to
which it can transfer control.
A~middle node, which is open at both entry and exit, cannot refer to
any @Label@s,
so no corresponding interrogation function is needed.

A node type defined by a client must be an instance of @NonLocal@.
In~\figref{cmm-node},
the client's instance declaration for @Node@ would be
\begin{code}
instance NonLocal Node where
  entryLabel (Label l) = l
  successors (Branch b) = [b]
  successors (CondBranch e b1 b2) = [b1, b2]
\end{code}
Again, the pattern matching for both functions is exhaustive, and
the compiler statically checks this fact. Here, @entryLabel@
cannot be applied to an @Assign@ or @Branch@ node,
and any attempt to define a case for @Assign@ or @Branch@ would result
in a type error.

While the client must provide this information about
nodes, it is very convenient for \ourlib\ to have the same information
about blocks.
For its own internal use,
\ourlib{} provides this instance declaration for the @Block@ type:
\begin{code}
instance NonLocal n => NonLocal (Block n) where
  entryLabel (BFirst n)  = entryLabel n
  entryLabel (BCat b _)  = entryLabel b
  successors (BLast n)   = successors n
  successors (BCat _ b)  = successors b
\end{code}
Because the functions @entryLabel@ and @successors@ are used to track control
flow \emph{within} a graph, \ourlib\ does not need to ask for the
entry label or successors of a @Graph@ itself.
Indeed, @Graph@ \emph{cannot} be an instance of @NonLocal@, because even
if a @Graph@ is closed at the entry, it does not have a unique entry label.
%
% A slight infelicity is that we cannot make @Graph@ an instance of @NonLocal@,
% because a graph closed on entry has no unique label. Fortunately, we never
% need @Graph@s to be in @NonLocal@.
%
% ``Never apologize. Never confess to infelicity.'' ---SLPJ




\begin{table}
\centerline{%
\begin{tabular}{@{}>{\raggedright\arraybackslash}p{1.03in}>{\scshape}c>{\scshape}c>{\raggedright\arraybackslash}p{1.29in}@{}}
&\multicolumn1{r}{\llap{\emph{Specified}}\hspace*{-0.3em}}&
\multicolumn1{l}{\hspace*{-0.4em}\rlap{\emph{Implemented}}}&\\
\multicolumn1{c}{\emph{Part of optimizer}}
&\multicolumn1{c}{\emph{by}}&
\multicolumn1{c}{\emph{by}}&
\multicolumn1{c}{\emph{How many}}%
\\[5pt]
Control-flow graphs& Us & Us & One \\
Nodes in a control-flow graph & You & You & One type per intermediate language \\[3pt]
Dataflow fact~$F$ & You & You & One type per logic \\
Lattice operations & Us & You & One set per logic \\[3pt]
Transfer functions & Us & You & One per analysis \\
Rewrite functions & Us & You & One per \rlap{transformation} \\[3pt]
Solve-and-rewrite functions & Us & Us & Two (forward, backward) \\
\end{tabular}%
}
\caption{Parts of an optimizer built with \ourlib}
\tablabel{parts}
\end{table}



\section {Using \ourlib{} to analyze and transform graphs} \seclabel{using-hoopl}
\seclabel{making-simple}
\seclabel{create-analysis}

Now that we have graphs, how do we optimize them?
\ourlib{} makes it easy for a
client to build a new dataflow analysis and optimization.
The client must supply the following pieces:
\begin{itemize}
\item \emph{A node type} (\secref{nodes}).
\ourlib{} supplies the @Block@ and @Graph@ types
that let the client build control-flow graphs out of nodes.
\item \emph{A data type of facts} and some operations over
those facts (\secref{facts}).
Each analysis uses facts that are specific to that particular analysis,
which \ourlib{} accommodates by being polymorphic in
the fact type.
\item \emph{A transfer function} that takes a node and returns a
\emph{fact transformer}, which takes a fact flowing into the node and
returns the transformed fact that flows out of the node (\secref{transfers}).
\item \emph{A rewrite function} that takes a node and an input fact,
and returns either nothing,
or a graph~@g@ that should
replace the node.
The ability to replace a \emph{node} by a \emph{graph} that may include
internal control flow is
crucial for many code-improving transformations.
We discuss the rewrite function
in Sections \ref{sec:rewrites} and \ref{sec:shallow-vs-deep}.
\end{itemize}
These requirements are summarized in \tabref{parts}.
Because facts, transfer functions, and rewrite functions work closely together,
we represent their combination
as a single record of type @FwdPass@ (\figref{api-types}).


\ifpagetuning\enlargethispage{\baselineskip}\fi

Given a node type~@n@ and a @FwdPass@,
a client can ask \ourlib{}\ to analyze and rewrite a
graph.
\hoopl\ provides a fully polymorphic interface, but for purposes of
exposition, we present a function that is specialized to a
closed/closed graph:
\begin{code}
analyzeAndRewriteFwdBody
  :: ( FuelMonad m     -- Gensym and other state
     , NonLocal n )    -- Extract non-local flow edges
  => FwdPass m n f     -- Lattice, transfer, rewrite
  -> [Label]           -- Entry point(s)
  -> Graph n C C       -- Input graph
  -> FactBase f        -- Input fact(s)
  -> m ( Graph n C C   -- Result graph
       , FactBase f )  -- ... and its facts
\end{code}
Given a @FwdPass@, the
analyze-and-rewrite function transforms a graph into
an optimized graph.
As its type shows, this function
is polymorphic in the types of nodes~@n@ and facts~@f@;
these types are determined entirely by the client.
The type of the monad~@m@ is also supplied by the client, but it must
meet certain constraints laid out by \hoopl, as described in
\secref{fuel-monad}.


\begin{figure}
\begin{code}
data `FwdPass m n f
  = FwdPass { `fp_lattice  :: DataflowLattice f
            , `fp_transfer :: FwdTransfer n f
            , `fp_rewrite  :: FwdRewrite m n f }

------- Lattice ----------
data `DataflowLattice a = DataflowLattice
  { `fact_bot  :: a
  , `fact_join :: OldFact a -> NewFact a
               -> (ChangeFlag, a) }

data `ChangeFlag = `NoChange | `SomeChange
newtype `OldFact a = OldFact a
newtype `NewFact a = NewFact a

------- Transfers ----------
newtype `FwdTransfer n f   -- abstract type
`mkFTransfer'
  :: (forall e x . n e x -> f -> Fact x f)
  -> FwdTransfer n f

------- Rewrites ----------
newtype `FwdRewrite m n f  -- abstract type
`mkFRewrite'
  :: (forall e x . n e x -> f -> m (FwdRes m n f e x))
  -> FwdRewrite m n f

data `FwdRes m n f e x
  = FwdRes (Graph n e x) (FwdRewrite m n f)
  | `NoFwdRes

------- Fact-like things -------
type family   `Fact x f :: *
type instance Fact O f = f
type instance Fact C f = FactBase f

------- FactBase -------
type `FactBase f = LabelMap f
  -- A finite mapping from Labels to facts f
\end{code}
\caption{\ourlib{} API data types}
\figlabel{api-types}
\figlabel{lattice-type} \figlabel{lattice}
\figlabel{transfers} \figlabel{rewrites}
\end{figure}
% omit mkFactBase :: [(Label, f)] -> FactBase f

As well as taking and returning a graph with its entry point(s), the
function also takes input facts (the @FactBase@) and produces output facts.
A~@FactBase@ is simply a finite mapping from @Label@ to facts.
The
output @FactBase@ maps each @Label@ in the body to its fact; if
the @Label@ is not in the domain of the @FactBase@, its fact is the
bottom element of the lattice.
The point(s) at which control flow may enter the graph are supplied by
the caller; the input @FactBase@ supplies any
facts that hold at these points.
For example, in our constant-propagation example from \secref{const-prop-example},
if the graph
represents the body of a procedure
with parameters $x,y,z$, we would map the entry @Label@ to a fact
$@x=@\top \land @y=@\top \land @z=@\top$, to specify that the procedure's
parameters are not known to be constants.

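As a sketch of a call site (ours, not \hoopl\ code), assume a pass
@constProp@, an entry label @entry@, an initial fact @initFact@, and
the helper @mkFactBase :: [(Label, f)] -> FactBase f@, which builds a
@FactBase@ from an association list:
\begin{code}
optimize g = do  -- in a monad m with FuelMonad m
  (g', facts) <- analyzeAndRewriteFwdBody
                   constProp  -- the FwdPass
                   [entry]    -- entry point(s)
                   g          -- input graph
                   (mkFactBase [(entry, initFact)])
  return g'
\end{code}
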
The client's model of how @analyzeAndRewriteFwdBody@ works is as follows:
\ourlib\ walks forward over each block in the graph.
At each node, \ourlib\ applies the
rewrite function to the node and the incoming fact. If the rewrite
function returns @NoFwdRes@, the node is retained as part of the output
graph, the transfer function is used to compute the downstream fact,
and \ourlib\ moves on to the next node.
But if the rewrite function returns @(FwdRes g rw)@,
indicating that it wants to rewrite the node to the replacement graph~@g@, then
\ourlib\ recursively analyzes and rewrites~@g@, using the new rewrite
function~@rw@, before moving on to the next node.
A~node following a rewritten node sees
\emph{up-to-date} facts; that is, its input fact is computed by
analyzing the replacement graph.

In this section we flesh out the
\emph{interface} to @analyzeAndRewriteFwdBody@, leaving the implementation for
\secref{engine}.
{\hfuzz=7.2pt\par}

\subsection{Dataflow lattices} \seclabel{lattices} \seclabel{facts}

For each analysis or transformation, the client must define a type
of dataflow facts.
A~dataflow fact often represents an assertion
about a program point,\footnote
{In \ourlib, a program point is simply an edge in a
control-flow graph.}
but in general, dataflow analysis establishes properties of \emph{paths}:
\begin{itemize}
\item An assertion about all paths \emph{to} a program point is established
by a \emph{forwards analysis}. For example, the assertion ``$@x@=@3@$'' at point P
claims that variable @x@ holds value @3@ at P, regardless of the
path by which P is reached.
\item An assertion about all paths \emph{from} a program point is
established by a \emph{backwards analysis}. For example, the
assertion ``@x@ is dead'' at point P claims that no path from P uses
variable @x@.
\end{itemize}

A~set of dataflow facts must form a lattice, and \ourlib{} must know
(a)~the bottom element of the lattice and (b)~how to take
the least upper bound (join) of two elements.
To ensure that analysis
terminates, it is enough if every fact has a finite number of
distinct facts above it, so that repeated joins
eventually reach a fixed point.

In practice, joins are computed at labels.
If~$f_{\mathit{old}}$ is the fact currently associated with the
label~$L$,
and if a transfer function propagates a new fact~$f_{\mathit{new}}$
into the label~$L$,
the dataflow engine replaces $f_{\mathit{old}}$ with
the join $f_{\mathit{old}} \join f_{\mathit{new}}$.
Furthermore, the dataflow engine wants to know if
$f_{\mathit{new}} \join f_{\mathit{old}} = f_{\mathit{old}}$,
because if not, the analysis has not reached a fixed point.

The bottom element and join operation of a lattice of facts of
type~@f@ are stored in a value of type @DataflowLattice f@
(\figref{lattice}).
%%Such a value is part of the @FwdPass n f@ that is passed to
%%@analyzeAndRewriteFwd@ above.
As noted in the previous paragraph,
\ourlib{} needs to know when the result of a join is equal
to one of the arguments joined.
Because this information can often be computed cheaply along
with the join, \ourlib\ does not
require a separate equality test on facts (which might be
expensive).
Instead, \ourlib\ requires that @fact_join@ return a @ChangeFlag@ as
well as the least upper bound.
The @ChangeFlag@ should be @NoChange@ if
the result is the same as the old fact, and
@SomeChange@ if the result differs.

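The engine's update of the fact at a label can be pictured with the
following sketch (ours, not \hoopl's actual code); @lookupFact@ and
@mapInsert@ stand for assumed finite-map operations on a @FactBase@:
\begin{code}
updateFact :: DataflowLattice f -> Label -> f
           -> FactBase f -> (ChangeFlag, FactBase f)
updateFact lat l f_new fbase =
  case fact_join lat (OldFact f_old) (NewFact f_new) of
    (NoChange,   _) -> (NoChange,   fbase)
    (SomeChange, f) -> (SomeChange, mapInsert l f fbase)
  where -- a label absent from the FactBase has the bottom fact
        f_old = case lookupFact l fbase of
                  Just f  -> f
                  Nothing -> fact_bot lat
\end{code}
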
\seclabel{WithTop}

To help clients create lattices and join functions,
\hoopl\ includes functions and constructors that extend a type~@a@
with top and bottom elements.
In this paper, we use only type @`WithTop@, which has value
constructors with these types:
\begin{code}
`PElem :: a -> WithTop a
`Top   ::      WithTop a
\end{code}
\hoopl\ also provides a @`Bot@ constructor and types @WithTop@,
@`WithBot@, and @`WithTopAndBot@.
All are defined using a single GADT,
so that value constructors @Top@ and @Bot@ may be used with any of the
types.

The real convenience of @WithTop@, @WithBot@, and @WithTopAndBot@ is
that \hoopl\ provides a combinator that lifts
a join function on~@a@ to a join function on
@WithTop a@, and similarly for the other types.
The lifting combinator ensures that joins involving top and bottom
elements not only obey the appropriate algebraic laws but also set the
@ChangeFlag@ properly.
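
For instance, when the underlying type~@a@ is \emph{flat}---two
distinct elements have no join other than @Top@---the lifted join
behaves like this hand-written sketch (ours, not \hoopl's code):
\begin{code}
joinFlat :: Eq a => OldFact (WithTop a)
         -> NewFact (WithTop a) -> (ChangeFlag, WithTop a)
joinFlat (OldFact Top) _           = (NoChange,   Top)
joinFlat (OldFact _) (NewFact Top) = (SomeChange, Top)
joinFlat (OldFact (PElem old)) (NewFact (PElem new))
  | new == old = (NoChange,   PElem old)
  | otherwise  = (SomeChange, Top)  -- distinct constants
\end{code}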




% A~possible representation of the facts needed to implement
% constant propagation is shown in \figref{const-prop}.
% A~fact
% is represented as a finite map from a variable to a value of type
% @Maybe HasConst@.%
% A~variable $x$ maps to @Nothing@ iff $x=\bot$;
% $x$~maps to @Just Top@ iff $x=\top$;
% and $x$~maps to $@Just@\, (@I@\, k)$ iff $x=k$ (and similarly for
% Boolean constants).\remark{Ugh}
% Any one procedure has only
% finitely many variables; only finitely many facts are computed at any
% program point; and in this lattice any one fact can increase at most
% twice. These properties ensure that the dataflow engine will reach a
% fixed point.

1384 \subsection{The transfer function} \seclabel{transfers}
1385
1386 A~forward transfer function is presented with the dataflow fact(s) on
1387 the edge(s) coming
1388 into a node, and it computes dataflow fact(s) on the outgoing edge(s).
1389 In a forward analysis, the dataflow engine starts with the fact at the
1390 beginning of a block and applies the transfer function to successive
1391 nodes in that block until eventually the transfer function for the last node
1392 computes the facts that are propagated to the block's successors.
1393 For example, consider this graph, with entry at @L1@:
1394 \begin{code}
1395 L1: x=3; goto L2
1396 L2: y=x+4; x=x-1;
1397 if x>0 then goto L2 else return
1398 \end{code}
1399 A forward analysis starts with the bottom fact \{\} at every label.
1400 Analyzing @L1@ propagates this fact forward, by applying the transfer
1401 function successively to the nodes
1402 of @L1@, emerging with the fact \{@x=3@\} for @L2@.
1403 This new fact is joined with the existing (bottom) fact for @L2@.
1404 Now the analysis propagates @L2@'s fact forward, again using the transfer
1405 function, this time emerging with a new fact \mbox{\{@x=2@, @y=7@\}} for @L2@.
1406 Again, the new fact is joined with the existing fact for @L2@, and the process
1407 is iterated until the facts for each label reach a fixed point.
1408
1409 But wait! What is the \emph{type} of the transfer function?
1410 If the node is open at exit, the transfer function produces a single fact.
1411 But what if the node is \emph{closed} on exit?
1412 In that case the transfer function must
1413 produce a list of (@Label@,fact) pairs, one for each outgoing edge.
1414 %
1415 \emph{So the type of the transfer function's result
1416 depends on the shape of the node's exit.}
1417 Fortunately, this dependency can be expressed precisely, at compile
1418 time, by Haskell's (recently added)
1419 \emph{indexed type families}.
1420 The relevant part of \ourlib's interface is given in \figref{transfers}.
1421 A~forward transfer function supplied by a client,
1422 of type (@FwdTransfer@ @n@ @f@),
1423 is typically a function polymorphic in @e@ and @x@.
1424 It takes a
1425 node of type \mbox{(@n@ @e@ @x@)} and a fact of type~@f@, and it produces an
1426 outgoing ``fact-like thing'' of type (@Fact@ @x@ @f@). The
1427 type constructor @Fact@
1428 should be thought of as a type-level function; its signature is given in the
1429 @type family@ declaration, while its definition is given by two @type instance@
1430 declarations. The first declaration says that the fact-like thing
1431 coming out of a node
1432 \emph{open} at the exit is just a fact~@f@. The second declaration says that
1433 the fact-like thing
1434 coming out of a node
1435 \emph{closed} at the exit is a mapping from @Label@ to facts.
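
For reference, the declarations just described read as follows
(echoing \figref{transfers}):
\begin{code}
type family Fact x f :: *
type instance Fact O f = f
type instance Fact C f = FactBase f
\end{code}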
1436
1437 We have ordered the arguments such that if
1438 \begin{code}
1439 `transfer_fn :: forall e x . n e x -> f -> Fact x f
1440 `node :: n e x
1441 \end{code}
1442 then @(transfer_fn node)@ is a fact transformer:
1443 \begin{code}
1444 transfer_fn node :: f -> Fact x f
1445 \end{code}
1446
1447 So much for the interface.
1448 What about the implementation?
1449 The way GADTs work is that the compiler uses the value constructor for
1450 type \mbox{@n@ @e@ @x@} to determine whether @e@~and~@x@ are open or
1451 closed.
1452 But we want to write functions that are \emph{polymorphic} in the node
1453 type~@n@!
1454 Such functions include
1455 \begin{itemize}
1456 \item
1457 A function that takes a pair of @FwdTransfer@s for facts @f@~and~@f'@,
1458 and returns a single @FwdTransfer@ for the fact @(f, f')@
1459 \item
1460 A function that takes a @FwdTransfer@ and wraps it in logging code, so
1461 an analysis can be debugged by watching the facts flow through the
1462 nodes
1463 \end{itemize}
1464 Such functions may also be useful in \hoopl's \emph{clients}:
1465 % may need
1466 %functions that are polymorphic in the node type~@n@:
1467 \begin{itemize}
1468 \item
1469 A dominator analysis in the style of
1470 \citet{cooper-harvey-kennedy:simple-dominance} requires only the
1471 functions in the \texttt{NonLocal} type class;
1472 we have written such an analysis using transfer functions that are
1473 polymorphic in~@n@.
1474 \end{itemize}
1475 Because the mapping from
1476 value constructors to shape is different for each node type~@n@, transfer
1477 functions cannot be polymorphic in both
1478 the representation and the shape of nodes.
1479 Our implementation therefore sacrifices polymorphism in shape:
1480 internally, \hoopl\ represents
1481 a~@FwdTransfer n f@ as a \emph{triple} of functions,
1482 each polymorphic in~@n@ but monomorphic in shape:
1483 \begin{code}
1484 newtype FwdTransfer n f
1485 = FwdTransfers ( n C O -> f -> f
1486 , n O O -> f -> f
1487 , n O C -> f -> FactBase f
1488 )
1489 \end{code}
1490 Such triples can easily be composed and wrapped without requiring a
1491 pattern match on the value constructor of an unknown node
1492 type.\remark{Need to refer to this junk in the conclusion}
1493 Working with triples is tedious, but only \hoopl\ itself is forced to
1494 work with triples; each client, which knows the node type, may supply
1495 a triple,
1496 but it typically supplies a single function polymorphic in the shape
1497 of the (known) node.
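
Such a function can be injected into the triple by a one-line lifter,
sketched here (the name @mkFTransfer@ is ours; \hoopl's actual
function may differ):
\begin{code}
`mkFTransfer :: (forall e x . n e x -> f -> Fact x f)
             -> FwdTransfer n f
mkFTransfer f = FwdTransfers (f, f, f)
\end{code}
Each occurrence of @f@ in the triple is instantiated at a different
shape; the type family @Fact@ makes the last component produce
a @FactBase f@.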
1498
1499
1500
1501
1502 \subsection{The rewrite function}
1503 \seclabel{rewrites}
1504 \seclabel{example-rewrites}
1505
1506 %%%% \begin{figure}
1507 %%%% \begin{smallfuzzcode}{6.6pt}
1508 %%%% type `AGraph n e x
1509 %%%% = [Label] -> (Graph n e x, [Label])
1510 %%%%
1511 %%%% `withLabels :: Int -> ([Label] -> AGraph n e x)
1512 %%%% -> AGraph n e x
1513 %%%% withLabels n ^fn = \ls -> fn (take n ^ls) (drop n ls)
1514 %%%%
1515 %%%% `mkIfThenElse :: Expr -> AGraph Node O O
1516 %%%% -> AGraph Node O O -> AGraph Node O O
1517 %%%% mkIfThenElse p t e
1518 %%%% = withLabels 3 $ \[l1,l2,l3] ->
1519 %%%% gUnitOC (BUnit (CondBranch p l1 l2)) `gSplice`
1520 %%%% mkLabel l1 `gSplice` t `gSplice` mkBranch l3 `gSplice`
1521 %%%% mkLabel l2 `gSplice` e `gSplice` mkBranch l3 `gSplice`
1522 %%%% mkLabel l3
1523 %%%%
1524 %%%% `mkLabel l = gUnitCO (BUnit (Label l))
1525 %%%% `mkBranch l = gUnitOC (BUnit (Branch l))
1526 %%%% `gUnitOC b = GMany (JustO b) BodyEmpty NothingO
1527 %%%% `gUnitCO b = GMany NothingO BodyEmpty (JustO b)
1528 %%%% \end{smallfuzzcode}
1529 %%%% \caption{The \texttt{AGraph} type and example constructions} \figlabel{agraph}
1530 %%%% \end{figure}
1531
1532 We compute dataflow facts in order to enable code-improving
1533 transformations.
1534 In our constant-propagation example,
1535 the dataflow facts may enable us
1536 to simplify an expression by performing constant folding, or to
1537 turn a conditional branch into an unconditional one.
1538 Similarly, a liveness analysis may allow us to
1539 replace a dead assignment with a no-op.
1540
1541 A @FwdPass@ therefore includes a \emph{rewriting function}, whose
1542 type, @FwdRewrite@, is given in \figref{api-types}.
1543 A rewriting function takes a node and a fact, and optionally returns\ldots what?
1544 At first one might
1545 expect that rewriting should return a new node, but that is not enough:
1546 We~might want to remove a node by rewriting it to the empty graph,
1547 or more ambitiously, we might want to replace a high-level operation
1548 with a tree of conditional branches or a loop, which would entail
1549 introducing new blocks with internal control flow.
1550 In~general, a rewrite function must be able to return a
1551 \emph{graph}.
1552
1553 \ifpagetuning\enlargethispage{0.8\baselineskip}\fi
1554
Concretely, a @FwdRewrite@ takes a node and a suitably shaped
fact, and returns a monadic result: either @NoFwdRes@, indicating
that the node should not be replaced,
or $@FwdRes@\;\ag\;\rw$, indicating that the node should
be replaced with the replacement graph~\ag.
1560 The result is monadic because
1561 if the rewriter makes graphs containing blocks,
1562 it may need fresh @Label@s, which are supplied by a monad.
1563 A~client may use its own monad or may use a monad or monad transformer
1564 supplied by \hoopl.
1565 \remark{Forward reference??}
1566
1567 %%%% \ifpagetuning\enlargethispage{0.8\baselineskip}\fi
1568
1569 The type of @FwdRewrite@ in \figref{api-types} guarantees
1570 \emph{at compile time} that
1571 the replacement graph $\ag$ has
1572 the \emph{same} open/closed shape as the node being rewritten.
1573 For example, a branch instruction can be replaced only by a graph
1574 closed at the exit. Moreover, because only an open/open graph can be
1575 empty---look at the type of @GNil@ in \figref{graph}---the type
1576 of @FwdRewrite@
1577 guarantees, at compile time, that no head of a block (closed/open)
1578 or tail of a block (open/closed) can ever be deleted by being
1579 rewritten to an empty graph.
1580
1581 \subsection{Shallow vs deep rewriting}
1582 \seclabel{shallow-vs-deep}
1583
1584 Once the rewrite has been performed, what then?
1585 The rewrite returns a replacement graph,
1586 which must itself be analyzed, and its nodes may be further rewritten.
1587 We~call @analyzeAndRewriteFwdBody@ to process the replacement
1588 graph---but what @FwdPass@ should we use?
1589 There are two common situations:
1590 \begin{itemize}
1591 \item Sometimes we want to analyze and transform the replacement graph
1592 with an unmodified @FwdPass@, further rewriting the replacement graph.
1593 This procedure is
1594 called \emph{deep rewriting}.
1595 When deep rewriting is used, the client's rewrite function must
1596 ensure that the graphs it produces are not rewritten indefinitely
1597 (\secref{correctness}).
1598 \item Sometimes we want to analyze \emph{but not further rewrite} the
1599 replacement graph. This procedure is called \emph{shallow rewriting}.
1600 It~is easily implemented by using a modified @FwdPass@
1601 whose rewriting function always returns @NoFwdRes@.
1602 \end{itemize}
1603 Deep rewriting is essential to achieve the full benefits of
1604 interleaved analysis and transformation
1605 \citep{lerner-grove-chambers:2002}.
1606 But shallow rewriting can be vital as well;
1607 for example, a forward dataflow pass that inserts
1608 a spill before a call must not rewrite the call again, lest it attempt
1609 to insert infinitely many spills.
1610
1611 An~innovation of \hoopl\ is to build the choice of shallow or deep
1612 rewriting into each rewrite function,
1613 an idea that is elegantly captured by the
1614 @FwdRes@ type returned by a @FwdRewrite@ (\figref{api-types}).
1615 The first component of the @FwdRes@ is the replacement graph, as discussed earlier.
1616 The second component, $\rw$, is a
1617 \emph{new rewriting function} to use when recursively processing
1618 the replacement graph.
1619 For shallow rewriting this new function is
1620 the constant @Nothing@ function; for deep rewriting it is the original
1621 rewriting function.
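
To make the choice concrete, consider a deliberately simplified
(non-monadic, non-triple) model of rewrite functions; the names
@SimpleRw@, @SimpleRes@, @neverRw@, @asShallow@, and @asDeep@ are
hypothetical:
\begin{code}
type `SimpleRw n f =
  forall e x . n e x -> f -> Maybe (SimpleRes n f e x)
data `SimpleRes n f e x
  = SimpleRes (AGraph n e x) (SimpleRw n f)

`neverRw :: SimpleRw n f -- always declines to rewrite
neverRw _ _ = Nothing

`asShallow, `asDeep :: SimpleRw n f -> SimpleRw n f
asShallow rw n f = -- rewrite at most once, then stop
  fmap (\(SimpleRes g _) -> SimpleRes g neverRw) (rw n f)
asDeep rw n f = -- keep using the original rewriter
  fmap (\(SimpleRes g _) -> SimpleRes g (asDeep rw)) (rw n f)
\end{code}
In \hoopl\ itself the same idea applies to each component of the
triple.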
1622
1623
1624 \subsection{Composing rewrite functions and dataflow passes} \seclabel{combinators}
1625
1626 By requiring each rewrite to return a new rewrite function,
1627 \hoopl\ enables a variety of combinators over
1628 rewrite functions.
1629 \remark{This whole subsection needs to be redone in light of the new
1630 (triple-based) representation. It's not pretty any more.}
1631 For example, here is a function
1632 that combines two rewriting functions in sequence:
1633 \remark{This code must be improved}
1634 \smallverbatiminput{comb1}
1635 What a beautiful type @thenFwdRw@ has!
1636 \remark{And what an ugly implementation! Implementations must go.}
1637 It tries @rw1@, and if @rw1@
1638 declines to rewrite, it behaves like @rw2@. But if
1639 @rw1@ rewrites, returning a new rewriter @rw1a@, then the overall call also
1640 succeeds, returning a new rewrite function obtained by combining @rw1a@
1641 with @rw2@. (We cannot apply @rw1a@ or @rw2@
1642 directly to the replacement graph~@g@,
because @rw1@~returns a graph and @rw2@~expects a node.)
1644 The rewriter @noFwdRw@ is the identity of @thenFwdRw@.
1645 Finally, @thenFwdRw@ can
1646 combine a deep-rewriting function and a shallow-rewriting function,
1647 to produce a rewriting function that is a combination of deep and shallow.
1648 %%This is something that the Lerner/Grove/Chambers framework could not do,
1649 %%because there was a global shallow/deep flag.
1650 %% Our shortsightedness; Lerner/Grove/Chambers is deep only ---NR
1651
1652
1653 A shallow rewriting function can be made deep by iterating
1654 it:\remark{Algebraic law wanted!}
1655 \smallverbatiminput{iterf}
1656 If we have shallow rewrites $A$~and~$B$ then we can build $AB$,
1657 $A^*B$, $(AB)^*$,
1658 and so on: sequential composition is @thenFwdRw@ and the Kleene star
1659 is @iterFwdRw@.\remark{Do we still believe this claim?}
\remark{We can't define @iterFwdRw@ in terms of @thenFwdRw@ because
the second argument to @thenFwdRw@ would have to be the constant
@NoFwdRes@ function when applied but would have to be the original triple
when passed to @thenFwdRw@ as the second argument (in the recursive
call).}
1665
1666
1667 The combinators above operate on rewrite functions that share a common
1668 fact type and transfer function.
1669 It~can also be useful to combine entire dataflow passes that use
1670 different facts.
1671 We~invite you to write one such combinator, with type
1672 \begin{code}
1673 `pairFwd :: Monad m
1674 => FwdPass m n f1
1675 -> FwdPass m n f2
1676 -> FwdPass m n (f1,f2)
1677 \end{code}
The two passes run interleaved, not sequentially, and each may help
the other,
yielding better results than running one pass and then the other,
in either order
\citep{lerner-grove-chambers:2002}.
1682 %% call these passes. ``super-analyses;''
1683 %% in \hoopl, construction of super-analyses is
1684 %% particularly concrete.
1685
1686 \subsection{Example: Constant propagation and constant folding}
1687 \seclabel{const-prop-client}
1688
1689 % omit Binop :: Operator -> Expr -> Expr -> Expr
1690 % omit Add :: Operator
1691
1692
1693 %% \begin{figure}
1694 %% {\small\hfuzz=3pt
1695 %% \begin{code}
1696 %% -- Types and definition of the lattice
1697 %% data `HasConst = `Top | `B Bool | `I Integer
1698 %% type `ConstFact = Map.Map Var HasConst
1699 %% `constLattice = DataflowLattice
1700 %% { fact_bot = Map.empty
1701 %% , fact_extend = stdMapJoin constFactAdd }
1702 %% where
1703 %% `constFactAdd ^old ^new = (c, j)
1704 %% where j = if new == old then new else Top
1705 %% c = if j == old then NoChange else SomeChange
1706 %%
1707 %% -------------------------------------------------------
1708 %% -- Analysis: variable has constant value
1709 %% `varHasConst :: FwdTransfer Node ConstFact
1710 %% varHasConst (Label l) f = lookupFact f l
1711 %% varHasConst (Store _ _) f = f
1712 %% varHasConst (Assign x (Bool b)) f = Map.insert x (B b) f
1713 %% varHasConst (Assign x (Int i)) f = Map.insert x (I i) f
1714 %% varHasConst (Assign x _) f = Map.insert x Top f
1715 %% varHasConst (Branch l) f = mkFactBase [(l, f)]
1716 %% varHasConst (CondBranch (Var x) ^tid ^fid) f
1717 %% = mkFactBase [(tid, Map.insert x (B True) f),
1718 %% (fid, Map.insert x (B False) f)]
1719 %% varHasConst (CondBranch _ tid fid) f
1720 %% = mkFactBase [(tid, f), (fid, f)]
1721 %%
1722 %% -------------------------------------------------------
1723 %% -- Constant propagation
1724 %% `constProp :: FwdRewrite Node ConstFact
1725 %% constProp node ^facts
1726 %% = fmap toAGraph (mapE rewriteE node)
1727 %% where
1728 %% `rewriteE e (Var x)
1729 %% = case Map.lookup x facts of
1730 %% Just (B b) -> Just $ Bool b
1731 %% Just (I i) -> Just $ Int i
1732 %% _ -> Nothing
1733 %% rewriteE e = Nothing
1734 %%
1735 %% -------------------------------------------------------
1736 %% -- Simplification ("constant folding")
1737 %% `simplify :: FwdRewrite Node f
1738 %% simplify (CondBranch (Bool b) t f) _
1739 %% = Just $ toAGraph $ Branch (if b then t else f)
1740 %% simplify node _ = fmap toAGraph (mapE s_exp node)
1741 %% where
1742 %% `s_exp (Binop Add (Int i1) (Int i2))
1743 %% = Just $ Int $ i1 + i2
1744 %% ... -- more cases for constant folding
1745 %%
1746 %% -- Rewriting expressions
1747 %% `mapE :: (Expr -> Maybe Expr)
1748 %% -> Node e x -> Maybe (Node e x)
1749 %% mapE f (Label _) = Nothing
1750 %% mapE f (Assign x e) = fmap (Assign x) $ f e
1751 %% ... -- more cases for rewriting expressions
1752 %%
1753 %% -------------------------------------------------------
1754 %% -- Defining the forward dataflow pass
1755 %% `constPropPass = FwdPass
1756 %% { fp_lattice = constLattice
1757 %% , fp_transfer = varHasConst
1758 %% , fp_rewrite = constProp `thenFwdRw` simplify }
1759 %% \end{code}}
1760 %% \caption{The client for constant propagation and constant folding} \figlabel{old-const-prop}
1761 %% \end{figure}
1762 \begin{figure}
1763 \smallverbatiminput{cprop}
1764 \caption{The client for constant propagation and constant folding\break (extracted automatically from code distributed with Hoopl)}
1765 \figlabel{const-prop}
1766 \end{figure}
1767
1768
1769 \figref{const-prop} shows client code for
1770 constant propagation and constant folding.
1771 For each variable at each point in a graph, the analysis concludes one of three facts:
1772 the variable holds a constant value (Boolean or integer),
1773 the variable might hold a non-constant value,
1774 or nothing is known about what the variable holds.
1775 We~represent these facts using a finite map from a variable to a
1776 fact of type @WithTop Lit@ (\secref{WithTop}).
1777 A~variable with a constant value maps to @Just (PElem k)@, where
1778 @k@~is the constant value;
1779 a variable with a non-constant value maps to @Just Top@;
1780 and a variable with an unknown value maps to @Nothing@ (it is not
1781 in the domain of the finite map).
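
In code, this representation is a finite map, as in \figref{const-prop}
(@Var@ and @Lit@ are the expression language's variables and literals):
\begin{code}
type ConstFact = Map.Map Var (WithTop Lit)
\end{code}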
1782
1783 % \afterpage{\clearpage}
1784
1785 The definition of the lattice (@constLattice@) is straightforward.
1786 The bottom element is an empty map (nothing is known about the contents of any variable).
1787 We~use the @stdMapJoin@ function to lift the join operation
1788 for a single variable (@constFactAdd@) up to the map containing facts
1789 for all variables.
1790 The utility function @`changeIf :: Bool -> ChangeFlag@ is exported by \hoopl.
1791
1792 % omit stdMapJoin :: Ord k => JoinFun v -> JoinFun (Map.Map k v)
1793
1794 The forward transfer function @varHasLit@ is defined using the
1795 shape-polymorphic auxiliary function~@ft@.
1796 There are two interesting kinds of nodes:
1797 assignment and conditional branch.
1798 In the first two cases for assignment, a variable gets a constant value,
1799 so we produce a dataflow fact mapping the variable to its value.
1800 In the third case for assignment, the variable gets a non-constant value,
1801 so we produce a dataflow fact mapping the variable to @Top@.
1802 The~last interesting case is a conditional branch where the condition
1803 is a variable.
1804 If the conditional branch flows to the true successor,
1805 the variable holds @True@, and similarly for the false successor.
1806 We~update the fact flowing to each successor accordingly.
(Because function @ft@ scrutinizes a GADT, it cannot
default the uninteresting cases.)
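
For example, the conditional-branch case of~@ft@ might read as follows
(a sketch; constructor names follow \figref{const-prop} and may differ
in detail):
\begin{code}
ft (CondBranch (Var x) tl fl) f
  = mkFactBase [ (tl, Map.insert x (PElem (Bool True)) f)
               , (fl, Map.insert x (PElem (Bool False)) f) ]
\end{code}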
1809
1810 We do not need to consider complicated cases such as
1811 an assignment @x:=y@ where @y@ holds a constant value @k@.
1812 Instead, we rely on the interleaving of transformation
1813 and analysis to first transform the assignment to @x:=k@,
1814 which is exactly what our simple transfer function expects.
1815 As we mention in \secref{simple-tx},
1816 interleaving makes it possible to write
1817 the simplest imaginable transfer functions, without missing
1818 opportunities to improve the code.
1819
1820 \figref{const-prop}'s rewrite function for constant propagation, @constProp@,
1821 rewrites each use of a variable to its constant value.
1822 The client has defined auxiliary functions
1823 \begin{smallcode}
1824 `mapVE :: (Var -> Maybe Expr) -> (Expr -> Maybe Expr)
1825 `mapEE :: (Expr -> Maybe Expr) -> (Expr -> Maybe Expr)
1826 `mapEN :: (Expr -> Maybe Expr) -> (Node e x -> Maybe (Node e x))
1827 `mapVN :: (Var -> Maybe Expr) -> (Node e x -> Maybe (Node e x))
1828 `nodeToG :: Node e x -> Graph Node e x
1829 \end{smallcode}
The client composes the @mapXX@ functions
to apply @lookup@ to each use of a variable in each kind of node;
@lookup@ substitutes the constant value for each variable known to
hold one.
Applying @fmap nodeToG@ lifts the final node, if present, into a~@Graph@.
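
Assembled, @constProp@ is roughly this sketch (the code in
\figref{const-prop} is authoritative; we assume an expression
constructor @Lit@ that injects a literal into an expression):
\begin{code}
-- (wrapping into the FwdRewrite representation is omitted)
constProp node facts = fmap nodeToG (mapVN lookup node)
  where lookup v = case Map.lookup v facts of
                     Just (PElem l) -> Just (Lit l)
                     _ -> Nothing
\end{code}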
1834
1835
1836 \figref{const-prop} also gives another, distinct function
1837 for constant
1838 folding: @simplify@.
This function rewrites a conditional branch on a
boolean constant to an unconditional branch, and it
folds constant subexpressions by running @s_exp@
on every subexpression.
1844 Function @simplify@ does not need to check whether a variable holds a
1845 constant value; it relies on @constProp@ to have replaced the
1846 variable by the constant.
1847 Indeed, @simplify@ does not consult the
1848 incoming fact, so it is polymorphic in~@f@.
1849
1850
1851 The @FwdRewrite@ functions @constProp@ and @simplify@
1852 are useful independently.
1853 In this case, however, we
1854 want \emph{both} of them,
1855 so we compose them with @thenFwdRw@.
1856 The composition, along with the lattice and the
1857 transfer function,
1858 goes into @constPropPass@ (bottom of \figref{const-prop}).
1859 Given @constPropPass@, we can
1860 improve a graph~@g@ by passing @constPropPass@ and~@g@
1861 to
1862 @analyzeAndRewriteFwdBody@.
1863
1864
1865 \subsection{Throttling the dataflow engine using ``optimization fuel''}
1866 \seclabel{vpoiso}
1867 \seclabel{fuel}
1868
1869 Debugging an optimization can be tricky:
1870 an optimization may rewrite hundreds of nodes,
1871 and any of those rewrites could be incorrect.
1872 To debug dataflow optimizations, we use Whalley's
1873 \citeyearpar{whalley:isolation} powerful technique
1874 to identify the first rewrite that
1875 transforms a program from working code to faulty code.
1876
1877 The key idea is to~limit the number of rewrites that
1878 are performed while optimizing a graph.
1879 In \hoopl, the limit is called
1880 \emph{optimization fuel}:
1881 each rewrite costs one unit of fuel,
1882 and when the fuel is exhausted, no more rewrites are permitted.
1883 Because each rewrite leaves the observable behavior of the
1884 program unchanged, it is safe to stop rewriting at any point.
1885 Given a program that fails when compiled with optimization,
1886 a test infrastructure uses binary search on the amount of
1887 optimization fuel, until
1888 it finds that the program works correctly after $n-1$ rewrites but fails
1889 after $n$~rewrites.
1890 The $n$th rewrite is faulty.
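
The search itself is not part of \hoopl, but its core is an ordinary
bisection on the amount of fuel; a hypothetical sketch:
\begin{code}
-- Hypothetical test harness: `works n' compiles with
-- n units of fuel and runs the test. Invariant:
-- works lo succeeds and works hi fails.
`findFaulty :: (Int -> IO Bool) -> Int -> Int -> IO Int
findFaulty works lo hi
  | lo + 1 == hi = return hi -- rewrite hi is faulty
  | otherwise = do ok <- works mid
                   if ok then findFaulty works mid hi
                         else findFaulty works lo mid
  where mid = (lo + hi) `div` 2
\end{code}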
1891
1892 \seclabel{fuel-monad}
1893
1894 \hoopl\ ensures it can keep track of fuel by imposing the constraint
1895 @FuelMonad m@ on the monad of any rewrite function used with
1896 @analyzeAndRewriteFwdBody@.
1897 This constraint ensures that \hoopl\ can get and set the amount of
1898 fuel remaining.
1899
1900
1901 \subsection{Fixed points and speculative rewrites} \seclabel{fixpoints}
1902
1903 Are rewrites sound, especially when there are loops?
1904 Many analyses compute a fixed point starting from unsound
1905 ``facts''; for example, a live-variable analysis starts from the
1906 assumption that all variables are dead. This means \emph{rewrites
1907 performed before a fixed point is reached may be unsound, and their results
1908 must be discarded}. Each iteration of the fixed-point computation must
1909 start afresh with the original graph.
1910
1911
1912 Although the rewrites may be unsound, \emph{they must be performed}
1913 (speculatively, and possibly recursively),
1914 so that the facts downstream of the replacement graphs are as accurate
1915 as possible.
1916 For~example, consider this graph, with entry at @L1@:
1917 \par{\small
1918 \begin{code}
1919 L1: x=0; goto L2
1920 L2: x=x+1; if x==10 then goto L3 else goto L2
1921 \end{code}}
The first traversal of block @L2@ starts with the unsound ``fact'' \{@x=0@\};
but analysis of the block propagates the new fact \{@x=1@\} to @L2@,
where it is joined with the
existing fact to give \{@x=@$\top$\}.
What if the predicate in the conditional branch were @x<10@ instead
of @x==10@?
Again the first iteration would begin with the tentative fact \{@x=0@\}.
1928 Using that fact, we would rewrite the conditional branch to an unconditional
1929 branch @goto L3@. No new fact would propagate to @L2@, and we would
1930 have successfully (and soundly) eliminated the loop.
1931 This example is contrived, but it illustrates that
1932 for best results we should
1933 \begin{itemize}
1934 \item Perform the rewrites on every iteration.
1935 \item Begin each new iteration with the original, virgin graph.
1936 \end{itemize}
1937 This sort of algorithm is hard to implement in an imperative setting, where rewrites
1938 mutate a graph in place.
1939 But with an immutable graph, implementing the algorithm
1940 is trivially easy: we simply revert to the original graph at the start
1941 of each fixed-point iteration.
1942
1943 \subsection{Correctness} \seclabel{correctness}
1944
1945 Facts computed by @analyzeAndRewriteFwdBody@ depend on graphs produced by the rewrite
1946 function, which in turn depend on facts computed by the transfer function.
How~do we know that this algorithm is sound, or that it terminates?
1948 A~proof requires a POPL paper
1949 \cite{lerner-grove-chambers:2002}, but we can give some
1950 intuition.
1951
1952 \hoopl\ requires that a client's functions meet
1953 these preconditions:
1954 \begin{itemize}
1955 \item
The lattice must have no \emph{infinite ascending chains}; that is,
every sequence of calls to @fact_join@ must eventually return @NoChange@.
1958 \item
1959 The transfer function must be
1960 \emph{monotonic}: given a more informative fact in,
1961 it should produce a more informative fact out.
1962 \item
1963 The rewrite function must be \emph{sound}:
1964 if it replaces a node @n@ by a replacement graph~@g@, then @g@~must be
1965 observationally equivalent to~@n@ under the
1966 assumptions expressed by the incoming dataflow fact~@f@.
1967 %%\footnote{We do not permit a transformation to change
1968 %% the @Label@ of a node. We have not found any optimizations
1969 %% that are prevented (or even affected) by this restriction.}
1970 \item
1971 The rewrite function must be \emph{consistent} with the transfer function;
1972 that is, \mbox{@`transfer n f@ $\sqsubseteq$ @transfer g f@}.
1973 For example, if the analysis says that @x@ is dead before the node~@n@,
1974 then it had better still be dead if @n@ is replaced by @g@.
1975 \item
1976 To ensure termination, a transformation that uses deep rewriting
1977 must not return replacement graphs which
1978 contain nodes that could be rewritten indefinitely.
1979 \end{itemize}
1980 Without the conditions on monotonicity and consistency,
1981 our algorithm will terminate,
1982 but there is no guarantee that it will compute
1983 a fixed point of the analysis. And that in turn threatens the
1984 soundness of rewrites based on possibly bogus ``facts''.
1985
1986 However, when the preconditions above are met,
1987 \begin{itemize}
1988 \item
1989 The algorithm terminates. The fixed-point loop must terminate because the
1990 lattice has no infinite ascending chains. And the client is responsible
1991 for avoiding infinite recursion when deep rewriting is used.
1992 \item
1993 The algorithm is sound. Why? Because if each rewrite is sound (in the sense given above),
1994 then applying a succession of rewrites is also sound.
1995 Moreover, a~sound analysis of the replacement graph
1996 may generate only dataflow facts that could have been
1997 generated by a more complicated analysis of the original graph.
1998 \end{itemize}
1999
2000 \finalremark{Doesn't the rewrite have to be have the following property:
2001 for a forward analysis/transform, if (rewrite P s) = Just s',
2002 then (transfer P s $\sqsubseteq$ transfer P s').
2003 For backward: if (rewrite Q s) = Just s', then (transfer Q s' $\sqsubseteq$ transfer Q s).
2004 Works for liveness.
2005 ``It works for liveness, so it must be true'' (NR).
2006 If this is true, it's worth a QuickCheck property!
2007 }%
2008 \finalremark{Version 2, after further rumination. Let's define
2009 $\scriptstyle \mathit{rt}(f,s) = \mathit{transform}(f, \mathit{rewrite}(f,s))$.
2010 Then $\mathit{rt}$ should
2011 be monotonic in~$f$. We think this is true of liveness, but we are not sure
2012 whether it's just a generally good idea, or whether it's actually a
2013 precondition for some (as yet unarticulated) property of \ourlib{} to hold.}%
2014
2015 %%%% \simon{The rewrite functions must presumably satisfy
2016 %%%% some monotonicity property. Something like: given a more informative
2017 %%%% fact, the rewrite function will rewrite a node to a more informative graph
2018 %%%% (in the fact lattice.).
2019 %%%% \textbf{NR}: actually the only obligation of the rewrite function is
2020 %%%% to preserve observable behavior. There's no requirement that it be
2021 %%%% monotonic or indeed that it do anything useful. It just has to
2022 %%%% preserve semantics (and be a pure function of course).
2023 %%%% \textbf{SLPJ} In that case I think I could cook up a program that
2024 %%%% would never reach a fixed point. Imagine a liveness analysis with a loop;
2025 %%%% x is initially unused anywhere.
2026 %%%% At some assignment node inside the loop, the rewriter behaves as follows:
2027 %%%% if (and only if) x is dead downstream,
2028 %%%% make it alive by rewriting the assignment to mention x.
2029 %%%% Now in each successive iteration x will go live/dead/live/dead etc. I
2030 %%%% maintain my claim that rewrite functions must satisfy some
2031 %%%% monotonicity property.
2032 %%%% \textbf{JD}: in the example you cite, monotonicity of facts at labels
2033 %%%% means x cannot go live/dead/live/dead etc. The only way we can think
2034 %%%% of not to terminate is infinite ``deep rewriting.''
2035 %%%% }
2036
2037
2038
2039
2040 \section{\ourlib's implementation}
2041 \seclabel{engine}
2042 \seclabel{dfengine}
2043
2044 \secref{making-simple}
2045 gives a client's-eye view of \hoopl, showing how to use
2046 it to create analyses and transformations.
2047 \hoopl's interface is simple, but
2048 the \emph{implementation} of interleaved analysis and rewriting is
2049 quite complicated. \citet{lerner-grove-chambers:2002}
2050 do not describe their implementation. We have written
2051 at least three previous implementations, all of which
2052 were long and hard to understand, and only one of which
2053 provided compile-time guarantees about open and closed shapes.
We are not confident that any of these implementations is correct.
2055
2056 In this paper we describe our new implementation. It is short
2057 (about a third of the size of our last attempt), elegant, and offers
2058 strong static shape guarantees.
2059 \remark{Wanted: enumerate the critical components and give each one's size}
2060
2061 We describe the implementation of \emph{forward}
2062 analysis and transformation.
2063 The implementations of backward analysis and transformation are
2064 exactly analogous and are included in \hoopl.
2065
2066
2067
2068 \subsection{Overview}
2069
2070
2071 %% We on @analyzeAndRewriteFwd@, whose type is more general
2072 %% than that of @analyzeAndRewriteFwdBody@:
2073 %% \begin{smallcode}
2074 %% `analyzeAndRewriteFwd
2075 %% :: forall m n f e x. (FuelMonad m, NonLocal n)
2076 %% => FwdPass m n f -- lattice, transfers, rewrites
2077 %% -> MaybeC e [Label] -- entry points for a closed graph
2078 %% -> Graph n e x -- the original graph
2079 %% -> Fact e f -- fact(s) flowing into the entry/entries
2080 %% -> m (Graph n e x, FactBase f, MaybeO x f)
2081 %% \end{smallcode}
2082 %% We analyze graphs of all shapes; a single @FwdPass@ may be used with
2083 %% multiple shapes.
2084 %% If a graph is closed at the entry, a list of entry points must be
2085 %% provided;
2086 %% if the graph is open at the entry, it must be the case that the
2087 %% implicit entry point is the only entry point.
2088 %% The fact or set of facts flowing into the entries is similarly
2089 %% determined by the shape of the entry point.
2090 %% Finally, the result is a rewritten graph, a @FactBase@ that gives a
2091 %% fixed point of the analysis on the rewritten graph, and if the graph
2092 %% is open at the exit, an ``exit fact'' flowing out.
2093
Instead of the interface function @analyzeAndRewriteFwdBody@, we present
the internal function @arfGraph@ (short for ``analyze and rewrite
forward graph''):
2097 \begin{smallcode}
2098 `arfGraph
2099 :: forall m n f e x. (FuelMonad m, NonLocal n)
2100 => FwdPass m n f -- lattice, transfers, rewrites
2101 -> MaybeC e [Label] -- entry points for a closed graph
2102 -> Graph n e x -- the original graph
2103 -> Fact e f -- fact(s) flowing into the entry/entries
2104 -> m (RG f n e x, Fact x f)
2105 \end{smallcode}
2106 We analyze graphs of all shapes; a single @FwdPass@ may be used with
2107 multiple shapes.
2108 If a graph is closed at the entry, a list of entry points must be
2109 provided;
2110 if the graph is open at the entry, it must be the case that the
2111 implicit entry point is the only entry point.
2112 The fact or set of facts flowing into the entries is similarly
2113 determined by the shape of the entry point.
2114 Finally, the result is a rewritten graph decorated with facts
2115 (@RG f n e x@),
2116 and if the graph
2117 is open at the exit, an ``exit fact'' flowing out.
2118
2119 To explain~@RG@, we have to reveal that the definition of
2120 @Graph@ in \figref{graph} contains a white lie: @Graph@~is a type
2121 synonym for an underlying type @`Graph'@, which takes the type of block
2122 as an additional parameter.
2123 (Similarly, function @gSplice@ in \secref{gSplice} is actually a
2124 higher-order function that takes a block-concatenation function as a
2125 parameter.)
2126 The truth about @Graph@ and @RG@ is as follows:
2127 \begin{code}
2128 type Graph = Graph' Block
2129 type RG f n e x = Graph' (FBlock f) n e x
2130 data FBlock f n e x = FBlock f (Block n e x)
2131 \end{code}
2132 @RG@~differs from @Graph@ simply by having the beginning of each block
2133 decorated with a fact.
2134 Extracting a @Graph@ and a @FactBase@ for @analyzeAndRewriteFwdBody@
2135 requires a 12-line function.
2136 Type~@RG@ is internal to \hoopl; it is not seen by any client.
2137 %% We use it, not @Graph@, for two reasons:
2138 %% \begin{itemize}
2139 %% \item The client is often interested not only in the facts flowing
2140 %% out of the graph (which are returned in the @Fact x f@),
2141 %% but also in the facts on the \emph{internal} blocks
2142 %% of the graph. A~replacement graph of type @(RG n e x)@ is decorated with
2143 %% these internal facts.
2144 %% \item A @Graph@ has deliberately restrictive invariants; for example,
2145 %% a @GMany@ with a @JustO@ is always open at exit (\figref{graph}). It turns
2146 %% out to be awkward to maintain these invariants \emph{during} rewriting,
2147 %% but easy to restore them \emph{after} rewriting by ``normalizing'' an @RG@.
2148 %% \end{itemize}
2149 %% The information in an @RG@ is returned to the client by
2150 %% the normalization function @normalizeBody@, which
2151 %% splits an @RG@ into a @Body@ and its corresponding @FactBase@:
2152 %% \begin{code}
2153 %% `normalizeBody :: NonLocal n => RG n f C C
2154 %% -> (Body n, FactBase f)
2155 %%
2156
2157 The implementation of @arfGraph@ is built up in layers:
2158 we analyze nodes, blocks, graph bodies, and full graphs.
2159 This layering enables us to separate concerns: for example, only the
2160 analysis of a node has to worry about rewrite functions;
2161 and only the analysis of a body has to worry about fixed points.
2162 All four layers typically share the @FwdPass@ and any entry points,
2163 which are captured in closures.
2164 The types of the inner functions are therefore
2165 \begin{smallcode}
2166 node :: forall e x . (ShapeLifter e x)
2167 => n e x -> f -> m (RG f n e x, Fact x f)
2168 block :: forall e x .
2169 Block n e x -> f -> m (RG f n e x, Fact x f)
2170 body :: [Label] -> LabelMap (Block n C C)
2171 -> Fact C f -> m (RG f n C C, Fact C f)
2172 graph :: Graph n e x -> Fact e f -> m (RG f n e x, Fact x f)
2173 \end{smallcode}
2174 Together with @arfGraph@, these functions comprise a cycle of mutual recursion:
2175 @arfGraph@ calls @graph@,
2176 each of the inner functions calls the one above it,
2177 and @node@ may call @arfGraph@.
2178 And each of the inner functions has roughly the same form:
2179 it~gets a ``thing'' and returns a fact transformer suitable to that
2180 thing.
2181 But the fact transformer provides a bonus:
2182 in addition to a result fact, we also get a rewritten graph.
2183
2184 Fact transformers compose nicely, and our composition of fact
2185 transformers deliberately mirrors the composition of the
2186 data structures.
2187 The result is something one might call
2188 ``@concatMap@ style,'' which is best defined by example:
2189 \begin{code}
2190 block (BFirst n) = node n
2191 block (BMiddle n) = node n
2192 block (BLast n) = node n
2193 block (BCat b1 b2) = block b1 `cat` block b2
2194 \end{code}
2195 (We have also implemented a ``fold style,'' but because the types are
2196 a little less intuitive, we are sticking with @concatMap@ style for now.)
2197 The key operation here is @cat@, which must thread facts through
2198 monadic fact transformers and also accumulate the rewritten graphs:
2199 \begin{code}
2200 `cat ft1 ft2 f = do { (g1,f1) <- ft1 f
2201 ; (g2,f2) <- ft2 f1
2202 ; return (g1 `rgCat` g2, f2) }
2203 \end{code}
2204 (Function @rgCat@ is the same splicing function used for an ordinary
2205 @Graph@, but it takes a one-line block-concatenation function suitable
2206 for @FBlock@s.)
The @cat@ function is completely agnostic about the types of facts,
but it does ensure that where graphs are concatenated, shapes match up:
2209 \begin{code}
cat :: forall e a x info info' info''.
2211 (info -> m (RG f n e a, info'))
2212 -> (info' -> m (RG f n a x, info''))
2213 -> (info -> m (RG f n e x, info''))
2214 \end{code}
2215
2216
2217
2218
2219 \begin{itemize}
2220 \item
2221 The @arfNode@ function processes nodes (\secref{arf-node}).
2222 It handles the subtleties of interleaved analysis and rewriting,
2223 and it deals with fuel consumption. It calls @arfGraph@ to analyze
2224 and transform rewritten graphs.
2225 \item
2226 Based on @arfNode@ it is extremely easy to write @arfBlock@, which lifts
2227 the analysis and rewriting from nodes to blocks (\secref{arf-block}).
2228
2229 %%% \ifpagetuning\penalty -10000 \fi
2230
2231
2232 \item
2233 Using @arfBlock@ we define @arfBody@, which analyzes and rewrites a
2234 @Body@: a~group of closed/closed blocks linked by arbitrary
2235 control flow.
2236 Because a @Body@ is
2237 always closed/closed and does not take shape parameters, function
2238 @arfBody@ is less polymorphic than the others; its type is what
2239 would be obtained by expanding and specializing the definition of
2240 @ARF@ for a @thing@ which is always closed/closed and is equivalent to
2241 a @Body@.
2242
2243 Function @arfBody@ takes care of fixed points (\secref{arf-body}).
2244 \item
2245 Based on @arfBody@ it is easy to write @arfGraph@ (\secref{arf-graph}).
2246 \end{itemize}
2247 Given these functions, writing the main analyzer is a simple
2248 matter of matching the external API to the internal functions:
2249 \begin{code}
2250 `analyzeAndRewriteFwdBody
2251 :: forall n f. NonLocal n
2252 => FwdPass n f -> Body n -> FactBase f
2253 -> FuelMonad (Body n, FactBase f)
2254
2255 analyzeAndRewriteFwdBody pass ^body facts
2256 = do { (^rg, _) <- arfBody pass body facts
2257 ; return (normalizeBody rg) }
2258 \end{code}
2259
2260 \subsection{From nodes to blocks} \seclabel{arf-block}
2261 \seclabel{arf-graph}
2262
2263 We begin our explanation with the second task:
2264 writing @arfBlock@, which analyzes and transforms blocks.
2265 \begin{code}
2266 `arfBlock :: NonLocal n => ARF (Block n) n
2267 arfBlock pass (BUnit node) f
2268 = arfNode pass node f
2269 arfBlock pass (BCat b1 b2) f
2270 = do { (g1,f1) <- arfBlock pass b1 f
2271 ; (g2,f2) <- arfBlock pass b2 f1
2272 ; return (g1 `RGCatO` g2, f2) }
2273 \end{code}
2274 The code is delightfully simple.
2275 The @BUnit@ case is implemented by @arfNode@.
2276 The @BCat@ case is implemented by recursively applying @arfBlock@ to the two
2277 sub-blocks, threading the output fact from the first as the
2278 input to the second.
2279 Each recursive call produces a rewritten graph;
2280 we concatenate them with @RGCatO@.
2281
Function @arfGraph@ is equally straightforward; the sketch below
assumes the @Graph@ constructors of \figref{graph} and a hypothetical
constructor @RGNil@ for the empty rewritten graph (only the simplest
@GMany@ case is shown):
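\begin{code}
`arfGraph :: NonLocal n => ARF (Graph n) n
arfGraph _ GNil f = return (RGNil, f)
arfGraph pass (GUnit b) f = arfBlock pass b f
arfGraph pass (GMany NothingO body NothingO) f
  = arfBody pass body f
-- the other GMany cases analyze the entry block, the
-- body, and the exit block in order, threading facts
-- and concatenating the rewritten graphs
\end{code}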
2284 The pattern is the same as for @arfBlock@: thread
2285 facts through the sequence, and concatenate the results.
2286 Because the constructors of type~@RG@ are more polymorphic than those
2287 of @Graph@, type~@RG@ can represent
2288 graphs more simply than @Graph@; for example, each element of a
2289 @GMany@ becomes a single @RG@ object, and these @RG@ objects are then
2290 concatenated to form a single result of type~@RG@.
2291
2292 %% \ifpagetuning\penalty -10000 \fi
2293
2294 \subsection{Analyzing and rewriting nodes} \seclabel{arf-node}
2295
2296 Although interleaving analysis with transformation is tricky, we have
2297 succeeded in isolating the algorithm in just two functions,
2298 @arfNode@ and its backward analog, @`arbNode@:
2299 \begin{fuzzcode}{10.5pt}
2300 `arfNode :: NonLocal n => ARF n n
2301 arfNode ^pass n f
2302 = do { ^mb_g <- withFuel (fp_rewrite pass n f)
2303 ; case mb_g of
2304 Nothing -> return (RGUnit f (BUnit n),
2305 fp_transfer pass n f)
2306 Just (FwdRes ^ag ^rw) ->
2307 do { g <- graphOfAGraph ag
2308 ; let ^pass' = pass { fp_rewrite = rw }
2309 ; arfGraph pass' g f } }
2310 \end{fuzzcode}
2311 The code here is more complicated,
2312 but still admirably brief.
2313 Using the @fp_rewrite@ record selector (\figref{api-types}),
2314 we~begin by extracting the
2315 rewriting function from the @FwdPass@,
2316 and we apply it to the node~@n@
2317 and the incoming fact~@f@.
2318
2319 The resulting @Maybe@ is passed to @withFuel@, which
2320 deals with fuel accounting:
2321 \begin{code}
2322 `withFuel :: Maybe a -> FuelMonad (Maybe a)
2323 \end{code}
2324 If @withFuel@'s argument is @Nothing@, \emph{or} if we have run out of
2325 optimization fuel (\secref{fuel}), @withFuel@ returns @Nothing@.
2326 Otherwise, @withFuel@ consumes one unit of fuel and returns its
2327 % defn Fuel
2328 argument (which will be a @Just@). That is all we need say about fuel.
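
For concreteness, here is one plausible implementation of @withFuel@
(a sketch: we assume @FuelMonad@ supports operations @getFuel@ and
@setFuel@, and that fuel is an integer count):
\begin{code}
withFuel Nothing = return Nothing
withFuel (Just a)
  = do { fuel <- getFuel
       ; if fuel == 0
         then return Nothing
         else do { setFuel (fuel - 1); return (Just a) } }
\end{code}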
2329
2330 In the @Nothing@ case, no rewrite takes place---either because the rewrite function
2331 didn't want one or because fuel is exhausted.
2332 We~return a single-node
2333 graph @(RGUnit f (BUnit n))@, decorated with its incoming fact.
2334 We~also apply the transfer function @(fp_transfer pass)@
2335 to the incoming fact to produce the outgoing fact.
2336 (Like @fp_rewrite@, @fp_transfer@ is a record selector of @FwdPass@.)
2337
2338 In the @Just@ case, we receive a replacement
2339 @AGraph@ @ag@ and a new rewrite function~@rw@.
2340 We~convert @ag@ to a @Graph@, using
2341 \par{\small
2342 \begin{code}
2343 `graphOfAGraph :: AGraph n e x -> FuelMonad (Graph n e x)
2344 \end{code}}
2345 and we analyze the resulting @Graph@ with @arfGraph@.
2346 This analysis uses @pass'@, which contains the original lattice and transfer
function from @pass@, together with the new rewrite function~@rw@.
2348
2349 And that's it! If~the client wanted deep rewriting, it is
2350 implemented by the call to @arfGraph@;
2351 if the client wanted
2352 shallow rewriting, the rewrite function will have returned
2353 @noFwdRw@ as~@rw@, which is implanted in @pass'@
2354 (\secref{shallow-vs-deep}).
2355
2356 \subsection{Fixed points} \seclabel{arf-body}
2357
2358 Lastly, @arfBody@ deals with the fixed-point calculation.
2359 This part of the implementation is the only really tricky part, and it is
2360 cleanly separated from everything else:
2361 \par{\small
2362 \begin{code}
2363 arfBody :: NonLocal n
2364 => FwdPass n f -> Body n -> FactBase f
2365 -> FuelMonad (RG n f C C, FactBase f)
2366 `arfBody pass body ^fbase
= fixpoint True (fp_lattice pass) (arfBlock pass) fbase $
2368 forwardBlockList (factBaseLabels fbase) body
2369 \end{code}}
Function @forwardBlockList@ takes a list of possible entry points and
a~@Body@, and it
2372 returns a linear list of
2373 blocks, sorted into an order that makes forward dataflow efficient:
2374 \begin{code}
2375 `forwardBlockList
2376 :: NonLocal n => [Label]
2377 -> Body n -> [(Label,Block n C C)]
2378 \end{code}
2379 For
2380 example, if the @Body@ starts at block~@L2@, and @L2@
2381 branches to~@L1@, but not vice versa, then \hoopl\ will reach a fixed point
2382 more quickly if we process @L2@ before~@L1@.
2383 To~find an efficient order, @forwardBlockList@ uses
2384 the methods of the @NonLocal@ class---@entryLabel@ and @successors@---to
2385 perform a reverse depth-first traversal of the control-flow graph.
2386 %%
2387 %%The @NonLocal@ type-class constraint on~@n@ propagates to all the
2388 %%@`arfThing@ functions.
2389 %% paragraph carrying too much freight
2390 %%
2391 The order of the blocks does not affect the fixed point or any other
2392 part of the answer; it affects only the number of iterations needed to
2393 reach the fixed point.
2394
2395 How do we know what entry points to pass to @forwardBlockList@?
2396 We treat
2397 any block with an entry in the in-flowing @FactBase@ as an entry point.
2398 \finalremark{Why does this work?}
2399 {\hfuzz=0.8pt \par}
2400
2401 The rest of the work is done by @fixpoint@, which is shared by
2402 both forward and backward analyses:
2403 \begin{code}
2404 `fixpoint :: forall n f.
2405 NonLocal n
2406 => Bool -- going Forward?
2407 -> DataflowLattice f
2408 -> (Block n C C -> FactBase f ->
2409 FuelMonad (RG n f C C, FactBase f))
2410 -> FactBase f
2411 -> [(Label, Block n C C)]
2412 -> FuelMonad (RG n f C C, FactBase f)
2413 \end{code}
2414 Except for the mysterious @Bool@ passed as the first argument,
2415 the type signature tells the story.
2416 The third argument is
2417 a function that analyzes and rewrites a single block;
2418 @fixpoint@ applies that function successively to all the blocks,
2419 which are passed as the fifth argument.\finalremark{For consistency with the transfer
2420 functions, blocks should come before @FactBase@, even though this change will
2421 ugly up the call site some.}
2422 The @fixpoint@ function maintains a
2423 ``Current @FactBase@''
2424 which grows monotonically:
2425 the initial value of the Current @FactBase@ is the fourth argument to
2426 @fixpoint@,
2427 and the Current @FactBase@ is augmented with the new facts that flow
2428 out of each @Block@ as it is analyzed.
2429 The @fixpoint@ function
2430 keeps analyzing blocks until the Current @FactBase@ reaches a fixed
2431 point.
2432
2433 The code for @fixpoint@ is a massive 70 lines long;
2434 for completeness, it
2435 appears in Appendix~\ref{app:fixpoint}.
2436 The~code is mostly straightforward, although we try to be a bit clever
2437 about deciding when a new fact means that another iteration over the
2438 blocks will be required.
2439 There is one more subtle point worth mentioning, which we highlight by
2440 considering a forward analysis of this graph, where execution starts at~@L1@:
2441 \begin{code}
2442 L1: x:=3; goto L4
2443 L2: x:=4; goto L4
L4: if x>3 then goto L2 else goto L5
2445 \end{code}
2446 Block @L2@ is unreachable.
2447 But if we \naively\ process all the blocks (say in
2448 order @L1@, @L4@, @L2@), then we will start with the bottom fact for @L2@, propagate
2449 \{@x=4@\} to @L4@, where it will join with \{@x=3@\} to yield
2450 \{@x=@$\top$\}.
2451 Given @x=@$\top$, the
2452 conditional in @L4@ cannot be rewritten, and @L2@~seems reachable. We have
2453 lost a good optimization.
2454
Our implementation solves this problem through a clever trick that is
safe only for a forward analysis:
@fixpoint@ analyzes a block only if the block is
reachable from an entry point.
This trick is not safe for a backward analysis, which
is why
@fixpoint@ takes a @Bool@ as its first argument:
it must know whether the analysis goes forward.
2463
2464 Although the trick can be implemented in just a couple of lines of
2465 code, the reasoning behind it is quite subtle---exactly the sort of
2466 thing that should be implemented once in \hoopl, so clients don't have
2467 to worry about it.
2468
2469 \section {Related work} \seclabel{related}
2470
2471 While there is a vast body of literature on
2472 dataflow analysis and optimization,
2473 relatively little can be found on
2474 the \emph{design} of optimizers, which is the topic of this paper.
2475 We therefore focus on the foundations of dataflow analysis
2476 and on the implementations of some comparable dataflow frameworks.
2477
2478 \paragraph{Foundations}
2479
2480 When transfer functions are monotone and lattices are finite in height,
2481 iterative dataflow analysis converges to a fixed point
2482 \cite{kam-ullman:global-iterative-analysis}.
2483 If~the lattice's join operation distributes over transfer
2484 functions,
2485 this fixed point is equivalent to a join-over-all-paths solution to
2486 the recursive dataflow equations
2487 \cite{kildall:unified-optimization}.\footnote
2488 {Kildall uses meets, not joins.
2489 Lattice orientation is conventional, and conventions have changed.
2490 We use Dana Scott's
2491 orientation, in which higher elements carry more information.}
2492 \citet{kam-ullman:monotone-flow-analysis} generalize to some
2493 monotone functions.
2494 Each~client of \hoopl\ must guarantee monotonicity.
2495
2496 \ifcutting
2497 \citet{cousot:abstract-interpretation:1977}
2498 \else
2499 \citet{cousot:abstract-interpretation:1977,cousot:systematic-analysis-frameworks}
2500 \fi
2501 introduce abstract interpretation as a technique for developing
2502 lattices for program analysis.
2503 \citet{schmidt:data-flow-analysis-model-checking} shows that
2504 an all-paths dataflow problem can be viewed as model checking an
2505 abstract interpretation.
2506
2507 \citet{muchnick:compiler-implementation}
2508 presents many examples of both particular analyses and related
2509 algorithms.
2510
2511
2512 The soundness of interleaving analysis and transformation,
2513 even when not all speculative transformations are performed on later
2514 iterations, was shown by
2515 \citet{lerner-grove-chambers:2002}.
2516
2517
2518 \paragraph{Frameworks}
2519 Most dataflow frameworks support only analysis, not transformation.
2520 The framework computes a fixed point of transfer functions, and it is
2521 up to the client of
2522 the framework to use that fixed point for transformation.
2523 Omitting transformation makes it much easier to build frameworks,
2524 and one can find a spectrum of designs.
2525 We~describe two representative
2526 designs, then move on to the prior frameworks that support interleaved
2527 analysis and transformation.
2528
2529 The CIL toolkit \cite{necula:cil:2002}\finalremark{No good citation
2530 for same reason as Soot below ---JD}
2531 provides an analysis-only framework for C~programs.
2532 The framework is limited to one representation of control-flow graphs
2533 and one representation of instructions, both of which are provided by
2534 the framework.
2535 The~API is complicated;
2536 much of the complexity is needed to enable the client to
2537 affect which instructions
2538 the analysis iterates over.
2539
2540
2541 The Soot framework is designed for analysis of Java programs
2542 \cite{hendren:soot:2000}.\finalremark{This citation is probably the
2543 best for Soot in general, but there doesn't appear
2544 to be any formal publication that actually details the dataflow
2545 framework part. ---JD}
2546 While Soot's dataflow library supports only analysis, not
2547 transformation, we found much
2548 to admire in its design.
2549 Soot's library is abstracted over the representation of
2550 the control-flow graph and the representation of instructions.
2551 Soot's interface for defining lattice and analysis functions is
2552 like our own,
2553 although because Soot is implemented in an imperative style,
2554 additional functions are needed to copy lattice elements.
2556
2557
2558 \finalremark{FYI, LLVM has Pass Managers that try to control the order of passes,
2559 but I'll be darned if I can find anything that might be termed a dataflow framework.}
2560
2561 The Whirlwind compiler contains the dataflow framework implemented
2562 by \citet{lerner-grove-chambers:2002}, who were the first to
2563 interleave analysis and transformation.
2564 Their implementation is much like our early efforts:
2565 it is a complicated mix of code that simultaneously manages interleaving,
2566 deep rewriting, and fixed-point computation.
2567 By~separating these tasks,
2568 our implementation simplifies the problem dramatically.
2569 Whirlwind's implementation also suffers from the difficulty of
2570 maintaining pointer invariants in a mutable representation of
2571 control-flow graphs, a problem we have discussed elsewhere
2572 \cite{ramsey-dias:applicative-flow-graph}.
2573
2574 Because speculative transformation is difficult in an imperative setting,
2575 Whirlwind's implementation is split into two phases.
2576 The first phase runs the interleaved analyses and transformations
2577 to compute the final dataflow facts and a representation of the transformations
2578 that should be applied to the input graph.
2579 The second phase executes the transformations.
2580 In~\hoopl, because control-flow graphs are immutable, speculative transformations
2581 can be applied immediately, and there is no need
2582 for a phase distinction.
2583
2584 %%% % repetitious...
2585 %%%
2586 %%% \ourlib\ also improves upon Whirlwind's dataflow framework by providing
2587 %%% new support for the optimization writer:
2588 %%% \begin{itemize}
2589 %%% \item Using static type guarantees, \hoopl\ rules out a whole
2590 %%% class of possible bugs: transformations that produced malformed
2591 %%% control-flow graphs.
2592 %%% \item Using dynamic testing,
2593 %%% we can isolate the rewrite that transforms a working program
2594 %%% into a faulty program,
2595 %%% using Whalley's \citeyearpar{whalley:isolation} fault-isolation technique.
2596 %%% \end{itemize}
2597
2598 In previous work \cite{ramsey-dias:applicative-flow-graph}, we
2599 described a zipper-based representation of control-flow
2600 graphs, stressing the advantages
2601 of immutability.
2602 Our new representation, described in \secref{graph-rep}, is a significant improvement:
2603 \begin{itemize}
2604 \item
2605 We can concatenate nodes, blocks, and graphs in constant time.
2606 %Previously, we had to resort to Hughes's
2607 %\citeyearpar{hughes:lists-representation:article} technique, representing
2608 %a graph as a function.
2609 \item
2610 We can do a backward analysis without having
2611 to ``unzip'' (and allocate a copy of) each block.
2612 \item
2613 Using GADTs, we can represent a flow-graph
2614 node using a single type, instead of the triple of first, middle, and
2615 last types used in our earlier representation.
2616 This change simplifies the interface significantly:
2617 instead of providing three transfer functions and three rewrite
2618 functions per pass---one for
2619 each type of node---a client of \hoopl\ provides only one transfer
2620 function and one rewrite function per pass.
2621 \item
2622 Errors in concatenation are ruled out at
compile time by Haskell's static
2624 type system.
2625 In~earlier implementations, such errors were not detected until
2626 the compiler~ran, at which point we tried to compensate
2627 for the errors---but
2628 the compensation code harbored subtle faults,
2629 which we discovered while developing a new back end
2630 for the Glasgow Haskell Compiler.
2631 \end{itemize}
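
As an example of the single node type, a client might define nodes
along the following lines (a sketch only: the constructors and the
@Var@ and @Expr@ types are invented for illustration, and @Label@
stands in for \hoopl's label type):
\begin{smallcode}
{-# LANGUAGE GADTs, EmptyDataDecls #-}
data O   -- open:   control may fall through
data C   -- closed: control may not fall through

type Var  = String     -- placeholders for this example
type Expr = String
newtype Label = Label Int

-- One GADT replaces the triple of first/middle/last node types.
data Node e x where
  Entry  :: Label                  -> Node C O  -- first node
  Assign :: Var  -> Expr           -> Node O O  -- middle node
  Branch :: Label                  -> Node O C  -- last node
  Cond   :: Expr -> Label -> Label -> Node O C  -- last node
\end{smallcode}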
2632
2633 The implementation of \ourlib\ is also much better than
2634 our earlier implementations.
2635 Not only is the code simpler conceptually,
2636 but it is also shorter:
2637 our new implementation is about a third as long
2638 as the previous version, which is part of GHC, version 6.12.
2639
2640
2641
2642
2643 \section{What we learned}
2644
%% Some numbers: we have used the general fold nine times, and would
%% need it once more to define blockToNodeList (or the CB* equivalent
%% suggested by you).
%% (We are using it in GHC to
%%  - compute a hash of a block from its nodes
%%  - find the last node of a block
%%  - convert a block to the old representation (2x)
%%  - compute the interference graph
%%  - count the Area used by a block (2x)
%%  - compute the stack high-water mark for a block
%%  - prettyprint a block)

%% type-parameter hell, newtype hell, typechecking hell, instance hell,
%% triple hell
2659
2660
2661
2662 We have spent six years implementing and reimplementing frameworks for
2663 dataflow analysis and transformation.
2664 This formidable design problem taught us
2665 two kinds of lessons:
2666 we learned some very specific lessons about representations and
2667 algorithms for optimizing compilers,
2668 and we were forcibly reminded of some very general, old lessons that are well
2669 known not just to functional programmers, but to programmers
2670 everywhere.
2671
2672
2673
2674 %% \remark{Orphaned: but for transfer functions that
2675 %% approximate weakest preconditions or strongest postconditions,
2676 %% monotonicity falls out naturally.}
2677 %%
2678 %%
2679 %% In conclusion we offer the following lessons from the experience of designing
2680 %% and implementing \ourlib{}.
2681 %% \begin{itemize}
2682 %% \item
2683 %% Although we have not stressed this point, there is a close connection
2684 %% between dataflow analyses and program logic:
2685 %% \begin{itemize}
2686 %% \item
2687 %% A forward dataflow analysis is represented by a predicate transformer
2688 %% that is related to \emph{strongest postconditions}
2689 %% \cite{floyd:meaning}.\footnote
2690 %% {In Floyd's paper the output of the predicate transformer is called
2691 %% the \emph{strongest verifiable consequent}, not the ``strongest
2692 %% postcondition.''}
2693 %% \item
2694 %% A backward dataflow analysis is represented by a predicate transformer
2695 %% that is related to \emph{weakest preconditions} \cite{dijkstra:discipline}.
2696 %% \end{itemize}
2697 %% Logicians write down the predicate transformers for the primitive
2698 %% program fragments, and then use compositional rules to ``lift'' them
2699 %% to a logic for whole programs. In the same way \ourlib{} lets the client
2700 %% write simple predicate transformers,
2701 %% and local rewrites based on those assertions, and ``lifts'' them to entire
2702 %% function bodies with arbitrary control flow.
2703
2704 Our main goal for \hoopl\ was to combine three good ideas (interleaved
2705 analysis and transformation, optimization fuel, and an applicative
2706 control-flow graph) in a way that could easily be reused by many, many
2707 compiler writers.
2708 Reuse requires abstraction, and as is well known,
2709 designing good abstractions is challenging.
2710 \hoopl's data types and the functions over those types have been
2711 through \emph{dozens} of revisions.
2712 As~we were refining our design, we~found it invaluable to operate in
2713 two modes:
2714 In the first mode, we designed, built, and used a framework as an
2715 important component of a real compiler (first Quick~{\PAL}, then GHC).
2716 In the second mode, we designed and built a standalone library, then
2717 redesigned and rebuilt it, sometimes going through several significant
2718 changes in a week.
2719 Operating in the first mode---inside a live compiler---forced us to
2720 make sure that no corners were cut, that we were solving a real
2721 problem, and that we did not inadvertently
2722 cripple some other part of the compiler.
2723 Operating in the second mode---as a standalone library---enabled us to
2724 iterate furiously, trying out many more ideas than would have
2725 been possible in the first mode.
2726 We~have learned that alternating between these two modes leads to a
2727 better design than operating in either mode alone.
2728
2729 We were forcibly reminded of timeless truths:
2730 that interfaces are more important than implementations, and that data
2731 is more important than code.
2732 These truths are reflected in this paper, in which
2733 we
2734 have given \hoopl's API three times as much space as \hoopl's implementation.
2735
2736 We were also reminded that Haskell's type system (polymorphism, GADTs,
2737 higher-order functions, type classes, and so on) is a remarkably
2738 effective
2739 language for thinking about data and code---and that
2740 Haskell lacks a language of interfaces (like ML's signatures) that
2741 would make it equally effective for thinking about APIs at a larger scale.
2742 Still, as usual, the types were a remarkable aid to writing the code:
2743 when we finally agreed on the types presented above, the
2744 code almost wrote itself.
2745
2746 Types are widely appreciated at ICFP, but here are three specific
2747 examples of how types helped us:
2748 \begin{itemize}
2749 \item
2750 Reuse is enabled by representation-independence, which in a functional
2751 language is
2752 expressed through parametric polymorphism.
2753 Making \ourlib{} polymorphic in the nodes
2754 made the code simpler, easier to understand, and easier to maintain.
2755 In particular, it forced us to make explicit \emph{exactly} what
2756 \ourlib\ must know about nodes, and to embody that knowledge in the
@NonLocal@ type class (\secref{edges}; recapped just after this list).
2758 \item
2759 We are proud of using GADTs to
2760 track the open and closed shapes of nodes, blocks, and graphs at
2761 compile time.
2762 Shapes may
2763 seem like a small refinement, but they helped tremendously when
2764 building \hoopl, and we expect them to help clients.
2765 %
2766 % this paper is just not about run-time performance ---NR
2767 %
2768 %%%% Moreover, the implementation is faster than it would otherwise be,
2769 %%%% because, say, a @(Fact O f)e@ is known to be just an @f@ rather than
2770 %%%% being a sum type that must be tested (with a statically known outcome!).
2771 %
2772 Giving the \emph{same} shapes
2773 to nodes, blocks, and graphs helped our
2774 thinking and helped to structure the implementation.
2775 \item
2776 In our earlier designs, graphs were parameterized over \emph{three} node
2777 types: first, middle, and last nodes.
2778 Those designs therefore required
2779 three transfer functions, three rewrite functions, and so~on.
2780 Moving to a single, ``shapely'' node type was a major breakthrough:
2781 not only do we have just one node type, but our client need supply only
2782 one transfer function and one rewrite function.
2783 To~make this design work, however, we \emph{must} have
2784 the type-level function for
2785 @Fact@ (\figref{api-types}), to express how incoming
2786 and outgoing facts depend on the shape of a node.
2787 \end{itemize}
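
For reference, the two pieces just mentioned look like this,
recapping \secref{edges} and \figref{api-types}:
\begin{smallcode}
class NonLocal n where
  entryLabel :: n C x -> Label    -- label of a node closed on entry
  successors :: n e C -> [Label]  -- successors of a node closed on exit

type family   Fact x f :: *
type instance Fact O f = f           -- a single fact flows out
type instance Fact C f = FactBase f  -- facts indexed by successor label
\end{smallcode}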
2788
2789 Dataflow optimization is usually described as a way to improve imperative
2790 programs by mutating control-flow graphs.
2791 Such transformations appear very different from the tree rewriting
2792 that functional languages are so well known for, and that makes
2793 functional languages so attractive for writing other parts of compilers.
2794 But even though dataflow optimization looks very different from
2795 what we are used to,
2796 writing a dataflow optimizer
2797 in a pure functional language was a huge win.
2798 % We could not possibly have conceived \ourlib{} in C++.
2799 In~a pure functional language, not only do we know that
2800 no data structure will be unexpectedly mutated,
2801 but we are forced to be
2802 explicit about every input and output,
2803 and we are encouraged to implement things compositionally.
2804 This kind of thinking has helped us make
2805 significant improvements to the already tricky work of Lerner, Grove,
2806 and Chambers:
2807 per-function control of shallow vs deep rewriting
2808 (\secref{shallow-vs-deep}),
2809 combinators for dataflow passes (\secref{combinators}),
2810 optimization fuel (\secref{fuel}),
2811 and transparent management of unreachable blocks (\secref{arf-body}).
2812 We~trust that these improvements are right only because they are
2813 implemented in separate
2814 parts of the code that cannot interact except through
2815 explicit function calls.
2816 %%
2817 %%An ancestor of \ourlib{} is in the Glasgow Haskell Compiler today,
2818 %%in version~6.12.
2819 With this new, improved design in hand, we are now moving back to
2820 live-compiler mode, pushing \hoopl\ into version
2821 6.13 of the Glasgow Haskell Compiler.
2822
2823
2824 \acks
2825
2826 The first and second authors were funded
2827 by a grant from Intel Corporation and
2828 by NSF awards CCF-0838899 and CCF-0311482.
2829 These authors also thank Microsoft Research Ltd, UK, for funding
2830 extended visits to the third author.
2831
2832
2833 \makeatother
2834
2835 \providecommand\includeftpref{\relax} %% total bafflement -- workaround
2836 \IfFileExists{nrbib.tex}{\bibliography{cs,ramsey}}{\bibliography{dfopt}}
2837 \bibliographystyle{plainnatx}
2838
2839
2840 \clearpage
2841
2842 \appendix
2843
2844 % omit LabelSet :: *
2845 % omit LabelMap :: *
2846 % omit delFromFactBase :: FactBase f -> [(Label,f)] -> FactBase f
2847 % omit elemFactBase :: Label -> FactBase f -> Bool
2848 % omit elemLabelSet :: Label -> LabelSet -> Bool
2849 % omit emptyLabelSet :: LabelSet
2850 % omit factBaseLabels :: FactBase f -> [Label]
2851 % omit extendFactBase :: FactBase f -> Label -> f -> FactBase f
2852 % omit extendLabelSet :: LabelSet -> Label -> LabelSet
2853 % omit getFuel :: FuelMonad Fuel
2854 % omit setFuel :: Fuel -> FuelMonad ()
2855 % omit lookupFact :: FactBase f -> Label -> Maybe f
2856 % omit factBaseList :: FactBase f -> [(Label, f)]
2857
2858 \section{Code for \textmd{\texttt{fixpoint}}}
2859 \label{app:fixpoint}
2860
2861 {\def\baselinestretch{0.95}\hfuzz=20pt
2862 \begin{smallcode}
2863 data `TxFactBase n f
2864 = `TxFB { `tfb_fbase :: FactBase f
2865 , `tfb_rg :: RG n f C C -- Transformed blocks
2866 , `tfb_cha :: ChangeFlag
2867 , `tfb_lbls :: LabelSet }
2868 -- Set the tfb_cha flag iff
-- (a) the fact in tfb_fbase for a block L changes, and
-- (b) L is in tfb_lbls.
2871 -- The tfb_lbls are all Labels of the *original*
2872 -- (not transformed) blocks
2873
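-- updateFact lat lbls (lbl, new) joins fact 'new' into the fact
-- stored for lbl, setting the flag if the fact for one of the
-- original labels changes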
2874 `updateFact :: DataflowLattice f -> LabelSet -> (Label, f)
2875 -> (ChangeFlag, FactBase f)
2876 -> (ChangeFlag, FactBase f)
2877 updateFact ^lat ^lbls (lbl, ^new_fact) (^cha, fbase)
2878 | NoChange <- ^cha2 = (cha, fbase)
2879 | lbl `elemLabelSet` lbls = (SomeChange, new_fbase)
2880 | otherwise = (cha, new_fbase)
2881 where
2882 (cha2, ^res_fact)
2883 = case lookupFact fbase lbl of
2884 Nothing -> (SomeChange, new_fact)
2885 Just ^old_fact -> fact_extend lat old_fact new_fact
2886 ^new_fbase = extendFactBase fbase lbl res_fact
2887
2888 fixpoint :: forall n f. NonLocal n
2889 => Bool -- Going forwards?
2890 -> DataflowLattice f
2891 -> (Block n C C -> FactBase f
2892 -> FuelMonad (RG n f C C, FactBase f))
2893 -> FactBase f -> [(Label, Block n C C)]
2894 -> FuelMonad (RG n f C C, FactBase f)
2895 fixpoint ^is_fwd lat ^do_block ^init_fbase ^blocks
2896 = do { ^fuel <- getFuel
2897 ; ^tx_fb <- loop fuel init_fbase
2898 ; return (tfb_rg tx_fb,
2899 tfb_fbase tx_fb `delFromFactBase` blocks) }
2900 -- The outgoing FactBase contains facts only for
2901 -- Labels *not* in the blocks of the graph
2902 where
2903 `tx_blocks :: [(Label, Block n C C)]
2904 -> TxFactBase n f -> FuelMonad (TxFactBase n f)
2905 tx_blocks [] tx_fb = return tx_fb
2906 tx_blocks ((lbl,blk):bs) tx_fb = tx_block lbl blk tx_fb
2907 >>= tx_blocks bs
2908
2909 `tx_block :: Label -> Block n C C
2910 -> TxFactBase n f -> FuelMonad (TxFactBase n f)
2911 tx_block ^lbl ^blk tx_fb@(TxFB { tfb_fbase = fbase
2912 , tfb_lbls = lbls
2913 , tfb_rg = ^blks
2914 , tfb_cha = cha })
2915 | is_fwd && not (lbl `elemFactBase` fbase)
2916 = return tx_fb -- Note [Unreachable blocks]
2917 | otherwise
2918 = do { (rg, ^out_facts) <- do_block blk fbase
2919 ; let (^cha', ^fbase')
2920 = foldr (updateFact lat lbls) (cha,fbase)
2921 (factBaseList out_facts)
2922 ; return (TxFB { tfb_lbls = extendLabelSet lbls lbl
2923 , tfb_rg = rg `RGCatC` blks
2924 , tfb_fbase = fbase'
2925 , tfb_cha = cha' }) }
2926
2927 loop :: Fuel -> FactBase f -> FuelMonad (TxFactBase n f)
2928 `loop fuel fbase
2929 = do { let ^init_tx_fb = TxFB { tfb_fbase = fbase
2930 , tfb_cha = NoChange
2931 , tfb_rg = RGNil
2932 , tfb_lbls = emptyLabelSet}
2933 ; tx_fb <- tx_blocks blocks init_tx_fb
2934 ; case tfb_cha tx_fb of
2935 NoChange -> return tx_fb
2936 SomeChange -> setFuel fuel >>
2937 loop fuel (tfb_fbase tx_fb) }
2938 \end{smallcode}
2939 \par
2940 } % end \baselinestretch
2941
2942
2943 \section{Index of defined identifiers}
2944
2945 This appendix lists every nontrivial identifier used in the body of
2946 the paper.
2947 For each identifier, we list the page on which that identifier is
2948 defined or discussed---or when appropriate, the figure (with line
2949 number where possible).
2950 For those few identifiers not defined or discussed in text, we give
2951 the type signature and the page on which the identifier is first
2952 referred to.
2953
2954 Some identifiers used in the text are defined in the Haskell Prelude;
2955 for those readers less familiar with Haskell, these identifiers are
2956 listed in Appendix~\ref{sec:prelude}.
2957
2958 \newcommand\dropit[3][]{}
2959
2960 \newcommand\hsprelude[2]{\noindent
2961 \texttt{#1} defined in the Haskell Prelude\\}
2962 \let\hsprelude\dropit
2963
2964 \newcommand\hspagedef[3][]{\noindent
2965 \texttt{#2} defined on page~\pageref{#3}.\\}
2966 \newcommand\omithspagedef[3][]{\noindent
2967 \texttt{#2} not shown (but see page~\pageref{#3}).\\}
2968 \newcommand\omithsfigdef[3][]{\noindent
2969 \texttt{#2} not shown (but see Figure~\ref{#3} on page~\pageref{#3}).\\}
2970 \newcommand\hsfigdef[3][]{%
2971 \noindent
2972 \ifx!#1!%
2973 \texttt{#2} defined in Figure~\ref{#3} on page~\pageref{#3}.\\
2974 \else
2975 \texttt{#2} defined on \lineref{#1} of Figure~\ref{#3} on page~\pageref{#3}.\\
2976 \fi
2977 }
2978 \newcommand\hstabdef[3][]{%
2979 \noindent
2980 \ifx!#1!
2981 \texttt{#2} defined in Table~\ref{#3} on page~\pageref{#3}.\\
2982 \else
2983 \texttt{#2} defined on \lineref{#1} of Table~\ref{#3} on page~\pageref{#3}.\\
2984 \fi
2985 }
2986 \newcommand\hspagedefll[3][]{\noindent
2987 \texttt{#2} {let}- or $\lambda$-bound on page~\pageref{#3}.\\}
2988 \newcommand\hsfigdefll[3][]{%
2989 \noindent
2990 \ifx!#1!%
2991 \texttt{#2} {let}- or $\lambda$-bound in Figure~\ref{#3} on page~\pageref{#3}.\\
2992 \else
2993 \texttt{#2} {let}- or $\lambda$-bound on \lineref{#1} of Figure~\ref{#3} on page~\pageref{#3}.\\
2994 \fi
2995 }
2996
2997 \newcommand\nothspagedef[3][]{\notdefd\ndpage{#1}{#2}{#3}}
2998 \newcommand\nothsfigdef[3][]{\notdefd\ndfig{#1}{#2}{#3}}
2999 \newcommand\nothslinedef[3][]{\notdefd\ndline{#1}{#2}{#3}}
3000
3001 \newcommand\ndpage[3]{\texttt{#2}~(p\pageref{#3})}
3002 \newcommand\ndfig[3]{\texttt{#2}~(Fig~\ref{#3},~p\pageref{#3})}
3003 \newcommand\ndline[3]{%
3004 \ifx!#1!%
3005 \ndfig{#1}{#2}{#3}%
3006 \else
3007 \texttt{#2}~(Fig~\ref{#3}, line~\lineref{#1}, p\pageref{#3})%
3008 \fi
3009 }
3010
3011 \newif\ifundefinedsection\undefinedsectionfalse
3012
3013 \newcommand\notdefd[4]{%
3014 \ifundefinedsection
3015 , #1{#2}{#3}{#4}%
3016 \else
3017 \undefinedsectiontrue
3018 \par
3019 \section{Undefined identifiers}
3020 #1{#2}{#3}{#4}%
3021 \fi
3022 }
3023
3024 \begingroup
3025 \raggedright
3026
3027 \input{defuse}%
3028 \ifundefinedsection.\fi
3029
3030 \undefinedsectionfalse
3031
3032
3033 \renewcommand\hsprelude[2]{\noindent
3034 \ifundefinedsection
3035 , \texttt{#1}%
3036 \else
3037 \undefinedsectiontrue
3038 \par
3039 \section{Identifiers defined in Haskell Prelude or a standard library}\label{sec:prelude}
3040 \texttt{#1}%
3041 \fi
3042 }
3043 \let\hspagedef\dropit
3044 \let\omithspagedef\dropit
3045 \let\omithsfigdef\dropit
3046 \let\hsfigdef\dropit
3047 \let\hstabdef\dropit
3048 \let\hspagedefll\dropit
3049 \let\hsfigdefll\dropit
3050 \let\nothspagedef\dropit
3051 \let\nothsfigdef\dropit
3052 \let\nothslinedef\dropit
3053
3054 \input{defuse}
3055 \ifundefinedsection.\fi
3056
3057
3058
3059 \endgroup
3060
3061
3062 \iffalse
3063
3064 \section{Dataflow-engine functions}
3065
3066
3067 \begin{figure*}
3068 \setcounter{codeline}{0}
3069 \begin{numberedcode}
3070 \end{numberedcode}
3071 \caption{The forward iterator}
3072 \end{figure*}
3073
3074 \begin{figure*}
3075 \setcounter{codeline}{0}
3076 \begin{numberedcode}
3077 \end{numberedcode}
3078 \caption{The forward actualizer}
3079 \end{figure*}
3080
3081
3082 \fi
3083
3084
3085
3086 \end{document}
3087
3088
3089
3090
3091 THE FUEL PROBLEM:
3092
3093
3094 Here is the problem:
3095
3096 A graph has an entry sequence, a body, and an exit sequence.
3097 Correctly computing facts on and flowing out of the body requires
iteration; computation on the entry and exit sequences does not, since
3099 each is connected to the body by exactly one flow edge.
3100
3101 The problem is to provide the correct fuel supply to the combined
3102 analysis/rewrite (iterator) functions, so that speculative rewriting
3103 is limited by the fuel supply.
3104
3105 I will number iterations from 1 and name the fuel supplies as
3106 follows:
3107
3108 f_pre fuel remaining before analysis/rewriting starts
3109 f_0 fuel remaining after analysis/rewriting of the entry sequence
3110 f_i, i>0 fuel remaining after iteration i of the body
3111 f_post fuel remaining after analysis/rewriting of the exit sequence
3112
3113 The issue here is that only the last iteration of the body 'counts'.
3114 To formalize, I will name fuel consumed:
3115
3116 C_pre fuel consumed by speculative rewrites in entry sequence
3117 C_i fuel consumed by speculative rewrites in iteration i of body
3118 C_post fuel consumed by speculative rewrites in exit sequence
3119
3120 These quantities should be related as follows:
3121
f_0    = f_pre - C_pre
3123 f_i = f_0 - C_i where i > 0
3124 f_post = f_n - C_post where iteration converges after n steps
3125
3126 When the fuel supply is passed explicitly as parameter and result, it
3127 is fairly easy to see how to keep reusing f_0 at every iteration, then
3128 extract f_n for use before the exit sequence. It is not obvious to me
3129 how to do it cleanly using the fuel monad.
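
A sketch of the explicit-threading version (all names invented),
mirroring the loop in the fixpoint code, where each iteration
restarts from f_0 and only the facts carry over:

  loopBody f_0 fbase =
    case rewriteBody f_0 fbase of              -- consumes fuel from f_0
      (body', fbase', f_i, SomeChange) -> loopBody f_0 fbase'
      (body', fbase', f_i, NoChange)   -> (body', fbase', f_i)
                                         -- converged: f_i is f_n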
3130
3131
3132 Norman