1 \iffalse
2
3 Payload: alternatives for functions polymorphic in node type
4
5 (n C O -> a, n C O -> b, n C O -> c)
6
7 vs
8
9 (forall e x . n e x -> ShapeTag e x, forall e x . n e x -> result e x)
10
11 where
12
13 data ShapeTag e x where
14 First :: ShapeTag C O
15 Middle :: ShapeTag O O
16 Last :: ShapeTag O C
17
18 result C O = a
19 result O O = b
20 result O C = c
21
22 The function returning ShapeTag can be passed in a type-class
23 dictionary.
24
25
26
27
28 Say something about the proliferation of heavyweight type signatures
29 required for GADT pattern matches. When the signature is three times
30 the size of the function, something is wrong...
31
32
33 I'm going to leave this point out, because in all the client code
34 we've written, we are (a) matching only on a node, and (b) matching on
35 all cases. In this scenario the GADT exhaustiveness checker provides
36 no additional benefits. Indeed, GADTs can be an irritant to the
37 client: in many pattern matches, GADTs make it impossible to default
38 cases.
39
40
41 I was thinking again about unwrapped nodes, cons-lists, snoc-lists,
42 and tree fringe. I think there's an analogy with ordinary lists, and
43 the analogy is 'concatMap' vs 'fold'. Just as with lists, the common
44 case of 'concatMap (\a -> [a])' does a lot of unnecessary wrapping and
45 unwrapping of elements. We nevertheless prefer 'concatMap' because it
46 provides better separation of concerns. But this analogy suggests
47 several ideas:
48
49 - Can block processing be written in higher-order fashion?
50
51 - Can it be done in both styles (concatMap *and* fold)?
52
53 - Once we have framed the problem in these terms, can we write
54 fold-style code that is not too hard to understand and maintain?
55
56 - Finally, is there an analog of stream fusion that would work for
57 control-flow graphs, enabling us to write some graph combinators
58 that are both perspicuous and efficient?
59
60 These are good observations for the paper and for future work.
61
62
63 ----------------------------------------------------------------
64
65
66 P.S. The three of us should have a nice little Skype chat about
67 higher-rank types. A lot of the questions still at issue boil down to
68 the following observations:
69
70 - It's really convenient for the *implementation* of Hoopl to put
71 forall to the left of an arrow. Polymorphic functions as
72 arguments are powerful and keen, and they make it really easy for
73 Hoopl to do its job.
74
75 - It's a bit inconvenient for a *client* of Hoopl to be forced to
76 supply functions that are polymorphic in shape. All constructors
77 have to be written out by hand; defaulting is not possible. (Or
78 at least John and I haven't figured out how to do it.)
79
80 - Using higher-rank types in Hoopl's interface makes some very
81 desirable things downright impossible:
82
83 - One can't write a generic dominator analysis with type
84
85 dom :: NonLocal n => FwdPass n Dominators
86
87 - One can't write a generic debugging wrapper of type
88
89 debug :: (Show n, Show f)
90 => TraceFn -> WhatToTrace -> FwdPass n f -> FwdPass n f
91
92 - One can't write a combinator of type
93
94 product :: FwdPass n f -> FwdPass n f' -> FwdPass n (f, f')
95
96 I submit that these things are all very desirable.
97
98 I'm therefore strongly in favor of removing the *requirement* that a
99 FwdPass include transfer and rewrite functions that are polymorphic in
100 shape. It will be a bit tedious to return to triples of functions,
101 but for those lucky clients who *can* write polymorphic functions and
102 who wish to, we can provide injection functions.
103
104 I should stress that I believe the tedium can be contained within
105 reasonable bounds---for example, the arfXXX functions that are
106 internal to Hoopl all remain polymorphic in shape (although not
107 higher-rank any longer).
108
109 \fi
110
111
112 %
113 % TODO
114 %
115 %
116 % AGraph = graph under construction
117 % Graph = replacement graph
118 %
119 %%% Hoopl assumes that the replacement graph for a node open at the exit
120 %%% doesn't contain any additional exits
121 %%%
122 %%% introduce replacement graph in section 3, after graph.
123 %%% it has arbitrarily complicated internal control flow, but only
124 %%% one exit (if open at the exit)
125 %%%
126 %%% a requirement on the client that is not checked statically.
127
128
129 \input{dfoptdu.tex}
130
131 \newif\ifpagetuning \pagetuningtrue % adjust page breaks
132
133 \newif\ifnoauthornotes \noauthornotesfalse
134
135 \newif\iftimestamp\timestamptrue % show MD5 stamp of paper
136
137 % \timestampfalse % it's post-submission time
138
139 \IfFileExists{timestamp.tex}{}{\timestampfalse}
140
141 \newif\ifcutting \cuttingfalse % cutting down to submission size
142
143
144 \newif\ifgenkill\genkillfalse % have a section on gen and kill
145 \genkillfalse
146
147
148 \newif\ifnotesinmargin \notesinmarginfalse
149 \IfFileExists{notesinmargin.tex}{\notesinmargintrue}{\relax}
150
151 \documentclass[blockstyle,preprint,natbib,nocopyrightspace]{sigplanconf}
152
153 \newcommand\ourlib{Hoopl}
154 % higher-order optimization library
155 % ('Hoople' was taken -- see hoople.org)
156 \let\hoopl\ourlib
157
158 \newcommand\ag{\ensuremath{\mathit{ag}}}
159 \renewcommand\ag{\ensuremath{g}} % not really seeing that 'ag' is helpful here ---NR
160 \newcommand\rw{\ensuremath{\mathit{rw}}}
161
162 % l2h substitution ourlib Hoopl
163 % l2h substitution hoopl Hoopl
164
165 \newcommand\fs{\ensuremath{\mathit{fs}}} % dataflow facts, possibly plural
166
167 \newcommand\vfilbreak[1][\baselineskip]{%
168 \vskip 0pt plus #1 \penalty -200 \vskip 0pt plus -#1 }
169
170 \usepackage{alltt}
171 \usepackage{array}
172 \usepackage{afterpage}
173 \newcommand\lbr{\char`\{}
174 \newcommand\rbr{\char`\}}
175
176 \clubpenalty=10000
177 \widowpenalty=10000
178
179 \usepackage{verbatim} % allows to define \begin{smallcode}
180 \newenvironment{smallcode}{\par\unskip\small\verbatim}{\endverbatim}
181 \newenvironment{fuzzcode}[1]{\par\unskip\hfuzz=#1 \verbatim}{\endverbatim}
182 \newenvironment{smallfuzzcode}[1]{\par\unskip\small\hfuzz=#1 \verbatim}{\endverbatim}
183
184 \newcommand\smallverbatiminput[1]{{\small\verbatiminput{#1}}}
185
186 \newcommand\lineref[1]{line~\ref{line:#1}}
187 \newcommand\linepairref[2]{lines \ref{line:#1}~and~\ref{line:#2}}
188 \newcommand\linerangeref[2]{\mbox{lines~\ref{line:#1}--\ref{line:#2}}}
189 \newcommand\Lineref[1]{Line~\ref{line:#1}}
190 \newcommand\Linepairref[2]{Lines \ref{line:#1}~and~\ref{line:#2}}
191 \newcommand\Linerangeref[2]{\mbox{Lines~\ref{line:#1}--\ref{line:#2}}}
192
193 \makeatletter
194
195 \let\c@table=
196 \c@figure % one counter for tables and figures, please
197
198 \newcommand\setlabel[1]{%
199 \setlabel@#1!!\@endsetlabel
200 }
201 \def\setlabel@#1!#2!#3\@endsetlabel{%
202 \ifx*#1*% line begins with label or is empty
203 \ifx*#2*% line is empty
204 \verbatim@line{}%
205 \else
206 \@stripbangs#3\@endsetlabel%
207 \label{line:#2}%
208 \fi
209 \else
210 \@stripbangs#1!#2!#3\@endsetlabel%
211 \fi
212 }
213 \def\@stripbangs#1!!\@endsetlabel{%
214 \verbatim@line{#1}%
215 }
216
217
218 \verbatim@line{hello mama}
219
220 \newcommand{\numberedcodebackspace}{0.5\baselineskip}
221
222 \newcounter{codeline}
223 \newenvironment{numberedcode}
224 {\endgraf
225 \def\verbatim@processline{%
226 \noindent
227 \expandafter\ifx\expandafter+\the\verbatim@line+ % blank line
228 %{\small\textit{\def\rmdefault{cmr}\rmfamily\phantom{00}\phantom{: \,}}}%
229 \else
230 \refstepcounter{codeline}%
231 {\small\textit{\def\rmdefault{cmr}\rmfamily\phantom{00}\llap{\arabic{codeline}}: \,}}%
232 \fi
233 \expandafter\setlabel\expandafter{\the\verbatim@line}%
234 \expandafter\ifx\expandafter+\the\verbatim@line+ % blank line
235 \vspace*{-\numberedcodebackspace}\par%
236 \else
237 \the\verbatim@line\par
238 \fi}%
239 \verbatim
240 }
241 {\endverbatim}
242
243 \makeatother
244
245 \newcommand\arrow{\rightarrow}
246
247 \newcommand\join{\sqcup}
248 \newcommand\slotof[1]{\ensuremath{s_{#1}}}
249 \newcommand\tempof[1]{\ensuremath{t_{#1}}}
250 \let\tempOf=\tempof
251 \let\slotOf=\slotof
252
253 \makeatletter
254 \newcommand{\nrmono}[1]{%
255 {\@tempdima = \fontdimen2\font\relax
256 \texttt{\spaceskip = 1.1\@tempdima #1}}}
257 \makeatother
258
259 \usepackage{times} % denser fonts
260 \renewcommand{\ttdefault}{aett} % \texttt that goes better with times fonts
261 \usepackage{enumerate}
262 \usepackage{url}
263 \usepackage{graphicx}
264 \usepackage{natbib} % redundant for Simon
265 \bibpunct();A{},
266 \let\cite\citep
267 \let\citeyearnopar=\citeyear
268 \let\citeyear=\citeyearpar
269
270 \usepackage[ps2pdf,bookmarksopen,breaklinks,pdftitle=dataflow-made-simple]{hyperref}
271 \usepackage{breakurl} % enables \burl
272
273 \newcommand\naive{na\"\i ve}
274 \newcommand\naively{na\"\i vely}
275 \newcommand\Naive{Na\"\i ve}
276
277 \usepackage{amsfonts}
278 \newcommand\naturals{\ensuremath{\mathbb{N}}}
279 \newcommand\true{\ensuremath{\mathbf{true}}}
280 \newcommand\implies{\supseteq} % could use \Rightarrow?
281
282 \newcommand\PAL{\mbox{C{\texttt{-{}-}}}}
283 \newcommand\high[1]{\mbox{\fboxsep=1pt \smash{\fbox{\vrule height 6pt
284 depth 0pt width 0pt \leavevmode \kern 1pt #1}}}}
285
286 \usepackage{tabularx}
287
288 %%
289 %% 2009/05/10: removed 'float' package because it breaks multiple
290 %% \caption's per {figure} environment. ---NR
291 %%
292 %% % Put figures in boxes --- WHY??? --NR
293 %% \usepackage{float}
294 %% \floatstyle{boxed}
295 %% \restylefloat{figure}
296 %% \restylefloat{table}
297
298
299
300 % ON LINE THREE, set \noauthornotestrue to suppress notes (or not)
301
302 %\newcommand{\qed}{QED}
303 \ifnotesinmargin
304 \long\def\authornote#1{%
305 \ifvmode
306 \marginpar{\raggedright\hbadness=10000
307 \parindent=8pt \parskip=2pt
308 \def\baselinestretch{0.8}\tiny
309 \itshape\noindent #1\par}%
310 \else
311 \unskip\raisebox{-3.5pt}{\rlap{$\scriptstyle\diamond$}}%
312 \marginpar{\raggedright\hbadness=10000
313 \parindent=8pt \parskip=2pt
314 \def\baselinestretch{0.8}\tiny
315 \itshape\noindent #1\par}%
316 \fi}
317 \else
318 % Simon: please set \notesinmarginfalse on the first line
319 \long\def\authornote#1{{\em #1\/}}
320 \fi
321 \ifnoauthornotes
322 \def\authornote#1{\unskip\relax}
323 \fi
324
325 \newcommand{\simon}[1]{\authornote{SLPJ: #1}}
326 \newcommand{\norman}[1]{\authornote{NR: #1}}
327 \let\remark\norman
328 \def\finalremark#1{\relax}
329 % \let \finalremark \remark % uncomment after submission
330 \newcommand{\john}[1]{\authornote{JD: #1}}
331 \newcommand{\todo}[1]{\textbf{To~do:} \emph{#1}}
332 \newcommand\delendum[1]{\relax\ifvmode\else\unskip\fi\relax}
333
334 \newcommand\secref[1]{Section~\ref{sec:#1}}
335 \newcommand\secreftwo[2]{Sections \ref{sec:#1}~and~\ref{sec:#2}}
336 \newcommand\seclabel[1]{\label{sec:#1}}
337
338 \newcommand\figref[1]{Figure~\ref{fig:#1}}
339 \newcommand\figreftwo[2]{Figures \ref{fig:#1}~and~\ref{fig:#2}}
340 \newcommand\figlabel[1]{\label{fig:#1}}
341
342 \newcommand\tabref[1]{Table~\ref{tab:#1}}
343 \newcommand\tablabel[1]{\label{tab:#1}}
344
345
346 \newcommand{\CPS}{\textbf{StkMan}} % Not sure what to call it.
347
348
349 \usepackage{code} % At-sign notation
350
351 \iftimestamp
352 \input{timestamp}
353 \preprintfooter{\mdfivestamp}
354 \fi
355
356 \hyphenation{there-by}
357
358 \renewcommand{\floatpagefraction}{0.9} % must be less than \topfraction
359 \renewcommand{\topfraction}{0.95}
360 \renewcommand{\textfraction}{0.05}
361
362 \begin{document}
363 %\title{\ourlib: Dataflow Optimization Made Simple}
364 \title{\ourlib: A Modular, Reusable Library for\\ Dataflow Analysis and Transformation}
365 %\title{Implementing Dataflow Analysis and Optimization by Lifting Node Functions to Basic Blocks and Control-Flow Graphs}
366 %\subtitle{Programming pearl}
367
368 %% \titlebanner{\textsf{\mdseries\itshape
369 %% Under consideration for publication in the ACM Haskell Symposium.
370 %% Comments
371 %% are welcome; please identify this version by the words
372 %% \textbf{\mdfivewords}.
373 %% }}
374
375 \ifnoauthornotes
376 \makeatletter
377 \let\HyPsd@Warning=
378 \@gobble
379 \makeatother
380 \fi
381
382 % João
383
384
385 \authorinfo{Norman Ramsey}{Tufts University}{nr@cs.tufts.edu}
386 \authorinfo{Jo\~ao Dias}{Tufts University}{dias@cs.tufts.edu}
387 \authorinfo{Simon Peyton Jones}{Microsoft Research}{simonpj@microsoft.com}
388
389
390 \maketitle
391
392 \begin{abstract}
393 \iffalse % A vote for Simon's abstract
394 \remark{I have replaced a good abstract of the POPL submission with a
395 bad abstract of \emph{this} submission.}
396 We present \ourlib, a Haskell library that makes it easy for a
397 compiler writer
398 to implement program transformations based on dataflow analyses.
399 A~client of \ourlib\ defines a representation of
400 logical assertions,
401 a transfer function that computes outgoing assertions from incoming
402 assertions,
403 and a rewrite function that improves code when improvements are
404 justified by the assertions.
405 \ourlib\ does the actual analysis and transformation.
406
407 \ourlib\ implements state-of-the art algorithms:
408 Lerner, Grove, and Chambers's
409 \citeyearpar{lerner-grove-chambers:2002}
410 composition of simple analyses and transformations, which achieves
411 the same precision as complex, handwritten
412 ``super-analyses;''
413 and Whalley's \citeyearpar{whalley:isolation} dynamic technique for
414 isolating bugs in a client's code.
415 \ourlib's implementation is unique in that unlike previous
416 implementations,
417 it carefully separates the tricky
418 elements of each of these algorithms, so that they can be examined and
419 understood independently.
420
421
422 \simon{Here is an alternative abstract based on the four-sentence model.}
423 \remark{Four-sentence model? You must teach me\ldots}
424 \fi
425 Dataflow analysis and transformation of control-flow graphs is
426 pervasive in optimizing compilers, but it is typically tightly
427 interwoven with the details of a \emph{particular} compiler.
428 We~describe \ourlib{}, a reusable Haskell library that makes it
429 unusually easy to define new analyses and
430 transformations for \emph{any} compiler.
431 \ourlib's interface is modular and polymorphic,
432 and it offers unusually strong static guarantees.
433 The implementation
434 is also far from routine: it encapsulates
435 state-of-the-art algorithms (interleaved analysis and rewriting,
436 dynamic error isolation), and it cleanly separates their tricky elements
437 so that they can be understood independently.
438 %
439 %\ourlib\ will be the workhorse of a new
440 %back end for the Glasgow Haskell Compiler (version~6.14, forthcoming).
441
442 %% \emph{Reviewers:} code examples are indexed at {\small\url{http://bit.ly/jkr3K}}
443 %%% Source: http://www.cs.tufts.edu/~nr/drop/popl-index.pdf
444 \end{abstract}
445
446 \makeatactive % Enable @foo@ notation
447
448 \section{Introduction}
449
450 A mature optimizing compiler for an imperative language includes many
451 analyses, the results of which justify the optimizer's
452 code-improving transformations.
453 Many of the most important analyses and transformations---constant
454 propagation, live-variable analysis, inlining, sinking of loads,
455 and so on---should be regarded as particular cases of
456 a single general problem: \emph{dataflow analysis and optimization}.
457 %% \remark{I do not feel compelled to cite Muchnick (or anyone else) here}
458 Dataflow analysis is over thirty years old,
459 but a recent, seminal paper by \citet{lerner-grove-chambers:2002} goes further,
460 describing a powerful but subtle way to
461 \emph{interleave} analysis and transformation so that each
462 piggybacks on the other.
463
464 Because optimizations based on dataflow analysis
465 share a common intellectual framework, and because that framework is
466 subtle, it is tempting to
467 try to build a single reusable library that embodies the
468 subtle ideas, while
469 making it easy for clients to instantiate the library for different
470 situations.
471 Tempting, but difficult.
472 Although some such frameworks exist, as we discuss
473 in \secref{related}, they have complex APIs and implementations,
474 and none implements the Lerner/Grove/Chambers technique.
475
476 In this paper we present \ourlib{} (short for ``higher-order
477 optimization library''), a new Haskell library for dataflow analysis and
478 optimization. It has the following distinctive characteristics:
479
480 \begin{itemize}
481 \item
482 \ourlib\ is purely functional.
483 Perhaps surprisingly, code that
484 manipulates control-flow graphs is easier to write, and far easier
485 to write correctly, when written in a purely functional style
486 \cite{ramsey-dias:applicative-flow-graph}.
487 When analysis and rewriting
488 are interleaved, so that rewriting must be done \emph{speculatively},
489 without knowing whether
490 the result of the rewrite will be retained or discarded,
491 the benefit of a purely functional style is intensified
492 (Sections \ref{sec:overview} and \ref{sec:fixpoints}).
493
494 \item
495 \ourlib\ is polymorphic. Just as a list library is
496 polymorphic in the list elements, so is \ourlib{} polymorphic, both in
497 the nodes that inhabit graphs, and in the dataflow facts that
498 analyses compute over these graphs (\secref{using-hoopl}).
499
500 \item The paper by Lerner, Grove, and Chambers is inspiring but abstract.
501 We articulate their ideas in a concrete but simple API that hides
502 a subtle implementation (Sections \ref{sec:graph-rep} and \ref{sec:using-hoopl}).
503 You provide a representation for assertions,
504 a transfer function that transforms assertions across a node,
505 and a rewrite function that uses an assertion to
506 justify rewriting a node.
507 \ourlib\ ``lifts'' these node-level functions to work over
508 control-flow graphs, sets up and solves recursion equations,
509 and interleaves rewriting with analysis.
510 Designing good abstractions (data types, APIs) is surprisingly
511 hard; we have been through over a dozen significantly different
512 iterations, and we offer our API as a contribution.
513 % ``in its own right'' -- there's an echo in here...
514
515 \item
516 Because the client
517 can perform very local reasoning (``@y@ is live before
518 @x:=y+2@''),%
519 \footnote
520 {Using \hoopl, it is not necessary to have the more complex rule
521 ``if @x@~is live after @x:=y+2@ then @y@ is live before~it,''
522 because if @x@~is \emph{not} live after @x:=y+2@, the assignment
523 @x:=y+2@ will be eliminated.}
524 analyses and transformations built on \ourlib\
525 are small, simple, and easy to get right.
526 Moreover, \ourlib\ helps you write correct optimizations:
527 it~statically rules out transformations that violate invariants
528 of the control-flow graph (Sections \ref{sec:graph-rep} and \ref{sec:rewrites}),
529 and dynamically it can help find the first transformation that introduces a fault
530 in a test program (\secref{fuel}).
531 \finalremark{SLPJ: I wanted to write more about open/closed,
532 but I like this sentence with its claim to both static and dynamic assistance,
533 and maybe the open/closed story is hard to understand here.}
534
535 % \item \ourlib{} makes use of GADTS and type functions to offer unusually
536 % strong static guarantees. In particular, nodes, basic blocks, and
537 % graphs are all statically typed by their open or closedness on entry, and
538 % their open or closedness on exit (\secref{graph-rep}). For example, an add instruction is
539 % open on entry and exit, while a branch is open on entry and closed on exit.
540 % Using these types we can statically guarantee that, say, an add instruction
541 % is rewritten to a graph that is also open on both entry and exit; and
542 % that the user cannot construct a block where an add instruction follows an
543 % unconditional branch. We know of no other system that offers
544 % static guarantees this strong.
545
546 \item \ourlib{} implements subtle algorithms, including
547 (a)~interleaved analysis and rewriting, (b)~speculative rewriting,
548 (c)~computing fixed points, and (d)~dynamic fault isolation.
549 Previous implementations of these algorithms---including three of our
550 own---are complicated and hard to understand, because the tricky pieces
551 are implemented all together, inseparably.
552 A~significant contribution of this paper is
553 a new way to structure the implementation so that each tricky piece
554 is handled in just
555 one place, separate from all the others (\secref{engine}).
556 %\remark{This is a very important claim---is
557 %it substantiated in \secref{engine}? And shouldn't the word
558 %\textbf{contribution} appear in this~\P?}
559 % \simon{Better?} % yes thanks ---NR
560 The result is sufficiently elegant that
561 we emphasize the implementation as an object of interest in
562 its own right.
563 \end{itemize}
564 A working prototype of \ourlib{} is available from
565 \burl{http://ghc.cs.tufts.edu/hoopl} and also from Hackage.
566 It is no toy: an ancestor of this library is
567 part of the Glasgow Haskell Compiler, where it optimizes the
568 imperative {\PAL} code in GHC's back end. The new design is far
569 nicer, and it will be in GHC shortly.
570
571 The API for \ourlib{} seems quite natural, but it requires
572 relatively sophisticated aspects of Haskell's type system, such
573 as higher-rank polymorphism, GADTs, and type functions.
574 As such, \ourlib{} offers a compelling case study in the utility
575 of these features.
576
577
578 \section{Dataflow analysis {\&} transformation by \texorpdfstring{\rlap{example}}{example}}
579 \seclabel{overview}
580 \seclabel{constant-propagation}
581 \seclabel{example:transforms}
582 \seclabel{example:xforms}
583
584 We begin by setting the scene, introducing some vocabulary, and
585 showing a small motivating example.
586 A control-flow graph, perhaps representing the body of a procedure,
587 is a collection of \emph{basic blocks}---or just ``blocks''.
588 Each block is a sequence of instructions,
589 beginning with a label and ending with a
590 control-transfer instruction that branches to other blocks.
591 % Each block has a label at the beginning,
592 % a sequence of
593 % -- each of which has a label at the
594 % beginning. Each block may branch to other blocks with arbitrarily
595 % complex control flow.
596 The goal of dataflow optimization is to compute valid
597 \emph{assertions} (or \emph{dataflow facts}),
598 then use those assertions to justify code-improving
599 transformations (or \emph{rewrites}) on a \emph{control-flow graph}.
600
601 Consider a concrete example: constant propagation with constant folding.
602 On the left we have a basic block; in the middle we have
603 facts that hold between statements (or \emph{nodes})
604 in the block; and at
605 the right we have the result of transforming the block
606 based on the assertions:
607 \begin{verbatim}
608 Before            Facts            After
609       ------------{}-------------
610 x := 3+4                           x := 7
611       ----------{x=7}------------
612 z := x>5                           z := True
613       -------{x=7, z=True}-------
614 if z                               goto L1
615   then goto L1
616   else goto L2
617 \end{verbatim}
618 Constant propagation works
619 from top to bottom. We start with the empty fact.
620 Given the empty fact and the node @x:=3+4@ can we make a (constant-folding)
621 transformation?
622 Yes! We can replace the node with @x:=7@.
623 Now, given this transformed node,
624 and the original fact, what fact flows out of the bottom of
625 the transformed node?
626 The~fact \{@x=7@\}.
627 Given the fact \{@x=7@\} and the node @z:=x>5@, can we make a
628 transformation? Yes: constant propagation can replace the node with @z:=7>5@.
629 Now, can we do another transformation? Yes: constant folding can
630 replace the node with @z:=True@.
631 And so the process continues to the end of the block, where we
632 can replace the conditional branch with an unconditional one, @goto L1@.
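
This rewrite-then-transfer loop can be sketched in a few lines of
standalone Haskell. The @Expr@, @Stmt@, and @Fact@ types below are
hypothetical stand-ins for a client's own definitions, not \ourlib's
API; to keep the sketch small, booleans are encoded as the literals
0 and~1.

```haskell
import qualified Data.Map as M

-- Hypothetical client-side types; the library never sees these.
data Expr = Lit Int | Var String | Add Expr Expr | Gt Expr Expr
  deriving (Eq, Show)
data Stmt = Assign String Expr
  deriving (Eq, Show)

-- The dataflow fact: variables known to hold constants.
type Fact = M.Map String Int

-- Constant folding: simplify an expression under the current fact.
-- Booleans are encoded as the literals 0 and 1 for brevity.
fold :: Fact -> Expr -> Expr
fold f (Var x)   = maybe (Var x) Lit (M.lookup x f)
fold _ (Lit n)   = Lit n
fold f (Add a b) = case (fold f a, fold f b) of
                     (Lit m, Lit n) -> Lit (m + n)
                     (a', b')       -> Add a' b'
fold f (Gt a b)  = case (fold f a, fold f b) of
                     (Lit m, Lit n) -> Lit (if m > n then 1 else 0)
                     (a', b')       -> Gt a' b'

-- Rewrite first; the transfer function then only has to check
-- whether the rewritten right-hand side is a literal.
step :: Fact -> Stmt -> (Stmt, Fact)
step f (Assign x e) =
  case fold f e of
    Lit n -> (Assign x (Lit n), M.insert x n f)
    e'    -> (Assign x e',      M.delete x f)
```

Running @step@ over the block above rewrites @x:=3+4@ to @x:=7@ and
extends the fact to \{@x=7@\}, exactly as in the walkthrough.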
633
634 The example above is simple because the program has only straight-line code;
635 when programs have loops, dataflow analysis gets more complicated.
636 For example,
637 consider the following graph,
638 where we assume @L1@ is the entry point:
639 \begin{verbatim}
640 L1: x=3; y=4; if z then goto L2 else goto L3
641 L2: x=7; goto L3
642 L3: ...
643 \end{verbatim}
644 Because control flows to @L3@ from two places,
645 we must \emph{join} the facts coming from those two places.
646 All paths to @L3@ produce the fact @y=@$4$,
647 so we can conclude that this fact holds at @L3@.
648 But depending on the path to @L3@, @x@ may have different
649 values, so we conclude ``@x=@$\top$'',
650 meaning that there is no single value held by @x@ at @L3@.%
651 \footnote{
652 In this example @x@ really does vary at @L3@, but in general
653 the analysis might be conservative.}
654 The final result of joining the dataflow facts that flow to @L3@
655 is the new fact $@x=@\top \land @y=4@ \land @z=@\top$.
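
The join just described can be sketched as a pointwise operation on
finite maps. The @Const@ lattice and @Fact@ type below are
illustrative only, not \ourlib's actual types; absence from the map
means ``no information yet.''

```haskell
import qualified Data.Map as M

-- Illustrative lattice for constant propagation: a variable either
-- varies (Top) or holds a known constant.
data Const = Top | Known Int deriving (Eq, Show)
type Fact  = M.Map String Const

-- Join facts arriving from two predecessors: constants that agree
-- survive; constants that disagree go to Top.
joinFacts :: Fact -> Fact -> Fact
joinFacts = M.unionWith (\a b -> if a == b then a else Top)
```

On the example above, joining \{@x=3@, @y=4@\} from @L1@ with
\{@x=7@, @y=4@\} from @L2@ yields @x=@$\top$ and @y=4@ at @L3@.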
656
657 \seclabel{const-prop-example}
658
659 \seclabel{simple-tx}
660 \paragraph{Interleaved transformation and analysis.}
661 Our example \emph{interleaves} transformation and analysis.
662 Interleaving makes it far easier to write effective analyses.
663 If, instead, we \emph{first} analyzed the block
664 and \emph{then} transformed it, the analysis would have to ``predict''
665 the transformations.
666 For example, given the incoming fact \{@x=7@\}
667 and the instruction @z:=x>5@,
668 a pure analysis could produce the outgoing fact
669 \{@x=7@, @z=True@\} by simplifying @x>5@ to @True@.
670 But the subsequent transformation must perform
671 \emph{exactly the same simplification} when it transforms the instruction to @z:=True@!
672 If instead we \emph{first} rewrite the node to @z:=True@,
673 and \emph{then} apply the transfer function to the new node,
674 the transfer function becomes laughably simple: it merely has to see if the
675 right hand side is a constant (you can see actual code in \secref{const-prop-client}).
676 The gain is even more compelling if there are a number of interacting
677 analyses and/or transformations; for more substantial
678 examples, consult \citet{lerner-grove-chambers:2002}.
679
680 \paragraph{Forwards and backwards.}
681 Constant propagation works \emph{forwards}, and a fact is typically an
682 assertion about the program state (such as ``variable~@x@ holds value~@7@'').
683 Some useful analyses work \emph{backwards}.
684 A~prime example is live-variable analysis, where a fact takes the~form
685 ``variable @x@ is live'' and is an assertion about the
686 \emph{continuation} of a program point. For example, the fact ``@x@~is
687 live'' at a program point P is an assertion that @x@ is used on some program
688 path starting at \mbox{P}. % TeXbook, exercise 12.6
689 The accompanying transformation is called dead-code elimination;
690 if @x@~is not live, this transformation
691 replaces the node @x:=e@ with a no-op.
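
A backwards transfer/rewrite pair of this kind can be sketched just as
briefly. The @Stmt@ type below is again a hypothetical stand-in: each
assignment records the variables its right-hand side uses, and @Skip@
is the no-op that dead-code elimination leaves behind.

```haskell
import qualified Data.Set as S

-- Hypothetical node type: an assignment (with the variables its
-- right-hand side uses) or a no-op.
data Stmt = Assign String [String] | Skip deriving (Eq, Show)

-- The fact: the set of live variables.
type Live = S.Set String

-- Backwards transfer: the argument is the fact *after* the node
-- (about its continuation); the result is the fact before it.
transfer :: Stmt -> Live -> Live
transfer (Assign x uses) out = S.delete x out `S.union` S.fromList uses
transfer Skip            out = out

-- Dead-code elimination: replace the assignment with a no-op when
-- the assigned variable is not live afterwards.
rewrite :: Stmt -> Live -> Maybe Stmt
rewrite (Assign x _) out
  | not (x `S.member` out) = Just Skip
rewrite _ _ = Nothing
```

As in the forwards case, rewriting first keeps the transfer function
simple: once a dead assignment becomes @Skip@, its uses no longer
contribute to liveness.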
692
693 % ----------------------------------------
694 \section{Representing control-flow graphs} \seclabel{graph-rep}
695
696 \ourlib{} is a library that makes it easy to define dataflow analyses,
697 and transformations driven by these analyses, on control-flow graphs.
698 Graphs are composed from smaller units, which we discuss from the
699 bottom up:\finalremark{something about replacement graphs?}
700 \begin{itemize}
701 \item A \emph{node} is defined by \ourlib's client;
702 \ourlib{} knows nothing about the representation of nodes (\secref{nodes}).
703 \item A basic \emph{block} is a sequence of nodes (\secref{blocks}).
704 \item A \emph{graph} is an arbitrarily complicated control-flow graph,
705 composed from basic blocks (\secref{graphs}).
706 \end{itemize}
707
708 \subsection{Shapes: Open and closed}
709
710 Nodes, blocks, and graphs share important properties in common.
711 In particular, each can be \emph{open or closed at entry}
712 and \emph{open or closed at exit}.
713 An \emph{open} point is one at which control may implicitly ``fall through;''
714 to transfer control at a \emph{closed} point requires an explicit
715 control-transfer instruction to a named label.
716 For example,
717 \begin{itemize}
718 \item A shift-left instruction is open on entry (because control can fall into it
719 from the preceding instruction), and open on exit (because control falls through
720 to the next instruction).
721 \item An unconditional branch is open on entry, but closed on exit (because
722 control cannot fall through to the next instruction).
723 \item A label is closed on entry (because in \ourlib{} we do not allow
724 control to fall through into a branch target), but open on exit.
725 \end{itemize}
726 % This taxonomy enables \ourlib{} to enforce invariants:
727 % only nodes closed at entry can be the targets of branches, and only nodes closed
728 % at exits can transfer control (see also \secref{edges}).
729 % As~a consequence, all control transfers originate at control-transfer
730 % instructions and terminated at labels; this invariant dramatically
731 % simplifies analysis and transformation.
732 These examples concern nodes, but the same classification applies
733 to blocks and graphs. For example the block
734 \begin{code}
735 x:=7; y:=x+2; goto L
736 \end{code}
737 is open on entry and closed on exit.
738 This is the block's \emph{shape}, which we often abbreviate
739 ``open/closed;''
740 we may refer to an ``open/closed block.''
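
This shape discipline can be made concrete with types. The following
sketch is in the spirit of, but not identical to, the definitions
\ourlib{} itself uses: @O@ and @C@ are empty types used only as
type-level flags, and the concatenation constructor of the block type
demands a point that is open on both sides, so an instruction can
never follow an unconditional branch.

```haskell
{-# LANGUAGE GADTs #-}

-- Type-level shape flags (empty types; illustrative only).
data O  -- open
data C  -- closed

-- A hypothetical node type, indexed by shape at entry and exit.
data Node e x where
  Label  :: String -> Node C O
  Instr  :: String -> Node O O
  Branch :: String -> Node O C

-- Blocks: concatenation is possible only where the first block is
-- open at exit and the second is open at entry.
data Block e x where
  BNode :: Node e x -> Block e x
  BCat  :: Block e O -> Block O x -> Block e x

-- A closed/closed block like "L: x:=7; goto L2".
example :: Block C C
example = BCat (BNode (Label "L"))
               (BCat (BNode (Instr "x:=7")) (BNode (Branch "L2")))

-- Count the nodes in a block.
size :: Block e x -> Int
size (BNode _)  = 1
size (BCat a b) = size a + size b
```

Attempting @BCat (BNode (Branch "L2")) (BNode (Instr "x:=7"))@ is
rejected by the type checker, since @Branch@ is closed at exit.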
741
742 The shape of a thing determines that thing's control-flow properties.
743 In particular, whenever E is a node, block, or graph,
744 % : \simon{Removed the claim about a unique entry point.}
745 \begin{itemize}
746 \item
747 If E is open at the entry, it has a unique predecessor;
748 if it is closed, it may have arbitrarily many predecessors---or none.
749 \item
750 If E is open at the exit, it has a unique successor;
751 if it is closed, it may have arbitrarily many successors---or none.
752 \end{itemize}
753 %%%%
754 %%%%
755 %%%% % \item Regardless of whether E is open or closed,
756 %%%% % it has a unique entry point where execution begins.
757 %%%% \item If E is closed at exit, control leaves \emph{only}
758 %%%% by explicit branches from closed-on-exit nodes.
759 %%%% \item If E is open at exit, control \emph{may} leave E
760 %%%% by ``falling through'' from a distinguished exit point.
761 %%%% \remark{If E is a node or block, control \emph{only} leaves E by
762 %%%% falling through, but this isn't so for a graph. Example: a body of a
763 %%%% loop contains a \texttt{break} statement} \simon{I don't understand.
764 %%%% A break statement would have to be translated as a branch, no?
765 %%%% Can you give a an example? I claim that control only leaves an
766 %%%% open graph by falling through.}
767 %%%% \end{itemize}
768
769
\subsection{Nodes} \seclabel{nodes}

The primitive constituents of a \ourlib{} control-flow graph are
\emph{nodes}, which are defined by the client.
Typically, a node might represent a machine instruction, such as an
assignment, a call, or a conditional branch.
But \ourlib{}'s graph representation is polymorphic in the node type,
so each client can define nodes as it likes.
Because they contain nodes defined by the client,
graphs can include arbitrary client-specified data, including
(say) C~statements, method calls in an object-oriented language, or
whatever.


\begin{figure}
\begin{fuzzcode}{0.98pt}
data `Node e x where
  Label      :: Label ->                  Node C O
  `Assign     :: Var  -> Expr ->           Node O O
  `Store      :: Expr -> Expr ->           Node O O
  `Branch     :: Label ->                  Node O C
  `CondBranch :: Expr -> Label -> Label -> Node O C
  -- ... more constructors ...
\end{fuzzcode}
\caption{A typical node type as it might be defined by a client}
\figlabel{cmm-node}
\end{figure}

\ourlib{} knows \emph{at compile time} whether a node is open or
closed at entry and exit:
the type of a node has kind @*->*->*@, where the two type parameters
are type-level flags, one for entry and one for exit.
Such a type parameter may be instantiated only with type @O@~(for
open) or type~@C@ (for closed).
As an example,
\figref{cmm-node} shows a typical node type as it might be written by
one of \ourlib's {clients}.
The type parameters are written @e@ and @x@, for
entry and exit respectively.
The type is a generalized algebraic data type;
the syntax gives the type of each constructor.
%%% \cite{peyton-jones:unification-based-gadts}.
For example, constructor @Label@
takes a @Label@ and returns a node of type @Node C O@, where
the~``@C@'' says ``closed at entry'' and the~``@O@'' says ``open at exit''.
The types @Label@, @O@, and~@C@ are
defined by \ourlib{} (\figref{graph}).

Similarly, an @Assign@ node takes a variable and an expression, and
returns a @Node@ open at both entry and exit; the @Store@ node is
similar. The types @`Var@ and @`Expr@ are private to the client, and
\ourlib{} knows nothing of them.
Finally, the control-transfer nodes @Branch@ and @CondBranch@ are open at entry
and closed at exit.

Nodes closed on entry are the only targets of control transfers;
nodes open on entry and exit never perform control transfers;
and nodes closed on exit always perform control transfers\footnote{%
To obey these invariants,
a node for
a conditional-branch instruction, which typically either transfers control
\emph{or} falls through, must be represented as a two-target
conditional branch, with the fall-through path in a separate block.
This representation is standard \cite{appel:modern},
and it costs nothing in practice:
such code is easily sequentialized without superfluous branches.
%a late-stage code-layout pass can readily reconstruct efficient code.
}.
Because of the position each type of node occupies in a
basic block,
we~often call them \emph{first}, \emph{middle}, and \emph{last} nodes
respectively.

\subsection{Blocks} \seclabel{blocks}

\begin{figure}
\begin{fuzzcode}{0.98pt}
data `O        -- Open
data `C        -- Closed

data `Block n e x where
  `BFirst  :: n C O                      -> Block n C O
  `BMiddle :: n O O                      -> Block n O O
  `BLast   :: n O C                      -> Block n O C
  `BCat    :: Block n e O -> Block n O x -> Block n e x

data `Graph n e x where
  `GNil  :: Graph n O O
  `GUnit :: Block n O O -> Graph n O O
  `GMany :: MaybeO e (Block n O C)
         -> LabelMap (Block n C C)
         -> MaybeO x (Block n C O)
         -> Graph n e x

data `MaybeO ^ex t where
  `JustO    :: t -> MaybeO O t
  `NothingO ::      MaybeO C t

newtype `Label = Label Int

class `NonLocal n where
  `entryLabel :: n C x -> Label
  `successors :: n e C -> [Label]
\end{fuzzcode}
\caption{The block and graph types defined by \ourlib}
\figlabel{graph} \figlabel{edges}
\end{figure}

\ourlib\ combines the client's nodes into
blocks and graphs, which, unlike the nodes, are defined by \ourlib{}
(\figref{graph}).
A~@Block@ is parameterized over the node type~@n@
as well as over the same flag types that make it open or closed at
entry and exit.

The @BFirst@, @BMiddle@, and @BLast@ constructors create one-node
blocks.
Each of these constructors is polymorphic in the node's \emph{representation}
but monomorphic in its \emph{shape}.
An~earlier representation of blocks used a single constructor
of type \mbox{@n e x -> Block n e x@}, polymorphic in a node's representation \emph{and}
shape.
The representation of blocks in \figref{graph} has more constructors, but it
uses the power of GADTs to ensure that the shape of every node is
known statically.
This property makes no difference to clients, but it significantly
simplifies the implementation of analysis and transformation in
\secref{monomorphic-shape-outcome}.

The @BCat@ constructor concatenates blocks in sequence.
It~makes sense to concatenate blocks only when control can fall
through from the first to the second; therefore,
two blocks may be concatenated {only} if each block is open at
the point of concatenation.
This restriction is enforced by the type of @BCat@, whose first
argument must be open on exit, and whose second argument must be open on entry.
It~is statically impossible, for example, to concatenate a @Branch@
immediately before an
@Assign@.
Indeed, the @Block@ type statically guarantees that any
closed/closed @Block@---which compiler writers normally
call a ``basic block''---consists of exactly one closed/open node
(such as @Label@ in \figref{cmm-node}),
followed by zero or more open/open nodes (@Assign@ or @Store@),
and terminated with exactly one
open/closed node (@Branch@ or @CondBranch@).
Using GADTs to enforce these invariants is one of
\ourlib{}'s innovations.
% Also notice that the definition of @Block@ guarantees that every @Block@
% has at least one node.
%    SPJ: I think we've agreed to stick with PNil.
%    \remark{So what?  Who cares?  Why is it important
%    that no block is empty?  (And if we had an empty block, we could ditch
%    @PNil@ from @PGraph@, which woudl be an improvement.)}
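To make the shape discipline concrete, here is a small, self-contained sketch of these definitions and of a well-formed basic block. The sketch is ours, simplified from \ourlib's actual code; the constructor @NLabel@ stands in for @Label@ to avoid a name clash with the @Label@ type, and @blockLabel@ is an invented helper.
\begin{code}
{-# LANGUAGE GADTs, EmptyDataDecls #-}

data O  -- open
data C  -- closed

newtype Label = Label Int deriving (Eq, Show)
type Var  = String
data Expr = Lit Int | Var Var

data Node e x where
  NLabel :: Label -> Node C O
  Assign :: Var -> Expr -> Node O O
  Branch :: Label -> Node O C

data Block n e x where
  BFirst  :: n C O -> Block n C O
  BMiddle :: n O O -> Block n O O
  BLast   :: n O C -> Block n O C
  BCat    :: Block n e O -> Block n O x -> Block n e x

-- A well-formed basic block: one first node, middle nodes, one last
-- node.  Reordering the nodes below is rejected by the type checker.
bb :: Block Node C C
bb = BFirst (NLabel (Label 1))
     `BCat` BMiddle (Assign "x" (Lit 3))
     `BCat` BLast (Branch (Label 2))

-- A closed-on-entry block must begin with a first node,
-- so this match is exhaustive.
blockLabel :: Block Node C x -> Label
blockLabel (BFirst (NLabel l)) = l
blockLabel (BCat b _)          = blockLabel b
\end{code}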

\subsection{Graphs} \seclabel{graphs}

\ourlib{} composes blocks into graphs, which are also defined
in \figref{graph}.
Like @Block@, the data type @Graph@ is parameterized over
both nodes @n@ and its open/closed shape (@e@ and @x@).
It has three constructors. The first two
deal with the base cases of open/open graphs:
an empty graph is represented by @GNil@ while a single-block graph
is represented by @GUnit@.

More general graphs are represented by @GMany@, which has three
fields: an optional entry sequence, a body, and an optional exit
sequence.
\begin{itemize}
\item
If the graph is open at the entry, it contains an entry sequence of type
@Block n O C@.
We could represent this sequence as a value of type
@Maybe (Block n O C)@, but we can do better:
a~value of @Maybe@ type requires a \emph{dynamic} test,
but we know \emph{statically}, at compile time, that the sequence is present if and only
if the graph is open at the entry.
We~express our compile-time knowledge by using the type
@MaybeO e (Block n O C)@, a type-indexed version of @Maybe@
which is also defined in \figref{graph}:
the type @MaybeO O a@ is isomorphic to~@a@, while
the type @MaybeO C a@ is isomorphic to~@()@.
\item
The body of the graph is a collection of closed/closed blocks.
To~facilitate traversal of the graph, we represent a body as a finite
map from label to block.
\item
The exit sequence is dual to the entry sequence, and like the entry
sequence, its presence or absence is deducible from the static type of the graph.
\end{itemize}
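The difference between @Maybe@ and @MaybeO@ can be seen in a few lines of self-contained Haskell (a sketch of ours, not \ourlib's code): because the shape index is static, eliminating a @MaybeO@ value never requires a dynamic test or a failure case.
\begin{code}
{-# LANGUAGE GADTs, EmptyDataDecls #-}

data O  -- open
data C  -- closed

data MaybeO ex t where
  JustO    :: t -> MaybeO O t
  NothingO ::      MaybeO C t

-- With Maybe we would need a partial fromJust; here the type
-- rules out NothingO, so a single equation is exhaustive.
fromJustO :: MaybeO O t -> t
fromJustO (JustO x) = x

-- Dually, MaybeO C t is known statically to carry no payload.
isEmptyO :: MaybeO C t -> Bool
isEmptyO NothingO = True
\end{code}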

\seclabel{gSplice}

We can splice graphs together nicely; the cost is logarithmic in the
number of closed/closed blocks.
Unlike blocks, two graphs may be spliced together
not only when they are both open
at the splice point but also
when they are both closed---and not in the other two cases:
\begin{smallfuzzcode}{10.8pt}
`gSplice :: Graph n e a -> Graph n a x -> Graph n e x
gSplice GNil g2 = g2
gSplice g1 GNil = g1

gSplice (GUnit ^b1) (GUnit ^b2) = GUnit (b1 `BCat` b2)

gSplice (GUnit b) (GMany (JustO e) bs x)
  = GMany (JustO (b `BCat` e)) bs x

gSplice (GMany e ^bs (JustO x)) (GUnit b2)
  = GMany e bs (JustO (x `BCat` b2))

gSplice (GMany e1 ^bs1 (JustO x1)) (GMany (JustO e2) ^bs2 x2)
  = GMany e1 (bs1 `mapUnion` (b `addBlock` bs2)) x2
  where b = x1 `BCat` e2

gSplice (GMany e1 bs1 NothingO) (GMany NothingO bs2 x2)
  = GMany e1 (bs1 `mapUnion` bs2) x2
\end{smallfuzzcode}
This definition illustrates the power of GADTs: the
pattern matching is exhaustive, and all the open/closed invariants are
statically checked. For example, consider the second-to-last equation for @gSplice@.
Since the exit link of the first argument is @JustO x1@,
we know that the type parameter~@a@ is~@O@, and hence the entry link of the second
argument must be @JustO e2@.
Moreover, block~@x1@ must be
closed/open, and block~@e2@ must be open/closed.
We can therefore concatenate them
with @BCat@ to produce a closed/closed block, which is
added to the body of the result.

We~have carefully crafted the types so that if @BCat@ and @BodyCat@
are considered as associative operators,
every graph has a unique representation.\finalremark{So what? (Doumont)
Need some words about how this is part of making the API simple for
the client---I've added something to the end of the paragraph, but I'm
not particularly thrilled.}
%% \simon{Well, you were the one who was so keen on a unique representation!
%% And since we have one, I think tis useful to say so.  Lastly, the representation of
%% singleton blocks is not entirely obvious.}
%%%%
%%%% An empty open/open graph is represented
%%%% by @GNil@, while a closed/closed one is @gNilCC@:
%%%% \par {\small
%%%% \begin{code}
%%%% gNilCC :: Graph C C
%%%% gNilCC = GMany NothingO BodyEmpty NothingO
%%%% \end{code}}
%%%% The representation of a @Graph@ consisting of a single block~@b@
%%%% depends on the shape of~@b@:\remark{Does anyone care?}
%%%% \par{\small
%%%% \begin{code}
%%%% gUnitOO :: Block O O -> Graph O O
%%%% gUnitOC :: Block O C -> Graph O C
%%%% gUnitCO :: Block O C -> Graph C O
%%%% gUnitCC :: Block O C -> Graph C C
%%%% gUnitOO b = GUnit b
%%%% gUnitOC b = GMany (JustO b) BodyEmpty NothingO
%%%% gUnitCO b = GMany NothingO BodyEmpty (JustO b)
%%%% gUnitCC b = GMany NothingO (BodyUnit b) NothingO
%%%% \end{code}}
%%%% Multi-block graphs are similar.
%%%% From these definitions
To guarantee uniqueness, @GUnit@ is restricted to open/open
blocks.
If~@GUnit@ were more polymorphic, there would be
more than one way to represent some graphs, and it wouldn't be obvious
to a client which representation to choose---or if the choice made a difference.

\subsection{Labels and successors} \seclabel{edges}

Although \ourlib{} is polymorphic in the type of nodes,
it~still needs to know how nonlocal control is transferred from one
block to another:
\hoopl\ must know where
a control transfer can go,
and it must know which @Label@ is at the start of a block.
The client must provide this information by placing its node type into
\ourlib's @NonLocal@ type class (\figref{edges}).
The @entryLabel@ method takes a first node (one closed on entry, \secref{nodes})
and returns its @Label@;
the @successors@ method takes a last node (closed on exit) and returns
the @Label@s to
which it can transfer control.
A~middle node, which is open at both entry and exit, cannot refer to
any @Label@s,
so no corresponding interrogation function is needed.

A node type defined by a client must be an instance of @NonLocal@.
In~\figref{cmm-node},
the client's instance declaration for @Node@ would be
\begin{code}
instance NonLocal Node where
  entryLabel (Label l) = l
  successors (Branch b) = [b]
  successors (CondBranch e b1 b2) = [b1, b2]
\end{code}
Again, the pattern matching for both functions is exhaustive, and
the compiler statically checks this fact. Here, @entryLabel@
cannot be applied to an @Assign@ or @Branch@ node,
and any attempt to define a case for @Assign@ or @Branch@ would result
in a type error.

While the client must provide this information about
nodes, it is convenient for \ourlib\ to have the same information
about blocks.
For its own internal use,
\ourlib{} provides this instance declaration for the @Block@ type:
\begin{code}
instance NonLocal n => NonLocal (Block n) where
  entryLabel (BFirst n) = entryLabel n
  entryLabel (BCat b _) = entryLabel b
  successors (BLast n)  = successors n
  successors (BCat _ b) = successors b
\end{code}
Because the functions @entryLabel@ and @successors@ are used to track control
flow \emph{within} a graph, \ourlib\ does not need to ask for the
entry label or successors of a @Graph@ itself.
Indeed, @Graph@ \emph{cannot} be an instance of @NonLocal@, because even
if a @Graph@ is closed at the entry, it does not have a unique entry label.
%
% A slight infelicity is that we cannot make @Graph@ an instance of @NonLocal@,
% because a graph closed on entry has no unique label.  Fortunately, we never
% need @Graph@s to be in @NonLocal@.
%
% ``Never apologize.  Never confess to infelicity.''  ---SLPJ




\begin{table}
\centerline{%
\begin{tabular}{@{}>{\raggedright\arraybackslash}p{1.03in}>{\scshape}c>{\scshape}c>{\raggedright\arraybackslash}p{1.29in}@{}}
&\multicolumn1{r}{\llap{\emph{Specified}}\hspace*{-0.3em}}&
\multicolumn1{l}{\hspace*{-0.4em}\rlap{\emph{Implemented}}}&\\
\multicolumn1{c}{\emph{Part of optimizer}}
&\multicolumn1{c}{\emph{by}}&
\multicolumn1{c}{\emph{by}}&
\multicolumn1{c}{\emph{How many}}%
\\[5pt]
Control-flow graphs& Us & Us & One \\
Nodes in a control-flow graph & You & You & One type per intermediate language \\[3pt]
Dataflow fact~$F$ & You & You & One type per logic \\
Lattice operations & Us & You & One set per logic \\[3pt]
Transfer functions & Us & You & One per analysis \\
Rewrite functions & Us & You & One per \rlap{transformation} \\[3pt]
Solve-and-rewrite functions & Us & Us & Two (forward, backward) \\
\end{tabular}%
}
\caption{Parts of an optimizer built with \ourlib}
\tablabel{parts}
\end{table}



\section{Using \ourlib{} to analyze and transform graphs} \seclabel{using-hoopl}
\seclabel{making-simple}
\seclabel{create-analysis}

Now that we have graphs, how do we optimize them?
\ourlib{} makes it easy for a
client to build a new dataflow analysis and optimization.
The client must supply the following pieces:
\begin{itemize}
\item \emph{A node type} (\secref{nodes}).
\ourlib{} supplies the @Block@ and @Graph@ types
that let the client build control-flow graphs out of nodes.
\item \emph{A data type of facts} and some operations over
those facts (\secref{facts}).
Each analysis uses facts that are specific to that particular analysis,
which \ourlib{} accommodates by being polymorphic in
the fact type.
\item \emph{A transfer function} that takes a node and returns a
\emph{fact transformer}, which takes a fact flowing into the node and
returns the transformed fact that flows out of the node (\secref{transfers}).
\item \emph{A rewrite function} that takes a node and an input fact,
and which returns either @Nothing@
or @(Just g)@, where @g@~is a graph that should
replace the node.
The ability to replace a \emph{node} by a \emph{graph} that may include
internal control flow is
crucial for many code-improving transformations.
We discuss the rewrite function
in Sections \ref{sec:rewrites} and \ref{sec:shallow-vs-deep}.
\end{itemize}
These requirements are summarized in \tabref{parts}.
Because facts, transfer functions, and rewrite functions work closely together,
we represent their combination
as a single record of type @FwdPass@ (\figref{api-types}).

\ifpagetuning\enlargethispage{\baselineskip}\fi

Given a node type~@n@ and a @FwdPass@,
a client can ask \ourlib{}\ to analyze and rewrite a
graph.
\hoopl\ provides a fully polymorphic interface, but for purposes of
exposition, we present a function that is specialized to a
closed/closed graph:
\begin{code}
analyzeAndRewriteFwdBody
  :: ( FuelMonad m     -- Gensym and other state
     , NonLocal n )    -- Extract non-local flow edges
  => FwdPass m n f     -- Lattice, transfer, rewrite
  -> [Label]           -- Entry point(s)
  -> Graph n C C       -- Input graph
  -> FactBase f        -- Input fact(s)
  -> m ( Graph n C C   -- Result graph
       , FactBase f )  -- ... and its facts
\end{code}
Given a @FwdPass@, the
analyze-and-rewrite function transforms a graph into
an optimized graph.
As its type shows, this function
is polymorphic in the types of nodes~@n@ and facts~@f@;
these types are determined entirely by the client.
The type of the monad~@m@ is also supplied by the client, but it must
meet certain constraints laid out by \hoopl, as described in
\secref{fuel-monad}.

\begin{figure}
\begin{code}
data `FwdPass m n f
  = FwdPass { `fp_lattice  :: DataflowLattice f
            , `fp_transfer :: FwdTransfer n f
            , `fp_rewrite  :: FwdRewrite m n f }

------- Lattice ----------
data `DataflowLattice a = DataflowLattice
  { `fact_bot  :: a
  , `fact_join :: OldFact a -> NewFact a
               -> (ChangeFlag, a) }

data `ChangeFlag = `NoChange | `SomeChange
newtype `OldFact a = OldFact a
newtype `NewFact a = NewFact a

------- Transfers ----------
newtype `FwdTransfer n f   -- abstract type
`mkFTransfer'
  :: (forall e x . n e x -> f -> Fact x f)
  -> FwdTransfer n f

------- Rewrites ----------
newtype `FwdRewrite m n f  -- abstract type
`mkFRewrite'
  :: (forall e x . n e x -> f -> m (Maybe (FwdRew m n f e x)))
  -> FwdRewrite m n f

data `FwdRew m n f e x
  = FwdRew (Graph n e x) (FwdRewrite m n f)

------- Fact-like things -------
type family   `Fact x f :: *
type instance Fact O f = f
type instance Fact C f = FactBase f

------- FactBase -------
type `FactBase f = LabelMap f
  -- A finite mapping from Labels to facts f
\end{code}
\caption{\ourlib{} API data types}
\figlabel{api-types}
\figlabel{lattice-type} \figlabel{lattice}
\figlabel{transfers} \figlabel{rewrites}
\end{figure}
% omit mkFactBase :: [(Label, f)] -> FactBase f



As well as taking and returning a graph with its entry point(s), the
function also takes input facts (the @FactBase@) and produces output facts.
A~@FactBase@ is simply a finite mapping from @Label@ to facts.
The
output @FactBase@ maps each @Label@ in the body to its fact; if
the @Label@ is not in the domain of the @FactBase@, its fact is the
bottom element of the lattice.
The point(s) at which control flow may enter the graph are supplied by
the caller; the input @FactBase@ supplies any
facts that hold at these points.
For example, in our constant-propagation example from \secref{const-prop-example},
if the graph
represents the body of a procedure
with parameters $x,y,z$, we would map the entry @Label@ to a fact
$@x=@\top \land @y=@\top \land @z=@\top$, to specify that the procedure's
parameters are not known to be constants.

The client's model of how @analyzeAndRewriteFwdBody@ works is as follows:
\ourlib\ walks forward over each block in the graph.
At each node, \ourlib\ applies the
rewrite function to the node and the incoming fact. If the rewrite
function returns @Nothing@, the node is retained as part of the output
graph, the transfer function is used to compute the downstream fact,
and \ourlib\ moves on to the next node.
But if the rewrite function returns @Just (FwdRew g rw)@,
indicating that it wants to rewrite the node to the replacement graph~@g@, then
\ourlib\ recursively analyzes and rewrites~@g@, using the new rewrite
function~@rw@, before moving on to the next node.
A~node following a rewritten node sees
\emph{up-to-date} facts; that is, its input fact is computed by
analyzing the replacement graph.

In this section we flesh out the
\emph{interface} to @analyzeAndRewriteFwdBody@, leaving the implementation for
\secref{engine}.
{\hfuzz=7.2pt\par}

\subsection{Dataflow lattices} \seclabel{lattices} \seclabel{facts}

For each analysis or transformation, the client must define a type
of dataflow facts.
A~dataflow fact often represents an assertion
about a program point,\footnote
{In \ourlib, a program point is simply an edge in a
control-flow graph.}
but in general, dataflow analysis establishes properties of \emph{paths}:
\begin{itemize}
\item An assertion about all paths \emph{to} a program point is established
by a \emph{forwards analysis}. For example, the assertion ``$@x@=@3@$'' at point P
claims that variable @x@ holds value @3@ at P, regardless of the
path by which P is reached.
\item An assertion about all paths \emph{from} a program point is
established by a \emph{backwards analysis}. For example, the
assertion ``@x@ is dead'' at point P claims that no path from P uses
variable @x@.
\end{itemize}

A~set of dataflow facts must form a lattice, and \ourlib{} must know
(a)~the bottom element of the lattice and (b)~how to take
the least upper bound (join) of two elements.
To ensure that analysis
terminates, it is enough if every fact has a finite number of
distinct facts above it, so that repeated joins
eventually reach a fixed point.

In practice, joins are computed at labels.
If~$f_{\mathit{old}}$ is the fact currently associated with the
label~$L$,
and if a transfer function propagates a new fact~$f_{\mathit{new}}$
into the label~$L$,
the dataflow engine replaces $f_{\mathit{old}}$ with
the join $f_{\mathit{old}} \join f_{\mathit{new}}$.
Furthermore, the dataflow engine wants to know if
$f_{\mathit{new}} \join f_{\mathit{old}} = f_{\mathit{old}}$,
because if not, the analysis has not reached a fixed point.

The bottom element and join operation of a lattice of facts of
type~@f@ are stored in a value of type @DataflowLattice f@
(\figref{lattice}).
%%Such a value is part of the @FwdPass n f@ that is passed to
%%@analyzeAndRewriteFwd@ above.
As noted in the previous paragraph,
\ourlib{} needs to know when the result of a join is equal
to one of the arguments joined.
Because this information can often be computed cheaply together
with the join, \ourlib\ does not
require a separate equality test on facts (which might be
expensive).
Instead, \ourlib\ requires that @fact_join@ return a @ChangeFlag@ as
well as the least upper bound.
The @ChangeFlag@ should be @NoChange@ if
the result is the same as the old fact, and
@SomeChange@ if the result differs.

\seclabel{WithTop}

To help clients create lattices and join functions,
\hoopl\ includes functions and constructors that extend a type~@a@
with top and bottom elements.
In this paper, we use only type @`WithTop@, which has value
constructors with these types:
\begin{code}
`PElem :: a -> WithTop a
`Top   ::      WithTop a
\end{code}
\hoopl\ also provides a @`Bot@ constructor and types @WithTop@,
@`WithBot@, and @`WithTopAndBot@.
All are defined using a single GADT,
so that value constructors @Top@ and @Bot@ may be used with any of the
types.

The real convenience of @WithTop@, @WithBot@, and @WithTopAndBot@ is
that \hoopl\ provides a combinator that lifts
a join function on~@a@ to a join function on
@WithTop a@, and similarly for the other types.
The lifting combinator ensures that joins involving top and bottom
elements not only obey the appropriate algebraic laws but also set the
@ChangeFlag@ properly.

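As an illustration, here is a self-contained sketch of a lattice a client might define for one variable's constant-propagation fact. The sketch is ours, not \hoopl's code: it flattens the @WithTopAndBot@ machinery into a single data type, and the names @ConstVal@ and @constLattice@ are invented.
\begin{code}
data ChangeFlag = NoChange | SomeChange deriving (Eq, Show)
newtype OldFact a = OldFact a
newtype NewFact a = NewFact a

data DataflowLattice a = DataflowLattice
  { fact_bot  :: a
  , fact_join :: OldFact a -> NewFact a -> (ChangeFlag, a) }

-- Flat lattice for one variable's value: Bot < PElem k < Top.
data ConstVal = Bot | PElem Int | Top deriving (Eq, Show)

constLattice :: DataflowLattice ConstVal
constLattice = DataflowLattice
  { fact_bot  = Bot
  , fact_join = \(OldFact old) (NewFact new) ->
      let j = lub old new
      -- NoChange exactly when the join equals the old fact
      in (if j == old then NoChange else SomeChange, j) }
  where
    lub Bot x = x
    lub x Bot = x
    lub (PElem a) (PElem b) | a == b = PElem a
    lub _ _ = Top
\end{code}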



% A~possible representation of the facts needed to implement
% constant propagation is shown in \figref{const-prop}.
% A~fact
% is represented as a finite map from a variable to a value of type
% @Maybe HasConst@.%
% A~variable $x$ maps to @Nothing@ iff $x=\bot$;
% $x$~maps to @Just Top@ iff $x=\top$;
% and $x$~maps to $@Just@\, (@I@\, k)$ iff $x=k$ (and similarly for
% Boolean constants).\remark{Ugh}
% Any one procedure has only
% finitely many variables; only finitely many facts are computed at any
% program point; and in this lattice any one fact can increase at most
% twice.  These properties ensure that the dataflow engine will reach a
% fixed point.

\subsection{The transfer function} \seclabel{transfers}

A~forward transfer function is presented with the dataflow fact(s) on
the edge(s) coming
into a node, and it computes dataflow fact(s) on the outgoing edge(s).
In a forward analysis, the dataflow engine starts with the fact at the
beginning of a block and applies the transfer function to successive
nodes in that block until eventually the transfer function for the last node
computes the facts that are propagated to the block's successors.
For example, consider this graph, with entry at @L1@:
\begin{code}
L1: x=3; goto L2
L2: y=x+4; x=x-1;
    if x>0 then goto L2 else return
\end{code}
A forward analysis starts with the bottom fact \{\} at every label.
Analyzing @L1@ propagates this fact forward, by applying the transfer
function successively to the nodes
of @L1@, emerging with the fact \{@x=3@\} for @L2@.
This new fact is joined with the existing (bottom) fact for @L2@.
Now the analysis propagates @L2@'s fact forward, again using the transfer
function, this time emerging with a new fact \mbox{\{@x=2@, @y=7@\}} for @L2@.
Again, the new fact is joined with the existing fact for @L2@, and the process
is iterated until the facts for each label reach a fixed point.

But wait! What is the \emph{type} of the transfer function?
If the node is open at exit, the transfer function produces a single fact.
But what if the node is \emph{closed} on exit?
In that case the transfer function must
produce a list of (@Label@,fact) pairs, one for each outgoing edge.
%
\emph{So the type of the transfer function's result
depends on the shape of the node's exit.}
Fortunately, this dependency can be expressed precisely, at compile
time, by Haskell's (recently added)
\emph{indexed type families}.
The relevant part of \ourlib's interface is given in \figref{transfers}.
A~forward transfer function supplied by a client,
of type (@FwdTransfer@ @n@ @f@),
is typically a function polymorphic in @e@ and @x@.
It takes a
node of type \mbox{(@n@ @e@ @x@)} and a fact of type~@f@, and it produces an
outgoing ``fact-like thing'' of type (@Fact@ @x@ @f@). The
type constructor @Fact@
should be thought of as a type-level function; its signature is given in the
@type family@ declaration, while its definition is given by two @type instance@
declarations. The first declaration says that the fact-like thing
coming out of a node
\emph{open} at the exit is just a fact~@f@. The second declaration says that
the fact-like thing
coming out of a node
\emph{closed} at the exit is a mapping from @Label@ to facts.

We have ordered the arguments such that if
\begin{code}
`transfer_fn :: forall e x . n e x -> f -> Fact x f
`node        :: n e x
\end{code}
then @(transfer_fn node)@ is a fact transformer:
\begin{code}
transfer_fn node :: f -> Fact x f
\end{code}

So much for the interface.
What about the implementation?
The way GADTs work is that the compiler uses the value constructor for
type \mbox{@n@ @e@ @x@} to determine whether @e@~and~@x@ are open or
closed.
But we want to write functions that are \emph{polymorphic} in the node
type~@n@!
Such functions include
\begin{itemize}
\item
A function that takes a pair of @FwdTransfer@s for facts @f@~and~@f'@,
and returns a single @FwdTransfer@ for the fact @(f, f')@
\item
A function that takes a @FwdTransfer@ and wraps it in logging code, so
an analysis can be debugged by watching the facts flow through the
nodes
\end{itemize}
Such functions may also be useful in \hoopl's \emph{clients}:
% may need
%functions that are polymorphic in the node type~@n@:
\begin{itemize}
\item
A dominator analysis in the style of
\citet{cooper-harvey-kennedy:simple-dominance} requires only the
functions in the \texttt{NonLocal} type class;
we have written such an analysis using transfer functions that are
polymorphic in~@n@.
\end{itemize}
Because the mapping from
value constructors to shape is different for each node type~@n@, transfer
functions cannot be polymorphic in both
the representation and the shape of nodes.
Our implementation therefore sacrifices polymorphism in shape:
internally, \hoopl\ represents
a~@FwdTransfer n f@ as a \emph{triple} of functions,
each polymorphic in~@n@ but monomorphic in shape:
\begin{code}
newtype FwdTransfer n f
  = FwdTransfers ( n C O -> f -> f
                 , n O O -> f -> f
                 , n O C -> f -> FactBase f
                 )
\end{code}
Such triples can easily be composed and wrapped without requiring a
pattern match on the value constructor of an unknown node
type.\remark{Need to refer to this junk in the conclusion}
Working with triples is tedious, but only \hoopl\ itself is forced to
work with triples; each client, which knows the node type, may supply
a triple,
but it typically supplies a single function polymorphic in the shape
of the (known) node.
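For concreteness, here is a self-contained sketch of the client's side of this bargain: one transfer function, polymorphic in shape, for a constant-propagation-like analysis over the node type of \figref{cmm-node}. The sketch is ours; it uses @Data.Map@ in place of \hoopl's @FactBase@, omits @Store@, renames the @Label@ constructor to @NLabel@, and lets the GADT match refine the result type @Fact x f@ at each equation.
\begin{code}
{-# LANGUAGE GADTs, EmptyDataDecls, TypeFamilies #-}
import qualified Data.Map as M

data O  -- open
data C  -- closed

newtype Label = Label Int deriving (Eq, Ord, Show)
type Var  = String
data Expr = Lit Int | Var Var

data Node e x where
  NLabel     :: Label -> Node C O
  Assign     :: Var -> Expr -> Node O O
  Branch     :: Label -> Node O C
  CondBranch :: Expr -> Label -> Label -> Node O C

-- Shape-indexed result type, in the style of the Fact family:
type family Fact x f
type instance Fact O f = f              -- open exit: one fact
type instance Fact C f = M.Map Label f  -- closed exit: a fact per successor

type ConstFact = M.Map Var Int  -- vars known to be constant (simplified)

-- One function covers all three shapes; each equation's right-hand
-- side has the type demanded by that constructor's exit shape.
varHasLit :: Node e x -> ConstFact -> Fact x ConstFact
varHasLit (NLabel _)         f = f
varHasLit (Assign v (Lit k)) f = M.insert v k f
varHasLit (Assign v _)       f = M.delete v f
varHasLit (Branch l)         f = M.singleton l f
varHasLit (CondBranch _ t e) f = M.fromList [(t, f), (e, f)]
\end{code}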




\subsection{The rewrite function}
\seclabel{rewrites}
\seclabel{example-rewrites}

%%%% \begin{figure}
%%%% \begin{smallfuzzcode}{6.6pt}
%%%% type `AGraph n e x
%%%%   = [Label] -> (Graph n e x, [Label])
%%%%
%%%% `withLabels :: Int -> ([Label] -> AGraph n e x)
%%%%             -> AGraph n e x
%%%% withLabels n ^fn = \ls -> fn (take n ^ls) (drop n ls)
%%%%
%%%% `mkIfThenElse :: Expr -> AGraph Node O O
%%%%               -> AGraph Node O O -> AGraph Node O O
%%%% mkIfThenElse p t e
%%%%   = withLabels 3 $ \[l1,l2,l3] ->
%%%%     gUnitOC (BUnit (CondBranch p l1 l2)) `gSplice`
%%%%     mkLabel l1 `gSplice` t `gSplice` mkBranch l3 `gSplice`
%%%%     mkLabel l2 `gSplice` e `gSplice` mkBranch l3 `gSplice`
%%%%     mkLabel l3
%%%%
%%%% `mkLabel  l = gUnitCO (BUnit (Label l))
%%%% `mkBranch l = gUnitOC (BUnit (Branch l))
%%%% `gUnitOC b = GMany (JustO b) BodyEmpty NothingO
%%%% `gUnitCO b = GMany NothingO BodyEmpty (JustO b)
%%%% \end{smallfuzzcode}
%%%% \caption{The \texttt{AGraph} type and example constructions} \figlabel{agraph}
%%%% \end{figure}

We compute dataflow facts in order to enable code-improving
transformations.
In our constant-propagation example,
the dataflow facts may enable us
to simplify an expression by performing constant folding, or to
turn a conditional branch into an unconditional one.
Similarly, a liveness analysis may allow us to
replace a dead assignment with a no-op.

A @FwdPass@ therefore includes a \emph{rewriting function}, whose
type, @FwdRewrite@, is given in \figref{api-types}.
A rewriting function takes a node and a fact, and optionally returns\ldots what?
At first one might
expect that rewriting should return a new node, but that is not enough:
We~might want to remove a node by rewriting it to the empty graph,
or more ambitiously, we might want to replace a high-level operation
with a tree of conditional branches or a loop, which would entail
introducing new blocks with internal control flow.
In~general, a rewrite function must be able to return a
\emph{graph}.

\ifpagetuning\enlargethispage{0.8\baselineskip}\fi

Concretely, a @FwdRewrite@ takes a node and a suitably shaped
fact, and returns either @Nothing@, indicating that the node should
not be replaced,
or $@(Just (FwdRew@\;\ag\;\rw@))@$, indicating that the node should
be replaced with~\ag: the replacement graph.
The result is monadic because
if the rewriter makes graphs containing blocks,
it may need fresh @Label@s, which are supplied by a monad.
A~client may use its own monad or may use a monad or monad transformer
supplied by \hoopl.
\remark{Forward reference??}
1565
1566 %%%% \ifpagetuning\enlargethispage{0.8\baselineskip}\fi
1567
1568 The type of @FwdRewrite@ in \figref{api-types} guarantees
1569 \emph{at compile time} that
1570 the replacement graph $\ag$ has
1571 the \emph{same} open/closed shape as the node being rewritten.
1572 For example, a branch instruction can be replaced only by a graph
1573 closed at the exit. Moreover, because only an open/open graph can be
1574 empty---look at the type of @GNil@ in \figref{graph}---the type
1575 of @FwdRewrite@
1576 guarantees, at compile time, that no head of a block (closed/open)
1577 or tail of a block (open/closed) can ever be deleted by being
1578 rewritten to an empty graph.
1579
1580 \subsection{Shallow vs deep rewriting}
1581 \seclabel{shallow-vs-deep}
1582
1583 Once the rewrite has been performed, what then?
1584 The rewrite returns a replacement graph,
1585 which must itself be analyzed, and its nodes may be further rewritten.
1586 We~call @analyzeAndRewriteFwdBody@ to process the replacement
1587 graph---but what @FwdPass@ should we use?
1588 There are two common situations:
1589 \begin{itemize}
1590 \item Sometimes we want to analyze and transform the replacement graph
1591 with an unmodified @FwdPass@, further rewriting the replacement graph.
1592 This procedure is
1593 called \emph{deep rewriting}.
1594 When deep rewriting is used, the client's rewrite function must
1595 ensure that the graphs it produces are not rewritten indefinitely
1596 (\secref{correctness}).
1597 \item Sometimes we want to analyze \emph{but not further rewrite} the
1598 replacement graph. This procedure is called \emph{shallow rewriting}.
1599 It~is easily implemented by using a modified @FwdPass@
1600 whose rewriting function always returns @Nothing@.
1601 \end{itemize}
1602 Deep rewriting is essential to achieve the full benefits of
1603 interleaved analysis and transformation
1604 \citep{lerner-grove-chambers:2002}.
1605 But shallow rewriting can be vital as well;
1606 for example, a forward dataflow pass that inserts
1607 a spill before a call must not rewrite the call again, lest it attempt
1608 to insert infinitely many spills.
1609
1610 An~innovation of \hoopl\ is to build the choice of shallow or deep
1611 rewriting into each rewrite function,
1612 an idea that is elegantly captured by the
1613 @FwdRew@ type returned by a @FwdRewrite@ (\figref{api-types}).
1614 The first component of the @FwdRew@ is the replacement graph, as discussed earlier.
1615 The second component, $\rw$, is a
1616 \emph{new rewriting function} to use when recursively processing
1617 the replacement graph.
1618 For shallow rewriting this new function is
1619 the constant @Nothing@ function; for deep rewriting it is the original
1620 rewriting function.
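A definition of @FwdRew@ consistent with this description is simply a pair (the details in \figref{api-types} may differ slightly):
\begin{code}
data FwdRew m n f e x
  = FwdRew (Graph n e x) (FwdRewrite m n f)
\end{code}
A shallow rewrite pairs its replacement graph with a rewriter that always returns @Nothing@; a deep rewrite pairs it with the original rewriting function.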
1621
1622
1623 \subsection{Composing rewrite functions and dataflow passes} \seclabel{combinators}
1624
1625 By requiring each rewrite to return a new rewrite function,
1626 \hoopl\ enables a variety of combinators over
1627 rewrite functions.
1628 \remark{This whole subsection needs to be redone in light of the new
1629 (triple-based) representation. It's not pretty any more.}
1630 For example, here is a function
1631 that combines two rewriting functions in sequence:
1632 \remark{This code must be improved}
1633 \smallverbatiminput{comb1}
1634 What a beautiful type @thenFwdRw@ has!
1635 \remark{And what an ugly implementation! Implementations must go.}
1636 It tries @rw1@, and if @rw1@
1637 declines to rewrite, it behaves like @rw2@. But if
1638 @rw1@ rewrites, returning a new rewriter @rw1a@, then the overall call also
1639 succeeds, returning a new rewrite function obtained by combining @rw1a@
1640 with @rw2@. (We cannot apply @rw1a@ or @rw2@
1641 directly to the replacement graph~@g@,
because @rw1@~returns a graph and @rw2@~expects a node.)
1643 The rewriter @noFwdRw@ is the identity of @thenFwdRw@.
1644 Finally, @thenFwdRw@ can
1645 combine a deep-rewriting function and a shallow-rewriting function,
1646 to produce a rewriting function that is a combination of deep and shallow.
1647 %%This is something that the Lerner/Grove/Chambers framework could not do,
1648 %%because there was a global shallow/deep flag.
1649 %% Our shortsightedness; Lerner/Grove/Chambers is deep only ---NR
1650
1651
1652 A shallow rewriting function can be made deep by iterating
1653 it:\remark{Algebraic law wanted!}
1654 \smallverbatiminput{iterf}
1655 If we have shallow rewrites $A$~and~$B$ then we can build $AB$,
1656 $A^*B$, $(AB)^*$,
1657 and so on: sequential composition is @thenFwdRw@ and the Kleene star
1658 is @iterFwdRw@.\remark{Do we still believe this claim?}
\remark{We can't define @iterFwdRw@ in terms of @thenFwdRw@ because
the second argument to @thenFwdRw@ would have to be the constant
nothing function when applied but would have to be the original triple
when passed to @thenFwdRw@ as the second argument (in the recursive
call).}
1664
1665
1666 The combinators above operate on rewrite functions that share a common
1667 fact type and transfer function.
1668 It~can also be useful to combine entire dataflow passes that use
1669 different facts.
1670 We~invite you to write one such combinator, with type
1671 \begin{code}
1672 `pairFwd :: Monad m
1673 => FwdPass m n f1
1674 -> FwdPass m n f2
1675 -> FwdPass m n (f1,f2)
1676 \end{code}
1677 The two passes run interleaved, not sequentially, and each may help
1678 the other,
1679 yielding better results than running $A$~and then~$B$ or $B$~and then~$A$
1680 \citep{lerner-grove-chambers:2002}.
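As a hint, the lattice component of such a combinator can join componentwise; a sketch, assuming the join function has type \mbox{@f -> f -> (ChangeFlag, f)@}:
\begin{code}
`pairLattice :: DataflowLattice f1 -> DataflowLattice f2
             -> DataflowLattice (f1, f2)
pairLattice l1 l2 = DataflowLattice
  { fact_bot    = (fact_bot l1, fact_bot l2)
  , fact_extend = join }
 where
  join (a1, a2) (b1, b2) = (c1 `orChange` c2, (a1', a2'))
    where (c1, a1') = fact_extend l1 a1 b1
          (c2, a2') = fact_extend l2 a2 b2
\end{code}
where @orChange@ returns @NoChange@ only if neither component changed. The transfer and rewrite components must each run both passes and pair the results.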
1681 %% call these passes. ``super-analyses;''
1682 %% in \hoopl, construction of super-analyses is
1683 %% particularly concrete.
1684
1685 \subsection{Example: Constant propagation and constant folding}
1686 \seclabel{const-prop-client}
1687
1688 % omit Binop :: Operator -> Expr -> Expr -> Expr
1689 % omit Add :: Operator
1690
1691
1692 %% \begin{figure}
1693 %% {\small\hfuzz=3pt
1694 %% \begin{code}
1695 %% -- Types and definition of the lattice
1696 %% data `HasConst = `Top | `B Bool | `I Integer
1697 %% type `ConstFact = Map.Map Var HasConst
1698 %% `constLattice = DataflowLattice
1699 %% { fact_bot = Map.empty
1700 %% , fact_extend = stdMapJoin constFactAdd }
1701 %% where
1702 %% `constFactAdd ^old ^new = (c, j)
1703 %% where j = if new == old then new else Top
1704 %% c = if j == old then NoChange else SomeChange
1705 %%
1706 %% -------------------------------------------------------
1707 %% -- Analysis: variable has constant value
1708 %% `varHasConst :: FwdTransfer Node ConstFact
1709 %% varHasConst (Label l) f = lookupFact f l
1710 %% varHasConst (Store _ _) f = f
1711 %% varHasConst (Assign x (Bool b)) f = Map.insert x (B b) f
1712 %% varHasConst (Assign x (Int i)) f = Map.insert x (I i) f
1713 %% varHasConst (Assign x _) f = Map.insert x Top f
1714 %% varHasConst (Branch l) f = mkFactBase [(l, f)]
1715 %% varHasConst (CondBranch (Var x) ^tid ^fid) f
1716 %% = mkFactBase [(tid, Map.insert x (B True) f),
1717 %% (fid, Map.insert x (B False) f)]
1718 %% varHasConst (CondBranch _ tid fid) f
1719 %% = mkFactBase [(tid, f), (fid, f)]
1720 %%
1721 %% -------------------------------------------------------
1722 %% -- Constant propagation
1723 %% `constProp :: FwdRewrite Node ConstFact
1724 %% constProp node ^facts
1725 %% = fmap toAGraph (mapE rewriteE node)
1726 %% where
1727 %% `rewriteE e (Var x)
1728 %% = case Map.lookup x facts of
1729 %% Just (B b) -> Just $ Bool b
1730 %% Just (I i) -> Just $ Int i
1731 %% _ -> Nothing
1732 %% rewriteE e = Nothing
1733 %%
1734 %% -------------------------------------------------------
1735 %% -- Simplification ("constant folding")
1736 %% `simplify :: FwdRewrite Node f
1737 %% simplify (CondBranch (Bool b) t f) _
1738 %% = Just $ toAGraph $ Branch (if b then t else f)
1739 %% simplify node _ = fmap toAGraph (mapE s_exp node)
1740 %% where
1741 %% `s_exp (Binop Add (Int i1) (Int i2))
1742 %% = Just $ Int $ i1 + i2
1743 %% ... -- more cases for constant folding
1744 %%
1745 %% -- Rewriting expressions
1746 %% `mapE :: (Expr -> Maybe Expr)
1747 %% -> Node e x -> Maybe (Node e x)
1748 %% mapE f (Label _) = Nothing
1749 %% mapE f (Assign x e) = fmap (Assign x) $ f e
1750 %% ... -- more cases for rewriting expressions
1751 %%
1752 %% -------------------------------------------------------
1753 %% -- Defining the forward dataflow pass
1754 %% `constPropPass = FwdPass
1755 %% { fp_lattice = constLattice
1756 %% , fp_transfer = varHasConst
1757 %% , fp_rewrite = constProp `thenFwdRw` simplify }
1758 %% \end{code}}
1759 %% \caption{The client for constant propagation and constant folding} \figlabel{old-const-prop}
1760 %% \end{figure}
1761 \begin{figure}
1762 \smallverbatiminput{cprop}
1763 \caption{The client for constant propagation and constant folding\break (extracted automatically from code distributed with Hoopl)}
1764 \figlabel{const-prop}
1765 \end{figure}
1766
1767
1768 \figref{const-prop} shows client code for
1769 constant propagation and constant folding.
1770 For each variable at each point in a graph, the analysis concludes one of three facts:
1771 the variable holds a constant value (Boolean or integer),
1772 the variable might hold a non-constant value,
1773 or nothing is known about what the variable holds.
1774 We~represent these facts using a finite map from a variable to a
1775 fact of type @WithTop Lit@ (\secref{WithTop}).
1776 A~variable with a constant value maps to @Just (PElem k)@, where
1777 @k@~is the constant value;
1778 a variable with a non-constant value maps to @Just Top@;
1779 and a variable with an unknown value maps to @Nothing@ (it is not
1780 in the domain of the finite map).
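In code, the fact type is a finite map (a sketch; @WithTop@ adds a @Top@ point above its argument type, as described in \secref{WithTop}):
\begin{code}
type `ConstFact = Map.Map Var (WithTop Lit)
\end{code}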
1781
1782 % \afterpage{\clearpage}
1783
1784 The definition of the lattice (@constLattice@) is straightforward.
1785 The bottom element is an empty map (nothing is known about the contents of any variable).
1786 We~use the @stdMapJoin@ function to lift the join operation
1787 for a single variable (@constFactAdd@) up to the map containing facts
1788 for all variables.
1789 The utility function @`changeIf :: Bool -> ChangeFlag@ is exported by \hoopl.
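Under the conventions above, the single-variable join could be written as follows (a sketch; it assumes an old-then-new argument order and an @Eq@ instance for @WithTop Lit@):
\begin{code}
`constFactAdd :: WithTop Lit -> WithTop Lit
              -> (ChangeFlag, WithTop Lit)
constFactAdd old new = (changeIf (j /= old), j)
  where j = if new == old then new else Top
\end{code}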
1790
1791 % omit stdMapJoin :: Ord k => JoinFun v -> JoinFun (Map.Map k v)
1792
1793 The forward transfer function @varHasLit@ is defined using the
1794 shape-polymorphic auxiliary function~@ft@.
1795 There are two interesting kinds of nodes:
1796 assignment and conditional branch.
1797 In the first two cases for assignment, a variable gets a constant value,
1798 so we produce a dataflow fact mapping the variable to its value.
1799 In the third case for assignment, the variable gets a non-constant value,
1800 so we produce a dataflow fact mapping the variable to @Top@.
1801 The~last interesting case is a conditional branch where the condition
1802 is a variable.
1803 If the conditional branch flows to the true successor,
1804 the variable holds @True@, and similarly for the false successor.
1805 We~update the fact flowing to each successor accordingly.
(Because function @ft@ scrutinizes a GADT, it cannot
1807 default the uninteresting cases.)
1808
1809 We do not need to consider complicated cases such as
1810 an assignment @x:=y@ where @y@ holds a constant value @k@.
1811 Instead, we rely on the interleaving of transformation
1812 and analysis to first transform the assignment to @x:=k@,
1813 which is exactly what our simple transfer function expects.
1814 As we mention in \secref{simple-tx},
1815 interleaving makes it possible to write
1816 the simplest imaginable transfer functions, without missing
1817 opportunities to improve the code.
1818
1819 \figref{const-prop}'s rewrite function for constant propagation, @constProp@,
1820 rewrites each use of a variable to its constant value.
1821 The client has defined auxiliary functions
1822 \begin{smallcode}
1823 `mapVE :: (Var -> Maybe Expr) -> (Expr -> Maybe Expr)
1824 `mapEE :: (Expr -> Maybe Expr) -> (Expr -> Maybe Expr)
1825 `mapEN :: (Expr -> Maybe Expr) -> (Node e x -> Maybe (Node e x))
1826 `mapVN :: (Var -> Maybe Expr) -> (Node e x -> Maybe (Node e x))
1827 `nodeToG :: Node e x -> Graph Node e x
1828 \end{smallcode}
1829 The client composes @mapXX@ functions
1830 to apply @lookup@ to each use of a variable in each kind of node;
1831 @lookup@ substitutes for each variable that has a constant value.
1832 Applying @fmap nodeToG@ lifts the final node, if present, into a~@Graph@.
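The @lookup@ function itself can be a one-case lookup in the incoming fact (a sketch; @Lit@ is an assumed constructor injecting a literal into @Expr@):
\begin{code}
-- inside constProp, where f is the incoming ConstFact
lookup v = case Map.lookup v f of
             Just (PElem k) -> Just (Lit k)
             _              -> Nothing
\end{code}
The rewrite of a node is then, roughly, @fmap nodeToG . mapVN lookup@.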
1833
1834
1835 \figref{const-prop} also gives another, distinct function
1836 for constant
1837 folding: @simplify@.
This function rewrites a conditional branch on a
Boolean constant to an unconditional branch, and
it runs @s_exp@ on every subexpression
to fold constant subexpressions.
1843 Function @simplify@ does not need to check whether a variable holds a
1844 constant value; it relies on @constProp@ to have replaced the
1845 variable by the constant.
1846 Indeed, @simplify@ does not consult the
1847 incoming fact, so it is polymorphic in~@f@.
1848
1849
1850 The @FwdRewrite@ functions @constProp@ and @simplify@
1851 are useful independently.
1852 In this case, however, we
1853 want \emph{both} of them,
1854 so we compose them with @thenFwdRw@.
1855 The composition, along with the lattice and the
1856 transfer function,
1857 goes into @constPropPass@ (bottom of \figref{const-prop}).
1858 Given @constPropPass@, we can
1859 improve a graph~@g@ by passing @constPropPass@ and~@g@
1860 to
1861 @analyzeAndRewriteFwdBody@.
1862
1863
1864 \subsection{Throttling the dataflow engine using ``optimization fuel''}
1865 \seclabel{vpoiso}
1866 \seclabel{fuel}
1867
1868 Debugging an optimization can be tricky:
1869 an optimization may rewrite hundreds of nodes,
1870 and any of those rewrites could be incorrect.
1871 To debug dataflow optimizations, we use Whalley's
1872 \citeyearpar{whalley:isolation} powerful technique
1873 to identify the first rewrite that
1874 transforms a program from working code to faulty code.
1875
1876 The key idea is to~limit the number of rewrites that
1877 are performed while optimizing a graph.
1878 In \hoopl, the limit is called
1879 \emph{optimization fuel}:
1880 each rewrite costs one unit of fuel,
1881 and when the fuel is exhausted, no more rewrites are permitted.
1882 Because each rewrite leaves the observable behavior of the
1883 program unchanged, it is safe to stop rewriting at any point.
1884 Given a program that fails when compiled with optimization,
1885 a test infrastructure uses binary search on the amount of
1886 optimization fuel, until
1887 it finds that the program works correctly after $n-1$ rewrites but fails
1888 after $n$~rewrites.
1889 The $n$th rewrite is faulty.
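Such an infrastructure amounts to a standard bisection driver (entirely hypothetical code, not part of \hoopl; @works n@ compiles the program with @n@~units of fuel and runs the test):
\begin{code}
`findFaulty :: Int -> Int -> Int
findFaulty lo hi   -- works at lo, fails at hi
  | lo + 1 == hi = hi       -- rewrite number hi is faulty
  | works mid    = findFaulty mid hi
  | otherwise    = findFaulty lo mid
  where mid = (lo + hi) `div` 2
\end{code}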
1890
1891 \seclabel{fuel-monad}
1892
1893 \hoopl\ ensures it can keep track of fuel by imposing the constraint
1894 @FuelMonad m@ on the monad of any rewrite function used with
1895 @analyzeAndRewriteFwdBody@.
1896 This constraint ensures that \hoopl\ can get and set the amount of
1897 fuel remaining.
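A class along these lines suffices (a sketch; the class in \hoopl\ may carry additional methods):
\begin{code}
type `Fuel = Int
class Monad m => `FuelMonad m where
  getFuel :: m Fuel
  setFuel :: Fuel -> m ()
\end{code}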
1898
1899
1900 \subsection{Fixed points and speculative rewrites} \seclabel{fixpoints}
1901
1902 Are rewrites sound, especially when there are loops?
1903 Many analyses compute a fixed point starting from unsound
1904 ``facts''; for example, a live-variable analysis starts from the
assumption that all variables are dead. This means that \emph{rewrites
1906 performed before a fixed point is reached may be unsound, and their results
1907 must be discarded}. Each iteration of the fixed-point computation must
1908 start afresh with the original graph.
1909
1910
1911 Although the rewrites may be unsound, \emph{they must be performed}
1912 (speculatively, and possibly recursively),
1913 so that the facts downstream of the replacement graphs are as accurate
1914 as possible.
1915 For~example, consider this graph, with entry at @L1@:
1916 \par{\small
1917 \begin{code}
1918 L1: x=0; goto L2
1919 L2: x=x+1; if x==10 then goto L3 else goto L2
1920 \end{code}}
1921 The first traversal of block @L2@ starts with the unsound ``fact'' \{x=0\};
1922 but analysis of the block propagates the new fact \{x=1\} to @L2@, which joins the
1923 existing fact to get \{x=$\top$\}.
1924 What if the predicate in the conditional branch were @x<10@ instead
1925 of @x==10@?
1926 Again the first iteration would begin with the tentative fact \{x=0\}.
1927 Using that fact, we would rewrite the conditional branch to an unconditional
1928 branch @goto L3@. No new fact would propagate to @L2@, and we would
1929 have successfully (and soundly) eliminated the loop.
1930 This example is contrived, but it illustrates that
1931 for best results we should
1932 \begin{itemize}
1933 \item Perform the rewrites on every iteration.
1934 \item Begin each new iteration with the original, virgin graph.
1935 \end{itemize}
1936 This sort of algorithm is hard to implement in an imperative setting, where rewrites
1937 mutate a graph in place.
1938 But with an immutable graph, implementing the algorithm
1939 is trivially easy: we simply revert to the original graph at the start
1940 of each fixed-point iteration.
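In outline (a sketch only; the helper names are hypothetical):
\begin{code}
fix blocks fbase
  = do { (g, fbase') <- rewriteAllBlocks blocks fbase
       ; if noFactChanged fbase fbase'
         then return (g, fbase')
         else fix blocks fbase' } -- the original blocks!
\end{code}
Each recursive call passes the original @blocks@, discarding the speculatively rewritten graph~@g@.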
1941
1942 \subsection{Correctness} \seclabel{correctness}
1943
1944 Facts computed by @analyzeAndRewriteFwdBody@ depend on graphs produced by the rewrite
1945 function, which in turn depend on facts computed by the transfer function.
How~do we know that this algorithm is sound, or that it terminates?
1947 A~proof requires a POPL paper
1948 \cite{lerner-grove-chambers:2002}, but we can give some
1949 intuition.
1950
1951 \hoopl\ requires that a client's functions meet
1952 these preconditions:
1953 \begin{itemize}
1954 \item
1955 The lattice must have no \emph{infinite ascending chains}; that is,
1956 every sequence of calls to @fact_extend@ must eventually return @NoChange@.
1957 \item
1958 The transfer function must be
1959 \emph{monotonic}: given a more informative fact in,
1960 it should produce a more informative fact out.
1961 \item
1962 The rewrite function must be \emph{sound}:
1963 if it replaces a node @n@ by a replacement graph~@g@, then @g@~must be
1964 observationally equivalent to~@n@ under the
1965 assumptions expressed by the incoming dataflow fact~@f@.
1966 %%\footnote{We do not permit a transformation to change
1967 %% the @Label@ of a node. We have not found any optimizations
1968 %% that are prevented (or even affected) by this restriction.}
1969 \item
1970 The rewrite function must be \emph{consistent} with the transfer function;
1971 that is, \mbox{@`transfer n f@ $\sqsubseteq$ @transfer g f@}.
1972 For example, if the analysis says that @x@ is dead before the node~@n@,
1973 then it had better still be dead if @n@ is replaced by @g@.
1974 \item
1975 To ensure termination, a transformation that uses deep rewriting
1976 must not return replacement graphs which
1977 contain nodes that could be rewritten indefinitely.
1978 \end{itemize}
1979 Without the conditions on monotonicity and consistency,
1980 our algorithm will terminate,
1981 but there is no guarantee that it will compute
1982 a fixed point of the analysis. And that in turn threatens the
1983 soundness of rewrites based on possibly bogus ``facts''.
1984
1985 However, when the preconditions above are met,
1986 \begin{itemize}
1987 \item
1988 The algorithm terminates. The fixed-point loop must terminate because the
1989 lattice has no infinite ascending chains. And the client is responsible
1990 for avoiding infinite recursion when deep rewriting is used.
1991 \item
1992 The algorithm is sound. Why? Because if each rewrite is sound (in the sense given above),
1993 then applying a succession of rewrites is also sound.
1994 Moreover, a~sound analysis of the replacement graph
1995 may generate only dataflow facts that could have been
1996 generated by a more complicated analysis of the original graph.
1997 \end{itemize}
1998
1999 \finalremark{Doesn't the rewrite have to be have the following property:
2000 for a forward analysis/transform, if (rewrite P s) = Just s',
2001 then (transfer P s $\sqsubseteq$ transfer P s').
2002 For backward: if (rewrite Q s) = Just s', then (transfer Q s' $\sqsubseteq$ transfer Q s).
2003 Works for liveness.
2004 ``It works for liveness, so it must be true'' (NR).
2005 If this is true, it's worth a QuickCheck property!
2006 }%
2007 \finalremark{Version 2, after further rumination. Let's define
2008 $\scriptstyle \mathit{rt}(f,s) = \mathit{transform}(f, \mathit{rewrite}(f,s))$.
2009 Then $\mathit{rt}$ should
2010 be monotonic in~$f$. We think this is true of liveness, but we are not sure
2011 whether it's just a generally good idea, or whether it's actually a
2012 precondition for some (as yet unarticulated) property of \ourlib{} to hold.}%
2013
2014 %%%% \simon{The rewrite functions must presumably satisfy
2015 %%%% some monotonicity property. Something like: given a more informative
2016 %%%% fact, the rewrite function will rewrite a node to a more informative graph
2017 %%%% (in the fact lattice.).
2018 %%%% \textbf{NR}: actually the only obligation of the rewrite function is
2019 %%%% to preserve observable behavior. There's no requirement that it be
2020 %%%% monotonic or indeed that it do anything useful. It just has to
2021 %%%% preserve semantics (and be a pure function of course).
2022 %%%% \textbf{SLPJ} In that case I think I could cook up a program that
2023 %%%% would never reach a fixed point. Imagine a liveness analysis with a loop;
2024 %%%% x is initially unused anywhere.
2025 %%%% At some assignment node inside the loop, the rewriter behaves as follows:
2026 %%%% if (and only if) x is dead downstream,
2027 %%%% make it alive by rewriting the assignment to mention x.
2028 %%%% Now in each successive iteration x will go live/dead/live/dead etc. I
2029 %%%% maintain my claim that rewrite functions must satisfy some
2030 %%%% monotonicity property.
2031 %%%% \textbf{JD}: in the example you cite, monotonicity of facts at labels
2032 %%%% means x cannot go live/dead/live/dead etc. The only way we can think
2033 %%%% of not to terminate is infinite ``deep rewriting.''
2034 %%%% }
2035
2036
2037
2038
2039 \section{\ourlib's implementation}
2040 \seclabel{engine}
2041 \seclabel{dfengine}
2042
2043 \secref{making-simple}
2044 gives a client's-eye view of \hoopl, showing how to use
2045 it to create analyses and transformations.
2046 \hoopl's interface is simple, but
2047 the \emph{implementation} of interleaved analysis and rewriting is
2048 quite complicated. \citet{lerner-grove-chambers:2002}
2049 do not describe their implementation. We have written
2050 at least three previous implementations, all of which
2051 were long and hard to understand, and only one of which
2052 provided compile-time guarantees about open and closed shapes.
We are not confident that any of these implementations is correct.
2054
2055 In this paper we describe our new implementation. It is short
2056 (about a third of the size of our last attempt), elegant, and offers
2057 strong static shape guarantees.
2058 \remark{Wanted: enumerate the critical components and give each one's size}
2059
2060 We describe the implementation of \emph{forward}
2061 analysis and transformation.
2062 The implementations of backward analysis and transformation are
2063 exactly analogous and are included in \hoopl.
2064
2065
2066
2067 \subsection{Overview}
2068
2069
2070 %% We on @analyzeAndRewriteFwd@, whose type is more general
2071 %% than that of @analyzeAndRewriteFwdBody@:
2072 %% \begin{smallcode}
2073 %% `analyzeAndRewriteFwd
2074 %% :: forall m n f e x. (FuelMonad m, NonLocal n)
2075 %% => FwdPass m n f -- lattice, transfers, rewrites
2076 %% -> MaybeC e [Label] -- entry points for a closed graph
2077 %% -> Graph n e x -- the original graph
2078 %% -> Fact e f -- fact(s) flowing into the entry/entries
2079 %% -> m (Graph n e x, FactBase f, MaybeO x f)
2080 %% \end{smallcode}
2081 %% We analyze graphs of all shapes; a single @FwdPass@ may be used with
2082 %% multiple shapes.
2083 %% If a graph is closed at the entry, a list of entry points must be
2084 %% provided;
2085 %% if the graph is open at the entry, it must be the case that the
2086 %% implicit entry point is the only entry point.
2087 %% The fact or set of facts flowing into the entries is similarly
2088 %% determined by the shape of the entry point.
2089 %% Finally, the result is a rewritten graph, a @FactBase@ that gives a
2090 %% fixed point of the analysis on the rewritten graph, and if the graph
2091 %% is open at the exit, an ``exit fact'' flowing out.
2092
Instead of the interface function @analyzeAndRewriteFwdBody@, we present
2094 the internal function @arfGraph@ (short for ``analyze and rewrite
2095 forward graph''):
2096 \begin{smallcode}
2097 `arfGraph
2098 :: forall m n f e x. (FuelMonad m, NonLocal n)
2099 => FwdPass m n f -- lattice, transfers, rewrites
2100 -> MaybeC e [Label] -- entry points for a closed graph
2101 -> Graph n e x -- the original graph
2102 -> Fact e f -- fact(s) flowing into the entry/entries
2103 -> m (DG f n e x, Fact x f)
2104 \end{smallcode}
2105 We analyze graphs of all shapes; a single @FwdPass@ may be used with
2106 multiple shapes.
2107 If a graph is closed at the entry, a list of entry points must be
2108 provided;
2109 if the graph is open at the entry, it must be the case that the
2110 implicit entry point is the only entry point.
2111 The fact or set of facts flowing into the entries is similarly
2112 determined by the shape of the entry point.
2113 Finally, the result is a ``decorated graph''
2114 (@DG f n e x@),
2115 and if the graph
2116 is open at the exit, an ``exit fact'' flowing out.
2117
2118 A~``decorated graph'' is one in which each block is decorated with the
2119 fact that holds at the start of the block.
2120 @DG@ actually shares a representation with @Graph@,
2121 which is possible because the definition of
2122 @Graph@ in \figref{graph} contains a white lie: @Graph@~is a type
2123 synonym for an underlying type @`Graph'@, which takes the type of block
2124 as an additional parameter.
2125 (Similarly, function @gSplice@ in \secref{gSplice} is actually a
2126 higher-order function that takes a block-concatenation function as a
2127 parameter.)
2128 The truth about @Graph@ and @DG@ is as follows:
2129 \begin{code}
2130 type Graph = Graph' Block
2131 type DG f n e x = Graph' (DBlock f) n e x
2132 data DBlock f n e x = DBlock f (Block n e x)
2133 \end{code}
2134 Type~@DG@ is internal to \hoopl; it is not seen by any client.
2135 Extracting a @Graph@ and a @FactBase@ for @analyzeAndRewriteFwdBody@
2136 requires a 12-line function.
2137 %%
2138 %% We use it, not @Graph@, for two reasons:
2139 %% \begin{itemize}
2140 %% \item The client is often interested not only in the facts flowing
2141 %% out of the graph (which are returned in the @Fact x f@),
2142 %% but also in the facts on the \emph{internal} blocks
2143 %% of the graph. A~replacement graph of type @(DG n e x)@ is decorated with
2144 %% these internal facts.
2145 %% \item A @Graph@ has deliberately restrictive invariants; for example,
2146 %% a @GMany@ with a @JustO@ is always open at exit (\figref{graph}). It turns
2147 %% out to be awkward to maintain these invariants \emph{during} rewriting,
2148 %% but easy to restore them \emph{after} rewriting by ``normalizing'' an @DG@.
2149 %% \end{itemize}
2150 %% The information in an @DG@ is returned to the client by
2151 %% the normalization function @normalizeBody@, which
2152 %% splits an @DG@ into a @Body@ and its corresponding @FactBase@:
2153 %% \begin{code}
2154 %% `normalizeBody :: NonLocal n => DG n f C C
2155 %% -> (Body n, FactBase f)
2156 %%
2157
2158 Function @arfGraph@ is implemented using four inner functions:
2159 one each for nodes, blocks, graph bodies, and full graphs.
2160 These functions help us separate concerns: for example, only the
analysis of a node has to worry about rewrite functions,
2162 and only the analysis of a body has to worry about fixed points.
2163 All four functions have access to the @FwdPass@ and any entry points
2164 that are passed to @arfGraph@.
2165 These functions also have access to type variables bound by
2166 @arfGraph@:
2167 @n@~is the type of nodes; @f@~is the type of facts;
2168 @m@~is the monad used in the rewriting functions of the @FwdPass@;
2169 and
2170 @e@~and~@x@ give the shape of the graph passed to @arfGraph@.
2171 The types of the inner functions are
2172 \begin{smallcode}
2173 node :: forall e x . (ShapeLifter e x)
2174 => n e x -> f -> m (DG f n e x, Fact x f)
2175 block :: forall e x .
2176 Block n e x -> f -> m (DG f n e x, Fact x f)
2177 body :: [Label] -> LabelMap (Block n C C)
2178 -> Fact C f -> m (DG f n C C, Fact C f)
2179 graph :: Graph n e x -> Fact e f -> m (DG f n e x, Fact x f)
2180 \end{smallcode}
2181 Each inner function works the same way: it takes a ``thing'' and
2182 returns an \emph{extended fact transformer}.
An~extended fact transformer takes an input fact or facts suitable to
the ``thing,'' and it returns an output fact or facts.
2185 And it also returns a decorated graph representing the (possibly
2186 rewritten) ``thing''---that's the \emph{extended} part.
2187 Finally, because rewriting may consume fuel or require a fresh name,
2188 every extended fact transformer is monadic.
2189
2190 The types of the four extended fact transformers are not quite
2191 identical:
2192 \begin{itemize}
2193 \item
2194 Extended fact transformers for nodes and blocks have the same type;
2195 they expect a fact~@f@ rather than the more general @Fact e f@
2196 required for a graph.
2197 We~made a similar design choice in the interface for transfer
2198 functions (\secref{transfer-function}): because a node or a block has
2199 exactly one fact flowing into the entry, it is easiest simply to pass
that fact.
2201 \item
2202 Extended fact transformers for graphs have the most general type,
2203 which is expressed using the type family @Fact@:
2204 if the graph is open at the entry, its fact transformer expects a
2205 single fact;
if the graph is closed at the entry, its fact transformer expects a
2207 @FactBase@.
2208 \item
2209 Extended fact transformers for bodies have the same type as an
2210 extended fact transformer for a closed/closed graph.
2211 \end{itemize}
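The @Fact@ type family used above can be defined by shape (a definition consistent with its uses here):
\begin{code}
type family   Fact x f :: *
type instance Fact O f = f
type instance Fact C f = FactBase f
\end{code}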
2212
2213
2214 Function @arfGraph@ and its four inner functions comprise a cycle of
2215 mutual recursion:
2216 @arfGraph@ calls @graph@;
2217 @graph@ calls @body@ and @block@;
2218 @body@ calls @block@;
2219 @block@ calls @node@;
2220 and
2221 @node@ calls @arfGraph@.
2222 These five functions do three different kinds of work:
2223 compose extended fact transformers, analyze and rewrite nodes, and compute
2224 fixed points.
2225
2226
2227 \subsection{Analyzing blocks and graphs by composing extended fact transformers}
2228
2229 Extended fact transformers compose nicely.
2230 The composition function, @cat@, arranges for the functional parts to
2231 compose and for the decorated graphs to be spliceable.
2232 It~has a very general type:
2233 \begin{code}
2234 cat :: forall m e a x info info' info''. Monad m
2235 => (info -> m (DG f n e a, info'))
2236 -> (info' -> m (DG f n a x, info''))
2237 -> (info -> m (DG f n e x, info''))
2238 \end{code}
2239 The name @cat@ comes from the concatenation of the decorated graphs,
2240 but it is also appropriate because the style in which it is used is
2241 reminiscent of @concatMap@.
2242 The inner @block@ function exemplifies this style:
2243 \begin{code}
2244 block :: forall e x .
2245 Block n e x -> f -> m (DG f n e x, Fact x f)
2246 block (BFirst n) = node n
2247 block (BMiddle n) = node n
2248 block (BLast n) = node n
2249 block (BCat b1 b2) = block b1 `cat` block b2
2250 \end{code}
2251 The implementation of @graph@ is similar but has many more cases.
2252
2253 Function @cat@ threads facts through
2254 monadic fact transformers and accumulates decorated graphs:
2255 \begin{code}
2256 `cat ft1 ft2 f = do { (g1,f1) <- ft1 f
2257 ; (g2,f2) <- ft2 f1
2258 ; return (g1 `dgCat` g2, f2) }
2259 \end{code}
2260 (Function @dgCat@ is the same splicing function used for an ordinary
2261 @Graph@, but it takes a one-line block-concatenation function suitable
2262 for @DBlock@s.)
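The shape of this composition can also be seen in isolation. In the
following self-contained analogue (a sketch, not \hoopl's code), lists
of values stand in for decorated graphs, and list concatenation stands
in for @dgCat@:

```haskell
-- cat' threads the fact left to right and concatenates the
-- accumulated output, just as cat does with decorated graphs.
cat' :: Monad m
     => (info  -> m ([g], info'))
     -> (info' -> m ([g], info''))
     -> (info  -> m ([g], info''))
cat' ft1 ft2 f = do { (g1, f1) <- ft1 f
                    ; (g2, f2) <- ft2 f1
                    ; return (g1 ++ g2, f2) }
```

For example, composing @\x -> Just (["a"], x + 1)@ with
@\x -> Just (["b"], x * 2)@ and applying the result to~3 yields
@Just (["a","b"], 8)@: the fact is threaded, the outputs accumulate.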
2263
2264
2265
2266 %%%%
2267 %%%% \begin{itemize}
2268 %%%% \item
2269 %%%% The @arfNode@ function processes nodes (\secref{arf-node}).
2270 %%%% It handles the subtleties of interleaved analysis and rewriting,
2271 %%%% and it deals with fuel consumption. It calls @arfGraph@ to analyze
2272 %%%% and transform rewritten graphs.
2273 %%%% \item
2274 %%%% Based on @arfNode@ it is extremely easy to write @arfBlock@, which lifts
2275 %%%% the analysis and rewriting from nodes to blocks (\secref{arf-block}).
2276 %%%%
2277 %%%% %%% \ifpagetuning\penalty -10000 \fi
2278 %%%%
2279 %%%%
2280 %%%% \item
2281 %%%% Using @arfBlock@ we define @arfBody@, which analyzes and rewrites a
2282 %%%% @Body@: a~group of closed/closed blocks linked by arbitrary
2283 %%%% control flow.
2284 %%%% Because a @Body@ is
2285 %%%% always closed/closed and does not take shape parameters, function
2286 %%%% @arfBody@ is less polymorphic than the others; its type is what
2287 %%%% would be obtained by expanding and specializing the definition of
2288 %%%% @ARF@ for a @thing@ which is always closed/closed and is equivalent to
2289 %%%% a @Body@.
2290 %%%%
2291 %%%% Function @arfBody@ takes care of fixed points (\secref{arf-body}).
2292 %%%% \item
2293 %%%% Based on @arfBody@ it is easy to write @arfGraph@ (\secref{arf-graph}).
2294 %%%% \end{itemize}
2295 %%%% Given these functions, writing the main analyzer is a simple
2296 %%%% matter of matching the external API to the internal functions:
2297 %%%% \begin{code}
2298 %%%% `analyzeAndRewriteFwdBody
2299 %%%% :: forall n f. NonLocal n
2300 %%%% => FwdPass n f -> Body n -> FactBase f
2301 %%%% -> FuelMonad (Body n, FactBase f)
2302 %%%%
2303 %%%% analyzeAndRewriteFwdBody pass ^body facts
2304 %%%% = do { (^rg, _) <- arfBody pass body facts
2305 %%%% ; return (normalizeBody rg) }
2306 %%%% \end{code}
2307 %%%%
2308 %%%% \subsection{From nodes to blocks} \seclabel{arf-block}
2309 %%%% \seclabel{arf-graph}
2310 %%%%
2311 %%%% We begin our explanation with the second task:
2312 %%%% writing @arfBlock@, which analyzes and transforms blocks.
2313 %%%% \begin{code}
2314 %%%% `arfBlock :: NonLocal n => ARF (Block n) n
2315 %%%% arfBlock pass (BUnit node) f
2316 %%%% = arfNode pass node f
2317 %%%% arfBlock pass (BCat b1 b2) f
2318 %%%% = do { (g1,f1) <- arfBlock pass b1 f
2319 %%%% ; (g2,f2) <- arfBlock pass b2 f1
2320 %%%% ; return (g1 `DGCatO` g2, f2) }
2321 %%%% \end{code}
2322 %%%% The code is delightfully simple.
2323 %%%% The @BUnit@ case is implemented by @arfNode@.
2324 %%%% The @BCat@ case is implemented by recursively applying @arfBlock@ to the two
2325 %%%% sub-blocks, threading the output fact from the first as the
2326 %%%% input to the second.
2327 %%%% Each recursive call produces a rewritten graph;
2328 %%%% we concatenate them with @DGCatO@.
2329 %%%%
2330 %%%% Function @arfGraph@ is equally straightforward:
2331 %%%% XXXXXXXXXXXXXXX
2332 %%%% The pattern is the same as for @arfBlock@: thread
2333 %%%% facts through the sequence, and concatenate the results.
2334 %%%% Because the constructors of type~@DG@ are more polymorphic than those
2335 %%%% of @Graph@, type~@DG@ can represent
2336 %%%% graphs more simply than @Graph@; for example, each element of a
2337 %%%% @GMany@ becomes a single @DG@ object, and these @DG@ objects are then
2338 %%%% concatenated to form a single result of type~@DG@.
2339 %%%%
2340 %%%% %% \ifpagetuning\penalty -10000 \fi
2341
2342 \subsection{Analyzing and rewriting nodes} \seclabel{arf-node}
2343
2344 Although interleaving analysis with transformation is tricky, we have
2345 isolated the algorithm in the @node@ function:
2346 \begin{smallfuzzcode}{10.5pt}
2347 node :: forall e x . (ShapeLifter e x)
2348 => n e x -> f -> m (DG f n e x, Fact x f)
2349 node n f
2350 = do { rew <- frewrite pass n f >>= withFuel
2351 ; case rew of
2352 Nothing -> return (toDg f (toBlock n),
2353 ftransfer pass n f)
2354 Just (FwdRew g rw) ->
2355 let pass' = pass { fp_rewrite = rw }
2356 f' = fwdEntryFact n f
2357 in arfGraph pass' (maybeEntry n) g f' }
2358 \end{smallfuzzcode}
2359 This code is more complicated
2360 but still brief.
2361 %
2362 Functions @`frewrite@, @`ftransfer@, @`toBlock@,
2363 @`maybeEntry@, and @`fwdEntryFact@ are overloaded based on the
2364 node's shape, via the (private) @`ShapeLifter@ class.
2365 We begin by using @frewrite@ to
2366 extract the
2367 rewriting function from the @FwdPass@,
2368 and we apply it to the node~@n@
2369 and the incoming fact~@f@.
2370
2371 The resulting monadic @Maybe@ is passed to @withFuel@, which
2372 accounts for fuel:
2373 \begin{code}
2374 `withFuel :: FuelMonad m => Maybe a -> m (Maybe a)
2375 \end{code}
2376 If @withFuel@'s argument is @Nothing@, \emph{or} if we have run out of
2377 optimization fuel (\secref{fuel}), @withFuel@ returns @Nothing@.
2378 Otherwise, @withFuel@ consumes one unit of fuel and returns its
2379 % defn Fuel
2380 argument (which will be a @Just@). That is all we need say about fuel.
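As an illustration only, the accounting can be sketched with the monad
unwrapped into explicit fuel-passing (\hoopl's real @withFuel@ lives in
a fuel monad, but the logic is the same):

```haskell
type Fuel = Int

-- Nothing, or an exhausted supply, means no rewrite;
-- otherwise one unit of fuel pays for the Just.
withFuel' :: Maybe a -> Fuel -> (Maybe a, Fuel)
withFuel' Nothing  fuel = (Nothing, fuel)
withFuel' (Just _) 0    = (Nothing, 0)
withFuel' (Just a) fuel = (Just a, fuel - 1)
```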
2381
2382 In the @Nothing@ case, no rewrite takes place: fuel is
2383 exhausted, or the rewrite function
2384 didn't want a rewrite.
2385 We~return the single-node
2386 graph @(toDg f (toBlock n))@, decorated with its incoming fact.
2387 We~also apply the transfer function @(ftransfer pass)@
2388 to the incoming fact to produce the outgoing fact.
2389
2390 In the @Just@ case, we receive a replacement
2391 graph~@g@ and a new rewrite function~@rw@.
2392 We~recursively analyze @g@ with @arfGraph@.
2393 This analysis uses @pass'@, which contains the original lattice and transfer
2394 function from @pass@, together with the new rewrite function~@rw@.
2395 Function @fwdEntryFact@ converts fact~@f@ from the type~@f@,
2396 which @node@ expects, to the type @Fact e f@, which @arfGraph@ expects.
2397
2398 And that's it! If~the client wanted deep rewriting, it is
2399 implemented by the call to @arfGraph@;
2400 if the client wanted
2401 shallow rewriting, the rewrite function will have returned
2402 @noFwdRw@ as~@rw@, which is implanted in @pass'@
2403 (\secref{shallow-vs-deep}).
2404
2405 \subsection{Fixed points} \seclabel{arf-body}
2406
2407 Lastly, the inner @body@ function iterates to a fixed point.
2408 This part of the implementation is the only really tricky part, and it is
2409 cleanly separated from everything else:
2410 \begin{smallcode}
2411 `body :: [Label] -> LabelMap (Block n C C)
2412 -> Fact C f -> m (DG f n C C, Fact C f)
2413 body entries blocks init_fbase
2414 = fixpoint True lattice do_block init_fbase $
2415 forwardBlockList entries blocks
2416 where
2417 lattice = fp_lattice pass
2418 do_block b fb = do (g, fb) <- block b entryFact
2419 return (g, mapToList fb)
2420 where entryFact = getFact lattice (entryLabel b) fb
2421 \end{smallcode}
2422 Function @forwardBlockList@ takes a list of possible entry points and
2423 a finite map from labels to blocks.
2424 It returns a list of
2425 blocks, sorted into an order that makes forward dataflow efficient:
2426 \begin{smallcode}
2427 `forwardBlockList
2428 :: NonLocal n
2429 => [Label] -> LabelMap (Block n C C) -> [Block n C C]
2430 \end{smallcode}
2431 For
2432 example, if the entry point is at~@L2@, and the block at~@L2@
2433 branches to~@L1@, but not vice versa, then \hoopl\ will reach a fixed point
2434 more quickly if we process @L2@ before~@L1@.
2435 To~find an efficient order, @forwardBlockList@ uses
2436 the methods of the @NonLocal@ class---@entryLabel@ and @successors@---to
2437 perform a reverse postorder depth-first traversal of the control-flow graph.
2438 %%
2439 %%The @NonLocal@ type-class constraint on~@n@ propagates to all the
2440 %%@`arfThing@ functions.
2441 %% paragraph carrying too much freight
2442 %%
2443 (The order of the blocks does not affect the fixed point or any other
2444 part of the answer; it affects only the number of iterations needed to
2445 reach the fixed point.)
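The traversal itself is standard depth-first search. The following
self-contained sketch computes a reverse postorder over labels, with a
successor map standing in for the @entryLabel@ and @successors@ methods
of @NonLocal@:

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

type Label = Int

-- Reverse postorder from the entry labels: a label is emitted
-- when its depth-first traversal finishes, most recently
-- finished first, so each label precedes its unvisited successors.
revPostorder :: M.Map Label [Label] -> [Label] -> [Label]
revPostorder succs entries = snd (foldl visit (S.empty, []) entries)
  where
    visit (seen, acc) l
      | l `S.member` seen = (seen, acc)
      | otherwise =
          let succs_l       = M.findWithDefault [] l succs
              (seen', acc') = foldl visit (S.insert l seen, acc) succs_l
          in (seen', l : acc')
```

On a diamond such as @1 -> {2,3}@, @2 -> 4@, @3 -> 4@, the result places
label~1 first and label~4 last, so facts flow through each block before
its successors are analyzed.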
2446
2447 The rest of the work is done by @fixpoint@, which is shared by
2448 both forward and backward analyses:
2449 \begin{smallcode}
2450 `fixpoint :: forall m n f. (FuelMonad m, NonLocal n)
2451 => Bool -- going Forward?
2452 -> DataflowLattice f
2453 -> (Block n C C -> FactBase f -> m (DG f n C C, FactBase f))
2454 -> [Block n C C]
2455 -> (FactBase f -> m (DG f n C C, FactBase f))
2456 \end{smallcode}
2457 Except for the mysterious @Bool@ passed as the first argument,
2458 the type signature tells the story.
2459 The third argument is an extended fact transformer for a single block;
2460 @fixpoint@ applies that function successively to all the blocks,
2461 which are passed as the fourth argument.
2462 The result is an extended fact transformer for the blocks.
2463 This extended fact transformer maintains a
2464 ``Current @FactBase@''
2465 which grows monotonically:
2466 its initial value is the input @FactBase@, and it is
2467 augmented with the new facts that flow
2468 out of each @Block@ as it is analyzed.
2469 The @fixpoint@ function
2470 iterates over the list of blocks until the Current @FactBase@ reaches
2471 a fixed point.
2472
2473 The code for @fixpoint@ is a massive 70 lines long;
2474 for completeness, it
2475 appears in Appendix~\ref{app:fixpoint}.
2476 The~code is mostly straightforward, although we try to be clever
2477 about deciding when a new fact means that another iteration over the
2478 blocks will be required.
2479 \remark{Rest of this \S\ is a candidate to be cut}
2480 There is one more subtle point worth mentioning, which we highlight by
2481 considering a forward analysis of this graph, where execution starts at~@L1@:
2482 \begin{code}
2483 L1: x:=3; goto L4
2484 L2: x:=4; goto L4
2485 L4: if x>3 goto L2 else goto L5
2486 \end{code}
2487 Block @L2@ is unreachable.
2488 But if we \naively\ process all the blocks (say in
2489 order @L1@, @L4@, @L2@), then we will start with the bottom fact for @L2@, propagate
2490 \{@x=4@\} to @L4@, where it will join with \{@x=3@\} to yield
2491 \{@x=@$\top$\}.
2492 Given @x=@$\top$, the
2493 conditional in @L4@ cannot be rewritten, and @L2@~seems reachable. We have
2494 lost a good optimization.
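The loss is easy to reproduce with a toy constant-propagation lattice
for a single variable; here @join@ stands in for the lattice's join
operation (a sketch, not client code from the paper):

```haskell
-- Facts about one variable: unknown (bottom), a known
-- constant, or conflicting information (top).
data Const = Bot | Const Int | Top deriving (Eq, Show)

join :: Const -> Const -> Const
join Bot x = x
join x   Bot = x
join (Const a) (Const b) | a == b = Const a
join _ _ = Top
```

Processing the unreachable block lets \{@x=4@\} flow to @L4@, where
@join (Const 3) (Const 4)@ is @Top@; skipping it leaves \{@x=3@\}
intact, so the conditional can be folded.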
2495
2496 Our implementation solves this problem through a trick that is
2497 safe only for a forward analysis;
2498 @fixpoint@ analyzes a block only if the block is
2499 reachable from an entry point.
2500 This trick is not safe for a backward analysis, which
2501 is why
2502 @fixpoint@ takes a @Bool@ as its first argument:
2503 it must know if the analysis goes forward.
2504
2505 Although the trick can be implemented in just a couple of lines of
2506 code, the reasoning behind it is quite subtle---exactly the sort of
2507 thing that should be implemented once in \hoopl, so clients don't have
2508 to worry about it.
2509
2510 \section {Related work} \seclabel{related}
2511
2512 While there is a vast body of literature on
2513 dataflow analysis and optimization,
2514 relatively little can be found on
2515 the \emph{design} of optimizers, which is the topic of this paper.
2516 We therefore focus on the foundations of dataflow analysis
2517 and on the implementations of some comparable dataflow frameworks.
2518
2519 \paragraph{Foundations}
2520
2521 When transfer functions are monotone and lattices are finite in height,
2522 iterative dataflow analysis converges to a fixed point
2523 \cite{kam-ullman:global-iterative-analysis}.
2524 If~the lattice's join operation distributes over transfer
2525 functions,
2526 this fixed point is equivalent to a join-over-all-paths solution to
2527 the recursive dataflow equations
2528 \cite{kildall:unified-optimization}.\footnote
2529 {Kildall uses meets, not joins.
2530 Lattice orientation is a matter of convention, and conventions have changed.
2531 We use Dana Scott's
2532 orientation, in which higher elements carry more information.}
2533 \citet{kam-ullman:monotone-flow-analysis} generalize to some
2534 monotone functions.
2535 Each~client of \hoopl\ must guarantee monotonicity.
2536
2537 \ifcutting
2538 \citet{cousot:abstract-interpretation:1977}
2539 \else
2540 \citet{cousot:abstract-interpretation:1977,cousot:systematic-analysis-frameworks}
2541 \fi
2542 introduce abstract interpretation as a technique for developing
2543 lattices for program analysis.
2544 \citet{schmidt:data-flow-analysis-model-checking} shows that
2545 an all-paths dataflow problem can be viewed as model checking an
2546 abstract interpretation.
2547
2548 \citet{muchnick:compiler-implementation}
2549 presents many examples of both particular analyses and related
2550 algorithms.
2551
2552
2553 The soundness of interleaving analysis and transformation,
2554 even when not all speculative transformations are performed on later
2555 iterations, was shown by
2556 \citet{lerner-grove-chambers:2002}.
2557
2558
2559 \paragraph{Frameworks}
2560 Most dataflow frameworks support only analysis, not transformation.
2561 The framework computes a fixed point of transfer functions, and it is
2562 up to the client of
2563 the framework to use that fixed point for transformation.
2564 Omitting transformation makes it much easier to build frameworks,
2565 and one can find a spectrum of designs.
2566 We~describe two representative
2567 designs, then move on to the prior frameworks that support interleaved
2568 analysis and transformation.
2569
2570 The CIL toolkit \cite{necula:cil:2002}\finalremark{No good citation
2571 for same reason as Soot below ---JD}
2572 provides an analysis-only framework for C~programs.
2573 The framework is limited to one representation of control-flow graphs
2574 and one representation of instructions, both of which are provided by
2575 the framework.
2576 The~API is complicated;
2577 much of the complexity is needed to enable the client to
2578 affect which instructions
2579 the analysis iterates over.
2580
2581
2582 The Soot framework is designed for analysis of Java programs
2583 \cite{hendren:soot:2000}.\finalremark{This citation is probably the
2584 best for Soot in general, but there doesn't appear
2585 to be any formal publication that actually details the dataflow
2586 framework part. ---JD}
2587 While Soot's dataflow library supports only analysis, not
2588 transformation, we found much
2589 to admire in its design.
2590 Soot's library is abstracted over the representation of
2591 the control-flow graph and the representation of instructions.
2592 Soot's interface for defining lattice and analysis functions is
2593 like our own,
2594 although because Soot is implemented in an imperative style,
2595 additional functions are needed to copy lattice elements.
2596 Like CIL, Soot provides only analysis, not transformation.
2597
2598
2599 \finalremark{FYI, LLVM has Pass Managers that try to control the order of passes,
2600 but I'll be darned if I can find anything that might be termed a dataflow framework.}
2601
2602 The Whirlwind compiler contains the dataflow framework implemented
2603 by \citet{lerner-grove-chambers:2002}, who were the first to
2604 interleave analysis and transformation.
2605 Their implementation is much like our early efforts:
2606 it is a complicated mix of code that simultaneously manages interleaving,
2607 deep rewriting, and fixed-point computation.
2608 By~separating these tasks,
2609 our implementation simplifies the problem dramatically.
2610 Whirlwind's implementation also suffers from the difficulty of
2611 maintaining pointer invariants in a mutable representation of
2612 control-flow graphs, a problem we have discussed elsewhere
2613 \cite{ramsey-dias:applicative-flow-graph}.
2614
2615 Because speculative transformation is difficult in an imperative setting,
2616 Whirlwind's implementation is split into two phases.
2617 The first phase runs the interleaved analyses and transformations
2618 to compute the final dataflow facts and a representation of the transformations
2619 that should be applied to the input graph.
2620 The second phase executes the transformations.
2621 In~\hoopl, because control-flow graphs are immutable, speculative transformations
2622 can be applied immediately, and there is no need
2623 for a phase distinction.
2624
2625 %%% % repetitious...
2626 %%%
2627 %%% \ourlib\ also improves upon Whirlwind's dataflow framework by providing
2628 %%% new support for the optimization writer:
2629 %%% \begin{itemize}
2630 %%% \item Using static type guarantees, \hoopl\ rules out a whole
2631 %%% class of possible bugs: transformations that produced malformed
2632 %%% control-flow graphs.
2633 %%% \item Using dynamic testing,
2634 %%% we can isolate the rewrite that transforms a working program
2635 %%% into a faulty program,
2636 %%% using Whalley's \citeyearpar{whalley:isolation} fault-isolation technique.
2637 %%% \end{itemize}
2638
2639 In previous work \cite{ramsey-dias:applicative-flow-graph}, we
2640 described a zipper-based representation of control-flow
2641 graphs, stressing the advantages
2642 of immutability.
2643 Our new representation, described in \secref{graph-rep}, is a significant improvement:
2644 \begin{itemize}
2645 \item
2646 We can concatenate nodes, blocks, and graphs in constant time.
2647 %Previously, we had to resort to Hughes's
2648 %\citeyearpar{hughes:lists-representation:article} technique, representing
2649 %a graph as a function.
2650 \item
2651 We can do a backward analysis without having
2652 to ``unzip'' (and allocate a copy of) each block.
2653 \item
2654 Using GADTs, we can represent a flow-graph
2655 node using a single type, instead of the triple of first, middle, and
2656 last types used in our earlier representation.
2657 This change simplifies the interface significantly:
2658 instead of providing three transfer functions and three rewrite
2659 functions per pass---one for
2660 each type of node---a client of \hoopl\ provides only one transfer
2661 function and one rewrite function per pass.
2662 \item
2663 Errors in concatenation are ruled out at
2664 compile time by Haskell's static
2665 type system.
2666 In~earlier implementations, such errors were not detected until
2667 the compiler~ran, at which point we tried to compensate
2668 for the errors---but
2669 the compensation code harbored subtle faults,
2670 which we discovered while developing a new back end
2671 for the Glasgow Haskell Compiler.
2672 \end{itemize}
2673
2674 The implementation of \ourlib\ is also much better than
2675 our earlier implementations.
2676 Not only is the code simpler conceptually,
2677 but it is also shorter:
2678 our new implementation is about a third as long
2679 as the previous version, which is part of GHC, version 6.12.
2680
2681
2682
2683
2684 \section{What we learned}
2685
2686 (We have also implemented a ``fold style,'' but because the types are
2687 a little less intuitive, we are sticking with @concatMap@ style for now.)
2688
2689
2704
2705
2706
2707 We have spent six years implementing and reimplementing frameworks for
2708 dataflow analysis and transformation.
2709 This formidable design problem taught us
2710 two kinds of lessons:
2711 we learned some very specific lessons about representations and
2712 algorithms for optimizing compilers,
2713 and we were forcibly reminded of some very general, old lessons that are well
2714 known not just to functional programmers, but to programmers
2715 everywhere.
2716
2717
2718
2719 %% \remark{Orphaned: but for transfer functions that
2720 %% approximate weakest preconditions or strongest postconditions,
2721 %% monotonicity falls out naturally.}
2722 %%
2723 %%
2724 %% In conclusion we offer the following lessons from the experience of designing
2725 %% and implementing \ourlib{}.
2726 %% \begin{itemize}
2727 %% \item
2728 %% Although we have not stressed this point, there is a close connection
2729 %% between dataflow analyses and program logic:
2730 %% \begin{itemize}
2731 %% \item
2732 %% A forward dataflow analysis is represented by a predicate transformer
2733 %% that is related to \emph{strongest postconditions}
2734 %% \cite{floyd:meaning}.\footnote
2735 %% {In Floyd's paper the output of the predicate transformer is called
2736 %% the \emph{strongest verifiable consequent}, not the ``strongest
2737 %% postcondition.''}
2738 %% \item
2739 %% A backward dataflow analysis is represented by a predicate transformer
2740 %% that is related to \emph{weakest preconditions} \cite{dijkstra:discipline}.
2741 %% \end{itemize}
2742 %% Logicians write down the predicate transformers for the primitive
2743 %% program fragments, and then use compositional rules to ``lift'' them
2744 %% to a logic for whole programs. In the same way \ourlib{} lets the client
2745 %% write simple predicate transformers,
2746 %% and local rewrites based on those assertions, and ``lifts'' them to entire
2747 %% function bodies with arbitrary control flow.
2748
2749 Our main goal for \hoopl\ was to combine three good ideas (interleaved
2750 analysis and transformation, optimization fuel, and an applicative
2751 control-flow graph) in a way that could easily be reused by many, many
2752 compiler writers.
2753 Reuse requires abstraction, and as is well known,
2754 designing good abstractions is challenging.
2755 \hoopl's data types and the functions over those types have been
2756 through \emph{dozens} of revisions.
2757 As~we were refining our design, we~found it invaluable to operate in
2758 two modes:
2759 In the first mode, we designed, built, and used a framework as an
2760 important component of a real compiler (first Quick~{\PAL}, then GHC).
2761 In the second mode, we designed and built a standalone library, then
2762 redesigned and rebuilt it, sometimes going through several significant
2763 changes in a week.
2764 Operating in the first mode---inside a live compiler---forced us to
2765 make sure that no corners were cut, that we were solving a real
2766 problem, and that we did not inadvertently
2767 cripple some other part of the compiler.
2768 Operating in the second mode---as a standalone library---enabled us to
2769 iterate furiously, trying out many more ideas than would have
2770 been possible in the first mode.
2771 We~have learned that alternating between these two modes leads to a
2772 better design than operating in either mode alone.
2773
2774 We were forcibly reminded of timeless truths:
2775 that interfaces are more important than implementations, and that data
2776 is more important than code.
2777 These truths are reflected in this paper, in which
2778 we
2779 have given \hoopl's API three times as much space as \hoopl's implementation.
2780
2781 We were also reminded that Haskell's type system (polymorphism, GADTs,
2782 higher-order functions, type classes, and so on) is a remarkably
2783 effective
2784 language for thinking about data and code---and that
2785 Haskell lacks a language of interfaces (like ML's signatures) that
2786 would make it equally effective for thinking about APIs at a larger scale.
2787 Still, as usual, the types were a remarkable aid to writing the code:
2788 when we finally agreed on the types presented above, the
2789 code almost wrote itself.
2790
2791 Types are widely appreciated at ICFP, but here are three specific
2792 examples of how types helped us:
2793 \begin{itemize}
2794 \item
2795 Reuse is enabled by representation-independence, which in a functional
2796 language is
2797 expressed through parametric polymorphism.
2798 Making \ourlib{} polymorphic in the nodes
2799 made the code simpler, easier to understand, and easier to maintain.
2800 In particular, it forced us to make explicit \emph{exactly} what
2801 \ourlib\ must know about nodes, and to embody that knowledge in the
2802 @NonLocal@ type class (\secref{edges}).
2803 \item
2804 We are proud of using GADTs to
2805 track the open and closed shapes of nodes, blocks, and graphs at
2806 compile time.
2807 Shapes may
2808 seem like a small refinement, but they helped tremendously when
2809 building \hoopl, and we expect them to help clients.
2810 %
2811 % this paper is just not about run-time performance ---NR
2812 %
2813 %%%% Moreover, the implementation is faster than it would otherwise be,
2814 %%%% because, say, a @(Fact O f)e@ is known to be just an @f@ rather than
2815 %%%% being a sum type that must be tested (with a statically known outcome!).
2816 %
2817 Giving the \emph{same} shapes
2818 to nodes, blocks, and graphs helped our
2819 thinking and helped to structure the implementation.
2820 \item
2821 In our earlier designs, graphs were parameterized over \emph{three} node
2822 types: first, middle, and last nodes.
2823 Those designs therefore required
2824 three transfer functions, three rewrite functions, and so~on.
2825 Moving to a single, ``shapely'' node type was a major breakthrough:
2826 not only do we have just one node type, but our client need supply only
2827 one transfer function and one rewrite function.
2828 To~make this design work, however, we \emph{must} have
2829 the type-level function for
2830 @Fact@ (\figref{api-types}), to express how incoming
2831 and outgoing facts depend on the shape of a node.
2832 \end{itemize}
2833
2834 Dataflow optimization is usually described as a way to improve imperative
2835 programs by mutating control-flow graphs.
2836 Such transformations appear very different from the tree rewriting
2837 that functional languages are so well known for, and that makes
2838 functional languages so attractive for writing other parts of compilers.
2839 But even though dataflow optimization looks very different from
2840 what we are used to,
2841 writing a dataflow optimizer
2842 in a pure functional language was a huge win.
2843 % We could not possibly have conceived \ourlib{} in C++.
2844 In~a pure functional language, not only do we know that
2845 no data structure will be unexpectedly mutated,
2846 but we are forced to be
2847 explicit about every input and output,
2848 and we are encouraged to implement things compositionally.
2849 This kind of thinking has helped us make
2850 significant improvements to the already tricky work of Lerner, Grove,
2851 and Chambers:
2852 per-function control of shallow vs deep rewriting
2853 (\secref{shallow-vs-deep}),
2854 combinators for dataflow passes (\secref{combinators}),
2855 optimization fuel (\secref{fuel}),
2856 and transparent management of unreachable blocks (\secref{arf-body}).
2857 We~trust that these improvements are right only because they are
2858 implemented in separate
2859 parts of the code that cannot interact except through
2860 explicit function calls.
2861 %%
2862 %%An ancestor of \ourlib{} is in the Glasgow Haskell Compiler today,
2863 %%in version~6.12.
2864 With this new, improved design in hand, we are now moving back to
2865 live-compiler mode, pushing \hoopl\ into version
2866 6.13 of the Glasgow Haskell Compiler.
2867
2868
2869 \acks
2870
2871 The first and second authors were funded
2872 by a grant from Intel Corporation and
2873 by NSF awards CCF-0838899 and CCF-0311482.
2874 These authors also thank Microsoft Research Ltd, UK, for funding
2875 extended visits to the third author.
2876
2877
2878 \makeatother
2879
2880 \providecommand\includeftpref{\relax} %% total bafflement -- workaround
2881 \IfFileExists{nrbib.tex}{\bibliography{cs,ramsey}}{\bibliography{dfopt}}
2882 \bibliographystyle{plainnatx}
2883
2884
2885 \clearpage
2886
2887 \appendix
2888
2889 % omit LabelSet :: *
2890 % omit LabelMap :: *
2891 % omit delFromFactBase :: FactBase f -> [(Label,f)] -> FactBase f
2892 % omit elemFactBase :: Label -> FactBase f -> Bool
2893 % omit elemLabelSet :: Label -> LabelSet -> Bool
2894 % omit emptyLabelSet :: LabelSet
2895 % omit factBaseLabels :: FactBase f -> [Label]
2896 % omit extendFactBase :: FactBase f -> Label -> f -> FactBase f
2897 % omit extendLabelSet :: LabelSet -> Label -> LabelSet
2898 % omit getFuel :: FuelMonad Fuel
2899 % omit setFuel :: Fuel -> FuelMonad ()
2900 % omit lookupFact :: FactBase f -> Label -> Maybe f
2901 % omit factBaseList :: FactBase f -> [(Label, f)]
2902
2903 \section{Code for \textmd{\texttt{fixpoint}}}
2904 \label{app:fixpoint}
2905
2906 {\def\baselinestretch{0.95}\hfuzz=20pt
2907 \begin{smallcode}
2908 data `TxFactBase n f
2909 = `TxFB { `tfb_fbase :: FactBase f
2910 , `tfb_rg :: DG n f C C -- Transformed blocks
2911 , `tfb_cha :: ChangeFlag
2912 , `tfb_lbls :: LabelSet }
2913 -- Set the tfb_cha flag iff
2914 -- (a) the fact in tfb_fbase for a block L changes
2915 -- (b) L is in tfb_lbls.
2916 -- The tfb_lbls are all Labels of the *original*
2917 -- (not transformed) blocks

`updateFact :: DataflowLattice f -> LabelSet -> (Label, f)
            -> (ChangeFlag, FactBase f)
            -> (ChangeFlag, FactBase f)
updateFact ^lat ^lbls (lbl, ^new_fact) (^cha, fbase)
  | NoChange <- ^cha2       = (cha, fbase)
  | lbl `elemLabelSet` lbls = (SomeChange, new_fbase)
  | otherwise               = (cha, new_fbase)
  where
    (cha2, ^res_fact)
       = case lookupFact fbase lbl of
           Nothing        -> (SomeChange, new_fact)
           Just ^old_fact -> fact_extend lat old_fact new_fact
    ^new_fbase = extendFactBase fbase lbl res_fact

fixpoint :: forall n f. NonLocal n
         => Bool -- Going forwards?
         -> DataflowLattice f
         -> (Block n C C -> FactBase f
             -> FuelMonad (DG n f C C, FactBase f))
         -> FactBase f -> [(Label, Block n C C)]
         -> FuelMonad (DG n f C C, FactBase f)
fixpoint ^is_fwd lat ^do_block ^init_fbase ^blocks
  = do { ^fuel <- getFuel
       ; ^tx_fb <- loop fuel init_fbase
       ; return (tfb_rg tx_fb,
                 tfb_fbase tx_fb `delFromFactBase` blocks) }
         -- The outgoing FactBase contains facts only for
         -- Labels *not* in the blocks of the graph
  where
    `tx_blocks :: [(Label, Block n C C)]
               -> TxFactBase n f -> FuelMonad (TxFactBase n f)
    tx_blocks []             tx_fb = return tx_fb
    tx_blocks ((lbl,blk):bs) tx_fb = tx_block lbl blk tx_fb
                                     >>= tx_blocks bs

    `tx_block :: Label -> Block n C C
              -> TxFactBase n f -> FuelMonad (TxFactBase n f)
    tx_block ^lbl ^blk tx_fb@(TxFB { tfb_fbase = fbase
                                   , tfb_lbls  = lbls
                                   , tfb_rg    = ^blks
                                   , tfb_cha   = cha })
      | is_fwd && not (lbl `elemFactBase` fbase)
      = return tx_fb -- Note [Unreachable blocks]
      | otherwise
      = do { (rg, ^out_facts) <- do_block blk fbase
           ; let (^cha', ^fbase')
                   = foldr (updateFact lat lbls) (cha,fbase)
                           (factBaseList out_facts)
           ; return (TxFB { tfb_lbls  = extendLabelSet lbls lbl
                          , tfb_rg    = rg `DGCatC` blks
                          , tfb_fbase = fbase'
                          , tfb_cha   = cha' }) }

    loop :: Fuel -> FactBase f -> FuelMonad (TxFactBase n f)
    `loop fuel fbase
      = do { let ^init_tx_fb = TxFB { tfb_fbase = fbase
                                    , tfb_cha   = NoChange
                                    , tfb_rg    = DGNil
                                    , tfb_lbls  = emptyLabelSet }
           ; tx_fb <- tx_blocks blocks init_tx_fb
           ; case tfb_cha tx_fb of
               NoChange   -> return tx_fb
               SomeChange -> setFuel fuel >>
                             loop fuel (tfb_fbase tx_fb) }
\end{smallcode}
\par
} % end \baselinestretch


\section{Index of defined identifiers}

This appendix lists every nontrivial identifier used in the body of
the paper.
For each identifier, we list the page on which that identifier is
defined or discussed---or, when appropriate, the figure (with line
number where possible).
For the few identifiers not defined or discussed in the text, we give
the type signature and the page on which the identifier is first
referred to.

Some identifiers used in the text are defined in the Haskell Prelude;
for readers less familiar with Haskell, these identifiers are
listed in Appendix~\ref{sec:prelude}.

\newcommand\dropit[3][]{}

\newcommand\hsprelude[2]{\noindent
  \texttt{#1} defined in the Haskell Prelude\\}
\let\hsprelude\dropit

\newcommand\hspagedef[3][]{\noindent
  \texttt{#2} defined on page~\pageref{#3}.\\}
\newcommand\omithspagedef[3][]{\noindent
  \texttt{#2} not shown (but see page~\pageref{#3}).\\}
\newcommand\omithsfigdef[3][]{\noindent
  \texttt{#2} not shown (but see Figure~\ref{#3} on page~\pageref{#3}).\\}
\newcommand\hsfigdef[3][]{%
  \noindent
  \ifx!#1!%
    \texttt{#2} defined in Figure~\ref{#3} on page~\pageref{#3}.\\
  \else
    \texttt{#2} defined on \lineref{#1} of Figure~\ref{#3} on page~\pageref{#3}.\\
  \fi
}
\newcommand\hstabdef[3][]{%
  \noindent
  \ifx!#1!%
    \texttt{#2} defined in Table~\ref{#3} on page~\pageref{#3}.\\
  \else
    \texttt{#2} defined on \lineref{#1} of Table~\ref{#3} on page~\pageref{#3}.\\
  \fi
}
\newcommand\hspagedefll[3][]{\noindent
  \texttt{#2} {let}- or $\lambda$-bound on page~\pageref{#3}.\\}
\newcommand\hsfigdefll[3][]{%
  \noindent
  \ifx!#1!%
    \texttt{#2} {let}- or $\lambda$-bound in Figure~\ref{#3} on page~\pageref{#3}.\\
  \else
    \texttt{#2} {let}- or $\lambda$-bound on \lineref{#1} of Figure~\ref{#3} on page~\pageref{#3}.\\
  \fi
}

\newcommand\nothspagedef[3][]{\notdefd\ndpage{#1}{#2}{#3}}
\newcommand\nothsfigdef[3][]{\notdefd\ndfig{#1}{#2}{#3}}
\newcommand\nothslinedef[3][]{\notdefd\ndline{#1}{#2}{#3}}

\newcommand\ndpage[3]{\texttt{#2}~(p\pageref{#3})}
\newcommand\ndfig[3]{\texttt{#2}~(Fig~\ref{#3},~p\pageref{#3})}
\newcommand\ndline[3]{%
  \ifx!#1!%
    \ndfig{#1}{#2}{#3}%
  \else
    \texttt{#2}~(Fig~\ref{#3}, line~\lineref{#1}, p\pageref{#3})%
  \fi
}

\newif\ifundefinedsection\undefinedsectionfalse

\newcommand\notdefd[4]{%
  \ifundefinedsection
    , #1{#2}{#3}{#4}%
  \else
    \undefinedsectiontrue
    \par
    \section{Undefined identifiers}
    #1{#2}{#3}{#4}%
  \fi
}

\begingroup
\raggedright

\input{defuse}%
\ifundefinedsection.\fi

\undefinedsectionfalse


\renewcommand\hsprelude[2]{\noindent
  \ifundefinedsection
    , \texttt{#1}%
  \else
    \undefinedsectiontrue
    \par
    \section{Identifiers defined in Haskell Prelude or a standard library}\label{sec:prelude}
    \texttt{#1}%
  \fi
}
\let\hspagedef\dropit
\let\omithspagedef\dropit
\let\omithsfigdef\dropit
\let\hsfigdef\dropit
\let\hstabdef\dropit
\let\hspagedefll\dropit
\let\hsfigdefll\dropit
\let\nothspagedef\dropit
\let\nothsfigdef\dropit
\let\nothslinedef\dropit

\input{defuse}
\ifundefinedsection.\fi



\endgroup


\iffalse

\section{Dataflow-engine functions}


\begin{figure*}
\setcounter{codeline}{0}
\begin{numberedcode}
\end{numberedcode}
\caption{The forward iterator}
\end{figure*}

\begin{figure*}
\setcounter{codeline}{0}
\begin{numberedcode}
\end{numberedcode}
\caption{The forward actualizer}
\end{figure*}


\fi



\end{document}



THE FUEL PROBLEM:


Here is the problem:

A graph has an entry sequence, a body, and an exit sequence.
Correctly computing facts on and flowing out of the body requires
iteration; computation on the entry and exit sequences does not, since
each is connected to the body by exactly one flow edge.

The problem is to provide the correct fuel supply to the combined
analysis/rewrite (iterator) functions, so that speculative rewriting
is limited by the fuel supply.

I will number iterations from 1 and name the fuel supplies as
follows:

  f_pre     fuel remaining before analysis/rewriting starts
  f_0       fuel remaining after analysis/rewriting of the entry sequence
  f_i, i>0  fuel remaining after iteration i of the body
  f_post    fuel remaining after analysis/rewriting of the exit sequence

The issue here is that only the last iteration of the body 'counts'.
To formalize, I will name fuel consumed:

  C_pre   fuel consumed by speculative rewrites in entry sequence
  C_i     fuel consumed by speculative rewrites in iteration i of body
  C_post  fuel consumed by speculative rewrites in exit sequence

These quantities should be related as follows:

  f_0    = f_pre - C_pre
  f_i    = f_0 - C_i      where i > 0
  f_post = f_n - C_post   where iteration converges after n steps

When the fuel supply is passed explicitly as parameter and result, it
is fairly easy to see how to keep reusing f_0 at every iteration, then
extract f_n for use before the exit sequence. It is not obvious to me
how to do it cleanly using the fuel monad.
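
The explicit-threading version is easy to sketch. The toy model below
uses hypothetical names (bodyPass, iterateBody, and an Int standing in
for a fact); it is not Hoopl's API, only an illustration of the
accounting above: every iteration restarts from f_0, and only the fuel
left by the converged pass (f_n) flows on to the exit sequence.

```haskell
-- Toy model of explicit fuel threading (hypothetical; not Hoopl's API).
type Fuel = Int
type Fact = Int

-- One analysis/rewrite pass over the body: given the incoming fact and
-- the fuel to start from, return the outgoing fact and fuel remaining.
-- Each pass performs one speculative rewrite (C_i = 1), and the fact
-- saturates at 3, so iteration converges.
bodyPass :: Fact -> Fuel -> (Fact, Fuel)
bodyPass fact fuel = (min (fact + 1) 3, fuel - 1)

-- Iterate the body to a fixed point, restarting from f_0 every time.
-- The fuel returned is f_n: whatever the last (converged) pass left.
iterateBody :: Fact -> Fuel -> (Fact, Fuel)
iterateBody fact f0 =
  let (fact', fn) = bodyPass fact f0
  in if fact' == fact then (fact', fn) else iterateBody fact' f0

fPre, cPre, f0, fN, cPost, fPost :: Fuel
fPre  = 10
cPre  = 2            -- fuel consumed by the entry sequence
f0    = fPre - cPre  -- f_0    = f_pre - C_pre
fN    = snd (iterateBody 0 f0)
cPost = 1            -- fuel consumed by the exit sequence
fPost = fN - cPost   -- f_post = f_n - C_post

main :: IO ()
main = print (f0, fN, fPost)  -- prints (8,7,6)
```

Note that even though convergence here takes four passes, each of which
burns one unit of fuel, f_n = f_0 - 1: the fuel spent by the abandoned
passes is not charged, which is exactly the behavior the fuel monad
makes awkward to express.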


Norman