\newif\ifhaskellworkshop\haskellworkshoptrue

\iffalse

Payload: alternatives for functions polymorphic in node type

(n C O -> a, n O O -> b, n O C -> c)

vs

(forall e x . n e x -> ShapeTag e x, forall e x . n e x -> result e x)

where

data ShapeTag e x where
  First  :: ShapeTag C O
  Middle :: ShapeTag O O
  Last   :: ShapeTag O C

result C O = a
result O O = b
result O C = c

The function returning ShapeTag can be passed in a type-class
dictionary.




Say something about the proliferation of heavyweight type signatures
required for GADT pattern matches. When the signature is three times
the size of the function, something is wrong...


I'm going to leave this point out, because in all the client code
we've written, we are (a) matching only on a node, and (b) matching on
all cases. In this scenario the GADT exhaustiveness checker provides
no additional benefits. Indeed, GADTs can be an irritant to the
client: in many pattern matches, GADTs make it impossible to default
cases.


I was thinking again about unwrapped nodes, cons-lists, snoc-lists,
and tree fringe. I think there's an analogy with ordinary lists, and
the analogy is 'concatMap' vs 'fold'. Just as with lists, the common
case of 'concatMap (\a -> [a])' does a lot of unnecessary wrapping and
unwrapping of elements. We nevertheless prefer 'concatMap' because it
provides better separation of concerns. But this analogy suggests
several ideas:

- Can block processing be written in higher-order fashion?

- Can it be done in both styles (concatMap *and* fold)?

- Once we have framed the problem in these terms, can we write
  fold-style code that is not too hard to understand and maintain?

- Finally, is there an analog of stream fusion that would work for
  control-flow graphs, enabling us to write some graph combinators
  that are both perspicuous and efficient?

These are good observations for the paper and for future work.


----------------------------------------------------------------


P.S. The three of us should have a nice little Skype chat about
higher-rank types. A lot of the questions still at issue boil down to
the following observations:

- It's really convenient for the *implementation* of Hoopl to put
  forall to the left of an arrow. Polymorphic functions as
  arguments are powerful and keen, and they make it really easy for
  Hoopl to do its job.

- It's a bit inconvenient for a *client* of Hoopl to be forced to
  supply functions that are polymorphic in shape. All constructors
  have to be written out by hand; defaulting is not possible. (Or
  at least John and I haven't figured out how to do it.)

- Using higher-rank types in Hoopl's interface makes some very
  desirable things downright impossible:

    - One can't write a generic dominator analysis with type

        dom :: NonLocal n => FwdPass n Dominators

    - One can't write a generic debugging wrapper of type

        debug :: (Show n, Show f)
              => TraceFn -> WhatToTrace -> FwdPass n f -> FwdPass n f

    - One can't write a combinator of type

        product :: FwdPass n f -> FwdPass n f' -> FwdPass n (f, f')

I submit that these things are all very desirable.

I'm therefore strongly in favor of removing the *requirement* that a
FwdPass include transfer and rewrite functions that are polymorphic in
shape. It will be a bit tedious to return to triples of functions,
but for those lucky clients who *can* write polymorphic functions and
who wish to, we can provide injection functions.

I should stress that I believe the tedium can be contained within
reasonable bounds---for example, the arfXXX functions that are
internal to Hoopl all remain polymorphic in shape (although not
higher-rank any longer).

\fi


%
% TODO
%
%
% AGraph = graph under construction
% Graph = replacement graph
%
%%% Hoopl assumes that the replacement graph for a node open on exit
%%% doesn't contain any additional exits
%%%
%%% introduce replacement graph in section 3, after graph.
%%% it has arbitrarily complicated internal control flow, but only
%%% one exit (if open on exit)
%%%
%%% a requirement on the client that is not checked statically.


\input{dfoptdu.tex}

\newif\ifpagetuning \pagetuningtrue % adjust page breaks

\newif\ifnoauthornotes \noauthornotesfalse

\newif\iftimestamp\timestamptrue % show MD5 stamp of paper

% \timestampfalse % it's post-submission time

\IfFileExists{timestamp.tex}{}{\timestampfalse}

\newif\ifcutting \cuttingfalse % cutting down to submission size


\newif\ifgenkill\genkillfalse % have a section on gen and kill
\genkillfalse


\newif\ifnotesinmargin \notesinmarginfalse
\IfFileExists{notesinmargin.tex}{\notesinmargintrue}{\relax}

\documentclass[blockstyle,natbib,preprint]{sigplanconf}

\newcommand\ourlib{Hoopl}
% higher-order optimization library
% ('Hoople' was taken -- see hoople.org)
\let\hoopl\ourlib

\newcommand\ag{\ensuremath{\mathit{ag}}}
\renewcommand\ag{\ensuremath{g}} % not really seeing that 'ag' is helpful here ---NR
\newcommand\rw{\ensuremath{\mathit{rw}}}

% l2h substitution ourlib Hoopl
% l2h substitution hoopl Hoopl

\newcommand\fs{\ensuremath{\mathit{fs}}} % dataflow facts, possibly plural

\newcommand\vfilbreak[1][\baselineskip]{%
  \vskip 0pt plus #1 \penalty -200 \vskip 0pt plus -#1 }

\usepackage{alltt}
\usepackage{array}
\usepackage{afterpage}
\newcommand\lbr{\char`\{}
\newcommand\rbr{\char`\}}

\clubpenalty=10000
\widowpenalty=10000

\usepackage{verbatim} % allows to define \begin{smallcode}
\newenvironment{smallcode}{\par\unskip\small\verbatim}{\endverbatim}
\newenvironment{fuzzcode}[1]{\par\unskip\hfuzz=#1 \verbatim}{\endverbatim}
\newenvironment{smallfuzzcode}[1]{\par\unskip\small\hfuzz=#1 \verbatim}{\endverbatim}
\newenvironment{smallttcode}{\par\unskip\small\alltt}{\endalltt}

\newcommand\smallverbatiminput[1]{{\small\verbatiminput{#1}}}

\newcommand\lineref[1]{line~\ref{line:#1}}
\newcommand\linepairref[2]{lines \ref{line:#1}~and~\ref{line:#2}}
\newcommand\linerangeref[2]{\mbox{lines~\ref{line:#1}--\ref{line:#2}}}
\newcommand\Lineref[1]{Line~\ref{line:#1}}
\newcommand\Linepairref[2]{Lines \ref{line:#1}~and~\ref{line:#2}}
\newcommand\Linerangeref[2]{\mbox{Lines~\ref{line:#1}--\ref{line:#2}}}

\makeatletter

\def \@copyrightspace {%
  \@float{copyrightbox}[b]%
  \vbox to 0.8in{% less than 1in, please
    \vfill
    \parbox[b]{20pc}{%
      \scriptsize
      \if \@preprint
        [Copyright notice will appear here
        once 'preprint' option is removed.]\par
      \else
        \@toappear
      \fi
      \if \@reprint
        \noindent Reprinted from \@conferencename,
        \@proceedings,
        \@conferenceinfo,
        pp.~\number\thepage--\pageref{sigplanconf@finalpage}.\par
      \fi}}%
  \end@float}


\let\c@table=\c@figure % one counter for tables and figures, please


\newcommand\setlabel[1]{%
  \setlabel@#1!!\@endsetlabel
}
\def\setlabel@#1!#2!#3\@endsetlabel{%
  \ifx*#1*% line begins with label or is empty
    \ifx*#2*% line is empty
      \verbatim@line{}%
    \else
      \@stripbangs#3\@endsetlabel%
      \label{line:#2}%
    \fi
  \else
    \@stripbangs#1!#2!#3\@endsetlabel%
  \fi
}
\def\@stripbangs#1!!\@endsetlabel{%
  \verbatim@line{#1}%
}


\verbatim@line{hello mama}

\newcommand{\numberedcodebackspace}{0.5\baselineskip}

\newcounter{codeline}
\newenvironment{numberedcode}
  {\endgraf
   \def\verbatim@processline{%
     \noindent
     \expandafter\ifx\expandafter+\the\verbatim@line+ % blank line
       %{\small\textit{\def\rmdefault{cmr}\rmfamily\phantom{00}\phantom{: \,}}}%
     \else
       \refstepcounter{codeline}%
       {\small\textit{\def\rmdefault{cmr}\rmfamily\phantom{00}\llap{\arabic{codeline}}: \,}}%
     \fi
     \expandafter\setlabel\expandafter{\the\verbatim@line}%
     \expandafter\ifx\expandafter+\the\verbatim@line+ % blank line
       \vspace*{-\numberedcodebackspace}\par%
     \else
       \the\verbatim@line\par
     \fi}%
   \verbatim
  }
  {\endverbatim}

\makeatother

\newcommand\arrow{\rightarrow}

\newcommand\join{\sqcup}
\newcommand\slotof[1]{\ensuremath{s_{#1}}}
\newcommand\tempof[1]{\ensuremath{t_{#1}}}
\let\tempOf=\tempof
\let\slotOf=\slotof

\makeatletter
\newcommand{\nrmono}[1]{%
  {\@tempdima = \fontdimen2\font\relax
   \texttt{\spaceskip = 1.1\@tempdima #1}}}
\makeatother

\usepackage{times} % denser fonts
\renewcommand{\ttdefault}{aett} % \texttt that goes better with times fonts
\usepackage{enumerate}
\usepackage{url}
\usepackage{graphicx}
\usepackage{natbib} % redundant for Simon
\bibpunct();A{},
\let\cite\citep
\let\citeyearnopar=\citeyear
\let\citeyear=\citeyearpar

\usepackage[ps2pdf,bookmarksopen,breaklinks,pdftitle=dataflow-made-simple]{hyperref}
\usepackage{breakurl} % enables \burl

\newcommand\naive{na\"\i ve}
\newcommand\naively{na\"\i vely}
\newcommand\Naive{Na\"\i ve}

\usepackage{amsfonts}
\newcommand\naturals{\ensuremath{\mathbb{N}}}
\newcommand\true{\ensuremath{\mathbf{true}}}
\newcommand\implies{\supseteq} % could use \Rightarrow?

\newcommand\PAL{\mbox{C{\texttt{-{}-}}}}
\newcommand\high[1]{\mbox{\fboxsep=1pt \smash{\fbox{\vrule height 6pt
  depth 0pt width 0pt \leavevmode \kern 1pt #1}}}}

\usepackage{tabularx}

%%
%% 2009/05/10: removed 'float' package because it breaks multiple
%% \caption's per {figure} environment. ---NR
%%
%% % Put figures in boxes --- WHY??? --NR
%% \usepackage{float}
%% \floatstyle{boxed}
%% \restylefloat{figure}
%% \restylefloat{table}



% ON LINE THREE, set \noauthornotestrue to suppress notes (or not)

%\newcommand{\qed}{QED}
\ifnotesinmargin
\long\def\authornote#1{%
  \ifvmode
    \marginpar{\raggedright\hbadness=10000
      \parindent=8pt \parskip=2pt
      \def\baselinestretch{0.8}\tiny
      \itshape\noindent #1\par}%
  \else
    \unskip\raisebox{-3.5pt}{\rlap{$\scriptstyle\diamond$}}%
    \marginpar{\raggedright\hbadness=10000
      \parindent=8pt \parskip=2pt
      \def\baselinestretch{0.8}\tiny
      \itshape\noindent #1\par}%
  \fi}
\else
% Simon: please set \notesinmarginfalse on the first line
\long\def\authornote#1{{\em #1\/}}
\fi
\ifnoauthornotes
\def\authornote#1{\unskip\relax}
\fi

\long\def\newtext#1{{\mbox{$\clubsuit$}}{\slshape\ignorespaces#1}{\mbox{$\clubsuit$}}}
\newenvironment{ntext}{\mbox{$\clubsuit$}\slshape\ignorespaces}{\mbox{$\clubsuit$}}

\newcommand{\simon}[1]{\authornote{SLPJ: #1}}
\newcommand{\norman}[1]{\authornote{NR: #1}}
\let\remark\norman
\def\finalremark#1{\relax}
\let \finalremark \remark % uncomment after submission
\newcommand{\john}[1]{\authornote{JD: #1}}
\newcommand{\todo}[1]{\textbf{To~do:} \emph{#1}}
\newcommand\delendum[1]{\relax\ifvmode\else\unskip\fi\relax}

\newcommand\secref[1]{Section~\ref{sec:#1}}
\newcommand\secreftwo[2]{Sections \ref{sec:#1}~and~\ref{sec:#2}}
\newcommand\seclabel[1]{\label{sec:#1}}

\newcommand\figref[1]{Figure~\ref{fig:#1}}
\newcommand\figreftwo[2]{Figures \ref{fig:#1}~and~\ref{fig:#2}}
\newcommand\figlabel[1]{\label{fig:#1}}

\newcommand\tabref[1]{Table~\ref{tab:#1}}
\newcommand\tablabel[1]{\label{tab:#1}}


\newcommand{\CPS}{\textbf{StkMan}} % Not sure what to call it.


\usepackage{code} % At-sign notation

\IfFileExists{timestamp.tex}{\input{timestamp}}{}
\iftimestamp
  \preprintfooter{\mdfivestamp}
\fi

\hyphenation{there-by}

\renewcommand{\floatpagefraction}{0.9} % must be less than \topfraction
\renewcommand{\topfraction}{0.95}
\renewcommand{\textfraction}{0.05}

\begin{document}
%\title{\ourlib: Dataflow Optimization Made Simple}
\title{\ourlib: A Modular, Reusable Library for\\ Dataflow Analysis and Transformation}
%\title{Implementing Dataflow Analysis and Optimization by Lifting Node Functions to Basic Blocks and Control-Flow Graphs}
%\subtitle{Programming pearl}

\titlebanner{\textsf{\mdseries\itshape
  Under consideration for publication in the ACM Haskell Symposium.
  Comments
  are welcome; please identify this version by the words
  \textbf{\mdfivewords}.
}}

\conferenceinfo{Haskell'10,} {September 30, 2010, Baltimore, Maryland, USA.}
\CopyrightYear{2010}
\copyrightdata{978-1-4503-0252-4/10/09}

\ifnoauthornotes
\makeatletter
\let\HyPsd@Warning=\@gobble
\makeatother
\fi

% João


\authorinfo{Norman Ramsey}{Tufts University}{nr@cs.tufts.edu}
\authorinfo{Jo\~ao Dias}{Tufts University}{dias@cs.tufts.edu}
\authorinfo{Simon Peyton Jones}{Microsoft Research}{simonpj@microsoft.com}


\maketitle

\begin{abstract}
\iffalse % A vote for Simon's abstract
\remark{I have replaced a good abstract of the POPL submission with a
bad abstract of \emph{this} submission.}
We present \ourlib, a Haskell library that makes it easy for a
compiler writer
to implement program transformations based on dataflow analyses.
A~client of \ourlib\ defines a representation of
logical assertions,
a transfer function that computes outgoing assertions from incoming
assertions,
and a rewrite function that improves code when improvements are
justified by the assertions.
\ourlib\ does the actual analysis and transformation.

\ourlib\ implements state-of-the-art algorithms:
Lerner, Grove, and Chambers's
\citeyearpar{lerner-grove-chambers:2002}
composition of simple analyses and transformations, which achieves
the same precision as complex, handwritten
``super-analyses;''
and Whalley's \citeyearpar{whalley:isolation} dynamic technique for
isolating bugs in a client's code.
\ourlib's implementation is unique in that unlike previous
implementations,
it carefully separates the tricky
elements of each of these algorithms, so that they can be examined and
understood independently.


\simon{Here is an alternative abstract based on the four-sentence model.}
\remark{Four-sentence model? You must teach me\ldots}
\fi
Dataflow analysis and transformation of control-flow graphs is
pervasive in optimizing compilers, but it is typically tightly
interwoven with the details of a \emph{particular} compiler.
We~describe \ourlib{}, a~reusable Haskell library that makes it
unusually easy to define new analyses and
transformations for \emph{any} compiler written in Haskell.
\ourlib's interface is modular and polymorphic,
and it offers unusually strong static guarantees.
The implementation encapsulates
state-of-the-art algorithms (interleaved analysis and rewriting,
dynamic error isolation), and it cleanly separates their tricky elements
so that they can be understood independently.
%
%\ourlib\ will be the workhorse of a new
%back end for the Glasgow Haskell Compiler (version~6.14, forthcoming).

\emph{Readers:} code examples are indexed at
\iffalse % one more line on page 1 blows up the paper...
{\url{http://bit.ly/doUJpr}}
\else
{\url{http://www.cs.tufts.edu/~nr/pubs/hoopl10-supplement.pdf}}
(\url{bit.ly/doUJpr}).
\fi
%%% Source: http://www.cs.tufts.edu/~nr/drop/popl-index.pdf
\end{abstract}

\makeatactive % Enable @foo@ notation

\section{Introduction}

A mature optimizing compiler for an imperative language includes many
analyses, the results of which justify the optimizer's
code-improving transformations.
Many analyses and transformations---constant
propagation, live-variable analysis, inlining, sinking of loads,
and so on---should be regarded as particular cases of
a single general problem: \emph{dataflow analysis and optimization}.
%% \remark{I do not feel compelled to cite Muchnick (or anyone else) here}
Dataflow analysis is over thirty years old,
but a recent, seminal paper by \citet{lerner-grove-chambers:2002} goes further,
describing a powerful but subtle way to
\emph{interleave} analysis and transformation so that each
piggybacks on the other.

Because optimizations based on dataflow analysis
share a common intellectual framework, and because that framework is
subtle, it~is tempting to
try to build a single reusable library that embodies the
subtle ideas, while
making it easy for clients to instantiate the library for different
situations.
% Tempting, but difficult.
Although such libraries exist, as we discuss
in \secref{related}, they have complex APIs and implementations,
and none implements the Lerner/Grove/Chambers technique.

In this paper we present \ourlib{} (short for ``higher-order
optimization library''), a new Haskell library for dataflow analysis and
optimization. It has the following distinctive characteristics:

\begin{itemize}
\item
\ourlib\ is purely functional.
Although pure functional languages are not obviously suited to writing
standard algorithms that
transform control-flow graphs,
pure functional code is actually easier to write, and far easier
to write correctly, than code that is mostly functional but uses a mutable
representation of graphs
\cite{ramsey-dias:applicative-flow-graph}.
When analysis and transformation
are interleaved, so that graphs must be transformed \emph{speculatively},
without knowing whether
a~transformed graph will be retained or discarded,
pure functional code offers even more benefits.

\item
\ourlib\ is polymorphic. Just as a list library is
polymorphic in the list elements, so is \ourlib{} polymorphic, both in
the nodes that inhabit graphs and in the dataflow facts that
analyses compute over these graphs (\secref{using-hoopl}).

\item
The paper by Lerner, Grove, and Chambers is inspiring but abstract.
We articulate their ideas in a concrete, simple API,
which hides
a subtle implementation
(\secreftwo{graph-rep}{using-hoopl}).
You provide a representation for assertions,
a transfer function that transforms assertions across a node,
and a rewrite function that uses an assertion to
justify rewriting a node.
\ourlib\ ``lifts'' these node-level functions to work over
control-flow graphs, solves recursion equations,
and interleaves rewriting with analysis.
Designing APIs is surprisingly
hard; we have been through over a dozen significantly different
iterations, and we offer our API as a contribution.

\item
Because the client
can perform very local reasoning (``@y@ is live before
@x:=y+2@''),
\seclabel{liveness-mention}
analyses and transformations built on \ourlib\
are small, simple, and easy to get right.
Moreover, \ourlib\ helps you write correct optimizations:
it~statically rules out transformations that violate invariants
of the control-flow graph (Sections \ref{sec:graph-rep} and \ref{sec:rewrites}),
and dynamically it can help find the first transformation that introduces a fault
in a test program (\secref{fuel}).
%%\finalremark{SLPJ: I wanted to write more about open/closed,
%%but I like this sentence with its claim to both static and dynamic assistance,
%%and maybe the open/closed story is hard to understand here.}

% \item \ourlib{} makes use of GADTS and type functions to offer unusually
% strong static guarantees. In particular, nodes, basic blocks, and
% graphs are all statically typed by their open or closedness on entry, and
% their open or closedness on exit (\secref{graph-rep}). For example, an add instruction is
% open on entry and exit, while a branch is open on entry and closed on exit.
% Using these types we can statically guarantee that, say, an add instruction
% is rewritten to a graph that is also open at both entry and exit; and
% that the user cannot construct a block where an add instruction follows an
% unconditional branch. We know of no other system that offers
% static guarantees this strong.

\item \ourlib{} implements subtle algorithms, including
(a)~interleaved analysis and rewriting, (b)~speculative rewriting,
(c)~computing fixed points, and (d)~dynamic fault isolation.
Previous implementations of these algorithms---including three of our
own---are complicated and hard to understand, because the tricky pieces
are implemented all together, inseparably.
In~this paper, each tricky piece is handled in just
one place, separate from all the others (\secref{engine}).
We~emphasize this implementation as an object of interest in
its own right.
\end{itemize}
Our work bridges the gap between abstract, theoretical presentations
and actual compilers.
\ourlib{} is available from
\burl{http://ghc.cs.tufts.edu/hoopl} and also from Hackage (version~3.8.3.0).
One of \hoopl's clients
is the Glasgow Haskell Compiler, which uses \hoopl\ to optimize
imperative code in GHC's back end.

\ourlib's API is made possible by
sophisticated aspects of Haskell's type system, such
as higher-rank polymorphism, GADTs, and type functions.
\ourlib{} may therefore also serve as a case study in the utility
of these features.


\section{Dataflow analysis {\&} transformation by \texorpdfstring{\rlap{example}}{example}}
\seclabel{overview}
\seclabel{constant-propagation}
\seclabel{example:transforms}
\seclabel{example:xforms}

%% We begin by setting the scene, introducing some vocabulary, and
%% showing a small motivating example.
A \emph{control-flow graph}, perhaps representing the body of a procedure,
is a collection of \emph{basic blocks}---or just ``blocks.''
Each block is a sequence of instructions,
beginning with a label and ending with a
control-transfer instruction that branches to other blocks.
% Each block has a label at the beginning,
% a sequence of
% -- each of which has a label at the
% beginning. Each block may branch to other blocks with arbitrarily
% complex control flow.
The goal of dataflow optimization is to compute valid
\emph{assertions} (or \emph{dataflow facts}),
then use those assertions to justify code-improving
transformations (or \emph{rewrites}) on a {control-flow graph}.

As~a concrete example, we show constant propagation with constant
folding.
On the left we show a basic block; in the middle we show
facts that hold between statements (or \emph{nodes})
in the block; and at
the right we show the result of transforming the block
based on the assertions:
\begin{verbatim}
  Before        Facts           After
------------{}-------------
  x := 3+4                    x := 7
----------{x=7}------------
  z := x>5                    z := True
-------{x=7, z=True}-------
  if z                        goto L1
    then goto L1
    else goto L2
\end{verbatim}
Constant propagation works
from top to bottom. We start with the empty fact.
Given the empty fact and the node @x:=3+4@, can we make a (constant-folding)
transformation?
Yes: we can replace the node with @x:=7@.
Now, given this transformed node
and the original fact, what fact flows out of the bottom of
the transformed node?
The~fact~\mbox{\{@x=7@\}}.
Given the fact \{@x=7@\} and the node @z:=x>5@, can we make a
transformation? Yes: constant propagation can replace the node with @z:=7>5@.
Now, can we make another transformation? Yes: constant folding can
replace the node with @z:=True@.
The~process continues to the end of the block, where we
can replace the conditional branch with an unconditional one, @goto L1@.

The example above is simple because the program has only straight-line code;
when programs have loops, dataflow analysis gets more complicated.
For example,
consider the following graph,
where we assume @L1@ is the entry point:
\begin{verbatim}
L1: x=3; y=4; if z then goto L2 else goto L3
L2: x=7; goto L3
L3: ...
\end{verbatim}
Because control flows to @L3@ from two places (@L1@~and~@L2@),
we must \emph{join} the facts coming from those two places.
All paths to @L3@ produce the fact @y=@$4$,
so we can conclude that this fact holds at~@L3@.
But depending on the path to~@L3@, @x@~may have different
values, so we conclude ``@x=@$\top$'',
meaning that there is no single value held by~@x@ at~@L3@.
%\footnote{
%In this example @x@ really does vary at @L3@, but in general
%the analysis might be conservative.}
The final result of joining the dataflow facts that flow to~@L3@
is the new fact $\mathord{@x=@\top\!} \land \mathord{@y=4@} \land \mathord{@z=@\top}$.

\seclabel{const-prop-example}

\paragraph{Forwards and backwards.}
Constant propagation works \emph{forwards}, and a fact is often an
assertion about the program state, such as ``variable~@x@ holds value~@7@.''
Some useful analyses work \emph{backwards}.
A~prime example is live-variable analysis, where a fact takes the~form
``variable @x@ is live'' and is an assertion about the
\emph{continuation} of a program point. For example, the fact ``@x@~is
live'' at a program point P is an assertion that @x@ is used on some program
path starting at \mbox{P}. % TeXbook, exercise 12.6
The accompanying transformation is called dead-code elimination;
if @x@~is not live, this transformation
replaces the node @x:=e@ with a no-op.
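To make the flavor of such a backward analysis concrete, here is a
minimal, self-contained sketch of the liveness equation for an
assignment node. The @Var@ and @Expr@ types are stand-ins for a
client's types, not part of \ourlib:
\begin{code}
import qualified Data.Set as Set

type Var  = String
data Expr = Lit Int | Use Var | Add Expr Expr

-- Variables read by an expression
uses :: Expr -> Set.Set Var
uses (Lit _)   = Set.empty
uses (Use v)   = Set.singleton v
uses (Add a b) = Set.union (uses a) (uses b)

-- Backward transfer for "x := e": the fact flowing in
-- (before the node) is computed from the fact flowing out.
liveBefore :: Var -> Expr -> Set.Set Var -> Set.Set Var
liveBefore x e liveOut =
  Set.union (Set.delete x liveOut) (uses e)
\end{code}
Applied to @x:=y+2@, @liveBefore@ removes @x@ from the outgoing fact
and adds~@y@---exactly the local reasoning described above.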

\seclabel{simple-tx}
\paragraph{Interleaved transformation and analysis.}
Our first example \emph{interleaves} transformation and analysis.
Interleaving makes it far easier to write effective analyses.
If, instead, we \emph{first} analyzed the block
and \emph{then} transformed it, the analysis would have to ``predict''
the transformations.
For example, given the incoming fact \{@x=7@\}
and the instruction @z:=x>5@,
a pure analysis could produce the outgoing fact
\{@x=7@, @z=True@\} by simplifying @x>5@ to @True@.
But the subsequent transformation must perform
\emph{exactly the same simplification} when it transforms the instruction to @z:=True@!
If instead we \emph{first} rewrite the node to @z:=True@,
and \emph{then} apply the transfer function to the new node,
the transfer function becomes wonderfully simple: it merely has to see if the
right-hand side is a constant.
You~can see code in \secref{const-prop-client}.

Another example is the interleaving of liveness analysis and dead-code elimination.
As mentioned in \secref{liveness-mention},
it is sufficient for the analysis to~say ``@y@ is live before @x:=y+2@''.
It is not necessary to have the more complex rule
``if @x@~is live after @x:=y+2@ then @y@ is live before~it,''
because if @x@~is \emph{not} live after @x:=y+2@,
the assignment @x:=y+2@ will be eliminated.
If~there are a number of interacting
analyses and/or transformations,
the benefit of interleaving them is even more compelling; for more
substantial examples, consult \citet{lerner-grove-chambers:2002}.
%
But these compelling benefits come at a cost.
To~compute valid facts for a program that has loops,
an analysis may require multiple iterations over the program.
Before the final iteration, the analysis may
compute a~fact that is invalid,
and a transformation may use the invalid fact to rewrite the program
(\secref{checkpoint-monad}).
To~avoid unjustified rewrites,
any rewrite based on an invalid fact must be rolled back;
transformations must be \emph{speculative}.
As~described in \secref{checkpoint-monad},
\hoopl~manages speculation with cooperation from the client.


\paragraph{Cognitive costs of interleaving.}

\remark{I'm not entirely pleased with this subsection title}

\seclabel{debugging-introduced}

While it is wonderful that we can implement complex optimizations by
composing very simple analyses and transformations,
it~is not so wonderful that very simple analyses and transformations,
when composed, can exhibit complex emergent behavior.
Because such behavior is not easily predicted, it is essential to have good
tools for debugging.
\hoopl's primary debugging tool is an implementation of
Whalley's \citeyearpar{whalley:isolation} search technique for
isolating faults in an optimizer (\secref{whalley-from-s2}).


\section{Representing control-flow graphs} \seclabel{graph-rep}

\ourlib{} is a library that makes it easy to define dataflow analyses,
and transformations driven by these analyses, on control-flow graphs.
Graphs are composed from smaller units, which we discuss from the
bottom up:
\begin{itemize}
\item A \emph{node} is defined by \ourlib's client;
\ourlib{} knows nothing about the representation of nodes (\secref{nodes}).
\item A basic \emph{block} is a sequence of nodes (\secref{blocks}).
\item Control-flow \emph{edges} connect nodes (\secref{edges}).
\item A \emph{graph} is an arbitrarily complicated control-flow graph:
basic blocks connected by edges (\secref{graphs}).
\end{itemize}

\subsection{Shapes: Open and closed}

Nodes, blocks, and graphs share important properties in common.
In particular, each is \emph{open or closed on entry}
and \emph{open or closed on exit}.
An \emph{open} point is one at which control may implicitly ``fall through;''
to transfer control at a \emph{closed} point requires an explicit
control-transfer instruction to a named label.
For example,
\begin{itemize}
\item A shift-left instruction is open on entry (because control can fall into it
from the preceding instruction), and open on exit (because control falls through
to the next instruction).
\item An unconditional branch is open on entry, but closed on exit (because
control cannot fall through to the next instruction).
\item A label is closed on entry (because in \ourlib{} we do not allow
control to fall through into a branch target), but open on exit.
\item
A~function call should be closed on exit,
because control can flow from a call site to multiple points:
for example, a~return continuation or an exception handler.
(And~after optimization, distinct call sites may share a return
continuation.)
\end{itemize}
% This taxonomy enables \ourlib{} to enforce invariants:
% only nodes closed on entry can be the targets of branches, and only nodes closed
% on exits can transfer control (see also \secref{nonlocal-class}).
% As~a consequence, all control transfers originate at control-transfer
% instructions and terminated at labels; this invariant dramatically
% simplifies analysis and transformation.
These examples concern nodes, but the same classification applies
to blocks and graphs. For example the block
\begin{code}
  x:=7; y:=x+2; goto L
\end{code}
is open on entry and closed on exit.
This is the block's \emph{shape}, which we often abbreviate
``open/closed;''
we may refer to an ``open/closed block.''

The shape of a thing determines that thing's control-flow properties.
In particular, whenever E is a node, block, or graph,
% : \simon{Removed the claim about a unique entry point.}
\begin{itemize}
\item
If E is open on entry, it has a unique predecessor;
if it is closed, it may have arbitrarily many predecessors---or none.
\item
If E is open on exit, it has a unique successor;
if it is closed, it may have arbitrarily many successors---or none.
\end{itemize}
%%%%
%%%%
%%%% % \item Regardless of whether E is open or closed,
%%%% % it has a unique entry point where execution begins.
%%%% \item If E is closed on exit, control leaves \emph{only}
%%%% by explicit branches from closed-on-exit nodes.
%%%% \item If E is open on exit, control \emph{may} leave E
%%%% by ``falling through'' from a distinguished exit point.
%%%% \remark{If E is a node or block, control \emph{only} leaves E by
%%%% falling through, but this isn't so for a graph. Example: a body of a
%%%% loop contains a \texttt{break} statement} \simon{I don't understand.
%%%% A break statement would have to be translated as a branch, no?
%%%% Can you give a an example? I claim that control only leaves an
%%%% open graph by falling through.}
%%%% \end{itemize}


\subsection{Nodes} \seclabel{nodes}

The primitive constituents of a control-flow graph are
\emph{nodes}, which are defined by the client.
For example, a node might represent a machine instruction, such as an
assignment, a call, or a conditional branch.
But \ourlib{}'s graph representation is \emph{polymorphic in the node type},
so each client can define nodes as it likes.
Because they contain nodes defined by the client,
graphs can include arbitrary client-specified data, including
(say) method calls, C~statements, stack maps, or
whatever.


\begin{figure}
\begin{fuzzcode}{0.98pt}
data `Node e x where
  Label   :: Label -> Node C O
  `Assign :: Var -> Expr -> Node O O
  `Store  :: Expr -> Expr -> Node O O
  `Branch :: Label -> Node O C
  `Cond   :: Expr -> Label -> Label -> Node O C
  ... more constructors ...
\end{fuzzcode}
\caption{A typical node type as it might be defined by a client}
\figlabel{cmm-node}
\end{figure}

\ourlib{} knows \emph{at compile time} whether a node is open or
closed on entry and exit:
the type of a node has kind @*->*->*@, where the two type parameters
are type-level flags, one for entry and one for exit.
Such a type parameter may be instantiated only with type @O@~(for
open) or type~@C@ (for closed).

As an example,
\figref{cmm-node} shows a typical node type as it might be defined by
one of \ourlib's {clients}.
The type parameters are written @e@ and @x@, for
entry and exit respectively.
The type is a generalized algebraic data type;
the syntax gives the type of each constructor.
%%% \cite{peyton-jones:unification-based-gadts}.
For example, constructor @Label@
takes a @Label@ and returns a node of type @Node C O@, where
the~``@C@'' says ``closed on entry'' and the~``@O@'' says ``open on exit''.
The types @Label@, @O@, and~@C@ are
defined by \ourlib{} (\figref{graph}).
As another example, constructor @Assign@ takes a variable and an expression, and it
returns a @Node@ open on both entry and exit; constructor @Store@ is
similar.
Types @`Var@ and @`Expr@ are private to the client, and
\ourlib{} knows nothing of them.
Finally, control-transfer nodes @Branch@ and @Cond@ (conditional
branch) are open on entry
and closed on exit.

Nodes closed on entry are the only targets of control transfers;
nodes open on entry and exit never perform control transfers;
and nodes closed on exit always perform control transfers\footnote{%
To obey these invariants,
a node for
a conditional-branch instruction, which typically either transfers control
\emph{or} falls through, must be represented as a two-target
conditional branch, with the fall-through path in a separate block.
This representation is standard \cite{appel:modern},
and it costs nothing in practice:
such code is easily sequentialized without superfluous branches.
%a late-stage code-layout pass can readily reconstruct efficient code.
}.
Because of the position each type of node occupies in a
basic block,
we~often call them \emph{first}, \emph{middle}, and \emph{last} nodes
respectively.


\begin{figure}
\begin{fuzzcode}{0.98pt}
data `O -- Open
data `C -- Closed

data `Block n e x where
  `BFirst  :: n C O -> Block n C O
  `BMiddle :: n O O -> Block n O O
  `BLast   :: n O C -> Block n O C
  `BCat    :: Block n e O -> Block n O x -> Block n e x

data `Graph n e x where
  `GNil  :: Graph n O O
  `GUnit :: Block n O O -> Graph n O O
  `GMany :: MaybeO e (Block n O C)
         -> LabelMap (Block n C C)
         -> MaybeO x (Block n C O)
         -> Graph n e x

data `MaybeO ^ex t where
  `JustO    :: t -> MaybeO O t
  `NothingO :: MaybeO C t

newtype `Label -- abstract type

class `NonLocal n where
  `entryLabel :: n C x -> Label
  `successors :: n e C -> [Label]
\end{fuzzcode}
\caption{The block and graph types defined by \ourlib}
\figlabel{graph} \figlabel{edges}
\end{figure}
% omit MaybeC :: * -> * -> *


\subsection{Blocks} \seclabel{blocks}

\ourlib\ combines the client's nodes into
blocks and graphs, which, unlike the nodes, are defined by \ourlib{}
(\figref{graph}).
A~@Block@ is parameterized over the node type~@n@
as well as over the same flag types that make it open or closed at
entry and exit.

The @BFirst@, @BMiddle@, and @BLast@ constructors create one-node
blocks.
Each of these constructors is polymorphic in the node's \emph{representation}
but monomorphic in its \emph{shape}.
Why not use a single constructor
of type \mbox{@n e x -> Block n e x@}, which would be polymorphic in a
node's representation \emph{and}
shape?
Because by making the shape known statically, we simplify the
implementation of analysis and transformation in
\secref{dfengine}.
\finalremark{We have no principled answer to the question of what
parts of the representation of blocks are exposed.
Do we tell our readers or just ignore the question?
}

The @BCat@ constructor concatenates blocks in sequence.
It~makes sense to concatenate blocks only when control can fall
through from the first to the second; therefore,
two blocks may be concatenated {only} if each block is open at
the point of concatenation.
This restriction is enforced by the type of @BCat@, whose first
argument must be open on exit, and whose second argument must be open on entry.
It~is statically impossible, for example, to concatenate a @Branch@
immediately before an
@Assign@.
Indeed, the @Block@ type statically guarantees that any
closed/closed @Block@---which compiler writers normally
call a ``basic block''---consists of exactly one first node
(such as @Label@ in \figref{cmm-node}),
followed by zero or more middle nodes (@Assign@ or @Store@),
and terminated with exactly one
last node (@Branch@ or @Cond@).
Using GADTs to enforce these invariants is one of
\ourlib{}'s innovations.
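For instance, using the node type of \figref{cmm-node}, the
well-formed basic block below typechecks, while reordering its
constituents does not (a sketch; @l@,~@x@, @e@, and~@t@ are assumed
bindings of types @Label@, @Var@, @Expr@, and @Label@):
\begin{code}
b :: Block Node C C
b = BCat (BCat (BFirst (Label l))
               (BMiddle (Assign x e)))
         (BLast (Branch t))
-- Swapping the Branch and the Assign is rejected:
-- BCat requires its first argument to be open on exit.
\end{code}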


\subsection{Control-flow edges and program points} \seclabel{edges}

In a block, a control-flow edge is implicit in every application of
the @BCat@ constructor.
An~implicit edge originates in a first node or a middle node and flows
to a middle node or a last node.
\emph{Explicit} edges between blocks are represented by the client;
an~explicit edge originates in a last node and flows to a (labelled)
first node.
\hoopl\ discovers explicit edges by using the @successors@ and
@entryLabel@ functions of the @NonLocal@ class (\figref{graph}).
Any~edge, whether implicit or explicit, is considered a program point,
and an analysis written using \hoopl\ computes a dataflow fact at each
such point.

\subsection{Graphs} \seclabel{graphs}

\ourlib{} composes blocks into graphs, which are also defined
in \figref{graph}.
Like @Block@, the data type @Graph@ is parameterized over
both nodes @n@ and its open/closed shape (@e@ and @x@).
It has three constructors. The first two
deal with the base cases of open/open graphs:
an empty graph is represented by @GNil@ while a single-block graph
is represented by @GUnit@.

More general graphs are represented by @GMany@, which has three
fields: an optional entry sequence, a body, and an optional exit
sequence.
\begin{itemize}
\item
If the graph is open on entry, it contains an \emph{entry sequence} of type
@Block n O C@.
We could represent this sequence as a value of type
@Maybe (Block n O C)@, but we can do better:
a~value of @Maybe@ type requires a \emph{dynamic} test,
but we know \emph{statically}, at compile time, that the sequence is present if and only
if the graph is open on entry.
We~express our compile-time knowledge by using the type
@MaybeO e (Block n O C)@, a type-indexed version of @Maybe@,
which is also defined in \figref{graph}:
the type @MaybeO O a@ is isomorphic to~@a@, while
the type @MaybeO C a@ is isomorphic to~@()@.
(The sketch following this list shows this static knowledge in action.)
\item
The \emph{body} of the graph is a collection of closed/closed blocks.
To~facilitate traversal of the graph, we represent a body as a finite
map from label to block.
\item
The \emph{exit sequence} is dual to the entry sequence, and like the entry
sequence, its presence or absence is deducible from the static type of the graph.
\end{itemize}
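To illustrate the static knowledge encoded by @MaybeO@: a graph known
to be open on entry and closed on exit must be built with @GMany@, so
its entry sequence can be extracted with an exhaustive,
@Maybe@-free pattern match (a sketch; @entryOf@ is not part of
\ourlib's API):
\begin{code}
entryOf :: Graph n O C -> Block n O C
entryOf (GMany (JustO b) _ _) = b
-- No other case can occur: GNil and GUnit are open on
-- exit, and NothingO would force the entry flag to be C,
-- contradicting the type Graph n O C.
\end{code}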

\seclabel{gSplice}

Graphs can be spliced together nicely; the cost is logarithmic in the
number of closed/closed blocks.
Unlike blocks, two graphs may be spliced together
not only when they are both open
at the splice point but also
when they are both closed---and not in the other two cases:
\begingroup
\def\^{\\[-5pt]}%
\begin{smallttcode}
`gSplice :: Graph n e a -> Graph n a x -> Graph n e x
gSplice GNil g2 = g2
gSplice g1 GNil = g1\^
gSplice (GUnit ^b1) (GUnit ^b2) = GUnit (b1 `BCat` b2)\^
gSplice (GUnit b) (GMany (JustO e) bs x)
  = GMany (JustO (b `BCat` e)) bs x\^
gSplice (GMany e ^bs (JustO x)) (GUnit b2)
  = GMany e bs (JustO (x `BCat` b2))\^
gSplice (GMany e1 ^bs1 (JustO x1)) (GMany (JustO e2) ^bs2 x2)
  = GMany e1 (bs1 `mapUnion` (b `addBlock` bs2)) x2
  where b = x1 `BCat` e2\^
gSplice (GMany e1 bs1 NothingO) (GMany NothingO bs2 x2)
  = GMany e1 (bs1 `mapUnion` bs2) x2
\end{smallttcode}
\endgroup
% omit mapUnion :: LabelMap a -> LabelMap a -> LabelMap a
% omit addBlock :: NonLocal n => Block n C C -> LabelMap (Block n C C) -> LabelMap (Block n C C)
This definition illustrates the power of GADTs: the
pattern matching is exhaustive, and all the open/closed invariants are
statically checked. For example, consider the second-last equation for @gSplice@.
Since the exit sequence of the first argument is @JustO x1@,
we know that type parameter~@a@ is~@O@, and hence the entry sequence of the second
argument must be @JustO e2@.
Moreover, block~@x1@ must be
closed/open, and block~@e2@ must be open/closed.
We can therefore concatenate
\ifpagetuning
@x1@~and~@e2@
\else
them
\fi
with @BCat@ to produce a closed/closed block~@b@, which is
added to the body of the result.

The representation of @Graph@s is exposed to \hoopl's clients.
\finalremark{This \P\ is a candidate to be cut.}
We~have carefully crafted the types so that if @BCat@
is considered as an associative operator,
every graph has a unique representation.
%% \simon{Well, you were the one who was so keen on a unique representation!
%% And since we have one, I think tis useful to say so. Lastly, the representation of
%% singleton blocks is not entirely obvious.}
%%%%
%%%% An empty open/open graph is represented
%%%% by @GNil@, while a closed/closed one is @gNilCC@:
%%%% \par {\small
%%%% \begin{code}
%%%% gNilCC :: Graph C C
%%%% gNilCC = GMany NothingO BodyEmpty NothingO
%%%% \end{code}}
%%%% The representation of a @Graph@ consisting of a single block~@b@
%%%% depends on the shape of~@b@:\remark{Does anyone care?}
%%%% \par{\small
%%%% \begin{code}
%%%% gUnitOO :: Block O O -> Graph O O
%%%% gUnitOC :: Block O C -> Graph O C
%%%% gUnitCO :: Block O C -> Graph C O
%%%% gUnitCC :: Block O C -> Graph C C
%%%% gUnitOO b = GUnit b
%%%% gUnitOC b = GMany (JustO b) BodyEmpty NothingO
%%%% gUnitCO b = GMany NothingO BodyEmpty (JustO b)
%%%% gUnitCC b = GMany NothingO (BodyUnit b) NothingO
%%%% \end{code}}
%%%% Multi-block graphs are similar.
%%%% From these definitions
To guarantee uniqueness, @GUnit@ is restricted to open/open
blocks.
If~@GUnit@ were more polymorphic, there would be
more than one way to represent some graphs, and it wouldn't be obvious
to a client which representation to choose---or if the choice made a difference.


\subsection{Labels and successors} \seclabel{nonlocal-class}


Although \ourlib{} is polymorphic in the type of nodes,
it~still needs to know how a node may transfer control from one
block to another.
\hoopl\ also
needs to know what @Label@ is on the first node in a block.
If \hoopl\ is polymorphic in the node type,
how can it{} know these things?
\hoopl\ requires the client to make the node type an instance of
\ourlib's @NonLocal@ type class, which is defined in \figref{edges}.
The @entryLabel@ method takes a first node (one closed on entry, \secref{nodes})
and returns its @Label@;
the @successors@ method takes a last node (closed on exit) and returns
the @Label@s to
which it can transfer control.
A~middle node, which is open on both entry and exit, transfers control
only locally, to its successor within a basic block,
so no corresponding interrogation function is needed.

%% A node type defined by a client must be an instance of @NonLocal@.
In~\figref{cmm-node},
the client's instance declaration for @Node@ would be
\begin{code}
instance NonLocal Node where
  entryLabel (Label l)      = l
  successors (Branch b)     = [b]
  successors (Cond e b1 b2) = [b1, b2]
\end{code}
Again, the pattern matching for both functions is exhaustive, and
the compiler statically checks this fact.
Here, @entryLabel@
cannot be applied to an @Assign@ or @Branch@ node,
and any attempt to define a case for @Assign@ or @Branch@ would result
in a type error.

While the client provides this information about
nodes, it is convenient for \ourlib\ to get the same information
about blocks.
Internally,
\ourlib{} uses this instance declaration for the @Block@ type:
\begin{code}
instance NonLocal n => NonLocal (Block n) where
  entryLabel (BFirst n) = entryLabel n
  entryLabel (BCat b _) = entryLabel b
  successors (BLast n)  = successors n
  successors (BCat _ b) = successors b
\end{code}
Because the functions @entryLabel@ and @successors@ are used to track control
flow \emph{within} a graph, \ourlib\ does not need to ask for the
entry label or successors of a @Graph@ itself.
Indeed, @Graph@ \emph{cannot} be an instance of @NonLocal@, because even
if a @Graph@ is closed on entry, it need not have a unique entry label.
%
% A slight infelicity is that we cannot make @Graph@ an instance of @NonLocal@,
% because a graph closed on entry has no unique label. Fortunately, we never
% need @Graph@s to be in @NonLocal@.
%
% ``Never apologize. Never confess to infelicity.'' ---SLPJ





\section{Using \ourlib{} to analyze and transform graphs} \seclabel{using-hoopl}
\seclabel{making-simple}
\seclabel{create-analysis}
\seclabel{api}

Now that we have graphs, how do we optimize them?
\ourlib{} makes it easy to build a new dataflow analysis and optimization.
A~client must supply the following pieces:
\begin{itemize}
\item A \emph{node type} (\secref{nodes}).
\ourlib{} supplies the @Block@ and @Graph@ types
that let the client build control-flow graphs out of nodes.
\item A \emph{data type of facts} and some operations over
those facts (\secref{facts}).
Each analysis uses facts that are specific to that particular analysis,
which \ourlib{} accommodates by being polymorphic in
the fact type.
\item A \emph{transfer function} that takes a node and returns a
\emph{fact transformer}, which takes a fact flowing into the node and
returns the transformed fact that flows out of the node (\secref{transfers}).
\item A \emph{rewrite function} that takes a node and an input fact,
performs a monadic action,
and returns either @Nothing@
or @Just g@,
where @g@~is a graph that should
replace the node
(\secreftwo{rewrites}{shallow-vs-deep}).
The ability to replace a \emph{node} by a \emph{graph}
% that may include internal control flow
is crucial for many code-improving transformations.
%We discuss the rewrite function
%in Sections \ref{sec:rewrites} and \ref{sec:shallow-vs-deep}.
\end{itemize}
These requirements are summarized in \tabref{parts}.
Because facts, transfer functions, and rewrite functions work together,
we~combine them in a single record of type @FwdPass@ (\figref{api-types}).
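For example, a client's constant-propagation pass might be bundled as
follows (a sketch anticipating \secref{const-prop-client};
@ConstFact@, @constLattice@, @varHasLit@, and @constProp@ are
illustrative names for the client's fact type, lattice, transfer
function, and rewrite function):
\begin{code}
constPropPass :: FuelMonad m => FwdPass m Node ConstFact
constPropPass = FwdPass
  { fp_lattice  = constLattice
  , fp_transfer = mkFTransfer varHasLit
  , fp_rewrite  = mkFRewrite constProp }
\end{code}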

\begin{table}
\centerline{%
\begin{tabular}{@{}>{\raggedright\arraybackslash}p{1.03in}>{\scshape}c>{\scshape}c>{\raggedright\arraybackslash}p{1.29in}@{}}
&\multicolumn1{r}{\llap{\emph{Specified}}\hspace*{-0.3em}}&
\multicolumn1{l}{\hspace*{-0.4em}\rlap{\emph{Implemented}}}&\\
\multicolumn1{c}{\emph{Part of optimizer}}
&\multicolumn1{c}{\emph{by}}&
\multicolumn1{c}{\emph{by}}&
\multicolumn1{c}{\emph{How many}}%
\\[5pt]
Control-flow graphs& Us & Us & One \\
Nodes in a control-flow graph & You & You & One type per intermediate language \\[3pt]
Dataflow fact~$F$ & You & You & One type per logic \\
Lattice operations & Us & You & One set per logic \\[3pt]
Transfer functions & Us & You & One per analysis \\
Rewrite functions & Us & You & One per \rlap{transformation} \\[3pt]
Analyze-and-rewrite functions & Us & Us & Two (forward, backward) \\
\end{tabular}%
}
\ifpagetuning\vspace*{0.5\baselineskip}\fi
\caption{Parts of an optimizer built with \ourlib}
\tablabel{parts}
\end{table}




Given a node type~@n@ and a @FwdPass@,
a client can ask \ourlib{}\ to analyze and rewrite a
graph.
\hoopl\ provides a fully polymorphic interface, but for purposes of
exposition, we present a function that is specialized to a
closed/closed graph:
\begin{code}
`analyzeAndRewriteFwdBody
  :: ( CkpointMonad m  -- Roll back speculative actions
     , NonLocal n )    -- Extract non-local flow edges
  => FwdPass m n f     -- Lattice, transfer, rewrite
  -> [Label]           -- Entry point(s)
  -> Graph n C C       -- Input graph
  -> FactBase f        -- Input fact(s)
  -> m ( Graph n C C   -- Result graph
       , FactBase f )  -- ... and its facts
\end{code}
Given a @FwdPass@ and a list of entry points, the
analyze-and-rewrite function transforms a graph into
an optimized graph.
As its type shows, this function
is polymorphic in the types of nodes~@n@ and facts~@f@;
these types are chosen by the client.
The type of the monad~@m@ is also chosen by the client.

As well as taking and returning a graph, the
function also takes input facts (the @FactBase@) and produces output facts.
A~@FactBase@ is a finite mapping from @Label@ to facts;
if~a @Label@ is not in the domain of the @FactBase@, its fact is the
bottom element of the lattice.
For example, in our constant-propagation example from \secref{const-prop-example},
if the graph
represents the body of a procedure
with parameters $x,y,z$, we would map the entry @Label@ to a fact
\mbox{$@x=@\top \land @y=@\top \land @z=@\top$}, to specify that the procedure's
parameters are not known to be constants.
%%
%% The
%% output @FactBase@ maps each @Label@ in the body to its fact

The client's model of how @analyzeAndRewriteFwdBody@ works is as follows:
\ourlib\ walks forward over each block in the graph.
At~each node, \ourlib\ applies the
rewrite function to the node and the incoming fact. If the rewrite
function returns @Nothing@, the node is retained as part of the output
graph, the transfer function is used to compute the downstream fact,
and \ourlib\ moves on to the next node.
But if the rewrite function returns @Just g@,
indicating that it wants to rewrite the node to the replacement graph~@g@, then
\ourlib\ recursively analyzes and may further rewrite~@g@
before moving on to the next node.
A~node following a rewritten node sees
\emph{up-to-date} facts; that is, its input fact is computed by
analyzing the replacement graph.

A~rewrite function may take any action that is justified by
the incoming fact.
If~further analysis invalidates the fact, \hoopl\ rolls
back the action.
Because graphs cannot be mutated,
rolling back to the original graph is easy.
But~rolling back a rewrite function's {monadic} action
requires cooperation from the client:
the~client must provide @checkpoint@ and
@restart@ operations, which
make~@m@ an instance of
\hoopl's @CkpointMonad@ class
(\secref{checkpoint-monad}).
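To show what these operations involve, here is a sketch of an
instance for a simple state monad (the monad is illustrative, not
part of \ourlib; a checkpoint is simply a saved copy of the state):
\begin{code}
newtype ClientM s a = ClientM { runClientM :: s -> (a, s) }

instance Functor (ClientM s) where
  fmap f m = ClientM (\s -> let (a, s') = runClientM m s
                            in  (f a, s'))

instance Applicative (ClientM s) where
  pure a    = ClientM (\s -> (a, s))
  mf <*> ma = ClientM (\s -> let (f, s')  = runClientM mf s
                                 (a, s'') = runClientM ma s'
                             in  (f a, s''))

instance Monad (ClientM s) where
  m >>= k = ClientM (\s -> let (a, s') = runClientM m s
                           in  runClientM (k a) s')

instance CkpointMonad (ClientM s) where
  type Checkpoint (ClientM s) = s  -- a checkpoint is the state
  checkpoint = ClientM (\s -> (s, s))   -- save the current state
  restart s  = ClientM (\_ -> ((), s))  -- roll back to it
\end{code}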


Below we flesh out the
interface to @analyzeAndRewriteFwdBody@, leaving the implementation for
\secref{engine}.
%{\hfuzz=7.2pt\par}


\begin{figure}
\begin{code}
data `FwdPass m n f
  = FwdPass { `fp_lattice  :: DataflowLattice f
            , `fp_transfer :: FwdTransfer n f
            , `fp_rewrite  :: FwdRewrite m n f }

------- Lattice ----------
data `DataflowLattice f = DataflowLattice
  { `fact_bot  :: f
  , `fact_join :: JoinFun f }
type `JoinFun f =
  OldFact f -> NewFact f -> (ChangeFlag, f)
newtype `OldFact f = OldFact f
newtype `NewFact f = NewFact f
data `ChangeFlag = `NoChange | `SomeChange

------- Transfers ----------
newtype `FwdTransfer n f   -- abstract type
`mkFTransfer
  :: (forall e x . n e x -> f -> Fact x f)
  -> FwdTransfer n f

------- Rewrites ----------
newtype `FwdRewrite m n f  -- abstract type
`mkFRewrite :: FuelMonad m
  => (forall e x . n e x -> f -> m (Maybe (Graph n e x)))
  -> FwdRewrite m n f
thenFwdRw :: FwdRewrite m n f -> FwdRewrite m n f
          -> FwdRewrite m n f
iterFwdRw :: FwdRewrite m n f -> FwdRewrite m n f
`noFwdRw :: Monad m => FwdRewrite m n f

------- Fact-like things, aka "fact(s)" -----
type family `Fact x f :: *
type instance Fact O f = f
type instance Fact C f = FactBase f

------- FactBase -------
type `FactBase f = LabelMap f
  -- A finite mapping from Labels to facts f

------- Rolling back speculative rewrites ----
class Monad m => CkpointMonad m where
  type Checkpoint m
  checkpoint :: m (Checkpoint m)
  restart    :: Checkpoint m -> m ()

------- Optimization fuel ----
type `Fuel = Int
class Monad m => `FuelMonad m where
  `getFuel :: m Fuel
  `setFuel :: Fuel -> m ()
\end{code}
\caption{\ourlib{} API data types}
\figlabel{api-types}
\figlabel{lattice-type} \figlabel{lattice}
\figlabel{transfers} \figlabel{rewrites}
\end{figure}
% omit mkFactBase :: [(Label, f)] -> FactBase f
%%%% \simon{We previously renamed @fact\_join@ to @fact\_extend@ because it really
%%%% is not a symmetrical join; we're extending an old fact with a new one.
%%%% NR: Yes, but the asymmetry is now explicit in the \emph{type}, so it
%%%% needn't also be reflected in the \emph{name}.}

\subsection{Dataflow lattices} \seclabel{lattices} \seclabel{facts}

For each analysis or transformation, the client must define a type
of dataflow facts.
A~dataflow fact often represents an assertion
about a program point,
but in general, dataflow analysis establishes properties of \emph{paths}:
\begin{itemize}
\item An assertion about all paths \emph{to} a program point is established
by a \emph{forward analysis}. For example, the assertion ``$@x@=@3@$'' at point P
1424 claims that variable @x@ holds value @3@ at P, regardless of the
1425 path by which P is reached.
1426 \item An assertion about all paths \emph{from} a program point is
1427 established by a \emph{backward analysis}. For example, the
1428 assertion ``@x@ is dead'' at point P claims that no path from P uses
1429 variable @x@.
1430 \end{itemize}
1431
1432 A~set of dataflow facts must form a lattice, and \ourlib{} must know
1433 (a)~the bottom element of the lattice and (b)~how to take
1434 the least upper bound (join) of two elements.
1435 To ensure that analysis
1436 terminates, it is enough if every fact has a finite number of
1437 distinct facts above it, so that repeated joins
1438 eventually reach a fixed point.
1439 %% \simon{There is no mention here of the @OldFact@ and @NewFact@ types.
1440 %% Shall we nuke them for the purposes of the paper?
1441 %% NR: They should definitely not be nuked; they're needed if the reader
1442 %% wants to understand the asymmetrical behavior of @fact\_join@.
1443 %% I'm~also mindful that this paper will be the primary explanation of
1444 %% \hoopl\ to its users, and I don't want to hide this part of the
1445 %% interface.
1446 %% As for mentions in the text,
1447 %% if you look carefully, you'll
1448 %% see that I've changed the subscripts in the text to ``old'' and
1449 %% ``new''.
1450 %% Together with the code, I believe that's enough to get the point across.
1451 %% SLPJ -- OK. I don't agree that it's enough, but we have more important fish to fry.
1452 %% }
1453
1454 In practice, joins are computed at labels.
1455 If~$f_{\mathit{old}}$ is the fact currently associated with a
1456 label~$L$,
1457 and if a transfer function propagates a new fact~$f_{\mathit{new}}$
1458 into label~$L$,
1459 \hoopl\ replaces $f_{\mathit{old}}$ with
1460 the join \mbox{$f_{\mathit{old}} \join f_{\mathit{new}}$}.
1461 And \hoopl\ wants to know
1462 if \mbox{$f_{\mathit{old}} \join f_{\mathit{new}} = f_{\mathit{old}}$},
1463 because if not, the analysis has not reached a fixed point.
1464
1465 The bottom element and join operation of a lattice of facts of
1466 type~@f@ are stored in a value of type @DataflowLattice f@
1467 (\figref{lattice}).
1468 %%%% \simon{Can we shorten ``@DataflowLattice@'' to just
1469 %%%% ``@Lattice@''?} % bigger fish ---NR
1470 %%Such a value is part of the @FwdPass n f@ that is passed to
1471 %%@analyzeAndRewriteFwd@ above.
1472 As noted in the previous paragraph,
1473 \ourlib{} needs to know when the result of a join is equal
1474 to the old fact.
1475 Because this information can often be computed cheaply together
1476 with the join, \ourlib\ does not
1477 require a separate equality test on facts, which might be expensive.
1478 Instead, \ourlib\ requires that @fact_join@ return a @ChangeFlag@ as
1479 well as the least upper bound.
1480 The @ChangeFlag@ should be @NoChange@ if
1481 the result is the same as the old fact, and
1482 @SomeChange@ if the result differs.
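
For example, here is a sketch of a lattice whose facts are sets of
variables, joined by set union (as in a liveness analysis).
The sketch assumes a variable type @Var@ and Haskell's @Data.Set@;
the local helper @changeIf@ converts a Boolean to a @ChangeFlag@:
\begin{smallcode}
-- sketch; assumes: import qualified Data.Set as Set
liveLattice :: DataflowLattice (Set.Set Var)
liveLattice = DataflowLattice
 { fact_bot  = Set.empty
 , fact_join = join }
 where
  join (OldFact old) (NewFact new) = (changeIf (j /= old), j)
    where j = new `Set.union` old
  changeIf c = if c then SomeChange else NoChange
\end{smallcode}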
1483
1484 \seclabel{WithTop}
1485
1486 To help clients create lattices and join functions,
1487 \hoopl\ includes functions and constructors that can extend a type~@a@
1488 with top and bottom elements.
1489 In this paper, we use only type @`WithTop@, which comes with value
1490 constructors that have these types:
1491 \begin{code}
1492 `PElem :: a -> WithTop a
1493 `Top :: WithTop a
1494 \end{code}
\hoopl\ provides combinators that make it easier to create join
functions that use @Top@.
1497 The most useful is @extendJoinDomain@:
1498 \begin{smallcode}
1499 `extendJoinDomain
1500 :: (OldFact a -> NewFact a -> (ChangeFlag, WithTop a))
1501 -> JoinFun (WithTop a)
1502 \end{smallcode}
1503 A~client can write a join function that \emph{consumes} only facts of
1504 type~@a@, but may produce @Top@ (as well as a fact of type~@a@)---as
1505 in the example of \figref{const-prop} below.
1506 Calling @extendJoinDomain@ extends the client's function to a proper
1507 join function on the type @WithTop a@,
1508 and @extendJoinDomain@ makes sure that joins
1509 involving @Top@ obey the appropriate algebraic laws.
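
For example, a join function for constant-propagation facts can be
written in this style as follows; this is a sketch of the function
@constFactAdd@ that appears in \figref{const-prop} below, and it
assumes a type @Lit@ of literals that supports equality:
\begin{smallcode}
constFactAdd :: OldFact Lit -> NewFact Lit
             -> (ChangeFlag, WithTop Lit)
constFactAdd (OldFact old) (NewFact new)
  | new == old = (NoChange,   PElem new)
  | otherwise  = (SomeChange, Top)
\end{smallcode}
Applying @extendJoinDomain@ to @constFactAdd@ yields a proper
@JoinFun (WithTop Lit)@.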
1510
1511 \hoopl\ also provides a value constructor @`Bot@ and type constructors
1512 @`WithBot@ and @`WithTopAndBot@, along with similar functions.
1513 Constructors @Top@ and @Bot@ are polymorphic, so for example,
1514 @Top@~also has type @WithTopAndBot a@.
1515
1516 It is also very common to use a lattice that takes the form of a
1517 finite~map.
1518 In~such lattices it is typical to join maps pointwise, and \hoopl\
1519 provides a function that makes it convenient to do so:
1520 \begin{smallcode}
1521 `joinMaps :: Ord k => JoinFun v -> JoinFun (Map k v)
1522 \end{smallcode}
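
Composing these combinators yields a complete lattice definition in a
few lines. Here is a sketch of the constant-propagation lattice from
\figref{const-prop}, where a fact is a finite map from @Var@ to
@WithTop Lit@ (we assume @Data.Map@, imported qualified as @Map@):
\begin{smallcode}
type ConstFact = Map.Map Var (WithTop Lit)
constLattice :: DataflowLattice ConstFact
constLattice = DataflowLattice
 { fact_bot  = Map.empty
 , fact_join = joinMaps (extendJoinDomain constFactAdd) }
\end{smallcode}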
1523
1524
1525
1526 \subsection{The transfer function} \seclabel{transfers}
1527
1528 A~forward transfer function is presented with the dataflow fact
1529 coming
1530 into a node, and it computes dataflow fact(s) on the node's outgoing edge(s).
1531 In a forward analysis, \hoopl\ starts with the fact at the
1532 beginning of a block and applies the transfer function to successive
1533 nodes in that block, until eventually the transfer function for the last node
1534 computes the facts that are propagated to the block's successors.
1535 For example, consider doing constant propagation (\secref{constant-propagation}) on
1536 the following graph, with entry at @L1@:
1537 \begin{code}
1538 L1: x=3; goto L2
1539 L2: y=x+4; x=x-1;
1540 if x>0 then goto L2 else return
1541 \end{code}
1542 Forward analysis starts with the bottom fact \{\} at every label
1543 except the entry~@L1@.
1544 The initial fact at~@L1@ is \{@x=@$\top$@,y=@$\top$\}.
1545 Analyzing~@L1@ propagates this fact forward, applying the transfer
1546 function successively to the nodes
1547 of~@L1@, and propagating the new fact \{@x=3,y=@$\top$\} to~@L2@.
1548 This new fact is joined with the existing (bottom) fact at~@L2@.
1549 Now~the analysis propagates @L2@'s fact forward, again applying the transfer
1550 function, and propagating the new fact \mbox{\{@x=2@, @y=7@\}} to~@L2@.
1551 Again the new fact is joined with the existing fact at~@L2@, and the process
1552 repeats until the facts for each label reach a fixed point.
1553
1554 A~transfer function has an unusual sort of type:
1555 not quite a dependent type, but not a bog-standard polymorphic type either.
1556 The result type of the transfer function is \emph{indexed} by the shape (i.e.,
1557 the type) of the node argument:
1558 If the node is open on exit, the transfer function produces a single fact.
1559 But if the node is \emph{closed} on exit,
1560 the transfer function
1561 produces a collection of (@Label@,fact) pairs: one for each outgoing edge.
1562 %
1563 The indexing is expressed by Haskell's (recently added)
1564 \emph{indexed type families}.
1565 The relevant part of \ourlib's interface is given in \figref{transfers}.
1566 A~forward transfer function supplied by a client,
1567 which would be passed to @mkFTransfer@,
1568 is typically a function polymorphic in @e@ and @x@.
1569 It~takes a
1570 node of type \mbox{@n@ @e@ @x@}
1571 and it returns a \emph{fact transformer} of type
1572 @f -> Fact x f@.
1573 Type constructor @Fact@
1574 should be thought of as a type-level function: its signature is given in the
1575 @type family@ declaration, while its definition is given by two @type instance@
1576 declarations. The first declaration says that a @Fact O f@, which
1577 comes out of a node
1578 \emph{open} on exit, is just a fact~@f@.
1579 The second declaration says that a @Fact C f@, which comes out of a
1580 node \emph{closed} on exit, is a mapping from @Label@ to facts.
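
To make the indexing concrete, here is a sketch of the transfer
function @varHasLit@ from \figref{const-prop} below.
The sketch assumes node constructors @Label@, @Assign@, @Branch@,
and @Cond@ with the obvious shapes; expression forms @Var@ and @Lit@;
a literal form @Bool@; and a helper
@mkFactBase :: [(Label, f)] -> FactBase f@:
\begin{smallcode}
varHasLit :: FwdTransfer Node ConstFact
varHasLit = mkFTransfer ft
 where
  ft :: Node e x -> ConstFact -> Fact x ConstFact
  ft (Label _)          f = f   -- Fact O f = f
  ft (Assign x (Lit k)) f = Map.insert x (PElem k) f
  ft (Assign x _)       f = Map.insert x Top f
  ft (Branch l)         f = mkFactBase [(l, f)]
  ft (Cond (Var c) tl fl) f =  -- Fact C f = FactBase f
    mkFactBase [ (tl, Map.insert c (PElem (Bool True))  f)
               , (fl, Map.insert c (PElem (Bool False)) f) ]
  -- remaining node forms elided in this sketch
\end{smallcode}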
1581
1582 %%
1583 %% \begin{code}
1584 %% `transfer_fn :: forall e x . n e x -> f -> Fact x f
1585 %% `node :: n e x
1586 %% \end{code}
1587 %% then @(transfer_fn node)@ is a fact transformer:
1588 %% \begin{code}
1589 %% transfer_fn node :: f -> Fact x f
1590 %% \end{code}
1591 %%
1592
1593
1594 \subsection{The rewrite function and the client's monad}
1595 \seclabel{rewrites}
1596 \seclabel{example-rewrites}
1597
1598
1599 We compute dataflow facts in order to enable code-improving
1600 transformations.
1601 In our constant-propagation example,
1602 the dataflow facts may enable us
1603 to simplify an expression by performing constant folding, or to
1604 turn a conditional branch into an unconditional one.
1605 Similarly, a liveness analysis may allow us to
1606 replace a dead assignment with a no-op.
1607
1608 A @FwdPass@ therefore includes a \emph{rewrite function}, whose
1609 type, @FwdRewrite@, is abstract (\figref{api-types}).
1610 A~programmer creating a rewrite function chooses the type of a node~@n@ and
1611 a dataflow fact~@f@.
1612 A~rewrite function might also want to consume fresh
1613 names (e.g., to label new blocks) or take other actions (e.g., logging
1614 rewrites).
1615 So~that a rewrite function may take actions, \hoopl\
1616 requires that a programmer creating a rewrite function also choose a
1617 monad~@m@.
1618 So~that \hoopl\ may roll back actions taken by speculative rewrites, the monad
1619 must satisfy the constraint @CkpointMonad m@, as described in
1620 \secref{checkpoint-monad}.
1621 The programmer may write code that works with any such monad,
1622 may create a monad just for the client,
1623 or may use a monad supplied by \hoopl.
1624
1625
1626 When these choices are made, the easy way to create a rewrite
1627 function is to call the function @mkFRewrite@ in \figref{api-types}.
1628 The client supplies a function that is specialized to a particular
1629 node, fact, and (possibly) monad, but is polymorphic in the
1630 \emph{shape} of the node to be rewritten.
1631 The function, which we will call~$r$, takes a node and a fact and returns a monadic
1632 computation, but what is the result of that computation?
1633 One might
1634 expect that the result should be a new node, but that is not enough:
1635 in~general, it must be possible for rewriting to result in a graph.
1636 For example,
1637 we might want to remove a node by returning the empty graph,
1638 or more ambitiously, we might want to replace a high-level operation
1639 with a tree of conditional branches or a loop, which would entail
1640 returning a graph containing new blocks with internal control flow.
1641
1642
1643 It~must also be possible for a rewrite function to decide to do nothing.
1644 The result of the monadic computation returned by~$r$ may therefore be
1645 @Nothing@, indicating that the node should
1646 not be rewritten,
1647 or $@Just@\;\ag$, indicating that the node should
1648 be replaced with~\ag: the replacement graph.
1649 %%The additional value $\rw$ tells \hoopl\ whether
1650 %%and how the replacement graph~$\ag$ should be analyzed and rewritten
1651 %%further;
1652 %%we explain~$\rw$ in \secref{shallow-vs-deep}.
1653
1654
1655 The type of @mkFRewrite@ in \figref{api-types} guarantees
1656 that
1657 the replacement graph $\ag$ has
1658 the \emph{same} open/closed shape as the node being rewritten.
1659 For example, a branch instruction can be replaced only by a graph
1660 closed on exit.
1661 %% Moreover, because only an open/open graph can be
1662 %% empty---look at the type of @GNil@ in \figref{graph}---the type
1663 %% of @FwdRewrite@
1664 %% guarantees, at compile time, that no head of a block (closed/open)
1665 %% or tail of a block (open/closed) can ever be deleted by being
1666 %% rewritten to an empty graph.
1667
1668 \subsection{Shallow rewriting, deep rewriting, and rewriting combinators}
1669
1670 \seclabel{shallow-vs-deep}
1671
1672 When a node is rewritten, the replacement graph~$g$
1673 must itself be analyzed, and its nodes may be further rewritten.
1674 \hoopl\ can make a recursive call to
1675 @analyzeAndRewriteFwdBody@---but what @FwdPass@ should it use?
1676 There are two common situations:
1677 \begin{itemize}
1678 \item Sometimes we want to analyze and transform the replacement graph
1679 with an unmodified @FwdPass@, thereby further rewriting the replacement graph.
1680 This procedure is
1681 called \emph{deep rewriting}.
1682 When deep rewriting is used, the client's rewrite function must
1683 ensure that the graphs it produces are not rewritten indefinitely
1684 (\secref{correctness}).
1685 \item
1686 A client may want to analyze the
1687 replacement graph \emph{without} further rewriting.
1688 This procedure is called \emph{shallow rewriting}.
1689 \end{itemize}
1690 Deep rewriting is essential to achieve the full benefits of
1691 interleaved analysis and transformation
1692 \citep{lerner-grove-chambers:2002}.
1693 But shallow rewriting can be vital as well;
1694 for example, a backward dataflow pass that inserts
1695 a spill before a call must not rewrite the call again, lest it attempt
1696 to insert infinitely many spills.
1697
1698
1699
1700 An~innovation of \hoopl\ is to build the choice of shallow or deep
1701 rewriting into each rewrite function, through the use of combinators.
1702 A~function returned by @mkFRewrite@ is shallow by default, but it can
1703 be made deep by
1704 \begin{code}
1705 `iterFwdRw :: FwdRewrite m n f -> FwdRewrite m n f
1706 \end{code}
1707 For~convenience, we~provide the~function @`deepFwdRw@,
1708 which is the composition of @iterFwdRw@ and~@mkFRewrite@.
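
In code, the composition is simply
\begin{smallcode}
deepFwdRw :: FuelMonad m
 => (forall e x . n e x -> f -> m (Maybe (Graph n e x)))
 -> FwdRewrite m n f
deepFwdRw rw = iterFwdRw (mkFRewrite rw)
\end{smallcode}
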
1709 Another useful combinator enables \hoopl\ to try any number of functions to
1710 rewrite a node, provided only that they all use the same fact:
1711 \verbatiminput{comb1}
1712 % defn thenFwdRw
1713 Rewrite function @r1 `thenFwdRw` r2@ first does the rewrites of~@r1@,
1714 then the rewrites of~@r2@.
1715 The rewriter @noFwdRw@ is the identity of @thenFwdRw@.
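
For example, the constant-propagation client of
\secref{const-prop-client} wants both constant propagation and
constant folding, so it combines two rewrite functions that share the
fact type @ConstFact@. Here is a sketch of its pass, which appears in
full at the bottom of \figref{const-prop}:
\begin{smallcode}
constPropPass :: FuelMonad m => FwdPass m Node ConstFact
constPropPass = FwdPass
 { fp_lattice  = constLattice
 , fp_transfer = varHasLit
 , fp_rewrite  = constProp `thenFwdRw` simplify }
\end{smallcode}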
1716 %%
1717 %%
1718 %% by using
1719 %% Rewrite functions are potentially more plentiful than transfer
1720 %% functions, because
1721 %% a~single dataflow fact might justify more than one kind of rewrite.
1722 %% \hoopl\ makes it easy for a client to combine multiple rewrite
1723 %% functions that use the same fact:
1724
1725 These combinators satisfy the algebraic law
1726 \begin{code}
1727 iterFwdRw r = r `thenFwdRw` iterFwdRw r
1728 \end{code}
but unfortunately, this equation cannot be used to \emph{define}
@iterFwdRw@: taken as a definition, it would make @iterFwdRw r@
diverge whenever @r@ declined to rewrite.
1733
1734
1735
1736 %%%% \hoopl\ provides
1737 %%%% a function that makes a shallow rewrite deep:\finalremark{algebraic
1738 %%%% law wanted!}
1739 %%%% \remark{Do we really care about @iterFwdRw@ or can we just drop it?}
1740 %%%% \verbatiminput{iterf}
1741 %%%% % defn iterFwdRw
1742
1743
1744
1745 \subsection{When the type of nodes is not known}
1746
1747 We note above (\secref{transfers}) that
1748 the type of the transfer function's result
1749 depends on the argument's shape on exit.
It~is easy for a client to write a type-indexed transfer function,
because the client defines the constructors of the node type, and each
constructor determines a shape.
1752 The client's transfer functions discriminate on the constructor
1753 and so can return a result that is indexed by each node's shape.
1754
1755 What if you want to write a transfer function that
1756 {does \emph{not} know the type of the node}?
1757 %
1758 For example, a dominator analysis need not scrutinize nodes;
1759 it needs to know only about
1760 labels and edges in the graph.
1761 Ideally, a dominator analysis would work
1762 with \emph{any} type of node~@n@,
1763 provided only that @n@~is an
1764 instance of the @NonLocal@ type class.
1765 But if we don't know
1766 the type of~@n@,
1767 we can't write a function of type
1768 @n e x -> f -> Fact x f@,
1769 because the only way to get the result type right is to
1770 scrutinize the constructors of~@n@.
1771
1772 \seclabel{triple-of-functions}
1773
1774 There is another way;
1775 instead of requiring a single function that is polymorphic in shape,
1776 \hoopl\ will also accept a triple of functions, each of which is
1777 polymorphic in the node's type but monomorphic in its shape:
1778 \begin{code}
1779 `mkFTransfer3 :: (n C O -> f -> Fact O f)
1780 -> (n O O -> f -> Fact O f)
1781 -> (n O C -> f -> Fact C f)
1782 -> FwdTransfer n f
1783 \end{code}
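
For illustration, here is a sketch of a transfer function written with
@mkFTransfer3@ that is polymorphic in the node type: it propagates the
incoming fact unchanged, sending a copy to each successor of a node
closed on exit. It assumes only the @NonLocal@ class and the helper
@mkFactBase :: [(Label, f)] -> FactBase f@:
\begin{smallcode}
passThrough :: NonLocal n => FwdTransfer n f
passThrough = mkFTransfer3 (\_ f -> f) (\_ f -> f) distribute
 where distribute n f = mkFactBase [ (l, f) | l <- successors n ]
\end{smallcode}
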
1784 We have used this interface to write a number of functions that are
1785 polymorphic in the node type~@n@:
1786 \begin{itemize}
1787 \item
1788 A function that takes a @FwdTransfer@ and wraps it in logging code, so
1789 an analysis can be debugged by watching the facts flow through the
1790 nodes
1791 \item
1792 A pairing function that runs two passes interleaved, not sequentially,
1793 potentially producing better results than any sequence:
1794 \verbatiminput{pairf}
1795 % defn pairFwd
1796 \item
1797 An efficient dominator analysis
1798 in the style of
1799 \citet{cooper-harvey-kennedy:simple-dominance},
1800 whose transfer function is implemented
1801 using only the
1802 functions in the \texttt{NonLocal} type class
1803 \end{itemize}
1804
1805
1806 %% Or, you might want to write a combinator that
1807 %% computes the pairwise composition of any two analyses
1808 %% defined on any node type.
1809 %%
1810 %% You cannot write either of these examples
1811 %% using the polymorphic function described in \secref{transfers}
1812 %% because these examples require the ability
1813 %% to inspect the shape of the node.
1814 %%
1815 %%
1816 %% \subsection{Stuff cut from 4.2 that should disappear when the previous section is finished}
1817 %%
1818 %% So much for the interface.
1819 %% What about the implementation?
1820 %% The way GADTs work is that the compiler uses the value constructor for
1821 %% type \mbox{@n@ @e@ @x@} to determine whether @e@~and~@x@ are open or
1822 %% closed.
1823 %% But we want to write functions that are \emph{polymorphic} in the node
1824 %% type~@n@!
1825 %% Such functions include
1826 %% \begin{itemize}
1827 %% \item
1828 %% A function that takes a pair of @FwdTransfer@s for facts @f@~and~@f'@,
1829 %% and returns a single @FwdTransfer@ for the fact @(f, f')@
1830 %% \end{itemize}
1831 %% \simon{These bullets are utterly opaque. They belong in 4.5. The rest of this section
1832 %% is also very hard to understand without more explanation. See email. }
1833 %% Such functions may also be useful in \hoopl's \emph{clients}:
1834 %% % may need
1835 %% %functions that are polymorphic in the node type~@n@:
1836 %% \begin{itemize}
1837 %% \item
1838 %% A dominator analysis in the style of
1839 %% \citet{cooper-harvey-kennedy:simple-dominance} requires only the
1840 %% functions in the \texttt{NonLocal} type class;
1841 %% we have written such an analysis using transfer functions that are
1842 %% polymorphic in~@n@.
1843 %% \end{itemize}
1844 %% Because the mapping from
1845 %% value constructors to shape is different for each node type~@n@, transfer
1846 %% functions cannot be polymorphic in both
1847 %% the representation and the shape of nodes.
1848 %% Our implementation therefore sacrifices polymorphism in shape:
1849 %% internally, \hoopl\ represents
1850 %% a~@FwdTransfer n f@ as a \emph{triple} of functions,
1851 %% each polymorphic in~@n@ but monomorphic in shape:
1852 %% \begin{code}
1853 %% newtype FwdTransfer n f
1854 %% = FwdTransfers ( n C O -> f -> f
1855 %% , n O O -> f -> f
1856 %% , n O C -> f -> FactBase f
1857 %% )
1858 %% \end{code}
1859 %% \simon{I argue strongly that the implementation should be
1860 %% polymorphic too, using a shape classifier where necessary.
1861 %% Regardless of how this debate pans out, though, I think it's
1862 %% a bad idea to introduce triples here. They are simply confusing.}
1863 %% Such triples can easily be composed and wrapped without requiring a
1864 %% pattern match on the value constructor of an unknown node
1865 %% type.\remark{Need to refer to this junk in the conclusion}
1866 %% Working with triples is tedious, but only \hoopl\ itself is forced to
1867 %% work with triples; each client, which knows the node type, may supply
1868 %% a triple,
1869 %% but it typically supplies a single function polymorphic in the shape
1870 %% of the (known) node.
1871 %%
1872 %%
1873
1874 %% \subsection{Composing rewrite functions and dataflow passes} \seclabel{combinators}
1875 %%
1876 %% Requiring each rewrite to return a new rewrite function has another
1877 %% advantage, beyond controlling the shallow-vs-deep choice: it
1878 %% enables a variety of combinators over rewrite functions.
1879 %% \remark{This whole subsection needs to be redone in light of the new
1880 %% (triple-based) representation. It's not pretty any more.}
1881 %% For example, here is a function
1882 %% that combines two rewrite functions in sequence:
1883 %% \remark{This code must be improved}
1884 %% What a beautiful type @thenFwdRw@ has!
1885 %% \remark{And what an ugly implementation! Implementations must go.}
1886 %% It tries @rw1@, and if @rw1@
1887 %% declines to rewrite, it behaves like @rw2@. But if
1888 %% @rw1@ rewrites, returning a new rewriter @rw1a@, then the overall call also
1889 %% succeeds, returning a new rewrite function obtained by combining @rw1a@
1890 %% with @rw2@. (We cannot apply @rw1a@ or @rw2@
1891 %% directly to the replacement graph~@g@,
1892 %% because @r1@~returns a graph and @rw2@~expects a node.)
1893 %% The rewriter @noFwdRw@ is the identity of @thenFwdRw@.
1894 %% Finally, @thenFwdRw@ can
1895 %% combine a deep-rewriting function and a shallow-rewriting function,
1896 %% to produce a rewrite function that is a combination of deep and shallow.
1897 %% %%This is something that the Lerner/Grove/Chambers framework could not do,
1898 %% %%because there was a global shallow/deep flag.
1899 %% %% Our shortsightedness; Lerner/Grove/Chambers is deep only ---NR
1900 %%
1901 %%
1902 %% A shallow rewrite function can be made deep by iterating
1903 %% it:\remark{Algebraic law wanted!}
1904 %% If we have shallow rewrites $A$~and~$B$ then we can build $AB$,
1905 %% $A^*B$, $(AB)^*$,
1906 %% and so on: sequential composition is @thenFwdRw@ and the Kleene star
1907 %% is @iterFwdRw@.\remark{Do we still believe this claim?}
1908 %% \remark{We can't define @iterFwdRew@ in terms of @thenFwdRew@ because
1909 %% the second argument to @thenFwdRew@ would have to be the constant
1910 %% nothing function when applied but would have to be the original triple
1911 %% when passed to @thenFwdRew@ as the second argument (in the recursive
1912 %% call).}
1913 %%
1914 %%
1915 %% The combinators above operate on rewrite functions that share a common
1916 %% fact type and transfer function.
1917 %% It~can also be useful to combine entire dataflow passes that use
1918 %% different facts.
1919 %% We~invite you to write one such combinator, with type
1920 %% \begin{code}
1921 %% `pairFwd :: Monad m
1922 %% => FwdPass m n f1
1923 %% -> FwdPass m n f2
1924 %% -> FwdPass m n (f1,f2)
1925 %% \end{code}
1926 %% The two passes run interleaved, not sequentially, and each may help
1927 %% the other,
1928 %% yielding better results than running $A$~and then~$B$ or $B$~and then~$A$
1929 %% \citep{lerner-grove-chambers:2002}.
1930 %% %% call these passes. ``super-analyses;''
1931 %% %% in \hoopl, construction of super-analyses is
1932 %% %% particularly concrete.
1933
1934 \subsection{Example: Constant propagation and constant folding}
1935 \seclabel{const-prop-client}
1936
1937 % omit Binop :: Operator -> Expr -> Expr -> Expr
1938 % omit Add :: Operator
1939
1940 \begin{figure}
1941 \smallverbatiminput{cprop}
1942 % local node
1943 % defn ConstFact
1944 % defn constLattice
1945 % defn constFactAdd
1946 % defn varHasLit
1947 % local ft
1948 % defn constProp
1949 % local cp
1950 % local lookup
1951 % defn simplify
1952 % local simp
1953 % local s_node
1954 % local s_exp
1955 % defn constPropPass
1956 \caption{The client for constant propagation and constant folding\break (extracted automatically from code distributed with Hoopl)}
1957 \figlabel{const-prop}
1958 \ifpagetuning\vspace*{-1.0\baselineskip}\fi
1959 \end{figure}
1960
1961
1962 \figref{const-prop} shows client code for
1963 constant propagation and constant folding.
1964 For each variable, at each program point, the analysis concludes one
1965 of three facts:
1966 the variable holds a constant value of type~@Lit@,
1967 the variable might hold a non-constant value,
1968 or what the variable holds is unknown.
1969 We~represent these facts using a finite map from a variable to a
1970 fact of type @WithTop Lit@ (\secref{WithTop}).
1971 % Any one procedure has only
1972 % finitely many variables; only finitely many facts are computed at any
1973 % program point; and in this lattice any one fact can increase at most
1974 % twice. These properties ensure that the dataflow engine will reach a
1975 % fixed point.
1976 A~variable with a constant value maps to @Just (PElem k)@, where
1977 @k@~is the constant value;
1978 a variable with a non-constant value maps to @Just Top@;
1979 and a variable with an unknown value maps to @Nothing@ (it is not
1980 in the domain of the finite map).
1981
1982 % \afterpage{\clearpage}
1983
1984 The definition of the lattice (@constLattice@) is straightforward.
1985 The bottom element is an empty map (nothing is known about what
1986 any variable holds).
1987 %
1988 The join function is implemented with the help of combinators provided
1989 by \hoopl.
1990 The client writes a simple function, @constFactAdd@, which
1991 compares two values of type @Lit@ and returns a result of type
1992 @WithTop Lit@.
1993 The client uses @extendJoinDomain@ to lift @constFactAdd@ into a join
1994 function on @WithTop Lit@, then uses
1995 @joinMaps@ to lift \emph{that} join function up to the map
1996 containing facts for all variables.
1997
1998 The forward transfer function @varHasLit@ is defined using the
1999 shape-polymorphic auxiliary function~@ft@.
2000 For most nodes~@n@, @ft n@ simply propagates the input fact forward.
2001 But for an assignment node, if a variable~@x@ gets a constant value~@k@,
2002 @ft@ extends the input fact by mapping @x@ to~@PElem k@.
2003 And if a variable~@x@ is assigned a non-constant value,
2004 @ft@ extends the input fact by mapping @x@ to~@Top@.
2005 There is one other interesting case:
2006 a conditional branch where the condition
2007 is a variable.
If control flows to the true successor, the variable holds @True@;
if it flows to the false successor, the variable holds @False@.
2011 Function @ft@ updates the fact flowing to each successor accordingly.
2012 \finalremark{Because function @ft@ scrutinizes a GADT, it cannot
2013 default the uninteresting cases.}
2014
2015 The transfer function need not consider complicated cases such as
2016 an assignment @x:=y@ where @y@ holds a constant value~@k@.
2017 Instead, we rely on the interleaving of transformation
2018 and analysis to first transform the assignment to @x:=k@,
2019 which is exactly what our simple transfer function expects.
2020 As we mention in \secref{simple-tx},
2021 interleaving makes it possible to write
2022 very simple transfer functions, without missing
2023 opportunities to improve the code.
2024
2025 \figref{const-prop}'s rewrite function for constant propagation, @constProp@,
2026 rewrites each use of a variable to its constant value.
2027 The client has defined auxiliary functions that may change expressions
2028 or nodes:
2029 \begin{smallcode}
2030 type `MaybeChange a = a -> Maybe a
2031 `mapVE :: (Var -> Maybe Expr) -> MaybeChange Expr
2032 `mapEE :: MaybeChange Expr -> MaybeChange Expr
2033 `mapEN :: MaybeChange Expr -> MaybeChange (Node e x)
2034 `mapVN :: (Var -> Maybe Expr) -> MaybeChange (Node e x)
2035 `nodeToG :: Node e x -> Graph Node e x
2036 \end{smallcode}
2037 The client composes @map@\emph{XX} functions
2038 to apply @lookup@ to each use of a variable in each kind of node;
2039 @lookup@ substitutes for each variable that has a constant value.
2040 Applying @liftM nodeToG@ lifts the final node, if present, into a~@Graph@.
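
Putting the pieces together, the shape of @constProp@ is roughly as
follows (a sketch only; the full definition appears in
\figref{const-prop}). We assume an expression form @Lit@ for literals
and @liftM@ from @Control.Monad@:
\begin{smallcode}
constProp :: FuelMonad m => FwdRewrite m Node ConstFact
constProp = mkFRewrite cp
 where
  cp node f = return $ liftM nodeToG $ mapVN (lookup f) node
  lookup f v = case Map.lookup v f of
                 Just (PElem k) -> Just (Lit k)
                 _              -> Nothing
\end{smallcode}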
2041
2042
2043 \figref{const-prop} also gives another, distinct function
2044 for constant
2045 folding: @simplify@.
2046 This function
2047 rewrites constant expressions to their values,
2048 and it rewrites a conditional branch on a
2049 boolean constant to an unconditional branch.
2050 To~rewrite constant expressions,
2051 it runs @s_exp@ on every subexpression.
2052 Function @simplify@ does not check whether a variable holds a
2053 constant value; it relies on @constProp@ to have replaced the
2054 variable by the constant.
2055 Indeed, @simplify@ does not consult the
2056 incoming fact, so it is polymorphic in~@f@.
2057
2058
2059 The @FwdRewrite@ functions @constProp@ and @simplify@
2060 are useful independently.
2061 In this case, however, we
2062 want \emph{both} of them,
2063 so we compose them with @thenFwdRw@.
2064 The composition, along with the lattice and the
2065 transfer function,
2066 goes into @constPropPass@ (bottom of \figref{const-prop}).
2067 Given @constPropPass@, we can
2068 improve a graph~@g@ by passing @constPropPass@ and~@g@
2069 to
2070 @analyzeAndRewriteFwdBody@.
2071
2072
2073
2074 %%%% \subsection{Fixed points and speculative rewrites} \seclabel{fixpoints}
2075 %%%%
2076 %%%% Are rewrites sound, especially when there are loops?
2077 %%%% Many analyses compute a fixed point starting from unsound
2078 %%%% ``facts''; for example, a live-variable analysis starts from the
2079 %%%% assumption that all variables are dead. This means \emph{rewrites
2080 %%%% performed before a fixed point is reached may be unsound, and their results
2081 %%%% must be discarded}. Each iteration of the fixed-point computation must
2082 %%%% start afresh with the original graph.
2083 %%%%
2084 %%%%
2085 %%%% Although the rewrites may be unsound, \emph{they must be performed}
2086 %%%% (speculatively, and possibly recursively),
2087 %%%% so that the facts downstream of the replacement graphs are as accurate
2088 %%%% as possible.
2089 %%%% For~example, consider this graph, with entry at @L1@:
2090 %%%% \par{\small
2091 %%%% \begin{code}
2092 %%%% L1: x=0; goto L2
2093 %%%% L2: x=x+1; if x==10 then goto L3 else goto L2
2094 %%%% \end{code}}
2095 %%%% The first traversal of block @L2@ starts with the unsound ``fact'' \{x=0\};
2096 %%%% but analysis of the block propagates the new fact \{x=1\} to @L2@, which joins the
2097 %%%% existing fact to get \{x=$\top$\}.
2098 %%%% What if the predicate in the conditional branch were @x<10@ instead
2099 %%%% of @x==10@?
2100 %%%% Again the first iteration would begin with the tentative fact \{x=0\}.
2101 %%%% Using that fact, we would rewrite the conditional branch to an unconditional
2102 %%%% branch @goto L3@. No new fact would propagate to @L2@, and we would
2103 %%%% have successfully (and soundly) eliminated the loop.
2104 %%%% This example is contrived, but it illustrates that
2105 %%%% for best results we should
2106 %%%% \begin{itemize}
2107 %%%% \item Perform the rewrites on every iteration.
2108 %%%% \item Begin each new iteration with the original, virgin graph.
2109 %%%% \end{itemize}
2110 %%%% This sort of algorithm is hard to implement in an imperative setting, where rewrites
2111 %%%% mutate a graph in place.
2112 %%%% But with an immutable graph, implementing the algorithm
2113 %%%% is trivially easy: we simply revert to the original graph at the start
2114 %%%% of each fixed-point iteration.
2115
2116 \subsection{Checkpointing the client's monad}
2117
2118 \seclabel{ckpoint-monad} \seclabel{checkpoint-monad}
2119
2120 \begin{ntext}
2121 In the presence of loops, a rewrite function could make a change
2122 that later has to be rolled back.
2123 For example, consider constant propagation in this loop, which
2124 computes factorial:
2125 \begin{code}
2126 i = 1; prod = 1;
2127 L1: if (i >= n) goto L3 else goto L2;
2128 L2: i = i + 1; prod = prod * i;
2129 goto L1;
2130 L3: ...
2131 \end{code}
2132 Function @analyzeAndRewriteFwdBody@ iterates through this graph until
2133 the dataflow facts stop changing.
2134 On~the first iteration, the assignment @i = i + 1@ will
2135 be analyzed with an incoming fact @i=1@, and the assignment will be
2136 rewritten to the graph @i = 2@.
2137 But~on a later iteration, the incoming fact will increase to @i=@$\top$,
2138 and the~rewrite will no longer be justified.
2139 After each iteration, \hoopl\ starts the next iteration with
2140 \emph{new} facts but with the \emph{original} graph---by~virtue
2141 of using purely functional data structures, rewrites from
2142 previous iterations are automatically ``rolled back.''
2143 \emph{But a rewrite function doesn't only produce new graphs.}
2144 A~rewrite function can also take a \emph{monadic action}, such as
2145 acquiring a fresh name.
2146 These~actions must also be rolled back, and because the~client chooses
2147 the monad in which the actions take place, the client must provide the
2148 means to roll them back.
2149
2150 \hoopl\ defines the \emph{interface} that a client must use to roll
2151 back monadic actions.
2152 The~interface is defined by the type class @CkpointMonad@ in
2153 \figref{api-types}:
2154 \begin{code}
2155 class Monad m => `CkpointMonad m where
2156 type `Checkpoint m
2157 `checkpoint :: m (Checkpoint m)
2158 `restart :: Checkpoint m -> m ()
2159 \end{code}
2160 \hoopl\ calls the @checkpoint@ method at the beginning of an
2161 iteration, then calls the @restart@ method if another iteration is
2162 necessary.
2163 These operations must obey the following algebraic law:
2164 \begin{code}
2165 do { s <- checkpoint; m; restart s } == return ()
2166 \end{code}
2167 where @m@~represents any combination of monadic actions that might be
2168 taken by rewrite functions.
2169 (The safest course is to make sure the law holds throughout the entire
2170 monad.)
2171 The~type of the saved checkpoint~@s@ is up to the client;
2172 it is defined as an associated~type.\remark{Is ``associated type'' the
2173 \emph{au courant} phrase?}
2174 \end{ntext}
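
As an illustration (not part of \hoopl), here is a sketch of a
checkpoint instance for a hypothetical client monad that threads a
supply of fresh names. A checkpoint saves the remaining supply, and a
restart restores it, so names consumed by a rolled-back rewrite are
handed out again on the next iteration:
\begin{smallcode}
newtype NameM a = NameM { runNameM :: [String] -> (a, [String]) }

instance Monad NameM where  -- Functor/Applicative elided
  return a      = NameM (\ns -> (a, ns))
  NameM k >>= f = NameM (\ns -> let (a, ns') = k ns
                                in  runNameM (f a) ns')

instance CkpointMonad NameM where
  type Checkpoint NameM = [String]
  checkpoint = NameM (\ns -> (ns, ns))
  restart ns = NameM (\_  -> ((), ns))
\end{smallcode}
The algebraic law holds because the supply is the only state in the
monad, and @restart@ returns it to its saved value.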
2175
2176
2177
2178 \subsection{Correctness} \seclabel{correctness}
2179
2180 % Facts computed by @analyzeAndRewriteFwdBody@ depend on graphs produced by the rewrite
2181 Facts computed by the transfer function depend on graphs produced by the rewrite
2182 function, which in turn depend on facts computed by the transfer function.
How~do we know that this algorithm is sound, or that it terminates?
2184 A~proof requires a POPL paper
2185 \cite{lerner-grove-chambers:2002};
2186 here we merely state the conditions for correctness as applied to \hoopl:
2187 \begin{itemize}
2188 \item
2189 The lattice must have no \emph{infinite ascending chains}; that is,
2190 every sequence of calls to @fact_join@ must eventually return @NoChange@.
2191 \item
2192 The transfer function must be
2193 \emph{monotonic}: given a more informative fact in,
2194 it must produce a more informative fact out.
2195 \item
2196 The rewrite function must be \emph{sound}:
2197 if it replaces a node @n@ by a replacement graph~@g@, then @g@~must be
2198 observationally equivalent to~@n@ under the
2199 assumptions expressed by the incoming dataflow fact~@f@.
2200 %%\footnote{We do not permit a transformation to change
2201 %% the @Label@ of a node. We have not found any optimizations
2202 %% that are prevented (or even affected) by this restriction.}
2203 %
2204 Moreover, analysis of~@g@ must produce output fact(s)
2205 that are at least as informative as the fact(s) produced by applying
2206 the transfer function to~@n@.
2207 %% \item
2208 %% The rewrite function must be \emph{consistent} with the transfer function;
2209 %% that is, \mbox{@`transfer n f@ $\sqsubseteq$ @transfer g f@}.
2210 For example, if the transfer function says that @x=7@ after the node~@n@,
2211 then after analysis of~@g@,
2212 @x@~had better still be~@7@.
2213 \item
2214 % To ensure termination,
2215 A transformation that uses deep rewriting
must not return a replacement graph that
2217 contains a node that could be rewritten indefinitely.
2218 \end{itemize}
2219 %% Without the conditions on monotonicity and consistency,
2220 %% our algorithm will terminate,
2221 %% but there is no guarantee that it will compute
2222 %% a fixed point of the analysis. And that in turn threatens the
2223 %% soundness of rewrites based on possibly bogus ``facts''.
2224 Under these conditions, the algorithm terminates and is
2225 sound.
2226 %%
2227 %% \begin{itemize}
2228 %% \item
2229 %% The algorithm terminates. The fixed-point loop must terminate because the
2230 %% lattice has no infinite ascending chains. And the client is responsible
2231 %% for avoiding infinite recursion when deep rewriting is used.
2232 %% \item
2233 %% The algorithm is sound. Why? Because if each rewrite is sound (in the sense given above),
2234 %% then applying a succession of rewrites is also sound.
2235 %% Moreover, a~sound analysis of the replacement graph
2236 %% may generate only dataflow facts that could have been
2237 %% generated by a more complicated analysis of the original graph.
2238 %% \end{itemize}
2239 %%
2240 %% \finalremark{Doesn't the rewrite have to be have the following property:
2241 %% for a forward analysis/transform, if (rewrite P s) = Just s',
2242 %% then (transfer P s $\sqsubseteq$ transfer P s').
2243 %% For backward: if (rewrite Q s) = Just s', then (transfer Q s' $\sqsubseteq$ transfer Q s).
2244 %% Works for liveness.
2245 %% ``It works for liveness, so it must be true'' (NR).
2246 %% If this is true, it's worth a QuickCheck property!
2247 %% }%
2248 %% \finalremark{Version 2, after further rumination. Let's define
2249 %% $\scriptstyle \mathit{rt}(f,s) = \mathit{transform}(f, \mathit{rewrite}(f,s))$.
2250 %% Then $\mathit{rt}$ should
2251 %% be monotonic in~$f$. We think this is true of liveness, but we are not sure
2252 %% whether it's just a generally good idea, or whether it's actually a
2253 %% precondition for some (as yet unarticulated) property of \ourlib{} to hold.}%
2254
2255 %%%% \simon{The rewrite functions must presumably satisfy
2256 %%%% some monotonicity property. Something like: given a more informative
2257 %%%% fact, the rewrite function will rewrite a node to a more informative graph
2258 %%%% (in the fact lattice.).
2259 %%%% \textbf{NR}: actually the only obligation of the rewrite function is
2260 %%%% to preserve observable behavior. There's no requirement that it be
2261 %%%% monotonic or indeed that it do anything useful. It just has to
2262 %%%% preserve semantics (and be a pure function of course).
2263 %%%% \textbf{SLPJ} In that case I think I could cook up a program that
2264 %%%% would never reach a fixed point. Imagine a liveness analysis with a loop;
2265 %%%% x is initially unused anywhere.
2266 %%%% At some assignment node inside the loop, the rewriter behaves as follows:
2267 %%%% if (and only if) x is dead downstream,
2268 %%%% make it alive by rewriting the assignment to mention x.
2269 %%%% Now in each successive iteration x will go live/dead/live/dead etc. I
2270 %%%% maintain my claim that rewrite functions must satisfy some
2271 %%%% monotonicity property.
2272 %%%% \textbf{JD}: in the example you cite, monotonicity of facts at labels
2273 %%%% means x cannot go live/dead/live/dead etc. The only way we can think
2274 %%%% of not to terminate is infinite ``deep rewriting.''
2275 %%%% }
2276
2277
2278
2279
2280 \section{\ourlib's implementation}
2281 \seclabel{implementation}
2282 \seclabel{engine}
2283 \seclabel{dfengine}
2284
2285 \secref{making-simple}
2286 gives a client's-eye view of \hoopl, showing how to use
2287 it to create analyses and transformations.
2288 \hoopl's interface is simple, but
2289 the \emph{implementation} of interleaved analysis and rewriting is~not.
2290 \citet{lerner-grove-chambers:2002}
2291 do not describe their implementation. We have written
2292 at least three previous implementations, all of which
2293 were long and hard to understand, and only one of which
2294 provided compile-time guarantees about open and closed shapes.
We are not confident that any of our previous implementations is correct.
2296
2297 In this paper we describe a new implementation. It is elegant and short
2298 (about a third of the size of our last attempt), and it offers
2299 strong compile-time guarantees about shapes.
2300 \finalremark{Wanted: enumerate the critical components and give each one's size}
2301 %
2302 We describe the implementation of \emph{forward}
2303 analysis and transformation.
2304 The implementations of backward analysis and transformation are
2305 exactly analogous and are included in \hoopl.
2306
2307 We~also, starting in \secref{first-debugging-section}, explain how we
2308 isolate errors in faulty optimizers, and how the fault-isolation
2309 machinery is integrated with the rest of the implementation.
2310
2311
2312
2313
2314
2315 \subsection{Overview}
2316
2317
2318 %% We on @analyzeAndRewriteFwd@, whose type is more general
2319 %% than that of @analyzeAndRewriteFwdBody@:
2320 %% \begin{smallcode}
2321 %% `analyzeAndRewriteFwd
2322 %% :: forall m n f e x. (FuelMonad m, NonLocal n)
2323 %% => FwdPass m n f -- lattice, transfers, rewrites
2324 %% -> MaybeC e [Label] -- entry points for a closed graph
2325 %% -> Graph n e x -- the original graph
2326 %% -> Fact e f -- fact(s) flowing into the entry/entries
2327 %% -> m (Graph n e x, FactBase f, MaybeO x f)
2328 %% \end{smallcode}
2329 %% We analyze graphs of all shapes; a single @FwdPass@ may be used with
2330 %% multiple shapes.
2331 %% If a graph is closed on entry, a list of entry points must be
2332 %% provided;
2333 %% if the graph is open on entry, it must be the case that the
2334 %% implicit entry point is the only entry point.
2335 %% The fact or set of facts flowing into the entries is similarly
2336 %% determined by the shape of the entry point.
2337 %% Finally, the result is a rewritten graph, a @FactBase@ that gives a
2338 %% fixed point of the analysis on the rewritten graph, and if the graph
2339 %% is open on exit, an ``exit fact'' flowing out.
2340
2341 Instead of the interface function @analyzeAndRewriteFwdBody@, we present
2342 the private function @arfGraph@ (short for ``analyze and rewrite
2343 forward graph''):
2344 \begin{smallcode}
2345 `arfGraph
2346 :: forall m n f e x. (FuelMonad m, NonLocal n)
2347 => FwdPass m n f -- lattice, transfers, rewrites
2348 -> MaybeC e [Label] -- entry points for a closed graph
2349 -> Graph n e x -- the original graph
2350 -> Fact e f -- fact(s) flowing into the entry/entries
2351 -> m (DG f n e x, Fact x f)
2352 \end{smallcode}
2353 Function @arfGraph@ has a more general type than
2354 the function @analyzeAndRewriteFwdBody@ % praying for a decent line break
2355 because @arfGraph@ is used recursively
2356 to analyze graphs of all shapes.
2357 If a graph is closed on entry, a list of entry points must be
2358 provided;
2359 if the graph is open on entry,
2360 the graph's entry sequence must be the only entry point.
2361 The~graph's shape on entry also determines the type of fact or facts
2362 flowing in.
2363 Finally, the result is a ``decorated graph''
2364 @DG f n e x@,
2365 and if the graph
2366 is open on exit, an ``exit fact'' flowing out.
2367
2368 %% \simon{I suggest (a) putting the paragraph break one sentence earlier,
2369 %%% so that this para is all about DGs.}
2370 %%% NR: previous para is about the type of arfGraph; I don't want to
2371 %%% leave the result type dangling. I hope the opening sentence of
2372 %%% this para suggests that the para is all about DGs.
2373 A~``decorated graph'' is one in which each block is decorated with the
2374 fact that holds at the start of the block.
2375 @DG@ actually shares a representation with @Graph@,
2376 which is possible because the definition of
2377 @Graph@ in \figref{graph} contains a white lie: @Graph@~is a type
2378 synonym for an underlying type @`Graph'@, which takes the type of block
2379 as an additional parameter.
2380 (Similarly, function @gSplice@ in \secref{gSplice} is actually a
2381 higher-order function that takes a block-concatenation function as a
2382 parameter.)
2383 The truth about @Graph@ and @DG@ is as follows:
2384 \smallverbatiminput{dg}
2385 % defn DG
2386 % defn DBlock
2387 Type~@DG@ is internal to \hoopl; it is not seen by any client.
2388 To~convert a~@DG@ to the @Graph@ and @FactBase@
2389 that are returned by the API function @analyzeAndRewriteFwdBody@,
2390 we use a 12-line function:
2391 \begin{smallcode}
2392 `normalizeGraph
2393 :: NonLocal n => DG f n e x -> (Graph n e x, FactBase f)
2394 \end{smallcode}
2395
2396 Function @arfGraph@ is implemented as follows:
2397 \begingroup
2398 \def\^{\\[-6pt]}%
2399 \begin{smallttcode}
2400 arfGraph ^pass entries = graph
2401 where\^
2402 node :: forall e x . (ShapeLifter e x)
2403 => n e x -> f -> m (DG f n e x, Fact x f)\^
2404 block:: forall e x .
2405 Block n e x -> f -> m (DG f n e x, Fact x f)\^
2406 body :: [Label] -> LabelMap (Block n C C)
2407 -> Fact C f -> m (DG f n C C, Fact C f)\^
2408 `graph:: Graph n e x -> Fact e f -> m (DG f n e x, Fact x f)\^
2409 ... definitions of 'node', 'block', 'body', and 'graph' ...
2410 \end{smallttcode}
2411 The four auxiliary functions help us separate concerns: for example, only
2412 \endgroup
2413 @node@ knows about rewrite functions;
2414 and only @body@ knows about fixed points.
2415 %% All four functions have access to the @FwdPass@ and any entry points
2416 %% that are passed to @arfGraph@.
2417 %% These functions also have access to type variables bound by
2418 %% @arfGraph@:
2419 %% @n@~is the type of nodes; @f@~is the type of facts;
2420 %% @m@~is the monad used in the rewrite functions of the @FwdPass@;
2421 %% and
2422 %% @e@~and~@x@ give the shape of the graph passed to @arfGraph@.
2423 %% The types of the inner functions are
2424 %% \begin{smallcode}
2425 %% \end{smallcode}
2426 Each auxiliary function works the same way: it~takes a ``thing'' and
2427 returns an \emph{extended fact transformer}.
2428 An~extended fact transformer takes dataflow fact(s) coming into
2429 the ``thing,'' and it returns an output fact.
2430 It~also returns a decorated graph representing the (possibly
2431 rewritten) ``thing''---that's the \emph{extended} part.
2432 Finally, because a rewrite may require fresh names provided
2433 by the client,
2434 may wish to consult state provided by the client,
2435 or may consume ``optimization fuel'' (\secref{fuel}),
2436 every extended fact transformer is monadic.
2437
2438 %%%% \begin{figure}
2439 %%%% SIMON HAS ASKED IF TYPE SYNONYMS MIGHT IMPROVE THINGS FOR EXTENDED
2440 %%%% FACT TRANSFORMERS. JUDGE FOR YOURSELF.
2441 %%%% FIRST, SOMETHING THAT IS SOMEWHAT READABLE BUT IS NOT LEGAL HASKELL:
2442 %%%% \begin{smallcode}
2443 %%%% type EFFX ipt e x = ipt -> m (DG f n e x, Fact x f)
2444 %%%% -- extended forward fact transformer
2445 %%%%
2446 %%%% node :: forall e x . (ShapeLifter e x)
2447 %%%% => n e x -> EFFX f e x
2448 %%%% block :: forall e x .
2449 %%%% Block n e x -> EFFX f e x
2450 %%%% body :: [Label] -> LabelMap (Block n C C)
2451 %%%% -> EFFX (Fact C f) C C
2452 %%%% graph :: Graph n e x -> EFFX (Fact e f) e x
2453 %%%% \end{smallcode}
2454 %%%% IF WE MAKE IT LEGAL HASKELL, IT BECOMES COMPLETELY HOPELESS:
2455 %%%% \begin{smallcode}
2456 %%%% type EFFX m n f ipt e x = ipt -> m (DG f n e x, Fact x f)
2457 %%%% -- extended forward fact transformer
2458 %%%%
2459 %%%% node :: forall e x . (ShapeLifter e x)
2460 %%%% => n e x -> EFFX m n f f e x
2461 %%%% block :: forall e x .
2462 %%%% Block n e x -> EFFX m n f f e x
2463 %%%% body :: [Label] -> LabelMap (Block n C C)
2464 %%%% -> EFFX m n f (Fact C f) C C
2465 %%%% graph :: Graph n e x -> EFFX m n f (Fact e f) e x
2466 %%%% \end{smallcode}
2467 %%%% \caption{EXPERIMENTS WITH TYPE SYNONYMS}
2468 %%%% \end{figure}
2469 %%%%
2470
2471
2472 The types of the four extended fact transformers are not quite
2473 identical:
2474 \begin{itemize}
2475 \item
2476 Extended fact transformers for nodes and blocks have the same type;
2477 like forward transfer functions,
2478 they expect a fact~@f@ rather than the more general @Fact e f@
2479 required for a graph.
2480 Because a node or a block has
2481 exactly one fact flowing into the entry, it is easiest simply to pass
2482 that fact.
2483 \item
2484 Extended fact transformers for graphs have the most general type,
2485 as expressed using @Fact@:
2486 if the graph is open on entry, its fact transformer expects a
2487 single fact;
2488 if the graph is closed on entry, its fact transformer expects a
2489 @FactBase@.
2490 \item
2491 Extended fact transformers for bodies have the same type as
2492 extended fact transformers for closed/closed graphs.
2493 \end{itemize}
2494
2495
2496 Function @arfGraph@ and its four auxiliary functions comprise a cycle of
2497 mutual recursion:
2498 @arfGraph@ calls @graph@;
2499 @graph@ calls @body@ and @block@;
2500 @body@ calls @block@;
2501 @block@ calls @node@;
2502 and
2503 @node@ calls @arfGraph@.
2504 These five functions do three different kinds of work:
2505 compose extended fact transformers, analyze and rewrite nodes, and compute
2506 fixed points.
2507
2508
2509
2510 \subsection{Analyzing blocks and graphs by composing extended fact transformers}
2511 \seclabel{block-impl}
2512
2513 Extended fact transformers compose nicely.
2514 For example, @block@ is implemented thus:
2515 %\ifpagetuning\par\vfilbreak[1in]\fi % horrible page break
2516 \begin{smallcode}
2517 `block :: forall e x .
2518 Block n e x -> f -> m (DG f n e x, Fact x f)
2519 block (BFirst n) = node n
2520 block (BMiddle n) = node n
2521 block (BLast n) = node n
2522 block (BCat b1 b2) = block b1 `cat` block b2
2523 \end{smallcode}
2524 The composition function @cat@ feeds facts from one extended fact
2525 transformer to another, and it splices decorated graphs.
2526 It~has a very general type:
2527 \begin{smallcode}
cat :: forall m n e a x f f1 f2 f3 . Monad m
    => (f1 -> m (DG f n e a, f2))
    -> (f2 -> m (DG f n a x, f3))
    -> (f1 -> m (DG f n e x, f3))
2532 `cat ft1 ft2 f = do { (g1,f1) <- ft1 f
2533 ; (g2,f2) <- ft2 f1
2534 ; return (g1 `dgSplice` g2, f2) }
2535 \end{smallcode}
2536 (Function @`dgSplice@ is the same splicing function used for an ordinary
2537 @Graph@, but it uses a one-line block-concatenation function suitable
2538 for @DBlock@s.)
2539 The name @cat@ comes from the concatenation of the decorated graphs,
2540 but it is also appropriate because the style in which it is used is
2541 reminiscent of @concatMap@, with the @node@ and @block@ functions
2542 playing the role of @map@.
2543
2544 \seclabel{concat-map-style}
2545
2546 Function @graph@ is much like @block@, but it has more cases.
2547
2548
2549 %%%%
2550 %%%% \begin{itemize}
2551 %%%% \item
2552 %%%% The @arfNode@ function processes nodes (\secref{arf-node}).
2553 %%%% It handles the subtleties of interleaved analysis and rewriting,
2554 %%%% and it deals with fuel consumption. It calls @arfGraph@ to analyze
2555 %%%% and transform rewritten graphs.
2556 %%%% \item
2557 %%%% Based on @arfNode@ it is extremely easy to write @arfBlock@, which lifts
2558 %%%% the analysis and rewriting from nodes to blocks (\secref{arf-block}).
2559 %%%%
2560 %%%%
2561 %%%%
2562 %%%% \item
2563 %%%% Using @arfBlock@ we define @arfBody@, which analyzes and rewrites a
2564 %%%% @Body@: a~group of closed/closed blocks linked by arbitrary
2565 %%%% control flow.
2566 %%%% Because a @Body@ is
2567 %%%% always closed/closed and does not take shape parameters, function
2568 %%%% @arfBody@ is less polymorphic than the others; its type is what
2569 %%%% would be obtained by expanding and specializing the definition of
2570 %%%% @ARF@ for a @thing@ which is always closed/closed and is equivalent to
2571 %%%% a @Body@.
2572 %%%%
2573 %%%% Function @arfBody@ takes care of fixed points (\secref{arf-body}).
2574 %%%% \item
2575 %%%% Based on @arfBody@ it is easy to write @arfGraph@ (\secref{arf-graph}).
2576 %%%% \end{itemize}
2577 %%%% Given these functions, writing the main analyzer is a simple
2578 %%%% matter of matching the external API to the internal functions:
2579 %%%% \begin{code}
2580 %%%% `analyzeAndRewriteFwdBody
2581 %%%% :: forall n f. NonLocal n
2582 %%%% => FwdPass n f -> Body n -> FactBase f
2583 %%%% -> FuelMonad (Body n, FactBase f)
2584 %%%%
2585 %%%% analyzeAndRewriteFwdBody pass ^body facts
2586 %%%% = do { (^rg, _) <- arfBody pass body facts
2587 %%%% ; return (normalizeBody rg) }
2588 %%%% \end{code}
2589 %%%%
2590 %%%% \subsection{From nodes to blocks} \seclabel{arf-block}
2591 %%%% \seclabel{arf-graph}
2592 %%%%
2593 %%%% We begin our explanation with the second task:
2594 %%%% writing @arfBlock@, which analyzes and transforms blocks.
2595 %%%% \begin{code}
2596 %%%% `arfBlock :: NonLocal n => ARF (Block n) n
2597 %%%% arfBlock pass (BUnit node) f
2598 %%%% = arfNode pass node f
2599 %%%% arfBlock pass (BCat b1 b2) f
2600 %%%% = do { (g1,f1) <- arfBlock pass b1 f
2601 %%%% ; (g2,f2) <- arfBlock pass b2 f1
2602 %%%% ; return (g1 `DGCatO` g2, f2) }
2603 %%%% \end{code}
2604 %%%% The code is delightfully simple.
2605 %%%% The @BUnit@ case is implemented by @arfNode@.
2606 %%%% The @BCat@ case is implemented by recursively applying @arfBlock@ to the two
2607 %%%% sub-blocks, threading the output fact from the first as the
2608 %%%% input to the second.
2609 %%%% Each recursive call produces a rewritten graph;
2610 %%%% we concatenate them with @DGCatO@.
2611 %%%%
2612 %%%% Function @arfGraph@ is equally straightforward:
2613 %%%% XXXXXXXXXXXXXXX
2614 %%%% The pattern is the same as for @arfBlock@: thread
2615 %%%% facts through the sequence, and concatenate the results.
2616 %%%% Because the constructors of type~@DG@ are more polymorphic than those
2617 %%%% of @Graph@, type~@DG@ can represent
2618 %%%% graphs more simply than @Graph@; for example, each element of a
2619 %%%% @GMany@ becomes a single @DG@ object, and these @DG@ objects are then
2620 %%%% concatenated to form a single result of type~@DG@.
2621 %%%%
2622
2623 \subsection{Analyzing and rewriting nodes} \seclabel{arf-node}
2624
2625 The @node@ function is where we interleave analysis with rewriting:
2626 \remark{Implementation of rewriting is finally revealed---needs to be earlier}
2627 \smallverbatiminput{node}
2628 % defn ShapeLifter
2629 % defn singletonDG
2630 % defn fwdEntryFact
2631 % defn fwdEntryLabel
2632 % defn ftransfer
2633 % defn frewrite
2634 %\begin{smallfuzzcode}{10.5pt}
2635 %node :: forall e x . (ShapeLifter e x, FuelMonad m)
2636 % => n e x -> f -> m (DG f n e x, Fact x f)
2637 %node n f
2638 % = do { ^rew <- frewrite pass n f
2639 % ; case rew of
2640 % Nothing -> return (singletonDG f n,
2641 % ftransfer pass n f)
2642 % Just (FwdRew g rw) ->
2643 % let ^pass' = pass { fp_rewrite = rw }
2644 % f' = fwdEntryFact n f
2645 % in arfGraph pass' (fwdEntryLabel n) g f' }
2646 %
2647 %class `ShapeLifter e x where
2648 % `singletonDG :: f -> n e x -> DG f n e x
2649 % `fwdEntryFact :: NonLocal n => n e x -> f -> Fact e f
2650 % `fwdEntryLabel :: NonLocal n => n e x -> MaybeC e [Label]
2651 % `ftransfer :: FwdPass m n f -> n e x -> f -> Fact x f
2652 % `frewrite :: FwdPass m n f -> n e x
2653 % -> f -> m (Maybe (FwdRew m n f e x))
2654 %\end{smallfuzzcode}
2655
2656 Function @node@ uses @frewrite@ to extract the rewrite function from
2657 @pass@,
2658 and applies the rewrite function to the node~@n@ and the incoming fact~@f@.
2659 The result, @rew@, is
2660 scrutinized by the @case@ expression.
2661
2662 In the @Nothing@ case, no rewrite takes place.
2663 We~return node~@n@ and its incoming fact~@f@
2664 as the decorated graph @singletonDG f n@.
2665 To produce the outgoing fact, we apply the transfer function
2666 @ftransfer pass@ to @n@~and~@f@.
2667
2668 In the @Just@ case, we receive a replacement
2669 graph~@g@ and a new rewrite function~@rw@.
2670 We~recursively analyze @g@ with @arfGraph@.
2671 This analysis uses @pass'@, which contains the original lattice and transfer
2672 function from @pass@, together with the new rewrite function~@rw@.
2673 Function @fwdEntryFact@ converts fact~@f@ from the type~@f@,
2674 which @node@ expects, to the type @Fact e f@, which @arfGraph@ expects.
2675
2676 As you see, several functions called in @node@ are overloaded over a
2677 (private) class @ShapeLifter@, because their implementations depend
2678 on the open/closed shape of the node.
2679 By design, the shape of a node is known statically everywhere @node@
2680 is called, so
2681 this use of @ShapeLifter@ is specialized
2682 away by the compiler.
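To~give the flavor, here is a sketch of how two of the instances
might read, showing only the entry-fact methods.
The sketch assumes a helper
@mkFactBase :: [(Label, f)] -> FactBase f@ and uses the @MaybeC@
constructors @JustC@ and @NothingC@; \hoopl's actual instances
differ in detail:
\begin{smallcode}
instance ShapeLifter C O where  -- first node: closed on entry
  fwdEntryFact  n f = mkFactBase [(entryLabel n, f)]
  fwdEntryLabel n   = JustC [entryLabel n]
instance ShapeLifter O O where  -- middle node: open on entry
  fwdEntryFact  _ f = f
  fwdEntryLabel _   = NothingC
\end{smallcode}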
2683
2684 %% And that's it! If~the client wanted deep rewriting, it is
2685 %% implemented by the call to @arfGraph@;
2686 %% if the client wanted
2687 %% shallow rewriting, the rewrite function will have returned
2688 %% @noFwdRw@ as~@rw@, which is implanted in @pass'@
2689 %% (\secref{shallow-vs-deep}).
2690
2691
2692 \subsection{Fixed points} \seclabel{arf-body}
2693
2694 The fourth and final auxiliary function of @arfGraph@ is
2695 @body@, which iterates to a fixed point.
2696 This is the only really tricky part of the implementation, and it is
2697 cleanly separated from everything else:
2698 \smallverbatiminput{bodyfun}
2699 % defn body
2700 % local do_block
2701 % local blocks
2702 % local lattice
2703 % local entryFact
2704 % local entries
2705 % local init_fbase
2706 % local blockmap
2707 Function @getFact@ looks up a fact by its label.
2708 If the label is not found,
2709 @getFact@ returns
2710 the bottom element of the lattice:
2711 \begin{smallcode}
2712 `getFact :: DataflowLattice f -> Label -> FactBase f -> f
2713 \end{smallcode}
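The implementation is a one-line case analysis; a sketch, assuming
the lattice's bottom element is stored in a field @fact_bot@:
\begin{smallcode}
getFact lat lbl fbase = case lookupFact fbase lbl of
                          Just f  -> f
                          Nothing -> fact_bot lat
\end{smallcode}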
2714 Function @forwardBlockList@ takes a list of possible entry points and
2715 a finite map from labels to blocks.
2716 It returns a list of
2717 blocks, sorted into an order that makes forward dataflow efficient.\footnote
2718 {The order of the blocks does not affect the fixed point or any other
2719 part of the answer; it affects only the number of iterations needed to
2720 reach the fixed point.}
2721 \begin{smallcode}
2722 `forwardBlockList
2723 :: NonLocal n
2724 => [Label] -> LabelMap (Block n C C) -> [Block n C C]
2725 \end{smallcode}
2726 For
2727 example, if the entry point is at~@L2@, and the block at~@L2@
2728 branches to~@L1@, but not vice versa, then \hoopl\ will reach a fixed point
2729 more quickly if we process @L2@ before~@L1@.
2730 To~find an efficient order, @forwardBlockList@ uses
2731 the methods of the @NonLocal@ class---@entryLabel@ and @successors@---to
2732 perform a reverse postorder depth-first traversal of the control-flow graph.
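A~sketch of one possible implementation follows.
It assumes a hypothetical lookup function
@lookupBlock :: LabelMap (Block n C C) -> Label -> Maybe (Block n C C)@,
and it relies on blocks themselves being instances of @NonLocal@, so
that @successors@ applies directly to a block:
\begin{smallcode}
forwardBlockList entries blockmap =
    fst (foldr visit ([], emptyLabelSet) entries)
  where visit lbl (acc, visited)
          | lbl `elemLabelSet` visited = (acc, visited)
          | otherwise =
              case lookupBlock blockmap lbl of
                Nothing -> (acc, visited)
                Just b  ->
                  let (acc', visited') =
                        foldr visit (acc, extendLabelSet visited lbl)
                              (successors b)
                  in (b : acc', visited')
\end{smallcode}
A~block is consed onto the result only after all its successors have
been visited, so the list comes out in reverse postorder.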
2733 %%
2734 %%The @NonLocal@ type-class constraint on~@n@ propagates to all the
2735 %%@`arfThing@ functions.
2736 %% paragraph carrying too much freight
2737 %%
2738
2739 The rest of the work is done by @`fixpoint@, which is shared by
2740 both forward and backward analyses:
2741 \smallverbatiminput{fptype}
2742 % defn Direction
2743 % defn Fwd
2744 % defn Bwd
2745 Except for the @Direction@ passed as the first argument,
2746 the type signature tells the story.
2747 The third argument is an extended fact transformer for a single block;
2748 @fixpoint@ applies that function successively to each block in the list
2749 passed as the fourth argument.
2750 The result is an extended fact transformer for the list.
2751
2752 The extended fact transformer returned by @fixpoint@
2753 maintains a
2754 ``current @FactBase@''
2755 which grows monotonically:
2756 as each block is analyzed,
2757 the block's input fact is taken from
2758 the current @FactBase@,
2759 and the current @FactBase@
2760 is
2761 augmented with the facts that flow out of the block.
2762 %
2763 The initial value of the current @FactBase@ is the input @FactBase@,
2764 and
2765 the extended fact transformer
2766 iterates over the blocks until the current @FactBase@
2767 stops changing.
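The step that folds one block's output facts into the current
@FactBase@ can be sketched as follows; we assume the lattice's join
is a field @fact_extend@ that reports whether the joined fact
differs from the old one, and we ignore the refinement that a change
matters only for labels of blocks in the graph:
\begin{smallcode}
update :: DataflowLattice f -> (Label, f)
       -> (ChangeFlag, FactBase f) -> (ChangeFlag, FactBase f)
update lat (lbl, new) (ch, fbase) =
  case lookupFact fbase lbl of
    Nothing  -> (SomeChange, extendFactBase fbase lbl new)
    Just old ->
      case fact_extend lat old new of
        (NoChange,   _) -> (ch, fbase)
        (SomeChange, f) -> (SomeChange, extendFactBase fbase lbl f)
\end{smallcode}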
2768
2769
2770
2771 Implementing @fixpoint@ requires about 90 lines,
2772 formatted narrowly for display in one column.
2773 %%
2774 %% for completeness, it
2775 %% appears in Appendix~\ref{app:fixpoint}.
2776 The~code is mostly straightforward, although we try to be clever
2777 about deciding when a new fact means that another iteration over the
2778 blocks will be required.
2779 \finalremark{Rest of this \S\ is a candidate to be cut}
2780 There is one more subtle point worth mentioning, which we highlight by
2781 considering a forward analysis of this graph, where execution starts at~@L1@:
2782 \begin{code}
2783 L1: x:=3; goto L4
2784 L2: x:=4; goto L4
2785 L4: if x>3 goto L2 else goto L5
2786 \end{code}
2787 Block @L2@ is unreachable.
2788 But if we \naively\ process all the blocks (say in
2789 order @L1@, @L4@, @L2@), then we will start with the bottom fact for @L2@, propagate
2790 \{@x=4@\} to @L4@, where it will join with \{@x=3@\} to yield
2791 \{@x=@$\top$\}.
2792 Given @x=@$\top$, the
2793 conditional in @L4@ cannot be rewritten, and @L2@~seems reachable. We have
2794 lost a good optimization.
2795
2796 Function @fixpoint@ solves this problem
2797 by analyzing a block only if the block is
2798 reachable from an entry point.
2799 This trick is safe only for a forward analysis, which
2800 is why
2801 @fixpoint@ takes a @Direction@ as its first argument.
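Reachability need not be computed separately: in a forward analysis,
a label acquires a fact only when control can reach it from an entry
point, so it suffices to skip any block whose label is absent from
the current @FactBase@. A~sketch (the helper is hypothetical; the
real code inlines the test):
\begin{smallcode}
reachable :: Direction -> Label -> FactBase f -> Bool
reachable Fwd lbl fbase = lbl `elemFactBase` fbase
reachable Bwd _   _     = True
\end{smallcode}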
2802
2803 %% Although the trick can be implemented in just a couple of lines of
2804 %% code, the reasoning behind it is quite subtle---exactly the sort of
2805 %% thing that should be implemented once in \hoopl, so clients don't have
2806 %% to worry about it.
2807
2808
2809 \subsection{Throttling rewriting using ``optimization fuel''}
2810 \seclabel{vpoiso}
2811 \seclabel{fuel}
2812 \seclabel{whalley-from-s2}
2813 \seclabel{first-debugging-section}
2814
2815 When optimization produces a faulty program,
2816 we use Whalley's \citeyearpar{whalley:isolation} technique to find the fault:
2817 given a program that fails when compiled with optimization,
2818 a binary search on the number of rewrites
2819 finds an~$n$ such that the program works correctly after $n-1$ rewrites
2820 but fails after $n$~rewrites.
2821 The $n$th rewrite is faulty.
2822 As~alluded to at the end of \secref{debugging-introduced}, this
2823 technique enables us to debug complex optimizations by
2824 identifying the one rewrite that is faulty.
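The search itself is only a few lines. In~this sketch, the
hypothetical action @works i@ compiles the program with at most
@i@~rewrites enabled and reports whether the result runs correctly:
\begin{smallcode}
findFaultyRewrite :: (Int -> IO Bool) -> Int -> Int -> IO Int
-- invariant: works lo, fails hi; returns the faulty rewrite
findFaultyRewrite works lo hi
  | lo + 1 == hi = return hi
  | otherwise    = do { ok <- works mid
                      ; if ok then findFaultyRewrite works mid hi
                              else findFaultyRewrite works lo mid }
  where mid = (lo + hi) `div` 2
\end{smallcode}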
2825
2826 This debugging technique requires the~ability to~limit
2827 the number of~rewrites,
2828 which we provide in the form of \emph{optimization fuel}.
2829 If each rewrite consumes one unit of optimization fuel
2830 (and does nothing if there is no remaining fuel),
2831 we can perform the binary search by running the optimization
2832 with different supplies of fuel.
2833 The supply of fuel is encapsulated in the @FuelMonad@ type class (\figref{api-types}),
2834 which must be implemented by the client's monad @m@.
2835 And to~ensure that each rewrite consumes one~unit of~fuel,
2836 the function @mkFRewrite@ wraps the client's rewrite function, which is oblivious to fuel,
2837 in another function that satisfies the following contract:
2838 \begin{itemize}
2839 \item
2840 If the fuel supply is empty, the wrapped function always returns @Nothing@.
2841 \item
2842 If the wrapped function returns @Just g@, it has the monadic effect of
2843 reducing the fuel supply by one unit.
2844 \end{itemize}
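A~sketch of a wrapper that meets this contract, assuming
@FuelMonad@ provides @getFuel@ and @setFuel@ and that fuel is an
integer count:
\begin{smallcode}
withFuel :: FuelMonad m => Maybe a -> m (Maybe a)
withFuel Nothing  = return Nothing
withFuel (Just a) = do { f <- getFuel
                       ; if f == 0 then return Nothing
                         else do { setFuel (f-1)
                                 ; return (Just a) } }
\end{smallcode}
The wrapped rewrite function is then the client's function
post-composed with @withFuel@, so a proposed replacement graph is
kept only when a unit of fuel can be consumed.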
2845
2846 %%%% \seclabel{fuel-monad}
2847 %%%%
2848 %%%% \begin{ntext}
2849 %%%% \subsection{Rewrite functions}
2850 %%%%
2851 %%%%
2852 %%%%
2853 %%%% \begin{code}
2854 %%%% `withFuel :: FuelMonad m => Maybe a -> m (Maybe a)
2855 %%%% \end{code}
2856 %%%%
2857 %%%%
2858 %%%% as expressed by the
2859 %%%% @FwdRew@ type returned by a @FwdRewrite@ (\figref{api-types}).
2860 %%%% The first component of the @FwdRew@ is the replacement graph, as discussed earlier.
2861 %%%% The second component, $\rw$, is a
2862 %%%% \emph{new rewrite function} to use when recursively processing
2863 %%%% the replacement graph.
2864 %%%% For shallow rewriting this new function is
2865 %%%% the constant @Nothing@ function; for deep rewriting it is the original
2866 %%%% rewrite function.
2867 %%%% While @mkFRewrite@ allows for general rewriting, most clients will
2868 %%%% take advantage of \hoopl's support for these two common cases:
2869 %%%% \begin{smallcode}
2870 %%%% `deepFwdRw, `shallowFwdRw
2871 %%%% :: Monad m
2872 %%%% => (forall e x . n e x -> f -> m (Maybe (Graph n e x))
2873 %%%% -> FwdRewrite m n f
2874 %%%% \end{smallcode}
2875 %%%% \end{ntext}
2876
2877
2878
2879 \section {Related work} \seclabel{related}
2880
2881 While there is a vast body of literature on
2882 dataflow analysis and optimization,
2883 relatively little can be found on
2884 the \emph{design} of optimizers, which is the topic of this paper.
2885 We therefore focus on the foundations of dataflow analysis
2886 and on the implementations of some comparable dataflow frameworks.
2887
2888 \paragraph{Foundations}
2889
2890 When transfer functions are monotone and lattices are finite in height,
2891 iterative dataflow analysis converges to a fixed point
2892 \cite{kam-ullman:global-iterative-analysis}.
2893 If~the lattice's join operation distributes over transfer
2894 functions,
2895 this fixed point is equivalent to a join-over-all-paths solution to
2896 the recursive dataflow equations
2897 \cite{kildall:unified-optimization}.\footnote
2898 {Kildall uses meets, not joins.
2899 Lattice orientation is a matter of convention, and conventions have changed.
2900 We use Dana Scott's
2901 orientation, in which higher elements carry more information.}
2902 \citet{kam-ullman:monotone-flow-analysis} generalize to some
2903 monotone functions.
2904 Each~client of \hoopl\ must guarantee monotonicity.
2905
2906 \ifcutting
2907 \citet{cousot:abstract-interpretation:1977}
2908 \else
2909 \citet{cousot:abstract-interpretation:1977,cousot:systematic-analysis-frameworks}
2910 \fi
2911 introduce abstract interpretation as a technique for developing
2912 lattices for program analysis.
2913 \citet{steffen:data-flow-analysis-model-checking:1991} shows that
2914 a dataflow analysis can be implemented using model checking;
2915 \citet{schmidt:data-flow-analysis-model-checking}
2916 expands on this~result by showing that
2917 an all-paths dataflow problem can be viewed as model checking an
2918 abstract interpretation.
2919
2920 \citet{marlowe-ryder:properties-data-flow-frameworks}
2921 present a survey of different methods for performing dataflow analyses,
2922 with emphasis on theoretical results.
2923 \citet{muchnick:compiler-implementation}
2924 presents many examples of both particular analyses and related
2925 algorithms.
2926
2927
2928 The soundness of interleaving analysis and transformation,
2929 even when not all speculative transformations are performed on later
2930 iterations, was shown by
2931 \citet{lerner-grove-chambers:2002}.
2932
2933
2934 \paragraph{Frameworks}
2935 Most dataflow frameworks support only analysis, not transformation.
2936 The framework computes a fixed point of transfer functions, and it is
2937 up to the client of
2938 the framework to use that fixed point for transformation.
2939 Omitting transformation makes it much easier to build frameworks,
2940 and one can find a spectrum of designs.
2941 We~describe two representative
2942 designs, then move on to the prior frameworks that support interleaved
2943 analysis and transformation.
2944
2945 The Soot framework is designed for analysis of Java programs
2946 \cite{hendren:soot:2000}.\finalremark{This citation is probably the
2947 best for Soot in general, but there doesn't appear
2948 to be any formal publication that actually details the dataflow
2949 framework part. ---JD}
2950 While Soot's dataflow library supports only analysis, not
2951 transformation, we found much
2952 to admire in its design.
2953 Soot's library is abstracted over the representation of
2954 the control-flow graph and the representation of instructions.
2955 Soot's interface for defining lattice and analysis functions is
2956 like our own,
2957 although because Soot is implemented in an imperative style,
2958 additional functions are needed to copy lattice elements.
2959
2960
2961 The CIL toolkit \cite{necula:cil:2002}\finalremark{No good citation
2962 for the same reason as Soot above ---JD}
2963 supports both analysis and rewriting of C~programs,
2964 but rewriting is clearly distinct from analysis:
2965 one runs an analysis to completion and then rewrites based on the
2966 results.
2967 The framework is limited to one representation of control-flow graphs
2968 and one representation of instructions, both of which are provided by
2969 the framework.
2970 The~API is complicated;
2971 much of the complexity is needed to enable the client to
2972 affect which instructions
2973 the analysis iterates over.
2974
2975
2976 \finalremark{FYI, LLVM has Pass Managers that try to control the order of passes,
2977 but I'll be darned if I can find anything that might be termed a dataflow framework.}
2978
2979 The Whirlwind compiler contains the dataflow framework implemented
2980 by \citet{lerner-grove-chambers:2002}, who were the first to
2981 interleave analysis and transformation.
2982 Their implementation is much like our early efforts:
2983 it is a complicated mix of code that simultaneously manages interleaving,
2984 deep rewriting, and fixed-point computation.
2985 By~separating these tasks,
2986 our implementation simplifies the problem dramatically.
2987 Whirlwind's implementation also suffers from the difficulty of
2988 maintaining pointer invariants in a mutable representation of
2989 control-flow graphs, a problem we have discussed elsewhere
2990 \cite{ramsey-dias:applicative-flow-graph}.
2991
2992 Because speculative transformation is difficult in an imperative setting,
2993 Whirlwind's implementation is split into two phases.
2994 The first phase runs the interleaved analyses and transformations
2995 to compute the final dataflow facts and a representation of the transformations
2996 that should be applied to the input graph.
2997 The second phase executes the transformations.
2998 In~\hoopl, because control-flow graphs are immutable, speculative transformations
2999 can be applied immediately, and there is no need
3000 for a phase distinction.
3001
3002 %%% % repetitious...
3003 %%%
3004 %%% \ourlib\ also improves upon Whirlwind's dataflow framework by providing
3005 %%% new support for the optimization writer:
3006 %%% \begin{itemize}
3007 %%% \item Using static type guarantees, \hoopl\ rules out a whole
3008 %%% class of possible bugs: transformations that produced malformed
3009 %%% control-flow graphs.
3010 %%% \item Using dynamic testing,
3011 %%% we can isolate the rewrite that transforms a working program
3012 %%% into a faulty program,
3013 %%% using Whalley's \citeyearpar{whalley:isolation} fault-isolation technique.
3014 %%% \end{itemize}
3015
3016 %% what follows now looks redundant with discussion below ---NR
3017
3018 %% In previous work \cite{ramsey-dias:applicative-flow-graph}, we
3019 %% described a zipper-based representation of control-flow
3020 %% graphs, stressing the advantages
3021 %% of immutability.
3022 %% Our new representation, described in \secref{graph-rep}, is a significant improvement:
3023 %% \begin{itemize}
3024 %% \item
3025 %% We can concatenate nodes, blocks, and graphs in constant time.
3026 %% %Previously, we had to resort to Hughes's
3027 %% %\citeyearpar{hughes:lists-representation:article} technique, representing
3028 %% %a graph as a function.
3029 %% \item
3030 %% We can do a backward analysis without having
3031 %% to ``unzip'' (and allocate a copy of) each block.
3032 %% \item
3033 %% Using GADTs, we can represent a flow-graph
3034 %% node using a single type, instead of the triple of first, middle, and
3035 %% last types used in our earlier representation.
3036 %% This change simplifies the interface significantly:
3037 %% instead of providing three transfer functions and three rewrite
3038 %% functions per pass---one for
3039 %% each type of node---a client of \hoopl\ provides only one transfer
3040 %% function and one rewrite function per pass.
3041 %% \item
3042 %% Errors in concatenation are ruled out at
3043 %% compile-compile time by Haskell's static
3044 %% type system.
3045 %% In~earlier implementations, such errors were not detected until
3046 %% the compiler~ran, at which point we tried to compensate
3047 %% for the errors---but
3048 %% the compensation code harbored subtle faults,
3049 %% which we discovered while developing a new back end
3050 %% for the Glasgow Haskell Compiler.
3051 %% \end{itemize}
3052 %%
3053 %% The implementation of \ourlib\ is also much better than
3054 %% our earlier implementations.
3055 %% Not only is the code simpler conceptually,
3056 %% but it is also shorter:
3057 %% our new implementation is about a third as long
3058 %% as the previous version, which is part of GHC, version 6.12.
3059
3060
3061
3062
3063 \section{Performance considerations}
3064
3065 Our work on \hoopl\ is too new for us to be able to say much
3066 about performance.
3067 It's~important to know how well \hoopl\ performs, but the
3068 question is comparative, and there isn't another library we can compare
3069 \hoopl\ with.
3070 For example, \hoopl\ is not a drop-in replacement for an existing
3071 component of GHC; we introduced \hoopl\ to GHC as part of a
3072 major refactoring of GHC's back end.
3073 The version of GHC with \hoopl\ seems about 15\%~slower than
3074 the previous version, and we
3075 don't know what portion of the slowdown can be attributed to the
3076 optimizer.
3077 %
3078 What we can say is that the costs of using \hoopl\ seem reasonable;
3079 there is no ``big performance hit.''
3080 And~a somewhat similar library, written in an \emph{impure} functional
3081 language, actually improved performance in an apples-to-apples
3082 comparison with a library using a mutable control-flow graph
3083 \cite{ramsey-dias:applicative-flow-graph}.
3084
3085 Although a~thorough evaluation of \hoopl's performance must await
3086 future work, we can identify some design decisions that affect
3087 performance.
3088 \begin{itemize}
3089 \item
3090 In \figref{graph}, we show a single concatenation operator for blocks.
3091 Using this representation, a block of $N$~nodes is represented using
3092 $2N-1$ heap objects.
3093 We~have also implemented a representation of blocks that includes
3094 ``cons-like'' and ``snoc-like'' constructors
3095 (sketched after this list); this representation requires only $N+1$ heap objects.
3096 We~don't know what difference this choice makes to performance.
3097 \item
3098 In \secref{engine}, the @body@ function analyzes and (speculatively)
3099 rewrites the body of a control-flow graph, and @fixpoint@ iterates
3100 this analysis until it reaches a fixed point.
3101 Decorated graphs computed on earlier iterations are thrown away.
3102 But~for each decorated graph of $N$~nodes, it is necessary to allocate
3103 at least $2N-1$~thunks, which correspond to applications of
3104 @singletonDG@ in~@node@ and of @dgSplice@ in~@cat@.
3105 In~an earlier version of \hoopl, this overhead was
3106 eliminated by splitting @arfGraph@ into two very similar functions: one to compute the
3107 fixed point, and the other to produce the rewritten graph.
3108 Having a single version of @arfGraph@ is simpler and easier
3109 to maintain, but we don't know the cost of allocating the
3110 additional thunks.
3111 \item
3112 The representation of a forward-transfer function is private to
3113 \hoopl.
3114 Two representations are possible:
3115 we may store a triple of functions, one for each shape a node may
3116 have;
3117 or we may store a single, polymorphic function.
3118 If~we use triples throughout, the costs are straightforward, but the
3119 code is complex.
3120 If~we use a single, polymorphic function, we sometimes have to use a
3121 \emph{shape classifier} (supplied by the client) when composing
3122 transfer functions.
3123 Using a shape classifier may introduce extra @case@ discriminations
3124 every time a transfer function or rewrite function is applied to a
3125 node.
3126 We~don't know how these extra discriminations might affect
3127 performance.
3128 \end{itemize}
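To~make the first point concrete, here is a sketch of the richer
block representation; the two added constructors are illustrative,
not \hoopl's actual definition:
\begin{code}
data Block n e x where
  BFirst  :: n C O -> Block n C O
  BMiddle :: n O O -> Block n O O
  BLast   :: n O C -> Block n O C
  BCat    :: Block n e O -> Block n O x -> Block n e x
  BHead   :: Block n e O -> n O O -> Block n e O  -- snoc-like
  BTail   :: n O O -> Block n O x -> Block n O x  -- cons-like
\end{code}
A~closed/closed block of $N$~nodes can then be built from one
@BFirst@, $N-2$ applications of @BHead@, one @BLast@, and a single
@BCat@: $N+1$ heap objects in all.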
3129 In summary, \hoopl\ performs well enough for use in~GHC,
3130 but there is much we don't know. Systematic investigation
3131 is indicated.
3132
3133
3134
3135 \section{Discussion}
3136
3137 We built \hoopl\ in order to combine three good ideas (interleaved
3138 analysis and transformation, optimization fuel, and an applicative
3139 control-flow graph) in a way that could easily be reused by many
3140 compiler writers.
3141 To~evaluate how well we succeeded, we examine how \hoopl\ has been
3142 used,
3143 we~examine the API, and we examine the implementation.
3144
3145 \paragraph{Using \hoopl}
3146
3147 As~suggested by the constant-propagation example in
3148 \figref{const-prop}, \hoopl\ makes it easy to implement many standard
3149 dataflow analyses.
3150 Students using \hoopl\ in a class at Tufts were able to implement
3151 such optimizations as lazy code motion \cite{knoop:lazy-code-motion}
3152 and induction-variable elimination
3153 \cite{cocke-kennedy:operator-strength-reduction} in just a few weeks.
3154 Students at Yale and at Portland State have also succeeded in building
3155 a variety of optimizations.
3156
3157 \hoopl's data types can support optimizations beyond classic
3158 dataflow.
3159 For example, within GHC, \hoopl's graphs are used
3160 to implement optimizations based on control flow,
3161 such as eliminating branch chains.
3162
3163 \hoopl\ is SSA-neutral:
3164 although we know of no attempt to use
3165 \hoopl\ to establish or enforce SSA~invariants,
3166 \hoopl\ makes it easy to include $\phi$-functions in the
3167 representation of first nodes,
3168 and if a transformation preserves SSA~invariants, it will continue to do
3169 so when implemented in \hoopl.
3170
3171 \paragraph{Examining the API}
3172
3173 We hope our presentation of the API in \secref{api} speaks for
3174 itself,
3175 but there are a couple of properties we think are worth highlighting.
3176 First, it's a good sign that the API provides many higher-order
3177 combinators that make it easier to write client code. % with simple, expressive types.
3178 We~have had space to mention only a few:
3179 @extendJoinDomain@,
3180 @thenFwdRw@, @deepFwdRw@, and @pairFwd@.
3181
3182 Second,
3183 the static encoding of open and closed shapes at compile time worked
3184 out well.
3185 % especially because it applies equally to nodes, blocks, and graphs.
3186 Shapes may
3187 seem like a small refinement, but they helped eliminate a number of
3188 bugs from GHC, and we expect them to help other clients too.
3189 GADTs are a convenient way to express shapes, and for clients
3190 written in Haskell, they are clearly appropriate.
3191 If~one wished to port \hoopl\ to a language without GADTs,
3192 many of the benefits could be realized by making the shapes phantom
3193 types, but without GADTs, pattern matching would be significantly more
3194 tedious and error-prone.
3195
3196
3197 % An~advantage of our ``shapely'' node API is that a client can
3198 % write a \emph{single} transfer function that is polymorphic in shape.
3199 % To~make this design work, however, we \emph{must} have
3200 % the type-level function
3201 % @Fact@ (\figref{api-types}), to express how incoming
3202 % and outgoing facts depend on the shape of a node.
3203 % Without type-level functions, we would have had to force clients to
3204 % use \emph{only} the triple-of-functions interface described in
3205 % \secref{triple-of-functions}.
3206
3207 \paragraph{Examining the implementation}
3208
3209 If you are thinking of adopting \hoopl, you had better consider not
3210 only whether you like the API, but whether, in case of emergency, you
3211 could maintain the implementation.
3212 We~believe that \secref{dfengine} sketches enough to show that \hoopl's
3213 implementation is a clear improvement over previous implementations
3214 of similar ideas.
3215 % The implementation is more difficult to evaluate than the~API.
3216 % Previous implementations of similar ideas have rolled the problems
3217 % into a big ball of mud.
3218 By~decomposing our implementation into @node@, @block@, @body@,
3219 @graph@, @cat@, @fixpoint@, and @mkFRewrite@, we have clearly separated
3220 multiple concerns:
3221 interleaving analysis with rewriting,
3222 throttling rewriting using optimization fuel,
3223 and
3224 computing a fixed point using speculative rewriting.
3225 Because of this separation of concerns,
3226 we believe our implementation will be much easier to maintain than
3227 anything that preceded it.
3228
3229 Another good sign is that we have been able to make substantial
3230 changes in the implementation without changing the API.
3231 For example, in addition to ``@concatMap@ style,'' we have also
3232 implemented @arfGraph@ in ``fold style'' and in continuation-passing
3233 style.
3234 Which style is preferred is a matter of taste, and possibly
3235 a matter of performance.
3236
3237
3238
3239
3240 \iffalse
3241
3242 (We have also implemented a ``fold style,'' but because the types are
3243 a little less intuitive, we are sticking with @concatMap@ style for now.)
3244
3245
3246 > Some numbers, I have used it nine times, and would need the general fold
3247 > once to define blockToNodeList (or CB* equivalent suggested by you).
3248 > (We are using it in GHC to
3249 > - computing hash of the blocks from the nodes
3250 > - finding the last node of a block
3251 > - converting block to the old representation (2x)
3252 > - computing interference graph
3253 > - counting Area used by a block (2x)
3254 > - counting stack high-water mark for a block
3255 > - prettyprinting block)
3256
3257
3258 type-parameter hell, newtype hell, typechecking hell, instance hell,
3259 triple hell
3260
3261 \fi
3262
3263
3264 % We have spent six years implementing and reimplementing frameworks for
3265 % dataflow analysis and transformation.
3266 % This formidable design problem taught us
3267 % two kinds of lessons:
3268 % we learned some very specific lessons about representations and
3269 % algorithms for optimizing compilers,
3270 % and we were forcibly reminded of some very general, old lessons that are well
3271 % known not just to functional programmers, but to programmers
3272 % everywhere.
3273
3274
3275
3276 %%%% \remark{Orphaned: but for transfer functions that
3277 %%%% approximate weakest preconditions or strongest postconditions,
3278 %%%% monotonicity falls out naturally.}
3279 %%%%
3280 %%%%
3281 %%%% In conclusion we offer the following lessons from the experience of designing
3282 %%%% and implementing \ourlib{}.
3283 %%%% \begin{itemize}
3284 %%%% \item
3285 %%%% Although we have not stressed this point, there is a close connection
3286 %%%% between dataflow analyses and program logic:
3287 %%%% \begin{itemize}
3288 %%%% \item
3289 %%%% A forward dataflow analysis is represented by a predicate transformer
3290 %%%% that is related to \emph{strongest postconditions}
3291 %%%% \cite{floyd:meaning}.\footnote
3292 %%%% {In Floyd's paper the output of the predicate transformer is called
3293 %%%% the \emph{strongest verifiable consequent}, not the ``strongest
3294 %%%% postcondition.''}
3295 %%%% \item
3296 %%%% A backward dataflow analysis is represented by a predicate transformer
3297 %%%% that is related to \emph{weakest preconditions} \cite{dijkstra:discipline}.
3298 %%%% \end{itemize}
3299 %%%% Logicians write down the predicate transformers for the primitive
3300 %%%% program fragments, and then use compositional rules to ``lift'' them
3301 %%%% to a logic for whole programs. In the same way \ourlib{} lets the client
3302 %%%% write simple predicate transformers,
3303 %%%% and local rewrites based on those assertions, and ``lifts'' them to entire
3304 %%%% function bodies with arbitrary control flow.
3305
3306 \iffalse
3307
3308
3309 Reuse requires abstraction, and as is well known,
3310 designing good abstractions is challenging.
3311 \hoopl's data types and the functions over those types have been
3312 through {dozens} of revisions.
3313 \remark{dozens alert}
3314 As~we were refining our design, we~found it invaluable to operate in
3315 two modes:
3316 In the first mode, we designed, built, and used a framework as an
3317 important component of a real compiler (first Quick~{\PAL}, then GHC).
3318 In the second mode, we designed and built a standalone library, then
3319 redesigned and rebuilt it, sometimes going through several significant
3320 changes in a week.
3321 Operating in the first mode---inside a live compiler---forced us to
3322 make sure that no corners were cut, that we were solving a real
3323 problem, and that we did not inadvertently
3324 cripple some other part of the compiler.
3325 Operating in the second mode---as a standalone library---enabled us to
3326 iterate furiously, trying out many more ideas than would have
3327 been possible in the first mode.
3328 Alternating between these two modes has led to a
3329 better design than operating in either mode alone.
3330
3331 %% We were forcibly reminded of timeless truths:
3332 It is a truth universally acknowledged that
3333 interfaces are more important than implementations and data
3334 is more important than code.
3335 This truth is reflected in this paper, in which
3336 we
3337 have given \hoopl's API three times as much space as \hoopl's implementation.
3338
3339 We have evaluated \hoopl's API through small, focused classroom
3340 projects and by using \hoopl\ in the back end of the Glasgow Haskell
3341 Compiler.
3342
3343
3344
3345 We were also reminded that Haskell's type system (polymorphism, GADTs,
3346 higher-order functions, type classes, and so on) is a remarkably
3347 effective
3348 language for thinking about data and code---and that
3349 Haskell lacks a language of interfaces (like ML's signatures) that
3350 would make it equally effective for thinking about APIs at a larger scale.
3351 Still, as usual, the types were a remarkable aid to writing the code:
3352 when we finally agreed on the types presented above, the
3353 code almost wrote itself.
3354
3355 Types are widely appreciated at ICFP, but here are three specific
3356 examples of how types helped us:
3357 \begin{itemize}
3358 \item
3359 Reuse is enabled by representation-independence, which in a functional
3360 language is
3361 expressed through parametric polymorphism.
3362 Making \ourlib{} polymorphic in the nodes
3363 made the code simpler, easier to understand, and easier to maintain.
3364 In particular, it forced us to make explicit \emph{exactly} what
3365 \ourlib\ must know about nodes, and to embody that knowledge in the
3366 @NonLocal@ type class (\secref{nonlocal-class}).
3367 \item
3368 \remark{too much? Simon: better?}
3369 %
3370 % this paper is just not about run-time performance ---NR
3371 %
3372 %%%% Moreover, the implementation is faster than it would otherwise be,
3373 %%%% because, say, a @(Fact O f)e@ is known to be just an @f@ rather than
3374 %%%% being a sum type that must be tested (with a statically known outcome!).
3375 %
3376 Giving the \emph{same} shapes
3377 to nodes, blocks, and graphs helped our
3378 thinking and helped to structure the implementation.
3379 \item
3380 \end{itemize}
3381
3382 \fi
3383
3384 \paragraph{Alternative interfaces and implementations}
3385 \newtext{
3386 Why do we allow the client to define the monad~@m@ used in rewrite
3387 functions and in @analyzeAndRewriteFwdBody@?
3388 \remark{OK, I no longer find this~\P shitty\ldots}
3389 The~obvious alternative, which we have implemented and explored, is to require
3390 \hoopl's clients to use a monad provided by \hoopl.
3391 This alternative has advantages:
3392 we implement
3393 @checkpoint@, @restart@,
3394 @setFuel@, and @getFuel@,
3395 and we make sure they are right, without placing any obligation on
3396 the client.
3397 Our~alternative API also provides a supply of unique names.
3398 But~we are wary of mandating the representation of such a central
3399 abstraction as a supply of unique names;
3400 no~matter how well designed the~API,
3401 if it does not accommodate design choices already embodied in existing
3402 compilers, it won't be used.
3403 %
3404 And~experience has shown us that for the client, the convenience
3405 of a custom monad is well worth the cognitive costs of understanding
3406 the more complex API and of meeting the contracts for
3407 instances of @FuelMonad@ and @CkpointMonad@.
3408 As~a very simple example, a client might want one set of
3409 unique names for labels and a different set for registers.
3410 Or~in order to judge the effectiveness of an optimization,
3411 a client might want to log how many rewrites take place, or in what
3412 functions they take place.
3413 As~a more ambitious example, \citet{runciman:increasing-prs} describes
3414 Primitive Redex
3415 Speculation, a code-improving transformation that can create new
3416 function definitions.
3417 A~\hoopl\ client implementing this transformation would define a monad
3418 to accumulate the new definitions.
3419 In~these examples, the law governing @checkpoint@ and @restart@
3420 ensures that a speculative rewrite, if later rolled back, does not
3421 create a log entry or a function definition.
3422 }
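To~sketch the rewrite-logging example, suppose @CkpointMonad@
provides @checkpoint@ and @restart@, with the type of checkpoints
given by an associated type (an assumption on our part). A~client
monad can then capture and reinstate its log:
\begin{smallcode}
newtype LogM a = LogM { runLogM :: [String] -> (a, [String]) }

instance Monad LogM where
  return a     = LogM (\log -> (a, log))
  LogM m >>= k = LogM (\log -> case m log of
                                 (a, log') -> runLogM (k a) log')

logRewrite :: String -> LogM ()  -- called by rewrite functions
logRewrite msg = LogM (\log -> ((), msg:log))

instance CkpointMonad LogM where
  type Checkpoint LogM = [String]         -- assumed associated type
  checkpoint = LogM (\log -> (log, log))  -- capture the log
  restart s  = LogM (\_   -> ((), s))     -- reinstate it
\end{smallcode}
Because @restart@ discards any entries made since the matching
@checkpoint@, a speculative rewrite that is rolled back leaves no
trace in the log, exactly as the law requires.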
3423
3424 \begin{ntext}
3425 In~addition to major API variations like ``who controls the monad,''
3426 we have considered myriad minor variations,
3427 most of them driven by
3428 implementation concerns.
3429 There are some variations in implementation that don't affect the
3430 interface, but there are also many that have small effects.
3431 We~have space only to raise a few of the questions.
3432 An~earlier implementation of the fixed-point computation stored the
3433 ``current'' @FactBase@ in a monad; we~find it easier to understand and
3434 maintain the code that passes the current @FactBase@ as an argument.
3435 \secref{implementation} describes an implementation of @arfGraph@
3436 in what we call ``@concatMap@ style'' (\secref{concat-map-style}).
3437 We~have also implemented a ``fold style'' and continuation-passing
3438 style.
3439 None of these styles obviously makes all the code easiest to read
3440 and understand: @concatMap@ style is relatively straightforward throughout;
3441 fold~style is similar overall but different in detail;
3442 and continuation-passing style is very clear and elegant to those who
3443 like continuations, but a bit baffling to those who don't.
3444 Another question is which value constructors should be
3445 polymorphic in the shapes of their arguments and which should be
3446 monomorphic.
3447 We~experimented with a polymorphic
3448 \begin{code}
3449 `BNode :: n e x -> Block n e x
3450 \end{code}
3451 but we found that there are significant advantages to knowing the type
3452 of every node statically, using purely local information---so instead
3453 we use
3454 the three monomorphic constructors @BFirst@, @BMiddle@, and @BLast@
3455 (\figref{graph}).
3456 Similar questions arise about the polymorphic @BCat@ and about the
3457 graph constructors, and even among ourselves, we are divided about how
3458 best to answer them.
3459 Yet another question is whether it is worthwhile to save a level of
3460 indirection by providing a cons-like constructor to concatenate a node
3461 and a block.
3462 Perhaps some of these questions can be answered by appealing to
3463 performance, but the experiments that will provide the answers have
3464 yet to be devised.
3465 \end{ntext}
3466
3467
3468
3469
3470 \paragraph{Final remarks}
3471
3472 Dataflow optimization is usually described as a way to improve imperative
3473 programs by mutating control-flow graphs.
3474 Such transformations appear very different from the tree rewriting
3475 that functional languages are so well known for, and that makes
3476 \ifhaskellworkshop
3477 Haskell
3478 \else
3479 functional languages
3480 \fi
3481 so attractive for writing other parts of compilers.
3482 But even though dataflow optimization looks very different from
3483 what we are used to,
3484 writing a dataflow optimizer
3485 in
3486 \ifhaskellworkshop
3487 Haskell
3488 \else
3489 a pure functional language
3490 \fi
3491 was a win:
3492 % We could not possibly have conceived \ourlib{} in C++.
3493 we had to make every input and output explicit,
3494 and we had a strong incentive to implement things compositionally.
3495 Using Haskell helped us make real improvements in the implementation of
3496 some very sophisticated ideas.
3497 % %%
3498 % %%
3499 % %% In~a pure functional language, not only do we know that
3500 % %% no data structure will be unexpectedly mutated,
3501 % %% but we are forced to be
3502 % %% explicit about every input and output,
3503 % %% and we are encouraged to implement things compositionally.
3504 % %% This kind of thinking has helped us make
3505 % %% significant improvements to the already tricky work of Lerner, Grove,
3506 % %% and Chambers:
3507 % %% per-function control of shallow vs deep rewriting
3508 % %% (\secref{shallow-vs-deep}),
3509 % %% optimization fuel (\secref{fuel}),
3510 % %% and transparent management of unreachable blocks (\secref{arf-body}).
3511 % We~trust that the improvements are right only because they are
3512 % implemented in separate
3513 % parts of the code that cannot interact except through
3514 % explicit function calls.
3515 % %% %%
3516 % %% %%An ancestor of \ourlib{} is in the Glasgow Haskell Compiler today,
3517 % %% %%in version~6.12.
3518 % %% With this new, improved design in hand, we are now moving back to
3519 % %% live-compiler mode, pushing \hoopl\ into version
3520 % %% 6.13 of the Glasgow Haskell Compiler.
3521
3522
3523 \acks
3524
3525 Brian Huffman and Graham Hutton helped with algebraic laws.
3526 Sukyoung Ryu told us about Primitive Redex Speculation.
3527 Several anonymous reviewers helped improve the presentation.
3528 % , especially reviewer~C, who suggested better language with which to
3529 % describe our work.
3530
3531 The first and second authors were funded
3532 by a grant from Intel Corporation and
3533 by NSF awards CCF-0838899 and CCF-0311482.
3534 These authors also thank Microsoft Research Ltd, UK, for funding
3535 extended visits to the third author.
3536
3537
3538 \makeatother
3539
3540 \providecommand\includeftpref{\relax} %% total bafflement -- workaround
3541 \IfFileExists{nrbib.tex}{\bibliography{cs,ramsey}}{\bibliography{dfopt}}
3542 \bibliographystyle{plainnatx}
3543
3544
3545 \clearpage
3546
3547 \appendix
3548
3549 % omit LabelSet :: *
3550 % omit LabelMap :: *
3551 % omit delFromFactBase :: FactBase f -> [(Label,f)] -> FactBase f
3552 % omit elemFactBase :: Label -> FactBase f -> Bool
3553 % omit elemLabelSet :: Label -> LabelSet -> Bool
3554 % omit emptyLabelSet :: LabelSet
3555 % omit factBaseLabels :: FactBase f -> [Label]
3556 % omit extendFactBase :: FactBase f -> Label -> f -> FactBase f
3557 % omit extendLabelSet :: LabelSet -> Label -> LabelSet
3558 % omit lookupFact :: FactBase f -> Label -> Maybe f
3559 % omit factBaseList :: FactBase f -> [(Label, f)]
3560
3561 %% \section{Code for \textmd{\texttt{fixpoint}}}
3562 %% \label{app:fixpoint}
3563 %%
3564 %% {\def\baselinestretch{0.95}\hfuzz=20pt
3565 %% \begin{smallcode}
3566 %% data `TxFactBase n f
3567 %% = `TxFB { `tfb_fbase :: FactBase f
3568 %% , `tfb_rg :: DG n f C C -- Transformed blocks
3569 %% , `tfb_cha :: ChangeFlag
3570 %% , `tfb_lbls :: LabelSet }
3571 %% -- Set the tfb_cha flag iff
3572 %% -- (a) the fact in tfb_fbase for or a block L changes
3573 %% -- (b) L is in tfb_lbls.
3574 %% -- The tfb_lbls are all Labels of the *original*
3575 %% -- (not transformed) blocks
3576 %%
3577 %% `updateFact :: DataflowLattice f -> LabelSet -> (Label, f)
3578 %% -> (ChangeFlag, FactBase f)
3579 %% -> (ChangeFlag, FactBase f)
3580 %% updateFact ^lat ^lbls (lbl, ^new_fact) (^cha, fbase)
3581 %% | NoChange <- ^cha2 = (cha, fbase)
3582 %% | lbl `elemLabelSet` lbls = (SomeChange, new_fbase)
3583 %% | otherwise = (cha, new_fbase)
3584 %% where
3585 %% (cha2, ^res_fact)
3586 %% = case lookupFact fbase lbl of
3587 %% Nothing -> (SomeChange, new_fact)
3588 %% Just ^old_fact -> fact_extend lat old_fact new_fact
3589 %% ^new_fbase = extendFactBase fbase lbl res_fact
3590 %%
3591 %% fixpoint :: forall n f. NonLocal n
3592 %% => Bool -- Going forwards?
3593 %% -> DataflowLattice f
3594 %% -> (Block n C C -> FactBase f
3595 %% -> FuelMonad (DG n f C C, FactBase f))
3596 %% -> FactBase f -> [(Label, Block n C C)]
3597 %% -> FuelMonad (DG n f C C, FactBase f)
3598 %% fixpoint ^is_fwd lat ^do_block ^init_fbase ^blocks
3599 %% = do { ^fuel <- getFuel
3600 %% ; ^tx_fb <- loop fuel init_fbase
3601 %% ; return (tfb_rg tx_fb,
3602 %% tfb_fbase tx_fb `delFromFactBase` blocks) }
3603 %% -- The outgoing FactBase contains facts only for
3604 %% -- Labels *not* in the blocks of the graph
3605 %% where
3606 %% `tx_blocks :: [(Label, Block n C C)]
3607 %% -> TxFactBase n f -> FuelMonad (TxFactBase n f)
3608 %% tx_blocks [] tx_fb = return tx_fb
3609 %% tx_blocks ((lbl,blk):bs) tx_fb = tx_block lbl blk tx_fb
3610 %% >>= tx_blocks bs
3611 %%
3612 %% `tx_block :: Label -> Block n C C
3613 %% -> TxFactBase n f -> FuelMonad (TxFactBase n f)
3614 %% tx_block ^lbl ^blk tx_fb@(TxFB { tfb_fbase = fbase
3615 %% , tfb_lbls = lbls
3616 %% , tfb_rg = ^blks
3617 %% , tfb_cha = cha })
3618 %% | is_fwd && not (lbl `elemFactBase` fbase)
3619 %% = return tx_fb -- Note [Unreachable blocks]
3620 %% | otherwise
3621 %% = do { (rg, ^out_facts) <- do_block blk fbase
3622 %% ; let (^cha', ^fbase')
3623 %% = foldr (updateFact lat lbls) (cha,fbase)
3624 %% (factBaseList out_facts)
3625 %% ; return (TxFB { tfb_lbls = extendLabelSet lbls lbl
3626 %% , tfb_rg = rg `DGCatC` blks
3627 %% , tfb_fbase = fbase'
3628 %% , tfb_cha = cha' }) }
3629 %%
3630 %% loop :: Fuel -> FactBase f -> FuelMonad (TxFactBase n f)
3631 %% `loop fuel fbase
3632 %% = do { let ^init_tx_fb = TxFB { tfb_fbase = fbase
3633 %% , tfb_cha = NoChange
3634 %% , tfb_rg = DGNil
3635 %% , tfb_lbls = emptyLabelSet}
3636 %% ; tx_fb <- tx_blocks blocks init_tx_fb
3637 %% ; case tfb_cha tx_fb of
3638 %% NoChange -> return tx_fb
3639 %% SomeChange -> setFuel fuel >>
3640 %% loop fuel (tfb_fbase tx_fb) }
3641 %% \end{smallcode}
3642 %% \par
3643 %% } % end \baselinestretch
3644
3645
3646 \section{Index of defined identifiers}
3647
3648 This appendix lists every nontrivial identifier used in the body of
3649 the paper.
3650 For each identifier, we list the page on which that identifier is
3651 defined or discussed---or when appropriate, the figure (with line
3652 number where possible).
3653 For those few identifiers not defined or discussed in text, we give
3654 the type signature and the page on which the identifier is first
3655 referred to.
3656
3657 Some identifiers used in the text are defined in the Haskell Prelude;
3658 for those readers less familiar with Haskell (possible even at the
3659 Haskell Symposium!), these identifiers are
3660 listed in Appendix~\ref{sec:prelude}.
3661
3662 \newcommand\dropit[3][]{}
3663
3664 \newcommand\hsprelude[2]{\noindent
3665 \texttt{#1} defined in the Haskell Prelude\\}
3666 \let\hsprelude\dropit
3667
3668 \newcommand\hspagedef[3][]{\noindent
3669 \texttt{#2} defined on page~\pageref{#3}.\\}
3670 \newcommand\omithspagedef[3][]{\noindent
3671 \texttt{#2} not shown (but see page~\pageref{#3}).\\}
3672 \newcommand\omithsfigdef[3][]{\noindent
3673 \texttt{#2} not shown (but see Figure~\ref{#3} on page~\pageref{#3}).\\}
3674 \newcommand\hsfigdef[3][]{%
3675 \noindent
3676 \ifx!#1!%
3677 \texttt{#2} defined in Figure~\ref{#3} on page~\pageref{#3}.\\
3678 \else
3679 \texttt{#2} defined on \lineref{#1} of Figure~\ref{#3} on page~\pageref{#3}.\\
3680 \fi
3681 }
3682 \newcommand\hstabdef[3][]{%
3683 \noindent
3684 \ifx!#1!
3685 \texttt{#2} defined in Table~\ref{#3} on page~\pageref{#3}.\\
3686 \else
3687 \texttt{#2} defined on \lineref{#1} of Table~\ref{#3} on page~\pageref{#3}.\\
3688 \fi
3689 }
3690 \newcommand\hspagedefll[3][]{\noindent
3691 \texttt{#2} {let}- or $\lambda$-bound on page~\pageref{#3}.\\}
3692 \newcommand\hsfigdefll[3][]{%
3693 \noindent
3694 \ifx!#1!%
3695 \texttt{#2} {let}- or $\lambda$-bound in Figure~\ref{#3} on page~\pageref{#3}.\\
3696 \else
3697 \texttt{#2} {let}- or $\lambda$-bound on \lineref{#1} of Figure~\ref{#3} on page~\pageref{#3}.\\
3698 \fi
3699 }
3700
3701 \newcommand\nothspagedef[3][]{\notdefd\ndpage{#1}{#2}{#3}}
3702 \newcommand\nothsfigdef[3][]{\notdefd\ndfig{#1}{#2}{#3}}
3703 \newcommand\nothslinedef[3][]{\notdefd\ndline{#1}{#2}{#3}}
3704
3705 \newcommand\ndpage[3]{\texttt{#2}~(p\pageref{#3})}
3706 \newcommand\ndfig[3]{\texttt{#2}~(Fig~\ref{#3},~p\pageref{#3})}
3707 \newcommand\ndline[3]{%
3708 \ifx!#1!%
3709 \ndfig{#1}{#2}{#3}%
3710 \else
3711 \texttt{#2}~(Fig~\ref{#3}, line~\lineref{#1}, p\pageref{#3})%
3712 \fi
3713 }
3714
3715 \newif\ifundefinedsection\undefinedsectionfalse
3716
3717 \newcommand\notdefd[4]{%
3718 \ifundefinedsection
3719 , #1{#2}{#3}{#4}%
3720 \else
3721 \undefinedsectiontrue
3722 \par
3723 \section{Undefined identifiers}
3724 #1{#2}{#3}{#4}%
3725 \fi
3726 }
3727
3728 \begingroup
3729 \raggedright
3730
3731 \input{defuse}%
3732 \ifundefinedsection.\fi
3733
3734 \undefinedsectionfalse
3735
3736
3737 \renewcommand\hsprelude[2]{\noindent
3738 \ifundefinedsection
3739 , \texttt{#1}%
3740 \else
3741 \undefinedsectiontrue
3742 \par
3743 \section{Identifiers defined in Haskell Prelude or a standard library}\label{sec:prelude}
3744 \texttt{#1}%
3745 \fi
3746 }
3747 \let\hspagedef\dropit
3748 \let\omithspagedef\dropit
3749 \let\omithsfigdef\dropit
3750 \let\hsfigdef\dropit
3751 \let\hstabdef\dropit
3752 \let\hspagedefll\dropit
3753 \let\hsfigdefll\dropit
3754 \let\nothspagedef\dropit
3755 \let\nothsfigdef\dropit
3756 \let\nothslinedef\dropit
3757
3758 \input{defuse}
3759 \ifundefinedsection.\fi
3760
3761
3762
3763 \endgroup
3764
3765
3766 \iffalse
3767
3768 \section{Dataflow-engine functions}
3769
3770
3771 \begin{figure*}
3772 \setcounter{codeline}{0}
3773 \begin{numberedcode}
3774 \end{numberedcode}
3775 \caption{The forward iterator}
3776 \end{figure*}
3777
3778 \begin{figure*}
3779 \setcounter{codeline}{0}
3780 \begin{numberedcode}
3781 \end{numberedcode}
3782 \caption{The forward actualizer}
3783 \end{figure*}
3784
3785
3786 \fi
3787
3788
3789
3790 \end{document}
3791
3792
3793
3794
3795 THE FUEL PROBLEM:
3796
3797
3798 Here is the problem:
3799
3800 A graph has an entry sequence, a body, and an exit sequence.
3801 Correctly computing facts on and flowing out of the body requires
3802 iteration; computation on the entry and exit sequences does not, since
3803 each is connected to the body by exactly one flow edge.
3804
3805 The problem is to provide the correct fuel supply to the combined
3806 analysis/rewrite (iterator) functions, so that speculative rewriting
3807 is limited by the fuel supply.
3808
3809 I will number iterations from 1 and name the fuel supplies as
3810 follows:
3811
3812 f_pre fuel remaining before analysis/rewriting starts
3813 f_0 fuel remaining after analysis/rewriting of the entry sequence
3814 f_i, i>0 fuel remaining after iteration i of the body
3815 f_post fuel remaining after analysis/rewriting of the exit sequence
3816
3817 The issue here is that only the last iteration of the body 'counts'.
3818 To formalize, I will name fuel consumed:
3819
3820 C_pre fuel consumed by speculative rewrites in entry sequence
3821 C_i fuel consumed by speculative rewrites in iteration i of body
3822 C_post fuel consumed by speculative rewrites in exit sequence
3823
3824 These quantities should be related as follows:
3825
3826 f_0 = f_pre - C_pre
3827 f_i = f_0 - C_i where i > 0
3828 f_post = f_n - C_post where iteration converges after n steps
3829
3830 When the fuel supply is passed explicitly as parameter and result, it
3831 is fairly easy to see how to keep reusing f_0 at every iteration, then
3832 extract f_n for use before the exit sequence. It is not obvious to me
3833 how to do it cleanly using the fuel monad.
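One answer (a sketch; once, untilNoChange, and exitSeq are all
made-up names, with once running a single analyze-and-rewrite pass
over the body) is to capture and restore the supply explicitly in
the monad:

  do { f_0 <- getFuel               -- after the entry sequence
     ; let iter = setFuel f_0 >> once
     ; untilNoChange iter           -- leaves f_n in the monad
     ; exitSeq }                    -- consumes C_post

This is essentially what the fixpoint code does: it reads the supply
with getFuel before the loop and calls setFuel before each fresh
pass over the body.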
3834
3835
3836 Norman