\newif\ifhaskellworkshop\haskellworkshoptrue
\newif\ifbanner\bannertrue
\iffalse
Payload: alternatives for functions polymorphic in node type
(n C O -> a, n C O -> b, n C O -> c)
vs
(forall e x . n e x -> ShapeTag e x, forall e x . n e x -> result e x)
where
data ShapeTag e x where
First :: ShapeTag C O
Middle :: ShapeTag O O
Last :: ShapeTag O C
result C O = a
result O O = b
result O C = c
The function returning ShapeTag can be passed in a type-class
dictionary.
Say something about the proliferation of heavyweight type signatures
required for GADT pattern matches. When the signature is three times
the size of the function, something is wrong...
I'm going to leave this point out, because in all the client code
we've written, we are (a) matching only on a node, and (b) matching on
all cases. In this scenario the GADT exhaustiveness checker provides
no additional benefits. Indeed, GADTs can be an irritant to the
client: in many pattern matches, GADTs make it impossible to default
cases.
I was thinking again about unwrapped nodes, cons-lists, snoc-lists,
and tree fringe. I think there's an analogy with ordinary lists, and
the analogy is 'concatMap' vs 'fold'. Just as with lists, the common
case of 'concatMap (\a -> [a])' does a lot of unnecessary wrapping and
unwrapping of elements. We nevertheless prefer 'concatMap' because it
provides better separation of concerns. But this analogy suggests
several ideas:
- Can block processing be written in higher-order fashion?
- Can it be done in both styles (concatMap *and* fold)?
- Once we have framed the problem in these terms, can we write
fold-style cold that is not too hard to understand and maintain?
- Finally, is there an analog of stream fusion that would work for
control-flow graphs, enabling us to write some graph combinators
that be both perspicuous and efficient?
These are good observations for the paper and for future work.
----------------------------------------------------------------
P.S. The three of us should have a nice little Skype chat about
higher-rank types. A lot of the questions still at issue boil down to
the following observations:
- It's really convenient for the *implementation* of Hoopl to put
forall to the left of an arrow. Polymorphic functions as
arguments are powerful and keen, and they make it really easy for
Hoopl to do its job.
- It's a bid inconvenient for a *client* of Hoopl to be forced to
supply functions that are polymorphic in shape. All constructors
have to be written out by hand; defaulting is not possible. (Or
at least John and I haven't figured out how to do it.)
- Using higher-rank types in Hoopl's interface makes some very
desirable things downright impossible:
- One can't write a generic dominator analysis with type
dom :: NonLocal n => FwdPass n Dominators
- One can't write a generic debugging wrapper of type
debug :: (Show n, Show f)
=> TraceFn -> WhatToTrace -> FwdPass n f -> FwdPass n f
- One can't write a combinator of type
product :: FwdPass n f -> FwdPass n f' -> FwdPass n (f, f')
I submit that these things are all very desirable.
I'm therefore strongly in favor of removing the *requirement* that a
FwdPass include transfer and rewrite functions that are polymorphic in
shape. It will be a bit tedious to return to triples of functions,
but for those lucky clients who *can* write polymorphic functions and
who wish to, we can provide injection functions.
I should stress that I believe the tedium can be contained within
reasonable bounds---for example, the arfXXX functions that are
internal to Hoopl all remain polymorphic in shape (although not
higher-rank any longer).
\fi
%
% TODO
%
%
% AGraph = graph under construction
% Graph = replacement graph
%
%%% Hoopl assumes that the replacement graph for a node open on exit
%%% doesn't contain any additional exits
%%%
%%% introduce replacement graph in section 3, after graph.
%%% it has arbitrarily complicated internal control flow, but only
%%% one exit (if open on exit)
%%%
%%% a rquirement on the client that is not checked statically.
\input{dfoptdu.tex}
\newif\ifpagetuning \pagetuningtrue % adjust page breaks
\newif\ifnoauthornotes \noauthornotesfalse
\newif\iftimestamp\timestamptrue % show MD5 stamp of paper
% \timestampfalse % it's post-submission time
\IfFileExists{timestamp.tex}{}{\timestampfalse}
\newif\ifnotcutting\notcuttingfalse
\newif\ifgenkill\genkillfalse % have a section on gen and kill
\genkillfalse
\newif\ifnotesinmargin \notesinmarginfalse
\IfFileExists{notesinmargin.tex}{\notesinmargintrue}{\relax}
\documentclass[blockstyle,natbib,preprint]{sigplanconf}
\newcommand\ourlib{Hoopl}
% higher-order optimization library
% ('Hoople' was taken -- see hoople.org)
\let\hoopl\ourlib
\newcommand\ag{\ensuremath{\mathit{ag}}}
\renewcommand\ag{\ensuremath{g}} % not really seeing that 'ag' is helpful here ---NR
\newcommand\rw{\ensuremath{\mathit{rw}}}
% l2h substitution ourlib Hoopl
% l2h substitution hoopl Hoopl
\newcommand\fs{\ensuremath{\mathit{fs}}} % dataflow facts, possibly plural
\newcommand\vfilbreak[1][\baselineskip]{%
\vskip 0pt plus #1 \penalty -200 \vskip 0pt plus -#1 }
\usepackage{alltt}
\usepackage{array}
\usepackage{afterpage}
\newcommand\lbr{\char`\{}
\newcommand\rbr{\char`\}}
\clubpenalty=10000
\widowpenalty=10000
\usepackage{verbatim} % allows to define \begin{smallcode}
\newenvironment{smallcode}{\par\unskip\small\verbatim}{\endverbatim}
\newenvironment{fuzzcode}[1]{\par\unskip\hfuzz=#1 \verbatim}{\endverbatim}
\newenvironment{smallfuzzcode}[1]{\par\unskip\small\hfuzz=#1 \verbatim}{\endverbatim}
\newenvironment{smallttcode}{\par\unskip\small\alltt}{\endalltt}
\newcommand\smallverbatiminput[1]{{\small\verbatiminput{#1}}}
\newcommand\smallfuzzverbatiminput[2]{{\small\hfuzz=#1 \verbatiminput{#2}}}
\newcommand\lineref[1]{line~\ref{line:#1}}
\newcommand\linepairref[2]{lines \ref{line:#1}~and~\ref{line:#2}}
\newcommand\linerangeref[2]{\mbox{lines~\ref{line:#1}--\ref{line:#2}}}
\newcommand\Lineref[1]{Line~\ref{line:#1}}
\newcommand\Linepairref[2]{Lines \ref{line:#1}~and~\ref{line:#2}}
\newcommand\Linerangeref[2]{\mbox{Lines~\ref{line:#1}--\ref{line:#2}}}
\makeatletter
\def \@copyrightspace {%
\@float{copyrightbox}[b]%
\vbox to 0.8in{% less than 1in, please
\vfill
\parbox[b]{20pc}{%
\scriptsize
\if \@preprint
[Copyright notice will appear here
once 'preprint' option is removed.]\par
\else
\@toappear
\fi
\if \@reprint
\noindent Reprinted from \@conferencename,
\@proceedings,
\@conferenceinfo,
pp.~\number\thepage--\pageref{sigplanconf@finalpage}.\par
\fi}}%
\end@float}
\let\c@table=
\c@figure % one counter for tables and figures, please
\newcommand\setlabel[1]{%
\setlabel@#1!!\@endsetlabel
}
\def\setlabel@#1!#2!#3\@endsetlabel{%
\ifx*#1*% line begins with label or is empty
\ifx*#2*% line is empty
\verbatim@line{}%
\else
\@stripbangs#3\@endsetlabel%
\label{line:#2}%
\fi
\else
\@stripbangs#1!#2!#3\@endsetlabel%
\fi
}
\def\@stripbangs#1!!\@endsetlabel{%
\verbatim@line{#1}%
}
\verbatim@line{hello mama}
\newcommand{\numberedcodebackspace}{0.5\baselineskip}
\newcounter{codeline}
\newenvironment{numberedcode}
{\endgraf
\def\verbatim@processline{%
\noindent
\expandafter\ifx\expandafter+\the\verbatim@line+ % blank line
%{\small\textit{\def\rmdefault{cmr}\rmfamily\phantom{00}\phantom{: \,}}}%
\else
\refstepcounter{codeline}%
{\small\textit{\def\rmdefault{cmr}\rmfamily\phantom{00}\llap{\arabic{codeline}}: \,}}%
\fi
\expandafter\setlabel\expandafter{\the\verbatim@line}%
\expandafter\ifx\expandafter+\the\verbatim@line+ % blank line
\vspace*{-\numberedcodebackspace}\par%
\else
\the\verbatim@line\par
\fi}%
\verbatim
}
{\endverbatim}
\makeatother
\newcommand\arrow{\rightarrow}
\newcommand\join{\sqcup}
\newcommand\slotof[1]{\ensuremath{s_{#1}}}
\newcommand\tempof[1]{\ensuremath{t_{#1}}}
\let\tempOf=\tempof
\let\slotOf=\slotof
\makeatletter
\newcommand{\nrmono}[1]{%
{\@tempdima = \fontdimen2\font\relax
\texttt{\spaceskip = 1.1\@tempdima #1}}}
\makeatother
\usepackage{times} % denser fonts
\renewcommand{\ttdefault}{aett} % \texttt that goes better with times fonts
\usepackage{enumerate}
\usepackage{url}
\usepackage{graphicx}
\usepackage{natbib} % redundant for Simon
\bibpunct();A{},
\let\cite\citep
\let\citeyearnopar=\citeyear
\let\citeyear=\citeyearpar
\usepackage[ps2pdf,bookmarksopen,breaklinks,pdftitle=Hoopl]{hyperref}
\usepackage{breakurl} % enables \burl
\newcommand\naive{na\"\i ve}
\newcommand\naively{na\"\i vely}
\newcommand\Naive{Na\"\i ve}
\usepackage{amsfonts}
\newcommand\naturals{\ensuremath{\mathbb{N}}}
\newcommand\true{\ensuremath{\mathbf{true}}}
\newcommand\implies{\supseteq} % could use \Rightarrow?
\newcommand\PAL{\mbox{C{\texttt{-{}-}}}}
\newcommand\high[1]{\mbox{\fboxsep=1pt \smash{\fbox{\vrule height 6pt
depth 0pt width 0pt \leavevmode \kern 1pt #1}}}}
\usepackage{tabularx}
%%
%% 2009/05/10: removed 'float' package because it breaks multiple
%% \caption's per {figure} environment. ---NR
%%
%% % Put figures in boxes --- WHY??? --NR
%% \usepackage{float}
%% \floatstyle{boxed}
%% \restylefloat{figure}
%% \restylefloat{table}
% ON LINE THREE, set \noauthornotestrue to suppress notes (or not)
%\newcommand{\qed}{QED}
\ifnotesinmargin
\long\def\authornote#1{%
\ifvmode
\marginpar{\raggedright\hbadness=10000
\parindent=8pt \parskip=2pt
\def\baselinestretch{0.8}\tiny
\itshape\noindent #1\par}%
\else
\unskip\raisebox{-3.5pt}{\rlap{$\scriptstyle\diamond$}}%
\marginpar{\raggedright\hbadness=10000
\parindent=8pt \parskip=2pt
\def\baselinestretch{0.8}\tiny
\itshape\noindent #1\par}%
\fi}
\else
% Simon: please set \notesinmarginfalse on the first line
\long\def\authornote#1{{\em #1\/}}
\fi
\ifnoauthornotes
\def\authornote#1{\unskip\relax}
\fi
\long\def\newtext#1{{\mbox{$\clubsuit$}}{\slshape\ignorespaces#1}{\mbox{$\clubsuit$}}}
\newenvironment{ntext}{\mbox{$\clubsuit$}\slshape\ignorespaces}{\mbox{$\clubsuit$}}
\newcommand{\simon}[1]{\authornote{SLPJ: #1}}
\newcommand{\norman}[1]{\authornote{NR: #1}}
\let\remark\norman
\def\finalremark#1{\relax}
\let \finalremark \remark % uncomment after submission
\newcommand{\john}[1]{\authornote{JD: #1}}
\newcommand{\todo}[1]{\textbf{To~do:} \emph{#1}}
\newcommand\delendum[1]{\relax\ifvmode\else\unskip\fi\relax}
\newcommand\secref[1]{Section~\ref{sec:#1}}
\newcommand\secreftwo[2]{Sections \ref{sec:#1}~and~\ref{sec:#2}}
\newcommand\seclabel[1]{\label{sec:#1}}
\newcommand\figref[1]{Figure~\ref{fig:#1}}
\newcommand\figreftwo[2]{Figures \ref{fig:#1}~and~\ref{fig:#2}}
\newcommand\figlabel[1]{\label{fig:#1}}
\newcommand\tabref[1]{Table~\ref{tab:#1}}
\newcommand\tablabel[1]{\label{tab:#1}}
\newcommand{\CPS}{\textbf{StkMan}} % Not sure what to call it.
\usepackage{code} % At-sign notation
\IfFileExists{timestamp.tex}{\input{timestamp}}{}
\iftimestamp
\preprintfooter{\mdfivestamp}
\fi
\hyphenation{there-by}
\renewcommand{\floatpagefraction}{0.9} % must be less than \topfraction
\renewcommand{\topfraction}{0.95}
\renewcommand{\textfraction}{0.05}
\begin{document}
%\title{\ourlib: Dataflow Optimization Made Simple}
\title{\ourlib: A Modular, Reusable Library for\\ Dataflow Analysis and Transformation}
%\title{Implementing Dataflow Analysis and Optimization by Lifting Node Functions to Basic Blocks and Control-Flow Graphs}
%\subtitle{Programming pearl}
\ifbanner
\titlebanner{\textsf{\mdseries\itshape
Reprinted from the 2010 ACM Haskell Symposium.
}}
\fi
\conferenceinfo{Haskell'10,} {September 30, 2010, Baltimore, Maryland, USA.}
\CopyrightYear{2010}
\copyrightdata{978-1-4503-0252-4/10/09}
\ifnoauthornotes
\makeatletter
\let\HyPsd@Warning=
\@gobble
\makeatother
\fi
% Joćo
\authorinfo{Norman Ramsey}{Tufts University}{nr@cs.tufts.edu}
\authorinfo{Jo\~ao Dias}{Tufts University}{dias@cs.tufts.edu}
\authorinfo{Simon Peyton Jones}{Microsoft Research}{simonpj@microsoft.com}
\maketitle
\begin{abstract}
\iffalse % A vote for Simon's abstract
\remark{I have replaced a good abstract of the POPL submission with a
bad abstract of \emph{this} submission.}
We present \ourlib, a Haskell library that makes it easy for a
compiler writer
to implement program transformations based on dataflow analyses.
A~client of \ourlib\ defines a representation of
logical assertions,
a transfer function that computes outgoing assertions from incoming
assertions,
and a rewrite function that improves code when improvements are
justified by the assertions.
\ourlib\ does the actual analysis and transformation.
\ourlib\ implements state-of-the art algorithms:
Lerner, Grove, and Chambers's
\citeyearpar{lerner-grove-chambers:2002}
composition of simple analyses and transformations, which achieves
the same precision as complex, handwritten
``super-analyses;''
and Whalley's \citeyearpar{whalley:isolation} dynamic technique for
isolating bugs in a client's code.
\ourlib's implementation is unique in that unlike previous
implementations,
it carefully separates the tricky
elements of each of these algorithms, so that they can be examined and
understood independently.
\simon{Here is an alternative abstract based on the four-sentence model.}
\remark{Four-sentence model? You must teach me\ldots}
\fi
Dataflow analysis and transformation of control-flow graphs is
pervasive in optimizing compilers, but it is typically tightly
interwoven with the details of a \emph{particular} compiler.
We~describe \ourlib{}, a~reusable Haskell library that makes it
unusually easy to define new analyses and
transformations for \emph{any} compiler written in Haskell.
\ourlib's interface is modular and polymorphic,
and it offers unusually strong static guarantees.
The implementation encapsulates
state-of-the-art algorithms (interleaved analysis and rewriting,
dynamic error isolation), and it cleanly separates their tricky elements
so that they can be understood independently.
%
%\ourlib\ will be the workhorse of a new
%back end for the Glasgow Haskell Compiler (version~6.14, forthcoming).
\emph{Readers:} code examples are indexed at
\iffalse % one more line on page 1 blows up the paper...
{\url{http://bit.ly/doUJpr}}
\else
{\url{http://www.cs.tufts.edu/~nr/pubs/hoopl10-supplement.pdf}}
(\url{bit.ly/doUJpr}).
\fi
%%% Source: http://www.cs.tufts.edu/~nr/drop/popl-index.pdf
\end{abstract}
\makeatactive % Enable @foo@ notation
\section{Introduction}
A mature optimizing compiler for an imperative language includes many
analyses, the results of which justify the optimizer's
code-improving transformations.
Many analyses and transformations---constant
propagation, live-variable analysis, inlining, sinking of loads,
and so on---should be regarded as particular cases of
a single general problem: \emph{dataflow analysis and optimization}.
%% \remark{I do not feel compelled to cite Muchnick (or anyone else) here}
Dataflow analysis is over thirty years old,
but a recent, seminal paper by \citet{lerner-grove-chambers:2002} goes further,
describing a powerful but subtle way to
\emph{interleave} analysis and transformation so that each
piggybacks on the other.
Because optimizations based on dataflow analysis
share a common intellectual framework, and because that framework is
subtle, it~is tempting to
try to build a single reusable library that embodies the
subtle ideas, while
making it easy for clients to instantiate the library for different
situations.
% Tempting, but difficult.
Although such libraries exist, as we discuss
in \secref{related}, they have complex APIs and implementations,
and none implements the Lerner/Grove/Chambers technique.
In this paper we present \ourlib{} (short for ``higher-order
optimization library''), a new Haskell library for dataflow analysis and
optimization. It has the following distinctive characteristics:
\begin{itemize}
\item
\ourlib\ is purely functional.
Although pure functional languages are not obviously suited to writing
standard algorithms that
transform control-flow graphs,
pure functional code is actually easier to write, and far easier
to write correctly, than code that is mostly functional but uses a mutable
representation of graphs
\cite{ramsey-dias:applicative-flow-graph}.
When analysis and transformation
are interleaved, so that graphs must be transformed \emph{speculatively},
without knowing whether
a~transformed graph will be retained or discarded,
pure functional code offers even more benefits.
\item
\ourlib\ is polymorphic. Just as a list library is
polymorphic in the list elements, so is \ourlib{} polymorphic, both in
the nodes that inhabit graphs and in the dataflow facts that
analyses compute over these graphs (\secref{using-hoopl}).
\item
The paper by Lerner, Grove, and Chambers is inspiring but abstract.
We articulate their ideas in a concrete, simple API,
which hides
a subtle implementation
(\secreftwo{graph-rep}{using-hoopl}).
You provide a representation for assertions,
a transfer function that transforms assertions across a node,
and a rewrite function that uses an assertion to
justify rewriting a node.
\ourlib\ ``lifts'' these node-level functions to work over
control-flow graphs, solves recursion equations,
and interleaves rewriting with analysis.
Designing APIs is surprisingly
hard; we have been through over a dozen significantly different
iterations, and we offer our API as a contribution.
\item
Because the client
can perform very local reasoning (``@y@ is live before
@x:=y+2@''),
\seclabel{liveness-mention}
analyses and transformations built on \ourlib\
are small, simple, and easy to get right.
\seclabel{liveness-mention}
Moreover, \ourlib\ helps you write correct optimizations:
it~statically rules out transformations that violate invariants
of the control-flow graph (Sections \ref{sec:graph-rep} and \ref{sec:rewrites}),
and dynamically it can help find the first transformation that introduces a fault
in a test program (\secref{fuel}).
%%\finalremark{SLPJ: I wanted to write more about open/closed,
%%but I like this sentence with its claim to both static and dynamic assistance,
%%and maybe the open/closed story is hard to understand here.}
% \item \ourlib{} makes use of GADTS and type functions to offer unusually
% strong static guarantees. In particular, nodes, basic blocks, and
% graphs are all statically typed by their open or closedness on entry, and
% their open or closedness on exit (\secref{graph-rep}). For example, an add instruction is
% open on entry and exit, while a branch is open on entry and closed on exit.
% Using these types we can statically guarantee that, say, an add instruction
% is rewritten to a graph that is also open at both entry and exit; and
% that the user cannot construct a block where an add instruction follows an
% unconditional branch. We know of no other system that offers
% static guarantees this strong.
\item \ourlib{} implements subtle algorithms, including
(a)~interleaved analysis and rewriting, (b)~speculative rewriting,
(c)~computing fixed points, and (d)~dynamic fault isolation.
Previous implementations of these algorithms---including three of our
own---are complicated and hard to understand, because the tricky pieces
are implemented all together, inseparably.
In~this paper, each tricky piece is handled in just
one place, separate from all the others (\secref{engine}).
We~emphasize this implementation as an object of interest in
its own right.
\end{itemize}
Our work bridges the gap between abstract, theoretical presentations
and actual compilers.
\ourlib{} is available from
\burl{http://ghc.cs.tufts.edu/hoopl} and also from Hackage (version~3.8.3.0).
One of \hoopl's clients
is the Glasgow Haskell Compiler, which uses \hoopl\ to optimize
imperative code in GHC's back end.
\ourlib's API is made possible by
sophisticated aspects of Haskell's type system, such
as higher-rank polymorphism, GADTs, and type functions.
\ourlib{} may therefore also serve as a case study in the utility
of these features.
% \clearpage % Start section 2 on page 2
\section{Dataflow analysis {\&} transformation by \texorpdfstring{\rlap{example}}{example}}
\seclabel{overview}
\seclabel{constant-propagation}
\seclabel{example:transforms}
\seclabel{example:xforms}
%% We begin by setting the scene, introducing some vocabulary, and
%% showing a small motivating example.
A \emph{control-flow graph}, perhaps representing the body of a procedure,
is a collection of \emph{basic blocks}---or just ``blocks.''
Each block is a sequence of instructions,
beginning with a label and ending with a
control-transfer instruction that branches to other blocks.
% Each block has a label at the beginning,
% a sequence of
% -- each of which has a label at the
% beginning. Each block may branch to other blocks with arbitrarily
% complex control flow.
The goal of dataflow optimization is to compute valid
\emph{assertions} (or \emph{dataflow facts}),
then use those assertions to justify code-improving
transformations (or \emph{rewrites}) on a {control-flow graph}.
As~a concrete example, we show constant propagation with constant
folding.
On the left we show a basic block; in the middle we show
facts that hold between statements (or \emph{nodes})
in the block; and on
the right we show the result of transforming the block
based on the facts:
\begin{verbatim}
Before Facts After
------------{}-------------
x := 3+4 x := 7
----------{x=7}------------
z := x>5 z := True
-------{x=7, z=True}-------
if z goto L1
then goto L1
else goto L2
\end{verbatim}
Constant propagation works
from top to bottom. We start with the empty fact.
Given the empty fact and the node @x:=3+4@ can we make a (constant-folding)
transformation?
Yes: we can replace the node with @x:=7@.
Now, given this transformed node
and the original fact, what fact flows out of the bottom of
the transformed node?
The~fact~\mbox{\{@x=7@\}}.
Given the fact \{@x=7@\} and the node @z:=x>5@, can we make a
transformation? Yes: constant propagation can replace the node with @z:=7>5@.
Now, can we make another transformation? Yes: constant folding can
replace the node with @z:=True@.
The~process continues to the end of the block, where we
can replace the conditional branch with an unconditional one, @goto L1@.
The example above is simple because the program has only straight-line code;
when programs have loops, dataflow analysis gets more complicated.
For example,
consider the following graph,
where we assume @L1@ is the entry point:
\begin{verbatim}
L1: x=3; y=4; if z then goto L2 else goto L3
L2: x=7; goto L3
L3: ...
\end{verbatim}
Because control flows to @L3@ from two places (@L1@~and~@L2@),
we must \emph{join} the facts coming from those two places.
All paths to @L3@ produce the fact @y=@$4$,
so we can conclude that this fact holds at~@L3@.
But depending on the the path to~@L3@, @x@~may have different
values, so we conclude ``@x=@$\top$'',
meaning that there is no single value held by~@x@ at~@L3@.
%\footnote{
%In this example @x@ really does vary at @L3@, but in general
%the analysis might be conservative.}
The final result of joining the dataflow facts that flow to~@L3@
is the new fact $\mathord{@x=@\top\!} \land \mathord{@y=4@} \land \mathord{@z=@\top}$.
\seclabel{const-prop-example}
\paragraph{Forwards and backwards.}
Constant propagation works \emph{forwards}, and a fact is often an
assertion about the program state, such as ``variable~@x@ holds value~@7@.''
Some useful analyses work \emph{backwards}.
A~prime example is live-variable analysis, where a fact takes the~form
``variable @x@ is live'' and is an assertion about the
\emph{continuation} of a program point. For example, the fact ``@x@~is
live'' at a program point P is an assertion that @x@ is used on some program
path starting at \mbox{P}. % TeXbook, exercise 12.6
The accompanying transformation is called dead-code elimination;
if @x@~is not live, this transformation
replaces the node @x:=e@ with a no-op.
\seclabel{simple-tx}
\paragraph{Interleaved transformation and analysis.}
Our first example \emph{interleaves} analysis and transformation.
% constant propagation is a transformation, not an analysis
Interleaving makes it far easier to write effective analyses.
If, instead, we \emph{first} analyzed the block
and \emph{then} transformed it, the analysis would have to ``predict''
the transformations.
For example, given the incoming fact \{@x=7@\}
and the instruction @z:=x>5@,
a pure analysis could produce the outgoing fact
\{@x=7@, @z=True@\} by simplifying @x>5@ to @True@.
But the subsequent transformation must perform
\emph{exactly the same simplification} when it transforms the instruction to @z:=True@!
If instead we \emph{first} rewrite the node to @z:=True@,
and \emph{then} apply the transfer function to the new node,
the transfer function becomes wonderfully simple: it merely has to see if the
right hand side is a constant.
You~can see code in \secref{const-prop-client}.
Another example is the interleaving of liveness analysis and dead-code elimination.
As mentioned in \secref{liveness-mention},
it is sufficient for the analysis to~say ``@y@ is live before @x:=y+2@''.
It is not necessary to have the more complex rule
``if @x@~is live after @x:=y+2@ then @y@ is live before~it,''
because if @x@~is \emph{not} live after @x:=y+2@,
the assignment @x:=y+2@ will be eliminated.
If~there are a number of interacting
analyses and/or transformations,
the benefit of interleaving them is even more compelling; for more
substantial examples, consult \citet{lerner-grove-chambers:2002}.
%
But these compelling benefits come at a cost.
To~compute valid facts for a program that has loops,
an analysis may require multiple iterations over the program.
Before the final iteration, the analysis may
compute a~fact that is invalid,
and a transformation may use the invalid fact to rewrite the program
(\secref{ckpoint-monad}).
To~avoid unjustified rewrites,
any rewrite based on an invalid fact must be rolled back;
transformations must be \emph{speculative}.
As~described in \secref{checkpoint-monad},
\hoopl~manages speculation with cooperation from the client.
\seclabel{debugging-introduced}
While it is wonderful that we can implement complex optimizations by
composing very simple analyses and transformations,
it~is not so wonderful that very simple analyses and transformations,
when composed, can exhibit complex emergent behavior.
Because such behavior is not easily predicted, it is essential to have good
tools for debugging.
\hoopl's primary debugging tool is an implementation of
Whalley's \citeyearpar{whalley:isolation} search technique for
isolating faults in an optimizer (\secref{whalley-from-s2}).
\section{Representing control-flow graphs} \seclabel{graph-rep}
\ourlib{} is a library that makes it easy to define dataflow analyses,
and transformations driven by these analyses,
on control-flow graphs.
Graphs are composed from smaller units, which we discuss from the
bottom up:
\begin{itemize}
\item A \emph{node} is defined by \ourlib's client;
\ourlib{} knows nothing about the representation of nodes (\secref{nodes}).
\item A basic \emph{block} is a sequence of nodes (\secref{blocks}).
\item A \emph{graph} is an arbitrarily complicated control-flow graph:
basic blocks connected by edges (\secref{graphs}).
\end{itemize}
\subsection{Shapes: Open and closed}
Nodes, blocks, and graphs share important properties in common.
In particular, each is \emph{open or closed on entry}
and \emph{open or closed on exit}.
An \emph{open} point is one at which control may implicitly ``fall through;''
to transfer control at a \emph{closed} point requires an explicit
control-transfer instruction to a named label.
For example,
\begin{itemize}
\item A shift-left instruction is open on entry (because control can fall into it
from the preceding instruction), and open on exit (because control falls through
to the next instruction).
\item An unconditional branch is open on entry, but closed on exit (because
control cannot fall through to the next instruction).
\item A label is closed on entry (because in \ourlib{} we do not allow
control to fall through into a branch target), but open on exit.
\item
A~function call could, if the language being compiled is simple
enough, be open on entry and exit.
But if a function could return in multiple ways---for example by
transferring either to a normal return continuation or to an
exception handler---then a function call will have to be represented
by a node that is closed at exit.
\end{itemize}
% This taxonomy enables \ourlib{} to enforce invariants:
% only nodes closed on entry can be the targets of branches, and only nodes closed
% on exits can transfer control (see also \secref{nonlocal-class}).
% As~a consequence, all control transfers originate at control-transfer
% instructions and terminated at labels; this invariant dramatically
% simplifies analysis and transformation.
These examples concern nodes, but the same classification applies
to blocks and graphs. For example the block
\begin{code}
x:=7; y:=x+2; goto L
\end{code}
is open on entry and closed on exit.
This is the block's \emph{shape}, which we often abbreviate
``open/closed;''
we may also refer to an ``open/closed block.''
The shape of a thing determines that thing's control-flow properties.
In particular, whenever E is a node, block, or graph,
% : \simon{Removed the claim about a unique entry point.}
\begin{itemize}
\item
If E is open on entry, it has a unique predecessor;
if it is closed, it may have arbitrarily many predecessors---or none.
\item
If E is open on exit, it has a unique successor;
if it is closed, it may have arbitrarily many successors---or none.
\end{itemize}
%%%%
%%%%
%%%% % \item Regardless of whether E is open or closed,
%%%% % it has a unique entry point where execution begins.
%%%% \item If E is closed on exit, control leaves \emph{only}
%%%% by explicit branches from closed-on-exit nodes.
%%%% \item If E is open on exit, control \emph{may} leave E
%%%% by ``falling through'' from a distinguished exit point.
%%%% \remark{If E is a node or block, control \emph{only} leaves E by
%%%% falling through, but this isn't so for a graph. Example: a body of a
%%%% loop contains a \texttt{break} statement} \simon{I don't understand.
%%%% A break statement would have to be translated as a branch, no?
%%%% Can you give a an example? I claim that control only leaves an
%%%% open graph by falling through.}
%%%% \end{itemize}
\subsection{Nodes} \seclabel{nodes}
The primitive constituents of a control-flow graph are
\emph{nodes}.
% , which are defined by the client. SLPJ: said again very shortly
For~example, a node might represent a machine instruction, such as an
assignment, a call, or a conditional branch.
But \ourlib{}'s graph representation is \emph{polymorphic in the node type},
so each client can define nodes as it likes.
Because they contain nodes defined by the client,
graphs can include arbitrary client-specified data, including
(say) method calls, C~statements, stack maps, or
whatever.
\begin{figure}
\begin{fuzzcode}{0.98pt}
data `Node e x where
Label :: Label -> Node C O
`Assign :: Var -> Expr -> Node O O
`Store :: Expr -> Expr -> Node O O
`Branch :: Label -> Node O C
`Cond :: Expr -> Label -> Label -> Node O C
... more constructors ...
\end{fuzzcode}
\caption{A typical node type as it might be defined by a client}
\figlabel{cmm-node}
\end{figure}
The type of a node specifies \emph{at compile time} whether the node is open or
closed on entry and exit. Concretely,
the type constructor for a node has kind @*->*->*@, where the two type parameters
are type-level flags, one for entry and one for exit.
Such a type parameter may be instantiated only with type @O@~(for
open) or type~@C@ (for closed).
As an example,
\figref{cmm-node} shows a typical node type as it might be defined by
one of \ourlib's {clients}.
The type parameters are written @e@ and @x@, for
entry and exit respectively.
The type is a generalized algebraic data type;
the syntax gives the type of each constructor.
%%% \cite{peyton-jones:unification-based-gadts}.
For example, constructor @Label@
takes a @Label@ and returns a node of type @Node C O@, where
the~``@C@'' says ``closed on entry'' and the~``@O@'' says ``open on exit''.
The types @Label@, @O@, and~@C@ are
defined by \ourlib{} (\figref{graph}).
As another example, constructor @Assign@ takes a variable and an expression, and it
returns a @Node@ open on both entry and exit; constructor @Store@ is
similar.
Types @`Var@ and @`Expr@ are private to the client, and
\ourlib{} knows nothing of them.
Finally, control-transfer nodes @Branch@ and @Cond@ (conditional
branch) are open on entry
and closed on exit.
Nodes closed on entry are the only targets of control transfers;
nodes open on entry and exit never perform control transfers;
and nodes closed on exit always perform control transfers\footnote{%
To obey these invariants,
a node for
a conditional-branch instruction, which typically either transfers control
\emph{or} falls through, must be represented as a two-target
conditional branch, with the fall-through path in a separate block.
This representation is standard \cite{appel:modern},
and it costs nothing in practice:
such code is easily sequentialized without superfluous branches.
%a late-stage code-layout pass can readily reconstruct efficient code.
}.
Because of the position each type of node occupies in a
basic block,
we~often call them \emph{first}, \emph{middle}, and \emph{last} nodes
respectively.
\begin{figure}
\begin{fuzzcode}{10.5pt}
data `O -- Open
data `C -- Closed
data `Block n e x where
`BFirst :: n C O -> Block n C O
`BMiddle :: n O O -> Block n O O
`BLast :: n O C -> Block n O C
`BCat :: Block n e O -> Block n O x -> Block n e x
data `Graph n e x where
`GNil :: Graph n O O
`GUnit :: Block n O O -> Graph n O O
`GMany :: MaybeO e (Block n O C)
-> LabelMap (Block n C C)
-> MaybeO x (Block n C O)
-> Graph n e x
data `MaybeO ^ex t where
`JustO :: t -> MaybeO O t
`NothingO :: MaybeO C t
newtype `Label -- abstract
newtype `LabelMap a -- finite map from Label to a
`addBlock :: NonLocal n
=> Block n C C
-> LabelMap (Block n C C)
-> LabelMap (Block n C C)
`blockUnion :: LabelMap a -> LabelMap a -> LabelMap a
class `NonLocal n where
`entryLabel :: n C x -> Label
`successors :: n e C -> [Label]
\end{fuzzcode}
\caption{The block and graph types defined by \ourlib}
\figlabel{graph} \figlabel{edges}
\end{figure}
% omit MaybeC :: * -> * -> *
\subsection{Blocks} \seclabel{blocks}
\ourlib\ combines the client's nodes into
blocks and graphs, which, unlike the nodes, are defined by \ourlib{}
(\figref{graph}).
A~@Block@ is parameterized over the node type~@n@
as well as over the same flag types that make it open or closed at
entry and exit.
The @BFirst@, @BMiddle@, and @BLast@ constructors create one-node
blocks.
Each of these constructors is polymorphic in the node's \emph{representation}
but monomorphic in its \emph{shape}.
Why not use a single constructor
of type \mbox{@n e x -> Block n e x@}, which would be polymorphic in a
node's representation \emph{and}
shape?
Because by making the shape known statically, we simplify the
implementation of analysis and transformation in
\secref{dfengine}.
%%% \finalremark{We have no principled answer to the question of what
%%% parts of the representation of blocks are exposed.
%%% Do we tell our readers or just ignore the question?
%%% }
%%% \simon{Just ignore!}
The @BCat@ constructor concatenates blocks in sequence.
It~makes sense to concatenate blocks only when control can fall
through from the first to the second; therefore,
two blocks may be concatenated {only} if each block is open at
the point of concatenation.
This restriction is enforced by the type of @BCat@, whose first
argument must be open on exit, and whose second argument must be open on entry.
It~is statically impossible, for example, to concatenate a @Branch@
immediately before an
@Assign@.
Indeed, the @Block@ type statically guarantees that any
closed/closed @Block@---which compiler writers normally
call a ``basic block''---consists of exactly one first node
(such as @Label@ in \figref{cmm-node}),
followed by zero or more middle nodes (@Assign@ or @Store@),
and terminated with exactly one
last node (@Branch@ or @Cond@).
Using GADTs to enforce these invariants is one of
\ourlib{}'s innovations.
\subsection{Graphs} \seclabel{graphs}
\ourlib{} composes blocks into graphs, which are also defined
in \figref{graph}.
Like @Block@, the data type @Graph@ is parameterized over
both nodes @n@ and its open/closed shape (@e@ and @x@).
It has three constructors. The first two
deal with the base cases of open/open graphs:
an empty graph is represented by @GNil@ while a single-block graph
is represented by @GUnit@.
More general graphs are represented by @GMany@, which has three
fields: an optional entry sequence, a body, and an optional exit
sequence.
\begin{itemize}
\item
If the graph is open on entry, it contains an \emph{entry sequence} of type
@Block n O C@.
We could represent this sequence as a value of type
@Maybe (Block n O C)@, but we can do better:
a~value of @Maybe@ type requires a \emph{dynamic} test,
but we know \emph{statically}, at compile time, that the sequence is present if and only
if the graph is open on entry.
We~express our compile-time knowledge by using the type
@MaybeO e (Block n O C)@, a type-indexed version of @Maybe@
which is also defined in \figref{graph}:
the type @MaybeO O a@ is isomorphic to~@a@, while
the type @MaybeO C a@ is isomorphic to~@()@.
\item
The \emph{body} of the graph is a collection of closed/closed blocks.
To~facilitate traversal of the graph, we represent the body as a finite
map from label to block.
\simon{{\tt LabelMap} not defined in Fig 2}
\item
The \emph{exit sequence} is dual to the entry sequence, and like the entry
sequence, its presence or absence is deducible from the static type of the graph.
\end{itemize}
\seclabel{gSplice}
Graphs can be spliced together nicely; the cost is logarithmic in the
number of closed/closed blocks.
Unlike blocks, two graphs may be spliced together
not only when they are both open
at splice point but also
when they are both closed---but not in the other two cases:
\begingroup
\def\^{\\[-5pt]}%
\hfuzz=10.8pt
\begin{smallttcode}
`gSplice :: Graph n e a -> Graph n a x -> Graph n e x
gSplice GNil g2 = g2
gSplice g1 GNil = g1\^
gSplice (GUnit ^b1) (GUnit ^b2) = GUnit (b1 `BCat` b2)\^
gSplice (GUnit b) (GMany (JustO e) bs x)
= GMany (JustO (b `BCat` e)) bs x\^
gSplice (GMany e ^bs (JustO x)) (GUnit b2)
= GMany e bs (JustO (x `BCat` b2))\^
gSplice (GMany e1 ^bs1 (JustO x1)) (GMany (JustO e2) ^bs2 x2)
= GMany e1 (bs1 `blockUnion` (b `addBlock` bs2)) x2
where b = x1 `BCat` e2\^
gSplice (GMany e1 bs1 NothingO) (GMany NothingO bs2 x2)
= GMany e1 (bs1 `blockUnion` bs2) x2
\end{smallttcode}
\endgroup
This definition illustrates the power of GADTs: the
pattern matching is exhaustive, and all the open/closed invariants are
statically checked. For example, consider the second-last equation for @gSplice@.
Since the exit sequence of the first argument is @JustO x1@,
we know that type parameter~@a@ is~@O@, and hence the entry sequence of the second
argument must be @JustO e2@.
Moreover, block~@x1@ must be
closed/open, and block~@e2@ must be open/closed.
We can therefore concatenate
\ifpagetuning
@x1@~and~@e2@
\else
them
\fi
with @BCat@ to produce a closed/closed block~@b@, which is
added to the body of the result.
\ifnotcutting
We~have carefully crafted the types so that if @BCat@
is considered as an associative operator,
every graph has a unique representation.
\finalremark{This \P\ is a candidate to be cut.}
\simon{Right}
%% \simon{Well, you were the one who was so keen on a unique representation!
%% And since we have one, I think tis useful to say so. Lastly, the representation of
%% singleton blocks is not entirely obvious.}
%%%%
%%%% An empty open/open graph is represented
%%%% by @GNil@, while a closed/closed one is @gNilCC@:
%%%% \par {\small
%%%% \begin{code}
%%%% gNilCC :: Graph C C
%%%% gNilCC = GMany NothingO BodyEmpty NothingO
%%%% \end{code}}
%%%% The representation of a @Graph@ consisting of a single block~@b@
%%%% depends on the shape of~@b@:\remark{Does anyone care?}
%%%% \par{\small
%%%% \begin{code}
%%%% gUnitOO :: Block O O -> Graph O O
%%%% gUnitOC :: Block O C -> Graph O C
%%%% gUnitCO :: Block O C -> Graph C O
%%%% gUnitCC :: Block O C -> Graph C C
%%%% gUnitOO b = GUnit b
%%%% gUnitOC b = GMany (JustO b) BodyEmpty NothingO
%%%% gUnitCO b = GMany NothingO BodyEmpty (JustO b)
%%%% gUnitCC b = GMany NothingO (BodyUnit b) NothingO
%%%% \end{code}}
%%%% Multi-block graphs are similar.
%%%% From these definitions
To guarantee uniqueness, @GUnit@ is restricted to open/open
blocks.
If~@GUnit@ were more polymorphic, there would be
more than one way to represent some graphs, and it wouldn't be obvious
to a client which representation to choose---or if the choice made a difference.
{\hfuzz=1.8pt\par}
\fi
\subsection{Edges, labels and successors}
\seclabel{nonlocal-class}
\seclabel{edges}
Although \ourlib{} is polymorphic in the type of nodes,
it~still needs to know how control may be transferred from one node to another.
Within a~block, a control-flow edge is implicit in every application of
the @BCat@ constructor.
An~implicit edge originates in a first node or a middle node and flows
to a middle node or a last node.
Between blocks, a~control-flow edge is represented as chosen by the client.
An~explicit edge originates in a last node and flows to a (labelled)
first node.
If~\hoopl\ is polymorphic in the node type,
how can it{} follow such edges?
\hoopl\ requires the client to make the node type an instance of
\ourlib's @NonLocal@ type class, which is defined in \figref{edges}.
The @entryLabel@ method takes a first node (one closed on entry, \secref{nodes})
and returns its @Label@;
the~@successors@ method takes a last node (closed on exit) and returns
the @Label@s to
which it can transfer control.
%%A~middle node, which is open on both entry and exit, transfers control
%%only locally, to its successor within a basic block,
%%so no corresponding interrogation function is needed.
%% A node type defined by a client must be an instance of @NonLocal@.
In~\figref{cmm-node},
the client's instance declaration for @Node@ would be
\begin{code}
instance NonLocal Node where
entryLabel (Label l) = l
successors (Branch b) = [b]
successors (Cond e b1 b2) = [b1, b2]
\end{code}
Again, the pattern matching for both functions is exhaustive, and
the compiler statically checks this fact.
Here, @entryLabel@
cannot be applied to an @Assign@ or @Branch@ node,
and any attempt to define a case for @Assign@ or @Branch@ would result
in a type error.
While the client provides this information about
nodes, it is convenient for \ourlib\ to get the same information
about blocks.
Internally,
\ourlib{} uses this instance declaration for the @Block@ type:
\begin{code}
instance NonLocal n => NonLocal (Block n) where
entryLabel (BFirst n) = entryLabel n
entryLabel (BCat b _) = entryLabel b
successors (BLast n) = successors n
successors (BCat _ b) = successors b
\end{code}
Because the functions @entryLabel@ and @successors@ are used to track control
flow \emph{within} a graph, \ourlib\ does not need to ask for the
entry label or successors of a @Graph@ itself.
Indeed, @Graph@ \emph{cannot} be an instance of @NonLocal@, because even
if a @Graph@ is closed on entry, it need not have a unique entry label.
%
% A slight infelicity is that we cannot make @Graph@ an instance of @NonLocal@,
% because a graph closed on entry has no unique label. Fortunately, we never
% need @Graph@s to be in @NonLocal@.
%
% ``Never apologize. Never confess to infelicity.'' ---SLPJ
\section {Using \ourlib{} to analyze and transform graphs} \seclabel{using-hoopl}
\seclabel{making-simple}
\seclabel{create-analysis}
\seclabel{api}
Now that we have graphs, how do we optimize them?
\ourlib{} makes it easy to build a new dataflow analysis and optimization.
A~client must supply the following pieces:
\begin{itemize}
\item A \emph{node type} (\secref{nodes}).
\ourlib{} supplies the @Block@ and @Graph@ types
that let the client build control-flow graphs out of nodes.
\item A \emph{data type of facts} and some operations over
those facts (\secref{facts}).
Each analysis uses facts that are specific to that particular analysis,
which \ourlib{} accommodates by being polymorphic in
the fact type.
\item A \emph{transfer function} that takes a node and returns a
\emph{fact transformer}, which takes a fact flowing into the node and
returns the transformed fact that flows out of the node (\secref{transfers}).
\item A \emph{rewrite function} that takes a node and an input fact,
performs a monadic action,
and returns either @Nothing@
or @Just g@,
where @g@~is a graph that should
replace the node
(\secreftwo{rewrites}{shallow-vs-deep}).
The ability to replace a \emph{node} by a \emph{graph}
% that may include internal control flow
is crucial for many code-improving transformations.
%We discuss the rewrite function
%in Sections \ref{sec:rewrites} and \ref{sec:shallow-vs-deep}.
\end{itemize}
These requirements are summarized in \tabref{parts}.
Because facts, transfer functions, and rewrite functions work together,
we~combine them in a single record of type @FwdPass@ (\figref{api-types}).
\begin{table}
\centerline{%
\begin{tabular}{@{}>{\raggedright\arraybackslash}p{1.03in}>{\scshape}c>{\scshape
}
c>{\raggedright\arraybackslash}p{1.29in}@{}}
&\multicolumn1{r}{\llap{\emph{Specified}}\hspace*{-0.3em}}&
\multicolumn1{l}{\hspace*{-0.4em}\rlap{\emph{Implemented}}}&\\
\multicolumn1{c}{\emph{Part of optimizer}}
&\multicolumn1{c}{\emph{by}}&
\multicolumn1{c}{\emph{by}}&
\multicolumn1{c}{\emph{How many}}%
\\[5pt]
Control-flow graphs& Us & Us & One \\
Nodes in a control-flow graph & You & You & One type per intermediate language \\[3pt]
Dataflow fact~$F$ & You & You & One type per logic \\
Lattice operations & Us & You & One set per logic \\[3pt]
Transfer functions & Us & You & One per analysis \\
Rewrite functions & Us & You & One per \rlap{transformation} \\[3pt]
Analyze-and-rewrite functions & Us & Us & Two (forward, backward) \\
\end{tabular}%
}
%\ifpagetuning\vspace*{0.5\baselineskip}\fi
\caption{Parts of an optimizer built with \ourlib}
\tablabel{parts}
\end{table}
Given a node type~@n@ and a @FwdPass@,
a client can ask \ourlib{}\ to analyze and rewrite a
graph.
\hoopl\ provides a fully polymorphic interface, but for purposes of
exposition, we present a function that is specialized to a
closed/closed graph:
\begin{fuzzcode}{10.5pt}
`analyzeAndRewriteFwdBody
:: ( CkpointMonad m -- Roll back speculative actions
, NonLocal n ) -- Extract non-local flow edges
=> FwdPass m n f -- Lattice, transfer, rewrite
-> [Label] -- Entry point(s)
-> Graph n C C -- Input graph
-> FactBase f -- Input fact(s)
-> m ( Graph n C C -- Result graph
, FactBase f ) -- ... and its facts
\end{fuzzcode}
Given a @FwdPass@ and a list of entry points, the
analyze-and-rewrite function transforms a graph into
an optimized graph.
As its type shows, this function
is polymorphic in the types of nodes~@n@ and facts~@f@;
these types are chosen by the client.
The type of the monad~@m@ is also chosen by the client.
As well as taking and returning a graph, the
function also takes input facts (the @FactBase@) and produces output facts.
A~@FactBase@ is a finite mapping from @Label@ to facts (\figref{api-types});
if~a @Label@ is not in the domain of the @FactBase@, its fact is the
bottom element of the lattice.
For example, in our constant-propagation example from \secref{const-prop-example},
if the graph
represents the body of a procedure
with parameters $x,y,z$, we would map the entry @Label@ to a fact
\mbox{$@x=@\top \land @y=@\top \land @z=@\top$}, to specify that the procedure's
parameters are not known to be constants.
%%
%% The
%% output @FactBase@ maps each @Label@ in the body to its fact
The client's model of @analyzeAndRewriteFwdBody@ is as follows:
\ourlib\ walks forward over each block in the graph.
At~each node, \ourlib\ applies the
rewrite function to the node and the incoming fact. If the rewrite
function returns @Nothing@, the node is retained as part of the output
graph, the transfer function is used to compute the downstream fact,
and \ourlib\ moves on to the next node.
But if the rewrite function returns @Just g@,
indicating that it wants to rewrite the node to the replacement graph~@g@, then
\ourlib\ recursively analyzes and may further rewrite~@g@
before moving on to the next node.
A~node following a rewritten node sees
\emph{up-to-date} facts; that is, its input fact is computed by
analyzing the replacement graph.
A~rewrite function may take any action that is justified by
the incoming fact.
If~further analysis invalidates the fact, \hoopl\ rolls
back the action.
Because graphs cannot be mutated,
rolling back to the original graph is easy.
But~rolling back a rewrite function's {monadic} action
requires cooperation from the client:
the~client must provide @checkpoint@ and
@restart@ operations, which
make~@m@ an instance of
\hoopl's @CkpointMonad@ class
(\secref{checkpoint-monad}).
Below we flesh out the
interface to @analyzeAndRewriteFwdBody@, leaving the implementation for
\secref{engine}.
%{\hfuzz=7.2pt\par}
\begin{figure}
\begin{fuzzcode}{25pt}
data `FwdPass m n f
= FwdPass { `fp_lattice :: DataflowLattice f
, `fp_transfer :: FwdTransfer n f
, `fp_rewrite :: FwdRewrite m n f }
------- Lattice ----------
data `DataflowLattice f = DataflowLattice
{ `fact_bot :: f
, `fact_join :: JoinFun f }
type `JoinFun f =
OldFact f -> NewFact f -> (ChangeFlag, f)
newtype `OldFact f = OldFact f
newtype `NewFact f = NewFact f
data `ChangeFlag = `NoChange | `SomeChange
------- Transfers ----------
newtype `FwdTransfer n f -- abstract type
`mkFTransfer
:: (forall e x . n e x -> f -> Fact x f)
-> FwdTransfer n f
------- Rewrites ----------
newtype `FwdRewrite m n f -- abstract type
`mkFRewrite :: FuelMonad m
=> (forall e x . n e x -> f -> m (Maybe (Graph n e x)))
-> FwdRewrite m n f
`thenFwdRw :: FwdRewrite m n f -> FwdRewrite m n f
-> FwdRewrite m n f
`iterFwdRw :: FwdRewrite m n f -> FwdRewrite m n f
`noFwdRw :: Monad m => FwdRewrite m n f
------- Fact-like things, aka "fact(s)" -----
type family `Fact x f :: *
type instance Fact O f = f
type instance Fact C f = FactBase f
------- FactBase -------
type `FactBase f = LabelMap f
-- A finite mapping from Labels to facts f
------- Rolling back speculative rewrites ----
class Monad m => CkpointMonad m where
type Checkpoint m
checkpoint :: m (Checkpoint m)
restart :: Checkpoint m -> m ()
------- Optimization fuel ----
type `Fuel = Int
class Monad m => `FuelMonad m where
`getFuel :: m Fuel
`setFuel :: Fuel -> m ()
\end{fuzzcode}
\caption{\ourlib{} API data types}
\figlabel{api-types}
\figlabel{lattice-type} \figlabel{lattice}
\figlabel{transfers} \figlabel{rewrites}
\end{figure}
% omit mkFactBase :: [(Label, f)] -> FactBase f
%%%% \simon{We previously renamed @fact\_join@ to @fact\_extend@ because it really
%%%% is not a symmetrical join; we're extending an old fact with a new one.
%%%% NR: Yes, but the asymmetry is now explicit in the \emph{type}, so it
%%%% needn't also be reflected in the \emph{name}.}
\subsection{Dataflow lattices} \seclabel{lattices} \seclabel{facts}
For each analysis or transformation, the client must define a type
of dataflow facts.
A~dataflow fact often represents an assertion
about a program point,
but in general, dataflow analysis establishes properties of \emph{paths}:
\begin{itemize}
\item An assertion about all paths \emph{to} a program point is established
by a \emph{forward analysis}. For example the assertion ``$@x@=@3@$'' at point P
claims that variable @x@ holds value @3@ at P, regardless of the
path by which P is reached.
\item An assertion about all paths \emph{from} a program point is
established by a \emph{backward analysis}. For example, the
assertion ``@x@ is dead'' at point P claims that no path from P uses
variable @x@.
\end{itemize}
A~set of dataflow facts must form a lattice, and \ourlib{} must know
(a)~the bottom element of the lattice and (b)~how to take
the least upper bound (join) of two elements.
To ensure that analysis
terminates, it is enough if every fact has a finite number of
distinct facts above it, so that repeated joins
eventually reach a fixed point.
%% \simon{There is no mention here of the @OldFact@ and @NewFact@ types.
%% Shall we nuke them for the purposes of the paper?
%% NR: They should definitely not be nuked; they're needed if the reader
%% wants to understand the asymmetrical behavior of @fact\_join@.
%% I'm~also mindful that this paper will be the primary explanation of
%% \hoopl\ to its users, and I don't want to hide this part of the
%% interface.
%% As for mentions in the text,
%% if you look carefully, you'll
%% see that I've changed the subscripts in the text to ``old'' and
%% ``new''.
%% Together with the code, I believe that's enough to get the point across.
%% SLPJ -- OK. I don't agree that it's enough, but we have more important fish to fry.
%% }
In practice, joins are computed at labels.
If~$f_{\mathit{old}}$ is the fact currently associated with a
label~$L$,
and if a transfer function propagates a new fact~$f_{\mathit{new}}$
into label~$L$,
\hoopl\ replaces $f_{\mathit{old}}$ with
the join \mbox{$f_{\mathit{old}} \join f_{\mathit{new}}$}.
And \hoopl\ wants to know
if \mbox{$f_{\mathit{old}} \join f_{\mathit{new}} = f_{\mathit{old}}$},
because if not, the analysis has not reached a fixed point.
The bottom element and join operation of a lattice of facts of
type~@f@ are stored in a value of type @DataflowLattice f@
(\figref{lattice}).
%%%% \simon{Can we shorten ``@DataflowLattice@'' to just
%%%% ``@Lattice@''?} % bigger fish ---NR
%%Such a value is part of the @FwdPass n f@ that is passed to
%%@analyzeAndRewriteFwd@ above.
As noted in the previous paragraph,
\ourlib{} needs to know when the result of a join is equal
to the old fact.
Because this information can often be computed cheaply together
with the join, \ourlib\ does not
require a separate equality test on facts, which might be expensive.
Instead, \ourlib\ requires that @fact_join@ return a @ChangeFlag@ as
well as the least upper bound.
The @ChangeFlag@ should be @NoChange@ if
the result is the same as the old fact, and
@SomeChange@ if the result differs.
\seclabel{WithTop}
To help clients create lattices and join functions,
\hoopl\ includes functions and constructors that can extend a type~@a@
with top and bottom elements.
In this paper, we use only type @`WithTop@, which comes with value
constructors that have these types:
\begin{code}
`PElem :: f -> WithTop f
`Top :: WithTop f
\end{code}
\hoopl\ provides combinators which make it easier to create join
functions that use @Top@.
The most useful is @extendJoinDomain@, which uses auxiliary types
defined in \figref{lattice}:
\begin{smallcode}
`extendJoinDomain
:: (OldFact f -> NewFact f -> (ChangeFlag, WithTop f))
-> JoinFun (WithTop f)
\end{smallcode}
A~client can write a join function that \emph{consumes} only facts of
type~@f@, but may produce either @Top@ or a fact of type~@f@---as
in the example of \figref{const-prop} below.
Calling @extendJoinDomain@ extends the client's function to a proper
join function on the type @WithTop a@,
guaranteeing that joins
involving @Top@ obey the appropriate algebraic laws.
\hoopl\ also provides a value constructor @`Bot@ and type constructors
@`WithBot@ and @`WithTopAndBot@, along with similar functions.
Constructors @Top@ and @Bot@ are polymorphic, so for example,
@Top@~also has type @WithTopAndBot a@.
It is also very common to use a lattice that takes the form of a
finite~map.
In~such lattices it is typical to join maps pointwise, and \hoopl\
provides a function that makes it convenient to do so:
\begin{smallcode}
`joinMaps :: Ord k => JoinFun f -> JoinFun (Map.Map k f)
\end{smallcode}
\subsection{The transfer function} \seclabel{transfers}
A~forward transfer function is presented with the dataflow fact
coming
into a node, and it computes dataflow fact(s) on the node's outgoing edge(s).
In a forward analysis, \hoopl\ starts with the fact at the
beginning of a block and applies the transfer function to successive
nodes in that block, until eventually the transfer function for the last node
computes the facts that are propagated to the block's successors.
For example, consider doing constant propagation (\secref{constant-propagation}) on
the following graph, with entry at @L1@:
\begin{code}
L1: x=3; goto L2
L2: y=x+4; x=x-1;
if x>0 then goto L2 else return
\end{code}
Forward analysis starts with the bottom fact \{\} at every label
except the entry~@L1@.
The initial fact at~@L1@ is \{@x=@$\top$@,y=@$\top$\}.
Analyzing~@L1@ propagates this fact forward, applying the transfer
function successively to the nodes
of~@L1@, and propagating the new fact \{@x=3,y=@$\top$\} to~@L2@.
This new fact is joined with the existing (bottom) fact at~@L2@.
Now~the analysis propagates @L2@'s fact forward, again applying the transfer
function, and propagating the new fact \mbox{\{@x=2@, @y=7@\}} to~@L2@.
Again the new fact is joined with the existing fact at~@L2@, and the process
repeats until the facts for each label reach a fixed point.
A~transfer function has an unusual sort of type:
not quite a dependent type, but not a bog-standard polymorphic type either.
The result type of the transfer function is \emph{indexed} by the shape (i.e.,
the type) of the node argument:
If the node is open on exit, the transfer function produces a single fact.
But if the node is \emph{closed} on exit,
the transfer function
produces a collection of (@Label@,fact) pairs: one for each outgoing edge.
\simon{Lies all lies. It's a FactBase!}
%
The indexing is expressed by Haskell's (recently added)
\emph{indexed type families}.
The relevant part of \ourlib's interface is given in \figref{transfers}.
A~forward transfer function supplied by a client,
which would be passed to @mkFTransfer@,
is typically a function polymorphic in @e@ and @x@.
It~takes a
node of type \mbox{@n@ @e@ @x@}
and it returns a \emph{fact transformer} of type
@f -> Fact x f@.
Type constructor @Fact@
should be thought of as a type-level function: its signature is given in the
@type family@ declaration, while its definition is given by two @type instance@
declarations. The first declaration says that a @Fact O f@, which
comes out of a node
\emph{open} on exit, is just a fact~@f@.
The second declaration says that a @Fact C f@, which comes out of a
node \emph{closed} on exit, is a mapping from @Label@ to facts.
%%
%% \begin{code}
%% `transfer_fn :: forall e x . n e x -> f -> Fact x f
%% `node :: n e x
%% \end{code}
%% then @(transfer_fn node)@ is a fact transformer:
%% \begin{code}
%% transfer_fn node :: f -> Fact x f
%% \end{code}
%%
\subsection{The rewrite function and the client's monad}
\seclabel{rewrites}
\seclabel{example-rewrites}
We compute dataflow facts in order to enable code-improving
transformations.
In our constant-propagation example,
the dataflow facts may enable us
to simplify an expression by performing constant folding, or to
turn a conditional branch into an unconditional one.
Similarly, a liveness analysis may allow us to
replace a dead assignment with a no-op.
A @FwdPass@ therefore includes a \emph{rewrite function}, whose
type, @FwdRewrite@, is abstract (\figref{api-types}).
A~programmer creating a rewrite function chooses the type of a node~@n@ and
a dataflow fact~@f@.
A~rewrite function might also want to consume fresh
names (e.g., to label new blocks) or take other actions (e.g., logging
rewrites).
So~that a rewrite function may take actions, \hoopl\
requires that a programmer creating a rewrite function also choose a
monad~@m@.
So~that \hoopl\ may roll back actions taken by speculative rewrites, the monad
must satisfy the constraint @CkpointMonad m@, as described in
\secref{checkpoint-monad}.
The programmer may write code that works with any such monad,
may create a monad just for the client,
or may use a monad supplied by \hoopl.
When these choices are made, the easy way to create a rewrite
function is to call the function @mkFRewrite@ in \figref{api-types}.
The client supplies a function that is specialized to a particular
node, fact, and monad, but is polymorphic in the
\emph{shape} of the node to be rewritten.
The function, which we will call~$r$, takes a node and a fact and returns a monadic
computation, but what is the result of that computation?
One might
expect that the result should be a new node, but that is not enough:
in~general, it must be possible for rewriting to result in a graph.
For example,
we might want to remove a node by returning the empty graph,
or more ambitiously, we might want to replace a high-level operation
with a tree of conditional branches or a loop, which would entail
returning a graph containing new blocks with internal control flow.
It~must also be possible for a rewrite function to decide to do nothing.
The result of the monadic computation returned by~$r$ may therefore be
@Nothing@, indicating that the node should
not be rewritten,
or $@Just@\;\ag$, indicating that the node should
be replaced with~\ag: the replacement graph.
%%The additional value $\rw$ tells \hoopl\ whether
%%and how the replacement graph~$\ag$ should be analyzed and rewritten
%%further;
%%we explain~$\rw$ in \secref{shallow-vs-deep}.
The type of @mkFRewrite@ in \figref{api-types} guarantees
that
the replacement graph $\ag$ has
the \emph{same} open/closed shape as the node being rewritten.
For example, a branch instruction can be replaced only by a graph
closed on exit.
%% Moreover, because only an open/open graph can be
%% empty---look at the type of @GNil@ in \figref{graph}---the type
%% of @FwdRewrite@
%% guarantees, at compile time, that no head of a block (closed/open)
%% or tail of a block (open/closed) can ever be deleted by being
%% rewritten to an empty graph.
\subsection{Shallow rewriting, deep rewriting, rewriting combinators,
and the meaning of {\textmd\texttt{FwdRewrite}}}
\seclabel{shallow-vs-deep}
When a node is rewritten, the replacement graph~$g$
must itself be analyzed, and its nodes may be further rewritten.
\hoopl\ can make a recursive call to
@analyzeAndRewriteFwdBody@---but exactly how should it rewrite the
replacement graph~$g$?
There are two common ways to proceed:
\begin{itemize}
\item
Rewrite~$g$ using the same rewrite function that produced~$g$.
This procedure is
called \emph{deep rewriting}.
When deep rewriting is used, the client's rewrite function must
ensure that the graphs it produces are not rewritten indefinitely
(\secref{correctness}).
\item
Analyze~$g$ {without} rewriting it.
This procedure is called \emph{shallow rewriting}.
\end{itemize}
Deep rewriting is essential to achieve the full benefits of
interleaved analysis and transformation
\citep{lerner-grove-chambers:2002}.
But shallow rewriting can be vital as well;
for example, a backward dataflow pass that inserts
a spill before a call must not rewrite the call again, lest it attempt
to insert infinitely many spills.
An~innovation of \hoopl\ is to build the choice of shallow or deep
rewriting into each rewrite function, through the use of the four
combinators
@mkFRewrite@, @thenFwdRw@, @iterFwdRw@, and @noFwdRw@ shown in
\figref{api}.
Every rewrite function is made with these combinators,
and its behavior is characterized by the answers to two questions:
does the function rewrite a~node, and if so, what rewrite function
should be used to recursively analyze the replacement graph.
To~answer these questions precisely, we offer a model of rewrite
functions as an algebraic datatype:
\begin{smallcode}
data `Rw a = `Mk a | `Then (Rw a) (Rw a) | `Iter (Rw a) | `No
\end{smallcode}
Using this model, we specify how a rewrite function works by
giving a reference implementation.
The code is in continuation-passing style;
the first continuation~@j@ accepts a result where the graph is
rewritten,
and the second continuation~@n@ is the result when the graph is not
rewritten.
\begin{smallcode}
rewrite :: Monad m => FwdRewrite m n f -> n e x -> f
-> m (Maybe (Graph n e x, FwdRewrite m n f))
`rewrite ^rs node f = rew rs (return . Just) (return Nothing)
where
`rew (Mk r) j n = do ^mg <- r node f
case mg of Nothing -> n
Just g -> j (g, No)
rew (r1 `Then` r2) j n = rew r1 (j . add r2) (rew r2 j n)
rew (Iter r) j n = rew r (j . add (Iter r)) n
rew No j n = n
`add tail (g, r) = (g, r `Then` tail)
\end{smallcode}
Appealing to this model, we see that
\begin{itemize}
\item
A~function returned by @mkFRewrite@ never rewrites a replacement
graph---this behavior is shallow rewriting.
\item
When a~function @r1 `thenFwdRw` r2@ is applied to a node,
if @r1@ replaces the node, then @r2@~is used to transform the
replacement graph.
And~if @r1@~does not replace the node, \hoopl\ tries~@r2@.
\item
When a~function @iterFwdRw r@ rewrites a node, then @iterFwdRw r@ is
used to transform the replacement graph---this behavior is deep
rewriting.
But~if @r1@~does not replace the node, \hoopl\ returns @Nothing@.
\item
Finally, @noFwdRw@ never replaces a graph.
\end{itemize}
For~convenience, we~also provide the~function @`deepFwdRw@,
which is the composition of @iterFwdRw@ and~@mkFRewrite@.
\remark{maybe some of the real type signatures ought to be repeated?}
%% \begin{code}
%% `iterFwdRw :: FwdRewrite m n f -> FwdRewrite m n f
%% \end{code}
%% To try multiple rewrite functions in sequence,
%% provided they all use the same type of fact, \hoopl\ combines them with
%% \verbatiminput{comb1}
%% % defn thenFwdRw
%% Rewrite function @r1 `thenFwdRw` r2@ first does the rewrites of~@r1@,
%% then the rewrites of~@r2@.
%%
%%
%% by using
%% Rewrite functions are potentially more plentiful than transfer
%% functions, because
%% a~single dataflow fact might justify more than one kind of rewrite.
%% \hoopl\ makes it easy for a client to combine multiple rewrite
%% functions that use the same fact:
Our combinators satisfy the algebraic laws that you would expect;
for~example
@noFwdRw@ is a left and right identity of @thenFwdRw@.
A~more interesting law is
\begin{code}
iterFwdRw r = r `thenFwdRw` iterFwdRw r
\end{code}
Unfortunately, this law cannot be used to \emph{define}
@iterFwdRw@:
if~we used this law to define @iterFwdRw@, then when @r@ returned @Nothing@,
@iterFwdRw r@ would diverge.
%%%% \hoopl\ provides
%%%% a function that makes a shallow rewrite deep:\finalremark{algebraic
%%%% law wanted!}
%%%% \remark{Do we really care about @iterFwdRw@ or can we just drop it?}
%%%% \verbatiminput{iterf}
%%%% % defn iterFwdRw
\subsection{When the type of nodes is not known}
We note above (\secref{transfers}) that
the type of a transfer function's result
depends on the argument's shape on exit.
It~is easy for a client to write a type-indexed transfer function,
because the client defines the constructor and shape for each node.
The client's transfer functions discriminate on the constructor
and so can return a result that is indexed by each node's shape.
What if you want to write a transfer function that
{does \emph{not} know the type of the node}?
%
For example, a dominator analysis need not scrutinize nodes;
it needs to know only about
labels and edges in the graph.
Ideally, a dominator analysis would work
with \emph{any} type of node~@n@,
provided only that @n@~is an
instance of the @NonLocal@ type class.
But if we don't know
the type of~@n@,
we can't write a function of type
@n e x -> f -> Fact x f@,
because the only way to get the result type right is to
scrutinize the constructors of~@n@.
\seclabel{triple-of-functions}
There is another way;
instead of requiring a single function that is polymorphic in shape,
\hoopl\ also accepts a triple of functions, each of which is
polymorphic in the node's type but monomorphic in its shape:
\begin{code}
`mkFTransfer3 :: (n C O -> f -> Fact O f)
-> (n O O -> f -> Fact O f)
-> (n O C -> f -> Fact C f)
-> FwdTransfer n f
\end{code}
We have used this interface to write a number of functions that are
polymorphic in the node type~@n@:
\begin{itemize}
\item
A function that takes a @FwdTransfer@ and wraps it in logging code, so
an analysis can be debugged by watching the facts flow through the
nodes
\item
A pairing function that runs two passes interleaved, not sequentially,
potentially producing better results than any sequence:
\verbatiminput{pairf}
% defn pairFwd
\item
An efficient dominator analysis
in the style of
\citet{cooper-harvey-kennedy:simple-dominance},
whose transfer function is implemented
using only the
functions in the \texttt{NonLocal} type class
\end{itemize}
%% Or, you might want to write a combinator that
%% computes the pairwise composition of any two analyses
%% defined on any node type.
%%
%% You cannot write either of these examples
%% using the polymorphic function described in \secref{transfers}
%% because these examples require the ability
%% to inspect the shape of the node.
%%
%%
%% \subsection{Stuff cut from 4.2 that should disappear when the previous section is finished}
%%
%% So much for the interface.
%% What about the implementation?
%% The way GADTs work is that the compiler uses the value constructor for
%% type \mbox{@n@ @e@ @x@} to determine whether @e@~and~@x@ are open or
%% closed.
%% But we want to write functions that are \emph{polymorphic} in the node
%% type~@n@!
%% Such functions include
%% \begin{itemize}
%% \item
%% A function that takes a pair of @FwdTransfer@s for facts @f@~and~@f'@,
%% and returns a single @FwdTransfer@ for the fact @(f, f')@
%% \end{itemize}
%% \simon{These bullets are utterly opaque. They belong in 4.5. The rest of this section
%% is also very hard to understand without more explanation. See email. }
%% Such functions may also be useful in \hoopl's \emph{clients}:
%% % may need
%% %functions that are polymorphic in the node type~@n@:
%% \begin{itemize}
%% \item
%% A dominator analysis in the style of
%% \citet{cooper-harvey-kennedy:simple-dominance} requires only the
%% functions in the \texttt{NonLocal} type class;
%% we have written such an analysis using transfer functions that are
%% polymorphic in~@n@.
%% \end{itemize}
%% Because the mapping from
%% value constructors to shape is different for each node type~@n@, transfer
%% functions cannot be polymorphic in both
%% the representation and the shape of nodes.
%% Our implementation therefore sacrifices polymorphism in shape:
%% internally, \hoopl\ represents
%% a~@FwdTransfer n f@ as a \emph{triple} of functions,
%% each polymorphic in~@n@ but monomorphic in shape:
%% \begin{code}
%% newtype FwdTransfer n f
%% = FwdTransfers ( n C O -> f -> f
%% , n O O -> f -> f
%% , n O C -> f -> FactBase f
%% )
%% \end{code}
%% \simon{I argue strongly that the implementation should be
%% polymorphic too, using a shape classifier where necessary.
%% Regardless of how this debate pans out, though, I think it's
%% a bad idea to introduce triples here. They are simply confusing.}
%% Such triples can easily be composed and wrapped without requiring a
%% pattern match on the value constructor of an unknown node
%% type.\remark{Need to refer to this junk in the conclusion}
%% Working with triples is tedious, but only \hoopl\ itself is forced to
%% work with triples; each client, which knows the node type, may supply
%% a triple,
%% but it typically supplies a single function polymorphic in the shape
%% of the (known) node.
%%
%%
%% \subsection{Composing rewrite functions and dataflow passes} \seclabel{combinators}
%%
%% Requiring each rewrite to return a new rewrite function has another
%% advantage, beyond controlling the shallow-vs-deep choice: it
%% enables a variety of combinators over rewrite functions.
%% \remark{This whole subsection needs to be redone in light of the new
%% (triple-based) representation. It's not pretty any more.}
%% For example, here is a function
%% that combines two rewrite functions in sequence:
%% \remark{This code must be improved}
%% What a beautiful type @thenFwdRw@ has!
%% \remark{And what an ugly implementation! Implementations must go.}
%% It tries @rw1@, and if @rw1@
%% declines to rewrite, it behaves like @rw2@. But if
%% @rw1@ rewrites, returning a new rewriter @rw1a@, then the overall call also
%% succeeds, returning a new rewrite function obtained by combining @rw1a@
%% with @rw2@. (We cannot apply @rw1a@ or @rw2@
%% directly to the replacement graph~@g@,
%% because @r1@~returns a graph and @rw2@~expects a node.)
%% The rewriter @noFwdRw@ is the identity of @thenFwdRw@.
%% Finally, @thenFwdRw@ can
%% combine a deep-rewriting function and a shallow-rewriting function,
%% to produce a rewrite function that is a combination of deep and shallow.
%% %%This is something that the Lerner/Grove/Chambers framework could not do,
%% %%because there was a global shallow/deep flag.
%% %% Our shortsightedness; Lerner/Grove/Chambers is deep only ---NR
%%
%%
%% A shallow rewrite function can be made deep by iterating
%% it:\remark{Algebraic law wanted!}
%% If we have shallow rewrites $A$~and~$B$ then we can build $AB$,
%% $A^*B$, $(AB)^*$,
%% and so on: sequential composition is @thenFwdRw@ and the Kleene star
%% is @iterFwdRw@.\remark{Do we still believe this claim?}
%% \remark{We can't define @iterFwdRew@ in terms of @thenFwdRew@ because
%% the second argument to @thenFwdRew@ would have to be the constant
%% nothing function when applied but would have to be the original triple
%% when passed to @thenFwdRew@ as the second argument (in the recursive
%% call).}
%%
%%
%% The combinators above operate on rewrite functions that share a common
%% fact type and transfer function.
%% It~can also be useful to combine entire dataflow passes that use
%% different facts.
%% We~invite you to write one such combinator, with type
%% \begin{code}
%% `pairFwd :: Monad m
%% => FwdPass m n f1
%% -> FwdPass m n f2
%% -> FwdPass m n (f1,f2)
%% \end{code}
%% The two passes run interleaved, not sequentially, and each may help
%% the other,
%% yielding better results than running $A$~and then~$B$ or $B$~and then~$A$
%% \citep{lerner-grove-chambers:2002}.
%% %% call these passes. ``super-analyses;''
%% %% in \hoopl, construction of super-analyses is
%% %% particularly concrete.
\subsection{Example: Constant propagation and constant folding}
\seclabel{const-prop-client}
% omit Binop :: Operator -> Expr -> Expr -> Expr
% omit Add :: Operator
\begin{figure}
\smallfuzzverbatiminput{2.5pt}{cprop}
% local node
% defn ConstFact
% defn constLattice
% defn constFactAdd
% defn varHasLit
% local ft
% defn constProp
% local cp
% local lookup
% defn simplify
% local simp
% local s_node
% local s_exp
% defn constPropPass
% local new
% local old
% local tl
% local fl
\caption{The client for constant propagation and constant folding\break (extracted automatically from code distributed with Hoopl)}
\figlabel{const-prop}
%\ifpagetuning\vspace*{-1.0\baselineskip}\fi
\end{figure}
\figref{const-prop} shows client code for
constant propagation and constant folding.
For each variable, at each program point, the analysis concludes one
of three facts:
the variable holds a constant value of type~@`Lit@,
the variable might hold a non-constant value,
or what the variable holds is unknown.
We~represent these facts using a finite map from a variable to a
fact of type @WithTop Lit@ (\secref{WithTop}).
% Any one procedure has only
% finitely many variables; only finitely many facts are computed at any
% program point; and in this lattice any one fact can increase at most
% twice. These properties ensure that the dataflow engine will reach a
% fixed point.
A~variable with a constant value maps to @Just (PElem k)@, where
@k@ is the constant value;
a variable with a non-constant value maps to @Just Top@;
and a variable with an unknown value maps to @Nothing@ (it is not
in the domain of the finite map).
% \afterpage{\clearpage}
The definition of the lattice (@constLattice@) is straightforward.
The bottom element is an empty map (nothing is known about what
any variable holds).
%
The join function is implemented with the help of combinators provided
by \hoopl.
The client writes a simple function, @constFactAdd@, which
compares two values of type @Lit@ and returns a result of type
@WithTop Lit@.
The client uses @extendJoinDomain@ to lift @constFactAdd@ into a join
function on @WithTop Lit@, then uses
@joinMaps@ to lift \emph{that} join function up to the map
containing facts for all variables.
The forward transfer function @varHasLit@ is defined using the
shape-polymorphic auxiliary function~@ft@.
For most nodes~@n@, @ft n@ simply propagates the input fact forward.
But for an assignment node, if a variable~@x@ gets a constant value~@k@,
@ft@ extends the input fact by mapping @x@ to~@PElem k@.
And if a variable~@x@ is assigned a non-constant value,
@ft@ extends the input fact by mapping @x@ to~@Top@.
There is one other interesting case:
a conditional branch where the condition
is a variable.
If the conditional branch flows to the true successor,
the variable holds @True@, and similarly for the false successor,
\emph{mutatis mutandis}.
Function @ft@ updates the fact flowing to each successor accordingly.
Because~@ft@ scrutinizes a GADT, it cannot
use a wildcard to default the uninteresting cases.
The transfer function need not consider complicated cases such as
an assignment @x:=y@ where @y@ holds a constant value~@k@.
Instead, we rely on the interleaving of transformation
and analysis to first transform the assignment to @x:=k@,
which is exactly what our simple transfer function expects.
As we mention in \secref{simple-tx},
interleaving makes it possible to write
very simple transfer functions without missing
opportunities to improve the code.
\figref{const-prop}'s rewrite function for constant propagation, @constProp@,
rewrites each use of a variable to its constant value.
The client has defined auxiliary functions that may change expressions
or nodes:
\begin{smallcode}
type `MaybeChange a = a -> Maybe a
`mapVE :: (Var -> Maybe Expr) -> MaybeChange Expr
`mapEE :: MaybeChange Expr -> MaybeChange Expr
`mapEN :: MaybeChange Expr -> MaybeChange (Node e x)
`mapVN :: (Var -> Maybe Expr) -> MaybeChange (Node e x)
`nodeToG :: Node e x -> Graph Node e x
\end{smallcode}
The client composes @map@\emph{XX} functions
to apply @lookup@ to each use of a variable in each kind of node;
@lookup@ substitutes for each variable that has a constant value.
Applying @liftM nodeToG@ lifts the final node, if present, into a~@Graph@.
\figref{const-prop} also gives another, distinct function
for constant
folding: @simplify@.
This function
rewrites constant expressions to their values,
and it rewrites a conditional branch on a
boolean constant to an unconditional branch.
To~rewrite constant expressions,
it runs @s_exp@ on every subexpression.
Function @simplify@ does not check whether a variable holds a
constant value; it relies on @constProp@ to have replaced the
variable by the constant.
Indeed, @simplify@ does not consult the
incoming fact, so it is polymorphic in~@f@.
The @FwdRewrite@ functions @constProp@ and @simplify@
are useful independently.
In this case, however, we
want \emph{both} of them,
so we compose them with @thenFwdRw@.
The composition, along with the lattice and the
transfer function,
goes into @constPropPass@ (bottom of \figref{const-prop}).
Given @constPropPass@, we can
improve a graph~@g@ by passing @constPropPass@ and~@g@
to
@analyzeAndRewriteFwdBody@.
%%%% \subsection{Fixed points and speculative rewrites} \seclabel{fixpoints}
%%%%
%%%% Are rewrites sound, especially when there are loops?
%%%% Many analyses compute a fixed point starting from unsound
%%%% ``facts''; for example, a live-variable analysis starts from the
%%%% assumption that all variables are dead. This means \emph{rewrites
%%%% performed before a fixed point is reached may be unsound, and their results
%%%% must be discarded}. Each iteration of the fixed-point computation must
%%%% start afresh with the original graph.
%%%%
%%%%
%%%% Although the rewrites may be unsound, \emph{they must be performed}
%%%% (speculatively, and possibly recursively),
%%%% so that the facts downstream of the replacement graphs are as accurate
%%%% as possible.
%%%% For~example, consider this graph, with entry at @L1@:
%%%% \par{\small
%%%% \begin{code}
%%%% L1: x=0; goto L2
%%%% L2: x=x+1; if x==10 then goto L3 else goto L2
%%%% \end{code}}
%%%% The first traversal of block @L2@ starts with the unsound ``fact'' \{x=0\};
%%%% but analysis of the block propagates the new fact \{x=1\} to @L2@, which joins the
%%%% existing fact to get \{x=$\top$\}.
%%%% What if the predicate in the conditional branch were @x<10@ instead
%%%% of @x==10@?
%%%% Again the first iteration would begin with the tentative fact \{x=0\}.
%%%% Using that fact, we would rewrite the conditional branch to an unconditional
%%%% branch @goto L3@. No new fact would propagate to @L2@, and we would
%%%% have successfully (and soundly) eliminated the loop.
%%%% This example is contrived, but it illustrates that
%%%% for best results we should
%%%% \begin{itemize}
%%%% \item Perform the rewrites on every iteration.
%%%% \item Begin each new iteration with the original, virgin graph.
%%%% \end{itemize}
%%%% This sort of algorithm is hard to implement in an imperative setting, where rewrites
%%%% mutate a graph in place.
%%%% But with an immutable graph, implementing the algorithm
%%%% is trivially easy: we simply revert to the original graph at the start
%%%% of each fixed-point iteration.
\subsection{Checkpointing the client's monad}
\seclabel{ckpoint-monad} \seclabel{checkpoint-monad}
When analyzing a program with loops, a rewrite function could make a change
that later has to be rolled back.
For example, consider constant propagation in this loop, which
computes factorial:
\begin{code}
i = 1; `prod = 1;
L1: if (i >= n) goto L3 else goto L2;
L2: i = i + 1; prod = prod * i;
goto L1;
L3: ...
\end{code}
Function @analyzeAndRewriteFwdBody@ iterates through this graph until
the dataflow facts stop changing.
On~the first iteration, the assignment @i = i + 1@ will
be analyzed with an incoming fact @i=1@, and the assignment will be
rewritten to the graph @i = 2@.
But~on a later iteration, the incoming fact will increase to @i=@$\top$,
and the~rewrite will no longer be justified.
After each iteration, \hoopl\ starts the next iteration with
\emph{new} facts but with the \emph{original} graph---by~virtue
of using purely functional data structures, rewrites from
previous iterations are automatically rolled back.
But a rewrite function doesn't only produce new graphs; it
can also take a \emph{monadic action}, such as
acquiring a fresh name.
These~actions must also be rolled back, and because the~client chooses
the monad in which the actions take place, the client must provide the
means to roll them back.
\hoopl\ therefore defines a rollback \emph{interface}, which each client must implement;
it is the type class @CkpointMonad@ from
\figref{api-types}:
\begin{code}
class Monad m => `CkpointMonad m where
type `Checkpoint m
`checkpoint :: m (Checkpoint m)
`restart :: Checkpoint m -> m ()
\end{code}
\hoopl\ calls the @checkpoint@ method at the beginning of an
iteration, then calls the @restart@ method if another iteration is
necessary.
These operations must obey the following algebraic law:
\begin{code}
do { s <- checkpoint; m; restart s } == return ()
\end{code}
where @m@~represents any combination of monadic actions that might be
taken by rewrite functions.
(The safest course is to make sure the law holds throughout the entire
monad.)
The~type of the saved checkpoint~@s@ is up to the client;
it~is specified as an associated type of the @CkpointMonad@ class.
\subsection{Correctness} \seclabel{correctness}
% Facts computed by @analyzeAndRewriteFwdBody@ depend on graphs produced by the rewrite
Facts computed by the transfer function depend on graphs produced by the rewrite
function, which in turn depend on facts computed by the transfer function.
How~do we know this algorithm is sound, or if it terminates?
A~proof requires a POPL paper
\cite{lerner-grove-chambers:2002};
here we merely state the conditions for correctness as applied to \hoopl:
\begin{itemize}
\item
The lattice must have no \emph{infinite ascending chains}; that is,
every sequence of calls to @fact_join@ must eventually return @NoChange@.
\item
The transfer function must be
\emph{monotonic}: given a more informative fact in,
it must produce a more informative fact out.
\item
The rewrite function must be \emph{sound}:
if it replaces a node @n@ by a replacement graph~@g@, then @g@~must be
observationally equivalent to~@n@ under the
assumptions expressed by the incoming dataflow fact~@f@.
%%\footnote{We do not permit a transformation to change
%% the @Label@ of a node. We have not found any optimizations
%% that are prevented (or even affected) by this restriction.}
%
Moreover, analysis of~@g@ must produce output fact(s)
that are at least as informative as the fact(s) produced by applying
the transfer function to~@n@.
%% \item
%% The rewrite function must be \emph{consistent} with the transfer function;
%% that is, \mbox{@`transfer n f@ $\sqsubseteq$ @transfer g f@}.
For example, if the transfer function says that @x=7@ after the node~@n@,
then after analysis of~@g@,
@x@~had better still be~@7@.
\item
% To ensure termination,
A transformation that uses deep rewriting
must not return a replacement graph which
contains a node that could be rewritten indefinitely.
\end{itemize}
%% Without the conditions on monotonicity and consistency,
%% our algorithm will terminate,
%% but there is no guarantee that it will compute
%% a fixed point of the analysis. And that in turn threatens the
%% soundness of rewrites based on possibly bogus ``facts''.
Under these conditions, the algorithm terminates and is
sound.
%%
%% \begin{itemize}
%% \item
%% The algorithm terminates. The fixed-point loop must terminate because the
%% lattice has no infinite ascending chains. And the client is responsible
%% for avoiding infinite recursion when deep rewriting is used.
%% \item
%% The algorithm is sound. Why? Because if each rewrite is sound (in the sense given above),
%% then applying a succession of rewrites is also sound.
%% Moreover, a~sound analysis of the replacement graph
%% may generate only dataflow facts that could have been
%% generated by a more complicated analysis of the original graph.
%% \end{itemize}
%%
%% \finalremark{Doesn't the rewrite have to be have the following property:
%% for a forward analysis/transform, if (rewrite P s) = Just s',
%% then (transfer P s $\sqsubseteq$ transfer P s').
%% For backward: if (rewrite Q s) = Just s', then (transfer Q s' $\sqsubseteq$ transfer Q s).
%% Works for liveness.
%% ``It works for liveness, so it must be true'' (NR).
%% If this is true, it's worth a QuickCheck property!
%% }%
%% \finalremark{Version 2, after further rumination. Let's define
%% $\scriptstyle \mathit{rt}(f,s) = \mathit{transform}(f, \mathit{rewrite}(f,s))$.
%% Then $\mathit{rt}$ should
%% be monotonic in~$f$. We think this is true of liveness, but we are not sure
%% whether it's just a generally good idea, or whether it's actually a
%% precondition for some (as yet unarticulated) property of \ourlib{} to hold.}%
%%%% \simon{The rewrite functions must presumably satisfy
%%%% some monotonicity property. Something like: given a more informative
%%%% fact, the rewrite function will rewrite a node to a more informative graph
%%%% (in the fact lattice.).
%%%% \textbf{NR}: actually the only obligation of the rewrite function is
%%%% to preserve observable behavior. There's no requirement that it be
%%%% monotonic or indeed that it do anything useful. It just has to
%%%% preserve semantics (and be a pure function of course).
%%%% \textbf{SLPJ} In that case I think I could cook up a program that
%%%% would never reach a fixed point. Imagine a liveness analysis with a loop;
%%%% x is initially unused anywhere.
%%%% At some assignment node inside the loop, the rewriter behaves as follows:
%%%% if (and only if) x is dead downstream,
%%%% make it alive by rewriting the assignment to mention x.
%%%% Now in each successive iteration x will go live/dead/live/dead etc. I
%%%% maintain my claim that rewrite functions must satisfy some
%%%% monotonicity property.
%%%% \textbf{JD}: in the example you cite, monotonicity of facts at labels
%%%% means x cannot go live/dead/live/dead etc. The only way we can think
%%%% of not to terminate is infinite ``deep rewriting.''
%%%% }
\section{\ourlib's implementation}
\seclabel{implementation}
\seclabel{engine}
\seclabel{dfengine}
\secref{making-simple}
gives a client's-eye view of \hoopl, showing how to
create analyses and transformations.
\hoopl's interface is simple, but
the \emph{implementation} of interleaved analysis and rewriting is~not.
\citet{lerner-grove-chambers:2002}
do not describe their implementation. We have written
at least three previous implementations, all of which
were long and hard to understand, and only one of which
provided compile-time guarantees about open and closed shapes.
We are not confident that any of our previous implementations are correct.
In this paper we describe a new implementation. It is elegant and short
(about a third of the size of our last attempt), and it offers
strong compile-time guarantees about shapes.
\finalremark{Wanted: enumerate the critical components and give each one's size}
%
We describe the implementation of \emph{forward}
analysis and transformation.
The implementations of backward analysis and transformation are
exactly analogous and are included in \hoopl.
We~also, starting in \secref{first-debugging-section}, explain how we
isolate errors in faulty optimizers, and how the fault-isolation
machinery is integrated with the rest of the implementation.
\subsection{Overview}
%% We on @analyzeAndRewriteFwd@, whose type is more general
%% than that of @analyzeAndRewriteFwdBody@:
%% \begin{smallcode}
%% `analyzeAndRewriteFwd
%% :: forall m n f e x. (FuelMonad m, NonLocal n)
%% => FwdPass m n f -- lattice, transfers, rewrites
%% -> MaybeC e [Label] -- entry points for a closed graph
%% -> Graph n e x -- the original graph
%% -> Fact e f -- fact(s) flowing into the entry/entries
%% -> m (Graph n e x, FactBase f, MaybeO x f)
%% \end{smallcode}
%% We analyze graphs of all shapes; a single @FwdPass@ may be used with
%% multiple shapes.
%% If a graph is closed on entry, a list of entry points must be
%% provided;
%% if the graph is open on entry, it must be the case that the
%% implicit entry point is the only entry point.
%% The fact or set of facts flowing into the entries is similarly
%% determined by the shape of the entry point.
%% Finally, the result is a rewritten graph, a @FactBase@ that gives a
%% fixed point of the analysis on the rewritten graph, and if the graph
%% is open on exit, an ``exit fact'' flowing out.
Instead of the interface function @analyzeAndRewriteFwdBody@, we present
the private function @arfGraph@, which is short for ``analyze and rewrite
forward graph:''
\begin{smallfuzzcode}{15.1pt}
`arfGraph
:: forall m n f e x. (CkpointMonad m, NonLocal n)
=> FwdPass m n f -- lattice, transfers, rewrites
-> MaybeC e [Label] -- entry points for a closed graph
-> Graph n e x -- the original graph
-> Fact e f -- fact(s) flowing into entry/entries
-> m (DG f n e x, Fact x f)
\end{smallfuzzcode}
Function @arfGraph@ has a more general type than
the function @analyzeAndRewriteFwdBody@ % praying for a decent line break
because @arfGraph@ is used recursively
to analyze graphs of all shapes.
If a graph is closed on entry, a list of entry points must be
provided;
if the graph is open on entry,
the graph's entry sequence must be the only entry point.
The~graph's shape on entry also determines the type of fact or facts
flowing in.
Finally, the result is a ``decorated graph''
@DG f n e x@,
and if the graph
is open on exit, an ``exit fact'' flowing out.
%% \simon{I suggest (a) putting the paragraph break one sentence earlier,
%%% so that this para is all about DGs.}
%%% NR: previous para is about the type of arfGraph; I don't want to
%%% leave the result type dangling. I hope the opening sentence of
%%% this para suggests that the para is all about DGs.
A~``decorated graph'' is one in which each block is decorated with the
fact that holds at the start of the block.
@DG@ actually shares a representation with @Graph@,
which is possible because the definition of
@Graph@ in \figref{graph} contains a white lie: @Graph@~is a type
synonym for an underlying type @`Graph'@, which takes the type of block
as an additional parameter.
(Similarly, function @gSplice@ in \secref{gSplice} is actually a
higher-order function that takes a block-concatenation function as a
parameter.)
The truth about @Graph@ and @DG@ is as follows:
\smallverbatiminput{dg}
% defn DG
% defn DBlock
Type~@DG@ is internal to \hoopl; it is not seen by any client.
To~convert a~@DG@ to the @Graph@ and @FactBase@
that are returned by the API function @analyzeAndRewriteFwdBody@,
we use a 12-line function:
\begin{smallfuzzcode}{2.5pt}
`normalizeGraph
:: NonLocal n => DG f n e x -> (Graph n e x, FactBase f)
\end{smallfuzzcode}
Function @arfGraph@ is implemented as follows:
\begingroup
\def\^{\\[-6pt]}%
\hfuzz=15.1pt
\begin{smallttcode}
arfGraph ^pass entries = graph
where\^
node :: forall e x . (ShapeLifter e x)
=> n e x -> f -> m (DG f n e x, Fact x f)\^
block:: forall e x .
Block n e x -> f -> m (DG f n e x, Fact x f)\^
body :: [Label] -> LabelMap (Block n C C)
-> Fact C f -> m (DG f n C C, Fact C f)\^
`graph:: Graph n e x -> Fact e f -> m (DG f n e x, Fact x f)\^
... definitions of 'node', 'block', 'body', and 'graph' ...
\end{smallttcode}
The four auxiliary functions help us separate concerns: for example, only
\endgroup
@node@ knows about rewrite functions,
and only @body@ knows about fixed points.
%% All four functions have access to the @FwdPass@ and any entry points
%% that are passed to @arfGraph@.
%% These functions also have access to type variables bound by
%% @arfGraph@:
%% @n@~is the type of nodes; @f@~is the type of facts;
%% @m@~is the monad used in the rewrite functions of the @FwdPass@;
%% and
%% @e@~and~@x@ give the shape of the graph passed to @arfGraph@.
%% The types of the inner functions are
%% \begin{smallcode}
%% \end{smallcode}
Each auxiliary function works the same way: it~takes a ``thing'' and
returns an \emph{extended fact transformer}.
An~extended fact transformer takes dataflow fact(s) coming into
the ``thing,'' and it returns an output fact.
It~also returns a decorated graph representing the (possibly
rewritten) ``thing''---that's the \emph{extended} part.
Finally, because rewrites are monadic,
every extended fact transformer is monadic.
%%%% \begin{figure}
%%%% SIMON HAS ASKED IF TYPE SYNONYMS MIGHT IMPROVE THINGS FOR EXTENDED
%%%% FACT TRANSFORMERS. JUDGE FOR YOURSELF.
%%%% FIRST, SOMETHING THAT IS SOMEWHAT READABLE BUT IS NOT LEGAL HASKELL:
%%%% \begin{smallcode}
%%%% type EFFX ipt e x = ipt -> m (DG f n e x, Fact x f)
%%%% -- extended forward fact transformer
%%%%
%%%% node :: forall e x . (ShapeLifter e x)
%%%% => n e x -> EFFX f e x
%%%% block :: forall e x .
%%%% Block n e x -> EFFX f e x
%%%% body :: [Label] -> LabelMap (Block n C C)
%%%% -> EFFX (Fact C f) C C
%%%% graph :: Graph n e x -> EFFX (Fact e f) e x
%%%% \end{smallcode}
%%%% IF WE MAKE IT LEGAL HASKELL, IT BECOMES COMPLETELY HOPELESS:
%%%% \begin{smallcode}
%%%% type EFFX m n f ipt e x = ipt -> m (DG f n e x, Fact x f)
%%%% -- extended forward fact transformer
%%%%
%%%% node :: forall e x . (ShapeLifter e x)
%%%% => n e x -> EFFX m n f f e x
%%%% block :: forall e x .
%%%% Block n e x -> EFFX m n f f e x
%%%% body :: [Label] -> LabelMap (Block n C C)
%%%% -> EFFX m n f (Fact C f) C C
%%%% graph :: Graph n e x -> EFFX m n f (Fact e f) e x
%%%% \end{smallcode}
%%%% \caption{EXPERIMENTS WITH TYPE SYNONYMS}
%%%% \end{figure}
%%%%
The types of the
\ifpagetuning
\else
four
\fi
extended fact transformers are not quite
identical:
\begin{itemize}
\item
Extended fact transformers for nodes and blocks have the same type;
like forward transfer functions,
they expect a fact~@f@ rather than the more general @Fact e f@
required for a graph.
Because a node or a block has
exactly one fact flowing into the entry, it is easiest simply to pass
that fact.
\item
Extended fact transformers for graphs have the most general type,
as expressed using @Fact@:
if the graph is open on entry, its~fact transformer expects a
single fact;
if the graph is closed on entry, its~fact transformer expects a
@FactBase@.
\item
Extended fact transformers for bodies have the same type as
extended fact transformers for closed/closed graphs.
\end{itemize}
Function @arfGraph@ and its four auxiliary functions comprise a cycle of
mutual recursion:
@arfGraph@ calls @graph@;
@graph@ calls @body@ and @block@;
@body@ calls @block@;
@block@ calls @node@;
and
@node@ calls @arfGraph@.
These five functions do three different kinds of work:
compose extended fact transformers, analyze and rewrite nodes, and compute
fixed points.
\subsection{Analyzing blocks and graphs by composing extended fact transformers}
\seclabel{block-impl}
Extended fact transformers compose nicely.
For example, @block@ is implemented thus:
% we need the foralls
\begin{smallcode}
`block :: forall e x .
Block n e x -> f -> m (DG f n e x, Fact x f)
block (BFirst n) = node n
block (BMiddle n) = node n
block (BLast n) = node n
block (BCat b1 b2) = block b1 `cat` block b2
\end{smallcode}
The composition function @cat@ feeds facts from one extended fact
transformer to another, and it splices decorated graphs.
\smallverbatiminput{cat}
% defn cat
% local ft1
% local ft2
(Function @`dgSplice@ is the same splicing function used for an ordinary
@Graph@, but it uses a one-line block-concatenation function suitable
for @DBlock@s.)
The name @cat@ comes from the concatenation of the decorated graphs,
but it is also appropriate because the style in which it is used is
reminiscent of @concatMap@, with the @node@ and @block@ functions
playing the role of @map@.
\seclabel{concat-map-style}
Function @graph@ is much like @block@, but it has more cases.
%%%%
%%%% \begin{itemize}
%%%% \item
%%%% The @arfNode@ function processes nodes (\secref{arf-node}).
%%%% It handles the subtleties of interleaved analysis and rewriting,
%%%% and it deals with fuel consumption. It calls @arfGraph@ to analyze
%%%% and transform rewritten graphs.
%%%% \item
%%%% Based on @arfNode@ it is extremely easy to write @arfBlock@, which lifts
%%%% the analysis and rewriting from nodes to blocks (\secref{arf-block}).
%%%%
%%%%
%%%%
%%%% \item
%%%% Using @arfBlock@ we define @arfBody@, which analyzes and rewrites a
%%%% @Body@: a~group of closed/closed blocks linked by arbitrary
%%%% control flow.
%%%% Because a @Body@ is
%%%% always closed/closed and does not take shape parameters, function
%%%% @arfBody@ is less polymorphic than the others; its type is what
%%%% would be obtained by expanding and specializing the definition of
%%%% @ARF@ for a @thing@ which is always closed/closed and is equivalent to
%%%% a @Body@.
%%%%
%%%% Function @arfBody@ takes care of fixed points (\secref{arf-body}).
%%%% \item
%%%% Based on @arfBody@ it is easy to write @arfGraph@ (\secref{arf-graph}).
%%%% \end{itemize}
%%%% Given these functions, writing the main analyzer is a simple
%%%% matter of matching the external API to the internal functions:
%%%% \begin{code}
%%%% `analyzeAndRewriteFwdBody
%%%% :: forall n f. NonLocal n
%%%% => FwdPass n f -> Body n -> FactBase f
%%%% -> FuelMonad (Body n, FactBase f)
%%%%
%%%% analyzeAndRewriteFwdBody pass ^body facts
%%%% = do { (^rg, _) <- arfBody pass body facts
%%%% ; return (normalizeBody rg) }
%%%% \end{code}
%%%%
%%%% \subsection{From nodes to blocks} \seclabel{arf-block}
%%%% \seclabel{arf-graph}
%%%%
%%%% We begin our explanation with the second task:
%%%% writing @arfBlock@, which analyzes and transforms blocks.
%%%% \begin{code}
%%%% `arfBlock :: NonLocal n => ARF (Block n) n
%%%% arfBlock pass (BUnit node) f
%%%% = arfNode pass node f
%%%% arfBlock pass (BCat b1 b2) f
%%%% = do { (g1,f1) <- arfBlock pass b1 f
%%%% ; (g2,f2) <- arfBlock pass b2 f1
%%%% ; return (g1 `DGCatO` g2, f2) }
%%%% \end{code}
%%%% The code is delightfully simple.
%%%% The @BUnit@ case is implemented by @arfNode@.
%%%% The @BCat@ case is implemented by recursively applying @arfBlock@ to the two
%%%% sub-blocks, threading the output fact from the first as the
%%%% input to the second.
%%%% Each recursive call produces a rewritten graph;
%%%% we concatenate them with @DGCatO@.
%%%%
%%%% Function @arfGraph@ is equally straightforward:
%%%% XXXXXXXXXXXXXXX
%%%% The pattern is the same as for @arfBlock@: thread
%%%% facts through the sequence, and concatenate the results.
%%%% Because the constructors of type~@DG@ are more polymorphic than those
%%%% of @Graph@, type~@DG@ can represent
%%%% graphs more simply than @Graph@; for example, each element of a
%%%% @GMany@ becomes a single @DG@ object, and these @DG@ objects are then
%%%% concatenated to form a single result of type~@DG@.
%%%%
\subsection{Analyzing and rewriting nodes} \seclabel{arf-node}
The @node@ function is where we interleave analysis with rewriting:
\simon{NR: FwdGraphAndTail can be expunged and replaced with a simple pair
type}
\smallfuzzverbatiminput{15.1pt}{node}
% defn ShapeLifter
% defn singletonDG
% defn fwdEntryFact
% defn fwdEntryLabel
% defn ftransfer
% defn frewrite
% local pass'
% local gtail
% omit FwdGraphAndTail :: Graph n e x -> FwdRewrite m n f -> FwdGraphAndTail m n f e x
%
Function @node@ uses @frewrite@ to extract the rewrite function from
@pass@,
and applies the rewrite function to the node~@n@ and the incoming fact~@f@.
The result, @gtail@, is
scrutinized by the @case@ expression.
In the @Nothing@ case, no rewrite takes place.
We~return node~@n@ and its incoming fact~@f@
as the decorated graph @singletonDG f n@.
To produce the outgoing fact, we apply the transfer function
@ftransfer pass@ to @n@~and~@f@.
In the @Just@ case, we receive a replacement
graph~@g@ and a new rewrite function~@tail@.
The function~@tail@ represents the tail of a sequence of rewrite
functions, as described in \secref{rewrite-as-sequence}.
We~use @tail@ to recursively analyze and rewrite @g@ with @arfGraph@.
This analysis uses @pass'@, which contains the original lattice and transfer
function from @pass@, together with~@tail@.
Function @fwdEntryFact@ converts fact~@f@ from the type~@f@,
which @node@ expects, to the type @Fact e f@, which @arfGraph@ expects.
As you see, several functions called in @node@ are overloaded over a
(private) class @ShapeLifter@. Their implementations depend
on the open/closed shape of the node.
By design, the shape of a node is known statically everywhere @node@
is called, so
this use of @ShapeLifter@ is specialized
away by the compiler.
%% And that's it! If~the client wanted deep rewriting, it is
%% implemented by the call to @arfGraph@;
%% if the client wanted
%% shallow rewriting, the rewrite function will have returned
%% @noFwdRw@ as~@rw@, which is implanted in @pass'@
%% (\secref{shallow-vs-deep}).
\subsection{Fixed points} \seclabel{arf-body}
The fourth and final auxiliary function of @arfGraph@ is
@body@, which iterates to a fixed point.
This part of the implementation is the only really tricky part, and it is
cleanly separated from everything else:
\smallfuzzverbatiminput{2.5pt}{bodyfun}
% defn body
% local do_block
% local blocks
% local lattice
% local entryFact
% local entries
% local init_fbase
% local blockmap
% local fb
Function @getFact@ looks up a fact by its label.
If the label is not found,
@getFact@ returns
the bottom element of the lattice:
\begin{smallcode}
`getFact :: DataflowLattice f -> Label -> FactBase f -> f
\end{smallcode}
Function @forwardBlockList@ takes a list of possible entry points and
a finite map from labels to blocks.
It returns a list of
blocks, sorted into an order that makes forward dataflow efficient.\footnote
{The order of the blocks does not affect the fixed point or any other
result; it affects only the number of iterations needed to
reach the fixed point.}
\begin{smallcode}
`forwardBlockList
:: NonLocal n
=> [Label] -> LabelMap (Block n C C) -> [Block n C C]
\end{smallcode}
For
example, if the entry point is at~@L2@, and the block at~@L2@
branches to~@L1@, but not vice versa, then \hoopl\ will reach a fixed point
more quickly if we process @L2@ before~@L1@.
To~find an efficient order, @forwardBlockList@ uses
the methods of the @NonLocal@ class---@entryLabel@ and @successors@---to
perform a reverse postorder depth-first traversal of the control-flow graph.
%%
%%The @NonLocal@ type-class constraint on~@n@ propagates to all the
%%@`arfThing@ functions.
%% paragraph carrying too much freight
%%
The rest of the work is done by @`fixpoint@, which is shared by
both forward and backward analyses:
\smallfuzzverbatiminput{2.5pt}{fptype}
% defn Direction
% defn Fwd
% defn Bwd
Except for the @Direction@ passed as the first argument,
the type signature tells the story.
The third argument is an extended fact transformer for a single block;
@fixpoint@ applies that function successively to each block in the list
passed as the fourth argument.
The result is an extended fact transformer for the list.
The extended fact transformer returned by @fixpoint@
maintains a
``current @FactBase@''
which grows monotonically:
as each block is analyzed,
the block's input fact is taken from
the current @FactBase@,
and the current @FactBase@
is
augmented with the facts that flow out of the block.
%
The initial value of the current @FactBase@ is the input @FactBase@,
and
the extended fact transformer
iterates over the blocks until the current @FactBase@
stops changing.
Implementing @fixpoint@ requires about 90 lines,
formatted narrowly for display in one column.
%%
%% for completeness, it
%% appears in Appendix~\ref{app:fixpoint}.
The~code is mostly straightforward, although we try to be clever
about deciding when a new fact means that another iteration
will be required.
\finalremark{Rest of this \S\ is a candidate to be cut}
There is one more subtle point worth mentioning, which we highlight by
considering a forward analysis of this graph, where execution starts at~@L1@:
\begin{code}
L1: x:=3; goto L4
L2: x:=4; goto L4
L4: if x>3 goto L2 else goto L5
\end{code}
Block @L2@ is unreachable.
But if we \naively\ process all the blocks (say in
order @L1@, @L4@, @L2@), then we will start with the bottom fact for @L2@, propagate
\{@x=4@\} to @L4@, where it will join with \{@x=3@\} to yield
\{@x=@$\top$\}.
Given @x=@$\top$, the
conditional in @L4@ cannot be rewritten, and @L2@~seems reachable. We have
lost a good optimization.
Function @fixpoint@ solves this problem
by analyzing a block only if the block is
reachable from an entry point.
This trick is safe only for a forward analysis, which
is why
@fixpoint@ takes a @Direction@ as its first argument.
%% Although the trick can be implemented in just a couple of lines of
%% code, the reasoning behind it is quite subtle---exactly the sort of
%% thing that should be implemented once in \hoopl, so clients don't have
%% to worry about it.
\subsection{Throttling rewriting using ``optimization fuel''}
\seclabel{vpoiso}
\seclabel{fuel}
\seclabel{whalley-from-s2}
\seclabel{first-debugging-section}
When optimization produces a faulty program,
we use Whalley's \citeyearpar{whalley:isolation} technique to find the fault:
given a program that fails when compiled with optimization,
a binary search on the number of rewrites
finds an~$n$ such that the program works after $n-1$ rewrites
but fails after $n$~rewrites.
The $n$th rewrite is faulty.
As~alluded to at the end of \secref{debugging-introduced}, this
technique enables us to debug complex optimizations by
identifying one single rewrite that is faulty.
This debugging technique requires the~ability to~limit
the number of~rewrites.
We limit rewrites using \emph{optimization fuel}.
Each rewrite consumes one unit of fuel,
and when fuel is exhausted, all rewrite functions return @Nothing@.
To~debug, we do binary search on the amount of fuel.
\simon{Missing is the key point: that the entire fuel story is implementable
by snazzy rewrite functions, and could be done by the client. We
provide the wrapper only for convenience.}
The supply of fuel is encapsulated in the @FuelMonad@ type class (\figref{api-types}),
which must be implemented by the client's monad @m@.
To~ensure that each rewrite consumes one~unit of~fuel,
@mkFRewrite@ wraps the client's rewrite function, which is oblivious to fuel,
in~another function that satisfies the following contract:
\begin{itemize}
\item
If the fuel supply is empty, the wrapped function always returns @Nothing@.
\item
If the wrapped function returns @Just g@, it has the monadic effect of
reducing the fuel supply by one unit.
\end{itemize}
%%%% \seclabel{fuel-monad}
%%%%
%%%% \begin{ntext}
%%%% \subsection{Rewrite functions}
%%%%
%%%%
%%%%
%%%% \begin{code}
%%%% `withFuel :: FuelMonad m => Maybe a -> m (Maybe a)
%%%% \end{code}
%%%%
%%%%
%%%% as expressed by the
%%%% @FwdRew@ type returned by a @FwdRewrite@ (\figref{api-types}).
%%%% The first component of the @FwdRew@ is the replacement graph, as discussed earlier.
%%%% The second component, $\rw$, is a
%%%% \emph{new rewrite function} to use when recursively processing
%%%% the replacement graph.
%%%% For shallow rewriting this new function is
%%%% the constant @Nothing@ function; for deep rewriting it is the original
%%%% rewrite function.
%%%% While @mkFRewrite@ allows for general rewriting, most clients will
%%%% take advantage of \hoopl's support for these two common cases:
%%%% \begin{smallcode}
%%%% `deepFwdRw, `shallowFwdRw
%%%% :: Monad m
%%%% => (forall e x . n e x -> f -> m (Maybe (Graph n e x))
%%%% -> FwdRewrite m n f
%%%% \end{smallcode}
%%%% \end{ntext}
\section {Related work} \seclabel{related}
While there is a vast body of literature on
dataflow analysis and optimization,
relatively little can be found on
the \emph{design} of optimizers, which is the topic of this paper.
We therefore focus on the foundations of dataflow analysis
and on the implementations of some comparable dataflow frameworks.
\paragraph{Foundations}
When transfer functions are monotone and lattices are finite in height,
iterative dataflow analysis converges to a fixed point
\cite{kam-ullman:global-iterative-analysis}.
If~the lattice's join operation distributes over transfer
functions,
this fixed point is equivalent to a join-over-all-paths solution to
the recursive dataflow equations
\cite{kildall:unified-optimization}.\footnote
{Kildall uses meets, not joins.
Lattice orientation is a matter of convention, and conventions have changed.
We use Dana Scott's
orientation, in which higher elements carry more information.}
\citet{kam-ullman:monotone-flow-analysis} generalize to some
monotone functions.
Each~client of \hoopl\ must guarantee monotonicity.
\citet{cousot:abstract-interpretation:1977,cousot:systematic-analysis-frameworks}
introduce abstract interpretation as a technique for developing
lattices for program analysis.
\citet{steffen:data-flow-analysis-model-checking:1991} shows that
a dataflow analysis can be implemented using model checking;
\citet{schmidt:data-flow-analysis-model-checking}
expands on this~result by showing that
an all-paths dataflow problem can be viewed as model checking an
abstract interpretation.
\citet{marlowe-ryder:properties-data-flow-frameworks}
present a survey of different methods for performing dataflow analyses,
with emphasis on theoretical results.
\citet{muchnick:compiler-implementation}
presents many examples of both particular analyses and related
algorithms.
The soundness of interleaving analysis and transformation,
even when not all speculative transformations are performed on later
iterations, is shown by
\citet{lerner-grove-chambers:2002}.
\paragraph{Frameworks}
Most dataflow frameworks support only analysis, not transformation.
The framework computes a fixed point of transfer functions, and it is
up to the client of
the framework to use that fixed point for transformation.
Omitting transformation makes it much easier to build frameworks,
and one can find a spectrum of designs.
We~describe two representative
designs, then move on to frameworks that do interleave
analysis and transformation.
The Soot framework is designed for analysis of Java programs
\cite{hendren:soot:2000}.
%% {This citation is probably the
%% best for Soot in general, but there doesn't appear
%% to be any formal publication that actually details the dataflow
%% framework part. ---JD}
While Soot's dataflow library supports only analysis, not
transformation, we found much
to admire in its design.
Soot's library is abstracted over the representation of
the control-flow graph and the representation of instructions.
Soot's interface for defining lattice and analysis functions is
like our own,
although because Soot is implemented in an imperative style,
additional functions are needed to copy lattice elements.
The CIL toolkit \cite{necula:cil:2002}
%% \finalremark{No good citation
%% for same reason as Soot above ---JD}
supports both analysis and rewriting of C~programs,
but rewriting is clearly distinct from analysis:
one runs an analysis to completion and then rewrites based on the
results.
The framework is limited to one representation of control-flow graphs
and one representation of instructions, both of which are provided by
the framework.
The~API is complicated;
much of the complexity is needed to enable the client to
affect which instructions
the analysis iterates over.
%% \finalremark{FYI, LLVM has Pass Managers that try to control the
%% order of passes,
%% but I'll be darned if I can find anything that might be termed a
%% dataflow framework.}
The Whirlwind compiler contains the dataflow framework implemented
by \citet{lerner-grove-chambers:2002}, who were the first to
interleave analysis and transformation.
Their implementation is much like our early efforts:
it is a complicated mix of code that simultaneously manages interleaving,
deep rewriting, and fixed-point computation.
By~separating these tasks,
our implementation simplifies the problem dramatically.
Whirlwind's implementation also suffers from the difficulty of
maintaining pointer invariants in a mutable representation of
control-flow graphs, a problem we have discussed elsewhere
\cite{ramsey-dias:applicative-flow-graph}.
Because speculative transformation is difficult in an imperative setting,
Whirlwind's implementation is split into two phases.
The first phase runs the interleaved analyses and transformations
to compute the final dataflow facts and a representation of the transformations
that should be applied to the input graph.
The second phase executes the transformations.
In~\hoopl, because control-flow graphs are immutable, speculative transformations
can be applied immediately, and there is no need
for a phase distinction.
%%% % repetitious...
%%%
%%% \ourlib\ also improves upon Whirlwind's dataflow framework by providing
%%% new support for the optimization writer:
%%% \begin{itemize}
%%% \item Using static type guarantees, \hoopl\ rules out a whole
%%% class of possible bugs: transformations that produced malformed
%%% control-flow graphs.
%%% \item Using dynamic testing,
%%% we can isolate the rewrite that transforms a working program
%%% into a faulty program,
%%% using Whalley's \citeyearpar{whalley:isolation} fault-isolation technique.
%%% \end{itemize}
%% what follows now looks redundant with discussion below ---NR
%% In previous work \cite{ramsey-dias:applicative-flow-graph}, we
%% described a zipper-based representation of control-flow
%% graphs, stressing the advantages
%% of immutability.
%% Our new representation, described in \secref{graph-rep}, is a significant improvement:
%% \begin{itemize}
%% \item
%% We can concatenate nodes, blocks, and graphs in constant time.
%% %Previously, we had to resort to Hughes's
%% %\citeyearpar{hughes:lists-representation:article} technique, representing
%% %a graph as a function.
%% \item
%% We can do a backward analysis without having
%% to ``unzip'' (and allocate a copy of) each block.
%% \item
%% Using GADTs, we can represent a flow-graph
%% node using a single type, instead of the triple of first, middle, and
%% last types used in our earlier representation.
%% This change simplifies the interface significantly:
%% instead of providing three transfer functions and three rewrite
%% functions per pass---one for
%% each type of node---a client of \hoopl\ provides only one transfer
%% function and one rewrite function per pass.
%% \item
%% Errors in concatenation are ruled out at
%% compile-compile time by Haskell's static
%% type system.
%% In~earlier implementations, such errors were not detected until
%% the compiler~ran, at which point we tried to compensate
%% for the errors---but
%% the compensation code harbored subtle faults,
%% which we discovered while developing a new back end
%% for the Glasgow Haskell Compiler.
%% \end{itemize}
%%
%% The implementation of \ourlib\ is also much better than
%% our earlier implementations.
%% Not only is the code simpler conceptually,
%% but it is also shorter:
%% our new implementation is about a third as long
%% as the previous version, which is part of GHC, version 6.12.
\section{Performance considerations}
Our work on \hoopl\ is too new for us to be able to say much
about performance.
It's~important to know how well \hoopl\ performs, but the
question is comparative, and there isn't another library we can compare
\hoopl\ with.
For example, \hoopl\ is not a drop-in replacement for an existing
component of GHC; we introduced \hoopl\ to GHC as part of a
major refactoring of GHC's back end.
With \hoopl, GHC seems about 15\%~slower than
the previous~GHC, and we
don't know what portion of the slowdown can be attributed to the
optimizer.
%
We~can say that the costs of using \hoopl\ seem reasonable;
there is no ``big performance hit.''
And~a somewhat similar library, written in an \emph{impure} functional
language, actually improved performance in an apples-to-apples
comparison with a library using a mutable control-flow graph
\cite{ramsey-dias:applicative-flow-graph}.
Although thorough evaluation of \hoopl's performance must await
future work, we can identify some design decisions that might affect
performance. \simon{I think it's highly speculative whether these issues
are anywhere near the high-order bit of performance. Candidate for cutting
if you ask me.}
\begin{itemize}
\item
In \figref{graph}, we show a single concatenation operator for blocks.
Using this representation, a block of $N$~nodes is represented using
$2N-1$ heap objects.
We~have also implemented a representation of blocks that include
``cons-like'' and ``snoc-like'' constructors;
this representation requires only $N+1$ heap objects.
We~don't know what difference this choice makes to performance.
\item
In \secref{engine}, the @body@ function analyzes and (speculatively)
rewrites the body of a control-flow graph, and @fixpoint@ iterates
this analysis until it reaches a fixed point.
Decorated graphs computed on earlier iterations are thrown away.
But~for each decorated graph of $N$~nodes, it is necessary to allocate
at least $2N-1$~thunks, which correspond to applications of
@singletonDG@ in~@node@ and of @dgSplice@ in~@cat@.
In~an earlier version of \hoopl, this overhead was
eliminated by splitting @arfGraph@ into two very similar functions: one to compute the
fixed point, and the other to produce the rewritten graph.
Having a single version of @arfGraph@ is simpler and easier
to maintain, but we don't know the cost of allocating the
additional thunks.
\item
The representation of a forward-transfer function is private to
\hoopl.
Two representations are possible:
we may store a triple of functions, one for each shape a node may
have;
or we may store a single, polymorphic function.
If~we use triples throughout, the costs are straightforward, but the
code is complex.
If~we use a single, polymorphic function, we sometimes have to use a
\emph{shape classifier} (supplied by the client) when composing
transfer functions.
Using a shape classifier may introduce extra @case@ discriminations
every time a transfer function or rewrite function is applied to a
node.
We~don't know how these extra discriminations might affect
performance.
\end{itemize}
In summary, \hoopl\ performs well enough for use in~GHC,
but there is much we don't know. Systematic investigation
is indicated.
\remark{We have no evidence that any of the above actually affects performance}
\section{Discussion}
We built \hoopl\ in order to combine three good ideas (interleaved
analysis and transformation, optimization fuel, and an applicative
control-flow graph) in a way that could easily be reused by many
compiler writers.
To~evaluate how well we succeeded, we examine how \hoopl\ has been
used,
we~examine the API, and we examine the implementation.
We~also sketch some more alternatives.
\paragraph{Using \hoopl}
As~suggested by the constant-propagation example in
\figref{const-prop}, \hoopl\ makes it easy to implement many standard
dataflow analyses.
Students using \hoopl\ in a class at Tufts were able to implement
such optimizations as lazy code motion \cite{knoop:lazy-code-motion}
and induction-variable elimination
\cite{cocke-kennedy:operator-strength-reduction} in just a few weeks.
Students at Yale and at Portland State have also succeeded in building
a variety of optimizations.
\hoopl's graphs can support optimizations beyond classic
dataflow.
For example, in~GHC, \hoopl's graphs are used
to implement optimizations based on control flow,
such as eliminating branch chains.
\hoopl\ is SSA-neutral:
although we know of no attempt to use
\hoopl\ to establish or enforce SSA~invariants,
\hoopl\ makes it easy to include $\phi$-functions in the
representation of first nodes,
and if a transformation preserves SSA~invariants, it will continue to do
so when implemented in \hoopl.
\paragraph{Examining the API}
We hope that our presentation of the API in \secref{api} speaks for
itself,
but there are a couple of properties we think are worth highlighting.
First, it's a good sign that the API provides many higher-order
combinators that make it easier to write client code. % with simple, expressive types.
We~have had space to mention only a few:
@extendJoinDomain@,
@joinMaps@,
@thenFwdRw@, @iterFwdRw@, @deepFwdRw@, and @pairFwd@.
Second,
the static encoding of open and closed shapes at compile time worked
out well.
% especially because it applies equally to nodes, blocks, and graphs.
Shapes may
seem like a small refinement, but they helped eliminate a number of
bugs from GHC, and we expect them to help other clients too.
GADTs are a convenient way to express shapes, and for clients
written in Haskell, they are clearly appropriate.
If~one wished to port \hoopl\ to a language without GADTs,
many of the benefits could be realized by making the shapes phantom
types, but without GADTs, pattern matching would be significantly more
tedious and error-prone.
% An~advantage of our ``shapely'' node API is that a client can
% write a \emph{single} transfer function that is polymorphic in shape.
% To~make this design work, however, we \emph{must} have
% the type-level function
% @Fact@ (\figref{api-types}), to express how incoming
% and outgoing facts depend on the shape of a node.
% Without type-level functions, we would have had to force clients to
% use \emph{only} the triple-of-functions interface described in
% \secref{triple-of-functions}.
\paragraph{Examining the implementation}
If you are thinking of adopting \hoopl, you should consider not
only whether you like the API, but whether, in case of emergency, you
could maintain the implementation.
We~believe that \secref{dfengine} sketches enough to show that \hoopl's
implementation is a clear improvement over previous implementations
of similar ideas.
% The implementation is more difficult to evaluate than the~API.
% Previous implementations of similar ideas have rolled the problems
% into a big ball of mud.
By~decomposing our implementation into @node@, @block@, @body@,
@graph@, @cat@, @fixpoint@, and @mkFRewrite@, we have clearly separated
multiple concerns:
interleaving analysis with rewriting,
throttling rewriting using optimization fuel,
and
computing a fixed point using speculative rewriting.
Because of this separation of concerns,
we believe our implementation will be much easier to maintain than
anything that preceded~it.
%% Another good sign is that we have been able to make substantial
%% changes in the implementation without changing the API.
%% For example, in addition to ``@concatMap@ style,'' we have also
%% implemented @arfGraph@ in ``fold style'' and in continuation-passing
%% style.
%% Which style is preferred is a matter of taste, and possibly
%% a matter of performance.
\iffalse
(We have also implemented a ``fold style,'' but because the types are
a little less intuitive, we are sticking with @concatMap@ style for now.)
> Some numbers, I have used it nine times, and would need the general fold
> once to define blockToNodeList (or CB* equivalent suggested by you).
> (We are using it in GHC to
> - computing hash of the blocks from the nodes
> - finding the last node of a block
> - converting block to the old representation (2x)
> - computing interference graph
> - counting Area used by a block (2x)
> - counting stack high-water mark for a block
> - prettyprinting block)
type-parameter hell, newtype hell, typechecking hell, instance hell,
triple hell
\fi
% We have spent six years implementing and reimplementing frameworks for
% dataflow analysis and transformation.
% This formidable design problem taught us
% two kinds of lessons:
% we learned some very specific lessons about representations and
% algorithms for optimizing compilers,
% and we were forcibly reminded of some very general, old lessons that are well
% known not just to functional programmers, but to programmers
% everywhere.
%%%% \remark{Orphaned: but for transfer functions that
%%%% approximate weakest preconditions or strongest postconditions,
%%%% monotonicity falls out naturally.}
%%%%
%%%%
%%%% In conclusion we offer the following lessons from the experience of designing
%%%% and implementing \ourlib{}.
%%%% \begin{itemize}
%%%% \item
%%%% Although we have not stressed this point, there is a close connection
%%%% between dataflow analyses and program logic:
%%%% \begin{itemize}
%%%% \item
%%%% A forward dataflow analysis is represented by a predicate transformer
%%%% that is related to \emph{strongest postconditions}
%%%% \cite{floyd:meaning}.\footnote
%%%% {In Floyd's paper the output of the predicate transformer is called
%%%% the \emph{strongest verifiable consequent}, not the ``strongest
%%%% postcondition.''}
%%%% \item
%%%% A backward dataflow analysis is represented by a predicate transformer
%%%% that is related to \emph{weakest preconditions} \cite{dijkstra:discipline}.
%%%% \end{itemize}
%%%% Logicians write down the predicate transformers for the primitive
%%%% program fragments, and then use compositional rules to ``lift'' them
%%%% to a logic for whole programs. In the same way \ourlib{} lets the client
%%%% write simple predicate transformers,
%%%% and local rewrites based on those assertions, and ``lifts'' them to entire
%%%% function bodies with arbitrary control flow.
\iffalse
Reuse requires abstraction, and as is well known,
designing good abstractions is challenging.
\hoopl's data types and the functions over those types have been
through {dozens} of revisions.
\remark{dozens alert}
As~we were refining our design, we~found it invaluable to operate in
two modes:
In the first mode, we designed, built, and used a framework as an
important component of a real compiler (first Quick~{\PAL}, then GHC).
In the second mode, we designed and built a standalone library, then
redesigned and rebuilt it, sometimes going through several significant
changes in a week.
Operating in the first mode---inside a live compiler---forced us to
make sure that no corners were cut, that we were solving a real
problem, and that we did not inadvertently
cripple some other part of the compiler.
Operating in the second mode---as a standalone library---enabled us to
iterate furiously, trying out many more ideas than would have
been possible in the first mode.
Alternating between these two modes has led to a
better design than operating in either mode alone.
%% We were forcibly reminded of timeless truths:
It is a truth universally acknowledged that
interfaces are more important than implementations and data
is more important than code.
This truth is reflected in this paper, in which
we
have given \hoopl's API three times as much space as \hoopl's implementation.
We have evaluate \hoopl's API through small, focused classroom
projects and by using \hoopl\ in the back end of the Glasgow Haskell
Compiler.
We were also reminded that Haskell's type system (polymorphism, GADTs,
higher-order functions, type classes, and so on) is a remarkably
effective
language for thinking about data and code---and that
Haskell lacks a language of interfaces (like ML's signatures) that
would make it equally effective for thinking about APIs at a larger scale.
Still, as usual, the types were a remarkable aid to writing the code:
when we finally agreed on the types presented above, the
code almost wrote itself.
Types are widely appreciated at ICFP, but here are three specific
examples of how types helped us:
\begin{itemize}
\item
Reuse is enabled by representation-independence, which in a functional
language is
expressed through parametric polymorphism.
Making \ourlib{} polymorphic in the nodes
made the code simpler, easier to understand, and easier to maintain.
In particular, it forced us to make explicit \emph{exactly} what
\ourlib\ must know about nodes, and to embody that knowledge in the
@NonLocal@ type class (\secref{nonlocal-class}).
\item
\remark{too much? Simon: better?}
%
% this paper is just not about run-time performance ---NR
%
%%%% Moreover, the implementation is faster than it would otherwise be,
%%%% because, say, a @(Fact O f)e@ is known to be just an @f@ rather than
%%%% being a sum type that must be tested (with a statically known outcome!).
%
Giving the \emph{same} shapes
to nodes, blocks, and graphs helped our
thinking and helped to structure the implementation.
\item
\end{itemize}
\fi
\paragraph{More alternative interfaces and implementations}
Why do we allow the client to define the monad~@m@ used in rewrite
functions and in @analyzeAndRewriteFwdBody@?
\remark{OK, I no longer find this~\P shitty\ldots}
The~obvious alternative, which we have implemented and explored, is to require
\hoopl's clients to use a monad provided by \hoopl.
This alternative has advantages:
because \hoopl{} implements
@checkpoint@, @restart@,
@setFuel@, and @getFuel@,
we can ensure they are right, and
that the client cannot possibly misuse them.
The downside is that a rewrite function can \emph{only} use
\hoopl{}-provided monadic actions. Clearly this monad must be
able to supply fresh labels (for new blocks), but what if
% In~this alternative API, \hoopl\ also provides a supply of unique names.
%
% But~we are wary of mandating this abstraction;
% unique names affect many parts of a compiler,
% and
% no~matter how well designed the~API,
% if it does not play well with existing code,
% it won't be used.
%
% Moreover, experience has shown us that for the client, the convenience
% of a custom monad is well worth the cognitive costs of understanding
% the more complex API and the costs of meeting the contracts for
% @FuelMonad@ and @CkpointMonad@.
% As~a very simple example,
a client wanted one set of
unique names for labels and a different set for registers?
Moreover, a client might want a rewrite to perform other actions
entirely that could not be anticipated by \hoopl{}.
For example, in order to judge the effectiveness of an optimization,
a client might want to log how many rewrites take place, or in what
functions they take place.
As~a more ambitious example, \citet{runciman:increasing-prs} describes
Primitive Redex
Speculation, a code-improving transformation that can create new
function definitions.
A~\hoopl\ client implementing this transformation would define a monad
that could accumulate new definitions.
The law governing @checkpoint@ and @restart@
would ensure that a speculative rewrite, if later rolled back, would not
create a function definition (or a log entry).
Another merit of a user-defined monad~@m@ is that,
if~a user wants to manage optimization fuel differently,
he or she can make~@m@ an instance of @FuelMonad@ in which the fuel
supply is infinite.
The user is then free to create a new fuel supply in~@m@ and to wrap
rewrite functions---or not---so as to consume fuel in the new supply.
This freedom can used to implement more exotic uses of fuel;
for~example, a~user might find it convenient if it were possible to
instantiate a compiler temporary as a real hardware register without
consuming fuel.
%% \simon{These next two paras are incomprehensible. Cut?}
%% Of the many varied implementations we have tried,
%% we have space only to raise a few questions, with even fewer answers.
%% %
%% An~earlier implementation of @fixpoint@ stored the
%% ``current'' @FactBase@ in a monad; we~find it easier to understand and
%% maintain the code that passes the current @FactBase@ as an argument.
%% Among @concatMap@ style, fold style, and continuation-passing style,
%% which is best?
%% No~one of these styles makes all the code easiest to read
%% and understand: @concatMap@ style is relatively straightforward throughout;
%% fold~style is similar overall but different in detail;
%% and continuation-passing style is clear and elegant to those who
%% like continuations, but baffling to those who don't.
Which value constructors should be\simon{still incomprehensible. cut?}
polymorphic in the shapes of their arguments, and which should be
monomorphic?
We~experimented with a polymorphic
\begin{code}
`BNode :: n e x -> Block n e x
\end{code}
but we found that there are significant advantages to knowing the type
of every node statically, using purely local information---so instead
we use
the three monomorphic constructors @BFirst@, @BMiddle@, and @BLast@
(\figref{graph}).
Similar questions arise about the polymorphic @BCat@ and about the
graph constructors, and even among ourselves, we are divided about how
best to answer them.
Yet another question is whether it is worthwhile to save a level of
indirection by providing a cons-like constructor to concatenate a node
and a block.
Perhaps some of these questions can be answered by appealing to
performance, but the experiments that will provide the answers have
yet to be devised.
\paragraph{Final remarks}
Dataflow optimization is usually described as a way to improve imperative
programs by mutating control-flow graphs.
Such transformations appear very different from the tree rewriting
that functional languages are so well known for and which makes
\ifhaskellworkshop
Haskell
\else
functional languages
\fi
so attractive for writing other parts of compilers.
But even though dataflow optimization looks very different from
what we are used to,
writing a dataflow optimizer
in
\ifhaskellworkshop
Haskell
\else
a pure functional language
\fi
was a win:
% We could not possibly have conceived \ourlib{} in C++.
we had to make every input and output explicit,
and we had a strong incentive to implement things compositionally.
Using Haskell helped us make real improvements in the implementation of
some very sophisticated ideas.
% %%
% %%
% %% In~a pure functional language, not only do we know that
% %% no data structure will be unexpectedly mutated,
% %% but we are forced to be
% %% explicit about every input and output,
% %% and we are encouraged to implement things compositionally.
% %% This kind of thinking has helped us make
% %% significant improvements to the already tricky work of Lerner, Grove,
% %% and Chambers:
% %% per-function control of shallow vs deep rewriting
% %% (\secref{shallow-vs-deep}),
% %% optimization fuel (\secref{fuel}),
% %% and transparent management of unreachable blocks (\secref{arf-body}).
% We~trust that the improvements are right only because they are
% implemented in separate
% parts of the code that cannot interact except through
% explicit function calls.
% %% %%
% %% %%An ancestor of \ourlib{} is in the Glasgow Haskell Compiler today,
% %% %%in version~6.12.
% %% With this new, improved design in hand, we are now moving back to
% %% live-compiler mode, pushing \hoopl\ into version
% %% 6.13 of the Glasgow Haskell Compiler.
\acks
Brian Huffman and Graham Hutton helped with algebraic laws.
Sukyoung Ryu told us about Primitive Redex Speculation.
Several anonymous reviewers helped improve the presentation.
% , especially reviewer~C, who suggested better language with which to
% describe our work.
The first and second authors were funded
by a grant from Intel Corporation and
by NSF awards CCF-0838899 and CCF-0311482.
These authors also thank Microsoft Research Ltd, UK, for funding
extended visits to the third author.
\makeatother
\providecommand\includeftpref{\relax} %% total bafflement -- workaround
\IfFileExists{nrbib.tex}{\bibliography{cs,ramsey}}{\bibliography{dfopt}}
\bibliographystyle{plainnatx}
\clearpage
\appendix
% don't omit LabelSet :: *
% don't omit delFromFactBase :: FactBase f -> [(Label,f)] -> FactBase f
% don't omit elemFactBase :: Label -> FactBase f -> Bool
% don't omit elemLabelSet :: Label -> LabelSet -> Bool
% don't omit emptyLabelSet :: LabelSet
% don't omit factBaseLabels :: FactBase f -> [Label]
% don't omit extendFactBase :: FactBase f -> Label -> f -> FactBase f
% don't omit extendLabelSet :: LabelSet -> Label -> LabelSet
% don't omit lookupFact :: FactBase f -> Label -> Maybe f
% don't omit factBaseList :: FactBase f -> [(Label, f)]
%% \section{Code for \textmd{\texttt{fixpoint}}}
%% \label{app:fixpoint}
%%
%% {\def\baselinestretch{0.95}\hfuzz=20pt
%% \begin{smallcode}
%% data `TxFactBase n f
%% = `TxFB { `tfb_fbase :: FactBase f
%% , `tfb_rg :: DG n f C C -- Transformed blocks
%% , `tfb_cha :: ChangeFlag
%% , `tfb_lbls :: LabelSet }
%% -- Set the tfb_cha flag iff
%% -- (a) the fact in tfb_fbase for or a block L changes
%% -- (b) L is in tfb_lbls.
%% -- The tfb_lbls are all Labels of the *original*
%% -- (not transformed) blocks
%%
%% `updateFact :: DataflowLattice f -> LabelSet -> (Label, f)
%% -> (ChangeFlag, FactBase f)
%% -> (ChangeFlag, FactBase f)
%% updateFact ^lat ^lbls (lbl, ^new_fact) (^cha, fbase)
%% | NoChange <- ^cha2 = (cha, fbase)
%% | lbl `elemLabelSet` lbls = (SomeChange, new_fbase)
%% | otherwise = (cha, new_fbase)
%% where
%% (cha2, ^res_fact)
%% = case lookupFact fbase lbl of
%% Nothing -> (SomeChange, new_fact)
%% Just ^old_fact -> fact_extend lat old_fact new_fact
%% ^new_fbase = extendFactBase fbase lbl res_fact
%%
%% fixpoint :: forall n f. NonLocal n
%% => Bool -- Going forwards?
%% -> DataflowLattice f
%% -> (Block n C C -> FactBase f
%% -> FuelMonad (DG n f C C, FactBase f))
%% -> FactBase f -> [(Label, Block n C C)]
%% -> FuelMonad (DG n f C C, FactBase f)
%% fixpoint ^is_fwd lat ^do_block ^init_fbase ^blocks
%% = do { ^fuel <- getFuel
%% ; ^tx_fb <- loop fuel init_fbase
%% ; return (tfb_rg tx_fb,
%% tfb_fbase tx_fb `delFromFactBase` blocks) }
%% -- The outgoing FactBase contains facts only for
%% -- Labels *not* in the blocks of the graph
%% where
%% `tx_blocks :: [(Label, Block n C C)]
%% -> TxFactBase n f -> FuelMonad (TxFactBase n f)
%% tx_blocks [] tx_fb = return tx_fb
%% tx_blocks ((lbl,blk):bs) tx_fb = tx_block lbl blk tx_fb
%% >>= tx_blocks bs
%%
%% `tx_block :: Label -> Block n C C
%% -> TxFactBase n f -> FuelMonad (TxFactBase n f)
%% tx_block ^lbl ^blk tx_fb@(TxFB { tfb_fbase = fbase
%% , tfb_lbls = lbls
%% , tfb_rg = ^blks
%% , tfb_cha = cha })
%% | is_fwd && not (lbl `elemFactBase` fbase)
%% = return tx_fb -- Note [Unreachable blocks]
%% | otherwise
%% = do { (rg, ^out_facts) <- do_block blk fbase
%% ; let (^cha', ^fbase')
%% = foldr (updateFact lat lbls) (cha,fbase)
%% (factBaseList out_facts)
%% ; return (TxFB { tfb_lbls = extendLabelSet lbls lbl
%% , tfb_rg = rg `DGCatC` blks
%% , tfb_fbase = fbase'
%% , tfb_cha = cha' }) }
%%
%% loop :: Fuel -> FactBase f -> FuelMonad (TxFactBase n f)
%% `loop fuel fbase
%% = do { let ^init_tx_fb = TxFB { tfb_fbase = fbase
%% , tfb_cha = NoChange
%% , tfb_rg = DGNil
%% , tfb_lbls = emptyLabelSet}
%% ; tx_fb <- tx_blocks blocks init_tx_fb
%% ; case tfb_cha tx_fb of
%% NoChange -> return tx_fb
%% SomeChange -> setFuel fuel >>
%% loop fuel (tfb_fbase tx_fb) }
%% \end{smallcode}
%% \par
%% } % end \baselinestretch
\section{Index of defined identifiers}
This appendix lists every nontrivial identifier used in the body of
the paper.
For each identifier, we list the page on which that identifier is
defined or discussed---or when appropriate, the figure (with line
number where possible).
For those few identifiers not defined or discussed in text, we give
the type signature and the page on which the identifier is first
referred to.
Some identifiers used in the text are defined in the Haskell Prelude;
for those readers less familiar with Haskell (possible even at the
Haskell Symposium!), these identifiers are
listed in Appendix~\ref{sec:prelude}.
\newcommand\dropit[3][]{}
\newcommand\hsprelude[2]{\noindent
\texttt{#1} defined in the Haskell Prelude\\}
\let\hsprelude\dropit
\newcommand\hspagedef[3][]{\noindent
\texttt{#2} defined on page~\pageref{#3}.\\}
\newcommand\omithspagedef[3][]{\noindent
\texttt{#2} not shown (but see page~\pageref{#3}).\\}
\newcommand\omithsfigdef[3][]{\noindent
\texttt{#2} not shown (but see Figure~\ref{#3} on page~\pageref{#3}).\\}
\newcommand\hsfigdef[3][]{%
\noindent
\ifx!#1!%
\texttt{#2} defined in Figure~\ref{#3} on page~\pageref{#3}.\\
\else
\texttt{#2} defined on \lineref{#1} of Figure~\ref{#3} on page~\pageref{#3}.\\
\fi
}
\newcommand\hstabdef[3][]{%
\noindent
\ifx!#1!
\texttt{#2} defined in Table~\ref{#3} on page~\pageref{#3}.\\
\else
\texttt{#2} defined on \lineref{#1} of Table~\ref{#3} on page~\pageref{#3}.\\
\fi
}
\newcommand\hspagedefll[3][]{\noindent
\texttt{#2} {let}- or $\lambda$-bound on page~\pageref{#3}.\\}
\newcommand\hsfigdefll[3][]{%
\noindent
\ifx!#1!%
\texttt{#2} {let}- or $\lambda$-bound in Figure~\ref{#3} on page~\pageref{#3}.\\
\else
\texttt{#2} {let}- or $\lambda$-bound on \lineref{#1} of Figure~\ref{#3} on page~\pageref{#3}.\\
\fi
}
\newcommand\nothspagedef[3][]{\notdefd\ndpage{#1}{#2}{#3}}
\newcommand\nothsfigdef[3][]{\notdefd\ndfig{#1}{#2}{#3}}
\newcommand\nothslinedef[3][]{\notdefd\ndline{#1}{#2}{#3}}
\newcommand\ndpage[3]{\texttt{#2}~(p\pageref{#3})}
\newcommand\ndfig[3]{\texttt{#2}~(Fig~\ref{#3},~p\pageref{#3})}
\newcommand\ndline[3]{%
\ifx!#1!%
\ndfig{#1}{#2}{#3}%
\else
\texttt{#2}~(Fig~\ref{#3}, line~\lineref{#1}, p\pageref{#3})%
\fi
}
\newif\ifundefinedsection\undefinedsectionfalse
\newcommand\notdefd[4]{%
\ifundefinedsection
, #1{#2}{#3}{#4}%
\else
\undefinedsectiontrue
\par
\section{Undefined identifiers}
#1{#2}{#3}{#4}%
\fi
}
\begingroup
\raggedright
\input{defuse}%
\ifundefinedsection.\fi
\undefinedsectionfalse
\renewcommand\hsprelude[2]{\noindent
\ifundefinedsection
, \texttt{#1}%
\else
\undefinedsectiontrue
\par
\section{Identifiers defined in Haskell Prelude or a standard library}\label{sec:prelude}
\texttt{#1}%
\fi
}
\let\hspagedef\dropit
\let\omithspagedef\dropit
\let\omithsfigdef\dropit
\let\hsfigdef\dropit
\let\hstabdef\dropit
\let\hspagedefll\dropit
\let\hsfigdefll\dropit
\let\nothspagedef\dropit
\let\nothsfigdef\dropit
\let\nothslinedef\dropit
\input{defuse}
\ifundefinedsection.\fi
\endgroup
\iffalse
\section{Dataflow-engine functions}
\begin{figure*}
\setcounter{codeline}{0}
\begin{numberedcode}
\end{numberedcode}
\caption{The forward iterator}
\end{figure*}
\begin{figure*}
\setcounter{codeline}{0}
\begin{numberedcode}
\end{numberedcode}
\caption{The forward actualizer}
\end{figure*}
\fi
\end{document}
THE FUEL PROBLEM:
Here is the problem:
A graph has an entry sequence, a body, and an exit sequence.
Correctly computing facts on and flowing out of the body requires
iteration; computation on the entry and exit sequences do not, since
each is connected to the body by exactly one flow edge.
The problem is to provide the correct fuel supply to the combined
analysis/rewrite (iterator) functions, so that speculative rewriting
is limited by the fuel supply.
I will number iterations from 1 and name the fuel supplies as
follows:
f_pre fuel remaining before analysis/rewriting starts
f_0 fuel remaining after analysis/rewriting of the entry sequence
f_i, i>0 fuel remaining after iteration i of the body
f_post fuel remaining after analysis/rewriting of the exit sequence
The issue here is that only the last iteration of the body 'counts'.
To formalize, I will name fuel consumed:
C_pre fuel consumed by speculative rewrites in entry sequence
C_i fuel consumed by speculative rewrites in iteration i of body
C_post fuel consumed by speculative rewrites in exit sequence
These quantities should be related as follows:
f_0 = f_pre - C_pref
f_i = f_0 - C_i where i > 0
f_post = f_n - C_post where iteration converges after n steps
When the fuel supply is passed explicitly as parameter and result, it
is fairly easy to see how to keep reusing f_0 at every iteration, then
extract f_n for use before the exit sequence. It is not obvious to me
how to do it cleanly using the fuel monad.
Norman