\iffalse

Say something about the proliferation of heavyweight type signatures
required for GADT pattern matches. When the signature is three times
the size of the function, something is wrong...

I was thinking again about unwrapped nodes, cons-lists, snoc-lists,
and tree fringe. I think there's an analogy with ordinary lists, and
the analogy is 'concatMap' vs 'fold'. Just as with lists, the common
unwrapping of elements. We nevertheless prefer 'concatMap' because it
provides better separation of concerns. But this analogy suggests
several ideas:

- Can block processing be written in higher-order fashion?

- Can it be done in both styles (concatMap *and* fold)?

- Once we have framed the problem in these terms, can we write
  fold-style code that is not too hard to understand and maintain?

- Finally, is there an analog of stream fusion that would work for
  control-flow graphs, enabling us to write some graph combinators
  that are both perspicuous and efficient?

These are good observations for the paper and for future work.

----------------------------------------------------------------

P.S. The three of us should have a nice little Skype chat about
higher-rank types. A lot of the questions still at issue boil down to
the following observations:

- It's really convenient for the *implementation* of Hoopl to put
  forall to the left of an arrow. Polymorphic functions as
  arguments are powerful and keen, and they make it really easy for
  Hoopl to do its job.

- It's a bit inconvenient for a *client* of Hoopl to be forced to
  supply functions that are polymorphic in shape. All constructors
  have to be written out by hand; defaulting is not possible. (Or
  at least John and I haven't figured out how to do it.)

- Using higher-rank types in Hoopl's interface makes some very
  desirable things downright impossible:

  - One can't write a generic dominator analysis with type

      dom :: Edges n => FwdPass n Dominators

  - One can't write a generic debugging wrapper of type

      debug :: (Show n, Show f)
            => TraceFn -> WhatToTrace -> FwdPass n f -> FwdPass n f

  - One can't write a combinator of type

      product :: FwdPass n f -> FwdPass n f' -> FwdPass n (f, f')

I submit that these things are all very desirable.

I'm therefore strongly in favor of removing the *requirement* that a
FwdPass include transfer and rewrite functions that are polymorphic in
shape. It will be a bit tedious to return to triples of functions,
but for those lucky clients who *can* write polymorphic functions and
who wish to, we can provide injection functions.

I should stress that I believe the tedium can be contained within
reasonable bounds---for example, the arfXXX functions that are
internal to Hoopl all remain polymorphic in shape (although not
higher-rank any longer).

\fi

%
% TODO
%
%
% AGraph = graph under construction
% Graph = replacement graph
%
%%% Hoopl assumes that the replacement graph for a node open at the exit
%%% doesn't contain any additional exits
%%%
%%% introduce replacement graph in section 3, after graph.
%%% it has arbitrarily complicated internal control flow, but only
%%% one exit (if open at the exit)
%%%
%%% a requirement on the client that is not checked statically.

\newif\ifnoauthornotes \noauthornotesfalse
% \timestampfalse % it's post-submission time
\genkillfalse
\newif\ifnotesinmargin \notesinmarginfalse
% higher-order optimization library
% ('Hoople' was taken -- see hoople.org)
\let\hoopl\ourlib
% l2h substitution ourlib Hoopl
% l2h substitution hoopl Hoopl
\makeatletter
\let\c@table=
}
\else
\fi
\else
\fi
}
}

\noindent
%{\small\textit{\def\rmdefault{cmr}\rmfamily\phantom{00}\phantom{: \,}}}%
\else
\fi
\else
\verbatim
}
\makeatother
\makeatletter
\makeatother
\let\cite\citep
%%
%% 2009/05/10: removed 'float' package because it breaks multiple
%% \caption's per {figure} environment. ---NR
%%
%% % Put figures in boxes --- WHY??? --NR
%% \usepackage{float}
%% \floatstyle{boxed}
%% \restylefloat{figure}
%% \restylefloat{table}
% ON LINE THREE, set \noauthornotestrue to suppress notes (or not)
%\newcommand{\qed}{QED}
\ifnotesinmargin
\ifvmode
\else
\else
% Simon: please set \notesinmarginfalse on the first line
\fi
\ifnoauthornotes
\fi
\let\remark\norman
% \let \finalremark \remark % uncomment after submission
\iftimestamp
\fi

%\title{\ourlib: Dataflow Optimization Made Simple}
%\title{Implementing Dataflow Analysis and Optimization by Lifting Node Functions to Basic Blocks and Control-Flow Graphs}
This paper is a revised version of a submission to ICFP'10. Comments
are welcome; please identify this version by the words
}}
\ifnoauthornotes
\makeatletter
\let\HyPsd@Warning=
\@gobble
\makeatother
\fi
% João
\maketitle

We present \ourlib, a Haskell library that makes it easy for a
compiler writer
to implement program transformations based on dataflow analyses.
A~client of \ourlib\ defines a representation of
logical assertions,
a transfer function that computes outgoing assertions from incoming
assertions,
and a rewrite function that improves code when improvements are
justified by the assertions.
\ourlib\ does the actual analysis and transformation.

\ourlib\ implements state-of-the-art algorithms:
Lerner, Grove, and Chambers's
composition of simple analyses and transformations, which achieves
the same precision as complex, handwritten
``super-analyses;''
isolating bugs in a client's code.
\ourlib's implementation is unique in that, unlike previous
implementations,
it carefully separates the tricky
elements of each of these algorithms, so that they can be examined and
understood independently.

\fi
Dataflow analysis and transformation of control-flow graphs is
pervasive in optimizing compilers, but it is typically tightly
unusually easy to define new analyses and
\ourlib's interface is modular and polymorphic,
and it offers unusually strong static guarantees.
The implementation
is also far from routine: it encapsulates
state-of-the-art algorithms (interleaved analysis and rewriting,
dynamic error isolation), and it cleanly separates their tricky elements
so that they can be understood independently.
%
%\ourlib\ will be the workhorse of a new
%back end for the Glasgow Haskell Compiler (version~6.14, forthcoming).
%% \emph{Reviewers:} code examples are indexed at {\small\url{http://bit.ly/jkr3K}}
%%% Source: http://www.cs.tufts.edu/~nr/drop/popl-index.pdf

A mature optimizing compiler for an imperative language includes many
analyses, the results of which justify the optimizer's
code-improving transformations.
Many of the most important analyses and transformations---constant
propagation, live-variable analysis, inlining, sinking of loads,
and so on---should be regarded as particular cases of
%% \remark{I do not feel compelled to cite Muchnick (or anyone else) here}
Dataflow analysis is over thirty years old,
describing a powerful but subtle way to
piggybacks on the other.

Because optimizations based on dataflow analysis
share a common intellectual framework, and because that framework is
subtle, it is tempting to
try to build a single reusable library that embodies the
subtle ideas, while
making it easy for clients to instantiate the library for different
situations.
Tempting, but difficult.
Although some such frameworks exist, as we discuss
and none implements the Lerner/Grove/Chambers technique.

optimization library''), a new Haskell library for dataflow analysis and
optimization. It has the following distinctive characteristics:

\item
\ourlib\ is purely functional.
Perhaps surprisingly, code that
manipulates control-flow graphs is easier to write, and far easier
to write correctly, when written in a purely functional style
When analysis and rewriting
without knowing whether
the result of the rewrite will be retained or discarded,
the benefit of a purely functional style is intensified

\item
\ourlib\ is polymorphic. Just as a list library is
the nodes that inhabit graphs, and in the dataflow facts that

\item The paper by Lerner, Grove, and Chambers is inspiring but abstract.
We articulate their ideas in a concrete but simple API that hides
You provide a representation for assertions,
a transfer function that transforms assertions across a node,
and a rewrite function that uses an assertion to
justify rewriting a node.
\ourlib\ ``lifts'' these node-level functions to work over
control-flow graphs, sets up and solves recursion equations,
and interleaves rewriting with analysis.
Designing good abstractions (data types, APIs) is surprisingly
hard; we have been through over a dozen significantly different
iterations, and we offer our API as a contribution.
% ``in its own right'' -- there's an echo in here...

\item
Because the client
can perform very local reasoning (``@y@ is live before
\footnote
``if @x@~is live after @x:=y+2@ then @y@ is live before~it,''
analyses and transformations built on \ourlib\
are small, simple, and easy to get right.
Moreover, \ourlib\ helps you write correct optimizations:
it~statically rules out transformations that violate invariants
and dynamically it can help find the first transformation that introduces a fault
but I like this sentence with its claim to both static and dynamic assistance,
and maybe the open/closed story is hard to understand here.}

% \item \ourlib{} makes use of GADTs and type functions to offer unusually
% strong static guarantees. In particular, nodes, basic blocks, and
% graphs are all statically typed by their open or closedness on entry, and
% their open or closedness on exit (\secref{graph-rep}). For example, an add instruction is
% open on entry and exit, while a branch is open on entry and closed on exit.
% Using these types we can statically guarantee that, say, an add instruction
% is rewritten to a graph that is also open on both entry and exit; and
% that the user cannot construct a block where an add instruction follows an
% unconditional branch. We know of no other system that offers
% static guarantees this strong.

(a)~interleaved analysis and rewriting, (b)~speculative rewriting,
(c)~computing fixed points, and (d)~dynamic fault isolation.
Previous implementations of these algorithms---including three of our
own---are complicated and hard to understand, because the tricky pieces
are implemented all together, inseparably.
A~significant contribution of this paper is
a new way to structure the implementation so that each tricky piece
is handled in just
%\remark{This is a very important claim---is
%it substantiated in \secref{engine}? And shouldn't the word
%\textbf{contribution} appear in this~\P?}
% \simon{Better?} % yes thanks ---NR
The result is sufficiently elegant that
we emphasize the implementation as an object of interest in
its own right.

It is no toy: an ancestor of this library is
part of the Glasgow Haskell Compiler, where it optimizes the
nicer, and it will be in GHC shortly.
relatively sophisticated aspects of Haskell's type system, such
as higher-rank polymorphism, GADTs, and type functions.
of these features.

We begin by setting the scene, introducing some vocabulary, and
showing a small motivating example.
A control-flow graph, perhaps representing the body of a procedure,
Each block is a sequence of instructions,
beginning with a label and ending with a
control-transfer instruction that branches to other blocks.
% Each block has a label at the beginning,
% a sequence of
% -- each of which has a label at the
% beginning. Each block may branch to other blocks with arbitrarily
% complex control flow.
The goal of dataflow optimization is to compute valid
then use those assertions to justify code-improving

Consider a concrete example: constant propagation with constant folding.
On the left we have a basic block; in the middle we have
in the block; and at
the right we have the result of transforming the block
based on the assertions:

Before             Facts     After
------------{}-------------
z := x>5                     z := True
if z                         goto L1
  then goto L1
  else goto L2

Constant propagation works
from top to bottom. We start with the empty fact.
transformation?
Yes! We can replace the node with @x:=7@.
Now, given this transformed node,
and the original fact, what fact flows out of the bottom of
the transformed node?
Now, can we do another transformation? Yes: constant folding can
replace the node with @z:=True@.
And so the process continues to the end of the block, where we
can replace the conditional branch with an unconditional one, @goto L1@.
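Before seeing how \ourlib\ packages this idea up, it may help to see it in miniature. The following self-contained sketch is not \ourlib's API; the node type, expression language, and fact representation (a finite map from variables to known constants) are all made up for illustration. It shows a constant-propagation transfer function: given a node and the incoming fact, it produces the outgoing fact.

```haskell
import qualified Data.Map as M

type Var  = String
data Expr = Lit Int | Use Var | Add Expr Expr
data Insn = Assign Var Expr            -- an open/open node

-- A fact maps each variable to its known constant value, if any.
type Fact = M.Map Var Int

-- Partial evaluation of an expression under the incoming fact.
eval :: Fact -> Expr -> Maybe Int
eval _ (Lit k)   = Just k
eval f (Use v)   = M.lookup v f
eval f (Add a b) = (+) <$> eval f a <*> eval f b

-- Transfer: an assignment to v invalidates what we knew about v and,
-- when the right-hand side is constant, records the new value.
transfer :: Insn -> Fact -> Fact
transfer (Assign v e) f = case eval f e of
  Just k  -> M.insert v k f
  Nothing -> M.delete v f
```

Starting from the empty fact, @transfer@ applied to @x:=7@ yields the fact x=7, which a rewrite function could then use to fold @z:=x>5@ to @z:=True@.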

The example above is simple because the program has only straight-line code;
when programs have loops, dataflow analysis gets more complicated.
For example,
consider the following graph,
where we assume @L1@ is the entry point:

L2: x=7; goto L3
L3: ...

Because control flows to @L3@ from two places,
All paths to @L3@ produce the fact @y=@$4$,
so we can conclude that this fact holds at @L3@.
But depending on the path to @L3@, @x@ may have different
values, so we conclude ``@x=@$\top$'',
meaning that there is no single value held by @x@ at @L3@.%
In this example @x@ really does vary at @L3@, but in general
the analysis might be conservative.}
The final result of joining the dataflow facts that flow to @L3@

Interleaving makes it far easier to write effective analyses.
the transformations.
and the instruction @z:=x>5@,
a pure analysis could produce the outgoing fact
But the subsequent transformation must perform
the transfer function becomes laughably simple: it merely has to see if the
The gain is even more compelling if there are a number of interacting
analyses and/or transformations; for more substantial

assertion about the program state (such as ``variable~@x@ holds value~@7@'').
A~prime example is live-variable analysis, where a fact takes the~form
``variable @x@ is live'' and is an assertion about the
live'' at a program point P is an assertion that @x@ is used on some program
The accompanying transformation is called dead-code elimination;
if @x@~is not live, this transformation
replaces the node @x:=e@ with a no-op.
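The liveness fact and its accompanying dead-code rewrite can be sketched in the same self-contained style (hypothetical types again, not \ourlib's API). Note that liveness is a backward analysis, so the transfer function maps the fact holding *after* a node to the fact holding *before* it.

```haskell
import qualified Data.Set as S

type Var  = String
data Expr = Lit Int | Use Var | Add Expr Expr
data Insn = Assign Var Expr

-- Variables read by an expression.
uses :: Expr -> S.Set Var
uses (Lit _)   = S.empty
uses (Use v)   = S.singleton v
uses (Add a b) = uses a `S.union` uses b

-- Backward transfer: before "x := e", x itself is not live (it is
-- about to be overwritten), but everything e reads is.
liveBefore :: Insn -> S.Set Var -> S.Set Var
liveBefore (Assign x e) liveAfter = S.delete x liveAfter `S.union` uses e

-- Dead-code elimination: if x is dead after "x := e", propose
-- replacing the node with the empty sequence; otherwise decline.
elimDead :: Insn -> S.Set Var -> Maybe [Insn]
elimDead (Assign x _) liveAfter
  | x `S.member` liveAfter = Nothing    -- keep the node
  | otherwise              = Just []    -- rewrite to a no-op
```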

% ----------------------------------------
and transformations driven by these analyses, on control-flow graphs.
Graphs are composed from smaller units, which we discuss from the

Nodes, blocks, and graphs share important properties.
control-transfer instruction to a named label.
For example,
\item A shift-left instruction is open on entry (because control can fall into it
from the preceding instruction), and open on exit (because control falls through
to the next instruction).
\item An unconditional branch is open on entry, but closed on exit (because
control cannot fall through to the next instruction).
control to fall through into a branch target), but open on exit.
% This taxonomy enables \ourlib{} to enforce invariants:
% only nodes closed at entry can be the targets of branches, and only nodes closed
% at exits can transfer control (see also \secref{edges}).
% As~a consequence, all control transfers originate at control-transfer
% instructions and terminate at labels; this invariant dramatically
% simplifies analysis and transformation.
These examples concern nodes, but the same classification applies
to blocks and graphs. For example, the block
is open on entry and closed on exit.
``open/closed;''
we may refer to an ``open/closed block.''

The shape of a thing determines that thing's control-flow properties.
In particular, whenever E is a node, block, or graph,
% : \simon{Removed the claim about a unique entry point.}
\item
If E is open at the entry, it has a unique predecessor;
if it is closed, it may have arbitrarily many predecessors---or none.
\item
If E is open at the exit, it has a unique successor;
if it is closed, it may have arbitrarily many successors---or none.

%%%%
%%%%
%%%% % \item Regardless of whether E is open or closed,
%%%% % it has a unique entry point where execution begins.
%%%% \item If E is closed at exit, control leaves \emph{only}
%%%% by explicit branches from closed-on-exit nodes.
%%%% \item If E is open at exit, control \emph{may} leave E
%%%% by ``falling through'' from a distinguished exit point.
%%%% \remark{If E is a node or block, control \emph{only} leaves E by
%%%% falling through, but this isn't so for a graph. Example: a body of a
%%%% loop contains a \texttt{break} statement} \simon{I don't understand.
%%%% A break statement would have to be translated as a branch, no?
%%%% Can you give an example? I claim that control only leaves an
%%%% open graph by falling through.}
%%%% \end{itemize}

Typically, a node might represent a machine instruction, such as an
assignment, a call, or a conditional branch.
so each client can define nodes as it likes.
Because they contain nodes defined by the client,
graphs can include arbitrary client-specified data, including
(say) C~statements, method calls in an object-oriented language, or
whatever.

data `Node e x where
  Label       :: Label -> Node C O
  `Assign     :: Var  -> Expr -> Node O O
  `Store      :: Expr -> Expr -> Node O O
  `Branch     :: Label -> Node O C
  `CondBranch :: Expr -> Label -> Label -> Node O C
  -- ... more constructors ...

closed at entry and exit:
the type of a node has kind @*->*->*@, where the two type parameters
are type-level flags, one for entry and one for exit.
Such a type parameter may be instantiated only with type @O@~(for
open) or type~@C@ (for closed).
As an example,
The type parameters are written @e@ and @x@, for
entry and exit respectively.
The type is a generalized algebraic data type;
the syntax gives the type of each constructor.
%%% \cite{peyton-jones:unification-based-gadts}.
For example, constructor @Label@
takes a @Label@ and returns a node of type @Node C O@, where
the~``@C@'' says ``closed at entry'' and the~``@O@'' says ``open at exit''.
The types @Label@, @O@, and~@C@ are

Similarly, an @Assign@ node takes a variable and an expression, and
returns a @Node@ open at both entry and exit; the @Store@ node is
similar. The types @`Var@ and @`Expr@ are private to the client, and
Finally, the control-transfer nodes @Branch@ and @CondBranch@ are open at entry
and closed at exit.

Nodes closed on entry are the only targets of control transfers;
nodes open on entry and exit never perform control transfers;
To obey these invariants,
a node for
a conditional-branch instruction, which typically either transfers control
conditional branch, with the fall-through path in a separate block.
and it costs nothing in practice:
such code is easily sequentialized without superfluous branches.
%a late-stage code-layout pass can readily reconstruct efficient code.
}.
Because of the position each type of node occupies in a
basic block,
respectively.

data `O -- Open
data `C -- Closed

data `Block n e x where
  `BFirst  :: n C O -> Block n C O
  `BMiddle :: n O O -> Block n O O
  `BLast   :: n O C -> Block n O C
  `BCat    :: Block n e O -> Block n O x -> Block n e x

data `Graph n e x where
  `GNil  :: Graph n O O
  `GUnit :: Block n O O -> Graph n O O
  `GMany :: MaybeO e (Block n O C)
         -> LabelMap (Block n C C)
         -> MaybeO x (Block n C O)
         -> Graph n e x

data `MaybeO ^ex t where
  `JustO    :: t -> MaybeO O t
  `NothingO :: MaybeO C t

newtype `Label = Label Int

class `Edges n where
  `entryLabel :: n C x -> Label

\ourlib\ combines the client's nodes into
A~@Block@ is parameterized over the node type~@n@
as well as over the same flag types that make it open or closed at
entry and exit.

The @BFirst@, @BMiddle@, and @BLast@ constructors create one-node
blocks.
An~earlier representation of blocks used a single constructor
shape.
uses the power of GADTs to ensure that the shape of every node is
known statically.
This property makes no difference to clients, but it significantly
simplifies the implementation of analysis and transformation in

The @BCat@ constructor concatenates blocks in sequence.
It~makes sense to concatenate blocks only when control can fall
through from the first to the second; therefore,
the point of concatenation.
This restriction is enforced by the type of @BCat@, whose first
argument must be open on exit, and whose second argument must be open on entry.
It~is statically impossible, for example, to concatenate a @Branch@
immediately before an
@Assign@.
Indeed, the @Block@ type statically guarantees that any
closed/closed @Block@---which compiler writers normally
call a ``basic block''---consists of exactly one closed/open node
followed by zero or more open/open nodes (@Assign@ or @Store@),
and terminated with exactly one
open/closed node (@Branch@ or @CondBranch@).
Using GADTs to enforce these invariants is one of
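These invariants can be seen at work in a small self-contained model of the shape-indexed types (simplified from the paper's definitions: the node type is fixed rather than a parameter, and most constructors are omitted). The well-typed block below compiles; the commented-out misordered one is rejected.

```haskell
{-# LANGUAGE GADTs, EmptyDataDecls #-}

data O  -- open
data C  -- closed

data Node e x where
  Label  :: String -> Node C O          -- first node
  Assign :: String -> Int -> Node O O   -- middle node
  Branch :: String -> Node O C          -- last node

data Block e x where
  BFirst  :: Node C O -> Block C O
  BMiddle :: Node O O -> Block O O
  BLast   :: Node O C -> Block O C
  BCat    :: Block e O -> Block O x -> Block e x

-- A closed/closed "basic block": label, middles, control transfer.
bb :: Block C C
bb = BFirst (Label "L1") `BCat` BMiddle (Assign "x" 7)
                         `BCat` BLast  (Branch "L2")

-- Rejected at compile time: nothing may follow a control transfer.
-- bad = BLast (Branch "L2") `BCat` BMiddle (Assign "x" 7)

-- Flatten a block to text, for inspection; the pattern match is
-- exhaustive because the types rule out every other combination.
render :: Block e x -> [String]
render (BFirst (Label l))     = [l ++ ":"]
render (BMiddle (Assign v k)) = [v ++ " := " ++ show k]
render (BLast (Branch l))     = ["goto " ++ l]
render (BCat b1 b2)           = render b1 ++ render b2
```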

% Also notice that the definition of @Block@ guarantees that every @Block@
% has at least one node.
% SPJ: I think we've agreed to stick with PNil.
% \remark{So what? Who cares? Why is it important
% that no block is empty? (And if we had an empty block, we could ditch
% @PNil@ from @PGraph@, which would be an improvement.)}

Like @Block@, the data type @Graph@ is parameterized over
both nodes @n@ and its open/closed shape (@e@ and @x@).
It has three constructors. The first two
deal with the base cases of open/open graphs:
an empty graph is represented by @GNil@ while a single-block graph
is represented by @GUnit@.

More general graphs are represented by @GMany@, which has three
fields: an optional entry sequence, a body, and an optional exit
sequence.
\item
If the graph is open at the entry, it contains an entry sequence of type
@Block n O C@.
We could represent this sequence as a value of type
@Maybe (Block n O C)@, but we can do better:
if the graph is open at the entry.
We~express our compile-time knowledge by using the type
@MaybeO e (Block n O C)@, a type-indexed version of @Maybe@
the type @MaybeO O a@ is isomorphic to~@a@, while
the type @MaybeO C a@ is isomorphic to~@()@.
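A small self-contained model of @MaybeO@ (with names mirroring the paper's) makes these isomorphisms concrete: at index @O@ the payload is always present, so extraction needs no failure case.

```haskell
{-# LANGUAGE GADTs, EmptyDataDecls #-}

data O  -- open
data C  -- closed

-- A Maybe whose Just/Nothing choice is fixed by a type index.
data MaybeO ex t where
  JustO    :: t -> MaybeO O t
  NothingO ::      MaybeO C t

-- MaybeO O t ~ t: a total extractor; no NothingO case can arise.
fromJustO :: MaybeO O t -> t
fromJustO (JustO x) = x

-- MaybeO C t ~ (): there is exactly one value, NothingO.
unitO :: MaybeO C t -> ()
unitO NothingO = ()
```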

\item
The body of the graph is a collection of closed/closed blocks.
To~facilitate traversal of the graph, we represent a body as a finite
map from label to block.
\item
The exit sequence is dual to the entry sequence, and like the entry
sequence, its presence or absence is deducible from the static type of the graph.

We can splice graphs together nicely; the cost is logarithmic in the
number of closed/closed blocks.
Unlike blocks, two graphs may be spliced together
not only when they are both open
at the splice point but also
when they are both closed---and not in the other two cases:

`gSplice :: Graph n e a -> Graph n a x -> Graph n e x
gSplice GNil g2 = g2
gSplice g1 GNil = g1

gSplice (GUnit ^b1) (GUnit ^b2) = GUnit (b1 `BCat` b2)

gSplice (GUnit b) (GMany (JustO e) bs x)
  = GMany (JustO (b `BCat` e)) bs x

gSplice (GMany e ^bs (JustO x)) (GUnit b2)
  = GMany e bs (JustO (x `BCat` b2))

gSplice (GMany e1 ^bs1 (JustO x1)) (GMany (JustO e2) ^bs2 x2)
  = GMany e1 (bs1 `mapUnion` (b `addBlock` bs2)) x2
  where b = x1 `BCat` e2

gSplice (GMany e1 bs1 NothingO) (GMany NothingO bs2 x2)
  = GMany e1 (bs1 `mapUnion` bs2) x2

This definition illustrates the power of GADTs: the
pattern matching is exhaustive, and all the open/closed invariants are
statically checked. For example, consider the second-last equation for @gSplice@.
Since the exit link of the first argument is @JustO x1@,
we know that type parameter~@a@ is~@O@, and hence the entry link of the second
argument must be @JustO e2@.
Moreover, block~@x1@ must be
closed/open, and block~@e2@ must be open/closed.
We can therefore concatenate them
with @BCat@ to produce a closed/closed block, which is
added to the body of the result.

We~have carefully crafted the types so that if @BCat@ and @BodyCat@
are considered as associative operators,
Need some words about how this is part of making the API simple for
the client---I've added something to the end of the paragraph, but I'm
not particularly thrilled.}
%% \simon{Well, you were the one who was so keen on a unique representation!
%% And since we have one, I think it's useful to say so. Lastly, the representation of
%% singleton blocks is not entirely obvious.}

972 %% singleton blocks is not entirely obvious.}

973 %%%%

974 %%%% An empty open/open graph is represented

975 %%%% by @GNil@, while a closed/closed one is @gNilCC@:

976 %%%% \par {\small

977 %%%% \begin{code}

978 %%%% gNilCC :: Graph C C

979 %%%% gNilCC = GMany NothingO BodyEmpty NothingO

980 %%%% \end{code}}

981 %%%% The representation of a @Graph@ consisting of a single block~@b@

982 %%%% depends on the shape of~@b@:\remark{Does anyone care?}

983 %%%% \par{\small

984 %%%% \begin{code}

985 %%%% gUnitOO :: Block O O -> Graph O O

986 %%%% gUnitOC :: Block O C -> Graph O C

987 %%%% gUnitCO :: Block O C -> Graph C O

988 %%%% gUnitCC :: Block O C -> Graph C C

989 %%%% gUnitOO b = GUnit b

990 %%%% gUnitOC b = GMany (JustO b) BodyEmpty NothingO

991 %%%% gUnitCO b = GMany NothingO BodyEmpty (JustO b)

992 %%%% gUnitCC b = GMany NothingO (BodyUnit b) NothingO

993 %%%% \end{code}}

994 %%%% Multi-block graphs are similar.

995 %%%% From these definitions

996 To guarantee uniqueness, @GUnit@ is restricted to open/open

997 blocks.

998 If~@GUnit@ were more polymorphic, there would be

999 more than one way to represent some graphs, and it wouldn't be obvious

1000 to a client which representation to choose---or if the choice made a difference.

a control transfer goes, or what is the @Label@ at the start of a block?
To~answer such questions,
the standard Haskell idiom is to define a type class whose methods
provide exactly the operations needed;
and returns its @Label@;
the @successors@ method takes a last node (closed on exit) and returns
the @Label@s to
which it can transfer control.
A~middle node, which is open at both entry and exit, cannot refer to
any @Label@s,
so no corresponding interrogation function is needed.

A node type defined by a client must be an instance of @Edges@.
the client's instance declaration for @Node@ would be

instance Edges Node where
  entryLabel (Label l) = l

Again, the pattern matching for both functions is exhaustive, and
the compiler statically checks this fact. Here, @entryLabel@
cannot be applied to an @Assign@ or @Branch@ node,
and any attempt to define a case for @Assign@ or @Branch@ would result
in a type error.

Although the client is required to provide this information about
nodes, it is very convenient for \ourlib\ to get the same information
about blocks.
For its own internal use,

instance Edges n => Edges (Block n) where
  entryLabel (BFirst n) = entryLabel n
  entryLabel (BCat b _) = entryLabel b
  successors (BLast n)  = successors n
  successors (BCat _ b) = successors b

Because the functions @entryLabel@ and @successors@ are used to track control
entry label or successors of a @Graph@ itself.
if a @Graph@ is closed at the entry, it does not have a unique entry label.
%
% A slight infelicity is that we cannot make @Graph@ an instance of @Edges@,
% because a graph closed on entry has no unique label. Fortunately, we never
% need @Graph@s to be in @Edges@.
%
% ``Never apologize. Never confess to infelicity.'' ---SLPJ

}
Control-flow graphs & Us & Us & One \\
Dataflow fact~$F$ & You & You & One type per logic \\
Transfer functions & Us & You & One per analysis \\
Solve-and-rewrite functions & Us & Us & Two (forward, backward) \\
}

Now that we have graphs, how do we optimize them?
client to build a new dataflow analysis and optimization.
The client must supply the following pieces:
that let the client build control-flow graphs out of nodes.
Each analysis uses facts that are specific to that particular analysis,
the fact type.
and which returns either @Nothing@
or @(Just g)@ where @g@~is a graph that should
replace the node.
internal control flow is
crucial for many code-improving transformations.
We discuss the rewrite function

Because facts, transfer functions, and rewrite functions work closely together,
we represent their combination

Given a node type~@n@ and a @FwdPass@,
graph.
\hoopl\ provides a fully polymorphic interface, but for purposes of
exposition, we present a function that is specialized to a
closed/closed graph:

analyzeAndRewriteFwdBody
  :: ( FuelMonad m     -- Gensym and other state
     , Edges n )       -- Access to flow edges
  => FwdPass m n f     -- Lattice, transfer, rewrite
  -> Graph n C C       -- Input graph
  -> FactBase f        -- Input fact(s)
  -> m ( Graph n C C   -- Result graph
       , FactBase f )  -- ... and its facts

Given a @FwdPass@, the
analyze-and-rewrite function transforms a graph into
an optimized graph.
As its type shows, this function
is polymorphic in the types of nodes~@n@ and facts~@f@;
these types are determined entirely by the client.
The type of the monad~@m@ is also supplied by the client, but it must
meet certain constraints laid out by \hoopl, as described in

data `FwdPass m n f
  = FwdPass { `fp_lattice  :: DataflowLattice f
            , `fp_transfer :: FwdTransfer n f
            , `fp_rewrite  :: FwdRewrite m n f }

------- Lattice ----------
data `DataflowLattice a = DataflowLattice
  { `fact_bot    :: a
  , `fact_extend :: a -> a -> (ChangeFlag, a) }

data `ChangeFlag = `NoChange | `SomeChange

------- Transfers ----------
newtype `FwdTransfer n f -- abstract type
`mkFTransfer'
  :: (forall e x . n e x -> f -> Fact x f)
  -> FwdTransfer n f

------- Rewrites ----------
newtype `FwdRewrite m n f -- abstract type
`mkFRewrite'
  :: (forall e x . n e x -> f -> m (FwdRes m n f e x))
  -> FwdRewrite m n f

data `FwdRes m n f e x
  = FwdRes (Graph n e x) (FwdRewrite m n f)
  | `NoFwdRes

------- Fact-like things -------
type family `Fact x f :: *
type instance Fact O f = f
type instance Fact C f = FactBase f

------- FactBase -------
type `FactBase f = LabelMap f
  -- A finite mapping from Labels to facts f

1199 % omit mkFactBase :: [(Label, f)] -> FactBase f

As well as taking and returning a graph with its entry point(s), the function also takes input facts (the @FactBase@) and produces output facts. A~@FactBase@ is simply a finite mapping from @Label@ to facts. The output @FactBase@ maps each @Label@ in the @Body@ to its fact; if the @Label@ is not in the domain of the @FactBase@, its fact is the bottom element of the lattice. The point(s) at which control flow may enter the graph are supplied by the caller; the input @FactBase@ supplies any facts that hold at these points.

if the graph represents the body of a procedure with parameters $x,y,z$, we would map the entry @Label@ to a fact

parameters are not known to be constants.

The client's model of how @analyzeAndRewriteFwd@ works is as follows: \ourlib\ walks forward over each block in the graph. At each node, \ourlib\ applies the rewrite function to the node and the incoming fact. If the rewrite function returns @NoFwdRes@, the node is retained as part of the output graph, the transfer function is used to compute the downstream fact, and \ourlib\ moves on to the next node. But if the rewrite function returns @(FwdRes g rw)@, indicating that it wants to rewrite the node to the replacement graph~@g@, then \ourlib\ recursively analyzes and rewrites~@g@, using the new rewrite function~@rw@, before moving on to the next node. A~node following a rewritten node sees

analyzing the replacement graph.

In this section we flesh out the

For each analysis or transformation, the client must define a type of dataflow facts. A~dataflow fact often represents an assertion about a program point,\footnote

control-flow graph.}

claims that variable @x@ holds value @3@ at P, regardless of the path by which P is reached.

assertion ``@x@ is dead'' at point P claims that no path from P uses variable @x@.

(a)~the bottom element of the lattice and (b)~how to take the least upper bound (join) of two elements. To ensure that analysis terminates, it is enough if every fact has a finite number of distinct facts above it, so that repeated joins eventually reach a fixed point.

In practice, joins are computed at labels.

Furthermore, the dataflow engine wants to know if

because if not, the analysis has not reached a fixed point.

The bottom element and join operation of a lattice of facts of type~@f@ are stored in a value of type @DataflowLattice f@

%%Such a value is part of the @FwdPass n f@ that is passed to
%%@analyzeAndRewriteFwd@ above.
As noted in the previous paragraph,

to one of the arguments joined. Because this information is often available very cheaply at the time when the join is computed, \ourlib\ does not require a separate equality test on facts (which might be expensive). Instead, \ourlib\ requires that @fact_extend@ return a @ChangeFlag@ as well as the least upper bound. The @ChangeFlag@ should be @NoChange@ if the result is the same as the first argument (the old fact), and @SomeChange@ if the result differs.

its arguments.)
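For a minimal illustration (our own toy example, not one of the paper's figures), here is a lattice of Boolean ``reachability'' facts, in which the join is disjunction and @fact_extend@ reports whether the old fact changed:
\begin{code}
-- Toy lattice: False = "not known reachable" (bottom), True = "may be reachable"
reachLattice :: DataflowLattice Bool
reachLattice = DataflowLattice
  { fact_bot    = False
  , fact_extend = join }
  where join old new = (ch, j)
          where j  = old || new   -- least upper bound is disjunction
                ch = if j == old then NoChange else SomeChange
\end{code}
Note that the @ChangeFlag@ is computed with respect to the first (old) argument, exactly as the text above requires.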

% A~possible representation of the facts needed to implement
% constant propagation is shown in \figref{const-prop}.
% A~fact
% is represented as a finite map from a variable to a value of type
% @Maybe HasConst@.%
% A~variable $x$ maps to @Nothing@ iff $x=\bot$;
% $x$~maps to @Just Top@ iff $x=\top$;
% and $x$~maps to $@Just@\, (@I@\, k)$ iff $x=k$ (and similarly for
% Boolean constants).\remark{Ugh}
% Any one procedure has only
% finitely many variables; only finitely many facts are computed at any
% program point; and in this lattice any one fact can increase at most
% twice. These properties ensure that the dataflow engine will reach a
% fixed point.

A~forward transfer function is presented with the dataflow fact(s) on the edge(s) coming into a node, and it computes dataflow fact(s) on the outgoing edge(s). In a forward analysis, the dataflow engine starts with the fact at the beginning of a block and applies the transfer function to successive nodes in that block until eventually the transfer function for the last node computes the facts that are propagated to the block's successors. For example, consider this graph, with entry at @L1@:

L1: x=3; goto L2

if x>0 then goto L2 else return

Analyzing @L1@ propagates this fact forward, by applying the transfer function successively to the nodes

This new fact is joined with the existing (bottom) fact for @L2@. Now the analysis propagates @L2@'s fact forward, again using the transfer

Again, the new fact is joined with the existing fact for @L2@, and the process is iterated until the facts for each label reach a fixed point.

If the node is open at exit, the transfer function produces a single fact.

In that case the transfer function must produce a list of (@Label@,~fact) pairs, one for each outgoing edge.
%

depends on the shape of the node's exit.} Fortunately, this dependency can be expressed precisely, at compile time, by Haskell's (recently added)

A~forward transfer function supplied by a client, of type (@FwdTransfer@ @n@ @f@), is typically a function polymorphic in @e@ and @x@. It takes a

outgoing ``fact-like thing'' of type (@Fact@ @x@ @f@). The type constructor @Fact@ should be thought of as a type-level function; its signature is given in the @type family@ declaration, while its definition is given by two @type instance@ declarations. The first declaration says that the fact-like thing coming out of a node

the fact-like thing coming out of a node

We have ordered the arguments such that if

`transfer_fn :: forall e x . n e x -> f -> Fact x f
`node        :: n e x

then @(transfer_fn node)@ is a fact transformer:

transfer_fn node :: f -> Fact x f

So much for the interface. What about the implementation? The way GADTs work is that the compiler uses the value constructor for

closed.

type~@n@! Such functions include
\begin{itemize}
\item
A function that takes a pair of @FwdTransfer@s for facts @f@~and~@f'@, and returns a single @FwdTransfer@ for the fact @(f, f')@
\item
A function that takes a @FwdTransfer@ and wraps it in logging code, so an analysis can be debugged by watching the facts flow through the nodes
\end{itemize}
Because the mapping from value constructors to shape is different for each node type~@n@, such functions cannot be polymorphic in both the representation and the shape of nodes. We~therefore, in our implementation, sacrifice polymorphism in shape:

each polymorphic in the node type but monomorphic in shape:

newtype FwdTransfer n f
  = FwdTransfers ( n C O -> f -> f
                 , n O O -> f -> f
                 , n O C -> f -> FactBase f
                 )

Such triples can easily be composed and wrapped without requiring a pattern match on the value constructor of an unknown node

Working with triples is tedious, but only \hoopl\ itself is forced to work with triples; each client, which knows the node type, may supply a triple, but it typically supplies a single function polymorphic in the shape of the (known) node.
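Under the triple representation, the wrapper @mkFTransfer'@ from the interface can simply instantiate the client's polymorphic function at each of the three shapes. A plausible sketch (not necessarily \hoopl's exact definition):
\begin{code}
-- Instantiating the polymorphic function at (C,O), (O,O), and (O,C)
-- yields exactly the three monomorphic components of the triple,
-- because Fact O f = f and Fact C f = FactBase f.
mkFTransfer' :: (forall e x . n e x -> f -> Fact x f)
             -> FwdTransfer n f
mkFTransfer' f = FwdTransfers (f, f, f)
\end{code}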

%%%% \begin{figure}
%%%% \begin{smallfuzzcode}{6.6pt}
%%%% type `AGraph n e x
%%%%   = [Label] -> (Graph n e x, [Label])
%%%%
%%%% `withLabels :: Int -> ([Label] -> AGraph n e x)
%%%%             -> AGraph n e x
%%%% withLabels n ^fn = \ls -> fn (take n ^ls) (drop n ls)
%%%%
%%%% `mkIfThenElse :: Expr -> AGraph Node O O
%%%%               -> AGraph Node O O -> AGraph Node O O
%%%% mkIfThenElse p t e
%%%%   = withLabels 3 $ \[l1,l2,l3] ->
%%%%     gUnitOC (BUnit (CondBranch p l1 l2)) `gSplice`
%%%%     mkLabel l1 `gSplice` t `gSplice` mkBranch l3 `gSplice`
%%%%     mkLabel l2 `gSplice` e `gSplice` mkBranch l3 `gSplice`
%%%%     mkLabel l3
%%%%
%%%% `mkLabel l  = gUnitCO (BUnit (Label l))
%%%% `mkBranch l = gUnitOC (BUnit (Branch l))
%%%% `gUnitOC b = GMany (JustO b) BodyEmpty NothingO
%%%% `gUnitCO b = GMany NothingO BodyEmpty (JustO b)
%%%% \end{smallfuzzcode}
%%%% \caption{The \texttt{AGraph} type and example constructions} \figlabel{agraph}
%%%% \end{figure}

We compute dataflow facts in order to enable code-improving transformations. In our constant-propagation example, the dataflow facts may enable us to simplify an expression by performing constant folding, or to turn a conditional branch into an unconditional one. Similarly, a liveness analysis may allow us to replace a dead assignment with a no-op.

A rewriting function takes a node and a fact, and optionally returns\ldots what? At first one might expect that rewriting should return a new node, but that is not enough: we~might want to remove a node by rewriting it to the empty graph, or more ambitiously, we might want to replace a high-level operation with a tree of conditional branches or a loop, which would entail introducing new blocks with internal control flow. In~general, a rewrite function must be able to return a

Concretely, a @FwdRewrite@ takes a node and a suitably shaped fact, and returns either @NoFwdRes@, indicating that the node should not be replaced,

be replaced with~\ag: the replacement graph. The result is monadic because if the rewriter makes graphs containing blocks, it may need fresh @Label@s, which are supplied by a monad. A~client may use its own monad or may use a monad or monad transformer supplied by \hoopl.

%%%% \ifpagetuning\enlargethispage{0.8\baselineskip}\fi

the replacement graph $\ag$ has

For example, a branch instruction can be replaced only by a graph closed at the exit. Moreover, because only an open/open graph can be

of @FwdRewrite@ guarantees, at compile time, that no head of a block (closed/open) or tail of a block (open/closed) can ever be deleted by being rewritten to an empty graph.

Once the rewrite has been performed, what then? The rewrite returns a replacement graph, which must itself be analyzed, and its nodes may be further rewritten. We~call @analyzeAndRewriteFwd@ to process the replacement graph---but what @FwdPass@ should we use? There are two common situations:

\item Sometimes we want to analyze and transform the replacement graph with an unmodified @FwdPass@, further rewriting the replacement graph. This procedure is

When deep rewriting is used, the client's rewrite function must ensure that the graphs it produces are not rewritten indefinitely

It~is easily implemented by using a modified @FwdPass@ whose rewriting function always returns @NoFwdRes@.

Deep rewriting is essential to achieve the full benefits of interleaved analysis and transformation

But shallow rewriting can be vital as well; for example, a forward dataflow pass that inserts a spill before a call must not rewrite the call again, lest it attempt to insert infinitely many spills.

An~innovation of \hoopl\ is to build the choice of shallow or deep rewriting into each rewrite function, an idea that is elegantly captured by the

The first component of the @FwdRes@ is the replacement graph, as discussed earlier. The second component, $\rw$, is a

the replacement graph. For shallow rewriting this new function always returns @NoFwdRes@; for deep rewriting it is the original rewriting function.

By requiring each rewrite to return a new rewrite function, \hoopl\ enables a variety of combinators over rewrite functions.

(triple-based) representation. It's not pretty any more.}
For example, here is a function that combines two rewriting functions in sequence:

thenFwdRw :: Monad m
          => FwdRewrite m n f
          -> FwdRewrite m n f
          -> FwdRewrite m n f
`thenFwdRw rw1 rw2 = wrapFRewrites2' f rw1 rw2
  where f rw1 rw2' n f = do
          res1 <- rw1 n f
          case res1 of
            NoFwdRes      -> rw2' n f
            FwdRes g rw1a ->
              return $ FwdRes g (rw1a `thenFwdRw` rw2)

`noFwdRw :: Monad m => FwdRewrite m n f
noFwdRw = mkFRewrite' $ \ n f -> return NoFwdRes

What a beautiful type @thenFwdRw@ has! It tries @rw1@, and if @rw1@ declines to rewrite, it behaves like @rw2@. But if @rw1@ rewrites, returning a new rewriter @rw1a@, then the overall call also succeeds, returning a new rewrite function obtained by combining @rw1a@ with @rw2@. (We cannot apply @rw1a@ or @rw2@ directly to the replacement graph~@g@, because @rw1@~returns a graph and @rw2@~expects a node.) The rewriter @noFwdRw@ is the identity of @thenFwdRw@. Finally, @thenFwdRw@ can combine a deep-rewriting function and a shallow-rewriting function, to produce a rewriting function that is a combination of deep and shallow.
%%This is something that the Lerner/Grove/Chambers framework could not do,
%%because there was a global shallow/deep flag.
%% Our shortsightedness; Lerner/Grove/Chambers is deep only ---NR

A shallow rewriting function can be made deep by iterating

iterFwdRw
  :: Monad m => FwdRewrite m n f -> FwdRewrite m n f
`iterFwdRw rw = wrapFRewrites' f rw
  where f rw' n f = liftM iter (rw' n f)
        iter NoFwdRes = NoFwdRes
        iter (FwdRes g rw2) =
          FwdRes g (rw2 `thenFwdRw` iterFwdRw rw)

If we have shallow rewrites $A$~and~$B$ then we can build $AB$, $A^*B$, $(AB)^*$, and so on: sequential composition is @thenFwdRw@ and the Kleene star
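For instance, $A^*B$ can be written directly in terms of these combinators (the name @aStarB@ is ours, chosen for the example):
\begin{code}
-- Iterate the shallow rewrite a to a deep rewrite, then try b
aStarB :: Monad m
       => FwdRewrite m n f -> FwdRewrite m n f
       -> FwdRewrite m n f
aStarB a b = iterFwdRw a `thenFwdRw` b
\end{code}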

The combinators above operate on rewrite functions that share a common fact type and transfer function. It~can also be useful to combine entire dataflow passes that use different facts. We~invite you to write one such combinator, with type

`thenFwd :: Monad m
         => FwdPass m n f1
         -> FwdPass m n f2
         -> FwdPass m n (f1,f2)

The two passes run interleaved, not sequentially, and each may help the other, yielding better results than running $A$~and then~$B$ or $B$~and then~$A$

%% call these passes ``super-analyses;''
%% in \hoopl, construction of super-analyses is
%% particularly concrete.
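A first step toward such a combinator is the product of the two lattices. The following sketch (the name @pairLattice@ is ours, not part of \hoopl's interface) takes bottoms pointwise and joins componentwise, reporting a change if either component changed:
\begin{code}
pairLattice :: DataflowLattice f1 -> DataflowLattice f2
            -> DataflowLattice (f1, f2)
pairLattice l1 l2 = DataflowLattice
  { fact_bot    = (fact_bot l1, fact_bot l2)
  , fact_extend = join }
  where
    join (old1, old2) (new1, new2) = (ch, (j1, j2))
      where (c1, j1) = fact_extend l1 old1 new1
            (c2, j2) = fact_extend l2 old2 new2
            ch = case (c1, c2) of
                   (NoChange, NoChange) -> NoChange
                   _                    -> SomeChange
\end{code}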

% omit Binop :: Operator -> Expr -> Expr -> Expr
% omit Add :: Operator

-- Types and definition of the lattice
data `HasConst = `Top | `B Bool | `I Integer
type `ConstFact = Map.Map Var HasConst
`constLattice = DataflowLattice
  { fact_bot    = Map.empty
  , fact_extend = stdMapJoin constFactAdd }
  where
    `constFactAdd ^old ^new = (c, j)
      where j = if new == old then new else Top
            c = if j == old then NoChange else SomeChange

-------------------------------------------------------
-- Analysis: variable has constant value
`varHasConst :: FwdTransfer Node ConstFact
varHasConst (Label l)           f = lookupFact f l
varHasConst (Store _ _)         f = f
varHasConst (Assign x (Bool b)) f = Map.insert x (B b) f
varHasConst (Assign x (Int i))  f = Map.insert x (I i) f
varHasConst (Assign x _)        f = Map.insert x Top f

varHasConst (CondBranch (Var x) ^tid ^fid) f
  = mkFactBase [(tid, Map.insert x (B True)  f),
                (fid, Map.insert x (B False) f)]
varHasConst (CondBranch _ tid fid) f
  = mkFactBase [(tid, f), (fid, f)]

-------------------------------------------------------
-- Constant propagation
`constProp :: FwdRewrite Node ConstFact
constProp node ^facts
  = fmap toAGraph (mapE rewriteE node)
  where
    `rewriteE e@(Var x)
      = case Map.lookup x facts of
          Just (B b) -> Just $ Bool b
          Just (I i) -> Just $ Int i
          _          -> Nothing
    rewriteE e = Nothing

-------------------------------------------------------
-- Simplification ("constant folding")
`simplify :: FwdRewrite Node f
simplify (CondBranch (Bool b) t f) _
  = Just $ toAGraph $ Branch (if b then t else f)
simplify node _ = fmap toAGraph (mapE s_exp node)
  where
    `s_exp (Binop Add (Int i1) (Int i2))
      = Just $ Int $ i1 + i2
    ... -- more cases for constant folding

-- Rewriting expressions
`mapE :: (Expr -> Maybe Expr)
      -> Node e x -> Maybe (Node e x)
mapE f (Label _)    = Nothing
mapE f (Assign x e) = fmap (Assign x) $ f e
... -- more cases for rewriting expressions

-------------------------------------------------------
-- Defining the forward dataflow pass
`constPropPass = FwdPass
  { fp_lattice  = constLattice
  , fp_transfer = varHasConst
  , fp_rewrite  = constProp `thenFwdRw` simplify }

constant propagation and constant folding. For each variable at each point in a graph, the analysis concludes one of three facts: the variable holds a constant value (Boolean or integer), the variable might hold a non-constant value, or nothing is known about what the variable holds. We~represent these facts using a finite map from a variable to a fact of type (@Maybe HasConst@). A variable with a constant value maps to @Just k@, where @k@ is the constant value; a variable with a non-constant value maps to @Just Top@; and a variable with an unknown value maps to @Nothing@ (i.e., it is not in the domain of the finite map).

% \afterpage{\clearpage}

The definition of the lattice (@constLattice@) is straightforward. The bottom element is an empty map (nothing is known about the contents of any variable). We~use the @stdMapJoin@ function to lift the join operation for a single variable (@constFactAdd@) up to the map containing facts for all variables.

% omit stdMapJoin :: Ord k => JoinFun v -> JoinFun (Map.Map k v)
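The definition of @stdMapJoin@ is not shown in the figure. One plausible implementation, consistent with the signature in the comment above (with @JoinFun v@ expanded to @v -> v -> (ChangeFlag, v)@), folds the new map into the old one, reporting @SomeChange@ iff any binding was added or changed:
\begin{code}
stdMapJoin :: Ord k => (v -> v -> (ChangeFlag, v))
           -> Map.Map k v -> Map.Map k v -> (ChangeFlag, Map.Map k v)
stdMapJoin eltJoin old new = Map.foldrWithKey add (NoChange, old) new
  where add k new_v (ch, joinmap) =
          case Map.lookup k joinmap of
            Nothing    -> (SomeChange, Map.insert k new_v joinmap)
            Just old_v ->
              case eltJoin old_v new_v of
                (SomeChange, v') -> (SomeChange, Map.insert k v' joinmap)
                (NoChange,   _)  -> (ch, joinmap)
\end{code}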

For the transfer function, @varHasConst@, there are two interesting kinds of nodes: assignment and conditional branch. In the first two cases for assignment, a variable gets a constant value, so we produce a dataflow fact mapping the variable to its value. In the third case for assignment, the variable gets a non-constant value, so we produce a dataflow fact mapping the variable to @Top@. The last interesting case is a conditional branch where the condition is a variable. If the conditional branch flows to the true successor, the variable holds @True@, and similarly for the false successor. We update the fact flowing to each successor accordingly.

We do not need to consider complicated cases such as an assignment @x:=y@ where @y@ holds a constant value @k@. Instead, we rely on the interleaving of transformation and analysis to first transform the assignment to @x:=k@, which is exactly what our simple transfer function expects.

interleaving makes it possible to write the simplest imaginable transfer functions, without missing opportunities to improve the code.

The rewrite function for constant propagation, @constProp@, simply rewrites each use of a variable to its constant value. We use the auxiliary function @mapE@ to apply @rewriteE@ to each use of a variable in each kind of node; in turn, the @rewriteE@ function checks if the variable has a constant value and makes the substitution. We assume an auxiliary function

`toAGraph :: Node e x -> AGraph Node e x

to perform constant folding, called @simplify@. It rewrites a conditional branch on a boolean constant to an unconditional branch, and to find constant subexpressions, it runs @s_exp@ on every subexpression. Function @simplify@ does not need to check whether a variable holds a constant value; it relies on @constProp@ to have replaced the variable by the constant. Indeed, @simplify@ does not consult the incoming fact at all, and hence is polymorphic in~@f@.

We have written two @FwdRewrite@ functions because they are independently useful. But in this case we

so we compose them with @thenFwdRw@. The composed rewrite functions, along with the lattice and the transfer function,

To improve a particular graph, we pass @constPropPass@ and the graph to @analyzeAndRewriteFwd@.
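Concretely, a client's top-level call might look like the following sketch (the driver name @optimize@ and the choice of empty initial facts are our own illustrative assumptions, not part of the figures):
\begin{code}
-- Hypothetical driver: run constant propagation over a procedure body,
-- starting with no facts known at any entry label (bottom everywhere)
optimize :: Body Node -> FuelMonad (Body Node, FactBase ConstFact)
optimize body = analyzeAndRewriteFwd constPropPass body (mkFactBase [])
\end{code}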

Debugging an optimization can be tricky: an optimization may rewrite hundreds of nodes, and any of those rewrites could be incorrect. To debug dataflow optimizations, we use Whalley's

to identify the first rewrite that transforms a program from working code to faulty code.

The key idea is to limit the number of rewrites that are performed while optimizing a graph. In \hoopl, the limit is called

each rewrite costs one unit of fuel, and when the fuel is exhausted, no more rewrites are permitted. Because each rewrite leaves the observable behavior of the program unchanged, it is safe to stop rewriting at any point. Given a program that fails when compiled with optimization, a test infrastructure uses binary search on the amount of optimization fuel, until it finds that the program works correctly after $n-1$ rewrites but fails after $n$~rewrites. The $n$th rewrite is faulty.
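Such a test infrastructure lives outside \hoopl; the binary search itself can be sketched as follows (all names are ours; @works n@ stands for ``compile with @n@ units of fuel, then run the test''):
\begin{code}
-- Invariant: the program works with lo units of fuel but fails with hi.
-- Narrow [lo, hi] until hi = lo + 1; the hi'th rewrite is the faulty one.
findFaultyRewrite :: Monad m => (Int -> m Bool) -> Int -> Int -> m Int
findFaultyRewrite works lo hi
  | lo + 1 == hi = return hi
  | otherwise    = do ok <- works mid
                      if ok then findFaultyRewrite works mid hi
                            else findFaultyRewrite works lo mid
  where mid = (lo + hi) `div` 2
\end{code}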

You may have noticed that @analyzeAndRewriteFwd@

The @`FuelMonad@ is a simple state monad maintaining the supply of unused fuel. It also holds a supply of fresh labels, which are used by the rewriter for making new blocks; more precisely, \hoopl\ uses these labels to take the @AGraph@ returned by a pass's rewrite

Are rewrites sound, especially when there are loops? Many analyses compute a fixed point starting from unsound ``facts''; for example, a live-variable analysis starts from the

performed before a fixed point is reached may be unsound, and their results must be discarded}. Each iteration of the fixed-point computation must start afresh with the original graph.

(speculatively, and possibly recursively), so that the facts downstream of the replacement graphs are as accurate as possible. For~example, consider this graph, with entry at @L1@:

L1: x=0; goto L2

What if the predicate in the conditional branch were @x<10@ instead of @x==10@?

Using that fact, we would rewrite the conditional branch to an unconditional branch @goto L3@. No new fact would propagate to @L2@, and we would have successfully (and soundly) eliminated the loop. This example is contrived, but it illustrates that for best results we should
\begin{itemize}
\item Perform the rewrites on every iteration.
\item Begin each new iteration with the original, virgin graph.
\end{itemize}
This sort of algorithm is hard to implement in an imperative setting, where rewrites mutate a graph in place. But with an immutable graph, implementing the algorithm is trivially easy: we simply revert to the original graph at the start of each fixed-point iteration.

Facts computed by @analyzeAndRewriteFwd@ depend on graphs produced by the rewrite function, which in turn depend on facts computed by the transfer function. How~do we know this algorithm is sound, or whether it terminates? A~proof requires a POPL paper

intuition.

\hoopl\ requires that a client's functions meet these preconditions:
\begin{itemize}
\item

every sequence of calls to @fact_extend@ must eventually return @NoChange@.
\item
The transfer function must be

it should produce a more informative fact out.
\item

if it replaces a node @n@ by a replacement graph~@g@, then @g@~must be observationally equivalent to~@n@ under the assumptions expressed by the incoming dataflow fact~@f@.
%%\footnote{We do not permit a transformation to change
%% the @Label@ of a node. We have not found any optimizations
%% that are prevented (or even affected) by this restriction.}
\item

For example, if the analysis says that @x@ is dead before the node~@n@, then it had better still be dead if @n@ is replaced by @g@.
\item
To ensure termination, a transformation that uses deep rewriting must not return replacement graphs that contain nodes that could be rewritten indefinitely.
\end{itemize}

Without the conditions on monotonicity and consistency, our algorithm will terminate, but there is no guarantee that it will compute a fixed point of the analysis. And that in turn threatens the soundness of rewrites based on possibly bogus ``facts''.

However, when the preconditions above are met,

\begin{itemize}
\item
The algorithm terminates. The fixed-point loop must terminate because the lattice has no infinite ascending chains. And the client is responsible for avoiding infinite recursion when deep rewriting is used.
\item
The algorithm is sound. Why? Because if each rewrite is sound (in the sense given above), then applying a succession of rewrites is also sound. Moreover, a~sound analysis of the replacement graph may generate only dataflow facts that could have been generated by a more complicated analysis of the original graph.
\end{itemize}

for a forward analysis/transform, if (rewrite P s) = Just s',
then (transfer P s $\sqsubseteq$ transfer P s').
For backward: if (rewrite Q s) = Just s', then (transfer Q s' $\sqsubseteq$ transfer Q s).
Works for liveness.
``It works for liveness, so it must be true'' (NR).
If this is true, it's worth a QuickCheck property!

be monotonic in~$f$. We think this is true of liveness, but we are not sure whether it's just a generally good idea, or whether it's actually a

%%%% \simon{The rewrite functions must presumably satisfy
%%%% some monotonicity property. Something like: given a more informative
%%%% fact, the rewrite function will rewrite a node to a more informative graph
%%%% (in the fact lattice.).
%%%% \textbf{NR}: actually the only obligation of the rewrite function is
%%%% to preserve observable behavior. There's no requirement that it be
%%%% monotonic or indeed that it do anything useful. It just has to
%%%% preserve semantics (and be a pure function of course).
%%%% \textbf{SLPJ} In that case I think I could cook up a program that
%%%% would never reach a fixed point. Imagine a liveness analysis with a loop;
%%%% x is initially unused anywhere.
%%%% At some assignment node inside the loop, the rewriter behaves as follows:
%%%% if (and only if) x is dead downstream,
%%%% make it alive by rewriting the assignment to mention x.
%%%% Now in each successive iteration x will go live/dead/live/dead etc. I
%%%% maintain my claim that rewrite functions must satisfy some
%%%% monotonicity property.
%%%% \textbf{JD}: in the example you cite, monotonicity of facts at labels
%%%% means x cannot go live/dead/live/dead etc. The only way we can think
%%%% of not to terminate is infinite ``deep rewriting.''
%%%% }

gives a client's-eye view of \hoopl, showing how to use it to create analyses and transformations. \hoopl's interface is simple, but

do not describe their implementation. We have written at least three previous implementations, all of which were long and hard to understand, and only one of which provided compile-time guarantees about open and closed shapes. We are not confident that any of these implementations is correct.

In this paper we describe our new implementation. It is short (about a third of the size of our last attempt), elegant, and offers strong static shape guarantees. The whole thing is about 300~lines of code, excluding comments; this count includes both forward and backward dataflow analysis and transformation.

analysis and transformation. The implementations of backward analysis and transformation are exactly analogous and are included in \hoopl.

We concentrate on implementing @analyzeAndRewriteFwd@, whose

Its implementation is built on the hierarchy of nodes, blocks, and graphs

we develop a function of this type:

type `ARF ^thing n
  = forall f e x. FwdPass n f
    -> thing e x -> Fact e f
    -> FuelMonad (RG n f e x, Fact x f)

An @ARF@ (short for ``analyze and rewrite forward'') is a combination of a rewrite and transfer function. An @ARF@ takes a @FwdPass@, a @thing@ (a node, block, or graph), and an input fact, and it returns a rewritten graph of type @(RG n f e x)@ of the same shape as the @thing@, plus a suitably shaped output fact.

%
%Regardless of whether @thing@ is a node, block, or graph, the result
%is always a graph.
%
% point is made adequately in the client section ---NR
The type~@RG@ is internal to \hoopl; it is not seen by any client. We use it, not @Graph@, for two reasons:

\item The client is often interested not only in the facts flowing out of the graph (which are returned in the @Fact x f@),

of the graph. A~replacement graph of type @(RG n f e x)@ is decorated with these internal facts.
\item A @Graph@ has deliberately restrictive invariants; for example,

The information in an @RG@ is returned to the client by the normalization function @normalizeBody@, which splits an @RG@ into a @Body@ and its corresponding @FactBase@:

`normalizeBody :: Edges n => RG n f C C
               -> (Body n, FactBase f)

data `RG n f e x where
  `RGNil  :: RG n f a a
  `RGCatO :: RG n f e O -> RG n f O x -> RG n f e x
  `RGCatC :: RG n f e C -> RG n f C x -> RG n f e x
  `RGUnit :: Fact e f -> Block n e x -> RG n f e x

The essential points are that constructor @RGUnit@ is polymorphic in the shape of a block, @RGUnit@ carries a fact as well as a block, and the concatenation constructors record the shapes of the graphs at the point of concatenation. (A~record of the shapes is needed so that when @normalizeBody@ is presented with a block carried by @RGUnit@, it is known whether the block is an entry sequence, an exit sequence, or a basic block.)

Within \hoopl, the @RG@~type is a great convenience. Mutter mutter: it carries facts as well as blocks, and it frees the client's rewrite functions from any obligation to respect the invariants of type @Graph@---I'm not convinced.}

We exploit the type distinctions of nodes, @Block@, @Body@,
and @Graph@ to structure the code into several small pieces, each of which
can be understood independently. Specifically, we define a layered set of
functions, each of which calls the previous one:

arfNode  :: Edges n => ARF n n
arfBlock :: Edges n => ARF (Block n) n
arfBody  :: Edges n
         => FwdPass n f -> Body n -> FactBase f
         -> FuelMonad (RG n f C C, FactBase f)
arfGraph :: Edges n => ARF (Graph n) n

\item
It handles the subtleties of interleaved analysis and rewriting,
and it deals with fuel consumption. It calls @arfGraph@ to analyze
and transform rewritten graphs.
\item
Based on @arfNode@ it is extremely easy to write @arfBlock@, which lifts
analysis and rewriting from nodes to blocks.
%%% \ifpagetuning\penalty -10000 \fi
\item
Using @arfBlock@ we define @arfBody@, which analyzes and rewrites a
@Body@: a~group of closed/closed blocks linked by arbitrary
control flow.
Because a @Body@ is
always closed/closed and does not take shape parameters, function
@arfBody@ is less polymorphic than the others; its type is what
would be obtained by expanding and specializing the definition of
@ARF@ for a @thing@ that is always closed/closed and is equivalent to
a @Body@.
\item
Given these functions, writing the main analyzer is a simple
matter of matching the external API to the internal functions:

`analyzeAndRewriteFwd
   :: forall n f. Edges n
   => FwdPass n f -> Body n -> FactBase f
   -> FuelMonad (Body n, FactBase f)
analyzeAndRewriteFwd pass ^body facts
  = do { (^rg, _) <- arfBody pass body facts
       ; return (normalizeBody rg) }

We begin our explanation with the second task:
writing @arfBlock@, which analyzes and transforms blocks.

`arfBlock :: Edges n => ARF (Block n) n
arfBlock pass (BUnit node) f
  = arfNode pass node f
arfBlock pass (BCat b1 b2) f
  = do { (g1,f1) <- arfBlock pass b1 f
       ; (g2,f2) <- arfBlock pass b2 f1
       ; return (g1 `RGCatO` g2, f2) }

The code is delightfully simple.
The @BUnit@ case is implemented by @arfNode@.
The @BCat@ case is implemented by recursively applying @arfBlock@ to the two
sub-blocks, threading the output fact from the first as the
input to the second.
Each recursive call produces a rewritten graph;
we concatenate them with @RGCatO@.
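The fact-threading pattern that @arfBlock@ uses can be abstracted into a small standalone sketch. The function below is our own illustration, not part of \hoopl's API: it runs a monadic step over a sequence, feeding each output fact to the next step's input, and collects the rewritten pieces.

```haskell
-- Hypothetical abstraction of arfBlock's fact threading: each step
-- consumes an input fact and produces a result plus an output fact,
-- which becomes the next step's input.
threadFacts :: Monad m => (a -> f -> m (g, f)) -> [a] -> f -> m ([g], f)
threadFacts _    []     f = return ([], f)
threadFacts step (x:xs) f = do
  (g, f')   <- step x f          -- analyze/rewrite one element
  (gs, f'') <- threadFacts step xs f'  -- thread the output fact onward
  return (g : gs, f'')
```

In @arfBlock@ the "collect" step is graph concatenation with @RGCatO@ rather than list consing, but the threading discipline is the same.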

Function @arfGraph@ is equally straightforward:

`arfGraph :: Edges n => ARF (Graph n) n
arfGraph _    GNil        f = return (RGNil, f)
arfGraph pass (GUnit blk) f = arfBlock pass blk f
arfGraph pass (GMany NothingO body NothingO) f
  = do { (^body', ^fb) <- arfBody pass body f
       ; return (body', fb) }
arfGraph pass (GMany NothingO body (JustO ^exit)) f
  = do { (body', fb) <- arfBody pass body f
       ; (^exit', ^fx) <- arfBlock pass exit fb
       ; return (body' `RGCatC` exit', fx) }
-- ... two more equations for GMany ...

The pattern is the same as for @arfBlock@: thread
facts through the sequence, and concatenate the results.
Because the constructors of type~@RG@ are more polymorphic than those
of @Graph@, type~@RG@ can represent
graphs more simply than @Graph@; for example, each element of a
@GMany@ becomes a single @RG@ object, and these @RG@ objects are then
concatenated to form a single result of type~@RG@.
%% \ifpagetuning\penalty -10000 \fi

Although interleaving analysis with transformation is tricky, we have
succeeded in isolating the algorithm in just two functions,
@arfNode@ and its backward analog, @`arbNode@:

`arfNode :: Edges n => ARF n n
arfNode ^pass n f
  = do { ^mb_g <- withFuel (fp_rewrite pass n f)
       ; case mb_g of
           Nothing -> return (RGUnit f (BUnit n),
                              fp_transfer pass n f)
           Just (FwdRes ^ag ^rw) ->
             do { g <- graphOfAGraph ag
                ; arfGraph pass' g f } }

The code here is more complicated,
but still admirably brief.
We~begin by extracting the
rewriting function from the @FwdPass@,
and we apply it to the node~@n@
and the incoming fact~@f@.
The resulting @Maybe@ is passed to @withFuel@, which
deals with fuel accounting:

`withFuel :: Maybe a -> FuelMonad (Maybe a)

If its argument is @Nothing@, or if no fuel remains, @withFuel@
returns @Nothing@.
Otherwise, @withFuel@ consumes one unit of fuel and returns its
% defn Fuel
argument (which will be a @Just@). That is all we need say about fuel.
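To make the fuel accounting concrete, here is a self-contained sketch of @withFuel@ over a hand-rolled state monad. The representation of @FuelMonad@ as a state transformer over an @Int@ is our own assumption for illustration; only the behaviour of @withFuel@ is taken from the text.

```haskell
-- Sketch only: FuelMonad modeled as a state monad carrying the fuel supply.
type Fuel = Int
newtype FuelMonad a = FM { runFM :: Fuel -> (a, Fuel) }

instance Functor FuelMonad where
  fmap f (FM g) = FM (\s -> let (a, s') = g s in (f a, s'))
instance Applicative FuelMonad where
  pure a = FM (\s -> (a, s))
  FM f <*> FM g = FM (\s -> let (h, s')  = f s
                                (a, s'') = g s' in (h a, s''))
instance Monad FuelMonad where
  FM g >>= k = FM (\s -> let (a, s') = g s in runFM (k a) s')

getFuel :: FuelMonad Fuel
getFuel = FM (\s -> (s, s))
setFuel :: Fuel -> FuelMonad ()
setFuel s = FM (\_ -> ((), s))

-- No proposed rewrite: pass Nothing through.
-- Proposed rewrite but no fuel: suppress it.
-- Otherwise: spend one unit of fuel and allow the rewrite.
withFuel :: Maybe a -> FuelMonad (Maybe a)
withFuel Nothing  = return Nothing
withFuel (Just a) = do { f <- getFuel
                       ; if f == 0 then return Nothing
                         else setFuel (f - 1) >> return (Just a) }
```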

In the @Nothing@ case, no rewrite takes place---either because the rewrite function
didn't want one or because fuel is exhausted.
We~return a single-node
graph @(RGUnit f (BUnit n))@, decorated with its incoming fact.
We~also apply the transfer function @(fp_transfer pass)@
to the incoming fact to produce the outgoing fact.
(Like @fp_rewrite@, @fp_transfer@ is a record selector of @FwdPass@.)

In the @Just@ case, we receive a replacement
@AGraph@ @ag@ and a new rewrite function~@rw@.
We~convert @ag@ to a @Graph@, using

`graphOfAGraph :: AGraph n e x -> FuelMonad (Graph n e x)

and we analyze the resulting @Graph@ with @arfGraph@.
This analysis uses @pass'@, which contains the original lattice and transfer
function from @pass@, together with the new rewrite function~@rw@.
And that's it! If~the client wanted deep rewriting, it is
implemented by the call to @arfGraph@;
if the client wanted
shallow rewriting, the rewrite function will have returned
@noFwdRw@ as~@rw@, which is implanted in @pass'@.
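The shallow-versus-deep distinction can be illustrated in miniature. The sketch below is ours, not \hoopl's: a rewriter returns a replacement together with the rewriter to apply to that replacement, so "deep" keeps going and "shallow" installs a do-nothing rewriter, just as @noFwdRw@ is installed in @pass'@.

```haskell
-- Hypothetical miniature of shallow vs. deep rewriting over plain values.
-- A Rewrite returns Nothing (no rewrite) or a replacement plus the
-- rewriter to use on the replacement.
newtype Rewrite a = Rewrite { runRewrite :: a -> Maybe (a, Rewrite a) }

noRewrite :: Rewrite a                  -- analog of noFwdRw
noRewrite = Rewrite (const Nothing)

deep, shallow :: (a -> Maybe a) -> Rewrite a
deep f    = Rewrite (\a -> (\a' -> (a', deep f))    <$> f a)  -- keep rewriting
shallow f = Rewrite (\a -> (\a' -> (a', noRewrite)) <$> f a)  -- stop after one

-- Drive a rewriter to completion.
applyAll :: Rewrite a -> a -> a
applyAll r a = case runRewrite r a of
                 Nothing       -> a
                 Just (a', r') -> applyAll r' a'
```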

Lastly, @arfBody@ deals with the fixed-point calculation.
This part of the implementation is the only really tricky part, and it is
cleanly separated from everything else:

arfBody :: Edges n
        => FwdPass n f -> Body n -> FactBase f
        -> FuelMonad (RG n f C C, FactBase f)
`arfBody pass body ^fbase
  = fixpoint (fp_lattice pass) (arfBlock pass) fbase $
    forwardBlockList (factBaseLabels fbase) body

Function @forwardBlockList@ takes a list of possible entry points and a @Body@,
and it
returns a linear list of
blocks, sorted into an order that makes forward dataflow efficient:

`forwardBlockList

For
example, if the @Body@ starts at block~@L2@, and @L2@
branches to~@L1@, but not vice versa, then \hoopl\ will reach a fixed point
more quickly if we process @L2@ before~@L1@.
To~find an efficient order, @forwardBlockList@ uses
the methods of the @Edges@ class---@entryLabel@ and @successors@---to
perform a reverse depth-first traversal of the control-flow graph.
%%
%%The @Edges@ type-class constraint on~@n@ propagates to all the
%%@`arfThing@ functions.
%% paragraph carrying too much freight
%%
The order of the blocks does not affect the fixed point or any other
part of the answer; it affects only the number of iterations needed to
reach the fixed point.
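A standard way to obtain such an order is reverse postorder over the successor relation; the following standalone sketch (our own code and names, not @forwardBlockList@ itself, which works over @Body@ and the @Edges@ class) shows the idea with labels as plain values.

```haskell
import qualified Data.Set as S

-- Hypothetical stand-in for forwardBlockList: reverse postorder of the
-- nodes reachable from the entry points, given a successor function.
-- Each block appears before the blocks it feeds facts into (ignoring
-- back edges), which is what makes forward dataflow converge quickly.
reversePostorder :: Ord l => (l -> [l]) -> [l] -> [l]
reversePostorder succs entries = snd (foldl visit (S.empty, []) entries)
  where
    visit st@(seen, acc) l
      | l `S.member` seen = st
      | otherwise =
          let (seen', acc') = foldl visit (S.insert l seen, acc) (succs l)
          in (seen', l : acc')  -- prepend after children: reverse postorder
```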

How do we know what entry points to pass to @forwardBlockList@?
We treat
any block with an entry in the in-flowing @FactBase@ as an entry point.

The rest of the work is done by @fixpoint@, which is shared by
both forward and backward analyses:

`fixpoint :: forall n f.
             Edges n
          => Bool -- going forward?
          -> DataflowLattice f
          -> (Block n C C -> FactBase f ->
              FuelMonad (RG n f C C, FactBase f))
          -> FactBase f
          -> FuelMonad (RG n f C C, FactBase f)

Except for the mysterious @Bool@ passed as the first argument,
the type signature tells the story.
The third argument is
a function that analyzes and rewrites a single block;
@fixpoint@ applies that function successively to all the blocks.
The @fixpoint@ function maintains a
``Current @FactBase@''
which grows monotonically:
the initial value of the Current @FactBase@ is the fourth argument to
@fixpoint@,
and the Current @FactBase@ is augmented with the new facts that flow
out of each @Block@ as it is analyzed.
The @fixpoint@ function
keeps analyzing blocks until the Current @FactBase@ reaches a fixed
point.
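The essence of this loop fits in a few lines. The sketch below is a deliberately simplified stand-in (our own names; labels as @Int@s, blocks as pure transfer functions, no rewriting and no fuel): sweep over all blocks, join the out-flowing facts into the fact base, and repeat until nothing changes.

```haskell
import qualified Data.Map as M

-- Hypothetical miniature of fixpoint. A "block" is just a function from
-- the current fact base to the facts it sends to its successors.
type Label = Int
type FactBase f = M.Map Label f

fixpointSketch :: Eq f
               => (f -> f -> f)                           -- lattice join
               -> [(Label, FactBase f -> [(Label, f)])]   -- blocks as transfers
               -> FactBase f                              -- initial facts
               -> FactBase f
fixpointSketch join blocks = go
  where
    go fbase
      | fbase' == fbase = fbase          -- current fact base stopped growing
      | otherwise       = go fbase'      -- otherwise sweep again
      where
        fbase' = foldl step fbase blocks
        step fb (_, transfer) =
          foldl (\m (l, f) -> M.insertWith join l f m) fb (transfer fb)
```

Termination relies on the client-guaranteed monotonicity and finite height discussed in the related-work section; the real @fixpoint@ additionally tracks a change flag per label rather than comparing whole fact bases.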

The code for @fixpoint@ is a massive 70 lines long;
for completeness, it appears in the Appendix.
The~code is mostly straightforward, although we try to be a bit clever
about deciding when a new fact means that another iteration over the
blocks will be required.

There is one more subtle point worth mentioning, which we highlight by
considering a forward analysis of this graph, where execution starts at~@L1@:

L1: x:=3; goto L4
L2: x:=4; goto L4
L4: if x>3 goto L2 else goto L5

Block @L2@ is unreachable.
But if we \naively\ process all the blocks (say in
order @L1@, @L4@, @L2@), then we will start with the bottom fact for @L2@, propagate
@x=4@ from~@L2@ to~@L4@, and join it with the @x=3@ flowing out of~@L1@,
yielding @x=@$\top$ at~@L4@.
Given @x=@$\top$, the
conditional in @L4@ cannot be rewritten, and @L2@~seems reachable. We have
lost a good optimization.

Our implementation solves this problem through a clever trick that is
safe only for a forward analysis:
@fixpoint@ analyzes a block only if the block is
reachable from an entry point.
This trick is not safe for a backward analysis, which
is why
@fixpoint@ takes a @Bool@ as its first argument:
it must know whether the analysis goes forward.

Although the trick can be implemented in just a couple of lines of
code, the reasoning behind it is quite subtle---exactly the sort of
thing that should be implemented once in \hoopl, so clients don't have
to worry about it.

While there is a vast body of literature on
dataflow analysis and optimization,
relatively little can be found on
the design of dataflow frameworks themselves.
We therefore focus on the foundations of dataflow analysis
and on the implementations of some comparable dataflow frameworks.

When transfer functions are monotone and lattices are finite in height,
iterative dataflow analysis converges to a fixed point.
If~the lattice's join operation distributes over transfer
functions,
this fixed point is equivalent to a join-over-all-paths solution to
the recursive dataflow equations.
{Kildall uses meets, not joins.
Lattice orientation is conventional, and conventions have changed.
We use Dana Scott's
orientation, in which higher elements carry more information.}
Convergence requires
monotone functions.
Each~client of \hoopl\ must guarantee monotonicity.

\ifcutting
\else
\fi
introduce abstract interpretation as a technique for developing
lattices for program analysis.
An all-paths dataflow problem can be viewed as model checking an
abstract interpretation.
presents many examples of both particular analyses and related
algorithms.

The soundness of interleaving analysis and transformation,
even when not all speculative transformations are performed on later
iterations, was shown by Lerner, Grove, and Chambers.

Most dataflow frameworks support only analysis, not transformation.
The framework computes a fixed point of transfer functions, and it is
up to the client of
the framework to use that fixed point for transformation.
Omitting transformation makes it much easier to build frameworks,
and one can find a spectrum of designs.
We~describe two representative
designs, then move on to the prior frameworks that support interleaved
analysis and transformation.

CIL
provides an analysis-only framework for C~programs.
The framework is limited to one representation of control-flow graphs
and one representation of instructions, both of which are provided by
the framework.
The~API is complicated;
much of the complexity is needed to enable the client to
affect which instructions
the analysis iterates over.

The Soot framework is designed for analysis of Java programs.
While Soot's dataflow library supports only analysis, not
transformation, we found much
to admire in its design.
Soot's library is abstracted over the representation of
the control-flow graph and the representation of instructions.
Soot's interface for defining lattice and analysis functions is
like our own,
although because Soot is implemented in an imperative style,
additional functions are needed to copy lattice elements.
Like CIL, Soot provides only analysis, not transformation.

The Whirlwind compiler contains the dataflow framework implemented
by Lerner, Grove, and Chambers, who were the first to
interleave analysis and transformation.
Their implementation is much like our early efforts:
it is a complicated mix of code that simultaneously manages interleaving,
deep rewriting, and fixed-point computation.
By~separating these tasks,
our implementation simplifies the problem dramatically.
Whirlwind's implementation also suffers from the difficulty of
maintaining pointer invariants in a mutable representation of
control-flow graphs, a problem we have discussed elsewhere.

Because speculative transformation is difficult in an imperative setting,
Whirlwind's implementation is split into two phases.
The first phase runs the interleaved analyses and transformations
to compute the final dataflow facts and a representation of the transformations
that should be applied to the input graph.
The second phase executes the transformations.
In~\hoopl, because control-flow graphs are immutable, speculative transformations
can be applied immediately, and there is no need
for a phase distinction.

%%% % repetitious...
%%%
%%% \ourlib\ also improves upon Whirlwind's dataflow framework by providing
%%% new support for the optimization writer:
%%% \begin{itemize}
%%% \item Using static type guarantees, \hoopl\ rules out a whole
%%% class of possible bugs: transformations that produced malformed
%%% control-flow graphs.
%%% \item Using dynamic testing,
%%% we can isolate the rewrite that transforms a working program
%%% into a faulty program,
%%% using Whalley's \citeyearpar{whalley:isolation} fault-isolation technique.
%%% \end{itemize}

In~earlier work, we described a zipper-based representation of control-flow
graphs, stressing the advantages
of immutability.
The representation described in this paper improves on it in several ways:
\begin{itemize}
\item
We can concatenate nodes, blocks, and graphs in constant time.
%Previously, we had to resort to Hughes's
%\citeyearpar{hughes:lists-representation:article} technique, representing
%a graph as a function.
\item
We can do a backward analysis without having
to ``unzip'' (and allocate a copy of) each block.
\item
Using GADTs, we can represent a flow-graph
node using a single type, instead of the triple of first, middle, and
last types used in our earlier representation.
This change simplifies the interface significantly:
instead of providing three transfer functions and three rewrite
functions per pass---one for
each type of node---a client of \hoopl\ provides only one transfer
function and one rewrite function per pass.
\item
Errors in concatenation are ruled out at
compile time by Haskell's static
type system.
In~earlier implementations, such errors were not detected until
the compiler ran, at which point we tried to compensate
for the errors---but
the compensation code harbored subtle faults,
which we discovered while developing a new back end
for the Glasgow Haskell Compiler.
\end{itemize}

The implementation of \ourlib\ is also much better than
our earlier implementations.
Not only is the code simpler conceptually,
but it is also shorter:
our new implementation is about a third as long
as the previous version, which is part of GHC, version 6.12.

We have spent six years implementing and reimplementing frameworks for
dataflow analysis and transformation.
This formidable design problem taught us
two kinds of lessons:
we learned some very specific lessons about representations and
algorithms for optimizing compilers,
and we were forcibly reminded of some very general, old lessons that are well
known not just to functional programmers, but to programmers
everywhere.

%% \remark{Orphaned: but for transfer functions that
%% approximate weakest preconditions or strongest postconditions,
%% monotonicity falls out naturally.}
%%
%%
%% In conclusion we offer the following lessons from the experience of designing
%% and implementing \ourlib{}.
%% \begin{itemize}
%% \item
%% Although we have not stressed this point, there is a close connection
%% between dataflow analyses and program logic:
%% \begin{itemize}
%% \item
%% A forward dataflow analysis is represented by a predicate transformer
%% that is related to \emph{strongest postconditions}
%% \cite{floyd:meaning}.\footnote
%% {In Floyd's paper the output of the predicate transformer is called
%% the \emph{strongest verifiable consequent}, not the ``strongest
%% postcondition.''}
%% \item
%% A backward dataflow analysis is represented by a predicate transformer
%% that is related to \emph{weakest preconditions} \cite{dijkstra:discipline}.
%% \end{itemize}
%% Logicians write down the predicate transformers for the primitive
%% program fragments, and then use compositional rules to ``lift'' them
%% to a logic for whole programs. In the same way \ourlib{} lets the client
%% write simple predicate transformers,
%% and local rewrites based on those assertions, and ``lifts'' them to entire
%% function bodies with arbitrary control flow.
%% \end{itemize}

Our main goal for \hoopl\ was to combine three good ideas (interleaved
analysis and transformation, optimization fuel, and an applicative
control-flow graph) in a way that could easily be reused by many, many
compiler writers.
Reuse requires abstraction, and as is well known,
designing good abstractions is challenging.
\hoopl's data types and the functions over those types have been
redesigned many times over.

As~we were refining our design, we~found it invaluable to operate in
two modes.
In the first mode, we designed, built, and used a framework as an
integral part of a live compiler.
In the second mode, we designed and built a standalone library, then
redesigned and rebuilt it, sometimes going through several significant
changes in a week.
Operating in the first mode---inside a live compiler---forced us to
make sure that no corners were cut, that we were solving a real
problem, and that we did not inadvertently
cripple some other part of the compiler.
Operating in the second mode---as a standalone library---enabled us to
iterate furiously, trying out many more ideas than would have
been possible in the first mode.
We~have learned that alternating between these two modes leads to a
better design than operating in either mode alone.

We were forcibly reminded of timeless truths:
that interfaces are more important than implementations, and that data
is more important than code.
These truths are reflected in this paper, in which
we devote most of our attention to interfaces and data types.

We were also reminded that Haskell's type system (polymorphism, GADTs,
higher-order functions, type classes, and so on) is a remarkably
effective
language for thinking about data and code---and that
Haskell lacks a language of interfaces (like ML's signatures) that
would make it equally effective for thinking about APIs at a larger scale.
Still, as usual, the types were a remarkable aid to writing the code:
when we finally agreed on the types presented above, the
code almost wrote itself.

Types are widely appreciated at ICFP, but here are three specific
examples of how types helped us:
\begin{itemize}
\item
Reuse is enabled by representation-independence, which in a functional
language is
expressed through parametric polymorphism.
Parameterizing \hoopl's types over the node type
made the code simpler, easier to understand, and easier to maintain.
\ourlib\ must know about nodes, and to embody that knowledge in the
types, we use the @Edges@ type class.
\item
We are proud of using GADTs to
track the open and closed shapes of nodes, blocks, and graphs at
compile time.
Shapes may
seem like a small refinement, but they helped tremendously when
building \hoopl, and we expect them to help clients.
%
% this paper is just not about run-time performance ---NR
%
%%%% Moreover, the implementation is faster than it would otherwise be,
%%%% because, say, a @(Fact O f)@ is known to be just an @f@ rather than
%%%% being a sum type that must be tested (with a statically known outcome!).
%
Assigning shapes
to nodes, blocks, and graphs helped our
thinking and helped to structure the implementation.
\item
Our earlier designs used three distinct node
types: first, middle, and last nodes.
Those designs therefore required
three transfer functions, three rewrite functions, and so~on.
Moving to a single, ``shapely'' node type was a major breakthrough:
not only do we have just one node type, but our client need supply only
one transfer function and one rewrite function.
The type-level function @Fact@ expresses how incoming
and outgoing facts depend on the shape of a node.
\end{itemize}

Dataflow optimization is usually described as a way to improve imperative
programs by mutating control-flow graphs.
Such transformations appear very different from the tree rewriting
that functional languages are so well known for, and that makes
functional languages so attractive for writing other parts of compilers.
But even though dataflow optimization looks very different from
what we are used to,
writing a dataflow optimizer
in a pure functional language was a huge win.
% We could not possibly have conceived \ourlib{} in C++.
In~a pure functional language, not only do we know that
no data structure will be unexpectedly mutated,
but we are forced to be
explicit about every input and output,
and we are encouraged to implement things compositionally.
This kind of thinking has helped us make
significant improvements to the already tricky work of Lerner, Grove,
and Chambers:
among them, per-function control of shallow vs.\ deep rewriting.
We~trust that these improvements are right only because they are
implemented in separate
parts of the code that cannot interact except through
explicit function calls.
%%
%%An ancestor of \ourlib{} is in the Glasgow Haskell Compiler today,
%%in version~6.12.
With this new, improved design in hand, we are now moving back to
live-compiler mode, pushing \hoopl\ into version
6.13 of the Glasgow Haskell Compiler.

\acks

The first and second authors were funded
by a grant from Intel Corporation and
These authors also thank Microsoft Research Ltd, UK, for funding
extended visits to the third author.

\makeatother

\clearpage

\appendix

% omit LabelSet :: *
% omit LabelMap :: *
% omit delFromFactBase :: FactBase f -> [(Label,f)] -> FactBase f
% omit elemFactBase :: Label -> FactBase f -> Bool
% omit elemLabelSet :: Label -> LabelSet -> Bool
% omit emptyLabelSet :: LabelSet
% omit factBaseLabels :: FactBase f -> [Label]
% omit extendFactBase :: FactBase f -> Label -> f -> FactBase f
% omit extendLabelSet :: LabelSet -> Label -> LabelSet
% omit getFuel :: FuelMonad Fuel
% omit setFuel :: Fuel -> FuelMonad ()
% omit lookupFact :: FactBase f -> Label -> Maybe f
% omit factBaseList :: FactBase f -> [(Label, f)]

data `TxFactBase n f
  = `TxFB { `tfb_fbase :: FactBase f
          , `tfb_rg    :: RG n f C C  -- transformed blocks
          , `tfb_cha   :: ChangeFlag
          , `tfb_lbls  :: LabelSet }
  -- Set the tfb_cha flag iff
  --   (a) the fact in tfb_fbase for a block L changes, and
  --   (b) L is in tfb_lbls.
  -- The tfb_lbls are all Labels of the *original*
  -- (not transformed) blocks

`updateFact :: DataflowLattice f -> LabelSet -> (Label, f)
            -> (ChangeFlag, FactBase f)
            -> (ChangeFlag, FactBase f)
updateFact ^lat ^lbls (lbl, ^new_fact) (^cha, fbase)
  | NoChange <- ^cha2       = (cha, fbase)
  | lbl `elemLabelSet` lbls = (SomeChange, new_fbase)
  | otherwise               = (cha, new_fbase)
  where
    (cha2, ^res_fact)
      = case lookupFact fbase lbl of
          Nothing        -> (SomeChange, new_fact)
          Just ^old_fact -> fact_extend lat old_fact new_fact
    ^new_fbase = extendFactBase fbase lbl res_fact
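The heart of @updateFact@ is "join a new fact into the fact base and report whether anything grew." The following standalone sketch (our own simplified names, using @Data.Map@ in place of @FactBase@ and a join that reports growth in place of @fact_extend@) isolates that idea:

```haskell
import qualified Data.Map as M

data ChangeFlag = NoChange | SomeChange deriving (Eq, Show)

-- Join a new fact for key k into the map; report SomeChange only if
-- the stored fact actually grew (or was absent). The join function
-- returns Just the joined fact iff it differs from the old one.
joinInto :: Ord k => (f -> f -> Maybe f)
         -> k -> f -> M.Map k f -> (ChangeFlag, M.Map k f)
joinInto join k new m =
  case M.lookup k m of
    Nothing  -> (SomeChange, M.insert k new m)
    Just old -> case join old new of
                  Nothing    -> (NoChange, m)
                  Just grown -> (SomeChange, M.insert k grown m)
```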

fixpoint :: forall n f. Edges n
         => Bool -- going forward?
         -> DataflowLattice f
         -> (Block n C C -> FactBase f
             -> FuelMonad (RG n f C C, FactBase f))
         -> FactBase f -> [(Label, Block n C C)]
         -> FuelMonad (RG n f C C, FactBase f)
fixpoint ^is_fwd lat ^do_block ^init_fbase ^blocks
  = do { ^fuel <- getFuel
       ; ^tx_fb <- loop fuel init_fbase
       ; return (tfb_rg tx_fb,
                 tfb_fbase tx_fb `delFromFactBase` blocks) }
    -- The outgoing FactBase contains facts only for
    -- Labels *not* in the blocks of the graph
  where
    `tx_blocks :: [(Label, Block n C C)]
               -> TxFactBase n f -> FuelMonad (TxFactBase n f)
    tx_blocks []             tx_fb = return tx_fb
    tx_blocks ((lbl,blk):bs) tx_fb = tx_block lbl blk tx_fb
                                     >>= tx_blocks bs

    `tx_block :: Label -> Block n C C
              -> TxFactBase n f -> FuelMonad (TxFactBase n f)
    tx_block ^lbl ^blk tx_fb@(TxFB { tfb_fbase = fbase
                                   , tfb_lbls  = lbls
                                   , tfb_rg    = ^blks
                                   , tfb_cha   = cha })
      | is_fwd && not (lbl `elemFactBase` fbase)
      = return tx_fb  -- the block is not (yet) reachable; skip it
      | otherwise
      = do { (rg, ^out_facts) <- do_block blk fbase
           ; let (^cha', ^fbase')
                   = foldr (updateFact lat lbls) (cha,fbase)
                           (factBaseList out_facts)
           ; return (TxFB { tfb_lbls  = extendLabelSet lbls lbl
                          , tfb_rg    = rg `RGCatC` blks
                          , tfb_fbase = fbase'
                          , tfb_cha   = cha' }) }

    loop :: Fuel -> FactBase f -> FuelMonad (TxFactBase n f)
    `loop fuel fbase
      = do { let ^init_tx_fb = TxFB { tfb_fbase = fbase
                                    , tfb_cha   = NoChange
                                    , tfb_rg    = RGNil
                                    , tfb_lbls  = emptyLabelSet}
           ; tx_fb <- tx_blocks blocks init_tx_fb
           ; case tfb_cha tx_fb of
               NoChange   -> return tx_fb
               SomeChange -> setFuel fuel >>
                             loop fuel (tfb_fbase tx_fb) }

\par

This appendix lists every nontrivial identifier used in the body of
the paper.
For each identifier, we list the page on which that identifier is
defined or discussed---or when appropriate, the figure (with line
number where possible).
For those few identifiers not defined or discussed in text, we give
the type signature and the page on which the identifier is first
referred to.

Some identifiers used in the text are defined in the Haskell Prelude;
for those readers less familiar with Haskell, these identifiers are

\let\hsprelude\dropit
\noindent
\else
\fi
}
\noindent
\else
\fi
}
\noindent
\else
\texttt{#2} {let}- or $\lambda$-bound on \lineref{#1} of Figure~\ref{#3} on page~\pageref{#3}.\\
\fi
}
\else
\fi
}
\newif\ifundefinedsection\undefinedsectionfalse
\ifundefinedsection
\else
\undefinedsectiontrue
\par
\fi
}
\begingroup
\raggedright
\undefinedsectionfalse
\ifundefinedsection
\else
\undefinedsectiontrue
\par
\fi
}
\let\hspagedef\dropit
\let\omithspagedef\dropit
\let\omithsfigdef\dropit
\let\hsfigdef\dropit
\let\hstabdef\dropit
\let\hspagedefll\dropit
\let\hsfigdefll\dropit
\let\nothspagedef\dropit
\let\nothsfigdef\dropit
\let\nothslinedef\dropit
\endgroup

\iffalse
\fi

THE FUEL PROBLEM:

Here is the problem:

A graph has an entry sequence, a body, and an exit sequence.
Correctly computing facts on and flowing out of the body requires
iteration; computation on the entry and exit sequences does not, since
each is connected to the body by exactly one flow edge.

The problem is to provide the correct fuel supply to the combined
analysis/rewrite (iterator) functions, so that speculative rewriting
is limited by the fuel supply.

I will number iterations from 1 and name the fuel supplies as
follows:

  f_pre     fuel remaining before analysis/rewriting starts
  f_0       fuel remaining after analysis/rewriting of the entry sequence
  f_i, i>0  fuel remaining after iteration i of the body
  f_post    fuel remaining after analysis/rewriting of the exit sequence

The issue here is that only the last iteration of the body 'counts'.
To formalize, I will name fuel consumed:

  C_pre   fuel consumed by speculative rewrites in entry sequence
  C_i     fuel consumed by speculative rewrites in iteration i of body
  C_post  fuel consumed by speculative rewrites in exit sequence

These quantities should be related as follows:

  f_0    = f_pre - C_pre
  f_i    = f_0 - C_i       where i > 0
  f_post = f_n - C_post    where iteration converges after n steps

When the fuel supply is passed explicitly as parameter and result, it
is fairly easy to see how to keep reusing f_0 at every iteration, then
extract f_n for use before the exit sequence. It is not obvious to me
how to do it cleanly using the fuel monad.
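The explicit-threading discipline described above can be sketched directly (our own illustrative names, with fuel threaded as a plain parameter and result rather than hidden in a monad): every body iteration restarts from f_0, and only the converged iteration's consumption counts.

```haskell
type Fuel = Int

-- One pass over the body consumes some fuel and updates the fact.
-- Iterate from f_0 each time; on convergence after n passes, the
-- fuel handed onward is f_n = f_0 - C_n, as in the equations above.
iterateBody :: Eq fact
            => (Fuel -> fact -> (fact, Fuel))  -- one pass over the body
            -> Fuel -> fact -> (fact, Fuel)
iterateBody body f0 fact
  | fact' == fact = (fact', fN)                 -- converged: keep f_n
  | otherwise     = iterateBody body f0 fact'   -- not converged: reuse f_0
  where (fact', fN) = body f0 fact
```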

Norman