\newif\ifhaskellworkshop\haskellworkshoptrue

\iffalse

Payload: alternatives for functions polymorphic in node type

(n C O -> a, n O O -> b, n O C -> c)

vs

(forall e x . n e x -> ShapeTag e x, forall e x . n e x -> result e x)

where

data ShapeTag e x where
  First  :: ShapeTag C O
  Middle :: ShapeTag O O
  Last   :: ShapeTag O C

result C O = a
result O O = b
result O C = c

The function returning ShapeTag can be passed in a type-class
dictionary.
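A standalone sketch of this alternative (illustrative names only, not Hoopl's API; the concrete result types stand in for the a, b, c above):

```haskell
{-# LANGUAGE GADTs, TypeFamilies, EmptyDataDecls #-}

data O  -- open
data C  -- closed

-- A witness for the shape of a node.
data ShapeTag e x where
  First  :: ShapeTag C O
  Middle :: ShapeTag O O
  Last   :: ShapeTag O C

-- The shape-indexed result ('result e x' above); the right-hand
-- sides are placeholders for the a, b, c of the note.
type family Result e x where
  Result C O = String
  Result O O = Int
  Result O C = Bool

-- Matching on the tag refines e and x, so each equation may
-- return a differently typed result.
dispatch :: ShapeTag e x -> Result e x
dispatch First  = "first"
dispatch Middle = 42
dispatch Last   = True
```

The tag plays the role of the type-class dictionary entry: given a witness of the node's shape, a shape-appropriate alternative can be selected.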

Say something about the proliferation of heavyweight type signatures
required for GADT pattern matches.  When the signature is three times
the size of the function, something is wrong...

I'm going to leave this point out, because in all the client code
we've written, we are (a) matching only on a node, and (b) matching on
all cases.  In this scenario the GADT exhaustiveness checker provides
no additional benefits.  Indeed, GADTs can be an irritant to the
client: in many pattern matches, GADTs make it impossible to default
cases.

I was thinking again about unwrapped nodes, cons-lists, snoc-lists,
and tree fringe.  I think there's an analogy with ordinary lists, and
the analogy is 'concatMap' vs 'fold'.  Just as with lists, the common

unwrapping of elements.  We nevertheless prefer 'concatMap' because it
provides better separation of concerns.  But this analogy suggests
several ideas:

  - Can block processing be written in higher-order fashion?

  - Can it be done in both styles (concatMap *and* fold)?

  - Once we have framed the problem in these terms, can we write
    fold-style code that is not too hard to understand and maintain?

  - Finally, is there an analog of stream fusion that would work for
    control-flow graphs, enabling us to write some graph combinators
    that are both perspicuous and efficient?

These are good observations for the paper and for future work.

----------------------------------------------------------------

P.S. The three of us should have a nice little Skype chat about
higher-rank types.  A lot of the questions still at issue boil down to
the following observations:

  - It's really convenient for the *implementation* of Hoopl to put
    forall to the left of an arrow.  Polymorphic functions as
    arguments are powerful and keen, and they make it really easy for
    Hoopl to do its job.

  - It's a bit inconvenient for a *client* of Hoopl to be forced to
    supply functions that are polymorphic in shape.  All constructors
    have to be written out by hand; defaulting is not possible.  (Or
    at least John and I haven't figured out how to do it.)

  - Using higher-rank types in Hoopl's interface makes some very
    desirable things downright impossible:

      - One can't write a generic dominator analysis with type

          dom :: NonLocal n => FwdPass n Dominators

      - One can't write a generic debugging wrapper of type

          debug :: (Show n, Show f)
                => TraceFn -> WhatToTrace -> FwdPass n f -> FwdPass n f

      - One can't write a combinator of type

          product :: FwdPass n f -> FwdPass n f' -> FwdPass n (f, f')

I submit that these things are all very desirable.

I'm therefore strongly in favor of removing the *requirement* that a
FwdPass include transfer and rewrite functions that are polymorphic in
shape.  It will be a bit tedious to return to triples of functions,
but for those lucky clients who *can* write polymorphic functions and
who wish to, we can provide injection functions.

I should stress that I believe the tedium can be contained within
reasonable bounds---for example, the arfXXX functions that are
internal to Hoopl all remain polymorphic in shape (although not
higher-rank any longer).

\fi

%
% TODO
%
%
% AGraph = graph under construction
% Graph  = replacement graph
%
%%% Hoopl assumes that the replacement graph for a node open on exit
%%% doesn't contain any additional exits
%%%
%%% introduce replacement graph in section 3, after graph.
%%% it has arbitrarily complicated internal control flow, but only
%%% one exit (if open on exit)
%%%
%%% a requirement on the client that is not checked statically.

\newif\ifnoauthornotes \noauthornotesfalse

\genkillfalse

\newif\ifnotesinmargin \notesinmarginfalse

% higher-order optimization library
% ('Hoople' was taken -- see hoople.org)
\let\hoopl\ourlib

% l2h substitution ourlib Hoopl
% l2h substitution hoopl Hoopl

\makeatletter

\let\c@table=

}

\else

\fi
\else

\fi
}

}

\noindent

%{\small\textit{\def\rmdefault{cmr}\rmfamily\phantom{00}\phantom{: \,}}}%
\else

\fi

\else

\verbatim
}

\makeatother

\makeatletter

\makeatother

\let\cite\citep

%%
%% 2009/05/10: removed 'float' package because it breaks multiple
%% \caption's per {figure} environment.  ---NR
%%
%% % Put figures in boxes --- WHY??? --NR
%% \usepackage{float}
%% \floatstyle{boxed}
%% \restylefloat{figure}
%% \restylefloat{table}

% ON LINE THREE, set \noauthornotestrue to suppress notes (or not)

%\newcommand{\qed}{QED}
\ifnotesinmargin

\ifvmode

\else

\else
% Simon: please set \notesinmarginfalse on the first line

\fi
\ifnoauthornotes

\fi

\let\remark\norman

% \let \finalremark \remark   % uncomment after submission

\iftimestamp

\fi

%\title{\ourlib: Dataflow Optimization Made Simple}

%\title{Implementing Dataflow Analysis and Optimization by Lifting Node Functions to Basic Blocks and Control-Flow Graphs}
%\subtitle{Programming pearl}

Under consideration for publication in the ACM Haskell Symposium.
Comments
are welcome; please identify this version by the words

}}

\ifnoauthornotes
\makeatletter
\let\HyPsd@Warning=
\@gobble
\makeatother
\fi

% João

\maketitle

We present \ourlib, a Haskell library that makes it easy for a
compiler writer
to implement program transformations based on dataflow analyses.
A~client of \ourlib\ defines a representation of
logical assertions,
a transfer function that computes outgoing assertions from incoming
assertions,
and a rewrite function that improves code when improvements are
justified by the assertions.
\ourlib\ does the actual analysis and transformation.

\ourlib\ implements state-of-the-art algorithms:
Lerner, Grove, and Chambers's

composition of simple analyses and transformations, which achieves
the same precision as complex, handwritten
``super-analyses;''

isolating bugs in a client's code.
\ourlib's implementation is unique in that, unlike previous
implementations,
it carefully separates the tricky
elements of each of these algorithms, so that they can be examined and
understood independently.

\fi
Dataflow analysis and transformation of control-flow graphs is
pervasive in optimizing compilers, but it is typically tightly

unusually easy to define new analyses and

\ourlib's interface is modular and polymorphic,
and it offers unusually strong static guarantees.
The implementation encapsulates
state-of-the-art algorithms (interleaved analysis and rewriting,
dynamic error isolation), and it cleanly separates their tricky elements
so that they can be understood independently.
%
%\ourlib\ will be the workhorse of a new
%back end for the Glasgow Haskell Compiler (version~6.14, forthcoming).

%% \emph{Reviewers:} code examples are indexed at {\small\url{http://bit.ly/jkr3K}}
%%% Source: http://www.cs.tufts.edu/~nr/drop/popl-index.pdf

A mature optimizing compiler for an imperative language includes many
analyses, the results of which justify the optimizer's
code-improving transformations.
Many of the most important analyses and transformations---constant
propagation, live-variable analysis, inlining, sinking of loads,
and so on---should be regarded as particular cases of

%% \remark{I do not feel compelled to cite Muchnick (or anyone else) here}
Dataflow analysis is over thirty years old,

describing a powerful but subtle way to

piggybacks on the other.

Because optimizations based on dataflow analysis
share a common intellectual framework, and because that framework is
subtle, it is tempting to
try to build a single reusable library that embodies the
subtle ideas, while
making it easy for clients to instantiate the library for different
situations.
% Tempting, but difficult.
Although such libraries exist, as we discuss

and none implements the Lerner/Grove/Chambers technique.

optimization library''), a new Haskell library for dataflow analysis and
optimization.  It has the following distinctive characteristics:

\item
\ourlib\ is purely functional.
Although pure functional languages are not obviously suited to writing
standard algorithms that
manipulate control-flow graphs,
the pure functional code is actually easier to write, and far easier
to write correctly, than code that is mostly functional but uses a mutable
representation of graphs

When analysis and rewriting

without knowing whether
the result of the rewrite will be retained or discarded,
the benefit of a purely functional style is intensified.

\item
\ourlib\ is polymorphic.  Just as a list library is

the nodes that inhabit graphs and in the dataflow facts that

\item
The paper by Lerner, Grove, and Chambers is inspiring but abstract.
We articulate their ideas in a concrete, simple API,
which hides
a subtle implementation

You provide a representation for assertions,
a transfer function that transforms assertions across a node,
and a rewrite function that uses an assertion to
justify rewriting a node.
\ourlib\ ``lifts'' these node-level functions to work over
control-flow graphs, solves recursion equations,
and interleaves rewriting with analysis.
Designing good APIs is surprisingly
hard; we have been through over a dozen significantly different
iterations, and we offer our API as a contribution.

\item
Because the client
can perform very local reasoning (``@y@ is live before
@x:=y+2@''),

analyses and transformations built on \ourlib\
are small, simple, and easy to get right.
Moreover, \ourlib\ helps you write correct optimizations:
it~statically rules out transformations that violate invariants

and dynamically it can help find the first transformation that introduces a fault

but I like this sentence with its claim to both static and dynamic assistance,
and maybe the open/closed story is hard to understand here.}

% \item \ourlib{} makes use of GADTs and type functions to offer unusually
%   strong static guarantees.  In particular, nodes, basic blocks, and
%   graphs are all statically typed by their open or closedness on entry, and
%   their open or closedness on exit (\secref{graph-rep}).  For example, an add instruction is
%   open on entry and exit, while a branch is open on entry and closed on exit.
%   Using these types we can statically guarantee that, say, an add instruction
%   is rewritten to a graph that is also open at both entry and exit; and
%   that the user cannot construct a block where an add instruction follows an
%   unconditional branch.  We know of no other system that offers
%   static guarantees this strong.

(a)~interleaved analysis and rewriting, (b)~speculative rewriting,
(c)~computing fixed points, and (d)~dynamic fault isolation.
Previous implementations of these algorithms---including three of our
own---are complicated and hard to understand, because the tricky pieces
are implemented all together, inseparably.
In~this paper, each tricky piece is handled in just

We~emphasize this implementation as an object of interest in
its own right.

Our work bridges the gap between abstract, theoretical presentations
and actual compilers.

One of \hoopl's clients
is the Glasgow Haskell Compiler, which uses \hoopl\ to optimize
imperative code in GHC's back end.

relatively sophisticated aspects of Haskell's type system, such
as higher-rank polymorphism, GADTs, and type functions.

of these features.

%% We begin by setting the scene, introducing some vocabulary, and
%% showing a small motivating example.
A control-flow graph, perhaps representing the body of a procedure,

Each block is a sequence of instructions,
beginning with a label and ending with a
control-transfer instruction that branches to other blocks.
% Each block has a label at the beginning,
% a sequence of
% -- each of which has a label at the
%    beginning.  Each block may branch to other blocks with arbitrarily
%    complex control flow.
The goal of dataflow optimization is to compute valid

then use those assertions to justify code-improving

As~a concrete example, we consider constant propagation with constant
folding.
On the left we have a basic block; in the middle we have

in the block; and at
the right we have the result of transforming the block
based on the assertions:

Before                 Facts                 After
-----------------------{}------------------------
x := 3+4                                     x := 7
---------------------{x=7}-----------------------
z := x>5                                     z := True
-----------------{x=7, z=True}-------------------
if z                                         goto L1
  then goto L1
  else goto L2

Constant propagation works
from top to bottom.  We start with the empty fact.

transformation?
Yes: we can replace the node with @x:=7@.
Now, given this transformed node,
and the original fact, what fact flows out of the bottom of
the transformed node?

Now, can we make another transformation?  Yes: constant folding can
replace the node with @z:=True@.
The~process continues to the end of the block, where we
can replace the conditional branch with an unconditional one, @goto L1@.
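The interleaved process just described can be modeled in miniature. The sketch below is not Hoopl's API; the expression, node, and fact types are invented for illustration. A fact maps variables to known constants, the rewrite folds constants using the incoming fact, and the transfer runs on the rewritten node:

```haskell
import qualified Data.Map as M

-- Illustrative types only; Hoopl itself is polymorphic in the
-- client's node and fact types.
data Expr = Lit Int | Var String | Add Expr Expr deriving (Eq, Show)
data Node = Assign String Expr deriving (Eq, Show)

type Fact = M.Map String Int   -- variable |-> known constant value

-- Constant folding: simplify an expression under the incoming fact.
simplify :: Fact -> Expr -> Expr
simplify f (Var v)   = maybe (Var v) Lit (M.lookup v f)
simplify _ (Lit n)   = Lit n
simplify f (Add a b) = case (simplify f a, simplify f b) of
                         (Lit x, Lit y) -> Lit (x + y)
                         (a', b')       -> Add a' b'

-- Rewrite: replace the right-hand side with its folded form.
rewrite :: Fact -> Node -> Node
rewrite f (Assign v e) = Assign v (simplify f e)

-- Transfer: compute the outgoing fact from the *rewritten* node,
-- which is what keeps the transfer function simple.
transfer :: Fact -> Node -> Fact
transfer f (Assign v e) = case simplify f e of
                            Lit n -> M.insert v n f
                            _     -> M.delete v f

-- Process a straight-line block top to bottom from an initial fact.
block :: Fact -> [Node] -> (Fact, [Node])
block f []     = (f, [])
block f (n:ns) = let n'          = rewrite f n
                     f'          = transfer f n'
                     (fEnd, ns') = block f' ns
                 in (fEnd, n' : ns')
```

For instance, starting from the empty fact, @x := 3+4@ is rewritten to @x := 7@, the fact x=7 flows onward, and a later use of @x@ folds accordingly.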

The example above is simple because the program has only straight-line code;
when programs have loops, dataflow analysis gets more complicated.
For example,
consider the following graph,
where we assume @L1@ is the entry point:

L2: x=7; goto L3
L3: ...

Because control flows to @L3@ from two places (@L1@~and~@L2@),

All paths to @L3@ produce the fact @y=@$4$,
so we can conclude that this fact holds at~@L3@.
But depending on the path to~@L3@, @x@~may have different
values, so we conclude ``@x=@$\top$'',
meaning that there is no single value held by~@x@ at~@L3@.
%\footnote{
%In this example @x@ really does vary at @L3@, but in general
%the analysis might be conservative.}
The final result of joining the dataflow facts that flow to~@L3@

assertion about the program state (such as ``variable~@x@ holds value~@7@'').

A~prime example is live-variable analysis, where a fact takes the~form
``variable @x@ is live'' and is an assertion about the

live'' at a program point P is an assertion that @x@ is used on some program

The accompanying transformation is called dead-code elimination;
if @x@~is not live, this transformation
replaces the node @x:=e@ with a no-op.
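A miniature model of this pair, again with invented types rather than Hoopl's API, shows the backward direction: liveness facts flow from the exit of a straight-line block toward its entry, and a dead assignment is replaced before its transfer is ever consulted:

```haskell
import qualified Data.Set as S

-- Illustrative node type: x := e, where the list names the
-- variables that e uses. Not Hoopl's representation.
data Node = Assign String [String] deriving (Eq, Show)

type Live = S.Set String   -- the set of live variables

-- Backward transfer: live-before = uses(e) `union` (live-after \ {x}).
transfer :: Node -> Live -> Live
transfer (Assign x uses) liveAfter =
  S.fromList uses `S.union` S.delete x liveAfter

-- Rewrite: Nothing means "no rewrite applies"; Just [] replaces a
-- dead assignment with the empty sequence of nodes.
rewrite :: Live -> Node -> Maybe [Node]
rewrite liveAfter (Assign x _)
  | x `S.member` liveAfter = Nothing
  | otherwise              = Just []

-- Walk a straight-line block backward from the live-out set.
blockBwd :: Live -> [Node] -> (Live, [Node])
blockBwd liveOut []     = (liveOut, [])
blockBwd liveOut (n:ns) =
  let (liveMid, ns') = blockBwd liveOut ns
  in case rewrite liveMid n of
       Just repl -> (liveMid, repl ++ ns')       -- node eliminated
       Nothing   -> (transfer n liveMid, n:ns')  -- node kept
```

Because the dead assignment is eliminated before its transfer runs, the variables it uses never become live, which is exactly why the simple rule "@y@ is live before @x:=y+2@" suffices under interleaving.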

Interleaving makes it far easier to write effective analyses.

the transformations.

and the instruction @z:=x>5@,
a pure analysis could produce the outgoing fact

But the subsequent transformation must perform

the transfer function becomes laughably simple: it merely has to see if the

Another example is the interleaving of liveness analysis and dead-code elimination.

it is sufficient for the analysis to~say ``@y@ is live before @x:=y+2@''.
It is not necessary to have the more complex rule
``if @x@~is live after @x:=y+2@ then @y@ is live before~it,''

the assignment @x:=y+2@ will be eliminated.
If~there are a number of interacting
analyses and/or transformations,
the benefit of interleaving them is even more compelling; for more

But the compelling benefits come at a cost:
in~programs that have loops, it is impossible to analyze every block
after all its predecessors, and it is possible that the fact
justifying a program transformation in a block~@L@ may be invalidated
by later analysis of one of @L@'s predecessors.
For~this reason, for safety's sake, transformations must be

}

and transformations driven by these analyses, on control-flow graphs.
Graphs are composed from smaller units, which we discuss from the

Nodes, blocks, and graphs share important properties.

control-transfer instruction to a named label.
For example,

\item A shift-left instruction is open on entry (because control can fall into it
from the preceding instruction), and open on exit (because control falls through
to the next instruction).
\item An unconditional branch is open on entry, but closed on exit (because
control cannot fall through to the next instruction).

control to fall through into a branch target), but open on exit.
\item
A~function call should be closed on exit,
because control can flow from a call site to multiple points:
for example, a~return continuation or an exception handler.
(And~after optimization, distinct call sites may share a return
continuation.)

% This taxonomy enables \ourlib{} to enforce invariants:
% only nodes closed on entry can be the targets of branches, and only nodes closed
% on exits can transfer control (see also \secref{nonlocal-class}).
% As~a consequence, all control transfers originate at control-transfer
% instructions and terminate at labels; this invariant dramatically
% simplifies analysis and transformation.
These examples concern nodes, but the same classification applies
to blocks and graphs.  For example, the block

is open on entry and closed on exit.

``open/closed;''
we may refer to an ``open/closed block.''

The shape of a thing determines that thing's control-flow properties.
In particular, whenever E is a node, block, or graph,
% : \simon{Removed the claim about a unique entry point.}

\item
If E is open on entry, it has a unique predecessor;
if it is closed, it may have arbitrarily many predecessors---or none.
\item
If E is open on exit, it has a unique successor;
if it is closed, it may have arbitrarily many successors---or none.

%%%%
%%%%
%%%% % \item Regardless of whether E is open or closed,
%%%% %   it has a unique entry point where execution begins.
%%%% \item If E is closed on exit, control leaves \emph{only}
%%%%   by explicit branches from closed-on-exit nodes.
%%%% \item If E is open on exit, control \emph{may} leave E
%%%%   by ``falling through'' from a distinguished exit point.
%%%% \remark{If E is a node or block, control \emph{only} leaves E by
%%%% falling through, but this isn't so for a graph.  Example: a body of a
%%%% loop contains a \texttt{break} statement} \simon{I don't understand.
%%%% A break statement would have to be translated as a branch, no?
%%%% Can you give an example?  I claim that control only leaves an
%%%% open graph by falling through.}
%%%% \end{itemize}

Typically, a node might represent a machine instruction, such as an
assignment, a call, or a conditional branch.

so each client can define nodes as it likes.
Because they contain nodes defined by the client,
graphs can include arbitrary client-specified data, including
(say) method calls, C~statements, stack maps, or
whatever.

data `Node e x where
  Label :: Label -> Node C O
  `Assign :: Var -> Expr -> Node O O
  `Store :: Expr -> Expr -> Node O O
  `Branch :: Label -> Node O C
  `CondBranch :: Expr -> Label -> Label -> Node O C
  ... more constructors ...

closed on entry and exit:
the type of a node has kind @*->*->*@, where the two type parameters
are type-level flags, one for entry and one for exit.
Such a type parameter may be instantiated only with type @O@~(for
open) or type~@C@ (for closed).

As an example,

The type parameters are written @e@ and @x@, for
entry and exit respectively.
The type is a generalized algebraic data type;
the syntax gives the type of each constructor.
%%% \cite{peyton-jones:unification-based-gadts}.
For example, constructor @Label@
takes a @Label@ and returns a node of type @Node C O@, where
the~``@C@'' says ``closed on entry'' and the~``@O@'' says ``open on exit''.
The types @Label@, @O@, and~@C@ are

As another example, constructor @Assign@ takes a variable and an expression, and it
returns a @Node@ open on both entry and exit; constructor @Store@ is
similar.
Types @`Var@ and @`Expr@ are private to the client, and

Finally, control-transfer nodes @Branch@ and @CondBranch@ are open on entry
and closed on exit.

Nodes closed on entry are the only targets of control transfers;
nodes open on entry and exit never perform control transfers;

To obey these invariants,
a node for
a conditional-branch instruction, which typically either transfers control

conditional branch, with the fall-through path in a separate block.

and it costs nothing in practice:
such code is easily sequentialized without superfluous branches.
%a late-stage code-layout pass can readily reconstruct efficient code.
}.
Because of the position each type of node occupies in a
basic block,

respectively.

\ourlib\ combines the client's nodes into

A~@Block@ is parameterized over the node type~@n@
as well as over the same flag types that make it open or closed at
entry and exit.

The @BFirst@, @BMiddle@, and @BLast@ constructors create one-node
blocks.

Why not use a single constructor

shape?
Because by making the shape known statically, we simplify the
implementation of analysis and transformation in

can see the representation of blocks?
Answer: At present the representation is partially exposed, and we
have no principled answer to the question. ---NR
OK; but what do we tell our readers?
}

The @BCat@ constructor concatenates blocks in sequence.
It~makes sense to concatenate blocks only when control can fall
through from the first to the second; therefore,

the point of concatenation.
This restriction is enforced by the type of @BCat@, whose first
argument must be open on exit, and whose second argument must be open on entry.
It~is statically impossible, for example, to concatenate a @Branch@
immediately before an
@Assign@.
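This static discipline can be demonstrated with a stripped-down, standalone model of shaped nodes and blocks (illustrative only; it mirrors the shapes of \ourlib's types but none of its other machinery):

```haskell
{-# LANGUAGE GADTs, EmptyDataDecls #-}

data O  -- open
data C  -- closed

-- A miniature node type in the style of the paper's Node.
data Node e x where
  Label  :: String -> Node C O
  Assign :: String -> Int -> Node O O
  Branch :: String -> Node O C

-- A miniature block type: BCat demands open-on-exit meets
-- open-on-entry, exactly as described in the text.
data Block e x where
  BUnit :: Node e x -> Block e x
  BCat  :: Block e O -> Block O x -> Block e x

-- Count the nodes in a block, regardless of its shape.
size :: Block e x -> Int
size (BUnit _)   = 1
size (BCat b b') = size b + size b'

-- Well typed: a closed/closed "basic block".
bb :: Block C C
bb = BUnit (Label "L1")
     `BCat` BUnit (Assign "x" 7)
     `BCat` BUnit (Branch "L2")

-- Rejected at compile time: a Branch cannot precede an Assign.
-- bad = BUnit (Branch "L2") `BCat` BUnit (Assign "x" 7)
```

Uncommenting the last definition makes GHC report that @O@ does not match @C@ at the concatenation point, which is the invariant in action.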

Indeed, the @Block@ type statically guarantees that any
closed/closed @Block@---which compiler writers normally
call a ``basic block''---consists of exactly one first node

followed by zero or more middle nodes (@Assign@ or @Store@),
and terminated with exactly one
last node (@Branch@ or @CondBranch@).
Using GADTs to enforce these invariants is one of

In a block, a control-flow edge is implicit in every application of
the @BCat@ constructor.
An~implicit edge originates in a first node or a middle node and flows
to a middle node or a last node.

an~explicit edge originates in a last node and flows to a (labelled)
first node.
\hoopl\ discovers explicit edges by using the @successors@ and
@entryLabel@ functions of the @NonLocal@ class.
Any~edge, whether implicit or explicit, is considered a program point,
and an analysis written using \hoopl\ computes a dataflow fact at each
such point.

data `O -- Open
data `C -- Closed

data `Block n e x where
  `BFirst  :: n C O -> Block n C O
  `BMiddle :: n O O -> Block n O O
  `BLast   :: n O C -> Block n O C
  `BCat    :: Block n e O -> Block n O x -> Block n e x

data `Graph n e x where
  `GNil  :: Graph n O O
  `GUnit :: Block n O O -> Graph n O O
  `GMany :: MaybeO e (Block n O C)
         -> LabelMap (Block n C C)
         -> MaybeO x (Block n C O)
         -> Graph n e x

data `MaybeO ^ex t where
  `JustO    :: t -> MaybeO O t
  `NothingO ::      MaybeO C t

newtype `Label -- abstract type

class `NonLocal n where
  `entryLabel :: n C x -> Label

% omit MaybeC :: * -> * -> *

Like @Block@, the data type @Graph@ is parameterized over
both nodes @n@ and its open/closed shape (@e@ and @x@).
It has three constructors.  The first two
deal with the base cases of open/open graphs:
an empty graph is represented by @GNil@ while a single-block graph
is represented by @GUnit@.

More general graphs are represented by @GMany@, which has three
fields: an optional entry sequence, a body, and an optional exit
sequence.

\item

@Block n O C@.
We could represent this sequence as a value of type
@Maybe (Block n O C)@, but we can do better:

if the graph is open on entry.
We~express our compile-time knowledge by using the type
@MaybeO e (Block n O C)@, a type-indexed version of @Maybe@

the type @MaybeO O a@ is isomorphic to~@a@, while
the type @MaybeO C a@ is isomorphic to~@()@.
\item

To~facilitate traversal of the graph, we represent a body as a finite
map from label to block.
\item

sequence, its presence or absence is deducible from the static type of the graph.

Graphs can be spliced together nicely; the cost is logarithmic in the
number of closed/closed blocks.
Unlike blocks, two graphs may be spliced together
not only when they are both open
at the splice point but also
when they are both closed---and not in the other two cases:
\begingroup

`gSplice :: Graph n e a -> Graph n a x -> Graph n e x
gSplice GNil g2 = g2
gSplice g1 GNil = g1\cnl
gSplice (GUnit ^b1) (GUnit ^b2) = GUnit (b1 `BCat` b2)\cnl
gSplice (GUnit b) (GMany (JustO e) bs x)
  = GMany (JustO (b `BCat` e)) bs x\cnl
gSplice (GMany e ^bs (JustO x)) (GUnit b2)
  = GMany e bs (JustO (x `BCat` b2))\cnl
gSplice (GMany e1 ^bs1 (JustO x1)) (GMany (JustO e2) ^bs2 x2)
  = GMany e1 (bs1 `mapUnion` (b `addBlock` bs2)) x2
  where b = x1 `BCat` e2\cnl
gSplice (GMany e1 bs1 NothingO) (GMany NothingO bs2 x2)
  = GMany e1 (bs1 `mapUnion` bs2) x2

\endgroup
% omit mapUnion :: LabelMap a -> LabelMap a -> LabelMap a
% omit addBlock :: NonLocal n => Block n C C -> LabelMap (Block n C C) -> LabelMap (Block n C C)

This definition illustrates the power of GADTs: the
pattern matching is exhaustive, and all the open/closed invariants are
statically checked.  For example, consider the second-last equation for @gSplice@.
Since the exit sequence of the first argument is @JustO x1@,
we know that type parameter~@a@ is~@O@, and hence the entry sequence of the second
argument must be @JustO e2@.
Moreover, block~@x1@ must be
closed/open, and block~@e2@ must be open/closed.
We can therefore concatenate them
with @BCat@ to produce a closed/closed block, which is
added to the body of the result.

The representation of @Graph@s is exposed to \hoopl's clients.

We~have carefully crafted the types so that if @BCat@
is considered as an associative operator,

Need some words about how this is part of making the API simple for
the client---I've added something to the end of the paragraph, but I'm
not particularly thrilled.}
%% \simon{Well, you were the one who was so keen on a unique representation!
%% And since we have one, I think it is useful to say so.  Lastly, the representation of
%% singleton blocks is not entirely obvious.}

%%%%
%%%% An empty open/open graph is represented
%%%% by @GNil@, while a closed/closed one is @gNilCC@:
%%%% \par {\small
%%%% \begin{code}
%%%% gNilCC :: Graph C C
%%%% gNilCC = GMany NothingO BodyEmpty NothingO
%%%% \end{code}}
%%%% The representation of a @Graph@ consisting of a single block~@b@
%%%% depends on the shape of~@b@:\remark{Does anyone care?}
%%%% \par{\small
%%%% \begin{code}
%%%% gUnitOO :: Block O O -> Graph O O
%%%% gUnitOC :: Block O C -> Graph O C
%%%% gUnitCO :: Block C O -> Graph C O
%%%% gUnitCC :: Block C C -> Graph C C
%%%% gUnitOO b = GUnit b
%%%% gUnitOC b = GMany (JustO b) BodyEmpty NothingO
%%%% gUnitCO b = GMany NothingO BodyEmpty (JustO b)
%%%% gUnitCC b = GMany NothingO (BodyUnit b) NothingO
%%%% \end{code}}
%%%% Multi-block graphs are similar.
%%%% From these definitions

To guarantee uniqueness, @GUnit@ is restricted to open/open
blocks.
If~@GUnit@ were more polymorphic, there would be
more than one way to represent some graphs, and it wouldn't be obvious
to a client which representation to choose---or if the choice made a difference.

it~still needs to know how a node may transfer control from one
block to another.
\hoopl\ also
needs to know what @Label@ is on the first node in a block.
If \hoopl\ is polymorphic in the node type,
how can it{} know these things?
\hoopl\ requires the client to make the node type an instance of

and returns its @Label@;
the @successors@ method takes a last node (closed on exit) and returns
the @Label@s to
which it can transfer control.
A~middle node, which is open on both entry and exit, transfers control
only locally, to its successor within a basic block,
so no corresponding interrogation function is needed.

%% A node type defined by a client must be an instance of @NonLocal@.

the client's instance declaration for @Node@ would be

instance NonLocal Node where
  entryLabel (Label l) = l

Again, the pattern matching for both functions is exhaustive, and
the compiler statically checks this fact.
Here, @entryLabel@
cannot be applied to an @Assign@ or @Branch@ node,
and any attempt to define a case for @Assign@ or @Branch@ would result
in a type error.

While the client provides this information about
nodes, it is convenient for \ourlib\ to get the same information
about blocks.
Internally,

instance NonLocal n => NonLocal (Block n) where
  entryLabel (BFirst n) = entryLabel n
  entryLabel (BCat b _) = entryLabel b
  successors (BLast n)  = successors n
  successors (BCat _ b) = successors b

Because the functions @entryLabel@ and @successors@ are used to track control

entry label or successors of a @Graph@ itself.

if a @Graph@ is closed on entry, it need not have a unique entry label.
%
% A slight infelicity is that we cannot make @Graph@ an instance of @NonLocal@,
% because a graph closed on entry has no unique label.  Fortunately, we never
% need @Graph@s to be in @NonLocal@.
%
% ``Never apologize.  Never confess to infelicity.'' ---SLPJ

}

Control-flow graphs         & Us  & Us  & One \\

Dataflow fact~$F$           & You & You & One type per logic \\

Transfer functions          & Us  & You & One per analysis \\

Solve-and-rewrite functions & Us  & Us  & Two (forward, backward) \\

}

1188 Now that we have graphs, how do we optimize them?

1190 A~client must supply the following pieces:

1194 that let the client build control-flow graphs out of nodes.

1197 Each analysis uses facts that are specific to that particular analysis,

1199 the fact type.

1204 and which returns either @Nothing@

1205 or @Just g@,

1206 where @g@~is a graph that should

1207 replace the node

1210 % that may include internal control flow

1211 is crucial for many code-improving transformations.

1212 %We discuss the rewrite function

1213 %in Sections \ref{sec:rewrites} and \ref{sec:shallow-vs-deep}.

1216 Because facts, transfer functions, and rewrite functions work closely together,

1217 we represent their combination

1222 Given a node type~@n@ and a @FwdPass@,

1224 graph.

1225 \hoopl\ provides a fully polymorphic interface, but for purposes of

1226 exposition, we present a function that is specialized to a

1227 closed/closed graph:

1229 `analyzeAndRewriteFwdBody

1230 :: ( CkpointMonad m -- Roll back speculative actions

1231 , NonLocal n ) -- Extract non-local flow edges

1232 => FwdPass m n f -- Lattice, transfer, rewrite

1234 -> Graph n C C -- Input graph

1235 -> FactBase f -- Input fact(s)

1236 -> m ( Graph n C C -- Result graph

1237 , FactBase f ) -- ... and its facts

1239 Given a @FwdPass@ and a list of entry points, the

1240 analyze-and-rewrite function transforms a graph into

1241 an optimized graph.

1242 As its type shows, this function

1243 is polymorphic in the types of nodes~@n@ and facts~@f@;

1244 these types are determined by the client.

1245 The type of the monad~@m@ is also determined by the client, but it must

1246 meet the constraints implied by @FuelMonad@, as described in

1248 %%

1249 %% offer at least the operations of the @FuelMonad@ class (\figref{api-types}),

1250 %% namely: the ability to allocate fresh labels (\secref{rewrites}) and to

1251 %% manage ``optimisation fuel'' (\secref{fuel-monad}).

1253 As well as taking and returning a graph, the

1254 function also takes input facts (the @FactBase@) and produces output facts.

1255 A~@FactBase@ is simply a finite mapping from @Label@ to facts;

1256 if~a @Label@ is not in the domain of the @FactBase@, its fact is the

1257 bottom element of the lattice.
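This convention amounts to a total lookup with a default. A minimal sketch, assuming the bottom element is supplied explicitly and using @Data.Map@ for the finite mapping:
\begin{code}
import qualified Data.Map as Map

type Label      = Int
type FactBase f = Map.Map Label f

-- A label absent from the FactBase implicitly maps to bottom.
lookupFact :: f -> FactBase f -> Label -> f
lookupFact bot fb l = Map.findWithDefault bot l fb
\end{code}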

1259 if the graph

1260 represents the body of a procedure

1261 with parameters $x,y,z$, we would map the entry @Label@ to a fact

1263 parameters are not known to be constants.

1264 %%

1265 %% The

1266 %% output @FactBase@ maps each @Label@ in the body to its fact

1268 The client's model of how @analyzeAndRewriteFwdBody@ works is as follows:

1269 \ourlib\ walks forward over each block in the graph.

1270 At each node, \ourlib\ applies the

1271 rewrite function to the node and the incoming fact. If the rewrite

1272 function returns @Nothing@, the node is retained as part of the output

1273 graph, the transfer function is used to compute the downstream fact,

1274 and \ourlib\ moves on to the next node.

1275 But if the rewrite function returns @Just (FwdRew g rw)@,

1276 indicating that it wants to rewrite the node to the replacement graph~@g@, then

1277 \ourlib\ recursively analyzes and rewrites~@g@, using the new rewrite

1278 function~@rw@, before moving on to the next node.

1279 A~node following a rewritten node sees

1281 analyzing the replacement graph.

1284 the incoming fact.

1285 If~further analysis turns out to invalidate the fact, \hoopl\ rolls

1286 back the action.

1287 \hoopl\ can roll back the graph easily enough because the graph is a

1288 purely functional data structure.

1289 But in addition to changing the graph,

1290 a~rewrite function may take action in an arbitrary monad defined by

1292 One reasonable example might be that the client wishes to log how

1293 often a transformation is applied, or in which functions, or,

1294 in a combined dataflow pass, which transformations precede which others.

1295 Another example is that we don't want to mandate how fresh names are

1296 constructed, and we don't want to limit clients to a single supply of

1297 fresh names.

1299 no~matter how well designed the interface,

1300 if it does not accommodate design choices already embodied in existing

1301 compilers, it won't be used.

1302 }

1303 To~enable \hoopl\ to roll back any actions in this monad, the monad

1304 must satisfy the @CkpointMonad@ class and its associated algebraic law, as

1306 }

1308 Below we flesh out the

1309 interface to @analyzeAndRewriteFwdBody@, leaving the implementation for

1311 %{\hfuzz=7.2pt\par}

1316 data `FwdPass m n f

1317 = FwdPass { `fp_lattice :: DataflowLattice f

1318 , `fp_transfer :: FwdTransfer n f

1319 , `fp_rewrite :: FwdRewrite m n f }

1321 ------- Lattice ----------

1322 data `DataflowLattice a = DataflowLattice

1323 { `fact_bot :: a

1324 , `fact_join :: OldFact a -> NewFact a

1325 -> (ChangeFlag, a) }

1327 data `ChangeFlag = `NoChange | `SomeChange

1328 newtype `OldFact a = OldFact a

1329 newtype `NewFact a = NewFact a

1331 ------- Transfers ----------

1332 newtype `FwdTransfer n f -- abstract type

1333 `mkFTransfer

1334 :: (forall e x . n e x -> f -> Fact x f)

1335 -> FwdTransfer n f

1337 ------- Rewrites ----------

1338 newtype `FwdRewrite m n f -- abstract type

1339 `mkFRewrite :: FuelMonad m

1340 => (forall e x . n e x -> f -> m (Maybe (Graph n e x)))

1341 -> FwdRewrite m n f

1342 thenFwdRw :: FwdRewrite m n f -> FwdRewrite m n f

1343 -> FwdRewrite m n f

1344 iterFwdRw :: FwdRewrite m n f -> FwdRewrite m n f

1345 noFwdRw :: Monad m => FwdRewrite m n f

1347 ------- Fact-like things, aka "fact(s)" -----

1348 type family `Fact x f :: *

1349 type instance Fact O f = f

1350 type instance Fact C f = FactBase f

1352 ------- FactBase -------

1353 type `FactBase f = LabelMap f

1354 -- A finite mapping from Labels to facts f

1356 ------- Monadic rewrites, speculatively ----

1357 class Monad m => `CkpointMonad m where

1358 type `Checkpoint m

1359 `checkpoint :: m (Checkpoint m)

1360 `restart :: Checkpoint m -> m ()

1362 ------- Optimization fuel ----

1363 type `Fuel = Int

1364 class Monad m => `FuelMonad m where

1365 `getFuel :: m Fuel

1366 `setFuel :: Fuel -> m ()

1373 % omit mkFactBase :: [(Label, f)] -> FactBase f

1374 %%%% \simon{We previously renamed @fact\_join@ to @fact\_extend@ because it really

1375 %%%% is not a symmetrical join; we're extending an old fact with a new one.

1376 %%%% NR: Yes, but the asymmetry is now explicit in the \emph{type}, so it

1377 %%%% needn't also be reflected in the \emph{name}.}

1381 For each analysis or transformation, the client must define a type

1382 of dataflow facts.

1383 A~dataflow fact often represents an assertion

1384 about a program point,

1389 claims that variable @x@ holds value @3@ at P, regardless of the

1390 path by which P is reached.

1393 assertion ``@x@ is dead'' at point P claims that no path from P uses

1394 variable @x@.

1398 (a)~the bottom element of the lattice and (b)~how to take

1399 the least upper bound (join) of two elements.

1400 To ensure that analysis

1401 terminates, it is enough if every fact has a finite number of

1402 distinct facts above it, so that repeated joins

1403 eventually reach a fixed point.

1404 %% \simon{There is no mention here of the @OldFact@ and @NewFact@ types.

1405 %% Shall we nuke them for the purposes of the paper?

1406 %% NR: They should definitely not be nuked; they're needed if the reader

1407 %% wants to understand the asymmetrical behavior of @fact\_join@.

1408 %% I'm~also mindful that this paper will be the primary explanation of

1409 %% \hoopl\ to its users, and I don't want to hide this part of the

1410 %% interface.

1411 %% As for mentions in the text,

1412 %% if you look carefully, you'll

1413 %% see that I've changed the subscripts in the text to ``old'' and

1414 %% ``new''.

1415 %% Together with the code, I believe that's enough to get the point across.

1416 %% SLPJ -- OK. I don't agree that it's enough, but we have more important fish to fry.

1417 %% }

1419 In practice, joins are computed at labels.

1421 label~$L$,

1423 into label~$L$,

1426 Furthermore, the dataflow engine wants to~know

1429 because if not, the analysis has not reached a fixed point.

1431 The bottom element and join operation of a lattice of facts of

1432 type~@f@ are stored in a value of type @DataflowLattice f@

1434 %%%% \simon{Can we shorten ``@DataflowLattice@'' to just

1435 %%%% ``@Lattice@''?} % bigger fish ---NR

1436 %%Such a value is part of the @FwdPass n f@ that is passed to

1437 %%@analyzeAndRewriteFwd@ above.

1438 As noted in the previous paragraph,

1440 to the old fact.

1441 Because this information can often be computed cheaply together

1442 with the join, \ourlib\ does not

1443 require a separate equality test on facts (which might be expensive).

1444 Instead, \ourlib\ requires that @fact_join@ return a @ChangeFlag@ as

1445 well as the least upper bound.

1446 The @ChangeFlag@ should be @NoChange@ if

1447 the result is the same as the old fact, and

1448 @SomeChange@ if the result differs.

1452 To help clients create lattices and join functions,

1453 \hoopl\ includes functions and constructors that can extend a type~@a@

1454 with top and bottom elements.

1455 In this paper, we use only type @`WithTop@, which comes with value

1456 constructors that have these types:

1458 `PElem :: a -> WithTop a

1459 `Top :: WithTop a

1461 \hoopl\ provides combinators which make it easier to create join

1462 functions that use @Top@.

1463 The most useful is @extendJoinDomain@:

1465 `extendJoinDomain

1466 :: (OldFact a -> NewFact a -> (ChangeFlag, WithTop a))

1467 -> (OldFact (WithTop a) -> NewFact (WithTop a)

1468 -> (ChangeFlag, WithTop a))

1471 type~@a@, but may produce @Top@ (as well as a fact of type~@a@)---as

1473 Calling @extendJoinDomain@ extends the client's function to a proper

1474 join function on the type @WithTop a@,

1475 and @extendJoinDomain@ makes sure that joins

1476 involving @Top@ obey the appropriate algebraic laws.
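The behavior of @extendJoinDomain@ can be modeled in a few lines. The following sketch reimplements it for illustration only, so it may differ in detail from \hoopl's actual code; @litJoin@ stands in for a client function in the style of @constFactAdd@:
\begin{code}
data WithTop a    = PElem a | Top        deriving (Eq, Show)
data ChangeFlag   = NoChange | SomeChange deriving (Eq, Show)
newtype OldFact a = OldFact a
newtype NewFact a = NewFact a

extendJoinDomain
  :: (OldFact a -> NewFact a -> (ChangeFlag, WithTop a))
  -> (OldFact (WithTop a) -> NewFact (WithTop a)
      -> (ChangeFlag, WithTop a))
extendJoinDomain _ (OldFact Top) _             = (NoChange,   Top)
extendJoinDomain _ _             (NewFact Top) = (SomeChange, Top)
extendJoinDomain f (OldFact (PElem o)) (NewFact (PElem n)) =
  f (OldFact o) (NewFact n)

-- a client join: equal facts stay put; unequal facts go to Top
litJoin :: Eq a => OldFact a -> NewFact a -> (ChangeFlag, WithTop a)
litJoin (OldFact o) (NewFact n)
  | o == n    = (NoChange,  PElem o)
  | otherwise = (SomeChange, Top)
\end{code}
The first two clauses are the ``appropriate algebraic laws'': joining anything into @Top@ changes nothing, and joining @Top@ into anything yields @Top@.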

1478 \hoopl\ also provides a value constructor @`Bot@ and type constructors

1479 @`WithBot@ and @`WithTopAndBot@, along with similar functions.

1480 Constructors @Top@ and @Bot@ are polymorphic, so for example,

1481 @Top@~also has type @WithTopAndBot a@.

1487 A~forward transfer function is presented with the dataflow fact

1488 coming

1489 into a node, and it computes dataflow fact(s) on the node's outgoing edge(s).

1490 In a forward analysis, the dataflow engine starts with the fact at the

1491 beginning of a block and applies the transfer function to successive

1492 nodes in that block until eventually the transfer function for the last node

1493 computes the facts that are propagated to the block's successors.

1495 the following graph, with entry at @L1@:

1497 L1: x=3; goto L2

1499 if x>0 then goto L2 else return

1502 Analyzing @L1@ propagates this fact forward, by applying the transfer

1503 function successively to the nodes

1505 This new fact is joined with the existing (bottom) fact for @L2@.

1506 Now the analysis propagates @L2@'s fact forward, again using the transfer

1508 Again, the new fact is joined with the existing fact for @L2@, and the process

1509 is iterated until the facts for each label reach a fixed point.

1511 %%% But wait! % Too breathless ---NR

1512 A~transfer function has an unusual sort of type:

1513 not quite a dependent type, but not a bog-standard polymorphic type either.

1515 the type) of the node argument:

1516 If the node is open on exit, the transfer function produces a single fact.

1518 the transfer function

1519 produces a collection of (@Label@,fact) pairs, one for each outgoing edge.

1520 %

1521 The indexing is expressed by Haskell's (recently added)

1524 A~forward transfer function supplied by a client,

1525 which would be passed to @mkFTransfer@,

1526 is typically a function polymorphic in @e@ and @x@.

1527 It~takes a

1530 @f -> Fact x f@.

1531 Type constructor @Fact@

1532 should be thought of as a type-level function: its signature is given in the

1533 @type family@ declaration, while its definition is given by two @type instance@

1534 declarations. The first declaration says that a @Fact O f@, which

1535 comes out of a node

1537 The second declaration says that a @Fact C f@, which comes out of a

1540 %%

1541 %% \begin{code}

1542 %% `transfer_fn :: forall e x . n e x -> f -> Fact x f

1543 %% `node :: n e x

1544 %% \end{code}

1545 %% then @(transfer_fn node)@ is a fact transformer:

1546 %% \begin{code}

1547 %% transfer_fn node :: f -> Fact x f

1548 %% \end{code}

1549 %%
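The shape-indexed result type can be demonstrated in miniature. The node type and the (trivial) forward analysis below are invented purely for illustration:
\begin{code}
{-# LANGUAGE GADTs, TypeFamilies, EmptyDataDecls #-}
data O
data C
type Label      = Int
type FactBase f = [(Label, f)]

type family   Fact x f :: *
type instance Fact O f = f           -- open on exit: one fact flows out
type instance Fact C f = FactBase f  -- closed on exit: a fact per successor

data Node e x where
  Entry  :: Label -> Node C O
  Assign :: String -> Int -> Node O O
  Goto   :: Label -> Node O C

-- forward transfer: accumulate the variables known to be assigned
transfer :: Node e x -> [String] -> Fact x [String]
transfer (Entry _)    f = f
transfer (Assign v _) f = v : f
transfer (Goto l)     f = [(l, f)]   -- result type is FactBase, not f
\end{code}
Pattern matching on the constructor refines the shape index~@x@, which is what lets each equation return a differently typed result.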

1557 We compute dataflow facts in order to enable code-improving

1558 transformations.

1559 In our constant-propagation example,

1560 the dataflow facts may enable us

1561 to simplify an expression by performing constant folding, or to

1562 turn a conditional branch into an unconditional one.

1563 Similarly, a liveness analysis may allow us to

1564 replace a dead assignment with a no-op.

1568 A~programmer creating a rewrite function chooses the type of a node~@n@ and

1569 a dataflow fact~@f@.

1570 A~rewrite function might also want access to fresh

1571 names (e.g., to label new blocks) or to other state (e.g., a~mapping

1572 indicating which loops a block is a part of).

1573 So~that a rewrite function may have access to such state, \hoopl\

1574 requires that a programmer creating a rewrite function also choose a

1577 must satisfy the constraint @CkpointMonad m@.}

1578 The programmer may write code that works with any such monad,

1579 may create a monad just for the client,

1580 or may use a monad supplied by \hoopl.

1582 When these choices are made, the easy way to create a rewrite

1584 The client supplies a function that is specialized to a particular

1585 node, fact, and (possibly) monad, but is polymorphic in the

1587 The function, which we will call~$r$, takes a node and a fact and returns a monadic

1588 computation, but what is the result of that computation?

1589 One might

1590 expect that the result should be a new node, but that is not enough:

1591 in~general, it must be possible for rewriting to result in a graph.

1592 For example,

1593 we might want to remove a node by returning the empty graph,

1594 or more ambitiously, we might want to replace a high-level operation

1595 with a tree of conditional branches or a loop, which would entail

1596 returning a graph containing new blocks with internal control flow.

1599 It~must also be possible for a rewrite function to decide to do nothing.

1600 The result of the monadic computation returned by~$r$ may therefore be

1601 @Nothing@, indicating that the node should

1602 not be rewritten,

1603 or $@Just@\;\ag$, indicating that the node should

1604 be replaced with~\ag: the replacement graph.
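The @Nothing@/@Just@ contract can be illustrated with a monadic constant-folding rewrite over a toy expression type. The types here are invented for the sketch; \hoopl's real rewrite functions return replacement \emph{graphs}, not expressions:
\begin{code}
import qualified Data.Map as Map

data Expr = Lit Int | Var String | Add Expr Expr
  deriving (Eq, Show)
type ConstFact = Map.Map String Int

-- Nothing: decline to rewrite; Just e: replace with e.
-- Any monad would do; IO is used so a client could, e.g., log rewrites.
rewrite :: ConstFact -> Expr -> IO (Maybe Expr)
rewrite f (Var x)               = return (Lit <$> Map.lookup x f)
rewrite _ (Add (Lit a) (Lit b)) = return (Just (Lit (a + b)))
rewrite _ _                     = return Nothing
\end{code}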

1605 %%The additional value $\rw$ tells \hoopl\ whether

1606 %%and how the replacement graph~$\ag$ should be analyzed and rewritten

1607 %%further;

1608 %%we explain~$\rw$ in \secref{shallow-vs-deep}.

1612 that

1613 the replacement graph $\ag$ has

1615 For example, a branch instruction can be replaced only by a graph

1616 closed on exit.

1617 %% Moreover, because only an open/open graph can be

1618 %% empty---look at the type of @GNil@ in \figref{graph}---the type

1619 %% of @FwdRewrite@

1620 %% guarantees, at compile time, that no head of a block (closed/open)

1621 %% or tail of a block (open/closed) can ever be deleted by being

1622 %% rewritten to an empty graph.

1628 When a node is rewritten, the replacement graph~$g$

1629 must itself be analyzed, and its nodes may be further rewritten.

1630 \hoopl\ can make a recursive call to

1631 @analyzeAndRewriteFwdBody@---but what @FwdPass@ should it use?

1632 There are two common situations:

1634 \item Sometimes we want to analyze and transform the replacement graph

1635 with an unmodified @FwdPass@, thereby further rewriting the replacement graph.

1636 This procedure is

1638 When deep rewriting is used, the client's rewrite function must

1639 ensure that the graphs it produces are not rewritten indefinitely

1641 \item

1642 A client may want to analyze the

1646 Deep rewriting is essential to achieve the full benefits of

1647 interleaved analysis and transformation

1649 But shallow rewriting can be vital as well;

1650 for example, a backward dataflow pass that inserts

1651 a spill before a call must not rewrite the call again, lest it attempt

1652 to insert infinitely many spills.

1656 An~innovation of \hoopl\ is to build the choice of shallow or deep

1657 rewriting into each rewrite function, through the use of combinators.

1658 A~function returned by @mkFRewrite@ is shallow by default, but it can

1659 be made deep by

1661 `iterFwdRw :: FwdRewrite m n f -> FwdRewrite m n f

1663 Another useful combinator enables \hoopl\ to try any number of functions to

1664 rewrite a node, provided only that they all use the same fact:

1666 % defn thenFwdRw

1667 Rewrite function @r1 `thenFwdRw` r2@ first does the rewrites of~@r1@,

1668 then the rewrites of~@r2@.

1669 %%

1670 %%

1671 %% by using

1672 %% Rewrite functions are potentially more plentiful than transfer

1673 %% functions, because

1674 %% a~single dataflow fact might justify more than one kind of rewrite.

1675 %% \hoopl\ makes it easy for a client to combine multiple rewrite

1676 %% functions that use the same fact:

1678 These combinators satisfy the algebraic law

1680 iterFwdRw r = r `thenFwdRw` iterFwdRw r

1683 @iterFwdRw@:

1684 if~we used this equation to define @iterFwdRw@, then when @r@ declined

1685 to rewrite, @iterFwdRw r@ would diverge.
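The flavor of these combinators, and the divergence hazard, can be modeled on plain functions of type @a -> Maybe a@, a deliberate simplification of @FwdRewrite@:
\begin{code}
import Data.Maybe (fromMaybe)

type Rw a = a -> Maybe a

-- do r1's rewrite, then give r2 a chance at the result
thenRw :: Rw a -> Rw a -> Rw a
thenRw r1 r2 x = case r1 x of
  Nothing -> r2 x
  Just y  -> Just (fromMaybe y (r2 y))

-- iterate until the rewriter declines
iterRw :: Rw a -> Rw a
iterRw r x = case r x of
  Nothing -> Nothing                       -- decline immediately: no loop
  Just y  -> Just (fromMaybe y (iterRw r y))
\end{code}
When @r@ rewrites, @iterRw r@ behaves like @r `thenRw` iterRw r@; but when @r@ declines, @iterRw r@ declines at once, whereas unfolding the law verbatim would recurse forever.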

1689 %%%% \hoopl\ provides

1690 %%%% a function that makes a shallow rewrite deep:\finalremark{algebraic

1691 %%%% law wanted!}

1692 %%%% \remark{Do we really care about @iterFwdRw@ or can we just drop it?}

1693 %%%% \verbatiminput{iterf}

1694 %%%% % defn iterFwdRw

1701 the type of the transfer function's result

1702 depends on the argument's shape on exit.

1703 It~is easy for a client to write a type-indexed transfer function,

1704 because the client defines the constructor and shape for each node.

1705 The client's transfer functions discriminate on the constructor

1706 and so can return a result that is indexed by each node's shape.

1708 What if you want to write a transfer function that

1710 %

1711 For example, a dominator analysis need not scrutinize nodes;

1712 it needs to know only about

1713 labels and edges in the graph.

1714 Ideally, a dominator analysis would work

1716 provided only that @n@~is an

1717 instance of the @NonLocal@ type class.

1718 But if we don't know

1719 the type of~@n@,

1720 we can't write a function of type

1721 @n e x -> f -> Fact x f@,

1722 because the only way to get the result type right is to

1723 scrutinize the constructors of~@n@.

1727 There is another way;

1728 instead of requiring a single function that is polymorphic in shape,

1729 \hoopl\ will also accept a triple of functions, each of which is

1730 polymorphic in the node's type but monomorphic in its shape:

1732 `mkFTransfer3 :: (n C O -> f -> Fact O f)

1733 -> (n O O -> f -> Fact O f)

1734 -> (n O C -> f -> Fact C f)

1735 -> FwdTransfer n f

1737 We have used this interface to write a number of functions that are

1738 polymorphic in the node type~@n@:

1740 \item

1741 A function that takes a @FwdTransfer@ and wraps it in logging code, so

1742 an analysis can be debugged by watching the facts flow through the

1743 nodes

1744 \item

1745 A pairing function that runs two passes interleaved, not sequentially,

1746 potentially producing better results than any sequence:

1748 % defn pairFwd

1749 \item

1750 An efficient dominator analysis

1751 in the style of

1753 whose transfer function is implemented

1754 using only the

1759 %% Or, you might want to write a combinator that

1760 %% computes the pairwise composition of any two analyses

1761 %% defined on any node type.

1762 %%

1763 %% You cannot write either of these examples

1764 %% using the polymorphic function described in \secref{transfers}

1765 %% because these examples require the ability

1766 %% to inspect the shape of the node.

1767 %%

1768 %%

1769 %% \subsection{Stuff cut from 4.2 that should disappear when the previous section is finished}

1770 %%

1771 %% So much for the interface.

1772 %% What about the implementation?

1773 %% The way GADTs work is that the compiler uses the value constructor for

1774 %% type \mbox{@n@ @e@ @x@} to determine whether @e@~and~@x@ are open or

1775 %% closed.

1776 %% But we want to write functions that are \emph{polymorphic} in the node

1777 %% type~@n@!

1778 %% Such functions include

1779 %% \begin{itemize}

1780 %% \item

1781 %% A function that takes a pair of @FwdTransfer@s for facts @f@~and~@f'@,

1782 %% and returns a single @FwdTransfer@ for the fact @(f, f')@

1783 %% \end{itemize}

1784 %% \simon{These bullets are utterly opaque. They belong in 4.5. The rest of this section

1785 %% is also very hard to understand without more explanation. See email. }

1786 %% Such functions may also be useful in \hoopl's \emph{clients}:

1787 %% % may need

1788 %% %functions that are polymorphic in the node type~@n@:

1789 %% \begin{itemize}

1790 %% \item

1791 %% A dominator analysis in the style of

1792 %% \citet{cooper-harvey-kennedy:simple-dominance} requires only the

1793 %% functions in the \texttt{NonLocal} type class;

1794 %% we have written such an analysis using transfer functions that are

1795 %% polymorphic in~@n@.

1796 %% \end{itemize}

1797 %% Because the mapping from

1798 %% value constructors to shape is different for each node type~@n@, transfer

1799 %% functions cannot be polymorphic in both

1800 %% the representation and the shape of nodes.

1801 %% Our implementation therefore sacrifices polymorphism in shape:

1802 %% internally, \hoopl\ represents

1803 %% a~@FwdTransfer n f@ as a \emph{triple} of functions,

1804 %% each polymorphic in~@n@ but monomorphic in shape:

1805 %% \begin{code}

1806 %% newtype FwdTransfer n f

1807 %% = FwdTransfers ( n C O -> f -> f

1808 %% , n O O -> f -> f

1809 %% , n O C -> f -> FactBase f

1810 %% )

1811 %% \end{code}

1812 %% \simon{I argue strongly that the implementation should be

1813 %% polymorphic too, using a shape classifier where necessary.

1814 %% Regardless of how this debate pans out, though, I think it's

1815 %% a bad idea to introduce triples here. They are simply confusing.}

1816 %% Such triples can easily be composed and wrapped without requiring a

1817 %% pattern match on the value constructor of an unknown node

1818 %% type.\remark{Need to refer to this junk in the conclusion}

1819 %% Working with triples is tedious, but only \hoopl\ itself is forced to

1820 %% work with triples; each client, which knows the node type, may supply

1821 %% a triple,

1822 %% but it typically supplies a single function polymorphic in the shape

1823 %% of the (known) node.

1824 %%

1825 %%

1827 %% \subsection{Composing rewrite functions and dataflow passes} \seclabel{combinators}

1828 %%

1829 %% Requiring each rewrite to return a new rewrite function has another

1830 %% advantage, beyond controlling the shallow-vs-deep choice: it

1831 %% enables a variety of combinators over rewrite functions.

1832 %% \remark{This whole subsection needs to be redone in light of the new

1833 %% (triple-based) representation. It's not pretty any more.}

1834 %% For example, here is a function

1835 %% that combines two rewrite functions in sequence:

1836 %% \remark{This code must be improved}

1837 %% What a beautiful type @thenFwdRw@ has!

1838 %% \remark{And what an ugly implementation! Implementations must go.}

1839 %% It tries @rw1@, and if @rw1@

1840 %% declines to rewrite, it behaves like @rw2@. But if

1841 %% @rw1@ rewrites, returning a new rewriter @rw1a@, then the overall call also

1842 %% succeeds, returning a new rewrite function obtained by combining @rw1a@

1843 %% with @rw2@. (We cannot apply @rw1a@ or @rw2@

1844 %% directly to the replacement graph~@g@,

1845 %% because @r1@~returns a graph and @rw2@~expects a node.)

1846 %% The rewriter @noFwdRw@ is the identity of @thenFwdRw@.

1847 %% Finally, @thenFwdRw@ can

1848 %% combine a deep-rewriting function and a shallow-rewriting function,

1849 %% to produce a rewrite function that is a combination of deep and shallow.

1850 %% %%This is something that the Lerner/Grove/Chambers framework could not do,

1851 %% %%because there was a global shallow/deep flag.

1852 %% %% Our shortsightedness; Lerner/Grove/Chambers is deep only ---NR

1853 %%

1854 %%

1855 %% A shallow rewrite function can be made deep by iterating

1856 %% it:\remark{Algebraic law wanted!}

1857 %% If we have shallow rewrites $A$~and~$B$ then we can build $AB$,

1858 %% $A^*B$, $(AB)^*$,

1859 %% and so on: sequential composition is @thenFwdRw@ and the Kleene star

1860 %% is @iterFwdRw@.\remark{Do we still believe this claim?}

1861 %% \remark{We can't define @iterFwdRew@ in terms of @thenFwdRew@ because

1862 %% the second argument to @thenFwdRew@ would have to be the constant

1863 %% nothing function when applied but would have to be the original triple

1864 %% when passed to @thenFwdRew@ as the second argument (in the recursive

1865 %% call).}

1866 %%

1867 %%

1868 %% The combinators above operate on rewrite functions that share a common

1869 %% fact type and transfer function.

1870 %% It~can also be useful to combine entire dataflow passes that use

1871 %% different facts.

1872 %% We~invite you to write one such combinator, with type

1873 %% \begin{code}

1874 %% `pairFwd :: Monad m

1875 %% => FwdPass m n f1

1876 %% -> FwdPass m n f2

1877 %% -> FwdPass m n (f1,f2)

1878 %% \end{code}

1879 %% The two passes run interleaved, not sequentially, and each may help

1880 %% the other,

1881 %% yielding better results than running $A$~and then~$B$ or $B$~and then~$A$

1882 %% \citep{lerner-grove-chambers:2002}.

1883 %% %% call these passes. ``super-analyses;''

1884 %% %% in \hoopl, construction of super-analyses is

1885 %% %% particularly concrete.

1890 % omit Binop :: Operator -> Expr -> Expr -> Expr

1891 % omit Add :: Operator

1895 % local node

1896 % defn ConstFact

1897 % defn constLattice

1898 % defn constFactAdd

1899 % defn varHasLit

1900 % local ft

1901 % defn constProp

1902 % local cp

1903 % local lookup

1904 % defn simplify

1905 % local simp

1906 % local s_node

1907 % local s_exp

1908 % defn constPropPass

1909 \caption{The client for constant propagation and constant folding\break (extracted automatically from code distributed with Hoopl)}

1916 constant propagation and constant folding.

1917 For each variable, at each program point, the analysis concludes one

1918 of three facts:

1919 the variable holds a constant value of type~@Lit@,

1920 the variable might hold a non-constant value,

1921 or what the variable holds is unknown.

1922 We~represent these facts using a finite map from a variable to a

1924 % Any one procedure has only

1925 % finitely many variables; only finitely many facts are computed at any

1926 % program point; and in this lattice any one fact can increase at most

1927 % twice. These properties ensure that the dataflow engine will reach a

1928 % fixed point.

1929 A~variable with a constant value maps to @Just (PElem k)@, where

1930 @k@~is the constant value;

1931 a variable with a non-constant value maps to @Just Top@;

1932 and a variable with an unknown value maps to @Nothing@ (it is not

1933 in the domain of the finite map).
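Concretely, the fact representation looks like this; for the sketch, @Lit@ is simplified to @Int@, although the real type also covers booleans:
\begin{code}
import qualified Data.Map as Map

data WithTop a = PElem a | Top deriving (Eq, Show)
type Var       = String
type Lit       = Int   -- simplification for this sketch
type ConstFact = Map.Map Var (WithTop Lit)

-- x holds 3; y is known non-constant; z (absent) is unknown
example :: ConstFact
example = Map.fromList [("x", PElem 3), ("y", Top)]
\end{code}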

1935 % \afterpage{\clearpage}

1937 The definition of the lattice (@constLattice@) is straightforward.

1938 The bottom element is an empty map (nothing is known about what

1939 any variable holds).

1940 %

1941 The join function is implemented with the help of combinators provided

1942 by \hoopl.

1943 The client writes a simple function, @constFactAdd@, which

1944 compares two values of type @Lit@ and returns a result of type

1945 @WithTop Lit@.

1946 The client uses @extendJoinDomain@ to lift @constFactAdd@ into a join

1947 function on @WithTop Lit@, then uses

1949 containing facts for all variables.

1952 % omit stdMapJoin :: Ord k => JoinFun v -> JoinFun (Map.Map k v)

1954 The forward transfer function @varHasLit@ is defined using the

1955 shape-polymorphic auxiliary function~@ft@.

1956 For most nodes~@n@, @ft n@ simply propagates the input fact forward.

1957 But for an assignment node, if a variable~@x@ gets a constant value~@k@,

1958 @ft@ extends the input fact by mapping @x@ to~@PElem k@.

1959 And if a variable~@x@ is assigned a non-constant value,

1960 @ft@ extends the input fact by mapping @x@ to~@Top@.

1961 There is one other interesting case:

1962 a conditional branch where the condition

1963 is a variable.

1964 If the conditional branch flows to the true successor,

1965 the variable holds @True@, and similarly for the false successor,

1967 Function @ft@ updates the fact flowing to each successor accordingly.

1969 default the uninteresting cases.}

1971 The transfer function need not consider complicated cases such as

1972 an assignment @x:=y@ where @y@ holds a constant value~@k@.

1973 Instead, we rely on the interleaving of transformation

1974 and analysis to first transform the assignment to @x:=k@,

1975 which is exactly what our simple transfer function expects.

1977 interleaving makes it possible to write

1978 very simple transfer functions, without missing

1979 opportunities to improve the code.

1982 rewrites each use of a variable to its constant value.

1983 The client has defined auxiliary functions that may change expressions

1984 or nodes:

1986 type `MaybeChange a = a -> Maybe a

1987 `mapVE :: (Var -> Maybe Expr) -> MaybeChange Expr

1988 `mapEE :: MaybeChange Expr -> MaybeChange Expr

1989 `mapEN :: MaybeChange Expr -> MaybeChange (Node e x)

1990 `mapVN :: (Var -> Maybe Expr) -> MaybeChange (Node e x)

1991 `nodeToG :: Node e x -> Graph Node e x

1993 The client composes @mapXX@ functions

1994 to apply @lookup@ to each use of a variable in each kind of node;

1995 @lookup@ substitutes for each variable that has a constant value.

1996 Applying @liftM nodeToG@ lifts the final node, if present, into a~@Graph@.

2000 for constant

2001 folding: @simplify@.

2002 This function

2003 rewrites constant expressions to their values,

2004 and it rewrites a conditional branch on a

2005 boolean constant to an unconditional branch.

2006 To~rewrite constant expressions,

2007 it runs @s_exp@ on every subexpression.

2008 Function @simplify@ does not check whether a variable holds a

2009 constant value; it relies on @constProp@ to have replaced the

2010 variable by the constant.

2011 Indeed, @simplify@ does not consult the

2012 incoming fact, so it is polymorphic in~@f@.

2015 The @FwdRewrite@ functions @constProp@ and @simplify@

2016 are useful independently.

2017 In this case, however, we

2019 so we compose them with @thenFwdRw@.

2020 The composition, along with the lattice and the

2021 transfer function,

2023 Given @constPropPass@, we can

2024 improve a graph~@g@ by passing @constPropPass@ and~@g@

2025 to

2026 @analyzeAndRewriteFwdBody@.

2030 %%%% \subsection{Fixed points and speculative rewrites} \seclabel{fixpoints}

2031 %%%%

2032 %%%% Are rewrites sound, especially when there are loops?

2033 %%%% Many analyses compute a fixed point starting from unsound

2034 %%%% ``facts''; for example, a live-variable analysis starts from the

2035 %%%% assumption that all variables are dead. This means \emph{rewrites

2036 %%%% performed before a fixed point is reached may be unsound, and their results

2037 %%%% must be discarded}. Each iteration of the fixed-point computation must

2038 %%%% start afresh with the original graph.

2039 %%%%

2040 %%%%

2041 %%%% Although the rewrites may be unsound, \emph{they must be performed}

2042 %%%% (speculatively, and possibly recursively),

2043 %%%% so that the facts downstream of the replacement graphs are as accurate

2044 %%%% as possible.

2045 %%%% For~example, consider this graph, with entry at @L1@:

2046 %%%% \par{\small

2047 %%%% \begin{code}

2048 %%%% L1: x=0; goto L2

2049 %%%% L2: x=x+1; if x==10 then goto L3 else goto L2

2050 %%%% \end{code}}

2051 %%%% The first traversal of block @L2@ starts with the unsound ``fact'' \{x=0\};

2052 %%%% but analysis of the block propagates the new fact \{x=1\} to @L2@, which joins the

2053 %%%% existing fact to get \{x=$\top$\}.

2054 %%%% What if the predicate in the conditional branch were @x<10@ instead

2055 %%%% of @x==10@?

2056 %%%% Again the first iteration would begin with the tentative fact \{x=0\}.

2057 %%%% Using that fact, we would rewrite the conditional branch to an unconditional

2058 %%%% branch @goto L3@. No new fact would propagate to @L2@, and we would

2059 %%%% have successfully (and soundly) eliminated the loop.

2060 %%%% This example is contrived, but it illustrates that

2061 %%%% for best results we should

2062 %%%% \begin{itemize}

2063 %%%% \item Perform the rewrites on every iteration.

2064 %%%% \item Begin each new iteration with the original, virgin graph.

2065 %%%% \end{itemize}

2066 %%%% This sort of algorithm is hard to implement in an imperative setting, where rewrites

2067 %%%% mutate a graph in place.

2068 %%%% But with an immutable graph, implementing the algorithm

2069 %%%% is trivially easy: we simply revert to the original graph at the start

2070 %%%% of each fixed-point iteration.

In the presence of loops, a rewrite function could make a change
that later has to be rolled back.
For example, consider constant propagation in this loop, which
computes factorial:
L1: if (i >= n) goto L3 else goto L2;
L2: i = i + 1; prod = prod * i;
    goto L1;
L3: ...
Function @analyzeAndRewriteFwdBody@ iterates through this graph until
the dataflow facts stop changing.
On~the first iteration, the assignment @i = i + 1@ will
be analyzed with an incoming fact @i=0@, and the assignment will be
rewritten to the graph @i = 1@.
But~on a later iteration, the incoming fact will increase to @i=Top@,
and the rewrite will no longer be justified.
After each iteration, \hoopl\ starts the next iteration with
the original graph. As~a consequence
of using purely functional data structures, rewrites from
previous iterations are automatically ``rolled back.''

acquiring a fresh name.
These actions must also be rolled back, and because the client chooses
the monad in which the actions take place, the client must provide the
means to roll them back.

back monadic actions.
The~interface is defined by the type class @CkpointMonad@ in
class Monad m => `CkpointMonad m where
  type `Checkpoint m
  `checkpoint :: m (Checkpoint m)
  `restart    :: Checkpoint m -> m ()
\hoopl\ calls the @checkpoint@ method at the beginning of an
iteration, then calls the @restart@ method if another iteration is
necessary.
These operations must obey the following algebraic law:

do { s <- checkpoint; m; restart s } == return ()
where @m@~represents any combination of monadic actions that might be
taken by rewrite functions.
(The safest course is to make sure the law holds throughout the entire
monad.)
The~type of the saved checkpoint~@s@ is entirely up to the client;
it is defined as an associated type of the @CkpointMonad@ class.
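As a concrete sketch, a client whose monad is a fresh-name supply can satisfy the law by saving and restoring the supply. The @NameSupply@ monad and @fresh@ below are hypothetical client code; the class is repeated so the sketch is self-contained:

```haskell
{-# LANGUAGE TypeFamilies, FlexibleInstances #-}
import Control.Monad.Trans.State

-- The interface, repeated from the text
class Monad m => CkpointMonad m where
  type Checkpoint m
  checkpoint :: m (Checkpoint m)
  restart    :: Checkpoint m -> m ()

-- Hypothetical client monad: a supply of fresh names
type NameSupply = State Int

fresh :: NameSupply String
fresh = do { n <- get; put (n + 1); return ("x" ++ show n) }

-- Checkpointing saves the next free index; restarting rolls it back,
-- so names allocated by a discarded iteration are reused
instance CkpointMonad (State Int) where
  type Checkpoint (State Int) = Int
  checkpoint = get
  restart    = put
```

Running @checkpoint@, some actions, then @restart@ leaves the supply exactly where it started, which is what the law requires.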

% Facts computed by @analyzeAndRewriteFwdBody@ depend on graphs produced by the rewrite
Facts computed by the transfer function depend on graphs produced by the rewrite
function, which in turn depend on facts computed by the transfer function.
How~do we know this algorithm is sound, or whether it terminates?
A~proof requires a POPL paper;
here we merely state the conditions for correctness as applied to \hoopl:
\item
The lattice must have no infinite ascending chains; that is,
every sequence of calls to @fact_join@ must eventually return @NoChange@.
\item
The transfer function must be monotonic: given a more informative fact in,
it must produce a more informative fact out.
\item
The rewrite function must be sound:
if it replaces a node @n@ by a replacement graph~@g@, then @g@~must be
observationally equivalent to~@n@ under the
assumptions expressed by the incoming dataflow fact~@f@.

2156 %%\footnote{We do not permit a transformation to change

2157 %% the @Label@ of a node. We have not found any optimizations

2158 %% that are prevented (or even affected) by this restriction.}

2159 %

Moreover, analysis of~@g@ must produce output fact(s)
that are at least as informative as the fact(s) produced by applying
the transfer function to~@n@.
%% \item
%% The rewrite function must be \emph{consistent} with the transfer function;
%% that is, \mbox{@`transfer n f@ $\sqsubseteq$ @transfer g f@}.
For example, if the transfer function says that @x=7@ after the node~@n@,
then after analysis of~@g@,
@x@~had better still be~@7@.

\item
% To ensure termination,
A transformation that uses deep rewriting
must not return a replacement graph which
contains a node that could be rewritten indefinitely.

%% Without the conditions on monotonicity and consistency,
%% our algorithm will terminate,
%% but there is no guarantee that it will compute
%% a fixed point of the analysis. And that in turn threatens the
%% soundness of rewrites based on possibly bogus ``facts''.
Under these conditions, the algorithm terminates and is sound.
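To make the first condition concrete, here is a sketch of a join for a flat constant-propagation lattice. @ChangeFlag@ and the old/new argument order are assumptions patterned on the paper's @fact_join@, not its exact signature; the point is that every ascending chain (@Bot@ to @Const@ to @Top@) has bounded height, so repeated joins must eventually report @NoChange@:

```haskell
data ChangeFlag = NoChange | SomeChange deriving (Eq, Show)
data Const = Bot | Const Int | Top deriving (Eq, Show)

-- joinConst old new: report whether the stored (old) fact grew
joinConst :: Const -> Const -> (ChangeFlag, Const)
joinConst old new = (if j == old then NoChange else SomeChange, j)
  where
    j = lub old new
    lub Bot x = x                              -- bottom is the identity
    lub x Bot = x
    lub (Const a) (Const b) | a == b = Const a -- agreeing constants
    lub _ _ = Top                              -- anything else: give up
```

Joining @Const 3@ with @Const 3@ reports @NoChange@; joining it with @Const 4@ jumps to @Top@, after which no further join can change the fact.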

2182 %%

2183 %% \begin{itemize}

2184 %% \item

2185 %% The algorithm terminates. The fixed-point loop must terminate because the

2186 %% lattice has no infinite ascending chains. And the client is responsible

2187 %% for avoiding infinite recursion when deep rewriting is used.

2188 %% \item

2189 %% The algorithm is sound. Why? Because if each rewrite is sound (in the sense given above),

2190 %% then applying a succession of rewrites is also sound.

2191 %% Moreover, a~sound analysis of the replacement graph

2192 %% may generate only dataflow facts that could have been

2193 %% generated by a more complicated analysis of the original graph.

2194 %% \end{itemize}



gives a client's-eye view of \hoopl, showing how to use
it to create analyses and transformations.
\hoopl's interface is simple, but
the preceding sections do not describe its implementation. We have written
at least three previous implementations, all of which
were long and hard to understand, and only one of which
provided compile-time guarantees about open and closed shapes.
We are not confident that any of our previous implementations is correct.

In this paper we describe a new implementation. It is elegant and short
(about a third of the size of our last attempt), and it offers
strong compile-time guarantees about shapes.

We describe the implementation of \emph{forward}
analysis and transformation.
The implementations of backward analysis and transformation are
exactly analogous and are included in \hoopl.

`withFuel :: FuelMonad m => Maybe a -> m (Maybe a)

as expressed by the
The first component of the @FwdRew@ is the replacement graph, as discussed earlier.
The second component, $\rw$, is a
new rewrite function to be applied to
the replacement graph.
For shallow rewriting this new function is
the constant @Nothing@ function; for deep rewriting it is the original
rewrite function.
While @mkFRewrite@ allows for general rewriting, most clients will
take advantage of \hoopl's support for these two common cases:
`deepFwdRw, `shallowFwdRw
  :: Monad m
  => (forall e x . n e x -> f -> m (Maybe (Graph n e x)))
  -> FwdRewrite m n f


Instead of the interface function @analyzeAndRewriteFwdBody@, we present
the private function @arfGraph@ (short for ``analyze and rewrite
forward graph''):
`arfGraph
  :: forall m n f e x. (FuelMonad m, NonLocal n)
  => FwdPass m n f      -- lattice, transfers, rewrites
  -> MaybeC e [Label]   -- entry points for a closed graph
  -> Graph n e x        -- the original graph
  -> Fact e f           -- fact(s) flowing into the entry/entries
  -> m (DG f n e x, Fact x f)
Function @arfGraph@ has a more general type than
the function @analyzeAndRewriteFwdBody@ % praying for a decent line break
because @arfGraph@ is used recursively
to analyze graphs of all shapes.
If a graph is closed on entry, a list of entry points must be
provided;
if the graph is open on entry,
the graph's entry sequence must be the only entry point.
The~graph's shape on entry also determines the type of fact or facts
flowing in.
Finally, the result is a ``decorated graph''
@DG f n e x@,
and if the graph
is open on exit, an ``exit fact'' flowing out.

2348 %% \simon{I suggest (a) putting the paragraph break one sentence earlier,

2349 %%% so that this para is all about DGs.}

2350 %%% NR: previous para is about the type of arfGraph; I don't want to

2351 %%% leave the result type dangling. I hope the opening sentence of

2352 %%% this para suggests that the para is all about DGs.

A~``decorated graph'' is one in which each block is decorated with the
fact that holds at the start of the block.
@DG@ actually shares a representation with @Graph@,
which is possible because the definition of
@Graph@ contains a white lie: @Graph@ is a
synonym for an underlying type @`Graph'@, which takes the type of block
as an additional parameter.
(Similarly, the graph-splicing function is a
higher-order function that takes a block-concatenation function as a
parameter.)
The truth about @Graph@ and @DG@ is as follows:

% defn DG
% defn DBlock
Type~@DG@ is internal to \hoopl; it is not seen by any client.
To~convert a~@DG@ to the @Graph@ and @FactBase@
that are returned by the API function @analyzeAndRewriteFwdBody@,
we use a 12-line function:
`normalizeGraph
  :: NonLocal n => DG f n e x -> (Graph n e x, FactBase f)

Function @arfGraph@ is implemented as follows:
\begingroup
arfGraph pass entries = graph
 where\cnl
  node :: forall e x . (ShapeLifter e x)
       => n e x -> f -> m (DG f n e x, Fact x f)\cnl
  block:: forall e x .
          Block n e x -> f -> m (DG f n e x, Fact x f)\cnl
  body :: [Label] -> LabelMap (Block n C C)
       -> Fact C f -> m (DG f n C C, Fact C f)\cnl
  `graph:: Graph n e x -> Fact e f -> m (DG f n e x, Fact x f)\cnl
  ... definitions of 'node', 'block', 'body', and 'graph' ...
\endgroup
The four auxiliary functions help us separate concerns: for example, only
@node@ knows about rewrite functions;
and only @body@ knows about fixed points.

2395 %% All four functions have access to the @FwdPass@ and any entry points

2396 %% that are passed to @arfGraph@.

2397 %% These functions also have access to type variables bound by

2398 %% @arfGraph@:

2399 %% @n@~is the type of nodes; @f@~is the type of facts;

2400 %% @m@~is the monad used in the rewrite functions of the @FwdPass@;

2401 %% and

2402 %% @e@~and~@x@ give the shape of the graph passed to @arfGraph@.

2403 %% The types of the inner functions are

2404 %% \begin{smallcode}

2405 %% \end{smallcode}

Each auxiliary function works the same way: it~takes a ``thing'' and
returns an \emph{extended fact transformer}.
An~extended fact transformer takes dataflow fact(s) coming into
the ``thing,'' and it returns an output fact.
It~also returns a decorated graph representing the (possibly
rewritten) ``thing.''
Finally, because a rewrite may require fresh names provided
by the client,
or may wish to consult state provided by the client,
every extended fact transformer is monadic.


The types of the four extended fact transformers are not quite
identical:
\item
Extended fact transformers for nodes and blocks have the same type;
like forward transfer functions,
they expect a fact~@f@ rather than the more general @Fact e f@
required for a graph.
Because a node or a block has
exactly one fact flowing into the entry, it is easiest simply to pass
that fact.
\item
Extended fact transformers for graphs have the most general type,
as expressed using @Fact@:
if the graph is open on entry, its fact transformer expects a
single fact;
if the graph is closed on entry, its fact transformer expects a
@FactBase@.
\item
Extended fact transformers for bodies have the same type as
extended fact transformers for closed/closed graphs.

Function @arfGraph@ and its four auxiliary functions comprise a cycle of
mutual recursion:
@arfGraph@ calls @graph@;
@graph@ calls @body@ and @block@;
@body@ calls @block@;
@block@ calls @node@;
and
@node@ calls @arfGraph@.
These five functions do three different kinds of work:
compose extended fact transformers, analyze and rewrite nodes, and compute
fixed points.

Extended fact transformers compose nicely.
For example, @block@ is implemented thus:
`block :: forall e x .
          Block n e x -> f -> m (DG f n e x, Fact x f)
block (BFirst  n)  = node n
block (BMiddle n)  = node n
block (BLast   n)  = node n
block (BCat b1 b2) = block b1 `cat` block b2
The composition function @cat@ feeds facts from one extended fact
transformer to another, and it splices decorated graphs.
It~has a very general type:

cat :: forall m e a x f f1 f2. Monad m
    => (f  -> m (DG f n e a, f1))
    -> (f1 -> m (DG f n a x, f2))
    -> (f  -> m (DG f n e x, f2))
`cat ft1 ft2 f = do { (g1,f1) <- ft1 f
                    ; (g2,f2) <- ft2 f1
                    ; return (g1 `dgSplice` g2, f2) }

(Function @`dgSplice@ is the same splicing function used for an ordinary
@Graph@, but it uses a one-line block-concatenation function suitable
for @DBlock@s.)
The name @cat@ comes from the concatenation of the decorated graphs,
but it is also appropriate because the style in which it is used is
reminiscent of @concatMap@, with the @node@ and @block@ functions
playing the role of @map@.
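The flavor of @cat@ can be captured in a toy model where a ``decorated graph'' is just a list of block names and any monad will do. This is an illustration of the composition pattern, not \hoopl's code:

```haskell
-- Toy model: a decorated graph is a list of block names, and an
-- extended fact transformer is a function f -> m (graph, f')
type DG = [String]

-- Thread the fact from the first transformer into the second,
-- and splice the two graphs; (++) plays the role of dgSplice
cat :: Monad m => (f -> m (DG, f1)) -> (f1 -> m (DG, f2)) -> (f -> m (DG, f2))
cat ft1 ft2 f = do { (g1, f1) <- ft1 f
                   ; (g2, f2) <- ft2 f1
                   ; return (g1 ++ g2, f2) }
```

Composing a transformer that emits block @b1@ and increments its fact with one that emits @b2@ and scales its fact produces a transformer emitting both blocks and the combined fact.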

Function @graph@ is much like @block@, but it has more cases.


The @node@ function is where we interleave analysis with rewriting:
node :: forall e x . (ShapeLifter e x, FuelMonad m)
     => n e x -> f -> m (DG f n e x, Fact x f)
node n f
  = do { rew <- withFuel =<< frewrite pass n f
       ; case rew of
           Nothing -> return (singletonDG f n,
                              ftransfer pass n f)
           Just (FwdRew g rw) ->
             let pass' = pass { fp_rewrite = rw }
                 f'    = fwdEntryFact n f
             in arfGraph pass' (fwdEntryLabel n) g f' }

class ShapeLifter e x where
  singletonDG   :: f -> n e x -> DG f n e x
  fwdEntryFact  :: NonLocal n => n e x -> f -> Fact e f
  fwdEntryLabel :: NonLocal n => n e x -> MaybeC e [Label]
  ftransfer :: FwdPass m n f -> n e x -> f -> Fact x f
  frewrite  :: FwdPass m n f -> n e x
            -> f -> m (Maybe (FwdRew m n f e x))

Function @node@ uses @frewrite@ to extract the rewrite function from
@pass@,
and applies the rewrite function to the node~@n@ and the incoming fact~@f@.
The result of the rewrite is passed to @withFuel@, but for now,
pretend @withFuel@ is ``@return@;'' we present
the details below.
The result from @withFuel@, @rew@, is
scrutinized by the @case@ expression.

In the @Nothing@ case, no rewrite takes place.
We~return node~@n@ and its incoming fact~@f@
as the decorated graph @singletonDG f n@.
To produce the outgoing fact, we apply the transfer function
@ftransfer pass@ to @n@~and~@f@.

In the @Just@ case, we receive a replacement
graph~@g@ and a new rewrite function~@rw@.
We~recursively analyze @g@ with @arfGraph@.
This analysis uses @pass'@, which contains the original lattice and transfer
function from @pass@, together with the new rewrite function~@rw@.
Function @fwdEntryFact@ converts fact~@f@ from the type~@f@,
which @node@ expects, to the type @Fact e f@, which @arfGraph@ expects.

As you see, several functions called in @node@ are overloaded over a
(private) class @ShapeLifter@, because their implementations depend
on the open/closed shape of the node.
By design, the shape of a node is known statically everywhere @node@
is called, so
this use of @ShapeLifter@ is specialized
away by the compiler.

2657 %% And that's it! If~the client wanted deep rewriting, it is

2658 %% implemented by the call to @arfGraph@;

2659 %% if the client wanted

2660 %% shallow rewriting, the rewrite function will have returned

2661 %% @noFwdRw@ as~@rw@, which is implanted in @pass'@

2662 %% (\secref{shallow-vs-deep}).

In function @node@, the call to
@withFuel@ may prevent a node from
being rewritten.

If~@withFuel@ is passed a @Just@, a rewrite is being requested, and
if~fuel is available, @withFuel@ @return@s the @Just@,
reducing the supply of fuel by one unit.
In~all other cases, including when fuel is exhausted, @withFuel@
has no effect on the @FuelMonad@ and @return@s @Nothing@.
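A minimal model of this fuel discipline, using a plain state monad in place of \hoopl's @FuelMonad@ (the names here are assumptions for the sketch, not the library's definitions):

```haskell
import Control.Monad.Trans.State

type Fuel = Int

-- Grant a requested rewrite only while fuel remains
withFuel :: Maybe a -> State Fuel (Maybe a)
withFuel Nothing  = return Nothing            -- no rewrite requested
withFuel (Just a) = do
  f <- get
  if f <= 0
    then return Nothing                       -- fuel exhausted: deny
    else do { put (f - 1); return (Just a) }  -- grant and consume one unit
```

With two units of fuel, a requested rewrite is granted and one unit remains; with none, the request is silently denied and the state is untouched.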

Optimization fuel is used to debug the optimizer:
when optimization produces a faulty program,
we use Whalley's fault-isolation technique.
Given a program that fails when compiled with optimization,
a binary search on the amount of
optimization fuel
finds an~$n$ such that the program works correctly after $n-1$ rewrites
but fails
after $n$~rewrites.
The $n$th rewrite is faulty.

The fourth and final auxiliary function of @arfGraph@ is
@body@, which iterates to a fixed point.
This part of the implementation is the only really tricky part, and it is
cleanly separated from everything else:
% defn body
% local do_block
% local blocks
% local lattice
% local entryFact
% local entries
% local init_fbase
% local blockmap

Function @getFact@ looks up a fact by its label.
If the label is not found,
@getFact@ returns
the bottom element of the lattice:
`getFact :: DataflowLattice f -> Label -> FactBase f -> f
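With @FactBase@ modeled as a finite map, a plausible implementation is a one-liner; the @fact_bot@ field and the @Int@ stand-in for @Label@ are sketch assumptions:

```haskell
import qualified Data.Map as M

type Label = Int                 -- stand-in for Hoopl's Label
type FactBase f = M.Map Label f

-- Sketch: the only piece of the lattice we need here is bottom
data DataflowLattice f = DataflowLattice { fact_bot :: f }

-- Return the stored fact, or bottom for an as-yet-unseen label
getFact :: DataflowLattice f -> Label -> FactBase f -> f
getFact lat l fb = M.findWithDefault (fact_bot lat) l fb
```

Returning bottom for unseen labels is what lets the fixed-point iteration start from the most optimistic assumption.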

Function @forwardBlockList@ takes a list of possible entry points and
a finite map from labels to blocks.
It returns a list of
blocks, sorted into an order that makes forward dataflow efficient.\footnote
{The order of the blocks does not affect the fixed point or any other
part of the answer; it affects only the number of iterations needed to
reach the fixed point.}
`forwardBlockList
  :: NonLocal n
For
example, if the entry point is at~@L2@, and the block at~@L2@
branches to~@L1@, but not vice versa, then \hoopl\ will reach a fixed point
more quickly if we process @L2@ before~@L1@.
To~find an efficient order, @forwardBlockList@ uses
the methods of the @NonLocal@ class---@entryLabel@ and @successors@---to
perform a reverse postorder depth-first traversal of the control-flow graph.
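The traversal can be sketched over a toy successor map, with labels as @Int@s; \hoopl's real version works over blocks via @NonLocal@, so everything below is an illustrative simplification:

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

type Label = Int

-- Reverse postorder: finish a node after all its successors,
-- then read the finishing times latest-first
reversePostorder :: [Label] -> M.Map Label [Label] -> [Label]
reversePostorder entries succs = snd (foldl visit (S.empty, []) entries)
  where
    visit (seen, acc) l
      | l `S.member` seen = (seen, acc)
      | otherwise =
          let (seen', acc') =
                foldl visit (S.insert l seen, acc)
                      (M.findWithDefault [] l succs)
          in (seen', l : acc')   -- prepend after children: latest-first
```

On the example from the text (entry @L2@ branching to @L1@ but not vice versa), the traversal yields @L2@ before @L1@, which is exactly the order that reaches the fixed point quickly.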

2739 %%

2740 %%The @NonLocal@ type-class constraint on~@n@ propagates to all the

2741 %%@`arfThing@ functions.

2742 %% paragraph carrying too much freight

2743 %%

The rest of the work is done by @fixpoint@, which is shared by
both forward and backward analyses:
% defn Direction
% defn Fwd
% defn Bwd
Except for the @Direction@ passed as the first argument,
the type signature tells the story.
The third argument is an extended fact transformer for a single block;
@fixpoint@ applies that function successively to each block in the list
passed as the fourth argument.
The result is an extended fact transformer for the list.

The extended fact transformer returned by @fixpoint@
maintains a ``current @FactBase@''
which grows monotonically:
as each block is analyzed,
the block's input fact is taken from
the current @FactBase@,
and the current @FactBase@ is
augmented with the facts that flow out of the block.
%
The initial value of the current @FactBase@ is the input @FactBase@,
and the extended fact transformer
iterates over the blocks until the current @FactBase@
stops changing.
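That loop can be modeled in a few lines. This sketch (monotone joins into a growing map, iterated until stable) omits \hoopl's reachability trick and its cleverness about change tracking, and the block representation is a toy:

```haskell
import qualified Data.Map as M

type Label = Int
type FactBase f = M.Map Label f

-- Each block is (its label, a transfer from its input fact to the
-- facts it sends to its successors)
fixpointSketch :: Eq f
               => (f -> f -> f)                  -- join
               -> f                              -- bottom
               -> [(Label, f -> [(Label, f)])]   -- blocks
               -> FactBase f -> FactBase f
fixpointSketch join bot blocks = go
  where
    -- Sweep all blocks; repeat until the FactBase stops changing
    go fbase = let fbase' = foldl step fbase blocks
               in if fbase' == fbase then fbase else go fbase'
    step fb (l, transfer) =
      let fin = M.findWithDefault bot l fb        -- input fact for block l
      in foldl (\m (l', f) -> M.insertWith join l' f m) fb (transfer fin)
```

For instance, with a two-point reachability lattice (@Bool@ under @(||)@) and two mutually-branching blocks, seeding block 1 as reachable propagates reachability to block 2 and then stabilizes.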

Implementing @fixpoint@ requires about 90 lines,
formatted narrowly for display in one column.
%%
%% for completeness, it
%% appears in Appendix~\ref{app:fixpoint}.
The~code is mostly straightforward, although we try to be clever
about deciding when a new fact means that another iteration over the
blocks will be required.

There is one more subtle point worth mentioning, which we highlight by
considering a forward analysis of this graph, where execution starts at~@L1@:
L1: x:=3; goto L4
L2: x:=4; goto L4
L4: if x>3 goto L2 else goto L5
Block @L2@ is unreachable.
But if we \naively\ process all the blocks (say in
order @L1@, @L4@, @L2@), then we will start with the bottom fact for @L2@, propagate
\{x=4\} to~@L4@, and join it with \{x=3\} to get @x=@$\top$ at~@L4@.
Given @x=@$\top$, the
conditional in @L4@ cannot be rewritten, and @L2@~seems reachable. We have
lost a good optimization.

Function @fixpoint@ solves this problem
by analyzing a block only if the block is
reachable from an entry point.
This trick is safe only for a forward analysis, which
is why
@fixpoint@ takes a @Direction@ as its first argument.

2809 %% Although the trick can be implemented in just a couple of lines of

2810 %% code, the reasoning behind it is quite subtle---exactly the sort of

2811 %% thing that should be implemented once in \hoopl, so clients don't have

2812 %% to worry about it.

While there is a vast body of literature on
dataflow analysis and optimization,
relatively little can be found on
We therefore focus on the foundations of dataflow analysis
and on the implementations of some comparable dataflow frameworks.

When transfer functions are monotone and lattices are finite in height,
iterative dataflow analysis converges to a fixed point
If~the lattice's join operation distributes over transfer
functions,
this fixed point is equivalent to a join-over-all-paths solution to
the recursive dataflow equations
{Kildall uses meets, not joins.
Lattice orientation is conventional, and conventions have changed.
We use Dana Scott's
orientation, in which higher elements carry more information.}
monotone functions.
Each~client of \hoopl\ must guarantee monotonicity.

\ifcutting
\else
\fi
introduce abstract interpretation as a technique for developing
lattices for program analysis.
a dataflow analysis can be implemented using model checking;
expands on this~result by showing that
an all-paths dataflow problem can be viewed as model checking an
abstract interpretation.
present a survey of different methods for performing dataflow analyses,
with emphasis on theoretical results.
presents many examples of both particular analyses and related
algorithms.

The soundness of interleaving analysis and transformation,
even when not all speculative transformations are performed on later
iterations, was shown by

2870 Most dataflow frameworks support only analysis, not transformation.

2871 The framework computes a fixed point of transfer functions, and it is

2872 up to the client of

2873 the framework to use that fixed point for transformation.

2874 Omitting transformation makes it much easier to build frameworks,

2875 and one can find a spectrum of designs.

2876 We~describe two representative

2877 designs, then move on to the prior frameworks that support interleaved

2878 analysis and transformation.

2880 The Soot framework is designed for analysis of Java programs

2882 best for Soot in general, but there doesn't appear

2883 to be any formal publication that actually details the dataflow

2884 framework part. ---JD}

2885 While Soot's dataflow library supports only analysis, not

2886 transformation, we found much

2887 to admire in its design.

2888 Soot's library is abstracted over the representation of

2889 the control-flow graph and the representation of instructions.

2890 Soot's interface for defining lattice and analysis functions is

2891 like our own,

2892 although because Soot is implemented in an imperative style,

2893 additional functions are needed to copy lattice elements.

2897 for same reason as Soot below ---JD}

2898 supports both analysis and rewriting of C~programs,

2899 but rewriting is clearly distinct from analysis:

2900 one runs an analysis to completion and then rewrites based on the

2901 results.

2902 The framework is limited to one representation of control-flow graphs

2903 and one representation of instructions, both of which are provided by

2904 the framework.

2905 The~API is complicated;

2906 much of the complexity is needed to enable the client to

2907 affect which instructions

2908 the analysis iterates over.

2912 but I'll be darned if I can find anything that might be termed a dataflow framework.}

2914 The Whirlwind compiler contains the dataflow framework implemented

2916 interleave analysis and transformation.

2917 Their implementation is much like our early efforts:

2918 it is a complicated mix of code that simultaneously manages interleaving,

2919 deep rewriting, and fixed-point computation.

2920 By~separating these tasks,

2921 our implementation simplifies the problem dramatically.

2922 Whirlwind's implementation also suffers from the difficulty of

2923 maintaining pointer invariants in a mutable representation of

2924 control-flow graphs, a problem we have discussed elsewhere

2927 Because speculative transformation is difficult in an imperative setting,

2928 Whirlwind's implementation is split into two phases.

2929 The first phase runs the interleaved analyses and transformations

2930 to compute the final dataflow facts and a representation of the transformations

2931 that should be applied to the input graph.

2932 The second phase executes the transformations.

2933 In~\hoopl, because control-flow graphs are immutable, speculative transformations

2934 can be applied immediately, and there is no need

2935 for a phase distinction.

2937 %%% % repetitious...

2938 %%%

2939 %%% \ourlib\ also improves upon Whirlwind's dataflow framework by providing

2940 %%% new support for the optimization writer:

2941 %%% \begin{itemize}

2942 %%% \item Using static type guarantees, \hoopl\ rules out a whole

2943 %%% class of possible bugs: transformations that produced malformed

2944 %%% control-flow graphs.

2945 %%% \item Using dynamic testing,

2946 %%% we can isolate the rewrite that transforms a working program

2947 %%% into a faulty program,

2948 %%% using Whalley's \citeyearpar{whalley:isolation} fault-isolation technique.

2949 %%% \end{itemize}

2951 %% what follows now looks redundant with discussion below ---NR

2953 %% In previous work \cite{ramsey-dias:applicative-flow-graph}, we

2954 %% described a zipper-based representation of control-flow

2955 %% graphs, stressing the advantages

2956 %% of immutability.

2957 %% Our new representation, described in \secref{graph-rep}, is a significant improvement:

2958 %% \begin{itemize}

2959 %% \item

2960 %% We can concatenate nodes, blocks, and graphs in constant time.

2961 %% %Previously, we had to resort to Hughes's

2962 %% %\citeyearpar{hughes:lists-representation:article} technique, representing

2963 %% %a graph as a function.

2964 %% \item

2965 %% We can do a backward analysis without having

2966 %% to ``unzip'' (and allocate a copy of) each block.

2967 %% \item

2968 %% Using GADTs, we can represent a flow-graph

2969 %% node using a single type, instead of the triple of first, middle, and

2970 %% last types used in our earlier representation.

2971 %% This change simplifies the interface significantly:

2972 %% instead of providing three transfer functions and three rewrite

2973 %% functions per pass---one for

2974 %% each type of node---a client of \hoopl\ provides only one transfer

2975 %% function and one rewrite function per pass.

2976 %% \item

2977 %% Errors in concatenation are ruled out at

%% compile time by Haskell's static

2979 %% type system.

2980 %% In~earlier implementations, such errors were not detected until

2981 %% the compiler~ran, at which point we tried to compensate

2982 %% for the errors---but

2983 %% the compensation code harbored subtle faults,

2984 %% which we discovered while developing a new back end

2985 %% for the Glasgow Haskell Compiler.

2986 %% \end{itemize}

2987 %%

2988 %% The implementation of \ourlib\ is also much better than

2989 %% our earlier implementations.

2990 %% Not only is the code simpler conceptually,

2991 %% but it is also shorter:

2992 %% our new implementation is about a third as long

2993 %% as the previous version, which is part of GHC, version 6.12.

3000 Our work on \hoopl\ is too new for us to be able to say much

3001 about performance.

3002 It's~important to know how well \hoopl\ performs, but the

3003 question is comparative, and there isn't another library we can compare

3004 \hoopl\ with.

3005 For example, \hoopl\ is not a drop-in replacement for an existing

3006 component of GHC; we introduced \hoopl\ to GHC as part of a

3007 major refactoring of GHC's back end.

3009 the previous version, and we

3010 don't know what portion of the slowdown can be attributed to the

3011 optimizer.

3012 %

3013 What we can say is that the costs of using \hoopl\ seem reasonable;

3014 there is no ``big performance hit.''

3016 language, actually improved performance in an apples-to-apples

3017 comparison with a library using a mutable control-flow graph

3020 Although a~thorough evaluation of \hoopl's performance must await

3021 future work, we can identify some design decisions that affect

3022 performance.

3024 \item

3026 Using this representation, a block of $N$~nodes is represented using

We~have also implemented a representation of blocks that includes

3029 ``cons-like'' and ``snoc-like'' constructors;

3030 this representation requires only $N+1$ heap objects.

3031 We~don't know what difference this choice makes to performance.
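The two representations can be contrasted with a sketch (toy types, not \hoopl's): a concatenation-based block allocates an interior object for every concatenation, while a cons-style block needs one constructor cell per node plus a terminator, i.e. $N+1$ heap objects for $N$~nodes.

```haskell
-- Concatenation-based blocks: every BCat is an extra heap object.
data CatBlock n  = BNode n | BCat (CatBlock n) (CatBlock n)

-- Cons-style blocks: one cell per node, plus BNil (snoc is symmetric).
data ConsBlock n = BNil | BCons n (ConsBlock n)

-- Heap objects used by a cons-style block of N nodes: N + 1.
consCells :: ConsBlock n -> Int
consCells BNil        = 1
consCells (BCons _ b) = 1 + consCells b
```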

3032 \item

3034 rewrites the body of a control-flow graph, and @fixpoint@ iterates

3035 this analysis until it reaches a fixed point.

3036 Decorated graphs computed on earlier iterations are thrown away.

3037 But~for each decorated graph of $N$~nodes, it is necessary to allocate

3039 @singletonDG@ in~@node@ and of @dgSplice@ in~@cat@.

3040 In~an earlier version of \hoopl, this overhead was

3041 eliminated by splitting @arfGraph@ into two very similar functions: one to compute the

3042 fixed point, and the other to produce the rewritten graph.

3043 Having a single version of @arfGraph@ is simpler and easier

3044 to maintain, but we don't know the cost of allocating the

3045 additional thunks.

3046 \item

3047 The representation of a forward-transfer function is private to

3048 \hoopl.

3049 Two representations are possible:

3050 we may store a triple of functions, one for each shape a node may

3051 have;

3052 or we may store a single, polymorphic function.

3053 If~we use triples throughout, the costs are straightforward, but the

3054 code is complex.

3055 If~we use a single, polymorphic function, we sometimes have to use a

3057 transfer functions.

3058 Using a shape classifier may introduce extra @case@ discriminations

3059 every time a transfer function or rewrite function is applied to a

3060 node.

3061 We~don't know how these extra discriminations might affect

3062 performance.
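The shape classifier can be sketched with a GADT. The node type below is a toy and all names are illustrative, not \hoopl's internal definitions; the point is that dispatching a triple of shape-specific functions through one polymorphic function costs exactly one extra @case@ per application.

```haskell
{-# LANGUAGE GADTs #-}
data O  -- open
data C  -- closed

-- A GADT that reifies a node's shape at the value level.
data Shape e x where
  First  :: Shape C O
  Middle :: Shape O O
  Last   :: Shape O C

-- A toy node type indexed by shape.
data Node e x where
  Label  :: Int -> Node C O
  Assign :: String -> Int -> Node O O
  Branch :: Int -> Node O C

shapeOf :: Node e x -> Shape e x
shapeOf (Label _)    = First
shapeOf (Assign _ _) = Middle
shapeOf (Branch _)   = Last

-- Turning a triple of shape-specific functions into a single
-- polymorphic function: the 'case' on shapeOf is the extra
-- discrimination whose cost is in question.
byShape :: (Node C O -> a) -> (Node O O -> a) -> (Node O C -> a)
        -> Node e x -> a
byShape f m l n = case shapeOf n of
  First  -> f n
  Middle -> m n
  Last   -> l n
```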

3064 In summary, \hoopl\ performs well enough for use in~GHC,

3065 but there is much we don't know. Systematic investigation

3066 is indicated.

3072 We built \hoopl\ in order to combine three good ideas (interleaved

3073 analysis and transformation, optimization fuel, and an applicative

3074 control-flow graph) in a way that could easily be reused by many, many

3075 compiler writers.

3076 To~evaluate how well we succeeded, we examine how \hoopl\ has been

3077 used,

3078 we~examine the API, and we examine the implementation.

3082 As~suggested by the constant-propagation example in

3084 dataflow analyses.

3085 Students using \hoopl\ in a class at Tufts were able to implement

3087 and induction-variable elimination

3089 Students at Yale and at Portland State have also succeeded in building

3090 a variety of optimizations.

3092 \hoopl's data types can support optimizations beyond classic

3093 dataflow.

3094 For example, within GHC, \hoopl's graphs are used

3095 to implement optimizations based on control flow,

3096 such as eliminating branch chains.

3098 \hoopl\ has not yet been used to implement SSA optimizations.

3099 \hoopl\ is SSA-neutral:

3100 although we know of no attempt to use

3101 \hoopl\ to establish or enforce SSA~invariants,

3103 representation of first nodes,

3104 and if a transformation preserves SSA~invariants, it will continue to do

3105 so when implemented in \hoopl.

3110 itself,

3111 but there are a couple of properties we think are worth highlighting.

3112 First, it's a good sign that the API provides many higher-order

3113 combinators that make it easier to write client code. % with simple, expressive types.

3114 We~have had space to mention only a few:

3115 @extendJoinDomain@,

3116 @thenFwdRw@, @deepFwdRw@, @shallowFwdRw@, and @pairFwd@.
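To give the flavor of such combinators, here is a simplified model, our sketch rather than the library's actual types: a rewrite either declines (@Nothing@) or proposes a replacement, and sequencing tries the second rewrite when the first declines, in the spirit of @thenFwdRw@.

```haskell
-- A rewrite either declines or proposes a replacement node.
type Rewrite n = n -> Maybe n

-- Try r1; if it proposes nothing, fall through to r2.
thenRw :: Rewrite n -> Rewrite n -> Rewrite n
thenRw r1 r2 n = case r1 n of
                   Nothing -> r2 n
                   found   -> found

-- The rewrite that never fires: an identity for thenRw on the left.
noRw :: Rewrite n
noRw _ = Nothing
```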

3118 Second,

3119 the static encoding of open and closed shapes at compile time worked

3120 out well.

3121 % especially because it applies equally to nodes, blocks, and graphs.

3122 Shapes may

3123 seem like a small refinement, but they helped eliminate a number of

3124 bugs from GHC, and we expect them to help other clients too.

3125 GADTs are a convenient way to express shapes, and for clients

3126 written in Haskell, they are clearly appropriate.

3127 If~one wished to port \hoopl\ to a language without GADTs,

3128 many of the benefits could be realized by making the shapes phantom

3129 types, but without GADTs, pattern matching would be significantly more

3130 tedious and error-prone.

3133 % An~advantage of our ``shapely'' node API is that a client can

3134 % write a \emph{single} transfer function that is polymorphic in shape.

3135 % To~make this design work, however, we \emph{must} have

3136 % the type-level function

3137 % @Fact@ (\figref{api-types}), to express how incoming

3138 % and outgoing facts depend on the shape of a node.

3139 % Without type-level functions, we would have had to force clients to

3140 % use \emph{only} the triple-of-functions interface described in

3141 % \secref{triple-of-functions}.

3145 If you are thinking of adopting \hoopl, you had better consider not

3146 only whether you like the API, but whether, in case of emergency, you

3147 could maintain the implementation.

3149 implementation is a clear improvement over previous implementations

3150 of similar ideas.

3151 % The implementation is more difficult to evaluate than the~API.

3152 % Previous implementations of similar ideas have rolled the problems

3153 % into a big ball of mud.

3154 By~decomposing our implementation into @node@, @block@, @body@,

3155 @graph@, @cat@, @withFuel@, and @fixpoint@, we have clearly separated

3156 multiple concerns:

3157 interleaving analysis with rewriting,

3158 throttling rewriting using optimization fuel,

3159 and

3160 computing a fixed point using speculative rewriting.

3161 Because of this separation of concerns,

3162 we believe our implementation will be much easier to maintain than

3163 anything that preceded it.

3165 Another good sign is that we have been able to make substantial

3166 changes in the implementation without changing the API.

3167 For example, in addition to ``@concatMap@ style,'' we have also

3168 implemented @arfGraph@ in ``fold style'' and in continuation-passing

3169 style.

3170 Which style is preferred is a matter of taste, and possibly

3171 a matter of performance.

3176 \iffalse

3178 (We have also implemented a ``fold style,'' but because the types are

3179 a little less intuitive, we are sticking with @concatMap@ style for now.)

3182 > Some numbers, I have used it nine times, and would need the general fold

3183 > once to define blockToNodeList (or CB* equivalent suggested by you).

3184 > (We are using it in GHC to

3185 > - computing hash of the blocks from the nodes

3186 > - finding the last node of a block

3187 > - converting block to the old representation (2x)

3188 > - computing interference graph

3189 > - counting Area used by a block (2x)

3190 > - counting stack high-water mark for a block

3191 > - prettyprinting block)

3194 type-parameter hell, newtype hell, typechecking hell, instance hell,

3195 triple hell

3197 \fi

3200 % We have spent six years implementing and reimplementing frameworks for

3201 % dataflow analysis and transformation.

3202 % This formidable design problem taught us

3203 % two kinds of lessons:

3204 % we learned some very specific lessons about representations and

3205 % algorithms for optimizing compilers,

3206 % and we were forcibly reminded of some very general, old lessons that are well

3207 % known not just to functional programmers, but to programmers

3208 % everywhere.

3212 %%%% \remark{Orphaned: but for transfer functions that

3213 %%%% approximate weakest preconditions or strongest postconditions,

3214 %%%% monotonicity falls out naturally.}

3215 %%%%

3216 %%%%

3217 %%%% In conclusion we offer the following lessons from the experience of designing

3218 %%%% and implementing \ourlib{}.

3219 %%%% \begin{itemize}

3220 %%%% \item

3221 %%%% Although we have not stressed this point, there is a close connection

3222 %%%% between dataflow analyses and program logic:

3223 %%%% \begin{itemize}

3224 %%%% \item

3225 %%%% A forward dataflow analysis is represented by a predicate transformer

3226 %%%% that is related to \emph{strongest postconditions}

3227 %%%% \cite{floyd:meaning}.\footnote

3228 %%%% {In Floyd's paper the output of the predicate transformer is called

3229 %%%% the \emph{strongest verifiable consequent}, not the ``strongest

3230 %%%% postcondition.''}

3231 %%%% \item

3232 %%%% A backward dataflow analysis is represented by a predicate transformer

3233 %%%% that is related to \emph{weakest preconditions} \cite{dijkstra:discipline}.

3234 %%%% \end{itemize}

3235 %%%% Logicians write down the predicate transformers for the primitive

3236 %%%% program fragments, and then use compositional rules to ``lift'' them

3237 %%%% to a logic for whole programs. In the same way \ourlib{} lets the client

3238 %%%% write simple predicate transformers,

3239 %%%% and local rewrites based on those assertions, and ``lifts'' them to entire

3240 %%%% function bodies with arbitrary control flow.

3242 \iffalse

3245 Reuse requires abstraction, and as is well known,

3246 designing good abstractions is challenging.

3247 \hoopl's data types and the functions over those types have been

3250 As~we were refining our design, we~found it invaluable to operate in

3251 two modes:

3252 In the first mode, we designed, built, and used a framework as an

3254 In the second mode, we designed and built a standalone library, then

3255 redesigned and rebuilt it, sometimes going through several significant

3256 changes in a week.

3257 Operating in the first mode---inside a live compiler---forced us to

3258 make sure that no corners were cut, that we were solving a real

3259 problem, and that we did not inadvertently

3260 cripple some other part of the compiler.

3261 Operating in the second mode---as a standalone library---enabled us to

3262 iterate furiously, trying out many more ideas than would have

3263 been possible in the first mode.

3264 Alternating between these two modes has led to a

3265 better design than operating in either mode alone.

3267 %% We were forcibly reminded of timeless truths:

3268 It is a truth universally acknowledged that

3269 interfaces are more important than implementations and data

3270 is more important than code.

3271 This truth is reflected in this paper, in which

3272 we

We have evaluated \hoopl's API through small, focused classroom

3276 projects and by using \hoopl\ in the back end of the Glasgow Haskell

3277 Compiler.

3281 We were also reminded that Haskell's type system (polymorphism, GADTs,

3282 higher-order functions, type classes, and so on) is a remarkably

3283 effective

3284 language for thinking about data and code---and that

3285 Haskell lacks a language of interfaces (like ML's signatures) that

3286 would make it equally effective for thinking about APIs at a larger scale.

3287 Still, as usual, the types were a remarkable aid to writing the code:

3288 when we finally agreed on the types presented above, the

3289 code almost wrote itself.

3291 Types are widely appreciated at ICFP, but here are three specific

3292 examples of how types helped us:

3294 \item

3295 Reuse is enabled by representation-independence, which in a functional

3296 language is

3297 expressed through parametric polymorphism.

3299 made the code simpler, easier to understand, and easier to maintain.

3301 \ourlib\ must know about nodes, and to embody that knowledge in the

3303 \item

3305 %

3306 % this paper is just not about run-time performance ---NR

3307 %

3308 %%%% Moreover, the implementation is faster than it would otherwise be,

3309 %%%% because, say, a @(Fact O f)e@ is known to be just an @f@ rather than

3310 %%%% being a sum type that must be tested (with a statically known outcome!).

3311 %

3313 to nodes, blocks, and graphs helped our

3314 thinking and helped to structure the implementation.

3315 \item

3318 \fi

3322 Dataflow optimization is usually described as a way to improve imperative

3323 programs by mutating control-flow graphs.

3324 Such transformations appear very different from the tree rewriting

3325 that functional languages are so well known for, and that makes

3326 \ifhaskellworkshop

3327 Haskell

3328 \else

3329 functional languages

3330 \fi

3331 so attractive for writing other parts of compilers.

3332 But even though dataflow optimization looks very different from

3333 what we are used to,

3334 writing a dataflow optimizer

3335 in

3336 \ifhaskellworkshop

3337 Haskell

3338 \else

3339 a pure functional language

3340 \fi

3341 was a win:

3342 % We could not possibly have conceived \ourlib{} in C++.

3343 we had to make every input and output explicit,

3344 and we had a strong incentive to implement things compositionally.

3345 Using Haskell helped us make real improvements in the implementation of

3346 some very sophisticated ideas.

3347 % %%

3348 % %%

3349 % %% In~a pure functional language, not only do we know that

3350 % %% no data structure will be unexpectedly mutated,

3351 % %% but we are forced to be

3352 % %% explicit about every input and output,

3353 % %% and we are encouraged to implement things compositionally.

3354 % %% This kind of thinking has helped us make

3355 % %% significant improvements to the already tricky work of Lerner, Grove,

3356 % %% and Chambers:

3357 % %% per-function control of shallow vs deep rewriting

3358 % %% (\secref{shallow-vs-deep}),

3359 % %% optimization fuel (\secref{fuel}),

3360 % %% and transparent management of unreachable blocks (\secref{arf-body}).

3361 % We~trust that the improvements are right only because they are

3362 % implemented in separate

3363 % parts of the code that cannot interact except through

3364 % explicit function calls.

3365 % %% %%

3366 % %% %%An ancestor of \ourlib{} is in the Glasgow Haskell Compiler today,

3367 % %% %%in version~6.12.

3368 % %% With this new, improved design in hand, we are now moving back to

3369 % %% live-compiler mode, pushing \hoopl\ into version

3370 % %% 6.13 of the Glasgow Haskell Compiler.

3373 \acks

3375 Brian Huffman and Graham Hutton helped with algebraic laws.

3376 Several anonymous reviewers helped improve the

3377 presentation, especially

3378 reviewer~C, who suggested better language with which to describe our

3379 work.

3381 The first and second authors were funded

3382 by a grant from Intel Corporation and

3384 These authors also thank Microsoft Research Ltd, UK, for funding

3385 extended visits to the third author.

3388 \makeatother

3395 \clearpage

3397 \appendix

3399 % omit LabelSet :: *

3400 % omit LabelMap :: *

3401 % omit delFromFactBase :: FactBase f -> [(Label,f)] -> FactBase f

3402 % omit elemFactBase :: Label -> FactBase f -> Bool

3403 % omit elemLabelSet :: Label -> LabelSet -> Bool

3404 % omit emptyLabelSet :: LabelSet

3405 % omit factBaseLabels :: FactBase f -> [Label]

3406 % omit extendFactBase :: FactBase f -> Label -> f -> FactBase f

3407 % omit extendLabelSet :: LabelSet -> Label -> LabelSet

3408 % omit lookupFact :: FactBase f -> Label -> Maybe f

3409 % omit factBaseList :: FactBase f -> [(Label, f)]

3411 %% \section{Code for \textmd{\texttt{fixpoint}}}

3412 %% \label{app:fixpoint}

3413 %%

3414 %% {\def\baselinestretch{0.95}\hfuzz=20pt

3415 %% \begin{smallcode}

3416 %% data `TxFactBase n f

3417 %% = `TxFB { `tfb_fbase :: FactBase f

3418 %% , `tfb_rg :: DG n f C C -- Transformed blocks

3419 %% , `tfb_cha :: ChangeFlag

3420 %% , `tfb_lbls :: LabelSet }

3421 %% -- Set the tfb_cha flag iff

3422 %% -- (a) the fact in tfb_fbase for or a block L changes

3423 %% -- (b) L is in tfb_lbls.

3424 %% -- The tfb_lbls are all Labels of the *original*

3425 %% -- (not transformed) blocks

3426 %%

3427 %% `updateFact :: DataflowLattice f -> LabelSet -> (Label, f)

3428 %% -> (ChangeFlag, FactBase f)

3429 %% -> (ChangeFlag, FactBase f)

3430 %% updateFact ^lat ^lbls (lbl, ^new_fact) (^cha, fbase)

3431 %% | NoChange <- ^cha2 = (cha, fbase)

3432 %% | lbl `elemLabelSet` lbls = (SomeChange, new_fbase)

3433 %% | otherwise = (cha, new_fbase)

3434 %% where

3435 %% (cha2, ^res_fact)

3436 %% = case lookupFact fbase lbl of

3437 %% Nothing -> (SomeChange, new_fact)

3438 %% Just ^old_fact -> fact_extend lat old_fact new_fact

3439 %% ^new_fbase = extendFactBase fbase lbl res_fact

3440 %%

3441 %% fixpoint :: forall n f. NonLocal n

3442 %% => Bool -- Going forwards?

3443 %% -> DataflowLattice f

3444 %% -> (Block n C C -> FactBase f

3445 %% -> FuelMonad (DG n f C C, FactBase f))

3446 %% -> FactBase f -> [(Label, Block n C C)]

3447 %% -> FuelMonad (DG n f C C, FactBase f)

3448 %% fixpoint ^is_fwd lat ^do_block ^init_fbase ^blocks

3449 %% = do { ^fuel <- getFuel

3450 %% ; ^tx_fb <- loop fuel init_fbase

3451 %% ; return (tfb_rg tx_fb,

3452 %% tfb_fbase tx_fb `delFromFactBase` blocks) }

3453 %% -- The outgoing FactBase contains facts only for

3454 %% -- Labels *not* in the blocks of the graph

3455 %% where

3456 %% `tx_blocks :: [(Label, Block n C C)]

3457 %% -> TxFactBase n f -> FuelMonad (TxFactBase n f)

3458 %% tx_blocks [] tx_fb = return tx_fb

3459 %% tx_blocks ((lbl,blk):bs) tx_fb = tx_block lbl blk tx_fb

3460 %% >>= tx_blocks bs

3461 %%

3462 %% `tx_block :: Label -> Block n C C

3463 %% -> TxFactBase n f -> FuelMonad (TxFactBase n f)

3464 %% tx_block ^lbl ^blk tx_fb@(TxFB { tfb_fbase = fbase

3465 %% , tfb_lbls = lbls

3466 %% , tfb_rg = ^blks

3467 %% , tfb_cha = cha })

3468 %% | is_fwd && not (lbl `elemFactBase` fbase)

3469 %% = return tx_fb -- Note [Unreachable blocks]

3470 %% | otherwise

3471 %% = do { (rg, ^out_facts) <- do_block blk fbase

3472 %% ; let (^cha', ^fbase')

3473 %% = foldr (updateFact lat lbls) (cha,fbase)

3474 %% (factBaseList out_facts)

3475 %% ; return (TxFB { tfb_lbls = extendLabelSet lbls lbl

3476 %% , tfb_rg = rg `DGCatC` blks

3477 %% , tfb_fbase = fbase'

3478 %% , tfb_cha = cha' }) }

3479 %%

3480 %% loop :: Fuel -> FactBase f -> FuelMonad (TxFactBase n f)

3481 %% `loop fuel fbase

3482 %% = do { let ^init_tx_fb = TxFB { tfb_fbase = fbase

3483 %% , tfb_cha = NoChange

3484 %% , tfb_rg = DGNil

3485 %% , tfb_lbls = emptyLabelSet}

3486 %% ; tx_fb <- tx_blocks blocks init_tx_fb

3487 %% ; case tfb_cha tx_fb of

3488 %% NoChange -> return tx_fb

3489 %% SomeChange -> setFuel fuel >>

3490 %% loop fuel (tfb_fbase tx_fb) }

3491 %% \end{smallcode}

3492 %% \par

3493 %% } % end \baselinestretch

3498 This appendix lists every nontrivial identifier used in the body of

3499 the paper.

3500 For each identifier, we list the page on which that identifier is

3501 defined or discussed---or when appropriate, the figure (with line

3502 number where possible).

3503 For those few identifiers not defined or discussed in text, we give

3504 the type signature and the page on which the identifier is first

3505 referred to.

3507 Some identifiers used in the text are defined in the Haskell Prelude;

3508 for those readers less familiar with Haskell (possible even at the

3509 Haskell Symposium!), these identifiers are

3516 \let\hsprelude\dropit

3525 \noindent

3528 \else

3530 \fi

3531 }

3533 \noindent

3536 \else

3538 \fi

3539 }

3543 \noindent

3546 \else

3547 \texttt{#2} {let}- or $\lambda$-bound on \lineref{#1} of Figure~\ref{#3} on page~\pageref{#3}.\\

3548 \fi

3549 }

3560 \else

3562 \fi

3563 }

3565 \newif\ifundefinedsection\undefinedsectionfalse

3568 \ifundefinedsection

3570 \else

3571 \undefinedsectiontrue

3572 \par

3575 \fi

3576 }

3578 \begingroup

3579 \raggedright

3584 \undefinedsectionfalse

3588 \ifundefinedsection

3590 \else

3591 \undefinedsectiontrue

3592 \par

3595 \fi

3596 }

3597 \let\hspagedef\dropit

3598 \let\omithspagedef\dropit

3599 \let\omithsfigdef\dropit

3600 \let\hsfigdef\dropit

3601 \let\hstabdef\dropit

3602 \let\hspagedefll\dropit

3603 \let\hsfigdefll\dropit

3604 \let\nothspagedef\dropit

3605 \let\nothsfigdef\dropit

3606 \let\nothslinedef\dropit

3613 \endgroup

3616 \iffalse

3636 \fi

3645 THE FUEL PROBLEM:

3648 Here is the problem:

3650 A graph has an entry sequence, a body, and an exit sequence.

3651 Correctly computing facts on and flowing out of the body requires

3652 iteration; computation on the entry and exit sequences do not, since

3653 each is connected to the body by exactly one flow edge.

3655 The problem is to provide the correct fuel supply to the combined

3656 analysis/rewrite (iterator) functions, so that speculative rewriting

3657 is limited by the fuel supply.

3659 I will number iterations from 1 and name the fuel supplies as

3660 follows:

3662 f_pre fuel remaining before analysis/rewriting starts

3663 f_0 fuel remaining after analysis/rewriting of the entry sequence

3664 f_i, i>0 fuel remaining after iteration i of the body

3665 f_post fuel remaining after analysis/rewriting of the exit sequence

3667 The issue here is that only the last iteration of the body 'counts'.

3668 To formalize, I will name fuel consumed:

3670 C_pre fuel consumed by speculative rewrites in entry sequence

3671 C_i fuel consumed by speculative rewrites in iteration i of body

3672 C_post fuel consumed by speculative rewrites in exit sequence

3674 These quantities should be related as follows:

f_0 = f_pre - C_pre

3677 f_i = f_0 - C_i where i > 0

3678 f_post = f_n - C_post where iteration converges after n steps

3680 When the fuel supply is passed explicitly as parameter and result, it

3681 is fairly easy to see how to keep reusing f_0 at every iteration, then

3682 extract f_n for use before the exit sequence. It is not obvious to me

3683 how to do it cleanly using the fuel monad.

3686 Norman
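The explicit-parameter discipline described above can be sketched directly (hypothetical names, not Hoopl code): every iteration of the body restarts from f_0, and only the fuel remaining after the converging iteration, f_n, flows on toward the exit sequence.

```haskell
type Fuel = Int

-- Iterate one analysis/rewrite pass over the body to a fixed point,
-- reusing the same starting fuel f0 on every iteration; fuel consumed
-- by non-final iterations is deliberately discarded.
iterateBody :: Eq fact
            => (fact -> Fuel -> (fact, Fuel))  -- one pass over the body
            -> fact -> Fuel -> (fact, Fuel)
iterateBody body fact f0 =
  let (fact', fn) = body fact f0        -- each iteration starts from f_0
  in if fact' == fact
       then (fact', fn)                 -- converged: f_n counts
       else iterateBody body fact' f0   -- otherwise discard fn, retry
```

For example, with a pass that raises the fact toward 3 while spending one unit of fuel, @iterateBody (\f fuel -> (min 3 (f + 1), fuel - 1)) 0 10@ converges to the fact 3 with 9 units of fuel remaining.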