1 {-# LANGUAGE ScopedTypeVariables #-}

2 {-# OPTIONS -Wall -fno-warn-name-shadowing #-}

3 module ZipCfg

12 -- Observers and transformers

15 , tailOfLast

19 , postorder_dfs

24 , pprLgraph

25 )

26 where

28 import Maybes

30 import Panic

32 import Unique

33 import UniqFM

34 import UniqSet

35 import UniqSupply

37 -------------------------------------------------------------------------

38 -- GENERIC ZIPPER-BASED CONTROL-FLOW GRAPH --

39 -------------------------------------------------------------------------

40 {-

42 This module defines datatypes used to represent control-flow graphs,

43 along with some functions for analyzing and splicing graphs.

44 Functions for building graphs are found in a separate module 'MkZipCfg'.

46 Every graph has a distinguished entry point. A graph has at least one

47 exit; most exits are instructions (or statements) like 'jump' or

48 'return', which transfer control to other procedures, but a graph may

49 have up to one 'fall through' exit. (A graph that represents an

50 entire Haskell or C-- procedure does not have a 'fall through' exit.)

52 A graph is a collection of basic blocks. A basic block begins with a

53 label (unique id; see Note [Unique BlockId]) which is followed by a

54 sequence of zero or more 'middle' nodes; the basic block ends with a

55 'last' node. Each 'middle' node is a single-entry, single-exit,

56 uninterruptible computation. A 'last' node is a single-entry,

57 multiple-exit computation. A last node may have zero or more successors,

58 which are identified by their unique ids.

60 A special case of last node is the ``default exit,'' which represents

61 'falling off the end' of the graph. Such a node is always represented by

62 the data constructor 'LastExit'. A graph may contain at most one

63 'LastExit' node, and a graph representing a full procedure should not

64 contain any 'LastExit' nodes. 'LastExit' nodes are used only to splice

65 graphs together, either during graph construction (see module 'MkZipCfg')

66 or during optimization (see module 'ZipDataflow').

68 A graph is parameterized over the types of middle and last nodes. Each of

69 these types will typically be instantiated with a subset of C-- statements

70 (see module 'ZipCfgCmm') or a subset of machine instructions (yet to be

71 implemented as of August 2007).

75 This module exposes three representations of graphs. In order of

76 increasing complexity, they are:

78 Graph m l The basic graph with its distinguished entry point

80 LGraph m l A graph with a *labelled* entry point

82 FGraph m l A labelled graph with the *focus* on a particular edge

84 There are three types because each type offers a slightly different

85 invariant or cost model.

87 * The distinguished entry of a Graph has no label. Because labels must

88 be unique, acquiring one requires a monadic operation ('freshBlockId').

89 The primary advantage of the Graph representation is that we can build

90 a small Graph purely functionally, without entering a monad. For

91 example, during optimization we can easily rewrite a single middle

92 node into a Graph containing a sequence of two middle nodes followed by

93 LastExit.

95 * In an LGraph, every basic block is labelled. The primary advantage of

96 this representation is its simplicity: each basic block can be treated

97 like any other. This representation is used for mapping, folding, and

98 translation, as well as layout.

100 Like any graph, an LGraph still has a distinguished entry point,

101 which you can discover using 'gr_entry'.

103 * An FGraph is an LGraph with the *focus* on one particular edge. The

104 primary advantage of this representation is that it provides

105 constant-time access to the nodes connected by that edge, and it also

106 allows constant-time, functional *replacement* of those nodes---in the

107 style of Huet's 'zipper'.

109 None of these representations is ideally suited to the incremental

110 construction of large graphs. A separate module, 'MkZipCfg', provides a

111 fourth representation that is asymptotically optimal for such construction.

113 -}

117 -- (fails if there isn't one)

120 -- focus on start of block satisfying predicate

123 -- | We can insert a single-entry, single-exit subgraph at

124 -- the current focus.

125 -- The new focus can be at either the entry edge or the exit edge.

130 --------------- Representation --------------------

132 -- | A basic block is a [[first]] node, followed by zero or more [[middle]]

133 -- nodes, followed by a [[last]] node.

135 -- eventually this module should probably replace the original Cmm, but for

136 -- now we leave it to dynamic invariants what can be found where

138 data ZLast l

140 -- LastExit is a device used only for graphs under

141 -- construction, or framgments of graph under optimisation,

142 -- so we don't want to pollute the 'l' type parameter with it

143 | LastOther l

146 -- ZHead is a (reversed) sequence of middle nodes labeled by a BlockId

148 -- ZTail is a sequence of middle nodes followed by a last node

150 -- | Blocks and flow graphs

158 -- | And now the zipper. The focus is between the head and tail.

159 -- Notice we cannot ever focus on an inter-block edge.

164 -- Invariant: the block represented by 'zg_focus' is *not*

165 -- in the map 'zg_others'

167 ---- Utility functions ---

178 -- | Some ways to combine parts:

184 -- | We can splice a single-entry, single-exit LGraph onto a head or a tail.

185 -- For a head, we have a head~[[h]] followed by a LGraph~[[g]].

186 -- The entry node of~[[g]] gets joined to~[[h]], forming the entry into

187 -- the new LGraph. The exit of~[[g]] becomes the new head.

188 -- For both arguments and results, the order of values is the order of

189 -- control flow: before splicing, the head flows into the LGraph; after

190 -- splicing, the LGraph flows into the head.

191 -- Splicing a tail is the dual operation.

192 -- (In order to maintain the order-means-control-flow convention, the

193 -- orders are reversed.)

198 -- | We can also splice a single-entry, no-exit LGraph into a head.

201 -- | Finally, we can remove the entry label of an LGraph and remove

202 -- it, leaving a Graph:

208 -- | Traversal: [[postorder_dfs]] returns a list of blocks reachable from

209 -- the entry node.

210 -- The postorder depth-first-search order means the list is in roughly

211 -- first-to-last order, as suitable for use in a forward dataflow problem.

215 -- | For layout, we fold over pairs of [[Block m l]] and [[Maybe BlockId]]

216 -- in layout order. The [[BlockId]], if any, identifies the block that

217 -- will be the layout successor of the current block. This may be

218 -- useful to help an emitter omit the final [[goto]] of a block that

219 -- flows directly to its layout successor.

220 fold_layout ::

223 -- | We can also fold and iterate over blocks.

227 -- mapping includes the entry id!

232 {-

233 translateA :: forall m l m' l' .

234 (m -> Agraph m' l') -> (l -> AGraph m' l') -> LGraph m l -> LGraph m' l'

235 -}

237 ------------------- Last nodes

239 -- | We can't make a graph out of just any old 'last node' type. A

240 -- last node has to be able to find its successors, and we need to

241 -- be able to create and identify unconditional branches. We put

242 -- these capabilities in a type class.

256 succs LastExit = []

258 fold_succs _ LastExit z = z

275 ------------------- Observing nodes

277 -- | Fold from first to last

278 fold_fwd_block ::

282 -- | iterate from first to last

283 foldM_fwd_block ::

287 -- ================ IMPLEMENTATION ================--

291 -- | Convert block between forms.

292 -- These functions are tail-recursive, so we can go as deep as we like

293 -- without fear of stack overflow.

312 zipht = ht_to_first

347 -- | 'insertBlock' should not be used to *replace* an existing block

348 -- but only to insert a new one

350 insertBlock b bs =

359 check_single_exit g =

363 _ -> found

365 panic "graph does not have an exit"

366 else

375 where

379 cont acc visited

380 else

382 vchildren block bs cont acc visited =

390 Nothing -> rst

420 {-

421 \paragraph{Splicing support}

423 We want to be able to scrutinize a single-entry, single-exit LGraph for

424 splicing purposes.

425 There are two useful cases: the LGraph is a single block or it isn't.

426 We use continuation-passing style.

427 -}

429 prepare_for_splicing ::

431 -> a

432 prepare_for_splicing g single multi =

434 ZBlock _ etail = gentry

437 LastExit -> single etail

439 else

448 check_single_exit g $

454 splice_many_blocks entry exit others =

456 in prepare_for_splicing g splice_one_block splice_many_blocks

459 check_single_exit g $

467 splice_many_blocks entry exit others =

469 in prepare_for_splicing g splice_one_block splice_many_blocks

485 remove_entry_label g =

492 --- Translation

497 where

498 txblock ::

515 ----------------------------------------------------------------

516 --- Block Ids, their environments, and their sets

518 {- Note [Unique BlockId]

519 ~~~~~~~~~~~~~~~~~~~~~~~~

520 Although a 'BlockId' is a local label, for reasons of implementation,

521 'BlockId's must be unique within an entire compilation unit. The reason

522 is that each local label is mapped to an assembly-language label, and in

523 most assembly languages allow, a label is visible throughout the enitre

524 compilation unit in which it appears.

525 -}

541 emptyBlockEnv :: BlockEnv a

542 emptyBlockEnv = emptyUFM

544 lookupBlockEnv = lookupUFM

546 extendBlockEnv = addToUFM

548 mkBlockEnv = listToUFM

551 emptyBlockSet :: BlockSet

552 emptyBlockSet = emptyUniqSet

554 elemBlockSet = elementOfUniqSet

556 extendBlockSet = addOneToUniqSet

558 mkBlockSet = mkUniqSet

560 ----------------------------------------------------------------

561 -- putting this code in PprCmmZ leads to circular imports :-(

564 ppr = pprTail

566 -- | 'pprTail' is used for debugging only

575 blocks = postorder_dfs g