1 {-

2 (c) The GRASP/AQUA Project, Glasgow University, 1992-1998

4 ************************************************************************

5 * *

6 \section[FloatIn]{Floating Inwards pass}

7 * *

8 ************************************************************************

10 The main purpose of @floatInwards@ is floating into branches of a

11 case, so that we don't allocate things, save them on the stack, and

12 then discover that they aren't needed in the chosen branch.

13 -}

15 {-# LANGUAGE CPP #-}

16 {-# OPTIONS_GHC -fprof-auto #-}

22 import GhcPrelude

24 import CoreSyn

25 import MkCore

27 import CoreUtils

28 import CoreFVs

31 import Var

32 import Type

33 import VarSet

34 import Util

35 import DynFlags

36 import Outputable

37 -- import Data.List ( mapAccumL )

40 {-

41 Top-level interface function, @floatInwards@. Note that we do not

42 actually float any bindings downwards from the top-level.

43 -}

49 where

56 {-

57 ************************************************************************

58 * *

59 \subsection{Mail from Andr\'e [edited]}

60 * *

61 ************************************************************************

63 {\em Will wrote: What??? I thought the idea was to float as far

64 inwards as possible, no matter what. This is dropping all bindings

65 every time it sees a lambda of any kind. Help! }

67 You are assuming we DO DO full laziness AFTER floating inwards! We

68 have to [not float inside lambdas] if we don't.

70 If we indeed do full laziness after the floating inwards (we could

71 check the compilation flags for that) then I agree we could be more

72 aggressive and do float inwards past lambdas.

74 Actually we are not doing a proper full laziness (see below), which

75 was another reason for not floating inwards past a lambda.

77 This can easily be fixed. The problem is that we float lets outwards,

78 but there are a few expressions which are not let bound, like case

79 scrutinees and case alternatives. After floating inwards the

80 simplifier could decide to inline the let and the laziness would be

81 lost, e.g.

83 \begin{verbatim}

84 let a = expensive ==> \b -> case expensive of ...

85 in \ b -> case a of ...

86 \end{verbatim}

87 The fix is

88 \begin{enumerate}

89 \item

90 to let bind the algebraic case scrutinees (done, I think) and

91 the case alternatives (except the ones with an

92 unboxed type)(not done, I think). This is best done in the

93 SetLevels.hs module, which tags things with their level numbers.

94 \item

95 do the full laziness pass (floating lets outwards).

96 \item

97 simplify. The simplifier inlines the (trivial) lets that were

98 created but were not floated outwards.

99 \end{enumerate}

101 With the fix I think Will's suggestion that we can gain even more from

102 strictness by floating inwards past lambdas makes sense.

104 We still gain even without going past lambdas, as things may be

105 strict in the (new) context of a branch (where it was floated to) or

106 of a let rhs, e.g.

107 \begin{verbatim}

108 let a = something case x of

109 in case x of alt1 -> case something of a -> a + a

110 alt1 -> a + a ==> alt2 -> b

111 alt2 -> b

113 let a = something let b = case something of a -> a + a

114 in let b = a + a ==> in (b,b)

115 in (b,b)

116 \end{verbatim}

117 Also, even if a is not found to be strict in the new context and is

118 still left as a let, if the branch is not taken (or b is not entered)

119 the closure for a is not built.

121 ************************************************************************

122 * *

123 \subsection{Main floating-inwards code}

124 * *

125 ************************************************************************

126 -}

132 -- The FreeVarSet is the free variables of the binding. In the case

133 -- of recursive bindings, the set doesn't include the bound

134 -- variables.

137 -- In reverse dependency order (innermost binder first)

139 fiExpr :: DynFlags

141 -- as far "inwards" as possible

146 -- See Note [Dead bindings]

153 where

157 to_drop

159 {-

160 Applications: we do float inside applications, mainly because we

161 need to get at all the arguments. The next simplifier run will

162 pull out any silly ones.

163 -}

167 mkTicks ticks $

170 where

173 fun_fvs = freeVarsOf ann_fun

179 to_drop

180 -- Shortcut behaviour: if to_drop is empty,

181 -- sepBindsByDropPoint returns a suitable bunch of empty

182 -- lists without evaluating extra_fvs, and hence without

183 -- peering into each argument

188 _ -> emptyDVarSet

189 -- Don't float the binding for f into f x y z; see Note [Join points]

190 -- for why we *can't* do it when f is a join point. (If f isn't a

191 -- join point, floating it in isn't especially harmful but it's

192 -- useless since the simplifier will immediately float it back out.)

199 | noFloatIntoArg arg arg_ty

201 | otherwise

203 where

206 {- Note [Dead bindings]

207 ~~~~~~~~~~~~~~~~~~~~~~~

208 At a literal we won't usually have any floated bindings; the

209 only way that can happen is if the binding wrapped the literal

210 /in the original input program/. e.g.

211 case x of { DEFAULT -> 1# }

212 But, while this may be unusual it is not actually wrong, and it did

213 once happen (Trac #15696).

215 Note [Do not destroy the let/app invariant]

216 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

217 Watch out for

218 f (x +# y)

219 We don't want to float bindings into here

220 f (case ... of { x -> x +# y })

221 because that might destroy the let/app invariant, which requires

222 unlifted function arguments to be ok-for-speculation.

224 Note [Join points]

225 ~~~~~~~~~~~~~~~~~~

226 Generally, we don't need to worry about join points - there are places we're

227 not allowed to float them, but since they can't have occurrences in those

228 places, we're not tempted.

230 We do need to be careful about jumps, however:

232 joinrec j x y z = ... in

233 jump j a b c

235 Previous versions often floated the definition of a recursive function into its

236 only non-recursive occurrence. But for a join point, this is a disaster:

238 (joinrec j x y z = ... in

239 jump j) a b c -- wrong!

241 Every jump must be exact, so the jump to j must have three arguments. Hence

242 we're careful not to float into the target of a jump (though we can float into

243 the arguments just fine).

245 Note [Floating in past a lambda group]

246 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

247 * We must be careful about floating inside a value lambda.

248 That risks losing laziness.

249 The float-out pass might rescue us, but then again it might not.

251 * We must be careful about type lambdas too. At one time we did, and

252 there is no risk of duplicating work thereby, but we do need to be

253 careful. In particular, here is a bad case (it happened in the

254 cichelli benchmark:

255 let v = ...

256 in let f = /\t -> \a -> ...

257 ==>

258 let f = /\t -> let v = ... in \a -> ...

259 This is bad as now f is an updatable closure (update PAP)

260 and has arity 0.

262 * Hack alert! We only float in through one-shot lambdas,

263 not (as you might guess) through lone big lambdas.

264 Reason: we float *out* past big lambdas (see the test in the Lam

265 case of FloatOut.floatExpr) and we don't want to float straight

266 back in again.

268 It *is* important to float into one-shot lambdas, however;

269 see the remarks with noFloatIntoRhs.

271 So we treat lambda in groups, using the following rule:

273 Float in if (a) there is at least one Id,

274 and (b) there are no non-one-shot Ids

276 Otherwise drop all the bindings outside the group.

278 This is what the 'go' function in the AnnLam case is doing.

280 (Join points are handled similarly: a join point is considered one-shot iff

281 it's non-recursive, so we float only into non-recursive join points.)

283 Urk! if all are tyvars, and we don't float in, we may miss an

284 opportunity to float inside a nested case branch

287 Note [Floating coercions]

288 ~~~~~~~~~~~~~~~~~~~~~~~~~

289 We could, in principle, have a coercion binding like

290 case f x of co { DEFAULT -> e1 e2 }

291 It's not common to have a function that returns a coercion, but nothing

292 in Core prohibits it. If so, 'co' might be mentioned in e1 or e2

293 /only in a type/. E.g. suppose e1 was

294 let (x :: Int |> co) = blah in blah2

297 But, with coercions appearing in types, there is a complication: we

298 might be floating in a "strict let" -- that is, a case. Case expressions

299 mention their return type. We absolutely can't float a coercion binding

300 inward to the point that the type of the expression it's about to wrap

301 mentions the coercion. So we include the union of the sets of free variables

302 of the types of all the drop points involved. If any of the floaters

303 bind a coercion variable mentioned in any of the types, that binder must

304 be dropped right away.

306 -}

309 | noFloatIntoLam bndrs -- Dump it all here

310 -- NB: Must line up with noFloatIntoRhs (AnnLam...); see Trac #7088

316 where

319 {-

320 We don't float lets inwards past an SCC.

321 ToDo: keep info on current cc, and when passing

322 one, if it is not the same, annotate all lets in binds with current

323 cc, change current cc to the new one and float binds into expr.

324 -}

327 | tickish `tickishScopesLike` SoftScope

333 {-

334 For @Lets@, the possible ``drop points'' for the \tr{to_drop}

335 bindings are: (a)~in the body, (b1)~in the RHS of a NonRec binding,

336 or~(b2), in each of the RHSs of the pairs of a @Rec@.

338 Note that we do {\em weird things} with this let's binding. Consider:

339 \begin{verbatim}

340 let

341 w = ...

342 in {

343 let v = ... w ...

344 in ... v .. w ...

345 }

346 \end{verbatim}

347 Look at the inner \tr{let}. As \tr{w} is used in both the bind and

348 body of the inner let, we could panic and leave \tr{w}'s binding where

349 it is. But \tr{v} is floatable further into the body of the inner let, and

350 {\em then} \tr{w} will also be only in the body of that inner let.

352 So: rather than drop \tr{w}'s binding here, we add it onto the list of

353 things to drop in the outer let's body, and let nature take its

354 course.

356 Note [extra_fvs (1): avoid floating into RHS]

357 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

358 Consider let x=\y....t... in body. We do not necessarily want to float

359 a binding for t into the RHS, because it'll immediately be floated out

360 again. (It won't go inside the lambda else we risk losing work.)

361 In letrec, we need to be more careful still. We don't want to transform

362 let x# = y# +# 1#

363 in

364 letrec f = \z. ...x#...f...

365 in ...

366 into

367 letrec f = let x# = y# +# 1# in \z. ...x#...f... in ...

368 because now we can't float the let out again, because a letrec

369 can't have unboxed bindings.

371 So we make "extra_fvs" which is the rhs_fvs of such bindings, and

372 arrange to dump bindings that bind extra_fvs before the entire let.

374 Note [extra_fvs (2): free variables of rules]

375 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

376 Consider

377 let x{rule mentioning y} = rhs in body

378 Here y is not free in rhs or body; but we still want to dump bindings

379 that bind y outside the let. So we augment extra_fvs with the

380 idRuleAndUnfoldingVars of x. No need for type variables, hence not using

381 idFreeVars.

382 -}

386 -- to_drop is in reverse dependency order

387 where

389 body_fvs = freeVarsOf body

391 {- Note [Floating primops]

392 ~~~~~~~~~~~~~~~~~~~~~~~~~~

393 We try to float-in a case expression over an unlifted type. The

394 motivating example was Trac #5658: in particular, this change allows

395 array indexing operations, which have a single DEFAULT alternative

396 without any binders, to be floated inward.

398 SIMD primops for unpacking SIMD vectors into an unboxed tuple of unboxed

399 scalars also need to be floated inward, but unpacks have a single non-DEFAULT

400 alternative that binds the elements of the tuple. We now therefore also support

401 floating in cases with a single alternative that may bind values.

403 But there are wrinkles

405 * Which unlifted cases do we float? See PrimOp.hs

406 Note [PrimOp can_fail and has_side_effects] which explains:

407 - We can float-in can_fail primops, but we can't float them out.

408 - But we can float a has_side_effects primop, but NOT inside a lambda,

409 so for now we don't float them at all.

410 Hence exprOkForSideEffects

412 * Because we can float can-fail primops (array indexing, division) inwards

413 but not outwards, we must be careful not to transform

414 case a /# b of r -> f (F# r)

415 ===>

416 f (case a /# b of r -> F# r)

417 because that creates a new thunk that wasn't there before. And

418 because it can't be floated out (can_fail), the thunk will stay

419 there. Disaster! (This happened in nofib 'simple' and 'scs'.)

421 Solution: only float cases into the branches of other cases, and

422 not into the arguments of an application, or the RHS of a let. This

423 is somewhat conservative, but it's simple. And it still hits the

424 cases like Trac #5658. This is implemented in sepBindsByJoinPoint;

425 if is_case is False we dump all floating cases right here.

427 * Trac #14511 is another example of why we want to restrict float-in

428 of case-expressions. Consider

429 case indexArray# a n of (# r #) -> writeArray# ma i (f r)

430 Now, floating that indexing operation into the (f r) thunk will

431 not create any new thunks, but it will keep the array 'a' alive

432 for much longer than the programmer expected.

434 So again, not floating a case into a let or argument seems like

435 the Right Thing

437 For @Case@, the possible drop points for the 'to_drop'

438 bindings are:

439 (a) inside the scrutinee

440 (b) inside one of the alternatives/default (default FVs always /first/!).

442 -}

447 -- See Note [Floating primops]

450 where

453 scrut' = fiExpr dflags scrut_binds scrut

455 scrut_fvs = freeVarsOf scrut

460 to_drop

464 wrapFloats drop_here2 $

467 where

468 -- Float into the scrut and alts-considered-together just like App

472 to_drop

474 -- Float into the alts with the is_case flag set

479 scrut_fvs = freeVarsOf scrut

481 all_alts_fvs = unionDVarSets alts_fvs

484 -- Delete case_bndr and args from free vars of rhs

485 -- to get free vars of alt

489 ------------------

490 fiBind :: DynFlags

492 -- as far "inwards" as possible

501 -- See Note [extra_fvs (1,2)]

506 where

510 extra_fvs | noFloatIntoRhs NonRecursive id rhs

511 = rule_fvs `unionDVarSet` rhs_fvs

512 | otherwise

513 = rule_fvs

514 -- See Note [extra_fvs (1): avoid floating into RHS]

515 -- No point in floating in only to float straight out again

516 -- We *can't* float into ok-for-speculation unlifted RHSs

517 -- But do float into join points

522 to_drop

524 -- Push rhs_binds into the right hand side of the binding

526 rhs_fvs' = rhs_fvs `unionDVarSet` floatedBindsFVs rhs_binds `unionDVarSet` rule_fvs

527 -- Don't forget the rule_fvs; the binding mentions them!

534 where

538 -- See Note [extra_fvs (1,2)]

539 rule_fvs = mapUnionDVarSet bndrRuleAndUnfoldingVarsDSet ids

540 extra_fvs = rule_fvs `unionDVarSet`

547 to_drop

549 rhs_fvs' = unionDVarSets rhss_fvs `unionDVarSet`

551 rule_fvs -- Don't forget the rule variables!

553 -- Push rhs_binds into the right hand side of the binding

558 fi_bind to_drops pairs

562 ------------------

564 fiRhs dflags to_drop bndr rhs

565 | Just join_arity <- isJoinId_maybe bndr

568 | otherwise

569 = fiExpr dflags to_drop rhs

571 ------------------

574 where

576 -- Don't float inside a non-one-shot lambda

579 -- ^ True if it's a bad idea to float bindings into this RHS

580 noFloatIntoRhs is_rec bndr rhs

581 | isJoinId bndr

584 | otherwise

588 noFloatIntoArg expr expr_ty

589 | isUnliftedType expr_ty

592 | AnnLam bndr e <- expr

596 -- See Note [noFloatInto considerations] wrinkle 2

599 = exprIsTrivial deann_expr || exprIsHNF deann_expr

600 where

603 {- Note [noFloatInto considerations]

604 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

605 When do we want to float bindings into

606 - noFloatIntoRHs: the RHS of a let-binding

607 - noFloatIntoArg: the argument of a function application

609 Definitely don't float in if it has unlifted type; that

610 would destroy the let/app invariant.

612 * Wrinkle 1: do not float in if

613 (a) any non-one-shot value lambdas

614 or (b) all type lambdas

615 In both cases we'll float straight back out again

616 NB: Must line up with fiExpr (AnnLam...); see Trac #7088

618 (a) is important: we /must/ float into a one-shot lambda group

619 (which includes join points). This makes a big difference

620 for things like

621 f x# = let x = I# x#

622 in let j = \() -> ...x...

623 in if <condition> then normal-path else j ()

624 If x is used only in the error case join point, j, we must float the

625 boxing constructor into it, else we box it every time which is very

626 bad news indeed.

628 * Wrinkle 2: for RHSs, do not float into a HNF; we'll just float right

629 back out again... not tragic, but a waste of time.

631 For function arguments we will still end up with this

632 in-then-out stuff; consider

633 letrec x = e in f x

634 Here x is not a HNF, so we'll produce

635 f (letrec x = e in x)

636 which is OK... it's not that common, and we'll end up

637 floating out again, in CorePrep if not earlier.

638 Still, we use exprIsTrivial to catch this case (sigh)

641 ************************************************************************

642 * *

643 \subsection{@sepBindsByDropPoint@}

644 * *

645 ************************************************************************

647 This is the crucial function. The idea is: We have a wad of bindings

648 that we'd like to distribute inside a collection of {\em drop points};

649 insides the alternatives of a \tr{case} would be one example of some

650 drop points; the RHS and body of a non-recursive \tr{let} binding

651 would be another (2-element) collection.

653 So: We're given a list of sets-of-free-variables, one per drop point,

654 and a list of floating-inwards bindings. If a binding can go into

655 only one drop point (without suddenly making something out-of-scope),

656 in it goes. If a binding is used inside {\em multiple} drop points,

657 then it has to go in a you-must-drop-it-above-all-these-drop-points

658 point.

660 We have to maintain the order on these drop-point-related lists.

661 -}

663 -- pprFIB :: FloatInBinds -> SDoc

664 -- pprFIB fibs = text "FIB:" <+> ppr [b | FB _ _ b <- fibs]

666 sepBindsByDropPoint

667 :: DynFlags

670 -- Always at least two long!

673 -- inside any drop point; the rest correspond

674 -- one-to-one with the input list of FV sets

676 -- Every input floater is returned somewhere in the result;

677 -- none are dropped, not even ones which don't seem to be

678 -- free in *any* of the drop-point fvs. Why? Because, for example,

679 -- a binding (let x = E in B) might have a specialised version of

680 -- x (say x') stored inside x, but x' isn't free in E or B.

684 sepBindsByDropPoint dflags is_case drop_pts floaters

688 | otherwise

691 where

695 -- The *first* one in the argument list is the drop_here set

696 -- The FloatInBinds in the lists are in the reverse of

697 -- the normal FloatInBinds order; that is, they are the right way round!

702 = go binds new_boxes

703 where

704 -- "here" means the group of bindings dropped at the top of the fork

709 drop_here = used_here || cant_push

713 cant_push

715 -- Remember n_alts > 1

717 -- floatIsDupable: see Note [Duplicating floats]

720 -- floatIsCase: see Note [Floating primops]

726 fork_boxes used_in_flags

737 {- Note [Duplicating floats]

738 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

740 For case expressions we duplicate the binding if it is reasonably

741 small, and if it is not used in all the RHSs This is good for

742 situations like

743 let x = I# y in

744 case e of

745 C -> error x

746 D -> error x

747 E -> ...not mentioning x...

749 If the thing is used in all RHSs there is nothing gained,

750 so we don't duplicate then.

751 -}

754 floatedBindsFVs binds = mapUnionDVarSet fbFVs binds

760 -- Remember FloatInBinds is in *reverse* dependency order