1 {-

2 (c) The GRASP/AQUA Project, Glasgow University, 1992-1998

4 ************************************************************************

5 * *

6 \section[OccurAnal]{Occurrence analysis pass}

7 * *

8 ************************************************************************

10 The occurrence analyser re-typechecks a core expression, returning a new

11 core expression with (hopefully) improved usage information.

12 -}

14 {-# LANGUAGE CPP, BangPatterns, MultiWayIf, ViewPatterns #-}

22 import GhcPrelude

24 import CoreSyn

25 import CoreFVs

29 import Id

30 import IdInfo

32 import BasicTypes

34 import Coercion

35 import Type

37 import VarSet

38 import VarEnv

39 import Var

42 , stronglyConnCompFromEdgedVerticesUniq

44 import Unique

45 import UniqFM

46 import UniqSet

47 import Util

48 import Outputable

52 {-

53 ************************************************************************

54 * *

55 occurAnalysePgm, occurAnalyseExpr, occurAnalyseExpr_NoBinderSwap

56 * *

57 ************************************************************************

59 Here's the externally-callable interface:

60 -}

67 occurAnalysePgm this_mod active_unf active_rule imp_rules binds

68 | isEmptyDetails final_usage

69 = occ_anald_binds

74 occ_anald_glommed_binds

75 where

81 imp_rule_edges

83 initial_uds

84 -- It's crucial to re-analyse the glommed-together bindings

85 -- so that we establish the right loop breakers. Otherwise

86 -- we can easily create an infinite loop (#9583 is an example)

88 initial_uds = addManyOccsSet emptyDetails

90 -- The RULES declarations keep things alive!

92 -- Note [Preventing loops due to imported functions rules]

96 | imp_rule <- imp_rules

99 `delVarSetList` ru_bndrs imp_rule

103 go _ []

107 where

110 bs_usage

113 -- Do occurrence analysis, and discard occurrence info returned

120 occurAnalyseExpr' enable_binder_swap expr

122 where

125 {- Note [Plugin rules]

126 ~~~~~~~~~~~~~~~~~~~~~~

127 Conal Elliott (#11651) built a GHC plugin that added some

128 BuiltinRules (for imported Ids) to the mg_rules field of ModGuts, to

129 do some domain-specific transformations that could not be expressed

130 with an ordinary pattern-matching CoreRule. But then we can't extract

131 the dependencies (in imp_rule_edges) from ru_rhs etc, because a

132 BuiltinRule doesn't have any of that stuff.

134 So we simply assume that BuiltinRules have no dependencies, and filter

135 them out from the imp_rule_edges comprehension.

136 -}

138 {-

139 ************************************************************************

140 * *

141 Bindings

142 * *

143 ************************************************************************

145 Note [Recursive bindings: the grand plan]

146 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

147 When we come across a binding group

148 Rec { x1 = r1; ...; xn = rn }

149 we treat it like this (occAnalRecBind):

151 1. Occurrence-analyse each right hand side, and build a

152 "Details" for each binding to capture the results.

154 Wrap the details in a Node (details, node-id, dep-node-ids),

155 where node-id is just the unique of the binder, and

156 dep-node-ids lists all binders on which this binding depends.

157 We'll call these the "scope edges".

158 See Note [Forming the Rec groups].

160 All this is done by makeNode.

162 2. Do SCC-analysis on these Nodes. Each SCC will become a new Rec or

163 NonRec. The key property is that every free variable of a binding

164 is accounted for by the scope edges, so that when we are done

165 everything is still in scope.

167 3. For each Cyclic SCC of the scope-edge SCC-analysis in (2), we

168 identify suitable loop-breakers to ensure that inlining terminates.

169 This is done by occAnalRec.

171 4. To do so we form a new set of Nodes, with the same details, but

172 different edges, the "loop-breaker nodes". The loop-breaker nodes

173 have both more and fewer dependencies than the scope edges

174 (see Note [Choosing loop breakers])

176 More edges: if f calls g, and g has an active rule that mentions h

177 then we add an edge from f -> h

179 Fewer edges: we only include dependencies on active rules, on rule

180 RHSs (not LHSs) and if there is an INLINE pragma only

181 on the stable unfolding (and vice versa). The scope

182 edges must be much more inclusive.

184 5. The "weak fvs" of a node are, by definition:

185 the scope fvs - the loop-breaker fvs

186 See Note [Weak loop breakers], and the nd_weak field of Details

188 6. Having formed the loop-breaker nodes

190 Note [Dead code]

191 ~~~~~~~~~~~~~~~~

192 Dropping dead code for a cyclic Strongly Connected Component is done

193 in a very simple way:

195 the entire SCC is dropped if none of its binders are mentioned

196 in the body; otherwise the whole thing is kept.

198 The key observation is that dead code elimination happens after

199 dependency analysis: so 'occAnalBind' processes SCCs instead of the

200 original term's binding groups.

202 Thus 'occAnalBind' does indeed drop 'f' in an example like

204 letrec f = ...g...

205 g = ...(...g...)...

206 in

207 ...g...

209 when 'g' no longer uses 'f' at all (eg 'f' does not occur in a RULE in

210 'g'). 'occAnalBind' first consumes 'CyclicSCC g' and then it consumes

211 'AcyclicSCC f', where 'body_usage' won't contain 'f'.

213 ------------------------------------------------------------

214 Note [Forming Rec groups]

215 ~~~~~~~~~~~~~~~~~~~~~~~~~

216 We put bindings {f = ef; g = eg } in a Rec group if "f uses g"

217 and "g uses f", no matter how indirectly. We do a SCC analysis

218 with an edge f -> g if "f uses g".

220 More precisely, "f uses g" iff g should be in scope wherever f is.

221 That is, g is free in:

222 a) the rhs 'ef'

223 b) or the RHS of a rule for f (Note [Rules are extra RHSs])

224 c) or the LHS or a rule for f (Note [Rule dependency info])

226 These conditions apply regardless of the activation of the RULE (eg it might be

227 inactive in this phase but become active later). Once a Rec is broken up

228 it can never be put back together, so we must be conservative.

230 The principle is that, regardless of rule firings, every variable is

231 always in scope.

233 * Note [Rules are extra RHSs]

234 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

235 A RULE for 'f' is like an extra RHS for 'f'. That way the "parent"

236 keeps the specialised "children" alive. If the parent dies

237 (because it isn't referenced any more), then the children will die

238 too (unless they are already referenced directly).

240 To that end, we build a Rec group for each cyclic strongly

241 connected component,

242 *treating f's rules as extra RHSs for 'f'*.

243 More concretely, the SCC analysis runs on a graph with an edge

244 from f -> g iff g is mentioned in

245 (a) f's rhs

246 (b) f's RULES

247 These are rec_edges.

249 Under (b) we include variables free in *either* LHS *or* RHS of

250 the rule. The former might seems silly, but see Note [Rule

251 dependency info]. So in Example [eftInt], eftInt and eftIntFB

252 will be put in the same Rec, even though their 'main' RHSs are

253 both non-recursive.

255 * Note [Rule dependency info]

256 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

257 The VarSet in a RuleInfo is used for dependency analysis in the

258 occurrence analyser. We must track free vars in *both* lhs and rhs.

259 Hence use of idRuleVars, rather than idRuleRhsVars in occAnalBind.

260 Why both? Consider

261 x = y

262 RULE f x = v+4

263 Then if we substitute y for x, we'd better do so in the

264 rule's LHS too, so we'd better ensure the RULE appears to mention 'x'

265 as well as 'v'

267 * Note [Rules are visible in their own rec group]

268 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

269 We want the rules for 'f' to be visible in f's right-hand side.

270 And we'd like them to be visible in other functions in f's Rec

271 group. E.g. in Note [Specialisation rules] we want f' rule

272 to be visible in both f's RHS, and fs's RHS.

274 This means that we must simplify the RULEs first, before looking

275 at any of the definitions. This is done by Simplify.simplRecBind,

276 when it calls addLetIdInfo.

278 ------------------------------------------------------------

279 Note [Choosing loop breakers]

280 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

281 Loop breaking is surprisingly subtle. First read the section 4 of

282 "Secrets of the GHC inliner". This describes our basic plan.

283 We avoid infinite inlinings by choosing loop breakers, and

284 ensuring that a loop breaker cuts each loop.

286 See also Note [Inlining and hs-boot files] in ToIface, which deals

287 with a closely related source of infinite loops.

289 Fundamentally, we do SCC analysis on a graph. For each recursive

290 group we choose a loop breaker, delete all edges to that node,

291 re-analyse the SCC, and iterate.

293 But what is the graph? NOT the same graph as was used for Note

294 [Forming Rec groups]! In particular, a RULE is like an equation for

295 'f' that is *always* inlined if it is applicable. We do *not* disable

296 rules for loop-breakers. It's up to whoever makes the rules to make

297 sure that the rules themselves always terminate. See Note [Rules for

298 recursive functions] in Simplify.hs

300 Hence, if

301 f's RHS (or its INLINE template if it has one) mentions g, and

302 g has a RULE that mentions h, and

303 h has a RULE that mentions f

305 then we *must* choose f to be a loop breaker. Example: see Note

306 [Specialisation rules].

308 In general, take the free variables of f's RHS, and augment it with

309 all the variables reachable by RULES from those starting points. That

310 is the whole reason for computing rule_fv_env in occAnalBind. (Of

311 course we only consider free vars that are also binders in this Rec

312 group.) See also Note [Finding rule RHS free vars]

314 Note that when we compute this rule_fv_env, we only consider variables

315 free in the *RHS* of the rule, in contrast to the way we build the

316 Rec group in the first place (Note [Rule dependency info])

318 Note that if 'g' has RHS that mentions 'w', we should add w to

319 g's loop-breaker edges. More concretely there is an edge from f -> g

320 iff

321 (a) g is mentioned in f's RHS `xor` f's INLINE rhs

322 (see Note [Inline rules])

323 (b) or h is mentioned in f's RHS, and

324 g appears in the RHS of an active RULE of h

325 or a transitive sequence of active rules starting with h

327 Why "active rules"? See Note [Finding rule RHS free vars]

329 Note that in Example [eftInt], *neither* eftInt *nor* eftIntFB is

330 chosen as a loop breaker, because their RHSs don't mention each other.

331 And indeed both can be inlined safely.

333 Note again that the edges of the graph we use for computing loop breakers

334 are not the same as the edges we use for computing the Rec blocks.

335 That's why we compute

337 - rec_edges for the Rec block analysis

338 - loop_breaker_nodes for the loop breaker analysis

340 * Note [Finding rule RHS free vars]

341 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

342 Consider this real example from Data Parallel Haskell

343 tagZero :: Array Int -> Array Tag

344 {-# INLINE [1] tagZeroes #-}

347 {-# RULES "tagZero" [~1] forall xs n.

348 pmap fromBool <blah blah> = tagZero xs #-}

353 To make this work, we look for the RHS free vars only for

358 ~~~~~~~~~~~~~~~~~~~~~~~~~

362 RULE f [] = g

364 h = h_rhs

365 g = h

367 }

376 RULE!

400 Inline postInlineUnconditionally

401 strong IAmLoopBreaker False no no

402 weak IAmLoopBreaker True yes no

403 other yes yes

411 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

412 Consider this

415 Note that

418 Now we

419 can get

421 --> 1 + f Int Firing RULE

426 f's definition had been

431 f.

438 RULE f x = f x

442 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

443 Consider:

446 {-# RULES "filterList" forall p. foldr (filterFB (:) p) [] = filter p #-}

448 filterFB c p = ...

462 {-# LANGUAGE RankNTypes #-}

468 {-# INLINABLE filter #-}

473 {-# NOINLINE [0] filterFB #-}

475 filterFB c p x r | p x = x `c` r

478 {-# RULES

479 "filter" [~1] forall p xs. filter p xs = build (\c n -> foldr

480 (filterFB c p) n xs)

481 "filterList" [1] forall p. foldr (filterFB (:) p) [] = filter p

482 #-}

492 filter p xs

504 dies with "ticks exhausted"

514 local functions because the standard occurrence analysis stuff is

520 Ids.

523 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

528 we get

535 To avoid this,

537 give a NOINLINE pragma to the specialised function

540 ~~~~~~~~~~~~~~~

541 RULES for imported Ids can make something at the top refer to something at the bottom:

552 NOTICE that this cannot happen for rules whose head is a

557 Solution:

571 ------------------------------------------------------------

573 ~~~~~~~~~~~~~~~~~~~

594 {-# INLINE [1] foo #-}

598 {-# INLINE [1] bar #-}

601 {-# RULES "foo" [~1] forall x. foo x = bar x #-}

606 a) We may get

613 b)

617 ~~~~~~~~~~~~~~~

623 {-# INLINE [0] eftIntFB #-}

627 {-# RULES

628 "eftInt" [~1] forall x y. eftInt x y = build (\ c n -> eftIntFB c n x y)

629 "eftIntList" [1] eftIntFB (:) [] = eftInt

630 #-}

633 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

638 {-# RULE f (C a) = fs a #-}

649 ------------------------------------------------------------

651 ~~~~~~~~~~~~~~~~~~~~~~~~~~

657 occurrence sites. Dividing the work this way means that the occurrence analyser

658 still only takes one pass, yet one can always tell the difference between a

669 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

676 {-# RULES "SPEC k 0" k 0 = j #-}

685 {-# RULES "SPEC k 0" forall a. k 0 a = j a #-}

687 So conceivably we could notice that a potential join point would have an

699 statistics to be sure.

701 ------------------------------------------------------------

703 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

715 ------------------------------------------------------------

721 joinrec j x = e Yes No

737 recursive.)

741 -}

743 ------------------------------------------------------------------

744 -- occAnalBind

745 ------------------------------------------------------------------

748 -> TopLevelFlag

749 -> ImpRuleEdges

750 -> CoreBind

756 = occAnalNonRecBind env lvl top_env binder rhs body_usage

758 = occAnalRecBind env lvl top_env pairs body_usage

760 -----------------

763 occAnalNonRecBind env lvl imp_rule_edges binder rhs body_usage

764 | isTyVar binder -- A type let; we don't gather usage info

772 where

774 mb_join_arity = willBeJoinId_maybe tagged_binder

780 -- For a /non-recursive/ join point we can mark all

781 -- its join-lambda as one-shot; and it's a good idea to do so

783 -- Unfoldings

784 -- See Note [Unfoldings and join points]

786 Just unf_usage -> rhs_usage1 `andUDs` unf_usage

787 Nothing -> rhs_usage1

789 -- Rules

790 -- See Note [Rules are extra RHSs] and Note [Rule dependency info]

791 rules_w_uds = occAnalRules env mb_join_arity NonRecursive tagged_binder

795 Nothing -> rhs_usage3

796 Just vs -> addManyOccsSet rhs_usage3 vs

797 -- See Note [Preventing loops due to imported functions rules]

799 -- Final adjustment

802 -----------------

805 occAnalRecBind env lvl imp_rule_edges pairs body_usage

807 -- For a recursive group, we

808 -- * occ-analyse all the RHSs

809 -- * compute strongly-connected components

810 -- * feed those components to occAnalRec

811 -- See Note [Recursive bindings: the grand plan]

812 where

815 stronglyConnCompFromEdgedVerticesUniq nodes

823 {-

824 Note [Unfoldings and join points]

825 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

827 We assume that anything in an unfolding occurs multiple times, since unfoldings

828 are often copied (that's the whole point!). But we still need to track tail

829 calls for the purpose of finding join points.

830 -}

832 -----------------------------

834 -> SCC Details

838 -- The NonRec case is just like a Let (NonRec ...) above

848 where

851 rhs_bndrs rhs_uds

853 -- The Rec case is the interesting one

854 -- See Note [Recursive bindings: the grand plan]

855 -- See Note [Loop breaking]

862 -- [ text "weak_fvs" <+> ppr weak_fvs

863 -- , text "lb nodes" <+> ppr loop_breaker_nodes])

866 where

868 bndr_set = mkVarSet bndrs

870 ------------------------------

871 -- See Note [Choosing loop breakers] for loop_breaker_nodes

872 final_uds :: UsageDetails

875 = mkLoopBreakerNodes env lvl bndr_set body_uds details_s

877 ------------------------------

878 weak_fvs :: VarSet

879 weak_fvs = mapUnionVarSet nd_weak details_s

881 ---------------------------

882 -- Now reconstruct the cycle

886 -- If weak_fvs is empty, the loop_breaker_nodes will include

887 -- all the edges in the original scope edges [remember,

888 -- weak_fvs is the difference between scope edges and

889 -- lb-edges], so a fresh SCC computation would yield a

890 -- single CyclicSCC result; and reOrderNodes deals with

891 -- exactly that case

894 ------------------------------------------------------------------

895 -- Loop breaking

896 ------------------------------------------------------------------

903 -- See Note [Weak loop breakers]

907 {-

908 loopBreakNodes is applied to the list of nodes for a cyclic strongly

909 connected component (there's guaranteed to be a cycle). It returns

910 the same nodes, but

911 a) in a better order,

912 b) with some of the Ids having a IAmALoopBreaker pragma

914 The "loop-breaker" Ids are sufficient to break all cycles in the SCC. This means

915 that the simplifier can guarantee not to loop provided it never records an inlining

916 for these no-inline guys.

918 Furthermore, the order of the binds is such that if we neglect dependencies

919 on the no-inline Ids then the binds are topologically sorted. This means

920 that the simplifier will generally do a good job if it works from top bottom,

921 recording inlinings for any Ids which aren't marked as "no-inline" as it goes.

922 -}

924 -- Return the bindings sorted into a plausible order, and marked with loop breakers.

925 loopBreakNodes depth bndr_set weak_fvs nodes binds

928 where

932 loop_break_scc scc binds

935 CyclicSCC nodes -> reOrderNodes depth bndr_set weak_fvs nodes binds

937 ----------------------------------

939 -- Choose a loop breaker, mark it no-inline,

940 -- and call loopBreakNodes on the rest

945 -- , text "chosen" <+> ppr chosen_nodes ]) $

946 loopBreakNodes new_depth bndr_set weak_fvs unchosen $

948 where

956 -- After two iterations (d=0, d=1) give up

957 -- and approximate, returning to d=0

962 where

966 -- See Note [Weak loop breakers]

971 where

975 ----------------------------------

977 -- so approximate

983 -- This loop looks for the bind with the lowest score

984 -- to pick as the loop breaker. The rest accumulate in

985 chooseLoopBreaker _ _ loop_nodes acc []

988 -- If approximate_loop_breaker is True, we pick *all*

989 -- nodes with lowest score, else just one

990 -- See Note [Complexity of loop breaking]

992 | approx_lb

996 | sc `betterLB` loop_sc -- Better score so pick this new one

1001 where

1004 {-

1005 Note [Complexity of loop breaking]

1006 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1007 The loop-breaking algorithm knocks out one binder at a time, and

1008 performs a new SCC analysis on the remaining binders. That can

1009 behave very badly in tightly-coupled groups of bindings; in the

1010 worst case it can be (N**2)*log N, because it does a full SCC

1011 on N, then N-1, then N-2 and so on.

1013 To avoid this, we switch plans after 2 (or whatever) attempts:

1014 Plan A: pick one binder with the lowest score, make it

1015 a loop breaker, and try again

1016 Plan B: pick *all* binders with the lowest score, make them

1017 all loop breakers, and try again

1018 Since there are only a small finite number of scores, this will

1019 terminate in a constant number of iterations, rather than O(N)

1020 iterations.

1022 You might thing that it's very unlikely, but RULES make it much

1023 more likely. Here's a real example from #1969:

1024 Rec { $dm = \d.\x. op d

1025 {-# RULES forall d. $dm Int d = $s$dm1

1026 forall d. $dm Bool d = $s$dm2 #-}

1030 opInt = $dm dInt

1031 opBool = $dm dBool

1041 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1054 {-# INLINE f #-}

1061 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1065 We give DFuns a higher score than ordinary CONLIKE things because

1071 {-# DFUN #-}

1073 }

1079 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1091 Inlining dictionaries is really essential to unravelling

1095 ~~~~~~~~~~~~~~~~~~~~~~~~~

1098 which generated code like this:

1118 by an INLINE pragma. For these we record that anything which occurs

1128 ************************************************************************

1129 * *

1130 Making nodes

1131 * *

1132 ************************************************************************

1133 -}

1137 noImpRuleEdges :: ImpRuleEdges

1138 noImpRuleEdges = emptyVarEnv

1141 -- The Unique key is gotten from the Id

1142 data Details

1146 -- INVARIANT: (nd_rhs_bndrs nd, _) ==

1147 -- collectBinders (nd_rhs nd)

1150 -- ignoring phase (ie assuming all are active)

1151 -- See Note [Forming Rec groups]

1154 -- the stable unfolding (if present and active)

1155 -- or the RHS (if not)

1156 -- but excluding any RULES

1157 -- This is the IdSet that may be used if the Id is inlined

1160 -- but are *not* in nd_inl. These are the ones whose

1161 -- dependencies might not be respected by loop_breaker_nodes

1162 -- See Note [Weak loop breakers]

1167 }

1177 ])

1179 -- The NodeScore is compared lexicographically;

1180 -- e.g. lower rank wins regardless of size

1183 -- Maxes out at maxExprSize; we just use it to prioritise

1184 -- small functions

1186 -- True => more likely to be picked

1187 -- Note [Loop breakers, node scoring, and stability]

1194 -- See Note [Recursive bindings: the grand plan]

1197 -- It's OK to use nonDetKeysUniqSet here as stronglyConnCompFromEdgedVerticesR

1198 -- is still deterministic with edges in nondeterministic order as

1199 -- explained in Note [Deterministic SCC] in Digraph.

1200 where

1210 -- Constructing the edges for the main Rec computation

1211 -- See Note [Forming Rec groups]

1216 -- Note [Rules are extra RHSs]

1217 -- Note [Rule dependency info]

1219 Just unf_uds -> rhs_usage2 `andUDs` unf_uds

1220 Nothing -> rhs_usage2

1221 node_fvs = udFreeVars bndr_set rhs_usage3

1223 -- Finding the free variables of the rules

1232 -- See Note [Preventing loops due to imported functions rules]

1239 -- Finding the usage details of the INLINE pragma (if any)

1240 mb_unf_uds = occAnalUnfolding env Recursive bndr

1242 -- Find the "nd_inl" free vars; for the loop-breaker phase

1245 Just unf_uds -> udFreeVars bndr_set unf_uds

1246 -- We could check for an *active* INLINE (returning

1247 -- emptyVarSet for an inactive one), but is_active

1248 -- isn't the right thing (it tells about

1249 -- RULE activation), so we'd need more plumbing

1252 -> VarSet

1257 -- Does four things

1258 -- a) tag each binder with its occurrence info

1259 -- b) add a NodeScore to each node

1260 -- c) make a Node with the right dependency edges for

1261 -- the loop-breaker SCC analysis

1262 -- d) adjust each RHS's usage details according to

1263 -- the binder's (new) shotness and join-point-hood

1264 mkLoopBreakerNodes env lvl bndr_set body_uds details_s

1266 where

1274 -- It's OK to use nonDetKeysUniqSet here as

1275 -- stronglyConnCompFromEdgedVerticesR is still deterministic with edges

1276 -- in nondeterministic order as explained in

1277 -- Note [Deterministic SCC] in Digraph.

1278 where

1281 lb_deps = extendFvs_ rule_fv_env inl_fvs

1283 rule_fv_env :: IdEnv IdSet

1284 -- Maps a variable f to the variables from this group

1285 -- mentioned in RHS of active rules for f

1286 -- Domain is *subset* of bound vars (others have no rule fvs)

1288 init_rule_fvs -- See Note [Finding rule RHS free vars]

1295 ------------------------------------------

1296 nodeScore :: OccEnv

1301 -> NodeScore

1302 nodeScore env old_bndr new_bndr bind_rhs lb_deps

1306 | old_bndr `elemVarSet` lb_deps -- Self-recursive things are great loop breakers

1312 | exprIsTrivial rhs

1314 -- Used to have also: && not (isExportedId bndr)

1315 -- But I found this sometimes cost an extra iteration when we have

1316 -- rec { d = (a,b); a = ...df...; b = ...df...; df = d }

1317 -- where df is the exported dictionary. Then df makes a really

1318 -- bad choice for loop breaker

1321 -- Never choose a DFun as a loop breaker

1322 -- Note [DFuns should not be loop breakers]

1325 -- Data structures are more important than INLINE pragmas

1326 -- so that dictionary/method recursion unravels

1331 | is_con_app rhs -- Data types help with cases:

1334 | isStableUnfolding id_unfolding

1335 , can_unfold

1341 | can_unfold -- The Id has some kind of unfolding

1344 | otherwise

1347 where

1354 | isStableSource src

1355 -> unf_rhs

1356 _ -> bind_rhs

1357 -- 'bind_rhs' is irrelevant for inlining things with a stable unfolding

1361 -> size

1362 _ -> cheapExprSize rhs

1364 can_unfold = canUnfold id_unfolding

1365 id_unfolding = realIdUnfolding old_bndr

1366 -- realIdUnfolding: Ignore loop-breaker-ness here because

1367 -- that is what we are setting!

1369 -- Checking for a constructor application

1370 -- Cheap and cheerful; the simplifier moves casts out of the way

1371 -- The lambda case is important to spot x = /\a. C (f a)

1372 -- which comes up when C is a dictionary constructor and

1373 -- f is a default method.

1374 -- Example: the instance for Show (ST s a) in GHC.ST

1375 --

1376 -- However we *also* treat (\x. C p q) as a con-app-like thing,

1377 -- Note [Closure conversion]

1388 -- Maxes out at maxExprSize

1389 cheapExprSize e

1391 where

1403 | isTyVar b = go1 n e

1408 gos n [] = n

1413 -- If n1 `betterLB` n2 then choose n1 as the loop breaker

1422 {- Note [Self-recursion and loop breakers]

1423 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1424 If we have

1425 rec { f = ...f...g...

1426 ; g = .....f... }

1427 then 'f' has to be a loop breaker anyway, so we may as well choose it

1428 right away, so that g can inline freely.

1430 This is really just a cheap hack. Consider

1431 rec { f = ...g...

1432 ; g = ..f..h...

1433 ; h = ...f....}

1434 Here f or g are better loop breakers than h; but we might accidentally

1435 choose h. Finding the minimal set of loop breakers is hard.

1437 Note [Loop breakers, node scoring, and stability]

1438 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1439 To choose a loop breaker, we give a NodeScore to each node in the SCC,

1440 and pick the one with the best score (according to 'betterLB').

1442 We need to be jolly careful (#12425, #12234) about the stability

1443 of this choice. Suppose we have

1445 let rec { f = ...g...g...

1446 ; g = ...f...f... }

1447 in

1448 case x of

1449 True -> ...f..

1450 False -> ..f...

1452 In each iteration of the simplifier the occurrence analyser OccAnal

1453 chooses a loop breaker. Suppose in iteration 1 it choose g as the loop

1454 breaker. That means it is free to inline f.

1456 Suppose that GHC decides to inline f in the branches of the case, but

1457 (for some reason; eg it is not saturated) in the rhs of g. So we get

1459 let rec { f = ...g...g...

1460 ; g = ...f...f... }

1461 in

1462 case x of

1463 True -> ...g...g.....

1464 False -> ..g..g....

1466 Now suppose that, for some reason, in the next iteration the occurrence

1467 analyser chooses f as the loop breaker, so it can freely inline g. And

1468 again for some reason the simplifier inlines g at its calls in the case

1469 branches, but not in the RHS of f. Then we get

1471 let rec { f = ...g...g...

1472 ; g = ...f...f... }

1473 in

1474 case x of

1475 True -> ...(...f...f...)...(...f..f..).....

1476 False -> ..(...f...f...)...(..f..f...)....

1478 You can see where this is going! Each iteration of the simplifier

1479 doubles the number of calls to f or g. No wonder GHC is slow!

1481 (In the particular example in comment:3 of #12425, f and g are the two

1482 mutually recursive fmap instances for CondT and Result. They are both

1483 marked INLINE which, oddly, is why they don't inline in each other's

1484 RHS, because the call there is not saturated.)

1486 The root cause is that we flip-flop on our choice of loop breaker. I

1487 always thought it didn't matter, and indeed for any single iteration

1488 to terminate, it doesn't matter. But when we iterate, it matters a

1489 lot!!

1491 So The Plan is this:

1492 If there is a tie, choose the node that

1493 was a loop breaker last time round

1495 Hence the is_lb field of NodeScore

1497 ************************************************************************

1498 * *

1499 Right hand sides

1500 * *

1501 ************************************************************************

1502 -}

1506 -- Returned usage details covers only the RHS,

1507 -- and *not* the RULE or INLINE template for the Id

1508 occAnalRhs env Recursive _ bndrs body

1509 = occAnalRecRhs env bndrs body

1510 occAnalRhs env NonRecursive id bndrs body

1515 -- Returned usage details covers only the RHS,

1516 -- and *not* the RULE or INLINE template for the Id

1519 occAnalNonRecRhs :: OccEnv

1521 -- Binder is already tagged with occurrence info

1523 -- Returned usage details covers only the RHS,

1524 -- and *not* the RULE or INLINE template for the Id

1525 occAnalNonRecRhs env bndr bndrs body

1526 = occAnalLamOrRhs rhs_env bndrs body

1527 where

1532 -- See Note [Sources of one-shot information]

1535 certainly_inline -- See Note [Cascading inlines]

1541 is_join_point = isAlwaysTailCalled occ

1542 -- Like (isJoinId bndr) but happens one step earlier

1543 -- c.f. willBeJoinId_maybe

1545 occ = idOccInfo bndr

1546 dmd = idDemandInfo bndr

1550 occAnalUnfolding :: OccEnv

1551 -> RecFlag

1552 -> Id

1554 -- Just the analysis, not a new unfolding. The unfolding

1555 -- got analysed when it was created and we don't need to

1556 -- update it.

1557 occAnalUnfolding env rec_flag id

1561 -> Nothing

1562 | otherwise

1564 where

1570 where

1573 _ -> Nothing

1575 occAnalRules :: OccEnv

1577 -- point, what its join arity is (or WOULD

1578 -- become). See Note [Rules and join points].

1579 -> RecFlag

1580 -> Id

1584 occAnalRules env mb_expected_join_arity rec_flag id

1587 where

1590 where

1595 -- Note [Rules are extra RHSs]

1596 -- Note [Rule dependency info]

1599 occ_anal_rule _

1602 adjust_tail_info args uds -- see Note [Rules and join points]

1604 Just ar | args `lengthIs` ar -> uds

1605 _ -> markAllNonTailCalled uds

1606 {- Note [Join point RHSs]

1607 ~~~~~~~~~~~~~~~~~~~~~~~~~

1608 Consider

1609 x = e

1610 join j = Just x

1612 We want to inline x into j right away, so we don't want to give

1613 the join point a RhsCtxt (#14137). It's not a huge deal, because

1614 the FloatIn pass knows to float into join point RHSs; and the simplifier

1615 does not float things out of join point RHSs. But it's a simple, cheap

1616 thing to do. See #14137.

1618 Note [Cascading inlines]

1619 ~~~~~~~~~~~~~~~~~~~~~~~~

1620 By default we use an rhsCtxt for the RHS of a binding. This tells the

1621 occ anal n that it's looking at an RHS, which has an effect in

1622 occAnalApp. In particular, for constructor applications, it makes

1623 the arguments appear to have NoOccInfo, so that we don't inline into

1624 them. Thus x = f y

1625 k = Just x

1626 we do not want to inline x.

1628 But there's a problem. Consider

1629 x1 = a0 : []

1630 x2 = a1 : x1

1631 x3 = a2 : x2

1632 g = f x3

1633 First time round, it looks as if x1 and x2 occur as an arg of a

1634 let-bound constructor ==> give them a many-occurrence.

1635 But then x3 is inlined (unconditionally as it happens) and

1636 next time round, x2 will be, and the next time round x1 will be

1637 Result: multiple simplifier iterations. Sigh.

1639 So, when analysing the RHS of x3 we notice that x3 will itself

1640 definitely inline the next time round, and so we analyse x3's rhs in

1641 an ordinary context, not rhsCtxt. Hence the "certainly_inline" stuff.

1643 Annoyingly, we have to approximate SimplUtils.preInlineUnconditionally.

1644 If (a) the RHS is expandable (see isExpandableApp in occAnalApp), and

1645 (b) certainly_inline says "yes" when preInlineUnconditionally says "no"

1646 then the simplifier iterates indefinitely:

1647 x = f y

1648 k = Just x -- We decide that k is 'certainly_inline'

1649 v = ...k... -- but preInlineUnconditionally doesn't inline it

1650 inline ==>

1651 k = Just (f y)

1652 v = ...k...

1653 float ==>

1654 x1 = f y

1655 k = Just x1

1656 v = ...k...

1658 This is worse than the slow cascade, so we only want to say "certainly_inline"

1659 if it really is certain. Look at the note with preInlineUnconditionally

1660 for the various clauses.

1663 ************************************************************************

1664 * *

1665 Expressions

1666 * *

1667 ************************************************************************

1668 -}

1670 occAnal :: OccEnv

1671 -> CoreExpr

1673 CoreExpr)

1678 -- At one stage, I gathered the idRuleVars for the variable here too,

1679 -- which in a way is the right thing to do.

1680 -- But that went wrong right after specialisation, when

1681 -- the *occurrences* of the overloaded function didn't have any

1682 -- rules in them, so the *specialised* versions looked as if they

1683 -- weren't used at all.

1687 -- See Note [Gather occurrences of coercion variables]

1689 {-

1690 Note [Gather occurrences of coercion variables]

1691 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1692 We need to gather info about what coercion variables appear, so that

1693 we can sort them into the right place when doing dependency analysis.

1694 -}

1697 | SourceNote{} <- tickish

1699 -- SourceNotes are best-effort; so we just proceed as usual.

1700 -- If we drop a tick due to the issues described below it's

1701 -- not the end of the world.

1703 | tickish `tickishScopesLike` SoftScope

1706 | Breakpoint _ ids <- tickish

1708 -- never substitute for any of the Ids in a Breakpoint

1710 | otherwise

1712 where

1714 -- for a non-soft tick scope, we can inline lambdas only

1716 -- TODO There may be ways to make ticks and join points play

1717 -- nicer together, but right now there are problems:

1718 -- let j x = ... in tick<t> (j 1)

1719 -- Making j a join point may cause the simplifier to drop t

1720 -- (if the tick is put into the continuation). So we don't

1721 -- count j 1 as a tail call.

1722 -- See #14242.

1727 -- usage1: if we see let x = y `cast` co

1728 -- then mark y as 'Many' so that we don't

1729 -- immediately inline y again.

1731 -- usage2: see Note [Gather occurrences of coercion variables]

1733 }

1738 -- Ignore type variables altogether

1739 -- (a) occurrences inside type lambdas only not marked as InsideLam

1740 -- (b) type variables not in environment

1743 | isTyVar x

1746 }

1748 -- For value lambdas we do a special hack. Consider

1749 -- (\x. \y. ...x...)

1750 -- If we did nothing, x is used inside the \y, so would be marked

1751 -- as dangerous to dup. But in the common case where the abstraction

1752 -- is applied to two arguments this is over-pessimistic.

1753 -- So instead, we just mark each binder with its occurrence

1754 -- info in the *body* of the multiple lambda.

1755 -- Then, the simplifier is careful when partially applying lambdas.

1759 let

1761 usage1 = markAllNonTailCalled usage

1763 final_usage | one_shot_gp = usage1

1765 in

1767 where

1773 let

1776 total_usage = markAllNonTailCalled scrut_usage `andUDs` alts_usage1

1777 -- Alts can have tail calls, but the scrutinee can't

1778 in

1780 where

1781 alt_env = mkAltEnv env scrut bndr

1782 occ_anal_alt = occAnalAlt alt_env

1787 -- The 'True' says that the variable occurs in an interesting

1788 -- context; the case has at least one non-default alternative

1790 | t `tickishScopesLike` SoftScope

1791 -- No reason to not look through all ticks here, but only

1792 -- for soft-scoped ticks we can do so without having to

1793 -- update returned occurance info (see occAnal)

1796 occ_anal_scrut scrut _alts

1801 case occAnalBind env NotTopLevel

1802 noImpRuleEdges bind

1807 occAnalArgs _ [] _

1811 | isTypeArg arg

1815 | otherwise

1821 {-

1822 Applications are dealt with specially because we want

1823 the "build hack" to work.

1825 Note [Arguments of let-bound constructors]

1826 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1827 Consider

1828 f x = let y = expensive x in

1829 let z = (True,y) in

1830 (case z of {(p,q)->q}, case z of {(p,q)->q})

1831 We feel free to duplicate the WHNF (True,y), but that means

1832 that y may be duplicated thereby.

1834 If we aren't careful we duplicate the (expensive x) call!

1835 Constructors are rather like lambdas in this way.

1836 -}

1838 occAnalApp :: OccEnv

1844 where

1845 uds = fun_uds `andUDs` final_args_uds

1848 !final_args_uds

1850 markAllInsideLam args_uds

1852 -- We mark the free vars of the argument of a constructor or PAP

1853 -- as "inside-lambda", if it is the RHS of a let(rec).

1854 -- This means that nothing gets inlined into a constructor or PAP

1855 -- argument position, which is what we want. Typically those

1856 -- constructor arguments are just variables, or trivial expressions.

1857 -- We use inside-lam because it's like eta-expanding the PAP.

1858 --

1859 -- This is the *whole point* of the isRhsEnv predicate

1860 -- See Note [Arguments of let-bound constructors]

1862 n_val_args = valArgCount args

1865 is_exp = isExpandableApp fun n_val_args

1866 -- See Note [CONLIKE pragma] in BasicTypes

1867 -- The definition of is_exp should match that in Simplify.prepareRhs

1872 -- See Note [Sources of one-shot information], bullet point A']

1877 where

1879 -- The addAppCtxt is a bit cunning. One iteration of the simplifier

1880 -- often leaves behind beta redexs like

1881 -- (\x y -> e) a1 a2

1882 -- Here we would like to mark x,y as one-shot, and treat the whole

1883 -- thing much like a let. We do this by pushing some True items

1884 -- onto the context stack.

1889 -> UsageDetails

1893 {-

1894 Note [Sources of one-shot information]

1895 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1896 The occurrence analyser obtains one-shot-lambda information from two sources:

1898 A: Saturated applications: eg f e1 .. en

1900 In general, given a call (f e1 .. en) we can propagate one-shot info from

1901 f's strictness signature into e1 .. en, but /only/ if n is enough to

1902 saturate the strictness signature. A strictness signature like

1904 f :: C1(C1(L))LS

1906 means that *if f is applied to three arguments* then it will guarantee to

1907 call its first argument at most once, and to call the result of that at

1908 most once. But if f has fewer than three arguments, all bets are off; e.g.

1910 map (f (\x y. expensive) e2) xs

1912 Here the \x y abstraction may be called many times (once for each element of

1913 xs) so we should not mark x and y as one-shot. But if it was

1915 map (f (\x y. expensive) 3 2) xs

1917 then the first argument of f will be called at most once.

1919 The one-shot info, derived from f's strictness signature, is

1920 computed by 'argsOneShots', called in occAnalApp.

1922 A': Non-obviously saturated applications: eg build (f (\x y -> expensive))

1923 where f is as above.

1925 In this case, f is only manifestly applied to one argument, so it does not

1926 look saturated. So by the previous point, we should not use its strictness

1927 signature to learn about the one-shotness of \x y. But in this case we can:

1928 build is fully applied, so we may use its strictness signature; and from

1929 that we learn that build calls its argument with two arguments *at most once*.

1931 So there is really only one call to f, and it will have three arguments. In

1932 that sense, f is saturated, and we may proceed as described above.

1934 Hence the computation of 'guaranteed_val_args' in occAnalApp, using

1935 '(occ_one_shots env)'. See also #13227, comment:9

1937 B: Let-bindings: eg let f = \c. let ... in \n -> blah

1938 in (build f, build f)

1940 Propagate one-shot info from the demanand-info on 'f' to the

1941 lambdas in its RHS (which may not be syntactically at the top)

1943 This information must have come from a previous run of the demanand

1944 analyser.

1946 Previously, the demand analyser would *also* set the one-shot information, but

1947 that code was buggy (see #11770), so doing it only in on place, namely here, is

1948 saner.

1950 Note [OneShots]

1951 ~~~~~~~~~~~~~~~

1952 When analysing an expression, the occ_one_shots argument contains information

1953 about how the function is being used. The length of the list indicates

1954 how many arguments will eventually be passed to the analysed expression,

1955 and the OneShotInfo indicates whether this application is once or multiple times.

1957 Example:

1959 Context of f occ_one_shots when analysing f

1961 f 1 2 [OneShot, OneShot]

1962 map (f 1) [OneShot, NoOneShotInfo]

1963 build f [OneShot, OneShot]

1964 f 1 2 `seq` f 2 1 [NoOneShotInfo, OneShot]

1966 Note [Binders in case alternatives]

1967 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1968 Consider

1969 case x of y { (a,b) -> f y }

1970 We treat 'a', 'b' as dead, because they don't physically occur in the

1971 case alternative. (Indeed, a variable is dead iff it doesn't occur in

1972 its scope in the output of OccAnal.) It really helps to know when

1973 binders are unused. See esp the call to isDeadBinder in

1974 Simplify.mkDupableAlt

1976 In this example, though, the Simplifier will bring 'a' and 'b' back to

1977 life, beause it binds 'y' to (a,b) (imagine got inlined and

1978 scrutinised y).

1979 -}

1983 occAnalLamOrRhs env [] body

1985 -- RHS of thunk or nullary join point

1987 | isTyVar bndr

1989 -- \(@ x) -> C @x (f @x)

1990 -- (see the beginning of Note [Cascading inlines]).

1993 occAnalLamOrRhs env binders body

1995 let

1997 -- Use binders' to put one-shot info on the lambdas

1998 in

2000 where

2004 -> CoreAlt

2008 let

2010 -- See Note [Binders in case alternatives]

2012 in

2015 wrapAltRHS :: OccEnv

2022 | occ_binder_swap env

2024 -- handles condition (a) in Note [Binder swap]

2028 where

2031 -- The rhs of the let may include coercion variables

2032 -- if the scrutinee was a cast, so we must gather their

2033 -- usage. See Note [Gather occurrences of coercion variables]

2034 -- Moreover, the rhs of the let may mention the case-binder, and

2035 -- we want to gather its occ-info as well

2040 wrapAltRHS _ _ alt_usg _ alt_rhs

2043 {-

2044 ************************************************************************

2045 * *

2046 OccEnv

2047 * *

2048 ************************************************************************

2049 -}

2051 data OccEnv

2059 -- See Note [Finding rule RHS free vars]

2062 -- See CorePrep Note [Dead code in CorePrep]

2063 }

2067 -----------------------------

2068 -- OccEncl is used to control whether to inline into constructor arguments

2069 -- For example:

2070 -- x = (p,q) -- Don't inline p or q

2071 -- y = /\a -> (p a, q a) -- Still don't inline p or q

2072 -- z = f (p,q) -- Do inline p,q; it may make a rule fire

2073 -- So OccEncl tells enough about the context to know what to do when

2074 -- we encounter a constructor application or PAP.

2076 data OccEncl

2078 -- Don't inline into constructor args here

2079 | OccVanilla -- Argument of function, body of lambda, scruintee of case etc.

2080 -- Do inline into constructor args here

2086 -- See note [OneShots]

2089 initOccEnv :: OccEnv

2090 initOccEnv

2094 -- To be conservative, we say that all

2095 -- inlines and rules are active

2107 argCtxt env []

2117 -> ( OccEnv

2119 -- The result binders have one-shot-ness set that they might not have had originally.

2120 -- This happens in (build (\c n -> e)). Here the occurrence analyser

2121 -- linearity context knows that c,n are one-shot, and it records that fact in

2122 -- the binder. This is useful to guide subsequent float-in/float-out tranformations

2126 where

2127 go ctxt [] rev_bndrs

2131 go [] bndrs rev_bndrs

2138 where

2139 bndr' = updOneShotInfo bndr one_shot

2140 -- Use updOneShotInfo, not setOneShotInfo, as pre-existing

2141 -- one-shot info might be better than what we can infer, e.g.

2142 -- due to explicit use of the magic 'oneShot' function.

2143 -- See Note [The oneShot function]

2147 -- Mark the lambdas of a non-recursive join point as one-shot.

2148 -- This is good to prevent gratuitous float-out etc

2149 markJoinOneShots mb_join_arity bndrs

2151 Nothing -> bndrs

2152 Just n -> go n bndrs

2153 where

2156 -- e.g. let j = case ... in j True

2157 -- This will become an arity-1 join point after the

2158 -- simplifier has eta-expanded it; but it may not have

2159 -- enough lambdas /yet/. (Lint checks that JoinIds do

2160 -- have enough lambdas.)

2162 where

2171 -- If (f,g), (g,h) are in the input, then (f,h) is in the output

2172 -- as well as (f,g), (g,h)

2173 transClosureFV env

2174 | no_change = env

2176 where

2178 -- It's OK to use nonDetUFMToList here because we'll forget the

2179 -- ordering by creating a new set with listToUFM

2183 where

2186 -------------

2191 -- (extendFVs env s) returns

2192 -- (s `union` env(s), env(s) `subset` s)

2193 extendFvs env s

2194 | isNullUFM env

2196 | otherwise

2198 where

2201 -- It's OK to use nonDetFoldUFM here because unionVarSet commutes

2204 {-

2205 ************************************************************************

2206 * *

2207 Binder swap

2208 * *

2209 ************************************************************************

2211 Note [Binder swap]

2212 ~~~~~~~~~~~~~~~~~~

2213 The "binder swap" tranformation swaps occurence of the

2214 scrutinee of a case for occurrences of the case-binder:

2216 (1) case x of b { pi -> ri }

2217 ==>

2218 case x of b { pi -> let x=b in ri }

2220 (2) case (x |> co) of b { pi -> ri }

2221 ==>

2222 case (x |> co) of b { pi -> let x = b |> sym co in ri }

2224 In both cases, the trivial 'let' can be eliminated by the

2225 immediately following simplifier pass.

2227 There are two reasons for making this swap:

2229 (A) It reduces the number of occurrences of the scrutinee, x.

2230 That in turn might reduce its occurrences to one, so we

2231 can inline it and save an allocation. E.g.

2232 let x = factorial y in case x of b { I# v -> ...x... }

2233 If we replace 'x' by 'b' in the alternative we get

2234 let x = factorial y in case x of b { I# v -> ...b... }

2235 and now we can inline 'x', thus

2236 case (factorial y) of b { I# v -> ...b... }

2238 (B) The case-binder b has unfolding information; in the

2239 example above we know that b = I# v. That in turn allows

2240 nested cases to simplify. Consider

2241 case x of b { I# v ->

2242 ...(case x of b2 { I# v2 -> rhs })...

2243 If we replace 'x' by 'b' in the alternative we get

2244 case x of b { I# v ->

2245 ...(case b of b2 { I# v2 -> rhs })...

2246 and now it is trivial to simplify the inner case:

2247 case x of b { I# v ->

2248 ...(let b2 = b in rhs)...

2250 The same can happen even if the scrutinee is a variable

2251 with a cast: see Note [Case of cast]

2253 In both cases, in a particular alternative (pi -> ri), we only

2254 add the binding if

2255 (a) x occurs free in (pi -> ri)

2256 (ie it occurs in ri, but is not bound in pi)

2257 (b) the pi does not bind b (or the free vars of co)

2258 We need (a) and (b) for the inserted binding to be correct.

2260 For the alternatives where we inject the binding, we can transfer

2261 all x's OccInfo to b. And that is the point.

2263 Notice that

2264 * The deliberate shadowing of 'x'.

2265 * That (a) rapidly becomes false, so no bindings are injected.

2267 The reason for doing these transformations /here in the occurrence

2268 analyser/ is because it allows us to adjust the OccInfo for 'x' and

2269 'b' as we go.

2271 * Suppose the only occurrences of 'x' are the scrutinee and in the

2272 ri; then this transformation makes it occur just once, and hence

2273 get inlined right away.

2275 * If instead we do this in the Simplifier, we don't know whether 'x'

2276 is used in ri, so we are forced to pessimistically zap b's OccInfo

2277 even though it is typically dead (ie neither it nor x appear in

2278 the ri). There's nothing actually wrong with zapping it, except

2279 that it's kind of nice to know which variables are dead. My nose

2280 tells me to keep this information as robustly as possible.

2282 The Maybe (Id,CoreExpr) passed to occAnalAlt is the extra let-binding

2283 {x=b}; it's Nothing if the binder-swap doesn't happen.

2285 There is a danger though. Consider

2286 let v = x +# y

2287 in case (f v) of w -> ...v...v...

2288 And suppose that (f v) expands to just v. Then we'd like to

2289 use 'w' instead of 'v' in the alternative. But it may be too

2290 late; we may have substituted the (cheap) x+#y for v in the

2291 same simplifier pass that reduced (f v) to v.

2293 I think this is just too bad. CSE will recover some of it.

2295 Note [Case of cast]

2296 ~~~~~~~~~~~~~~~~~~~

2297 Consider case (x `cast` co) of b { I# ->

2298 ... (case (x `cast` co) of {...}) ...

2299 We'd like to eliminate the inner case. That is the motivation for

2300 equation (2) in Note [Binder swap]. When we get to the inner case, we

2301 inline x, cancel the casts, and away we go.

2303 Note [Binder swap on GlobalId scrutinees]

2304 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

2305 When the scrutinee is a GlobalId we must take care in two ways

2307 i) In order to *know* whether 'x' occurs free in the RHS, we need its

2308 occurrence info. BUT, we don't gather occurrence info for

2309 GlobalIds. That's the reason for the (small) occ_gbl_scrut env in

2310 OccEnv is for: it says "gather occurrence info for these".

2312 ii) We must call localiseId on 'x' first, in case it's a GlobalId, or

2313 has an External Name. See, for example, SimplEnv Note [Global Ids in

2314 the substitution].

2316 Note [Zap case binders in proxy bindings]

2317 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

2318 From the original

2319 case x of cb(dead) { p -> ...x... }

2320 we will get

2321 case x of cb(live) { p -> let x = cb in ...x... }

2323 Core Lint never expects to find an *occurrence* of an Id marked

2324 as Dead, so we must zap the OccInfo on cb before making the

2325 binding x = cb. See #5028.

2327 NB: the OccInfo on /occurrences/ really doesn't matter much; the simplifier

2328 doesn't use it. So this is only to satisfy the perhpas-over-picky Lint.

2330 Historical note [no-case-of-case]

2331 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

2332 We *used* to suppress the binder-swap in case expressions when

2333 -fno-case-of-case is on. Old remarks:

2334 "This happens in the first simplifier pass,

2335 and enhances full laziness. Here's the bad case:

2336 f = \ y -> ...(case x of I# v -> ...(case x of ...) ... )

2337 If we eliminate the inner case, we trap it inside the I# v -> arm,

2338 which might prevent some full laziness happening. I've seen this

2339 in action in spectral/cichelli/Prog.hs:

2340 [(m,n) | m <- [1..max], n <- [1..max]]

2341 Hence the check for NoCaseOfCase."

2342 However, now the full-laziness pass itself reverses the binder-swap, so this

2343 check is no longer necessary.

2345 Historical note [Suppressing the case binder-swap]

2346 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

2347 This old note describes a problem that is also fixed by doing the

2348 binder-swap in OccAnal:

2350 There is another situation when it might make sense to suppress the

2351 case-expression binde-swap. If we have

2353 case x of w1 { DEFAULT -> case x of w2 { A -> e1; B -> e2 }

2354 ...other cases .... }

2356 We'll perform the binder-swap for the outer case, giving

2358 case x of w1 { DEFAULT -> case w1 of w2 { A -> e1; B -> e2 }

2359 ...other cases .... }

2361 But there is no point in doing it for the inner case, because w1 can't

2362 be inlined anyway. Furthermore, doing the case-swapping involves

2363 zapping w2's occurrence info (see paragraphs that follow), and that

2364 forces us to bind w2 when doing case merging. So we get

2366 case x of w1 { A -> let w2 = w1 in e1

2367 B -> let w2 = w1 in e2

2368 ...other cases .... }

2370 This is plain silly in the common case where w2 is dead.

2372 Even so, I can't see a good way to implement this idea. I tried

2373 not doing the binder-swap if the scrutinee was already evaluated

2374 but that failed big-time:

2376 data T = MkT !Int

2378 case v of w { MkT x ->

2379 case x of x1 { I# y1 ->

2380 case x of x2 { I# y2 -> ...

2382 Notice that because MkT is strict, x is marked "evaluated". But to

2383 eliminate the last case, we must either make sure that x (as well as

2384 x1) has unfolding MkT y1. The straightforward thing to do is to do

2385 the binder-swap. So this whole note is a no-op.

2387 It's fixed by doing the binder-swap in OccAnal because we can do the

2388 binder-swap unconditionally and still get occurrence analysis

2389 information right.

2390 -}

2393 -- Does three things: a) makes the occ_one_shots = OccVanilla

2394 -- b) extends the GlobalScruts if possible

2395 -- c) returns a proxy mapping, binding the scrutinee

2396 -- to the case binder, if possible

2401 -- See Note [Case of cast]

2404 where

2405 add_scrut v rhs

2410 -- ToDO: this isGlobalId stuff is a TEMPORARY FIX

2411 -- to avoid the binder-swap for GlobalIds

2412 -- See #16346

2415 -- See Note [Zap case binders in proxy bindings]

2417 -- Localise the scrut_var before shadowing it; we're making a

2418 -- new binding for it, and it might have an External Name, or

2419 -- even be a GlobalId; Note [Binder swap on GlobalId scrutinees]

2420 -- Also we don't want any INLINE or NOINLINE pragmas!

2424 {-

2425 ************************************************************************

2426 * *

2427 \subsection[OccurAnal-types]{OccEnv}

2428 * *

2429 ************************************************************************

2431 Note [UsageDetails and zapping]

2432 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

2434 On many occasions, we must modify all gathered occurrence data at once. For

2435 instance, all occurrences underneath a (non-one-shot) lambda set the

2436 'occ_in_lam' flag to become 'True'. We could use 'mapVarEnv' to do this, but

2437 that takes O(n) time and we will do this often---in particular, there are many

2438 places where tail calls are not allowed, and each of these causes all variables

2439 to get marked with 'NoTailCallInfo'.

2441 Instead of relying on `mapVarEnv`, then, we carry three 'IdEnv's around along

2442 with the 'OccInfoEnv'. Each of these extra environments is a "zapped set"

2443 recording which variables have been zapped in some way. Zapping all occurrence

2444 info then simply means setting the corresponding zapped set to the whole

2445 'OccInfoEnv', a fast O(1) operation.

2446 -}

2449 -- INVARIANT: never IAmDead

2450 -- (Deadness is signalled by not being in the map at all)

2454 data UsageDetails

2459 -- INVARIANT: All three zapped sets are subsets of the OccInfoEnv

2464 -------------------

2465 -- UsageDetails API

2467 andUDs, orUDs

2469 andUDs = combineUsageDetailsWith addOccInfo

2470 orUDs = combineUsageDetailsWith orOccInfo

2476 mkOneOcc env id int_cxt arity

2477 | isLocalId id

2482 | id `elemVarSet` occ_gbl_scrut env

2483 = singleton noOccInfo

2485 | otherwise

2486 = emptyDetails

2487 where

2491 addOneOcc ud id info

2494 where

2498 addManyOccsSet usage id_set = nonDetFoldUniqSet addManyOccs usage id_set

2499 -- It's OK to use nonDetFoldUFM here because addManyOccs commutes

2501 -- Add several occurrences, assumed not to be tail calls

2503 addManyOccs v u | isId v = addOneOcc u v noOccInfo

2505 -- Give a non-committal binder info (i.e noOccInfo) because

2506 -- a) Many copies of the specialised thing can appear

2507 -- b) We don't want to substitute a BIG expression inside a RULE

2508 -- even if that's the only occurrence of the thing

2509 -- (Same goes for INLINE.)

2512 delDetails ud bndr

2516 delDetailsList ud bndrs

2519 emptyDetails :: UsageDetails

2537 lookupDetails ud id

2540 -- See Note [DoO not mark CoVars as dead]

2541 | otherwise

2544 Nothing -> IAmDead

2547 v `usedIn` ud = isExportedId v || v `elemVarEnv` ud_env ud

2550 -- Find the subset of bndrs that are mentioned in uds

2553 {- Note [Do not mark CoVars as dead]

2554 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

2555 It's obviously wrong to mark CoVars as dead if they are used.

2556 Currently we don't traverse types to gather usase info for CoVars,

2557 so we had better treat them as having noOccInfo.

2559 This showed up in #15696 we had something like

2560 case eq_sel d of co -> ...(typeError @(...co...) "urk")...

2562 Then 'd' was substitued by a dictionary, so the expression

2563 simpified to

2564 case (Coercion <blah>) of co -> ...(typeError @(...co...) "urk")...

2566 But then the "drop the case altogether" equation of rebuildCase

2567 thought that 'co' was dead, and discarded the entire case. Urk!

2569 I have no idea how we managed to avoid this pitfall for so long!

2570 -}

2572 -------------------

2573 -- Auxiliary functions for UsageDetails implementation

2577 combineUsageDetailsWith plus_occ_info ud1 ud2

2578 | isEmptyDetails ud1 = ud2

2579 | isEmptyDetails ud2 = ud1

2580 | otherwise

2587 doZapping ud var occ

2591 doZappingByUnique ud uniq

2593 | in_subset ud_z_in_lam -> markInsideLam

2597 where

2598 in_subset field = uniq `elemVarEnvByKey` field ud

2601 alterZappedSets ud f

2607 alterUsageDetails ud f

2609 `alterZappedSets` f

2612 flattenUsageDetails ud

2614 `alterZappedSets` const emptyVarEnv

2616 -------------------

2617 -- See Note [Adjusting right-hand sides]

2621 adjustRhsUsage mb_join_arity rec_flag bndrs usage

2623 where

2624 maybe_mark_lam ud | one_shot = ud

2626 maybe_drop_tails ud | exact_join = ud

2630 Just join_arity

2636 Just join_arity -> bndrs `lengthIs` join_arity

2645 tagLamBinders usage binders

2647 where

2654 -- Used for lambda and case binders

2655 -- It copes with the fact that lambda bindings can have a

2656 -- stable unfolding, used for join points

2657 tagLamBinder usage bndr

2659 where

2660 occ = lookupDetails usage bndr

2662 -- Don't try to make an argument into a join point

2663 usage1 = usage `delDetails` bndr

2665 -- This is effectively the RHS of a

2666 -- non-join-point binding, so it's okay to use

2667 -- addManyOccsSet, which assumes no tail calls

2676 tagNonRecBinder lvl usage binder

2678 occ = lookupDetails usage binder

2684 usage' = usage `delDetails` binder

2685 in

2694 -- with binders removed

2696 -- Substantially more complicated than non-recursive case. Need to adjust RHS

2697 -- details *before* tagging binders (because the tags depend on the RHSes).

2698 tagRecBinders lvl body_uds triples

2702 -- 1. Determine join-point-hood of whole group, as determined by

2703 -- the *unadjusted* usage details

2705 will_be_joins = decideJoinPointHood lvl unadj_uds bndrs

2707 -- 2. Adjust usage details of each RHS, taking into account the

2708 -- join-point-hood decision

2711 = adjustRhsUsage mb_join_arity Recursive rhs_bndrs rhs_uds

2712 where

2713 -- Can't use willBeJoinId_maybe here because we haven't tagged the

2714 -- binder yet (the tag depends on these adjustments!)

2715 mb_join_arity

2716 | will_be_joins

2719 = Just arity

2720 | otherwise

2722 Nothing -- we are making join points!

2724 -- 3. Compute final usage details from adjusted RHS details

2727 -- 4. Tag each binder with its adjusted details

2731 -- 5. Drop the binders from the adjusted details and return

2732 usage' = adj_uds `delDetailsList` bndrs

2733 in

2737 setBinderOcc occ_info bndr

2738 | isTyVar bndr = bndr

2740 then bndr

2741 else setIdOccInfo bndr noOccInfo

2742 -- Don't use local usage info for visible-elsewhere things

2743 -- BUT *do* erase any IAmALoopBreaker annotation, because we're

2744 -- about to re-generate it and it shouldn't be "sticky"

2748 -- | Decide whether some bindings should be made into join points or not.

2749 -- Returns `False` if they can't be join points. Note that it's an

2750 -- all-or-nothing decision, as if multiple binders are given, they're

2751 -- assumed to be mutually recursive.

2752 --

2753 -- It must, however, be a final decision. If we say "True" for 'f',

2754 -- and then subsequently decide /not/ make 'f' into a join point, then

2755 -- the decision about another binding 'g' might be invalidated if (say)

2756 -- 'f' tail-calls 'g'.

2757 --

2758 -- See Note [Invariants on join points] in CoreSyn.

2762 decideJoinPointHood TopLevel _ _

2764 decideJoinPointHood NotTopLevel usage bndrs

2767 ppr bndrs)

2768 all_ok

2769 | otherwise

2770 = all_ok

2771 where

2772 -- See Note [Invariants on join points]; invariants cited by number below.

2773 -- Invariant 2 is always satisfiable by the simplifier by eta expansion.

2775 all ok bndrs

2777 ok bndr

2778 | -- Invariant 1: Only tail calls, all same join arity

2784 -- Invariant 2a: stable unfoldings

2785 -- See Note [Join points and INLINE pragmas]

2788 -- Invariant 4: Satisfies polymorphism rule

2792 | otherwise

2797 = args `lengthIs` join_arity

2798 -- Invariant 1 as applied to LHSes of rules

2800 -- ok_unfolding returns False if we should /not/ convert a non-join-id

2801 -- into a join-id, even though it is AlwaysTailCalled

2806 ok_unfolding _ _

2810 willBeJoinId_maybe bndr

2812 AlwaysTailCalled arity -> Just arity

2813 _ -> isJoinId_maybe bndr

2816 {- Note [Join points and INLINE pragmas]

2817 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

2818 Consider

2819 f x = let g = \x. not -- Arity 1

2820 {-# INLINE g #-}

2824 C -> blah2

2846 ************************************************************************

2847 * *

2849 * *

2850 ************************************************************************

2851 -}

2855 markMany IAmDead = IAmDead

2859 markInsideLam occ = occ

2861 markNonTailCalled IAmDead = IAmDead

2868 tailCallInfo a2 }

2869 -- Both branches are at least One

2870 -- (Argument is never IAmDead)

2872 -- (orOccInfo orig new) is used

2873 -- when combining occurrence info from branches of a case

2886 tailCallInfo a2 }

2891 andTailCallInfo _ _ = NoTailCallInfo