1 --

2 -- Copyright (c) 2014 Joachim Breitner

3 --

5 module CallArity

6 ( callArityAnalProgram

10 import GhcPrelude

12 import VarSet

13 import VarEnv

16 import BasicTypes

17 import CoreSyn

18 import Id

21 import UnVarGraph

22 import Demand

23 import Util

28 {-

29 %************************************************************************

30 %* *

31 Call Arity Analysis

32 %* *

33 %************************************************************************

35 Note [Call Arity: The goal]

36 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

38 The goal of this analysis is to find out if we can eta-expand a local function,

39 based on how it is being called. The motivating example is this code,

40 which comes up when we implement foldl using foldr, and do list fusion:

42 let go = \x -> let d = case ... of

43 False -> go (x+1)

44 True -> id

45 in \z -> d (x + z)

46 in go 1 0

48 If we do not eta-expand `go` to have arity 2, we are going to allocate a lot of

49 partial function applications, which would be bad.

51 The function `go` has a type of arity two, but only one lambda is manifest.

52 Furthermore, an analysis that only looks at the RHS of go cannot be sufficient

53 to eta-expand go: If `go` is ever called with one argument (and the result used

54 multiple times), we would be doing the work in `...` multiple times.

56 So `callArityAnalProgram` looks at the whole let expression to figure out if

57 all calls are nice, i.e. have a high enough arity. It then stores the result in

58 the `calledArity` field of the `IdInfo` of `go`, which the next simplifier

59 phase will eta-expand.

61 The specification of the `calledArity` field is:

63 No work will be lost if you eta-expand me to the arity in `calledArity`.

65 What we want to know for a variable

66 -----------------------------------

68 For every let-bound variable we'd like to know:

69 1. A lower bound on the arity of all calls to the variable, and

70 2. whether the variable is being called at most once or possible multiple

71 times.

73 It is always ok to lower the arity, or pretend that there are multiple calls.

74 In particular, "Minimum arity 0 and possible called multiple times" is always

75 correct.

78 What we want to know from an expression

79 ---------------------------------------

81 In order to obtain that information for variables, we analyze expression and

82 obtain bits of information:

84 I. The arity analysis:

85 For every variable, whether it is absent, or called,

86 and if called, which what arity.

88 II. The Co-Called analysis:

89 For every two variables, whether there is a possibility that both are being

90 called.

91 We obtain as a special case: For every variables, whether there is a

92 possibility that it is being called twice.

94 For efficiency reasons, we gather this information only for a set of

95 *interesting variables*, to avoid spending time on, e.g., variables from pattern matches.

97 The two analysis are not completely independent, as a higher arity can improve

98 the information about what variables are being called once or multiple times.

100 Note [Analysis I: The arity analysis]

101 ------------------------------------

103 The arity analysis is quite straight forward: The information about an

104 expression is an

105 VarEnv Arity

106 where absent variables are bound to Nothing and otherwise to a lower bound to

107 their arity.

109 When we analyze an expression, we analyze it with a given context arity.

110 Lambdas decrease and applications increase the incoming arity. Analysizing a

111 variable will put that arity in the environment. In lets or cases all the

112 results from the various subexpressions are lubed, which takes the point-wise

113 minimum (considering Nothing an infinity).

116 Note [Analysis II: The Co-Called analysis]

117 ------------------------------------------

119 The second part is more sophisticated. For reasons explained below, it is not

120 sufficient to simply know how often an expression evaluates a variable. Instead

121 we need to know which variables are possibly called together.

123 The data structure here is an undirected graph of variables, which is provided

124 by the abstract

125 UnVarGraph

127 It is safe to return a larger graph, i.e. one with more edges. The worst case

128 (i.e. the least useful and always correct result) is the complete graph on all

129 free variables, which means that anything can be called together with anything

130 (including itself).

132 Notation for the following:

133 C(e) is the co-called result for e.

134 G₁∪G₂ is the union of two graphs

135 fv is the set of free variables (conveniently the domain of the arity analysis result)

136 S₁×S₂ is the complete bipartite graph { {a,b} | a ∈ S₁, b ∈ S₂ }

137 S² is the complete graph on the set of variables S, S² = S×S

138 C'(e) is a variant for bound expression:

139 If e is called at most once, or it is and stays a thunk (after the analysis),

140 it is simply C(e). Otherwise, the expression can be called multiple times

141 and we return (fv e)²

143 The interesting cases of the analysis:

144 * Var v:

145 No other variables are being called.

146 Return {} (the empty graph)

147 * Lambda v e, under arity 0:

148 This means that e can be evaluated many times and we cannot get

149 any useful co-call information.

150 Return (fv e)²

151 * Case alternatives alt₁,alt₂,...:

152 Only one can be execuded, so

153 Return (alt₁ ∪ alt₂ ∪...)

154 * App e₁ e₂ (and analogously Case scrut alts), with non-trivial e₂:

155 We get the results from both sides, with the argument evaluated at most once.

156 Additionally, anything called by e₁ can possibly be called with anything

157 from e₂.

158 Return: C(e₁) ∪ C(e₂) ∪ (fv e₁) × (fv e₂)

159 * App e₁ x:

160 As this is already in A-normal form, CorePrep will not separately lambda

161 bind (and hence share) x. So we conservatively assume multiple calls to x here

162 Return: C(e₁) ∪ (fv e₁) × {x} ∪ {(x,x)}

163 * Let v = rhs in body:

164 In addition to the results from the subexpressions, add all co-calls from

165 everything that the body calls together with v to everthing that is called

166 by v.

167 Return: C'(rhs) ∪ C(body) ∪ (fv rhs) × {v'| {v,v'} ∈ C(body)}

168 * Letrec v₁ = rhs₁ ... vₙ = rhsₙ in body

169 Tricky.

170 We assume that it is really mutually recursive, i.e. that every variable

171 calls one of the others, and that this is strongly connected (otherwise we

172 return an over-approximation, so that's ok), see note [Recursion and fixpointing].

174 Let V = {v₁,...vₙ}.

175 Assume that the vs have been analysed with an incoming demand and

176 cardinality consistent with the final result (this is the fixed-pointing).

177 Again we can use the results from all subexpressions.

178 In addition, for every variable vᵢ, we need to find out what it is called

179 with (call this set Sᵢ). There are two cases:

180 * If vᵢ is a function, we need to go through all right-hand-sides and bodies,

181 and collect every variable that is called together with any variable from V:

182 Sᵢ = {v' | j ∈ {1,...,n}, {v',vⱼ} ∈ C'(rhs₁) ∪ ... ∪ C'(rhsₙ) ∪ C(body) }

183 * If vᵢ is a thunk, then its rhs is evaluated only once, so we need to

184 exclude it from this set:

185 Sᵢ = {v' | j ∈ {1,...,n}, j≠i, {v',vⱼ} ∈ C'(rhs₁) ∪ ... ∪ C'(rhsₙ) ∪ C(body) }

186 Finally, combine all this:

187 Return: C(body) ∪

188 C'(rhs₁) ∪ ... ∪ C'(rhsₙ) ∪

189 (fv rhs₁) × S₁) ∪ ... ∪ (fv rhsₙ) × Sₙ)

191 Using the result: Eta-Expansion

192 -------------------------------

194 We use the result of these two analyses to decide whether we can eta-expand the

195 rhs of a let-bound variable.

197 If the variable is already a function (exprIsCheap), and all calls to the

198 variables have a higher arity than the current manifest arity (i.e. the number

199 of lambdas), expand.

201 If the variable is a thunk we must be careful: Eta-Expansion will prevent

202 sharing of work, so this is only safe if there is at most one call to the

203 function. Therefore, we check whether {v,v} ∈ G.

205 Example:

207 let n = case .. of .. -- A thunk!

208 in n 0 + n 1

210 vs.

212 let n = case .. of ..

213 in case .. of T -> n 0

214 F -> n 1

216 We are only allowed to eta-expand `n` if it is going to be called at most

217 once in the body of the outer let. So we need to know, for each variable

218 individually, that it is going to be called at most once.

221 Why the co-call graph?

222 ----------------------

224 Why is it not sufficient to simply remember which variables are called once and

225 which are called multiple times? It would be in the previous example, but consider

227 let n = case .. of ..

228 in case .. of

229 True -> let go = \y -> case .. of

230 True -> go (y + n 1)

231 False > n

232 in go 1

233 False -> n

235 vs.

237 let n = case .. of ..

238 in case .. of

239 True -> let go = \y -> case .. of

240 True -> go (y+1)

241 False > n

242 in go 1

243 False -> n

245 In both cases, the body and the rhs of the inner let call n at most once.

246 But only in the second case that holds for the whole expression! The

247 crucial difference is that in the first case, the rhs of `go` can call

248 *both* `go` and `n`, and hence can call `n` multiple times as it recurses,

249 while in the second case find out that `go` and `n` are not called together.

252 Why co-call information for functions?

253 --------------------------------------

255 Although for eta-expansion we need the information only for thunks, we still

256 need to know whether functions are being called once or multiple times, and

257 together with what other functions.

259 Example:

261 let n = case .. of ..

262 f x = n (x+1)

263 in f 1 + f 2

265 vs.

267 let n = case .. of ..

268 f x = n (x+1)

269 in case .. of T -> f 0

270 F -> f 1

272 Here, the body of f calls n exactly once, but f itself is being called

273 multiple times, so eta-expansion is not allowed.

276 Note [Analysis type signature]

277 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

279 The work-hourse of the analysis is the function `callArityAnal`, with the

280 following type:

282 type CallArityRes = (UnVarGraph, VarEnv Arity)

283 callArityAnal ::

284 Arity -> -- The arity this expression is called with

285 VarSet -> -- The set of interesting variables

286 CoreExpr -> -- The expression to analyse

287 (CallArityRes, CoreExpr)

289 and the following specification:

291 ((coCalls, callArityEnv), expr') = callArityEnv arity interestingIds expr

293 <=>

295 Assume the expression `expr` is being passed `arity` arguments. Then it holds that

296 * The domain of `callArityEnv` is a subset of `interestingIds`.

297 * Any variable from `interestingIds` that is not mentioned in the `callArityEnv`

298 is absent, i.e. not called at all.

299 * Every call from `expr` to a variable bound to n in `callArityEnv` has at

300 least n value arguments.

301 * For two interesting variables `v1` and `v2`, they are not adjacent in `coCalls`,

302 then in no execution of `expr` both are being called.

303 Furthermore, expr' is expr with the callArity field of the `IdInfo` updated.

306 Note [Which variables are interesting]

307 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

309 The analysis would quickly become prohibitive expensive if we would analyse all

310 variables; for most variables we simply do not care about how often they are

311 called, i.e. variables bound in a pattern match. So interesting are variables that are

312 * top-level or let bound

313 * and possibly functions (typeArity > 0)

315 Note [Taking boring variables into account]

316 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

318 If we decide that the variable bound in `let x = e1 in e2` is not interesting,

319 the analysis of `e2` will not report anything about `x`. To ensure that

320 `callArityBind` does still do the right thing we have to take that into account

321 everytime we would be lookup up `x` in the analysis result of `e2`.

322 * Instead of calling lookupCallArityRes, we return (0, True), indicating

323 that this variable might be called many times with no arguments.

324 * Instead of checking `calledWith x`, we assume that everything can be called

325 with it.

326 * In the recursive case, when calclulating the `cross_calls`, if there is

327 any boring variable in the recursive group, we ignore all co-call-results

328 and directly go to a very conservative assumption.

330 The last point has the nice side effect that the relatively expensive

331 integration of co-call results in a recursive groups is often skipped. This

332 helped to avoid the compile time blowup in some real-world code with large

333 recursive groups (#10293).

335 Note [Recursion and fixpointing]

336 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

338 For a mutually recursive let, we begin by

339 1. analysing the body, using the same incoming arity as for the whole expression.

340 2. Then we iterate, memoizing for each of the bound variables the last

341 analysis call, i.e. incoming arity, whether it is called once, and the CallArityRes.

342 3. We combine the analysis result from the body and the memoized results for

343 the arguments (if already present).

344 4. For each variable, we find out the incoming arity and whether it is called

345 once, based on the current analysis result. If this differs from the

346 memoized results, we re-analyse the rhs and update the memoized table.

347 5. If nothing had to be reanalyzed, we are done.

348 Otherwise, repeat from step 3.

351 Note [Thunks in recursive groups]

352 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

354 We never eta-expand a thunk in a recursive group, on the grounds that if it is

355 part of a recursive group, then it will be called multiple times.

357 This is not necessarily true, e.g. it would be safe to eta-expand t2 (but not

358 t1) in the following code:

360 let go x = t1

361 t1 = if ... then t2 else ...

362 t2 = if ... then go 1 else ...

363 in go 0

365 Detecting this would require finding out what variables are only ever called

366 from thunks. While this is certainly possible, we yet have to see this to be

367 relevant in the wild.

370 Note [Analysing top-level binds]

371 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

373 We can eta-expand top-level-binds if they are not exported, as we see all calls

374 to them. The plan is as follows: Treat the top-level binds as nested lets around

375 a body representing “all external calls”, which returns a pessimistic

376 CallArityRes (the co-call graph is the complete graph, all arityies 0).

378 Note [Trimming arity]

379 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

381 In the Call Arity papers, we are working on an untyped lambda calculus with no

382 other id annotations, where eta-expansion is always possible. But this is not

383 the case for Core!

384 1. We need to ensure the invariant

385 callArity e <= typeArity (exprType e)

386 for the same reasons that exprArity needs this invariant (see Note

387 [exprArity invariant] in CoreArity).

389 If we are not doing that, a too-high arity annotation will be stored with

390 the id, confusing the simplifier later on.

392 2. Eta-expanding a right hand side might invalidate existing annotations. In

393 particular, if an id has a strictness annotation of <...><...>b, then

394 passing two arguments to it will definitely bottom out, so the simplifier

395 will throw away additional parameters. This conflicts with Call Arity! So

396 we ensure that we never eta-expand such a value beyond the number of

397 arguments mentioned in the strictness signature.

398 See #10176 for a real-world-example.

400 Note [What is a thunk]

401 ~~~~~~~~~~~~~~~~~~~~~~

403 Originally, everything that is not in WHNF (`exprIsWHNF`) is considered a

404 thunk, not eta-expanded, to avoid losing any sharing. This is also how the

405 published papers on Call Arity describe it.

407 In practice, there are thunks that do a just little work, such as

408 pattern-matching on a variable, and the benefits of eta-expansion likely

409 oughtweigh the cost of doing that repeatedly. Therefore, this implementation of

410 Call Arity considers everything that is not cheap (`exprIsCheap`) as a thunk.

412 Note [Call Arity and Join Points]

413 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

415 The Call Arity analysis does not care about join points, and treats them just

416 like normal functions. This is ok.

418 The analysis *could* make use of the fact that join points are always evaluated

419 in the same context as the join-binding they are defined in and are always

420 one-shot, and handle join points separately, as suggested in

421 https://ghc.haskell.org/trac/ghc/ticket/13479#comment:10.

422 This *might* be more efficient (for example, join points would not have to be

423 considered interesting variables), but it would also add redundant code. So for

424 now we do not do that.

426 The simplifier never eta-expands join points (it instead pushes extra arguments from

427 an eta-expanded context into the join point’s RHS), so the call arity

428 annotation on join points is not actually used. As it would be equally valid

429 (though less efficient) to eta-expand join points, this is the simplifier's

430 choice, and hence Call Arity sets the call arity for join points as well.

431 -}

433 -- Main entry point

437 where

440 -- See Note [Analysing top-level-binds]

442 callArityTopLvl exported _ []

444 , [] )

447 where

448 int2 = bindersOf b

450 int' = int1 `addInterestingBinds` b

458 -- The main analysis function. See Note [Analysis type signature]

459 callArityAnal ::

464 -- How this expression uses its interesting variables

465 -- and the expression with IdInfo updated

467 -- The trivial base cases

474 -- The transparent cases

480 -- The interesting case: Variables, Lambdas, Lets, Applications, Cases

482 | v `elemVarSet` int

484 | otherwise

487 -- Non-value lambdas are ignored

491 -- We have a lambda that may be called multiple times, so its free variables

492 -- can all be co-called.

495 where

497 ae' = calledMultipleTimes ae

498 -- We have a lambda that we are calling. decrease arity.

501 where

504 -- Application. Increase arity for the called expression, nothing to know about

505 -- the second

510 where

513 -- If the argument is trivial (e.g. a variable), then it will _not_ be

514 -- let-bound in the Core to STG transformation (CorePrep actually),

515 -- so no sharing will happen here, and we have to assume many calls.

520 -- Case expression.

523 -- (vcat [ppr scrut, ppr final_ae])

525 where

529 alt_ae = lubRess alt_aes

531 final_ae = scrut_ae `both` alt_ae

533 -- For lets, use callArityBind

536 -- (vcat [ppr v, ppr arity, ppr n, ppr final_ae ])

538 where

539 int_body = int `addInterestingBinds` bind

543 -- Which bindings should we look at?

544 -- See Note [Which variables are interesting]

555 addInterestingBinds int bind

557 `extendVarSetList` interestingBinds bind

559 -- Used for both local and top-level binds

560 -- Second argument is the demand from the body

562 -- Non-recursive let

564 | otherwise

566 -- (vcat [ppr v, ppr ae_body, ppr int, ppr ae_rhs, ppr safe_arity])

568 where

570 -- If v is boring, we will not find it in ae_body, but always assume (0, False)

571 boring = v `elemVarSet` boring_vars

576 safe_arity | called_once = arity

580 -- See Note [Trimming arity]

581 trimmed_arity = trimArity v safe_arity

591 called_with_v

592 | boring = domRes ae_body

596 v' = v `setIdCallArity` trimmed_arity

599 -- Recursive let. See Note [Recursion and fixpointing]

602 -- pprTrace "callArityBind:Rec"

603 -- (vcat [ppr (Rec binds'), ppr ae_body, ppr int, ppr ae_rhs]) else id) $

605 where

606 -- See Note [Taking boring variables into account]

609 int_body = int `addInterestingBinds` b

611 final_ae = bindersOf b `resDelList` ae_rhs

616 fix ann_binds

617 | -- pprTrace "callArityBind:fix" (vcat [ppr ann_binds, ppr any_change, ppr ae]) $

618 any_change

620 | otherwise

622 where

624 ae = callArityRecEnv any_boring aes_old ae_body

628 -- No call to this yet, so do nothing

634 -- No change, no need to re-analyze

637 | otherwise

638 -- We previously analyzed this with a different arity (or not at all)

644 -- See Note [Trimming arity]

645 trimmed_arity = trimArity i safe_arity

653 i' = i `setIdCallArity` trimmed_arity

656 where

657 -- See Note [Taking boring variables into account]

664 -- Combining the results from body and rhs, (mutually) recursive case

665 -- See Note [Analysis II: The Co-Called analysis]

667 callArityRecEnv any_boring ae_rhss ae_body

668 = -- (if length ae_rhss > 300 then pprTrace "callArityRecEnv" (vcat [ppr ae_rhss, ppr ae_body, ppr ae_new]) else id) $

669 ae_new

670 where

675 cross_calls

676 -- See Note [Taking boring variables into account]

678 -- Also, calculating cross_calls is expensive. Simply be conservative

679 -- if the mutually recursive group becomes too large.

683 where

685 -- What rhs are relevant as happening before (or after) calling v?

686 -- If v is a thunk, everything from all the _other_ variables

687 -- If v is not a thunk, everything can happen.

690 -- What do we want to know from these?

691 -- Which calls can happen next to any recursive call.

692 called_with_v

694 called_by_v = domRes ae_rhs

698 -- See Note [Trimming arity]

701 where

703 max_arity_by_strsig

709 ---------------------------------------

710 -- Functions related to CallArityRes --

711 ---------------------------------------

713 -- Result type for the two analyses.

714 -- See Note [Analysis I: The arity analysis]

715 -- and Note [Analysis II: The Co-Called analysis]

718 emptyArityRes :: CallArityRes

733 -- In the result, find out the minimum arity and whether the variable is called

734 -- at most once.

747 -- Replaces the co-call graph by a complete graph (i.e. no information)

751 -- Used for application and cases

755 -- Used when combining results from alternative cases; take the minimum