1 --

2 -- Copyright (c) 2014 Joachim Breitner

3 --

5 module CallArity

6 ( callArityAnalProgram

10 import VarSet

11 import VarEnv

14 import BasicTypes

15 import CoreSyn

16 import Id

19 import Outputable

24 {-

25 %************************************************************************

26 %* *

27 Call Arity Analyis

28 %* *

29 %************************************************************************

31 Note [Call Arity: The goal]

32 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

34 The goal of this analysis is to find out if we can eta-expand a local function,

35 based on how it is being called. The motivating example is code this this,

36 which comes up when we implement foldl using foldr, and do list fusion:

38 let go = \x -> let d = case ... of

39 False -> go (x+1)

40 True -> id

41 in \z -> d (x + z)

42 in go 1 0

44 If we do not eta-expand `go` to have arity 2, we are going to allocate a lot of

45 partial function applications, which would be bad.

47 The function `go` has a type of arity two, but only one lambda is manifest.

48 Further more, an analysis that only looks at the RHS of go cannot be sufficient

49 to eta-expand go: If `go` is ever called with one argument (and the result used

50 multiple times), we would be doing the work in `...` multiple times.

52 So `callArityAnalProgram` looks at the whole let expression to figure out if

53 all calls are nice, i.e. have a high enough arity. It then stores the result in

54 the `calledArity` field of the `IdInfo` of `go`, which the next simplifier

55 phase will eta-expand.

57 The specification of the `calledArity` field is:

59 No work will be lost if you eta-expand me to the arity in `calledArity`.

61 The specification of the analysis

62 ---------------------------------

64 The analysis only does a conservative approximation, there are plenty of

65 situations where eta-expansion would be ok, but we do not catch it. We are

66 content if all the code that foldl-via-foldr generates is being optimized

67 sufficiently.

69 The work-hourse of the analysis is the function `callArityAnal`, with the

70 following type:

72 data Count = Many | OnceAndOnly

73 type CallCount = (Count, Arity)

74 type CallArityEnv = VarEnv (CallCount, Arity)

75 callArityAnal ::

76 Arity -> -- The arity this expression is called with

77 VarSet -> -- The set of interesting variables

78 CoreExpr -> -- The expression to analyse

79 (CallArityEnv, CoreExpr)

81 and the following specification:

83 (callArityEnv, expr') = callArityEnv arity interestingIds expr

85 <=>

87 Assume the expression `expr` is being passed `arity` arguments. Then it calls

88 the functions mentioned in `interestingIds` according to `callArityEnv`:

89 * The domain of `callArityEnv` is a subset of `interestingIds`.

90 * Any variable from interestingIds that is not mentioned in the `callArityEnv`

91 is absent, i.e. not called at all.

92 * Of all the variables that are mapped to OnceAndOnly by the `callArityEnv`,

93 at most one is being called, at most once, with at least that many

94 arguments.

95 * Variables mapped to Many are called an unknown number of times, but if they

96 are called, then with at least that many arguments.

97 Furthermore, expr' is expr with the callArity field of the `IdInfo` updated.

99 The (pointwise) domain is a product domain:

101 Many 0

102 | × |

103 OneAndOnly 1

104 |

105 ...

107 The at-most-once is important for various reasons:

109 1. Consider:

111 let n = case .. of .. -- A thunk!

112 in n 0 + n 1

114 vs.

116 let n = case .. of ..

117 in case .. of T -> n 0

118 F -> n 1

120 We are only allowed to eta-expand `n` if it is going to be called at most

121 once in the body of the outer let. So we need to know, for each variable

122 individually, that it is going to be called at most once.

124 2. We need to know it for non-thunks as well, because they might call a thunk:

126 let n = case .. of ..

127 f x = n (x+1)

128 in f 1 + f 2

130 vs.

132 let n = case .. of ..

133 f x = n (x+1)

134 in case .. of T -> f 0

135 F -> f 1

137 Here, the body of f calls n exactly once, but f itself is being called

138 multiple times, so eta-expansion is not allowed.

140 3. We need to know that at most one of the interesting functions is being

141 called, because of recursion. Consider:

143 let n = case .. of ..

144 in case .. of

145 True -> let go = \y -> case .. of

146 True -> go (y + n 1)

147 False > n

148 in go 1

149 False -> n

151 vs.

153 let n = case .. of ..

154 in case .. of

155 True -> let go = \y -> case .. of

156 True -> go (y+1)

157 False > n

158 in go 1

159 False -> n

161 In both cases, the body and the rhs of the inner let call n at most once.

162 But only in the second case that holds for the whole expression! The

163 crucial difference is that in the first case, the rhs of `go` can call

164 *both* `go` and `n`, and hence can call `n` multiple times as it recurses,

165 while in the second case it calls `go` or `n`, but not both.

167 Note [Which variables are interesting]

168 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

170 Unfortunately, the set of interesting variables is not irrelevant for the

171 precision of the analysis. Consider this example (and ignore the pointlessnes

172 of `d` recursing into itself):

174 let n = ... :: Int

175 in let d = let d = case ... of

176 False -> d

177 True -> id

178 in \z -> d (x + z)

179 in d 0

181 Of course, `d` should be interesting. If we consider `n` as interesting as

182 well, then the body of the second let will return

183 { go |-> (Many, 1) , n |-> (OnceAndOnly, 0) }

184 or

185 { go |-> (OnceAndOnly, 1), n |-> (Many, 0)}.

186 Only the latter is useful, but it is hard to decide that locally.

187 (Returning OnceAndOnly for both would be wrong, as both are being called.)

189 So the heuristics is:

191 Variables are interesting if their RHS has a lower exprArity than

192 typeArity.

194 (which is precisely the those variables where this analysis can actually cause

195 some eta-expansion.)

197 But this is not uniformly a win. Consider:

199 let go = \x -> let d = case ... of

200 False -> go (x+1)

201 True -> id

202 n x = d (x+1)

203 in \z -> n (x + z)

204 in go n 0

206 Now `n` is not going to be considered interesting (its type is `Int -> Int`).

207 But this will prevent us from detecting how often the body of the let calls

208 `d`, and we will not find out anything.

210 It might be possible to be smarter here; this needs find-tuning as we find more

211 examples.

214 Note [Recursion and fixpointing]

215 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

217 For a recursive let, we begin by analysing the body, using the same incoming

218 arity as for the whole expression.

219 * We use the arity from the body on the variable as the incoming demand on the

220 rhs. Then we check if the rhs calls itself with the same arity.

221 - If so, we are done.

222 - If not, we re-analise the rhs with the reduced arity. We do that until

223 we are down to the exprArity, which then is certainly correct.

224 * If the rhs calls itself many times, we must (conservatively) pass the result

225 through forgetOnceCalls.

226 * Similarly, if the body calls the variable many times, we must pass the

227 result of the fixpointing through forgetOnceCalls.

228 * Then we can `lubEnv` the results from the body and the rhs: If all mentioned

229 calls are OnceAndOnly calls, then the body calls *either* the rhs *or* one

230 of the other mentioned variables. Similarly, the rhs calls *either* itself

231 again *or* one of the other mentioned variables. This precision is required!

232 If the recursive function is called by the body, or the rhs, tagged with Many

233 then we can also just `lubEnv`, because the result will no longer contain

234 any OnceAndOnly values.

236 Note [Case and App: Which side to take?]

237 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

239 Combining the case branches is easy, just `lubEnv` them – at most one branch is

240 taken.

242 But how to combine that with the information coming from the scrunitee? Very

243 similarly, how to combine the information from the callee and argument of an

244 `App`?

246 It would not be correct to just `lubEnv` then: `f n` obviously calls *both* `f`

247 and `n`. We need to forget about the cardinality of calls from one side using

248 `forgetOnceCalls`. But which one?

250 Both are correct, and sometimes one and sometimes the other is more precise

251 (also see example in [Which variables are interesting]).

253 So currently, we first check the scrunitee (resp. the callee) if the returned

254 value has any usesful information, and if so, we use that; otherwise we use the

255 information from the alternatives (resp. the argument).

257 It might be smarter to look for “more important” variables first, i.e. the

258 innermost recursive variable.

260 Note [Analysing top-level binds]

261 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

263 We can eta-expand top-level-binds if they are not exported, as we see all calls

264 to them. The plan is as follows: Treat the top-level binds as nested lets around

265 a body representing “all external calls”, which returns a CallArityEnv that calls

266 every exported function with the top of the lattice.

268 This means that the incoming arity on all top-level binds will have a Many

269 attached, and we will never eta-expand CAFs. Which is good.

271 -}

275 where

278 -- See Note [Analysing top-level-binds]

280 callArityTopLvl exported _ []

284 where

285 int2 = interestingBinds b

287 int' = int1 `addInterestingBinds` b

299 topCallCount :: CallCount

304 callArityAnal ::

309 -- How this expression uses its interesting variables

310 -- and the expression with IdInfo updated

312 -- The trivial base cases

319 -- The transparent cases

325 -- The interesting case: Variables, Lambdas, Lets, Applications, Cases

327 | v `elemVarSet` int

329 | otherwise

332 -- Non-value lambdas are ignored

336 -- We have a lambda that we are not sure to call. Tail calls therein

337 -- are no longer OneAndOnly calls

340 where

342 ae' = forgetOnceCalls ae

343 -- We have a lambda that we are calling. decrease arity.

346 where

349 -- For lets, use callArityBind

352 -- (vcat [ppr v, ppr arity, ppr n, ppr final_ae ])

354 where

355 int_body = int `addInterestingBinds` bind

360 -- Application. Increase arity for the called expresion, nothing to know about

361 -- the second

366 where

369 -- See Note [Case and App: Which side to take?]

370 final_ae = ae1 `useBetterOf` ae2

372 -- Case expression. Here we decide whether

373 -- we want to look at calls from the scrunitee or the alternatives;

374 -- one of them we set to Nothing.

375 -- Naive idea: If there are interesting calls in the scrunitee,

376 -- zap the alternatives

379 -- (vcat [ppr scrut, ppr final_ae])

381 where

387 -- See Note [Case and App: Which side to take?]

388 final_ae = scrut_ae `useBetterOf` alt_ae

390 -- Which bindings should we look at?

391 -- See Note [Which variables are interesting]

393 interestingBinds bind =

396 where

400 boringBinds bind =

403 where

407 addInterestingBinds int bind

409 `extendVarSetList` interestingBinds bind

412 addBoringCalls ae bind

415 -- Used for both local and top-level binds

416 -- First argument is the demand from the body

419 -- Non-recursive let

422 -- (vcat [ppr v, ppr ae_body, ppr int, ppr ae_rhs, ppr safe_arity])

424 where

425 callcount = lookupWithDefaultVarEnv ae_body topCallCount v

428 v' = v `setIdCallArity` safe_arity

430 -- Recursive let. See Note [Recursion and fixpointing]

433 where

434 int_body = int `addInterestingBinds` b

435 -- We are ignoring calls to boring binds, so we need to pretend them here!

436 ae_body' = ae_body `addBoringCalls` b

438 final_ae = ae_rhs `delVarEnvList` interestingBinds b

440 -- Here we do the fix-pointing for possibly mutually recursive values. The

441 -- idea is that we start with the calls coming from the body, and analyize

442 -- every called value with that arity, adding lub these calls into the

443 -- environment. We also remember for each variable the CallCount we analised it

444 -- with. Then we check for every variable if in the new envrionment, it is

445 -- called with a different (i.e. lower) arity. If so, we reanalize that, and

446 -- lub the result back into the environment. If we had a change for any of the

447 -- variables, we repeat this step, otherwise we are done.

448 callArityFix ::

452 callArityFix ae int ann_binds

453 | any_change

455 | otherwise

457 where

464 | mb_new_arity == mbArity

465 -- No change. No need to re-analize, and no need to change the arity

466 -- environment

469 | Just new_arity <- mb_new_arity

470 -- We previously analized this with a different arity (or not at all)

474 | otherwise

475 -- No call to this yet, so do nothing

477 where

478 mb_new_arity = lookupVarEnv ae i

484 -- This is a variant of callArityAnal that takes a CallCount (i.e. an arity with a

485 -- cardinality) and adjust the resulting environment accordingly. It is to be used

486 -- on bound expressions that can possibly be shared.

487 -- It also returns the safe arity used: For a thunk that is called multiple

488 -- times, this will be 0!

491 where

514 -- See Note [Case and App: Which side to take?]

516 useBetterOf ae1 ae2 | anyGoodCalls ae1 = ae1 `lubEnv` forgetOnceCalls ae2

524 lubCount OnceAndOnly OnceAndOnly = OnceAndOnly

525 lubCount _ _ = Many

527 -- Used when combining results from alternative cases; take the minimum

529 lubEnv = plusVarEnv_C lubCallCount