Typos in comments [skip ci]
[ghc.git] / compiler / specialise / SpecConstr.hs
1 {-
2 ToDo [Oct 2013]
3 ~~~~~~~~~~~~~~~
4 1. Nuke ForceSpecConstr for good (it is subsumed by GHC.Types.SPEC in ghc-prim)
5 2. Nuke NoSpecConstr
6
7
8 (c) The GRASP/AQUA Project, Glasgow University, 1992-1998
9
10 \section[SpecConstr]{Specialise over constructors}
11 -}
12
13 {-# LANGUAGE CPP #-}
14
15 module SpecConstr(
16 specConstrProgram
17 #ifdef GHCI
18 , SpecConstrAnnotation(..)
19 #endif
20 ) where
21
22 #include "HsVersions.h"
23
24 import CoreSyn
25 import CoreSubst
26 import CoreUtils
27 import CoreUnfold ( couldBeSmallEnoughToInline )
28 import CoreFVs ( exprsFreeVarsList )
29 import CoreMonad
30 import Literal ( litIsLifted )
31 import HscTypes ( ModGuts(..) )
32 import WwLib ( mkWorkerArgs )
33 import DataCon
34 import Coercion hiding( substCo )
35 import Rules
36 import Type hiding ( substTy )
37 import TyCon ( isRecursiveTyCon, tyConName )
38 import Id
39 import PprCore ( pprParendExpr )
40 import MkCore ( mkImpossibleExpr )
41 import Var
42 import VarEnv
43 import VarSet
44 import Name
45 import BasicTypes
46 import DynFlags ( DynFlags(..) )
47 import StaticFlags ( opt_PprStyle_Debug )
48 import Maybes ( orElse, catMaybes, isJust, isNothing )
49 import Demand
50 import GHC.Serialized ( deserializeWithData )
51 import Util
52 import Pair
53 import UniqSupply
54 import Outputable
55 import FastString
56 import UniqFM
57 import MonadUtils
58 import Control.Monad ( zipWithM )
59 import Data.List
60 import PrelNames ( specTyConName )
61 import Module
62
63 -- See Note [Forcing specialisation]
64 #ifndef GHCI
65 type SpecConstrAnnotation = ()
66 #else
67 import TyCon ( TyCon )
68 import GHC.Exts( SpecConstrAnnotation(..) )
69 #endif
70
71 {-
72 -----------------------------------------------------
73 Game plan
74 -----------------------------------------------------
75
76 Consider
77 drop n [] = []
78 drop 0 xs = []
79 drop n (x:xs) = drop (n-1) xs
80
81 After the first time round, we could pass n unboxed. This happens in
82 numerical code too. Here's what it looks like in Core:
83
84 drop n xs = case xs of
85 [] -> []
86 (y:ys) -> case n of
87 I# n# -> case n# of
88 0 -> []
89 _ -> drop (I# (n# -# 1#)) xs
90
91 Notice that the recursive call has an explicit constructor as argument.
92 Noticing this, we can make a specialised version of drop
93
94 RULE: drop (I# n#) xs ==> drop' n# xs
95
96 drop' n# xs = let n = I# n# in ...orig RHS...
97
98 Now the simplifier will apply the specialisation in the rhs of drop', giving
99
100 drop' n# xs = case xs of
101 [] -> []
102 (y:ys) -> case n# of
103 0 -> []
104 _ -> drop' (n# -# 1#) xs
105
106 Much better!
107
108 We'd also like to catch cases where a parameter is carried along unchanged,
109 but evaluated each time round the loop:
110
111 f i n = if i>0 || i>n then i else f (i*2) n
112
113 Here f isn't strict in n, but we'd like to avoid evaluating it each iteration.
114 In Core, by the time we've w/wd (f is strict in i) we get
115
116 f i# n = case i# ># 0 of
117 False -> I# i#
118 True -> case n of { I# n# ->
119 case i# ># n# of
120 False -> I# i#
121 True -> f (i# *# 2#) n
122
123 At the call to f, we see that the argument, n is known to be (I# n#),
124 and n is evaluated elsewhere in the body of f, so we can play the same
125 trick as above.
126
127
128 Note [Reboxing]
129 ~~~~~~~~~~~~~~~
130 We must be careful not to allocate the same constructor twice. Consider
131 f p = (...(case p of (a,b) -> e)...p...,
132 ...let t = (r,s) in ...t...(f t)...)
133 At the recursive call to f, we can see that t is a pair. But we do NOT want
134 to make a specialised copy:
135 f' a b = let p = (a,b) in (..., ...)
136 because now t is allocated by the caller, then r and s are passed to the
137 recursive call, which allocates the (r,s) pair again.
138
139 This happens if
140 (a) the argument p is used in other than a case-scrutinisation way.
141 (b) the argument to the call is not a 'fresh' tuple; you have to
142 look into its unfolding to see that it's a tuple
143
144 Hence the "OR" part of Note [Good arguments] below.
145
146 ALTERNATIVE 2: pass both boxed and unboxed versions. This no longer saves
147 allocation, but does perhaps save evals. In the RULE we'd have
148 something like
149
150 f (I# x#) = f' (I# x#) x#
151
152 If at the call site the (I# x) was an unfolding, then we'd have to
153 rely on CSE to eliminate the duplicate allocation.... This alternative
154 doesn't look attractive enough to pursue.
155
156 ALTERNATIVE 3: ignore the reboxing problem. The trouble is that
157 the conservative reboxing story prevents many useful functions from being
158 specialised. Example:
159 foo :: Maybe Int -> Int -> Int
160 foo (Just m) 0 = 0
161 foo x@(Just m) n = foo x (n-m)
162 Here the use of 'x' will clearly not require boxing in the specialised function.
163
164 The strictness analyser has the same problem, in fact. Example:
165 f p@(a,b) = ...
166 If we pass just 'a' and 'b' to the worker, it might need to rebox the
167 pair to create (a,b). A more sophisticated analysis might figure out
168 precisely the cases in which this could happen, but the strictness
169 analyser does no such analysis; it just passes 'a' and 'b', and hopes
170 for the best.
171
172 So my current choice is to make SpecConstr similarly aggressive, and
173 ignore the bad potential of reboxing.
174
175
176 Note [Good arguments]
177 ~~~~~~~~~~~~~~~~~~~~~
178 So we look for
179
180 * A self-recursive function. Ignore mutual recursion for now,
181 because it's less common, and the code is simpler for self-recursion.
182
183 * EITHER
184
185 a) At a recursive call, one or more parameters is an explicit
186 constructor application
187 AND
188 That same parameter is scrutinised by a case somewhere in
189 the RHS of the function
190
191 OR
192
193 b) At a recursive call, one or more parameters has an unfolding
194 that is an explicit constructor application
195 AND
196 That same parameter is scrutinised by a case somewhere in
197 the RHS of the function
198 AND
199 Those are the only uses of the parameter (see Note [Reboxing])
200
201
202 What to abstract over
203 ~~~~~~~~~~~~~~~~~~~~~
204 There's a bit of a complication with type arguments. If the call
205 site looks like
206
207 f p = ...f ((:) [a] x xs)...
208
209 then our specialised function look like
210
211 f_spec x xs = let p = (:) [a] x xs in ....as before....
212
213 This only makes sense if either
214 a) the type variable 'a' is in scope at the top of f, or
215 b) the type variable 'a' is an argument to f (and hence fs)
216
217 Actually, (a) may hold for value arguments too, in which case
218 we may not want to pass them. Supose 'x' is in scope at f's
219 defn, but xs is not. Then we'd like
220
221 f_spec xs = let p = (:) [a] x xs in ....as before....
222
223 Similarly (b) may hold too. If x is already an argument at the
224 call, no need to pass it again.
225
226 Finally, if 'a' is not in scope at the call site, we could abstract
227 it as we do the term variables:
228
229 f_spec a x xs = let p = (:) [a] x xs in ...as before...
230
231 So the grand plan is:
232
233 * abstract the call site to a constructor-only pattern
234 e.g. C x (D (f p) (g q)) ==> C s1 (D s2 s3)
235
236 * Find the free variables of the abstracted pattern
237
238 * Pass these variables, less any that are in scope at
239 the fn defn. But see Note [Shadowing] below.
240
241
242 NOTICE that we only abstract over variables that are not in scope,
243 so we're in no danger of shadowing variables used in "higher up"
244 in f_spec's RHS.
245
246
247 Note [Shadowing]
248 ~~~~~~~~~~~~~~~~
249 In this pass we gather up usage information that may mention variables
250 that are bound between the usage site and the definition site; or (more
251 seriously) may be bound to something different at the definition site.
252 For example:
253
254 f x = letrec g y v = let x = ...
255 in ...(g (a,b) x)...
256
257 Since 'x' is in scope at the call site, we may make a rewrite rule that
258 looks like
259 RULE forall a,b. g (a,b) x = ...
260 But this rule will never match, because it's really a different 'x' at
261 the call site -- and that difference will be manifest by the time the
262 simplifier gets to it. [A worry: the simplifier doesn't *guarantee*
263 no-shadowing, so perhaps it may not be distinct?]
264
265 Anyway, the rule isn't actually wrong, it's just not useful. One possibility
266 is to run deShadowBinds before running SpecConstr, but instead we run the
267 simplifier. That gives the simplest possible program for SpecConstr to
268 chew on; and it virtually guarantees no shadowing.
269
270 Note [Specialising for constant parameters]
271 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
272 This one is about specialising on a *constant* (but not necessarily
273 constructor) argument
274
275 foo :: Int -> (Int -> Int) -> Int
276 foo 0 f = 0
277 foo m f = foo (f m) (+1)
278
279 It produces
280
281 lvl_rmV :: GHC.Base.Int -> GHC.Base.Int
282 lvl_rmV =
283 \ (ds_dlk :: GHC.Base.Int) ->
284 case ds_dlk of wild_alH { GHC.Base.I# x_alG ->
285 GHC.Base.I# (GHC.Prim.+# x_alG 1)
286
287 T.$wfoo :: GHC.Prim.Int# -> (GHC.Base.Int -> GHC.Base.Int) ->
288 GHC.Prim.Int#
289 T.$wfoo =
290 \ (ww_sme :: GHC.Prim.Int#) (w_smg :: GHC.Base.Int -> GHC.Base.Int) ->
291 case ww_sme of ds_Xlw {
292 __DEFAULT ->
293 case w_smg (GHC.Base.I# ds_Xlw) of w1_Xmo { GHC.Base.I# ww1_Xmz ->
294 T.$wfoo ww1_Xmz lvl_rmV
295 };
296 0 -> 0
297 }
298
299 The recursive call has lvl_rmV as its argument, so we could create a specialised copy
300 with that argument baked in; that is, not passed at all. Now it can perhaps be inlined.
301
302 When is this worth it? Call the constant 'lvl'
303 - If 'lvl' has an unfolding that is a constructor, see if the corresponding
304 parameter is scrutinised anywhere in the body.
305
306 - If 'lvl' has an unfolding that is a inlinable function, see if the corresponding
307 parameter is applied (...to enough arguments...?)
308
309 Also do this is if the function has RULES?
310
311 Also
312
313 Note [Specialising for lambda parameters]
314 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
315 foo :: Int -> (Int -> Int) -> Int
316 foo 0 f = 0
317 foo m f = foo (f m) (\n -> n-m)
318
319 This is subtly different from the previous one in that we get an
320 explicit lambda as the argument:
321
322 T.$wfoo :: GHC.Prim.Int# -> (GHC.Base.Int -> GHC.Base.Int) ->
323 GHC.Prim.Int#
324 T.$wfoo =
325 \ (ww_sm8 :: GHC.Prim.Int#) (w_sma :: GHC.Base.Int -> GHC.Base.Int) ->
326 case ww_sm8 of ds_Xlr {
327 __DEFAULT ->
328 case w_sma (GHC.Base.I# ds_Xlr) of w1_Xmf { GHC.Base.I# ww1_Xmq ->
329 T.$wfoo
330 ww1_Xmq
331 (\ (n_ad3 :: GHC.Base.Int) ->
332 case n_ad3 of wild_alB { GHC.Base.I# x_alA ->
333 GHC.Base.I# (GHC.Prim.-# x_alA ds_Xlr)
334 })
335 };
336 0 -> 0
337 }
338
339 I wonder if SpecConstr couldn't be extended to handle this? After all,
340 lambda is a sort of constructor for functions and perhaps it already
341 has most of the necessary machinery?
342
343 Furthermore, there's an immediate win, because you don't need to allocate the lambda
344 at the call site; and if perchance it's called in the recursive call, then you
345 may avoid allocating it altogether. Just like for constructors.
346
347 Looks cool, but probably rare...but it might be easy to implement.
348
349
350 Note [SpecConstr for casts]
351 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
352 Consider
353 data family T a :: *
354 data instance T Int = T Int
355
356 foo n = ...
357 where
358 go (T 0) = 0
359 go (T n) = go (T (n-1))
360
361 The recursive call ends up looking like
362 go (T (I# ...) `cast` g)
363 So we want to spot the constructor application inside the cast.
364 That's why we have the Cast case in argToPat
365
366 Note [Local recursive groups]
367 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
368 For a *local* recursive group, we can see all the calls to the
369 function, so we seed the specialisation loop from the calls in the
370 body, not from the calls in the RHS. Consider:
371
372 bar m n = foo n (n,n) (n,n) (n,n) (n,n)
373 where
374 foo n p q r s
375 | n == 0 = m
376 | n > 3000 = case p of { (p1,p2) -> foo (n-1) (p2,p1) q r s }
377 | n > 2000 = case q of { (q1,q2) -> foo (n-1) p (q2,q1) r s }
378 | n > 1000 = case r of { (r1,r2) -> foo (n-1) p q (r2,r1) s }
379 | otherwise = case s of { (s1,s2) -> foo (n-1) p q r (s2,s1) }
380
381 If we start with the RHSs of 'foo', we get lots and lots of specialisations,
382 most of which are not needed. But if we start with the (single) call
383 in the rhs of 'bar' we get exactly one fully-specialised copy, and all
384 the recursive calls go to this fully-specialised copy. Indeed, the original
385 function is later collected as dead code. This is very important in
386 specialising the loops arising from stream fusion, for example in NDP where
387 we were getting literally hundreds of (mostly unused) specialisations of
388 a local function.
389
390 In a case like the above we end up never calling the original un-specialised
391 function. (Although we still leave its code around just in case.)
392
393 However, if we find any boring calls in the body, including *unsaturated*
394 ones, such as
395 letrec foo x y = ....foo...
396 in map foo xs
397 then we will end up calling the un-specialised function, so then we *should*
398 use the calls in the un-specialised RHS as seeds. We call these
399 "boring call patterns", and callsToPats reports if it finds any of these.
400
401 Note [Seeding top-level recursive groups]
402 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
403 This seeding is done in the binding for seed_calls in specRec.
404
405 1. If all the bindings in a top-level recursive group are local (not
406 exported), then all the calls are in the rest of the top-level
407 bindings. This means we can specialise with those call patterns
408 ONLY, and NOT with the RHSs of the recursive group (exactly like
409 Note [Local recursive groups])
410
411 2. But if any of the bindings are exported, the function may be called
412 with any old arguments, so (for lack of anything better) we specialise
413 based on
414 (a) the call patterns in the RHS
415 (b) the call patterns in the rest of the top-level bindings
416 NB: before Apr 15 we used (a) only, but Dimitrios had an example
417 where (b) was crucial, so I added that.
418 Adding (b) also improved nofib allocation results:
419 multiplier: 4% better
420 minimax: 2.8% better
421
422 Actually in case (2), instead of using the calls from the RHS, it
423 would be better to specialise in the importing module. We'd need to
424 add an INLINEABLE pragma to the function, and then it can be
425 specialised in the importing scope, just as is done for type classes
426 in Specialise.specImports. This remains to be done (#10346).
427
428 Note [Top-level recursive groups]
429 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
430 To get the call usage information from "the rest of the top level
431 bindings" (c.f. Note [Seeding top-level recursive groups]), we work
432 backwards through the top-level bindings so we see the usage before we
433 get to the binding of the function. Before we can collect the usage
434 though, we go through all the bindings and add them to the
435 environment. This is necessary because usage is only tracked for
436 functions in the environment. These two passes are called
437 'go' and 'goEnv'
438 in specConstrProgram. (Looks a bit revolting to me.)
439
440 Note [Do not specialise diverging functions]
441 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
442 Specialising a function that just diverges is a waste of code.
443 Furthermore, it broke GHC (simpl014) thus:
444 {-# STR Sb #-}
445 f = \x. case x of (a,b) -> f x
446 If we specialise f we get
447 f = \x. case x of (a,b) -> fspec a b
448 But fspec doesn't have decent strictness info. As it happened,
449 (f x) :: IO t, so the state hack applied and we eta expanded fspec,
450 and hence f. But now f's strictness is less than its arity, which
451 breaks an invariant.
452
453
454 Note [Forcing specialisation]
455 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
456
457 With stream fusion and in other similar cases, we want to fully
458 specialise some (but not necessarily all!) loops regardless of their
459 size and the number of specialisations.
460
461 We allow a library to do this, in one of two ways (one which is
462 deprecated):
463
464 1) Add a parameter of type GHC.Types.SPEC (from ghc-prim) to the loop body.
465
466 2) (Deprecated) Annotate a type with ForceSpecConstr from GHC.Exts,
467 and then add *that* type as a parameter to the loop body
468
469 The reason #2 is deprecated is because it requires GHCi, which isn't
470 available for things like a cross compiler using stage1.
471
472 Here's a (simplified) example from the `vector` package. You may bring
473 the special 'force specialization' type into scope by saying:
474
475 import GHC.Types (SPEC(..))
476
477 or by defining your own type (again, deprecated):
478
479 data SPEC = SPEC | SPEC2
480 {-# ANN type SPEC ForceSpecConstr #-}
481
482 (Note this is the exact same definition of GHC.Types.SPEC, just
483 without the annotation.)
484
485 After that, you say:
486
487 foldl :: (a -> b -> a) -> a -> Stream b -> a
488 {-# INLINE foldl #-}
489 foldl f z (Stream step s _) = foldl_loop SPEC z s
490 where
491 foldl_loop !sPEC z s = case step s of
492 Yield x s' -> foldl_loop sPEC (f z x) s'
493 Skip -> foldl_loop sPEC z s'
494 Done -> z
495
496 SpecConstr will spot the SPEC parameter and always fully specialise
497 foldl_loop. Note that
498
499 * We have to prevent the SPEC argument from being removed by
500 w/w which is why (a) SPEC is a sum type, and (b) we have to seq on
501 the SPEC argument.
502
503 * And lastly, the SPEC argument is ultimately eliminated by
504 SpecConstr itself so there is no runtime overhead.
505
506 This is all quite ugly; we ought to come up with a better design.
507
508 ForceSpecConstr arguments are spotted in scExpr' and scTopBinds which then set
509 sc_force to True when calling specLoop. This flag does four things:
510 * Ignore specConstrThreshold, to specialise functions of arbitrary size
511 (see scTopBind)
512 * Ignore specConstrCount, to make arbitrary numbers of specialisations
513 (see specialise)
514 * Specialise even for arguments that are not scrutinised in the loop
515 (see argToPat; Trac #4488)
516 * Only specialise on recursive types a finite number of times
517 (see is_too_recursive; Trac #5550; Note [Limit recursive specialisation])
518
519 This flag is inherited for nested non-recursive bindings (which are likely to
520 be join points and hence should be fully specialised) but reset for nested
521 recursive bindings.
522
523 What alternatives did I consider? Annotating the loop itself doesn't
524 work because (a) it is local and (b) it will be w/w'ed and having
525 w/w propagating annotations somehow doesn't seem like a good idea. The
526 types of the loop arguments really seem to be the most persistent
527 thing.
528
529 Annotating the types that make up the loop state doesn't work,
530 either, because (a) it would prevent us from using types like Either
531 or tuples here, (b) we don't want to restrict the set of types that
532 can be used in Stream states and (c) some types are fixed by the user
533 (e.g., the accumulator here) but we still want to specialise as much
534 as possible.
535
536 Alternatives to ForceSpecConstr
537 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
538 Instead of giving the loop an extra argument of type SPEC, we
539 also considered *wrapping* arguments in SPEC, thus
540 data SPEC a = SPEC a | SPEC2
541
542 loop = \arg -> case arg of
543 SPEC state ->
544 case state of (x,y) -> ... loop (SPEC (x',y')) ...
545 S2 -> error ...
546 The idea is that a SPEC argument says "specialise this argument
547 regardless of whether the function case-analyses it". But this
548 doesn't work well:
549 * SPEC must still be a sum type, else the strictness analyser
550 eliminates it
551 * But that means that 'loop' won't be strict in its real payload
552 This loss of strictness in turn screws up specialisation, because
553 we may end up with calls like
554 loop (SPEC (case z of (p,q) -> (q,p)))
555 Without the SPEC, if 'loop' were strict, the case would move out
556 and we'd see loop applied to a pair. But if 'loop' isn't strict
557 this doesn't look like a specialisable call.
558
559 Note [Limit recursive specialisation]
560 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
561 It is possible for ForceSpecConstr to cause an infinite loop of specialisation.
562 Because there is no limit on the number of specialisations, a recursive call with
563 a recursive constructor as an argument (for example, list cons) will generate
564 a specialisation for that constructor. If the resulting specialisation also
565 contains a recursive call with the constructor, this could proceed indefinitely.
566
567 For example, if ForceSpecConstr is on:
568 loop :: [Int] -> [Int] -> [Int]
569 loop z [] = z
570 loop z (x:xs) = loop (x:z) xs
571 this example will create a specialisation for the pattern
572 loop (a:b) c = loop' a b c
573
574 loop' a b [] = (a:b)
575 loop' a b (x:xs) = loop (x:(a:b)) xs
576 and a new pattern is found:
577 loop (a:(b:c)) d = loop'' a b c d
578 which can continue indefinitely.
579
580 Roman's suggestion to fix this was to stop after a couple of times on recursive types,
581 but still specialising on non-recursive types as much as possible.
582
583 To implement this, we count the number of recursive constructors in each
584 function argument. If the maximum is greater than the specConstrRecursive limit,
585 do not specialise on that pattern.
586
587 This is only necessary when ForceSpecConstr is on: otherwise the specConstrCount
588 will force termination anyway.
589
590 See Trac #5550.
591
592 Note [NoSpecConstr]
593 ~~~~~~~~~~~~~~~~~~~
594 The ignoreDataCon stuff allows you to say
595 {-# ANN type T NoSpecConstr #-}
596 to mean "don't specialise on arguments of this type". It was added
597 before we had ForceSpecConstr. Lacking ForceSpecConstr we specialised
598 regardless of size; and then we needed a way to turn that *off*. Now
599 that we have ForceSpecConstr, this NoSpecConstr is probably redundant.
600 (Used only for PArray.)
601
602 -----------------------------------------------------
603 Stuff not yet handled
604 -----------------------------------------------------
605
606 Here are notes arising from Roman's work that I don't want to lose.
607
608 Example 1
609 ~~~~~~~~~
610 data T a = T !a
611
612 foo :: Int -> T Int -> Int
613 foo 0 t = 0
614 foo x t | even x = case t of { T n -> foo (x-n) t }
615 | otherwise = foo (x-1) t
616
617 SpecConstr does no specialisation, because the second recursive call
618 looks like a boxed use of the argument. A pity.
619
620 $wfoo_sFw :: GHC.Prim.Int# -> T.T GHC.Base.Int -> GHC.Prim.Int#
621 $wfoo_sFw =
622 \ (ww_sFo [Just L] :: GHC.Prim.Int#) (w_sFq [Just L] :: T.T GHC.Base.Int) ->
623 case ww_sFo of ds_Xw6 [Just L] {
624 __DEFAULT ->
625 case GHC.Prim.remInt# ds_Xw6 2 of wild1_aEF [Dead Just A] {
626 __DEFAULT -> $wfoo_sFw (GHC.Prim.-# ds_Xw6 1) w_sFq;
627 0 ->
628 case w_sFq of wild_Xy [Just L] { T.T n_ad5 [Just U(L)] ->
629 case n_ad5 of wild1_aET [Just A] { GHC.Base.I# y_aES [Just L] ->
630 $wfoo_sFw (GHC.Prim.-# ds_Xw6 y_aES) wild_Xy
631 } } };
632 0 -> 0
633
634 Example 2
635 ~~~~~~~~~
636 data a :*: b = !a :*: !b
637 data T a = T !a
638
639 foo :: (Int :*: T Int) -> Int
640 foo (0 :*: t) = 0
641 foo (x :*: t) | even x = case t of { T n -> foo ((x-n) :*: t) }
642 | otherwise = foo ((x-1) :*: t)
643
644 Very similar to the previous one, except that the parameters are now in
645 a strict tuple. Before SpecConstr, we have
646
647 $wfoo_sG3 :: GHC.Prim.Int# -> T.T GHC.Base.Int -> GHC.Prim.Int#
648 $wfoo_sG3 =
649 \ (ww_sFU [Just L] :: GHC.Prim.Int#) (ww_sFW [Just L] :: T.T
650 GHC.Base.Int) ->
651 case ww_sFU of ds_Xws [Just L] {
652 __DEFAULT ->
653 case GHC.Prim.remInt# ds_Xws 2 of wild1_aEZ [Dead Just A] {
654 __DEFAULT ->
655 case ww_sFW of tpl_B2 [Just L] { T.T a_sFo [Just A] ->
656 $wfoo_sG3 (GHC.Prim.-# ds_Xws 1) tpl_B2 -- $wfoo1
657 };
658 0 ->
659 case ww_sFW of wild_XB [Just A] { T.T n_ad7 [Just S(L)] ->
660 case n_ad7 of wild1_aFd [Just L] { GHC.Base.I# y_aFc [Just L] ->
661 $wfoo_sG3 (GHC.Prim.-# ds_Xws y_aFc) wild_XB -- $wfoo2
662 } } };
663 0 -> 0 }
664
665 We get two specialisations:
666 "SC:$wfoo1" [0] __forall {a_sFB :: GHC.Base.Int sc_sGC :: GHC.Prim.Int#}
667 Foo.$wfoo sc_sGC (Foo.T @ GHC.Base.Int a_sFB)
668 = Foo.$s$wfoo1 a_sFB sc_sGC ;
669 "SC:$wfoo2" [0] __forall {y_aFp :: GHC.Prim.Int# sc_sGC :: GHC.Prim.Int#}
670 Foo.$wfoo sc_sGC (Foo.T @ GHC.Base.Int (GHC.Base.I# y_aFp))
671 = Foo.$s$wfoo y_aFp sc_sGC ;
672
673 But perhaps the first one isn't good. After all, we know that tpl_B2 is
674 a T (I# x) really, because T is strict and Int has one constructor. (We can't
675 unbox the strict fields, because T is polymorphic!)
676
677 ************************************************************************
678 * *
679 \subsection{Top level wrapper stuff}
680 * *
681 ************************************************************************
682 -}
683
684 specConstrProgram :: ModGuts -> CoreM ModGuts
685 specConstrProgram guts
686 = do
687 dflags <- getDynFlags
688 us <- getUniqueSupplyM
689 annos <- getFirstAnnotations deserializeWithData guts
690 this_mod <- getModule
691 let binds' = reverse $ fst $ initUs us $ do
692 -- Note [Top-level recursive groups]
693 (env, binds) <- goEnv (initScEnv dflags this_mod annos)
694 (mg_binds guts)
695 -- binds is identical to (mg_binds guts), except that the
696 -- binders on the LHS have been replaced by extendBndr
697 -- (SPJ this seems like overkill; I don't think the binders
698 -- will change at all; and we don't substitute in the RHSs anyway!!)
699 go env nullUsage (reverse binds)
700
701 return (guts { mg_binds = binds' })
702 where
703 -- See Note [Top-level recursive groups]
704 goEnv env [] = return (env, [])
705 goEnv env (bind:binds) = do (env', bind') <- scTopBindEnv env bind
706 (env'', binds') <- goEnv env' binds
707 return (env'', bind' : binds')
708
709 -- Arg list of bindings is in reverse order
710 go _ _ [] = return []
711 go env usg (bind:binds) = do (usg', bind') <- scTopBind env usg bind
712 binds' <- go env usg' binds
713 return (bind' : binds')
714
715 {-
716 ************************************************************************
717 * *
718 \subsection{Environment: goes downwards}
719 * *
720 ************************************************************************
721
722 Note [Work-free values only in environment]
723 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
724 The sc_vals field keeps track of in-scope value bindings, so
725 that if we come across (case x of Just y ->...) we can reduce the
726 case from knowing that x is bound to a pair.
727
728 But only *work-free* values are ok here. For example if the envt had
729 x -> Just (expensive v)
730 then we do NOT want to expand to
731 let y = expensive v in ...
732 because the x-binding still exists and we've now duplicated (expensive v).
733
734 This seldom happens because let-bound constructor applications are
735 ANF-ised, but it can happen as a result of on-the-fly transformations in
736 SpecConstr itself. Here is Trac #7865:
737
738 let {
739 a'_shr =
740 case xs_af8 of _ {
741 [] -> acc_af6;
742 : ds_dgt [Dmd=<L,A>] ds_dgu [Dmd=<L,A>] ->
743 (expensive x_af7, x_af7
744 } } in
745 let {
746 ds_sht =
747 case a'_shr of _ { (p'_afd, q'_afe) ->
748 TSpecConstr_DoubleInline.recursive
749 (GHC.Types.: @ GHC.Types.Int x_af7 wild_X6) (q'_afe, p'_afd)
750 } } in
751
752 When processed knowing that xs_af8 was bound to a cons, we simplify to
753 a'_shr = (expensive x_af7, x_af7)
754 and we do NOT want to inline that at the occurrence of a'_shr in ds_sht.
755 (There are other occurrences of a'_shr.) No no no.
756
757 It would be possible to do some on-the-fly ANF-ising, so that a'_shr turned
758 into a work-free value again, thus
759 a1 = expensive x_af7
760 a'_shr = (a1, x_af7)
761 but that's more work, so until its shown to be important I'm going to
762 leave it for now.
763 -}
764
765 data ScEnv = SCE { sc_dflags :: DynFlags,
766 sc_module :: !Module,
767 sc_size :: Maybe Int, -- Size threshold
768 sc_count :: Maybe Int, -- Max # of specialisations for any one fn
769 -- See Note [Avoiding exponential blowup]
770
771 sc_recursive :: Int, -- Max # of specialisations over recursive type.
772 -- Stops ForceSpecConstr from diverging.
773
774 sc_force :: Bool, -- Force specialisation?
775 -- See Note [Forcing specialisation]
776
777 sc_subst :: Subst, -- Current substitution
778 -- Maps InIds to OutExprs
779
780 sc_how_bound :: HowBoundEnv,
781 -- Binds interesting non-top-level variables
782 -- Domain is OutVars (*after* applying the substitution)
783
784 sc_vals :: ValueEnv,
785 -- Domain is OutIds (*after* applying the substitution)
786 -- Used even for top-level bindings (but not imported ones)
787 -- The range of the ValueEnv is *work-free* values
788 -- such as (\x. blah), or (Just v)
789 -- but NOT (Just (expensive v))
790 -- See Note [Work-free values only in environment]
791
792 sc_annotations :: UniqFM SpecConstrAnnotation
793 }
794
795 ---------------------
796 -- As we go, we apply a substitution (sc_subst) to the current term
797 type InExpr = CoreExpr -- _Before_ applying the subst
798 type InVar = Var
799
800 type OutExpr = CoreExpr -- _After_ applying the subst
801 type OutId = Id
802 type OutVar = Var
803
804 ---------------------
805 type HowBoundEnv = VarEnv HowBound -- Domain is OutVars
806
807 ---------------------
808 type ValueEnv = IdEnv Value -- Domain is OutIds
809 data Value = ConVal AltCon [CoreArg] -- _Saturated_ constructors
810 -- The AltCon is never DEFAULT
811 | LambdaVal -- Inlinable lambdas or PAPs
812
813 instance Outputable Value where
814 ppr (ConVal con args) = ppr con <+> interpp'SP args
815 ppr LambdaVal = text "<Lambda>"
816
817 ---------------------
818 initScEnv :: DynFlags -> Module -> UniqFM SpecConstrAnnotation -> ScEnv
819 initScEnv dflags this_mod anns
820 = SCE { sc_dflags = dflags,
821 sc_module = this_mod,
822 sc_size = specConstrThreshold dflags,
823 sc_count = specConstrCount dflags,
824 sc_recursive = specConstrRecursive dflags,
825 sc_force = False,
826 sc_subst = emptySubst,
827 sc_how_bound = emptyVarEnv,
828 sc_vals = emptyVarEnv,
829 sc_annotations = anns }
830
831 data HowBound = RecFun -- These are the recursive functions for which
832 -- we seek interesting call patterns
833
834 | RecArg -- These are those functions' arguments, or their sub-components;
835 -- we gather occurrence information for these
836
837 instance Outputable HowBound where
838 ppr RecFun = text "RecFun"
839 ppr RecArg = text "RecArg"
840
841 scForce :: ScEnv -> Bool -> ScEnv
842 scForce env b = env { sc_force = b }
843
844 lookupHowBound :: ScEnv -> Id -> Maybe HowBound
845 lookupHowBound env id = lookupVarEnv (sc_how_bound env) id
846
847 scSubstId :: ScEnv -> Id -> CoreExpr
848 scSubstId env v = lookupIdSubst (text "scSubstId") (sc_subst env) v
849
850 scSubstTy :: ScEnv -> Type -> Type
851 scSubstTy env ty = substTy (sc_subst env) ty
852
853 scSubstCo :: ScEnv -> Coercion -> Coercion
854 scSubstCo env co = substCo (sc_subst env) co
855
856 zapScSubst :: ScEnv -> ScEnv
857 zapScSubst env = env { sc_subst = zapSubstEnv (sc_subst env) }
858
859 extendScInScope :: ScEnv -> [Var] -> ScEnv
860 -- Bring the quantified variables into scope
861 extendScInScope env qvars = env { sc_subst = extendInScopeList (sc_subst env) qvars }
862
863 -- Extend the substitution
864 extendScSubst :: ScEnv -> Var -> OutExpr -> ScEnv
865 extendScSubst env var expr = env { sc_subst = extendSubst (sc_subst env) var expr }
866
867 extendScSubstList :: ScEnv -> [(Var,OutExpr)] -> ScEnv
868 extendScSubstList env prs = env { sc_subst = extendSubstList (sc_subst env) prs }
869
870 extendHowBound :: ScEnv -> [Var] -> HowBound -> ScEnv
871 extendHowBound env bndrs how_bound
872 = env { sc_how_bound = extendVarEnvList (sc_how_bound env)
873 [(bndr,how_bound) | bndr <- bndrs] }
874
875 extendBndrsWith :: HowBound -> ScEnv -> [Var] -> (ScEnv, [Var])
876 extendBndrsWith how_bound env bndrs
877 = (env { sc_subst = subst', sc_how_bound = hb_env' }, bndrs')
878 where
879 (subst', bndrs') = substBndrs (sc_subst env) bndrs
880 hb_env' = sc_how_bound env `extendVarEnvList`
881 [(bndr,how_bound) | bndr <- bndrs']
882
883 extendBndrWith :: HowBound -> ScEnv -> Var -> (ScEnv, Var)
884 extendBndrWith how_bound env bndr
885 = (env { sc_subst = subst', sc_how_bound = hb_env' }, bndr')
886 where
887 (subst', bndr') = substBndr (sc_subst env) bndr
888 hb_env' = extendVarEnv (sc_how_bound env) bndr' how_bound
889
890 extendRecBndrs :: ScEnv -> [Var] -> (ScEnv, [Var])
891 extendRecBndrs env bndrs = (env { sc_subst = subst' }, bndrs')
892 where
893 (subst', bndrs') = substRecBndrs (sc_subst env) bndrs
894
895 extendBndr :: ScEnv -> Var -> (ScEnv, Var)
896 extendBndr env bndr = (env { sc_subst = subst' }, bndr')
897 where
898 (subst', bndr') = substBndr (sc_subst env) bndr
899
900 extendValEnv :: ScEnv -> Id -> Maybe Value -> ScEnv
901 extendValEnv env _ Nothing = env
902 extendValEnv env id (Just cv)
903 | valueIsWorkFree cv -- Don't duplicate work!! Trac #7865
904 = env { sc_vals = extendVarEnv (sc_vals env) id cv }
905 extendValEnv env _ _ = env
906
907 extendCaseBndrs :: ScEnv -> OutExpr -> OutId -> AltCon -> [Var] -> (ScEnv, [Var])
908 -- When we encounter
909 -- case scrut of b
910 -- C x y -> ...
911 -- we want to bind b, to (C x y)
912 -- NB1: Extends only the sc_vals part of the envt
913 -- NB2: Kill the dead-ness info on the pattern binders x,y, since
914 -- they are potentially made alive by the [b -> C x y] binding
915 extendCaseBndrs env scrut case_bndr con alt_bndrs
916 = (env2, alt_bndrs')
917 where
918 live_case_bndr = not (isDeadBinder case_bndr)
919 env1 | Var v <- stripTicksTopE (const True) scrut
920 = extendValEnv env v cval
921 | otherwise = env -- See Note [Add scrutinee to ValueEnv too]
922 env2 | live_case_bndr = extendValEnv env1 case_bndr cval
923 | otherwise = env1
924
925 alt_bndrs' | case scrut of { Var {} -> True; _ -> live_case_bndr }
926 = map zap alt_bndrs
927 | otherwise
928 = alt_bndrs
929
930 cval = case con of
931 DEFAULT -> Nothing
932 LitAlt {} -> Just (ConVal con [])
933 DataAlt {} -> Just (ConVal con vanilla_args)
934 where
935 vanilla_args = map Type (tyConAppArgs (idType case_bndr)) ++
936 varsToCoreExprs alt_bndrs
937
938 zap v | isTyVar v = v -- See NB2 above
939 | otherwise = zapIdOccInfo v
940
941
942 decreaseSpecCount :: ScEnv -> Int -> ScEnv
943 -- See Note [Avoiding exponential blowup]
944 decreaseSpecCount env n_specs
945 = env { sc_count = case sc_count env of
946 Nothing -> Nothing
947 Just n -> Just (n `div` (n_specs + 1)) }
948 -- The "+1" takes account of the original function;
949 -- See Note [Avoiding exponential blowup]
950
951 ---------------------------------------------------
952 -- See Note [Forcing specialisation]
953 ignoreType :: ScEnv -> Type -> Bool
954 ignoreDataCon :: ScEnv -> DataCon -> Bool
955 forceSpecBndr :: ScEnv -> Var -> Bool
956
957 #ifndef GHCI
958 ignoreType _ _ = False
959 ignoreDataCon _ _ = False
960 #else /* GHCI */
961
962 ignoreDataCon env dc = ignoreTyCon env (dataConTyCon dc)
963
964 ignoreType env ty
965 = case tyConAppTyCon_maybe ty of
966 Just tycon -> ignoreTyCon env tycon
967 _ -> False
968
969 ignoreTyCon :: ScEnv -> TyCon -> Bool
970 ignoreTyCon env tycon
971 = lookupUFM (sc_annotations env) tycon == Just NoSpecConstr
972 #endif /* GHCI */
973
974 forceSpecBndr env var = forceSpecFunTy env . snd . splitForAllTys . varType $ var
975
976 forceSpecFunTy :: ScEnv -> Type -> Bool
977 forceSpecFunTy env = any (forceSpecArgTy env) . fst . splitFunTys
978
979 forceSpecArgTy :: ScEnv -> Type -> Bool
980 forceSpecArgTy env ty
981 | Just ty' <- coreView ty = forceSpecArgTy env ty'
982
983 forceSpecArgTy env ty
984 | Just (tycon, tys) <- splitTyConApp_maybe ty
985 , tycon /= funTyCon
986 = tyConName tycon == specTyConName
987 #ifdef GHCI
988 || lookupUFM (sc_annotations env) tycon == Just ForceSpecConstr
989 #endif
990 || any (forceSpecArgTy env) tys
991
992 forceSpecArgTy _ _ = False
993
994 {-
995 Note [Add scrutinee to ValueEnv too]
996 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
997 Consider this:
998 case x of y
999 (a,b) -> case b of c
1000 I# v -> ...(f y)...
1001 By the time we get to the call (f y), the ValueEnv
1002 will have a binding for y, and for c
1003 y -> (a,b)
1004 c -> I# v
1005 BUT that's not enough! Looking at the call (f y) we
1006 see that y is pair (a,b), but we also need to know what 'b' is.
1007 So in extendCaseBndrs we must *also* add the binding
1008 b -> I# v
1009 else we lose a useful specialisation for f. This is necessary even
1010 though the simplifier has systematically replaced uses of 'x' with 'y'
1011 and 'b' with 'c' in the code. The use of 'b' in the ValueEnv came
1012 from outside the case. See Trac #4908 for the live example.
1013
1014 Note [Avoiding exponential blowup]
1015 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1016 The sc_count field of the ScEnv says how many times we are prepared to
1017 duplicate a single function. But we must take care with recursive
1018 specialisations. Consider
1019
1020 let $j1 = let $j2 = let $j3 = ...
1021 in
1022 ...$j3...
1023 in
1024 ...$j2...
1025 in
1026 ...$j1...
1027
1028 If we specialise $j1 then in each specialisation (as well as the original)
1029 we can specialise $j2, and similarly $j3. Even if we make just *one*
1030 specialisation of each, because we also have the original we'll get 2^n
1031 copies of $j3, which is not good.
1032
1033 So when recursively specialising we divide the sc_count by the number of
1034 copies we are making at this level, including the original.
1035
1036
1037 ************************************************************************
1038 * *
1039 \subsection{Usage information: flows upwards}
1040 * *
1041 ************************************************************************
1042 -}
1043
1044 data ScUsage
1045 = SCU {
1046 scu_calls :: CallEnv, -- Calls
1047 -- The functions are a subset of the
1048 -- RecFuns in the ScEnv
1049
1050 scu_occs :: !(IdEnv ArgOcc) -- Information on argument occurrences
1051 } -- The domain is OutIds
1052
1053 type CallEnv = IdEnv [Call]
1054 data Call = Call Id [CoreArg] ValueEnv
1055 -- The arguments of the call, together with the
1056 -- env giving the constructor bindings at the call site
1057 -- We keep the function mainly for debug output
1058
1059 instance Outputable ScUsage where
1060 ppr (SCU { scu_calls = calls, scu_occs = occs })
1061 = text "SCU" <+> braces (sep [ ptext (sLit "calls =") <+> ppr calls
1062 , text "occs =" <+> ppr occs ])
1063
1064 instance Outputable Call where
1065 ppr (Call fn args _) = ppr fn <+> fsep (map pprParendExpr args)
1066
1067 nullUsage :: ScUsage
1068 nullUsage = SCU { scu_calls = emptyVarEnv, scu_occs = emptyVarEnv }
1069
1070 combineCalls :: CallEnv -> CallEnv -> CallEnv
1071 combineCalls = plusVarEnv_C (++)
1072 where
1073 -- plus cs ds | length res > 1
1074 -- = pprTrace "combineCalls" (vcat [ text "cs:" <+> ppr cs
1075 -- , text "ds:" <+> ppr ds])
1076 -- res
1077 -- | otherwise = res
1078 -- where
1079 -- res = cs ++ ds
1080
1081 combineUsage :: ScUsage -> ScUsage -> ScUsage
1082 combineUsage u1 u2 = SCU { scu_calls = combineCalls (scu_calls u1) (scu_calls u2),
1083 scu_occs = plusVarEnv_C combineOcc (scu_occs u1) (scu_occs u2) }
1084
1085 combineUsages :: [ScUsage] -> ScUsage
1086 combineUsages [] = nullUsage
1087 combineUsages us = foldr1 combineUsage us
1088
1089 lookupOccs :: ScUsage -> [OutVar] -> (ScUsage, [ArgOcc])
1090 lookupOccs (SCU { scu_calls = sc_calls, scu_occs = sc_occs }) bndrs
1091 = (SCU {scu_calls = sc_calls, scu_occs = delVarEnvList sc_occs bndrs},
1092 [lookupVarEnv sc_occs b `orElse` NoOcc | b <- bndrs])
1093
1094 data ArgOcc = NoOcc -- Doesn't occur at all; or a type argument
1095 | UnkOcc -- Used in some unknown way
1096
1097 | ScrutOcc -- See Note [ScrutOcc]
1098 (DataConEnv [ArgOcc]) -- How the sub-components are used
1099
1100 type DataConEnv a = UniqFM a -- Keyed by DataCon
1101
1102 {- Note [ScrutOcc]
1103 ~~~~~~~~~~~~~~~~~~~
1104 An occurrence of ScrutOcc indicates that the thing, or a `cast` version of the thing,
1105 is *only* taken apart or applied.
1106
1107 Functions, literal: ScrutOcc emptyUFM
1108 Data constructors: ScrutOcc subs,
1109
1110 where (subs :: UniqFM [ArgOcc]) gives usage of the *pattern-bound* components,
1111 The domain of the UniqFM is the Unique of the data constructor
1112
1113 The [ArgOcc] is the occurrences of the *pattern-bound* components
1114 of the data structure. E.g.
1115 data T a = forall b. MkT a b (b->a)
1116 A pattern binds b, x::a, y::b, z::b->a, but not 'a'!
1117
1118 -}
1119
1120 instance Outputable ArgOcc where
1121 ppr (ScrutOcc xs) = text "scrut-occ" <> ppr xs
1122 ppr UnkOcc = text "unk-occ"
1123 ppr NoOcc = text "no-occ"
1124
1125 evalScrutOcc :: ArgOcc
1126 evalScrutOcc = ScrutOcc emptyUFM
1127
1128 -- Experimentally, this vesion of combineOcc makes ScrutOcc "win", so
1129 -- that if the thing is scrutinised anywhere then we get to see that
1130 -- in the overall result, even if it's also used in a boxed way
1131 -- This might be too aggressive; see Note [Reboxing] Alternative 3
1132 combineOcc :: ArgOcc -> ArgOcc -> ArgOcc
1133 combineOcc NoOcc occ = occ
1134 combineOcc occ NoOcc = occ
1135 combineOcc (ScrutOcc xs) (ScrutOcc ys) = ScrutOcc (plusUFM_C combineOccs xs ys)
1136 combineOcc UnkOcc (ScrutOcc ys) = ScrutOcc ys
1137 combineOcc (ScrutOcc xs) UnkOcc = ScrutOcc xs
1138 combineOcc UnkOcc UnkOcc = UnkOcc
1139
1140 combineOccs :: [ArgOcc] -> [ArgOcc] -> [ArgOcc]
1141 combineOccs xs ys = zipWithEqual "combineOccs" combineOcc xs ys
1142
1143 setScrutOcc :: ScEnv -> ScUsage -> OutExpr -> ArgOcc -> ScUsage
1144 -- _Overwrite_ the occurrence info for the scrutinee, if the scrutinee
1145 -- is a variable, and an interesting variable
1146 setScrutOcc env usg (Cast e _) occ = setScrutOcc env usg e occ
1147 setScrutOcc env usg (Tick _ e) occ = setScrutOcc env usg e occ
1148 setScrutOcc env usg (Var v) occ
1149 | Just RecArg <- lookupHowBound env v = usg { scu_occs = extendVarEnv (scu_occs usg) v occ }
1150 | otherwise = usg
1151 setScrutOcc _env usg _other _occ -- Catch-all
1152 = usg
1153
1154 {-
1155 ************************************************************************
1156 * *
1157 \subsection{The main recursive function}
1158 * *
1159 ************************************************************************
1160
1161 The main recursive function gathers up usage information, and
1162 creates specialised versions of functions.
1163 -}
1164
1165 scExpr, scExpr' :: ScEnv -> CoreExpr -> UniqSM (ScUsage, CoreExpr)
1166 -- The unique supply is needed when we invent
1167 -- a new name for the specialised function and its args
1168
1169 scExpr env e = scExpr' env e
1170
1171 scExpr' env (Var v) = case scSubstId env v of
1172 Var v' -> return (mkVarUsage env v' [], Var v')
1173 e' -> scExpr (zapScSubst env) e'
1174
1175 scExpr' env (Type t) = return (nullUsage, Type (scSubstTy env t))
1176 scExpr' env (Coercion c) = return (nullUsage, Coercion (scSubstCo env c))
1177 scExpr' _ e@(Lit {}) = return (nullUsage, e)
1178 scExpr' env (Tick t e) = do (usg, e') <- scExpr env e
1179 return (usg, Tick t e')
1180 scExpr' env (Cast e co) = do (usg, e') <- scExpr env e
1181 return (usg, mkCast e' (scSubstCo env co))
1182 -- Important to use mkCast here
1183 -- See Note [SpecConstr call patterns]
1184 scExpr' env e@(App _ _) = scApp env (collectArgs e)
1185 scExpr' env (Lam b e) = do let (env', b') = extendBndr env b
1186 (usg, e') <- scExpr env' e
1187 return (usg, Lam b' e')
1188
1189 scExpr' env (Case scrut b ty alts)
1190 = do { (scrut_usg, scrut') <- scExpr env scrut
1191 ; case isValue (sc_vals env) scrut' of
1192 Just (ConVal con args) -> sc_con_app con args scrut'
1193 _other -> sc_vanilla scrut_usg scrut'
1194 }
1195 where
1196 sc_con_app con args scrut' -- Known constructor; simplify
1197 = do { let (_, bs, rhs) = findAlt con alts
1198 `orElse` (DEFAULT, [], mkImpossibleExpr ty)
1199 alt_env' = extendScSubstList env ((b,scrut') : bs `zip` trimConArgs con args)
1200 ; scExpr alt_env' rhs }
1201
1202 sc_vanilla scrut_usg scrut' -- Normal case
1203 = do { let (alt_env,b') = extendBndrWith RecArg env b
1204 -- Record RecArg for the components
1205
1206 ; (alt_usgs, alt_occs, alts')
1207 <- mapAndUnzip3M (sc_alt alt_env scrut' b') alts
1208
1209 ; let scrut_occ = foldr combineOcc NoOcc alt_occs
1210 scrut_usg' = setScrutOcc env scrut_usg scrut' scrut_occ
1211 -- The combined usage of the scrutinee is given
1212 -- by scrut_occ, which is passed to scScrut, which
1213 -- in turn treats a bare-variable scrutinee specially
1214
1215 ; return (foldr combineUsage scrut_usg' alt_usgs,
1216 Case scrut' b' (scSubstTy env ty) alts') }
1217
1218 sc_alt env scrut' b' (con,bs,rhs)
1219 = do { let (env1, bs1) = extendBndrsWith RecArg env bs
1220 (env2, bs2) = extendCaseBndrs env1 scrut' b' con bs1
1221 ; (usg, rhs') <- scExpr env2 rhs
1222 ; let (usg', b_occ:arg_occs) = lookupOccs usg (b':bs2)
1223 scrut_occ = case con of
1224 DataAlt dc -> ScrutOcc (unitUFM dc arg_occs)
1225 _ -> ScrutOcc emptyUFM
1226 ; return (usg', b_occ `combineOcc` scrut_occ, (con, bs2, rhs')) }
1227
1228 scExpr' env (Let (NonRec bndr rhs) body)
1229 | isTyVar bndr -- Type-lets may be created by doBeta
1230 = scExpr' (extendScSubst env bndr rhs) body
1231
1232 | otherwise
1233 = do { let (body_env, bndr') = extendBndr env bndr
1234 ; rhs_info <- scRecRhs env (bndr',rhs)
1235
1236 ; let body_env2 = extendHowBound body_env [bndr'] RecFun
1237 -- Note [Local let bindings]
1238 rhs' = ri_new_rhs rhs_info
1239 body_env3 = extendValEnv body_env2 bndr' (isValue (sc_vals env) rhs')
1240
1241 ; (body_usg, body') <- scExpr body_env3 body
1242
1243 -- NB: For non-recursive bindings we inherit sc_force flag from
1244 -- the parent function (see Note [Forcing specialisation])
1245 ; (spec_usg, specs) <- specNonRec env body_usg rhs_info
1246
1247 ; return (body_usg { scu_calls = scu_calls body_usg `delVarEnv` bndr' }
1248 `combineUsage` spec_usg, -- Note [spec_usg includes rhs_usg]
1249 mkLets [NonRec b r | (b,r) <- ruleInfoBinds rhs_info specs] body')
1250 }
1251
1252
1253 -- A *local* recursive group: see Note [Local recursive groups]
1254 scExpr' env (Let (Rec prs) body)
1255 = do { let (bndrs,rhss) = unzip prs
1256 (rhs_env1,bndrs') = extendRecBndrs env bndrs
1257 rhs_env2 = extendHowBound rhs_env1 bndrs' RecFun
1258 force_spec = any (forceSpecBndr env) bndrs'
1259 -- Note [Forcing specialisation]
1260
1261 ; rhs_infos <- mapM (scRecRhs rhs_env2) (bndrs' `zip` rhss)
1262 ; (body_usg, body') <- scExpr rhs_env2 body
1263
1264 -- NB: start specLoop from body_usg
1265 ; (spec_usg, specs) <- specRec NotTopLevel (scForce rhs_env2 force_spec)
1266 body_usg rhs_infos
1267 -- Do not unconditionally generate specialisations from rhs_usgs
1268 -- Instead use them only if we find an unspecialised call
1269 -- See Note [Local recursive groups]
1270
1271 ; let all_usg = spec_usg `combineUsage` body_usg -- Note [spec_usg includes rhs_usg]
1272 bind' = Rec (concat (zipWith ruleInfoBinds rhs_infos specs))
1273
1274 ; return (all_usg { scu_calls = scu_calls all_usg `delVarEnvList` bndrs' },
1275 Let bind' body') }
1276
1277 {-
1278 Note [Local let bindings]
1279 ~~~~~~~~~~~~~~~~~~~~~~~~~
1280 It is not uncommon to find this
1281
1282 let $j = \x. <blah> in ...$j True...$j True...
1283
1284 Here $j is an arbitrary let-bound function, but it often comes up for
1285 join points. We might like to specialise $j for its call patterns.
1286 Notice the difference from a letrec, where we look for call patterns
1287 in the *RHS* of the function. Here we look for call patterns in the
1288 *body* of the let.
1289
1290 At one point I predicated this on the RHS mentioning the outer
1291 recursive function, but that's not essential and might even be
1292 harmful. I'm not sure.
1293 -}
1294
1295 scApp :: ScEnv -> (InExpr, [InExpr]) -> UniqSM (ScUsage, CoreExpr)
1296
1297 scApp env (Var fn, args) -- Function is a variable
1298 = ASSERT( not (null args) )
1299 do { args_w_usgs <- mapM (scExpr env) args
1300 ; let (arg_usgs, args') = unzip args_w_usgs
1301 arg_usg = combineUsages arg_usgs
1302 ; case scSubstId env fn of
1303 fn'@(Lam {}) -> scExpr (zapScSubst env) (doBeta fn' args')
1304 -- Do beta-reduction and try again
1305
1306 Var fn' -> return (arg_usg `combineUsage` mkVarUsage env fn' args',
1307 mkApps (Var fn') args')
1308
1309 other_fn' -> return (arg_usg, mkApps other_fn' args') }
1310 -- NB: doing this ignores any usage info from the substituted
1311 -- function, but I don't think that matters. If it does
1312 -- we can fix it.
1313 where
1314 doBeta :: OutExpr -> [OutExpr] -> OutExpr
1315 -- ToDo: adjust for System IF
1316 doBeta (Lam bndr body) (arg : args) = Let (NonRec bndr arg) (doBeta body args)
1317 doBeta fn args = mkApps fn args
1318
1319 -- The function is almost always a variable, but not always.
1320 -- In particular, if this pass follows float-in,
1321 -- which it may, we can get
1322 -- (let f = ...f... in f) arg1 arg2
1323 scApp env (other_fn, args)
1324 = do { (fn_usg, fn') <- scExpr env other_fn
1325 ; (arg_usgs, args') <- mapAndUnzipM (scExpr env) args
1326 ; return (combineUsages arg_usgs `combineUsage` fn_usg, mkApps fn' args') }
1327
1328 ----------------------
1329 mkVarUsage :: ScEnv -> Id -> [CoreExpr] -> ScUsage
1330 mkVarUsage env fn args
1331 = case lookupHowBound env fn of
1332 Just RecFun -> SCU { scu_calls = unitVarEnv fn [Call fn args (sc_vals env)]
1333 , scu_occs = emptyVarEnv }
1334 Just RecArg -> SCU { scu_calls = emptyVarEnv
1335 , scu_occs = unitVarEnv fn arg_occ }
1336 Nothing -> nullUsage
1337 where
1338 -- I rather think we could use UnkOcc all the time
1339 arg_occ | null args = UnkOcc
1340 | otherwise = evalScrutOcc
1341
1342 ----------------------
1343 scTopBindEnv :: ScEnv -> CoreBind -> UniqSM (ScEnv, CoreBind)
1344 scTopBindEnv env (Rec prs)
1345 = do { let (rhs_env1,bndrs') = extendRecBndrs env bndrs
1346 rhs_env2 = extendHowBound rhs_env1 bndrs RecFun
1347
1348 prs' = zip bndrs' rhss
1349 ; return (rhs_env2, Rec prs') }
1350 where
1351 (bndrs,rhss) = unzip prs
1352
1353 scTopBindEnv env (NonRec bndr rhs)
1354 = do { let (env1, bndr') = extendBndr env bndr
1355 env2 = extendValEnv env1 bndr' (isValue (sc_vals env) rhs)
1356 ; return (env2, NonRec bndr' rhs) }
1357
1358 ----------------------
1359 scTopBind :: ScEnv -> ScUsage -> CoreBind -> UniqSM (ScUsage, CoreBind)
1360
1361 {-
1362 scTopBind _ usage _
1363 | pprTrace "scTopBind_usage" (ppr (scu_calls usage)) False
1364 = error "false"
1365 -}
1366
1367 scTopBind env body_usage (Rec prs)
1368 | Just threshold <- sc_size env
1369 , not force_spec
1370 , not (all (couldBeSmallEnoughToInline (sc_dflags env) threshold) rhss)
1371 -- No specialisation
1372 = do { (rhs_usgs, rhss') <- mapAndUnzipM (scExpr env) rhss
1373 ; return (body_usage `combineUsage` combineUsages rhs_usgs, Rec (bndrs `zip` rhss')) }
1374
1375 | otherwise -- Do specialisation
1376 = do { rhs_infos <- mapM (scRecRhs env) prs
1377
1378 ; (spec_usage, specs) <- specRec TopLevel (scForce env force_spec)
1379 body_usage rhs_infos
1380
1381 ; return (body_usage `combineUsage` spec_usage,
1382 Rec (concat (zipWith ruleInfoBinds rhs_infos specs))) }
1383 where
1384 (bndrs,rhss) = unzip prs
1385 force_spec = any (forceSpecBndr env) bndrs
1386 -- Note [Forcing specialisation]
1387
1388 scTopBind env usage (NonRec bndr rhs) -- Oddly, we don't seem to specialise top-level non-rec functions
1389 = do { (rhs_usg', rhs') <- scExpr env rhs
1390 ; return (usage `combineUsage` rhs_usg', NonRec bndr rhs') }
1391
1392 ----------------------
1393 scRecRhs :: ScEnv -> (OutId, InExpr) -> UniqSM RhsInfo
1394 scRecRhs env (bndr,rhs)
1395 = do { let (arg_bndrs,body) = collectBinders rhs
1396 (body_env, arg_bndrs') = extendBndrsWith RecArg env arg_bndrs
1397 ; (body_usg, body') <- scExpr body_env body
1398 ; let (rhs_usg, arg_occs) = lookupOccs body_usg arg_bndrs'
1399 ; return (RI { ri_rhs_usg = rhs_usg
1400 , ri_fn = bndr, ri_new_rhs = mkLams arg_bndrs' body'
1401 , ri_lam_bndrs = arg_bndrs, ri_lam_body = body
1402 , ri_arg_occs = arg_occs }) }
1403 -- The arg_occs says how the visible,
1404 -- lambda-bound binders of the RHS are used
1405 -- (including the TyVar binders)
1406 -- Two pats are the same if they match both ways
1407
1408 ----------------------
1409 ruleInfoBinds :: RhsInfo -> [OneSpec] -> [(Id,CoreExpr)]
1410 ruleInfoBinds (RI { ri_fn = fn, ri_new_rhs = new_rhs }) specs
1411 = [(id,rhs) | OS _ _ id rhs <- specs] ++
1412 -- First the specialised bindings
1413
1414 [(fn `addIdSpecialisations` rules, new_rhs)]
1415 -- And now the original binding
1416 where
1417 rules = [r | OS _ r _ _ <- specs]
1418
1419 {-
1420 ************************************************************************
1421 * *
1422 The specialiser itself
1423 * *
1424 ************************************************************************
1425 -}
1426
1427 data RhsInfo
1428 = RI { ri_fn :: OutId -- The binder
1429 , ri_new_rhs :: OutExpr -- The specialised RHS (in current envt)
1430 , ri_rhs_usg :: ScUsage -- Usage info from specialising RHS
1431
1432 , ri_lam_bndrs :: [InVar] -- The *original* RHS (\xs.body)
1433 , ri_lam_body :: InExpr -- Note [Specialise original body]
1434 , ri_arg_occs :: [ArgOcc] -- Info on how the xs occur in body
1435 }
1436
1437 data RuleInfo = SI [OneSpec] -- The specialisations we have generated
1438
1439 Int -- Length of specs; used for numbering them
1440
1441 (Maybe ScUsage) -- Just cs => we have not yet used calls in the
1442 -- from calls in the *original* RHS as
1443 -- seeds for new specialisations;
1444 -- if you decide to do so, here is the
1445 -- RHS usage (which has not yet been
1446 -- unleashed)
1447 -- Nothing => we have
1448 -- See Note [Local recursive groups]
1449 -- See Note [spec_usg includes rhs_usg]
1450
1451 -- One specialisation: Rule plus definition
1452 data OneSpec = OS CallPat -- Call pattern that generated this specialisation
1453 CoreRule -- Rule connecting original id with the specialisation
1454 OutId OutExpr -- Spec id + its rhs
1455
1456
1457 ----------------------
1458 specNonRec :: ScEnv
1459 -> ScUsage -- Body usage
1460 -> RhsInfo -- Structure info usage info for un-specialised RHS
1461 -> UniqSM (ScUsage, [OneSpec]) -- Usage from RHSs (specialised and not)
1462 -- plus details of specialisations
1463
1464 specNonRec env body_usg rhs_info
1465 = do { (spec_usg, SI specs _ _) <- specialise env (scu_calls body_usg)
1466 rhs_info
1467 (SI [] 0 (Just (ri_rhs_usg rhs_info)))
1468 ; return (spec_usg, specs) }
1469
1470 ----------------------
1471 specRec :: TopLevelFlag -> ScEnv
1472 -> ScUsage -- Body usage
1473 -> [RhsInfo] -- Structure info and usage info for un-specialised RHSs
1474 -> UniqSM (ScUsage, [[OneSpec]]) -- Usage from all RHSs (specialised and not)
1475 -- plus details of specialisations
1476
1477 specRec top_lvl env body_usg rhs_infos
1478 = do { (spec_usg, spec_infos) <- go seed_calls nullUsage init_spec_infos
1479 ; return (spec_usg, [ s | SI s _ _ <- spec_infos ]) }
1480 where
1481 (seed_calls, init_spec_infos) -- Note [Seeding top-level recursive groups]
1482 | isTopLevel top_lvl
1483 , any (isExportedId . ri_fn) rhs_infos -- Seed from body and RHSs
1484 = (all_calls, [SI [] 0 Nothing | _ <- rhs_infos])
1485 | otherwise -- Seed from body only
1486 = (calls_in_body, [SI [] 0 (Just (ri_rhs_usg ri)) | ri <- rhs_infos])
1487
1488 calls_in_body = scu_calls body_usg
1489 calls_in_rhss = foldr (combineCalls . scu_calls . ri_rhs_usg) emptyVarEnv rhs_infos
1490 all_calls = calls_in_rhss `combineCalls` calls_in_body
1491
1492 -- Loop, specialising, until you get no new specialisations
1493 go seed_calls usg_so_far spec_infos
1494 | isEmptyVarEnv seed_calls
1495 = return (usg_so_far, spec_infos)
1496 | otherwise
1497 = do { specs_w_usg <- zipWithM (specialise env seed_calls) rhs_infos spec_infos
1498 ; let (extra_usg_s, new_spec_infos) = unzip specs_w_usg
1499 extra_usg = combineUsages extra_usg_s
1500 all_usg = usg_so_far `combineUsage` extra_usg
1501 ; go (scu_calls extra_usg) all_usg new_spec_infos }
1502
1503 ----------------------
1504 specialise
1505 :: ScEnv
1506 -> CallEnv -- Info on newly-discovered calls to this function
1507 -> RhsInfo
1508 -> RuleInfo -- Original RHS plus patterns dealt with
1509 -> UniqSM (ScUsage, RuleInfo) -- New specialised versions and their usage
1510
1511 -- See Note [spec_usg includes rhs_usg]
1512
1513 -- Note: this only generates *specialised* bindings
1514 -- The original binding is added by ruleInfoBinds
1515 --
1516 -- Note: the rhs here is the optimised version of the original rhs
1517 -- So when we make a specialised copy of the RHS, we're starting
1518 -- from an RHS whose nested functions have been optimised already.
1519
1520 specialise env bind_calls (RI { ri_fn = fn, ri_lam_bndrs = arg_bndrs
1521 , ri_lam_body = body, ri_arg_occs = arg_occs })
1522 spec_info@(SI specs spec_count mb_unspec)
1523 | isBottomingId fn -- Note [Do not specialise diverging functions]
1524 -- and do not generate specialisation seeds from its RHS
1525 = return (nullUsage, spec_info)
1526
1527 | isNeverActive (idInlineActivation fn) -- See Note [Transfer activation]
1528 || null arg_bndrs -- Only specialise functions
1529 = case mb_unspec of -- Behave as if there was a single, boring call
1530 Just rhs_usg -> return (rhs_usg, SI specs spec_count Nothing)
1531 -- See Note [spec_usg includes rhs_usg]
1532 Nothing -> return (nullUsage, spec_info)
1533
1534 | Just all_calls <- lookupVarEnv bind_calls fn
1535 = -- pprTrace "specialise entry {" (ppr fn <+> ppr (length all_calls)) $
1536 do { (boring_call, pats) <- callsToPats env specs arg_occs all_calls
1537
1538 -- Bale out if too many specialisations
1539 ; let n_pats = length pats
1540 spec_count' = n_pats + spec_count
1541 ; case sc_count env of
1542 Just max | not (sc_force env) && spec_count' > max
1543 -> if (debugIsOn || opt_PprStyle_Debug) -- Suppress this scary message for
1544 then pprTrace "SpecConstr" msg $ -- ordinary users! Trac #5125
1545 return (nullUsage, spec_info)
1546 else return (nullUsage, spec_info)
1547 where
1548 msg = vcat [ sep [ text "Function" <+> quotes (ppr fn)
1549 , nest 2 (text "has" <+>
1550 speakNOf spec_count' (text "call pattern") <> comma <+>
1551 text "but the limit is" <+> int max) ]
1552 , text "Use -fspec-constr-count=n to set the bound"
1553 , extra ]
1554 extra | not opt_PprStyle_Debug = text "Use -dppr-debug to see specialisations"
1555 | otherwise = text "Specialisations:" <+> ppr (pats ++ [p | OS p _ _ _ <- specs])
1556
1557 _normal_case -> do {
1558
1559 -- ; if (not (null pats) || isJust mb_unspec) then
1560 -- pprTrace "specialise" (vcat [ ppr fn <+> text "with" <+> int (length pats) <+> text "good patterns"
1561 -- , text "mb_unspec" <+> ppr (isJust mb_unspec)
1562 -- , text "arg_occs" <+> ppr arg_occs
1563 -- , text "good pats" <+> ppr pats]) $
1564 -- return ()
1565 -- else return ()
1566
1567 ; let spec_env = decreaseSpecCount env n_pats
1568 ; (spec_usgs, new_specs) <- mapAndUnzipM (spec_one spec_env fn arg_bndrs body)
1569 (pats `zip` [spec_count..])
1570 -- See Note [Specialise original body]
1571
1572 ; let spec_usg = combineUsages spec_usgs
1573
1574 -- If there were any boring calls among the seeds (= all_calls), then those
1575 -- calls will call the un-specialised function. So we should use the seeds
1576 -- from the _unspecialised_ function's RHS, which are in mb_unspec, by returning
1577 -- then in new_usg.
1578 (new_usg, mb_unspec')
1579 = case mb_unspec of
1580 Just rhs_usg | boring_call -> (spec_usg `combineUsage` rhs_usg, Nothing)
1581 _ -> (spec_usg, mb_unspec)
1582
1583 -- ; pprTrace "specialise return }" (ppr fn
1584 -- <+> ppr (scu_calls new_usg))
1585 ; return (new_usg, SI (new_specs ++ specs) spec_count' mb_unspec') } }
1586
1587
1588 | otherwise -- No new seeds, so return nullUsage
1589 = return (nullUsage, spec_info)
1590
1591
1592 ---------------------
1593 spec_one :: ScEnv
1594 -> OutId -- Function
1595 -> [InVar] -- Lambda-binders of RHS; should match patterns
1596 -> InExpr -- Body of the original function
1597 -> (CallPat, Int)
1598 -> UniqSM (ScUsage, OneSpec) -- Rule and binding
1599
1600 -- spec_one creates a specialised copy of the function, together
1601 -- with a rule for using it. I'm very proud of how short this
1602 -- function is, considering what it does :-).
1603
1604 {-
1605 Example
1606
1607 In-scope: a, x::a
1608 f = /\b \y::[(a,b)] -> ....f (b,c) ((:) (a,(b,c)) (x,v) (h w))...
1609 [c::*, v::(b,c) are presumably bound by the (...) part]
1610 ==>
1611 f_spec = /\ b c \ v::(b,c) hw::[(a,(b,c))] ->
1612 (...entire body of f...) [b -> (b,c),
1613 y -> ((:) (a,(b,c)) (x,v) hw)]
1614
1615 RULE: forall b::* c::*, -- Note, *not* forall a, x
1616 v::(b,c),
1617 hw::[(a,(b,c))] .
1618
1619 f (b,c) ((:) (a,(b,c)) (x,v) hw) = f_spec b c v hw
1620 -}
1621
1622 spec_one env fn arg_bndrs body (call_pat@(qvars, pats), rule_number)
1623 = do { spec_uniq <- getUniqueM
1624 ; let spec_env = extendScSubstList (extendScInScope env qvars)
1625 (arg_bndrs `zip` pats)
1626 fn_name = idName fn
1627 fn_loc = nameSrcSpan fn_name
1628 fn_occ = nameOccName fn_name
1629 spec_occ = mkSpecOcc fn_occ
1630 -- We use fn_occ rather than fn in the rule_name string
1631 -- as we don't want the uniq to end up in the rule, and
1632 -- hence in the ABI, as that can cause spurious ABI
1633 -- changes (#4012).
1634 rule_name = mkFastString ("SC:" ++ occNameString fn_occ ++ show rule_number)
1635 spec_name = mkInternalName spec_uniq spec_occ fn_loc
1636 -- ; pprTrace "{spec_one" (ppr (sc_count env) <+> ppr fn <+> ppr pats <+> text "-->" <+> ppr spec_name) $
1637 -- return ()
1638
1639 -- Specialise the body
1640 ; (spec_usg, spec_body) <- scExpr spec_env body
1641
1642 -- ; pprTrace "done spec_one}" (ppr fn) $
1643 -- return ()
1644
1645 -- And build the results
1646 ; let spec_id = mkLocalIdOrCoVar spec_name (mkLamTypes spec_lam_args body_ty)
1647 -- See Note [Transfer strictness]
1648 `setIdStrictness` spec_str
1649 `setIdArity` count isId spec_lam_args
1650 spec_str = calcSpecStrictness fn spec_lam_args pats
1651
1652
1653 -- Conditionally use result of new worker-wrapper transform
1654 (spec_lam_args, spec_call_args) = mkWorkerArgs (sc_dflags env) qvars body_ty
1655 -- Usual w/w hack to avoid generating
1656 -- a spec_rhs of unlifted type and no args
1657
1658 spec_lam_args_str = handOutStrictnessInformation (fst (splitStrictSig spec_str)) spec_lam_args
1659 -- Annotate the variables with the strictness information from
1660 -- the function (see Note [Strictness information in worker binders])
1661
1662 spec_rhs = mkLams spec_lam_args_str spec_body
1663 body_ty = exprType spec_body
1664 rule_rhs = mkVarApps (Var spec_id) spec_call_args
1665 inline_act = idInlineActivation fn
1666 this_mod = sc_module spec_env
1667 rule = mkRule this_mod True {- Auto -} True {- Local -}
1668 rule_name inline_act fn_name qvars pats rule_rhs
1669 -- See Note [Transfer activation]
1670 ; return (spec_usg, OS call_pat rule spec_id spec_rhs) }
1671
1672
1673 -- See Note [Strictness information in worker binders]
1674 handOutStrictnessInformation :: [Demand] -> [Var] -> [Var]
1675 handOutStrictnessInformation = go
1676 where
1677 go _ [] = []
1678 go [] vs = vs
1679 go (d:dmds) (v:vs) | isId v = setIdDemandInfo v d : go dmds vs
1680 go dmds (v:vs) = v : go dmds vs
1681
1682 calcSpecStrictness :: Id -- The original function
1683 -> [Var] -> [CoreExpr] -- Call pattern
1684 -> StrictSig -- Strictness of specialised thing
1685 -- See Note [Transfer strictness]
1686 calcSpecStrictness fn qvars pats
1687 = mkClosedStrictSig spec_dmds topRes
1688 where
1689 spec_dmds = [ lookupVarEnv dmd_env qv `orElse` topDmd | qv <- qvars, isId qv ]
1690 StrictSig (DmdType _ dmds _) = idStrictness fn
1691
1692 dmd_env = go emptyVarEnv dmds pats
1693
1694 go :: DmdEnv -> [Demand] -> [CoreExpr] -> DmdEnv
1695 go env ds (Type {} : pats) = go env ds pats
1696 go env ds (Coercion {} : pats) = go env ds pats
1697 go env (d:ds) (pat : pats) = go (go_one env d pat) ds pats
1698 go env _ _ = env
1699
1700 go_one :: DmdEnv -> Demand -> CoreExpr -> DmdEnv
1701 go_one env d (Var v) = extendVarEnv_C bothDmd env v d
1702 go_one env d e
1703 | Just ds <- splitProdDmd_maybe d -- NB: d does not have to be strict
1704 , (Var _, args) <- collectArgs e = go env ds args
1705 go_one env _ _ = env
1706
1707 {-
1708 Note [spec_usg includes rhs_usg]
1709 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1710 In calls to 'specialise', the returned ScUsage must include the rhs_usg in
1711 the passed-in RuleInfo, unless there are no calls at all to the function.
1712
1713 The caller can, indeed must, assume this. He should not combine in rhs_usg
1714 himself, or he'll get rhs_usg twice -- and that can lead to an exponential
1715 blowup of duplicates in the CallEnv. This is what gave rise to the massive
1716 performace loss in Trac #8852.
1717
1718 Note [Specialise original body]
1719 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1720 The RhsInfo for a binding keeps the *original* body of the binding. We
1721 must specialise that, *not* the result of applying specExpr to the RHS
1722 (which is also kept in RhsInfo). Otherwise we end up specialising a
1723 specialised RHS, and that can lead directly to exponential behaviour.
1724
1725 Note [Transfer activation]
1726 ~~~~~~~~~~~~~~~~~~~~~~~~~~
1727 This note is for SpecConstr, but exactly the same thing
1728 happens in the overloading specialiser; see
1729 Note [Auto-specialisation and RULES] in Specialise.
1730
1731 In which phase should the specialise-constructor rules be active?
1732 Originally I made them always-active, but Manuel found that this
1733 defeated some clever user-written rules. Then I made them active only
1734 in Phase 0; after all, currently, the specConstr transformation is
1735 only run after the simplifier has reached Phase 0, but that meant
1736 that specialisations didn't fire inside wrappers; see test
1737 simplCore/should_compile/spec-inline.
1738
1739 So now I just use the inline-activation of the parent Id, as the
1740 activation for the specialiation RULE, just like the main specialiser;
1741
1742 This in turn means there is no point in specialising NOINLINE things,
1743 so we test for that.
1744
1745 Note [Transfer strictness]
1746 ~~~~~~~~~~~~~~~~~~~~~~~~~~
1747 We must transfer strictness information from the original function to
1748 the specialised one. Suppose, for example
1749
1750 f has strictness SS
1751 and a RULE f (a:as) b = f_spec a as b
1752
1753 Now we want f_spec to have strictness LLS, otherwise we'll use call-by-need
1754 when calling f_spec instead of call-by-value. And that can result in
1755 unbounded worsening in space (cf the classic foldl vs foldl')
1756
1757 See Trac #3437 for a good example.
1758
1759 The function calcSpecStrictness performs the calculation.
1760
1761 Note [Strictness information in worker binders]
1762 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1763
1764 After having calculated the strictness annotation for the worker (see Note
1765 [Transfer strictness] above), we also want to have this information attached to
1766 the worker’s arguments, for the benefit of later passes. The function
1767 handOutStrictnessInformation decomposes the strictness annotation calculated by
1768 calcSpecStrictness and attaches them to the variables.
1769
1770 ************************************************************************
1771 * *
1772 \subsection{Argument analysis}
1773 * *
1774 ************************************************************************
1775
1776 This code deals with analysing call-site arguments to see whether
1777 they are constructor applications.
1778
1779 Note [Free type variables of the qvar types]
1780 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1781 In a call (f @a x True), that we want to specialise, what variables should
1782 we quantify over. Clearly over 'a' and 'x', but what about any type variables
1783 free in x's type? In fact we don't need to worry about them because (f @a)
1784 can only be a well-typed application if its type is compatible with x, so any
1785 variables free in x's type must be free in (f @a), and hence either be gathered
1786 via 'a' itself, or be in scope at f's defn. Hence we just take
1787 (exprsFreeVars pats).
1788
1789 BUT phantom type synonyms can mess this reasoning up,
1790 eg x::T b with type T b = Int
1791 So we apply expandTypeSynonyms to the bound Ids.
1792 See Trac # 5458. Yuk.
1793
1794 Note [SpecConstr call patterns]
1795 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1796 A "call patterns" that we collect is going to become the LHS of a RULE.
1797 It's important that it doesn't have
1798 e |> Refl
1799 or
1800 e |> g1 |> g2
1801 because both of these will be optimised by Simplify.simplRule. In the
1802 former case such optimisation benign, because the rule will match more
1803 terms; but in the latter we may lose a binding of 'g1' or 'g2', and
1804 end up with a rule LHS that doesn't bind the template variables
1805 (Trac #10602).
1806
1807 The simplifier eliminates such things, but SpecConstr itself constructs
1808 new terms by substituting. So the 'mkCast' in the Cast case of scExpr
1809 is very important!
1810 -}
1811
1812 type CallPat = ([Var], [CoreExpr]) -- Quantified variables and arguments
1813 -- See Note [SpecConstr call patterns]
1814
1815 callsToPats :: ScEnv -> [OneSpec] -> [ArgOcc] -> [Call] -> UniqSM (Bool, [CallPat])
1816 -- Result has no duplicate patterns,
1817 -- nor ones mentioned in done_pats
1818 -- Bool indicates that there was at least one boring pattern
1819 callsToPats env done_specs bndr_occs calls
1820 = do { mb_pats <- mapM (callToPats env bndr_occs) calls
1821
1822 ; let good_pats :: [(CallPat, ValueEnv)]
1823 good_pats = catMaybes mb_pats
1824 done_pats = [p | OS p _ _ _ <- done_specs]
1825 is_done p = any (samePat p) done_pats
1826 no_recursive = map fst (filterOut (is_too_recursive env) good_pats)
1827
1828 ; return (any isNothing mb_pats,
1829 filterOut is_done (nubBy samePat no_recursive)) }
1830
1831 is_too_recursive :: ScEnv -> (CallPat, ValueEnv) -> Bool
1832 -- Count the number of recursive constructors in a call pattern,
1833 -- filter out if there are more than the maximum.
1834 -- This is only necessary if ForceSpecConstr is in effect:
1835 -- otherwise specConstrCount will cause specialisation to terminate.
1836 -- See Note [Limit recursive specialisation]
1837 is_too_recursive env ((_,exprs), val_env)
1838 = sc_force env && maximum (map go exprs) > sc_recursive env
1839 where
1840 go e
1841 | Just (ConVal (DataAlt dc) args) <- isValue val_env e
1842 , isRecursiveTyCon (dataConTyCon dc)
1843 = 1 + sum (map go args)
1844
1845 |App f a <- e
1846 = go f + go a
1847
1848 | otherwise
1849 = 0
1850
1851 callToPats :: ScEnv -> [ArgOcc] -> Call -> UniqSM (Maybe (CallPat, ValueEnv))
1852 -- The [Var] is the variables to quantify over in the rule
1853 -- Type variables come first, since they may scope
1854 -- over the following term variables
1855 -- The [CoreExpr] are the argument patterns for the rule
1856 callToPats env bndr_occs (Call _ args con_env)
1857 | length args < length bndr_occs -- Check saturated
1858 = return Nothing
1859 | otherwise
1860 = do { let in_scope = substInScope (sc_subst env)
1861 ; (interesting, pats) <- argsToPats env in_scope con_env args bndr_occs
1862 ; let pat_fvs = exprsFreeVarsList pats
1863 -- To get determinism we need the list of free variables in
1864 -- deterministic order. Otherwise we end up creating
1865 -- lambdas with different argument orders. See
1866 -- determinism/simplCore/should_compile/spec-inline-determ.hs
1867 -- for an example. For explanation of determinism
1868 -- considerations See Note [Unique Determinism] in Unique.
1869 in_scope_vars = getInScopeVars in_scope
1870 qvars = filterOut (`elemVarSet` in_scope_vars) pat_fvs
1871 -- Quantify over variables that are not in scope
1872 -- at the call site
1873 -- See Note [Free type variables of the qvar types]
1874 -- See Note [Shadowing] at the top
1875
1876 (ktvs, ids) = partition isTyVar qvars
1877 qvars' = toposortTyVars ktvs ++ map sanitise ids
1878 -- Order into kind variables, type variables, term variables
1879 -- The kind of a type variable may mention a kind variable
1880 -- and the type of a term variable may mention a type variable
1881
1882 sanitise id = id `setIdType` expandTypeSynonyms (idType id)
1883 -- See Note [Free type variables of the qvar types]
1884
1885 ; -- pprTrace "callToPats" (ppr args $$ ppr bndr_occs) $
1886 if interesting
1887 then return (Just ((qvars', pats), con_env))
1888 else return Nothing }
1889
1890 -- argToPat takes an actual argument, and returns an abstracted
1891 -- version, consisting of just the "constructor skeleton" of the
1892 -- argument, with non-constructor sub-expression replaced by new
1893 -- placeholder variables. For example:
1894 -- C a (D (f x) (g y)) ==> C p1 (D p2 p3)
1895
1896 argToPat :: ScEnv
1897 -> InScopeSet -- What's in scope at the fn defn site
1898 -> ValueEnv -- ValueEnv at the call site
1899 -> CoreArg -- A call arg (or component thereof)
1900 -> ArgOcc
1901 -> UniqSM (Bool, CoreArg)
1902
1903 -- Returns (interesting, pat),
1904 -- where pat is the pattern derived from the argument
1905 -- interesting=True if the pattern is non-trivial (not a variable or type)
1906 -- E.g. x:xs --> (True, x:xs)
1907 -- f xs --> (False, w) where w is a fresh wildcard
1908 -- (f xs, 'c') --> (True, (w, 'c')) where w is a fresh wildcard
1909 -- \x. x+y --> (True, \x. x+y)
1910 -- lvl7 --> (True, lvl7) if lvl7 is bound
1911 -- somewhere further out
1912
1913 argToPat _env _in_scope _val_env arg@(Type {}) _arg_occ
1914 = return (False, arg)
1915
1916 argToPat env in_scope val_env (Tick _ arg) arg_occ
1917 = argToPat env in_scope val_env arg arg_occ
1918 -- Note [Notes in call patterns]
1919 -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1920 -- Ignore Notes. In particular, we want to ignore any InlineMe notes
1921 -- Perhaps we should not ignore profiling notes, but I'm going to
1922 -- ride roughshod over them all for now.
1923 --- See Note [Notes in RULE matching] in Rules
1924
1925 argToPat env in_scope val_env (Let _ arg) arg_occ
1926 = argToPat env in_scope val_env arg arg_occ
1927 -- See Note [Matching lets] in Rule.hs
1928 -- Look through let expressions
1929 -- e.g. f (let v = rhs in (v,w))
1930 -- Here we can specialise for f (v,w)
1931 -- because the rule-matcher will look through the let.
1932
1933 {- Disabled; see Note [Matching cases] in Rule.hs
1934 argToPat env in_scope val_env (Case scrut _ _ [(_, _, rhs)]) arg_occ
1935 | exprOkForSpeculation scrut -- See Note [Matching cases] in Rule.hhs
1936 = argToPat env in_scope val_env rhs arg_occ
1937 -}
1938
1939 argToPat env in_scope val_env (Cast arg co) arg_occ
1940 | not (ignoreType env ty2)
1941 = do { (interesting, arg') <- argToPat env in_scope val_env arg arg_occ
1942 ; if not interesting then
1943 wildCardPat ty2
1944 else do
1945 { -- Make a wild-card pattern for the coercion
1946 uniq <- getUniqueM
1947 ; let co_name = mkSysTvName uniq (fsLit "sg")
1948 co_var = mkCoVar co_name (mkCoercionType Representational ty1 ty2)
1949 ; return (interesting, Cast arg' (mkCoVarCo co_var)) } }
1950 where
1951 Pair ty1 ty2 = coercionKind co
1952
1953
1954
1955 {- Disabling lambda specialisation for now
1956 It's fragile, and the spec_loop can be infinite
1957 argToPat in_scope val_env arg arg_occ
1958 | is_value_lam arg
1959 = return (True, arg)
1960 where
1961 is_value_lam (Lam v e) -- Spot a value lambda, even if
1962 | isId v = True -- it is inside a type lambda
1963 | otherwise = is_value_lam e
1964 is_value_lam other = False
1965 -}
1966
1967 -- Check for a constructor application
1968 -- NB: this *precedes* the Var case, so that we catch nullary constrs
1969 argToPat env in_scope val_env arg arg_occ
1970 | Just (ConVal (DataAlt dc) args) <- isValue val_env arg
1971 , not (ignoreDataCon env dc) -- See Note [NoSpecConstr]
1972 , Just arg_occs <- mb_scrut dc
1973 = do { let (ty_args, rest_args) = splitAtList (dataConUnivTyVars dc) args
1974 ; (_, args') <- argsToPats env in_scope val_env rest_args arg_occs
1975 ; return (True,
1976 mkConApp dc (ty_args ++ args')) }
1977 where
1978 mb_scrut dc = case arg_occ of
1979 ScrutOcc bs
1980 | Just occs <- lookupUFM bs dc
1981 -> Just (occs) -- See Note [Reboxing]
1982 _other | sc_force env -> Just (repeat UnkOcc)
1983 | otherwise -> Nothing
1984
1985 -- Check if the argument is a variable that
1986 -- (a) is used in an interesting way in the function body
1987 -- (b) we know what its value is
1988 -- In that case it counts as "interesting"
1989 argToPat env in_scope val_env (Var v) arg_occ
1990 | sc_force env || case arg_occ of { UnkOcc -> False; _other -> True }, -- (a)
1991 is_value, -- (b)
1992 not (ignoreType env (varType v))
1993 = return (True, Var v)
1994 where
1995 is_value
1996 | isLocalId v = v `elemInScopeSet` in_scope
1997 && isJust (lookupVarEnv val_env v)
1998 -- Local variables have values in val_env
1999 | otherwise = isValueUnfolding (idUnfolding v)
2000 -- Imports have unfoldings
2001
2002 -- I'm really not sure what this comment means
2003 -- And by not wild-carding we tend to get forall'd
2004 -- variables that are in scope, which in turn can
2005 -- expose the weakness in let-matching
2006 -- See Note [Matching lets] in Rules
2007
2008 -- Check for a variable bound inside the function.
2009 -- Don't make a wild-card, because we may usefully share
2010 -- e.g. f a = let x = ... in f (x,x)
2011 -- NB: this case follows the lambda and con-app cases!!
2012 -- argToPat _in_scope _val_env (Var v) _arg_occ
2013 -- = return (False, Var v)
2014 -- SLPJ : disabling this to avoid proliferation of versions
2015 -- also works badly when thinking about seeding the loop
2016 -- from the body of the let
2017 -- f x y = letrec g z = ... in g (x,y)
2018 -- We don't want to specialise for that *particular* x,y
2019
2020 -- The default case: make a wild-card
2021 -- We use this for coercions too
2022 argToPat _env _in_scope _val_env arg _arg_occ
2023 = wildCardPat (exprType arg)
2024
2025 wildCardPat :: Type -> UniqSM (Bool, CoreArg)
2026 wildCardPat ty
2027 = do { uniq <- getUniqueM
2028 ; let id = mkSysLocalOrCoVar (fsLit "sc") uniq ty
2029 ; return (False, varToCoreExpr id) }
2030
2031 argsToPats :: ScEnv -> InScopeSet -> ValueEnv
2032 -> [CoreArg] -> [ArgOcc] -- Should be same length
2033 -> UniqSM (Bool, [CoreArg])
2034 argsToPats env in_scope val_env args occs
2035 = do { stuff <- zipWithM (argToPat env in_scope val_env) args occs
2036 ; let (interesting_s, args') = unzip stuff
2037 ; return (or interesting_s, args') }
2038
2039 isValue :: ValueEnv -> CoreExpr -> Maybe Value
2040 isValue _env (Lit lit)
2041 | litIsLifted lit = Nothing
2042 | otherwise = Just (ConVal (LitAlt lit) [])
2043
2044 isValue env (Var v)
2045 | Just cval <- lookupVarEnv env v
2046 = Just cval -- You might think we could look in the idUnfolding here
2047 -- but that doesn't take account of which branch of a
2048 -- case we are in, which is the whole point
2049
2050 | not (isLocalId v) && isCheapUnfolding unf
2051 = isValue env (unfoldingTemplate unf)
2052 where
2053 unf = idUnfolding v
2054 -- However we do want to consult the unfolding
2055 -- as well, for let-bound constructors!
2056
2057 isValue env (Lam b e)
2058 | isTyVar b = case isValue env e of
2059 Just _ -> Just LambdaVal
2060 Nothing -> Nothing
2061 | otherwise = Just LambdaVal
2062
2063 isValue env (Tick t e)
2064 | not (tickishIsCode t)
2065 = isValue env e
2066
2067 isValue _env expr -- Maybe it's a constructor application
2068 | (Var fun, args, _) <- collectArgsTicks (not . tickishIsCode) expr
2069 = case isDataConWorkId_maybe fun of
2070
2071 Just con | args `lengthAtLeast` dataConRepArity con
2072 -- Check saturated; might be > because the
2073 -- arity excludes type args
2074 -> Just (ConVal (DataAlt con) args)
2075
2076 _other | valArgCount args < idArity fun
2077 -- Under-applied function
2078 -> Just LambdaVal -- Partial application
2079
2080 _other -> Nothing
2081
2082 isValue _env _expr = Nothing
2083
2084 valueIsWorkFree :: Value -> Bool
2085 valueIsWorkFree LambdaVal = True
2086 valueIsWorkFree (ConVal _ args) = all exprIsWorkFree args
2087
2088 samePat :: CallPat -> CallPat -> Bool
2089 samePat (vs1, as1) (vs2, as2)
2090 = all2 same as1 as2
2091 where
2092 same (Var v1) (Var v2)
2093 | v1 `elem` vs1 = v2 `elem` vs2
2094 | v2 `elem` vs2 = False
2095 | otherwise = v1 == v2
2096
2097 same (Lit l1) (Lit l2) = l1==l2
2098 same (App f1 a1) (App f2 a2) = same f1 f2 && same a1 a2
2099
2100 same (Type {}) (Type {}) = True -- Note [Ignore type differences]
2101 same (Coercion {}) (Coercion {}) = True
2102 same (Tick _ e1) e2 = same e1 e2 -- Ignore casts and notes
2103 same (Cast e1 _) e2 = same e1 e2
2104 same e1 (Tick _ e2) = same e1 e2
2105 same e1 (Cast e2 _) = same e1 e2
2106
2107 same e1 e2 = WARN( bad e1 || bad e2, ppr e1 $$ ppr e2)
2108 False -- Let, lambda, case should not occur
2109 bad (Case {}) = True
2110 bad (Let {}) = True
2111 bad (Lam {}) = True
2112 bad _other = False
2113
2114 {-
2115 Note [Ignore type differences]
2116 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2117 We do not want to generate specialisations where the call patterns
2118 differ only in their type arguments! Not only is it utterly useless,
2119 but it also means that (with polymorphic recursion) we can generate
2120 an infinite number of specialisations. Example is Data.Sequence.adjustTree,
2121 I think.
2122 -}