Update Trac ticket URLs to point to GitLab
[ghc.git] / compiler / specialise / SpecConstr.hs
1 {-
2 ToDo [Oct 2013]
3 ~~~~~~~~~~~~~~~
4 1. Nuke ForceSpecConstr for good (it is subsumed by GHC.Types.SPEC in ghc-prim)
5 2. Nuke NoSpecConstr
6
7
8 (c) The GRASP/AQUA Project, Glasgow University, 1992-1998
9
10 \section[SpecConstr]{Specialise over constructors}
11 -}
12
13 {-# LANGUAGE CPP #-}
14
15 module SpecConstr(
16 specConstrProgram,
17 SpecConstrAnnotation(..)
18 ) where
19
20 #include "HsVersions.h"
21
22 import GhcPrelude
23
24 import CoreSyn
25 import CoreSubst
26 import CoreUtils
27 import CoreUnfold ( couldBeSmallEnoughToInline )
28 import CoreFVs ( exprsFreeVarsList )
29 import CoreMonad
30 import Literal ( litIsLifted )
31 import HscTypes ( ModGuts(..) )
32 import WwLib ( isWorkerSmallEnough, mkWorkerArgs )
33 import DataCon
34 import Coercion hiding( substCo )
35 import Rules
36 import Type hiding ( substTy )
37 import TyCon ( tyConName )
38 import Id
39 import PprCore ( pprParendExpr )
40 import MkCore ( mkImpossibleExpr )
41 import VarEnv
42 import VarSet
43 import Name
44 import BasicTypes
45 import DynFlags ( DynFlags(..), GeneralFlag( Opt_SpecConstrKeen )
46 , gopt, hasPprDebug )
47 import Maybes ( orElse, catMaybes, isJust, isNothing )
48 import Demand
49 import GHC.Serialized ( deserializeWithData )
50 import Util
51 import Pair
52 import UniqSupply
53 import Outputable
54 import FastString
55 import UniqFM
56 import MonadUtils
57 import Control.Monad ( zipWithM )
58 import Data.List
59 import PrelNames ( specTyConName )
60 import Module
61 import TyCon ( TyCon )
62 import GHC.Exts( SpecConstrAnnotation(..) )
63 import Data.Ord( comparing )
64
65 {-
66 -----------------------------------------------------
67 Game plan
68 -----------------------------------------------------
69
70 Consider
71 drop n [] = []
72 drop 0 xs = []
73 drop n (x:xs) = drop (n-1) xs
74
75 After the first time round, we could pass n unboxed. This happens in
76 numerical code too. Here's what it looks like in Core:
77
78 drop n xs = case xs of
79 [] -> []
80 (y:ys) -> case n of
81 I# n# -> case n# of
82 0 -> []
83 _ -> drop (I# (n# -# 1#)) xs
84
85 Notice that the recursive call has an explicit constructor as argument.
86 Noticing this, we can make a specialised version of drop
87
88 RULE: drop (I# n#) xs ==> drop' n# xs
89
90 drop' n# xs = let n = I# n# in ...orig RHS...
91
92 Now the simplifier will apply the specialisation in the rhs of drop', giving
93
94 drop' n# xs = case xs of
95 [] -> []
96 (y:ys) -> case n# of
97 0 -> []
98 _ -> drop' (n# -# 1#) xs
99
100 Much better!
101
102 We'd also like to catch cases where a parameter is carried along unchanged,
103 but evaluated each time round the loop:
104
105 f i n = if i>0 || i>n then i else f (i*2) n
106
107 Here f isn't strict in n, but we'd like to avoid evaluating it each iteration.
108 In Core, by the time we've w/wd (f is strict in i) we get
109
110 f i# n = case i# ># 0 of
111 False -> I# i#
112 True -> case n of { I# n# ->
113 case i# ># n# of
114 False -> I# i#
115 True -> f (i# *# 2#) n
116
117 At the call to f, we see that the argument, n is known to be (I# n#),
118 and n is evaluated elsewhere in the body of f, so we can play the same
119 trick as above.
120
121
122 Note [Reboxing]
123 ~~~~~~~~~~~~~~~
124 We must be careful not to allocate the same constructor twice. Consider
125 f p = (...(case p of (a,b) -> e)...p...,
126 ...let t = (r,s) in ...t...(f t)...)
127 At the recursive call to f, we can see that t is a pair. But we do NOT want
128 to make a specialised copy:
129 f' a b = let p = (a,b) in (..., ...)
130 because now t is allocated by the caller, then r and s are passed to the
131 recursive call, which allocates the (r,s) pair again.
132
133 This happens if
134 (a) the argument p is used in other than a case-scrutinisation way.
135 (b) the argument to the call is not a 'fresh' tuple; you have to
136 look into its unfolding to see that it's a tuple
137
138 Hence the "OR" part of Note [Good arguments] below.
139
140 ALTERNATIVE 2: pass both boxed and unboxed versions. This no longer saves
141 allocation, but does perhaps save evals. In the RULE we'd have
142 something like
143
144 f (I# x#) = f' (I# x#) x#
145
146 If at the call site the (I# x) was an unfolding, then we'd have to
147 rely on CSE to eliminate the duplicate allocation.... This alternative
148 doesn't look attractive enough to pursue.
149
150 ALTERNATIVE 3: ignore the reboxing problem. The trouble is that
151 the conservative reboxing story prevents many useful functions from being
152 specialised. Example:
153 foo :: Maybe Int -> Int -> Int
154 foo (Just m) 0 = 0
155 foo x@(Just m) n = foo x (n-m)
156 Here the use of 'x' will clearly not require boxing in the specialised function.
157
158 The strictness analyser has the same problem, in fact. Example:
159 f p@(a,b) = ...
160 If we pass just 'a' and 'b' to the worker, it might need to rebox the
161 pair to create (a,b). A more sophisticated analysis might figure out
162 precisely the cases in which this could happen, but the strictness
163 analyser does no such analysis; it just passes 'a' and 'b', and hopes
164 for the best.
165
166 So my current choice is to make SpecConstr similarly aggressive, and
167 ignore the bad potential of reboxing.
168
169
170 Note [Good arguments]
171 ~~~~~~~~~~~~~~~~~~~~~
172 So we look for
173
174 * A self-recursive function. Ignore mutual recursion for now,
175 because it's less common, and the code is simpler for self-recursion.
176
177 * EITHER
178
179 a) At a recursive call, one or more parameters is an explicit
180 constructor application
181 AND
182 That same parameter is scrutinised by a case somewhere in
183 the RHS of the function
184
185 OR
186
187 b) At a recursive call, one or more parameters has an unfolding
188 that is an explicit constructor application
189 AND
190 That same parameter is scrutinised by a case somewhere in
191 the RHS of the function
192 AND
193 Those are the only uses of the parameter (see Note [Reboxing])
194
195
196 What to abstract over
197 ~~~~~~~~~~~~~~~~~~~~~
198 There's a bit of a complication with type arguments. If the call
199 site looks like
200
201 f p = ...f ((:) [a] x xs)...
202
203 then our specialised function look like
204
205 f_spec x xs = let p = (:) [a] x xs in ....as before....
206
207 This only makes sense if either
208 a) the type variable 'a' is in scope at the top of f, or
209 b) the type variable 'a' is an argument to f (and hence fs)
210
211 Actually, (a) may hold for value arguments too, in which case
212 we may not want to pass them. Suppose 'x' is in scope at f's
213 defn, but xs is not. Then we'd like
214
215 f_spec xs = let p = (:) [a] x xs in ....as before....
216
217 Similarly (b) may hold too. If x is already an argument at the
218 call, no need to pass it again.
219
220 Finally, if 'a' is not in scope at the call site, we could abstract
221 it as we do the term variables:
222
223 f_spec a x xs = let p = (:) [a] x xs in ...as before...
224
225 So the grand plan is:
226
227 * abstract the call site to a constructor-only pattern
228 e.g. C x (D (f p) (g q)) ==> C s1 (D s2 s3)
229
230 * Find the free variables of the abstracted pattern
231
232 * Pass these variables, less any that are in scope at
233 the fn defn. But see Note [Shadowing] below.
234
235
236 NOTICE that we only abstract over variables that are not in scope,
237 so we're in no danger of shadowing variables used in "higher up"
238 in f_spec's RHS.
239
240
241 Note [Shadowing]
242 ~~~~~~~~~~~~~~~~
243 In this pass we gather up usage information that may mention variables
244 that are bound between the usage site and the definition site; or (more
245 seriously) may be bound to something different at the definition site.
246 For example:
247
248 f x = letrec g y v = let x = ...
249 in ...(g (a,b) x)...
250
251 Since 'x' is in scope at the call site, we may make a rewrite rule that
252 looks like
253 RULE forall a,b. g (a,b) x = ...
254 But this rule will never match, because it's really a different 'x' at
255 the call site -- and that difference will be manifest by the time the
256 simplifier gets to it. [A worry: the simplifier doesn't *guarantee*
257 no-shadowing, so perhaps it may not be distinct?]
258
259 Anyway, the rule isn't actually wrong, it's just not useful. One possibility
260 is to run deShadowBinds before running SpecConstr, but instead we run the
261 simplifier. That gives the simplest possible program for SpecConstr to
262 chew on; and it virtually guarantees no shadowing.
263
264 Note [Specialising for constant parameters]
265 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
266 This one is about specialising on a *constant* (but not necessarily
267 constructor) argument
268
269 foo :: Int -> (Int -> Int) -> Int
270 foo 0 f = 0
271 foo m f = foo (f m) (+1)
272
273 It produces
274
275 lvl_rmV :: GHC.Base.Int -> GHC.Base.Int
276 lvl_rmV =
277 \ (ds_dlk :: GHC.Base.Int) ->
278 case ds_dlk of wild_alH { GHC.Base.I# x_alG ->
279 GHC.Base.I# (GHC.Prim.+# x_alG 1)
280
281 T.$wfoo :: GHC.Prim.Int# -> (GHC.Base.Int -> GHC.Base.Int) ->
282 GHC.Prim.Int#
283 T.$wfoo =
284 \ (ww_sme :: GHC.Prim.Int#) (w_smg :: GHC.Base.Int -> GHC.Base.Int) ->
285 case ww_sme of ds_Xlw {
286 __DEFAULT ->
287 case w_smg (GHC.Base.I# ds_Xlw) of w1_Xmo { GHC.Base.I# ww1_Xmz ->
288 T.$wfoo ww1_Xmz lvl_rmV
289 };
290 0 -> 0
291 }
292
293 The recursive call has lvl_rmV as its argument, so we could create a specialised copy
294 with that argument baked in; that is, not passed at all. Now it can perhaps be inlined.
295
296 When is this worth it? Call the constant 'lvl'
297 - If 'lvl' has an unfolding that is a constructor, see if the corresponding
298 parameter is scrutinised anywhere in the body.
299
300 - If 'lvl' has an unfolding that is a inlinable function, see if the corresponding
301 parameter is applied (...to enough arguments...?)
302
303 Also do this is if the function has RULES?
304
305 Also
306
307 Note [Specialising for lambda parameters]
308 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
309 foo :: Int -> (Int -> Int) -> Int
310 foo 0 f = 0
311 foo m f = foo (f m) (\n -> n-m)
312
313 This is subtly different from the previous one in that we get an
314 explicit lambda as the argument:
315
316 T.$wfoo :: GHC.Prim.Int# -> (GHC.Base.Int -> GHC.Base.Int) ->
317 GHC.Prim.Int#
318 T.$wfoo =
319 \ (ww_sm8 :: GHC.Prim.Int#) (w_sma :: GHC.Base.Int -> GHC.Base.Int) ->
320 case ww_sm8 of ds_Xlr {
321 __DEFAULT ->
322 case w_sma (GHC.Base.I# ds_Xlr) of w1_Xmf { GHC.Base.I# ww1_Xmq ->
323 T.$wfoo
324 ww1_Xmq
325 (\ (n_ad3 :: GHC.Base.Int) ->
326 case n_ad3 of wild_alB { GHC.Base.I# x_alA ->
327 GHC.Base.I# (GHC.Prim.-# x_alA ds_Xlr)
328 })
329 };
330 0 -> 0
331 }
332
333 I wonder if SpecConstr couldn't be extended to handle this? After all,
334 lambda is a sort of constructor for functions and perhaps it already
335 has most of the necessary machinery?
336
337 Furthermore, there's an immediate win, because you don't need to allocate the lambda
338 at the call site; and if perchance it's called in the recursive call, then you
339 may avoid allocating it altogether. Just like for constructors.
340
341 Looks cool, but probably rare...but it might be easy to implement.
342
343
344 Note [SpecConstr for casts]
345 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
346 Consider
347 data family T a :: *
348 data instance T Int = T Int
349
350 foo n = ...
351 where
352 go (T 0) = 0
353 go (T n) = go (T (n-1))
354
355 The recursive call ends up looking like
356 go (T (I# ...) `cast` g)
357 So we want to spot the constructor application inside the cast.
358 That's why we have the Cast case in argToPat
359
360 Note [Local recursive groups]
361 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
362 For a *local* recursive group, we can see all the calls to the
363 function, so we seed the specialisation loop from the calls in the
364 body, not from the calls in the RHS. Consider:
365
366 bar m n = foo n (n,n) (n,n) (n,n) (n,n)
367 where
368 foo n p q r s
369 | n == 0 = m
370 | n > 3000 = case p of { (p1,p2) -> foo (n-1) (p2,p1) q r s }
371 | n > 2000 = case q of { (q1,q2) -> foo (n-1) p (q2,q1) r s }
372 | n > 1000 = case r of { (r1,r2) -> foo (n-1) p q (r2,r1) s }
373 | otherwise = case s of { (s1,s2) -> foo (n-1) p q r (s2,s1) }
374
375 If we start with the RHSs of 'foo', we get lots and lots of specialisations,
376 most of which are not needed. But if we start with the (single) call
377 in the rhs of 'bar' we get exactly one fully-specialised copy, and all
378 the recursive calls go to this fully-specialised copy. Indeed, the original
379 function is later collected as dead code. This is very important in
380 specialising the loops arising from stream fusion, for example in NDP where
381 we were getting literally hundreds of (mostly unused) specialisations of
382 a local function.
383
384 In a case like the above we end up never calling the original un-specialised
385 function. (Although we still leave its code around just in case.)
386
387 However, if we find any boring calls in the body, including *unsaturated*
388 ones, such as
389 letrec foo x y = ....foo...
390 in map foo xs
391 then we will end up calling the un-specialised function, so then we *should*
392 use the calls in the un-specialised RHS as seeds. We call these
393 "boring call patterns", and callsToPats reports if it finds any of these.
394
395 Note [Seeding top-level recursive groups]
396 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
397 This seeding is done in the binding for seed_calls in specRec.
398
399 1. If all the bindings in a top-level recursive group are local (not
400 exported), then all the calls are in the rest of the top-level
401 bindings. This means we can specialise with those call patterns
402 ONLY, and NOT with the RHSs of the recursive group (exactly like
403 Note [Local recursive groups])
404
405 2. But if any of the bindings are exported, the function may be called
406 with any old arguments, so (for lack of anything better) we specialise
407 based on
408 (a) the call patterns in the RHS
409 (b) the call patterns in the rest of the top-level bindings
410 NB: before Apr 15 we used (a) only, but Dimitrios had an example
411 where (b) was crucial, so I added that.
412 Adding (b) also improved nofib allocation results:
413 multiplier: 4% better
414 minimax: 2.8% better
415
416 Actually in case (2), instead of using the calls from the RHS, it
417 would be better to specialise in the importing module. We'd need to
418 add an INLINABLE pragma to the function, and then it can be
419 specialised in the importing scope, just as is done for type classes
420 in Specialise.specImports. This remains to be done (#10346).
421
422 Note [Top-level recursive groups]
423 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
424 To get the call usage information from "the rest of the top level
425 bindings" (c.f. Note [Seeding top-level recursive groups]), we work
426 backwards through the top-level bindings so we see the usage before we
427 get to the binding of the function. Before we can collect the usage
428 though, we go through all the bindings and add them to the
429 environment. This is necessary because usage is only tracked for
430 functions in the environment. These two passes are called
431 'go' and 'goEnv'
432 in specConstrProgram. (Looks a bit revolting to me.)
433
434 Note [Do not specialise diverging functions]
435 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
436 Specialising a function that just diverges is a waste of code.
437 Furthermore, it broke GHC (simpl014) thus:
438 {-# STR Sb #-}
439 f = \x. case x of (a,b) -> f x
440 If we specialise f we get
441 f = \x. case x of (a,b) -> fspec a b
442 But fspec doesn't have decent strictness info. As it happened,
443 (f x) :: IO t, so the state hack applied and we eta expanded fspec,
444 and hence f. But now f's strictness is less than its arity, which
445 breaks an invariant.
446
447
448 Note [Forcing specialisation]
449 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
450 With stream fusion and in other similar cases, we want to fully
451 specialise some (but not necessarily all!) loops regardless of their
452 size and the number of specialisations.
453
454 We allow a library to do this, in one of two ways (one which is
455 deprecated):
456
457 1) Add a parameter of type GHC.Types.SPEC (from ghc-prim) to the loop body.
458
459 2) (Deprecated) Annotate a type with ForceSpecConstr from GHC.Exts,
460 and then add *that* type as a parameter to the loop body
461
462 The reason #2 is deprecated is because it requires GHCi, which isn't
463 available for things like a cross compiler using stage1.
464
465 Here's a (simplified) example from the `vector` package. You may bring
466 the special 'force specialization' type into scope by saying:
467
468 import GHC.Types (SPEC(..))
469
470 or by defining your own type (again, deprecated):
471
472 data SPEC = SPEC | SPEC2
473 {-# ANN type SPEC ForceSpecConstr #-}
474
475 (Note this is the exact same definition of GHC.Types.SPEC, just
476 without the annotation.)
477
478 After that, you say:
479
480 foldl :: (a -> b -> a) -> a -> Stream b -> a
481 {-# INLINE foldl #-}
482 foldl f z (Stream step s _) = foldl_loop SPEC z s
483 where
484 foldl_loop !sPEC z s = case step s of
485 Yield x s' -> foldl_loop sPEC (f z x) s'
486 Skip -> foldl_loop sPEC z s'
487 Done -> z
488
489 SpecConstr will spot the SPEC parameter and always fully specialise
490 foldl_loop. Note that
491
492 * We have to prevent the SPEC argument from being removed by
493 w/w which is why (a) SPEC is a sum type, and (b) we have to seq on
494 the SPEC argument.
495
496 * And lastly, the SPEC argument is ultimately eliminated by
497 SpecConstr itself so there is no runtime overhead.
498
499 This is all quite ugly; we ought to come up with a better design.
500
501 ForceSpecConstr arguments are spotted in scExpr' and scTopBinds which then set
502 sc_force to True when calling specLoop. This flag does four things:
503
504 * Ignore specConstrThreshold, to specialise functions of arbitrary size
505 (see scTopBind)
506 * Ignore specConstrCount, to make arbitrary numbers of specialisations
507 (see specialise)
508 * Specialise even for arguments that are not scrutinised in the loop
509 (see argToPat; #4448)
510 * Only specialise on recursive types a finite number of times
511 (see is_too_recursive; #5550; Note [Limit recursive specialisation])
512
513 The flag holds only for specialising a single binding group, and NOT
514 for nested bindings. (So really it should be passed around explicitly
515 and not stored in ScEnv.) #14379 turned out to be caused by
516 f SPEC x = let g1 x = ...
517 in ...
518 We force-specialise f (because of the SPEC), but that generates a specialised
519 copy of g1 (as well as the original). Alas g1 has a nested binding g2; and
520 in each copy of g1 we get an unspecialised and specialised copy of g2; and so
521 on. Result, exponential. So the force-spec flag now only applies to one
522 level of bindings at a time.
523
524 Mechanism for this one-level-only thing:
525
526 - Switch it on at the call to specRec, in scExpr and scTopBinds
527 - Switch it off when doing the RHSs;
528 this can be done very conveniently in decreaseSpecCount
529
530 What alternatives did I consider?
531
532 * Annotating the loop itself doesn't work because (a) it is local and
533 (b) it will be w/w'ed and having w/w propagating annotations somehow
534 doesn't seem like a good idea. The types of the loop arguments
535 really seem to be the most persistent thing.
536
537 * Annotating the types that make up the loop state doesn't work,
538 either, because (a) it would prevent us from using types like Either
539 or tuples here, (b) we don't want to restrict the set of types that
540 can be used in Stream states and (c) some types are fixed by the
541 user (e.g., the accumulator here) but we still want to specialise as
542 much as possible.
543
544 Alternatives to ForceSpecConstr
545 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
546 Instead of giving the loop an extra argument of type SPEC, we
547 also considered *wrapping* arguments in SPEC, thus
548 data SPEC a = SPEC a | SPEC2
549
550 loop = \arg -> case arg of
551 SPEC state ->
552 case state of (x,y) -> ... loop (SPEC (x',y')) ...
553 S2 -> error ...
554 The idea is that a SPEC argument says "specialise this argument
555 regardless of whether the function case-analyses it". But this
556 doesn't work well:
557 * SPEC must still be a sum type, else the strictness analyser
558 eliminates it
559 * But that means that 'loop' won't be strict in its real payload
560 This loss of strictness in turn screws up specialisation, because
561 we may end up with calls like
562 loop (SPEC (case z of (p,q) -> (q,p)))
563 Without the SPEC, if 'loop' were strict, the case would move out
564 and we'd see loop applied to a pair. But if 'loop' isn't strict
565 this doesn't look like a specialisable call.
566
567 Note [Limit recursive specialisation]
568 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
569 It is possible for ForceSpecConstr to cause an infinite loop of specialisation.
570 Because there is no limit on the number of specialisations, a recursive call with
571 a recursive constructor as an argument (for example, list cons) will generate
572 a specialisation for that constructor. If the resulting specialisation also
573 contains a recursive call with the constructor, this could proceed indefinitely.
574
575 For example, if ForceSpecConstr is on:
576 loop :: [Int] -> [Int] -> [Int]
577 loop z [] = z
578 loop z (x:xs) = loop (x:z) xs
579 this example will create a specialisation for the pattern
580 loop (a:b) c = loop' a b c
581
582 loop' a b [] = (a:b)
583 loop' a b (x:xs) = loop (x:(a:b)) xs
584 and a new pattern is found:
585 loop (a:(b:c)) d = loop'' a b c d
586 which can continue indefinitely.
587
588 Roman's suggestion to fix this was to stop after a couple of times on recursive types,
589 but still specialising on non-recursive types as much as possible.
590
591 To implement this, we count the number of times we have gone round the
592 "specialise recursively" loop ('go' in 'specRec'). Once have gone round
593 more than N times (controlled by -fspec-constr-recursive=N) we check
594
595 - If sc_force is off, and sc_count is (Just max) then we don't
596 need to do anything: trim_pats will limit the number of specs
597
598 - Otherwise check if any function has now got more than (sc_count env)
599 specialisations. If sc_count is "no limit" then we arbitrarily
600 choose 10 as the limit (ugh).
601
602 See #5550. Also #13623, where this test had become over-aggressive,
603 and we lost a wonderful specialisation that we really wanted!
604
605 Note [NoSpecConstr]
606 ~~~~~~~~~~~~~~~~~~~
607 The ignoreDataCon stuff allows you to say
608 {-# ANN type T NoSpecConstr #-}
609 to mean "don't specialise on arguments of this type". It was added
610 before we had ForceSpecConstr. Lacking ForceSpecConstr we specialised
611 regardless of size; and then we needed a way to turn that *off*. Now
612 that we have ForceSpecConstr, this NoSpecConstr is probably redundant.
613 (Used only for PArray, TODO: remove?)
614
615 -----------------------------------------------------
616 Stuff not yet handled
617 -----------------------------------------------------
618
619 Here are notes arising from Roman's work that I don't want to lose.
620
621 Example 1
622 ~~~~~~~~~
623 data T a = T !a
624
625 foo :: Int -> T Int -> Int
626 foo 0 t = 0
627 foo x t | even x = case t of { T n -> foo (x-n) t }
628 | otherwise = foo (x-1) t
629
630 SpecConstr does no specialisation, because the second recursive call
631 looks like a boxed use of the argument. A pity.
632
633 $wfoo_sFw :: GHC.Prim.Int# -> T.T GHC.Base.Int -> GHC.Prim.Int#
634 $wfoo_sFw =
635 \ (ww_sFo [Just L] :: GHC.Prim.Int#) (w_sFq [Just L] :: T.T GHC.Base.Int) ->
636 case ww_sFo of ds_Xw6 [Just L] {
637 __DEFAULT ->
638 case GHC.Prim.remInt# ds_Xw6 2 of wild1_aEF [Dead Just A] {
639 __DEFAULT -> $wfoo_sFw (GHC.Prim.-# ds_Xw6 1) w_sFq;
640 0 ->
641 case w_sFq of wild_Xy [Just L] { T.T n_ad5 [Just U(L)] ->
642 case n_ad5 of wild1_aET [Just A] { GHC.Base.I# y_aES [Just L] ->
643 $wfoo_sFw (GHC.Prim.-# ds_Xw6 y_aES) wild_Xy
644 } } };
645 0 -> 0
646
647 Example 2
648 ~~~~~~~~~
649 data a :*: b = !a :*: !b
650 data T a = T !a
651
652 foo :: (Int :*: T Int) -> Int
653 foo (0 :*: t) = 0
654 foo (x :*: t) | even x = case t of { T n -> foo ((x-n) :*: t) }
655 | otherwise = foo ((x-1) :*: t)
656
657 Very similar to the previous one, except that the parameters are now in
658 a strict tuple. Before SpecConstr, we have
659
660 $wfoo_sG3 :: GHC.Prim.Int# -> T.T GHC.Base.Int -> GHC.Prim.Int#
661 $wfoo_sG3 =
662 \ (ww_sFU [Just L] :: GHC.Prim.Int#) (ww_sFW [Just L] :: T.T
663 GHC.Base.Int) ->
664 case ww_sFU of ds_Xws [Just L] {
665 __DEFAULT ->
666 case GHC.Prim.remInt# ds_Xws 2 of wild1_aEZ [Dead Just A] {
667 __DEFAULT ->
668 case ww_sFW of tpl_B2 [Just L] { T.T a_sFo [Just A] ->
669 $wfoo_sG3 (GHC.Prim.-# ds_Xws 1) tpl_B2 -- $wfoo1
670 };
671 0 ->
672 case ww_sFW of wild_XB [Just A] { T.T n_ad7 [Just S(L)] ->
673 case n_ad7 of wild1_aFd [Just L] { GHC.Base.I# y_aFc [Just L] ->
674 $wfoo_sG3 (GHC.Prim.-# ds_Xws y_aFc) wild_XB -- $wfoo2
675 } } };
676 0 -> 0 }
677
678 We get two specialisations:
679 "SC:$wfoo1" [0] __forall {a_sFB :: GHC.Base.Int sc_sGC :: GHC.Prim.Int#}
680 Foo.$wfoo sc_sGC (Foo.T @ GHC.Base.Int a_sFB)
681 = Foo.$s$wfoo1 a_sFB sc_sGC ;
682 "SC:$wfoo2" [0] __forall {y_aFp :: GHC.Prim.Int# sc_sGC :: GHC.Prim.Int#}
683 Foo.$wfoo sc_sGC (Foo.T @ GHC.Base.Int (GHC.Base.I# y_aFp))
684 = Foo.$s$wfoo y_aFp sc_sGC ;
685
686 But perhaps the first one isn't good. After all, we know that tpl_B2 is
687 a T (I# x) really, because T is strict and Int has one constructor. (We can't
688 unbox the strict fields, because T is polymorphic!)
689
690 ************************************************************************
691 * *
692 \subsection{Top level wrapper stuff}
693 * *
694 ************************************************************************
695 -}
696
697 specConstrProgram :: ModGuts -> CoreM ModGuts
698 specConstrProgram guts
699 = do
700 dflags <- getDynFlags
701 us <- getUniqueSupplyM
702 annos <- getFirstAnnotations deserializeWithData guts
703 this_mod <- getModule
704 let binds' = reverse $ fst $ initUs us $ do
705 -- Note [Top-level recursive groups]
706 (env, binds) <- goEnv (initScEnv dflags this_mod annos)
707 (mg_binds guts)
708 -- binds is identical to (mg_binds guts), except that the
709 -- binders on the LHS have been replaced by extendBndr
710 -- (SPJ this seems like overkill; I don't think the binders
711 -- will change at all; and we don't substitute in the RHSs anyway!!)
712 go env nullUsage (reverse binds)
713
714 return (guts { mg_binds = binds' })
715 where
716 -- See Note [Top-level recursive groups]
717 goEnv env [] = return (env, [])
718 goEnv env (bind:binds) = do (env', bind') <- scTopBindEnv env bind
719 (env'', binds') <- goEnv env' binds
720 return (env'', bind' : binds')
721
722 -- Arg list of bindings is in reverse order
723 go _ _ [] = return []
724 go env usg (bind:binds) = do (usg', bind') <- scTopBind env usg bind
725 binds' <- go env usg' binds
726 return (bind' : binds')
727
728 {-
729 ************************************************************************
730 * *
731 \subsection{Environment: goes downwards}
732 * *
733 ************************************************************************
734
735 Note [Work-free values only in environment]
736 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
737 The sc_vals field keeps track of in-scope value bindings, so
738 that if we come across (case x of Just y ->...) we can reduce the
739 case from knowing that x is bound to a pair.
740
741 But only *work-free* values are ok here. For example if the envt had
742 x -> Just (expensive v)
743 then we do NOT want to expand to
744 let y = expensive v in ...
745 because the x-binding still exists and we've now duplicated (expensive v).
746
747 This seldom happens because let-bound constructor applications are
748 ANF-ised, but it can happen as a result of on-the-fly transformations in
749 SpecConstr itself. Here is #7865:
750
751 let {
752 a'_shr =
753 case xs_af8 of _ {
754 [] -> acc_af6;
755 : ds_dgt [Dmd=<L,A>] ds_dgu [Dmd=<L,A>] ->
756 (expensive x_af7, x_af7
757 } } in
758 let {
759 ds_sht =
760 case a'_shr of _ { (p'_afd, q'_afe) ->
761 TSpecConstr_DoubleInline.recursive
762 (GHC.Types.: @ GHC.Types.Int x_af7 wild_X6) (q'_afe, p'_afd)
763 } } in
764
765 When processed knowing that xs_af8 was bound to a cons, we simplify to
766 a'_shr = (expensive x_af7, x_af7)
767 and we do NOT want to inline that at the occurrence of a'_shr in ds_sht.
768 (There are other occurrences of a'_shr.) No no no.
769
770 It would be possible to do some on-the-fly ANF-ising, so that a'_shr turned
771 into a work-free value again, thus
772 a1 = expensive x_af7
773 a'_shr = (a1, x_af7)
774 but that's more work, so until its shown to be important I'm going to
775 leave it for now.
776
777 Note [Making SpecConstr keener]
778 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
779 Consider this, in (perf/should_run/T9339)
780 last (filter odd [1..1000])
781
782 After optimisation, including SpecConstr, we get:
783 f :: Int# -> Int -> Int
784 f x y = case case remInt# x 2# of
785 __DEFAULT -> case x of
786 __DEFAULT -> f (+# wild_Xp 1#) (I# x)
787 1000000# -> ...
788 0# -> case x of
789 __DEFAULT -> f (+# wild_Xp 1#) y
790 1000000# -> y
791
792 Not good! We build an (I# x) box every time around the loop.
793 SpecConstr (as described in the paper) does not specialise f, despite
794 the call (f ... (I# x)) because 'y' is not scrutinied in the body.
795 But it is much better to specialise f for the case where the argument
796 is of form (I# x); then we build the box only when returning y, which
797 is on the cold path.
798
799 Another example:
800
801 f x = ...(g x)....
802
803 Here 'x' is not scrutinised in f's body; but if we did specialise 'f'
804 then the call (g x) might allow 'g' to be specialised in turn.
805
806 So sc_keen controls whether or not we take account of whether argument is
807 scrutinised in the body. True <=> ignore that, and speicalise whenever
808 the function is applied to a data constructor.
809 -}
810
811 data ScEnv = SCE { sc_dflags :: DynFlags,
812 sc_module :: !Module,
813 sc_size :: Maybe Int, -- Size threshold
814 -- Nothing => no limit
815
816 sc_count :: Maybe Int, -- Max # of specialisations for any one fn
817 -- Nothing => no limit
818 -- See Note [Avoiding exponential blowup]
819
820 sc_recursive :: Int, -- Max # of specialisations over recursive type.
821 -- Stops ForceSpecConstr from diverging.
822
823 sc_keen :: Bool, -- Specialise on arguments that are known
824 -- constructors, even if they are not
825 -- scrutinised in the body. See
826 -- Note [Making SpecConstr keener]
827
828 sc_force :: Bool, -- Force specialisation?
829 -- See Note [Forcing specialisation]
830
831 sc_subst :: Subst, -- Current substitution
832 -- Maps InIds to OutExprs
833
834 sc_how_bound :: HowBoundEnv,
835 -- Binds interesting non-top-level variables
836 -- Domain is OutVars (*after* applying the substitution)
837
838 sc_vals :: ValueEnv,
839 -- Domain is OutIds (*after* applying the substitution)
840 -- Used even for top-level bindings (but not imported ones)
841 -- The range of the ValueEnv is *work-free* values
842 -- such as (\x. blah), or (Just v)
843 -- but NOT (Just (expensive v))
844 -- See Note [Work-free values only in environment]
845
846 sc_annotations :: UniqFM SpecConstrAnnotation
847 }
848
849 ---------------------
850 type HowBoundEnv = VarEnv HowBound -- Domain is OutVars
851
852 ---------------------
853 type ValueEnv = IdEnv Value -- Domain is OutIds
854 data Value = ConVal AltCon [CoreArg] -- _Saturated_ constructors
855 -- The AltCon is never DEFAULT
856 | LambdaVal -- Inlinable lambdas or PAPs
857
858 instance Outputable Value where
859 ppr (ConVal con args) = ppr con <+> interpp'SP args
860 ppr LambdaVal = text "<Lambda>"
861
862 ---------------------
863 initScEnv :: DynFlags -> Module -> UniqFM SpecConstrAnnotation -> ScEnv
864 initScEnv dflags this_mod anns
865 = SCE { sc_dflags = dflags,
866 sc_module = this_mod,
867 sc_size = specConstrThreshold dflags,
868 sc_count = specConstrCount dflags,
869 sc_recursive = specConstrRecursive dflags,
870 sc_keen = gopt Opt_SpecConstrKeen dflags,
871 sc_force = False,
872 sc_subst = emptySubst,
873 sc_how_bound = emptyVarEnv,
874 sc_vals = emptyVarEnv,
875 sc_annotations = anns }
876
877 data HowBound = RecFun -- These are the recursive functions for which
878 -- we seek interesting call patterns
879
880 | RecArg -- These are those functions' arguments, or their sub-components;
881 -- we gather occurrence information for these
882
883 instance Outputable HowBound where
884 ppr RecFun = text "RecFun"
885 ppr RecArg = text "RecArg"
886
887 scForce :: ScEnv -> Bool -> ScEnv
888 scForce env b = env { sc_force = b }
889
890 lookupHowBound :: ScEnv -> Id -> Maybe HowBound
891 lookupHowBound env id = lookupVarEnv (sc_how_bound env) id
892
893 scSubstId :: ScEnv -> Id -> CoreExpr
894 scSubstId env v = lookupIdSubst (text "scSubstId") (sc_subst env) v
895
896 scSubstTy :: ScEnv -> Type -> Type
897 scSubstTy env ty = substTy (sc_subst env) ty
898
899 scSubstCo :: ScEnv -> Coercion -> Coercion
900 scSubstCo env co = substCo (sc_subst env) co
901
902 zapScSubst :: ScEnv -> ScEnv
903 zapScSubst env = env { sc_subst = zapSubstEnv (sc_subst env) }
904
905 extendScInScope :: ScEnv -> [Var] -> ScEnv
906 -- Bring the quantified variables into scope
907 extendScInScope env qvars = env { sc_subst = extendInScopeList (sc_subst env) qvars }
908
909 -- Extend the substitution
910 extendScSubst :: ScEnv -> Var -> OutExpr -> ScEnv
911 extendScSubst env var expr = env { sc_subst = extendSubst (sc_subst env) var expr }
912
913 extendScSubstList :: ScEnv -> [(Var,OutExpr)] -> ScEnv
914 extendScSubstList env prs = env { sc_subst = extendSubstList (sc_subst env) prs }
915
916 extendHowBound :: ScEnv -> [Var] -> HowBound -> ScEnv
917 extendHowBound env bndrs how_bound
918 = env { sc_how_bound = extendVarEnvList (sc_how_bound env)
919 [(bndr,how_bound) | bndr <- bndrs] }
920
921 extendBndrsWith :: HowBound -> ScEnv -> [Var] -> (ScEnv, [Var])
922 extendBndrsWith how_bound env bndrs
923 = (env { sc_subst = subst', sc_how_bound = hb_env' }, bndrs')
924 where
925 (subst', bndrs') = substBndrs (sc_subst env) bndrs
926 hb_env' = sc_how_bound env `extendVarEnvList`
927 [(bndr,how_bound) | bndr <- bndrs']
928
929 extendBndrWith :: HowBound -> ScEnv -> Var -> (ScEnv, Var)
930 extendBndrWith how_bound env bndr
931 = (env { sc_subst = subst', sc_how_bound = hb_env' }, bndr')
932 where
933 (subst', bndr') = substBndr (sc_subst env) bndr
934 hb_env' = extendVarEnv (sc_how_bound env) bndr' how_bound
935
936 extendRecBndrs :: ScEnv -> [Var] -> (ScEnv, [Var])
937 extendRecBndrs env bndrs = (env { sc_subst = subst' }, bndrs')
938 where
939 (subst', bndrs') = substRecBndrs (sc_subst env) bndrs
940
941 extendBndr :: ScEnv -> Var -> (ScEnv, Var)
942 extendBndr env bndr = (env { sc_subst = subst' }, bndr')
943 where
944 (subst', bndr') = substBndr (sc_subst env) bndr
945
946 extendValEnv :: ScEnv -> Id -> Maybe Value -> ScEnv
947 extendValEnv env _ Nothing = env
948 extendValEnv env id (Just cv)
949 | valueIsWorkFree cv -- Don't duplicate work!! #7865
950 = env { sc_vals = extendVarEnv (sc_vals env) id cv }
951 extendValEnv env _ _ = env
952
953 extendCaseBndrs :: ScEnv -> OutExpr -> OutId -> AltCon -> [Var] -> (ScEnv, [Var])
954 -- When we encounter
955 -- case scrut of b
956 -- C x y -> ...
957 -- we want to bind b, to (C x y)
958 -- NB1: Extends only the sc_vals part of the envt
959 -- NB2: Kill the dead-ness info on the pattern binders x,y, since
960 -- they are potentially made alive by the [b -> C x y] binding
961 extendCaseBndrs env scrut case_bndr con alt_bndrs
962 = (env2, alt_bndrs')
963 where
964 live_case_bndr = not (isDeadBinder case_bndr)
965 env1 | Var v <- stripTicksTopE (const True) scrut
966 = extendValEnv env v cval
967 | otherwise = env -- See Note [Add scrutinee to ValueEnv too]
968 env2 | live_case_bndr = extendValEnv env1 case_bndr cval
969 | otherwise = env1
970
971 alt_bndrs' | case scrut of { Var {} -> True; _ -> live_case_bndr }
972 = map zap alt_bndrs
973 | otherwise
974 = alt_bndrs
975
976 cval = case con of
977 DEFAULT -> Nothing
978 LitAlt {} -> Just (ConVal con [])
979 DataAlt {} -> Just (ConVal con vanilla_args)
980 where
981 vanilla_args = map Type (tyConAppArgs (idType case_bndr)) ++
982 varsToCoreExprs alt_bndrs
983
984 zap v | isTyVar v = v -- See NB2 above
985 | otherwise = zapIdOccInfo v
986
987
988 decreaseSpecCount :: ScEnv -> Int -> ScEnv
989 -- See Note [Avoiding exponential blowup]
990 decreaseSpecCount env n_specs
991 = env { sc_force = False -- See Note [Forcing specialisation]
992 , sc_count = case sc_count env of
993 Nothing -> Nothing
994 Just n -> Just (n `div` (n_specs + 1)) }
995 -- The "+1" takes account of the original function;
996 -- See Note [Avoiding exponential blowup]
997
998 ---------------------------------------------------
999 -- See Note [Forcing specialisation]
1000 ignoreType :: ScEnv -> Type -> Bool
1001 ignoreDataCon :: ScEnv -> DataCon -> Bool
1002 forceSpecBndr :: ScEnv -> Var -> Bool
1003
1004 ignoreDataCon env dc = ignoreTyCon env (dataConTyCon dc)
1005
1006 ignoreType env ty
1007 = case tyConAppTyCon_maybe ty of
1008 Just tycon -> ignoreTyCon env tycon
1009 _ -> False
1010
1011 ignoreTyCon :: ScEnv -> TyCon -> Bool
1012 ignoreTyCon env tycon
1013 = lookupUFM (sc_annotations env) tycon == Just NoSpecConstr
1014
1015 forceSpecBndr env var = forceSpecFunTy env . snd . splitForAllTys . varType $ var
1016
1017 forceSpecFunTy :: ScEnv -> Type -> Bool
1018 forceSpecFunTy env = any (forceSpecArgTy env) . fst . splitFunTys
1019
1020 forceSpecArgTy :: ScEnv -> Type -> Bool
1021 forceSpecArgTy env ty
1022 | Just ty' <- coreView ty = forceSpecArgTy env ty'
1023
1024 forceSpecArgTy env ty
1025 | Just (tycon, tys) <- splitTyConApp_maybe ty
1026 , tycon /= funTyCon
1027 = tyConName tycon == specTyConName
1028 || lookupUFM (sc_annotations env) tycon == Just ForceSpecConstr
1029 || any (forceSpecArgTy env) tys
1030
1031 forceSpecArgTy _ _ = False
1032
1033 {-
1034 Note [Add scrutinee to ValueEnv too]
1035 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1036 Consider this:
1037 case x of y
1038 (a,b) -> case b of c
1039 I# v -> ...(f y)...
1040 By the time we get to the call (f y), the ValueEnv
1041 will have a binding for y, and for c
1042 y -> (a,b)
1043 c -> I# v
1044 BUT that's not enough! Looking at the call (f y) we
1045 see that y is pair (a,b), but we also need to know what 'b' is.
1046 So in extendCaseBndrs we must *also* add the binding
1047 b -> I# v
1048 else we lose a useful specialisation for f. This is necessary even
1049 though the simplifier has systematically replaced uses of 'x' with 'y'
1050 and 'b' with 'c' in the code. The use of 'b' in the ValueEnv came
1051 from outside the case. See #4908 for the live example.
1052
1053 Note [Avoiding exponential blowup]
1054 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1055 The sc_count field of the ScEnv says how many times we are prepared to
1056 duplicate a single function. But we must take care with recursive
1057 specialisations. Consider
1058
1059 let $j1 = let $j2 = let $j3 = ...
1060 in
1061 ...$j3...
1062 in
1063 ...$j2...
1064 in
1065 ...$j1...
1066
1067 If we specialise $j1 then in each specialisation (as well as the original)
1068 we can specialise $j2, and similarly $j3. Even if we make just *one*
1069 specialisation of each, because we also have the original we'll get 2^n
1070 copies of $j3, which is not good.
1071
1072 So when recursively specialising we divide the sc_count by the number of
1073 copies we are making at this level, including the original.
1074
1075
1076 ************************************************************************
1077 * *
1078 \subsection{Usage information: flows upwards}
1079 * *
1080 ************************************************************************
1081 -}
1082
1083 data ScUsage
1084 = SCU {
1085 scu_calls :: CallEnv, -- Calls
1086 -- The functions are a subset of the
1087 -- RecFuns in the ScEnv
1088
1089 scu_occs :: !(IdEnv ArgOcc) -- Information on argument occurrences
1090 } -- The domain is OutIds
1091
1092 type CallEnv = IdEnv [Call]
1093 data Call = Call Id [CoreArg] ValueEnv
1094 -- The arguments of the call, together with the
1095 -- env giving the constructor bindings at the call site
1096 -- We keep the function mainly for debug output
1097
1098 instance Outputable ScUsage where
1099 ppr (SCU { scu_calls = calls, scu_occs = occs })
1100 = text "SCU" <+> braces (sep [ ptext (sLit "calls =") <+> ppr calls
1101 , text "occs =" <+> ppr occs ])
1102
1103 instance Outputable Call where
1104 ppr (Call fn args _) = ppr fn <+> fsep (map pprParendExpr args)
1105
1106 nullUsage :: ScUsage
1107 nullUsage = SCU { scu_calls = emptyVarEnv, scu_occs = emptyVarEnv }
1108
1109 combineCalls :: CallEnv -> CallEnv -> CallEnv
1110 combineCalls = plusVarEnv_C (++)
1111 where
1112 -- plus cs ds | length res > 1
1113 -- = pprTrace "combineCalls" (vcat [ text "cs:" <+> ppr cs
1114 -- , text "ds:" <+> ppr ds])
1115 -- res
1116 -- | otherwise = res
1117 -- where
1118 -- res = cs ++ ds
1119
1120 combineUsage :: ScUsage -> ScUsage -> ScUsage
1121 combineUsage u1 u2 = SCU { scu_calls = combineCalls (scu_calls u1) (scu_calls u2),
1122 scu_occs = plusVarEnv_C combineOcc (scu_occs u1) (scu_occs u2) }
1123
1124 combineUsages :: [ScUsage] -> ScUsage
1125 combineUsages [] = nullUsage
1126 combineUsages us = foldr1 combineUsage us
1127
1128 lookupOccs :: ScUsage -> [OutVar] -> (ScUsage, [ArgOcc])
1129 lookupOccs (SCU { scu_calls = sc_calls, scu_occs = sc_occs }) bndrs
1130 = (SCU {scu_calls = sc_calls, scu_occs = delVarEnvList sc_occs bndrs},
1131 [lookupVarEnv sc_occs b `orElse` NoOcc | b <- bndrs])
1132
1133 data ArgOcc = NoOcc -- Doesn't occur at all; or a type argument
1134 | UnkOcc -- Used in some unknown way
1135
1136 | ScrutOcc -- See Note [ScrutOcc]
1137 (DataConEnv [ArgOcc]) -- How the sub-components are used
1138
1139 type DataConEnv a = UniqFM a -- Keyed by DataCon
1140
1141 {- Note [ScrutOcc]
1142 ~~~~~~~~~~~~~~~~~~~
1143 An occurrence of ScrutOcc indicates that the thing, or a `cast` version of the thing,
1144 is *only* taken apart or applied.
1145
1146 Functions, literal: ScrutOcc emptyUFM
1147 Data constructors: ScrutOcc subs,
1148
1149 where (subs :: UniqFM [ArgOcc]) gives usage of the *pattern-bound* components,
1150 The domain of the UniqFM is the Unique of the data constructor
1151
1152 The [ArgOcc] is the occurrences of the *pattern-bound* components
1153 of the data structure. E.g.
1154 data T a = forall b. MkT a b (b->a)
1155 A pattern binds b, x::a, y::b, z::b->a, but not 'a'!
1156
1157 -}
1158
1159 instance Outputable ArgOcc where
1160 ppr (ScrutOcc xs) = text "scrut-occ" <> ppr xs
1161 ppr UnkOcc = text "unk-occ"
1162 ppr NoOcc = text "no-occ"
1163
1164 evalScrutOcc :: ArgOcc
1165 evalScrutOcc = ScrutOcc emptyUFM
1166
1167 -- Experimentally, this vesion of combineOcc makes ScrutOcc "win", so
1168 -- that if the thing is scrutinised anywhere then we get to see that
1169 -- in the overall result, even if it's also used in a boxed way
1170 -- This might be too aggressive; see Note [Reboxing] Alternative 3
1171 combineOcc :: ArgOcc -> ArgOcc -> ArgOcc
1172 combineOcc NoOcc occ = occ
1173 combineOcc occ NoOcc = occ
1174 combineOcc (ScrutOcc xs) (ScrutOcc ys) = ScrutOcc (plusUFM_C combineOccs xs ys)
1175 combineOcc UnkOcc (ScrutOcc ys) = ScrutOcc ys
1176 combineOcc (ScrutOcc xs) UnkOcc = ScrutOcc xs
1177 combineOcc UnkOcc UnkOcc = UnkOcc
1178
1179 combineOccs :: [ArgOcc] -> [ArgOcc] -> [ArgOcc]
1180 combineOccs xs ys = zipWithEqual "combineOccs" combineOcc xs ys
1181
1182 setScrutOcc :: ScEnv -> ScUsage -> OutExpr -> ArgOcc -> ScUsage
1183 -- _Overwrite_ the occurrence info for the scrutinee, if the scrutinee
1184 -- is a variable, and an interesting variable
1185 setScrutOcc env usg (Cast e _) occ = setScrutOcc env usg e occ
1186 setScrutOcc env usg (Tick _ e) occ = setScrutOcc env usg e occ
1187 setScrutOcc env usg (Var v) occ
1188 | Just RecArg <- lookupHowBound env v = usg { scu_occs = extendVarEnv (scu_occs usg) v occ }
1189 | otherwise = usg
1190 setScrutOcc _env usg _other _occ -- Catch-all
1191 = usg
1192
1193 {-
1194 ************************************************************************
1195 * *
1196 \subsection{The main recursive function}
1197 * *
1198 ************************************************************************
1199
1200 The main recursive function gathers up usage information, and
1201 creates specialised versions of functions.
1202 -}
1203
1204 scExpr, scExpr' :: ScEnv -> CoreExpr -> UniqSM (ScUsage, CoreExpr)
1205 -- The unique supply is needed when we invent
1206 -- a new name for the specialised function and its args
1207
1208 scExpr env e = scExpr' env e
1209
1210 scExpr' env (Var v) = case scSubstId env v of
1211 Var v' -> return (mkVarUsage env v' [], Var v')
1212 e' -> scExpr (zapScSubst env) e'
1213
1214 scExpr' env (Type t) = return (nullUsage, Type (scSubstTy env t))
1215 scExpr' env (Coercion c) = return (nullUsage, Coercion (scSubstCo env c))
1216 scExpr' _ e@(Lit {}) = return (nullUsage, e)
1217 scExpr' env (Tick t e) = do (usg, e') <- scExpr env e
1218 return (usg, Tick t e')
1219 scExpr' env (Cast e co) = do (usg, e') <- scExpr env e
1220 return (usg, mkCast e' (scSubstCo env co))
1221 -- Important to use mkCast here
1222 -- See Note [SpecConstr call patterns]
1223 scExpr' env e@(App _ _) = scApp env (collectArgs e)
1224 scExpr' env (Lam b e) = do let (env', b') = extendBndr env b
1225 (usg, e') <- scExpr env' e
1226 return (usg, Lam b' e')
1227
1228 scExpr' env (Case scrut b ty alts)
1229 = do { (scrut_usg, scrut') <- scExpr env scrut
1230 ; case isValue (sc_vals env) scrut' of
1231 Just (ConVal con args) -> sc_con_app con args scrut'
1232 _other -> sc_vanilla scrut_usg scrut'
1233 }
1234 where
1235 sc_con_app con args scrut' -- Known constructor; simplify
1236 = do { let (_, bs, rhs) = findAlt con alts
1237 `orElse` (DEFAULT, [], mkImpossibleExpr ty)
1238 alt_env' = extendScSubstList env ((b,scrut') : bs `zip` trimConArgs con args)
1239 ; scExpr alt_env' rhs }
1240
1241 sc_vanilla scrut_usg scrut' -- Normal case
1242 = do { let (alt_env,b') = extendBndrWith RecArg env b
1243 -- Record RecArg for the components
1244
1245 ; (alt_usgs, alt_occs, alts')
1246 <- mapAndUnzip3M (sc_alt alt_env scrut' b') alts
1247
1248 ; let scrut_occ = foldr combineOcc NoOcc alt_occs
1249 scrut_usg' = setScrutOcc env scrut_usg scrut' scrut_occ
1250 -- The combined usage of the scrutinee is given
1251 -- by scrut_occ, which is passed to scScrut, which
1252 -- in turn treats a bare-variable scrutinee specially
1253
1254 ; return (foldr combineUsage scrut_usg' alt_usgs,
1255 Case scrut' b' (scSubstTy env ty) alts') }
1256
1257 sc_alt env scrut' b' (con,bs,rhs)
1258 = do { let (env1, bs1) = extendBndrsWith RecArg env bs
1259 (env2, bs2) = extendCaseBndrs env1 scrut' b' con bs1
1260 ; (usg, rhs') <- scExpr env2 rhs
1261 ; let (usg', b_occ:arg_occs) = lookupOccs usg (b':bs2)
1262 scrut_occ = case con of
1263 DataAlt dc -> ScrutOcc (unitUFM dc arg_occs)
1264 _ -> ScrutOcc emptyUFM
1265 ; return (usg', b_occ `combineOcc` scrut_occ, (con, bs2, rhs')) }
1266
1267 scExpr' env (Let (NonRec bndr rhs) body)
1268 | isTyVar bndr -- Type-lets may be created by doBeta
1269 = scExpr' (extendScSubst env bndr rhs) body
1270
1271 | otherwise
1272 = do { let (body_env, bndr') = extendBndr env bndr
1273 ; rhs_info <- scRecRhs env (bndr',rhs)
1274
1275 ; let body_env2 = extendHowBound body_env [bndr'] RecFun
1276 -- Note [Local let bindings]
1277 rhs' = ri_new_rhs rhs_info
1278 body_env3 = extendValEnv body_env2 bndr' (isValue (sc_vals env) rhs')
1279
1280 ; (body_usg, body') <- scExpr body_env3 body
1281
1282 -- NB: For non-recursive bindings we inherit sc_force flag from
1283 -- the parent function (see Note [Forcing specialisation])
1284 ; (spec_usg, specs) <- specNonRec env body_usg rhs_info
1285
1286 ; return (body_usg { scu_calls = scu_calls body_usg `delVarEnv` bndr' }
1287 `combineUsage` spec_usg, -- Note [spec_usg includes rhs_usg]
1288 mkLets [NonRec b r | (b,r) <- ruleInfoBinds rhs_info specs] body')
1289 }
1290
1291
1292 -- A *local* recursive group: see Note [Local recursive groups]
1293 scExpr' env (Let (Rec prs) body)
1294 = do { let (bndrs,rhss) = unzip prs
1295 (rhs_env1,bndrs') = extendRecBndrs env bndrs
1296 rhs_env2 = extendHowBound rhs_env1 bndrs' RecFun
1297 force_spec = any (forceSpecBndr env) bndrs'
1298 -- Note [Forcing specialisation]
1299
1300 ; rhs_infos <- mapM (scRecRhs rhs_env2) (bndrs' `zip` rhss)
1301 ; (body_usg, body') <- scExpr rhs_env2 body
1302
1303 -- NB: start specLoop from body_usg
1304 ; (spec_usg, specs) <- specRec NotTopLevel (scForce rhs_env2 force_spec)
1305 body_usg rhs_infos
1306 -- Do not unconditionally generate specialisations from rhs_usgs
1307 -- Instead use them only if we find an unspecialised call
1308 -- See Note [Local recursive groups]
1309
1310 ; let all_usg = spec_usg `combineUsage` body_usg -- Note [spec_usg includes rhs_usg]
1311 bind' = Rec (concat (zipWith ruleInfoBinds rhs_infos specs))
1312
1313 ; return (all_usg { scu_calls = scu_calls all_usg `delVarEnvList` bndrs' },
1314 Let bind' body') }
1315
1316 {-
1317 Note [Local let bindings]
1318 ~~~~~~~~~~~~~~~~~~~~~~~~~
1319 It is not uncommon to find this
1320
1321 let $j = \x. <blah> in ...$j True...$j True...
1322
1323 Here $j is an arbitrary let-bound function, but it often comes up for
1324 join points. We might like to specialise $j for its call patterns.
1325 Notice the difference from a letrec, where we look for call patterns
1326 in the *RHS* of the function. Here we look for call patterns in the
1327 *body* of the let.
1328
1329 At one point I predicated this on the RHS mentioning the outer
1330 recursive function, but that's not essential and might even be
1331 harmful. I'm not sure.
1332 -}
1333
1334 scApp :: ScEnv -> (InExpr, [InExpr]) -> UniqSM (ScUsage, CoreExpr)
1335
1336 scApp env (Var fn, args) -- Function is a variable
1337 = ASSERT( not (null args) )
1338 do { args_w_usgs <- mapM (scExpr env) args
1339 ; let (arg_usgs, args') = unzip args_w_usgs
1340 arg_usg = combineUsages arg_usgs
1341 ; case scSubstId env fn of
1342 fn'@(Lam {}) -> scExpr (zapScSubst env) (doBeta fn' args')
1343 -- Do beta-reduction and try again
1344
1345 Var fn' -> return (arg_usg `combineUsage` mkVarUsage env fn' args',
1346 mkApps (Var fn') args')
1347
1348 other_fn' -> return (arg_usg, mkApps other_fn' args') }
1349 -- NB: doing this ignores any usage info from the substituted
1350 -- function, but I don't think that matters. If it does
1351 -- we can fix it.
1352 where
1353 doBeta :: OutExpr -> [OutExpr] -> OutExpr
1354 -- ToDo: adjust for System IF
1355 doBeta (Lam bndr body) (arg : args) = Let (NonRec bndr arg) (doBeta body args)
1356 doBeta fn args = mkApps fn args
1357
1358 -- The function is almost always a variable, but not always.
1359 -- In particular, if this pass follows float-in,
1360 -- which it may, we can get
1361 -- (let f = ...f... in f) arg1 arg2
1362 scApp env (other_fn, args)
1363 = do { (fn_usg, fn') <- scExpr env other_fn
1364 ; (arg_usgs, args') <- mapAndUnzipM (scExpr env) args
1365 ; return (combineUsages arg_usgs `combineUsage` fn_usg, mkApps fn' args') }
1366
1367 ----------------------
1368 mkVarUsage :: ScEnv -> Id -> [CoreExpr] -> ScUsage
1369 mkVarUsage env fn args
1370 = case lookupHowBound env fn of
1371 Just RecFun -> SCU { scu_calls = unitVarEnv fn [Call fn args (sc_vals env)]
1372 , scu_occs = emptyVarEnv }
1373 Just RecArg -> SCU { scu_calls = emptyVarEnv
1374 , scu_occs = unitVarEnv fn arg_occ }
1375 Nothing -> nullUsage
1376 where
1377 -- I rather think we could use UnkOcc all the time
1378 arg_occ | null args = UnkOcc
1379 | otherwise = evalScrutOcc
1380
1381 ----------------------
1382 scTopBindEnv :: ScEnv -> CoreBind -> UniqSM (ScEnv, CoreBind)
1383 scTopBindEnv env (Rec prs)
1384 = do { let (rhs_env1,bndrs') = extendRecBndrs env bndrs
1385 rhs_env2 = extendHowBound rhs_env1 bndrs RecFun
1386
1387 prs' = zip bndrs' rhss
1388 ; return (rhs_env2, Rec prs') }
1389 where
1390 (bndrs,rhss) = unzip prs
1391
1392 scTopBindEnv env (NonRec bndr rhs)
1393 = do { let (env1, bndr') = extendBndr env bndr
1394 env2 = extendValEnv env1 bndr' (isValue (sc_vals env) rhs)
1395 ; return (env2, NonRec bndr' rhs) }
1396
1397 ----------------------
1398 scTopBind :: ScEnv -> ScUsage -> CoreBind -> UniqSM (ScUsage, CoreBind)
1399
1400 {-
1401 scTopBind _ usage _
1402 | pprTrace "scTopBind_usage" (ppr (scu_calls usage)) False
1403 = error "false"
1404 -}
1405
1406 scTopBind env body_usage (Rec prs)
1407 | Just threshold <- sc_size env
1408 , not force_spec
1409 , not (all (couldBeSmallEnoughToInline (sc_dflags env) threshold) rhss)
1410 -- No specialisation
1411 = -- pprTrace "scTopBind: nospec" (ppr bndrs) $
1412 do { (rhs_usgs, rhss') <- mapAndUnzipM (scExpr env) rhss
1413 ; return (body_usage `combineUsage` combineUsages rhs_usgs, Rec (bndrs `zip` rhss')) }
1414
1415 | otherwise -- Do specialisation
1416 = do { rhs_infos <- mapM (scRecRhs env) prs
1417
1418 ; (spec_usage, specs) <- specRec TopLevel (scForce env force_spec)
1419 body_usage rhs_infos
1420
1421 ; return (body_usage `combineUsage` spec_usage,
1422 Rec (concat (zipWith ruleInfoBinds rhs_infos specs))) }
1423 where
1424 (bndrs,rhss) = unzip prs
1425 force_spec = any (forceSpecBndr env) bndrs
1426 -- Note [Forcing specialisation]
1427
1428 scTopBind env usage (NonRec bndr rhs) -- Oddly, we don't seem to specialise top-level non-rec functions
1429 = do { (rhs_usg', rhs') <- scExpr env rhs
1430 ; return (usage `combineUsage` rhs_usg', NonRec bndr rhs') }
1431
1432 ----------------------
1433 scRecRhs :: ScEnv -> (OutId, InExpr) -> UniqSM RhsInfo
1434 scRecRhs env (bndr,rhs)
1435 = do { let (arg_bndrs,body) = collectBinders rhs
1436 (body_env, arg_bndrs') = extendBndrsWith RecArg env arg_bndrs
1437 ; (body_usg, body') <- scExpr body_env body
1438 ; let (rhs_usg, arg_occs) = lookupOccs body_usg arg_bndrs'
1439 ; return (RI { ri_rhs_usg = rhs_usg
1440 , ri_fn = bndr, ri_new_rhs = mkLams arg_bndrs' body'
1441 , ri_lam_bndrs = arg_bndrs, ri_lam_body = body
1442 , ri_arg_occs = arg_occs }) }
1443 -- The arg_occs says how the visible,
1444 -- lambda-bound binders of the RHS are used
1445 -- (including the TyVar binders)
1446 -- Two pats are the same if they match both ways
1447
1448 ----------------------
1449 ruleInfoBinds :: RhsInfo -> SpecInfo -> [(Id,CoreExpr)]
1450 ruleInfoBinds (RI { ri_fn = fn, ri_new_rhs = new_rhs })
1451 (SI { si_specs = specs })
1452 = [(id,rhs) | OS { os_id = id, os_rhs = rhs } <- specs] ++
1453 -- First the specialised bindings
1454
1455 [(fn `addIdSpecialisations` rules, new_rhs)]
1456 -- And now the original binding
1457 where
1458 rules = [r | OS { os_rule = r } <- specs]
1459
1460 {-
1461 ************************************************************************
1462 * *
1463 The specialiser itself
1464 * *
1465 ************************************************************************
1466 -}
1467
1468 data RhsInfo
1469 = RI { ri_fn :: OutId -- The binder
1470 , ri_new_rhs :: OutExpr -- The specialised RHS (in current envt)
1471 , ri_rhs_usg :: ScUsage -- Usage info from specialising RHS
1472
1473 , ri_lam_bndrs :: [InVar] -- The *original* RHS (\xs.body)
1474 , ri_lam_body :: InExpr -- Note [Specialise original body]
1475 , ri_arg_occs :: [ArgOcc] -- Info on how the xs occur in body
1476 }
1477
1478 data SpecInfo -- Info about specialisations for a particular Id
1479 = SI { si_specs :: [OneSpec] -- The specialisations we have generated
1480
1481 , si_n_specs :: Int -- Length of si_specs; used for numbering them
1482
1483 , si_mb_unspec :: Maybe ScUsage -- Just cs => we have not yet used calls in the
1484 } -- from calls in the *original* RHS as
1485 -- seeds for new specialisations;
1486 -- if you decide to do so, here is the
1487 -- RHS usage (which has not yet been
1488 -- unleashed)
1489 -- Nothing => we have
1490 -- See Note [Local recursive groups]
1491 -- See Note [spec_usg includes rhs_usg]
1492
1493 -- One specialisation: Rule plus definition
1494 data OneSpec =
1495 OS { os_pat :: CallPat -- Call pattern that generated this specialisation
1496 , os_rule :: CoreRule -- Rule connecting original id with the specialisation
1497 , os_id :: OutId -- Spec id
1498 , os_rhs :: OutExpr } -- Spec rhs
1499
1500 noSpecInfo :: SpecInfo
1501 noSpecInfo = SI { si_specs = [], si_n_specs = 0, si_mb_unspec = Nothing }
1502
1503 ----------------------
1504 specNonRec :: ScEnv
1505 -> ScUsage -- Body usage
1506 -> RhsInfo -- Structure info usage info for un-specialised RHS
1507 -> UniqSM (ScUsage, SpecInfo) -- Usage from RHSs (specialised and not)
1508 -- plus details of specialisations
1509
1510 specNonRec env body_usg rhs_info
1511 = specialise env (scu_calls body_usg) rhs_info
1512 (noSpecInfo { si_mb_unspec = Just (ri_rhs_usg rhs_info) })
1513
1514 ----------------------
1515 specRec :: TopLevelFlag -> ScEnv
1516 -> ScUsage -- Body usage
1517 -> [RhsInfo] -- Structure info and usage info for un-specialised RHSs
1518 -> UniqSM (ScUsage, [SpecInfo]) -- Usage from all RHSs (specialised and not)
1519 -- plus details of specialisations
1520
1521 specRec top_lvl env body_usg rhs_infos
1522 = go 1 seed_calls nullUsage init_spec_infos
1523 where
1524 (seed_calls, init_spec_infos) -- Note [Seeding top-level recursive groups]
1525 | isTopLevel top_lvl
1526 , any (isExportedId . ri_fn) rhs_infos -- Seed from body and RHSs
1527 = (all_calls, [noSpecInfo | _ <- rhs_infos])
1528 | otherwise -- Seed from body only
1529 = (calls_in_body, [noSpecInfo { si_mb_unspec = Just (ri_rhs_usg ri) }
1530 | ri <- rhs_infos])
1531
1532 calls_in_body = scu_calls body_usg
1533 calls_in_rhss = foldr (combineCalls . scu_calls . ri_rhs_usg) emptyVarEnv rhs_infos
1534 all_calls = calls_in_rhss `combineCalls` calls_in_body
1535
1536 -- Loop, specialising, until you get no new specialisations
1537 go :: Int -- Which iteration of the "until no new specialisations"
1538 -- loop we are on; first iteration is 1
1539 -> CallEnv -- Seed calls
1540 -- Two accumulating parameters:
1541 -> ScUsage -- Usage from earlier specialisations
1542 -> [SpecInfo] -- Details of specialisations so far
1543 -> UniqSM (ScUsage, [SpecInfo])
1544 go n_iter seed_calls usg_so_far spec_infos
1545 | isEmptyVarEnv seed_calls
1546 = -- pprTrace "specRec1" (vcat [ ppr (map ri_fn rhs_infos)
1547 -- , ppr seed_calls
1548 -- , ppr body_usg ]) $
1549 return (usg_so_far, spec_infos)
1550
1551 -- Limit recursive specialisation
1552 -- See Note [Limit recursive specialisation]
1553 | n_iter > sc_recursive env -- Too many iterations of the 'go' loop
1554 , sc_force env || isNothing (sc_count env)
1555 -- If both of these are false, the sc_count
1556 -- threshold will prevent non-termination
1557 , any ((> the_limit) . si_n_specs) spec_infos
1558 = -- pprTrace "specRec2" (ppr (map (map os_pat . si_specs) spec_infos)) $
1559 return (usg_so_far, spec_infos)
1560
1561 | otherwise
1562 = -- pprTrace "specRec3" (vcat [ text "bndrs" <+> ppr (map ri_fn rhs_infos)
1563 -- , text "iteration" <+> int n_iter
1564 -- , text "spec_infos" <+> ppr (map (map os_pat . si_specs) spec_infos)
1565 -- ]) $
1566 do { specs_w_usg <- zipWithM (specialise env seed_calls) rhs_infos spec_infos
1567 ; let (extra_usg_s, new_spec_infos) = unzip specs_w_usg
1568 extra_usg = combineUsages extra_usg_s
1569 all_usg = usg_so_far `combineUsage` extra_usg
1570 ; go (n_iter + 1) (scu_calls extra_usg) all_usg new_spec_infos }
1571
1572 -- See Note [Limit recursive specialisation]
1573 the_limit = case sc_count env of
1574 Nothing -> 10 -- Ugh!
1575 Just max -> max
1576
1577
1578 ----------------------
1579 specialise
1580 :: ScEnv
1581 -> CallEnv -- Info on newly-discovered calls to this function
1582 -> RhsInfo
1583 -> SpecInfo -- Original RHS plus patterns dealt with
1584 -> UniqSM (ScUsage, SpecInfo) -- New specialised versions and their usage
1585
1586 -- See Note [spec_usg includes rhs_usg]
1587
1588 -- Note: this only generates *specialised* bindings
1589 -- The original binding is added by ruleInfoBinds
1590 --
1591 -- Note: the rhs here is the optimised version of the original rhs
1592 -- So when we make a specialised copy of the RHS, we're starting
1593 -- from an RHS whose nested functions have been optimised already.
1594
1595 specialise env bind_calls (RI { ri_fn = fn, ri_lam_bndrs = arg_bndrs
1596 , ri_lam_body = body, ri_arg_occs = arg_occs })
1597 spec_info@(SI { si_specs = specs, si_n_specs = spec_count
1598 , si_mb_unspec = mb_unspec })
1599 | isBottomingId fn -- Note [Do not specialise diverging functions]
1600 -- and do not generate specialisation seeds from its RHS
1601 = -- pprTrace "specialise bot" (ppr fn) $
1602 return (nullUsage, spec_info)
1603
1604 | isNeverActive (idInlineActivation fn) -- See Note [Transfer activation]
1605 || null arg_bndrs -- Only specialise functions
1606 = -- pprTrace "specialise inactive" (ppr fn) $
1607 case mb_unspec of -- Behave as if there was a single, boring call
1608 Just rhs_usg -> return (rhs_usg, spec_info { si_mb_unspec = Nothing })
1609 -- See Note [spec_usg includes rhs_usg]
1610 Nothing -> return (nullUsage, spec_info)
1611
1612 | Just all_calls <- lookupVarEnv bind_calls fn
1613 = -- pprTrace "specialise entry {" (ppr fn <+> ppr all_calls) $
1614 do { (boring_call, new_pats) <- callsToNewPats env fn spec_info arg_occs all_calls
1615
1616 ; let n_pats = length new_pats
1617 -- ; if (not (null new_pats) || isJust mb_unspec) then
1618 -- pprTrace "specialise" (vcat [ ppr fn <+> text "with" <+> int n_pats <+> text "good patterns"
1619 -- , text "mb_unspec" <+> ppr (isJust mb_unspec)
1620 -- , text "arg_occs" <+> ppr arg_occs
1621 -- , text "good pats" <+> ppr new_pats]) $
1622 -- return ()
1623 -- else return ()
1624
1625 ; let spec_env = decreaseSpecCount env n_pats
1626 ; (spec_usgs, new_specs) <- mapAndUnzipM (spec_one spec_env fn arg_bndrs body)
1627 (new_pats `zip` [spec_count..])
1628 -- See Note [Specialise original body]
1629
1630 ; let spec_usg = combineUsages spec_usgs
1631
1632 -- If there were any boring calls among the seeds (= all_calls), then those
1633 -- calls will call the un-specialised function. So we should use the seeds
1634 -- from the _unspecialised_ function's RHS, which are in mb_unspec, by returning
1635 -- then in new_usg.
1636 (new_usg, mb_unspec')
1637 = case mb_unspec of
1638 Just rhs_usg | boring_call -> (spec_usg `combineUsage` rhs_usg, Nothing)
1639 _ -> (spec_usg, mb_unspec)
1640
1641 -- ; pprTrace "specialise return }"
1642 -- (vcat [ ppr fn
1643 -- , text "boring_call:" <+> ppr boring_call
1644 -- , text "new calls:" <+> ppr (scu_calls new_usg)]) $
1645 -- return ()
1646
1647 ; return (new_usg, SI { si_specs = new_specs ++ specs
1648 , si_n_specs = spec_count + n_pats
1649 , si_mb_unspec = mb_unspec' }) }
1650
1651 | otherwise -- No new seeds, so return nullUsage
1652 = return (nullUsage, spec_info)
1653
1654
1655
1656
1657 ---------------------
1658 spec_one :: ScEnv
1659 -> OutId -- Function
1660 -> [InVar] -- Lambda-binders of RHS; should match patterns
1661 -> InExpr -- Body of the original function
1662 -> (CallPat, Int)
1663 -> UniqSM (ScUsage, OneSpec) -- Rule and binding
1664
1665 -- spec_one creates a specialised copy of the function, together
1666 -- with a rule for using it. I'm very proud of how short this
1667 -- function is, considering what it does :-).
1668
1669 {-
1670 Example
1671
1672 In-scope: a, x::a
1673 f = /\b \y::[(a,b)] -> ....f (b,c) ((:) (a,(b,c)) (x,v) (h w))...
1674 [c::*, v::(b,c) are presumably bound by the (...) part]
1675 ==>
1676 f_spec = /\ b c \ v::(b,c) hw::[(a,(b,c))] ->
1677 (...entire body of f...) [b -> (b,c),
1678 y -> ((:) (a,(b,c)) (x,v) hw)]
1679
1680 RULE: forall b::* c::*, -- Note, *not* forall a, x
1681 v::(b,c),
1682 hw::[(a,(b,c))] .
1683
1684 f (b,c) ((:) (a,(b,c)) (x,v) hw) = f_spec b c v hw
1685 -}
1686
1687 spec_one env fn arg_bndrs body (call_pat@(qvars, pats), rule_number)
1688 = do { spec_uniq <- getUniqueM
1689 ; let spec_env = extendScSubstList (extendScInScope env qvars)
1690 (arg_bndrs `zip` pats)
1691 fn_name = idName fn
1692 fn_loc = nameSrcSpan fn_name
1693 fn_occ = nameOccName fn_name
1694 spec_occ = mkSpecOcc fn_occ
1695 -- We use fn_occ rather than fn in the rule_name string
1696 -- as we don't want the uniq to end up in the rule, and
1697 -- hence in the ABI, as that can cause spurious ABI
1698 -- changes (#4012).
1699 rule_name = mkFastString ("SC:" ++ occNameString fn_occ ++ show rule_number)
1700 spec_name = mkInternalName spec_uniq spec_occ fn_loc
1701 -- ; pprTrace "{spec_one" (ppr (sc_count env) <+> ppr fn
1702 -- <+> ppr pats <+> text "-->" <+> ppr spec_name) $
1703 -- return ()
1704
1705 -- Specialise the body
1706 ; (spec_usg, spec_body) <- scExpr spec_env body
1707
1708 -- ; pprTrace "done spec_one}" (ppr fn) $
1709 -- return ()
1710
1711 -- And build the results
1712 ; let (spec_lam_args, spec_call_args) = mkWorkerArgs (sc_dflags env)
1713 qvars body_ty
1714 -- Usual w/w hack to avoid generating
1715 -- a spec_rhs of unlifted type and no args
1716
1717 spec_lam_args_str = handOutStrictnessInformation (fst (splitStrictSig spec_str)) spec_lam_args
1718 -- Annotate the variables with the strictness information from
1719 -- the function (see Note [Strictness information in worker binders])
1720
1721 spec_join_arity | isJoinId fn = Just (length spec_lam_args)
1722 | otherwise = Nothing
1723 spec_id = mkLocalIdOrCoVar spec_name
1724 (mkLamTypes spec_lam_args body_ty)
1725 -- See Note [Transfer strictness]
1726 `setIdStrictness` spec_str
1727 `setIdArity` count isId spec_lam_args
1728 `asJoinId_maybe` spec_join_arity
1729 spec_str = calcSpecStrictness fn spec_lam_args pats
1730
1731
1732 -- Conditionally use result of new worker-wrapper transform
1733 spec_rhs = mkLams spec_lam_args_str spec_body
1734 body_ty = exprType spec_body
1735 rule_rhs = mkVarApps (Var spec_id) spec_call_args
1736 inline_act = idInlineActivation fn
1737 this_mod = sc_module spec_env
1738 rule = mkRule this_mod True {- Auto -} True {- Local -}
1739 rule_name inline_act fn_name qvars pats rule_rhs
1740 -- See Note [Transfer activation]
1741 ; return (spec_usg, OS { os_pat = call_pat, os_rule = rule
1742 , os_id = spec_id
1743 , os_rhs = spec_rhs }) }
1744
1745
1746 -- See Note [Strictness information in worker binders]
1747 handOutStrictnessInformation :: [Demand] -> [Var] -> [Var]
1748 handOutStrictnessInformation = go
1749 where
1750 go _ [] = []
1751 go [] vs = vs
1752 go (d:dmds) (v:vs) | isId v = setIdDemandInfo v d : go dmds vs
1753 go dmds (v:vs) = v : go dmds vs
1754
1755 calcSpecStrictness :: Id -- The original function
1756 -> [Var] -> [CoreExpr] -- Call pattern
1757 -> StrictSig -- Strictness of specialised thing
1758 -- See Note [Transfer strictness]
1759 calcSpecStrictness fn qvars pats
1760 = mkClosedStrictSig spec_dmds topRes
1761 where
1762 spec_dmds = [ lookupVarEnv dmd_env qv `orElse` topDmd | qv <- qvars, isId qv ]
1763 StrictSig (DmdType _ dmds _) = idStrictness fn
1764
1765 dmd_env = go emptyVarEnv dmds pats
1766
1767 go :: DmdEnv -> [Demand] -> [CoreExpr] -> DmdEnv
1768 go env ds (Type {} : pats) = go env ds pats
1769 go env ds (Coercion {} : pats) = go env ds pats
1770 go env (d:ds) (pat : pats) = go (go_one env d pat) ds pats
1771 go env _ _ = env
1772
1773 go_one :: DmdEnv -> Demand -> CoreExpr -> DmdEnv
1774 go_one env d (Var v) = extendVarEnv_C bothDmd env v d
1775 go_one env d e
1776 | Just ds <- splitProdDmd_maybe d -- NB: d does not have to be strict
1777 , (Var _, args) <- collectArgs e = go env ds args
1778 go_one env _ _ = env
1779
1780 {-
1781 Note [spec_usg includes rhs_usg]
1782 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1783 In calls to 'specialise', the returned ScUsage must include the rhs_usg in
1784 the passed-in SpecInfo, unless there are no calls at all to the function.
1785
1786 The caller can, indeed must, assume this. He should not combine in rhs_usg
1787 himself, or he'll get rhs_usg twice -- and that can lead to an exponential
1788 blowup of duplicates in the CallEnv. This is what gave rise to the massive
1789 performance loss in #8852.
1790
1791 Note [Specialise original body]
1792 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1793 The RhsInfo for a binding keeps the *original* body of the binding. We
1794 must specialise that, *not* the result of applying specExpr to the RHS
1795 (which is also kept in RhsInfo). Otherwise we end up specialising a
1796 specialised RHS, and that can lead directly to exponential behaviour.
1797
1798 Note [Transfer activation]
1799 ~~~~~~~~~~~~~~~~~~~~~~~~~~
1800 This note is for SpecConstr, but exactly the same thing
1801 happens in the overloading specialiser; see
1802 Note [Auto-specialisation and RULES] in Specialise.
1803
1804 In which phase should the specialise-constructor rules be active?
1805 Originally I made them always-active, but Manuel found that this
1806 defeated some clever user-written rules. Then I made them active only
1807 in Phase 0; after all, currently, the specConstr transformation is
1808 only run after the simplifier has reached Phase 0, but that meant
1809 that specialisations didn't fire inside wrappers; see test
1810 simplCore/should_compile/spec-inline.
1811
1812 So now I just use the inline-activation of the parent Id, as the
1813 activation for the specialisation RULE, just like the main specialiser;
1814
1815 This in turn means there is no point in specialising NOINLINE things,
1816 so we test for that.
1817
1818 Note [Transfer strictness]
1819 ~~~~~~~~~~~~~~~~~~~~~~~~~~
1820 We must transfer strictness information from the original function to
1821 the specialised one. Suppose, for example
1822
1823 f has strictness SS
1824 and a RULE f (a:as) b = f_spec a as b
1825
1826 Now we want f_spec to have strictness LLS, otherwise we'll use call-by-need
1827 when calling f_spec instead of call-by-value. And that can result in
1828 unbounded worsening in space (cf the classic foldl vs foldl')
1829
1830 See #3437 for a good example.
1831
1832 The function calcSpecStrictness performs the calculation.
1833
1834 Note [Strictness information in worker binders]
1835 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1836
1837 After having calculated the strictness annotation for the worker (see Note
1838 [Transfer strictness] above), we also want to have this information attached to
1839 the worker’s arguments, for the benefit of later passes. The function
1840 handOutStrictnessInformation decomposes the strictness annotation calculated by
1841 calcSpecStrictness and attaches them to the variables.
1842
1843 ************************************************************************
1844 * *
1845 \subsection{Argument analysis}
1846 * *
1847 ************************************************************************
1848
1849 This code deals with analysing call-site arguments to see whether
1850 they are constructor applications.
1851
1852 Note [Free type variables of the qvar types]
1853 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1854 In a call (f @a x True), that we want to specialise, what variables should
1855 we quantify over. Clearly over 'a' and 'x', but what about any type variables
1856 free in x's type? In fact we don't need to worry about them because (f @a)
1857 can only be a well-typed application if its type is compatible with x, so any
1858 variables free in x's type must be free in (f @a), and hence either be gathered
1859 via 'a' itself, or be in scope at f's defn. Hence we just take
1860 (exprsFreeVars pats).
1861
1862 BUT phantom type synonyms can mess this reasoning up,
1863 eg x::T b with type T b = Int
1864 So we apply expandTypeSynonyms to the bound Ids.
1865 See # 5458. Yuk.
1866
1867 Note [SpecConstr call patterns]
1868 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1869 A "call patterns" that we collect is going to become the LHS of a RULE.
1870 It's important that it doesn't have
1871 e |> Refl
1872 or
1873 e |> g1 |> g2
1874 because both of these will be optimised by Simplify.simplRule. In the
1875 former case such optimisation benign, because the rule will match more
1876 terms; but in the latter we may lose a binding of 'g1' or 'g2', and
1877 end up with a rule LHS that doesn't bind the template variables
1878 (#10602).
1879
1880 The simplifier eliminates such things, but SpecConstr itself constructs
1881 new terms by substituting. So the 'mkCast' in the Cast case of scExpr
1882 is very important!
1883
1884 Note [Choosing patterns]
1885 ~~~~~~~~~~~~~~~~~~~~~~~~
1886 If we get lots of patterns we may not want to make a specialisation
1887 for each of them (code bloat), so we choose as follows, implemented
1888 by trim_pats.
1889
1890 * The flag -fspec-constr-count-N sets the sc_count field
1891 of the ScEnv to (Just n). This limits the total number
1892 of specialisations for a given function to N.
1893
1894 * -fno-spec-constr-count sets the sc_count field to Nothing,
1895 which switches of the limit.
1896
1897 * The ghastly ForceSpecConstr trick also switches of the limit
1898 for a particular function
1899
1900 * Otherwise we sort the patterns to choose the most general
1901 ones first; more general => more widely applicable.
1902
1903 Note [SpecConstr and casts]
1904 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
1905 Consider (#14270) a call like
1906
1907 let f = e
1908 in ... f (K @(a |> co)) ...
1909
1910 where 'co' is a coercion variable not in scope at f's definition site.
1911 If we aren't caereful we'll get
1912
1913 let $sf a co = e (K @(a |> co))
1914 RULE "SC:f" forall a co. f (K @(a |> co)) = $sf a co
1915 f = e
1916 in ...
1917
1918 But alas, when we match the call we won't bind 'co', because type-matching
1919 (for good reasons) discards casts).
1920
1921 I don't know how to solve this, so for now I'm just discarding any
1922 call patterns that
1923 * Mentions a coercion variable in a type argument
1924 * That is not in scope at the binding of the function
1925
1926 I think this is very rare.
1927
1928 It is important (e.g. #14936) that this /only/ applies to
1929 coercions mentioned in casts. We don't want to be discombobulated
1930 by casts in terms! For example, consider
1931 f ((e1,e2) |> sym co)
1932 where, say,
1933 f :: Foo -> blah
1934 co :: Foo ~R (Int,Int)
1935
1936 Here we definitely do want to specialise for that pair! We do not
1937 match on the structre of the coercion; instead we just match on a
1938 coercion variable, so the RULE looks like
1939
1940 forall (x::Int, y::Int, co :: (Int,Int) ~R Foo)
1941 f ((x,y) |> co) = $sf x y co
1942
1943 Often the body of f looks like
1944 f arg = ...(case arg |> co' of
1945 (x,y) -> blah)...
1946
1947 so that the specialised f will turn into
1948 $sf x y co = let arg = (x,y) |> co
1949 in ...(case arg>| co' of
1950 (x,y) -> blah)....
1951
1952 which will simplify to not use 'co' at all. But we can't guarantee
1953 that co will end up unused, so we still pass it. Absence analysis
1954 may remove it later.
1955
1956 Note that this /also/ discards the call pattern if we have a cast in a
1957 /term/, although in fact Rules.match does make a very flaky and
1958 fragile attempt to match coercions. e.g. a call like
1959 f (Maybe Age) (Nothing |> co) blah
1960 where co :: Maybe Int ~ Maybe Age
1961 will be discarded. It's extremely fragile to match on the form of a
1962 coercion, so I think it's better just not to try. A more complicated
1963 alternative would be to discard calls that mention coercion variables
1964 only in kind-casts, but I'm doing the simple thing for now.
1965 -}
1966
1967 type CallPat = ([Var], [CoreExpr]) -- Quantified variables and arguments
1968 -- See Note [SpecConstr call patterns]
1969
1970 callsToNewPats :: ScEnv -> Id
1971 -> SpecInfo
1972 -> [ArgOcc] -> [Call]
1973 -> UniqSM (Bool, [CallPat])
1974 -- Result has no duplicate patterns,
1975 -- nor ones mentioned in done_pats
1976 -- Bool indicates that there was at least one boring pattern
1977 callsToNewPats env fn spec_info@(SI { si_specs = done_specs }) bndr_occs calls
1978 = do { mb_pats <- mapM (callToPats env bndr_occs) calls
1979
1980 ; let have_boring_call = any isNothing mb_pats
1981
1982 good_pats :: [CallPat]
1983 good_pats = catMaybes mb_pats
1984
1985 -- Remove patterns we have already done
1986 new_pats = filterOut is_done good_pats
1987 is_done p = any (samePat p . os_pat) done_specs
1988
1989 -- Remove duplicates
1990 non_dups = nubBy samePat new_pats
1991
1992 -- Remove ones that have too many worker variables
1993 small_pats = filterOut too_big non_dups
1994 too_big (vars,_) = not (isWorkerSmallEnough (sc_dflags env) vars)
1995 -- We are about to construct w/w pair in 'spec_one'.
1996 -- Omit specialisation leading to high arity workers.
1997 -- See Note [Limit w/w arity] in WwLib
1998
1999 -- Discard specialisations if there are too many of them
2000 trimmed_pats = trim_pats env fn spec_info small_pats
2001
2002 -- ; pprTrace "callsToPats" (vcat [ text "calls to" <+> ppr fn <> colon <+> ppr calls
2003 -- , text "done_specs:" <+> ppr (map os_pat done_specs)
2004 -- , text "good_pats:" <+> ppr good_pats ]) $
2005 -- return ()
2006
2007 ; return (have_boring_call, trimmed_pats) }
2008
2009
2010 trim_pats :: ScEnv -> Id -> SpecInfo -> [CallPat] -> [CallPat]
2011 -- See Note [Choosing patterns]
2012 trim_pats env fn (SI { si_n_specs = done_spec_count }) pats
2013 | sc_force env
2014 || isNothing mb_scc
2015 || n_remaining >= n_pats
2016 = -- pprTrace "trim_pats: no-trim" (ppr (sc_force env) $$ ppr mb_scc $$ ppr n_remaining $$ ppr n_pats)
2017 pats -- No need to trim
2018
2019 | otherwise
2020 = emit_trace $ -- Need to trim, so keep the best ones
2021 take n_remaining sorted_pats
2022
2023 where
2024 n_pats = length pats
2025 spec_count' = n_pats + done_spec_count
2026 n_remaining = max_specs - done_spec_count
2027 mb_scc = sc_count env
2028 Just max_specs = mb_scc
2029
2030 sorted_pats = map fst $
2031 sortBy (comparing snd) $
2032 [(pat, pat_cons pat) | pat <- pats]
2033 -- Sort in order of increasing number of constructors
2034 -- (i.e. decreasing generality) and pick the initial
2035 -- segment of this list
2036
2037 pat_cons :: CallPat -> Int
2038 -- How many data constructors of literals are in
2039 -- the pattern. More data-cons => less general
2040 pat_cons (qs, ps) = foldr ((+) . n_cons) 0 ps
2041 where
2042 q_set = mkVarSet qs
2043 n_cons (Var v) | v `elemVarSet` q_set = 0
2044 | otherwise = 1
2045 n_cons (Cast e _) = n_cons e
2046 n_cons (App e1 e2) = n_cons e1 + n_cons e2
2047 n_cons (Lit {}) = 1
2048 n_cons _ = 0
2049
2050 emit_trace result
2051 | debugIsOn || hasPprDebug (sc_dflags env)
2052 -- Suppress this scary message for ordinary users! #5125
2053 = pprTrace "SpecConstr" msg result
2054 | otherwise
2055 = result
2056 msg = vcat [ sep [ text "Function" <+> quotes (ppr fn)
2057 , nest 2 (text "has" <+>
2058 speakNOf spec_count' (text "call pattern") <> comma <+>
2059 text "but the limit is" <+> int max_specs) ]
2060 , text "Use -fspec-constr-count=n to set the bound"
2061 , text "done_spec_count =" <+> int done_spec_count
2062 , text "Keeping " <+> int n_remaining <> text ", out of" <+> int n_pats
2063 , text "Discarding:" <+> ppr (drop n_remaining sorted_pats) ]
2064
2065
2066 callToPats :: ScEnv -> [ArgOcc] -> Call -> UniqSM (Maybe CallPat)
2067 -- The [Var] is the variables to quantify over in the rule
2068 -- Type variables come first, since they may scope
2069 -- over the following term variables
2070 -- The [CoreExpr] are the argument patterns for the rule
2071 callToPats env bndr_occs call@(Call _ args con_env)
2072 | args `ltLength` bndr_occs -- Check saturated
2073 = return Nothing
2074 | otherwise
2075 = do { let in_scope = substInScope (sc_subst env)
2076 ; (interesting, pats) <- argsToPats env in_scope con_env args bndr_occs
2077 ; let pat_fvs = exprsFreeVarsList pats
2078 -- To get determinism we need the list of free variables in
2079 -- deterministic order. Otherwise we end up creating
2080 -- lambdas with different argument orders. See
2081 -- determinism/simplCore/should_compile/spec-inline-determ.hs
2082 -- for an example. For explanation of determinism
2083 -- considerations See Note [Unique Determinism] in Unique.
2084
2085 in_scope_vars = getInScopeVars in_scope
2086 is_in_scope v = v `elemVarSet` in_scope_vars
2087 qvars = filterOut is_in_scope pat_fvs
2088 -- Quantify over variables that are not in scope
2089 -- at the call site
2090 -- See Note [Free type variables of the qvar types]
2091 -- See Note [Shadowing] at the top
2092
2093 (ktvs, ids) = partition isTyVar qvars
2094 qvars' = scopedSort ktvs ++ map sanitise ids
2095 -- Order into kind variables, type variables, term variables
2096 -- The kind of a type variable may mention a kind variable
2097 -- and the type of a term variable may mention a type variable
2098
2099 sanitise id = id `setIdType` expandTypeSynonyms (idType id)
2100 -- See Note [Free type variables of the qvar types]
2101
2102 -- Bad coercion variables: see Note [SpecConstr and casts]
2103 bad_covars :: CoVarSet
2104 bad_covars = mapUnionVarSet get_bad_covars pats
2105 get_bad_covars :: CoreArg -> CoVarSet
2106 get_bad_covars (Type ty)
2107 = filterVarSet (\v -> isId v && not (is_in_scope v)) $
2108 tyCoVarsOfType ty
2109 get_bad_covars _
2110 = emptyVarSet
2111
2112 ; -- pprTrace "callToPats" (ppr args $$ ppr bndr_occs) $
2113 WARN( not (isEmptyVarSet bad_covars)
2114 , text "SpecConstr: bad covars:" <+> ppr bad_covars
2115 $$ ppr call )
2116 if interesting && isEmptyVarSet bad_covars
2117 then return (Just (qvars', pats))
2118 else return Nothing }
2119
2120 -- argToPat takes an actual argument, and returns an abstracted
2121 -- version, consisting of just the "constructor skeleton" of the
2122 -- argument, with non-constructor sub-expression replaced by new
2123 -- placeholder variables. For example:
2124 -- C a (D (f x) (g y)) ==> C p1 (D p2 p3)
2125
2126 argToPat :: ScEnv
2127 -> InScopeSet -- What's in scope at the fn defn site
2128 -> ValueEnv -- ValueEnv at the call site
2129 -> CoreArg -- A call arg (or component thereof)
2130 -> ArgOcc
2131 -> UniqSM (Bool, CoreArg)
2132
2133 -- Returns (interesting, pat),
2134 -- where pat is the pattern derived from the argument
2135 -- interesting=True if the pattern is non-trivial (not a variable or type)
2136 -- E.g. x:xs --> (True, x:xs)
2137 -- f xs --> (False, w) where w is a fresh wildcard
2138 -- (f xs, 'c') --> (True, (w, 'c')) where w is a fresh wildcard
2139 -- \x. x+y --> (True, \x. x+y)
2140 -- lvl7 --> (True, lvl7) if lvl7 is bound
2141 -- somewhere further out
2142
2143 argToPat _env _in_scope _val_env arg@(Type {}) _arg_occ
2144 = return (False, arg)
2145
2146 argToPat env in_scope val_env (Tick _ arg) arg_occ
2147 = argToPat env in_scope val_env arg arg_occ
2148 -- Note [Notes in call patterns]
2149 -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2150 -- Ignore Notes. In particular, we want to ignore any InlineMe notes
2151 -- Perhaps we should not ignore profiling notes, but I'm going to
2152 -- ride roughshod over them all for now.
2153 --- See Note [Notes in RULE matching] in Rules
2154
2155 argToPat env in_scope val_env (Let _ arg) arg_occ
2156 = argToPat env in_scope val_env arg arg_occ
2157 -- See Note [Matching lets] in Rule.hs
2158 -- Look through let expressions
2159 -- e.g. f (let v = rhs in (v,w))
2160 -- Here we can specialise for f (v,w)
2161 -- because the rule-matcher will look through the let.
2162
2163 {- Disabled; see Note [Matching cases] in Rule.hs
2164 argToPat env in_scope val_env (Case scrut _ _ [(_, _, rhs)]) arg_occ
2165 | exprOkForSpeculation scrut -- See Note [Matching cases] in Rule.hhs
2166 = argToPat env in_scope val_env rhs arg_occ
2167 -}
2168
2169 argToPat env in_scope val_env (Cast arg co) arg_occ
2170 | not (ignoreType env ty2)
2171 = do { (interesting, arg') <- argToPat env in_scope val_env arg arg_occ
2172 ; if not interesting then
2173 wildCardPat ty2
2174 else do
2175 { -- Make a wild-card pattern for the coercion
2176 uniq <- getUniqueM
2177 ; let co_name = mkSysTvName uniq (fsLit "sg")
2178 co_var = mkCoVar co_name (mkCoercionType Representational ty1 ty2)
2179 ; return (interesting, Cast arg' (mkCoVarCo co_var)) } }
2180 where
2181 Pair ty1 ty2 = coercionKind co
2182
2183
2184
2185 {- Disabling lambda specialisation for now
2186 It's fragile, and the spec_loop can be infinite
2187 argToPat in_scope val_env arg arg_occ
2188 | is_value_lam arg
2189 = return (True, arg)
2190 where
2191 is_value_lam (Lam v e) -- Spot a value lambda, even if
2192 | isId v = True -- it is inside a type lambda
2193 | otherwise = is_value_lam e
2194 is_value_lam other = False
2195 -}
2196
2197 -- Check for a constructor application
2198 -- NB: this *precedes* the Var case, so that we catch nullary constrs
2199 argToPat env in_scope val_env arg arg_occ
2200 | Just (ConVal (DataAlt dc) args) <- isValue val_env arg
2201 , not (ignoreDataCon env dc) -- See Note [NoSpecConstr]
2202 , Just arg_occs <- mb_scrut dc
2203 = do { let (ty_args, rest_args) = splitAtList (dataConUnivTyVars dc) args
2204 ; (_, args') <- argsToPats env in_scope val_env rest_args arg_occs
2205 ; return (True,
2206 mkConApp dc (ty_args ++ args')) }
2207 where
2208 mb_scrut dc = case arg_occ of
2209 ScrutOcc bs | Just occs <- lookupUFM bs dc
2210 -> Just (occs) -- See Note [Reboxing]
2211 _other | sc_force env || sc_keen env
2212 -> Just (repeat UnkOcc)
2213 | otherwise
2214 -> Nothing
2215
2216 -- Check if the argument is a variable that
2217 -- (a) is used in an interesting way in the function body
2218 -- (b) we know what its value is
2219 -- In that case it counts as "interesting"
2220 argToPat env in_scope val_env (Var v) arg_occ
2221 | sc_force env || case arg_occ of { UnkOcc -> False; _other -> True }, -- (a)
2222 is_value, -- (b)
2223 -- Ignoring sc_keen here to avoid gratuitously incurring Note [Reboxing]
2224 -- So sc_keen focused just on f (I# x), where we have freshly-allocated
2225 -- box that we can eliminate in the caller
2226 not (ignoreType env (varType v))
2227 = return (True, Var v)
2228 where
2229 is_value
2230 | isLocalId v = v `elemInScopeSet` in_scope
2231 && isJust (lookupVarEnv val_env v)
2232 -- Local variables have values in val_env
2233 | otherwise = isValueUnfolding (idUnfolding v)
2234 -- Imports have unfoldings
2235
2236 -- I'm really not sure what this comment means
2237 -- And by not wild-carding we tend to get forall'd
2238 -- variables that are in scope, which in turn can
2239 -- expose the weakness in let-matching
2240 -- See Note [Matching lets] in Rules
2241
2242 -- Check for a variable bound inside the function.
2243 -- Don't make a wild-card, because we may usefully share
2244 -- e.g. f a = let x = ... in f (x,x)
2245 -- NB: this case follows the lambda and con-app cases!!
2246 -- argToPat _in_scope _val_env (Var v) _arg_occ
2247 -- = return (False, Var v)
2248 -- SLPJ : disabling this to avoid proliferation of versions
2249 -- also works badly when thinking about seeding the loop
2250 -- from the body of the let
2251 -- f x y = letrec g z = ... in g (x,y)
2252 -- We don't want to specialise for that *particular* x,y
2253
2254 -- The default case: make a wild-card
2255 -- We use this for coercions too
2256 argToPat _env _in_scope _val_env arg _arg_occ
2257 = wildCardPat (exprType arg)
2258
2259 wildCardPat :: Type -> UniqSM (Bool, CoreArg)
2260 wildCardPat ty
2261 = do { uniq <- getUniqueM
2262 ; let id = mkSysLocalOrCoVar (fsLit "sc") uniq ty
2263 ; return (False, varToCoreExpr id) }
2264
2265 argsToPats :: ScEnv -> InScopeSet -> ValueEnv
2266 -> [CoreArg] -> [ArgOcc] -- Should be same length
2267 -> UniqSM (Bool, [CoreArg])
2268 argsToPats env in_scope val_env args occs
2269 = do { stuff <- zipWithM (argToPat env in_scope val_env) args occs
2270 ; let (interesting_s, args') = unzip stuff
2271 ; return (or interesting_s, args') }
2272
2273 isValue :: ValueEnv -> CoreExpr -> Maybe Value
2274 isValue _env (Lit lit)
2275 | litIsLifted lit = Nothing
2276 | otherwise = Just (ConVal (LitAlt lit) [])
2277
2278 isValue env (Var v)
2279 | Just cval <- lookupVarEnv env v
2280 = Just cval -- You might think we could look in the idUnfolding here
2281 -- but that doesn't take account of which branch of a
2282 -- case we are in, which is the whole point
2283
2284 | not (isLocalId v) && isCheapUnfolding unf
2285 = isValue env (unfoldingTemplate unf)
2286 where
2287 unf = idUnfolding v
2288 -- However we do want to consult the unfolding
2289 -- as well, for let-bound constructors!
2290
2291 isValue env (Lam b e)
2292 | isTyVar b = case isValue env e of
2293 Just _ -> Just LambdaVal
2294 Nothing -> Nothing
2295 | otherwise = Just LambdaVal
2296
2297 isValue env (Tick t e)
2298 | not (tickishIsCode t)
2299 = isValue env e
2300
2301 isValue _env expr -- Maybe it's a constructor application
2302 | (Var fun, args, _) <- collectArgsTicks (not . tickishIsCode) expr
2303 = case isDataConWorkId_maybe fun of
2304
2305 Just con | args `lengthAtLeast` dataConRepArity con
2306 -- Check saturated; might be > because the
2307 -- arity excludes type args
2308 -> Just (ConVal (DataAlt con) args)
2309
2310 _other | valArgCount args < idArity fun
2311 -- Under-applied function
2312 -> Just LambdaVal -- Partial application
2313
2314 _other -> Nothing
2315
2316 isValue _env _expr = Nothing
2317
2318 valueIsWorkFree :: Value -> Bool
2319 valueIsWorkFree LambdaVal = True
2320 valueIsWorkFree (ConVal _ args) = all exprIsWorkFree args
2321
2322 samePat :: CallPat -> CallPat -> Bool
2323 samePat (vs1, as1) (vs2, as2)
2324 = all2 same as1 as2
2325 where
2326 same (Var v1) (Var v2)
2327 | v1 `elem` vs1 = v2 `elem` vs2
2328 | v2 `elem` vs2 = False
2329 | otherwise = v1 == v2
2330
2331 same (Lit l1) (Lit l2) = l1==l2
2332 same (App f1 a1) (App f2 a2) = same f1 f2 && same a1 a2
2333
2334 same (Type {}) (Type {}) = True -- Note [Ignore type differences]
2335 same (Coercion {}) (Coercion {}) = True
2336 same (Tick _ e1) e2 = same e1 e2 -- Ignore casts and notes
2337 same (Cast e1 _) e2 = same e1 e2
2338 same e1 (Tick _ e2) = same e1 e2
2339 same e1 (Cast e2 _) = same e1 e2
2340
2341 same e1 e2 = WARN( bad e1 || bad e2, ppr e1 $$ ppr e2)
2342 False -- Let, lambda, case should not occur
2343 bad (Case {}) = True
2344 bad (Let {}) = True
2345 bad (Lam {}) = True
2346 bad _other = False
2347
2348 {-
2349 Note [Ignore type differences]
2350 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2351 We do not want to generate specialisations where the call patterns
2352 differ only in their type arguments! Not only is it utterly useless,
2353 but it also means that (with polymorphic recursion) we can generate
2354 an infinite number of specialisations. Example is Data.Sequence.adjustTree,
2355 I think.
2356 -}