{-
(c) The University of Glasgow 2006
(c) The AQUA Project, Glasgow University, 1994-1998

Core-syntax unfoldings

Unfoldings (which can travel across module boundaries) are in Core
syntax (namely @CoreExpr@s).

The type @Unfolding@ sits ``above'' simply-Core-expression
unfoldings, capturing ``higher-level'' things we know about a binding,
usually things that the simplifier found out (e.g., ``it's a
literal'').  In the corner of a @CoreUnfolding@ unfolding, you will
find, unsurprisingly, a Core expression.
-}

{-# LANGUAGE CPP #-}

        specUnfolding,
        ArgSummary(..),

        -- Reexport from CoreSubst (it only lives there so it can be used
        -- by the Very Simple Optimiser)
        exprIsConApp_maybe, exprIsLiteral_maybe

import GhcPrelude

import DynFlags
import CoreSyn
import CoreOpt
import CoreUtils
import Id
import DataCon
import Literal
import PrimOp
import IdInfo
import Type
import PrelNames
import Bag
import Util
import Outputable
import ForeignCall
import Name

{-
************************************************************************
*                                                                      *
\subsection{Making unfoldings}
*                                                                      *
************************************************************************
-}

mkTopUnfolding dflags is_bottoming rhs

-- For implicit Ids, do a tiny bit of optimising first
mkImplicitUnfolding dflags expr

-- Note [Top-level flag on inline rules]
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- Slight hack: note that mk_inline_rules conservatively sets the
-- top-level flag to True.  It gets set more accurately by the simplifier
-- Simplify.simplUnfolding.

mkSimpleUnfolding dflags rhs

mkDFunUnfolding bndrs con ops
  -- See Note [Occurrence analysis of unfoldings]

mkWwInlineRule dflags expr arity

mkCompulsoryUnfolding expr   -- Used for things that absolutely must be unfolded

-- See Note [Worker-wrapper for INLINABLE functions] in WorkWrap
mkWorkerUnfolding dflags work_fn
  | isStableSource src
  = mkCoreUnfolding src top_lvl new_tmpl guidance
  where

mkWorkerUnfolding _ _ _ = noUnfolding

-- | Make an unfolding that may be used unsaturated
-- (ug_unsat_ok = unSaturatedOk) and that is reported as having its
-- manifest arity (the number of outer lambdas applications will
-- resolve before doing any work).
mkInlineUnfolding expr
  = mkCoreUnfolding InlineStable
      expr' guide
  where
    expr' = simpleOptExpr unsafeGlobalDynFlags expr

-- | Make an unfolding that will be used once the RHS has been saturated
-- to the given arity.
mkInlineUnfoldingWithArity arity expr
  = mkCoreUnfolding InlineStable
      expr' guide
  where
    expr' = simpleOptExpr unsafeGlobalDynFlags expr
    -- See Note [INLINE pragmas and boring contexts] as to why we need to look
    -- at the arity here.

mkInlinableUnfolding dflags expr
  where
    expr' = simpleOptExpr dflags expr

-- See Note [Specialising unfoldings]
-- specUnfolding spec_bndrs spec_app arity_decrease unf
--   = \spec_bndrs. spec_app( unf )
--
specUnfolding dflags spec_bndrs spec_app arity_decrease
    -- There is a hard-to-check assumption here that the spec_app has
    -- enough applications to exactly saturate the old_bndrs
    -- For DFunUnfoldings we transform
    --    \old_bndrs. MkD <op1> ... <opn>
    -- to
    --    \new_bndrs. MkD (spec_app(\old_bndrs. <op1>)) ... ditto <opn>
    -- The ASSERT checks the value part of that
  where
    -- The beta-redexes created by spec_app will be
    -- simplified away by simplOptExpr

specUnfolding dflags spec_bndrs spec_app arity_decrease
  | isStableSource src  -- See Note [Specialising unfoldings]
      -- The beta-redexes created by spec_app will be
      -- simplified away by simplOptExpr
    in mkCoreUnfolding src top_lvl new_tmpl guidance

specUnfolding _ _ _ _ _ = noUnfolding

{- Note [Specialising unfoldings]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When we specialise a function for some given type-class arguments, we use
specUnfolding to specialise its unfolding.  Some important points:

* If the original function has a DFunUnfolding, the specialised one
  must do so too!  Otherwise we lose the magic rules that make it
  interact with ClassOps

* There is a bit of a hack for INLINABLE functions:
     f :: Ord a => ....
     f = <big-rhs>
     {-# INLINABLE f #-}

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Consider
   {-# INLINE x #-}

The semantics of an INLINE pragma is
the `UnfoldingGuidance`.)
   {-# INLINE x #-}

How do we deliver on this?  By adjusting the ug_boring_ok

NB: there is a real risk that full laziness will float it right back
out again.  Consider again
   {-# INLINE x #-}
After inlining we get

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
An INLINE pragma uses mkInlineUnfoldingWithArity to build the
pragma.
-}

-- Occurrence-analyses the expression before capturing it
mkCoreUnfolding src top_lvl expr guidance
  -- See Note [Occurrence analysis of unfoldings]

            -- (only relevant for top-level bindings)
            -> CoreExpr
            -> Unfolding
-- Calculates unfolding guidance
-- Occurrence-analyses the expression before capturing it
mkUnfolding dflags src is_top_lvl is_bottoming expr
  -- See Note [Occurrence analysis of unfoldings]
  where
    guidance = calcUnfoldingGuidance dflags is_top_bottoming expr
    -- NB: *not* (calcUnfoldingGuidance (occurAnalyseExpr expr))!
    -- See Note [Calculate unfolding guidance on the non-occ-anal'd expression]

{-
Note [Occurrence analysis of unfoldings]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We do occurrence-analysis of unfoldings once and for all, when the
unfolding is built, rather than each time we inline them.

But given this decision it's vital that we *always* do it.
Consider this unfolding
    \x -> letrec { f = ...g...; g* = f } in body
where g* is (for some strange reason) the loop breaker.  If we don't
occ-anal it when reading it in, we won't mark g as a loop breaker, and
we may inline g entirely in body, dropping its binding, and leaving
the occurrence in f out of scope.  This happened in Trac #8892, where
the unfolding in question was a DFun unfolding.

But more generally, the simplifier is designed on the
basis that it is looking at occurrence-analysed expressions, so better
ensure that they actually are.

Note [Calculate unfolding guidance on the non-occ-anal'd expression]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Notice that we give the non-occur-analysed expression to
calcUnfoldingGuidance.  In some ways it'd be better to occur-analyse
first; for example, sometimes during simplification, there's a large
let-bound thing which has been substituted, and so is now dead; so
'expr' contains two copies of the thing while the occurrence-analysed
expression doesn't.

Nevertheless, we *don't* and *must not* occ-analyse before computing
the size because

a) The size computation bales out after a while, whereas occurrence
   analysis does not.

b) Residency increases sharply if you occ-anal first.  I'm not
   100% sure why, but it's a large effect.  Compiling Cabal went
   from residency of 534M to over 800M with this one change.

This can occasionally mean that the guidance is very pessimistic;
it gets fixed up next round.  And it should be rare, because large
let-bound things that are dead are usually caught by preInlineUnconditionally

************************************************************************
*                                                                      *
\subsection{The UnfoldingGuidance type}
*                                                                      *
************************************************************************
-}

-- See Note [INLINE for small functions]
-- True => the result of inlining the expression is
--         no bigger than the expression itself
--     eg      (\x y -> f y x)
-- This is a quick and dirty version.  It doesn't attempt
-- to deal with  (\x y z -> x (y z))
-- The really important one is (x `cast` c)
inlineBoringOk e
  where
    go _ _ = boringCxtNotOk

calcUnfoldingGuidance
   :: DynFlags
   -> UnfoldingGuidance
  = calcUnfoldingGuidance dflags is_top_bottoming expr
calcUnfoldingGuidance dflags is_top_bottoming expr
      TooBig -> UnfNever
      SizeIs size cased_bndrs scrut_discount
        | uncondInline expr n_val_bndrs size
        | is_top_bottoming
        | otherwise
  where
    bOMB_OUT_SIZE = ufCreationThreshold dflags
           -- Bomb out if size gets bigger than this
  where
        -- See Note [Function and non-function discounts]

{-
Note [Computing the size of an expression]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The basic idea of sizeExpr is obvious enough: count nodes.  But getting the
heuristics right has taken a long time.  Here's the basic strategy:

    * Variables, literals: 0
      (Exception for string literals, see litSize.)

    * Function applications (f e1 .. en): 1 + #value args

    * Constructor applications: 1, regardless of #args

    * Let(rec): 1 + size of components

    * Note, cast: 0

Examples

  Size  Term
  --------------
    0     42#
    0     x
    0     True
    2     f x
    1     Just x
    4     f (g x)

Notice that 'x' counts 0, while (f x) counts 2.  That's deliberate: there's
a function call to account for.  Notice also that constructor applications
are very cheap, because exposing them to a caller is so valuable.

[25/5/11] All sizes are now multiplied by 10, except for primops
(which have sizes like 1 or 4).  This makes primops look fantastically
cheap, and seems to be almost universally beneficial.  Done partly as a
result of #4978.
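The node-counting scheme above can be sketched as a toy sizer. This is purely illustrative: the mini expression type and all names here are invented for this sketch, and the real sizeExpr has many more cases (lets, cases, casts, discounts) plus the x10 scaling.

```haskell
-- Toy model of the size heuristic: matches the table above
-- (sizes before the x10 scaling).
data ToyExpr
  = ToyVar String            -- variables: size 0
  | ToyLit                   -- literals: size 0 (string literals excepted)
  | ToyCon String [ToyExpr]  -- constructor application: 1, plus arg sizes
  | ToyApp ToyExpr [ToyExpr] -- function application: 1 + #value args

toySize :: ToyExpr -> Int
toySize (ToyVar _)    = 0
toySize ToyLit        = 0
toySize (ToyCon _ as) = 1 + sum (map toySize as)
toySize (ToyApp f as) = toySize f + 1 + length as + sum (map toySize as)
```

With f, g, x as variables this reproduces the table: x is 0, f x is 2, Just x is 1, and f (g x) is 4.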

Note [Do not inline top-level bottoming functions]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The FloatOut pass has gone to some trouble to float out calls to 'error'
and similar friends.  See Note [Bottoming floats] in SetLevels.
Do not re-inline them!  But we *do* still inline if they are very small
(the uncondInline stuff).

Note [INLINE for small functions]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Consider        {-# INLINE f #-}
                f x = Just x
                g y = f y

Things to note:

     -- arguments to the cons
     x --> g 3             -- NO
     x --> Just v          -- NO

silly situation that
       f x y = x

efficient at runtime.

If we inline f here we get
-}

-- Inline unconditionally if there is no size increase
-- Size of call is arity (+1 for the function)
-- See Note [INLINE for small functions]
uncondInline rhs arity size
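The guards of uncondInline are elided above; the test it describes can be sketched as follows. This is a hedged sketch under the x10 size scaling: a saturated call of arity n already costs roughly 10*(n+1) units (one per arg, plus one for the function), so inlining a RHS no bigger than that cannot increase program size. The name is invented; the real function also consults exprIsTrivial for the arity-0 case.

```haskell
-- Sketch of the "no size increase" test (illustrative, not GHC's code).
uncondInlineSketch :: Int  -- ^ arity of the RHS
                   -> Int  -- ^ size of the RHS (x10-scaled units)
                   -> Bool
uncondInlineSketch arity size
  | arity > 0 = size <= 10 * (arity + 1)  -- call costs ~10*(arity+1)
  | otherwise = False  -- the real code checks exprIsTrivial rhs here
```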

sizeExpr :: DynFlags
                  --       get case'd
         -> CoreExpr
         -> ExprSize
-- Note [Computing the size of an expression]

sizeExpr dflags bOMB_OUT_SIZE top_args expr
  = size_up expr
  where

            -- Make sure we get constructor discounts even
            -- on nullary constructors
      | isTyCoArg arg = size_up fun

      size_up body `addSizeN`
      size_up_alloc binder

      pairs

      | null alts

      -- Now alts is non-empty

            -- alts_size tries to compute a good discount for
            -- the case when we are scrutinising an argument variable

                        -- Size of all alternatives
                        -- Size of biggest alternative
                        `unionBags` tot_disc) tot_scrut
                        -- If the variable is known, we produce a
                        -- discount that will take us back to 'max',
                        -- the size of the largest alternative.  The
                        -- 1+ is a little discount for reduced
                        -- allocation in the caller
                        --
                        -- Notice though, that we return tot_disc,
                        -- the total discount from all branches.  I
                        -- think that's right.

            alts_size tot_size _ = tot_size
        in
        -- Good to inline if an arg is scrutinised, because
        -- that may eliminate allocation in the caller
        -- And it eliminates the case itself
        where
          is_top_arg _ = Nothing

        where
          case_size
           -- Normally we don't charge for the case itself, but
           -- we charge one per alternative (see size_up_alt,
           -- below) to account for the cost of the info table
           -- and comparisons.
           --
           -- However, in certain cases (see is_inline_scrut
           -- below), no code is generated for the case unless
           -- there are multiple alts.  In these cases we
           -- subtract one, making the first alt free.
           -- e.g. case x# +# y# of _ -> ...   should cost 1
           --      case touch# x# of _ -> ...  should cost 0
           -- (see #4978)
           --
           -- I would like to not have the "lengthAtMost alts 1"
           -- condition above, but without that some programs got worse
           -- (spectral/hartel/event and spectral/para).  I don't fully
           -- understand why. (SDM 24/5/11)

          -- unboxed variables, inline primops and unsafe foreign calls
          -- are all "inline" things:
          is_inline_scrut scrut
            | otherwise

      | Just join_arity <- isJoinId_maybe bndr
        -- Skip arguments to join point
      = size_up body
      | otherwise
      = size_up rhs

    ------------
    -- size_up_app is used when there's ONE OR MORE value args
      | isTyCoArg arg = size_up_app fun args voids
    size_up_app other args voids = size_up other `addSizeN`
    -- if the lhs is not an App or a Var, or an invisible thing like a
    -- Tick or Cast, then we should charge for a complete call plus the
    -- size of the lhs itself.

    ------------
    size_up_call fun val_args voids
           ClassOpId _  -> classOpSize dflags top_args val_args

    ------------
        -- Don't charge for args, so that wrappers look cheap
        -- (See comments about wrappers with Case)
        --
        -- IMPORTANT: *do* charge 1 for the alternative, else we
        -- find that giant case nests are treated as practically free
        -- A good example is Foreign.C.Error.errnoToIOError

    ------------
    -- Cost to allocate binding with given binder
    size_up_alloc bndr
      |  isTyVar bndr             -- Doesn't exist at runtime
      || isJoinId bndr            -- Not allocated at all
      | otherwise

    ------------
    -- These addSize things have to be here because
    -- I don't want to give them bOMB_OUT_SIZE as an argument
    addSizeN TooBig _ = TooBig

    -- addAltSize is used to add the sizes of case alternatives
    addAltSize TooBig _      = TooBig
    addAltSize _      TooBig = TooBig

    -- This variant ignores the result discount from its LEFT argument
    -- It's used when the second argument isn't part of the result
    addSizeNSD TooBig _      = TooBig
    addSizeNSD _      TooBig = TooBig
                 d2  -- Ignore d1

-- an expression of type State# RealWorld must be a variable

-- | Finds a nominal size of a string literal.
-- Used by CoreUnfold.sizeExpr
  -- If size could be 0 then @f "x"@ might be too small
  -- [Sept03: make literal strings a bit bigger to avoid fruitless
  --  duplication of little strings]
  -- Key point: if  x |-> 4, then x must inline unconditionally
  --            (eg via case binding)

-- See Note [Conlike is interesting]
classOpSize _ _ []
  = sizeZero
  where
    -- If the class op is scrutinising a lambda bound dictionary then
    -- give it a discount, to encourage the inlining of this function
    -- The actual discount is rather arbitrarily chosen
                   Var dict | dict `elem` top_args
                   _other -> emptyBag

-- | The size of a function call
callSize
  -- The 1+ is for the function itself
  -- Add 1 for each non-trivial arg;
  -- the allocation cost, as in let(rec)

-- | The size of a jump to a join point
jumpSize
  -- A jump is 20% the size of a function call.  Making jumps free reopens
  -- bug #6048, but making them any more expensive loses a 21% improvement in
  -- spectral/puzzle.  TODO Perhaps adjusting the default threshold would be a
  -- better solution?

-- Size for functions that are not constructors or primops
-- Note [Function applications]
funSize dflags top_args fun n_val_args voids
  | fun `hasKey` buildIdKey   = buildSize
  | fun `hasKey` augmentIdKey = augmentSize
  where
    is_join = isJoinId fun

    size | is_join   = jumpSize n_val_args voids

        --                  DISCOUNTS
        --  See Note [Function and non-function discounts]
        -- If the function is an argument and is applied
        -- to some values, give it an arg-discount

        -- If the function is partially applied, show a result discount
        -- XXX maybe behave like ConSize for eval'd variable

conSize dc n_val_args
  -- See Note [Unboxed tuple size and result discount]
  -- See Note [Constructor size and result discount]
  -- XXX still looks too large to me

{-
Note [Constructor size and result discount]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Treat a constructor application as size 10, regardless of how many
arguments it has; we are keen to expose them (and we charge separately
for their args).  We can't treat them as size zero, else we find that
(Just x) has size 0, which is the same as a lone variable; and hence
'v' will always be replaced by (Just x), where v is bound to Just x.

The "result discount" is applied if the result of the call is
scrutinised (say by a case).  For a constructor application that will
mean the constructor application will disappear, so we don't need to
charge it to the function.  So the discount should at least match the
cost of the constructor application, namely 10.  But to give a bit
of extra incentive we give a discount of 10*(1 + n_val_args).

Simon M tried a MUCH bigger discount: (10 * (10 + n_val_args)),
and said it was an "unambiguous win", but it's terribly dangerous
because a function with many many case branches, each finishing with
a constructor, can have an arbitrarily large discount.  This led to
terrible code bloat: see Trac #6099.
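The arithmetic in this Note is small enough to state directly. A sketch (the name is invented; the real conSize also special-cases unboxed tuples, as the next Note explains):

```haskell
-- (size, result discount) for a constructor application with
-- n value args, per Note [Constructor size and result discount].
conSizeSketch :: Int -> (Int, Int)
conSizeSketch nValArgs = (10, 10 * (1 + nValArgs))
  -- size 10 regardless of arity; discount at least covers the size,
  -- plus 10 per value arg as extra incentive when scrutinised
```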

Note [Unboxed tuple size and result discount]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
However, unboxed tuples count as size zero.  I found occasions where we had
        f x y z = case op# x y z of { s -> (# s, () #) }
and f wasn't getting inlined.

I tried giving unboxed tuples a *result discount* of zero (see the
commented-out line).  Why?  When returned as a result they do not
allocate, so maybe we don't want to charge so much for them.  If you
have a non-zero discount here, we find that workers often get inlined
back into wrappers, because it looks like
    f x = case $wf x of (# a,b #) -> (a,b)
and we are keener because of the case.  However while this change
shrank binary sizes by 0.5% it also made spectral/boyer allocate 5%
more.  All other changes were very small.  So it's not a big deal but I
didn't adopt the idea.

Note [Function and non-function discounts]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We want a discount if the function is applied.  A good example is
monadic combinators with continuation arguments, where inlining is
quite important.

But we don't want a big discount when a function is called many times
(see the detailed comments with Trac #6048) because if the function is
big it won't be inlined at its many call sites and no benefit results.
Indeed, we can get exponentially big inlinings this way; that is what
Trac #6048 is about.

On the other hand, for data-valued arguments, if there are lots of
case expressions in the body, each one will get smaller if we apply
the function to a constructor application, so we *want* a big discount
if the argument is scrutinised by many case expressions.

Conclusion:
  - For functions, take the max of the discounts
  - For data values, take the sum of the discounts
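The conclusion can be written down directly. A sketch with an invented name (the real code applies this policy inside sizeExpr's discount bookkeeping, not via a standalone function):

```haskell
-- Combine the discounts earned at each use site of an argument,
-- per the conclusion of Note [Function and non-function discounts].
combineDiscounts :: Bool  -- ^ is the argument a function?
                 -> [Int] -- ^ discounts from individual use sites
                 -> Int
combineDiscounts isFun ds
  | isFun     = foldr max 0 ds  -- max: avoid exponential blowup (Trac #6048)
  | otherwise = sum ds          -- sum: each case gets smaller, so add them
```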

Note [Literal integer size]
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Literal integers *can* be big (mkInteger [...coefficients...]), but
need not be (S# n).  We just use an arbitrary big-ish constant here
so that, in particular, we don't inline top-level defns like
   n = S# 5
There's no point in doing so -- any optimisations will see the S#
through n's unfolding.  Nor will a big size inhibit unfoldings of
functions that mention a literal Integer, because the float-out pass
will float all those constants to top level.
-}

primOpSize op n_val_args
                 else sizeN op_size
  where
    op_size = primOpCodeSize op

buildSize :: ExprSize
        -- We really want to inline applications of build
        -- build t (\cn -> e) should cost only the cost of e (because build will be inlined later)
        -- Indeed, we should add a result_discount because build is
        -- very like a constructor.  We don't bother to check that the
        -- build is saturated (it usually is).  The "-2" discounts for the \c n,
        -- The "4" is rather arbitrary.

augmentSize :: ExprSize
        -- Ditto (augment t (\cn -> e) ys) should cost only the cost of
        -- e plus ys.  The -2 accounts for the \cn

-- When we return a lambda, give a discount if it's used (applied)
lamScrutDiscount _ TooBig = TooBig

{-
Note [addAltSize result discounts]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When adding the size of alternatives, we *add* the result discounts
too, rather than take the *maximum*.  For a multi-branch case, this
gives a discount for each branch that returns a constructor, making us
keener to inline.  I did try using 'max' instead, but it makes nofib
'rewrite' and 'puzzle' allocate significantly more, and didn't make
binary sizes shrink significantly either.

Note [Discounts and thresholds]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Constants for discounts and thresholds are defined in main/DynFlags,
all of form ufXxxx.  They are:

ufCreationThreshold
     At a definition site, if the unfolding is bigger than this, we
     may discard it altogether

ufUseThreshold
     At a call site, if the unfolding, less discounts, is smaller than
     this, then it's small enough to inline

ufKeenessFactor
     Factor by which the discounts are multiplied before
     subtracting from size

ufDictDiscount
     The discount for each occurrence of a dictionary argument
     as an argument of a class method.  Should be pretty small
     else big functions may get inlined

ufFunAppDiscount
     Discount for a function argument that is applied.  Quite
     large, because if we inline we avoid the higher-order call.

ufDearOp
     The size of a foreign call or not-dupable PrimOp

ufVeryAggressive
     If True, the compiler ignores all the thresholds and inlines very
     aggressively.  It still adheres to arity, simplifier phase control and
     loop breakers.

Note [Function applications]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In a function application (f a b)

  - If 'f' is an argument to the function being analysed,
    and there's at least one value arg, record a FunAppDiscount for f

  - If the application is a PAP (arity > 2 in this example)
    record a *result* discount (because inlining
    with "extra" args in the call may mean that we now
    get a saturated application)

Code for manipulating sizes
-}

-- | The size of a candidate expression for unfolding
data ExprSize
  = TooBig
          -- ^ Arguments cased herein, and discount for each such
          -- ^ Size to subtract if result is scrutinised by a case
          -- expression
    }

-- subtract the discount before deciding whether to bale out.  eg. we
-- want to inline a large constructor application into a selector:
--      tup = (a_1, ..., a_99)
--      x = case tup of ...
--
maxSize TooBig _      = TooBig
maxSize _      TooBig = TooBig

sizeZero :: ExprSize

{-
************************************************************************
*                                                                      *
\subsection[considerUnfolding]{Given all the info, do (not) do the unfolding}
*                                                                      *
************************************************************************

We use 'couldBeSmallEnoughToInline' to avoid exporting inlinings that
we ``couldn't possibly use'' on the other side.  Can be overridden w/
flaggery.  Just the same as smallEnoughToInline, except that it has no
actual arguments.
-}

couldBeSmallEnoughToInline dflags threshold rhs
  where

----------------
smallEnoughToInline _ _

----------------

-- Sees if the unfolding is pretty certain to inline
-- If so, return a *stable* unfolding for it, that will always inline
certainlyWillInline dflags fn_info
                         -- to do so, and even if it is currently a
                         -- loop breaker, it may not be later

      _other_unf      -> Nothing

  where
    fn_unf = unfoldingInfo fn_info

    do_cunf _ UnfNever = Nothing

        -- INLINE functions have UnfWhen

        -- The UnfIfGoodArgs case seems important.  If we w/w small functions
        -- binary sizes go up by 10%!  (This is with SplitObjs.)
        -- I'm not totally sure why.
        -- INLINABLE functions come via this path
        --    See Note [certainlyWillInline: INLINABLE]

      -- Do not unconditionally inline a bottoming function even if
      -- it seems smallish.  We've carefully lifted it out to top level,
      -- so we don't want to re-inline it.

      -- Note the "unsaturatedOk".  A function like  f = \ab. a
      -- will certainly inline, even if partially applied (f e), so we'd
      -- better make sure that the transformed inlining has the same property
      | otherwise
      = Nothing

{- Note [certainlyWillInline: be careful of thunks]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Don't claim that thunks will certainly inline, because that risks work
duplication.  Even if the work duplication is not great (eg is_cheap
holds), it can make a big difference in an inner loop.  In Trac #5623 we
found that the WorkWrap phase thought that
       y = case x of F# v -> F# (v +# v)
was certainlyWillInline, so the addition got duplicated.

Note [certainlyWillInline: INLINABLE]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
certainlyWillInline /must/ return Nothing for a large INLINABLE thing,
even though we have a stable inlining, so that strictness w/w takes
place.  It makes a big difference to efficiency, and the w/w pass knows
how to transfer the INLINABLE info to the worker; see WorkWrap
Note [Worker-wrapper for INLINABLE functions]

************************************************************************
*                                                                      *
\subsection{callSiteInline}
*                                                                      *
************************************************************************

This is the key function.  It decides whether to inline a variable at a call site

callSiteInline is used at call sites, so it is a bit more generous.
It's a very important function that embodies lots of heuristics.
A non-WHNF can be inlined if it doesn't occur inside a lambda,
and occurs exactly once or
    occurs once in each branch of a case and is small

If the thing is in WHNF, there's no danger of duplicating work,
so we can inline if it occurs once, or is small

NOTE: we don't want to inline top-level functions that always diverge.
It just makes the code bigger.  It turns out that the convenient way to prevent
them inlining is to give them a NOINLINE pragma, which we do in
StrictAnal.addStrictnessInfoToTopId
-}

callSiteInline :: DynFlags
  | NonTrivArg    -- Arg has structure
  | ValueArg      -- Arg is a con-app or PAP
                  -- ..or con-like.  Note [Conlike is interesting]

data CallCtxt
  = BoringCtxt
  | RhsCtxt             -- Rhs of a let-binding; see Note [RHS of lets]
  | DiscArgCtxt         -- Argument of a function with non-zero arg discount
  | RuleArgCtxt         -- We are somewhere in the argument of a function with rules

  | ValAppCtxt          -- We're applied to at least one value arg
                        -- This arises when we have ((f x |> co) y)
                        -- Then the (f x) has argument 'x' but in a ValAppCtxt

  | CaseCtxt            -- We're the scrutinee of a case
                        -- that decomposes its scrutinee

callSiteInline dflags id active_unfolding lone_variable arg_infos cont_info
    -- idUnfolding checks for loop-breakers, returning NoUnfolding
    -- Things with an INLINE pragma may have an unfolding *and*
    -- be a loop breaker  (maybe the knot is not yet untied)
                             arg_infos cont_info unf_template
                             is_wf is_exp guidance
      NoUnfolding   -> Nothing
      BootUnfolding -> Nothing
      OtherCon {}   -> Nothing

traceInline dflags inline_id str doc result
  | Just prefix <- inlineCheck dflags
    then pprTrace str doc result
    else result
  | dopt Opt_D_dump_inlinings dflags && dopt Opt_D_verbose_core2core dflags
  = pprTrace str doc result
  | otherwise
  = result

tryUnfolding dflags id lone_variable
             arg_infos cont_info unf_template
             is_wf is_exp guidance
     -- See Note [INLINE for small functions (3)]
  | otherwise
  where
    some_benefit = calc_some_benefit uf_arity

      | ufVeryAggressive dflags
      | otherwise
      where
        discount = computeDiscount dflags arg_discounts
                                   res_discount arg_infos cont_info

  where
    mk_doc some_benefit extra_doc yes_or_no
             , extra_doc

    -- some_benefit is used when the RHS is small enough
    -- and the call has enough (or too many) value
    -- arguments (ie n_val_args >= arity).  But there must
    -- be *something* interesting about some argument, or the
    -- result context, to make it worth inlining
    -- expected by the unfolding
    calc_some_benefit uf_arity
                    -- Note [Unsaturated applications]
                 || interesting_call
      where
        -- NB: (any nonTriv arg_infos) looks at the
        -- over-saturated args too which is "wrong";
        -- but if over-saturated we inline anyway.

        interesting_call
          | over_saturated
          | otherwise

{-
Note [Unfold into lazy contexts], Note [RHS of lets]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When the call is the argument of a function with a RULE, or the RHS of a let,
we are a little bit keener to inline.  For example
     f y = (y,y,y)
     g y = let x = f y in ...(case x of (a,b,c) -> ...) ...
We'd inline 'f' if the call was in a case context, and it kind-of-is,
only we can't see it.  Also
     x = f v
could be expensive whereas
     x = case v of (a,b) -> a
is patently cheap and may allow more eta expansion.
So we treat the RHS of a let as not-totally-boring.

Note [Unsaturated applications]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When a call is not saturated, we *still* inline if one of the
arguments has interesting structure.  That's sometimes very important.
A good example is the Ord instance for Bool in Base:

  Rec {
    $fOrdBool = GHC.Classes.D:Ord
                  @ Bool
                  ...
                  $cmin_ajX

    $cmin_ajX [Occ=LoopBreaker] :: Bool -> Bool -> Bool
    $cmin_ajX = GHC.Classes.$dmmin @ Bool $fOrdBool
  }

But the defn of GHC.Classes.$dmmin is:

  $dmmin :: forall a. GHC.Classes.Ord a => a -> a -> a
    {- Arity: 3, HasNoCafRefs, Strictness: SLL,
       Unfolding: (\ @ a $dOrd :: GHC.Classes.Ord a x :: a y :: a ->
                   case @ a GHC.Classes.<= @ a $dOrd x y of wild {
                     GHC.Types.False -> y GHC.Types.True -> x }) -}

~~~~~~~~~~~~~~~~~~~~~~

Then we want x to inline unconditionally; no reason for it
Lest we get extra allocation.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
An InlineRules is used for
require saturation.

~~~~~~~~~~~~~~~~~~~~~~~
it makes virtually no difference to nofib.  So I simplified away this
special case

~~~~~~~~~~~~~~~~~~~~~~
Consider

~~~~~~~~~~~~~~~~~~~~~~~~~~
to work ok now.

~~~~~~~~~~~~~~~~~~~~~  which appears below
variable appears all alone

AND
it is bound to a cheap expression

into
and thence to

Another example: I discovered that strings
because the latter is strict.

InlineRule branch.

Consider

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
consider
think that expression is a constructor application.

   {-# INLINE foo #-}
which Roman did.
-}

computeDiscount dflags arg_discounts res_discount arg_infos cont_info
     -- We multiply the raw discounts (args_discount and result_discount)
     -- by opt_UnfoldingKeenessFactor because the former have to do with
     --  *size* whereas the discounts imply that there's some extra
     --  *efficiency* to be gained (e.g. beta reductions, case reductions)
     -- by inlining.

           -- so we count 10 for the function itself

           -- Discount of 10 for each arg supplied,
           -- because the result replaces the call

  where
    mk_arg_discount discount ValueArg = discount

    res_discount'
      | LT <- arg_discounts `compareLength` arg_infos
      | otherwise
        -- ToDo: this 40 `min` res_discount doesn't seem right
        --   for DiscArgCtxt it shouldn't matter because the function will
        --       get the arg discount for any non-triv arg
        --   for RuleArgCtxt we do want to be keener to inline; but not only
        --       constructor results
        --   for RhsCtxt I suppose that exposing a data con is good in general
        --   And 40 seems very arbitrary
        --
        -- res_discount can be very large when a function returns
        -- constructors; but we only want to invoke that large discount
        -- when there's a case continuation.
        -- Otherwise we, rather arbitrarily, threshold it.  Yuk.
        -- But we want to avoid inlining large functions that return
        -- constructors into contexts that are simply "interesting"
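The discount arithmetic described in the comments above can be sketched as follows. This is a hedged sketch, not GHC's code: parameter names are invented, and the real computeDiscount derives its inputs (the per-argument discounts actually earned, and the context-thresholded result discount) from the guidance and the call context.

```haskell
-- Sketch of the call-site discount: 10 because the result replaces
-- the call, 10 per supplied arg, plus the keenness-scaled argument
-- and result discounts.
computeDiscountSketch :: Double -- ^ keenness factor (cf. ufKeenessFactor)
                      -> [Int]  -- ^ discounts earned by supplied args
                      -> Int    -- ^ result discount (already thresholded)
                      -> Int
computeDiscountSketch keenness argDiscounts resDiscount
  = 10
  + 10 * length argDiscounts
  + round (keenness * fromIntegral (sum argDiscounts + resDiscount))
```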