1 {-

2 (c) The University of Glasgow 2006

3 (c) The AQUA Project, Glasgow University, 1994-1998

6 Core-syntax unfoldings

8 Unfoldings (which can travel across module boundaries) are in Core

9 syntax (namely @CoreExpr@s).

11 The type @Unfolding@ sits ``above'' simply-Core-expressions

12 unfoldings, capturing ``higher-level'' things we know about a binding,

13 usually things that the simplifier found out (e.g., ``it's a

14 literal''). In the corner of a @CoreUnfolding@ unfolding, you will

15 find, unsurprisingly, a Core expression.

16 -}

18 {-# LANGUAGE CPP #-}

29 specUnfolding,

31 ArgSummary(..),

38 -- Reexport from CoreSubst (it only live there so it can be used

39 -- by the Very Simple Optimiser)

40 exprIsConApp_maybe, exprIsLiteral_maybe

45 import GhcPrelude

47 import DynFlags

48 import CoreSyn

51 import CoreOpt

53 import CoreUtils

54 import Id

56 import DataCon

57 import Literal

58 import PrimOp

59 import IdInfo

61 import Type

62 import PrelNames

64 import Bag

65 import Util

66 import Outputable

67 import ForeignCall

68 import Name

73 {-

74 ************************************************************************

75 * *

76 \subsection{Making unfoldings}

77 * *

78 ************************************************************************

79 -}

82 mkTopUnfolding dflags is_bottoming rhs

86 -- For implicit Ids, do a tiny bit of optimising first

87 mkImplicitUnfolding dflags expr

90 -- Note [Top-level flag on inline rules]

91 -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

92 -- Slight hack: note that mk_inline_rules conservatively sets the

93 -- top-level flag to True. It gets set more accurately by the simplifier

94 -- Simplify.simplUnfolding.

97 mkSimpleUnfolding dflags rhs

101 mkDFunUnfolding bndrs con ops

105 -- See Note [Occurrence analysis of unfoldings]

108 mkWwInlineRule dflags expr arity

115 mkCompulsoryUnfolding expr -- Used for things that absolutely must be unfolded

122 -- See Note [Worker-wrapper for INLINABLE functions] in WorkWrap

123 mkWorkerUnfolding dflags work_fn

126 | isStableSource src

127 = mkCoreUnfolding src top_lvl new_tmpl guidance

128 where

132 mkWorkerUnfolding _ _ _ = noUnfolding

134 -- | Make an unfolding that may be used unsaturated

135 -- (ug_unsat_ok = unSaturatedOk) and that is reported as having its

136 -- manifest arity (the number of outer lambdas applications will

137 -- resolve before doing any work).

139 mkInlineUnfolding expr

140 = mkCoreUnfolding InlineStable

142 expr' guide

143 where

144 expr' = simpleOptExpr unsafeGlobalDynFlags expr

150 -- | Make an unfolding that will be used once the RHS has been saturated

151 -- to the given arity.

153 mkInlineUnfoldingWithArity arity expr

154 = mkCoreUnfolding InlineStable

156 expr' guide

157 where

158 expr' = simpleOptExpr unsafeGlobalDynFlags expr

162 -- See Note [INLINE pragmas and boring contexts] as to why we need to look

163 -- at the arity here.

168 mkInlinableUnfolding dflags expr

170 where

171 expr' = simpleOptExpr dflags expr

175 -- See Note [Specialising unfoldings]

176 -- specUnfolding spec_bndrs spec_app arity_decrease unf

177 -- = \spec_bndrs. spec_app( unf )

178 --

179 specUnfolding dflags spec_bndrs spec_app arity_decrease

183 -- There is a hard-to-check assumption here that the spec_app has

184 -- enough applications to exactly saturate the old_bndrs

185 -- For DFunUnfoldings we transform

186 -- \old_bndrs. MkD <op1> ... <opn>

187 -- to

188 -- \new_bndrs. MkD (spec_app(\old_bndrs. <op1>)) ... ditto <opn>

189 -- The ASSERT checks the value part of that

190 where

192 -- The beta-redexes created by spec_app will be

193 -- simplified away by simplOptExpr

195 specUnfolding dflags spec_bndrs spec_app arity_decrease

199 | isStableSource src -- See Note [Specialising unfoldings]

207 -- The beta-redexes created by spec_app will be

208 -- simplified away by simplOptExpr

210 in mkCoreUnfolding src top_lvl new_tmpl guidance

212 specUnfolding _ _ _ _ _ = noUnfolding

214 {- Note [Specialising unfoldings]

215 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

216 When we specialise a function for some given type-class arguments, we use

217 specUnfolding to specialise its unfolding. Some important points:

219 * If the original function has a DFunUnfolding, the specialised one

220 must do so too! Otherwise we lose the magic rules that make it

221 interact with ClassOps

223 * There is a bit of hack for INLINABLE functions:

224 f :: Ord a => ....

225 f = <big-rhs>

226 {- INLINABLE f #-}

244 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

245 Consider

248 {-# INLINE x #-}

252 The semantics of an INLINE pragma is

259 the `UnfoldingGuidance`.)

266 {-# INLINE x #-}

274 How do we deliver on this? By adjusting the ug_boring_ok

278 NB: there is a real risk that full laziness will float it right back

279 out again. Consider again

281 {-# INLINE x #-}

284 After inlining we get

294 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

295 An INLINE pragma uses mkInlineUnfoldingWithArity to build the

301 pragma.

308 -}

312 -- Occurrence-analyses the expression before capturing it

313 mkCoreUnfolding src top_lvl expr guidance

315 -- See Note [Occurrence analysis of unfoldings]

327 -- (only relevant for top-level bindings)

328 -> CoreExpr

329 -> Unfolding

330 -- Calculates unfolding guidance

331 -- Occurrence-analyses the expression before capturing it

332 mkUnfolding dflags src is_top_lvl is_bottoming expr

334 -- See Note [Occurrence analysis of unfoldings]

342 where

344 guidance = calcUnfoldingGuidance dflags is_top_bottoming expr

345 -- NB: *not* (calcUnfoldingGuidance (occurAnalyseExpr_NoBinderSwap expr))!

346 -- See Note [Calculate unfolding guidance on the non-occ-anal'd expression]

348 {-

349 Note [Occurrence analysis of unfoldings]

350 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

351 We do occurrence-analysis of unfoldings once and for all, when the

352 unfolding is built, rather than each time we inline them.

354 But given this decision it's vital that we do

355 *always* do it. Consider this unfolding

356 \x -> letrec { f = ...g...; g* = f } in body

357 where g* is (for some strange reason) the loop breaker. If we don't

358 occ-anal it when reading it in, we won't mark g as a loop breaker, and

359 we may inline g entirely in body, dropping its binding, and leaving

360 the occurrence in f out of scope. This happened in #8892, where

361 the unfolding in question was a DFun unfolding.

363 But more generally, the simplifier is designed on the

364 basis that it is looking at occurrence-analysed expressions, so better

365 ensure that they acutally are.

367 We use occurAnalyseExpr_NoBinderSwap instead of occurAnalyseExpr;

368 see Note [No binder swap in unfoldings].

370 Note [No binder swap in unfoldings]

371 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

372 The binder swap can temporarily violate Core Lint, by assinging

373 a LocalId binding to a GlobalId. For example, if A.foo{r872}

374 is a GlobalId with unique r872, then

376 case A.foo{r872} of bar {

377 K x -> ...(A.foo{r872})...

378 }

380 gets transformed to

382 case A.foo{r872} of bar {

383 K x -> let foo{r872} = bar

384 in ...(A.foo{r872})...

386 This is usually not a problem, because the simplifier will transform

387 this to:

389 case A.foo{r872} of bar {

390 K x -> ...(bar)...

392 However, after occurrence analysis but before simplification, this extra 'let'

393 violates the Core Lint invariant that we do not have local 'let' bindings for

394 GlobalIds. That seems (just) tolerable for the occurrence analysis that happens

395 just before the Simplifier, but not for unfoldings, which are Linted

396 independently.

397 As a quick workaround, we disable binder swap in this module.

398 See #16288 and #16296 for further plans.

400 Note [Calculate unfolding guidance on the non-occ-anal'd expression]

401 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

402 Notice that we give the non-occur-analysed expression to

403 calcUnfoldingGuidance. In some ways it'd be better to occur-analyse

404 first; for example, sometimes during simplification, there's a large

405 let-bound thing which has been substituted, and so is now dead; so

406 'expr' contains two copies of the thing while the occurrence-analysed

407 expression doesn't.

409 Nevertheless, we *don't* and *must not* occ-analyse before computing

410 the size because

412 a) The size computation bales out after a while, whereas occurrence

413 analysis does not.

415 b) Residency increases sharply if you occ-anal first. I'm not

416 100% sure why, but it's a large effect. Compiling Cabal went

417 from residency of 534M to over 800M with this one change.

419 This can occasionally mean that the guidance is very pessimistic;

420 it gets fixed up next round. And it should be rare, because large

421 let-bound things that are dead are usually caught by preInlineUnconditionally

424 ************************************************************************

425 * *

426 \subsection{The UnfoldingGuidance type}

427 * *

428 ************************************************************************

429 -}

432 -- See Note [INLINE for small functions]

433 -- True => the result of inlining the expression is

434 -- no bigger than the expression itself

435 -- eg (\x y -> f y x)

436 -- This is a quick and dirty version. It doesn't attempt

437 -- to deal with (\x y z -> x (y z))

438 -- The really important one is (x `cast` c)

439 inlineBoringOk e

441 where

451 go _ _ = boringCxtNotOk

453 calcUnfoldingGuidance

454 :: DynFlags

457 -> UnfoldingGuidance

460 = calcUnfoldingGuidance dflags is_top_bottoming expr

461 calcUnfoldingGuidance dflags is_top_bottoming expr

463 TooBig -> UnfNever

464 SizeIs size cased_bndrs scrut_discount

465 | uncondInline expr n_val_bndrs size

470 | is_top_bottoming

473 | otherwise

478 where

480 bOMB_OUT_SIZE = ufCreationThreshold dflags

481 -- Bomb out if size gets bigger than this

487 where

495 -- See Note [Function and non-function discounts]

497 {-

498 Note [Computing the size of an expression]

499 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

500 The basic idea of sizeExpr is obvious enough: count nodes. But getting the

501 heuristics right has taken a long time. Here's the basic strategy:

503 * Variables, literals: 0

504 (Exception for string literals, see litSize.)

506 * Function applications (f e1 .. en): 1 + #value args

508 * Constructor applications: 1, regardless of #args

510 * Let(rec): 1 + size of components

512 * Note, cast: 0

514 Examples

516 Size Term

517 --------------

518 0 42#

519 0 x

520 0 True

521 2 f x

522 1 Just x

523 4 f (g x)

525 Notice that 'x' counts 0, while (f x) counts 2. That's deliberate: there's

526 a function call to account for. Notice also that constructor applications

527 are very cheap, because exposing them to a caller is so valuable.

529 [25/5/11] All sizes are now multiplied by 10, except for primops

530 (which have sizes like 1 or 4. This makes primops look fantastically

531 cheap, and seems to be almost unversally beneficial. Done partly as a

532 result of #4978.

534 Note [Do not inline top-level bottoming functions]

535 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

536 The FloatOut pass has gone to some trouble to float out calls to 'error'

537 and similar friends. See Note [Bottoming floats] in SetLevels.

538 Do not re-inline them! But we *do* still inline if they are very small

539 (the uncondInline stuff).

541 Note [INLINE for small functions]

542 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

543 Consider {-# INLINE f #-}

544 f x = Just x

545 g y = f y

551 Things to note:

557 -- arguments to the cons

558 x --> g 3 -- NO

559 x --> Just v -- NO

565 silly situation that

566 f x y = x

570 efficient at runtime.

577 If we inline f here we get

592 -}

595 -- Inline unconditionally if there no size increase

596 -- Size of call is arity (+1 for the function)

597 -- See Note [INLINE for small functions]

598 uncondInline rhs arity size

602 sizeExpr :: DynFlags

605 -- get case'd

606 -> CoreExpr

607 -> ExprSize

609 -- Note [Computing the size of an expression]

611 sizeExpr dflags bOMB_OUT_SIZE top_args expr

612 = size_up expr

613 where

620 -- Make sure we get constructor discounts even

621 -- on nullary constructors

625 | isTyCoArg arg = size_up fun

635 size_up body `addSizeN`

636 size_up_alloc binder

641 pairs

644 | null alts

648 -- Now alts is non-empty

653 -- alts_size tries to compute a good discount for

654 -- the case when we are scrutinising an argument variable

656 -- Size of all alternatives

658 -- Size of biggest alternative

660 `unionBags` tot_disc) tot_scrut

661 -- If the variable is known, we produce a

662 -- discount that will take us back to 'max',

663 -- the size of the largest alternative The

664 -- 1+ is a little discount for reduced

665 -- allocation in the caller

666 --

667 -- Notice though, that we return tot_disc,

668 -- the total discount from all branches. I

669 -- think that's right.

671 alts_size tot_size _ = tot_size

672 in

675 -- Good to inline if an arg is scrutinised, because

676 -- that may eliminate allocation in the caller

677 -- And it eliminates the case itself

678 where

681 is_top_arg _ = Nothing

686 where

687 case_size

690 -- Normally we don't charge for the case itself, but

691 -- we charge one per alternative (see size_up_alt,

692 -- below) to account for the cost of the info table

693 -- and comparisons.

694 --

695 -- However, in certain cases (see is_inline_scrut

696 -- below), no code is generated for the case unless

697 -- there are multiple alts. In these cases we

698 -- subtract one, making the first alt free.

699 -- e.g. case x# +# y# of _ -> ... should cost 1

700 -- case touch# x# of _ -> ... should cost 0

701 -- (see #4978)

702 --

703 -- I would like to not have the "lengthAtMost alts 1"

704 -- condition above, but without that some programs got worse

705 -- (spectral/hartel/event and spectral/para). I don't fully

706 -- understand why. (SDM 24/5/11)

708 -- unboxed variables, inline primops and unsafe foreign calls

709 -- are all "inline" things:

711 is_inline_scrut scrut

717 | otherwise

721 | Just join_arity <- isJoinId_maybe bndr

722 -- Skip arguments to join point

724 = size_up body

725 | otherwise

726 = size_up rhs

728 ------------

729 -- size_up_app is used when there's ONE OR MORE value args

731 | isTyCoArg arg = size_up_app fun args voids

738 size_up_app other args voids = size_up other `addSizeN`

740 -- if the lhs is not an App or a Var, or an invisible thing like a

741 -- Tick or Cast, then we should charge for a complete call plus the

742 -- size of the lhs itself.

744 ------------

746 size_up_call fun val_args voids

751 ClassOpId _ -> classOpSize dflags top_args val_args

754 ------------

756 -- Don't charge for args, so that wrappers look cheap

757 -- (See comments about wrappers with Case)

758 --

759 -- IMPORTANT: *do* charge 1 for the alternative, else we

760 -- find that giant case nests are treated as practically free

761 -- A good example is Foreign.C.Error.errnoToIOError

763 ------------

764 -- Cost to allocate binding with given binder

765 size_up_alloc bndr

766 | isTyVar bndr -- Doesn't exist at runtime

767 || isJoinId bndr -- Not allocated at all

770 | otherwise

773 ------------

774 -- These addSize things have to be here because

775 -- I don't want to give them bOMB_OUT_SIZE as an argument

776 addSizeN TooBig _ = TooBig

779 -- addAltSize is used to add the sizes of case alternatives

780 addAltSize TooBig _ = TooBig

781 addAltSize _ TooBig = TooBig

787 -- This variant ignores the result discount from its LEFT argument

788 -- It's used when the second argument isn't part of the result

789 addSizeNSD TooBig _ = TooBig

790 addSizeNSD _ TooBig = TooBig

794 d2 -- Ignore d1

798 -- an expression of type State# RealWorld must be a variable

803 -- | Finds a nominal size of a string literal.

805 -- Used by CoreUnfold.sizeExpr

809 -- If size could be 0 then @f "x"@ might be too small

810 -- [Sept03: make literal strings a bit bigger to avoid fruitless

811 -- duplication of little strings]

813 -- Key point: if x |-> 4, then x must inline unconditionally

814 -- (eg via case binding)

817 -- See Note [Conlike is interesting]

818 classOpSize _ _ []

819 = sizeZero

822 where

824 -- If the class op is scrutinising a lambda bound dictionary then

825 -- give it a discount, to encourage the inlining of this function

826 -- The actual discount is rather arbitrarily chosen

828 Var dict | dict `elem` top_args

830 _other -> emptyBag

832 -- | The size of a function call

833 callSize

838 -- The 1+ is for the function itself

839 -- Add 1 for each non-trivial arg;

840 -- the allocation cost, as in let(rec)

842 -- | The size of a jump to a join point

843 jumpSize

848 -- A jump is 20% the size of a function call. Making jumps free reopens

849 -- bug #6048, but making them any more expensive loses a 21% improvement in

850 -- spectral/puzzle. TODO Perhaps adjusting the default threshold would be a

851 -- better solution?

854 -- Size for functions that are not constructors or primops

855 -- Note [Function applications]

856 funSize dflags top_args fun n_val_args voids

857 | fun `hasKey` buildIdKey = buildSize

858 | fun `hasKey` augmentIdKey = augmentSize

860 where

862 is_join = isJoinId fun

864 size | is_join = jumpSize n_val_args voids

868 -- DISCOUNTS

869 -- See Note [Function and non-function discounts]

873 -- If the function is an argument and is applied

874 -- to some values, give it an arg-discount

878 -- If the function is partially applied, show a result discount

879 -- XXX maybe behave like ConSize for eval'd variable

882 conSize dc n_val_args

885 -- See Note [Unboxed tuple size and result discount]

888 -- See Note [Constructor size and result discount]

891 -- XXX still looks to large to me

893 {-

894 Note [Constructor size and result discount]

895 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

896 Treat a constructors application as size 10, regardless of how many

897 arguments it has; we are keen to expose them (and we charge separately

898 for their args). We can't treat them as size zero, else we find that

899 (Just x) has size 0, which is the same as a lone variable; and hence

900 'v' will always be replaced by (Just x), where v is bound to Just x.

902 The "result discount" is applied if the result of the call is

903 scrutinised (say by a case). For a constructor application that will

904 mean the constructor application will disappear, so we don't need to

905 charge it to the function. So the discount should at least match the

906 cost of the constructor application, namely 10. But to give a bit

907 of extra incentive we give a discount of 10*(1 + n_val_args).

909 Simon M tried a MUCH bigger discount: (10 * (10 + n_val_args)),

910 and said it was an "unambiguous win", but its terribly dangerous

911 because a function with many many case branches, each finishing with

912 a constructor, can have an arbitrarily large discount. This led to

913 terrible code bloat: see #6099.

915 Note [Unboxed tuple size and result discount]

916 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

917 However, unboxed tuples count as size zero. I found occasions where we had

918 f x y z = case op# x y z of { s -> (# s, () #) }

919 and f wasn't getting inlined.

921 I tried giving unboxed tuples a *result discount* of zero (see the

922 commented-out line). Why? When returned as a result they do not

923 allocate, so maybe we don't want to charge so much for them If you

924 have a non-zero discount here, we find that workers often get inlined

925 back into wrappers, because it look like

926 f x = case $wf x of (# a,b #) -> (a,b)

927 and we are keener because of the case. However while this change

928 shrank binary sizes by 0.5% it also made spectral/boyer allocate 5%

929 more. All other changes were very small. So it's not a big deal but I

930 didn't adopt the idea.

932 Note [Function and non-function discounts]

933 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

934 We want a discount if the function is applied. A good example is

935 monadic combinators with continuation arguments, where inlining is

936 quite important.

938 But we don't want a big discount when a function is called many times

939 (see the detailed comments with #6048) because if the function is

940 big it won't be inlined at its many call sites and no benefit results.

941 Indeed, we can get exponentially big inlinings this way; that is what

942 #6048 is about.

944 On the other hand, for data-valued arguments, if there are lots of

945 case expressions in the body, each one will get smaller if we apply

946 the function to a constructor application, so we *want* a big discount

947 if the argument is scrutinised by many case expressions.

949 Conclusion:

950 - For functions, take the max of the discounts

951 - For data values, take the sum of the discounts

954 Note [Literal integer size]

955 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

956 Literal integers *can* be big (mkInteger [...coefficients...]), but

957 need not be (S# n). We just use an arbitrary big-ish constant here

958 so that, in particular, we don't inline top-level defns like

959 n = S# 5

960 There's no point in doing so -- any optimisations will see the S#

961 through n's unfolding. Nor will a big size inhibit unfoldings functions

962 that mention a literal Integer, because the float-out pass will float

963 all those constants to top level.

964 -}

967 primOpSize op n_val_args

970 else sizeN op_size

971 where

972 op_size = primOpCodeSize op

975 buildSize :: ExprSize

977 -- We really want to inline applications of build

978 -- build t (\cn -> e) should cost only the cost of e (because build will be inlined later)

979 -- Indeed, we should add a result_discount because build is

980 -- very like a constructor. We don't bother to check that the

981 -- build is saturated (it usually is). The "-2" discounts for the \c n,

982 -- The "4" is rather arbitrary.

984 augmentSize :: ExprSize

986 -- Ditto (augment t (\cn -> e) ys) should cost only the cost of

987 -- e plus ys. The -2 accounts for the \cn

989 -- When we return a lambda, give a discount if it's used (applied)

992 lamScrutDiscount _ TooBig = TooBig

994 {-

995 Note [addAltSize result discounts]

996 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

997 When adding the size of alternatives, we *add* the result discounts

998 too, rather than take the *maximum*. For a multi-branch case, this

999 gives a discount for each branch that returns a constructor, making us

1000 keener to inline. I did try using 'max' instead, but it makes nofib

1001 'rewrite' and 'puzzle' allocate significantly more, and didn't make

1002 binary sizes shrink significantly either.

1004 Note [Discounts and thresholds]

1005 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1006 Constants for discounts and thesholds are defined in main/DynFlags,

1007 all of form ufXxxx. They are:

1009 ufCreationThreshold

1010 At a definition site, if the unfolding is bigger than this, we

1011 may discard it altogether

1013 ufUseThreshold

1014 At a call site, if the unfolding, less discounts, is smaller than

1015 this, then it's small enough inline

1017 ufKeenessFactor

1018 Factor by which the discounts are multiplied before

1019 subtracting from size

1021 ufDictDiscount

1022 The discount for each occurrence of a dictionary argument

1023 as an argument of a class method. Should be pretty small

1024 else big functions may get inlined

1026 ufFunAppDiscount

1027 Discount for a function argument that is applied. Quite

1028 large, because if we inline we avoid the higher-order call.

1030 ufDearOp

1031 The size of a foreign call or not-dupable PrimOp

1033 ufVeryAggressive

1034 If True, the compiler ignores all the thresholds and inlines very

1035 aggressively. It still adheres to arity, simplifier phase control and

1036 loop breakers.

1039 Note [Function applications]

1040 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1041 In a function application (f a b)

1043 - If 'f' is an argument to the function being analysed,

1044 and there's at least one value arg, record a FunAppDiscount for f

1046 - If the application if a PAP (arity > 2 in this example)

1047 record a *result* discount (because inlining

1048 with "extra" args in the call may mean that we now

1049 get a saturated application)

1051 Code for manipulating sizes

1052 -}

1054 -- | The size of a candidate expression for unfolding

1055 data ExprSize

1056 = TooBig

1059 -- ^ Arguments cased herein, and discount for each such

1061 -- ^ Size to subtract if result is scrutinised by a case

1062 -- expression

1063 }

1069 -- subtract the discount before deciding whether to bale out. eg. we

1070 -- want to inline a large constructor application into a selector:

1071 -- tup = (a_1, ..., a_99)

1072 -- x = case tup of ...

1073 --

1079 maxSize TooBig _ = TooBig

1080 maxSize _ TooBig = TooBig

1084 sizeZero :: ExprSize

1090 {-

1091 ************************************************************************

1092 * *

1093 \subsection[considerUnfolding]{Given all the info, do (not) do the unfolding}

1094 * *

1095 ************************************************************************

1097 We use 'couldBeSmallEnoughToInline' to avoid exporting inlinings that

1098 we ``couldn't possibly use'' on the other side. Can be overridden w/

1099 flaggery. Just the same as smallEnoughToInline, except that it has no

1100 actual arguments.

1101 -}

1104 couldBeSmallEnoughToInline dflags threshold rhs

1108 where

1111 ----------------

1115 smallEnoughToInline _ _

1118 ----------------

1121 -- ^ Sees if the unfolding is pretty certain to inline.

1122 -- If so, return a *stable* unfolding for it, that will always inline.

1123 certainlyWillInline dflags fn_info

1131 -- to do so, and even if it is currently a

1132 -- loop breaker, it may not be later

1134 _other_unf -> Nothing

1136 where

1139 fn_unf = unfoldingInfo fn_info

1142 do_cunf _ UnfNever = Nothing

1144 -- INLINE functions have UnfWhen

1146 -- The UnfIfGoodArgs case seems important. If we w/w small functions

1147 -- binary sizes go up by 10%! (This is with SplitObjs.)

1148 -- I'm not totally sure why.

1149 -- INLINABLE functions come via this path

1150 -- See Note [certainlyWillInline: INLINABLE]

1154 -- Do not unconditionally inline a bottoming functions even if

1155 -- it seems smallish. We've carefully lifted it out to top level,

1156 -- so we don't want to re-inline it.

1163 -- Note the "unsaturatedOk". A function like f = \ab. a

1164 -- will certainly inline, even if partially applied (f e), so we'd

1165 -- better make sure that the transformed inlining has the same property

1166 | otherwise

1167 = Nothing

1169 {- Note [certainlyWillInline: be careful of thunks]

1170 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1171 Don't claim that thunks will certainly inline, because that risks work

1172 duplication. Even if the work duplication is not great (eg is_cheap

1173 holds), it can make a big difference in an inner loop In #5623 we

1174 found that the WorkWrap phase thought that

1175 y = case x of F# v -> F# (v +# v)

1176 was certainlyWillInline, so the addition got duplicated.

1178 Note that we check arityInfo instead of the arity of the unfolding to detect

1179 this case. This is so that we don't accidentally fail to inline small partial

1180 applications, like `f = g 42` (where `g` recurses into `f`) where g has arity 2

1181 (say). Here there is no risk of work duplication, and the RHS is tiny, so

1182 certainlyWillInline should return True. But `unf_arity` is zero! However f's

1183 arity, gotten from `arityInfo fn_info`, is 1.

1185 Failing to say that `f` will inline forces W/W to generate a potentially huge

1186 worker for f that will immediately cancel with `g`'s wrapper anyway, causing

1187 unnecessary churn in the Simplifier while arriving at the same result.

1189 Note [certainlyWillInline: INLINABLE]

1190 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1191 certainlyWillInline /must/ return Nothing for a large INLINABLE thing,

1192 even though we have a stable inlining, so that strictness w/w takes

1193 place. It makes a big difference to efficiency, and the w/w pass knows

1194 how to transfer the INLINABLE info to the worker; see WorkWrap

1195 Note [Worker-wrapper for INLINABLE functions]

1197 ************************************************************************

1198 * *

1199 \subsection{callSiteInline}

1200 * *

1201 ************************************************************************

1203 This is the key function. It decides whether to inline a variable at a call site

1205 callSiteInline is used at call sites, so it is a bit more generous.

1206 It's a very important function that embodies lots of heuristics.

1207 A non-WHNF can be inlined if it doesn't occur inside a lambda,

1208 and occurs exactly once or

1209 occurs once in each branch of a case and is small

1211 If the thing is in WHNF, there's no danger of duplicating work,

1212 so we can inline if it occurs once, or is small

1214 NOTE: we don't want to inline top-level functions that always diverge.

1215 It just makes the code bigger. Tt turns out that the convenient way to prevent

1216 them inlining is to give them a NOINLINE pragma, which we do in

1217 StrictAnal.addStrictnessInfoToTopId

1218 -}

1220 callSiteInline :: DynFlags

1229 | NonTrivArg -- Arg has structure

1230 | ValueArg -- Arg is a con-app or PAP

1231 -- ..or con-like. Note [Conlike is interesting]

1242 data CallCtxt

1243 = BoringCtxt

1244 | RhsCtxt -- Rhs of a let-binding; see Note [RHS of lets]

1245 | DiscArgCtxt -- Argument of a function with non-zero arg discount

1246 | RuleArgCtxt -- We are somewhere in the argument of a function with rules

1248 | ValAppCtxt -- We're applied to at least one value arg

1249 -- This arises when we have ((f x |> co) y)

1250 -- Then the (f x) has argument 'x' but in a ValAppCtxt

1252 | CaseCtxt -- We're the scrutinee of a case

1253 -- that decomposes its scrutinee

1263 callSiteInline dflags id active_unfolding lone_variable arg_infos cont_info

1265 -- idUnfolding checks for loop-breakers, returning NoUnfolding

1266 -- Things with an INLINE pragma may have an unfolding *and*

1267 -- be a loop breaker (maybe the knot is not yet untied)

1272 arg_infos cont_info unf_template

1273 is_wf is_exp guidance

1275 NoUnfolding -> Nothing

1276 BootUnfolding -> Nothing

1277 OtherCon {} -> Nothing

1281 traceInline dflags inline_id str doc result

1282 | Just prefix <- inlineCheck dflags

1284 then pprTrace str doc result

1285 else result

1286 | dopt Opt_D_dump_inlinings dflags && dopt Opt_D_verbose_core2core dflags

1287 = pprTrace str doc result

1288 | otherwise

1289 = result

1294 tryUnfolding dflags id lone_variable

1295 arg_infos cont_info unf_template

1296 is_wf is_exp guidance

1302 -- See Note [INLINE for small functions (3)]

1304 | otherwise

1306 where

1307 some_benefit = calc_some_benefit uf_arity

1311 | ufVeryAggressive dflags

1315 | otherwise

1317 where

1322 discount = computeDiscount dflags arg_discounts

1323 res_discount arg_infos cont_info

1325 where

1326 mk_doc some_benefit extra_doc yes_or_no

1333 , extra_doc

1339 -- some_benefit is used when the RHS is small enough

1340 -- and the call has enough (or too many) value

1341 -- arguments (ie n_val_args >= arity). But there must

1342 -- be *something* interesting about some argument, or the

1343 -- result context, to make it worth inlining

1345 -- expected by the unfolding

1346 calc_some_benefit uf_arity

1348 -- Note [Unsaturated applications]

1350 || interesting_call

1351 where

1355 -- NB: (any nonTriv arg_infos) looks at the

1356 -- over-saturated args too which is "wrong";

1357 -- but if over-saturated we inline anyway.

1359 interesting_call

1360 | over_saturated

1362 | otherwise

1372 {-

1373 Note [Unfold into lazy contexts], Note [RHS of lets]

1374 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1375 When the call is the argument of a function with a RULE, or the RHS of a let,

1376 we are a little bit keener to inline. For example

1377 f y = (y,y,y)

1378 g y = let x = f y in ...(case x of (a,b,c) -> ...) ...

1379 We'd inline 'f' if the call was in a case context, and it kind-of-is,

1380 only we can't see it. Also

1381 x = f v

1382 could be expensive whereas

1383 x = case v of (a,b) -> a

1384 is patently cheap and may allow more eta expansion.

1385 So we treat the RHS of a let as not-totally-boring.

1387 Note [Unsaturated applications]

1388 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1389 When a call is not saturated, we *still* inline if one of the

1390 arguments has interesting structure. That's sometimes very important.

1391 A good example is the Ord instance for Bool in Base:

1393 Rec {

1394 $fOrdBool =GHC.Classes.D:Ord

1395 @ Bool

1396 ...

1397 $cmin_ajX

1399 $cmin_ajX [Occ=LoopBreaker] :: Bool -> Bool -> Bool

1400 $cmin_ajX = GHC.Classes.$dmmin @ Bool $fOrdBool

1401 }

1403 But the defn of GHC.Classes.$dmmin is:

1405 $dmmin :: forall a. GHC.Classes.Ord a => a -> a -> a

1406 {- Arity: 3, HasNoCafRefs, Strictness: SLL,

1407 Unfolding: (\ @ a $dOrd :: GHC.Classes.Ord a x :: a y :: a ->

1408 case @ a GHC.Classes.<= @ a $dOrd x y of wild {

1409 GHC.Types.False -> y GHC.Types.True -> x }) -}

1416 ~~~~~~~~~~~~~~~~~~~~~~

1419 Then we want x to inline unconditionally; no reason for it

1424 Lest we get extra allocation.

1427 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1428 An InlineRules is used for

1438 require saturation.

1441 ~~~~~~~~~~~~~~~~~~~~~~~

1449 it makes virtually no difference to nofib. So I simplified away this

1450 special case

1453 ~~~~~~~~~~~~~~~~~~~~~~

1454 Consider

1463 ~~~~~~~~~~~~~~~~~~~~~~~~~~

1473 to work ok now.

1481 ~~~~~~~~~~~~~~~~~~~~~ which appears below

1484 variable appears all alone

1489 AND

1490 it is bound to a cheap expression

1498 into

1500 and thence to

1504 Another example: I discovered that strings

1506 because the latter is strict.

1529 InlineRule branch.

1532 Consider

1544 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1549 consider

1552 think that expression is a constructor application.

1560 {-# INLINE foo #-}

1563 which Roman did.

1566 -}

1570 computeDiscount dflags arg_discounts res_discount arg_infos cont_info

1571 -- We multiple the raw discounts (args_discount and result_discount)

1572 -- ty opt_UnfoldingKeenessFactor because the former have to do with

1573 -- *size* whereas the discounts imply that there's some extra

1574 -- *efficiency* to be gained (e.g. beta reductions, case reductions)

1575 -- by inlining.

1578 -- so we count 10 for the function itself

1581 -- Discount of 10 for each arg supplied,

1582 -- because the result replaces the call

1586 where

1592 mk_arg_discount discount ValueArg = discount

1594 res_discount'

1595 | LT <- arg_discounts `compareLength` arg_infos

1597 | otherwise

1603 -- ToDo: this 40 `min` res_discount doesn't seem right

1604 -- for DiscArgCtxt it shouldn't matter because the function will

1605 -- get the arg discount for any non-triv arg

1606 -- for RuleArgCtxt we do want to be keener to inline; but not only

1607 -- constructor results

1608 -- for RhsCtxt I suppose that exposing a data con is good in general

1609 -- And 40 seems very arbitrary

1610 --

1611 -- res_discount can be very large when a function returns

1612 -- constructors; but we only want to invoke that large discount

1613 -- when there's a case continuation.

1614 -- Otherwise we, rather arbitrarily, threshold it. Yuk.

1615 -- But we want to aovid inlining large functions that return

1616 -- constructors into contexts that are simply "interesting"