1 -- | Implements a selective lambda lifter, running late in the optimisation

2 -- pipeline.

3 --

4 -- The transformation itself is implemented in "StgLiftLams.Transformation".

5 -- If you are interested in the cost model that is employed to decide whether

6 -- to lift a binding or not, look at "StgLiftLams.Analysis".

7 -- "StgLiftLams.LiftM" contains the transformation monad that hides away some

8 -- plumbing of the transformation.

10 -- * Late lambda lifting in STG

11 -- $note

12 Transformation.stgLiftLams

17 -- Note [Late lambda lifting in STG]

18 -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

19 -- $note

20 -- See also the <https://ghc.haskell.org/trac/ghc/wiki/LateLamLift wiki page>

21 -- and Trac #9476.

22 --

23 -- The basic idea behind lambda lifting is to turn locally defined functions

24 -- into top-level functions. Free variables are then passed as additional

25 -- arguments at *call sites* instead of having a closure allocated for them at

26 -- *definition site*. Example:

27 --

28 -- @

29 -- let x = ...; y = ... in

30 -- let f = {x y} \a -> a + x + y in

31 -- let g = {f x} \b -> f b + x in

32 -- g 5

33 -- @

34 --

35 -- Lambda lifting @f@ would

36 --

37 -- 1. Turn @f@'s free variables into formal parameters

38 -- 2. Update @f@'s call site within @g@ to @f x y b@

39 -- 3. Update @g@'s closure: Add @y@ as an additional free variable, while

40 -- removing @f@, because @f@ no longer allocates and can be floated to

41 -- top-level.

42 -- 4. Actually float the binding of @f@ to top-level, eliminating the @let@

43 -- in the process.

44 --

45 -- This results in the following program (with free var annotations):

46 --

47 -- @

48 -- f x y a = a + x + y;

49 -- let x = ...; y = ... in

50 -- let g = {x y} \b -> f x y b + x in

51 -- g 5

52 -- @

53 --

54 -- This optimisation is all about lifting only when it is beneficial to do so.

55 -- The above seems like a worthwhile lift, judging from heap allocation:

56 -- We eliminate @f@'s closure, saving to allocate a closure with 2 words, while

57 -- not changing the size of @g@'s closure.

58 --

59 -- You can probably sense that there's some kind of cost model at play here.

60 -- And you are right! But we also employ a couple of other heuristics for the

61 -- lifting decision which are outlined in "StgLiftLams.Analysis#when".

62 --

63 -- The transformation is done in "StgLiftLams.Transformation", which calls out

64 -- to 'StgLiftLams.Analysis.goodToLift' for its lifting decision.

65 -- It relies on "StgLiftLams.LiftM", which abstracts some subtle STG invariants

66 -- into a monadic substrate.

67 --

68 -- Suffice to say: We trade heap allocation for stack allocation.

69 -- The additional arguments have to passed on the stack (or in registers,

70 -- depending on architecture) every time we call the function to save a single

71 -- heap allocation when entering the let binding. Nofib suggests a mean

72 -- improvement of about 1% for this pass, so it seems like a worthwhile thing to

73 -- do. Compile-times went up by 0.6%, so all in all a very modest change.

74 --

75 -- For a concrete example, look at @spectral/atom@. There's a call to 'zipWith'

76 -- that is ultimately compiled to something like this

77 -- (module desugaring/lowering to actual STG):

78 --

79 -- @

80 -- propagate dt = ...;

81 -- runExperiment ... =

82 -- let xs = ... in

83 -- let ys = ... in

84 -- let go = {dt go} \xs ys -> case (xs, ys) of

85 -- ([], []) -> []

86 -- (x:xs', y:ys') -> propagate dt x y : go xs' ys'

87 -- in go xs ys

88 -- @

89 --

90 -- This will lambda lift @go@ to top-level, speeding up the resulting program

91 -- by roughly one percent:

92 --

93 -- @

94 -- propagate dt = ...;

95 -- go dt xs ys = case (xs, ys) of

96 -- ([], []) -> []

97 -- (x:xs', y:ys') -> propagate dt x y : go dt xs' ys'

98 -- runExperiment ... =

99 -- let xs = ... in

100 -- let ys = ... in

101 -- in go dt xs ys

102 -- @