1 -- | Implements a selective lambda lifter, running late in the optimisation
2 -- pipeline.
3 --
4 -- The transformation itself is implemented in "StgLiftLams.Transformation".
5 -- If you are interested in the cost model that is employed to decide whether
6 -- to lift a binding or not, look at "StgLiftLams.Analysis".
7 -- "StgLiftLams.LiftM" contains the transformation monad that hides away some
8 -- plumbing of the transformation.
9 module StgLiftLams (
10 -- * Late lambda lifting in STG
11 -- \$note
12 Transformation.stgLiftLams
13 ) where
15 import qualified StgLiftLams.Transformation as Transformation
17 -- Note [Late lambda lifting in STG]
18 -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
19 -- \$note
21 -- and Trac #9476.
22 --
23 -- The basic idea behind lambda lifting is to turn locally defined functions
24 -- into top-level functions. Free variables are then passed as additional
25 -- arguments at *call sites* instead of having a closure allocated for them at
26 -- *definition site*. Example:
27 --
28 -- @
29 -- let x = ...; y = ... in
30 -- let f = {x y} \a -> a + x + y in
31 -- let g = {f x} \b -> f b + x in
32 -- g 5
33 -- @
34 --
35 -- Lambda lifting @f@ would
36 --
37 -- 1. Turn @f@'s free variables into formal parameters
38 -- 2. Update @f@'s call site within @g@ to @f x y b@
39 -- 3. Update @g@'s closure: Add @y@ as an additional free variable, while
40 -- removing @f@, because @f@ no longer allocates and can be floated to
41 -- top-level.
42 -- 4. Actually float the binding of @f@ to top-level, eliminating the @let@
43 -- in the process.
44 --
45 -- This results in the following program (with free var annotations):
46 --
47 -- @
48 -- f x y a = a + x + y;
49 -- let x = ...; y = ... in
50 -- let g = {x y} \b -> f x y b + x in
51 -- g 5
52 -- @
53 --
54 -- This optimisation is all about lifting only when it is beneficial to do so.
55 -- The above seems like a worthwhile lift, judging from heap allocation:
56 -- We eliminate @f@'s closure, saving to allocate a closure with 2 words, while
57 -- not changing the size of @g@'s closure.
58 --
59 -- You can probably sense that there's some kind of cost model at play here.
60 -- And you are right! But we also employ a couple of other heuristics for the
61 -- lifting decision which are outlined in "StgLiftLams.Analysis#when".
62 --
63 -- The transformation is done in "StgLiftLams.Transformation", which calls out
64 -- to 'StgLiftLams.Analysis.goodToLift' for its lifting decision.
65 -- It relies on "StgLiftLams.LiftM", which abstracts some subtle STG invariants
66 -- into a monadic substrate.
67 --
68 -- Suffice to say: We trade heap allocation for stack allocation.
69 -- The additional arguments have to passed on the stack (or in registers,
70 -- depending on architecture) every time we call the function to save a single
71 -- heap allocation when entering the let binding. Nofib suggests a mean
72 -- improvement of about 1% for this pass, so it seems like a worthwhile thing to
73 -- do. Compile-times went up by 0.6%, so all in all a very modest change.
74 --
75 -- For a concrete example, look at @spectral/atom@. There's a call to 'zipWith'
76 -- that is ultimately compiled to something like this
77 -- (module desugaring/lowering to actual STG):
78 --
79 -- @
80 -- propagate dt = ...;
81 -- runExperiment ... =
82 -- let xs = ... in
83 -- let ys = ... in
84 -- let go = {dt go} \xs ys -> case (xs, ys) of
85 -- ([], []) -> []
86 -- (x:xs', y:ys') -> propagate dt x y : go xs' ys'
87 -- in go xs ys
88 -- @
89 --
90 -- This will lambda lift @go@ to top-level, speeding up the resulting program
91 -- by roughly one percent:
92 --
93 -- @
94 -- propagate dt = ...;
95 -- go dt xs ys = case (xs, ys) of
96 -- ([], []) -> []
97 -- (x:xs', y:ys') -> propagate dt x y : go dt xs' ys'
98 -- runExperiment ... =
99 -- let xs = ... in
100 -- let ys = ... in
101 -- in go dt xs ys
102 -- @