Test Trac #9036
[ghc.git] / docs / comm / the-beast / stg.html
2 <html>
3 <head>
4 <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
5 <title>The GHC Commentary - You Got Control: The STG-language</title>
6 </head>
9 <h1>The GHC Commentary - You Got Control: The STG-language</h1>
10 <p>
11 GHC contains two completely independent backends: the byte code
12 generator and the machine code generator. The decision over which of
13 the two is invoked is made in <a
14 href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/main/HscMain.lhs"><code>HscMain</code></a><code>.hscCodeGen</code>.
15 The machine code generator proceeds itself in a number of phases: First,
16 the <a href="desugar.html">Core</a> intermediate language is translated
17 into <em>STG-language</em>; second, STG-language is transformed into a
18 GHC-internal variant of <a href="http://www.cminusminus.org/">C--</a>;
19 and thirdly, this is either emitted as concrete C--, converted to GNU C,
20 or translated to native code (by the <a href="ncg.html">native code
21 generator</a> which targets IA32, Sparc, and PowerPC [as of March '5]).
22 </p>
23 <p>
24 In the following, we will have a look at the first step of machine code
25 generation, namely the translation steps involving the STG-language.
26 Details about the underlying abstract machine, the <em>Spineless Tagless
27 G-machine</em>, are in <a
28 href="http://research.microsoft.com/copyright/accept.asp?path=/users/simonpj/papers/spineless-tagless-gmachine.ps.gz&pub=34">Implementing
29 lazy functional languages on stock hardware: the Spineless Tagless
30 G-machine</a>, SL Peyton Jones, Journal of Functional Programming 2(2),
31 Apr 1992, pp127-202. (Some details have changed since the publication of
32 this article, but it still gives a good introduction to the main
33 concepts.)
34 </p>
36 <h2>The STG Language</h2>
37 <p>
38 The AST of the STG-language and the generation of STG code from Core is
39 both located in the <a
40 href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/"><code>stgSyn/</code></a>
41 directory; in the modules <a
42 href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/StgSyn.lhs"><code>StgSyn</code></a>
43 and <a
44 href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/CoreToStg.lhs"><code>CoreToStg</code></a>,
45 respectively.
46 </p>
47 <p>
48 Conceptually, the STG-language is a lambda calculus (including data
49 constructors and case expressions) whose syntax is restricted to make
50 all control flow explicit. As such, it can be regarded as a variant of
51 <em>administrative normal form (ANF).</em> (C.f., <a
52 href="http://doi.acm.org/10.1145/173262.155113">The essence of compiling
53 with continuations.</a> Cormac Flanagan, Amr Sabry, Bruce F. Duba, and
54 Matthias Felleisen. <em>ACM SIGPLAN Conference on Programming Language
55 Design and Implementation,</em> ACM Press, 1993.) Each syntactic from
56 has a precise operational interpretation, in addition to the
57 denotational interpretation inherited from the lambda calculus. The
58 concrete representation of the STG language inside GHC also includes
59 auxiliary attributes, such as <em>static reference tables (SRTs),</em>
60 which determine the top-level bindings referenced by each let binding
61 and case expression.
62 </p>
63 <p>
64 As usual in ANF, arguments to functions etc. are restricted to atoms
65 (i.e., constants or variables), which implies that all sub-expressions
66 are explicitly named and evaluation order is explicit. Specific to the
67 STG language is that all let bindings correspond to closure allocation
68 (thunks, function closures, and data constructors) and that case
69 expressions encode both computation and case selection. There are two
70 flavours of case expressions scrutinising boxed and unboxed values,
71 respectively. The former perform function calls including demanding the
72 evaluation of thunks, whereas the latter execute primitive operations
73 (such as arithmetic on fixed size integers and floating-point numbers).
74 </p>
75 <p>
76 The representation of STG language defined in <a
77 href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/StgSyn.lhs"><code>StgSyn</code></a>
78 abstracts over both binders and occurrences of variables. The type names
79 involved in this generic definition all carry the prefix
80 <code>Gen</code> (such as in <code>GenStgBinding</code>). Instances of
81 these generic definitions, where both binders and occurrences are of type
82 <a
83 href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/basicTypes/Id.lhs"><code>Id</code></a><code>.Id</code>
84 are defined as type synonyms and use type names that drop the
85 <code>Gen</code> prefix (i.e., becoming plain <code>StgBinding</code>).
86 Complete programs in STG form are represented by values of type
87 <code>[StgBinding]</code>.
88 </p>
90 <h2>From Core to STG</h2>
91 <p>
92 Although, the actual translation from Core AST into STG AST is performed
93 by the function <a
94 href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/CoreToStg.lhs"><code>CoreToStg</code></a><code>.coreToStg</code>
95 (or <a
96 href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/CoreToStg.lhs"><code>CoreToStg</code></a><code>.coreExprToStg</code>
97 for individual expressions), the translation crucial depends on <a
98 href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/coreSyn/CorePrep.lhs"><code>CorePrep</code></a><code>.corePrepPgm</code>
99 (resp. <a
100 href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/coreSyn/CorePrep.lhs"><code>CorePrep</code></a><code>.corePrepExpr</code>),
101 which prepares Core code for code generation (for both byte code and
102 machine code generation). <code>CorePrep</code> saturates primitive and
103 constructor applications, turns the code into A-normal form, renames all
104 identifiers into globally unique names, generates bindings for
105 constructor workers, constructor wrappers, and record selectors plus
106 some further cleanup.
107 </p>
108 <p>
109 In other words, after Core code is prepared for code generation it is
110 structurally already in the form required by the STG language. The main
111 work performed by the actual transformation from Core to STG, as
112 performed by <a
113 href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/CoreToStg.lhs"><code>CoreToStg</code></a><code>.coreToStg</code>,
114 is to compute the live and free variables as well as live CAFs (constant
115 applicative forms) at each let binding and case alternative. In
116 subsequent phases, the live CAF information is used to compute SRTs.
117 The live variable information is used to determine which stack slots
118 need to be zapped (to avoid space leaks) and the free variable
119 information is need to construct closures. Moreover, hints for
120 optimised code generation are computed, such as whether a closure needs
121 to be updated after is has been evaluated.
122 </p>
124 <h2>STG Passes</h2>
125 <p>
126 These days little actual work is performed on programs in STG form; in
127 particular, the code is not further optimised. All serious optimisation
128 (except low-level optimisations which are performed during native code
129 generation) has already been done on Core. The main task of <a
130 href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/CoreToStg.lhs"><code>CoreToStg</code></a><code>.stg2stg</code>
131 is to compute SRTs from the live CAF information determined during STG
132 generation. Other than that, <a
133 href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/profiling/SCCfinal.lhs"><code>SCCfinal</code></a><code>.stgMassageForProfiling</code>
134 is executed when compiling for profiling and information may be dumped
135 for debugging purposes.
136 </p>
138 <h2>Towards C--</h2>
139 <p>
140 GHC's internal form of C-- is defined in the module <a
141 href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/cmm/Cmm.hs"><code>Cmm</code></a>.
142 The definition is generic in that it abstracts over the type of static
143 data and of the contents of basic blocks (i.e., over the concrete
144 representation of constant data and instructions). These generic
145 definitions have names carrying the prefix <code>Gen</code> (such as
146 <code>GenCmm</code>). The same module also instantiates the generic
147 form to a concrete form where data is represented by
148 <code>CmmStatic</code> and instructions are represented by
149 <code>CmmStmt</code> (giving us, e.g., <code>Cmm</code> from
150 <code>GenCmm</code>). The concrete form more or less follows the
151 external <a href="http://www.cminusminus.org/">C--</a> language.
152 </p>
153 <p>
154 Programs in STG form are translated to <code>Cmm</code> by <a
155 href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/codeGen/CodeGen.lhs"><code>CodeGen</code></a><code>.codeGen</code>
156 </p>
158 <p><hr><small>
159 <!-- hhmts start -->
160 Last modified: Sat Mar 5 22:55:25 EST 2005
161 <!-- hhmts end -->
162 </small>
163 </body>
164 </html>