Add docbook-ised external-core doc; from #5844, by James H. Fisher
authorIan Lynagh <igloo@earth.li>
Tue, 14 Feb 2012 14:43:08 +0000 (14:43 +0000)
committerIan Lynagh <igloo@earth.li>
Tue, 14 Feb 2012 15:27:04 +0000 (15:27 +0000)
docs/users_guide/external_core.xml [new file with mode: 0644]
docs/users_guide/ug-book.xml.in
docs/users_guide/ug-ent.xml.in

diff --git a/docs/users_guide/external_core.xml b/docs/users_guide/external_core.xml
new file mode 100644 (file)
index 0000000..faf7148
--- /dev/null
@@ -0,0 +1,1807 @@
+<?xml version="1.0" encoding="utf-8"?>
+
+<!--
+This document is a semi-automatic conversion of docs/ext-core/core.tex to DocBook using
+
+1. `htlatex` to convert LaTeX to HTML
+2. `pandoc` to convert HTML to DocBook
+3. extensive manual work by James H. Fisher (jameshfisher@gmail.com)
+-->
+
+<!--
+TODO:
+
+* Replace "java" programlisting with "ghccore"
+("ghccore" is not recognized by highlighters,
+causing some generators to fail).
+
+* Complete bibliography entries with journal titles;
+I am unsure of the proper DocBook elements.
+
+* Integrate this file with the rest of the Users' Guide.
+-->
+
+
+<chapter id="an-external-representation-for-the-ghc-core-language-for-ghc-6.10">
+  <title>An External Representation for the GHC Core Language (For GHC 6.10)</title>
+
+  <para>Andrew Tolmach, Tim Chevalier ({apt,tjc}@cs.pdx.edu) and The GHC Team</para>
+
+  <abstract>
+    <para>This document provides a precise definition for the GHC Core
+    language, so that it can be used to communicate between GHC and new
+    stand-alone compilation tools such as back-ends or
+    optimizers.<footnote>
+    <para>This is a draft document, which attempts
+    to describe GHC’s current behavior as precisely as possible. Working
+    notes scattered throughout indicate areas where further work is
+    needed. Constructive comments are very welcome, both on the
+    presentation, and on ways in which GHC could be improved in order to
+    simplify the Core story.</para>
+
+    <para>Support for generating external Core (post-optimization) was
+    originally introduced in GHC 5.02. The definition of external Core in
+    this document reflects the version of external Core generated by the
+    HEAD (unstable) branch of GHC as of May 3, 2008 (version 6.9), using
+    the compiler flag <code>-fext-core</code>. We expect that GHC 6.10 will be
+    consistent with this definition.</para>
+  </footnote>
+  The definition includes a formal grammar and an informal semantics.
+  An executable typechecker and interpreter (in Haskell), which
+  formally embody the static and dynamic semantics, are available
+    separately.</para>
+
+  </abstract>
+
+  <section id="introduction">
+    <title>Introduction</title>
+
+    <para>The Glasgow Haskell Compiler (GHC) uses an intermediate language,
+    called <quote>Core,</quote> as its internal program representation within the
+    compiler’s simplification phase. Core resembles a subset of
+    Haskell, but with explicit type annotations in the style of the
+    polymorphic lambda calculus (F<subscript>ω</subscript>).</para>
+
+    <para>GHC’s front-end translates full Haskell 98 (plus some extensions)
+    into Core. The GHC optimizer then repeatedly transforms Core
+    programs while preserving their meaning. A <quote>Core Lint</quote> pass in GHC
+    typechecks Core in between transformation passes (at least when
+    the user enables linting by setting a compiler flag), verifying
+    that transformations preserve type-correctness. Finally, GHC’s
+    back-end translates Core into STG-machine code <citation>stg-machine</citation> and then into C
+    or native code.</para>
+
+    <para>Two existing papers discuss the original rationale for the design
+    and use of Core <citation>ghc-inliner,comp-by-trans-scp</citation>, although the (two different) idealized
+    versions of Core described therein differ in significant ways from
+    the actual Core language in current GHC. In particular, with the
+    advent of GHC support for generalized algebraic datatypes (GADTs)
+    <citation>gadts</citation> Core was extended beyond its previous
+    F<subscript>ω</subscript>-style incarnation to support type
+    equality constraints and safe coercions, and is now based on a
+    system known as F<subscript>C</subscript> <citation>system-fc</citation>.</para>
+
+    <para>Researchers interested in writing just <emphasis>part</emphasis> of a Haskell compiler,
+    such as a new back-end or a new optimizer pass, might like to use
+    GHC to provide the other parts of the compiler. For example, they
+    might like to use GHC’s front-end to parse, desugar, and
+    type-check source Haskell, then feeding the resulting code to
+    their own back-end tool. As another example, they might like to
+    use Core as the target language for a front-end compiler of their
+    own design, feeding externally synthesized Core into GHC in order
+    to take advantage of GHC’s optimizer, code generator, and run-time
+    system. Without external Core, there are two ways for compiler
+    writers to do this: they can link their code into the GHC
+    executable, which is an arduous process, or they can use the GHC
+    API <citation>ghc-api</citation> to do the same task more cleanly. Both ways require new
+    code to be written in Haskell.</para>
+
+    <para>We present a precisely specified external format for Core files.
+    The external format is text-based and human-readable, to promote
+    interoperability and ease of use. We hope this format will make it
+    easier for external developers to use GHC in a modular way.</para>
+
+    <para>It has long been true that GHC prints an ad-hoc textual
+    representation of Core if you set certain compiler flags. But this
+    representation is intended to be read by people who are debugging
+    the compiler, not by other programs. Making Core into a
+    machine-readable, bi-directional communication format requires:
+
+    <orderedlist>
+      <listitem>
+        precisely specifying the external format of Core;
+      </listitem>
+      <listitem>
+        modifying GHC to generate external Core files
+        (post-simplification; as always, users can control the exact
+        transformations GHC does with command-line flags);
+      </listitem>
+      <listitem>
+        modifying GHC to accept external Core files in place of
+        Haskell source files (users will also be able to control what
+        GHC does to those files with command-line flags).
+      </listitem>
+    </orderedlist>
+    </para>
+
+    <para>The first two facilities will let developers couple GHC’s
+    front-end (parser, type-checker, desugarer), and optionally its
+    optimizer, with new back-end tools. The last facility will let
+    developers write new Core-to-Core transformations as an external
+    tool and integrate them into GHC. It will also allow new
+    front-ends to generate Core that can be fed into GHC’s optimizer
+    or back-end.</para>
+
+    <para>However, because there are many (undocumented) idiosyncracies in
+    the way GHC produces Core from source Haskell, it will be hard for
+    an external tool to produce Core that can be integrated with
+    GHC-produced Core (e.g., for the Prelude), and we don’t aim to
+    support this. Indeed, for the time being, we aim to support only
+    the first two facilities and not the third: we define and
+    implement Core as an external format that GHC can use to
+    communicate with external back-end tools, and defer the larger
+    task of extending GHC to support reading this external format back
+    in.</para>
+
+    <para>This document addresses the first requirement, a formal Core
+    definition, by proposing a formal grammar for an
+    <link linkend="external-grammar-of-core">external representation of Core</link>,
+    and an <link linkend="informal-semantics">informal semantics</link>.</para>
+
+    <para>GHC supports many type system extensions; the External Core
+    printer built into GHC only supports some of them. However,
+    External Core should be capable of representing any Haskell 98
+    program, and may be able to represent programs that require
+    certain type system extensions as well. If a program uses
+    unsupported features, GHC may fail to compile it to Core when the
+    -fext-core flag is set, or GHC may successfully compile it to
+    Core, but the external tools will not be able to typecheck or
+    interpret it.</para>
+
+    <para>Formal static and dynamic semantics in the form of an executable
+    typechecker and interpreter are available separately in the GHC
+    source tree
+    <footnote><ulink url="http://darcs.haskell.org/ghc">http://darcs.haskell.org/ghc</ulink></footnote>
+    under <code>utils/ext-core</code>.</para>
+
+  </section>
+  <section id="external-grammar-of-core">
+    <title>External Grammar of Core</title>
+
+    <para>In designing the external grammar, we have tried to strike a
+    balance among a number of competing goals, including easy
+    parseability by machines, easy readability by humans, and adequate
+    structural simplicity to allow straightforward presentations of
+    the semantics. Thus, we had to make some compromises.
+    Specifically:</para>
+
+    <itemizedlist>
+      <listitem>In order to avoid explosion of parentheses, we support
+      standard precedences and short-cuts for expressions, types,
+      and kinds. Thus we had to introduce multiple non-terminals for
+      each of these syntactic categories, and as a result, the
+      concrete grammar is longer and more complex than the
+      underlying abstract syntax.</listitem>
+
+      <listitem>On the other hand, we have kept the grammar simpler by
+      avoiding special syntax for tuple types and terms. Tuples
+      (both boxed and unboxed) are treated as ordinary constructors.</listitem>
+
+      <listitem>All type abstractions and applications are given in full, even
+      though some of them (e.g., for tuples) could be reconstructed;
+      this means a parser for Core does not have to reconstruct
+      types.<footnote>
+      These choices are certainly debatable. In
+      particular, keeping type applications on tuples and case arms
+      considerably increases the size of Core files and makes them less
+      human-readable, though it allows a Core parser to be simpler.
+      </footnote></listitem>
+
+      <listitem>The syntax of identifiers is heavily restricted (to just
+      alphanumerics and underscores); this again makes Core easier
+      to parse but harder to read.</listitem>
+    </itemizedlist>
+
+    <para>We use the following notational conventions for syntax:
+
+    <informaltable frame="none">
+      <tgroup cols='2' align='left' colsep="0" rowsep="0">
+        <tbody>
+          <row>
+            <entry>[ pat ]</entry>
+            <entry>optional</entry>
+          </row>
+
+          <row>
+            <entry>{ pat }</entry>
+            <entry>zero or more repetitions</entry>
+          </row>
+
+          <row>
+            <entry>
+              { pat }<superscript>+</superscript>
+            </entry>
+            <entry>one or more repetitions</entry>
+          </row>
+
+          <row>
+            <entry>
+              pat<subscript>1</subscript> ∣ pat<subscript>2</subscript>
+            </entry>
+            <entry>choice</entry>
+          </row>
+
+          <row>
+            <entry>
+              <code>fibonacci</code>
+            </entry>
+            <entry>terminal syntax in typewriter font</entry>
+          </row>
+        </tbody>
+      </tgroup>
+    </informaltable>
+    </para>
+
+    <informaltable frame="none" colsep="0" rowsep="0">
+      <tgroup cols='5'>
+        <colspec colname="cat"  align="left"   colwidth="3*"  />
+        <colspec colname="lhs"  align="right"  colwidth="2*"  />
+        <colspec                align="center" colwidth="*"   />
+        <colspec colname="rhs"  align="left"   colwidth="10*" />
+        <colspec colname="name" align="right"  colwidth="6*"  />
+        <tbody>
+          <row rowsep="1">
+            <entry>Module</entry>
+            <entry>module</entry>
+            <entry>→</entry>
+            <entry>
+              <code>%module</code> mident { tdef ; }{ vdefg ; }
+            </entry>
+            <entry></entry>
+          </row>
+
+          <row>
+            <entry morerows="1" valign="top">Type defn.</entry>
+            <entry morerows="1" valign="top">tdef</entry>
+            <entry>→</entry>
+            <entry>
+              <code>%data</code> qtycon { tbind } <code>= {</code>  [ cdef {<code>;</code> cdef } ] <code>}</code>
+            </entry>
+            <entry>algebraic type</entry>
+          </row>
+          <row rowsep="1">
+            <entry>∣</entry>
+            <entry>
+              <code>%newtype</code> qtycon qtycon { tbind } <code>=</code> ty
+            </entry>
+            <entry>newtype</entry>
+          </row>
+
+          <row rowsep="1">
+            <entry>Constr. defn.</entry>
+            <entry>cdef</entry>
+            <entry>→</entry>
+            <entry>
+              qdcon { <code>@</code> tbind }{ aty }<superscript>+</superscript>
+            </entry>
+          </row>
+
+          <row>
+            <entry morerows="2" valign="top">Value defn.</entry>
+            <entry morerows="1" valign="top">vdefg</entry>
+            <entry>→</entry>
+            <entry><code>%rec {</code> vdef { <code>;</code> vdef } <code>}</code></entry>
+            <entry>recursive</entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry>vdef</entry>
+            <entry>non-recursive</entry>
+          </row>
+
+          <row rowsep="1">
+            <entry>vdef</entry>
+            <entry>→</entry>
+            <entry>qvar <code>::</code> ty <code>=</code> exp</entry>
+            <entry></entry>
+          </row>
+
+          <row>
+            <entry morerows="3" valign="top">Atomic expr.</entry>
+            <entry morerows="3" valign="top">aexp</entry>
+            <entry>→</entry>
+            <entry>qvar</entry>
+            <entry>variable</entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry>qdcon</entry>
+            <entry>data constructor</entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry>lit</entry>
+            <entry>literal</entry>
+          </row>
+
+          <row rowsep="1">
+            <entry>∣</entry>
+            <entry><code>(</code> exp <code>)</code></entry>
+            <entry>nested expr.</entry>
+          </row>
+
+          <row>
+            <entry morerows="9" valign="top">Expression</entry>
+            <entry morerows="9" valign="top">exp</entry>
+            <entry>→</entry>
+            <entry>aexp</entry>
+            <entry>atomic expresion</entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry>aexp { arg }<superscript>+</superscript></entry>
+            <entry>application</entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry><code>\</code> { binder }<superscript>+</superscript> &arw; exp</entry>
+            <entry>abstraction</entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry><code>%let</code> vdefg <code>%in</code> exp</entry>
+            <entry>local definition</entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry><code>%case (</code> aty <code>)</code> exp <code>%of</code> vbind <code>{</code> alt { <code>;</code> alt } <code>}</code></entry>
+            <entry>case expression</entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry><code>%cast</code> exp aty</entry>
+            <entry>type coercion</entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry><code>%note</code> &quot; { char } &quot; exp</entry>
+            <entry>expression note</entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry><code>%external ccall &quot;</code> { char } <code>&quot;</code> aty</entry>
+            <entry>external reference</entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry><code>%dynexternal ccall</code> aty</entry>
+            <entry>external reference (dynamic)</entry>
+          </row>
+
+          <row rowsep="1">
+            <entry>∣</entry>
+            <entry><code>%label &quot;</code> { char } <code>&quot;</code></entry>
+            <entry>external label</entry>
+          </row>
+
+          <row>
+            <entry morerows="1" valign="top">Argument</entry>
+            <entry morerows="1" valign="top">arg</entry>
+            <entry>→</entry>
+            <entry><code>@</code> aty</entry>
+            <entry>type argument</entry>
+          </row>
+
+          <row rowsep="1">
+            <entry>∣</entry>
+            <entry>aexp</entry>
+            <entry>value argument</entry>
+          </row>
+
+          <row>
+            <entry morerows="2" valign="top">Case alt.</entry>
+            <entry morerows="2" valign="top">alt</entry>
+            <entry>→</entry>
+            <entry>qdcon { <code>@</code> tbind }{ vbind } <code>&arw;</code> exp</entry>
+            <entry>constructor alternative</entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry>lit <code>&arw;</code> exp</entry>
+            <entry>literal alternative</entry>
+          </row>
+
+          <row rowsep="1">
+            <entry>∣</entry>
+            <entry><code>%_ &arw;</code> exp</entry>
+            <entry>default alternative</entry>
+          </row>
+
+          <row>
+            <entry morerows="1" valign="top">Binder</entry>
+            <entry morerows="1" valign="top">binder</entry>
+            <entry>→</entry>
+            <entry><code>@</code> tbind</entry>
+            <entry>type binder</entry>
+          </row>
+
+          <row rowsep="1">
+            <entry>∣</entry>
+            <entry>vbind</entry>
+            <entry>value binder</entry>
+          </row>
+
+          <row>
+            <entry morerows="1" valign="top">Type binder</entry>
+            <entry morerows="1" valign="top">tbind</entry>
+            <entry>→</entry>
+            <entry>tyvar</entry>
+            <entry>implicitly of kind *</entry>
+          </row>
+
+          <row rowsep="1">
+            <entry>∣</entry>
+            <entry><code>(</code> tyvar <code>::</code> kind <code>)</code></entry>
+            <entry>explicitly kinded</entry>
+          </row>
+
+          <row rowsep="1">
+            <entry>Value binder</entry>
+            <entry>vbind</entry>
+            <entry>→</entry>
+            <entry><code>(</code> var <code>::</code> ty <code>)</code></entry>
+            <entry></entry>
+          </row>
+
+          <row>
+            <entry morerows="3" valign="top">Literal</entry>
+            <entry morerows="3" valign="top">lit</entry>
+            <entry>→</entry>
+            <entry><code>(</code> [<code>-</code>] { digit }<superscript>+</superscript> <code>::</code> ty <code>)</code></entry>
+            <entry>integer</entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry><code>(</code> [<code>-</code>] { digit }<superscript>+</superscript> <code>%</code> { digit }<superscript>+</superscript> <code>::</code> ty <code>)</code></entry>
+            <entry>rational</entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry><code>( '</code> char <code>' ::</code> ty <code>)</code></entry>
+            <entry>character</entry>
+          </row>
+
+          <row rowsep="1">
+            <entry>∣</entry>
+            <entry><code>( &quot;</code> { char } <code>&quot; ::</code> ty <code>)</code></entry>
+            <entry>string</entry>
+          </row>
+
+          <row>
+            <entry morerows="2" valign="top">Character</entry>
+            <entry morerows="1" valign="top">char</entry>
+            <entry>→</entry>
+            <entry namest="rhs" nameend="name"><emphasis>any ASCII character in range 0x20-0x7E except 0x22,0x27,0x5c</emphasis></entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry><code>\x</code> hex hex</entry>
+            <entry>ASCII code escape sequence</entry>
+          </row>
+
+          <row rowsep="1">
+            <entry>hex</entry>
+            <entry>→</entry>
+            <entry>0∣…∣9 ∣a ∣…∣f</entry>
+            <entry></entry>
+          </row>
+
+          <row>
+            <entry morerows="2" valign="top">Atomic type</entry>
+            <entry morerows="2" valign="top">aty</entry>
+            <entry>→</entry>
+            <entry>tyvar</entry>
+            <entry>type variable</entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry>qtycon</entry>
+            <entry>type constructor</entry>
+          </row>
+
+          <row rowsep="1">
+            <entry>∣</entry>
+            <entry><code>(</code> ty <code>)</code></entry>
+            <entry>nested type</entry>
+          </row>
+
+          <row>
+            <entry morerows="7" valign="top">Basic type</entry>
+            <entry morerows="7" valign="top">bty</entry>
+            <entry>→</entry>
+            <entry>aty</entry>
+            <entry>atomic type</entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry>bty aty</entry>
+            <entry>type application</entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry><code>%trans</code> aty aty</entry>
+            <entry>transitive coercion</entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry><code>%sym</code> aty</entry>
+            <entry>symmetric coercion</entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry><code>%unsafe</code> aty aty</entry>
+            <entry>unsafe coercion</entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry><code>%left</code> aty</entry>
+            <entry>left coercion</entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry><code>%right</code> aty</entry>
+            <entry>right coercion</entry>
+          </row>
+
+          <row rowsep="1">
+            <entry>∣</entry>
+            <entry><code>%inst</code> aty aty</entry>
+            <entry>instantiation coercion</entry>
+          </row>
+
+          <row>
+            <entry morerows="2" valign="top">Type</entry>
+            <entry morerows="2" valign="top">ty</entry>
+            <entry>→</entry>
+            <entry>bty</entry>
+            <entry>basic type</entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry><code>%forall</code> { tbind }<superscript>+</superscript> <code>.</code> ty</entry>
+            <entry>type abstraction</entry>
+          </row>
+
+          <row rowsep="1">
+            <entry>∣</entry>
+            <entry>bty <code>&arw;</code> ty</entry>
+            <entry>arrow type construction</entry>
+          </row>
+
+          <row>
+            <entry morerows="4" valign="top">Atomic kind</entry>
+            <entry morerows="4" valign="top">akind</entry>
+            <entry>→</entry>
+            <entry><code>*</code></entry>
+            <entry>lifted kind</entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry><code>#</code></entry>
+            <entry>unlifted kind</entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry><code>?</code></entry>
+            <entry>open kind</entry>
+          </row>
+
+          <row>
+            <entry>∣</entry>
+            <entry>bty <code>:=:</code> bty</entry>
+            <entry>equality kind</entry>
+          </row>
+
+          <row rowsep="1">
+            <entry>∣</entry>
+            <entry><code>(</code> kind <code>)</code></entry>
+            <entry>nested kind</entry>
+          </row>
+
+          <row>
+            <entry morerows="1" valign="top">Kind</entry>
+            <entry morerows="1" valign="top">kind</entry>
+            <entry>→</entry>
+            <entry>akind</entry>
+            <entry>atomic kind</entry>
+          </row>
+
+          <row rowsep="1">
+            <entry>∣</entry>
+            <entry>akind <code>&arw;</code> kind</entry>
+            <entry>arrow kind</entry>
+          </row>
+
+          <row>
+            <entry morerows="7" valign="top">Identifier</entry>
+            <entry>mident</entry>
+            <entry>→</entry>
+            <entry>pname <code>:</code> uname</entry>
+            <entry>module</entry>
+          </row>
+
+          <row>
+            <entry>tycon</entry>
+            <entry>→</entry>
+            <entry>uname</entry>
+            <entry>type constr.</entry>
+          </row>
+
+          <row>
+            <entry>qtycon</entry>
+            <entry>→</entry>
+            <entry>mident <code>.</code> tycon</entry>
+            <entry>qualified type constr.</entry>
+          </row>
+
+          <row>   
+            <entry>tyvar</entry>
+            <entry>→</entry>
+            <entry>lname</entry>
+            <entry>type variable</entry>
+          </row>
+
+          <row>
+            <entry>dcon</entry>
+            <entry>→</entry>
+            <entry>uname</entry>
+            <entry>data constr.</entry>
+          </row>
+
+          <row>
+            <entry>qdcon</entry>
+            <entry>→</entry>
+            <entry>mident <code>.</code> dcon</entry>
+            <entry>qualified data constr.</entry>
+          </row>
+
+          <row>
+            <entry>var</entry>
+            <entry>→</entry>
+            <entry>lname</entry>
+            <entry>variable</entry>
+          </row>
+
+          <row rowsep="1">
+            <entry>qvar</entry>
+            <entry>→</entry>
+            <entry>[ mident <code>.</code> ] var</entry>
+            <entry>optionally qualified variable</entry>
+          </row>
+
+          <row>
+            <entry morerows="6" valign="top">Name</entry>
+            <entry>lname</entry>
+            <entry>→</entry>
+            <entry>lower { namechar }</entry>
+            <entry></entry>
+          </row>
+
+          <row>
+            <entry>uname</entry>
+            <entry>→</entry>
+            <entry>upper { namechar }</entry>
+            <entry></entry>
+          </row>
+
+          <row>
+            <entry>pname</entry>
+            <entry>→</entry>
+            <entry>{ namechar }<superscript>+</superscript></entry>
+            <entry></entry>
+          </row>
+
+          <row>
+            <entry>namechar</entry>
+            <entry>→</entry>
+            <entry>lower ∣ upper ∣ digit</entry>
+            <entry></entry>
+          </row>
+
+          <row>
+            <entry>lower</entry>
+            <entry>→</entry>
+            <entry><code>a</code> ∣ <code>b</code> ∣ … ∣ <code>z</code> ∣ <code>_</code></entry>
+            <entry></entry>
+          </row>
+
+          <row>
+            <entry>upper</entry>
+            <entry>→</entry>
+            <entry><code>A</code> ∣ <code>B</code> ∣ … ∣ <code>Z</code></entry>
+            <entry></entry>
+          </row>
+
+          <row>
+            <entry>digit</entry>
+            <entry>→</entry>
+            <entry><code>0</code> ∣ <code>1</code> ∣ … ∣ <code>9</code></entry>
+            <entry></entry>
+          </row>
+        </tbody>
+      </tgroup>
+    </informaltable>
+  </section>
+
+  <section id="informal-semantics">
+    <title>Informal Semantics</title>
+
+    <para>At the term level, Core resembles a explicitly-typed polymorphic
+    lambda calculus (F<subscript>ω</subscript>), with the addition of
+    local <code>let</code> bindings, algebraic type definitions, constructors, and
+    <code>case</code> expressions, and primitive types, literals and operators. Its
+    type system is richer than that of System F, supporting explicit
+    type equality coercions and type functions.<citation>system-fc</citation></para>
+
+    <para>In this section we concentrate on the less obvious points about
+    Core.</para>
+
+    <section id="program-organization-and-modules">
+      <title>Program Organization and Modules</title>
+
+      <para>Core programs are organized into <emphasis>modules</emphasis>, corresponding directly
+      to source-level Haskell modules. Each module has a identifying
+      name <emphasis>mident</emphasis>. A module identifier consists of a <emphasis>package name</emphasis>
+      followed by a module name, which may be hierarchical: for
+      example, <code>base:GHC.Base</code> is the module identifier for GHC’s Base
+      module. Its name is <code>Base</code>, and it lives in the GHC hierarchy
+      within the <code>base</code> package. Section 5.8 of the GHC users’ guide
+      explains package names <citation>ghc-user-guide</citation>. In particular, note that a Core
+      program may contain multiple modules with the same (possibly
+      hierarchical) module name that differ in their package names. In
+      some of the code examples that follow, we will omit package
+      names and possibly full hierarchical module names from
+      identifiers for brevity, but be aware that they are always
+      required.<footnote>
+      A possible improvement to the Core syntax
+      would be to add explicit import lists to Core modules, which could be
+      used to specify abbrevations for long qualified names. This would make
+      the code more human-readable.
+      </footnote></para>
+
+      <para>Each module may contain the following kinds of top-level
+      declarations:
+
+      <itemizedlist>
+        <listitem>
+          Algebraic data type declarations, each defining a type
+          constructor and one or more data constructors;
+        </listitem>
+        <listitem>
+          Newtype declarations, corresponding to Haskell <code>newtype</code>
+          declarations, each defining a type constructor and a
+          coercion name; and
+        </listitem>
+        <listitem>
+          Value declarations, defining the types and values of
+          top-level variables.
+        </listitem>
+      </itemizedlist>
+      </para>
+
+      <para>No type constructor, data constructor, or top-level value may be
+      declared more than once within a given module. All the type
+      declarations are (potentially) mutually recursive. Value
+      declarations must be in dependency order, with explicit grouping
+      of potentially mutually recursive declarations.</para>
+
+      <para>Identifiers defined in top-level declarations may be <emphasis>external</emphasis> or
+      <emphasis>internal</emphasis>. External identifiers can be referenced from any other
+      module in the program, using conventional dot notation (e.g.,
+      <code>base:GHC.Base.Bool</code>, <code>base:GHC.Base.True</code>). Internal identifiers
+      are visible only within the defining module. All type and data
+      constructors are external, and are always defined and referenced
+      using fully qualified names (with dots).</para>
+
+      <para>A top-level value is external if it is defined and referenced
+      using a fully qualified name with a dot (e.g., <code>main:MyModule.foo = ...</code>);
+      otherwise, it is internal (e.g., <code>bar = ...</code>). Note that
+      Core’s notion of an external identifier does not necessarily
+      coincide with that of <quote>exported</quote> identifier in a Haskell source
+      module. An identifier can be an external identifier in Core, but
+      not be exported by the original Haskell source
+      module.<footnote>
+      Two examples of such identifiers are: data
+      constructors, and values that potentially appear in an unfolding. For an
+      example of the latter, consider <code>Main.foo = ... Main.bar ...</code>, where
+      <code>Main.foo</code> is inlineable. Since <code>bar</code> appears in <code>foo</code>’s unfolding, it is
+      defined and referenced with an external name, even if <code>bar</code> was not
+      exported by the original source module.
+    </footnote>
+    However, if an identifier was exported by the Haskell source
+      module, it will appear as an external name in Core.</para>
+
+      <para>Core modules have no explicit import or export lists. Modules
+      may be mutually recursive. Note that because of the latter fact,
+      GHC currently prints out the top-level bindings for every module
+      as a single recursive group, in order to avoid keeping track of
+      dependencies between top-level values within a module. An
+      external Core tool could reconstruct dependencies later, of
+      course.</para>
+
+      <para>There is also an implicitly-defined module <code>ghc-prim:GHC.Prim</code>,
+      which exports the <quote>built-in</quote> types and values that must be
+      provided by any implementation of Core (including GHC). Details
+      of this module are in the <link linkend="primitive-module">Primitive Module section</link>.</para>
+
+      <para>A Core <emphasis>program</emphasis> is a collection of distinctly-named modules that
+      includes a module called main:Main having an exported value
+      called <code>main:ZCMain.main</code> of type <code>base:GHC.IOBase.IO a</code> (for some
+      type <code>a</code>). (Note that the strangely named wrapper for <code>main</code> is the
+      one exception to the rule that qualified names defined within a
+      module <code>m</code> must have module name <code>m</code>.)</para>
+
+      <para>Many Core programs will contain library modules, such as
+      <code>base:GHC.Base</code>, which implement parts of the Haskell standard
+      library. In principle, these modules are ordinary Haskell
+      modules, with no special status. In practice, the requirement on
+      the type of <code>main:Main.main</code> implies that every program will
+      contain a large subset of the standard library modules.</para>
+
+    </section>
+    <section id="namespaces">
+      <title>Namespaces</title>
+
+      <para>There are five distinct namespaces:
+      <orderedlist>
+        <listitem>module identifiers (<code>mident</code>),</listitem>
+        <listitem>type constructors  (<code>tycon</code>),</listitem>
+        <listitem>type variables     (<code>tyvar</code>),</listitem>
+        <listitem>data constructors  (<code>dcon</code>),</listitem>
+        <listitem>term variables     (<code>var</code>).</listitem>
+      </orderedlist>
+      </para>
+
+      <para>Spaces (1), (2+3), and (4+5) can be distinguished from each
+      other by context. To distinguish (2) from (3) and (4) from (5),
+      we require that data and type constructors begin with an
+      upper-case character, and that term and type variables begin
+      with a lower-case character.</para>
+
+      <para>Primitive types and operators are not syntactically
+      distinguished.</para>
+
+      <para>Primitive <emphasis>coercion</emphasis> operators, of which there are six, <emphasis>are</emphasis>
+      syntactically distinguished in the grammar. This is because
+      these coercions must be fully applied, and because
+      distinguishing their applications in the syntax makes
+      typechecking easier.</para>
+
+      <para>A given variable (type or term) may have multiple definitions
+      within a module. However, definitions of term variables never
+      <quote>shadow</quote> one another: the scope of the definition of a given
+      variable never contains a redefinition of the same variable.
+      Type variables may be shadowed. Thus, if a term variable has
+      multiple definitions within a module, all those definitions must
+      be local (let-bound). The only exception to this rule is that
+      (necessarily closed) types labelling <code>%external</code> expressions may
+      contain <code>tyvar</code> bindings that shadow outer bindings.</para>
+
+      <para>Core generated by GHC makes heavy use of encoded names, in which
+      the characters <code>Z</code> and <code>z</code> are used to introduce escape sequences
+      for non-alphabetic characters such as dollar sign <code>$</code> (<code>zd</code>), hash <code>#</code>
+      (<code>zh</code>), plus <code>+</code> (<code>zp</code>), etc. This is the same encoding used in <code>.hi</code>
+      files and in the back-end of GHC itself, except that we
+      sometimes change an initial <code>z</code> to <code>Z</code>, or vice-versa, in order to
+      maintain case distinctions.</para>
+
+      <para>Finally, note that hierarchical module names are z-encoded in
+      Core: for example, <code>base:GHC.Base.foo</code> is rendered as
+      <code>base:GHCziBase.foo</code>. A parser may reconstruct the module
+      hierarchy, or regard <code>GHCziBase</code> as a flat name.</para>
+
+    </section>
+    <section id="types-and-kinds">
+      <title>Types and Kinds</title>
+
+      <para>In Core, all type abstractions and applications are explicit.
+      This make it easy to typecheck any (closed) fragment of Core
+      code. An full executable typechecker is available separately.</para>
+
+      <section id="types">
+        <title>Types</title>
+
+        <para>Types are described by type expressions, which are built from
+        named type constructors and type variables using type
+        application and universal quantification. Each type
+        constructor has a fixed arity ≥ 0. Because it is so widely
+        used, there is special infix syntax for the fully-applied
+        function type constructor (<code>&arw;</code>). (The prefix identifier for
+        this constructor is <code>ghc-prim:GHC.Prim.ZLzmzgZR</code>; this should
+        only appear in unapplied or partially applied form.)</para>
+
+        <para>There are also a number of other primitive type constructors
+        (e.g., <code>Intzh</code>) that are predefined in the <code>GHC.Prim</code> module, but
+        have no special syntax. <code>%data</code> and <code>%newtype</code> declarations
+        introduce additional type constructors, as described below.
+        Type constructors are distinguished solely by name.</para>
+
+      </section>
+      <section id="coercions">
+        <title>Coercions</title>
+
+        <para>A type may also be built using one of the primitive coercion
+        operators, as described in <link linkend="namespaces">the Namespaces section</link>. For details on the
+        meanings of these operators, see the System FC paper <citation>system-fc</citation>. Also
+        see <link linkend="newtypes">the Newtypes section</link> for
+        examples of how GHC uses coercions in Core code.</para>
+
+      </section>
+      <section id="kinds">
+        <title>Kinds</title>
+        <para>As described in the Haskell definition, it is necessary to
+        distinguish well-formed type-expressions by classifying them
+        into different <emphasis>kinds</emphasis> <citation>haskell98, p. 41</citation><!-- TODO -->. In particular, Core
+        explicitly records the kind of every bound type variable.</para>
+
+        <para>In addition, Core’s kind system includes equality kinds, as in
+        System FC <citation>system-fc</citation>. An application of a built-in coercion, or of a
+        user-defined coercion as introduced by a <code>newtype</code> declaration,
+        has an equality kind.</para>
+
+      </section>
+      <section id="lifted-and-unlifted-types">
+        <title>Lifted and Unlifted Types</title>
+
+        <para>Semantically, a type is <emphasis>lifted</emphasis> if and only if it has bottom as
+        an element. We need to distinguish them because operationally,
+        terms with lifted types may be represented by closures; terms
+        with unlifted types must not be represented by closures, which
+        implies that any unboxed value is necessarily unlifted. We
+        distinguish between lifted and unlifted types by ascribing
+        them different kinds.</para>
+
+        <para>Currently, all the primitive types are unlifted (including a
+        few boxed primitive types such as <code>ByteArrayzh</code>). Peyton-Jones
+        and Launchbury <citation>pj:unboxed</citation> described the ideas behind unboxed and
+        unlifted types.</para>
+
+      </section>
+      <section id="type-constructors-base-kinds-and-higher-kinds">
+        <title>Type Constructors; Base Kinds and Higher Kinds</title>
+
+        <para>Every type constructor has a kind, depending on its arity and
+        whether it or its arguments are lifted.</para>
+
+        <para>Term variables can only be assigned types that have base
+        kinds: the base kinds are <code>*</code>, <code>#</code>, and <code>?</code>. The three base kinds
+        distinguish the liftedness of the types they classify: <code>*</code>
+        represents lifted types; <code>#</code> represents unlifted types; and <code>?</code> is
+        the <quote>open</quote> kind, representing a type that may be either lifted
+        or unlifted. Of these, only <code>*</code> ever appears in Core type
+        declarations generated from user code; the other two are
+        needed to describe certain types in primitive (or otherwise
+        specially-generated) code (which, after optimization, could
+        potentially appear anywhere).</para>
+
+        <para>In particular, no top-level identifier (except in
+        <code>ghc-prim:GHC.Prim</code>) has a type of kind <code>#</code> or <code>?</code>.</para>
+
+        <para>Nullary type constructors have base kinds: for example, the
+        type <code>Int</code> has kind <code>*</code>, and <code>Int#</code> has kind <code>#</code>.</para>
+
+        <para>Non-nullary type constructors have higher kinds: kinds that
+        have the form
+        k<subscript>1</subscript><code>&arw;</code>k<subscript>2</subscript>, where
+        k<subscript>1</subscript> and k<subscript>2</subscript> are
+        kinds. For example, the function type constructor <code>&arw;</code> has
+        kind <code>* &arw; (* &arw; *)</code>. Since Haskell allows abstracting
+        over type constructors, type variables may have higher kinds;
+        however, much more commonly they have kind <code>*</code>, so that is the
+        default if a type binder omits a kind.</para>
+
+      </section>
+
+      <section id="type-synonyms-and-type-equivalence">
+        <title>Type Synonyms and Type Equivalence</title>
+
+        <para>There is no mechanism for defining type synonyms
+        (corresponding to Haskell <code>type</code> declarations).</para>
+
+        <para>Type equivalence is just syntactic equivalence on type
+        expressions (of base kinds) modulo:</para>
+
+        <itemizedlist>
+          <listitem>alpha-renaming of variables bound in <code>%forall</code> types;</listitem>
+          <listitem>the identity a <code>&arw;</code> b ≡ <code>ghc-prim:GHC.Prim.ZLzmzgZR</code> a b</listitem>
+        </itemizedlist>
+        
+      </section>
+    </section>
+    <section id="algebraic-data-types">
+      <title>Algebraic data types</title>
+
+      <para>Each data declaration introduces a new type constructor and a
+      set of one or more data constructors, normally corresponding
+      directly to a source Haskell <code>data</code> declaration. For example, the
+      source declaration
+      
+      <programlisting language="haskell">
+data Bintree a =
+    Fork (Bintree a) (Bintree a)
+  | Leaf a
+      </programlisting>
+
+      might induce the following Core declaration
+      
+      <programlisting language="java">
+%data Bintree a = {
+  Fork (Bintree a) (Bintree a);
+  Leaf a)}
+      </programlisting>
+
+      which introduces the unary type constructor Bintree of kind
+      <code>*&arw;*</code> and two data constructors with types
+      
+      <programlisting language="java">
+Fork :: %forall a . Bintree a &arw; Bintree a &arw; Bintree a
+Leaf :: %forall a . a &arw; Bintree a
+      </programlisting>
+
+      We define the <emphasis>arity</emphasis> of each data constructor to be the number of
+      value arguments it takes; e.g. <code>Fork</code> has arity 2 and <code>Leaf</code> has
+      arity 1.</para>
+
+      <para>For a less conventional example illustrating the possibility of
+      higher-order kinds, the Haskell source declaration
+      
+      <programlisting language="haskell">
+data A f a = MkA (f a)
+      </programlisting>
+
+      might induce the Core declaration
+      
+      <programlisting language="java">
+%data A (f::*&arw;*) a = { MkA (f a) }
+      </programlisting>
+
+      which introduces the constructor
+      
+      <programlisting language="java">
+MkA :: %forall (f::*&arw;*) a . (f a) &arw; (A f) a
+      </programlisting></para>
+
+      <para>GHC (like some other Haskell implementations) supports an
+      extension to Haskell98 for existential types such as
+      
+      <programlisting language="haskell">
+data T = forall a . MkT a (a &arw; Bool)
+      </programlisting>
+
+      This is represented by the Core declaration
+      
+      <programlisting language="java">
+%data T = {MkT @a a (a &arw; Bool)}
+      </programlisting>
+
+      which introduces the nullary type constructor T and the data
+      constructor
+
+      <programlisting language="java">
+MkT :: %forall a . a &arw; (a &arw; Bool) &arw; T
+      </programlisting>
+
+      In general, existentially quantified variables appear as extra
+      universally quantified variables in the data contructor types. An
+      example of how to construct and deconstruct values of type <code>T</code> is
+      shown in <link linkend="expression-forms">the Expression Forms section</link>.</para>
+
+    </section>
+    <section id="newtypes">
+      <title>Newtypes</title>
+
+      <para>Each Core <code>%newtype</code> declaration introduces a new type constructor
+      and an associated representation type, corresponding to a source
+      Haskell <code>newtype</code> declaration. However, unlike in source Haskell,
+      a <code>%newtype</code> declaration does not introduce any data constructors.</para>
+
+      <para>Each <code>%newtype</code> declaration also introduces a new coercion
+      (syntactically, just another type constructor) that implies an
+      axiom equating the type constructor, applied to any type
+      variables bound by the <code>%newtype</code>, to the representation type.</para>
+
+      <para>For example, the Haskell fragment
+      
+      <programlisting language="haskell">
+newtype U = MkU Bool
+u = MkU True
+v = case u of
+  MkU b &arw; not b
+      </programlisting>
+
+      might induce the Core fragment
+
+      <programlisting language="java">
+%newtype U ZCCoU = Bool;
+u :: U = %cast (True)
+           ((%sym ZCCoU));
+v :: Bool = not (%cast (u) ZCCoU);
+      </programlisting></para>
+
+      <para>The <code>newtype</code> declaration implies that the types <code>U</code> and <code>Bool</code> have
+      equivalent representations, and the coercion axiom <code>ZCCoU</code>
+      provides evidence that <code>U</code> is equivalent to <code>Bool</code>. Notice that in
+      the body of <code>u</code>, the boolean value <code>True</code> is cast to type <code>U</code> using
+      the primitive symmetry rule applied to <code>ZCCoU</code>: that is, using a
+      coercion of kind <code>Bool :=: U</code>. And in the body of <code>v</code>, <code>u</code> is cast
+      back to type <code>Bool</code> using the axiom <code>ZCCoU</code>.</para>
+
+      <para>Notice that the <code>case</code> in the Haskell source code above translates
+      to a <code>cast</code> in the corresponding Core code. That is because
+      operationally, a <code>case</code> on a value whose type is declared by a
+      <code>newtype</code> declaration is a no-op. Unlike a <code>case</code> on any other
+      value, such a <code>case</code> does no evaluation: its only function is to
+      coerce its scrutinee’s type.</para>
+
+      <para>Also notice that unlike in a previous draft version of External
+      Core, there is no need to handle recursive newtypes specially.</para>
+
+    </section>
+
+    <section id="expression-forms">
+      <title>Expression Forms</title>
+
+      <para>Variables and data constructors are straightforward.</para>
+
+      <para>Literal (<emphasis role="variable">lit</emphasis>) expressions consist of a literal value, in one of
+      four different formats, and a (primitive) type annotation. Only
+      certain combinations of format and type are permitted;
+      see <link linkend="primitive-module">the Primitive Module section</link>.
+      The character and string formats can describe only 8-bit ASCII characters.</para>
+
+      <para>Moreover, because the operational semantics for Core interprets
+      strings as C-style null-terminated strings, strings should not
+      contain embedded nulls.</para>
+
+      <para>In Core, value applications, type applications, value
+      abstractions, and type abstractions are all explicit. To tell
+      them apart, type arguments in applications and formal type
+      arguments in abstractions are preceded by an <code>@ symbol</code>. (In
+      abstractions, the <code>@</code> plays essentially the same role as the more
+      usual Λ symbol.) For example, the Haskell source declaration
+      
+      <programlisting language="haskell">
+f x = Leaf (Leaf x)
+      </programlisting>
+
+      might induce the Core declaration
+
+      <programlisting language="java">
+f :: %forall a . a &arw; BinTree (BinTree a) =
+  \ @a (x::a) &arw; Leaf @(Bintree a) (Leaf @a x)
+      </programlisting></para>
+
+      <para>Value applications may be of user-defined functions, data
+      constructors, or primitives. None of these sorts of applications
+      are necessarily saturated.</para>
+
+      <para>Note that the arguments of type applications are not always of
+      kind <code>*</code>. For example, given our previous definition of type <code>A</code>:
+
+      <programlisting language="haskell">
+data A f a = MkA (f a)
+      </programlisting>
+
+      the source code
+
+      <programlisting language="haskell">
+MkA (Leaf True)
+      </programlisting>
+
+      becomes
+
+      <programlisting language="java">
+(MkA @Bintree @Bool) (Leaf @Bool True)
+      </programlisting></para>
+
+      <para>Local bindings, of a single variable or of a set of mutually
+      recursive variables, are represented by <code>%let</code> expressions in the
+      usual way.</para>
+
+      <para>By far the most complicated expression form is <code>%case</code>. <code>%case</code>
+      expressions are permitted over values of any type, although they
+      will normally be algebraic or primitive types (with literal
+      values). Evaluating a <code>%case</code> forces the evaluation of the
+      expression being tested (the <quote>scrutinee</quote>). The value of the
+      scrutinee is bound to the variable following the <code>%of</code> keyword,
+      which is in scope in all alternatives; this is useful when the
+      scrutinee is a non-atomic expression (see next example). The
+      scrutinee is preceded by the type of the entire <code>%case</code>
+      expression: that is, the result type that all of the <code>%case</code>
+      alternatives have (this is intended to make type reconstruction
+      easier in the presence of type equality coercions).</para>
+
+      <para>In an algebraic <code>%case</code>, all the case alternatives must be labeled
+      with distinct data constructors from the algebraic type,
+      followed by any existential type variable bindings (see below),
+      and typed term variable bindings corresponding to the data
+      constructor’s arguments. The number of variables must match the
+      data constructor’s arity.</para>
+
+      <para>For example, the following Haskell source expression
+
+      <programlisting language="haskell">
+case g x of
+  Fork l r &arw; Fork r l
+  t@(Leaf v) &arw; Fork t t
+      </programlisting>
+
+      might induce the Core expression
+
+      <programlisting language="java">
+%case ((Bintree a)) g x %of (t::Bintree a)
+  Fork (l::Bintree a) (r::Bintree a) &arw;
+    Fork @a r l
+  Leaf (v::a) &arw;
+    Fork @a t t
+      </programlisting></para>
+
+      <para>When performing a <code>%case</code> over a value of an
+      existentially-quantified algebraic type, the alternative must
+      include extra local type bindings for the
+      existentially-quantified variables. For example, given
+
+      <programlisting language="haskell">
+data T = forall a . MkT a (a &arw; Bool)
+      </programlisting>      
+
+      the source
+
+      <programlisting language="haskell">
+case x of
+  MkT w g &arw; g w
+      </programlisting>      
+
+      becomes
+
+      <programlisting language="java">
+%case x %of (x’::T)
+  MkT @b (w::b) (g::b&arw;Bool) &arw; g w
+      </programlisting></para>
+
+      <para>In a <code>%case</code> over literal alternatives, all the case alternatives
+      must be distinct literals of the same primitive type.</para>
+
+      <para>The list of alternatives may begin with a default alternative
+      labeled with an underscore (<code>%_</code>), whose right-hand side will be
+      evaluated if none of the other alternatives match. The default
+      is optional except for in a case over a primitive type, or when
+      there are no other alternatives. If the case is over neither an
+      algebraic type nor a primitive type, then the list of
+      alternatives must contain a default alternative and nothing
+      else. For algebraic cases, the set of alternatives need not be
+      exhaustive, even if no default is given; if alternatives are
+      missing, this implies that GHC has deduced that they cannot
+      occur.</para>
+
+      <para><code>%cast</code> is used to manipulate newtypes, as described in
+      <link linkend="newtypes">the Newtype section</link>. The <code>%cast</code> expression
+      takes an expression and a coercion: syntactically, the coercion
+      is an arbitrary type, but it must have an equality kind. In an
+      expression <code>(cast e co)</code>, if <code>e :: T</code> and <code>co</code> has kind <code>T :=: U</code>, then
+      the overall expression has type <code>U</code> <citation>ghc-fc-commentary</citation>. Here, <code>co</code> must be a
+      coercion whose left-hand side is <code>T</code>.</para>
+
+      <para>Note that unlike the <code>%coerce</code> expression that existed in previous
+      versions of Core, this means that <code>%cast</code> is (almost) type-safe:
+      the coercion argument provides evidence that can be verified by
+      a typechecker. There are still unsafe <code>%cast</code>s, corresponding to
+      the unsafe <code>%coerce</code> construct that existed in old versions of
+      Core, because there is a primitive unsafe coercion type that can
+      be used to cast arbitrary types to each other. GHC uses this for
+      such purposes as coercing the return type of a function (such as
+      error) which is guaranteed to never return:
+      
+      <programlisting language="haskell">
+case (error &quot;&quot;) of
+  True &arw; 1
+  False &arw; 2
+      </programlisting>
+
+      becomes:
+
+      <programlisting language="java">
+%cast (error @ Bool (ZMZN @ Char))
+(%unsafe Bool Integer);
+      </programlisting>
+
+      <code>%cast</code> has no operational meaning and is only used in
+      typechecking.</para>
+
+      <para>A <code>%note</code> expression carries arbitrary internal information that
+      GHC finds interesting. The information is encoded as a string.
+      Expression notes currently generated by GHC include the inlining
+      pragma (<code>InlineMe</code>) and cost-center labels for profiling.</para>
+
+      <para>A <code>%external</code> expression denotes an external identifier, which has
+      the indicated type (always expressed in terms of Haskell
+      primitive types). External Core supports two kinds of external
+      calls: <code>%external</code> and <code>%dynexternal</code>. Only the former is supported
+      by the current set of stand-alone Core tools. In addition, there
+      is a <code>%label</code> construct which GHC may generate but which the Core
+      tools do not support.</para>
+
+      <para>The present syntax for externals is sufficient for describing C
+      functions and labels. Interfacing to other languages may require
+      additional information or a different interpretation of the name
+      string.</para>
+
+    </section>
+
+    <section id="expression-evaluation">
+      <title>Expression Evaluation</title>
+      <para>The dynamic semantics of Core are defined on the type-erasure of
+      the program: for example, we ignore all type abstractions and
+      applications. The denotational semantics of the resulting
+      type-free program are just the conventional ones for a
+      call-by-name language, in which expressions are only evaluated
+      on demand. But Core is intended to be a call-by-<emphasis>need</emphasis> language,
+      in which expressions are only evaluated once. To express the
+      sharing behavior of call-by-need, we give an operational model
+      in the style of Launchbury <citation>launchbury93natural</citation>.</para>
+
+      <para>This section describes the model informally; a more formal
+      semantics is separately available as an executable interpreter.</para>
+
+      <para>To simplify the semantics, we consider only <quote>well-behaved</quote> Core
+      programs in which constructor and primitive applications are
+      fully saturated, and in which non-trivial expresssions of
+      unlifted kind (<code>#</code>) appear only as scrutinees in <code>%case</code>
+      expressions. Any program can easily be put into this form; a
+      separately available preprocessor illustrates how. In the
+      remainder of this section, we use <quote>Core</quote> to mean <quote>well-behaved</quote>
+      Core.</para>
+
+      <para>Evaluating a Core expression means reducing it to <emphasis>weak-head normal form (WHNF)</emphasis>,
+      i.e., a primitive value, lambda abstraction,
+      or fully-applied data constructor. Evaluating a program means
+      evaluating the expression <code>main:ZCMain.main</code>.</para>
+
+      <para>To make sure that expression evaluation is shared, we make use
+      of a <emphasis>heap</emphasis>, which contains <emphasis>heap entries</emphasis>. A heap entry can be:
+
+      <itemizedlist>
+       <listitem>
+         A <emphasis>thunk</emphasis>, representing an unevaluated expression, also known
+         as a suspension.
+       </listitem>
+       <listitem>
+         A <emphasis>WHNF</emphasis>, representing an evaluated expression. The result of
+         evaluating a thunk is a WHNF. A WHNF is always a closure
+         (corresponding to a lambda abstraction in the source
+         program) or a data constructor application: computations
+         over primitive types are never suspended.
+       </listitem>
+      </itemizedlist></para>
+
+      <para><emphasis>Heap pointers</emphasis> point to heap entries: at different times, the
+      same heap pointer can point to either a thunk or a WHNF, because
+      the run-time system overwrites thunks with WHNFs as computation
+      proceeds.</para>
+
+      <para>The suspended computation that a thunk represents might
+      represent evaluating one of three different kinds of expression.
+      The run-time system allocates a different kind of thunk
+      depending on what kind of expression it is:
+
+      <itemizedlist>
+       <listitem>
+         A thunk for a value definition has a group of suspended
+         defining expressions, along with a list of bindings between
+         defined names and heap pointers to those suspensions. (A
+         value definition may be a recursive group of definitions or
+         a single non-recursive definition, and it may be top-level
+         (global) or <code>let</code>-bound (local)).
+       </listitem>
+       <listitem>
+         A thunk for a function application (where the function is
+         user-defined) has a suspended actual argument expression,
+         and a binding between the formal argument and a heap pointer
+         to that suspension.
+       </listitem>
+       <listitem>
+         A thunk for a constructor application has a suspended actual
+         argument expression; the entire constructed value has a heap
+         pointer to that suspension embedded in it.
+       </listitem>
+      </itemizedlist></para>
+
+      <para>As computation proceeds, copies of the heap pointer for a given
+      thunk propagate through the executing program. When another
+      computation demands the result of that thunk, the thunk is
+      <emphasis>forced</emphasis>: the run-time system computes the thunk’s result,
+      yielding a WHNF, and overwrites the heap entry for the thunk
+      with the WHNF. Now, all copies of the heap pointer point to the
+      new heap entry: a WHNF. Forcing occurs only in the context of
+
+      <itemizedlist>
+        <listitem>evaluating the operator expression of an application;</listitem>
+        <listitem>evaluating the scrutinee of a <code>case</code> expression; or</listitem>
+        <listitem>evaluating an argument to a primitive or external function application</listitem>
+      </itemizedlist>
+      </para>
+
+      <para>When no pointers to a heap entry (whether it is a thunk or WHNF)
+      remain, the garbage collector can reclaim the space it uses. We
+      assume this happens implicitly.</para>
+
+      <para>With the exception of functions, arrays, and mutable variables,
+      we intend that values of all primitive types should be held
+      <emphasis>unboxed</emphasis>: they should not be heap-allocated. This does not
+      violate call-by-need semantics: all primitive types are
+      <emphasis>unlifted</emphasis>, which means that values of those types must be
+      evaluated strictly. Unboxed tuple types are not heap-allocated
+      either.</para>
+
+      <para>Certain primitives and <code>%external</code> functions cause side-effects to
+      state threads or to the real world. Where the ordering of these
+      side-effects matters, Core already forces this order with data
+      dependencies on the pseudo-values representing the threads.</para>
+
+      <para>An implementation must specially support the <code>raisezh</code> and
+      <code>handlezh</code> primitives: for example, by using a handler stack.
+      Again, real-world threading guarantees that they will execute in
+      the correct order.</para>
+
+    </section>
+  </section>
+  <section id="primitive-module">
+    <title>Primitive Module</title>
+
+    <para>The semantics of External Core rely on the contents and informal
+    semantics of the primitive module <code>ghc-prim:GHC.Prim</code>. Nearly all
+    the primitives are required in order to cover GHC’s implementation
+    of the Haskell98 standard prelude; the only operators that can be
+    completely omitted are those supporting the byte-code interpreter,
+    parallelism, and foreign objects. Some of the concurrency
+    primitives are needed, but can be given degenerate implementations
+    if it desired to target a purely sequential backend (see Section
+    <link linkend="non-concurrent-back-end">the Non-concurrent Back End section</link>).</para>
+
+    <para>In addition to these primitives, a large number of C library
+    functions are required to implement the full standard Prelude,
+    particularly to handle I/O and arithmetic on less usual types.</para>
+
+    <para>For a full listing of the names and types of the primitive
+    operators, see the GHC library documentation <citation>ghcprim</citation>.</para>
+
+    <section id="non-concurrent-back-end">
+      <title>Non-concurrent Back End</title>
+
+      <para>The Haskell98 standard prelude doesn’t include any concurrency
+      support, but GHC’s implementation of it relies on the existence
+      of some concurrency primitives. However, it never actually forks
+      multiple threads. Hence, the concurrency primitives can be given
+      degenerate implementations that will work in a non-concurrent
+      setting, as follows:</para>
+
+      <itemizedlist>
+        <listitem>
+         <code>ThreadIdzh</code> can be represented by a singleton type, whose
+         (unique) value is returned by <code>myThreadIdzh</code>.
+        </listitem>
+        <listitem>
+         <code>forkzh</code> can just die with an <quote>unimplemented</quote> message.
+        </listitem>
+        <listitem>
+         <code>killThreadzh</code> and <code>yieldzh</code> can also just die <quote>unimplemented</quote>
+         since in a one-thread world, the only thread a thread can
+         kill is itself, and if a thread yields the program hangs.
+        </listitem>
+        <listitem>
+         <code>MVarzh a</code> can be represented by <code>MutVarzh (Maybe a)</code>; where a
+         concurrent implementation would block, the sequential
+         implementation can just die with a suitable message (since
+         no other thread exists to unblock it).
+        </listitem>
+        <listitem>
+         <code>waitReadzh</code> and <code>waitWritezh</code> can be implemented using a <code>select</code>
+         with no timeout.
+        </listitem>
+      </itemizedlist>
+    </section>
+
+    <section id="literals">
+      <title>Literals</title>
+
+      <para>Only the following combination of literal forms and types are
+      permitted:</para>
+
+      <informaltable frame="none" colsep="0" rowsep="0">
+        <tgroup cols='3'>
+          <colspec colname="literal"     align="left" colwidth="*"  />
+          <colspec colname="type"        align="left" colwidth="*"  />
+          <colspec colname="description" align="left" colwidth="4*" />
+          <thead>
+            <row> 
+              <entry>Literal form</entry>
+              <entry>Type</entry>
+              <entry>Description</entry>
+            </row>
+          </thead>
+          <tbody>
+            <row>
+              <entry morerows="3" valign="top">integer</entry>
+              <entry><code>Intzh</code></entry>
+              <entry>Int</entry>
+            </row>
+            <row>
+              <entry><code>Wordzh</code></entry>
+              <entry>Word</entry>
+            </row>
+            <row>
+              <entry><code>Addrzh</code></entry>
+              <entry>Address</entry>
+            </row>
+            <row>
+              <entry><code>Charzh</code></entry>
+              <entry>Unicode character code</entry>
+            </row>
+
+            <row>
+              <entry morerows="1" valign="top">rational</entry>
+              <entry><code>Floatzh</code></entry>
+              <entry>Float</entry>
+            </row>
+            <row>
+              <entry><code>Doublezh</code></entry>
+              <entry>Double</entry>
+            </row>
+
+            <row>
+              <entry>character</entry>
+              <entry><code>Charzh</code></entry>
+              <entry>Unicode character specified by ASCII character</entry>
+            </row>
+
+            <row>
+              <entry>string</entry>
+              <entry><code>Addrzh</code></entry>
+              <entry>Address of specified C-format string</entry>
+            </row>
+          </tbody>
+        </tgroup>
+      </informaltable>
+    </section>
+  </section>
+
+
+  <bibliolist>
+    <!-- This bibliography was semi-automatically converted by JabRef from core.bib. -->
+
+    <title>References</title>
+    
+    <biblioentry>
+      <abbrev>ghc-user-guide</abbrev>
+      <authorgroup>
+        <author><surname>The GHC Team</surname></author> 
+      </authorgroup>
+      <citetitle pubwork="article">The Glorious Glasgow Haskell Compilation System User's Guide, Version 6.8.2</citetitle>
+      <pubdate>2008</pubdate>
+      <bibliomisc><ulink url="http://www.haskell.org/ghc/docs/latest/html/users_guide/index.html">http://www.haskell.org/ghc/docs/latest/html/users_guide/index.html</ulink></bibliomisc>
+    </biblioentry>
+
+    <biblioentry>
+      <abbrev>ghc-fc-commentary</abbrev>
+      <authorgroup>
+        <author><surname>GHC Wiki</surname></author> 
+      </authorgroup>
+      <citetitle pubwork="article">System FC: equality constraints and coercions</citetitle>
+      <pubdate>2006</pubdate>  
+      <bibliomisc><ulink url="http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/FC">http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/FC</ulink></bibliomisc>
+    </biblioentry>
+
+    <biblioentry>
+      <abbrev>ghc-api</abbrev>
+      <authorgroup>
+        <author><surname>Haskell Wiki</surname></author> 
+      </authorgroup>
+      <citetitle pubwork="article">Using GHC as a library</citetitle>
+      <pubdate>2007</pubdate>  
+      <bibliomisc><ulink url="http://haskell.org/haskellwiki/GHC/As_a_library">http://haskell.org/haskellwiki/GHC/As_a_library</ulink></bibliomisc>
+    </biblioentry>
+
+    <biblioentry>
+      <abbrev>haskell98</abbrev>
+      <authorgroup>
+        <editor><firstname>Simon</firstname><surname>Peyton-Jones</surname></editor> 
+      </authorgroup>
+      <citetitle pubwork="article">Haskell 98 Language and Libraries: The Revised Report</citetitle>
+      <publisher>
+        <publishername>Cambridge University Press</publishername>
+        <address>
+          <city>Cambridge></city>
+          <state>UK</state>
+        </address>
+      </publisher>
+      <pubdate>2003</pubdate>
+    </biblioentry>
+
+    <biblioentry>
+      <abbrev>system-fc</abbrev>
+      <authorgroup>
+        <author><firstname>Martin</firstname><surname>Sulzmann</surname></author>
+        <author><firstname>Manuel M.T.</firstname><surname>Chakravarty</surname></author>
+        <author><firstname>Simon</firstname><surname>Peyton-Jones</surname></author>
+        <author><firstname>Kevin</firstname><surname>Donnelly</surname></author> 
+      </authorgroup>
+      <citetitle pubwork="article">System F with type equality coercions</citetitle>
+      <publisher>
+        <publishername>ACM</publishername>
+        <address>
+          <city>New York</city>
+          <state>NY</state>
+          <country>USA</country>
+        </address>
+      </publisher>
+      <artpagenums>53-66</artpagenums> 
+      <pubdate>2007</pubdate>  
+      <bibliomisc><ulink url="http://portal.acm.org/citation.cfm?id=1190324">http://portal.acm.org/citation.cfm?id=1190324</ulink></bibliomisc>
+      <!-- booktitle = {{TLDI '07: Proceedings of the 2007 ACM SIGPLAN International Workshop on Types in Language Design and Implementation}}, -->
+    </biblioentry>
+
+    <biblioentry>
+      <abbrev>gadts</abbrev>
+      <authorgroup>
+        <author><firstname>Simon</firstname><surname>Peyton-Jones</surname></author>
+        <author><firstname>Dimitrios</firstname><surname>Vytiniotis</surname></author>
+        <author><firstname>Stephanie</firstname><surname>Weirich</surname></author>
+        <author><firstname>Geoffrey</firstname><surname>Washburn</surname></author> 
+      </authorgroup>
+      <citetitle pubwork="article">Simple unification-based type inference for GADTs</citetitle>
+      <publisher>
+        <publishername>ACM</publishername>
+        <address>
+          <city>New York</city>
+          <state>NY</state>
+          <country>USA</country>
+        </address>
+      </publisher>
+      <artpagenums>50-61</artpagenums> 
+      <pubdate>2006</pubdate>  
+      <bibliomisc><ulink url="http://research.microsoft.com/Users/simonpj/papers/gadt/index.htm">http://research.microsoft.com/Users/simonpj/papers/gadt/index.htm</ulink></bibliomisc>
+    </biblioentry>
+
+    <biblioentry>
+      <abbrev>Launchbury94</abbrev>
+      <authorgroup>
+        <author><firstname>John</firstname><surname>Launchbury</surname></author>
+        <author><firstname>Simon L.</firstname><surname>Peyton-Jones</surname></author> 
+      </authorgroup>
+      <citetitle pubwork="article">Lazy Functional State Threads</citetitle>
+      <artpagenums>24-35</artpagenums> 
+      <pubdate>1994</pubdate>  
+      <bibliomisc><ulink url="http://citeseer.ist.psu.edu/article/launchbury93lazy.html">http://citeseer.ist.psu.edu/article/launchbury93lazy.html</ulink></bibliomisc>
+      <!--      booktitle = "{SIGPLAN} {Conference} on {Programming Language Design and Implementation}", -->
+    </biblioentry>
+
+    <biblioentry>
+      <abbrev>pj:unboxed</abbrev>
+      <authorgroup>
+        <author><firstname>Simon L.</firstname><surname>Peyton-Jones</surname></author>
+        <author><firstname>John</firstname><surname>Launchbury</surname></author> 
+        <editor><firstname>J.</firstname><surname>Hughes</surname></editor> 
+      </authorgroup>
+      <citetitle pubwork="article">Unboxed Values as First Class Citizens in a Non-strict Functional Language</citetitle>
+      <publisher>
+        <publishername>Springer-Verlag LNCS523</publishername>
+        <address>
+          <city>Cambridge</city>
+          <state>Massachussetts</state>
+          <country>USA</country>
+        </address>
+      </publisher>
+      <artpagenums>636-666</artpagenums> 
+      <pubdate>1991, August 26-28</pubdate>  
+      <bibliomisc><ulink url="http://citeseer.ist.psu.edu/jones91unboxed.html">http://citeseer.ist.psu.edu/jones91unboxed.html</ulink></bibliomisc>
+      <!--      booktitle = "Proceedings of the Conference on Functional Programming and Computer Architecture", -->
+    </biblioentry>
+
+    <biblioentry>
+      <abbrev>ghc-inliner</abbrev>
+      <authorgroup>
+        <author><firstname>Simon</firstname><surname>Peyton-Jones</surname></author>
+        <author><firstname>Simon</firstname><surname>Marlow</surname></author> 
+      </authorgroup>
+      <citetitle pubwork="article">Secrets of the Glasgow Haskell Compiler inliner</citetitle>
+      <pubdate>1999</pubdate>  
+      <address>
+        <city>Paris</city>
+        <country>France</country>
+      </address>
+      <bibliomisc><ulink url="http://research.microsoft.com/Users/simonpj/Papers/inlining/inline.pdf">http://research.microsoft.com/Users/simonpj/Papers/inlining/inline.pdf</ulink></bibliomisc>
+      <!-- booktitle = "Workshop on Implementing Declarative Languages", -->
+    </biblioentry>
+
+    <biblioentry>
+      <abbrev>comp-by-trans-scp</abbrev>
+      <authorgroup>
+        <author><firstname>Simon L.</firstname><surname>Peyton-Jones</surname></author>
+        <author><firstname>A. L. M.</firstname><surname>Santos</surname></author> 
+      </authorgroup>
+      <citetitle pubwork="article">A transformation-based optimiser for Haskell</citetitle>
+      <citetitle pubwork="journal">Science of Computer Programming</citetitle>
+      <volumenum>32</volumenum> 
+      <issuenum>1-3</issuenum>
+      <artpagenums>3-47</artpagenums> 
+      <pubdate>1998</pubdate>  
+      <bibliomisc><ulink url="http://citeseer.ist.psu.edu/peytonjones98transformationbased.html">http://citeseer.ist.psu.edu/peytonjones98transformationbased.html</ulink></bibliomisc>
+    </biblioentry>
+
+    <biblioentry>
+      <abbrev>stg-machine</abbrev>
+      <authorgroup>
+        <author><firstname>Simon L.</firstname><surname>Peyton-Jones</surname></author> 
+      </authorgroup>
+      <citetitle pubwork="article">Implementing Lazy Functional Languages on Stock Hardware: The Spineless Tagless G-Machine</citetitle>
+      <citetitle pubwork="journal">Journal of Functional Programming</citetitle>
+      <volumenum>2</volumenum> 
+      <issuenum>2</issuenum>
+      <artpagenums>127-202</artpagenums> 
+      <pubdate>1992</pubdate>  
+      <bibliomisc><ulink url="http://citeseer.ist.psu.edu/peytonjones92implementing.html">http://citeseer.ist.psu.edu/peytonjones92implementing.html</ulink></bibliomisc>
+    </biblioentry>
+
+    <biblioentry>
+      <abbrev>launchbury93natural</abbrev>
+      <authorgroup>
+        <author><firstname>John</firstname><surname>Launchbury</surname></author> 
+      </authorgroup>
+      <citetitle pubwork="article">A Natural Semantics for Lazy Evaluation</citetitle>
+      <artpagenums>144-154</artpagenums> 
+      <address>
+        <city>Charleston</city>
+        <state>South Carolina</state>
+      </address>
+      <pubdate>1993</pubdate>  
+      <bibliomisc><ulink url="citeseer.ist.psu.edu/launchbury93natural.html">citeseer.ist.psu.edu/launchbury93natural.html</ulink></bibliomisc>
+      <!-- booktitle = "Conference Record of the Twentieth Annual {ACM} {SIGPLAN}-{SIGACT} Symposium on Principles of Programming Languages", -->
+    </biblioentry>
+
+    <biblioentry>
+      <abbrev>ghcprim</abbrev>
+      <authorgroup>
+        <author><surname>The GHC Team</surname></author> 
+      </authorgroup>
+      <citetitle pubwork="article">Library documentation: GHC.Prim</citetitle>
+      <pubdate>2008</pubdate>
+      <bibliomisc><ulink url="http://www.haskell.org/ghc/docs/latest/html/libraries/base/GHC-Prim.html">http://www.haskell.org/ghc/docs/latest/html/libraries/base/GHC-Prim.html</ulink></bibliomisc>  
+    </biblioentry>
+  </bibliolist>
+
+</chapter>
index 1ff487c..4d4489f 100644 (file)
@@ -17,6 +17,7 @@
 &lang-features;
 &ffi-chap;
 &extending-ghc;
+&external-core;
 &wrong;
 &utils;
 &win32-dll;
index c83abaa..37df568 100644 (file)
@@ -12,6 +12,7 @@
 <!ENTITY sooner         SYSTEM "sooner.xml" >
 <!ENTITY lang-features  SYSTEM "lang.xml" >
 <!ENTITY glasgowexts    SYSTEM "glasgow_exts.xml" >
+<!ENTITY external-core  SYSTEM "external_core.xml" >
 <!ENTITY packages       SYSTEM "packages.xml" >
 <!ENTITY parallel       SYSTEM "parallel.xml" >
 <!ENTITY safehaskell    SYSTEM "safe_haskell.xml" >
@@ -28,3 +29,4 @@
 <!ENTITY libraryBaseLocation    "../libraries/base-@LIBRARY_base_VERSION@">
 <!ENTITY libraryCabalLocation   "../libraries/Cabal-@LIBRARY_Cabal_VERSION@">
 <!ENTITY libraryGhcPrimLocation "../libraries/ghc-prim-@LIBRARY_ghc_prim_VERSION@">
+<!ENTITY arw "-&gt;">