Fix links in #special-ids section
[ghc.git] / docs / users_guide / glasgow_exts.xml
1 <?xml version="1.0" encoding="iso-8859-1"?>
2 <para>
3 <indexterm><primary>language, GHC</primary></indexterm>
4 <indexterm><primary>extensions, GHC</primary></indexterm>
5 As with all known Haskell systems, GHC implements some extensions to
6 the language. They can all be enabled or disabled by commandline flags
7 or language pragmas. By default GHC understands the most recent Haskell
8 version it supports, plus a handful of extensions.
9 </para>
10
11 <para>
12 Some of the Glasgow extensions serve to give you access to the
13 underlying facilities with which we implement Haskell. Thus, you can
14 get at the Raw Iron, if you are willing to write some non-portable
15 code at a more primitive level. You need not be &ldquo;stuck&rdquo;
16 on performance because of the implementation costs of Haskell's
17 &ldquo;high-level&rdquo; features&mdash;you can always code
18 &ldquo;under&rdquo; them. In an extreme case, you can write all your
19 time-critical code in C, and then just glue it together with Haskell!
20 </para>
21
22 <para>
23 Before you get too carried away working at the lowest level (e.g.,
24 sloshing <literal>MutableByteArray&num;</literal>s around your
25 program), you may wish to check if there are libraries that provide a
26 &ldquo;Haskellised veneer&rdquo; over the features you want. The
27 separate <ulink url="../libraries/index.html">libraries
28 documentation</ulink> describes all the libraries that come with GHC.
29 </para>
30
31 <!-- LANGUAGE OPTIONS -->
32 <sect1 id="options-language">
33 <title>Language options</title>
34
35 <indexterm><primary>language</primary><secondary>option</secondary>
36 </indexterm>
37 <indexterm><primary>options</primary><secondary>language</secondary>
38 </indexterm>
39 <indexterm><primary>extensions</primary><secondary>options controlling</secondary>
40 </indexterm>
41
42 <para>The language option flags control what variation of the language are
43 permitted.</para>
44
45 <para>Language options can be controlled in two ways:
46 <itemizedlist>
47 <listitem><para>Every language option can switched on by a command-line flag "<option>-X...</option>"
48 (e.g. <option>-XTemplateHaskell</option>), and switched off by the flag "<option>-XNo...</option>";
49 (e.g. <option>-XNoTemplateHaskell</option>).</para></listitem>
50 <listitem><para>
51 Language options recognised by Cabal can also be enabled using the <literal>LANGUAGE</literal> pragma,
52 thus <literal>{-# LANGUAGE TemplateHaskell #-}</literal> (see <xref linkend="language-pragma"/>). </para>
53 </listitem>
54 </itemizedlist></para>
55
56 <para>The flag <option>-fglasgow-exts</option>
57 <indexterm><primary><option>-fglasgow-exts</option></primary></indexterm>
58 is equivalent to enabling the following extensions:
59 &what_glasgow_exts_does;
60 Enabling these options is the <emphasis>only</emphasis>
61 effect of <option>-fglasgow-exts</option>.
62 We are trying to move away from this portmanteau flag,
63 and towards enabling features individually.</para>
64
65 </sect1>
66
67 <!-- UNBOXED TYPES AND PRIMITIVE OPERATIONS -->
68 <sect1 id="primitives">
69 <title>Unboxed types and primitive operations</title>
70
71 <para>GHC is built on a raft of primitive data types and operations;
72 "primitive" in the sense that they cannot be defined in Haskell itself.
73 While you really can use this stuff to write fast code,
74 we generally find it a lot less painful, and more satisfying in the
75 long run, to use higher-level language features and libraries. With
76 any luck, the code you write will be optimised to the efficient
77 unboxed version in any case. And if it isn't, we'd like to know
78 about it.</para>
79
80 <para>All these primitive data types and operations are exported by the
81 library <literal>GHC.Prim</literal>, for which there is
82 <ulink url="&libraryGhcPrimLocation;/GHC-Prim.html">detailed online documentation</ulink>.
83 (This documentation is generated from the file <filename>compiler/prelude/primops.txt.pp</filename>.)
84 </para>
85
86 <para>
87 If you want to mention any of the primitive data types or operations in your
88 program, you must first import <literal>GHC.Prim</literal> to bring them
89 into scope. Many of them have names ending in "&num;", and to mention such
90 names you need the <option>-XMagicHash</option> extension (<xref linkend="magic-hash"/>).
91 </para>
92
93 <para>The primops make extensive use of <link linkend="glasgow-unboxed">unboxed types</link>
94 and <link linkend="unboxed-tuples">unboxed tuples</link>, which
95 we briefly summarise here. </para>
96
97 <sect2 id="glasgow-unboxed">
98 <title>Unboxed types</title>
99
100 <para>
101 <indexterm><primary>Unboxed types (Glasgow extension)</primary></indexterm>
102 </para>
103
104 <para>Most types in GHC are <firstterm>boxed</firstterm>, which means
105 that values of that type are represented by a pointer to a heap
106 object. The representation of a Haskell <literal>Int</literal>, for
107 example, is a two-word heap object. An <firstterm>unboxed</firstterm>
108 type, however, is represented by the value itself, no pointers or heap
109 allocation are involved.
110 </para>
111
112 <para>
113 Unboxed types correspond to the &ldquo;raw machine&rdquo; types you
114 would use in C: <literal>Int&num;</literal> (long int),
115 <literal>Double&num;</literal> (double), <literal>Addr&num;</literal>
116 (void *), etc. The <emphasis>primitive operations</emphasis>
117 (PrimOps) on these types are what you might expect; e.g.,
118 <literal>(+&num;)</literal> is addition on
119 <literal>Int&num;</literal>s, and is the machine-addition that we all
120 know and love&mdash;usually one instruction.
121 </para>
122
123 <para>
124 Primitive (unboxed) types cannot be defined in Haskell, and are
125 therefore built into the language and compiler. Primitive types are
126 always unlifted; that is, a value of a primitive type cannot be
127 bottom. We use the convention (but it is only a convention)
128 that primitive types, values, and
129 operations have a <literal>&num;</literal> suffix (see <xref linkend="magic-hash"/>).
130 For some primitive types we have special syntax for literals, also
131 described in the <link linkend="magic-hash">same section</link>.
132 </para>
133
134 <para>
135 Primitive values are often represented by a simple bit-pattern, such
136 as <literal>Int&num;</literal>, <literal>Float&num;</literal>,
137 <literal>Double&num;</literal>. But this is not necessarily the case:
138 a primitive value might be represented by a pointer to a
139 heap-allocated object. Examples include
140 <literal>Array&num;</literal>, the type of primitive arrays. A
141 primitive array is heap-allocated because it is too big a value to fit
142 in a register, and would be too expensive to copy around; in a sense,
143 it is accidental that it is represented by a pointer. If a pointer
144 represents a primitive value, then it really does point to that value:
145 no unevaluated thunks, no indirections&hellip;nothing can be at the
146 other end of the pointer than the primitive value.
147 A numerically-intensive program using unboxed types can
148 go a <emphasis>lot</emphasis> faster than its &ldquo;standard&rdquo;
149 counterpart&mdash;we saw a threefold speedup on one example.
150 </para>
151
152 <para>
153 There are some restrictions on the use of primitive types:
154 <itemizedlist>
155 <listitem><para>The main restriction
156 is that you can't pass a primitive value to a polymorphic
157 function or store one in a polymorphic data type. This rules out
158 things like <literal>[Int&num;]</literal> (i.e. lists of primitive
159 integers). The reason for this restriction is that polymorphic
160 arguments and constructor fields are assumed to be pointers: if an
161 unboxed integer is stored in one of these, the garbage collector would
162 attempt to follow it, leading to unpredictable space leaks. Or a
163 <function>seq</function> operation on the polymorphic component may
164 attempt to dereference the pointer, with disastrous results. Even
165 worse, the unboxed value might be larger than a pointer
166 (<literal>Double&num;</literal> for instance).
167 </para>
168 </listitem>
169 <listitem><para> You cannot define a newtype whose representation type
170 (the argument type of the data constructor) is an unboxed type. Thus,
171 this is illegal:
172 <programlisting>
173 newtype A = MkA Int#
174 </programlisting>
175 </para></listitem>
176 <listitem><para> You cannot bind a variable with an unboxed type
177 in a <emphasis>top-level</emphasis> binding.
178 </para></listitem>
179 <listitem><para> You cannot bind a variable with an unboxed type
180 in a <emphasis>recursive</emphasis> binding.
181 </para></listitem>
182 <listitem><para> You may bind unboxed variables in a (non-recursive,
183 non-top-level) pattern binding, but you must make any such pattern-match
184 strict. For example, rather than:
185 <programlisting>
186 data Foo = Foo Int Int#
187
188 f x = let (Foo a b, w) = ..rhs.. in ..body..
189 </programlisting>
190 you must write:
191 <programlisting>
192 data Foo = Foo Int Int#
193
194 f x = let !(Foo a b, w) = ..rhs.. in ..body..
195 </programlisting>
196 since <literal>b</literal> has type <literal>Int#</literal>.
197 </para>
198 </listitem>
199 </itemizedlist>
200 </para>
201
202 </sect2>
203
204 <sect2 id="unboxed-tuples">
205 <title>Unboxed tuples</title>
206
207 <para>
208 Unboxed tuples aren't really exported by <literal>GHC.Exts</literal>;
209 they are a syntactic extension enabled by the language flag <option>-XUnboxedTuples</option>. An
210 unboxed tuple looks like this:
211 </para>
212
213 <para>
214
215 <programlisting>
216 (# e_1, ..., e_n #)
217 </programlisting>
218
219 </para>
220
221 <para>
222 where <literal>e&lowbar;1..e&lowbar;n</literal> are expressions of any
223 type (primitive or non-primitive). The type of an unboxed tuple looks
224 the same.
225 </para>
226
227 <para>
228 Note that when unboxed tuples are enabled,
229 <literal>(#</literal> is a single lexeme, so for example when using
230 operators like <literal>#</literal> and <literal>#-</literal> you need
231 to write <literal>( # )</literal> and <literal>( #- )</literal> rather than
232 <literal>(#)</literal> and <literal>(#-)</literal>.
233 </para>
234
235 <para>
236 Unboxed tuples are used for functions that need to return multiple
237 values, but they avoid the heap allocation normally associated with
238 using fully-fledged tuples. When an unboxed tuple is returned, the
239 components are put directly into registers or on the stack; the
240 unboxed tuple itself does not have a composite representation. Many
241 of the primitive operations listed in <literal>primops.txt.pp</literal> return unboxed
242 tuples.
243 In particular, the <literal>IO</literal> and <literal>ST</literal> monads use unboxed
244 tuples to avoid unnecessary allocation during sequences of operations.
245 </para>
246
247 <para>
248 There are some restrictions on the use of unboxed tuples:
249 <itemizedlist>
250
251 <listitem>
252 <para>
253 Values of unboxed tuple types are subject to the same restrictions as
254 other unboxed types; i.e. they may not be stored in polymorphic data
255 structures or passed to polymorphic functions.
256 </para>
257 </listitem>
258
259 <listitem>
260 <para>
261 The typical use of unboxed tuples is simply to return multiple values,
262 binding those multiple results with a <literal>case</literal> expression, thus:
263 <programlisting>
264 f x y = (# x+1, y-1 #)
265 g x = case f x x of { (# a, b #) -&#62; a + b }
266 </programlisting>
267 You can have an unboxed tuple in a pattern binding, thus
268 <programlisting>
269 f x = let (# p,q #) = h x in ..body..
270 </programlisting>
271 If the types of <literal>p</literal> and <literal>q</literal> are not unboxed,
272 the resulting binding is lazy like any other Haskell pattern binding. The
273 above example desugars like this:
274 <programlisting>
275 f x = let t = case h x o f{ (# p,q #) -> (p,q)
276 p = fst t
277 q = snd t
278 in ..body..
279 </programlisting>
280 Indeed, the bindings can even be recursive.
281 </para>
282 </listitem>
283 </itemizedlist>
284
285 </para>
286
287 </sect2>
288 </sect1>
289
290
291 <!-- ====================== SYNTACTIC EXTENSIONS ======================= -->
292
293 <sect1 id="syntax-extns">
294 <title>Syntactic extensions</title>
295
296 <sect2 id="unicode-syntax">
297 <title>Unicode syntax</title>
298 <para>The language
299 extension <option>-XUnicodeSyntax</option><indexterm><primary><option>-XUnicodeSyntax</option></primary></indexterm>
300 enables Unicode characters to be used to stand for certain ASCII
301 character sequences. The following alternatives are provided:</para>
302
303 <informaltable>
304 <tgroup cols="2" align="left" colsep="1" rowsep="1">
305 <thead>
306 <row>
307 <entry>ASCII</entry>
308 <entry>Unicode alternative</entry>
309 <entry>Code point</entry>
310 <entry>Name</entry>
311 </row>
312 </thead>
313
314 <!--
315 to find the DocBook entities for these characters, find
316 the Unicode code point (e.g. 0x2237), and grep for it in
317 /usr/share/sgml/docbook/xml-dtd-*/ent/* (or equivalent on
318 your system. Some of these Unicode code points don't have
319 equivalent DocBook entities.
320 -->
321
322 <tbody>
323 <row>
324 <entry><literal>::</literal></entry>
325 <entry>::</entry> <!-- no special char, apparently -->
326 <entry>0x2237</entry>
327 <entry>PROPORTION</entry>
328 </row>
329 </tbody>
330 <tbody>
331 <row>
332 <entry><literal>=&gt;</literal></entry>
333 <entry>&rArr;</entry>
334 <entry>0x21D2</entry>
335 <entry>RIGHTWARDS DOUBLE ARROW</entry>
336 </row>
337 </tbody>
338 <tbody>
339 <row>
340 <entry><literal>forall</literal></entry>
341 <entry>&forall;</entry>
342 <entry>0x2200</entry>
343 <entry>FOR ALL</entry>
344 </row>
345 </tbody>
346 <tbody>
347 <row>
348 <entry><literal>-&gt;</literal></entry>
349 <entry>&rarr;</entry>
350 <entry>0x2192</entry>
351 <entry>RIGHTWARDS ARROW</entry>
352 </row>
353 </tbody>
354 <tbody>
355 <row>
356 <entry><literal>&lt;-</literal></entry>
357 <entry>&larr;</entry>
358 <entry>0x2190</entry>
359 <entry>LEFTWARDS ARROW</entry>
360 </row>
361 </tbody>
362
363 <tbody>
364 <row>
365 <entry>-&lt;</entry>
366 <entry>&larrtl;</entry>
367 <entry>0x2919</entry>
368 <entry>LEFTWARDS ARROW-TAIL</entry>
369 </row>
370 </tbody>
371
372 <tbody>
373 <row>
374 <entry>&gt;-</entry>
375 <entry>&rarrtl;</entry>
376 <entry>0x291A</entry>
377 <entry>RIGHTWARDS ARROW-TAIL</entry>
378 </row>
379 </tbody>
380
381 <tbody>
382 <row>
383 <entry>-&lt;&lt;</entry>
384 <entry></entry>
385 <entry>0x291B</entry>
386 <entry>LEFTWARDS DOUBLE ARROW-TAIL</entry>
387 </row>
388 </tbody>
389
390 <tbody>
391 <row>
392 <entry>&gt;&gt;-</entry>
393 <entry></entry>
394 <entry>0x291C</entry>
395 <entry>RIGHTWARDS DOUBLE ARROW-TAIL</entry>
396 </row>
397 </tbody>
398
399 <tbody>
400 <row>
401 <entry>*</entry>
402 <entry>&starf;</entry>
403 <entry>0x2605</entry>
404 <entry>BLACK STAR</entry>
405 </row>
406 </tbody>
407
408 </tgroup>
409 </informaltable>
410 </sect2>
411
412 <sect2 id="magic-hash">
413 <title>The magic hash</title>
414 <para>The language extension <option>-XMagicHash</option> allows "&num;" as a
415 postfix modifier to identifiers. Thus, "x&num;" is a valid variable, and "T&num;" is
416 a valid type constructor or data constructor.</para>
417
418 <para>The hash sign does not change semantics at all. We tend to use variable
419 names ending in "&num;" for unboxed values or types (e.g. <literal>Int&num;</literal>),
420 but there is no requirement to do so; they are just plain ordinary variables.
421 Nor does the <option>-XMagicHash</option> extension bring anything into scope.
422 For example, to bring <literal>Int&num;</literal> into scope you must
423 import <literal>GHC.Prim</literal> (see <xref linkend="primitives"/>);
424 the <option>-XMagicHash</option> extension
425 then allows you to <emphasis>refer</emphasis> to the <literal>Int&num;</literal>
426 that is now in scope.</para>
427 <para> The <option>-XMagicHash</option> also enables some new forms of literals (see <xref linkend="glasgow-unboxed"/>):
428 <itemizedlist>
429 <listitem><para> <literal>'x'&num;</literal> has type <literal>Char&num;</literal></para> </listitem>
430 <listitem><para> <literal>&quot;foo&quot;&num;</literal> has type <literal>Addr&num;</literal></para> </listitem>
431 <listitem><para> <literal>3&num;</literal> has type <literal>Int&num;</literal>. In general,
432 any Haskell integer lexeme followed by a <literal>&num;</literal> is an <literal>Int&num;</literal> literal, e.g.
433 <literal>-0x3A&num;</literal> as well as <literal>32&num;</literal></para>.</listitem>
434 <listitem><para> <literal>3&num;&num;</literal> has type <literal>Word&num;</literal>. In general,
435 any non-negative Haskell integer lexeme followed by <literal>&num;&num;</literal>
436 is a <literal>Word&num;</literal>. </para> </listitem>
437 <listitem><para> <literal>3.2&num;</literal> has type <literal>Float&num;</literal>.</para> </listitem>
438 <listitem><para> <literal>3.2&num;&num;</literal> has type <literal>Double&num;</literal></para> </listitem>
439 </itemizedlist>
440 </para>
441 </sect2>
442
443 <sect2 id="negative-literals">
444 <title>Negative Literals</title>
445 <para>
446 The literal <literal>-123</literal> is, according to
447 Haskell98 and Haskell 2010, desugared as
448 <literal>negate (fromInteger 123)</literal>.
449 The language extension <option>-XNegativeLiterals</option>
450 means that it is instead desugared as
451 <literal>fromInteger (-123)</literal>.
452 </para>
453
454 <para>
455 This can make a difference when the positive and negative range of
456 a numeric data type don't match up. For example,
457 in 8-bit arithmetic -128 is representable, but +128 is not.
458 So <literal>negate (fromInteger 128)</literal> will elicit an
459 unexpected integer-literal-overflow message.
460 </para>
461 </sect2>
462
463 <sect2 id="num-decimals">
464 <title>Fractional looking integer literals</title>
465 <para>
466 Haskell 2010 and Haskell 98 define floating literals with
467 the syntax <literal>1.2e6</literal>. These literals have the
468 type <literal>Fractional a => Fractional</literal>.
469 </para>
470
471 <para>
472 The language extension <option>-XNumLiterals</option> allows
473 you to also use the floating literal syntax for instances of
474 <literal>Integral</literal>, and have values like
475 <literal>(1.2e6 :: Num a => a)</literal>
476 </para>
477 </sect2>
478
479
480 <!-- ====================== HIERARCHICAL MODULES ======================= -->
481
482
483 <sect2 id="hierarchical-modules">
484 <title>Hierarchical Modules</title>
485
486 <para>GHC supports a small extension to the syntax of module
487 names: a module name is allowed to contain a dot
488 <literal>&lsquo;.&rsquo;</literal>. This is also known as the
489 &ldquo;hierarchical module namespace&rdquo; extension, because
490 it extends the normally flat Haskell module namespace into a
491 more flexible hierarchy of modules.</para>
492
493 <para>This extension has very little impact on the language
494 itself; modules names are <emphasis>always</emphasis> fully
495 qualified, so you can just think of the fully qualified module
496 name as <quote>the module name</quote>. In particular, this
497 means that the full module name must be given after the
498 <literal>module</literal> keyword at the beginning of the
499 module; for example, the module <literal>A.B.C</literal> must
500 begin</para>
501
502 <programlisting>module A.B.C</programlisting>
503
504
505 <para>It is a common strategy to use the <literal>as</literal>
506 keyword to save some typing when using qualified names with
507 hierarchical modules. For example:</para>
508
509 <programlisting>
510 import qualified Control.Monad.ST.Strict as ST
511 </programlisting>
512
513 <para>For details on how GHC searches for source and interface
514 files in the presence of hierarchical modules, see <xref
515 linkend="search-path"/>.</para>
516
517 <para>GHC comes with a large collection of libraries arranged
518 hierarchically; see the accompanying <ulink
519 url="../libraries/index.html">library
520 documentation</ulink>. More libraries to install are available
521 from <ulink
522 url="http://hackage.haskell.org/packages/hackage.html">HackageDB</ulink>.</para>
523 </sect2>
524
525 <!-- ====================== PATTERN GUARDS ======================= -->
526
527 <sect2 id="pattern-guards">
528 <title>Pattern guards</title>
529
530 <para>
531 <indexterm><primary>Pattern guards (Glasgow extension)</primary></indexterm>
532 The discussion that follows is an abbreviated version of Simon Peyton Jones's original <ulink url="http://research.microsoft.com/~simonpj/Haskell/guards.html">proposal</ulink>. (Note that the proposal was written before pattern guards were implemented, so refers to them as unimplemented.)
533 </para>
534
535 <para>
536 Suppose we have an abstract data type of finite maps, with a
537 lookup operation:
538
539 <programlisting>
540 lookup :: FiniteMap -> Int -> Maybe Int
541 </programlisting>
542
543 The lookup returns <function>Nothing</function> if the supplied key is not in the domain of the mapping, and <function>(Just v)</function> otherwise,
544 where <varname>v</varname> is the value that the key maps to. Now consider the following definition:
545 </para>
546
547 <programlisting>
548 clunky env var1 var2 | ok1 &amp;&amp; ok2 = val1 + val2
549 | otherwise = var1 + var2
550 where
551 m1 = lookup env var1
552 m2 = lookup env var2
553 ok1 = maybeToBool m1
554 ok2 = maybeToBool m2
555 val1 = expectJust m1
556 val2 = expectJust m2
557 </programlisting>
558
559 <para>
560 The auxiliary functions are
561 </para>
562
563 <programlisting>
564 maybeToBool :: Maybe a -&gt; Bool
565 maybeToBool (Just x) = True
566 maybeToBool Nothing = False
567
568 expectJust :: Maybe a -&gt; a
569 expectJust (Just x) = x
570 expectJust Nothing = error "Unexpected Nothing"
571 </programlisting>
572
573 <para>
574 What is <function>clunky</function> doing? The guard <literal>ok1 &amp;&amp;
575 ok2</literal> checks that both lookups succeed, using
576 <function>maybeToBool</function> to convert the <function>Maybe</function>
577 types to booleans. The (lazily evaluated) <function>expectJust</function>
578 calls extract the values from the results of the lookups, and binds the
579 returned values to <varname>val1</varname> and <varname>val2</varname>
580 respectively. If either lookup fails, then clunky takes the
581 <literal>otherwise</literal> case and returns the sum of its arguments.
582 </para>
583
584 <para>
585 This is certainly legal Haskell, but it is a tremendously verbose and
586 un-obvious way to achieve the desired effect. Arguably, a more direct way
587 to write clunky would be to use case expressions:
588 </para>
589
590 <programlisting>
591 clunky env var1 var2 = case lookup env var1 of
592 Nothing -&gt; fail
593 Just val1 -&gt; case lookup env var2 of
594 Nothing -&gt; fail
595 Just val2 -&gt; val1 + val2
596 where
597 fail = var1 + var2
598 </programlisting>
599
600 <para>
601 This is a bit shorter, but hardly better. Of course, we can rewrite any set
602 of pattern-matching, guarded equations as case expressions; that is
603 precisely what the compiler does when compiling equations! The reason that
604 Haskell provides guarded equations is because they allow us to write down
605 the cases we want to consider, one at a time, independently of each other.
606 This structure is hidden in the case version. Two of the right-hand sides
607 are really the same (<function>fail</function>), and the whole expression
608 tends to become more and more indented.
609 </para>
610
611 <para>
612 Here is how I would write clunky:
613 </para>
614
615 <programlisting>
616 clunky env var1 var2
617 | Just val1 &lt;- lookup env var1
618 , Just val2 &lt;- lookup env var2
619 = val1 + val2
620 ...other equations for clunky...
621 </programlisting>
622
623 <para>
624 The semantics should be clear enough. The qualifiers are matched in order.
625 For a <literal>&lt;-</literal> qualifier, which I call a pattern guard, the
626 right hand side is evaluated and matched against the pattern on the left.
627 If the match fails then the whole guard fails and the next equation is
628 tried. If it succeeds, then the appropriate binding takes place, and the
629 next qualifier is matched, in the augmented environment. Unlike list
630 comprehensions, however, the type of the expression to the right of the
631 <literal>&lt;-</literal> is the same as the type of the pattern to its
632 left. The bindings introduced by pattern guards scope over all the
633 remaining guard qualifiers, and over the right hand side of the equation.
634 </para>
635
636 <para>
637 Just as with list comprehensions, boolean expressions can be freely mixed
638 with among the pattern guards. For example:
639 </para>
640
641 <programlisting>
642 f x | [y] &lt;- x
643 , y > 3
644 , Just z &lt;- h y
645 = ...
646 </programlisting>
647
648 <para>
649 Haskell's current guards therefore emerge as a special case, in which the
650 qualifier list has just one element, a boolean expression.
651 </para>
652 </sect2>
653
654 <!-- ===================== View patterns =================== -->
655
656 <sect2 id="view-patterns">
657 <title>View patterns
658 </title>
659
660 <para>
661 View patterns are enabled by the flag <literal>-XViewPatterns</literal>.
662 More information and examples of view patterns can be found on the
663 <ulink url="http://hackage.haskell.org/trac/ghc/wiki/ViewPatterns">Wiki
664 page</ulink>.
665 </para>
666
667 <para>
668 View patterns are somewhat like pattern guards that can be nested inside
669 of other patterns. They are a convenient way of pattern-matching
670 against values of abstract types. For example, in a programming language
671 implementation, we might represent the syntax of the types of the
672 language as follows:
673
674 <programlisting>
675 type Typ
676
677 data TypView = Unit
678 | Arrow Typ Typ
679
680 view :: Typ -> TypView
681
682 -- additional operations for constructing Typ's ...
683 </programlisting>
684
685 The representation of Typ is held abstract, permitting implementations
686 to use a fancy representation (e.g., hash-consing to manage sharing).
687
688 Without view patterns, using this signature a little inconvenient:
689 <programlisting>
690 size :: Typ -> Integer
691 size t = case view t of
692 Unit -> 1
693 Arrow t1 t2 -> size t1 + size t2
694 </programlisting>
695
696 It is necessary to iterate the case, rather than using an equational
697 function definition. And the situation is even worse when the matching
698 against <literal>t</literal> is buried deep inside another pattern.
699 </para>
700
701 <para>
702 View patterns permit calling the view function inside the pattern and
703 matching against the result:
704 <programlisting>
705 size (view -> Unit) = 1
706 size (view -> Arrow t1 t2) = size t1 + size t2
707 </programlisting>
708
709 That is, we add a new form of pattern, written
710 <replaceable>expression</replaceable> <literal>-></literal>
711 <replaceable>pattern</replaceable> that means "apply the expression to
712 whatever we're trying to match against, and then match the result of
713 that application against the pattern". The expression can be any Haskell
714 expression of function type, and view patterns can be used wherever
715 patterns are used.
716 </para>
717
718 <para>
719 The semantics of a pattern <literal>(</literal>
720 <replaceable>exp</replaceable> <literal>-></literal>
721 <replaceable>pat</replaceable> <literal>)</literal> are as follows:
722
723 <itemizedlist>
724
725 <listitem> Scoping:
726
727 <para>The variables bound by the view pattern are the variables bound by
728 <replaceable>pat</replaceable>.
729 </para>
730
731 <para>
732 Any variables in <replaceable>exp</replaceable> are bound occurrences,
733 but variables bound "to the left" in a pattern are in scope. This
734 feature permits, for example, one argument to a function to be used in
735 the view of another argument. For example, the function
736 <literal>clunky</literal> from <xref linkend="pattern-guards" /> can be
737 written using view patterns as follows:
738
739 <programlisting>
740 clunky env (lookup env -> Just val1) (lookup env -> Just val2) = val1 + val2
741 ...other equations for clunky...
742 </programlisting>
743 </para>
744
745 <para>
746 More precisely, the scoping rules are:
747 <itemizedlist>
748 <listitem>
749 <para>
750 In a single pattern, variables bound by patterns to the left of a view
751 pattern expression are in scope. For example:
752 <programlisting>
753 example :: Maybe ((String -> Integer,Integer), String) -> Bool
754 example Just ((f,_), f -> 4) = True
755 </programlisting>
756
757 Additionally, in function definitions, variables bound by matching earlier curried
758 arguments may be used in view pattern expressions in later arguments:
759 <programlisting>
760 example :: (String -> Integer) -> String -> Bool
761 example f (f -> 4) = True
762 </programlisting>
763 That is, the scoping is the same as it would be if the curried arguments
764 were collected into a tuple.
765 </para>
766 </listitem>
767
768 <listitem>
769 <para>
770 In mutually recursive bindings, such as <literal>let</literal>,
771 <literal>where</literal>, or the top level, view patterns in one
772 declaration may not mention variables bound by other declarations. That
773 is, each declaration must be self-contained. For example, the following
774 program is not allowed:
775 <programlisting>
776 let {(x -> y) = e1 ;
777 (y -> x) = e2 } in x
778 </programlisting>
779
780 (For some amplification on this design choice see
781 <ulink url="http://hackage.haskell.org/trac/ghc/ticket/4061">Trac #4061</ulink>.)
782
783 </para>
784 </listitem>
785 </itemizedlist>
786
787 </para>
788 </listitem>
789
790 <listitem><para> Typing: If <replaceable>exp</replaceable> has type
791 <replaceable>T1</replaceable> <literal>-></literal>
792 <replaceable>T2</replaceable> and <replaceable>pat</replaceable> matches
793 a <replaceable>T2</replaceable>, then the whole view pattern matches a
794 <replaceable>T1</replaceable>.
795 </para></listitem>
796
797 <listitem><para> Matching: To the equations in Section 3.17.3 of the
798 <ulink url="http://www.haskell.org/onlinereport/">Haskell 98
799 Report</ulink>, add the following:
800 <programlisting>
801 case v of { (e -> p) -> e1 ; _ -> e2 }
802 =
803 case (e v) of { p -> e1 ; _ -> e2 }
804 </programlisting>
805 That is, to match a variable <replaceable>v</replaceable> against a pattern
806 <literal>(</literal> <replaceable>exp</replaceable>
807 <literal>-></literal> <replaceable>pat</replaceable>
808 <literal>)</literal>, evaluate <literal>(</literal>
809 <replaceable>exp</replaceable> <replaceable> v</replaceable>
810 <literal>)</literal> and match the result against
811 <replaceable>pat</replaceable>.
812 </para></listitem>
813
814 <listitem><para> Efficiency: When the same view function is applied in
815 multiple branches of a function definition or a case expression (e.g.,
816 in <literal>size</literal> above), GHC makes an attempt to collect these
817 applications into a single nested case expression, so that the view
818 function is only applied once. Pattern compilation in GHC follows the
819 matrix algorithm described in Chapter 4 of <ulink
820 url="http://research.microsoft.com/~simonpj/Papers/slpj-book-1987/">The
821 Implementation of Functional Programming Languages</ulink>. When the
822 top rows of the first column of a matrix are all view patterns with the
823 "same" expression, these patterns are transformed into a single nested
824 case. This includes, for example, adjacent view patterns that line up
825 in a tuple, as in
826 <programlisting>
827 f ((view -> A, p1), p2) = e1
828 f ((view -> B, p3), p4) = e2
829 </programlisting>
830 </para>
831
832 <para> The current notion of when two view pattern expressions are "the
833 same" is very restricted: it is not even full syntactic equality.
834 However, it does include variables, literals, applications, and tuples;
835 e.g., two instances of <literal>view ("hi", "there")</literal> will be
836 collected. However, the current implementation does not compare up to
837 alpha-equivalence, so two instances of <literal>(x, view x ->
838 y)</literal> will not be coalesced.
839 </para>
840
841 </listitem>
842
843 </itemizedlist>
844 </para>
845
846 </sect2>
847
848 <!-- ===================== n+k patterns =================== -->
849
850 <sect2 id="n-k-patterns">
851 <title>n+k patterns</title>
852 <indexterm><primary><option>-XNPlusKPatterns</option></primary></indexterm>
853
854 <para>
855 <literal>n+k</literal> pattern support is disabled by default. To enable
856 it, you can use the <option>-XNPlusKPatterns</option> flag.
857 </para>
858
859 </sect2>
860
861 <!-- ===================== Traditional record syntax =================== -->
862
863 <sect2 id="traditional-record-syntax">
864 <title>Traditional record syntax</title>
865 <indexterm><primary><option>-XNoTraditionalRecordSyntax</option></primary></indexterm>
866
867 <para>
868 Traditional record syntax, such as <literal>C {f = x}</literal>, is enabled by default.
869 To disable it, you can use the <option>-XNoTraditionalRecordSyntax</option> flag.
870 </para>
871
872 </sect2>
873
874 <!-- ===================== Recursive do-notation =================== -->
875
876 <sect2 id="recursive-do-notation">
877 <title>The recursive do-notation
878 </title>
879
880 <para>
881 The do-notation of Haskell 98 does not allow <emphasis>recursive bindings</emphasis>,
882 that is, the variables bound in a do-expression are visible only in the textually following
883 code block. Compare this to a let-expression, where bound variables are visible in the entire binding
884 group.
885 </para>
886
887 <para>
888 It turns out that such recursive bindings do indeed make sense for a variety of monads, but
889 not all. In particular, recursion in this sense requires a fixed-point operator for the underlying
890 monad, captured by the <literal>mfix</literal> method of the <literal>MonadFix</literal> class, defined in <literal>Control.Monad.Fix</literal> as follows:
891 <programlisting>
892 class Monad m => MonadFix m where
893 mfix :: (a -> m a) -> m a
894 </programlisting>
895 Haskell's
896 <literal>Maybe</literal>, <literal>[]</literal> (list), <literal>ST</literal> (both strict and lazy versions),
897 <literal>IO</literal>, and many other monads have <literal>MonadFix</literal> instances. On the negative
898 side, the continuation monad, with the signature <literal>(a -> r) -> r</literal>, does not.
899 </para>
900
901 <para>
902 For monads that do belong to the <literal>MonadFix</literal> class, GHC provides
903 an extended version of the do-notation that allows recursive bindings.
904 The <option>-XRecursiveDo</option> (language pragma: <literal>RecursiveDo</literal>)
905 provides the necessary syntactic support, introducing the keywords <literal>mdo</literal> and
906 <literal>rec</literal> for higher and lower levels of the notation respectively. Unlike
907 bindings in a <literal>do</literal> expression, those introduced by <literal>mdo</literal> and <literal>rec</literal>
908 are recursively defined, much like in an ordinary let-expression. Due to the new
909 keyword <literal>mdo</literal>, we also call this notation the <emphasis>mdo-notation</emphasis>.
910 </para>
911
912 <para>
913 Here is a simple (albeit contrived) example:
914 <programlisting>
915 {-# LANGUAGE RecursiveDo #-}
916 justOnes = mdo { xs &lt;- Just (1:xs)
917 ; return (map negate xs) }
918 </programlisting>
919 or equivalently
920 <programlisting>
921 {-# LANGUAGE RecursiveDo #-}
922 justOnes = do { rec { xs &lt;- Just (1:xs) }
923 ; return (map negate xs) }
924 </programlisting>
925 As you can guess <literal>justOnes</literal> will evaluate to <literal>Just [-1,-1,-1,...</literal>.
926 </para>
927
928 <para>
929 GHC's implementation the mdo-notation closely follows the original translation as described in the paper
930 <ulink url="https://sites.google.com/site/leventerkok/recdo.pdf">A recursive do for Haskell</ulink>, which
931 in turn is based on the work <ulink url="http://sites.google.com/site/leventerkok/erkok-thesis.pdf">Value Recursion
932 in Monadic Computations</ulink>. Furthermore, GHC extends the syntax described in the former paper
933 with a lower level syntax flagged by the <literal>rec</literal> keyword, as we describe next.
934 </para>
935
936 <sect3>
937 <title>Recursive binding groups</title>
938
939 <para>
940 The flag <option>-XRecursiveDo</option> also introduces a new keyword <literal>rec</literal>, which wraps a
941 mutually-recursive group of monadic statements inside a <literal>do</literal> expression, producing a single statement.
942 Similar to a <literal>let</literal> statement inside a <literal>do</literal>, variables bound in
943 the <literal>rec</literal> are visible throughout the <literal>rec</literal> group, and below it. For example, compare
944 <programlisting>
945 do { a &lt;- getChar do { a &lt;- getChar
946 ; let { r1 = f a r2 ; rec { r1 &lt;- f a r2
947 ; ; r2 = g r1 } ; ; r2 &lt;- g r1 }
948 ; return (r1 ++ r2) } ; return (r1 ++ r2) }
949 </programlisting>
950 In both cases, <literal>r1</literal> and <literal>r2</literal> are available both throughout
951 the <literal>let</literal> or <literal>rec</literal> block, and in the statements that follow it.
952 The difference is that <literal>let</literal> is non-monadic, while <literal>rec</literal> is monadic.
953 (In Haskell <literal>let</literal> is really <literal>letrec</literal>, of course.)
954 </para>
955
956 <para>
957 The semantics of <literal>rec</literal> is fairly straightforward. Whenever GHC finds a <literal>rec</literal>
958 group, it will compute its set of bound variables, and will introduce an appropriate call
959 to the underlying monadic value-recursion operator <literal>mfix</literal>, belonging to the
960 <literal>MonadFix</literal> class. Here is an example:
961 <programlisting>
962 rec { b &lt;- f a c ===> (b,c) &lt;- mfix (\ ~(b,c) -> do { b &lt;- f a c
963 ; c &lt;- f b a } ; c &lt;- f b a
964 ; return (b,c) })
965 </programlisting>
966 As usual, the meta-variables <literal>b</literal>, <literal>c</literal> etc., can be arbitrary patterns.
967 In general, the statement <literal>rec <replaceable>ss</replaceable></literal> is desugared to the statement
968 <programlisting>
969 <replaceable>vs</replaceable> &lt;- mfix (\ ~<replaceable>vs</replaceable> -&gt; do { <replaceable>ss</replaceable>; return <replaceable>vs</replaceable> })
970 </programlisting>
971 where <replaceable>vs</replaceable> is a tuple of the variables bound by <replaceable>ss</replaceable>.
972 </para>
973
974 <para>
975 Note in particular that the translation for a <literal>rec</literal> block only involves wrapping a call
976 to <literal>mfix</literal>: it performs no other analysis on the bindings. The latter is the task
977 for the <literal>mdo</literal> notation, which is described next.
978 </para>
979 </sect3>
980
981 <sect3>
982 <title>The <literal>mdo</literal> notation</title>
983
984 <para>
985 A <literal>rec</literal>-block tells the compiler where precisely the recursive knot should be tied. It turns out that
986 the placement of the recursive knots can be rather delicate: in particular, we would like the knots to be wrapped
987 around as minimal groups as possible. This process is known as <emphasis>segmentation</emphasis>, and is described
988 in detail in Secton 3.2 of <ulink url="https://sites.google.com/site/leventerkok/recdo.pdf">A recursive do for
989 Haskell</ulink>. Segmentation improves polymorphism and reduces the size of the recursive knot. Most importantly, it avoids
990 unnecessary interference caused by a fundamental issue with the so-called <emphasis>right-shrinking</emphasis>
991 axiom for monadic recursion. In brief, most monads of interest (IO, strict state, etc.) do <emphasis>not</emphasis>
992 have recursion operators that satisfy this axiom, and thus not performing segmentation can cause unnecessary
993 interference, changing the termination behavior of the resulting translation.
994 (Details can be found in Sections 3.1 and 7.2.2 of
995 <ulink url="http://sites.google.com/site/leventerkok/erkok-thesis.pdf">Value Recursion in Monadic Computations</ulink>.)
996 </para>
997
998 <para>
999 The <literal>mdo</literal> notation removes the burden of placing
1000 explicit <literal>rec</literal> blocks in the code. Unlike an
1001 ordinary <literal>do</literal> expression, in which variables bound by
1002 statements are only in scope for later statements, variables bound in
1003 an <literal>mdo</literal> expression are in scope for all statements
1004 of the expression. The compiler then automatically identifies minimal
1005 mutually recursively dependent segments of statements, treating them as
1006 if the user had wrapped a <literal>rec</literal> qualifier around them.
1007 </para>
1008
1009 <para>
1010 The definition is syntactic:
1011 </para>
1012 <itemizedlist>
1013 <listitem>
1014 <para>
1015 A generator <replaceable>g</replaceable>
1016 <emphasis>depends</emphasis> on a textually following generator
1017 <replaceable>g'</replaceable>, if
1018 </para>
1019 <itemizedlist>
1020 <listitem>
1021 <para>
1022 <replaceable>g'</replaceable> defines a variable that
1023 is used by <replaceable>g</replaceable>, or
1024 </para>
1025 </listitem>
1026 <listitem>
1027 <para>
1028 <replaceable>g'</replaceable> textually appears between
1029 <replaceable>g</replaceable> and
1030 <replaceable>g''</replaceable>, where <replaceable>g</replaceable>
1031 depends on <replaceable>g''</replaceable>.
1032 </para>
1033 </listitem>
1034 </itemizedlist>
1035 </listitem>
1036 <listitem>
1037 <para>
1038 A <emphasis>segment</emphasis> of a given
1039 <literal>mdo</literal>-expression is a minimal sequence of generators
1040 such that no generator of the sequence depends on an outside
1041 generator. As a special case, although it is not a generator,
1042 the final expression in an <literal>mdo</literal>-expression is
1043 considered to form a segment by itself.
1044 </para>
1045 </listitem>
1046 </itemizedlist>
1047 <para>
1048 Segments in this sense are
1049 related to <emphasis>strongly-connected components</emphasis> analysis,
1050 with the exception that bindings in a segment cannot be reordered and
1051 must be contiguous.
1052 </para>
1053
1054 <para>
1055 Here is an example <literal>mdo</literal>-expression, and its translation to <literal>rec</literal> blocks:
1056 <programlisting>
1057 mdo { a &lt;- getChar ===> do { a &lt;- getChar
1058 ; b &lt;- f a c ; rec { b &lt;- f a c
1059 ; c &lt;- f b a ; ; c &lt;- f b a }
1060 ; z &lt;- h a b ; z &lt;- h a b
1061 ; d &lt;- g d e ; rec { d &lt;- g d e
1062 ; e &lt;- g a z ; ; e &lt;- g a z }
1063 ; putChar c } ; putChar c }
1064 </programlisting>
1065 Note that a given <literal>mdo</literal> expression can cause the creation of multiple <literal>rec</literal> blocks.
1066 If there are no recursive dependencies, <literal>mdo</literal> will introduce no <literal>rec</literal> blocks. In this
1067 latter case an <literal>mdo</literal> expression is precisely the same as a <literal>do</literal> expression, as one
1068 would expect.
1069 </para>
1070
1071 <para>
1072 In summary, given an <literal>mdo</literal> expression, GHC first performs segmentation, introducing
1073 <literal>rec</literal> blocks to wrap over minimal recursive groups. Then, each resulting
1074 <literal>rec</literal> is desugared, using a call to <literal>Control.Monad.Fix.mfix</literal> as described
1075 in the previous section. The original <literal>mdo</literal>-expression typechecks exactly when the desugared
1076 version would do so.
1077 </para>
1078
1079 <para>
1080 Here are some other important points in using the recursive-do notation:
1081
1082 <itemizedlist>
1083 <listitem>
1084 <para>
1085 It is enabled with the flag <literal>-XRecursiveDo</literal>, or the <literal>LANGUAGE RecursiveDo</literal>
1086 pragma. (The same flag enables both <literal>mdo</literal>-notation, and the use of <literal>rec</literal>
1087 blocks inside <literal>do</literal> expressions.)
1088 </para>
1089 </listitem>
1090 <listitem>
1091 <para>
1092 <literal>rec</literal> blocks can also be used inside <literal>mdo</literal>-expressions, which will be
1093 treated as a single statement. However, it is good style to either use <literal>mdo</literal> or
1094 <literal>rec</literal> blocks in a single expression.
1095 </para>
1096 </listitem>
1097 <listitem>
1098 <para>
1099 If recursive bindings are required for a monad, then that monad must be declared an instance of
1100 the <literal>MonadFix</literal> class.
1101 </para>
1102 </listitem>
1103 <listitem>
1104 <para>
1105 The following instances of <literal>MonadFix</literal> are automatically provided: List, Maybe, IO.
1106 Furthermore, the <literal>Control.Monad.ST</literal> and <literal>Control.Monad.ST.Lazy</literal>
1107 modules provide the instances of the <literal>MonadFix</literal> class for Haskell's internal
1108 state monad (strict and lazy, respectively).
1109 </para>
1110 </listitem>
1111 <listitem>
1112 <para>
1113 Like <literal>let</literal> and <literal>where</literal> bindings, name shadowing is not allowed within
1114 an <literal>mdo</literal>-expression or a <literal>rec</literal>-block; that is, all the names bound in
1115 a single <literal>rec</literal> must be distinct. (GHC will complain if this is not the case.)
1116 </para>
1117 </listitem>
1118 </itemizedlist>
1119 </para>
1120 </sect3>
1121
1122
1123 </sect2>
1124
1125
1126 <!-- ===================== PARALLEL LIST COMPREHENSIONS =================== -->
1127
1128 <sect2 id="parallel-list-comprehensions">
1129 <title>Parallel List Comprehensions</title>
1130 <indexterm><primary>list comprehensions</primary><secondary>parallel</secondary>
1131 </indexterm>
1132 <indexterm><primary>parallel list comprehensions</primary>
1133 </indexterm>
1134
1135 <para>Parallel list comprehensions are a natural extension to list
1136 comprehensions. List comprehensions can be thought of as a nice
1137 syntax for writing maps and filters. Parallel comprehensions
1138 extend this to include the zipWith family.</para>
1139
1140 <para>A parallel list comprehension has multiple independent
1141 branches of qualifier lists, each separated by a `|' symbol. For
1142 example, the following zips together two lists:</para>
1143
1144 <programlisting>
1145 [ (x, y) | x &lt;- xs | y &lt;- ys ]
1146 </programlisting>
1147
1148 <para>The behaviour of parallel list comprehensions follows that of
1149 zip, in that the resulting list will have the same length as the
1150 shortest branch.</para>
1151
1152 <para>We can define parallel list comprehensions by translation to
1153 regular comprehensions. Here's the basic idea:</para>
1154
1155 <para>Given a parallel comprehension of the form: </para>
1156
1157 <programlisting>
1158 [ e | p1 &lt;- e11, p2 &lt;- e12, ...
1159 | q1 &lt;- e21, q2 &lt;- e22, ...
1160 ...
1161 ]
1162 </programlisting>
1163
1164 <para>This will be translated to: </para>
1165
1166 <programlisting>
1167 [ e | ((p1,p2), (q1,q2), ...) &lt;- zipN [(p1,p2) | p1 &lt;- e11, p2 &lt;- e12, ...]
1168 [(q1,q2) | q1 &lt;- e21, q2 &lt;- e22, ...]
1169 ...
1170 ]
1171 </programlisting>
1172
1173 <para>where `zipN' is the appropriate zip for the given number of
1174 branches.</para>
1175
1176 </sect2>
1177
1178 <!-- ===================== TRANSFORM LIST COMPREHENSIONS =================== -->
1179
1180 <sect2 id="generalised-list-comprehensions">
1181 <title>Generalised (SQL-Like) List Comprehensions</title>
1182 <indexterm><primary>list comprehensions</primary><secondary>generalised</secondary>
1183 </indexterm>
1184 <indexterm><primary>extended list comprehensions</primary>
1185 </indexterm>
1186 <indexterm><primary>group</primary></indexterm>
1187 <indexterm><primary>sql</primary></indexterm>
1188
1189
1190 <para>Generalised list comprehensions are a further enhancement to the
1191 list comprehension syntactic sugar to allow operations such as sorting
1192 and grouping which are familiar from SQL. They are fully described in the
1193 paper <ulink url="http://research.microsoft.com/~simonpj/papers/list-comp">
1194 Comprehensive comprehensions: comprehensions with "order by" and "group by"</ulink>,
1195 except that the syntax we use differs slightly from the paper.</para>
1196 <para>The extension is enabled with the flag <option>-XTransformListComp</option>.</para>
1197 <para>Here is an example:
1198 <programlisting>
1199 employees = [ ("Simon", "MS", 80)
1200 , ("Erik", "MS", 100)
1201 , ("Phil", "Ed", 40)
1202 , ("Gordon", "Ed", 45)
1203 , ("Paul", "Yale", 60)]
1204
1205 output = [ (the dept, sum salary)
1206 | (name, dept, salary) &lt;- employees
1207 , then group by dept using groupWith
1208 , then sortWith by (sum salary)
1209 , then take 5 ]
1210 </programlisting>
1211 In this example, the list <literal>output</literal> would take on
1212 the value:
1213
1214 <programlisting>
1215 [("Yale", 60), ("Ed", 85), ("MS", 180)]
1216 </programlisting>
1217 </para>
1218 <para>There are three new keywords: <literal>group</literal>, <literal>by</literal>, and <literal>using</literal>.
1219 (The functions <literal>sortWith</literal> and <literal>groupWith</literal> are not keywords; they are ordinary
1220 functions that are exported by <literal>GHC.Exts</literal>.)</para>
1221
1222 <para>There are five new forms of comprehension qualifier,
1223 all introduced by the (existing) keyword <literal>then</literal>:
1224 <itemizedlist>
1225 <listitem>
1226
1227 <programlisting>
1228 then f
1229 </programlisting>
1230
1231 This statement requires that <literal>f</literal> have the type <literal>
1232 forall a. [a] -> [a]</literal>. You can see an example of its use in the
1233 motivating example, as this form is used to apply <literal>take 5</literal>.
1234
1235 </listitem>
1236
1237
1238 <listitem>
1239 <para>
1240 <programlisting>
1241 then f by e
1242 </programlisting>
1243
1244 This form is similar to the previous one, but allows you to create a function
1245 which will be passed as the first argument to f. As a consequence f must have
1246 the type <literal>forall a. (a -> t) -> [a] -> [a]</literal>. As you can see
1247 from the type, this function lets f &quot;project out&quot; some information
1248 from the elements of the list it is transforming.</para>
1249
1250 <para>An example is shown in the opening example, where <literal>sortWith</literal>
1251 is supplied with a function that lets it find out the <literal>sum salary</literal>
1252 for any item in the list comprehension it transforms.</para>
1253
1254 </listitem>
1255
1256
1257 <listitem>
1258
1259 <programlisting>
1260 then group by e using f
1261 </programlisting>
1262
1263 <para>This is the most general of the grouping-type statements. In this form,
1264 f is required to have type <literal>forall a. (a -> t) -> [a] -> [[a]]</literal>.
1265 As with the <literal>then f by e</literal> case above, the first argument
1266 is a function supplied to f by the compiler which lets it compute e on every
1267 element of the list being transformed. However, unlike the non-grouping case,
1268 f additionally partitions the list into a number of sublists: this means that
1269 at every point after this statement, binders occurring before it in the comprehension
1270 refer to <emphasis>lists</emphasis> of possible values, not single values. To help understand
1271 this, let's look at an example:</para>
1272
1273 <programlisting>
1274 -- This works similarly to groupWith in GHC.Exts, but doesn't sort its input first
1275 groupRuns :: Eq b => (a -> b) -> [a] -> [[a]]
1276 groupRuns f = groupBy (\x y -> f x == f y)
1277
1278 output = [ (the x, y)
1279 | x &lt;- ([1..3] ++ [1..2])
1280 , y &lt;- [4..6]
1281 , then group by x using groupRuns ]
1282 </programlisting>
1283
1284 <para>This results in the variable <literal>output</literal> taking on the value below:</para>
1285
1286 <programlisting>
1287 [(1, [4, 5, 6]), (2, [4, 5, 6]), (3, [4, 5, 6]), (1, [4, 5, 6]), (2, [4, 5, 6])]
1288 </programlisting>
1289
1290 <para>Note that we have used the <literal>the</literal> function to change the type
1291 of x from a list to its original numeric type. The variable y, in contrast, is left
1292 unchanged from the list form introduced by the grouping.</para>
1293
1294 </listitem>
1295
1296 <listitem>
1297
1298 <programlisting>
1299 then group using f
1300 </programlisting>
1301
1302 <para>With this form of the group statement, f is required to simply have the type
1303 <literal>forall a. [a] -> [[a]]</literal>, which will be used to group up the
1304 comprehension so far directly. An example of this form is as follows:</para>
1305
1306 <programlisting>
1307 output = [ x
1308 | y &lt;- [1..5]
1309 , x &lt;- "hello"
1310 , then group using inits]
1311 </programlisting>
1312
1313 <para>This will yield a list containing every prefix of the word "hello" written out 5 times:</para>
1314
1315 <programlisting>
1316 ["","h","he","hel","hell","hello","helloh","hellohe","hellohel","hellohell","hellohello","hellohelloh",...]
1317 </programlisting>
1318
1319 </listitem>
1320 </itemizedlist>
1321 </para>
1322 </sect2>
1323
1324 <!-- ===================== MONAD COMPREHENSIONS ===================== -->
1325
1326 <sect2 id="monad-comprehensions">
1327 <title>Monad comprehensions</title>
1328 <indexterm><primary>monad comprehensions</primary></indexterm>
1329
1330 <para>
1331 Monad comprehensions generalise the list comprehension notation,
1332 including parallel comprehensions
1333 (<xref linkend="parallel-list-comprehensions"/>) and
1334 transform comprehensions (<xref linkend="generalised-list-comprehensions"/>)
1335 to work for any monad.
1336 </para>
1337
1338 <para>Monad comprehensions support:</para>
1339
1340 <itemizedlist>
1341 <listitem>
1342 <para>
1343 Bindings:
1344 </para>
1345
1346 <programlisting>
1347 [ x + y | x &lt;- Just 1, y &lt;- Just 2 ]
1348 </programlisting>
1349
1350 <para>
1351 Bindings are translated with the <literal>(&gt;&gt;=)</literal> and
1352 <literal>return</literal> functions to the usual do-notation:
1353 </para>
1354
1355 <programlisting>
1356 do x &lt;- Just 1
1357 y &lt;- Just 2
1358 return (x+y)
1359 </programlisting>
1360
1361 </listitem>
1362 <listitem>
1363 <para>
1364 Guards:
1365 </para>
1366
1367 <programlisting>
1368 [ x | x &lt;- [1..10], x &lt;= 5 ]
1369 </programlisting>
1370
1371 <para>
1372 Guards are translated with the <literal>guard</literal> function,
1373 which requires a <literal>MonadPlus</literal> instance:
1374 </para>
1375
1376 <programlisting>
1377 do x &lt;- [1..10]
1378 guard (x &lt;= 5)
1379 return x
1380 </programlisting>
1381
1382 </listitem>
1383 <listitem>
1384 <para>
1385 Transform statements (as with <literal>-XTransformListComp</literal>):
1386 </para>
1387
1388 <programlisting>
1389 [ x+y | x &lt;- [1..10], y &lt;- [1..x], then take 2 ]
1390 </programlisting>
1391
1392 <para>
1393 This translates to:
1394 </para>
1395
1396 <programlisting>
1397 do (x,y) &lt;- take 2 (do x &lt;- [1..10]
1398 y &lt;- [1..x]
1399 return (x,y))
1400 return (x+y)
1401 </programlisting>
1402
1403 </listitem>
1404 <listitem>
1405 <para>
1406 Group statements (as with <literal>-XTransformListComp</literal>):
1407 </para>
1408
1409 <programlisting>
1410 [ x | x &lt;- [1,1,2,2,3], then group by x using GHC.Exts.groupWith ]
1411 [ x | x &lt;- [1,1,2,2,3], then group using myGroup ]
1412 </programlisting>
1413
1414 </listitem>
1415 <listitem>
1416 <para>
1417 Parallel statements (as with <literal>-XParallelListComp</literal>):
1418 </para>
1419
1420 <programlisting>
1421 [ (x+y) | x &lt;- [1..10]
1422 | y &lt;- [11..20]
1423 ]
1424 </programlisting>
1425
1426 <para>
1427 Parallel statements are translated using the
1428 <literal>mzip</literal> function, which requires a
1429 <literal>MonadZip</literal> instance defined in
1430 <ulink url="&libraryBaseLocation;/Control-Monad-Zip.html"><literal>Control.Monad.Zip</literal></ulink>:
1431 </para>
1432
1433 <programlisting>
1434 do (x,y) &lt;- mzip (do x &lt;- [1..10]
1435 return x)
1436 (do y &lt;- [11..20]
1437 return y)
1438 return (x+y)
1439 </programlisting>
1440
1441 </listitem>
1442 </itemizedlist>
1443
1444 <para>
1445 All these features are enabled by default if the
1446 <literal>MonadComprehensions</literal> extension is enabled. The types
1447 and more detailed examples on how to use comprehensions are explained
1448 in the previous chapters <xref
1449 linkend="generalised-list-comprehensions"/> and <xref
1450 linkend="parallel-list-comprehensions"/>. In general you just have
1451 to replace the type <literal>[a]</literal> with the type
1452 <literal>Monad m => m a</literal> for monad comprehensions.
1453 </para>
1454
1455 <para>
1456 Note: Even though most of these examples are using the list monad,
1457 monad comprehensions work for any monad.
1458 The <literal>base</literal> package offers all necessary instances for
1459 lists, which make <literal>MonadComprehensions</literal> backward
1460 compatible to built-in, transform and parallel list comprehensions.
1461 </para>
1462 <para> More formally, the desugaring is as follows. We write <literal>D[ e | Q]</literal>
1463 to mean the desugaring of the monad comprehension <literal>[ e | Q]</literal>:
1464 <programlisting>
1465 Expressions: e
1466 Declarations: d
1467 Lists of qualifiers: Q,R,S
1468
1469 -- Basic forms
1470 D[ e | ] = return e
1471 D[ e | p &lt;- e, Q ] = e &gt;&gt;= \p -&gt; D[ e | Q ]
1472 D[ e | e, Q ] = guard e &gt;&gt; \p -&gt; D[ e | Q ]
1473 D[ e | let d, Q ] = let d in D[ e | Q ]
1474
1475 -- Parallel comprehensions (iterate for multiple parallel branches)
1476 D[ e | (Q | R), S ] = mzip D[ Qv | Q ] D[ Rv | R ] &gt;&gt;= \(Qv,Rv) -&gt; D[ e | S ]
1477
1478 -- Transform comprehensions
1479 D[ e | Q then f, R ] = f D[ Qv | Q ] &gt;&gt;= \Qv -&gt; D[ e | R ]
1480
1481 D[ e | Q then f by b, R ] = f (\Qv -&gt; b) D[ Qv | Q ] &gt;&gt;= \Qv -&gt; D[ e | R ]
1482
1483 D[ e | Q then group using f, R ] = f D[ Qv | Q ] &gt;&gt;= \ys -&gt;
1484 case (fmap selQv1 ys, ..., fmap selQvn ys) of
1485 Qv -&gt; D[ e | R ]
1486
1487 D[ e | Q then group by b using f, R ] = f (\Qv -&gt; b) D[ Qv | Q ] &gt;&gt;= \ys -&gt;
1488 case (fmap selQv1 ys, ..., fmap selQvn ys) of
1489 Qv -&gt; D[ e | R ]
1490
1491 where Qv is the tuple of variables bound by Q (and used subsequently)
1492 selQvi is a selector mapping Qv to the ith component of Qv
1493
1494 Operator Standard binding Expected type
1495 --------------------------------------------------------------------
1496 return GHC.Base t1 -&gt; m t2
1497 (&gt;&gt;=) GHC.Base m1 t1 -&gt; (t2 -&gt; m2 t3) -&gt; m3 t3
1498 (&gt;&gt;) GHC.Base m1 t1 -&gt; m2 t2 -&gt; m3 t3
1499 guard Control.Monad t1 -&gt; m t2
1500 fmap GHC.Base forall a b. (a-&gt;b) -&gt; n a -&gt; n b
1501 mzip Control.Monad.Zip forall a b. m a -&gt; m b -&gt; m (a,b)
1502 </programlisting>
1503 The comprehension should typecheck when its desugaring would typecheck.
1504 </para>
1505 <para>
1506 Monad comprehensions support rebindable syntax (<xref linkend="rebindable-syntax"/>).
1507 Without rebindable
1508 syntax, the operators from the "standard binding" module are used; with
1509 rebindable syntax, the operators are looked up in the current lexical scope.
1510 For example, parallel comprehensions will be typechecked and desugared
1511 using whatever "<literal>mzip</literal>" is in scope.
1512 </para>
1513 <para>
1514 The rebindable operators must have the "Expected type" given in the
1515 table above. These types are surprisingly general. For example, you can
1516 use a bind operator with the type
1517 <programlisting>
1518 (>>=) :: T x y a -> (a -> T y z b) -> T x z b
1519 </programlisting>
1520 In the case of transform comprehensions, notice that the groups are
1521 parameterised over some arbitrary type <literal>n</literal> (provided it
1522 has an <literal>fmap</literal>, as well as
1523 the comprehension being over an arbitrary monad.
1524 </para>
1525 </sect2>
1526
1527 <!-- ===================== REBINDABLE SYNTAX =================== -->
1528
1529 <sect2 id="rebindable-syntax">
1530 <title>Rebindable syntax and the implicit Prelude import</title>
1531
1532 <para><indexterm><primary>-XNoImplicitPrelude
1533 option</primary></indexterm> GHC normally imports
1534 <filename>Prelude.hi</filename> files for you. If you'd
1535 rather it didn't, then give it a
1536 <option>-XNoImplicitPrelude</option> option. The idea is
1537 that you can then import a Prelude of your own. (But don't
1538 call it <literal>Prelude</literal>; the Haskell module
1539 namespace is flat, and you must not conflict with any
1540 Prelude module.)</para>
1541
1542 <para>Suppose you are importing a Prelude of your own
1543 in order to define your own numeric class
1544 hierarchy. It completely defeats that purpose if the
1545 literal "1" means "<literal>Prelude.fromInteger
1546 1</literal>", which is what the Haskell Report specifies.
1547 So the <option>-XRebindableSyntax</option>
1548 flag causes
1549 the following pieces of built-in syntax to refer to
1550 <emphasis>whatever is in scope</emphasis>, not the Prelude
1551 versions:
1552 <itemizedlist>
1553 <listitem>
1554 <para>An integer literal <literal>368</literal> means
1555 "<literal>fromInteger (368::Integer)</literal>", rather than
1556 "<literal>Prelude.fromInteger (368::Integer)</literal>".
1557 </para> </listitem>
1558
1559 <listitem><para>Fractional literals are handed in just the same way,
1560 except that the translation is
1561 <literal>fromRational (3.68::Rational)</literal>.
1562 </para> </listitem>
1563
1564 <listitem><para>The equality test in an overloaded numeric pattern
1565 uses whatever <literal>(==)</literal> is in scope.
1566 </para> </listitem>
1567
1568 <listitem><para>The subtraction operation, and the
1569 greater-than-or-equal test, in <literal>n+k</literal> patterns
1570 use whatever <literal>(-)</literal> and <literal>(>=)</literal> are in scope.
1571 </para></listitem>
1572
1573 <listitem>
1574 <para>Negation (e.g. "<literal>- (f x)</literal>")
1575 means "<literal>negate (f x)</literal>", both in numeric
1576 patterns, and expressions.
1577 </para></listitem>
1578
1579 <listitem>
1580 <para>Conditionals (e.g. "<literal>if</literal> e1 <literal>then</literal> e2 <literal>else</literal> e3")
1581 means "<literal>ifThenElse</literal> e1 e2 e3". However <literal>case</literal> expressions are unaffected.
1582 </para></listitem>
1583
1584 <listitem>
1585 <para>"Do" notation is translated using whatever
1586 functions <literal>(>>=)</literal>,
1587 <literal>(>>)</literal>, and <literal>fail</literal>,
1588 are in scope (not the Prelude
1589 versions). List comprehensions, mdo (<xref linkend="recursive-do-notation"/>), and parallel array
1590 comprehensions, are unaffected. </para></listitem>
1591
1592 <listitem>
1593 <para>Arrow
1594 notation (see <xref linkend="arrow-notation"/>)
1595 uses whatever <literal>arr</literal>,
1596 <literal>(>>>)</literal>, <literal>first</literal>,
1597 <literal>app</literal>, <literal>(|||)</literal> and
1598 <literal>loop</literal> functions are in scope. But unlike the
1599 other constructs, the types of these functions must match the
1600 Prelude types very closely. Details are in flux; if you want
1601 to use this, ask!
1602 </para></listitem>
1603 </itemizedlist>
1604 <option>-XRebindableSyntax</option> implies <option>-XNoImplicitPrelude</option>.
1605 </para>
1606 <para>
1607 In all cases (apart from arrow notation), the static semantics should be that of the desugared form,
1608 even if that is a little unexpected. For example, the
1609 static semantics of the literal <literal>368</literal>
1610 is exactly that of <literal>fromInteger (368::Integer)</literal>; it's fine for
1611 <literal>fromInteger</literal> to have any of the types:
1612 <programlisting>
1613 fromInteger :: Integer -> Integer
1614 fromInteger :: forall a. Foo a => Integer -> a
1615 fromInteger :: Num a => a -> Integer
1616 fromInteger :: Integer -> Bool -> Bool
1617 </programlisting>
1618 </para>
1619
1620 <para>Be warned: this is an experimental facility, with
1621 fewer checks than usual. Use <literal>-dcore-lint</literal>
1622 to typecheck the desugared program. If Core Lint is happy
1623 you should be all right.</para>
1624
1625 </sect2>
1626
1627 <sect2 id="postfix-operators">
1628 <title>Postfix operators</title>
1629
1630 <para>
1631 The <option>-XPostfixOperators</option> flag enables a small
1632 extension to the syntax of left operator sections, which allows you to
1633 define postfix operators. The extension is this: the left section
1634 <programlisting>
1635 (e !)
1636 </programlisting>
1637 is equivalent (from the point of view of both type checking and execution) to the expression
1638 <programlisting>
1639 ((!) e)
1640 </programlisting>
1641 (for any expression <literal>e</literal> and operator <literal>(!)</literal>.
1642 The strict Haskell 98 interpretation is that the section is equivalent to
1643 <programlisting>
1644 (\y -> (!) e y)
1645 </programlisting>
1646 That is, the operator must be a function of two arguments. GHC allows it to
1647 take only one argument, and that in turn allows you to write the function
1648 postfix.
1649 </para>
1650 <para>The extension does not extend to the left-hand side of function
1651 definitions; you must define such a function in prefix form.</para>
1652
1653 </sect2>
1654
1655 <sect2 id="tuple-sections">
1656 <title>Tuple sections</title>
1657
1658 <para>
1659 The <option>-XTupleSections</option> flag enables Python-style partially applied
1660 tuple constructors. For example, the following program
1661 <programlisting>
1662 (, True)
1663 </programlisting>
1664 is considered to be an alternative notation for the more unwieldy alternative
1665 <programlisting>
1666 \x -> (x, True)
1667 </programlisting>
1668 You can omit any combination of arguments to the tuple, as in the following
1669 <programlisting>
1670 (, "I", , , "Love", , 1337)
1671 </programlisting>
1672 which translates to
1673 <programlisting>
1674 \a b c d -> (a, "I", b, c, "Love", d, 1337)
1675 </programlisting>
1676 </para>
1677
1678 <para>
1679 If you have <link linkend="unboxed-tuples">unboxed tuples</link> enabled, tuple sections
1680 will also be available for them, like so
1681 <programlisting>
1682 (# , True #)
1683 </programlisting>
1684 Because there is no unboxed unit tuple, the following expression
1685 <programlisting>
1686 (# #)
1687 </programlisting>
1688 continues to stand for the unboxed singleton tuple data constructor.
1689 </para>
1690
1691 </sect2>
1692
1693 <sect2 id="lambda-case">
1694 <title>Lambda-case</title>
1695 <para>
1696 The <option>-XLambdaCase</option> flag enables expressions of the form
1697 <programlisting>
1698 \case { p1 -> e1; ...; pN -> eN }
1699 </programlisting>
1700 which is equivalent to
1701 <programlisting>
1702 \freshName -> case freshName of { p1 -> e1; ...; pN -> eN }
1703 </programlisting>
1704 Note that <literal>\case</literal> starts a layout, so you can write
1705 <programlisting>
1706 \case
1707 p1 -> e1
1708 ...
1709 pN -> eN
1710 </programlisting>
1711 </para>
1712 </sect2>
1713
1714 <sect2 id="empty-case">
1715 <title>Empty case alternatives</title>
1716 <para>
1717 The <option>-XEmptyCase</option> flag enables
1718 case expressions, or lambda-case expressions, that have no alternatives,
1719 thus:
1720 <programlisting>
1721 case e of { } -- No alternatives
1722 or
1723 \case { } -- -XLambdaCase is also required
1724 </programlisting>
1725 This can be useful when you know that the expression being scrutinised
1726 has no non-bottom values. For example:
1727 <programlisting>
1728 data Void
1729 f :: Void -> Int
1730 f x = case x of { }
1731 </programlisting>
1732 With dependently-typed features it is more useful
1733 (see <ulink url="http://hackage.haskell.org/trac/ghc/ticket/2431">Trac</ulink>).
1734 For example, consider these two candidate definitions of <literal>absurd</literal>:
1735 <programlisting>
1736 data a :==: b where
1737 Refl :: a :==: a
1738
1739 absurd :: True :~: False -> a
1740 absurd x = error "absurd" -- (A)
1741 absurd x = case x of {} -- (B)
1742 </programlisting>
1743 We much prefer (B). Why? Because GHC can figure out that <literal>(True :~: False)</literal>
1744 is an empty type. So (B) has no partiality and GHC should be able to compile with
1745 <option>-fwarn-incomplete-patterns</option>. (Though the pattern match checking is not
1746 yet clever enough to do that.
1747 On the other hand (A) looks dangerous, and GHC doesn't check to make
1748 sure that, in fact, the function can never get called.
1749 </para>
1750 </sect2>
1751
1752 <sect2 id="multi-way-if">
1753 <title>Multi-way if-expressions</title>
1754 <para>
1755 With <option>-XMultiWayIf</option> flag GHC accepts conditional expressions
1756 with multiple branches:
1757 <programlisting>
1758 if | guard1 -> expr1
1759 | ...
1760 | guardN -> exprN
1761 </programlisting>
1762 which is roughly equivalent to
1763 <programlisting>
1764 case () of
1765 _ | guard1 -> expr1
1766 ...
1767 _ | guardN -> exprN
1768 </programlisting>
1769 except that multi-way if-expressions do not alter the layout.
1770 </para>
1771 </sect2>
1772
1773 <sect2 id="disambiguate-fields">
1774 <title>Record field disambiguation</title>
1775 <para>
1776 In record construction and record pattern matching
1777 it is entirely unambiguous which field is referred to, even if there are two different
1778 data types in scope with a common field name. For example:
1779 <programlisting>
1780 module M where
1781 data S = MkS { x :: Int, y :: Bool }
1782
1783 module Foo where
1784 import M
1785
1786 data T = MkT { x :: Int }
1787
1788 ok1 (MkS { x = n }) = n+1 -- Unambiguous
1789 ok2 n = MkT { x = n+1 } -- Unambiguous
1790
1791 bad1 k = k { x = 3 } -- Ambiguous
1792 bad2 k = x k -- Ambiguous
1793 </programlisting>
1794 Even though there are two <literal>x</literal>'s in scope,
1795 it is clear that the <literal>x</literal> in the pattern in the
1796 definition of <literal>ok1</literal> can only mean the field
1797 <literal>x</literal> from type <literal>S</literal>. Similarly for
1798 the function <literal>ok2</literal>. However, in the record update
1799 in <literal>bad1</literal> and the record selection in <literal>bad2</literal>
1800 it is not clear which of the two types is intended.
1801 </para>
1802 <para>
1803 Haskell 98 regards all four as ambiguous, but with the
1804 <option>-XDisambiguateRecordFields</option> flag, GHC will accept
1805 the former two. The rules are precisely the same as those for instance
1806 declarations in Haskell 98, where the method names on the left-hand side
1807 of the method bindings in an instance declaration refer unambiguously
1808 to the method of that class (provided they are in scope at all), even
1809 if there are other variables in scope with the same name.
1810 This reduces the clutter of qualified names when you import two
1811 records from different modules that use the same field name.
1812 </para>
1813 <para>
1814 Some details:
1815 <itemizedlist>
1816 <listitem><para>
1817 Field disambiguation can be combined with punning (see <xref linkend="record-puns"/>). For example:
1818 <programlisting>
1819 module Foo where
1820 import M
1821 x=True
1822 ok3 (MkS { x }) = x+1 -- Uses both disambiguation and punning
1823 </programlisting>
1824 </para></listitem>
1825
1826 <listitem><para>
1827 With <option>-XDisambiguateRecordFields</option> you can use <emphasis>unqualified</emphasis>
1828 field names even if the corresponding selector is only in scope <emphasis>qualified</emphasis>
1829 For example, assuming the same module <literal>M</literal> as in our earlier example, this is legal:
1830 <programlisting>
1831 module Foo where
1832 import qualified M -- Note qualified
1833
1834 ok4 (M.MkS { x = n }) = n+1 -- Unambiguous
1835 </programlisting>
1836 Since the constructor <literal>MkS</literal> is only in scope qualified, you must
1837 name it <literal>M.MkS</literal>, but the field <literal>x</literal> does not need
1838 to be qualified even though <literal>M.x</literal> is in scope but <literal>x</literal>
1839 is not. (In effect, it is qualified by the constructor.)
1840 </para></listitem>
1841 </itemizedlist>
1842 </para>
1843
1844 </sect2>
1845
1846 <!-- ===================== Record puns =================== -->
1847
1848 <sect2 id="record-puns">
1849 <title>Record puns
1850 </title>
1851
1852 <para>
1853 Record puns are enabled by the flag <literal>-XNamedFieldPuns</literal>.
1854 </para>
1855
1856 <para>
1857 When using records, it is common to write a pattern that binds a
1858 variable with the same name as a record field, such as:
1859
1860 <programlisting>
1861 data C = C {a :: Int}
1862 f (C {a = a}) = a
1863 </programlisting>
1864 </para>
1865
1866 <para>
1867 Record punning permits the variable name to be elided, so one can simply
1868 write
1869
1870 <programlisting>
1871 f (C {a}) = a
1872 </programlisting>
1873
1874 to mean the same pattern as above. That is, in a record pattern, the
1875 pattern <literal>a</literal> expands into the pattern <literal>a =
1876 a</literal> for the same name <literal>a</literal>.
1877 </para>
1878
1879 <para>
1880 Note that:
1881 <itemizedlist>
1882 <listitem><para>
1883 Record punning can also be used in an expression, writing, for example,
1884 <programlisting>
1885 let a = 1 in C {a}
1886 </programlisting>
1887 instead of
1888 <programlisting>
1889 let a = 1 in C {a = a}
1890 </programlisting>
1891 The expansion is purely syntactic, so the expanded right-hand side
1892 expression refers to the nearest enclosing variable that is spelled the
1893 same as the field name.
1894 </para></listitem>
1895
1896 <listitem><para>
1897 Puns and other patterns can be mixed in the same record:
1898 <programlisting>
1899 data C = C {a :: Int, b :: Int}
1900 f (C {a, b = 4}) = a
1901 </programlisting>
1902 </para></listitem>
1903
1904 <listitem><para>
1905 Puns can be used wherever record patterns occur (e.g. in
1906 <literal>let</literal> bindings or at the top-level).
1907 </para></listitem>
1908
1909 <listitem><para>
1910 A pun on a qualified field name is expanded by stripping off the module qualifier.
1911 For example:
1912 <programlisting>
1913 f (C {M.a}) = a
1914 </programlisting>
1915 means
1916 <programlisting>
1917 f (M.C {M.a = a}) = a
1918 </programlisting>
1919 (This is useful if the field selector <literal>a</literal> for constructor <literal>M.C</literal>
1920 is only in scope in qualified form.)
1921 </para></listitem>
1922 </itemizedlist>
1923 </para>
1924
1925
1926 </sect2>
1927
1928 <!-- ===================== Record wildcards =================== -->
1929
1930 <sect2 id="record-wildcards">
1931 <title>Record wildcards
1932 </title>
1933
1934 <para>
1935 Record wildcards are enabled by the flag <literal>-XRecordWildCards</literal>.
1936 This flag implies <literal>-XDisambiguateRecordFields</literal>.
1937 </para>
1938
1939 <para>
1940 For records with many fields, it can be tiresome to write out each field
1941 individually in a record pattern, as in
1942 <programlisting>
1943 data C = C {a :: Int, b :: Int, c :: Int, d :: Int}
1944 f (C {a = 1, b = b, c = c, d = d}) = b + c + d
1945 </programlisting>
1946 </para>
1947
1948 <para>
1949 Record wildcard syntax permits a "<literal>..</literal>" in a record
1950 pattern, where each elided field <literal>f</literal> is replaced by the
1951 pattern <literal>f = f</literal>. For example, the above pattern can be
1952 written as
1953 <programlisting>
1954 f (C {a = 1, ..}) = b + c + d
1955 </programlisting>
1956 </para>
1957
1958 <para>
1959 More details:
1960 <itemizedlist>
1961 <listitem><para>
1962 Wildcards can be mixed with other patterns, including puns
1963 (<xref linkend="record-puns"/>); for example, in a pattern <literal>C {a
1964 = 1, b, ..})</literal>. Additionally, record wildcards can be used
1965 wherever record patterns occur, including in <literal>let</literal>
1966 bindings and at the top-level. For example, the top-level binding
1967 <programlisting>
1968 C {a = 1, ..} = e
1969 </programlisting>
1970 defines <literal>b</literal>, <literal>c</literal>, and
1971 <literal>d</literal>.
1972 </para></listitem>
1973
1974 <listitem><para>
1975 Record wildcards can also be used in expressions, writing, for example,
1976 <programlisting>
1977 let {a = 1; b = 2; c = 3; d = 4} in C {..}
1978 </programlisting>
1979 in place of
1980 <programlisting>
1981 let {a = 1; b = 2; c = 3; d = 4} in C {a=a, b=b, c=c, d=d}
1982 </programlisting>
1983 The expansion is purely syntactic, so the record wildcard
1984 expression refers to the nearest enclosing variables that are spelled
1985 the same as the omitted field names.
1986 </para></listitem>
1987
1988 <listitem><para>
1989 The "<literal>..</literal>" expands to the missing
1990 <emphasis>in-scope</emphasis> record fields.
1991 Specifically the expansion of "<literal>C {..}</literal>" includes
1992 <literal>f</literal> if and only if:
1993 <itemizedlist>
1994 <listitem><para>
1995 <literal>f</literal> is a record field of constructor <literal>C</literal>.
1996 </para></listitem>
1997 <listitem><para>
1998 The record field <literal>f</literal> is in scope somehow (either qualified or unqualified).
1999 </para></listitem>
2000 <listitem><para>
2001 In the case of expressions (but not patterns),
2002 the variable <literal>f</literal> is in scope unqualified,
2003 apart from the binding of the record selector itself.
2004 </para></listitem>
2005 </itemizedlist>
2006 For example
2007 <programlisting>
2008 module M where
2009 data R = R { a,b,c :: Int }
2010 module X where
2011 import M( R(a,c) )
2012 f b = R { .. }
2013 </programlisting>
2014 The <literal>R{..}</literal> expands to <literal>R{M.a=a}</literal>,
2015 omitting <literal>b</literal> since the record field is not in scope,
2016 and omitting <literal>c</literal> since the variable <literal>c</literal>
2017 is not in scope (apart from the binding of the
2018 record selector <literal>c</literal>, of course).
2019 </para></listitem>
2020 </itemizedlist>
2021 </para>
2022
2023 </sect2>
2024
2025 <!-- ===================== Local fixity declarations =================== -->
2026
2027 <sect2 id="local-fixity-declarations">
2028 <title>Local Fixity Declarations
2029 </title>
2030
2031 <para>A careful reading of the Haskell 98 Report reveals that fixity
2032 declarations (<literal>infix</literal>, <literal>infixl</literal>, and
2033 <literal>infixr</literal>) are permitted to appear inside local bindings
2034 such those introduced by <literal>let</literal> and
2035 <literal>where</literal>. However, the Haskell Report does not specify
2036 the semantics of such bindings very precisely.
2037 </para>
2038
2039 <para>In GHC, a fixity declaration may accompany a local binding:
2040 <programlisting>
2041 let f = ...
2042 infixr 3 `f`
2043 in
2044 ...
2045 </programlisting>
2046 and the fixity declaration applies wherever the binding is in scope.
2047 For example, in a <literal>let</literal>, it applies in the right-hand
2048 sides of other <literal>let</literal>-bindings and the body of the
2049 <literal>let</literal>C. Or, in recursive <literal>do</literal>
2050 expressions (<xref linkend="recursive-do-notation"/>), the local fixity
2051 declarations of a <literal>let</literal> statement scope over other
2052 statements in the group, just as the bound name does.
2053 </para>
2054
2055 <para>
2056 Moreover, a local fixity declaration *must* accompany a local binding of
2057 that name: it is not possible to revise the fixity of name bound
2058 elsewhere, as in
2059 <programlisting>
2060 let infixr 9 $ in ...
2061 </programlisting>
2062
2063 Because local fixity declarations are technically Haskell 98, no flag is
2064 necessary to enable them.
2065 </para>
2066 </sect2>
2067
2068 <sect2 id="package-imports">
2069 <title>Package-qualified imports</title>
2070
2071 <para>With the <option>-XPackageImports</option> flag, GHC allows
2072 import declarations to be qualified by the package name that the
2073 module is intended to be imported from. For example:</para>
2074
2075 <programlisting>
2076 import "network" Network.Socket
2077 </programlisting>
2078
2079 <para>would import the module <literal>Network.Socket</literal> from
2080 the package <literal>network</literal> (any version). This may
2081 be used to disambiguate an import when the same module is
2082 available from multiple packages, or is present in both the
2083 current package being built and an external package.</para>
2084
2085 <para>The special package name <literal>this</literal> can be used to
2086 refer to the current package being built.</para>
2087
2088 <para>Note: you probably don't need to use this feature, it was
2089 added mainly so that we can build backwards-compatible versions of
2090 packages when APIs change. It can lead to fragile dependencies in
2091 the common case: modules occasionally move from one package to
2092 another, rendering any package-qualified imports broken.</para>
2093 </sect2>
2094
2095 <sect2 id="safe-imports-ext">
2096 <title>Safe imports</title>
2097
2098 <para>With the <option>-XSafe</option>, <option>-XTrustworthy</option>
2099 and <option>-XUnsafe</option> language flags, GHC extends
2100 the import declaration syntax to take an optional <literal>safe</literal>
2101 keyword after the <literal>import</literal> keyword. This feature
2102 is part of the Safe Haskell GHC extension. For example:</para>
2103
2104 <programlisting>
2105 import safe qualified Network.Socket as NS
2106 </programlisting>
2107
2108 <para>would import the module <literal>Network.Socket</literal>
2109 with compilation only succeeding if Network.Socket can be
2110 safely imported. For a description of when a import is
2111 considered safe see <xref linkend="safe-haskell"/></para>
2112
2113 </sect2>
2114
2115 <sect2 id="explicit-namespaces">
2116 <title>Explicit namespaces in import/export</title>
2117
2118 <para> In an import or export list, such as
2119 <programlisting>
2120 module M( f, (++) ) where ...
2121 import N( f, (++) )
2122 ...
2123 </programlisting>
2124 the entities <literal>f</literal> and <literal>(++)</literal> are <emphasis>values</emphasis>.
2125 However, with type operators (<xref linkend="type-operators"/>) it becomes possible
2126 to declare <literal>(++)</literal> as a <emphasis>type constructor</emphasis>. In that
2127 case, how would you export or import it?
2128 </para>
2129 <para>
2130 The <option>-XExplicitNamespaces</option> extension allows you to prefix the name of
2131 a type constructor in an import or export list with "<literal>type</literal>" to
2132 disambiguate this case, thus:
2133 <programlisting>
2134 module M( f, type (++) ) where ...
2135 import N( f, type (++) )
2136 ...
2137 module N( f, type (++) ) where
2138 data family a ++ b = L a | R b
2139 </programlisting>
2140 The extension <option>-XExplicitNamespaces</option>
2141 is implied by <option>-XTypeOperators</option> and (for some reason) by <option>-XTypeFamilies</option>.
2142 </para>
2143 </sect2>
2144
2145 <sect2 id="syntax-stolen">
2146 <title>Summary of stolen syntax</title>
2147
2148 <para>Turning on an option that enables special syntax
2149 <emphasis>might</emphasis> cause working Haskell 98 code to fail
2150 to compile, perhaps because it uses a variable name which has
2151 become a reserved word. This section lists the syntax that is
2152 "stolen" by language extensions.
2153 We use
2154 notation and nonterminal names from the Haskell 98 lexical syntax
2155 (see the Haskell 98 Report).
2156 We only list syntax changes here that might affect
2157 existing working programs (i.e. "stolen" syntax). Many of these
2158 extensions will also enable new context-free syntax, but in all
2159 cases programs written to use the new syntax would not be
2160 compilable without the option enabled.</para>
2161
2162 <para>There are two classes of special
2163 syntax:
2164
2165 <itemizedlist>
2166 <listitem>
2167 <para>New reserved words and symbols: character sequences
2168 which are no longer available for use as identifiers in the
2169 program.</para>
2170 </listitem>
2171 <listitem>
2172 <para>Other special syntax: sequences of characters that have
2173 a different meaning when this particular option is turned
2174 on.</para>
2175 </listitem>
2176 </itemizedlist>
2177
2178 The following syntax is stolen:
2179
2180 <variablelist>
2181 <varlistentry>
2182 <term>
2183 <literal>forall</literal>
2184 <indexterm><primary><literal>forall</literal></primary></indexterm>
2185 </term>
2186 <listitem><para>
2187 Stolen (in types) by: <option>-XExplicitForAll</option>, and hence by
2188 <option>-XScopedTypeVariables</option>,
2189 <option>-XLiberalTypeSynonyms</option>,
2190 <option>-XRankNTypes</option>,
2191 <option>-XExistentialQuantification</option>
2192 </para></listitem>
2193 </varlistentry>
2194
2195 <varlistentry>
2196 <term>
2197 <literal>mdo</literal>
2198 <indexterm><primary><literal>mdo</literal></primary></indexterm>
2199 </term>
2200 <listitem><para>
2201 Stolen by: <option>-XRecursiveDo</option>
2202 </para></listitem>
2203 </varlistentry>
2204
2205 <varlistentry>
2206 <term>
2207 <literal>foreign</literal>
2208 <indexterm><primary><literal>foreign</literal></primary></indexterm>
2209 </term>
2210 <listitem><para>
2211 Stolen by: <option>-XForeignFunctionInterface</option>
2212 </para></listitem>
2213 </varlistentry>
2214
2215 <varlistentry>
2216 <term>
2217 <literal>rec</literal>,
2218 <literal>proc</literal>, <literal>-&lt;</literal>,
2219 <literal>&gt;-</literal>, <literal>-&lt;&lt;</literal>,
2220 <literal>&gt;&gt;-</literal>, and <literal>(|</literal>,
2221 <literal>|)</literal> brackets
2222 <indexterm><primary><literal>proc</literal></primary></indexterm>
2223 </term>
2224 <listitem><para>
2225 Stolen by: <option>-XArrows</option>
2226 </para></listitem>
2227 </varlistentry>
2228
2229 <varlistentry>
2230 <term>
2231 <literal>?<replaceable>varid</replaceable></literal>,
2232 <literal>%<replaceable>varid</replaceable></literal>
2233 <indexterm><primary>implicit parameters</primary></indexterm>
2234 </term>
2235 <listitem><para>
2236 Stolen by: <option>-XImplicitParams</option>
2237 </para></listitem>
2238 </varlistentry>
2239
2240 <varlistentry>
2241 <term>
2242 <literal>[|</literal>,
2243 <literal>[e|</literal>, <literal>[p|</literal>,
2244 <literal>[d|</literal>, <literal>[t|</literal>,
2245 <literal>$(</literal>,
2246 <literal>$<replaceable>varid</replaceable></literal>
2247 <indexterm><primary>Template Haskell</primary></indexterm>
2248 </term>
2249 <listitem><para>
2250 Stolen by: <option>-XTemplateHaskell</option>
2251 </para></listitem>
2252 </varlistentry>
2253
2254 <varlistentry>
2255 <term>
2256 <literal>[:<replaceable>varid</replaceable>|</literal>
2257 <indexterm><primary>quasi-quotation</primary></indexterm>
2258 </term>
2259 <listitem><para>
2260 Stolen by: <option>-XQuasiQuotes</option>
2261 </para></listitem>
2262 </varlistentry>
2263
2264 <varlistentry>
2265 <term>
2266 <replaceable>varid</replaceable>{<literal>&num;</literal>},
2267 <replaceable>char</replaceable><literal>&num;</literal>,
2268 <replaceable>string</replaceable><literal>&num;</literal>,
2269 <replaceable>integer</replaceable><literal>&num;</literal>,
2270 <replaceable>float</replaceable><literal>&num;</literal>,
2271 <replaceable>float</replaceable><literal>&num;&num;</literal>,
2272 <literal>(&num;</literal>, <literal>&num;)</literal>
2273 </term>
2274 <listitem><para>
2275 Stolen by: <option>-XMagicHash</option>
2276 </para></listitem>
2277 </varlistentry>
2278 </variablelist>
2279 </para>
2280 </sect2>
2281 </sect1>
2282
2283
2284 <!-- TYPE SYSTEM EXTENSIONS -->
2285 <sect1 id="data-type-extensions">
2286 <title>Extensions to data types and type synonyms</title>
2287
2288 <sect2 id="nullary-types">
2289 <title>Data types with no constructors</title>
2290
2291 <para>With the <option>-XEmptyDataDecls</option> flag (or equivalent LANGUAGE pragma),
2292 GHC lets you declare a data type with no constructors. For example:</para>
2293
2294 <programlisting>
2295 data S -- S :: *
2296 data T a -- T :: * -> *
2297 </programlisting>
2298
2299 <para>Syntactically, the declaration lacks the "= constrs" part. The
2300 type can be parameterised over types of any kind, but if the kind is
2301 not <literal>*</literal> then an explicit kind annotation must be used
2302 (see <xref linkend="kinding"/>).</para>
2303
2304 <para>Such data types have only one value, namely bottom.
2305 Nevertheless, they can be useful when defining "phantom types".</para>
2306 </sect2>
2307
2308 <sect2 id="datatype-contexts">
2309 <title>Data type contexts</title>
2310
2311 <para>Haskell allows datatypes to be given contexts, e.g.</para>
2312
2313 <programlisting>
2314 data Eq a => Set a = NilSet | ConsSet a (Set a)
2315 </programlisting>
2316
2317 <para>give constructors with types:</para>
2318
2319 <programlisting>
2320 NilSet :: Set a
2321 ConsSet :: Eq a => a -> Set a -> Set a
2322 </programlisting>
2323
2324 <para>This is widely considered a misfeature, and is going to be removed from
2325 the language. In GHC, it is controlled by the deprecated extension
2326 <literal>DatatypeContexts</literal>.</para>
2327 </sect2>
2328
2329 <sect2 id="infix-tycons">
2330 <title>Infix type constructors, classes, and type variables</title>
2331
2332 <para>
2333 GHC allows type constructors, classes, and type variables to be operators, and
2334 to be written infix, very much like expressions. More specifically:
2335 <itemizedlist>
2336 <listitem><para>
2337 A type constructor or class can be an operator, beginning with a colon; e.g. <literal>:*:</literal>.
2338 The lexical syntax is the same as that for data constructors.
2339 </para></listitem>
2340 <listitem><para>
2341 Data type and type-synonym declarations can be written infix, parenthesised
2342 if you want further arguments. E.g.
2343 <screen>
2344 data a :*: b = Foo a b
2345 type a :+: b = Either a b
2346 class a :=: b where ...
2347
2348 data (a :**: b) x = Baz a b x
2349 type (a :++: b) y = Either (a,b) y
2350 </screen>
2351 </para></listitem>
2352 <listitem><para>
2353 Types, and class constraints, can be written infix. For example
2354 <screen>
2355 x :: Int :*: Bool
2356 f :: (a :=: b) => a -> b
2357 </screen>
2358 </para></listitem>
2359 <listitem><para>
2360 Back-quotes work
2361 as for expressions, both for type constructors and type variables; e.g. <literal>Int `Either` Bool</literal>, or
2362 <literal>Int `a` Bool</literal>. Similarly, parentheses work the same; e.g. <literal>(:*:) Int Bool</literal>.
2363 </para></listitem>
2364 <listitem><para>
2365 Fixities may be declared for type constructors, or classes, just as for data constructors. However,
2366 one cannot distinguish between the two in a fixity declaration; a fixity declaration
2367 sets the fixity for a data constructor and the corresponding type constructor. For example:
2368 <screen>
2369 infixl 7 T, :*:
2370 </screen>
2371 sets the fixity for both type constructor <literal>T</literal> and data constructor <literal>T</literal>,
2372 and similarly for <literal>:*:</literal>.
2373 <literal>Int `a` Bool</literal>.
2374 </para></listitem>
2375 <listitem><para>
2376 Function arrow is <literal>infixr</literal> with fixity 0. (This might change; I'm not sure what it should be.)
2377 </para></listitem>
2378
2379 </itemizedlist>
2380 </para>
2381 </sect2>
2382
2383 <sect2 id="type-operators">
2384 <title>Type operators</title>
2385 <para>
2386 In types, an operator symbol like <literal>(+)</literal> is normally treated as a type
2387 <emphasis>variable</emphasis>, just like <literal>a</literal>. Thus in Haskell 98 you can say
2388 <programlisting>
2389 type T (+) = ((+), (+))
2390 -- Just like: type T a = (a,a)
2391
2392 f :: T Int -> Int
2393 f (x,y)= x
2394 </programlisting>
2395 As you can see, using operators in this way is not very useful, and Haskell 98 does not even
2396 allow you to write them infix.
2397 </para>
2398 <para>
2399 The language <option>-XTypeOperators</option> changes this behaviour:
2400 <itemizedlist>
2401 <listitem><para>
2402 Operator symbols become type <emphasis>constructors</emphasis> rather than
2403 type <emphasis>variables</emphasis>.
2404 </para></listitem>
2405 <listitem><para>
2406 Operator symbols in types can be written infix, both in definitions and uses.
2407 for example:
2408 <programlisting>
2409 data a + b = Plus a b
2410 type Foo = Int + Bool
2411 </programlisting>
2412 </para></listitem>
2413 <listitem><para>
2414 There is now some potential ambiguity in import and export lists; for example
2415 if you write <literal>import M( (+) )</literal> do you mean the
2416 <emphasis>function</emphasis> <literal>(+)</literal> or the
2417 <emphasis>type constructor</emphasis> <literal>(+)</literal>?
2418 The default is the former, but with <option>-XExplicitNamespaces</option> (which is implied
2419 by <option>-XExplicitTypeOperators</option>) GHC allows you to specify the latter
2420 by preceding it with the keyword <literal>type</literal>, thus:
2421 <programlisting>
2422 import M( type (+) )
2423 </programlisting>
2424 See <xref linkend="explicit-namespaces"/>.
2425 </para></listitem>
2426 <listitem><para>
2427 The fixity of a type operator may be set using the usual fixity declarations
2428 but, as in <xref linkend="infix-tycons"/>, the function and type constructor share
2429 a single fixity.
2430 </para></listitem>
2431 </itemizedlist>
2432 </para>
2433 </sect2>
2434
2435 <sect2 id="type-synonyms">
2436 <title>Liberalised type synonyms</title>
2437
2438 <para>
2439 Type synonyms are like macros at the type level, but Haskell 98 imposes many rules
2440 on individual synonym declarations.
2441 With the <option>-XLiberalTypeSynonyms</option> extension,
2442 GHC does validity checking on types <emphasis>only after expanding type synonyms</emphasis>.
2443 That means that GHC can be very much more liberal about type synonyms than Haskell 98.
2444
2445 <itemizedlist>
2446 <listitem> <para>You can write a <literal>forall</literal> (including overloading)
2447 in a type synonym, thus:
2448 <programlisting>
2449 type Discard a = forall b. Show b => a -> b -> (a, String)
2450
2451 f :: Discard a
2452 f x y = (x, show y)
2453
2454 g :: Discard Int -> (Int,String) -- A rank-2 type
2455 g f = f 3 True
2456 </programlisting>
2457 </para>
2458 </listitem>
2459
2460 <listitem><para>
2461 If you also use <option>-XUnboxedTuples</option>,
2462 you can write an unboxed tuple in a type synonym:
2463 <programlisting>
2464 type Pr = (# Int, Int #)
2465
2466 h :: Int -> Pr
2467 h x = (# x, x #)
2468 </programlisting>
2469 </para></listitem>
2470
2471 <listitem><para>
2472 You can apply a type synonym to a forall type:
2473 <programlisting>
2474 type Foo a = a -> a -> Bool
2475
2476 f :: Foo (forall b. b->b)
2477 </programlisting>
2478 After expanding the synonym, <literal>f</literal> has the legal (in GHC) type:
2479 <programlisting>
2480 f :: (forall b. b->b) -> (forall b. b->b) -> Bool
2481 </programlisting>
2482 </para></listitem>
2483
2484 <listitem><para>
2485 You can apply a type synonym to a partially applied type synonym:
2486 <programlisting>
2487 type Generic i o = forall x. i x -> o x
2488 type Id x = x
2489
2490 foo :: Generic Id []
2491 </programlisting>
2492 After expanding the synonym, <literal>foo</literal> has the legal (in GHC) type:
2493 <programlisting>
2494 foo :: forall x. x -> [x]
2495 </programlisting>
2496 </para></listitem>
2497
2498 </itemizedlist>
2499 </para>
2500
2501 <para>
2502 GHC currently does kind checking before expanding synonyms (though even that
2503 could be changed.)
2504 </para>
2505 <para>
2506 After expanding type synonyms, GHC does validity checking on types, looking for
2507 the following mal-formedness which isn't detected simply by kind checking:
2508 <itemizedlist>
2509 <listitem><para>
2510 Type constructor applied to a type involving for-alls.
2511 </para></listitem>
2512 <listitem><para>
2513 Unboxed tuple on left of an arrow.
2514 </para></listitem>
2515 <listitem><para>
2516 Partially-applied type synonym.
2517 </para></listitem>
2518 </itemizedlist>
2519 So, for example,
2520 this will be rejected:
2521 <programlisting>
2522 type Pr = (# Int, Int #)
2523
2524 h :: Pr -> Int
2525 h x = ...
2526 </programlisting>
2527 because GHC does not allow unboxed tuples on the left of a function arrow.
2528 </para>
2529 </sect2>
2530
2531
2532 <sect2 id="existential-quantification">
2533 <title>Existentially quantified data constructors
2534 </title>
2535
2536 <para>
2537 The idea of using existential quantification in data type declarations
2538 was suggested by Perry, and implemented in Hope+ (Nigel Perry, <emphasis>The Implementation
2539 of Practical Functional Programming Languages</emphasis>, PhD Thesis, University of
2540 London, 1991). It was later formalised by Laufer and Odersky
2541 (<emphasis>Polymorphic type inference and abstract data types</emphasis>,
2542 TOPLAS, 16(5), pp1411-1430, 1994).
2543 It's been in Lennart
2544 Augustsson's <command>hbc</command> Haskell compiler for several years, and
2545 proved very useful. Here's the idea. Consider the declaration:
2546 </para>
2547
2548 <para>
2549
2550 <programlisting>
2551 data Foo = forall a. MkFoo a (a -> Bool)
2552 | Nil
2553 </programlisting>
2554
2555 </para>
2556
2557 <para>
2558 The data type <literal>Foo</literal> has two constructors with types:
2559 </para>
2560
2561 <para>
2562
2563 <programlisting>
2564 MkFoo :: forall a. a -> (a -> Bool) -> Foo
2565 Nil :: Foo
2566 </programlisting>
2567
2568 </para>
2569
2570 <para>
2571 Notice that the type variable <literal>a</literal> in the type of <function>MkFoo</function>
2572 does not appear in the data type itself, which is plain <literal>Foo</literal>.
2573 For example, the following expression is fine:
2574 </para>
2575
2576 <para>
2577
2578 <programlisting>
2579 [MkFoo 3 even, MkFoo 'c' isUpper] :: [Foo]
2580 </programlisting>
2581
2582 </para>
2583
2584 <para>
2585 Here, <literal>(MkFoo 3 even)</literal> packages an integer with a function
2586 <function>even</function> that maps an integer to <literal>Bool</literal>; and <function>MkFoo 'c'
2587 isUpper</function> packages a character with a compatible function. These
2588 two things are each of type <literal>Foo</literal> and can be put in a list.
2589 </para>
2590
2591 <para>
2592 What can we do with a value of type <literal>Foo</literal>?. In particular,
2593 what happens when we pattern-match on <function>MkFoo</function>?
2594 </para>
2595
2596 <para>
2597
2598 <programlisting>
2599 f (MkFoo val fn) = ???
2600 </programlisting>
2601
2602 </para>
2603
2604 <para>
2605 Since all we know about <literal>val</literal> and <function>fn</function> is that they
2606 are compatible, the only (useful) thing we can do with them is to
2607 apply <function>fn</function> to <literal>val</literal> to get a boolean. For example:
2608 </para>
2609
2610 <para>
2611
2612 <programlisting>
2613 f :: Foo -> Bool
2614 f (MkFoo val fn) = fn val
2615 </programlisting>
2616
2617 </para>
2618
2619 <para>
2620 What this allows us to do is to package heterogeneous values
2621 together with a bunch of functions that manipulate them, and then treat
2622 that collection of packages in a uniform manner. You can express
2623 quite a bit of object-oriented-like programming this way.
2624 </para>
2625
2626 <sect3 id="existential">
2627 <title>Why existential?
2628 </title>
2629
2630 <para>
2631 What has this to do with <emphasis>existential</emphasis> quantification?
2632 Simply that <function>MkFoo</function> has the (nearly) isomorphic type
2633 </para>
2634
2635 <para>
2636
2637 <programlisting>
2638 MkFoo :: (exists a . (a, a -> Bool)) -> Foo
2639 </programlisting>
2640
2641 </para>
2642
2643 <para>
2644 But Haskell programmers can safely think of the ordinary
2645 <emphasis>universally</emphasis> quantified type given above, thereby avoiding
2646 adding a new existential quantification construct.
2647 </para>
2648
2649 </sect3>
2650
2651 <sect3 id="existential-with-context">
2652 <title>Existentials and type classes</title>
2653
2654 <para>
2655 An easy extension is to allow
2656 arbitrary contexts before the constructor. For example:
2657 </para>
2658
2659 <para>
2660
2661 <programlisting>
2662 data Baz = forall a. Eq a => Baz1 a a
2663 | forall b. Show b => Baz2 b (b -> b)
2664 </programlisting>
2665
2666 </para>
2667
2668 <para>
2669 The two constructors have the types you'd expect:
2670 </para>
2671
2672 <para>
2673
2674 <programlisting>
2675 Baz1 :: forall a. Eq a => a -> a -> Baz
2676 Baz2 :: forall b. Show b => b -> (b -> b) -> Baz
2677 </programlisting>
2678
2679 </para>
2680
2681 <para>
2682 But when pattern matching on <function>Baz1</function> the matched values can be compared
2683 for equality, and when pattern matching on <function>Baz2</function> the first matched
2684 value can be converted to a string (as well as applying the function to it).
2685 So this program is legal:
2686 </para>
2687
2688 <para>
2689
2690 <programlisting>
2691 f :: Baz -> String
2692 f (Baz1 p q) | p == q = "Yes"
2693 | otherwise = "No"
2694 f (Baz2 v fn) = show (fn v)
2695 </programlisting>
2696
2697 </para>
2698
2699 <para>
2700 Operationally, in a dictionary-passing implementation, the
2701 constructors <function>Baz1</function> and <function>Baz2</function> must store the
2702 dictionaries for <literal>Eq</literal> and <literal>Show</literal> respectively, and
2703 extract it on pattern matching.
2704 </para>
2705
2706 </sect3>
2707
2708 <sect3 id="existential-records">
2709 <title>Record Constructors</title>
2710
2711 <para>
2712 GHC allows existentials to be used with records syntax as well. For example:
2713
2714 <programlisting>
2715 data Counter a = forall self. NewCounter
2716 { _this :: self
2717 , _inc :: self -> self
2718 , _display :: self -> IO ()
2719 , tag :: a
2720 }
2721 </programlisting>
2722 Here <literal>tag</literal> is a public field, with a well-typed selector
2723 function <literal>tag :: Counter a -> a</literal>. The <literal>self</literal>
2724 type is hidden from the outside; any attempt to apply <literal>_this</literal>,
2725 <literal>_inc</literal> or <literal>_display</literal> as functions will raise a
2726 compile-time error. In other words, <emphasis>GHC defines a record selector function
2727 only for fields whose type does not mention the existentially-quantified variables</emphasis>.
2728 (This example used an underscore in the fields for which record selectors
2729 will not be defined, but that is only programming style; GHC ignores them.)
2730 </para>
2731
2732 <para>
2733 To make use of these hidden fields, we need to create some helper functions:
2734
2735 <programlisting>
2736 inc :: Counter a -> Counter a
2737 inc (NewCounter x i d t) = NewCounter
2738 { _this = i x, _inc = i, _display = d, tag = t }
2739
2740 display :: Counter a -> IO ()
2741 display NewCounter{ _this = x, _display = d } = d x
2742 </programlisting>
2743
2744 Now we can define counters with different underlying implementations:
2745
2746 <programlisting>
2747 counterA :: Counter String
2748 counterA = NewCounter
2749 { _this = 0, _inc = (1+), _display = print, tag = "A" }
2750
2751 counterB :: Counter String
2752 counterB = NewCounter
2753 { _this = "", _inc = ('#':), _display = putStrLn, tag = "B" }
2754
2755 main = do
2756 display (inc counterA) -- prints "1"
2757 display (inc (inc counterB)) -- prints "##"
2758 </programlisting>
2759
2760 Record update syntax is supported for existentials (and GADTs):
2761 <programlisting>
2762 setTag :: Counter a -> a -> Counter a
2763 setTag obj t = obj{ tag = t }
2764 </programlisting>
2765 The rule for record update is this: <emphasis>
2766 the types of the updated fields may
2767 mention only the universally-quantified type variables
2768 of the data constructor. For GADTs, the field may mention only types
2769 that appear as a simple type-variable argument in the constructor's result
2770 type</emphasis>. For example:
2771 <programlisting>
2772 data T a b where { T1 { f1::a, f2::b, f3::(b,c) } :: T a b } -- c is existential
2773 upd1 t x = t { f1=x } -- OK: upd1 :: T a b -> a' -> T a' b
2774 upd2 t x = t { f3=x } -- BAD (f3's type mentions c, which is
2775 -- existentially quantified)
2776
2777 data G a b where { G1 { g1::a, g2::c } :: G a [c] }
2778 upd3 g x = g { g1=x } -- OK: upd3 :: G a b -> c -> G c b
2779 upd4 g x = g { g2=x } -- BAD (f2's type mentions c, which is not a simple
2780 -- type-variable argument in G1's result type)
2781 </programlisting>
2782 </para>
2783
2784 </sect3>
2785
2786
2787 <sect3>
2788 <title>Restrictions</title>
2789
2790 <para>
2791 There are several restrictions on the ways in which existentially-quantified
2792 constructors can be use.
2793 </para>
2794
2795 <para>
2796
2797 <itemizedlist>
2798 <listitem>
2799
2800 <para>
2801 When pattern matching, each pattern match introduces a new,
2802 distinct, type for each existential type variable. These types cannot
2803 be unified with any other type, nor can they escape from the scope of
2804 the pattern match. For example, these fragments are incorrect:
2805
2806
2807 <programlisting>
2808 f1 (MkFoo a f) = a
2809 </programlisting>
2810
2811
2812 Here, the type bound by <function>MkFoo</function> "escapes", because <literal>a</literal>
2813 is the result of <function>f1</function>. One way to see why this is wrong is to
2814 ask what type <function>f1</function> has:
2815
2816
2817 <programlisting>
2818 f1 :: Foo -> a -- Weird!
2819 </programlisting>
2820
2821
2822 What is this "<literal>a</literal>" in the result type? Clearly we don't mean
2823 this:
2824
2825
2826 <programlisting>
2827 f1 :: forall a. Foo -> a -- Wrong!
2828 </programlisting>
2829
2830
2831 The original program is just plain wrong. Here's another sort of error
2832
2833
2834 <programlisting>
2835 f2 (Baz1 a b) (Baz1 p q) = a==q
2836 </programlisting>
2837
2838
2839 It's ok to say <literal>a==b</literal> or <literal>p==q</literal>, but
2840 <literal>a==q</literal> is wrong because it equates the two distinct types arising
2841 from the two <function>Baz1</function> constructors.
2842
2843
2844 </para>
2845 </listitem>
2846 <listitem>
2847
2848 <para>
2849 You can't pattern-match on an existentially quantified
2850 constructor in a <literal>let</literal> or <literal>where</literal> group of
2851 bindings. So this is illegal:
2852
2853
2854 <programlisting>
2855 f3 x = a==b where { Baz1 a b = x }
2856 </programlisting>
2857
2858 Instead, use a <literal>case</literal> expression:
2859
2860 <programlisting>
2861 f3 x = case x of Baz1 a b -> a==b
2862 </programlisting>
2863
2864 In general, you can only pattern-match
2865 on an existentially-quantified constructor in a <literal>case</literal> expression or
2866 in the patterns of a function definition.
2867
2868 The reason for this restriction is really an implementation one.
2869 Type-checking binding groups is already a nightmare without
2870 existentials complicating the picture. Also an existential pattern
2871 binding at the top level of a module doesn't make sense, because it's
2872 not clear how to prevent the existentially-quantified type "escaping".
2873 So for now, there's a simple-to-state restriction. We'll see how
2874 annoying it is.
2875
2876 </para>
2877 </listitem>
2878 <listitem>
2879
2880 <para>
2881 You can't use existential quantification for <literal>newtype</literal>
2882 declarations. So this is illegal:
2883
2884
2885 <programlisting>
2886 newtype T = forall a. Ord a => MkT a
2887 </programlisting>
2888
2889
2890 Reason: a value of type <literal>T</literal> must be represented as a
2891 pair of a dictionary for <literal>Ord t</literal> and a value of type
2892 <literal>t</literal>. That contradicts the idea that
2893 <literal>newtype</literal> should have no concrete representation.
2894 You can get just the same efficiency and effect by using
2895 <literal>data</literal> instead of <literal>newtype</literal>. If
2896 there is no overloading involved, then there is more of a case for
2897 allowing an existentially-quantified <literal>newtype</literal>,
2898 because the <literal>data</literal> version does carry an
2899 implementation cost, but single-field existentially quantified
2900 constructors aren't much use. So the simple restriction (no
2901 existential stuff on <literal>newtype</literal>) stands, unless there
2902 are convincing reasons to change it.
2903
2904
2905 </para>
2906 </listitem>
2907 <listitem>
2908
2909 <para>
2910 You can't use <literal>deriving</literal> to define instances of a
2911 data type with existentially quantified data constructors.
2912
2913 Reason: in most cases it would not make sense. For example:;
2914
2915 <programlisting>
2916 data T = forall a. MkT [a] deriving( Eq )
2917 </programlisting>
2918
2919 To derive <literal>Eq</literal> in the standard way we would need to have equality
2920 between the single component of two <function>MkT</function> constructors:
2921
2922 <programlisting>
2923 instance Eq T where
2924 (MkT a) == (MkT b) = ???
2925 </programlisting>
2926
2927 But <varname>a</varname> and <varname>b</varname> have distinct types, and so can't be compared.
2928 It's just about possible to imagine examples in which the derived instance
2929 would make sense, but it seems altogether simpler simply to prohibit such
2930 declarations. Define your own instances!
2931 </para>
2932 </listitem>
2933
2934 </itemizedlist>
2935
2936 </para>
2937
2938 </sect3>
2939 </sect2>
2940
2941 <!-- ====================== Generalised algebraic data types ======================= -->
2942
2943 <sect2 id="gadt-style">
2944 <title>Declaring data types with explicit constructor signatures</title>
2945
2946 <para>When the <literal>GADTSyntax</literal> extension is enabled,
2947 GHC allows you to declare an algebraic data type by
2948 giving the type signatures of constructors explicitly. For example:
2949 <programlisting>
2950 data Maybe a where
2951 Nothing :: Maybe a
2952 Just :: a -> Maybe a
2953 </programlisting>
2954 The form is called a "GADT-style declaration"
2955 because Generalised Algebraic Data Types, described in <xref linkend="gadt"/>,
2956 can only be declared using this form.</para>
2957 <para>Notice that GADT-style syntax generalises existential types (<xref linkend="existential-quantification"/>).
2958 For example, these two declarations are equivalent:
2959 <programlisting>
2960 data Foo = forall a. MkFoo a (a -> Bool)
2961 data Foo' where { MKFoo :: a -> (a->Bool) -> Foo' }
2962 </programlisting>
2963 </para>
2964 <para>Any data type that can be declared in standard Haskell-98 syntax
2965 can also be declared using GADT-style syntax.
2966 The choice is largely stylistic, but GADT-style declarations differ in one important respect:
2967 they treat class constraints on the data constructors differently.
2968 Specifically, if the constructor is given a type-class context, that
2969 context is made available by pattern matching. For example:
2970 <programlisting>
2971 data Set a where
2972 MkSet :: Eq a => [a] -> Set a
2973
2974 makeSet :: Eq a => [a] -> Set a
2975 makeSet xs = MkSet (nub xs)
2976
2977 insert :: a -> Set a -> Set a
2978 insert a (MkSet as) | a `elem` as = MkSet as
2979 | otherwise = MkSet (a:as)
2980 </programlisting>
2981 A use of <literal>MkSet</literal> as a constructor (e.g. in the definition of <literal>makeSet</literal>)
2982 gives rise to a <literal>(Eq a)</literal>
2983 constraint, as you would expect. The new feature is that pattern-matching on <literal>MkSet</literal>
2984 (as in the definition of <literal>insert</literal>) makes <emphasis>available</emphasis> an <literal>(Eq a)</literal>
2985 context. In implementation terms, the <literal>MkSet</literal> constructor has a hidden field that stores
2986 the <literal>(Eq a)</literal> dictionary that is passed to <literal>MkSet</literal>; so
2987 when pattern-matching that dictionary becomes available for the right-hand side of the match.
2988 In the example, the equality dictionary is used to satisfy the equality constraint
2989 generated by the call to <literal>elem</literal>, so that the type of
2990 <literal>insert</literal> itself has no <literal>Eq</literal> constraint.
2991 </para>
2992 <para>
2993 For example, one possible application is to reify dictionaries:
2994 <programlisting>
2995 data NumInst a where
2996 MkNumInst :: Num a => NumInst a
2997
2998 intInst :: NumInst Int
2999 intInst = MkNumInst
3000
3001 plus :: NumInst a -> a -> a -> a
3002 plus MkNumInst p q = p + q
3003 </programlisting>
3004 Here, a value of type <literal>NumInst a</literal> is equivalent
3005 to an explicit <literal>(Num a)</literal> dictionary.
3006 </para>
3007 <para>
3008 All this applies to constructors declared using the syntax of <xref linkend="existential-with-context"/>.
3009 For example, the <literal>NumInst</literal> data type above could equivalently be declared
3010 like this:
3011 <programlisting>
3012 data NumInst a
3013 = Num a => MkNumInst (NumInst a)
3014 </programlisting>
3015 Notice that, unlike the situation when declaring an existential, there is
3016 no <literal>forall</literal>, because the <literal>Num</literal> constrains the
3017 data type's universally quantified type variable <literal>a</literal>.
3018 A constructor may have both universal and existential type variables: for example,
3019 the following two declarations are equivalent:
3020 <programlisting>
3021 data T1 a
3022 = forall b. (Num a, Eq b) => MkT1 a b
3023 data T2 a where
3024 MkT2 :: (Num a, Eq b) => a -> b -> T2 a
3025 </programlisting>
3026 </para>
3027 <para>All this behaviour contrasts with Haskell 98's peculiar treatment of
3028 contexts on a data type declaration (Section 4.2.1 of the Haskell 98 Report).
3029 In Haskell 98 the definition
3030 <programlisting>
3031 data Eq a => Set' a = MkSet' [a]
3032 </programlisting>
3033 gives <literal>MkSet'</literal> the same type as <literal>MkSet</literal> above. But instead of
3034 <emphasis>making available</emphasis> an <literal>(Eq a)</literal> constraint, pattern-matching
3035 on <literal>MkSet'</literal> <emphasis>requires</emphasis> an <literal>(Eq a)</literal> constraint!
3036 GHC faithfully implements this behaviour, odd though it is. But for GADT-style declarations,
3037 GHC's behaviour is much more useful, as well as much more intuitive.
3038 </para>
3039
3040 <para>
3041 The rest of this section gives further details about GADT-style data
3042 type declarations.
3043
3044 <itemizedlist>
3045 <listitem><para>
3046 The result type of each data constructor must begin with the type constructor being defined.
3047 If the result type of all constructors
3048 has the form <literal>T a1 ... an</literal>, where <literal>a1 ... an</literal>
3049 are distinct type variables, then the data type is <emphasis>ordinary</emphasis>;
3050 otherwise is a <emphasis>generalised</emphasis> data type (<xref linkend="gadt"/>).
3051 </para></listitem>
3052
3053 <listitem><para>
3054 As with other type signatures, you can give a single signature for several data constructors.
3055 In this example we give a single signature for <literal>T1</literal> and <literal>T2</literal>:
3056 <programlisting>
3057 data T a where
3058 T1,T2 :: a -> T a
3059 T3 :: T a
3060 </programlisting>
3061 </para></listitem>
3062
3063 <listitem><para>
3064 The type signature of
3065 each constructor is independent, and is implicitly universally quantified as usual.
3066 In particular, the type variable(s) in the "<literal>data T a where</literal>" header
3067 have no scope, and different constructors may have different universally-quantified type variables:
3068 <programlisting>
3069 data T a where -- The 'a' has no scope
3070 T1,T2 :: b -> T b -- Means forall b. b -> T b
3071 T3 :: T a -- Means forall a. T a
3072 </programlisting>
3073 </para></listitem>
3074
3075 <listitem><para>
3076 A constructor signature may mention type class constraints, which can differ for
3077 different constructors. For example, this is fine:
3078 <programlisting>
3079 data T a where
3080 T1 :: Eq b => b -> b -> T b
3081 T2 :: (Show c, Ix c) => c -> [c] -> T c
3082 </programlisting>
3083 When pattern matching, these constraints are made available to discharge constraints
3084 in the body of the match. For example:
3085 <programlisting>
3086 f :: T a -> String
3087 f (T1 x y) | x==y = "yes"
3088 | otherwise = "no"
3089 f (T2 a b) = show a
3090 </programlisting>
3091 Note that <literal>f</literal> is not overloaded; the <literal>Eq</literal> constraint arising
3092 from the use of <literal>==</literal> is discharged by the pattern match on <literal>T1</literal>
3093 and similarly the <literal>Show</literal> constraint arising from the use of <literal>show</literal>.
3094 </para></listitem>
3095
3096 <listitem><para>
3097 Unlike a Haskell-98-style
3098 data type declaration, the type variable(s) in the "<literal>data Set a where</literal>" header
3099 have no scope. Indeed, one can write a kind signature instead:
3100 <programlisting>
3101 data Set :: * -> * where ...
3102 </programlisting>
3103 or even a mixture of the two:
3104 <programlisting>
3105 data Bar a :: (* -> *) -> * where ...
3106 </programlisting>
3107 The type variables (if given) may be explicitly kinded, so we could also write the header for <literal>Foo</literal>
3108 like this:
3109 <programlisting>
3110 data Bar a (b :: * -> *) where ...
3111 </programlisting>
3112 </para></listitem>
3113
3114
3115 <listitem><para>
3116 You can use strictness annotations, in the obvious places
3117 in the constructor type:
3118 <programlisting>
3119 data Term a where
3120 Lit :: !Int -> Term Int
3121 If :: Term Bool -> !(Term a) -> !(Term a) -> Term a
3122 Pair :: Term a -> Term b -> Term (a,b)
3123 </programlisting>
3124 </para></listitem>
3125
3126 <listitem><para>
3127 You can use a <literal>deriving</literal> clause on a GADT-style data type
3128 declaration. For example, these two declarations are equivalent
3129 <programlisting>
3130 data Maybe1 a where {
3131 Nothing1 :: Maybe1 a ;
3132 Just1 :: a -> Maybe1 a
3133 } deriving( Eq, Ord )
3134
3135 data Maybe2 a = Nothing2 | Just2 a
3136 deriving( Eq, Ord )
3137 </programlisting>
3138 </para></listitem>
3139
3140 <listitem><para>
3141 The type signature may have quantified type variables that do not appear
3142 in the result type:
3143 <programlisting>
3144 data Foo where
3145 MkFoo :: a -> (a->Bool) -> Foo
3146 Nil :: Foo
3147 </programlisting>
3148 Here the type variable <literal>a</literal> does not appear in the result type
3149 of either constructor.
3150 Although it is universally quantified in the type of the constructor, such
3151 a type variable is often called "existential".
3152 Indeed, the above declaration declares precisely the same type as
3153 the <literal>data Foo</literal> in <xref linkend="existential-quantification"/>.
3154 </para><para>
3155 The type may contain a class context too, of course:
3156 <programlisting>
3157 data Showable where
3158 MkShowable :: Show a => a -> Showable
3159 </programlisting>
3160 </para></listitem>
3161
3162 <listitem><para>
3163 You can use record syntax on a GADT-style data type declaration:
3164
3165 <programlisting>
3166 data Person where
3167 Adult :: { name :: String, children :: [Person] } -> Person
3168 Child :: Show a => { name :: !String, funny :: a } -> Person
3169 </programlisting>
3170 As usual, for every constructor that has a field <literal>f</literal>, the type of
3171 field <literal>f</literal> must be the same (modulo alpha conversion).
3172 The <literal>Child</literal> constructor above shows that the signature
3173 may have a context, existentially-quantified variables, and strictness annotations,
3174 just as in the non-record case. (NB: the "type" that follows the double-colon
3175 is not really a type, because of the record syntax and strictness annotations.
3176 A "type" of this form can appear only in a constructor signature.)
3177 </para></listitem>
3178
3179 <listitem><para>
3180 Record updates are allowed with GADT-style declarations,
3181 only fields that have the following property: the type of the field
3182 mentions no existential type variables.
3183 </para></listitem>
3184
3185 <listitem><para>
3186 As in the case of existentials declared using the Haskell-98-like record syntax
3187 (<xref linkend="existential-records"/>),
3188 record-selector functions are generated only for those fields that have well-typed
3189 selectors.
3190 Here is the example of that section, in GADT-style syntax:
3191 <programlisting>
3192 data Counter a where
3193 NewCounter :: { _this :: self
3194 , _inc :: self -> self
3195 , _display :: self -> IO ()
3196 , tag :: a
3197 } -> Counter a
3198 </programlisting>
3199 As before, only one selector function is generated here, that for <literal>tag</literal>.
3200 Nevertheless, you can still use all the field names in pattern matching and record construction.
3201 </para></listitem>
3202
3203 <listitem><para>
3204 In a GADT-style data type declaration there is no obvious way to specify that a data constructor
3205 should be infix, which makes a difference if you derive <literal>Show</literal> for the type.
3206 (Data constructors declared infix are displayed infix by the derived <literal>show</literal>.)
3207 So GHC implements the following design: a data constructor declared in a GADT-style data type
3208 declaration is displayed infix by <literal>Show</literal> iff (a) it is an operator symbol,
3209 (b) it has two arguments, (c) it has a programmer-supplied fixity declaration. For example
3210 <programlisting>
3211 infix 6 (:--:)
3212 data T a where
3213 (:--:) :: Int -> Bool -> T Int
3214 </programlisting>
3215 </para></listitem>
3216 </itemizedlist></para>
3217 </sect2>
3218
3219 <sect2 id="gadt">
3220 <title>Generalised Algebraic Data Types (GADTs)</title>
3221
3222 <para>Generalised Algebraic Data Types generalise ordinary algebraic data types
3223 by allowing constructors to have richer return types. Here is an example:
3224 <programlisting>
3225 data Term a where
3226 Lit :: Int -> Term Int
3227 Succ :: Term Int -> Term Int
3228 IsZero :: Term Int -> Term Bool
3229 If :: Term Bool -> Term a -> Term a -> Term a
3230 Pair :: Term a -> Term b -> Term (a,b)
3231 </programlisting>
3232 Notice that the return type of the constructors is not always <literal>Term a</literal>, as is the
3233 case with ordinary data types. This generality allows us to
3234 write a well-typed <literal>eval</literal> function
3235 for these <literal>Terms</literal>:
3236 <programlisting>
3237 eval :: Term a -> a
3238 eval (Lit i) = i
3239 eval (Succ t) = 1 + eval t
3240 eval (IsZero t) = eval t == 0
3241 eval (If b e1 e2) = if eval b then eval e1 else eval e2
3242 eval (Pair e1 e2) = (eval e1, eval e2)
3243 </programlisting>
3244 The key point about GADTs is that <emphasis>pattern matching causes type refinement</emphasis>.
3245 For example, in the right hand side of the equation
3246 <programlisting>
3247 eval :: Term a -> a
3248 eval (Lit i) = ...
3249 </programlisting>
3250 the type <literal>a</literal> is refined to <literal>Int</literal>. That's the whole point!
3251 A precise specification of the type rules is beyond what this user manual aspires to,
3252 but the design closely follows that described in
3253 the paper <ulink
3254 url="http://research.microsoft.com/%7Esimonpj/papers/gadt/">Simple
3255 unification-based type inference for GADTs</ulink>,
3256 (ICFP 2006).
3257 The general principle is this: <emphasis>type refinement is only carried out
3258 based on user-supplied type annotations</emphasis>.
3259 So if no type signature is supplied for <literal>eval</literal>, no type refinement happens,
3260 and lots of obscure error messages will
3261 occur. However, the refinement is quite general. For example, if we had:
3262 <programlisting>
3263 eval :: Term a -> a -> a
3264 eval (Lit i) j = i+j
3265 </programlisting>
3266 the pattern match causes the type <literal>a</literal> to be refined to <literal>Int</literal> (because of the type
3267 of the constructor <literal>Lit</literal>), and that refinement also applies to the type of <literal>j</literal>, and
3268 the result type of the <literal>case</literal> expression. Hence the addition <literal>i+j</literal> is legal.
3269 </para>
3270 <para>
3271 These and many other examples are given in papers by Hongwei Xi, and
3272 Tim Sheard. There is a longer introduction
3273 <ulink url="http://www.haskell.org/haskellwiki/GADT">on the wiki</ulink>,
3274 and Ralf Hinze's
3275 <ulink url="http://www.informatik.uni-bonn.de/~ralf/publications/With.pdf">Fun with phantom types</ulink> also has a number of examples. Note that papers
3276 may use different notation to that implemented in GHC.
3277 </para>
3278 <para>
3279 The rest of this section outlines the extensions to GHC that support GADTs. The extension is enabled with
3280 <option>-XGADTs</option>. The <option>-XGADTs</option> flag also sets <option>-XRelaxedPolyRec</option>.
3281 <itemizedlist>
3282 <listitem><para>
3283 A GADT can only be declared using GADT-style syntax (<xref linkend="gadt-style"/>);
3284 the old Haskell-98 syntax for data declarations always declares an ordinary data type.
3285 The result type of each constructor must begin with the type constructor being defined,
3286 but for a GADT the arguments to the type constructor can be arbitrary monotypes.
3287 For example, in the <literal>Term</literal> data
3288 type above, the type of each constructor must end with <literal>Term ty</literal>, but
3289 the <literal>ty</literal> need not be a type variable (e.g. the <literal>Lit</literal>
3290 constructor).
3291 </para></listitem>
3292
3293 <listitem><para>
3294 It is permitted to declare an ordinary algebraic data type using GADT-style syntax.
3295 What makes a GADT into a GADT is not the syntax, but rather the presence of data constructors
3296 whose result type is not just <literal>T a b</literal>.
3297 </para></listitem>
3298
3299 <listitem><para>
3300 You cannot use a <literal>deriving</literal> clause for a GADT; only for
3301 an ordinary data type.
3302 </para></listitem>
3303
3304 <listitem><para>
3305 As mentioned in <xref linkend="gadt-style"/>, record syntax is supported.
3306 For example:
3307 <programlisting>
3308 data Term a where
3309 Lit :: { val :: Int } -> Term Int
3310 Succ :: { num :: Term Int } -> Term Int
3311 Pred :: { num :: Term Int } -> Term Int
3312 IsZero :: { arg :: Term Int } -> Term Bool
3313 Pair :: { arg1 :: Term a
3314 , arg2 :: Term b
3315 } -> Term (a,b)
3316 If :: { cnd :: Term Bool
3317 , tru :: Term a
3318 , fls :: Term a
3319 } -> Term a
3320 </programlisting>
3321 However, for GADTs there is the following additional constraint:
3322 every constructor that has a field <literal>f</literal> must have
3323 the same result type (modulo alpha conversion)
3324 Hence, in the above example, we cannot merge the <literal>num</literal>
3325 and <literal>arg</literal> fields above into a
3326 single name. Although their field types are both <literal>Term Int</literal>,
3327 their selector functions actually have different types:
3328
3329 <programlisting>
3330 num :: Term Int -> Term Int
3331 arg :: Term Bool -> Term Int
3332 </programlisting>
3333 </para></listitem>
3334
3335 <listitem><para>
3336 When pattern-matching against data constructors drawn from a GADT,
3337 for example in a <literal>case</literal> expression, the following rules apply:
3338 <itemizedlist>
3339 <listitem><para>The type of the scrutinee must be rigid.</para></listitem>
3340 <listitem><para>The type of the entire <literal>case</literal> expression must be rigid.</para></listitem>
3341 <listitem><para>The type of any free variable mentioned in any of
3342 the <literal>case</literal> alternatives must be rigid.</para></listitem>
3343 </itemizedlist>
3344 A type is "rigid" if it is completely known to the compiler at its binding site. The easiest
3345 way to ensure that a variable a rigid type is to give it a type signature.
3346 For more precise details see <ulink url="http://research.microsoft.com/%7Esimonpj/papers/gadt">
3347 Simple unification-based type inference for GADTs
3348 </ulink>. The criteria implemented by GHC are given in the Appendix.
3349
3350 </para></listitem>
3351
3352 </itemizedlist>
3353 </para>
3354
3355 </sect2>
3356 </sect1>
3357
3358 <!-- ====================== End of Generalised algebraic data types ======================= -->
3359
3360 <sect1 id="deriving">
3361 <title>Extensions to the "deriving" mechanism</title>
3362
3363 <sect2 id="deriving-inferred">
3364 <title>Inferred context for deriving clauses</title>
3365
3366 <para>
3367 The Haskell Report is vague about exactly when a <literal>deriving</literal> clause is
3368 legal. For example:
3369 <programlisting>
3370 data T0 f a = MkT0 a deriving( Eq )
3371 data T1 f a = MkT1 (f a) deriving( Eq )
3372 data T2 f a = MkT2 (f (f a)) deriving( Eq )
3373 </programlisting>
3374 The natural generated <literal>Eq</literal> code would result in these instance declarations:
3375 <programlisting>
3376 instance Eq a => Eq (T0 f a) where ...
3377 instance Eq (f a) => Eq (T1 f a) where ...
3378 instance Eq (f (f a)) => Eq (T2 f a) where ...
3379 </programlisting>
3380 The first of these is obviously fine. The second is still fine, although less obviously.
3381 The third is not Haskell 98, and risks losing termination of instances.
3382 </para>
3383 <para>
3384 GHC takes a conservative position: it accepts the first two, but not the third. The rule is this:
3385 each constraint in the inferred instance context must consist only of type variables,
3386 with no repetitions.
3387 </para>
3388 <para>
3389 This rule is applied regardless of flags. If you want a more exotic context, you can write
3390 it yourself, using the <link linkend="stand-alone-deriving">standalone deriving mechanism</link>.
3391 </para>
3392 </sect2>
3393
3394 <sect2 id="stand-alone-deriving">
3395 <title>Stand-alone deriving declarations</title>
3396
3397 <para>
3398 GHC now allows stand-alone <literal>deriving</literal> declarations, enabled by <literal>-XStandaloneDeriving</literal>:
3399 <programlisting>
3400 data Foo a = Bar a | Baz String
3401
3402 deriving instance Eq a => Eq (Foo a)
3403 </programlisting>
3404 The syntax is identical to that of an ordinary instance declaration apart from (a) the keyword
3405 <literal>deriving</literal>, and (b) the absence of the <literal>where</literal> part.
3406 Note the following points:
3407 <itemizedlist>
3408 <listitem><para>
3409 You must supply an explicit context (in the example the context is <literal>(Eq a)</literal>),
3410 exactly as you would in an ordinary instance declaration.
3411 (In contrast, in a <literal>deriving</literal> clause
3412 attached to a data type declaration, the context is inferred.)
3413 </para></listitem>
3414
3415 <listitem><para>
3416 A <literal>deriving instance</literal> declaration
3417 must obey the same rules concerning form and termination as ordinary instance declarations,
3418 controlled by the same flags; see <xref linkend="instance-decls"/>.
3419 </para></listitem>
3420
3421 <listitem><para>
3422 Unlike a <literal>deriving</literal>
3423 declaration attached to a <literal>data</literal> declaration, the instance can be more specific
3424 than the data type (assuming you also use
3425 <literal>-XFlexibleInstances</literal>, <xref linkend="instance-rules"/>). Consider
3426 for example
3427 <programlisting>
3428 data Foo a = Bar a | Baz String
3429
3430 deriving instance Eq a => Eq (Foo [a])
3431 deriving instance Eq a => Eq (Foo (Maybe a))
3432 </programlisting>
3433 This will generate a derived instance for <literal>(Foo [a])</literal> and <literal>(Foo (Maybe a))</literal>,
3434 but other types such as <literal>(Foo (Int,Bool))</literal> will not be an instance of <literal>Eq</literal>.
3435 </para></listitem>
3436
3437 <listitem><para>
3438 Unlike a <literal>deriving</literal>
3439 declaration attached to a <literal>data</literal> declaration,
3440 GHC does not restrict the form of the data type. Instead, GHC simply generates the appropriate
3441 boilerplate code for the specified class, and typechecks it. If there is a type error, it is
3442 your problem. (GHC will show you the offending code if it has a type error.)
3443 The merit of this is that you can derive instances for GADTs and other exotic
3444 data types, providing only that the boilerplate code does indeed typecheck. For example:
3445 <programlisting>
3446 data T a where
3447 T1 :: T Int
3448 T2 :: T Bool
3449
3450 deriving instance Show (T a)
3451 </programlisting>
3452 In this example, you cannot say <literal>... deriving( Show )</literal> on the
3453 data type declaration for <literal>T</literal>,
3454 because <literal>T</literal> is a GADT, but you <emphasis>can</emphasis> generate
3455 the instance declaration using stand-alone deriving.
3456 </para>
3457 </listitem>
3458
3459 <listitem>
3460 <para>The stand-alone syntax is generalised for newtypes in exactly the same
3461 way that ordinary <literal>deriving</literal> clauses are generalised (<xref linkend="newtype-deriving"/>).
3462 For example:
3463 <programlisting>
3464 newtype Foo a = MkFoo (State Int a)
3465
3466 deriving instance MonadState Int Foo
3467 </programlisting>
3468 GHC always treats the <emphasis>last</emphasis> parameter of the instance
3469 (<literal>Foo</literal> in this example) as the type whose instance is being derived.
3470 </para></listitem>
3471 </itemizedlist></para>
3472
3473 </sect2>
3474
3475
3476 <sect2 id="deriving-typeable">
3477 <title>Deriving clause for extra classes (<literal>Typeable</literal>, <literal>Data</literal>, etc)</title>
3478
3479 <para>
3480 Haskell 98 allows the programmer to add "<literal>deriving( Eq, Ord )</literal>" to a data type
3481 declaration, to generate a standard instance declaration for classes specified in the <literal>deriving</literal> clause.
3482 In Haskell 98, the only classes that may appear in the <literal>deriving</literal> clause are the standard
3483 classes <literal>Eq</literal>, <literal>Ord</literal>,
3484 <literal>Enum</literal>, <literal>Ix</literal>, <literal>Bounded</literal>, <literal>Read</literal>, and <literal>Show</literal>.
3485 </para>
3486 <para>
3487 GHC extends this list with several more classes that may be automatically derived:
3488 <itemizedlist>
3489 <listitem><para> With <option>-XDeriveDataTypeable</option>, you can derive instances of the classes
3490 <literal>Typeable</literal>, and <literal>Data</literal>, defined in the library
3491 modules <literal>Data.Typeable</literal> and <literal>Data.Data</literal> respectively.
3492 </para>
3493 <para>Since GHC 7.8.1, <literal>Typeable</literal> is kind-polymorphic (see
3494 <xref linkend="kind-polymorphism"/>) and can be derived for any datatype and
3495 type class. Instances for datatypes can be derived by attaching a
3496 <literal>deriving Typeable</literal> clause to the datatype declaration, or by
3497 using standalone deriving (see <xref linkend="stand-alone-deriving"/>).
3498 Instances for type classes can only be derived using standalone deriving.
3499 For data families, <literal>Typeable</literal> should only be derived for the
3500 uninstantiated family type; each instance will then automatically have a
3501 <literal>Typeable</literal> instance too.
3502 See also <xref linkend="auto-derive-typeable"/>.
3503 </para>
3504 <para>
3505 Also since GHC 7.8.1, handwritten (ie. not derived) instances of
3506 <literal>Typeable</literal> are forbidden, and will result in an error.
3507 </para>
3508 </listitem>
3509
3510 <listitem><para> With <option>-XDeriveGeneric</option>, you can derive
3511 instances of the classes <literal>Generic</literal> and
3512 <literal>Generic1</literal>, defined in <literal>GHC.Generics</literal>.
3513 You can use these to define generic functions,
3514 as described in <xref linkend="generic-programming"/>.
3515 </para></listitem>
3516
3517 <listitem><para> With <option>-XDeriveFunctor</option>, you can derive instances of
3518 the class <literal>Functor</literal>,
3519 defined in <literal>GHC.Base</literal>.
3520 </para></listitem>
3521
3522 <listitem><para> With <option>-XDeriveFoldable</option>, you can derive instances of
3523 the class <literal>Foldable</literal>,
3524 defined in <literal>Data.Foldable</literal>.
3525 </para></listitem>
3526
3527 <listitem><para> With <option>-XDeriveTraversable</option>, you can derive instances of
3528 the class <literal>Traversable</literal>,
3529 defined in <literal>Data.Traversable</literal>.
3530 </para></listitem>
3531 </itemizedlist>
3532 In each case the appropriate class must be in scope before it
3533 can be mentioned in the <literal>deriving</literal> clause.
3534 </para>
3535 </sect2>
3536
3537 <sect2 id="auto-derive-typeable">
3538 <title>Automatically deriving <literal>Typeable</literal> instances</title>
3539
3540 <para>
3541 The flag <option>-XAutoDeriveTypeable</option> triggers the generation
3542 of derived <literal>Typeable</literal> instances for every datatype and type
3543 class declaration in the module it is used. It will also generate
3544 <literal>Typeable</literal> instances for any promoted data constructors
3545 (<xref linkend="promotion"/>). This flag implies
3546 <option>-XDeriveDataTypeable</option> (<xref linkend="deriving-typeable"/>).
3547 </para>
3548
3549 </sect2>
3550
3551 <sect2 id="newtype-deriving">
3552 <title>Generalised derived instances for newtypes</title>
3553
3554 <para>
3555 When you define an abstract type using <literal>newtype</literal>, you may want
3556 the new type to inherit some instances from its representation. In
3557 Haskell 98, you can inherit instances of <literal>Eq</literal>, <literal>Ord</literal>,
3558 <literal>Enum</literal> and <literal>Bounded</literal> by deriving them, but for any
3559 other classes you have to write an explicit instance declaration. For
3560 example, if you define
3561
3562 <programlisting>
3563 newtype Dollars = Dollars Int
3564 </programlisting>
3565
3566 and you want to use arithmetic on <literal>Dollars</literal>, you have to
3567 explicitly define an instance of <literal>Num</literal>:
3568
3569 <programlisting>
3570 instance Num Dollars where
3571 Dollars a + Dollars b = Dollars (a+b)
3572 ...
3573 </programlisting>
3574 All the instance does is apply and remove the <literal>newtype</literal>
3575 constructor. It is particularly galling that, since the constructor
3576 doesn't appear at run-time, this instance declaration defines a
3577 dictionary which is <emphasis>wholly equivalent</emphasis> to the <literal>Int</literal>
3578 dictionary, only slower!
3579 </para>
3580
3581
3582 <sect3 id="generalized-newtype-deriving"> <title> Generalising the deriving clause </title>
3583 <para>
3584 GHC now permits such instances to be derived instead,
3585 using the flag <option>-XGeneralizedNewtypeDeriving</option>,
3586 so one can write
3587 <programlisting>
3588 newtype Dollars = Dollars Int deriving (Eq,Show,Num)
3589 </programlisting>
3590
3591 and the implementation uses the <emphasis>same</emphasis> <literal>Num</literal> dictionary
3592 for <literal>Dollars</literal> as for <literal>Int</literal>. Notionally, the compiler
3593 derives an instance declaration of the form
3594
3595 <programlisting>
3596 instance Num Int => Num Dollars
3597 </programlisting>
3598
3599 which just adds or removes the <literal>newtype</literal> constructor according to the type.
3600 </para>
3601 <para>
3602
3603 We can also derive instances of constructor classes in a similar
3604 way. For example, suppose we have implemented state and failure monad
3605 transformers, such that
3606
3607 <programlisting>
3608 instance Monad m => Monad (State s m)
3609 instance Monad m => Monad (Failure m)
3610 </programlisting>
3611 In Haskell 98, we can define a parsing monad by
3612 <programlisting>
3613 type Parser tok m a = State [tok] (Failure m) a
3614 </programlisting>
3615
3616 which is automatically a monad thanks to the instance declarations
3617 above. With the extension, we can make the parser type abstract,
3618 without needing to write an instance of class <literal>Monad</literal>, via
3619
3620 <programlisting>
3621 newtype Parser tok m a = Parser (State [tok] (Failure m) a)
3622 deriving Monad
3623 </programlisting>
3624 In this case the derived instance declaration is of the form
3625 <programlisting>
3626 instance Monad (State [tok] (Failure m)) => Monad (Parser tok m)
3627 </programlisting>
3628
3629 Notice that, since <literal>Monad</literal> is a constructor class, the
3630 instance is a <emphasis>partial application</emphasis> of the new type, not the
3631 entire left hand side. We can imagine that the type declaration is
3632 "eta-converted" to generate the context of the instance
3633 declaration.
3634 </para>
3635 <para>
3636
3637 We can even derive instances of multi-parameter classes, provided the
3638 newtype is the last class parameter. In this case, a ``partial
3639 application'' of the class appears in the <literal>deriving</literal>
3640 clause. For example, given the class
3641
3642 <programlisting>
3643 class StateMonad s m | m -> s where ...
3644 instance Monad m => StateMonad s (State s m) where ...
3645 </programlisting>
3646 then we can derive an instance of <literal>StateMonad</literal> for <literal>Parser</literal>s by
3647 <programlisting>
3648 newtype Parser tok m a = Parser (State [tok] (Failure m) a)
3649 deriving (Monad, StateMonad [tok])
3650 </programlisting>
3651
3652 The derived instance is obtained by completing the application of the
3653 class to the new type:
3654
3655 <programlisting>
3656 instance StateMonad [tok] (State [tok] (Failure m)) =>
3657 StateMonad [tok] (Parser tok m)
3658 </programlisting>
3659 </para>
3660 <para>
3661
3662 As a result of this extension, all derived instances in newtype
3663 declarations are treated uniformly (and implemented just by reusing
3664 the dictionary for the representation type), <emphasis>except</emphasis>
3665 <literal>Show</literal> and <literal>Read</literal>, which really behave differently for
3666 the newtype and its representation.
3667 </para>
3668 </sect3>
3669
3670 <sect3> <title> A more precise specification </title>
3671 <para>
3672 Derived instance declarations are constructed as follows. Consider the
3673 declaration (after expansion of any type synonyms)
3674
3675 <programlisting>
3676 newtype T v1...vn = T' (t vk+1...vn) deriving (c1...cm)
3677 </programlisting>
3678
3679 where
3680 <itemizedlist>
3681 <listitem><para>
3682 The <literal>ci</literal> are partial applications of
3683 classes of the form <literal>C t1'...tj'</literal>, where the arity of <literal>C</literal>
3684 is exactly <literal>j+1</literal>. That is, <literal>C</literal> lacks exactly one type argument.
3685 </para></listitem>
3686 <listitem><para>
3687 The <literal>k</literal> is chosen so that <literal>ci (T v1...vk)</literal> is well-kinded.
3688 </para></listitem>
3689 <listitem><para>
3690 The type <literal>t</literal> is an arbitrary type.
3691 </para></listitem>
3692 <listitem><para>
3693 The type variables <literal>vk+1...vn</literal> do not occur in <literal>t</literal>,
3694 nor in the <literal>ci</literal>, and
3695 </para></listitem>
3696 <listitem><para>
3697 None of the <literal>ci</literal> is <literal>Read</literal>, <literal>Show</literal>,
3698 <literal>Typeable</literal>, or <literal>Data</literal>. These classes
3699 should not "look through" the type or its constructor. You can still
3700 derive these classes for a newtype, but it happens in the usual way, not
3701 via this new mechanism.
3702 </para></listitem>
3703 <listitem><para>
3704 The role of the last parameter of each of the <literal>ci</literal> is <emphasis>not</emphasis> <literal>N</literal>. (See <xref linkend="roles"/>.)</para></listitem>
3705 </itemizedlist>
3706 Then, for each <literal>ci</literal>, the derived instance
3707 declaration is:
3708 <programlisting>
3709 instance ci t => ci (T v1...vk)
3710 </programlisting>
3711 As an example which does <emphasis>not</emphasis> work, consider
3712 <programlisting>
3713 newtype NonMonad m s = NonMonad (State s m s) deriving Monad
3714 </programlisting>
3715 Here we cannot derive the instance
3716 <programlisting>
3717 instance Monad (State s m) => Monad (NonMonad m)
3718 </programlisting>
3719
3720 because the type variable <literal>s</literal> occurs in <literal>State s m</literal>,
3721 and so cannot be "eta-converted" away. It is a good thing that this
3722 <literal>deriving</literal> clause is rejected, because <literal>NonMonad m</literal> is
3723 not, in fact, a monad --- for the same reason. Try defining
3724 <literal>>>=</literal> with the correct type: you won't be able to.
3725 </para>
3726 <para>
3727
3728 Notice also that the <emphasis>order</emphasis> of class parameters becomes
3729 important, since we can only derive instances for the last one. If the
3730 <literal>StateMonad</literal> class above were instead defined as
3731
3732 <programlisting>
3733 class StateMonad m s | m -> s where ...
3734 </programlisting>
3735
3736 then we would not have been able to derive an instance for the
3737 <literal>Parser</literal> type above. We hypothesise that multi-parameter
3738 classes usually have one "main" parameter for which deriving new
3739 instances is most interesting.
3740 </para>
3741 <para>Lastly, all of this applies only for classes other than
3742 <literal>Read</literal>, <literal>Show</literal>, <literal>Typeable</literal>,
3743 and <literal>Data</literal>, for which the built-in derivation applies (section
3744 4.3.3. of the Haskell Report).
3745 (For the standard classes <literal>Eq</literal>, <literal>Ord</literal>,
3746 <literal>Ix</literal>, and <literal>Bounded</literal> it is immaterial whether
3747 the standard method is used or the one described here.)
3748 </para>
3749 </sect3>
3750 </sect2>
3751 </sect1>
3752
3753
3754 <!-- TYPE SYSTEM EXTENSIONS -->
3755 <sect1 id="type-class-extensions">
3756 <title>Class and instances declarations</title>
3757
3758 <sect2 id="multi-param-type-classes">
3759 <title>Class declarations</title>
3760
3761 <para>
3762 This section, and the next one, documents GHC's type-class extensions.
3763 There's lots of background in the paper <ulink
3764 url="http://research.microsoft.com/~simonpj/Papers/type-class-design-space/">Type
3765 classes: exploring the design space</ulink> (Simon Peyton Jones, Mark
3766 Jones, Erik Meijer).
3767 </para>
3768
3769 <sect3>
3770 <title>Multi-parameter type classes</title>
3771 <para>
3772 Multi-parameter type classes are permitted, with flag <option>-XMultiParamTypeClasses</option>.
3773 For example:
3774
3775
3776 <programlisting>
3777 class Collection c a where
3778 union :: c a -> c a -> c a
3779 ...etc.
3780 </programlisting>
3781
3782 </para>
3783 </sect3>
3784
3785 <sect3 id="superclass-rules">
3786 <title>The superclasses of a class declaration</title>
3787
3788 <para>
3789 In Haskell 98 the context of a class declaration (which introduces superclasses)
3790 must be simple; that is, each predicate must consist of a class applied to
3791 type variables. The flag <option>-XFlexibleContexts</option>
3792 (<xref linkend="flexible-contexts"/>)
3793 lifts this restriction,
3794 so that the only restriction on the context in a class declaration is
3795 that the class hierarchy must be acyclic. So these class declarations are OK:
3796
3797
3798 <programlisting>
3799 class Functor (m k) => FiniteMap m k where
3800 ...
3801
3802 class (Monad m, Monad (t m)) => Transform t m where
3803 lift :: m a -> (t m) a
3804 </programlisting>
3805
3806
3807 </para>
3808 <para>
3809 As in Haskell 98, The class hierarchy must be acyclic. However, the definition
3810 of "acyclic" involves only the superclass relationships. For example,
3811 this is OK:
3812
3813
3814 <programlisting>
3815 class C a where {
3816 op :: D b => a -> b -> b
3817 }
3818
3819 class C a => D a where { ... }
3820 </programlisting>
3821
3822
3823 Here, <literal>C</literal> is a superclass of <literal>D</literal>, but it's OK for a
3824 class operation <literal>op</literal> of <literal>C</literal> to mention <literal>D</literal>. (It
3825 would not be OK for <literal>D</literal> to be a superclass of <literal>C</literal>.)
3826 </para>
3827 <para>
3828 With the extension that adds a <link linkend="constraint-kind">kind of constraints</link>,
3829 you can write more exotic superclass definitions. The superclass cycle check is even more
3830 liberal in these case. For example, this is OK:
3831
3832 <programlisting>
3833 class A cls c where
3834 meth :: cls c => c -> c
3835
3836 class A B c => B c where
3837 </programlisting>
3838
3839 A superclass context for a class <literal>C</literal> is allowed if, after expanding
3840 type synonyms to their right-hand-sides, and uses of classes (other than <literal>C</literal>)
3841 to their superclasses, <literal>C</literal> does not occur syntactically in the context.
3842 </para>
3843 </sect3>
3844
3845
3846
3847
3848 <sect3 id="class-method-types">
3849 <title>Class method types</title>
3850
3851 <para>
3852 Haskell 98 prohibits class method types to mention constraints on the
3853 class type variable, thus:
3854 <programlisting>
3855 class Seq s a where
3856 fromList :: [a] -> s a
3857 elem :: Eq a => a -> s a -> Bool
3858 </programlisting>
3859 The type of <literal>elem</literal> is illegal in Haskell 98, because it
3860 contains the constraint <literal>Eq a</literal>, constrains only the
3861 class type variable (in this case <literal>a</literal>).
3862 GHC lifts this restriction (flag <option>-XConstrainedClassMethods</option>).
3863 </para>
3864
3865
3866 </sect3>
3867
3868
3869 <sect3 id="class-default-signatures">
3870 <title>Default method signatures</title>
3871
3872 <para>
3873 Haskell 98 allows you to define a default implementation when declaring a class:
3874 <programlisting>
3875 class Enum a where
3876 enum :: [a]
3877 enum = []
3878 </programlisting>
3879 The type of the <literal>enum</literal> method is <literal>[a]</literal>, and
3880 this is also the type of the default method. You can lift this restriction
3881 and give another type to the default method using the flag
3882 <option>-XDefaultSignatures</option>. For instance, if you have written a
3883 generic implementation of enumeration in a class <literal>GEnum</literal>
3884 with method <literal>genum</literal> in terms of <literal>GHC.Generics</literal>,
3885 you can specify a default method that uses that generic implementation:
3886 <programlisting>
3887 class Enum a where
3888 enum :: [a]
3889 default enum :: (Generic a, GEnum (Rep a)) => [a]
3890 enum = map to genum
3891 </programlisting>
3892 We reuse the keyword <literal>default</literal> to signal that a signature
3893 applies to the default method only; when defining instances of the
3894 <literal>Enum</literal> class, the original type <literal>[a]</literal> of
3895 <literal>enum</literal> still applies. When giving an empty instance, however,
3896 the default implementation <literal>map to genum</literal> is filled-in,
3897 and type-checked with the type
3898 <literal>(Generic a, GEnum (Rep a)) => [a]</literal>.
3899 </para>
3900
3901 <para>
3902 We use default signatures to simplify generic programming in GHC
3903 (<xref linkend="generic-programming"/>).
3904 </para>
3905
3906
3907 </sect3>
3908
3909 <sect3 id="nullary-type-classes">
3910 <title>Nullary type classes</title>
3911 Nullary (no parameter) type classes are enabled with <option>-XNullaryTypeClasses</option>.
3912 Since there are no available parameters, there can be at most one instance
3913 of a nullary class. A nullary type class might be used to document some assumption
3914 in a type signature (such as reliance on the Riemann hypothesis) or add some
3915 globally configurable settings in a program. For example,
3916
3917 <programlisting>
3918 class RiemannHypothesis where
3919 assumeRH :: a -> a
3920
3921 -- Deterministic version of the Miller test
3922 -- correctness depends on the generalized Riemann hypothesis
3923 isPrime :: RiemannHypothesis => Integer -> Bool
3924 isPrime n = assumeRH (...)
3925 </programlisting>
3926
3927 The type signature of <literal>isPrime</literal> informs users that its correctness
3928 depends on an unproven conjecture. If the function is used, the user has
3929 to acknowledge the dependence with:
3930
3931 <programlisting>
3932 instance RiemannHypothesis where
3933 assumeRH = id
3934 </programlisting>
3935
3936 </sect3>
3937 </sect2>
3938
3939 <sect2 id="functional-dependencies">
3940 <title>Functional dependencies
3941 </title>
3942
3943 <para> Functional dependencies are implemented as described by Mark Jones
3944 in &ldquo;<ulink url="http://citeseer.ist.psu.edu/jones00type.html">Type Classes with Functional Dependencies</ulink>&rdquo;, Mark P. Jones,
3945 In Proceedings of the 9th European Symposium on Programming,
3946 ESOP 2000, Berlin, Germany, March 2000, Springer-Verlag LNCS 1782,
3947 .
3948 </para>
3949 <para>
3950 Functional dependencies are introduced by a vertical bar in the syntax of a
3951 class declaration; e.g.
3952 <programlisting>
3953 class (Monad m) => MonadState s m | m -> s where ...
3954
3955 class Foo a b c | a b -> c where ...
3956 </programlisting>
3957 There should be more documentation, but there isn't (yet). Yell if you need it.
3958 </para>
3959
3960 <sect3><title>Rules for functional dependencies </title>
3961 <para>
3962 In a class declaration, all of the class type variables must be reachable (in the sense
3963 mentioned in <xref linkend="flexible-contexts"/>)
3964 from the free variables of each method type.
3965 For example:
3966
3967 <programlisting>
3968 class Coll s a where
3969 empty :: s
3970 insert :: s -> a -> s
3971 </programlisting>
3972
3973 is not OK, because the type of <literal>empty</literal> doesn't mention
3974 <literal>a</literal>. Functional dependencies can make the type variable
3975 reachable:
3976 <programlisting>
3977 class Coll s a | s -> a where
3978 empty :: s
3979 insert :: s -> a -> s
3980 </programlisting>
3981
3982 Alternatively <literal>Coll</literal> might be rewritten
3983
3984 <programlisting>
3985 class Coll s a where
3986 empty :: s a
3987 insert :: s a -> a -> s a
3988 </programlisting>
3989
3990
3991 which makes the connection between the type of a collection of
3992 <literal>a</literal>'s (namely <literal>(s a)</literal>) and the element type <literal>a</literal>.
3993 Occasionally this really doesn't work, in which case you can split the
3994 class like this:
3995
3996
3997 <programlisting>
3998 class CollE s where
3999 empty :: s
4000
4001 class CollE s => Coll s a where
4002 insert :: s -> a -> s
4003 </programlisting>
4004 </para>
4005 </sect3>
4006
4007
4008 <sect3>
4009 <title>Background on functional dependencies</title>
4010
4011 <para>The following description of the motivation and use of functional dependencies is taken
4012 from the Hugs user manual, reproduced here (with minor changes) by kind
4013 permission of Mark Jones.
4014 </para>
4015 <para>
4016 Consider the following class, intended as part of a
4017 library for collection types:
4018 <programlisting>
4019 class Collects e ce where
4020 empty :: ce
4021 insert :: e -> ce -> ce
4022 member :: e -> ce -> Bool
4023 </programlisting>
4024 The type variable e used here represents the element type, while ce is the type
4025 of the container itself. Within this framework, we might want to define
4026 instances of this class for lists or characteristic functions (both of which
4027 can be used to represent collections of any equality type), bit sets (which can
4028 be used to represent collections of characters), or hash tables (which can be
4029 used to represent any collection whose elements have a hash function). Omitting
4030 standard implementation details, this would lead to the following declarations:
4031 <programlisting>
4032 instance Eq e => Collects e [e] where ...
4033 instance Eq e => Collects e (e -> Bool) where ...
4034 instance Collects Char BitSet where ...
4035 instance (Hashable e, Collects a ce)
4036 => Collects e (Array Int ce) where ...
4037 </programlisting>
4038 All this looks quite promising; we have a class and a range of interesting
4039 implementations. Unfortunately, there are some serious problems with the class
4040 declaration. First, the empty function has an ambiguous type:
4041 <programlisting>
4042 empty :: Collects e ce => ce
4043 </programlisting>
4044 By "ambiguous" we mean that there is a type variable e that appears on the left
4045 of the <literal>=&gt;</literal> symbol, but not on the right. The problem with
4046 this is that, according to the theoretical foundations of Haskell overloading,
4047 we cannot guarantee a well-defined semantics for any term with an ambiguous
4048 type.
4049 </para>
4050 <para>
4051 We can sidestep this specific problem by removing the empty member from the
4052 class declaration. However, although the remaining members, insert and member,
4053 do not have ambiguous types, we still run into problems when we try to use
4054 them. For example, consider the following two functions:
4055 <programlisting>
4056 f x y = insert x . insert y
4057 g = f True 'a'
4058 </programlisting>
4059 for which GHC infers the following types:
4060 <programlisting>
4061 f :: (Collects a c, Collects b c) => a -> b -> c -> c
4062 g :: (Collects Bool c, Collects Char c) => c -> c
4063 </programlisting>
4064 Notice that the type for f allows the two parameters x and y to be assigned
4065 different types, even though it attempts to insert each of the two values, one
4066 after the other, into the same collection. If we're trying to model collections
4067 that contain only one type of value, then this is clearly an inaccurate
4068 type. Worse still, the definition for g is accepted, without causing a type
4069 error. As a result, the error in this code will not be flagged at the point
4070 where it appears. Instead, it will show up only when we try to use g, which
4071 might even be in a different module.
4072 </para>
4073
4074 <sect4><title>An attempt to use constructor classes</title>
4075
4076 <para>
4077 Faced with the problems described above, some Haskell programmers might be
4078 tempted to use something like the following version of the class declaration:
4079 <programlisting>
4080 class Collects e c where
4081 empty :: c e
4082 insert :: e -> c e -> c e
4083 member :: e -> c e -> Bool
4084 </programlisting>
4085 The key difference here is that we abstract over the type constructor c that is
4086 used to form the collection type c e, and not over that collection type itself,
4087 represented by ce in the original class declaration. This avoids the immediate
4088 problems that we mentioned above: empty has type <literal>Collects e c => c
4089 e</literal>, which is not ambiguous.
4090 </para>
4091 <para>
4092 The function f from the previous section has a more accurate type:
4093 <programlisting>
4094 f :: (Collects e c) => e -> e -> c e -> c e
4095 </programlisting>
4096 The function g from the previous section is now rejected with a type error as