1 <?xml version="1.0" encoding="iso-8859-1"?>
2 <para>
3 <indexterm><primary>language, GHC</primary></indexterm>
4 <indexterm><primary>extensions, GHC</primary></indexterm>
5 As with all known Haskell systems, GHC implements some extensions to
the language. They can all be enabled or disabled by command-line flags
7 or language pragmas. By default GHC understands the most recent Haskell
8 version it supports, plus a handful of extensions.
9 </para>
10
11 <para>
12 Some of the Glasgow extensions serve to give you access to the
13 underlying facilities with which we implement Haskell. Thus, you can
14 get at the Raw Iron, if you are willing to write some non-portable
15 code at a more primitive level. You need not be &ldquo;stuck&rdquo;
16 on performance because of the implementation costs of Haskell's
17 &ldquo;high-level&rdquo; features&mdash;you can always code
18 &ldquo;under&rdquo; them. In an extreme case, you can write all your
19 time-critical code in C, and then just glue it together with Haskell!
20 </para>
21
22 <para>
23 Before you get too carried away working at the lowest level (e.g.,
24 sloshing <literal>MutableByteArray&num;</literal>s around your
25 program), you may wish to check if there are libraries that provide a
26 &ldquo;Haskellised veneer&rdquo; over the features you want. The
27 separate <ulink url="../libraries/index.html">libraries
28 documentation</ulink> describes all the libraries that come with GHC.
29 </para>
30
31 <!-- LANGUAGE OPTIONS -->
32 <sect1 id="options-language">
33 <title>Language options</title>
34
35 <indexterm><primary>language</primary><secondary>option</secondary>
36 </indexterm>
37 <indexterm><primary>options</primary><secondary>language</secondary>
38 </indexterm>
39 <indexterm><primary>extensions</primary><secondary>options controlling</secondary>
40 </indexterm>
41
<para>The language option flags control which variations of the language are
permitted.</para>
44
45 <para>Language options can be controlled in two ways:
46 <itemizedlist>
<listitem><para>Every language option can be switched on by a command-line flag "<option>-X...</option>"
(e.g. <option>-XTemplateHaskell</option>), and switched off by the flag "<option>-XNo...</option>"
(e.g. <option>-XNoTemplateHaskell</option>).</para></listitem>
50 <listitem><para>
51 Language options recognised by Cabal can also be enabled using the <literal>LANGUAGE</literal> pragma,
52 thus <literal>{-# LANGUAGE TemplateHaskell #-}</literal> (see <xref linkend="language-pragma"/>). </para>
53 </listitem>
54 </itemizedlist></para>
55
56 <para>The flag <option>-fglasgow-exts</option>
57 <indexterm><primary><option>-fglasgow-exts</option></primary></indexterm>
58 is equivalent to enabling the following extensions:
59 &what_glasgow_exts_does;
60 Enabling these options is the <emphasis>only</emphasis>
61 effect of <option>-fglasgow-exts</option>.
62 We are trying to move away from this portmanteau flag,
63 and towards enabling features individually.</para>
64
65 </sect1>
66
67 <!-- UNBOXED TYPES AND PRIMITIVE OPERATIONS -->
68 <sect1 id="primitives">
69 <title>Unboxed types and primitive operations</title>
70
71 <para>GHC is built on a raft of primitive data types and operations;
72 "primitive" in the sense that they cannot be defined in Haskell itself.
73 While you really can use this stuff to write fast code,
74 we generally find it a lot less painful, and more satisfying in the
75 long run, to use higher-level language features and libraries. With
76 any luck, the code you write will be optimised to the efficient
77 unboxed version in any case. And if it isn't, we'd like to know
78 about it.</para>
79
80 <para>All these primitive data types and operations are exported by the
81 library <literal>GHC.Prim</literal>, for which there is
82 <ulink url="&libraryGhcPrimLocation;/GHC-Prim.html">detailed online documentation</ulink>.
83 (This documentation is generated from the file <filename>compiler/prelude/primops.txt.pp</filename>.)
84 </para>
85
86 <para>
87 If you want to mention any of the primitive data types or operations in your
88 program, you must first import <literal>GHC.Prim</literal> to bring them
89 into scope. Many of them have names ending in "&num;", and to mention such
90 names you need the <option>-XMagicHash</option> extension (<xref linkend="magic-hash"/>).
91 </para>
92
93 <para>The primops make extensive use of <link linkend="glasgow-unboxed">unboxed types</link>
94 and <link linkend="unboxed-tuples">unboxed tuples</link>, which
95 we briefly summarise here. </para>
96
97 <sect2 id="glasgow-unboxed">
98 <title>Unboxed types</title>
99
100 <para>
101 <indexterm><primary>Unboxed types (Glasgow extension)</primary></indexterm>
102 </para>
103
104 <para>Most types in GHC are <firstterm>boxed</firstterm>, which means
105 that values of that type are represented by a pointer to a heap
106 object. The representation of a Haskell <literal>Int</literal>, for
example, is a two-word heap object. An <firstterm>unboxed</firstterm>
type, however, is represented by the value itself; no pointers or heap
allocation are involved.
110 </para>
111
112 <para>
113 Unboxed types correspond to the &ldquo;raw machine&rdquo; types you
114 would use in C: <literal>Int&num;</literal> (long int),
115 <literal>Double&num;</literal> (double), <literal>Addr&num;</literal>
116 (void *), etc. The <emphasis>primitive operations</emphasis>
117 (PrimOps) on these types are what you might expect; e.g.,
118 <literal>(+&num;)</literal> is addition on
119 <literal>Int&num;</literal>s, and is the machine-addition that we all
120 know and love&mdash;usually one instruction.
121 </para>
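
<para>
As a minimal sketch of the point above, the following program squares and
adds two <literal>Int</literal>s directly on their unboxed representation.
It imports from <literal>GHC.Exts</literal>, which re-exports the primops
together with the <literal>I&num;</literal> constructor of
<literal>Int</literal>; the names <literal>square</literal> and
<literal>sumSquares</literal> are illustrative only.
</para>

<programlisting>
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (Int (I#), Int#, (*#), (+#))

-- Square an unboxed integer using the primitive multiplication.
square :: Int# -> Int#
square x = x *# x

-- Unwrap the boxed arguments, work on raw Int# values, and re-box the result.
sumSquares :: Int -> Int -> Int
sumSquares (I# x) (I# y) = I# (square x +# square y)

main :: IO ()
main = print (sumSquares 3 4)   -- prints 25
</programlisting>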
122
123 <para>
124 Primitive (unboxed) types cannot be defined in Haskell, and are
125 therefore built into the language and compiler. Primitive types are
126 always unlifted; that is, a value of a primitive type cannot be
127 bottom. We use the convention (but it is only a convention)
128 that primitive types, values, and
129 operations have a <literal>&num;</literal> suffix (see <xref linkend="magic-hash"/>).
130 For some primitive types we have special syntax for literals, also
131 described in the <link linkend="magic-hash">same section</link>.
132 </para>
133
134 <para>
135 Primitive values are often represented by a simple bit-pattern, such
136 as <literal>Int&num;</literal>, <literal>Float&num;</literal>,
137 <literal>Double&num;</literal>. But this is not necessarily the case:
138 a primitive value might be represented by a pointer to a
139 heap-allocated object. Examples include
140 <literal>Array&num;</literal>, the type of primitive arrays. A
141 primitive array is heap-allocated because it is too big a value to fit
142 in a register, and would be too expensive to copy around; in a sense,
143 it is accidental that it is represented by a pointer. If a pointer
144 represents a primitive value, then it really does point to that value:
145 no unevaluated thunks, no indirections&hellip;nothing can be at the
146 other end of the pointer than the primitive value.
147 A numerically-intensive program using unboxed types can
148 go a <emphasis>lot</emphasis> faster than its &ldquo;standard&rdquo;
149 counterpart&mdash;we saw a threefold speedup on one example.
150 </para>
151
152 <para>
153 There are some restrictions on the use of primitive types:
154 <itemizedlist>
155 <listitem><para>The main restriction
156 is that you can't pass a primitive value to a polymorphic
157 function or store one in a polymorphic data type. This rules out
158 things like <literal>[Int&num;]</literal> (i.e. lists of primitive
159 integers). The reason for this restriction is that polymorphic
160 arguments and constructor fields are assumed to be pointers: if an
161 unboxed integer is stored in one of these, the garbage collector would
162 attempt to follow it, leading to unpredictable space leaks. Or a
163 <function>seq</function> operation on the polymorphic component may
164 attempt to dereference the pointer, with disastrous results. Even
165 worse, the unboxed value might be larger than a pointer
166 (<literal>Double&num;</literal> for instance).
167 </para>
168 </listitem>
169 <listitem><para> You cannot define a newtype whose representation type
170 (the argument type of the data constructor) is an unboxed type. Thus,
171 this is illegal:
172 <programlisting>
173 newtype A = MkA Int#
174 </programlisting>
175 </para></listitem>
176 <listitem><para> You cannot bind a variable with an unboxed type
177 in a <emphasis>top-level</emphasis> binding.
178 </para></listitem>
179 <listitem><para> You cannot bind a variable with an unboxed type
180 in a <emphasis>recursive</emphasis> binding.
181 </para></listitem>
182 <listitem><para> You may bind unboxed variables in a (non-recursive,
183 non-top-level) pattern binding, but you must make any such pattern-match
184 strict. For example, rather than:
185 <programlisting>
186 data Foo = Foo Int Int#
187
188 f x = let (Foo a b, w) = ..rhs.. in ..body..
189 </programlisting>
190 you must write:
191 <programlisting>
192 data Foo = Foo Int Int#
193
194 f x = let !(Foo a b, w) = ..rhs.. in ..body..
195 </programlisting>
196 since <literal>b</literal> has type <literal>Int#</literal>.
197 </para>
198 </listitem>
199 </itemizedlist>
200 </para>
201
202 </sect2>
203
204 <sect2 id="unboxed-tuples">
205 <title>Unboxed tuples</title>
206
207 <para>
208 Unboxed tuples aren't really exported by <literal>GHC.Exts</literal>;
209 they are a syntactic extension enabled by the language flag <option>-XUnboxedTuples</option>. An
210 unboxed tuple looks like this:
211 </para>
212
213 <para>
214
215 <programlisting>
216 (# e_1, ..., e_n #)
217 </programlisting>
218
219 </para>
220
221 <para>
222 where <literal>e&lowbar;1..e&lowbar;n</literal> are expressions of any
223 type (primitive or non-primitive). The type of an unboxed tuple looks
224 the same.
225 </para>
226
227 <para>
228 Note that when unboxed tuples are enabled,
229 <literal>(#</literal> is a single lexeme, so for example when using
230 operators like <literal>#</literal> and <literal>#-</literal> you need
231 to write <literal>( # )</literal> and <literal>( #- )</literal> rather than
232 <literal>(#)</literal> and <literal>(#-)</literal>.
233 </para>
234
235 <para>
236 Unboxed tuples are used for functions that need to return multiple
237 values, but they avoid the heap allocation normally associated with
238 using fully-fledged tuples. When an unboxed tuple is returned, the
239 components are put directly into registers or on the stack; the
240 unboxed tuple itself does not have a composite representation. Many
241 of the primitive operations listed in <literal>primops.txt.pp</literal> return unboxed
242 tuples.
243 In particular, the <literal>IO</literal> and <literal>ST</literal> monads use unboxed
244 tuples to avoid unnecessary allocation during sequences of operations.
245 </para>
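
<para>
A small sketch of this use: the hypothetical function
<literal>quotRemU</literal> below returns a quotient and a remainder as an
unboxed tuple, and the caller takes the tuple apart with a
<literal>case</literal> expression.
</para>

<programlisting>
{-# LANGUAGE UnboxedTuples #-}
module Main where

-- Return two results together without building a boxed pair.
quotRemU :: Int -> Int -> (# Int, Int #)
quotRemU n d = (# n `quot` d, n `rem` d #)

main :: IO ()
main = case quotRemU 17 5 of
  (# q, r #) -> print (q, r)   -- prints (3,2)
</programlisting>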
246
247 <para>
248 There are some restrictions on the use of unboxed tuples:
249 <itemizedlist>
250
251 <listitem>
252 <para>
253 Values of unboxed tuple types are subject to the same restrictions as
254 other unboxed types; i.e. they may not be stored in polymorphic data
255 structures or passed to polymorphic functions.
256 </para>
257 </listitem>
258
259 <listitem>
260 <para>
261 The typical use of unboxed tuples is simply to return multiple values,
262 binding those multiple results with a <literal>case</literal> expression, thus:
263 <programlisting>
264 f x y = (# x+1, y-1 #)
265 g x = case f x x of { (# a, b #) -&#62; a + b }
266 </programlisting>
267 You can have an unboxed tuple in a pattern binding, thus
268 <programlisting>
269 f x = let (# p,q #) = h x in ..body..
270 </programlisting>
271 If the types of <literal>p</literal> and <literal>q</literal> are not unboxed,
272 the resulting binding is lazy like any other Haskell pattern binding. The
273 above example desugars like this:
274 <programlisting>
f x = let t = case h x of { (# p,q #) -> (p,q) }
          p = fst t
          q = snd t
      in ..body..
279 </programlisting>
280 Indeed, the bindings can even be recursive.
281 </para>
282 </listitem>
283 </itemizedlist>
284
285 </para>
286
287 </sect2>
288 </sect1>
289
290
291 <!-- ====================== SYNTACTIC EXTENSIONS ======================= -->
292
293 <sect1 id="syntax-extns">
294 <title>Syntactic extensions</title>
295
296 <sect2 id="unicode-syntax">
297 <title>Unicode syntax</title>
298 <para>The language
299 extension <option>-XUnicodeSyntax</option><indexterm><primary><option>-XUnicodeSyntax</option></primary></indexterm>
300 enables Unicode characters to be used to stand for certain ASCII
301 character sequences. The following alternatives are provided:</para>
302
303 <informaltable>
<tgroup cols="4" align="left" colsep="1" rowsep="1">
305 <thead>
306 <row>
307 <entry>ASCII</entry>
308 <entry>Unicode alternative</entry>
309 <entry>Code point</entry>
310 <entry>Name</entry>
311 </row>
312 </thead>
313
314 <!--
315 to find the DocBook entities for these characters, find
316 the Unicode code point (e.g. 0x2237), and grep for it in
/usr/share/sgml/docbook/xml-dtd-*/ent/* (or equivalent on
your system). Some of these Unicode code points don't have
equivalent DocBook entities.
320 -->
321
322 <tbody>
323 <row>
324 <entry><literal>::</literal></entry>
325 <entry>::</entry> <!-- no special char, apparently -->
326 <entry>0x2237</entry>
327 <entry>PROPORTION</entry>
328 </row>
329 </tbody>
330 <tbody>
331 <row>
332 <entry><literal>=&gt;</literal></entry>
333 <entry>&rArr;</entry>
334 <entry>0x21D2</entry>
335 <entry>RIGHTWARDS DOUBLE ARROW</entry>
336 </row>
337 </tbody>
338 <tbody>
339 <row>
340 <entry><literal>forall</literal></entry>
341 <entry>&forall;</entry>
342 <entry>0x2200</entry>
343 <entry>FOR ALL</entry>
344 </row>
345 </tbody>
346 <tbody>
347 <row>
348 <entry><literal>-&gt;</literal></entry>
349 <entry>&rarr;</entry>
350 <entry>0x2192</entry>
351 <entry>RIGHTWARDS ARROW</entry>
352 </row>
353 </tbody>
354 <tbody>
355 <row>
356 <entry><literal>&lt;-</literal></entry>
357 <entry>&larr;</entry>
358 <entry>0x2190</entry>
359 <entry>LEFTWARDS ARROW</entry>
360 </row>
361 </tbody>
362
363 <tbody>
364 <row>
365 <entry>-&lt;</entry>
366 <entry>&larrtl;</entry>
367 <entry>0x2919</entry>
368 <entry>LEFTWARDS ARROW-TAIL</entry>
369 </row>
370 </tbody>
371
372 <tbody>
373 <row>
374 <entry>&gt;-</entry>
375 <entry>&rarrtl;</entry>
376 <entry>0x291A</entry>
377 <entry>RIGHTWARDS ARROW-TAIL</entry>
378 </row>
379 </tbody>
380
381 <tbody>
382 <row>
383 <entry>-&lt;&lt;</entry>
384 <entry></entry>
385 <entry>0x291B</entry>
386 <entry>LEFTWARDS DOUBLE ARROW-TAIL</entry>
387 </row>
388 </tbody>
389
390 <tbody>
391 <row>
392 <entry>&gt;&gt;-</entry>
393 <entry></entry>
394 <entry>0x291C</entry>
395 <entry>RIGHTWARDS DOUBLE ARROW-TAIL</entry>
396 </row>
397 </tbody>
398
399 <tbody>
400 <row>
401 <entry>*</entry>
402 <entry>&starf;</entry>
403 <entry>0x2605</entry>
404 <entry>BLACK STAR</entry>
405 </row>
406 </tbody>
407
408 </tgroup>
409 </informaltable>
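
<para>
For illustration, here is a sketch of a signature written with three of the
alternatives from the table, mixed with the ASCII <literal>::</literal>.
The function <literal>showTwice</literal> is a made-up example, and
<option>-XExplicitForAll</option> is enabled here on the assumption that the
<literal>forall</literal> keyword is not otherwise available. The source file
itself would need to be saved as UTF-8.
</para>

<programlisting>
{-# LANGUAGE UnicodeSyntax, ExplicitForAll #-}
module Main where

-- Same as:  showTwice :: forall a. Show a => a -> String
showTwice :: &forall; a. Show a &rArr; a &rarr; String
showTwice x = show x ++ show x

main :: IO ()
main = putStrLn (showTwice (42 :: Int))   -- prints 4242
</programlisting>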
410 </sect2>
411
412 <sect2 id="magic-hash">
413 <title>The magic hash</title>
414 <para>The language extension <option>-XMagicHash</option> allows "&num;" as a
415 postfix modifier to identifiers. Thus, "x&num;" is a valid variable, and "T&num;" is
416 a valid type constructor or data constructor.</para>
417
418 <para>The hash sign does not change semantics at all. We tend to use variable
419 names ending in "&num;" for unboxed values or types (e.g. <literal>Int&num;</literal>),
420 but there is no requirement to do so; they are just plain ordinary variables.
421 Nor does the <option>-XMagicHash</option> extension bring anything into scope.
422 For example, to bring <literal>Int&num;</literal> into scope you must
423 import <literal>GHC.Prim</literal> (see <xref linkend="primitives"/>);
424 the <option>-XMagicHash</option> extension
425 then allows you to <emphasis>refer</emphasis> to the <literal>Int&num;</literal>
426 that is now in scope.</para>
<para> The <option>-XMagicHash</option> extension also enables some new forms of literals (see <xref linkend="glasgow-unboxed"/>):
428 <itemizedlist>
429 <listitem><para> <literal>'x'&num;</literal> has type <literal>Char&num;</literal></para> </listitem>
430 <listitem><para> <literal>&quot;foo&quot;&num;</literal> has type <literal>Addr&num;</literal></para> </listitem>
431 <listitem><para> <literal>3&num;</literal> has type <literal>Int&num;</literal>. In general,
432 any Haskell integer lexeme followed by a <literal>&num;</literal> is an <literal>Int&num;</literal> literal, e.g.
<literal>-0x3A&num;</literal> as well as <literal>32&num;</literal>.</para></listitem>
434 <listitem><para> <literal>3&num;&num;</literal> has type <literal>Word&num;</literal>. In general,
435 any non-negative Haskell integer lexeme followed by <literal>&num;&num;</literal>
436 is a <literal>Word&num;</literal>. </para> </listitem>
437 <listitem><para> <literal>3.2&num;</literal> has type <literal>Float&num;</literal>.</para> </listitem>
438 <listitem><para> <literal>3.2&num;&num;</literal> has type <literal>Double&num;</literal></para> </listitem>
439 </itemizedlist>
440 </para>
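
<para>
The literal forms can be tried out with a short program such as the sketch
below. It re-boxes each literal with the corresponding data constructor
(<literal>C&num;</literal>, <literal>I&num;</literal>, and so on), imported
here from <literal>GHC.Exts</literal>, so that the value can be printed.
</para>

<programlisting>
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (Char (C#), Double (D#), Float (F#), Int (I#), Word (W#))

main :: IO ()
main = do
  print (C# 'x'#)     -- 'x'#   :: Char#
  print (I# 3#)       -- 3#     :: Int#
  print (W# 3##)      -- 3##    :: Word#
  print (F# 3.2#)     -- 3.2#   :: Float#
  print (D# 3.2##)    -- 3.2##  :: Double#
</programlisting>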
441 </sect2>
442
443 <!-- ====================== HIERARCHICAL MODULES ======================= -->
444
445
446 <sect2 id="hierarchical-modules">
447 <title>Hierarchical Modules</title>
448
449 <para>GHC supports a small extension to the syntax of module
450 names: a module name is allowed to contain a dot
451 <literal>&lsquo;.&rsquo;</literal>. This is also known as the
452 &ldquo;hierarchical module namespace&rdquo; extension, because
453 it extends the normally flat Haskell module namespace into a
454 more flexible hierarchy of modules.</para>
455
456 <para>This extension has very little impact on the language
itself; module names are <emphasis>always</emphasis> fully
458 qualified, so you can just think of the fully qualified module
459 name as <quote>the module name</quote>. In particular, this
460 means that the full module name must be given after the
461 <literal>module</literal> keyword at the beginning of the
462 module; for example, the module <literal>A.B.C</literal> must
463 begin</para>
464
465 <programlisting>module A.B.C</programlisting>
466
467
468 <para>It is a common strategy to use the <literal>as</literal>
469 keyword to save some typing when using qualified names with
470 hierarchical modules. For example:</para>
471
472 <programlisting>
473 import qualified Control.Monad.ST.Strict as ST
474 </programlisting>
475
476 <para>For details on how GHC searches for source and interface
477 files in the presence of hierarchical modules, see <xref
478 linkend="search-path"/>.</para>
479
480 <para>GHC comes with a large collection of libraries arranged
481 hierarchically; see the accompanying <ulink
482 url="../libraries/index.html">library
483 documentation</ulink>. More libraries to install are available
484 from <ulink
485 url="http://hackage.haskell.org/packages/hackage.html">HackageDB</ulink>.</para>
486 </sect2>
487
488 <!-- ====================== PATTERN GUARDS ======================= -->
489
490 <sect2 id="pattern-guards">
491 <title>Pattern guards</title>
492
493 <para>
494 <indexterm><primary>Pattern guards (Glasgow extension)</primary></indexterm>
495 The discussion that follows is an abbreviated version of Simon Peyton Jones's original <ulink url="http://research.microsoft.com/~simonpj/Haskell/guards.html">proposal</ulink>. (Note that the proposal was written before pattern guards were implemented, so refers to them as unimplemented.)
496 </para>
497
498 <para>
499 Suppose we have an abstract data type of finite maps, with a
500 lookup operation:
501
502 <programlisting>
503 lookup :: FiniteMap -> Int -> Maybe Int
504 </programlisting>
505
506 The lookup returns <function>Nothing</function> if the supplied key is not in the domain of the mapping, and <function>(Just v)</function> otherwise,
507 where <varname>v</varname> is the value that the key maps to. Now consider the following definition:
508 </para>
509
510 <programlisting>
clunky env var1 var2 | ok1 &amp;&amp; ok2 = val1 + val2
                     | otherwise  = var1 + var2
  where
    m1 = lookup env var1
    m2 = lookup env var2
    ok1 = maybeToBool m1
    ok2 = maybeToBool m2
    val1 = expectJust m1
    val2 = expectJust m2
520 </programlisting>
521
522 <para>
523 The auxiliary functions are
524 </para>
525
526 <programlisting>
527 maybeToBool :: Maybe a -&gt; Bool
528 maybeToBool (Just x) = True
529 maybeToBool Nothing = False
530
531 expectJust :: Maybe a -&gt; a
532 expectJust (Just x) = x
533 expectJust Nothing = error "Unexpected Nothing"
534 </programlisting>
535
536 <para>
537 What is <function>clunky</function> doing? The guard <literal>ok1 &amp;&amp;
538 ok2</literal> checks that both lookups succeed, using
539 <function>maybeToBool</function> to convert the <function>Maybe</function>
540 types to booleans. The (lazily evaluated) <function>expectJust</function>
calls extract the values from the results of the lookups, and bind the
542 returned values to <varname>val1</varname> and <varname>val2</varname>
543 respectively. If either lookup fails, then clunky takes the
544 <literal>otherwise</literal> case and returns the sum of its arguments.
545 </para>
546
547 <para>
548 This is certainly legal Haskell, but it is a tremendously verbose and
549 un-obvious way to achieve the desired effect. Arguably, a more direct way
550 to write clunky would be to use case expressions:
551 </para>
552
553 <programlisting>
clunky env var1 var2 = case lookup env var1 of
    Nothing -&gt; fail
    Just val1 -&gt; case lookup env var2 of
      Nothing -&gt; fail
      Just val2 -&gt; val1 + val2
  where
    fail = var1 + var2
561 </programlisting>
562
563 <para>
564 This is a bit shorter, but hardly better. Of course, we can rewrite any set
565 of pattern-matching, guarded equations as case expressions; that is
566 precisely what the compiler does when compiling equations! The reason that
Haskell provides guarded equations is that they allow us to write down
568 the cases we want to consider, one at a time, independently of each other.
569 This structure is hidden in the case version. Two of the right-hand sides
570 are really the same (<function>fail</function>), and the whole expression
571 tends to become more and more indented.
572 </para>
573
574 <para>
575 Here is how I would write clunky:
576 </para>
577
578 <programlisting>
579 clunky env var1 var2
  | Just val1 &lt;- lookup env var1
  , Just val2 &lt;- lookup env var2
  = val1 + val2
583 ...other equations for clunky...
584 </programlisting>
585
586 <para>
587 The semantics should be clear enough. The qualifiers are matched in order.
588 For a <literal>&lt;-</literal> qualifier, which I call a pattern guard, the
589 right hand side is evaluated and matched against the pattern on the left.
590 If the match fails then the whole guard fails and the next equation is
591 tried. If it succeeds, then the appropriate binding takes place, and the
592 next qualifier is matched, in the augmented environment. Unlike list
593 comprehensions, however, the type of the expression to the right of the
594 <literal>&lt;-</literal> is the same as the type of the pattern to its
595 left. The bindings introduced by pattern guards scope over all the
596 remaining guard qualifiers, and over the right hand side of the equation.
597 </para>
598
599 <para>
Just as with list comprehensions, boolean expressions can be freely mixed
among the pattern guards. For example:
602 </para>
603
604 <programlisting>
f x | [y] &lt;- x
    , y > 3
    , Just z &lt;- h y
    = ...
609 </programlisting>
610
611 <para>
612 Haskell's current guards therefore emerge as a special case, in which the
613 qualifier list has just one element, a boolean expression.
614 </para>
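
<para>
To make the <function>clunky</function> example concrete, here is a
self-contained sketch. It assumes an association list in place of the
abstract <literal>FiniteMap</literal> and uses the Prelude's
<function>lookup</function>, which takes the key first (the reverse of the
argument order used in the text). The fallback equation stands in for the
elided "other equations".
</para>

<programlisting>
module Main where

-- An association list stands in for the abstract FiniteMap of the text.
type Env = [(Int, Int)]

clunky :: Env -> Int -> Int -> Int
clunky env var1 var2
  | Just val1 &lt;- lookup var1 env
  , Just val2 &lt;- lookup var2 env
  = val1 + val2
-- Reached when either pattern guard fails.
clunky _ var1 var2 = var1 + var2

main :: IO ()
main = do
  let env = [(1, 10), (2, 20)]
  print (clunky env 1 2)   -- both lookups succeed: prints 30
  print (clunky env 1 3)   -- second lookup fails: prints 4
</programlisting>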
615 </sect2>
616
617 <!-- ===================== View patterns =================== -->
618
619 <sect2 id="view-patterns">
620 <title>View patterns
621 </title>
622
623 <para>
624 View patterns are enabled by the flag <literal>-XViewPatterns</literal>.
625 More information and examples of view patterns can be found on the
626 <ulink url="http://hackage.haskell.org/trac/ghc/wiki/ViewPatterns">Wiki
627 page</ulink>.
628 </para>
629
630 <para>
631 View patterns are somewhat like pattern guards that can be nested inside
632 of other patterns. They are a convenient way of pattern-matching
633 against values of abstract types. For example, in a programming language
634 implementation, we might represent the syntax of the types of the
635 language as follows:
636
637 <programlisting>
638 type Typ
639
data TypView = Unit
             | Arrow Typ Typ
642
643 view :: Typ -> TypView
644
645 -- additional operations for constructing Typ's ...
646 </programlisting>
647
648 The representation of Typ is held abstract, permitting implementations
649 to use a fancy representation (e.g., hash-consing to manage sharing).
650
Without view patterns, using this signature is a little inconvenient:
652 <programlisting>
size :: Typ -> Integer
size t = case view t of
  Unit -> 1
  Arrow t1 t2 -> size t1 + size t2
657 </programlisting>
658
659 It is necessary to iterate the case, rather than using an equational
660 function definition. And the situation is even worse when the matching
661 against <literal>t</literal> is buried deep inside another pattern.
662 </para>
663
664 <para>
665 View patterns permit calling the view function inside the pattern and
666 matching against the result:
667 <programlisting>
668 size (view -> Unit) = 1
669 size (view -> Arrow t1 t2) = size t1 + size t2
670 </programlisting>
671
672 That is, we add a new form of pattern, written
673 <replaceable>expression</replaceable> <literal>-></literal>
674 <replaceable>pattern</replaceable> that means "apply the expression to
675 whatever we're trying to match against, and then match the result of
676 that application against the pattern". The expression can be any Haskell
677 expression of function type, and view patterns can be used wherever
678 patterns are used.
679 </para>
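
<para>
The <literal>size</literal> example can be made runnable with a concrete
stand-in for the abstract type. In the sketch below,
<literal>Typ</literal> is assumed to be a plain newtype wrapper around
<literal>TypView</literal>; a real implementation would hide a more
elaborate representation behind <literal>view</literal>.
</para>

<programlisting>
{-# LANGUAGE ViewPatterns #-}
module Main where

-- Stand-in representation; the text keeps Typ abstract.
newtype Typ = Typ TypView

data TypView = Unit
             | Arrow Typ Typ

view :: Typ -> TypView
view (Typ v) = v

-- Each equation applies view to the argument and matches on the result.
size :: Typ -> Integer
size (view -> Unit)        = 1
size (view -> Arrow t1 t2) = size t1 + size t2

main :: IO ()
main = print (size (Typ (Arrow (Typ Unit) (Typ (Arrow (Typ Unit) (Typ Unit))))))
-- prints 3
</programlisting>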
680
681 <para>
682 The semantics of a pattern <literal>(</literal>
683 <replaceable>exp</replaceable> <literal>-></literal>
684 <replaceable>pat</replaceable> <literal>)</literal> are as follows:
685
686 <itemizedlist>
687
688 <listitem> Scoping:
689
690 <para>The variables bound by the view pattern are the variables bound by
691 <replaceable>pat</replaceable>.
692 </para>
693
694 <para>
695 Any variables in <replaceable>exp</replaceable> are bound occurrences,
696 but variables bound "to the left" in a pattern are in scope. This
697 feature permits, for example, one argument to a function to be used in
698 the view of another argument. For example, the function
699 <literal>clunky</literal> from <xref linkend="pattern-guards" /> can be
700 written using view patterns as follows:
701
702 <programlisting>
703 clunky env (lookup env -> Just val1) (lookup env -> Just val2) = val1 + val2
704 ...other equations for clunky...
705 </programlisting>
706 </para>
707
708 <para>
709 More precisely, the scoping rules are:
710 <itemizedlist>
711 <listitem>
712 <para>
713 In a single pattern, variables bound by patterns to the left of a view
714 pattern expression are in scope. For example:
715 <programlisting>
716 example :: Maybe ((String -> Integer,Integer), String) -> Bool
example (Just ((f,_), f -> 4)) = True
718 </programlisting>
719
720 Additionally, in function definitions, variables bound by matching earlier curried
721 arguments may be used in view pattern expressions in later arguments:
722 <programlisting>
723 example :: (String -> Integer) -> String -> Bool
724 example f (f -> 4) = True
725 </programlisting>
726 That is, the scoping is the same as it would be if the curried arguments
727 were collected into a tuple.
728 </para>
729 </listitem>
730
731 <listitem>
732 <para>
733 In mutually recursive bindings, such as <literal>let</literal>,
734 <literal>where</literal>, or the top level, view patterns in one
735 declaration may not mention variables bound by other declarations. That
736 is, each declaration must be self-contained. For example, the following
737 program is not allowed:
738 <programlisting>
739 let {(x -> y) = e1 ;
740 (y -> x) = e2 } in x
741 </programlisting>
742
743 (For some amplification on this design choice see
744 <ulink url="http://hackage.haskell.org/trac/ghc/ticket/4061">Trac #4061</ulink>.)
745
746 </para>
747 </listitem>
748 </itemizedlist>
749
750 </para>
751 </listitem>
752
753 <listitem><para> Typing: If <replaceable>exp</replaceable> has type
754 <replaceable>T1</replaceable> <literal>-></literal>
755 <replaceable>T2</replaceable> and <replaceable>pat</replaceable> matches
756 a <replaceable>T2</replaceable>, then the whole view pattern matches a
757 <replaceable>T1</replaceable>.
758 </para></listitem>
759
760 <listitem><para> Matching: To the equations in Section 3.17.3 of the
761 <ulink url="http://www.haskell.org/onlinereport/">Haskell 98
762 Report</ulink>, add the following:
763 <programlisting>
764 case v of { (e -> p) -> e1 ; _ -> e2 }
765 =
766 case (e v) of { p -> e1 ; _ -> e2 }
767 </programlisting>
768 That is, to match a variable <replaceable>v</replaceable> against a pattern
769 <literal>(</literal> <replaceable>exp</replaceable>
770 <literal>-></literal> <replaceable>pat</replaceable>
771 <literal>)</literal>, evaluate <literal>(</literal>
772 <replaceable>exp</replaceable> <replaceable> v</replaceable>
773 <literal>)</literal> and match the result against
774 <replaceable>pat</replaceable>.
775 </para></listitem>
776
777 <listitem><para> Efficiency: When the same view function is applied in
778 multiple branches of a function definition or a case expression (e.g.,
779 in <literal>size</literal> above), GHC makes an attempt to collect these
780 applications into a single nested case expression, so that the view
781 function is only applied once. Pattern compilation in GHC follows the
782 matrix algorithm described in Chapter 4 of <ulink
783 url="http://research.microsoft.com/~simonpj/Papers/slpj-book-1987/">The
784 Implementation of Functional Programming Languages</ulink>. When the
785 top rows of the first column of a matrix are all view patterns with the
786 "same" expression, these patterns are transformed into a single nested
787 case. This includes, for example, adjacent view patterns that line up
788 in a tuple, as in
789 <programlisting>
790 f ((view -> A, p1), p2) = e1
791 f ((view -> B, p3), p4) = e2
792 </programlisting>
793 </para>
794
795 <para> The current notion of when two view pattern expressions are "the
796 same" is very restricted: it is not even full syntactic equality.
797 However, it does include variables, literals, applications, and tuples;
798 e.g., two instances of <literal>view ("hi", "there")</literal> will be
799 collected. However, the current implementation does not compare up to
800 alpha-equivalence, so two instances of <literal>(x, view x ->
801 y)</literal> will not be coalesced.
802 </para>
803
804 </listitem>
805
806 </itemizedlist>
807 </para>
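
<para>
The two left-to-right scoping rules can be exercised with the following
sketch. The function names <literal>inTuple</literal> and
<literal>curried</literal>, the catch-all equations, and the helper
<literal>len</literal> are additions made for this illustration.
</para>

<programlisting>
{-# LANGUAGE ViewPatterns #-}
module Main where

-- f is bound earlier in the same pattern, then used as a view function.
inTuple :: Maybe ((String -> Integer, Integer), String) -> Bool
inTuple (Just ((f, _), f -> 4)) = True
inTuple _                       = False

-- f is bound by an earlier curried argument.
curried :: (String -> Integer) -> String -> Bool
curried f (f -> 4) = True
curried _ _        = False

main :: IO ()
main = do
  let len = fromIntegral . length :: String -> Integer
  print (inTuple (Just ((len, 0), "four")))   -- length is 4: prints True
  print (curried len "five!")                 -- length is 5: prints False
</programlisting>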
808
809 </sect2>
810
811 <!-- ===================== n+k patterns =================== -->
812
813 <sect2 id="n-k-patterns">
814 <title>n+k patterns</title>
815 <indexterm><primary><option>-XNPlusKPatterns</option></primary></indexterm>
816
817 <para>
818 <literal>n+k</literal> pattern support is disabled by default. To enable
819 it, you can use the <option>-XNPlusKPatterns</option> flag.
820 </para>
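
<para>
As a brief sketch of what the flag permits, the definition below matches a
positive argument against the pattern <literal>n + 1</literal>, binding
<literal>n</literal> to one less than the argument. The function name is
illustrative.
</para>

<programlisting>
{-# LANGUAGE NPlusKPatterns #-}
module Main where

factorial :: Integer -> Integer
factorial 0       = 1
factorial (n + 1) = (n + 1) * factorial n   -- matches only arguments >= 1

main :: IO ()
main = print (factorial 5)   -- prints 120
</programlisting>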
821
822 </sect2>
823
824 <!-- ===================== Traditional record syntax =================== -->
825
826 <sect2 id="traditional-record-syntax">
827 <title>Traditional record syntax</title>
828 <indexterm><primary><option>-XNoTraditionalRecordSyntax</option></primary></indexterm>
829
830 <para>
831 Traditional record syntax, such as <literal>C {f = x}</literal>, is enabled by default.
832 To disable it, you can use the <option>-XNoTraditionalRecordSyntax</option> flag.
833 </para>
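<para>
For illustration, the sketch below shows the three places where traditional
record syntax appears: construction, update and pattern matching. The type
<literal>Point</literal> and its fields are made up for this example.
<programlisting>
data Point = Point { px :: Int, py :: Int }
  deriving (Eq, Show)

origin :: Point
origin = Point { px = 0, py = 0 }   -- record construction

shifted :: Point
shifted = origin { px = 3 }         -- record update

getX :: Point -> Int
getX Point { px = x } = x           -- record pattern
</programlisting>
</para>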
834
835 </sect2>
836
837 <!-- ===================== Recursive do-notation =================== -->
838
839 <sect2 id="recursive-do-notation">
840 <title>The recursive do-notation
841 </title>
842
843 <para>
844 The do-notation of Haskell 98 does not allow <emphasis>recursive bindings</emphasis>,
845 that is, the variables bound in a do-expression are visible only in the textually following
846 code block. Compare this to a let-expression, where bound variables are visible in the entire binding
847 group.
848 </para>
849
850 <para>
851 It turns out that such recursive bindings do indeed make sense for a variety of monads, but
852 not all. In particular, recursion in this sense requires a fixed-point operator for the underlying
853 monad, captured by the <literal>mfix</literal> method of the <literal>MonadFix</literal> class, defined in <literal>Control.Monad.Fix</literal> as follows:
854 <programlisting>
class Monad m => MonadFix m where
   mfix :: (a -> m a) -> m a
857 </programlisting>
858 Haskell's
859 <literal>Maybe</literal>, <literal>[]</literal> (list), <literal>ST</literal> (both strict and lazy versions),
860 <literal>IO</literal>, and many other monads have <literal>MonadFix</literal> instances. On the negative
861 side, the continuation monad, with the signature <literal>(a -> r) -> r</literal>, does not.
862 </para>
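<para>
To see what <literal>mfix</literal> does on its own, here is a small sketch that
calls it directly in the <literal>Maybe</literal> monad. The names
<literal>ones</literal> and <literal>firstThree</literal> are illustrative only.
<programlisting>
import Control.Monad.Fix (mfix)

-- The function passed to mfix receives, lazily, the very value it is
-- about to produce, so the list can refer to itself.
ones :: Maybe [Int]
ones = mfix (\xs -> Just (1 : xs))

-- Only a finite prefix is ever demanded.
firstThree :: Maybe [Int]
firstThree = fmap (take 3) ones
</programlisting>
Here <literal>firstThree</literal> is <literal>Just [1,1,1]</literal>. The function must
not inspect its argument before returning, otherwise the fixed point diverges.
</para>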
863
864 <para>
865 For monads that do belong to the <literal>MonadFix</literal> class, GHC provides
866 an extended version of the do-notation that allows recursive bindings.
867 The <option>-XRecursiveDo</option> (language pragma: <literal>RecursiveDo</literal>)
868 provides the necessary syntactic support, introducing the keywords <literal>mdo</literal> and
869 <literal>rec</literal> for higher and lower levels of the notation respectively. Unlike
870 bindings in a <literal>do</literal> expression, those introduced by <literal>mdo</literal> and <literal>rec</literal>
871 are recursively defined, much like in an ordinary let-expression. Due to the new
872 keyword <literal>mdo</literal>, we also call this notation the <emphasis>mdo-notation</emphasis>.
873 </para>
874
875 <para>
876 Here is a simple (albeit contrived) example:
877 <programlisting>
{-# LANGUAGE RecursiveDo #-}
justOnes = mdo { xs &lt;- Just (1:xs)
               ; return (map negate xs) }
881 </programlisting>
882 or equivalently
883 <programlisting>
{-# LANGUAGE RecursiveDo #-}
justOnes = do { rec { xs &lt;- Just (1:xs) }
              ; return (map negate xs) }
887 </programlisting>
888 As you can guess <literal>justOnes</literal> will evaluate to <literal>Just [-1,-1,-1,...</literal>.
889 </para>
890
891 <para>
GHC's implementation of the mdo-notation closely follows the original translation as described in the paper
893 <ulink url="https://sites.google.com/site/leventerkok/recdo.pdf">A recursive do for Haskell</ulink>, which
894 in turn is based on the work <ulink url="http://sites.google.com/site/leventerkok/erkok-thesis.pdf">Value Recursion
895 in Monadic Computations</ulink>. Furthermore, GHC extends the syntax described in the former paper
896 with a lower level syntax flagged by the <literal>rec</literal> keyword, as we describe next.
897 </para>
898
899 <sect3>
900 <title>Recursive binding groups</title>
901
902 <para>
903 The flag <option>-XRecursiveDo</option> also introduces a new keyword <literal>rec</literal>, which wraps a
904 mutually-recursive group of monadic statements inside a <literal>do</literal> expression, producing a single statement.
905 Similar to a <literal>let</literal> statement inside a <literal>do</literal>, variables bound in
906 the <literal>rec</literal> are visible throughout the <literal>rec</literal> group, and below it. For example, compare
907 <programlisting>
do { a &lt;- getChar              do { a &lt;- getChar
   ; let { r1 = f a r2            ; rec { r1 &lt;- f a r2
   ;     ; r2 = g r1 }            ;     ; r2 &lt;- g r1 }
   ; return (r1 ++ r2) }          ; return (r1 ++ r2) }
912 </programlisting>
913 In both cases, <literal>r1</literal> and <literal>r2</literal> are available both throughout
914 the <literal>let</literal> or <literal>rec</literal> block, and in the statements that follow it.
915 The difference is that <literal>let</literal> is non-monadic, while <literal>rec</literal> is monadic.
916 (In Haskell <literal>let</literal> is really <literal>letrec</literal>, of course.)
917 </para>
918
919 <para>
920 The semantics of <literal>rec</literal> is fairly straightforward. Whenever GHC finds a <literal>rec</literal>
921 group, it will compute its set of bound variables, and will introduce an appropriate call
922 to the underlying monadic value-recursion operator <literal>mfix</literal>, belonging to the
923 <literal>MonadFix</literal> class. Here is an example:
924 <programlisting>
rec { b &lt;- f a c     ===>    (b,c) &lt;- mfix (\~(b,c) -> do { b &lt;- f a c
    ; c &lt;- f b a }                                        ; c &lt;- f b a
                                                          ; return (b,c) })
928 </programlisting>
929 As usual, the meta-variables <literal>b</literal>, <literal>c</literal> etc., can be arbitrary patterns.
930 In general, the statement <literal>rec <replaceable>ss</replaceable></literal> is desugared to the statement
931 <programlisting>
932 <replaceable>vs</replaceable> &lt;- mfix (\~<replaceable>vs</replaceable> -&gt; do { <replaceable>ss</replaceable>; return <replaceable>vs</replaceable> })
933 </programlisting>
934 where <replaceable>vs</replaceable> is a tuple of the variables bound by <replaceable>ss</replaceable>.
935 </para>
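<para>
The translation can be checked on a concrete case. The following sketch, in the
<literal>Maybe</literal> monad, writes the same computation once with a
<literal>rec</literal> block and once with the <literal>mfix</literal> call spelled
out by hand. All names are illustrative, and the hand-written version uses fresh
variable names inside the lambda instead of shadowing, purely for readability.
<programlisting>
{-# LANGUAGE RecursiveDo #-}
import Control.Monad.Fix (mfix)

-- Two mutually recursive lazy lists, tied with a rec block.
withRec :: Maybe ([Int], [Int])
withRec = do
  rec { xs &lt;- Just (1 : ys)
      ; ys &lt;- Just (2 : xs) }
  return (take 3 xs, take 3 ys)

-- The same computation with the call to mfix written out.
-- (The space after the backslash keeps "\~" from lexing as one operator.)
byHand :: Maybe ([Int], [Int])
byHand = do
  (xs, ys) &lt;- mfix (\ ~(_xs0, ys0) -> do { xs1 &lt;- Just (1 : ys0)
                                          ; ys1 &lt;- Just (2 : xs1)
                                          ; return (xs1, ys1) })
  return (take 3 xs, take 3 ys)
</programlisting>
Both definitions evaluate to <literal>Just ([1,2,1],[2,1,2])</literal>. The lazy
pattern is essential: a strict tuple pattern would force the result before it exists.
</para>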
936
937 <para>
938 Note in particular that the translation for a <literal>rec</literal> block only involves wrapping a call
939 to <literal>mfix</literal>: it performs no other analysis on the bindings. The latter is the task
940 for the <literal>mdo</literal> notation, which is described next.
941 </para>
942 </sect3>
943
944 <sect3>
945 <title>The <literal>mdo</literal> notation</title>
946
947 <para>
948 A <literal>rec</literal>-block tells the compiler where precisely the recursive knot should be tied. It turns out that
the placement of the recursive knots can be rather delicate: in particular, we would like each knot to be wrapped
around as small a group of statements as possible. This process is known as <emphasis>segmentation</emphasis>, and is described
in detail in Section 3.2 of <ulink url="https://sites.google.com/site/leventerkok/recdo.pdf">A recursive do for
952 Haskell</ulink>. Segmentation improves polymorphism and reduces the size of the recursive knot. Most importantly, it avoids
953 unnecessary interference caused by a fundamental issue with the so-called <emphasis>right-shrinking</emphasis>
954 axiom for monadic recursion. In brief, most monads of interest (IO, strict state, etc.) do <emphasis>not</emphasis>
955 have recursion operators that satisfy this axiom, and thus not performing segmentation can cause unnecessary
956 interference, changing the termination behavior of the resulting translation.
957 (Details can be found in Sections 3.1 and 7.2.2 of
958 <ulink url="http://sites.google.com/site/leventerkok/erkok-thesis.pdf">Value Recursion in Monadic Computations</ulink>.)
959 </para>
960
961 <para>
962 The <literal>mdo</literal> notation removes the burden of placing
963 explicit <literal>rec</literal> blocks in the code. Unlike an
964 ordinary <literal>do</literal> expression, in which variables bound by
965 statements are only in scope for later statements, variables bound in
966 an <literal>mdo</literal> expression are in scope for all statements
967 of the expression. The compiler then automatically identifies minimal
968 mutually recursively dependent segments of statements, treating them as
969 if the user had wrapped a <literal>rec</literal> qualifier around them.
970 </para>
971
972 <para>
973 The definition is syntactic:
974 </para>
975 <itemizedlist>
976 <listitem>
977 <para>
978 A generator <replaceable>g</replaceable>
979 <emphasis>depends</emphasis> on a textually following generator
980 <replaceable>g'</replaceable>, if
981 </para>
982 <itemizedlist>
983 <listitem>
984 <para>
985 <replaceable>g'</replaceable> defines a variable that
986 is used by <replaceable>g</replaceable>, or
987 </para>
988 </listitem>
989 <listitem>
990 <para>
991 <replaceable>g'</replaceable> textually appears between
992 <replaceable>g</replaceable> and
993 <replaceable>g''</replaceable>, where <replaceable>g</replaceable>
994 depends on <replaceable>g''</replaceable>.
995 </para>
996 </listitem>
997 </itemizedlist>
998 </listitem>
999 <listitem>
1000 <para>
1001 A <emphasis>segment</emphasis> of a given
1002 <literal>mdo</literal>-expression is a minimal sequence of generators
1003 such that no generator of the sequence depends on an outside
1004 generator. As a special case, although it is not a generator,
1005 the final expression in an <literal>mdo</literal>-expression is
1006 considered to form a segment by itself.
1007 </para>
1008 </listitem>
1009 </itemizedlist>
1010 <para>
1011 Segments in this sense are
1012 related to <emphasis>strongly-connected components</emphasis> analysis,
1013 with the exception that bindings in a segment cannot be reordered and
1014 must be contiguous.
1015 </para>
1016
1017 <para>
1018 Here is an example <literal>mdo</literal>-expression, and its translation to <literal>rec</literal> blocks:
1019 <programlisting>
mdo { a &lt;- getChar      ===> do { a &lt;- getChar
    ; b &lt;- f a c                ; rec { b &lt;- f a c
    ; c &lt;- f b a                ;     ; c &lt;- f b a }
    ; z &lt;- h a b                ; z &lt;- h a b
    ; d &lt;- g d e                ; rec { d &lt;- g d e
    ; e &lt;- g a z                ;     ; e &lt;- g a z }
    ; putChar c }                ; putChar c }
1027 </programlisting>
1028 Note that a given <literal>mdo</literal> expression can cause the creation of multiple <literal>rec</literal> blocks.
1029 If there are no recursive dependencies, <literal>mdo</literal> will introduce no <literal>rec</literal> blocks. In this
1030 latter case an <literal>mdo</literal> expression is precisely the same as a <literal>do</literal> expression, as one
1031 would expect.
1032 </para>
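<para>
A small runnable instance of the same idea, again in the <literal>Maybe</literal>
monad, is sketched below. The name <literal>example</literal> is illustrative.
Only the two middle statements refer to each other, so under the rules above
they should form the only recursive segment.
<programlisting>
{-# LANGUAGE RecursiveDo #-}

example :: Maybe Int
example = mdo
  a  &lt;- Just 1
  xs &lt;- Just (a : ys)               -- xs and ys depend on each other
  ys &lt;- Just (2 : xs)
  n  &lt;- Just (length (take 4 xs))   -- uses xs, but nothing later
  return (a + n)
</programlisting>
Here <literal>example</literal> evaluates to <literal>Just 5</literal>.
</para>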
1033
1034 <para>
1035 In summary, given an <literal>mdo</literal> expression, GHC first performs segmentation, introducing
1036 <literal>rec</literal> blocks to wrap over minimal recursive groups. Then, each resulting
1037 <literal>rec</literal> is desugared, using a call to <literal>Control.Monad.Fix.mfix</literal> as described
1038 in the previous section. The original <literal>mdo</literal>-expression typechecks exactly when the desugared
1039 version would do so.
1040 </para>
1041
1042 <para>
1043 Here are some other important points in using the recursive-do notation:
1044
1045 <itemizedlist>
1046 <listitem>
1047 <para>
1048 It is enabled with the flag <literal>-XRecursiveDo</literal>, or the <literal>LANGUAGE RecursiveDo</literal>
1049 pragma. (The same flag enables both <literal>mdo</literal>-notation, and the use of <literal>rec</literal>
1050 blocks inside <literal>do</literal> expressions.)
1051 </para>
1052 </listitem>
1053 <listitem>
1054 <para>
1055 <literal>rec</literal> blocks can also be used inside <literal>mdo</literal>-expressions, which will be
treated as a single statement. However, it is good style to use either <literal>mdo</literal> or
<literal>rec</literal> blocks in a single expression, not both.
1058 </para>
1059 </listitem>
1060 <listitem>
1061 <para>
1062 If recursive bindings are required for a monad, then that monad must be declared an instance of
1063 the <literal>MonadFix</literal> class.
1064 </para>
1065 </listitem>
1066 <listitem>
1067 <para>
1068 The following instances of <literal>MonadFix</literal> are automatically provided: List, Maybe, IO.
1069 Furthermore, the <literal>Control.Monad.ST</literal> and <literal>Control.Monad.ST.Lazy</literal>
1070 modules provide the instances of the <literal>MonadFix</literal> class for Haskell's internal
1071 state monad (strict and lazy, respectively).
1072 </para>
1073 </listitem>
1074 <listitem>
1075 <para>
As with <literal>let</literal> and <literal>where</literal> bindings, name shadowing is not allowed within
1077 an <literal>mdo</literal>-expression or a <literal>rec</literal>-block; that is, all the names bound in
1078 a single <literal>rec</literal> must be distinct. (GHC will complain if this is not the case.)
1079 </para>
1080 </listitem>
1081 </itemizedlist>
1082 </para>
1083 </sect3>
1084
1085
1086 </sect2>
1087
1088
1089 <!-- ===================== PARALLEL LIST COMPREHENSIONS =================== -->
1090
1091 <sect2 id="parallel-list-comprehensions">
1092 <title>Parallel List Comprehensions</title>
1093 <indexterm><primary>list comprehensions</primary><secondary>parallel</secondary>
1094 </indexterm>
1095 <indexterm><primary>parallel list comprehensions</primary>
1096 </indexterm>
1097
1098 <para>Parallel list comprehensions are a natural extension to list
1099 comprehensions. List comprehensions can be thought of as a nice
1100 syntax for writing maps and filters. Parallel comprehensions
1101 extend this to include the zipWith family.</para>
1102
1103 <para>A parallel list comprehension has multiple independent
1104 branches of qualifier lists, each separated by a `|' symbol. For
1105 example, the following zips together two lists:</para>
1106
1107 <programlisting>
1108 [ (x, y) | x &lt;- xs | y &lt;- ys ]
1109 </programlisting>
1110
1111 <para>The behaviour of parallel list comprehensions follows that of
1112 zip, in that the resulting list will have the same length as the
1113 shortest branch.</para>
1114
1115 <para>We can define parallel list comprehensions by translation to
1116 regular comprehensions. Here's the basic idea:</para>
1117
1118 <para>Given a parallel comprehension of the form: </para>
1119
1120 <programlisting>
[ e | p1 &lt;- e11, p2 &lt;- e12, ...
    | q1 &lt;- e21, q2 &lt;- e22, ...
    ...
]
1125 </programlisting>
1126
1127 <para>This will be translated to: </para>
1128
1129 <programlisting>
[ e | ((p1,p2), (q1,q2), ...) &lt;- zipN [(p1,p2) | p1 &lt;- e11, p2 &lt;- e12, ...]
                                       [(q1,q2) | q1 &lt;- e21, q2 &lt;- e22, ...]
                                       ...
]
1134 </programlisting>
1135
1136 <para>where `zipN' is the appropriate zip for the given number of
1137 branches.</para>
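<para>
As a concrete sketch of this translation (the names <literal>viaExtension</literal>
and <literal>viaZip</literal> are illustrative), the two definitions below should
produce the same list. The first branch has two generators and yields four
pairs; the second branch has three elements, so the result has length three.
<programlisting>
{-# LANGUAGE ParallelListComp #-}

-- Two branches separated by a second '|'.
viaExtension :: [Int]
viaExtension = [ x + y + b | x &lt;- [1, 2], y &lt;- [10, 20]
                           | b &lt;- [100, 200, 300] ]

-- The same list, written with an ordinary comprehension over zip.
viaZip :: [Int]
viaZip = [ x + y + b
         | ((x, y), b) &lt;- zip [ (x, y) | x &lt;- [1, 2], y &lt;- [10, 20] ]
                               [ b | b &lt;- [100, 200, 300] ] ]
</programlisting>
Both evaluate to <literal>[111,221,312]</literal>.
</para>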
1138
1139 </sect2>
1140
1141 <!-- ===================== TRANSFORM LIST COMPREHENSIONS =================== -->
1142
1143 <sect2 id="generalised-list-comprehensions">
1144 <title>Generalised (SQL-Like) List Comprehensions</title>
1145 <indexterm><primary>list comprehensions</primary><secondary>generalised</secondary>
1146 </indexterm>
1147 <indexterm><primary>extended list comprehensions</primary>
1148 </indexterm>
1149 <indexterm><primary>group</primary></indexterm>
1150 <indexterm><primary>sql</primary></indexterm>
1151
1152
1153 <para>Generalised list comprehensions are a further enhancement to the
1154 list comprehension syntactic sugar to allow operations such as sorting
1155 and grouping which are familiar from SQL. They are fully described in the
1156 paper <ulink url="http://research.microsoft.com/~simonpj/papers/list-comp">
1157 Comprehensive comprehensions: comprehensions with "order by" and "group by"</ulink>,
1158 except that the syntax we use differs slightly from the paper.</para>
1159 <para>The extension is enabled with the flag <option>-XTransformListComp</option>.</para>
1160 <para>Here is an example:
1161 <programlisting>
{-# LANGUAGE TransformListComp #-}
import GHC.Exts (groupWith, sortWith, the)

employees = [ ("Simon", "MS", 80)
            , ("Erik", "MS", 100)
            , ("Phil", "Ed", 40)
            , ("Gordon", "Ed", 45)
            , ("Paul", "Yale", 60) ]

output = [ (the dept, sum salary)
         | (name, dept, salary) &lt;- employees
         , then group by dept using groupWith
         , then sortWith by (sum salary)
         , then take 5 ]
1173 </programlisting>
1174 In this example, the list <literal>output</literal> would take on
1175 the value:
1176
1177 <programlisting>
1178 [("Yale", 60), ("Ed", 85), ("MS", 180)]
1179 </programlisting>
1180 </para>
1181 <para>There are three new keywords: <literal>group</literal>, <literal>by</literal>, and <literal>using</literal>.
1182 (The functions <literal>sortWith</literal> and <literal>groupWith</literal> are not keywords; they are ordinary
1183 functions that are exported by <literal>GHC.Exts</literal>.)</para>
1184
<para>There are four new forms of comprehension qualifier,
1186 all introduced by the (existing) keyword <literal>then</literal>:
1187 <itemizedlist>
1188 <listitem>
1189
1190 <programlisting>
1191 then f
1192 </programlisting>
1193
1194 This statement requires that <literal>f</literal> have the type <literal>
1195 forall a. [a] -> [a]</literal>. You can see an example of its use in the
1196 motivating example, as this form is used to apply <literal>take 5</literal>.
1197
1198 </listitem>
1199
1200
1201 <listitem>
1202 <para>
1203 <programlisting>
1204 then f by e
1205 </programlisting>
1206
1207 This form is similar to the previous one, but allows you to create a function
1208 which will be passed as the first argument to f. As a consequence f must have
1209 the type <literal>forall a. (a -> t) -> [a] -> [a]</literal>. As you can see
1210 from the type, this function lets f &quot;project out&quot; some information
1211 from the elements of the list it is transforming.</para>
1212
1213 <para>An example is shown in the opening example, where <literal>sortWith</literal>
1214 is supplied with a function that lets it find out the <literal>sum salary</literal>
1215 for any item in the list comprehension it transforms.</para>
1216
1217 </listitem>
1218
1219
1220 <listitem>
1221
1222 <programlisting>
1223 then group by e using f
1224 </programlisting>
1225
1226 <para>This is the most general of the grouping-type statements. In this form,
1227 f is required to have type <literal>forall a. (a -> t) -> [a] -> [[a]]</literal>.
1228 As with the <literal>then f by e</literal> case above, the first argument
1229 is a function supplied to f by the compiler which lets it compute e on every
1230 element of the list being transformed. However, unlike the non-grouping case,
1231 f additionally partitions the list into a number of sublists: this means that
1232 at every point after this statement, binders occurring before it in the comprehension
1233 refer to <emphasis>lists</emphasis> of possible values, not single values. To help understand
1234 this, let's look at an example:</para>
1235
1236 <programlisting>
{-# LANGUAGE TransformListComp #-}
import Data.List (groupBy)
import GHC.Exts (the)

-- This works similarly to groupWith in GHC.Exts, but doesn't sort its input first
groupRuns :: Eq b => (a -> b) -> [a] -> [[a]]
groupRuns f = groupBy (\x y -> f x == f y)

output = [ (the x, y)
         | x &lt;- ([1..3] ++ [1..2])
         , y &lt;- [4..6]
         , then group by x using groupRuns ]
1245 </programlisting>
1246
1247 <para>This results in the variable <literal>output</literal> taking on the value below:</para>
1248
1249 <programlisting>
1250 [(1, [4, 5, 6]), (2, [4, 5, 6]), (3, [4, 5, 6]), (1, [4, 5, 6]), (2, [4, 5, 6])]
1251 </programlisting>
1252
1253 <para>Note that we have used the <literal>the</literal> function to change the type
1254 of x from a list to its original numeric type. The variable y, in contrast, is left
1255 unchanged from the list form introduced by the grouping.</para>
1256
1257 </listitem>
1258
1259 <listitem>
1260
1261 <programlisting>
1262 then group using f
1263 </programlisting>
1264
1265 <para>With this form of the group statement, f is required to simply have the type
1266 <literal>forall a. [a] -> [[a]]</literal>, which will be used to group up the
1267 comprehension so far directly. An example of this form is as follows:</para>
1268
1269 <programlisting>
{-# LANGUAGE TransformListComp #-}
import Data.List (inits)

output = [ x
         | y &lt;- [1..5]
         , x &lt;- "hello"
         , then group using inits]
1274 </programlisting>
1275
1276 <para>This will yield a list containing every prefix of the word "hello" written out 5 times:</para>
1277
1278 <programlisting>
1279 ["","h","he","hel","hell","hello","helloh","hellohe","hellohel","hellohell","hellohello","hellohelloh",...]
1280 </programlisting>
1281
1282 </listitem>
1283 </itemizedlist>
1284 </para>
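<para>
The two non-grouping forms can be tried in isolation. The sketch below assumes
only <literal>sortWith</literal> from <literal>GHC.Exts</literal>; the names
<literal>lastTwo</literal> and <literal>byLength</literal> are illustrative.
<programlisting>
{-# LANGUAGE TransformListComp #-}
import GHC.Exts (sortWith)

-- "then f": reverse and (take 2) both have type forall a. [a] -> [a].
lastTwo :: [Int]
lastTwo = [ x | x &lt;- [1 .. 5], then reverse, then take 2 ]

-- "then f by e": sortWith is handed a function that computes
-- (length w) for each element of the comprehension so far.
byLength :: [String]
byLength = [ w | w &lt;- ["ccc", "a", "bb"], then sortWith by (length w) ]
</programlisting>
Here <literal>lastTwo</literal> is <literal>[5,4]</literal> and
<literal>byLength</literal> is <literal>["a","bb","ccc"]</literal>.
</para>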
1285 </sect2>
1286
1287 <!-- ===================== MONAD COMPREHENSIONS ===================== -->
1288
1289 <sect2 id="monad-comprehensions">
1290 <title>Monad comprehensions</title>
1291 <indexterm><primary>monad comprehensions</primary></indexterm>
1292
1293 <para>
1294 Monad comprehensions generalise the list comprehension notation,
1295 including parallel comprehensions
1296 (<xref linkend="parallel-list-comprehensions"/>) and
1297 transform comprehensions (<xref linkend="generalised-list-comprehensions"/>)
1298 to work for any monad.
1299 </para>
1300
1301 <para>Monad comprehensions support:</para>
1302
1303 <itemizedlist>
1304 <listitem>
1305 <para>
1306 Bindings:
1307 </para>
1308
1309 <programlisting>
1310 [ x + y | x &lt;- Just 1, y &lt;- Just 2 ]
1311 </programlisting>
1312
1313 <para>
1314 Bindings are translated with the <literal>(&gt;&gt;=)</literal> and
1315 <literal>return</literal> functions to the usual do-notation:
1316 </para>
1317
1318 <programlisting>
do x &lt;- Just 1
   y &lt;- Just 2
   return (x+y)
1322 </programlisting>
1323
1324 </listitem>
1325 <listitem>
1326 <para>
1327 Guards:
1328 </para>
1329
1330 <programlisting>
1331 [ x | x &lt;- [1..10], x &lt;= 5 ]
1332 </programlisting>
1333
1334 <para>
1335 Guards are translated with the <literal>guard</literal> function,
1336 which requires a <literal>MonadPlus</literal> instance:
1337 </para>
1338
1339 <programlisting>
do x &lt;- [1..10]
   guard (x &lt;= 5)
   return x
1343 </programlisting>
1344
1345 </listitem>
1346 <listitem>
1347 <para>
1348 Transform statements (as with <literal>-XTransformListComp</literal>):
1349 </para>
1350
1351 <programlisting>
1352 [ x+y | x &lt;- [1..10], y &lt;- [1..x], then take 2 ]
1353 </programlisting>
1354
1355 <para>
1356 This translates to:
1357 </para>
1358
1359 <programlisting>
do (x,y) &lt;- take 2 (do x &lt;- [1..10]
                       y &lt;- [1..x]
                       return (x,y))
   return (x+y)
1364 </programlisting>
1365
1366 </listitem>
1367 <listitem>
1368 <para>
1369 Group statements (as with <literal>-XTransformListComp</literal>):
1370 </para>
1371
1372 <programlisting>
1373 [ x | x &lt;- [1,1,2,2,3], then group by x using GHC.Exts.groupWith ]
1374 [ x | x &lt;- [1,1,2,2,3], then group using myGroup ]
1375 </programlisting>
1376
1377 </listitem>
1378 <listitem>
1379 <para>
1380 Parallel statements (as with <literal>-XParallelListComp</literal>):
1381 </para>
1382
1383 <programlisting>
[ (x+y) | x &lt;- [1..10]
        | y &lt;- [11..20]
        ]
1387 </programlisting>
1388
1389 <para>
1390 Parallel statements are translated using the
1391 <literal>mzip</literal> function, which requires a
1392 <literal>MonadZip</literal> instance defined in
1393 <ulink url="&libraryBaseLocation;/Control-Monad-Zip.html"><literal>Control.Monad.Zip</literal></ulink>:
1394 </para>
1395
1396 <programlisting>
do (x,y) &lt;- mzip (do x &lt;- [1..10]
                     return x)
                 (do y &lt;- [11..20]
                     return y)
   return (x+y)
1402 </programlisting>
1403
1404 </listitem>
1405 </itemizedlist>
1406
1407 <para>
All these features are available when the
<literal>MonadComprehensions</literal> extension is enabled. The types
and more detailed examples on how to use comprehensions are explained
in the previous sections <xref
linkend="generalised-list-comprehensions"/> and <xref
linkend="parallel-list-comprehensions"/>. In general you just have
1414 to replace the type <literal>[a]</literal> with the type
1415 <literal>Monad m => m a</literal> for monad comprehensions.
1416 </para>
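<para>
As a sketch of a comprehension over a monad other than lists, the following
uses <literal>Maybe</literal>. The function <literal>safeDiv</literal> and the value
<literal>total</literal> are hypothetical names chosen for this example.
<programlisting>
{-# LANGUAGE MonadComprehensions #-}

-- A guard with no generators: Nothing when the divisor is zero.
safeDiv :: Int -> Int -> Maybe Int
safeDiv x y = [ x `div` y | y /= 0 ]

-- Bindings in Maybe: the result is Nothing if either division fails.
total :: Maybe Int
total = [ q + r | q &lt;- safeDiv 10 2, r &lt;- safeDiv 9 3 ]
</programlisting>
Here <literal>total</literal> is <literal>Just 8</literal>, while
<literal>safeDiv 1 0</literal> is <literal>Nothing</literal>.
</para>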
1417
1418 <para>
1419 Note: Even though most of these examples are using the list monad,
1420 monad comprehensions work for any monad.
1421 The <literal>base</literal> package offers all necessary instances for
1422 lists, which make <literal>MonadComprehensions</literal> backward
compatible with built-in, transform and parallel list comprehensions.
1424 </para>
1425 <para> More formally, the desugaring is as follows. We write <literal>D[ e | Q]</literal>
1426 to mean the desugaring of the monad comprehension <literal>[ e | Q]</literal>:
1427 <programlisting>
Expressions: e
Declarations: d
Lists of qualifiers: Q,R,S

-- Basic forms
D[ e | ]               = return e
D[ e | p &lt;- e, Q ]     = e &gt;&gt;= \p -&gt; D[ e | Q ]
D[ e | e, Q ]          = guard e &gt;&gt; D[ e | Q ]
D[ e | let d, Q ]      = let d in D[ e | Q ]

-- Parallel comprehensions (iterate for multiple parallel branches)
D[ e | (Q | R), S ]    = mzip D[ Qv | Q ] D[ Rv | R ] &gt;&gt;= \(Qv,Rv) -&gt; D[ e | S ]

-- Transform comprehensions
D[ e | Q then f, R ]                  = f D[ Qv | Q ] &gt;&gt;= \Qv -&gt; D[ e | R ]

D[ e | Q then f by b, R ]             = f (\Qv -&gt; b) D[ Qv | Q ] &gt;&gt;= \Qv -&gt; D[ e | R ]

D[ e | Q then group using f, R ]      = f D[ Qv | Q ] &gt;&gt;= \ys -&gt;
                                        case (fmap selQv1 ys, ..., fmap selQvn ys) of
                                          Qv -&gt; D[ e | R ]

D[ e | Q then group by b using f, R ] = f (\Qv -&gt; b) D[ Qv | Q ] &gt;&gt;= \ys -&gt;
                                        case (fmap selQv1 ys, ..., fmap selQvn ys) of
                                          Qv -&gt; D[ e | R ]

where  Qv is the tuple of variables bound by Q (and used subsequently)
       selQvi is a selector mapping Qv to the ith component of Qv

Operator     Standard binding       Expected type
--------------------------------------------------------------------
return       GHC.Base               t1 -&gt; m t2
(&gt;&gt;=)        GHC.Base               m1 t1 -&gt; (t2 -&gt; m2 t3) -&gt; m3 t3
(&gt;&gt;)         GHC.Base               m1 t1 -&gt; m2 t2 -&gt; m3 t3
guard        Control.Monad          t1 -&gt; m t2
fmap         GHC.Base               forall a b. (a-&gt;b) -&gt; n a -&gt; n b
mzip         Control.Monad.Zip      forall a b. m a -&gt; m b -&gt; m (a,b)
1465 </programlisting>
1466 The comprehension should typecheck when its desugaring would typecheck.
1467 </para>
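<para>
Applying the basic forms by hand to a small comprehension gives a feel for the
scheme. The sketch below uses the <literal>Maybe</literal> monad; the names
<literal>sugared</literal> and <literal>desugared</literal> are illustrative, and the
guard step is sequenced directly with <literal>(&gt;&gt;)</literal>.
<programlisting>
{-# LANGUAGE MonadComprehensions #-}
import Control.Monad (guard)

sugared :: Maybe Int
sugared = [ x + y | x &lt;- Just 1, y &lt;- Just 2, x &lt; y ]

-- One bind per generator, a guard for the boolean qualifier,
-- and return for the empty qualifier list.
desugared :: Maybe Int
desugared =
  Just 1 &gt;&gt;= \x -&gt;
  Just 2 &gt;&gt;= \y -&gt;
  guard (x &lt; y) &gt;&gt;
  return (x + y)
</programlisting>
Both expressions evaluate to <literal>Just 3</literal>.
</para>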
1468 <para>
1469 Monad comprehensions support rebindable syntax (<xref linkend="rebindable-syntax"/>).
1470 Without rebindable
1471 syntax, the operators from the "standard binding" module are used; with
1472 rebindable syntax, the operators are looked up in the current lexical scope.
1473 For example, parallel comprehensions will be typechecked and desugared
1474 using whatever "<literal>mzip</literal>" is in scope.
1475 </para>
1476 <para>
1477 The rebindable operators must have the "Expected type" given in the
1478 table above. These types are surprisingly general. For example, you can
1479 use a bind operator with the type
1480 <programlisting>
1481 (>>=) :: T x y a -> (a -> T y z b) -> T x z b
1482 </programlisting>
In the case of transform comprehensions, notice that the groups are
parameterised over some arbitrary type <literal>n</literal> (provided it
has an <literal>fmap</literal>), as well as
the comprehension being over an arbitrary monad.
1487 </para>
1488 </sect2>
1489
1490 <!-- ===================== REBINDABLE SYNTAX =================== -->
1491
1492 <sect2 id="rebindable-syntax">
1493 <title>Rebindable syntax and the implicit Prelude import</title>
1494
1495 <para><indexterm><primary>-XNoImplicitPrelude
1496 option</primary></indexterm> GHC normally imports
1497 <filename>Prelude.hi</filename> files for you. If you'd
1498 rather it didn't, then give it a
1499 <option>-XNoImplicitPrelude</option> option. The idea is
1500 that you can then import a Prelude of your own. (But don't
1501 call it <literal>Prelude</literal>; the Haskell module
1502 namespace is flat, and you must not conflict with any
1503 Prelude module.)</para>
1504
1505 <para>Suppose you are importing a Prelude of your own
1506 in order to define your own numeric class
1507 hierarchy. It completely defeats that purpose if the
1508 literal "1" means "<literal>Prelude.fromInteger
1509 1</literal>", which is what the Haskell Report specifies.
1510 So the <option>-XRebindableSyntax</option>
1511 flag causes
1512 the following pieces of built-in syntax to refer to
1513 <emphasis>whatever is in scope</emphasis>, not the Prelude
1514 versions:
1515 <itemizedlist>
1516 <listitem>
1517 <para>An integer literal <literal>368</literal> means
1518 "<literal>fromInteger (368::Integer)</literal>", rather than
1519 "<literal>Prelude.fromInteger (368::Integer)</literal>".
1520 </para> </listitem>
1521
<listitem><para>Fractional literals are handled in just the same way,
1523 except that the translation is
1524 <literal>fromRational (3.68::Rational)</literal>.
1525 </para> </listitem>
1526
1527 <listitem><para>The equality test in an overloaded numeric pattern
1528 uses whatever <literal>(==)</literal> is in scope.
1529 </para> </listitem>
1530
1531 <listitem><para>The subtraction operation, and the
1532 greater-than-or-equal test, in <literal>n+k</literal> patterns
1533 use whatever <literal>(-)</literal> and <literal>(>=)</literal> are in scope.
1534 </para></listitem>
1535
1536 <listitem>
1537 <para>Negation (e.g. "<literal>- (f x)</literal>")
1538 means "<literal>negate (f x)</literal>", both in numeric
1539 patterns, and expressions.
1540 </para></listitem>
1541
1542 <listitem>
1543 <para>Conditionals (e.g. "<literal>if</literal> e1 <literal>then</literal> e2 <literal>else</literal> e3")
mean "<literal>ifThenElse</literal> e1 e2 e3". However, <literal>case</literal> expressions are unaffected.
1545 </para></listitem>
1546
1547 <listitem>
1548 <para>"Do" notation is translated using whatever
1549 functions <literal>(>>=)</literal>,
1550 <literal>(>>)</literal>, and <literal>fail</literal>,
1551 are in scope (not the Prelude
1552 versions). List comprehensions, mdo (<xref linkend="recursive-do-notation"/>), and parallel array
1553 comprehensions, are unaffected. </para></listitem>
1554
1555 <listitem>
1556 <para>Arrow
1557 notation (see <xref linkend="arrow-notation"/>)
1558 uses whatever <literal>arr</literal>,
1559 <literal>(>>>)</literal>, <literal>first</literal>,
1560 <literal>app</literal>, <literal>(|||)</literal> and
1561 <literal>loop</literal> functions are in scope. But unlike the
1562 other constructs, the types of these functions must match the
1563 Prelude types very closely. Details are in flux; if you want
1564 to use this, ask!
1565 </para></listitem>
1566 </itemizedlist>
1567 <option>-XRebindableSyntax</option> implies <option>-XNoImplicitPrelude</option>.
1568 </para>
1569 <para>
1570 In all cases (apart from arrow notation), the static semantics should be that of the desugared form,
1571 even if that is a little unexpected. For example, the
1572 static semantics of the literal <literal>368</literal>
1573 is exactly that of <literal>fromInteger (368::Integer)</literal>; it's fine for
1574 <literal>fromInteger</literal> to have any of the types:
1575 <programlisting>
1576 fromInteger :: Integer -> Integer
1577 fromInteger :: forall a. Foo a => Integer -> a
1578 fromInteger :: Num a => a -> Integer
1579 fromInteger :: Integer -> Bool -> Bool
1580 </programlisting>
1581 </para>
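<para>
To make the point about unexpected types concrete, here is a deliberately odd
sketch in which every integer literal in the module becomes a
<literal>String</literal>. The definition of <literal>fromInteger</literal> and the
name <literal>label</literal> are invented for this example; the Prelude is imported
explicitly because <option>-XRebindableSyntax</option> turns off the implicit import.
<programlisting>
{-# LANGUAGE RebindableSyntax #-}
import Prelude hiding (fromInteger)

-- Whatever fromInteger is in scope is used for integer literals.
fromInteger :: Integer -> String
fromInteger n = "#" ++ show n

-- The literal below stands for: fromInteger (368 :: Integer)
label :: String
label = 368
</programlisting>
With these definitions <literal>label</literal> is the string <literal>"#368"</literal>.
</para>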
1582
1583 <para>Be warned: this is an experimental facility, with
1584 fewer checks than usual. Use <literal>-dcore-lint</literal>
1585 to typecheck the desugared program. If Core Lint is happy
1586 you should be all right.</para>
1587
1588 </sect2>
1589
1590 <sect2 id="postfix-operators">
1591 <title>Postfix operators</title>
1592
1593 <para>
1594 The <option>-XPostfixOperators</option> flag enables a small
1595 extension to the syntax of left operator sections, which allows you to
1596 define postfix operators. The extension is this: the left section
1597 <programlisting>
1598 (e !)
1599 </programlisting>
1600 is equivalent (from the point of view of both type checking and execution) to the expression
1601 <programlisting>
1602 ((!) e)
1603 </programlisting>
(for any expression <literal>e</literal> and operator <literal>(!)</literal>).
1605 The strict Haskell 98 interpretation is that the section is equivalent to
1606 <programlisting>
1607 (\y -> (!) e y)
1608 </programlisting>
1609 That is, the operator must be a function of two arguments. GHC allows it to
1610 take only one argument, and that in turn allows you to write the function
1611 postfix.
1612 </para>
1613 <para>The extension does not extend to the left-hand side of function
1614 definitions; you must define such a function in prefix form.</para>
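<para>
As a minimal sketch (the factorial operator below is a hypothetical example, not taken
from any library), a one-argument operator can be defined in prefix form and then
applied postfix:
<programlisting>
{-# LANGUAGE PostfixOperators #-}
module Main where

-- Hypothetical one-argument operator, defined in prefix form as required.
(!) :: Integer -> Integer
(!) n = product [1 .. n]

main :: IO ()
main = print (5 !)   -- the section (5 !) is treated as ((!) 5)
</programlisting>
Under the strict Haskell 98 reading, <literal>(5 !)</literal> would stand for
<literal>\y -> (!) 5 y</literal>, which is ill-typed for this one-argument operator.
</para>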
1615
1616 </sect2>
1617
1618 <sect2 id="tuple-sections">
1619 <title>Tuple sections</title>
1620
1621 <para>
1622 The <option>-XTupleSections</option> flag enables Python-style partially applied
1623 tuple constructors. For example, the following program
1624 <programlisting>
1625 (, True)
1626 </programlisting>
is an alternative notation for the more unwieldy
1628 <programlisting>
1629 \x -> (x, True)
1630 </programlisting>
1631 You can omit any combination of arguments to the tuple, as in the following
1632 <programlisting>
1633 (, "I", , , "Love", , 1337)
1634 </programlisting>
1635 which translates to
1636 <programlisting>
1637 \a b c d -> (a, "I", b, c, "Love", d, 1337)
1638 </programlisting>
1639 </para>
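<para>
The following sketch shows tuple sections in ordinary use. The names
<literal>tagged</literal> and <literal>triple</literal> are illustrative only.
<programlisting>
{-# LANGUAGE TupleSections #-}
module Main where

-- (, True) is the function \x -> (x, True).
tagged :: [(Int, Bool)]
tagged = map (, True) [1, 2, 3]

-- Several omitted components become arguments, taken left to right.
triple :: Int -> Char -> (Int, String, Char)
triple = (, "mid", )

main :: IO ()
main = do
  print tagged
  print (triple 1 'z')
</programlisting>
</para>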
1640
1641 <para>
1642 If you have <link linkend="unboxed-tuples">unboxed tuples</link> enabled, tuple sections
1643 will also be available for them, like so
1644 <programlisting>
1645 (# , True #)
1646 </programlisting>
1647 Because there is no unboxed unit tuple, the following expression
1648 <programlisting>
1649 (# #)
1650 </programlisting>
1651 continues to stand for the unboxed singleton tuple data constructor.
1652 </para>
1653
1654 </sect2>
1655
1656 <sect2 id="lambda-case">
1657 <title>Lambda-case</title>
1658 <para>
1659 The <option>-XLambdaCase</option> flag enables expressions of the form
1660 <programlisting>
1661 \case { p1 -> e1; ...; pN -> eN }
1662 </programlisting>
1663 which is equivalent to
1664 <programlisting>
1665 \freshName -> case freshName of { p1 -> e1; ...; pN -> eN }
1666 </programlisting>
1667 Note that <literal>\case</literal> starts a layout, so you can write
1668 <programlisting>
\case
  p1 -> e1
  ...
  pN -> eN
1673 </programlisting>
1674 </para>
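<para>
A small self-contained sketch using the layout form. The function
<literal>describe</literal> is a hypothetical example.
<programlisting>
{-# LANGUAGE LambdaCase #-}
module Main where

-- No name is needed for the scrutinised argument.
describe :: Maybe Int -> String
describe = \case
  Nothing -> "nothing"
  Just n  -> "just " ++ show n

main :: IO ()
main = mapM_ (putStrLn . describe) [Nothing, Just 3]
</programlisting>
</para>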
1675 </sect2>
1676
1677 <sect2 id="empty-case">
1678 <title>Empty case alternatives</title>
1679 <para>
1680 The <option>-XEmptyCase</option> flag enables
1681 case expressions, or lambda-case expressions, that have no alternatives,
1682 thus:
1683 <programlisting>
1684 case e of { } -- No alternatives
1685 or
1686 \case { } -- -XLambdaCase is also required
1687 </programlisting>
1688 This can be useful when you know that the expression being scrutinised
1689 has no non-bottom values. For example:
1690 <programlisting>
1691 data Void
1692 f :: Void -> Int
1693 f x = case x of { }
1694 </programlisting>
1695 With dependently-typed features it is more useful
1696 (see <ulink url="http://hackage.haskell.org/trac/ghc/ticket/2431">Trac</ulink>).
1697 For example, consider these two candidate definitions of <literal>absurd</literal>:
1698 <programlisting>
data a :~: b where
  Refl :: a :~: a
1701
1702 absurd :: True :~: False -> a
1703 absurd x = error "absurd" -- (A)
1704 absurd x = case x of {} -- (B)
1705 </programlisting>
1706 We much prefer (B). Why? Because GHC can figure out that <literal>(True :~: False)</literal>
1707 is an empty type. So (B) has no partiality and GHC should be able to compile with
1708 <option>-fwarn-incomplete-patterns</option>. (Though the pattern match checking is not
yet clever enough to do that.)
1710 On the other hand (A) looks dangerous, and GHC doesn't check to make
1711 sure that, in fact, the function can never get called.
1712 </para>
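<para>
One place where an empty case is handy in everyday code is discharging an impossible
branch. The sketch below is an illustration only; the names <literal>absurdVoid</literal>
and <literal>fromLeftOnly</literal> are hypothetical.
<programlisting>
{-# LANGUAGE EmptyCase #-}
module Main where

data Void

-- No alternatives are needed because Void has no constructors.
absurdVoid :: Void -> a
absurdVoid v = case v of { }

-- The Right branch can never carry a real value, so it is discharged with absurdVoid.
fromLeftOnly :: Either Int Void -> Int
fromLeftOnly (Left n)  = n
fromLeftOnly (Right v) = absurdVoid v

main :: IO ()
main = print (fromLeftOnly (Left 7))
</programlisting>
</para>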
1713 </sect2>
1714
1715 <sect2 id="multi-way-if">
1716 <title>Multi-way if-expressions</title>
1717 <para>
With the <option>-XMultiWayIf</option> flag, GHC accepts conditional expressions
1719 with multiple branches:
1720 <programlisting>
  if | guard1 -> expr1
     | ...
     | guardN -> exprN
1724 </programlisting>
1725 which is roughly equivalent to
1726 <programlisting>
  case () of
    _ | guard1 -> expr1
    ...
    _ | guardN -> exprN
1731 </programlisting>
1732 except that multi-way if-expressions do not alter the layout.
1733 </para>
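<para>
A concrete instance of the schematic form, as a sketch. The function
<literal>sizeClass</literal> and its thresholds are made up for illustration.
<programlisting>
{-# LANGUAGE MultiWayIf #-}
module Main where

-- Guards are tried top to bottom; the first one that holds selects the branch.
sizeClass :: Int -> String
sizeClass n = if | n > 100   -> "large"
                 | n > 10    -> "medium"
                 | otherwise -> "small"

main :: IO ()
main = mapM_ (putStrLn . sizeClass) [500, 50, 5]
</programlisting>
</para>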
1734 </sect2>
1735
1736 <sect2 id="disambiguate-fields">
1737 <title>Record field disambiguation</title>
1738 <para>
1739 In record construction and record pattern matching
1740 it is entirely unambiguous which field is referred to, even if there are two different
1741 data types in scope with a common field name. For example:
1742 <programlisting>
1743 module M where
1744 data S = MkS { x :: Int, y :: Bool }
1745
1746 module Foo where
1747 import M
1748
1749 data T = MkT { x :: Int }
1750
1751 ok1 (MkS { x = n }) = n+1 -- Unambiguous
1752 ok2 n = MkT { x = n+1 } -- Unambiguous
1753
1754 bad1 k = k { x = 3 } -- Ambiguous
1755 bad2 k = x k -- Ambiguous
1756 </programlisting>
1757 Even though there are two <literal>x</literal>'s in scope,
1758 it is clear that the <literal>x</literal> in the pattern in the
1759 definition of <literal>ok1</literal> can only mean the field
1760 <literal>x</literal> from type <literal>S</literal>. Similarly for
1761 the function <literal>ok2</literal>. However, in the record update
1762 in <literal>bad1</literal> and the record selection in <literal>bad2</literal>
1763 it is not clear which of the two types is intended.
1764 </para>
1765 <para>
1766 Haskell 98 regards all four as ambiguous, but with the
1767 <option>-XDisambiguateRecordFields</option> flag, GHC will accept
1768 the former two. The rules are precisely the same as those for instance
1769 declarations in Haskell 98, where the method names on the left-hand side
1770 of the method bindings in an instance declaration refer unambiguously
1771 to the method of that class (provided they are in scope at all), even
1772 if there are other variables in scope with the same name.
1773 This reduces the clutter of qualified names when you import two
1774 records from different modules that use the same field name.
1775 </para>
1776 <para>
1777 Some details:
1778 <itemizedlist>
1779 <listitem><para>
1780 Field disambiguation can be combined with punning (see <xref linkend="record-puns"/>). For example:
1781 <programlisting>
1782 module Foo where
1783 import M
1784 x=True
1785 ok3 (MkS { x }) = x+1 -- Uses both disambiguation and punning
1786 </programlisting>
1787 </para></listitem>
1788
1789 <listitem><para>
1790 With <option>-XDisambiguateRecordFields</option> you can use <emphasis>unqualified</emphasis>
field names even if the corresponding selector is only in scope <emphasis>qualified</emphasis>.
1792 For example, assuming the same module <literal>M</literal> as in our earlier example, this is legal:
1793 <programlisting>
1794 module Foo where
1795 import qualified M -- Note qualified
1796
1797 ok4 (M.MkS { x = n }) = n+1 -- Unambiguous
1798 </programlisting>
1799 Since the constructor <literal>MkS</literal> is only in scope qualified, you must
1800 name it <literal>M.MkS</literal>, but the field <literal>x</literal> does not need
1801 to be qualified even though <literal>M.x</literal> is in scope but <literal>x</literal>
1802 is not. (In effect, it is qualified by the constructor.)
1803 </para></listitem>
1804 </itemizedlist>
1805 </para>
1806
1807 </sect2>
1808
1809 <!-- ===================== Record puns =================== -->
1810
1811 <sect2 id="record-puns">
1812 <title>Record puns
1813 </title>
1814
1815 <para>
1816 Record puns are enabled by the flag <literal>-XNamedFieldPuns</literal>.
1817 </para>
1818
1819 <para>
1820 When using records, it is common to write a pattern that binds a
1821 variable with the same name as a record field, such as:
1822
1823 <programlisting>
1824 data C = C {a :: Int}
1825 f (C {a = a}) = a
1826 </programlisting>
1827 </para>
1828
1829 <para>
1830 Record punning permits the variable name to be elided, so one can simply
1831 write
1832
1833 <programlisting>
1834 f (C {a}) = a
1835 </programlisting>
1836
1837 to mean the same pattern as above. That is, in a record pattern, the
1838 pattern <literal>a</literal> expands into the pattern <literal>a =
1839 a</literal> for the same name <literal>a</literal>.
1840 </para>
1841
1842 <para>
1843 Note that:
1844 <itemizedlist>
1845 <listitem><para>
1846 Record punning can also be used in an expression, writing, for example,
1847 <programlisting>
1848 let a = 1 in C {a}
1849 </programlisting>
1850 instead of
1851 <programlisting>
1852 let a = 1 in C {a = a}
1853 </programlisting>
1854 The expansion is purely syntactic, so the expanded right-hand side
1855 expression refers to the nearest enclosing variable that is spelled the
1856 same as the field name.
1857 </para></listitem>
1858
1859 <listitem><para>
1860 Puns and other patterns can be mixed in the same record:
1861 <programlisting>
1862 data C = C {a :: Int, b :: Int}
1863 f (C {a, b = 4}) = a
1864 </programlisting>
1865 </para></listitem>
1866
1867 <listitem><para>
1868 Puns can be used wherever record patterns occur (e.g. in
1869 <literal>let</literal> bindings or at the top-level).
1870 </para></listitem>
1871
1872 <listitem><para>
1873 A pun on a qualified field name is expanded by stripping off the module qualifier.
1874 For example:
1875 <programlisting>
f (M.C {M.a}) = a
1877 </programlisting>
1878 means
1879 <programlisting>
1880 f (M.C {M.a = a}) = a
1881 </programlisting>
1882 (This is useful if the field selector <literal>a</literal> for constructor <literal>M.C</literal>
1883 is only in scope in qualified form.)
1884 </para></listitem>
1885 </itemizedlist>
1886 </para>
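<para>
The pattern and expression forms can be seen together in the following sketch. The
<literal>Point</literal> type and both functions are hypothetical examples.
<programlisting>
{-# LANGUAGE NamedFieldPuns #-}
module Main where

data Point = Point { px :: Int, py :: Int }
  deriving Show

-- Pun in a pattern: {px, py} means {px = px, py = py}.
norm1 :: Point -> Int
norm1 (Point {px, py}) = abs px + abs py

-- Pun in an expression: each field is filled from the local variable of the same name.
mkPoint :: Int -> Int -> Point
mkPoint px py = Point {px, py}

main :: IO ()
main = print (norm1 (mkPoint 3 (-4)))
</programlisting>
</para>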
1887
1888
1889 </sect2>
1890
1891 <!-- ===================== Record wildcards =================== -->
1892
1893 <sect2 id="record-wildcards">
1894 <title>Record wildcards
1895 </title>
1896
1897 <para>
1898 Record wildcards are enabled by the flag <literal>-XRecordWildCards</literal>.
1899 This flag implies <literal>-XDisambiguateRecordFields</literal>.
1900 </para>
1901
1902 <para>
1903 For records with many fields, it can be tiresome to write out each field
1904 individually in a record pattern, as in
1905 <programlisting>
1906 data C = C {a :: Int, b :: Int, c :: Int, d :: Int}
1907 f (C {a = 1, b = b, c = c, d = d}) = b + c + d
1908 </programlisting>
1909 </para>
1910
1911 <para>
1912 Record wildcard syntax permits a "<literal>..</literal>" in a record
1913 pattern, where each elided field <literal>f</literal> is replaced by the
1914 pattern <literal>f = f</literal>. For example, the above pattern can be
1915 written as
1916 <programlisting>
1917 f (C {a = 1, ..}) = b + c + d
1918 </programlisting>
1919 </para>
1920
1921 <para>
1922 More details:
1923 <itemizedlist>
1924 <listitem><para>
1925 Wildcards can be mixed with other patterns, including puns
1926 (<xref linkend="record-puns"/>); for example, in a pattern <literal>C {a
= 1, b, ..}</literal>. Additionally, record wildcards can be used
1928 wherever record patterns occur, including in <literal>let</literal>
1929 bindings and at the top-level. For example, the top-level binding
1930 <programlisting>
1931 C {a = 1, ..} = e
1932 </programlisting>
1933 defines <literal>b</literal>, <literal>c</literal>, and
1934 <literal>d</literal>.
1935 </para></listitem>
1936
1937 <listitem><para>
1938 Record wildcards can also be used in expressions, writing, for example,
1939 <programlisting>
1940 let {a = 1; b = 2; c = 3; d = 4} in C {..}
1941 </programlisting>
1942 in place of
1943 <programlisting>
1944 let {a = 1; b = 2; c = 3; d = 4} in C {a=a, b=b, c=c, d=d}
1945 </programlisting>
1946 The expansion is purely syntactic, so the record wildcard
1947 expression refers to the nearest enclosing variables that are spelled
1948 the same as the omitted field names.
1949 </para></listitem>
1950
1951 <listitem><para>
1952 The "<literal>..</literal>" expands to the missing
1953 <emphasis>in-scope</emphasis> record fields.
1954 Specifically the expansion of "<literal>C {..}</literal>" includes
1955 <literal>f</literal> if and only if:
1956 <itemizedlist>
1957 <listitem><para>
1958 <literal>f</literal> is a record field of constructor <literal>C</literal>.
1959 </para></listitem>
1960 <listitem><para>
1961 The record field <literal>f</literal> is in scope somehow (either qualified or unqualified).
1962 </para></listitem>
1963 <listitem><para>
1964 In the case of expressions (but not patterns),
1965 the variable <literal>f</literal> is in scope unqualified,
1966 apart from the binding of the record selector itself.
1967 </para></listitem>
1968 </itemizedlist>
1969 For example
1970 <programlisting>
1971 module M where
1972 data R = R { a,b,c :: Int }
1973 module X where
1974 import M( R(a,c) )
1975 f b = R { .. }
1976 </programlisting>
1977 The <literal>R{..}</literal> expands to <literal>R{M.a=a}</literal>,
1978 omitting <literal>b</literal> since the record field is not in scope,
1979 and omitting <literal>c</literal> since the variable <literal>c</literal>
1980 is not in scope (apart from the binding of the
1981 record selector <literal>c</literal>, of course).
1982 </para></listitem>
1983 </itemizedlist>
1984 </para>
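<para>
The sketch below uses a wildcard once in a pattern and once in an expression. The
<literal>Config</literal> record and its fields are hypothetical.
<programlisting>
{-# LANGUAGE RecordWildCards #-}
module Main where

data Config = Config { host :: String, port :: Int, debug :: Bool }

-- In a pattern, {..} binds host, port and debug as local variables.
render :: Config -> String
render Config{..} =
  host ++ ":" ++ show port ++ (if debug then " [debug]" else "")

-- In an expression, {..} fills each field from the local variable of the same name.
defaultConfig :: Config
defaultConfig =
  let host  = "localhost"
      port  = 8080
      debug = False
  in Config{..}

main :: IO ()
main = putStrLn (render defaultConfig)
</programlisting>
</para>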
1985
1986 </sect2>
1987
1988 <!-- ===================== Local fixity declarations =================== -->
1989
1990 <sect2 id="local-fixity-declarations">
1991 <title>Local Fixity Declarations
1992 </title>
1993
1994 <para>A careful reading of the Haskell 98 Report reveals that fixity
1995 declarations (<literal>infix</literal>, <literal>infixl</literal>, and
1996 <literal>infixr</literal>) are permitted to appear inside local bindings
such as those introduced by <literal>let</literal> and
1998 <literal>where</literal>. However, the Haskell Report does not specify
1999 the semantics of such bindings very precisely.
2000 </para>
2001
2002 <para>In GHC, a fixity declaration may accompany a local binding:
2003 <programlisting>
let f = ...
    infixr 3 `f`
in
    ...
2008 </programlisting>
2009 and the fixity declaration applies wherever the binding is in scope.
2010 For example, in a <literal>let</literal>, it applies in the right-hand
2011 sides of other <literal>let</literal>-bindings and the body of the
<literal>let</literal>. Or, in recursive <literal>do</literal>
2013 expressions (<xref linkend="recursive-do-notation"/>), the local fixity
2014 declarations of a <literal>let</literal> statement scope over other
2015 statements in the group, just as the bound name does.
2016 </para>
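<para>
To see that a local fixity declaration really changes how the body is parsed, consider
this sketch. The function <literal>minus</literal> is a hypothetical local binding.
<programlisting>
module Main where

rightNested :: Int
rightNested =
  let minus a b = a - b
      infixr 6 `minus`          -- accompanies the local binding of minus
  in 10 `minus` 3 `minus` 2     -- parsed as 10 `minus` (3 `minus` 2)

main :: IO ()
main = print rightNested        -- prints 9
</programlisting>
Without the declaration, <literal>minus</literal> would default to
<literal>infixl 9</literal> and the expression would evaluate to 5.
</para>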
2017
2018 <para>
Moreover, a local fixity declaration <emphasis>must</emphasis> accompany a local binding of
that name: it is not possible to revise the fixity of a name bound
2021 elsewhere, as in
2022 <programlisting>
2023 let infixr 9 $ in ...
2024 </programlisting>
2025
2026 Because local fixity declarations are technically Haskell 98, no flag is
2027 necessary to enable them.
2028 </para>
2029 </sect2>
2030
2031 <sect2 id="package-imports">
2032 <title>Package-qualified imports</title>
2033
2034 <para>With the <option>-XPackageImports</option> flag, GHC allows
2035 import declarations to be qualified by the package name that the
2036 module is intended to be imported from. For example:</para>
2037
2038 <programlisting>
2039 import "network" Network.Socket
2040 </programlisting>
2041
2042 <para>would import the module <literal>Network.Socket</literal> from
2043 the package <literal>network</literal> (any version). This may
2044 be used to disambiguate an import when the same module is
2045 available from multiple packages, or is present in both the
2046 current package being built and an external package.</para>
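<para>
A complete module using a package-qualified import might look like the following sketch.
It assumes that the <literal>base</literal> package is visible to the compiler, as it is
in a standard installation.
<programlisting>
{-# LANGUAGE PackageImports #-}
module Main where

-- Data.List is taken specifically from the package named "base".
import "base" Data.List (sort)

main :: IO ()
main = print (sort [3, 1, 2 :: Int])
</programlisting>
</para>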
2047
2048 <para>The special package name <literal>this</literal> can be used to
2049 refer to the current package being built.</para>
2050
<para>Note: you probably don't need to use this feature; it was
2052 added mainly so that we can build backwards-compatible versions of
2053 packages when APIs change. It can lead to fragile dependencies in
2054 the common case: modules occasionally move from one package to
2055 another, rendering any package-qualified imports broken.</para>
2056 </sect2>
2057
2058 <sect2 id="safe-imports-ext">
2059 <title>Safe imports</title>
2060
2061 <para>With the <option>-XSafe</option>, <option>-XTrustworthy</option>
2062 and <option>-XUnsafe</option> language flags, GHC extends
2063 the import declaration syntax to take an optional <literal>safe</literal>
2064 keyword after the <literal>import</literal> keyword. This feature
2065 is part of the Safe Haskell GHC extension. For example:</para>
2066
2067 <programlisting>
2068 import safe qualified Network.Socket as NS
2069 </programlisting>
2070
2071 <para>would import the module <literal>Network.Socket</literal>
with compilation only succeeding if <literal>Network.Socket</literal> can be
safely imported. For a description of when an import is
considered safe, see <xref linkend="safe-haskell"/>.</para>
2075
2076 </sect2>
2077
2078 <sect2 id="syntax-stolen">
2079 <title>Summary of stolen syntax</title>
2080
2081 <para>Turning on an option that enables special syntax
2082 <emphasis>might</emphasis> cause working Haskell 98 code to fail
2083 to compile, perhaps because it uses a variable name which has
2084 become a reserved word. This section lists the syntax that is
2085 "stolen" by language extensions.
2086 We use
2087 notation and nonterminal names from the Haskell 98 lexical syntax
2088 (see the Haskell 98 Report).
2089 We only list syntax changes here that might affect
2090 existing working programs (i.e. "stolen" syntax). Many of these
2091 extensions will also enable new context-free syntax, but in all
2092 cases programs written to use the new syntax would not be
2093 compilable without the option enabled.</para>
2094
2095 <para>There are two classes of special
2096 syntax:
2097
2098 <itemizedlist>
2099 <listitem>
2100 <para>New reserved words and symbols: character sequences
2101 which are no longer available for use as identifiers in the
2102 program.</para>
2103 </listitem>
2104 <listitem>
2105 <para>Other special syntax: sequences of characters that have
2106 a different meaning when this particular option is turned
2107 on.</para>
2108 </listitem>
2109 </itemizedlist>
2110
2111 The following syntax is stolen:
2112
2113 <variablelist>
2114 <varlistentry>
2115 <term>
2116 <literal>forall</literal>
2117 <indexterm><primary><literal>forall</literal></primary></indexterm>
2118 </term>
2119 <listitem><para>
2120 Stolen (in types) by: <option>-XExplicitForAll</option>, and hence by
2121 <option>-XScopedTypeVariables</option>,
2122 <option>-XLiberalTypeSynonyms</option>,
2123 <option>-XRankNTypes</option>,
2124 <option>-XExistentialQuantification</option>
2125 </para></listitem>
2126 </varlistentry>
2127
2128 <varlistentry>
2129 <term>
2130 <literal>mdo</literal>
2131 <indexterm><primary><literal>mdo</literal></primary></indexterm>
2132 </term>
2133 <listitem><para>
2134 Stolen by: <option>-XRecursiveDo</option>
2135 </para></listitem>
2136 </varlistentry>
2137
2138 <varlistentry>
2139 <term>
2140 <literal>foreign</literal>
2141 <indexterm><primary><literal>foreign</literal></primary></indexterm>
2142 </term>
2143 <listitem><para>
2144 Stolen by: <option>-XForeignFunctionInterface</option>
2145 </para></listitem>
2146 </varlistentry>
2147
2148 <varlistentry>
2149 <term>
2150 <literal>rec</literal>,
2151 <literal>proc</literal>, <literal>-&lt;</literal>,
2152 <literal>&gt;-</literal>, <literal>-&lt;&lt;</literal>,
2153 <literal>&gt;&gt;-</literal>, and <literal>(|</literal>,
2154 <literal>|)</literal> brackets
2155 <indexterm><primary><literal>proc</literal></primary></indexterm>
2156 </term>
2157 <listitem><para>
2158 Stolen by: <option>-XArrows</option>
2159 </para></listitem>
2160 </varlistentry>
2161
2162 <varlistentry>
2163 <term>
2164 <literal>?<replaceable>varid</replaceable></literal>,
2165 <literal>%<replaceable>varid</replaceable></literal>
2166 <indexterm><primary>implicit parameters</primary></indexterm>
2167 </term>
2168 <listitem><para>
2169 Stolen by: <option>-XImplicitParams</option>
2170 </para></listitem>
2171 </varlistentry>
2172
2173 <varlistentry>
2174 <term>
2175 <literal>[|</literal>,
2176 <literal>[e|</literal>, <literal>[p|</literal>,
2177 <literal>[d|</literal>, <literal>[t|</literal>,
2178 <literal>$(</literal>,
2179 <literal>$<replaceable>varid</replaceable></literal>
2180 <indexterm><primary>Template Haskell</primary></indexterm>
2181 </term>
2182 <listitem><para>
2183 Stolen by: <option>-XTemplateHaskell</option>
2184 </para></listitem>
2185 </varlistentry>
2186
2187 <varlistentry>
2188 <term>
<literal>[<replaceable>varid</replaceable>|</literal>
2190 <indexterm><primary>quasi-quotation</primary></indexterm>
2191 </term>
2192 <listitem><para>
2193 Stolen by: <option>-XQuasiQuotes</option>
2194 </para></listitem>
2195 </varlistentry>
2196
2197 <varlistentry>
2198 <term>
2199 <replaceable>varid</replaceable>{<literal>&num;</literal>},
2200 <replaceable>char</replaceable><literal>&num;</literal>,
2201 <replaceable>string</replaceable><literal>&num;</literal>,
2202 <replaceable>integer</replaceable><literal>&num;</literal>,
2203 <replaceable>float</replaceable><literal>&num;</literal>,
2204 <replaceable>float</replaceable><literal>&num;&num;</literal>,
2205 <literal>(&num;</literal>, <literal>&num;)</literal>
2206 </term>
2207 <listitem><para>
2208 Stolen by: <option>-XMagicHash</option>
2209 </para></listitem>
2210 </varlistentry>
2211 </variablelist>
2212 </para>
2213 </sect2>
2214 </sect1>
2215
2216
2217 <!-- TYPE SYSTEM EXTENSIONS -->
2218 <sect1 id="data-type-extensions">
2219 <title>Extensions to data types and type synonyms</title>
2220
2221 <sect2 id="nullary-types">
2222 <title>Data types with no constructors</title>
2223
2224 <para>With the <option>-XEmptyDataDecls</option> flag (or equivalent LANGUAGE pragma),
2225 GHC lets you declare a data type with no constructors. For example:</para>
2226
2227 <programlisting>
2228 data S -- S :: *
2229 data T a -- T :: * -> *
2230 </programlisting>
2231
2232 <para>Syntactically, the declaration lacks the "= constrs" part. The
2233 type can be parameterised over types of any kind, but if the kind is
2234 not <literal>*</literal> then an explicit kind annotation must be used
2235 (see <xref linkend="kinding"/>).</para>
2236
2237 <para>Such data types have only one value, namely bottom.
2238 Nevertheless, they can be useful when defining "phantom types".</para>
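<para>
As a sketch of the phantom-type use, the empty types below serve only as compile-time
tags. The names <literal>Metres</literal>, <literal>Feet</literal> and
<literal>Distance</literal> are hypothetical.
<programlisting>
{-# LANGUAGE EmptyDataDecls #-}
module Main where

-- Empty types: they have no constructors and are used only as tags.
data Metres
data Feet

-- The unit parameter does not appear on the right-hand side.
newtype Distance unit = Distance Double
  deriving Show

-- Only distances carrying the same unit tag can be added.
addDistance :: Distance u -> Distance u -> Distance u
addDistance (Distance x) (Distance y) = Distance (x + y)

main :: IO ()
main = print (addDistance (Distance 1.5) (Distance 2.5 :: Distance Metres))
</programlisting>
Adding a <literal>Distance Metres</literal> to a <literal>Distance Feet</literal> is
rejected by the type checker, even though both are represented by a
<literal>Double</literal>.
</para>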
2239 </sect2>
2240
2241 <sect2 id="datatype-contexts">
2242 <title>Data type contexts</title>
2243
2244 <para>Haskell allows datatypes to be given contexts, e.g.</para>
2245
2246 <programlisting>
2247 data Eq a => Set a = NilSet | ConsSet a (Set a)
2248 </programlisting>
2249
<para>gives constructors with types:</para>
2251
2252 <programlisting>
2253 NilSet :: Set a
2254 ConsSet :: Eq a => a -> Set a -> Set a
2255 </programlisting>
2256
2257 <para>This is widely considered a misfeature, and is going to be removed from
2258 the language. In GHC, it is controlled by the deprecated extension
2259 <literal>DatatypeContexts</literal>.</para>
2260 </sect2>
2261
2262 <sect2 id="infix-tycons">
2263 <title>Infix type constructors, classes, and type variables</title>
2264
2265 <para>
2266 GHC allows type constructors, classes, and type variables to be operators, and
2267 to be written infix, very much like expressions. More specifically:
2268 <itemizedlist>
2269 <listitem><para>
2270 A type constructor or class can be an operator, beginning with a colon; e.g. <literal>:*:</literal>.
2271 The lexical syntax is the same as that for data constructors.
2272 </para></listitem>
2273 <listitem><para>
2274 Data type and type-synonym declarations can be written infix, parenthesised
2275 if you want further arguments. E.g.
2276 <screen>
2277 data a :*: b = Foo a b
2278 type a :+: b = Either a b
2279 class a :=: b where ...
2280
2281 data (a :**: b) x = Baz a b x
2282 type (a :++: b) y = Either (a,b) y
2283 </screen>
2284 </para></listitem>
2285 <listitem><para>
2286 Types, and class constraints, can be written infix. For example
2287 <screen>
2288 x :: Int :*: Bool
2289 f :: (a :=: b) => a -> b
2290 </screen>
2291 </para></listitem>
2292 <listitem><para>
2293 Back-quotes work
2294 as for expressions, both for type constructors and type variables; e.g. <literal>Int `Either` Bool</literal>, or
2295 <literal>Int `a` Bool</literal>. Similarly, parentheses work the same; e.g. <literal>(:*:) Int Bool</literal>.
2296 </para></listitem>
2297 <listitem><para>
2298 Fixities may be declared for type constructors, or classes, just as for data constructors. However,
2299 one cannot distinguish between the two in a fixity declaration; a fixity declaration
2300 sets the fixity for a data constructor and the corresponding type constructor. For example:
2301 <screen>
2302 infixl 7 T, :*:
2303 </screen>
2304 sets the fixity for both type constructor <literal>T</literal> and data constructor <literal>T</literal>,
and similarly for <literal>:*:</literal>.
2307 </para></listitem>
2308 <listitem><para>
2309 Function arrow is <literal>infixr</literal> with fixity 0. (This might change; I'm not sure what it should be.)
2310 </para></listitem>
2311
2312 </itemizedlist>
2313 </para>
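<para>
The points above can be combined in one small module. This is a sketch: it assumes the
<option>-XTypeOperators</option> flag is what enables the syntax in your GHC version, and
the names <literal>Pair</literal>, <literal>swapPair</literal> and
<literal>describe</literal> are hypothetical.
<programlisting>
{-# LANGUAGE TypeOperators #-}
module Main where

-- An infix type constructor with an ordinary prefix data constructor.
data a :*: b = Pair a b
  deriving Show

-- An infix type synonym.
type a :+: b = Either a b

-- Fixity declarations for the type-level operators.
infixr 6 :*:
infixr 5 :+:

swapPair :: (a :*: b) -> (b :*: a)
swapPair (Pair x y) = Pair y x

describe :: (Int :+: Bool) -> String
describe (Left n)  = "number " ++ show n
describe (Right b) = "flag " ++ show b

main :: IO ()
main = do
  print (swapPair (Pair 'x' True))
  putStrLn (describe (Left 3))
</programlisting>
</para>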
2314 </sect2>
2315
2316 <sect2 id="type-synonyms">
2317 <title>Liberalised type synonyms</title>
2318
2319 <para>
2320 Type synonyms are like macros at the type level, but Haskell 98 imposes many rules
2321 on individual synonym declarations.
2322 With the <option>-XLiberalTypeSynonyms</option> extension,
2323 GHC does validity checking on types <emphasis>only after expanding type synonyms</emphasis>.
2324 That means that GHC can be very much more liberal about type synonyms than Haskell 98.
2325
2326 <itemizedlist>
2327 <listitem> <para>You can write a <literal>forall</literal> (including overloading)
2328 in a type synonym, thus:
2329 <programlisting>
2330 type Discard a = forall b. Show b => a -> b -> (a, String)
2331
2332 f :: Discard a
2333 f x y = (x, show y)
2334
2335 g :: Discard Int -> (Int,String) -- A rank-2 type
2336 g f = f 3 True
2337 </programlisting>
2338 </para>
2339 </listitem>
2340
2341 <listitem><para>
2342 If you also use <option>-XUnboxedTuples</option>,
2343 you can write an unboxed tuple in a type synonym:
2344 <programlisting>
2345 type Pr = (# Int, Int #)
2346
2347 h :: Int -> Pr
2348 h x = (# x, x #)
2349 </programlisting>
2350 </para></listitem>
2351
2352 <listitem><para>
2353 You can apply a type synonym to a forall type:
2354 <programlisting>
2355 type Foo a = a -> a -> Bool
2356
2357 f :: Foo (forall b. b->b)
2358 </programlisting>
2359 After expanding the synonym, <literal>f</literal> has the legal (in GHC) type:
2360 <programlisting>
2361 f :: (forall b. b->b) -> (forall b. b->b) -> Bool
2362 </programlisting>
2363 </para></listitem>
2364
2365 <listitem><para>
2366 You can apply a type synonym to a partially applied type synonym:
2367 <programlisting>
2368 type Generic i o = forall x. i x -> o x
2369 type Id x = x
2370
2371 foo :: Generic Id []
2372 </programlisting>
2373 After expanding the synonym, <literal>foo</literal> has the legal (in GHC) type:
2374 <programlisting>
2375 foo :: forall x. x -> [x]
2376 </programlisting>
2377 </para></listitem>
2378
2379 </itemizedlist>
2380 </para>
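<para>
The last case can be tried out with the following sketch. It assumes that
<option>-XRankNTypes</option> is enabled as well, to permit the
<literal>forall</literal> inside the synonym, and the name
<literal>singleton</literal> is hypothetical.
<programlisting>
{-# LANGUAGE RankNTypes, LiberalTypeSynonyms #-}
module Main where

type Generic i o = forall x. i x -> o x
type Id x = x

-- Id is partially applied here; after expansion the type is forall x. x -> [x].
singleton :: Generic Id []
singleton x = [x]

main :: IO ()
main = print (singleton 'a')
</programlisting>
</para>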
2381
2382 <para>
2383 GHC currently does kind checking before expanding synonyms (though even that
2384 could be changed.)
2385 </para>
2386 <para>
2387 After expanding type synonyms, GHC does validity checking on types, looking for
2388 the following mal-formedness which isn't detected simply by kind checking:
2389 <itemizedlist>
2390 <listitem><para>
2391 Type constructor applied to a type involving for-alls.
2392 </para></listitem>
2393 <listitem><para>
2394 Unboxed tuple on left of an arrow.
2395 </para></listitem>
2396 <listitem><para>
2397 Partially-applied type synonym.
2398 </para></listitem>
2399 </itemizedlist>
2400 So, for example,
2401 this will be rejected:
2402 <programlisting>
2403 type Pr = (# Int, Int #)
2404
2405 h :: Pr -> Int
2406 h x = ...
2407 </programlisting>
2408 because GHC does not allow unboxed tuples on the left of a function arrow.
2409 </para>
2410 </sect2>
2411
2412
2413 <sect2 id="existential-quantification">
2414 <title>Existentially quantified data constructors
2415 </title>
2416
2417 <para>
2418 The idea of using existential quantification in data type declarations
2419 was suggested by Perry, and implemented in Hope+ (Nigel Perry, <emphasis>The Implementation
2420 of Practical Functional Programming Languages</emphasis>, PhD Thesis, University of
2421 London, 1991). It was later formalised by Laufer and Odersky
2422 (<emphasis>Polymorphic type inference and abstract data types</emphasis>,
2423 TOPLAS, 16(5), pp1411-1430, 1994).
2424 It's been in Lennart
2425 Augustsson's <command>hbc</command> Haskell compiler for several years, and
2426 proved very useful. Here's the idea. Consider the declaration:
2427 </para>
2428
2429 <para>
2430
2431 <programlisting>
2432 data Foo = forall a. MkFoo a (a -> Bool)
2433 | Nil
2434 </programlisting>
2435
2436 </para>
2437
2438 <para>
2439 The data type <literal>Foo</literal> has two constructors with types:
2440 </para>
2441
2442 <para>
2443
2444 <programlisting>
2445 MkFoo :: forall a. a -> (a -> Bool) -> Foo
2446 Nil :: Foo
2447 </programlisting>
2448
2449 </para>
2450
2451 <para>
2452 Notice that the type variable <literal>a</literal> in the type of <function>MkFoo</function>
2453 does not appear in the data type itself, which is plain <literal>Foo</literal>.
2454 For example, the following expression is fine:
2455 </para>
2456
2457 <para>
2458
2459 <programlisting>
2460 [MkFoo 3 even, MkFoo 'c' isUpper] :: [Foo]
2461 </programlisting>
2462
2463 </para>
2464
2465 <para>
2466 Here, <literal>(MkFoo 3 even)</literal> packages an integer with a function
2467 <function>even</function> that maps an integer to <literal>Bool</literal>; and <function>MkFoo 'c'
2468 isUpper</function> packages a character with a compatible function. These
2469 two things are each of type <literal>Foo</literal> and can be put in a list.
2470 </para>
2471
2472 <para>
What can we do with a value of type <literal>Foo</literal>? In particular,
2474 what happens when we pattern-match on <function>MkFoo</function>?
2475 </para>
2476
2477 <para>
2478
2479 <programlisting>
2480 f (MkFoo val fn) = ???
2481 </programlisting>
2482
2483 </para>
2484
2485 <para>
2486 Since all we know about <literal>val</literal> and <function>fn</function> is that they
2487 are compatible, the only (useful) thing we can do with them is to
2488 apply <function>fn</function> to <literal>val</literal> to get a boolean. For example:
2489 </para>
2490
2491 <para>
2492
2493 <programlisting>
2494 f :: Foo -> Bool
2495 f (MkFoo val fn) = fn val
2496 </programlisting>
2497
2498 </para>
2499
2500 <para>
2501 What this allows us to do is to package heterogeneous values
2502 together with a bunch of functions that manipulate them, and then treat
2503 that collection of packages in a uniform manner. You can express
2504 quite a bit of object-oriented-like programming this way.
2505 </para>
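<para>
Putting the pieces of this section together gives a small runnable sketch. The function
name <literal>test</literal> and the case for <literal>Nil</literal> are additions made
for illustration.
<programlisting>
{-# LANGUAGE ExistentialQuantification #-}
module Main where

import Data.Char (isUpper)

data Foo = forall a. MkFoo a (a -> Bool)
         | Nil

-- The packaged function is the only thing that can be applied to the packaged value.
test :: Foo -> Bool
test (MkFoo val fn) = fn val
test Nil            = False

main :: IO ()
main = print (map test [MkFoo (4 :: Int) even, MkFoo 'c' isUpper, Nil])
</programlisting>
</para>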
2506
2507 <sect3 id="existential">
2508 <title>Why existential?
2509 </title>
2510
2511 <para>
2512 What has this to do with <emphasis>existential</emphasis> quantification?
2513 Simply that <function>MkFoo</function> has the (nearly) isomorphic type
2514 </para>
2515
2516 <para>
2517
2518 <programlisting>
2519 MkFoo :: (exists a . (a, a -> Bool)) -> Foo
2520 </programlisting>
2521
2522 </para>
2523
2524 <para>
2525 But Haskell programmers can safely think of the ordinary
2526 <emphasis>universally</emphasis> quantified type given above, thereby avoiding
2527 adding a new existential quantification construct.
2528 </para>
2529
2530 </sect3>
2531
2532 <sect3 id="existential-with-context">
2533 <title>Existentials and type classes</title>
2534
2535 <para>
2536 An easy extension is to allow
2537 arbitrary contexts before the constructor. For example:
2538 </para>
2539
2540 <para>
2541
2542 <programlisting>
2543 data Baz = forall a. Eq a => Baz1 a a
2544 | forall b. Show b => Baz2 b (b -> b)
2545 </programlisting>
2546
2547 </para>
2548
2549 <para>
2550 The two constructors have the types you'd expect:
2551 </para>
2552
2553 <para>
2554
2555 <programlisting>
2556 Baz1 :: forall a. Eq a => a -> a -> Baz
2557 Baz2 :: forall b. Show b => b -> (b -> b) -> Baz
2558 </programlisting>
2559
2560 </para>
2561
2562 <para>
2563 But when pattern matching on <function>Baz1</function> the matched values can be compared
2564 for equality, and when pattern matching on <function>Baz2</function> the first matched
2565 value can be converted to a string (as well as applying the function to it).
2566 So this program is legal:
2567 </para>
2568
2569 <para>
2570
2571 <programlisting>
2572 f :: Baz -> String
2573 f (Baz1 p q) | p == q = "Yes"
2574 | otherwise = "No"
2575 f (Baz2 v fn) = show (fn v)
2576 </programlisting>
2577
2578 </para>
2579
2580 <para>
2581 Operationally, in a dictionary-passing implementation, the
2582 constructors <function>Baz1</function> and <function>Baz2</function> must store the
2583 dictionaries for <literal>Eq</literal> and <literal>Show</literal> respectively, and
2584 extract them on pattern matching.
2585 </para>
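<para>
The following self-contained sketch repeats the declarations above and applies
<function>f</function> to a few values; the name <literal>samples</literal> is
illustrative and not part of any library:
<programlisting>
{-# LANGUAGE ExistentialQuantification #-}

data Baz = forall a. Eq a => Baz1 a a
         | forall b. Show b => Baz2 b (b -> b)

f :: Baz -> String
f (Baz1 p q) | p == q    = "Yes"
             | otherwise = "No"
f (Baz2 v fn)            = show (fn v)

-- Each constructor application captures the dictionary it needs.
-- Evaluates to ["Yes", "No", "42"].
samples :: [String]
samples = map f [Baz1 'x' 'x', Baz1 True False, Baz2 (41 :: Int) (+ 1)]
</programlisting>
Note that <function>f</function> itself carries no class constraint: the
<literal>Eq</literal> and <literal>Show</literal> dictionaries come from the
constructors.
</para>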
2586
2587 </sect3>
2588
2589 <sect3 id="existential-records">
2590 <title>Record Constructors</title>
2591
2592 <para>
2593 GHC allows existentials to be used with record syntax as well. For example:
2594
2595 <programlisting>
2596 data Counter a = forall self. NewCounter
2597 { _this :: self
2598 , _inc :: self -> self
2599 , _display :: self -> IO ()
2600 , tag :: a
2601 }
2602 </programlisting>
2603 Here <literal>tag</literal> is a public field, with a well-typed selector
2604 function <literal>tag :: Counter a -> a</literal>. The <literal>self</literal>
2605 type is hidden from the outside; any attempt to apply <literal>_this</literal>,
2606 <literal>_inc</literal> or <literal>_display</literal> as functions will raise a
2607 compile-time error. In other words, <emphasis>GHC defines a record selector function
2608 only for fields whose type does not mention the existentially-quantified variables</emphasis>.
2609 (This example used an underscore in the fields for which record selectors
2610 will not be defined, but that is only programming style; GHC ignores them.)
2611 </para>
2612
2613 <para>
2614 To make use of these hidden fields, we need to create some helper functions:
2615
2616 <programlisting>
2617 inc :: Counter a -> Counter a
2618 inc (NewCounter x i d t) = NewCounter
2619 { _this = i x, _inc = i, _display = d, tag = t }
2620
2621 display :: Counter a -> IO ()
2622 display NewCounter{ _this = x, _display = d } = d x
2623 </programlisting>
2624
2625 Now we can define counters with different underlying implementations:
2626
2627 <programlisting>
2628 counterA :: Counter String
2629 counterA = NewCounter
2630 { _this = 0, _inc = (1+), _display = print, tag = "A" }
2631
2632 counterB :: Counter String
2633 counterB = NewCounter
2634 { _this = "", _inc = ('#':), _display = putStrLn, tag = "B" }
2635
2636 main = do
2637 display (inc counterA) -- prints "1"
2638 display (inc (inc counterB)) -- prints "##"
2639 </programlisting>
2640
2641 Record update syntax is supported for existentials (and GADTs):
2642 <programlisting>
2643 setTag :: Counter a -> a -> Counter a
2644 setTag obj t = obj{ tag = t }
2645 </programlisting>
2646 The rule for record update is this: <emphasis>
2647 the types of the updated fields may
2648 mention only the universally-quantified type variables
2649 of the data constructor. For GADTs, the field may mention only types
2650 that appear as a simple type-variable argument in the constructor's result
2651 type</emphasis>. For example:
2652 <programlisting>
2653 data T a b where { T1 :: { f1::a, f2::b, f3::(b,c) } -> T a b } -- c is existential
2654 upd1 t x = t { f1=x } -- OK: upd1 :: T a b -> a' -> T a' b
2655 upd2 t x = t { f3=x } -- BAD (f3's type mentions c, which is
2656 -- existentially quantified)
2657
2658 data G a b where { G1 :: { g1::a, g2::c } -> G a [c] }
2659 upd3 g x = g { g1=x } -- OK: upd3 :: G a b -> c -> G c b
2660 upd4 g x = g { g2=x } -- BAD (g2's type mentions c, which is not a simple
2661 -- type-variable argument in G1's result type)
2662 </programlisting>
2663 </para>
2664
2665 </sect3>
2666
2667
2668 <sect3>
2669 <title>Restrictions</title>
2670
2671 <para>
2672 There are several restrictions on the ways in which existentially-quantified
2673 constructors can be used.
2674 </para>
2675
2676 <para>
2677
2678 <itemizedlist>
2679 <listitem>
2680
2681 <para>
2682 When pattern matching, each pattern match introduces a new,
2683 distinct, type for each existential type variable. These types cannot
2684 be unified with any other type, nor can they escape from the scope of
2685 the pattern match. For example, these fragments are incorrect:
2686
2687
2688 <programlisting>
2689 f1 (MkFoo a f) = a
2690 </programlisting>
2691
2692
2693 Here, the type bound by <function>MkFoo</function> "escapes", because <literal>a</literal>
2694 is the result of <function>f1</function>. One way to see why this is wrong is to
2695 ask what type <function>f1</function> has:
2696
2697
2698 <programlisting>
2699 f1 :: Foo -> a -- Weird!
2700 </programlisting>
2701
2702
2703 What is this "<literal>a</literal>" in the result type? Clearly we don't mean
2704 this:
2705
2706
2707 <programlisting>
2708 f1 :: forall a. Foo -> a -- Wrong!
2709 </programlisting>
2710
2711
2712 The original program is just plain wrong. Here's another sort of error:
2713
2714
2715 <programlisting>
2716 f2 (Baz1 a b) (Baz1 p q) = a==q
2717 </programlisting>
2718
2719
2720 It's ok to say <literal>a==b</literal> or <literal>p==q</literal>, but
2721 <literal>a==q</literal> is wrong because it equates the two distinct types arising
2722 from the two <function>Baz1</function> constructors.
2723
2724
2725 </para>
2726 </listitem>
2727 <listitem>
2728
2729 <para>
2730 You can't pattern-match on an existentially quantified
2731 constructor in a <literal>let</literal> or <literal>where</literal> group of
2732 bindings. So this is illegal:
2733
2734
2735 <programlisting>
2736 f3 x = a==b where { Baz1 a b = x }
2737 </programlisting>
2738
2739 Instead, use a <literal>case</literal> expression:
2740
2741 <programlisting>
2742 f3 x = case x of Baz1 a b -> a==b
2743 </programlisting>
2744
2745 In general, you can only pattern-match
2746 on an existentially-quantified constructor in a <literal>case</literal> expression or
2747 in the patterns of a function definition.
2748
2749 The reason for this restriction is really an implementation one.
2750 Type-checking binding groups is already a nightmare without
2751 existentials complicating the picture. Also an existential pattern
2752 binding at the top level of a module doesn't make sense, because it's
2753 not clear how to prevent the existentially-quantified type "escaping".
2754 So for now, there's a simple-to-state restriction. We'll see how
2755 annoying it is.
2756
2757 </para>
2758 </listitem>
2759 <listitem>
2760
2761 <para>
2762 You can't use existential quantification for <literal>newtype</literal>
2763 declarations. So this is illegal:
2764
2765
2766 <programlisting>
2767 newtype T = forall a. Ord a => MkT a
2768 </programlisting>
2769
2770
2771 Reason: a value of type <literal>T</literal> must be represented as a
2772 pair of a dictionary for <literal>Ord t</literal> and a value of type
2773 <literal>t</literal>. That contradicts the idea that
2774 <literal>newtype</literal> should have no concrete representation.
2775 You can get just the same efficiency and effect by using
2776 <literal>data</literal> instead of <literal>newtype</literal>. If
2777 there is no overloading involved, then there is more of a case for
2778 allowing an existentially-quantified <literal>newtype</literal>,
2779 because the <literal>data</literal> version does carry an
2780 implementation cost, but single-field existentially quantified
2781 constructors aren't much use. So the simple restriction (no
2782 existential stuff on <literal>newtype</literal>) stands, unless there
2783 are convincing reasons to change it.
2784
2785
2786 </para>
2787 </listitem>
2788 <listitem>
2789
2790 <para>
2791 You can't use <literal>deriving</literal> to define instances of a
2792 data type with existentially quantified data constructors.
2793
2794 Reason: in most cases it would not make sense. For example:
2795
2796 <programlisting>
2797 data T = forall a. MkT [a] deriving( Eq )
2798 </programlisting>
2799
2800 To derive <literal>Eq</literal> in the standard way we would need to have equality
2801 between the single component of two <function>MkT</function> constructors:
2802
2803 <programlisting>
2804 instance Eq T where
2805 (MkT a) == (MkT b) = ???
2806 </programlisting>
2807
2808 But <varname>a</varname> and <varname>b</varname> have distinct types, and so can't be compared.
2809 It's just about possible to imagine examples in which the derived instance
2810 would make sense, but it seems altogether simpler simply to prohibit such
2811 declarations. Define your own instances!
2812 </para>
2813 </listitem>
2814
2815 </itemizedlist>
2816
2817 </para>
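<para>
As a hedged illustration of the last point, here is one way to write an instance
by hand. The type below is a hypothetical variant of <literal>T</literal> that
also packages a <literal>Show</literal> dictionary, so that the instance has
something to work with:
<programlisting>
{-# LANGUAGE ExistentialQuantification #-}

-- Hypothetical type: the Show dictionary is stored with the list.
data T = forall a. Show a => MkT [a]

-- Hand-written instance; it uses the hidden type only through Show,
-- so nothing escapes the pattern match.
instance Show T where
  showsPrec d (MkT xs) =
    showParen (d > 10) (showString "MkT " . showsPrec 11 xs)
</programlisting>
With this instance, <literal>show (MkT [1, 2 :: Int])</literal> yields
<literal>MkT [1,2]</literal>. An <literal>Eq</literal> instance cannot be written
in the same style, because two <function>MkT</function> values may hide different types.
</para>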
2818
2819 </sect3>
2820 </sect2>
2821
2822 <!-- ====================== Generalised algebraic data types ======================= -->
2823
2824 <sect2 id="gadt-style">
2825 <title>Declaring data types with explicit constructor signatures</title>
2826
2827 <para>When the <literal>GADTSyntax</literal> extension is enabled,
2828 GHC allows you to declare an algebraic data type by
2829 giving the type signatures of constructors explicitly. For example:
2830 <programlisting>
2831 data Maybe a where
2832 Nothing :: Maybe a
2833 Just :: a -> Maybe a
2834 </programlisting>
2835 The form is called a "GADT-style declaration"
2836 because Generalised Algebraic Data Types, described in <xref linkend="gadt"/>,
2837 can only be declared using this form.</para>
2838 <para>Notice that GADT-style syntax generalises existential types (<xref linkend="existential-quantification"/>).
2839 For example, these two declarations are equivalent:
2840 <programlisting>
2841 data Foo = forall a. MkFoo a (a -> Bool)
2842 data Foo' where { MkFoo' :: a -> (a->Bool) -> Foo' }
2843 </programlisting>
2844 </para>
2845 <para>Any data type that can be declared in standard Haskell-98 syntax
2846 can also be declared using GADT-style syntax.
2847 The choice is largely stylistic, but GADT-style declarations differ in one important respect:
2848 they treat class constraints on the data constructors differently.
2849 Specifically, if the constructor is given a type-class context, that
2850 context is made available by pattern matching. For example:
2851 <programlisting>
2852 data Set a where
2853 MkSet :: Eq a => [a] -> Set a
2854
2855 makeSet :: Eq a => [a] -> Set a
2856 makeSet xs = MkSet (nub xs)
2857
2858 insert :: a -> Set a -> Set a
2859 insert a (MkSet as) | a `elem` as = MkSet as
2860 | otherwise = MkSet (a:as)
2861 </programlisting>
2862 A use of <literal>MkSet</literal> as a constructor (e.g. in the definition of <literal>makeSet</literal>)
2863 gives rise to a <literal>(Eq a)</literal>
2864 constraint, as you would expect. The new feature is that pattern-matching on <literal>MkSet</literal>
2865 (as in the definition of <literal>insert</literal>) makes <emphasis>available</emphasis> an <literal>(Eq a)</literal>
2866 context. In implementation terms, the <literal>MkSet</literal> constructor has a hidden field that stores
2867 the <literal>(Eq a)</literal> dictionary that is passed to <literal>MkSet</literal>; so
2868 when pattern-matching that dictionary becomes available for the right-hand side of the match.
2869 In the example, the equality dictionary is used to satisfy the equality constraint
2870 generated by the call to <literal>elem</literal>, so that the type of
2871 <literal>insert</literal> itself has no <literal>Eq</literal> constraint.
2872 </para>
2873 <para>
2874 For example, one possible application is to reify dictionaries:
2875 <programlisting>
2876 data NumInst a where
2877 MkNumInst :: Num a => NumInst a
2878
2879 intInst :: NumInst Int
2880 intInst = MkNumInst
2881
2882 plus :: NumInst a -> a -> a -> a
2883 plus MkNumInst p q = p + q
2884 </programlisting>
2885 Here, a value of type <literal>NumInst a</literal> is equivalent
2886 to an explicit <literal>(Num a)</literal> dictionary.
2887 </para>
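<para>
A small sketch of how such a reified dictionary might be used follows. The helper
<literal>sumWith</literal> is a hypothetical name introduced for this example:
<programlisting>
{-# LANGUAGE GADTs #-}

data NumInst a where
  MkNumInst :: Num a => NumInst a

intInst :: NumInst Int
intInst = MkNumInst

-- Hypothetical helper: there is no Num constraint in the signature;
-- the dictionary is obtained by matching on MkNumInst.
sumWith :: NumInst a -> [a] -> a
sumWith MkNumInst xs = foldr (+) 0 xs
</programlisting>
Here <literal>sumWith intInst [1, 2, 3]</literal> evaluates to <literal>6</literal>.
Both the addition and the literal <literal>0</literal> are resolved using the
dictionary stored in the constructor.
</para>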
2888 <para>
2889 All this applies to constructors declared using the syntax of <xref linkend="existential-with-context"/>.
2890 For example, the <literal>NumInst</literal> data type above could equivalently be declared
2891 like this:
2892 <programlisting>
2893 data NumInst a
2894 = Num a => MkNumInst
2895 </programlisting>
2896 Notice that, unlike the situation when declaring an existential, there is
2897 no <literal>forall</literal>, because the <literal>Num</literal> constrains the
2898 data type's universally quantified type variable <literal>a</literal>.
2899 A constructor may have both universal and existential type variables: for example,
2900 the following two declarations are equivalent:
2901 <programlisting>
2902 data T1 a
2903 = forall b. (Num a, Eq b) => MkT1 a b
2904 data T2 a where
2905 MkT2 :: (Num a, Eq b) => a -> b -> T2 a
2906 </programlisting>
2907 </para>
2908 <para>All this behaviour contrasts with Haskell 98's peculiar treatment of
2909 contexts on a data type declaration (Section 4.2.1 of the Haskell 98 Report).
2910 In Haskell 98 the definition
2911 <programlisting>
2912 data Eq a => Set' a = MkSet' [a]
2913 </programlisting>
2914 gives <literal>MkSet'</literal> the same type as <literal>MkSet</literal> above. But instead of
2915 <emphasis>making available</emphasis> an <literal>(Eq a)</literal> constraint, pattern-matching
2916 on <literal>MkSet'</literal> <emphasis>requires</emphasis> an <literal>(Eq a)</literal> constraint!
2917 GHC faithfully implements this behaviour, odd though it is. But for GADT-style declarations,
2918 GHC's behaviour is much more useful, as well as much more intuitive.
2919 </para>
2920
2921 <para>
2922 The rest of this section gives further details about GADT-style data
2923 type declarations.
2924
2925 <itemizedlist>
2926 <listitem><para>
2927 The result type of each data constructor must begin with the type constructor being defined.
2928 If the result type of all constructors
2929 has the form <literal>T a1 ... an</literal>, where <literal>a1 ... an</literal>
2930 are distinct type variables, then the data type is <emphasis>ordinary</emphasis>;
2931 otherwise it is a <emphasis>generalised</emphasis> data type (<xref linkend="gadt"/>).
2932 </para></listitem>
2933
2934 <listitem><para>
2935 As with other type signatures, you can give a single signature for several data constructors.
2936 In this example we give a single signature for <literal>T1</literal> and <literal>T2</literal>:
2937 <programlisting>
2938 data T a where
2939 T1,T2 :: a -> T a
2940 T3 :: T a
2941 </programlisting>
2942 </para></listitem>
2943
2944 <listitem><para>
2945 The type signature of
2946 each constructor is independent, and is implicitly universally quantified as usual.
2947 In particular, the type variable(s) in the "<literal>data T a where</literal>" header
2948 have no scope, and different constructors may have different universally-quantified type variables:
2949 <programlisting>
2950 data T a where -- The 'a' has no scope
2951 T1,T2 :: b -> T b -- Means forall b. b -> T b
2952 T3 :: T a -- Means forall a. T a
2953 </programlisting>
2954 </para></listitem>
2955
2956 <listitem><para>
2957 A constructor signature may mention type class constraints, which can differ for
2958 different constructors. For example, this is fine:
2959 <programlisting>
2960 data T a where
2961 T1 :: Eq b => b -> b -> T b
2962 T2 :: (Show c, Ix c) => c -> [c] -> T c
2963 </programlisting>
2964 When pattern matching, these constraints are made available to discharge constraints
2965 in the body of the match. For example:
2966 <programlisting>
2967 f :: T a -> String
2968 f (T1 x y) | x==y = "yes"
2969 | otherwise = "no"
2970 f (T2 a b) = show a
2971 </programlisting>
2972 Note that <literal>f</literal> is not overloaded; the <literal>Eq</literal> constraint arising
2973 from the use of <literal>==</literal> is discharged by the pattern match on <literal>T1</literal>
2974 and similarly the <literal>Show</literal> constraint arising from the use of <literal>show</literal>.
2975 </para></listitem>
2976
2977 <listitem><para>
2978 Unlike a Haskell-98-style
2979 data type declaration, the type variable(s) in the "<literal>data Set a where</literal>" header
2980 have no scope. Indeed, one can write a kind signature instead:
2981 <programlisting>
2982 data Set :: * -> * where ...
2983 </programlisting>
2984 or even a mixture of the two:
2985 <programlisting>
2986 data Bar a :: (* -> *) -> * where ...
2987 </programlisting>
2988 The type variables (if given) may be explicitly kinded, so we could also write the header for <literal>Bar</literal>
2989 like this:
2990 <programlisting>
2991 data Bar a (b :: * -> *) where ...
2992 </programlisting>
2993 </para></listitem>
2994
2995
2996 <listitem><para>
2997 You can use strictness annotations, in the obvious places
2998 in the constructor type:
2999 <programlisting>
3000 data Term a where
3001 Lit :: !Int -> Term Int
3002 If :: Term Bool -> !(Term a) -> !(Term a) -> Term a
3003 Pair :: Term a -> Term b -> Term (a,b)
3004 </programlisting>
3005 </para></listitem>
3006
3007 <listitem><para>
3008 You can use a <literal>deriving</literal> clause on a GADT-style data type
3009 declaration. For example, these two declarations are equivalent:
3010 <programlisting>
3011 data Maybe1 a where {
3012 Nothing1 :: Maybe1 a ;
3013 Just1 :: a -> Maybe1 a
3014 } deriving( Eq, Ord )
3015
3016 data Maybe2 a = Nothing2 | Just2 a
3017 deriving( Eq, Ord )
3018 </programlisting>
3019 </para></listitem>
3020
3021 <listitem><para>
3022 The type signature may have quantified type variables that do not appear
3023 in the result type:
3024 <programlisting>
3025 data Foo where
3026 MkFoo :: a -> (a->Bool) -> Foo
3027 Nil :: Foo
3028 </programlisting>
3029 Here the type variable <literal>a</literal> does not appear in the result type
3030 of either constructor.
3031 Although it is universally quantified in the type of the constructor, such
3032 a type variable is often called "existential".
3033 Indeed, the above declaration declares precisely the same type as
3034 the <literal>data Foo</literal> in <xref linkend="existential-quantification"/>.
3035 </para><para>
3036 The type may contain a class context too, of course:
3037 <programlisting>
3038 data Showable where
3039 MkShowable :: Show a => a -> Showable
3040 </programlisting>
3041 </para></listitem>
3042
3043 <listitem><para>
3044 You can use record syntax on a GADT-style data type declaration:
3045
3046 <programlisting>
3047 data Person where
3048 Adult :: { name :: String, children :: [Person] } -> Person
3049 Child :: Show a => { name :: !String, funny :: a } -> Person
3050 </programlisting>
3051 As usual, for every constructor that has a field <literal>f</literal>, the type of
3052 field <literal>f</literal> must be the same (modulo alpha conversion).
3053 The <literal>Child</literal> constructor above shows that the signature
3054 may have a context, existentially-quantified variables, and strictness annotations,
3055 just as in the non-record case. (NB: the "type" that follows the double-colon
3056 is not really a type, because of the record syntax and strictness annotations.
3057 A "type" of this form can appear only in a constructor signature.)
3058 </para></listitem>
3059
3060 <listitem><para>
3061 Record updates are allowed with GADT-style declarations,
3062 but only for fields that have the following property: the type of the field
3063 mentions no existential type variables.
3064 </para></listitem>
3065
3066 <listitem><para>
3067 As in the case of existentials declared using the Haskell-98-like record syntax
3068 (<xref linkend="existential-records"/>),
3069 record-selector functions are generated only for those fields that have well-typed
3070 selectors.
3071 Here is the example of that section, in GADT-style syntax:
3072 <programlisting>
3073 data Counter a where
3074 NewCounter :: { _this :: self
3075 , _inc :: self -> self
3076 , _display :: self -> IO ()
3077 , tag :: a
3078 } -> Counter a
3079 </programlisting>
3080 As before, only one selector function is generated here, that for <literal>tag</literal>.
3081 Nevertheless, you can still use all the field names in pattern matching and record construction.
3082 </para></listitem>
3083
3084 <listitem><para>
3085 In a GADT-style data type declaration there is no obvious way to specify that a data constructor
3086 should be infix, which makes a difference if you derive <literal>Show</literal> for the type.
3087 (Data constructors declared infix are displayed infix by the derived <literal>show</literal>.)
3088 So GHC implements the following design: a data constructor declared in a GADT-style data type
3089 declaration is displayed infix by <literal>Show</literal> iff (a) it is an operator symbol,
3090 (b) it has two arguments, (c) it has a programmer-supplied fixity declaration. For example
3091 <programlisting>
3092 infix 6 :--:
3093 data T a where
3094 (:--:) :: Int -> Bool -> T Int
3095 </programlisting>
3096 </para></listitem>
3097 </itemizedlist></para>
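<para>
To tie the record-syntax points together, here is a sketch built on the
<literal>Person</literal> declaration from the list above. The names
<literal>describe</literal> and <literal>family</literal> are illustrative
assumptions, not part of GHC:
<programlisting>
{-# LANGUAGE GADTs #-}

data Person where
  Adult :: { name :: String, children :: [Person] } -> Person
  Child :: Show a => { name :: !String, funny :: a } -> Person

-- Hypothetical helper using record patterns. The Show constraint
-- stored in Child is available on the right-hand side.
describe :: Person -> String
describe Adult { name = n, children = cs } =
  n ++ " has " ++ show (length cs) ++ " child(ren)"
describe Child { name = n, funny = x } =
  n ++ " finds " ++ show x ++ " funny"

-- Field names may be used for construction, including 'funny',
-- even though no selector function is generated for it.
family :: Person
family = Adult { name = "Ann", children = [Child { name = "Bob", funny = 'x' }] }
</programlisting>
The selector <literal>name</literal> works for both constructors, so
<literal>name family</literal> is <literal>"Ann"</literal>, while
<literal>funny</literal> can be reached only by pattern matching.
</para>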
3098 </sect2>
3099
3100 <sect2 id="gadt">
3101 <title>Generalised Algebraic Data Types (GADTs)</title>
3102
3103 <para>Generalised Algebraic Data Types generalise ordinary algebraic data types
3104 by allowing constructors to have richer return types. Here is an example:
3105 <programlisting>
3106 data Term a where
3107 Lit :: Int -> Term Int
3108 Succ :: Term Int -> Term Int
3109 IsZero :: Term Int -> Term Bool
3110 If :: Term Bool -> Term a -> Term a -> Term a
3111 Pair :: Term a -> Term b -> Term (a,b)
3112 </programlisting>
3113 Notice that the return type of the constructors is not always <literal>Term a</literal>, as is the
3114 case with ordinary data types. This generality allows us to
3115 write a well-typed <literal>eval</literal> function
3116 for these <literal>Terms</literal>:
3117 <programlisting>
3118 eval :: Term a -> a
3119 eval (Lit i) = i
3120 eval (Succ t) = 1 + eval t
3121 eval (IsZero t) = eval t == 0
3122 eval (If b e1 e2) = if eval b then eval e1 else eval e2
3123 eval (Pair e1 e2) = (eval e1, eval e2)
3124 </programlisting>
3125 The key point about GADTs is that <emphasis>pattern matching causes type refinement</emphasis>.
3126 For example, in the right hand side of the equation
3127 <programlisting>
3128 eval :: Term a -> a
3129 eval (Lit i) = ...
3130 </programlisting>
3131 the type <literal>a</literal> is refined to <literal>Int</literal>. That's the whole point!
3132 A precise specification of the type rules is beyond what this user manual aspires to,
3133 but the design closely follows that described in
3134 the paper <ulink
3135 url="http://research.microsoft.com/%7Esimonpj/papers/gadt/">Simple
3136 unification-based type inference for GADTs</ulink>,
3137 (ICFP 2006).
3138 The general principle is this: <emphasis>type refinement is only carried out
3139 based on user-supplied type annotations</emphasis>.
3140 So if no type signature is supplied for <literal>eval</literal>, no type refinement happens,
3141 and lots of obscure error messages will
3142 occur. However, the refinement is quite general. For example, if we had:
3143 <programlisting>
3144 eval :: Term a -> a -> a
3145 eval (Lit i) j = i+j
3146 </programlisting>
3147 the pattern match causes the type <literal>a</literal> to be refined to <literal>Int</literal> (because of the type
3148 of the constructor <literal>Lit</literal>), and that refinement also applies to the type of <literal>j</literal>, and
3149 to the result type of the equation. Hence the addition <literal>i+j</literal> is legal.
3150 </para>
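<para>
For readers who want to try this out, the declarations above can be assembled
into one module. The value <literal>example</literal> is a made-up term used
only to exercise <literal>eval</literal>:
<programlisting>
{-# LANGUAGE GADTs #-}

data Term a where
  Lit    :: Int -> Term Int
  Succ   :: Term Int -> Term Int
  IsZero :: Term Int -> Term Bool
  If     :: Term Bool -> Term a -> Term a -> Term a
  Pair   :: Term a -> Term b -> Term (a,b)

-- The signature is required: refinement relies on it.
eval :: Term a -> a
eval (Lit i)      = i
eval (Succ t)     = 1 + eval t
eval (IsZero t)   = eval t == 0
eval (If b e1 e2) = if eval b then eval e1 else eval e2
eval (Pair e1 e2) = (eval e1, eval e2)

-- Illustrative term; eval example evaluates to (2, False).
example :: Term (Int, Bool)
example = Pair (If (IsZero (Lit 0)) (Succ (Lit 1)) (Lit 5))
               (IsZero (Succ (Lit 0)))
</programlisting>
The result type of <literal>eval example</literal> is determined entirely by the
index of the term, so no run-time tag checks or casts are needed.
</para>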
3151 <para>
3152 These and many other examples are given in papers by Hongwei Xi, and
3153 Tim Sheard. There is a longer introduction
3154 <ulink url="http://www.haskell.org/haskellwiki/GADT">on the wiki</ulink>,
3155 and Ralf Hinze's
3156 <ulink url="http://www.informatik.uni-bonn.de/~ralf/publications/With.pdf">Fun with phantom types</ulink> also has a number of examples. Note that papers
3157 may use different notation to that implemented in GHC.
3158 </para>
3159 <para>
3160 The rest of this section outlines the extensions to GHC that support GADTs. The extension is enabled with
3161 <option>-XGADTs</option>. The <option>-XGADTs</option> flag also sets <option>-XRelaxedPolyRec</option>.
3162 <itemizedlist>
3163 <listitem><para>
3164 A GADT can only be declared using GADT-style syntax (<xref linkend="gadt-style"/>);
3165 the old Haskell-98 syntax for data declarations always declares an ordinary data type.
3166 The result type of each constructor must begin with the type constructor being defined,
3167 but for a GADT the arguments to the type constructor can be arbitrary monotypes.
3168 For example, in the <literal>Term</literal> data
3169 type above, the type of each constructor must end with <literal>Term ty</literal>, but
3170 the <literal>ty</literal> need not be a type variable (e.g. the <literal>Lit</literal>
3171 constructor).
3172 </para></listitem>
3173
3174 <listitem><para>
3175 It is permitted to declare an ordinary algebraic data type using GADT-style syntax.
3176 What makes a GADT into a GADT is not the syntax, but rather the presence of data constructors
3177 whose result type is not just <literal>T a b</literal>.
3178 </para></listitem>
3179
3180 <listitem><para>
3181 You cannot use a <literal>deriving</literal> clause for a GADT; only for
3182 an ordinary data type.
3183 </para></listitem>
3184
3185 <listitem><para>
3186 As mentioned in <xref linkend="gadt-style"/>, record syntax is supported.
3187 For example:
3188 <programlisting>
3189 data Term a where
3190 Lit :: { val :: Int } -> Term Int
3191 Succ :: { num :: Term Int } -> Term Int
3192 Pred :: { num :: Term Int } -> Term Int
3193 IsZero :: { arg :: Term Int } -> Term Bool
3194 Pair :: { arg1 :: Term a
3195 , arg2 :: Term b
3196 } -> Term (a,b)
3197 If :: { cnd :: Term Bool
3198 , tru :: Term a
3199 , fls :: Term a
3200 } -> Term a
3201 </programlisting>
3202 However, for GADTs there is the following additional constraint:
3203 every constructor that has a field <literal>f</literal> must have
3204 the same result type (modulo alpha conversion).
3205 Hence, in the above example, we cannot merge the <literal>num</literal>
3206 and <literal>arg</literal> fields above into a
3207 single name. Although their field types are both <literal>Term Int</literal>,
3208 their selector functions actually have different types:
3209
3210 <programlisting>
3211 num :: Term Int -> Term Int
3212 arg :: Term Bool -> Term Int
3213 </programlisting>
3214 </para></listitem>
3215
3216 <listitem><para>
3217 When pattern-matching against data constructors drawn from a GADT,
3218 for example in a <literal>case</literal> expression, the following rules apply:
3219 <itemizedlist>
3220 <listitem><para>The type of the scrutinee must be rigid.</para></listitem>
3221 <listitem><para>The type of the entire <literal>case</literal> expression must be rigid.</para></listitem>
3222 <listitem><para>The type of any free variable mentioned in any of
3223 the <literal>case</literal> alternatives must be rigid.</para></listitem>
3224 </itemizedlist>
3225 A type is "rigid" if it is completely known to the compiler at its binding site. The easiest
3226 way to ensure that a variable has a rigid type is to give it a type signature.
3227 For more precise details see <ulink url="http://research.microsoft.com/%7Esimonpj/papers/gadt">
3228 Simple unification-based type inference for GADTs
3229 </ulink>. The criteria implemented by GHC are given in the Appendix.
3230
3231 </para></listitem>
3232
3233 </itemizedlist>
3234 </para>
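<para>
The rigidity rules for pattern matching can be illustrated with a small sketch.
The type <literal>Rep</literal> and the function <literal>defaultOf</literal> are
hypothetical names introduced for this example:
<programlisting>
{-# LANGUAGE GADTs #-}

-- Hypothetical GADT of type representations.
data Rep a where
  RInt  :: Rep Int
  RBool :: Rep Bool

-- The signature makes both the scrutinee type (Rep a) and the
-- result type (a) rigid, so each alternative can be refined.
defaultOf :: Rep a -> a
defaultOf r = case r of
  RInt  -> 0        -- here a is refined to Int
  RBool -> False    -- here a is refined to Bool
</programlisting>
If the signature for <literal>defaultOf</literal> is removed, neither type is
rigid, and the two alternatives can be expected to be rejected because their
result types differ.
</para>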
3235
3236 </sect2>
3237 </sect1>
3238
3239 <!-- ====================== End of Generalised algebraic data types ======================= -->
3240
3241 <sect1 id="deriving">
3242 <title>Extensions to the "deriving" mechanism</title>
3243
3244 <sect2 id="deriving-inferred">
3245 <title>Inferred context for deriving clauses</title>
3246
3247 <para>
3248 The Haskell Report is vague about exactly when a <literal>deriving</literal> clause is
3249 legal. For example:
3250 <programlisting>
3251 data T0 f a = MkT0 a deriving( Eq )
3252 data T1 f a = MkT1 (f a) deriving( Eq )
3253 data T2 f a = MkT2 (f (f a)) deriving( Eq )
3254 </programlisting>
3255 The natural generated <literal>Eq</literal> code would result in these instance declarations:
3256 <programlisting>
3257 instance Eq a => Eq (T0 f a) where ...
3258 instance Eq (f a) => Eq (T1 f a) where ...
3259 instance Eq (f (f a)) => Eq (T2 f a) where ...
3260 </programlisting>
3261 The first of these is obviously fine. The second is still fine, although less obviously.
3262 The third is not Haskell 98, and risks losing termination of instances.
3263 </para>
3264 <para>
3265 GHC takes a conservative position: it accepts the first two, but not the third. The rule is this:
3266 each constraint in the inferred instance context must consist only of type variables,
3267 with no repetitions.
3268 </para>
3269 <para>
3270 This rule is applied regardless of flags. If you want a more exotic context, you can write
3271 it yourself, using the <link linkend="stand-alone-deriving">standalone deriving mechanism</link>.
3272 </para>
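<para>
As a sketch of that escape route, the rejected <literal>T2</literal> example can
be given its context explicitly. The set of language flags below is an
assumption about what the instance context requires; in particular,
<literal>UndecidableInstances</literal> is used because the constraint is no
smaller than the instance head:
<programlisting>
{-# LANGUAGE StandaloneDeriving, FlexibleContexts, UndecidableInstances #-}

data T2 f a = MkT2 (f (f a))

-- The context is written by hand instead of being inferred.
deriving instance Eq (f (f a)) => Eq (T2 f a)
</programlisting>
With <literal>f</literal> instantiated to <literal>Maybe</literal>, for example,
<literal>MkT2 (Just (Just 'a')) == MkT2 (Just (Just 'a'))</literal> evaluates to
<literal>True</literal>.
</para>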
3273 </sect2>
3274
3275 <sect2 id="stand-alone-deriving">
3276 <title>Stand-alone deriving declarations</title>
3277
3278 <para>
3279 GHC now allows stand-alone <literal>deriving</literal> declarations, enabled by <literal>-XStandaloneDeriving</literal>:
3280 <programlisting>
3281 data Foo a = Bar a | Baz String
3282
3283 deriving instance Eq a => Eq (Foo a)
3284 </programlisting>
3285 The syntax is identical to that of an ordinary instance declaration apart from (a) the keyword
3286 <literal>deriving</literal>, and (b) the absence of the <literal>where</literal> part.
3287 Note the following points:
3288 <itemizedlist>
3289 <listitem><para>
3290 You must supply an explicit context (in the example the context is <literal>(Eq a)</literal>),
3291 exactly as you would in an ordinary instance declaration.
3292 (In contrast, in a <literal>deriving</literal> clause
3293 attached to a data type declaration, the context is inferred.)
3294 </para></listitem>
3295
3296 <listitem><para>
3297 A <literal>deriving instance</literal> declaration
3298 must obey the same rules concerning form and termination as ordinary instance declarations,
3299 controlled by the same flags; see <xref linkend="instance-decls"/>.
3300 </para></listitem>
3301
3302 <listitem><para>
3303 Unlike a <literal>deriving</literal>
3304 declaration attached to a <literal>data</literal> declaration, the instance can be more specific
3305 than the data type (assuming you also use
3306 <literal>-XFlexibleInstances</literal>, <xref linkend="instance-rules"/>). Consider
3307 for example
3308 <programlisting>
3309 data Foo a = Bar a | Baz String
3310
3311 deriving instance Eq a => Eq (Foo [a])
3312 deriving instance Eq a => Eq (Foo (Maybe a))
3313 </programlisting>
3314 This will generate a derived instance for <literal>(Foo [a])</literal> and <literal>(Foo (Maybe a))</literal>,
3315 but other types such as <literal>(Foo (Int,Bool))</literal> will not be an instance of <literal>Eq</literal>.
3316 </para></listitem>
3317
3318 <listitem><para>
3319 Unlike a <literal>deriving</literal>
3320 declaration attached to a <literal>data</literal> declaration,
3321 GHC does not restrict the form of the data type. Instead, GHC simply generates the appropriate
3322 boilerplate code for the specified class, and typechecks it. If there is a type error, it is
3323 your problem. (GHC will show you the offending code if it has a type error.)
3324 The merit of this is that you can derive instances for GADTs and other exotic
3325 data types, providing only that the boilerplate code does indeed typecheck. For example:
3326 <programlisting>
data T a where
   T1 :: T Int
   T2 :: T Bool
3330
3331 deriving instance Show (T a)
3332 </programlisting>
3333 In this example, you cannot say <literal>... deriving( Show )</literal> on the
3334 data type declaration for <literal>T</literal>,
3335 because <literal>T</literal> is a GADT, but you <emphasis>can</emphasis> generate
3336 the instance declaration using stand-alone deriving.
3337 </para>
3338 </listitem>
3339
3340 <listitem>
3341 <para>The stand-alone syntax is generalised for newtypes in exactly the same
3342 way that ordinary <literal>deriving</literal> clauses are generalised (<xref linkend="newtype-deriving"/>).
3343 For example:
3344 <programlisting>
3345 newtype Foo a = MkFoo (State Int a)
3346
3347 deriving instance MonadState Int Foo
3348 </programlisting>
3349 GHC always treats the <emphasis>last</emphasis> parameter of the instance
3350 (<literal>Foo</literal> in this example) as the type whose instance is being derived.
3351 </para></listitem>
3352 </itemizedlist></para>
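<para>
The points above can be sketched as one small module. This is only an
illustration: the values <literal>sameBars</literal> and <literal>shownT</literal>
are names invented here, and <literal>Foo</literal> is given just one of the two
specialised instances.
<programlisting>
{-# LANGUAGE StandaloneDeriving, FlexibleInstances, GADTs #-}

data Foo a = Bar a | Baz String

-- More specific than the data type: only Foo [a] gets an Eq instance.
deriving instance Eq a => Eq (Foo [a])

data T a where
  T1 :: T Int
  T2 :: T Bool

-- Not expressible as a deriving clause on T, because T is a GADT.
deriving instance Show (T a)

-- Uses the instance Eq (Foo [Char]).
sameBars :: Bool
sameBars = Bar "abc" == Bar "abc"

shownT :: [String]
shownT = [show T1, show T2]
</programlisting>
An expression such as <literal>Bar True == Bar True</literal> would be rejected here,
because no <literal>Eq (Foo Bool)</literal> instance has been derived.
</para>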
3353
3354 </sect2>
3355
3356
3357 <sect2 id="deriving-typeable">
3358 <title>Deriving clause for extra classes (<literal>Typeable</literal>, <literal>Data</literal>, etc)</title>
3359
3360 <para>
3361 Haskell 98 allows the programmer to add "<literal>deriving( Eq, Ord )</literal>" to a data type
3362 declaration, to generate a standard instance declaration for classes specified in the <literal>deriving</literal> clause.
3363 In Haskell 98, the only classes that may appear in the <literal>deriving</literal> clause are the standard
3364 classes <literal>Eq</literal>, <literal>Ord</literal>,
3365 <literal>Enum</literal>, <literal>Ix</literal>, <literal>Bounded</literal>, <literal>Read</literal>, and <literal>Show</literal>.
3366 </para>
3367 <para>
3368 GHC extends this list with several more classes that may be automatically derived:
3369 <itemizedlist>
<listitem><para> With <option>-XDeriveDataTypeable</option>, you can derive instances of the classes
<literal>Typeable</literal> and <literal>Data</literal>, defined in the library
modules <literal>Data.Typeable</literal> and <literal>Data.Data</literal> respectively.
3373 </para>
3374 <para>Since GHC 7.8.1, <literal>Typeable</literal> is kind-polymorphic (see
3375 <xref linkend="kind-polymorphism"/>) and can be derived for any datatype and
3376 type class. Instances for datatypes can be derived by attaching a
3377 <literal>deriving Typeable</literal> clause to the datatype declaration, or by
3378 using standalone deriving (see <xref linkend="stand-alone-deriving"/>).
3379 Instances for type classes can only be derived using standalone deriving.
3380 See also <xref linkend="auto-derive-typeable"/>.
3381 </para>
3382 <para>
Also since GHC 7.8.1, handwritten (i.e. not derived) instances of
<literal>Typeable</literal> are forbidden, and will be ignored with a warning.
3385 </para>
3386 </listitem>
3387
3388 <listitem><para> With <option>-XDeriveGeneric</option>, you can derive
3389 instances of the classes <literal>Generic</literal> and
3390 <literal>Generic1</literal>, defined in <literal>GHC.Generics</literal>.
3391 You can use these to define generic functions,
3392 as described in <xref linkend="generic-programming"/>.
3393 </para></listitem>
3394
3395 <listitem><para> With <option>-XDeriveFunctor</option>, you can derive instances of
3396 the class <literal>Functor</literal>,
3397 defined in <literal>GHC.Base</literal>.
3398 </para></listitem>
3399
3400 <listitem><para> With <option>-XDeriveFoldable</option>, you can derive instances of
3401 the class <literal>Foldable</literal>,
3402 defined in <literal>Data.Foldable</literal>.
3403 </para></listitem>
3404
3405 <listitem><para> With <option>-XDeriveTraversable</option>, you can derive instances of
3406 the class <literal>Traversable</literal>,
3407 defined in <literal>Data.Traversable</literal>.
3408 </para></listitem>
3409 </itemizedlist>
3410 In each case the appropriate class must be in scope before it
3411 can be mentioned in the <literal>deriving</literal> clause.
3412 </para>
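<para>
As a rough sketch of how these flags combine, the following module derives all of
the extra classes for one type. The type <literal>Tree</literal> and the values
<literal>sample</literal> and <literal>doubled</literal> are invented for this
illustration. The explicit imports bring each class into scope, as required above;
on compilers whose Prelude already exports <literal>Foldable</literal> and
<literal>Traversable</literal> some of them are redundant but harmless.
<programlisting>
{-# LANGUAGE DeriveDataTypeable, DeriveGeneric, DeriveFunctor,
             DeriveFoldable, DeriveTraversable #-}

import Data.Data (Data)
import Data.Typeable (Typeable)
import Data.Foldable (Foldable, toList)
import Data.Traversable (Traversable)
import GHC.Generics (Generic)

data Tree a = Leaf | Node (Tree a) a (Tree a)
  deriving (Show, Functor, Foldable, Traversable, Typeable, Data, Generic)

sample :: Tree Int
sample = Node (Node Leaf 1 Leaf) 2 (Node Leaf 3 Leaf)

-- The derived Functor maps over every element; the derived Foldable
-- visits the constructor fields left to right.
doubled :: [Int]
doubled = toList (fmap (* 2) sample)
</programlisting>
</para>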
3413 </sect2>
3414
3415 <sect2 id="auto-derive-typeable">
3416 <title>Automatically deriving <literal>Typeable</literal> instances</title>
3417
3418 <para>
The flag <option>-XAutoDeriveTypeable</option> triggers the generation
of derived <literal>Typeable</literal> instances for every datatype and type
class declaration in the module in which it is used. This flag implies
3422 <option>-XDeriveDataTypeable</option> (<xref linkend="deriving-typeable"/>).
3423 </para>
3424
3425 </sect2>
3426
3427 <sect2 id="newtype-deriving">
3428 <title>Generalised derived instances for newtypes</title>
3429
3430 <para>
3431 When you define an abstract type using <literal>newtype</literal>, you may want
3432 the new type to inherit some instances from its representation. In
3433 Haskell 98, you can inherit instances of <literal>Eq</literal>, <literal>Ord</literal>,
3434 <literal>Enum</literal> and <literal>Bounded</literal> by deriving them, but for any
3435 other classes you have to write an explicit instance declaration. For
3436 example, if you define
3437
3438 <programlisting>
3439 newtype Dollars = Dollars Int
3440 </programlisting>
3441
3442 and you want to use arithmetic on <literal>Dollars</literal>, you have to
3443 explicitly define an instance of <literal>Num</literal>:
3444
3445 <programlisting>
instance Num Dollars where
  Dollars a + Dollars b = Dollars (a+b)
  ...
3449 </programlisting>
3450 All the instance does is apply and remove the <literal>newtype</literal>
3451 constructor. It is particularly galling that, since the constructor
3452 doesn't appear at run-time, this instance declaration defines a
3453 dictionary which is <emphasis>wholly equivalent</emphasis> to the <literal>Int</literal>
3454 dictionary, only slower!
3455 </para>
3456
3457
3458 <sect3> <title> Generalising the deriving clause </title>
3459 <para>
3460 GHC now permits such instances to be derived instead,
3461 using the flag <option>-XGeneralizedNewtypeDeriving</option>,
3462 so one can write
3463 <programlisting>
3464 newtype Dollars = Dollars Int deriving (Eq,Show,Num)
3465 </programlisting>
3466
3467 and the implementation uses the <emphasis>same</emphasis> <literal>Num</literal> dictionary
3468 for <literal>Dollars</literal> as for <literal>Int</literal>. Notionally, the compiler
3469 derives an instance declaration of the form
3470
3471 <programlisting>
3472 instance Num Int => Num Dollars
3473 </programlisting>
3474
3475 which just adds or removes the <literal>newtype</literal> constructor according to the type.
3476 </para>
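<para>
A minimal, self-contained version of the <literal>Dollars</literal> example might look
as follows; the value <literal>total</literal> is a name invented here to exercise the
derived instance.
<programlisting>
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

newtype Dollars = Dollars Int
  deriving (Eq, Show, Num)

-- The literal 2 is converted with the fromInteger inherited from Int,
-- and (+) and (*) are the Int operations.
total :: Dollars
total = Dollars 3 + Dollars 4 * 2
</programlisting>
Note that <literal>Show</literal> is still derived in the usual way, so
<literal>show total</literal> mentions the <literal>Dollars</literal> constructor.
</para>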
3477 <para>
3478
3479 We can also derive instances of constructor classes in a similar
3480 way. For example, suppose we have implemented state and failure monad
3481 transformers, such that
3482
3483 <programlisting>
3484 instance Monad m => Monad (State s m)
3485 instance Monad m => Monad (Failure m)
3486 </programlisting>
3487 In Haskell 98, we can define a parsing monad by
3488 <programlisting>
3489 type Parser tok m a = State [tok] (Failure m) a
3490 </programlisting>
3491
3492 which is automatically a monad thanks to the instance declarations
3493 above. With the extension, we can make the parser type abstract,
3494 without needing to write an instance of class <literal>Monad</literal>, via
3495
3496 <programlisting>
newtype Parser tok m a = Parser (State [tok] (Failure m) a)
                         deriving Monad
3499 </programlisting>
3500 In this case the derived instance declaration is of the form
3501 <programlisting>
3502 instance Monad (State [tok] (Failure m)) => Monad (Parser tok m)
3503 </programlisting>
3504
3505 Notice that, since <literal>Monad</literal> is a constructor class, the
3506 instance is a <emphasis>partial application</emphasis> of the new type, not the
3507 entire left hand side. We can imagine that the type declaration is
3508 "eta-converted" to generate the context of the instance
3509 declaration.
3510 </para>
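<para>
The same idea can be tried without any monad transformer library. The sketch below
wraps <literal>Maybe</literal> rather than the <literal>State</literal> and
<literal>Failure</literal> transformers assumed above; <literal>Opt</literal>,
<literal>safeDiv</literal>, <literal>calc</literal> and <literal>failed</literal> are
names invented for the illustration. On compilers where <literal>Applicative</literal>
is a superclass of <literal>Monad</literal>, <literal>Functor</literal> and
<literal>Applicative</literal> have to be derived as well, as is done here.
<programlisting>
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Applicative (Applicative)

-- The instances are for the partial application Opt, obtained by
-- "eta-converting" the declaration: Monad Maybe => Monad Opt.
newtype Opt a = Opt (Maybe a)
  deriving (Show, Functor, Applicative, Monad)

safeDiv :: Int -> Int -> Opt Int
safeDiv _ 0 = Opt Nothing
safeDiv x y = Opt (Just (x `div` y))

calc :: Opt Int
calc = safeDiv 100 5 >>= \q -> safeDiv q 2

failed :: Opt Int
failed = safeDiv 1 0 >>= \q -> safeDiv q 2
</programlisting>
</para>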
3511 <para>
3512
3513 We can even derive instances of multi-parameter classes, provided the
newtype is the last class parameter. In this case, a "partial
application" of the class appears in the <literal>deriving</literal>
3516 clause. For example, given the class
3517
3518 <programlisting>
3519 class StateMonad s m | m -> s where ...
3520 instance Monad m => StateMonad s (State s m) where ...
3521 </programlisting>
3522 then we can derive an instance of <literal>StateMonad</literal> for <literal>Parser</literal>s by
3523 <programlisting>
newtype Parser tok m a = Parser (State [tok] (Failure m) a)
                         deriving (Monad, StateMonad [tok])
3526 </programlisting>
3527
3528 The derived instance is obtained by completing the application of the
3529 class to the new type:
3530
3531 <programlisting>
instance StateMonad [tok] (State [tok] (Failure m)) =>
         StateMonad [tok] (Parser tok m)
3534 </programlisting>
3535 </para>
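<para>
A smaller multi-parameter example, assuming nothing beyond the Prelude, is sketched
below. The class <literal>Container</literal>, its method <literal>single</literal>,
and the type <literal>Stack</literal> are hypothetical names chosen for this
illustration; the point is only that the newtype fills the <emphasis>last</emphasis>
class parameter.
<programlisting>
{-# LANGUAGE GeneralizedNewtypeDeriving, MultiParamTypeClasses,
             FlexibleInstances #-}

-- Hypothetical two-parameter class; the container type comes last.
class Container e c where
  single :: e -> c

instance Container e [e] where
  single x = [x]

-- The deriving clause names the partial application (Container Int);
-- notionally: instance Container Int [Int] => Container Int Stack
newtype Stack = Stack [Int]
  deriving (Show, Container Int)

one :: Stack
one = single (7 :: Int)
</programlisting>
</para>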
3536 <para>
3537
3538 As a result of this extension, all derived instances in newtype
3539 declarations are treated uniformly (and implemented just by reusing
3540 the dictionary for the representation type), <emphasis>except</emphasis>
3541 <literal>Show</literal> and <literal>Read</literal>, which really behave differently for
3542 the newtype and its representation.
3543 </para>
3544 </sect3>
3545
3546 <sect3> <title> A more precise specification </title>
3547 <para>
3548 Derived instance declarations are constructed as follows. Consider the
3549 declaration (after expansion of any type synonyms)
3550
3551 <programlisting>
3552 newtype T v1...vn = T' (t vk+1...vn) deriving (c1...cm)
3553 </programlisting>
3554
3555 where
3556 <itemizedlist>
3557 <listitem><para>
3558 The <literal>ci</literal> are partial applications of
3559 classes of the form <literal>C t1'...tj'</literal>, where the arity of <literal>C</literal>
3560 is exactly <literal>j+1</literal>. That is, <literal>C</literal> lacks exactly one type argument.
3561 </para></listitem>
3562 <listitem><para>
3563 The <literal>k</literal> is chosen so that <literal>ci (T v1...vk)</literal> is well-kinded.
3564 </para></listitem>
3565 <listitem><para>
3566 The type <literal>t</literal> is an arbitrary type.
3567 </para></listitem>
3568 <listitem><para>
3569 The type variables <literal>vk+1...vn</literal> do not occur in <literal>t</literal>,
3570 nor in the <literal>ci</literal>, and
3571 </para></listitem>
3572 <listitem><para>
3573 None of the <literal>ci</literal> is <literal>Read</literal>, <literal>Show</literal>,
3574 <literal>Typeable</literal>, or <literal>Data</literal>. These classes
3575 should not "look through" the type or its constructor. You can still
3576 derive these classes for a newtype, but it happens in the usual way, not
3577 via this new mechanism.
3578 </para></listitem>
3579 </itemizedlist>
3580 Then, for each <literal>ci</literal>, the derived instance
3581 declaration is:
3582 <programlisting>
3583 instance ci t => ci (T v1...vk)
3584 </programlisting>
3585 As an example which does <emphasis>not</emphasis> work, consider
3586 <programlisting>
3587 newtype NonMonad m s = NonMonad (State s m s) deriving Monad
3588 </programlisting>
3589 Here we cannot derive the instance
3590 <programlisting>
3591 instance Monad (State s m) => Monad (NonMonad m)
3592 </programlisting>
3593
3594 because the type variable <literal>s</literal> occurs in <literal>State s m</literal>,
3595 and so cannot be "eta-converted" away. It is a good thing that this
3596 <literal>deriving</literal> clause is rejected, because <literal>NonMonad m</literal> is
3597 not, in fact, a monad --- for the same reason. Try defining
3598 <literal>>>=</literal> with the correct type: you won't be able to.
3599 </para>
3600 <para>
3601
3602 Notice also that the <emphasis>order</emphasis> of class parameters becomes
3603 important, since we can only derive instances for the last one. If the
3604 <literal>StateMonad</literal> class above were instead defined as
3605
3606 <programlisting>
3607 class StateMonad m s | m -> s where ...
3608 </programlisting>
3609
3610 then we would not have been able to derive an instance for the
3611 <literal>Parser</literal> type above. We hypothesise that multi-parameter
3612 classes usually have one "main" parameter for which deriving new
3613 instances is most interesting.
3614 </para>
3615 <para>Lastly, all of this applies only for classes other than
3616 <literal>Read</literal>, <literal>Show</literal>, <literal>Typeable</literal>,
3617 and <literal>Data</literal>, for which the built-in derivation applies (section
3618 4.3.3. of the Haskell Report).
3619 (For the standard classes <literal>Eq</literal>, <literal>Ord</literal>,
3620 <literal>Ix</literal>, and <literal>Bounded</literal> it is immaterial whether
3621 the standard method is used or the one described here.)
3622 </para>
3623 </sect3>
3624 </sect2>
3625 </sect1>
3626
3627
3628 <!-- TYPE SYSTEM EXTENSIONS -->
3629 <sect1 id="type-class-extensions">
<title>Class and instance declarations</title>
3631
3632 <sect2 id="multi-param-type-classes">
3633 <title>Class declarations</title>
3634
3635 <para>
This section, and the next one, document GHC's type-class extensions.
3637 There's lots of background in the paper <ulink
3638 url="http://research.microsoft.com/~simonpj/Papers/type-class-design-space/">Type
3639 classes: exploring the design space</ulink> (Simon Peyton Jones, Mark
3640 Jones, Erik Meijer).
3641 </para>
3642
3643 <sect3>
3644 <title>Multi-parameter type classes</title>
3645 <para>
3646 Multi-parameter type classes are permitted, with flag <option>-XMultiParamTypeClasses</option>.
3647 For example:
3648
3649
3650 <programlisting>
class Collection c a where
    union :: c a -> c a -> c a
    ...etc.
3654 </programlisting>
3655
3656 </para>
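<para>
To make the class above concrete, here is one possible instance, for lists. It is a
sketch only: the instance and the value <literal>merged</literal> are not part of any
library, and the instance head additionally needs <option>-XFlexibleInstances</option>.
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

class Collection c a where
  union :: c a -> c a -> c a

-- Lists as collections of any element type that supports equality.
instance Eq a => Collection [] a where
  union xs ys = xs ++ filter (`notElem` xs) ys

merged :: [Int]
merged = union [1, 2, 3] [3, 4]
</programlisting>
</para>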
3657 </sect3>
3658
3659 <sect3 id="superclass-rules">
3660 <title>The superclasses of a class declaration</title>
3661
3662 <para>
3663 In Haskell 98 the context of a class declaration (which introduces superclasses)
3664 must be simple; that is, each predicate must consist of a class applied to
3665 type variables. The flag <option>-XFlexibleContexts</option>
3666 (<xref linkend="flexible-contexts"/>)
3667 lifts this restriction,
3668 so that the only restriction on the context in a class declaration is
3669 that the class hierarchy must be acyclic. So these class declarations are OK:
3670
3671
3672 <programlisting>
class Functor (m k) => FiniteMap m k where
  ...

class (Monad m, Monad (t m)) => Transform t m where
  lift :: m a -> (t m) a
3678 </programlisting>
3679
3680
3681 </para>
3682 <para>
As in Haskell 98, the class hierarchy must be acyclic. However, the definition
3684 of "acyclic" involves only the superclass relationships. For example,
3685 this is OK:
3686
3687
3688 <programlisting>
3689 class C a where {
3690 op :: D b => a -> b -> b
3691 }
3692
3693 class C a => D a where { ... }
3694 </programlisting>
3695
3696
3697 Here, <literal>C</literal> is a superclass of <literal>D</literal>, but it's OK for a
3698 class operation <literal>op</literal> of <literal>C</literal> to mention <literal>D</literal>. (It
3699 would not be OK for <literal>D</literal> to be a superclass of <literal>C</literal>.)
3700 </para>
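<para>
A complete version of this pair of classes, with one instance of each, might look as
follows. The method <literal>twice</literal>, the two <literal>Int</literal> instances
and the value <literal>answer</literal> are assumptions made for the sake of a
compilable sketch.
<programlisting>
class C a where
  op :: D b => a -> b -> b      -- a method of C may mention D

class C a => D a where          -- C is a superclass of D
  twice :: a -> a

instance C Int where
  op _ y = twice y

instance D Int where
  twice = (* 2)

answer :: Int
answer = op (0 :: Int) 21
</programlisting>
</para>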
3701 <para>
With the extension that adds a <link linkend="constraint-kind">kind of constraints</link>,
you can write more exotic superclass definitions. The superclass cycle check is even more
liberal in this case. For example, this is OK:

<programlisting>
class A cls c where
  meth :: cls c => c -> c

class A B c => B c where
3711 </programlisting>
3712
3713 A superclass context for a class <literal>C</literal> is allowed if, after expanding
3714 type synonyms to their right-hand-sides, and uses of classes (other than <literal>C</literal>)
3715 to their superclasses, <literal>C</literal> does not occur syntactically in the context.
3716 </para>
3717 </sect3>
3718
3719
3720
3721
3722 <sect3 id="class-method-types">
3723 <title>Class method types</title>
3724
3725 <para>
Haskell 98 prohibits class method types from mentioning constraints on the
class type variable, thus:
<programlisting>
class Seq s a where
  fromList :: [a] -> s a
  elem     :: Eq a => a -> s a -> Bool
</programlisting>
The type of <literal>elem</literal> is illegal in Haskell 98, because it
contains the constraint <literal>Eq a</literal>, which constrains only the
class type variable (in this case <literal>a</literal>).
3736 GHC lifts this restriction (flag <option>-XConstrainedClassMethods</option>).
3737 </para>
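<para>
The class can be tried out with a list instance, sketched below. The instance and the
value <literal>found</literal> are invented here, and the Prelude's own
<literal>elem</literal> is hidden so that the method name does not clash.
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, ConstrainedClassMethods,
             FlexibleInstances #-}

import Prelude hiding (elem)

class Seq s a where
  fromList :: [a] -> s a
  elem     :: Eq a => a -> s a -> Bool   -- constrains only the class variable a

instance Seq [] a where
  fromList = id
  elem x   = any (== x)

found :: Bool
found = elem (3 :: Int) (fromList [1, 2, 3] :: [Int])
</programlisting>
Because the <literal>Eq a</literal> constraint sits on the method rather than on the
instance, <literal>fromList</literal> remains usable at element types without equality.
</para>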
3738
3739
3740 </sect3>
3741
3742
3743 <sect3 id="class-default-signatures">
3744 <title>Default method signatures</title>
3745
3746 <para>
3747 Haskell 98 allows you to define a default implementation when declaring a class:
3748 <programlisting>
class Enum a where
  enum :: [a]
  enum = []
3752 </programlisting>
3753 The type of the <literal>enum</literal> method is <literal>[a]</literal>, and
3754 this is also the type of the default method. You can lift this restriction
3755 and give another type to the default method using the flag
3756 <option>-XDefaultSignatures</option>. For instance, if you have written a
3757 generic implementation of enumeration in a class <literal>GEnum</literal>
3758 with method <literal>genum</literal> in terms of <literal>GHC.Generics</literal>,
3759 you can specify a default method that uses that generic implementation:
3760 <programlisting>
class Enum a where
  enum :: [a]
  default enum :: (Generic a, GEnum (Rep a)) => [a]
  enum = map to genum
3765 </programlisting>
3766 We reuse the keyword <literal>default</literal> to signal that a signature
3767 applies to the default method only; when defining instances of the
3768 <literal>Enum</literal> class, the original type <literal>[a]</literal> of
3769 <literal>enum</literal> still applies. When giving an empty instance, however,
the default implementation <literal>map to genum</literal> is filled in,
3771 and type-checked with the type
3772 <literal>(Generic a, GEnum (Rep a)) => [a]</literal>.
3773 </para>
3774
3775 <para>
3776 We use default signatures to simplify generic programming in GHC
3777 (<xref linkend="generic-programming"/>).
3778 </para>
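<para>
Default signatures do not have to involve generics. The following smaller sketch uses
a hypothetical class <literal>Describe</literal> whose default method needs a
<literal>Show</literal> constraint that the method itself does not carry; all names in
it are invented for the illustration.
<programlisting>
{-# LANGUAGE DefaultSignatures #-}

class Describe a where
  describe :: a -> String
  -- The extra constraint applies to the default method only.
  default describe :: Show a => a -> String
  describe = show

data Colour = Red | Green deriving Show

-- Empty instance: the default is filled in and checked against Show Colour.
instance Describe Colour

-- An explicit definition needs no Show instance at all.
instance Describe Bool where
  describe b = if b then "yes" else "no"
</programlisting>
</para>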
3779
3780
3781 </sect3>
3782
3783 <sect3 id="nullary-type-classes">
3784 <title>Nullary type classes</title>
<para>
Nullary (no parameter) type classes are enabled with <option>-XNullaryTypeClasses</option>.
Since there are no available parameters, there can be at most one instance
of a nullary class. A nullary type class might be used to document some assumption
in a type signature (such as reliance on the Riemann hypothesis) or add some
globally configurable settings in a program. For example,

<programlisting>
class RiemannHypothesis where
  assumeRH :: a -> a

-- Deterministic version of the Miller test
-- correctness depends on the generalized Riemann hypothesis
isPrime :: RiemannHypothesis => Integer -> Bool
isPrime n = assumeRH (...)
</programlisting>

The type signature of <literal>isPrime</literal> informs users that its correctness
depends on an unproven conjecture. If the function is used, the user has
to acknowledge the dependence with:

<programlisting>
instance RiemannHypothesis where
  assumeRH = id
</programlisting>
</para>
3809
3810 </sect3>
3811 </sect2>
3812
3813 <sect2 id="functional-dependencies">
3814 <title>Functional dependencies
3815 </title>
3816
<para> Functional dependencies are implemented as described by Mark Jones
in &ldquo;<ulink url="http://citeseer.ist.psu.edu/jones00type.html">Type Classes with Functional Dependencies</ulink>&rdquo;, Mark P. Jones,
in Proceedings of the 9th European Symposium on Programming,
ESOP 2000, Berlin, Germany, March 2000, Springer-Verlag LNCS 1782.
3822 </para>
3823 <para>
3824 Functional dependencies are introduced by a vertical bar in the syntax of a
3825 class declaration; e.g.
3826 <programlisting>
3827 class (Monad m) => MonadState s m | m -> s where ...
3828
3829 class Foo a b c | a b -> c where ...
3830 </programlisting>
3831 There should be more documentation, but there isn't (yet). Yell if you need it.
3832 </para>
3833
3834 <sect3><title>Rules for functional dependencies </title>
3835 <para>
3836 In a class declaration, all of the class type variables must be reachable (in the sense
3837 mentioned in <xref linkend="flexible-contexts"/>)
3838 from the free variables of each method type.
3839 For example:
3840
3841 <programlisting>
class Coll s a where
  empty  :: s
  insert :: s -> a -> s
3845 </programlisting>
3846
3847 is not OK, because the type of <literal>empty</literal> doesn't mention
3848 <literal>a</literal>. Functional dependencies can make the type variable
3849 reachable:
3850 <programlisting>
class Coll s a | s -> a where
  empty  :: s
  insert :: s -> a -> s
3854 </programlisting>
3855
3856 Alternatively <literal>Coll</literal> might be rewritten
3857
3858 <programlisting>
class Coll s a where
  empty  :: s a
  insert :: s a -> a -> s a
3862 </programlisting>
3863
3864
3865 which makes the connection between the type of a collection of
3866 <literal>a</literal>'s (namely <literal>(s a)</literal>) and the element type <literal>a</literal>.
3867 Occasionally this really doesn't work, in which case you can split the
3868 class like this:
3869
3870
3871 <programlisting>
class CollE s where
  empty  :: s

class CollE s => Coll s a where
  insert :: s -> a -> s
3877 </programlisting>
3878 </para>
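<para>
The functional-dependency version of <literal>Coll</literal> can be exercised with a
list instance. This is a sketch under our own assumptions: the instance and the value
<literal>pair</literal> are not taken from any library.
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies,
             FlexibleInstances #-}

class Coll s a | s -> a where
  empty  :: s
  insert :: s -> a -> s

-- The collection type [a] determines the element type a.
instance Coll [a] a where
  empty       = []
  insert xs x = x : xs

-- The type of empty is fixed by the result type [Char]; the element
-- type then follows from the dependency s -> a.
pair :: [Char]
pair = insert (insert empty 'a') 'b'
</programlisting>
</para>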
3879 </sect3>
3880
3881
3882 <sect3>
3883 <title>Background on functional dependencies</title>
3884
3885 <para>The following description of the motivation and use of functional dependencies is taken
3886 from the Hugs user manual, reproduced here (with minor changes) by kind
3887 permission of Mark Jones.
3888 </para>
3889 <para>
3890 Consider the following class, intended as part of a
3891 library for collection types:
3892 <programlisting>
class Collects e ce where
    empty  :: ce
    insert :: e -> ce -> ce
    member :: e -> ce -> Bool
3897 </programlisting>
3898 The type variable e used here represents the element type, while ce is the type
3899 of the container itself. Within this framework, we might want to define
3900 instances of this class for lists or characteristic functions (both of which
3901 can be used to represent collections of any equality type), bit sets (which can
3902 be used to represent collections of characters), or hash tables (which can be
3903 used to represent any collection whose elements have a hash function). Omitting
3904 standard implementation details, this would lead to the following declarations:
3905 <programlisting>
3906 instance Eq e => Collects e [e] where ...
3907 instance Eq e => Collects e (e -> Bool) where ...
3908 instance Collects Char BitSet where ...
instance (Hashable e, Collects e ce)
           => Collects e (Array Int ce) where ...
3911 </programlisting>
3912 All this looks quite promising; we have a class and a range of interesting
3913 implementations. Unfortunately, there are some serious problems with the class
3914 declaration. First, the empty function has an ambiguous type:
3915 <programlisting>
3916 empty :: Collects e ce => ce
3917 </programlisting>
3918 By "ambiguous" we mean that there is a type variable e that appears on the left
3919 of the <literal>=&gt;</literal> symbol, but not on the right. The problem with
3920 this is that, according to the theoretical foundations of Haskell overloading,
3921 we cannot guarantee a well-defined semantics for any term with an ambiguous
3922 type.
3923 </para>
3924 <para>
3925 We can sidestep this specific problem by removing the empty member from the
3926 class declaration. However, although the remaining members, insert and member,
3927 do not have ambiguous types, we still run into problems when we try to use
3928 them. For example, consider the following two functions:
3929 <programlisting>
3930 f x y = insert x . insert y
3931 g = f True 'a'
3932 </programlisting>
3933 for which GHC infers the following types:
3934 <programlisting>
3935 f :: (Collects a c, Collects b c) => a -> b -> c -> c
3936 g :: (Collects Bool c, Collects Char c) => c -> c
3937 </programlisting>
3938 Notice that the type for f allows the two parameters x and y to be assigned
3939 different types, even though it attempts to insert each of the two values, one
3940 after the other, into the same collection. If we're trying to model collections
3941 that contain only one type of value, then this is clearly an inaccurate
3942 type. Worse still, the definition for g is accepted, without causing a type
3943 error. As a result, the error in this code will not be flagged at the point
3944 where it appears. Instead, it will show up only when we try to use g, which
3945 might even be in a different module.
3946 </para>
3947
3948 <sect4><title>An attempt to use constructor classes</title>
3949
3950 <para>
3951 Faced with the problems described above, some Haskell programmers might be
3952 tempted to use something like the following version of the class declaration:
3953 <programlisting>
class Collects e c where
   empty  :: c e
   insert :: e -> c e -> c e
   member :: e -> c e -> Bool
3958 </programlisting>
3959 The key difference here is that we abstract over the type constructor c that is
3960 used to form the collection type c e, and not over that collection type itself,
3961 represented by ce in the original class declaration. This avoids the immediate
3962 problems that we mentioned above: empty has type <literal>Collects e c => c
3963 e</literal>, which is not ambiguous.
3964 </para>
3965 <para>
3966 The function f from the previous section has a more accurate type:
3967 <programlisting>
3968 f :: (Collects e c) => e -> e -> c e -> c e
3969 </programlisting>
3970 The function g from the previous section is now rejected with a type error as
3971 we would hope because the type of f does not allow the two arguments to have
3972 different types.
3973 This, then, is an example of a multiple parameter class that does actually work
3974 quite well in practice, without ambiguity problems.
3975 There is, however, a catch. This version of the Collects class is nowhere near
3976 as general as the original class seemed to be: only one of the four instances
3977 for <literal>Collects</literal>
3978 given above can be used with this version of Collects because only one of
3979 them---the instance for lists---has a collection type that can be written in
3980 the form c e, for some type constructor c, and element type e.
3981 </para>
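<para>
As a sketch of this constructor-class version, the list instance and the function f
can be written out as follows. The instance body and the value
<literal>sample</literal> are our own illustration, and the instance head needs
<option>-XFlexibleInstances</option> in GHC.
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

class Collects e c where
  empty  :: c e
  insert :: e -> c e -> c e
  member :: e -> c e -> Bool

-- Only the list instance fits the shape "c e".
instance Eq e => Collects e [] where
  empty  = []
  insert = (:)
  member = elem

-- Both arguments are now forced to have the element type e.
f :: Collects e c => e -> e -> c e -> c e
f x y = insert x . insert y

sample :: [Int]
sample = f 1 2 empty
</programlisting>
With these definitions an application such as <literal>f True 'a'</literal> is a type
error, as described above.
</para>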
3982 </sect4>
3983
3984 <sect4><title>Adding functional dependencies</title>
3985
3986 <para>
To get a more useful version of the Collects class, Hugs provides a mechanism
that allows programmers to specify dependencies between the parameters of a
multiple parameter class. (For readers with an interest in theoretical
foundations and previous work: the use of dependency information can be seen
either as a generalization of the proposal for "parametric type classes" that was
put forward by Chen, Hudak, and Odersky, or as a special case of Mark Jones's
later framework for "improvement" of qualified types. The
underlying ideas are also discussed in a more theoretical and abstract setting
in a manuscript [implparam], where they are identified as one point in a
general design space for systems of implicit parameterization.)
3997
3998 To start with an abstract example, consider a declaration such as:
3999 <programlisting>
4000 class C a b where ...
4001 </programlisting>
4002 which tells us simply that C can be thought of as a binary relation on types
4003 (or type constructors, depending on the kinds of a and b). Extra clauses can be
4004 included in the definition of classes to add information about dependencies
4005 between parameters, as in the following examples:
4006 <programlisting>
4007 class D a b | a -> b where ...
4008 class E a b | a -> b, b -> a where ...
4009 </programlisting>
4010 The notation <literal>a -&gt; b</literal> used here between the | and where
4011 symbols --- not to be
4012 confused with a function type --- indicates that the a parameter uniquely
4013 determines the b parameter, and might be read as "a determines b." Thus D is
4014 not just a relation, but actually a (partial) function. Similarly, from the two
4015 dependencies that are included in the definition of E, we can see that E
4016 represents a (partial) one-one mapping between types.
4017 </para>
4018 <para>
4019 More generally, dependencies take the form <literal>x1 ... xn -&gt; y1 ... ym</literal>,
where x1, ..., xn, and y1, ..., ym are type variables with n&gt;0 and
4021 m&gt;=0, meaning that the y parameters are uniquely determined by the x
4022 parameters. Spaces can be used as separators if more than one variable appears
4023 on any single side of a dependency, as in <literal>t -&gt; a b</literal>. Note that a class may be
4024 annotated with multiple dependencies using commas as separators, as in the
4025 definition of E above. Some dependencies that we can write in this notation are
4026 redundant, and will be rejected because they don't serve any useful
4027 purpose, and may instead indicate an error in the program. Examples of
4028 dependencies like this include <literal>a -&gt; a </literal>,
4029 <literal>a -&gt; a a </literal>,
4030 <literal>a -&gt; </literal>, etc. There can also be
4031 some redundancy if multiple dependencies are given, as in
4032 <literal>a-&gt;b</literal>,
<literal>b-&gt;c </literal>, <literal>a-&gt;c </literal>,
in which some subset implies the remaining dependencies. Examples like this are
4035 not treated as errors. Note that dependencies appear only in class
4036 declarations, and not in any other part of the language. In particular, the
4037 syntax for instance declarations, class constraints, and types is completely
4038 unchanged.
4039 </para>
<para>
By including dependencies in a class declaration, we provide a mechanism for
the programmer to specify each multiple parameter class more precisely. The
compiler, on the other hand, is responsible for ensuring that the set of
instances that are in scope at any given point in the program is consistent
with any declared dependencies. For example, the following pair of instance
declarations cannot appear together in the same scope because they violate the
dependency for <literal>D</literal>, even though either one on its own would be acceptable:
<programlisting>
  instance D Bool Int where ...
  instance D Bool Char where ...
</programlisting>
Note also that the following declaration is not allowed, even by itself:
<programlisting>
  instance D [a] b where ...
</programlisting>
The problem here is that this instance would allow one particular choice of <literal>[a]</literal>
to be associated with more than one choice for <literal>b</literal>, which contradicts the
dependency specified in the definition of <literal>D</literal>. More generally, this means that,
in any instance of the form:
<programlisting>
  instance D t s where ...
</programlisting>
for some particular types <literal>t</literal> and <literal>s</literal>, the only type variables that can appear in <literal>s</literal> are
the ones that appear in <literal>t</literal>, and hence, if the type <literal>t</literal> is known, then <literal>s</literal> will be
uniquely determined.
</para>
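<para>
These rules can be sketched in a small module. The sketch assumes that
<literal>D</literal> is declared with the dependency <literal>a -&gt; b</literal>;
the method <literal>convert</literal> and the value <literal>total</literal> are
hypothetical, added only so that the instances have something to define and use:
<programlisting>
  {-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}

  -- Assumed declaration of D; the method is hypothetical.
  class D a b | a -> b where
    convert :: a -> b

  instance D Bool Int where
    convert b = if b then 1 else 0

  -- Rejected if uncommented: Bool would determine both Int and Char.
  -- instance D Bool Char where ...

  -- Rejected even by itself: b is not determined by [a].
  -- instance D [a] b where ...

  -- Accepted: the second argument has no variables beyond those in [a].
  instance D [a] Int where
    convert = length

  -- No annotation is needed on either call: the dependency fixes
  -- the result type to Int in both cases.
  total :: Int
  total = convert True + convert "abc"
</programlisting>
Here <literal>total</literal> evaluates to <literal>4</literal>, because the
first call yields <literal>1</literal> and the second yields the length of the string.
</para>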
<para>
Including dependency information allows us to define
more general multiple parameter classes, without ambiguity problems, and with
more accurate types. To illustrate this, we return to the
collection class example, and annotate the original definition of <literal>Collects</literal>
with a simple dependency:
<programlisting>
  class Collects e ce | ce -> e where
    empty  :: ce
    insert :: e -> ce -> ce
    member :: e -> ce -> Bool
</programlisting>
The dependency <literal>ce -&gt; e</literal> here specifies that the type <literal>e</literal> of elements is uniquely
determined by the type of the collection <literal>ce</literal>. Note that both parameters of
<literal>Collects</literal> are of kind <literal>*</literal>; there are no constructor classes here. Note too that
all of the instances of <literal>Collects</literal> that we gave earlier can be used
together with this new definition.
</para>
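<para>
A minimal sketch of the annotated class together with a list instance might
look as follows. The <literal>FlexibleInstances</literal> pragma is included
because the first argument of the instance head is a bare type variable, and
the value <literal>sample</literal> is a hypothetical name used only to show
the class in use:
<programlisting>
  {-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

  class Collects e ce | ce -> e where
    empty  :: ce
    insert :: e -> ce -> ce
    member :: e -> ce -> Bool

  -- Lists as collections: the element type e is fixed by the list type [e].
  instance Eq e => Collects e [e] where
    empty  = []
    insert = (:)
    member = elem

  -- Hypothetical usage: the annotation [Int] alone fixes e to Int.
  sample :: [Int]
  sample = insert 1 (insert 2 empty)
</programlisting>
</para>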
<para>
What about the ambiguity problems that we encountered with the original
definition? The <literal>empty</literal> function still has type <literal>Collects e ce =&gt; ce</literal>, but it is no
longer necessary to regard that as an ambiguous type: although the variable <literal>e</literal>
does not appear on the right of the <literal>=&gt;</literal> symbol, the dependency for class
<literal>Collects</literal> tells us that it is uniquely determined by <literal>ce</literal>, which does appear on
the right of the <literal>=&gt;</literal> symbol. Hence the context in which <literal>empty</literal> is used can still
give enough information to determine types for both <literal>ce</literal> and <literal>e</literal>, without
ambiguity. More generally, we need only regard a type as ambiguous if it
contains a variable on the left of the <literal>=&gt;</literal> that is not uniquely determined
(either directly or indirectly) by the variables on the right.
</para>
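<para>
To see the effect on <literal>empty</literal> concretely, consider the
following sketch. The collection type <literal>CharBag</literal> and the value
<literal>noChars</literal> are hypothetical, introduced only for this example:
<programlisting>
  {-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}

  class Collects e ce | ce -> e where
    empty  :: ce
    insert :: e -> ce -> ce
    member :: e -> ce -> Bool

  -- Hypothetical collection type, introduced only for this example.
  newtype CharBag = CharBag [Char]

  instance Collects Char CharBag where
    empty                 = CharBag []
    insert c (CharBag cs) = CharBag (c : cs)
    member c (CharBag cs) = c `elem` cs

  -- The annotation fixes ce to CharBag; the dependency ce -> e then
  -- fixes e to Char, so nothing here needs to mention e at all.
  noChars :: CharBag
  noChars = empty
</programlisting>
The use of <literal>empty</literal> in <literal>noChars</literal> gives the
type checker only <literal>ce</literal>, and the dependency supplies
<literal>e</literal>.
</para>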
<para>
Dependencies also help to produce more accurate types for user-defined
functions, and hence to provide earlier detection of errors, and less cluttered
types for programmers to work with. Recall the previous definition for a
function <literal>f</literal>:
<programlisting>
  f x y = insert x . insert y