1 <?xml version="1.0" encoding="iso-8859-1"?>
2 <para>
3 <indexterm><primary>language, GHC</primary></indexterm>
4 <indexterm><primary>extensions, GHC</primary></indexterm>
5 As with all known Haskell systems, GHC implements some extensions to
6 the language. They can all be enabled or disabled by command-line flags
7 or language pragmas. By default GHC understands the most recent Haskell
8 version it supports, plus a handful of extensions.
9 </para>
10
11 <para>
12 Some of the Glasgow extensions serve to give you access to the
13 underlying facilities with which we implement Haskell. Thus, you can
14 get at the Raw Iron, if you are willing to write some non-portable
15 code at a more primitive level. You need not be &ldquo;stuck&rdquo;
16 on performance because of the implementation costs of Haskell's
17 &ldquo;high-level&rdquo; features&mdash;you can always code
18 &ldquo;under&rdquo; them. In an extreme case, you can write all your
19 time-critical code in C, and then just glue it together with Haskell!
20 </para>
21
22 <para>
23 Before you get too carried away working at the lowest level (e.g.,
24 sloshing <literal>MutableByteArray&num;</literal>s around your
25 program), you may wish to check if there are libraries that provide a
26 &ldquo;Haskellised veneer&rdquo; over the features you want. The
27 separate <ulink url="../libraries/index.html">libraries
28 documentation</ulink> describes all the libraries that come with GHC.
29 </para>
30
31 <!-- LANGUAGE OPTIONS -->
32 <sect1 id="options-language">
33 <title>Language options</title>
34
35 <indexterm><primary>language</primary><secondary>option</secondary>
36 </indexterm>
37 <indexterm><primary>options</primary><secondary>language</secondary>
38 </indexterm>
39 <indexterm><primary>extensions</primary><secondary>options controlling</secondary>
40 </indexterm>
41
42 <para>The language option flags control which variations of the language are
43 permitted.</para>
44
45 <para>Language options can be controlled in two ways:
46 <itemizedlist>
47 <listitem><para>Every language option can be switched on by a command-line flag "<option>-X...</option>"
48 (e.g. <option>-XTemplateHaskell</option>), and switched off by the flag "<option>-XNo...</option>"
49 (e.g. <option>-XNoTemplateHaskell</option>).</para></listitem>
50 <listitem><para>
51 Language options recognised by Cabal can also be enabled using the <literal>LANGUAGE</literal> pragma,
52 thus <literal>{-# LANGUAGE TemplateHaskell #-}</literal> (see <xref linkend="language-pragma"/>). </para>
53 </listitem>
54 </itemizedlist></para>
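<para>As a minimal sketch of the second form, a module can request an extension
for itself by placing a <literal>LANGUAGE</literal> pragma at the top of the file.
The module below is illustrative only: the name <literal>answer#</literal> is
made up for this example, and <option>-XMagicHash</option> is used simply because
its effect (allowing a trailing hash in identifiers) is easy to see:</para>
<programlisting>
{-# LANGUAGE MagicHash #-}
module Main where

-- With MagicHash enabled by the pragma, a trailing '#' is accepted in
-- identifiers. answer# is an ordinary boxed Int with an unusual name.
answer# :: Int
answer# = 42

main :: IO ()
main = print answer#
</programlisting>
<para>The same module without the pragma should compile only if
<option>-XMagicHash</option> is passed on the command line.</para>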
55
56 <para>The flag <option>-fglasgow-exts</option>
57 <indexterm><primary><option>-fglasgow-exts</option></primary></indexterm>
58 is equivalent to enabling the following extensions:
59 &what_glasgow_exts_does;
60 Enabling these options is the <emphasis>only</emphasis>
61 effect of <option>-fglasgow-exts</option>.
62 We are trying to move away from this portmanteau flag,
63 and towards enabling features individually.</para>
64
65 </sect1>
66
67 <!-- UNBOXED TYPES AND PRIMITIVE OPERATIONS -->
68 <sect1 id="primitives">
69 <title>Unboxed types and primitive operations</title>
70
71 <para>GHC is built on a raft of primitive data types and operations;
72 "primitive" in the sense that they cannot be defined in Haskell itself.
73 While you really can use this stuff to write fast code,
74 we generally find it a lot less painful, and more satisfying in the
75 long run, to use higher-level language features and libraries. With
76 any luck, the code you write will be optimised to the efficient
77 unboxed version in any case. And if it isn't, we'd like to know
78 about it.</para>
79
80 <para>All these primitive data types and operations are exported by the
81 library <literal>GHC.Prim</literal>, for which there is
82 <ulink url="&libraryGhcPrimLocation;/GHC-Prim.html">detailed online documentation</ulink>.
83 (This documentation is generated from the file <filename>compiler/prelude/primops.txt.pp</filename>.)
84 </para>
85
86 <para>
87 If you want to mention any of the primitive data types or operations in your
88 program, you must first import <literal>GHC.Prim</literal> to bring them
89 into scope. Many of them have names ending in "&num;", and to mention such
90 names you need the <option>-XMagicHash</option> extension (<xref linkend="magic-hash"/>).
91 </para>
92
93 <para>The primops make extensive use of <link linkend="glasgow-unboxed">unboxed types</link>
94 and <link linkend="unboxed-tuples">unboxed tuples</link>, which
95 we briefly summarise here. </para>
96
97 <sect2 id="glasgow-unboxed">
98 <title>Unboxed types</title>
99
100 <para>
101 <indexterm><primary>Unboxed types (Glasgow extension)</primary></indexterm>
102 </para>
103
104 <para>Most types in GHC are <firstterm>boxed</firstterm>, which means
105 that values of that type are represented by a pointer to a heap
106 object. The representation of a Haskell <literal>Int</literal>, for
107 example, is a two-word heap object. An <firstterm>unboxed</firstterm>
108 type, however, is represented by the value itself; no pointers or heap
109 allocation are involved.
110 </para>
111
112 <para>
113 Unboxed types correspond to the &ldquo;raw machine&rdquo; types you
114 would use in C: <literal>Int&num;</literal> (long int),
115 <literal>Double&num;</literal> (double), <literal>Addr&num;</literal>
116 (void *), etc. The <emphasis>primitive operations</emphasis>
117 (PrimOps) on these types are what you might expect; e.g.,
118 <literal>(+&num;)</literal> is addition on
119 <literal>Int&num;</literal>s, and is the machine-addition that we all
120 know and love&mdash;usually one instruction.
121 </para>
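<para>A small sketch of what working with <literal>Int#</literal> looks like in
practice. It assumes the primitives are imported from <literal>GHC.Exts</literal>,
which also exports the <literal>I#</literal> constructor that boxes an
<literal>Int#</literal> into an <literal>Int</literal>; the function name
<literal>plusInt</literal> is hypothetical:</para>
<programlisting>
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (Int (I#), (+#))

-- Unwrap two boxed Ints, add the raw machine integers with (+#),
-- and box the result again.
plusInt :: Int -> Int -> Int
plusInt (I# a) (I# b) = I# (a +# b)

main :: IO ()
main = print (plusInt 2 3)
</programlisting>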
122
123 <para>
124 Primitive (unboxed) types cannot be defined in Haskell, and are
125 therefore built into the language and compiler. Primitive types are
126 always unlifted; that is, a value of a primitive type cannot be
127 bottom. We use the convention (but it is only a convention)
128 that primitive types, values, and
129 operations have a <literal>&num;</literal> suffix (see <xref linkend="magic-hash"/>).
130 For some primitive types we have special syntax for literals, also
131 described in the <link linkend="magic-hash">same section</link>.
132 </para>
133
134 <para>
135 Primitive values are often represented by a simple bit-pattern, such
136 as <literal>Int&num;</literal>, <literal>Float&num;</literal>,
137 <literal>Double&num;</literal>. But this is not necessarily the case:
138 a primitive value might be represented by a pointer to a
139 heap-allocated object. Examples include
140 <literal>Array&num;</literal>, the type of primitive arrays. A
141 primitive array is heap-allocated because it is too big a value to fit
142 in a register, and would be too expensive to copy around; in a sense,
143 it is accidental that it is represented by a pointer. If a pointer
144 represents a primitive value, then it really does point to that value:
145 no unevaluated thunks, no indirections&hellip;nothing can be at the
146 other end of the pointer than the primitive value.
147 A numerically-intensive program using unboxed types can
148 go a <emphasis>lot</emphasis> faster than its &ldquo;standard&rdquo;
149 counterpart&mdash;we saw a threefold speedup on one example.
150 </para>
151
152 <para>
153 There are some restrictions on the use of primitive types:
154 <itemizedlist>
155 <listitem><para>The main restriction
156 is that you can't pass a primitive value to a polymorphic
157 function or store one in a polymorphic data type. This rules out
158 things like <literal>[Int&num;]</literal> (i.e. lists of primitive
159 integers). The reason for this restriction is that polymorphic
160 arguments and constructor fields are assumed to be pointers: if an
161 unboxed integer is stored in one of these, the garbage collector would
162 attempt to follow it, leading to unpredictable space leaks. Or a
163 <function>seq</function> operation on the polymorphic component may
164 attempt to dereference the pointer, with disastrous results. Even
165 worse, the unboxed value might be larger than a pointer
166 (<literal>Double&num;</literal> for instance).
167 </para>
168 </listitem>
169 <listitem><para> You cannot define a newtype whose representation type
170 (the argument type of the data constructor) is an unboxed type. Thus,
171 this is illegal:
172 <programlisting>
173 newtype A = MkA Int#
174 </programlisting>
175 </para></listitem>
176 <listitem><para> You cannot bind a variable with an unboxed type
177 in a <emphasis>top-level</emphasis> binding.
178 </para></listitem>
179 <listitem><para> You cannot bind a variable with an unboxed type
180 in a <emphasis>recursive</emphasis> binding.
181 </para></listitem>
182 <listitem><para> You may bind unboxed variables in a (non-recursive,
183 non-top-level) pattern binding, but you must make any such pattern-match
184 strict. For example, rather than:
185 <programlisting>
186 data Foo = Foo Int Int#
187
188 f x = let (Foo a b, w) = ..rhs.. in ..body..
189 </programlisting>
190 you must write:
191 <programlisting>
192 data Foo = Foo Int Int#
193
194 f x = let !(Foo a b, w) = ..rhs.. in ..body..
195 </programlisting>
196 since <literal>b</literal> has type <literal>Int#</literal>.
197 </para>
198 </listitem>
199 </itemizedlist>
200 </para>
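<para>The last restriction can be illustrated with a complete module. This is
only a sketch: the function <literal>total</literal> and its argument are
invented for the example, and <literal>GHC.Exts</literal> is assumed as the
import that brings the primitives into scope. The bang on the pattern makes
the binding strict, which is required because <literal>b</literal> has the
unlifted type <literal>Int#</literal>:</para>
<programlisting>
{-# LANGUAGE MagicHash, BangPatterns #-}
module Main where

import GHC.Exts (Int (I#), Int#, (+#))

data Foo = Foo Int Int#

-- The pattern binds b :: Int#, so the pattern binding must be strict.
total :: (Foo, Int) -> Int
total pair =
  let !(Foo (I# a) b, w) = pair
  in I# (a +# b) + w

main :: IO ()
main = print (total (Foo 1 2#, 3))
</programlisting>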
201
202 </sect2>
203
204 <sect2 id="unboxed-tuples">
205 <title>Unboxed tuples</title>
206
207 <para>
208 Unboxed tuples aren't really exported by <literal>GHC.Exts</literal>;
209 they are a syntactic extension enabled by the language flag <option>-XUnboxedTuples</option>. An
210 unboxed tuple looks like this:
211 </para>
212
213 <para>
214
215 <programlisting>
216 (# e_1, ..., e_n #)
217 </programlisting>
218
219 </para>
220
221 <para>
222 where <literal>e&lowbar;1..e&lowbar;n</literal> are expressions of any
223 type (primitive or non-primitive). The type of an unboxed tuple looks
224 the same.
225 </para>
226
227 <para>
228 Unboxed tuples are used for functions that need to return multiple
229 values, but they avoid the heap allocation normally associated with
230 using fully-fledged tuples. When an unboxed tuple is returned, the
231 components are put directly into registers or on the stack; the
232 unboxed tuple itself does not have a composite representation. Many
233 of the primitive operations listed in <literal>primops.txt.pp</literal> return unboxed
234 tuples.
235 In particular, the <literal>IO</literal> and <literal>ST</literal> monads use unboxed
236 tuples to avoid unnecessary allocation during sequences of operations.
237 </para>
238
239 <para>
240 There are some restrictions on the use of unboxed tuples:
241 <itemizedlist>
242
243 <listitem>
244 <para>
245 Values of unboxed tuple types are subject to the same restrictions as
246 other unboxed types; i.e. they may not be stored in polymorphic data
247 structures or passed to polymorphic functions.
248 </para>
249 </listitem>
250
251 <listitem>
252 <para>
253 The typical use of unboxed tuples is simply to return multiple values,
254 binding those multiple results with a <literal>case</literal> expression, thus:
255 <programlisting>
256 f x y = (# x+1, y-1 #)
257 g x = case f x x of { (# a, b #) -&gt; a + b }
258 </programlisting>
259 You can have an unboxed tuple in a pattern binding, thus
260 <programlisting>
261 f x = let (# p,q #) = h x in ..body..
262 </programlisting>
263 If the types of <literal>p</literal> and <literal>q</literal> are not unboxed,
264 the resulting binding is lazy like any other Haskell pattern binding. The
265 above example desugars like this:
266 <programlisting>
267 f x = let t = case h x of { (# p,q #) -> (p,q) }
268           p = fst t
269           q = snd t
270       in ..body..
271 </programlisting>
272 Indeed, the bindings can even be recursive.
273 </para>
274 </listitem>
275 </itemizedlist>
276
277 </para>
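<para>For reference, here is a self-contained sketch of the typical use: a
function returns two results in an unboxed tuple and the caller takes them
apart with <literal>case</literal>. The function name
<literal>quotRem'</literal> is hypothetical:</para>
<programlisting>
{-# LANGUAGE UnboxedTuples #-}
module Main where

-- Return quotient and remainder without allocating a boxed pair.
quotRem' :: Int -> Int -> (# Int, Int #)
quotRem' n d = (# n `quot` d, n `rem` d #)

main :: IO ()
main = case quotRem' 17 5 of
  (# q, r #) -> print (q, r)
</programlisting>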
278
279 </sect2>
280 </sect1>
281
282
283 <!-- ====================== SYNTACTIC EXTENSIONS ======================= -->
284
285 <sect1 id="syntax-extns">
286 <title>Syntactic extensions</title>
287
288 <sect2 id="unicode-syntax">
289 <title>Unicode syntax</title>
290 <para>The language
291 extension <option>-XUnicodeSyntax</option><indexterm><primary><option>-XUnicodeSyntax</option></primary></indexterm>
292 enables Unicode characters to be used to stand for certain ASCII
293 character sequences. The following alternatives are provided:</para>
294
295 <informaltable>
296 <tgroup cols="4" align="left" colsep="1" rowsep="1">
297 <thead>
298 <row>
299 <entry>ASCII</entry>
300 <entry>Unicode alternative</entry>
301 <entry>Code point</entry>
302 <entry>Name</entry>
303 </row>
304 </thead>
305
306 <!--
307 to find the DocBook entities for these characters, find
308 the Unicode code point (e.g. 0x2237), and grep for it in
309 /usr/share/sgml/docbook/xml-dtd-*/ent/* (or equivalent on
310 your system). Some of these Unicode code points don't have
311 equivalent DocBook entities.
312 -->
313
314 <tbody>
315 <row>
316 <entry><literal>::</literal></entry>
317 <entry>&#x2237;</entry> <!-- no DocBook entity, so a numeric character reference is used -->
318 <entry>0x2237</entry>
319 <entry>PROPORTION</entry>
320 </row>
321 </tbody>
322 <tbody>
323 <row>
324 <entry><literal>=&gt;</literal></entry>
325 <entry>&rArr;</entry>
326 <entry>0x21D2</entry>
327 <entry>RIGHTWARDS DOUBLE ARROW</entry>
328 </row>
329 </tbody>
330 <tbody>
331 <row>
332 <entry><literal>forall</literal></entry>
333 <entry>&forall;</entry>
334 <entry>0x2200</entry>
335 <entry>FOR ALL</entry>
336 </row>
337 </tbody>
338 <tbody>
339 <row>
340 <entry><literal>-&gt;</literal></entry>
341 <entry>&rarr;</entry>
342 <entry>0x2192</entry>
343 <entry>RIGHTWARDS ARROW</entry>
344 </row>
345 </tbody>
346 <tbody>
347 <row>
348 <entry><literal>&lt;-</literal></entry>
349 <entry>&larr;</entry>
350 <entry>0x2190</entry>
351 <entry>LEFTWARDS ARROW</entry>
352 </row>
353 </tbody>
354
355 <tbody>
356 <row>
357 <entry>-&lt;</entry>
358 <entry>&larrtl;</entry>
359 <entry>0x2919</entry>
360 <entry>LEFTWARDS ARROW-TAIL</entry>
361 </row>
362 </tbody>
363
364 <tbody>
365 <row>
366 <entry>&gt;-</entry>
367 <entry>&rarrtl;</entry>
368 <entry>0x291A</entry>
369 <entry>RIGHTWARDS ARROW-TAIL</entry>
370 </row>
371 </tbody>
372
373 <tbody>
374 <row>
375 <entry>-&lt;&lt;</entry>
376 <entry>&#x291B;</entry>
377 <entry>0x291B</entry>
378 <entry>LEFTWARDS DOUBLE ARROW-TAIL</entry>
379 </row>
380 </tbody>
381
382 <tbody>
383 <row>
384 <entry>&gt;&gt;-</entry>
385 <entry>&#x291C;</entry>
386 <entry>0x291C</entry>
387 <entry>RIGHTWARDS DOUBLE ARROW-TAIL</entry>
388 </row>
389 </tbody>
390
391 <tbody>
392 <row>
393 <entry>*</entry>
394 <entry>&starf;</entry>
395 <entry>0x2605</entry>
396 <entry>BLACK STAR</entry>
397 </row>
398 </tbody>
399
400 </tgroup>
401 </informaltable>
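<para>A brief sketch of how some of these alternatives read in a source file.
The function <literal>showTwice</literal> is made up for the example; only the
double arrow, right arrow and left arrow alternatives from the table are
used:</para>
<programlisting>
{-# LANGUAGE UnicodeSyntax #-}
module Main where

-- "&rArr;" stands for "=>" and "&rarr;" stands for "->".
showTwice :: Show a &rArr; a &rarr; String
showTwice x = show x ++ show x

main :: IO ()
main = do
  -- "&larr;" stands for the "&lt;-" of do-notation.
  line &larr; return (showTwice (1 :: Int))
  putStrLn line
</programlisting>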
402 </sect2>
403
404 <sect2 id="magic-hash">
405 <title>The magic hash</title>
406 <para>The language extension <option>-XMagicHash</option> allows "&num;" as a
407 postfix modifier to identifiers. Thus, "x&num;" is a valid variable, and "T&num;" is
408 a valid type constructor or data constructor.</para>
409
410 <para>The hash sign does not change semantics at all. We tend to use variable
411 names ending in "&num;" for unboxed values or types (e.g. <literal>Int&num;</literal>),
412 but there is no requirement to do so; they are just plain ordinary variables.
413 Nor does the <option>-XMagicHash</option> extension bring anything into scope.
414 For example, to bring <literal>Int&num;</literal> into scope you must
415 import <literal>GHC.Prim</literal> (see <xref linkend="primitives"/>);
416 the <option>-XMagicHash</option> extension
417 then allows you to <emphasis>refer</emphasis> to the <literal>Int&num;</literal>
418 that is now in scope.</para>
419 <para> The <option>-XMagicHash</option> extension also enables some new forms of literals (see <xref linkend="glasgow-unboxed"/>):
420 <itemizedlist>
421 <listitem><para> <literal>'x'&num;</literal> has type <literal>Char&num;</literal></para> </listitem>
422 <listitem><para> <literal>&quot;foo&quot;&num;</literal> has type <literal>Addr&num;</literal></para> </listitem>
423 <listitem><para> <literal>3&num;</literal> has type <literal>Int&num;</literal>. In general,
424 any Haskell integer lexeme followed by a <literal>&num;</literal> is an <literal>Int&num;</literal> literal, e.g.
425 <literal>-0x3A&num;</literal> as well as <literal>32&num;</literal>.</para></listitem>
426 <listitem><para> <literal>3&num;&num;</literal> has type <literal>Word&num;</literal>. In general,
427 any non-negative Haskell integer lexeme followed by <literal>&num;&num;</literal>
428 is a <literal>Word&num;</literal>. </para> </listitem>
429 <listitem><para> <literal>3.2&num;</literal> has type <literal>Float&num;</literal>.</para> </listitem>
430 <listitem><para> <literal>3.2&num;&num;</literal> has type <literal>Double&num;</literal></para> </listitem>
431 </itemizedlist>
432 </para>
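<para>The literal forms above can be tried out with a short module. This is a
sketch that assumes the boxing constructors <literal>I#</literal>,
<literal>D#</literal> and <literal>C#</literal> and the operators
<literal>(+#)</literal> and <literal>(*##)</literal> are imported from
<literal>GHC.Exts</literal>:</para>
<programlisting>
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (Char (C#), Double (D#), Int (I#), (*##), (+#))

main :: IO ()
main = do
  print (I# (3# +# 4#))          -- Int# literals, boxed for printing
  print (D# (1.5## *## 2.0##))   -- Double# literals
  print (C# 'x'#)                -- a Char# literal
</programlisting>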
433 </sect2>
434
435 <!-- ====================== HIERARCHICAL MODULES ======================= -->
436
437
438 <sect2 id="hierarchical-modules">
439 <title>Hierarchical Modules</title>
440
441 <para>GHC supports a small extension to the syntax of module
442 names: a module name is allowed to contain a dot
443 <literal>&lsquo;.&rsquo;</literal>. This is also known as the
444 &ldquo;hierarchical module namespace&rdquo; extension, because
445 it extends the normally flat Haskell module namespace into a
446 more flexible hierarchy of modules.</para>
447
448 <para>This extension has very little impact on the language
449 itself; module names are <emphasis>always</emphasis> fully
450 qualified, so you can just think of the fully qualified module
451 name as <quote>the module name</quote>. In particular, this
452 means that the full module name must be given after the
453 <literal>module</literal> keyword at the beginning of the
454 module; for example, the module <literal>A.B.C</literal> must
455 begin</para>
456
457 <programlisting>module A.B.C</programlisting>
458
459
460 <para>It is a common strategy to use the <literal>as</literal>
461 keyword to save some typing when using qualified names with
462 hierarchical modules. For example:</para>
463
464 <programlisting>
465 import qualified Control.Monad.ST.Strict as ST
466 </programlisting>
467
468 <para>For details on how GHC searches for source and interface
469 files in the presence of hierarchical modules, see <xref
470 linkend="search-path"/>.</para>
471
472 <para>GHC comes with a large collection of libraries arranged
473 hierarchically; see the accompanying <ulink
474 url="../libraries/index.html">library
475 documentation</ulink>. More libraries to install are available
476 from <ulink
477 url="http://hackage.haskell.org/packages/hackage.html">HackageDB</ulink>.</para>
478 </sect2>
479
480 <!-- ====================== PATTERN GUARDS ======================= -->
481
482 <sect2 id="pattern-guards">
483 <title>Pattern guards</title>
484
485 <para>
486 <indexterm><primary>Pattern guards (Glasgow extension)</primary></indexterm>
487 The discussion that follows is an abbreviated version of Simon Peyton Jones's original <ulink url="http://research.microsoft.com/~simonpj/Haskell/guards.html">proposal</ulink>. (Note that the proposal was written before pattern guards were implemented, so refers to them as unimplemented.)
488 </para>
489
490 <para>
491 Suppose we have an abstract data type of finite maps, with a
492 lookup operation:
493
494 <programlisting>
495 lookup :: FiniteMap -> Int -> Maybe Int
496 </programlisting>
497
498 The lookup returns <function>Nothing</function> if the supplied key is not in the domain of the mapping, and <function>(Just v)</function> otherwise,
499 where <varname>v</varname> is the value that the key maps to. Now consider the following definition:
500 </para>
501
502 <programlisting>
503 clunky env var1 var2 | ok1 &amp;&amp; ok2 = val1 + val2
504 | otherwise = var1 + var2
505 where
506 m1 = lookup env var1
507 m2 = lookup env var2
508 ok1 = maybeToBool m1
509 ok2 = maybeToBool m2
510 val1 = expectJust m1
511 val2 = expectJust m2
512 </programlisting>
513
514 <para>
515 The auxiliary functions are
516 </para>
517
518 <programlisting>
519 maybeToBool :: Maybe a -&gt; Bool
520 maybeToBool (Just x) = True
521 maybeToBool Nothing = False
522
523 expectJust :: Maybe a -&gt; a
524 expectJust (Just x) = x
525 expectJust Nothing = error "Unexpected Nothing"
526 </programlisting>
527
528 <para>
529 What is <function>clunky</function> doing? The guard <literal>ok1 &amp;&amp;
530 ok2</literal> checks that both lookups succeed, using
531 <function>maybeToBool</function> to convert the <function>Maybe</function>
532 types to booleans. The (lazily evaluated) <function>expectJust</function>
533 calls extract the values from the results of the lookups, and bind the
534 returned values to <varname>val1</varname> and <varname>val2</varname>
535 respectively. If either lookup fails, then clunky takes the
536 <literal>otherwise</literal> case and returns the sum of its arguments.
537 </para>
538
539 <para>
540 This is certainly legal Haskell, but it is a tremendously verbose and
541 un-obvious way to achieve the desired effect. Arguably, a more direct way
542 to write clunky would be to use case expressions:
543 </para>
544
545 <programlisting>
546 clunky env var1 var2 = case lookup env var1 of
547 Nothing -&gt; fail
548 Just val1 -&gt; case lookup env var2 of
549 Nothing -&gt; fail
550 Just val2 -&gt; val1 + val2
551 where
552 fail = var1 + var2
553 </programlisting>
554
555 <para>
556 This is a bit shorter, but hardly better. Of course, we can rewrite any set
557 of pattern-matching, guarded equations as case expressions; that is
558 precisely what the compiler does when compiling equations! Haskell
559 provides guarded equations because they allow us to write down
560 the cases we want to consider, one at a time, independently of each other.
561 This structure is hidden in the case version. Two of the right-hand sides
562 are really the same (<function>fail</function>), and the whole expression
563 tends to become more and more indented.
564 </para>
565
566 <para>
567 Here is how I would write clunky:
568 </para>
569
570 <programlisting>
571 clunky env var1 var2
572 | Just val1 &lt;- lookup env var1
573 , Just val2 &lt;- lookup env var2
574 = val1 + val2
575 ...other equations for clunky...
576 </programlisting>
577
578 <para>
579 The semantics should be clear enough. The qualifiers are matched in order.
580 For a <literal>&lt;-</literal> qualifier, which I call a pattern guard, the
581 right hand side is evaluated and matched against the pattern on the left.
582 If the match fails then the whole guard fails and the next equation is
583 tried. If it succeeds, then the appropriate binding takes place, and the
584 next qualifier is matched, in the augmented environment. Unlike list
585 comprehensions, however, the type of the expression to the right of the
586 <literal>&lt;-</literal> is the same as the type of the pattern to its
587 left. The bindings introduced by pattern guards scope over all the
588 remaining guard qualifiers, and over the right hand side of the equation.
589 </para>
590
591 <para>
592 Just as with list comprehensions, boolean expressions can be freely mixed
593 among the pattern guards. For example:
594 </para>
595
596 <programlisting>
597 f x | [y] &lt;- x
598 , y > 3
599 , Just z &lt;- h y
600 = ...
601 </programlisting>
602
603 <para>
604 Haskell's current guards therefore emerge as a special case, in which the
605 qualifier list has just one element, a boolean expression.
606 </para>
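<para>To make the example concrete, here is a runnable sketch of
<function>clunky</function> written with pattern guards. It is an adaptation,
not the original code: the abstract <literal>FiniteMap</literal> is replaced by
an association list, and the Prelude's <function>lookup</function> is used, so
the key comes before the environment in each call:</para>
<programlisting>
module Main where

-- Assumption: the environment is a plain association list.
clunky :: [(Int, Int)] -> Int -> Int -> Int
clunky env var1 var2
  | Just val1 &lt;- lookup var1 env
  , Just val2 &lt;- lookup var2 env
  = val1 + val2
-- Reached when either pattern guard fails to match.
clunky _ var1 var2 = var1 + var2

main :: IO ()
main = do
  print (clunky [(1, 10), (2, 20)] 1 2)  -- both lookups succeed: 30
  print (clunky [(1, 10)] 1 2)           -- second lookup fails: 3
</programlisting>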
607 </sect2>
608
609 <!-- ===================== View patterns =================== -->
610
611 <sect2 id="view-patterns">
612 <title>View patterns
613 </title>
614
615 <para>
616 View patterns are enabled by the flag <literal>-XViewPatterns</literal>.
617 More information and examples of view patterns can be found on the
618 <ulink url="http://hackage.haskell.org/trac/ghc/wiki/ViewPatterns">Wiki
619 page</ulink>.
620 </para>
621
622 <para>
623 View patterns are somewhat like pattern guards that can be nested inside
624 of other patterns. They are a convenient way of pattern-matching
625 against values of abstract types. For example, in a programming language
626 implementation, we might represent the syntax of the types of the
627 language as follows:
628
629 <programlisting>
630 type Typ
631
632 data TypView = Unit
633 | Arrow Typ Typ
634
635 view :: Typ -> TypView
636
637 -- additional operations for constructing Typ's ...
638 </programlisting>
639
640 The representation of Typ is held abstract, permitting implementations
641 to use a fancy representation (e.g., hash-consing to manage sharing).
642
643 Without view patterns, using this signature is a little inconvenient:
644 <programlisting>
645 size :: Typ -> Integer
646 size t = case view t of
647 Unit -> 1
648 Arrow t1 t2 -> size t1 + size t2
649 </programlisting>
650
651 It is necessary to iterate the case, rather than using an equational
652 function definition. And the situation is even worse when the matching
653 against <literal>t</literal> is buried deep inside another pattern.
654 </para>
655
656 <para>
657 View patterns permit calling the view function inside the pattern and
658 matching against the result:
659 <programlisting>
660 size (view -> Unit) = 1
661 size (view -> Arrow t1 t2) = size t1 + size t2
662 </programlisting>
663
664 That is, we add a new form of pattern, written
665 <replaceable>expression</replaceable> <literal>-></literal>
666 <replaceable>pattern</replaceable> that means "apply the expression to
667 whatever we're trying to match against, and then match the result of
668 that application against the pattern". The expression can be any Haskell
669 expression of function type, and view patterns can be used wherever
670 patterns are used.
671 </para>
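<para>The fragments above can be assembled into a complete module. The
following is a sketch under one assumption: <literal>Typ</literal> is given a
trivial concrete representation (a newtype around <literal>TypView</literal>)
so that the example runs, whereas a real implementation would keep it abstract.
The names <literal>MkTyp</literal>, <literal>unit</literal> and
<literal>arrow</literal> are hypothetical:</para>
<programlisting>
{-# LANGUAGE ViewPatterns #-}
module Main where

data TypView = Unit
             | Arrow Typ Typ

-- Stand-in representation; an abstract type would hide this constructor.
newtype Typ = MkTyp TypView

view :: Typ -> TypView
view (MkTyp v) = v

-- Hypothetical constructors for building Typ values.
unit :: Typ
unit = MkTyp Unit

arrow :: Typ -> Typ -> Typ
arrow a b = MkTyp (Arrow a b)

-- Each equation applies view to the argument and matches on the result.
size :: Typ -> Integer
size (view -> Unit) = 1
size (view -> Arrow t1 t2) = size t1 + size t2

main :: IO ()
main = print (size (arrow unit (arrow unit unit)))
</programlisting>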
672
673 <para>
674 The semantics of a pattern <literal>(</literal>
675 <replaceable>exp</replaceable> <literal>-></literal>
676 <replaceable>pat</replaceable> <literal>)</literal> are as follows:
677
678 <itemizedlist>
679
680 <listitem> Scoping:
681
682 <para>The variables bound by the view pattern are the variables bound by
683 <replaceable>pat</replaceable>.
684 </para>
685
686 <para>
687 Any variables in <replaceable>exp</replaceable> are bound occurrences,
688 but variables bound "to the left" in a pattern are in scope. This
689 feature permits, for example, one argument to a function to be used in
690 the view of another argument. For example, the function
691 <literal>clunky</literal> from <xref linkend="pattern-guards" /> can be
692 written using view patterns as follows:
693
694 <programlisting>
695 clunky env (lookup env -> Just val1) (lookup env -> Just val2) = val1 + val2
696 ...other equations for clunky...
697 </programlisting>
698 </para>
699
700 <para>
701 More precisely, the scoping rules are:
702 <itemizedlist>
703 <listitem>
704 <para>
705 In a single pattern, variables bound by patterns to the left of a view
706 pattern expression are in scope. For example:
707 <programlisting>
708 example :: Maybe ((String -> Integer,Integer), String) -> Bool
709 example (Just ((f,_), f -> 4)) = True
710 </programlisting>
711
712 Additionally, in function definitions, variables bound by matching earlier curried
713 arguments may be used in view pattern expressions in later arguments:
714 <programlisting>
715 example :: (String -> Integer) -> String -> Bool
716 example f (f -> 4) = True
717 </programlisting>
718 That is, the scoping is the same as it would be if the curried arguments
719 were collected into a tuple.
720 </para>
721 </listitem>
722
723 <listitem>
724 <para>
725 In mutually recursive bindings, such as <literal>let</literal>,
726 <literal>where</literal>, or the top level, view patterns in one
727 declaration may not mention variables bound by other declarations. That
728 is, each declaration must be self-contained. For example, the following
729 program is not allowed:
730 <programlisting>
731 let {(x -> y) = e1 ;
732 (y -> x) = e2 } in x
733 </programlisting>
734
735 (For some amplification on this design choice see
736 <ulink url="http://hackage.haskell.org/trac/ghc/ticket/4061">Trac #4061</ulink>.)
737
738 </para>
739 </listitem>
740 </itemizedlist>
741
742 </para>
743 </listitem>
744
745 <listitem><para> Typing: If <replaceable>exp</replaceable> has type
746 <replaceable>T1</replaceable> <literal>-></literal>
747 <replaceable>T2</replaceable> and <replaceable>pat</replaceable> matches
748 a <replaceable>T2</replaceable>, then the whole view pattern matches a
749 <replaceable>T1</replaceable>.
750 </para></listitem>
751
752 <listitem><para> Matching: To the equations in Section 3.17.3 of the
753 <ulink url="http://www.haskell.org/onlinereport/">Haskell 98
754 Report</ulink>, add the following:
755 <programlisting>
756 case v of { (e -> p) -> e1 ; _ -> e2 }
757 =
758 case (e v) of { p -> e1 ; _ -> e2 }
759 </programlisting>
760 That is, to match a variable <replaceable>v</replaceable> against a pattern
761 <literal>(</literal> <replaceable>exp</replaceable>
762 <literal>-></literal> <replaceable>pat</replaceable>
763 <literal>)</literal>, evaluate <literal>(</literal>
764 <replaceable>exp</replaceable> <replaceable> v</replaceable>
765 <literal>)</literal> and match the result against
766 <replaceable>pat</replaceable>.
767 </para></listitem>
768
769 <listitem><para> Efficiency: When the same view function is applied in
770 multiple branches of a function definition or a case expression (e.g.,
771 in <literal>size</literal> above), GHC makes an attempt to collect these
772 applications into a single nested case expression, so that the view
773 function is only applied once. Pattern compilation in GHC follows the
774 matrix algorithm described in Chapter 4 of <ulink
775 url="http://research.microsoft.com/~simonpj/Papers/slpj-book-1987/">The
776 Implementation of Functional Programming Languages</ulink>. When the
777 top rows of the first column of a matrix are all view patterns with the
778 "same" expression, these patterns are transformed into a single nested
779 case. This includes, for example, adjacent view patterns that line up
780 in a tuple, as in
781 <programlisting>
782 f ((view -> A, p1), p2) = e1
783 f ((view -> B, p3), p4) = e2
784 </programlisting>
785 </para>
786
787 <para> The current notion of when two view pattern expressions are "the
788 same" is very restricted: it is not even full syntactic equality.
789 However, it does include variables, literals, applications, and tuples;
790 e.g., two instances of <literal>view ("hi", "there")</literal> will be
791 collected. However, the current implementation does not compare up to
792 alpha-equivalence, so two instances of <literal>(x, view x ->
793 y)</literal> will not be coalesced.
794 </para>
795
796 </listitem>
797
798 </itemizedlist>
799 </para>
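<para>The scoping rule for earlier curried arguments can be exercised with a
runnable variant of <literal>clunky</literal>. This sketch assumes an
association list for the environment and the Prelude's
<function>lookup</function>, whose argument order requires
<function>flip</function> here. The variable <literal>env</literal>, bound by
the first argument, is in scope in the view expressions of the later
arguments:</para>
<programlisting>
{-# LANGUAGE ViewPatterns #-}
module Main where

clunky :: [(Int, Int)] -> Int -> Int -> Int
clunky env (flip lookup env -> Just val1) (flip lookup env -> Just val2) = val1 + val2
-- Reached when either view pattern fails to match.
clunky _ var1 var2 = var1 + var2

main :: IO ()
main = do
  print (clunky [(1, 10), (2, 20)] 1 2)  -- both views match: 30
  print (clunky [(1, 10)] 1 2)           -- second view yields Nothing: 3
</programlisting>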
800
801 </sect2>
802
803 <!-- ===================== n+k patterns =================== -->
804
805 <sect2 id="n-k-patterns">
806 <title>n+k patterns</title>
807 <indexterm><primary><option>-XNPlusKPatterns</option></primary></indexterm>
808
809 <para>
810 <literal>n+k</literal> pattern support is disabled by default. To enable
811 it, you can use the <option>-XNPlusKPatterns</option> flag.
812 </para>
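<para>For illustration, a small module that relies on the extension. The
function <literal>factorial</literal> is just an example; the pattern
<literal>(n+1)</literal> matches any argument that is at least 1 and binds
<literal>n</literal> to the argument minus one:</para>
<programlisting>
{-# LANGUAGE NPlusKPatterns #-}
module Main where

factorial :: Integer -> Integer
factorial 0     = 1
factorial (n+1) = (n + 1) * factorial n

main :: IO ()
main = print (factorial 5)
</programlisting>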
813
814 </sect2>
815
816 <!-- ===================== Traditional record syntax =================== -->
817
818 <sect2 id="traditional-record-syntax">
819 <title>Traditional record syntax</title>
820 <indexterm><primary><option>-XNoTraditionalRecordSyntax</option></primary></indexterm>
821
822 <para>
823 Traditional record syntax, such as <literal>C {f = x}</literal>, is enabled by default.
824 To disable it, you can use the <option>-XNoTraditionalRecordSyntax</option> flag.
825 </para>
826
827 </sect2>
828
829 <!-- ===================== Recursive do-notation =================== -->
830
831 <sect2 id="recursive-do-notation">
832 <title>The recursive do-notation
833 </title>
834
835 <para>
836 The do-notation of Haskell 98 does not allow <emphasis>recursive bindings</emphasis>,
837 that is, the variables bound in a do-expression are visible only in the textually following
838 code block. Compare this to a let-expression, where bound variables are visible in the entire binding
839 group.
840 </para>
841
842 <para>
843 It turns out that such recursive bindings do indeed make sense for a variety of monads, but
844 not all. In particular, recursion in this sense requires a fixed-point operator for the underlying
845 monad, captured by the <literal>mfix</literal> method of the <literal>MonadFix</literal> class, defined in <literal>Control.Monad.Fix</literal> as follows:
846 <programlisting>
847 class Monad m => MonadFix m where
848 mfix :: (a -> m a) -> m a
849 </programlisting>
850 Haskell's
851 <literal>Maybe</literal>, <literal>[]</literal> (list), <literal>ST</literal> (both strict and lazy versions),
852 <literal>IO</literal>, and many other monads have <literal>MonadFix</literal> instances. On the negative
853 side, the continuation monad, with the signature <literal>(a -> r) -> r</literal>, does not.
854 </para>
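<para>
As a minimal sketch of calling <literal>mfix</literal> directly (the names <literal>ones</literal> and <literal>main</literal> are illustrative and not part of any library), the following ties a recursive knot in the <literal>Maybe</literal> monad:
<programlisting>
import Control.Monad.Fix (mfix)

-- The function passed to mfix receives its own eventual result, xs.
-- It must not force xs before returning, or the computation diverges.
ones :: Maybe [Int]
ones = mfix (\xs -> Just (1 : xs))

main :: IO ()
main = print (fmap (take 3) ones)   -- prints: Just [1,1,1]
</programlisting>
The resulting list is infinite, so the example only inspects a finite prefix.
</para>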
855
856 <para>
857 For monads that do belong to the <literal>MonadFix</literal> class, GHC provides
858 an extended version of the do-notation that allows recursive bindings.
The <option>-XRecursiveDo</option> flag (language pragma: <literal>RecursiveDo</literal>)
860 provides the necessary syntactic support, introducing the keywords <literal>mdo</literal> and
861 <literal>rec</literal> for higher and lower levels of the notation respectively. Unlike
862 bindings in a <literal>do</literal> expression, those introduced by <literal>mdo</literal> and <literal>rec</literal>
863 are recursively defined, much like in an ordinary let-expression. Due to the new
864 keyword <literal>mdo</literal>, we also call this notation the <emphasis>mdo-notation</emphasis>.
865 </para>
866
867 <para>
868 Here is a simple (albeit contrived) example:
869 <programlisting>
870 {-# LANGUAGE RecursiveDo #-}
871 justOnes = mdo { xs &lt;- Just (1:xs)
872 ; return (map negate xs) }
873 </programlisting>
874 or equivalently
875 <programlisting>
876 {-# LANGUAGE RecursiveDo #-}
877 justOnes = do { rec { xs &lt;- Just (1:xs) }
878 ; return (map negate xs) }
879 </programlisting>
880 As you can guess <literal>justOnes</literal> will evaluate to <literal>Just [-1,-1,-1,...</literal>.
881 </para>
882
883 <para>
GHC's implementation of the mdo-notation closely follows the original translation as described in the paper
885 <ulink url="https://sites.google.com/site/leventerkok/recdo.pdf">A recursive do for Haskell</ulink>, which
886 in turn is based on the work <ulink url="http://sites.google.com/site/leventerkok/erkok-thesis.pdf">Value Recursion
887 in Monadic Computations</ulink>. Furthermore, GHC extends the syntax described in the former paper
888 with a lower level syntax flagged by the <literal>rec</literal> keyword, as we describe next.
889 </para>
890
891 <sect3>
892 <title>Recursive binding groups</title>
893
894 <para>
895 The flag <option>-XRecursiveDo</option> also introduces a new keyword <literal>rec</literal>, which wraps a
896 mutually-recursive group of monadic statements inside a <literal>do</literal> expression, producing a single statement.
897 Similar to a <literal>let</literal> statement inside a <literal>do</literal>, variables bound in
898 the <literal>rec</literal> are visible throughout the <literal>rec</literal> group, and below it. For example, compare
899 <programlisting>
do { a &lt;- getChar              do { a &lt;- getChar
   ; let { r1 = f a r2            ; rec { r1 &lt;- f a r2
   ;     ; r2 = g r1 }            ;     ; r2 &lt;- g r1 }
   ; return (r1 ++ r2) }          ; return (r1 ++ r2) }
904 </programlisting>
905 In both cases, <literal>r1</literal> and <literal>r2</literal> are available both throughout
906 the <literal>let</literal> or <literal>rec</literal> block, and in the statements that follow it.
907 The difference is that <literal>let</literal> is non-monadic, while <literal>rec</literal> is monadic.
908 (In Haskell <literal>let</literal> is really <literal>letrec</literal>, of course.)
909 </para>
910
911 <para>
912 The semantics of <literal>rec</literal> is fairly straightforward. Whenever GHC finds a <literal>rec</literal>
913 group, it will compute its set of bound variables, and will introduce an appropriate call
914 to the underlying monadic value-recursion operator <literal>mfix</literal>, belonging to the
915 <literal>MonadFix</literal> class. Here is an example:
916 <programlisting>
rec { b &lt;- f a c     ===>    (b,c) &lt;- mfix (\~(b,c) -> do { b &lt;- f a c
    ; c &lt;- f b a }                                        ; c &lt;- f b a
                                                          ; return (b,c) })
920 </programlisting>
921 As usual, the meta-variables <literal>b</literal>, <literal>c</literal> etc., can be arbitrary patterns.
922 In general, the statement <literal>rec <replaceable>ss</replaceable></literal> is desugared to the statement
923 <programlisting>
924 <replaceable>vs</replaceable> &lt;- mfix (\~<replaceable>vs</replaceable> -&gt; do { <replaceable>ss</replaceable>; return <replaceable>vs</replaceable> })
925 </programlisting>
926 where <replaceable>vs</replaceable> is a tuple of the variables bound by <replaceable>ss</replaceable>.
927 </para>
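<para>
The translation above can be sketched on a concrete case. The example below is an illustration under our own assumptions, not GHC's literal output: the names <literal>withRec</literal> and <literal>byHand</literal> are hypothetical, and the hand-written version only approximates the desugaring. Both definitions build two mutually dependent lists in the <literal>Maybe</literal> monad:
<programlisting>
{-# LANGUAGE RecursiveDo #-}
import Control.Monad.Fix (mfix)

-- Using a rec block: evens and odds refer to each other.
withRec :: Maybe ([Int], [Int])
withRec = do
  rec evens &lt;- Just (0 : map (+ 1) odds)
      odds  &lt;- Just (map (+ 1) evens)
  return (take 3 evens, take 3 odds)

-- Roughly the same thing written by hand with mfix. The lazy pattern
-- (note the space between the backslash and the tilde) keeps the lambda
-- from forcing its argument before the tuple has been built.
byHand :: Maybe ([Int], [Int])
byHand = do
  (evens, odds) &lt;- mfix (\ ~(es, os) -> do
                       es' &lt;- Just (0 : map (+ 1) os)
                       os' &lt;- Just (map (+ 1) es)
                       return (es', os'))
  return (take 3 evens, take 3 odds)

main :: IO ()
main = do
  print withRec   -- prints: Just ([0,2,4],[1,3,5])
  print byHand    -- prints: Just ([0,2,4],[1,3,5])
</programlisting>
</para>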
928
929 <para>
930 Note in particular that the translation for a <literal>rec</literal> block only involves wrapping a call
931 to <literal>mfix</literal>: it performs no other analysis on the bindings. The latter is the task
932 for the <literal>mdo</literal> notation, which is described next.
933 </para>
934 </sect3>
935
936 <sect3>
937 <title>The <literal>mdo</literal> notation</title>
938
939 <para>
940 A <literal>rec</literal>-block tells the compiler where precisely the recursive knot should be tied. It turns out that
941 the placement of the recursive knots can be rather delicate: in particular, we would like the knots to be wrapped
942 around as minimal groups as possible. This process is known as <emphasis>segmentation</emphasis>, and is described
in detail in Section 3.2 of <ulink url="https://sites.google.com/site/leventerkok/recdo.pdf">A recursive do for
944 Haskell</ulink>. Segmentation improves polymorphism and reduces the size of the recursive knot. Most importantly, it avoids
945 unnecessary interference caused by a fundamental issue with the so-called <emphasis>right-shrinking</emphasis>
946 axiom for monadic recursion. In brief, most monads of interest (IO, strict state, etc.) do <emphasis>not</emphasis>
947 have recursion operators that satisfy this axiom, and thus not performing segmentation can cause unnecessary
948 interference, changing the termination behavior of the resulting translation.
949 (Details can be found in Sections 3.1 and 7.2.2 of
950 <ulink url="http://sites.google.com/site/leventerkok/erkok-thesis.pdf">Value Recursion in Monadic Computations</ulink>.)
951 </para>
952
953 <para>
954 The <literal>mdo</literal> notation removes the burden of placing
955 explicit <literal>rec</literal> blocks in the code. Unlike an
956 ordinary <literal>do</literal> expression, in which variables bound by
957 statements are only in scope for later statements, variables bound in
958 an <literal>mdo</literal> expression are in scope for all statements
959 of the expression. The compiler then automatically identifies minimal
960 mutually recursively dependent segments of statements, treating them as
961 if the user had wrapped a <literal>rec</literal> qualifier around them.
962 </para>
963
964 <para>
965 The definition is syntactic:
966 </para>
967 <itemizedlist>
968 <listitem>
969 <para>
970 A generator <replaceable>g</replaceable>
971 <emphasis>depends</emphasis> on a textually following generator
972 <replaceable>g'</replaceable>, if
973 </para>
974 <itemizedlist>
975 <listitem>
976 <para>
977 <replaceable>g'</replaceable> defines a variable that
978 is used by <replaceable>g</replaceable>, or
979 </para>
980 </listitem>
981 <listitem>
982 <para>
983 <replaceable>g'</replaceable> textually appears between
984 <replaceable>g</replaceable> and
985 <replaceable>g''</replaceable>, where <replaceable>g</replaceable>
986 depends on <replaceable>g''</replaceable>.
987 </para>
988 </listitem>
989 </itemizedlist>
990 </listitem>
991 <listitem>
992 <para>
993 A <emphasis>segment</emphasis> of a given
994 <literal>mdo</literal>-expression is a minimal sequence of generators
995 such that no generator of the sequence depends on an outside
996 generator. As a special case, although it is not a generator,
997 the final expression in an <literal>mdo</literal>-expression is
998 considered to form a segment by itself.
999 </para>
1000 </listitem>
1001 </itemizedlist>
1002 <para>
1003 Segments in this sense are
1004 related to <emphasis>strongly-connected components</emphasis> analysis,
1005 with the exception that bindings in a segment cannot be reordered and
1006 must be contiguous.
1007 </para>
1008
1009 <para>
1010 Here is an example <literal>mdo</literal>-expression, and its translation to <literal>rec</literal> blocks:
1011 <programlisting>
mdo { a &lt;- getChar      ===> do { a &lt;- getChar
    ; b &lt;- f a c                ; rec { b &lt;- f a c
    ; c &lt;- f b a                ;     ; c &lt;- f b a }
    ; z &lt;- h a b                ; z &lt;- h a b
    ; d &lt;- g d e                ; rec { d &lt;- g d e
    ; e &lt;- g a z                ;     ; e &lt;- g a z }
    ; putChar c }               ; putChar c }
1019 </programlisting>
1020 Note that a given <literal>mdo</literal> expression can cause the creation of multiple <literal>rec</literal> blocks.
1021 If there are no recursive dependencies, <literal>mdo</literal> will introduce no <literal>rec</literal> blocks. In this
1022 latter case an <literal>mdo</literal> expression is precisely the same as a <literal>do</literal> expression, as one
1023 would expect.
1024 </para>
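<para>
For a runnable illustration of <literal>mdo</literal> in the <literal>IO</literal> monad, consider the following sketch. The <literal>Node</literal> type and the function <literal>mkRing</literal> are hypothetical names introduced here; the example builds a two-node ring of mutable references in which the first node refers to the second before the second has been defined:
<programlisting>
{-# LANGUAGE RecursiveDo #-}
import Data.IORef (IORef, newIORef, readIORef)

-- A node in a mutable ring: a label and a reference to the next node.
data Node = Node Int (IORef Node)

-- n2 is used before it is bound, so the statements from r1 down to n2
-- are expected to end up in a single recursive segment.
mkRing :: IO Node
mkRing = mdo
  r1 &lt;- newIORef n2
  let n1 = Node 1 r1
  r2 &lt;- newIORef n1
  let n2 = Node 2 r2
  return n1

main :: IO ()
main = do
  Node a r  &lt;- mkRing
  Node b r' &lt;- readIORef r
  Node c _  &lt;- readIORef r'
  print [a, b, c]   -- prints: [1,2,1]
</programlisting>
This works because <literal>newIORef</literal> stores its argument without evaluating it; a statement that forced <literal>n2</literal> before it was bound would not terminate normally.
</para>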
1025
1026 <para>
1027 In summary, given an <literal>mdo</literal> expression, GHC first performs segmentation, introducing
1028 <literal>rec</literal> blocks to wrap over minimal recursive groups. Then, each resulting
1029 <literal>rec</literal> is desugared, using a call to <literal>Control.Monad.Fix.mfix</literal> as described
1030 in the previous section. The original <literal>mdo</literal>-expression typechecks exactly when the desugared
1031 version would do so.
1032 </para>
1033
1034 <para>
1035 Here are some other important points in using the recursive-do notation:
1036
1037 <itemizedlist>
1038 <listitem>
1039 <para>
1040 It is enabled with the flag <literal>-XRecursiveDo</literal>, or the <literal>LANGUAGE RecursiveDo</literal>
1041 pragma. (The same flag enables both <literal>mdo</literal>-notation, and the use of <literal>rec</literal>
1042 blocks inside <literal>do</literal> expressions.)
1043 </para>
1044 </listitem>
1045 <listitem>
1046 <para>
<literal>rec</literal> blocks can also be used inside <literal>mdo</literal>-expressions, where each block is
treated as a single statement. However, it is good style to use either <literal>mdo</literal> or
<literal>rec</literal> blocks in a single expression, not both.
1050 </para>
1051 </listitem>
1052 <listitem>
1053 <para>
1054 If recursive bindings are required for a monad, then that monad must be declared an instance of
1055 the <literal>MonadFix</literal> class.
1056 </para>
1057 </listitem>
1058 <listitem>
1059 <para>
1060 The following instances of <literal>MonadFix</literal> are automatically provided: List, Maybe, IO.
1061 Furthermore, the <literal>Control.Monad.ST</literal> and <literal>Control.Monad.ST.Lazy</literal>
1062 modules provide the instances of the <literal>MonadFix</literal> class for Haskell's internal
1063 state monad (strict and lazy, respectively).
1064 </para>
1065 </listitem>
1066 <listitem>
1067 <para>
As with <literal>let</literal> and <literal>where</literal> bindings, name shadowing is not allowed within
1069 an <literal>mdo</literal>-expression or a <literal>rec</literal>-block; that is, all the names bound in
1070 a single <literal>rec</literal> must be distinct. (GHC will complain if this is not the case.)
1071 </para>
1072 </listitem>
1073 </itemizedlist>
1074 </para>
1075 </sect3>
1076
1077
1078 </sect2>
1079
1080
1081 <!-- ===================== PARALLEL LIST COMPREHENSIONS =================== -->
1082
1083 <sect2 id="parallel-list-comprehensions">
1084 <title>Parallel List Comprehensions</title>
1085 <indexterm><primary>list comprehensions</primary><secondary>parallel</secondary>
1086 </indexterm>
1087 <indexterm><primary>parallel list comprehensions</primary>
1088 </indexterm>
1089
1090 <para>Parallel list comprehensions are a natural extension to list
1091 comprehensions. List comprehensions can be thought of as a nice
1092 syntax for writing maps and filters. Parallel comprehensions
1093 extend this to include the zipWith family.</para>
1094
1095 <para>A parallel list comprehension has multiple independent
1096 branches of qualifier lists, each separated by a `|' symbol. For
1097 example, the following zips together two lists:</para>
1098
1099 <programlisting>
1100 [ (x, y) | x &lt;- xs | y &lt;- ys ]
1101 </programlisting>
1102
1103 <para>The behaviour of parallel list comprehensions follows that of
1104 zip, in that the resulting list will have the same length as the
1105 shortest branch.</para>
1106
1107 <para>We can define parallel list comprehensions by translation to
1108 regular comprehensions. Here's the basic idea:</para>
1109
1110 <para>Given a parallel comprehension of the form: </para>
1111
1112 <programlisting>
1113 [ e | p1 &lt;- e11, p2 &lt;- e12, ...
1114 | q1 &lt;- e21, q2 &lt;- e22, ...
1115 ...
1116 ]
1117 </programlisting>
1118
1119 <para>This will be translated to: </para>
1120
1121 <programlisting>
1122 [ e | ((p1,p2), (q1,q2), ...) &lt;- zipN [(p1,p2) | p1 &lt;- e11, p2 &lt;- e12, ...]
1123 [(q1,q2) | q1 &lt;- e21, q2 &lt;- e22, ...]
1124 ...
1125 ]
1126 </programlisting>
1127
1128 <para>where `zipN' is the appropriate zip for the given number of
1129 branches.</para>
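<para>
As a small sketch of this translation (the names <literal>viaSugar</literal> and <literal>viaZip</literal> are illustrative), the following compares a two-branch parallel comprehension, enabled here with the <literal>ParallelListComp</literal> language pragma, against a hand-written version using <literal>zip</literal>:
<programlisting>
{-# LANGUAGE ParallelListComp #-}

-- Two branches: the first has two generators, the second has one.
viaSugar :: [(Int, Int, Char)]
viaSugar = [ (x, y, c) | x &lt;- [1, 2], y &lt;- [10, 20]
                       | c &lt;- "abc" ]

-- The same list, written as an ordinary comprehension over zip.
viaZip :: [(Int, Int, Char)]
viaZip = [ (x, y, c)
         | ((x, y), c) &lt;- zip [ (x, y) | x &lt;- [1, 2], y &lt;- [10, 20] ] "abc" ]

main :: IO ()
main = do
  print viaSugar             -- prints: [(1,10,'a'),(1,20,'b'),(2,10,'c')]
  print (viaSugar == viaZip) -- prints: True
</programlisting>
The first branch produces four pairs, but the result has only three elements because the second branch is shorter.
</para>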
1130
1131 </sect2>
1132
1133 <!-- ===================== TRANSFORM LIST COMPREHENSIONS =================== -->
1134
1135 <sect2 id="generalised-list-comprehensions">
1136 <title>Generalised (SQL-Like) List Comprehensions</title>
1137 <indexterm><primary>list comprehensions</primary><secondary>generalised</secondary>
1138 </indexterm>
1139 <indexterm><primary>extended list comprehensions</primary>
1140 </indexterm>
1141 <indexterm><primary>group</primary></indexterm>
1142 <indexterm><primary>sql</primary></indexterm>
1143
1144
1145 <para>Generalised list comprehensions are a further enhancement to the
1146 list comprehension syntactic sugar to allow operations such as sorting
1147 and grouping which are familiar from SQL. They are fully described in the
1148 paper <ulink url="http://research.microsoft.com/~simonpj/papers/list-comp">
1149 Comprehensive comprehensions: comprehensions with "order by" and "group by"</ulink>,
1150 except that the syntax we use differs slightly from the paper.</para>
1151 <para>The extension is enabled with the flag <option>-XTransformListComp</option>.</para>
1152 <para>Here is an example:
1153 <programlisting>
1154 employees = [ ("Simon", "MS", 80)
1155 , ("Erik", "MS", 100)
1156 , ("Phil", "Ed", 40)
1157 , ("Gordon", "Ed", 45)
1158 , ("Paul", "Yale", 60)]
1159
1160 output = [ (the dept, sum salary)
1161 | (name, dept, salary) &lt;- employees
1162 , then group by dept using groupWith
1163 , then sortWith by (sum salary)
1164 , then take 5 ]
1165 </programlisting>
1166 In this example, the list <literal>output</literal> would take on
1167 the value:
1168
1169 <programlisting>
1170 [("Yale", 60), ("Ed", 85), ("MS", 180)]
1171 </programlisting>
1172 </para>
1173 <para>There are three new keywords: <literal>group</literal>, <literal>by</literal>, and <literal>using</literal>.
1174 (The functions <literal>sortWith</literal> and <literal>groupWith</literal> are not keywords; they are ordinary
1175 functions that are exported by <literal>GHC.Exts</literal>.)</para>
1176
<para>There are four new forms of comprehension qualifier,
1178 all introduced by the (existing) keyword <literal>then</literal>:
1179 <itemizedlist>
1180 <listitem>
1181
1182 <programlisting>
1183 then f
1184 </programlisting>
1185
<para>This statement requires that <literal>f</literal> have the type
<literal>forall a. [a] -> [a]</literal>. You can see an example of its use in the
opening example, where this form is used to apply <literal>take 5</literal>.</para>
1189
1190 </listitem>
1191
1192
1193 <listitem>
1194 <para>
1195 <programlisting>
1196 then f by e
1197 </programlisting>
1198
1199 This form is similar to the previous one, but allows you to create a function
1200 which will be passed as the first argument to f. As a consequence f must have
1201 the type <literal>forall a. (a -> t) -> [a] -> [a]</literal>. As you can see
1202 from the type, this function lets f &quot;project out&quot; some information
1203 from the elements of the list it is transforming.</para>
1204
1205 <para>An example is shown in the opening example, where <literal>sortWith</literal>
1206 is supplied with a function that lets it find out the <literal>sum salary</literal>
1207 for any item in the list comprehension it transforms.</para>
1208
1209 </listitem>
1210
1211
1212 <listitem>
1213
1214 <programlisting>
1215 then group by e using f
1216 </programlisting>
1217
1218 <para>This is the most general of the grouping-type statements. In this form,
1219 f is required to have type <literal>forall a. (a -> t) -> [a] -> [[a]]</literal>.
1220 As with the <literal>then f by e</literal> case above, the first argument
1221 is a function supplied to f by the compiler which lets it compute e on every
1222 element of the list being transformed. However, unlike the non-grouping case,
1223 f additionally partitions the list into a number of sublists: this means that
1224 at every point after this statement, binders occurring before it in the comprehension
1225 refer to <emphasis>lists</emphasis> of possible values, not single values. To help understand
1226 this, let's look at an example:</para>
1227
1228 <programlisting>
1229 -- This works similarly to groupWith in GHC.Exts, but doesn't sort its input first
1230 groupRuns :: Eq b => (a -> b) -> [a] -> [[a]]
1231 groupRuns f = groupBy (\x y -> f x == f y)
1232
1233 output = [ (the x, y)
1234 | x &lt;- ([1..3] ++ [1..2])
1235 , y &lt;- [4..6]
1236 , then group by x using groupRuns ]
1237 </programlisting>
1238
1239 <para>This results in the variable <literal>output</literal> taking on the value below:</para>
1240
1241 <programlisting>
1242 [(1, [4, 5, 6]), (2, [4, 5, 6]), (3, [4, 5, 6]), (1, [4, 5, 6]), (2, [4, 5, 6])]
1243 </programlisting>
1244
1245 <para>Note that we have used the <literal>the</literal> function to change the type
1246 of x from a list to its original numeric type. The variable y, in contrast, is left
1247 unchanged from the list form introduced by the grouping.</para>
1248
1249 </listitem>
1250
1251 <listitem>
1252
1253 <programlisting>
1254 then group using f
1255 </programlisting>
1256
1257 <para>With this form of the group statement, f is required to simply have the type
1258 <literal>forall a. [a] -> [[a]]</literal>, which will be used to group up the
1259 comprehension so far directly. An example of this form is as follows:</para>
1260
1261 <programlisting>
1262 output = [ x
1263 | y &lt;- [1..5]
1264 , x &lt;- "hello"
1265 , then group using inits]
1266 </programlisting>
1267
1268 <para>This will yield a list containing every prefix of the word "hello" written out 5 times:</para>
1269
1270 <programlisting>
1271 ["","h","he","hel","hell","hello","helloh","hellohe","hellohel","hellohell","hellohello","hellohelloh",...]
1272 </programlisting>
1273
1274 </listitem>
1275 </itemizedlist>
1276 </para>
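<para>
The forms above can be tried out with a short, self-contained sketch. The data and the names <literal>scores</literal>, <literal>topTwo</literal> and <literal>byScore</literal> are made up for illustration; only <literal>sortWith</literal>, <literal>groupWith</literal> and <literal>the</literal> come from <literal>GHC.Exts</literal>:
<programlisting>
{-# LANGUAGE TransformListComp #-}
import GHC.Exts (groupWith, sortWith, the)

scores :: [(String, Int)]
scores = [("ann", 3), ("bob", 7), ("cy", 5), ("dee", 7)]

-- "then f by e" followed by "then f": sort by descending score, keep two.
topTwo :: [String]
topTwo = [ name | (name, score) &lt;- scores
                , then sortWith by (negate score)
                , then take 2 ]

-- "then group by e using f": after grouping, name and score are lists.
byScore :: [(Int, [String])]
byScore = [ (the score, name) | (name, score) &lt;- scores
                              , then group by score using groupWith ]

main :: IO ()
main = do
  print topTwo    -- prints: ["bob","dee"]
  print byScore   -- prints: [(3,["ann"]),(5,["cy"]),(7,["bob","dee"])]
</programlisting>
</para>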
1277 </sect2>
1278
1279 <!-- ===================== MONAD COMPREHENSIONS ===================== -->
1280
1281 <sect2 id="monad-comprehensions">
1282 <title>Monad comprehensions</title>
1283 <indexterm><primary>monad comprehensions</primary></indexterm>
1284
1285 <para>
1286 Monad comprehensions generalise the list comprehension notation,
1287 including parallel comprehensions
1288 (<xref linkend="parallel-list-comprehensions"/>) and
1289 transform comprehensions (<xref linkend="generalised-list-comprehensions"/>)
1290 to work for any monad.
1291 </para>
1292
1293 <para>Monad comprehensions support:</para>
1294
1295 <itemizedlist>
1296 <listitem>
1297 <para>
1298 Bindings:
1299 </para>
1300
1301 <programlisting>
1302 [ x + y | x &lt;- Just 1, y &lt;- Just 2 ]
1303 </programlisting>
1304
1305 <para>
1306 Bindings are translated with the <literal>(&gt;&gt;=)</literal> and
1307 <literal>return</literal> functions to the usual do-notation:
1308 </para>
1309
1310 <programlisting>
1311 do x &lt;- Just 1
1312 y &lt;- Just 2
1313 return (x+y)
1314 </programlisting>
1315
1316 </listitem>
1317 <listitem>
1318 <para>
1319 Guards:
1320 </para>
1321
1322 <programlisting>
1323 [ x | x &lt;- [1..10], x &lt;= 5 ]
1324 </programlisting>
1325
1326 <para>
1327 Guards are translated with the <literal>guard</literal> function,
1328 which requires a <literal>MonadPlus</literal> instance:
1329 </para>
1330
1331 <programlisting>
1332 do x &lt;- [1..10]
1333 guard (x &lt;= 5)
1334 return x
1335 </programlisting>
1336
1337 </listitem>
1338 <listitem>
1339 <para>
1340 Transform statements (as with <literal>-XTransformListComp</literal>):
1341 </para>
1342
1343 <programlisting>
1344 [ x+y | x &lt;- [1..10], y &lt;- [1..x], then take 2 ]
1345 </programlisting>
1346
1347 <para>
1348 This translates to:
1349 </para>
1350
1351 <programlisting>
1352 do (x,y) &lt;- take 2 (do x &lt;- [1..10]
1353 y &lt;- [1..x]
1354 return (x,y))
1355 return (x+y)
1356 </programlisting>
1357
1358 </listitem>
1359 <listitem>
1360 <para>
1361 Group statements (as with <literal>-XTransformListComp</literal>):
1362 </para>
1363
1364 <programlisting>
1365 [ x | x &lt;- [1,1,2,2,3], then group by x using GHC.Exts.groupWith ]
1366 [ x | x &lt;- [1,1,2,2,3], then group using myGroup ]
1367 </programlisting>
1368
1369 </listitem>
1370 <listitem>
1371 <para>
1372 Parallel statements (as with <literal>-XParallelListComp</literal>):
1373 </para>
1374
1375 <programlisting>
1376 [ (x+y) | x &lt;- [1..10]
1377 | y &lt;- [11..20]
1378 ]
1379 </programlisting>
1380
1381 <para>
1382 Parallel statements are translated using the
1383 <literal>mzip</literal> function, which requires a
1384 <literal>MonadZip</literal> instance defined in
1385 <ulink url="&libraryBaseLocation;/Control-Monad-Zip.html"><literal>Control.Monad.Zip</literal></ulink>:
1386 </para>
1387
1388 <programlisting>
1389 do (x,y) &lt;- mzip (do x &lt;- [1..10]
1390 return x)
1391 (do y &lt;- [11..20]
1392 return y)
1393 return (x+y)
1394 </programlisting>
1395
1396 </listitem>
1397 </itemizedlist>
1398
1399 <para>
1400 All these features are enabled by default if the
1401 <literal>MonadComprehensions</literal> extension is enabled. The types
1402 and more detailed examples on how to use comprehensions are explained
in the previous sections <xref
1404 linkend="generalised-list-comprehensions"/> and <xref
1405 linkend="parallel-list-comprehensions"/>. In general you just have
1406 to replace the type <literal>[a]</literal> with the type
1407 <literal>Monad m => m a</literal> for monad comprehensions.
1408 </para>
1409
1410 <para>
1411 Note: Even though most of these examples are using the list monad,
1412 monad comprehensions work for any monad.
1413 The <literal>base</literal> package offers all necessary instances for
1414 lists, which make <literal>MonadComprehensions</literal> backward
compatible with built-in, transform and parallel list comprehensions.
1416 </para>
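<para>
To illustrate a monad other than lists, here is a minimal sketch in the <literal>Maybe</literal> monad. The functions <literal>safeDiv</literal> and <literal>ratioSum</literal> are hypothetical names chosen for this example:
<programlisting>
{-# LANGUAGE MonadComprehensions #-}

-- A guard in the Maybe monad: Nothing when the divisor is zero.
safeDiv :: Int -> Int -> Maybe Int
safeDiv n d = [ n `div` d | d /= 0 ]

-- Two bindings in the Maybe monad.
ratioSum :: Int -> Int -> Int -> Maybe Int
ratioSum a b c = [ x + y | x &lt;- safeDiv a c, y &lt;- safeDiv b c ]

main :: IO ()
main = do
  print (ratioSum 10 20 5)   -- prints: Just 6
  print (ratioSum 10 20 0)   -- prints: Nothing
</programlisting>
</para>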
1417 <para> More formally, the desugaring is as follows. We write <literal>D[ e | Q]</literal>
1418 to mean the desugaring of the monad comprehension <literal>[ e | Q]</literal>:
1419 <programlisting>
1420 Expressions: e
1421 Declarations: d
1422 Lists of qualifiers: Q,R,S
1423
1424 -- Basic forms
1425 D[ e | ] = return e
1426 D[ e | p &lt;- e, Q ] = e &gt;&gt;= \p -&gt; D[ e | Q ]
D[ e | e, Q ] = guard e &gt;&gt; D[ e | Q ]
1428 D[ e | let d, Q ] = let d in D[ e | Q ]
1429
1430 -- Parallel comprehensions (iterate for multiple parallel branches)
1431 D[ e | (Q | R), S ] = mzip D[ Qv | Q ] D[ Rv | R ] &gt;&gt;= \(Qv,Rv) -&gt; D[ e | S ]
1432
1433 -- Transform comprehensions
1434 D[ e | Q then f, R ] = f D[ Qv | Q ] &gt;&gt;= \Qv -&gt; D[ e | R ]
1435
1436 D[ e | Q then f by b, R ] = f (\Qv -&gt; b) D[ Qv | Q ] &gt;&gt;= \Qv -&gt; D[ e | R ]
1437
1438 D[ e | Q then group using f, R ] = f D[ Qv | Q ] &gt;&gt;= \ys -&gt;
1439 case (fmap selQv1 ys, ..., fmap selQvn ys) of
1440 Qv -&gt; D[ e | R ]
1441
1442 D[ e | Q then group by b using f, R ] = f (\Qv -&gt; b) D[ Qv | Q ] &gt;&gt;= \ys -&gt;
1443 case (fmap selQv1 ys, ..., fmap selQvn ys) of
1444 Qv -&gt; D[ e | R ]
1445
1446 where Qv is the tuple of variables bound by Q (and used subsequently)
1447 selQvi is a selector mapping Qv to the ith component of Qv
1448
Operator     Standard binding       Expected type
--------------------------------------------------------------------
return       GHC.Base               t1 -&gt; m t2
(&gt;&gt;=)        GHC.Base               m1 t1 -&gt; (t2 -&gt; m2 t3) -&gt; m3 t3
(&gt;&gt;)         GHC.Base               m1 t1 -&gt; m2 t2 -&gt; m3 t3
guard        Control.Monad          t1 -&gt; m t2
fmap         GHC.Base               forall a b. (a-&gt;b) -&gt; n a -&gt; n b
mzip         Control.Monad.Zip      forall a b. m a -&gt; m b -&gt; m (a,b)
1457 </programlisting>
1458 The comprehension should typecheck when its desugaring would typecheck.
1459 </para>
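<para>
The basic-form rules can be followed by hand on a small case. The sketch below is our own worked example, not compiler output; <literal>sugared</literal> and <literal>desugared</literal> are illustrative names. It applies the generator, guard and <literal>let</literal> rules one qualifier at a time:
<programlisting>
{-# LANGUAGE MonadComprehensions #-}
import Control.Monad (guard)

sugared :: [(Int, Int)]
sugared = [ (x, y) | x &lt;- [1 .. 4], even x, let y = x * x ]

-- Generator rule, then guard rule, then let rule, then "return e".
desugared :: [(Int, Int)]
desugared =
  [1 .. 4] >>= \x ->
    guard (even x) >>
      (let y = x * x in return (x, y))

main :: IO ()
main = do
  print sugared                -- prints: [(2,4),(4,16)]
  print (sugared == desugared) -- prints: True
</programlisting>
</para>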
1460 <para>
1461 Monad comprehensions support rebindable syntax (<xref linkend="rebindable-syntax"/>).
1462 Without rebindable
1463 syntax, the operators from the "standard binding" module are used; with
1464 rebindable syntax, the operators are looked up in the current lexical scope.
1465 For example, parallel comprehensions will be typechecked and desugared
1466 using whatever "<literal>mzip</literal>" is in scope.
1467 </para>
1468 <para>
1469 The rebindable operators must have the "Expected type" given in the
1470 table above. These types are surprisingly general. For example, you can
1471 use a bind operator with the type
1472 <programlisting>
1473 (>>=) :: T x y a -> (a -> T y z b) -> T x z b
1474 </programlisting>
In the case of transform comprehensions, notice that the groups are
parameterised over some arbitrary type <literal>n</literal> (provided it
has an <literal>fmap</literal>), as well as
the comprehension being over an arbitrary monad.
1479 </para>
1480 </sect2>
1481
1482 <!-- ===================== REBINDABLE SYNTAX =================== -->
1483
1484 <sect2 id="rebindable-syntax">
1485 <title>Rebindable syntax and the implicit Prelude import</title>
1486
1487 <para><indexterm><primary>-XNoImplicitPrelude
1488 option</primary></indexterm> GHC normally imports
1489 <filename>Prelude.hi</filename> files for you. If you'd
1490 rather it didn't, then give it a
1491 <option>-XNoImplicitPrelude</option> option. The idea is
1492 that you can then import a Prelude of your own. (But don't
1493 call it <literal>Prelude</literal>; the Haskell module
1494 namespace is flat, and you must not conflict with any
1495 Prelude module.)</para>
1496
1497 <para>Suppose you are importing a Prelude of your own
1498 in order to define your own numeric class
1499 hierarchy. It completely defeats that purpose if the
1500 literal "1" means "<literal>Prelude.fromInteger
1501 1</literal>", which is what the Haskell Report specifies.
1502 So the <option>-XRebindableSyntax</option>
1503 flag causes
1504 the following pieces of built-in syntax to refer to
1505 <emphasis>whatever is in scope</emphasis>, not the Prelude
1506 versions:
1507 <itemizedlist>
1508 <listitem>
1509 <para>An integer literal <literal>368</literal> means
1510 "<literal>fromInteger (368::Integer)</literal>", rather than
1511 "<literal>Prelude.fromInteger (368::Integer)</literal>".
1512 </para> </listitem>
1513
<listitem><para>Fractional literals are handled in just the same way,
1515 except that the translation is
1516 <literal>fromRational (3.68::Rational)</literal>.
1517 </para> </listitem>
1518
1519 <listitem><para>The equality test in an overloaded numeric pattern
1520 uses whatever <literal>(==)</literal> is in scope.
1521 </para> </listitem>
1522
1523 <listitem><para>The subtraction operation, and the
1524 greater-than-or-equal test, in <literal>n+k</literal> patterns
1525 use whatever <literal>(-)</literal> and <literal>(>=)</literal> are in scope.
1526 </para></listitem>
1527
1528 <listitem>
1529 <para>Negation (e.g. "<literal>- (f x)</literal>")
1530 means "<literal>negate (f x)</literal>", both in numeric
1531 patterns, and expressions.
1532 </para></listitem>
1533
1534 <listitem>
<para>Conditionals (e.g. "<literal>if</literal> e1 <literal>then</literal> e2 <literal>else</literal> e3")
mean "<literal>ifThenElse</literal> e1 e2 e3". However, <literal>case</literal> expressions are unaffected.
1537 </para></listitem>
1538
1539 <listitem>
1540 <para>"Do" notation is translated using whatever
1541 functions <literal>(>>=)</literal>,
1542 <literal>(>>)</literal>, and <literal>fail</literal>,
1543 are in scope (not the Prelude
1544 versions). List comprehensions, mdo (<xref linkend="recursive-do-notation"/>), and parallel array
1545 comprehensions, are unaffected. </para></listitem>
1546
1547 <listitem>
1548 <para>Arrow
1549 notation (see <xref linkend="arrow-notation"/>)
1550 uses whatever <literal>arr</literal>,
1551 <literal>(>>>)</literal>, <literal>first</literal>,
1552 <literal>app</literal>, <literal>(|||)</literal> and
1553 <literal>loop</literal> functions are in scope. But unlike the
1554 other constructs, the types of these functions must match the
1555 Prelude types very closely. Details are in flux; if you want
1556 to use this, ask!
1557 </para></listitem>
1558 </itemizedlist>
1559 <option>-XRebindableSyntax</option> implies <option>-XNoImplicitPrelude</option>.
1560 </para>
1561 <para>
1562 In all cases (apart from arrow notation), the static semantics should be that of the desugared form,
1563 even if that is a little unexpected. For example, the
1564 static semantics of the literal <literal>368</literal>
1565 is exactly that of <literal>fromInteger (368::Integer)</literal>; it's fine for
1566 <literal>fromInteger</literal> to have any of the types:
1567 <programlisting>
1568 fromInteger :: Integer -> Integer
1569 fromInteger :: forall a. Foo a => Integer -> a
1570 fromInteger :: Num a => a -> Integer
1571 fromInteger :: Integer -> Bool -> Bool
1572 </programlisting>
1573 </para>
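<para>
As a minimal sketch of rebinding one piece of syntax, the module below supplies its own <literal>ifThenElse</literal> so that <literal>if</literal> can scrutinise a <literal>Maybe</literal> value. The function <literal>describe</literal> is a hypothetical name, and the choice of treating <literal>Nothing</literal> as false is ours. Because <option>-XRebindableSyntax</option> implies <option>-XNoImplicitPrelude</option>, the Prelude is imported explicitly so that literals and do-notation keep their usual meaning:
<programlisting>
{-# LANGUAGE RebindableSyntax #-}
import Prelude

-- Our own conditional: Just counts as true, Nothing as false.
ifThenElse :: Maybe a -> b -> b -> b
ifThenElse (Just _) t _ = t
ifThenElse Nothing  _ e = e

-- "if" below is translated to a call of the ifThenElse in scope.
describe :: Maybe Int -> String
describe m = if m then "present" else "absent"

main :: IO ()
main = do
  putStrLn (describe (Just 3))   -- prints: present
  putStrLn (describe Nothing)    -- prints: absent
</programlisting>
</para>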
1574
1575 <para>Be warned: this is an experimental facility, with
1576 fewer checks than usual. Use <literal>-dcore-lint</literal>
1577 to typecheck the desugared program. If Core Lint is happy
1578 you should be all right.</para>
1579
1580 </sect2>
1581
1582 <sect2 id="postfix-operators">
1583 <title>Postfix operators</title>
1584
1585 <para>
1586 The <option>-XPostfixOperators</option> flag enables a small
1587 extension to the syntax of left operator sections, which allows you to
1588 define postfix operators. The extension is this: the left section
1589 <programlisting>
1590 (e !)
1591 </programlisting>
1592 is equivalent (from the point of view of both type checking and execution) to the expression
1593 <programlisting>
1594 ((!) e)
1595 </programlisting>
(for any expression <literal>e</literal> and operator <literal>(!)</literal>).
1597 The strict Haskell 98 interpretation is that the section is equivalent to
1598 <programlisting>
1599 (\y -> (!) e y)
1600 </programlisting>
1601 That is, the operator must be a function of two arguments. GHC allows it to
1602 take only one argument, and that in turn allows you to write the function
1603 postfix.
1604 </para>
1605 <para>The extension does not extend to the left-hand side of function
1606 definitions; you must define such a function in prefix form.</para>
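<para>
A small illustration, assuming a hypothetical factorial operator
<literal>(!)</literal> that is not part of any library. Note that the operator
is defined in prefix form, as required above, and only <emphasis>used</emphasis>
postfix.
</para>
<programlisting>
{-# LANGUAGE PostfixOperators #-}
module Main where

-- A one-argument operator, defined in prefix form.
(!) :: Integer -> Integer
(!) n = product [1 .. n]

main :: IO ()
main = print (5 !)   -- the section (5 !) is treated as ((!) 5)
</programlisting>
<para>
Without the flag, <literal>(5 !)</literal> would be read as
<literal>\y -> (!) 5 y</literal>, which is ill-typed here because
<literal>(!)</literal> takes only one argument.
</para>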
1607
1608 </sect2>
1609
1610 <sect2 id="tuple-sections">
1611 <title>Tuple sections</title>
1612
1613 <para>
1614 The <option>-XTupleSections</option> flag enables Python-style partially applied
1615 tuple constructors. For example, the following expression
1616 <programlisting>
1617 (, True)
1618 </programlisting>
1619 is an alternative notation for the more unwieldy
1620 <programlisting>
1621 \x -> (x, True)
1622 </programlisting>
1623 You can omit any combination of arguments to the tuple, as in the following
1624 <programlisting>
1625 (, "I", , , "Love", , 1337)
1626 </programlisting>
1627 which translates to
1628 <programlisting>
1629 \a b c d -> (a, "I", b, c, "Love", d, 1337)
1630 </programlisting>
1631 </para>
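<para>
Tuple sections are most often passed to higher-order functions. The sketch
below uses only standard Prelude functions; the particular values are
illustrative.
</para>
<programlisting>
{-# LANGUAGE TupleSections #-}
module Main where

main :: IO ()
main = do
  -- (, True) is \x -> (x, True)
  print (map (, True) "ab")
  -- (,"mid",) is \a b -> (a, "mid", b)
  print (zipWith (,"mid",) [1, 2 :: Int] "xy")
</programlisting>
<para>
The first line should print <literal>[('a',True),('b',True)]</literal> and the
second <literal>[(1,"mid",'x'),(2,"mid",'y')]</literal>.
</para>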
1632
1633 <para>
1634 If you have <link linkend="unboxed-tuples">unboxed tuples</link> enabled, tuple sections
1635 will also be available for them, like so
1636 <programlisting>
1637 (# , True #)
1638 </programlisting>
1639 Because there is no unboxed unit tuple, the following expression
1640 <programlisting>
1641 (# #)
1642 </programlisting>
1643 continues to stand for the unboxed singleton tuple data constructor.
1644 </para>
1645
1646 </sect2>
1647
1648 <sect2 id="lambda-case">
1649 <title>Lambda-case</title>
1650 <para>
1651 The <option>-XLambdaCase</option> flag enables expressions of the form
1652 <programlisting>
1653 \case { p1 -> e1; ...; pN -> eN }
1654 </programlisting>
1655 which is equivalent to
1656 <programlisting>
1657 \freshName -> case freshName of { p1 -> e1; ...; pN -> eN }
1658 </programlisting>
1659 Note that <literal>\case</literal> starts a layout, so you can write
1660 <programlisting>
1661 \case
1662   p1 -> e1
1663   ...
1664   pN -> eN
1665 </programlisting>
1666 </para>
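<para>
For instance, a function that immediately scrutinises its argument can be
written without naming that argument. The function
<literal>describe</literal> below is a hypothetical example, not a library
function.
</para>
<programlisting>
{-# LANGUAGE LambdaCase #-}
module Main where

describe :: Maybe Int -> String
describe = \case
  Nothing -> "nothing"
  Just 0  -> "zero"
  Just n  -> "number " ++ show n

main :: IO ()
main = mapM_ (putStrLn . describe) [Nothing, Just 0, Just 7]
</programlisting>
<para>
This should print <literal>nothing</literal>, <literal>zero</literal> and
<literal>number 7</literal> on separate lines.
</para>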
1667 </sect2>
1668
1669 <sect2 id="multi-way-if">
1670 <title>Multi-way if-expressions</title>
1671 <para>
1672 With the <option>-XMultiWayIf</option> flag, GHC accepts conditional expressions
1673 with multiple branches:
1674 <programlisting>
1675 if | guard1 -> expr1
1676    | ...
1677    | guardN -> exprN
1678 </programlisting>
1679 which is roughly equivalent to
1680 <programlisting>
1681 case () of
1682   _ | guard1 -> expr1
1683   ...
1684   _ | guardN -> exprN
1685 </programlisting>
1686 except that multi-way if-expressions do not alter the layout.
1687 </para>
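<para>
A concrete instance of the schema above, using a hypothetical function
<literal>sign</literal>. The guards are tried in order, and
<literal>otherwise</literal> serves as the catch-all.
</para>
<programlisting>
{-# LANGUAGE MultiWayIf #-}
module Main where

sign :: Int -> String
sign n = if | n > 0     -> "positive"
            | n == 0    -> "zero"
            | otherwise -> "negative"

main :: IO ()
main = mapM_ (putStrLn . sign) [5, 0, -3]
</programlisting>
<para>
This should print <literal>positive</literal>, <literal>zero</literal> and
<literal>negative</literal> on separate lines.
</para>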
1688 </sect2>
1689
1690 <sect2 id="disambiguate-fields">
1691 <title>Record field disambiguation</title>
1692 <para>
1693 In record construction and record pattern matching
1694 it is entirely unambiguous which field is referred to, even if there are two different
1695 data types in scope with a common field name. For example:
1696 <programlisting>
1697 module M where
1698 data S = MkS { x :: Int, y :: Bool }
1699
1700 module Foo where
1701 import M
1702
1703 data T = MkT { x :: Int }
1704
1705 ok1 (MkS { x = n }) = n+1 -- Unambiguous
1706 ok2 n = MkT { x = n+1 } -- Unambiguous
1707
1708 bad1 k = k { x = 3 } -- Ambiguous
1709 bad2 k = x k -- Ambiguous
1710 </programlisting>
1711 Even though there are two <literal>x</literal>'s in scope,
1712 it is clear that the <literal>x</literal> in the pattern in the
1713 definition of <literal>ok1</literal> can only mean the field
1714 <literal>x</literal> from type <literal>S</literal>. Similarly for
1715 the function <literal>ok2</literal>. However, in the record update
1716 in <literal>bad1</literal> and the record selection in <literal>bad2</literal>
1717 it is not clear which of the two types is intended.
1718 </para>
1719 <para>
1720 Haskell 98 regards all four as ambiguous, but with the
1721 <option>-XDisambiguateRecordFields</option> flag, GHC will accept
1722 the former two. The rules are precisely the same as those for instance
1723 declarations in Haskell 98, where the method names on the left-hand side
1724 of the method bindings in an instance declaration refer unambiguously
1725 to the method of that class (provided they are in scope at all), even
1726 if there are other variables in scope with the same name.
1727 This reduces the clutter of qualified names when you import two
1728 records from different modules that use the same field name.
1729 </para>
1730 <para>
1731 Some details:
1732 <itemizedlist>
1733 <listitem><para>
1734 Field disambiguation can be combined with punning (see <xref linkend="record-puns"/>). For example:
1735 <programlisting>
1736 module Foo where
1737 import M
1738 x=True
1739 ok3 (MkS { x }) = x+1 -- Uses both disambiguation and punning
1740 </programlisting>
1741 </para></listitem>
1742
1743 <listitem><para>
1744 With <option>-XDisambiguateRecordFields</option> you can use <emphasis>unqualified</emphasis>
1745 field names even if the corresponding selector is only in scope <emphasis>qualified</emphasis>.
1746 For example, assuming the same module <literal>M</literal> as in our earlier example, this is legal:
1747 <programlisting>
1748 module Foo where
1749 import qualified M -- Note qualified
1750
1751 ok4 (M.MkS { x = n }) = n+1 -- Unambiguous
1752 </programlisting>
1753 Since the constructor <literal>MkS</literal> is only in scope qualified, you must
1754 name it <literal>M.MkS</literal>, but the field <literal>x</literal> does not need
1755 to be qualified even though <literal>M.x</literal> is in scope but <literal>x</literal>
1756 is not. (In effect, it is qualified by the constructor.)
1757 </para></listitem>
1758 </itemizedlist>
1759 </para>
1760
1761 </sect2>
1762
1763 <!-- ===================== Record puns =================== -->
1764
1765 <sect2 id="record-puns">
1766 <title>Record puns
1767 </title>
1768
1769 <para>
1770 Record puns are enabled by the flag <literal>-XNamedFieldPuns</literal>.
1771 </para>
1772
1773 <para>
1774 When using records, it is common to write a pattern that binds a
1775 variable with the same name as a record field, such as:
1776
1777 <programlisting>
1778 data C = C {a :: Int}
1779 f (C {a = a}) = a
1780 </programlisting>
1781 </para>
1782
1783 <para>
1784 Record punning permits the variable name to be elided, so one can simply
1785 write
1786
1787 <programlisting>
1788 f (C {a}) = a
1789 </programlisting>
1790
1791 to mean the same pattern as above. That is, in a record pattern, the
1792 pattern <literal>a</literal> expands into the pattern <literal>a =
1793 a</literal> for the same name <literal>a</literal>.
1794 </para>
1795
1796 <para>
1797 Note that:
1798 <itemizedlist>
1799 <listitem><para>
1800 Record punning can also be used in an expression, writing, for example,
1801 <programlisting>
1802 let a = 1 in C {a}
1803 </programlisting>
1804 instead of
1805 <programlisting>
1806 let a = 1 in C {a = a}
1807 </programlisting>
1808 The expansion is purely syntactic, so the expanded right-hand side
1809 expression refers to the nearest enclosing variable that is spelled the
1810 same as the field name.
1811 </para></listitem>
1812
1813 <listitem><para>
1814 Puns and other patterns can be mixed in the same record:
1815 <programlisting>
1816 data C = C {a :: Int, b :: Int}
1817 f (C {a, b = 4}) = a
1818 </programlisting>
1819 </para></listitem>
1820
1821 <listitem><para>
1822 Puns can be used wherever record patterns occur (e.g. in
1823 <literal>let</literal> bindings or at the top-level).
1824 </para></listitem>
1825
1826 <listitem><para>
1827 A pun on a qualified field name is expanded by stripping off the module qualifier.
1828 For example:
1829 <programlisting>
1830 f (M.C {M.a}) = a
1831 </programlisting>
1832 means
1833 <programlisting>
1834 f (M.C {M.a = a}) = a
1835 </programlisting>
1836 (This is useful if the field selector <literal>a</literal> for constructor <literal>M.C</literal>
1837 is only in scope in qualified form.)
1838 </para></listitem>
1839 </itemizedlist>
1840 </para>
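<para>
The following sketch shows puns in both a pattern and an expression. The type
<literal>Point</literal> and the functions <literal>norm1</literal> and
<literal>mkDiag</literal> are hypothetical names chosen for illustration.
</para>
<programlisting>
{-# LANGUAGE NamedFieldPuns #-}
module Main where

data Point = Point { px :: Int, py :: Int } deriving Show

-- Pun in a pattern: Point{px, py} means Point{px = px, py = py}.
norm1 :: Point -> Int
norm1 Point{px, py} = abs px + abs py

-- Pun in an expression: the fields are filled from the local
-- variables px and py bound by the let.
mkDiag :: Int -> Point
mkDiag n = let px = n; py = n in Point{px, py}

main :: IO ()
main = do
  print (norm1 (Point 3 (-4)))
  print (mkDiag 2)
</programlisting>
<para>
This should print <literal>7</literal> followed by
<literal>Point {px = 2, py = 2}</literal>.
</para>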
1841
1842
1843 </sect2>
1844
1845 <!-- ===================== Record wildcards =================== -->
1846
1847 <sect2 id="record-wildcards">
1848 <title>Record wildcards
1849 </title>
1850
1851 <para>
1852 Record wildcards are enabled by the flag <literal>-XRecordWildCards</literal>.
1853 This flag implies <literal>-XDisambiguateRecordFields</literal>.
1854 </para>
1855
1856 <para>
1857 For records with many fields, it can be tiresome to write out each field
1858 individually in a record pattern, as in
1859 <programlisting>
1860 data C = C {a :: Int, b :: Int, c :: Int, d :: Int}
1861 f (C {a = 1, b = b, c = c, d = d}) = b + c + d
1862 </programlisting>
1863 </para>
1864
1865 <para>
1866 Record wildcard syntax permits a "<literal>..</literal>" in a record
1867 pattern, where each elided field <literal>f</literal> is replaced by the
1868 pattern <literal>f = f</literal>. For example, the above pattern can be
1869 written as
1870 <programlisting>
1871 f (C {a = 1, ..}) = b + c + d
1872 </programlisting>
1873 </para>
1874
1875 <para>
1876 More details:
1877 <itemizedlist>
1878 <listitem><para>
1879 Wildcards can be mixed with other patterns, including puns
1880 (<xref linkend="record-puns"/>); for example, in a pattern <literal>C {a
1881 = 1, b, ..}</literal>. Additionally, record wildcards can be used
1882 wherever record patterns occur, including in <literal>let</literal>
1883 bindings and at the top-level. For example, the top-level binding
1884 <programlisting>
1885 C {a = 1, ..} = e
1886 </programlisting>
1887 defines <literal>b</literal>, <literal>c</literal>, and
1888 <literal>d</literal>.
1889 </para></listitem>
1890
1891 <listitem><para>
1892 Record wildcards can also be used in expressions, writing, for example,
1893 <programlisting>
1894 let {a = 1; b = 2; c = 3; d = 4} in C {..}
1895 </programlisting>
1896 in place of
1897 <programlisting>
1898 let {a = 1; b = 2; c = 3; d = 4} in C {a=a, b=b, c=c, d=d}
1899 </programlisting>
1900 The expansion is purely syntactic, so the record wildcard
1901 expression refers to the nearest enclosing variables that are spelled
1902 the same as the omitted field names.
1903 </para></listitem>
1904
1905 <listitem><para>
1906 The "<literal>..</literal>" expands to the missing
1907 <emphasis>in-scope</emphasis> record fields.
1908 Specifically the expansion of "<literal>C {..}</literal>" includes
1909 <literal>f</literal> if and only if:
1910 <itemizedlist>
1911 <listitem><para>
1912 <literal>f</literal> is a record field of constructor <literal>C</literal>.
1913 </para></listitem>
1914 <listitem><para>
1915 The record field <literal>f</literal> is in scope somehow (either qualified or unqualified).
1916 </para></listitem>
1917 <listitem><para>
1918 In the case of expressions (but not patterns),
1919 the variable <literal>f</literal> is in scope unqualified,
1920 apart from the binding of the record selector itself.
1921 </para></listitem>
1922 </itemizedlist>
1923 For example
1924 <programlisting>
1925 module M where
1926 data R = R { a,b,c :: Int }
1927 module X where
1928 import M( R(a,c) )
1929 f b = R { .. }
1930 </programlisting>
1931 The <literal>R{..}</literal> expands to <literal>R{M.a=a}</literal>,
1932 omitting <literal>b</literal> since the record field is not in scope,
1933 and omitting <literal>c</literal> since the variable <literal>c</literal>
1934 is not in scope (apart from the binding of the
1935 record selector <literal>c</literal>, of course).
1936 </para></listitem>
1937 </itemizedlist>
1938 </para>
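<para>
A short sketch of wildcards in a pattern and in an expression. The record
<literal>Config</literal> and its fields are hypothetical and serve only to
illustrate the expansion rules above.
</para>
<programlisting>
{-# LANGUAGE RecordWildCards #-}
module Main where

data Config = Config { host :: String, port :: Int, debug :: Bool }

-- Pattern: Config{..} binds host, port and debug as local variables.
render :: Config -> String
render Config{..} =
  host ++ ":" ++ show port ++ (if debug then " [debug]" else "")

-- Expression: Config{..} picks up the let-bound variables that are
-- spelled the same as the field names.
defaultConfig :: Config
defaultConfig =
  let host  = "localhost"
      port  = 8080
      debug = False
  in Config{..}

main :: IO ()
main = putStrLn (render defaultConfig)
</programlisting>
<para>
This should print <literal>localhost:8080</literal>.
</para>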
1939
1940 </sect2>
1941
1942 <!-- ===================== Local fixity declarations =================== -->
1943
1944 <sect2 id="local-fixity-declarations">
1945 <title>Local Fixity Declarations
1946 </title>
1947
1948 <para>A careful reading of the Haskell 98 Report reveals that fixity
1949 declarations (<literal>infix</literal>, <literal>infixl</literal>, and
1950 <literal>infixr</literal>) are permitted to appear inside local bindings
1951 such as those introduced by <literal>let</literal> and
1952 <literal>where</literal>. However, the Haskell Report does not specify
1953 the semantics of such bindings very precisely.
1954 </para>
1955
1956 <para>In GHC, a fixity declaration may accompany a local binding:
1957 <programlisting>
1958 let f = ...
1959     infixr 3 `f`
1960 in
1961     ...
1962 </programlisting>
1963 and the fixity declaration applies wherever the binding is in scope.
1964 For example, in a <literal>let</literal>, it applies in the right-hand
1965 sides of other <literal>let</literal>-bindings and the body of the
1966 <literal>let</literal>. Or, in recursive <literal>do</literal>
1967 expressions (<xref linkend="recursive-do-notation"/>), the local fixity
1968 declarations of a <literal>let</literal> statement scope over other
1969 statements in the group, just as the bound name does.
1970 </para>
1971
1972 <para>
1973 Moreover, a local fixity declaration <emphasis>must</emphasis> accompany a local binding of
1974 that name: it is not possible to revise the fixity of a name bound
1975 elsewhere, as in
1976 <programlisting>
1977 let infixr 9 $ in ...
1978 </programlisting>
1979
1980 Because local fixity declarations are technically Haskell 98, no flag is
1981 necessary to enable them.
1982 </para>
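<para>
To see the effect of a local fixity declaration, consider the following
sketch. The operator <literal>|-|</literal> is a hypothetical local name; the
declaration makes it right-associative within the <literal>let</literal>.
</para>
<programlisting>
module Main where

main :: IO ()
main =
  let infixr 6 |-|
      a |-| b = a - b
  in print (10 |-| 3 |-| 2 :: Int)   -- parsed as 10 |-| (3 |-| 2)
</programlisting>
<para>
This should print <literal>9</literal>. Without the fixity declaration the
operator would default to <literal>infixl 9</literal>, and the same expression
would evaluate to <literal>5</literal>.
</para>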
1983 </sect2>
1984
1985 <sect2 id="package-imports">
1986 <title>Package-qualified imports</title>
1987
1988 <para>With the <option>-XPackageImports</option> flag, GHC allows
1989 import declarations to be qualified by the package name that the
1990 module is intended to be imported from. For example:</para>
1991
1992 <programlisting>
1993 import "network" Network.Socket
1994 </programlisting>
1995
1996 <para>would import the module <literal>Network.Socket</literal> from
1997 the package <literal>network</literal> (any version). This may
1998 be used to disambiguate an import when the same module is
1999 available from multiple packages, or is present in both the
2000 current package being built and an external package.</para>
2001
2002 <para>Note: you probably don't need to use this feature; it was
2003 added mainly so that we can build backwards-compatible versions of
2004 packages when APIs change. It can lead to fragile dependencies in
2005 the common case: modules occasionally move from one package to
2006 another, rendering any package-qualified imports broken.</para>
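<para>
A minimal complete module using a package-qualified import might look like
this. It assumes only that <literal>Data.List</literal> is provided by the
<literal>base</literal> package, which ships with GHC.
</para>
<programlisting>
{-# LANGUAGE PackageImports #-}
module Main where

-- Import Data.List specifically from the "base" package.
import "base" Data.List (sort)

main :: IO ()
main = print (sort [3, 1, 2 :: Int])
</programlisting>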
2007 </sect2>
2008
2009 <sect2 id="safe-imports-ext">
2010 <title>Safe imports</title>
2011
2012 <para>With the <option>-XSafe</option>, <option>-XTrustworthy</option>
2013 and <option>-XUnsafe</option> language flags, GHC extends
2014 the import declaration syntax to take an optional <literal>safe</literal>
2015 keyword after the <literal>import</literal> keyword. This feature
2016 is part of the Safe Haskell GHC extension. For example:</para>
2017
2018 <programlisting>
2019 import safe qualified Network.Socket as NS
2020 </programlisting>
2021
2022 <para>would import the module <literal>Network.Socket</literal>
2023 with compilation only succeeding if <literal>Network.Socket</literal> can be
2024 safely imported. For a description of when an import is
2025 considered safe, see <xref linkend="safe-haskell"/>.</para>
2026
2027 </sect2>
2028
2029 <sect2 id="syntax-stolen">
2030 <title>Summary of stolen syntax</title>
2031
2032 <para>Turning on an option that enables special syntax
2033 <emphasis>might</emphasis> cause working Haskell 98 code to fail
2034 to compile, perhaps because it uses a variable name which has
2035 become a reserved word. This section lists the syntax that is
2036 "stolen" by language extensions.
2037 We use
2038 notation and nonterminal names from the Haskell 98 lexical syntax
2039 (see the Haskell 98 Report).
2040 We only list syntax changes here that might affect
2041 existing working programs (i.e. "stolen" syntax). Many of these
2042 extensions will also enable new context-free syntax, but in all
2043 cases programs written to use the new syntax would not be
2044 compilable without the option enabled.</para>
2045
2046 <para>There are two classes of special
2047 syntax:
2048
2049 <itemizedlist>
2050 <listitem>
2051 <para>New reserved words and symbols: character sequences
2052 which are no longer available for use as identifiers in the
2053 program.</para>
2054 </listitem>
2055 <listitem>
2056 <para>Other special syntax: sequences of characters that have
2057 a different meaning when this particular option is turned
2058 on.</para>
2059 </listitem>
2060 </itemizedlist>
2061
2062 The following syntax is stolen:
2063
2064 <variablelist>
2065 <varlistentry>
2066 <term>
2067 <literal>forall</literal>
2068 <indexterm><primary><literal>forall</literal></primary></indexterm>
2069 </term>
2070 <listitem><para>
2071 Stolen (in types) by: <option>-XExplicitForAll</option>, and hence by
2072 <option>-XScopedTypeVariables</option>,
2073 <option>-XLiberalTypeSynonyms</option>,
2074 <option>-XRank2Types</option>,
2075 <option>-XRankNTypes</option>,
2076 <option>-XPolymorphicComponents</option>,
2077 <option>-XExistentialQuantification</option>
2078 </para></listitem>
2079 </varlistentry>
2080
2081 <varlistentry>
2082 <term>
2083 <literal>mdo</literal>
2084 <indexterm><primary><literal>mdo</literal></primary></indexterm>
2085 </term>
2086 <listitem><para>
2087 Stolen by: <option>-XRecursiveDo</option>
2088 </para></listitem>
2089 </varlistentry>
2090
2091 <varlistentry>
2092 <term>
2093 <literal>foreign</literal>
2094 <indexterm><primary><literal>foreign</literal></primary></indexterm>
2095 </term>
2096 <listitem><para>
2097 Stolen by: <option>-XForeignFunctionInterface</option>
2098 </para></listitem>
2099 </varlistentry>
2100
2101 <varlistentry>
2102 <term>
2103 <literal>rec</literal>,
2104 <literal>proc</literal>, <literal>-&lt;</literal>,
2105 <literal>&gt;-</literal>, <literal>-&lt;&lt;</literal>,
2106 <literal>&gt;&gt;-</literal>, and <literal>(|</literal>,
2107 <literal>|)</literal> brackets
2108 <indexterm><primary><literal>proc</literal></primary></indexterm>
2109 </term>
2110 <listitem><para>
2111 Stolen by: <option>-XArrows</option>
2112 </para></listitem>
2113 </varlistentry>
2114
2115 <varlistentry>
2116 <term>
2117 <literal>?<replaceable>varid</replaceable></literal>,
2118 <literal>%<replaceable>varid</replaceable></literal>
2119 <indexterm><primary>implicit parameters</primary></indexterm>
2120 </term>
2121 <listitem><para>
2122 Stolen by: <option>-XImplicitParams</option>
2123 </para></listitem>
2124 </varlistentry>
2125
2126 <varlistentry>
2127 <term>
2128 <literal>[|</literal>,
2129 <literal>[e|</literal>, <literal>[p|</literal>,
2130 <literal>[d|</literal>, <literal>[t|</literal>,
2131 <literal>$(</literal>,
2132 <literal>$<replaceable>varid</replaceable></literal>
2133 <indexterm><primary>Template Haskell</primary></indexterm>
2134 </term>
2135 <listitem><para>
2136 Stolen by: <option>-XTemplateHaskell</option>
2137 </para></listitem>
2138 </varlistentry>
2139
2140 <varlistentry>
2141 <term>
2142 <literal>[<replaceable>varid</replaceable>|</literal>
2143 <indexterm><primary>quasi-quotation</primary></indexterm>
2144 </term>
2145 <listitem><para>
2146 Stolen by: <option>-XQuasiQuotes</option>
2147 </para></listitem>
2148 </varlistentry>
2149
2150 <varlistentry>
2151 <term>
2152 <replaceable>varid</replaceable>{<literal>&num;</literal>},
2153 <replaceable>char</replaceable><literal>&num;</literal>,
2154 <replaceable>string</replaceable><literal>&num;</literal>,
2155 <replaceable>integer</replaceable><literal>&num;</literal>,
2156 <replaceable>float</replaceable><literal>&num;</literal>,
2157 <replaceable>float</replaceable><literal>&num;&num;</literal>,
2158 <literal>(&num;</literal>, <literal>&num;)</literal>
2159 </term>
2160 <listitem><para>
2161 Stolen by: <option>-XMagicHash</option>
2162 </para></listitem>
2163 </varlistentry>
2164 </variablelist>
2165 </para>
2166 </sect2>
2167 </sect1>
2168
2169
2170 <!-- TYPE SYSTEM EXTENSIONS -->
2171 <sect1 id="data-type-extensions">
2172 <title>Extensions to data types and type synonyms</title>
2173
2174 <sect2 id="nullary-types">
2175 <title>Data types with no constructors</title>
2176
2177 <para>With the <option>-XEmptyDataDecls</option> flag (or equivalent LANGUAGE pragma),
2178 GHC lets you declare a data type with no constructors. For example:</para>
2179
2180 <programlisting>
2181 data S -- S :: *
2182 data T a -- T :: * -> *
2183 </programlisting>
2184
2185 <para>Syntactically, the declaration lacks the "= constrs" part. The
2186 type can be parameterised over types of any kind, but if the kind is
2187 not <literal>*</literal> then an explicit kind annotation must be used
2188 (see <xref linkend="kinding"/>).</para>
2189
2190 <para>Such data types have only one value, namely bottom.
2191 Nevertheless, they can be useful when defining "phantom types".</para>
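<para>
As a sketch of the phantom-type idiom, the empty types below are used only as
type-level tags. The names <literal>Metres</literal>, <literal>Feet</literal>,
<literal>Distance</literal> and <literal>addD</literal> are hypothetical.
</para>
<programlisting>
{-# LANGUAGE EmptyDataDecls #-}
module Main where

-- Types with no constructors, used purely as tags.
data Metres
data Feet

-- The parameter unit is a phantom: it does not occur on the right.
newtype Distance unit = Distance Double deriving Show

-- Only distances carrying the same tag can be added.
addD :: Distance u -> Distance u -> Distance u
addD (Distance a) (Distance b) = Distance (a + b)

main :: IO ()
main = print (addD (Distance 1.5) (Distance 2 :: Distance Metres))
</programlisting>
<para>
This should print <literal>Distance 3.5</literal>. An attempt to add a
<literal>Distance Metres</literal> to a <literal>Distance Feet</literal> would
be rejected by the type checker.
</para>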
2192 </sect2>
2193
2194 <sect2 id="datatype-contexts">
2195 <title>Data type contexts</title>
2196
2197 <para>Haskell allows datatypes to be given contexts, e.g.</para>
2198
2199 <programlisting>
2200 data Eq a => Set a = NilSet | ConsSet a (Set a)
2201 </programlisting>
2202
2203 <para>gives the constructors the following types:</para>
2204
2205 <programlisting>
2206 NilSet :: Set a
2207 ConsSet :: Eq a => a -> Set a -> Set a
2208 </programlisting>
2209
2210 <para>This is widely considered a misfeature, and is going to be removed from
2211 the language. In GHC, it is controlled by the deprecated extension
2212 <literal>DatatypeContexts</literal>.</para>
2213 </sect2>
2214
2215 <sect2 id="infix-tycons">
2216 <title>Infix type constructors, classes, and type variables</title>
2217
2218 <para>
2219 GHC allows type constructors, classes, and type variables to be operators, and
2220 to be written infix, very much like expressions. More specifically:
2221 <itemizedlist>
2222 <listitem><para>
2223 A type constructor or class can be an operator, beginning with a colon; e.g. <literal>:*:</literal>.
2224 The lexical syntax is the same as that for data constructors.
2225 </para></listitem>
2226 <listitem><para>
2227 Data type and type-synonym declarations can be written infix, parenthesised
2228 if you want further arguments. E.g.
2229 <screen>
2230 data a :*: b = Foo a b
2231 type a :+: b = Either a b
2232 class a :=: b where ...
2233
2234 data (a :**: b) x = Baz a b x
2235 type (a :++: b) y = Either (a,b) y
2236 </screen>
2237 </para></listitem>
2238 <listitem><para>
2239 Types, and class constraints, can be written infix. For example
2240 <screen>
2241 x :: Int :*: Bool
2242 f :: (a :=: b) => a -> b
2243 </screen>
2244 </para></listitem>
2245 <listitem><para>
2246 A type variable can be an (unqualified) operator e.g. <literal>+</literal>.
2247 The lexical syntax is the same as that for variable operators, excluding "(.)",
2248 "(!)", and "(*)". In a binding position, the operator must be
2249 parenthesised. For example:
2250 <programlisting>
2251 type T (+) = Int + Int
2252 f :: T Either
2253 f = Left 3
2254
2255 liftA2 :: Arrow (~>)
2256        => (a -> b -> c) -> (e ~> a) -> (e ~> b) -> (e ~> c)
2257 liftA2 = ...
2258 </programlisting>
2259 </para></listitem>
2260 <listitem><para>
2261 Back-quotes work
2262 as for expressions, both for type constructors and type variables; e.g. <literal>Int `Either` Bool</literal>, or
2263 <literal>Int `a` Bool</literal>. Similarly, parentheses work the same; e.g. <literal>(:*:) Int Bool</literal>.
2264 </para></listitem>
2265 <listitem><para>
2266 Fixities may be declared for type constructors, or classes, just as for data constructors. However,
2267 one cannot distinguish between the two in a fixity declaration; a fixity declaration
2268 sets the fixity for a data constructor and the corresponding type constructor. For example:
2269 <screen>
2270 infixl 7 T, :*:
2271 </screen>
2272 sets the fixity for both type constructor <literal>T</literal> and data constructor <literal>T</literal>,
2273 and similarly for <literal>:*:</literal>.
2275 </para></listitem>
2276 <listitem><para>
2277 Function arrow is <literal>infixr</literal> with fixity 0. (This might change; I'm not sure what it should be.)
2278 </para></listitem>
2279
2280 </itemizedlist>
2281 </para>
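<para>
The following sketch combines an infix data type, an infix type synonym and a
fixity declaration. It assumes the <option>-XTypeOperators</option> flag is
the one that enables this syntax; the operators <literal>:*:</literal> and
<literal>:+:</literal> and the functions are hypothetical.
</para>
<programlisting>
{-# LANGUAGE TypeOperators #-}
module Main where

-- One declaration sets the fixity of both the type constructor
-- and the data constructor named :*:.
infixr 6 :*:
data a :*: b = a :*: b deriving Show

type a :+: b = Either a b

-- The argument type parses as Int :*: (Bool :*: Char).
first3 :: Int :*: Bool :*: Char -> Int
first3 (n :*: _) = n

pick :: Int :+: String -> String
pick (Left n)  = show n
pick (Right s) = s

main :: IO ()
main = do
  print (first3 (1 :*: True :*: 'x'))
  putStrLn (pick (Right "text"))
</programlisting>
<para>
This should print <literal>1</literal> followed by <literal>text</literal>.
</para>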
2282 </sect2>
2283
2284 <sect2 id="type-synonyms">
2285 <title>Liberalised type synonyms</title>
2286
2287 <para>
2288 Type synonyms are like macros at the type level, but Haskell 98 imposes many rules
2289 on individual synonym declarations.
2290 With the <option>-XLiberalTypeSynonyms</option> extension,
2291 GHC does validity checking on types <emphasis>only after expanding type synonyms</emphasis>.
2292 That means that GHC can be very much more liberal about type synonyms than Haskell 98.
2293
2294 <itemizedlist>
2295 <listitem> <para>You can write a <literal>forall</literal> (including overloading)
2296 in a type synonym, thus:
2297 <programlisting>
2298 type Discard a = forall b. Show b => a -> b -> (a, String)
2299
2300 f :: Discard a
2301 f x y = (x, show y)
2302
2303 g :: Discard Int -> (Int,String) -- A rank-2 type
2304 g f = f 3 True
2305 </programlisting>
2306 </para>
2307 </listitem>
2308
2309 <listitem><para>
2310 If you also use <option>-XUnboxedTuples</option>,
2311 you can write an unboxed tuple in a type synonym:
2312 <programlisting>
2313 type Pr = (# Int, Int #)
2314
2315 h :: Int -> Pr
2316 h x = (# x, x #)
2317 </programlisting>
2318 </para></listitem>
2319
2320 <listitem><para>
2321 You can apply a type synonym to a forall type:
2322 <programlisting>
2323 type Foo a = a -> a -> Bool
2324
2325 f :: Foo (forall b. b->b)
2326 </programlisting>
2327 After expanding the synonym, <literal>f</literal> has the legal (in GHC) type:
2328 <programlisting>
2329 f :: (forall b. b->b) -> (forall b. b->b) -> Bool
2330 </programlisting>
2331 </para></listitem>
2332
2333 <listitem><para>
2334 You can apply a type synonym to a partially applied type synonym:
2335 <programlisting>
2336 type Generic i o = forall x. i x -> o x
2337 type Id x = x
2338
2339 foo :: Generic Id []
2340 </programlisting>
2341 After expanding the synonym, <literal>foo</literal> has the legal (in GHC) type:
2342 <programlisting>
2343 foo :: forall x. x -> [x]
2344 </programlisting>
2345 </para></listitem>
2346
2347 </itemizedlist>
2348 </para>
2349
2350 <para>
2351 GHC currently does kind checking before expanding synonyms (though even that
2352 could be changed).
2353 </para>
2354 <para>
2355 After expanding type synonyms, GHC does validity checking on types, looking for
2356 the following mal-formedness which isn't detected simply by kind checking:
2357 <itemizedlist>
2358 <listitem><para>
2359 Type constructor applied to a type involving for-alls.
2360 </para></listitem>
2361 <listitem><para>
2362 Unboxed tuple on left of an arrow.
2363 </para></listitem>
2364 <listitem><para>
2365 Partially-applied type synonym.
2366 </para></listitem>
2367 </itemizedlist>
2368 So, for example,
2369 this will be rejected:
2370 <programlisting>
2371 type Pr = (# Int, Int #)
2372
2373 h :: Pr -> Int
2374 h x = ...
2375 </programlisting>
2376 because GHC does not allow unboxed tuples on the left of a function arrow.
2377 </para>
2378 </sect2>
2379
2380
2381 <sect2 id="existential-quantification">
2382 <title>Existentially quantified data constructors
2383 </title>
2384
2385 <para>
2386 The idea of using existential quantification in data type declarations
2387 was suggested by Perry, and implemented in Hope+ (Nigel Perry, <emphasis>The Implementation
2388 of Practical Functional Programming Languages</emphasis>, PhD Thesis, University of
2389 London, 1991). It was later formalised by Laufer and Odersky
2390 (<emphasis>Polymorphic type inference and abstract data types</emphasis>,
2391 TOPLAS, 16(5), pp1411-1430, 1994).
2392 It's been in Lennart
2393 Augustsson's <command>hbc</command> Haskell compiler for several years, and
2394 proved very useful. Here's the idea. Consider the declaration:
2395 </para>
2396
2397 <para>
2398
2399 <programlisting>
2400 data Foo = forall a. MkFoo a (a -> Bool)
2401          | Nil
2402 </programlisting>
2403
2404 </para>
2405
2406 <para>
2407 The data type <literal>Foo</literal> has two constructors with types:
2408 </para>
2409
2410 <para>
2411
2412 <programlisting>
2413 MkFoo :: forall a. a -> (a -> Bool) -> Foo
2414 Nil :: Foo
2415 </programlisting>
2416
2417 </para>
2418
2419 <para>
2420 Notice that the type variable <literal>a</literal> in the type of <function>MkFoo</function>
2421 does not appear in the data type itself, which is plain <literal>Foo</literal>.
2422 For example, the following expression is fine:
2423 </para>
2424
2425 <para>
2426
2427 <programlisting>
2428 [MkFoo 3 even, MkFoo 'c' isUpper] :: [Foo]
2429 </programlisting>
2430
2431 </para>
2432
2433 <para>
2434 Here, <literal>(MkFoo 3 even)</literal> packages an integer with a function
2435 <function>even</function> that maps an integer to <literal>Bool</literal>; and <function>MkFoo 'c'
2436 isUpper</function> packages a character with a compatible function. These
2437 two things are each of type <literal>Foo</literal> and can be put in a list.
2438 </para>
2439
2440 <para>
2441 What can we do with a value of type <literal>Foo</literal>? In particular,
2442 what happens when we pattern-match on <function>MkFoo</function>?
2443 </para>
2444
2445 <para>
2446
2447 <programlisting>
2448 f (MkFoo val fn) = ???
2449 </programlisting>
2450
2451 </para>
2452
2453 <para>
2454 Since all we know about <literal>val</literal> and <function>fn</function> is that they
2455 are compatible, the only (useful) thing we can do with them is to
2456 apply <function>fn</function> to <literal>val</literal> to get a boolean. For example:
2457 </para>
2458
2459 <para>
2460
2461 <programlisting>
2462 f :: Foo -> Bool
2463 f (MkFoo val fn) = fn val
2464 </programlisting>
2465
2466 </para>
2467
2468 <para>
2469 What this allows us to do is to package heterogeneous values
2470 together with a bunch of functions that manipulate them, and then treat
2471 that collection of packages in a uniform manner. You can express
2472 quite a bit of object-oriented-like programming this way.
2473 </para>
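<para>
Putting the pieces above together gives a complete program. The function name
<literal>apply</literal> and the sample values are illustrative choices; the
data type is the <literal>Foo</literal> declared earlier.
</para>
<programlisting>
{-# LANGUAGE ExistentialQuantification #-}
module Main where

import Data.Char (isUpper)

data Foo = forall a. MkFoo a (a -> Bool)
         | Nil

-- The only useful thing to do with the packaged pair is to
-- apply the function to the value.
apply :: Foo -> Bool
apply (MkFoo val fn) = fn val
apply Nil            = False

main :: IO ()
main = print (map apply [MkFoo (4 :: Int) even, MkFoo 'C' isUpper, Nil])
</programlisting>
<para>
This should print <literal>[True,True,False]</literal>.
</para>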
2474
2475 <sect3 id="existential">
2476 <title>Why existential?
2477 </title>
2478
2479 <para>
2480 What has this to do with <emphasis>existential</emphasis> quantification?
2481 Simply that <function>MkFoo</function> has the (nearly) isomorphic type
2482 </para>
2483
2484 <para>
2485
2486 <programlisting>
2487 MkFoo :: (exists a . (a, a -> Bool)) -> Foo
2488 </programlisting>
2489
2490 </para>
2491
2492 <para>
2493 But Haskell programmers can safely think of the ordinary
2494 <emphasis>universally</emphasis> quantified type given above, thereby avoiding
2495 adding a new existential quantification construct.
2496 </para>
2497
2498 </sect3>
2499
2500 <sect3 id="existential-with-context">
2501 <title>Existentials and type classes</title>
2502
2503 <para>
2504 An easy extension is to allow
2505 arbitrary contexts before the constructor. For example:
2506 </para>
2507
2508 <para>
2509
2510 <programlisting>
2511 data Baz = forall a. Eq a => Baz1 a a
2512          | forall b. Show b => Baz2 b (b -> b)
2513 </programlisting>
2514
2515 </para>
2516
2517 <para>
2518 The two constructors have the types you'd expect:
2519 </para>
2520
2521 <para>
2522
2523 <programlisting>
2524 Baz1 :: forall a. Eq a => a -> a -> Baz
2525 Baz2 :: forall b. Show b => b -> (b -> b) -> Baz
2526 </programlisting>
2527
2528 </para>
2529
2530 <para>
2531 But when pattern matching on <function>Baz1</function> the matched values can be compared
2532 for equality, and when pattern matching on <function>Baz2</function> the first matched
2533 value can be converted to a string (as well as applying the function to it).
2534 So this program is legal:
2535 </para>
2536
2537 <para>
2538
2539 <programlisting>
2540 f :: Baz -> String
2541 f (Baz1 p q) | p == q = "Yes"
2542              | otherwise = "No"
2543 f (Baz2 v fn) = show (fn v)
2544 </programlisting>
2545
2546 </para>
2547
2548 <para>
2549 Operationally, in a dictionary-passing implementation, the
2550 constructors <function>Baz1</function> and <function>Baz2</function> must store the
2551 dictionaries for <literal>Eq</literal> and <literal>Show</literal> respectively, and
2552 extract them on pattern matching.
2553 </para>
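<para>
A common use of a class context on an existential constructor is a
heterogeneous list whose elements share only a class. The type
<literal>Showable</literal> below is a hypothetical example of this pattern.
</para>
<programlisting>
{-# LANGUAGE ExistentialQuantification #-}
module Main where

-- The constructor stores the Show dictionary alongside the value.
data Showable = forall a. Show a => MkShowable a

-- Pattern matching brings the stored Show instance back into scope.
instance Show Showable where
  show (MkShowable x) = show x

main :: IO ()
main = print [MkShowable (1 :: Int), MkShowable "two", MkShowable True]
</programlisting>
<para>
This should print <literal>[1,"two",True]</literal>.
</para>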
2554
2555 </sect3>
2556
2557 <sect3 id="existential-records">
2558 <title>Record Constructors</title>
2559
2560 <para>
2561 GHC allows existentials to be used with record syntax as well. For example:
2562
2563 <programlisting>
2564 data Counter a = forall self. NewCounter
2565 { _this :: self
2566 , _inc :: self -> self
2567 , _display :: self -> IO ()
2568 , tag :: a
2569 }
2570 </programlisting>
2571 Here <literal>tag</literal> is a public field, with a well-typed selector
2572 function <literal>tag :: Counter a -> a</literal>. The <literal>self</literal>
2573 type is hidden from the outside; any attempt to apply <literal>_this</literal>,
2574 <literal>_inc</literal> or <literal>_display</literal> as functions will raise a
2575 compile-time error. In other words, <emphasis>GHC defines a record selector function
2576 only for fields whose type does not mention the existentially-quantified variables</emphasis>.
2577 (This example used an underscore in the fields for which record selectors
2578 will not be defined, but that is only programming style; GHC ignores them.)
2579 </para>
2580
2581 <para>
2582 To make use of these hidden fields, we need to create some helper functions:
2583
2584 <programlisting>
2585 inc :: Counter a -> Counter a
2586 inc (NewCounter x i d t) = NewCounter
2587 { _this = i x, _inc = i, _display = d, tag = t }
2588
2589 display :: Counter a -> IO ()
2590 display NewCounter{ _this = x, _display = d } = d x
2591 </programlisting>
2592
2593 Now we can define counters with different underlying implementations:
2594
2595 <programlisting>
2596 counterA :: Counter String
2597 counterA = NewCounter
2598 { _this = 0, _inc = (1+), _display = print, tag = "A" }
2599
2600 counterB :: Counter String
2601 counterB = NewCounter
2602 { _this = "", _inc = ('#':), _display = putStrLn, tag = "B" }
2603
2604 main = do
2605 display (inc counterA) -- prints "1"
2606 display (inc (inc counterB)) -- prints "##"
2607 </programlisting>
2608
2609 Record update syntax is supported for existentials (and GADTs):
2610 <programlisting>
2611 setTag :: Counter a -> a -> Counter a
2612 setTag obj t = obj{ tag = t }
2613 </programlisting>
2614 The rule for record update is this: <emphasis>
2615 the types of the updated fields may
2616 mention only the universally-quantified type variables
2617 of the data constructor. For GADTs, the field may mention only types
2618 that appear as a simple type-variable argument in the constructor's result
2619 type</emphasis>. For example:
2620 <programlisting>
2621 data T a b where { T1 { f1::a, f2::b, f3::(b,c) } :: T a b } -- c is existential
2622 upd1 t x = t { f1=x } -- OK: upd1 :: T a b -> a' -> T a' b
2623 upd2 t x = t { f3=x } -- BAD (f3's type mentions c, which is
2624 -- existentially quantified)
2625
2626 data G a b where { G1 { g1::a, g2::c } :: G a [c] }
2627 upd3 g x = g { g1=x } -- OK: upd3 :: G a b -> c -> G c b
2628 upd4 g x = g { g2=x } -- BAD (g2's type mentions c, which is not a simple
2629 -- type-variable argument in G1's result type)
2630 </programlisting>
2631 </para>
2632
2633 </sect3>
2634
2635
2636 <sect3>
2637 <title>Restrictions</title>
2638
2639 <para>
2640 There are several restrictions on the ways in which existentially-quantified
2641 constructors can be used.
2642 </para>
2643
2644 <para>
2645
2646 <itemizedlist>
2647 <listitem>
2648
2649 <para>
2650 When pattern matching, each pattern match introduces a new,
2651 distinct, type for each existential type variable. These types cannot
2652 be unified with any other type, nor can they escape from the scope of
2653 the pattern match. For example, these fragments are incorrect:
2654
2655
2656 <programlisting>
2657 f1 (MkFoo a f) = a
2658 </programlisting>
2659
2660
2661 Here, the type bound by <function>MkFoo</function> "escapes", because <literal>a</literal>
2662 is the result of <function>f1</function>. One way to see why this is wrong is to
2663 ask what type <function>f1</function> has:
2664
2665
2666 <programlisting>
2667 f1 :: Foo -> a -- Weird!
2668 </programlisting>
2669
2670
2671 What is this "<literal>a</literal>" in the result type? Clearly we don't mean
2672 this:
2673
2674
2675 <programlisting>
2676 f1 :: forall a. Foo -> a -- Wrong!
2677 </programlisting>
2678
2679
2680 The original program is just plain wrong. Here's another sort of error:
2681
2682
2683 <programlisting>
2684 f2 (Baz1 a b) (Baz1 p q) = a==q
2685 </programlisting>
2686
2687
2688 It's ok to say <literal>a==b</literal> or <literal>p==q</literal>, but
2689 <literal>a==q</literal> is wrong because it equates the two distinct types arising
2690 from the two <function>Baz1</function> constructors.
2691
2692
2693 </para>
2694 </listitem>
2695 <listitem>
2696
2697 <para>
2698 You can't pattern-match on an existentially quantified
2699 constructor in a <literal>let</literal> or <literal>where</literal> group of
2700 bindings. So this is illegal:
2701
2702
2703 <programlisting>
2704 f3 x = a==b where { Baz1 a b = x }
2705 </programlisting>
2706
2707 Instead, use a <literal>case</literal> expression:
2708
2709 <programlisting>
2710 f3 x = case x of Baz1 a b -> a==b
2711 </programlisting>
2712
2713 In general, you can only pattern-match
2714 on an existentially-quantified constructor in a <literal>case</literal> expression or
2715 in the patterns of a function definition.
2716
2717 The reason for this restriction is really an implementation one.
2718 Type-checking binding groups is already a nightmare without
2719 existentials complicating the picture. Also an existential pattern
2720 binding at the top level of a module doesn't make sense, because it's
2721 not clear how to prevent the existentially-quantified type "escaping".
2722 So for now, there's a simple-to-state restriction. We'll see how
2723 annoying it is.
2724
2725 </para>
2726 </listitem>
2727 <listitem>
2728
2729 <para>
2730 You can't use existential quantification for <literal>newtype</literal>
2731 declarations. So this is illegal:
2732
2733
2734 <programlisting>
2735 newtype T = forall a. Ord a => MkT a
2736 </programlisting>
2737
2738
2739 Reason: a value of type <literal>T</literal> must be represented as a
2740 pair of a dictionary for <literal>Ord a</literal> and a value of type
2741 <literal>a</literal>. That contradicts the idea that
2742 <literal>newtype</literal> should have no concrete representation.
2743 You can get just the same efficiency and effect by using
2744 <literal>data</literal> instead of <literal>newtype</literal>. If
2745 there is no overloading involved, then there is more of a case for
2746 allowing an existentially-quantified <literal>newtype</literal>,
2747 because the <literal>data</literal> version does carry an
2748 implementation cost, but single-field existentially quantified
2749 constructors aren't much use. So the simple restriction (no
2750 existential stuff on <literal>newtype</literal>) stands, unless there
2751 are convincing reasons to change it.
2752
2753
2754 </para>
2755 </listitem>
2756 <listitem>
2757
2758 <para>
2759 You can't use <literal>deriving</literal> to define instances of a
2760 data type with existentially quantified data constructors.
2761
2762 Reason: in most cases it would not make sense. For example:
2763
2764 <programlisting>
2765 data T = forall a. MkT [a] deriving( Eq )
2766 </programlisting>
2767
2768 To derive <literal>Eq</literal> in the standard way we would need to have equality
2769 between the single component of two <function>MkT</function> constructors:
2770
2771 <programlisting>
2772 instance Eq T where
2773 (MkT a) == (MkT b) = ???
2774 </programlisting>
2775
2776 But <varname>a</varname> and <varname>b</varname> have distinct types, and so can't be compared.
2777 It's just about possible to imagine examples in which the derived instance
2778 would make sense, but it seems altogether simpler to prohibit such
2779 declarations. Define your own instances!
2780 </para>
2781 </listitem>
2782
2783 </itemizedlist>
2784
2785 </para>
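
<para>
The following sketch shows one way to live within these restrictions: the instance is
written by hand rather than derived, and the constructor is unpacked with
<literal>case</literal> rather than <literal>let</literal> or <literal>where</literal>.
The type <literal>Showable</literal> and the function <literal>describe</literal> are
hypothetical names chosen for this example.
</para>

<programlisting>
{-# LANGUAGE ExistentialQuantification #-}
module Main where

-- Hypothetical type: the Show dictionary is packaged with the hidden value.
data Showable = forall a. Show a => MkShowable a

-- A hand-written instance; the hidden type is used only through Show.
instance Show Showable where
  showsPrec d (MkShowable x) =
    showParen (d > 10) (showString "MkShowable " . showsPrec 11 x)

-- Unpack with case; the hidden type does not escape because
-- the result type is String.
describe :: Showable -> String
describe s = case s of
  MkShowable x -> "holds " ++ show x

main :: IO ()
main = do
  print [MkShowable (1 :: Int), MkShowable True]  -- [MkShowable 1,MkShowable True]
  putStrLn (describe (MkShowable "hi"))           -- holds "hi"
</programlisting>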
2786
2787 </sect3>
2788 </sect2>
2789
2790 <!-- ====================== Generalised algebraic data types ======================= -->
2791
2792 <sect2 id="gadt-style">
2793 <title>Declaring data types with explicit constructor signatures</title>
2794
2795 <para>When the <literal>GADTSyntax</literal> extension is enabled,
2796 GHC allows you to declare an algebraic data type by
2797 giving the type signatures of constructors explicitly. For example:
2798 <programlisting>
2799 data Maybe a where
2800 Nothing :: Maybe a
2801 Just :: a -> Maybe a
2802 </programlisting>
2803 The form is called a "GADT-style declaration"
2804 because Generalised Algebraic Data Types, described in <xref linkend="gadt"/>,
2805 can only be declared using this form.</para>
2806 <para>Notice that GADT-style syntax generalises existential types (<xref linkend="existential-quantification"/>).
2807 For example, these two declarations are equivalent:
2808 <programlisting>
2809 data Foo = forall a. MkFoo a (a -> Bool)
2810 data Foo' where { MkFoo' :: a -> (a->Bool) -> Foo' }
2811 </programlisting>
2812 </para>
2813 <para>Any data type that can be declared in standard Haskell-98 syntax
2814 can also be declared using GADT-style syntax.
2815 The choice is largely stylistic, but GADT-style declarations differ in one important respect:
2816 they treat class constraints on the data constructors differently.
2817 Specifically, if the constructor is given a type-class context, that
2818 context is made available by pattern matching. For example:
2819 <programlisting>
2820 data Set a where
2821 MkSet :: Eq a => [a] -> Set a
2822
2823 makeSet :: Eq a => [a] -> Set a
2824 makeSet xs = MkSet (nub xs)
2825
2826 insert :: a -> Set a -> Set a
2827 insert a (MkSet as) | a `elem` as = MkSet as
2828 | otherwise = MkSet (a:as)
2829 </programlisting>
2830 A use of <literal>MkSet</literal> as a constructor (e.g. in the definition of <literal>makeSet</literal>)
2831 gives rise to a <literal>(Eq a)</literal>
2832 constraint, as you would expect. The new feature is that pattern-matching on <literal>MkSet</literal>
2833 (as in the definition of <literal>insert</literal>) makes <emphasis>available</emphasis> an <literal>(Eq a)</literal>
2834 context. In implementation terms, the <literal>MkSet</literal> constructor has a hidden field that stores
2835 the <literal>(Eq a)</literal> dictionary that is passed to <literal>MkSet</literal>; so
2836 when pattern-matching that dictionary becomes available for the right-hand side of the match.
2837 In the example, the equality dictionary is used to satisfy the equality constraint
2838 generated by the call to <literal>elem</literal>, so that the type of
2839 <literal>insert</literal> itself has no <literal>Eq</literal> constraint.
2840 </para>
2841 <para>
2842 For example, one possible application is to reify dictionaries:
2843 <programlisting>
2844 data NumInst a where
2845 MkNumInst :: Num a => NumInst a
2846
2847 intInst :: NumInst Int
2848 intInst = MkNumInst
2849
2850 plus :: NumInst a -> a -> a -> a
2851 plus MkNumInst p q = p + q
2852 </programlisting>
2853 Here, a value of type <literal>NumInst a</literal> is equivalent
2854 to an explicit <literal>(Num a)</literal> dictionary.
2855 </para>
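
<para>
A small, self-contained sketch of how such a reified dictionary might be passed around
follows. The helper <literal>sumWith</literal> is a hypothetical addition; it is not
part of any library.
</para>

<programlisting>
{-# LANGUAGE GADTs #-}
module Main where

data NumInst a where
  MkNumInst :: Num a => NumInst a

intInst :: NumInst Int
intInst = MkNumInst

plus :: NumInst a -> a -> a -> a
plus MkNumInst p q = p + q

-- Hypothetical helper: matching on MkNumInst brings (Num a) into scope,
-- so neither (+) nor the literal 0 needs a constraint in the signature.
sumWith :: NumInst a -> [a] -> a
sumWith MkNumInst = foldr (+) 0

main :: IO ()
main = do
  print (plus intInst 2 3)                          -- 5
  print (sumWith intInst [1, 2, 3])                 -- 6
  print (plus (MkNumInst :: NumInst Double) 1.5 2)  -- 3.5
</programlisting>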
2856 <para>
2857 All this applies to constructors declared using the syntax of <xref linkend="existential-with-context"/>.
2858 For example, the <literal>NumInst</literal> data type above could equivalently be declared
2859 like this:
2860 <programlisting>
2861 data NumInst a
2862 = Num a => MkNumInst
2863 </programlisting>
2864 Notice that, unlike the situation when declaring an existential, there is
2865 no <literal>forall</literal>, because the <literal>Num</literal> constrains the
2866 data type's universally quantified type variable <literal>a</literal>.
2867 A constructor may have both universal and existential type variables: for example,
2868 the following two declarations are equivalent:
2869 <programlisting>
2870 data T1 a
2871 = forall b. (Num a, Eq b) => MkT1 a b
2872 data T2 a where
2873 MkT2 :: (Num a, Eq b) => a -> b -> T2 a
2874 </programlisting>
2875 </para>
2876 <para>All this behaviour contrasts with Haskell 98's peculiar treatment of
2877 contexts on a data type declaration (Section 4.2.1 of the Haskell 98 Report).
2878 In Haskell 98 the definition
2879 <programlisting>
2880 data Eq a => Set' a = MkSet' [a]
2881 </programlisting>
2882 gives <literal>MkSet'</literal> the same type as <literal>MkSet</literal> above. But instead of
2883 <emphasis>making available</emphasis> an <literal>(Eq a)</literal> constraint, pattern-matching
2884 on <literal>MkSet'</literal> <emphasis>requires</emphasis> an <literal>(Eq a)</literal> constraint!
2885 GHC faithfully implements this behaviour, odd though it is. But for GADT-style declarations,
2886 GHC's behaviour is much more useful, as well as much more intuitive.
2887 </para>
2888
2889 <para>
2890 The rest of this section gives further details about GADT-style data
2891 type declarations.
2892
2893 <itemizedlist>
2894 <listitem><para>
2895 The result type of each data constructor must begin with the type constructor being defined.
2896 If the result type of all constructors
2897 has the form <literal>T a1 ... an</literal>, where <literal>a1 ... an</literal>
2898 are distinct type variables, then the data type is <emphasis>ordinary</emphasis>;
2899 otherwise it is a <emphasis>generalised</emphasis> data type (<xref linkend="gadt"/>).
2900 </para></listitem>
2901
2902 <listitem><para>
2903 As with other type signatures, you can give a single signature for several data constructors.
2904 In this example we give a single signature for <literal>T1</literal> and <literal>T2</literal>:
2905 <programlisting>
2906 data T a where
2907 T1,T2 :: a -> T a
2908 T3 :: T a
2909 </programlisting>
2910 </para></listitem>
2911
2912 <listitem><para>
2913 The type signature of
2914 each constructor is independent, and is implicitly universally quantified as usual.
2915 In particular, the type variable(s) in the "<literal>data T a where</literal>" header
2916 have no scope, and different constructors may have different universally-quantified type variables:
2917 <programlisting>
2918 data T a where -- The 'a' has no scope
2919 T1,T2 :: b -> T b -- Means forall b. b -> T b
2920 T3 :: T a -- Means forall a. T a
2921 </programlisting>
2922 </para></listitem>
2923
2924 <listitem><para>
2925 A constructor signature may mention type class constraints, which can differ for
2926 different constructors. For example, this is fine:
2927 <programlisting>
2928 data T a where
2929 T1 :: Eq b => b -> b -> T b
2930 T2 :: (Show c, Ix c) => c -> [c] -> T c
2931 </programlisting>
2932 When pattern matching, these constraints are made available to discharge constraints
2933 in the body of the match. For example:
2934 <programlisting>
2935 f :: T a -> String
2936 f (T1 x y) | x==y = "yes"
2937 | otherwise = "no"
2938 f (T2 a b) = show a
2939 </programlisting>
2940 Note that <literal>f</literal> is not overloaded; the <literal>Eq</literal> constraint arising
2941 from the use of <literal>==</literal> is discharged by the pattern match on <literal>T1</literal>
2942 and similarly the <literal>Show</literal> constraint arising from the use of <literal>show</literal>.
2943 </para></listitem>
2944
2945 <listitem><para>
2946 Unlike a Haskell-98-style
2947 data type declaration, the type variable(s) in the "<literal>data Set a where</literal>" header
2948 have no scope. Indeed, one can write a kind signature instead:
2949 <programlisting>
2950 data Set :: * -> * where ...
2951 </programlisting>
2952 or even a mixture of the two:
2953 <programlisting>
2954 data Bar a :: (* -> *) -> * where ...
2955 </programlisting>
2956 The type variables (if given) may be explicitly kinded, so we could also write the header for <literal>Bar</literal>
2957 like this:
2958 <programlisting>
2959 data Bar a (b :: * -> *) where ...
2960 </programlisting>
2961 </para></listitem>
2962
2963
2964 <listitem><para>
2965 You can use strictness annotations, in the obvious places
2966 in the constructor type:
2967 <programlisting>
2968 data Term a where
2969 Lit :: !Int -> Term Int
2970 If :: Term Bool -> !(Term a) -> !(Term a) -> Term a
2971 Pair :: Term a -> Term b -> Term (a,b)
2972 </programlisting>
2973 </para></listitem>
2974
2975 <listitem><para>
2976 You can use a <literal>deriving</literal> clause on a GADT-style data type
2977 declaration. For example, these two declarations are equivalent:
2978 <programlisting>
2979 data Maybe1 a where {
2980 Nothing1 :: Maybe1 a ;
2981 Just1 :: a -> Maybe1 a
2982 } deriving( Eq, Ord )
2983
2984 data Maybe2 a = Nothing2 | Just2 a
2985 deriving( Eq, Ord )
2986 </programlisting>
2987 </para></listitem>
2988
2989 <listitem><para>
2990 The type signature may have quantified type variables that do not appear
2991 in the result type:
2992 <programlisting>
2993 data Foo where
2994 MkFoo :: a -> (a->Bool) -> Foo
2995 Nil :: Foo
2996 </programlisting>
2997 Here the type variable <literal>a</literal> does not appear in the result type
2998 of either constructor.
2999 Although it is universally quantified in the type of the constructor, such
3000 a type variable is often called "existential".
3001 Indeed, the above declaration declares precisely the same type as
3002 the <literal>data Foo</literal> in <xref linkend="existential-quantification"/>.
3003 </para><para>
3004 The type may contain a class context too, of course:
3005 <programlisting>
3006 data Showable where
3007 MkShowable :: Show a => a -> Showable
3008 </programlisting>
3009 </para></listitem>
3010
3011 <listitem><para>
3012 You can use record syntax on a GADT-style data type declaration:
3013
3014 <programlisting>
3015 data Person where
3016 Adult :: { name :: String, children :: [Person] } -> Person
3017 Child :: Show a => { name :: !String, funny :: a } -> Person
3018 </programlisting>
3019 As usual, for every constructor that has a field <literal>f</literal>, the type of
3020 field <literal>f</literal> must be the same (modulo alpha conversion).
3021 The <literal>Child</literal> constructor above shows that the signature
3022 may have a context, existentially-quantified variables, and strictness annotations,
3023 just as in the non-record case. (NB: the "type" that follows the double-colon
3024 is not really a type, because of the record syntax and strictness annotations.
3025 A "type" of this form can appear only in a constructor signature.)
3026 </para></listitem>
3027
3028 <listitem><para>
3029 Record updates are allowed with GADT-style declarations,
3030 but only for fields that have the following property: the type of the field
3031 mentions no existential type variables.
3032 </para></listitem>
3033
3034 <listitem><para>
3035 As in the case of existentials declared using the Haskell-98-like record syntax
3036 (<xref linkend="existential-records"/>),
3037 record-selector functions are generated only for those fields that have well-typed
3038 selectors.
3039 Here is the example of that section, in GADT-style syntax:
3040 <programlisting>
3041 data Counter a where
3042 NewCounter { _this :: self
3043 , _inc :: self -> self
3044 , _display :: self -> IO ()
3045 , tag :: a
3046 }
3047 :: Counter a
3048 </programlisting>
3049 As before, only one selector function is generated here, that for <literal>tag</literal>.
3050 Nevertheless, you can still use all the field names in pattern matching and record construction.
3051 </para></listitem>
3052
3053 <listitem><para>
3054 In a GADT-style data type declaration there is no obvious way to specify that a data constructor
3055 should be infix, which makes a difference if you derive <literal>Show</literal> for the type.
3056 (Data constructors declared infix are displayed infix by the derived <literal>show</literal>.)
3057 So GHC implements the following design: a data constructor declared in a GADT-style data type
3058 declaration is displayed infix by <literal>Show</literal> iff (a) it is an operator symbol,
3059 (b) it has two arguments, (c) it has a programmer-supplied fixity declaration. For example:
3060 <programlisting>
3061 infix 6 :--:
3062 data T a where
3063 (:--:) :: Int -> Bool -> T Int
3064 </programlisting>
3065 </para></listitem>
3066 </itemizedlist></para>
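
<para>
Several of the points above can be combined in one short sketch: an ordinary data type
written in GADT style, with a <literal>deriving</literal> clause and an operator
constructor that has a fixity declaration. The type <literal>Range</literal> and its
constructors are hypothetical names used only for illustration.
</para>

<programlisting>
{-# LANGUAGE GADTSyntax #-}
module Main where

infix 6 :--:

-- Every result type is (Range a), so this is an ordinary data type
-- and a deriving clause is allowed.
data Range a where
    (:--:) :: a -> a -> Range a
    Empty  :: Range a
  deriving (Eq, Show)

main :: IO ()
main = do
  -- (:--:) is an operator with two arguments and a fixity declaration,
  -- so the derived Show instance should display it infix.
  print (1 :--: (5 :: Int))
  print (Empty == (2 :--: (3 :: Int)))  -- False
</programlisting>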
3067 </sect2>
3068
3069 <sect2 id="gadt">
3070 <title>Generalised Algebraic Data Types (GADTs)</title>
3071
3072 <para>Generalised Algebraic Data Types generalise ordinary algebraic data types
3073 by allowing constructors to have richer return types. Here is an example:
3074 <programlisting>
3075 data Term a where
3076 Lit :: Int -> Term Int
3077 Succ :: Term Int -> Term Int
3078 IsZero :: Term Int -> Term Bool
3079 If :: Term Bool -> Term a -> Term a -> Term a
3080 Pair :: Term a -> Term b -> Term (a,b)
3081 </programlisting>
3082 Notice that the return type of the constructors is not always <literal>Term a</literal>, as is the
3083 case with ordinary data types. This generality allows us to
3084 write a well-typed <literal>eval</literal> function
3085 for these <literal>Terms</literal>:
3086 <programlisting>
3087 eval :: Term a -> a
3088 eval (Lit i) = i
3089 eval (Succ t) = 1 + eval t
3090 eval (IsZero t) = eval t == 0
3091 eval (If b e1 e2) = if eval b then eval e1 else eval e2
3092 eval (Pair e1 e2) = (eval e1, eval e2)
3093 </programlisting>
3094 The key point about GADTs is that <emphasis>pattern matching causes type refinement</emphasis>.
3095 For example, in the right hand side of the equation
3096 <programlisting>
3097 eval :: Term a -> a
3098 eval (Lit i) = ...
3099 </programlisting>
3100 the type <literal>a</literal> is refined to <literal>Int</literal>. That's the whole point!
3101 A precise specification of the type rules is beyond what this user manual aspires to,
3102 but the design closely follows that described in
3103 the paper <ulink
3104 url="http://research.microsoft.com/%7Esimonpj/papers/gadt/">Simple
3105 unification-based type inference for GADTs</ulink>,
3106 (ICFP 2006).
3107 The general principle is this: <emphasis>type refinement is only carried out
3108 based on user-supplied type annotations</emphasis>.
3109 So if no type signature is supplied for <literal>eval</literal>, no type refinement happens,
3110 and lots of obscure error messages will
3111 occur. However, the refinement is quite general. For example, if we had:
3112 <programlisting>
3113 eval :: Term a -> a -> a
3114 eval (Lit i) j = i+j
3115 </programlisting>
3116 the pattern match causes the type <literal>a</literal> to be refined to <literal>Int</literal> (because of the type
3117 of the constructor <literal>Lit</literal>), and that refinement also applies to the type of <literal>j</literal>, and
3118 the result type of <literal>eval</literal>. Hence the addition <literal>i+j</literal> is legal.
3119 </para>
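
<para>
For experimentation, the <literal>Term</literal> declaration and the first
<literal>eval</literal> function can be assembled into one module, as in the sketch
below. The value <literal>sample</literal> is a hypothetical test term added for this
example.
</para>

<programlisting>
{-# LANGUAGE GADTs #-}
module Main where

data Term a where
  Lit    :: Int -> Term Int
  Succ   :: Term Int -> Term Int
  IsZero :: Term Int -> Term Bool
  If     :: Term Bool -> Term a -> Term a -> Term a
  Pair   :: Term a -> Term b -> Term (a,b)

-- The signature is essential: without it no type refinement takes place.
eval :: Term a -> a
eval (Lit i)      = i                    -- here a is refined to Int
eval (Succ t)     = 1 + eval t
eval (IsZero t)   = eval t == 0          -- here a is refined to Bool
eval (If b e1 e2) = if eval b then eval e1 else eval e2
eval (Pair e1 e2) = (eval e1, eval e2)

-- Hypothetical sample term.
sample :: Term (Int, Bool)
sample = Pair (If (IsZero (Lit 0)) (Succ (Lit 41)) (Lit 0))
              (IsZero (Succ (Lit 0)))

main :: IO ()
main = print (eval sample)  -- (42,False)
</programlisting>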
3120 <para>
3121 These and many other examples are given in papers by Hongwei Xi, and
3122 Tim Sheard. There is a longer introduction
3123 <ulink url="http://www.haskell.org/haskellwiki/GADT">on the wiki</ulink>,
3124 and Ralf Hinze's
3125 <ulink url="http://www.informatik.uni-bonn.de/~ralf/publications/With.pdf">Fun with phantom types</ulink> also has a number of examples. Note that papers
3126 may use different notation to that implemented in GHC.
3127 </para>
3128 <para>
3129 The rest of this section outlines the extensions to GHC that support GADTs. The extension is enabled with
3130 <option>-XGADTs</option>. The <option>-XGADTs</option> flag also sets <option>-XRelaxedPolyRec</option>.
3131 <itemizedlist>
3132 <listitem><para>
3133 A GADT can only be declared using GADT-style syntax (<xref linkend="gadt-style"/>);
3134 the old Haskell-98 syntax for data declarations always declares an ordinary data type.
3135 The result type of each constructor must begin with the type constructor being defined,
3136 but for a GADT the arguments to the type constructor can be arbitrary monotypes.
3137 For example, in the <literal>Term</literal> data
3138 type above, the type of each constructor must end with <literal>Term ty</literal>, but
3139 the <literal>ty</literal> need not be a type variable (e.g. the <literal>Lit</literal>
3140 constructor).
3141 </para></listitem>
3142
3143 <listitem><para>
3144 It is permitted to declare an ordinary algebraic data type using GADT-style syntax.
3145 What makes a GADT into a GADT is not the syntax, but rather the presence of data constructors
3146 whose result type is not just <literal>T a b</literal>.
3147 </para></listitem>
3148
3149 <listitem><para>
3150 You cannot use a <literal>deriving</literal> clause for a GADT; only for
3151 an ordinary data type.
3152 </para></listitem>
3153
3154 <listitem><para>
3155 As mentioned in <xref linkend="gadt-style"/>, record syntax is supported.
3156 For example:
3157 <programlisting>
3158 data Term a where
3159 Lit { val :: Int } :: Term Int
3160 Succ { num :: Term Int } :: Term Int
3161 Pred { num :: Term Int } :: Term Int
3162 IsZero { arg :: Term Int } :: Term Bool
3163 Pair { arg1 :: Term a
3164 , arg2 :: Term b
3165 } :: Term (a,b)
3166 If { cnd :: Term Bool
3167 , tru :: Term a
3168 , fls :: Term a
3169 } :: Term a
3170 </programlisting>
3171 However, for GADTs there is the following additional constraint:
3172 every constructor that has a field <literal>f</literal> must have
3173 the same result type (modulo alpha conversion).
3174 Hence, in the above example, we cannot merge the <literal>num</literal>
3175 and <literal>arg</literal> fields above into a
3176 single name. Although their field types are both <literal>Term Int</literal>,
3177 their selector functions actually have different types:
3178
3179 <programlisting>
3180 num :: Term Int -> Term Int
3181 arg :: Term Bool -> Term Int
3182 </programlisting>
3183 </para></listitem>
3184
3185 <listitem><para>
3186 When pattern-matching against data constructors drawn from a GADT,
3187 for example in a <literal>case</literal> expression, the following rules apply:
3188 <itemizedlist>
3189 <listitem><para>The type of the scrutinee must be rigid.</para></listitem>
3190 <listitem><para>The type of the entire <literal>case</literal> expression must be rigid.</para></listitem>
3191 <listitem><para>The type of any free variable mentioned in any of
3192 the <literal>case</literal> alternatives must be rigid.</para></listitem>
3193 </itemizedlist>
3194 A type is "rigid" if it is completely known to the compiler at its binding site. The easiest
3195 way to ensure that a variable has a rigid type is to give it a type signature.
3196 For more precise details see <ulink url="http://research.microsoft.com/%7Esimonpj/papers/gadt">
3197 Simple unification-based type inference for GADTs
3198 </ulink>. The criteria implemented by GHC are given in the Appendix.
3199
3200 </para></listitem>
3201
3202 </itemizedlist>
3203 </para>
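
<para>
The rigidity rules can be illustrated with a small sketch in which every function that
matches on a GADT constructor, including a local one, is given a type signature. The
type <literal>Rep</literal> and the functions <literal>defaultOf</literal> and
<literal>describe</literal> are hypothetical names chosen for this example.
</para>

<programlisting>
{-# LANGUAGE GADTs #-}
module Main where

data Rep a where
  RInt  :: Rep Int
  RBool :: Rep Bool

-- The signature makes both the scrutinee type (Rep a) and the
-- result type (a) of the case expression rigid.
defaultOf :: Rep a -> a
defaultOf r = case r of
  RInt  -> 0
  RBool -> False

describe :: Rep a -> a -> String
describe r x = go r x
  where
    -- A local function that matches on a GADT gets its own signature.
    go :: Rep b -> b -> String
    go RInt  n = "Int "  ++ show n
    go RBool v = "Bool " ++ show v

main :: IO ()
main = do
  print (defaultOf RInt)          -- 0
  putStrLn (describe RBool True)  -- Bool True
</programlisting>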
3204
3205 </sect2>
3206 </sect1>
3207
3208 <!-- ====================== End of Generalised algebraic data types ======================= -->
3209
3210 <sect1 id="deriving">
3211 <title>Extensions to the "deriving" mechanism</title>
3212
3213 <sect2 id="deriving-inferred">
3214 <title>Inferred context for deriving clauses</title>
3215
3216 <para>
3217 The Haskell Report is vague about exactly when a <literal>deriving</literal> clause is
3218 legal. For example:
3219 <programlisting>
3220 data T0 f a = MkT0 a deriving( Eq )
3221 data T1 f a = MkT1 (f a) deriving( Eq )
3222 data T2 f a = MkT2 (f (f a)) deriving( Eq )
3223 </programlisting>
3224 The natural generated <literal>Eq</literal> code would result in these instance declarations:
3225 <programlisting>
3226 instance Eq a => Eq (T0 f a) where ...
3227 instance Eq (f a) => Eq (T1 f a) where ...
3228 instance Eq (f (f a)) => Eq (T2 f a) where ...
3229 </programlisting>
3230 The first of these is obviously fine. The second is still fine, although less obviously.
3231 The third is not Haskell 98, and risks losing termination of instances.
3232 </para>
3233 <para>
3234 GHC takes a conservative position: it accepts the first two, but not the third. The rule is this:
3235 each constraint in the inferred instance context must consist only of type variables,
3236 with no repetitions.
3237 </para>
3238 <para>
3239 This rule is applied regardless of flags. If you want a more exotic context, you can write
3240 it yourself, using the <link linkend="stand-alone-deriving">standalone deriving mechanism</link>.
3241 </para>
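
<para>
As a hedged sketch of that route, the third instance can be requested with an explicit
context. The extension flags listed here are an assumption about what such a context
requires: it is not a bare type variable, and <literal>f</literal> occurs more often in
it than in the instance head.
</para>

<programlisting>
{-# LANGUAGE StandaloneDeriving, FlexibleContexts, UndecidableInstances #-}
module Main where

data T2 f a = MkT2 (f (f a))

-- The context is written by hand instead of being inferred.
deriving instance Eq (f (f a)) => Eq (T2 f a)

main :: IO ()
main = do
  print (MkT2 [[1 :: Int]] == MkT2 [[1]])                 -- True
  print (MkT2 (Just (Just 'x')) == MkT2 (Just Nothing))   -- False
</programlisting>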
3242 </sect2>
3243
3244 <sect2 id="stand-alone-deriving">
3245 <title>Stand-alone deriving declarations</title>
3246
3247 <para>
3248 GHC now allows stand-alone <literal>deriving</literal> declarations, enabled by <literal>-XStandaloneDeriving</literal>:
3249 <programlisting>
3250 data Foo a = Bar a | Baz String
3251
3252 deriving instance Eq a => Eq (Foo a)
3253 </programlisting>
3254 The syntax is identical to that of an ordinary instance declaration apart from (a) the keyword
3255 <literal>deriving</literal>, and (b) the absence of the <literal>where</literal> part.
3256 Note the following points:
3257 <itemizedlist>
3258 <listitem><para>
3259 You must supply an explicit context (in the example the context is <literal>(Eq a)</literal>),
3260 exactly as you would in an ordinary instance declaration.
3261 (In contrast, in a <literal>deriving</literal> clause
3262 attached to a data type declaration, the context is inferred.)
3263 </para></listitem>
3264
3265 <listitem><para>
3266 A <literal>deriving instance</literal> declaration
3267 must obey the same rules concerning form and termination as ordinary instance declarations,
3268 controlled by the same flags; see <xref linkend="instance-decls"/>.
3269 </para></listitem>
3270
3271 <listitem><para>
3272 Unlike a <literal>deriving</literal>
3273 declaration attached to a <literal>data</literal> declaration, the instance can be more specific
3274 than the data type (assuming you also use
3275 <literal>-XFlexibleInstances</literal>, <xref linkend="instance-rules"/>). Consider
3276 for example
3277 <programlisting>
3278 data Foo a = Bar a | Baz String
3279
3280 deriving instance Eq a => Eq (Foo [a])
3281 deriving instance Eq a => Eq (Foo (Maybe a))
3282 </programlisting>
3283 This will generate a derived instance for <literal>(Foo [a])</literal> and <literal>(Foo (Maybe a))</literal>,
3284 but other types such as <literal>(Foo (Int,Bool))</literal> will not be an instance of <literal>Eq</literal>.
3285 </para></listitem>
3286
3287 <listitem><para>
3288 Unlike a <literal>deriving</literal>
3289 declaration attached to a <literal>data</literal> declaration,
3290 GHC does not restrict the form of the data type. Instead, GHC simply generates the appropriate
3291 boilerplate code for the specified class, and typechecks it. If there is a type error, it is
3292 your problem. (GHC will show you the offending code if it has a type error.)
3293 The merit of this is that you can derive instances for GADTs and other exotic
3294 data types, providing only that the boilerplate code does indeed typecheck. For example:
3295 <programlisting>
3296 data T a where
3297 T1 :: T Int
3298 T2 :: T Bool
3299
3300 deriving instance Show (T a)
3301 </programlisting>
3302 In this example, you cannot say <literal>... deriving( Show )</literal> on the
3303 data type declaration for <literal>T</literal>,
3304 because <literal>T</literal> is a GADT, but you <emphasis>can</emphasis> generate
3305 the instance declaration using stand-alone deriving.
3306 </para>
3307 </listitem>
3308
3309 <listitem>
3310 <para>The stand-alone syntax is generalised for newtypes in exactly the same
3311 way that ordinary <literal>deriving</literal> clauses are generalised (<xref linkend="newtype-deriving"/>).
3312 For example:
3313 <programlisting>
3314 newtype Foo a = MkFoo (State Int a)
3315
3316 deriving instance MonadState Int Foo
3317 </programlisting>
3318 GHC always treats the <emphasis>last</emphasis> parameter of the instance
3319 (<literal>Foo</literal> in this example) as the type whose instance is being derived.
3320 </para></listitem>
3321 </itemizedlist></para>
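<para>
As a minimal, self-contained sketch of the GADT case described above (the
<literal>main</literal> function is added here purely for illustration, and the
sketch assumes the <option>-XGADTs</option> and <option>-XStandaloneDeriving</option>
extensions are enabled via the pragma):
<programlisting>
{-# LANGUAGE GADTs, StandaloneDeriving #-}
module Main where

data T a where
  T1 :: T Int
  T2 :: T Bool

-- The instance is requested separately from the data declaration;
-- GHC generates the boilerplate and then typechecks it.
deriving instance Show (T a)

main :: IO ()
main = do
  print T1
  print T2
</programlisting>
If the generated code did not typecheck for some data type, the
stand-alone declaration would be rejected at that point.
</para>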
3322
3323 </sect2>
3324
3325
3326 <sect2 id="deriving-typeable">
3327 <title>Deriving clause for extra classes (<literal>Typeable</literal>, <literal>Data</literal>, etc)</title>
3328
3329 <para>
3330 Haskell 98 allows the programmer to add "<literal>deriving( Eq, Ord )</literal>" to a data type
3331 declaration, to generate a standard instance declaration for classes specified in the <literal>deriving</literal> clause.
3332 In Haskell 98, the only classes that may appear in the <literal>deriving</literal> clause are the standard
3333 classes <literal>Eq</literal>, <literal>Ord</literal>,
3334 <literal>Enum</literal>, <literal>Ix</literal>, <literal>Bounded</literal>, <literal>Read</literal>, and <literal>Show</literal>.
3335 </para>
3336 <para>
3337 GHC extends this list with several more classes that may be automatically derived:
3338 <itemizedlist>
3339 <listitem><para> With <option>-XDeriveDataTypeable</option>, you can derive instances of the classes
3340 <literal>Typeable</literal> and <literal>Data</literal>, defined in the library
3341 modules <literal>Data.Typeable</literal> and <literal>Data.Generics</literal> respectively.
3342 </para>
3343 <para>An instance of <literal>Typeable</literal> can only be derived if the
3344 data type has seven or fewer type parameters, all of kind <literal>*</literal>.
3345 The reason for this is that the <literal>Typeable</literal> class is derived using the scheme
3346 described in
3347 <ulink url="http://research.microsoft.com/%7Esimonpj/papers/hmap/gmap2.ps">
3348 Scrap More Boilerplate: Reflection, Zips, and Generalised Casts
3349 </ulink>.
3350 (Section 7.4 of the paper describes the multiple <literal>Typeable</literal> classes that
3351 are used, and only <literal>Typeable1</literal> up to
3352 <literal>Typeable7</literal> are provided in the library.)
3353 In other cases, there is nothing to stop the programmer writing a <literal>TypeableX</literal>
3354 class, whose kind suits that of the data type constructor, and
3355 then writing the data type instance by hand.
3356 </para>
3357 </listitem>
3358
3359 <listitem><para> With <option>-XDeriveGeneric</option>, you can derive
3360 instances of the classes <literal>Generic</literal> and
3361 <literal>Generic1</literal>, defined in <literal>GHC.Generics</literal>.
3362 You can use these to define generic functions,
3363 as described in <xref linkend="generic-programming"/>.
3364 </para></listitem>
3365
3366 <listitem><para> With <option>-XDeriveFunctor</option>, you can derive instances of
3367 the class <literal>Functor</literal>,
3368 defined in <literal>GHC.Base</literal>.
3369 </para></listitem>
3370
3371 <listitem><para> With <option>-XDeriveFoldable</option>, you can derive instances of
3372 the class <literal>Foldable</literal>,
3373 defined in <literal>Data.Foldable</literal>.
3374 </para></listitem>
3375
3376 <listitem><para> With <option>-XDeriveTraversable</option>, you can derive instances of
3377 the class <literal>Traversable</literal>,
3378 defined in <literal>Data.Traversable</literal>.
3379 </para></listitem>
3380 </itemizedlist>
3381 In each case the appropriate class must be in scope before it
3382 can be mentioned in the <literal>deriving</literal> clause.
3383 </para>
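<para>
The following sketch shows three of these flags used together. The type
<literal>Tree</literal> and the names <literal>sample</literal> and
<literal>checkPositive</literal> are hypothetical, chosen only for this example.
The explicit imports make sure the classes are in scope; depending on the version
of the base library they may be redundant.
<programlisting>
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
module Main where

import Data.Foldable (Foldable, toList)
import Data.Traversable (Traversable, traverse)

data Tree a = Leaf | Node (Tree a) a (Tree a)
  deriving (Show, Functor, Foldable, Traversable)

sample :: Tree Int
sample = Node (Node Leaf 1 Leaf) 2 (Node Leaf 3 Leaf)

main :: IO ()
main = do
  print (fmap (* 2) sample)              -- derived Functor instance
  print (toList sample)                  -- derived Foldable instance
  print (traverse checkPositive sample)  -- derived Traversable instance
  where
    -- Keep positive elements; any other element makes the traversal fail.
    checkPositive x = if x > 0 then Just x else Nothing
</programlisting>
</para>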
3384 </sect2>
3385
3386 <sect2 id="newtype-deriving">
3387 <title>Generalised derived instances for newtypes</title>
3388
3389 <para>
3390 When you define an abstract type using <literal>newtype</literal>, you may want
3391 the new type to inherit some instances from its representation. In
3392 Haskell 98, you can inherit instances of <literal>Eq</literal>, <literal>Ord</literal>,
3393 <literal>Enum</literal> and <literal>Bounded</literal> by deriving them, but for any
3394 other classes you have to write an explicit instance declaration. For
3395 example, if you define
3396
3397 <programlisting>
3398 newtype Dollars = Dollars Int
3399 </programlisting>
3400
3401 and you want to use arithmetic on <literal>Dollars</literal>, you have to
3402 explicitly define an instance of <literal>Num</literal>:
3403
3404 <programlisting>
3405 instance Num Dollars where
3406   Dollars a + Dollars b = Dollars (a+b)
3407   ...
3408 </programlisting>
3409 All the instance does is apply and remove the <literal>newtype</literal>
3410 constructor. It is particularly galling that, since the constructor
3411 doesn't appear at run-time, this instance declaration defines a
3412 dictionary which is <emphasis>wholly equivalent</emphasis> to the <literal>Int</literal>
3413 dictionary, only slower!
3414 </para>
3415
3416
3417 <sect3> <title> Generalising the deriving clause </title>
3418 <para>
3419 GHC now permits such instances to be derived instead,
3420 using the flag <option>-XGeneralizedNewtypeDeriving</option>,
3421 so one can write
3422 <programlisting>
3423 newtype Dollars = Dollars Int deriving (Eq,Show,Num)
3424 </programlisting>
3425
3426 and the implementation uses the <emphasis>same</emphasis> <literal>Num</literal> dictionary
3427 for <literal>Dollars</literal> as for <literal>Int</literal>. Notionally, the compiler
3428 derives an instance declaration of the form
3429
3430 <programlisting>
3431 instance Num Int => Num Dollars
3432 </programlisting>
3433
3434 which just adds or removes the <literal>newtype</literal> constructor according to the type.
3435 </para>
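<para>
A complete program using the <literal>Dollars</literal> declaration might look
like the following sketch (the <literal>main</literal> function is an
illustrative addition). Note that <literal>Show</literal> is still derived in
the usual way, so the constructor name appears in the output.
<programlisting>
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Main where

-- Num is inherited from Int; Eq and Show are derived as usual.
newtype Dollars = Dollars Int deriving (Eq, Show, Num)

main :: IO ()
main = do
  print (Dollars 3 + Dollars 4)  -- prints: Dollars 7
  print (Dollars 2 * 5)          -- the literal 5 is read as a Dollars value; prints: Dollars 10
</programlisting>
</para>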
3436 <para>
3437
3438 We can also derive instances of constructor classes in a similar
3439 way. For example, suppose we have implemented state and failure monad
3440 transformers, such that
3441
3442 <programlisting>
3443 instance Monad m => Monad (State s m)
3444 instance Monad m => Monad (Failure m)
3445 </programlisting>
3446 In Haskell 98, we can define a parsing monad by
3447 <programlisting>
3448 type Parser tok m a = State [tok] (Failure m) a
3449 </programlisting>
3450
3451 which is automatically a monad thanks to the instance declarations
3452 above. With the extension, we can make the parser type abstract,
3453 without needing to write an instance of class <literal>Monad</literal>, via
3454
3455 <programlisting>
3456 newtype Parser tok m a = Parser (State [tok] (Failure m) a)
3457   deriving Monad
3458 </programlisting>
3459 In this case the derived instance declaration is of the form
3460 <programlisting>
3461 instance Monad (State [tok] (Failure m)) => Monad (Parser tok m)
3462 </programlisting>
3463
3464 Notice that, since <literal>Monad</literal> is a constructor class, the
3465 instance is a <emphasis>partial application</emphasis> of the new type, not the
3466 entire left hand side. We can imagine that the type declaration is
3467 "eta-converted" to generate the context of the instance
3468 declaration.
3469 </para>
3470 <para>
3471
3472 We can even derive instances of multi-parameter classes, provided the
3473 newtype is the last class parameter. In this case, a &ldquo;partial
3474 application&rdquo; of the class appears in the <literal>deriving</literal>
3475 clause. For example, given the class
3476
3477 <programlisting>
3478 class StateMonad s m | m -> s where ...
3479 instance Monad m => StateMonad s (State s m) where ...
3480 </programlisting>
3481 then we can derive an instance of <literal>StateMonad</literal> for <literal>Parser</literal>s by
3482 <programlisting>
3483 newtype Parser tok m a = Parser (State [tok] (Failure m) a)
3484   deriving (Monad, StateMonad [tok])
3485 </programlisting>
3486
3487 The derived instance is obtained by completing the application of the
3488 class to the new type:
3489
3490 <programlisting>
3491 instance StateMonad [tok] (State [tok] (Failure m)) =>
3492     StateMonad [tok] (Parser tok m)
3493 </programlisting>
3494 </para>
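<para>
The same mechanism can be tried on a smaller scale without a monad transformer
library. In the sketch below the class <literal>Container</literal>, its methods,
and the newtype <literal>Stack</literal> are hypothetical names introduced only
for illustration; the newtype is the last class parameter, and the partial
application <literal>Container Int</literal> appears in the
<literal>deriving</literal> clause.
<programlisting>
{-# LANGUAGE GeneralizedNewtypeDeriving, MultiParamTypeClasses,
             FunctionalDependencies, FlexibleInstances #-}
module Main where

-- The container type c is the last parameter and determines the element type e.
class Container e c | c -> e where
  cinsert :: e -> c -> c
  ctoList :: c -> [e]

instance Container e [e] where
  cinsert = (:)
  ctoList = id

-- Notionally: instance Container Int [Int] => Container Int Stack
newtype Stack = Stack [Int]
  deriving (Container Int)

main :: IO ()
main = print (ctoList (cinsert 1 (cinsert 2 (Stack []))))
</programlisting>
</para>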
3495 <para>
3496
3497 As a result of this extension, all derived instances in newtype
3498 declarations are treated uniformly (and implemented just by reusing
3499 the dictionary for the representation type), <emphasis>except</emphasis>
3500 <literal>Show</literal> and <literal>Read</literal>, which really behave differently for
3501 the newtype and its representation.
3502 </para>
3503 </sect3>
3504
3505 <sect3> <title> A more precise specification </title>
3506 <para>
3507 Derived instance declarations are constructed as follows. Consider the
3508 declaration (after expansion of any type synonyms)
3509
3510 <programlisting>
3511 newtype T v1...vn = T' (t vk+1...vn) deriving (c1...cm)
3512 </programlisting>
3513
3514 where
3515 <itemizedlist>
3516 <listitem><para>
3517 The <literal>ci</literal> are partial applications of
3518 classes of the form <literal>C t1'...tj'</literal>, where the arity of <literal>C</literal>
3519 is exactly <literal>j+1</literal>. That is, <literal>C</literal> lacks exactly one type argument.
3520 </para></listitem>
3521 <listitem><para>
3522 The <literal>k</literal> is chosen so that <literal>ci (T v1...vk)</literal> is well-kinded.
3523 </para></listitem>
3524 <listitem><para>
3525 The type <literal>t</literal> is an arbitrary type.
3526 </para></listitem>
3527 <listitem><para>
3528 The type variables <literal>vk+1...vn</literal> do not occur in <literal>t</literal>,
3529 nor in the <literal>ci</literal>, and
3530 </para></listitem>
3531 <listitem><para>
3532 None of the <literal>ci</literal> is <literal>Read</literal>, <literal>Show</literal>,
3533 <literal>Typeable</literal>, or <literal>Data</literal>. These classes
3534 should not "look through" the type or its constructor. You can still
3535 derive these classes for a newtype, but it happens in the usual way, not
3536 via this new mechanism.
3537 </para></listitem>
3538 </itemizedlist>
3539 Then, for each <literal>ci</literal>, the derived instance
3540 declaration is:
3541 <programlisting>
3542 instance ci t => ci (T v1...vk)
3543 </programlisting>
3544 As an example which does <emphasis>not</emphasis> work, consider
3545 <programlisting>
3546 newtype NonMonad m s = NonMonad (State s m s) deriving Monad
3547 </programlisting>
3548 Here we cannot derive the instance
3549 <programlisting>
3550 instance Monad (State s m) => Monad (NonMonad m)
3551 </programlisting>
3552
3553 because the type variable <literal>s</literal> occurs in <literal>State s m</literal>,
3554 and so cannot be "eta-converted" away. It is a good thing that this
3555 <literal>deriving</literal> clause is rejected, because <literal>NonMonad m</literal> is
3556 not, in fact, a monad --- for the same reason. Try defining
3557 <literal>>>=</literal> with the correct type: you won't be able to.
3558 </para>
3559 <para>
3560
3561 Notice also that the <emphasis>order</emphasis> of class parameters becomes
3562 important, since we can only derive instances for the last one. If the
3563 <literal>StateMonad</literal> class above were instead defined as
3564
3565 <programlisting>
3566 class StateMonad m s | m -> s where ...
3567 </programlisting>
3568
3569 then we would not have been able to derive an instance for the
3570 <literal>Parser</literal> type above. We hypothesise that multi-parameter
3571 classes usually have one "main" parameter for which deriving new
3572 instances is most interesting.
3573 </para>
3574 <para>Lastly, all of this applies only to classes other than
3575 <literal>Read</literal>, <literal>Show</literal>, <literal>Typeable</literal>,
3576 and <literal>Data</literal>, for which the built-in derivation applies (Section
3577 4.3.3 of the Haskell Report).
3578 (For the standard classes <literal>Eq</literal>, <literal>Ord</literal>,
3579 <literal>Ix</literal>, and <literal>Bounded</literal> it is immaterial whether
3580 the standard method is used or the one described here.)
3581 </para>
3582 </sect3>
3583 </sect2>
3584 </sect1>
3585
3586
3587 <!-- TYPE SYSTEM EXTENSIONS -->
3588 <sect1 id="type-class-extensions">
3589 <title>Class and instance declarations</title>
3590
3591 <sect2 id="multi-param-type-classes">
3592 <title>Class declarations</title>
3593
3594 <para>
3595 This section, and the next one, document GHC's type-class extensions.
3596 There's lots of background in the paper <ulink
3597 url="http://research.microsoft.com/~simonpj/Papers/type-class-design-space/">Type
3598 classes: exploring the design space</ulink> (Simon Peyton Jones, Mark
3599 Jones, Erik Meijer).
3600 </para>
3601
3602 <sect3>
3603 <title>Multi-parameter type classes</title>
3604 <para>
3605 Multi-parameter type classes are permitted, with flag <option>-XMultiParamTypeClasses</option>.
3606 For example:
3607
3608
3609 <programlisting>
3610 class Collection c a where
3611   union :: c a -> c a -> c a
3612   ...etc.
3613 </programlisting>
3614
3615 </para>
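<para>
To make this concrete, here is a small sketch that gives the class an instance
and uses it. The list instance and its duplicate-dropping behaviour are
assumptions made for the example, not something prescribed by the class; the
instance head <literal>Collection [] a</literal> also needs
<option>-XFlexibleInstances</option>.
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
module Main where

class Collection c a where
  union :: c a -> c a -> c a

-- Hypothetical instance: lists as collections; elements of the second
-- list that already occur in the first are dropped.
instance Eq a => Collection [] a where
  union xs ys = xs ++ filter (`notElem` xs) ys

main :: IO ()
main = print (union [1, 2, 3] [3, 4 :: Int])  -- prints: [1,2,3,4]
</programlisting>
</para>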
3616 </sect3>
3617
3618 <sect3 id="superclass-rules">
3619 <title>The superclasses of a class declaration</title>
3620
3621 <para>
3622 In Haskell 98 the context of a class declaration (which introduces superclasses)
3623 must be simple; that is, each predicate must consist of a class applied to
3624 type variables. The flag <option>-XFlexibleContexts</option>
3625 (<xref linkend="flexible-contexts"/>)
3626 lifts this restriction,
3627 so that the only restriction on the context in a class declaration is
3628 that the class hierarchy must be acyclic. So these class declarations are OK:
3629
3630
3631 <programlisting>
3632 class Functor (m k) => FiniteMap m k where
3633   ...
3634
3635 class (Monad m, Monad (t m)) => Transform t m where
3636   lift :: m a -> (t m) a
3637 </programlisting>
3638
3639
3640 </para>
3641 <para>
3642 As in Haskell 98, the class hierarchy must be acyclic. However, the definition
3643 of "acyclic" involves only the superclass relationships. For example,
3644 this is OK:
3645
3646
3647 <programlisting>
3648 class C a where {
3649 op :: D b => a -> b -> b
3650 }
3651
3652 class C a => D a where { ... }
3653 </programlisting>
3654
3655
3656 Here, <literal>C</literal> is a superclass of <literal>D</literal>, but it's OK for a
3657 class operation <literal>op</literal> of <literal>C</literal> to mention <literal>D</literal>. (It
3658 would not be OK for <literal>D</literal> to be a superclass of <literal>C</literal>.)
3659 </para>
3660 <para>
3661 With the extension that adds a <link linkend="constraint-kind">kind of constraints</link>,
3662 you can write more exotic superclass definitions. The superclass cycle check is even more
3663 liberal in these cases. For example, this is OK:
3664
3665 <programlisting>
3666 class A cls c where
3667   meth :: cls c => c -> c
3668
3669 class A B c => B c where
3670 </programlisting>
3671
3672 A superclass context for a class <literal>C</literal> is allowed if, after expanding
3673 type synonyms to their right-hand-sides, and uses of classes (other than <literal>C</literal>)
3674 to their superclasses, <literal>C</literal> does not occur syntactically in the context.
3675 </para>
3676 </sect3>
3677
3678
3679
3680
3681 <sect3 id="class-method-types">
3682 <title>Class method types</title>
3683
3684 <para>
3685 Haskell 98 prohibits class method types from mentioning constraints on the
3686 class type variable, thus:
3687 <programlisting>
3688 class Seq s a where
3689   fromList :: [a] -> s a
3690   elem :: Eq a => a -> s a -> Bool
3691 </programlisting>
3692 The type of <literal>elem</literal> is illegal in Haskell 98, because it
3693 contains the constraint <literal>Eq a</literal>, which constrains only the
3694 class type variable (in this case <literal>a</literal>).
3695 GHC lifts this restriction (flag <option>-XConstrainedClassMethods</option>).
3696 </para>
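<para>
A runnable sketch of this class follows. The list instance and the
<literal>main</literal> function are illustrative additions, and the
<literal>Prelude</literal> version of <literal>elem</literal> is hidden so that
the class method can use the same name.
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, ConstrainedClassMethods, FlexibleInstances #-}
module Main where

import Prelude hiding (elem)

class Seq s a where
  fromList :: [a] -> s a
  -- The constraint Eq a mentions only the class type variable a.
  elem :: Eq a => a -> s a -> Bool

-- Hypothetical instance: plain lists as sequences.
instance Seq [] a where
  fromList = id
  elem x = any (== x)

main :: IO ()
main = print (elem (2 :: Int) (fromList [1, 2, 3] :: [Int]))  -- prints: True
</programlisting>
</para>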
3697
3698
3699 </sect3>
3700
3701
3702 <sect3 id="class-default-signatures">
3703 <title>Default method signatures</title>
3704
3705 <para>
3706 Haskell 98 allows you to define a default implementation when declaring a class:
3707 <programlisting>
3708 class Enum a where
3709   enum :: [a]
3710   enum = []
3711 </programlisting>
3712 The type of the <literal>enum</literal> method is <literal>[a]</literal>, and
3713 this is also the type of the default method. You can lift this restriction
3714 and give another type to the default method using the flag
3715 <option>-XDefaultSignatures</option>. For instance, if you have written a
3716 generic implementation of enumeration in a class <literal>GEnum</literal>
3717 with method <literal>genum</literal> in terms of <literal>GHC.Generics</literal>,
3718 you can specify a default method that uses that generic implementation:
3719 <programlisting>
3720 class Enum a where
3721   enum :: [a]
3722   default enum :: (Generic a, GEnum (Rep a)) => [a]
3723   enum = map to genum
3724 </programlisting>
3725 We reuse the keyword <literal>default</literal> to signal that a signature
3726 applies to the default method only; when defining instances of the
3727 <literal>Enum</literal> class, the original type <literal>[a]</literal> of
3728 <literal>enum</literal> still applies. When giving an empty instance, however,
3729 the default implementation <literal>map to genum</literal> is filled-in,
3730 and type-checked with the type
3731 <literal>(Generic a, GEnum (Rep a)) => [a]</literal>.
3732 </para>
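<para>
The same idea can be tried without any generic programming machinery. In the
following sketch the class <literal>Describe</literal> and the type
<literal>Colour</literal> are hypothetical; the default method requires a
<literal>Show</literal> instance, while explicit instances are free to ignore
that requirement.
<programlisting>
{-# LANGUAGE DefaultSignatures #-}
module Main where

class Describe a where
  describe :: a -> String
  -- The default method has a more constrained type than the method itself.
  default describe :: Show a => a -> String
  describe = show

data Colour = Red | Green deriving Show

-- Empty instance: the default is filled in and checked against Show Colour.
instance Describe Colour

-- Explicit instance: only the original type a -> String applies.
instance Describe Bool where
  describe b = if b then "yes" else "no"

main :: IO ()
main = do
  putStrLn (describe Red)    -- prints: Red
  putStrLn (describe False)  -- prints: no
</programlisting>
</para>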
3733
3734 <para>
3735 We use default signatures to simplify generic programming in GHC
3736 (<xref linkend="generic-programming"/>).
3737 </para>
3738
3739
3740 </sect3>
3741 </sect2>
3742
3743 <sect2 id="functional-dependencies">
3744 <title>Functional dependencies
3745 </title>
3746
3747 <para> Functional dependencies are implemented as described by Mark Jones
3748 in &ldquo;<ulink url="http://citeseer.ist.psu.edu/jones00type.html">Type Classes with Functional Dependencies</ulink>&rdquo;,
3749 in Proceedings of the 9th European Symposium on Programming,
3750 ESOP 2000, Berlin, Germany, March 2000, Springer-Verlag LNCS 1782.
3752 </para>
3753 <para>
3754 Functional dependencies are introduced by a vertical bar in the syntax of a
3755 class declaration; e.g.
3756 <programlisting>
3757 class (Monad m) => MonadState s m | m -> s where ...
3758
3759 class Foo a b c | a b -> c where ...
3760 </programlisting>
3761 There should be more documentation, but there isn't (yet). Yell if you need it.
3762 </para>
3763
3764 <sect3><title>Rules for functional dependencies </title>
3765 <para>
3766 In a class declaration, all of the class type variables must be reachable (in the sense
3767 mentioned in <xref linkend="flexible-contexts"/>)
3768 from the free variables of each method type.
3769 For example:
3770
3771 <programlisting>
3772 class Coll s a where
3773   empty :: s
3774   insert :: s -> a -> s
3775 </programlisting>
3776
3777 is not OK, because the type of <literal>empty</literal> doesn't mention
3778 <literal>a</literal>. Functional dependencies can make the type variable
3779 reachable:
3780 <programlisting>
3781 class Coll s a | s -> a where
3782   empty :: s
3783   insert :: s -> a -> s
3784 </programlisting>
3785
3786 Alternatively <literal>Coll</literal> might be rewritten
3787
3788 <programlisting>
3789 class Coll s a where
3790   empty :: s a
3791   insert :: s a -> a -> s a
3792 </programlisting>
3793
3794
3795 which makes the connection between the type of a collection of
3796 <literal>a</literal>'s (namely <literal>(s a)</literal>) and the element type <literal>a</literal>.
3797 Occasionally this really doesn't work, in which case you can split the
3798 class like this:
3799
3800
3801 <programlisting>
3802 class CollE s where
3803   empty :: s
3804
3805 class CollE s => Coll s a where
3806   insert :: s -> a -> s
3807 </programlisting>
3808 </para>
3809 </sect3>
3810
3811
3812 <sect3>
3813 <title>Background on functional dependencies</title>
3814
3815 <para>The following description of the motivation and use of functional dependencies is taken
3816 from the Hugs user manual, reproduced here (with minor changes) by kind
3817 permission of Mark Jones.
3818 </para>
3819 <para>
3820 Consider the following class, intended as part of a
3821 library for collection types:
3822 <programlisting>
3823 class Collects e ce where
3824   empty :: ce
3825   insert :: e -> ce -> ce
3826   member :: e -> ce -> Bool
3827 </programlisting>
3828 The type variable e used here represents the element type, while ce is the type
3829 of the container itself. Within this framework, we might want to define
3830 instances of this class for lists or characteristic functions (both of which
3831 can be used to represent collections of any equality type), bit sets (which can
3832 be used to represent collections of characters), or hash tables (which can be
3833 used to represent any collection whose elements have a hash function). Omitting
3834 standard implementation details, this would lead to the following declarations:
3835 <programlisting>
3836 instance Eq e => Collects e [e] where ...
3837 instance Eq e => Collects e (e -> Bool) where ...
3838 instance Collects Char BitSet where ...
3839 instance (Hashable e, Collects e ce)
3840     => Collects e (Array Int ce) where ...
3841 </programlisting>
3842 All this looks quite promising; we have a class and a range of interesting
3843 implementations. Unfortunately, there are some serious problems with the class
3844 declaration. First, the empty function has an ambiguous type:
3845 <programlisting>
3846 empty :: Collects e ce => ce
3847 </programlisting>
3848 By "ambiguous" we mean that there is a type variable e that appears on the left
3849 of the <literal>=&gt;</literal> symbol, but not on the right. The problem with
3850 this is that, according to the theoretical foundations of Haskell overloading,
3851 we cannot guarantee a well-defined semantics for any term with an ambiguous
3852 type.
3853 </para>
3854 <para>
3855 We can sidestep this specific problem by removing the empty member from the
3856 class declaration. However, although the remaining members, insert and member,
3857 do not have ambiguous types, we still run into problems when we try to use
3858 them. For example, consider the following two functions:
3859 <programlisting>
3860 f x y = insert x . insert y
3861 g = f True 'a'
3862 </programlisting>
3863 for which GHC infers the following types:
3864 <programlisting>
3865 f :: (Collects a c, Collects b c) => a -> b -> c -> c
3866 g :: (Collects Bool c, Collects Char c) => c -> c
3867 </programlisting>
3868 Notice that the type for f allows the two parameters x and y to be assigned
3869 different types, even though it attempts to insert each of the two values, one
3870 after the other, into the same collection. If we're trying to model collections
3871 that contain only one type of value, then this is clearly an inaccurate
3872 type. Worse still, the definition for g is accepted, without causing a type
3873 error. As a result, the error in this code will not be flagged at the point
3874 where it appears. Instead, it will show up only when we try to use g, which
3875 might even be in a different module.
3876 </para>
3877
3878 <sect4><title>An attempt to use constructor classes</title>
3879
3880 <para>
3881 Faced with the problems described above, some Haskell programmers might be
3882 tempted to use something like the following version of the class declaration:
3883 <programlisting>
3884 class Collects e c where
3885   empty :: c e
3886   insert :: e -> c e -> c e
3887   member :: e -> c e -> Bool
3888 </programlisting>
3889 The key difference here is that we abstract over the type constructor c that is
3890 used to form the collection type c e, and not over that collection type itself,
3891 represented by ce in the original class declaration. This avoids the immediate
3892 problems that we mentioned above: empty has type <literal>Collects e c => c
3893 e</literal>, which is not ambiguous.
3894 </para>
3895 <para>
3896 The function f from the previous section has a more accurate type:
3897 <programlisting>
3898 f :: (Collects e c) => e -> e -> c e -> c e
3899 </programlisting>
3900 The function g from the previous section is now rejected with a type error as
3901 we would hope because the type of f does not allow the two arguments to have
3902 different types.
3903 This, then, is an example of a multiple parameter class that does actually work
3904 quite well in practice, without ambiguity problems.
3905 There is, however, a catch. This version of the Collects class is nowhere near
3906 as general as the original class seemed to be: only one of the four instances
3907 for <literal>Collects</literal>
3908 given above can be used with this version of Collects because only one of
3909 them---the instance for lists---has a collection type that can be written in
3910 the form c e, for some type constructor c, and element type e.
3911 </para>
3912 </sect4>
3913
3914 <sect4><title>Adding functional dependencies</title>
3915
3916 <para>
3917 To get a more useful version of the Collects class, Hugs provides a mechanism
3918 that allows programmers to specify dependencies between the parameters of a
3919 multiple parameter class. (For readers with an interest in theoretical
3920 foundations and previous work: the use of dependency information can be seen
3921 both as a generalization of the proposal for &ldquo;parametric type classes&rdquo; that was
3922 put forward by Chen, Hudak, and Odersky, and as a special case of Mark Jones's
3923 later framework for "improvement" of qualified types. The
3924 underlying ideas are also discussed in a more theoretical and abstract setting
3925 in a manuscript [implparam], where they are identified as one point in a
3926 general design space for systems of implicit parameterization.)
3927
3928 To start with an abstract example, consider a declaration such as:
3929 <programlisting>
3930 class C a b where ...
3931 </programlisting>
3932 which tells us simply that C can be thought of as a binary relation on types
3933 (or type constructors, depending on the kinds of a and b). Extra clauses can be
3934 included in the definition of classes to add information about dependencies
3935 between parameters, as in the following examples:
3936 <programlisting>
3937 class D a b | a -> b where ...
3938 class E a b | a -> b, b -> a where ...
3939 </programlisting>
3940 The notation <literal>a -&gt; b</literal> used here between the | and where
3941 symbols --- not to be
3942 confused with a function type --- indicates that the a parameter uniquely
3943 determines the b parameter, and might be read as "a determines b." Thus D is
3944 not just a relation, but actually a (partial) function. Similarly, from the two
3945 dependencies that are included in the definition of E, we can see that E
3946 represents a (partial) one-one mapping between types.
3947 </para>
3948 <para>
3949 More generally, dependencies take the form <literal>x1 ... xn -&gt; y1 ... ym</literal>,
3950 where x1, ..., xn, and y1, ..., ym are type variables with n&gt;0 and
3951 m&gt;=0, meaning that the y parameters are uniquely determined by the x
3952 parameters. Spaces can be used as separators if more than one variable appears
3953 on any single side of a dependency, as in <literal>t -&gt; a b</literal>. Note that a class may be
3954 annotated with multiple dependencies using commas as separators, as in the
3955 definition of E above. Some dependencies that we can write in this notation are
3956 redundant, and will be rejected because they don't serve any useful
3957 purpose, and may instead indicate an error in the program. Examples of
3958 dependencies like this include <literal>a -&gt; a </literal>,
3959 <literal>a -&gt; a a </literal>,
3960 <literal>a -&gt; </literal>, etc. There can also be
3961 some redundancy if multiple dependencies are given, as in
3962 <literal>a-&gt;b</literal>,
3963 <literal>b-&gt;c </literal>, <literal>a-&gt;c </literal>,
3964 in which some subset implies the remaining dependencies. Examples like this are
3965 not treated as errors. Note that dependencies appear only in class
3966 declarations, and not in any other part of the language. In particular, the
3967 syntax for instance declarations, class constraints, and types is completely
3968 unchanged.
3969 </para>
3970 <para>
3971 By including dependencies in a class declaration, we provide a mechanism for
3972 the programmer to specify each multiple parameter class more precisely. The
3973 compiler, on the other hand, is responsible for ensuring that the set of
3974 instances that are in scope at any given point in the program is consistent
3975 with any declared dependencies. For example, the following pair of instance
3976 declarations cannot appear together in the same scope because they violate the
3977 dependency for D, even though either one on its own would be acceptable:
3978 <programlisting>
3979 instance D Bool Int where ...
3980 instance D Bool Char where ...
3981 </programlisting>
3982 Note also that the following declaration is not allowed, even by itself:
3983 <programlisting>
3984 instance D [a] b where ...
3985 </programlisting>
3986 The problem here is that this instance would allow one particular choice of [a]
3987 to be associated with more than one choice for b, which contradicts the
3988 dependency specified in the definition of D. More generally, this means that,
3989 in any instance of the form:
3990 <programlisting>
3991 instance D t s where ...
3992 </programlisting>
3993 for some particular types t and s, the only variables that can appear in s are
3994 the ones that appear in t, and hence, if the type t is known, then s will be
3995 uniquely determined.
3996 </para>
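<para>
The effect of such a dependency can be observed in a small program. In the
sketch below the method <literal>convert</literal> is a hypothetical addition to
the class <literal>D</literal>; because <literal>a</literal> determines
<literal>b</literal>, the result type of <literal>convert True</literal> is
fixed by the single instance for <literal>Bool</literal>, and no type annotation
is needed.
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
module Main where

class D a b | a -> b where
  convert :: a -> b   -- hypothetical method, for illustration only

instance D Bool Int where
  convert b = if b then 1 else 0

-- Adding the following instance as well would violate the dependency a -> b:
-- instance D Bool Char where ...

main :: IO ()
main = print (convert True + 1)  -- the dependency fixes the result type to Int; prints: 2
</programlisting>
</para>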
3997 <para>
3998 The benefit of including dependency information is that it allows us to define
3999 more general multiple parameter classes, without ambiguity problems, and with
4000 the benefit of more accurate types. To illustrate this, we return to the
4001 collection class example, and annotate the original definition of <literal>Collects</literal>
4002 with a simple dependency:
4003 <programlisting>
4004 class Collects e ce | ce -> e where
4005   empty :: ce
4006   insert :: e -> ce -> ce
4007   member :: e -> ce -> Bool
4008 </programlisting>
4009 The dependency <literal>ce -&gt; e</literal> here specifies that the type e of elements is uniquely
4010 determined by the type of the collection ce. Note that both parameters of
4011 Collects are of kind *; there are no constructor classes here. Note too that
4012 all of the instances of Collects that we gave earlier can be used
4013 together with this new definition.
4014 </para>
4015 <para>
4016 What about the ambiguity problems that we encountered with the original
4017 definition? The empty function still has type Collects e ce => ce, but it is no
4018 longer necessary to regard that as an ambiguous type: Although the variable e
4019 does not appear on the right of the => symbol, the dependency for class
4020 Collects tells us that it is uniquely determined by ce, which does appear on
4021 the right of the => symbol. Hence the context in which empty is used can still
4022 give enough information to determine types for both ce and e, without
4023 ambiguity. More generally, we need only regard a type as ambiguous if it
4024 contains a variable on the left of the => that is not uniquely determined
4025 (either directly or indirectly) by the variables on the right.
4026 </para>
4027 <para>
4028 Dependencies also help to produce more accurate types for user defined
4029 functions, and hence to provide earlier detection of errors, and less cluttered
4030 types for programmers to work with. Recall the previous definition for a
4031 function f:
4032 <programlisting>
4033 f x y = insert x . insert y
4034 </programlisting>
4035 for which we originally obtained a type:
4036 <programlisting>
4037 f :: (Collects a c, Collects b c) => a -> b -> c -> c
4038 </programlisting>
4039 Given the dependency information that we have for Collects, however, we can
4040 deduce that a and b must be equal because they both appear as the first
4041 parameter in a Collects constraint with the same second parameter c. Hence we
4042 can infer a shorter and more accurate type for f:
4043 <programlisting>
4044 f :: (Collects a c) => a -> a -> c -> c
4045 </programlisting>
4046 In a similar way, the earlier definition of g will now be flagged as a type error.
4047 </para>
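<para>
The effect on f can be sketched in a small self-contained module. The list
instance and the name pair are assumptions made for illustration, and the
commented-out line assumes that the earlier g was defined as f True 'a'.
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

class Collects e ce | ce -> e where
  empty  :: ce
  insert :: e -> ce -> ce

-- Illustrative instance (assumed here): lists as collections of their elements.
instance Collects e [e] where
  empty  = []
  insert = (:)

-- The shorter type from the text: both elements share the one type a,
-- because the collection type c determines its element type.
f :: Collects a c => a -> a -> c -> c
f x y = insert x . insert y

-- Both arguments are Chars, so this is accepted.
pair :: String
pair = f 'a' 'b' empty

-- Mixing element types is rejected, since Bool and Char cannot both be a:
-- g = f True 'a'
</programlisting>
Uncommenting the last line should produce a type error at the definition of g
itself, rather than at some later use.
</para>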
4048 <para>
4049 Although we have given only a few examples here, it should be clear that the
4050 addition of dependency information can help to make multiple parameter classes
4051 more useful in practice, avoiding ambiguity problems, and allowing more general
4052 sets of instance declarations.
4053 </para>
4054 </sect4>
4055 </sect3>
4056 </sect2>
4057
4058 <sect2 id="instance-decls">
4059 <title>Instance declarations</title>
4060
4061 <para>An instance declaration has the form
4062 <screen>
4063 instance ( <replaceable>assertion</replaceable><subscript>1</subscript>, ..., <replaceable>assertion</replaceable><subscript>n</subscript>) =&gt; <replaceable>class</replaceable> <replaceable>type</replaceable><subscript>1</subscript> ... <replaceable>type</replaceable><subscript>m</subscript> where ...
4064 </screen>
4065 The part before the "<literal>=&gt;</literal>" is the
4066 <emphasis>context</emphasis>, while the part after the
4067 "<literal>=&gt;</literal>" is the <emphasis>head</emphasis> of the instance declaration.
4068 </para>
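<para>
As a small illustration of this terminology, the sketch below uses a
hypothetical class <literal>Pretty</literal>, which is not part of GHC or its
libraries. It needs no extensions.
<programlisting>
-- A hypothetical class, used only to have something to write instances for.
class Pretty a where
  pretty :: a -&gt; String

-- No context; the head is "Pretty Bool".
instance Pretty Bool where
  pretty True  = "yes"
  pretty False = "no"

-- Context: (Pretty a, Pretty b).  Head: Pretty (Either a b).
instance (Pretty a, Pretty b) =&gt; Pretty (Either a b) where
  pretty (Left x)  = "L:" ++ pretty x
  pretty (Right y) = "R:" ++ pretty y
</programlisting>
Here the head has a single type argument, so <replaceable>m</replaceable> is 1,
and the second instance has two assertions in its context.
</para>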
4069
4070 <sect3 id="flexible-instance-head">
4071 <title>Relaxed rules for the instance head</title>
4072
4073 <para>
4074 In Haskell 98 the head of an instance declaration
4075 must be of the form <literal>C (T a1 ... an)</literal>, where
4076 <literal>C</literal> is the class, <literal>T</literal> is a data type constructor,
4077 and the <literal>a1 ... an</literal> are distinct type variables.
4078 GHC relaxes these rules in two ways.
4079 <itemizedlist>
4080 <listitem><para>
4081 With the <option>-XTypeSynonymInstances</option> flag, instance heads may use type
4082 synonyms. As always, using a type synonym is just shorthand for
4083 writing the right-hand side of the type synonym definition. For example:
4084 <programlisting>
4085 type Point a = (a,a)
4086 instance C (Point a) where ...
4087 </programlisting>
4088 is legal. The instance declaration is equivalent to
4089 <programlisting>
4090 instance C (a,a) where ...
4091 </programlisting>
4092 Type synonyms must still be
4093 fully applied. You cannot, for example, write:
4094 <programlisting>
4095 instance Monad Point where ...
4096 </programlisting>
4097 </para></listitem>
4098
4099 <listitem>
4100 <para>
4101 The <option>-XFlexibleInstances</option> flag allows the head of the instance
<