1 <?xml version="1.0" encoding="iso-8859-1"?>
2 <para>
3 <indexterm><primary>language, GHC</primary></indexterm>
4 <indexterm><primary>extensions, GHC</primary></indexterm>
5 As with all known Haskell systems, GHC implements some extensions to
6 the language. They can all be enabled or disabled by command-line flags
7 or language pragmas. By default GHC understands the most recent Haskell
8 version it supports, plus a handful of extensions.
9 </para>
10
11 <para>
12 Some of the Glasgow extensions serve to give you access to the
13 underlying facilities with which we implement Haskell. Thus, you can
14 get at the Raw Iron, if you are willing to write some non-portable
15 code at a more primitive level. You need not be &ldquo;stuck&rdquo;
16 on performance because of the implementation costs of Haskell's
17 &ldquo;high-level&rdquo; features&mdash;you can always code
18 &ldquo;under&rdquo; them. In an extreme case, you can write all your
19 time-critical code in C, and then just glue it together with Haskell!
20 </para>
21
22 <para>
23 Before you get too carried away working at the lowest level (e.g.,
24 sloshing <literal>MutableByteArray&num;</literal>s around your
25 program), you may wish to check if there are libraries that provide a
26 &ldquo;Haskellised veneer&rdquo; over the features you want. The
27 separate <ulink url="../libraries/index.html">libraries
28 documentation</ulink> describes all the libraries that come with GHC.
29 </para>
30
31 <!-- LANGUAGE OPTIONS -->
32 <sect1 id="options-language">
33 <title>Language options</title>
34
35 <indexterm><primary>language</primary><secondary>option</secondary>
36 </indexterm>
37 <indexterm><primary>options</primary><secondary>language</secondary>
38 </indexterm>
39 <indexterm><primary>extensions</primary><secondary>options controlling</secondary>
40 </indexterm>
41
42 <para>The language option flags control which variations of the language are
43 permitted.</para>
44
45 <para>Language options can be controlled in two ways:
46 <itemizedlist>
47 <listitem><para>Every language option can be switched on by a command-line flag "<option>-X...</option>"
48 (e.g. <option>-XTemplateHaskell</option>), and switched off by the flag "<option>-XNo...</option>"
49 (e.g. <option>-XNoTemplateHaskell</option>).</para></listitem>
50 <listitem><para>
51 Language options recognised by Cabal can also be enabled using the <literal>LANGUAGE</literal> pragma,
52 thus <literal>{-# LANGUAGE TemplateHaskell #-}</literal> (see <xref linkend="language-pragma"/>). </para>
53 </listitem>
54 </itemizedlist></para>
55
56 <para>The flag <option>-fglasgow-exts</option>
57 <indexterm><primary><option>-fglasgow-exts</option></primary></indexterm>
58 is equivalent to enabling the following extensions:
59 &what_glasgow_exts_does;
60 Enabling these options is the <emphasis>only</emphasis>
61 effect of <option>-fglasgow-exts</option>.
62 We are trying to move away from this portmanteau flag,
63 and towards enabling features individually.</para>
64
65 </sect1>
66
67 <!-- UNBOXED TYPES AND PRIMITIVE OPERATIONS -->
68 <sect1 id="primitives">
69 <title>Unboxed types and primitive operations</title>
70
71 <para>GHC is built on a raft of primitive data types and operations;
72 "primitive" in the sense that they cannot be defined in Haskell itself.
73 While you really can use this stuff to write fast code,
74 we generally find it a lot less painful, and more satisfying in the
75 long run, to use higher-level language features and libraries. With
76 any luck, the code you write will be optimised to the efficient
77 unboxed version in any case. And if it isn't, we'd like to know
78 about it.</para>
79
80 <para>All these primitive data types and operations are exported by the
81 library <literal>GHC.Prim</literal>, for which there is
82 <ulink url="&libraryGhcPrimLocation;/GHC-Prim.html">detailed online documentation</ulink>.
83 (This documentation is generated from the file <filename>compiler/prelude/primops.txt.pp</filename>.)
84 </para>
85
86 <para>
87 If you want to mention any of the primitive data types or operations in your
88 program, you must first import <literal>GHC.Prim</literal> to bring them
89 into scope. Many of them have names ending in "&num;", and to mention such
90 names you need the <option>-XMagicHash</option> extension (<xref linkend="magic-hash"/>).
91 </para>
92
93 <para>The primops make extensive use of <link linkend="glasgow-unboxed">unboxed types</link>
94 and <link linkend="unboxed-tuples">unboxed tuples</link>, which
95 we briefly summarise here. </para>
96
97 <sect2 id="glasgow-unboxed">
98 <title>Unboxed types</title>
99
100 <para>
101 <indexterm><primary>Unboxed types (Glasgow extension)</primary></indexterm>
102 </para>
103
104 <para>Most types in GHC are <firstterm>boxed</firstterm>, which means
105 that values of that type are represented by a pointer to a heap
106 object. The representation of a Haskell <literal>Int</literal>, for
107 example, is a two-word heap object. An <firstterm>unboxed</firstterm>
108 type, however, is represented by the value itself; no pointers or heap
109 allocation are involved.
110 </para>
111
112 <para>
113 Unboxed types correspond to the &ldquo;raw machine&rdquo; types you
114 would use in C: <literal>Int&num;</literal> (long int),
115 <literal>Double&num;</literal> (double), <literal>Addr&num;</literal>
116 (void *), etc. The <emphasis>primitive operations</emphasis>
117 (PrimOps) on these types are what you might expect; e.g.,
118 <literal>(+&num;)</literal> is addition on
119 <literal>Int&num;</literal>s, and is the machine-addition that we all
120 know and love&mdash;usually one instruction.
121 </para>
122
123 <para>
124 Primitive (unboxed) types cannot be defined in Haskell, and are
125 therefore built into the language and compiler. Primitive types are
126 always unlifted; that is, a value of a primitive type cannot be
127 bottom. We use the convention (but it is only a convention)
128 that primitive types, values, and
129 operations have a <literal>&num;</literal> suffix (see <xref linkend="magic-hash"/>).
130 For some primitive types we have special syntax for literals, also
131 described in the <link linkend="magic-hash">same section</link>.
132 </para>
133
134 <para>
135 Primitive values are often represented by a simple bit-pattern, such
136 as <literal>Int&num;</literal>, <literal>Float&num;</literal>,
137 <literal>Double&num;</literal>. But this is not necessarily the case:
138 a primitive value might be represented by a pointer to a
139 heap-allocated object. Examples include
140 <literal>Array&num;</literal>, the type of primitive arrays. A
141 primitive array is heap-allocated because it is too big a value to fit
142 in a register, and would be too expensive to copy around; in a sense,
143 it is accidental that it is represented by a pointer. If a pointer
144 represents a primitive value, then it really does point to that value:
145 no unevaluated thunks, no indirections&hellip;nothing other than the
146 primitive value can be at the other end of the pointer.
147 A numerically-intensive program using unboxed types can
148 go a <emphasis>lot</emphasis> faster than its &ldquo;standard&rdquo;
149 counterpart&mdash;we saw a threefold speedup on one example.
150 </para>
151
152 <para>
153 There are some restrictions on the use of primitive types:
154 <itemizedlist>
155 <listitem><para>The main restriction
156 is that you can't pass a primitive value to a polymorphic
157 function or store one in a polymorphic data type. This rules out
158 things like <literal>[Int&num;]</literal> (i.e. lists of primitive
159 integers). The reason for this restriction is that polymorphic
160 arguments and constructor fields are assumed to be pointers: if an
161 unboxed integer is stored in one of these, the garbage collector would
162 attempt to follow it, leading to unpredictable space leaks. Or a
163 <function>seq</function> operation on the polymorphic component may
164 attempt to dereference the pointer, with disastrous results. Even
165 worse, the unboxed value might be larger than a pointer
166 (<literal>Double&num;</literal> for instance).
167 </para>
168 </listitem>
169 <listitem><para> You cannot define a newtype whose representation type
170 (the argument type of the data constructor) is an unboxed type. Thus,
171 this is illegal:
172 <programlisting>
173 newtype A = MkA Int#
174 </programlisting>
175 </para></listitem>
176 <listitem><para> You cannot bind a variable with an unboxed type
177 in a <emphasis>top-level</emphasis> binding.
178 </para></listitem>
179 <listitem><para> You cannot bind a variable with an unboxed type
180 in a <emphasis>recursive</emphasis> binding.
181 </para></listitem>
182 <listitem><para> You may bind unboxed variables in a (non-recursive,
183 non-top-level) pattern binding, but you must make any such pattern-match
184 strict. For example, rather than:
185 <programlisting>
186 data Foo = Foo Int Int#
187
188 f x = let (Foo a b, w) = ..rhs.. in ..body..
189 </programlisting>
190 you must write:
191 <programlisting>
192 data Foo = Foo Int Int#
193
194 f x = let !(Foo a b, w) = ..rhs.. in ..body..
195 </programlisting>
196 since <literal>b</literal> has type <literal>Int#</literal>.
197 </para>
198 </listitem>
199 </itemizedlist>
200 </para>
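<para>
As a minimal sketch of working within these restrictions, the following program does its arithmetic on <literal>Int#</literal> values and only boxes the final result. The names <literal>sumSquares#</literal> and <literal>sumSquares</literal> are hypothetical, and the import from <literal>GHC.Exts</literal> (which re-exports the primitives) is an assumption made for convenience.
</para>

```haskell
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (Int (I#), Int#, (+#), (*#))

-- Hypothetical helper: sum of squares computed entirely on unboxed integers.
-- Binding a *function* at top level is fine; only variables whose own type
-- is unboxed may not be bound at top level.
sumSquares# :: Int# -> Int# -> Int#
sumSquares# x y = (x *# x) +# (y *# y)

-- Boxed wrapper: unpack the arguments, compute unboxed, rebox the result.
sumSquares :: Int -> Int -> Int
sumSquares (I# x) (I# y) = I# (sumSquares# x y)

main :: IO ()
main = print (sumSquares 3 4)
```

<para>
Pattern matching on the <literal>I#</literal> constructor is what exposes the unboxed payload of an ordinary <literal>Int</literal>.
</para>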
201
202 </sect2>
203
204 <sect2 id="unboxed-tuples">
205 <title>Unboxed tuples</title>
206
207 <para>
208 Unboxed tuples aren't really exported by <literal>GHC.Exts</literal>;
209 they are a syntactic extension enabled by the language flag <option>-XUnboxedTuples</option>. An
210 unboxed tuple looks like this:
211 </para>
212
213 <para>
214
215 <programlisting>
216 (# e_1, ..., e_n #)
217 </programlisting>
218
219 </para>
220
221 <para>
222 where <literal>e&lowbar;1..e&lowbar;n</literal> are expressions of any
223 type (primitive or non-primitive). The type of an unboxed tuple looks
224 the same.
225 </para>
226
227 <para>
228 Note that when unboxed tuples are enabled,
229 <literal>(#</literal> is a single lexeme, so for example when using
230 operators like <literal>#</literal> and <literal>#-</literal> you need
231 to write <literal>( # )</literal> and <literal>( #- )</literal> rather than
232 <literal>(#)</literal> and <literal>(#-)</literal>.
233 </para>
234
235 <para>
236 Unboxed tuples are used for functions that need to return multiple
237 values, but they avoid the heap allocation normally associated with
238 using fully-fledged tuples. When an unboxed tuple is returned, the
239 components are put directly into registers or on the stack; the
240 unboxed tuple itself does not have a composite representation. Many
241 of the primitive operations listed in <literal>primops.txt.pp</literal> return unboxed
242 tuples.
243 In particular, the <literal>IO</literal> and <literal>ST</literal> monads use unboxed
244 tuples to avoid unnecessary allocation during sequences of operations.
245 </para>
246
247 <para>
248 There are some restrictions on the use of unboxed tuples:
249 <itemizedlist>
250
251 <listitem>
252 <para>
253 Values of unboxed tuple types are subject to the same restrictions as
254 other unboxed types; i.e. they may not be stored in polymorphic data
255 structures or passed to polymorphic functions.
256 </para>
257 </listitem>
258
259 <listitem>
260 <para>
261 The typical use of unboxed tuples is simply to return multiple values,
262 binding those multiple results with a <literal>case</literal> expression, thus:
263 <programlisting>
264 f x y = (# x+1, y-1 #)
265 g x = case f x x of { (# a, b #) -&#62; a + b }
266 </programlisting>
267 You can have an unboxed tuple in a pattern binding, thus
268 <programlisting>
269 f x = let (# p,q #) = h x in ..body..
270 </programlisting>
271 If the types of <literal>p</literal> and <literal>q</literal> are not unboxed,
272 the resulting binding is lazy like any other Haskell pattern binding. The
273 above example desugars like this:
274 <programlisting>
275 f x = let t = case h x of { (# p,q #) -> (p,q) }
276 p = fst t
277 q = snd t
278 in ..body..
279 </programlisting>
280 Indeed, the bindings can even be recursive.
281 </para>
282 </listitem>
283 </itemizedlist>
284
285 </para>
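<para>
The typical use described above can be sketched as a complete program. The function names <literal>quotRemU</literal> and <literal>describe</literal> are hypothetical and chosen only for illustration.
</para>

```haskell
{-# LANGUAGE UnboxedTuples #-}
module Main where

-- Hypothetical example: return a quotient and a remainder together
-- without building an ordinary (heap-allocated) pair.
quotRemU :: Int -> Int -> (# Int, Int #)
quotRemU n d = (# n `quot` d, n `rem` d #)

-- The two results are bound with a case expression, as in the text.
describe :: Int -> Int -> String
describe n d = case quotRemU n d of
  (# q, r #) -> show q ++ " remainder " ++ show r

main :: IO ()
main = putStrLn (describe 17 5)
```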
286
287 </sect2>
288 </sect1>
289
290
291 <!-- ====================== SYNTACTIC EXTENSIONS ======================= -->
292
293 <sect1 id="syntax-extns">
294 <title>Syntactic extensions</title>
295
296 <sect2 id="unicode-syntax">
297 <title>Unicode syntax</title>
298 <para>The language
299 extension <option>-XUnicodeSyntax</option><indexterm><primary><option>-XUnicodeSyntax</option></primary></indexterm>
300 enables Unicode characters to be used to stand for certain ASCII
301 character sequences. The following alternatives are provided:</para>
302
303 <informaltable>
304 <tgroup cols="4" align="left" colsep="1" rowsep="1">
305 <thead>
306 <row>
307 <entry>ASCII</entry>
308 <entry>Unicode alternative</entry>
309 <entry>Code point</entry>
310 <entry>Name</entry>
311 </row>
312 </thead>
313
314 <!--
315 to find the DocBook entities for these characters, find
316 the Unicode code point (e.g. 0x2237), and grep for it in
317 /usr/share/sgml/docbook/xml-dtd-*/ent/* (or equivalent on
318 your system). Some of these Unicode code points don't have
319 equivalent DocBook entities.
320 -->
321
322 <tbody>
323 <row>
324 <entry><literal>::</literal></entry>
325 <entry>&#x2237;</entry> <!-- no named DocBook entity, so a numeric character reference is used -->
326 <entry>0x2237</entry>
327 <entry>PROPORTION</entry>
328 </row>
329 </tbody>
330 <tbody>
331 <row>
332 <entry><literal>=&gt;</literal></entry>
333 <entry>&rArr;</entry>
334 <entry>0x21D2</entry>
335 <entry>RIGHTWARDS DOUBLE ARROW</entry>
336 </row>
337 </tbody>
338 <tbody>
339 <row>
340 <entry><literal>forall</literal></entry>
341 <entry>&forall;</entry>
342 <entry>0x2200</entry>
343 <entry>FOR ALL</entry>
344 </row>
345 </tbody>
346 <tbody>
347 <row>
348 <entry><literal>-&gt;</literal></entry>
349 <entry>&rarr;</entry>
350 <entry>0x2192</entry>
351 <entry>RIGHTWARDS ARROW</entry>
352 </row>
353 </tbody>
354 <tbody>
355 <row>
356 <entry><literal>&lt;-</literal></entry>
357 <entry>&larr;</entry>
358 <entry>0x2190</entry>
359 <entry>LEFTWARDS ARROW</entry>
360 </row>
361 </tbody>
362
363 <tbody>
364 <row>
365 <entry>-&lt;</entry>
366 <entry>&larrtl;</entry>
367 <entry>0x2919</entry>
368 <entry>LEFTWARDS ARROW-TAIL</entry>
369 </row>
370 </tbody>
371
372 <tbody>
373 <row>
374 <entry>&gt;-</entry>
375 <entry>&rarrtl;</entry>
376 <entry>0x291A</entry>
377 <entry>RIGHTWARDS ARROW-TAIL</entry>
378 </row>
379 </tbody>
380
381 <tbody>
382 <row>
383 <entry>-&lt;&lt;</entry>
384 <entry></entry>
385 <entry>0x291B</entry>
386 <entry>LEFTWARDS DOUBLE ARROW-TAIL</entry>
387 </row>
388 </tbody>
389
390 <tbody>
391 <row>
392 <entry>&gt;&gt;-</entry>
393 <entry></entry>
394 <entry>0x291C</entry>
395 <entry>RIGHTWARDS DOUBLE ARROW-TAIL</entry>
396 </row>
397 </tbody>
398
399 <tbody>
400 <row>
401 <entry>*</entry>
402 <entry>&starf;</entry>
403 <entry>0x2605</entry>
404 <entry>BLACK STAR</entry>
405 </row>
406 </tbody>
407
408 </tgroup>
409 </informaltable>
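<para>
A small sketch of the alternatives in use, assuming the source file is saved as UTF-8. The function <literal>twice</literal> is a hypothetical example; only the PROPORTION and RIGHTWARDS ARROW alternatives from the table are exercised.
</para>

```haskell
{-# LANGUAGE UnicodeSyntax #-}
module Main where

-- U+2237 stands for "::" and U+2192 stands for "->".
twice ∷ (a → a) → a → a
twice f = f . f

main ∷ IO ()
main = print (twice (+ 1) (0 ∷ Int))
```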
410 </sect2>
411
412 <sect2 id="magic-hash">
413 <title>The magic hash</title>
414 <para>The language extension <option>-XMagicHash</option> allows "&num;" as a
415 postfix modifier to identifiers. Thus, "x&num;" is a valid variable, and "T&num;" is
416 a valid type constructor or data constructor.</para>
417
418 <para>The hash sign does not change semantics at all. We tend to use variable
419 names ending in "&num;" for unboxed values or types (e.g. <literal>Int&num;</literal>),
420 but there is no requirement to do so; they are just plain ordinary variables.
421 Nor does the <option>-XMagicHash</option> extension bring anything into scope.
422 For example, to bring <literal>Int&num;</literal> into scope you must
423 import <literal>GHC.Prim</literal> (see <xref linkend="primitives"/>);
424 the <option>-XMagicHash</option> extension
425 then allows you to <emphasis>refer</emphasis> to the <literal>Int&num;</literal>
426 that is now in scope.</para>
427 <para> The <option>-XMagicHash</option> extension also enables some new forms of literals (see <xref linkend="glasgow-unboxed"/>):
428 <itemizedlist>
429 <listitem><para> <literal>'x'&num;</literal> has type <literal>Char&num;</literal></para> </listitem>
430 <listitem><para> <literal>&quot;foo&quot;&num;</literal> has type <literal>Addr&num;</literal></para> </listitem>
431 <listitem><para> <literal>3&num;</literal> has type <literal>Int&num;</literal>. In general,
432 any Haskell integer lexeme followed by a <literal>&num;</literal> is an <literal>Int&num;</literal> literal, e.g.
433 <literal>-0x3A&num;</literal> as well as <literal>32&num;</literal>.</para></listitem>
434 <listitem><para> <literal>3&num;&num;</literal> has type <literal>Word&num;</literal>. In general,
435 any non-negative Haskell integer lexeme followed by <literal>&num;&num;</literal>
436 is a <literal>Word&num;</literal>. </para> </listitem>
437 <listitem><para> <literal>3.2&num;</literal> has type <literal>Float&num;</literal>.</para> </listitem>
438 <listitem><para> <literal>3.2&num;&num;</literal> has type <literal>Double&num;</literal></para> </listitem>
439 </itemizedlist>
440 </para>
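<para>
The following sketch exercises several of these literal forms by wrapping each unboxed literal in the corresponding boxed constructor so that it can be printed. Importing the constructors from <literal>GHC.Exts</literal> is an assumption made for convenience; the string literal form is omitted.
</para>

```haskell
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (Char (C#), Double (D#), Float (F#), Int (I#), Word (W#))

main :: IO ()
main = do
  print (C# 'x'#)    -- 'x'#  :: Char#
  print (I# 3#)      -- 3#    :: Int#
  print (W# 3##)     -- 3##   :: Word#
  print (F# 3.2#)    -- 3.2#  :: Float#
  print (D# 3.2##)   -- 3.2## :: Double#
```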
441 </sect2>
442
443 <!-- ====================== HIERARCHICAL MODULES ======================= -->
444
445
446 <sect2 id="hierarchical-modules">
447 <title>Hierarchical Modules</title>
448
449 <para>GHC supports a small extension to the syntax of module
450 names: a module name is allowed to contain a dot
451 <literal>&lsquo;.&rsquo;</literal>. This is also known as the
452 &ldquo;hierarchical module namespace&rdquo; extension, because
453 it extends the normally flat Haskell module namespace into a
454 more flexible hierarchy of modules.</para>
455
456 <para>This extension has very little impact on the language
457 itself; module names are <emphasis>always</emphasis> fully
458 qualified, so you can just think of the fully qualified module
459 name as <quote>the module name</quote>. In particular, this
460 means that the full module name must be given after the
461 <literal>module</literal> keyword at the beginning of the
462 module; for example, the module <literal>A.B.C</literal> must
463 begin</para>
464
465 <programlisting>module A.B.C</programlisting>
466
467
468 <para>It is a common strategy to use the <literal>as</literal>
469 keyword to save some typing when using qualified names with
470 hierarchical modules. For example:</para>
471
472 <programlisting>
473 import qualified Control.Monad.ST.Strict as ST
474 </programlisting>
475
476 <para>For details on how GHC searches for source and interface
477 files in the presence of hierarchical modules, see <xref
478 linkend="search-path"/>.</para>
479
480 <para>GHC comes with a large collection of libraries arranged
481 hierarchically; see the accompanying <ulink
482 url="../libraries/index.html">library
483 documentation</ulink>. More libraries to install are available
484 from <ulink
485 url="http://hackage.haskell.org/packages/hackage.html">HackageDB</ulink>.</para>
486 </sect2>
487
488 <!-- ====================== PATTERN GUARDS ======================= -->
489
490 <sect2 id="pattern-guards">
491 <title>Pattern guards</title>
492
493 <para>
494 <indexterm><primary>Pattern guards (Glasgow extension)</primary></indexterm>
495 The discussion that follows is an abbreviated version of Simon Peyton Jones's original <ulink url="http://research.microsoft.com/~simonpj/Haskell/guards.html">proposal</ulink>. (Note that the proposal was written before pattern guards were implemented, so refers to them as unimplemented.)
496 </para>
497
498 <para>
499 Suppose we have an abstract data type of finite maps, with a
500 lookup operation:
501
502 <programlisting>
503 lookup :: FiniteMap -> Int -> Maybe Int
504 </programlisting>
505
506 The lookup returns <function>Nothing</function> if the supplied key is not in the domain of the mapping, and <function>(Just v)</function> otherwise,
507 where <varname>v</varname> is the value that the key maps to. Now consider the following definition:
508 </para>
509
510 <programlisting>
511 clunky env var1 var2 | ok1 &amp;&amp; ok2 = val1 + val2
512 | otherwise = var1 + var2
513 where
514 m1 = lookup env var1
515 m2 = lookup env var2
516 ok1 = maybeToBool m1
517 ok2 = maybeToBool m2
518 val1 = expectJust m1
519 val2 = expectJust m2
520 </programlisting>
521
522 <para>
523 The auxiliary functions are
524 </para>
525
526 <programlisting>
527 maybeToBool :: Maybe a -&gt; Bool
528 maybeToBool (Just x) = True
529 maybeToBool Nothing = False
530
531 expectJust :: Maybe a -&gt; a
532 expectJust (Just x) = x
533 expectJust Nothing = error "Unexpected Nothing"
534 </programlisting>
535
536 <para>
537 What is <function>clunky</function> doing? The guard <literal>ok1 &amp;&amp;
538 ok2</literal> checks that both lookups succeed, using
539 <function>maybeToBool</function> to convert the <function>Maybe</function>
540 types to booleans. The (lazily evaluated) <function>expectJust</function>
541 calls extract the values from the results of the lookups, and binds the
542 returned values to <varname>val1</varname> and <varname>val2</varname>
543 respectively. If either lookup fails, then clunky takes the
544 <literal>otherwise</literal> case and returns the sum of its arguments.
545 </para>
546
547 <para>
548 This is certainly legal Haskell, but it is a tremendously verbose and
549 un-obvious way to achieve the desired effect. Arguably, a more direct way
550 to write clunky would be to use case expressions:
551 </para>
552
553 <programlisting>
554 clunky env var1 var2 = case lookup env var1 of
555 Nothing -&gt; fail
556 Just val1 -&gt; case lookup env var2 of
557 Nothing -&gt; fail
558 Just val2 -&gt; val1 + val2
559 where
560 fail = var1 + var2
561 </programlisting>
562
563 <para>
564 This is a bit shorter, but hardly better. Of course, we can rewrite any set
565 of pattern-matching, guarded equations as case expressions; that is
566 precisely what the compiler does when compiling equations! The reason that
567 Haskell provides guarded equations is that they allow us to write down
568 the cases we want to consider, one at a time, independently of each other.
569 This structure is hidden in the case version. Two of the right-hand sides
570 are really the same (<function>fail</function>), and the whole expression
571 tends to become more and more indented.
572 </para>
573
574 <para>
575 Here is how I would write clunky:
576 </para>
577
578 <programlisting>
579 clunky env var1 var2
580 | Just val1 &lt;- lookup env var1
581 , Just val2 &lt;- lookup env var2
582 = val1 + val2
583 ...other equations for clunky...
584 </programlisting>
585
586 <para>
587 The semantics should be clear enough. The qualifiers are matched in order.
588 For a <literal>&lt;-</literal> qualifier, which I call a pattern guard, the
589 right hand side is evaluated and matched against the pattern on the left.
590 If the match fails then the whole guard fails and the next equation is
591 tried. If it succeeds, then the appropriate binding takes place, and the
592 next qualifier is matched, in the augmented environment. Unlike list
593 comprehensions, however, the type of the expression to the right of the
594 <literal>&lt;-</literal> is the same as the type of the pattern to its
595 left. The bindings introduced by pattern guards scope over all the
596 remaining guard qualifiers, and over the right hand side of the equation.
597 </para>
598
599 <para>
600 Just as with list comprehensions, boolean expressions can be freely mixed
601 among the pattern guards. For example:
602 </para>
603
604 <programlisting>
605 f x | [y] &lt;- x
606 , y > 3
607 , Just z &lt;- h y
608 = ...
609 </programlisting>
610
611 <para>
612 Haskell's current guards therefore emerge as a special case, in which the
613 qualifier list has just one element, a boolean expression.
614 </para>
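<para>
For experimentation, the pattern-guard version of <literal>clunky</literal> can be made into a complete program. This is only a sketch: an association list and the Prelude's <literal>lookup</literal> (which takes the key first) stand in for the abstract finite map of the text, and the <literal>Env</literal> type synonym is hypothetical.
</para>

```haskell
module Main where

-- Assumption: an association list stands in for the abstract FiniteMap.
type Env = [(Int, Int)]

clunky :: Env -> Int -> Int -> Int
clunky env var1 var2
  | Just val1 <- lookup var1 env   -- pattern guard: binds val1 on success
  , Just val2 <- lookup var2 env   -- val1 is in scope from here onwards
  = val1 + val2
  | otherwise = var1 + var2        -- taken if either lookup fails

main :: IO ()
main = do
  let env = [(1, 10), (2, 20)]
  print (clunky env 1 2)  -- both keys present
  print (clunky env 1 3)  -- key 3 is missing
```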
615 </sect2>
616
617 <!-- ===================== View patterns =================== -->
618
619 <sect2 id="view-patterns">
620 <title>View patterns
621 </title>
622
623 <para>
624 View patterns are enabled by the flag <literal>-XViewPatterns</literal>.
625 More information and examples of view patterns can be found on the
626 <ulink url="http://hackage.haskell.org/trac/ghc/wiki/ViewPatterns">Wiki
627 page</ulink>.
628 </para>
629
630 <para>
631 View patterns are somewhat like pattern guards that can be nested inside
632 of other patterns. They are a convenient way of pattern-matching
633 against values of abstract types. For example, in a programming language
634 implementation, we might represent the syntax of the types of the
635 language as follows:
636
637 <programlisting>
638 type Typ
639
640 data TypView = Unit
641 | Arrow Typ Typ
642
643 view :: Typ -> TypView
644
645 -- additional operations for constructing Typ's ...
646 </programlisting>
647
648 The representation of Typ is held abstract, permitting implementations
649 to use a fancy representation (e.g., hash-consing to manage sharing).
650
651 Without view patterns, using this signature is a little inconvenient:
652 <programlisting>
653 size :: Typ -> Integer
654 size t = case view t of
655 Unit -> 1
656 Arrow t1 t2 -> size t1 + size t2
657 </programlisting>
658
659 It is necessary to iterate the case, rather than using an equational
660 function definition. And the situation is even worse when the matching
661 against <literal>t</literal> is buried deep inside another pattern.
662 </para>
663
664 <para>
665 View patterns permit calling the view function inside the pattern and
666 matching against the result:
667 <programlisting>
668 size (view -> Unit) = 1
669 size (view -> Arrow t1 t2) = size t1 + size t2
670 </programlisting>
671
672 That is, we add a new form of pattern, written
673 <replaceable>expression</replaceable> <literal>-></literal>
674 <replaceable>pattern</replaceable> that means "apply the expression to
675 whatever we're trying to match against, and then match the result of
676 that application against the pattern". The expression can be any Haskell
677 expression of function type, and view patterns can be used wherever
678 patterns are used.
679 </para>
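<para>
To try the <literal>size</literal> example, the abstract <literal>Typ</literal> needs some concrete representation. The sketch below assumes a deliberately simple one; the constructors <literal>TUnit</literal> and <literal>TArrow</literal> are hypothetical and not part of the signature above.
</para>

```haskell
{-# LANGUAGE ViewPatterns #-}
module Main where

-- Assumption: a simple concrete representation for the abstract Typ.
data Typ = TUnit | TArrow Typ Typ

data TypView = Unit
             | Arrow Typ Typ

view :: Typ -> TypView
view TUnit        = Unit
view (TArrow a b) = Arrow a b

-- Each equation applies `view` to the argument and matches on the result.
size :: Typ -> Integer
size (view -> Unit)        = 1
size (view -> Arrow t1 t2) = size t1 + size t2

main :: IO ()
main = print (size (TArrow TUnit (TArrow TUnit TUnit)))
```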
680
681 <para>
682 The semantics of a pattern <literal>(</literal>
683 <replaceable>exp</replaceable> <literal>-></literal>
684 <replaceable>pat</replaceable> <literal>)</literal> are as follows:
685
686 <itemizedlist>
687
688 <listitem> Scoping:
689
690 <para>The variables bound by the view pattern are the variables bound by
691 <replaceable>pat</replaceable>.
692 </para>
693
694 <para>
695 Any variables in <replaceable>exp</replaceable> are bound occurrences,
696 but variables bound "to the left" in a pattern are in scope. This
697 feature permits, for example, one argument to a function to be used in
698 the view of another argument. For example, the function
699 <literal>clunky</literal> from <xref linkend="pattern-guards" /> can be
700 written using view patterns as follows:
701
702 <programlisting>
703 clunky env (lookup env -> Just val1) (lookup env -> Just val2) = val1 + val2
704 ...other equations for clunky...
705 </programlisting>
706 </para>
707
708 <para>
709 More precisely, the scoping rules are:
710 <itemizedlist>
711 <listitem>
712 <para>
713 In a single pattern, variables bound by patterns to the left of a view
714 pattern expression are in scope. For example:
715 <programlisting>
716 example :: Maybe ((String -> Integer,Integer), String) -> Bool
717 example (Just ((f,_), f -> 4)) = True
718 </programlisting>
719
720 Additionally, in function definitions, variables bound by matching earlier curried
721 arguments may be used in view pattern expressions in later arguments:
722 <programlisting>
723 example :: (String -> Integer) -> String -> Bool
724 example f (f -> 4) = True
725 </programlisting>
726 That is, the scoping is the same as it would be if the curried arguments
727 were collected into a tuple.
728 </para>
729 </listitem>
730
731 <listitem>
732 <para>
733 In mutually recursive bindings, such as <literal>let</literal>,
734 <literal>where</literal>, or the top level, view patterns in one
735 declaration may not mention variables bound by other declarations. That
736 is, each declaration must be self-contained. For example, the following
737 program is not allowed:
738 <programlisting>
739 let {(x -> y) = e1 ;
740 (y -> x) = e2 } in x
741 </programlisting>
742
743 (For some amplification on this design choice see
744 <ulink url="http://hackage.haskell.org/trac/ghc/ticket/4061">Trac #4061</ulink>.)
745
746 </para>
747 </listitem>
748 </itemizedlist>
749
750 </para>
751 </listitem>
752
753 <listitem><para> Typing: If <replaceable>exp</replaceable> has type
754 <replaceable>T1</replaceable> <literal>-></literal>
755 <replaceable>T2</replaceable> and <replaceable>pat</replaceable> matches
756 a <replaceable>T2</replaceable>, then the whole view pattern matches a
757 <replaceable>T1</replaceable>.
758 </para></listitem>
759
760 <listitem><para> Matching: To the equations in Section 3.17.3 of the
761 <ulink url="http://www.haskell.org/onlinereport/">Haskell 98
762 Report</ulink>, add the following:
763 <programlisting>
764 case v of { (e -> p) -> e1 ; _ -> e2 }
765 =
766 case (e v) of { p -> e1 ; _ -> e2 }
767 </programlisting>
768 That is, to match a variable <replaceable>v</replaceable> against a pattern
769 <literal>(</literal> <replaceable>exp</replaceable>
770 <literal>-></literal> <replaceable>pat</replaceable>
771 <literal>)</literal>, evaluate <literal>(</literal>
772 <replaceable>exp</replaceable> <replaceable> v</replaceable>
773 <literal>)</literal> and match the result against
774 <replaceable>pat</replaceable>.
775 </para></listitem>
776
777 <listitem><para> Efficiency: When the same view function is applied in
778 multiple branches of a function definition or a case expression (e.g.,
779 in <literal>size</literal> above), GHC makes an attempt to collect these
780 applications into a single nested case expression, so that the view
781 function is only applied once. Pattern compilation in GHC follows the
782 matrix algorithm described in Chapter 4 of <ulink
783 url="http://research.microsoft.com/~simonpj/Papers/slpj-book-1987/">The
784 Implementation of Functional Programming Languages</ulink>. When the
785 top rows of the first column of a matrix are all view patterns with the
786 "same" expression, these patterns are transformed into a single nested
787 case. This includes, for example, adjacent view patterns that line up
788 in a tuple, as in
789 <programlisting>
790 f ((view -> A, p1), p2) = e1
791 f ((view -> B, p3), p4) = e2
792 </programlisting>
793 </para>
794
795 <para> The current notion of when two view pattern expressions are "the
796 same" is very restricted: it is not even full syntactic equality.
797 However, it does include variables, literals, applications, and tuples;
798 e.g., two instances of <literal>view ("hi", "there")</literal> will be
799 collected. However, the current implementation does not compare up to
800 alpha-equivalence, so two instances of <literal>(x, view x ->
801 y)</literal> will not be coalesced.
802 </para>
803
804 </listitem>
805
806 </itemizedlist>
807 </para>
808
809 </sect2>
810
811 <!-- ===================== n+k patterns =================== -->
812
813 <sect2 id="n-k-patterns">
814 <title>n+k patterns</title>
815 <indexterm><primary><option>-XNPlusKPatterns</option></primary></indexterm>
816
817 <para>
818 <literal>n+k</literal> pattern support is disabled by default. To enable
819 it, you can use the <option>-XNPlusKPatterns</option> flag.
820 </para>
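<para>
As a brief illustration of what the flag enables, the following sketch defines a hypothetical <literal>factorial</literal> using an <literal>n+k</literal> pattern. The pattern <literal>(n + 1)</literal> matches any argument that is at least 1 and binds <literal>n</literal> to the argument minus one.
</para>

```haskell
{-# LANGUAGE NPlusKPatterns #-}
module Main where

factorial :: Integer -> Integer
factorial 0       = 1
factorial (n + 1) = (n + 1) * factorial n  -- matches arguments >= 1

main :: IO ()
main = print (factorial 5)
```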
821
822 </sect2>
823
824 <!-- ===================== Traditional record syntax =================== -->
825
826 <sect2 id="traditional-record-syntax">
827 <title>Traditional record syntax</title>
828 <indexterm><primary><option>-XNoTraditionalRecordSyntax</option></primary></indexterm>
829
830 <para>
831 Traditional record syntax, such as <literal>C {f = x}</literal>, is enabled by default.
832 To disable it, you can use the <option>-XNoTraditionalRecordSyntax</option> flag.
833 </para>
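<para>
For reference, the following sketch shows the three uses of traditional record
syntax (declaration, construction and update). The type
<literal>Config</literal> and its fields are hypothetical names chosen for
illustration.
</para>
<programlisting>
module Config where

-- Declaration with named fields.
data Config = Config { host :: String, port :: Int }

-- Construction using field names.
defaultConfig :: Config
defaultConfig = Config { host = "localhost", port = 8080 }

-- Update: copy a record, replacing one field.
withPort :: Int -> Config -> Config
withPort p c = c { port = p }
</programlisting>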
834
835 </sect2>
836
837 <!-- ===================== Recursive do-notation =================== -->
838
839 <sect2 id="recursive-do-notation">
840 <title>The recursive do-notation
841 </title>
842
843 <para>
844 The do-notation of Haskell 98 does not allow <emphasis>recursive bindings</emphasis>,
845 that is, the variables bound in a do-expression are visible only in the textually following
846 code block. Compare this to a let-expression, where bound variables are visible in the entire binding
847 group.
848 </para>
849
850 <para>
851 It turns out that such recursive bindings do indeed make sense for a variety of monads, but
852 not all. In particular, recursion in this sense requires a fixed-point operator for the underlying
853 monad, captured by the <literal>mfix</literal> method of the <literal>MonadFix</literal> class, defined in <literal>Control.Monad.Fix</literal> as follows:
854 <programlisting>
855 class Monad m => MonadFix m where
856    mfix :: (a -> m a) -> m a
857 </programlisting>
858 Haskell's
859 <literal>Maybe</literal>, <literal>[]</literal> (list), <literal>ST</literal> (both strict and lazy versions),
860 <literal>IO</literal>, and many other monads have <literal>MonadFix</literal> instances. On the negative
861 side, the continuation monad, with the signature <literal>(a -> r) -> r</literal>, does not.
862 </para>
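<para>
To make the role of <literal>mfix</literal> concrete, here is a small sketch
that calls it directly in the <literal>Maybe</literal> monad, without any
special syntax. The names <literal>ones</literal> and
<literal>firstThree</literal> are illustrative only.
</para>
<programlisting>
module MfixExample where

import Control.Monad.Fix (mfix)

-- The argument xs refers to the very list that the computation returns,
-- so the result is Just an infinite list of ones.
ones :: Maybe [Int]
ones = mfix (\xs -> Just (1 : xs))

-- Laziness lets us inspect a finite prefix of the result.
firstThree :: Maybe [Int]
firstThree = fmap (take 3) ones     -- Just [1,1,1]
</programlisting>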
863
864 <para>
865 For monads that do belong to the <literal>MonadFix</literal> class, GHC provides
866 an extended version of the do-notation that allows recursive bindings.
867 The <option>-XRecursiveDo</option> flag (language pragma: <literal>RecursiveDo</literal>)
868 provides the necessary syntactic support, introducing the keywords <literal>mdo</literal> and
869 <literal>rec</literal> for higher and lower levels of the notation respectively. Unlike
870 bindings in a <literal>do</literal> expression, those introduced by <literal>mdo</literal> and <literal>rec</literal>
871 are recursively defined, much like in an ordinary let-expression. Due to the new
872 keyword <literal>mdo</literal>, we also call this notation the <emphasis>mdo-notation</emphasis>.
873 </para>
874
875 <para>
876 Here is a simple (albeit contrived) example:
877 <programlisting>
878 {-# LANGUAGE RecursiveDo #-}
879 justOnes = mdo { xs &lt;- Just (1:xs)
880 ; return (map negate xs) }
881 </programlisting>
882 or equivalently
883 <programlisting>
884 {-# LANGUAGE RecursiveDo #-}
885 justOnes = do { rec { xs &lt;- Just (1:xs) }
886 ; return (map negate xs) }
887 </programlisting>
888 As you can guess <literal>justOnes</literal> will evaluate to <literal>Just [-1,-1,-1,...</literal>.
889 </para>
890
891 <para>
892 GHC's implementation of the mdo-notation closely follows the original translation as described in the paper
893 <ulink url="https://sites.google.com/site/leventerkok/recdo.pdf">A recursive do for Haskell</ulink>, which
894 in turn is based on the work <ulink url="http://sites.google.com/site/leventerkok/erkok-thesis.pdf">Value Recursion
895 in Monadic Computations</ulink>. Furthermore, GHC extends the syntax described in the former paper
896 with a lower level syntax flagged by the <literal>rec</literal> keyword, as we describe next.
897 </para>
898
899 <sect3>
900 <title>Recursive binding groups</title>
901
902 <para>
903 The flag <option>-XRecursiveDo</option> also introduces a new keyword <literal>rec</literal>, which wraps a
904 mutually-recursive group of monadic statements inside a <literal>do</literal> expression, producing a single statement.
905 Similar to a <literal>let</literal> statement inside a <literal>do</literal>, variables bound in
906 the <literal>rec</literal> are visible throughout the <literal>rec</literal> group, and below it. For example, compare
907 <programlisting>
908 do { a &lt;- getChar              do { a &lt;- getChar
909    ; let { r1 = f a r2            ; rec { r1 &lt;- f a r2
910    ;     ; r2 = g r1 }            ;     ; r2 &lt;- g r1 }
911    ; return (r1 ++ r2) }          ; return (r1 ++ r2) }
912 </programlisting>
913 In both cases, <literal>r1</literal> and <literal>r2</literal> are available both throughout
914 the <literal>let</literal> or <literal>rec</literal> block, and in the statements that follow it.
915 The difference is that <literal>let</literal> is non-monadic, while <literal>rec</literal> is monadic.
916 (In Haskell <literal>let</literal> is really <literal>letrec</literal>, of course.)
917 </para>
918
919 <para>
920 The semantics of <literal>rec</literal> is fairly straightforward. Whenever GHC finds a <literal>rec</literal>
921 group, it will compute its set of bound variables, and will introduce an appropriate call
922 to the underlying monadic value-recursion operator <literal>mfix</literal>, belonging to the
923 <literal>MonadFix</literal> class. Here is an example:
924 <programlisting>
925 rec { b &lt;- f a c     ===>    (b,c) &lt;- mfix (\~(b,c) -> do { b &lt;- f a c
926     ; c &lt;- f b a }                                        ; c &lt;- f b a
927                                                           ; return (b,c) })
928 </programlisting>
929 As usual, the meta-variables <literal>b</literal>, <literal>c</literal> etc., can be arbitrary patterns.
930 In general, the statement <literal>rec <replaceable>ss</replaceable></literal> is desugared to the statement
931 <programlisting>
932 <replaceable>vs</replaceable> &lt;- mfix (\~<replaceable>vs</replaceable> -&gt; do { <replaceable>ss</replaceable>; return <replaceable>vs</replaceable> })
933 </programlisting>
934 where <replaceable>vs</replaceable> is a tuple of the variables bound by <replaceable>ss</replaceable>.
935 </para>
936
937 <para>
938 Note in particular that the translation for a <literal>rec</literal> block only involves wrapping a call
939 to <literal>mfix</literal>: it performs no other analysis on the bindings. The latter is the task
940 for the <literal>mdo</literal> notation, which is described next.
941 </para>
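<para>
The following self-contained sketch uses a <literal>rec</literal> block in the
<literal>IO</literal> monad to build a two-element cyclic structure. The type
<literal>Node</literal> and the functions <literal>mkCycle</literal> and
<literal>labels</literal> are hypothetical names introduced for this example.
It works because <literal>newIORef</literal> stores its argument without
forcing it, so each node can mention the other before that node has been
created.
</para>
<programlisting>
{-# LANGUAGE RecursiveDo #-}
module Main where

import Data.IORef (IORef, newIORef, readIORef)

-- A node carries a label and a mutable pointer to its successor.
data Node = Node Char (IORef Node)

-- Build the cycle a -> b -> a. Each statement refers to the other's result.
mkCycle :: IO Node
mkCycle = do
  rec { a &lt;- fmap (Node 'a') (newIORef b)
      ; b &lt;- fmap (Node 'b') (newIORef a) }
  return a

-- Follow n successor pointers, collecting the labels seen on the way.
labels :: Int -> Node -> IO String
labels 0 _             = return []
labels n (Node c next) = do
  node &lt;- readIORef next
  rest &lt;- labels (n - 1) node
  return (c : rest)

main :: IO ()
main = mkCycle >>= labels 5 >>= putStrLn   -- prints "ababa"
</programlisting>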
942 </sect3>
943
944 <sect3>
945 <title>The <literal>mdo</literal> notation</title>
946
947 <para>
948 A <literal>rec</literal>-block tells the compiler where precisely the recursive knot should be tied. It turns out that
949 the placement of the recursive knots can be rather delicate: in particular, we would like each knot to be wrapped
950 around as small a group as possible. This process is known as <emphasis>segmentation</emphasis>, and is described
951 in detail in Section 3.2 of <ulink url="https://sites.google.com/site/leventerkok/recdo.pdf">A recursive do for
952 Haskell</ulink>. Segmentation improves polymorphism and reduces the size of the recursive knot. Most importantly, it avoids
953 unnecessary interference caused by a fundamental issue with the so-called <emphasis>right-shrinking</emphasis>
954 axiom for monadic recursion. In brief, most monads of interest (IO, strict state, etc.) do <emphasis>not</emphasis>
955 have recursion operators that satisfy this axiom, and thus not performing segmentation can cause unnecessary
956 interference, changing the termination behavior of the resulting translation.
957 (Details can be found in Sections 3.1 and 7.2.2 of
958 <ulink url="http://sites.google.com/site/leventerkok/erkok-thesis.pdf">Value Recursion in Monadic Computations</ulink>.)
959 </para>
960
961 <para>
962 The <literal>mdo</literal> notation removes the burden of placing
963 explicit <literal>rec</literal> blocks in the code. Unlike an
964 ordinary <literal>do</literal> expression, in which variables bound by
965 statements are only in scope for later statements, variables bound in
966 an <literal>mdo</literal> expression are in scope for all statements
967 of the expression. The compiler then automatically identifies minimal
968 mutually recursively dependent segments of statements, treating them as
969 if the user had wrapped a <literal>rec</literal> qualifier around them.
970 </para>
971
972 <para>
973 The definition is syntactic:
974 </para>
975 <itemizedlist>
976 <listitem>
977 <para>
978 A generator <replaceable>g</replaceable>
979 <emphasis>depends</emphasis> on a textually following generator
980 <replaceable>g'</replaceable>, if
981 </para>
982 <itemizedlist>
983 <listitem>
984 <para>
985 <replaceable>g'</replaceable> defines a variable that
986 is used by <replaceable>g</replaceable>, or
987 </para>
988 </listitem>
989 <listitem>
990 <para>
991 <replaceable>g'</replaceable> textually appears between
992 <replaceable>g</replaceable> and
993 <replaceable>g''</replaceable>, where <replaceable>g</replaceable>
994 depends on <replaceable>g''</replaceable>.
995 </para>
996 </listitem>
997 </itemizedlist>
998 </listitem>
999 <listitem>
1000 <para>
1001 A <emphasis>segment</emphasis> of a given
1002 <literal>mdo</literal>-expression is a minimal sequence of generators
1003 such that no generator of the sequence depends on an outside
1004 generator. As a special case, although it is not a generator,
1005 the final expression in an <literal>mdo</literal>-expression is
1006 considered to form a segment by itself.
1007 </para>
1008 </listitem>
1009 </itemizedlist>
1010 <para>
1011 Segments in this sense are
1012 related to <emphasis>strongly-connected components</emphasis> analysis,
1013 with the exception that bindings in a segment cannot be reordered and
1014 must be contiguous.
1015 </para>
1016
1017 <para>
1018 Here is an example <literal>mdo</literal>-expression, and its translation to <literal>rec</literal> blocks:
1019 <programlisting>
1020 mdo { a &lt;- getChar      ===> do { a &lt;- getChar
1021     ; b &lt;- f a c                ; rec { b &lt;- f a c
1022     ; c &lt;- f b a                ;     ; c &lt;- f b a }
1023     ; z &lt;- h a b                ; z &lt;- h a b
1024     ; d &lt;- g d e                ; rec { d &lt;- g d e
1025     ; e &lt;- g a z                ;     ; e &lt;- g a z }
1026     ; putChar c }               ; putChar c }
1027 </programlisting>
1028 Note that a given <literal>mdo</literal> expression can cause the creation of multiple <literal>rec</literal> blocks.
1029 If there are no recursive dependencies, <literal>mdo</literal> will introduce no <literal>rec</literal> blocks. In this
1030 latter case an <literal>mdo</literal> expression is precisely the same as a <literal>do</literal> expression, as one
1031 would expect.
1032 </para>
1033
1034 <para>
1035 In summary, given an <literal>mdo</literal> expression, GHC first performs segmentation, introducing
1036 <literal>rec</literal> blocks to wrap over minimal recursive groups. Then, each resulting
1037 <literal>rec</literal> is desugared, using a call to <literal>Control.Monad.Fix.mfix</literal> as described
1038 in the previous section. The original <literal>mdo</literal>-expression typechecks exactly when the desugared
1039 version would do so.
1040 </para>
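<para>
As a small runnable illustration of the two steps (segmentation, then
desugaring with <literal>mfix</literal>), consider the sketch below in the
<literal>Maybe</literal> monad. The name <literal>evens</literal> is
illustrative. The first statement uses <literal>ys</literal>, which is bound by
the second, so the two generators form one recursive segment.
</para>
<programlisting>
{-# LANGUAGE RecursiveDo #-}
module MdoExample where

evens :: Maybe [Int]
evens = mdo
  xs &lt;- Just (0 : map (+1) ys)   -- refers to ys, bound by the next statement
  ys &lt;- Just (map (+1) xs)
  return (take 4 xs)             -- Just [0,2,4,6]
</programlisting>
<para>
Written with an ordinary <literal>do</literal>, the first statement would be
rejected because <literal>ys</literal> is not yet in scope.
</para>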
1041
1042 <para>
1043 Here are some other important points in using the recursive-do notation:
1044
1045 <itemizedlist>
1046 <listitem>
1047 <para>
1048 It is enabled with the flag <literal>-XRecursiveDo</literal>, or the <literal>LANGUAGE RecursiveDo</literal>
1049 pragma. (The same flag enables both <literal>mdo</literal>-notation, and the use of <literal>rec</literal>
1050 blocks inside <literal>do</literal> expressions.)
1051 </para>
1052 </listitem>
1053 <listitem>
1054 <para>
1055 <literal>rec</literal> blocks can also be used inside <literal>mdo</literal>-expressions; each such block is
1056 treated as a single statement. However, it is good style to use either <literal>mdo</literal> or
1057 <literal>rec</literal> blocks in a single expression, not both.
1058 </para>
1059 </listitem>
1060 <listitem>
1061 <para>
1062 If recursive bindings are required for a monad, then that monad must be declared an instance of
1063 the <literal>MonadFix</literal> class.
1064 </para>
1065 </listitem>
1066 <listitem>
1067 <para>
1068 The following instances of <literal>MonadFix</literal> are automatically provided: List, Maybe, IO.
1069 Furthermore, the <literal>Control.Monad.ST</literal> and <literal>Control.Monad.ST.Lazy</literal>
1070 modules provide the instances of the <literal>MonadFix</literal> class for Haskell's internal
1071 state monad (strict and lazy, respectively).
1072 </para>
1073 </listitem>
1074 <listitem>
1075 <para>
1076 Like <literal>let</literal> and <literal>where</literal> bindings, name shadowing is not allowed within
1077 an <literal>mdo</literal>-expression or a <literal>rec</literal>-block; that is, all the names bound in
1078 a single <literal>rec</literal> must be distinct. (GHC will complain if this is not the case.)
1079 </para>
1080 </listitem>
1081 </itemizedlist>
1082 </para>
1083 </sect3>
1084
1085
1086 </sect2>
1087
1088
1089 <!-- ===================== PARALLEL LIST COMPREHENSIONS =================== -->
1090
1091 <sect2 id="parallel-list-comprehensions">
1092 <title>Parallel List Comprehensions</title>
1093 <indexterm><primary>list comprehensions</primary><secondary>parallel</secondary>
1094 </indexterm>
1095 <indexterm><primary>parallel list comprehensions</primary>
1096 </indexterm>
1097
1098 <para>Parallel list comprehensions are a natural extension to list
1099 comprehensions. List comprehensions can be thought of as a nice
1100 syntax for writing maps and filters. Parallel comprehensions
1101 extend this to include the zipWith family.</para>
1102
1103 <para>A parallel list comprehension has multiple independent
1104 branches of qualifier lists, each separated by a `|' symbol. For
1105 example, the following zips together two lists:</para>
1106
1107 <programlisting>
1108 [ (x, y) | x &lt;- xs | y &lt;- ys ]
1109 </programlisting>
1110
1111 <para>The behaviour of parallel list comprehensions follows that of
1112 zip, in that the resulting list will have the same length as the
1113 shortest branch.</para>
1114
1115 <para>We can define parallel list comprehensions by translation to
1116 regular comprehensions. Here's the basic idea:</para>
1117
1118 <para>Given a parallel comprehension of the form: </para>
1119
1120 <programlisting>
1121 [ e | p1 &lt;- e11, p2 &lt;- e12, ...
1122 | q1 &lt;- e21, q2 &lt;- e22, ...
1123 ...
1124 ]
1125 </programlisting>
1126
1127 <para>This will be translated to: </para>
1128
1129 <programlisting>
1130 [ e | ((p1,p2), (q1,q2), ...) &lt;- zipN [(p1,p2) | p1 &lt;- e11, p2 &lt;- e12, ...]
1131 [(q1,q2) | q1 &lt;- e21, q2 &lt;- e22, ...]
1132 ...
1133 ]
1134 </programlisting>
1135
1136 <para>where `zipN' is the appropriate zip for the given number of
1137 branches.</para>
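<para>
As a concrete instance of this translation with two branches (so that
<literal>zipN</literal> is just <literal>zip</literal>), the two definitions
below should denote the same list. The names <literal>sugared</literal> and
<literal>desugared</literal> are illustrative.
</para>
<programlisting>
{-# LANGUAGE ParallelListComp #-}
module ParallelExample where

-- Two branches: the first has two generators, the second has one.
sugared :: [(Int, Char, Bool)]
sugared = [ (x, c, b) | x &lt;- [1, 2], c &lt;- "ab"
                      | b &lt;- cycle [True, False] ]

-- The same list, written with an explicit zip over ordinary comprehensions.
desugared :: [(Int, Char, Bool)]
desugared = [ (x, c, b)
            | ((x, c), b) &lt;- zip [ (x, c) | x &lt;- [1, 2], c &lt;- "ab" ]
                                 [ b | b &lt;- cycle [True, False] ] ]

-- Both evaluate to [(1,'a',True),(1,'b',False),(2,'a',True),(2,'b',False)]
</programlisting>
<para>
The second branch is infinite, but the result has only four elements because
its length follows the shortest branch.
</para>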
1138
1139 </sect2>
1140
1141 <!-- ===================== TRANSFORM LIST COMPREHENSIONS =================== -->
1142
1143 <sect2 id="generalised-list-comprehensions">
1144 <title>Generalised (SQL-Like) List Comprehensions</title>
1145 <indexterm><primary>list comprehensions</primary><secondary>generalised</secondary>
1146 </indexterm>
1147 <indexterm><primary>extended list comprehensions</primary>
1148 </indexterm>
1149 <indexterm><primary>group</primary></indexterm>
1150 <indexterm><primary>sql</primary></indexterm>
1151
1152
1153 <para>Generalised list comprehensions are a further enhancement to the
1154 list comprehension syntactic sugar to allow operations such as sorting
1155 and grouping which are familiar from SQL. They are fully described in the
1156 paper <ulink url="http://research.microsoft.com/~simonpj/papers/list-comp">
1157 Comprehensive comprehensions: comprehensions with "order by" and "group by"</ulink>,
1158 except that the syntax we use differs slightly from the paper.</para>
1159 <para>The extension is enabled with the flag <option>-XTransformListComp</option>.</para>
1160 <para>Here is an example:
1161 <programlisting>
1162 employees = [ ("Simon", "MS", 80)
1163 , ("Erik", "MS", 100)
1164 , ("Phil", "Ed", 40)
1165 , ("Gordon", "Ed", 45)
1166 , ("Paul", "Yale", 60)]
1167
1168 output = [ (the dept, sum salary)
1169 | (name, dept, salary) &lt;- employees
1170 , then group by dept using groupWith
1171 , then sortWith by (sum salary)
1172 , then take 5 ]
1173 </programlisting>
1174 In this example, the list <literal>output</literal> would take on
1175 the value:
1176
1177 <programlisting>
1178 [("Yale", 60), ("Ed", 85), ("MS", 180)]
1179 </programlisting>
1180 </para>
1181 <para>There are three new keywords: <literal>group</literal>, <literal>by</literal>, and <literal>using</literal>.
1182 (The functions <literal>sortWith</literal> and <literal>groupWith</literal> are not keywords; they are ordinary
1183 functions that are exported by <literal>GHC.Exts</literal>.)</para>
1184
1185 <para>There are four new forms of comprehension qualifier,
1186 all introduced by the (existing) keyword <literal>then</literal>:
1187 <itemizedlist>
1188 <listitem>
1189
1190 <programlisting>
1191 then f
1192 </programlisting>
1193
1194 This statement requires that <literal>f</literal> have the type <literal>
1195 forall a. [a] -> [a]</literal>. You can see an example of its use in the
1196 motivating example, as this form is used to apply <literal>take 5</literal>.
1197
1198 </listitem>
1199
1200
1201 <listitem>
1202 <para>
1203 <programlisting>
1204 then f by e
1205 </programlisting>
1206
1207 This form is similar to the previous one, but allows you to create a function
1208 which will be passed as the first argument to f. As a consequence f must have
1209 the type <literal>forall a. (a -> t) -> [a] -> [a]</literal>. As you can see
1210 from the type, this function lets f &quot;project out&quot; some information
1211 from the elements of the list it is transforming.</para>
1212
1213 <para>An example is shown in the opening example, where <literal>sortWith</literal>
1214 is supplied with a function that lets it find out the <literal>sum salary</literal>
1215 for any item in the list comprehension it transforms.</para>
1216
1217 </listitem>
1218
1219
1220 <listitem>
1221
1222 <programlisting>
1223 then group by e using f
1224 </programlisting>
1225
1226 <para>This is the most general of the grouping-type statements. In this form,
1227 f is required to have type <literal>forall a. (a -> t) -> [a] -> [[a]]</literal>.
1228 As with the <literal>then f by e</literal> case above, the first argument
1229 is a function supplied to f by the compiler which lets it compute e on every
1230 element of the list being transformed. However, unlike the non-grouping case,
1231 f additionally partitions the list into a number of sublists: this means that
1232 at every point after this statement, binders occurring before it in the comprehension
1233 refer to <emphasis>lists</emphasis> of possible values, not single values. To help understand
1234 this, let's look at an example:</para>
1235
1236 <programlisting>
1237 -- This works similarly to groupWith in GHC.Exts, but doesn't sort its input first
1238 groupRuns :: Eq b => (a -> b) -> [a] -> [[a]]
1239 groupRuns f = groupBy (\x y -> f x == f y)
1240
1241 output = [ (the x, y)
1242 | x &lt;- ([1..3] ++ [1..2])
1243 , y &lt;- [4..6]
1244 , then group by x using groupRuns ]
1245 </programlisting>
1246
1247 <para>This results in the variable <literal>output</literal> taking on the value below:</para>
1248
1249 <programlisting>
1250 [(1, [4, 5, 6]), (2, [4, 5, 6]), (3, [4, 5, 6]), (1, [4, 5, 6]), (2, [4, 5, 6])]
1251 </programlisting>
1252
1253 <para>Note that we have used the <literal>the</literal> function to change the type
1254 of x from a list to its original numeric type. The variable y, in contrast, is left
1255 unchanged from the list form introduced by the grouping.</para>
1256
1257 </listitem>
1258
1259 <listitem>
1260
1261 <programlisting>
1262 then group using f
1263 </programlisting>
1264
1265 <para>With this form of the group statement, f is required to simply have the type
1266 <literal>forall a. [a] -> [[a]]</literal>, which will be used to group up the
1267 comprehension so far directly. An example of this form is as follows:</para>
1268
1269 <programlisting>
1270 output = [ x
1271 | y &lt;- [1..5]
1272 , x &lt;- "hello"
1273 , then group using inits]
1274 </programlisting>
1275
1276 <para>This will yield a list containing every prefix of the word "hello" written out 5 times:</para>
1277
1278 <programlisting>
1279 ["","h","he","hel","hell","hello","helloh","hellohe","hellohel","hellohell","hellohello","hellohelloh",...]
1280 </programlisting>
1281
1282 </listitem>
1283 </itemizedlist>
1284 </para>
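<para>
The non-grouping forms can be combined with ordinary library functions. The
sketch below is illustrative only (the list of items and the name
<literal>cheapest</literal> are made up). It uses <literal>then f by e</literal>
twice, with <literal>sortWith</literal> and with <literal>takeWhile</literal>,
followed by the plain <literal>then f</literal> form.
</para>
<programlisting>
{-# LANGUAGE TransformListComp #-}
module TransformExample where

import GHC.Exts (sortWith)

cheapest :: [(String, Int)]
cheapest = [ (name, price)
           | (name, price) &lt;- [("tea", 3), ("cake", 7), ("milk", 2), ("wine", 12)]
           , then sortWith by price          -- sortWith  :: Ord b => (a -> b) -> [a] -> [a]
           , then takeWhile by (price &lt; 5)   -- takeWhile :: (a -> Bool) -> [a] -> [a]
           , then reverse ]                  -- reverse   :: [a] -> [a]

-- cheapest evaluates to [("tea",3),("milk",2)]
</programlisting>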
1285 </sect2>
1286
1287 <!-- ===================== MONAD COMPREHENSIONS ===================== -->
1288
1289 <sect2 id="monad-comprehensions">
1290 <title>Monad comprehensions</title>
1291 <indexterm><primary>monad comprehensions</primary></indexterm>
1292
1293 <para>
1294 Monad comprehensions generalise the list comprehension notation,
1295 including parallel comprehensions
1296 (<xref linkend="parallel-list-comprehensions"/>) and
1297 transform comprehensions (<xref linkend="generalised-list-comprehensions"/>)
1298 to work for any monad.
1299 </para>
1300
1301 <para>Monad comprehensions support:</para>
1302
1303 <itemizedlist>
1304 <listitem>
1305 <para>
1306 Bindings:
1307 </para>
1308
1309 <programlisting>
1310 [ x + y | x &lt;- Just 1, y &lt;- Just 2 ]
1311 </programlisting>
1312
1313 <para>
1314 Bindings are translated with the <literal>(&gt;&gt;=)</literal> and
1315 <literal>return</literal> functions to the usual do-notation:
1316 </para>
1317
1318 <programlisting>
1319 do x &lt;- Just 1
1320    y &lt;- Just 2
1321    return (x+y)
1322 </programlisting>
1323
1324 </listitem>
1325 <listitem>
1326 <para>
1327 Guards:
1328 </para>
1329
1330 <programlisting>
1331 [ x | x &lt;- [1..10], x &lt;= 5 ]
1332 </programlisting>
1333
1334 <para>
1335 Guards are translated with the <literal>guard</literal> function,
1336 which requires a <literal>MonadPlus</literal> instance:
1337 </para>
1338
1339 <programlisting>
1340 do x &lt;- [1..10]
1341    guard (x &lt;= 5)
1342    return x
1343 </programlisting>
1344
1345 </listitem>
1346 <listitem>
1347 <para>
1348 Transform statements (as with <literal>-XTransformListComp</literal>):
1349 </para>
1350
1351 <programlisting>
1352 [ x+y | x &lt;- [1..10], y &lt;- [1..x], then take 2 ]
1353 </programlisting>
1354
1355 <para>
1356 This translates to:
1357 </para>
1358
1359 <programlisting>
1360 do (x,y) &lt;- take 2 (do x &lt;- [1..10]
1361                        y &lt;- [1..x]
1362                        return (x,y))
1363    return (x+y)
1364 </programlisting>
1365
1366 </listitem>
1367 <listitem>
1368 <para>
1369 Group statements (as with <literal>-XTransformListComp</literal>):
1370 </para>
1371
1372 <programlisting>
1373 [ x | x &lt;- [1,1,2,2,3], then group by x using GHC.Exts.groupWith ]
1374 [ x | x &lt;- [1,1,2,2,3], then group using myGroup ]
1375 </programlisting>
1376
1377 </listitem>
1378 <listitem>
1379 <para>
1380 Parallel statements (as with <literal>-XParallelListComp</literal>):
1381 </para>
1382
1383 <programlisting>
1384 [ (x+y) | x &lt;- [1..10]
1385 | y &lt;- [11..20]
1386 ]
1387 </programlisting>
1388
1389 <para>
1390 Parallel statements are translated using the
1391 <literal>mzip</literal> function, which requires a
1392 <literal>MonadZip</literal> instance defined in
1393 <ulink url="&libraryBaseLocation;/Control-Monad-Zip.html"><literal>Control.Monad.Zip</literal></ulink>:
1394 </para>
1395
1396 <programlisting>
1397 do (x,y) &lt;- mzip (do x &lt;- [1..10]
1398                      return x)
1399                  (do y &lt;- [11..20]
1400                      return y)
1401    return (x+y)
1402 </programlisting>
1403
1404 </listitem>
1405 </itemizedlist>
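<para>
Since the examples above mostly use lists, here is a minimal sketch of
bindings and guards in the <literal>Maybe</literal> monad. The function names
<literal>safeDiv</literal> and <literal>sumPositive</literal> are illustrative.
</para>
<programlisting>
{-# LANGUAGE MonadComprehensions #-}
module MaybeComprehension where

-- A comprehension consisting of a guard only: Nothing when d is zero.
safeDiv :: Int -> Int -> Maybe Int
safeDiv n d = [ n `div` d | d /= 0 ]

-- Two bindings followed by a guard.
sumPositive :: Maybe Int -> Maybe Int -> Maybe Int
sumPositive mx my = [ x + y | x &lt;- mx, y &lt;- my, x + y > 0 ]

-- safeDiv 7 2                      == Just 3
-- safeDiv 7 0                      == Nothing
-- sumPositive (Just 1) (Just 2)    == Just 3
-- sumPositive (Just 1) Nothing     == Nothing
</programlisting>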
1406
1407 <para>
1408 All these features are available when the
1409 <literal>MonadComprehensions</literal> extension is enabled. The types
1410 and more detailed examples on how to use comprehensions are explained
1411 in the earlier sections <xref
1412 linkend="generalised-list-comprehensions"/> and <xref
1413 linkend="parallel-list-comprehensions"/>. In general you just have
1414 to replace the type <literal>[a]</literal> with the type
1415 <literal>Monad m => m a</literal> for monad comprehensions.
1416 </para>
1417
1418 <para>
1419 Note: Even though most of these examples are using the list monad,
1420 monad comprehensions work for any monad.
1421 The <literal>base</literal> package offers all necessary instances for
1422 lists, which make <literal>MonadComprehensions</literal> backward
1423 compatible with built-in, transform and parallel list comprehensions.
1424 </para>
1425 <para> More formally, the desugaring is as follows. We write <literal>D[ e | Q]</literal>
1426 to mean the desugaring of the monad comprehension <literal>[ e | Q]</literal>:
1427 <programlisting>
1428 Expressions: e
1429 Declarations: d
1430 Lists of qualifiers: Q,R,S
1431
1432 -- Basic forms
1433 D[ e | ] = return e
1434 D[ e | p &lt;- e, Q ] = e &gt;&gt;= \p -&gt; D[ e | Q ]
1435 D[ e | e, Q ] = guard e &gt;&gt; D[ e | Q ]
1436 D[ e | let d, Q ] = let d in D[ e | Q ]
1437
1438 -- Parallel comprehensions (iterate for multiple parallel branches)
1439 D[ e | (Q | R), S ] = mzip D[ Qv | Q ] D[ Rv | R ] &gt;&gt;= \(Qv,Rv) -&gt; D[ e | S ]
1440
1441 -- Transform comprehensions
1442 D[ e | Q then f, R ] = f D[ Qv | Q ] &gt;&gt;= \Qv -&gt; D[ e | R ]
1443
1444 D[ e | Q then f by b, R ] = f (\Qv -&gt; b) D[ Qv | Q ] &gt;&gt;= \Qv -&gt; D[ e | R ]
1445
1446 D[ e | Q then group using f, R ] = f D[ Qv | Q ] &gt;&gt;= \ys -&gt;
1447 case (fmap selQv1 ys, ..., fmap selQvn ys) of
1448 Qv -&gt; D[ e | R ]
1449
1450 D[ e | Q then group by b using f, R ] = f (\Qv -&gt; b) D[ Qv | Q ] &gt;&gt;= \ys -&gt;
1451 case (fmap selQv1 ys, ..., fmap selQvn ys) of
1452 Qv -&gt; D[ e | R ]
1453
1454 where Qv is the tuple of variables bound by Q (and used subsequently)
1455 selQvi is a selector mapping Qv to the ith component of Qv
1456
1457 Operator  Standard binding   Expected type
1458 --------------------------------------------------------------------
1459 return    GHC.Base           t1 -&gt; m t2
1460 (&gt;&gt;=)     GHC.Base           m1 t1 -&gt; (t2 -&gt; m2 t3) -&gt; m3 t3
1461 (&gt;&gt;)      GHC.Base           m1 t1 -&gt; m2 t2         -&gt; m3 t3
1462 guard     Control.Monad      t1 -&gt; m t2
1463 fmap      GHC.Base           forall a b. (a-&gt;b) -&gt; n a -&gt; n b
1464 mzip      Control.Monad.Zip  forall a b. m a -&gt; m b -&gt; m (a,b)
1465 </programlisting>
1466 The comprehension should typecheck when its desugaring would typecheck.
1467 </para>
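<para>
As a hand-worked illustration of the basic forms (a sketch, not compiler
output), a comprehension with two bindings and a guard unfolds into a chain of
<literal>(>>=)</literal>, <literal>guard</literal> and
<literal>return</literal>. The names <literal>sugared</literal> and
<literal>desugared</literal> are illustrative.
</para>
<programlisting>
{-# LANGUAGE MonadComprehensions #-}
module DesugarExample where

import Control.Monad (guard)

sugared :: Maybe Int
sugared = [ x + y | x &lt;- Just 1, y &lt;- Just 2, x &lt; y ]

-- One rule per qualifier, left to right, ending with "return e".
desugared :: Maybe Int
desugared =
  Just 1 >>= \x ->
  Just 2 >>= \y ->
  guard (x &lt; y) >>
  return (x + y)

-- Both evaluate to Just 3.
</programlisting>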
1468 <para>
1469 Monad comprehensions support rebindable syntax (<xref linkend="rebindable-syntax"/>).
1470 Without rebindable
1471 syntax, the operators from the "standard binding" module are used; with
1472 rebindable syntax, the operators are looked up in the current lexical scope.
1473 For example, parallel comprehensions will be typechecked and desugared
1474 using whatever "<literal>mzip</literal>" is in scope.
1475 </para>
1476 <para>
1477 The rebindable operators must have the "Expected type" given in the
1478 table above. These types are surprisingly general. For example, you can
1479 use a bind operator with the type
1480 <programlisting>
1481 (>>=) :: T x y a -> (a -> T y z b) -> T x z b
1482 </programlisting>
1483 In the case of transform comprehensions, notice that the groups are
1484 parameterised over some arbitrary type <literal>n</literal> (provided it
1485 has an <literal>fmap</literal>), as well as
1486 the comprehension being over an arbitrary monad.
1487 </para>
1488 </sect2>
1489
1490 <!-- ===================== REBINDABLE SYNTAX =================== -->
1491
1492 <sect2 id="rebindable-syntax">
1493 <title>Rebindable syntax and the implicit Prelude import</title>
1494
1495 <para><indexterm><primary>-XNoImplicitPrelude
1496 option</primary></indexterm> GHC normally imports
1497 <filename>Prelude.hi</filename> files for you. If you'd
1498 rather it didn't, then give it a
1499 <option>-XNoImplicitPrelude</option> option. The idea is
1500 that you can then import a Prelude of your own. (But don't
1501 call it <literal>Prelude</literal>; the Haskell module
1502 namespace is flat, and you must not conflict with any
1503 Prelude module.)</para>
1504
1505 <para>Suppose you are importing a Prelude of your own
1506 in order to define your own numeric class
1507 hierarchy. It completely defeats that purpose if the
1508 literal "1" means "<literal>Prelude.fromInteger
1509 1</literal>", which is what the Haskell Report specifies.
1510 So the <option>-XRebindableSyntax</option>
1511 flag causes
1512 the following pieces of built-in syntax to refer to
1513 <emphasis>whatever is in scope</emphasis>, not the Prelude
1514 versions:
1515 <itemizedlist>
1516 <listitem>
1517 <para>An integer literal <literal>368</literal> means
1518 "<literal>fromInteger (368::Integer)</literal>", rather than
1519 "<literal>Prelude.fromInteger (368::Integer)</literal>".
1520 </para> </listitem>
1521
1522 <listitem><para>Fractional literals are handled in just the same way,
1523 except that the translation is
1524 <literal>fromRational (3.68::Rational)</literal>.
1525 </para> </listitem>
1526
1527 <listitem><para>The equality test in an overloaded numeric pattern
1528 uses whatever <literal>(==)</literal> is in scope.
1529 </para> </listitem>
1530
1531 <listitem><para>The subtraction operation, and the
1532 greater-than-or-equal test, in <literal>n+k</literal> patterns
1533 use whatever <literal>(-)</literal> and <literal>(>=)</literal> are in scope.
1534 </para></listitem>
1535
1536 <listitem>
1537 <para>Negation (e.g. "<literal>- (f x)</literal>")
1538 means "<literal>negate (f x)</literal>", both in numeric
1539 patterns, and expressions.
1540 </para></listitem>
1541
1542 <listitem>
1543 <para>Conditionals (e.g. "<literal>if</literal> e1 <literal>then</literal> e2 <literal>else</literal> e3")
1544 mean "<literal>ifThenElse</literal> e1 e2 e3". However, <literal>case</literal> expressions are unaffected.
1545 </para></listitem>
1546
1547 <listitem>
1548 <para>"Do" notation is translated using whatever
1549 functions <literal>(>>=)</literal>,
1550 <literal>(>>)</literal>, and <literal>fail</literal>,
1551 are in scope (not the Prelude
1552 versions). List comprehensions, mdo (<xref linkend="recursive-do-notation"/>), and parallel array
1553 comprehensions, are unaffected. </para></listitem>
1554
1555 <listitem>
1556 <para>Arrow
1557 notation (see <xref linkend="arrow-notation"/>)
1558 uses whatever <literal>arr</literal>,
1559 <literal>(>>>)</literal>, <literal>first</literal>,
1560 <literal>app</literal>, <literal>(|||)</literal> and
1561 <literal>loop</literal> functions are in scope. But unlike the
1562 other constructs, the types of these functions must match the
1563 Prelude types very closely. Details are in flux; if you want
1564 to use this, ask!
1565 </para></listitem>
1566 </itemizedlist>
1567 <option>-XRebindableSyntax</option> implies <option>-XNoImplicitPrelude</option>.
1568 </para>
1569 <para>
1570 In all cases (apart from arrow notation), the static semantics should be that of the desugared form,
1571 even if that is a little unexpected. For example, the
1572 static semantics of the literal <literal>368</literal>
1573 is exactly that of <literal>fromInteger (368::Integer)</literal>; it's fine for
1574 <literal>fromInteger</literal> to have any of the types:
1575 <programlisting>
1576 fromInteger :: Integer -> Integer
1577 fromInteger :: forall a. Foo a => Integer -> a
1578 fromInteger :: Num a => a -> Integer
1579 fromInteger :: Integer -> Bool -> Bool
1580 </programlisting>
1581 </para>
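<para>
The following sketch shows the first of these typings in a complete program.
The type <literal>Cents</literal> is a hypothetical example. Because
<option>-XRebindableSyntax</option> implies
<option>-XNoImplicitPrelude</option>, the Prelude is imported explicitly, with
its own <literal>fromInteger</literal> hidden.
</para>
<programlisting>
{-# LANGUAGE RebindableSyntax #-}
module Main where

import Prelude hiding (fromInteger)

newtype Cents = Cents Integer

-- This is the fromInteger that integer literals in this module refer to.
fromInteger :: Integer -> Cents
fromInteger = Cents

-- The literal means (fromInteger (368::Integer)), so it has type Cents.
price :: Cents
price = 368

main :: IO ()
main = case price of
         Cents n -> print n     -- prints 368
</programlisting>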
1582
1583 <para>Be warned: this is an experimental facility, with
1584 fewer checks than usual. Use <literal>-dcore-lint</literal>
1585 to typecheck the desugared program. If Core Lint is happy
1586 you should be all right.</para>
1587
1588 </sect2>
1589
1590 <sect2 id="postfix-operators">
1591 <title>Postfix operators</title>
1592
1593 <para>
1594 The <option>-XPostfixOperators</option> flag enables a small
1595 extension to the syntax of left operator sections, which allows you to
1596 define postfix operators. The extension is this: the left section
1597 <programlisting>
1598 (e !)
1599 </programlisting>
1600 is equivalent (from the point of view of both type checking and execution) to the expression
1601 <programlisting>
1602 ((!) e)
1603 </programlisting>
1604 (for any expression <literal>e</literal> and operator <literal>(!)</literal>).
1605 The strict Haskell 98 interpretation is that the section is equivalent to
1606 <programlisting>
1607 (\y -> (!) e y)
1608 </programlisting>
1609 That is, the operator must be a function of two arguments. GHC allows it to
1610 take only one argument, and that in turn allows you to write the function
1611 postfix.
1612 </para>
1613 <para>The extension does not extend to the left-hand side of function
1614 definitions; you must define such a function in prefix form.</para>
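<para>As a minimal sketch of the above, the following module defines a
factorial operator in prefix form and then uses it postfix. The operator
name <literal>(!)</literal> and its definition are illustrative assumptions,
not part of any library.</para>
<programlisting>
{-# LANGUAGE PostfixOperators #-}
module Main where

-- Hypothetical factorial operator; it must be defined in prefix form.
(!) :: Integer -> Integer
(!) n = product [1 .. n]

main :: IO ()
main = print (5 !)   -- the left section (5 !) is treated as ((!) 5)
</programlisting>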
1615
1616 </sect2>
1617
1618 <sect2 id="tuple-sections">
1619 <title>Tuple sections</title>
1620
1621 <para>
1622 The <option>-XTupleSections</option> flag enables Python-style partially applied
1623 tuple constructors. For example, the following expression
1624 <programlisting>
1625 (, True)
1626 </programlisting>
1627 is an alternative notation for the more unwieldy
1628 <programlisting>
1629 \x -> (x, True)
1630 </programlisting>
1631 You can omit any combination of arguments to the tuple, as in the following
1632 <programlisting>
1633 (, "I", , , "Love", , 1337)
1634 </programlisting>
1635 which translates to
1636 <programlisting>
1637 \a b c d -> (a, "I", b, c, "Love", d, 1337)
1638 </programlisting>
1639 </para>
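<para>A small self-contained sketch of how tuple sections might be used in
practice follows; the example data is made up for illustration.</para>
<programlisting>
{-# LANGUAGE TupleSections #-}
module Main where

main :: IO ()
main = do
  -- (, True) is shorthand for \x -> (x, True)
  print (map (, True) "ab")
  -- (, "I", ) is shorthand for \a b -> (a, "I", b)
  print (zipWith (, "I", ) [1 :: Int, 2] "xy")
</programlisting>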
1640
1641 <para>
1642 If you have <link linkend="unboxed-tuples">unboxed tuples</link> enabled, tuple sections
1643 will also be available for them, like so
1644 <programlisting>
1645 (# , True #)
1646 </programlisting>
1647 Because there is no unboxed unit tuple, the following expression
1648 <programlisting>
1649 (# #)
1650 </programlisting>
1651 continues to stand for the unboxed singleton tuple data constructor.
1652 </para>
1653
1654 </sect2>
1655
1656 <sect2 id="lambda-case">
1657 <title>Lambda-case</title>
1658 <para>
1659 The <option>-XLambdaCase</option> flag enables expressions of the form
1660 <programlisting>
1661 \case { p1 -> e1; ...; pN -> eN }
1662 </programlisting>
1663 which is equivalent to
1664 <programlisting>
1665 \freshName -> case freshName of { p1 -> e1; ...; pN -> eN }
1666 </programlisting>
1667 Note that <literal>\case</literal> starts a layout, so you can write
1668 <programlisting>
1669 \case
1670 p1 -> e1
1671 ...
1672 pN -> eN
1673 </programlisting>
1674 </para>
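<para>For instance, a function that immediately scrutinises its argument
could be sketched as follows. The function <literal>describe</literal> is a
hypothetical example, not a library function.</para>
<programlisting>
{-# LANGUAGE LambdaCase #-}
module Main where

-- \case opens a layout block, so the alternatives are simply indented.
describe :: Maybe Int -> String
describe = \case
  Nothing -> "nothing"
  Just n  -> "just " ++ show n

main :: IO ()
main = mapM_ (putStrLn . describe) [Nothing, Just 3]
</programlisting>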
1675 </sect2>
1676
1677 <sect2 id="empty-case">
1678 <title>Empty case alternatives</title>
1679 <para>
1680 The <option>-XEmptyCase</option> flag enables
1681 case expressions, or lambda-case expressions, that have no alternatives,
1682 thus:
1683 <programlisting>
1684 case e of { } -- No alternatives
1685 or
1686 \case { } -- -XLambdaCase is also required
1687 </programlisting>
1688 This can be useful when you know that the expression being scrutinised
1689 has no non-bottom values. For example:
1690 <programlisting>
1691 data Void
1692 f :: Void -> Int
1693 f x = case x of { }
1694 </programlisting>
1695 With dependently-typed features it is more useful
1696 (see <ulink url="http://hackage.haskell.org/trac/ghc/ticket/2431">Trac</ulink>).
1697 For example, consider these two candidate definitions of <literal>absurd</literal>:
1698 <programlisting>
1699 data a :~: b where
1700 Refl :: a :~: a
1701
1702 absurd :: True :~: False -> a
1703 absurd x = error "absurd" -- (A)
1704 absurd x = case x of {} -- (B)
1705 </programlisting>
1706 We much prefer (B). Why? Because GHC can figure out that <literal>(True :~: False)</literal>
1707 is an empty type. So (B) has no partiality, and GHC should be able to compile it with
1708 <option>-fwarn-incomplete-patterns</option> without a warning. (Though the pattern match checking is not
1709 yet clever enough to do that.)
1710 On the other hand (A) looks dangerous, and GHC doesn't check to make
1711 sure that, in fact, the function can never get called.
1712 </para>
1713 </sect2>
1714
1715 <sect2 id="multi-way-if">
1716 <title>Multi-way if-expressions</title>
1717 <para>
1718 With the <option>-XMultiWayIf</option> flag, GHC accepts conditional expressions
1719 with multiple branches:
1720 <programlisting>
1721 if | guard1 -> expr1
1722 | ...
1723 | guardN -> exprN
1724 </programlisting>
1725 which is roughly equivalent to
1726 <programlisting>
1727 case () of
1728 _ | guard1 -> expr1
1729 ...
1730 _ | guardN -> exprN
1731 </programlisting>
1732 except that multi-way if-expressions do not alter the layout.
1733 </para>
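<para>A minimal sketch of a multi-way if in use is given below; the function
<literal>classify</literal> is a made-up example.</para>
<programlisting>
{-# LANGUAGE MultiWayIf #-}
module Main where

-- The guards are tried in order; the first one that succeeds is chosen.
classify :: Int -> String
classify n = if | n > 0     -> "positive"
                | n == 0    -> "zero"
                | otherwise -> "negative"

main :: IO ()
main = mapM_ (putStrLn . classify) [-1, 0, 1]
</programlisting>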
1734 </sect2>
1735
1736 <sect2 id="disambiguate-fields">
1737 <title>Record field disambiguation</title>
1738 <para>
1739 In record construction and record pattern matching
1740 it is entirely unambiguous which field is referred to, even if there are two different
1741 data types in scope with a common field name. For example:
1742 <programlisting>
1743 module M where
1744 data S = MkS { x :: Int, y :: Bool }
1745
1746 module Foo where
1747 import M
1748
1749 data T = MkT { x :: Int }
1750
1751 ok1 (MkS { x = n }) = n+1 -- Unambiguous
1752 ok2 n = MkT { x = n+1 } -- Unambiguous
1753
1754 bad1 k = k { x = 3 } -- Ambiguous
1755 bad2 k = x k -- Ambiguous
1756 </programlisting>
1757 Even though there are two <literal>x</literal>'s in scope,
1758 it is clear that the <literal>x</literal> in the pattern in the
1759 definition of <literal>ok1</literal> can only mean the field
1760 <literal>x</literal> from type <literal>S</literal>. Similarly for
1761 the function <literal>ok2</literal>. However, in the record update
1762 in <literal>bad1</literal> and the record selection in <literal>bad2</literal>
1763 it is not clear which of the two types is intended.
1764 </para>
1765 <para>
1766 Haskell 98 regards all four as ambiguous, but with the
1767 <option>-XDisambiguateRecordFields</option> flag, GHC will accept
1768 the former two. The rules are precisely the same as those for instance
1769 declarations in Haskell 98, where the method names on the left-hand side
1770 of the method bindings in an instance declaration refer unambiguously
1771 to the method of that class (provided they are in scope at all), even
1772 if there are other variables in scope with the same name.
1773 This reduces the clutter of qualified names when you import two
1774 records from different modules that use the same field name.
1775 </para>
1776 <para>
1777 Some details:
1778 <itemizedlist>
1779 <listitem><para>
1780 Field disambiguation can be combined with punning (see <xref linkend="record-puns"/>). For example:
1781 <programlisting>
1782 module Foo where
1783 import M
1784 x=True
1785 ok3 (MkS { x }) = x+1 -- Uses both disambiguation and punning
1786 </programlisting>
1787 </para></listitem>
1788
1789 <listitem><para>
1790 With <option>-XDisambiguateRecordFields</option> you can use <emphasis>unqualified</emphasis>
1791 field names even if the corresponding selector is only in scope <emphasis>qualified</emphasis>.
1792 For example, assuming the same module <literal>M</literal> as in our earlier example, this is legal:
1793 <programlisting>
1794 module Foo where
1795 import qualified M -- Note qualified
1796
1797 ok4 (M.MkS { x = n }) = n+1 -- Unambiguous
1798 </programlisting>
1799 Since the constructor <literal>MkS</literal> is only in scope qualified, you must
1800 name it <literal>M.MkS</literal>, but the field <literal>x</literal> does not need
1801 to be qualified even though <literal>M.x</literal> is in scope but <literal>x</literal>
1802 is not. (In effect, it is qualified by the constructor.)
1803 </para></listitem>
1804 </itemizedlist>
1805 </para>
1806
1807 </sect2>
1808
1809 <!-- ===================== Record puns =================== -->
1810
1811 <sect2 id="record-puns">
1812 <title>Record puns
1813 </title>
1814
1815 <para>
1816 Record puns are enabled by the flag <literal>-XNamedFieldPuns</literal>.
1817 </para>
1818
1819 <para>
1820 When using records, it is common to write a pattern that binds a
1821 variable with the same name as a record field, such as:
1822
1823 <programlisting>
1824 data C = C {a :: Int}
1825 f (C {a = a}) = a
1826 </programlisting>
1827 </para>
1828
1829 <para>
1830 Record punning permits the variable name to be elided, so one can simply
1831 write
1832
1833 <programlisting>
1834 f (C {a}) = a
1835 </programlisting>
1836
1837 to mean the same pattern as above. That is, in a record pattern, the
1838 pattern <literal>a</literal> expands into the pattern <literal>a =
1839 a</literal> for the same name <literal>a</literal>.
1840 </para>
1841
1842 <para>
1843 Note that:
1844 <itemizedlist>
1845 <listitem><para>
1846 Record punning can also be used in an expression, writing, for example,
1847 <programlisting>
1848 let a = 1 in C {a}
1849 </programlisting>
1850 instead of
1851 <programlisting>
1852 let a = 1 in C {a = a}
1853 </programlisting>
1854 The expansion is purely syntactic, so the expanded right-hand side
1855 expression refers to the nearest enclosing variable that is spelled the
1856 same as the field name.
1857 </para></listitem>
1858
1859 <listitem><para>
1860 Puns and other patterns can be mixed in the same record:
1861 <programlisting>
1862 data C = C {a :: Int, b :: Int}
1863 f (C {a, b = 4}) = a
1864 </programlisting>
1865 </para></listitem>
1866
1867 <listitem><para>
1868 Puns can be used wherever record patterns occur (e.g. in
1869 <literal>let</literal> bindings or at the top-level).
1870 </para></listitem>
1871
1872 <listitem><para>
1873 A pun on a qualified field name is expanded by stripping off the module qualifier.
1874 For example:
1875 <programlisting>
1876 f (M.C {M.a}) = a
1877 </programlisting>
1878 means
1879 <programlisting>
1880 f (M.C {M.a = a}) = a
1881 </programlisting>
1882 (This is useful if the field selector <literal>a</literal> for constructor <literal>M.C</literal>
1883 is only in scope in qualified form.)
1884 </para></listitem>
1885 </itemizedlist>
1886 </para>
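<para>The points above can be combined in one small module. This is only a
sketch; the type <literal>Point</literal> and its fields are assumptions made
for the example.</para>
<programlisting>
{-# LANGUAGE NamedFieldPuns #-}
module Main where

data Point = Point { px :: Int, py :: Int } deriving Show

-- Pun in a pattern: {px, py} stands for {px = px, py = py}.
norm1 :: Point -> Int
norm1 (Point {px, py}) = abs px + abs py

-- Pun in an expression: the fields are filled from the local variables.
mkPoint :: Int -> Int -> Point
mkPoint px py = Point {px, py}

main :: IO ()
main = print (norm1 (mkPoint 3 (-4)))
</programlisting>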
1887
1888
1889 </sect2>
1890
1891 <!-- ===================== Record wildcards =================== -->
1892
1893 <sect2 id="record-wildcards">
1894 <title>Record wildcards
1895 </title>
1896
1897 <para>
1898 Record wildcards are enabled by the flag <literal>-XRecordWildCards</literal>.
1899 This flag implies <literal>-XDisambiguateRecordFields</literal>.
1900 </para>
1901
1902 <para>
1903 For records with many fields, it can be tiresome to write out each field
1904 individually in a record pattern, as in
1905 <programlisting>
1906 data C = C {a :: Int, b :: Int, c :: Int, d :: Int}
1907 f (C {a = 1, b = b, c = c, d = d}) = b + c + d
1908 </programlisting>
1909 </para>
1910
1911 <para>
1912 Record wildcard syntax permits a "<literal>..</literal>" in a record
1913 pattern, where each elided field <literal>f</literal> is replaced by the
1914 pattern <literal>f = f</literal>. For example, the above pattern can be
1915 written as
1916 <programlisting>
1917 f (C {a = 1, ..}) = b + c + d
1918 </programlisting>
1919 </para>
1920
1921 <para>
1922 More details:
1923 <itemizedlist>
1924 <listitem><para>
1925 Wildcards can be mixed with other patterns, including puns
1926 (<xref linkend="record-puns"/>); for example, in a pattern <literal>C {a
1927 = 1, b, ..}</literal>. Additionally, record wildcards can be used
1928 wherever record patterns occur, including in <literal>let</literal>
1929 bindings and at the top-level. For example, the top-level binding
1930 <programlisting>
1931 C {a = 1, ..} = e
1932 </programlisting>
1933 defines <literal>b</literal>, <literal>c</literal>, and
1934 <literal>d</literal>.
1935 </para></listitem>
1936
1937 <listitem><para>
1938 Record wildcards can also be used in expressions, writing, for example,
1939 <programlisting>
1940 let {a = 1; b = 2; c = 3; d = 4} in C {..}
1941 </programlisting>
1942 in place of
1943 <programlisting>
1944 let {a = 1; b = 2; c = 3; d = 4} in C {a=a, b=b, c=c, d=d}
1945 </programlisting>
1946 The expansion is purely syntactic, so the record wildcard
1947 expression refers to the nearest enclosing variables that are spelled
1948 the same as the omitted field names.
1949 </para></listitem>
1950
1951 <listitem><para>
1952 The "<literal>..</literal>" expands to the missing
1953 <emphasis>in-scope</emphasis> record fields.
1954 Specifically the expansion of "<literal>C {..}</literal>" includes
1955 <literal>f</literal> if and only if:
1956 <itemizedlist>
1957 <listitem><para>
1958 <literal>f</literal> is a record field of constructor <literal>C</literal>.
1959 </para></listitem>
1960 <listitem><para>
1961 The record field <literal>f</literal> is in scope somehow (either qualified or unqualified).
1962 </para></listitem>
1963 <listitem><para>
1964 In the case of expressions (but not patterns),
1965 the variable <literal>f</literal> is in scope unqualified,
1966 apart from the binding of the record selector itself.
1967 </para></listitem>
1968 </itemizedlist>
1969 For example
1970 <programlisting>
1971 module M where
1972 data R = R { a,b,c :: Int }
1973 module X where
1974 import M( R(R,a,c) )
1975 f a b = R { .. }
1976 </programlisting>
1977 The <literal>R{..}</literal> expands to <literal>R{M.a=a}</literal>,
1978 omitting <literal>b</literal> since the record field is not in scope,
1979 and omitting <literal>c</literal> since the variable <literal>c</literal>
1980 is not in scope (apart from the binding of the
1981 record selector <literal>c</literal>, of course).
1982 </para></listitem>
1983 </itemizedlist>
1984 </para>
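<para>As a rough illustration of wildcards in both a pattern and an
expression, consider the following sketch. The record
<literal>Config</literal> and its fields are hypothetical.</para>
<programlisting>
{-# LANGUAGE RecordWildCards #-}
module Main where

data Config = Config { host :: String, port :: Int } deriving Show

-- Pattern wildcard: binds host and port from the record.
render :: Config -> String
render Config{..} = host ++ ":" ++ show port

-- Expression wildcard: fields are taken from the enclosing let bindings.
defaultConfig :: Config
defaultConfig = let { host = "localhost"; port = 8080 } in Config{..}

main :: IO ()
main = putStrLn (render defaultConfig)
</programlisting>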
1985
1986 </sect2>
1987
1988 <!-- ===================== Local fixity declarations =================== -->
1989
1990 <sect2 id="local-fixity-declarations">
1991 <title>Local Fixity Declarations
1992 </title>
1993
1994 <para>A careful reading of the Haskell 98 Report reveals that fixity
1995 declarations (<literal>infix</literal>, <literal>infixl</literal>, and
1996 <literal>infixr</literal>) are permitted to appear inside local bindings
1997 such as those introduced by <literal>let</literal> and
1998 <literal>where</literal>. However, the Haskell Report does not specify
1999 the semantics of such bindings very precisely.
2000 </para>
2001
2002 <para>In GHC, a fixity declaration may accompany a local binding:
2003 <programlisting>
2004 let f = ...
2005 infixr 3 `f`
2006 in
2007 ...
2008 </programlisting>
2009 and the fixity declaration applies wherever the binding is in scope.
2010 For example, in a <literal>let</literal>, it applies in the right-hand
2011 sides of other <literal>let</literal>-bindings and the body of the
2012 <literal>let</literal>. Or, in recursive <literal>do</literal>
2013 expressions (<xref linkend="recursive-do-notation"/>), the local fixity
2014 declarations of a <literal>let</literal> statement scope over other
2015 statements in the group, just as the bound name does.
2016 </para>
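<para>A minimal sketch of a local fixity declaration, using made-up operator
names <literal>plus</literal> and <literal>times</literal>, might look like
this. Without the two fixity declarations both operators would default to
<literal>infixl 9</literal> and the expression would group differently.</para>
<programlisting>
module Main where

calc :: Int
calc = 1 `plus` 2 `times` 3      -- parsed as 1 `plus` (2 `times` 3)
  where
    -- These fixities apply wherever plus and times are in scope.
    infixl 6 `plus`
    infixl 7 `times`
    plus, times :: Int -> Int -> Int
    plus  = (+)
    times = (*)

main :: IO ()
main = print calc
</programlisting>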
2017
2018 <para>
2019 Moreover, a local fixity declaration <emphasis>must</emphasis> accompany a local binding of
2020 that name: it is not possible to revise the fixity of a name bound
2021 elsewhere, as in
2022 <programlisting>
2023 let infixr 9 $ in ...
2024 </programlisting>
2025
2026 Because local fixity declarations are technically Haskell 98, no flag is
2027 necessary to enable them.
2028 </para>
2029 </sect2>
2030
2031 <sect2 id="package-imports">
2032 <title>Package-qualified imports</title>
2033
2034 <para>With the <option>-XPackageImports</option> flag, GHC allows
2035 import declarations to be qualified by the package name that the
2036 module is intended to be imported from. For example:</para>
2037
2038 <programlisting>
2039 import "network" Network.Socket
2040 </programlisting>
2041
2042 <para>would import the module <literal>Network.Socket</literal> from
2043 the package <literal>network</literal> (any version). This may
2044 be used to disambiguate an import when the same module is
2045 available from multiple packages, or is present in both the
2046 current package being built and an external package.</para>
2047
2048 <para>The special package name <literal>this</literal> can be used to
2049 refer to the current package being built.</para>
2050
2051 <para>Note: you probably don't need to use this feature; it was
2052 added mainly so that we can build backwards-compatible versions of
2053 packages when APIs change. It can lead to fragile dependencies in
2054 the common case: modules occasionally move from one package to
2055 another, rendering any package-qualified imports broken.</para>
2056 </sect2>
2057
2058 <sect2 id="safe-imports-ext">
2059 <title>Safe imports</title>
2060
2061 <para>With the <option>-XSafe</option>, <option>-XTrustworthy</option>
2062 and <option>-XUnsafe</option> language flags, GHC extends
2063 the import declaration syntax to take an optional <literal>safe</literal>
2064 keyword after the <literal>import</literal> keyword. This feature
2065 is part of the Safe Haskell GHC extension. For example:</para>
2066
2067 <programlisting>
2068 import safe qualified Network.Socket as NS
2069 </programlisting>
2070
2071 <para>would import the module <literal>Network.Socket</literal>
2072 with compilation only succeeding if Network.Socket can be
2073 safely imported. For a description of when an import is
2074 considered safe, see <xref linkend="safe-haskell"/>.</para>
2075
2076 </sect2>
2077
2078 <sect2 id="syntax-stolen">
2079 <title>Summary of stolen syntax</title>
2080
2081 <para>Turning on an option that enables special syntax
2082 <emphasis>might</emphasis> cause working Haskell 98 code to fail
2083 to compile, perhaps because it uses a variable name which has
2084 become a reserved word. This section lists the syntax that is
2085 "stolen" by language extensions.
2086 We use
2087 notation and nonterminal names from the Haskell 98 lexical syntax
2088 (see the Haskell 98 Report).
2089 We only list syntax changes here that might affect
2090 existing working programs (i.e. "stolen" syntax). Many of these
2091 extensions will also enable new context-free syntax, but in all
2092 cases programs written to use the new syntax would not be
2093 compilable without the option enabled.</para>
2094
2095 <para>There are two classes of special
2096 syntax:
2097
2098 <itemizedlist>
2099 <listitem>
2100 <para>New reserved words and symbols: character sequences
2101 which are no longer available for use as identifiers in the
2102 program.</para>
2103 </listitem>
2104 <listitem>
2105 <para>Other special syntax: sequences of characters that have
2106 a different meaning when this particular option is turned
2107 on.</para>
2108 </listitem>
2109 </itemizedlist>
2110
2111 The following syntax is stolen:
2112
2113 <variablelist>
2114 <varlistentry>
2115 <term>
2116 <literal>forall</literal>
2117 <indexterm><primary><literal>forall</literal></primary></indexterm>
2118 </term>
2119 <listitem><para>
2120 Stolen (in types) by: <option>-XExplicitForAll</option>, and hence by
2121 <option>-XScopedTypeVariables</option>,
2122 <option>-XLiberalTypeSynonyms</option>,
2123 <option>-XRankNTypes</option>,
2124 <option>-XExistentialQuantification</option>
2125 </para></listitem>
2126 </varlistentry>
2127
2128 <varlistentry>
2129 <term>
2130 <literal>mdo</literal>
2131 <indexterm><primary><literal>mdo</literal></primary></indexterm>
2132 </term>
2133 <listitem><para>
2134 Stolen by: <option>-XRecursiveDo</option>
2135 </para></listitem>
2136 </varlistentry>
2137
2138 <varlistentry>
2139 <term>
2140 <literal>foreign</literal>
2141 <indexterm><primary><literal>foreign</literal></primary></indexterm>
2142 </term>
2143 <listitem><para>
2144 Stolen by: <option>-XForeignFunctionInterface</option>
2145 </para></listitem>
2146 </varlistentry>
2147
2148 <varlistentry>
2149 <term>
2150 <literal>rec</literal>,
2151 <literal>proc</literal>, <literal>-&lt;</literal>,
2152 <literal>&gt;-</literal>, <literal>-&lt;&lt;</literal>,
2153 <literal>&gt;&gt;-</literal>, and <literal>(|</literal>,
2154 <literal>|)</literal> brackets
2155 <indexterm><primary><literal>proc</literal></primary></indexterm>
2156 </term>
2157 <listitem><para>
2158 Stolen by: <option>-XArrows</option>
2159 </para></listitem>
2160 </varlistentry>
2161
2162 <varlistentry>
2163 <term>
2164 <literal>?<replaceable>varid</replaceable></literal>,
2165 <literal>%<replaceable>varid</replaceable></literal>
2166 <indexterm><primary>implicit parameters</primary></indexterm>
2167 </term>
2168 <listitem><para>
2169 Stolen by: <option>-XImplicitParams</option>
2170 </para></listitem>
2171 </varlistentry>
2172
2173 <varlistentry>
2174 <term>
2175 <literal>[|</literal>,
2176 <literal>[e|</literal>, <literal>[p|</literal>,
2177 <literal>[d|</literal>, <literal>[t|</literal>,
2178 <literal>$(</literal>,
2179 <literal>$<replaceable>varid</replaceable></literal>
2180 <indexterm><primary>Template Haskell</primary></indexterm>
2181 </term>
2182 <listitem><para>
2183 Stolen by: <option>-XTemplateHaskell</option>
2184 </para></listitem>
2185 </varlistentry>
2186
2187 <varlistentry>
2188 <term>
2189 <literal>[<replaceable>varid</replaceable>|</literal>
2190 <indexterm><primary>quasi-quotation</primary></indexterm>
2191 </term>
2192 <listitem><para>
2193 Stolen by: <option>-XQuasiQuotes</option>
2194 </para></listitem>
2195 </varlistentry>
2196
2197 <varlistentry>
2198 <term>
2199 <replaceable>varid</replaceable>{<literal>&num;</literal>},
2200 <replaceable>char</replaceable><literal>&num;</literal>,
2201 <replaceable>string</replaceable><literal>&num;</literal>,
2202 <replaceable>integer</replaceable><literal>&num;</literal>,
2203 <replaceable>float</replaceable><literal>&num;</literal>,
2204 <replaceable>float</replaceable><literal>&num;&num;</literal>,
2205 <literal>(&num;</literal>, <literal>&num;)</literal>
2206 </term>
2207 <listitem><para>
2208 Stolen by: <option>-XMagicHash</option>
2209 </para></listitem>
2210 </varlistentry>
2211 </variablelist>
2212 </para>
2213 </sect2>
2214 </sect1>
2215
2216
2217 <!-- TYPE SYSTEM EXTENSIONS -->
2218 <sect1 id="data-type-extensions">
2219 <title>Extensions to data types and type synonyms</title>
2220
2221 <sect2 id="nullary-types">
2222 <title>Data types with no constructors</title>
2223
2224 <para>With the <option>-XEmptyDataDecls</option> flag (or equivalent LANGUAGE pragma),
2225 GHC lets you declare a data type with no constructors. For example:</para>
2226
2227 <programlisting>
2228 data S -- S :: *
2229 data T a -- T :: * -> *
2230 </programlisting>
2231
2232 <para>Syntactically, the declaration lacks the "= constrs" part. The
2233 type can be parameterised over types of any kind, but if the kind is
2234 not <literal>*</literal> then an explicit kind annotation must be used
2235 (see <xref linkend="kinding"/>).</para>
2236
2237 <para>Such data types have only one value, namely bottom.
2238 Nevertheless, they can be useful when defining "phantom types".</para>
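<para>To illustrate the phantom-type use mentioned above, here is a small
sketch. The types <literal>Metres</literal>, <literal>Feet</literal> and
<literal>Distance</literal> are invented for the example.</para>
<programlisting>
{-# LANGUAGE EmptyDataDecls #-}
module Main where

-- Constructor-less types used only as compile-time tags.
data Metres
data Feet

-- The unit parameter is a phantom: it does not appear on the right-hand side.
newtype Distance unit = Distance Double deriving Show

-- Only distances carrying the same unit tag can be added.
addD :: Distance u -> Distance u -> Distance u
addD (Distance a) (Distance b) = Distance (a + b)

main :: IO ()
main = print (addD (Distance 1.5 :: Distance Metres) (Distance 2))
</programlisting>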
2239 </sect2>
2240
2241 <sect2 id="datatype-contexts">
2242 <title>Data type contexts</title>
2243
2244 <para>Haskell allows datatypes to be given contexts, e.g.</para>
2245
2246 <programlisting>
2247 data Eq a => Set a = NilSet | ConsSet a (Set a)
2248 </programlisting>
2249
2250 <para>This gives the constructors the following types:</para>
2251
2252 <programlisting>
2253 NilSet :: Set a
2254 ConsSet :: Eq a => a -> Set a -> Set a
2255 </programlisting>
2256
2257 <para>This is widely considered a misfeature, and is going to be removed from
2258 the language. In GHC, it is controlled by the deprecated extension
2259 <literal>DatatypeContexts</literal>.</para>
2260 </sect2>
2261
2262 <sect2 id="infix-tycons">
2263 <title>Infix type constructors, classes, and type variables</title>
2264
2265 <para>
2266 GHC allows type constructors, classes, and type variables to be operators, and
2267 to be written infix, very much like expressions. More specifically:
2268 <itemizedlist>
2269 <listitem><para>
2270 A type constructor or class can be an operator, beginning with a colon; e.g. <literal>:*:</literal>.
2271 The lexical syntax is the same as that for data constructors.
2272 </para></listitem>
2273 <listitem><para>
2274 Data type and type-synonym declarations can be written infix, parenthesised
2275 if you want further arguments. E.g.
2276 <screen>
2277 data a :*: b = Foo a b
2278 type a :+: b = Either a b
2279 class a :=: b where ...
2280
2281 data (a :**: b) x = Baz a b x
2282 type (a :++: b) y = Either (a,b) y
2283 </screen>
2284 </para></listitem>
2285 <listitem><para>
2286 Types, and class constraints, can be written infix. For example
2287 <screen>
2288 x :: Int :*: Bool
2289 f :: (a :=: b) => a -> b
2290 </screen>
2291 </para></listitem>
2292 <listitem><para>
2293 A type variable can be an (unqualified) operator e.g. <literal>+</literal>.
2294 The lexical syntax is the same as that for variable operators, excluding "(.)",
2295 "(!)", and "(*)". In a binding position, the operator must be
2296 parenthesised. For example:
2297 <programlisting>
2298 type T (+) = Int + Int
2299 f :: T Either
2300 f = Left 3
2301
2302 liftA2 :: Arrow (~>)
2303 => (a -> b -> c) -> (e ~> a) -> (e ~> b) -> (e ~> c)
2304 liftA2 = ...
2305 </programlisting>
2306 </para></listitem>
2307 <listitem><para>
2308 Back-quotes work
2309 as for expressions, both for type constructors and type variables; e.g. <literal>Int `Either` Bool</literal>, or
2310 <literal>Int `a` Bool</literal>. Similarly, parentheses work the same; e.g. <literal>(:*:) Int Bool</literal>.
2311 </para></listitem>
2312 <listitem><para>
2313 Fixities may be declared for type constructors, or classes, just as for data constructors. However,
2314 one cannot distinguish between the two in a fixity declaration; a fixity declaration
2315 sets the fixity for a data constructor and the corresponding type constructor. For example:
2316 <screen>
2317 infixl 7 `T`, :*:
2318 </screen>
2319 sets the fixity for both type constructor <literal>T</literal> and data constructor <literal>T</literal>,
2320 and similarly for <literal>:*:</literal>.
2322 </para></listitem>
2323 <listitem><para>
2324 Function arrow is <literal>infixr</literal> with fixity 0. (This might change; I'm not sure what it should be.)
2325 </para></listitem>
2326
2327 </itemizedlist>
2328 </para>
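<para>A compact sketch of infix type constructors and synonyms follows. It
assumes the <option>-XTypeOperators</option> flag; the names
<literal>swapP</literal> and <literal>pick</literal> are hypothetical.</para>
<programlisting>
{-# LANGUAGE TypeOperators #-}
module Main where

-- One fixity declaration covers both the type and the data constructor.
infixr 6 :*:
data a :*: b = a :*: b deriving Show

type a :+: b = Either a b

swapP :: (a :*: b) -> (b :*: a)
swapP (x :*: y) = y :*: x

pick :: (Int :+: Bool) -> String
pick (Left n)  = show n
pick (Right b) = show b

main :: IO ()
main = do
  print (swapP ((1 :: Int) :*: True))
  putStrLn (pick (Left 3))
</programlisting>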
2329 </sect2>
2330
2331 <sect2 id="type-synonyms">
2332 <title>Liberalised type synonyms</title>
2333
2334 <para>
2335 Type synonyms are like macros at the type level, but Haskell 98 imposes many rules
2336 on individual synonym declarations.
2337 With the <option>-XLiberalTypeSynonyms</option> extension,
2338 GHC does validity checking on types <emphasis>only after expanding type synonyms</emphasis>.
2339 That means that GHC can be very much more liberal about type synonyms than Haskell 98.
2340
2341 <itemizedlist>
2342 <listitem> <para>You can write a <literal>forall</literal> (including overloading)
2343 in a type synonym, thus:
2344 <programlisting>
2345 type Discard a = forall b. Show b => a -> b -> (a, String)
2346
2347 f :: Discard a
2348 f x y = (x, show y)
2349
2350 g :: Discard Int -> (Int,String) -- A rank-2 type
2351 g f = f 3 True
2352 </programlisting>
2353 </para>
2354 </listitem>
2355
2356 <listitem><para>
2357 If you also use <option>-XUnboxedTuples</option>,
2358 you can write an unboxed tuple in a type synonym:
2359 <programlisting>
2360 type Pr = (# Int, Int #)
2361
2362 h :: Int -> Pr
2363 h x = (# x, x #)
2364 </programlisting>
2365 </para></listitem>
2366
2367 <listitem><para>
2368 You can apply a type synonym to a forall type:
2369 <programlisting>
2370 type Foo a = a -> a -> Bool
2371
2372 f :: Foo (forall b. b->b)
2373 </programlisting>
2374 After expanding the synonym, <literal>f</literal> has the legal (in GHC) type:
2375 <programlisting>
2376 f :: (forall b. b->b) -> (forall b. b->b) -> Bool
2377 </programlisting>
2378 </para></listitem>
2379
2380 <listitem><para>
2381 You can apply a type synonym to a partially applied type synonym:
2382 <programlisting>
2383 type Generic i o = forall x. i x -> o x
2384 type Id x = x
2385
2386 foo :: Generic Id []
2387 </programlisting>
2388 After expanding the synonym, <literal>foo</literal> has the legal (in GHC) type:
2389 <programlisting>
2390 foo :: forall x. x -> [x]
2391 </programlisting>
2392 </para></listitem>
2393
2394 </itemizedlist>
2395 </para>
2396
2397 <para>
2398 GHC currently does kind checking before expanding synonyms (though even that
2399 could be changed).
2400 </para>
2401 <para>
2402 After expanding type synonyms, GHC does validity checking on types, looking for
2403 the following mal-formedness which isn't detected simply by kind checking:
2404 <itemizedlist>
2405 <listitem><para>
2406 Type constructor applied to a type involving for-alls.
2407 </para></listitem>
2408 <listitem><para>
2409 Unboxed tuple on left of an arrow.
2410 </para></listitem>
2411 <listitem><para>
2412 Partially-applied type synonym.
2413 </para></listitem>
2414 </itemizedlist>
2415 So, for example,
2416 this will be rejected:
2417 <programlisting>
2418 type Pr = (# Int, Int #)
2419
2420 h :: Pr -> Int
2421 h x = ...
2422 </programlisting>
2423 because GHC does not allow unboxed tuples on the left of a function arrow.
2424 </para>
2425 </sect2>
2426
2427
2428 <sect2 id="existential-quantification">
2429 <title>Existentially quantified data constructors
2430 </title>
2431
2432 <para>
2433 The idea of using existential quantification in data type declarations
2434 was suggested by Perry, and implemented in Hope+ (Nigel Perry, <emphasis>The Implementation
2435 of Practical Functional Programming Languages</emphasis>, PhD Thesis, University of
2436 London, 1991). It was later formalised by Laufer and Odersky
2437 (<emphasis>Polymorphic type inference and abstract data types</emphasis>,
2438 TOPLAS, 16(5), pp1411-1430, 1994).
2439 It's been in Lennart
2440 Augustsson's <command>hbc</command> Haskell compiler for several years, and
2441 proved very useful. Here's the idea. Consider the declaration:
2442 </para>
2443
2444 <para>
2445
2446 <programlisting>
2447 data Foo = forall a. MkFoo a (a -> Bool)
2448 | Nil
2449 </programlisting>
2450
2451 </para>
2452
2453 <para>
2454 The data type <literal>Foo</literal> has two constructors with types:
2455 </para>
2456
2457 <para>
2458
2459 <programlisting>
2460 MkFoo :: forall a. a -> (a -> Bool) -> Foo
2461 Nil :: Foo
2462 </programlisting>
2463
2464 </para>
2465
2466 <para>
2467 Notice that the type variable <literal>a</literal> in the type of <function>MkFoo</function>
2468 does not appear in the data type itself, which is plain <literal>Foo</literal>.
2469 For example, the following expression is fine:
2470 </para>
2471
2472 <para>
2473
2474 <programlisting>
2475 [MkFoo 3 even, MkFoo 'c' isUpper] :: [Foo]
2476 </programlisting>
2477
2478 </para>
2479
2480 <para>
2481 Here, <literal>(MkFoo 3 even)</literal> packages an integer with a function
2482 <function>even</function> that maps an integer to <literal>Bool</literal>; and <function>MkFoo 'c'
2483 isUpper</function> packages a character with a compatible function. These
2484 two things are each of type <literal>Foo</literal> and can be put in a list.
2485 </para>
2486
2487 <para>
2488 What can we do with a value of type <literal>Foo</literal>? In particular,
2489 what happens when we pattern-match on <function>MkFoo</function>?
2490 </para>
2491
2492 <para>
2493
2494 <programlisting>
2495 f (MkFoo val fn) = ???
2496 </programlisting>
2497
2498 </para>
2499
2500 <para>
2501 Since all we know about <literal>val</literal> and <function>fn</function> is that they
2502 are compatible, the only (useful) thing we can do with them is to
2503 apply <function>fn</function> to <literal>val</literal> to get a boolean. For example:
2504 </para>
2505
2506 <para>
2507
2508 <programlisting>
2509 f :: Foo -> Bool
2510 f (MkFoo val fn) = fn val
2511 </programlisting>
2512
2513 </para>
2514
2515 <para>
2516 What this allows us to do is to package heterogeneous values
2517 together with a bunch of functions that manipulate them, and then treat
2518 that collection of packages in a uniform manner. You can express
2519 quite a bit of object-oriented-like programming this way.
2520 </para>
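<para>Putting the pieces of this section together gives the following
self-contained sketch. The clause for <literal>Nil</literal> and the choice
of sample values are assumptions added to make the program total and
runnable.</para>
<programlisting>
{-# LANGUAGE ExistentialQuantification #-}
module Main where

import Data.Char (isUpper)

data Foo = forall a. MkFoo a (a -> Bool)
         | Nil

-- The only useful thing to do with the packaged pair is to apply fn to val.
f :: Foo -> Bool
f (MkFoo val fn) = fn val
f Nil            = False   -- assumed result for the empty case

main :: IO ()
main = print (map f [MkFoo (4 :: Int) even, MkFoo 'c' isUpper, Nil])
</programlisting>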
2521
2522 <sect3 id="existential">
2523 <title>Why existential?
2524 </title>
2525
2526 <para>
2527 What has this to do with <emphasis>existential</emphasis> quantification?
2528 Simply that <function>MkFoo</function> has the (nearly) isomorphic type
2529 </para>
2530
2531 <para>
2532
2533 <programlisting>
2534 MkFoo :: (exists a . (a, a -> Bool)) -> Foo
2535 </programlisting>
2536
2537 </para>
2538
2539 <para>
2540 But Haskell programmers can safely think of the ordinary
2541 <emphasis>universally</emphasis> quantified type given above, thereby avoiding
2542 adding a new existential quantification construct.
2543 </para>
2544
2545 </sect3>
2546
2547 <sect3 id="existential-with-context">
2548 <title>Existentials and type classes</title>
2549
2550 <para>
2551 An easy extension is to allow
2552 arbitrary contexts before the constructor. For example:
2553 </para>
2554
2555 <para>
2556
2557 <programlisting>
2558 data Baz = forall a. Eq a => Baz1 a a
2559          | forall b. Show b => Baz2 b (b -> b)
2560 </programlisting>
2561
2562 </para>
2563
2564 <para>
2565 The two constructors have the types you'd expect:
2566 </para>
2567
2568 <para>
2569
2570 <programlisting>
2571 Baz1 :: forall a. Eq a => a -> a -> Baz
2572 Baz2 :: forall b. Show b => b -> (b -> b) -> Baz
2573 </programlisting>
2574
2575 </para>
2576
2577 <para>
2578 But when pattern matching on <function>Baz1</function> the matched values can be compared
2579 for equality, and when pattern matching on <function>Baz2</function> the first matched
2580 value can be converted to a string (as well as applying the function to it).
2581 So this program is legal:
2582 </para>
2583
2584 <para>
2585
2586 <programlisting>
2587 f :: Baz -> String
2588 f (Baz1 p q) | p == q    = "Yes"
2589              | otherwise = "No"
2590 f (Baz2 v fn)            = show (fn v)
2591 </programlisting>
2592
2593 </para>
2594
2595 <para>
2596 Operationally, in a dictionary-passing implementation, the
2597 constructors <function>Baz1</function> and <function>Baz2</function> must store the
2598 dictionaries for <literal>Eq</literal> and <literal>Show</literal> respectively, and
2599 extract them on pattern matching.
2600 </para>
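<para>
A small runnable sketch may make the dictionary-passing reading concrete. It
reuses <literal>Baz</literal> and <literal>f</literal> from above; the sample
values in <literal>main</literal> are assumptions chosen for illustration.
<programlisting>
{-# LANGUAGE ExistentialQuantification #-}
module Main where

data Baz = forall a. Eq a => Baz1 a a
         | forall b. Show b => Baz2 b (b -> b)

f :: Baz -> String
f (Baz1 p q) | p == q    = "Yes"
             | otherwise = "No"
f (Baz2 v fn)            = show (fn v)

main :: IO ()
main = mapM_ (putStrLn . f)
  [ Baz1 'x' 'x'            -- stores the Eq Char dictionary
  , Baz1 True False         -- stores the Eq Bool dictionary
  , Baz2 (1 :: Int) (+ 1)   -- stores the Show Int dictionary
  ]
-- prints Yes, No and 2, one per line
</programlisting>
Note that <literal>f</literal> has no class constraint of its own: the
dictionaries it needs arrive with each constructor.
</para>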
2601
2602 </sect3>
2603
2604 <sect3 id="existential-records">
2605 <title>Record Constructors</title>
2606
2607 <para>
2608 GHC allows existentials to be used with record syntax as well. For example:
2609
2610 <programlisting>
2611 data Counter a = forall self. NewCounter
2612     { _this    :: self
2613     , _inc     :: self -> self
2614     , _display :: self -> IO ()
2615     , tag      :: a
2616     }
2617 </programlisting>
2618 Here <literal>tag</literal> is a public field, with a well-typed selector
2619 function <literal>tag :: Counter a -> a</literal>. The <literal>self</literal>
2620 type is hidden from the outside; any attempt to apply <literal>_this</literal>,
2621 <literal>_inc</literal> or <literal>_display</literal> as functions will raise a
2622 compile-time error. In other words, <emphasis>GHC defines a record selector function
2623 only for fields whose type does not mention the existentially-quantified variables</emphasis>.
2624 (This example used an underscore in the fields for which record selectors
2625 will not be defined, but that is only programming style; GHC ignores them.)
2626 </para>
2627
2628 <para>
2629 To make use of these hidden fields, we need to create some helper functions:
2630
2631 <programlisting>
2632 inc :: Counter a -> Counter a
2633 inc (NewCounter x i d t) = NewCounter
2634     { _this = i x, _inc = i, _display = d, tag = t }
2635
2636 display :: Counter a -> IO ()
2637 display NewCounter{ _this = x, _display = d } = d x
2638 </programlisting>
2639
2640 Now we can define counters with different underlying implementations:
2641
2642 <programlisting>
2643 counterA :: Counter String
2644 counterA = NewCounter
2645     { _this = 0, _inc = (1+), _display = print, tag = "A" }
2646
2647 counterB :: Counter String
2648 counterB = NewCounter
2649     { _this = "", _inc = ('#':), _display = putStrLn, tag = "B" }
2650
2651 main = do
2652   display (inc counterA)         -- prints "1"
2653   display (inc (inc counterB))   -- prints "##"
2654 </programlisting>
2655
2656 Record update syntax is supported for existentials (and GADTs):
2657 <programlisting>
2658 setTag :: Counter a -> a -> Counter a
2659 setTag obj t = obj{ tag = t }
2660 </programlisting>
2661 The rule for record update is this: <emphasis>
2662 the types of the updated fields may
2663 mention only the universally-quantified type variables
2664 of the data constructor. For GADTs, the field may mention only types
2665 that appear as a simple type-variable argument in the constructor's result
2666 type</emphasis>. For example:
2667 <programlisting>
2668 data T a b where { T1 { f1::a, f2::b, f3::(b,c) } :: T a b } -- c is existential
2669 upd1 t x = t { f1=x } -- OK: upd1 :: T a b -> a' -> T a' b
2670 upd2 t x = t { f3=x } -- BAD (f3's type mentions c, which is
2671 -- existentially quantified)
2672
2673 data G a b where { G1 { g1::a, g2::c } :: G a [c] }
2674 upd3 g x = g { g1=x } -- OK: upd3 :: G a b -> c -> G c b
2675 upd4 g x = g { g2=x } -- BAD (g2's type mentions c, which is not a simple
2676 -- type-variable argument in G1's result type)
2677 </programlisting>
2678 </para>
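<para>
The following sketch pulls the pieces together: the selector for
<literal>tag</literal>, a record update on <literal>tag</literal>, and access
to the hidden fields by pattern matching. The value <literal>counter</literal>
and its field contents are assumptions made for the example.
<programlisting>
{-# LANGUAGE ExistentialQuantification #-}
module Main where

data Counter a = forall self. NewCounter
    { _this    :: self
    , _inc     :: self -> self
    , _display :: self -> IO ()
    , tag      :: a
    }

-- Hypothetical counter whose hidden state is an Int.
counter :: Counter String
counter = NewCounter
    { _this = 0 :: Int, _inc = (+ 1), _display = print, tag = "A" }

setTag :: Counter a -> a -> Counter a
setTag obj t = obj { tag = t }

main :: IO ()
main = do
  putStrLn (tag counter)                 -- prints A
  putStrLn (tag (setTag counter "B"))    -- prints B
  case counter of                        -- hidden fields: pattern match only
    NewCounter { _this = x, _inc = i, _display = d } -> d (i x)   -- prints 1
</programlisting>
Using <literal>_this</literal>, <literal>_inc</literal> or
<literal>_display</literal> as a function instead of a pattern is what
triggers the compile-time error described above.
</para>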
2679
2680 </sect3>
2681
2682
2683 <sect3>
2684 <title>Restrictions</title>
2685
2686 <para>
2687 There are several restrictions on the ways in which existentially-quantified
2688 constructors can be used.
2689 </para>
2690
2691 <para>
2692
2693 <itemizedlist>
2694 <listitem>
2695
2696 <para>
2697 When pattern matching, each pattern match introduces a new,
2698 distinct, type for each existential type variable. These types cannot
2699 be unified with any other type, nor can they escape from the scope of
2700 the pattern match. For example, these fragments are incorrect:
2701
2702
2703 <programlisting>
2704 f1 (MkFoo a f) = a
2705 </programlisting>
2706
2707
2708 Here, the type bound by <function>MkFoo</function> "escapes", because <literal>a</literal>
2709 is the result of <function>f1</function>. One way to see why this is wrong is to
2710 ask what type <function>f1</function> has:
2711
2712
2713 <programlisting>
2714 f1 :: Foo -> a -- Weird!
2715 </programlisting>
2716
2717
2718 What is this "<literal>a</literal>" in the result type? Clearly we don't mean
2719 this:
2720
2721
2722 <programlisting>
2723 f1 :: forall a. Foo -> a -- Wrong!
2724 </programlisting>
2725
2726
2727 The original program is just plain wrong. Here's another sort of error:
2728
2729
2730 <programlisting>
2731 f2 (Baz1 a b) (Baz1 p q) = a==q
2732 </programlisting>
2733
2734
2735 It's ok to say <literal>a==b</literal> or <literal>p==q</literal>, but
2736 <literal>a==q</literal> is wrong because it equates the two distinct types arising
2737 from the two <function>Baz1</function> constructors.
2738
2739
2740 </para>
2741 </listitem>
2742 <listitem>
2743
2744 <para>
2745 You can't pattern-match on an existentially quantified
2746 constructor in a <literal>let</literal> or <literal>where</literal> group of
2747 bindings. So this is illegal:
2748
2749
2750 <programlisting>
2751 f3 x = a==b where { Baz1 a b = x }
2752 </programlisting>
2753
2754 Instead, use a <literal>case</literal> expression:
2755
2756 <programlisting>
2757 f3 x = case x of Baz1 a b -> a==b
2758 </programlisting>
2759
2760 In general, you can only pattern-match
2761 on an existentially-quantified constructor in a <literal>case</literal> expression or
2762 in the patterns of a function definition.
2763
2764 The reason for this restriction is really an implementation one.
2765 Type-checking binding groups is already a nightmare without
2766 existentials complicating the picture. Also an existential pattern
2767 binding at the top level of a module doesn't make sense, because it's
2768 not clear how to prevent the existentially-quantified type "escaping".
2769 So for now, there's a simple-to-state restriction. We'll see how
2770 annoying it is.
2771
2772 </para>
2773 </listitem>
2774 <listitem>
2775
2776 <para>
2777 You can't use existential quantification for <literal>newtype</literal>
2778 declarations. So this is illegal:
2779
2780
2781 <programlisting>
2782 newtype T = forall a. Ord a => MkT a
2783 </programlisting>
2784
2785
2786 Reason: a value of type <literal>T</literal> must be represented as a
2787 pair of a dictionary for <literal>Ord a</literal> and a value of type
2788 <literal>a</literal>. That contradicts the idea that
2789 <literal>newtype</literal> should have no concrete representation.
2790 You can get just the same efficiency and effect by using
2791 <literal>data</literal> instead of <literal>newtype</literal>. If
2792 there is no overloading involved, then there is more of a case for
2793 allowing an existentially-quantified <literal>newtype</literal>,
2794 because the <literal>data</literal> version does carry an
2795 implementation cost, but single-field existentially quantified
2796 constructors aren't much use. So the simple restriction (no
2797 existential stuff on <literal>newtype</literal>) stands, unless there
2798 are convincing reasons to change it.
2799
2800
2801 </para>
2802 </listitem>
2803 <listitem>
2804
2805 <para>
2806 You can't use <literal>deriving</literal> to define instances of a
2807 data type with existentially quantified data constructors.
2808
2809 Reason: in most cases it would not make sense. For example:
2810
2811 <programlisting>
2812 data T = forall a. MkT [a] deriving( Eq )
2813 </programlisting>
2814
2815 To derive <literal>Eq</literal> in the standard way we would need to have equality
2816 between the single component of two <function>MkT</function> constructors:
2817
2818 <programlisting>
2819 instance Eq T where
2820   (MkT a) == (MkT b) = ???
2821 </programlisting>
2822
2823 But <varname>a</varname> and <varname>b</varname> have distinct types, and so can't be compared.
2824 It's just about possible to imagine examples in which the derived instance
2825 would make sense, but it seems altogether simpler simply to prohibit such
2826 declarations. Define your own instances!
2827 </para>
2828 </listitem>
2829
2830 </itemizedlist>
2831
2832 </para>
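<para>
As a sketch of how one might work within these restrictions, the program below
uses a <literal>case</literal> expression in place of a pattern binding, and a
hand-written <literal>Eq</literal> instance in place of
<literal>deriving</literal>. Comparing list lengths is an arbitrary notion of
equality assumed purely for illustration.
<programlisting>
{-# LANGUAGE ExistentialQuantification #-}
module Main where

data Baz = forall a. Eq a => Baz1 a a

-- A case expression is accepted where a let/where pattern binding is not.
f3 :: Baz -> Bool
f3 x = case x of Baz1 a b -> a == b

data T = forall a. MkT [a]

-- Hand-written instance. The element types differ, so only properties that
-- do not depend on them (here, the length) can be compared.
instance Eq T where
  MkT xs == MkT ys = length xs == length ys

main :: IO ()
main = do
  print (f3 (Baz1 'a' 'a'))               -- True
  print (MkT "ab" == MkT [1 :: Int, 2])   -- True: both lists have length 2
</programlisting>
</para>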
2833
2834 </sect3>
2835 </sect2>
2836
2837 <!-- ====================== Generalised algebraic data types ======================= -->
2838
2839 <sect2 id="gadt-style">
2840 <title>Declaring data types with explicit constructor signatures</title>
2841
2842 <para>When the <literal>GADTSyntax</literal> extension is enabled,
2843 GHC allows you to declare an algebraic data type by
2844 giving the type signatures of constructors explicitly. For example:
2845 <programlisting>
2846 data Maybe a where
2847     Nothing :: Maybe a
2848     Just    :: a -> Maybe a
2849 </programlisting>
2850 The form is called a "GADT-style declaration"
2851 because Generalised Algebraic Data Types, described in <xref linkend="gadt"/>,
2852 can only be declared using this form.</para>
2853 <para>Notice that GADT-style syntax generalises existential types (<xref linkend="existential-quantification"/>).
2854 For example, these two declarations are equivalent:
2855 <programlisting>
2856 data Foo = forall a. MkFoo a (a -> Bool)
2857 data Foo' where { MKFoo :: a -> (a->Bool) -> Foo' }
2858 </programlisting>
2859 </para>
2860 <para>Any data type that can be declared in standard Haskell-98 syntax
2861 can also be declared using GADT-style syntax.
2862 The choice is largely stylistic, but GADT-style declarations differ in one important respect:
2863 they treat class constraints on the data constructors differently.
2864 Specifically, if the constructor is given a type-class context, that
2865 context is made available by pattern matching. For example:
2866 <programlisting>
2867 data Set a where
2868   MkSet :: Eq a => [a] -> Set a
2869
2870 makeSet :: Eq a => [a] -> Set a
2871 makeSet xs = MkSet (nub xs)
2872
2873 insert :: a -> Set a -> Set a
2874 insert a (MkSet as) | a `elem` as = MkSet as
2875                     | otherwise   = MkSet (a:as)
2876 </programlisting>
2877 A use of <literal>MkSet</literal> as a constructor (e.g. in the definition of <literal>makeSet</literal>)
2878 gives rise to a <literal>(Eq a)</literal>
2879 constraint, as you would expect. The new feature is that pattern-matching on <literal>MkSet</literal>
2880 (as in the definition of <literal>insert</literal>) makes <emphasis>available</emphasis> an <literal>(Eq a)</literal>
2881 context. In implementation terms, the <literal>MkSet</literal> constructor has a hidden field that stores
2882 the <literal>(Eq a)</literal> dictionary that is passed to <literal>MkSet</literal>; so
2883 when pattern-matching that dictionary becomes available for the right-hand side of the match.
2884 In the example, the equality dictionary is used to satisfy the equality constraint
2885 generated by the call to <literal>elem</literal>, so that the type of
2886 <literal>insert</literal> itself has no <literal>Eq</literal> constraint.
2887 </para>
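<para>
For reference, here is a self-contained version of the <literal>Set</literal>
example that can be compiled as written. The <literal>toList</literal> helper
is a hypothetical addition used only to inspect the result, and
<option>-XGADTs</option> is assumed in order to permit the constructor context.
<programlisting>
{-# LANGUAGE GADTs #-}
module Main where

import Data.List (nub)

data Set a where
  MkSet :: Eq a => [a] -> Set a

makeSet :: Eq a => [a] -> Set a
makeSet xs = MkSet (nub xs)

-- No Eq constraint here: the dictionary comes from matching on MkSet.
insert :: a -> Set a -> Set a
insert a (MkSet as) | a `elem` as = MkSet as
                    | otherwise   = MkSet (a:as)

-- Hypothetical helper for inspecting the contents.
toList :: Set a -> [a]
toList (MkSet xs) = xs

main :: IO ()
main = print (toList (insert 2 (insert 1 (makeSet [1, 1, 3 :: Int]))))
-- prints [2,1,3]
</programlisting>
</para>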
2888 <para>
2889 For example, one possible application is to reify dictionaries:
2890 <programlisting>
2891 data NumInst a where
2892   MkNumInst :: Num a => NumInst a
2893
2894 intInst :: NumInst Int
2895 intInst = MkNumInst
2896
2897 plus :: NumInst a -> a -> a -> a
2898 plus MkNumInst p q = p + q
2899 </programlisting>
2900 Here, a value of type <literal>NumInst a</literal> is equivalent
2901 to an explicit <literal>(Num a)</literal> dictionary.
2902 </para>
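<para>
A brief sketch of how such a reified dictionary might be passed around
explicitly follows. The function <literal>sumWith</literal> is a hypothetical
name introduced for this example.
<programlisting>
{-# LANGUAGE GADTs #-}
module Main where

data NumInst a where
  MkNumInst :: Num a => NumInst a

intInst :: NumInst Int
intInst = MkNumInst

-- Hypothetical helper: sums a list using only the dictionary carried by
-- its first argument, so the signature needs no Num constraint.
sumWith :: NumInst a -> [a] -> a
sumWith MkNumInst xs = foldr (+) 0 xs

main :: IO ()
main = do
  print (sumWith intInst [1, 2, 3])                          -- 6
  print (sumWith (MkNumInst :: NumInst Double) [0.5, 1.5])   -- 2.0
</programlisting>
</para>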
2903 <para>
2904 All this applies to constructors declared using the syntax of <xref linkend="existential-with-context"/>.
2905 For example, the <literal>NumInst</literal> data type above could equivalently be declared
2906 like this:
2907 <programlisting>
2908 data NumInst a
2909   = Num a => MkNumInst
2910 </programlisting>
2911 Notice that, unlike the situation when declaring an existential, there is
2912 no <literal>forall</literal>, because the <literal>Num</literal> constrains the
2913 data type's universally quantified type variable <literal>a</literal>.
2914 A constructor may have both universal and existential type variables: for example,
2915 the following two declarations are equivalent:
2916 <programlisting>
2917 data T1 a
2918   = forall b. (Num a, Eq b) => MkT1 a b
2919 data T2 a where
2920   MkT2 :: (Num a, Eq b) => a -> b -> T2 a
2921 </programlisting>
2922 </para>
2923 <para>All this behaviour contrasts with Haskell 98's peculiar treatment of
2924 contexts on a data type declaration (Section 4.2.1 of the Haskell 98 Report).
2925 In Haskell 98 the definition
2926 <programlisting>
2927 data Eq a => Set' a = MkSet' [a]
2928 </programlisting>
2929 gives <literal>MkSet'</literal> the same type as <literal>MkSet</literal> above. But instead of
2930 <emphasis>making available</emphasis> an <literal>(Eq a)</literal> constraint, pattern-matching
2931 on <literal>MkSet'</literal> <emphasis>requires</emphasis> an <literal>(Eq a)</literal> constraint!
2932 GHC faithfully implements this behaviour, odd though it is. But for GADT-style declarations,
2933 GHC's behaviour is much more useful, as well as much more intuitive.
2934 </para>
2935
2936 <para>
2937 The rest of this section gives further details about GADT-style data
2938 type declarations.
2939
2940 <itemizedlist>
2941 <listitem><para>
2942 The result type of each data constructor must begin with the type constructor being defined.
2943 If the result type of all constructors
2944 has the form <literal>T a1 ... an</literal>, where <literal>a1 ... an</literal>
2945 are distinct type variables, then the data type is <emphasis>ordinary</emphasis>;
2946 otherwise it is a <emphasis>generalised</emphasis> data type (<xref linkend="gadt"/>).
2947 </para></listitem>
2948
2949 <listitem><para>
2950 As with other type signatures, you can give a single signature for several data constructors.
2951 In this example we give a single signature for <literal>T1</literal> and <literal>T2</literal>:
2952 <programlisting>
2953 data T a where
2954   T1,T2 :: a -> T a
2955   T3 :: T a
2956 </programlisting>
2957 </para></listitem>
2958
2959 <listitem><para>
2960 The type signature of
2961 each constructor is independent, and is implicitly universally quantified as usual.
2962 In particular, the type variable(s) in the "<literal>data T a where</literal>" header
2963 have no scope, and different constructors may have different universally-quantified type variables:
2964 <programlisting>
2965 data T a where -- The 'a' has no scope
2966   T1,T2 :: b -> T b    -- Means forall b. b -> T b
2967   T3 :: T a            -- Means forall a. T a
2968 </programlisting>
2969 </para></listitem>
2970
2971 <listitem><para>
2972 A constructor signature may mention type class constraints, which can differ for
2973 different constructors. For example, this is fine:
2974 <programlisting>
2975 data T a where
2976   T1 :: Eq b => b -> b -> T b
2977   T2 :: (Show c, Ix c) => c -> [c] -> T c
2978 </programlisting>
2979 When pattern matching, these constraints are made available to discharge constraints
2980 in the body of the match. For example:
2981 <programlisting>
2982 f :: T a -> String
2983 f (T1 x y) | x==y      = "yes"
2984            | otherwise = "no"
2985 f (T2 a b)             = show a
2986 </programlisting>
2987 Note that <literal>f</literal> is not overloaded; the <literal>Eq</literal> constraint arising
2988 from the use of <literal>==</literal> is discharged by the pattern match on <literal>T1</literal>
2989 and similarly the <literal>Show</literal> constraint arising from the use of <literal>show</literal>.
2990 </para></listitem>
2991
2992 <listitem><para>
2993 Unlike a Haskell-98-style
2994 data type declaration, the type variable(s) in the "<literal>data Set a where</literal>" header
2995 have no scope. Indeed, one can write a kind signature instead:
2996 <programlisting>
2997 data Set :: * -> * where ...
2998 </programlisting>
2999 or even a mixture of the two:
3000 <programlisting>
3001 data Bar a :: (* -> *) -> * where ...
3002 </programlisting>
3003 The type variables (if given) may be explicitly kinded, so we could also write the header for <literal>Bar</literal>
3004 like this:
3005 <programlisting>
3006 data Bar a (b :: * -> *) where ...
3007 </programlisting>
3008 </para></listitem>
3009
3010
3011 <listitem><para>
3012 You can use strictness annotations, in the obvious places
3013 in the constructor type:
3014 <programlisting>
3015 data Term a where
3016     Lit    :: !Int -> Term Int
3017     If     :: Term Bool -> !(Term a) -> !(Term a) -> Term a
3018     Pair   :: Term a -> Term b -> Term (a,b)
3019 </programlisting>
3020 </para></listitem>
3021
3022 <listitem><para>
3023 You can use a <literal>deriving</literal> clause on a GADT-style data type
3024 declaration. For example, these two declarations are equivalent:
3025 <programlisting>
3026 data Maybe1 a where {
3027     Nothing1 :: Maybe1 a ;
3028     Just1    :: a -> Maybe1 a
3029   } deriving( Eq, Ord )
3030
3031 data Maybe2 a = Nothing2 | Just2 a
3032      deriving( Eq, Ord )
3033 </programlisting>
3034 </para></listitem>
3035
3036 <listitem><para>
3037 The type signature may have quantified type variables that do not appear
3038 in the result type:
3039 <programlisting>
3040 data Foo where
3041     MkFoo :: a -> (a->Bool) -> Foo
3042     Nil   :: Foo
3043 </programlisting>
3044 Here the type variable <literal>a</literal> does not appear in the result type
3045 of either constructor.
3046 Although it is universally quantified in the type of the constructor, such
3047 a type variable is often called "existential".
3048 Indeed, the above declaration declares precisely the same type as
3049 the <literal>data Foo</literal> in <xref linkend="existential-quantification"/>.
3050 </para><para>
3051 The type may contain a class context too, of course:
3052 <programlisting>
3053 data Showable where
3054     MkShowable :: Show a => a -> Showable
3055 </programlisting>
3056 </para></listitem>
3057
3058 <listitem><para>
3059 You can use record syntax on a GADT-style data type declaration:
3060
3061 <programlisting>
3062 data Person where
3063     Adult :: { name :: String, children :: [Person] } -> Person
3064     Child :: Show a => { name :: !String, funny :: a } -> Person
3065 </programlisting>
3066 As usual, for every constructor that has a field <literal>f</literal>, the type of
3067 field <literal>f</literal> must be the same (modulo alpha conversion).
3068 The <literal>Child</literal> constructor above shows that the signature
3069 may have a context, existentially-quantified variables, and strictness annotations,
3070 just as in the non-record case. (NB: the "type" that follows the double-colon
3071 is not really a type, because of the record syntax and strictness annotations.
3072 A "type" of this form can appear only in a constructor signature.)
3073 </para></listitem>
3074
3075 <listitem><para>
3076 Record updates are allowed with GADT-style declarations,
3077 but only for fields that have the following property: the type of the field
3078 mentions no existential type variables.
3079 </para></listitem>
3080
3081 <listitem><para>
3082 As in the case of existentials declared using the Haskell-98-like record syntax
3083 (<xref linkend="existential-records"/>),
3084 record-selector functions are generated only for those fields that have well-typed
3085 selectors.
3086 Here is the example of that section, in GADT-style syntax:
3087 <programlisting>
3088 data Counter a where
3089     NewCounter { _this    :: self
3090                , _inc     :: self -> self
3091                , _display :: self -> IO ()
3092                , tag      :: a
3093                }
3094         :: Counter a
3095 </programlisting>
3096 As before, only one selector function is generated here, that for <literal>tag</literal>.
3097 Nevertheless, you can still use all the field names in pattern matching and record construction.
3098 </para></listitem>
3099
3100 <listitem><para>
3101 In a GADT-style data type declaration there is no obvious way to specify that a data constructor
3102 should be infix, which makes a difference if you derive <literal>Show</literal> for the type.
3103 (Data constructors declared infix are displayed infix by the derived <literal>show</literal>.)
3104 So GHC implements the following design: a data constructor declared in a GADT-style data type
3105 declaration is displayed infix by <literal>Show</literal> iff (a) it is an operator symbol,
3106 (b) it has two arguments, (c) it has a programmer-supplied fixity declaration. For example:
3107 <programlisting>
3108 infix 6 :--:
3109 data T a where
3110   (:--:) :: Int -> Bool -> T Int
3111 </programlisting>
3112 </para></listitem>
3113 </itemizedlist></para>
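<para>
Several of the points above can be seen together in a small sketch based on
the <literal>Person</literal> declaration. The function
<literal>describe</literal> and the sample values are assumptions introduced
for illustration, and <option>-XGADTs</option> is assumed so that the
<literal>Show</literal> context on <literal>Child</literal> is accepted.
<programlisting>
{-# LANGUAGE GADTs #-}
module Main where

data Person where
  Adult :: { name :: String, children :: [Person] } -> Person
  Child :: Show a => { name :: !String, funny :: a } -> Person

-- Hypothetical helper. Matching on Child makes its Show context available.
describe :: Person -> String
describe (Adult n cs) = n ++ " has " ++ show (length cs) ++ " child(ren)"
describe (Child n x)  = n ++ " carries " ++ show x

main :: IO ()
main = do
  let bob = Child { name = "Bob", funny = 42 :: Int }
      ann = Adult { name = "Ann", children = [bob] }
  putStrLn (name ann)                      -- 'name' selects from either constructor
  mapM_ (putStrLn . describe) [ann, bob]
</programlisting>
Here <literal>funny</literal> is used only in record construction; because its
type mentions the existential variable, it is not available as a selector
function.
</para>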
3114 </sect2>
3115
3116 <sect2 id="gadt">
3117 <title>Generalised Algebraic Data Types (GADTs)</title>
3118
3119 <para>Generalised Algebraic Data Types generalise ordinary algebraic data types
3120 by allowing constructors to have richer return types. Here is an example:
3121 <programlisting>
3122 data Term a where
3123     Lit    :: Int -> Term Int
3124     Succ   :: Term Int -> Term Int
3125     IsZero :: Term Int -> Term Bool
3126     If     :: Term Bool -> Term a -> Term a -> Term a
3127     Pair   :: Term a -> Term b -> Term (a,b)
3128 </programlisting>
3129 Notice that the return type of the constructors is not always <literal>Term a</literal>, as is the
3130 case with ordinary data types. This generality allows us to
3131 write a well-typed <literal>eval</literal> function
3132 for these <literal>Term</literal>s:
3133 <programlisting>
3134 eval :: Term a -> a
3135 eval (Lit i) = i
3136 eval (Succ t) = 1 + eval t
3137 eval (IsZero t) = eval t == 0
3138 eval (If b e1 e2) = if eval b then eval e1 else eval e2
3139 eval (Pair e1 e2) = (eval e1, eval e2)
3140 </programlisting>
3141 The key point about GADTs is that <emphasis>pattern matching causes type refinement</emphasis>.
3142 For example, in the right hand side of the equation
3143 <programlisting>
3144 eval :: Term a -> a
3145 eval (Lit i) = ...
3146 </programlisting>
3147 the type <literal>a</literal> is refined to <literal>Int</literal>. That's the whole point!
3148 A precise specification of the type rules is beyond what this user manual aspires to,
3149 but the design closely follows that described in
3150 the paper <ulink
3151 url="http://research.microsoft.com/%7Esimonpj/papers/gadt/">Simple
3152 unification-based type inference for GADTs</ulink>,
3153 (ICFP 2006).
3154 The general principle is this: <emphasis>type refinement is only carried out
3155 based on user-supplied type annotations</emphasis>.
3156 So if no type signature is supplied for <literal>eval</literal>, no type refinement happens,
3157 and lots of obscure error messages will
3158 occur. However, the refinement is quite general. For example, if we had:
3159 <programlisting>
3160 eval :: Term a -> a -> a
3161 eval (Lit i) j = i+j
3162 </programlisting>
3163 the pattern match causes the type <literal>a</literal> to be refined to <literal>Int</literal> (because of the type
3164 of the constructor <literal>Lit</literal>), and that refinement also applies to the type of <literal>j</literal>, and
3165 the result type of the function. Hence the addition <literal>i+j</literal> is legal.
3166 </para>
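<para>
Put together as a complete module, the <literal>Term</literal> evaluator might
look like the sketch below. The two sample terms in <literal>main</literal>
are assumptions chosen to exercise refinement to both <literal>Int</literal>
and a pair type.
<programlisting>
{-# LANGUAGE GADTs #-}
module Main where

data Term a where
  Lit    :: Int -> Term Int
  Succ   :: Term Int -> Term Int
  IsZero :: Term Int -> Term Bool
  If     :: Term Bool -> Term a -> Term a -> Term a
  Pair   :: Term a -> Term b -> Term (a,b)

-- The signature is required: refinement relies on user-supplied types.
eval :: Term a -> a
eval (Lit i)      = i
eval (Succ t)     = 1 + eval t
eval (IsZero t)   = eval t == 0
eval (If b e1 e2) = if eval b then eval e1 else eval e2
eval (Pair e1 e2) = (eval e1, eval e2)

main :: IO ()
main = do
  print (eval (If (IsZero (Lit 0)) (Succ (Lit 1)) (Lit 10)))   -- 2
  print (eval (Pair (Lit 3) (IsZero (Succ (Lit 0)))))          -- (3,False)
</programlisting>
</para>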
3167 <para>
3168 These and many other examples are given in papers by Hongwei Xi, and
3169 Tim Sheard. There is a longer introduction
3170 <ulink url="http://www.haskell.org/haskellwiki/GADT">on the wiki</ulink>,
3171 and Ralf Hinze's
3172 <ulink url="http://www.informatik.uni-bonn.de/~ralf/publications/With.pdf">Fun with phantom types</ulink> also has a number of examples. Note that papers
3173 may use different notation to that implemented in GHC.
3174 </para>
3175 <para>
3176 The rest of this section outlines the extensions to GHC that support GADTs. The extension is enabled with
3177 <option>-XGADTs</option>. The <option>-XGADTs</option> flag also sets <option>-XRelaxedPolyRec</option>.
3178 <itemizedlist>
3179 <listitem><para>
3180 A GADT can only be declared using GADT-style syntax (<xref linkend="gadt-style"/>);
3181 the old Haskell-98 syntax for data declarations always declares an ordinary data type.
3182 The result type of each constructor must begin with the type constructor being defined,
3183 but for a GADT the arguments to the type constructor can be arbitrary monotypes.
3184 For example, in the <literal>Term</literal> data
3185 type above, the type of each constructor must end with <literal>Term ty</literal>, but
3186 the <literal>ty</literal> need not be a type variable (e.g. the <literal>Lit</literal>
3187 constructor).
3188 </para></listitem>
3189
3190 <listitem><para>
3191 It is permitted to declare an ordinary algebraic data type using GADT-style syntax.
3192 What makes a GADT into a GADT is not the syntax, but rather the presence of data constructors
3193 whose result type is not just <literal>T a b</literal>.
3194 </para></listitem>
3195
3196 <listitem><para>
3197 You cannot use a <literal>deriving</literal> clause for a GADT; only for
3198 an ordinary data type.
3199 </para></listitem>
3200
3201 <listitem><para>
3202 As mentioned in <xref linkend="gadt-style"/>, record syntax is supported.
3203 For example:
3204 <programlisting>
3205 data Term a where
3206     Lit    { val  :: Int }      :: Term Int
3207     Succ   { num  :: Term Int } :: Term Int
3208     Pred   { num  :: Term Int } :: Term Int
3209     IsZero { arg  :: Term Int } :: Term Bool
3210     Pair   { arg1 :: Term a
3211            , arg2 :: Term b
3212            }                    :: Term (a,b)
3213     If     { cnd  :: Term Bool
3214            , tru  :: Term a
3215            , fls  :: Term a
3216            }                    :: Term a
3217 </programlisting>
3218 However, for GADTs there is the following additional constraint:
3219 every constructor that has a field <literal>f</literal> must have
3220 the same result type (modulo alpha conversion).
3221 Hence, in the above example, we cannot merge the <literal>num</literal>
3222 and <literal>arg</literal> fields above into a
3223 single name. Although their field types are both <literal>Term Int</literal>,
3224 their selector functions actually have different types:
3225
3226 <programlisting>
3227 num :: Term Int -> Term Int
3228 arg :: Term Bool -> Term Int
3229 </programlisting>
3230 </para></listitem>
3231
3232 <listitem><para>
3233 When pattern-matching against data constructors drawn from a GADT,
3234 for example in a <literal>case</literal> expression, the following rules apply:
3235 <itemizedlist>
3236 <listitem><para>The type of the scrutinee must be rigid.</para></listitem>
3237 <listitem><para>The type of the entire <literal>case</literal> expression must be rigid.</para></listitem>
3238 <listitem><para>The type of any free variable mentioned in any of
3239 the <literal>case</literal> alternatives must be rigid.</para></listitem>
3240 </itemizedlist>
3241 A type is "rigid" if it is completely known to the compiler at its binding site. The easiest
3242 way to ensure that a variable has a rigid type is to give it a type signature.
3243 For more precise details see <ulink url="http://research.microsoft.com/%7Esimonpj/papers/gadt">
3244 Simple unification-based type inference for GADTs
3245 </ulink>. The criteria implemented by GHC are given in the Appendix.
3246
3247 </para></listitem>
3248
3249 </itemizedlist>
3250 </para>
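<para>
The rigidity rules can be illustrated with a minimal sketch. The type
<literal>Rep</literal> and the function <literal>defaultOf</literal> are
hypothetical names introduced for this example.
<programlisting>
{-# LANGUAGE GADTs #-}
module Main where

data Rep a where
  RInt  :: Rep Int
  RBool :: Rep Bool

-- The signature makes both the scrutinee type (Rep a) and the result
-- type (a) rigid, so each alternative may refine 'a' differently.
defaultOf :: Rep a -> a
defaultOf r = case r of
  RInt  -> 0
  RBool -> False

main :: IO ()
main = do
  print (defaultOf RInt)    -- 0
  print (defaultOf RBool)   -- False
</programlisting>
If the signature for <literal>defaultOf</literal> is removed, the types
involved are no longer rigid and the definition is expected to be rejected.
</para>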
3251
3252 </sect2>
3253 </sect1>
3254
3255 <!-- ====================== End of Generalised algebraic data types ======================= -->
3256
3257 <sect1 id="deriving">
3258 <title>Extensions to the "deriving" mechanism</title>
3259
3260 <sect2 id="deriving-inferred">
3261 <title>Inferred context for deriving clauses</title>
3262
3263 <para>
3264 The Haskell Report is vague about exactly when a <literal>deriving</literal> clause is
3265 legal. For example:
3266 <programlisting>
3267 data T0 f a = MkT0 a deriving( Eq )
3268 data T1 f a = MkT1 (f a) deriving( Eq )
3269 data T2 f a = MkT2 (f (f a)) deriving( Eq )
3270 </programlisting>
3271 The natural generated <literal>Eq</literal> code would result in these instance declarations:
3272 <programlisting>
3273 instance Eq a => Eq (T0 f a) where ...
3274 instance Eq (f a) => Eq (T1 f a) where ...
3275 instance Eq (f (f a)) => Eq (T2 f a) where ...
3276 </programlisting>
3277 The first of these is obviously fine. The second is still fine, although less obviously.
3278 The third is not Haskell 98, and risks losing termination of instances.
3279 </para>
3280 <para>
3281 GHC takes a conservative position: it accepts the first two, but not the third. The rule is this:
3282 each constraint in the inferred instance context must consist only of type variables,
3283 with no repetitions.
3284 </para>
3285 <para>
3286 This rule is applied regardless of flags. If you want a more exotic context, you can write
3287 it yourself, using the <link linkend="stand-alone-deriving">standalone deriving mechanism</link>.
3288 </para>
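<para>
As a sketch of the stand-alone route for the third declaration above, one
might write the context by hand as follows. The set of language flags shown is
an assumption: <option>-XUndecidableInstances</option> is included because the
context is no smaller than the instance head, and
<option>-XFlexibleContexts</option> because the context is not a plain type
variable.
<programlisting>
{-# LANGUAGE StandaloneDeriving, FlexibleContexts, UndecidableInstances #-}
module Main where

data T2 f a = MkT2 (f (f a))

-- The context that a deriving clause would refuse to infer, written by hand.
deriving instance Eq (f (f a)) => Eq (T2 f a)

main :: IO ()
main = print (MkT2 [[1 :: Int]] == MkT2 [[1]])   -- True
</programlisting>
</para>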
3289 </sect2>
3290
3291 <sect2 id="stand-alone-deriving">
3292 <title>Stand-alone deriving declarations</title>
3293
3294 <para>
3295 GHC now allows stand-alone <literal>deriving</literal> declarations, enabled by <literal>-XStandaloneDeriving</literal>:
3296 <programlisting>
3297 data Foo a = Bar a | Baz String
3298
3299 deriving instance Eq a => Eq (Foo a)
3300 </programlisting>
3301 The syntax is identical to that of an ordinary instance declaration apart from (a) the keyword
3302 <literal>deriving</literal>, and (b) the absence of the <literal>where</literal> part.
3303 Note the following points:
3304 <itemizedlist>
3305 <listitem><para>
3306 You must supply an explicit context (in the example the context is <literal>(Eq a)</literal>),
3307 exactly as you would in an ordinary instance declaration.
3308 (In contrast, in a <literal>deriving</literal> clause
3309 attached to a data type declaration, the context is inferred.)
3310 </para></listitem>
3311
3312 <listitem><para>
3313 A <literal>deriving instance</literal> declaration
3314 must obey the same rules concerning form and termination as ordinary instance declarations,
3315 controlled by the same flags; see <xref linkend="instance-decls"/>.
3316 </para></listitem>
3317
3318 <listitem><para>
3319 Unlike a <literal>deriving</literal>
3320 declaration attached to a <literal>data</literal> declaration, the instance can be more specific
3321 than the data type (assuming you also use
3322 <literal>-XFlexibleInstances</literal>, <xref linkend="instance-rules"/>). Consider
3323 for example
3324 <programlisting>
3325 data Foo a = Bar a | Baz String
3326
3327 deriving instance Eq a => Eq (Foo [a])
3328 deriving instance Eq a => Eq (Foo (Maybe a))
3329 </programlisting>
3330 This will generate a derived instance for <literal>(Foo [a])</literal> and <literal>(Foo (Maybe a))</literal>,
3331 but other types such as <literal>(Foo (Int,Bool))</literal> will not be an instance of <literal>Eq</literal>.
3332 </para></listitem>
3333
3334 <listitem><para>
3335 Unlike a <literal>deriving</literal>
3336 declaration attached to a <literal>data</literal> declaration,
3337 GHC does not restrict the form of the data type. Instead, GHC simply generates the appropriate
3338 boilerplate code for the specified class, and typechecks it. If there is a type error, it is
3339 your problem. (GHC will show you the offending code if it has a type error.)
3340 The merit of this is that you can derive instances for GADTs and other exotic
3341 data types, providing only that the boilerplate code does indeed typecheck. For example:
3342 <programlisting>
3343 data T a where
3344   T1 :: T Int
3345   T2 :: T Bool
3346
3347 deriving instance Show (T a)
3348 </programlisting>
3349 In this example, you cannot say <literal>... deriving( Show )</literal> on the
3350 data type declaration for <literal>T</literal>,
3351 because <literal>T</literal> is a GADT, but you <emphasis>can</emphasis> generate
3352 the instance declaration using stand-alone deriving.
3353 </para>
3354 </listitem>
3355
3356 <listitem>
3357 <para>The stand-alone syntax is generalised for newtypes in exactly the same
3358 way that ordinary <literal>deriving</literal> clauses are generalised (<xref linkend="newtype-deriving"/>).
3359 For example:
3360 <programlisting>
3361 newtype Foo a = MkFoo (State Int a)
3362
3363 deriving instance MonadState Int Foo
3364 </programlisting>
3365 GHC always treats the <emphasis>last</emphasis> parameter of the instance
3366 (<literal>Foo</literal> in this example) as the type whose instance is being derived.
3367 </para></listitem>
3368 </itemizedlist></para>
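<para>
The points above can be combined into one small module. The following is only a sketch:
the <literal>main</literal> function and the particular comparisons are illustrative
assumptions, not part of the examples above.
<programlisting>
{-# LANGUAGE GADTs, StandaloneDeriving, FlexibleInstances #-}
module Main where

data Foo a = Bar a | Baz String

-- An instance more specific than the data type (needs -XFlexibleInstances).
deriving instance Eq a => Eq (Foo [a])

-- A GADT: the instance is requested separately from the declaration.
data T a where
  T1 :: T Int
  T2 :: T Bool

deriving instance Show (T a)

-- Illustrative driver (an assumption, not taken from the text above).
main :: IO ()
main = do
  print (Bar [1 :: Int] == Bar [1, 2])        -- False
  print (Baz "x" == (Baz "x" :: Foo [Int]))   -- True
  print T1                                    -- T1
</programlisting>
Comparing two values of type <literal>Foo (Int,Bool)</literal> in the same module would be
rejected, because no <literal>Eq</literal> instance was derived for that type.
</para>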
3369
3370 </sect2>
3371
3372
3373 <sect2 id="deriving-typeable">
3374 <title>Deriving clause for extra classes (<literal>Typeable</literal>, <literal>Data</literal>, etc)</title>
3375
3376 <para>
3377 Haskell 98 allows the programmer to add "<literal>deriving( Eq, Ord )</literal>" to a data type
3378 declaration, to generate a standard instance declaration for classes specified in the <literal>deriving</literal> clause.
3379 In Haskell 98, the only classes that may appear in the <literal>deriving</literal> clause are the standard
3380 classes <literal>Eq</literal>, <literal>Ord</literal>,
3381 <literal>Enum</literal>, <literal>Ix</literal>, <literal>Bounded</literal>, <literal>Read</literal>, and <literal>Show</literal>.
3382 </para>
3383 <para>
3384 GHC extends this list with several more classes that may be automatically derived:
3385 <itemizedlist>
3386 <listitem><para> With <option>-XDeriveDataTypeable</option>, you can derive instances of the classes
3387 <literal>Typeable</literal> and <literal>Data</literal>, defined in the library
3388 modules <literal>Data.Typeable</literal> and <literal>Data.Data</literal> respectively.
3389 </para>
3390 <para>Since GHC 7.8.1, <literal>Typeable</literal> is kind-polymorphic (see
3391 <xref linkend="kind-polymorphism"/>) and can be derived for any datatype and
3392 type class. Instances for datatypes can be derived by attaching a
3393 <literal>deriving Typeable</literal> clause to the datatype declaration, or by
3394 using standalone deriving (see <xref linkend="stand-alone-deriving"/>).
3395 Instances for type classes can only be derived using standalone deriving.
3396 See also <xref linkend="auto-derive-typeable"/>.
3397 </para>
3398 <para>
3399 Also since GHC 7.8.1, handwritten (i.e. not derived) instances of
3400 <literal>Typeable</literal> are forbidden, and will be ignored with a warning.
3401 </para>
3402 </listitem>
3403
3404 <listitem><para> With <option>-XDeriveGeneric</option>, you can derive
3405 instances of the classes <literal>Generic</literal> and
3406 <literal>Generic1</literal>, defined in <literal>GHC.Generics</literal>.
3407 You can use these to define generic functions,
3408 as described in <xref linkend="generic-programming"/>.
3409 </para></listitem>
3410
3411 <listitem><para> With <option>-XDeriveFunctor</option>, you can derive instances of
3412 the class <literal>Functor</literal>,
3413 defined in <literal>GHC.Base</literal>.
3414 </para></listitem>
3415
3416 <listitem><para> With <option>-XDeriveFoldable</option>, you can derive instances of
3417 the class <literal>Foldable</literal>,
3418 defined in <literal>Data.Foldable</literal>.
3419 </para></listitem>
3420
3421 <listitem><para> With <option>-XDeriveTraversable</option>, you can derive instances of
3422 the class <literal>Traversable</literal>,
3423 defined in <literal>Data.Traversable</literal>.
3424 </para></listitem>
3425 </itemizedlist>
3426 In each case the appropriate class must be in scope before it
3427 can be mentioned in the <literal>deriving</literal> clause.
3428 </para>
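<para>
As a minimal sketch of the last three flags, the module below derives
<literal>Functor</literal>, <literal>Foldable</literal> and <literal>Traversable</literal>
for a small tree type. The type <literal>Tree</literal> and the helpers
<literal>sample</literal> and <literal>positive</literal> are hypothetical names chosen for
illustration. The classes are imported explicitly so that they are in scope, as required above.
<programlisting>
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
module Main where

import Data.Foldable (Foldable, toList)
import Data.Traversable (Traversable, traverse)

-- Hypothetical example type.
data Tree a = Leaf | Node (Tree a) a (Tree a)
  deriving (Show, Functor, Foldable, Traversable)

sample :: Tree Int
sample = Node (Node Leaf 1 Leaf) 2 Leaf

main :: IO ()
main = do
  print (fmap (* 10) sample)         -- Node (Node Leaf 10 Leaf) 20 Leaf
  print (toList sample)              -- [1,2]
  print (traverse positive sample)   -- Just (Node (Node Leaf 1 Leaf) 2 Leaf)
  where
    positive x = if x > 0 then Just x else Nothing
</programlisting>
</para>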
3429 </sect2>
3430
3431 <sect2 id="auto-derive-typeable">
3432 <title>Automatically deriving <literal>Typeable</literal> instances</title>
3433
3434 <para>
3435 The flag <option>-XAutoDeriveTypeable</option> triggers the generation
3436 of derived <literal>Typeable</literal> instances for every datatype and type
3437 class declaration in the module in which it is used.
3438 </para>
3439
3440 </sect2>
3441
3442 <sect2 id="newtype-deriving">
3443 <title>Generalised derived instances for newtypes</title>
3444
3445 <para>
3446 When you define an abstract type using <literal>newtype</literal>, you may want
3447 the new type to inherit some instances from its representation. In
3448 Haskell 98, you can inherit instances of <literal>Eq</literal>, <literal>Ord</literal>,
3449 <literal>Enum</literal> and <literal>Bounded</literal> by deriving them, but for any
3450 other classes you have to write an explicit instance declaration. For
3451 example, if you define
3452
3453 <programlisting>
3454 newtype Dollars = Dollars Int
3455 </programlisting>
3456
3457 and you want to use arithmetic on <literal>Dollars</literal>, you have to
3458 explicitly define an instance of <literal>Num</literal>:
3459
3460 <programlisting>
3461 instance Num Dollars where
3462   Dollars a + Dollars b = Dollars (a+b)
3463   ...
3464 </programlisting>
3465 All the instance does is apply and remove the <literal>newtype</literal>
3466 constructor. It is particularly galling that, since the constructor
3467 doesn't appear at run-time, this instance declaration defines a
3468 dictionary which is <emphasis>wholly equivalent</emphasis> to the <literal>Int</literal>
3469 dictionary, only slower!
3470 </para>
3471
3472
3473 <sect3> <title> Generalising the deriving clause </title>
3474 <para>
3475 GHC now permits such instances to be derived instead,
3476 using the flag <option>-XGeneralizedNewtypeDeriving</option>,
3477 so one can write
3478 <programlisting>
3479 newtype Dollars = Dollars Int deriving (Eq,Show,Num)
3480 </programlisting>
3481
3482 and the implementation uses the <emphasis>same</emphasis> <literal>Num</literal> dictionary
3483 for <literal>Dollars</literal> as for <literal>Int</literal>. Notionally, the compiler
3484 derives an instance declaration of the form
3485
3486 <programlisting>
3487 instance Num Int => Num Dollars
3488 </programlisting>
3489
3490 which just adds or removes the <literal>newtype</literal> constructor according to the type.
3491 </para>
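<para>
A complete module using the <literal>Dollars</literal> declaration might look like the
following sketch; the <literal>main</literal> function is an illustrative assumption.
<programlisting>
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Main where

-- Eq and Num reuse the Int dictionaries; Show is derived in the usual way,
-- so the constructor name appears in the output.
newtype Dollars = Dollars Int
  deriving (Eq, Show, Num)

main :: IO ()
main = do
  print (Dollars 3 + Dollars 4)   -- Dollars 7
  print (Dollars 2 * 5 == 10)     -- True: the literals are read as Dollars
</programlisting>
</para>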
3492 <para>
3493
3494 We can also derive instances of constructor classes in a similar
3495 way. For example, suppose we have implemented state and failure monad
3496 transformers, such that
3497
3498 <programlisting>
3499 instance Monad m => Monad (State s m)
3500 instance Monad m => Monad (Failure m)
3501 </programlisting>
3502 In Haskell 98, we can define a parsing monad by
3503 <programlisting>
3504 type Parser tok m a = State [tok] (Failure m) a
3505 </programlisting>
3506
3507 which is automatically a monad thanks to the instance declarations
3508 above. With the extension, we can make the parser type abstract,
3509 without needing to write an instance of class <literal>Monad</literal>, via
3510
3511 <programlisting>
3512 newtype Parser tok m a = Parser (State [tok] (Failure m) a)
3513   deriving Monad
3514 </programlisting>
3515 In this case the derived instance declaration is of the form
3516 <programlisting>
3517 instance Monad (State [tok] (Failure m)) => Monad (Parser tok m)
3518 </programlisting>
3519
3520 Notice that, since <literal>Monad</literal> is a constructor class, the
3521 instance is a <emphasis>partial application</emphasis> of the new type, not the
3522 entire left hand side. We can imagine that the type declaration is
3523 "eta-converted" to generate the context of the instance
3524 declaration.
3525 </para>
3526 <para>
3527
3528 We can even derive instances of multi-parameter classes, provided the
3529 newtype is the last class parameter. In this case, a "partial
3530 application" of the class appears in the <literal>deriving</literal>
3531 clause. For example, given the class
3532
3533 <programlisting>
3534 class StateMonad s m | m -> s where ...
3535 instance Monad m => StateMonad s (State s m) where ...
3536 </programlisting>
3537 then we can derive an instance of <literal>StateMonad</literal> for <literal>Parser</literal>s by
3538 <programlisting>
3539 newtype Parser tok m a = Parser (State [tok] (Failure m) a)
3540   deriving (Monad, StateMonad [tok])
3541 </programlisting>
3542
3543 The derived instance is obtained by completing the application of the
3544 class to the new type:
3545
3546 <programlisting>
3547 instance StateMonad [tok] (State [tok] (Failure m)) =>
3548          StateMonad [tok] (Parser tok m)
3549 </programlisting>
3550 </para>
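<para>
The same idea can be tried without any monad transformers. The sketch below uses a
hypothetical two-parameter class <literal>Container</literal> and a hypothetical newtype
<literal>IntBag</literal>; neither name comes from the text above. The newtype is the last
class parameter, so the partial application <literal>Container Int</literal> can appear in
the <literal>deriving</literal> clause.
<programlisting>
{-# LANGUAGE GeneralizedNewtypeDeriving, MultiParamTypeClasses,
             FunctionalDependencies, FlexibleInstances #-}
module Main where

-- Hypothetical class: e is the element type, c the container type.
class Container e c | c -> e where
  empty  :: c
  insert :: e -> c -> c
  toL    :: c -> [e]

instance Container e [e] where
  empty  = []
  insert = (:)
  toL    = id

-- The derived instance is notionally
--   instance Container Int [Int] => Container Int IntBag
newtype IntBag = IntBag [Int]
  deriving (Container Int)

main :: IO ()
main = print (toL (insert 1 (insert 2 (empty :: IntBag))))   -- [1,2]
</programlisting>
</para>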
3551 <para>
3552
3553 As a result of this extension, all derived instances in newtype
3554 declarations are treated uniformly (and implemented just by reusing
3555 the dictionary for the representation type), <emphasis>except</emphasis>
3556 <literal>Show</literal> and <literal>Read</literal>, which really behave differently for
3557 the newtype and its representation.
3558 </para>
3559 </sect3>
3560
3561 <sect3> <title> A more precise specification </title>
3562 <para>
3563 Derived instance declarations are constructed as follows. Consider the
3564 declaration (after expansion of any type synonyms)
3565
3566 <programlisting>
3567 newtype T v1...vn = T' (t vk+1...vn) deriving (c1...cm)
3568 </programlisting>
3569
3570 where
3571 <itemizedlist>
3572 <listitem><para>
3573 The <literal>ci</literal> are partial applications of
3574 classes of the form <literal>C t1'...tj'</literal>, where the arity of <literal>C</literal>
3575 is exactly <literal>j+1</literal>. That is, <literal>C</literal> lacks exactly one type argument.
3576 </para></listitem>
3577 <listitem><para>
3578 The <literal>k</literal> is chosen so that <literal>ci (T v1...vk)</literal> is well-kinded.
3579 </para></listitem>
3580 <listitem><para>
3581 The type <literal>t</literal> is an arbitrary type.
3582 </para></listitem>
3583 <listitem><para>
3584 The type variables <literal>vk+1...vn</literal> do not occur in <literal>t</literal>,
3585 nor in the <literal>ci</literal>, and
3586 </para></listitem>
3587 <listitem><para>
3588 None of the <literal>ci</literal> is <literal>Read</literal>, <literal>Show</literal>,
3589 <literal>Typeable</literal>, or <literal>Data</literal>. These classes
3590 should not "look through" the type or its constructor. You can still
3591 derive these classes for a newtype, but it happens in the usual way, not
3592 via this new mechanism.
3593 </para></listitem>
3594 </itemizedlist>
3595 Then, for each <literal>ci</literal>, the derived instance
3596 declaration is:
3597 <programlisting>
3598 instance ci t => ci (T v1...vk)
3599 </programlisting>
3600 As an example which does <emphasis>not</emphasis> work, consider
3601 <programlisting>
3602 newtype NonMonad m s = NonMonad (State s m s) deriving Monad
3603 </programlisting>
3604 Here we cannot derive the instance
3605 <programlisting>
3606 instance Monad (State s m) => Monad (NonMonad m)
3607 </programlisting>
3608
3609 because the type variable <literal>s</literal> occurs in <literal>State s m</literal>,
3610 and so cannot be "eta-converted" away. It is a good thing that this
3611 <literal>deriving</literal> clause is rejected, because <literal>NonMonad m</literal> is
3612 not, in fact, a monad --- for the same reason. Try defining
3613 <literal>>>=</literal> with the correct type: you won't be able to.
3614 </para>
3615 <para>
3616
3617 Notice also that the <emphasis>order</emphasis> of class parameters becomes
3618 important, since we can only derive instances for the last one. If the
3619 <literal>StateMonad</literal> class above were instead defined as
3620
3621 <programlisting>
3622 class StateMonad m s | m -> s where ...
3623 </programlisting>
3624
3625 then we would not have been able to derive an instance for the
3626 <literal>Parser</literal> type above. We hypothesise that multi-parameter
3627 classes usually have one "main" parameter for which deriving new
3628 instances is most interesting.
3629 </para>
3630 <para>Lastly, all of this applies only for classes other than
3631 <literal>Read</literal>, <literal>Show</literal>, <literal>Typeable</literal>,
3632 and <literal>Data</literal>, for which the built-in derivation applies (section
3633 4.3.3. of the Haskell Report).
3634 (For the standard classes <literal>Eq</literal>, <literal>Ord</literal>,
3635 <literal>Ix</literal>, and <literal>Bounded</literal> it is immaterial whether
3636 the standard method is used or the one described here.)
3637 </para>
3638 </sect3>
3639 </sect2>
3640 </sect1>
3641
3642
3643 <!-- TYPE SYSTEM EXTENSIONS -->
3644 <sect1 id="type-class-extensions">
3645 <title>Class and instance declarations</title>
3646
3647 <sect2 id="multi-param-type-classes">
3648 <title>Class declarations</title>
3649
3650 <para>
3651 This section and the next one document GHC's type-class extensions.
3652 There's lots of background in the paper <ulink
3653 url="http://research.microsoft.com/~simonpj/Papers/type-class-design-space/">Type
3654 classes: exploring the design space</ulink> (Simon Peyton Jones, Mark
3655 Jones, Erik Meijer).
3656 </para>
3657
3658 <sect3>
3659 <title>Multi-parameter type classes</title>
3660 <para>
3661 Multi-parameter type classes are permitted, with flag <option>-XMultiParamTypeClasses</option>.
3662 For example:
3663
3664
3665 <programlisting>
3666 class Collection c a where
3667   union :: c a -> c a -> c a
3668   ...etc.
3669 </programlisting>
3670
3671 </para>
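<para>
To make the <literal>Collection</literal> class concrete, here is a minimal sketch with a
single instance for lists. The instance and its duplicate-removing definition of
<literal>union</literal> are assumptions made for illustration; the instance head mentions a
bare type variable, so <option>-XFlexibleInstances</option> is needed as well.
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
module Main where

class Collection c a where
  union :: c a -> c a -> c a

-- Illustrative instance: lists, keeping only the first copy of each element.
instance Eq a => Collection [] a where
  union xs ys = xs ++ filter (`notElem` xs) ys

main :: IO ()
main = print (union [1, 2, 3] [3, 4 :: Int])   -- [1,2,3,4]
</programlisting>
</para>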
3672 </sect3>
3673
3674 <sect3 id="superclass-rules">
3675 <title>The superclasses of a class declaration</title>
3676
3677 <para>
3678 In Haskell 98 the context of a class declaration (which introduces superclasses)
3679 must be simple; that is, each predicate must consist of a class applied to
3680 type variables. The flag <option>-XFlexibleContexts</option>
3681 (<xref linkend="flexible-contexts"/>)
3682 lifts this restriction,
3683 so that the only restriction on the context in a class declaration is
3684 that the class hierarchy must be acyclic. So these class declarations are OK:
3685
3686
3687 <programlisting>
3688 class Functor (m k) => FiniteMap m k where
3689   ...
3690
3691 class (Monad m, Monad (t m)) => Transform t m where
3692   lift :: m a -> (t m) a
3693 </programlisting>
3694
3695
3696 </para>
3697 <para>
3698 As in Haskell 98, the class hierarchy must be acyclic. However, the definition
3699 of "acyclic" involves only the superclass relationships. For example,
3700 this is OK:
3701
3702
3703 <programlisting>
3704 class C a where {
3705 op :: D b => a -> b -> b
3706 }
3707
3708 class C a => D a where { ... }
3709 </programlisting>
3710
3711
3712 Here, <literal>C</literal> is a superclass of <literal>D</literal>, but it's OK for a
3713 class operation <literal>op</literal> of <literal>C</literal> to mention <literal>D</literal>. (It
3714 would not be OK for <literal>D</literal> to be a superclass of <literal>C</literal>.)
3715 </para>
3716 <para>
3717 With the extension that adds a <link linkend="constraint-kind">kind of constraints</link>,
3718 you can write more exotic superclass definitions. The superclass cycle check is even more
3719 liberal in this case. For example, this is OK:
3720
3721 <programlisting>
3722 class A cls c where
3723   meth :: cls c => c -> c
3724
3725 class A B c => B c where
3726 </programlisting>
3727
3728 A superclass context for a class <literal>C</literal> is allowed if, after expanding
3729 type synonyms to their right-hand-sides, and uses of classes (other than <literal>C</literal>)
3730 to their superclasses, <literal>C</literal> does not occur syntactically in the context.
3731 </para>
3732 </sect3>
3733
3734
3735
3736
3737 <sect3 id="class-method-types">
3738 <title>Class method types</title>
3739
3740 <para>
3741 Haskell 98 prohibits class method types from mentioning constraints on the
3742 class type variable, thus:
3743 <programlisting>
3744 class Seq s a where
3745   fromList :: [a] -> s a
3746   elem     :: Eq a => a -> s a -> Bool
3747 </programlisting>
3748 The type of <literal>elem</literal> is illegal in Haskell 98, because it
3749 contains the constraint <literal>Eq a</literal>, which constrains only the
3750 class type variable (in this case <literal>a</literal>).
3751 GHC lifts this restriction (flag <option>-XConstrainedClassMethods</option>).
3752 </para>
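<para>
A runnable version of the <literal>Seq</literal> class could look like the sketch below.
The list instance and the <literal>main</literal> function are illustrative assumptions.
The Prelude's own <literal>elem</literal> is hidden to avoid a name clash with the class method.
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, ConstrainedClassMethods, FlexibleInstances #-}
module Main where

import Prelude hiding (elem)

class Seq s a where
  fromList :: [a] -> s a
  elem     :: Eq a => a -> s a -> Bool   -- constrains the class variable a

-- Illustrative instance for lists.
instance Seq [] a where
  fromList = id
  elem x   = any (== x)

main :: IO ()
main = print (elem 3 (fromList [1, 2, 3 :: Int] :: [Int]))   -- True
</programlisting>
</para>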
3753
3754
3755 </sect3>
3756
3757
3758 <sect3 id="class-default-signatures">
3759 <title>Default method signatures</title>
3760
3761 <para>
3762 Haskell 98 allows you to define a default implementation when declaring a class:
3763 <programlisting>
3764 class Enum a where
3765   enum :: [a]
3766   enum = []
3767 </programlisting>
3768 The type of the <literal>enum</literal> method is <literal>[a]</literal>, and
3769 this is also the type of the default method. You can lift this restriction
3770 and give another type to the default method using the flag
3771 <option>-XDefaultSignatures</option>. For instance, if you have written a
3772 generic implementation of enumeration in a class <literal>GEnum</literal>
3773 with method <literal>genum</literal> in terms of <literal>GHC.Generics</literal>,
3774 you can specify a default method that uses that generic implementation:
3775 <programlisting>
3776 class Enum a where
3777   enum :: [a]
3778   default enum :: (Generic a, GEnum (Rep a)) => [a]
3779   enum = map to genum
3780 </programlisting>
3781 We reuse the keyword <literal>default</literal> to signal that a signature
3782 applies to the default method only; when defining instances of the
3783 <literal>Enum</literal> class, the original type <literal>[a]</literal> of
3784 <literal>enum</literal> still applies. When giving an empty instance, however,
3785 the default implementation <literal>map to genum</literal> is filled in,
3786 and type-checked with the type
3787 <literal>(Generic a, GEnum (Rep a)) => [a]</literal>.
3788 </para>
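<para>
The mechanism does not depend on generic programming. As a smaller sketch, the hypothetical
class <literal>Describe</literal> below (not part of any library) gives its default method a
<literal>Show</literal> constraint that the method itself does not have.
<programlisting>
{-# LANGUAGE DefaultSignatures #-}
module Main where

-- Hypothetical class used only for illustration.
class Describe a where
  describe :: a -> String
  default describe :: Show a => a -> String
  describe = show

data Colour = Red | Green deriving Show

-- Empty instance: the default is filled in and checked against
-- the default signature, so Show Colour is required here.
instance Describe Colour

-- An explicit definition only has to match the original type a -> String.
instance Describe Bool where
  describe b = if b then "yes" else "no"

main :: IO ()
main = do
  putStrLn (describe Red)    -- Red
  putStrLn (describe True)   -- yes
</programlisting>
</para>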
3789
3790 <para>
3791 We use default signatures to simplify generic programming in GHC
3792 (<xref linkend="generic-programming"/>).
3793 </para>
3794
3795
3796 </sect3>
3797 </sect2>
3798
3799 <sect2 id="functional-dependencies">
3800 <title>Functional dependencies
3801 </title>
3802
3803 <para> Functional dependencies are implemented as described by Mark P. Jones
3804 in &ldquo;<ulink url="http://citeseer.ist.psu.edu/jones00type.html">Type Classes with Functional Dependencies</ulink>&rdquo;,
3805 in Proceedings of the 9th European Symposium on Programming,
3806 ESOP 2000, Berlin, Germany, March 2000,
3807 Springer-Verlag LNCS 1782.
3808 </para>
3809 <para>
3810 Functional dependencies are introduced by a vertical bar in the syntax of a
3811 class declaration; e.g.
3812 <programlisting>
3813 class (Monad m) => MonadState s m | m -> s where ...
3814
3815 class Foo a b c | a b -> c where ...
3816 </programlisting>
3817 There should be more documentation, but there isn't (yet). Yell if you need it.
3818 </para>
3819
3820 <sect3><title>Rules for functional dependencies </title>
3821 <para>
3822 In a class declaration, all of the class type variables must be reachable (in the sense
3823 mentioned in <xref linkend="flexible-contexts"/>)
3824 from the free variables of each method type.
3825 For example:
3826
3827 <programlisting>
3828 class Coll s a where
3829   empty  :: s
3830   insert :: s -> a -> s
3831 </programlisting>
3832
3833 is not OK, because the type of <literal>empty</literal> doesn't mention
3834 <literal>a</literal>. Functional dependencies can make the type variable
3835 reachable:
3836 <programlisting>
3837 class Coll s a | s -> a where
3838   empty  :: s
3839   insert :: s -> a -> s
3840 </programlisting>
3841
3842 Alternatively <literal>Coll</literal> might be rewritten
3843
3844 <programlisting>
3845 class Coll s a where
3846   empty  :: s a
3847   insert :: s a -> a -> s a
3848 </programlisting>
3849
3850
3851 which makes the connection between the type of a collection of
3852 <literal>a</literal>'s (namely <literal>(s a)</literal>) and the element type <literal>a</literal>.
3853 Occasionally this really doesn't work, in which case you can split the
3854 class like this:
3855
3856
3857 <programlisting>
3858 class CollE s where
3859   empty :: s
3860
3861 class CollE s => Coll s a where
3862   insert :: s -> a -> s
3863 </programlisting>
3864 </para>
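<para>
The split-class variant can be exercised as in the following sketch. The two list instances
and the <literal>main</literal> function are assumptions added for illustration.
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
module Main where

class CollE s where
  empty :: s

class CollE s => Coll s a where
  insert :: s -> a -> s

-- Illustrative instances: a list of a's is a collection of a's.
instance CollE [a] where
  empty = []

instance Coll [a] a where
  insert xs x = x : xs

main :: IO ()
main = print (insert (insert empty 'a') 'b' :: String)   -- "ba"
</programlisting>
Every method type now mentions all the type variables of its own class, so no functional
dependency is needed.
</para>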
3865 </sect3>
3866
3867
3868 <sect3>
3869 <title>Background on functional dependencies</title>
3870
3871 <para>The following description of the motivation and use of functional dependencies is taken
3872 from the Hugs user manual, reproduced here (with minor changes) by kind
3873 permission of Mark Jones.
3874 </para>
3875 <para>
3876 Consider the following class, intended as part of a
3877 library for collection types:
3878 <programlisting>
3879 class Collects e ce where
3880   empty  :: ce
3881   insert :: e -> ce -> ce
3882   member :: e -> ce -> Bool
3883 </programlisting>
3884 The type variable e used here represents the element type, while ce is the type
3885 of the container itself. Within this framework, we might want to define
3886 instances of this class for lists or characteristic functions (both of which
3887 can be used to represent collections of any equality type), bit sets (which can
3888 be used to represent collections of characters), or hash tables (which can be
3889 used to represent any collection whose elements have a hash function). Omitting
3890 standard implementation details, this would lead to the following declarations:
3891 <programlisting>
3892 instance Eq e => Collects e [e] where ...
3893 instance Eq e => Collects e (e -> Bool) where ...
3894 instance Collects Char BitSet where ...
3895 instance (Hashable e, Collects e ce)
3896            => Collects e (Array Int ce) where ...
3897 </programlisting>
3898 All this looks quite promising; we have a class and a range of interesting
3899 implementations. Unfortunately, there are some serious problems with the class
3900 declaration. First, the empty function has an ambiguous type:
3901 <programlisting>
3902 empty :: Collects e ce => ce
3903 </programlisting>
3904 By "ambiguous" we mean that there is a type variable e that appears on the left
3905 of the <literal>=&gt;</literal> symbol, but not on the right. The problem with
3906 this is that, according to the theoretical foundations of Haskell overloading,
3907 we cannot guarantee a well-defined semantics for any term with an ambiguous
3908 type.
3909 </para>
3910 <para>
3911 We can sidestep this specific problem by removing the empty member from the
3912 class declaration. However, although the remaining members, insert and member,
3913 do not have ambiguous types, we still run into problems when we try to use
3914 them. For example, consider the following two functions:
3915 <programlisting>
3916 f x y = insert x . insert y
3917 g = f True 'a'
3918 </programlisting>
3919 for which GHC infers the following types:
3920 <programlisting>
3921 f :: (Collects a c, Collects b c) => a -> b -> c -> c
3922 g :: (Collects Bool c, Collects Char c) => c -> c
3923 </programlisting>
3924 Notice that the type for f allows the two parameters x and y to be assigned
3925 different types, even though it attempts to insert each of the two values, one
3926 after the other, into the same collection. If we're trying to model collections
3927 that contain only one type of value, then this is clearly an inaccurate
3928 type. Worse still, the definition for g is accepted, without causing a type
3929 error. As a result, the error in this code will not be flagged at the point
3930 where it appears. Instead, it will show up only when we try to use g, which
3931 might even be in a different module.
3932 </para>
3933
3934 <sect4><title>An attempt to use constructor classes</title>
3935
3936 <para>
3937 Faced with the problems described above, some Haskell programmers might be
3938 tempted to use something like the following version of the class declaration:
3939 <programlisting>
3940 class Collects e c where
3941   empty  :: c e
3942   insert :: e -> c e -> c e
3943   member :: e -> c e -> Bool
3944 </programlisting>
3945 The key difference here is that we abstract over the type constructor c that is
3946 used to form the collection type c e, and not over that collection type itself,
3947 represented by ce in the original class declaration. This avoids the immediate
3948 problems that we mentioned above: empty has type <literal>Collects e c => c
3949 e</literal>, which is not ambiguous.
3950 </para>
3951 <para>
3952 The function f from the previous section has a more accurate type:
3953 <programlisting>
3954 f :: (Collects e c) => e -> e -> c e -> c e
3955 </programlisting>
3956 The function g from the previous section is now rejected with a type error as
3957 we would hope because the type of f does not allow the two arguments to have
3958 different types.
3959 This, then, is an example of a multiple parameter class that does actually work
3960 quite well in practice, without ambiguity problems.
3961 There is, however, a catch. This version of the Collects class is nowhere near
3962 as general as the original class seemed to be: only one of the four instances
3963 for <literal>Collects</literal>
3964 given above can be used with this version of Collects because only one of
3965 them---the instance for lists---has a collection type that can be written in
3966 the form c e, for some type constructor c, and element type e.
3967 </para>
3968 </sect4>
3969
3970 <sect4><title>Adding functional dependencies</title>
3971
3972 <para>
3973 To get a more useful version of the Collects class, Hugs provides a mechanism
3974 that allows programmers to specify dependencies between the parameters of a
3975 multiple parameter class. (For readers with an interest in theoretical
3976 foundations and previous work: the use of dependency information can be seen
3977 both as a generalization of the proposal for "parametric type classes" that was
3978 put forward by Chen, Hudak, and Odersky, and as a special case of Mark Jones's
3979 later framework for "improvement" of qualified types. The
3980 underlying ideas are also discussed in a more theoretical and abstract setting
3981 in a manuscript [implparam], where they are identified as one point in a
3982 general design space for systems of implicit parameterization.)
3983
3984 To start with an abstract example, consider a declaration such as:
3985 <programlisting>
3986 class C a b where ...
3987 </programlisting>
3988 which tells us simply that C can be thought of as a binary relation on types
3989 (or type constructors, depending on the kinds of a and b). Extra clauses can be
3990 included in the definition of classes to add information about dependencies
3991 between parameters, as in the following examples:
3992 <programlisting>
3993 class D a b | a -> b where ...
3994 class E a b | a -> b, b -> a where ...
3995 </programlisting>
3996 The notation <literal>a -&gt; b</literal> used here between the | and where
3997 symbols --- not to be
3998 confused with a function type --- indicates that the a parameter uniquely
3999 determines the b parameter, and might be read as "a determines b." Thus D is
4000 not just a relation, but actually a (partial) function. Similarly, from the two
4001 dependencies that are included in the definition of E, we can see that E
4002 represents a (partial) one-one mapping between types.
4003 </para>
4004 <para>
4005 More generally, dependencies take the form <literal>x1 ... xn -&gt; y1 ... ym</literal>,
4006 where x1, ..., xn, and y1, ..., ym are type variables with n&gt;0 and
4007 m&gt;=0, meaning that the y parameters are uniquely determined by the x
4008 parameters. Spaces can be used as separators if more than one variable appears
4009 on any single side of a dependency, as in <literal>t -&gt; a b</literal>. Note that a class may be
4010 annotated with multiple dependencies using commas as separators, as in the
4011 definition of E above. Some dependencies that we can write in this notation are
4012 redundant, and will be rejected because they don't serve any useful
4013 purpose, and may instead indicate an error in the program. Examples of
4014 dependencies like this include <literal>a -&gt; a </literal>,
4015 <literal>a -&gt; a a </literal>,
4016 <literal>a -&gt; </literal>, etc. There can also be
4017 some redundancy if multiple dependencies are given, as in
4018 <literal>a-&gt;b</literal>,
4019 <literal>b-&gt;c</literal>, <literal>a-&gt;c</literal>,
4020 in which some subset implies the remaining dependencies. Examples like this are
4021 not treated as errors. Note that dependencies appear only in class
4022 declarations, and not in any other part of the language. In particular, the
4023 syntax for instance declarations, class constraints, and types is completely
4024 unchanged.
4025 </para>
4026 <para>
4027 By including dependencies in a class declaration, we provide a mechanism for
4028 the programmer to specify each multiple parameter class more precisely. The
4029 compiler, on the other hand, is responsible for ensuring that the set of
4030 instances that are in scope at any given point in the program is consistent
4031 with any declared dependencies. For example, the following pair of instance
4032 declarations cannot appear together in the same scope because they violate the
4033 dependency for D, even though either one on its own would be acceptable:
4034 <programlisting>
4035 instance D Bool Int where ...
4036 instance D Bool Char where ...
4037 </programlisting>
4038 Note also that the following declaration is not allowed, even by itself:
4039 <programlisting>
4040 instance D [a] b where ...
4041 </programlisting>
4042 The problem here is that this instance would allow one particular choice of [a]
4043 to be associated with more than one choice for b, which contradicts the
4044 dependency specified in the definition of D. More generally, this means that,
4045 in any instance of the form:
4046 <programlisting>
4047 instance D t s where ...
4048 </programlisting>
4049 for some particular types t and s, the only variables that can appear in s are
4050 the ones that appear in t, and hence, if the type t is known, then s will be
4051 uniquely determined.
4052 </para>
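<para>
A small sketch shows the dependency at work. The method <literal>conv</literal> is a
hypothetical addition to the class <literal>D</literal>, which the text above leaves without
methods.
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
module Main where

class D a b | a -> b where
  conv :: a -> b          -- hypothetical method, not from the text

instance D Bool Int where
  conv b = if b then 1 else 0

-- instance D Bool Char where ...
-- Adding this second instance would be rejected: Bool already determines Int.

main :: IO ()
main = print (conv True + 1)   -- 2
</programlisting>
The result type of <literal>conv True</literal> is never written down. It is fixed to
<literal>Int</literal> by the dependency, which is why <literal>main</literal> needs no type
annotation.
</para>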
4053 <para>
4054 The benefit of including dependency information is that it allows us to define
4055 more general multiple parameter classes, without ambiguity problems, and with
4056 the benefit of more accurate types. To illustrate this, we return to the
4057 collection class example, and annotate the original definition of <literal>Collects</literal>
4058 with a simple dependency:
4059 <programlisting>
4060 class Collects e ce | ce -> e where
4061   empty  :: ce
4062   insert :: e -> ce -> ce
4063   member :: e -> ce -> Bool
4064 </programlisting>
4065 The dependency <literal>ce -&gt; e</literal> here specifies that the type e of elements is uniquely
4066 determined by the type of the collection ce. Note that both parameters of
4067 Collects are of kind *; there are no constructor classes here. Note too that
4068 all of the instances of Collects that we gave earlier can be used
4069 together with this new definition.
4070 </para>
4071 <para>
4072 What about the ambiguity problems that we encountered with the original
4073 definition? The empty function still has type Collects e ce => ce, but it is no
4074 longer necessary to regard that as an ambiguous type: although the variable e
4075 does not appear on the right of the => symbol, the dependency for class
4076 Collects tells us that it is uniquely determined by ce, which does appear on
4077 the right of the => symbol. Hence the context in which empty is used can still
4078 give enough information to determine types for both ce and e, without
4079 ambiguity. More generally, we need only regard a type as ambiguous if it
4080 contains a variable on the left of the => that is not uniquely determined
4081 (either directly or indirectly) by the variables on the right.
4082 </para>
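<para>
As a minimal sketch of this point, the module below pairs the annotated
<literal>Collects</literal> class with a list instance. The method bodies of
the instance and the name <literal>noChars</literal> are assumptions made for
illustration only.
<programlisting>
{-# LANGUAGE FunctionalDependencies, FlexibleInstances #-}
module Main where

class Collects e ce | ce -> e where
  empty  :: ce
  insert :: e -> ce -> ce
  member :: e -> ce -> Bool

-- Illustrative list instance; the method bodies are an assumption.
instance Eq e => Collects e [e] where
  empty  = []
  insert = (:)
  member = elem

-- Fixing ce to [Char] is enough: the dependency then determines e.
noChars :: [Char]
noChars = empty

main :: IO ()
main = do
  print (null noChars)
  print (member 'x' (insert 'x' noChars))
</programlisting>
Here the signature of <literal>noChars</literal> mentions only the collection
type, yet the use of <literal>empty</literal> is accepted because the element
type is determined by the dependency.
</para>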
4083 <para>
4084 Dependencies also help to produce more accurate types for user-defined
4085 functions, and hence to provide earlier detection of errors, and less cluttered
4086 types for programmers to work with. Recall the previous definition for a
4087 function f:
4088 <programlisting>
4089 f x y = insert x . insert y
4090 </programlisting>
4091 for which we originally obtained a type:
4092 <programlisting>
4093 f :: (Collects a c, Collects b c) => a -> b -> c -> c
4094 </programlisting>
4095 Given the dependency information that we have for Collects, however, we can
4096 deduce that a and b must be equal because they both appear as the first
4097 parameter in a Collects constraint with the same second parameter c. Hence we
4098 can infer a shorter and more accurate type for f:
4099 <programlisting>
4100 f :: (Collects a c) => a -> a -> c -> c
4101 </programlisting>
4102 In a similar way, the earlier definition of g will now be flagged as a type error.
4103 </para>
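<para>
The shorter type for f can be checked with a sketch along the following
lines. The list instance is again an illustrative assumption, and the
commented-out definition of g is assumed to match the earlier one, in which f
is applied to a Bool and a Char.
<programlisting>
{-# LANGUAGE FunctionalDependencies, FlexibleInstances #-}
module Main where

class Collects e ce | ce -> e where
  empty  :: ce
  insert :: e -> ce -> ce
  member :: e -> ce -> Bool

-- Illustrative list instance; the method bodies are an assumption.
instance Eq e => Collects e [e] where
  empty  = []
  insert = (:)
  member = elem

-- The shorter type: both element arguments share the single type a.
f :: Collects a c => a -> a -> c -> c
f x y = insert x . insert y

-- g = f True 'a'   -- assumed earlier definition; rejected, Bool is not Char

main :: IO ()
main = print (f 1 2 empty :: [Int])
</programlisting>
The annotation on the result fixes the collection type, and the dependency
then fixes the type of both numeric literals.
</para>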