Update user's guide for kind inference for closed type families.
1 <?xml version="1.0" encoding="iso-8859-1"?>
2 <para>
3 <indexterm><primary>language, GHC</primary></indexterm>
4 <indexterm><primary>extensions, GHC</primary></indexterm>
5 As with all known Haskell systems, GHC implements some extensions to
the language. They can all be enabled or disabled by command-line flags
7 or language pragmas. By default GHC understands the most recent Haskell
8 version it supports, plus a handful of extensions.
9 </para>
10
11 <para>
12 Some of the Glasgow extensions serve to give you access to the
13 underlying facilities with which we implement Haskell. Thus, you can
14 get at the Raw Iron, if you are willing to write some non-portable
15 code at a more primitive level. You need not be &ldquo;stuck&rdquo;
16 on performance because of the implementation costs of Haskell's
17 &ldquo;high-level&rdquo; features&mdash;you can always code
18 &ldquo;under&rdquo; them. In an extreme case, you can write all your
19 time-critical code in C, and then just glue it together with Haskell!
20 </para>
21
22 <para>
23 Before you get too carried away working at the lowest level (e.g.,
24 sloshing <literal>MutableByteArray&num;</literal>s around your
25 program), you may wish to check if there are libraries that provide a
26 &ldquo;Haskellised veneer&rdquo; over the features you want. The
27 separate <ulink url="../libraries/index.html">libraries
28 documentation</ulink> describes all the libraries that come with GHC.
29 </para>
30
31 <!-- LANGUAGE OPTIONS -->
32 <sect1 id="options-language">
33 <title>Language options</title>
34
35 <indexterm><primary>language</primary><secondary>option</secondary>
36 </indexterm>
37 <indexterm><primary>options</primary><secondary>language</secondary>
38 </indexterm>
39 <indexterm><primary>extensions</primary><secondary>options controlling</secondary>
40 </indexterm>
41
<para>The language option flags control which variations of the language are
permitted.</para>
44
45 <para>Language options can be controlled in two ways:
46 <itemizedlist>
<listitem><para>Every language option can be switched on by a command-line flag "<option>-X...</option>"
(e.g. <option>-XTemplateHaskell</option>), and switched off by the flag "<option>-XNo...</option>"
(e.g. <option>-XNoTemplateHaskell</option>).</para></listitem>
50 <listitem><para>
51 Language options recognised by Cabal can also be enabled using the <literal>LANGUAGE</literal> pragma,
52 thus <literal>{-# LANGUAGE TemplateHaskell #-}</literal> (see <xref linkend="language-pragma"/>). </para>
53 </listitem>
54 </itemizedlist></para>
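<para>As a minimal sketch of the pragma form, the module below switches on one
extension for itself only. The module contents and the choice of
<literal>BangPatterns</literal> are illustrative assumptions rather than
anything prescribed above; compiling the same file with
<option>-XBangPatterns</option> on the command line should have the same
effect.</para>

<programlisting>
{-# LANGUAGE BangPatterns #-}
module Main where

-- Sum the integers 1..n with a strict accumulator.
-- The bang pattern (!acc) relies on the extension named in the pragma.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go !acc i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)

main :: IO ()
main = print (sumTo 10)
</programlisting>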
55
56 <para>The flag <option>-fglasgow-exts</option>
57 <indexterm><primary><option>-fglasgow-exts</option></primary></indexterm>
58 is equivalent to enabling the following extensions:
59 &what_glasgow_exts_does;
60 Enabling these options is the <emphasis>only</emphasis>
61 effect of <option>-fglasgow-exts</option>.
62 We are trying to move away from this portmanteau flag,
63 and towards enabling features individually.</para>
64
65 </sect1>
66
67 <!-- UNBOXED TYPES AND PRIMITIVE OPERATIONS -->
68 <sect1 id="primitives">
69 <title>Unboxed types and primitive operations</title>
70
71 <para>GHC is built on a raft of primitive data types and operations;
72 "primitive" in the sense that they cannot be defined in Haskell itself.
73 While you really can use this stuff to write fast code,
74 we generally find it a lot less painful, and more satisfying in the
75 long run, to use higher-level language features and libraries. With
76 any luck, the code you write will be optimised to the efficient
77 unboxed version in any case. And if it isn't, we'd like to know
78 about it.</para>
79
80 <para>All these primitive data types and operations are exported by the
81 library <literal>GHC.Prim</literal>, for which there is
82 <ulink url="&libraryGhcPrimLocation;/GHC-Prim.html">detailed online documentation</ulink>.
83 (This documentation is generated from the file <filename>compiler/prelude/primops.txt.pp</filename>.)
84 </para>
85
86 <para>
87 If you want to mention any of the primitive data types or operations in your
88 program, you must first import <literal>GHC.Prim</literal> to bring them
89 into scope. Many of them have names ending in "&num;", and to mention such
90 names you need the <option>-XMagicHash</option> extension (<xref linkend="magic-hash"/>).
91 </para>
92
93 <para>The primops make extensive use of <link linkend="glasgow-unboxed">unboxed types</link>
94 and <link linkend="unboxed-tuples">unboxed tuples</link>, which
95 we briefly summarise here. </para>
96
97 <sect2 id="glasgow-unboxed">
98 <title>Unboxed types</title>
99
100 <para>
101 <indexterm><primary>Unboxed types (Glasgow extension)</primary></indexterm>
102 </para>
103
104 <para>Most types in GHC are <firstterm>boxed</firstterm>, which means
105 that values of that type are represented by a pointer to a heap
106 object. The representation of a Haskell <literal>Int</literal>, for
107 example, is a two-word heap object. An <firstterm>unboxed</firstterm>
type, however, is represented by the value itself; no pointers or heap
allocation are involved.
110 </para>
111
112 <para>
113 Unboxed types correspond to the &ldquo;raw machine&rdquo; types you
114 would use in C: <literal>Int&num;</literal> (long int),
115 <literal>Double&num;</literal> (double), <literal>Addr&num;</literal>
116 (void *), etc. The <emphasis>primitive operations</emphasis>
117 (PrimOps) on these types are what you might expect; e.g.,
118 <literal>(+&num;)</literal> is addition on
119 <literal>Int&num;</literal>s, and is the machine-addition that we all
120 know and love&mdash;usually one instruction.
121 </para>
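<para>The following sketch shows one way to reach the primitive addition from
ordinary code: unwrap two boxed <literal>Int</literal>s, add the unboxed
payloads, and box the result again. It assumes that
<literal>GHC.Exts</literal> exports the primitives together with the
<literal>I#</literal> constructor; the names <literal>addRaw</literal> and
<literal>plusInt</literal> are hypothetical.</para>

<programlisting>
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (Int (I#), Int#, (+#))

-- Primitive addition on raw machine integers.
addRaw :: Int# -> Int# -> Int#
addRaw a b = a +# b

-- Unwrap two boxed Ints, add the unboxed payloads, and box the result.
plusInt :: Int -> Int -> Int
plusInt (I# a) (I# b) = I# (addRaw a b)

main :: IO ()
main = print (plusInt 2 3)
</programlisting>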
122
123 <para>
124 Primitive (unboxed) types cannot be defined in Haskell, and are
125 therefore built into the language and compiler. Primitive types are
126 always unlifted; that is, a value of a primitive type cannot be
127 bottom. We use the convention (but it is only a convention)
128 that primitive types, values, and
129 operations have a <literal>&num;</literal> suffix (see <xref linkend="magic-hash"/>).
130 For some primitive types we have special syntax for literals, also
131 described in the <link linkend="magic-hash">same section</link>.
132 </para>
133
134 <para>
135 Primitive values are often represented by a simple bit-pattern, such
136 as <literal>Int&num;</literal>, <literal>Float&num;</literal>,
137 <literal>Double&num;</literal>. But this is not necessarily the case:
138 a primitive value might be represented by a pointer to a
139 heap-allocated object. Examples include
140 <literal>Array&num;</literal>, the type of primitive arrays. A
141 primitive array is heap-allocated because it is too big a value to fit
142 in a register, and would be too expensive to copy around; in a sense,
143 it is accidental that it is represented by a pointer. If a pointer
144 represents a primitive value, then it really does point to that value:
no unevaluated thunks, no indirections&hellip;nothing other than the
primitive value can be at the other end of the pointer.
147 A numerically-intensive program using unboxed types can
148 go a <emphasis>lot</emphasis> faster than its &ldquo;standard&rdquo;
149 counterpart&mdash;we saw a threefold speedup on one example.
150 </para>
151
152 <para>
153 There are some restrictions on the use of primitive types:
154 <itemizedlist>
155 <listitem><para>The main restriction
156 is that you can't pass a primitive value to a polymorphic
157 function or store one in a polymorphic data type. This rules out
158 things like <literal>[Int&num;]</literal> (i.e. lists of primitive
159 integers). The reason for this restriction is that polymorphic
160 arguments and constructor fields are assumed to be pointers: if an
161 unboxed integer is stored in one of these, the garbage collector would
162 attempt to follow it, leading to unpredictable space leaks. Or a
163 <function>seq</function> operation on the polymorphic component may
164 attempt to dereference the pointer, with disastrous results. Even
165 worse, the unboxed value might be larger than a pointer
166 (<literal>Double&num;</literal> for instance).
167 </para>
168 </listitem>
169 <listitem><para> You cannot define a newtype whose representation type
170 (the argument type of the data constructor) is an unboxed type. Thus,
171 this is illegal:
172 <programlisting>
173 newtype A = MkA Int#
174 </programlisting>
175 </para></listitem>
176 <listitem><para> You cannot bind a variable with an unboxed type
177 in a <emphasis>top-level</emphasis> binding.
178 </para></listitem>
179 <listitem><para> You cannot bind a variable with an unboxed type
180 in a <emphasis>recursive</emphasis> binding.
181 </para></listitem>
182 <listitem><para> You may bind unboxed variables in a (non-recursive,
183 non-top-level) pattern binding, but you must make any such pattern-match
184 strict. For example, rather than:
185 <programlisting>
186 data Foo = Foo Int Int#
187
188 f x = let (Foo a b, w) = ..rhs.. in ..body..
189 </programlisting>
190 you must write:
191 <programlisting>
192 data Foo = Foo Int Int#
193
194 f x = let !(Foo a b, w) = ..rhs.. in ..body..
195 </programlisting>
196 since <literal>b</literal> has type <literal>Int#</literal>.
197 </para>
198 </listitem>
199 </itemizedlist>
200 </para>
201
202 </sect2>
203
204 <sect2 id="unboxed-tuples">
205 <title>Unboxed tuples</title>
206
207 <para>
208 Unboxed tuples aren't really exported by <literal>GHC.Exts</literal>;
209 they are a syntactic extension enabled by the language flag <option>-XUnboxedTuples</option>. An
210 unboxed tuple looks like this:
211 </para>
212
213 <para>
214
215 <programlisting>
216 (# e_1, ..., e_n #)
217 </programlisting>
218
219 </para>
220
221 <para>
222 where <literal>e&lowbar;1..e&lowbar;n</literal> are expressions of any
223 type (primitive or non-primitive). The type of an unboxed tuple looks
224 the same.
225 </para>
226
227 <para>
228 Note that when unboxed tuples are enabled,
229 <literal>(#</literal> is a single lexeme, so for example when using
230 operators like <literal>#</literal> and <literal>#-</literal> you need
231 to write <literal>( # )</literal> and <literal>( #- )</literal> rather than
232 <literal>(#)</literal> and <literal>(#-)</literal>.
233 </para>
234
235 <para>
236 Unboxed tuples are used for functions that need to return multiple
237 values, but they avoid the heap allocation normally associated with
238 using fully-fledged tuples. When an unboxed tuple is returned, the
239 components are put directly into registers or on the stack; the
240 unboxed tuple itself does not have a composite representation. Many
241 of the primitive operations listed in <literal>primops.txt.pp</literal> return unboxed
242 tuples.
243 In particular, the <literal>IO</literal> and <literal>ST</literal> monads use unboxed
244 tuples to avoid unnecessary allocation during sequences of operations.
245 </para>
246
247 <para>
248 There are some restrictions on the use of unboxed tuples:
249 <itemizedlist>
250
251 <listitem>
252 <para>
253 Values of unboxed tuple types are subject to the same restrictions as
254 other unboxed types; i.e. they may not be stored in polymorphic data
255 structures or passed to polymorphic functions.
256 </para>
257 </listitem>
258
259 <listitem>
260 <para>
261 The typical use of unboxed tuples is simply to return multiple values,
262 binding those multiple results with a <literal>case</literal> expression, thus:
263 <programlisting>
264 f x y = (# x+1, y-1 #)
265 g x = case f x x of { (# a, b #) -&#62; a + b }
266 </programlisting>
267 You can have an unboxed tuple in a pattern binding, thus
268 <programlisting>
269 f x = let (# p,q #) = h x in ..body..
270 </programlisting>
271 If the types of <literal>p</literal> and <literal>q</literal> are not unboxed,
272 the resulting binding is lazy like any other Haskell pattern binding. The
273 above example desugars like this:
274 <programlisting>
f x = let t = case h x of { (# p,q #) -> (p,q) }
          p = fst t
          q = snd t
      in ..body..
279 </programlisting>
280 Indeed, the bindings can even be recursive.
281 </para>
282 </listitem>
283 </itemizedlist>
284
285 </para>
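<para>A small self-contained sketch of the typical use described above: a
function returns two results in an unboxed tuple and the caller takes them
apart with <literal>case</literal>. The function names
<literal>quotRemU</literal> and <literal>describe</literal> are made up for
this illustration.</para>

<programlisting>
{-# LANGUAGE UnboxedTuples #-}
module Main where

-- Return quotient and remainder together as an unboxed tuple.
quotRemU :: Int -> Int -> (# Int, Int #)
quotRemU n d = (# n `quot` d, n `rem` d #)

-- Consume the two results with a case expression.
describe :: Int -> Int -> String
describe n d = case quotRemU n d of
  (# q, r #) -> show q ++ " remainder " ++ show r

main :: IO ()
main = putStrLn (describe 17 5)
</programlisting>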
286
287 </sect2>
288 </sect1>
289
290
291 <!-- ====================== SYNTACTIC EXTENSIONS ======================= -->
292
293 <sect1 id="syntax-extns">
294 <title>Syntactic extensions</title>
295
296 <sect2 id="unicode-syntax">
297 <title>Unicode syntax</title>
298 <para>The language
299 extension <option>-XUnicodeSyntax</option><indexterm><primary><option>-XUnicodeSyntax</option></primary></indexterm>
300 enables Unicode characters to be used to stand for certain ASCII
301 character sequences. The following alternatives are provided:</para>
302
303 <informaltable>
<tgroup cols="4" align="left" colsep="1" rowsep="1">
305 <thead>
306 <row>
307 <entry>ASCII</entry>
308 <entry>Unicode alternative</entry>
309 <entry>Code point</entry>
310 <entry>Name</entry>
311 </row>
312 </thead>
313
314 <!--
315 to find the DocBook entities for these characters, find
316 the Unicode code point (e.g. 0x2237), and grep for it in
/usr/share/sgml/docbook/xml-dtd-*/ent/* (or equivalent on
your system). Some of these Unicode code points don't have
equivalent DocBook entities.
320 -->
321
322 <tbody>
323 <row>
324 <entry><literal>::</literal></entry>
325 <entry>::</entry> <!-- no special char, apparently -->
326 <entry>0x2237</entry>
327 <entry>PROPORTION</entry>
328 </row>
329 </tbody>
330 <tbody>
331 <row>
332 <entry><literal>=&gt;</literal></entry>
333 <entry>&rArr;</entry>
334 <entry>0x21D2</entry>
335 <entry>RIGHTWARDS DOUBLE ARROW</entry>
336 </row>
337 </tbody>
338 <tbody>
339 <row>
340 <entry><literal>forall</literal></entry>
341 <entry>&forall;</entry>
342 <entry>0x2200</entry>
343 <entry>FOR ALL</entry>
344 </row>
345 </tbody>
346 <tbody>
347 <row>
348 <entry><literal>-&gt;</literal></entry>
349 <entry>&rarr;</entry>
350 <entry>0x2192</entry>
351 <entry>RIGHTWARDS ARROW</entry>
352 </row>
353 </tbody>
354 <tbody>
355 <row>
356 <entry><literal>&lt;-</literal></entry>
357 <entry>&larr;</entry>
358 <entry>0x2190</entry>
359 <entry>LEFTWARDS ARROW</entry>
360 </row>
361 </tbody>
362
363 <tbody>
364 <row>
365 <entry>-&lt;</entry>
366 <entry>&larrtl;</entry>
367 <entry>0x2919</entry>
368 <entry>LEFTWARDS ARROW-TAIL</entry>
369 </row>
370 </tbody>
371
372 <tbody>
373 <row>
374 <entry>&gt;-</entry>
375 <entry>&rarrtl;</entry>
376 <entry>0x291A</entry>
377 <entry>RIGHTWARDS ARROW-TAIL</entry>
378 </row>
379 </tbody>
380
381 <tbody>
382 <row>
383 <entry>-&lt;&lt;</entry>
384 <entry></entry>
385 <entry>0x291B</entry>
386 <entry>LEFTWARDS DOUBLE ARROW-TAIL</entry>
387 </row>
388 </tbody>
389
390 <tbody>
391 <row>
392 <entry>&gt;&gt;-</entry>
393 <entry></entry>
394 <entry>0x291C</entry>
395 <entry>RIGHTWARDS DOUBLE ARROW-TAIL</entry>
396 </row>
397 </tbody>
398
399 <tbody>
400 <row>
401 <entry>*</entry>
402 <entry>&starf;</entry>
403 <entry>0x2605</entry>
404 <entry>BLACK STAR</entry>
405 </row>
406 </tbody>
407
408 </tgroup>
409 </informaltable>
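<para>As an illustration of the table, the sketch below uses the Unicode
alternatives for <literal>::</literal> and <literal>-&gt;</literal>. The
function <literal>swapPair</literal> is hypothetical. The two characters are
written here as numeric character references (0x2237 and 0x2192); in an actual
source file, which GHC reads as UTF-8, they would appear as the literal
characters.</para>

<programlisting>
{-# LANGUAGE UnicodeSyntax #-}
module Main where

-- 0x2237 stands for the double colon, 0x2192 for the function arrow.
swapPair &#x2237; (a, b) &#x2192; (b, a)
swapPair (x, y) = (y, x)

main &#x2237; IO ()
main = print (swapPair (1, "one"))
</programlisting>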
410 </sect2>
411
412 <sect2 id="magic-hash">
413 <title>The magic hash</title>
414 <para>The language extension <option>-XMagicHash</option> allows "&num;" as a
415 postfix modifier to identifiers. Thus, "x&num;" is a valid variable, and "T&num;" is
416 a valid type constructor or data constructor.</para>
417
418 <para>The hash sign does not change semantics at all. We tend to use variable
419 names ending in "&num;" for unboxed values or types (e.g. <literal>Int&num;</literal>),
420 but there is no requirement to do so; they are just plain ordinary variables.
421 Nor does the <option>-XMagicHash</option> extension bring anything into scope.
422 For example, to bring <literal>Int&num;</literal> into scope you must
423 import <literal>GHC.Prim</literal> (see <xref linkend="primitives"/>);
424 the <option>-XMagicHash</option> extension
425 then allows you to <emphasis>refer</emphasis> to the <literal>Int&num;</literal>
426 that is now in scope.</para>
<para> The <option>-XMagicHash</option> extension also enables some new forms of literals (see <xref linkend="glasgow-unboxed"/>):
428 <itemizedlist>
429 <listitem><para> <literal>'x'&num;</literal> has type <literal>Char&num;</literal></para> </listitem>
430 <listitem><para> <literal>&quot;foo&quot;&num;</literal> has type <literal>Addr&num;</literal></para> </listitem>
431 <listitem><para> <literal>3&num;</literal> has type <literal>Int&num;</literal>. In general,
432 any Haskell integer lexeme followed by a <literal>&num;</literal> is an <literal>Int&num;</literal> literal, e.g.
<literal>-0x3A&num;</literal> as well as <literal>32&num;</literal>.</para></listitem>
434 <listitem><para> <literal>3&num;&num;</literal> has type <literal>Word&num;</literal>. In general,
435 any non-negative Haskell integer lexeme followed by <literal>&num;&num;</literal>
436 is a <literal>Word&num;</literal>. </para> </listitem>
437 <listitem><para> <literal>3.2&num;</literal> has type <literal>Float&num;</literal>.</para> </listitem>
438 <listitem><para> <literal>3.2&num;&num;</literal> has type <literal>Double&num;</literal></para> </listitem>
439 </itemizedlist>
440 </para>
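<para>The literal forms can be tried out by boxing each one so that it can be
printed. This is only a sketch: it assumes the boxing constructors
<literal>C#</literal>, <literal>I#</literal>, <literal>W#</literal>,
<literal>F#</literal> and <literal>D#</literal> are imported from
<literal>GHC.Exts</literal>, and the name <literal>boxed</literal> is
hypothetical.</para>

<programlisting>
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (Char (C#), Double (D#), Float (F#), Int (I#), Word (W#))

-- Box one literal of each primitive kind so that it can be shown.
boxed :: (Char, Int, Word, Float, Double)
boxed = (C# 'x'#, I# 3#, W# 3##, F# 3.2#, D# 3.2##)

main :: IO ()
main = print boxed
</programlisting>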
441 </sect2>
442
443 <!-- ====================== HIERARCHICAL MODULES ======================= -->
444
445
446 <sect2 id="hierarchical-modules">
447 <title>Hierarchical Modules</title>
448
449 <para>GHC supports a small extension to the syntax of module
450 names: a module name is allowed to contain a dot
451 <literal>&lsquo;.&rsquo;</literal>. This is also known as the
452 &ldquo;hierarchical module namespace&rdquo; extension, because
453 it extends the normally flat Haskell module namespace into a
454 more flexible hierarchy of modules.</para>
455
456 <para>This extension has very little impact on the language
itself; module names are <emphasis>always</emphasis> fully
458 qualified, so you can just think of the fully qualified module
459 name as <quote>the module name</quote>. In particular, this
460 means that the full module name must be given after the
461 <literal>module</literal> keyword at the beginning of the
462 module; for example, the module <literal>A.B.C</literal> must
463 begin</para>
464
465 <programlisting>module A.B.C</programlisting>
466
467
468 <para>It is a common strategy to use the <literal>as</literal>
469 keyword to save some typing when using qualified names with
470 hierarchical modules. For example:</para>
471
472 <programlisting>
473 import qualified Control.Monad.ST.Strict as ST
474 </programlisting>
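<para>A short sketch of the same idea in a complete program, using two
hierarchical modules from the standard libraries; the aliases
<literal>C</literal> and <literal>L</literal> and the function
<literal>shout</literal> are arbitrary choices made for this example.</para>

<programlisting>
module Main where

import qualified Data.Char as C
import qualified Data.List as L

-- Qualified names use the short aliases rather than the full module names.
shout :: String -> String
shout = L.intercalate " " . map (map C.toUpper) . words

main :: IO ()
main = putStrLn (shout "hierarchical   module names")
</programlisting>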
475
476 <para>For details on how GHC searches for source and interface
477 files in the presence of hierarchical modules, see <xref
478 linkend="search-path"/>.</para>
479
480 <para>GHC comes with a large collection of libraries arranged
481 hierarchically; see the accompanying <ulink
482 url="../libraries/index.html">library
483 documentation</ulink>. More libraries to install are available
484 from <ulink
485 url="http://hackage.haskell.org/packages/hackage.html">HackageDB</ulink>.</para>
486 </sect2>
487
488 <!-- ====================== PATTERN GUARDS ======================= -->
489
490 <sect2 id="pattern-guards">
491 <title>Pattern guards</title>
492
493 <para>
494 <indexterm><primary>Pattern guards (Glasgow extension)</primary></indexterm>
495 The discussion that follows is an abbreviated version of Simon Peyton Jones's original <ulink url="http://research.microsoft.com/~simonpj/Haskell/guards.html">proposal</ulink>. (Note that the proposal was written before pattern guards were implemented, so refers to them as unimplemented.)
496 </para>
497
498 <para>
499 Suppose we have an abstract data type of finite maps, with a
500 lookup operation:
501
502 <programlisting>
503 lookup :: FiniteMap -> Int -> Maybe Int
504 </programlisting>
505
506 The lookup returns <function>Nothing</function> if the supplied key is not in the domain of the mapping, and <function>(Just v)</function> otherwise,
507 where <varname>v</varname> is the value that the key maps to. Now consider the following definition:
508 </para>
509
510 <programlisting>
511 clunky env var1 var2 | ok1 &amp;&amp; ok2 = val1 + val2
512 | otherwise = var1 + var2
513 where
514 m1 = lookup env var1
515 m2 = lookup env var2
516 ok1 = maybeToBool m1
517 ok2 = maybeToBool m2
518 val1 = expectJust m1
519 val2 = expectJust m2
520 </programlisting>
521
522 <para>
523 The auxiliary functions are
524 </para>
525
526 <programlisting>
527 maybeToBool :: Maybe a -&gt; Bool
528 maybeToBool (Just x) = True
529 maybeToBool Nothing = False
530
531 expectJust :: Maybe a -&gt; a
532 expectJust (Just x) = x
533 expectJust Nothing = error "Unexpected Nothing"
534 </programlisting>
535
536 <para>
537 What is <function>clunky</function> doing? The guard <literal>ok1 &amp;&amp;
538 ok2</literal> checks that both lookups succeed, using
539 <function>maybeToBool</function> to convert the <function>Maybe</function>
540 types to booleans. The (lazily evaluated) <function>expectJust</function>
calls extract the values from the results of the lookups, and bind the
542 returned values to <varname>val1</varname> and <varname>val2</varname>
543 respectively. If either lookup fails, then clunky takes the
544 <literal>otherwise</literal> case and returns the sum of its arguments.
545 </para>
546
547 <para>
548 This is certainly legal Haskell, but it is a tremendously verbose and
549 un-obvious way to achieve the desired effect. Arguably, a more direct way
550 to write clunky would be to use case expressions:
551 </para>
552
553 <programlisting>
554 clunky env var1 var2 = case lookup env var1 of
555 Nothing -&gt; fail
556 Just val1 -&gt; case lookup env var2 of
557 Nothing -&gt; fail
558 Just val2 -&gt; val1 + val2
559 where
560 fail = var1 + var2
561 </programlisting>
562
563 <para>
564 This is a bit shorter, but hardly better. Of course, we can rewrite any set
565 of pattern-matching, guarded equations as case expressions; that is
566 precisely what the compiler does when compiling equations! The reason that
567 Haskell provides guarded equations is because they allow us to write down
568 the cases we want to consider, one at a time, independently of each other.
569 This structure is hidden in the case version. Two of the right-hand sides
570 are really the same (<function>fail</function>), and the whole expression
571 tends to become more and more indented.
572 </para>
573
574 <para>
575 Here is how I would write clunky:
576 </para>
577
578 <programlisting>
579 clunky env var1 var2
580 | Just val1 &lt;- lookup env var1
581 , Just val2 &lt;- lookup env var2
582 = val1 + val2
583 ...other equations for clunky...
584 </programlisting>
585
586 <para>
587 The semantics should be clear enough. The qualifiers are matched in order.
588 For a <literal>&lt;-</literal> qualifier, which I call a pattern guard, the
589 right hand side is evaluated and matched against the pattern on the left.
590 If the match fails then the whole guard fails and the next equation is
591 tried. If it succeeds, then the appropriate binding takes place, and the
592 next qualifier is matched, in the augmented environment. Unlike list
593 comprehensions, however, the type of the expression to the right of the
594 <literal>&lt;-</literal> is the same as the type of the pattern to its
595 left. The bindings introduced by pattern guards scope over all the
596 remaining guard qualifiers, and over the right hand side of the equation.
597 </para>
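<para>To make the semantics concrete, here is a runnable sketch of
<function>clunky</function>. It assumes an association list as a stand-in for
the abstract finite map, and a hypothetical helper
<literal>lookupFM</literal> that takes the map first, matching the argument
order used above.</para>

<programlisting>
module Main where

-- Hypothetical stand-in for the abstract finite map: an association list.
type FiniteMap = [(Int, Int)]

-- Lookup with the map as the first argument and the key as the second.
lookupFM :: FiniteMap -> Int -> Maybe Int
lookupFM env key = lookup key env

clunky :: FiniteMap -> Int -> Int -> Int
clunky env var1 var2
  | Just val1 &lt;- lookupFM env var1
  , Just val2 &lt;- lookupFM env var2
  = val1 + val2
-- Fall-through equation, tried when either pattern guard fails.
clunky _ var1 var2 = var1 + var2

main :: IO ()
main = do
  print (clunky [(1, 10), (2, 20)] 1 2)  -- both lookups succeed
  print (clunky [(1, 10)] 1 2)           -- second lookup fails
</programlisting>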
598
599 <para>
600 Just as with list comprehensions, boolean expressions can be freely mixed
among the pattern guards. For example:
602 </para>
603
604 <programlisting>
605 f x | [y] &lt;- x
606 , y > 3
607 , Just z &lt;- h y
608 = ...
609 </programlisting>
610
611 <para>
612 Haskell's current guards therefore emerge as a special case, in which the
613 qualifier list has just one element, a boolean expression.
614 </para>
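<para>The sketch below mixes pattern guards with a boolean guard in one
qualifier list. The helper <literal>h</literal> and the function
<literal>classify</literal> are hypothetical and exist only to make the
example runnable.</para>

<programlisting>
module Main where

-- Hypothetical helper: halve even numbers, reject odd ones.
h :: Int -> Maybe Int
h y
  | even y    = Just (y `div` 2)
  | otherwise = Nothing

classify :: [Int] -> String
classify x
  | [y] &lt;- x       -- pattern guard: x must be a singleton list
  , y > 3            -- boolean guard, with y already in scope
  , Just z &lt;- h y  -- pattern guard that uses the earlier binding
  = "half is " ++ show z
  | otherwise = "no match"

main :: IO ()
main = mapM_ (putStrLn . classify) [[8], [2], [5], [8, 9]]
</programlisting>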
615 </sect2>
616
617 <!-- ===================== View patterns =================== -->
618
619 <sect2 id="view-patterns">
620 <title>View patterns
621 </title>
622
623 <para>
624 View patterns are enabled by the flag <literal>-XViewPatterns</literal>.
625 More information and examples of view patterns can be found on the
626 <ulink url="http://hackage.haskell.org/trac/ghc/wiki/ViewPatterns">Wiki
627 page</ulink>.
628 </para>
629
630 <para>
631 View patterns are somewhat like pattern guards that can be nested inside
632 of other patterns. They are a convenient way of pattern-matching
633 against values of abstract types. For example, in a programming language
634 implementation, we might represent the syntax of the types of the
635 language as follows:
636
637 <programlisting>
638 type Typ
639
640 data TypView = Unit
641 | Arrow Typ Typ
642
643 view :: Typ -> TypView
644
645 -- additional operations for constructing Typ's ...
646 </programlisting>
647
648 The representation of Typ is held abstract, permitting implementations
649 to use a fancy representation (e.g., hash-consing to manage sharing).
650
Without view patterns, using this signature is a little inconvenient:
652 <programlisting>
653 size :: Typ -> Integer
654 size t = case view t of
655 Unit -> 1
656 Arrow t1 t2 -> size t1 + size t2
657 </programlisting>
658
659 It is necessary to iterate the case, rather than using an equational
660 function definition. And the situation is even worse when the matching
661 against <literal>t</literal> is buried deep inside another pattern.
662 </para>
663
664 <para>
665 View patterns permit calling the view function inside the pattern and
666 matching against the result:
667 <programlisting>
668 size (view -> Unit) = 1
669 size (view -> Arrow t1 t2) = size t1 + size t2
670 </programlisting>
671
672 That is, we add a new form of pattern, written
673 <replaceable>expression</replaceable> <literal>-></literal>
674 <replaceable>pattern</replaceable> that means "apply the expression to
675 whatever we're trying to match against, and then match the result of
676 that application against the pattern". The expression can be any Haskell
677 expression of function type, and view patterns can be used wherever
678 patterns are used.
679 </para>
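<para>A complete version of <literal>size</literal> might look as follows.
Since <literal>Typ</literal> is abstract in the discussion above, the sketch
assumes a simple concrete representation with the hypothetical constructors
<literal>TUnit</literal> and <literal>TArrow</literal>.</para>

<programlisting>
{-# LANGUAGE ViewPatterns #-}
module Main where

-- Assumed concrete representation; the text leaves Typ abstract.
data Typ = TUnit | TArrow Typ Typ

data TypView = Unit
             | Arrow Typ Typ

view :: Typ -> TypView
view TUnit        = Unit
view (TArrow a b) = Arrow a b

-- Each equation applies view to the argument and matches on the result.
size :: Typ -> Integer
size (view -> Unit)        = 1
size (view -> Arrow t1 t2) = size t1 + size t2

main :: IO ()
main = print (size (TArrow TUnit (TArrow TUnit TUnit)))
</programlisting>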
680
681 <para>
682 The semantics of a pattern <literal>(</literal>
683 <replaceable>exp</replaceable> <literal>-></literal>
684 <replaceable>pat</replaceable> <literal>)</literal> are as follows:
685
686 <itemizedlist>
687
688 <listitem> Scoping:
689
690 <para>The variables bound by the view pattern are the variables bound by
691 <replaceable>pat</replaceable>.
692 </para>
693
694 <para>
695 Any variables in <replaceable>exp</replaceable> are bound occurrences,
696 but variables bound "to the left" in a pattern are in scope. This
697 feature permits, for example, one argument to a function to be used in
698 the view of another argument. For example, the function
699 <literal>clunky</literal> from <xref linkend="pattern-guards" /> can be
700 written using view patterns as follows:
701
702 <programlisting>
703 clunky env (lookup env -> Just val1) (lookup env -> Just val2) = val1 + val2
704 ...other equations for clunky...
705 </programlisting>
706 </para>
707
708 <para>
709 More precisely, the scoping rules are:
710 <itemizedlist>
711 <listitem>
712 <para>
713 In a single pattern, variables bound by patterns to the left of a view
714 pattern expression are in scope. For example:
715 <programlisting>
716 example :: Maybe ((String -> Integer,Integer), String) -> Bool
example (Just ((f,_), f -> 4)) = True
718 </programlisting>
719
720 Additionally, in function definitions, variables bound by matching earlier curried
721 arguments may be used in view pattern expressions in later arguments:
722 <programlisting>
723 example :: (String -> Integer) -> String -> Bool
724 example f (f -> 4) = True
725 </programlisting>
726 That is, the scoping is the same as it would be if the curried arguments
727 were collected into a tuple.
728 </para>
729 </listitem>
730
731 <listitem>
732 <para>
733 In mutually recursive bindings, such as <literal>let</literal>,
734 <literal>where</literal>, or the top level, view patterns in one
735 declaration may not mention variables bound by other declarations. That
736 is, each declaration must be self-contained. For example, the following
737 program is not allowed:
738 <programlisting>
739 let {(x -> y) = e1 ;
740 (y -> x) = e2 } in x
741 </programlisting>
742
743 (For some amplification on this design choice see
744 <ulink url="http://hackage.haskell.org/trac/ghc/ticket/4061">Trac #4061</ulink>.)
745
746 </para>
747 </listitem>
748 </itemizedlist>
749
750 </para>
751 </listitem>
752
753 <listitem><para> Typing: If <replaceable>exp</replaceable> has type
754 <replaceable>T1</replaceable> <literal>-></literal>
755 <replaceable>T2</replaceable> and <replaceable>pat</replaceable> matches
756 a <replaceable>T2</replaceable>, then the whole view pattern matches a
757 <replaceable>T1</replaceable>.
758 </para></listitem>
759
760 <listitem><para> Matching: To the equations in Section 3.17.3 of the
761 <ulink url="http://www.haskell.org/onlinereport/">Haskell 98
762 Report</ulink>, add the following:
763 <programlisting>
764 case v of { (e -> p) -> e1 ; _ -> e2 }
765 =
766 case (e v) of { p -> e1 ; _ -> e2 }
767 </programlisting>
768 That is, to match a variable <replaceable>v</replaceable> against a pattern
769 <literal>(</literal> <replaceable>exp</replaceable>
770 <literal>-></literal> <replaceable>pat</replaceable>
771 <literal>)</literal>, evaluate <literal>(</literal>
772 <replaceable>exp</replaceable> <replaceable> v</replaceable>
773 <literal>)</literal> and match the result against
774 <replaceable>pat</replaceable>.
775 </para></listitem>
776
777 <listitem><para> Efficiency: When the same view function is applied in
778 multiple branches of a function definition or a case expression (e.g.,
779 in <literal>size</literal> above), GHC makes an attempt to collect these
780 applications into a single nested case expression, so that the view
781 function is only applied once. Pattern compilation in GHC follows the
782 matrix algorithm described in Chapter 4 of <ulink
783 url="http://research.microsoft.com/~simonpj/Papers/slpj-book-1987/">The
784 Implementation of Functional Programming Languages</ulink>. When the
785 top rows of the first column of a matrix are all view patterns with the
786 "same" expression, these patterns are transformed into a single nested
787 case. This includes, for example, adjacent view patterns that line up
788 in a tuple, as in
789 <programlisting>
790 f ((view -> A, p1), p2) = e1
791 f ((view -> B, p3), p4) = e2
792 </programlisting>
793 </para>
794
795 <para> The current notion of when two view pattern expressions are "the
796 same" is very restricted: it is not even full syntactic equality.
797 However, it does include variables, literals, applications, and tuples;
798 e.g., two instances of <literal>view ("hi", "there")</literal> will be
799 collected. However, the current implementation does not compare up to
800 alpha-equivalence, so two instances of <literal>(x, view x ->
801 y)</literal> will not be coalesced.
802 </para>
803
804 </listitem>
805
806 </itemizedlist>
807 </para>
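<para>The scoping rules can be exercised with a small program. This is a
sketch under stated assumptions: the finite map is an association list,
<literal>lookupFM</literal> is a hypothetical helper, and the catch-all
equations are added so that every call returns a result.</para>

<programlisting>
{-# LANGUAGE ViewPatterns #-}
module Main where

lookupFM :: [(Int, Int)] -> Int -> Maybe Int
lookupFM env key = lookup key env

-- env is bound by the first argument and used in the views of the next two.
clunky :: [(Int, Int)] -> Int -> Int -> Int
clunky env (lookupFM env -> Just val1) (lookupFM env -> Just val2) = val1 + val2
clunky _ var1 var2 = var1 + var2

-- f is bound inside the tuple, to the left of the view pattern that uses it.
example :: Maybe ((String -> Integer, Integer), String) -> Bool
example (Just ((f, _), f -> 4)) = True
example _ = False

main :: IO ()
main = do
  print (clunky [(1, 10), (2, 20)] 1 2)
  print (example (Just ((toInteger . length, 0), "four")))
  print (example (Just ((toInteger . length, 0), "five!")))
</programlisting>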
808
809 </sect2>
810
811 <!-- ===================== n+k patterns =================== -->
812
813 <sect2 id="n-k-patterns">
814 <title>n+k patterns</title>
815 <indexterm><primary><option>-XNPlusKPatterns</option></primary></indexterm>
816
817 <para>
818 <literal>n+k</literal> pattern support is disabled by default. To enable
819 it, you can use the <option>-XNPlusKPatterns</option> flag.
820 </para>
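<para>For illustration, a definition that uses an <literal>n+k</literal>
pattern once the extension is switched on; the function
<literal>factorial</literal> is simply a familiar example and is not taken
from the text above.</para>

<programlisting>
{-# LANGUAGE NPlusKPatterns #-}
module Main where

-- (n+1) matches any argument that is at least 1 and binds n to one less.
factorial :: Integer -> Integer
factorial 0     = 1
factorial (n+1) = (n + 1) * factorial n

main :: IO ()
main = print (factorial 5)
</programlisting>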
821
822 </sect2>
823
824 <!-- ===================== Traditional record syntax =================== -->
825
826 <sect2 id="traditional-record-syntax">
827 <title>Traditional record syntax</title>
828 <indexterm><primary><option>-XNoTraditionalRecordSyntax</option></primary></indexterm>
829
830 <para>
831 Traditional record syntax, such as <literal>C {f = x}</literal>, is enabled by default.
832 To disable it, you can use the <option>-XNoTraditionalRecordSyntax</option> flag.
833 </para>
834
835 </sect2>
836
837 <!-- ===================== Recursive do-notation =================== -->
838
839 <sect2 id="recursive-do-notation">
840 <title>The recursive do-notation
841 </title>
842
843 <para>
844 The do-notation of Haskell 98 does not allow <emphasis>recursive bindings</emphasis>,
845 that is, the variables bound in a do-expression are visible only in the textually following
846 code block. Compare this to a let-expression, where bound variables are visible in the entire binding
847 group.
848 </para>
849
850 <para>
851 It turns out that such recursive bindings do indeed make sense for a variety of monads, but
852 not all. In particular, recursion in this sense requires a fixed-point operator for the underlying
853 monad, captured by the <literal>mfix</literal> method of the <literal>MonadFix</literal> class, defined in <literal>Control.Monad.Fix</literal> as follows:
854 <programlisting>
855 class Monad m => MonadFix m where
856    mfix :: (a -> m a) -> m a
857 </programlisting>
858 Haskell's
859 <literal>Maybe</literal>, <literal>[]</literal> (list), <literal>ST</literal> (both strict and lazy versions),
860 <literal>IO</literal>, and many other monads have <literal>MonadFix</literal> instances. On the negative
861 side, the continuation monad, with the signature <literal>(a -> r) -> r</literal>, does not.
862 </para>
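<para>
To give a feel for <literal>mfix</literal> before any special syntax is
involved, here is a small sketch that calls it directly in the
<literal>Maybe</literal> monad. The name <literal>ones</literal> is
invented for this illustration.
<programlisting>
module Main (main) where

import Control.Monad.Fix (mfix)

-- The function passed to mfix receives (lazily) the value that the
-- computation will eventually return, so xs can refer to itself.
ones :: Maybe [Int]
ones = mfix (\xs -> Just (1 : xs))

main :: IO ()
main = print (fmap (take 3) ones)   -- prints Just [1,1,1]
</programlisting>
This only works because the function does not inspect
<literal>xs</literal> before producing the <literal>Just</literal>; a
function that forced its argument first would diverge.
</para>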
863
864 <para>
865 For monads that do belong to the <literal>MonadFix</literal> class, GHC provides
866 an extended version of the do-notation that allows recursive bindings.
867 The <option>-XRecursiveDo</option> flag (language pragma: <literal>RecursiveDo</literal>)
868 provides the necessary syntactic support, introducing the keywords <literal>mdo</literal> and
869 <literal>rec</literal> for higher and lower levels of the notation respectively. Unlike
870 bindings in a <literal>do</literal> expression, those introduced by <literal>mdo</literal> and <literal>rec</literal>
871 are recursively defined, much like in an ordinary let-expression. Due to the new
872 keyword <literal>mdo</literal>, we also call this notation the <emphasis>mdo-notation</emphasis>.
873 </para>
874
875 <para>
876 Here is a simple (albeit contrived) example:
877 <programlisting>
878 {-# LANGUAGE RecursiveDo #-}
879 justOnes = mdo { xs &lt;- Just (1:xs)
880                ; return (map negate xs) }
881 </programlisting>
882 or equivalently
883 <programlisting>
884 {-# LANGUAGE RecursiveDo #-}
885 justOnes = do { rec { xs &lt;- Just (1:xs) }
886               ; return (map negate xs) }
887 </programlisting>
888 As you can guess <literal>justOnes</literal> will evaluate to <literal>Just [-1,-1,-1,...</literal>.
889 </para>
890
891 <para>
892 GHC's implementation of the mdo-notation closely follows the original translation as described in the paper
893 <ulink url="https://sites.google.com/site/leventerkok/recdo.pdf">A recursive do for Haskell</ulink>, which
894 in turn is based on the work <ulink url="http://sites.google.com/site/leventerkok/erkok-thesis.pdf">Value Recursion
895 in Monadic Computations</ulink>. Furthermore, GHC extends the syntax described in the former paper
896 with a lower level syntax flagged by the <literal>rec</literal> keyword, as we describe next.
897 </para>
898
899 <sect3>
900 <title>Recursive binding groups</title>
901
902 <para>
903 The flag <option>-XRecursiveDo</option> also introduces a new keyword <literal>rec</literal>, which wraps a
904 mutually-recursive group of monadic statements inside a <literal>do</literal> expression, producing a single statement.
905 Similar to a <literal>let</literal> statement inside a <literal>do</literal>, variables bound in
906 the <literal>rec</literal> are visible throughout the <literal>rec</literal> group, and below it. For example, compare
907 <programlisting>
908 do { a &lt;- getChar             do { a &lt;- getChar
909    ; let { r1 = f a r2           ; rec { r1 &lt;- f a r2
910    ;     ; r2 = g r1 }           ;     ; r2 &lt;- g r1 }
911    ; return (r1 ++ r2) }         ; return (r1 ++ r2) }
912 </programlisting>
913 In both cases, <literal>r1</literal> and <literal>r2</literal> are available both throughout
914 the <literal>let</literal> or <literal>rec</literal> block, and in the statements that follow it.
915 The difference is that <literal>let</literal> is non-monadic, while <literal>rec</literal> is monadic.
916 (In Haskell <literal>let</literal> is really <literal>letrec</literal>, of course.)
917 </para>
918
919 <para>
920 The semantics of <literal>rec</literal> is fairly straightforward. Whenever GHC finds a <literal>rec</literal>
921 group, it will compute its set of bound variables, and will introduce an appropriate call
922 to the underlying monadic value-recursion operator <literal>mfix</literal>, belonging to the
923 <literal>MonadFix</literal> class. Here is an example:
924 <programlisting>
925 rec { b &lt;- f a c     ===>    (b,c) &lt;- mfix (\ ~(b,c) -> do { b &lt;- f a c
926     ; c &lt;- f b a }                                         ; c &lt;- f b a
927                                                            ; return (b,c) })
928 </programlisting>
929 As usual, the meta-variables <literal>b</literal>, <literal>c</literal> etc., can be arbitrary patterns.
930 In general, the statement <literal>rec <replaceable>ss</replaceable></literal> is desugared to the statement
931 <programlisting>
932 <replaceable>vs</replaceable> &lt;- mfix (\ ~<replaceable>vs</replaceable> -&gt; do { <replaceable>ss</replaceable>; return <replaceable>vs</replaceable> })
933 </programlisting>
934 where <replaceable>vs</replaceable> is a tuple of the variables bound by <replaceable>ss</replaceable>.
935 </para>
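<para>
As a rough, self-contained illustration of this translation in the
<literal>Maybe</literal> monad, the two definitions below should produce
the same result. The names <literal>withRec</literal> and
<literal>byHand</literal>, and the helper variables
<literal>xs0</literal>, <literal>ys0</literal>, <literal>xs1</literal>
and <literal>ys1</literal>, are invented for this sketch; the second
definition is a hand-written approximation of the desugaring, not GHC's
literal output.
<programlisting>
{-# LANGUAGE RecursiveDo #-}
module Main (main) where

import Control.Monad.Fix (mfix)

-- Written with rec: xs and ys are defined in terms of each other.
withRec :: Maybe ([Int], [Int])
withRec = do
  rec xs &lt;- Just (1 : ys)
      ys &lt;- Just (2 : xs)
  return (take 3 xs, take 3 ys)

-- Roughly what the rec group above turns into: one call to mfix with a
-- lazy (irrefutable) tuple pattern for the recursively bound variables.
byHand :: Maybe ([Int], [Int])
byHand = do
  (xs, ys) &lt;- mfix (\ ~(xs0, ys0) -> do
                      xs1 &lt;- Just (1 : ys0)
                      ys1 &lt;- Just (2 : xs0)
                      return (xs1, ys1))
  return (take 3 xs, take 3 ys)

main :: IO ()
main = do
  print withRec   -- prints Just ([1,2,1],[2,1,2])
  print byHand    -- prints Just ([1,2,1],[2,1,2])
</programlisting>
The irrefutable pattern matters: with a strict tuple pattern the lambda
would force its argument before any binding had been produced.
</para>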
936
937 <para>
938 Note in particular that the translation for a <literal>rec</literal> block only involves wrapping a call
939 to <literal>mfix</literal>: it performs no other analysis on the bindings. The latter is the task
940 for the <literal>mdo</literal> notation, which is described next.
941 </para>
942 </sect3>
943
944 <sect3>
945 <title>The <literal>mdo</literal> notation</title>
946
947 <para>
948 A <literal>rec</literal>-block tells the compiler where precisely the recursive knot should be tied. It turns out that
949 the placement of the recursive knots can be rather delicate: in particular, we would like the knots to be wrapped
950 around as minimal groups as possible. This process is known as <emphasis>segmentation</emphasis>, and is described
951 in detail in Section 3.2 of <ulink url="https://sites.google.com/site/leventerkok/recdo.pdf">A recursive do for
952 Haskell</ulink>. Segmentation improves polymorphism and reduces the size of the recursive knot. Most importantly, it avoids
953 unnecessary interference caused by a fundamental issue with the so-called <emphasis>right-shrinking</emphasis>
954 axiom for monadic recursion. In brief, most monads of interest (IO, strict state, etc.) do <emphasis>not</emphasis>
955 have recursion operators that satisfy this axiom, and thus not performing segmentation can cause unnecessary
956 interference, changing the termination behavior of the resulting translation.
957 (Details can be found in Sections 3.1 and 7.2.2 of
958 <ulink url="http://sites.google.com/site/leventerkok/erkok-thesis.pdf">Value Recursion in Monadic Computations</ulink>.)
959 </para>
960
961 <para>
962 The <literal>mdo</literal> notation removes the burden of placing
963 explicit <literal>rec</literal> blocks in the code. Unlike an
964 ordinary <literal>do</literal> expression, in which variables bound by
965 statements are only in scope for later statements, variables bound in
966 an <literal>mdo</literal> expression are in scope for all statements
967 of the expression. The compiler then automatically identifies minimal
968 mutually recursively dependent segments of statements, treating them as
969 if the user had wrapped a <literal>rec</literal> qualifier around them.
970 </para>
971
972 <para>
973 The definition is syntactic:
974 </para>
975 <itemizedlist>
976 <listitem>
977 <para>
978 A generator <replaceable>g</replaceable>
979 <emphasis>depends</emphasis> on a textually following generator
980 <replaceable>g'</replaceable>, if
981 </para>
982 <itemizedlist>
983 <listitem>
984 <para>
985 <replaceable>g'</replaceable> defines a variable that
986 is used by <replaceable>g</replaceable>, or
987 </para>
988 </listitem>
989 <listitem>
990 <para>
991 <replaceable>g'</replaceable> textually appears between
992 <replaceable>g</replaceable> and
993 <replaceable>g''</replaceable>, where <replaceable>g</replaceable>
994 depends on <replaceable>g''</replaceable>.
995 </para>
996 </listitem>
997 </itemizedlist>
998 </listitem>
999 <listitem>
1000 <para>
1001 A <emphasis>segment</emphasis> of a given
1002 <literal>mdo</literal>-expression is a minimal sequence of generators
1003 such that no generator of the sequence depends on an outside
1004 generator. As a special case, although it is not a generator,
1005 the final expression in an <literal>mdo</literal>-expression is
1006 considered to form a segment by itself.
1007 </para>
1008 </listitem>
1009 </itemizedlist>
1010 <para>
1011 Segments in this sense are
1012 related to <emphasis>strongly-connected components</emphasis> analysis,
1013 with the exception that bindings in a segment cannot be reordered and
1014 must be contiguous.
1015 </para>
1016
1017 <para>
1018 Here is an example <literal>mdo</literal>-expression, and its translation to <literal>rec</literal> blocks:
1019 <programlisting>
1020 mdo { a &lt;- getChar      ===> do { a &lt;- getChar
1021     ; b &lt;- f a c                ; rec { b &lt;- f a c
1022     ; c &lt;- f b a                ;     ; c &lt;- f b a }
1023     ; z &lt;- h a b                ; z &lt;- h a b
1024     ; d &lt;- g d e                ; rec { d &lt;- g d e
1025     ; e &lt;- g a z                ;     ; e &lt;- g a z }
1026     ; putChar c }               ; putChar c }
1027 </programlisting>
1028 Note that a given <literal>mdo</literal> expression can cause the creation of multiple <literal>rec</literal> blocks.
1029 If there are no recursive dependencies, <literal>mdo</literal> will introduce no <literal>rec</literal> blocks. In this
1030 latter case an <literal>mdo</literal> expression is precisely the same as a <literal>do</literal> expression, as one
1031 would expect.
1032 </para>
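<para>
For a runnable flavour of <literal>mdo</literal>, the sketch below
builds a two-element cyclic structure in <literal>IO</literal>. The
<literal>Node</literal> type and the name <literal>twoCycle</literal>
are assumptions made up for this example; only
<literal>newIORef</literal> and <literal>readIORef</literal> come from
<literal>Data.IORef</literal>.
<programlisting>
{-# LANGUAGE RecursiveDo #-}
module Main (main) where

import Data.IORef (IORef, newIORef, readIORef)

-- A hypothetical node type: a label plus a mutable link to another node.
data Node = Node Int (IORef Node)

twoCycle :: IO Node
twoCycle = mdo
  ref1 &lt;- newIORef node2     -- refers forward to node2, bound below
  let node1 = Node 1 ref1
  ref2 &lt;- newIORef node1
  let node2 = Node 2 ref2
  return node1

main :: IO ()
main = do
  Node a next  &lt;- twoCycle
  Node b next' &lt;- readIORef next
  Node c _     &lt;- readIORef next'
  print (a, b, c)            -- prints (1,2,1)
</programlisting>
The forward reference is safe here because <literal>newIORef</literal>
stores its argument without evaluating it.
</para>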
1033
1034 <para>
1035 In summary, given an <literal>mdo</literal> expression, GHC first performs segmentation, introducing
1036 <literal>rec</literal> blocks to wrap over minimal recursive groups. Then, each resulting
1037 <literal>rec</literal> is desugared, using a call to <literal>Control.Monad.Fix.mfix</literal> as described
1038 in the previous section. The original <literal>mdo</literal>-expression typechecks exactly when the desugared
1039 version would do so.
1040 </para>
1041
1042 <para>
1043 Here are some other important points in using the recursive-do notation:
1044
1045 <itemizedlist>
1046 <listitem>
1047 <para>
1048 It is enabled with the flag <literal>-XRecursiveDo</literal>, or the <literal>LANGUAGE RecursiveDo</literal>
1049 pragma. (The same flag enables both <literal>mdo</literal>-notation, and the use of <literal>rec</literal>
1050 blocks inside <literal>do</literal> expressions.)
1051 </para>
1052 </listitem>
1053 <listitem>
1054 <para>
1055 <literal>rec</literal> blocks can also be used inside <literal>mdo</literal>-expressions, where each block is
1056 treated as a single statement. However, it is good style to use either <literal>mdo</literal> or
1057 <literal>rec</literal> blocks in a single expression, not both.
1058 </para>
1059 </listitem>
1060 <listitem>
1061 <para>
1062 If recursive bindings are required for a monad, then that monad must be declared an instance of
1063 the <literal>MonadFix</literal> class.
1064 </para>
1065 </listitem>
1066 <listitem>
1067 <para>
1068 The following instances of <literal>MonadFix</literal> are automatically provided: List, Maybe, IO.
1069 Furthermore, the <literal>Control.Monad.ST</literal> and <literal>Control.Monad.ST.Lazy</literal>
1070 modules provide the instances of the <literal>MonadFix</literal> class for Haskell's internal
1071 state monad (strict and lazy, respectively).
1072 </para>
1073 </listitem>
1074 <listitem>
1075 <para>
1076 Like <literal>let</literal> and <literal>where</literal> bindings, name shadowing is not allowed within
1077 an <literal>mdo</literal>-expression or a <literal>rec</literal>-block; that is, all the names bound in
1078 a single <literal>rec</literal> must be distinct. (GHC will complain if this is not the case.)
1079 </para>
1080 </listitem>
1081 </itemizedlist>
1082 </para>
1083 </sect3>
1084
1085
1086 </sect2>
1087
1088
1089 <!-- ===================== PARALLEL LIST COMPREHENSIONS =================== -->
1090
1091 <sect2 id="parallel-list-comprehensions">
1092 <title>Parallel List Comprehensions</title>
1093 <indexterm><primary>list comprehensions</primary><secondary>parallel</secondary>
1094 </indexterm>
1095 <indexterm><primary>parallel list comprehensions</primary>
1096 </indexterm>
1097
1098 <para>Parallel list comprehensions are a natural extension to list
1099 comprehensions. List comprehensions can be thought of as a nice
1100 syntax for writing maps and filters. Parallel comprehensions
1101 extend this to include the zipWith family.</para>
1102
1103 <para>A parallel list comprehension has multiple independent
1104 branches of qualifier lists, each separated by a `|' symbol. For
1105 example, the following zips together two lists:</para>
1106
1107 <programlisting>
1108 [ (x, y) | x &lt;- xs | y &lt;- ys ]
1109 </programlisting>
1110
1111 <para>The behaviour of parallel list comprehensions follows that of
1112 zip, in that the resulting list will have the same length as the
1113 shortest branch.</para>
1114
1115 <para>We can define parallel list comprehensions by translation to
1116 regular comprehensions. Here's the basic idea:</para>
1117
1118 <para>Given a parallel comprehension of the form: </para>
1119
1120 <programlisting>
1121 [ e | p1 &lt;- e11, p2 &lt;- e12, ...
1122 | q1 &lt;- e21, q2 &lt;- e22, ...
1123 ...
1124 ]
1125 </programlisting>
1126
1127 <para>This will be translated to: </para>
1128
1129 <programlisting>
1130 [ e | ((p1,p2), (q1,q2), ...) &lt;- zipN [(p1,p2) | p1 &lt;- e11, p2 &lt;- e12, ...]
1131 [(q1,q2) | q1 &lt;- e21, q2 &lt;- e22, ...]
1132 ...
1133 ]
1134 </programlisting>
1135
1136 <para>where `zipN' is the appropriate zip for the given number of
1137 branches.</para>
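<para>
The following sketch checks the translation on a small two-branch case,
where <literal>zipN</literal> is simply <literal>zip</literal>. The
names <literal>parallel</literal> and <literal>viaZip</literal> are
invented for the illustration.
<programlisting>
{-# LANGUAGE ParallelListComp #-}
module Main (main) where

-- Two branches separated by '|': the result is as long as the shorter one.
parallel :: [(Int, Char)]
parallel = [ (x, y) | x &lt;- [1, 2, 3] | y &lt;- "ab" ]

-- The same list written as an ordinary comprehension over zip.
viaZip :: [(Int, Char)]
viaZip = [ (x, y) | (x, y) &lt;- zip [ x | x &lt;- [1, 2, 3] ] [ y | y &lt;- "ab" ] ]

main :: IO ()
main = do
  print parallel              -- prints [(1,'a'),(2,'b')]
  print (parallel == viaZip)  -- prints True
</programlisting>
</para>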
1138
1139 </sect2>
1140
1141 <!-- ===================== TRANSFORM LIST COMPREHENSIONS =================== -->
1142
1143 <sect2 id="generalised-list-comprehensions">
1144 <title>Generalised (SQL-Like) List Comprehensions</title>
1145 <indexterm><primary>list comprehensions</primary><secondary>generalised</secondary>
1146 </indexterm>
1147 <indexterm><primary>extended list comprehensions</primary>
1148 </indexterm>
1149 <indexterm><primary>group</primary></indexterm>
1150 <indexterm><primary>sql</primary></indexterm>
1151
1152
1153 <para>Generalised list comprehensions are a further enhancement to the
1154 list comprehension syntactic sugar to allow operations such as sorting
1155 and grouping which are familiar from SQL. They are fully described in the
1156 paper <ulink url="http://research.microsoft.com/~simonpj/papers/list-comp">
1157 Comprehensive comprehensions: comprehensions with "order by" and "group by"</ulink>,
1158 except that the syntax we use differs slightly from the paper.</para>
1159 <para>The extension is enabled with the flag <option>-XTransformListComp</option>.</para>
1160 <para>Here is an example:
1161 <programlisting>
1162 employees = [ ("Simon", "MS", 80)
1163 , ("Erik", "MS", 100)
1164 , ("Phil", "Ed", 40)
1165 , ("Gordon", "Ed", 45)
1166 , ("Paul", "Yale", 60)]
1167
1168 output = [ (the dept, sum salary)
1169 | (name, dept, salary) &lt;- employees
1170 , then group by dept using groupWith
1171 , then sortWith by (sum salary)
1172 , then take 5 ]
1173 </programlisting>
1174 In this example, the list <literal>output</literal> would take on
1175 the value:
1176
1177 <programlisting>
1178 [("Yale", 60), ("Ed", 85), ("MS", 180)]
1179 </programlisting>
1180 </para>
1181 <para>There are three new keywords: <literal>group</literal>, <literal>by</literal>, and <literal>using</literal>.
1182 (The functions <literal>sortWith</literal> and <literal>groupWith</literal> are not keywords; they are ordinary
1183 functions that are exported by <literal>GHC.Exts</literal>.)</para>
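<para>
For readers who want to try the example, a self-contained version might
look like the following. The module header, imports and type signatures
are additions for this sketch; the comprehension itself is the one shown
above.
<programlisting>
{-# LANGUAGE TransformListComp #-}
module Main (main) where

import GHC.Exts (groupWith, sortWith, the)

employees :: [(String, String, Int)]
employees = [ ("Simon",  "MS",   80)
            , ("Erik",   "MS",   100)
            , ("Phil",   "Ed",   40)
            , ("Gordon", "Ed",   45)
            , ("Paul",   "Yale", 60) ]

-- After the group qualifier, dept and salary are lists, one per group.
output :: [(String, Int)]
output = [ (the dept, sum salary)
         | (name, dept, salary) &lt;- employees
         , then group by dept using groupWith
         , then sortWith by (sum salary)
         , then take 5 ]

main :: IO ()
main = print output   -- prints [("Yale",60),("Ed",85),("MS",180)]
</programlisting>
</para>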
1184
1185 <para>There are four new forms of comprehension qualifier,
1186 all introduced by the (existing) keyword <literal>then</literal>:
1187 <itemizedlist>
1188 <listitem>
1189 <para>
1190 <programlisting>
1191 then f
1192 </programlisting>
1193
1194 This statement requires that <literal>f</literal> have the type
1195 <literal>forall a. [a] -> [a]</literal>. The opening example uses this
1196 form to apply <literal>take 5</literal>.
1197 </para>
1198 </listitem>
1199
1200
1201 <listitem>
1202 <para>
1203 <programlisting>
1204 then f by e
1205 </programlisting>
1206
1207 This form is similar to the previous one, but allows you to create a function
1208 which will be passed as the first argument to f. As a consequence f must have
1209 the type <literal>forall a. (a -> t) -> [a] -> [a]</literal>. As you can see
1210 from the type, this function lets f &quot;project out&quot; some information
1211 from the elements of the list it is transforming.</para>
1212
1213 <para>An example is shown in the opening example, where <literal>sortWith</literal>
1214 is supplied with a function that lets it find out the <literal>sum salary</literal>
1215 for any item in the list comprehension it transforms.</para>
1216
1217 </listitem>
1218
1219
1220 <listitem>
1221
1222 <programlisting>
1223 then group by e using f
1224 </programlisting>
1225
1226 <para>This is the most general of the grouping-type statements. In this form,
1227 f is required to have type <literal>forall a. (a -> t) -> [a] -> [[a]]</literal>.
1228 As with the <literal>then f by e</literal> case above, the first argument
1229 is a function supplied to f by the compiler which lets it compute e on every
1230 element of the list being transformed. However, unlike the non-grouping case,
1231 f additionally partitions the list into a number of sublists: this means that
1232 at every point after this statement, binders occurring before it in the comprehension
1233 refer to <emphasis>lists</emphasis> of possible values, not single values. To help understand
1234 this, let's look at an example:</para>
1235
1236 <programlisting>
1237 -- This works similarly to groupWith in GHC.Exts, but doesn't sort its input first
1238 groupRuns :: Eq b => (a -> b) -> [a] -> [[a]]
1239 groupRuns f = groupBy (\x y -> f x == f y)
1240
1241 output = [ (the x, y)
1242 | x &lt;- ([1..3] ++ [1..2])
1243 , y &lt;- [4..6]
1244 , then group by x using groupRuns ]
1245 </programlisting>
1246
1247 <para>This results in the variable <literal>output</literal> taking on the value below:</para>
1248
1249 <programlisting>
1250 [(1, [4, 5, 6]), (2, [4, 5, 6]), (3, [4, 5, 6]), (1, [4, 5, 6]), (2, [4, 5, 6])]
1251 </programlisting>
1252
1253 <para>Note that we have used the <literal>the</literal> function to change the type
1254 of x from a list to its original numeric type. The variable y, in contrast, is left
1255 unchanged from the list form introduced by the grouping.</para>
1256
1257 </listitem>
1258
1259 <listitem>
1260
1261 <programlisting>
1262 then group using f
1263 </programlisting>
1264
1265 <para>With this form of the group statement, f is required to simply have the type
1266 <literal>forall a. [a] -> [[a]]</literal>, which will be used to group up the
1267 comprehension so far directly. An example of this form is as follows:</para>
1268
1269 <programlisting>
1270 output = [ x
1271 | y &lt;- [1..5]
1272 , x &lt;- "hello"
1273 , then group using inits]
1274 </programlisting>
1275
1276 <para>This will yield a list containing every prefix of the word "hello" written out 5 times:</para>
1277
1278 <programlisting>
1279 ["","h","he","hel","hell","hello","helloh","hellohe","hellohel","hellohell","hellohello","hellohelloh",...]
1280 </programlisting>
1281
1282 </listitem>
1283 </itemizedlist>
1284 </para>
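<para>
As a further small sketch of the two non-grouping forms, the
comprehension below sorts with <literal>then f by e</literal> and then
truncates with <literal>then f</literal>. The list
<literal>fruits</literal> and the name <literal>longestTwo</literal> are
made up for the example; <literal>Down</literal> is the
order-reversing wrapper from <literal>Data.Ord</literal>.
<programlisting>
{-# LANGUAGE TransformListComp #-}
module Main (main) where

import Data.Ord (Down (..))
import GHC.Exts (sortWith)

fruits :: [String]
fruits = ["pear", "fig", "banana", "kiwi"]

-- Sort by descending length, then keep the first two results.
longestTwo :: [String]
longestTwo = [ w
             | w &lt;- fruits
             , then sortWith by Down (length w)
             , then take 2 ]

main :: IO ()
main = print longestTwo   -- prints ["banana","pear"]
</programlisting>
Because the underlying sort is stable, <literal>"pear"</literal> stays
ahead of <literal>"kiwi"</literal>, which has the same length.
</para>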
1285 </sect2>
1286
1287 <!-- ===================== MONAD COMPREHENSIONS ===================== -->
1288
1289 <sect2 id="monad-comprehensions">
1290 <title>Monad comprehensions</title>
1291 <indexterm><primary>monad comprehensions</primary></indexterm>
1292
1293 <para>
1294 Monad comprehensions generalise the list comprehension notation,
1295 including parallel comprehensions
1296 (<xref linkend="parallel-list-comprehensions"/>) and
1297 transform comprehensions (<xref linkend="generalised-list-comprehensions"/>)
1298 to work for any monad.
1299 </para>
1300
1301 <para>Monad comprehensions support:</para>
1302
1303 <itemizedlist>
1304 <listitem>
1305 <para>
1306 Bindings:
1307 </para>
1308
1309 <programlisting>
1310 [ x + y | x &lt;- Just 1, y &lt;- Just 2 ]
1311 </programlisting>
1312
1313 <para>
1314 Bindings are translated with the <literal>(&gt;&gt;=)</literal> and
1315 <literal>return</literal> functions to the usual do-notation:
1316 </para>
1317
1318 <programlisting>
1319 do x &lt;- Just 1
1320    y &lt;- Just 2
1321    return (x+y)
1322 </programlisting>
1323
1324 </listitem>
1325 <listitem>
1326 <para>
1327 Guards:
1328 </para>
1329
1330 <programlisting>
1331 [ x | x &lt;- [1..10], x &lt;= 5 ]
1332 </programlisting>
1333
1334 <para>
1335 Guards are translated with the <literal>guard</literal> function,
1336 which requires a <literal>MonadPlus</literal> instance:
1337 </para>
1338
1339 <programlisting>
1340 do x &lt;- [1..10]
1341    guard (x &lt;= 5)
1342    return x
1343 </programlisting>
1344
1345 </listitem>
1346 <listitem>
1347 <para>
1348 Transform statements (as with <literal>-XTransformListComp</literal>):
1349 </para>
1350
1351 <programlisting>
1352 [ x+y | x &lt;- [1..10], y &lt;- [1..x], then take 2 ]
1353 </programlisting>
1354
1355 <para>
1356 This translates to:
1357 </para>
1358
1359 <programlisting>
1360 do (x,y) &lt;- take 2 (do x &lt;- [1..10]
1361                        y &lt;- [1..x]
1362                        return (x,y))
1363    return (x+y)
1364 </programlisting>
1365
1366 </listitem>
1367 <listitem>
1368 <para>
1369 Group statements (as with <literal>-XTransformListComp</literal>):
1370 </para>
1371
1372 <programlisting>
1373 [ x | x &lt;- [1,1,2,2,3], then group by x using GHC.Exts.groupWith ]
1374 [ x | x &lt;- [1,1,2,2,3], then group using myGroup ]
1375 </programlisting>
1376
1377 </listitem>
1378 <listitem>
1379 <para>
1380 Parallel statements (as with <literal>-XParallelListComp</literal>):
1381 </para>
1382
1383 <programlisting>
1384 [ (x+y) | x &lt;- [1..10]
1385 | y &lt;- [11..20]
1386 ]
1387 </programlisting>
1388
1389 <para>
1390 Parallel statements are translated using the
1391 <literal>mzip</literal> function, which requires a
1392 <literal>MonadZip</literal> instance defined in
1393 <ulink url="&libraryBaseLocation;/Control-Monad-Zip.html"><literal>Control.Monad.Zip</literal></ulink>:
1394 </para>
1395
1396 <programlisting>
1397 do (x,y) &lt;- mzip (do x &lt;- [1..10]
1398                      return x)
1399                  (do y &lt;- [11..20]
1400                      return y)
1401    return (x+y)
1402 </programlisting>
1403
1404 </listitem>
1405 </itemizedlist>
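<para>
A minimal sketch of bindings and guards in a monad other than lists
follows. The function names <literal>addMaybe</literal> and
<literal>halveEven</literal> are assumptions chosen for the
illustration.
<programlisting>
{-# LANGUAGE MonadComprehensions #-}
module Main (main) where

-- Bindings only: the result is Nothing if either argument is Nothing.
addMaybe :: Maybe Int -> Maybe Int -> Maybe Int
addMaybe mx my = [ x + y | x &lt;- mx, y &lt;- my ]

-- A guard: the result is Nothing when the condition fails.
halveEven :: Maybe Int -> Maybe Int
halveEven mx = [ x `div` 2 | x &lt;- mx, even x ]

main :: IO ()
main = do
  print (addMaybe (Just 1) (Just 2))  -- prints Just 3
  print (addMaybe (Just 1) Nothing)   -- prints Nothing
  print (halveEven (Just 10))         -- prints Just 5
  print (halveEven (Just 7))          -- prints Nothing
</programlisting>
</para>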
1406
1407 <para>
1408 All these features are enabled by default if the
1409 <literal>MonadComprehensions</literal> extension is enabled. The types
1410 and more detailed examples on how to use comprehensions are explained
1411 in the preceding sections <xref
1412 linkend="generalised-list-comprehensions"/> and <xref
1413 linkend="parallel-list-comprehensions"/>. In general you just have
1414 to replace the type <literal>[a]</literal> with the type
1415 <literal>Monad m => m a</literal> for monad comprehensions.
1416 </para>
1417
1418 <para>
1419 Note: Even though most of these examples are using the list monad,
1420 monad comprehensions work for any monad.
1421 The <literal>base</literal> package offers all necessary instances for
1422 lists, which make <literal>MonadComprehensions</literal> backward
1423 compatible with built-in, transform and parallel list comprehensions.
1424 </para>
1425 <para> More formally, the desugaring is as follows. We write <literal>D[ e | Q]</literal>
1426 to mean the desugaring of the monad comprehension <literal>[ e | Q]</literal>:
1427 <programlisting>
1428 Expressions: e
1429 Declarations: d
1430 Lists of qualifiers: Q,R,S
1431
1432 -- Basic forms
1433 D[ e | ] = return e
1434 D[ e | p &lt;- e, Q ] = e &gt;&gt;= \p -&gt; D[ e | Q ]
1435 D[ e | e, Q ] = guard e &gt;&gt; D[ e | Q ]
1436 D[ e | let d, Q ] = let d in D[ e | Q ]
1437
1438 -- Parallel comprehensions (iterate for multiple parallel branches)
1439 D[ e | (Q | R), S ] = mzip D[ Qv | Q ] D[ Rv | R ] &gt;&gt;= \(Qv,Rv) -&gt; D[ e | S ]
1440
1441 -- Transform comprehensions
1442 D[ e | Q then f, R ] = f D[ Qv | Q ] &gt;&gt;= \Qv -&gt; D[ e | R ]
1443
1444 D[ e | Q then f by b, R ] = f (\Qv -&gt; b) D[ Qv | Q ] &gt;&gt;= \Qv -&gt; D[ e | R ]
1445
1446 D[ e | Q then group using f, R ] = f D[ Qv | Q ] &gt;&gt;= \ys -&gt;
1447                                    case (fmap selQv1 ys, ..., fmap selQvn ys) of
1448                                      Qv -&gt; D[ e | R ]
1449
1450 D[ e | Q then group by b using f, R ] = f (\Qv -&gt; b) D[ Qv | Q ] &gt;&gt;= \ys -&gt;
1451                                         case (fmap selQv1 ys, ..., fmap selQvn ys) of
1452                                           Qv -&gt; D[ e | R ]
1453
1454 where Qv is the tuple of variables bound by Q (and used subsequently)
1455       selQvi is a selector mapping Qv to the ith component of Qv
1456
1457 Operator  Standard binding   Expected type
1458 --------------------------------------------------------------------
1459 return    GHC.Base           t1 -&gt; m t2
1460 (&gt;&gt;=)     GHC.Base           m1 t1 -&gt; (t2 -&gt; m2 t3) -&gt; m3 t3
1461 (&gt;&gt;)      GHC.Base           m1 t1 -&gt; m2 t2 -&gt; m3 t3
1462 guard     Control.Monad      t1 -&gt; m t2
1463 fmap      GHC.Base           forall a b. (a-&gt;b) -&gt; n a -&gt; n b
1464 mzip      Control.Monad.Zip  forall a b. m a -&gt; m b -&gt; m (a,b)
1465 </programlisting>
1466 The comprehension should typecheck when its desugaring would typecheck.
1467 </para>
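<para>
To make the basic rules concrete, the sketch below writes one
comprehension and, next to it, a hand expansion that follows the binding
and guard rules. The names <literal>viaComprehension</literal> and
<literal>viaDesugaring</literal> are invented here, and the expansion is
an approximation written by hand rather than GHC's actual output.
<programlisting>
{-# LANGUAGE MonadComprehensions #-}
module Main (main) where

import Control.Monad (guard)

viaComprehension :: [Int]
viaComprehension = [ x * 2 | x &lt;- [1 .. 10], x &lt;= 5 ]

-- Binding rule: e &gt;&gt;= \p -> ...; guard rule: guard e &gt;&gt; ...;
-- empty qualifier list: return e.
viaDesugaring :: [Int]
viaDesugaring = [1 .. 10] &gt;&gt;= \x -> guard (x &lt;= 5) &gt;&gt; return (x * 2)

main :: IO ()
main = do
  print viaComprehension                    -- prints [2,4,6,8,10]
  print (viaComprehension == viaDesugaring) -- prints True
</programlisting>
</para>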
1468 <para>
1469 Monad comprehensions support rebindable syntax (<xref linkend="rebindable-syntax"/>).
1470 Without rebindable
1471 syntax, the operators from the "standard binding" module are used; with
1472 rebindable syntax, the operators are looked up in the current lexical scope.
1473 For example, parallel comprehensions will be typechecked and desugared
1474 using whatever "<literal>mzip</literal>" is in scope.
1475 </para>
1476 <para>
1477 The rebindable operators must have the "Expected type" given in the
1478 table above. These types are surprisingly general. For example, you can
1479 use a bind operator with the type
1480 <programlisting>
1481 (>>=) :: T x y a -> (a -> T y z b) -> T x z b
1482 </programlisting>
1483 In the case of transform comprehensions, notice that the groups are
1484 parameterised over some arbitrary type <literal>n</literal> (provided it
1485 has an <literal>fmap</literal>), as well as
1486 the comprehension being over an arbitrary monad.
1487 </para>
1488 </sect2>
1489
1490 <!-- ===================== REBINDABLE SYNTAX =================== -->
1491
1492 <sect2 id="rebindable-syntax">
1493 <title>Rebindable syntax and the implicit Prelude import</title>
1494
1495 <para><indexterm><primary>-XNoImplicitPrelude
1496 option</primary></indexterm> GHC normally imports
1497 <filename>Prelude.hi</filename> files for you. If you'd
1498 rather it didn't, then give it a
1499 <option>-XNoImplicitPrelude</option> option. The idea is
1500 that you can then import a Prelude of your own. (But don't
1501 call it <literal>Prelude</literal>; the Haskell module
1502 namespace is flat, and you must not conflict with any
1503 Prelude module.)</para>
1504
1505 <para>Suppose you are importing a Prelude of your own
1506 in order to define your own numeric class
1507 hierarchy. It completely defeats that purpose if the
1508 literal "1" means "<literal>Prelude.fromInteger
1509 1</literal>", which is what the Haskell Report specifies.
1510 So the <option>-XRebindableSyntax</option>
1511 flag causes
1512 the following pieces of built-in syntax to refer to
1513 <emphasis>whatever is in scope</emphasis>, not the Prelude
1514 versions:
1515 <itemizedlist>
1516 <listitem>
1517 <para>An integer literal <literal>368</literal> means
1518 "<literal>fromInteger (368::Integer)</literal>", rather than
1519 "<literal>Prelude.fromInteger (368::Integer)</literal>".
1520 </para> </listitem>
1521
1522 <listitem><para>Fractional literals are handled in just the same way,
1523 except that the translation is
1524 <literal>fromRational (3.68::Rational)</literal>.
1525 </para> </listitem>
1526
1527 <listitem><para>The equality test in an overloaded numeric pattern
1528 uses whatever <literal>(==)</literal> is in scope.
1529 </para> </listitem>
1530
1531 <listitem><para>The subtraction operation, and the
1532 greater-than-or-equal test, in <literal>n+k</literal> patterns
1533 use whatever <literal>(-)</literal> and <literal>(>=)</literal> are in scope.
1534 </para></listitem>
1535
1536 <listitem>
1537 <para>Negation (e.g. "<literal>- (f x)</literal>")
1538 means "<literal>negate (f x)</literal>", both in numeric
1539 patterns, and expressions.
1540 </para></listitem>
1541
1542 <listitem>
1543 <para>Conditionals (e.g. "<literal>if</literal> e1 <literal>then</literal> e2 <literal>else</literal> e3")
1544 mean "<literal>ifThenElse</literal> e1 e2 e3". However, <literal>case</literal> expressions are unaffected.
1545 </para></listitem>
1546
1547 <listitem>
1548 <para>"Do" notation is translated using whatever
1549 functions <literal>(>>=)</literal>,
1550 <literal>(>>)</literal>, and <literal>fail</literal>,
1551 are in scope (not the Prelude
1552 versions). List comprehensions, mdo (<xref linkend="recursive-do-notation"/>), and parallel array
1553 comprehensions, are unaffected. </para></listitem>
1554
1555 <listitem>
1556 <para>Arrow
1557 notation (see <xref linkend="arrow-notation"/>)
1558 uses whatever <literal>arr</literal>,
1559 <literal>(>>>)</literal>, <literal>first</literal>,
1560 <literal>app</literal>, <literal>(|||)</literal> and
1561 <literal>loop</literal> functions are in scope. But unlike the
1562 other constructs, the types of these functions must match the
1563 Prelude types very closely. Details are in flux; if you want
1564 to use this, ask!
1565 </para></listitem>
1566 </itemizedlist>
1567 <option>-XRebindableSyntax</option> implies <option>-XNoImplicitPrelude</option>.
1568 </para>
1569 <para>
1570 In all cases (apart from arrow notation), the static semantics should be that of the desugared form,
1571 even if that is a little unexpected. For example, the
1572 static semantics of the literal <literal>368</literal>
1573 is exactly that of <literal>fromInteger (368::Integer)</literal>; it's fine for
1574 <literal>fromInteger</literal> to have any of the types:
1575 <programlisting>
1576 fromInteger :: Integer -> Integer
1577 fromInteger :: forall a. Foo a => Integer -> a
1578 fromInteger :: Num a => a -> Integer
1579 fromInteger :: Integer -> Bool -> Bool
1580 </programlisting>
1581 </para>
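<para>
As a small sketch of the first of these types in use, the module below
gives integer literals a non-numeric-class meaning. The type
<literal>Metres</literal> and the value <literal>height</literal> are
hypothetical names introduced only for this illustration.
<programlisting>
{-# LANGUAGE RebindableSyntax #-}
module Main (main) where

-- RebindableSyntax implies NoImplicitPrelude, so import what is needed.
import Prelude hiding (fromInteger)
import qualified Prelude as P

newtype Metres = Metres Double

-- The fromInteger in scope decides what an integer literal means.
fromInteger :: Integer -> Metres
fromInteger n = Metres (P.fromInteger n)

-- The literal 368 is read as fromInteger (368 :: Integer), so it has
-- type Metres here even though Metres has no Num instance.
height :: Metres
height = 368

main :: IO ()
main = case height of
  Metres d -> print d   -- prints 368.0
</programlisting>
Note that every integer literal in such a module goes through the local
<literal>fromInteger</literal>, so a literal used at type
<literal>Int</literal> would no longer typecheck with this definition.
</para>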
1582
1583 <para>Be warned: this is an experimental facility, with
1584 fewer checks than usual. Use <literal>-dcore-lint</literal>
1585 to typecheck the desugared program. If Core Lint is happy
1586 you should be all right.</para>
1587
1588 </sect2>
1589
1590 <sect2 id="postfix-operators">
1591 <title>Postfix operators</title>
1592
1593 <para>
1594 The <option>-XPostfixOperators</option> flag enables a small
1595 extension to the syntax of left operator sections, which allows you to
1596 define postfix operators. The extension is this: the left section
1597 <programlisting>
1598 (e !)
1599 </programlisting>
1600 is equivalent (from the point of view of both type checking and execution) to the expression
1601 <programlisting>
1602 ((!) e)
1603 </programlisting>
1604 (for any expression <literal>e</literal> and operator <literal>(!)</literal>).
1605 The strict Haskell 98 interpretation is that the section is equivalent to
1606 <programlisting>
1607 (\y -> (!) e y)
1608 </programlisting>
1609 That is, the operator must be a function of two arguments. GHC allows it to
1610 take only one argument, and that in turn allows you to write the function
1611 postfix.
1612 </para>
1613 <para>The extension does not extend to the left-hand side of function
1614 definitions; you must define such a function in prefix form.</para>
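<para>
As a minimal sketch of the rule above, the following module defines a
factorial operator in prefix form and then applies it postfix. The
operator <literal>(!)</literal> and the names <literal>PostfixDemo</literal>
and <literal>fiveFactorial</literal> are illustrative choices, not part of
any library.
<programlisting>
{-# LANGUAGE PostfixOperators #-}
module PostfixDemo where

-- Hypothetical factorial operator, defined in prefix form
-- because the extension does not cover left-hand sides.
(!) :: Integer -> Integer
(!) n = product [1 .. n]

-- The left section (5 !) is treated as ((!) 5).
fiveFactorial :: Integer
fiveFactorial = (5 !)
</programlisting>
Without the flag, the section would be read as <literal>\y -> (!) 5 y</literal>,
which is ill-typed for a one-argument operator.
</para>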
1615
1616 </sect2>
1617
1618 <sect2 id="tuple-sections">
1619 <title>Tuple sections</title>
1620
1621 <para>
1622 The <option>-XTupleSections</option> flag enables Python-style partially applied
tuple constructors. For example, the following expression
1624 <programlisting>
1625 (, True)
1626 </programlisting>
is an alternative notation for the more unwieldy
1628 <programlisting>
1629 \x -> (x, True)
1630 </programlisting>
1631 You can omit any combination of arguments to the tuple, as in the following
1632 <programlisting>
1633 (, "I", , , "Love", , 1337)
1634 </programlisting>
1635 which translates to
1636 <programlisting>
1637 \a b c d -> (a, "I", b, c, "Love", d, 1337)
1638 </programlisting>
1639 </para>
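<para>
A typical use, sketched here with the hypothetical names
<literal>tagAll</literal> and <literal>tagged</literal>, is pairing every
element of a list with a fixed value:
<programlisting>
{-# LANGUAGE TupleSections #-}
module TupleSectionDemo where

-- (, True) stands for \x -> (x, True).
tagAll :: [a] -> [(a, Bool)]
tagAll = map (, True)

tagged :: [(Int, Bool)]
tagged = tagAll [1, 2, 3]
</programlisting>
</para>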
1640
1641 <para>
1642 If you have <link linkend="unboxed-tuples">unboxed tuples</link> enabled, tuple sections
1643 will also be available for them, like so
1644 <programlisting>
1645 (# , True #)
1646 </programlisting>
1647 Because there is no unboxed unit tuple, the following expression
1648 <programlisting>
1649 (# #)
1650 </programlisting>
1651 continues to stand for the unboxed singleton tuple data constructor.
1652 </para>
1653
1654 </sect2>
1655
1656 <sect2 id="lambda-case">
1657 <title>Lambda-case</title>
1658 <para>
1659 The <option>-XLambdaCase</option> flag enables expressions of the form
1660 <programlisting>
1661 \case { p1 -> e1; ...; pN -> eN }
1662 </programlisting>
1663 which is equivalent to
1664 <programlisting>
1665 \freshName -> case freshName of { p1 -> e1; ...; pN -> eN }
1666 </programlisting>
1667 Note that <literal>\case</literal> starts a layout, so you can write
1668 <programlisting>
1669 \case
1670 p1 -> e1
1671 ...
1672 pN -> eN
1673 </programlisting>
1674 </para>
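<para>
For instance, a function that immediately scrutinises its argument can be
written without naming that argument. This is only a sketch; the function
<literal>describe</literal> is a made-up example.
<programlisting>
{-# LANGUAGE LambdaCase #-}
module LambdaCaseDemo where

-- Equivalent to: describe = \m -> case m of { ... }
describe :: Maybe Int -> String
describe = \case
  Nothing -> "nothing"
  Just n  -> "just " ++ show n
</programlisting>
</para>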
1675 </sect2>
1676
1677 <sect2 id="empty-case">
1678 <title>Empty case alternatives</title>
1679 <para>
1680 The <option>-XEmptyCase</option> flag enables
1681 case expressions, or lambda-case expressions, that have no alternatives,
1682 thus:
1683 <programlisting>
1684 case e of { } -- No alternatives
1685 or
1686 \case { } -- -XLambdaCase is also required
1687 </programlisting>
1688 This can be useful when you know that the expression being scrutinised
1689 has no non-bottom values. For example:
1690 <programlisting>
1691 data Void
1692 f :: Void -> Int
1693 f x = case x of { }
1694 </programlisting>
1695 With dependently-typed features it is more useful
1696 (see <ulink url="http://hackage.haskell.org/trac/ghc/ticket/2431">Trac</ulink>).
1697 For example, consider these two candidate definitions of <literal>absurd</literal>:
1698 <programlisting>
data a :~: b where
  Refl :: a :~: a
1701
1702 absurd :: True :~: False -> a
1703 absurd x = error "absurd" -- (A)
1704 absurd x = case x of {} -- (B)
1705 </programlisting>
1706 We much prefer (B). Why? Because GHC can figure out that <literal>(True :~: False)</literal>
1707 is an empty type. So (B) has no partiality and GHC should be able to compile with
<option>-fwarn-incomplete-patterns</option>. (Though the pattern match checking is not
yet clever enough to do that.)
1710 On the other hand (A) looks dangerous, and GHC doesn't check to make
1711 sure that, in fact, the function can never get called.
1712 </para>
1713 </sect2>
1714
1715 <sect2 id="multi-way-if">
1716 <title>Multi-way if-expressions</title>
1717 <para>
With the <option>-XMultiWayIf</option> flag, GHC accepts conditional expressions
1719 with multiple branches:
1720 <programlisting>
1721 if | guard1 -> expr1
1722 | ...
1723 | guardN -> exprN
1724 </programlisting>
1725 which is roughly equivalent to
1726 <programlisting>
1727 case () of
1728 _ | guard1 -> expr1
1729 ...
1730 _ | guardN -> exprN
1731 </programlisting>
1732 except that multi-way if-expressions do not alter the layout.
1733 </para>
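<para>
A small sketch of the construct in use follows; the function
<literal>classify</literal> is a hypothetical example. The guards are tried
in order, exactly as in the <literal>case</literal> translation.
<programlisting>
{-# LANGUAGE MultiWayIf #-}
module MultiWayIfDemo where

-- Each guard plays the role of one "_ | guard -> expr" alternative.
classify :: Int -> String
classify n = if | n &lt; 0     -> "negative"
                | n == 0    -> "zero"
                | otherwise -> "positive"
</programlisting>
</para>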
1734 </sect2>
1735
1736 <sect2 id="disambiguate-fields">
1737 <title>Record field disambiguation</title>
1738 <para>
1739 In record construction and record pattern matching
1740 it is entirely unambiguous which field is referred to, even if there are two different
1741 data types in scope with a common field name. For example:
1742 <programlisting>
1743 module M where
1744 data S = MkS { x :: Int, y :: Bool }
1745
1746 module Foo where
1747 import M
1748
1749 data T = MkT { x :: Int }
1750
1751 ok1 (MkS { x = n }) = n+1 -- Unambiguous
1752 ok2 n = MkT { x = n+1 } -- Unambiguous
1753
1754 bad1 k = k { x = 3 } -- Ambiguous
1755 bad2 k = x k -- Ambiguous
1756 </programlisting>
1757 Even though there are two <literal>x</literal>'s in scope,
1758 it is clear that the <literal>x</literal> in the pattern in the
1759 definition of <literal>ok1</literal> can only mean the field
1760 <literal>x</literal> from type <literal>S</literal>. Similarly for
1761 the function <literal>ok2</literal>. However, in the record update
1762 in <literal>bad1</literal> and the record selection in <literal>bad2</literal>
1763 it is not clear which of the two types is intended.
1764 </para>
1765 <para>
1766 Haskell 98 regards all four as ambiguous, but with the
1767 <option>-XDisambiguateRecordFields</option> flag, GHC will accept
1768 the former two. The rules are precisely the same as those for instance
1769 declarations in Haskell 98, where the method names on the left-hand side
1770 of the method bindings in an instance declaration refer unambiguously
1771 to the method of that class (provided they are in scope at all), even
1772 if there are other variables in scope with the same name.
1773 This reduces the clutter of qualified names when you import two
1774 records from different modules that use the same field name.
1775 </para>
1776 <para>
1777 Some details:
1778 <itemizedlist>
1779 <listitem><para>
1780 Field disambiguation can be combined with punning (see <xref linkend="record-puns"/>). For example:
1781 <programlisting>
1782 module Foo where
1783 import M
1784 x=True
1785 ok3 (MkS { x }) = x+1 -- Uses both disambiguation and punning
1786 </programlisting>
1787 </para></listitem>
1788
1789 <listitem><para>
1790 With <option>-XDisambiguateRecordFields</option> you can use <emphasis>unqualified</emphasis>
field names even if the corresponding selector is only in scope <emphasis>qualified</emphasis>.
1792 For example, assuming the same module <literal>M</literal> as in our earlier example, this is legal:
1793 <programlisting>
1794 module Foo where
1795 import qualified M -- Note qualified
1796
1797 ok4 (M.MkS { x = n }) = n+1 -- Unambiguous
1798 </programlisting>
1799 Since the constructor <literal>MkS</literal> is only in scope qualified, you must
1800 name it <literal>M.MkS</literal>, but the field <literal>x</literal> does not need
1801 to be qualified even though <literal>M.x</literal> is in scope but <literal>x</literal>
1802 is not. (In effect, it is qualified by the constructor.)
1803 </para></listitem>
1804 </itemizedlist>
1805 </para>
1806
1807 </sect2>
1808
1809 <!-- ===================== Record puns =================== -->
1810
1811 <sect2 id="record-puns">
1812 <title>Record puns
1813 </title>
1814
1815 <para>
1816 Record puns are enabled by the flag <literal>-XNamedFieldPuns</literal>.
1817 </para>
1818
1819 <para>
1820 When using records, it is common to write a pattern that binds a
1821 variable with the same name as a record field, such as:
1822
1823 <programlisting>
1824 data C = C {a :: Int}
1825 f (C {a = a}) = a
1826 </programlisting>
1827 </para>
1828
1829 <para>
1830 Record punning permits the variable name to be elided, so one can simply
1831 write
1832
1833 <programlisting>
1834 f (C {a}) = a
1835 </programlisting>
1836
1837 to mean the same pattern as above. That is, in a record pattern, the
1838 pattern <literal>a</literal> expands into the pattern <literal>a =
1839 a</literal> for the same name <literal>a</literal>.
1840 </para>
1841
1842 <para>
1843 Note that:
1844 <itemizedlist>
1845 <listitem><para>
1846 Record punning can also be used in an expression, writing, for example,
1847 <programlisting>
1848 let a = 1 in C {a}
1849 </programlisting>
1850 instead of
1851 <programlisting>
1852 let a = 1 in C {a = a}
1853 </programlisting>
1854 The expansion is purely syntactic, so the expanded right-hand side
1855 expression refers to the nearest enclosing variable that is spelled the
1856 same as the field name.
1857 </para></listitem>
1858
1859 <listitem><para>
1860 Puns and other patterns can be mixed in the same record:
1861 <programlisting>
1862 data C = C {a :: Int, b :: Int}
1863 f (C {a, b = 4}) = a
1864 </programlisting>
1865 </para></listitem>
1866
1867 <listitem><para>
1868 Puns can be used wherever record patterns occur (e.g. in
1869 <literal>let</literal> bindings or at the top-level).
1870 </para></listitem>
1871
1872 <listitem><para>
1873 A pun on a qualified field name is expanded by stripping off the module qualifier.
1874 For example:
1875 <programlisting>
f (M.C {M.a}) = a
1877 </programlisting>
1878 means
1879 <programlisting>
1880 f (M.C {M.a = a}) = a
1881 </programlisting>
1882 (This is useful if the field selector <literal>a</literal> for constructor <literal>M.C</literal>
1883 is only in scope in qualified form.)
1884 </para></listitem>
1885 </itemizedlist>
1886 </para>
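<para>
The points above can be combined in one short module. This is a sketch
only: the type <literal>Point</literal> and the functions
<literal>norm1</literal> and <literal>mkPoint</literal> are invented for
illustration.
<programlisting>
{-# LANGUAGE NamedFieldPuns #-}
module PunDemo where

data Point = Point { px :: Int, py :: Int }

-- Pun in a pattern: Point{px, py} means Point{px = px, py = py}.
norm1 :: Point -> Int
norm1 Point{px, py} = abs px + abs py

-- Pun in an expression: each field takes the value of the nearest
-- enclosing variable with the same name (here the argument px and
-- the let-bound py).
mkPoint :: Int -> Point
mkPoint px = let py = 2 * px in Point{px, py}
</programlisting>
</para>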
1887
1888
1889 </sect2>
1890
1891 <!-- ===================== Record wildcards =================== -->
1892
1893 <sect2 id="record-wildcards">
1894 <title>Record wildcards
1895 </title>
1896
1897 <para>
1898 Record wildcards are enabled by the flag <literal>-XRecordWildCards</literal>.
1899 This flag implies <literal>-XDisambiguateRecordFields</literal>.
1900 </para>
1901
1902 <para>
1903 For records with many fields, it can be tiresome to write out each field
1904 individually in a record pattern, as in
1905 <programlisting>
1906 data C = C {a :: Int, b :: Int, c :: Int, d :: Int}
1907 f (C {a = 1, b = b, c = c, d = d}) = b + c + d
1908 </programlisting>
1909 </para>
1910
1911 <para>
1912 Record wildcard syntax permits a "<literal>..</literal>" in a record
1913 pattern, where each elided field <literal>f</literal> is replaced by the
1914 pattern <literal>f = f</literal>. For example, the above pattern can be
1915 written as
1916 <programlisting>
1917 f (C {a = 1, ..}) = b + c + d
1918 </programlisting>
1919 </para>
1920
1921 <para>
1922 More details:
1923 <itemizedlist>
1924 <listitem><para>
1925 Wildcards can be mixed with other patterns, including puns
(<xref linkend="record-puns"/>); for example, in a pattern <literal>C {a = 1, b, ..}</literal>.
Additionally, record wildcards can be used
1928 wherever record patterns occur, including in <literal>let</literal>
1929 bindings and at the top-level. For example, the top-level binding
1930 <programlisting>
1931 C {a = 1, ..} = e
1932 </programlisting>
1933 defines <literal>b</literal>, <literal>c</literal>, and
1934 <literal>d</literal>.
1935 </para></listitem>
1936
1937 <listitem><para>
1938 Record wildcards can also be used in expressions, writing, for example,
1939 <programlisting>
1940 let {a = 1; b = 2; c = 3; d = 4} in C {..}
1941 </programlisting>
1942 in place of
1943 <programlisting>
1944 let {a = 1; b = 2; c = 3; d = 4} in C {a=a, b=b, c=c, d=d}
1945 </programlisting>
1946 The expansion is purely syntactic, so the record wildcard
1947 expression refers to the nearest enclosing variables that are spelled
1948 the same as the omitted field names.
1949 </para></listitem>
1950
1951 <listitem><para>
1952 The "<literal>..</literal>" expands to the missing
1953 <emphasis>in-scope</emphasis> record fields.
1954 Specifically the expansion of "<literal>C {..}</literal>" includes
1955 <literal>f</literal> if and only if:
1956 <itemizedlist>
1957 <listitem><para>
1958 <literal>f</literal> is a record field of constructor <literal>C</literal>.
1959 </para></listitem>
1960 <listitem><para>
1961 The record field <literal>f</literal> is in scope somehow (either qualified or unqualified).
1962 </para></listitem>
1963 <listitem><para>
1964 In the case of expressions (but not patterns),
1965 the variable <literal>f</literal> is in scope unqualified,
1966 apart from the binding of the record selector itself.
1967 </para></listitem>
1968 </itemizedlist>
1969 For example
1970 <programlisting>
1971 module M where
1972 data R = R { a,b,c :: Int }
1973 module X where
1974 import M( R(a,c) )
1975 f b = R { .. }
1976 </programlisting>
1977 The <literal>R{..}</literal> expands to <literal>R{M.a=a}</literal>,
1978 omitting <literal>b</literal> since the record field is not in scope,
1979 and omitting <literal>c</literal> since the variable <literal>c</literal>
1980 is not in scope (apart from the binding of the
1981 record selector <literal>c</literal>, of course).
1982 </para></listitem>
1983 </itemizedlist>
1984 </para>
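<para>
As an illustrative sketch, the module below uses a wildcard once in a
pattern and once in an expression. The record <literal>Config</literal> and
its fields are hypothetical.
<programlisting>
{-# LANGUAGE RecordWildCards #-}
module WildcardDemo where

data Config = Config { host :: String, port :: Int }

-- In a pattern, Config{..} binds host and port.
render :: Config -> String
render Config{..} = host ++ ":" ++ show port

-- In an expression, Config{..} picks up the let-bound variables host
-- and port, because they are in scope unqualified apart from the
-- record selectors themselves.
defaultConfig :: Config
defaultConfig = let { host = "localhost"; port = 8080 } in Config{..}
</programlisting>
</para>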
1985
1986 </sect2>
1987
1988 <!-- ===================== Local fixity declarations =================== -->
1989
1990 <sect2 id="local-fixity-declarations">
1991 <title>Local Fixity Declarations
1992 </title>
1993
1994 <para>A careful reading of the Haskell 98 Report reveals that fixity
1995 declarations (<literal>infix</literal>, <literal>infixl</literal>, and
1996 <literal>infixr</literal>) are permitted to appear inside local bindings
such as those introduced by <literal>let</literal> and
1998 <literal>where</literal>. However, the Haskell Report does not specify
1999 the semantics of such bindings very precisely.
2000 </para>
2001
2002 <para>In GHC, a fixity declaration may accompany a local binding:
2003 <programlisting>
2004 let f = ...
2005 infixr 3 `f`
2006 in
2007 ...
2008 </programlisting>
2009 and the fixity declaration applies wherever the binding is in scope.
2010 For example, in a <literal>let</literal>, it applies in the right-hand
2011 sides of other <literal>let</literal>-bindings and the body of the
<literal>let</literal>. Or, in recursive <literal>do</literal>
2013 expressions (<xref linkend="recursive-do-notation"/>), the local fixity
2014 declarations of a <literal>let</literal> statement scope over other
2015 statements in the group, just as the bound name does.
2016 </para>
2017
2018 <para>
Moreover, a local fixity declaration <emphasis>must</emphasis> accompany a local binding of
that name: it is not possible to revise the fixity of a name bound
2021 elsewhere, as in
2022 <programlisting>
2023 let infixr 9 $ in ...
2024 </programlisting>
2025
2026 Because local fixity declarations are technically Haskell 98, no flag is
2027 necessary to enable them.
2028 </para>
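<para>
To see the effect of a local fixity, consider the following sketch, in
which the operator <literal>|+|</literal> and the name
<literal>localFixity</literal> are made up for the example. An operator
with no fixity declaration defaults to <literal>infixl 9</literal>, so the
declaration here changes how the body is parsed.
<programlisting>
module LocalFixityDemo where

localFixity :: Int
localFixity =
  let infixl 1 |+|          -- binds less tightly than (*)
      a |+| b = a + b
  in 1 |+| 2 * 3            -- parsed as 1 |+| (2 * 3)
</programlisting>
With the default fixity the body would instead parse as
<literal>(1 |+| 2) * 3</literal>.
</para>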
2029 </sect2>
2030
2031 <sect2 id="package-imports">
2032 <title>Package-qualified imports</title>
2033
2034 <para>With the <option>-XPackageImports</option> flag, GHC allows
2035 import declarations to be qualified by the package name that the
2036 module is intended to be imported from. For example:</para>
2037
2038 <programlisting>
2039 import "network" Network.Socket
2040 </programlisting>
2041
2042 <para>would import the module <literal>Network.Socket</literal> from
2043 the package <literal>network</literal> (any version). This may
2044 be used to disambiguate an import when the same module is
2045 available from multiple packages, or is present in both the
2046 current package being built and an external package.</para>
2047
2048 <para>The special package name <literal>this</literal> can be used to
2049 refer to the current package being built.</para>
2050
<para>Note: you probably don't need to use this feature; it was
2052 added mainly so that we can build backwards-compatible versions of
2053 packages when APIs change. It can lead to fragile dependencies in
2054 the common case: modules occasionally move from one package to
2055 another, rendering any package-qualified imports broken.</para>
2056 </sect2>
2057
2058 <sect2 id="safe-imports-ext">
2059 <title>Safe imports</title>
2060
2061 <para>With the <option>-XSafe</option>, <option>-XTrustworthy</option>
2062 and <option>-XUnsafe</option> language flags, GHC extends
2063 the import declaration syntax to take an optional <literal>safe</literal>
2064 keyword after the <literal>import</literal> keyword. This feature
2065 is part of the Safe Haskell GHC extension. For example:</para>
2066
2067 <programlisting>
2068 import safe qualified Network.Socket as NS
2069 </programlisting>
2070
2071 <para>would import the module <literal>Network.Socket</literal>
with compilation only succeeding if <literal>Network.Socket</literal> can be
safely imported. For a description of when an import is
considered safe, see <xref linkend="safe-haskell"/>.</para>
2075
2076 </sect2>
2077
2078 <sect2 id="syntax-stolen">
2079 <title>Summary of stolen syntax</title>
2080
2081 <para>Turning on an option that enables special syntax
2082 <emphasis>might</emphasis> cause working Haskell 98 code to fail
2083 to compile, perhaps because it uses a variable name which has
2084 become a reserved word. This section lists the syntax that is
2085 "stolen" by language extensions.
2086 We use
2087 notation and nonterminal names from the Haskell 98 lexical syntax
2088 (see the Haskell 98 Report).
2089 We only list syntax changes here that might affect
2090 existing working programs (i.e. "stolen" syntax). Many of these
2091 extensions will also enable new context-free syntax, but in all
2092 cases programs written to use the new syntax would not be
2093 compilable without the option enabled.</para>
2094
2095 <para>There are two classes of special
2096 syntax:
2097
2098 <itemizedlist>
2099 <listitem>
2100 <para>New reserved words and symbols: character sequences
2101 which are no longer available for use as identifiers in the
2102 program.</para>
2103 </listitem>
2104 <listitem>
2105 <para>Other special syntax: sequences of characters that have
2106 a different meaning when this particular option is turned
2107 on.</para>
2108 </listitem>
2109 </itemizedlist>
2110
2111 The following syntax is stolen:
2112
2113 <variablelist>
2114 <varlistentry>
2115 <term>
2116 <literal>forall</literal>
2117 <indexterm><primary><literal>forall</literal></primary></indexterm>
2118 </term>
2119 <listitem><para>
2120 Stolen (in types) by: <option>-XExplicitForAll</option>, and hence by
2121 <option>-XScopedTypeVariables</option>,
2122 <option>-XLiberalTypeSynonyms</option>,
2123 <option>-XRankNTypes</option>,
2124 <option>-XExistentialQuantification</option>
2125 </para></listitem>
2126 </varlistentry>
2127
2128 <varlistentry>
2129 <term>
2130 <literal>mdo</literal>
2131 <indexterm><primary><literal>mdo</literal></primary></indexterm>
2132 </term>
2133 <listitem><para>
2134 Stolen by: <option>-XRecursiveDo</option>
2135 </para></listitem>
2136 </varlistentry>
2137
2138 <varlistentry>
2139 <term>
2140 <literal>foreign</literal>
2141 <indexterm><primary><literal>foreign</literal></primary></indexterm>
2142 </term>
2143 <listitem><para>
2144 Stolen by: <option>-XForeignFunctionInterface</option>
2145 </para></listitem>
2146 </varlistentry>
2147
2148 <varlistentry>
2149 <term>
2150 <literal>rec</literal>,
2151 <literal>proc</literal>, <literal>-&lt;</literal>,
2152 <literal>&gt;-</literal>, <literal>-&lt;&lt;</literal>,
2153 <literal>&gt;&gt;-</literal>, and <literal>(|</literal>,
2154 <literal>|)</literal> brackets
2155 <indexterm><primary><literal>proc</literal></primary></indexterm>
2156 </term>
2157 <listitem><para>
2158 Stolen by: <option>-XArrows</option>
2159 </para></listitem>
2160 </varlistentry>
2161
2162 <varlistentry>
2163 <term>
2164 <literal>?<replaceable>varid</replaceable></literal>,
2165 <literal>%<replaceable>varid</replaceable></literal>
2166 <indexterm><primary>implicit parameters</primary></indexterm>
2167 </term>
2168 <listitem><para>
2169 Stolen by: <option>-XImplicitParams</option>
2170 </para></listitem>
2171 </varlistentry>
2172
2173 <varlistentry>
2174 <term>
2175 <literal>[|</literal>,
2176 <literal>[e|</literal>, <literal>[p|</literal>,
2177 <literal>[d|</literal>, <literal>[t|</literal>,
2178 <literal>$(</literal>,
2179 <literal>$<replaceable>varid</replaceable></literal>
2180 <indexterm><primary>Template Haskell</primary></indexterm>
2181 </term>
2182 <listitem><para>
2183 Stolen by: <option>-XTemplateHaskell</option>
2184 </para></listitem>
2185 </varlistentry>
2186
2187 <varlistentry>
2188 <term>
<literal>[<replaceable>varid</replaceable>|</literal>
2190 <indexterm><primary>quasi-quotation</primary></indexterm>
2191 </term>
2192 <listitem><para>
2193 Stolen by: <option>-XQuasiQuotes</option>
2194 </para></listitem>
2195 </varlistentry>
2196
2197 <varlistentry>
2198 <term>
2199 <replaceable>varid</replaceable>{<literal>&num;</literal>},
2200 <replaceable>char</replaceable><literal>&num;</literal>,
2201 <replaceable>string</replaceable><literal>&num;</literal>,
2202 <replaceable>integer</replaceable><literal>&num;</literal>,
2203 <replaceable>float</replaceable><literal>&num;</literal>,
2204 <replaceable>float</replaceable><literal>&num;&num;</literal>,
2205 <literal>(&num;</literal>, <literal>&num;)</literal>
2206 </term>
2207 <listitem><para>
2208 Stolen by: <option>-XMagicHash</option>
2209 </para></listitem>
2210 </varlistentry>
2211 </variablelist>
2212 </para>
2213 </sect2>
2214 </sect1>
2215
2216
2217 <!-- TYPE SYSTEM EXTENSIONS -->
2218 <sect1 id="data-type-extensions">
2219 <title>Extensions to data types and type synonyms</title>
2220
2221 <sect2 id="nullary-types">
2222 <title>Data types with no constructors</title>
2223
2224 <para>With the <option>-XEmptyDataDecls</option> flag (or equivalent LANGUAGE pragma),
2225 GHC lets you declare a data type with no constructors. For example:</para>
2226
2227 <programlisting>
2228 data S -- S :: *
2229 data T a -- T :: * -> *
2230 </programlisting>
2231
2232 <para>Syntactically, the declaration lacks the "= constrs" part. The
2233 type can be parameterised over types of any kind, but if the kind is
2234 not <literal>*</literal> then an explicit kind annotation must be used
2235 (see <xref linkend="kinding"/>).</para>
2236
2237 <para>Such data types have only one value, namely bottom.
2238 Nevertheless, they can be useful when defining "phantom types".</para>
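<para>
A minimal sketch of the phantom-type idiom follows. The types
<literal>Metres</literal>, <literal>Feet</literal> and
<literal>Distance</literal> are hypothetical; the empty types are used only
as type-level tags and never have values constructed for them.
<programlisting>
{-# LANGUAGE EmptyDataDecls #-}
module PhantomDemo where

data Metres   -- no constructors
data Feet     -- no constructors

-- The parameter unit does not appear on the right-hand side.
newtype Distance unit = Distance Double

-- Both arguments must carry the same unit tag.
addDistance :: Distance unit -> Distance unit -> Distance unit
addDistance (Distance a) (Distance b) = Distance (a + b)

total :: Distance Metres
total = addDistance (Distance 1.5) (Distance 2.5)
</programlisting>
Adding a <literal>Distance Metres</literal> to a
<literal>Distance Feet</literal> is then rejected by the type checker.
</para>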
2239 </sect2>
2240
2241 <sect2 id="datatype-contexts">
2242 <title>Data type contexts</title>
2243
2244 <para>Haskell allows datatypes to be given contexts, e.g.</para>
2245
2246 <programlisting>
2247 data Eq a => Set a = NilSet | ConsSet a (Set a)
2248 </programlisting>
2249
<para>This gives the constructors the following types:</para>
2251
2252 <programlisting>
2253 NilSet :: Set a
2254 ConsSet :: Eq a => a -> Set a -> Set a
2255 </programlisting>
2256
2257 <para>This is widely considered a misfeature, and is going to be removed from
2258 the language. In GHC, it is controlled by the deprecated extension
2259 <literal>DatatypeContexts</literal>.</para>
2260 </sect2>
2261
2262 <sect2 id="infix-tycons">
2263 <title>Infix type constructors, classes, and type variables</title>
2264
2265 <para>
2266 GHC allows type constructors, classes, and type variables to be operators, and
2267 to be written infix, very much like expressions. More specifically:
2268 <itemizedlist>
2269 <listitem><para>
2270 A type constructor or class can be an operator, beginning with a colon; e.g. <literal>:*:</literal>.
2271 The lexical syntax is the same as that for data constructors.
2272 </para></listitem>
2273 <listitem><para>
2274 Data type and type-synonym declarations can be written infix, parenthesised
2275 if you want further arguments. E.g.
2276 <screen>
2277 data a :*: b = Foo a b
2278 type a :+: b = Either a b
2279 class a :=: b where ...
2280
2281 data (a :**: b) x = Baz a b x
2282 type (a :++: b) y = Either (a,b) y
2283 </screen>
2284 </para></listitem>
2285 <listitem><para>
2286 Types, and class constraints, can be written infix. For example
2287 <screen>
2288 x :: Int :*: Bool
2289 f :: (a :=: b) => a -> b
2290 </screen>
2291 </para></listitem>
2292 <listitem><para>
2293 Back-quotes work
2294 as for expressions, both for type constructors and type variables; e.g. <literal>Int `Either` Bool</literal>, or
2295 <literal>Int `a` Bool</literal>. Similarly, parentheses work the same; e.g. <literal>(:*:) Int Bool</literal>.
2296 </para></listitem>
2297 <listitem><para>
2298 Fixities may be declared for type constructors, or classes, just as for data constructors. However,
2299 one cannot distinguish between the two in a fixity declaration; a fixity declaration
2300 sets the fixity for a data constructor and the corresponding type constructor. For example:
2301 <screen>
infixl 7 `T`, :*:
2303 </screen>
2304 sets the fixity for both type constructor <literal>T</literal> and data constructor <literal>T</literal>,
2305 and similarly for <literal>:*:</literal>.
2307 </para></listitem>
2308 <listitem><para>
2309 Function arrow is <literal>infixr</literal> with fixity 0. (This might change; I'm not sure what it should be.)
2310 </para></listitem>
2311
2312 </itemizedlist>
2313 </para>
2314 </sect2>
2315
2316 <sect2 id="type-operators">
2317 <title>Type operators</title>
2318 <para>
2319 In types, an operator symbol like <literal>(+)</literal> is normally treated as a type
2320 <emphasis>variable</emphasis>, just like <literal>a</literal>. Thus in Haskell 98 you can say
2321 <programlisting>
2322 type T (+) = ((+), (+))
2323 -- Just like: type T a = (a,a)
2324
2325 f :: T Int -> Int
2326 f (x,y)= x
2327 </programlisting>
2328 As you can see, using operators in this way is not very useful, and Haskell 98 does not even
2329 allow you to write them infix.
2330 </para>
2331 <para>
The language extension <option>-XTypeOperators</option> changes this behaviour:
2333 <itemizedlist>
2334 <listitem><para>
2335 Operator symbols become type <emphasis>constructors</emphasis> rather than
2336 type <emphasis>variables</emphasis>.
2337 </para></listitem>
2338 <listitem><para>
2339 Operator symbols in types can be written infix, both in definitions and uses.
For example:
2341 <programlisting>
2342 data a + b = Plus a b
2343 type Foo = Int + Bool
2344 </programlisting>
2345 </para></listitem>
2346 <listitem><para>
2347 There is now some potential ambiguity in import and export lists; for example
2348 if you write <literal>import M( (+) )</literal> do you mean the
2349 <emphasis>function</emphasis> <literal>(+)</literal> or the
2350 <emphasis>type constructor</emphasis> <literal>(+)</literal>?
2351 The default is the former, but GHC allows you to specify the latter
2352 by preceding it with the keyword <literal>type</literal>, thus:
2353 <programlisting>
2354 import M( type (+) )
2355 </programlisting>
2356 </para></listitem>
2357 <listitem><para>
2358 The fixity of a type operator may be set using the usual fixity declarations
2359 but, as in <xref linkend="infix-tycons"/>, the function and type constructor share
2360 a single fixity.
2361 </para></listitem>
2362 </itemizedlist>
2363 </para>
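<para>
The following sketch puts these points together. The sum type
<literal>+</literal>, its constructors <literal>Inl</literal> and
<literal>Inr</literal>, and the function <literal>toInt</literal> are
invented for illustration.
<programlisting>
{-# LANGUAGE TypeOperators #-}
module TypeOperatorDemo where

-- (+) is a type constructor here, declared and used infix.
data a + b = Inl a | Inr b

type IntOrBool = Int + Bool

-- The function arrow binds less tightly than the type operator,
-- so this reads as (Int + Bool) -> Int.
toInt :: Int + Bool -> Int
toInt (Inl n) = n
toInt (Inr b) = if b then 1 else 0
</programlisting>
The value-level function <literal>(+)</literal> from the Prelude is
unaffected, since types and values live in separate namespaces.
</para>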
2364 </sect2>
2365
2366 <sect2 id="type-synonyms">
2367 <title>Liberalised type synonyms</title>
2368
2369 <para>
2370 Type synonyms are like macros at the type level, but Haskell 98 imposes many rules
2371 on individual synonym declarations.
2372 With the <option>-XLiberalTypeSynonyms</option> extension,
2373 GHC does validity checking on types <emphasis>only after expanding type synonyms</emphasis>.
2374 That means that GHC can be very much more liberal about type synonyms than Haskell 98.
2375
2376 <itemizedlist>
2377 <listitem> <para>You can write a <literal>forall</literal> (including overloading)
2378 in a type synonym, thus:
2379 <programlisting>
2380 type Discard a = forall b. Show b => a -> b -> (a, String)
2381
2382 f :: Discard a
2383 f x y = (x, show y)
2384
2385 g :: Discard Int -> (Int,String) -- A rank-2 type
2386 g f = f 3 True
2387 </programlisting>
2388 </para>
2389 </listitem>
2390
2391 <listitem><para>
2392 If you also use <option>-XUnboxedTuples</option>,
2393 you can write an unboxed tuple in a type synonym:
2394 <programlisting>
2395 type Pr = (# Int, Int #)
2396
2397 h :: Int -> Pr
2398 h x = (# x, x #)
2399 </programlisting>
2400 </para></listitem>
2401
2402 <listitem><para>
2403 You can apply a type synonym to a forall type:
2404 <programlisting>
2405 type Foo a = a -> a -> Bool
2406
2407 f :: Foo (forall b. b->b)
2408 </programlisting>
2409 After expanding the synonym, <literal>f</literal> has the legal (in GHC) type:
2410 <programlisting>
2411 f :: (forall b. b->b) -> (forall b. b->b) -> Bool
2412 </programlisting>
2413 </para></listitem>
2414
2415 <listitem><para>
2416 You can apply a type synonym to a partially applied type synonym:
2417 <programlisting>
2418 type Generic i o = forall x. i x -> o x
2419 type Id x = x
2420
2421 foo :: Generic Id []
2422 </programlisting>
2423 After expanding the synonym, <literal>foo</literal> has the legal (in GHC) type:
2424 <programlisting>
2425 foo :: forall x. x -> [x]
2426 </programlisting>
2427 </para></listitem>
2428
2429 </itemizedlist>
2430 </para>
2431
2432 <para>
GHC currently does kind checking before expanding synonyms (though even that
could be changed).
2435 </para>
2436 <para>
2437 After expanding type synonyms, GHC does validity checking on types, looking for
2438 the following mal-formedness which isn't detected simply by kind checking:
2439 <itemizedlist>
2440 <listitem><para>
2441 Type constructor applied to a type involving for-alls.
2442 </para></listitem>
2443 <listitem><para>
2444 Unboxed tuple on left of an arrow.
2445 </para></listitem>
2446 <listitem><para>
2447 Partially-applied type synonym.
2448 </para></listitem>
2449 </itemizedlist>
2450 So, for example,
2451 this will be rejected:
2452 <programlisting>
2453 type Pr = (# Int, Int #)
2454
2455 h :: Pr -> Int
2456 h x = ...
2457 </programlisting>
2458 because GHC does not allow unboxed tuples on the left of a function arrow.
2459 </para>
2460 </sect2>
2461
2462
2463 <sect2 id="existential-quantification">
2464 <title>Existentially quantified data constructors
2465 </title>
2466
2467 <para>
2468 The idea of using existential quantification in data type declarations
2469 was suggested by Perry, and implemented in Hope+ (Nigel Perry, <emphasis>The Implementation
2470 of Practical Functional Programming Languages</emphasis>, PhD Thesis, University of
2471 London, 1991). It was later formalised by Laufer and Odersky
2472 (<emphasis>Polymorphic type inference and abstract data types</emphasis>,
2473 TOPLAS, 16(5), pp1411-1430, 1994).
2474 It's been in Lennart
2475 Augustsson's <command>hbc</command> Haskell compiler for several years, and
2476 proved very useful. Here's the idea. Consider the declaration:
2477 </para>
2478
2479 <para>
2480
2481 <programlisting>
2482 data Foo = forall a. MkFoo a (a -> Bool)
2483 | Nil
2484 </programlisting>
2485
2486 </para>
2487
2488 <para>
2489 The data type <literal>Foo</literal> has two constructors with types:
2490 </para>
2491
2492 <para>
2493
2494 <programlisting>
2495 MkFoo :: forall a. a -> (a -> Bool) -> Foo
2496 Nil :: Foo
2497 </programlisting>
2498
2499 </para>
2500
2501 <para>
2502 Notice that the type variable <literal>a</literal> in the type of <function>MkFoo</function>
2503 does not appear in the data type itself, which is plain <literal>Foo</literal>.
2504 For example, the following expression is fine:
2505 </para>
2506
2507 <para>
2508
2509 <programlisting>
2510 [MkFoo 3 even, MkFoo 'c' isUpper] :: [Foo]
2511 </programlisting>
2512
2513 </para>
2514
2515 <para>
2516 Here, <literal>(MkFoo 3 even)</literal> packages an integer with a function
<function>even</function> that maps an integer to <literal>Bool</literal>; and <literal>(MkFoo 'c'
isUpper)</literal> packages a character with a compatible function. These
2519 two things are each of type <literal>Foo</literal> and can be put in a list.
2520 </para>
2521
2522 <para>
2523 What can we do with a value of type <literal>Foo</literal>? In particular,
2524 what happens when we pattern-match on <function>MkFoo</function>?
2525 </para>
2526
2527 <para>
2528
2529 <programlisting>
2530 f (MkFoo val fn) = ???
2531 </programlisting>
2532
2533 </para>
2534
2535 <para>
2536 Since all we know about <literal>val</literal> and <function>fn</function> is that they
2537 are compatible, the only (useful) thing we can do with them is to
2538 apply <function>fn</function> to <literal>val</literal> to get a boolean. For example:
2539 </para>
2540
2541 <para>
2542
2543 <programlisting>
2544 f :: Foo -> Bool
2545 f (MkFoo val fn) = fn val
2546 </programlisting>
2547
2548 </para>
2549
2550 <para>
2551 What this allows us to do is to package heterogeneous values
2552 together with a bunch of functions that manipulate them, and then treat
2553 that collection of packages in a uniform manner. You can express
2554 quite a bit of object-oriented-like programming this way.
2555 </para>
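<para>
As a minimal sketch, the pieces above can be assembled into a complete module.
The name <literal>foos</literal>, the equation for <literal>Nil</literal> and
<literal>main</literal> are assumptions added for illustration; they are not
part of the example in the text.
```haskell
{-# LANGUAGE ExistentialQuantification #-}
module Main where

import Data.Char (isUpper)

data Foo = forall a. MkFoo a (a -> Bool)
         | Nil

-- Apply the packaged function to the packaged value.
f :: Foo -> Bool
f (MkFoo val fn) = fn val
f Nil            = False   -- assumption: the text leaves this case open

-- A heterogeneous collection, treated uniformly through f.
foos :: [Foo]
foos = [MkFoo (4 :: Int) even, MkFoo 'c' isUpper, Nil]

main :: IO ()
main = print (map f foos)   -- prints [True,False,False]
```
Each element of <literal>foos</literal> hides a different type, yet
<literal>f</literal> needs to know nothing about it beyond the fact that the
stored function accepts the stored value.
</para>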
2556
2557 <sect3 id="existential">
2558 <title>Why existential?
2559 </title>
2560
2561 <para>
2562 What has this to do with <emphasis>existential</emphasis> quantification?
2563 Simply that <function>MkFoo</function> has the (nearly) isomorphic type
2564 </para>
2565
2566 <para>
2567
2568 <programlisting>
2569 MkFoo :: (exists a . (a, a -> Bool)) -> Foo
2570 </programlisting>
2571
2572 </para>
2573
2574 <para>
2575 But Haskell programmers can safely think of the ordinary
2576 <emphasis>universally</emphasis> quantified type given above, thereby avoiding
2577 adding a new existential quantification construct.
2578 </para>
2579
2580 </sect3>
2581
2582 <sect3 id="existential-with-context">
2583 <title>Existentials and type classes</title>
2584
2585 <para>
2586 An easy extension is to allow
2587 arbitrary contexts before the constructor. For example:
2588 </para>
2589
2590 <para>
2591
2592 <programlisting>
2593 data Baz = forall a. Eq a => Baz1 a a
2594 | forall b. Show b => Baz2 b (b -> b)
2595 </programlisting>
2596
2597 </para>
2598
2599 <para>
2600 The two constructors have the types you'd expect:
2601 </para>
2602
2603 <para>
2604
2605 <programlisting>
2606 Baz1 :: forall a. Eq a => a -> a -> Baz
2607 Baz2 :: forall b. Show b => b -> (b -> b) -> Baz
2608 </programlisting>
2609
2610 </para>
2611
2612 <para>
2613 But when pattern matching on <function>Baz1</function> the matched values can be compared
2614 for equality, and when pattern matching on <function>Baz2</function> the first matched
2615 value can be converted to a string (as well as applying the function to it).
2616 So this program is legal:
2617 </para>
2618
2619 <para>
2620
2621 <programlisting>
2622 f :: Baz -> String
2623 f (Baz1 p q) | p == q = "Yes"
2624 | otherwise = "No"
2625 f (Baz2 v fn) = show (fn v)
2626 </programlisting>
2627
2628 </para>
2629
2630 <para>
2631 Operationally, in a dictionary-passing implementation, the
2632 constructors <function>Baz1</function> and <function>Baz2</function> must store the
2633 dictionaries for <literal>Eq</literal> and <literal>Show</literal> respectively, and
2634 extract them on pattern matching.
2635 </para>
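<para>
One way to picture this is to write the dictionaries out by hand. The following
is only a sketch: <literal>EqDict</literal>, <literal>ShowDict</literal>,
<literal>BazD</literal> and <literal>g</literal> are hypothetical names, and a
real dictionary holds every method of its class, not just the one shown.
```haskell
{-# LANGUAGE ExistentialQuantification #-}
module Main where

-- Hypothetical hand-written "dictionaries", one method each.
data EqDict a   = EqDict   { eqD   :: a -> a -> Bool }
data ShowDict a = ShowDict { showD :: a -> String }

-- Like Baz, but each constructor stores its dictionary as an ordinary field.
data BazD = forall a. BazD1 (EqDict a) a a
          | forall b. BazD2 (ShowDict b) b (b -> b)

-- Pattern matching extracts the stored dictionary, which is then used
-- in place of the overloaded (==) and show.
g :: BazD -> String
g (BazD1 d p q) | eqD d p q = "Yes"
                | otherwise = "No"
g (BazD2 d v fn) = showD d (fn v)

main :: IO ()
main = do
  putStrLn (g (BazD1 (EqDict (==)) 'x' 'x'))               -- prints Yes
  putStrLn (g (BazD2 (ShowDict show) (1 :: Int) (+ 1)))    -- prints 2
```
With the class contexts of <literal>Baz</literal>, GHC does this bookkeeping
for you: the dictionary is supplied at construction and recovered at the
pattern match.
</para>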
2636
2637 </sect3>
2638
2639 <sect3 id="existential-records">
2640 <title>Record Constructors</title>
2641
2642 <para>
2643 GHC allows existentials to be used with record syntax as well. For example:
2644
2645 <programlisting>
2646 data Counter a = forall self. NewCounter
2647 { _this :: self
2648 , _inc :: self -> self
2649 , _display :: self -> IO ()
2650 , tag :: a
2651 }
2652 </programlisting>
2653 Here <literal>tag</literal> is a public field, with a well-typed selector
2654 function <literal>tag :: Counter a -> a</literal>. The <literal>self</literal>
2655 type is hidden from the outside; any attempt to apply <literal>_this</literal>,
2656 <literal>_inc</literal> or <literal>_display</literal> as functions will raise a
2657 compile-time error. In other words, <emphasis>GHC defines a record selector function
2658 only for fields whose type does not mention the existentially-quantified variables</emphasis>.
2659 (This example used an underscore in the fields for which record selectors
2660 will not be defined, but that is only programming style; GHC ignores them.)
2661 </para>
2662
2663 <para>
2664 To make use of these hidden fields, we need to create some helper functions:
2665
2666 <programlisting>
2667 inc :: Counter a -> Counter a
2668 inc (NewCounter x i d t) = NewCounter
2669 { _this = i x, _inc = i, _display = d, tag = t }
2670
2671 display :: Counter a -> IO ()
2672 display NewCounter{ _this = x, _display = d } = d x
2673 </programlisting>
2674
2675 Now we can define counters with different underlying implementations:
2676
2677 <programlisting>
2678 counterA :: Counter String
2679 counterA = NewCounter
2680 { _this = 0, _inc = (1+), _display = print, tag = "A" }
2681
2682 counterB :: Counter String
2683 counterB = NewCounter
2684 { _this = "", _inc = ('#':), _display = putStrLn, tag = "B" }
2685
2686 main = do
2687 display (inc counterA) -- prints "1"
2688 display (inc (inc counterB)) -- prints "##"
2689 </programlisting>
2690
2691 Record update syntax is supported for existentials (and GADTs):
2692 <programlisting>
2693 setTag :: Counter a -> a -> Counter a
2694 setTag obj t = obj{ tag = t }
2695 </programlisting>
2696 The rule for record update is this: <emphasis>
2697 the types of the updated fields may
2698 mention only the universally-quantified type variables
2699 of the data constructor. For GADTs, the field may mention only types
2700 that appear as a simple type-variable argument in the constructor's result
2701 type</emphasis>. For example:
2702 <programlisting>
2703 data T a b where { T1 :: { f1::a, f2::b, f3::(b,c) } -> T a b } -- c is existential
2704 upd1 t x = t { f1=x } -- OK: upd1 :: T a b -> a' -> T a' b
2705 upd2 t x = t { f3=x } -- BAD (f3's type mentions c, which is
2706 -- existentially quantified)
2707
2708 data G a b where { G1 :: { g1::a, g2::c } -> G a [c] }
2709 upd3 g x = g { g1=x } -- OK: upd3 :: G a b -> c -> G c b
2710 upd4 g x = g { g2=x } -- BAD (g2's type mentions c, which is not a simple
2711 -- type-variable argument in G1's result type)
2712 </programlisting>
2713 </para>
2714
2715 </sect3>
2716
2717
2718 <sect3>
2719 <title>Restrictions</title>
2720
2721 <para>
2722 There are several restrictions on the ways in which existentially-quantified
2723 constructors can be used.
2724 </para>
2725
2726 <para>
2727
2728 <itemizedlist>
2729 <listitem>
2730
2731 <para>
2732 When pattern matching, each pattern match introduces a new,
2733 distinct, type for each existential type variable. These types cannot
2734 be unified with any other type, nor can they escape from the scope of
2735 the pattern match. For example, these fragments are incorrect:
2736
2737
2738 <programlisting>
2739 f1 (MkFoo a f) = a
2740 </programlisting>
2741
2742
2743 Here, the type bound by <function>MkFoo</function> "escapes", because <literal>a</literal>
2744 is the result of <function>f1</function>. One way to see why this is wrong is to
2745 ask what type <function>f1</function> has:
2746
2747
2748 <programlisting>
2749 f1 :: Foo -> a -- Weird!
2750 </programlisting>
2751
2752
2753 What is this "<literal>a</literal>" in the result type? Clearly we don't mean
2754 this:
2755
2756
2757 <programlisting>
2758 f1 :: forall a. Foo -> a -- Wrong!
2759 </programlisting>
2760
2761
2762 The original program is just plain wrong. Here's another sort of error:
2763
2764
2765 <programlisting>
2766 f2 (Baz1 a b) (Baz1 p q) = a==q
2767 </programlisting>
2768
2769
2770 It's ok to say <literal>a==b</literal> or <literal>p==q</literal>, but
2771 <literal>a==q</literal> is wrong because it equates the two distinct types arising
2772 from the two <function>Baz1</function> constructors.
2773
2774
2775 </para>
2776 </listitem>
2777 <listitem>
2778
2779 <para>
2780 You can't pattern-match on an existentially quantified
2781 constructor in a <literal>let</literal> or <literal>where</literal> group of
2782 bindings. So this is illegal:
2783
2784
2785 <programlisting>
2786 f3 x = a==b where { Baz1 a b = x }
2787 </programlisting>
2788
2789 Instead, use a <literal>case</literal> expression:
2790
2791 <programlisting>
2792 f3 x = case x of Baz1 a b -> a==b
2793 </programlisting>
2794
2795 In general, you can only pattern-match
2796 on an existentially-quantified constructor in a <literal>case</literal> expression or
2797 in the patterns of a function definition.
2798
2799 The reason for this restriction is really an implementation one.
2800 Type-checking binding groups is already a nightmare without
2801 existentials complicating the picture. Also an existential pattern
2802 binding at the top level of a module doesn't make sense, because it's
2803 not clear how to prevent the existentially-quantified type "escaping".
2804 So for now, there's a simple-to-state restriction. We'll see how
2805 annoying it is.
2806
2807 </para>
2808 </listitem>
2809 <listitem>
2810
2811 <para>
2812 You can't use existential quantification for <literal>newtype</literal>
2813 declarations. So this is illegal:
2814
2815
2816 <programlisting>
2817 newtype T = forall a. Ord a => MkT a
2818 </programlisting>
2819
2820
2821 Reason: a value of type <literal>T</literal> must be represented as a
2822 pair of a dictionary for <literal>Ord a</literal> and a value of type
2823 <literal>a</literal>. That contradicts the idea that
2824 <literal>newtype</literal> should have no concrete representation.
2825 You can get just the same efficiency and effect by using
2826 <literal>data</literal> instead of <literal>newtype</literal>. If
2827 there is no overloading involved, then there is more of a case for
2828 allowing an existentially-quantified <literal>newtype</literal>,
2829 because the <literal>data</literal> version does carry an
2830 implementation cost, but single-field existentially quantified
2831 constructors aren't much use. So the simple restriction (no
2832 existential stuff on <literal>newtype</literal>) stands, unless there
2833 are convincing reasons to change it.
2834
2835
2836 </para>
2837 </listitem>
2838 <listitem>
2839
2840 <para>
2841 You can't use <literal>deriving</literal> to define instances of a
2842 data type with existentially quantified data constructors.
2843
2844 Reason: in most cases it would not make sense. For example:
2845
2846 <programlisting>
2847 data T = forall a. MkT [a] deriving( Eq )
2848 </programlisting>
2849
2850 To derive <literal>Eq</literal> in the standard way we would need to have equality
2851 between the single component of two <function>MkT</function> constructors:
2852
2853 <programlisting>
2854 instance Eq T where
2855 (MkT a) == (MkT b) = ???
2856 </programlisting>
2857
2858 But <varname>a</varname> and <varname>b</varname> have distinct types, and so can't be compared.
2859 It's just about possible to imagine examples in which the derived instance
2860 would make sense, but it seems altogether simpler simply to prohibit such
2861 declarations. Define your own instances!
2862 </para>
2863 </listitem>
2864
2865 </itemizedlist>
2866
2867 </para>
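<para>
To illustrate the last two restrictions together, here is a small sketch of a
hand-written instance and of matching with <literal>case</literal>. It uses a
variant of <literal>T</literal> with a <literal>Show</literal> context, which
is an assumption made for this example (the <literal>T</literal> above has no
context); <literal>size</literal> is likewise a hypothetical helper.
```haskell
{-# LANGUAGE ExistentialQuantification #-}
module Main where

-- Variant of T whose constructor stores a Show dictionary
-- (an assumption for this example).
data T = forall a. Show a => MkT [a]

-- No deriving clause, so the instance is written by hand.
instance Show T where
  show (MkT xs) = "MkT " ++ show xs

-- Existential constructors are matched with case (or in function
-- patterns), not in let or where bindings.
size :: T -> Int
size t = case t of MkT xs -> length xs

main :: IO ()
main = do
  print (MkT "ab")                   -- prints MkT "ab"
  print (size (MkT [True, False]))   -- prints 2
```
An <literal>Eq</literal> instance cannot be written the same way, because the
two lists being compared may have different element types.
</para>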
2868
2869 </sect3>
2870 </sect2>
2871
2872 <!-- ====================== Generalised algebraic data types ======================= -->
2873
2874 <sect2 id="gadt-style">
2875 <title>Declaring data types with explicit constructor signatures</title>
2876
2877 <para>When the <literal>GADTSyntax</literal> extension is enabled,
2878 GHC allows you to declare an algebraic data type by
2879 giving the type signatures of constructors explicitly. For example:
2880 <programlisting>
2881 data Maybe a where
2882 Nothing :: Maybe a
2883 Just :: a -> Maybe a
2884 </programlisting>
2885 The form is called a "GADT-style declaration"
2886 because Generalised Algebraic Data Types, described in <xref linkend="gadt"/>,
2887 can only be declared using this form.</para>
2888 <para>Notice that GADT-style syntax generalises existential types (<xref linkend="existential-quantification"/>).
2889 For example, these two declarations are equivalent:
2890 <programlisting>
2891 data Foo = forall a. MkFoo a (a -> Bool)
2892 data Foo' where { MkFoo' :: a -> (a->Bool) -> Foo' }
2893 </programlisting>
2894 </para>
2895 <para>Any data type that can be declared in standard Haskell-98 syntax
2896 can also be declared using GADT-style syntax.
2897 The choice is largely stylistic, but GADT-style declarations differ in one important respect:
2898 they treat class constraints on the data constructors differently.
2899 Specifically, if the constructor is given a type-class context, that
2900 context is made available by pattern matching. For example:
2901 <programlisting>
2902 data Set a where
2903 MkSet :: Eq a => [a] -> Set a
2904
2905 makeSet :: Eq a => [a] -> Set a
2906 makeSet xs = MkSet (nub xs)
2907
2908 insert :: a -> Set a -> Set a
2909 insert a (MkSet as) | a `elem` as = MkSet as
2910 | otherwise = MkSet (a:as)
2911 </programlisting>
2912 A use of <literal>MkSet</literal> as a constructor (e.g. in the definition of <literal>makeSet</literal>)
2913 gives rise to a <literal>(Eq a)</literal>
2914 constraint, as you would expect. The new feature is that pattern-matching on <literal>MkSet</literal>
2915 (as in the definition of <literal>insert</literal>) makes <emphasis>available</emphasis> an <literal>(Eq a)</literal>
2916 context. In implementation terms, the <literal>MkSet</literal> constructor has a hidden field that stores
2917 the <literal>(Eq a)</literal> dictionary that is passed to <literal>MkSet</literal>; so
2918 when pattern-matching that dictionary becomes available for the right-hand side of the match.
2919 In the example, the equality dictionary is used to satisfy the equality constraint
2920 generated by the call to <literal>elem</literal>, so that the type of
2921 <literal>insert</literal> itself has no <literal>Eq</literal> constraint.
2922 </para>
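<para>
The <literal>Set</literal> example can be tried out as a complete module along
the following lines. The pragma, the import of <literal>nub</literal>, the
helper <literal>toList</literal> and <literal>main</literal> are additions made
for this sketch.
```haskell
{-# LANGUAGE GADTs #-}
module Main where

import Data.List (nub)

data Set a where
  MkSet :: Eq a => [a] -> Set a

-- Constructing a Set requires Eq a.
makeSet :: Eq a => [a] -> Set a
makeSet xs = MkSet (nub xs)

-- No Eq constraint in this signature: matching on MkSet supplies it.
insert :: a -> Set a -> Set a
insert a (MkSet as) | a `elem` as = MkSet as
                    | otherwise   = MkSet (a:as)

-- Hypothetical helper to observe the contents.
toList :: Set a -> [a]
toList (MkSet xs) = xs

main :: IO ()
main = print (toList (insert 3 (insert 1 (makeSet [1, 2, 2 :: Int]))))
-- prints [3,1,2]
```
</para>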
2923 <para>
2924 For example, one possible application is to reify dictionaries:
2925 <programlisting>
2926 data NumInst a where
2927 MkNumInst :: Num a => NumInst a
2928
2929 intInst :: NumInst Int
2930 intInst = MkNumInst
2931
2932 plus :: NumInst a -> a -> a -> a
2933 plus MkNumInst p q = p + q
2934 </programlisting>
2935 Here, a value of type <literal>NumInst a</literal> is equivalent
2936 to an explicit <literal>(Num a)</literal> dictionary.
2937 </para>
2938 <para>
2939 All this applies to constructors declared using the syntax of <xref linkend="existential-with-context"/>.
2940 For example, the <literal>NumInst</literal> data type above could equivalently be declared
2941 like this:
2942 <programlisting>
2943 data NumInst a
2944 = Num a => MkNumInst
2945 </programlisting>
2946 Notice that, unlike the situation when declaring an existential, there is
2947 no <literal>forall</literal>, because the <literal>Num</literal> constrains the
2948 data type's universally quantified type variable <literal>a</literal>.
2949 A constructor may have both universal and existential type variables: for example,
2950 the following two declarations are equivalent:
2951 <programlisting>
2952 data T1 a
2953 = forall b. (Num a, Eq b) => MkT1 a b
2954 data T2 a where
2955 MkT2 :: (Num a, Eq b) => a -> b -> T2 a
2956 </programlisting>
2957 </para>
2958 <para>All this behaviour contrasts with Haskell 98's peculiar treatment of
2959 contexts on a data type declaration (Section 4.2.1 of the Haskell 98 Report).
2960 In Haskell 98 the definition
2961 <programlisting>
2962 data Eq a => Set' a = MkSet' [a]
2963 </programlisting>
2964 gives <literal>MkSet'</literal> the same type as <literal>MkSet</literal> above. But instead of
2965 <emphasis>making available</emphasis> an <literal>(Eq a)</literal> constraint, pattern-matching
2966 on <literal>MkSet'</literal> <emphasis>requires</emphasis> an <literal>(Eq a)</literal> constraint!
2967 GHC faithfully implements this behaviour, odd though it is. But for GADT-style declarations,
2968 GHC's behaviour is much more useful, as well as much more intuitive.
2969 </para>
2970
2971 <para>
2972 The rest of this section gives further details about GADT-style data
2973 type declarations.
2974
2975 <itemizedlist>
2976 <listitem><para>
2977 The result type of each data constructor must begin with the type constructor being defined.
2978 If the result type of all constructors
2979 has the form <literal>T a1 ... an</literal>, where <literal>a1 ... an</literal>
2980 are distinct type variables, then the data type is <emphasis>ordinary</emphasis>;
2981 otherwise it is a <emphasis>generalised</emphasis> data type (<xref linkend="gadt"/>).
2982 </para></listitem>
2983
2984 <listitem><para>
2985 As with other type signatures, you can give a single signature for several data constructors.
2986 In this example we give a single signature for <literal>T1</literal> and <literal>T2</literal>:
2987 <programlisting>
2988 data T a where
2989 T1,T2 :: a -> T a
2990 T3 :: T a
2991 </programlisting>
2992 </para></listitem>
2993
2994 <listitem><para>
2995 The type signature of
2996 each constructor is independent, and is implicitly universally quantified as usual.
2997 In particular, the type variable(s) in the "<literal>data T a where</literal>" header
2998 have no scope, and different constructors may have different universally-quantified type variables:
2999 <programlisting>
3000 data T a where -- The 'a' has no scope
3001 T1,T2 :: b -> T b -- Means forall b. b -> T b
3002 T3 :: T a -- Means forall a. T a
3003 </programlisting>
3004 </para></listitem>
3005
3006 <listitem><para>
3007 A constructor signature may mention type class constraints, which can differ for
3008 different constructors. For example, this is fine:
3009 <programlisting>
3010 data T a where
3011 T1 :: Eq b => b -> b -> T b
3012 T2 :: (Show c, Ix c) => c -> [c] -> T c
3013 </programlisting>
3014 When pattern matching, these constraints are made available to discharge constraints
3015 in the body of the match. For example:
3016 <programlisting>
3017 f :: T a -> String
3018 f (T1 x y) | x==y = "yes"
3019 | otherwise = "no"
3020 f (T2 a b) = show a
3021 </programlisting>
3022 Note that <literal>f</literal> is not overloaded; the <literal>Eq</literal> constraint arising
3023 from the use of <literal>==</literal> is discharged by the pattern match on <literal>T1</literal>
3024 and similarly the <literal>Show</literal> constraint arising from the use of <literal>show</literal>.
3025 </para></listitem>
3026
3027 <listitem><para>
3028 Unlike a Haskell-98-style
3029 data type declaration, the type variable(s) in the "<literal>data Set a where</literal>" header
3030 have no scope. Indeed, one can write a kind signature instead:
3031 <programlisting>
3032 data Set :: * -> * where ...
3033 </programlisting>
3034 or even a mixture of the two:
3035 <programlisting>
3036 data Bar a :: (* -> *) -> * where ...
3037 </programlisting>
3038 The type variables (if given) may be explicitly kinded, so we could also write the header for <literal>Bar</literal>
3039 like this:
3040 <programlisting>
3041 data Bar a (b :: * -> *) where ...
3042 </programlisting>
3043 </para></listitem>
3044
3045
3046 <listitem><para>
3047 You can use strictness annotations, in the obvious places
3048 in the constructor type:
3049 <programlisting>
3050 data Term a where
3051 Lit :: !Int -> Term Int
3052 If :: Term Bool -> !(Term a) -> !(Term a) -> Term a
3053 Pair :: Term a -> Term b -> Term (a,b)
3054 </programlisting>
3055 </para></listitem>
3056
3057 <listitem><para>
3058 You can use a <literal>deriving</literal> clause on a GADT-style data type
3059 declaration. For example, these two declarations are equivalent:
3060 <programlisting>
3061 data Maybe1 a where {
3062 Nothing1 :: Maybe1 a ;
3063 Just1 :: a -> Maybe1 a
3064 } deriving( Eq, Ord )
3065
3066 data Maybe2 a = Nothing2 | Just2 a
3067 deriving( Eq, Ord )
3068 </programlisting>
3069 </para></listitem>
3070
3071 <listitem><para>
3072 The type signature may have quantified type variables that do not appear
3073 in the result type:
3074 <programlisting>
3075 data Foo where
3076 MkFoo :: a -> (a->Bool) -> Foo
3077 Nil :: Foo
3078 </programlisting>
3079 Here the type variable <literal>a</literal> does not appear in the result type
3080 of either constructor.
3081 Although it is universally quantified in the type of the constructor, such
3082 a type variable is often called "existential".
3083 Indeed, the above declaration declares precisely the same type as
3084 the <literal>data Foo</literal> in <xref linkend="existential-quantification"/>.
3085 </para><para>
3086 The type may contain a class context too, of course:
3087 <programlisting>
3088 data Showable where
3089 MkShowable :: Show a => a -> Showable
3090 </programlisting>
3091 </para></listitem>
3092
3093 <listitem><para>
3094 You can use record syntax on a GADT-style data type declaration:
3095
3096 <programlisting>
3097 data Person where
3098 Adult :: { name :: String, children :: [Person] } -> Person
3099 Child :: Show a => { name :: !String, funny :: a } -> Person
3100 </programlisting>
3101 As usual, for every constructor that has a field <literal>f</literal>, the type of
3102 field <literal>f</literal> must be the same (modulo alpha conversion).
3103 The <literal>Child</literal> constructor above shows that the signature
3104 may have a context, existentially-quantified variables, and strictness annotations,
3105 just as in the non-record case. (NB: the "type" that follows the double-colon
3106 is not really a type, because of the record syntax and strictness annotations.
3107 A "type" of this form can appear only in a constructor signature.)
3108 </para></listitem>
3109
3110 <listitem><para>
3111 Record updates are allowed with GADT-style declarations,
3112 but only for fields that have the following property: the type of the field
3113 mentions no existential type variables.
3114 </para></listitem>
3115
3116 <listitem><para>
3117 As in the case of existentials declared using the Haskell-98-like record syntax
3118 (<xref linkend="existential-records"/>),
3119 record-selector functions are generated only for those fields that have well-typed
3120 selectors.
3121 Here is the example of that section, in GADT-style syntax:
3122 <programlisting>
3123 data Counter a where
3124 NewCounter :: { _this :: self
3125 , _inc :: self -> self
3126 , _display :: self -> IO ()
3127 , tag :: a
3128 } -> Counter a
3129 </programlisting>
3130 As before, only one selector function is generated here, that for <literal>tag</literal>.
3131 Nevertheless, you can still use all the field names in pattern matching and record construction.
3132 </para></listitem>
3133
3134 <listitem><para>
3135 In a GADT-style data type declaration there is no obvious way to specify that a data constructor
3136 should be infix, which makes a difference if you derive <literal>Show</literal> for the type.
3137 (Data constructors declared infix are displayed infix by the derived <literal>show</literal>.)
3138 So GHC implements the following design: a data constructor declared in a GADT-style data type
3139 declaration is displayed infix by <literal>Show</literal> iff (a) it is an operator symbol,
3140 (b) it has two arguments, (c) it has a programmer-supplied fixity declaration. For example
3141 <programlisting>
3142 infix 6 :--:
3143 data T a where
3144 (:--:) :: Int -> Bool -> T Int
3145 </programlisting>
3146 </para></listitem>
3147 </itemizedlist></para>
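<para>
As a small illustration of an existential declared in GADT style, the
<literal>Showable</literal> type from the list above can be used like this.
The function <literal>render</literal> and <literal>main</literal> are
hypothetical names introduced for the sketch.
```haskell
{-# LANGUAGE GADTs #-}
module Main where

data Showable where
  MkShowable :: Show a => a -> Showable

-- The Show dictionary stored in MkShowable is available after the match.
render :: Showable -> String
render (MkShowable x) = show x

main :: IO ()
main = mapM_ (putStrLn . render)
             [MkShowable (1 :: Int), MkShowable "two", MkShowable True]
```
The list elements have different underlying types, but
<literal>render</literal> itself carries no <literal>Show</literal> constraint.
</para>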
3148 </sect2>
3149
3150 <sect2 id="gadt">
3151 <title>Generalised Algebraic Data Types (GADTs)</title>
3152
3153 <para>Generalised Algebraic Data Types generalise ordinary algebraic data types
3154 by allowing constructors to have richer return types. Here is an example:
3155 <programlisting>
3156 data Term a where
3157 Lit :: Int -> Term Int
3158 Succ :: Term Int -> Term Int
3159 IsZero :: Term Int -> Term Bool
3160 If :: Term Bool -> Term a -> Term a -> Term a
3161 Pair :: Term a -> Term b -> Term (a,b)
3162 </programlisting>
3163 Notice that the return type of the constructors is not always <literal>Term a</literal>, as is the
3164 case with ordinary data types. This generality allows us to
3165 write a well-typed <literal>eval</literal> function
3166 for these <literal>Terms</literal>:
3167 <programlisting>
3168 eval :: Term a -> a
3169 eval (Lit i) = i
3170 eval (Succ t) = 1 + eval t
3171 eval (IsZero t) = eval t == 0
3172 eval (If b e1 e2) = if eval b then eval e1 else eval e2
3173 eval (Pair e1 e2) = (eval e1, eval e2)
3174 </programlisting>
3175 The key point about GADTs is that <emphasis>pattern matching causes type refinement</emphasis>.
3176 For example, in the right hand side of the equation
3177 <programlisting>
3178 eval :: Term a -> a
3179 eval (Lit i) = ...
3180 </programlisting>
3181 the type <literal>a</literal> is refined to <literal>Int</literal>. That's the whole point!
3182 A precise specification of the type rules is beyond what this user manual aspires to,
3183 but the design closely follows that described in
3184 the paper <ulink
3185 url="http://research.microsoft.com/%7Esimonpj/papers/gadt/">Simple
3186 unification-based type inference for GADTs</ulink>,
3187 (ICFP 2006).
3188 The general principle is this: <emphasis>type refinement is only carried out
3189 based on user-supplied type annotations</emphasis>.
3190 So if no type signature is supplied for <literal>eval</literal>, no type refinement happens,
3191 and lots of obscure error messages will
3192 occur. However, the refinement is quite general. For example, if we had:
3193 <programlisting>
3194 eval :: Term a -> a -> a
3195 eval (Lit i) j = i+j
3196 </programlisting>
3197 the pattern match causes the type <literal>a</literal> to be refined to <literal>Int</literal> (because of the type
3198 of the constructor <literal>Lit</literal>), and that refinement also applies to the type of <literal>j</literal>, and
3199 the result type of the function. Hence the addition <literal>i+j</literal> is legal.
3200 </para>
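<para>
For experimentation, the <literal>Term</literal> type and its evaluator can be
put into one module, roughly as follows. The pragma and <literal>main</literal>
are additions; the rest is the code shown above.
```haskell
{-# LANGUAGE GADTs #-}
module Main where

data Term a where
  Lit    :: Int -> Term Int
  Succ   :: Term Int -> Term Int
  IsZero :: Term Int -> Term Bool
  If     :: Term Bool -> Term a -> Term a -> Term a
  Pair   :: Term a -> Term b -> Term (a,b)

-- The signature is essential: it drives the type refinement in each equation.
eval :: Term a -> a
eval (Lit i)      = i
eval (Succ t)     = 1 + eval t
eval (IsZero t)   = eval t == 0
eval (If b e1 e2) = if eval b then eval e1 else eval e2
eval (Pair e1 e2) = (eval e1, eval e2)

main :: IO ()
main = print (eval (If (IsZero (Lit 0))
                       (Pair (Lit 1) (Succ (Lit 1)))
                       (Pair (Lit 0) (Lit 0))))
-- prints (1,2)
```
</para>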
3201 <para>
3202 These and many other examples are given in papers by Hongwei Xi and
3203 Tim Sheard. There is a longer introduction
3204 <ulink url="http://www.haskell.org/haskellwiki/GADT">on the wiki</ulink>,
3205 and Ralf Hinze's
3206 <ulink url="http://www.informatik.uni-bonn.de/~ralf/publications/With.pdf">Fun with phantom types</ulink> also has a number of examples. Note that papers
3207 may use different notation to that implemented in GHC.
3208 </para>
3209 <para>
3210 The rest of this section outlines the extensions to GHC that support GADTs. The extension is enabled with
3211 <option>-XGADTs</option>. The <option>-XGADTs</option> flag also sets <option>-XRelaxedPolyRec</option>.
3212 <itemizedlist>
3213 <listitem><para>
3214 A GADT can only be declared using GADT-style syntax (<xref linkend="gadt-style"/>);
3215 the old Haskell-98 syntax for data declarations always declares an ordinary data type.
3216 The result type of each constructor must begin with the type constructor being defined,
3217 but for a GADT the arguments to the type constructor can be arbitrary monotypes.
3218 For example, in the <literal>Term</literal> data
3219 type above, the type of each constructor must end with <literal>Term ty</literal>, but
3220 the <literal>ty</literal> need not be a type variable (e.g. the <literal>Lit</literal>
3221 constructor).
3222 </para></listitem>
3223
3224 <listitem><para>
3225 It is permitted to declare an ordinary algebraic data type using GADT-style syntax.
3226 What makes a GADT into a GADT is not the syntax, but rather the presence of data constructors
3227 whose result type is not just <literal>T a b</literal>.
3228 </para></listitem>
3229
3230 <listitem><para>
3231 You cannot use a <literal>deriving</literal> clause for a GADT; only for
3232 an ordinary data type.
3233 </para></listitem>
3234
3235 <listitem><para>
3236 As mentioned in <xref linkend="gadt-style"/>, record syntax is supported.
3237 For example:
3238 <programlisting>
3239 data Term a where
3240 Lit :: { val :: Int } -> Term Int
3241 Succ :: { num :: Term Int } -> Term Int
3242 Pred :: { num :: Term Int } -> Term Int
3243 IsZero :: { arg :: Term Int } -> Term Bool
3244 Pair :: { arg1 :: Term a
3245 , arg2 :: Term b
3246 } -> Term (a,b)
3247 If :: { cnd :: Term Bool
3248 , tru :: Term a
3249 , fls :: Term a
3250 } -> Term a
3251 </programlisting>
3252 However, for GADTs there is the following additional constraint:
3253 every constructor that has a field <literal>f</literal> must have
3254 the same result type (modulo alpha conversion).
3255 Hence, in the above example, we cannot merge the <literal>num</literal>
3256 and <literal>arg</literal> fields above into a
3257 single name. Although their field types are both <literal>Term Int</literal>,
3258 their selector functions actually have different types:
3259
3260 <programlisting>
3261 num :: Term Int -> Term Int
3262 arg :: Term Bool -> Term Int
3263 </programlisting>
3264 </para></listitem>
3265
3266 <listitem><para>
3267 When pattern-matching against data constructors drawn from a GADT,
3268 for example in a <literal>case</literal> expression, the following rules apply:
3269 <itemizedlist>
3270 <listitem><para>The type of the scrutinee must be rigid.</para></listitem>
3271 <listitem><para>The type of the entire <literal>case</literal> expression must be rigid.</para></listitem>
3272 <listitem><para>The type of any free variable mentioned in any of
3273 the <literal>case</literal> alternatives must be rigid.</para></listitem>
3274 </itemizedlist>
3275 A type is "rigid" if it is completely known to the compiler at its binding site. The easiest
3276 way to ensure that a variable has a rigid type is to give it a type signature.
3277 For more precise details see <ulink url="http://research.microsoft.com/%7Esimonpj/papers/gadt">
3278 Simple unification-based type inference for GADTs
3279 </ulink>. The criteria implemented by GHC are given in the Appendix.
3280
3281 </para></listitem>
3282
3283 </itemizedlist>
3284 </para>
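<para>
The record selectors described above can be exercised with a cut-down version
of the record-style <literal>Term</literal>. This is a sketch: only four of the
constructors are kept, and <literal>main</literal> is an addition.
```haskell
{-# LANGUAGE GADTs #-}
module Main where

data Term a where
  Lit    :: { val :: Int }      -> Term Int
  Succ   :: { num :: Term Int } -> Term Int
  Pred   :: { num :: Term Int } -> Term Int
  IsZero :: { arg :: Term Int } -> Term Bool

-- num is shared by Succ and Pred, which have the same result type;
-- arg belongs to IsZero alone, so it takes a Term Bool.
main :: IO ()
main = do
  print (val (num (Succ (Lit 41))))    -- prints 41
  print (val (arg (IsZero (Lit 0))))   -- prints 0
```
</para>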
3285
3286 </sect2>
3287 </sect1>
3288
3289 <!-- ====================== End of Generalised algebraic data types ======================= -->
3290
3291 <sect1 id="deriving">
3292 <title>Extensions to the "deriving" mechanism</title>
3293
3294 <sect2 id="deriving-inferred">
3295 <title>Inferred context for deriving clauses</title>
3296
3297 <para>
3298 The Haskell Report is vague about exactly when a <literal>deriving</literal> clause is
3299 legal. For example:
3300 <programlisting>
3301 data T0 f a = MkT0 a deriving( Eq )
3302 data T1 f a = MkT1 (f a) deriving( Eq )
3303 data T2 f a = MkT2 (f (f a)) deriving( Eq )
3304 </programlisting>
3305 The natural generated <literal>Eq</literal> code would result in these instance declarations:
3306 <programlisting>
3307 instance Eq a => Eq (T0 f a) where ...
3308 instance Eq (f a) => Eq (T1 f a) where ...
3309 instance Eq (f (f a)) => Eq (T2 f a) where ...
3310 </programlisting>
3311 The first of these is obviously fine. The second is still fine, although less obviously.
3312 The third is not Haskell 98, and risks losing termination of instances.
3313 </para>
3314 <para>
3315 GHC takes a conservative position: it accepts the first two, but not the third. The rule is this:
3316 each constraint in the inferred instance context must consist only of type variables,
3317 with no repetitions.
3318 </para>
3319 <para>
3320 This rule is applied regardless of flags. If you want a more exotic context, you can write
3321 it yourself, using the <link linkend="stand-alone-deriving">standalone deriving mechanism</link>.
3322 </para>
3323 </sect2>
3324
3325 <sect2 id="stand-alone-deriving">
3326 <title>Stand-alone deriving declarations</title>
3327
3328 <para>
3329 GHC now allows stand-alone <literal>deriving</literal> declarations, enabled by <literal>-XStandaloneDeriving</literal>:
3330 <programlisting>
3331 data Foo a = Bar a | Baz String
3332
3333 deriving instance Eq a => Eq (Foo a)
3334 </programlisting>
3335 The syntax is identical to that of an ordinary instance declaration apart from (a) the keyword
3336 <literal>deriving</literal>, and (b) the absence of the <literal>where</literal> part.
3337 Note the following points:
3338 <itemizedlist>
3339 <listitem><para>
3340 You must supply an explicit context (in the example the context is <literal>(Eq a)</literal>),
3341 exactly as you would in an ordinary instance declaration.
3342 (In contrast, in a <literal>deriving</literal> clause
3343 attached to a data type declaration, the context is inferred.)
3344 </para></listitem>
3345
3346 <listitem><para>
3347 A <literal>deriving instance</literal> declaration
3348 must obey the same rules concerning form and termination as ordinary instance declarations,
3349 controlled by the same flags; see <xref linkend="instance-decls"/>.
3350 </para></listitem>
3351
3352 <listitem><para>
3353 Unlike a <literal>deriving</literal>
3354 declaration attached to a <literal>data</literal> declaration, the instance can be more specific
3355 than the data type (assuming you also use
3356 <literal>-XFlexibleInstances</literal>, <xref linkend="instance-rules"/>). Consider
3357 for example
3358 <programlisting>
3359 data Foo a = Bar a | Baz String
3360
3361 deriving instance Eq a => Eq (Foo [a])
3362 deriving instance Eq a => Eq (Foo (Maybe a))
3363 </programlisting>
3364 This will generate a derived instance for <literal>(Foo [a])</literal> and <literal>(Foo (Maybe a))</literal>,
3365 but other types such as <literal>(Foo (Int,Bool))</literal> will not be an instance of <literal>Eq</literal>.
3366 </para></listitem>
3367
3368 <listitem><para>
3369 Unlike a <literal>deriving</literal>
3370 declaration attached to a <literal>data</literal> declaration,
3371 GHC does not restrict the form of the data type. Instead, GHC simply generates the appropriate
3372 boilerplate code for the specified class, and typechecks it. If there is a type error, it is
3373 your problem. (GHC will show you the offending code if it has a type error.)
3374 The merit of this is that you can derive instances for GADTs and other exotic
3375 data types, providing only that the boilerplate code does indeed typecheck. For example:
3376 <programlisting>
3377 data T a where
3378 T1 :: T Int
3379 T2 :: T Bool
3380
3381 deriving instance Show (T a)
3382 </programlisting>
3383 In this example, you cannot say <literal>... deriving( Show )</literal> on the
3384 data type declaration for <literal>T</literal>,
3385 because <literal>T</literal> is a GADT, but you <emphasis>can</emphasis> generate
3386 the instance declaration using stand-alone deriving.
3387 </para>
3388 </listitem>
3389
3390 <listitem>
3391 <para>The stand-alone syntax is generalised for newtypes in exactly the same
3392 way that ordinary <literal>deriving</literal> clauses are generalised (<xref linkend="newtype-deriving"/>).
3393 For example:
3394 <programlisting>
3395 newtype Foo a = MkFoo (State Int a)
3396
3397 deriving instance MonadState Int Foo
3398 </programlisting>
3399 GHC always treats the <emphasis>last</emphasis> parameter of the instance
3400 (<literal>Foo</literal> in this example) as the type whose instance is being derived.
3401 </para></listitem>
3402 </itemizedlist></para>
3403
3404 </sect2>
3405
3406
3407 <sect2 id="deriving-typeable">
3408 <title>Deriving clause for extra classes (<literal>Typeable</literal>, <literal>Data</literal>, etc)</title>
3409
3410 <para>
3411 Haskell 98 allows the programmer to add "<literal>deriving( Eq, Ord )</literal>" to a data type
3412 declaration, to generate a standard instance declaration for classes specified in the <literal>deriving</literal> clause.
3413 In Haskell 98, the only classes that may appear in the <literal>deriving</literal> clause are the standard
3414 classes <literal>Eq</literal>, <literal>Ord</literal>,
3415 <literal>Enum</literal>, <literal>Ix</literal>, <literal>Bounded</literal>, <literal>Read</literal>, and <literal>Show</literal>.
3416 </para>
3417 <para>
3418 GHC extends this list with several more classes that may be automatically derived:
3419 <itemizedlist>
3420 <listitem><para> With <option>-XDeriveDataTypeable</option>, you can derive instances of the classes
3421 <literal>Typeable</literal> and <literal>Data</literal>, defined in the library
3422 modules <literal>Data.Typeable</literal> and <literal>Data.Data</literal> respectively.
3423 </para>
3424 <para>Since GHC 7.8.1, <literal>Typeable</literal> is kind-polymorphic (see
3425 <xref linkend="kind-polymorphism"/>) and can be derived for any datatype and
3426 type class. Instances for datatypes can be derived by attaching a
3427 <literal>deriving Typeable</literal> clause to the datatype declaration, or by
3428 using standalone deriving (see <xref linkend="stand-alone-deriving"/>).
3429 Instances for type classes can only be derived using standalone deriving.
3430 For data families, <literal>Typeable</literal> should only be derived for the
3431 uninstantiated family type; each instance will then automatically have a
3432 <literal>Typeable</literal> instance too.
3433 See also <xref linkend="auto-derive-typeable"/>.
3434 </para>
3435 <para>
3436 Also since GHC 7.8.1, handwritten (i.e. not derived) instances of
3437 <literal>Typeable</literal> are forbidden, and will be ignored with a warning.
3438 </para>
3439 </listitem>
3440
3441 <listitem><para> With <option>-XDeriveGeneric</option>, you can derive
3442 instances of the classes <literal>Generic</literal> and
3443 <literal>Generic1</literal>, defined in <literal>GHC.Generics</literal>.
3444 You can use these to define generic functions,
3445 as described in <xref linkend="generic-programming"/>.
3446 </para></listitem>
3447
3448 <listitem><para> With <option>-XDeriveFunctor</option>, you can derive instances of
3449 the class <literal>Functor</literal>,
3450 defined in <literal>GHC.Base</literal>.
3451 </para></listitem>
3452
3453 <listitem><para> With <option>-XDeriveFoldable</option>, you can derive instances of
3454 the class <literal>Foldable</literal>,
3455 defined in <literal>Data.Foldable</literal>.
3456 </para></listitem>
3457
3458 <listitem><para> With <option>-XDeriveTraversable</option>, you can derive instances of
3459 the class <literal>Traversable</literal>,
3460 defined in <literal>Data.Traversable</literal>.
3461 </para></listitem>
3462 </itemizedlist>
3463 In each case the appropriate class must be in scope before it
3464 can be mentioned in the <literal>deriving</literal> clause.
3465 </para>
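<para>
The following sketch combines several of these flags on one declaration. The type
<literal>Tree</literal> and the names <literal>example</literal> and <literal>doubled</literal>
are hypothetical, chosen only for illustration. Note that each class is imported explicitly,
so that it is in scope where the <literal>deriving</literal> clause mentions it.
<programlisting>
{-# LANGUAGE DeriveDataTypeable, DeriveGeneric, DeriveFunctor,
             DeriveFoldable, DeriveTraversable #-}

import Data.Data        (Data)
import Data.Typeable    (Typeable)
import Data.Foldable    (Foldable, toList)
import Data.Traversable (Traversable)
import GHC.Generics     (Generic)

-- A hypothetical binary tree, used only for illustration.
data Tree a = Leaf | Node (Tree a) a (Tree a)
  deriving (Show, Typeable, Data, Generic, Functor, Foldable, Traversable)

example :: Tree Int
example = Node (Node Leaf 1 Leaf) 2 Leaf

-- fmap comes from the derived Functor instance, toList from Foldable.
doubled :: [Int]
doubled = toList (fmap (* 2) example)
</programlisting>
</para>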
3466 </sect2>
3467
3468 <sect2 id="auto-derive-typeable">
3469 <title>Automatically deriving <literal>Typeable</literal> instances</title>
3470
3471 <para>
3472 The flag <option>-XAutoDeriveTypeable</option> triggers the generation
3473 of derived <literal>Typeable</literal> instances for every datatype and type
3474 class declaration in the module in which it is used. It will also generate
3475 <literal>Typeable</literal> instances for any promoted data constructors
3476 (<xref linkend="promotion"/>). This flag implies
3477 <option>-XDeriveDataTypeable</option> (<xref linkend="deriving-typeable"/>).
3478 </para>
3479
3480 </sect2>
3481
3482 <sect2 id="newtype-deriving">
3483 <title>Generalised derived instances for newtypes</title>
3484
3485 <para>
3486 When you define an abstract type using <literal>newtype</literal>, you may want
3487 the new type to inherit some instances from its representation. In
3488 Haskell 98, you can inherit instances of <literal>Eq</literal>, <literal>Ord</literal>,
3489 <literal>Enum</literal> and <literal>Bounded</literal> by deriving them, but for any
3490 other classes you have to write an explicit instance declaration. For
3491 example, if you define
3492
3493 <programlisting>
3494 newtype Dollars = Dollars Int
3495 </programlisting>
3496
3497 and you want to use arithmetic on <literal>Dollars</literal>, you have to
3498 explicitly define an instance of <literal>Num</literal>:
3499
3500 <programlisting>
3501 instance Num Dollars where
3502 Dollars a + Dollars b = Dollars (a+b)
3503 ...
3504 </programlisting>
3505 All the instance does is apply and remove the <literal>newtype</literal>
3506 constructor. It is particularly galling that, since the constructor
3507 doesn't appear at run-time, this instance declaration defines a
3508 dictionary which is <emphasis>wholly equivalent</emphasis> to the <literal>Int</literal>
3509 dictionary, only slower!
3510 </para>
3511
3512
3513 <sect3> <title> Generalising the deriving clause </title>
3514 <para>
3515 GHC now permits such instances to be derived instead,
3516 using the flag <option>-XGeneralizedNewtypeDeriving</option>,
3517 so one can write
3518 <programlisting>
3519 newtype Dollars = Dollars Int deriving (Eq,Show,Num)
3520 </programlisting>
3521
3522 and the implementation uses the <emphasis>same</emphasis> <literal>Num</literal> dictionary
3523 for <literal>Dollars</literal> as for <literal>Int</literal>. Notionally, the compiler
3524 derives an instance declaration of the form
3525
3526 <programlisting>
3527 instance Num Int => Num Dollars
3528 </programlisting>
3529
3530 which just adds or removes the <literal>newtype</literal> constructor according to the type.
3531 </para>
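<para>
A small sketch of what the derived <literal>Num</literal> instance buys you; the functions
<literal>total</literal> and <literal>bill</literal> are hypothetical names introduced here.
<programlisting>
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

newtype Dollars = Dollars Int deriving (Eq, Show, Num)

-- Arithmetic works directly on Dollars, reusing the Int dictionary.
total :: [Dollars] -> Dollars
total = sum

-- Numeric literals are accepted too, via the derived fromInteger.
bill :: Dollars
bill = total [Dollars 3, 4] + 5
</programlisting>
Here <literal>Eq</literal> and <literal>Show</literal> are derived in the usual way, so
<literal>show bill</literal> still mentions the <literal>Dollars</literal> constructor.
</para>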
3532 <para>
3533
3534 We can also derive instances of constructor classes in a similar
3535 way. For example, suppose we have implemented state and failure monad
3536 transformers, such that
3537
3538 <programlisting>
3539 instance Monad m => Monad (State s m)
3540 instance Monad m => Monad (Failure m)
3541 </programlisting>
3542 In Haskell 98, we can define a parsing monad by
3543 <programlisting>
3544 type Parser tok m a = State [tok] (Failure m) a
3545 </programlisting>
3546
3547 which is automatically a monad thanks to the instance declarations
3548 above. With the extension, we can make the parser type abstract,
3549 without needing to write an instance of class <literal>Monad</literal>, via
3550
3551 <programlisting>
3552 newtype Parser tok m a = Parser (State [tok] (Failure m) a)
3553 deriving Monad
3554 </programlisting>
3555 In this case the derived instance declaration is of the form
3556 <programlisting>
3557 instance Monad (State [tok] (Failure m)) => Monad (Parser tok m)
3558 </programlisting>
3559
3560 Notice that, since <literal>Monad</literal> is a constructor class, the
3561 instance is a <emphasis>partial application</emphasis> of the new type, not the
3562 entire left hand side. We can imagine that the type declaration is
3563 "eta-converted" to generate the context of the instance
3564 declaration.
3565 </para>
3566 <para>
3567
3568 We can even derive instances of multi-parameter classes, provided the
3569 newtype is the last class parameter. In this case, a &ldquo;partial
3570 application&rdquo; of the class appears in the <literal>deriving</literal>
3571 clause. For example, given the class
3572
3573 <programlisting>
3574 class StateMonad s m | m -> s where ...
3575 instance Monad m => StateMonad s (State s m) where ...
3576 </programlisting>
3577 then we can derive an instance of <literal>StateMonad</literal> for <literal>Parser</literal>s by
3578 <programlisting>
3579 newtype Parser tok m a = Parser (State [tok] (Failure m) a)
3580 deriving (Monad, StateMonad [tok])
3581 </programlisting>
3582
3583 The derived instance is obtained by completing the application of the
3584 class to the new type:
3585
3586 <programlisting>
3587 instance StateMonad [tok] (State [tok] (Failure m)) =>
3588 StateMonad [tok] (Parser tok m)
3589 </programlisting>
3590 </para>
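<para>
The same idea can be tried without any monad transformers. The sketch below uses a hypothetical
two-parameter class <literal>Singleton</literal> and a hypothetical newtype <literal>Stack</literal>;
neither comes from a library. <option>-XFlexibleInstances</option> is assumed to be needed for the
instance on <literal>[Int]</literal>.
<programlisting>
{-# LANGUAGE GeneralizedNewtypeDeriving, MultiParamTypeClasses, FlexibleInstances #-}

-- Hypothetical class: a container of type c can be built from one element of type e.
class Singleton e c where
  singleton :: e -> c

instance Singleton Int [Int] where
  singleton n = [n]

-- The newtype fills the last class parameter, so the partial
-- application "Singleton Int" may appear in the deriving clause.
newtype Stack = Stack [Int]
  deriving (Show, Singleton Int)

one :: Stack
one = singleton (7 :: Int)
</programlisting>
</para>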
3591 <para>
3592
3593 As a result of this extension, all derived instances in newtype
3594 declarations are treated uniformly (and implemented just by reusing
3595 the dictionary for the representation type), <emphasis>except</emphasis>
3596 <literal>Show</literal> and <literal>Read</literal>, which really behave differently for
3597 the newtype and its representation.
3598 </para>
3599 </sect3>
3600
3601 <sect3> <title> A more precise specification </title>
3602 <para>
3603 Derived instance declarations are constructed as follows. Consider the
3604 declaration (after expansion of any type synonyms)
3605
3606 <programlisting>
3607 newtype T v1...vn = T' (t vk+1...vn) deriving (c1...cm)
3608 </programlisting>
3609
3610 where
3611 <itemizedlist>
3612 <listitem><para>
3613 The <literal>ci</literal> are partial applications of
3614 classes of the form <literal>C t1'...tj'</literal>, where the arity of <literal>C</literal>
3615 is exactly <literal>j+1</literal>. That is, <literal>C</literal> lacks exactly one type argument.
3616 </para></listitem>
3617 <listitem><para>
3618 The <literal>k</literal> is chosen so that <literal>ci (T v1...vk)</literal> is well-kinded.
3619 </para></listitem>
3620 <listitem><para>
3621 The type <literal>t</literal> is an arbitrary type.
3622 </para></listitem>
3623 <listitem><para>
3624 The type variables <literal>vk+1...vn</literal> do not occur in <literal>t</literal>,
3625 nor in the <literal>ci</literal>, and
3626 </para></listitem>
3627 <listitem><para>
3628 None of the <literal>ci</literal> is <literal>Read</literal>, <literal>Show</literal>,
3629 <literal>Typeable</literal>, or <literal>Data</literal>. These classes
3630 should not "look through" the type or its constructor. You can still
3631 derive these classes for a newtype, but it happens in the usual way, not
3632 via this new mechanism.
3633 </para></listitem>
3634 </itemizedlist>
3635 Then, for each <literal>ci</literal>, the derived instance
3636 declaration is:
3637 <programlisting>
3638 instance ci t => ci (T v1...vk)
3639 </programlisting>
3640 As an example which does <emphasis>not</emphasis> work, consider
3641 <programlisting>
3642 newtype NonMonad m s = NonMonad (State s m s) deriving Monad
3643 </programlisting>
3644 Here we cannot derive the instance
3645 <programlisting>
3646 instance Monad (State s m) => Monad (NonMonad m)
3647 </programlisting>
3648
3649 because the type variable <literal>s</literal> occurs in <literal>State s m</literal>,
3650 and so cannot be "eta-converted" away. It is a good thing that this
3651 <literal>deriving</literal> clause is rejected, because <literal>NonMonad m</literal> is
3652 not, in fact, a monad --- for the same reason. Try defining
3653 <literal>>>=</literal> with the correct type: you won't be able to.
3654 </para>
3655 <para>
3656
3657 Notice also that the <emphasis>order</emphasis> of class parameters becomes
3658 important, since we can only derive instances for the last one. If the
3659 <literal>StateMonad</literal> class above were instead defined as
3660
3661 <programlisting>
3662 class StateMonad m s | m -> s where ...
3663 </programlisting>
3664
3665 then we would not have been able to derive an instance for the
3666 <literal>Parser</literal> type above. We hypothesise that multi-parameter
3667 classes usually have one "main" parameter for which deriving new
3668 instances is most interesting.
3669 </para>
3670 <para>Lastly, all of this applies only for classes other than
3671 <literal>Read</literal>, <literal>Show</literal>, <literal>Typeable</literal>,
3672 and <literal>Data</literal>, for which the built-in derivation applies (section
3673 4.3.3. of the Haskell Report).
3674 (For the standard classes <literal>Eq</literal>, <literal>Ord</literal>,
3675 <literal>Ix</literal>, and <literal>Bounded</literal> it is immaterial whether
3676 the standard method is used or the one described here.)
3677 </para>
3678 </sect3>
3679 </sect2>
3680 </sect1>
3681
3682
3683 <!-- TYPE SYSTEM EXTENSIONS -->
3684 <sect1 id="type-class-extensions">
3685 <title>Class and instance declarations</title>
3686
3687 <sect2 id="multi-param-type-classes">
3688 <title>Class declarations</title>
3689
3690 <para>
3691 This section, and the next one, document GHC's type-class extensions.
3692 There's lots of background in the paper <ulink
3693 url="http://research.microsoft.com/~simonpj/Papers/type-class-design-space/">Type
3694 classes: exploring the design space</ulink> (Simon Peyton Jones, Mark
3695 Jones, Erik Meijer).
3696 </para>
3697
3698 <sect3>
3699 <title>Multi-parameter type classes</title>
3700 <para>
3701 Multi-parameter type classes are permitted, with flag <option>-XMultiParamTypeClasses</option>.
3702 For example:
3703
3704
3705 <programlisting>
3706 class Collection c a where
3707 union :: c a -> c a -> c a
3708 ...etc.
3709 </programlisting>
3710
3711 </para>
3712 </sect3>
3713
3714 <sect3 id="superclass-rules">
3715 <title>The superclasses of a class declaration</title>
3716
3717 <para>
3718 In Haskell 98 the context of a class declaration (which introduces superclasses)
3719 must be simple; that is, each predicate must consist of a class applied to
3720 type variables. The flag <option>-XFlexibleContexts</option>
3721 (<xref linkend="flexible-contexts"/>)
3722 lifts this restriction,
3723 so that the only restriction on the context in a class declaration is
3724 that the class hierarchy must be acyclic. So these class declarations are OK:
3725
3726
3727 <programlisting>
3728 class Functor (m k) => FiniteMap m k where
3729 ...
3730
3731 class (Monad m, Monad (t m)) => Transform t m where
3732 lift :: m a -> (t m) a
3733 </programlisting>
3734
3735
3736 </para>
3737 <para>
3738 As in Haskell 98, the class hierarchy must be acyclic. However, the definition
3739 of "acyclic" involves only the superclass relationships. For example,
3740 this is OK:
3741
3742
3743 <programlisting>
3744 class C a where {
3745 op :: D b => a -> b -> b
3746 }
3747
3748 class C a => D a where { ... }
3749 </programlisting>
3750
3751
3752 Here, <literal>C</literal> is a superclass of <literal>D</literal>, but it's OK for a
3753 class operation <literal>op</literal> of <literal>C</literal> to mention <literal>D</literal>. (It
3754 would not be OK for <literal>D</literal> to be a superclass of <literal>C</literal>.)
3755 </para>
3756 <para>
3757 With the extension that adds a <link linkend="constraint-kind">kind of constraints</link>,
3758 you can write more exotic superclass definitions. The superclass cycle check is even more
3759 liberal in these cases. For example, this is OK:
3760
3761 <programlisting>
3762 class A cls c where
3763 meth :: cls c => c -> c
3764
3765 class A B c => B c where
3766 </programlisting>
3767
3768 A superclass context for a class <literal>C</literal> is allowed if, after expanding
3769 type synonyms to their right-hand-sides, and uses of classes (other than <literal>C</literal>)
3770 to their superclasses, <literal>C</literal> does not occur syntactically in the context.
3771 </para>
3772 </sect3>
3773
3774
3775
3776
3777 <sect3 id="class-method-types">
3778 <title>Class method types</title>
3779
3780 <para>
3781 Haskell 98 prohibits class method types from mentioning constraints on the
3782 class type variable, thus:
3783 <programlisting>
3784 class Seq s a where
3785 fromList :: [a] -> s a
3786 elem :: Eq a => a -> s a -> Bool
3787 </programlisting>
3788 The type of <literal>elem</literal> is illegal in Haskell 98, because it
3789 contains the constraint <literal>Eq a</literal>, which constrains only the
3790 class type variable (in this case <literal>a</literal>).
3791 GHC lifts this restriction (flag <option>-XConstrainedClassMethods</option>).
3792 </para>
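<para>
To see the class in use, here is a sketch of a list-backed instance. The instance and the name
<literal>found</literal> are hypothetical; <option>-XFlexibleInstances</option> is assumed to be
needed because the instance head mentions the bare type variable <literal>a</literal>.
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, ConstrainedClassMethods, FlexibleInstances #-}

import Prelude hiding (elem)
import qualified Data.List as List

class Seq s a where
  fromList :: [a] -> s a
  elem     :: Eq a => a -> s a -> Bool

-- Lists as sequences: the Eq a constraint is only demanded by elem.
instance Seq [] a where
  fromList = id
  elem     = List.elem

found :: Bool
found = elem (3 :: Int) (fromList [1, 2, 3] :: [Int])
</programlisting>
</para>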
3793
3794
3795 </sect3>
3796
3797
3798 <sect3 id="class-default-signatures">
3799 <title>Default method signatures</title>
3800
3801 <para>
3802 Haskell 98 allows you to define a default implementation when declaring a class:
3803 <programlisting>
3804 class Enum a where
3805 enum :: [a]
3806 enum = []
3807 </programlisting>
3808 The type of the <literal>enum</literal> method is <literal>[a]</literal>, and
3809 this is also the type of the default method. You can lift this restriction
3810 and give another type to the default method using the flag
3811 <option>-XDefaultSignatures</option>. For instance, if you have written a
3812 generic implementation of enumeration in a class <literal>GEnum</literal>
3813 with method <literal>genum</literal> in terms of <literal>GHC.Generics</literal>,
3814 you can specify a default method that uses that generic implementation:
3815 <programlisting>
3816 class Enum a where
3817 enum :: [a]
3818 default enum :: (Generic a, GEnum (Rep a)) => [a]
3819 enum = map to genum
3820 </programlisting>
3821 We reuse the keyword <literal>default</literal> to signal that a signature
3822 applies to the default method only; when defining instances of the
3823 <literal>Enum</literal> class, the original type <literal>[a]</literal> of
3824 <literal>enum</literal> still applies. When giving an empty instance, however,
3825 the default implementation <literal>map to genum</literal> is filled-in,
3826 and type-checked with the type
3827 <literal>(Generic a, GEnum (Rep a)) => [a]</literal>.
3828 </para>
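<para>
The mechanism does not depend on <literal>GHC.Generics</literal>. As a smaller sketch, the
hypothetical class <literal>Describe</literal> below offers a default that is only available
for types with a <literal>Show</literal> instance:
<programlisting>
{-# LANGUAGE DefaultSignatures #-}

-- Hypothetical class, for illustration only.
class Describe a where
  describe :: a -> String
  -- The default implementation may assume Show a.
  default describe :: Show a => a -> String
  describe = show

-- Empty instance: the default is filled in and checked against Show Bool.
instance Describe Bool

-- An explicit definition is checked against the original type a -> String.
instance Describe Int where
  describe n = "the number " ++ show n
</programlisting>
An empty instance for a type without a <literal>Show</literal> instance would be rejected,
because the default method could not be type-checked.
</para>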
3829
3830 <para>
3831 We use default signatures to simplify generic programming in GHC
3832 (<xref linkend="generic-programming"/>).
3833 </para>
3834
3835
3836 </sect3>
3837
3838 <sect3 id="nullary-type-classes">
3839 <title>Nullary type classes</title>
3840 <para>Nullary (no parameter) type classes are enabled with <option>-XNullaryTypeClasses</option>.
3841 Since there are no available parameters, there can be at most one instance
3842 of a nullary class. A nullary type class might be used to document some assumption
3843 in a type signature (such as reliance on the Riemann hypothesis) or add some
3844 globally configurable settings in a program. For example:</para>
3845
3846 <programlisting>
3847 class RiemannHypothesis where
3848 assumeRH :: a -> a
3849
3850 -- Deterministic version of the Miller test
3851 -- correctness depends on the generalized Riemann hypothesis
3852 isPrime :: RiemannHypothesis => Integer -> Bool
3853 isPrime n = assumeRH (...)
3854 </programlisting>
3855
3856 <para>The type signature of <literal>isPrime</literal> informs users that its correctness
3857 depends on an unproven conjecture. If the function is used, the user has
3858 to acknowledge the dependence with:</para>
3859
3860 <programlisting>
3861 instance RiemannHypothesis where
3862 assumeRH = id
3863 </programlisting>
3864
3865 </sect3>
3866 </sect2>
3867
3868 <sect2 id="functional-dependencies">
3869 <title>Functional dependencies
3870 </title>
3871
3872 <para> Functional dependencies are implemented as described by Mark P. Jones
3873 in &ldquo;<ulink url="http://citeseer.ist.psu.edu/jones00type.html">Type Classes with Functional Dependencies</ulink>&rdquo;,
3874 in Proceedings of the 9th European Symposium on Programming,
3875 ESOP 2000, Berlin, Germany, March 2000, Springer-Verlag LNCS 1782.
3877 </para>
3878 <para>
3879 Functional dependencies are introduced by a vertical bar in the syntax of a
3880 class declaration; e.g.
3881 <programlisting>
3882 class (Monad m) => MonadState s m | m -> s where ...
3883
3884 class Foo a b c | a b -> c where ...
3885 </programlisting>
3886 There should be more documentation, but there isn't (yet). Yell if you need it.
3887 </para>
3888
3889 <sect3><title>Rules for functional dependencies </title>
3890 <para>
3891 In a class declaration, all of the class type variables must be reachable (in the sense
3892 mentioned in <xref linkend="flexible-contexts"/>)
3893 from the free variables of each method type.
3894 For example:
3895
3896 <programlisting>
3897 class Coll s a where
3898 empty :: s
3899 insert :: s -> a -> s
3900 </programlisting>
3901
3902 is not OK, because the type of <literal>empty</literal> doesn't mention
3903 <literal>a</literal>. Functional dependencies can make the type variable
3904 reachable:
3905 <programlisting>
3906 class Coll s a | s -> a where
3907 empty :: s
3908 insert :: s -> a -> s
3909 </programlisting>
3910
3911 Alternatively <literal>Coll</literal> might be rewritten
3912
3913 <programlisting>
3914 class Coll s a where
3915 empty :: s a
3916 insert :: s a -> a -> s a
3917 </programlisting>
3918
3919
3920 which makes the connection between the type of a collection of
3921 <literal>a</literal>'s (namely <literal>(s a)</literal>) and the element type <literal>a</literal>.
3922 Occasionally this really doesn't work, in which case you can split the
3923 class like this:
3924
3925
3926 <programlisting>
3927 class CollE s where
3928 empty :: s
3929
3930 class CollE s => Coll s a where
3931 insert :: s -> a -> s
3932 </programlisting>
3933 </para>
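<para>
As a sketch of the functional-dependency variant in use, the hypothetical instance below treats
lists as collections. <option>-XFlexibleInstances</option> is assumed to be needed because the
instance head repeats the type variable <literal>a</literal>.
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

class Coll s a | s -> a where
  empty  :: s
  insert :: s -> a -> s

-- The collection type [a] determines the element type a.
instance Coll [a] a where
  empty       = []
  insert xs x = x : xs

-- The type of empty is fixed by the signature; the dependency
-- then determines the element type to be Char.
letters :: String
letters = insert (insert empty 'a') 'b'
</programlisting>
</para>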
3934 </sect3>
3935
3936
3937 <sect3>
3938 <title>Background on functional dependencies</title>
3939
3940 <para>The following description of the motivation and use of functional dependencies is taken
3941 from the Hugs user manual, reproduced here (with minor changes) by kind
3942 permission of Mark Jones.
3943 </para>
3944 <para>
3945 Consider the following class, intended as part of a
3946 library for collection types:
3947 <programlisting>
3948 class Collects e ce where
3949 empty :: ce
3950 insert :: e -> ce -> ce
3951 member :: e -> ce -> Bool
3952 </programlisting>
3953 The type variable e used here represents the element type, while ce is the type
3954 of the container itself. Within this framework, we might want to define
3955 instances of this class for lists or characteristic functions (both of which
3956 can be used to represent collections of any equality type), bit sets (which can
3957 be used to represent collections of characters), or hash tables (which can be
3958 used to represent any collection whose elements have a hash function). Omitting
3959 standard implementation details, this would lead to the following declarations:
3960 <programlisting>
3961 instance Eq e => Collects e [e] where ...
3962 instance Eq e => Collects e (e -> Bool) where ...
3963 instance Collects Char BitSet where ...
3964 instance (Hashable e, Collects e ce)
3965 => Collects e (Array Int ce) where ...
3966 </programlisting>
3967 All this looks quite promising; we have a class and a range of interesting
3968 implementations. Unfortunately, there are some serious problems with the class
3969 declaration. First, the empty function has an ambiguous type:
3970 <programlisting>
3971 empty :: Collects e ce => ce
3972 </programlisting>
3973 By "ambiguous" we mean that there is a type variable e that appears on the left
3974 of the <literal>=&gt;</literal> symbol, but not on the right. The problem with
3975 this is that, according to the theoretical foundations of Haskell overloading,
3976 we cannot guarantee a well-defined semantics for any term with an ambiguous
3977 type.
3978 </para>
3979 <para>
3980 We can sidestep this specific problem by removing the empty member from the
3981 class declaration. However, although the remaining members, insert and member,
3982 do not have ambiguous types, we still run into problems when we try to use
3983 them. For example, consider the following two functions:
3984 <programlisting>
3985 f x y = insert x . insert y
3986 g = f True 'a'
3987 </programlisting>
3988 for which GHC infers the following types:
3989 <programlisting>
3990 f :: (Collects a c, Collects b c) => a -> b -> c -> c
3991 g :: (Collects Bool c, Collects Char c) => c -> c
3992 </programlisting>
3993 Notice that the type for f allows the two parameters x and y to be assigned
3994 different types, even though it attempts to insert each of the two values, one
3995 after the other, into the same collection. If we're trying to model collections
3996 that contain only one type of value, then this is clearly an inaccurate
3997 type. Worse still, the definition for g is accepted, without causing a type
3998 error. As a result, the error in this code will not be flagged at the point
3999 where it appears. Instead, it will show up only when we try to use g, which
4000 might even be in a different module.
4001 </para>
4002
4003 <sect4><title>An attempt to use constructor classes</title>
4004
4005 <para>
4006 Faced with the problems described above, some Haskell programmers might be
4007 tempted to use something like the following version of the class declaration:
4008 <programlisting>
4009 class Collects e c where
4010 empty :: c e
4011 insert :: e -> c e -> c e
4012 member :: e -> c e -> Bool
4013 </programlisting>
4014 The key difference here is that we abstract over the type constructor c that is
4015 used to form the collection type c e, and not over that collection type itself,
4016 represented by ce in the original class declaration. This avoids the immediate
4017 problems that we mentioned above: empty has type <literal>Collects e c => c
4018 e</literal>, which is not ambiguous.
4019 </para>
4020 <para>
4021 The function f from the previous section has a more accurate type:
4022 <programlisting>
4023 f :: (Collects e c) => e -> e -> c e -> c e
4024 </programlisting>
4025 The function g from the previous section is now rejected with a type error, as
4026 we would hope, because the type of f does not allow the two arguments to have
4027 different types.
4028 This, then, is an example of a multiple parameter class that does actually work
4029 quite well in practice, without ambiguity problems.
4030 There is, however, a catch. This version of the Collects class is nowhere near
4031 as general as the original class seemed to be: only one of the four instances
4032 for <literal>Collects</literal>
4033 given above can be used with this version of Collects because only one of
4034 them---the instance for lists---has a collection type that can be written in
4035 the form c e, for some type constructor c, and element type e.
4036 </para>
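<para>
The constructor-class version can be tried out with the list instance, the one instance that
still fits. This is a sketch only: the name <literal>hasA</literal> is hypothetical, and
<option>-XFlexibleInstances</option> is assumed to be needed for the instance head.
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

class Collects e c where
  empty  :: c e
  insert :: e -> c e -> c e
  member :: e -> c e -> Bool

-- The list instance: the collection type has the form c e with c = [].
instance Eq e => Collects e [] where
  empty  = []
  insert = (:)
  member = elem

-- Both inserted elements are now forced to share one type.
f :: Collects e c => e -> e -> c e -> c e
f x y = insert x . insert y

hasA :: Bool
hasA = member 'a' (f 'a' 'b' "")
</programlisting>
</para>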
4037 </sect4>
4038
4039 <sect4><title>Adding functional dependencies</title>
4040
4041 <para>
4042 To get a more useful version of the Collects class, Hugs provides a mechanism
4043 that allows programmers to specify dependencies between the parameters of a
4044 multiple parameter class. (For readers with an interest in theoretical
4045 foundations and previous work: the use of dependency information can be seen
4046 both as a generalization of the proposal for "parametric type classes" that was
4047 put forward by Chen, Hudak, and Odersky, and as a special case of Mark Jones's
4048 later framework for "improvement" of qualified types. The
4049 underlying ideas are also discussed in a more theoretical and abstract setting
4050 in a manuscript [implparam], where they are identified as one point in a
4051 general design space for systems of implicit parameterization.)
4052
4053 To start with an abstract example, consider a declaration such as:
4054 <programlisting>
4055 class C a b where ...
4056 </programlisting>
4057 which tells us simply that C can be thought of as a binary relation on types
4058 (or type constructors, depending on the kinds of a and b). Extra clauses can be
4059 included in the definition of classes to add information about dependencies
4060 between parameters, as in the following examples:
4061 <programlisting>
4062 class D a b | a -> b where ...
4063 class E a b | a -> b, b -> a where ...
4064 </programlisting>
4065 The notation <literal>a -&gt; b</literal> used here between the | and where
4066 symbols --- not to be
4067 confused with a function type --- indicates that the a parameter uniquely
4068 determines the b parameter, and might be read as "a determines b." Thus D is
4069 not just a relation, but actually a (partial) function. Similarly, from the two
4070 dependencies that are included in the definition of E, we can see that E
4071 represents a (partial) one-one mapping between types.
4072 </para>
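<para>
The effect of a dependency on type inference can be seen in a small sketch. The method
<literal>conv</literal>, the instance and the name <literal>shown</literal> are hypothetical
additions to the class <literal>D</literal> above.
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}

class D a b | a -> b where
  conv :: a -> b

instance D Bool Int where
  conv b = if b then 1 else 0

-- The result type of conv True is not written anywhere. Because a
-- determines b, the constraint D Bool b can only be met with b = Int,
-- which is what lets the Show constraint be resolved.
shown :: String
shown = show (conv True)
</programlisting>
Without the dependency <literal>a -> b</literal>, the type of <literal>conv True</literal>
would be ambiguous and the definition of <literal>shown</literal> would be rejected.
</para>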
4073 <para>
4074 More generally, dependencies take the form <literal>x1 ... xn -&gt; y1 ... ym</literal>,
4075 where x1, ..., xn, and y1, ..., ym are type variables with n&gt;0 and
4076 m&gt;=0, meaning that the y parameters are uniquely determined by the x
4077 parameters. Spaces can be used as separators if more than one variable appears
4078 on any single side of a dependency, as in <literal>t -&gt; a b</literal>. Note that a class may be
4079 annotated with multiple dependencies using commas as separators, as in the
4080 definition of E above. Some dependencies that we can write in this notation are
4081 redundant, and will be rejected because they don't serve any useful
4082 purpose, and may instead indicate an error in the program. Examples of
4083 dependencies like this include <literal>a -&gt; a </literal>,
4084 <literal>a -&gt; a a </literal>,
4085 <literal>a -&gt; </literal>, etc. There can also be
4086 some redundancy if multiple dependencies are given, as in
4087 <literal>a-&gt;b</literal>,
4088 <literal>b-&gt;c</literal>, <literal>a-&gt;c</literal>,
4089 in which some subset implies the remaining dependencies. Examples like this are
4090 not treated as errors. Note that dependencies appear only in class
4091 declarations, and not in any other part of the language. In particular, the
4092 syntax for instance declarations, class constraints, and types is completely
4093 unchanged.
4094 </para>
4095 <para>
4096 By including dependencies in a class declaration, we provide a mechanism for
4097 the programmer to specify each multiple parameter class more precisely. The
4098 compiler, on the other hand, is responsible for ensuring that the set of
4099 instances that are in scope at any given point in the program is consistent
4100 with any declared dependencies. For example, the following pair of instance
4101<