1 <?xml version="1.0" encoding="iso-8859-1"?>
2 <para>
3 <indexterm><primary>language, GHC</primary></indexterm>
4 <indexterm><primary>extensions, GHC</primary></indexterm>
5 As with all known Haskell systems, GHC implements some extensions to
the language. They can all be enabled or disabled by command-line flags
7 or language pragmas. By default GHC understands the most recent Haskell
8 version it supports, plus a handful of extensions.
9 </para>
10
11 <para>
12 Some of the Glasgow extensions serve to give you access to the
13 underlying facilities with which we implement Haskell. Thus, you can
14 get at the Raw Iron, if you are willing to write some non-portable
15 code at a more primitive level. You need not be &ldquo;stuck&rdquo;
16 on performance because of the implementation costs of Haskell's
17 &ldquo;high-level&rdquo; features&mdash;you can always code
18 &ldquo;under&rdquo; them. In an extreme case, you can write all your
19 time-critical code in C, and then just glue it together with Haskell!
20 </para>
21
22 <para>
23 Before you get too carried away working at the lowest level (e.g.,
24 sloshing <literal>MutableByteArray&num;</literal>s around your
25 program), you may wish to check if there are libraries that provide a
26 &ldquo;Haskellised veneer&rdquo; over the features you want. The
27 separate <ulink url="../libraries/index.html">libraries
28 documentation</ulink> describes all the libraries that come with GHC.
29 </para>
30
31 <!-- LANGUAGE OPTIONS -->
32 <sect1 id="options-language">
33 <title>Language options</title>
34
35 <indexterm><primary>language</primary><secondary>option</secondary>
36 </indexterm>
37 <indexterm><primary>options</primary><secondary>language</secondary>
38 </indexterm>
39 <indexterm><primary>extensions</primary><secondary>options controlling</secondary>
40 </indexterm>
41
<para>The language option flags control which variations of the language are
permitted.</para>
44
45 <para>Language options can be controlled in two ways:
46 <itemizedlist>
<listitem><para>Every language option can be switched on by a command-line flag "<option>-X...</option>"
(e.g. <option>-XTemplateHaskell</option>), and switched off by the flag "<option>-XNo...</option>"
(e.g. <option>-XNoTemplateHaskell</option>).</para></listitem>
50 <listitem><para>
51 Language options recognised by Cabal can also be enabled using the <literal>LANGUAGE</literal> pragma,
52 thus <literal>{-# LANGUAGE TemplateHaskell #-}</literal> (see <xref linkend="language-pragma"/>). </para>
53 </listitem>
54 </itemizedlist></para>
55
56 <para>The flag <option>-fglasgow-exts</option>
57 <indexterm><primary><option>-fglasgow-exts</option></primary></indexterm>
58 is equivalent to enabling the following extensions:
59 &what_glasgow_exts_does;
60 Enabling these options is the <emphasis>only</emphasis>
61 effect of <option>-fglasgow-exts</option>.
62 We are trying to move away from this portmanteau flag,
63 and towards enabling features individually.</para>
64
65 </sect1>
66
67 <!-- UNBOXED TYPES AND PRIMITIVE OPERATIONS -->
68 <sect1 id="primitives">
69 <title>Unboxed types and primitive operations</title>
70
71 <para>GHC is built on a raft of primitive data types and operations;
72 "primitive" in the sense that they cannot be defined in Haskell itself.
73 While you really can use this stuff to write fast code,
74 we generally find it a lot less painful, and more satisfying in the
75 long run, to use higher-level language features and libraries. With
76 any luck, the code you write will be optimised to the efficient
77 unboxed version in any case. And if it isn't, we'd like to know
78 about it.</para>
79
80 <para>All these primitive data types and operations are exported by the
81 library <literal>GHC.Prim</literal>, for which there is
82 <ulink url="&libraryGhcPrimLocation;/GHC-Prim.html">detailed online documentation</ulink>.
83 (This documentation is generated from the file <filename>compiler/prelude/primops.txt.pp</filename>.)
84 </para>
85 <para>
86 If you want to mention any of the primitive data types or operations in your
87 program, you must first import <literal>GHC.Prim</literal> to bring them
88 into scope. Many of them have names ending in "&num;", and to mention such
89 names you need the <option>-XMagicHash</option> extension (<xref linkend="magic-hash"/>).
90 </para>
91
92 <para>The primops make extensive use of <link linkend="glasgow-unboxed">unboxed types</link>
93 and <link linkend="unboxed-tuples">unboxed tuples</link>, which
94 we briefly summarise here. </para>
95
96 <sect2 id="glasgow-unboxed">
97 <title>Unboxed types
98 </title>
99
100 <para>
101 <indexterm><primary>Unboxed types (Glasgow extension)</primary></indexterm>
102 </para>
103
104 <para>Most types in GHC are <firstterm>boxed</firstterm>, which means
105 that values of that type are represented by a pointer to a heap
106 object. The representation of a Haskell <literal>Int</literal>, for
example, is a two-word heap object. An <firstterm>unboxed</firstterm>
type, however, is represented by the value itself; no pointers or heap
allocation are involved.
110 </para>
111
112 <para>
113 Unboxed types correspond to the &ldquo;raw machine&rdquo; types you
114 would use in C: <literal>Int&num;</literal> (long int),
115 <literal>Double&num;</literal> (double), <literal>Addr&num;</literal>
116 (void *), etc. The <emphasis>primitive operations</emphasis>
117 (PrimOps) on these types are what you might expect; e.g.,
118 <literal>(+&num;)</literal> is addition on
119 <literal>Int&num;</literal>s, and is the machine-addition that we all
120 know and love&mdash;usually one instruction.
121 </para>
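<para>
As a minimal sketch of the above, the following function unpacks two boxed
<literal>Int</literal>s, adds the raw <literal>Int&num;</literal> values with
<literal>(+&num;)</literal>, and re-boxes the result. It assumes the names are
imported from <literal>GHC.Exts</literal>, which re-exports the primitives
together with the <literal>I&num;</literal> constructor;
<literal>addInt</literal> is a hypothetical name used only for illustration.
</para>

<programlisting>
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), (+#))

-- | Add two Ints by operating directly on their unboxed payloads.
addInt :: Int -> Int -> Int
addInt (I# a) (I# b) = I# (a +# b)
</programlisting>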
122
123 <para>
124 Primitive (unboxed) types cannot be defined in Haskell, and are
125 therefore built into the language and compiler. Primitive types are
126 always unlifted; that is, a value of a primitive type cannot be
127 bottom. We use the convention (but it is only a convention)
128 that primitive types, values, and
129 operations have a <literal>&num;</literal> suffix (see <xref linkend="magic-hash"/>).
130 For some primitive types we have special syntax for literals, also
131 described in the <link linkend="magic-hash">same section</link>.
132 </para>
133
134 <para>
135 Primitive values are often represented by a simple bit-pattern, such
136 as <literal>Int&num;</literal>, <literal>Float&num;</literal>,
137 <literal>Double&num;</literal>. But this is not necessarily the case:
138 a primitive value might be represented by a pointer to a
139 heap-allocated object. Examples include
140 <literal>Array&num;</literal>, the type of primitive arrays. A
141 primitive array is heap-allocated because it is too big a value to fit
142 in a register, and would be too expensive to copy around; in a sense,
143 it is accidental that it is represented by a pointer. If a pointer
144 represents a primitive value, then it really does point to that value:
145 no unevaluated thunks, no indirections&hellip;nothing can be at the
146 other end of the pointer than the primitive value.
147 A numerically-intensive program using unboxed types can
148 go a <emphasis>lot</emphasis> faster than its &ldquo;standard&rdquo;
149 counterpart&mdash;we saw a threefold speedup on one example.
150 </para>
151
152 <para>
153 There are some restrictions on the use of primitive types:
154 <itemizedlist>
155 <listitem><para>The main restriction
156 is that you can't pass a primitive value to a polymorphic
157 function or store one in a polymorphic data type. This rules out
158 things like <literal>[Int&num;]</literal> (i.e. lists of primitive
159 integers). The reason for this restriction is that polymorphic
160 arguments and constructor fields are assumed to be pointers: if an
161 unboxed integer is stored in one of these, the garbage collector would
162 attempt to follow it, leading to unpredictable space leaks. Or a
163 <function>seq</function> operation on the polymorphic component may
164 attempt to dereference the pointer, with disastrous results. Even
165 worse, the unboxed value might be larger than a pointer
166 (<literal>Double&num;</literal> for instance).
167 </para>
168 </listitem>
169 <listitem><para> You cannot define a newtype whose representation type
170 (the argument type of the data constructor) is an unboxed type. Thus,
171 this is illegal:
172 <programlisting>
173 newtype A = MkA Int#
174 </programlisting>
175 </para></listitem>
176 <listitem><para> You cannot bind a variable with an unboxed type
177 in a <emphasis>top-level</emphasis> binding.
178 </para></listitem>
179 <listitem><para> You cannot bind a variable with an unboxed type
180 in a <emphasis>recursive</emphasis> binding.
181 </para></listitem>
182 <listitem><para> You may bind unboxed variables in a (non-recursive,
183 non-top-level) pattern binding, but you must make any such pattern-match
184 strict. For example, rather than:
185 <programlisting>
186 data Foo = Foo Int Int#
187
188 f x = let (Foo a b, w) = ..rhs.. in ..body..
189 </programlisting>
190 you must write:
191 <programlisting>
192 data Foo = Foo Int Int#
193
194 f x = let !(Foo a b, w) = ..rhs.. in ..body..
195 </programlisting>
196 since <literal>b</literal> has type <literal>Int#</literal>.
197 </para>
198 </listitem>
199 </itemizedlist>
200 </para>
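<para>
The last restriction can be sketched as a complete module. The function name
<literal>sumFoo</literal> is hypothetical, and the example assumes that
<literal>Int&num;</literal> and <literal>I&num;</literal> are imported from
<literal>GHC.Exts</literal> and that <option>-XBangPatterns</option> is
enabled so that the strict binding can be written with a bang:
</para>

<programlisting>
{-# LANGUAGE BangPatterns, MagicHash #-}
import GHC.Exts (Int (I#), Int#)

data Foo = Foo Int Int#

-- | The bang makes the pattern binding strict, which is required here
-- because b has the unboxed type Int#.
sumFoo :: Foo -> Int
sumFoo foo = let !(Foo a b) = foo in a + I# b
</programlisting>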
201
202 </sect2>
203
204 <sect2 id="unboxed-tuples">
205 <title>Unboxed Tuples
206 </title>
207
208 <para>
209 Unboxed tuples aren't really exported by <literal>GHC.Exts</literal>;
210 they are a syntactic extension enabled by the language flag <option>-XUnboxedTuples</option>. An
211 unboxed tuple looks like this:
212 </para>
213
214 <para>
215
216 <programlisting>
217 (# e_1, ..., e_n #)
218 </programlisting>
219
220 </para>
221
222 <para>
223 where <literal>e&lowbar;1..e&lowbar;n</literal> are expressions of any
224 type (primitive or non-primitive). The type of an unboxed tuple looks
225 the same.
226 </para>
227
228 <para>
229 Unboxed tuples are used for functions that need to return multiple
230 values, but they avoid the heap allocation normally associated with
231 using fully-fledged tuples. When an unboxed tuple is returned, the
232 components are put directly into registers or on the stack; the
233 unboxed tuple itself does not have a composite representation. Many
234 of the primitive operations listed in <literal>primops.txt.pp</literal> return unboxed
235 tuples.
236 In particular, the <literal>IO</literal> and <literal>ST</literal> monads use unboxed
237 tuples to avoid unnecessary allocation during sequences of operations.
238 </para>
239
240 <para>
241 There are some pretty stringent restrictions on the use of unboxed tuples:
242 <itemizedlist>
243 <listitem>
244
245 <para>
246 Values of unboxed tuple types are subject to the same restrictions as
247 other unboxed types; i.e. they may not be stored in polymorphic data
248 structures or passed to polymorphic functions.
249
250 </para>
251 </listitem>
252 <listitem>
253
254 <para>
255 No variable can have an unboxed tuple type, nor may a constructor or function
256 argument have an unboxed tuple type. The following are all illegal:
257 <programlisting>
258 data Foo = Foo (# Int, Int #)
259
260 f :: (# Int, Int #) -&#62; (# Int, Int #)
261 f x = x
262
263 g :: (# Int, Int #) -&#62; Int
264 g (# a,b #) = a
265
266 h x = let y = (# x,x #) in ...
267 </programlisting>
268 </para>
269 </listitem>
270 <listitem>
271 <para>
272 Unboxed tuples may not be nested. So this is illegal:
273 <programlisting>
274 f :: (# Int, (# Int, Int #), Bool #)
275 </programlisting>
276 </para>
277 </listitem>
278 </itemizedlist>
279 </para>
280 <para>
281 The typical use of unboxed tuples is simply to return multiple values,
282 binding those multiple results with a <literal>case</literal> expression, thus:
283 <programlisting>
284 f x y = (# x+1, y-1 #)
285 g x = case f x x of { (# a, b #) -&#62; a + b }
286 </programlisting>
287 You can have an unboxed tuple in a pattern binding, thus
288 <programlisting>
289 f x = let (# p,q #) = h x in ..body..
290 </programlisting>
291 If the types of <literal>p</literal> and <literal>q</literal> are not unboxed,
292 the resulting binding is lazy like any other Haskell pattern binding. The
293 above example desugars like this:
294 <programlisting>
f x = let t = case h x of { (# p,q #) -> (p,q) }
          p = fst t
          q = snd t
      in ..body..
299 </programlisting>
300 Indeed, the bindings can even be recursive.
301 </para>
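<para>
A small self-contained sketch of the typical use: one function returns two
results as an unboxed tuple and its caller takes them apart with
<literal>case</literal>. The names <literal>quotRemU</literal> and
<literal>sumQuotRem</literal> are hypothetical and chosen only for this
illustration.
</para>

<programlisting>
{-# LANGUAGE UnboxedTuples #-}

-- | Return quotient and remainder as an unboxed tuple rather than a
-- heap-allocated pair.
quotRemU :: Int -> Int -> (# Int, Int #)
quotRemU n d = (# n `quot` d, n `rem` d #)

-- | Bind both results with a case expression.
sumQuotRem :: Int -> Int -> Int
sumQuotRem n d = case quotRemU n d of
  (# q, r #) -> q + r
</programlisting>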
302
303 </sect2>
304 </sect1>
305
306
307 <!-- ====================== SYNTACTIC EXTENSIONS ======================= -->
308
309 <sect1 id="syntax-extns">
310 <title>Syntactic extensions</title>
311
312 <sect2 id="unicode-syntax">
313 <title>Unicode syntax</title>
314 <para>The language
315 extension <option>-XUnicodeSyntax</option><indexterm><primary><option>-XUnicodeSyntax</option></primary></indexterm>
316 enables Unicode characters to be used to stand for certain ASCII
317 character sequences. The following alternatives are provided:</para>
318
319 <informaltable>
<tgroup cols="4" align="left" colsep="1" rowsep="1">
321 <thead>
322 <row>
323 <entry>ASCII</entry>
324 <entry>Unicode alternative</entry>
325 <entry>Code point</entry>
326 <entry>Name</entry>
327 </row>
328 </thead>
329
330 <!--
331 to find the DocBook entities for these characters, find
332 the Unicode code point (e.g. 0x2237), and grep for it in
/usr/share/sgml/docbook/xml-dtd-*/ent/* (or equivalent on
your system). Some of these Unicode code points don't have
335 equivalent DocBook entities.
336 -->
337
338 <tbody>
339 <row>
340 <entry><literal>::</literal></entry>
<entry>&#x2237;</entry> <!-- no DocBook entity, so a numeric character reference is used -->
342 <entry>0x2237</entry>
343 <entry>PROPORTION</entry>
344 </row>
345 </tbody>
346 <tbody>
347 <row>
348 <entry><literal>=&gt;</literal></entry>
349 <entry>&rArr;</entry>
350 <entry>0x21D2</entry>
351 <entry>RIGHTWARDS DOUBLE ARROW</entry>
352 </row>
353 </tbody>
354 <tbody>
355 <row>
356 <entry><literal>forall</literal></entry>
357 <entry>&forall;</entry>
358 <entry>0x2200</entry>
359 <entry>FOR ALL</entry>
360 </row>
361 </tbody>
362 <tbody>
363 <row>
364 <entry><literal>-&gt;</literal></entry>
365 <entry>&rarr;</entry>
366 <entry>0x2192</entry>
367 <entry>RIGHTWARDS ARROW</entry>
368 </row>
369 </tbody>
370 <tbody>
371 <row>
372 <entry><literal>&lt;-</literal></entry>
373 <entry>&larr;</entry>
374 <entry>0x2190</entry>
375 <entry>LEFTWARDS ARROW</entry>
376 </row>
377 </tbody>
378
379 <tbody>
380 <row>
381 <entry>-&lt;</entry>
382 <entry>&larrtl;</entry>
383 <entry>0x2919</entry>
384 <entry>LEFTWARDS ARROW-TAIL</entry>
385 </row>
386 </tbody>
387
388 <tbody>
389 <row>
390 <entry>&gt;-</entry>
391 <entry>&rarrtl;</entry>
392 <entry>0x291A</entry>
393 <entry>RIGHTWARDS ARROW-TAIL</entry>
394 </row>
395 </tbody>
396
397 <tbody>
398 <row>
399 <entry>-&lt;&lt;</entry>
400 <entry></entry>
401 <entry>0x291B</entry>
402 <entry>LEFTWARDS DOUBLE ARROW-TAIL</entry>
403 </row>
404 </tbody>
405
406 <tbody>
407 <row>
408 <entry>&gt;&gt;-</entry>
409 <entry></entry>
410 <entry>0x291C</entry>
411 <entry>RIGHTWARDS DOUBLE ARROW-TAIL</entry>
412 </row>
413 </tbody>
414
415 <tbody>
416 <row>
417 <entry>*</entry>
418 <entry>&starf;</entry>
419 <entry>0x2605</entry>
420 <entry>BLACK STAR</entry>
421 </row>
422 </tbody>
423
424 </tgroup>
425 </informaltable>
426 </sect2>
427
428 <sect2 id="magic-hash">
429 <title>The magic hash</title>
430 <para>The language extension <option>-XMagicHash</option> allows "&num;" as a
431 postfix modifier to identifiers. Thus, "x&num;" is a valid variable, and "T&num;" is
432 a valid type constructor or data constructor.</para>
433
434 <para>The hash sign does not change semantics at all. We tend to use variable
435 names ending in "&num;" for unboxed values or types (e.g. <literal>Int&num;</literal>),
436 but there is no requirement to do so; they are just plain ordinary variables.
437 Nor does the <option>-XMagicHash</option> extension bring anything into scope.
438 For example, to bring <literal>Int&num;</literal> into scope you must
439 import <literal>GHC.Prim</literal> (see <xref linkend="primitives"/>);
440 the <option>-XMagicHash</option> extension
441 then allows you to <emphasis>refer</emphasis> to the <literal>Int&num;</literal>
442 that is now in scope.</para>
<para> The <option>-XMagicHash</option> extension also enables some new forms of literals (see <xref linkend="glasgow-unboxed"/>):
444 <itemizedlist>
445 <listitem><para> <literal>'x'&num;</literal> has type <literal>Char&num;</literal></para> </listitem>
446 <listitem><para> <literal>&quot;foo&quot;&num;</literal> has type <literal>Addr&num;</literal></para> </listitem>
<listitem><para> <literal>3&num;</literal> has type <literal>Int&num;</literal>. In general,
any Haskell integer lexeme followed by a <literal>&num;</literal> is an <literal>Int&num;</literal> literal, e.g.
<literal>-0x3A&num;</literal> as well as <literal>32&num;</literal>.</para></listitem>
<listitem><para> <literal>3&num;&num;</literal> has type <literal>Word&num;</literal>. In general,
any non-negative Haskell integer lexeme followed by <literal>&num;&num;</literal>
is a <literal>Word&num;</literal> literal.</para> </listitem>
453 <listitem><para> <literal>3.2&num;</literal> has type <literal>Float&num;</literal>.</para> </listitem>
454 <listitem><para> <literal>3.2&num;&num;</literal> has type <literal>Double&num;</literal></para> </listitem>
455 </itemizedlist>
456 </para>
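<para>
The literal forms listed above can be tried out by wrapping each one in the
constructor of the corresponding boxed type. This is only a sketch: the
top-level names are hypothetical, and the constructors are assumed to be
imported from <literal>GHC.Exts</literal>.
</para>

<programlisting>
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Char (C#), Double (D#), Float (F#), Int (I#), Word (W#))

boxedChar :: Char
boxedChar = C# 'x'#        -- 'x'# has type Char#

boxedInt :: Int
boxedInt = I# 3#           -- 3# has type Int#

boxedWord :: Word
boxedWord = W# 3##         -- 3## has type Word#

boxedFloat :: Float
boxedFloat = F# 3.2#       -- 3.2# has type Float#

boxedDouble :: Double
boxedDouble = D# 3.2##     -- 3.2## has type Double#
</programlisting>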
457 </sect2>
458
459 <!-- ====================== HIERARCHICAL MODULES ======================= -->
460
461
462 <sect2 id="hierarchical-modules">
463 <title>Hierarchical Modules</title>
464
465 <para>GHC supports a small extension to the syntax of module
466 names: a module name is allowed to contain a dot
467 <literal>&lsquo;.&rsquo;</literal>. This is also known as the
468 &ldquo;hierarchical module namespace&rdquo; extension, because
469 it extends the normally flat Haskell module namespace into a
470 more flexible hierarchy of modules.</para>
471
472 <para>This extension has very little impact on the language
itself; module names are <emphasis>always</emphasis> fully
474 qualified, so you can just think of the fully qualified module
475 name as <quote>the module name</quote>. In particular, this
476 means that the full module name must be given after the
477 <literal>module</literal> keyword at the beginning of the
478 module; for example, the module <literal>A.B.C</literal> must
479 begin</para>
480
481 <programlisting>module A.B.C</programlisting>
482
483
484 <para>It is a common strategy to use the <literal>as</literal>
485 keyword to save some typing when using qualified names with
486 hierarchical modules. For example:</para>
487
488 <programlisting>
489 import qualified Control.Monad.ST.Strict as ST
490 </programlisting>
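<para>
A slightly longer sketch shows the short qualifiers in use. The function
<literal>sumST</literal> and the qualifier <literal>Ref</literal> are
hypothetical names chosen for this illustration; the modules themselves are
part of the <literal>base</literal> library.
</para>

<programlisting>
import qualified Control.Monad.ST.Strict as ST
import qualified Data.STRef as Ref

-- | Sum a list using a mutable reference; every imported name is reached
-- through the short qualifiers ST and Ref.
sumST :: [Int] -> Int
sumST xs = ST.runST (Ref.newSTRef 0 >>= \ref ->
  mapM_ (\x -> Ref.modifySTRef' ref (+ x)) xs >> Ref.readSTRef ref)
</programlisting>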
491
492 <para>For details on how GHC searches for source and interface
493 files in the presence of hierarchical modules, see <xref
494 linkend="search-path"/>.</para>
495
496 <para>GHC comes with a large collection of libraries arranged
497 hierarchically; see the accompanying <ulink
498 url="../libraries/index.html">library
499 documentation</ulink>. More libraries to install are available
500 from <ulink
501 url="http://hackage.haskell.org/packages/hackage.html">HackageDB</ulink>.</para>
502 </sect2>
503
504 <!-- ====================== PATTERN GUARDS ======================= -->
505
506 <sect2 id="pattern-guards">
507 <title>Pattern guards</title>
508
509 <para>
510 <indexterm><primary>Pattern guards (Glasgow extension)</primary></indexterm>
511 The discussion that follows is an abbreviated version of Simon Peyton Jones's original <ulink url="http://research.microsoft.com/~simonpj/Haskell/guards.html">proposal</ulink>. (Note that the proposal was written before pattern guards were implemented, so refers to them as unimplemented.)
512 </para>
513
514 <para>
515 Suppose we have an abstract data type of finite maps, with a
516 lookup operation:
517
518 <programlisting>
519 lookup :: FiniteMap -> Int -> Maybe Int
520 </programlisting>
521
522 The lookup returns <function>Nothing</function> if the supplied key is not in the domain of the mapping, and <function>(Just v)</function> otherwise,
523 where <varname>v</varname> is the value that the key maps to. Now consider the following definition:
524 </para>
525
526 <programlisting>
clunky env var1 var2 | ok1 &amp;&amp; ok2 = val1 + val2
                     | otherwise  = var1 + var2
  where
    m1 = lookup env var1
    m2 = lookup env var2
    ok1 = maybeToBool m1
    ok2 = maybeToBool m2
    val1 = expectJust m1
    val2 = expectJust m2
536 </programlisting>
537
538 <para>
539 The auxiliary functions are
540 </para>
541
542 <programlisting>
543 maybeToBool :: Maybe a -&gt; Bool
544 maybeToBool (Just x) = True
545 maybeToBool Nothing = False
546
547 expectJust :: Maybe a -&gt; a
548 expectJust (Just x) = x
549 expectJust Nothing = error "Unexpected Nothing"
550 </programlisting>
551
552 <para>
553 What is <function>clunky</function> doing? The guard <literal>ok1 &amp;&amp;
554 ok2</literal> checks that both lookups succeed, using
555 <function>maybeToBool</function> to convert the <function>Maybe</function>
556 types to booleans. The (lazily evaluated) <function>expectJust</function>
calls extract the values from the results of the lookups, and bind the
558 returned values to <varname>val1</varname> and <varname>val2</varname>
559 respectively. If either lookup fails, then clunky takes the
560 <literal>otherwise</literal> case and returns the sum of its arguments.
561 </para>
562
563 <para>
564 This is certainly legal Haskell, but it is a tremendously verbose and
565 un-obvious way to achieve the desired effect. Arguably, a more direct way
566 to write clunky would be to use case expressions:
567 </para>
568
569 <programlisting>
clunky env var1 var2 = case lookup env var1 of
  Nothing -&gt; fail
  Just val1 -&gt; case lookup env var2 of
    Nothing -&gt; fail
    Just val2 -&gt; val1 + val2
  where
    fail = var1 + var2
577 </programlisting>
578
579 <para>
580 This is a bit shorter, but hardly better. Of course, we can rewrite any set
581 of pattern-matching, guarded equations as case expressions; that is
precisely what the compiler does when compiling equations! The reason that
Haskell provides guarded equations is that they allow us to write down
584 the cases we want to consider, one at a time, independently of each other.
585 This structure is hidden in the case version. Two of the right-hand sides
586 are really the same (<function>fail</function>), and the whole expression
587 tends to become more and more indented.
588 </para>
589
590 <para>
591 Here is how I would write clunky:
592 </para>
593
594 <programlisting>
clunky env var1 var2
  | Just val1 &lt;- lookup env var1
  , Just val2 &lt;- lookup env var2
  = val1 + val2
...other equations for clunky...
600 </programlisting>
601
602 <para>
603 The semantics should be clear enough. The qualifiers are matched in order.
604 For a <literal>&lt;-</literal> qualifier, which I call a pattern guard, the
605 right hand side is evaluated and matched against the pattern on the left.
606 If the match fails then the whole guard fails and the next equation is
607 tried. If it succeeds, then the appropriate binding takes place, and the
608 next qualifier is matched, in the augmented environment. Unlike list
609 comprehensions, however, the type of the expression to the right of the
610 <literal>&lt;-</literal> is the same as the type of the pattern to its
611 left. The bindings introduced by pattern guards scope over all the
612 remaining guard qualifiers, and over the right hand side of the equation.
613 </para>
614
615 <para>
Just as with list comprehensions, boolean expressions can be freely mixed
in among the pattern guards. For example:
618 </para>
619
620 <programlisting>
f x | [y] &lt;- x
    , y > 3
    , Just z &lt;- h y
    = ...
625 </programlisting>
626
627 <para>
628 Haskell's current guards therefore emerge as a special case, in which the
629 qualifier list has just one element, a boolean expression.
630 </para>
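<para>
For experimentation, here is a complete version of <function>clunky</function>
written with pattern guards. It is a sketch under two assumptions that are not
part of the discussion above: the finite map is modelled as an association
list (the hypothetical type <literal>Env</literal>), and
<literal>lookupEnv</literal> is a hypothetical wrapper around the Prelude's
<literal>lookup</literal> that takes the map first.
</para>

<programlisting>
-- Assumption: an association list stands in for the abstract FiniteMap.
type Env = [(Int, Int)]

lookupEnv :: Env -> Int -> Maybe Int
lookupEnv env key = lookup key env

clunky :: Env -> Int -> Int -> Int
clunky env var1 var2
  | Just val1 &lt;- lookupEnv env var1
  , Just val2 &lt;- lookupEnv env var2
  = val1 + val2
-- Fall-through equation, tried when either lookup fails.
clunky _ var1 var2 = var1 + var2
</programlisting>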
631 </sect2>
632
633 <!-- ===================== View patterns =================== -->
634
635 <sect2 id="view-patterns">
636 <title>View patterns
637 </title>
638
639 <para>
640 View patterns are enabled by the flag <literal>-XViewPatterns</literal>.
641 More information and examples of view patterns can be found on the
642 <ulink url="http://hackage.haskell.org/trac/ghc/wiki/ViewPatterns">Wiki
643 page</ulink>.
644 </para>
645
646 <para>
647 View patterns are somewhat like pattern guards that can be nested inside
648 of other patterns. They are a convenient way of pattern-matching
649 against values of abstract types. For example, in a programming language
650 implementation, we might represent the syntax of the types of the
651 language as follows:
652
653 <programlisting>
type Typ

data TypView = Unit
             | Arrow Typ Typ

view :: Typ -> TypView

-- additional operations for constructing Typ's ...
662 </programlisting>
663
664 The representation of Typ is held abstract, permitting implementations
665 to use a fancy representation (e.g., hash-consing to manage sharing).
666
Without view patterns, using this signature is a little inconvenient:
668 <programlisting>
size :: Typ -> Integer
size t = case view t of
  Unit -> 1
  Arrow t1 t2 -> size t1 + size t2
673 </programlisting>
674
675 It is necessary to iterate the case, rather than using an equational
676 function definition. And the situation is even worse when the matching
677 against <literal>t</literal> is buried deep inside another pattern.
678 </para>
679
680 <para>
681 View patterns permit calling the view function inside the pattern and
682 matching against the result:
683 <programlisting>
684 size (view -> Unit) = 1
685 size (view -> Arrow t1 t2) = size t1 + size t2
686 </programlisting>
687
688 That is, we add a new form of pattern, written
689 <replaceable>expression</replaceable> <literal>-></literal>
690 <replaceable>pattern</replaceable> that means "apply the expression to
691 whatever we're trying to match against, and then match the result of
692 that application against the pattern". The expression can be any Haskell
693 expression of function type, and view patterns can be used wherever
694 patterns are used.
695 </para>
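<para>
To try the <literal>size</literal> example, the abstract type needs some
concrete representation. The sketch below assumes a trivial one: the
<literal>newtype</literal> wrapper <literal>MkTyp</literal> and the
constructor functions <literal>unit</literal> and <literal>arrow</literal> are
hypothetical and stand in for whatever a real implementation would provide.
</para>

<programlisting>
{-# LANGUAGE ViewPatterns #-}

-- Assumption: a trivial representation; a real one could be fancier.
newtype Typ = MkTyp TypView

data TypView = Unit
             | Arrow Typ Typ

view :: Typ -> TypView
view (MkTyp v) = v

unit :: Typ
unit = MkTyp Unit

arrow :: Typ -> Typ -> Typ
arrow a b = MkTyp (Arrow a b)

-- | Count the Unit leaves, matching through the view function.
size :: Typ -> Integer
size (view -> Unit) = 1
size (view -> Arrow t1 t2) = size t1 + size t2
</programlisting>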
696
697 <para>
698 The semantics of a pattern <literal>(</literal>
699 <replaceable>exp</replaceable> <literal>-></literal>
700 <replaceable>pat</replaceable> <literal>)</literal> are as follows:
701
702 <itemizedlist>
703
<listitem>

<para>Scoping: The variables bound by the view pattern are the variables bound by
707 <replaceable>pat</replaceable>.
708 </para>
709
710 <para>
711 Any variables in <replaceable>exp</replaceable> are bound occurrences,
712 but variables bound "to the left" in a pattern are in scope. This
713 feature permits, for example, one argument to a function to be used in
714 the view of another argument. For example, the function
715 <literal>clunky</literal> from <xref linkend="pattern-guards" /> can be
716 written using view patterns as follows:
717
718 <programlisting>
719 clunky env (lookup env -> Just val1) (lookup env -> Just val2) = val1 + val2
720 ...other equations for clunky...
721 </programlisting>
722 </para>
723
724 <para>
725 More precisely, the scoping rules are:
726 <itemizedlist>
727 <listitem>
728 <para>
729 In a single pattern, variables bound by patterns to the left of a view
730 pattern expression are in scope. For example:
731 <programlisting>
732 example :: Maybe ((String -> Integer,Integer), String) -> Bool
example (Just ((f,_), f -> 4)) = True
734 </programlisting>
735
736 Additionally, in function definitions, variables bound by matching earlier curried
737 arguments may be used in view pattern expressions in later arguments:
738 <programlisting>
739 example :: (String -> Integer) -> String -> Bool
740 example f (f -> 4) = True
741 </programlisting>
742 That is, the scoping is the same as it would be if the curried arguments
743 were collected into a tuple.
744 </para>
745 </listitem>
746
747 <listitem>
748 <para>
749 In mutually recursive bindings, such as <literal>let</literal>,
750 <literal>where</literal>, or the top level, view patterns in one
751 declaration may not mention variables bound by other declarations. That
752 is, each declaration must be self-contained. For example, the following
753 program is not allowed:
754 <programlisting>
755 let {(x -> y) = e1 ;
756 (y -> x) = e2 } in x
757 </programlisting>
758
759 (For some amplification on this design choice see
760 <ulink url="http://hackage.haskell.org/trac/ghc/ticket/4061">Trac #4061</ulink>.)
761
762 </para>
763 </listitem>
764 </itemizedlist>
765
766 </para>
767 </listitem>
768
769 <listitem><para> Typing: If <replaceable>exp</replaceable> has type
770 <replaceable>T1</replaceable> <literal>-></literal>
771 <replaceable>T2</replaceable> and <replaceable>pat</replaceable> matches
772 a <replaceable>T2</replaceable>, then the whole view pattern matches a
773 <replaceable>T1</replaceable>.
774 </para></listitem>
775
776 <listitem><para> Matching: To the equations in Section 3.17.3 of the
777 <ulink url="http://www.haskell.org/onlinereport/">Haskell 98
778 Report</ulink>, add the following:
779 <programlisting>
780 case v of { (e -> p) -> e1 ; _ -> e2 }
781 =
782 case (e v) of { p -> e1 ; _ -> e2 }
783 </programlisting>
784 That is, to match a variable <replaceable>v</replaceable> against a pattern
785 <literal>(</literal> <replaceable>exp</replaceable>
786 <literal>-></literal> <replaceable>pat</replaceable>
787 <literal>)</literal>, evaluate <literal>(</literal>
788 <replaceable>exp</replaceable> <replaceable> v</replaceable>
789 <literal>)</literal> and match the result against
790 <replaceable>pat</replaceable>.
791 </para></listitem>
792
793 <listitem><para> Efficiency: When the same view function is applied in
794 multiple branches of a function definition or a case expression (e.g.,
795 in <literal>size</literal> above), GHC makes an attempt to collect these
796 applications into a single nested case expression, so that the view
797 function is only applied once. Pattern compilation in GHC follows the
798 matrix algorithm described in Chapter 4 of <ulink
799 url="http://research.microsoft.com/~simonpj/Papers/slpj-book-1987/">The
800 Implementation of Functional Programming Languages</ulink>. When the
801 top rows of the first column of a matrix are all view patterns with the
802 "same" expression, these patterns are transformed into a single nested
803 case. This includes, for example, adjacent view patterns that line up
804 in a tuple, as in
805 <programlisting>
806 f ((view -> A, p1), p2) = e1
807 f ((view -> B, p3), p4) = e2
808 </programlisting>
809 </para>
810
811 <para> The current notion of when two view pattern expressions are "the
812 same" is very restricted: it is not even full syntactic equality.
813 However, it does include variables, literals, applications, and tuples;
814 e.g., two instances of <literal>view ("hi", "there")</literal> will be
815 collected. However, the current implementation does not compare up to
816 alpha-equivalence, so two instances of <literal>(x, view x ->
817 y)</literal> will not be coalesced.
818 </para>
819
820 </listitem>
821
822 </itemizedlist>
823 </para>
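<para>
The scoping and matching rules can be observed together in a small complete
function. In this sketch the hypothetical function
<literal>mapsToFour</literal> binds <literal>f</literal> in its first argument
and then uses it as the view function for the second; the second equation is
tried when the result of applying <literal>f</literal> does not match
<literal>4</literal>.
</para>

<programlisting>
{-# LANGUAGE ViewPatterns #-}

-- | f is bound by the first argument and applied to the second argument
-- inside the view pattern; the result is matched against the literal 4.
mapsToFour :: (String -> Integer) -> String -> Bool
mapsToFour f (f -> 4) = True
mapsToFour _ _        = False
</programlisting>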
824
825 </sect2>
826
827 <!-- ===================== n+k patterns =================== -->
828
829 <sect2 id="n-k-patterns">
830 <title>n+k patterns</title>
831 <indexterm><primary><option>-XNPlusKPatterns</option></primary></indexterm>
832
833 <para>
834 <literal>n+k</literal> pattern support is disabled by default. To enable
835 it, you can use the <option>-XNPlusKPatterns</option> flag.
836 </para>
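<para>
As a brief sketch of what the flag permits: a pattern <literal>n+1</literal>
matches an argument that is at least 1 and binds <literal>n</literal> to the
argument minus 1. The function below is a hypothetical example, not taken from
the GHC sources.
</para>

<programlisting>
{-# LANGUAGE NPlusKPatterns #-}

-- | Factorial written with an n+k pattern; negative arguments match
-- neither equation.
factorial :: Integer -> Integer
factorial 0     = 1
factorial (n+1) = (n+1) * factorial n
</programlisting>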
837
838 </sect2>
839
840 <!-- ===================== Traditional record syntax =================== -->
841
842 <sect2 id="traditional-record-syntax">
843 <title>Traditional record syntax</title>
844 <indexterm><primary><option>-XNoTraditionalRecordSyntax</option></primary></indexterm>
845
846 <para>
847 Traditional record syntax, such as <literal>C {f = x}</literal>, is enabled by default.
848 To disable it, you can use the <option>-XNoTraditionalRecordSyntax</option> flag.
849 </para>
850
851 </sect2>
852
853 <!-- ===================== Recursive do-notation =================== -->
854
855 <sect2 id="recursive-do-notation">
856 <title>The recursive do-notation
857 </title>
858
859 <para>
860 The do-notation of Haskell 98 does not allow <emphasis>recursive bindings</emphasis>,
861 that is, the variables bound in a do-expression are visible only in the textually following
862 code block. Compare this to a let-expression, where bound variables are visible in the entire binding
863 group.
864 </para>
865
866 <para>
867 It turns out that such recursive bindings do indeed make sense for a variety of monads, but
868 not all. In particular, recursion in this sense requires a fixed-point operator for the underlying
869 monad, captured by the <literal>mfix</literal> method of the <literal>MonadFix</literal> class, defined in <literal>Control.Monad.Fix</literal> as follows:
870 <programlisting>
class Monad m => MonadFix m where
   mfix :: (a -> m a) -> m a
873 </programlisting>
874 Haskell's
875 <literal>Maybe</literal>, <literal>[]</literal> (list), <literal>ST</literal> (both strict and lazy versions),
876 <literal>IO</literal>, and many other monads have <literal>MonadFix</literal> instances. On the negative
877 side, the continuation monad, with the signature <literal>(a -> r) -> r</literal>, does not.
878 </para>
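<para>
To make the role of <literal>mfix</literal> concrete, here is a minimal sketch that
calls it directly in the <literal>Maybe</literal> monad. The name
<literal>ones</literal> is invented for this example. The function passed to
<literal>mfix</literal> lazily receives the very value that it is in the process of
returning.
<programlisting>
module Main where

import Control.Monad.Fix (mfix)

-- xs stands for the list that the computation itself produces.
ones :: Maybe [Int]
ones = mfix (\xs -> Just (1 : xs))

main :: IO ()
main = print (fmap (take 3) ones)   -- Just [1,1,1]
</programlisting>
The result is an infinite list, so only a finite prefix is printed.
</para>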
879
880 <para>
881 For monads that do belong to the <literal>MonadFix</literal> class, GHC provides
882 an extended version of the do-notation that allows recursive bindings.
883 The <option>-XRecursiveDo</option> (language pragma: <literal>RecursiveDo</literal>)
884 provides the necessary syntactic support, introducing the keywords <literal>mdo</literal> and
885 <literal>rec</literal> for higher and lower levels of the notation respectively. Unlike
886 bindings in a <literal>do</literal> expression, those introduced by <literal>mdo</literal> and <literal>rec</literal>
887 are recursively defined, much like in an ordinary let-expression. Due to the new
888 keyword <literal>mdo</literal>, we also call this notation the <emphasis>mdo-notation</emphasis>.
889 </para>
890
891 <para>
892 Here is a simple (albeit contrived) example:
893 <programlisting>
894 {-# LANGUAGE RecursiveDo #-}
justOnes = mdo { xs &lt;- Just (1:xs)
               ; return (map negate xs) }
897 </programlisting>
898 or equivalently
899 <programlisting>
900 {-# LANGUAGE RecursiveDo #-}
justOnes = do { rec { xs &lt;- Just (1:xs) }
              ; return (map negate xs) }
903 </programlisting>
904 As you can guess <literal>justOnes</literal> will evaluate to <literal>Just [-1,-1,-1,...</literal>.
905 </para>
906
907 <para>
GHC's implementation of the mdo-notation closely follows the original translation as described in the paper
909 <ulink url="https://sites.google.com/site/leventerkok/recdo.pdf">A recursive do for Haskell</ulink>, which
910 in turn is based on the work <ulink url="http://sites.google.com/site/leventerkok/erkok-thesis.pdf">Value Recursion
911 in Monadic Computations</ulink>. Furthermore, GHC extends the syntax described in the former paper
912 with a lower level syntax flagged by the <literal>rec</literal> keyword, as we describe next.
913 </para>
914
915 <sect3>
916 <title>Recursive binding groups</title>
917
918 <para>
919 The flag <option>-XRecursiveDo</option> also introduces a new keyword <literal>rec</literal>, which wraps a
920 mutually-recursive group of monadic statements inside a <literal>do</literal> expression, producing a single statement.
921 Similar to a <literal>let</literal> statement inside a <literal>do</literal>, variables bound in
922 the <literal>rec</literal> are visible throughout the <literal>rec</literal> group, and below it. For example, compare
923 <programlisting>
do { a &lt;- getChar             do { a &lt;- getChar
   ; let { r1 = f a r2           ; rec { r1 &lt;- f a r2
   ;     ; r2 = g r1 }           ;     ; r2 &lt;- g r1 }
   ; return (r1 ++ r2) }      ; return (r1 ++ r2) }
928 </programlisting>
929 In both cases, <literal>r1</literal> and <literal>r2</literal> are available both throughout
930 the <literal>let</literal> or <literal>rec</literal> block, and in the statements that follow it.
931 The difference is that <literal>let</literal> is non-monadic, while <literal>rec</literal> is monadic.
932 (In Haskell <literal>let</literal> is really <literal>letrec</literal>, of course.)
933 </para>
934
935 <para>
936 The semantics of <literal>rec</literal> is fairly straightforward. Whenever GHC finds a <literal>rec</literal>
937 group, it will compute its set of bound variables, and will introduce an appropriate call
938 to the underlying monadic value-recursion operator <literal>mfix</literal>, belonging to the
939 <literal>MonadFix</literal> class. Here is an example:
940 <programlisting>
rec { b &lt;- f a c     ===>    (b,c) &lt;- mfix (\~(b,c) -> do { b &lt;- f a c
    ; c &lt;- f b a }                                        ; c &lt;- f b a
                                                          ; return (b,c) })
944 </programlisting>
945 As usual, the meta-variables <literal>b</literal>, <literal>c</literal> etc., can be arbitrary patterns.
946 In general, the statement <literal>rec <replaceable>ss</replaceable></literal> is desugared to the statement
947 <programlisting>
948 <replaceable>vs</replaceable> &lt;- mfix (\~<replaceable>vs</replaceable> -&gt; do { <replaceable>ss</replaceable>; return <replaceable>vs</replaceable> })
949 </programlisting>
950 where <replaceable>vs</replaceable> is a tuple of the variables bound by <replaceable>ss</replaceable>.
951 </para>
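<para>
The translation can be checked on a small case. The sketch below, whose names
<literal>withRec</literal> and <literal>byHand</literal> are invented here, ties the
same knot twice in the <literal>Maybe</literal> monad: once with a
<literal>rec</literal> block and once with an explicit call to
<literal>mfix</literal> that follows the scheme above. Note that real source code
needs a space between <literal>\</literal> and <literal>~</literal>.
<programlisting>
{-# LANGUAGE RecursiveDo #-}
module Main where

import Control.Monad.Fix (mfix)

-- Two mutually recursive bindings tied with a rec block.
withRec :: Maybe [Int]
withRec = do
  rec { xs &lt;- Just (1 : ys)
      ; ys &lt;- Just (2 : xs) }
  return (take 4 xs)

-- The same knot tied by hand. The lazy pattern keeps mfix from forcing
-- the tuple before it has been built; the inner bindings shadow the
-- lambda-bound names, exactly as in the translation scheme.
byHand :: Maybe [Int]
byHand = do
  (xs, _) &lt;- mfix (\ ~(xs, ys) -> do { xs &lt;- Just (1 : ys)
                                     ; ys &lt;- Just (2 : xs)
                                     ; return (xs, ys) })
  return (take 4 xs)

main :: IO ()
main = print (withRec, byHand)   -- (Just [1,2,1,2],Just [1,2,1,2])
</programlisting>
</para>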
952
953 <para>
954 Note in particular that the translation for a <literal>rec</literal> block only involves wrapping a call
955 to <literal>mfix</literal>: it performs no other analysis on the bindings. The latter is the task
956 for the <literal>mdo</literal> notation, which is described next.
957 </para>
958 </sect3>
959
960 <sect3>
961 <title>The <literal>mdo</literal> notation</title>
962
963 <para>
964 A <literal>rec</literal>-block tells the compiler where precisely the recursive knot should be tied. It turns out that
the placement of the recursive knots can be rather delicate: in particular, we would like the knots to be wrapped
around groups that are as small as possible. This process is known as <emphasis>segmentation</emphasis>, and is described
in detail in Section 3.2 of <ulink url="https://sites.google.com/site/leventerkok/recdo.pdf">A recursive do for
968 Haskell</ulink>. Segmentation improves polymorphism and reduces the size of the recursive knot. Most importantly, it avoids
969 unnecessary interference caused by a fundamental issue with the so-called <emphasis>right-shrinking</emphasis>
970 axiom for monadic recursion. In brief, most monads of interest (IO, strict state, etc.) do <emphasis>not</emphasis>
971 have recursion operators that satisfy this axiom, and thus not performing segmentation can cause unnecessary
972 interference, changing the termination behavior of the resulting translation.
973 (Details can be found in Sections 3.1 and 7.2.2 of
974 <ulink url="http://sites.google.com/site/leventerkok/erkok-thesis.pdf">Value Recursion in Monadic Computations</ulink>.)
975 </para>
976
977 <para>
978 The <literal>mdo</literal> notation removes the burden of placing
979 explicit <literal>rec</literal> blocks in the code. Unlike an
980 ordinary <literal>do</literal> expression, in which variables bound by
981 statements are only in scope for later statements, variables bound in
982 an <literal>mdo</literal> expression are in scope for all statements
983 of the expression. The compiler then automatically identifies minimal
984 mutually recursively dependent segments of statements, treating them as
985 if the user had wrapped a <literal>rec</literal> qualifier around them.
986 </para>
987
988 <para>
989 The definition is syntactic:
990 </para>
991 <itemizedlist>
992 <listitem>
993 <para>
994 A generator <replaceable>g</replaceable>
995 <emphasis>depends</emphasis> on a textually following generator
996 <replaceable>g'</replaceable>, if
997 </para>
998 <itemizedlist>
999 <listitem>
1000 <para>
1001 <replaceable>g'</replaceable> defines a variable that
1002 is used by <replaceable>g</replaceable>, or
1003 </para>
1004 </listitem>
1005 <listitem>
1006 <para>
1007 <replaceable>g'</replaceable> textually appears between
1008 <replaceable>g</replaceable> and
1009 <replaceable>g''</replaceable>, where <replaceable>g</replaceable>
1010 depends on <replaceable>g''</replaceable>.
1011 </para>
1012 </listitem>
1013 </itemizedlist>
1014 </listitem>
1015 <listitem>
1016 <para>
1017 A <emphasis>segment</emphasis> of a given
1018 <literal>mdo</literal>-expression is a minimal sequence of generators
1019 such that no generator of the sequence depends on an outside
1020 generator. As a special case, although it is not a generator,
1021 the final expression in an <literal>mdo</literal>-expression is
1022 considered to form a segment by itself.
1023 </para>
1024 </listitem>
1025 </itemizedlist>
1026 <para>
1027 Segments in this sense are
1028 related to <emphasis>strongly-connected components</emphasis> analysis,
1029 with the exception that bindings in a segment cannot be reordered and
1030 must be contiguous.
1031 </para>
1032
1033 <para>
1034 Here is an example <literal>mdo</literal>-expression, and its translation to <literal>rec</literal> blocks:
1035 <programlisting>
mdo { a &lt;- getChar      ===> do { a &lt;- getChar
    ; b &lt;- f a c                ; rec { b &lt;- f a c
    ; c &lt;- f b a                ;     ; c &lt;- f b a }
    ; z &lt;- h a b                ; z &lt;- h a b
    ; d &lt;- g d e                ; rec { d &lt;- g d e
    ; e &lt;- g a z                ;     ; e &lt;- g a z }
    ; putChar c }               ; putChar c }
1043 </programlisting>
1044 Note that a given <literal>mdo</literal> expression can cause the creation of multiple <literal>rec</literal> blocks.
1045 If there are no recursive dependencies, <literal>mdo</literal> will introduce no <literal>rec</literal> blocks. In this
1046 latter case an <literal>mdo</literal> expression is precisely the same as a <literal>do</literal> expression, as one
1047 would expect.
1048 </para>
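<para>
For a runnable illustration in the <literal>IO</literal> monad, the following sketch
builds a two-element cyclic structure. The <literal>Node</literal> type and the name
<literal>twoCycle</literal> are assumptions made for this example only. The first two
statements depend on each other and so form one recursive segment, while the final
<literal>return</literal> forms a segment of its own.
<programlisting>
{-# LANGUAGE RecursiveDo #-}
module Main where

import Data.IORef (IORef, newIORef, readIORef)

-- A hypothetical mutable linked-list node.
data Node = Node { value :: Int, next :: IORef Node }

-- 'a' mentions 'b' before 'b' is bound; mdo inserts the rec block for us.
twoCycle :: IO Node
twoCycle = mdo
  a &lt;- Node 1 &lt;$> newIORef b
  b &lt;- Node 2 &lt;$> newIORef a
  return a

main :: IO ()
main = do
  a  &lt;- twoCycle
  b  &lt;- readIORef (next a)
  a' &lt;- readIORef (next b)
  print (value a, value b, value a')   -- (1,2,1)
</programlisting>
This works because <literal>newIORef</literal> does not force its argument, so the
not-yet-available node is only stored, never inspected, while the knot is being tied.
</para>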
1049
1050 <para>
1051 In summary, given an <literal>mdo</literal> expression, GHC first performs segmentation, introducing
1052 <literal>rec</literal> blocks to wrap over minimal recursive groups. Then, each resulting
1053 <literal>rec</literal> is desugared, using a call to <literal>Control.Monad.Fix.mfix</literal> as described
1054 in the previous section. The original <literal>mdo</literal>-expression typechecks exactly when the desugared
1055 version would do so.
1056 </para>
1057
1058 <para>
1059 Here are some other important points in using the recursive-do notation:
1060
1061 <itemizedlist>
1062 <listitem>
1063 <para>
1064 It is enabled with the flag <literal>-XRecursiveDo</literal>, or the <literal>LANGUAGE RecursiveDo</literal>
1065 pragma. (The same flag enables both <literal>mdo</literal>-notation, and the use of <literal>rec</literal>
1066 blocks inside <literal>do</literal> expressions.)
1067 </para>
1068 </listitem>
1069 <listitem>
1070 <para>
<literal>rec</literal> blocks can also be used inside <literal>mdo</literal>-expressions, where each one is
treated as a single statement. However, it is good style to use either <literal>mdo</literal> or
<literal>rec</literal> blocks in a single expression, not both.
1074 </para>
1075 </listitem>
1076 <listitem>
1077 <para>
1078 If recursive bindings are required for a monad, then that monad must be declared an instance of
1079 the <literal>MonadFix</literal> class.
1080 </para>
1081 </listitem>
1082 <listitem>
1083 <para>
1084 The following instances of <literal>MonadFix</literal> are automatically provided: List, Maybe, IO.
1085 Furthermore, the <literal>Control.Monad.ST</literal> and <literal>Control.Monad.ST.Lazy</literal>
1086 modules provide the instances of the <literal>MonadFix</literal> class for Haskell's internal
1087 state monad (strict and lazy, respectively).
1088 </para>
1089 </listitem>
1090 <listitem>
1091 <para>
1092 Like <literal>let</literal> and <literal>where</literal> bindings, name shadowing is not allowed within
1093 an <literal>mdo</literal>-expression or a <literal>rec</literal>-block; that is, all the names bound in
1094 a single <literal>rec</literal> must be distinct. (GHC will complain if this is not the case.)
1095 </para>
1096 </listitem>
1097 </itemizedlist>
1098 </para>
1099 </sect3>
1100
1101
1102 </sect2>
1103
1104
1105 <!-- ===================== PARALLEL LIST COMPREHENSIONS =================== -->
1106
1107 <sect2 id="parallel-list-comprehensions">
1108 <title>Parallel List Comprehensions</title>
1109 <indexterm><primary>list comprehensions</primary><secondary>parallel</secondary>
1110 </indexterm>
1111 <indexterm><primary>parallel list comprehensions</primary>
1112 </indexterm>
1113
1114 <para>Parallel list comprehensions are a natural extension to list
1115 comprehensions. List comprehensions can be thought of as a nice
1116 syntax for writing maps and filters. Parallel comprehensions
1117 extend this to include the zipWith family.</para>
1118
1119 <para>A parallel list comprehension has multiple independent
1120 branches of qualifier lists, each separated by a `|' symbol. For
1121 example, the following zips together two lists:</para>
1122
1123 <programlisting>
1124 [ (x, y) | x &lt;- xs | y &lt;- ys ]
1125 </programlisting>
1126
1127 <para>The behaviour of parallel list comprehensions follows that of
1128 zip, in that the resulting list will have the same length as the
1129 shortest branch.</para>
1130
1131 <para>We can define parallel list comprehensions by translation to
1132 regular comprehensions. Here's the basic idea:</para>
1133
1134 <para>Given a parallel comprehension of the form: </para>
1135
1136 <programlisting>
    [ e | p1 &lt;- e11, p2 &lt;- e12, ...
        | q1 &lt;- e21, q2 &lt;- e22, ...
        ...
    ]
1141 </programlisting>
1142
1143 <para>This will be translated to: </para>
1144
1145 <programlisting>
    [ e | ((p1,p2), (q1,q2), ...) &lt;- zipN [(p1,p2) | p1 &lt;- e11, p2 &lt;- e12, ...]
                                          [(q1,q2) | q1 &lt;- e21, q2 &lt;- e22, ...]
                                          ...
    ]
1150 </programlisting>
1151
1152 <para>where `zipN' is the appropriate zip for the given number of
1153 branches.</para>
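<para>
The translation can be tried out on a concrete case. In this sketch (the names
<literal>parallel</literal> and <literal>translated</literal> are invented for the
example) the first branch has two qualifiers, and the hand-written version uses
<literal>zip</literal> because there are two branches.
<programlisting>
{-# LANGUAGE ParallelListComp #-}
module Main where

-- Parallel form: the two branches are consumed in lock step.
parallel :: [(Int, Char)]
parallel = [ (x, y) | x &lt;- [1, 2, 3], odd x | y &lt;- "abc" ]

-- The same list written as an ordinary comprehension over zip.
translated :: [(Int, Char)]
translated = [ (x, y) | (x, y) &lt;- zip [ x | x &lt;- [1, 2, 3], odd x ]
                                         [ y | y &lt;- "abc" ] ]

main :: IO ()
main = print (parallel, parallel == translated)
-- ([(1,'a'),(3,'b')],True)
</programlisting>
The first branch yields only two elements, so the result has length two even though
the second branch could supply three.
</para>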
1154
1155 </sect2>
1156
1157 <!-- ===================== TRANSFORM LIST COMPREHENSIONS =================== -->
1158
1159 <sect2 id="generalised-list-comprehensions">
1160 <title>Generalised (SQL-Like) List Comprehensions</title>
1161 <indexterm><primary>list comprehensions</primary><secondary>generalised</secondary>
1162 </indexterm>
1163 <indexterm><primary>extended list comprehensions</primary>
1164 </indexterm>
1165 <indexterm><primary>group</primary></indexterm>
1166 <indexterm><primary>sql</primary></indexterm>
1167
1168
1169 <para>Generalised list comprehensions are a further enhancement to the
1170 list comprehension syntactic sugar to allow operations such as sorting
1171 and grouping which are familiar from SQL. They are fully described in the
1172 paper <ulink url="http://research.microsoft.com/~simonpj/papers/list-comp">
1173 Comprehensive comprehensions: comprehensions with "order by" and "group by"</ulink>,
1174 except that the syntax we use differs slightly from the paper.</para>
1175 <para>The extension is enabled with the flag <option>-XTransformListComp</option>.</para>
1176 <para>Here is an example:
1177 <programlisting>
employees = [ ("Simon", "MS", 80)
            , ("Erik", "MS", 100)
            , ("Phil", "Ed", 40)
            , ("Gordon", "Ed", 45)
            , ("Paul", "Yale", 60)]

output = [ (the dept, sum salary)
         | (name, dept, salary) &lt;- employees
         , then group by dept using groupWith
         , then sortWith by (sum salary)
         , then take 5 ]
1189 </programlisting>
1190 In this example, the list <literal>output</literal> would take on
1191 the value:
1192
1193 <programlisting>
1194 [("Yale", 60), ("Ed", 85), ("MS", 180)]
1195 </programlisting>
1196 </para>
1197 <para>There are three new keywords: <literal>group</literal>, <literal>by</literal>, and <literal>using</literal>.
1198 (The functions <literal>sortWith</literal> and <literal>groupWith</literal> are not keywords; they are ordinary
1199 functions that are exported by <literal>GHC.Exts</literal>.)</para>
1200
<para>There are four new forms of comprehension qualifier,
1202 all introduced by the (existing) keyword <literal>then</literal>:
1203 <itemizedlist>
1204 <listitem>
1205
1206 <programlisting>
1207 then f
1208 </programlisting>
1209
<para>This statement requires that <literal>f</literal> have the type
<literal>forall a. [a] -> [a]</literal>. The opening example uses this form
to apply <literal>take 5</literal>.</para>
1213
1214 </listitem>
1215
1216
1217 <listitem>
1218 <para>
1219 <programlisting>
1220 then f by e
1221 </programlisting>
1222
1223 This form is similar to the previous one, but allows you to create a function
1224 which will be passed as the first argument to f. As a consequence f must have
1225 the type <literal>forall a. (a -> t) -> [a] -> [a]</literal>. As you can see
1226 from the type, this function lets f &quot;project out&quot; some information
1227 from the elements of the list it is transforming.</para>
1228
1229 <para>An example is shown in the opening example, where <literal>sortWith</literal>
1230 is supplied with a function that lets it find out the <literal>sum salary</literal>
1231 for any item in the list comprehension it transforms.</para>
1232
1233 </listitem>
1234
1235
1236 <listitem>
1237
1238 <programlisting>
1239 then group by e using f
1240 </programlisting>
1241
1242 <para>This is the most general of the grouping-type statements. In this form,
1243 f is required to have type <literal>forall a. (a -> t) -> [a] -> [[a]]</literal>.
1244 As with the <literal>then f by e</literal> case above, the first argument
1245 is a function supplied to f by the compiler which lets it compute e on every
1246 element of the list being transformed. However, unlike the non-grouping case,
1247 f additionally partitions the list into a number of sublists: this means that
1248 at every point after this statement, binders occurring before it in the comprehension
1249 refer to <emphasis>lists</emphasis> of possible values, not single values. To help understand
1250 this, let's look at an example:</para>
1251
1252 <programlisting>
-- This works similarly to groupWith in GHC.Exts, but doesn't sort its input first.
-- (groupBy is imported from Data.List, and the from GHC.Exts.)
groupRuns :: Eq b => (a -> b) -> [a] -> [[a]]
groupRuns f = groupBy (\x y -> f x == f y)

output = [ (the x, y)
         | x &lt;- ([1..3] ++ [1..2])
         , y &lt;- [4..6]
         , then group by x using groupRuns ]
1261 </programlisting>
1262
1263 <para>This results in the variable <literal>output</literal> taking on the value below:</para>
1264
1265 <programlisting>
1266 [(1, [4, 5, 6]), (2, [4, 5, 6]), (3, [4, 5, 6]), (1, [4, 5, 6]), (2, [4, 5, 6])]
1267 </programlisting>
1268
1269 <para>Note that we have used the <literal>the</literal> function to change the type
1270 of x from a list to its original numeric type. The variable y, in contrast, is left
1271 unchanged from the list form introduced by the grouping.</para>
1272
1273 </listitem>
1274
1275 <listitem>
1276
1277 <programlisting>
1278 then group using f
1279 </programlisting>
1280
1281 <para>With this form of the group statement, f is required to simply have the type
1282 <literal>forall a. [a] -> [[a]]</literal>, which will be used to group up the
1283 comprehension so far directly. An example of this form is as follows:</para>
1284
1285 <programlisting>
output = [ x
         | y &lt;- [1..5]
         , x &lt;- "hello"
         , then group using inits]
1290 </programlisting>
1291
1292 <para>This will yield a list containing every prefix of the word "hello" written out 5 times:</para>
1293
1294 <programlisting>
1295 ["","h","he","hel","hell","hello","helloh","hellohe","hellohel","hellohell","hellohello","hellohelloh",...]
1296 </programlisting>
1297
1298 </listitem>
1299 </itemizedlist>
1300 </para>
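<para>
The following self-contained module is a sketch that exercises three of the forms
above on small inputs. The names <literal>lastTwo</literal>,
<literal>byLength</literal> and <literal>lengthGroups</literal> are invented for this
example; <literal>groupWith</literal>, <literal>sortWith</literal> and
<literal>the</literal> are the <literal>GHC.Exts</literal> functions mentioned earlier.
<programlisting>
{-# LANGUAGE TransformListComp #-}
module Main where

import GHC.Exts (groupWith, sortWith, the)

-- "then f": each transformer has type forall a. [a] -> [a].
lastTwo :: [Int]
lastTwo = [ x * 10 | x &lt;- [1 .. 5], then reverse, then take 2 ]

-- "then f by e": sortWith is handed a function computing (length w).
byLength :: [String]
byLength = [ w | w &lt;- ["ccc", "a", "bb"], then sortWith by length w ]

-- "then group by e using f": after the group, n and w are both lists.
lengthGroups :: [(Int, [String])]
lengthGroups = [ (the n, w) | w &lt;- ["a", "bb", "c", "dd"], let n = length w
                            , then group by n using groupWith ]

main :: IO ()
main = do
  print lastTwo        -- [50,40]
  print byLength       -- ["a","bb","ccc"]
  print lengthGroups   -- [(1,["a","c"]),(2,["bb","dd"])]
</programlisting>
</para>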
1301 </sect2>
1302
1303 <!-- ===================== MONAD COMPREHENSIONS ===================== -->
1304
1305 <sect2 id="monad-comprehensions">
1306 <title>Monad comprehensions</title>
1307 <indexterm><primary>monad comprehensions</primary></indexterm>
1308
1309 <para>
1310 Monad comprehensions generalise the list comprehension notation,
1311 including parallel comprehensions
1312 (<xref linkend="parallel-list-comprehensions"/>) and
1313 transform comprehensions (<xref linkend="generalised-list-comprehensions"/>)
1314 to work for any monad.
1315 </para>
1316
1317 <para>Monad comprehensions support:</para>
1318
1319 <itemizedlist>
1320 <listitem>
1321 <para>
1322 Bindings:
1323 </para>
1324
1325 <programlisting>
1326 [ x + y | x &lt;- Just 1, y &lt;- Just 2 ]
1327 </programlisting>
1328
1329 <para>
1330 Bindings are translated with the <literal>(&gt;&gt;=)</literal> and
1331 <literal>return</literal> functions to the usual do-notation:
1332 </para>
1333
1334 <programlisting>
do x &lt;- Just 1
   y &lt;- Just 2
   return (x+y)
1338 </programlisting>
1339
1340 </listitem>
1341 <listitem>
1342 <para>
1343 Guards:
1344 </para>
1345
1346 <programlisting>
1347 [ x | x &lt;- [1..10], x &lt;= 5 ]
1348 </programlisting>
1349
1350 <para>
1351 Guards are translated with the <literal>guard</literal> function,
1352 which requires a <literal>MonadPlus</literal> instance:
1353 </para>
1354
1355 <programlisting>
do x &lt;- [1..10]
   guard (x &lt;= 5)
   return x
1359 </programlisting>
1360
1361 </listitem>
1362 <listitem>
1363 <para>
1364 Transform statements (as with <literal>-XTransformListComp</literal>):
1365 </para>
1366
1367 <programlisting>
1368 [ x+y | x &lt;- [1..10], y &lt;- [1..x], then take 2 ]
1369 </programlisting>
1370
1371 <para>
1372 This translates to:
1373 </para>
1374
1375 <programlisting>
do (x,y) &lt;- take 2 (do x &lt;- [1..10]
                       y &lt;- [1..x]
                       return (x,y))
   return (x+y)
1380 </programlisting>
1381
1382 </listitem>
1383 <listitem>
1384 <para>
1385 Group statements (as with <literal>-XTransformListComp</literal>):
1386 </para>
1387
1388 <programlisting>
1389 [ x | x &lt;- [1,1,2,2,3], then group by x using GHC.Exts.groupWith ]
1390 [ x | x &lt;- [1,1,2,2,3], then group using myGroup ]
1391 </programlisting>
1392
1393 </listitem>
1394 <listitem>
1395 <para>
1396 Parallel statements (as with <literal>-XParallelListComp</literal>):
1397 </para>
1398
1399 <programlisting>
[ (x+y) | x &lt;- [1..10]
        | y &lt;- [11..20]
        ]
1403 </programlisting>
1404
1405 <para>
1406 Parallel statements are translated using the
1407 <literal>mzip</literal> function, which requires a
1408 <literal>MonadZip</literal> instance defined in
1409 <ulink url="&libraryBaseLocation;/Control-Monad-Zip.html"><literal>Control.Monad.Zip</literal></ulink>:
1410 </para>
1411
1412 <programlisting>
do (x,y) &lt;- mzip (do x &lt;- [1..10]
                     return x)
                 (do y &lt;- [11..20]
                     return y)
   return (x+y)
1418 </programlisting>
1419
1420 </listitem>
1421 </itemizedlist>
1422
1423 <para>
1424 All these features are enabled by default if the
1425 <literal>MonadComprehensions</literal> extension is enabled. The types
1426 and more detailed examples on how to use comprehensions are explained
in the earlier sections <xref
1428 linkend="generalised-list-comprehensions"/> and <xref
1429 linkend="parallel-list-comprehensions"/>. In general you just have
1430 to replace the type <literal>[a]</literal> with the type
1431 <literal>Monad m => m a</literal> for monad comprehensions.
1432 </para>
1433
1434 <para>
1435 Note: Even though most of these examples are using the list monad,
1436 monad comprehensions work for any monad.
1437 The <literal>base</literal> package offers all necessary instances for
1438 lists, which make <literal>MonadComprehensions</literal> backward
compatible with built-in, transform and parallel list comprehensions.
1440 </para>
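<para>
As a sketch of comprehensions over a monad other than lists, the module below uses
<literal>Maybe</literal>. The names <literal>sumJust</literal> and
<literal>safeDiv</literal> are invented for the example. The guard form works here
because <literal>Maybe</literal> has the instance that <literal>guard</literal>
requires.
<programlisting>
{-# LANGUAGE MonadComprehensions #-}
module Main where

-- Binding form: both generators must succeed.
sumJust :: Maybe Int
sumJust = [ x + y | x &lt;- Just 1, y &lt;- Just 2 ]

-- Guard form: a failed guard yields Nothing.
safeDiv :: Int -> Int -> Maybe Int
safeDiv n d = [ n `div` d | d /= 0 ]

main :: IO ()
main = print (sumJust, safeDiv 10 2, safeDiv 1 0)
-- (Just 3,Just 5,Nothing)
</programlisting>
</para>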
1441 <para> More formally, the desugaring is as follows. We write <literal>D[ e | Q]</literal>
1442 to mean the desugaring of the monad comprehension <literal>[ e | Q]</literal>:
1443 <programlisting>
Expressions: e
Declarations: d
Lists of qualifiers: Q,R,S

-- Basic forms
D[ e | ]            = return e
D[ e | p &lt;- e, Q ]  = e &gt;&gt;= \p -&gt; D[ e | Q ]
D[ e | e, Q ]       = guard e &gt;&gt; D[ e | Q ]
D[ e | let d, Q ]   = let d in D[ e | Q ]

-- Parallel comprehensions (iterate for multiple parallel branches)
D[ e | (Q | R), S ] = mzip D[ Qv | Q ] D[ Rv | R ] &gt;&gt;= \(Qv,Rv) -&gt; D[ e | S ]

-- Transform comprehensions
D[ e | Q then f, R ]                  = f D[ Qv | Q ] &gt;&gt;= \Qv -&gt; D[ e | R ]

D[ e | Q then f by b, R ]             = f (\Qv -&gt; b) D[ Qv | Q ] &gt;&gt;= \Qv -&gt; D[ e | R ]

D[ e | Q then group using f, R ]      = f D[ Qv | Q ] &gt;&gt;= \ys -&gt;
                                        case (fmap selQv1 ys, ..., fmap selQvn ys) of
                                          Qv -&gt; D[ e | R ]

D[ e | Q then group by b using f, R ] = f (\Qv -&gt; b) D[ Qv | Q ] &gt;&gt;= \ys -&gt;
                                        case (fmap selQv1 ys, ..., fmap selQvn ys) of
                                          Qv -&gt; D[ e | R ]

where  Qv is the tuple of variables bound by Q (and used subsequently)
       selQvi is a selector mapping Qv to the ith component of Qv

Operator     Standard binding       Expected type
--------------------------------------------------------------------
return       GHC.Base               t1 -&gt; m t2
(&gt;&gt;=)        GHC.Base               m1 t1 -&gt; (t2 -&gt; m2 t3) -&gt; m3 t3
(&gt;&gt;)         GHC.Base               m1 t1 -&gt; m2 t2 -&gt; m3 t3
guard        Control.Monad          t1 -&gt; m t2
fmap         GHC.Base               forall a b. (a-&gt;b) -&gt; n a -&gt; n b
mzip         Control.Monad.Zip      forall a b. m a -&gt; m b -&gt; m (a,b)
1481 </programlisting>
1482 The comprehension should typecheck when its desugaring would typecheck.
1483 </para>
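<para>
The rules can be compared against hand-written code on small cases. In the sketch
below, all top-level names are invented for the example, and
<literal>ParallelListComp</literal> is switched on alongside
<literal>MonadComprehensions</literal> as a precaution, on the assumption that some
GHC versions may require it for the parallel form.
<programlisting>
{-# LANGUAGE MonadComprehensions, ParallelListComp #-}
module Main where

import Control.Monad (guard)
import Control.Monad.Zip (mzip)

-- A parallel comprehension in the Maybe monad.
sugared :: Maybe (Int, Char)
sugared = [ (x, y) | x &lt;- Just 1 | y &lt;- Just 'a' ]

-- The parallel rule written out by hand with mzip.
desugared :: Maybe (Int, Char)
desugared = mzip (Just 1 >>= \x -> return x) (Just 'a' >>= \y -> return y)
              >>= \(x, y) -> return (x, y)

-- The generator and guard rules in the list monad.
evensSugared, evensDesugared :: [Int]
evensSugared   = [ x | x &lt;- [1 .. 6], even x ]
evensDesugared = [1 .. 6] >>= \x -> guard (even x) >> return x

main :: IO ()
main = print (sugared == desugared, evensSugared == evensDesugared)
-- (True,True)
</programlisting>
</para>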
1484 <para>
1485 Monad comprehensions support rebindable syntax (<xref linkend="rebindable-syntax"/>).
1486 Without rebindable
1487 syntax, the operators from the "standard binding" module are used; with
1488 rebindable syntax, the operators are looked up in the current lexical scope.
1489 For example, parallel comprehensions will be typechecked and desugared
1490 using whatever "<literal>mzip</literal>" is in scope.
1491 </para>
1492 <para>
1493 The rebindable operators must have the "Expected type" given in the
1494 table above. These types are surprisingly general. For example, you can
1495 use a bind operator with the type
1496 <programlisting>
1497 (>>=) :: T x y a -> (a -> T y z b) -> T x z b
1498 </programlisting>
In the case of transform comprehensions, notice that the groups are
parameterised over some arbitrary type <literal>n</literal> (provided it
has an <literal>fmap</literal>), as well as
the comprehension being over an arbitrary monad.
1503 </para>
1504 </sect2>
1505
1506 <!-- ===================== REBINDABLE SYNTAX =================== -->
1507
1508 <sect2 id="rebindable-syntax">
1509 <title>Rebindable syntax and the implicit Prelude import</title>
1510
1511 <para><indexterm><primary>-XNoImplicitPrelude
1512 option</primary></indexterm> GHC normally imports
1513 <filename>Prelude.hi</filename> files for you. If you'd
1514 rather it didn't, then give it a
1515 <option>-XNoImplicitPrelude</option> option. The idea is
1516 that you can then import a Prelude of your own. (But don't
1517 call it <literal>Prelude</literal>; the Haskell module
1518 namespace is flat, and you must not conflict with any
1519 Prelude module.)</para>
1520
1521 <para>Suppose you are importing a Prelude of your own
1522 in order to define your own numeric class
1523 hierarchy. It completely defeats that purpose if the
1524 literal "1" means "<literal>Prelude.fromInteger
1525 1</literal>", which is what the Haskell Report specifies.
1526 So the <option>-XRebindableSyntax</option>
1527 flag causes
1528 the following pieces of built-in syntax to refer to
1529 <emphasis>whatever is in scope</emphasis>, not the Prelude
1530 versions:
1531 <itemizedlist>
1532 <listitem>
1533 <para>An integer literal <literal>368</literal> means
1534 "<literal>fromInteger (368::Integer)</literal>", rather than
1535 "<literal>Prelude.fromInteger (368::Integer)</literal>".
1536 </para> </listitem>
1537
<listitem><para>Fractional literals are handled in just the same way,
1539 except that the translation is
1540 <literal>fromRational (3.68::Rational)</literal>.
1541 </para> </listitem>
1542
1543 <listitem><para>The equality test in an overloaded numeric pattern
1544 uses whatever <literal>(==)</literal> is in scope.
1545 </para> </listitem>
1546
1547 <listitem><para>The subtraction operation, and the
1548 greater-than-or-equal test, in <literal>n+k</literal> patterns
1549 use whatever <literal>(-)</literal> and <literal>(>=)</literal> are in scope.
1550 </para></listitem>
1551
1552 <listitem>
1553 <para>Negation (e.g. "<literal>- (f x)</literal>")
1554 means "<literal>negate (f x)</literal>", both in numeric
1555 patterns, and expressions.
1556 </para></listitem>
1557
1558 <listitem>
1559 <para>Conditionals (e.g. "<literal>if</literal> e1 <literal>then</literal> e2 <literal>else</literal> e3")
mean "<literal>ifThenElse</literal> e1 e2 e3". However, <literal>case</literal> expressions are unaffected.
1561 </para></listitem>
1562
1563 <listitem>
1564 <para>"Do" notation is translated using whatever
1565 functions <literal>(>>=)</literal>,
1566 <literal>(>>)</literal>, and <literal>fail</literal>,
1567 are in scope (not the Prelude
1568 versions). List comprehensions, mdo (<xref linkend="recursive-do-notation"/>), and parallel array
1569 comprehensions, are unaffected. </para></listitem>
1570
1571 <listitem>
1572 <para>Arrow
1573 notation (see <xref linkend="arrow-notation"/>)
1574 uses whatever <literal>arr</literal>,
1575 <literal>(>>>)</literal>, <literal>first</literal>,
1576 <literal>app</literal>, <literal>(|||)</literal> and
1577 <literal>loop</literal> functions are in scope. But unlike the
1578 other constructs, the types of these functions must match the
1579 Prelude types very closely. Details are in flux; if you want
1580 to use this, ask!
1581 </para></listitem>
1582 </itemizedlist>
1583 <option>-XRebindableSyntax</option> implies <option>-XNoImplicitPrelude</option>.
1584 </para>
1585 <para>
1586 In all cases (apart from arrow notation), the static semantics should be that of the desugared form,
1587 even if that is a little unexpected. For example, the
1588 static semantics of the literal <literal>368</literal>
1589 is exactly that of <literal>fromInteger (368::Integer)</literal>; it's fine for
1590 <literal>fromInteger</literal> to have any of the types:
1591 <programlisting>
1592 fromInteger :: Integer -> Integer
1593 fromInteger :: forall a. Foo a => Integer -> a
1594 fromInteger :: Num a => a -> Integer
1595 fromInteger :: Integer -> Bool -> Bool
1596 </programlisting>
1597 </para>
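<para>
As a sketch of rebinding one piece of syntax, the module below supplies its own
<literal>ifThenElse</literal> so that conditionals test an <literal>Int</literal>
rather than a <literal>Bool</literal>. The C-style convention and the name
<literal>describe</literal> are assumptions made for this example. Because
<option>-XRebindableSyntax</option> implies <option>-XNoImplicitPrelude</option>, the
Prelude is imported explicitly, so literals and <literal>do</literal> still use the
Prelude's <literal>fromInteger</literal> and <literal>(>>)</literal>.
<programlisting>
{-# LANGUAGE RebindableSyntax #-}
module Main where

import Prelude

-- Hypothetical C-style conditional: any non-zero Int counts as true.
-- It is written with case, since "if" here would refer to this very function.
ifThenElse :: Int -> a -> a -> a
ifThenElse c t e = case c == 0 of
  True  -> e
  False -> t

describe :: Int -> String
describe n = if n then "non-zero" else "zero"

main :: IO ()
main = do
  putStrLn (describe 3)   -- non-zero
  putStrLn (describe 0)   -- zero
</programlisting>
</para>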
1598
1599 <para>Be warned: this is an experimental facility, with
1600 fewer checks than usual. Use <literal>-dcore-lint</literal>
1601 to typecheck the desugared program. If Core Lint is happy
1602 you should be all right.</para>
1603
1604 </sect2>
1605
1606 <sect2 id="postfix-operators">
1607 <title>Postfix operators</title>
1608
1609 <para>
1610 The <option>-XPostfixOperators</option> flag enables a small
1611 extension to the syntax of left operator sections, which allows you to
1612 define postfix operators. The extension is this: the left section
1613 <programlisting>
1614 (e !)
1615 </programlisting>
1616 is equivalent (from the point of view of both type checking and execution) to the expression
1617 <programlisting>
1618 ((!) e)
1619 </programlisting>
(for any expression <literal>e</literal> and operator <literal>(!)</literal>).
1621 The strict Haskell 98 interpretation is that the section is equivalent to
1622 <programlisting>
1623 (\y -> (!) e y)
1624 </programlisting>
1625 That is, the operator must be a function of two arguments. GHC allows it to
1626 take only one argument, and that in turn allows you to write the function
1627 postfix.
1628 </para>
1629 <para>The extension does not extend to the left-hand side of function
1630 definitions; you must define such a function in prefix form.</para>
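<para>
For instance, here is a small sketch of a postfix factorial operator. The operator
<literal>(!)</literal> is defined here purely for illustration; note that it is
defined in prefix form and used as a left section:
<programlisting>
{-# LANGUAGE PostfixOperators #-}
module Main where

-- A one-argument operator, defined in prefix form.
(!) :: Integer -> Integer
(!) n = product [1 .. n]

main :: IO ()
main = print (5 !)   -- same as ((!) 5)
</programlisting>
Under the strict Haskell 98 reading, <literal>(5 !)</literal> would mean
<literal>\y -> (!) 5 y</literal>, which is ill-typed here because <literal>(!)</literal>
takes only one argument.
</para>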
1631
1632 </sect2>
1633
1634 <sect2 id="tuple-sections">
1635 <title>Tuple sections</title>
1636
1637 <para>
1638 The <option>-XTupleSections</option> flag enables Python-style partially applied
tuple constructors. For example, the following expression
1640 <programlisting>
1641 (, True)
1642 </programlisting>
is an alternative notation for the more unwieldy
1644 <programlisting>
1645 \x -> (x, True)
1646 </programlisting>
1647 You can omit any combination of arguments to the tuple, as in the following
1648 <programlisting>
1649 (, "I", , , "Love", , 1337)
1650 </programlisting>
1651 which translates to
1652 <programlisting>
1653 \a b c d -> (a, "I", b, c, "Love", d, 1337)
1654 </programlisting>
1655 </para>
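<para>
Tuple sections are ordinary functions, so they can be passed to higher-order
functions. A brief sketch, with arbitrary values chosen for illustration:
<programlisting>
{-# LANGUAGE TupleSections #-}
module Main where

main :: IO ()
main = do
  -- (, True) is \x -> (x, True)
  print (map (, True) "ab")
  -- (, "mid", ) is \a b -> (a, "mid", b)
  print (zipWith (, "mid", ) [1, 2 :: Int] "xy")
</programlisting>
</para>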
1656
1657 <para>
1658 If you have <link linkend="unboxed-tuples">unboxed tuples</link> enabled, tuple sections
1659 will also be available for them, like so
1660 <programlisting>
1661 (# , True #)
1662 </programlisting>
1663 Because there is no unboxed unit tuple, the following expression
1664 <programlisting>
1665 (# #)
1666 </programlisting>
1667 continues to stand for the unboxed singleton tuple data constructor.
1668 </para>
1669
1670 </sect2>
1671
1672 <sect2 id="disambiguate-fields">
1673 <title>Record field disambiguation</title>
1674 <para>
1675 In record construction and record pattern matching
1676 it is entirely unambiguous which field is referred to, even if there are two different
1677 data types in scope with a common field name. For example:
1678 <programlisting>
1679 module M where
1680 data S = MkS { x :: Int, y :: Bool }
1681
1682 module Foo where
1683 import M
1684
1685 data T = MkT { x :: Int }
1686
1687 ok1 (MkS { x = n }) = n+1 -- Unambiguous
1688 ok2 n = MkT { x = n+1 } -- Unambiguous
1689
1690 bad1 k = k { x = 3 } -- Ambiguous
1691 bad2 k = x k -- Ambiguous
1692 </programlisting>
1693 Even though there are two <literal>x</literal>'s in scope,
1694 it is clear that the <literal>x</literal> in the pattern in the
1695 definition of <literal>ok1</literal> can only mean the field
1696 <literal>x</literal> from type <literal>S</literal>. Similarly for
1697 the function <literal>ok2</literal>. However, in the record update
1698 in <literal>bad1</literal> and the record selection in <literal>bad2</literal>
1699 it is not clear which of the two types is intended.
1700 </para>
1701 <para>
1702 Haskell 98 regards all four as ambiguous, but with the
1703 <option>-XDisambiguateRecordFields</option> flag, GHC will accept
1704 the former two. The rules are precisely the same as those for instance
1705 declarations in Haskell 98, where the method names on the left-hand side
1706 of the method bindings in an instance declaration refer unambiguously
1707 to the method of that class (provided they are in scope at all), even
1708 if there are other variables in scope with the same name.
1709 This reduces the clutter of qualified names when you import two
1710 records from different modules that use the same field name.
1711 </para>
1712 <para>
1713 Some details:
1714 <itemizedlist>
1715 <listitem><para>
1716 Field disambiguation can be combined with punning (see <xref linkend="record-puns"/>). For example:
1717 <programlisting>
1718 module Foo where
1719 import M
1720 x=True
1721 ok3 (MkS { x }) = x+1 -- Uses both disambiguation and punning
1722 </programlisting>
1723 </para></listitem>
1724
1725 <listitem><para>
1726 With <option>-XDisambiguateRecordFields</option> you can use <emphasis>unqualified</emphasis>
field names even if the corresponding selector is only in scope <emphasis>qualified</emphasis>.
1728 For example, assuming the same module <literal>M</literal> as in our earlier example, this is legal:
1729 <programlisting>
1730 module Foo where
1731 import qualified M -- Note qualified
1732
1733 ok4 (M.MkS { x = n }) = n+1 -- Unambiguous
1734 </programlisting>
1735 Since the constructor <literal>MkS</literal> is only in scope qualified, you must
1736 name it <literal>M.MkS</literal>, but the field <literal>x</literal> does not need
1737 to be qualified even though <literal>M.x</literal> is in scope but <literal>x</literal>
1738 is not. (In effect, it is qualified by the constructor.)
1739 </para></listitem>
1740 </itemizedlist>
1741 </para>
1742
1743 </sect2>
1744
1745 <!-- ===================== Record puns =================== -->
1746
1747 <sect2 id="record-puns">
1748 <title>Record puns
1749 </title>
1750
1751 <para>
1752 Record puns are enabled by the flag <literal>-XNamedFieldPuns</literal>.
1753 </para>
1754
1755 <para>
1756 When using records, it is common to write a pattern that binds a
1757 variable with the same name as a record field, such as:
1758
1759 <programlisting>
1760 data C = C {a :: Int}
1761 f (C {a = a}) = a
1762 </programlisting>
1763 </para>
1764
1765 <para>
1766 Record punning permits the variable name to be elided, so one can simply
1767 write
1768
1769 <programlisting>
1770 f (C {a}) = a
1771 </programlisting>
1772
1773 to mean the same pattern as above. That is, in a record pattern, the
1774 pattern <literal>a</literal> expands into the pattern <literal>a =
1775 a</literal> for the same name <literal>a</literal>.
1776 </para>
1777
1778 <para>
1779 Note that:
1780 <itemizedlist>
1781 <listitem><para>
1782 Record punning can also be used in an expression, writing, for example,
1783 <programlisting>
1784 let a = 1 in C {a}
1785 </programlisting>
1786 instead of
1787 <programlisting>
1788 let a = 1 in C {a = a}
1789 </programlisting>
1790 The expansion is purely syntactic, so the expanded right-hand side
1791 expression refers to the nearest enclosing variable that is spelled the
1792 same as the field name.
1793 </para></listitem>
1794
1795 <listitem><para>
1796 Puns and other patterns can be mixed in the same record:
1797 <programlisting>
1798 data C = C {a :: Int, b :: Int}
1799 f (C {a, b = 4}) = a
1800 </programlisting>
1801 </para></listitem>
1802
1803 <listitem><para>
1804 Puns can be used wherever record patterns occur (e.g. in
1805 <literal>let</literal> bindings or at the top-level).
1806 </para></listitem>
1807
1808 <listitem><para>
1809 A pun on a qualified field name is expanded by stripping off the module qualifier.
1810 For example:
1811 <programlisting>
f (M.C {M.a}) = a
1813 </programlisting>
1814 means
1815 <programlisting>
1816 f (M.C {M.a = a}) = a
1817 </programlisting>
1818 (This is useful if the field selector <literal>a</literal> for constructor <literal>M.C</literal>
1819 is only in scope in qualified form.)
1820 </para></listitem>
1821 </itemizedlist>
1822 </para>
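<para>
Putting these points together, here is a self-contained sketch. The <literal>Point</literal>
type and the function names are made up for illustration:
<programlisting>
{-# LANGUAGE NamedFieldPuns #-}
module Main where

data Point = Point { px :: Int, py :: Int } deriving Show

-- Puns in a pattern: binds the variables px and py.
norm1 :: Point -> Int
norm1 (Point {px, py}) = abs px + abs py

-- Puns in an expression: px and py refer to the let-bound variables.
diagonal :: Int -> Point
diagonal n = let { px = n; py = n } in Point {px, py}

-- A pun mixed with an ordinary field pattern.
onXAxis :: Point -> Maybe Int
onXAxis (Point {px, py = 0}) = Just px
onXAxis _                    = Nothing

main :: IO ()
main = do
  print (norm1 (Point 3 (-4)))
  print (diagonal 2)
  print (onXAxis (Point 5 0))
</programlisting>
</para>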
1823
1824
1825 </sect2>
1826
1827 <!-- ===================== Record wildcards =================== -->
1828
1829 <sect2 id="record-wildcards">
1830 <title>Record wildcards
1831 </title>
1832
1833 <para>
1834 Record wildcards are enabled by the flag <literal>-XRecordWildCards</literal>.
1835 This flag implies <literal>-XDisambiguateRecordFields</literal>.
1836 </para>
1837
1838 <para>
1839 For records with many fields, it can be tiresome to write out each field
1840 individually in a record pattern, as in
1841 <programlisting>
1842 data C = C {a :: Int, b :: Int, c :: Int, d :: Int}
1843 f (C {a = 1, b = b, c = c, d = d}) = b + c + d
1844 </programlisting>
1845 </para>
1846
1847 <para>
1848 Record wildcard syntax permits a "<literal>..</literal>" in a record
1849 pattern, where each elided field <literal>f</literal> is replaced by the
1850 pattern <literal>f = f</literal>. For example, the above pattern can be
1851 written as
1852 <programlisting>
1853 f (C {a = 1, ..}) = b + c + d
1854 </programlisting>
1855 </para>
1856
1857 <para>
1858 More details:
1859 <itemizedlist>
1860 <listitem><para>
1861 Wildcards can be mixed with other patterns, including puns
(<xref linkend="record-puns"/>); for example, in a pattern <literal>(C {a
= 1, b, ..})</literal>. Additionally, record wildcards can be used
1864 wherever record patterns occur, including in <literal>let</literal>
1865 bindings and at the top-level. For example, the top-level binding
1866 <programlisting>
1867 C {a = 1, ..} = e
1868 </programlisting>
1869 defines <literal>b</literal>, <literal>c</literal>, and
1870 <literal>d</literal>.
1871 </para></listitem>
1872
1873 <listitem><para>
1874 Record wildcards can also be used in expressions, writing, for example,
1875 <programlisting>
1876 let {a = 1; b = 2; c = 3; d = 4} in C {..}
1877 </programlisting>
1878 in place of
1879 <programlisting>
1880 let {a = 1; b = 2; c = 3; d = 4} in C {a=a, b=b, c=c, d=d}
1881 </programlisting>
1882 The expansion is purely syntactic, so the record wildcard
1883 expression refers to the nearest enclosing variables that are spelled
1884 the same as the omitted field names.
1885 </para></listitem>
1886
1887 <listitem><para>
1888 The "<literal>..</literal>" expands to the missing
1889 <emphasis>in-scope</emphasis> record fields.
1890 Specifically the expansion of "<literal>C {..}</literal>" includes
1891 <literal>f</literal> if and only if:
1892 <itemizedlist>
1893 <listitem><para>
1894 <literal>f</literal> is a record field of constructor <literal>C</literal>.
1895 </para></listitem>
1896 <listitem><para>
1897 The record field <literal>f</literal> is in scope somehow (either qualified or unqualified).
1898 </para></listitem>
1899 <listitem><para>
1900 In the case of expressions (but not patterns),
1901 the variable <literal>f</literal> is in scope unqualified,
1902 apart from the binding of the record selector itself.
1903 </para></listitem>
1904 </itemizedlist>
1905 For example
1906 <programlisting>
1907 module M where
1908 data R = R { a,b,c :: Int }
1909 module X where
import M( R(R,a,c) )
f a b = R { .. }
1912 </programlisting>
1913 The <literal>R{..}</literal> expands to <literal>R{M.a=a}</literal>,
1914 omitting <literal>b</literal> since the record field is not in scope,
1915 and omitting <literal>c</literal> since the variable <literal>c</literal>
1916 is not in scope (apart from the binding of the
1917 record selector <literal>c</literal>, of course).
1918 </para></listitem>
1919 </itemizedlist>
1920 </para>
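<para>
The following sketch uses a wildcard once in a pattern and once in an expression.
The <literal>Config</literal> record and its field names are assumptions made for the example:
<programlisting>
{-# LANGUAGE RecordWildCards #-}
module Main where

data Config = Config { host :: String, port :: Int, verbose :: Bool }

-- Pattern wildcard: brings host, port and verbose into scope as variables.
describe :: Config -> String
describe Config{..} =
  host ++ ":" ++ show port ++ (if verbose then " (verbose)" else "")

-- Expression wildcard: picks up the let-bound host, port and verbose.
defaultConfig :: Config
defaultConfig =
  let { host = "localhost"; port = 8080; verbose = False } in Config{..}

main :: IO ()
main = putStrLn (describe defaultConfig)
</programlisting>
In <literal>defaultConfig</literal> the wildcard includes all three fields because each field name
is also bound as a local variable by the enclosing <literal>let</literal>.
</para>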
1921
1922 </sect2>
1923
1924 <!-- ===================== Local fixity declarations =================== -->
1925
1926 <sect2 id="local-fixity-declarations">
1927 <title>Local Fixity Declarations
1928 </title>
1929
1930 <para>A careful reading of the Haskell 98 Report reveals that fixity
1931 declarations (<literal>infix</literal>, <literal>infixl</literal>, and
1932 <literal>infixr</literal>) are permitted to appear inside local bindings
such as those introduced by <literal>let</literal> and
1934 <literal>where</literal>. However, the Haskell Report does not specify
1935 the semantics of such bindings very precisely.
1936 </para>
1937
1938 <para>In GHC, a fixity declaration may accompany a local binding:
1939 <programlisting>
1940 let f = ...
1941 infixr 3 `f`
1942 in
1943 ...
1944 </programlisting>
1945 and the fixity declaration applies wherever the binding is in scope.
1946 For example, in a <literal>let</literal>, it applies in the right-hand
1947 sides of other <literal>let</literal>-bindings and the body of the
<literal>let</literal>. Or, in recursive <literal>do</literal>
1949 expressions (<xref linkend="recursive-do-notation"/>), the local fixity
1950 declarations of a <literal>let</literal> statement scope over other
1951 statements in the group, just as the bound name does.
1952 </para>
1953
1954 <para>
Moreover, a local fixity declaration <emphasis>must</emphasis> accompany a local binding of
that name: it is not possible to revise the fixity of a name bound
1957 elsewhere, as in
1958 <programlisting>
1959 let infixr 9 $ in ...
1960 </programlisting>
1961
1962 Because local fixity declarations are technically Haskell 98, no flag is
1963 necessary to enable them.
1964 </para>
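<para>
As a small sketch of how a local fixity declaration affects parsing, consider the following
program. The names <literal>plus</literal> and <literal>times</literal> are invented for the example:
<programlisting>
module Main where

result :: Int
result =
  let plus a b = a + b
      infixl 6 `plus`
      times a b = a * b
      infixl 7 `times`
  in 1 `plus` 2 `times` 3   -- parsed as 1 `plus` (2 `times` 3)

main :: IO ()
main = print result
</programlisting>
Without the two declarations both operators would have the default fixity
(<literal>infixl 9</literal>), and the body would parse as
<literal>(1 `plus` 2) `times` 3</literal> instead.
</para>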
1965 </sect2>
1966
1967 <sect2 id="package-imports">
1968 <title>Package-qualified imports</title>
1969
1970 <para>With the <option>-XPackageImports</option> flag, GHC allows
1971 import declarations to be qualified by the package name that the
1972 module is intended to be imported from. For example:</para>
1973
1974 <programlisting>
1975 import "network" Network.Socket
1976 </programlisting>
1977
1978 <para>would import the module <literal>Network.Socket</literal> from
1979 the package <literal>network</literal> (any version). This may
1980 be used to disambiguate an import when the same module is
1981 available from multiple packages, or is present in both the
1982 current package being built and an external package.</para>
1983
<para>Note: you probably don't need to use this feature; it was
1985 added mainly so that we can build backwards-compatible versions of
1986 packages when APIs change. It can lead to fragile dependencies in
1987 the common case: modules occasionally move from one package to
1988 another, rendering any package-qualified imports broken.</para>
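<para>A complete module using a package-qualified import might look like the sketch
below. It uses the <literal>base</literal> package only because that package
accompanies every GHC installation; any installed package name could be used in its place:</para>

<programlisting>
{-# LANGUAGE PackageImports #-}
module Main where

-- Data.List is taken specifically from the base package.
import "base" Data.List (sort)

main :: IO ()
main = print (sort [3, 1, 2 :: Int])
</programlisting>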
1989 </sect2>
1990
1991 <sect2 id="safe-imports-ext">
1992 <title>Safe imports</title>
1993
1994 <para>With the <option>-XSafe</option>, <option>-XTrustworthy</option>
1995 and <option>-XUnsafe</option> language flags, GHC extends
1996 the import declaration syntax to take an optional <literal>safe</literal>
1997 keyword after the <literal>import</literal> keyword. This feature
1998 is part of the Safe Haskell GHC extension. For example:</para>
1999
2000 <programlisting>
2001 import safe qualified Network.Socket as NS
2002 </programlisting>
2003
<para>would import the module <literal>Network.Socket</literal>,
with compilation succeeding only if <literal>Network.Socket</literal> can be
safely imported. For a description of when an import is
considered safe, see <xref linkend="safe-haskell"/>.</para>
2008
2009 </sect2>
2010
2011 <sect2 id="syntax-stolen">
2012 <title>Summary of stolen syntax</title>
2013
2014 <para>Turning on an option that enables special syntax
2015 <emphasis>might</emphasis> cause working Haskell 98 code to fail
2016 to compile, perhaps because it uses a variable name which has
2017 become a reserved word. This section lists the syntax that is
2018 "stolen" by language extensions.
2019 We use
2020 notation and nonterminal names from the Haskell 98 lexical syntax
2021 (see the Haskell 98 Report).
2022 We only list syntax changes here that might affect
2023 existing working programs (i.e. "stolen" syntax). Many of these
2024 extensions will also enable new context-free syntax, but in all
2025 cases programs written to use the new syntax would not be
2026 compilable without the option enabled.</para>
2027
2028 <para>There are two classes of special
2029 syntax:
2030
2031 <itemizedlist>
2032 <listitem>
2033 <para>New reserved words and symbols: character sequences
2034 which are no longer available for use as identifiers in the
2035 program.</para>
2036 </listitem>
2037 <listitem>
2038 <para>Other special syntax: sequences of characters that have
2039 a different meaning when this particular option is turned
2040 on.</para>
2041 </listitem>
2042 </itemizedlist>
2043
2044 The following syntax is stolen:
2045
2046 <variablelist>
2047 <varlistentry>
2048 <term>
2049 <literal>forall</literal>
2050 <indexterm><primary><literal>forall</literal></primary></indexterm>
2051 </term>
2052 <listitem><para>
2053 Stolen (in types) by: <option>-XExplicitForAll</option>, and hence by
2054 <option>-XScopedTypeVariables</option>,
2055 <option>-XLiberalTypeSynonyms</option>,
2056 <option>-XRank2Types</option>,
2057 <option>-XRankNTypes</option>,
2058 <option>-XPolymorphicComponents</option>,
2059 <option>-XExistentialQuantification</option>
2060 </para></listitem>
2061 </varlistentry>
2062
2063 <varlistentry>
2064 <term>
2065 <literal>mdo</literal>
2066 <indexterm><primary><literal>mdo</literal></primary></indexterm>
2067 </term>
2068 <listitem><para>
2069 Stolen by: <option>-XRecursiveDo</option>
2070 </para></listitem>
2071 </varlistentry>
2072
2073 <varlistentry>
2074 <term>
2075 <literal>foreign</literal>
2076 <indexterm><primary><literal>foreign</literal></primary></indexterm>
2077 </term>
2078 <listitem><para>
2079 Stolen by: <option>-XForeignFunctionInterface</option>
2080 </para></listitem>
2081 </varlistentry>
2082
2083 <varlistentry>
2084 <term>
2085 <literal>rec</literal>,
2086 <literal>proc</literal>, <literal>-&lt;</literal>,
2087 <literal>&gt;-</literal>, <literal>-&lt;&lt;</literal>,
2088 <literal>&gt;&gt;-</literal>, and <literal>(|</literal>,
2089 <literal>|)</literal> brackets
2090 <indexterm><primary><literal>proc</literal></primary></indexterm>
2091 </term>
2092 <listitem><para>
2093 Stolen by: <option>-XArrows</option>
2094 </para></listitem>
2095 </varlistentry>
2096
2097 <varlistentry>
2098 <term>
2099 <literal>?<replaceable>varid</replaceable></literal>,
2100 <literal>%<replaceable>varid</replaceable></literal>
2101 <indexterm><primary>implicit parameters</primary></indexterm>
2102 </term>
2103 <listitem><para>
2104 Stolen by: <option>-XImplicitParams</option>
2105 </para></listitem>
2106 </varlistentry>
2107
2108 <varlistentry>
2109 <term>
2110 <literal>[|</literal>,
2111 <literal>[e|</literal>, <literal>[p|</literal>,
2112 <literal>[d|</literal>, <literal>[t|</literal>,
2113 <literal>$(</literal>,
2114 <literal>$<replaceable>varid</replaceable></literal>
2115 <indexterm><primary>Template Haskell</primary></indexterm>
2116 </term>
2117 <listitem><para>
2118 Stolen by: <option>-XTemplateHaskell</option>
2119 </para></listitem>
2120 </varlistentry>
2121
2122 <varlistentry>
2123 <term>
<literal>[<replaceable>varid</replaceable>|</literal>
2125 <indexterm><primary>quasi-quotation</primary></indexterm>
2126 </term>
2127 <listitem><para>
2128 Stolen by: <option>-XQuasiQuotes</option>
2129 </para></listitem>
2130 </varlistentry>
2131
2132 <varlistentry>
2133 <term>
2134 <replaceable>varid</replaceable>{<literal>&num;</literal>},
2135 <replaceable>char</replaceable><literal>&num;</literal>,
2136 <replaceable>string</replaceable><literal>&num;</literal>,
2137 <replaceable>integer</replaceable><literal>&num;</literal>,
2138 <replaceable>float</replaceable><literal>&num;</literal>,
2139 <replaceable>float</replaceable><literal>&num;&num;</literal>,
2140 <literal>(&num;</literal>, <literal>&num;)</literal>
2141 </term>
2142 <listitem><para>
2143 Stolen by: <option>-XMagicHash</option>
2144 </para></listitem>
2145 </varlistentry>
2146 </variablelist>
2147 </para>
2148 </sect2>
2149 </sect1>
2150
2151
2152 <!-- TYPE SYSTEM EXTENSIONS -->
2153 <sect1 id="data-type-extensions">
2154 <title>Extensions to data types and type synonyms</title>
2155
2156 <sect2 id="nullary-types">
2157 <title>Data types with no constructors</title>
2158
2159 <para>With the <option>-XEmptyDataDecls</option> flag (or equivalent LANGUAGE pragma),
2160 GHC lets you declare a data type with no constructors. For example:</para>
2161
2162 <programlisting>
2163 data S -- S :: *
2164 data T a -- T :: * -> *
2165 </programlisting>
2166
2167 <para>Syntactically, the declaration lacks the "= constrs" part. The
2168 type can be parameterised over types of any kind, but if the kind is
2169 not <literal>*</literal> then an explicit kind annotation must be used
2170 (see <xref linkend="kinding"/>).</para>
2171
2172 <para>Such data types have only one value, namely bottom.
2173 Nevertheless, they can be useful when defining "phantom types".</para>
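<para>As an illustration of the phantom-type idiom, the sketch below tags a distance with
its unit. The types <literal>Metres</literal>, <literal>Feet</literal> and
<literal>Distance</literal> are hypothetical names chosen for the example:</para>

<programlisting>
{-# LANGUAGE EmptyDataDecls #-}
module Main where

-- Constructor-less types, used only as compile-time tags.
data Metres
data Feet

-- The unit parameter is a phantom: it does not occur on the right-hand side.
newtype Distance unit = Distance Double deriving Show

-- Both arguments must carry the same unit tag.
addDistance :: Distance u -> Distance u -> Distance u
addDistance (Distance a) (Distance b) = Distance (a + b)

main :: IO ()
main = print (addDistance (Distance 1.5 :: Distance Metres) (Distance 2.0))
</programlisting>

<para>Applying <literal>addDistance</literal> to a <literal>Distance Metres</literal> and a
<literal>Distance Feet</literal> would be rejected by the type checker, even though
neither tag type has any values other than bottom.</para>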
2174 </sect2>
2175
2176 <sect2 id="datatype-contexts">
2177 <title>Data type contexts</title>
2178
2179 <para>Haskell allows datatypes to be given contexts, e.g.</para>
2180
2181 <programlisting>
2182 data Eq a => Set a = NilSet | ConsSet a (Set a)
2183 </programlisting>
2184
<para>This declaration gives the constructors the following types:</para>
2186
2187 <programlisting>
2188 NilSet :: Set a
2189 ConsSet :: Eq a => a -> Set a -> Set a
2190 </programlisting>
2191
2192 <para>This is widely considered a misfeature, and is going to be removed from
2193 the language. In GHC, it is controlled by the deprecated extension
2194 <literal>DatatypeContexts</literal>.</para>
2195 </sect2>
2196
2197 <sect2 id="infix-tycons">
2198 <title>Infix type constructors, classes, and type variables</title>
2199
2200 <para>
2201 GHC allows type constructors, classes, and type variables to be operators, and
2202 to be written infix, very much like expressions. More specifically:
2203 <itemizedlist>
2204 <listitem><para>
2205 A type constructor or class can be an operator, beginning with a colon; e.g. <literal>:*:</literal>.
2206 The lexical syntax is the same as that for data constructors.
2207 </para></listitem>
2208 <listitem><para>
2209 Data type and type-synonym declarations can be written infix, parenthesised
2210 if you want further arguments. E.g.
2211 <screen>
2212 data a :*: b = Foo a b
2213 type a :+: b = Either a b
2214 class a :=: b where ...
2215
2216 data (a :**: b) x = Baz a b x
2217 type (a :++: b) y = Either (a,b) y
2218 </screen>
2219 </para></listitem>
2220 <listitem><para>
2221 Types, and class constraints, can be written infix. For example
2222 <screen>
2223 x :: Int :*: Bool
2224 f :: (a :=: b) => a -> b
2225 </screen>
2226 </para></listitem>
2227 <listitem><para>
2228 A type variable can be an (unqualified) operator e.g. <literal>+</literal>.
2229 The lexical syntax is the same as that for variable operators, excluding "(.)",
2230 "(!)", and "(*)". In a binding position, the operator must be
2231 parenthesised. For example:
2232 <programlisting>
2233 type T (+) = Int + Int
2234 f :: T Either
2235 f = Left 3
2236
2237 liftA2 :: Arrow (~>)
2238 => (a -> b -> c) -> (e ~> a) -> (e ~> b) -> (e ~> c)
2239 liftA2 = ...
2240 </programlisting>
2241 </para></listitem>
2242 <listitem><para>
2243 Back-quotes work
2244 as for expressions, both for type constructors and type variables; e.g. <literal>Int `Either` Bool</literal>, or
2245 <literal>Int `a` Bool</literal>. Similarly, parentheses work the same; e.g. <literal>(:*:) Int Bool</literal>.
2246 </para></listitem>
2247 <listitem><para>
2248 Fixities may be declared for type constructors, or classes, just as for data constructors. However,
2249 one cannot distinguish between the two in a fixity declaration; a fixity declaration
2250 sets the fixity for a data constructor and the corresponding type constructor. For example:
2251 <screen>
2252 infixl 7 T, :*:
2253 </screen>
2254 sets the fixity for both type constructor <literal>T</literal> and data constructor <literal>T</literal>,
and similarly for <literal>:*:</literal>.
2257 </para></listitem>
2258 <listitem><para>
2259 Function arrow is <literal>infixr</literal> with fixity 0. (This might change; I'm not sure what it should be.)
2260 </para></listitem>
2261
2262 </itemizedlist>
2263 </para>
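<para>
The points above can be combined in a short program. This is a sketch that assumes the
<option>-XTypeOperators</option> flag is what enables operator type constructors; the
functions <literal>firstOf</literal> and <literal>pick</literal> are invented for the example:
<programlisting>
{-# LANGUAGE TypeOperators #-}
module Main where

-- An infix type constructor with an infix data constructor of the same name.
data a :*: b = a :*: b

-- One fixity declaration covers both the type and the data constructor.
infixr 6 :*:

-- An infix type synonym.
type a :+: b = Either a b

-- Int :*: Bool :*: Char parses as Int :*: (Bool :*: Char).
triple :: Int :*: Bool :*: Char
triple = 1 :*: True :*: 'c'

firstOf :: (a :*: b) -> a
firstOf (x :*: _) = x

pick :: Bool -> (Int :+: String)
pick useLeft = if useLeft then Left 0 else Right "none"

main :: IO ()
main = do
  print (firstOf triple)
  print (pick False)
</programlisting>
</para>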
2264 </sect2>
2265
2266 <sect2 id="type-synonyms">
2267 <title>Liberalised type synonyms</title>
2268
2269 <para>
2270 Type synonyms are like macros at the type level, but Haskell 98 imposes many rules
2271 on individual synonym declarations.
2272 With the <option>-XLiberalTypeSynonyms</option> extension,
2273 GHC does validity checking on types <emphasis>only after expanding type synonyms</emphasis>.
2274 That means that GHC can be very much more liberal about type synonyms than Haskell 98.
2275
2276 <itemizedlist>
2277 <listitem> <para>You can write a <literal>forall</literal> (including overloading)
2278 in a type synonym, thus:
2279 <programlisting>
2280 type Discard a = forall b. Show b => a -> b -> (a, String)
2281
2282 f :: Discard a
2283 f x y = (x, show y)
2284
2285 g :: Discard Int -> (Int,String) -- A rank-2 type
2286 g f = f 3 True
2287 </programlisting>
2288 </para>
2289 </listitem>
2290
2291 <listitem><para>
2292 If you also use <option>-XUnboxedTuples</option>,
2293 you can write an unboxed tuple in a type synonym:
2294 <programlisting>
2295 type Pr = (# Int, Int #)
2296
2297 h :: Int -> Pr
2298 h x = (# x, x #)
2299 </programlisting>
2300 </para></listitem>
2301
2302 <listitem><para>
2303 You can apply a type synonym to a forall type:
2304 <programlisting>
2305 type Foo a = a -> a -> Bool
2306
2307 f :: Foo (forall b. b->b)
2308 </programlisting>
2309 After expanding the synonym, <literal>f</literal> has the legal (in GHC) type:
2310 <programlisting>
2311 f :: (forall b. b->b) -> (forall b. b->b) -> Bool
2312 </programlisting>
2313 </para></listitem>
2314
2315 <listitem><para>
2316 You can apply a type synonym to a partially applied type synonym:
2317 <programlisting>
2318 type Generic i o = forall x. i x -> o x
2319 type Id x = x
2320
2321 foo :: Generic Id []
2322 </programlisting>
2323 After expanding the synonym, <literal>foo</literal> has the legal (in GHC) type:
2324 <programlisting>
2325 foo :: forall x. x -> [x]
2326 </programlisting>
2327 </para></listitem>
2328
2329 </itemizedlist>
2330 </para>
2331
2332 <para>
GHC currently does kind checking before expanding synonyms (though even that
could be changed).
2335 </para>
2336 <para>
2337 After expanding type synonyms, GHC does validity checking on types, looking for
2338 the following mal-formedness which isn't detected simply by kind checking:
2339 <itemizedlist>
2340 <listitem><para>
2341 Type constructor applied to a type involving for-alls.
2342 </para></listitem>
2343 <listitem><para>
2344 Unboxed tuple on left of an arrow.
2345 </para></listitem>
2346 <listitem><para>
2347 Partially-applied type synonym.
2348 </para></listitem>
2349 </itemizedlist>
2350 So, for example,
2351 this will be rejected:
2352 <programlisting>
2353 type Pr = (# Int, Int #)
2354
2355 h :: Pr -> Int
2356 h x = ...
2357 </programlisting>
2358 because GHC does not allow unboxed tuples on the left of a function arrow.
2359 </para>
2360 </sect2>
2361
2362
2363 <sect2 id="existential-quantification">
2364 <title>Existentially quantified data constructors
2365 </title>
2366
2367 <para>
2368 The idea of using existential quantification in data type declarations
2369 was suggested by Perry, and implemented in Hope+ (Nigel Perry, <emphasis>The Implementation
2370 of Practical Functional Programming Languages</emphasis>, PhD Thesis, University of
2371 London, 1991). It was later formalised by Laufer and Odersky
2372 (<emphasis>Polymorphic type inference and abstract data types</emphasis>,
2373 TOPLAS, 16(5), pp1411-1430, 1994).
2374 It's been in Lennart
2375 Augustsson's <command>hbc</command> Haskell compiler for several years, and
2376 proved very useful. Here's the idea. Consider the declaration:
2377 </para>
2378
2379 <para>
2380
2381 <programlisting>
2382 data Foo = forall a. MkFoo a (a -> Bool)
2383 | Nil
2384 </programlisting>
2385
2386 </para>
2387
2388 <para>
2389 The data type <literal>Foo</literal> has two constructors with types:
2390 </para>
2391
2392 <para>
2393
2394 <programlisting>
2395 MkFoo :: forall a. a -> (a -> Bool) -> Foo
2396 Nil :: Foo
2397 </programlisting>
2398
2399 </para>
2400
2401 <para>
2402 Notice that the type variable <literal>a</literal> in the type of <function>MkFoo</function>
2403 does not appear in the data type itself, which is plain <literal>Foo</literal>.
2404 For example, the following expression is fine:
2405 </para>
2406
2407 <para>
2408
2409 <programlisting>
2410 [MkFoo 3 even, MkFoo 'c' isUpper] :: [Foo]
2411 </programlisting>
2412
2413 </para>
2414
2415 <para>
2416 Here, <literal>(MkFoo 3 even)</literal> packages an integer with a function
2417 <function>even</function> that maps an integer to <literal>Bool</literal>; and <function>MkFoo 'c'
2418 isUpper</function> packages a character with a compatible function. These
2419 two things are each of type <literal>Foo</literal> and can be put in a list.
2420 </para>
2421
2422 <para>
What can we do with a value of type <literal>Foo</literal>? In particular,
2424 what happens when we pattern-match on <function>MkFoo</function>?
2425 </para>
2426
2427 <para>
2428
2429 <programlisting>
2430 f (MkFoo val fn) = ???
2431 </programlisting>
2432
2433 </para>
2434
2435 <para>
2436 Since all we know about <literal>val</literal> and <function>fn</function> is that they
2437 are compatible, the only (useful) thing we can do with them is to
2438 apply <function>fn</function> to <literal>val</literal> to get a boolean. For example:
2439 </para>
2440
2441 <para>
2442
2443 <programlisting>
2444 f :: Foo -> Bool
2445 f (MkFoo val fn) = fn val
2446 </programlisting>
2447
2448 </para>
2449
2450 <para>
2451 What this allows us to do is to package heterogeneous values
2452 together with a bunch of functions that manipulate them, and then treat
2453 that collection of packages in a uniform manner. You can express
2454 quite a bit of object-oriented-like programming this way.
2455 </para>
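<para>
The fragments above can be assembled into a complete program. In this sketch the function is
named <literal>apply</literal>, and its result for <literal>Nil</literal> is an arbitrary choice
that the text above does not prescribe:
<programlisting>
{-# LANGUAGE ExistentialQuantification #-}
module Main where

import Data.Char (isUpper)

data Foo = forall a. MkFoo a (a -> Bool)
         | Nil

-- The only useful thing to do with the packaged pair is to apply fn to val.
-- Returning False for Nil is an assumption made for this sketch.
apply :: Foo -> Bool
apply (MkFoo val fn) = fn val
apply Nil            = False

main :: IO ()
main = print (map apply [MkFoo (4 :: Int) even, MkFoo 'C' isUpper, Nil])
</programlisting>
</para>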
2456
2457 <sect3 id="existential">
2458 <title>Why existential?
2459 </title>
2460
2461 <para>
2462 What has this to do with <emphasis>existential</emphasis> quantification?
2463 Simply that <function>MkFoo</function> has the (nearly) isomorphic type
2464 </para>
2465
2466 <para>
2467
2468 <programlisting>
2469 MkFoo :: (exists a . (a, a -> Bool)) -> Foo
2470 </programlisting>
2471
2472 </para>
2473
2474 <para>
2475 But Haskell programmers can safely think of the ordinary
2476 <emphasis>universally</emphasis> quantified type given above, thereby avoiding
2477 adding a new existential quantification construct.
2478 </para>
2479
2480 </sect3>
2481
2482 <sect3 id="existential-with-context">
2483 <title>Existentials and type classes</title>
2484
2485 <para>
2486 An easy extension is to allow
2487 arbitrary contexts before the constructor. For example:
2488 </para>
2489
2490 <para>
2491
2492 <programlisting>
2493 data Baz = forall a. Eq a => Baz1 a a
2494 | forall b. Show b => Baz2 b (b -> b)
2495 </programlisting>
2496
2497 </para>
2498
2499 <para>
2500 The two constructors have the types you'd expect:
2501 </para>
2502
2503 <para>
2504
2505 <programlisting>
2506 Baz1 :: forall a. Eq a => a -> a -> Baz
2507 Baz2 :: forall b. Show b => b -> (b -> b) -> Baz
2508 </programlisting>
2509
2510 </para>
2511
2512 <para>
2513 But when pattern matching on <function>Baz1</function> the matched values can be compared
2514 for equality, and when pattern matching on <function>Baz2</function> the first matched
2515 value can be converted to a string (as well as applying the function to it).
2516 So this program is legal:
2517 </para>
2518
2519 <para>
2520
2521 <programlisting>
2522 f :: Baz -> String
2523 f (Baz1 p q) | p == q = "Yes"
2524 | otherwise = "No"
2525 f (Baz2 v fn) = show (fn v)
2526 </programlisting>
2527
2528 </para>
2529
2530 <para>
2531 Operationally, in a dictionary-passing implementation, the
2532 constructors <function>Baz1</function> and <function>Baz2</function> must store the
2533 dictionaries for <literal>Eq</literal> and <literal>Show</literal> respectively, and
extract them on pattern matching.
2535 </para>
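<para>
For completeness, here is the same example as a runnable sketch. The values passed to the
constructors in <literal>main</literal> are arbitrary:
<programlisting>
{-# LANGUAGE ExistentialQuantification #-}
module Main where

data Baz = forall a. Eq a => Baz1 a a
         | forall b. Show b => Baz2 b (b -> b)

f :: Baz -> String
f (Baz1 p q) | p == q    = "Yes"
             | otherwise = "No"
f (Baz2 v fn)            = show (fn v)

main :: IO ()
main = mapM_ (putStrLn . f)
  [ Baz1 'x' 'x'            -- packaged with the Eq Char instance
  , Baz1 True False         -- packaged with the Eq Bool instance
  , Baz2 (3 :: Int) (+ 1)   -- packaged with the Show Int instance
  ]
</programlisting>
Each element of the list hides a different type, yet all three have type
<literal>Baz</literal> and can be processed by the same function.
</para>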
2536
2537 </sect3>
2538
2539 <sect3 id="existential-records">
2540 <title>Record Constructors</title>
2541
2542 <para>
GHC allows existentials to be used with record syntax as well. For example:
2544
2545 <programlisting>
2546 data Counter a = forall self. NewCounter
2547 { _this :: self
2548 , _inc :: self -> self
2549 , _display :: self -> IO ()
2550 , tag :: a
2551 }
2552 </programlisting>
2553 Here <literal>tag</literal> is a public field, with a well-typed selector
2554 function <literal>tag :: Counter a -> a</literal>. The <literal>self</literal>
2555 type is hidden from the outside; any attempt to apply <literal>_this</literal>,
2556 <literal>_inc</literal> or <literal>_display</literal> as functions will raise a
2557 compile-time error. In other words, <emphasis>GHC defines a record selector function
2558 only for fields whose type does not mention the existentially-quantified variables</emphasis>.
2559 (This example used an underscore in the fields for which record selectors
2560 will not be defined, but that is only programming style; GHC ignores them.)
2561 </para>
2562
2563 <para>
2564 To make use of these hidden fields, we need to create some helper functions:
2565
2566 <programlisting>
2567 inc :: Counter a -> Counter a
2568 inc (NewCounter x i d t) = NewCounter
2569 { _this = i x, _inc = i, _display = d, tag = t }
2570
2571 display :: Counter a -> IO ()
2572 display NewCounter{ _this = x, _display = d } = d x
2573 </programlisting>
2574
2575 Now we can define counters with different underlying implementations:
2576
2577 <programlisting>
2578 counterA :: Counter String
2579 counterA = NewCounter
2580 { _this = 0, _inc = (1+), _display = print, tag = "A" }
2581
2582 counterB :: Counter String
2583 counterB = NewCounter
2584 { _this = "", _inc = ('#':), _display = putStrLn, tag = "B" }
2585
2586 main = do
2587 display (inc counterA) -- prints "1"
2588 display (inc (inc counterB)) -- prints "##"
2589 </programlisting>
2590
2591 Record update syntax is supported for existentials (and GADTs):
2592 <programlisting>
2593 setTag :: Counter a -> a -> Counter a
2594 setTag obj t = obj{ tag = t }
2595 </programlisting>
2596 The rule for record update is this: <emphasis>
2597 the types of the updated fields may
2598 mention only the universally-quantified type variables
2599 of the data constructor. For GADTs, the field may mention only types
2600 that appear as a simple type-variable argument in the constructor's result
2601 type</emphasis>. For example:
2602 <programlisting>
2603 data T a b where { T1 { f1::a, f2::b, f3::(b,c) } :: T a b } -- c is existential
2604 upd1 t x = t { f1=x } -- OK: upd1 :: T a b -> a' -> T a' b
2605 upd2 t x = t { f3=x } -- BAD (f3's type mentions c, which is
2606 -- existentially quantified)
2607
2608 data G a b where { G1 { g1::a, g2::c } :: G a [c] }
2609 upd3 g x = g { g1=x } -- OK: upd3 :: G a b -> c -> G c b
2610 upd4 g x = g { g2=x } -- BAD (g2's type mentions c, which is not a simple
2611 -- type-variable argument in G1's result type)
2612 </programlisting>
2613 </para>
2614
2615 </sect3>
2616
2617
2618 <sect3>
2619 <title>Restrictions</title>
2620
2621 <para>
2622 There are several restrictions on the ways in which existentially-quantified
2623 constructors can be used.
2624 </para>
2625
2626 <para>
2627
2628 <itemizedlist>
2629 <listitem>
2630
2631 <para>
2632 When pattern matching, each pattern match introduces a new,
2633 distinct, type for each existential type variable. These types cannot
2634 be unified with any other type, nor can they escape from the scope of
2635 the pattern match. For example, these fragments are incorrect:
2636
2637
2638 <programlisting>
2639 f1 (MkFoo a f) = a
2640 </programlisting>
2641
2642
2643 Here, the type bound by <function>MkFoo</function> "escapes", because <literal>a</literal>
2644 is the result of <function>f1</function>. One way to see why this is wrong is to
2645 ask what type <function>f1</function> has:
2646
2647
2648 <programlisting>
2649 f1 :: Foo -> a -- Weird!
2650 </programlisting>
2651
2652
2653 What is this "<literal>a</literal>" in the result type? Clearly we don't mean
2654 this:
2655
2656
2657 <programlisting>
2658 f1 :: forall a. Foo -> a -- Wrong!
2659 </programlisting>
2660
2661
2662 The original program is just plain wrong. Here's another sort of error:
2663
2664
2665 <programlisting>
2666 f2 (Baz1 a b) (Baz1 p q) = a==q
2667 </programlisting>
2668
2669
2670 It's ok to say <literal>a==b</literal> or <literal>p==q</literal>, but
2671 <literal>a==q</literal> is wrong because it equates the two distinct types arising
2672 from the two <function>Baz1</function> constructors.
2673
2674
2675 </para>
2676 </listitem>
2677 <listitem>
2678
2679 <para>
2680 You can't pattern-match on an existentially quantified
2681 constructor in a <literal>let</literal> or <literal>where</literal> group of
2682 bindings. So this is illegal:
2683
2684
2685 <programlisting>
2686 f3 x = a==b where { Baz1 a b = x }
2687 </programlisting>
2688
2689 Instead, use a <literal>case</literal> expression:
2690
2691 <programlisting>
2692 f3 x = case x of Baz1 a b -> a==b
2693 </programlisting>
2694
2695 In general, you can only pattern-match
2696 on an existentially-quantified constructor in a <literal>case</literal> expression or
2697 in the patterns of a function definition.
2698
2699 The reason for this restriction is really an implementation one.
2700 Type-checking binding groups is already a nightmare without
2701 existentials complicating the picture. Also an existential pattern
2702 binding at the top level of a module doesn't make sense, because it's
2703 not clear how to prevent the existentially-quantified type "escaping".
2704 So for now, there's a simple-to-state restriction. We'll see how
2705 annoying it is.
2706
2707 </para>
2708 </listitem>
2709 <listitem>
2710
2711 <para>
2712 You can't use existential quantification for <literal>newtype</literal>
2713 declarations. So this is illegal:
2714
2715
2716 <programlisting>
2717 newtype T = forall a. Ord a => MkT a
2718 </programlisting>
2719
2720
2721 Reason: a value of type <literal>T</literal> must be represented as a
2722 pair of a dictionary for <literal>Ord a</literal> and a value of type
2723 <literal>a</literal>. That contradicts the idea that
2724 <literal>newtype</literal> should have no concrete representation.
2725 You can get just the same efficiency and effect by using
2726 <literal>data</literal> instead of <literal>newtype</literal>. If
2727 there is no overloading involved, then there is more of a case for
2728 allowing an existentially-quantified <literal>newtype</literal>,
2729 because the <literal>data</literal> version does carry an
2730 implementation cost, but single-field existentially quantified
2731 constructors aren't much use. So the simple restriction (no
2732 existential stuff on <literal>newtype</literal>) stands, unless there
2733 are convincing reasons to change it.
2734
2735
2736 </para>
2737 </listitem>
2738 <listitem>
2739
2740 <para>
2741 You can't use <literal>deriving</literal> to define instances of a
2742 data type with existentially quantified data constructors.
2743
2744 Reason: in most cases it would not make sense. For example:
2745
2746 <programlisting>
2747 data T = forall a. MkT [a] deriving( Eq )
2748 </programlisting>
2749
2750 To derive <literal>Eq</literal> in the standard way we would need to have equality
2751 between the single component of two <function>MkT</function> constructors:
2752
2753 <programlisting>
2754 instance Eq T where
2755 (MkT a) == (MkT b) = ???
2756 </programlisting>
2757
2758 But <varname>a</varname> and <varname>b</varname> have distinct types, and so can't be compared.
2759 It's just about possible to imagine examples in which the derived instance
2760 would make sense, but it seems altogether simpler simply to prohibit such
2761 declarations. Define your own instances!
2762 </para>
2763 </listitem>
2764
2765 </itemizedlist>
2766
2767 </para>
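<para>
The restrictions on pattern bindings and on <literal>deriving</literal> can be
worked around as in the following sketch, which repeats the <literal>Baz</literal>
type from <xref linkend="existential-with-context"/>. The function
<literal>same</literal> and the particular <literal>Show</literal> instance are
illustrative choices, not something GHC generates:
<programlisting>
{-# LANGUAGE ExistentialQuantification #-}

data Baz = forall a. Eq a => Baz1 a a
         | forall b. Show b => Baz2 b (b -> b)

-- Use case (or function-definition patterns) rather than a let/where
-- pattern binding to open the existential.
same :: Baz -> Bool
same x = case x of
  Baz1 a b -> a == b     -- a and b share one hidden type, so (==) is fine
  Baz2 _ _ -> False

-- A deriving clause is rejected for Baz, so the instance is written by hand.
-- Components that cannot be shown are printed as "_".
instance Show Baz where
  show (Baz1 _ _) = "Baz1 _ _"
  show (Baz2 v _) = "Baz2 " ++ show v ++ " _"
</programlisting>
With these definitions, <literal>same (Baz1 'x' 'x')</literal> is <literal>True</literal>
and <literal>show (Baz2 (3 :: Int) (+ 1))</literal> is <literal>"Baz2 3 _"</literal>.
</para>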
2768
2769 </sect3>
2770 </sect2>
2771
2772 <!-- ====================== Generalised algebraic data types ======================= -->
2773
2774 <sect2 id="gadt-style">
2775 <title>Declaring data types with explicit constructor signatures</title>
2776
2777 <para>When the <literal>GADTSyntax</literal> extension is enabled,
2778 GHC allows you to declare an algebraic data type by
2779 giving the type signatures of constructors explicitly. For example:
2780 <programlisting>
2781 data Maybe a where
2782 Nothing :: Maybe a
2783 Just :: a -> Maybe a
2784 </programlisting>
2785 The form is called a "GADT-style declaration"
2786 because Generalised Algebraic Data Types, described in <xref linkend="gadt"/>,
2787 can only be declared using this form.</para>
2788 <para>Notice that GADT-style syntax generalises existential types (<xref linkend="existential-quantification"/>).
2789 For example, these two declarations are equivalent:
2790 <programlisting>
2791 data Foo = forall a. MkFoo a (a -> Bool)
2792 data Foo' where { MkFoo' :: a -> (a->Bool) -> Foo' }
2793 </programlisting>
2794 </para>
2795 <para>Any data type that can be declared in standard Haskell-98 syntax
2796 can also be declared using GADT-style syntax.
2797 The choice is largely stylistic, but GADT-style declarations differ in one important respect:
2798 they treat class constraints on the data constructors differently.
2799 Specifically, if the constructor is given a type-class context, that
2800 context is made available by pattern matching. For example:
2801 <programlisting>
2802 data Set a where
2803 MkSet :: Eq a => [a] -> Set a
2804
2805 makeSet :: Eq a => [a] -> Set a
2806 makeSet xs = MkSet (nub xs)
2807
2808 insert :: a -> Set a -> Set a
2809 insert a (MkSet as) | a `elem` as = MkSet as
2810 | otherwise = MkSet (a:as)
2811 </programlisting>
2812 A use of <literal>MkSet</literal> as a constructor (e.g. in the definition of <literal>makeSet</literal>)
2813 gives rise to a <literal>(Eq a)</literal>
2814 constraint, as you would expect. The new feature is that pattern-matching on <literal>MkSet</literal>
2815 (as in the definition of <literal>insert</literal>) makes <emphasis>available</emphasis> an <literal>(Eq a)</literal>
2816 context. In implementation terms, the <literal>MkSet</literal> constructor has a hidden field that stores
2817 the <literal>(Eq a)</literal> dictionary that is passed to <literal>MkSet</literal>; so
2818 when pattern-matching that dictionary becomes available for the right-hand side of the match.
2819 In the example, the equality dictionary is used to satisfy the equality constraint
2820 generated by the call to <literal>elem</literal>, so that the type of
2821 <literal>insert</literal> itself has no <literal>Eq</literal> constraint.
2822 </para>
2823 <para>
2824 For example, one possible application is to reify dictionaries:
2825 <programlisting>
2826 data NumInst a where
2827 MkNumInst :: Num a => NumInst a
2828
2829 intInst :: NumInst Int
2830 intInst = MkNumInst
2831
2832 plus :: NumInst a -> a -> a -> a
2833 plus MkNumInst p q = p + q
2834 </programlisting>
2835 Here, a value of type <literal>NumInst a</literal> is equivalent
2836 to an explicit <literal>(Num a)</literal> dictionary.
2837 </para>
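<para>
As a small usage sketch, assuming the <literal>NumInst</literal> declaration above,
the reified dictionary can be passed around like any other value. The names
<literal>doubleInst</literal> and <literal>sumWith</literal> are made up for this example:
<programlisting>
{-# LANGUAGE GADTs #-}

data NumInst a where
  MkNumInst :: Num a => NumInst a

-- A hypothetical dictionary value for Double.
doubleInst :: NumInst Double
doubleInst = MkNumInst

-- No Num constraint in the signature: matching on MkNumInst supplies it.
sumWith :: NumInst a -> [a] -> a
sumWith MkNumInst xs = foldr (+) 0 xs
</programlisting>
For instance, <literal>sumWith doubleInst [1.5, 2.5]</literal> evaluates to
<literal>4.0</literal>, even though the type of <literal>sumWith</literal> mentions no
<literal>Num</literal> constraint.
</para>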
2838 <para>
2839 All this applies to constructors declared using the syntax of <xref linkend="existential-with-context"/>.
2840 For example, the <literal>NumInst</literal> data type above could equivalently be declared
2841 like this:
2842 <programlisting>
2843 data NumInst a
2844 = Num a => MkNumInst
2845 </programlisting>
2846 Notice that, unlike the situation when declaring an existential, there is
2847 no <literal>forall</literal>, because the <literal>Num</literal> constrains the
2848 data type's universally quantified type variable <literal>a</literal>.
2849 A constructor may have both universal and existential type variables: for example,
2850 the following two declarations are equivalent:
2851 <programlisting>
2852 data T1 a
2853 = forall b. (Num a, Eq b) => MkT1 a b
2854 data T2 a where
2855 MkT2 :: (Num a, Eq b) => a -> b -> T2 a
2856 </programlisting>
2857 </para>
2858 <para>All this behaviour contrasts with Haskell 98's peculiar treatment of
2859 contexts on a data type declaration (Section 4.2.1 of the Haskell 98 Report).
2860 In Haskell 98 the definition
2861 <programlisting>
2862 data Eq a => Set' a = MkSet' [a]
2863 </programlisting>
2864 gives <literal>MkSet'</literal> the same type as <literal>MkSet</literal> above. But instead of
2865 <emphasis>making available</emphasis> an <literal>(Eq a)</literal> constraint, pattern-matching
2866 on <literal>MkSet'</literal> <emphasis>requires</emphasis> an <literal>(Eq a)</literal> constraint!
2867 GHC faithfully implements this behaviour, odd though it is. But for GADT-style declarations,
2868 GHC's behaviour is much more useful, as well as much more intuitive.
2869 </para>
2870
2871 <para>
2872 The rest of this section gives further details about GADT-style data
2873 type declarations.
2874
2875 <itemizedlist>
2876 <listitem><para>
2877 The result type of each data constructor must begin with the type constructor being defined.
2878 If the result type of all constructors
2879 has the form <literal>T a1 ... an</literal>, where <literal>a1 ... an</literal>
2880 are distinct type variables, then the data type is <emphasis>ordinary</emphasis>;
2881 otherwise it is a <emphasis>generalised</emphasis> data type (<xref linkend="gadt"/>).
2882 </para></listitem>
2883
2884 <listitem><para>
2885 As with other type signatures, you can give a single signature for several data constructors.
2886 In this example we give a single signature for <literal>T1</literal> and <literal>T2</literal>:
2887 <programlisting>
2888 data T a where
2889 T1,T2 :: a -> T a
2890 T3 :: T a
2891 </programlisting>
2892 </para></listitem>
2893
2894 <listitem><para>
2895 The type signature of
2896 each constructor is independent, and is implicitly universally quantified as usual.
2897 In particular, the type variable(s) in the "<literal>data T a where</literal>" header
2898 have no scope, and different constructors may have different universally-quantified type variables:
2899 <programlisting>
2900 data T a where -- The 'a' has no scope
2901 T1,T2 :: b -> T b -- Means forall b. b -> T b
2902 T3 :: T a -- Means forall a. T a
2903 </programlisting>
2904 </para></listitem>
2905
2906 <listitem><para>
2907 A constructor signature may mention type class constraints, which can differ for
2908 different constructors. For example, this is fine:
2909 <programlisting>
2910 data T a where
2911 T1 :: Eq b => b -> b -> T b
2912 T2 :: (Show c, Ix c) => c -> [c] -> T c
2913 </programlisting>
2914 When pattern matching, these constraints are made available to discharge constraints
2915 in the body of the match. For example:
2916 <programlisting>
2917 f :: T a -> String
2918 f (T1 x y) | x==y = "yes"
2919 | otherwise = "no"
2920 f (T2 a b) = show a
2921 </programlisting>
2922 Note that <literal>f</literal> is not overloaded; the <literal>Eq</literal> constraint arising
2923 from the use of <literal>==</literal> is discharged by the pattern match on <literal>T1</literal>
2924 and similarly the <literal>Show</literal> constraint arising from the use of <literal>show</literal>.
2925 </para></listitem>
2926
2927 <listitem><para>
2928 Unlike a Haskell-98-style
2929 data type declaration, the type variable(s) in the "<literal>data Set a where</literal>" header
2930 have no scope. Indeed, one can write a kind signature instead:
2931 <programlisting>
2932 data Set :: * -> * where ...
2933 </programlisting>
2934 or even a mixture of the two:
2935 <programlisting>
2936 data Bar a :: (* -> *) -> * where ...
2937 </programlisting>
2938 The type variables (if given) may be explicitly kinded, so we could also write the header for <literal>Bar</literal>
2939 like this:
2940 <programlisting>
2941 data Bar a (b :: * -> *) where ...
2942 </programlisting>
2943 </para></listitem>
2944
2945
2946 <listitem><para>
2947 You can use strictness annotations, in the obvious places
2948 in the constructor type:
2949 <programlisting>
2950 data Term a where
2951 Lit :: !Int -> Term Int
2952 If :: Term Bool -> !(Term a) -> !(Term a) -> Term a
2953 Pair :: Term a -> Term b -> Term (a,b)
2954 </programlisting>
2955 </para></listitem>
2956
2957 <listitem><para>
2958 You can use a <literal>deriving</literal> clause on a GADT-style data type
2959 declaration. For example, these two declarations are equivalent:
2960 <programlisting>
2961 data Maybe1 a where {
2962 Nothing1 :: Maybe1 a ;
2963 Just1 :: a -> Maybe1 a
2964 } deriving( Eq, Ord )
2965
2966 data Maybe2 a = Nothing2 | Just2 a
2967 deriving( Eq, Ord )
2968 </programlisting>
2969 </para></listitem>
2970
2971 <listitem><para>
2972 The type signature may have quantified type variables that do not appear
2973 in the result type:
2974 <programlisting>
2975 data Foo where
2976 MkFoo :: a -> (a->Bool) -> Foo
2977 Nil :: Foo
2978 </programlisting>
2979 Here the type variable <literal>a</literal> does not appear in the result type
2980 of either constructor.
2981 Although it is universally quantified in the type of the constructor, such
2982 a type variable is often called "existential".
2983 Indeed, the above declaration declares precisely the same type as
2984 the <literal>data Foo</literal> in <xref linkend="existential-quantification"/>.
2985 </para><para>
2986 The type may contain a class context too, of course:
2987 <programlisting>
2988 data Showable where
2989 MkShowable :: Show a => a -> Showable
2990 </programlisting>
2991 </para></listitem>
2992
2993 <listitem><para>
2994 You can use record syntax on a GADT-style data type declaration:
2995
2996 <programlisting>
2997 data Person where
2998 Adult :: { name :: String, children :: [Person] } -> Person
2999 Child :: Show a => { name :: !String, funny :: a } -> Person
3000 </programlisting>
3001 As usual, for every constructor that has a field <literal>f</literal>, the type of
3002 field <literal>f</literal> must be the same (modulo alpha conversion).
3003 The <literal>Child</literal> constructor above shows that the signature
3004 may have a context, existentially-quantified variables, and strictness annotations,
3005 just as in the non-record case. (NB: the "type" that follows the double-colon
3006 is not really a type, because of the record syntax and strictness annotations.
3007 A "type" of this form can appear only in a constructor signature.)
3008 </para></listitem>
3009
3010 <listitem><para>
3011 Record updates are allowed with GADT-style declarations,
3012 but only for fields that have the following property: the type of the field
3013 mentions no existential type variables.
3014 </para></listitem>
3015
3016 <listitem><para>
3017 As in the case of existentials declared using the Haskell-98-like record syntax
3018 (<xref linkend="existential-records"/>),
3019 record-selector functions are generated only for those fields that have well-typed
3020 selectors.
3021 Here is the example of that section, in GADT-style syntax:
3022 <programlisting>
3023 data Counter a where
3024 NewCounter { _this :: self
3025 , _inc :: self -> self
3026 , _display :: self -> IO ()
3027 , tag :: a
3028 }
3029 :: Counter a
3030 </programlisting>
3031 As before, only one selector function is generated here, that for <literal>tag</literal>.
3032 Nevertheless, you can still use all the field names in pattern matching and record construction.
3033 </para></listitem>
3034
3035 <listitem><para>
3036 In a GADT-style data type declaration there is no obvious way to specify that a data constructor
3037 should be infix, which makes a difference if you derive <literal>Show</literal> for the type.
3038 (Data constructors declared infix are displayed infix by the derived <literal>show</literal>.)
3039 So GHC implements the following design: a data constructor declared in a GADT-style data type
3040 declaration is displayed infix by <literal>Show</literal> iff (a) it is an operator symbol,
3041 (b) it has two arguments, (c) it has a programmer-supplied fixity declaration. For example:
3042 <programlisting>
3043 infix 6 :--:
3044 data T a where
3045 (:--:) :: Int -> Bool -> T Int
3046 </programlisting>
3047 </para></listitem>
3048 </itemizedlist></para>
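<para>
Several of the points in the list above can be combined in one declaration.
The following is only a sketch: the type <literal>Shape</literal>, its constructors,
and the functions <literal>area</literal> and <literal>label</literal> are hypothetical
names chosen for illustration.
<programlisting>
{-# LANGUAGE GADTs #-}

data Shape where
  -- One signature shared by two constructors.
  Circle, Square :: Double -> Shape
  -- 'a' does not appear in the result type, so it is existential;
  -- the Show context is stored in the constructor.
  Labelled       :: Show a => a -> Shape -> Shape

area :: Shape -> Double
area (Circle r)     = pi * r * r
area (Square s)     = s * s
area (Labelled _ s) = area s

-- Matching on Labelled makes the Show constraint available,
-- so label itself needs no class context.
label :: Shape -> String
label (Labelled l _) = show l
label _              = "unlabelled"
</programlisting>
Here <literal>area (Square 2)</literal> is <literal>4.0</literal>,
<literal>label (Labelled (7 :: Int) (Square 1))</literal> is <literal>"7"</literal>, and
<literal>label (Circle 1)</literal> is <literal>"unlabelled"</literal>.
</para>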
3049 </sect2>
3050
3051 <sect2 id="gadt">
3052 <title>Generalised Algebraic Data Types (GADTs)</title>
3053
3054 <para>Generalised Algebraic Data Types generalise ordinary algebraic data types
3055 by allowing constructors to have richer return types. Here is an example:
3056 <programlisting>
3057 data Term a where
3058 Lit :: Int -> Term Int
3059 Succ :: Term Int -> Term Int
3060 IsZero :: Term Int -> Term Bool
3061 If :: Term Bool -> Term a -> Term a -> Term a
3062 Pair :: Term a -> Term b -> Term (a,b)
3063 </programlisting>
3064 Notice that the return type of the constructors is not always <literal>Term a</literal>, as is the
3065 case with ordinary data types. This generality allows us to
3066 write a well-typed <literal>eval</literal> function
3067 for these <literal>Terms</literal>:
3068 <programlisting>
3069 eval :: Term a -> a
3070 eval (Lit i) = i
3071 eval (Succ t) = 1 + eval t
3072 eval (IsZero t) = eval t == 0
3073 eval (If b e1 e2) = if eval b then eval e1 else eval e2
3074 eval (Pair e1 e2) = (eval e1, eval e2)
3075 </programlisting>
3076 The key point about GADTs is that <emphasis>pattern matching causes type refinement</emphasis>.
3077 For example, in the right hand side of the equation
3078 <programlisting>
3079 eval :: Term a -> a
3080 eval (Lit i) = ...
3081 </programlisting>
3082 the type <literal>a</literal> is refined to <literal>Int</literal>. That's the whole point!
3083 A precise specification of the type rules is beyond what this user manual aspires to,
3084 but the design closely follows that described in
3085 the paper <ulink
3086 url="http://research.microsoft.com/%7Esimonpj/papers/gadt/">Simple
3087 unification-based type inference for GADTs</ulink>,
3088 (ICFP 2006).
3089 The general principle is this: <emphasis>type refinement is only carried out
3090 based on user-supplied type annotations</emphasis>.
3091 So if no type signature is supplied for <literal>eval</literal>, no type refinement happens,
3092 and lots of obscure error messages will
3093 occur. However, the refinement is quite general. For example, if we had:
3094 <programlisting>
3095 eval :: Term a -> a -> a
3096 eval (Lit i) j = i+j
3097 </programlisting>
3098 the pattern match causes the type <literal>a</literal> to be refined to <literal>Int</literal> (because of the type
3099 of the constructor <literal>Lit</literal>), and that refinement also applies to the type of <literal>j</literal>, and
3100 the result type of <literal>eval</literal>. Hence the addition <literal>i+j</literal> is legal.
3101 </para>
3102 <para>
3103 These and many other examples are given in papers by Hongwei Xi and
3104 Tim Sheard. There is a longer introduction
3105 <ulink url="http://www.haskell.org/haskellwiki/GADT">on the wiki</ulink>,
3106 and Ralf Hinze's
3107 <ulink url="http://www.informatik.uni-bonn.de/~ralf/publications/With.pdf">Fun with phantom types</ulink> also has a number of examples. Note that papers
3108 may use different notation to that implemented in GHC.
3109 </para>
3110 <para>
3111 The rest of this section outlines the extensions to GHC that support GADTs. The extension is enabled with
3112 <option>-XGADTs</option>. The <option>-XGADTs</option> flag also sets <option>-XRelaxedPolyRec</option>.
3113 <itemizedlist>
3114 <listitem><para>
3115 A GADT can only be declared using GADT-style syntax (<xref linkend="gadt-style"/>);
3116 the old Haskell-98 syntax for data declarations always declares an ordinary data type.
3117 The result type of each constructor must begin with the type constructor being defined,
3118 but for a GADT the arguments to the type constructor can be arbitrary monotypes.
3119 For example, in the <literal>Term</literal> data
3120 type above, the type of each constructor must end with <literal>Term ty</literal>, but
3121 the <literal>ty</literal> need not be a type variable (e.g. the <literal>Lit</literal>
3122 constructor).
3123 </para></listitem>
3124
3125 <listitem><para>
3126 It is permitted to declare an ordinary algebraic data type using GADT-style syntax.
3127 What makes a GADT into a GADT is not the syntax, but rather the presence of data constructors
3128 whose result type is not just <literal>T a b</literal>.
3129 </para></listitem>
3130
3131 <listitem><para>
3132 You cannot use a <literal>deriving</literal> clause for a GADT; only for
3133 an ordinary data type.
3134 </para></listitem>
3135
3136 <listitem><para>
3137 As mentioned in <xref linkend="gadt-style"/>, record syntax is supported.
3138 For example:
3139 <programlisting>
3140 data Term a where
3141 Lit { val :: Int } :: Term Int
3142 Succ { num :: Term Int } :: Term Int
3143 Pred { num :: Term Int } :: Term Int
3144 IsZero { arg :: Term Int } :: Term Bool
3145 Pair { arg1 :: Term a
3146 , arg2 :: Term b
3147 } :: Term (a,b)
3148 If { cnd :: Term Bool
3149 , tru :: Term a
3150 , fls :: Term a
3151 } :: Term a
3152 </programlisting>
3153 However, for GADTs there is the following additional constraint:
3154 every constructor that has a field <literal>f</literal> must have
3155 the same result type (modulo alpha conversion).
3156 Hence, in the above example, we cannot merge the <literal>num</literal>
3157 and <literal>arg</literal> fields above into a
3158 single name. Although their field types are both <literal>Term Int</literal>,
3159 their selector functions actually have different types:
3160
3161 <programlisting>
3162 num :: Term Int -> Term Int
3163 arg :: Term Bool -> Term Int
3164 </programlisting>
3165 </para></listitem>
3166
3167 <listitem><para>
3168 When pattern-matching against data constructors drawn from a GADT,
3169 for example in a <literal>case</literal> expression, the following rules apply:
3170 <itemizedlist>
3171 <listitem><para>The type of the scrutinee must be rigid.</para></listitem>
3172 <listitem><para>The type of the entire <literal>case</literal> expression must be rigid.</para></listitem>
3173 <listitem><para>The type of any free variable mentioned in any of
3174 the <literal>case</literal> alternatives must be rigid.</para></listitem>
3175 </itemizedlist>
3176 A type is "rigid" if it is completely known to the compiler at its binding site. The easiest
3177 way to ensure that a variable has a rigid type is to give it a type signature.
3178 For more precise details see <ulink url="http://research.microsoft.com/%7Esimonpj/papers/gadt">
3179 Simple unification-based type inference for GADTs
3180 </ulink>. The criteria implemented by GHC are given in the Appendix.
3181
3182 </para></listitem>
3183
3184 </itemizedlist>
3185 </para>
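<para>
The rigidity rules for pattern matching can be illustrated with a short sketch.
It uses a cut-down version of the <literal>Term</literal> type; the function
<literal>addTo</literal> is a hypothetical name introduced here. Its type signature
is what makes the scrutinee, the result of the <literal>case</literal>, and the
free variable <literal>j</literal> rigid:
<programlisting>
{-# LANGUAGE GADTs #-}

data Term a where
  Lit    :: Int -> Term Int
  IsZero :: Term Int -> Term Bool

-- With this signature, t :: Term a, j :: a and the result type a
-- are all known at their binding sites.
addTo :: Term a -> a -> a
addTo t j = case t of
  Lit i -> i + j      -- here a is refined to Int, so j :: Int
  _     -> j          -- no refinement is needed in this branch
</programlisting>
For example, <literal>addTo (Lit 2) 3</literal> is <literal>5</literal> and
<literal>addTo (IsZero (Lit 0)) True</literal> is <literal>True</literal>. Removing the
signature for <literal>addTo</literal> would be expected to make the first branch fail
to typecheck, since no refinement would then take place.
</para>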
3186
3187 </sect2>
3188 </sect1>
3189
3190 <!-- ====================== End of Generalised algebraic data types ======================= -->
3191
3192 <sect1 id="deriving">
3193 <title>Extensions to the "deriving" mechanism</title>
3194
3195 <sect2 id="deriving-inferred">
3196 <title>Inferred context for deriving clauses</title>
3197
3198 <para>
3199 The Haskell Report is vague about exactly when a <literal>deriving</literal> clause is
3200 legal. For example:
3201 <programlisting>
3202 data T0 f a = MkT0 a deriving( Eq )
3203 data T1 f a = MkT1 (f a) deriving( Eq )
3204 data T2 f a = MkT2 (f (f a)) deriving( Eq )
3205 </programlisting>
3206 The natural generated <literal>Eq</literal> code would result in these instance declarations:
3207 <programlisting>
3208 instance Eq a => Eq (T0 f a) where ...
3209 instance Eq (f a) => Eq (T1 f a) where ...
3210 instance Eq (f (f a)) => Eq (T2 f a) where ...
3211 </programlisting>
3212 The first of these is obviously fine. The second is still fine, although less obviously.
3213 The third is not Haskell 98, and risks losing termination of instances.
3214 </para>
3215 <para>
3216 GHC takes a conservative position: it accepts the first two, but not the third. The rule is this:
3217 each constraint in the inferred instance context must consist only of type variables,
3218 with no repetitions.
3219 </para>
3220 <para>
3221 This rule is applied regardless of flags. If you want a more exotic context, you can write
3222 it yourself, using the <link linkend="stand-alone-deriving">standalone deriving mechanism</link>.
3223 </para>
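<para>
As a sketch of writing the context yourself, the instance for
<literal>T2</literal> that GHC declines to infer can be requested with a stand-alone
declaration. The set of extensions listed here is an assumption that may vary
between GHC versions, and <literal>check</literal> is a name invented for the example:
<programlisting>
{-# LANGUAGE FlexibleContexts, StandaloneDeriving, UndecidableInstances #-}

data T2 f a = MkT2 (f (f a))

-- The context that is not inferred automatically, written out by hand.
-- It mentions f twice, which is why UndecidableInstances is assumed.
deriving instance Eq (f (f a)) => Eq (T2 f a)

check :: Bool
check = MkT2 (Just (Just (1 :: Int))) == MkT2 (Just (Just 1))
</programlisting>
Here <literal>check</literal> is <literal>True</literal>; the comparison is resolved
using the ordinary <literal>Eq</literal> instance for <literal>Maybe (Maybe Int)</literal>.
</para>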
3224 </sect2>
3225
3226 <sect2 id="stand-alone-deriving">
3227 <title>Stand-alone deriving declarations</title>
3228
3229 <para>
3230 GHC now allows stand-alone <literal>deriving</literal> declarations, enabled by <literal>-XStandaloneDeriving</literal>:
3231 <programlisting>
3232 data Foo a = Bar a | Baz String
3233
3234 deriving instance Eq a => Eq (Foo a)
3235 </programlisting>
3236 The syntax is identical to that of an ordinary instance declaration apart from (a) the keyword
3237 <literal>deriving</literal>, and (b) the absence of the <literal>where</literal> part.
3238 Note the following points:
3239 <itemizedlist>
3240 <listitem><para>
3241 You must supply an explicit context (in the example the context is <literal>(Eq a)</literal>),
3242 exactly as you would in an ordinary instance declaration.
3243 (In contrast, in a <literal>deriving</literal> clause
3244 attached to a data type declaration, the context is inferred.)
3245 </para></listitem>
3246
3247 <listitem><para>
3248 A <literal>deriving instance</literal> declaration
3249 must obey the same rules concerning form and termination as ordinary instance declarations,
3250 controlled by the same flags; see <xref linkend="instance-decls"/>.
3251 </para></listitem>
3252
3253 <listitem><para>
3254 Unlike a <literal>deriving</literal>
3255 declaration attached to a <literal>data</literal> declaration, the instance can be more specific
3256 than the data type (assuming you also use
3257 <literal>-XFlexibleInstances</literal>, <xref linkend="instance-rules"/>). Consider
3258 for example
3259 <programlisting>
3260 data Foo a = Bar a | Baz String
3261
3262 deriving instance Eq a => Eq (Foo [a])
3263 deriving instance Eq a => Eq (Foo (Maybe a))
3264 </programlisting>
3265 This will generate a derived instance for <literal>(Foo [a])</literal> and <literal>(Foo (Maybe a))</literal>,
3266 but other types such as <literal>(Foo (Int,Bool))</literal> will not be an instance of <literal>Eq</literal>.
3267 </para></listitem>
3268
3269 <listitem><para>
3270 Unlike a <literal>deriving</literal>
3271 declaration attached to a <literal>data</literal> declaration,
3272 GHC does not restrict the form of the data type. Instead, GHC simply generates the appropriate
3273 boilerplate code for the specified class, and typechecks it. If there is a type error, it is
3274 your problem. (GHC will show you the offending code if it has a type error.)
3275 The merit of this is that you can derive instances for GADTs and other exotic
3276 data types, providing only that the boilerplate code does indeed typecheck. For example:
3277 <programlisting>
3278 data T a where
3279 T1 :: T Int
3280 T2 :: T Bool
3281
3282 deriving instance Show (T a)
3283 </programlisting>
3284 In this example, you cannot say <literal>... deriving( Show )</literal> on the
3285 data type declaration for <literal>T</literal>,
3286 because <literal>T</literal> is a GADT, but you <emphasis>can</emphasis> generate
3287 the instance declaration using stand-alone deriving.
3288 </para>
3289 </listitem>
3290
3291 <listitem>
3292 <para>The stand-alone syntax is generalised for newtypes in exactly the same
3293 way that ordinary <literal>deriving</literal> clauses are generalised (<xref linkend="newtype-deriving"/>).
3294 For example:
3295 <programlisting>
3296 newtype Foo a = MkFoo (State Int a)
3297
3298 deriving instance MonadState Int Foo
3299 </programlisting>
3300 GHC always treats the <emphasis>last</emphasis> parameter of the instance
3301 (<literal>Foo</literal> in this example) as the type whose instance is being derived.
3302 </para></listitem>
3303 </itemizedlist></para>
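<para>
The following sketch puts two of the points above into one small program: an
instance that is more specific than the data type, and a derived
<literal>Show</literal> instance for a GADT. The names <literal>fooCheck</literal> and
<literal>shown</literal> are illustrative only:
<programlisting>
{-# LANGUAGE FlexibleInstances, GADTs, StandaloneDeriving #-}

data Foo a = Bar a | Baz String

-- Only list-typed Foo values get an Eq instance.
deriving instance Eq a => Eq (Foo [a])

data T a where
  T1 :: T Int
  T2 :: T Bool

-- Not expressible as a deriving clause on the GADT declaration itself.
deriving instance Show (T a)

fooCheck :: Bool
fooCheck = Bar "ab" == Bar "ab"     -- uses the instance at Foo [Char]

shown :: [String]
shown = [show T1, show T2]
</programlisting>
Here <literal>fooCheck</literal> is <literal>True</literal> and <literal>shown</literal> is
<literal>["T1","T2"]</literal>. An expression such as
<literal>Bar (1 :: Int) == Bar 1</literal> is rejected, because no
<literal>Eq (Foo Int)</literal> instance has been derived.
</para>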
3304
3305 </sect2>
3306
3307
3308 <sect2 id="deriving-typeable">
3309 <title>Deriving clause for extra classes (<literal>Typeable</literal>, <literal>Data</literal>, etc)</title>
3310
3311 <para>
3312 Haskell 98 allows the programmer to add "<literal>deriving( Eq, Ord )</literal>" to a data type
3313 declaration, to generate a standard instance declaration for classes specified in the <literal>deriving</literal> clause.
3314 In Haskell 98, the only classes that may appear in the <literal>deriving</literal> clause are the standard
3315 classes <literal>Eq</literal>, <literal>Ord</literal>,
3316 <literal>Enum</literal>, <literal>Ix</literal>, <literal>Bounded</literal>, <literal>Read</literal>, and <literal>Show</literal>.
3317 </para>
3318 <para>
3319 GHC extends this list with several more classes that may be automatically derived:
3320 <itemizedlist>
3321 <listitem><para> With <option>-XDeriveDataTypeable</option>, you can derive instances of the classes
3322 <literal>Typeable</literal>, and <literal>Data</literal>, defined in the library
3323 modules <literal>Data.Typeable</literal> and <literal>Data.Generics</literal> respectively.
3324 </para>
3325 <para>An instance of <literal>Typeable</literal> can only be derived if the
3326 data type has seven or fewer type parameters, all of kind <literal>*</literal>.
3327 The reason for this is that the <literal>Typeable</literal> class is derived using the scheme
3328 described in
3329 <ulink url="http://research.microsoft.com/%7Esimonpj/papers/hmap/gmap2.ps">
3330 Scrap More Boilerplate: Reflection, Zips, and Generalised Casts
3331 </ulink>.
3332 (Section 7.4 of the paper describes the multiple <literal>Typeable</literal> classes that
3333 are used, and only <literal>Typeable1</literal> up to
3334 <literal>Typeable7</literal> are provided in the library.)
3335 In other cases, there is nothing to stop the programmer writing a <literal>TypeableX</literal>
3336 class, whose kind suits that of the data type constructor, and
3337 then writing the data type instance by hand.
3338 </para>
3339 </listitem>
3340
3341 <listitem><para> With <option>-XDeriveGeneric</option>, you can derive
3342 instances of the class <literal>Generic</literal>, defined in
3343 <literal>GHC.Generics</literal>. You can use these to define generic functions,
3344 as described in <xref linkend="generic-programming"/>.
3345 </para></listitem>
3346
3347 <listitem><para> With <option>-XDeriveFunctor</option>, you can derive instances of
3348 the class <literal>Functor</literal>,
3349 defined in <literal>GHC.Base</literal>.
3350 </para></listitem>
3351
3352 <listitem><para> With <option>-XDeriveFoldable</option>, you can derive instances of
3353 the class <literal>Foldable</literal>,
3354 defined in <literal>Data.Foldable</literal>.
3355 </para></listitem>
3356
3357 <listitem><para> With <option>-XDeriveTraversable</option>, you can derive instances of
3358 the class <literal>Traversable</literal>,
3359 defined in <literal>Data.Traversable</literal>.
3360 </para></listitem>
3361 </itemizedlist>
3362 In each case the appropriate class must be in scope before it
3363 can be mentioned in the <literal>deriving</literal> clause.
3364 </para>
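<para>
The following sketch shows three of these flags used together. The
<literal>Tree</literal> type and the <literal>example</literal> value are
hypothetical names chosen for illustration; the classes are imported explicitly
so that they are in scope for the <literal>deriving</literal> clause:
<programlisting>
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
module Main where

import Data.Foldable (Foldable, toList)
import Data.Traversable (Traversable, traverse)

-- A hypothetical binary tree, used only for this example.
data Tree a = Leaf | Node (Tree a) a (Tree a)
  deriving (Show, Functor, Foldable, Traversable)

example :: Tree Int
example = Node (Node Leaf 1 Leaf) 2 (Node Leaf 3 Leaf)

main :: IO ()
main = do
  print (fmap (* 2) example)   -- derived Functor
  print (toList example)       -- derived Foldable, fields visited left to right
  -- derived Traversable: succeeds only if every element is positive
  print (traverse (\x -> if x > 0 then Just x else Nothing) example)
</programlisting>
</para>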
3365 </sect2>
3366
3367 <sect2 id="newtype-deriving">
3368 <title>Generalised derived instances for newtypes</title>
3369
3370 <para>
3371 When you define an abstract type using <literal>newtype</literal>, you may want
3372 the new type to inherit some instances from its representation. In
3373 Haskell 98, you can inherit instances of <literal>Eq</literal>, <literal>Ord</literal>,
3374 <literal>Enum</literal> and <literal>Bounded</literal> by deriving them, but for any
3375 other classes you have to write an explicit instance declaration. For
3376 example, if you define
3377
3378 <programlisting>
3379 newtype Dollars = Dollars Int
3380 </programlisting>
3381
3382 and you want to use arithmetic on <literal>Dollars</literal>, you have to
3383 explicitly define an instance of <literal>Num</literal>:
3384
3385 <programlisting>
3386 instance Num Dollars where
3387 Dollars a + Dollars b = Dollars (a+b)
3388 ...
3389 </programlisting>
3390 All the instance does is apply and remove the <literal>newtype</literal>
3391 constructor. It is particularly galling that, since the constructor
3392 doesn't appear at run-time, this instance declaration defines a
3393 dictionary which is <emphasis>wholly equivalent</emphasis> to the <literal>Int</literal>
3394 dictionary, only slower!
3395 </para>
3396
3397
3398 <sect3> <title> Generalising the deriving clause </title>
3399 <para>
3400 GHC now permits such instances to be derived instead,
3401 using the flag <option>-XGeneralizedNewtypeDeriving</option>,
3402 so one can write
3403 <programlisting>
3404 newtype Dollars = Dollars Int deriving (Eq,Show,Num)
3405 </programlisting>
3406
3407 and the implementation uses the <emphasis>same</emphasis> <literal>Num</literal> dictionary
3408 for <literal>Dollars</literal> as for <literal>Int</literal>. Notionally, the compiler
3409 derives an instance declaration of the form
3410
3411 <programlisting>
3412 instance Num Int => Num Dollars
3413 </programlisting>
3414
3415 which just adds or removes the <literal>newtype</literal> constructor according to the type.
3416 </para>
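<para>
A complete module built around the <literal>Dollars</literal> example might look
like the sketch below (the <literal>main</literal> function is an addition for
illustration). Integer literals work at type <literal>Dollars</literal> because
<literal>fromInteger</literal> is part of the reused <literal>Num</literal> dictionary:
<programlisting>
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Main where

-- Eq and Num reuse the Int dictionary; Show is derived in the usual way,
-- so the constructor name appears in the output.
newtype Dollars = Dollars Int
  deriving (Eq, Show, Num)

main :: IO ()
main = print (Dollars 3 + Dollars 4 * 2)   -- prints: Dollars 11
</programlisting>
</para>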
3417 <para>
3418
3419 We can also derive instances of constructor classes in a similar
3420 way. For example, suppose we have implemented state and failure monad
3421 transformers, such that
3422
3423 <programlisting>
3424 instance Monad m => Monad (State s m)
3425 instance Monad m => Monad (Failure m)
3426 </programlisting>
3427 In Haskell 98, we can define a parsing monad by
3428 <programlisting>
3429 type Parser tok m a = State [tok] (Failure m) a
3430 </programlisting>
3431
3432 which is automatically a monad thanks to the instance declarations
3433 above. With the extension, we can make the parser type abstract,
3434 without needing to write an instance of class <literal>Monad</literal>, via
3435
3436 <programlisting>
3437 newtype Parser tok m a = Parser (State [tok] (Failure m) a)
3438 deriving Monad
3439 </programlisting>
3440 In this case the derived instance declaration is of the form
3441 <programlisting>
3442 instance Monad (State [tok] (Failure m)) => Monad (Parser tok m)
3443 </programlisting>
3444
3445 Notice that, since <literal>Monad</literal> is a constructor class, the
3446 instance is a <emphasis>partial application</emphasis> of the new type, not the
3447 entire left hand side. We can imagine that the type declaration is
3448 "eta-converted" to generate the context of the instance
3449 declaration.
3450 </para>
3451 <para>
3452
3453 We can even derive instances of multi-parameter classes, provided the
newtype is the last class parameter. In this case, a &ldquo;partial
application&rdquo; of the class appears in the <literal>deriving</literal>
3456 clause. For example, given the class
3457
3458 <programlisting>
3459 class StateMonad s m | m -> s where ...
3460 instance Monad m => StateMonad s (State s m) where ...
3461 </programlisting>
3462 then we can derive an instance of <literal>StateMonad</literal> for <literal>Parser</literal>s by
3463 <programlisting>
3464 newtype Parser tok m a = Parser (State [tok] (Failure m) a)
3465 deriving (Monad, StateMonad [tok])
3466 </programlisting>
3467
3468 The derived instance is obtained by completing the application of the
3469 class to the new type:
3470
3471 <programlisting>
3472 instance StateMonad [tok] (State [tok] (Failure m)) =>
3473 StateMonad [tok] (Parser tok m)
3474 </programlisting>
3475 </para>
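<para>
The sketch below exercises the multi-parameter case end to end. It makes several
simplifying assumptions: <literal>State</literal> is a small hand-written state monad
(not a transformer and not a library type), and <literal>Counter</literal>,
<literal>tick</literal> and <literal>runCounter</literal> are hypothetical names
introduced only for this example:
<programlisting>
{-# LANGUAGE GeneralizedNewtypeDeriving, MultiParamTypeClasses,
             FunctionalDependencies, FlexibleInstances #-}
module Main where

-- A hand-rolled state monad, assumed here for illustration.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State (\s -> let (a, s') = g s in (f a, s'))

instance Applicative (State s) where
  pure a = State (\s -> (a, s))
  State f &lt;*&gt; State g =
    State (\s -> let (h, s')  = f s
                     (a, s'') = g s'
                 in (h a, s''))

instance Monad (State s) where
  State g >>= k = State (\s -> let (a, s') = g s in runState (k a) s')

-- The newtype must be the last class parameter for deriving to apply.
class Monad m => StateMonad s m | m -> s where
  get :: m s
  put :: s -> m ()

instance StateMonad s (State s) where
  get   = State (\s -> (s, s))
  put s = State (\_ -> ((), s))

-- "StateMonad Int" is the partial application of the class.
newtype Counter a = Counter (State Int a)
  deriving (Functor, Applicative, Monad, StateMonad Int)

-- Return the current count and increment it.
tick :: Counter Int
tick = get >>= \n -> put (n + 1) >> return n

runCounter :: Counter a -> Int -> (a, Int)
runCounter (Counter m) = runState m

main :: IO ()
main = print (runCounter (tick >> tick >> tick) 10)   -- prints: (12,13)
</programlisting>
</para>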
3476 <para>
3477
3478 As a result of this extension, all derived instances in newtype
3479 declarations are treated uniformly (and implemented just by reusing
3480 the dictionary for the representation type), <emphasis>except</emphasis>
3481 <literal>Show</literal> and <literal>Read</literal>, which really behave differently for
3482 the newtype and its representation.
3483 </para>
3484 </sect3>
3485
3486 <sect3> <title> A more precise specification </title>
3487 <para>
3488 Derived instance declarations are constructed as follows. Consider the
3489 declaration (after expansion of any type synonyms)
3490
3491 <programlisting>
3492 newtype T v1...vn = T' (t vk+1...vn) deriving (c1...cm)
3493 </programlisting>
3494
3495 where
3496 <itemizedlist>
3497 <listitem><para>
3498 The <literal>ci</literal> are partial applications of
3499 classes of the form <literal>C t1'...tj'</literal>, where the arity of <literal>C</literal>
3500 is exactly <literal>j+1</literal>. That is, <literal>C</literal> lacks exactly one type argument.
3501 </para></listitem>
3502 <listitem><para>
3503 The <literal>k</literal> is chosen so that <literal>ci (T v1...vk)</literal> is well-kinded.
3504 </para></listitem>
3505 <listitem><para>
3506 The type <literal>t</literal> is an arbitrary type.
3507 </para></listitem>
3508 <listitem><para>
3509 The type variables <literal>vk+1...vn</literal> do not occur in <literal>t</literal>,
3510 nor in the <literal>ci</literal>, and
3511 </para></listitem>
3512 <listitem><para>
3513 None of the <literal>ci</literal> is <literal>Read</literal>, <literal>Show</literal>,
3514 <literal>Typeable</literal>, or <literal>Data</literal>. These classes
3515 should not "look through" the type or its constructor. You can still
3516 derive these classes for a newtype, but it happens in the usual way, not
3517 via this new mechanism.
3518 </para></listitem>
3519 </itemizedlist>
3520 Then, for each <literal>ci</literal>, the derived instance
3521 declaration is:
3522 <programlisting>
3523 instance ci t => ci (T v1...vk)
3524 </programlisting>
3525 As an example which does <emphasis>not</emphasis> work, consider
3526 <programlisting>
3527 newtype NonMonad m s = NonMonad (State s m s) deriving Monad
3528 </programlisting>
3529 Here we cannot derive the instance
3530 <programlisting>
3531 instance Monad (State s m) => Monad (NonMonad m)
3532 </programlisting>
3533
3534 because the type variable <literal>s</literal> occurs in <literal>State s m</literal>,
3535 and so cannot be "eta-converted" away. It is a good thing that this
3536 <literal>deriving</literal> clause is rejected, because <literal>NonMonad m</literal> is
3537 not, in fact, a monad --- for the same reason. Try defining
3538 <literal>>>=</literal> with the correct type: you won't be able to.
3539 </para>
3540 <para>
3541
3542 Notice also that the <emphasis>order</emphasis> of class parameters becomes
3543 important, since we can only derive instances for the last one. If the
3544 <literal>StateMonad</literal> class above were instead defined as
3545
3546 <programlisting>
3547 class StateMonad m s | m -> s where ...
3548 </programlisting>
3549
3550 then we would not have been able to derive an instance for the
3551 <literal>Parser</literal> type above. We hypothesise that multi-parameter
3552 classes usually have one "main" parameter for which deriving new
3553 instances is most interesting.
3554 </para>
3555 <para>Lastly, all of this applies only for classes other than
3556 <literal>Read</literal>, <literal>Show</literal>, <literal>Typeable</literal>,
3557 and <literal>Data</literal>, for which the built-in derivation applies (section
3558 4.3.3. of the Haskell Report).
3559 (For the standard classes <literal>Eq</literal>, <literal>Ord</literal>,
3560 <literal>Ix</literal>, and <literal>Bounded</literal> it is immaterial whether
3561 the standard method is used or the one described here.)
3562 </para>
3563 </sect3>
3564 </sect2>
3565 </sect1>
3566
3567
3568 <!-- TYPE SYSTEM EXTENSIONS -->
3569 <sect1 id="type-class-extensions">
<title>Class and instance declarations</title>
3571
3572 <sect2 id="multi-param-type-classes">
3573 <title>Class declarations</title>
3574
3575 <para>
This section and the next one document GHC's type-class extensions.
3577 There's lots of background in the paper <ulink
3578 url="http://research.microsoft.com/~simonpj/Papers/type-class-design-space/">Type
3579 classes: exploring the design space</ulink> (Simon Peyton Jones, Mark
3580 Jones, Erik Meijer).
3581 </para>
3582
3583 <sect3>
3584 <title>Multi-parameter type classes</title>
3585 <para>
3586 Multi-parameter type classes are permitted, with flag <option>-XMultiParamTypeClasses</option>.
3587 For example:
3588
3589
3590 <programlisting>
3591 class Collection c a where
3592 union :: c a -> c a -> c a
3593 ...etc.
3594 </programlisting>
3595
3596 </para>
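<para>
To make the <literal>Collection</literal> class concrete, here is a minimal sketch
with a list instance. The instance and the <literal>main</literal> function are
assumptions added for illustration; <option>-XFlexibleInstances</option> is needed
because the instance head mentions a bare type variable:
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
module Main where

import qualified Data.List as List

class Collection c a where
  union :: c a -> c a -> c a

-- Lists of any equality type form a collection.
instance Eq a => Collection [] a where
  union = List.union

main :: IO ()
main = print (union [1, 2, 3] [3, 4 :: Int])   -- prints: [1,2,3,4]
</programlisting>
</para>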
3597 </sect3>
3598
3599 <sect3 id="superclass-rules">
3600 <title>The superclasses of a class declaration</title>
3601
3602 <para>
3603 In Haskell 98 the context of a class declaration (which introduces superclasses)
3604 must be simple; that is, each predicate must consist of a class applied to
3605 type variables. The flag <option>-XFlexibleContexts</option>
3606 (<xref linkend="flexible-contexts"/>)
3607 lifts this restriction,
3608 so that the only restriction on the context in a class declaration is
3609 that the class hierarchy must be acyclic. So these class declarations are OK:
3610
3611
3612 <programlisting>
3613 class Functor (m k) => FiniteMap m k where
3614 ...
3615
3616 class (Monad m, Monad (t m)) => Transform t m where
3617 lift :: m a -> (t m) a
3618 </programlisting>
3619
3620
3621 </para>
3622 <para>
As in Haskell 98, the class hierarchy must be acyclic. However, the definition
3624 of "acyclic" involves only the superclass relationships. For example,
3625 this is OK:
3626
3627
3628 <programlisting>
3629 class C a where {
3630 op :: D b => a -> b -> b
3631 }
3632
3633 class C a => D a where { ... }
3634 </programlisting>
3635
3636
3637 Here, <literal>C</literal> is a superclass of <literal>D</literal>, but it's OK for a
3638 class operation <literal>op</literal> of <literal>C</literal> to mention <literal>D</literal>. (It
3639 would not be OK for <literal>D</literal> to be a superclass of <literal>C</literal>.)
3640 </para>
3641 <para>
3642 With the extension that adds a <link linkend="constraint-kind">kind of constraints</link>,
3643 you can write more exotic superclass definitions. The superclass cycle check is even more
liberal in this case. For example, this is OK:
3645
3646 <programlisting>
3647 class A cls c where
3648 meth :: cls c => c -> c
3649
3650 class A B c => B c where
3651 </programlisting>
3652
3653 A superclass context for a class <literal>C</literal> is allowed if, after expanding
3654 type synonyms to their right-hand-sides, and uses of classes (other than <literal>C</literal>)
3655 to their superclasses, <literal>C</literal> does not occur syntactically in the context.
3656 </para>
3657 </sect3>
3658
3659
3660
3661
3662 <sect3 id="class-method-types">
3663 <title>Class method types</title>
3664
3665 <para>
Haskell 98 prohibits class method types from mentioning constraints on the
class type variable, thus:
3668 <programlisting>
3669 class Seq s a where
3670 fromList :: [a] -> s a
3671 elem :: Eq a => a -> s a -> Bool
3672 </programlisting>
The type of <literal>elem</literal> is illegal in Haskell 98, because it
contains the constraint <literal>Eq a</literal>, which constrains only the
class type variable (in this case <literal>a</literal>).
3676 GHC lifts this restriction (flag <option>-XConstrainedClassMethods</option>).
3677 </para>
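<para>
A runnable sketch of the <literal>Seq</literal> class follows. The list instance
and <literal>main</literal> are assumptions added for illustration, and the
Prelude's own <literal>elem</literal> is hidden to avoid a name clash:
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances,
             ConstrainedClassMethods #-}
module Main where

import Prelude hiding (elem)
import qualified Data.List as List

class Seq s a where
  fromList :: [a] -> s a
  -- The extra constraint mentions only the class variable a.
  elem     :: Eq a => a -> s a -> Bool

instance Seq [] a where
  fromList = id
  elem     = List.elem

main :: IO ()
main = print (elem (2 :: Int) (fromList [1, 2, 3] :: [Int]))   -- prints: True
</programlisting>
</para>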
3678
3679
3680 </sect3>
3681
3682
3683 <sect3 id="class-default-signatures">
3684 <title>Default method signatures</title>
3685
3686 <para>
3687 Haskell 98 allows you to define a default implementation when declaring a class:
3688 <programlisting>
3689 class Enum a where
3690 enum :: [a]
3691 enum = []
3692 </programlisting>
3693 The type of the <literal>enum</literal> method is <literal>[a]</literal>, and
3694 this is also the type of the default method. You can lift this restriction
3695 and give another type to the default method using the flag
3696 <option>-XDefaultSignatures</option>. For instance, if you have written a
3697 generic implementation of enumeration in a class <literal>GEnum</literal>
3698 with method <literal>genum</literal> in terms of <literal>GHC.Generics</literal>,
3699 you can specify a default method that uses that generic implementation:
3700 <programlisting>
3701 class Enum a where
3702 enum :: [a]
3703 default enum :: (Generic a, GEnum (Rep a)) => [a]
3704 enum = map to genum
3705 </programlisting>
3706 We reuse the keyword <literal>default</literal> to signal that a signature
3707 applies to the default method only; when defining instances of the
3708 <literal>Enum</literal> class, the original type <literal>[a]</literal> of
3709 <literal>enum</literal> still applies. When giving an empty instance, however,
the default implementation <literal>map to genum</literal> is filled-in,
3711 and type-checked with the type
3712 <literal>(Generic a, GEnum (Rep a)) => [a]</literal>.
3713 </para>
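<para>
The same mechanism can be seen in a much smaller setting that does not involve
generics. In the sketch below, the class <literal>Describe</literal> and its method
are hypothetical names; the default implementation requires a
<literal>Show</literal> instance, but instances that supply their own definition
do not:
<programlisting>
{-# LANGUAGE DefaultSignatures #-}
module Main where

class Describe a where
  describe :: a -> String
  -- This signature applies to the default implementation only.
  default describe :: Show a => a -> String
  describe = show

-- Empty instance: the default is filled in and checked against Show Int.
instance Describe Int

-- Explicit definition: the Show-based default is not used.
instance Describe Bool where
  describe b = if b then "yes" else "no"

main :: IO ()
main = mapM_ putStrLn [describe (42 :: Int), describe True]
</programlisting>
</para>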
3714
3715 <para>
3716 We use default signatures to simplify generic programming in GHC
3717 (<xref linkend="generic-programming"/>).
3718 </para>
3719
3720
3721 </sect3>
3722 </sect2>
3723
3724 <sect2 id="functional-dependencies">
3725 <title>Functional dependencies
3726 </title>
3727
<para> Functional dependencies are implemented as described by Mark P. Jones
in &ldquo;<ulink url="http://citeseer.ist.psu.edu/jones00type.html">Type Classes with Functional Dependencies</ulink>&rdquo;,
in Proceedings of the 9th European Symposium on Programming,
ESOP 2000, Berlin, Germany, March 2000, Springer-Verlag LNCS 1782.
3733 </para>
3734 <para>
3735 Functional dependencies are introduced by a vertical bar in the syntax of a
3736 class declaration; e.g.
3737 <programlisting>
3738 class (Monad m) => MonadState s m | m -> s where ...
3739
3740 class Foo a b c | a b -> c where ...
3741 </programlisting>
3742 There should be more documentation, but there isn't (yet). Yell if you need it.
3743 </para>
3744
3745 <sect3><title>Rules for functional dependencies </title>
3746 <para>
3747 In a class declaration, all of the class type variables must be reachable (in the sense
3748 mentioned in <xref linkend="flexible-contexts"/>)
3749 from the free variables of each method type.
3750 For example:
3751
3752 <programlisting>
3753 class Coll s a where
3754 empty :: s
3755 insert :: s -> a -> s
3756 </programlisting>
3757
3758 is not OK, because the type of <literal>empty</literal> doesn't mention
3759 <literal>a</literal>. Functional dependencies can make the type variable
3760 reachable:
3761 <programlisting>
3762 class Coll s a | s -> a where
3763 empty :: s
3764 insert :: s -> a -> s
3765 </programlisting>
3766
3767 Alternatively <literal>Coll</literal> might be rewritten
3768
3769 <programlisting>
3770 class Coll s a where
3771 empty :: s a
3772 insert :: s a -> a -> s a
3773 </programlisting>
3774
3775
3776 which makes the connection between the type of a collection of
3777 <literal>a</literal>'s (namely <literal>(s a)</literal>) and the element type <literal>a</literal>.
3778 Occasionally this really doesn't work, in which case you can split the
3779 class like this:
3780
3781
3782 <programlisting>
3783 class CollE s where
3784 empty :: s
3785
3786 class CollE s => Coll s a where
3787 insert :: s -> a -> s
3788 </programlisting>
3789 </para>
3790 </sect3>
3791
3792
3793 <sect3>
3794 <title>Background on functional dependencies</title>
3795
3796 <para>The following description of the motivation and use of functional dependencies is taken
3797 from the Hugs user manual, reproduced here (with minor changes) by kind
3798 permission of Mark Jones.
3799 </para>
3800 <para>
3801 Consider the following class, intended as part of a
3802 library for collection types:
3803 <programlisting>
3804 class Collects e ce where
3805 empty :: ce
3806 insert :: e -> ce -> ce
3807 member :: e -> ce -> Bool
3808 </programlisting>
3809 The type variable e used here represents the element type, while ce is the type
3810 of the container itself. Within this framework, we might want to define
3811 instances of this class for lists or characteristic functions (both of which
3812 can be used to represent collections of any equality type), bit sets (which can
3813 be used to represent collections of characters), or hash tables (which can be
3814 used to represent any collection whose elements have a hash function). Omitting
3815 standard implementation details, this would lead to the following declarations:
3816 <programlisting>
3817 instance Eq e => Collects e [e] where ...
3818 instance Eq e => Collects e (e -> Bool) where ...
3819 instance Collects Char BitSet where ...
instance (Hashable e, Collects e ce)
3821 => Collects e (Array Int ce) where ...
3822 </programlisting>
3823 All this looks quite promising; we have a class and a range of interesting
3824 implementations. Unfortunately, there are some serious problems with the class
3825 declaration. First, the empty function has an ambiguous type:
3826 <programlisting>
3827 empty :: Collects e ce => ce
3828 </programlisting>
3829 By "ambiguous" we mean that there is a type variable e that appears on the left
3830 of the <literal>=&gt;</literal> symbol, but not on the right. The problem with
3831 this is that, according to the theoretical foundations of Haskell overloading,
3832 we cannot guarantee a well-defined semantics for any term with an ambiguous
3833 type.
3834 </para>
3835 <para>
3836 We can sidestep this specific problem by removing the empty member from the
3837 class declaration. However, although the remaining members, insert and member,
3838 do not have ambiguous types, we still run into problems when we try to use
3839 them. For example, consider the following two functions:
3840 <programlisting>
3841 f x y = insert x . insert y
3842 g = f True 'a'
3843 </programlisting>
3844 for which GHC infers the following types:
3845 <programlisting>
3846 f :: (Collects a c, Collects b c) => a -> b -> c -> c
3847 g :: (Collects Bool c, Collects Char c) => c -> c
3848 </programlisting>
3849 Notice that the type for f allows the two parameters x and y to be assigned
3850 different types, even though it attempts to insert each of the two values, one
3851 after the other, into the same collection. If we're trying to model collections
3852 that contain only one type of value, then this is clearly an inaccurate
3853 type. Worse still, the definition for g is accepted, without causing a type
3854 error. As a result, the error in this code will not be flagged at the point
3855 where it appears. Instead, it will show up only when we try to use g, which
3856 might even be in a different module.
3857 </para>
3858
3859 <sect4><title>An attempt to use constructor classes</title>
3860
3861 <para>
3862 Faced with the problems described above, some Haskell programmers might be
3863 tempted to use something like the following version of the class declaration:
3864 <programlisting>
3865 class Collects e c where
3866 empty :: c e
3867 insert :: e -> c e -> c e
3868 member :: e -> c e -> Bool
3869 </programlisting>
3870 The key difference here is that we abstract over the type constructor c that is
3871 used to form the collection type c e, and not over that collection type itself,
3872 represented by ce in the original class declaration. This avoids the immediate
3873 problems that we mentioned above: empty has type <literal>Collects e c => c
3874 e</literal>, which is not ambiguous.
3875 </para>
3876 <para>
3877 The function f from the previous section has a more accurate type:
3878 <programlisting>
3879 f :: (Collects e c) => e -> e -> c e -> c e
3880 </programlisting>
3881 The function g from the previous section is now rejected with a type error as
3882 we would hope because the type of f does not allow the two arguments to have
3883 different types.
3884 This, then, is an example of a multiple parameter class that does actually work
3885 quite well in practice, without ambiguity problems.
3886 There is, however, a catch. This version of the Collects class is nowhere near
3887 as general as the original class seemed to be: only one of the four instances
3888 for <literal>Collects</literal>
3889 given above can be used with this version of Collects because only one of
3890 them---the instance for lists---has a collection type that can be written in
3891 the form c e, for some type constructor c, and element type e.
3892 </para>
3893 </sect4>
3894
3895 <sect4><title>Adding functional dependencies</title>
3896
3897 <para>
3898 To get a more useful version of the Collects class, Hugs provides a mechanism
3899 that allows programmers to specify dependencies between the parameters of a
multiple parameter class. (For readers with an interest in theoretical
foundations and previous work: the use of dependency information can be seen
either as a generalization of the proposal for `parametric type classes' that was
put forward by Chen, Hudak, and Odersky, or as a special case of Mark Jones's
later framework for "improvement" of qualified types. The
underlying ideas are also discussed in a more theoretical and abstract setting
in a manuscript [implparam], where they are identified as one point in a
general design space for systems of implicit parameterization.)
3908
3909 To start with an abstract example, consider a declaration such as:
3910 <programlisting>
3911 class C a b where ...
3912 </programlisting>
3913 which tells us simply that C can be thought of as a binary relation on types
3914 (or type constructors, depending on the kinds of a and b). Extra clauses can be
3915 included in the definition of classes to add information about dependencies
3916 between parameters, as in the following examples:
3917 <programlisting>
3918 class D a b | a -> b where ...
3919 class E a b | a -> b, b -> a where ...
3920 </programlisting>
3921 The notation <literal>a -&gt; b</literal> used here between the | and where
3922 symbols --- not to be
3923 confused with a function type --- indicates that the a parameter uniquely
3924 determines the b parameter, and might be read as "a determines b." Thus D is
3925 not just a relation, but actually a (partial) function. Similarly, from the two
3926 dependencies that are included in the definition of E, we can see that E
3927 represents a (partial) one-one mapping between types.
3928 </para>
3929 <para>
3930 More generally, dependencies take the form <literal>x1 ... xn -&gt; y1 ... ym</literal>,
where x1, ..., xn, and y1, ..., ym are type variables with n&gt;0 and
3932 m&gt;=0, meaning that the y parameters are uniquely determined by the x
3933 parameters. Spaces can be used as separators if more than one variable appears
3934 on any single side of a dependency, as in <literal>t -&gt; a b</literal>. Note that a class may be
3935 annotated with multiple dependencies using commas as separators, as in the
3936 definition of E above. Some dependencies that we can write in this notation are
3937 redundant, and will be rejected because they don't serve any useful
3938 purpose, and may instead indicate an error in the program. Examples of
3939 dependencies like this include <literal>a -&gt; a </literal>,
3940 <literal>a -&gt; a a </literal>,
3941 <literal>a -&gt; </literal>, etc. There can also be
3942 some redundancy if multiple dependencies are given, as in
<literal>a-&gt;b</literal>,
<literal>b-&gt;c </literal>, <literal>a-&gt;c </literal>,
in which some subset implies the remaining dependencies. Examples like this are
3946 not treated as errors. Note that dependencies appear only in class
3947 declarations, and not in any other part of the language. In particular, the
3948 syntax for instance declarations, class constraints, and types is completely
3949 unchanged.
3950 </para>
3951 <para>
3952 By including dependencies in a class declaration, we provide a mechanism for
3953 the programmer to specify each multiple parameter class more precisely. The
3954 compiler, on the other hand, is responsible for ensuring that the set of
3955 instances that are in scope at any given point in the program is consistent
3956 with any declared dependencies. For example, the following pair of instance
3957 declarations cannot appear together in the same scope because they violate the
3958 dependency for D, even though either one on its own would be acceptable:
3959 <programlisting>
3960 instance D Bool Int where ...
3961 instance D Bool Char where ...
3962 </programlisting>
3963 Note also that the following declaration is not allowed, even by itself:
3964 <programlisting>
3965 instance D [a] b where ...
3966 </programlisting>
3967 The problem here is that this instance would allow one particular choice of [a]
3968 to be associated with more than one choice for b, which contradicts the
3969 dependency specified in the definition of D. More generally, this means that,
3970 in any instance of the form:
3971 <programlisting>
3972 instance D t s where ...
3973 </programlisting>
3974 for some particular types t and s, the only variables that can appear in s are
3975 the ones that appear in t, and hence, if the type t is known, then s will be
3976 uniquely determined.
3977 </para>
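<para>
The consistency rules above can be tried out with a small sketch. The method
<literal>convert</literal> and the two instances are hypothetical, added only to
show how the dependency lets the compiler work out the result type from the
argument type:
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies,
             FlexibleInstances #-}
module Main where

class D a b | a -> b where
  convert :: a -> b

instance D Bool Int where
  convert b = if b then 1 else 0

-- Adding "instance D Bool Char" here would be rejected, because
-- Bool would then determine two different result types.

-- Allowed: every variable in the second type appears in the first.
instance D [a] (Maybe a) where
  convert []      = Nothing
  convert (x : _) = Just x

main :: IO ()
main = do
  print (convert True + 1)   -- result type is fixed to Int by the dependency
  print (convert "hi")       -- result type is fixed to Maybe Char
</programlisting>
</para>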
3978 <para>
3979 The benefit of including dependency information is that it allows us to define
3980 more general multiple parameter classes, without ambiguity problems, and with
3981 the benefit of more accurate types. To illustrate this, we return to the
3982 collection class example, and annotate the original definition of <literal>Collects</literal>
3983 with a simple dependency:
3984 <programlisting>
3985 class Collects e ce | ce -> e where
3986 empty :: ce
3987 insert :: e -> ce -> ce
3988 member :: e -> ce -> Bool
3989 </programlisting>
3990 The dependency <literal>ce -&gt; e</literal> here specifies that the type e of elements is uniquely
3991 determined by the type of the collection ce. Note that both parameters of
3992 Collects are of kind *; there are no constructor classes here. Note too that
3993 all of the instances of Collects that we gave earlier can be used
3994 together with this new definition.
3995 </para>
3996 <para>
3997 What about the ambiguity problems that we encountered with the original
3998 definition? The empty function still has type Collects e ce => ce, but it is no
3999 longer necessary to regard that as an ambiguous type: Although the variable e
4000 does not appear on the right of the => symbol, the dependency for class
4001 Collects tells us that it is uniquely determined by ce, which does appear on
4002 the right of the => symbol. Hence the context in which empty is used can still
4003 give enough information to determine types for both ce and e, without
4004 ambiguity. More generally, we need only regard a type as ambiguous if it
4005 contains a variable on the left of the => that is not uniquely determined
4006 (either directly or indirectly) by the variables on the right.
4007 </para>
4008 <para>
4009 Dependencies also help to produce more accurate types for user defined
4010 functions, and hence to provide earlier detection of errors, and less cluttered
4011 types for programmers to work with. Recall the previous definition for a
4012 function f:
4013 <programlisting>
f x y = insert x . insert y
4015 </programlisting>
4016 for which we originally obtained a type:
4017 <programlisting>
4018 f :: (Collects a c, Collects b c) => a -> b -> c -> c
4019 </programlisting>
4020 Given the dependency information that we have for Collects, however, we can
deduce that a and b must be equal because they both appear as the first
parameter in a Collects constraint with the same second parameter c. Hence we
4023 can infer a shorter and more accurate type for f:
4024 <programlisting>
4025 f :: (Collects a c) => a -> a -> c -> c
4026 </programlisting>
4027 In a similar way, the earlier definition of g will now be flagged as a type error.
4028 </para>
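<para>
Putting the pieces together, the following sketch combines the annotated
<literal>Collects</literal> class with the list instance given earlier. The method
bodies and the <literal>main</literal> function are assumptions filled in for
illustration:
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies,
             FlexibleInstances #-}
module Main where

class Collects e ce | ce -> e where
  empty  :: ce
  insert :: e -> ce -> ce
  member :: e -> ce -> Bool

instance Eq e => Collects e [e] where
  empty  = []
  insert = (:)
  member = elem

-- With the dependency, both arguments must have the same element type.
f :: Collects a c => a -> a -> c -> c
f x y = insert x . insert y

-- g = f True 'a'   -- would now be reported as a type error

main :: IO ()
main = do
  let xs = f 'a' 'b' empty :: String
  print xs
  print (member 'a' xs)
</programlisting>
</para>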
<para>
Although we have given only a few examples here, it should be clear that the
addition of dependency information can help to make multiple parameter classes
more useful in practice, avoiding ambiguity problems, and allowing more general
sets of instance declarations.
</para>
</sect4>
</sect3>
</sect2>

<sect2 id="instance-decls">
<title>Instance declarations</title>

<para>An instance declaration has the form
<screen>
instance ( <replaceable>assertion</replaceable><subscript>1</subscript>, ..., <replaceable>assertion</replaceable><subscript>n</subscript>) =&gt; <replaceable>class</replaceable> <replaceable>type</replaceable><subscript>1</subscript> ... <replaceable>type</replaceable><subscript>m</subscript> where ...
</screen>
The part before the "<literal>=&gt;</literal>" is the
<emphasis>context</emphasis>, while the part after the
"<literal>=&gt;</literal>" is the <emphasis>head</emphasis> of the instance declaration.
</para>
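<para>
To make the two parts concrete, here is a small sketch. The class
<literal>Pretty</literal> and the type <literal>Pair</literal> are
hypothetical names chosen for illustration:
<programlisting>
module InstanceForm where

-- Hypothetical class with a single method.
class Pretty a where
  pretty :: a -&gt; String

-- Hypothetical two-parameter data type.
data Pair a b = Pair a b

-- Context: (Show a, Show b).  Head: Pretty (Pair a b).
instance (Show a, Show b) =&gt; Pretty (Pair a b) where
  pretty (Pair x y) = "(" ++ show x ++ ", " ++ show y ++ ")"
</programlisting>
The head here has the Haskell 98 shape: a class applied to a type
constructor whose arguments are distinct type variables.
</para>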

<sect3 id="flexible-instance-head">
<title>Relaxed rules for the instance head</title>

<para>
In Haskell 98 the head of an instance declaration
must be of the form <literal>C (T a1 ... an)</literal>, where
<literal>C</literal> is the class, <literal>T</literal> is a data type constructor,
and the <literal>a1 ... an</literal> are distinct type variables.
GHC relaxes these rules in two ways.
<itemizedlist>
<listitem><para>
With the <option>-XTypeSynonymInstances</option> flag, instance heads may use type
synonyms. As always, using a type synonym is just shorthand for
writing the right-hand side of the type synonym definition. For example:
<programlisting>
type Point a = (a,a)
instance C (Point a) where ...
</programlisting>
is legal. The instance declaration is equivalent to
<programlisting>
instance C (a,a) where ...
</programlisting>
Type synonyms must still be fully applied. You cannot, for example, write:
<programlisting>
instance Monad Point where ...
</programlisting>
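A complete module along these lines might look as follows. This is only a
sketch: the method <literal>describe</literal> and the
<literal>Show</literal> context are hypothetical additions. Because the
expanded head <literal>C (a,a)</literal> repeats a type variable, the sketch
enables <option>-XFlexibleInstances</option> as well:
<programlisting>
{-# LANGUAGE TypeSynonymInstances, FlexibleInstances #-}
module PointInstance where

-- Hypothetical class standing in for C.
class C a where
  describe :: a -&gt; String

type Point a = (a, a)

-- The synonym is fully applied; this is shorthand for: instance ... C (a, a)
instance Show a =&gt; C (Point a) where
  describe (x, y) = show x ++ " / " ++ show y
</programlisting>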
</para></listitem>

<listitem>
<para>
The <option>-XFlexibleInstances</option> flag allows the head of the instance
declaration to mention arbitrary nested types.
For example, this becomes a legal instance declaration:
<programlisting>
instance C (Maybe Int) where ...
</programlisting>
See also the <link linkend="instance-overlap">rules on overlap</link>.
</para>
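<para>
As a minimal sketch, the instance above can be filled in like this. The
method <literal>describe</literal> is a hypothetical name used only for
illustration:
<programlisting>
{-# LANGUAGE FlexibleInstances #-}
module FlexibleHead where

-- Hypothetical class standing in for C.
class C a where
  describe :: a -&gt; String

-- The head mentions the nested concrete type Int, which Haskell 98 forbids.
instance C (Maybe Int) where
  describe Nothing  = "no number"
  describe (Just n) = "number " ++ show n
</programlisting>
An argument whose type is not known to be <literal>Maybe Int</literal>
may need a type annotation before this instance is selected.
</para>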
<para>