Use the BF_EVACUATED flag to indicate to-space consistently
[ghc.git] / docs / coding-style.html
2 <HTML>
3 <HEAD>
4 <meta http-equiv="content-type" content="text/html;charset=iso-8859-1">
5 <TITLE>Style Guidelines for fptools</TITLE>
6 </HEAD>
7 <BODY>
9 <H1>Style Guidelines for fptools</h1>
11 <h2>Comments</h2>
13 <p>These coding style guidelines are mainly intended for use in
14 <tt>ghc/rts</tt> and <tt>ghc/includes</tt>.
16 <p>NB These are just suggestions. They're not set in stone. Some of
17 them are probably misguided. If you disagree with them, feel free to
18 modify this document (and make your commit message reasonably
19 informative) or mail someone (eg. <a
20 href="">The GHC mailing list</a>)
22 <h2>References</h2>
24 If you haven't read them already, you might like to check the following.
25 Where they conflict with our suggestions, they're probably right.
27 <ul>
29 <li>
30 The C99 standard. One reasonable reference is <a
31 href="">here</a>.
33 <p><li>
34 Writing Solid Code, Microsoft Press. (Highly recommended. Possibly
35 the only Microsoft Press book that's worth reading.)
37 <p><li>
38 Autoconf documentation.
39 See also <a href="">The autoconf macro archive</a> and
40 <a href="">Cyclic Software's description</a>
42 <p><li> <a
43 href="">Indian
44 Hill C Style and Coding Standards</a>.
46 <p><li>
47 <a href="">A list of C programming style links</a>
49 <p><li>
50 <a href="">A very large list of C programming links</a>
52 <p><li>
53 <a href="">A list of Unix programming links</a>
55 </ul>
58 <h2>Portability issues</h2>
60 <ul>
61 <li> We try to stick to C99 where possible. We use the following
62 C99 features relative to C89, some of which were previously GCC
63 extensions (possibly with different syntax):
65 <ul>
66 <li>Variable length arrays as the last field of a struct. GCC has
67 a similar extension, but the syntax is slightly different: in GCC you
68 would declare the array as <tt>arr[0]</tt>, whereas in C99 it is
69 declared as <tt>arr[]</tt>.
71 <p><li>Inline annotations on functions (see later)
73 <p><li>Labeled elements in initialisers. Again, GCC has a slightly
74 different syntax from C99 here, and we stick with the GCC syntax until
75 GCC implements the C99 proposal.
77 <p><li>C++-style comments. These are part of the C99 standard, and we
78 prefer to use them whenever possible.
79 </ul>
81 <p>In addition we use ANSI-C-style function declarations and
82 prototypes exclusively. Every function should have a prototype;
83 static function prototypes may be placed near the top of the file in
84 which they are declared, and external prototypes are usually placed in
85 a header file with the same basename as the source file (although there
86 are exceptions to this rule, particularly when several source files
87 together implement a subsystem which is described by a single external
88 header file).
90 <p><li>We use the following GCC extensions, but surround them with
91 <tt>#ifdef __GNUC__</tt>:
93 <ul>
94 <li>Function attributes (mostly just <code>no_return</code> and
95 <code>unused</code>)
96 <li>Inline assembly.
97 </ul>
99 <p><li>
100 char can be signed or unsigned - always say which you mean
102 <p><li>Our POSIX policy: try to write code that only uses POSIX (IEEE
103 Std 1003.1) interfaces and APIs. We used to define
104 <code>POSIX_SOURCE</code> by default, but found that this caused more
105 problems than it solved, so now we require any code that is
106 POSIX-compliant to explicitly say so by having <code>#include
107 "PosixSource.h"</code> at the top. Try to do this whenever possible.
109 <p><li> Some architectures have memory alignment constraints. Others
110 don't have any constraints but go faster if you align things. These
111 macros (from <tt>config.h</tt>) tell you which alignment to use
113 <pre>
114 /* minimum alignment of unsigned int */
117 /* minimum alignment of long */
118 #define ALIGNMENT_LONG 4
120 /* minimum alignment of float */
121 #define ALIGNMENT_FLOAT 4
123 /* minimum alignment of double */
124 #define ALIGNMENT_DOUBLE 4
125 </pre>
127 <p><li> Use <tt>StgInt</tt>, <tt>StgWord</tt> and <tt>StgPtr</tt> when
128 reading/writing ints and ptrs to the stack or heap. Note that, by
129 definition, <tt>StgInt</tt>, <tt>StgWord</tt> and <tt>StgPtr</tt> are
130 the same size and have the same alignment constraints even if
131 <code>sizeof(int) != sizeof(ptr)</code> on that platform.
133 <p><li> Use <tt>StgInt8</tt>, <tt>StgInt16</tt>, etc when you need a
134 certain minimum number of bits in a type. Use <tt>int</tt> and
135 <tt>nat</tt> when there's no particular constraint. ANSI C only
136 guarantees that ints are at least 16 bits but within GHC we assume
137 they are 32 bits.
139 <p><li> Use <tt>StgFloat</tt> and <tt>StgDouble</tt> for floating
140 point values which will go on/have come from the stack or heap. Note
141 that <tt>StgDouble</tt> may occupy more than one <tt>StgWord</tt>, but
142 it will always be a whole number multiple.
144 <p>
145 Use <code>PK_FLT(addr)</code>, <code>PK_DBL(addr)</code> to read
146 <tt>StgFloat</tt> and <tt>StgDouble</tt> values from the stack/heap,
147 and <code>ASSIGN_FLT(val,addr)</code> /
148 <code>ASSIGN_DBL(val,addr)</code> to assign StgFloat/StgDouble values
149 to heap/stack locations. These macros take care of alignment
150 restrictions.
152 <p>
153 Heap/Stack locations are always <tt>StgWord</tt> aligned; the
154 alignment requirements of an <tt>StgDouble</tt> may be more than that
155 of <tt>StgWord</tt>, but we don't pad misaligned <tt>StgDoubles</tt>
156 because doing so would be too much hassle (see <code>PK_DBL</code> &
157 co above).
159 <p><li>
160 Avoid conditional code like this:
162 <pre>
163 #ifdef solaris_HOST_OS
164 // do something solaris specific
165 #endif
166 </pre>
168 Instead, add an appropriate test to the script and use
169 the result of that test instead.
171 <pre>
172 #ifdef HAVE_BSD_H
173 // use a BSD library
174 #endif
175 </pre>
177 <p>The problem is that things change from one version of an OS to another
178 - things get added, things get deleted, things get broken, some things
179 are optional extras. Using "feature tests" instead of "system tests"
180 makes things a lot less brittle. Things also tend to get documented
181 better.
183 </ul>
185 <h2>Debugging/robustness tricks</h2>
188 Anyone who has tried to debug a garbage collector or code generator
189 will tell you: "If a program is going to crash, it should crash as
190 soon, as noisily and as often as possible." There's nothing worse
191 than trying to find a bug which only shows up when running GHC on
192 itself and doesn't manifest itself until 10 seconds after the actual
193 cause of the problem.
195 <p>We put all our debugging code inside <tt>#ifdef DEBUG</tt>. The
196 general policy is we don't ship code with debugging checks and
197 assertions in it, but we do run with those checks in place when
198 developing and testing. Anything inside <tt>#ifdef DEBUG</tt> should
199 not slow down the code by more than a factor of 2.
201 <p>We also have more expensive "sanity checking" code for hardcore
202 debugging - this can slow down the code by a large factor, but is only
203 enabled on demand by a command-line flag. General sanity checking in
204 the RTS is currently enabled with the <tt>-DS</tt> RTS flag.
206 <p>There are a number of RTS flags which control debugging output and
207 sanity checking in various parts of the system when <tt>DEBUG</tt> is
208 defined. For example, to get the scheduler to be verbose about what
209 it is doing, you would say <tt>+RTS -Ds -RTS</tt>. See
210 <tt>includes/RtsFlags.h</tt> and <tt>rts/RtsFlags.c</tt> for the full
211 set of debugging flags. To check one of these flags in the code,
212 write:
214 <pre>
215 IF_DEBUG(gc, fprintf(stderr, "..."));
216 </pre>
218 would check the <tt>gc</tt> flag before generating the output (and the
219 code is removed altogether if <tt>DEBUG</tt> is not defined).
221 <p>All debugging output should go to <tt>stderr</tt>.
223 <p>
224 Particular guidelines for writing robust code:
226 <ul>
227 <li>
228 Use assertions. Use lots of assertions. If you write a comment
229 that says "takes a +ve number" add an assertion. If you're casting
230 an int to a nat, add an assertion. If you're casting an int to a char,
231 add an assertion. We use the <tt>ASSERT</tt> macro for writing
232 assertions; it goes away when <tt>DEBUG</tt> is not defined.
234 <li>
235 Write special debugging code to check the integrity of your data structures.
236 (Most of the runtime checking code is in <tt>rts/Sanity.c</tt>)
237 Add extra assertions which call this code at the start and end of any
238 code that operates on your data structures.
240 <li>
241 When you find a hard-to-spot bug, try to think of some assertions,
242 sanity checks or whatever that would have made the bug easier to find.
244 <li>
245 When defining an enumeration, it's a good idea not to use 0 for normal
246 values. Instead, make 0 raise an internal error. The idea here is to
247 make it easier to detect pointer-related errors on the assumption that
248 random pointers are more likely to point to a 0 than to anything else.
250 <pre>
251 typedef enum
252 { i_INTERNAL_ERROR /* Instruction 0 raises an internal error */
253 , i_PANIC /* irrefutable pattern match failed! */
254 , i_ERROR /* user level error */
256 ...
257 </pre>
259 <p><li> Use <tt>#warning</tt> or <tt>#error</tt> whenever you write a
260 piece of incomplete/broken code.
262 <p><li> When testing, try to make infrequent things happen often.
263 For example, make a context switch/gc/etc happen every time a
264 context switch/gc/etc can happen. The system will run like a
265 pig but it'll catch a lot of bugs.
267 </ul>
269 <h2>Syntactic details</h2>
271 <ul>
272 <li><b>Important:</b> Put "redundant" braces or parens in your code.
273 Omitting braces and parens leads to very hard to spot bugs -
274 especially if you use macros (and you might have noticed that GHC does
275 this a lot!)
277 <p>
278 In particular:
279 <ul>
280 <li>
281 Put braces round the body of for loops, while loops, if statements, etc.
282 even if they "aren't needed" because it's really hard to find the resulting
283 bug if you mess up. Indent them any way you like but put them in there!
284 </ul>
286 <li>
287 When defining a macro, always put parens round args - just in case.
288 For example, write:
289 <pre>
290 #define add(x,y) ((x)+(y))
291 </pre>
292 instead of
293 <pre>
294 #define add(x,y) x+y
295 </pre>
297 <li> Don't declare and initialize variables at the same time.
298 Separating the declaration and initialization takes more lines, but
299 make the code clearer.
301 <li>
302 Use inline functions instead of macros if possible - they're a lot
303 less tricky to get right and don't suffer from the usual problems
304 of side effects, evaluation order, multiple evaluation, etc.
306 <ul>
307 <li>Inline functions get the naming issue right. E.g. they
308 can have local variables which (in an expression context)
309 macros can't.
311 <li> Inline functions have call-by-value semantics whereas macros
312 are call-by-name. You can be bitten by duplicated computation
313 if you aren't careful.
315 <li> You can use inline functions from inside gdb if you compile with
316 -O0 or -fkeep-inline-functions. If you use macros, you'd better
317 know what they expand to.
318 </ul>
320 However, note that macros can serve as both l-values and r-values and
321 can be "polymorphic" as these examples show:
322 <pre>
323 // you can use this as an l-value or an l-value
324 #define PROF_INFO(cl) (((StgClosure*)(cl))->header.profInfo)
326 // polymorphic case
327 // but note that min(min(1,2),3) does 3 comparisions instead of 2!!
328 #define min(x,y) (((x)<=(y)) ? (x) : (y))
329 </pre>
331 <li>
332 Inline functions should be "static inline" because:
333 <ul>
334 <li>
335 gcc will delete static inlines if not used or theyre always inlined.
337 <li>
338 if they're externed, we could get conflicts between 2 copies of the
339 same function if, for some reason, gcc is unable to delete them.
340 If they're static, we still get multiple copies but at least they don't conflict.
341 </ul>
343 OTOH, the gcc manual says this
344 so maybe we should use extern inline?
346 <pre>
347 When a function is both inline and `static', if all calls to the
348 function are integrated into the caller, and the function's address is
349 never used, then the function's own assembler code is never referenced.
350 In this case, GNU CC does not actually output assembler code for the
351 function, unless you specify the option `-fkeep-inline-functions'.
352 Some calls cannot be integrated for various reasons (in particular,
353 calls that precede the function's definition cannot be integrated, and
354 neither can recursive calls within the definition). If there is a
355 nonintegrated call, then the function is compiled to assembler code as
356 usual. The function must also be compiled as usual if the program
357 refers to its address, because that can't be inlined.
359 When an inline function is not `static', then the compiler must
360 assume that there may be calls from other source files; since a global
361 symbol can be defined only once in any program, the function must not
362 be defined in the other source files, so the calls therein cannot be
363 integrated. Therefore, a non-`static' inline function is always
364 compiled on its own in the usual fashion.
366 If you specify both `inline' and `extern' in the function
367 definition, then the definition is used only for inlining. In no case
368 is the function compiled on its own, not even if you refer to its
369 address explicitly. Such an address becomes an external reference, as
370 if you had only declared the function, and had not defined it.
372 This combination of `inline' and `extern' has almost the effect of a
373 macro. The way to use it is to put a function definition in a header
374 file with these keywords, and put another copy of the definition
375 (lacking `inline' and `extern') in a library file. The definition in
376 the header file will cause most calls to the function to be inlined.
377 If any uses of the function remain, they will refer to the single copy
378 in the library.
379 </pre>
381 <p><li>
382 Don't define macros that expand to a list of statements.
383 You could just use braces as in:
385 <pre>
386 #define ASSIGN_CC_ID(ccID) \
387 { \
388 ccID = CC_ID; \
389 CC_ID++; \
390 }
391 </pre>
393 (but it's usually better to use an inline function instead - see above).
395 <p><li>
396 Don't even write macros that expand to 0 statements - they can mess you
397 up as well. Use the doNothing macro instead.
398 <pre>
399 #define doNothing() do { } while (0)
400 </pre>
402 <li>
403 This code
404 <pre>
405 int* p, q;
406 </pre>
407 looks like it declares two pointers but, in fact, only p is a pointer.
408 It's safer to write this:
409 <pre>
410 int* p;
411 int* q;
412 </pre>
413 You could also write this:
414 <pre>
415 int *p, *q;
416 </pre>
417 but it is preferrable to split the declarations.
419 <li>
420 Try to use ANSI C's enum feature when defining lists of constants of
421 the same type. Among other benefits, you'll notice that gdb uses the
422 name instead of its (usually inscrutable) number when printing values
423 with enum types and gdb will let you use the name in expressions you
424 type.
426 <p>
427 Examples:
428 <pre>
429 typedef enum { /* N.B. Used as indexes into arrays */
437 } ProfilingFlags;
438 </pre>
439 instead of
440 <pre>
441 # define NO_HEAP_PROFILING 0 /* N.B. Used as indexes into arrays */
442 # define HEAP_BY_CC 1
443 # define HEAP_BY_MOD 2
444 # define HEAP_BY_GRP 3
445 # define HEAP_BY_DESCR 4
446 # define HEAP_BY_TYPE 5
447 # define HEAP_BY_TIME 6
448 </pre>
449 and
450 <pre>
451 typedef enum {
452 CCchar = 'C',
453 MODchar = 'M',
454 GRPchar = 'G',
455 DESCRchar = 'D',
456 TYPEchar = 'Y',
457 TIMEchar = 'T'
458 } ProfilingTag;
459 </pre>
460 instead of
461 <pre>
462 # define CCchar 'C'
463 # define MODchar 'M'
464 # define GRPchar 'G'
465 # define DESCRchar 'D'
466 # define TYPEchar 'Y'
467 # define TIMEchar 'T'
468 </pre>
470 <li> Please keep to 80 columns: the line has to be drawn somewhere,
471 and by keeping it to 80 columns we can ensure that code looks OK on
472 everyone's screen. Long lines are hard to read, and a sign that the
473 code needs to be restructured anyway.
475 <li> When commenting out large chunks of code, use <code>#ifdef 0
476 ... #endif</code> rather than <code>/* ... */</code> because C doesn't
477 have nested comments.
479 <li>When declaring a typedef for a struct, give the struct a name
480 as well, so that other headers can forward-reference the struct name
481 and it becomes possible to have opaque pointers to the struct. Our
482 convention is to name the struct the same as the typedef, but add a
483 leading underscore. For example:
485 <pre>
486 typedef struct _Foo {
487 ...
488 } Foo;
489 </pre>
491 <li>Do not use <tt>!</tt> instead of explicit comparison against
492 <tt>NULL</tt> or <tt>'\0'</tt>; the latter is much clearer.
494 <li> We don't care too much about your indentation style but, if
495 you're modifying a function, please try to use the same style as the
496 rest of the function (or file). If you're writing new code, a
497 tab width of 4 is preferred.
499 </ul>
501 <h2>CVS issues</h2>
503 <ul>
504 <li>
505 Don't be tempted to reindent or reorganise large chunks of code - it
506 generates large diffs in which it's hard to see whether anything else
507 was changed.
508 <p>
509 If you must reindent or reorganise, don't include any functional
510 changes that commit and give advance warning that you're about to do
511 it in case anyone else is changing that file.
512 </ul>
515 <h2>Commandline arguments</h2>
517 A program in fptools should try follow the following rules for
518 commandline arguments:
520 <ul>
521 <li> The <code>-v</code> and <code>--verbose</code> options should be
522 used to generate verbose output (intended for the user).
524 <li> The <code>-d</code> and <code>--debug</code> options should be
525 used to generate debugging output (intended for the developer).
527 <li> The <code>-?</code> and <code>--help</code> options should be used
528 to display usage information on stdout. The program should exit
529 successfully afterwards.
531 <li> The <code>-V</code> and <code>--version</code> options should be
532 used to output version information on stdout, which includes one line
533 of the form '<code><em>Program</em> version
534 <em>Major.Minor[.Patchlevel]</em> ... </code>'. The program
535 should exit successfully afterwards.
536 </ul>
538 When an unknown commandline argument is encountered, the program
539 should display usage information on stderr and exit unsuccessfully.
541 </body>
542 </html>