1 \documentclass{article}
2 \usepackage{code,a4wide}
4 \usepackage{graphics,epsfig,epic,eepic,epsfig}
6 \setlength{\parskip}{0.25cm}
7 \setlength{\parsep}{0.25cm}
8 \setlength{\topsep}{0cm}
9 \setlength{\parindent}{0cm}
10 \renewcommand{\textfraction}{0.2}
11 \renewcommand{\floatpagefraction}{0.7}
14 % Terminology
15 \newcommand{\block}{block}
16 \newcommand{\Block}{Block}
17 \newcommand{\segment}{segment}
18 \newcommand{\Segment}{Segment}
19 \newcommand{\step}{step}
20 \newcommand{\Step}{Step}
22 \newcommand{\note}[1]{{\em $\spadesuit$ #1}}
24 \begin{document}
25 \title{Implementation of Retainer Profiling}
26 \author{Sungwoo Park and Simon Peyton-Jones}
28 \makeatactive
29 \maketitle
31 \section{Retainer Profiling}
33 Retainer profiling~\cite{CN} is a profiling technique which is based upon a
34 special view of production and consumption of heap objects at runtime:
35 while producers build heap objects to form new pieces of graph,
36 consumers hold pointers to these heap objects, or \emph{retain} them, so
37 that they are not freed during garbage collections.
38 On this basis, we refereed to such consumers as \emph{retainers}.
39 Notice that an object can have more than one retainer because it can
40 be pointed to by multiple objects.
42 For each live object in the heap, retainer profiling computes
43 all its retainers, or its \emph{retainer set}.
44 A naive implementation of retainer profiling could consider every
45 immediate ancestor of an object as its retainer.
46 Although this approach appears to provide an accurate report on the
47 relationship between retainers and retainees, the result can hardly be useful.
48 For instance, it is pointless to scrutinize a list and treat each cons
49 cell as a retainer of the following cons cell.
50 This observation suggests that we need to have a way of designating only
51 certain types of objects as candidates for retainers.
52 In other words, we need to employ an oracle which tells whether a given
53 object can be a retainer or not.
55 Since no retainer of a particular object needs to be using the
56 object actively, we can find all its retainers simply by traversing
57 the graph. In other words, we do not have to distinguish those retainers
58 actively exploiting it from other retainers just holding pointers
59 to it (either directly or indirectly).
60 Thus, retainer profiling can be accomplished simply by traversing the
61 graph.
63 Figure~\ref{fig-retaineralgorithm} shows the algorithm for retainer
64 profiling. The traversal begins at every root, and proceeds
65 in a depth first manner (or a breadth first manner).
66 The function @R()@ returns the \emph{identity} of a given retainer such
67 as its information table address or
68 the name of the module which creates it.
69 Notice that the retainer identity function does not need to be a
70 one-to-one mapping:
71 multiple objects can share the same identity.
72 Such a retainer identity function reduces the cost of traversal.
73 For instance, when an object
74 is reached from several retainers which share the same identity, we need to
75 consider only the first visit to the object.
76 In other words, whichever retainer (among those sharing the same identity)
77 leads to the object for the first time affects the retainer set of the object
78 in consideration
79 and all the other retainers can be ignored.
80 Thus, the more coarse the function @R()@ is, the less
81 it costs to traverse the graph for retainer profiling.
82 The function @isRetainer()@ tells whether a given object is a retainer or not.
83 Hence, the behavior of the retainer profiling algorithm is parameterized
84 over: 1) the set of roots; 2) the function @R()@; 3) the function
85 @isRetainer()@.
87 One important invariant on the function @R()@ is that its return value
88 must be consistent for a given retainer. In other words, @R()@ must return
89 the same value for a given retainer no matter it is invoked.
90 For this reason, the memory address of a retainer, for instance, cannot be used as
91 its retainer identity because its location may change during garbage collections.
93 \begin{figure}[ht]
94 \begin{center}
95 \begin{code}
96 for every root r
97 retain(r, r)
99 R(r) =
100 the identity of r
102 isRetainer(c) =
103 if c is a retainer then
104 true
105 else
106 false
108 retain(c, r) =
109 if R(r) is a member of c.retainerSet then
110 return
112 if isRetainer(c) then
113 r' := c
114 else
115 r' := r
116 for every successor c' of c
117 retain(c', r')
118 \end{code}
119 \caption{Retainer profiling algorithm}
120 \label{fig-retaineralgorithm}
121 \end{center}
122 \end{figure}
124 Another way of formulating retainer profiling is in terms of fixed point
125 equations on retainer sets.
126 To be specific, given the two functions @isRetainer()@ and @R()@,
127 the retainer set of every object is computed as the least fixed point
128 solution of the following equations:
129 \begin{itemize}
130 \item For every root @r@,
131 \begin{center}
132 @R(r)@ $\in$ @r.retainerSet@.
133 \end{center}
134 \item For every reachable object @c@,
135 \begin{center}
136 $\bigcup_{\mathit{each\ ancestor\ @a@\ of\ @c@}}$ @from(a)@ $\subseteq$
137 @c.retainerSet@
138 \end{center}
139 where @from(a)@ returns retainer(s) obtainable from @a@:
140 \begin{center}
141 @from(a)@ = if @isRetainer(a)@ then $\{@a@\}$ else @a.retainerSet@.
142 \end{center}
143 \end{itemize}
145 This document describes the implementation of retainer profiling on
146 the Glasgow Haskell Compiler runtime system.
147 It explains every detail of the development so that it can be (hopefully)
148 a complete maintenance guide.
149 A secondary goal is to help (hopefully) those who wish to extend the system
150 to implement another profiling scheme.\footnote{Unless otherwise mentioned,
151 all identifiers are defined in @RetainerProfile.c@ or @RetainerSet.c@.}
153 \section{Installing the GHC}
155 Installing the GHC is done as follows:
157 \begin{enumerate}
158 \item Get the source code from the CVS repository.
159 \begin{code}
160 ./ cvs checkout fpconfig
161 ./fptools/ cvs update -d CONTRIB common-rts distrib docs ghc glafp-utils
162 hslibs literate mhms mk nofib testsuite
163 \end{code}
165 \item Set up the basic configuration.
166 \begin{code}
167 ./fptools/ autoconf
168 ./fptools/ghc/ autoconf
169 ./fptools/ configure
170 \end{code}
172 \item Set up the configuration for development and debugging.
173 \begin{code}
174 ./fptools/mk vi build.mk
175 GhcHcOpts = -O -fasm -Rghc-timing
176 SplitObjs = NO
177 GhcRtsHcOpts =
178 GhcRtsCcOpts = -g
179 STRIP_CMD =:
180 \end{code}
181 @GhcLibWays@ tells the compiler to build the code for profiling as well.
182 @GhcRtsHcOpts@ has additional flags for @gcc@ when compiling @.hc@ files.
183 @GhcRtsCcOpts@ has additional flags for @gcc@ when compiling @.c@ files.
184 Since we will implement retainer profiling in @.c@ files, we turn on the
185 debugging flag @-g@.
186 The empty setting for @STRIP_CMD@ tells the compiler not to remove source code
187 information (generated due to the @-g@ option) from executable files so that
188 they can be examined with @gdb@.
190 \item Remove unnecessary files if needed and build everything.
191 \begin{code}
192 ./fptools/ make
193 \end{code}
194 \end{enumerate}
198 Since every Haskell closure now needs to store its retainer set at runtime,
199 it must be augmented with a new field,
200 namely, a \emph{retainer set field}.
201 This section explains how to add such a field to Haskell closures.
202 It should be clear how to generalize the idea for adding
203 any number of new fields.\footnote{The GHC provides two
204 ways of building executable programs from
205 source files: normal way and profiling way.
206 We are concerned only about profiling way, and all the pieces of code
207 implementing profiling way are wrapped by the @PROFILING@
208 pre-processing directive (as in @\#ifdef PROFILING@).
209 Therefore, all the additions and changes that we make to the source code
210 are assumed to be wrapped by the @PROFILING@ pre-processing
211 directive as well unless otherwise mentioned.}
215 We want to add a retainer set field of type @retainerSet@ to every
216 closure, so we create a new file @includes/StgRetainerProf.h@ where
217 we define the type @retainerSet@.
218 The actual definition of @retainerSet@ will be given later.
220 \begin{code}
221 /* includes/StgRetainerProf.h */
222 typedef ... retainerSet;
223 \end{code}
225 We make type @retainerSet@ to be publicly available by including
226 @includes/StgRetainerProf.h@ itself to @includes/Stg.h@ (not wrapped
227 by @PROFILING@).
229 \begin{code}
230 /* includes/Stg.h */
231 #include "StgRetainerProf.h"
232 \end{code}
234 Then we add a retainer set field @rs@ to the @StgProfHeader@ structure.
236 \begin{code}
237 /* include/Closures.h */
238 typedef struct {
239 CostCentreStack *ccs;
240 retainerSet *rs;
242 \end{code}
244 Now every closure @c@ (including static closures) has a retainer set field,
245 which can be accessed with @c->header.prof.rs@ (where @c@ denotes the
248 \subsection{Changing constants}
250 We are ready to reflect the new size of Haskell closures to other part
251 of the source code.
252 This is accomplished by changing a few constants which specify the size
253 of certain types of closures and their layout.
255 When building the runtime system, the @gcc@ compiler correctly figures out
256 the size of every structure on its own.
257 However,
258 GHC simply reads @includes/Constants.h@ to to determine the size of
259 closures assumed by the runtime system.
260 Thus, we must change the constants used by the GHC itself (as opposed to
261 the runtime system). They are all found in @includes/Constants.h@.
262 We increase each of them by 1 to reflect the retainer set field which is one
263 word long:
264 \begin{code}
265 /* includes/Constants.h */
266 #define PROF_HDR_SIZE 2
267 #define SCC_UF_SIZE 5
268 #define SCC_SEQ_FRAME_SIZE 4
269 \end{code}
270 @PROF_HDR_SIZE@ denotes the size of the structure @StgProfHeader@, which
271 is now two words long.
272 @SCC_UF_SIZE@ and @SCC_SEQ_FRAME_SIZE@ denote the size of the structures
273 @StgUpdateFrame@ and @StgSeqFrame@ (in @includes/Closures.h@) in
274 words.
276 Now we must rebuild the GHC so that, when executed, the code generated by
277 the GHC must now allocate one more word for the retainer set field than before.
279 \begin{code}
280 ./fptools/ghc/ make boot
281 ./fptools/ghc/ make
282 \end{code}
284 The second command @make boot@ instructs the build system to analyze
285 the source code dependency so that the next execution of @make@ correctly
286 finds all required files.
288 Next we change four bitmap constants which specify the layout of
289 certain types of closures.
290 As an example, let us consider @RET_BITMAP@, which specifies the layout
291 of thunk selectors (corresponding to closure type @THUNK_SELECTOR@).
292 Without a retainer set field, there is only one non-pointer (represented
293 by $1$) followed by one or more pointers (represented by $0$) in the closure
294 body and the bitmap representation is $0b1$, or $1$.
295 With a retainer set field, which is not a pointer to another closure and thus
296 represented by $1$, there are two non-pointers, and the bitmap representation
297 is $0b11$, or $3$. Notice that the bitmap is interpreted in reverse order:
298 the least significant bit corresponds to the first word in the closure body,
299 and the second least significant bit to the second word, and so forth.
300 The same rule applies to the other three bitmap constants:
301 @CATCH_FRAME_BITMAP@ (for closure type @CATCH_FRAME@ and structure
302 @StgCatchFrame@),
303 @STOP_THREAD_BITMAP@ (for closure type @STOP_FRAME@ and structure
304 @StgStopFrame@), and
305 @UPD_FRAME_BITMAP@ (for closure type @UPDATE_FRAME@ and structure
306 @StgUpdateFrame@).
308 \begin{code}
309 /* rts/StgStdThunks.hc */
310 #define RET_BITMAP 3
311 /* rts/Exception.hc */
312 #define CATCH_FRAME_BITMAP 15
313 /* rts/StgStartup.hc */
316 #define UPD_FRAME_BITMAP 7
317 \end{code}
319 For most closure types, the new definition of @StgProfHeader@ is
320 automatically propagated to their corresponding structures.
321 However, there are six closures types which are not affected by
322 @StgProfHeader@. They are all stack closures:
323 @RET_DYN@, @RET_BCO@, @RET_SMALL@, @RET_VEC_SMALL@, @RET_BIG@, and
324 @RET_VEC_BIG@.
325 If you want a new field to be added to these closures, you may
326 have to modify their corresponding structures.
328 \textbf{To do:} Presently the above changes introduce two bug in the
329 runtime system.
330 First, @nofib/real/symalg@ ends up with a division-by-zero
331 exception if we add a new field.
332 Second, the runtime system option @-auto-all@ clashes in some test files
333 in the @nofib@ testing suite (e.g., @spectral/expert@).
335 \subsection{Initialization code}
337 When a new closure is allocated, its retainer set field may have to be
338 initialized according to the way that retainer profiling is implemented.
339 For instance, we could use as an initial value a pointer to an empty retainer
340 set.
341 Alternatively we could assign a null pointer to indicate that its retainer
342 set has not been computed yet, which we adopt in our implementation.
343 In either case, we have to visit the new closure and execute some initialization
344 code on it so that its retainer set field is set to an appropriate value.
346 There are three parts in the source code which need to be modified.
347 dynamic closure initialization, static closure initialization,
348 and update frame initialization.
349 The first is accomplished by modifying the macro @SET_PROF_HDR()@ (in
350 @include/ClosureMacros.h@). When a closure @c@ is created at runtime,
351 @SET_PROF_HDR()@ is invoked immediately with @c@ as its first argument.
352 Thus, the following code initializes the retainer set field of every
353 dynamic closure to a null pointer.
355 \begin{code}
356 /* include/ClosureMacros.h */
357 #define SET_PROF_HDR(c,ccs_) \
359 \end{code}
361 Similarly, the macro @SET_STATIC_PROF_HDR()@ (in the
362 same file) specifies how the retainer set field of every static closure
363 is initialized, which is rewritten as follows:
365 \begin{code}
366 /* include/ClosureMacros.h */
367 #define SET_STATIC_PROF_HDR(ccs_) \
368 prof : { ccs : ccs_, rs : NULL },
369 \end{code}
371 \textbf{Obsolete:} Dynamic closures created through explicit C function invocations
372 (in @RtsAPI.c@) are now initialized by @SET_HDR()@.
374 %There is another way of creating dynamic closures through explicit C
375 %function invocations at runtime.
376 %Such functions are all defined in @RtsAPI.c@: @rts_mkChar()@, @rts_mkInt()@,
377 %@rts_mkWord()@, and so forth.
378 %Each function allocates memory for a new closure,
379 %initializes it, and returns its address.
380 %Therefore, we can simply insert in each function another initialization code
381 %for retainer set fields.
382 %To this end, we define a macro @setRetainerField()@ and insert it
383 %in each function:
384 %
385 %\begin{code}
386 %#define setRetainerField(p) \
388 %\end{code}
389 %
390 %For instance, @rts_mkChar()@ is now defined as follows:
391 %
392 %\begin{code}
393 %/* RtsAPI.c */
395 %rts_mkChar (HsChar c)
396 %{
397 % StgClosure *p = (StgClosure *)allocate(CONSTR_sizeW(0,1));
398 % ...
399 % setRetainerField(p);
400 % return p;
401 %}
402 %\end{code}
404 Finally we may need to initialize the retainer set field of an update frame
405 (stack closure) when it is pushed onto the stack for the first time.
406 For instance, if we want to initialize the retainer set field of update
407 frames to a null pointer, we can rewrite the macro @PUSH_STD_CCCS()@
410 \begin{code}
412 #define PUSH_STD_CCCS(frame) \
414 \end{code}
416 In our implementation of retainer profiling, however, the retainer set field is not
417 used for any stack closure.
418 Hence, the above modification is entirely unnecessary.
419 Also, update frames are the only exception to the standard way of creating
420 stack closures: all the other types of stack closures with a retainer set
421 field are eventually initialized by
422 the macro @SET\_HDR()@ (in @includes/ClosureMacros.h@), which in turn
423 invokes @SET\_PROF\_HDR()@. This is not the case for update frames.
424 Compare @PUSH\_UPD\_FRAME()@ (in @includes/Updates.h@) and
425 @PUSH\_SEQ\_FRAME()@ (in @includes/StgMacros.h@) for clarification.
427 \section{Retainer Sets}
429 At the end of retainer profiling, every live closure (except stack
430 closures, for which we do not compute retainer sets) is associated with
431 a retainer set; there can be no closure without an associated retainer set
432 because every live closure is visited during traversal.
433 Since many closures may well be associated with a common retainer set,
434 we want to avoid creating any particular retainer set more than once.
435 This section presents the details of manipulating retainer sets in our
436 implementation.
438 \subsection{Identity of a retainer}
440 The function @R()@ in Figure~\ref{fig-retaineralgorithm} returns
441 the identity of a retainer. In order to implement it, we need
442 a type for retainer identity.
443 The type @retainer@ (in @includes/StgRetainerProf.h@) is introduced for
444 this purpose.
446 There are various ways of defining the type @retainer@.
447 For instance, we can designate the information table address of a retainer as
448 its identity as follows:
450 \begin{code}
451 struct _StgInfoTable;
452 typedef struct _StgInfoTable *retainer;
453 \end{code}
455 We can also use the cost centre stack associated with the retainer:
457 \begin{code}
458 typedef CostCentreStack *retainer;
459 \end{code}
461 The function @R()@ is embodied as the function @getRetainerFrom()@ in the
462 implementation, whose type is @(StgClosure *)@ $\rightarrow$ @retainer@.
463 It is straightforward to define @getRetainerFrom()@ according to the definition
464 of @retainer@, as illustrated below:
466 \begin{code}
467 retainer getRetainerFrom(StgClosure *c) { return get_itbl(c); }
468 retainer getRetainerFrom(StgClosure *c) { return c->header.prof.ccs; }
469 \end{code}
471 \subsection{Retainer sets and the cost function}
473 A retainer set is stored in the structure @retainerSet@
474 (in @includes/StgRetainerProf.h@):
476 \begin{code}
477 typedef struct _retainerSet {
478 nat num;
479 nat cost;
480 ...
481 int id;
482 retainer element[0];
483 } retainerSet;
484 \end{code}
486 The field @num@ gives the number of retainers in the retainer set, which
487 are all stored in the array @element[]@. Thus, the size of @element[]@
488 is assumed to be @num@.
489 The field @cost@ gives the sum of the \emph{costs} of those closures
490 associated with the retainer set: if a closure @c@ is
491 associated with the retainer set, that is, if @c@ is retained by each
492 retainer in the retainer set and none else,
493 the cost of @c@ is added to the field @cost@.
494 The field @id@ gives a unique identification number for the retainer set.
496 The interface to @retainerSet@ is as follows
497 (see @RetainerSet.h@):
499 \begin{description}
500 \item[@void initializeAllRetainerSet(void)@] initializes the store for retainer sets.
501 \item[@void refreshAllRetainerSet(void)@] refreshes each retainer set by setting
502 its @cost@ field to zero. This function does destroy any retainer set.
503 \item[@void closeAllRetainerSet(void)@] destroys all retainer sets and closes
504 the store for retainer sets.
505 \item[@retainerSet *singleton(retainer r)@] returns a singleton retainer set
506 consisting of @r@ alone. If such a retainer set already exists, no new retainer
507 set is created. Otherwise, a new retainer set is created.
508 \item[@retainerSet *addElement(retainer r, retainerSet *rs)@] returns a retainer set
509 @rs@ augmented with @r@. If such a retainer set already exists, no new retainer set
510 is created. Otherwise, a new retainer set is created.
511 \item[@rtsBool isMember(retainer r, retainerSet *rs)@] returns a boolean value
512 indicating whether @r@ is a member of @rs@.
513 \item[@void traverseAllRetainerSet(void (*f)(retainerSet *))@] invokes the function
514 @f@ on every retainer set created.
515 \item[@void printRetainerSetShort(FILE *, retainerSet *)@] prints a single retainer
516 set.
517 \item[@void outputRetainerSet(FILE *, nat *allCost, nat *numSet)@] prints all
518 retainer sets. Stores the sum of all their costs in @*allCost@ and the number
519 of them in @*numSet@.
520 \item[@void outputAllRetainerSet(FILE *)@] prints all retainer sets.
521 \end{description}
523 We also define a \emph{cost function}, which returns the cost of a given closure,
524 in order to compute the field @cost@.
525 The cost function can be defined in several ways.
526 A typical definition is on the size of a closure, which results in
527 the field @cost@ accumulating the size of all the closures retained by a
528 retainer set.
529 If we just want to count the number of closures retained by the
530 retainer set, we can simply set the cost of every closure to one regardless
531 of its closure type.
532 Furthermore, we can define the cost function flexibly according to
533 the closure type.
534 For instance, we can set the size of any static closure to zero so that
535 it is not taken into account at all in computing the field @cost@.
536 Notice that static closures are also visited during traversal because they
537 may lead to other dynamic closures (e.g., static indirection closures of
538 closure type @IND_STATIC@).
539 This is especially desirable because we usually focus on the heap use.
540 We can also selectively choose certain dynamic closure types not to contribute
541 to the field @cost@.
543 In our implementation, there are two functions related with the cost function:
544 @cost()@ and @costPure()@.
545 @cost()@ returns the size of the entire memory allocated for a given closure
546 (even including the two fields in the structure @StgProfHeader@).
547 It returns zero for static closures.
548 @costPure()@ returns the size of the memory which would be allocated for
549 a given closure with no profiling.
550 It is defined in terms of @cost()@, and it suffices to change only @cost()@
551 when a new scheme for the cost function is desired.
552 @costPure()@ is put to actual use in computing the field @cost@ because it
553 effectively hides the memory overhead incurred by profiling.
555 \subsection{Implementation}
557 The algorithm for retainer profiling in Figure~\ref{fig-retaineralgorithm}
558 adds at most one retainer to an existing retainer set (or an empty retainer set)
559 at any moment; it does not require a retainer set union operation.
560 This observation simplifies the implementation, and
561 we employ the following two functions for creating new retainer sets:
562 @singleton()@, which creates a singleton retainer set, and
565 It is a frequent operation during retainer profiling to search for a retainer
566 set, which may or may not exist, built from a given retainer set and a
567 particular retainer.
568 To efficiently implement this operation,
569 we choose to store all retainer sets in a hash table and
570 the structure @retainerSet@ is now extended with two new fields
572 The field @hashKey@ stores the hash key which is obtained
573 from the retainers in a retainer set.
574 The field @link@ points to the next retainer set in the same bucket:
576 \begin{code}
577 typedef struct _retainerSet {
578 ...
579 StgWord hashKey;
581 ...
582 } retainerSet;
583 \end{code}
585 The hashing function must be defined in such a way that a retainer set
586 can have only one unique hash key regardless of the order its elements
587 are stored, i.e., the hashing function must be additive.
589 It is often observed that two successive executions of retainer profiling share
590 a number of retainer sets in common, especially if the two executions are
591 close in time.
592 This also implies that the number of all retainer sets which can be created
593 at any moment does not grow indefinitely regardless of the interval at which
594 retainer profiling is performed; it does not grow commensurately with the
595 number of times retainer profiling is executed.
596 This observation eliminates the need to free the memory allocated for
597 retainer sets; we can simply set the @cost@ field of every retainer set
598 to zero after each retainer profiling and reuse it during the next time.
600 \section{Graph Traversal}
602 At the heart of retainer profiling lies \emph{graph traversal};
603 the algorithm in Figure~\ref{fig-retaineralgorithm} is supposed to visit
604 every closure in the graph at least once and yield statistics on the heap use.
605 Since only live closures are reachable from the root, the algorithm
606 does not deal with dead closures.
608 This section presents details on how to achieve an efficient implementation of
609 graph traversal without incurring extra memory overhead and compromising speed.
611 \subsection{Goal}
613 Traversing a graph itself can be done in a straightforward way;
614 we choose either depth first search or breadth first search, and traverse
615 the graph starting from a given set of roots.
616 After a complete traversal, each live closure @c@ (including static closures)
617 has an associated retainer set, whose address is stored in the field
620 A real complication arises when retainer profiling is performed once again:
621 all live closures which have survived all garbage collections since
622 the previous retainer profiling
623 still have an associated retainer set (indicated by
624 a non-null pointer in their retainer set field), which is no longer
625 valid. Any new closure created since then has
626 a null pointer in its retainer set field at the beginning of retainer
627 profiling and will become associated with a retainer set.
628 Thus, we can no longer distinguish valid retainer set fields
629 from invalid ones.
631 A simple remedy is to linearly scan the heap at the beginning of each
632 retainer profiling and set all retainer set fields to a null pointer.
633 It resets the retainer set field of each dynamic closure, whether it is
634 live or not with respect to the given set of root.
635 This is feasible because any closure in the heap directly adjoins the
636 next closure, if any.
637 The problem is that we have no way of visiting all live static closures,
638 for which we compute retainer sets.
640 A moment of thought, however, reveals that we can completely avoid computing
641 retainer sets for static closures. This is because retainer profiling is
642 concerned only about the heap, which consists of dynamic closures and no
643 static closures. In other words, we can treat every static closure as
644 a bridge connecting two dynamic closures.
645 For instance, if a dynamic closure @c@$_1$ has a pointer to a static
646 closure @s@ and @c@ has a pointer to another dynamic closure @c@$_2$,
647 we can think of the pointer in @c@$_1$ as a direct pointer to @c@$_2$.
648 The big problem is that if the graph has a cycle containing static closures,
649 an infinite loop occurs. In other words, we have no way of telling whether
650 a static closure has been visited or not and are forced to compute
651 retainer sets for static closures as well.\footnote{For instance,
652 a static closure is allowed to have a self-reference in its SRT, which
653 is also followed during retainer profiling.}
655 Another remedy is to stores in every closure a time stamp for the
656 retainer set field. The time stamp indicates whether the retainer
657 set field is valid or no longer valid (i.e., it is for the previous
658 retainer profiling).
659 At the cost of one extra field in each closure, we can achieve an
660 elegant implementation with little complication.
661 However, it turns out that the memory overhead is too big.\footnote{A typical
662 dynamic closure is only two or three words long.}
663 Thus, our goal is to stick to the definition of the structure @StgProfHeader@
664 given earlier and yet to achieve an elegant solution.
666 \subsection{Basic plan}
668 Since we visit every live object and update its retainer set field,
669 any retainer set field can either be valid (the corresponding retainer
670 set is valid) or point to a retainer set created during the previous
671 retainer profiling.
672 In order to distinguish valid retainer set fields
673 from invalid ones, we exploit the least significant bit of the retainer
674 set field: we maintain a one bit mark which flips over every time
675 retainer profiling is performed, and judge that a retainer set field is
676 valid only if its least significant bit matches the mark.
677 The variable @flip@ serves for this purpose.
678 The macros @isRetainerSetFieldValid()@ tests if the retainer set field
679 of a give closure @c@ is valid:
681 \begin{code}
682 #define isRetainerSetFieldValid(c) \
683 ((((StgWord)(c)->header.prof.rs & 1) ^ flip) == 0)
684 \end{code}
686 As an example, a retainer set field can be set to a null value conforming
687 the current value of @flip@ by the macro @setRetainerSetToNull()@:
689 \begin{code}
690 #define setRetainerSetToNull(c) \
691 (c)->header.prof.rs = (retainerSet *)((StgWord)NULL | flip)
692 \end{code}
694 Now, when a dynamic closure @c@ is created, its retainer set field is
695 initialized to a null value conforming to the current value of
696 @flip@:\footnote{Actually this is not mandatory: even when the null
697 value does not conform to the current value of @flip@, it will be replaced
698 by a correct null value when @c@ is visited later.}
700 \begin{code}
701 extern StgWord flip;
702 #define SET_PROF_HDR(c,ccs_) \
704 \end{code}
706 We do not need to revise @SET_STATIC_PROF_HDR()@ if the initial value of
707 @flip@ is set to $0$.\footnote{For the same reason, an initial value $1$
708 does not compromise the correctness of the implementation.}
710 \subsection{Set of roots}
712 The set of roots consists of all thread closures (running, sleeping, or
713 blocked) existing at the beginning of a retainer profiling.
714 It is handily obtained in an indirect way by invoking the function
715 @GetRoots()@ (in @Schedule.c@) with an appropriate argument, which must be
716 a function:
717 @GetRoots()@ invokes on each root known to the runtime system its argument.
718 Thus, we implement a function @retainClosure()@, which initiates traversal
719 from a given root and updates the retainer set of every closure reachable
720 from the root,
721 and invokes @GetRoots()@ with @retainClosure@ as an argument.
723 In addition to the thread closures, weak pointers are also considered
724 as roots; they may not be reachable from any thread closure yet are still
725 being in used.
726 A weak pointer has three pointer fields: @key@, @value@, and
727 @finalizer@ (see the structure @StgWeak@ in @includes/Closures.h@).
728 It turns out that these pointers may not be valid all the time:
729 at a certain point during execution, for instance, the pointer @key@ may point
731 However, right after a major garbage collection, all the three pointers are
732 guaranteed to be valid, i.e., they all point to live closures.
733 This facilitates the handling of weak pointers if we choose to
734 perform retainer profiling immediately after a major garbage collection.
735 All weak pointers are found in the linked list @weak_ptr_list@
736 (in @Weak.c@).
738 See the function @computeRetainerSet()@ for details.
740 \subsection{Static closures}
742 When a live dynamic closure @c@ is visited for the first time during traversal,
743 its retainer set field is checked against the current value of @flip@.
744 If it was created at some point since the previous retainer profiling,
745 its retainer set field is already set to a correct null value.
746 Otherwise, it must have been visited
747 during the previous retainer profiling and thus its retainer set field is
748 invalid and will be set to a correct null value.
749 Therefore it is unnecessary to visit all dynamic closures and set their
750 retainer set field to a correct null value at the beginning of each retainer
751 profiling.
753 However, this operation is required for static closures.
754 The reason is that a static closure, which is never garbage collected,
755 may appear alternately in the set of live closures.
756 In other words, a currently live static closure may become dead and
757 be resuscitated again.
758 Therefore, for a static closure, it does not help to check if its
759 retainer set field conforms to the current value of @flip@.
760 For instance,
761 if a static closure happens to belong to the set of live closures every other
762 retainer profiling, its retainer set field will never set to a null value,
763 which is disastrous.
764 Therefore, we choose to visit all live static closures at the beginning
765 of each retainer profiling and set their retainer set field to a
766 correct null value.
768 In order to find all live static closures, we have each retainer
769 profiling preceded by a major garbage collection, which knows all live
770 static closures.\footnote{This is a heavy
771 restriction on retainer profiling, which makes retainer profiling partially
772 dependent on garbage collection.
773 However, it does not affect any retainer profiling result because
774 retainer profiling considers only live closures, which survive any
775 garbage collection.}
776 To be specific, the garbage collector builds a linked list
777 @scavenged_static_objects@ (in @GC.c@) during a major garbage collection,
778 which stores all live static closures of our interest.
779 \footnote{
780 A static closure of closure type @IND\_STATIC@ may be put in the
781 list @mut\_once\_list@ of the oldest generation, instead of the list
782 @scavenged\_static\_objects@.
783 In our implementation, such a closure is just skipped over because it
784 contains only a pointer to a dynamic closure, and we do not compute
785 its retainer set.
786 Thus, there is no need to traverse the list @mut\_once\_list@ of the oldest
787 generation.}
788 Since it destroys the linked list after finishing the major garbage collection
789 (by invoking the function @zero_static_object_list()@ with
790 @scavenged_static_objects@ as its argument),
791 we traverse the linked list to set the retainer set field of each
792 live static closure to a correct null value before its destruction.
793 This is done by invoking the function
794 @resetStaticObjectForRetainerProfiling()@.
796 \textbf{To do:} In the current implemenation, if a static closure has no child
797 (e.g., @CONSTR_NOCAF_STATIC@, @THUNK_STATIC@ with an empty SRT, and
798 @FUN_STATIC@ with an empty SRT), we do not compute its retainer set (because
799 there is no need to do). This slight optimization allows us to render
800 retainer profiling no longer dependent on garbage collection due to the
801 following propoerty:
803 \begin{center}
804 A static closure can alternately appear and disappear in the set of live
805 closures across multiple executions of retainer profiling if and only if
806 it has an empty SRT and no child.
807 \end{center}
809 Then we can completely eliminate the function
810 @resetStaticObjectForRetainerProfiling()@.
812 \subsection{Traversal}
814 The traversal proceeds in a depth first manner and is implemented
815 with two mutually recursive functions: @retainStack()@ and @retainerClosure()@.
816 @retainerStack()@ can be invoked on dynamic closures holding a stack chunk:
817 closure types @TSO@, @PAP@, and @AP_UPD@.
818 It in turn invokes @retainerClosure()@ on each closure reachable from
819 stack closures in the stack chunk. Notice that it does not invoke
820 @retainerClosure()@ on those stack closures because we do not compute
821 retainer sets for stack closures.
822 @retainerClosure()@ iteratively traverses all live closures reachable
823 from a given closure.
824 It maintains its own stack to record the next scan position in every closure
825 currently under consideration.\footnote{A recursive version of
826 @retainerClosure()@ could be implemented easily.
827 @retainerClosure()@ in our implementation is an iterative version.}
828 When it encounters a closure holding a stack chunk, it invokes @retainerStack()@
829 on that closure.
830 Hence,
831 the traversal is triggered simply by invoking @retainerClosure()@ on every root.
833 \textbf{To do:}
834 The correctness of retainer profiling is subject to the correctness
835 of the two macros @IS_ARG_TAG()@ and @LOOKS_LIKE_GHC_INFO()@
836 (see @retainStack()@). Since
837 @LOOKS_LIKE_GHC_INFO()@ is a bit precarious macro, so I believe that
838 the current implementation may not be quite safe. Also, @scavenge_stack()@
839 in @GC.c@ also exploits this macro in order to identify shallow pointers.
840 This can be a serious problem if a stack chunk contains some
841 word which looks like a pointer but is actually not a pointer.
843 \subsection{Sanity check}
845 Since we assume that a retainer profiling is preceded by a major garbage
846 collection,
847 we expect that the size of all the dynamic closures visited during
848 any retainer profiling adds up exactly to the total size of the heap.
849 In fact, this is not the case; there can be closures not reachable from
850 the set of roots yet residing in the heap even after a major garbage
851 collection.
853 First, a dead weak pointer remains in the heap until its finalizer
854 finishes. Although its finalizer thread closure is part of the set of roots,
855 the dead weak pointer itself is not reachable from any root.
856 Since it cannot be visited during retainer profiling anyhow, we do not
857 need to located it and set its retainer set field
858 appropriately.\footnote{Dead weak pointers are identified with their
859 information table @stg\_DEAD\_WEAK\_info@ (in @StgMiscClosures.hc@).
860 Notice that their closure type is @CONSTR@, \emph{not} @WEAK@;
861 their information table is replaced by @stg\_DEAD\_WEAK\_info@ in the
862 function @scheduleFinalizers()@ (in @GC.c@).}
864 Second,
865 mutable variables (of closure type @MUT_VAR@) may remain in the heap
866 even when they are not reachable from the set of roots while
867 dynamic closures pointed to by them must be live.\footnote{I do not
868 understand clearly why this happens :(}
869 Since such mutable variables may become live again (in the sense that
870 they become reachable from the set of roots), we must locate them
871 and set their retainer set field appropriately after each retainer
872 profiling. This is handily accomplished by traversing the list
873 @mut_once_list@ in every generation.
875 \section{Retainer Profiling Schemes}
877 A retainer profiling scheme specifies \emph{what} retainer profiling
878 yields (as opposed to \emph{how} retainer profiling computes the retainer
879 set for every live object).
880 It is determined primarily by the meaning of retainer identity,
881 that is, the type @retainer@ (in @includes/StgRetainerProf.h@).
882 The function @getRetainerFrom()@ must be defined according to the
883 definition of the type @retainer@.
885 In order for a new retain profiling scheme to fully work, we need to follow
886 four steps:
888 \begin{enumerate}
889 \item Define the type @retainer@ as desired.
890 \item Write @getRetainerFrom()@.
891 \item Write two hashing functions @hashkeySingletone()@ and
892 @hashKeyAddElement()@, which return the hash key from a single
893 retainer and a retainer set with another retainer, respectively.
894 \item Write two printing functions @printRetainer()@ and
895 @printRetainerSetShort()@.
896 These functions are employed when a retainer or a retainer set is
897 printed in the output file.
898 \end{enumerate}
900 In our implementation, we use cost centre stacks for retainer identity:
902 \begin{code}
903 typedef CostCentreStack *retainer;
904 \end{code}
905 \begin{code}
906 retainer getRetainerFrom(StgClosure *c) { return c->header.prof.ccs; }
907 \end{code}
908 \begin{code}
909 void printRetainer(FILE *f, retainer cc)
910 {
911 fprintf(f,"%s[%s]", cc->label, cc->module);
912 }
913 \end{code}
915 \textbf{To do:} All the closures created by @rts_mk...()@ in @RtsAPI.c@ are given
916 @CCS_SYSTEM@ as their cost centre stacks. This may not be accurate indeed,
917 and, for instance, @CCCS@ may be a better choice than @CCS_SYSTEM@.
919 \section{Usage}
921 Since cost centre stacks are used as retainer identity, a source program
922 must be given proper cost centre annotations by programmers.
923 Alternatively,
924 we can ask the compiler to automatically insert cost centre annotations.
925 For instance, the compiler option @-auto-all@ inserts a cost centre
926 annotation around every top-level function as shown below
927 (the @-p@ option is a must
928 because we must build the executable file in a profiling way):
930 \begin{code}
931 $ghc-inplace -o Foo.out -p -auto-all Foo.hs 932 \end{code} 934 The runtime system option @-hR@ tells the executable program to 935 gather profiling statistics and report them in a @.prof@ file: 937 \begin{code} 938$ Foo.out +RTS -hR -RTS
939 \end{code}
941 The option @-i@ can be used to
942 specify a desired interval at which retainer profiling is performed.
943 The default and minimum value is half a second:
945 \begin{code}
946 $Foo.out +RTS -hR -i2.5 -RTS 947 \end{code} 949 Then, two text files are generated: a @.prof@ file and a @.hp@ file. 950 The @.prof@ file records the progress of retainer profiling: 951 for each retainer profiling performed during program execution, 952 it shows 953 the Haskell mutator time (as opposed to the user time) at which 954 the retainer profiling starts, 955 the average number of times a closure is visited, 956 the sum of costs assigned to all retainer sets (obtained from the field 957 @cost@ in each retainer set), 958 and the number of all retainer sets created \emph{since} the beginning 959 of program execution. 960 A typical entry in a @.prof@ file looks like: 962 \begin{code} 963 Retainer Profiling: 3, at 3.530000 seconds 964 Average number of visits per object = 1.687765 965 Current total costs = 801844 966 Number of retainer sets = 118 967 \end{code} 969 The sum of costs assigned to all retainer sets may \emph{not} be equal to the 970 size of the heap. 971 The discrepancy is attributed to those live object which are not reachable 972 from the set of roots. 973 Still it is a good estimate of the size of the heap at the moment when 974 the retainer profiling was performed. 976 The @.prof@ file also shows the contents of every retainer set which 977 has been assigned a positive cost (i.e., the field @cost@) at least once; 978 not every retainer set created is assigned a positive cost because quite 979 a few retainer sets are created as intermediate retainer sets before 980 creating a real retainer set. This results from the restriction on the way 981 retainer sets are created (only one retainer can be added to an existing 982 retainer set at a time). 984 An example of the contents of a retainer set is: 986 \begin{code} 987 SET 71 = {<doFile[Main],main[Main],MAIN[MAIN]>, <synth_2[Main],doFile[Main],main[Main],MAIN[MAIN]>} 988 \end{code} 990 The retainer set has an identification number$71$. 991 It is associated with two retainers, whose retainer identities are shown 992 inside angle brackets @<...>@. 993 For instance, the first retainer is created when the cost centre stack 994 is @doFile[Main],main[Main],MAIN[MAIN]@, shown from the top to the bottom. 995 Each entry in angle brackets consists of a cost centre name (e.g., @doFile@) 996 and its module name (e.g., @Main@). 998 The @.hp@ file can be supplied to the @hp2ps@ program to create a postscript 999 file showing the progress of retainer profiling in a graph: 1001 \begin{code} 1002$ hp2ps Foo.hs
1003 \$ gv Foo.ps
1004 \end{code}
1006 An example of such a graph is shown in Figure~\ref{fig-cacheprof}.
1007 It shows the cost assigned to each retainer set at the point
1008 when a retainer profiling is performed (marked by a corresponding inverted
1009 triangles on the horizontal axis).
1010 The abbreviated contents of each retainer set is displayed in the right column.
1011 Due to the space limitation,
1012 it shows only topmost cost centres (without module names)
1013 instead of printing the full contents.
1014 For instance, @(71)doFile,synth_2@ corresponds to a retainer set shown above
1015 (@71@ is its identification number).
1016 The contents may be truncated if it is too long.
1018 Notice that the time is in the Haskell mutator time, which excludes
1019 the runtime system time such as garbage collection time and retainer profiling
1020 time. Thus, the actual execution takes longer than indicated in the
1021 graph. Also, the timer employed to periodically perform retainer profiling
1022 is not perfectly accurate. Therefore, the result may slightly vary for each
1023 execution of retainer profiling.
1025 \begin{figure}[ht]
1026 \centering
1027 \epsfig{file=cacheprof_p.eps,width=5in}
1028 \caption{A graph showing the progress of retainer profiling}
1029 \label{fig-cacheprof}
1030 \end{figure}
1032 \section{Comparision with nhc}
1034 \section{Files}
1036 This section gives a summary of changes made to the GHC in
1037 implementing retainer profiling.
1038 Only three files (@includes/StgRetainerProf.h@, @RetainerProfile.c@, and
1039 @RetainerProfile.h@) are new, and all others exist in the GHC.
1041 @\includes@ directory:
1043 \begin{description}
1044 \item[StgRetainerProf.h] defines types @retainer@ and @retainerSet@.
1045 \item[Stg.h] includes the header file @StgRetainerProf.h@.
1047 \item[Constants.h] changes constants @PROF_HDR_SIZE@, @SCC_UF_SIZE@, and
1048 @SCC_SEQ_FRAME_SIZE@.
1049 \item[ClosureMacros.h] changes macros @SET_PROF_HDR()@ and
1050 @SET_STATIC_PROF_HDR()@.
1052 \end{description}
1054 @\rts@ directory:
1056 \begin{description}
1057 \item[Exception.hc] changes constant @CATCH_FRAME_BITMAP@,
1059 \item[StgStdThunks.hc] changes constant @RET_BITMAP@.
1061 \item[RetainerProfile.c] implements the retainer profiling engine.
1062 \item[RetainerProfile.h] is the header for @RetainerProfile.c@.
1063 \item[RetainerSet.c] implements the abstract datatype @retainerSet@.
1064 \item[RetainerSet.h] defines the interface for @retainerSet@.
1065 \item[GC.c] invokes @resetStaticObjectForRetainerProfiling()@ in
1066 @GarbageCollect()@.
1067 \item[Itimer.c] changes @handle_tick()@.
1068 \item[ProfHeap.c] changes @initHeapProfiling()@ and @endHeapProfiling()@.
1069 \item[Profiling.c] changes @initProfilingLogFile()@ and
1070 @report_ccs_profiling()@.
1071 \item[Proftimer.c] declares @ticks_to_retainer_profiling@,
1072 @performRetainerProfiling@, and @doContextSwitch@.
1073 \item[Proftimer.h] is the header for @Proftimer.c@. Defines @PROFILING_MIN_PERIOD@,
1074 which specifies the minimum profiling period and the default profiling period.
1075 %\item[RtsAPI.c] implements @setRetainerField()@.
1076 \item[RtsFlags.c]
1077 sets @RtsFlags.ProfFlags.doHeapProfile@ and
1078 adds a string to @usage_text[]@ in @setupRtsFlags()@.
1079 \item[RtsFlags.h] defines constants @HEAP_BY_RETAINER@ and @RETAINERchar@.
1080 \item[RtsStartup.c] includes the header file @RetainerProfile.h@.
1082 \item[Schedule.c] changes @schedule()@.
1083 \item[Stats.c]
1084 declares @RP_start_time@, @RP_tot_time@, @RPe_start_time@,
1085 @RPe_tot_time@.
1086 Changes @mut_user_time_during_GC()@, @mut_user_time()@,
1087 @stat_startExit()@,
1088 @stat_endExit()@, and
1089 @stat_exit()@.
1090 Defines
1091 @mut_user_time_during_RP()@,
1092 @stat_startRP()@, and
1093 @stat_endRP()@.
1094 \item[Stats.h] is the header for @Stats.c@.