1 \documentclass{article}
3 \usepackage{proof}
4 \usepackage{code}
6 \sloppy
7 \setlength{\parskip}{0.5\baselineskip plus 0.2\baselineskip minus 0.1\baselineskip}
8 \setlength{\parsep}{\parskip}
9 \setlength{\topsep}{0cm}
10 \setlength{\parindent}{0cm}
11 \renewcommand{\textfraction}{0}
12 \renewcommand{\topfraction}{1}
13 \renewcommand{\floatpagefraction}{0.8}
14 \renewcommand{\dblfloatpagefraction}{0.8}
17 \newcommand{\NS}{{\cal N}}
18 % NS: set of native threads
19 \newcommand{\HS}{{\cal H}}
21 \newcommand{\hcall}{H}
22 \newcommand{\fcall}[2]{F^{#1}~#2}
23 \newcommand{\ret}[1]{R~#1}
26 \begin{document}
29 \title{A semantics for foreign threads}
31 \author{Simon Peyton Jones \and Simon Marlow}
34 \makeatactive
36 \section{Introduction}
38 Threads created using OS primitives (for example pthreads on POSIX systems) can
39 have thread-local state. Several libraries make use of this thread-local state.
40 Most Haskell systems use just one OS thread to execute all foreign code.
41 With GHC's threaded RTS, this is not even necessarily the same OS thread all the
42 time.
43 Therefore it is not possible to use foreign libraries that depend on thread-local
44 state reliably.
46 The following foreign libraries are known to be affected:
47 \begin{itemize}
48 \item OpenGL
49 \item Carbon and Cocoa (Apple's GUI toolkits)
50 \end{itemize}
52 \section{Requirements}
54 \begin{itemize}
55 \item It should be possible for Haskell code to arrange that a sequence of
56 calls to a given library are performed by the same native thread and
57 that if an external library calls into Haskell, then any outgoing calls
60 \item The specification should be implementable in a way that allows a lot
62 GHC's current unsafe'' foreign calls.
64 \item The good performance of the existing lightweight green'' threads in
65 GHC should not be sacrificed. Performance should still OK when using
66 the new features with only a few threads (i.e. not more than commonly
67 used from multithreaded C programs).
69 \item The specification shouldn't explicitly require lightweight green''
70 threads to exist. The specification should be implementable in a simple
71 and obvious way in haskell systems that always use a 1:1 correspondence
74 \item The specification shouldn't specify which particular OS thread
75 should be used to execute Haskell code. It should be possible to
76 implement it with e.g. a Haskell interpreter running in one OS thread
77 that just uses other OS threads for foreign calls.
79 \item There should be no unexpected blocking. Especially, threadsafe calls
80 should never cause other threads to block.
81 \end{itemize}
84 \section{State}
86 The state of the system is represented by a pair
87 $$\NS ; \HS$$
88 where
89 \begin{itemize}
90 \item $\NS = \{t_1, \ldots ,t_n\}$ is a set of \emph{native threads}.
91 A native thread is directly supported by the underlying operating
92 system, and may have some thread-local state. It may therefore
93 be important which native thread executes foreign
96 \item $\HS = \{h_1, \ldots ,h_n\}$ is a set of \emph{Haskell threads}.
98 can be created very cheaply. A Haskell thread can only make progress
99 if a native thread acts as its execution engine.
100 \end{itemize}
102 The syntax of a native thread is this:
103 $$104 \begin{array}{lrcll} 105 \mbox{Native thread} & t & ::= & N[S] \\ 106 \\ 107 \mbox{Native thread stack} & S & ::= & \epsilon & \mbox{Empty}\\ 108 & & | & \hcall : S & \mbox{Executing Haskel} \\ 109 & & | & \fcall{si}{h} : S & \mbox{Executing foreign code}\\ 110 & & | & \bullet & \mbox{Unknown}\\ 111 \\ 112 \mbox{Safety indicator} & si & ::= & u & \mbox{Unsafe} \\ 113 & & | & s & \mbox{Safe} \\ 114 & & | & t & \mbox{Thread-safe} \\ 115 \end{array} 116$$
117 A native thread of form $N[S]$ has thread-id $N$, while $S$ is
118 an abstraction of its call stack. If $\hcall$ is on top of the stack,
120 \emph{(What about referring to the $\hcall$ state as
121 inside the Haskell runtime system''? It includes things like GC and other
122 administrative tasks''. A native thread might be in the $\hcall$ state
123 without ever really executing Haskell code.)}
124 If $\fcall{si}{h}$ is
125 on top of the stack, the thread is in the process of dealing with a call
126 to a foreign function, which will return its result to the Haskell thread
127 $h$. The safety-indicator $si$ is from the FFI spec.
130 $$131 \begin{array}{lrcll} 132 \mbox{Haskell thread} & h & ::= & (a)_{bt} \\ 133 \\ 134 \mbox{Haskell action} & a & ::= & p ~@>>@~ a & \mbox{Sequence} \\ 135 & & | & \ret{N} & \mbox{Return to native thread N} \\ 136 \\ 137 \mbox{Primitive action} & p & ::= & \tau & \mbox{Internal action} \\ 138 & & | & \fcall{si}{f} & \mbox{Foreign call} \\ 139 \\ 140 \mbox{Bound thread id} & bt & ::= & \epsilon & \mbox{Not bound} \\ 141 & & | & N & \mbox{Bound to native thread N} 142 \end{array} 143$$
144 A Haskell thread $h$ of form $(a)_{bt}$ has action $a$. The indicator
145 $bt$ tells whether the Haskell thread is \emph{bound to} a native
146 thread. If $bt$ is $\epsilon$, the Haskell thread is not bound to a
147 native thread; if $bt$ is $N$, the Haskell thread is bound to native
148 thread $N$. Specifing exactly what it means for a Haskell thread to
149 be bound to a native thread is the main purpose of this document.
151 An action $a$ is a sequence of primitive actions, finishing with a
152 return of some kind. A primitive action is either some internal Haskell
153 thing (such as performing a bit of evaluation, or operating on an @MVar@),
154 or else it is a call to a foreign function $f$.
156 We do not model the data passed to, or returned from, a foreign call, nor
157 any details of what internal Haskell'' operations are.
159 \section{Evolution}
161 We describe how the system evolves in a very standard way, using
162 transition rules, of form
163 $$164 \NS ; \HS ~\Rightarrow~ \NS' ; \HS' 165$$
166 The structural rules are these:
167 $$168 \begin{array}{c} 169 \infer{\NS \cup \{t\} ; \HS ~\Rightarrow~ \NS' \cup \{t\}; \HS'} 170 {\NS ; \HS ~\Rightarrow~ \NS' ; \HS'} 171 \qquad 172 \infer{\NS ; \HS \cup \{h\} ~\Rightarrow~ \NS'; \HS' \cup \{h\}} 173 {\NS ; \HS ~\Rightarrow~ \NS' ; \HS'} 174 \end{array} 175$$
176 These standard rules allow us to write the interesting transitions with less clutter.
177 $$178 \begin{array}{rcll} 179 N[\hcall:S]; (\tau~@>>@~a)_{bt} 180 & \Rightarrow 181 & N[\hcall:S]; (a)_{bt} & (INT) \\ 182 \\ 183 N[\hcall]; (\fcall{si}{f}~@>>@~a)_{\epsilon} 184 & \Rightarrow 185 & N[\fcall{si}{a_\epsilon}:\hcall]; & (FCALL1) \\ 186 \\ 187 N[\hcall:S]; (\fcall{si}{f}@>>@~a)_N 188 & \Rightarrow 189 & N[\fcall{si}{a_N}:\hcall:S]; & (FCALL2) \\ 190 \\ 191 N[\fcall{si}{a_{bt}}:S]; 192 & \Rightarrow 193 & N[S]; a_{bt} & (FRET) \\ 194 \\ 195 N[\bullet]; 196 & \Rightarrow 197 & N[\hcall:\bullet]; (f ~@>>@~ \ret{N})_{bt} & (HCALL1) \\ 198 & \multicolumn{2}{l}{\mbox{where}~ bt \begin{array}[t]{ll} 199 = \epsilon & \mbox{if f is a non-bound foreign export} \\ 200 = N & \mbox{if f is a bound foreign export} 201 \end{array}} \\ 202 \\ 203 N[\fcall{si}{a_{bt}} : S]; 204 & \Rightarrow 205 & N[\hcall : \fcall{si}{a_{bt}} : S]; ~ (f ~@>>@~ \ret{N})_{bt} & (HCALL2) \\ 206 & \multicolumn{2}{l}{\mbox{where}~si \in \{s,t\}} \\ 207 & \multicolumn{2}{l}{\mbox{and}~ bt \begin{array}[t]{ll} 208 = \epsilon & \mbox{if f is a non-bound foreign export} \\ 210 = N & \mbox{if f is a bound foreign export} 211 \end{array}} \\ 212 \\ 213 N[\hcall : S]; \ret{N} 214 & \Rightarrow 215 & N[S]; & (HRET) \\ 216 \\ 217 (nothing) 218 & \Rightarrow 219 & N[\hcall]; & (WKR) \\ 220 & \multicolumn{2}{l}{\mbox{where N is fresh}} \\ 221 \\ 222 (nothing) 223 & \Rightarrow 224 & N[\bullet]; & (EXT) \\ 225 & \multicolumn{2}{l}{\mbox{where N is fresh}} \\ 226 \end{array} 227$$
228 Here is what the rules mean:
229 \begin{itemize}
230 \item (INT) says that an internal action of a Haskell thread
231 can be executed by any native thread that
232 is in the Haskell state; that is, has $H$ on top of its stack. This is true
233 even if the native thread has called a bound foreign export, and that
234 Haskell function is still running.
236 \item On the other hand, (FCALL1) says that only a vanilla worker thread $N[H]$ (i.e. with
237 nothing on its stack except $H$) can execute an unbound foreign call. Similarly (FCALL2) says
238 that only native thread $N$ can execute a foreign call from a Haskell thread bound to $N$.
239 (This is under the proposal that foriegn imports are not labelled bound/unbound.)
241 \item In either (FCALL) case, the Haskell thread is removed from the pool, and captured in the
243 when the native thread decides to return from the call.
245 \item Rules (HCALL1) and (HCALL2) deal with foreign code calling Haskell. (HCALL1)
246 starts with a native thread that comes out of the blue'' with an unknown stack $\bullet$.
247 It just puts a new Haskell thread into the pool, bound or otherwise depending on
248 whether the foreign export being called is labelled as bound.
250 (HCALL2) is similar, except that we aren't allowed to make such a call if
251 the native thread is in the middle of an unsafe foreign call. The difference
252 between (HCALL1) and (HCALL2) looks very unsavoury.
254 \item (HRET) deals with returning from a completed Haskell thread to the appropriate
256 to the right calling native thread, hence the return action'' $\ret{t}$ tacked onto