Explained the reason for it all in the introduction. Fix a few typos.
[haskell-report.git] / ffi / threads.tex
1 \documentclass{article}
2
3 \usepackage{proof}
4 \usepackage{code}
5
6 \sloppy
7 \setlength{\parskip}{0.5\baselineskip plus 0.2\baselineskip minus 0.1\baselineskip}
8 \setlength{\parsep}{\parskip}
9 \setlength{\topsep}{0cm}
10 \setlength{\parindent}{0cm}
11 \renewcommand{\textfraction}{0}
12 \renewcommand{\topfraction}{1}
13 \renewcommand{\floatpagefraction}{0.8}
14 \renewcommand{\dblfloatpagefraction}{0.8}
15
16
17 \newcommand{\NS}{{\cal N}}
18 % NS: set of native threads
19 \newcommand{\HS}{{\cal H}}
20 % HS: set of Haskell threads
21 \newcommand{\hcall}{H}
22 \newcommand{\fcall}[2]{F^{#1}~#2}
23 \newcommand{\ret}[1]{R~#1}
24
25
26 \begin{document}
27
28
29 \title{A semantics for foreign threads}
30
31 \author{Simon Peyton Jones \and Simon Marlow}
32
33
34 \makeatactive
35
36 \section{Introduction}
37
38 Threads created using OS primitives (for example pthreads on POSIX systems) can
39 have thread-local state. Several libraries make use of this thread-local state.
40 Most Haskell systems use just one OS thread to execute all foreign code.
41 With GHC's threaded RTS, this is not even necessarily the same OS thread all the
42 time.
43 Therefore it is not possible to use foreign libraries that depend on thread-local
44 state reliably.
45
46 The following foreign libraries are known to be affected:
47 \begin{itemize}
48 \item OpenGL
49 \item Carbon and Cocoa (Apple's GUI toolkits)
50 \end{itemize}
51
52 \section{Requirements}
53
54 \begin{itemize}
55 \item It should be possible for Haskell code to arrange that a sequence of
56 calls to a given library are performed by the same native thread and
57 that if an external library calls into Haskell, then any outgoing calls
58 from Haskell are performed by the same native thread.
59
60 \item The specification should be implementable in a way that allows a lot
61 of foreign calls to be made with no additional overhead with respect to
62 GHC's current ``unsafe'' foreign calls.
63
64 \item The good performance of the existing lightweight ``green'' threads in
65 GHC should not be sacrificed. Performance should still OK when using
66 the new features with only a few threads (i.e. not more than commonly
67 used from multithreaded C programs).
68
69 \item The specification shouldn't explicitly require lightweight ``green''
70 threads to exist. The specification should be implementable in a simple
71 and obvious way in haskell systems that always use a 1:1 correspondence
72 between Haskell threads and OS threads.
73
74 \item The specification shouldn't specify which particular OS thread
75 should be used to execute Haskell code. It should be possible to
76 implement it with e.g. a Haskell interpreter running in one OS thread
77 that just uses other OS threads for foreign calls.
78
79 \item There should be no unexpected blocking. Especially, threadsafe calls
80 should never cause other threads to block.
81 \end{itemize}
82
83
84 \section{State}
85
86 The state of the system is represented by a pair
87 $$\NS ; \HS$$
88 where
89 \begin{itemize}
90 \item $\NS = \{t_1, \ldots ,t_n\}$ is a set of \emph{native threads}.
91 A native thread is directly supported by the underlying operating
92 system, and may have some thread-local state. It may therefore
93 be important which native thread executes foreign
94 (non-Haskell) code.
95
96 \item $\HS = \{h_1, \ldots ,h_n\}$ is a set of \emph{Haskell threads}.
97 Haskell threads are supposed to be extremely lightweight, and
98 can be created very cheaply. A Haskell thread can only make progress
99 if a native thread acts as its execution engine.
100 \end{itemize}
101
102 The syntax of a native thread is this:
103 $$
104 \begin{array}{lrcll}
105 \mbox{Native thread} & t & ::= & N[S] \\
106 \\
107 \mbox{Native thread stack} & S & ::= & \epsilon & \mbox{Empty}\\
108 & & | & \hcall : S & \mbox{Executing Haskel} \\
109 & & | & \fcall{si}{h} : S & \mbox{Executing foreign code}\\
110 & & | & \bullet & \mbox{Unknown}\\
111 \\
112 \mbox{Safety indicator} & si & ::= & u & \mbox{Unsafe} \\
113 & & | & s & \mbox{Safe} \\
114 & & | & t & \mbox{Thread-safe} \\
115 \end{array}
116 $$
117 A native thread of form $N[S]$ has thread-id $N$, while $S$ is
118 an abstraction of its call stack. If $\hcall$ is on top of the stack,
119 the thread is willing to execute a Haskell thread.
120 \emph{(What about referring to the $\hcall$ state as
121 ``inside the Haskell runtime system''? It includes things like GC and other
122 ``administrative tasks''. A native thread might be in the $\hcall$ state
123 without ever really executing Haskell code.)}
124 If $\fcall{si}{h}$ is
125 on top of the stack, the thread is in the process of dealing with a call
126 to a foreign function, which will return its result to the Haskell thread
127 $h$. The safety-indicator $si$ is from the FFI spec.
128
129 The syntax of a Haskell thread is this:
130 $$
131 \begin{array}{lrcll}
132 \mbox{Haskell thread} & h & ::= & (a)_{bt} \\
133 \\
134 \mbox{Haskell action} & a & ::= & p ~@>>@~ a & \mbox{Sequence} \\
135 & & | & \ret{N} & \mbox{Return to native thread $N$} \\
136 \\
137 \mbox{Primitive action} & p & ::= & \tau & \mbox{Internal action} \\
138 & & | & \fcall{si}{f} & \mbox{Foreign call} \\
139 \\
140 \mbox{Bound thread id} & bt & ::= & \epsilon & \mbox{Not bound} \\
141 & & | & N & \mbox{Bound to native thread N}
142 \end{array}
143 $$
144 A Haskell thread $h$ of form $(a)_{bt}$ has action $a$. The indicator
145 $bt$ tells whether the Haskell thread is \emph{bound to} a native
146 thread. If $bt$ is $\epsilon$, the Haskell thread is not bound to a
147 native thread; if $bt$ is $N$, the Haskell thread is bound to native
148 thread $N$. Specifing exactly what it means for a Haskell thread to
149 be bound to a native thread is the main purpose of this document.
150
151 An action $a$ is a sequence of primitive actions, finishing with a
152 return of some kind. A primitive action is either some internal Haskell
153 thing (such as performing a bit of evaluation, or operating on an @MVar@),
154 or else it is a call to a foreign function $f$.
155
156 We do not model the data passed to, or returned from, a foreign call, nor
157 any details of what ``internal Haskell'' operations are.
158
159 \section{Evolution}
160
161 We describe how the system evolves in a very standard way, using
162 transition rules, of form
163 $$
164 \NS ; \HS ~\Rightarrow~ \NS' ; \HS'
165 $$
166 The structural rules are these:
167 $$
168 \begin{array}{c}
169 \infer{\NS \cup \{t\} ; \HS ~\Rightarrow~ \NS' \cup \{t\}; \HS'}
170 {\NS ; \HS ~\Rightarrow~ \NS' ; \HS'}
171 \qquad
172 \infer{\NS ; \HS \cup \{h\} ~\Rightarrow~ \NS'; \HS' \cup \{h\}}
173 {\NS ; \HS ~\Rightarrow~ \NS' ; \HS'}
174 \end{array}
175 $$
176 These standard rules allow us to write the interesting transitions with less clutter.
177 $$
178 \begin{array}{rcll}
179 N[\hcall:S]; (\tau~@>>@~a)_{bt}
180 & \Rightarrow
181 & N[\hcall:S]; (a)_{bt} & (INT) \\
182 \\
183 N[\hcall]; (\fcall{si}{f}~@>>@~a)_{\epsilon}
184 & \Rightarrow
185 & N[\fcall{si}{a_\epsilon}:\hcall]; & (FCALL1) \\
186 \\
187 N[\hcall:S]; (\fcall{si}{f}@>>@~a)_N
188 & \Rightarrow
189 & N[\fcall{si}{a_N}:\hcall:S]; & (FCALL2) \\
190 \\
191 N[\fcall{si}{a_{bt}}:S];
192 & \Rightarrow
193 & N[S]; a_{bt} & (FRET) \\
194 \\
195 N[\bullet];
196 & \Rightarrow
197 & N[\hcall:\bullet]; (f ~@>>@~ \ret{N})_{bt} & (HCALL1) \\
198 & \multicolumn{2}{l}{\mbox{where}~ bt \begin{array}[t]{ll}
199 = \epsilon & \mbox{if $f$ is a non-bound foreign export} \\
200 = N & \mbox{if $f$ is a bound foreign export}
201 \end{array}} \\
202 \\
203 N[\fcall{si}{a_{bt}} : S];
204 & \Rightarrow
205 & N[\hcall : \fcall{si}{a_{bt}} : S]; ~ (f ~@>>@~ \ret{N})_{bt} & (HCALL2) \\
206 & \multicolumn{2}{l}{\mbox{where}~si \in \{s,t\}} \\
207 & \multicolumn{2}{l}{\mbox{and}~ bt \begin{array}[t]{ll}
208 = \epsilon & \mbox{if $f$ is a non-bound foreign export} \\
209
210 = N & \mbox{if $f$ is a bound foreign export}
211 \end{array}} \\
212 \\
213 N[\hcall : S]; \ret{N}
214 & \Rightarrow
215 & N[S]; & (HRET) \\
216 \\
217 (nothing)
218 & \Rightarrow
219 & N[\hcall]; & (WKR) \\
220 & \multicolumn{2}{l}{\mbox{where $N$ is fresh}} \\
221 \\
222 (nothing)
223 & \Rightarrow
224 & N[\bullet]; & (EXT) \\
225 & \multicolumn{2}{l}{\mbox{where $N$ is fresh}} \\
226 \end{array}
227 $$
228 Here is what the rules mean:
229 \begin{itemize}
230 \item (INT) says that an internal action of a Haskell thread
231 can be executed by any native thread that
232 is in the Haskell state; that is, has $H$ on top of its stack. This is true
233 even if the native thread has called a bound foreign export, and that
234 Haskell function is still running.
235
236 \item On the other hand, (FCALL1) says that only a vanilla worker thread $N[H]$ (i.e. with
237 nothing on its stack except $H$) can execute an unbound foreign call. Similarly (FCALL2) says
238 that only native thread $N$ can execute a foreign call from a Haskell thread bound to $N$.
239 (This is under the proposal that foriegn imports are not labelled bound/unbound.)
240
241 \item In either (FCALL) case, the Haskell thread is removed from the pool, and captured in the
242 native thread's stack. The rule (FRET) reinstates the Haskell thread (bound or otherwise)
243 when the native thread decides to return from the call.
244
245 \item Rules (HCALL1) and (HCALL2) deal with foreign code calling Haskell. (HCALL1)
246 starts with a native thread that comes ``out of the blue'' with an unknown stack $\bullet$.
247 It just puts a new Haskell thread into the pool, bound or otherwise depending on
248 whether the foreign export being called is labelled as bound.
249
250 (HCALL2) is similar, except that we aren't allowed to make such a call if
251 the native thread is in the middle of an unsafe foreign call. The difference
252 between (HCALL1) and (HCALL2) looks very unsavoury.
253
254 \item (HRET) deals with returning from a completed Haskell thread to the appropriate
255 calling native thread. Notice that even an unbound Haskell thread must return its result
256 to the right calling native thread, hence the ``return action'' $\ret{t}$ tacked onto
257 the end of the Haskell thread.
258
259 \item (WKR) and (EXT) allow new native threads to enter the system.
260
261 \end{itemize}
262
263
264 \end{document}