[ghc.git] / compiler / parser / ApiAnnotation.hs
{-# LANGUAGE DeriveDataTypeable #-}

module ApiAnnotation (
  getAnnotation, getAndRemoveAnnotation,
  getAnnotationComments,getAndRemoveAnnotationComments,
  ApiAnns,
  ApiAnnKey,
  AnnKeywordId(..),
  AnnotationComment(..),
  IsUnicodeSyntax(..),
  unicodeAnn,
  HasE(..),
  LRdrName -- Exists for haddocks only
  ) where

import GhcPrelude

import RdrName
import Outputable
import SrcLoc
import qualified Data.Map as Map
import Data.Data

{-
Note [Api annotations]
~~~~~~~~~~~~~~~~~~~~~~
Given a parse tree of a Haskell module, how can we reconstruct
the original Haskell source code, retaining all whitespace and
source code comments?  We need to track the locations of all
elements from the original source: this includes keywords such as
'let' / 'in' / 'do' etc as well as punctuation such as commas and
braces, and also comments.  We collectively refer to this
metadata as the "API annotations".

Rather than annotate the resulting parse tree with these locations
directly (this would be a major change to some fairly core data
structures in GHC), we instead capture locations for these elements in a
structure separate from the parse tree, and returned in the
pm_annotations field of the ParsedModule type.

The full ApiAnns type is

> type ApiAnns = ( Map.Map ApiAnnKey [SrcSpan]                  -- non-comments
>                , Map.Map SrcSpan [Located AnnotationComment]) -- comments

NON-COMMENT ELEMENTS

Intuitively, every AST element directly contains a bag of keywords
(keywords can show up more than once in a node: a semicolon i.e. newline
can show up multiple times before the next AST element), each of which
needs to be associated with its location in the original source code.

Consequently, the structure that records non-comment elements is logically
a two-level map, from the SrcSpan of the AST element containing it, to
a map from keywords ('AnnKeywordId') to all locations of the keyword directly
in the AST element:

> type ApiAnnKey = (SrcSpan,AnnKeywordId)
>
> Map.Map ApiAnnKey [SrcSpan]

So

> let x = 1 in 2 * x

would result in the AST element

  L span (HsLet (binds for x = 1) (2 * x))

and the annotations

  (span,AnnLet) having the location of the 'let' keyword
  (span,AnnEqual) having the location of the '=' sign
  (span,AnnIn)  having the location of the 'in' keyword

For any given element in the AST, only a fixed set of keywords is
applicable to it (e.g., you'll never see an 'import' keyword
associated with a let-binding.)  The set of allowed keywords is
documented in a comment associated with the constructor of a given
AST element, although the ground truth is in Parser and RdrHsSyn
(which actually add the annotations; see #13012).

COMMENT ELEMENTS

Every comment is associated with a *located* AnnotationComment.
We associate comments with the lowest (most specific) AST element
enclosing them:

> Map.Map SrcSpan [Located AnnotationComment]

PARSER STATE

There are three fields in PState (the parser state) which play a role
with annotations.

>  annotations :: [(ApiAnnKey,[SrcSpan])],
>  comment_q :: [Located AnnotationComment],
>  annotations_comments :: [(SrcSpan,[Located AnnotationComment])]

The 'annotations' and 'annotations_comments' fields are simple: they
accumulate annotations that will end up in 'ApiAnns' at the end
(after they are passed to Map.fromList).

The 'comment_q' field captures comments as they are seen in the token stream,
so that when they are ready to be allocated via the parser they are
available (at the time we lex a comment, we don't know what the enclosing
AST node of it is, so we can't associate it with a SrcSpan in
annotations_comments).

PARSER EMISSION OF ANNOTATIONS

The parser interacts with the lexer using the function

> addAnnotation :: SrcSpan -> AnnKeywordId -> SrcSpan -> P ()

which takes the AST element SrcSpan, the annotation keyword and the
target SrcSpan.

This adds the annotation to the `annotations` field of `PState` and
transfers any comments in `comment_q` WHICH ARE ENCLOSED by
the SrcSpan of this element to the `annotations_comments`
field.  (Comments which are outside of this annotation are deferred
until later. 'allocateComments' in 'Lexer' is responsible for
making sure we only attach comments that actually fit in the 'SrcSpan'.)

The wiki page describing this feature is
https://ghc.haskell.org/trac/ghc/wiki/ApiAnnotations

-}
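The comment hand-off described in the Note can be illustrated with a small, self-contained sketch. It is not the code of 'allocateComments' in 'Lexer'; it only assumes that spans can be compared for enclosure. The names `Span`, `QueuedComment`, `encloses` and `allocate` are hypothetical stand-ins introduced for this example, and spans are simplified to inclusive character offsets.

```haskell
import Data.List (partition)

-- Hypothetical stand-in for SrcSpan: inclusive (start, end) offsets.
type Span = (Int, Int)

-- A queued comment: its own span plus its text.
type QueuedComment = (Span, String)

-- True when the second span lies entirely within the first.
encloses :: Span -> Span -> Bool
encloses (outerStart, outerEnd) (innerStart, innerEnd) =
  outerStart <= innerStart && innerEnd <= outerEnd

-- Split the queue into the comments enclosed by the element's span
-- (to be attached to it) and the comments deferred until later.
allocate :: Span -> [QueuedComment] -> ([QueuedComment], [QueuedComment])
allocate element = partition (encloses element . fst)
```

Under these assumptions, `allocate (0,10)` keeps a comment at `(2,5)` for the element and leaves one at `(12,15)` in the queue for a later, larger span.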
-- ---------------------------------------------------------------------

-- If you update this, update the Note [Api annotations] above
type ApiAnns = ( Map.Map ApiAnnKey [SrcSpan]
               , Map.Map SrcSpan [Located AnnotationComment])

-- If you update this, update the Note [Api annotations] above
type ApiAnnKey = (SrcSpan,AnnKeywordId)
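To make the shape of the non-comment map concrete, here is a minimal sketch of the `let x = 1 in 2 * x` example from the Note. It does not use the real GHC types: `Span` (inclusive start and end columns on one line), `Keyword`, `Key`, `letSpan`, `exampleAnns` and `lookupAnn` are all hypothetical stand-ins, and the column numbers were worked out by hand for that one-line source string.

```haskell
import qualified Data.Map as Map

-- Hypothetical stand-in for SrcSpan: inclusive (start, end) columns.
type Span = (Int, Int)

-- A trimmed-down copy of AnnKeywordId with only the constructors used here.
data Keyword = AnnLet | AnnEqual | AnnIn
  deriving (Eq, Ord, Show)

-- Same shape as ApiAnnKey: enclosing element span plus keyword.
type Key = (Span, Keyword)

-- Span of the whole expression "let x = 1 in 2 * x" (18 characters).
letSpan :: Span
letSpan = (1, 18)

-- The three annotations listed in the Note, keyed by the element span.
exampleAnns :: Map.Map Key [Span]
exampleAnns = Map.fromList
  [ ((letSpan, AnnLet),   [(1, 3)])    -- 'let'
  , ((letSpan, AnnEqual), [(7, 7)])    -- '='
  , ((letSpan, AnnIn),    [(11, 12)])  -- 'in'
  ]

-- Look up all locations of a keyword within an element; an absent key
-- yields the empty list.
lookupAnn :: Map.Map Key [Span] -> Span -> Keyword -> [Span]
lookupAnn anns s kw = Map.findWithDefault [] (s, kw) anns
```

Keying on the pair rather than nesting two maps keeps lookups to a single `Map` operation while still behaving like the two-level map the Note describes.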


-- | Retrieve a list of annotation 'SrcSpan's based on the 'SrcSpan'
-- of the annotated AST element, and the known type of the annotation.
getAnnotation :: ApiAnns -> SrcSpan -> AnnKeywordId -> [SrcSpan]
getAnnotation (anns,_) span ann
   = case Map.lookup (span,ann) anns of
       Nothing -> []
       Just ss -> ss

-- | Retrieve a list of annotation 'SrcSpan's based on the 'SrcSpan'
-- of the annotated AST element, and the known type of the annotation.
-- The list is removed from the annotations.
getAndRemoveAnnotation :: ApiAnns -> SrcSpan -> AnnKeywordId
                       -> ([SrcSpan],ApiAnns)
getAndRemoveAnnotation (anns,cs) span ann
   = case Map.lookup (span,ann) anns of
       Nothing -> ([],(anns,cs))
       Just ss -> (ss,(Map.delete (span,ann) anns,cs))

-- | Retrieve the comments allocated to the current 'SrcSpan'.
--
-- Note: A given 'SrcSpan' may appear in multiple AST elements, so
-- beware of duplicates.
getAnnotationComments :: ApiAnns -> SrcSpan -> [Located AnnotationComment]
getAnnotationComments (_,anns) span =
  case Map.lookup span anns of
    Just cs -> cs
    Nothing -> []

-- | Retrieve the comments allocated to the current 'SrcSpan', and
-- remove them from the annotations.
getAndRemoveAnnotationComments :: ApiAnns -> SrcSpan
                               -> ([Located AnnotationComment],ApiAnns)
getAndRemoveAnnotationComments (anns,canns) span =
  case Map.lookup span canns of
    Just cs -> (cs,(anns,Map.delete span canns))
    Nothing -> ([],(anns,canns))
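The "get and remove" accessors return the shrunken annotations alongside the result, so a caller has to thread that value through successive lookups. The following is a rough sketch of that pattern on simplified stand-in types, not a use of the real module: `Span`, `Keyword`, `Anns`, `takeAnn` and `takeAll` are hypothetical names, and `takeAnn` merely mirrors the shape of 'getAndRemoveAnnotation' for the non-comment map.

```haskell
import qualified Data.Map as Map

-- Hypothetical stand-ins for SrcSpan and AnnKeywordId.
type Span = (Int, Int)

data Keyword = AnnOpenP | AnnCloseP | AnnComma
  deriving (Eq, Ord, Show)

type Anns = Map.Map (Span, Keyword) [Span]

-- Look up one keyword and delete its entry, returning the remaining map.
takeAnn :: Anns -> Span -> Keyword -> ([Span], Anns)
takeAnn anns s kw =
  case Map.lookup (s, kw) anns of
    Nothing -> ([], anns)
    Just ss -> (ss, Map.delete (s, kw) anns)

-- Consume several keywords for one element, passing the shrinking map
-- from each lookup to the next.
takeAll :: Anns -> Span -> [Keyword] -> ([[Span]], Anns)
takeAll anns _ []       = ([], anns)
takeAll anns s (kw:kws) =
  let (found, anns')  = takeAnn anns s kw
      (rest,  anns'') = takeAll anns' s kws
  in (found : rest, anns'')
```

One plausible use of this style is checking for leftovers: whatever remains in the map after a full traversal was never consumed by any element.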

-- --------------------------------------------------------------------

-- | API Annotations exist so that tools can perform source to source
-- conversions of Haskell code. They are used to keep track of the
-- various syntactic keywords that are not captured in the existing
-- AST.
--
-- The annotations, together with original source comments are made
-- available in the @'pm_annotations'@ field of @'GHC.ParsedModule'@.
-- Comments are only retained if @'Opt_KeepRawTokenStream'@ is set in
-- @'DynFlags.DynFlags'@ before parsing.
--
-- The wiki page describing this feature is
-- https://ghc.haskell.org/trac/ghc/wiki/ApiAnnotations
--
-- Note: in general the names of these are taken from the
-- corresponding token, unless otherwise noted.
-- See Note [Api annotations] above for details of the usage.
data AnnKeywordId
    = AnnAnyclass
    | AnnAs
    | AnnAt
    | AnnBang  -- ^ '!'
    | AnnBackquote -- ^ '`'
    | AnnBy
    | AnnCase -- ^ case or lambda case
    | AnnClass
    | AnnClose -- ^  '\#)' or '\#-}'  etc
    | AnnCloseB -- ^ '|)'
    | AnnCloseBU -- ^ '|)', unicode variant
    | AnnCloseC -- ^ '}'
    | AnnCloseQ  -- ^ '|]'
    | AnnCloseQU -- ^ '|]', unicode variant
    | AnnCloseP -- ^ ')'
    | AnnCloseS -- ^ ']'
    | AnnColon
    | AnnComma -- ^ as a list separator
    | AnnCommaTuple -- ^ in a RdrName for a tuple
    | AnnDarrow -- ^ '=>'
    | AnnDarrowU -- ^ '=>', unicode variant
    | AnnData
    | AnnDcolon -- ^ '::'
    | AnnDcolonU -- ^ '::', unicode variant
    | AnnDefault
    | AnnDeriving
    | AnnDo
    | AnnDot    -- ^ '.'
    | AnnDotdot -- ^ '..'
    | AnnElse
    | AnnEqual
    | AnnExport
    | AnnFamily
    | AnnForall
    | AnnForallU -- ^ Unicode variant
    | AnnForeign
    | AnnFunId -- ^ for function name in matches where there are
               -- multiple equations for the function.
    | AnnGroup
    | AnnHeader -- ^ for CType
    | AnnHiding
    | AnnIf
    | AnnImport
    | AnnIn
    | AnnInfix -- ^ 'infix' or 'infixl' or 'infixr'
    | AnnInstance
    | AnnLam
    | AnnLarrow     -- ^ '<-'
    | AnnLarrowU    -- ^ '<-', unicode variant
    | AnnLet
    | AnnMdo
    | AnnMinus -- ^ '-'
    | AnnModule
    | AnnNewtype
    | AnnName -- ^ where a name loses its location in the AST, this carries it
    | AnnOf
    | AnnOpen    -- ^ '(\#' or '{-\# LANGUAGE' etc
    | AnnOpenB   -- ^ '(|'
    | AnnOpenBU  -- ^ '(|', unicode variant
    | AnnOpenC   -- ^ '{'
    | AnnOpenE   -- ^ '[e|' or '[e||'
    | AnnOpenEQ  -- ^ '[|'
    | AnnOpenEQU -- ^ '[|', unicode variant
    | AnnOpenP   -- ^ '('
    | AnnOpenPE  -- ^ '$('
    | AnnOpenPTE -- ^ '$$('
    | AnnOpenS   -- ^ '['
    | AnnPackageName
    | AnnPattern
    | AnnProc
    | AnnQualified
    | AnnRarrow -- ^ '->'
    | AnnRarrowU -- ^ '->', unicode variant
    | AnnRec
    | AnnRole
    | AnnSafe
    | AnnSemi -- ^ ';'
    | AnnSimpleQuote -- ^ '''
    | AnnSignature
    | AnnStatic -- ^ 'static'
    | AnnStock
    | AnnThen
    | AnnThIdSplice -- ^ '$'
    | AnnThIdTySplice -- ^ '$$'
    | AnnThTyQuote -- ^ double '''
    | AnnTilde -- ^ '~'
    | AnnType
    | AnnUnit -- ^ '()' for types
    | AnnUsing
    | AnnVal  -- ^ e.g. INTEGER
    | AnnValStr  -- ^ String value, will need quotes when output
    | AnnVbar -- ^ '|'
    | AnnVia -- ^ 'via'
    | AnnWhere
    | Annlarrowtail -- ^ '-<'
    | AnnlarrowtailU -- ^ '-<', unicode variant
    | Annrarrowtail -- ^ '>-'
    | AnnrarrowtailU -- ^ '>-', unicode variant
    | AnnLarrowtail -- ^ '-<<'
    | AnnLarrowtailU -- ^ '-<<', unicode variant
    | AnnRarrowtail -- ^ '>>-'
    | AnnRarrowtailU -- ^ '>>-', unicode variant
    | AnnEofPos
    deriving (Eq, Ord, Data, Show)

instance Outputable AnnKeywordId where
  ppr x = text (show x)

-- ---------------------------------------------------------------------

data AnnotationComment =
  -- Documentation annotations
    AnnDocCommentNext  String     -- ^ something beginning '-- |'
  | AnnDocCommentPrev  String     -- ^ something beginning '-- ^'
  | AnnDocCommentNamed String     -- ^ something beginning '-- $'
  | AnnDocSection      Int String -- ^ a section heading
  | AnnDocOptions      String     -- ^ doc options (prune, ignore-exports, etc)
  | AnnLineComment     String     -- ^ comment starting with "--"
  | AnnBlockComment    String     -- ^ comment in {- -}
    deriving (Eq, Ord, Data, Show)
-- Note: these are based on the Token versions, but the Token type is
-- defined in Lexer.x and bringing it in here would create a loop

instance Outputable AnnotationComment where
  ppr x = text (show x)

-- | - 'ApiAnnotation.AnnKeywordId' : 'ApiAnnotation.AnnOpen',
--             'ApiAnnotation.AnnClose','ApiAnnotation.AnnComma',
--             'ApiAnnotation.AnnRarrow'
--             'ApiAnnotation.AnnTilde'
--   - May have 'ApiAnnotation.AnnComma' when in a list
type LRdrName = Located RdrName

-- | Certain tokens can have alternate representations when unicode syntax is
-- enabled. This flag is attached to those tokens in the lexer so that the
-- original source representation can be reproduced in the corresponding
-- 'ApiAnnotation'
data IsUnicodeSyntax = UnicodeSyntax | NormalSyntax
    deriving (Eq, Ord, Data, Show)

-- | Convert a normal annotation into its unicode equivalent one
unicodeAnn :: AnnKeywordId -> AnnKeywordId
unicodeAnn AnnForall     = AnnForallU
unicodeAnn AnnDcolon     = AnnDcolonU
unicodeAnn AnnLarrow     = AnnLarrowU
unicodeAnn AnnRarrow     = AnnRarrowU
unicodeAnn AnnDarrow     = AnnDarrowU
unicodeAnn Annlarrowtail = AnnlarrowtailU
unicodeAnn Annrarrowtail = AnnrarrowtailU
unicodeAnn AnnLarrowtail = AnnLarrowtailU
unicodeAnn AnnRarrowtail = AnnRarrowtailU
unicodeAnn AnnOpenB      = AnnOpenBU
unicodeAnn AnnCloseB     = AnnCloseBU
unicodeAnn AnnOpenEQ     = AnnOpenEQU
unicodeAnn AnnCloseQ     = AnnCloseQU
unicodeAnn ann           = ann
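As an illustration of how the 'IsUnicodeSyntax' flag and 'unicodeAnn' could fit together, the sketch below picks the keyword to record for a token depending on how it was written in the source. It is self-contained and does not reproduce the parser's actual code: `Keyword` is a cut-down stand-in for 'AnnKeywordId', the local `unicodeAnn` covers only two of the cases above, and `annFor` is a hypothetical helper introduced for this example.

```haskell
-- Local copy of the flag, so the sketch compiles on its own.
data IsUnicodeSyntax = UnicodeSyntax | NormalSyntax
  deriving (Eq, Show)

-- Cut-down stand-in for AnnKeywordId.
data Keyword = AnnRarrow | AnnRarrowU | AnnDcolon | AnnDcolonU | AnnLet
  deriving (Eq, Show)

-- Same idea as unicodeAnn above, restricted to the stand-in type:
-- keywords without a unicode variant are returned unchanged.
unicodeAnn :: Keyword -> Keyword
unicodeAnn AnnRarrow = AnnRarrowU
unicodeAnn AnnDcolon = AnnDcolonU
unicodeAnn ann       = ann

-- Hypothetical helper: choose the keyword to record for a token,
-- given whether the token used its unicode spelling.
annFor :: IsUnicodeSyntax -> Keyword -> Keyword
annFor UnicodeSyntax = unicodeAnn
annFor NormalSyntax  = id
```

Because of the catch-all case, `annFor UnicodeSyntax AnnLet` is still `AnnLet`, so callers would not need to know which keywords have unicode variants.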


-- | Some Template Haskell tokens have two variants, one with an `e` and
-- the other without:
--
-- >  [| or [e|
-- >  [|| or [e||
--
-- This type indicates whether the 'e' is present or not.
data HasE = HasE | NoE
     deriving (Eq, Ord, Data, Show)
363 data HasE = HasE | NoE
364 deriving (Eq, Ord, Data, Show)