forked from doublec/factor-articles
-
Notifications
You must be signed in to change notification settings - Fork 1
/
compilers.tex
381 lines (314 loc) · 12.7 KB
/
compilers.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
\chapter{Compilers and Interpreters}\label{compilers}
Writing compilers and interpreters in Factor is quite easy. By writing
the grammar using parser combinators, and having those combinators
produce an abstract syntax tree (AST), it's easy to write a simple
compiler or interpreter for the tree. I've already covered writing
simple parser combinators so I'll concentrate on processing the AST
for a simple language in this post. A later post might generate a
parser for it or it could be an 'exercise for the reader'.
For an example to start with I took the evaluator defined in this
Lambda the Ultimate posting and converted it to Factor. For the
interpreter I tried two different approaches to compare them. The
first was to use pattern matching. For the second I use generic
functions and Factor's OO system.
For the compiler I show how to compile to Factor code, which can then
be interpreted or compiled by Factor itself, and I also compile to
Javascript. This javascript can be run in a standalone Javascript
interpreter like Rhino, or in a web browser.
To more easily re-use the AST across the two examples I added the
ability to pattern match on tuples to the `match' contrib
library. This can be obtained from the standard Factor git
repository:
\begin{verbatim}
git clone git://factorcode.org/git/factor.git
\end{verbatim}
The factor `latest' boot images can be used to bootstrap from this
repository. The instructions on how to do this are on
factorcode.org. For an Intel x86 linux machine the steps would be:
\begin{verbatim}
git clone git://factorcode.org/git/factor.git
cd factor
make linux-x86-32
wget http://factorcode.org/images/latest/boot.x86.32.image
./factor -i=boot.x86.32.image
\end{verbatim}
The examples here should also work with the released Factor 0.85 as
long as you replace `contrib/match/match.factor' with the version from
my repository. To load the `match' library in Factor use:
\begin{verbatim}
USE: match
\end{verbatim}
In the Factor code snippets below I use `=>' to show the result of
running Factor words. You shouldn't actually type that. Instead the
result will be shown on the Factor stack or you can use `.' to print
it.
The code for the examples in this post can be downloaded from
interpreter.factor. If you copy it into the `contrib' directory of
your Factor installation you can load it with:
\begin{verbatim}
"contrib/interpreter" require
\end{verbatim}
The evaluator that is being implemented has the following description
from the Lambda the Ultimate posting:
\begin{verbatim}
data Term a where
Lit :: Int -> Term Int
Inc :: Term Int -> Term Int
IsZ :: Term Int -> Term Bool
If :: Term Bool -> Term a -> Term a -> Term a
Pair :: Term a -> Term b -> Term (a,b)
Fst :: Term (a,b) -> Term a
Snd :: Term (a,b) -> Term b
eval :: Term a -> a
eval (Lit i) = i
eval (Inc t) = eval t + 1
eval (IsZ t) = eval t == 0
eval (If b t e) = if eval b then eval t else eval e
eval (Pair a b) = (eval a, eval b)
eval (Fst t) = fst (eval t)
eval (Snd t) = snd (eval t)
\end{verbatim}
Basically we have literal values, which in this example are
integers. These can be incremented and tested to see if they are
zero. A `pair' can be created and the first and second elements
obtained from the pair. An `if' expression is available for testing
an expression and evaluating a true or false clause. These are
represented using Factor tuples:
\begin{verbatim}
TUPLE: lit i ;
TUPLE: inc t ;
TUPLE: isz t ;
TUPLE: iff b t e ;
TUPLE: pair a b ;
TUPLE: fst t ;
TUPLE: snd t ;
\end{verbatim}
I used the same names for the types and elements as the example but
changed `if' to `iff' to prevent clashing with Factor's standard `if'
word. An example tree can be created that is the increment of the
number 5 with:
\begin{verbatim}
5 <lit> <inc>
\end{verbatim}
When evaluated this should produce the result `6'.
The interpreter implementation that uses pattern matching uses
`match-cond'. This requires match variables to be set up which can be
used to destructure sequences and tuples. For example:
\begin{verbatim}
5 <lit> T{ lit f ?t } match [ ?t ] bind => 5
\end{verbatim}
`T{ lit f 123 }' is the Factor syntax for a tuple. It creates a tuple
at parse time and the actual object is stored in the word. This is
different from '123 <lit>' which will produce the literal object at
runtime. `?t' is a matching variable and the `match' and `match-cond'
words use these to destructure matching elements of tuples and
sequences. The match variable can then be used to get the actual value
obtained. For more on the Factor pattern matching system you can read
the documentation from within Factor with:
\begin{verbatim}
\ match help
\end{verbatim}
The Factor implementation of the interpreter using pattern matching
is:
\begin{verbatim}
: eval1 ( a -- a )
{
{ T{ lit f ?i } [ ?i ] }
{ T{ inc f ?t } [ ?t eval1 1+ ] }
{ T{ isz f ?t } [ ?t eval1 zero? ] }
{ T{ iff f ?b ?t ?e } [ ?b eval1 [ ?t ] [ ?e ] if eval1 ] }
{ T{ pair f ?a ?b } [ ?a eval1 ?b eval1 2array ] }
{ T{ fst f ?t } [ ?t eval1 first ] }
{ T{ snd f ?t } [ ?t eval1 second ] }
} match-cond ;
\end{verbatim}
It is pretty much a direct translation of the Haskell example. An
example of using it:
\begin{verbatim}
5 <lit> <inc> eval1 => 6
\end{verbatim}
The Lambda the Ultimate post gave a usage example:
\begin{verbatim}
stuff = Pair (If (IsZ (Inc (Inc (Lit 5))))
(Lit 6)
(Lit 7))
(Fst (Snd (Pair (Lit 42)
(Pair (Lit 8) (Lit 9)))))
\end{verbatim}
Translated to Factor it is:
\begin{verbatim}
: driver ( -- v )
5 <lit> <inc> <inc> <isz>
6 <lit>
7 <lit>
<iff>
42 <lit> 8 <lit> 9 <lit> <pair> <pair> <snd> <fst> <pair> ;
\end{verbatim}
Running the `eval1' word on this produces the answer, { 7 8 }:
\begin{verbatim}
driver eval1 => { 7 8 }
\end{verbatim}
Although `eval1' is quite short and easy to understand it has a
disadvantage in that to extend it with new AST types you need to
modify the `eval1' word. Using the Factor OO generic word system you
extend the interpreter without having to modify existing code. Here's
a generic word approach to the interpreter:
\begin{verbatim}
GENERIC: eval2 ( a -- a )
M: lit eval2 ( a -- a ) lit-i ;
M: inc eval2 ( a -- a ) inc-t eval2 1+ ;
M: isz eval2 ( a -- a ) isz-t eval2 zero? ;
M: iff eval2 ( a -- a ) dup iff-b eval2 [ iff-t ] [ iff-e ] if eval2 ;
M: pair eval2 ( a -- a ) dup pair-a eval2 swap pair-b eval2 2array ;
M: fst eval2 ( a -- a ) fst-t eval2 first ;
M: snd eval2 ( a -- a ) snd-t eval2 second ;
\end{verbatim}
The `M:' word is used to define a method for a generic word. This is
similar to how generic functions and methods work in CLOS and
Dylan. When a generic word is called the actual method invoked is
based on the topmost item of the stack. The first symbol past the `M:'
is the type of that object that that method is for. The second is the
name of the generic word the method will be added too. The rest of the
definitions should be reasonably clear - they break apart the tuples
and return results or call `eval2' recursively. `eval2' produces the
same result as `eval1':
\begin{verbatim}
driver eval2 => { 7 8 }
\end{verbatim}
Which approach to use is really personal preference. Of the two,
`eval2' is the most efficient. This is because methods can be compiled
in Factor whereas the current implementation of `match-cond' cannot
be. So `eval1' will run in the Factor interpreter whereas `eval2' will
be compiled to native code. This can be seen by trying to compile the
examples and running a timed test:
\begin{verbatim}
\ eval1 compile => "The word eval1 does not have a stack effect"
\ eval2 compile
\ eval1 compiled? => f
\ eval2 compiled? => t
[ 1000 [ driver eval1 ] times ] time => 487ms run / 5 ms GC time
[ 1000 [ driver eval2 ] times ] time => 3ms run / 0 ms GC time
\end{verbatim}
There is obviously room for improvement in the `match' routines! I can
say that because I wrote them :)
A compiler follows the same structure as the interpreter but instead
of computing the result we generate code. This code is then later fed
to the compiler. For the compiler examples I'm only going to show the
pattern matching based example. The generic word implementation is
very similar and would follow the relevant interpreter implementation.
Factor provides a `make' word that can be used for dynamically appending to a new sequence. From
within a `make' scope you can use `,' to append an element and `\%' to splice in a sequence to the
constructed sequence. `make' can be used for constructing arrays, strings, quotations, etc. The type
of the constructed sequence is identified by an exemplar sequence passed to `make'. For example:
\begin{verbatim}
[ 1 , 2 , { 3 4 } % { 5 } , ] { } make => { 1 2 3 4 { 5 } }
\end{verbatim}
The top level `compile' word will set up a `make' scope for the AST
compile routines to append to. Here's the compiler to Factor:
\begin{verbatim}
: (compile1) ( a -- )
{
{ T{ lit f ?i } [ ?i , ] }
{ T{ inc f ?t } [ ?t (compile1) \ 1+ , ] }
{ T{ isz f ?t } [ ?t (compile1) \ zero? , ] }
{ T{ iff f ?b ?t ?e } [ ?b (compile1)
[ ?t (compile1) ] [ ] make ,
[ ?e (compile1) ] [ ] make ,
\ if , ] }
{ T{ pair f ?a ?b } [ ?a (compile1) ?b (compile1) \ 2array , ] }
{ T{ fst f ?t } [ ?t (compile1) \ first , ] }
{ T{ snd f ?t } [ ?t (compile1) \ second , ] }
} match-cond ;
: compile1 ( a -- quot )
[ (compile1) ] [ ] make ;
\end{verbatim}
As you can see it is very similar to the interpreter. Instead of
returning the result of the AST evaluation `(compile1)' appends the
equivalent Factor code to the constructed quotation created in the
`compile1' word.
The implementation for handling `lit' just appends the numeric value
directly. For `inc' it calls `(compile1)' on the term to be
incremented so that gets appended to the quotation. It then appends
the Factor `1+' word which will be used to increment it. The `\' is an
escape mechanism to tell Factor to store the word itself rather than
call it.
The main complication is in the implementation of `iff'. This word
must delay the computation of the two terms of the `iff' statement. To
to this it creates Factor quotations with `make' and then recursively
calls `(compile1)' for the terms. The results of the compilation for
this recursive call will be stored in the most recently created
quotation rather than the one in the top level `compile1' word. That
is, nested `make' calls are dynamically scoped.
An example of usage of the compiler:
\begin{verbatim}
5 <lit> <inc> compile1 => [ 5 1 + ]
[ 5 1 + ] call => 6
\end{verbatim}
The `driver' AST shown previously compiles to Factor correctly:
\begin{verbatim}
driver compile1 => [
5 1+ 1+ zero? [
6
] [
7
] if 42 8 9 2array 2array second first 2array
]
\end{verbatim}
This can be compiled to native code using Factor or run in the Factor
interpreter:
\begin{verbatim}
driver compile1 compile-quot execute => { 7 8 }
driver compile1 call => { 7 8 }
\end{verbatim}
The final example is a compiler to Javascript. Again it follows the
same basic outline of the interpreter. Instead of generating a Factor
quotation it uses `make' to generate a string:
\begin{verbatim}
: (compile2) ( a -- )
{
{ T{ lit f ?i } [ ?i number>string % ] }
{ T{ inc f ?t } [ ?t (compile2) "+1" % ] }
{ T{ isz f ?t } [ ?t (compile2) "== 0" % ] }
{ T{ iff f ?b ?t ?e } [
"function() {if(" %
?b (compile2)
") {return " %
?t (compile2)
"} else { return " %
?e (compile2)
"}}()" %
]
}
{ T{ pair f ?a ?b } [ "{ first:" % ?a (compile2) ",second:" % ?b (compile2) " }" % ] }
{ T{ fst f ?t } [ ?t (compile2) ".first" % ] }
{ T{ snd f ?t } [ ?t (compile2) ".second" % ] }
} match-cond ;
: compile2 ( a -- quot )
[ "(" % (compile2) ")" % ] "" make ;
\end{verbatim}
The main complication with this compiler was handling `if'. The
Javascript `if' has no result so cannot be assigned immediately. So I
wrap the `if' in a function which is called immediately. The two terms
of the `if' use `return' to return the result from the function. A
simple example:
\begin{verbatim}
5 <lit> <inc> compile2 => "(5+1)"
\end{verbatim}
And `driver' also compiles successfully:
\begin{verbatim}
driver compile2 =>
"({
first:function() {
if(5+1+1== 0) {
return 6
} else {
return 7
}
}(),
second: {
first:42,
second: { first:8,second:9 }
}.second.first
})"
\end{verbatim}
This result evaluates successfully in a browser. You can test it by clicking here.