forked from perl11/potion
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathINTERNALS
557 lines (432 loc) · 16.4 KB
/
INTERNALS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
~ the nature of potion's insides ~
Potion's internals are based around the idea
that things that happen frequently should be
given the greater attention.
For example, code execution happens more often
than code parsing and loading. Therefore, the
VM is larger and faster than the parser and
compiler.
Other things that happen frequently:
* Math.
* Conditionals.
* Method dispatch.
* Local variables.
* Strings, to build and compare.
* Memory allocation.
More sporadic things likely include:
* New methods.
* Creation of classes.
* Very deep scoping situations.
* Altering the inheritance tree.
* Evals.
* Code allocation.
So understand that often the choices made in
Potion's VM are designed to accomodate the
first set rather than the second.
~
This doc is going to touch quite a bit on
the JIT, compiling straight to x86 and PowerPC
instructions. So many languages are targetting
the JVM and .Net these days... it made me
wonder what the hurdles are to generating
machine code directly.
Clearly, the biggest benefit of having a
common runtime is garbage collection. The
JVM has some wild GC algorithms built in.
And that's actually a huge benefit. I only
wish that more garbage collectors were
available as libraries.
Anyways, my favorite language has been Ruby
for many years, that's partially because
I like how easy it is to script from C,
for some efficiency where I need it. So
Potion is my little attempt to make a
language that feels easy, but compiles
to machine code, to give me the speed of
(un-optimized) C code and to use all the
C function call conventions so that I
can link to other C libs. I want to see
if its still possible to enroach on C.
I should say that O'Caml and Haskell
already do a fine job as C competitors,
so it's not like anything here is that
new.
~ lua's influence ~
Potion uses a register-based bytecode VM that
is nearly a word-for-word copy of Lua's. The
code is all different, but the bytecode is
nearly identical.
I also spelunked Neko VM for some ideas and
found this comparison very useful.
<http://nekovm.org/lua>
I also took from the Lua the notion of using
tables for everything. And, like Lua, lists
and tables are interchangeable. But, in Potion,
I kept their representations separate.
I also use immutable strings, like Lua.
However, my object representation is more
like Neko or Ruby, where objects are all
represented by a word (32-bit on x86 and
64-bit on x86-64).
~ the parts of potion ~
1. the parser
- lexer written with Ragel
(see core/pn-scan.rl)
- parser written with Lemon
(see core/pn-scan.y)
- produces a syntax tree
- the tree is a Potion object that can
be traversed in the code
- see potion_source_load()
in core/compile.c
2. the compiler
- compiles a syntax tree into
bytecode (see core/compile.c)
- has its own file format
(save to .pnb files)
- interception of keywords
happens here, rather than in
the parser, to simplify parsing
3. the vm
- executes bytecode
- used mostly to ensure correctness
- helpful in debugging
- run code on non-x86 with performance
penalty
4. the jit
- compiles bytecode into a function
pointer
- uses the same argument passing strategy
as c functions
5. the garbage collector
- generational, copying, near exact
- marks the stack, then uses a cheney
loop to catch everything else
- stack scanning is automatic (both
the jit and the bytecode vm keep
their registers on the call stack)
- liberal use of volatile for object
pointers, to allow them to move
after any allocation
- based on a collector called qish
~ the jit ~
The X86 JIT is the first half of core/vm.c.
The code translates Potion bytecode into
32-bit or 64-bit machine code.
In the code:
add = (x, y): x + y.
The `add` closure is compiled into a c function
pointer looking like this:
PN (*)(Potion *P, PN cl, PN self, PN x, PN y)
This means you can actually load this closure
into C and call it directly.
Potion *P = potion_create();
PN add = potion_eval(P, "(x, y): x + y.");
PN_F addfn = PN_CLOSURE_F(add);
PN num = addfn(P, 0, 0, PN_NUM(3), PN_NUM(5));
printf("3 + 5 = %d\n", PN_INT(num));
The macros are there to allow bytecode to be
wrapped in a function pointer, if needed. The
inside story looks like this:
PN num = ((struct PNClosure *)add)->method(
P, 0, 0, PN_NUM(3), PN_NUM(5));
~ the jit's assembly ~
Now, let's talk about the compiled code.
Earlier the example was,
add = (x, y): x + y.
So we have a function with two arguments
and a math operation.
The C equivalent is,
PN add(Potion *P, PN cl, PN self, PN x, PN y) {
PN reg1 = x;
PN reg2 = y;
reg1 = PN_NUM(PN_INT(reg1) + PN_INT(reg2));
return reg1;
}
The arguments are moved to registers on the
stack. Then converted to ints. Then added.
And, lastly, the result is cast as a Potion
object.
The machine code produced by the JIT will
be very close to the code produced by a
compiler with the optimizations turned off.
(At least for the present this is true.)
Like C functions, Potion keeps local variables
on the stack. The call stack is like a big tape
measure that goes on and on. Since this function
has two registers (reg1 and reg2,) we need
two inches from the tape measure.
<http://en.wikipedia.org/wiki/Call_stack>
(See the parts that say "Local data storage"
and "Argument passing".)
Our machine code says "give me two word-sized
slots in the stack" as the first order of business.
We store `x` and `y` in those two slots. (See
the comment about C argument passing in the
next section.)
Then we do the math. Even the PN_INT and
PN_NUM macros are fundamentally math.
If you run Potion with -V on this script,
you'll see a bytecode disassembly.
; function ; 16 bytes
; (x, y) 2 registers
.local "x" ; 0
.local "y" ; 1
[1] getlocal 0 0
[2] getlocal 1 1
[3] add 0 1
[4] return 0
Each of these lines corresponds to a line of
C code inside that function a few paragraphs
back. You see how they line up there?
~ x86 and x86-64 ~
64-bit CPU instructions are a superset of
32-bit CPU instructions. And many 32-bit
instructions can be converted to 64-bit by
prefixing them.
Take this code,
8b 55 f8 // mov %ebp[-1] %edx
8b 45 fc // mov %ebp[-2] %eax
83 e8 04 // sub 0x4 %edx
89 50 04 // mov %edx %eax[1]
These are the 32-bit instructions for altering
an upval. (A variable in another scope.) Since
a closure doesn't have access to the registers
of another function, these variables are passed
as pointers (the PNWeakRef struct.)
This code loads the upval into %eax and the new
value into %edx. To get the upval's actual struct
pointer, we subtract 4 from %edx. Finally, we
move the new value into offset 4 in the struct
pointed to by the upval.
In 64-bit code this would be,
48 8b 55 f0 // mov %rbp[-1] %rdx
48 8b 45 f8 // mov %rbp[-2] %rax
83 e8 04 // sub 0x4 %edx
48 89 50 08 // mov %rdx %rax[1]
All it took was adding the 0x48 prefix to our
mov instructions and changing our offsets to
measure 8 bytes rather than 4. The third
instruction actually stays the same, since
we're only doing math on the lower 32-bits of
the upval.
We now call the registers %rax and %rdx.
That's just what the CPU folks call em.
The only other technique that changes between
32-bit and 64-bit is function calling. On 32-
bit, we keep all the function arguments on
the stack. On 64-bit, some function arguments
are passed through registers.
That's the purpose behind the function named
potion_x86_c_arg. To shuttle the arguments
in and out, depending on the architecture.
~ using disassemblers ~
I don't really know the x86 instruction set
too well and most often I just write something
in C and disassemble it.
$ gcc test.c -o test
$ objdump --disassemble-all test
However, objdump doesn't come with the OS X
dev tools, so you can use the included
machodis.pl by Robert Kelp to inspect the
binary.
$ ./tools/machodis.pl test
~ the garbage collector ~
Potion's garbage collector is based on the
work Basile Starynkevitch performed with his
small generational GC entitled Qish. As with
everything else in Potion, the goal isn't to
be the best, but to experiment with what sort
of advanced features can be squeezed into a
language while keeping the code compact.
Since Potion interpreters are per-thread,
each interpreter has its own GC heap. This
allows interpreters to run in parallel and
to start up and shut down independently.
Perhaps one day the GC will run in parallel.
Qish is a two-finger generational collector.
(Also called a Cheney loop.) This collector
only has two generations. When the time for
garbage collection arrives, everything from
the call stack is scanned and moved from the
young gen to the older gen. The insides of
all those objects are then scanned and moved
as well. The goal is simply to wipe out the
birth region and replace it with another.
What about objects from the birth region
that are stored off in some old hashtable
somewhere? Or stuff added to a global object?
The birth region has a write barrier: an
array of pointers to any object which has
recently changed. If a new object is added
to a tuple, it's the tuple's job to notify
GC so that the tuple will get scanned on
the next GC pass.
What happens when the old generation runs
out of room? A new old generation is
allocated and we do a major collection:
the call stack is scanned and everything from
the birth region AND the full-to-the-brim
older region are both scanned and moved on
to the new generation.
This all has some obvious pitfalls. First,
that's a lot of memory moving around all
the time -- pointers moving around. And,
second, that's a lot of scanning to do:
does the entire heap have to be scanned
every time there's a major collection?
Yes, it's a lot of work and I hope to get
some real benchmarks and determine a new
strategy to buy some speed: either through
introducing a third generation or by
defragmenting memory or by allowing pages
to be partially unmapped. Probably the last
one: I'll weight sections of the older
generation and then clear out those sections
as they become more sparse.
As for moving pointers around, that's
why the volatile keyword is a part of the
PN typedef.
Does Potion perform exact GC? Not strictly.
During the stack scanning phase, Potion has
no idea what pointers are legit. For now,
this is resolved by investigating the header
of each pointer's object for a valid type
object. The type can then be compared with
the contents of the object to be sure the
object is real. To me, this feels exact
enough. (If, in practice, it proves to
be too messy, I may resort to using a
macro to identify exact pointers, just as
Qish does.)
~ the volatile keyword ~
Since objects move freely about, Potion
requires use of the volatile keyword for
all access to object ids (PN) from C.
As most people don't readily understand
the volatile keyword, let's go over it.
There are three common uses of the volatile
keyword in Potion code:
PN string = potion_byte_str("...");
Since `volatile` is part of the PN typedef
this code actually means:
volatile unsigned long string =
potion_byte_str("...");
In this case, the `volatile` keyword will
generally prevent the PN from being stored
in a system register when the code is
optimized. Even for caching purposes.
So any time the `string` variable is used,
it'll be looked up directly on the stack.
This means the code won't be optimized fully,
but it also means the value will be looked
up every time we use it, in case GC has changed
the object's place on us.
struct PNObject * volatile string = ...;
If we cast the string to a PNObject pointer,
we need to be sure to place the `volatile`
keyword after the star. This tells the compiler
that the _pointer_ is the volatile one, not the
struct itself.
struct PNUserType {
PN_OBJECT_HEADER
PN a, b, c;
};
Lastly, in any structs which are wrapped by
a Potion object, you need to be sure that any
object handles are marked volatile as well.
In the above case, the PN typedef handles that.
This is a bit tedious to keep in mind, but
is compensated for by Potion's automatic stack
scanning (a technique which has brought easy
C extensions to Ruby) and object convention
(if your object pointers are all 8-byte aligned,
you won't need to write a GC marking function.)
~ the fixed memory section ~
Potion does have one other section of
memory: a permanent generation containing
all of the objects that are used to bootstrap
Potion's environment. The Lobby object, the
class lookup table, the GC structures, etc.
This section of memory is positioned at the
beginning of the first birth area. After the
birth area, fills that memory is detached
(like the saucer section of the Enterprise)
and lives its life independantly of the other
regions, not to be moved or released until
the Potion interpreter is destroyed.
~ object overhead ~
Primitives (such as numbers, booleans and nil) are
immediate values in Potion, kept in registers.
Everything else is given a spot in memory. Every
allocated object has a single word header specifying
its type. Here's a breakdown of what each object looks
like on 32-bit:
PNObject = 4 + (N * 4) bytes
4 bytes (type = 0x250009)
N * 4 bytes (each object field)
PNWeakRef = 8 bytes
4 bytes (type = 0x250004)
4 bytes (pointer to object)
PNString = 13 + N bytes
4 bytes (type = 0x250003)
4 bytes (length of string)
4 bytes (an incremental id used as hash)
N + 1 bytes (UTF-8 + '\0')
PNBytes = 9 + N bytes
4 bytes (type = 0x25000c)
4 bytes (length of string)
N + 1 bytes (string + '\0')
PNTuple = 8 + (N * 4) bytes
4 bytes (type = 0x250006)
4 bytes (length of tuple)
N * 4 bytes (each tuple item)
PNClosure = 16 + (N * 4) bytes
4 bytes (type = 0x25005)
4 bytes (function pointer)
4 bytes (signature pointer)
4 bytes (length of attached data)
N * 4 bytes (attached data)
PNData = 8 + N bytes
4 bytes (type = 0x250011)
4 bytes (length of contained data)
N bytes (data)
The smallest allocation unit in Potion is
16 bytes. Everything larger than 16 bytes
is aligned to 8 bytes. This is true for both
32-bit and 64-bit platforms.
The reason for this figure is to allow room for
the PNFwd struct:
unsigned int // POTION_FWD or POTION_COPIED
unsigned int // allocated space in this slot
unsigned long // pointer to moved object
Any time an object is moved by GC or reallocated,
a temporary PNFwd is placed in the memory slot to
redirect callers temporarily. In practice, you
shouldn't ever need to deal with these values,
since the stack scanning phase will rewrite them
automatically.
The first type of PNFwd is POTION_FWD. This type
is used for reallocation. Potion only has four
built-in types that are subject to reallocation:
tuples, tables, byte strings, and internal buffers.
(Internal buffers are used by the JIT to store
machine code output and by the object model to
keep an array of type classes in circulation.)
The POTION_FWD pointers will happen between GC
cycles, so they stay in place for awhile. The
API around those four built-ins is designed to
check for forwarded pointers whenever they're
passed in as an object. One the next GC cycle
hits all the pointers will be updated to use
the new objects (even if POTION_FWD occurs in
the old generation, due to the write barrier.)
The other PNFwd type is a POTION_COPIED pointer.
This is used by the GC to denote an object which
has moved to the old generation. These should
not appear outside the GC cycle. The PNFwd is
placed immediately when the object is moved and
is then used to rewrite pointers found on the
stack or inside other objects. After the GC
cycle, these pointers will disappear when the
its outdated generation is discarded.