-
Notifications
You must be signed in to change notification settings - Fork 35
/
Copy pathplm
1745 lines (1281 loc) · 68.7 KB
/
plm
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
Understanding the UCBLogo evaluator
Prerequisite: understanding the metacircular evaluator (MCE, SICP 4.1)
Corequisite: understanding the explicit control evaluator (ECE, SICP 5.4)
Contents
--------
1. Review of metacircular evaluator
1.1 Recursive structure of the evaluator
1.2 Tail call elimination in the MCE
2. The Explicit-Control Evaluator
2.1 Typical C-style procedure calling convention
2.2 Procedure calling convention in the ECE
2.3 The evaluator uses surprisingly few registers
2.4 The two-screenful version of the explicit control evaluator
2.5 Implementing explicit control in C
3. Complicating Issues in UCBLogo
3.1 Commands vs. operations
3.2 Error messages and tail call elimination
3.2a Tailforms
3.2b Tailforms handled as embedded calls
3.2c Caller-status arguments to eval-sequence and eval-dispatch
3.2d Detecting errors before they happen
3.3 Macros
3.3a Why the MCE uses special forms
3.3b Using rewriting instead of special forms
3.3c Logo continuations
3.4 Dynamic scope
3.4a Reasons for lexical scope
3.4b Reasons for dynamic scope
3.4c Shallow binding
3.5 Nonlocal exit
3.6 Lack of parentheses delimiting procedure calls
3.6a Tokenization
3.6b Monadic and dyadic minus
3.6c More on runparsing
3.6d Treeification
3.6e Procedure bodies
3.6f Changing a procedure's arity
4. Data structures and types
4.1 Pair-based types
4.2 Numeric types
4.3 String types
4.3a Internal structure of a string
4.3b How to create a string node
4.3c Including punctuation characters in words
4.4 Symbols
4.5 Quote and colon types
4.6 Arrays
4.7 Primitive procedures
5. Garbage collector
5.1 Node allocation
5.2 The mark phase
5.3 GCTWA
5.4 The sweep phase
5.5 Per-node gc data
1. Review of metacircular evaluator
-----------------------------------
1.1 Recursive structure of the evaluator
In Logo, as in Scheme, evaluating an expression involves recursively
evaluating other expressions, for two reasons:
1. Visible subexpressions, as in (+ (* 3 4))
2. Expressions in the body of a procedure invoked by this expression
#1 gives rise to the mutual recursion
eval -> list-of-values -> eval
while #2 gives rise to
eval -> apply -> eval-sequence -> eval
As usual with recursive problems, the resulting program is remarkably short,
but it gives rise to complicated processes.
In what follows, we mostly focus on recursive path #2.
1.2 Tail call elimination in the MCE
Since recursion is the main built-in means for expression iterations
(iterative tools such as MAP are written using recursion; for the moment we
ignore Scheme's primitive DO and Logo's primitive REPEAT), it's important that
implementations be able to evaluate tail calls without creating extra frames
for each call. The proper handling of tail calls in the MCE depends on proper
tail calling in the underlying Scheme, and also on the fact that the
evaluation of the last expression in a sequence is done through a tail call to
EVAL:
(define (eval-sequence exps env) ; SICP p. 367
(cond ((last-exp? exps) (EVAL (FIRST-EXP EXPS) ENV))
(else (eval (first-exp exps) env)
(eval-sequence (rest-exps exps) env))))
To emphasize the point, the evaluator would not handle tail calls properly if
eval-sequence were written this way:
(define (eval-sequence exps env) ; wrong!
(let ((result (eval (first-exp exps) env)))
(if (null? (rest-exps exps))
result
(eval-sequence (rest-exps exps) env))))
This might seem more elegant because the call to EVAL appears only once in the
definition, but that call isn't a tail call.
2. The Explicit-Control Evaluator
----------------------------------
2.1 Typical C-style procedure calling convention
In typical machine-language programming (including compiled C programs) it
is the responsibility of called procedures to allocate space and save whatever
might otherwise be clobbered by a recursive call. So the calling sequence (in
the calling procedure) looks like
<put arguments in registers>
<procedure-call instruction>
where the second line represents a single machine language instruction that
saves the return address in a register and jumps to the procedure. The called
procedure looks like this:
<allocate stack space>
<save registers we'll use on the stack, including return address>
... <do the actual work> ...
<restore saved values>
<deallocate stack space>
<jump to return address in some register>
2.2 Procedure calling convention in the ECE
But this won't work if we want a tail-call to avoid allocating new stack
space. It's the caller, not the callee, who knows that this is a tail call.
Therefore the ECE makes saving the caller's job:
<allocate space>
<save registers>
<put arguments in registers>
<call procedure>
<restore saved registers>
<deallocate space>
When the caller is making a tail call rather than an embedded call, it can
leave out the saving and restoring steps:
<put arguments in registers>
<jump to procedure, without saving a return address>
That last line works because the called procedure inherits its return address
from the caller. In other words, it returns directly to the caller's caller.
2.3 The evaluator uses surprisingly few registers
exp an expression to be evaluated (argument to EVAL)
env the current environment (argument to EVAL and EVAL-SEQUENCE)
val the value of the expression (returned from EVAL)
continue the current return address (set by procedure call instruction
in most computers, but assigned explicitly here)
proc the procedure being invoked (argument to APPLY)
argl the actual argument values (argument to APPLY)
unev a list of expressions (argument to EVAL-SEQUENCE)
In addition, the SAVE and RESTORE operations use an implicit stack pointer
register, which is never explicitly saved or restored.
2.4 The two-screenful version of the explicit control evaluator
eval-dispatch: [Corresponds to EVAL in MCE]
exp is self-evaluating -> val=exp, return to caller
exp is variable -> val=(lookup exp env), return to caller
exp is special form -> each one is different, much like MCE
exp is application:
SAVE STUFF
EXP=(CAR EXP)
CALL EVAL-DISPATCH ;operator
RESTORE STUFF
proc=val
argl='()
unev=(cdr exp) ;actual argument expressions
ev-appl-operand-loop:
unev is empty -> goto apply-dispatch
exp=(car unev)
SAVE STUFF
CALL EVAL-DISPATCH
RESTORE STUFF
argl=(cons val argl)
unev=(cdr unev)
goto ev-appl-operand-loop
apply-dispatch: [Corresponds to APPLY in MCE]
proc is primitive -> do magic, return to caller
compound-apply:
unev=(procedure-parameters proc)
env=(procedure-environment proc)
env=(extend-environment unev argl env)
unev=(procedure-body proc)
goto ev-sequence
ev-sequence: [Corresponds to EVAL-SEQUENCE in MCE]
exp=(car unev)
(cdr unev) is empty -> ev-sequence-last-exp
SAVE STUFF
CALL EVAL-DISPATCH
RESTORE STUFF
unev=(cdr unev)
goto ev-sequence
ev-sequence-last-exp:
GOTO EVAL-DISPATCH
The calls to EVAL-DISPATCH are capitalized in the above. There are four of
them; the first three (operator, operand, body expression) are surrounded by
save and restore operations, but the fourth (final body expression) is just a
goto, without saving registers. (SICP pp 556-557.)
The version above simplifies the actual ECE by leaving out an optimization in
evaluating the last actual argument expression in a procedure application.
It's important that special forms that evaluate subexpressions, such as IF,
are written to tail-call EVAL-DISPATCH for the last subexpression, so that
(if (= 2 3) (this-isnt-evaluated) (this-is-tail-called))
works correctly.
2.5 Implementing explicit control in C
The C language provides labels and goto, but the labels are not first-class;
you can't put a label (i.e., a memory address in your code) into a register
or onto the stack.
UCBLogo gets around this restriction by using a kind of pun: C has separate
namespaces for labels and members of enums -- the same symbol can be used for
both. So in logo.h there is a declaration of an "enum labels" whose members
have the same names as labels in the evaluator (e.g., begin_seq). The
newcont macro that's used in the evaluator to store a new continuation (in
the ECE sense) on the stack actually stores this enum (i.e., a small
nonnegative integer). This stored value is translated into an actual label
at fetch_cont, which says
switch (x) {
begin_seq: goto begin_seq;
accumulate_arg: goto accumulate_arg;
...
}
(The actual order of labels is determined in logo.h; I've tried to put the
more commonly used ones first.)
3. Complicating Issues in UCBLogo
---------------------------------
The SICP evaluators leave out one important complexity of a full Scheme
implementation (the nonlocal transfer of control allowed by CALL/CC, which is
somewhat like the issue raised in Logo by CATCH and THROW). Also, Logo is
more complicated to interpret than Scheme, for these reasons:
Commands vs. operations (not every procedure returns a value)
The desire for error messages that hide tail call elimination
Macros
Dynamic scope
Lack of parentheses delimiting procedure calls
These topics are discussed in the following sections.
3.1 Commands vs. operations
The UCBLogo interpreter contains an object named UNBOUND that can be used as
the return value from a Logo procedure to indicate that, above the abstraction
barrier, there is no return value.
Consider the following Logo procedure:
to bfn :n :lst
if :n=0 [output :lst]
output bfn :n-1 butfirst :lst
end
The call to BFN in the last line of the body should be a tail call. This
means that we have to treat the word OUTPUT specially in the interpreter. To
the user, OUTPUT is a primitive procedure that takes one input. But we can't
evaluate the input expression and then call OUTPUT, because that would make
the call to BFN an embedded recursion. (In other words, the call to
EVAL-DISPATCH in 2.4 above that evaluates argument expressions saves and
restores registers.)
Instead, OUTPUT must be treated internally as a special form. When OUTPUT is
seen, even if there are more unevaluated expressions in the body, its input is
evaluated with a tail call (a goto) to EVAL-DISPATCH.
Similarly, STOP must be treated specially. EVAL-SEQUENCE looks to see if the
expression *after* the current one is the word STOP; if so, the current
expression is tail-called. We must catch STOP early to handle things like
to foo
IF condition [tail-call-this STOP]
...more expressions...
end
Under some circumstances the STOP is seen as the current expression, rather
than as the expression after the current one; in this case, EVAL-SEQUENCE just
returns to its caller. This situation is exemplified by
to foo
IF condition [STOP]
...more expressions...
end
In eval.c, OUTPUT and STOP are referred to as "tailforms"; there is a
procedure is_tailform(exp) that returns nonzero (true) if the expression is an
OUTPUT or STOP expression.
Note on terminology: In above-the-line documentation we distinguish between
*expressions*, which return values, and *instructions*, which don't. But
internally they're all expressions.
A third tailform is .MAYBEOUTPUT; this tail-calls its input expression, like
OUTPUT; the only difference is that it's not considered an error if the input
expression returns UNBOUND instead of a value. This brings us to the next
topic:
3.2 Error messages and tail call elimination
First of all, in order to give error messages at all, the interpreter must
maintain some extra information, mainly the names of procedures, in additional
registers. The main ones are:
fun the name of the current procedure
ufun the name of the current user-defined procedure
this_line the line in ufun's body being evaluated
FUN and UFUN will be unequal if we're evaluating a call to a primitive
procedure.
Logo gives error messages if a return value is expected and not provided
(X DIDN'T OUTPUT TO Y) or vice versa (YOU DON'T SAY WHAT TO DO WITH Z).
These error messages would be straightforward were it not for tail call
elimination. The obvious place to check is right after a call to
EVAL-DISPATCH; the caller knows whether it wants a value or not, so it
compares the value returned with UNBOUND, giving an error message when
appropriate. The call to EVAL-DISPATCH for evaluating an argument would
require a return value other than UNBOUND, whereas the one in EVAL-SEQUENCE
for expressions other than the last would require UNBOUND.
3.2a Tailforms
One complicating factor is tailforms. Consider this Logo program:
to foo
output print 3
end
to baz
show foo
end
When we call BAZ, we should get the error message
print didn't output to output in foo
But we don't call PRINT and then call OUTPUT; we just tail-call PRINT directly
from FOO. So if we're not careful we'll get an incorrect message such as
print didn't output to foo
print didn't output to show
foo didn't output to show
On the other hand, consider this program:
to foo
print 3
end
to baz
show foo
end
This time, when we call BAZ, the message should be
foo didn't output to show in baz
and not anything starting "print didn't output..."
To avoid these incorrect error messages, it's not enough to remember "the
current procedure". We maintain additional information specifically for the
DIDN'T OUTPUT message:
didnt_get_output the context (fun, ufun, line) wanting output
didnt_output_name the procedure that should be blamed if no o/p
These should be thought of as arguments to EVAL-DISPATCH. There are three
special values of didnt_get_output:
UNBOUND we don't want an output
NIL either output or no output is okay (from .MAYBEOUTPUT)
(NIL . NIL) this is a macro, and must output, but not for anyone
in particular
3.2b Tailforms handled as embedded calls
Sometimes it's not good enough to handle STOP and OUTPUT as tail calls.
Consider the Logo instruction
repeat 5 [... if :x<0 [output 3] ...]
This is legal inside a procedure body; the OUTPUT outputs from the enclosing
user-defined procedure. But the evaluation of the expression that is the
second input to REPEAT isn't a tail evaluation, since after it's done the
repeat count must be updated and tested. So when :X is negative we can't just
tail-eval the 3; that would return 3 as the value of the entire repeated
expression sequence. This OUTPUT is a nonlocal exit, comparable to a THROW.
So in this case we actually do call a primitive procedure named OUTPUT.
(Never mind for now what OUTPUT does in this case; we'll get back to nonlocal
exit later.)
3.2c Caller-status arguments to eval-sequence and eval-dispatch
To make all this work, we need an extra argument to EVAL-SEQUENCE and an
extra argument to EVAL-DISPATCH; these must be saved and restored with the
other evaluator registers. The extra argument to EVAL-DISPATCH is
didnt_get_output, mentioned earlier. The extra argument to EVAL-SEQUENCE is
named val_status and is an integer containing some combination of the
following flag bits:
#define VALUE_OK 1 /* [instr instr instr exp] */
#define NO_VALUE_OK 2 /* [instr instr instr instr] */
#define OUTPUT_OK 4 /* [instr instr OUTPUT exp ...] */
#define STOP_OK 8 /* [instr instr STOP ...] */
#define OUTPUT_TAIL 16 /* not [repeat n [... output ...]] */
#define STOP_TAIL 32 /* not [repeat n [... stop ...]] */
Here are all the ways in which EVAL-SEQUENCE is called, with the corresponding
values of val_status:
entry to evaluator (from Logo prompt) NO_VALUE_OK
body of procedure when output wanted OUTPUT_OK | OUTPUT_TAIL
body of procedure when no o/p wanted NO_VALUE_OK | STOP_OK | STOP_TAIL
body of procedure for .MAYBEOUTPUT NO_VALUE_OK | STOP_OK | STOP_TAIL |
OUTPUT_OK | OUTPUT_TAIL
RUNRESULT [sequence] VALUE_OK | NO_VALUE_OK |
OUTPUT_OK[**] | STOP_OK[**]
REPEAT n [sequence] NO_VALUE_OK |
OUTPUT_OK[*] | STOP_OK[*]
APPLY [sequence] args, output wanted VALUE_OK
APPLY [sequence] args, no o/p wanted NO_VALUE_OK |
OUTPUT_OK[*] | STOP_OK[*]
[*] means that these flags are on only if they were on for the enclosing
expression sequence.
[**] means that OUTPUT and STOP are actually not okay inside a RUNRESULT,
but the RUNRESULT handler lets STOP or OUTPUT be called and then detects
and reports the error afterwards.
You may wonder why RUNRESULT and REPEAT are special cases here, but not RUN,
IF, and IFELSE. We'll discuss that shortly.
3.2d Detecting errors before they happen
Sometimes it's not good enough to detect an error after evaluation.
These situations are obscure. For example:
to blorp :x
repeat 5 [pr :x if :x<0 [op 3] make "x :x-1]
end
? blorp 2
2
1
0
-1
You don't say what to do with 3
The OP in BLORP would be legal if the top-level instruction had been
PRINT BLORP 2. When we see the OP, we know that no output is wanted,
but we have to evaluate the input expression (3) anyway, just in case
it prints something or has some other side effect. Because the OP is
nested inside a REPEAT, it turns out, we can't just let the error be
detected downstream; we have to flag the error without letting the REPEAT
handler continue. So we non-tail-evaluate the 3, then after the evaluation
returns, we flag the error. (Look at label op_want_stop in eval.c to see
how this works.) This code is used for all cases of OUTPUT when STOP was
wanted, but it's only really necessary in the REPEAT case.
3.3 Macros
In the SICP evaluators, IF is a special form. This means that the evaluation
of the consequent or alternative of an IF expression happens through a string
of tail calls within the evaluator:
eval -> eval-if -> eval (of consequent or alternative)
Thus, if the entire IF expression is in tail position, the consequent or
alternative is also tail-evaluated. (Of course the predicate argument to IF
can't be tail-evaluated, because the job of IF isn't finished when that
argument has been evaluated; the result must still be tested for true or
false, and either the consequent or the alternative must be evaluated.)
SICP handles COND differently; instead of directly tail-evaluating the actions
of the successful COND clause, the entire expression is rewritten as an IF
expression, which is then evaluated:
(define (eval exp env) ; SICP page 365
(cond ...
((cond? exp) (EVAL (COND->IF EXP) ENV))
...))
COND->IF is a rewriting procedure that returns a new expression, which is used
as argument to another (tail-recursive) call to EVAL.
They could have handled IF as a sort of rewriting, also, except that the
rewriting procedure would need access to the environment:
(define (if->exp exp env) ; compare with EVAL-IF, SICP p. 367
(if (true? (eval (if-predicate exp) env))
(if-consequent exp)
(if-alternative exp)))
The general name for such a rewriting procedure is a macro.
3.3a Why the MCE uses special forms
Special forms in the MCE have two purposes. First, they delay (IF) or
prevent (QUOTE) evaluation of subexpressions; second, they have access to the
caller's environment (DEFINE, SET!). Neither of these issues comes up in
quite the same way in a Logo interpreter.
Delayed evaluation in Logo is done by quoting (including " and [] notations);
this is why we say
ifelse :x < 0 [print "negative] [print "positive]
with brackets around the PRINT instructions, but not around the :X < 0
expression. [In Scheme, parentheses would be used for all three
subexpressions:
(if (< x 0) (display 'negative) (display 'positive))
and it's the status of IF as a special form that prevents the early evaluation
of the last two subexpressions.]
Access to the caller's environment isn't a problem for Logo primitives because
of dynamic scope; the evaluator's ENV variable, which is a global variable in
the C program, contains the correct environment when any primitive is used.
So Logo's equivalent to DEFINE and SET!, namely MAKE, is an ordinary
primitive.
As a result, Logo has only two special forms: TO, because of its strange
notation, and .MAYBEOUTPUT, because its "input" expression need not return a
value. (This is the above-the-line view.) In principle there's nothing
special about RUN, IF, IFELSE, REPEAT, or RUNRESULT.
This simple view of the world doesn't quite hold below the line, though,
because of tail call elimination. We want
to foo
if ... [output tail-call-this]
...
end
to be a tail call, so we can't recursively call the evaluator from inside the
IF primitive.
3.3b Using rewriting instead of special forms
Instead we make IF a macro -- that is, a rewriting procedure. If the
condition is satisfied, it outputs its second input; if not, it outputs the
empty list. (IFELSE outputs either its second or its third input; RUN just
outputs its input.) The evaluator recognizes (in APPLY-DISPATCH) that a macro
is being invoked; instead of just returning what the procedure returns, it
splices the return value into the sequence of unevaluated expressions.
What "splices" means, more precisely, is
unev=(append val unev)
[If you're reading carefully you'll notice that I'm having APPLY do something
to a sequence of expressions, whereas such a sequence exists only in
EVAL-SEQUENCE. Actually APPLY pushes on the control stack a return address
of macro_return (so that every call to a macro is a non-tail call, by the
way); when we get there, the C instruction
stopping_flag = MACRO_RETURN;
is executed. Back in EVAL-SEQUENCE, this flag is noted, and that's where the
splicing is done. A further complication is that if the macro was in fact
called from tail position, we won't return to EVAL-SEQUENCE, which tail-called
EVAL-DISPATCH, so instead macro_return has to go to begin_seq (which goes to
eval_sequence) itself. If the macro is called when an argument was expected,
it must return exactly one expression; the MACRO_RETURN flag is caught at
accumulate_arg, and the returned expression is sent to eval_dispatch.]
Why not just treat RUN, IF, and IFELSE as special forms internally (that is,
handle them as special cases within EVAL-DISPATCH) even though we tell users
they're ordinary procedures? Two reasons:
1. The UCBLogo approach allows these to be separate C procedures,
outside of eval.c, instead of making the core evaluator know
about them.
2. Once we have the macro capability, we can fairly easily make it
available to users, through the .MACRO primitive.
(Note: As of R5RS, Scheme has a standard rewriting capability available to
users, more complicated than ours, but with the same purposes.)
3.3c Logo continuations
This is the entire story for RUN, IF, and IFELSE. As we've seen before,
REPEAT and RUNRESULT are more complicated, because they both have more work to
do after their expression-sequence arguments are evaluated. REPEAT must count
down its repetition count and test for zero. RUNRESULT modifies the result of
the evaluation, returning [] instead of UNBOUND and [FOO] instead of FOO.
REPEAT and RUNRESULT are still considered macros, but they don't return
rewritten expressions. Instead they return a special internal data type,
called a "continuation." (The name is inspired by Scheme's continuations, and
it's a reasonably appropriate name because it does embody "what to do next,"
but it's not an exact analog of Scheme continuations.) A continuation is a
pair whose car is a (representation of a) label within the evaluator, and
whose cdr is whatever data the code at that label expects (this generally
means the inputs to the macro). The code at macro_return recognizes that the
return value was a continuation; it sets VAL to the cdr of the continuation
and jumps to the label specified in the car.
Since most of the implementation of these Logo primitives is buried inside the
core evaluator, they are similar to the special forms of the SICP evaluator.
The difference is that they are recognized through the same table lookup used
for ordinary procedures, rather than by special tests within EVAL-DISPATCH.
GOTO and CATCH are also macros with continuations. GOTO doesn't take an
expression sequence as an input; the sense in which it's a rewriting procedure
is that it "rewrites" its input into the expression sequence consisting of the
part of the current user procedure's body starting at the corresponding TAG
and continuing until the end. In this case the resulting expression sequence
replaces the previous value of UNEV instead of being spliced into it.
We'll discuss CATCH shortly, under the heading of nonlocal exit, but we're not
quite ready for that.
3.4 Dynamic scope
It's rather a shame, I think, that the second edition of SICP no longer even
mentions dynamic scope, so I have to explain what it means before discussing
how it's implemented. See also _Computer Science Logo Style_ vol 1 page 49
on dynamic scope, and CSLS vol 3 page 183 on lexical scope.
Within a procedure body, how do we handle references to variables that are not
local to that procedure itself?
In C, any variable reference within a procedure must be either to a variable
local to that procedure or to a global variable; one procedure can never refer
to another procedure's local variables.
In Scheme, one procedure can be defined inside the body of another; the inner
procedure can refer to local variables belonging to the outer one. This is
called *lexical scope*. [C is lexically scoped, too, but trivially so, since
there is no nesting of procedure definitions.]
In Logo, when one procedure *invokes* another, the invoked procedure can refer
to local variables belonging to the caller. This is called *dynamic* scope.
The following would print DYNAMIC in a dynamically-scoped Scheme, but prints
LEXICAL in (the actual) lexically-scoped Scheme:
(define (a x)
(define (b x) (c))
(define (c) x)
(b 'dynamic))
(a 'lexical)
Focus on the reference to X in the body of procedure C. Since C was invoked
by B, its *dynamic environment* is
C -> B -> A -> global
whereas its *lexical environment* is
C -> A -> global
because C was defined inside A, not inside B.
3.4a Reasons for lexical scope
Why does almost everyone prefer lexical scope? (I prefer it, too, under
many circumstances, by the way.) Three reasons:
1. A complier can produce faster compiled code for a lexically scoped
language, because the compiler knows which variable is meant by a
variable reference without actually running the program. The
rules refer only to the physical position of the procedure's
definition within the program, not to who calls whom. In a
dynamically scoped language, variable references can't be resolved
until the program is actually running; the same reference might
mean two different variables on two calls to the same procedure.
2. Lexical scope avoids "name capture" bugs, in which the programmer
accidentally uses the same name for two different purposes, so
that a procedure thinks it's getting one variable X when it really
ends up getting a different variable X because the name was reused
in a procedure that (directly or indirectly) invokes it. This is
the reason most often cited. For the most part I think people who
name variables X deserve what they get, but name capture does come
up as a problem when writing higher-order functions in Logo, if
the names of inputs to the higher-order function are common words
that might appear in a template. See CSLS vol 2 page 204.
3. In Scheme, in which lexical scope is combined with first-class
procedures (i.e., LAMBDA), that combination allows for persistent
local state variables, and therefore for object-oriented
programming, without any OOP-specific language syntax. See
SICP 3.1 and 3.2 for details.
3.4b Reasons for dynamic scope
Why, then, does Logo use dynamic scope? Four reasons:
1. It allows for a simple notation for higher-order functions, using
first-class expressions rather than first-class procedures. We
can say
to add.suffix :suffix :words
output map [word ? :suffix] :words
end
The expression template [WORD ? :SUFFIX] is evaluated inside MAP,
not inside ADD.SUFFIX, and yet it has access to the latter's local
variable SUFFIX. In Scheme we'd have to use LAMBDA, a more
intimidating notation.
2. Some of us think that dynamic scope is just easier for kids and/or
novice programmers to think about. If a variable exists, it's
available for use, unless shadowed by a more recently created
variable of the same name. You don't have to think about scope
at all, really.
3. Debugging support is more natural. You can tell Logo to PAUSE
whenever there's an error. You then have a Logo prompt, to which
you type ordinary Logo instructions, but all the relevant
variables are available for inspection. This is important because
the actual error in your program might not be in the procedure
that died, but rather in the caller of the caller of the caller of
that procedure. With dynamic scope, the variables of the
erroneous procedure are visible. You don't have to learn special
debugging commands that mean "switch to a different environment."
4. It allows for a programming style using "semi-global" variables
that are local to a top-level procedure. For example, a program
to draw a complicated picture might have a SIZE input, which is
then used by its subprocedures, sub-subprocedures, etc.
3.4c Shallow binding
There's an efficiency problem with dynamic scope and recursive procedures.
Consider the following modification of a common example:
make "one 1
to fact :n
if :n = 0 [output :one]
output :n * fact :n-1
end
Suppose we ask for FACT 5. This gives rise to an (embedded) recursive call
for FACT 4, which calls FACT 3, which calls FACT 2, which calls FACT 1, which
calls FACT 0, which says OUTPUT :ONE.
We have to look up the variable name ONE.
Is it local to the FACT 0 environment? No. So we look in the dynamically
enclosing environment, for the FACT 1 call. Is it there? No, so we look in
the FACT 2 environment, the FACT 3 environment, the FACT 4 environment, the
FACT 5 environment, and finally the global environment, which is where we find
it. This is quite time-consuming, and would be even more so if we wanted the
factorial of 100.
Here's the solution. Instead of putting new local bindings in a newly created
environment frame, and linking that to an enclosing environment, as in SICP
3.2 and 4.1.3, we instead keep the currently active bindings in a single
global table. (Since this table is used for every variable reference, we work
hard at making it efficient, namely by using a hash table rather than the
linear lists of SICP environments.) We still create new frames for procedure
calls, but what we put in the new frames are *saved* bindings, the ones we
replaced in the global table for our new local variables. These frames are
used only when the called procedure returns; we restore the previous state of
the global table from the saved values. This is called *shallow binding*.
The implication of this technique is that every variable reference, local or
global, takes constant time. The cost is that returning from a procedure call
takes time proportional to the number of local variables in the procedure.
Since all symbols are looked up in the global symbol table, there is no ENV
register in the Logo evaluator. Instead there are two registers pointing to
a stack of saved values. var_stack points to the head of the stack, where
new entries are added; var points to the beginning of the current frame, so
that bindings between var and var_stack are the ones that should be restored
when an embedded call to eval_dispatch returns.
3.5 Nonlocal exit
Despite the proper handling of tail calls, a typical program still gives rise
to several nested calls to EVAL-DISPATCH. Ordinarily these are "unwound" one
by one, as (Logo) procedures return values. But sometimes we want to unwind
several such recursive evaluations at once. The prototypical case is a call
to THROW that appears several recursive evaluations below the corresponding
CATCH.
Were it not for shallow binding, we could in fact leapfrog over several nested
evaluations at once, just throwing out their local information, sort of like
setjmp/longjmp in C. But instead we must restore all of the saved variable
bindings, in the correct order, so we have to let each of the nested EVAL
calls return to its caller.
But we don't want those intermediate levels to keep computing; the whole point
of THROW is to avoid completing some partial evaluations of expression
sequences. We need some way to tell EVAL-SEQUENCE to return right away,
without evaluating more expressions.
This is done using another evaluator register, called stopping_flag, which
should be viewed as part of the return value from EVAL. It has the following
possible values:
RUN program is running
STOP non-tail STOP was seen
OUTPUT non-tail OUTPUT was seen
THROWING THROW was seen
MACRO_RETURN a macro is returning a rewritten expression (see 3.3b)
Logo.h defines shortcuts to test some of these states:
#define NOT_THROWING (stopping_flag != THROWING)
#define RUNNING (stopping_flag == RUN)
#define STOPPING (stopping_flag == STOP)
It's NOT_THROWING because that turns out to be the most commonly used test.
The very first thing EVAL-SEQUENCE does is
if (!RUNNING) goto fetch_cont;
so that if we've seen a THROW (or a non-tail STOP or OUTPUT, as explained
in 3.2c) we just return without further evaluation of this sequence.
The continuation for CATCH calls EVAL-SEQUENCE for its second argument, then
if stopping_flag is THROWING and throw_node, a global variable, is equal to
its first argument, it sets stopping_flag back to RUN. (THROW can take an
optional second input, which is a value for the corresponding CATCH to return.
This works by setting output_node, another global variable, to the thrown
value. Non-tail OUTPUT also sets output_node.)
The variables throw_node and output_node don't have to be saved and restored
around embedded EVAL calls, because only one THROW or OUTPUT can be active at
a time.
If CATCH catches THROW, who catches non-tail STOP or OUTPUT? EVAL-SEQUENCE
does, after restoring registers and before looping for the next expression.
And so does REPEAT, so that it won't try repeating its input any more.
3.6 Lack of parentheses delimiting procedure calls
The SICP evaluators see an expression sequence as a list in which each element
is one complete expression, because every non-atomic Scheme expression is
enclosed in parentheses. So, for example, how many arguments are in this
procedure call? One less than the number of elements in the list. Easy.
Logo has a harder time, because expressions need not be parenthesized, and
also because of infix arithmetic notation. The Logo evaluator sees a
procedure name, realizes that this is the beginning of a procedure call, and
must then look up the procedure in order to know how many input subexpressions
to evaluate. Furthermore, some procedures take a variable number of inputs,
so the grouping may depend on the use of optional parentheses.
This grouping of tokens into subexpressions is slow, and it's wasteful to do
it repeatedly for expression sequences in the body of a procedure. So UCBLogo
does it once, the first time the procedure is called, and saves the result.
In effect each Logo procedure is translated into a Scheme procedure, as if it
had been typed in with every subexpression parenthesized and no use of infix.
3.6a Tokenization
But I've skipped a step. Even the division of the characters on a line into
tokens is problematic. Here's the classic problem: Should the hyphen
character automatically be a token by itself? We'd like
print 5-2
to work, and so we'd like the 5-2 to be understood as three tokens: 5, -, 2.
On the other hand, we'd like
print first [555-1212 642-8311 868-9827]
to print 555-1212, not 555.
There are three ways to resolve this dilemma:
"Old LCSI" style: the second example prints 555. This is what Apple
Logo, IBM Logo, and other early LCSI products do.
"New LCSI" style: the first example gives the error
I don't know how to 5-2
This is what Logowriter and later LCSI products do.
"Terrapin" style: both examples work as desired.
I apologize to non-Americans for naming these after the main USA Logo vendors,
but that's the history I know best.
UCBLogo uses Terrapin-style tokenization, which I think is clearly the best
choice. To make it work, there are two sets of tokenization rules, which I
call PARSE and RUNPARSE. These are in fact the names of UCBLogo primitives:
? show parse "5-2
[5-2]
? show runparse "5-2
[5 - 2]
When a line is read (from keyboard or file), it is parsed. When and if that
is evaluated, it's runparsed -- but even then, text within square brackets is
not runparsed. (Such text will be runparsed if it is ever used as input to
RUN and friends.)
The precise rules are given in more detail in the "Tokenization" section of
the UCBLogo User Manual.
3.6b Monadic and dyadic minus
The minus sign (-) is particularly thorny for tokenization because it has
three different meanings:
(1) If it begins a token, and the following character is a digit, then it is
part of a negative number token.
(2) If it follows an open parenthesis or bracket, or if it follows a space
and is not followed by a space, then it is the monadic (one-operand) negation
operator.
(3) Otherwise, it is the dyadic subtraction operator.
The part about spaces in (2) above comes from the problem that function
argument expressions are separated by spaces, not commas as in most
programming languages. This, along with infix arithmetic, gives rise to
an ambiguity. Does
foo a - b
represent a call to a one-argument function with A minus B as the argument,
or a call to a two-argument function with A as the first argument and
negative B as the second? The UCBLogo rule is this:
foo a-b ; dyadic subtract
foo a - b ; dyadic subtract
foo a -b ; monadic negate
Once runparsing is done, the spacing information is lost. So when a monadic
negate is seen during runparsing, it is replaced by the two tokens "0" and
"--":