forked from felix-lang/dypgen
CHANGES
2019/11/16
Fixed the project to work with OCaml versions >= 4.06.0.
__________________________________________________________________________
2012/06/19
Fixed a bug that made the parser return wrong beginning positions for the
matched symbols when the right hand side of the rule to reduce with would
begin with nullable non terminal symbols.
Section 2.4 of the manual has been updated.
__________________________________________________________________________
2012/05/28
Fixed a bug that made the parser return wrong beginning positions for the
matched symbols (layout characters were not skipped).
__________________________________________________________________________
2012/04/19
Fixed a bug that made the parser return wrong positions for the matched
symbols.
__________________________________________________________________________
2012/03/14
Added the function:
val is_re_name : ('t,'o,'gd,'ld,'l) parser_pilot -> string -> bool
Now you can ask the parser_pilot whether s is the name of a declared regexp:
>
> if is_re_name dyp.parser_pilot s
> then Regexp (RE_Name s)
> else failwith "regexp name undefined"
__________________________________________________________________________
2011/11/27
Fixed a bug that prevented merges from happening when reducing with a rule
whose right-hand side begins with a non terminal reducing to the empty
string.
Fixed a bug that made the parser raise Syntax_error when the starting non
terminal (the entry point of the grammar) was present in a right-hand side.
Fixed a bug that prevented the ASTs of the parse forest from being returned
with their respective priorities (empty strings were returned instead).
Fixed a bug that prevented merges from happening when reducing with a rule
that had an inherited attribute or an early action in the right-hand side.
__________________________________________________________________________
2011/06/18
Added the option --Werror that makes the warnings become errors (except
for merge warnings).
__________________________________________________________________________
2011/03/28
Fixed a bug that prevented calls to functions parse and lexparse to work
properly.
Fixed a bug that happened when using Parser in the parser command list in
some instances (the number identifying the grammar was not updated).
__________________________________________________________________________
2011/02/13
Two functions now make it possible to save a parsing_device without any
functional value and to attach the functional values again after loading
it. See manual section 7.4 for more information.
__________________________________________________________________________
2011/02/02
Fixed a bug that made the functions parse and lexparse not take into
account the optional global_data and local_data.
The constructor Parser of the parser commands list now takes a value of
type parsing_device instead of parser_pilot.
The type parser_pilot is now concrete:
  type ('token,'obj,'global_data,'local_data,'lexbuf) parser_pilot = {
    pp_dev : ('token,'obj,'global_data,'local_data,'lexbuf) parsing_device;
    pp_par : ('token,'obj,'global_data,'local_data,'lexbuf) parser_parameters;
    pp_gd : 'global_data;
    pp_ld : 'local_data }
When you want to save the automaton to a file you marshal
dyp.parser_pilot.pp_dev instead of dyp.parser_pilot.
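As a rough illustration of marshalling only the pp_dev field, here is a
minimal sketch. The record types are simplified stand-ins (a device holding
a plain int array, a pilot with three fields), and save_device and
load_device are hypothetical helper names, not functions of the dyp
library. The real parsing_device may hold functional values that need the
treatment described in manual section 7.4.

```ocaml
(* Stand-in types for illustration only; the real ones come from Dyp. *)
type parsing_device = { states : int array }

type ('gd, 'ld) parser_pilot = {
  pp_dev : parsing_device;
  pp_gd : 'gd;
  pp_ld : 'ld;
}

(* Hypothetical helper: write only the device (the automaton) to [path]. *)
let save_device path pilot =
  let oc = open_out_bin path in
  Marshal.to_channel oc pilot.pp_dev [];
  close_out oc

(* Hypothetical helper: read a device back. The type annotation is
   required because Marshal is untyped. *)
let load_device path : parsing_device =
  let ic = open_in_bin path in
  let dev : parsing_device = Marshal.from_channel ic in
  close_in ic;
  dev
```

The global and local data are left out of the file on purpose: only the
device is written, and a pilot has to be rebuilt around it after loading.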
__________________________________________________________________________
2011/01/27
Code generated by dypgen now checks that the version of dypgen used to
generate the code matches the version of the dyp library.
It is now possible to load a parser at parsing time using the constructor
Parser of ('token, 'obj,'global_data,'local_data,'lexbuf) parser_pilot
of the parser commands list. See section 7.4 of the manual for more
information.
__________________________________________________________________________
2010/09/01
Fixed a bug that made the function next_lexeme raise
Invalid_argument("String")
Fixed a bug that made the function next_lexeme return an empty list
Fixed a bug that made the lexer raise Invalid_argument("String.create")
Note that the changes made to the lexer in the version 2009/11/15 are not
effective anymore (they did not work anyway).
__________________________________________________________________________
2010/06/20
Fixed a bug that exchanged inherited attributes between distinct parsing
paths inappropriately.
Fixed a bug that made the parser infinitely loop when a left recursive
rule had an early action.
Left recursive rules with inherited attributes do not make the parser loop
indefinitely anymore (but inherited attributes are still not properly
handled in this case, see the manual section 5.4 for more details).
__________________________________________________________________________
2009/11/15
Layout regexps are no longer preferred over non layout regexps when their
match is longer. For instance a regexp that matches the empty string and
that is expected will be preferred over a layout character.
For example the following rule:
a: "x" - "y"? "z"
now matches the string "x z", while it didn't previously (assuming space
is a layout character).
Fixed a bug that caused a syntax error in the generated caml code. It
happened when the type of an entry point was declared and an early action
was used.
__________________________________________________________________________
2009/04/30
Improved the speed of generation of the automaton.
Fixed a bug that made some merges be performed several times instead of
just once.
Renamed the temporary files parser.temp.ml to parser_temp.ml to avoid
ocaml 3.11 emitting a warning.
__________________________________________________________________________
2009/04/11
Fixed a bug that raised Invalid_argument with large grammars using
priorities.
Fixed two bugs that made the parser consume a very large amount of memory
with large grammars using priorities.
Fixed a bug (introduced in the previous version) that made the parser
perform the reductions in a wrong order in some cases resulting in lost
parse trees.
__________________________________________________________________________
2009/04/09
Fixed a bug that raised Not_found for no reason in some instances.
Changed a data structure that made the parser use too much memory for large
grammars with priorities.
__________________________________________________________________________
2009/03/10
Fixed a bug with inherited attributes. They did not work when there was an
early action in the right-hand side of the rule.
The behavior of early actions with respect to dyp.local_data changes
slightly: the scope of local_data now is the whole right-hand side (final
action still excluded) even when there are early actions in the rhs.
Fixed a bug with inherited attributes that caused some parse trees to be
duplicated.
Fixed a bug with lexers generated with dypgen: when the character '-' was
used before a nullable non terminal nt in the rhs of a rule in the parser
definition, the ban on layout characters was applied before the previous
symbol instead of before nt.
The temporary file is now named with the extension .temp.ml instead of
.ml.temp for compatibility with ocamlc 3.12.
__________________________________________________________________________
2009/02/23
Fixed 2 bugs with inherited attributes. Parsing would fail in some
instances when it shouldn't and the parser would not keep track correctly
of the different synthesized ASTs when the same rule appeared several
times with different inherited attributes.
__________________________________________________________________________
2009/02/21
Fixed a bug happening when extending the grammar in some instances. The
generated automaton would not parse as intended or the generation would
fail.
Added the option --cpp-options "options" to dypgen to pass options to cpp
when called by dypgen. It is useful to pass the flag -w to cpp to avoid
the warning messages. --cpp is not needed when using --cpp-options.
dypgen now supports a subset of the class of L-attribute grammars.
You can send down an inherited attribute except for left-recursive rules
which make the parser loop indefinitely when used with an inherited
attribute. See section 5.5 of the manual.
__________________________________________________________________________
2009/02/10
Nested rules are now enclosed between [ and ] instead of ( and ).
If the right-hand side of a rule is not empty you can omit the action. In
that case the value bound to the last symbol of the rhs is returned.
For example you can use it with nested rules this way: ["kw1" | "kw2"- ]
here "kw1" can be followed by layout characters while "kw2" cannot.
Another example: nt ["," nt]* to have a list of nt separated with commas.
Note that something like ['a'] will still be interpreted as a character
set and not as a nested rule, write ["a"] instead.
Bug fixed: dypgen raised a syntax error when named regexp were declared
without any main lexer.
__________________________________________________________________________
2009/02/05
Added the following functions to Dyp :
val set_newline : 'obj dyplexbuf -> unit
val set_fname : 'obj dyplexbuf -> string -> unit
documented at the end of section 2.1.
Lines of the form:
# -line-number- "filename" -anything-until-end-of-line-
are now allowed in .dyp files and taken into account for error location by
dypgen and Caml.
As a consequence dypgen now accepts cpp preprocessing of .dyp files.
With the option --cpp dypgen calls the C preprocessor cpp on the input
file before processing it. For example:
#define INFIX(op,p) expr(<=p) #op expr(<p) { $1 op $3 } p
expr:
| INFIX(+,pp)
| INFIX(*,pt)
is expanded into:
expr:
| expr(<=pp) "+" expr(<pp) { $1 + $3 } pp
| expr(<=pt) "*" expr(<pt) { $1 * $3 } pt
The following abbreviations are now valid inside caml code:
$< for dyp.Dyp.rhs_start_pos
$> for dyp.Dyp.rhs_end_pos
__________________________________________________________________________
2009/01/29
Fixed a bug that made the parser miss some reductions in some cases. As
a result parsing failed in some instances when it should not have.
Bug fixed: when several actions were bound to a rule, they were performed
in reverse order instead of source file order (manual section 5.1).
It is now possible to execute all the actions bound to each rule instead
of just the first that doesn't raise Giveup. For this use the option:
--use-all-actions or declare: let dypgen_use_all_actions = true.
__________________________________________________________________________
2009/01/26
Several functions of the GLR algorithm and of analysis of the grammar have
been rewritten. This fixes several bugs: in some cases some ASTs were lost
or parsing failed on valid inputs.
Cyclic grammars (when a non terminal can derive itself) are now allowed.
__________________________________________________________________________
2009/01/05
Bug fixed: the main function of the parser was not tail recursive when
using the lexer generated by dypgen. This is fixed and results in a big
performance improvement, especially with large files.
Another improvement makes the parser faster (regardless of which lexer
generator is used) when the grammar is ambiguous.
__________________________________________________________________________
2009/01/01
Bugs fixed:
- When a layout regular expression could match the empty string it caused
the parser to loop indefinitely in some instances.
- Stating two regular expressions on a rhs with the second one being a
sequence of regular expressions made dypgen consider them as just one
regexp instead of two separated regexp.
- In some instances, when using dypgen_choose_token = `all, the lexer
would stop lexing inappropriately and the parser would raise Syntax_error.
__________________________________________________________________________
2008/12/31
Bug fixed: merge happened even when one rule forbade layout characters and
the other did not. Such merges do not happen anymore.
__________________________________________________________________________
2008/12/30
Bug fixed. It happened in some instances when extending the grammar.
It is now possible to forbid layout characters more precisely than with
'!':
The type symb is now:
  type symb =
    | Ter of string
    | Ter_NL of string
    | Non_ter of string * string nt_prio
    | Non_ter_NL of string * string nt_prio
    | Regexp of regexp
    | Regexp_NL of regexp
instead of:
  type symb =
    | Ter of string
    | Non_ter of string * string nt_prio
    | Regexp of regexp
The suffix _NL tells that the symbol cannot be preceded by layout
characters. It is only relevant with lexers generated by dypgen.
In the .dyp file, you use the character '-' before a symbol.
The type rule is now:
type rule = string * (symb list) * string * rule_options list
instead of:
type rule = string * (symb list) * string * bool
with:
type rule_options = No_layout_inside | No_layout_follows
Using [] is like using true in the previous version and using
[No_layout_inside] like using false. No_layout_follows forbids layout
characters after the last symbol of the right-hand side of the rule. In
the .dyp file you use the character '-' after the last symbol in the rhs
of the rule.
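
To make the migration concrete, here is a minimal sketch of the new types
together with a translation from the old boolean. The types symb,
rule_options and rule are copied from this entry and nt_prio from the
2007/11/09 entry. The type regexp is reduced to a single constructor as a
stand-in (the full type is in the manual), and options_of_bool and
plus_rule are hypothetical names used only for illustration.

```ocaml
(* Stand-in: only one constructor of the real regexp type is listed. *)
type regexp = RE_Name of string

type 'a nt_prio =
  | No_priority
  | Eq_priority of 'a
  | Less_priority of 'a
  | Lesseq_priority of 'a
  | Greater_priority of 'a
  | Greatereq_priority of 'a

type symb =
  | Ter of string
  | Ter_NL of string
  | Non_ter of string * string nt_prio
  | Non_ter_NL of string * string nt_prio
  | Regexp of regexp
  | Regexp_NL of regexp

type rule_options = No_layout_inside | No_layout_follows

type rule = string * symb list * string * rule_options list

(* Hypothetical helper: translate the boolean of the previous rule type.
   true (layout allowed) becomes [], false becomes [No_layout_inside]. *)
let options_of_bool allow_layout =
  if allow_layout then [] else [ No_layout_inside ]

(* A rule for expr: expr PLUS expr where the second expr may not be
   preceded by layout characters (hence Non_ter_NL). *)
let plus_rule : rule =
  ( "expr",
    [ Non_ter ("expr", No_priority);
      Ter "PLUS";
      Non_ter_NL ("expr", No_priority) ],
    "default_priority",
    options_of_bool true )
```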
__________________________________________________________________________
2008/12/17
The default behavior of dypgen with respect to merging sub trees has
changed. When merging, dypgen used to pick one of the global data and one
of the local data arbitrarily. Now dypgen does not merge anymore when data
are different (according to global_data_equal and local_data_equal). This
behavior turns out to be more natural. You can still have the old behavior
by using the new optional argument:
?keep_data:[`both|`global|`local|`none]
of the functions parse and lexparse with `none. Or define the variable:
val dypgen_keep_data : [`both|`global|`local|`none]
in the header of the parser (see 3.2.4 in the manual for more info).
__________________________________________________________________________
2008/12/16
Bug fixed related to merging sub-trees, some merged trees were duplicated.
__________________________________________________________________________
2008/12/14
The record dyp has a new field:
next_lexeme : unit -> string list
next_lexeme allows the user action to know the next lexeme to be matched
by the lexer. It only works for the main lexer generated by dypgen. For
more information about next_lexeme see section 5.7 of the manual.
__________________________________________________________________________
2008/12/13
Bug fixed: dypgen_choose_token = `all did not work.
__________________________________________________________________________
2008/12/12
Bug fixed: merging of sub-trees was not handled properly when the grammar
contained epsilon rules and an assert failure could be raised.
__________________________________________________________________________
2008/12/09
Bug fixed: when a rule began with an early action the generated code was
wrong.
Bug fixed: in some instances, the layout characters were not skipped.
Bug fixed: long sequences and alternatives of regexp in a .dyp file made
dypgen loop forever.
The function lexparse has a new optional argument:
?choose_token:[`first|`all]
When `all is used the lexer uses all the tokens that are the longest match
instead of just the first one. By default `first is used.
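
The difference between the two settings can be sketched on a toy model.
This is not the lexer's implementation: candidates are modelled as
hypothetical (token name, match length) pairs in declaration order, and
choose_tokens is an illustrative function name.

```ocaml
(* Toy model: keep the candidates with the longest match, then return
   either the first of them or all of them. *)
let choose_tokens ?(choose_token = `first) candidates =
  let longest =
    List.fold_left (fun m (_, len) -> max m len) 0 candidates
  in
  let best = List.filter (fun (_, len) -> len = longest) candidates in
  match choose_token, best with
  | `all, _ -> List.map fst best
  | `first, (tok, _) :: _ -> [ tok ]
  | `first, [] -> []
```

With the candidates [("IDENT", 2); ("KW_IF", 2); ("INT", 1)], the default
keeps only IDENT, while ~choose_token:`all keeps IDENT and KW_IF.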
__________________________________________________________________________
2008/09/25
Dypgen can now generate a lexer for the parser and auxiliary lexers to be
called from the main lexer. These lexers support neither unicode nor
submatching (bindings with the keyword "as"). See sections 1.5 and 2.1 of
the manual for information about the dypgen lexer generator. The main
lexer can be extended by using regular expressions in the right-hand sides
of new grammar rules.
The type symb is now:
  type symb =
    | Ter of string
    | Non_ter of string * (string nt_prio)
    | Regexp of regexp
instead of:
  type symb =
    | Ter of int
    | Non_ter of string * (string nt_prio)
This means that terminals are referred to by a string (their name "TOKEN")
instead of an int (t_TOKEN), and that regular expressions are allowed in
the rhs of a rule. The type regexp is given in section 6.1.
Don't use names of undefined tokens when you write new rules (it raises
Undefined_ter): you cannot define new terminal symbols.
The type obj now owns the constructor:
Lexeme_matched of string
It is the constructor for the strings returned by regular expressions.
This constructor is present even if you don't use dypgen lexer generator.
There is one constructor Lex_lexer_name for each auxiliary lexer (where
lexer_name is the name of the lexer), and one constructor:
Lex_lexer_name_Arg_arg_name
for each parameter of the lexer (where arg_name is the name of the
parameter). The user won't have to deal with these constructors.
When using dypgen as the lexer generator you must use the function
lexparse instead of parse (section 7.3).
The patterns for symbols are now enclosed between < and > instead of [ and
], these brackets are now used for characters intervals.
It is not possible anymore to not state any action code after a rhs (this
meant the user action returned None). It was too ambiguous.
You should not have a non terminal named eof because this is the regular
expression that matches the end of input.
You can use the character '!' at the beginning of the rhs of a rule. This
means that the parser will reduce with this rule only if no layout
character was matched in the part of the input that is being reduced. This
only applies when dypgen is the lexer generator.
As a consequence the type of grammar rules changes, it is now:
type rule = string * (symb list) * string * bool
instead of:
type rule = string * (symb list) * string
When the bool is true layout characters are allowed to be matched in the
part of the input to be reduced.
The documentation has been updated. Partial actions are now called early
actions in the manual.
The keyword %parser is equivalent to %% in .dyp files.
__________________________________________________________________________
2008/09/01
--no-pp now works when non terminals are declared with %start too.
__________________________________________________________________________
2008/08/31
Added the option --no-pp. This prevents dypgen from declaring pp in the
.mli file (only works when no non terminal is declared with %start).
Added the option --no-obj. This prevents dypgen from declaring the type
obj in the .mli file (does not work if pp is declared).
The types token and obj declared in the .mli now use type names that are
completely prefixed with module names (it was already the case for the
value pp). Therefore you should avoid opening modules in %mlitop and
%mlimid. In particular this fixes a type error that happened when a module
containing a module of the same name was opened in the .dyp file.
__________________________________________________________________________
2008/08/27
Fixed a bug that happened when extracting strings from parser.extract_type
__________________________________________________________________________
2008/07/02
Added the option --command (see manual 10.6).
dypgen now works on an auxiliary file .ml.temp when generating the .mli
before saving the result in the .ml file.
Changed the type name parser to parser_pilot and the value name parser
to pp, this allows camlp4 to parse the generated .ml file. The function
update_parser is renamed update_pp and the field parser of the record
dypgen_toolbox is renamed parser_pilot.
__________________________________________________________________________
2008/06/14
The operators *, + and ? have been added to dypgen syntax, see section 4.5
of the manual for more information.
The .ml generated file defines the value parser.
The record dyp has a new field named parser. Because of the type of this
field, the typechecker of ocamlc complains in some cases, like when using
a partial action that returns a parser commands list (i.e. beginning with
...@{ ). In such cases you will have to use the option -rectypes of ocamlc
and use --ocamlc "-rectypes" with dypgen.
The following functions have been added to the module Dyp of the library:
  val update_parser :
    ('token,'obj,'global_data,'local_data,'lexbuf) parser ->
    ('token,'obj,'global_data,'local_data,'lexbuf) dyp_action list ->
    ('token,'obj,'global_data,'local_data,'lexbuf) parser
  val parse :
    ('token, 'obj,'global_data,'local_data,'lexbuf) parser -> string ->
    ?global_data:'global_data ->
    ?local_data:'local_data ->
    ?match_len:[`longest|`shortest] ->
    ?lexpos:('lexbuf -> (Lexing.position * Lexing.position)) ->
    ('lexbuf -> 'token) ->
    'lexbuf ->
    (('obj * string) list)
The function parse makes it possible to parse for any non terminal symbol
of the grammar and to parse recursively from the action code.
The function update_parser makes it possible to modify a parser with a
list of parser commands of type dyp_action. Both functions can be used
inside the action code and outside.
See section 7 for more information about parser, parse and update_parser.
The types of start entry points, global_data and local_data are now
inferred by Caml. You don't have to state them anymore. The keywords
%global_data_type and %local_data_type are therefore discarded.
The keyword %lexbuf_type is discarded. If you want to use another lexer
than ocamllex you have to define the function dypgen_lexbuf_position in
the header. If you don't need the positions of the lexer you may use the
line: let dypgen_lexbuf_position = Dyp.dummy_lexbuf_position
Added the option --ocamlc string to dypgen. In order to know some types,
dypgen calls ocamlc -i (see 8.2). --ocamlc makes it possible to pass some
command-line options to ocamlc. Example:
dypgen --ocamlc "-I ../dypgen/dyplib -rectypes" parser.dyp
The types pliteral and lit are renamed psymbol and symb.
The constructor Will_shift of bool is replaced by Dont_shift.
The program pgen has been removed, dypgen generates itself now.
The type error messages are less difficult to understand in some cases.
Added the option --no-mli : dypgen does not generate a .mli file.
Added the variable dypgen_match_length (see section 10.7).
You can state no action after the right-hand side of a rule (see 4.6).
__________________________________________________________________________
2008/02/04
Some improvements in the speed of extension of the grammar.
__________________________________________________________________________
2008/01/11
The constructor Relations has been renamed Relation for consistency with
dypgen syntax.
Some improvements in the speed of extension of the grammar.
A bug that could occur when extending the relation between priorities has
been fixed.
__________________________________________________________________________
2008/01/04
The constructor Keep_grammar has no argument anymore (you just state
Keep_grammar instead of Keep_grammar true)
It is now possible to add new priorities and make the relation true
between priorities. See the manual section 9 for more information.
__________________________________________________________________________
2007/12/26
The option --lexer does not exist anymore. If you want to use another lexer
than ocamllex, use the keyword %lexbuf_type to assign the type you want to
the lexer buffer (see section 13.5 of the manual for more info).
It is now also possible to have relevant information about the positions
(i.e. with the functions: symbol_start, symbol_start_pos, ...) when not
using ocamllex too (see section 13.6 of the manual for more info).
Added the example program position_token_list. It is the same as the demo
program position except that it first uses ocamllex to make a list of
tokens and then uses dypgen to parse this list of tokens.
New keyword %mlimid to add code to the mli between the type token and the
declaration of the entry point functions.
__________________________________________________________________________
2007/11/09
It is not possible to remove rules at parsing time in this version anymore.
The option --prio-pt does not exist anymore (a new method combines the
benefit of both former methods).
The commands to change the relation between priorities and add new
priorities are not available anymore (they were bugged in the previous
versions).
The constructors Remove_rules and Priority_data have been removed from
the type dyp_action. The fields priority_data, add_nt and find_nt have
been removed from the record dyp (type dypgen_toolbox).
The following functions are not available anymore:
empty_priority_data
is_relation
insert_priority
find_priority
set_relation
update_priority
add_list_relations
dyp.find_nt
dyp.add_nt
The following constructor has been added to the type dyp_action:
Bind_to_cons of (string * string) list
(the type dyp_action is the type of the list that user actions return
along with the resulting tree)
When one returns:
  Bind_to_cons [("nt1","Cons1");("nt2","Cons2")]
the non terminal nt1 is bound to the constructor Cons1 and nt2 to Cons2.
The new rules introduced at parsing time are constructed differently:
one now simply uses the string of the non terminal and the string of
the priority, e.g.
("expr",
[Non_ter ("expr",No_priority); Ter t_PLUS; Non_ter ("expr",No_priority)],
"default_priority")
for the rule:
expr: expr PLUS expr
The types to construct rules were previously:
  type token_name
  type non_ter
  type priority
  type non_terminal_priority =
    | No_priority
    | Eq_priority of priority
    | Less_priority of priority
    | Lesseq_priority of priority
    | Greater_priority of priority
    | Greatereq_priority of priority
  type 'a pliteral =
    | Ter of token_name
    | Non_ter of 'a * non_terminal_priority
  type lit = (non_ter * non_terminal_priority) pliteral
  type rule = non_ter * (lit list) * priority
And now they are:
  type token_name
  type 'a nt_prio =
    | No_priority
    | Eq_priority of 'a
    | Less_priority of 'a
    | Lesseq_priority of 'a
    | Greater_priority of 'a
    | Greatereq_priority of 'a
  type 'a pliteral =
    | Ter of token_name
    | Non_ter of 'a
  type lit = (string * (string nt_prio)) pliteral
  type rule = string * (lit list) * string
Added the commands infix, infixl and infixr to the example tinyML
(see test_infix.tiny)
__________________________________________________________________________
2007/11/08
Fixed a bug that raised an Assert_failure when using an ambiguous
grammar with an epsilon production and using Dyp.keep_all.
Partial actions must now be preceded by three dots `...' to prevent
puzzling errors when one forgets a `|' between productions.
__________________________________________________________________________
2007/07/29
global_data and local_data can be passed to the parser since they are
now optional arguments of the parser. If 'main' is a start symbol of
the grammar then we have the function:
  val main : ?global_data:gd_type -> ?local_data:ld_type ->
    (Lexing.lexbuf -> token) -> Lexing.lexbuf ->
    ((main_type) * Dyp.priority) list
where gd_type and ld_type represent the types of the global and local
data. Those have to be declared in the parser definition using:
%global_data_type <gd_type>
%local_data_type <ld_type>
and main_type represents the type of the value returned by the non
terminal 'main'.
You still need to define global_data and local_data in the header of the
parser definition, but their type doesn't have to be a ref any more.
They may have a polymorphic type, for instance
%global_data_type <'a list>
is valid.
The record dyp has no mutable field anymore. To give instructions to the
parser one now uses a list of values of type dyp_action instead of
assigning values to some fields of dyp:
  type ('obj,'gd,'ld) dyp_action =
    | Global_data of 'gd
    | Local_data of 'ld
    | Priority_data of priority_data
    | Add_rules of
        (rule * (('obj,'gd,'ld) dypgen_toolbox ->
        ('obj list -> 'obj * ('obj,'gd,'ld) dyp_action list))) list
    | Remove_rules of rule list
    | Will_shift of bool
    | Keep_grammar of bool
    | Next_state of out_channel
    | Next_grammar of out_channel
If the action returns such a list along with the returned AST, then the
character '@' must be placed just before the left brace which begins the
action code. For instance if one wants to add a new rule:
@{ returned_AST, [Add_rules [(new_rule, new_action)]] }
If one wants to change global_data:
@{ returned_AST, [Global_data new_global_data] }
and so on.
The user actions that are introduced at parsing time must now return a
couple: (returned_AST, dyp_action_list) instead of just returning an AST.
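The idea of returning instructions instead of mutating fields can be
sketched without the library. The types below are simplified stand-ins
(only three constructors, rules reduced to plain strings, and a
hypothetical state record); they are not the definitions of the module
Dyp.

```ocaml
(* Simplified stand-in for dyp_action: only three of the constructors,
   with rules reduced to plain strings. Not the library's definition. *)
type ('gd, 'ld) action =
  | Global_data of 'gd
  | Local_data of 'ld
  | Add_rules of string list

(* Hypothetical parser state, for illustration only. *)
type ('gd, 'ld) state = { gd : 'gd; ld : 'ld; rules : string list }

(* Apply the instructions in order, as a parser might after a reduction. *)
let apply st actions =
  List.fold_left
    (fun st a ->
      match a with
      | Global_data g -> { st with gd = g }
      | Local_data l -> { st with ld = l }
      | Add_rules rs -> { st with rules = st.rules @ rs })
    st actions

(* A user action returns the AST together with its instruction list. *)
let user_action n =
  (n + 1, [ Global_data (n * 2); Add_rules [ "expr: expr '!'" ] ])

let ast, acts = user_action 3
let st = apply { gd = 0; ld = (); rules = [] } acts
```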
The keywords
%constructor ... %for ...
can now be used with tokens as well. This is useful to save constructors
or to make the compilation of the generated code less demanding by using
fewer polymorphic variants, for instance by using the same constructor
for every token with no argument.
The same constructor of the type obj can be shared by non terminals and
tokens accepting one argument. In particular the constructors associated
with tokens with an argument can be used as constructors for new non
terminals introduced at parsing time.
The manual has been updated.
Fixed a bug that, in some cases, caused some parse trees to be lost or
raised Syntax_error when it shouldn't have.
Fixed a bug that happened with more than one partial action in a right
hand side.
__________________________________________________________________________
2007/07/26
New option for dypgen
--noemit-token-type
the type token is not emitted in the mli or ml files, it must be provided
by the user instead.
New keywords %mltop and %mlitop to add code at the top of the .ml or .mli
generated file.
__________________________________________________________________________
2007/07/22
Added the map:
val ter_of_string : token_name String_ter_map
where
module String_ter_map : Map.S with type key = string
It maps strings of terminal symbols to their corresponding token_name
values. It is defined in the module Dyp_symbols in the generated file.
The following list is also available in this module:
val ter_string_list : (string * token_name) list
Each string of terminal symbol is associated with its corresponding
value of type token_name.
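A minimal sketch of how such a map and list relate, using only the
standard library. The token names and their integer values are made up
for the example; in a generated parser both values come from the module
Dyp_symbols.

```ocaml
module String_ter_map = Map.Make (String)

(* Hypothetical token names; the real type token_name and the real list
   are defined in the generated module Dyp_symbols. *)
type token_name = int

let ter_string_list : (string * token_name) list =
  [ ("PLUS", 1); ("INT", 2); ("EOF", 3) ]

(* Build the map from the association list. *)
let ter_of_string : token_name String_ter_map.t =
  List.fold_left
    (fun m (s, t) -> String_ter_map.add s t m)
    String_ter_map.empty ter_string_list

(* Look up a known terminal and an unknown one. *)
let int_tok = String_ter_map.find "INT" ter_of_string
let unknown = String_ter_map.find_opt "MINUS" ter_of_string
```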
The type of the merge functions changes again. It is now:
(nt_type * global_data_t * local_data_t) list ->
nt_type list * global_data_t * local_data_t
The merges between values for a given part of the input and a given non
terminal (and a given priority) all happen at once. As always you can
return a list of ASTs if you want to keep a forest of them, but you must
choose only one global data and one local data.
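For illustration, a merge function of this shape could look like the
sketch below. The AST type and the choice of int and string for the
global and local data are assumptions made for the example, as is the
policy of keeping the data of the first candidate.

```ocaml
(* A merge function in the style described above: it receives every
   candidate (AST, global data, local data) at once. The types are
   placeholders chosen for the example. *)
type ast = Num of int | Add of ast * ast

let keep_all_merge (candidates : (ast * int * string) list) :
    ast list * int * string =
  match candidates with
  | [] -> invalid_arg "keep_all_merge: no candidate"
  | (_, gd, ld) :: _ ->
      (* Keep the whole forest, but only the first global and local data. *)
      (List.map (fun (t, _, _) -> t) candidates, gd, ld)

let forest, gd, ld =
  keep_all_merge [ (Num 1, 10, "a"); (Add (Num 1, Num 2), 20, "b") ]
```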
Fixed a bug that lost some parse trees in some cases.
__________________________________________________________________________
2007/07/21
Fixed a bug in dypgen lexer that happened when the string "\\" appeared
in the Caml code.
__________________________________________________________________________
2007/07/16
The merge functions are now of type:
nt_type list -> global_data_t -> local_data_t ->
nt_type -> global_data_t -> local_data_t ->
merge_result
with:
type merge_result =
| Merge of (nt_type list * global_data_t * local_data_t)
| Dont_merge
The second, third, fifth and sixth arguments are the global_data and
the local_data associated with respectively the previous parsing and
the current parsing that are merged.
The global_data and local_data that are kept must be returned in the
result of the merge function, like:
Dyp.Merge (tree_list, global_data, local_data)
The constructor Merge is defined in the module Dyp of the library.
If you don't want to merge the two parsings because the two global_data
values or the two local_data values are distinct and you want to keep
two GLR parsings with distinct global_data or local_data, then the merge
function must just return Dyp.Dont_merge.
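A sketch of a merge function of this type, written against a local copy
of merge_result so that it compiles without the library (the real
constructors are Dyp.Merge and Dyp.Dont_merge). The comparison policy
shown here is only one possible choice.

```ocaml
(* Local mirror of the result type described above; the real
   constructors live in the module Dyp. *)
type ('nt, 'gd, 'ld) merge_result =
  | Merge of ('nt list * 'gd * 'ld)
  | Dont_merge

(* Argument order follows the type given above: trees and data of the
   previous parsing first, then the tree and data of the current one. *)
let merge_if_same_data prev_trees prev_gd prev_ld tree gd ld =
  if prev_gd = gd && prev_ld = ld then Merge (tree :: prev_trees, gd, ld)
  else Dont_merge

let same = merge_if_same_data [ "t1" ] 0 "scope" "t2" 0 "scope"
let diff = merge_if_same_data [ "t1" ] 0 "scope" "t2" 1 "scope"
```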
The section of the manual about merge functions has been updated.
__________________________________________________________________________
2007/07/15
The generic merge functions keep_older and keep_newer have been replaced
by keep_one to make clear it is not possible to know which AST is chosen
by the default merge function.
The merge functions are now associated with the constructors of the type
obj instead of being associated with the non terminals.
The section of the manual about merge functions has been fixed and
updated, see it for more info.
__________________________________________________________________________
2007/07/13
dyp.keep_grammar is now of type bool
If it is set to true then the parser keeps the current grammar after
the current reduction. It has no effect when the current action itself
changes the grammar.
A field last_local_data has been added to the record dyp. It is equal
to the value of local_data when the reduction of the last non terminal
of the right-hand side of the current rule occurred.
To prevent local_data from being forgotten, use:
dyp.local_data <- dyp.last_local_data;
This allows local_data to "climb up" a node in the parsing-tree, and
thus to extend its scope.
Contrary to local_data, last_local_data is immutable.
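The difference between the two fields, as of this entry, can be pictured
with a toy record. The type toolbox below is a hypothetical two-field
model; the real record dyp has many more fields.

```ocaml
(* Toy model of the two fields; not the library's record type. *)
type 'ld toolbox = {
  mutable local_data : 'ld;
  last_local_data : 'ld; (* immutable *)
}

(* What an action would do to let local_data "climb up" one node. *)
let propagate dyp = dyp.local_data <- dyp.last_local_data

let dyp = { local_data = [ "x" ]; last_local_data = [ "x"; "y" ] }
let () = propagate dyp
```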
Manual updated, see sections 6.5 and 8.3.
The example tinyML uses dyp.keep_grammar now.
__________________________________________________________________________
2007/07/06
The type obj is now also available when using --pv-obj
The value default_priority is defined in the module Dyp.
Added the field keep_grammar to the record dyp.
dyp.keep_grammar <- `First
makes the parser keep the grammar that was used after reducing the first
symbol of the right-hand side of the rule, after the current reduction.
dyp.keep_grammar <- `Last
makes the parser keep the current grammar after the current reduction.
This feature is not documented in the manual yet, see the example tinyML
for an example.
__________________________________________________________________________
2007/06/27
Fixed a bug that, in some cases where a merge was needed, caused some
parse trees to be lost.
__________________________________________________________________________
2007/06/24
Bug fixed: the exception Bad_constructor was raised inappropriately when
the following conditions were true:
1) a non terminal is declared with
%constructor Cons %for nt
or
%non_terminal nt
and this non terminal is not used in the initial grammar but subsequently
in extensions of the grammar.
2) and the non terminal of the left-hand side of a rule that introduces
extensions of the grammar can derive epsilon (the empty string).
__________________________________________________________________________
2007/06/23
New type for Bad_constructor:
exception Bad_constructor of (string * string * string)
3rd string is the name of the constructor that has been used.
__________________________________________________________________________
2007/06/22
Faster generation of the .ml file
Better encapsulation: the type dypgen_toolbox (i.e. the type of the
record dyp) is now in the module Dyp.
The types of dyp.add_nt and dyp.find_nt change:
dyp.add_nt : string -> string -> Dyp.non_ter
instead of
dyp.add_nt : string -> Dyp.non_ter
and
dyp.find_nt : string -> Dyp.non_ter * string
instead of
dyp.find_nt : string -> Dyp.non_ter
See section 8.4 of the manual for more information.
Three new exceptions:
exception Bad_constructor of (string * string)
This exception is raised when a value is returned by a user action
with a bad constructor (not corresponding to the non terminal). This
can only happen with rules defined dynamically.
1st string represents the rule and can be printed.
2nd string is the name of the constructor that should have been used.
exception Constructor_mismatch of (string * string)
This exception is raised when a nt is added using dyp.add_nt with
a constructor cons but it already exists with another constructor.
1st string is the name of the previous constructor,
2nd string is the name of the constructor one tried to add.
exception Undefined_nt of string
This exception is raised when there is in the grammar a non
terminal that is in a right-hand side but never in a left-hand
side (i.e. it is never defined). The string represents this non
terminal. This exception is not raised if the option --no-undef-nt
is used.
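A sketch of how calling code might report these exceptions. They are
re-declared locally, with the argument types given in this entry, so
that the example compiles on its own; in real code they come from the
dypgen library. The helper describe and its messages are hypothetical.

```ocaml
(* Local re-declarations so the sketch compiles on its own. *)
exception Bad_constructor of (string * string)
exception Constructor_mismatch of (string * string)
exception Undefined_nt of string

(* Run f and turn any of the three exceptions into a message. *)
let describe f =
  try f (); "ok" with
  | Bad_constructor (rule, expected) ->
      Printf.sprintf "rule %s: expected constructor %s" rule expected
  | Constructor_mismatch (previous, attempted) ->
      Printf.sprintf "constructor %s conflicts with %s" attempted previous
  | Undefined_nt nt -> Printf.sprintf "non terminal %s is never defined" nt

let msg = describe (fun () -> raise (Undefined_nt "expr"))
```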
__________________________________________________________________________
2007/06/17
Fixed a bug that made dyp.add_nt return a wrong value when used on
a new non terminal.
Added a few new fields to dyp for debugging purpose (see 10 and 13.1).
__________________________________________________________________________
2007/06/16
Added a keyword %non_terminal which makes it possible to include non
terminals in the initial grammar that are not part of any rule.
__________________________________________________________________________
2007/06/13
Generation time of the .ml file is shorter.
__________________________________________________________________________
2007/06/11
Some type errors that used to be reported by Caml in the .ml generated
file are now reported in the .dyp file and are less complex.
__________________________________________________________________________
2007/06/08
Fixed a bug that raised Not_found when a starting non terminal was
not used.
Added the keyword %type that behaves as in ocamlyacc.
__________________________________________________________________________
2007/06/07
Added nested rules:
nt1:
| symb1 ( symb2 symb3 { action1 } prio1
| symb4 symb5 { action2 } prio2
| symb6 symb7 { action3 } prio3 ) symb8 symb9 { action4 } prio4
| ...
__________________________________________________________________________
2007/06/06
Fixed a bug that happened when using the option --prio-pt along with
priorities and extensibility of the grammar.
Speed of the automaton generation improved.
__________________________________________________________________________
2007/05/31
Refactoring of the code (the parsing is now table driven instead of
automaton driven).
Speed of the automaton generation improved.
Parsing speed improved.
Memory usage decreased.
The options --automaton LALR and --automaton LR1 are not available
anymore.
New option --version.
A bug that caused parse trees to be lost in case of ambiguity has been fixed.
By default the priorities are now embedded into the automaton (used
to be option --prio-aut), the option --prio-pt disables this.
__________________________________________________________________________
2007/05/23
Fixed a bug that caused the uncaught exception "Index out of bounds" at
the initialization of the parser when using priorities.
__________________________________________________________________________
2007/05/20