diff --git a/doc/development/compressed_state_table/main.md b/doc/development/compressed_state_table/main.md
index 9000de86..c84f511b 100644
--- a/doc/development/compressed_state_table/main.md
+++ b/doc/development/compressed_state_table/main.md
@@ -1,8 +1,45 @@
 # Compressed State Table
 
 LR parser generates two large tables, action table and GOTO table.
-Action table is a matrix of current state and token. Each cell of action table indicates next action (shift, reduce, accept and error).
-GOTO table is a matrix of current state and nonterminal symbol. Each cell of GOTO table indicates next state.
+Action table is a matrix of states and tokens. Each cell of action table indicates the next action (shift, reduce, accept or error).
+GOTO table is a matrix of states and nonterminal symbols. Each cell of GOTO table indicates the next state.
+
+Action table of "parse.y":
+
+|        |EOF| LF|NUM|'+'|'*'|'('|')'|
+|--------|--:|--:|--:|--:|--:|--:|--:|
+|State  0| r1|   | s1|   |   | s2|   |
+|State  1| r3| r3| r3| r3| r3| r3| r3|
+|State  2|   |   | s1|   |   | s2|   |
+|State  3| s6|   |   |   |   |   |   |
+|State  4|   | s7|   | s8| s9|   |   |
+|State  5|   |   |   | s8| s9|   |s10|
+|State  6|acc|acc|acc|acc|acc|acc|acc|
+|State  7| r2| r2| r2| r2| r2| r2| r2|
+|State  8|   |   | s1|   |   | s2|   |
+|State  9|   |   | s1|   |   | s2|   |
+|State 10| r6| r6| r6| r6| r6| r6| r6|
+|State 11|   | r4|   | r4| s9|   | r4|
+|State 12|   | r5|   | r5| r5|   | r5|
+
+GOTO table of "parse.y":
+
+|        |$accept|program|expr|
+|--------|------:|------:|---:|
+|State  0|       |     g3|  g4|
+|State  1|       |       |    |
+|State  2|       |       |  g5|
+|State  3|       |       |    |
+|State  4|       |       |    |
+|State  5|       |       |    |
+|State  6|       |       |    |
+|State  7|       |       |    |
+|State  8|       |       | g11|
+|State  9|       |       | g12|
+|State 10|       |       |    |
+|State 11|       |       |    |
+|State 12|       |       |    |
+
 
 Both action table and GOTO table are sparse. Therefore LR parser generator compresses both tables and creates these tables.
 
@@ -17,7 +54,8 @@ See also: https://speakerdeck.com/yui_knk/what-is-expected?slide=52
 
 ### `yypact` & `yypgoto`
 
-`yypact` specifies what to do on the current state.
+`yypact` specifies the offset in `yytable` for the current state.
+As an optimization, `yypact` also specifies a default reduce action for some states.
 Accessing the value by `state`. For example,
 
 ```ruby
@@ -48,7 +86,11 @@ end
 
 ### `yytable`
 
-`yytable` specifies what actually to do on the current state.
+`yytable` is a mixture of action table and GOTO table.
+
+#### For action table
+
+For action table, `yytable` specifies what actually to do on the current state.
 Positive number means shift and specifies next state.
 For example, `yytable[yyn] == 1` means shift and next state is State 1.
 
@@ -59,6 +101,13 @@ For example, `yytable[yyn] == YYTABLE_NINF` means syntax error.
 Other negative number and zero mean reducing with the rule whose number is opposite.
 For example, `yytable[yyn] == -1` means reduce with Rule 1.
 
+#### For GOTO table
+
+For GOTO table, `yytable` specifies the next state for the given LHS nonterminal.
+
+The value is always a positive number, which is the next state id.
+It never becomes `YYTABLE_NINF`.
+
 ### `yycheck`
 
 `yycheck` validates accesses to `yytable`.
@@ -90,7 +139,10 @@ yytable = [
 `yypact` is an array of each state offset.
 
 ```ruby
-yypact = [0, 1]
+yypact = [
+  0, # State 0 is not shifted
+  1 # State 1 is shifted one to right
+]
 ```
 
 We can access the value of `state1[2]` by consulting `yypact`.
@@ -298,7 +350,7 @@ In this case, `0` is the minimum offset number then `YYTABLE_NINF` is `-1`.
 ### `yypact` & `yypgoto`
 
 `yypact` & `yypgoto` are mixture of offset in `yytable` and `YYPACT_NINF` (default reduce action).
-The index in `yypact` is state id, the index in `yypgoto` is nonterminal symbol id.
+The index in `yypact` is a state id and the index in `yypgoto` is a nonterminal symbol id.
 `YYPACT_NINF` is the minimum negative number.
 In this case, `-3` is the minimum offset number then `YYPACT_NINF` is `-4`.
 
@@ -374,10 +426,10 @@ yydefgoto = [
 
 ### `yyr1` & `yyr2`
 
-Both of them are Rule table.
+Both of them are tables for rules.
 `yyr1` specifies nonterminal symbol id of rule's Left-Hand-Side.
-`yyr2` specifies the length of the rule, number of symbols on the rule's Right-Hand-Side.
-Index 0
+`yyr2` specifies the length of the rule, that is, the number of symbols on the rule's Right-Hand-Side.
+Index 0 is not used because Rule ids start at 1.
 
 ```ruby
 yyr1 = [
@@ -394,6 +446,10 @@ yyr2 = [
 
 ## How to use tables
 
+See also "parse.rb", which implements an LALR parser based on the "parse.y" file.
+
+First, define important constants and arrays:
+
 ```ruby
 YYNTOKENS = 9
 
@@ -419,6 +475,9 @@ yyr2 = [ 0, 2, 0, 2, 1, 3, 3, 3]
 
 Determine what to do next based on current state (`state`) and next token (`yytoken`).
 
+The first step in deciding the action is looking up the `yypact` table by the current state.
+If only default reduce exists for the current state, `yypact` returns `YYPACT_NINF`.
+
 ```ruby
 # Case 1: Only default reduce exists for the state
 #
@@ -438,6 +497,11 @@ if offset == YYPACT_NINF # true
 end
 ```
 
+If both shift and default reduce exist for the current state, `yypact` returns the offset in `yytable`.
+The index is the sum of `offset` and `yytoken`.
+Check the index by consulting `yycheck` before accessing `yytable`.
+The index can be out of range because blank cells at the head and tail are omitted, so also check that it is not less than 0 and not greater than `YYLAST`. See how `yycheck` is constructed in the example above.
+
 ```ruby
 # Case 2: Both shift and default reduce exists for the state
 #
@@ -493,6 +557,26 @@ end
 
 ### Execute (default) reduce
 
+Once the next action is decided to be a default reduce, we need to determine:
+
+1. the rule to be applied
+2. the next state from GOTO table
+
+Rule id for the default reduce is stored in `yydefact`.
+`0` in `yydefact` means syntax error, so check that the value is not `0` before continuing the process.
+
+Once the rule is determined, the length of the rule can be looked up in `yyr2` and the LHS nonterminal in `yyr1`.
+
+The next state is determined by the LHS nonterminal and the state after reduce.
+The GOTO table is also compressed into `yytable`, so the process to decide the next state is similar to the one for `yypact`.
+
+1. Look up `yypgoto` by LHS nonterminal. Note `yypact` is indexed by state but `yypgoto` is indexed by nonterminal.
+2. Check whether the value in `yypgoto` is `YYPACT_NINF` or not.
+3. Check whether the index, the sum of offset and state, is out of range or not.
+4. Check the `yycheck` table before accessing `yytable`.
+
+Finally, push the state to the stack.
+
 ```ruby
 # State 11
 #