Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

yacc conflicts #44

Closed
stag019 opened this issue Jan 8, 2015 · 7 comments
Closed

yacc conflicts #44

stag019 opened this issue Jan 8, 2015 · 7 comments
Labels
bug Unexpected behavior / crashes; to be fixed ASAP! rgbasm This affects RGBASM

Comments

@stag019
Copy link
Contributor

stag019 commented Jan 8, 2015

yacc warns about some conflicts and then chooses whatever default it seems was more likely intended. We should remove those conflicts.
I don't understand the yacc language that well, but here's what I was able to figure out.

1. const            :   T_ID                            { $$ = sym_GetConstantValue($1); }
                |   const T_OP_SUB const            { $$ = $1 - $3; }
                |   T_ID  T_OP_SUB T_ID             { $$ = sym_GetDefinedValue($1) - sym_GetDefinedValue($3); }

2. constlist_8bit_entry :               { out_Skip( 1 ); }
                        |   const_8bit  { out_RelByte( &$1 ); }
                        |   string      { char *s; int length; s = $1; length = charmap_Convert(&s); out_AbsByteGroup(s, length); free(s); }
    (const_8bit contains relocconst which contains string)
@ghost
Copy link

ghost commented Jan 9, 2015

I would add:

charmap:
T_POP_CHARMAP string ',' string
| T_POP_CHARMAP string ',' const
;

(const produces string)

Cases 2 and 3 cause reduce/reduce conflicts that are correctly handled but reduce/reduce conflicts should really be solved explictly.

This could be done by detaching the string productions like this:

const:
const_n { $$ = $1; }
| string { $$ = str2int($1); }
;

const_n:
(original const without string production)
;

Then, use const_n instead of const in charmap.

Do the same for relocconst and const_8bit.

Case 1 causes a shift/reduced conflict that is correctly solved. This conflict is controlled and could be left as is.

We could solve cases 1, 2 and 3 by associating a struct value to const and relocconst instead of a numeric value to allow ids, strings and numerical values to be "returned" and then use function calls to convert this struct to int where required.

@ghost
Copy link

ghost commented Jan 10, 2015

Here are two other conflicts - at first sight - due to grammar ambiguities:

  • T_Z80_LD T_MODE_SP comma const_16bit is produced by z80_ld_ss and z80_ld_sp
    (and generate the same code of course)
  • T_Z80_LD T_MODE_HL comma const_16bit is produced by z80_ld_ss and z80_ld_hl
z80_ld_ss:
    T_Z80_LD reg_ss comma const_16bit
    ;

z80_ld_sp:
    T_Z80_LD T_MODE_SP comma T_MODE_HL
    | T_Z80_LD T_MODE_SP comma const_16bit
    ;

z80_ld_hl:
    T_Z80_LD T_MODE_HL comma '[' T_MODE_SP const_8bit ']'
    | T_Z80_LD T_MODE_HL comma T_MODE_SP const_8bit
    | T_Z80_LD T_MODE_HL comma const_16bit
    ;

reg_ss:
    T_MODE_BC
    |   T_MODE_DE
    |   T_MODE_HL
    |   T_MODE_SP
    ;

Removing the redundant rules won't solve the conflicts because in fact this part
of the grammar is not LALR(1) (and not even LR(1)):

e.g.: for z80_ld_sp:
shift/reduce conflict on ','

  324 z80_ld_sp: T_Z80_LD T_MODE_SP . comma T_MODE_HL
  393 reg_ss: T_MODE_SP .

There is no enough lookahead information to determine if the parser are recognizing the generic productions (T_Z80_LD reg_ss comma...)
or the specific ones (T_Z80_LD T_MODE_SP comma ...).
Note that replacing the 'comma' non-terminals by terminal tokens (',') would not change anything.

A solution is to use reg_ss in the specific rules and test if reg_ss is well a T_MODE_SP or T_MODE_HL.
Or let it as is as these conflicts are controlled.

@AntonioND
Copy link
Member

The last one mentioned by @chastai has been fixed in 362aea2

@AntonioND AntonioND added bug Unexpected behavior / crashes; to be fixed ASAP! rgbasm This affects RGBASM labels Apr 2, 2018
@dbrotz
Copy link
Contributor

dbrotz commented Dec 3, 2018

There is only one conflict not mentioned here.

line		: label
		| label cpu_command
		| label macro
		| label simple_pseudoop
		| pseudoop
;

label		: /* empty */
		| T_LABEL
		| T_LABEL ':'
		| T_LABEL ':' ':'
;

pseudoop	: equ
		| set
		| rb
		| rw
		| rl
		| equs
		| macrodef
;

set		: T_LABEL T_POP_SET const
;

cpu_command	: 
		/* ... */
		z80_set
		/* ... */
;

z80_set		: T_POP_SET const_3bit comma reg_r
;

If the parser encounters T_LABEL T_POP_SET at the beginning of a line, it's ambiguous whether it should use line -> label cpu_command -> label z80_set or line -> pseudoop -> set. Bison resolves it as the latter.

This means that x set 5 works, but label1 set 1, a doesn't. That is preferable to the other way around, since the set pseudo-op would not be usable in that case. It would be nice if it could parse both, but that would require extra look-ahead. The workaround is to use a colon after the label (label1: set 1, a).

ISSOtm added a commit to ISSOtm/rgbds that referenced this issue Feb 11, 2020
Implements the fix suggested [here](gbdev#44 (comment)), which performed better than expected!
I'm not \*too\* fond of this but this seems like the right way
@JL2210
Copy link
Contributor

JL2210 commented Apr 8, 2020

I have no experience in YACC, but there appears to be one more shift/reduce in asmy.y:

Terminals unused in grammar

   T_POP_ENDM
   T_POP_ENDR


State 5 conflicts: 1 shift/reduce

@ISSOtm
Copy link
Member

ISSOtm commented Apr 8, 2020

Yeah, there are a couple more. Here are the two problems we currently have:
sym rl N can either be the RL instruction, or the pseudo-op from the _RS group. (This causes #322.)
Then there is sym set N, which can similarly be either creating the read-write constant sym, or the beginning of instruction set. (Discussed by @dbrotz's above message)

Both are technically not ambiguous, but require two (RL) or three (SET) tokens of lookahead, when yacc only has one. This is intended to be fixed by #457, since the ambiguity is caused by colon-less labels.

Also, the two unused terminals are intended, the lexer doesn't read them (macro/rept management is whack) but they're still reserved keywords.

@ISSOtm
Copy link
Member

ISSOtm commented Jan 11, 2021

Since colon-less labels have been removed (removing the set ambiguity) and rl as a declarator has been reinstated (closing #322), we no longer have any conflicts or grammar issues.

@ISSOtm ISSOtm closed this as completed Jan 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Unexpected behavior / crashes; to be fixed ASAP! rgbasm This affects RGBASM
Projects
None yet
Development

No branches or pull requests

5 participants