Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature debate] Remove column constraint on assignments and macro calls #457

Closed
ISSOtm opened this issue Dec 7, 2019 · 28 comments
Closed
Labels
enhancement Typically new features; lesser priority than bugs rgbasm This affects RGBASM

Comments

@ISSOtm
Copy link
Member

ISSOtm commented Dec 7, 2019

We've all had this before.

my_macro: MACRO
    PRINTT "Wheeeeeeee"
ENDM

my_macro
ERROR: a.asm(5):
    'my_macro' already defined in a.asm(1)

 label:
ERROR: b.asm(1):
    Macro 'label' not defined

Countless users, newbies or not, have been bitten by this before. It would be much better if labels could be declared at not-column-1, and macros called at column 1.

HOWEVER! This is there for a reason: RGBASM allows declaring labels without a colon. This cannot be distinguished from a macro call, so the column is used to decide.
In practice, labels are usually defined using colons, lifting the ambiguity; there is an exception for local labels, but the presence of a dot removes the ambiguity too (see #423). Even if the dot is in the middle of the label, it's still unambiguous.

tl;dr: the column restriction can be removed, potentially saving a lot of headaches, but then it will break code that declares labels without colons. Is this worth it?

PS: I realized that #362 being fixed may remove a use case for the feature in sights., so that's good.

@pinobatch
Copy link
Member

If anything, this would reduce the need for my dedenter script (which was the motivation for #305).

it will break code that declares labels without colons.

Only if a label shares a name with a macro or mnemonic, correct? Otherwise, a construction analogous to ca65 .feature could be used to make the removal of indentation sensitivity opt-in.

Would the following decision rules be helpful?

  1. Colon or period at expected point in the line: Assume label (though watch out for ambiguity caused by floating-point literals)
  2. Column 1: Assume label first, then macro or mnemonic
  3. Other columns: Assume macro or mnemonic first, then label

@ISSOtm
Copy link
Member Author

ISSOtm commented Dec 7, 2019

I looked into the yacc conflicts, and this would solve one (related to set being either an instruction or a variable creation).

@aaaaaa123456789
Copy link
Member

I'd go one step further and completely disallow colonless non-locals. You might want to have a compatibility mode for a while, but the benefits of this alleged feature are so minimal and the fix is so trivial that you might as well get rid of it.

@ISSOtm
Copy link
Member Author

ISSOtm commented Dec 7, 2019

Maybe I wasn't clear enough, but I am in favor of removing colonless non-local label declarations.
My question is more about how much usage it is currently getting.

It might be possible to warn users of colon-less label declarations for a while, but I'm sadly not sure how to implement it on a grammar level.

@ISSOtm ISSOtm added enhancement Typically new features; lesser priority than bugs rgbasm This affects RGBASM labels Dec 7, 2019
@AntonioND
Copy link
Member

Can't you print the warning when you add the label to the list of symbols?

@ISSOtm
Copy link
Member Author

ISSOtm commented Dec 7, 2019

How so? Isn't the presence of the colon detected at parsing time?

@AntonioND
Copy link
Member

Isn't the colon saved along with the rest of the string? I have never actually checked.

@ISSOtm
Copy link
Member Author

ISSOtm commented Dec 7, 2019

Nope.

I have looked more into this, and it appears that a big chunk of the problem is that the lexer returns a different token depending on the column number. I have tried changing this for a quick try, and even our own test suite fired back—it uses a ton of colon-less labels.
Considering this would be a big change and we already have a bunch of PRs open (and I don't want to have to resolve conflicts), I rolled back everything.

@ISSOtm
Copy link
Member Author

ISSOtm commented Feb 10, 2020

This is actually impossible to perform with the current grammar, if only as the following proves:

; Define a variable
variable equ 0

uniqdef: MACRO
variable\@ \1 \2
ENDM
; Call a macro
uniqdef equ 0

There is an ambiguity in the language itself—not just in the grammar.

@aaaaaa123456789 suggested replacing the X equ VAL syntax with equ X VAL (which is apparently more common in other assemblers); both should be able to be supported for at least one version, with deprecation warnings.

I implemented this for a shot, and we have a problem: set. This token is highly ambiguous (is it an instruction with a label declaration, or a variable definition? Even moving it to the beginning of the line keeps the ambiguity until you see a comma). While technically the syntax is unambiguous, Yacc considers it a conflict. If there is a way to tell it how to lift the ambiguity, I would love to hear it.
If that's not possible, set as a variable declarator could be deprecated entirely in favor of =.

@aaaaaa123456789
Copy link
Member

As I said before, when you break existing code, it doesn't really matter by how much you break it. Something like @set foo, 42 will do.

@ISSOtm
Copy link
Member Author

ISSOtm commented Feb 10, 2020

If we break it, might as well remove set entirely, given how confusing it is.

@AntonioND
Copy link
Member

If we break it, might as well remove set entirely, given how confusing it is.

Agreed.

@ISSOtm
Copy link
Member Author

ISSOtm commented Feb 12, 2020

After more implementation tweaking and discussion, it looks like this might be a hard barrier for RGBASM.

Here is a summary of what transpired:

Goal

The reason why this fix is wanted is because column-1 rules create a newbie trap, with obscure error messages (ranging from the infamous terse syntax error to macro not defined), and the limit seems really arbitrary anyways.

The problem that invariably arises when attempting to remove this limitation is a bunch of grammar conflicts. But those occur because of ambiguities in the language itself! Consider the following line:

sym EQU val

Under current RGBASM, this is without a doubt defining immutable variable sym. But if we lift the restriction, is it still a variable definition, or are we calling macro sym with argument EQU val? Sure, one could be prioritized over the other if needed, but what if the user wants the other one for whatever reason?
Worse still—disambiguation is impossible, because the lexer state is switched as soon as the T_ID for the macro invocation is encountered. So lookahead is impossible, and either macro invocations or symbol definitions are possible, but not both.

Worse still, label declarations themselves become unreliable. Consider this line:

 bar:
; Note the whitespace!

Under current RGBASM, this does call macro bar with argument : (obviously: the lexer sees the T_ID token ending at the colon). But it's non-ambiguous, given that the whitespace enforces this as a macro call. But if the restriction is lifted, then...

Solutions

(╯°□°)╯︵-┻━┻

Maintain the statu quo. After all, it's not that bad, right? Maybe the error messages can be improved. (Especially the syntax error, but that one has a broader scope)

Label rework

A suggested resolution would be to require colons to be stitched to the T_ID, and if so turn it into a T_LABEL [getting rid of colon-less labels]. This solves the issue with defining labels, but not the other one. This could be fixed by changing the definition syntax: EQU sym val. (The current syntax would be deprecated using a warning for at least one release, also allowing to spot even placed where the definition is generated, eg. in a macro).

The two parts I don't like about this are more mucking with the lexer, which I would be okay with if it was a Flex definition, but it's not, it's hand-modified generated C now. (I should open an issue about my plans regarding this, but I digress. See #485) The second part is that the macro invocations are too "broad", and cause conflicts as soon as a T_ID is encountered first thing on a line. (Or after a label.)

Macro invocation token

The solution to that would be to require all macro calls to be prefixed with a special token (I was thinking of a #, for example). The upside is that all is disambiguated in one fell swoop, the downside would be the number of macro call sites that would require changing in large projects.

As with the previous option, there would be a need for changing a lot of locations in the code, but I believe that the bulk could be automated by parsing the warning message output. I would personally prefer this option because it allows more leniency in future extensions to the grammar.

Alternatives

@pinobatch made ca83, a macro pack for ca65 that allows it to process RGBDS-inspired syntax (still using ca65 features, expression syntax, macro invocations, and so on). There is also HGBASM by @Hawkbat, with very good compatibility. If someone isn't comfortable with RGBDS' limitations it's possible to look for alternatives, without going as far as switching to eg. wla-dx.

As for the specific issue we're dealing with, @pinobatch also made a preprocessor that, assuming all label declarations end with a colon, de-indents them to column 1 where RGBASM can process them.

Considering alternatives exist, do we want to break compatibility to fix this?

@AntonioND
Copy link
Member

AntonioND commented Feb 13, 2020

A suggested resolution would be to require colons to be stitched to the T_ID, and if so turn it into a T_LABEL [getting rid of colon-less labels].

I actually like this idea. It's really annoying to fix code that doesn't comply with the new syntax, but it can be fixed automatically for the most part, and it gets rid of that inconsistency (I never liked the idea that you can omit the colon from labels).

The solution to that would be to require all macro calls to be prefixed with a special token

This is annoying, and not a great solution because many macros try to hide the fact that they are macros. For example, several codebases have macros to emulate djnz by having two instructions. Needing a token would break this illusion. On the other hand, maybe it's a good idea to not hide things...

Overall I think this is too restrictive.

@ISSOtm
Copy link
Member Author

ISSOtm commented Feb 18, 2020

I personally heavily disagree with pseudo-op macros, but people should be able to use them if they want to. So then we'd have to fall back to integrating colons in labels (which is something I'm implementing as part of re-doing #437 better), which means editing the lexer and thus this is blocked by #485 at least for me.

@AntonioND
Copy link
Member

AntonioND commented Feb 20, 2020

Oh, yeah, I dislike the pseudo-op macros, for sure, but people are going to use them anyway so why not allow them...

ISSOtm added a commit that referenced this issue Mar 26, 2020
Because a local symbol is necessarily a label, the disambiguation
is not necessary. This is a first step towards fixing #457.
@ISSOtm
Copy link
Member Author

ISSOtm commented Mar 27, 2020

To clarify the commit just above: after discussion in #423, it was decided that only labels could be scoped. (If we want to switch to more general, e.g. ca65 scoping, we'll consider this later)

Thus, it made the T_LOCAL_LABEL token kind of redundant T_LOCAL_ID; thus, the "column-1" restriction has been lifted on those. (Though, this might not work with the Global.local form, I should check.)

@JL2210
Copy link
Contributor

JL2210 commented Apr 5, 2020

Similar happens for SIZE = 11; you can't have whitespace before it or it's treated as a macro.

@ISSOtm ISSOtm mentioned this issue Apr 8, 2020
@JL2210
Copy link
Contributor

JL2210 commented May 3, 2020

If this is fixed, some tutorials might have to be changed, like https://eldred.fr/gb-asm-tutorial/syntax.html and maybe https://teamlampoil.se/book/gbasmdev.pdf

ISSOtm added a commit to ISSOtm/rgbds that referenced this issue Dec 11, 2020
ISSOtm added a commit to ISSOtm/rgbds that referenced this issue Feb 21, 2021
Rangi42 pushed a commit that referenced this issue Feb 23, 2021
@Rangi42 Rangi42 changed the title [Feature debate] Remove column constraint on labels and macro calls [Feature debate] Remove column constraint on macro calls Feb 25, 2021
@Rangi42 Rangi42 changed the title [Feature debate] Remove column constraint on macro calls [Feature debate] Remove column constraint on assignments and macro calls Feb 25, 2021
@Rangi42
Copy link
Contributor

Rangi42 commented Feb 25, 2021

The column constraint has been removed on labels, but not on constant definitions or macro calls. Relevant discussion is in #635.

@Rangi42
Copy link
Contributor

Rangi42 commented Mar 14, 2021

Taking this as the "new prefix-keyword constant definition syntax" issue:

As I said in Discord #asm, I'd prefer a prefix DEF syntax to the other suggestions. EQU FOO 42 would change the order of an existing keyword, and reads awkwardly compared to "foo equals 42". EQU FOO = 42 and SET FOO = 42 would be worse, combining the previously-different-meaning EQU and =. Whereas prefix DEF goes with 0.5.0's REDEF, and could easily be added to existing code:

DEF FOO EQU 42
DEF BAR EQUS "spam"
REDEF BAR EQUS "eggs"
DEF x = 10
REDEF x = x + 1
DEF ID RB 3
DEF WORD RW
DEF square(x) = x**2  ; upcoming user-defined functions

Regarding =, I think it would be worth allowing that without a DEF or REDEF. Constants are defined only once, but SET values often get redefined and are used for quick small things like counters in a REPT block. The 0.5.0 lexer already checks for a trailing : to tell labels apart from macros, so it could also check for a single =. This would mean that mac = 42 gets lexed as an assignment, not a macro usage, but there are plenty of workarounds: mac /*hack*/ = 42, or mac {hack} = 42 with hack EQUS "", or mac \ = 42 (line continuation). (And it's very unusual to need such a macro anyway.)

Rangi42 added a commit to Rangi42/rgbds that referenced this issue Mar 15, 2021
This will enable fixing gbdev#457 later once the old
definition syntax is removed.
Rangi42 added a commit to Rangi42/rgbds that referenced this issue Mar 15, 2021
This will enable fixing gbdev#457 later once the old
definition syntax is removed.
@ISSOtm
Copy link
Member Author

ISSOtm commented Mar 15, 2021

Actually, {hack} would not work, but /* hack */ and consorts may not, depending on future changes and/or refactorings. A more reliable workaround would be the addition of a dummy first argument (possibly empty), that then gets either ignored, or shifted away.

@aaaaaa123456789
Copy link
Member

You once suggested adding a character to indicate a macro call. While I'm strongly against requiring such a character, adding an optional character to signal a macro call would solve all of these problems — just mark the line as a macro call and there's no parsing ambiguity. Since all of the parsing ambiguities like this one are corner cases, that's not really an issue.

@Rangi42
Copy link
Contributor

Rangi42 commented Mar 16, 2021

mac = ... is a rare enough use case that, if ident = ... is special-cased to lex as T_LABEL T_OP_EQUALS ... assignment instead of T_ID T_OP_EQUALS ... macro usage, then I'd rather not build new punctuation syntax into the language just for that. At least one of the workarounds (/*hack*/, \ line continuation, dummy first argument) should always be possible if you really need to pass = to a macro.

Edit: a workaround could also be encapsulated in its own macro which would serve the same purpose as such a punctuation character.

macro apply
	def ___mac equs "\1"
	shift
	{___mac} /*hack*/ \#
	purge ___mac
endm

macro mac
	println "6 * 7 \1"
endm

; would print "6 * 7 = 42"
apply mac = 42

Edit 2: Then again, there's no use case for character escapes outside of strings, and \mac = 42 could unambiguously be handled in normal mode. The backslash could fall into the default case to handle identifiers that's already below it, after setting a local flag to make it not be returned as a T_LABEL.

		/* Handle escapes */

		case '\\':
			c = peek(0);

			if (c == ' ' || c == '\r' || c == '\n') {
				readLineContinuation();
				break;
			} else if (c == EOF) {
				error("Illegal backslash at end of input\n");
				break;
			} else {
				preventLabel = true;
			}
			// fallthrough

		/* Handle identifiers and escapes... or error out */

		default:
			if (startsIdentifier(c)) {
				...

				if (tokenType == T_ID && !preventLabel &&
				    (lexerState->atLineStart || peek(0) == ':'))
					return T_LABEL;

				return tokenType;
			}

Even before removing the column constraint, this would be potentially useful to pass a first argument starting with : without whitespace:

macro mac
    println "<\#>"
endm

    mac + hello  ; prints <+ hello>
    mac : hello  ; this would be a label followed by an identifier
    \mac : hello ; would print <: hello>

Rangi42 added a commit to Rangi42/rgbds that referenced this issue Mar 18, 2021
This will enable fixing gbdev#457 later once the old
definition syntax is removed.
ISSOtm pushed a commit that referenced this issue Mar 19, 2021
This will enable fixing #457 later once the old
definition syntax is removed.
@ISSOtm
Copy link
Member Author

ISSOtm commented Mar 19, 2021

Despite that b809384 mentions closing this issue only once the old syntax is removed, it's one of the core components of RGBASM, and thus is scheduled to stay for a while. Thus, I'll be closing this, and we'll open a separate issue to discuss deprecation and/or removal of the old syntax when we get there.

As far as I'm concerned, due to how breaking a change it would be to remove that syntax, I would prefer to keep it as long as it doesn't become a significant blocker / maintenance burden. (As it happens now, it's just T_LABEL support in the lexer, and a few extra parser rules: not much.)

@ISSOtm ISSOtm closed this as completed Mar 19, 2021
@Rangi42
Copy link
Contributor

Rangi42 commented Mar 19, 2021

I'm also fine with keeping the old syntax for EQU/EQUS/=/etc, now that DEF allows them to be defined with indentation.

The only limitation of keeping them around is that it prevents macros from being invoked at column 1, but there's less demand for that than there is for indented definitions anyway.

@ISSOtm
Copy link
Member Author

ISSOtm commented Mar 19, 2021

Only thing to keep track of is updating the FAQ for 0.5.0, to mention the new syntax for macro definitions, and the fact that they fix the problem.

@Rangi42
Copy link
Contributor

Rangi42 commented Mar 19, 2021

Both the old and new syntaxes for defining macros, mac: MACRO and MACRO mac, can be indented. But then mac must be indented to use it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Typically new features; lesser priority than bugs rgbasm This affects RGBASM
Projects
None yet
Development

No branches or pull requests

6 participants