Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove column 1 restriction for labels with colons #635

Merged
merged 1 commit into from
Feb 23, 2021

Conversation

ISSOtm
Copy link
Member

@ISSOtm ISSOtm commented Dec 11, 2020

Partial fix for #457

What would be left to fully resolve that issue is doing the same for variable definitions (RB, EQU, etc.), which I tried implementing. There was one fundamental (correctness) problem, and a cleanness problem.

Implementation

After reading a (non-local) identifier that wasn't a keyword, the lexer discarded all leading whitespace. Then, without shifting (= using peek with a non-0 argument), it tries to match a symbol-declaring token (keyword or =), and returns T_LABEL if it does.

Correctness problem

If a macro invocation's arguments begins with such a token, which has valid use cases, it would be considered as a label. This is problematic especially for set and rl, which are valid tokens outside of assignments as well.

Cleanness problem

Using peek with an argument larger than 0 is not a good thing. This may expand arguments too early (in some particularly evil code with shift, I believe), and is less performance-efficient than if we had only peek(0). (I want to remove those in the long term).

An alternative would be to perform an exploratory lex, and cache the resulting token for the following lexer call... except that macro args may be lex-able properly, and the lexer reports errors as soon as possible. (I don't think adding a way to suppress errors for such an exploratory lex would be a good idea, either.)

Solutions

  • The grammar could be changed to have variable declarations use a prefix keyword, like other assemblers (name EQU valueEQU name value). Keep in mind that the root cause of this issue is that an identifier at the beginning of a line, if a macro name, must cause a lexer switch immediately after being lexed, so that macro args can be lexed properly.
  • A suggested alternative is to deprecate set, since = is a synonym; this would solve the correctness problem, but not the cleanness one. Further, it wouldn't solve the rl issue. (The reason why rl is currently not a problem is that the lexer never returns the token that the parser expects for that rule.)
  • I found that using {space} with space EQUS " " was a workaround to prevent the exploratory lex from detecting the token while preserving the output argument; however, it's a hack (additionally introducing a leading space in the argument, which might cause problems), and Allow {symbol} interpolation outside of strings #634 would break it by expanding {} pre-peek

@ISSOtm
Copy link
Member Author

ISSOtm commented Dec 11, 2020

Note: update RGBASM(5) to mention that the restriction is relaxed for all labels, not just non-local ones, and for macro defs.

Maybe also mention that the colon must stick directly to the label name? This is actually not the case for non-local labels, but mentioning that in the documentation may make it a bit too confusing.

@Rangi42
Copy link
Contributor

Rangi42 commented Dec 12, 2020

For the sake of alternatives: you could deprecate set and rl (and dl). Renamed to rg and dg for long, or rd and dd for double-word.

@aaaaaa123456789
Copy link
Member

d for double word seems to be the most common usage. I've never seen g for this purpose.

@Rangi42
Copy link
Contributor

Rangi42 commented Dec 12, 2020

I think that removing set and changing rl/dl to rd/dd would be less disruptive than changing EQU and EQUS to be prefix instead of infix. (As for {space}, as you said, it's a hack and #634 will prevent it anyway.)

@ISSOtm
Copy link
Member Author

ISSOtm commented Dec 12, 2020

This still requires some lookahead within the lexer, which is a bad idea for performance and cleanness, and also a maintenance penalty due to having to check specifically for such keywords. Then again, invoking macros with such a token as its first argument becomes impossible, which is merely problem whack-a-mole.

I agree it might be less disruptive, but I don't believe it's a good option. I do not like the latter two options I listed, I only wrote them in case someone had an idea how to resolve their shortcomings. I also realize that option 1 is quite problematic, since for example hardware.inc would have to be rewritten... but the fundamental problem is the grammar.

@Rangi42
Copy link
Contributor

Rangi42 commented Jan 11, 2021

This check:

	if (tokenType == T_ID && lexerState->atLineStart)
		return T_LABEL;

was requiring labels to be at the start of a line, and macro usages to be indented.

The modified check:

	if (tokenType == T_ID && (lexerState->atLineStart || peek(0) == ':'))
		return T_LABEL;

allows labels to be indented or not, but still requires macro usages to be indented, and can't handle mac rl 1 versus newRLLabel rl 1.

How about:

	if (tokenType == T_ID && (!sym || sym->type != SYM_MACRO || peek(0) == ':')
		return T_LABEL;

This should return T_LABEL for:

mac: MACRO
    println "\1"
ENDM
mac rl 1 ; T_ID, prints "rl 1"
    mac rl 1 ; T_ID, prints "rl 1"
foo rl 1 ; T_LABEL, defines foo
    foo rl 1 ; T_LABEL, complains about redefinition

That goes along with disallowing colon-less labels, but should not require reordering EQU/EQUS/SET/=, renaming rl/dl to rd/dd, or a weird hack with {space}.

@ISSOtm
Copy link
Member Author

ISSOtm commented Jan 11, 2021

I refuse to make the behavior dependent on whether a symbol has been defined or not, or its type. It requires more context to understand how a given piece of code works, and I would get rid of EQUS's behavior if not for the large breakage that it'd cause.

SYM equs STRRPL("\1", "this", "that")
SYM equ 42
    purge SYM

First natural reaction is certainly a "what!?".

If mac rl 1 looks like a variable declaration, it should bark like a variable declaration. This is why I insist on finding another way to differentiate, if possible.

@Rangi42
Copy link
Contributor

Rangi42 commented Jan 11, 2021

Fair enough. That sort of SYM equ 42 code confuses me too; {SYM} equ 42 is now a valid equivalent.

Regarding the other solutions:

  • A changed definition syntax (like def SYM equ 42, def x = 10, etc, which would go with user-defined functions like def tiles(n) = n * $10 and with redef s equs strupr("{s}")) would IMO be the most consistent, but it's also very backwards-incompatible.
  • Deprecating set in favor of =, and renaming rl/dl to rd/dd, would make foo rd 1 distinct from mac rl 1 (foo set ... would already lex foo as a macro usage T_ID since colon-less labels are merged). However, this would make it impossible to pass = 1 or rd 1 as an arg to mac.
  • The {space} hack no longer works since Allow {symbol} interpolation outside of strings #634 merged.

Overall I'd rather keep the column 1 restriction, until maybe a 0.5.0 major version update when more severe changes might be acceptable.

@ISSOtm
Copy link
Member Author

ISSOtm commented Jan 14, 2021

I think that the best solution is the def syntax, which we can introduce while deprecating the old syntaxes. (Labels wouldn't use def, and Lab: def X = 0 would also be allowed)

@aaaaaa123456789
Copy link
Member

I think that the best solution is the def syntax, which we can introduce while deprecating the old syntaxes. (Labels wouldn't use def, and Lab: def X = 0 would also be allowed)

def makes no sense for reassigning, though

@Rangi42
Copy link
Contributor

Rangi42 commented Jan 14, 2021

I don't look forward to updating all those definition lines, but if it's a major update that comes along with user-defined functions (which would replace some of our EQUS and macro defs anyway) I guess it's fine.

def makes no sense for reassigning, though

redef could also work for redefining an =.

@ISSOtm
Copy link
Member Author

ISSOtm commented Jan 14, 2021

Should be regex-able. Further, this is the kind of deprecation that I would make last a couple versions, given the size of the change. Alternatively, the behavior could be toggled via a CLI option?

@Rangi42
Copy link
Contributor

Rangi42 commented Jan 14, 2021

If definition syntax is going to be changing anyway, and no longer resembling older assemblers so much, how about changing macro syntax? MACRO mac would fit in more than mac: MACRO, and would let users consistently recognize mac: as a label.

@Rangi42
Copy link
Contributor

Rangi42 commented Jan 14, 2021

So, the verbosity of def x = 1, def n equ $42, and so on still bothers me. How about something like the {space} hack but more robust? Allow \ in macro args just like \,, so that x equ 30 is always an assignment, and x\ equ 30 or x \ equ 30 is always passing ' equ 30' as \1 to x. (Currently that would complain "Begun line continuation, but encountered character 'e'"; under this proposal, that would be allowed inside macro args, and \ would only start a line continuation if it's followed solely by whitespace and possibly a comment. This would also more generally allow any of a macro's args to start or end with whitespace, without them always being stripped.)

@Rangi42
Copy link
Contributor

Rangi42 commented Feb 1, 2021

Re: renaming dl/rl, Sjasm also uses dd for 32-bit data. (RASM uses defi.)

@Rangi42
Copy link
Contributor

Rangi42 commented Feb 10, 2021

Should this merged as-is? (After rebasing to master, and updating rgbasm.5 to document that labels don't have to go in column 1.) It fully enables users to define labels, whether topLevel:, .local, or .localWithColon:, even past column 1. It doesn't fully resolve issue #457, but this is IMO a worthwhile addition on its own, for neater formatting of labels inside an if or macro or (upcoming) inline fragment.

@Rangi42
Copy link
Contributor

Rangi42 commented Feb 17, 2021

@ISSOtm You said in the #716 discussion that this isn't ready to merge. I'm not sure why, or at least, I think something mergeable could be made out of this if it doesn't do what I think.

These case ought to be distinguishable, regardless of leading whitespace:

    ColonLabel:
    .colonLocalLabel:
    .localLabelNoColon
    ColonMac: MACRO
    ENDM

The ambiguous case is this:

something stuff...

There something (no leading ., no trailing :) could be a macro invocation or a newly defined constant. Currently they're distinguished by whether something starts in column 1. They could also be distinguished by whether the following stuff is a definition token (set, =, equ, equs, rb, rw, rl) but that has the problems you discussed above and would be fragile anyway. They could also also be distinguished by whether something is already defined, but that's the C-style lexer hack and is rejected.

Anyway, even if macro invocations and definitions still have to depend on their starting column, I think allowing indented labels would be useful. For one thing, it would make indented multi-line inline fragments from #716 less awkward to write.

@ISSOtm
Copy link
Member Author

ISSOtm commented Feb 21, 2021

Unless CI fails, this should be merge-able. It doesn't solve the most wanted part of the issue, though.

As a temporary hack, we could allow indenting specifically for =... it would help slightly, since I believe = is the most common inside blocks.

Copy link
Contributor

@Rangi42 Rangi42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test case could also verify that indented mac: MACRO defines a macro.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants