-
-
Notifications
You must be signed in to change notification settings - Fork 171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New lexer 2 — Electric Boogaloo #557
Conversation
7f3b0be
to
9b6b2d2
Compare
0189430
to
7aaa1aa
Compare
So I'm wondering if it makes sense to merge this now as you mentioned here: #569 (comment) Even if it's not perfect, while it's not merged it will block any improvement in RBGASM, as you have said. I think that if you're reasonably sure this is working, it could be merged, and then any bugfix can be applied to this code. Is there any reason why this can't be merged? |
|
Sure, I can see that too.
That's not much worse than the current situation. |
7aaa1aa
to
660bdf0
Compare
I fixed the conflicts, actually. I also improved the |
You could leave a TODO comment next to the malloc so that nobody forgets about the leak (I don't know if you have already done this, I haven't checked this last update of the PR!). |
a9a1be5
to
89538cc
Compare
Yes, the comment is there. As for merging the PR, the only downright blocker is |
We could drop Windows support. :P |
I'd just break Windows support until we can remove the need for
open_memstream.
|
To be fair, it's not a bad plan. It's not like we are going to make a release while Windows is broken. All I'd say is to tag a new version right before merging the PR, release binaries, etc, and then there is time to fix whatever is broken. |
89538cc
to
30ccd5e
Compare
Updated to use a new file stack info format in object files, which removes the need for CI reports a random segfault on Windows and specifically Ubuntu 16.04 with Clang... why? |
The lexer itself is very much incomplete, but this is intended to be a safe point to revert to should further implementation go south.
Macro arg detection, first emitted tokens, primitive (bad) column counting
Add keywords and identifiers Add comments Add number literals Add strings Add a lot of new tokens Add (and clean up) IF etc. Improve reporting of unexpected chars / garbage bytes Fix bug with and improved error messages when failing to open file Add verbose-level messages about how files are opened Enforce that files finish with a newline Fix chars returned not being cast to unsigned char (may conflict w/ EOF) Return null path when no file is open, rather than crash Unify and improve error printing slightly Known to be missing: macro expansion, REPT blocks, EQUS expansions
And fix line counting with expansion-made newlines. This has the same bug as the old lexer (equs-newline's output does not print the second warning as being part of the expansion). Additionally, we regress equs-recursion, as we are no longer able to catch this specific EQUS recursion. Simply enough, the new expansion begins **after** the old one ends! I have found no way to handle that.
Attempt to grow it to the max size first. Seriously, if this triggers, *how*
MacOS treats them differently, for some reason.
And added a test to check their behavior
There isn't really a better alternative. Making several mappings instead requires too much bookkeeping.
Gets rid of `open_memstream`, enabling Windows compatibility again Also fixes gbdev#491 as a nice bonus!
Removes a false positive from Clang static analysis
Translate it to \\n regardless of the lexer mode
"Initialization, sizeof, and the assignment operator ignore the flexible array member." Oops!
41b90ad
to
c246942
Compare
Since the lexer buffer wraps, the refilling gets handled in two steps: First, iff the buffer would wrap, the buffer is refilled until its end. Then, if more characters are requested, that amount is refilled too. An important detail is that `read()` may not return as many characters as requested; for this reason, the first step checks if its `read()` was "full", and skips the second step otherwise. This is also where a bug lied. After a *lot* of trying, I eventually managed to reproduce the bug on an OpenBSD VM, and after adding a couple of `assert`s in `peekInternal`, this is what happened, starting at line 724: 0. `lexerState->nbChars` is 0, `lexerState->index` is 19; 1. We end up with `target` = 42, and `writeIndex` = 19; 2. 42 + 19 is greater than `LEXER_BUF_SIZE` (= 42), so the `if` is entered; 3. Within the first `readChars`, **`read` only returns 16 bytes**, advancing `writeIndex` to 35 and `target` to 26; 4. Within the second `readChars`, a `read(26)` is issued, overflowing the buffer. The bug should be clear now: **the check at line 750 failed to work!** Why? Because `readChars` modifies `writeIndex`. The fix is simply to cache the number of characters expected, and use that.
Fixes #485, for real this time.
What's the point?
fstack.c
went from 545 to 400 lines, andlexer.c
went from 1057 + 598 to 2048Note that this lexer does not support "naked" braces, and that's intentional: their behavior was wonky anyways, hardly anyone uses them, and it frees them up for other, more useful behavior;
EQUS
by copying into a buffer of fixed size, it's no longer to error out by expanding a largeEQUS
at the beginning of a buffer (macro orREPT
)It has a few known problems.
open_memstream
. I only used it as a convenience, so we can remove it if needed. Would take some effort, though, and I don't want to duplicate the printing code, either;__FILE__
now leaks memory, but it sees little use, and that code will be changed when Don't terminate RGBDS strings with a NUL #505 is implemented anyways;mmap
might actually not be faster? This needs performance testing. That said, the code path is still needed, because such "file buffers" are used for macros and REPT. I'd expect little to no performance gain on small files, but larger files using macros a lot should see an improvement;mmap
path nevermunmap
s files if they contain a macro, even if the macro isPURGE
d at any point. I also didn't check if nested macros were handled correctly, but I think they are;mmap
, but that should be shimmable around; I'll page @JL2210 to deal with that. (open_memstream
probably won't work there either, but it should be done away with for MinGW anyways, as mentioned above);EQUS
expansion tracking is now harder, due to the different expansion tracking implementation; tokens that end precisely at the end of anEQUS
are not reported as coming from within theEQUS
. It means thatrecurse EQUS "recurse"
does infinitely hang RGBASM.I'm planning to make a "0.4.2-pre" release after this is merged, since this is rewriting the very core of RGBASM, and even after passing the full test suite I still found a bug. The point of that pre-release would be to allow "unstable" downstream packagers and Windows users relying on binaries, to beta-test the new lexer, and help avoiding a horribly broken 0.4.2 release.
Getting #434 vibes? Gee, I wonder. :^)