The lexer doesn't take into account goal symbols #294

Razican · 2020-04-01T13:35:12Z

Getting the code from the harness/assert.js file in the test262 suite, this fails to parse:

function assert(mustBeTrue, message) {
  if (mustBeTrue === true) {
    return;
  }

  if (message === undefined) {
    message = 'Expected true but got ' + assert._toString(mustBeTrue);
  }
  $ERROR(message);
}

assert._isSameValue = function (a, b) {
  if (a === b) {
    // Handle +/-0 vs. -/+0
    return a !== 0 || 1 / a === 1 / b;
  }

  // Handle NaN vs. NaN
  return a !== a && b !== b;
};

With the error:

ParsingError: Expected one of ';', '}' or 'line terminator', got 'b' at line 24, col 24

Interestingly, the error is given in column 24 (marked here):

    return a !== 0 || 1 / a === 1 / b;
                       ^

But I'm guessing the issue is with the last b. I think this is a regression, since this particular file parsed fine before the rewrite, but it might have been something new in the file. Once fixed, we need to add a test for it.


Razican · 2020-04-01T18:00:41Z

I'll work on fixing this. Ideally, this should be fixed before Boa 0.7.

HalidOdat · 2020-04-01T19:53:42Z

After some debugging, the error seems to be coming from:

1 / a === 1 / b

Checking the token stream: cargo run -- --dump-tokens

[
    Token {
        kind: NumericLiteral(
            1.0,
        ),
        pos: Position {
            column_number: 1,
            line_number: 1,
        },
    },
    Token {
        kind: RegularExpressionLiteral(
            " a === 1 ",
            "",
        ),
        pos: Position {
            column_number: 3,
            line_number: 1,
        },
    },
	...
]

It seems to be a lexer bug, not a parser one: it lexes / a === 1 / as a regex.

Hope that helps. :)

jasonwilliams · 2020-04-01T19:57:33Z

Good find!
I do like the AST and token output we have now.

jasonwilliams · 2020-04-01T19:58:47Z

Do we need to refactor the lexer now? 😂

HalidOdat · 2020-04-01T20:01:10Z

I'm not exactly sure what the lexer is supposed to do in this situation, since / a === 1 / is a valid regex on its own, but not in the context of 1 / a === 1 / b.

HalidOdat · 2020-04-01T20:04:53Z

We probably need a context-aware lexer or something like that. Any thoughts?

Razican · 2020-04-01T20:19:43Z

Hmm, interesting. We could maybe check which tokens can precede a regex, and see if the previous token is one of those?

I also removed an unused function in the parser and added a test for #294, currently ignored.
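For illustration, the "look at the previous token" idea could be sketched roughly as follows. This is a minimal toy lexer, not Boa's code: the `Tok` enum, `regex_allowed` and `lex` are hypothetical names made up for this example, and the rule used (a `/` starts a regex unless the previous token looks like the end of an expression) is a simplifying assumption.

```rust
// Hypothetical, simplified token kinds; this is not Boa's `TokenKind`.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Num(f64),
    Ident(String),
    Punct(String),
    Regex(String),
}

/// Heuristic (an assumption of this sketch): a `/` may start a regex only if
/// the previous token cannot be the end of an expression.
fn regex_allowed(prev: Option<&Tok>) -> bool {
    match prev {
        None => true,
        Some(Tok::Num(_)) | Some(Tok::Ident(_)) | Some(Tok::Regex(_)) => false,
        Some(Tok::Punct(p)) => !matches!(p.as_str(), ")" | "]" | "}"),
    }
}

/// Toy lexer: numbers, identifiers, a few operators and flag-less regexes.
fn lex(src: &str) -> Vec<Tok> {
    let chars: Vec<char> = src.chars().collect();
    let mut out = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c.is_ascii_digit() {
            let start = i;
            while i < chars.len() && (chars[i].is_ascii_digit() || chars[i] == '.') {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            // Malformed numbers become NaN instead of panicking.
            out.push(Tok::Num(text.parse().unwrap_or(f64::NAN)));
        } else if c.is_alphabetic() || c == '_' || c == '$' {
            let start = i;
            while i < chars.len()
                && (chars[i].is_alphanumeric() || chars[i] == '_' || chars[i] == '$')
            {
                i += 1;
            }
            out.push(Tok::Ident(chars[start..i].iter().collect()));
        } else if c == '/' && regex_allowed(out.last()) {
            // Regex body: everything up to the next `/` (no escapes or flags).
            let start = i + 1;
            i = start;
            while i < chars.len() && chars[i] != '/' {
                i += 1;
            }
            out.push(Tok::Regex(chars[start..i].iter().collect()));
            i += 1; // skip the closing slash, if there is one
        } else {
            // Punctuator: greedily extend operators such as `===` or `||`.
            let start = i;
            i += 1;
            if "=!<>|&".contains(c) {
                while i < chars.len() && "=|&".contains(chars[i]) {
                    i += 1;
                }
            }
            out.push(Tok::Punct(chars[start..i].iter().collect()));
        }
    }
    out
}
```

With this sketch, `lex("1 / a === 1 / b")` produces seven tokens with both slashes as punctuators, while in `lex("x = /ab+c/")` the slash after `=` starts a regex. The heuristic is still wrong in some cases: it treats `return` as a plain identifier, so `return /re/` is mislexed, and a `)` or `}` can be followed by either a division or a regex depending on the surrounding syntax.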

jasonwilliams · 2020-04-01T22:41:02Z

> we probably need a context aware lexer or something like that. any thoughts?

I'm sure the lexer is context-aware in some places, so it shouldn't be too hard to look at what comes before the slash and work it out from that. Basically what @Razican said.

Razican · 2020-04-02T09:18:54Z

There seems to be some information on context-aware lexical grammar in the spec. I will review this today and see if I can improve the lexer.

jasonwilliams · 2020-04-03T08:40:27Z

https://v8.dev/blog/understanding-ecmascript-part-3
Funnily enough, this post just came out and talks about the same thing.

Razican · 2020-04-03T12:35:38Z

Reading this, it seems that the parser needs to call the lexer, and we cannot have a full list of tokens before calling the parser. So this clearly needs a rewrite of the way the parser and lexer work together.
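A rough sketch of what "the parser calls the lexer" could look like: the lexer hands out one token at a time, and the caller passes in the goal symbol that applies at that point. The names `Goal`, `Lexer::next_token` and `tokens_with_goals` are hypothetical and not taken from Boa. The two goals loosely mirror the spec's InputElementDiv and InputElementRegExp, and the "expecting an operand" flag is a stand-in for the grammar position a real parser would know.

```rust
/// Which lexical goal the caller wants. Loosely modelled on the spec's
/// InputElementDiv / InputElementRegExp; the other goals are omitted.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Goal {
    Div,
    RegExp,
}

// Hypothetical, simplified token kinds; this is not Boa's `TokenKind`.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Num(String),
    Ident(String),
    Punct(String),
    Regex(String),
}

struct Lexer {
    chars: Vec<char>,
    pos: usize,
}

impl Lexer {
    fn new(src: &str) -> Self {
        Lexer { chars: src.chars().collect(), pos: 0 }
    }

    /// Lexes exactly one token; the caller decides how a `/` is read.
    fn next_token(&mut self, goal: Goal) -> Option<Tok> {
        while self.pos < self.chars.len() && self.chars[self.pos].is_whitespace() {
            self.pos += 1;
        }
        let c = *self.chars.get(self.pos)?;
        let start = self.pos;
        if c == '/' && goal == Goal::RegExp {
            // Regex body: everything up to the next `/` (no escapes or flags).
            self.pos += 1;
            let body_start = self.pos;
            while self.pos < self.chars.len() && self.chars[self.pos] != '/' {
                self.pos += 1;
            }
            let body: String = self.chars[body_start..self.pos].iter().collect();
            self.pos = (self.pos + 1).min(self.chars.len());
            return Some(Tok::Regex(body));
        }
        if c.is_alphanumeric() || c == '_' || c == '$' {
            while self.pos < self.chars.len()
                && (self.chars[self.pos].is_alphanumeric()
                    || self.chars[self.pos] == '_'
                    || self.chars[self.pos] == '$')
            {
                self.pos += 1;
            }
            let text: String = self.chars[start..self.pos].iter().collect();
            return Some(if c.is_ascii_digit() { Tok::Num(text) } else { Tok::Ident(text) });
        }
        // Punctuator: extend comparison-like operators with trailing `=`.
        self.pos += 1;
        if "=!<>".contains(c) {
            while self.pos < self.chars.len() && self.chars[self.pos] == '=' {
                self.pos += 1;
            }
        }
        Some(Tok::Punct(self.chars[start..self.pos].iter().collect()))
    }
}

/// Stand-in for a parser: it tracks whether an operand is expected next and
/// picks the goal symbol for each call accordingly.
fn tokens_with_goals(src: &str) -> Vec<Tok> {
    let mut lexer = Lexer::new(src);
    let mut out = Vec::new();
    let mut expect_operand = true;
    loop {
        let goal = if expect_operand { Goal::RegExp } else { Goal::Div };
        let tok = match lexer.next_token(goal) {
            Some(t) => t,
            None => break,
        };
        // After an operator we expect an operand; after an operand we do not.
        expect_operand = match &tok {
            Tok::Punct(p) => !matches!(p.as_str(), ")" | "]"),
            _ => false,
        };
        out.push(tok);
    }
    out
}
```

Calling `Lexer::new("/ a === 1 / b").next_token(Goal::RegExp)` reproduces the `" a === 1 "` regex from the token dump earlier in this thread, whereas the same input with `Goal::Div` gives a plain `/` punctuator. Because the goal is chosen per call, the token list cannot be produced up front without the caller's state.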

I would say we can release version 0.7 with this known limitation and work on it later. I'm still working on the parser modularization, which I think is a good place to start.

I think that string interning would also help with this, as we could maybe have Tokens that are Copy, which would make them easier to pass between the lexer and the parser.
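To make the interning idea concrete, here is a minimal sketch, assuming a simple `HashMap`-backed interner. `Sym`, `Interner` and this `Tok` enum are hypothetical names for illustration, not an existing Boa API. Identifier and punctuator text is replaced by a small integer handle, so the token type no longer owns a `String` and can derive `Copy`.

```rust
use std::collections::HashMap;

/// Handle to an interned string; cheap to copy and compare.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Sym(u32);

/// Minimal interner: each distinct string is stored once and gets a stable id.
#[derive(Default)]
struct Interner {
    map: HashMap<String, Sym>,
    names: Vec<String>,
}

impl Interner {
    /// Returns the existing symbol for `s`, or registers a new one.
    fn intern(&mut self, s: &str) -> Sym {
        if let Some(&sym) = self.map.get(s) {
            return sym;
        }
        let sym = Sym(self.names.len() as u32);
        self.map.insert(s.to_owned(), sym);
        self.names.push(s.to_owned());
        sym
    }

    /// Looks up the text behind a symbol produced by this interner.
    fn resolve(&self, sym: Sym) -> &str {
        &self.names[sym.0 as usize]
    }
}

/// With symbols instead of owned strings, the token type can be `Copy`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok {
    Num(f64),
    Ident(Sym),
    Punct(Sym),
}
```

Interning the same text twice returns the same `Sym`, and a `Tok::Ident(sym)` can be passed by value without cloning. The trade-off is that anything that needs the original text has to go back through the interner via `resolve`.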


Razican added the bug Something isn't working label Apr 1, 2020

Razican added a commit that referenced this issue Apr 1, 2020

Fixed positions in regexes and strict operators.

6014f4d

I also removed an unused function in the parser and added a test for #294, currently ignored.

Razican mentioned this issue Apr 1, 2020

Fixed positions in regexes and strict operators #295

Merged

Razican changed the title ~~New parser fails with complex expressions~~ The lexer doesn't take into account goal symbols Apr 3, 2020

jasonwilliams pushed a commit that referenced this issue Apr 4, 2020

Fixed positions in regexes and strict operators. (#295)

4ed7122


Razican mentioned this issue Apr 11, 2020

Modularized parser #304

Merged

jasonwilliams mentioned this issue Apr 13, 2020

Streaming Parsing (from Lexer to Parser) #288

Closed

Razican added the lexer Issues surrounding the lexer label May 9, 2020

jasonwilliams mentioned this issue May 9, 2020

setup test262 harness #12

Closed

5 tasks

jasonwilliams added this to the v0.10.0 milestone May 17, 2020

Razican self-assigned this May 26, 2020

Razican mentioned this issue May 31, 2020

Started with the new lexer implementation #432

Closed

Razican mentioned this issue Jun 16, 2020

New lexer #486

Closed

Lan2u mentioned this issue Jul 26, 2020

New lexer #559

Merged

Razican linked a pull request Aug 16, 2020 that will close this issue

New lexer #559

Merged

HalidOdat closed this as completed in #559 Aug 27, 2020
