Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The lexer doesn't take into account goal symbols #294

Closed
Razican opened this issue Apr 1, 2020 · 11 comments · Fixed by #559
Closed

The lexer doesn't take into account goal symbols #294

Razican opened this issue Apr 1, 2020 · 11 comments · Fixed by #559
Assignees
Labels
bug Something isn't working lexer Issues surrounding the lexer
Milestone

Comments

@Razican
Copy link
Member

Razican commented Apr 1, 2020

Getting the code from the harness/assert.js file in the test262 suite, this fails to parse:

function assert(mustBeTrue, message) {
  if (mustBeTrue === true) {
    return;
  }

  if (message === undefined) {
    message = 'Expected true but got ' + assert._toString(mustBeTrue);
  }
  $ERROR(message);
}

assert._isSameValue = function (a, b) {
  if (a === b) {
    // Handle +/-0 vs. -/+0
    return a !== 0 || 1 / a === 1 / b;
  }

  // Handle NaN vs. NaN
  return a !== a && b !== b;
};

With the error:

ParsingError: Expected one of ';', '}' or 'line terminator', got 'b' at line 24, col 24

Interestingly, the error is given in column 24 (marked here):

    return a !== 0 || 1 / a === 1 / b;
                       ^

But I'm guessing the issue is with the last b. I think this is a regression, since this particular file parsed fine before the rewrite, but it might have been something new in the file. Once fixed, we need to add a test for it.

@Razican Razican added the bug Something isn't working label Apr 1, 2020
@Razican
Copy link
Member Author

Razican commented Apr 1, 2020

I'll work on fixing this. Ideally, this should be fixed before Boa 0.7.

@HalidOdat
Copy link
Member

HalidOdat commented Apr 1, 2020

After some debugging the error seems to be coming form:

1 / a === 1 / b

Checking the token stream: cargo run -- --dump-tokens

[
    Token {
        kind: NumericLiteral(
            1.0,
        ),
        pos: Position {
            column_number: 1,
            line_number: 1,
        },
    },
    Token {
        kind: RegularExpressionLiteral(
            " a === 1 ",
            "",
        ),
        pos: Position {
            column_number: 3,
            line_number: 1,
        },
    },
	...
]

It seems to be a lexer bug not a parser one. It lexes / a === 1 / as a regex.

Hope that helps. :)

@jasonwilliams
Copy link
Member

Good find!
I do like the ast and token output we have now

@jasonwilliams
Copy link
Member

Do we need to refactor the lexer now? 😂

@HalidOdat
Copy link
Member

HalidOdat commented Apr 1, 2020

I'm not exactly sure what the lexer is supposed to do in this situation, since / a === 1 / is a regex. but not in this context 1 / a === 1 / b.

@HalidOdat
Copy link
Member

we probably need a context aware lexer or something like that. any thoughts?

@Razican
Copy link
Member Author

Razican commented Apr 1, 2020

hmmm interesting. We could maybe check what token can precede a regex? and see if the previous token is one of those?

Razican added a commit that referenced this issue Apr 1, 2020
I also removed an unused function in the parser and added a test for #294, currently ignored.
@jasonwilliams
Copy link
Member

we probably need a context aware lexer or something like that. any thoughts?

I'm sure the Lexer is context aware in some places, so it shouldn't be too hard to see what's before it and work it out based on that. Basically what @Razican said

@Razican
Copy link
Member Author

Razican commented Apr 2, 2020

There seems to be some information on context-aware lexical grammar in the spec. I will review this today and see if I can improve the lexer.

@jasonwilliams
Copy link
Member

jasonwilliams commented Apr 3, 2020

https://v8.dev/blog/understanding-ecmascript-part-3
Funnily enough this post came out talking about the same thing.

@Razican
Copy link
Member Author

Razican commented Apr 3, 2020

Reading this, it seems that the parser needs to call the lexer, and we cannot have a full list of tokens before calling the parser. So this clearly needs a rewrite in the way the parser/lexer work together.

I would say we can release version 0.7 with this known limitation, and we can work on this later. I'm still working on the parser modularization, which I think it's a good point to start.

I think that String interning would also help with this, as we could maybe have Tokens that are Copy, and therefore ease their manipulation from one side to another.

@Razican Razican changed the title New parser fails with complex expressions The lexer doesn't take into account goal symbols Apr 3, 2020
jasonwilliams pushed a commit that referenced this issue Apr 4, 2020
I also removed an unused function in the parser and added a test for #294, currently ignored.
@Razican Razican added the lexer Issues surrounding the lexer label May 9, 2020
@jasonwilliams jasonwilliams mentioned this issue May 9, 2020
5 tasks
@jasonwilliams jasonwilliams added this to the v0.10.0 milestone May 17, 2020
@Razican Razican self-assigned this May 26, 2020
@Razican Razican mentioned this issue Jun 16, 2020
@Lan2u Lan2u mentioned this issue Jul 26, 2020
@Razican Razican linked a pull request Aug 16, 2020 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working lexer Issues surrounding the lexer
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants