Skip to content

Implementation of RAISE_SYNTAX_ERROR_NO_COL_OFFSET seems incorrect #126

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pablogsal opened this issue May 10, 2020 · 11 comments · Fixed by python/cpython#20050
Closed

Implementation of RAISE_SYNTAX_ERROR_NO_COL_OFFSET seems incorrect #126

pablogsal opened this issue May 10, 2020 · 11 comments · Fixed by python/cpython#20050

Comments

@pablogsal
Copy link

From python#20020 (comment), it seems that the way RAISE_SYNTAX_ERROR_NO_COL_OFFSET works in not correct as is always assigning the caret to the beginning of the line instead of where the error should be.

@pablogsal pablogsal changed the title Implementation RAISE_SYNTAX_ERROR_NO_COL_OFFSET seems incorrect Implementation of RAISE_SYNTAX_ERROR_NO_COL_OFFSET seems incorrect May 10, 2020
@gvanrossum
Copy link

Well, what do you expect given the name of the macro? :-)

Maybe the value -1 can be used to indicate the offset is unknown?

@pablogsal
Copy link
Author

Well, what do you expect given the name of the macro? :-)

I expected that when the macro is used, no column offset was displayed (so there is no error caret on the final error message).

Maybe the value -1 can be used to indicate the offset is unknown?

Exactly! That is what I was thinking about. It sounds odd to me somehow to point always to the beginning of the line.

@gvanrossum
Copy link

The macro itself looks innocent. If anything's the matter, it's in _PyPegen_raise_error. I experimented with both raising SyntaxError and printing it using the traceback module, and then read the code. The only way to consistently suppress the caret is by setting the offset attribute of the SyntaxError object to None. In Python you can do this like this:

err = SyntaxError("message", ("file.py", 1, None, "text"))

The C equivalent is similar (you still end up having to construct a 4-tuple).

@gvanrossum
Copy link

FWIW while we're at it, we might want to fix all the inconsistencies between the traceback module and the code in pythonrun.c for various out-of-bound values of the offset (both 0 and below and len(text) and above).

@pablogsal
Copy link
Author

The C equivalent is similar (you still end up having to construct a 4-tuple).

We have a code path that does this, but currently is not activated if with_col_number is 0, only if loc is NULL:

https://github.com/python/cpython/blob/master/Parser/pegen/pegen.c#L417-L418

@gvanrossum
Copy link

No, ‘loc’ here is confusingly named — it’s the text of the line.

@pablogsal
Copy link
Author

pablogsal commented May 10, 2020

No, ‘loc’ here is confusingly named — it’s the text of the line.

Indeed, that is confusingly named! We could then pass the None changing the Py_BuildValue to (OiON), passing PyNone if with_col_number=0 instead of col_number and manually constructing the integer value in the happy path (for what currently is just col_number).

@gvanrossum
Copy link

Or just two different calls to Py_BuildValue. The confusing loc name was copied from ast_error() in Python/ast.c. I suppose it means "location" but I don't understand why the code is like that -- probably an ancient mistake when that code was refactored (some of it probably predated the AST).

@lysnikolaou
Copy link
Member

lysnikolaou commented May 11, 2020

The RAISE_SYNTAX_ERROR_NO_COL_OFFSET macro was introduced in python#19782 by @isidentical, where he explained to me that this is actually the expected behaviour in some tests. See python#19782 (comment).

@pablogsal
Copy link
Author

pablogsal commented May 11, 2020

where he explained to me that this is actually the expected behaviour in some tests.

Yeah, but that works only for that particular case. If you make it a bit different it fails. For example:

Old parser:

  File "<stdin>", line 1
    if a: 1 + 1 = 2
          ^
SyntaxError: cannot assign to operator

New Parser:

  File "<stdin>", line 1
    if a: 1 + 1 = 2
    ^
SyntaxError: cannot assign to operator

That's why I am saying that always pointing to the beginning of the line is not correct.

@lysnikolaou
Copy link
Member

lysnikolaou commented May 11, 2020

Oh you're right, this is not correct. It seems that ast_error always set the column number of the SyntaxError to the column offset of the offending node, so maybe that's what we want to do as well.

lysnikolaou added a commit to lysnikolaou/cpython that referenced this issue May 12, 2020
This PR fixes SyntaxError locations, when the caret is not displayed,
by doing the following:
- `col_number` always gets set to the location of the offending
  node/expr. When no caret is to be displayed, this gets achieved
  by setting the object holding the error line to None.
- Introduce a new function `_PyPegen_raise_error_known_location`,
  which can be called, when an arbitrary `lineno`/`col_offset`
  needs to be passed. This function then gets used in the grammar
  (through some new macros and inline functions) so that SyntaxError
  locations of the new parser match that of the old.

Closes we-like-parsers#126.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants