Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Malformed type comment halts checking #15678

Open
NarvinSingh opened this issue Jul 15, 2023 · 5 comments · May be fixed by #15686
Open

Malformed type comment halts checking #15678

NarvinSingh opened this issue Jul 15, 2023 · 5 comments · May be fixed by #15686
Labels

Comments

@NarvinSingh
Copy link

NarvinSingh commented Jul 15, 2023

Feature

Continue type checking upon encountering a malformed type comment.

Pitch

I work with a very large code base with parts that I import from, but can't change. I'd like to use mypy to type check my code without skipping imports. Unfortunately, some imports have malformed type comments that cause mypy to halt type checking. ignore_errors for the modules with the malformed comments does not seem to work. I think this is because these comments are syntax errors to the ast parser.

I have come up with a workaround, where at the parse step, if we get a syntax error due to one of these comments, we strip it out of the source file and parse it again.

Even though these are in fact syntax errors, the code will still run, because at the end of the day, they are just comments. So I think stripping them out is valid, provided this feature is put behind an option. I've never worked on mypy before, and this will be my first pull request (for open source, as well as mypy), so if someone could point me in the right direction to do the option flag, that would be much appreciated. I'm also having trouble displaying a note message, so any help there would be great too.

Here is an example to reproduce the issue. If you run mypy on this module, mypy will halt, and not report anything other than the syntax error.

# library.py
def library_fn():
    print("Type checking halts after this line because the comment is a syntax error.")  # type: bad
    x: int = "1"
    return x


if __name__ == "__main__":
    res = library_fn()
    print(f"You can still run me, even though I have a syntax error. See, the result is {res}")
NarvinSingh added a commit to NarvinSingh/mypy that referenced this issue Jul 15, 2023
Option to strip type comments with syntax errors before parsing source,
that would otherwise halt type checking.

TODO: Add notes to the report for stripped out comments.
NarvinSingh added a commit to NarvinSingh/mypy that referenced this issue Jul 15, 2023
TODO: Figure out how to construct a unit test if the note message
includes the line text.
@NarvinSingh NarvinSingh linked a pull request Jul 15, 2023 that will close this issue
NarvinSingh added a commit to NarvinSingh/mypy that referenced this issue Jul 16, 2023
Don't decode bytes sources to str, which avoids having to determine the
encoding, which may not be UTF-8.

Ensure errors object is not None before using it.
@ikonst
Copy link
Contributor

ikonst commented Jul 16, 2023

Reviewing #15686, it appears that:

  • there's no way to coax ast to parse type comments leniently
  • in general, ast doesn't support error recovery (the epilogue in the docs suggests Parso)

One option is to do #15686 - trying to strip comments and re-parse. Beyond the ostensible perf issue (which, I agree with OP, likely isn't an issue), it's a bit crude to regex out Python code, e.g. how do you avoid matching print("# type: bad")? Obviously it'd still be a syntax error if you strip it, but maybe an even more confusing syntax error.

Another option is to add disable_type_comments, a new per-module option, which'd make us pass ast.parse(..., type_comments=False, ...). Parsing without type comments would eliminate support for:

  • # type: SomeType annotations (but most code should be using new annotations now)
  • # type: ignore annotations (but it's 3rd party code so we're not reporting errors)

I'm a bit more inclined to the latter option, since it's simpler and doesn't involve "regex code editing". We could raise the issue with cPython / ast to allow more error recovery at least in some cases.

@JelleZijlstra
Copy link
Member

I agree that #15686's approach of stripping only type comments with syntax errors isn't great. It risks creating a confusing situation when a comment that isn't intended as a type comment happens to parse to a valid AST.

Instead we should provide an option that disables type comments completely (while still supporting # type: ignore). Modern code should never use type comments, and #12947 suggests deprecating support. A flag to disable type comments would be a good first step there.

@NarvinSingh
Copy link
Author

Stripping out type comments that trigger syntax errors seemed to be a more incremental approach than ignoring type (except type ignore) comments altogether. And while I agree that regexs aren't great for parsing in general, the approach taken here I think makes them an appropriate tool to use in this situation, because the regex is used in conjunction with the AST parser.

Before a line is stripped, it has to fail the parser first. Because of this, I don't think print("# type: bad") would be stripped, unless there was an actual syntax error on the line (I can add a unit test to validate this). And if there was a syntax error on the line that wasn't due to the comment, and the type comment was stripped, there would still be a syntax error on the line after stripping, and checking would be halted anyway.

@ikonst
Copy link
Contributor

ikonst commented Jul 16, 2023

@JelleZijlstra wrote

Instead we should provide an option that disables type comments completely (while still supporting # type: ignore).

ast doesn't surface comments in raw form: if we do type_comments=True, then we get ignores but also all type comments must be valid. Can't have one without the other.

(What did we do before the Python 3.7 cutoff to parse type comments? The type_comments flag seems to have been introduced in 3.8.)

I'm not sure whether # type: ignore support is a showstopper. OP's use case is "I work with a very large code base with parts that I import from, but can't change", so OP wants type annotations from modules they import, but the goal isn't to type-check those modules. They're essentially "third party code".

Because of this, I don't think print("# type: bad") would be stripped, unless there was an actual syntax error on the line

So let's assume

print("# type: bad')

a syntax error, which you'd try to fix into

print("

which would still result in a syntax error. At that point you wouldn't want to say something like:

    print("
          ^
SyntaxError: unterminated string literal (detected at line 1)

since that'll be misleading, so you'd have to be careful to keep the old SyntaxError around, and raise that instead. I guess that'd doable, but even more complex.

What I'm wondering is whether introducing an option to parse a module with type_comments=False:

  • would get us there without all the extra complexity, and
  • perhaps type_comments=False is less incremental, in that you lose more than the bare minimum, but then it's maybe directionally where we want to go anyway (Deprecate type comment support? #12947).

@AlexWaygood
Copy link
Member

AlexWaygood commented Jul 16, 2023

What did we do before the Python 3.7 cutoff to parse type comments? The type_comments flag seems to have been introduced in 3.8.

We used typed-ast on Python 3.7 and lower. It's a third-party backport of the py38 ast module, to allow parsing type comments as part of the AST on older Python versions (among other things)

NarvinSingh added a commit to NarvinSingh/mypy that referenced this issue Jul 16, 2023
Add unit tests for edges cases that are syntax errors not due to type
comments.
NarvinSingh added a commit to NarvinSingh/mypy that referenced this issue Jul 17, 2023
The reparse function could be expanded upon to tweak the source and
parse it again to handle any number of exceptions bases on options.

This formulation also preserves the original exception and raises it if
the source can't eventually be successfully parsed.
NarvinSingh added a commit to NarvinSingh/mypy that referenced this issue Jul 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants