Malformed type comment halts checking #15678

NarvinSingh · 2023-07-15T02:39:01Z

Feature

Continue type checking upon encountering a malformed type comment.

Pitch

I work with a very large code base with parts that I import from, but can't change. I'd like to use mypy to type check my code without skipping imports. Unfortunately, some imports have malformed type comments that cause mypy to halt type checking. ignore_errors for the modules with the malformed comments does not seem to work. I think this is because these comments are syntax errors to the ast parser.

I have come up with a workaround, where at the parse step, if we get a syntax error due to one of these comments, we strip it out of the source file and parse it again.

Even though these are in fact syntax errors, the code will still run, because at the end of the day, they are just comments. So I think stripping them out is valid, provided this feature is put behind an option. I've never worked on mypy before, and this will be my first pull request (for open source, as well as mypy), so if someone could point me in the right direction to do the option flag, that would be much appreciated. I'm also having trouble displaying a note message, so any help there would be great too.

Here is an example to reproduce the issue. If you run mypy on this module, mypy will halt, and not report anything other than the syntax error.

# library.py
def library_fn():
    print("Type checking halts after this line because the comment is a syntax error.")  # type: bad
    x: int = "1"
    return x


if __name__ == "__main__":
    res = library_fn()
    print(f"You can still run me, even though I have a syntax error. See, the result is {res}")

The text was updated successfully, but these errors were encountered:

Option to strip type comments with syntax errors before parsing source, that would otherwise halt type checking. TODO: Add notes to the report for stripped out comments.

TODO: Figure out how to construct a unit test if the note message includes the line text.

Don't decode bytes sources to str, which avoids having to determine the encoding, which may not be UTF-8. Ensure errors object is not None before using it.

ikonst · 2023-07-16T03:47:42Z

Reviewing #15686, it appears that:

there's no way to coax ast to parse type comments leniently
in general, ast doesn't support error recovery (the epilogue in the docs suggests Parso)

One option is to do #15686 - trying to strip comments and re-parse. Beyond the ostensible perf issue (which, I agree with OP, likely isn't an issue), it's a bit crude to regex out Python code, e.g. how do you avoid matching print("# type: bad")? Obviously it'd still be a syntax error if you strip it, but maybe an even more confusing syntax error.

Another option is to add disable_type_comments, a new per-module option, which'd make us pass ast.parse(..., type_comments=False, ...). Parsing without type comments would eliminate support for:

# type: SomeType annotations (but most code should be using new annotations now)
# type: ignore annotations (but it's 3rd party code so we're not reporting errors)

I'm a bit more inclined to the latter option, since it's simpler and doesn't involve "regex code editing". We could raise the issue with cPython / ast to allow more error recovery at least in some cases.

JelleZijlstra · 2023-07-16T14:21:03Z

I agree that #15686's approach of stripping only type comments with syntax errors isn't great. It risks creating a confusing situation when a comment that isn't intended as a type comment happens to parse to a valid AST.

Instead we should provide an option that disables type comments completely (while still supporting # type: ignore). Modern code should never use type comments, and #12947 suggests deprecating support. A flag to disable type comments would be a good first step there.

NarvinSingh · 2023-07-16T14:58:45Z

Stripping out type comments that trigger syntax errors seemed to be a more incremental approach than ignoring type (except type ignore) comments altogether. And while I agree that regexs aren't great for parsing in general, the approach taken here I think makes them an appropriate tool to use in this situation, because the regex is used in conjunction with the AST parser.

Before a line is stripped, it has to fail the parser first. Because of this, I don't think print("# type: bad") would be stripped, unless there was an actual syntax error on the line (I can add a unit test to validate this). And if there was a syntax error on the line that wasn't due to the comment, and the type comment was stripped, there would still be a syntax error on the line after stripping, and checking would be halted anyway.

ikonst · 2023-07-16T16:24:49Z

@JelleZijlstra wrote

Instead we should provide an option that disables type comments completely (while still supporting # type: ignore).

ast doesn't surface comments in raw form: if we do type_comments=True, then we get ignores but also all type comments must be valid. Can't have one without the other.

(What did we do before the Python 3.7 cutoff to parse type comments? The type_comments flag seems to have been introduced in 3.8.)

I'm not sure whether # type: ignore support is a showstopper. OP's use case is "I work with a very large code base with parts that I import from, but can't change", so OP wants type annotations from modules they import, but the goal isn't to type-check those modules. They're essentially "third party code".

Because of this, I don't think print("# type: bad") would be stripped, unless there was an actual syntax error on the line

So let's assume

print("# type: bad')

a syntax error, which you'd try to fix into

print("

which would still result in a syntax error. At that point you wouldn't want to say something like:

    print("
          ^
SyntaxError: unterminated string literal (detected at line 1)

since that'll be misleading, so you'd have to be careful to keep the old SyntaxError around, and raise that instead. I guess that'd doable, but even more complex.

What I'm wondering is whether introducing an option to parse a module with type_comments=False:

would get us there without all the extra complexity, and
perhaps type_comments=False is less incremental, in that you lose more than the bare minimum, but then it's maybe directionally where we want to go anyway (Deprecate type comment support? #12947).

AlexWaygood · 2023-07-16T16:51:41Z

What did we do before the Python 3.7 cutoff to parse type comments? The type_comments flag seems to have been introduced in 3.8.

We used typed-ast on Python 3.7 and lower. It's a third-party backport of the py38 ast module, to allow parsing type comments as part of the AST on older Python versions (among other things)

Add unit tests for edges cases that are syntax errors not due to type comments.

The reparse function could be expanded upon to tweak the source and parse it again to handle any number of exceptions bases on options. This formulation also preserves the original exception and raises it if the source can't eventually be successfully parsed.

NarvinSingh added the feature label Jul 15, 2023

NarvinSingh added a commit to NarvinSingh/mypy that referenced this issue Jul 15, 2023

Feature ignore-comment-errors notes (python#15678)

494bae3

TODO: Figure out how to construct a unit test if the note message includes the line text.

NarvinSingh linked a pull request Jul 15, 2023 that will close this issue

Type comment strip #15686

Open

NarvinSingh added a commit to NarvinSingh/mypy that referenced this issue Jul 16, 2023

Feature add line text to note (python#15678)

51cc10b

Add unit tests for edges cases that are syntax errors not due to type comments.

NarvinSingh added a commit to NarvinSingh/mypy that referenced this issue Jul 18, 2023

Fix use 2 code paths to fix AnyStr errors (python#15678)

7282168

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Malformed type comment halts checking #15678

Malformed type comment halts checking #15678

NarvinSingh commented Jul 15, 2023 •

edited

Loading

ikonst commented Jul 16, 2023 •

edited

Loading

JelleZijlstra commented Jul 16, 2023

NarvinSingh commented Jul 16, 2023

ikonst commented Jul 16, 2023

AlexWaygood commented Jul 16, 2023 •

edited

Loading

Malformed type comment halts checking #15678

Malformed type comment halts checking #15678

Comments

NarvinSingh commented Jul 15, 2023 • edited Loading

ikonst commented Jul 16, 2023 • edited Loading

JelleZijlstra commented Jul 16, 2023

NarvinSingh commented Jul 16, 2023

ikonst commented Jul 16, 2023

AlexWaygood commented Jul 16, 2023 • edited Loading

NarvinSingh commented Jul 15, 2023 •

edited

Loading

ikonst commented Jul 16, 2023 •

edited

Loading

AlexWaygood commented Jul 16, 2023 •

edited

Loading