Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect validation: space after 'while' keyword #1095

Closed
sirosen opened this issue Feb 1, 2024 · 7 comments
Closed

Incorrect validation: space after 'while' keyword #1095

sirosen opened this issue Feb 1, 2024 · 7 comments
Labels
bug Something isn't working parsing Converting source code into CST nodes

Comments

@sirosen
Copy link

sirosen commented Feb 1, 2024

I was working on a linter/fixer tool and decided to test it against the stdlib to see what it would do.
I was surprised to see it fail on some stdlib modules with CSTValidationErrors.

I can report a number of these if they are useful to have tracked. I've so far seen "space after 'del'" and "space after 'while'".

Limiting ourselves to the while case, here's a trivial reproducer:

while(True):
    pass

From the stdlib, this is in difflib. Example snippet:

while(num_blanks_to_yield < 0):
    num_blanks_to_yield += 1
    yield None,('','\n'),True
@zsol zsol added bug Something isn't working parsing Converting source code into CST nodes labels Feb 2, 2024
@kiri11
Copy link
Contributor

kiri11 commented Jul 26, 2024

I also got no space after assert:

Traceback:

  File "libcst/_nodes/internal.py", line 159, in visit_iterable
    new_child = child.visit(visitor)
                ^^^^^^^^^^^^^^^^^^^^
  File "libcst/_nodes/base.py", line 227, in visit
    _CSTNodeSelfT, self._visit_and_replace_children(visitor)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "libcst/_nodes/statement.py", line 2514, in _visit_and_replace_children
    return Assert(
           ^^^^^^^
  File "<string>", line 8, in __init__
  File "libcst/_nodes/base.py", line 117, in __post_init__
    self._validate()
  File "libcst/_nodes/statement.py", line 2507, in _validate
    raise CSTValidationError("Must have at least one space after 'assert'.")
libcst._nodes.base.CSTValidationError: Must have at least one space after 'assert'.

This is Python 3.12.

@kiri11
Copy link
Contributor

kiri11 commented Jul 26, 2024

Should this be changed to "Must have at least one space, UNLESS the next node has the left parenthesis"?

UPDATE: turns out it is already like that, I was just trying to generate invalid code.

This validation generally useful, would be great to add this validation to an If node, too.

Because currently I can rewrite this code to simplify the condition:

if(x and True):
    ...

And if I don't add parentheses to the new node, LibCST happily produces this with no warnings:

ifx:
    ...

which is a SyntaxError.

@kiri11
Copy link
Contributor

kiri11 commented Jul 26, 2024

@sirosen

I had a similar issue and turns out I was using LibCST incorrectly. And this validation was working fine by preventing me from creating invalid code. Could you please share what are you trying to do? If you are adding just a True node to while.test without whitespace or parentheses you are trying to generate the following:

whileTrue:
    pass

Which is SyntaxError and LibCST validation is working correctly preventing you from generating bad code.

@kiri11
Copy link
Contributor

kiri11 commented Jul 26, 2024

@zsol Turns out this is not a bug. Still, maybe it would be possible to add a feature to automatically add one space in case if a user tries to add a node without whitespace after a keyword? Given that this exception already resulted in confusion at least three times. Thanks!

@sirosen
Copy link
Author

sirosen commented Jul 27, 2024

This was a failure to parse valid code found in the stdlib, not a failure to generate code.

I can see why it's desirable to avoid creating new nodes in this state, but the validation is stopping parsing from succeeding.

@kiri11
Copy link
Contributor

kiri11 commented Jul 27, 2024

@sirosen can you still reproduce? And if so, how?

I just tried to parse the two examples from the first message and it was parsed fine:

python -m libcst.tool print test.py

Output:
parsed_tree.txt

@sirosen
Copy link
Author

sirosen commented Jul 27, 2024

Sorry about this! I answered from my phone since I was not at my desk and I misremembered the details here.
You are entirely correct. My fixer code was removing the parentheses at the time, and I mistakenly thought that this was an issue in libcst. I just wrote myself a tiny roundtrip transform and checked it against several libcst versions and it worked without error.

In case it's useful, I am running my roundtripper (rt.py, included below) on every .py file in the Lib dir of the cpython repo's current main branch. So far, the only errors I see are from test modules which are intentionally invalid, e.g. ./test/tokenizedata/badsyntax_3131.py.

Let's close this! I might open new issues if my roundtripper finds anything interesting worth sharing.

rt.py
import sys

import libcst


class NoopTransformer(libcst.CSTTransformer):
    pass


def _roundtrip_data(content: bytes) -> bytes:
    raw_tree = libcst.parse_module(content)
    wrapped_tree = libcst.MetadataWrapper(raw_tree)
    tree = wrapped_tree.visit(NoopTransformer())
    return tree.code.encode(tree.encoding)


fn = sys.argv[1]
with open(fn, "rb") as fp:
    content = fp.read()

_roundtrip_data(content)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working parsing Converting source code into CST nodes
Projects
None yet
Development

No branches or pull requests

3 participants