Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect detection of self-documenting f-strings #39

Closed
GjjvdBurg opened this issue Jan 16, 2020 · 11 comments
Closed

Incorrect detection of self-documenting f-strings #39

GjjvdBurg opened this issue Jan 16, 2020 · 11 comments

Comments

@GjjvdBurg
Copy link

Describe the bug

False-positives are returned for self-documenting f-strings.

To Reproduce

$ cat /tmp/test.py
a = 1
print(f"Hello world={a}")
$ vermin /tmp/test.py
Minimum required versions: 3.8
Incompatible versions:     2

Expected behavior

Expected the minimum required version for this code to be 3.6. A self-documenting f-string has the equals sign on the other side of the variable (i.e. {a=}).

Environment (please complete the following information):

  • Vermin version 0.9.2

Thanks in advance!

@netromdk
Copy link
Owner

Thanks for noticing, @GjjvdBurg! Looking at it now. :)
I can deduce that you were running Python 3.8 to execute Vermin to achieve this result, right?

@netromdk
Copy link
Owner

There were some cases I hadn't noticed the first time around. Thanks again!

@gousaiyang
Copy link
Contributor

Actually there are more cases (tested on Python 3.8 on your patch just now):

Case 1
f'{ a = }'
Expected: detected as 3.8+
Actual: detected as 3.6+
Notes: ASCII whitespaces ([ \t\n\r\f\v]) are allowed before and after the =, and are preserved in the output.

Case 2
f'{a+b=}', f'{a+1=}'
Expected: detected as 3.8+
Actual: detected as 3.6+
Notes: However f'{1+a=}' and f'{1+1=}' are detected correctly.

Case 3
f'{(x:=1)}'
Expected: detected as 3.8+
Actual: detected as 3.6+
Notes: This is not a "self-documenting f-string", this is an f-string containing an assignment expression (which is new in 3.8).

Unfortunately, "self-documenting f-string" or "f-string debugging" is poorly documented in the official documentation (I can only find it in whatsnew and changelog). The PR introducing this change contains some useful information (especially the test cases), along with its corresponding issue (contains relevant discussion when introducing this feature, might be a bit noisy because some previous proposals mentioned were rejected).

@netromdk netromdk reopened this Jan 16, 2020
@netromdk
Copy link
Owner

Thanks, @gousaiyang :) Looking into it.

@netromdk
Copy link
Owner

I already implemented support for named expressions actually, but the sub node wasn't getting visited. So now f'{(x:=1)}' will correctly yield:

!2, 3.8      /tmp/fstr_named_expr.py
  f-strings require 3.6+
  named expressions require 3.8+

netromdk added a commit that referenced this issue Jan 17, 2020
Also, visit sub nodes of f-strings.

Related to #39.
@netromdk
Copy link
Owner

Nice test cases. Have to include those, too.

netromdk added a commit that referenced this issue Jan 17, 2020
netromdk added a commit that referenced this issue Jan 17, 2020
@netromdk
Copy link
Owner

Okay, that should be all the remaining cases. Phew..

@gousaiyang
Copy link
Contributor

I just realized that the false positives happen because the ast module returns an optimized AST (= related information is lost). Which means that f'foo{name=}' is parsed into ast.Constant with value = fooname= (constant string merged) and ast.FormattedValue containing ast.Name name.

Unfortunately, this means that f'{a=}' and f'a={a!r}' will be parsed into the same AST, except for the end_col_offset attribute:

In [26]: ast.parse("f'{a=}'").body[0].value.values[0].__dict__
Out[26]:
{'value': 'a=',
 'kind': None,
 'lineno': 1,
 'col_offset': 0,
 'end_lineno': 1,
 'end_col_offset': 7}

In [27]: ast.parse("f'{a=}'").body[0].value.values[1].__dict__
Out[27]:
{'value': <_ast.Name at 0x29b0cc78c70>,
 'conversion': 114,
 'format_spec': None,
 'lineno': 1,
 'col_offset': 0,
 'end_lineno': 1,
 'end_col_offset': 7}

In [28]: ast.parse("f'a={a!r}'").body[0].value.values[0].__dict__
Out[28]:
{'value': 'a=',
 'kind': None,
 'lineno': 1,
 'col_offset': 0,
 'end_lineno': 1,
 'end_col_offset': 10}

In [29]: ast.parse("f'a={a!r}'").body[0].value.values[1].__dict__
Out[29]:
{'value': <_ast.Name at 0x29b0cc737f0>,
 'conversion': 114,
 'format_spec': None,
 'lineno': 1,
 'col_offset': 0,
 'end_lineno': 1,
 'end_col_offset': 10}

So it seems impossible to differentiate f'{a=}' and f'a={a!r}' with the standard ast module without using the end_col_offset attribute or doing extra parsing of the original source code. Using the end_col_offset attribute might help (deduce from the length of code), but it will be somewhat tricky because the f-string could be very complex like f'foo{f"nested"=}bar{no}baz{another=}'.

@netromdk
Copy link
Owner

Yes, you are exactly right. I've run into this a number of times for different features, and as also mentioned in #35, I've thought about maybe utilizing an additional parser. But it would still be required to function with Python 2.7/3.0, and I still prefer the standard ast parser for maintainability and always being up-to-date.

The source visitor class is given the parsed AST (because errors need to be handled beforehand due to the concurrent behavior), but it could perhaps be given the entire source in string form as well if any extra parsing is necessary - like parsing the [(lineno, col_offset); (end_lineno, end_col_offset)] part for this case. Hm.

@bulletmark
Copy link

There are still trivial cases tripping this so please re-open. E.g:

$ cat tmp.py
a = 0
b = 0
print(f'a={a} b={b}')

$ vermin -vv --no-tips tmp.py
Detecting python files..
Analyzing using 4 processes..
!2, 3.8      /home/mark/tmp.py
  f-strings require 3.6+
  print(expr) requires 2+ or 3+
  self-documenting fstrings require 3.8+

Minimum required versions: 3.8
Incompatible versions:     2

@netromdk
Copy link
Owner

netromdk commented Jul 19, 2020

Sorry for the slow reply.

The problem, as already described above, is that ast optimizes away valuable information. The AST cannot distinguish f'{a=}' from f'a={a}'. This is why your "trivial" case is seen as a self-doc f-string.

With this in mind, I think it would make more sense to disable the entire self-doc f-string detection (but keeping regular f-string detection, of course).

netromdk added a commit that referenced this issue Aug 15, 2020
This is done since the built-in AST cannot distinguish `f'{a=}'` from `f'a={a}'`, for instance,
because it optimizes some information away. And this incorrectly marks some source code as using
fstring self-doc when only using general fstring.
netromdk added a commit that referenced this issue Aug 15, 2020
This is done since the built-in AST cannot distinguish `f'{a=}'` from `f'a={a}'`, for instance,
because it optimizes some information away. And this incorrectly marks some source code as using
fstring self-doc when only using general fstring.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants