NULL in lexical analysis of f-string #116881

Closed
chepner opened this issue Mar 15, 2024 · 4 comments
Labels
docs Documentation in the Doc dir interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-parser

chepner commented Mar 15, 2024

Documentation

This is somewhat related to #116580. There are two references to NULL in the description of f-strings that don't have a clear meaning.

format_spec       ::=  (literal_char | NULL | replacement_field)*
literal_char      ::=  <any code point except "{", "}" or NULL>

In both cases (but especially literal_char), it could refer to U+0000, but I'm unaware that a null character is allowed anywhere in Python source. At least, my attempt to inject one failed:

>>> print(ast.dump(ast.parse("f'{x:"'\x00'"}'"), indent=2))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/ast.py", line 52, in parse
    return compile(source, filename, mode, flags,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SyntaxError: source code string cannot contain null bytes

For format_spec, it could refer to an empty specification (f'{x:}'), but (literal_char | replacement_field)* would cover that just as well.
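The two observations above can be checked with a small sketch. The helper name `null_byte_is_rejected` and the `"<example>"` filename are illustrative assumptions, not part of the original report; the sketch also assumes that Python versions before 3.12 report the null byte as a `ValueError` rather than the `SyntaxError` shown in the traceback.

```python
import ast


def null_byte_is_rejected(source: str) -> bool:
    """Return True if compile() refuses `source` (hypothetical helper)."""
    try:
        compile(source, "<example>", "eval")
    except (SyntaxError, ValueError):
        # SyntaxError matches the 3.12 traceback above; catching ValueError
        # as well is an assumption to cover older interpreters.
        return True
    return False


x = 42

# An empty format specification is accepted and formats like str(x).
empty_spec_result = f"{x:}"

# The escape below puts a real U+0000 into the source text handed to compile().
rejected = null_byte_is_rejected("f'{x:\x00}'")

# The empty-spec f-string also parses without error.
tree = ast.parse("f'{x:}'", mode="eval")
```

Both results are consistent with the report: the empty specification needs no special NULL alternative, and a literal null character never reaches the f-string grammar because the source is rejected first.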

@chepner chepner added the docs Documentation in the Doc dir label Mar 15, 2024
@ericvsmith
Member

Could you point to where in the documentation you're referring? It seems to me that literal_char in your description already says NULL is excluded.

And is there any practical problem this causes?

@terryjreedy terryjreedy added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Mar 16, 2024
Member

terryjreedy commented Mar 16, 2024

format_spec       ::=  (literal_char | NULL | replacement_field)*
literal_char      ::=  <any code point except "{", "}" or NULL>

are from string literals

In the literal_char token description, NULL has the normal meaning of '\0' ('\x00'). In the format_spec grammar production, NULL is wrong if taken in the same sense, and it is confusing because it must instead be interpreted to mean <nothing>, which I literally cannot show between backticks because a zero-width background change is invisible. It is also redundant, because multiple nothings are nothing, and that case is already included in the production with | NULL removed.
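The redundancy argument can be sketched with regular expressions: a starred alternation already matches the empty string, so adding an empty alternative changes nothing. This is only a model under stated assumptions. `REPLACEMENT_FIELD` is a deliberately simplified stand-in (no nesting, conversions, or expressions), the pattern names are hypothetical, and the empty alternative is placed last purely for convenience, since order does not affect which strings match here.

```python
import re

# literal_char: any code point except "{", "}" or NULL (U+0000).
LITERAL_CHAR = r"[^{}\x00]"
# Simplified stand-in for replacement_field: braces around brace-free text.
REPLACEMENT_FIELD = r"\{[^{}]*\}"

# format_spec with NULL read as <nothing> (an empty alternative) ...
WITH_NOTHING = re.compile(rf"(?:{LITERAL_CHAR}|{REPLACEMENT_FIELD}|)*")
# ... and with that alternative removed.
WITHOUT_NOTHING = re.compile(rf"(?:{LITERAL_CHAR}|{REPLACEMENT_FIELD})*")

SAMPLES = ["", ">10", "{width}", ">{width}.{prec}", "a\x00b", "{", "}"]


def accepted(pattern: re.Pattern, text: str) -> bool:
    """Return True if the whole of `text` matches `pattern`."""
    return pattern.fullmatch(text) is not None


# Each entry pairs the verdicts of the two productions for one sample.
verdicts = {s: (accepted(WITH_NOTHING, s), accepted(WITHOUT_NOTHING, s)) for s in SAMPLES}
```

For every sample the two verdicts agree, including the empty specification, which is accepted by both forms through zero repetitions.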

The full grammar has the following, without the unneeded <nothing> alternative.

fstring_format_spec:
    | FSTRING_MIDDLE 
    | fstring_replacement_field

where FSTRING_MIDDLE is a token. The repetition * is attached to fstring_format_spec in another production.

fstring_full_format_spec:
    | ':' fstring_format_spec*

I am making a PR.

@terryjreedy
Member

@encukou Petr, assigning to you after reading your comments on #116580. I will submit the trivial PR but leave it to you to apply now, leave open until later, or close. But if later is months away, please consider correcting what seems to be an overt error now.

terryjreedy added a commit to terryjreedy/cpython that referenced this issue Mar 16, 2024
In Lexical Analysis f-strings section, NULL in the description
of 'literal character' means '\0'.  In the format_spec grammar
production, it is wrong with that meaning and redundant if
instead interpreted as <nothing>.  Remove it there.
encukou pushed a commit that referenced this issue Mar 18, 2024
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Mar 18, 2024
…-116885)

(cherry picked from commit 4e45c6c)

Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
Member

encukou commented Mar 18, 2024

Thanks for the PR, and for pinging me!
Yup, my PR for lexical_analysis is months away. I'd like to be aware of work in this area, not to block it :)

terryjreedy added a commit that referenced this issue Mar 18, 2024
…) (#116951)

terryjreedy added a commit that referenced this issue Mar 18, 2024
…) (#116952)

vstinner pushed a commit to vstinner/cpython that referenced this issue Mar 20, 2024
…-116885)

adorilson pushed a commit to adorilson/cpython that referenced this issue Mar 25, 2024
…-116885)

diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
…-116885)
