Skip to content

gh-126072: Set docstring attribute for module and class #126231

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Nov 8, 2024

Conversation

xuantengh
Copy link
Contributor

@xuantengh xuantengh commented Oct 31, 2024

This is the ongoging work of #126101, where we introduced a new attribute ste_has_docstring in the symbol table entry. This PR aims to set this attribute for modules and classes whenever there are docstrings for them. Meanwhile, it also prevents None being added to co_consts for lambda, annotation and type alias.

@xuantengh xuantengh force-pushed the main branch 2 times, most recently from 1dc125e to 8622d78 Compare November 2, 2024 13:16
@xuantengh
Copy link
Contributor Author

Just kindly to ask any further updates needed? I'm not sure whether it's proper to re-request review in CPython PR workflow.

Copy link
Member

@iritkatriel iritkatriel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excuse formatting, reviewing on my phone.

Python/codegen.c Outdated
Py_ssize_t first_instr = 0;
PyObject *docstring = _PyAST_GetDocString(body);
assert(OPTIMIZATION_LEVEL(c) < 2 || docstring == NULL);
if (docstring) {
assert(ste->ste_has_docstring);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it still true that GetDocstring returns non-null only when ste_has_docstring is true?

Perhaps it should be
if(ste_has_docstring) assert(GetDocstring)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently I think they are equivalent, i.e., ste_has_docstring == 1 iff. there is a docstring during symbol table generation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And when you run python -OO?

Copy link
Contributor Author

@xuantengh xuantengh Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently when running with -OO, the first docstring is removed and the subsequent docstring is converted into JoinedStr in the AST stage. For _PyAST_GetDocString, it will not identify the converted JoinedStr as docstring, so ste_has_docstring will not be set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a test for docstring optimization under -OO.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently when running with -OO, the first docstring is removed and the subsequent docstring is converted into JoinedStr in the AST stage. For _PyAST_GetDocString, it will not identify the converted JoinedStr as docstring, so ste_has_docstring will not be set.

Right, but we want to remove that hack.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do it in this PR?

@xuantengh xuantengh force-pushed the main branch 4 times, most recently from d9c97a6 to 7db42a1 Compare November 4, 2024 11:56
@xuantengh
Copy link
Contributor Author

@iritkatriel Hi, I've encountered an issue when trying to remove f-string conversion in astfold_body. For the following function when opt=0 or 1:

def func():
    "not " + "a docstring"

astfold_body prevents the concatenated string (optimized as const expression statement somewhere during AST optimization I guess) being parsed as docstring by converting it into f-string. But if we remove the logic, the symbol table generation cannot distinguish "not a docstring" is originally a real docstring, or optimized concatenated string.

@iritkatriel
Copy link
Member

@iritkatriel Hi, I've encountered an issue when trying to remove f-string conversion in astfold_body. For the following function when opt=0 or 1:

def func():
    "not " + "a docstring"

astfold_body prevents the concatenated string (optimized as const expression statement somewhere during AST optimization I guess) being parsed as docstring by converting it into f-string. But if we remove the logic, the symbol table generation cannot distinguish "not a docstring" is originally a real docstring, or optimized concatenated string.

I thought the idea is that it uses ste_has_docstring to determine whether there is a docstring or not, rather than the type of the first expression.

@xuantengh
Copy link
Contributor Author

I thought the idea is that it uses ste_has_docstring to determine whether there is a docstring or not, rather than the type of the first expression.

Sorry I'm a little confused now. 😂

If we remove the f-string conversion in the astfold_body function, the _PyAST_GetDocString returns true and set ste_has_docstring incorrectly in symbol table generation stage.

cpython/Python/symtable.c

Lines 1846 to 1848 in 83ba8c2

if (_PyAST_GetDocString(s->v.FunctionDef.body)) {
new_ste->ste_has_docstring = 1;
}

The reason is _PyAST_GetDocString checks whether there is a docstring by inspecting the first expression type. If it's a Constant_kind, then it returns true. If we remove the f-string conversion in AST optimization stage, it cannot distinguish the constant expression is actually a docstring or concatenated string expression. In either cases, it returns true and ste_has_docstring will be set.

If there is something I misunderstood, please correct me, thanks!

@JelleZijlstra
Copy link
Member

I guess the problem is that we perform AST optimization before we build the symtable, so we have to keep doing the f-string conversion to avoid breaking the symtable. Maybe we can instead run the symtable before ast optimization, so we can get rid of the f-string trick in the AST optimizer.

@xuantengh
Copy link
Contributor Author

xuantengh commented Nov 6, 2024

I guess the problem is that we perform AST optimization before we build the symtable

Yeah that's the problem. That's say, we cannot convey the information from AST construction to symtable generation phase.

Maybe we can instead run the symtable before ast optimization, so we can get rid of the f-string trick in the AST optimizer.

Theoratically it's feasible, but I'm afraid that there are side-effects and the scope is beyond this PR.

@iritkatriel
Copy link
Member

My proposal (in #126072 (comment)) was to move the docstring removal to the symtable, so it's all in one place.

@xuantengh
Copy link
Contributor Author

xuantengh commented Nov 6, 2024

My proposal (in #126072 (comment)) was to move the docstring removal to the symtable, so it's all in one place.

But I think the problem is still not addressed, we've lost some information after the AST optimization (the concatenated string has been optimized as constant string). Even though the docstring removal is put together with symtable, it still fails to distinguish in the above case.

@xuantengh xuantengh force-pushed the main branch 2 times, most recently from 75bfe7b to 0f06e57 Compare November 7, 2024 05:30
@xuantengh xuantengh force-pushed the main branch 2 times, most recently from 2194f4f to 390caaa Compare November 8, 2024 06:40
@xuantengh
Copy link
Contributor Author

Revert to the version without swapping AST optimization and symtable generation. The docstring removal in astfold_body can be omitted in cunrrent implementation, but the f-string conversion should be retained. IMHO, removing the f-string conversion will introduce a lot of compromise changes, which even makes the code more complicated.

@iritkatriel
Copy link
Member

Specifically, can we disable the constant string concatenation optimization in AST when opt=1?

No, this would mean we optimise less when we should be optimising more.

I think we should leave symtable after ast_opt and try my suggestion to just move the docstring handling from ast_opt to symtable.

@xuantengh
Copy link
Contributor Author

I think we should leave symtable after ast_opt and try my suggestion to just move the docstring handling from ast_opt to symtable.

As mentioned above, what blocks us doing so is if we leave the docstring handling to symtable, for the following code:

def with_const_expression():
    "aa" * 4

The optimized AST will be:

Module(
  body=[
    FunctionDef(
      name='with_const_expression',
      args=arguments(),
      body=[
        Expr(
          value=Constant(value='aaaaaaaa'))])])

Then in the symtable stage, it cannot distinguish whether the constant string 'aaaaaaaa' is really a docstring, or the optimized concatenated string. So the compiler will incorrectly set ste_has_docstring.

@iritkatriel
Copy link
Member

Ok, I see. I think we should just forget about that for this PR. Can you revert all the changes related to removing the hack, and make this PR just about about module and class change?

@iritkatriel iritkatriel enabled auto-merge (squash) November 8, 2024 14:17
@iritkatriel iritkatriel merged commit 6ec8865 into python:main Nov 8, 2024
39 checks passed
picnixz pushed a commit to picnixz/cpython that referenced this pull request Dec 8, 2024
ebonnal pushed a commit to ebonnal/cpython that referenced this pull request Jan 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants