
gh-118926: Deferred reference counting GC changes for free threading #121318

Closed
wants to merge 28 commits

Conversation

Fidget-Spinner
Member

@Fidget-Spinner Fidget-Spinner commented Jul 3, 2024

This PR mainly introduces changes to the free-threading GC to support deferred reference counting in the future.

To make this work, new stack references must be written to the stack immediately, with no intervening Py_DECREF or escaping code between creating them and putting them on the stack. This ensures they are visible to the GC.

This also NULLs out the rest of the stack, because the GC scans the entire stack: the stack pointer may be inconsistent across escaping calls (including those to Py_DECREF), so the whole stack has to be scanned.
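As a rough illustration of that invariant (a hypothetical Python sketch, not CPython's actual C code): because the stack pointer may be stale during escaping calls, the collector scans every slot of the frame's stack, so every unused slot must be NULL for the scan to be safe.

```python
def scan_frame_stack(stack_slots):
    """Visit every live reference on a frame's stack.

    The stack pointer may be inconsistent during escaping calls, so the
    whole stack is scanned; unused slots must be NULL (None here) so the
    scan can skip them safely.
    """
    visible = []
    for ref in stack_slots:
        if ref is not None:  # NULLed-out slots are skipped safely
            visible.append(ref)
    return visible
```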

This PR removes the temporary immortalization introduced previously in #117783. I wanted to do this in a separate PR, but the only way to test this properly is to remove that hack. So it has to be bundled in this PR.

Finally, this PR fixes a few bugs in steals and borrows. This was only caught by the GC changes, not by the debugger that I was working on. Since these are untestable without the GC changes, I bundled them in.

Temporary perf regressions (6 threads):

object_cfunction PASSED: 3.6x faster
cmodule_function FAILED: 2.1x faster (expected: LOAD_ATTR from dict)
generator PASSED: 3.7x faster
pymethod FAILED: 2.5x faster (expected: LOAD_ATTR from pytype_lookup)
pyfunction FAILED: 2.9x faster (expected: LOAD_GLOBAL)
module_function FAILED: 2.9x faster (expected: LOAD_ATTR from module dict)
load_string_const PASSED: 4.0x faster
load_tuple_const PASSED: 3.8x faster
create_closure FAILED: 2.3x faster (unsure why)
create_pyobject FAILED: 1.5x faster (expected: LOAD_GLOBAL)

@Fidget-Spinner
Member Author

!buildbot nogil

@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @Fidget-Spinner for commit 8cb139f 🤖

The command will test the builders whose names match the following regular expression: nogil

The builders matched are:

  • AMD64 Ubuntu NoGIL PR
  • aarch64 Fedora Rawhide NoGIL PR
  • x86-64 MacOS Intel ASAN NoGIL PR
  • AMD64 Ubuntu NoGIL Refleaks PR
  • AMD64 Fedora Rawhide NoGIL refleaks PR
  • aarch64 Fedora Rawhide NoGIL refleaks PR
  • PPC64LE Fedora Rawhide NoGIL refleaks PR
  • AMD64 Windows Server 2022 NoGIL PR
  • ARM64 MacOS M1 Refleaks NoGIL PR
  • PPC64LE Fedora Rawhide NoGIL PR
  • x86-64 MacOS Intel NoGIL PR
  • ARM64 MacOS M1 NoGIL PR
  • AMD64 Fedora Rawhide NoGIL PR

@Fidget-Spinner
Member Author

!buildbot nogil

@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @Fidget-Spinner for commit 700c2fd 🤖

The command will test the builders whose names match the following regular expression: nogil

The builders matched are:

  • AMD64 Ubuntu NoGIL PR
  • aarch64 Fedora Rawhide NoGIL PR
  • x86-64 MacOS Intel ASAN NoGIL PR
  • AMD64 Ubuntu NoGIL Refleaks PR
  • AMD64 Fedora Rawhide NoGIL refleaks PR
  • aarch64 Fedora Rawhide NoGIL refleaks PR
  • PPC64LE Fedora Rawhide NoGIL refleaks PR
  • AMD64 Windows Server 2022 NoGIL PR
  • ARM64 MacOS M1 Refleaks NoGIL PR
  • PPC64LE Fedora Rawhide NoGIL PR
  • x86-64 MacOS Intel NoGIL PR
  • ARM64 MacOS M1 NoGIL PR
  • AMD64 Fedora Rawhide NoGIL PR

@Fidget-Spinner
Member Author

!buildbot nogil

@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @Fidget-Spinner for commit cde931d 🤖

The command will test the builders whose names match the following regular expression: nogil

The builders matched are:

  • AMD64 Ubuntu NoGIL PR
  • aarch64 Fedora Rawhide NoGIL PR
  • x86-64 MacOS Intel ASAN NoGIL PR
  • AMD64 Ubuntu NoGIL Refleaks PR
  • AMD64 Fedora Rawhide NoGIL refleaks PR
  • aarch64 Fedora Rawhide NoGIL refleaks PR
  • PPC64LE Fedora Rawhide NoGIL refleaks PR
  • AMD64 Windows Server 2022 NoGIL PR
  • ARM64 MacOS M1 Refleaks NoGIL PR
  • PPC64LE Fedora Rawhide NoGIL PR
  • x86-64 MacOS Intel NoGIL PR
  • ARM64 MacOS M1 NoGIL PR
  • AMD64 Fedora Rawhide NoGIL PR

@markshannon
Member

markshannon commented Jul 15, 2024

One possibility to keep things moving for the free-threading build is to generate a different generated_cases.h for the free-threading interpreter.
For that case, you could write all values to the stack before any decrefs or escaping calls.

@Fidget-Spinner
Member Author

It might be quite a lot of work, but I really think we should mark any escaping calls, so that the code generator knows what escapes. DECREF_INPUTS is already handled by the code generator, so no changes to the code should be necessary in bytecodes.c regarding reference count operations.

Ok I can port your old PR over to this one. I presume we're flushing the values to the stack before every escaping call?

@colesbury
Contributor

@markshannon, writing all values to the stack is going to introduce an unnecessary performance regression.

This is holding up the free-threaded work for what seems like small aesthetic complaints. If the concern is about maintainer confusion, we can add comments or otherwise document the dozen or so places where this is used.

@markshannon
Member

This is not just aesthetics.
Maintainability is important. By degrading maintainability, you are making more work for others. Particularly for my team.

You say that "writing all values to the stack will be slow". That is only true for the parts of the stack that are not already in memory.
Which is why it is important to leave the choice of which parts of the stack to spill to memory and which to keep in registers up to the code generator, as it will differ for different tiers and different platforms.

the dozen or so places where this is used.

There are a hundred or more places where execution can escape from the interpreter, not counting DECREFs which add hundreds more. It seems unlikely that this PR correctly identifies all cases where a garbage collection can occur and correctly spills all the necessary values to the stack memory.

Even if it is correct, it still lays traps for the unwary.
For example, _BINARY_OP contains the code:

    DECREF_INPUTS();
    ERROR_IF(res_o == NULL, error);
    res = PyStackRef_FromPyObjectSteal(res_o);

If this were changed to

    res = PyStackRef_FromPyObjectSteal(res_o);
    DECREF_INPUTS();
    ERROR_IF(PyStackRef_IsNull(res), error);

then it would be unsafe, yet there would be no warning or error from any tools.

@colesbury
Contributor

The rewritten _BINARY_OP example is safe because it uses PyStackRef_FromPyObjectSteal. The concern is with calls to
PyStackRef_FromPyObjectNew.

There are a hundred or more places where execution can escape from the interpreter

We should not be thinking about this in terms of where calls escape from the interpreter. We should be thinking about this in terms of where PyStackRef_FromPyObjectNew() calls occur, and ensure that those are always written to the stack.

If you want to refactor PyStackRef_FromPyObjectNew so that it takes a pointer or similar that's fine.

@Fidget-Spinner
Member Author

Ok I can port your old PR over to this one. I presume we're flushing the values to the stack before every escaping call?

Sorry I forgot that this may introduce another perf regression on the free-threaded build, even if I limit it just to that. So I'm putting a pause on this. I think we should discuss this on Wednesday.

@markshannon
Member

In general, stackrefs must be spilled to the in-memory stack around any escaping call.
To do this automatically, we need to:

  1. Identify all escaping calls
  2. Track assignments to output values
  3. Raise an error if an escaping call occurs unless all or none of the output values have been assigned
  4. Generate the spill around the call.
    • If none of the output values have been defined, we just spill the stack pointer after popping the inputs.
    • If all of the output values have been defined, we save the outputs to memory and save the stack pointer before making the escaping call.

Identifying all escaping calls.

We have a whitelist; all other calls are escaping. The list needs updating, but it's mostly right.

Track assignments to output values

Assignments are easy to spot: name = .... Then all we need to do is check whether name is an output variable. Any assignments in nested code, or multiple assignments, should be treated as an error.

It is easy enough to change the code to move assignments out of branches, and we don't want the code generator to have to do flow analysis.

Generating spills around calls.

Once we have identified the escaping call, we will need to walk backwards and forwards around the call to identify the whole statement.
Once we've done that, we need to emit the spill before the call and the reload after it (if any reload is needed).
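The steps above could be sketched roughly as follows. This is a hypothetical Python sketch: the names, the statement model (a pair of assigned-variable-or-None and the list of called functions), and the whitelist contents are illustrative, not the actual cases-generator code.

```python
# Illustrative whitelist; step 1 treats every call not on it as escaping.
NON_ESCAPING = {"PyStackRef_FromPyObjectSteal", "Py_INCREF"}

def is_escaping(call: str) -> bool:
    return call not in NON_ESCAPING

def check_uop(statements, output_vars):
    """Steps 2 and 3: track assignments to output values, and require that
    at each escaping call either all or none of the outputs are assigned."""
    assigned: set[str] = set()
    for target, calls in statements:
        for call in calls:  # calls happen before the assignment completes
            if is_escaping(call) and assigned and assigned != set(output_vars):
                raise SyntaxError(
                    f"escaping call {call!r} with partially assigned outputs")
        if target in output_vars:
            if target in assigned:
                raise SyntaxError(f"multiple assignments to {target!r}")
            assigned.add(target)
        # Step 4 (not modelled here) would emit the spill around the call:
        # pop the inputs and save the stack pointer if no outputs are
        # assigned yet, or write all outputs to memory first if they all are.
    return assigned
```

For example, an escaping call before any output is assigned passes, while assigning one of two outputs and then making an escaping call raises an error.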

@markshannon
Member

In case that sounds too expensive, don't worry. It shouldn't be.

Spilling the stack pointer is cheap as registers often need to be spilled across calls anyway.
We need to save the output values to memory in most cases anyway, we are just doing it a little earlier.

Although escaping calls are common in bytecodes.c, they are not so common dynamically. Dynamically, only about 10% of instructions include an escaping call, and for many of those the additional cost is negligible for the reasons given above.

@colesbury
Contributor

colesbury commented Jul 19, 2024

@markshannon, that is not what we discussed and agreed to yesterday (Wednesday). What we discussed was tracking PyStackRef_FromPyObjectNew calls in the code generator and spilling those immediately.

  • We do not need or want to track stackrefs in general for the GC, just PyStackRef_FromPyObjectNew
  • There's no point in looking for escaping calls and waiting to write the result of PyStackRef_FromPyObjectNew in multiple places.

@markshannon
Member

Tracking just PyStackRef_FromPyObjectNew will solve your immediate problem, but it doesn't help with a broader deferred reference counting implementation nor with top-of-stack caching.

I think what I proposed handles all the use cases, without a significant performance impact.
Even if it turns out the performance impact is too high, the additional analysis will help make any solution more robust.

There's no point in looking for escaping calls and waiting to write the result of PyStackRef_FromPyObjectNew in multiple places.

Why would anything get written multiple times?

@Fidget-Spinner
Member Author

I think the original goal was to not block TOS caching and full deferred refcounting. With the latest changes, this is now true. The added goal of laying the foundations for those two should be left to another PR. The responsibility of this PR, IMO, is to not burden stack caching and full deferred refcounting (which it should have achieved with the latest commit). Supporting more features is out of scope.

@colesbury
Contributor

@markshannon - I'm most concerned because I thought we were on the same page on this approach after our last meeting.

I don't think starting with this approach precludes future changes, like what you've outlined above. This PR is a bottleneck for a lot of the free-threading deferred reference counting work, so I'd appreciate it if we can figure out how to unblock it.

@markshannon markshannon left a comment

Having the spilling controlled by the code generator is a major improvement.
Thanks for doing that.

I have a few comments

@@ -1428,6 +1436,8 @@
PyObject **items = _PyTuple_ITEMS(seq_o);
for (int i = oparg; --i >= 0; ) {
*values++ = PyStackRef_FromPyObjectNew(items[i]);
#ifdef Py_GIL_DISABLED /* flush specials */
Member

Don't emit anything if there are no values to flush.

emit_to(out, tkn_iter, "SEMI")
out.emit(";\n")
target = uop.body.index(tkn)
assert uop.body[target-1].kind == "EQUALS", (f"{uop.name} Result of a specials that is flushed"
Member

This is going to crash as soon as anyone writes anything but an assignment in front of PyStackRef_FromPyObjectNew.

We might want to insist that the result of PyStackRef_FromPyObjectNew is assigned to a variable, maybe only an output variable, but that should be an error not a crash.

Contributor

I've updated the error handling. For the most part, PyStackRef_FromPyObjectNew is now required to be assigned to an output variable, but we also allow assignment to a dereferenced pointer due to usages like:

  • frame->localsplus[offset + i] = PyStackRef_FromPyObjectNew(o); (in COPY_FREE_VARS)
  • *values++ = PyStackRef_FromPyObjectNew(items[i]) (in UNPACK_SEQUENCE_LIST)
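A minimal sketch of that assignment-target check might look like the following. This is hypothetical: the real check lives in the cases generator and works on its token stream rather than on raw text with a regex.

```python
import re

# Hypothetical rule: the result of PyStackRef_FromPyObjectNew must be
# assigned either to an output variable or through a dereferenced pointer
# (e.g. "*values++ = ..." or "frame->localsplus[i] = ...").
ASSIGN_RE = re.compile(r"(.+?)\s*=\s*PyStackRef_FromPyObjectNew\(")

def valid_new_target(line: str, output_vars: set[str]) -> bool:
    m = ASSIGN_RE.search(line)
    if m is None:
        return False  # result is discarded: not an assignment at all
    target = m.group(1).strip()
    if target in output_vars:
        return True   # plain assignment to an output variable
    # dereferenced-pointer targets are also allowed
    return target.startswith("*") or "->" in target or "[" in target
```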

#ifdef Py_GIL_DISABLED /* flush specials */
stack_pointer[-2 - oparg] = method;
#endif /* flush specials */
stack_pointer[-2 - oparg] = method;
Member

Why is the same value being saved twice?

Contributor

I updated the generator to avoid saving the same value twice. I got rid of the #ifdef Py_GIL_DISABLED guards because I think they would add too much complexity for no real gain.

@@ -177,6 +177,35 @@ def replace_check_eval_breaker(
if not uop.properties.ends_with_eval_breaker:
out.emit_at("CHECK_EVAL_BREAKER();", tkn)

def replace_pystackref_frompyobjectnew(
Member

I don't think that locally replacing PyStackRef_FromPyObjectNew is sufficient in general. It ensures that the references are in memory, but it doesn't ensure that the stack_pointer is saved, so the references might not be visible.

IIRC this is safe for the current FT implementation because you NULL out the unused part of the stack.
Could you add a comment explaining why this is safe for FT even though it doesn't appear to be.

Contributor

I added a comment

target = uop.body.index(tkn)
assert uop.body[target-1].kind == "EQUALS", (f"{uop.name} Result of a specials that is flushed"
" must be written to a stack variable directly")
# Scan to look for the first thing it's assigned to
Member

FYI, this sort of analysis should ideally be performed in the analysis phase, not in the code generation phase.
Don't worry about it for now.

Contributor

I've moved this to analyzer.py since it required a number of changes anyways.

found_valid_assignment = assgn_target.text
break

if found_valid_assignment:
Member

What happens if we haven't found a valid assignment?
That sounds like an error.

Contributor

This now raises an analysis_error.

@@ -213,6 +213,24 @@ def flush(self, out: CWriter, cast_type: str = "uintptr_t", extract_bits: bool =
self.peek_offset.clear()
out.start_line()

def write_variable_to_stack(self, out: CWriter, var_name: str, cast_type: str = "uintptr_t", extract_bits: bool = False) -> None:
Member

It seems odd to be writing individual values to the stack. If any value on the stack needs to be in memory, don't they all?

Would it make more sense to use the flush method?

Contributor

I'm looking at this since Ken is on vacation.

I think that the logic in this PR (after resolving merge conflicts with main) no longer works after #122286.

Before #122286, the tier 1 generator had the outputs on the logical stack when emit_tokens() was called. Now, the inputs are still on the logical stack when emit_tokens() is called, and the outputs are only pushed after emit_tokens().

I think flush() has the same issue: the logical stack still contains the inputs, not the outputs.

@bedevere-app

bedevere-app bot commented Aug 2, 2024

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@colesbury
Contributor

@markshannon, I've split off the changes to the interpreter generator into a separate PR:

I think that will make it easier to see the changes to the generated code. (And the merge conflicts were getting unwieldy.)
