Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nogil violation of atomicity on set operations #126136

Open
luisggpina opened this issue Oct 29, 2024 · 3 comments
Open

nogil violation of atomicity on set operations #126136

luisggpina opened this issue Oct 29, 2024 · 3 comments
Labels
3.13 bugs and security fixes 3.14 new features, bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-free-threading type-bug An unexpected behavior, bug, or error

Comments

@luisggpina
Copy link

luisggpina commented Oct 29, 2024

Bug report

Bug description:

Hi,

We're a research group focused on testing concurrent runtimes. Our work-in-progress prototype found a violation of atomicity on the current nogil build when using concurrent operations on the same set. The program below shows the wrong behavior.

from threading import Thread,Barrier

def t1(b1,s):
    b1.wait()
    s.intersection_update({1,5})

def t2(b1,s):
    b1.wait()
    s.symmetric_difference_update({1, 8, 7, 5, 9})

def normalSetTest(i):

    s = {1,2,3,4,5,6}
    setCombos = [{7,8,9}, set()] # all possible results
    barrier = Barrier(2)
    threads = [ Thread(target= t1, args=(barrier,s,)), Thread(target= t2, args=(barrier,s,)) ]


    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if s not in setCombos:
        print("\tfound bug on iter " + str(i) + ": " + str(s))


print("test begin...")

for n in range(0,2000):
    threads = []

    if n % 100 == 0:
        print(n)

    for i in range(0,100):
        threads.append(Thread(target= normalSetTest, args=(n,)))

    for t in threads:
        t.start()

    for t in threads:
        t.join()

print("test done")

A set starts with {1,2,3,4,5,6} and two threads each make a concurrent operation on it: intersection_update({1,5}) and symmetric_difference_update({1, 8, 7, 5, 9}).

Depending on the order of execution, the possible results are either {7,8,9} or {} (the empty set).

Running the test harness above shows that the (incorrect) result {1, 5} is sometimes possible.

If you have trouble observing this behavior, feel free to increase the first range's last argument to a large integer (e.g., from 2000 to 10000).

We replicated this bug on 3 different machines with varying numbers of cores; and we were able to find it even using a single pair of threads (i.e., the second range's argument from 100 to 1). The current harness uses 100 pairs of threads operating on unrelated sets to speed up bug discovery.

We're happy to provide more details about this bug, and to help developers reproducing it.

Output of python -VV: Python 3.14.0a1+ experimental free-threading build (heads/main:faa3272fb8d, Oct 29 2024, 09:14:25) [GCC 14.2.1 20240805]

Example output:

test begin...
0
100
200
300
        found bug on iter 346: {1, 5}
400
500
600
700
800
900
1000
1100
1200
1300
1400
1500
1600
        found bug on iter 1678: {1, 5}
1700
1800
        found bug on iter 1864: {1, 5}
        found bug on iter 1882: {1, 5}
1900
test Done

@flypoodles and @overlorde are part of the team, adding them so they get notified about further discussion.

CPython versions tested on:

3.13, 3.14, CPython main branch

Operating systems tested on:

Linux

@luisggpina luisggpina added the type-bug An unexpected behavior, bug, or error label Oct 29, 2024
@picnixz picnixz added topic-free-threading interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Oct 29, 2024
@ZeroIntensity
Copy link
Member

Well, the free threading docs explicitly note that you shouldn't rely on the internal locks for thread safety--if it does work at all, it's considered an implementation detail, not a full feature. The internal locks are really only for thread-safe reference counting, they don't have any guarantees about what happens to the object when using it concurrently.

That said, maybe this is a bug with concurrency on set somehow, depending on what's going on. Does this still occur when using a proper threading.Lock?

@colesbury
Copy link
Contributor

I'm not sure how we want to deal with this yet, but I think these reports are useful regardless of whether the behavior constitutes a "bug" or not.

@ZeroIntensity ZeroIntensity added 3.13 bugs and security fixes 3.14 new features, bugs and security fixes labels Oct 29, 2024
@luisggpina
Copy link
Author

Thank you for your insight into this behavior.

We're very excited about the work on removing the GIL and want to help in finding weird combinations of operations that lead to unexpected results. Such combinations are very hard to find by manual inspection of the code, but we have a tool (under development) that can find some of them and even generate nice testing harnesses like the one I shared.

We're guided by the text in the original PEP, in particular this passage:

This PEP proposes using per-object locks to provide many of the same protections that the GIL provides. For example, every list, dictionary, and set will have an associated lightweight lock. All operations that modify the object must hold the object's lock.

The text suggests that the methods in our report should hold a per-object lock (on the set itself) for their whole duration such that they execute atomically as far as the set being destructively mutated is concerned.

Of course, the PEP states that the locks offer less protection than the GIL, but still retaining mutual exclusion for methods operating on the same object:

Instead, per-object locking aims for similar protections as the GIL, but with mutual exclusion limited to individual objects.

So perhaps weird results due to the argument being passed to each method being concurrently mutated are allowed. We are not looking for those, only for how the set itself behaves.

Anyway, I understand that @colesbury is the author of the PEP and it seems that wether this behavior should be allowable is still being considered. Please don't read my post as being pedantic on the verbiage used in the PEP. Apologies if I'm not understanding the PEP correctly, even after reading it carefully a few times; but it seems to me that this behavior should not be allowed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes 3.14 new features, bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-free-threading type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

4 participants