gh-116738: Make _json module safe in the free-threading build #119438
base: main
You need to include the file that defines that macro.
Revert newlines
Co-authored-by: Nice Zombies <nineteendo19d0@gmail.com>
Looks good, maybe add a comment why we don't lock when using PyMapping_Items.
Should we also make the Python implementation thread safe?
On a side note, how would this be ported to a fork of the _json module?
@nineteendo Thanks for the questions. The result of PyMapping_Items (a list of tuples) can still be mutated from different threads (it is not a copy of the items), so it needs to be protected by a lock. I have added this to the PR. It could very well be that the Python implementation of the JSON encoder is already safe to use under free threading (in the sense that the interpreter does not crash). To my knowledge, most Python statements and builtins have already been made thread safe. If not, that should be addressed in a separate PR. I do not fully understand the question about porting. Is it not up to the person who forked _json to decide if and how to port any changes?
I was thinking about race conditions: we first check if the container is empty and only iterate later over the items. [
]
Well, I'm that person. See https://github.com/nineteendo/jsonyx. The main logic is mostly untouched. Can I use the public API for the critical sections?
@nineteendo The implementation of free-threading (e.g. PEP 703) is still a work in progress, so things may change. But currently the critical sections for
About the race condition: I think there are no guarantees for the result of the JSON encoder when the data to be encoded is mutated. So yes, race conditions can occur, and depending on how the data is mutated the JSON output may differ. But this is accepted behaviour. The goal of this PR is to prevent the interpreter from crashing.
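For callers that do need consistent output while data is shared between threads, one option is to serialize access on the application side. The following is a minimal sketch, assuming a hypothetical `SharedDocument` wrapper (not part of the `json` module or of this PR) that holds one lock both while mutating and while encoding:

```python
import json
import threading


class SharedDocument:
    """Hypothetical wrapper: one lock guards both mutation and encoding."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = []

    def append(self, value):
        with self._lock:
            self._items.append(value)

    def dumps(self):
        # While the lock is held, no other thread can mutate the list,
        # so the encoder sees a consistent state.
        with self._lock:
            return json.dumps(self._items)


def fill_and_encode(n_threads=4, per_thread=100):
    """Append from several threads, encoding after every append."""
    doc = SharedDocument()

    def worker(offset):
        for i in range(per_thread):
            doc.append(offset + i)
            doc.dumps()  # encode while other threads keep appending

    threads = [
        threading.Thread(target=worker, args=(t * per_thread,))
        for t in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return json.loads(doc.dumps())
```

This trades some throughput for well-defined output, which is the cost the PR deliberately avoids paying inside the encoder itself.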
Shouldn't the empty list always be on a single line? So, without indentation. You can test this by overwriting

    import io
    import json

    class BadDict(dict):
        def __len__(self) -> int:
            return 1

    class BadList(list):
        def __len__(self) -> int:
            return 1

    fp = io.StringIO()
    json.dump([BadDict(), BadList()], fp, indent=4)
    print(fp.getvalue())

[
{
},
]
]

Oh well, I managed to output invalid JSON. Some assumptions shouldn't be made.
@nineteendo Interesting example. Note that for recent Python versions (in particular the current main branch) the JSON output depends on whether
has output
This is something to do with the
The good thing is that while your example produces funny results, the interpreter does not crash (I checked the C code to make sure the bad list and bad dict are handled correctly).
It doesn't use the C encoder because it uses more memory than streaming to the file, but the Python implementation is 4x as slow. I thought about rewriting the C code to use streaming, but it will probably be slower as that would wrap _PyUnicodeWriter instead of using it directly. I opted to always use the C encoder.
You are right. Using the C encoder works (I tried it locally and it passes all the tests), but it would indeed use more memory. It would be nice to rewrite the C code so that it can work in streaming mode, but that is for another PR.
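To make the streaming versus one-shot distinction concrete, here is a small sketch using only the public `json` API. It relies on the behaviour of current CPython, where `json.dump` writes the chunks produced by `JSONEncoder.iterencode` one at a time while `json.dumps` builds the whole string first; the sample `data` is made up for illustration:

```python
import io
import json

# Made-up sample document for illustration.
data = {"name": "example", "values": [1, 2.5, None, True], "nested": {"k": []}}

# One-shot: the whole document is returned as a single string.
one_shot = json.dumps(data, indent=4)

# Streaming: json.dump writes the encoder's chunks to the file object.
fp = io.StringIO()
json.dump(data, fp, indent=4)
streamed = fp.getvalue()

# The same chunks can be inspected directly through iterencode().
chunks = list(json.JSONEncoder(indent=4).iterencode(data))
```

For well-behaved input both paths produce the same text; they differ in how much of the output is held in memory at once.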
I think we should create a separate issue. Do we fix the race condition or just subclasses (like float and int)?
I am a bit lost here. Which race condition do you mean?
In my opinion there is nothing to fix: when different threads are mutating the underlying data, we give no guarantees on the output. But we do guarantee that we will not crash the Python interpreter. The Python implementation will not crash (since all individual Python statements are safe). In this PR we modify the C implementation so that no crashes can occur. On the C side we want to make sure that if the underlying list is emptied we do not index into deallocated memory (this would crash the interpreter). (note: for the JSON encoder the C method that is unsafe for the list access is
There are some other PRs addressing safety under the free-threading builds and the feedback there was similar: address the crashes, but don't make guarantees on correct output (at the cost of performance). See
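The "no crash, but no guarantee on the output" behaviour of pure Python code can be illustrated without threads. The sketch below is a single-threaded stand-in (the function name `iterate_while_clearing` is hypothetical): the list is emptied in the middle of the loop, which plays the role of another thread clearing it during encoding.

```python
def iterate_while_clearing(n=5, clear_at=1):
    """Iterate over a list that gets emptied part-way through the loop."""
    data = list(range(n))
    seen = []
    for x in data:
        seen.append(x)
        if x == clear_at:
            # Stands in for "another thread emptied the list".
            data.clear()
    return seen


# The loop simply stops early: the list iterator re-checks the length
# on every step instead of indexing past the end.
print(iterate_while_clearing())  # [0, 1]
```

The result is incomplete, but nothing reads freed memory. The C encoder needs the critical section to get the same property, because it indexes into the list's storage directly.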
There's a precedent for guarding against a broken
(updated description)
Writing JSON files (or encoding to a string) is not thread-safe in the sense that when encoding data to JSON while another thread is mutating the data, the result is not well-defined (this is true for both the normal and the free-threading build). But the free-threading build can crash the interpreter while writing JSON because of the usage of methods like PySequence_Fast_GET_ITEM. In this PR we make the free-threading build safe by adding locks in three places in the JSON encoder.
Reading from a JSON file is safe: objects constructed are only known to the executing thread. Encoding data to JSON needs a bit more care: mutable Python objects such as a list or a dict could be modified by another thread during encoding.
Py_BEGIN_CRITICAL_SECTION_SEQUENCE_FAST to protect against mutation of the list
(PyDict_Next is used there). The non-exact dicts use PyMapping_Items to create a list of tuples. PyMapping_Items itself is assumed to be thread safe, but the resulting list is not a copy and can be mutated.
Test script
t=JsonThreadingTest(number_of_json_dumps=102, number_of_threads=8) is a factor 25 faster using free-threading. Nice!
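The test script itself is not reproduced on this page. As a rough idea of what such a harness can look like, here is a minimal sketch; `run_json_dumps` and its sample data are hypothetical and are not the PR's `JsonThreadingTest`. It runs `json.dumps` on shared, unmodified data from several threads and reports whether all results agree, plus the elapsed time:

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor


def run_json_dumps(number_of_json_dumps=100, number_of_threads=8):
    """Hypothetical harness; not the PR's JsonThreadingTest."""
    # Shared data that no thread mutates, so the output is well-defined.
    data = {"values": list(range(1000)), "nested": {"a": [1.5, None, "x"]}}
    expected = json.dumps(data)

    def worker(_):
        return json.dumps(data)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=number_of_threads) as pool:
        results = list(pool.map(worker, range(number_of_json_dumps)))
    elapsed = time.perf_counter() - start

    return all(r == expected for r in results), elapsed
```

Comparing `elapsed` between a regular and a free-threading build of the same Python version would give a speed-up figure of the kind quoted above; the sketch makes no claim about what that figure will be.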