-
-
Notifications
You must be signed in to change notification settings - Fork 30.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-89188: add PyUnicode_Data
and PyUnicode_GetKind
#115211
base: main
Are you sure you want to change the base?
Conversation
Sorry if I wasn't clear, and sorry to spoil the fun, but I wasn't talking about only the stable ABI, but about the API in general.
But what you're exposing here are implementation details. We don't want to expose any more of them than we already historically have. A good API for this will need more design. Before that, PyO3 can keep doing what it's doing now -- it's fragile, but that's because we want to change the details in the future, not set them in stone now. Let's talk privately about how to best do this. If you want to put in more of your time, I'll be happy to mentor or pair-program. In case other core devs don't agree with my opinion:
|
No fun spoiled at all! Regardless of the acceptance of this patch, it was nice to do this on stream so that others could see some of the steps to contributing a patch to CPython (with a bit of PyO3-based testing thrown in), and also so that folks could correct me where my C knowledge fell short. 😄 I had understood you that these are definitely not candidates for inclusion in the stable ABI, and that there are good reasons to not want these APIs as they are at all.
My main worry is that it's fragile to the point of bordering on unsound, for reasons related to the bitfield discussed in #89188. The nuclear option is to completely deprecate this API on the PyO3 side and refuse to expose it to users, though it would be nice to find a more gentle middle ground, especially as that doesn't actually prevent users from trying this themselves, which might be worse. For what it's worth, I did a little bit of digging to see what happens downstream of PyO3 in other Rust code related to these unicode objects. Here's a couple of use cases:
👍 let's aim to catch up soon and circle back here. |
My main issue is that this API encourages users to rely on the detail that they can “borrow” data in one of the formats in the current
The elements of Python strings are any numbers in
(And yes, the docs aren't clear on what's supposed to be valid Unicode/UTF-8 and what's not. Generally, though, we don't check.)
I consider writing the data an entirely separate use case from reading it. It should have its own API. |
This is inspired by #102589 (comment)
The idea is to add
PyUnicode_Data
andPyUnicode_GetKind
to the version-specific API as exported symbols, so that we can use these downstream in PyO3 instead of having a dependency on the C bitfield.We built this on stream today with help of my chat; it was suggested by @mejrs that
PyUnicode_GetData
would be a better name for consistency.I understood that @encukou didn't want these added to the stable ABI due to concerns about exposing PyUnicode encoding and also risk of dangling pointers. That's fine by me, the main aim here is just to get PyO3 off implementation details.
I guess we'll need tests and documentation too; I can add those if it looks like this API is acceptable to add.