Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid creating incomplete/invalid objects #36

Open
encukou opened this issue Oct 24, 2023 · 10 comments
Open

Avoid creating incomplete/invalid objects #36

encukou opened this issue Oct 24, 2023 · 10 comments
Labels
guideline To be included in guidelines PEP

Comments

@encukou
Copy link
Contributor

encukou commented Oct 24, 2023

From capi-workgroup/problems#56
Related to #20 (Disallow mutating immutable objects)

New API should not allow creation of incomplete/invalid objects.

In particular, a traverse function must be safe to call right after an object is tracked with the GC.

@scoder
Copy link

scoder commented Oct 24, 2023

New API should not allow creation of incomplete/invalid objects.
In particular, a traverse function must be safe to call right after an object is tracked with the GC.

These are actually two independent requirements. It's easy to construct an API for creating container objects incrementally that keeps the object out of GC tracking while it's being built. This is independent of the question whether the object has been created in one go or incrementally before it is added to the GC tracking.

@encukou
Copy link
Contributor Author

encukou commented Oct 26, 2023

Yes. This guideline should disallow both:

  • a container that's usable, but requires the user to add it to GC (in a way that the user can easily forget)
  • a container with an unsafe traverse (a particularly tricky class of invalid objects)

That could be solved with one-shot creation API, and/or a "builder" approach that makes the user do all the steps, in the proper order. (And if those approaches can't be fast enough, it's time for an exception -- see #4.)

@vstinner
Copy link
Contributor

vstinner commented Oct 30, 2023

In Python 3.13 alpha1, I modified PyList_SET_ITEM() to restrict index to [0; Py_SIZE(list)[ range. It caused issues in Cython, and so it was changed to allow index in [0; list->allocated[ instead: python/cpython#111480


I dislike PyList_New() + PyList_SET_ITEM() API since the list is immediately tracked by the GC and so calling gc.get_objects() can expose an invalid list object (ex: calling repr(list) can crash).

With python/cpython#111480 change, PyList_SetItem() and PyList_SET_ITEM() become inconsistent: PyList_SetItem() uses list->ob_size, whereas PyList_SET_ITEM() uses list->allocated.

PyList_SET_ITEM() is a strange beast: it doesn't DECREF the old value, it must only be used to initialize a list newly allocated by PyList_New(). But PyList_New(size) sets list->ob_size to size.

PyList_SetItem() is more regular: it calls DECREF on the old item.

Well, python/cpython#111480 was not about designing a correct API, but to migrate Python 3.12 code to Python 3.13. Previously, PyList_SET_ITEM() could write safely in [0; allocated[ range, and in Python 3.13 it's limited to [0; ob_size[.

@vstinner
Copy link
Contributor

For list, we need an API to transfer objects from a C array to a list without the expensive INCREF/DECREF dance: make PyObject * _PyList_FromArraySteal(PyObject *const *src, Py_ssize_t n) public, but rename it to avoid "Steal" misleading name. In fact, ownership is "moved" or "transfered", not "stolen". cc @erlend-aasland

@vstinner
Copy link
Contributor

For list, we need an API to transfer objects from a C array to a list without the expensive INCREF/DECREF dance: make PyObject * _PyList_FromArraySteal(PyObject *const *src, Py_ssize_t n) public, ...

I created an issue for that: python/cpython#111489

@vstinner
Copy link
Contributor

vstinner commented Jul 4, 2024

I added PyUnicodeWriter API to solve this problem for Unicode strings: https://docs.python.org/dev/c-api/unicode.html#pyunicodewriter It provides a convenient API to build a string. The object is only exposed to Python once the caller is done and calls the PyUnicodeWriter_Finish() method.

I'm working on a similar API to create a Python int object from an array of digits, PyLongWriter: PR python/cpython#121339.

@scoder
Copy link

scoder commented Jul 5, 2024 via email

@vstinner
Copy link
Contributor

vstinner commented Jul 5, 2024

Why does it need a PyUnicodeWriter_Discard() and not just the normal decref?

PyUnicodeWriter allocates a structure on the heap memory. It's not a Python object.

@gvanrossum
Copy link
Contributor

PyUnicodeWriter allocates a structure on the heap memory. It's not a Python object.

This is unfortunate (creating a string this way now requires two heap allocations) but if we were to use a stack-allocated writer object instead, it's hard to imagine how this API could ever be promoted to the stable ABI, and it would still require a separate API call (to reveal the created object in its final form). So I've made peace with this design.

@vstinner
Copy link
Contributor

I created python/cpython#121710: [C API] Add PyBytesWriter API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
guideline To be included in guidelines PEP
Projects
None yet
Development

No branches or pull requests

4 participants