C API: Add PyObject_AsObjectArray() function: get tuple/list items as PyObject** array #106593
Comments
Should we prevent using Should we copy tuple items if a "write only" (or "read+write") view is requested? It's fine to give a direct access if a "read only" view is requested. |
These APIs were left unchanged by the rejected PEP 674 – Disallow using macros as l-values, which states: "PyTuple_GET_ITEM() and PyList_GET_ITEM() are left unchanged." |
In 2020, I created issue #85250: [C API] Convert PyTuple_GET_ITEM() macro to a static inline function. See related Cython issue: cython/cython#3701 (still open) |
In June 2020, I created the "PEP 620: C API for efficient loop iterating on a sequence of PyObject** or other C types" discussion about such a hypothetical API. |
* Add PyUnicode_AsUTF8Resource()
* Add PyBytes_AsStringResource()
* Add PySequence_AsObjectArray()
* Add Include/pyresource.h
* Add PyResource_Release() to the stable ABI
* compute_abstract_methods(): Replace PySequence_Fast() with PySequence_AsObjectArray()
Proof-of-Concept PR to show how this API can be implemented and used: PR #106596. |
A more robust alternative would be an API for iterating by chunks, discussed in https://discuss.python.org/t/15993/20 and further. |
I think that ideally the API should consider possible future optimizations, like a compact representation of a list of integers, i.e., we'd be "inflating" the individual values to To be more concrete re the "chunks" approach. It can be something along these lines:
To iterate the sequence, you'd write two nested loops. The outer calling the API and the inner iterating the items in the buffer. There can be an inline helper:
However:
R has a similar API for its "alternative" vectors: https://github.com/wch/r-source/blob/688f12cc9627c38ae10c4f597010da3f7142a487/src/include/R_ext/Itermacros.h#L242 They first query the vector for "one big array", which is an optional operation, and if that's not supported, they iterate using a fixed-size small-ish buffer. All of this is wrapped in a convenience C macro. |
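The chunked iteration discussed in the comments above can be sketched in plain C. This is only an illustration under stated assumptions: `CompactSeq`, `compact_seq_get_chunk()` and `compact_seq_sum()` are hypothetical names that appear nowhere in CPython, and the items are raw `long` values rather than `PyObject*`, so the sketch compiles without `Python.h`. It shows the two nested loops: the outer one asks the API to fill a small fixed-size buffer, the inner one walks that buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compact sequence: stores raw C longs, not boxed objects. */
typedef struct {
    const long *values;
    size_t size;
} CompactSeq;

/* Hypothetical chunk API: copy up to `cap` items starting at `start` into
   `buf` and return how many were copied (0 once the end is reached).
   The size is re-read on every call, so a sequence that became shorter
   between two calls just yields fewer items. */
static size_t
compact_seq_get_chunk(const CompactSeq *seq, size_t start,
                      long *buf, size_t cap)
{
    assert(seq != NULL && buf != NULL);
    if (start >= seq->size) {
        return 0;
    }
    size_t n = seq->size - start;
    if (n > cap) {
        n = cap;
    }
    for (size_t i = 0; i < n; i++) {
        buf[i] = seq->values[start + i];
    }
    return n;
}

/* Consumer: the outer loop fetches chunks, the inner loop walks the buffer. */
static long
compact_seq_sum(const CompactSeq *seq)
{
    long buf[4];    /* small fixed-size buffer, no heap allocation */
    long total = 0;
    size_t start = 0;
    size_t n;
    while ((n = compact_seq_get_chunk(seq, start, buf, 4)) > 0) {
        for (size_t i = 0; i < n; i++) {
            total += buf[i];
        }
        start += n;
    }
    return total;
}
```

With this shape there is one call per chunk instead of one call per item, and no temporary array the size of the whole sequence is allocated.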
Yeah, this should probably be a PEP. Adding a function for a subset of use cases will only make the C-API bigger, as we add new ones for the remaining cases. |
I'm pretty sure this has been discussed before, possibly on discuss.python.org or one of the mailing lists. An alternative interface is to provide an interface to iterate in chunks, similar to fast enumeration in Objective-C/Cocoa. That way buffer allocation can be avoided, although the semantics will be different in a way that can affect code that currently uses |
It would be nice to have a generic protocol to access an array as a C array for a specific item type. Examples:
We can imagine that it would be possible to query which formats are supported by an object, and the code could switch to the most efficient one. See But this hypothetical "protocol" idea is way wider than the scope of this issue. Here I propose to focus on fixing the API for PyTuple and PyList by making it more generic, and maybe consider later adding the ability for a C extension to provide an "object array" view for a type. There are two problems:
|
What are the optimizations this will enable? Are they being planned somewhere? |
For me, the motivation to use
Yes, the proposed PySequence_AsObjectArray() is inefficient on large sequences which don't store their items as If you know that you only need a few items, there are existing nice APIs for that, no additional API is needed:
These APIs don't have to convert all items to This issue is only a small step to fix the problematic APIs. I don't pretend or want to address the "big general case" of "iterating on items of a specific type". |
Changing PyTuple and/or PyList to have a different memory layout. For example, you can imagine creating a PyList and converting to a PyTuple without having to copy memory: the PyTuple will just delegate get/set operations to an internal PyList (stored unchanged). An interesting optimization, implemented by PyPy, is to specialize a tuple/list to integers: store them as numbers, not as PyLong objects which store numbers (avoid boxing). The fact that If tomorrow, PyTuple/PyList content changes, |
I elaborated it there: https://pythoncapi.readthedocs.io/optimization_ideas.html#specialized-list-for-small-integers |
Currently, PySequence_Fast() creates a copy of the sequence to create a I think that PySequence_AsObjectArray() should only be used for read-only access. Using it to modify tuple items or list items should be avoided (or denied?). If we want, we can detect modification in debug mode by creating a second view and comparing them in PyResource_Release(). For me, PySequence_SetItem() is the reference API to set an item. If we want to provide a "write" view, we can modify the sequence in PyResource_Release(). Pseudo-code of a release function:
This code can be slow if the sequence is long, but it works: it uses regular existing APIs. |
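A toy model of the write-back idea from the comment above, as a sketch only. It makes no claim to match the pseudo-code originally posted. `ToySeq`, `ToyView`, `toy_view_open()` and `toy_view_release()` are hypothetical names, items are plain `int` values instead of `PyObject*`, and `toy_seq_set_item()` stands in for the role PySequence_SetItem() plays in the comment. The view hands out a private copy, keeps an untouched snapshot, and on release writes back only the items that differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy sequence whose items may only be changed through a setter. */
typedef struct {
    int items[8];
    size_t size;
    int set_calls;   /* counts setter calls, for illustration only */
} ToySeq;

static void
toy_seq_set_item(ToySeq *seq, size_t i, int value)
{
    assert(i < seq->size);
    seq->items[i] = value;
    seq->set_calls++;
}

/* Hypothetical read+write view: a private copy the caller may modify. */
typedef struct {
    ToySeq *seq;
    int *array;      /* handed to the caller */
    int *snapshot;   /* untouched copy, used to detect changes */
    size_t size;
} ToyView;

static int
toy_view_open(ToyView *view, ToySeq *seq)
{
    assert(seq->size <= 8);
    size_t nbytes = seq->size * sizeof(int);
    view->seq = seq;
    view->size = seq->size;
    view->array = malloc(nbytes ? nbytes : 1);
    view->snapshot = malloc(nbytes ? nbytes : 1);
    if (view->array == NULL || view->snapshot == NULL) {
        free(view->array);
        free(view->snapshot);
        return -1;
    }
    memcpy(view->array, seq->items, nbytes);
    memcpy(view->snapshot, seq->items, nbytes);
    return 0;
}

/* Release: write changed items back through the setter, then free.
   The bound also checks the current sequence size in case it shrank. */
static void
toy_view_release(ToyView *view)
{
    for (size_t i = 0; i < view->size && i < view->seq->size; i++) {
        if (view->array[i] != view->snapshot[i]) {
            toy_seq_set_item(view->seq, i, view->array[i]);
        }
    }
    free(view->array);
    free(view->snapshot);
    view->array = view->snapshot = NULL;
}
```

The same snapshot comparison could serve the debug-mode check mentioned above: instead of writing back, a read-only view would report an error when the two copies differ.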
Are these changes currently planned? If not, IMO there's time to add API that considers all the known use cases -- zero-copy, unboxing, chunking, filling a pre-allocated buffer.... If the old API is not removed soon enough, There's no need to rush this and only solve one of the issues. |
HPy "protocol" sketch about this problem:
    HPySequence x = HPy_AsSequence(obj); /* it can raise an exception if it's not iterable */
    int len = HPy_Sequence_Len(x, obj);
    for(int i=0; i<len; i++) {
        /* HPy_Sequence_GetItem will check a flag on x to see if it can use a
           fast-path of direct indexing or it needs to go through a generic
           fallback. And the C compiler will hoist the check out of the loop,
           hopefully */
        HPy item = HPy_Sequence_GetItem(x, obj, i); /* PyList_GET_ITEM */
    }
    HPySequenceClose(x, obj);
This sketch uses functions to get the sequence length and to get an item. Internally, |
Another problem is that it makes the assumption that the object cannot move in memory, which is not true in a Python implementation using a moving GC. That's why a "release" function is needed. |
Sure, if you know you only need a small constant number of items, or if you need random access. But note that an important requirement for this API, in my opinion, and as also noted in https://discuss.python.org/t/15993, is performance, given that we are talking about ABI calls and not macros used in a loop. Calling something in a tight loop can cause a big performance regression if you switch from macros to actual ABI calls. Of course you cannot do anything useful with opaque
Note that these are rather old notes. If I read them correctly, I think the intention is that |
See also capi-workgroup/problems#15 discussion. |
In the Linux kernel, the main APIs are syscalls. They dislike adding syscalls, but sometimes there is no space to pass new arguments and so a new syscall must be added. Here I feel that tomorrow we will want to have different behavior in some cases. For example, getting an "object array" view of a list just by doing INCREF/DECREF on the list (not on items) is unsafe, since technically a list is mutable. If the consumer of the view calls arbitrary Python code, the list can be modified indirectly. Maybe it would be nice to explicitly request a copy of the list when it's known that the list can be modified. It would not be the default. Example:
    PyAPI_FUNC(int) PySequence_AsObjectArray(
        PyObject *,
        PyResource *res,
        PyObject ***parray,
        Py_ssize_t *psize,
        int flags);
With |
The discussed API fetching a few items in a buffer is appealing. But is it an issue that the sequence can mutate during iteration? Its size can change, items can change, the list can grow, etc. It's a common issue while iterating a list or a dictionary when the loop runs arbitrary Python code, and the code changes the container. In the past, we did our best to detect mutation during iteration, but it's hard to implement it in an efficient way. |
If this happens, it's OK to skip some items or get some items twice. That's why we tell users to not mutate what they're iterating over. |
The worst danger is if the sequence becomes shorter and my proposed API gives a size which is now outdated 😬 But it's hard to guess if the sequence can or cannot be mutated, and that's why I propose letting the caller choose between a dangerous but fast direct access and a slow but safe copy. In Python, there is the same problem: if a list/dict is mutated in the loop body, I copy the list/dict and iterate on the copy. |
* Add PyUnicode_AsUTF8Resource()
* Add PyBytes_AsStringResource()
* Add PySequence_AsObjectArray()
* Add Include/pyresource.h
* Add PyResource_Close() to the stable ABI
* compute_abstract_methods(): Replace PySequence_Fast() with PySequence_AsObjectArray()
One other option that I don't think has come up here: NumPy allows object arrays and Cython manages to view them through the buffer protocol. I'm fairly sure this is an unofficial extension to the buffer protocol (nothing internal in Python checks the format code too hard). However it does work, and provides largely the functionality you're after. Of course it doesn't currently work with lists/tuples. I suspect making it work with But just pointing out that something similar does exist and work. |
I'm not aware of that. Do you have some details? Link into the code? What's the buffer format used for that? |
The buffer format used is 'O'. The code is scattered about Cython a little.
Sorry that's a bit of a scattered list of links rather than a coherent story. It does mostly just piggy-back off the buffer protocol so there isn't a huge amount of special-casing for it. |
I see that there are different opinions on what the API should look like. Some would prefer a brand new API which works on a small amount of items, like SQL paging. Some others, like me, would prefer to provide the whole array at once. Sadly, it's unclear which projects would want to expose a sequence as an array of pointers to objects ( It's unclear to me whether such an API should allow modifying the array, whether changes would be reported to the original sequence, or whether it should be a read-only sequence. The current PySequence_Fast() API doesn't use the C const keyword to prevent modifications, but it doesn't report changes to the original sequence, unless the sequence is a list or a tuple (since we can modify tuples, right?). Overall, I'm not sure which problem I'm trying to solve. Right now, I prefer to leave this problem aside. Thanks to everyone who was involved in the discussion. I hope that at least the discussion will help the next volunteer motivated to investigate this topic :-) |
The Python C API has an efficient API to access tuple/list items:
* seq = PySequence_Fast(obj)
* size = PySequence_Fast_GET_SIZE(seq);
* item = PySequence_Fast_GET_ITEM(seq, index);
* items = PySequence_Fast_ITEMS(seq); -- then you can use items[0], items[1], ...
* Py_DECREF(seq); -- release the "view" on the tuple/list
Problem: If obj is not a tuple or a list, the function is inefficient: it creates a temporary list. It's not possible to implement an "object array view" protocol in 3rd party C extension types.
The other problem is that the &PyTuple_GET_ITEM(tuple, 0) and &PyList_GET_ITEM(list, 0) code to get direct access to an object array doesn't give clear control over how long the array remains valid. The returned pointer becomes a dangling pointer if the tuple/list is removed in the meantime.
I propose designing a new more generic API:
The API gives a PyObject** array and its Py_ssize_t size, and relies on a new PyResource API to "release the resource" (view on this sequence).
The PyResource API is proposed separately: see issue #106592.
Example of usage:
This design is more generic: later, we can add a protocol to let a custom type implement its own "object array view" and implement the "release function" with arbitrary code: it doesn't have to rely on PyObject reference counting. For example, a view can be a memory block allocated on the heap, and the release function would just free that memory.
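The heap-allocated view mentioned above can be illustrated with a small, self-contained C sketch. Everything here is an assumption made for illustration: `Resource`, `resource_release()`, `ToyRange` and `toy_range_as_array()` are hypothetical names that only mimic the shape of the proposal (an array, its size, and a resource to release), and items are `long` values rather than `PyObject*` so the code builds without `Python.h`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical resource handle: a release callback plus its argument. */
typedef struct {
    void (*release_func)(void *data);
    void *data;
} Resource;

/* Run the callback once, then clear the handle so a second call is a no-op. */
static void
resource_release(Resource *res)
{
    if (res->release_func != NULL) {
        res->release_func(res->data);
    }
    res->release_func = NULL;
    res->data = NULL;
}

/* Hypothetical custom type: a range that computes its items on demand,
   so it has no internal array to expose directly. */
typedef struct {
    long start;
    size_t size;
} ToyRange;

static void
free_block(void *data)
{
    free(data);
}

/* Build a heap-allocated array view; the resource owns the memory block. */
static int
toy_range_as_array(const ToyRange *r, Resource *res,
                   long **parray, size_t *psize)
{
    long *block = malloc((r->size ? r->size : 1) * sizeof(long));
    if (block == NULL) {
        return -1;
    }
    for (size_t i = 0; i < r->size; i++) {
        block[i] = r->start + (long)i;
    }
    res->release_func = free_block;
    res->data = block;
    *parray = block;
    *psize = r->size;
    return 0;
}
```

The caller never needs to know whether the view is borrowed from the object or freshly allocated: it uses the array, then calls the release function, and the type decides what releasing means.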
Providing such a protocol is out of the scope of this issue. Maybe we can reuse the Py_buffer protocol for that.