Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C API: Add PyObject_AsObjectArray() function: get tuple/list items as PyObject** array #106593

Closed
vstinner opened this issue Jul 10, 2023 · 28 comments
Labels
topic-C-API type-feature A feature request or enhancement

Comments

@vstinner
Copy link
Member

vstinner commented Jul 10, 2023

The Python C API has an efficient API to access tuple/list items:

  • seq = PySequence_Fast(obj)
  • size = PySequence_Fast_GET_SIZE(seq);
  • item = PySequence_Fast_GET_ITEM(seq, index);
  • items = PySequence_Fast_ITEMS(seq); -- then you can use items[0], items[1], ...
  • Py_DECREF(seq); -- release the "view" on the tuple/list

Problem: If obj is not a tuple or a list, the function is inefficient: it creates a temporary list. It's not possible to implement an "object array view" protocol in *3rd party C extension types.

The other problem is that the &PyTuple_GET_ITEM(tuple, 0) and &PyList_GET_ITEM(tuple, 0) code to get a direct access to an object array doesn't give a clear control on when the array remains valid. The returning pointer becomes a dangling pointer if the tuple/list is removed in the meanwhile.

I propose designing a new more generic API:

PyAPI_FUNC(int) PySequence_AsObjectArray(
    PyObject *,
    PyResource *res,
    PyObject ***parray,
    Py_ssize_t *psize);

The API gives a PyObject** array and it's Py_ssize_t size and rely on a new PyResource API to "release the resource" (view on this sequence).

The PyResource API is proposed separately: see issue #106592.

Example of usage:

void func(PyObject *seq)
{
    PyResource res;
    PyObject **items;
    Py_ssize_t nitem;
    if (PySequence_AsObjectArray(seq, &res, &items, &nitem) < 0) {
        if (PyErr_ExceptionMatches(PyExc_TypeError)) {
            PyErr_SetString(PyExc_TypeError, "items() returned non-iterable");
        }
        goto error;
    }
    if (nitem != 2) {
        PyErr_SetString(PyExc_TypeError,
                        "items() returned item which size is not 2");
        PyResource_Release(&res);
        goto error;
    }

    // items or it may be cleared while accessing __abstractmethod__
    // So we need to keep strong reference for key
    PyObject *key = Py_NewRef(items[0]);
    PyObject *value = Py_NewRef(items[1]);
    PyResource_Release(&res);

    // ... use key and value ...
    
    Py_DECREF(key);
    Py_DECREF(value);
}

This design is more generic: later, we can add a protocol to let a custom type to implement its own "object array view" and implement the "release function" with arbitrary code: it doesn't have to rely on PyObject reference counting. For example, a view can be a memory block allocated on the heap, the release function just would release the memory.

Providing such protocol is out of the scope of this issue. Maybe we can reuse the Py_buffer protocol for that.

Linked PRs

@vstinner vstinner added the type-feature A feature request or enhancement label Jul 10, 2023
@vstinner vstinner changed the title Add PyObject_AsObjectArray() function: get tuple/list items as PyObject** array C API: Add PyObject_AsObjectArray() function: get tuple/list items as PyObject** array Jul 10, 2023
@vstinner
Copy link
Member Author

Should we prevent using PySequence_AsObjectArray() to modify a tuple items? A tuple must be immutable: modifying a tuple is bad: see issue capi-workgroup/problems#56

Should we copy tuple items if a "write only" (or "read+write") view is requested? It's fine to give a direct access if a "read only" view is requested.

@vstinner
Copy link
Member Author

The other problem is that the &PyTuple_GET_ITEM(tuple, 0) and &PyList_GET_ITEM(tuple, 0) code to get a direct access to an object array doesn't give a clear control on when the array remains valid.

These API were left unchanged by rejected PEP 674 – Disallow using macros as l-values: PyTuple_GET_ITEM() and PyList_GET_ITEM() are left unchanged.

@vstinner
Copy link
Member Author

In 2020, I created issue #85250: [C API] Convert PyTuple_GET_ITEM() macro to a static inline function. See related Cython issue: cython/cython#3701 (still open)

@vstinner
Copy link
Member Author

In June 2020, I created PEP 620: C API for efficient loop iterating on a sequence of PyObject** or other C types discussion about such hypothetical API.

vstinner added a commit to vstinner/cpython that referenced this issue Jul 10, 2023
* Add PyUnicode_AsUTF8Resource()
* Add PyBytes_AsStringResource()
* Add PySequence_AsObjectArray()
* Add Include/pyresource.h
* Add PyResource_Release() to the stable ABI
* compute_abstract_methods(): Replace PySequence_Fast()
  with PySequence_AsObjectArray()
@vstinner
Copy link
Member Author

Proof-of-Concept PR to show how this API can be implemented and used: PR #106596.

@encukou
Copy link
Member

encukou commented Jul 10, 2023

A more robust alternative would be API for iterating by chunks, discussed in https://discuss.python.org/t/15993/20 and further.

@steve-s
Copy link
Contributor

steve-s commented Jul 10, 2023

I think that ideally the API should consider possible future optimizations, like a compact representation of a list of integers, i.e., we'd be "inflating" the individual values to PyObject* on the fly. Materializing such list to one big PyObject* array is obviously not ideal.

To be more concrete re the "chunks" approach. It can be something along these lines:

PyAPI_FUNC(int) PySequence_AsObjectArray(
    PyObject *,
    Py_ssize_t start_index,
    Py_ssize_t requested_size,
    PyResource *res,    
    PyObject ***parray,
    Py_ssize_t *actual_size);

To iterate the sequence, you'd write two nested loops. The outer calling the API and the inner iterating the items in the buffer.

There can be an inline helper:

PyAPI_FUNC(int) PySequence_AllAsObjectArray(
    PyObject *o,
    PyResource *res,
    PyObject ***parray,
    Py_ssize_t *actual_size) {
    return PySequence_AllAsObjectArray(o, 0, MAX_LONG, ...);
}

However:

  • the helper would encourage non-buffered usage, which is not ideal
  • the object itself may know better what chunk size is ideal to use (or maybe simply: whether to use buffer or getting the whole contents as one big array is better).

R has similar API for its "alternative" vectors: https://github.com/wch/r-source/blob/688f12cc9627c38ae10c4f597010da3f7142a487/src/include/R_ext/Itermacros.h#L242

They first query the vector for "one big array", which is an optional operation, and if that's not supported, they iterate using fixed size small-ish buffer. All is wrapped in a convenience C macro.

@encukou
Copy link
Member

encukou commented Jul 10, 2023

Yeah, this should probably be a PEP. Adding a function for a subset of use cases will only make the C-API bigger, as we add new ones for the remaining cases.

@ronaldoussoren
Copy link
Contributor

I'm pretty sure this has been discussed before, possibly on discuss.python.org or one of the mail lists.

An alternative interface is to provide an interface to iterate in chunks, similar to fast enumeration in Objective-C/Cocoa. That way buffer allocation can be avoided, although the semantics will we different in a way that can affect code that currently uses PySequence_Fast with objects that can be modified during iteration.

@vstinner
Copy link
Member Author

It would be nice to have a generic protocol to access an array as a C array for a specific item type. Examples:

  • array of int8_t
  • array of uint32_t
  • array of PyObject*
  • etc.

We can imagine that it would be possible to query which formats are supported by a object, and the code could switch to the most efficient one.

See array.array and the buffer protocol for item type, memory layout, etc.

But this hypothetical "protocol" idea is way wider that the scope of this issue. Here I propose to focus on fixing the API for PyTuple and PyList by making it more generic, and maybe consider later to add the ability for a C extension to provide an "object array" view for a type.

There are two problems:

  • Current API gives a a direct access to PyTuple and PyList members prevent to change their layout (to implement new optimizations)
  • The API caller doesn't notice when it is done with using the view

@encukou
Copy link
Member

encukou commented Jul 10, 2023

What are the optimizations this will enable? Are they being planned somewhere?

@vstinner
Copy link
Member Author

For me, the motivation to use PySequence_Fast() is to have a simple C loop on an array:

for (Py_ssize_t i=0; i < size; i++) {
    item = array[i];
    // ... use item ...
}

Yes, proposed PySequence_AsObjectArray() is inefficient on large sequeces which don't store their items as PyObject*. Well, like any PyPy container for example. And the function has to create an array containing all items, even if just a few items are needed.

If you know that you only need a few items, there are existing nice APIs for that, no additional API is needed:

  • size = PySequence_Length(obj)
  • item = PySequence_GetItem(obj, index)

These APIs don't have to convert all items to PyObject*: only requested items (if needed, only a few PyObject** objects are created on demand).

This issue is only a small step to fix the problematic APIs. I don't pretend or want to address the "big general case" of "iterating on items of a specific type".

@vstinner
Copy link
Member Author

What are the optimizations this will enable? Are they being planned somewhere?

Changing PyTuple and/or PyList to have a different memory layout. For example, you can imagine creating a PyList and converting to a PyTuple without having to copy memory: the PyTuple will just delegate get/set operations to an internal PyList (stored unchanged).

An interesting optimization, implemented by PyPy, is to specialize a tuple/list to integers: store them as numbers, not as PyLong objects which store numbers (avoid boxing).

The fact that PyTupleTuple.ob_item is public and PySequence_Fast_ITEMS() expects to be able to get an object array as a direct operation (which canont fail) is problematic for that.

If tomorrow, PyTuple/PyList content changes, PySequence_AsObjectArray() can have to create a new temporary "object array" view which would be destroyed (release the memory) by PyResource_Release().

@vstinner
Copy link
Member Author

specialize a tuple/list to integers

I elaborated it there: https://pythoncapi.readthedocs.io/optimization_ideas.html#specialized-list-for-small-integers

@vstinner
Copy link
Member Author

An alternative interface is to provide an interface to iterate in chunks, similar to fast enumeration in Objective-C/Cocoa. That way buffer allocation can be avoided, although the semantics will we different in a way that can affect code that currently uses PySequence_Fast with objects that can be modified during iteration.

Currently, PySequence_Fast() creates a copy of the sequence to create a PyObject** array view for types other than tuple and list: modifying the view does not update the sequence.

I think that PySequence_AsObjectArray() should only be used to have a read-only access. Using it to modify tuple items or list items should be avoided (or denied?). If we want, we can detect modification in debug mode by creating a second view and compare them in PyResource_Release().

For me, PySequence_SetItem() is the reference API to set an item.


If we want to provide "write" view, we can modify the sequence in PyResource_Release(). Pseudo-code of a release function:

for (Py_ssize_t i=0; i < size; i++) {
   PyObject *obj = array[i]
    if (PySequence_SetItem(seq, i, obj) < 0) ... oops, how should we report errors? unraisable exception? ...
    Py_DECREF(obj); // case of a view storing strong references
}

This code can be slow if the sequence is long, but it works: it uses regular existing APIs.

@encukou
Copy link
Member

encukou commented Jul 10, 2023

An interesting optimization, implemented by PyPy, is to specialize a tuple/list to integers: store them as numbers, not as PyLong objects which store numbers (avoid boxing).

Are these changes currently planned?

If not, IMO there's time to add API that considers all the known use cases -- zero-copy, unboxing, chunking, filling a pre-allocated buffer....

If the old API is not removed soon enough, PySequence_Fast_ITEMS can materialize the PyObject* array. It needs an extra pointer in the optimized struct, and a hard crash if memory allocation fails -- both of which are IMO acceptable.

There's no need to rush this and only solve one of the issues.

@vstinner
Copy link
Member Author

HPy "protocol" sketch about this problem:

HPySequence x = HPy_AsSequence(obj); /* it can raise an exception if it's not iterable */
int len = HPy_Sequence_Len(x, obj);
for(int i=0; i<len; i++) {
    /* HPy_Sequence_GetItem will check a flag on x to see if it can use a
       fast-path of direct indexing or it needs to go through a generic
       fallback. And the C compiler will hoist the check out of the loop,
       hopefully */
    HPy item = HPy_Sequence_GetItem(x, obj, i);  /* PyList_GET_ITEM */
 }
HPySequenceClose(x, obj);

src: https://github.com/hpyproject/hpy/blob/5ddcbc8df8084cef4e0e795aa33c7a3d360855a8/docs/xxx-unsorted-notes.txt#L49-L58

This sketch uses functions to get the sequence length and to get an item. Internally, HPySequence can store a few items, fetch a bunch of items on demand, or just be a pointer to the items. The advantage is that it gives more freedom to the implementation. The disadvantage is that it requires to call functions to get the length and each item.

@vstinner
Copy link
Member Author

The other problem is that the &PyTuple_GET_ITEM(tuple, 0) and &PyList_GET)_ITEM(tuple, 0) code to get a direct access to an object array doesn't give a clear control on when the array remains valid.

Another problem is that it makes the assumption that the object will cannot move in memory, which is not true in a Python implementation using a moving GC. That's why a "release" function is needed.

@steve-s
Copy link
Contributor

steve-s commented Jul 11, 2023

If you know that you only need a few items, there are existing nice APIs for that, no additional API is needed:

size = PySequence_Length(obj)
item = PySequence_GetItem(obj, index)

Sure, if you know you only need a small constant number of items, or if you need random access, but note that important requirement for this API, in my opinion, and as also noted in https://discuss.python.org/t/15993, is performance given that we talk about ABI calls and not macros used in a loop. Calling something in a tight loop can cause big performance regression if you switch from macros to actual ABI calls. Of course you cannot do anything useful with opaque PyObject* without calling some more ABI/API, so for generic Python objects this is not so important (but maybe still worth it), but would be very useful if you could request primitive arrays.

HPy "protocol" sketch about this problem:
...
The disadvantage is that it requires to call functions to get the length and each item.

Note that these are rather old notes. If I read them correctly, I think the intention is that HPy_Sequence_GetItem will not be ABI call, but a helper inline function that will check some flag stored in HPySequence (so some memory layout details of HPySequence would be part of the ABI too) and according to the flag it would either issue ABI call to retrieve the individual item or use a fast path that reads it from some storage in HPySequence directly. This can internally implement the "buffering", but then I am not sure if we can rely on the compiler to be smart enough to convert the single loop to a two loops algorithm where only the outer loop calls ABI function to get the next buffer and the inner loop can be calls free (an hence vectorized or some such).

@vstinner
Copy link
Member Author

See also capi-workgroup/problems#15 discussion.

@vstinner
Copy link
Member Author

In the Linux kernel, the main API are syscalls. They dislike adding syscalls, but sometimes there is not space to pass new arguments and so a new syscall must be added.

Here I feel that tomorrow we will want to have different behavior in some cases.

For example, getting an "object array" view of a list just by doing INCREF/DECREF on the list (not on items) is unsafe, since technically a list is mutable. If the consumer of the view calls arbitray Python code, the list can be modified indirectly.

Maybe it would be nice to explicitly request a copy of the list when it's known that the list can be modified. It would not be the default. Example:

PyAPI_FUNC(int) PySequence_AsObjectArray(
    PyObject *,
    PyResource *res,
    PyObject ***parray,
    Py_ssize_t *psize,
    int flags);

With flags=PySequence_COPY_FLAG. In this case, the function would allocate a new array where items hold a strong refrence, and PyResource_Release() would decrement items reference count and then release the memory (of the array).

@vstinner
Copy link
Member Author

The discussed API fetching a few items in a buffer is appealing. But is it an issue that the sequence can mutate during iteration? Its size can change, items can change, the list can grow, etc.

It's a common issue while iterating a list or a dictionary when the loop runs arbitrary Python code, and the code changes the container. In the past, we did our best to detect mutation during iteration, but it's hard to implement it in an efficient way.

@encukou
Copy link
Member

encukou commented Jul 12, 2023

But is it an issue that the sequence can mutate during iteration?

If this happens, it's OK to skip some items or get some items twice. That's why we tell users to not mutate what they're iterating over.
But a good C API should always prevent refcounting issues or segfaults. (Or if it doesn't do that, be very explicit about what the dangers are, when they happen and how to avoid them -- typically we'd only want that for "unstable" fast API that already has a "proper" slower variant.)

@vstinner
Copy link
Member Author

The worst danger is if the sequence becomes shorter and my proposed API gives a size which is now outdated 😬

But it's hard to guess if the sequence can or cannot be mutated, and that's why I propose letting the caller choose between a dangerous but fast direct access, or get a slow but safe copy.

In Python, there is the same problem: if a list/dict is mutated in the loop body, I copy the list/dict and iterate on the list.

vstinner added a commit to vstinner/cpython that referenced this issue Jul 17, 2023
* Add PyUnicode_AsUTF8Resource()
* Add PyBytes_AsStringResource()
* Add PySequence_AsObjectArray()
* Add Include/pyresource.h
* Add PyResource_Close() to the stable ABI
* compute_abstract_methods(): Replace PySequence_Fast()
  with PySequence_AsObjectArray()
vstinner added a commit to vstinner/cpython that referenced this issue Jul 17, 2023
* Add PyUnicode_AsUTF8Resource()
* Add PyBytes_AsStringResource()
* Add PySequence_AsObjectArray()
* Add Include/pyresource.h
* Add PyResource_Close() to the stable ABI
* compute_abstract_methods(): Replace PySequence_Fast()
  with PySequence_AsObjectArray()
@da-woods
Copy link
Contributor

One other option that I don't think has come up here:

Numpy allows object arrays and I'm Cython manages to view them through the buffer protocol. I'm fairly sure this is an unofficial extension to the buffer protocol (nothing internal in Python checks the format-code too hard). However it does work, and provides largely the functionality you're after. Of course it doesn't currently work with lists/tuples.

I suspect making it work with list would be quite hard because it can mutate. And it doesn't help with the more advanced "iterate over packed, non-PyObject data" issues.

But just pointing out that something similar does exist and work.

@vstinner
Copy link
Member Author

Numpy allows object arrays and I'm Cython manages to view them through the buffer protocol

I'm not aware of that. Do you have some details? Link into the code? What's the buffer format used for that?

@da-woods
Copy link
Contributor

da-woods commented Jul 28, 2023

Do you have some details? Link into the code? What's the buffer format used for that?

The buffer format used is 'O'. The code is scattered about Cython a little.

Sorry that's a bit of a scattered list of links rather than a coherent story. It does mostly just piggy-pack off the buffer protocol so there isn't a huge amount of special-casing for it.

@vstinner
Copy link
Member Author

I see that there are different opinion on how the API should look like. Some would prefer a brand new API which work on small amout of items, like SQL paging. Some others like me would prefer to provide the whole array at once.

Sadly, it's unclear to which projects would want to expose a sequence as an array of pointers to objects (PyObject**).

It's unclear to me if such API should allow modifying the array, if changes which would be reported to the original sequence, or if it should be a read-only sequence. The current PySequence_Fast() API doesn't use C const keyword to prevent modifications, but it's doesn't report changes to the original sequence, unless the sequence is a list or a tuple (since we can modify tuples, right?).

Overall, I'm not sure which problem I'm trying to solve. Right now, I prefer to leave this problem aside.

Thanks everyone who was involved in the discussion. I hope that at least the discussion will help the next volunteer motivated to investigate this topic :-)

@erlend-aasland erlend-aasland closed this as not planned Won't fix, can't repro, duplicate, stale Aug 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-C-API type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

7 participants