pymethods: add support for sequence protocol #2060

davidhewitt · 2021-12-20T00:19:26Z

This PR implements what I think to be a good way to add sequence protocol support to #[pymethods].

There are two ways this is done:

__len__, __getitem__, __setitem__, and __delitem__ now implement both mapping and sequence slots simultaneously. I think this is correct because a class implemented in pure Python also does exactly this. To get the previous behaviour of these methods just implementing a mapping, use new option #[pyclass(true_mapping)].
New sequence-specific methods __seqlen__, __getseqitem__, __setseqitem__, and __delseqitem__ have been added. These can be used to implement a sequence without implementing any mapping methods. (Note that Python adds the sequence length to negative indices before calling these methods; needs careful documenting.)
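
The negative-index note above can be observed from pure Python. The following is a minimal sketch, assuming CPython (it reaches the C-level `PySequence_GetItem` through `ctypes.pythonapi`); the `Recorder` class and the variable names are hypothetical and exist only for illustration. A class defining `__len__` and `__getitem__` fills both the mapping and the sequence slots, but only the sequence path adjusts negative indices.

```python
import ctypes

# CPython-specific assumption: expose the C API function PySequence_GetItem.
_seq_getitem = ctypes.pythonapi.PySequence_GetItem
_seq_getitem.restype = ctypes.py_object
_seq_getitem.argtypes = [ctypes.py_object, ctypes.c_ssize_t]


class Recorder:
    """Hypothetical class that returns whatever index it was handed."""

    def __len__(self):
        return 5

    def __getitem__(self, index):
        return index


r = Recorder()

# r[-1] is dispatched through the mapping slot: the index arrives unchanged.
via_subscript = r[-1]

# PySequence_GetItem is dispatched through the sequence slot: len(r) is added
# to the negative index before __getitem__ is called.
via_sequence_api = _seq_getitem(r, -1)

print(via_subscript, via_sequence_api)
```

With sequence-only methods such as `__getseqitem__`, the Rust implementation would presumably see only the second, already-adjusted form of the index.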

Bikeshedding very welcome on names #[pyclass(true_mapping)], __seqlen__, __getseqitem__, __setseqitem__, and __delseqitem__. Some alternatives I had:

#[pyclass(no_auto_sequence_methods)] describes the behaviour better but it's a real mouthful.
#[pyclass(pure_mapping)], somehow true_mapping felt better to me.
__seqgetitem__, __seqsetitem__, __seqdelitem__, i.e. "seq" at the front rather than in the middle.

This implementation seems to work nicely. To finish off this PR I'd like to add some guide docs and an example for each of a sequence and mapping, like we have the decorator example.

However, before I spend too much time writing documentation I'd really love to hear folks' opinion on whether they agree with this design choice. I could think of two possible alternatives to this PR:

Introduce __seqlen__, __getseqitem__, __setseqitem__, and __delseqitem__ like this PR. Don't change the existing mapping methods to generate sequence methods (and so don't bother adding #[pyclass(true_mapping)]).

This would lead to a simpler implementation, but then PyO3 has several differences to pure Python, which I think is not a desirable outcome.

Only have __len__, __getitem__, __setitem__, and __delitem__. By default these methods implement both protocols like Python. Attributes #[pyclass(mapping)] and #[pyclass(sequence)] can be used to restrict the generated implementation to just the one specified.

I haven't tried to implement this design yet; I'm unsure how hard doing so would be. Main downside is that these methods would change a lot if #[pyclass(sequence)] was used: sequences frequently want to support slicing, so __getitem__, __setitem__ and __delitem__ would need to take some kind of Either<isize, &PySlice> when in sequence mode. In addition, Python adding the length to negative indices would only happen for the isize variants (the slices are passed unchanged), which could lead to nasty footguns.

In this PR implementation __XXXseqitem__ methods would only get isize indices and __XXXitem__ (mapping) methods would be used for slicing, so at least the negative-indices behaviour is clearly separated.
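
For comparison, this is roughly what the single-method design looks like in pure Python, where one `__getitem__` has to branch on integer versus slice. It is only a sketch; the `Window` class is hypothetical. Slices arrive unchanged, so the method normalises them itself with `slice.indices`, and it also has to handle negative integers on its own because subscript syntax does not adjust them.

```python
class Window:
    """Hypothetical sequence-like wrapper supporting integers and slices."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slices are passed through untouched; resolve negative and
            # missing bounds against the current length here.
            start, stop, step = index.indices(len(self))
            return [self._items[i] for i in range(start, stop, step)]
        if index < 0:
            # Subscript syntax hands over negative integers as-is.
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        return self._items[index]


w = Window("abcde")
last = w[-1]
tail = w[-3:]
evens = w[::2]
```

An `Either<isize, &PySlice>` argument in Rust would need the same two branches, which is where the footgun described above comes from: the two branches may see negative values treated differently.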

adamreichold · 2021-12-20T07:12:08Z

From looking at the other class arguments, I would prefer something like #[pyclass(no_sequence)] to suppress the default behaviour.

mejrs

A weakness of those two designs is that neither requires users to think upfront about whether they want a Sequence or Mapping, unlike the current pyproto trait implementation.

If we want (which I don't think we should, see below) we could fix that by just not having e.g. __getitem__ and instead have only (modulo bikeshedding) __mapping_get_item__ and __sequence_get_item__. Or we could require users to annotate the __getitem__:

#[pymethods]
impl Foo {
    #[sequence]
    fn __getitem__(....) -> ... {
        ...
    }
}

__len__, __getitem__, __setitem__, and __delitem__ now implement both mapping and sequence slots simultaneously. I think this is correct because a class implemented in pure Python also does exactly this. To get the previous behaviour of these methods just implementing a mapping, use new option #[pyclass(true_mapping)].

Latent here seems to be an assumption that this behaviour, where classes behave as both a sequence and a mapping, is undesirable.

I can imagine some examples where you definitely don't want __iter__ to call __getitem__ repeatedly, e.g. with linear time algorithms like indexing into a string or a linked list.

Personally I'm not convinced any of these new dunders or attributes are necessary. I think some documentation along the lines of "if you implement this dunder, you should consider what __iter__ should do" is sufficient.

But I'm not sure if there is anything where this behaviour will actually do something weird. Do you have any such use cases or examples in mind?

mejrs · 2021-12-20T05:49:16Z

guide/src/migration.md

+ - PyO3 will allow instances of these classes to be cast to `PySequence` as well as `PyMapping`.
+ - Python will provide a default implementation of `__iter__` (if the class did not have one) which repeatedly calls `__getitem__` with integers (starting at 0) until an `IndexError` is raised.
+
+To disable this behavior, use `#[pyclass(true_mapping)]` to retain the previous behavior of only providing the mapping implementation.
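
The default `__iter__` described in this guide entry can be seen with a small pure-Python class. This is a sketch only; `Squares` and its `calls` counter are hypothetical names. The class defines `__getitem__` but no `__iter__`, and iteration still works because Python falls back to calling `__getitem__` with 0, 1, 2, ... until `IndexError` is raised.

```python
class Squares:
    """Hypothetical class with __getitem__ only (no __iter__, no __len__)."""

    def __init__(self, n):
        self._n = n
        self.calls = 0  # counts how often __getitem__ is invoked

    def __getitem__(self, index):
        self.calls += 1
        if not 0 <= index < self._n:
            raise IndexError(index)
        return index * index


s = Squares(4)
values = list(s)  # uses the fallback iterator built on __getitem__
print(values, s.calls)
```

Each element costs one `__getitem__` call, plus one final call that raises `IndexError`, which is why a type with linear-time indexing may prefer to define its own `__iter__`.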


I'd like to see this example go more into depth into why you wouldn't want this. (perhaps this would fit best in that guide entry though)

mejrs · 2021-12-20T06:45:10Z

src/pyclass.rs

+ push_dict_getset(&mut property_defs, dict_is_dummy);
+
+ if !property_defs.is_empty() {
+ property_defs.push(unsafe { std::mem::zeroed() });


The unsafe block can be avoided (also true for the others):

Suggested change

property_defs.push(unsafe { std::mem::zeroed() });

property_defs.push(ffi::PyMethodDef::default());

davidhewitt · 2021-12-20T09:02:41Z

Latent here seems to be an assumption that this behaviour, where classes behave as both a sequence and a mapping, is undesirable.

That's a really fair point. It's clearly not that bad, because pure Python implements it in this way and provides no additional flexibility to target only the mapping or sequence protocols.

I think PyO3 can give users the additional control. In my eyes, it's mostly about correctness and optimizations:

Accessing a sequence through the C sequence API, when it is implemented directly (e.g. __getseqitem__ in this PR), avoids creating an intermediate Python integer for the index. It's only a very tiny effect though.
Having default iteration for mappings will generally lead to a KeyError: 0, rather than iter(mapping) immediately raising a TypeError.
There may be other cases where C code wants to check whether a type is a mapping or a sequence and might try to do optimizations based off that.
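
The `KeyError: 0` point can be reproduced in pure Python. A minimal sketch, with hypothetical class names: `Registry` is mapping-like and defines only `__getitem__`, while `Opaque` defines nothing. `iter()` accepts the first and fails late on the first `next()`, whereas it rejects the second immediately.

```python
class Registry:
    """Hypothetical mapping-like class keyed by strings."""

    def __init__(self):
        self._data = {"a": 1}

    def __getitem__(self, key):
        return self._data[key]


class Opaque:
    """Hypothetical class with no __getitem__ and no __iter__."""


late_error = None
early_error = None

# iter() succeeds because __getitem__ exists; the failure only appears when
# the fallback iterator asks for index 0.
iterator = iter(Registry())
try:
    next(iterator)
except KeyError as exc:
    late_error = exc

# Without __getitem__ or __iter__, iter() fails straight away.
try:
    iter(Opaque())
except TypeError as exc:
    early_error = exc

print(repr(late_error), repr(early_error))
```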

That said, I'm pretty convinced that by default we want to have behaviour which matches Python. The downsides to doing so are small, and it's great to have PyO3 behaviour match Python.

I'm now thinking I will split this PR in two, with the first half just being the implement-both-protocols behaviour. We don't need to provide this finer control in PyO3 0.16, we just need to provide coverage of all the protocols so that we can deprecate #[pyproto]. We can then take some time to decide which of the alternatives is better.

davidhewitt · 2021-12-22T23:42:16Z

I just noticed Py_TPFLAGS_MAPPING and Py_TPFLAGS_SEQUENCE, new in Python 3.10.

This makes me think that having #[pyclass(mapping)] and #[pyclass(sequence)] might be a good idea.
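
These flags are visible from Python through `type.__flags__`, and CPython uses them for structural pattern matching. The sketch below assumes CPython 3.10 or later; the bit positions are copied from CPython's `object.h` and stated here as an assumption, and the class names are hypothetical. It suggests that defining `__len__` and `__getitem__` alone sets neither flag, while inheriting from `collections.abc.Sequence` sets the sequence one.

```python
import sys
from collections.abc import Sequence

# Assumed bit positions, as defined in CPython's object.h for 3.10+.
TPFLAGS_SEQUENCE = 1 << 5
TPFLAGS_MAPPING = 1 << 6


def is_flagged(cls, flag):
    """Return True if the type has the given tp_flags bit set."""
    return bool(cls.__flags__ & flag)


class Plain:
    """Hypothetical class: both dunders, but no declared protocol."""

    def __len__(self):
        return 0

    def __getitem__(self, index):
        raise IndexError(index)


class Declared(Sequence):
    """Hypothetical class that opts in via the Sequence ABC."""

    def __len__(self):
        return 0

    def __getitem__(self, index):
        raise IndexError(index)


if sys.version_info >= (3, 10):
    print(is_flagged(list, TPFLAGS_SEQUENCE), is_flagged(dict, TPFLAGS_MAPPING))
    print(is_flagged(Plain, TPFLAGS_SEQUENCE), is_flagged(Declared, TPFLAGS_SEQUENCE))
```

An explicit #[pyclass(mapping)] or #[pyclass(sequence)] would give PyO3 a natural place to set the corresponding flag, which pure-Python classes get by subclassing or registering with the ABCs.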

davidhewitt · 2022-02-05T16:46:20Z

Closing this one, since #2065 has been merged.

pymethods: seq methods from mapping methods

dcd1f9e

mejrs reviewed Dec 20, 2021


davidhewitt mentioned this pull request Dec 20, 2021

pymethods: seq methods from mapping methods #2065

Merged

davidhewitt closed this Feb 5, 2022

davidhewitt deleted the protos-seq-mapping-fallbacks branch February 5, 2022 16:46
