pymethods: add support for sequence protocol #2060

Closed

davidhewitt

For #1884.

This PR implements what I think to be a good way to add sequence protocol support to #[pymethods].

There are two ways this is done:

  • __len__, __getitem__, __setitem__, and __delitem__ now implement both mapping and sequence slots simultaneously. I think this is correct because a class implemented in pure Python also does exactly this. To get the previous behaviour of these methods just implementing a mapping, use new option #[pyclass(true_mapping)].
  • New sequence-specific methods __seqlen__, __getseqitem__, __setseqitem__, and __delseqitem__ have been added. These can be used to implement a sequence without implementing any mapping methods. (Note that Python adds the sequence length to negative indices before calling these methods; needs careful documenting.)
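A minimal sketch of the negative-index note above, in plain Rust without the `pyo3` crate. It assumes the adjustment is the one CPython performs in `PySequence_GetItem` (add the length once, no bounds check); `Seq`, `sq_item`, and `sequence_get_item` are hypothetical names used only for illustration.

```rust
/// Hypothetical stand-in for a type that fills the sequence slots.
struct Seq {
    items: Vec<i32>,
}

impl Seq {
    /// Models the item slot: it sees the index *after* the caller's adjustment,
    /// so a negative value here is already out of range.
    fn sq_item(&self, index: isize) -> Result<i32, String> {
        if index < 0 {
            return Err(format!("IndexError: {}", index));
        }
        self.items
            .get(index as usize)
            .copied()
            .ok_or_else(|| format!("IndexError: {}", index))
    }
}

/// Models the caller side (assumption: same rule as `PySequence_GetItem`):
/// the length is added to a negative index exactly once, then the slot is called.
fn sequence_get_item(seq: &Seq, mut index: isize) -> Result<i32, String> {
    if index < 0 {
        index += seq.items.len() as isize;
    }
    seq.sq_item(index)
}
```

With three items, `sequence_get_item(&seq, -1)` reaches the slot as index 2, while `-4` reaches it as `-1` and has to be rejected by the slot itself. That second case is the part that would need careful documenting.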

Bikeshedding very welcome on names #[pyclass(true_mapping)], __seqlen__, __getseqitem__, __setseqitem__, and __delseqitem__. Some alternatives I had:

  • #[pyclass(no_auto_sequence_methods)] describes the behaviour better but it's a real mouthful.
  • #[pyclass(pure_mapping)], somehow true_mapping felt better to me.
  • __seqgetitem__, __seqsetitem__, __seqdelitem__, i.e. "seq" at the front rather than in the middle.

This implementation seems to work nicely. To finish off this PR I'd like to add some guide docs and an example for each of a sequence and mapping, like we have the decorator example.

However, before I spend too much time writing documentation I'd really love to hear folks' opinion on whether they agree with this design choice. I could think of two possible alternatives to this PR:

  1. Introduce __seqlen__, __getseqitem__, __setseqitem__, and __delseqitem__ like this PR. Don't change the existing mapping methods to generate sequence methods (and so don't bother adding #[pyclass(true_mapping)]).

    This would lead to a simpler implementation, but then PyO3 has several differences to pure Python, which I think is not a desirable outcome.

  2. Only have __len__, __getitem__, __setitem__, and __delitem__. By default these methods implement both protocols like Python. Attributes #[pyclass(mapping)] and #[pyclass(sequence)] can be used to restrict the generated implementation to just the one specified.

    I haven't tried to implement this design yet; I'm unsure how hard doing so would be. Main downside is that these methods would change a lot if #[pyclass(sequence)] was used: sequences frequently want to support slicing, so __getitem__, __setitem__ and __delitem__ would need to take some kind of Either<isize, &PySlice> when in sequence mode. In addition, Python adding the length to negative indices would only happen for the isize variants (the slices are passed unchanged), which could lead to nasty footguns.

    In this PR implementation __XXXseqitem__ methods would only get isize indices and __XXXitem__ (mapping) methods would be used for slicing, so at least the negative-indices behaviour is clearly separated.
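To make the footgun in alternative 2 concrete, here is a rough sketch in plain Rust. `Index` is a hypothetical simplification of `Either<isize, &PySlice>` (a slice is reduced to `start`/`stop`, no step), and the assumption is that integer indices arrive already adjusted while slice bounds arrive unchanged.

```rust
/// Hypothetical index type for a combined `__getitem__` in "sequence mode".
#[derive(Debug, PartialEq)]
enum Index {
    /// Assumed to arrive with the length already added to negative values.
    Int(isize),
    /// Assumed to arrive unchanged, so negative bounds are still negative.
    Slice { start: isize, stop: isize },
}

/// Resolves one raw slice bound against `len`, clamping to the valid range.
fn normalize(bound: isize, len: isize) -> usize {
    let adjusted = if bound < 0 { bound + len } else { bound };
    adjusted.clamp(0, len) as usize
}

/// Returns the selected items; an integer index yields a one-element vector
/// to keep the sketch's return type simple.
fn get_item(data: &[i32], index: Index) -> Result<Vec<i32>, String> {
    match index {
        Index::Int(i) => {
            // No second adjustment here: doing so would double-count the length.
            if i < 0 {
                return Err(format!("IndexError: {}", i));
            }
            data.get(i as usize)
                .map(|v| vec![*v])
                .ok_or_else(|| format!("IndexError: {}", i))
        }
        Index::Slice { start, stop } => {
            // The implementation has to normalize slice bounds itself.
            let len = data.len() as isize;
            let (lo, hi) = (normalize(start, len), normalize(stop, len));
            Ok(if lo < hi { data[lo..hi].to_vec() } else { Vec::new() })
        }
    }
}
```

The two arms need opposite treatment of negative values, which is easy to get wrong when both live in one method.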

@adamreichold

From looking at the other class arguments, I would prefer something like #[pyclass(no_sequence)] to suppress the default behaviour.


@mejrs mejrs left a comment

A weakness of those two designs is that neither requires users to think upfront about whether they want a Sequence or Mapping, unlike the current pyproto trait implementation.

If we want (which I don't think we should, see below) we could fix that by just not having e.g. __getitem__ and instead having only (modulo bikeshedding) __mapping_get_item__ and __sequence_get_item__. Or we could require users to annotate __getitem__:

#[pymethods]
impl Foo {
    #[sequence]
    fn __getitem__(....) -> ... {
        ...
    }
}

__len__, __getitem__, __setitem__, and __delitem__ now implement both mapping and sequence slots simultaneously. I think this is correct because a class implemented in pure Python also does exactly this. To get the previous behaviour of these methods just implementing a mapping, use new option #[pyclass(true_mapping)].

Latent here seems to be an assumption that this behaviour, where classes behave as both a sequence and a mapping, is undesirable.

I can imagine some examples where you definitely don't want __iter__ to call __getitem__ repeatedly, e.g. with linear time algorithms like indexing into a string or a linked list.

Personally I'm not convinced any of these new dunders or attributes are necessary. I think some documentation along the lines of "if you implement this dunder, you should consider what __iter__ should do" is sufficient.

But I'm not sure if there is anything where this behaviour will actually do something weird. Do you have any such use cases or examples in mind?

- PyO3 will allow instances of these classes to be cast to `PySequence` as well as `PyMapping`.
- Python will provide a default implementation of `__iter__` (if the class did not have one) which repeatedly calls `__getitem__` with integers (starting at 0) until an `IndexError` is raised.

To disable this behavior, use `#[pyclass(true_mapping)]` to retain the previous behavior of only providing the mapping implementation.
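One reason the default `__iter__` can be unwanted is cost. The sketch below models the fallback loop in plain Rust over a UTF-8 string, where reaching character `index` means decoding everything before it. `get_item` and `fallback_iter` are hypothetical names, and the step counter exists only to make the cost visible.

```rust
/// Models `__getitem__` on a UTF-8 string. Returns the character (if any)
/// together with the number of characters that had to be decoded to find it.
fn get_item(s: &str, index: usize) -> (Option<char>, usize) {
    let mut steps = 0;
    for (i, c) in s.chars().enumerate() {
        steps += 1;
        if i == index {
            return (Some(c), steps);
        }
    }
    // Ran off the end: this stands in for raising `IndexError`.
    (None, steps)
}

/// Models the default iteration: call `get_item` with 0, 1, 2, ... until it
/// fails. Returns the collected items and the total decoding steps.
fn fallback_iter(s: &str) -> (Vec<char>, usize) {
    let mut out = Vec::new();
    let mut total = 0;
    let mut index = 0;
    loop {
        let (item, steps) = get_item(s, index);
        total += steps;
        match item {
            Some(c) => out.push(c),
            None => break,
        }
        index += 1;
    }
    (out, total)
}
```

For a 5-character string the fallback performs 1 + 2 + 3 + 4 + 5 decoding steps plus 5 more for the final failing call, so 20 in total, where a dedicated iterator needs 5. The gap grows quadratically with length.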

I'd like to see this example go more into depth into why you wouldn't want this. (perhaps this would fit best in that guide entry though)

push_dict_getset(&mut property_defs, dict_is_dummy);

if !property_defs.is_empty() {
property_defs.push(unsafe { std::mem::zeroed() });

The unsafe block can be avoided (also true for the others):

Suggested change
- property_defs.push(unsafe { std::mem::zeroed() });
+ property_defs.push(ffi::PyMethodDef::default());
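As a sketch of why the `Default` route works for a sentinel entry, here is the same pattern in plain Rust. `MethodDef` and `with_sentinel` are hypothetical stand-ins, not the actual `pyo3::ffi` definitions; the assumption is that the FFI struct has a `Default` impl that yields null pointers and zero flags.

```rust
use std::os::raw::{c_char, c_int};
use std::ptr;

/// Hypothetical stand-in for a C definition struct terminated by an
/// all-null sentinel entry.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
struct MethodDef {
    name: *const c_char,
    flags: c_int,
}

impl Default for MethodDef {
    /// Safe equivalent of `std::mem::zeroed()` for this struct:
    /// every pointer is null and every integer is zero.
    fn default() -> Self {
        MethodDef {
            name: ptr::null(),
            flags: 0,
        }
    }
}

/// Appends the terminating sentinel, but only when there is something to
/// terminate (mirrors the `is_empty` check in the diff context).
fn with_sentinel(mut defs: Vec<MethodDef>) -> Vec<MethodDef> {
    if !defs.is_empty() {
        defs.push(MethodDef::default());
    }
    defs
}
```

The resulting value is the same as the zeroed one, but the `unsafe` block moves out of every call site.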

@davidhewitt

davidhewitt commented Dec 20, 2021

Latent here seems to be an assumption that this behaviour, where classes behave as both a sequence and a mapping, is undesirable.

That's a really fair point. It's clearly not that bad, because pure Python implements it in this way and provides no additional flexibility to target only the mapping or sequence protocols.

I think PyO3 can give users the additional control. In my eyes, it's mostly about correctness and optimizations:

  • Accessing a sequence through the C sequence API which is implemented directly (e.g. __getseqitem__ in this PR) avoids creation of an intermediate Python integer for the index. It's only a very tiny effect though.
  • Having default iteration for mappings will generally lead to a KeyError: 0, rather than iter(mapping) immediately raising a TypeError.
  • There may be other cases where C code wants to check whether a type is a mapping or a sequence and might try to do optimizations based off that.
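The `KeyError: 0` point can be modelled in a few lines of plain Rust. Everything here is hypothetical (`Key`, `Error`, `Mapping`, `first_via_iter`); the `has_sequence_slot` flag is an assumption standing in for whether the type also fills the sequence item slot, and `first_via_iter` models `next(iter(obj))` for a type without its own `__iter__`.

```rust
use std::collections::HashMap;

/// Hypothetical key type: the fallback iteration always asks with integers.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Key {
    Int(i64),
    Str(String),
}

/// Hypothetical error type naming the two Python exceptions discussed.
#[derive(Debug, PartialEq)]
enum Error {
    KeyError(Key),
    TypeError(&'static str),
}

struct Mapping {
    entries: HashMap<Key, i32>,
    /// Assumption: true when the type also exposes the sequence item slot.
    has_sequence_slot: bool,
}

impl Mapping {
    /// Models a mapping-style `__getitem__`.
    fn get_item(&self, key: &Key) -> Result<i32, Error> {
        self.entries
            .get(key)
            .copied()
            .ok_or_else(|| Error::KeyError(key.clone()))
    }
}

/// Models fetching the first item through the default iteration.
fn first_via_iter(m: &Mapping) -> Result<i32, Error> {
    if !m.has_sequence_slot {
        // Mapping-only type: the failure is reported up front.
        return Err(Error::TypeError("object is not iterable"));
    }
    // Both protocols: iteration starts, then the lookup of 0 fails.
    m.get_item(&Key::Int(0))
}
```

A mapping keyed by strings fails either way; the difference is only in which error the caller sees and how early.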

That said, I'm pretty convinced that by default we want to have behaviour which matches Python. The downsides to doing so are small, and it's great to have PyO3 behaviour match Python.

I'm now thinking I will split this PR in two, with the first half just being the implement-both-protocols behaviour. We don't need to provide this finer control in PyO3 0.16; we just need to provide coverage of all the protocols so that we can deprecate #[pyproto]. We can then take some time to decide which of the alternatives is better.

@davidhewitt

I just noticed Py_TPFLAGS_MAPPING and Py_TPFLAGS_SEQUENCE, new in Python 3.10.

This makes me think that having #[pyclass(mapping)] and #[pyclass(sequence)] might be a good idea.
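A small sketch of how the two flags could be read, in plain Rust. The bit values are the ones I believe CPython 3.10 defines in `object.h` (`1 << 5` and `1 << 6`); `Kind` and `classify` are hypothetical helpers, and treating "both set" as an error is an assumption based on the flags being documented as mutually exclusive.

```rust
use std::os::raw::c_ulong;

// Assumed to match CPython 3.10's `object.h`.
const PY_TPFLAGS_SEQUENCE: c_ulong = 1 << 5;
const PY_TPFLAGS_MAPPING: c_ulong = 1 << 6;

/// Hypothetical classification of a type by its flags.
#[derive(Debug, PartialEq)]
enum Kind {
    Sequence,
    Mapping,
    Neither,
}

/// Reads the two bits out of a `tp_flags` value.
fn classify(tp_flags: c_ulong) -> Result<Kind, &'static str> {
    let is_sequence = tp_flags & PY_TPFLAGS_SEQUENCE != 0;
    let is_mapping = tp_flags & PY_TPFLAGS_MAPPING != 0;
    match (is_sequence, is_mapping) {
        (true, true) => Err("sequence and mapping flags are mutually exclusive"),
        (true, false) => Ok(Kind::Sequence),
        (false, true) => Ok(Kind::Mapping),
        (false, false) => Ok(Kind::Neither),
    }
}
```

Under this reading, a class option such as the proposed #[pyclass(mapping)] or #[pyclass(sequence)] would map naturally onto setting exactly one of the two bits.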

@davidhewitt

Closing this one, since #2065 has been merged.

@davidhewitt davidhewitt closed this Feb 5, 2022
@davidhewitt davidhewitt deleted the protos-seq-mapping-fallbacks branch February 5, 2022 16:46