RFC: Fix #2615 by relaxing the type check in extract_sequence. #2620

adamreichold · 2022-09-13T18:56:17Z

I am somewhat at a loss on how to properly test this without adding NumPy as a Python-side dependency and would be grateful for any advice here? (Maybe writing a custom Python class that implements only PySequence_Length and PySequence_Length and PyObject_GetIter?)

I am also not sure whether PyMapping needs a similar treatment but the conversions for HashMap seem to go through PyDict instead?

adamreichold · 2022-09-13T20:06:52Z

After staring at this for a bit longer, I noticed that neither PyObject::len nor PyObject::iter are actually part of PySequence and we could be even more lenient by extracting a Vec from everything that can be turned into an iterator at all. I am not sure if this is covering more types than we actually want it to cover though.

adamreichold · 2022-09-18T09:50:13Z

I am somewhat at a loss on how to properly test this without adding NumPy as a Python-side dependency and would be grateful for any advice here?

I think a nice and simple way could be to test this in the rust-numpy repository where I usually have a draft PR targeting the current Git version of PyO3 sitting around anyway, c.f. PyO3/rust-numpy#353 which adds a test case of extracting a Vec from a one-dimensional NumPy array which I could merge after this or a similar fix is released on crates.io.

This of course still leaves the question open whether we want extraction to be as relaxed as this PR would make, i.e. not checking for sequences at all but using anything that can be turned into an iterator?

davidhewitt · 2022-09-18T16:38:21Z

Sorry for the long delay in reply. Been swamped IRL and just not had a chance to get my head into this one.

It seems ok to add numpy to the noxfile for the pytests crate here, perhaps? I think it's better to add the test in this repo rather than rust-numpy if possible, else we could merge stuff which breaks this again and only notice on future rust-numpy CI runs.

As for relaxing the type check this far... I'm a little unsure, though in practice this is likely to match what the previous implementation did? I am tempted to look at what pybind11 does and see if it makes sense for us to be similar.

adamreichold · 2022-09-18T17:27:55Z

As for relaxing the type check this far... I'm a little unsure, though in practice this is likely to match what the previous implementation did? I am tempted to look at what pybind11 does and see if it makes sense for us to be similar.

From a quick glance, they are doing what we did before 0.17.0, i.e. they check for a sequence (but not str or bytes) and then iterate to fill a std::vector, c.f. https://github.com/pybind/pybind11/blob/1874f8fa8767179ab8f8c645a6e8c8c4e4a7c486/include/pybind11/stl.h#L194 and https://github.com/pybind/pybind11/blob/1874f8fa8767179ab8f8c645a6e8c8c4e4a7c486/include/pybind11/stl.h#L145 where checking for a sequence seems to be based on PySequence_Check, i.e. https://github.com/pybind/pybind11/blob/1874f8fa8767179ab8f8c645a6e8c8c4e4a7c486/include/pybind11/pytypes.h#L1985

So this would correspond to the first commit pushed here (modulo the extra check for bytes which we lack). So should I drop the second commit and try to add a pytest based on NumPy? And maybe also add the check to not extract bytes into a Vec?

Personally, I am sort of endeared with the generic version with the reasoning that the Rust is the one that fixes the types and the Python side is the malleable one that tries to fit in.

Meaning if the Rust side asks for a Vec, anything on the Python side that can be turned into this at the same level of efficiency should be fair game. If one really wants to have some sort of precedence here (like handling list[char]/Vec<char> and str/PyString differently), I would try to make this explicit on the Rust side using an enum like

#[derive(FromPyObject)]
enum StringOrChars {
  String(PyString),
  Chars(Vec<char>),
}

which should work even if Vec<char> could be extracted from PyString if I understand things correctly.

At least I don't really have an example I mind where making the extraction fail on the Rust side would make something possible that would not be if it succeeded. But then again, that explicit check for str probably had a rationale that I just cannot remember ATM.

adamreichold · 2022-09-18T17:35:30Z

But then again, that explicit check for str probably had a rationale that I just cannot remember ATM.

So by now, I found #2342 and #2500 and I think even here I would argue that the idiomatic Python solution

if isinstance(strings, str):
    strings = [strings]

could be turned into

#[derive(FromPyObject)]
enum OneOrMany<'a> {
  One(&'a PyString),
  Many(Vec<&'a PyString>),
}

impl Into<Vec<&'a PyString>> for OneOrMany {
  fn into(self) -> Vec<&'a PyString> {
    match self {
      Self::One(one) => vec![one],
      Self::Many(many) => many,
    }
  }
}

adamreichold · 2022-09-18T18:12:53Z

So should I drop the second commit and try to add a pytest based on NumPy?

Added the pytest and verified that both commits would make it green.

adamreichold · 2022-09-18T19:28:18Z

So should I drop the second commit and try to add a pytest based on NumPy?

Added the pytest and verified that both commits would make it green.

There are no binary wheels for NumPy at least on Windows/PyPy, so I restricted that test case to Linux/CPython since one combination should be enough to test this particular functionality.

This allows us to handle types like one-dimensional NumPy arrays which implement just enough of the sequence protocol to support the extract_sequence function but are not an instance of the sequence ABC.

…:iter actually need the sequence protocol.

davidhewitt · 2022-09-21T04:07:04Z

Sorry to have been so slow to reply here. Been busy IRL and needed some time to mull this over.

Personally, I am sort of endeared with the generic version with the reasoning that the Rust is the one that fixes the types and the Python side is the malleable one that tries to fit in.

I think I'm coming round to this argument too. I am flip-flopping on whether #2500 was a mistake. If we revert it, I guess we have to wait until 0.18?

I was thinking about releasing 0.17.2, so should we split this fix in two pieces:

change back to using ffi::PySequence_Check here
revert Automatically wrap str in a vec![] for Vec<&str> and Vec<String> #2500 in a follow up PR

That way I can cherry pick just the first change for 0.17.2.

I'm still a bit murky on whether that's definitely correct, as it'd be nice to get the HashMap and Vec conversions to have "equivalent" behaviour, however in the meanwhile I think going back to pre-0.17 behaviour might be best for users before we make a more deliberate change?

I think the main thing we lost from the 0.17 change was accepting np.ndarray containing objects? Was that so bad? An alternative, just to put it out there, could be to defend the check to be based on the ABC to be correct, and encourage users to have an enum or since kind of new type if they want to accept numpy arrays.

adamreichold · 2022-09-21T07:46:16Z

I was thinking about releasing 0.17.2, so should we split this fix in two pieces:

Since the change is already split into multiple commits here, I think we could already cherry-pick the second commit which basically restores pre-0.17 behaviour for impl FromPyObject for Vec<T> and does nothing else.

The second commit then further relaxes impl FromPyObject for Vec<T> to work with anything that can be turned into an iterator (whether it passes PySequence_Check or not). It however does touch upon the PyString special case introduced by #2500.

To make things less messy, I will open up two new PR, one containing the test and the first change. And another one containing the second commit and a removal of the PyString special case. Both with appropriate changelog entries.

adamreichold · 2022-09-21T08:25:11Z

These two PR, #2631 and #2632, are now pending review.

adamreichold force-pushed the relax-extract-sequence branch from ad841d8 to 39c990b Compare September 13, 2022 20:16

adamreichold linked an issue Sep 13, 2022 that may be closed by this pull request

val ndarray object cannot be converted to Sequence #2615

Closed

Base automatically changed from xtask-drop-workspace to main September 15, 2022 05:12

adamreichold force-pushed the relax-extract-sequence branch from 39c990b to ebac9f5 Compare September 18, 2022 18:12

adamreichold force-pushed the relax-extract-sequence branch 2 times, most recently from 56022a2 to cdff4bb Compare September 18, 2022 19:27

adamreichold added 3 commits September 19, 2022 19:41

Add a Python test exploring from which types a Vec can be extracted.

a1b34f0

Fix #2615 by relaxing the type check in extract_sequence.

076efc7

This allows us to handle types like one-dimensional NumPy arrays which implement just enough of the sequence protocol to support the extract_sequence function but are not an instance of the sequence ABC.

Further relax extract_sequence as neither PyObject::len nor PyObject:…

c2f1810

…:iter actually need the sequence protocol.

adamreichold force-pushed the relax-extract-sequence branch from cdff4bb to c2f1810 Compare September 19, 2022 17:41

adamreichold closed this Sep 21, 2022

adamreichold deleted the relax-extract-sequence branch September 21, 2022 08:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Fix #2615 by relaxing the type check in extract_sequence. #2620

RFC: Fix #2615 by relaxing the type check in extract_sequence. #2620

adamreichold commented Sep 13, 2022

adamreichold commented Sep 13, 2022 •

edited

Loading

adamreichold commented Sep 18, 2022

davidhewitt commented Sep 18, 2022

adamreichold commented Sep 18, 2022

adamreichold commented Sep 18, 2022 •

edited

Loading

adamreichold commented Sep 18, 2022

adamreichold commented Sep 18, 2022

davidhewitt commented Sep 21, 2022

adamreichold commented Sep 21, 2022

adamreichold commented Sep 21, 2022

RFC: Fix #2615 by relaxing the type check in extract_sequence. #2620

RFC: Fix #2615 by relaxing the type check in extract_sequence. #2620

Conversation

adamreichold commented Sep 13, 2022

adamreichold commented Sep 13, 2022 • edited Loading

adamreichold commented Sep 18, 2022

davidhewitt commented Sep 18, 2022

adamreichold commented Sep 18, 2022

adamreichold commented Sep 18, 2022 • edited Loading

adamreichold commented Sep 18, 2022

adamreichold commented Sep 18, 2022

davidhewitt commented Sep 21, 2022

adamreichold commented Sep 21, 2022

adamreichold commented Sep 21, 2022

adamreichold commented Sep 13, 2022 •

edited

Loading

adamreichold commented Sep 18, 2022 •

edited

Loading