Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2.5/3x Performance regression with PySequence as PyTryFrom since 0.16. #2943

Closed
ritchie46 opened this issue Feb 11, 2023 · 6 comments · Fixed by #2944
Closed

2.5/3x Performance regression with PySequence as PyTryFrom since 0.16. #2943

ritchie46 opened this issue Feb 11, 2023 · 6 comments · Fixed by #2944

Comments

@ritchie46
Copy link

ritchie46 commented Feb 11, 2023

This has been reported downstream in pola-rs/polars#6791

It seems that dispatch to a very simple function has got a lot more expensive. Take this simple program:

use pyo3::prelude::*;
use pyo3::types::PySequence;

#[pyfunction]
fn get_len(obj: &PyAny) -> PyResult<usize> {
    let seq = <PySequence as PyTryFrom>::try_from(obj)?;
    seq.len()
}

#[pymodule]
fn pyo3_example(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(get_len, m)?)?;
    Ok(())
}

And running this snippet with a release build:

$ python -m timeit -n 1000000 -s "import pyo3_example as p; values = [1]" "p.get_len(values)"

This takes:

  • 0.16: best of 5: 92.7 nsec per loop
  • 0.17: best of 5: 255 nsec per loop
  • 0.18; best of 5: 241 nsec per loop

Cargo.toml

[package]
name = "pyo3-example"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[lib]
name = "pyo3_example"
crate-type = ["cdylib"]

[dependencies]
pyo3 = { version = "0.18", features = ["extension-module"] }
@davidhewitt davidhewitt changed the title 2.5/3x Performance regression function dispatch since 0.16. 2.5/3x Performance regression with PySequence as PyTryFrom since 0.16. Feb 11, 2023
@davidhewitt
Copy link
Member

Thanks for the repro. It looks like this is due to the changes to PySequence in 0.17 to make it match collections.abc.Sequence.

We can optimise the common case for lists and tuples. I'll push a PR in a second.

@adamreichold
Copy link
Member

adamreichold commented Feb 11, 2023

While that does not change that impl PyTryFrom for PySequence got slower, note that you did not drop get_pyseq defined in https://github.com/pola-rs/polars/blob/0a4a2c1f28e53a86ccf20aa3f27db5d0fa75c772/py-polars/src/conversion.rs#L68 as I recommended in pola-rs/polars#6531.

You do not appear to be using the PySequence API in that module. So personally, it would seem that there is no reason to perform that downcast in the first place which should also imply better performance than even the previous implementation provided by PyO3.

@ritchie46
Copy link
Author

Right, I did not get back to this @adamreichold, you are right.

Why is going through PySequence slower, is it more generic than going through the object directly?

@adamreichold
Copy link
Member

adamreichold commented Feb 11, 2023

Why is going through PySequence slower, is it more generic than going through the object directly?

I don't think that actually calling methods on PySequence is slower, i.e. the buffer filling you are doing should be as fast as before, but really the checked downcast itself is the only thing that got slower.

The main point is that in your conversion methods, the only method you seem to be calling is PySequence::len which calls PySequence_Size. You can just as well as call PyAny::len which calls PyObject_Size. I would be highly surprised if there are Python types for which these two methods have different performance characteristics. After that your methods seem to rely on PyAny::iter which is not a member of the PySequence interface in the first place.

So long story short, you don't need the PySequence API and the work you do not perform at all is always fastest.

@davidhewitt I am also not sure if special-casing lists and tuples would be sufficient to resolve the regression. While the bug report involved tuples, I suspect that for example NumPy arrays are a reasonably frequent argument type and checking in the same manner for that would require calling into NumPy's capsule API.

@ritchie46
Copy link
Author

Thanks for the rationale @adamreichold. I will drop the PySequence code. 👍

@adamreichold
Copy link
Member

@ritchie46 Maybe to give some context as to why the downcasting got slower: It is more strict now checking that a type actually implements collections.abc.Sequence. Before that, one could downcast types into PySequence which would then yield a lot of errors when calling the individual methods because they were not actually implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants