`2.5/3x` Performance regression with `PySequence as PyTryFrom` since `0.16`. #2943

ritchie46 · 2023-02-11T08:55:56Z

This has been reported downstream in pola-rs/polars#6791

It seems that dispatch to a very simple function has got a lot more expensive. Take this simple program:

use pyo3::prelude::*;
use pyo3::types::PySequence;

#[pyfunction]
fn get_len(obj: &PyAny) -> PyResult<usize> {
    let seq = <PySequence as PyTryFrom>::try_from(obj)?;
    seq.len()
}

#[pymodule]
fn pyo3_example(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(get_len, m)?)?;
    Ok(())
}

And running this snippet with a release build:

$ python -m timeit -n 1000000 -s "import pyo3_example as p; values = [1]" "p.get_len(values)"

This takes:

0.16: best of 5: 92.7 nsec per loop
0.17: best of 5: 255 nsec per loop
0.18: best of 5: 241 nsec per loop

Cargo.toml

[package]
name = "pyo3-example"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[lib]
name = "pyo3_example"
crate-type = ["cdylib"]

[dependencies]
pyo3 = { version = "0.18", features = ["extension-module"] }


davidhewitt · 2023-02-11T09:19:40Z

Thanks for the repro. It looks like this is due to the changes to PySequence in 0.17 to make it match collections.abc.Sequence.

We can optimise the common case for lists and tuples. I'll push a PR in a second.

adamreichold · 2023-02-11T09:32:14Z

While that does not change that impl PyTryFrom for PySequence got slower, note that you did not drop get_pyseq defined in https://github.com/pola-rs/polars/blob/0a4a2c1f28e53a86ccf20aa3f27db5d0fa75c772/py-polars/src/conversion.rs#L68 as I recommended in pola-rs/polars#6531.

You do not appear to be using the PySequence API in that module. So it seems to me that there is no reason to perform that downcast in the first place, and dropping it should perform better than even the previous implementation provided by PyO3.

ritchie46 · 2023-02-11T09:40:04Z

Right, I did not get back to this @adamreichold, you are right.

Why is going through PySequence slower, is it more generic than going through the object directly?

adamreichold · 2023-02-11T09:47:39Z

> Why is going through PySequence slower, is it more generic than going through the object directly?

I don't think that actually calling methods on PySequence is slower, i.e. the buffer filling you are doing should be as fast as before, but really the checked downcast itself is the only thing that got slower.

The main point is that in your conversion methods, the only method you seem to be calling is PySequence::len, which calls PySequence_Size. You can just as well call PyAny::len, which calls PyObject_Size. I would be highly surprised if there are Python types for which these two methods have different performance characteristics. After that, your methods seem to rely on PyAny::iter, which is not a member of the PySequence interface in the first place.

So long story short, you don't need the PySequence API and the work you do not perform at all is always fastest.
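To make the PySequence_Size versus PyObject_Size point concrete, here is a minimal sketch that calls both C-API functions from Python through ctypes. It assumes CPython, since ctypes.pythonapi is CPython-specific. The wrapper names object_size and sequence_size are hypothetical and are not part of PyO3 or Polars.

```python
import ctypes

# ctypes.pythonapi exposes the C API of the running CPython interpreter.
_api = ctypes.pythonapi

# Both functions take a PyObject* and return a Py_ssize_t.
for _name in ("PyObject_Size", "PySequence_Size"):
    _fn = getattr(_api, _name)
    _fn.argtypes = [ctypes.py_object]
    _fn.restype = ctypes.c_ssize_t


def object_size(obj) -> int:
    """Length via PyObject_Size (per the thread, what PyAny::len calls)."""
    return _api.PyObject_Size(obj)


def sequence_size(obj) -> int:
    """Length via PySequence_Size (per the thread, what PySequence::len calls)."""
    return _api.PySequence_Size(obj)


if __name__ == "__main__":
    # Compare the two calls on a few built-in sequence types.
    for value in ([1, 2, 3], (1, 2), "abc"):
        print(type(value).__name__, object_size(value), sequence_size(value))
```

For lists, tuples and strings the two calls return the same length, which is why nothing is lost by skipping the downcast when only the length is needed.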

@davidhewitt I am also not sure if special-casing lists and tuples would be sufficient to resolve the regression. While the bug report involved tuples, I suspect that for example NumPy arrays are a reasonably frequent argument type and checking in the same manner for that would require calling into NumPy's capsule API.

ritchie46 · 2023-02-11T09:54:52Z

Thanks for the rationale @adamreichold. I will drop the PySequence code. 👍

adamreichold · 2023-02-11T09:55:06Z

@ritchie46 Maybe to give some context as to why the downcasting got slower: it is stricter now, checking that a type actually implements collections.abc.Sequence. Before that, one could downcast types into PySequence which would then yield a lot of errors when calling the individual methods because they were not actually implemented.
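The trade-off discussed in this thread, a strict collections.abc.Sequence check with a cheaper path for lists and tuples, can be sketched in plain Python. This only illustrates the idea and is not the code from PyO3 or from #2944; the helper name is_sequence is hypothetical.

```python
from collections.abc import Sequence


def is_sequence(obj) -> bool:
    """Hypothetical helper: strict Sequence check with a list/tuple fast path."""
    # Fast path: list and tuple (and their subclasses) are always sequences,
    # so the abstract base class machinery can be skipped for them.
    if isinstance(obj, (list, tuple)):
        return True
    # Slow path: ask the ABC, which also covers registered virtual subclasses.
    return isinstance(obj, Sequence)


if __name__ == "__main__":
    for value in ([1], (1,), range(3), "ab", {1: 2}, {1, 2}, 42):
        print(type(value).__name__, is_sequence(value))
```

The result is the same as a plain isinstance(obj, Sequence) check; only the cost for the most common argument types changes. Types outside the fast path, such as the NumPy arrays mentioned earlier, still take the slow path.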

ritchie46 mentioned this issue Feb 11, 2023

Performance regression in polars.PySeries.new_opt_f64 pola-rs/polars#6791

Closed

2 tasks

davidhewitt changed the title ~~2.5/3x Performance regression function dispatch since 0.16.~~ 2.5/3x Performance regression with PySequence as PyTryFrom since 0.16. Feb 11, 2023

davidhewitt mentioned this issue Feb 11, 2023

optimize sequence conversion for list and tuple #2944

Merged

ritchie46 mentioned this issue Feb 11, 2023

perf(python): remove PySequence downcast pola-rs/polars#6803

Merged

bors bot closed this as completed in c858ced Feb 16, 2023
