Automatically wrap str in a vec![] for Vec<&str> and Vec<String> #2500
IMO it's not ok to silently change behavior when |
@birkenfeld I agree that the user should be somehow notified of this, either through a warning or some error? |
Gotta agree that I'm not a fan of having behavior change by a feature. I know I copied a quote in #2342 which suggested this wrapping, but how about a simpler solution: just add a check when extracting Without specialization we'd have to pay that cost even for non-text types like That would at least hopefully work on stable...? |
@davidhewitt The only way I see it could be implemented on the stable channel would be to use I have tried multiple tricks, but none seemed to work. The implementation could look something like this:
impl<'a, T> FromPyObject<'a> for Vec<T>
where
    T: FromPyObject<'a>,
{
    fn extract(obj: &'a PyAny) -> PyResult<Self> {
        let ti = TypeId::of::<T>(); // Does not work
        if (ti == TypeId::of::<String>() || ti == TypeId::of::<&'static str>())
            && obj.is_instance_of::<PyString>()?
        {
            // Raise some error;
        }
        extract_sequence(obj)
    }
}
|
Do we need the
if let Ok(true) = obj.is_instance_of::<PyString>() {
    return Err(PyValueError::new_err("Can't extract `str` to `Vec`"))
}
extract_sequence(obj)
Just depends how much that affects performance I think? |
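To make the proposed guard concrete, here is a minimal pure-Python model of the same logic. It is only an illustration under stated assumptions: extract_vec is a hypothetical name, not a PyO3 function, and list(obj) merely stands in for extract_sequence.

```python
def extract_vec(obj):
    """Hypothetical stand-in for the Vec<T> extraction; not part of PyO3."""
    # Reject str up front: it is a valid sequence, but splitting it into
    # one-character items is rarely what the caller intended.
    if isinstance(obj, str):
        raise ValueError("Can't extract `str` to `Vec`")
    # Any other iterable is converted element by element.
    return list(obj)
```

With this model, extract_vec(["a", "b"]) returns the list unchanged, while extract_vec("ab") raises instead of producing two single-character items. The check runs for every element type, which is the cost the thread is weighing.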
The idea behind that kind of type checking (either with TypeId |
@davidhewitt Here is a small benchmark I ran and its results: (Don't mind about the
impl<'a, T> FromPyObject<'a> for Vec<T>
where
    T: FromPyObject<'a>,
{
    fn extract(obj: &'a PyAny) -> PyResult<Self> {
        if let Ok(true) = obj.is_instance_of::<PyString>() {
            return Err(PyValueError::new_err("Can't extract `str` to `Vec`"));
        }
        extract_sequence(obj)
    }
}

use pyo3::prelude::*;

#[pyfunction]
fn print_strings(strings: Vec<String>) -> usize {
    strings.len()
}

#[pyfunction]
fn print_str(strings: Vec<&str>) -> usize {
    strings.len()
}

#[pyfunction]
fn print_int(strings: Vec<isize>) -> usize {
    strings.len()
}

#[pyfunction]
fn print_char(strings: Vec<char>) -> usize {
    strings.len()
}

#[pymodule]
fn fromstr(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(print_strings, m)?)?;
    m.add_function(wrap_pyfunction!(print_str, m)?)?;
    m.add_function(wrap_pyfunction!(print_int, m)?)?;
    m.add_function(wrap_pyfunction!(print_char, m)?)?;
    Ok(())
}

from fromstr import *
from timeit import timeit

if __name__ == "__main__":
    number = 100000
    i = list(range(10))
    c = list("abcdefg")
    s = "some random sentence".split(" ")
    print(timeit(lambda: print_strings(s), number=number) / number)
    print(timeit(lambda: print_str(s), number=number) / number)
    print(timeit(lambda: print_char(c), number=number) / number)
    print(timeit(lambda: print_int(i), number=number) / number)

Without the
|
Hmm, those I did some benchmarking myself, my rough conclusion is that on my machine extracting an empty Extracting 10 Seems ok to me for correctness? |
I have to agree that |
Agreed. Correctness has to be the top priority, and as new compiler functionality emerges (and new language patterns) we can refine implementations to be more efficient. A few things that this needs before ready for merge:
|
Closes PyO3#2342 Refactor to only raise an error based on `isinstance`
6b01925 to 433bdd8
@davidhewitt that should all be done now :-) |
Looks great to me, thanks for iterating on this!
It was a pleasure, thank you for your time! |
Hi all, I realize I'm leaving this comment many months after this was landed, but... Is there a reason ValueError was chosen here? TypeError seems more appropriate to me, and I wanted to ask if there was a reason it was rejected. |
@alex that's a reasonable suggestion; I didn't have a strong opinion when reviewing, however If this behaviour is useful to you, would you be willing to also join the discussion in #2632? The proposal there would remove this behaviour, and I'm undecided as I see both sides of the debate, so I would value having more user input. |
Sure, happy to do a PR to switch to TypeError :-) |
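As background to the ValueError versus TypeError question, Python's own built-ins conventionally raise TypeError when an argument has the wrong type, and ValueError when the type is acceptable but the value is not. A small sketch of that convention (classify is a hypothetical helper, unrelated to PyO3):

```python
def classify(call):
    """Return the name of the exception a zero-argument callable raises, or None."""
    try:
        call()
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None

# Wrong type: an int has no length.
wrong_type = classify(lambda: len(5))
# Right type (str), but the value cannot be parsed as an integer.
wrong_value = classify(lambda: int("abc"))
```

Passing a str where a list of strings is expected resembles the first case more than the second, which is the argument made for TypeError above.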
EDIT
After a few discussions (see the conversation), it was proposed to raise an error when a str would otherwise be split into a Vec<T>. Thus, only a dynamic type check is done, with PyAny::is_instance_of::<PyString>().
Old proposition
As discussed in #2342, this PR proposes a way to wrap str arguments inside a vector to avoid iterating through chars when that is not desired.
Currently, only Vec<String> is working, as I cannot manage to compile for Vec<&str> (the code was commented out).
This "solution" relies on the specialization feature, which is currently unstable. As such, it must be compiled with the nightly feature activated, using the nightly channel of the Rust compiler.
Another solution, as discussed in #2342, would be to throw an error instead of wrapping in a vector.
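For readers unfamiliar with the underlying issue, this short Python sketch shows why a str passed where a sequence is expected ends up split into characters: a str is itself an iterable of one-character strings. The helper wrap_str is hypothetical and only illustrates the old wrapping idea; it is not the PR's Rust code.

```python
# A str is an iterable of one-character strings, so treating it as a
# generic sequence silently splits it.
chars = list("abc")

def wrap_str(obj):
    """Hypothetical helper mirroring the old proposal: wrap a lone str in a list."""
    if isinstance(obj, str):
        return [obj]
    return list(obj)
```

Here chars holds three one-character strings, whereas wrap_str("abc") keeps the string whole as a single-element list and leaves real lists untouched. The silent change of behaviour that wrapping implies is what the conversation objected to.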
Demo