Problem with default sys.stdout.encoding producing UnicodeEncodeError #2940
Comments
Can you show some code or details where the string comes from and what you do with it? In particular, is there stdin/stdout or files involved? Normal string conversions between Rust and Python are independent of any system encodings. |
Compile the following code:
    use pyo3::{prepare_freethreaded_python, types::PyDict, Python};

    fn main() {
        println!("LANG: {:?}", std::env::var("LANG"));
        prepare_freethreaded_python();
        Python::with_gil(|py| {
            let s1: String = "non_ascii_char-->ß<--".to_string();
            let dict = PyDict::new(py);
            dict.set_item("s1", s1).unwrap();
            py.eval("print(s1)", Some(dict), None).unwrap();
        })
    }
And execute the following commands:
    cargo run -> should execute normally
    LANG="bogus" cargo run -> should panic |
I know that the string conversion between Rust and Python is independent of the used encoding and that was my problem in the first place. If the standard encoding used by Rust (UTF-8) and by Python (depending on the LANG environment variable) differ, this unexpected error will occur. |
The Rust string is not involved here, neither is Rust's string encoding. You get the same exception from this:
or even by importing a second module with the The problem is that Not a PyO3 bug IMHO. |
Well, that's why I marked this as an "enhancement" and not a bug.
I think this is not entirely correct. Rust
This is true; however, this is also the problem. The output encoding is determined automatically from the locale, which is set via the "LANG" environment variable and can select any of a number of encodings. Besides UTF-8, you could also use e.g. "en_GB.iso88591" or "en_GB.iso885915", which are not compatible with UTF-8 either. If it is not defined, Python uses the C / POSIX default locale with the ASCII encoding (related PEP: [3]). PyO3 simply assumes that the shell running the Rust process uses a locale with UTF-8 encoding, or else leaves the task of reconfiguring the IO encoding of the Python interpreter to the user. I would appreciate it if the documentation simply listed this problem and its solution. The current error states the problem (an ordinal with an out-of-range value for the current encoding), but doesn't tell you why. I do not expect the average developer to know about the Python interpreter's locale coercion, nor about the (historic) POSIX defaults. |
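To make the point about the output encoding concrete, here is a minimal Python sketch that does not touch the real stdout. It wraps an in-memory buffer in an `io.TextIOWrapper`, which is the same kind of object `sys.stdout` normally is, and writes the string from the reproduction above with different encodings. The helper name `write_with_encoding` is made up for this illustration and is not part of PyO3 or the Python standard library.

```python
import io


def write_with_encoding(text, encoding, errors="strict"):
    """Write `text` to an in-memory text stream and return the encoded bytes.

    Hypothetical helper: it only imitates a stdout whose encoding was
    chosen from the locale; it does not inspect the real locale.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    stream.write(text)  # the text is encoded here, so errors surface here
    stream.flush()
    return raw.getvalue()


sample = "non_ascii_char-->\u00df<--"  # same string as in the Rust example

# A UTF-8 stream accepts the string.
print(write_with_encoding(sample, "utf-8"))

# An ASCII stream rejects it with the error discussed in this issue.
try:
    write_with_encoding(sample, "ascii")
except UnicodeEncodeError as exc:
    print("ASCII stream failed:", exc.reason)

# A lenient error handler avoids the exception but changes the output.
print(write_with_encoding(sample, "ascii", errors="backslashreplace"))
```

In a real process, the corresponding options would be setting the `PYTHONIOENCODING` environment variable before the interpreter starts, or calling `sys.stdout.reconfigure(encoding="utf-8")` (available since Python 3.7). Whether either is the right default for an embedded interpreter is exactly what this issue is about, so treat this as a workaround sketch rather than a recommendation.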
The Python API that PyO3 uses ( But as I showed, it is also not relevant here: you could have pure-ASCII source code, by encoding the
This is standard Python behavior: try |
HOWEVER: I just realized that with Digging through the code, this is because of this config setting. In an embedded environment initialized by So the actionable item here is (cc @davidhewitt) if we should switch from |
Sorry it's taken me a few days to get to this discussion. Agreed that Rust <-> Python string conversions are irrelevant here; both are Unicode. On the locale, yes, I absolutely agree that we can do better. Python UTF-8 mode was added in Python 3.7 (i.e. our minimum Python version), so I would be in favour of changing PyO3 to support it better. With issues like #2817 and #1741 kicking around, some reworking of our embedded interpreter initialization seems overdue. (Probably better defaults and a more flexible API combined.) Given that PEP 686 has declared that UTF-8 mode will be the default from Python 3.15, I'm personally tempted to suggest that we already make all PyO3 embedded interpreters use UTF-8 mode by default (i.e. set the config setting you link above to Let's make this something to action for 0.19 |
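For readers unfamiliar with UTF-8 mode, the following Python sketch shows its effect on a regular (not embedded) interpreter. It starts a child interpreter with `-X utf8` and an invalid `LANG`, then reports `sys.flags.utf8_mode` and `sys.stdout.encoding`. The helper name `stdout_encoding_in_child` is invented for this example, and the behaviour of a PyO3-embedded interpreter is not shown here, since that depends on how PyO3 initializes Python.

```python
import os
import subprocess
import sys


def stdout_encoding_in_child(extra_args=(), env_overrides=None):
    """Run a child interpreter and return "<utf8_mode> <stdout encoding>".

    Hypothetical helper for illustration only.
    """
    env = dict(os.environ)
    # Remove variables that would mask the effect we want to observe.
    for name in ("PYTHONIOENCODING", "PYTHONUTF8"):
        env.pop(name, None)
    env.update(env_overrides or {})
    code = "import sys; print(sys.flags.utf8_mode, sys.stdout.encoding)"
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# With UTF-8 mode forced on, the bogus locale no longer decides the encoding.
print(stdout_encoding_in_child(["-X", "utf8"], {"LANG": "bogus"}))
```

Setting `PYTHONUTF8=1` in the environment is the equivalent switch when command-line flags are not available.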
I'm not sure that we should switch from not matching |
Quite possibly, yes. I think there's going to be quite a few of these defaults to figure out when implementing "better" Python initialization... |
I just had a really weird bug: every Rust String I passed to Python raised this error as soon as I tried to print or convert it: "UnicodeEncodeError('ascii', 'asdf´´', 4, 6, 'ordinal not in range(128)')"
I have now found what caused this error to occur consistently:
The shell in which the Rust executable was running didn't have the "LANG" environment variable set. This makes the Python interpreter choose the default encoding "ASCII", which is used (among other things) as the output encoding. This error is easily overlooked because everything works fine as long as you only use ASCII characters (UTF-8 is identical to ASCII in that range). It only fails for a character that is not included in ASCII, i.e. one with a code point above 127.
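The failure mode described above can be reproduced in plain Python, without PyO3 or any locale setup. This is a small sketch using the string from the error message; the helper name `describe_encode_failure` is made up for the example.

```python
# Sample strings; the second one matches the error message quoted above.
ascii_only = "asdf"
with_accents = "asdf\u00b4\u00b4"  # 'asdf´´'

# For ASCII-only text, ASCII and UTF-8 produce identical bytes,
# which is why the problem stays hidden until a non-ASCII character appears.
assert ascii_only.encode("ascii") == ascii_only.encode("utf-8")


def describe_encode_failure(text, encoding="ascii"):
    """Return (start, end, reason) of the encoding error, or None on success."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc.start, exc.end, exc.reason
    return None


print(describe_encode_failure(ascii_only))             # encodes fine
print(describe_encode_failure(with_accents))           # fails under ASCII
print(describe_encode_failure(with_accents, "utf-8"))  # encodes fine
```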
Rust Strings are stored as UTF-8 encoded bytes, and the conversion from a Rust String to a Python String apparently hands a pointer to the UTF-8 encoded byte array directly to the Python interpreter to produce a Python String.
Feature Request: Somehow tell the user that the Python output encoding should be UTF-8 (and this means properly setting the "LANG" variable), because otherwise this weird error might occur when dealing with special characters.