Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cpython 0.6.0 SIGSEGVs and test failures on s390x / System Z (big endian) #265

Open
decathorpe opened this issue Jul 31, 2021 · 5 comments

Comments

@decathorpe
Copy link
Contributor

I tried updating the Fedora Linux packages for rust-cpython from version 0.5.2 to 0.6.0, but there are a lot of new issues with test failures on s390x architecture (the only big-endian architecture we have access to).

With python 3.9, there are these new test failures compared to 0.5.2:

failures:
---- objectprotocol::test::test_debug_string stdout ----
thread 'objectprotocol::test::test_debug_string' panicked at 'Unknown PyUnicode_KIND', src/objects/string.rs:308:22
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---- objectprotocol::test::test_display_string stdout ----
thread 'objectprotocol::test::test_display_string' panicked at 'Unknown PyUnicode_KIND', src/objects/string.rs:308:22
---- objects::string::test::test_extract_lone_surrogate stdout ----
thread 'objects::string::test::test_extract_lone_surrogate' panicked at 'Unknown PyUnicode_KIND', src/objects/string.rs:308:22
---- objects::string::test::test_extract_lone_surrogate_lossy stdout ----
thread 'objects::string::test::test_extract_lone_surrogate_lossy' panicked at 'Unknown PyUnicode_KIND', src/objects/string.rs:308:22
---- objects::string::test::test_extract_umlaut stdout ----
thread 'objects::string::test::test_extract_umlaut' panicked at 'Unknown PyUnicode_KIND', src/objects/string.rs:308:22
failures:
    objectprotocol::test::test_debug_string
    objectprotocol::test::test_display_string
    objects::num::test::test_u64_max
    objects::string::test::test_extract_lone_surrogate
    objects::string::test::test_extract_lone_surrogate_lossy
    objects::string::test::test_extract_umlaut
test result: FAILED. 81 passed; 6 failed; 0 ignored; 0 measured; 4 filtered out; finished in 0.01s

With Python 3.10, there are additional failures, these two I have already investigated and traced back to changes in Python 3.10 itself:

These two tests fail too, but I do not get output from cargo:

test objectprotocol::test::test_debug_string ... FAILED
test objectprotocol::test::test_display_string ... FAILED

Because it looks like the test threads / processes crash with segmentation faults:

error: test failed.
Caused by:
  process didn't exit successfully: `./target/release/deps/cpython-abe96c0dffd595a2 --skip 'objects::num::test::float_to_' --skip src/objects/capsule.rs` (signal: 11, SIGSEGV: invalid memory reference)
  process didn't exit successfully: `./target/release/deps/test_class-d070ffed372516e4 --skip 'objects::num::test::float_to_' --skip src/objects/capsule.rs` (signal: 11, SIGSEGV: invalid memory reference)
  process didn't exit successfully: `./target/release/deps/test_sharedref-2ff6a3943eb85a79 --skip 'objects::num::test::float_to_' --skip src/objects/capsule.rs` (signal: 11, SIGSEGV: invalid memory reference)

For now, I think I'll have to skip running the test suite on s390x entirely.

@markbt
Copy link
Collaborator

markbt commented Aug 11, 2021

Thanks for the report!

The PyUnicode_KIND errors look concerning. It seems kind is a bitfield, so it must be packed differently so that the Rust version of PyUnicode_KIND is wrong on this platform. I don't have access to a big-endian platform to try things out on, but it would be good to work out a safe way to access C bitfields here.

@decathorpe
Copy link
Contributor Author

I only have very limited access to s390x outside our automated build environment myself, but the error can also be reproduced in emulated s390x machines in QEMU. It's slow, but I can do that on my own machine. So if you need me to test some things, I can do that. But with stuff like mapping C bitfields to Rust, I'm way out of my depth ... maybe this could be a good question to ask on users.rust-lang.org?

@decathorpe
Copy link
Contributor Author

I think I figured out the problem while looking at a very similar failure in pyo3.

There's bitshift and bitwise-and operations happening here:
https://github.com/dgrunwald/rust-cpython/blob/master/python3-sys/src/unicodeobject.rs#L497

Since the byte order of that u32 is different on big-endian vs. little-endian architectures, the result of this function is wrong on big-endian architectures. This bitmask operation probably needs to be done more generically, or with two different code paths for big/little endian architectures ...

@dgrunwald
Copy link
Owner

Both C and Rust use a u32 to access that field, so they should both use the same byte order.
I guess the problem is rather that s390x uses a different bit order within bitfields.

I have no idea how to access a C bitfield in a portable manner from Rust.

dgrunwald added a commit that referenced this issue Sep 1, 2021
Python has three ways of obtaining an integer from an object:
1) directly calling `PyLong_AsLong`
  * uses `__index__` if available
  * in Python 3.9 and ealier, but not in 3.10, uses `__int__` if `__index__` is unavailable
  * throws `TypeError` if the conversion is not possible
2) using `PyNumber_Long`
  * uses `__int__`
  * may throw other exceptions like `ValueError`, e.g. when converting from empty string
3) using `PyNumber_Index`
  * always uses `__index__` with any Python version
  * throws `TypeError` if the conversion is not possible

Previously, the behavior of rust-python was inconsistent: small integer types (e.g. `i32`) used approach 1; where as larger integer types (e.g. `i64`) used approach 2.
This this commit, all integer types consistently use approach 3.

This is a breaking change: extension methods declared with `arg: i32` parameters no longer accept Python floating-point values.
This fixes `objects::num::test::float_to_*` test failures with Python 3.10-rc1 (#265).
dgrunwald added a commit that referenced this issue Sep 1, 2021
Python has three ways of obtaining an integer from an object:
1) directly calling `PyLong_AsLong`
  * uses `__index__` if available
  * in Python 3.9 and ealier, but not in 3.10, uses `__int__` if `__index__` is unavailable
  * throws `TypeError` if the conversion is not possible
2) using `PyNumber_Long`
  * uses `__int__`
  * may throw other exceptions like `ValueError`, e.g. when converting from empty string
3) using `PyNumber_Index`
  * always uses `__index__` with any Python version
  * throws `TypeError` if the conversion is not possible

Previously, the behavior of rust-cpython was inconsistent: small integer types (e.g. `i32`) used approach 1; where as larger integer types (e.g. `i64`) used approach 2.
With this commit, all integer types consistently use approach 3.

This is a breaking change: extension methods declared with `arg: i32` parameters no longer accept Python floating-point values.
This fixes `objects::num::test::float_to_*` test failures with Python 3.10-rc1 (#265).
@decathorpe
Copy link
Contributor Author

I was recently able to fix this in PyO3 with this PR: PyO3/pyo3#3015

You might be able to do something similar here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants