Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZoneInfo comparison fails on free-threaded build #125243

Open
ngoldbaum opened this issue Oct 10, 2024 · 5 comments
Open

ZoneInfo comparison fails on free-threaded build #125243

ngoldbaum opened this issue Oct 10, 2024 · 5 comments
Assignees
Labels
topic-free-threading type-bug An unexpected behavior, bug, or error

Comments

@ngoldbaum
Copy link
Contributor

ngoldbaum commented Oct 10, 2024

Bug report

Bug description:

(originally noticed by @davidhewitt in #116738 (comment))

Very infrequently, PyO3 tests for timezone conversions will fail with errors like:

---- conversions::chrono_tz::tests::test_into_pyobject stdout ----
thread 'conversions::chrono_tz::tests::test_into_pyobject' panicked at src/conversions/chrono_tz.rs:120:17:
zoneinfo.ZoneInfo(key='UTC') != zoneinfo.ZoneInfo(key='UTC')

Unfortunately, I don't have an easier way to trigger this besides "run the PyO3 tests over and over until you see a failure", since I think triggering it requires many threads simultaneously using the C API which happens automatically with cargo for the PyO3 tests.

The test code is here:

https://github.com/PyO3/pyo3/blob/eacebb8db101d05ae4ccf0396e314b098455ecd3/src/conversions/chrono_tz.rs#L116-L134

I believe that rust is more or less doing the equivalent of this C code:

#include <Python.h>

int main()
{
  if (Py_IsInitialized() == 0) {
    Py_InitializeEx(0);

    PyObject *mod = PyImport_ImportModule("zoneinfo");

    if (mod == NULL) {
      PyErr_PrintEx(0);
      return -1;
    }

    PyObject *zoneinfo = PyObject_GetAttrString(mod, "ZoneInfo");

    if (zoneinfo == NULL) {
      PyErr_PrintEx(0);
      return -1;
    }

    PyObject *arg = PyUnicode_FromString("Europe/Paris");

    PyObject *obj1 = PyObject_CallOneArg(zoneinfo, arg);
    PyObject *obj2 = PyObject_CallOneArg(zoneinfo, arg);

    PyObject *result = PyObject_RichCompare(obj1, obj2, Py_EQ);

    PyObject_Print(result, stderr, Py_PRINT_RAW);
  }

  return 0;
}

This doesn't fail, which makes me suspect triggering it requires touching the C API in other threads.

CPython versions tested on:

3.13

Operating systems tested on:

macOS

Linked PRs

@ngoldbaum ngoldbaum added the type-bug An unexpected behavior, bug, or error label Oct 10, 2024
@ngoldbaum
Copy link
Contributor Author

To trigger it, have a rust toolchain and Python installed, clone the PyO3 repo and execute the following command:

while RUST_BACKTRACE=1 UNSAFE_PYO3_BUILD_FREE_THREADED=1 cargo test --lib --features=full -- --test-threads=100; do :; done

You may also hit some other test failures related to importing errors in the socket module, I'm opening a separate issue about that.

@colesbury
Copy link
Contributor

I'm having a bit of trouble reproducing this. I first removed the #[cfg(not(Py_GIL_DISABLED))].

When I run:

RUST_BACKTRACE=1 UNSAFE_PYO3_BUILD_FREE_THREADED=1 cargo test test_into_pyobject --features=full

I get:

thread 'conversions::chrono_tz::tests::test_into_pyobject' panicked at src/conversions/chrono_tz.rs:125:53:
called `Result::unwrap()` on an `Err` value: PyErr { type: <class 'ImportError'>, value: ImportError('/scratch/sgross/python-nogil/lib/python3.13t/lib-dynload/_struct.cpython-313td-x86_64-linux-gnu.so: undefined symbol: PyModule_Type'), traceback: Some(<traceback object at 0x2000202ca90>) }

Is that expected?

If I run all the tests, like in your command, I get lots of failures, including:

test buffer::tests::test_array_buffer ... FAILED
...
test conversions::chrono::tests::test_pyo3_offset_fixed_into_pyobject ... FAILED
...
test conversions::chrono::tests::test_pyo3_date_into_pyobject ... FAILED
test conversions::chrono::tests::test_pyo3_timedelta_into_pyobject ... FAILED
test exceptions::socket::tests::gaierror ... FAILED
test conversions::rust_decimal::test_rust_decimal::convert_one ... FAILED
...
test conversions::chrono::tests::test_pyo3_datetime_into_pyobject_fixed_offset ... FAILED
test conversions::chrono::tests::test_pyo3_time_into_pyobject ... FAILED
test exceptions::socket::tests::timeout ... FAILED
....
test types::dict::tests::test_set_item ... ok
error: test failed, to rerun pass `--lib`

Caused by:
  process didn't exit successfully: `/raid/sgross/pyo3/target/debug/deps/pyo3-cf359ba51a0de9f1` (signal: 11, SIGSEGV: invalid memory reference)

Is that expected?

I also tried with the environment variable PYTHON_GIL=1 set. Does that get passed through cargo test? I still see lots of failures and a panic:

PYTHON_GIL=1 RUST_BACKTRACE=1 UNSAFE_PYO3_BUILD_FREE_THREADED=1 cargo test --lib --features=full
...
test buffer::tests::test_array_buffer ... FAILED
...
test exceptions::socket::tests::gaierror ... FAILED
...
test conversions::chrono::tests::test_pyo3_date_into_pyobject ... FAILED
...
test conversions::rust_decimal::test_rust_decimal::convert_two ... FAILED
...
test conversions::chrono::tests::test_pyo3_offset_utc_frompyobject ... FAILED
test exceptions::socket::tests::herror ... FAILED
test conversions::rust_decimal::test_rust_decimal::test_scientific_notation ... FAILED
test conversions::chrono_tz::tests::test_frompyobject ... FAILED
test conversions::rust_decimal::test_rust_decimal::convert_one ... FAILED
test err::tests::warnings ... ok
test conversions::std::time::tests::test_duration_frompyobject_negative ... FAILED
pyo3_runtime.PanicException: new panic
test err::tests::fetching_panic_exception_resumes_unwind - should panic ... ok
test conversions::rust_decimal::test_rust_decimal::convert_one_hundred ... FAILED
test conversions::chrono::tests::test_zoneinfo_is_not_fixed_offset ... FAILED
test conversions::chrono::tests::test_pyo3_timedelta_into_pyobject ... FAILED
test conversions::rust_decimal::test_rust_decimal::convert_zero ... FAILED
Traceback (most recent call last):
test conversions::chrono_tz::tests::test_into_pyobject ... FAILED
test conversions::chrono::tests::test_invalid_types_fail ... FAILED
test conversions::rust_decimal::test_rust_decimal::test_nan ... FAILED
  File "/scratch/sgross/python-nogil/lib/python3.13t/socket.py", line 52, in <module>
    import _socket
ImportError: /scratch/sgross/python-nogil/lib/python3.13t/lib-dynload/_socket.cpython-313td-x86_64-linux-gnu.so: undefined symbol: PyModule_Type
test exceptions::tests::test_check_exception ... FAILED

@colesbury
Copy link
Contributor

@ngoldbaum here's a draft PR that fixes the non-threadsafe cache in _zoneinfo.c. And here's the same change, but on the 3.13 branch if that's helpful.

I'm still having trouble testing this and it looks like the accesses to PyDateTimeAPI_impl in PyO3 are not thread-safe without the GIL:

https://github.com/PyO3/pyo3/blob/50a254dd8431c2eb680184898db3997c0c2cd77e/pyo3-ffi/src/datetime.rs#L611-L627

(I posted this on the wrong issue originally)

colesbury added a commit to colesbury/cpython that referenced this issue Oct 10, 2024
Lock `ZoneInfoType` to protect accesses to `ZONEINFO_STRONG_CACHE`.
Refactor the `tp_new` handler to use Argument Clinic so that we can just
use `@critical_section` annotations on the relevant functions.

Also use `PyDict_SetDefaultRef` instead of `PyDict_SetDefault` when
inserting into the `TIMEDELTA_CACHE`.
@colesbury
Copy link
Contributor

OK, it turns out that it's very important to use a Python installation built with --enable-shared or PyO3's imports of Python modules that use native extensions fail.

@davidhewitt
Copy link
Contributor

Thanks for investigating and the quick turnaround!

OK, it turns out that it's very important to use a Python installation built with --enable-shared or PyO3's imports of Python modules that use native extensions fail.

Yes, there is some crude support for "static embedding" but it's horribly broken. Sorry that sunk time for you, we should probably make it harder to allow that.

I'm still having trouble testing this and it looks like the accesses to PyDateTimeAPI_impl in PyO3 are not thread-safe without the GIL

👍 that looks very likely, we can look into that downstream.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-free-threading type-bug An unexpected behavior, bug, or error
Projects
Development

No branches or pull requests

4 participants