sageWithDoc: python 3.12 fixes #323426

Merged
merged 1 commit on Jul 8, 2024

Conversation

collares
Member

@collares collares commented Jun 29, 2024

Description of changes

GCC 13.3.0 seems to exploit undefined behaviour around setjmp/longjmp to perform optimizations that completely break Sage's GAP interface. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114872 for the upstream investigation and https://trofi.github.io/posts/312-the-sagemath-saga.html for @trofi's analysis. This only affects Python 3.12.

Unfortunately, the minimal upstream patch at sagemath/sage#37951, which was supposed to address the issue, is not enough to fix it on Nixpkgs: GCC still emits code in which an always-null pointer is dereferenced with no null check. Arch is also seeing crashes even with the upstream patch applied. As a stopgap, we compile the relevant file with -O1 (see gap-element-crash.patch).

I also tried other patches similar to sagemath/sage#37951, such as adding volatile v0, v1 and v2 variables and populating the relevant ones in the 1-arg, 2-arg and 3-arg cases (see here), but that makes crashes more frequent. Presumably, GCC can then easily infer that __pyx_t_4 (a temporary used in Cython-generated code to hold expressions of the form <GapElement>a[i]) is always non-null when jumping to the error handling cleanup code without longjmp. It then removes the null check, but it also performs other optimizations which notice that __pyx_t_4 is always zero. The case where we jump to the error handling code after longjmp, which is what Sage tests, is undefined behaviour and leads to a crash.
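For readers unfamiliar with the underlying rule, here is a minimal C sketch of it. This is not Sage's or Cython's actual code; `error_env`, `failing_call`, `run_with_cleanup` and `usage_example` are hypothetical names chosen for illustration. The C standard says that a non-volatile automatic variable modified between setjmp and longjmp has an indeterminate value after the jump, so a local that is read in the error path needs the volatile qualifier.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical stand-in for the jump buffer used around calls into a
   library that reports errors via longjmp. */
static jmp_buf error_env;

/* Simulates a library call that fails by jumping back to the setjmp site. */
static void failing_call(void) {
    longjmp(error_env, 1);
}

/* Returns the value of `stage` as seen from the error path. */
int run_with_cleanup(void) {
    /* `volatile` matters here: `stage` is modified after setjmp and read
       after longjmp. Without the qualifier its value in the error path
       would be indeterminate, and the optimizer may assume anything. */
    volatile int stage = 0;

    if (setjmp(error_env) != 0) {
        /* Error path, reached only through longjmp. */
        return stage;
    }

    stage = 1;       /* modified between setjmp and longjmp */
    failing_call();  /* does not return normally */
    return -1;       /* not reached in this sketch */
}

/* Usage: the error path observes the update made before the jump. */
void usage_example(void) {
    assert(run_with_cleanup() == 1);
}
```

Dropping the volatile qualifier in this sketch would put the read in the error path into the same category of problem described above, although whether a given compiler version miscompiles it depends on the optimizations it applies.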

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.


@collares collares changed the title [staging-next] sageWithDoc: remove obsolete dep, import python 3.12 compat patch [staging-next] sageWithDoc: python 3.12 fixes Jun 29, 2024
@collares collares force-pushed the sage-python-312 branch 3 times, most recently from dc04e82 to edc3e6d Compare June 29, 2024 19:16
@collares collares force-pushed the sage-python-312 branch 2 times, most recently from c2c040d to 31f320d Compare July 1, 2024 07:41
@vcunat vcunat changed the base branch from staging-next to master July 6, 2024 07:06
@vcunat vcunat changed the title [staging-next] sageWithDoc: python 3.12 fixes sageWithDoc: python 3.12 fixes Jul 6, 2024
@collares collares marked this pull request as ready for review July 8, 2024 19:27
+
+cdef extern from *:
+ """
+ #pragma GCC optimize("O1")
Member Author

This is a big hack, but Sage is missing a lot of volatile variable qualifiers and sagemath/sage#37951 wasn't enough to stop the crashes with Python 3.12 and the newest GCC. I will keep investigating this, but this seems like an acceptable compromise for now. I don't think clang implements the exact same optimizations, so a GCC-only fix seems OK.
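As a rough illustration of the idea (a sketch under assumptions, not the contents of gap-element-crash.patch), the pragma can be restricted to real GCC and to a limited region of a C file. The guard macro check and the use of push_options/pop_options are additions made for this example, and `sum_to` is a hypothetical placeholder for the affected code.

```c
#include <assert.h>

/* clang also defines __GNUC__, so exclude it explicitly to keep the
   workaround GCC-only. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
#pragma GCC optimize("O1")   /* lower the optimization level from here on */
#endif

/* Hypothetical placeholder for the code that must not be optimized
   aggressively. Computes 1 + 2 + ... + n. */
int sum_to(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options      /* restore the command-line optimization level */
#endif
```

Without the push_options/pop_options pair, the optimize pragma applies to every function defined after it in the translation unit, which is the file-wide effect the stopgap relies on.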

@@ -171,6 +173,7 @@ buildPythonPackage rec {
lrcalc-python
matplotlib
memory-allocator
meson-python
Member Author

I think numpy should be propagating this because it is required to use numpy.f2py (see https://numpy.org/doc/stable/f2py/buildtools/distutils-to-meson.html), but in this case I think it should also make the ninja binary accessible. In the meantime, I will add meson-python here and ninja to Sage's runtime path.

(cc @mweinelt)

Member

@7c6f434c 7c6f434c left a comment

Looks good on reading through

@collares collares merged commit a7514e3 into NixOS:master Jul 8, 2024
29 of 34 checks passed
@collares
Member Author

collares commented Jul 8, 2024

Thanks for the review!
