-
-
Notifications
You must be signed in to change notification settings - Fork 30
Synchronize cypari2 stack and Python lifecycle #180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
TLDR, with this PR, the lifecycle of cypari2 objects behaves similarly to that of Python objects, while keeping the PARI stack and heap mechanics underneath up to a certain extent (please see the side effect above). |
@tornaria @NathanDunfield @culler Could you have a look please? |
I'm not sure if this is related. But may I take the occasion to pull attention of Pari / Capari2 experts to sagemath/sage#40081. In this PR I try to integrate Khoca as an optional package into Sage. Khoca is mainly written in C++ and uses native Pari (for Smith-form calculations). When I first tried to integrate it (in 2021) I got seg-faults from the Cypari2 interface (explicitely in In the meantime, I have made some improvements to the corresponding PyPI package. In particular, there are now binary wheels for Linux an MacOS. But even now sporadically a I would apreciate if someone can take a look at it and help to bring this ahead. Thank you in advance! |
@soehms Thank you for your comment. I am not an expert but I can try to debug the problem in your PR. I will drop comments in your PR later. |
By the way, I made a mistake about the side effect in my original PR description (which I have edited). As an illustration, consider the following code:
Suppose that the values of Before this PR, it will create the following cypari2 stack: Meanwhile, the value of the stack and heap at the end of the while loop at iterations:
And if we have
Then, And in these variants of case, ~50% of the entire set of all values will live in the stack. I updated my original post to point to this comment for the side effect illustration. |
The reason for such a complication is that in Python, the deallocation of |
Perhaps as a middle ground, we can introduce a new method
Now, in the above example, all variables will live in stack. And in this case, the current PR behavior is arguably better than before this PR because by default for most users, we have no memory leaks (oh, and this PR also fix the doctest memory leaks) and extra code is only needed for more advanced users who really need more performance. |
Oh I accidentally deleted another mitigation of the side effect. We can also mitigated the side effects by lazily constructing the
We can make a change so that during |
Now I think about it again, implementing such lazy construction won't be too difficult. I will upload a separate PR for that once this one gets in good shape. |
@dimpase replying to your email in sage-devel here as per your request, as it is about testing this PR and also @soehms PR: I cloned sage to my local computer (WSL) and ran make. And then, I did It seems that the
during import of cypari2 in sage. |
It could be that you changed cypari2 API, and do Sage's use of cypari2 has to be adjusted? Anyway, I would make sure sagelib is cleaned and uninstalled make sagelib-clean sagelib-uninstall and only then install newer cypari2 and rebuild sagelib with make build If you use the branch from Sage pr 39030 then it might be less onerous, so that uninstall/clean of sagelib is not needed, but I a not sure |
It might be quicker for the purposes of testing to build a binary wheel of the modified cypari2, outside of Sagemath, and install this wheel. I'm lately using |
anyway, you can also build a dist tarfile by doing something like
The lines around this error don't produce a crash interactively, until I hit Ctrl-D to exit, and get
seems to indicate some "fun" memory issues. |
cypari2-2.2.2.tar.gz |
here is the same crash as above, but on Linux. Also, note the 1st line of output (about Gen size changed,)
On linux I see 2 more test failures:
and a very similar
|
@tobiasdiez - with meson build, I keep getting
which is presumably something to do with numpy, but what? Is the state not cleaned by
|
I could narrow the warning to the following import
so it's most probably nothing to do with numpy, but about the interaction between Sage and cypari2 in this module, |
Thank you @dimpase! I will check the segmentation fault and the dist tarball that you created. |
If you use the classical build of Sage, the fully correct way to update a Sage package (and that's what entails testing a cypari2 update with Sage - updating a Sage package) is as follows: put the dist tarball into Sage's upstream/ directory, bumping its name like diff --git a/build/pkgs/cypari/checksums.ini b/build/pkgs/cypari/checksums.ini
index 1bdcba515d5..bbbbf66db7f 100644
--- a/build/pkgs/cypari/checksums.ini
+++ b/build/pkgs/cypari/checksums.ini
@@ -1,4 +1,4 @@
tarball=cypari2-VERSION.tar.gz
-sha1=5d91408a6e28e43d429667554d2696cfbd58c35b
-sha256=13a338735ea221c1068f8fc415561bf777d8c68725702bc749547264fd091720
+sha1=d0ca8921c6b61cc79061e7299d9ba89e60ee31f7
+sha256=295c931468158e40e954b2b4d097091dcf9e8ab2d738c6cb996fbecfd9a552b3
upstream_url=https://files.pythonhosted.org/packages/source/c/cypari2/cypari2-VERSION.tar.gz
diff --git a/build/pkgs/cypari/package-version.txt b/build/pkgs/cypari/package-version.txt
index b1b25a5ffae..93338fcd951 100644
--- a/build/pkgs/cypari/package-version.txt
+++ b/build/pkgs/cypari/package-version.txt
@@ -1 +1 @@
-2.2.2
+2.2.2p1 then rebuild, i.e. run
apparently, |
@dimpase thank you for the instruction! Oh, and also thank you for the segmentation fault report. I was debugging it, and it seems there is some confusion in my code where a pari stack address was thought of as a pari heap address, and it was causing a segmentation fault because we cannot |
@ptrrsn - do you understand the reason for #180 (comment) ? This appears to be platform-specific (doesn't happen on macOS) |
@dimpase Unfortunately, I don't know the reason yet, but I will try to debug it. |
@dimpase As for #180 (comment), I unfortunately could not reproduce the error from the steps that you provided. I first checkout PR 39030, and then I did
In both workflows, I don't see the Gen size change error anymore. But digging into this a bit (sorry for the unnecessary details if you are already aware of it), the number 56 for the old Gen are computed by Cython and baked into Maybe I misunderstand your workflow, could you elaborate what steps did you use in #180 (comment) ? |
@ptrrsn thanks for trying to reproduce this. It might be some kind of caching of object files which is too aggressive and leads to improper re-use. I'll try again, on a cleaner state. |
@dimpase oh, I noticed that with PR 39030, a
and then we have
But after:
we have the timestamp updated
Maybe just a data point, at least the recompilation is working for me during |
OK, apologies, that was indeed my local problem, with multiple copies of cypari2 lying around. After everything was cleaned up, no more weird warnings at startup. (well, the segfault I reported above didn't go away, but I understand it's not fixed here yet) |
This PR attempts to fix #112.
As explained in #112, the main reason for the leak is that the cypari2 stack (not to be confused with PARI stack) holds ownership of the Gen objects. Therefore, the primary way for the Gen objects to be cleaned up is when the cypari2 stack reaches a certain threshold (half the maximum size) that triggers a cleanup.
For example, consider the following example from #112:
The behavior before this PR: on the second iteration, the old value of
M
is not deallocated because the cypari2 stack still holds a reference to it.This PR solves this problem mainly by replacing cypari2 stack's reference to the Gens object to use weakref. In the above example, after this PR, on the second iteration, the old value of
M
will be deallocated because cypari2 only holds a weak reference toM
, not a strong reference.Therefore, the primary change introduced here is to replace
Gen.next
with a weak reference. To accommodate that, the following changes are made:Gen
object deallocations now can happen more often and in any order. When theGen
object is in the middle of cypari2 stack, we need to move everything below it to the heap to avoid the stack being fragmented.gunclone
(hence, nogunclone_deep
), even if it is a container type like matrix or vector. Therefore, an object in the PARI heap only needs a reference count of 1 and never needs to be 2, removing the complications ofref_target
(which can increase ref count by 2). At the Python level, non-default entries are hold inGen.itemcache
of its container and will be deallocated when theitemcache
is cleared. There is an exception for this and it is documented in the comment inside themove_gens_to_heap
method.The PR was tested with
make check
and memory leak checks similar to examples given in #112.As a note, there is a tricky side effect of this PR explained in: #180 (comment)