-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory leak in yt's test suite (Python 3.12.0rc1 to 3.12.0rc3) #109602
Comments
I attempted to bisect where this is coming from, but it's challenging since so many of the dependencies break along the 3.11-to-3.12 path. A standalone, simple reproducer would be very helpful here. Perhaps the memray data offers some clues? |
I'm getting there. A simpler case, but still relying on import numpy as np
from yt import load_uniform_grid
def foo():
arr = np.ones((16, 16 ,16))
data = dict(density=arr)
bbox = np.array([[-1, 1], [-1, 1], [-1, 1]])
ds = load_uniform_grid(data, arr.shape, 1.0, bbox=bbox)
data_source = ds.all_data()
field = "density"
min_val, max_val = data_source[field].min() / 2, data_source[field].max() / 2
return min_val, max_val
for i in range(1000):
foo() |
Thanks, this is really helpful. |
All I'm actually able to get from memray is that numpy arrays are the thing we're allocating. So far so good, but what I'd like to know is where memory should be freed but isn't, and I'm not sure that memray can help with that. |
Was this intended for another issue? I'm having trouble connecting the dots (as I don't think ncurses is involved in this bug). |
Oops, so that's where that comment went! Sorry! |
Update: I've spent a couple hours trying to reduce the reproducer and trying to slowly remove yt from it (easier said than done !) I discovered that commenting this single line in yt made the issue disappear So we're not there yet, but that felt like a hint worth sharing. |
Let me share my findings as well. I have a script that prints out "unreachable garbage" at the end of the run (objects where the reference count is larger than Trying to turn that into a pure Python reproducer has also alluded me: simply raising and catching a bunch of exceptions isn't enough, but is probably somehow related to object cycles and/or weak references getting involved as well. I think a pure Python reproducer will unlock the answer, because then we can |
I am getting close to a yt-less reproducer, though for now I am not trying to eliminate numpy from it. Of course having a pure Python reproducer would be even better, but since numpy is a self-contained dependency, I'm wondering if that would be enough. Thoughts @mdboom ? |
Here's my reduced reproducer https://github.com/neutrinoceros/reprod_cpython_bug_109602 In it's current state, it still depends on numpy at runtime and on Cython + setuptools at build time, but yt is out, and we're left with ~100 lines of Python and 3 lines of Cython. I have faith that the remaining dependency on numpy can be eliminated (I think its only effect is to generate big objects), but I wanted to share my progress as removing yt was quite an undertaking. |
I just pushed a new tag |
Thanks a lot, @neutrinoceros. I'll have a look today. |
I think I have found the bug in Cython's new support for 3.12. It will take me a while to put together a proper bug report -- I just wanted to report out that I think I found it so you don't waste too much time further refining. |
I finally isolated this to a reference leak in Cython's exception handling for 3.12, added in cython/cython#5442. It occurs when Cython adds a traceback frame to an exception that was raised in Python code. New minimal reproducer, using the Cython module in https://github.com/neutrinoceros/reprod_cpython_bug_109602: from collections import UserDict
from mini_yt.lib import bar
def main():
d = UserDict()
d["lots of data"] = bytearray(1024**2)
try:
# tries to access d["spam"]
bar(d)
except KeyError:
pass
NLOOPS = 40
for i in range(1, NLOOPS + 1):
main()
print(f"{i}/{NLOOPS}", end="\r") |
Yep, that's what I found, too. I'm in the process of writing up a Cython bug and fix. Would you prefer to take it, @yut23? |
I believe this is the fix [EDITED]: diff --git a/Cython/Utility/Exceptions.c b/Cython/Utility/Exceptions.c
index 20dc7faf3..ba3bd325f 100644
--- a/Cython/Utility/Exceptions.c
+++ b/Cython/Utility/Exceptions.c
@@ -182,10 +182,12 @@ static CYTHON_INLINE void __Pyx_ErrRestoreInState(PyThreadState *tstate, PyObjec
CYTHON_UNUSED_VAR(type);
if (value) {
#if CYTHON_COMPILING_IN_CPYTHON
- if (unlikely(((PyBaseExceptionObject*) value)->traceback != tb))
- #endif
+ if (unlikely(((PyBaseExceptionObject*) value)->traceback != tb)) {
// If this fails, we may lose the traceback but still set the expected exception below.
PyException_SetTraceback(value, tb);
+ Py_XDECREF(tb);
+ }
+ #endif
}
tmp_value = tstate->current_exception;
tstate->current_exception = value; |
Yeah, I'll take care of it later this morning. I think it also needs an extra preprocessor guard and a decref for |
Thanks all for helping to get to the bottom of this. |
Closing this, because it seems to not be a bug in CPython, though feel free to reopen if there's any documentation that should be improved, etc. |
Thanks a lot for your support ! |
Bug report
Bug description:
yt's test suite started crashing when switched to Python 3.12. rc1 was the earliest that we tried, and we're still seeing this with rc3. When run through GitHub Actions, jobs simply exit with
Error: the operation was cancelled
, and we get no traceback.It appeared that this crash can only be reproduced on machines with little RAM available, so we tried profiling memory usage.
Here's an illustrative graph, provided by @yut23, of memory usage VS time for a sample of yt's test suite (all third-party packages in the environment are at the same version in both envs)
Here's an self-contained reproducer script, using memray to trace memory usage
yt uses
weakref
quite a lot, so we suspect that the issue here might be reminiscent of #108295 (this was pointed out by @Xarthisius).CPython versions tested on:
3.12
Operating systems tested on:
Linux, macOS
The text was updated successfully, but these errors were encountered: