h5py (+PyTorch) memory leak since hdf5 1.12.1 #1256
The problem might be linked to this specific change in 98a27f1. After the change, it looks like this, and then it leaks:

Lines 3433 to 3438 in 98a27f1
Lines 3349 to 3351 in 4aa4036

Another potential clue: after the change, some functions are called considerably more often. Here is a diff of the call counts:

--- calls with leak
+++ calls without leak
   2501 H5T__own_vol_obj
   2501 H5VL_free_object not yet free 1
-  7502 H5VL_free_object not yet free 2
-  5002 H5VL_free_object not yet free 3
  12505 H5VL_free_object was free 0
- 38146 H5T_copy
+ 18154 H5T_copy
  15003 H5T_copy_reopen
- 53149 H5T__initiate_copy
+ 33157 H5T__initiate_copy

I do not understand why this ref counting change would have to lead to more copies.
Hello, I've hit the same issue using the C++ library (triggered by the same commit 98a27f1). Here are some additional findings in the event it helps:
Here is a minimal reproduction in the form of a C++ test:

static void
test_string_attr_many()
{
    SUBTEST("Write a string many times");

    // Variable-length string type and scalar dataspace for the attribute
    StrType   vls_type(0, H5T_VARIABLE);
    DataSpace att_space(H5S_SCALAR);

    for (size_t i = 0; i < 10000; ++i) {
        // Recreate the file and write the same variable-length string
        // attribute on the root group on every iteration
        H5File    fid1("file.h5", H5F_ACC_TRUNC);
        Group     root      = fid1.openGroup("/");
        Attribute gr_vlattr = root.createAttribute("attribute", vls_type, att_space);
        gr_vlattr.write(vls_type, std::string("value"));
    }

    PASSED();
}

Reverting the ref count change from 98a27f1 makes the leak go away in this test.
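For anyone who wants to check the same pattern from Python, this is a rough h5py analogue of the C++ test above. It is only a sketch: the file and attribute names are illustrative and it is not taken from the original reports.

import h5py

# Variable-length string type, matching StrType(0, H5T_VARIABLE) in the C++ test
str_dtype = h5py.string_dtype()

for i in range(10000):
    # Truncate and recreate the file on every iteration, like H5F_ACC_TRUNC
    with h5py.File("file.h5", "w") as f:
        # Write a scalar variable-length string attribute on the root group
        f.attrs.create("attribute", "value", dtype=str_dtype)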
It's marked for the next 1.14.4 release. It is also relevant for one of our projects that started recently, so it would be good to see it fixed.
@gvtulder @plule-ansys @tacaswell Hi all, this issue is a bit old now; is this still a problem with the latest releases? I looked through the h5py codebase and it appears to expose the relevant API, but from a brief read of the h5py docs I couldn't find any reference to it. @tacaswell, are you able to comment on whether h5py has any information on this?
I have not had time to dig into this, but my expectation is that h5py should be doing the copy before handing back to the (Python space) user or we need to extend our object tear down logic. Either way, it should not be on the end user to remember to do this.
Thanks for the response! I plan to try to prototype this, but if this is something that h5py can manage so that the end user doesn't have to worry about it, that would be fantastic. I'm less familiar with Python, but I received some advice during a separate discussion pointing me towards context managers, which look like they could be useful for this purpose. As long as h5py is still in control of the underlying C buffer when the Python side is finished with it, then I imagine h5py could be updated to call the appropriate free routine itself at that point.
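To make the context-manager idea concrete, here is a minimal Python sketch of the pattern being discussed. The names (owned_buffer, alloc, free and the commented usage) are invented for illustration; none of this is existing h5py API.

from contextlib import contextmanager

@contextmanager
def owned_buffer(alloc, free):
    # Acquire a buffer (for example, one the HDF5 library allocated for a
    # variable-length read) and guarantee it is released when the caller
    # is done with it.
    buf = alloc()
    try:
        yield buf      # the caller works with the buffer inside the `with` block
    finally:
        free(buf)      # always runs, even if the caller raises

# Hypothetical usage:
# with owned_buffer(read_vlen_attribute, free_vlen_buffer) as buf:
#     value = decode(buf)

The point is simply that the release happens in one place, inside h5py, rather than being something the end user has to remember.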
@jhendersonHDF Thank you for your answer.
Could you be a bit more specific about what should be freed, and with which call? Also let me know if the C++ situation should be handled in a separate issue; my initial understanding was that it was the same problem, but now I'm unsure.
Hi @plule-ansys, after digging more into the issue, it seems to be a bit of a multifaceted problem. HDF5 generally needs to allocate memory for variable-length types during the read, and that memory has to be reclaimed again afterwards. Having said all that, I finally found the source of the original issue reported on the h5py side.
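For reference, the kind of read at issue on the h5py side is just an attribute access like the sketch below (file and attribute names are made up); the buffer allocation and reclaiming described above happen underneath it in the C library.

import h5py

with h5py.File("file.h5", "r") as f:
    # Reading a variable-length string attribute: during this read the HDF5
    # library allocates memory for the variable-length data, and that memory
    # has to be reclaimed once the values have been copied into Python
    # objects, otherwise it leaks on every read.
    value = f.attrs["attribute"]
    print(value)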
In h5py, the Python memory explodes when an array is loaded from an h5py object and converted to a PyTorch tensor, but only if you also retrieve an attribute. See the original issue: h5py/h5py#2010.
I am not entirely sure whether this is an h5py issue or a problem in libhdf5, but it is connected to this change in hdf5: 98a27f1 (between libhdf5 1.12.0 and 1.12.1). Before this change the leak does not appear; after it, it does.
Example
On my system, I can produce this error by running
pip install h5py numpy torch
and then running a script that does the following (a sketch is shown below). Running it will cause the Python memory to explode. All three steps are necessary (loading an array, reading an attribute, and converting the array to PyTorch) to create this issue.
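Here is a hedged reconstruction of that script based on the description: it creates a file with a dataset and a string attribute, then repeatedly reads the array, reads the attribute, and converts the array to a PyTorch tensor. Dataset names, shapes, and values are made up; only the three steps match the report.

import h5py
import numpy as np
import torch

# Create a test file with a dataset and a string attribute.
with h5py.File("test.h5", "w") as f:
    f.create_dataset("data", data=np.zeros((256, 256), dtype=np.float32))
    f["data"].attrs["label"] = "example"

# Repeatedly perform the three steps described above. According to the report,
# with libhdf5 1.12.1 (commit 98a27f1 and later) the memory use of this loop
# keeps growing; with 1.12.0 it stays flat.
for _ in range(100000):
    with h5py.File("test.h5", "r") as f:
        arr = f["data"][...]                # 1. load the array
        label = f["data"].attrs["label"]    # 2. read an attribute
        tensor = torch.as_tensor(arr)       # 3. convert to a PyTorch tensor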
Bisection result
I repeated this with self-compiled h5py 3.6.0 and libhdf5 compiled from git between 1.12.0 and 1.12.1. The problem depends on the hdf5 version: 1.12.0 works, 1.12.1 doesn't, and 98a27f1 is the first commit where it goes wrong.
Is this an hdf5 issue or is this something in h5py?
cc @tacaswell