
Remove cached datatype conversion path table entries on file close #3942

Merged
merged 2 commits into HDFGroup:develop from vlen_path_conv_table_fix on Jan 23, 2024

Conversation

jhendersonHDF
Collaborator

When performing datatype conversions during I/O, the library checks whether it can reuse a cached datatype conversion pathway by comparing the source and destination datatypes of the current operation against the source and destination datatypes of each cached conversion pathway. For variable-length and reference datatypes, this comparison includes the VOL object for the file associated with each datatype, which may change as a file is closed and reopened. In workflows involving a loop that opens a file, performs I/O on an object with a variable-length or reference datatype, and then closes the file, this can lead to constant memory usage growth: on each iteration, the library judges the datatypes' file VOL objects to differ and adds a new cached conversion pathway entry during I/O. This is now fixed by clearing out any cached conversion pathway entries for variable-length or reference datatypes associated with a particular file when that file is closed.

@jhendersonHDF jhendersonHDF added labels on Jan 17, 2024: Merge - To 1.14; Priority - 1. High 🔼 (important issues that should be resolved in the next release); Component - C Library (core C library issues, usually in the src directory); Type - Improvement (improvements that don't add a new feature or functionality)
@jhendersonHDF jhendersonHDF linked an issue Jan 17, 2024 that may be closed by this pull request
@jhendersonHDF jhendersonHDF force-pushed the vlen_path_conv_table_fix branch 2 times, most recently from a20ea15 to 96d6c14 Compare January 17, 2024 20:14
@jhendersonHDF
Collaborator Author

This PR adds a simple fix for the problem described in #1256. The core problem is that variable-length and reference datatypes are compared according to their associated file VOL object, which may change as a file is closed and reopened. When attempting to reuse type conversion pathways in an application with a file open -> I/O -> file close loop, the library then judges the datatypes in the current conversion operation to differ from the datatypes in the cached pathway table, causing the table to grow continually as new entries are added until the H5T package is torn down at library termination. Further, since variable-length and reference datatypes hold a reference to their associated file's VOL object after #274, the memory for the VOL object is also retained until the library terminates. Together, this can eventually lead to a drastic explosion of memory usage.

Several approaches for fixing this were proposed, including a special conversion path table just for variable-length and reference datatypes, but this fix seems the most straightforward with the existing library infrastructure. The downside is that it adds a scan of the cached conversion pathway table on every file close operation. If this still seems workable, I'll add a RELEASE note entry for the fix; otherwise, we should discuss a better solution.

@jhendersonHDF
Collaborator Author

Example program that demonstrates the issue:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "hdf5.h"

int
main(int argc, char **argv)
{
    const char *buf[] = {"attr_value"};
    hid_t       file_id  = H5I_INVALID_HID;
    hid_t       attr_id  = H5I_INVALID_HID;
    hid_t       str_type = H5I_INVALID_HID;
    hid_t       space_id = H5I_INVALID_HID;
    int        *irbuf    = NULL;
    char       **rbuf    = NULL;

    printf("Test memory leak\n");

    file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    if (file_id == H5I_INVALID_HID) {
        printf("Couldn't create file\n");
        exit(EXIT_FAILURE);
    }

    str_type = H5Tcopy(H5T_C_S1);
    if (str_type == H5I_INVALID_HID) {
        printf("Couldn't create string type\n");
        exit(EXIT_FAILURE);
    }

    if (H5Tset_size(str_type, H5T_VARIABLE) < 0) {
        printf("Couldn't set string size\n");
        exit(EXIT_FAILURE);
    }

    space_id = H5Screate(H5S_SCALAR);
    if (space_id == H5I_INVALID_HID) {
        printf("Couldn't create dataspace\n");
        exit(EXIT_FAILURE);
    }

    attr_id = H5Acreate2(file_id, "attribute", str_type, space_id, H5P_DEFAULT, H5P_DEFAULT);
    if (attr_id == H5I_INVALID_HID) {
        printf("Couldn't create attribute\n");
        exit(EXIT_FAILURE);
    }

    if (H5Awrite(attr_id, str_type, buf) < 0) {
        printf("Couldn't write to attribute\n");
        exit(EXIT_FAILURE);
    }

    H5Aclose(attr_id);
    H5Fclose(file_id);

    irbuf = malloc(1000 * 1000 * sizeof(int));
    rbuf  = malloc(sizeof(char *));

    /* for (size_t i = 0; i < 10; i++) { */
    while (true) {
        file_id = H5Fopen("file.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
        if (file_id == H5I_INVALID_HID) {
            printf("Couldn't open file\n");
            exit(EXIT_FAILURE);
        }

        attr_id = H5Aopen(file_id, "attribute", H5P_DEFAULT);
        if (attr_id == H5I_INVALID_HID) {
            printf("Couldn't open attribute\n");
            exit(EXIT_FAILURE);
        }

        if (H5Aread(attr_id, str_type, rbuf) < 0) {
            printf("Couldn't read from attribute\n");
            exit(EXIT_FAILURE);
        }

        H5Treclaim(str_type, space_id, H5P_DEFAULT, rbuf);

        H5Aclose(attr_id);
        H5Fclose(file_id);
    }

    free(irbuf);
    free(rbuf);

    H5Tclose(str_type);
    H5Sclose(space_id);

    return 0;
}

src/H5T.c Outdated
@@ -2704,6 +2704,8 @@ H5T__unregister(H5T_pers_t pers, const char *name, H5T_t *src, H5T_t *dst, H5T_c
continue;
if (dst && dst->shared->type != soft->dst)
continue;
if (owned_vol_obj)
Collaborator Author

We don't want to remove the vlen conversion from the soft conversions table here, so always continue on if the owned_vol_obj parameter is specified. Ideally, the logic for removing the soft conversion functions would be separate from removing the cached conversion pathway table entries, but this seems minimally invasive.

Contributor

The entire for loop can be skipped if owned_vol_obj is non-NULL. (So you could add that check to the surrounding if statement, which will reduce the performance impact a little)

Collaborator Author

Very true; will do

Contributor

@qkoziol qkoziol left a comment

I'm clicking 'approve', but believe that if you make the two small changes I've suggested, it would be better. :-)


src/H5T.c Outdated
path = H5T_g.path[i];
assert(path);

nomatch = ((H5T_PERS_SOFT == pers && path->is_hard) || (H5T_PERS_HARD == pers && !path->is_hard)) ||
Contributor

This has become increasingly unwieldy - a small "match" routine is justified?

Collaborator Author

Agreed that it's become unwieldy, though is the point of a "match" routine just to move the visual ugliness out of this routine?

Contributor

Mostly, and possibly to re-use

@jhendersonHDF jhendersonHDF force-pushed the vlen_path_conv_table_fix branch from c43774f to c063c72 Compare January 18, 2024 20:38
Contributor

@qkoziol qkoziol left a comment

Super, thanks for updating! :-)

@jhendersonHDF jhendersonHDF force-pushed the vlen_path_conv_table_fix branch from 311ca76 to 0b2bf8a Compare January 18, 2024 20:40
@derobins derobins merged commit f86fe61 into HDFGroup:develop Jan 23, 2024
49 checks passed
@tacaswell

Even with this merged, is it worth sorting out how to manually clear this memory in h5py?

@jhendersonHDF
Collaborator Author

Even with this merged, is it worth sorting out how to manually clear this memory in h5py?

I think I need to better understand the proxy buffering mechanism in h5py first, but I believe this is still a problem (at least for non-string variable-length types), since I can't see anywhere else in the h5py codebase where H5Dvlen_reclaim/H5Treclaim is called after a read operation. Since this PR should fix the original issue with memory growth, and since the h5py docs seem to suggest avoiding variable-length types where possible, it seems like a lower-priority issue, but I think it can still be important. I quickly threw together jhendersonHDF/h5py@9004821 as a start, but it seemed a bit unwieldy to me, and I was also still working on creating an h5py example for the issue.

lrknox pushed a commit to lrknox/hdf5 that referenced this pull request Feb 15, 2024
Remove cached datatype conversion path table entries on file close (HDFGroup#3942)

lrknox added a commit that referenced this pull request Feb 15, 2024
* Update upload- artifact to match download version (#3929)

* Reorg and update options for doc and cmake config (#3934)

* Add binary build for linux S3 (#3936)

* Clean up Doxygen for szip functions and constants (#3943)

* Replace off_t with HDoff_t internally (#3944)

off_t is a 32-bit signed value on Windows, so we should use HDoff_t
(which is __int64 on Windows) internally instead.

Also defines HDftell on Windows to be _ftelli64().

* Fix chid_t to hid_t (#3948)

* Fortran API work. (#3941)

* - Added Fortran APIs:
      H5FGET_INTENT_F, H5SSELECT_ITER_CREATE_F, H5SSEL_ITER_GET_SEQ_LIST_F,
      H5SSELECT_ITER_CLOSE_F, H5S_mp_H5SSELECT_ITER_RESET_F

    - Added Fortran Parameters:
      H5S_SEL_ITER_GET_SEQ_LIST_SORTED_F, H5S_SEL_ITER_SHARE_WITH_DATASPACE_F

    - Added tests for new APIs
    - Removed H5F C wrapper stubs
    - Documentation misc. cleanup.

* Add the user test program in HDFFV-9174 for committed types. (#3937)

Add the user test program for committed types in HDFFV-9174

* Remove cached datatype conversion path table entries on file close (#3942)

* fixed BIND name (#3957)

* update H5Ssel_iter_reset_f test

* Change 'extensible' to 'fixed' in H5FA code (#3964)

* RF: move codespell configuration into .codespellrc so could be used locally as well (#3958)

* Add RELEASE.txt note for the fix for issue #1256 (#3955)

* Fix doxygen errors (#3962)

* Add API support for Fortran MPI_F08 module definitions. (#3959)

* revert to using c-stub for _F08 MPI APIs

* use mpi compiler wrappers for cmake and nvhpc

* Added a GitHub Codespaces configuration. (#3966)

* Fixed XL and gfortran errors (#3968)

* h5 compiler wrappers now pass all arguments passed to it to the compile line (#3954)

* The issue was that the "allargs" variable was not being used in the final command of the compiler wrapper. Any entries containing an escaped quote (\", \') or other non-matching argument (*) would not be passed to the compile line. I have fixed this problem by ensuring all arguments passed to the compiler wrapper are now included in the compile line.

* Add binary testing to CI testing (#3971)

* Replace 'T2' with ' ' to avoid failure to match expected output due to (#3975)

* Clarify vlen string datatype message (#3950)

* append '-WF,' when passing C preprocessor directives to the xlf compiler (#3976)

* Create CITATION.cff (#3927)

Add citation source based on http://web.archive.org/web/20230610185232/https://portal.hdfgroup.org/display/knowledge/How+do+I+properly+cite+HDF5%

The space difference in the Fortran examples must be fixed to match the expected output for compression filter examples.

* corrected warning: implicit conversion changes signedness (#3982)

* Skip mac bintest until more reliable (#3983)

* Make platform specific test presets for windows and macs (#3988)

* chore: fix typo (#3989)

* Add a missing left parenthesis in RELEASE.txt. (#3990)

* Remove ADB signature from RELEASE.txt. (#3986)

* Bump the github-actions group with 6 updates (#3981)

* Sync API tests with vol-tests (#3940)

* Fix for github issue #2414: segfault when copying dataset with attrib… (#3967)

* Fix for github issue #2414: segfault when copying dataset with attributes.
This also fixes github issue #3241: segfault when copying dataset.
Need to set the location via H5T_set_loc() of the src datatype
when copying dense attributes.
Otherwise the vlen callbacks are not set up therefore causing seg fault
when doing H5T_convert() -> H5T__conv_vlen().

* Fix broken links caused by examples relocation. (#3995)

* Add abi-complience check and upload to releases (#3996)

* Fix h5watch test failures to ignore system warnings on ppc64le. (#3997)

* Remove oneapi/clang compiler printf() type warning. (#3994)

* Updated information about obtaining the HDF5 source code to use the repos. (#3972)

* Fix overwritten preset names (#4000)

* Fix incompatible pointer type warnings in object reference examples (#3999)

* Fix build issue and some warnings in H5_api_dataset_test.c (#3998)

* Modern C++ dtor declarations (#1830)

* C++ dtor modernization

- Replaced a bunch of empty dtors with `= default`
- Removed deprecated `throw()`. In C++11, dtors are `noexcept` by default.

* remove incorrect check for environ (#4002)

* Add a missing file into Makefile.am for MinGW Autotools build error. (#4004)

* Issue #1824: Replaced most remaining sprintf with safer snprint (#4003)

* Add hl and cpp ABI reports to daily build (#4006)

* Don't add files and directories with names that begin with ., or that match *autom4te* to release tar & zip files. (#4009)

* Fix some output issues with ph5diff (#4008)

* Update install texts (#4010)

* Add C in project line for CMake to fix #4012. (#4014)

* separate out individual checks for string removal (#4015)

* Add compound subset ops on attributes to API tests (#4005)

---------
@jhendersonHDF jhendersonHDF deleted the vlen_path_conv_table_fix branch February 20, 2024 02:40
Successfully merging this pull request may close these issues.

h5py (+PyTorch) memory leak since hdf5 1.12.1
4 participants