
Remove cached datatype conversion path table entries on file close #3942

Merged
merged 2 commits into HDFGroup:develop from vlen_path_conv_table_fix on Jan 23, 2024

Conversation

jhendersonHDF
Collaborator

When performing datatype conversions during I/O, the library checks whether it can reuse a cached datatype conversion pathway by comparing the source and destination datatypes of the current operation against the source and destination datatypes of each cached conversion pathway. For variable-length and reference datatypes, this comparison includes the VOL object for the file associated with each datatype, which may change as a file is closed and reopened. In workflows involving a loop that opens a file, performs I/O on an object with a variable-length or reference datatype, and then closes the file, this can lead to constant memory usage growth: on each iteration, the library judges the datatypes' file VOL objects to differ and adds a new cached conversion pathway entry during I/O. This is now fixed by clearing out any cached conversion pathway entries for variable-length or reference datatypes associated with a particular file when that file is closed.

@jhendersonHDF jhendersonHDF added labels on Jan 17, 2024: Merge - To 1.14; Priority - 1. High 🔼 (important issues that should be resolved in the next release); Component - C Library (core C library issues, usually in the src directory); Type - Improvement (improvements that don't add a new feature or functionality)
@jhendersonHDF jhendersonHDF linked an issue Jan 17, 2024 that may be closed by this pull request
@jhendersonHDF jhendersonHDF force-pushed the vlen_path_conv_table_fix branch 2 times, most recently from a20ea15 to 96d6c14 Compare January 17, 2024 20:14
@jhendersonHDF
Collaborator Author

This PR adds a simple fix for the problem described in #1256. The core problem is that variable-length and reference datatypes are compared according to their associated file VOL object, which may change as a file is closed and reopened. When attempting to reuse type conversion pathways in an application with a file open -> I/O -> file close loop, the library then judges the datatypes in the current conversion operation to differ from the datatypes in the cached pathway table, causing the table to grow continually as new entries are added until the H5T package is torn down at library termination. Further, since variable-length and reference datatypes hold a reference to their associated file's VOL object after #274, the memory for the VOL object is also retained until the library terminates. Together, this can eventually lead to a drastic explosion of memory usage.

Several approaches for fixing this were proposed, including a special conversion path table just for variable-length and reference datatypes, but this fix seems the most straightforward with the existing library infrastructure. The downside is that it adds a scan of the cached conversion pathway table on every file close operation. If this still seems workable, I'll add a RELEASE note entry for the fix; otherwise, we should discuss a better solution.

@jhendersonHDF
Collaborator Author

Example program that demonstrates the issue:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "hdf5.h"

int
main(int argc, char **argv)
{
    const char *buf[] = {"attr_value"};
    hid_t       file_id  = H5I_INVALID_HID;
    hid_t       attr_id  = H5I_INVALID_HID;
    hid_t       str_type = H5I_INVALID_HID;
    hid_t       space_id = H5I_INVALID_HID;
    int        *irbuf    = NULL;
    char       **rbuf    = NULL;

    printf("Test memory leak\n");

    file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    if (file_id == H5I_INVALID_HID) {
        printf("Couldn't create file\n");
        exit(EXIT_FAILURE);
    }

    str_type = H5Tcopy(H5T_C_S1);
    if (str_type == H5I_INVALID_HID) {
        printf("Couldn't create string type\n");
        exit(EXIT_FAILURE);
    }

    if (H5Tset_size(str_type, H5T_VARIABLE) < 0) {
        printf("Couldn't set string size\n");
        exit(EXIT_FAILURE);
    }

    space_id = H5Screate(H5S_SCALAR);
    if (space_id == H5I_INVALID_HID) {
        printf("Couldn't create dataspace\n");
        exit(EXIT_FAILURE);
    }

    attr_id = H5Acreate2(file_id, "attribute", str_type, space_id, H5P_DEFAULT, H5P_DEFAULT);
    if (attr_id == H5I_INVALID_HID) {
        printf("Couldn't create attribute\n");
        exit(EXIT_FAILURE);
    }

    if (H5Awrite(attr_id, str_type, buf) < 0) {
        printf("Couldn't write to attribute\n");
        exit(EXIT_FAILURE);
    }

    H5Aclose(attr_id);
    H5Fclose(file_id);

    irbuf = malloc(1000 * 1000 * sizeof(int));
    rbuf  = malloc(sizeof(char *));

    /* for (size_t i = 0; i < 10; i++) { */
    while (true) {
        file_id = H5Fopen("file.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
        if (file_id == H5I_INVALID_HID) {
            printf("Couldn't open file\n");
            exit(EXIT_FAILURE);
        }

        attr_id = H5Aopen(file_id, "attribute", H5P_DEFAULT);
        if (attr_id == H5I_INVALID_HID) {
            printf("Couldn't open attribute\n");
            exit(EXIT_FAILURE);
        }

        if (H5Aread(attr_id, str_type, rbuf) < 0) {
            printf("Couldn't read from attribute\n");
            exit(EXIT_FAILURE);
        }

        H5Treclaim(str_type, space_id, H5P_DEFAULT, rbuf);

        H5Aclose(attr_id);
        H5Fclose(file_id);
    }

    free(irbuf);
    free(rbuf);

    H5Tclose(str_type);
    H5Sclose(space_id);

    return 0;
}

src/H5T.c Outdated
@@ -2704,6 +2704,8 @@ H5T__unregister(H5T_pers_t pers, const char *name, H5T_t *src, H5T_t *dst, H5T_c
continue;
if (dst && dst->shared->type != soft->dst)
continue;
if (owned_vol_obj)
Collaborator Author

We don't want to remove the vlen conversion from the soft conversions table here, so always continue on if the owned_vol_obj parameter is specified. Ideally, the logic for removing the soft conversion functions would be separate from removing the cached conversion pathway table entries, but this seems minimally invasive.

Contributor

The entire for loop can be skipped if owned_vol_obj is non-NULL. (So you could add that check to the surrounding if statement, which will reduce the performance impact a little)

Collaborator Author

Very true; will do

Contributor

@qkoziol qkoziol left a comment

I'm clicking 'approve', but believe that if you make the two small changes I've suggested, it would be better. :-)


src/H5T.c Outdated
path = H5T_g.path[i];
assert(path);

nomatch = ((H5T_PERS_SOFT == pers && path->is_hard) || (H5T_PERS_HARD == pers && !path->is_hard)) ||
Contributor

This has become increasingly unwieldy - a small "match" routine is justified?

Collaborator Author

Agreed that it's become unwieldy, though is the point of a "match" routine just to move the visual ugliness out of this routine?

Contributor

Mostly, and possibly to re-use

@jhendersonHDF jhendersonHDF force-pushed the vlen_path_conv_table_fix branch from c43774f to c063c72 Compare January 18, 2024 20:38
Contributor

@qkoziol qkoziol left a comment

Super, thanks for updating! :-)

@jhendersonHDF jhendersonHDF force-pushed the vlen_path_conv_table_fix branch from 311ca76 to 0b2bf8a Compare January 18, 2024 20:40
@derobins derobins merged commit f86fe61 into HDFGroup:develop Jan 23, 2024
49 checks passed
@tacaswell

Even with this merged, is it worth sorting out how to manually clear this memory in h5py?

@jhendersonHDF
Collaborator Author

Even with this merged, is it worth sorting out how to manually clear this memory in h5py?

I think I need to better understand the proxy buffering mechanism in h5py first, but I believe this is still a problem (at least for non-string variable-length types), since I can't see anywhere else in the h5py codebase where H5Dvlen_reclaim/H5Treclaim is called after a read operation. Since this PR should fix the original issue with memory growth, and since the h5py docs seem to suggest avoiding variable-length types where possible, it seems like a lower-priority issue, but I think it can still be important. I quickly threw together jhendersonHDF/h5py@9004821 as a start, but it seemed a bit unwieldy to me, and I was also still working on creating an h5py example for the issue.

lrknox pushed a commit to lrknox/hdf5 that referenced this pull request Feb 15, 2024
Remove cached datatype conversion path table entries on file close (HDFGroup#3942)

lrknox added a commit that referenced this pull request Feb 15, 2024
* Update upload- artifact to match download version (#3929)

* Reorg and update options for doc and cmake config (#3934)

* Add binary build for linux S3 (#3936)

* Clean up Doxygen for szip functions and constants (#3943)

* Replace off_t with HDoff_t internally (#3944)

off_t is a 32-bit signed value on Windows, so we should use HDoff_t
(which is __int64 on Windows) internally instead.

Also defines HDftell on Windows to be _ftelli64().

* Fix chid_t to hid_t (#3948)

* Fortran API work. (#3941)

* - Added Fortran APIs:
      H5FGET_INTENT_F, H5SSELECT_ITER_CREATE_F, H5SSEL_ITER_GET_SEQ_LIST_F,
      H5SSELECT_ITER_CLOSE_F, H5S_mp_H5SSELECT_ITER_RESET_F

    - Added Fortran Parameters:
      H5S_SEL_ITER_GET_SEQ_LIST_SORTED_F, H5S_SEL_ITER_SHARE_WITH_DATASPACE_F

    - Added tests for new APIs
    - Removed H5F C wrapper stubs
    - Documentation misc. cleanup.

* Add the user test program in HDFFV-9174 for committed types. (#3937)

Add the user test program for committed types in HDFFV-9174

* Remove cached datatype conversion path table entries on file close (#3942)

* fixed BIND name (#3957)

* update H5Ssel_iter_reset_f test

* Change 'extensible' to 'fixed' in H5FA code (#3964)

* RF: move codespell configuration into .codespellrc so could be used locally as well (#3958)

* Add RELEASE.txt note for the fix for issue #1256 (#3955)

* Fix doxygen errors (#3962)

* Add API support for Fortran MPI_F08 module definitions. (#3959)

* revert to using c-stub for _F08 MPI APIs

* use mpi compiler wrappers for cmake and nvhpc

* Added a GitHub Codespaces configuration. (#3966)

* Fixed XL and gfortran errors (#3968)

* h5 compiler wrappers now pass all arguments passed to it to the compile line (#3954)

* The issue was that the "allargs" variable was not being used in the final command of the compiler wrapper. Any entries containing an escaped quote (\", \') or other non-matching argument (*) would not be passed to the compile line. I have fixed this problem by ensuring all arguments passed to the compiler wrapper are now included in the compile line.

* Add binary testing to CI testing (#3971)

* Replace 'T2' with ' ' to avoid failure to match expected output due to (#3975)

* Clarify vlen string datatype message (#3950)

* append '-WF,' when passing C preprocessor directives to the xlf compiler (#3976)

* Create CITATION.cff (#3927)

Add citation source based on http://web.archive.org/web/20230610185232/https://portal.hdfgroup.org/display/knowledge/How+do+I+properly+cite+HDF5%

The space difference in the Fortran examples must be fixed to match the expected output for compression filter examples.

* corrected warning: implicit conversion changes signedness (#3982)

* Skip mac bintest until more reliable (#3983)

* Make platform specific test presets for windows and macs (#3988)

* chore: fix typo (#3989)

* Add a missing left parenthesis in RELEASE.txt. (#3990)

* Remove ADB signature from RELEASE.txt. (#3986)

* Bump the github-actions group with 6 updates (#3981)

* Sync API tests with vol-tests (#3940)

* Fix for github issue #2414: segfault when copying dataset with attrib… (#3967)

* Fix for github issue #2414: segfault when copying dataset with attributes.
This also fixes github issue #3241: segfault when copying dataset.
Need to set the location via H5T_set_loc() of the src datatype
when copying dense attributes.
Otherwise the vlen callbacks are not set up therefore causing seg fault
when doing H5T_convert() -> H5T__conv_vlen().

* Fix broken links caused by examples relocation. (#3995)

* Add abi-complience check and upload to releases (#3996)

* Fix h5watch test failures to ignore system warnings on ppc64le. (#3997)

* Remove oneapi/clang compiler printf() type warning. (#3994)

* Updated information about obtaining the HDF5 source code to use the repos. (#3972)

* Fix overwritten preset names (#4000)

* Fix incompatible pointer type warnings in object reference examples (#3999)

* Fix build issue and some warnings in H5_api_dataset_test.c (#3998)

* Modern C++ dtor declarations (#1830)

* C++ dtor modernization

- Replaced a bunch of empty dtors with `= default`
- Removed deprecated `throw()`. In C++11, dtors are `noexcept` by default.

* remove incorrect check for environ (#4002)

* Add a missing file into Makefile.am for MinGW Autotools build error. (#4004)

* Issue #1824: Replaced most remaining sprintf with safer snprint (#4003)

* Add hl and cpp ABI reports to daily build (#4006)

* Don't add files and directories with names that begin with ., or that match *autom4te* to release tar & zip files. (#4009)

* Fix some output issues with ph5diff (#4008)

* Update install texts (#4010)

* Add C in project line for CMake to fix #4012. (#4014)

* separate out individual checks for string removal (#4015)

* Add compound subset ops on attributes to API tests (#4005)

---------
@jhendersonHDF jhendersonHDF deleted the vlen_path_conv_table_fix branch February 20, 2024 02:40
Successfully merging this pull request may close these issues.

h5py (+PyTorch) memory leak since hdf5 1.12.1
4 participants