Refactor datatype conversion code to use pointers rather than IDs #4104

jhendersonHDF · 2024-03-10T02:21:05Z

The datatype conversion code previously used IDs for the source and destination datatypes rather than pointers to the internal structures for those datatypes. This was mostly due to the need for an ID for these datatypes that can be passed to an application-registered datatype conversion function or datatype conversion exception function. However, using IDs internally caused a lot of unnecessary ID lookups and hurt performance of datatype conversions in general. This was especially problematic for compound datatype conversions, where the ID lookups were occurring on every member of every compound element of a dataset. The code has now been refactored to use pointers internally and only create IDs for datatypes when necessary.

Fixed a test issue in dt_arith where a library datatype conversion function was being cast to an application conversion function. Since the two have different prototypes, this started failing after the parameters for a library conversion function changed from hid_t to H5T_t * and an extra parameter was added. This appears to have worked coincidentally in the past since the only difference between a library conversion function and an application conversion function was an extra DXPL parameter at the end of an application conversion function.

Fixed an issue where memory wasn't being freed in the h5fc_chk_idx test program. Even though the program exits quickly after allocating the memory, it still causes failures when testing with -fsanitize=address.

jhendersonHDF · 2024-03-10T02:22:13Z

There are some other changes I'd like to make in the H5T code, but I wanted to keep this as small as possible and get it out of the way first. I have not yet added a RELEASE.txt entry because this change doesn't really fit under new feature or bugs fixed. Maybe we should have a "performance enhancements" section?

Some quick testing shows that this gives at least a 3x improvement in simple cases of compound conversions, but I'm still profiling to find other spots that can be improved on.

src/H5Dchunk.c

src/H5Ofill.c

jhendersonHDF · 2024-03-10T02:30:02Z

src/H5Tpkg.h

@@ -150,9 +150,17 @@ struct H5T_stats_t {
    H5_timevals_t times;  /*total time for conversion	     */
 };

+/* Context struct for information used during datatype conversions */
+typedef struct H5T_conv_ctx_t {


New structure that holds relevant information during conversion to make conversions on compound and other container types faster. Similar to H5T_cdata_t, but that structure is public since it gets passed to an app conversion callback and the information we want here is partly private.

jhendersonHDF · 2024-03-10T02:30:58Z

src/H5Tpkg.h

-                                      size_t buf_stride, size_t bkg_stride, void *buf, void *bkg);
-H5_DLL herr_t H5T__conv_ldouble_ullong(hid_t src_id, hid_t dst_id, H5T_cdata_t *cdata, size_t nelmts,
-                                       size_t buf_stride, size_t bkg_stride, void *buf, void *bkg);
+H5_DLL herr_t H5T__conv_noop(H5T_t *src, H5T_t *dst, H5T_cdata_t *cdata, const H5T_conv_ctx_t *conv_ctx,


All the conversion function prototypes here switched from hid_t src_id, hid_t dst_id -> H5T_t *src, H5T_t *dst and got an extra const H5T_conv_ctx_t *conv_ctx parameter.

jhendersonHDF · 2024-03-10T02:38:09Z

src/H5Tpkg.h

@@ -479,357 +487,472 @@ H5_DLL herr_t H5T__commit_named(const H5G_loc_t *loc, const char *name, H5T_t *d
 H5_DLL H5T_t *H5T__open_name(const H5G_loc_t *loc, const char *name);
 H5_DLL hid_t  H5T__get_create_plist(const H5T_t *type);

+/* Helper function for H5T_convert that accepts a pointer to a H5T_conv_ctx_t structure */
+H5_DLL herr_t H5T_convert_with_ctx(H5T_path_t *tpath, H5T_t *src_type, H5T_t *dst_type,


This is a new helper function for H5T_convert that helps avoid a severe amount of overhead during compound and other container datatype conversions where H5T_convert is called for every single member of every element. With the changes in this PR, the H5T_conv_ctx_t structure gets initialized in H5T_convert, meaning that any conversion function calling H5T_convert in a loop would initialize a new structure for every member of every element. In the case where an application-registered conversion exception function is being used, this would cause the conversion function to register and then tear down an ID for the source and destination datatypes each time, which was very expensive. Container type conversion functions can instead copy the current H5T_conv_ctx_t structure into a temporary one, update its src_type_id and dst_type_id fields with the datatype IDs for the current member being converted, and then call H5T_convert_with_ctx to avoid all the overhead.
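The context-reuse pattern described in this comment can be sketched roughly as follows. This is a simplified, standalone illustration and not the HDF5 code: every `toy_` name is a hypothetical stand-in, and only the field names `src_type_id` and `dst_type_id` are taken from the comment above. The point is that the compound loop copies the parent context once and only rewrites the two ID fields per member, instead of going through the top-level entry point that builds a fresh context each time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the HDF5 definitions. */
typedef long toy_id_t;

typedef struct toy_type_t {
    toy_id_t id; /* ID assumed to be registered once, ahead of time */
} toy_type_t;

typedef struct toy_conv_ctx_t {
    void    *exception_cb; /* shared settings, set up once per conversion */
    toy_id_t src_type_id;  /* ID of the source type currently being converted */
    toy_id_t dst_type_id;  /* ID of the destination type currently being converted */
} toy_conv_ctx_t;

static size_t toy_ctx_inits = 0; /* counts contexts built from scratch */

/* Converts using a context supplied by the caller (no setup cost here). */
static int
toy_convert_with_ctx(const toy_type_t *src, const toy_type_t *dst, const toy_conv_ctx_t *ctx)
{
    assert(ctx->src_type_id == src->id && ctx->dst_type_id == dst->id);
    return 0; /* actual data conversion omitted */
}

/* Top-level entry point: builds a new context on every call. */
static int
toy_convert(const toy_type_t *src, const toy_type_t *dst)
{
    toy_conv_ctx_t ctx = {NULL, src->id, dst->id};

    toy_ctx_inits++;
    return toy_convert_with_ctx(src, dst, &ctx);
}

/* Compound conversion: copy the parent context once, then only update the
 * two ID fields for each member of each element. */
static int
toy_convert_compound(const toy_type_t *src_memb, const toy_type_t *dst_memb, size_t nmembs,
                     size_t nelmts, const toy_conv_ctx_t *parent_ctx)
{
    toy_conv_ctx_t tmp_ctx = *parent_ctx;

    for (size_t e = 0; e < nelmts; e++)
        for (size_t m = 0; m < nmembs; m++) {
            tmp_ctx.src_type_id = src_memb[m].id;
            tmp_ctx.dst_type_id = dst_memb[m].id;
            if (toy_convert_with_ctx(&src_memb[m], &dst_memb[m], &tmp_ctx) < 0)
                return -1;
        }
    return 0;
}
```

In this sketch, calling `toy_convert` inside the member loop instead would increment `toy_ctx_inits` once per member per element, which is the overhead the helper is meant to avoid.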

jhendersonHDF · 2024-03-10T02:40:11Z

test/dt_arith.c

-    /* Unregister the hard conversion from int to float.  Verify the conversion
-     * is a soft conversion. */
-    H5Tunregister(H5T_PERS_HARD, NULL, H5T_NATIVE_INT, H5T_NATIVE_FLOAT,
-                  (H5T_conv_t)((void (*)(void))H5T__conv_int_float));


This test was previously casting H5T__conv_int_float from a library conversion function into an application conversion function. Since the two have different prototypes, the changes in this PR started tripping over this. This appears to have worked in the past since the only difference between application conversion functions and library conversion functions was that application functions have a DXPL parameter at the very end of the list.
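To illustrate why the cast was fragile, here is a minimal standalone sketch of keeping the two callback kinds in a tagged union and dispatching on the tag rather than casting one prototype to the other. The `toy_` types and the shortened parameter lists are assumptions made for illustration; only the `is_app`, `u.app_func` and `u.lib_func` names mirror what appears in the diff fragments in this PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the HDF5 definitions. */
typedef long toy_id_t;

typedef struct toy_type_t {
    size_t size;
} toy_type_t;

/* Application-style callback: takes IDs plus a trailing property list ID. */
typedef int (*toy_app_conv_t)(toy_id_t src_id, toy_id_t dst_id, size_t nelmts, void *buf,
                              toy_id_t dxpl_id);

/* Library-style callback: takes pointers to the internal structures. */
typedef int (*toy_lib_conv_t)(const toy_type_t *src, const toy_type_t *dst, size_t nelmts,
                              void *buf);

/* Tagged union: the flag records which member is valid. */
typedef struct toy_conv_func_t {
    bool is_app;
    union {
        toy_app_conv_t app_func;
        toy_lib_conv_t lib_func;
    } u;
} toy_conv_func_t;

/* Calls whichever kind of function is stored, with the matching arguments. */
static int
toy_call(const toy_conv_func_t *conv, const toy_type_t *src, const toy_type_t *dst,
         toy_id_t src_id, toy_id_t dst_id, size_t nelmts, void *buf, toy_id_t dxpl_id)
{
    assert(conv != NULL);
    if (conv->is_app)
        return (conv->u.app_func)(src_id, dst_id, nelmts, buf, dxpl_id);
    return (conv->u.lib_func)(src, dst, nelmts, buf);
}

/* Sample application-style callback; returns 1 so callers can tell it ran. */
static int
toy_app_noop(toy_id_t src_id, toy_id_t dst_id, size_t nelmts, void *buf, toy_id_t dxpl_id)
{
    (void)src_id; (void)dst_id; (void)nelmts; (void)buf; (void)dxpl_id;
    return 1;
}

/* Sample library-style callback; returns 2 so callers can tell it ran. */
static int
toy_lib_noop(const toy_type_t *src, const toy_type_t *dst, size_t nelmts, void *buf)
{
    (void)src; (void)dst; (void)nelmts; (void)buf;
    return 2;
}
```

Calling a function through a pointer of an incompatible type is undefined behavior in C, so a cast like the one removed from the test can keep working only as long as the two parameter lists happen to line up.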

jhendersonHDF · 2024-03-10T02:41:26Z

tools/test/h5format_convert/h5fc_chk_idx.c

The changes here just fix an issue when testing with -fsanitize=address. Not exactly a problematic memory leak since the program exits quickly, but still causes problems.

jhendersonHDF · 2024-03-10T02:44:00Z

src/H5Dint.c

-    dset_op.u.app_op.op      = H5D__vlen_get_buf_size_cb;
-    dset_op.u.app_op.type_id = type_id;
+    dset_op.op_type  = H5S_SEL_ITER_OP_LIB;
+    dset_op.u.lib_op = H5D__vlen_get_buf_size_cb;


It didn't look like there was a need for the callback here to be an application iter op, as opposed to a library one. If I left this as an application op, the passed in type_id would have to be unwrapped for every single element so that the H5T_t * can be assigned to dset_info.mem_type in the changes above to H5D__vlen_get_buf_size_cb. Switching this to a library iter op passes an H5T_t * to the callback which is much simpler.

Agree, good call
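The per-element cost being avoided here can be sketched with a toy iteration, assuming a simple ID table with a lookup counter. Nothing below is HDF5 code; the `toy_` names are hypothetical. An application-style callback receives an ID and has to unwrap it for every element, while a library-style callback receives the pointer directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the HDF5 definitions. */
typedef long toy_id_t;

typedef struct toy_type_t {
    size_t size;
} toy_type_t;

static const toy_type_t toy_registry[4] = {{0}, {4}, {8}, {16}};
static size_t           toy_lookups     = 0; /* counts ID-to-pointer lookups */

/* Resolves an ID to its type structure, counting each lookup. */
static const toy_type_t *
toy_lookup(toy_id_t id)
{
    toy_lookups++;
    if (id < 0 || id >= 4)
        return NULL;
    return &toy_registry[id];
}

/* Application-style callback: gets an ID and must unwrap it on every call. */
static int
toy_app_cb(toy_id_t type_id, size_t *total)
{
    const toy_type_t *type = toy_lookup(type_id);

    assert(total != NULL);
    if (type == NULL)
        return -1;
    *total += type->size;
    return 0;
}

/* Library-style callback: gets the pointer directly, no lookup needed. */
static int
toy_lib_cb(const toy_type_t *type, size_t *total)
{
    assert(total != NULL);
    *total += type->size;
    return 0;
}

/* Iterates over nelmts elements using the application-style callback. */
static int
toy_iterate_app(toy_id_t type_id, size_t nelmts, size_t *total)
{
    for (size_t i = 0; i < nelmts; i++)
        if (toy_app_cb(type_id, total) < 0)
            return -1;
    return 0;
}

/* Iterates over nelmts elements using the library-style callback. */
static int
toy_iterate_lib(const toy_type_t *type, size_t nelmts, size_t *total)
{
    for (size_t i = 0; i < nelmts; i++)
        if (toy_lib_cb(type, total) < 0)
            return -1;
    return 0;
}
```

With the application-style path the lookup count grows with the number of elements; with the library-style path a single lookup before the loop is enough.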

jhendersonHDF · 2024-03-10T02:44:50Z

src/H5T.c

+                if ((tmp_sid = H5I_register(H5I_DATATYPE, tmp_stype, false)) < 0)
+                    HGOTO_ERROR(H5E_DATATYPE, H5E_CANTREGISTER, FAIL,
+                                "unable to register ID for source datatype");
+                if ((tmp_did = H5I_register(H5I_DATATYPE, tmp_dtype, false)) < 0)


Only register an ID for the types here if we have to pass them to an application conversion function
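A rough standalone sketch of this "register only when needed" pattern, including checked cleanup, might look like the following. The ID table, the `toy_` names and the `TOY_INVALID_ID` constant are all assumptions made for illustration and do not reflect the H5I implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the HDF5 definitions. */
typedef long toy_id_t;
#define TOY_INVALID_ID (-1L)
#define TOY_MAX_IDS    8

typedef struct toy_type_t {
    size_t size;
} toy_type_t;

static const toy_type_t *toy_table[TOY_MAX_IDS];
static size_t            toy_registered = 0; /* total registrations performed */

/* Stores the type in the first free slot and returns its index as the ID. */
static toy_id_t
toy_register(const toy_type_t *type)
{
    for (toy_id_t i = 0; i < TOY_MAX_IDS; i++)
        if (toy_table[i] == NULL) {
            toy_table[i] = type;
            toy_registered++;
            return i;
        }
    return TOY_INVALID_ID;
}

/* Frees the slot for an ID; fails if the ID is not currently registered. */
static int
toy_release(toy_id_t id)
{
    if (id < 0 || id >= TOY_MAX_IDS || toy_table[id] == NULL)
        return -1;
    toy_table[id] = NULL;
    return 0;
}

/* Initializes a conversion path. IDs are created only when an application
 * callback needs them; a library callback would use src and dst directly. */
static int
toy_init_path(bool is_app, const toy_type_t *src, const toy_type_t *dst)
{
    toy_id_t src_id = TOY_INVALID_ID;
    toy_id_t dst_id = TOY_INVALID_ID;
    int      ret    = 0;

    assert(src != NULL && dst != NULL);

    if (is_app) {
        if ((src_id = toy_register(src)) < 0) {
            ret = -1;
            goto done;
        }
        if ((dst_id = toy_register(dst)) < 0) {
            ret = -1;
            goto done;
        }
        /* An application callback would be invoked with src_id and dst_id here. */
    }

done:
    /* Release whatever was registered and report release failures too. */
    if (src_id >= 0 && toy_release(src_id) < 0)
        ret = -1;
    if (dst_id >= 0 && toy_release(dst_id) < 0)
        ret = -1;
    return ret;
}
```

On the library path `toy_registered` never changes, which is the saving this change is after.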

qkoziol · 2024-03-10T04:49:54Z

src/H5T.c

+                if ((tmp_did = H5I_register(H5I_DATATYPE, tmp_dtype, false)) < 0)
+                    HGOTO_ERROR(H5E_DATATYPE, H5E_CANTREGISTER, FAIL,
+                                "unable to register ID for destination datatype");
+
                if ((conv->u.app_func)(tmp_sid, tmp_did, &cdata, (size_t)0, (size_t)0, (size_t)0, NULL, NULL,
                                       H5CX_get_dxpl()) < 0) {
                    H5I_dec_ref(tmp_sid);


I know this is legacy code, but adding checks for error returns would be good throughout this section of code.

qkoziol · 2024-03-10T04:53:41Z

src/H5T.c

        path->cdata.command = H5T_CONV_INIT;
        if (conv->is_app) {
+            if (tmp_stype && ((src_id = H5I_register(H5I_DATATYPE, tmp_stype, false)) < 0))


How would tmp_stype or tmp_dtype be NULL here? Shouldn't you always register src_id and dst_id?

Based on the above code that was previously only copying the type and wrapping it in an ID if path->src and path->dst aren't NULL, I kept that same logic assuming that there's a case where the copying could get skipped. That said, if it made it into one of the following blocks, it would pass NULL or H5I_INVALID_HID depending on the type of conversion function, but after studying the code for a while I couldn't really determine when the case would be that path->src or path->dst are NULL, so I tried to leave the functionality the same.

Wouldn't the IDs be INVALID then?

Yes. That part of the code probably needs to be studied more since previously AND currently, if it hit that particular spot it would pass invalid stuff to the conversion functions.

Sorry, re-read your earlier comment and you said it would pass INVALID. :-)

I think the earlier version of the code was buggy and you can assume that the tmp_stype & tmp_dtype are non-NULL.

I may leave this like it is for now. That function is fairly complicated, but it looks like the datatypes in question could be NULL for the library's internal no-op function, as well as an application-registered soft conversion no-op function. If that's the case it would cause an assertion in the library when trying to copy a NULL type, but these functions seem to happily accept NULL or H5I_INVALID_HID for the datatypes as they're just ignored by the function. It seems like a contrived case but I can't easily predict what people may do. The library's no-op conversion function should never hit that part of the code as it is currently, but an application function might.

qkoziol · 2024-03-10T05:01:51Z

src/H5Tconv.c

+                                "can't copy source compound member datatype");
+                priv->src_memb[i] = type;
+
+                if ((tid = H5I_register(H5I_DATATYPE, type, false)) < 0)


Why do we need IDs for these types still?

Still need them for passing to the lower-level member conversion functions in case a conversion exception function is being used. I'd like to only conditionally allocate the arrays and create the IDs, but there currently isn't anything passed to the conversion function that could tell me at init time whether I need to. The new conversion context is only available at conversion time, not init time. I suppose we could add a field to the H5T_cdata_t structure, though since it's a public one I've tried to avoid that. It also wouldn't really be of interest to an application conversion callback either.

Couldn't you call H5T_convert here, instead of H5T_convert_ctx, and let it sort things out?

Or add a helper routine (maybe in the _init() call) that determined which ones needed an ID and which could avoid making one?

The ctx version of convert was actually created for this specific reason since calling H5T_convert would cause an enormous amount of overhead as it would be creating and destroying IDs for every member of every compound element when a conversion exception function is used. That said, maybe I can provide the context object at init time with a field that lets the function know if it needs these IDs and then separate which fields of the context will be valid based on the conversion command type specified in the cdata.

Yes, that's the direction I was thinking.

qkoziol · 2024-03-10T05:06:06Z

src/H5Tconv.c

-                    0)
-                    HGOTO_ERROR(H5E_DATASET, H5E_CANTREGISTER, FAIL,
-                                "unable to register types for conversion");
+                if (NULL == (tmp_type = H5T_copy(src_parent, H5T_COPY_ALL)))


Why does this type need to be copied? Same question for tons of other copies - I think the only reason 90% of them were originally copied was because they needed an ID and decrementing the ID would delete the type object. Since we don't need the IDs, we shouldn't need the copies.

Agreed (and added a comment above after thinking for a bit). My goal would be to tackle that directly after this PR, I just didn't want to dive into that yet since it can be tricky and this is also already a fairly large set of changes.

qkoziol

Super! Long overdue!

derobins · 2024-03-10T07:42:11Z

src/H5T.c

@@ -1498,8 +1498,8 @@ H5T_top_term_package(void)
                    }                          /* end if */
                }                              /* end if */
                else {
-                    if ((path->conv.u.lib_func)((hid_t)FAIL, (hid_t)FAIL, &(path->cdata), (size_t)0,
-                                                (size_t)0, (size_t)0, NULL, NULL) < 0) {
+                    if ((path->conv.u.lib_func)(NULL, NULL, &(path->cdata), NULL, (size_t)0, (size_t)0,


Don't need noise casts for small integers


jhendersonHDF requested review from lrknox, derobins, byrnHDF, fortnern, qkoziol, vchoi-hdfgroup, bmribler, glennsong09, mattjala and brtnfld as code owners March 10, 2024 02:21

jhendersonHDF commented Mar 10, 2024

src/H5Dchunk.c (resolved)

jhendersonHDF commented Mar 10, 2024

src/H5Ofill.c (resolved)

jhendersonHDF commented Mar 10, 2024

qkoziol reviewed Mar 10, 2024

qkoziol approved these changes Mar 10, 2024

derobins reviewed Mar 10, 2024

derobins approved these changes Mar 10, 2024

derobins merged commit ef401a5 into HDFGroup:develop Mar 10, 2024
48 checks passed

jhendersonHDF deleted the dtype_conv_id_removal branch March 11, 2024 00:57

jhendersonHDF mentioned this pull request Mar 11, 2024

Implement ID creation optimization for container datatype conversions #4113

Merged
