Refactor datatype conversion code to use pointers rather than IDs #4104

Merged
merged 1 commit into from
Mar 10, 2024

jhendersonHDF
Collaborator

The datatype conversion code previously used IDs for the source and destination datatypes rather than pointers to the internal structures for those datatypes. This was mostly due to the need for an ID for these datatypes that can be passed to an application-registered datatype conversion function or datatype conversion exception function. However, using IDs internally caused a lot of unnecessary ID lookups and hurt performance of datatype conversions in general. This was especially problematic for compound datatype conversions, where the ID lookups were occurring on every member of every compound element of a dataset. The code has now been refactored to use pointers internally and only create IDs for datatypes when necessary.
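To make the cost concrete, here is a toy C sketch (not HDF5 code; `toy_id_t`, `toy_lookup` and the other `toy_*` names are hypothetical) that counts ID-to-pointer lookups when every member of every compound element is converted through an ID-based helper versus a pointer-based one. It models a lookup as a plain table access and makes no claim about how expensive HDF5's real H5I lookups are.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not HDF5's real types. */
typedef int64_t toy_id_t;
typedef struct { size_t size; } toy_type_t;

#define TOY_MAX_IDS 16
static toy_type_t *toy_registry[TOY_MAX_IDS];
static size_t toy_lookup_count = 0;

/* Resolving an ID to a pointer: the step the refactor avoids. */
static toy_type_t *toy_lookup(toy_id_t id)
{
    toy_lookup_count++;
    assert(id >= 0 && id < TOY_MAX_IDS);
    return toy_registry[id];
}

/* Old style: member types passed as IDs, resolved on every call. */
static size_t convert_member_by_id(toy_id_t src_id, toy_id_t dst_id)
{
    toy_type_t *src = toy_lookup(src_id);
    toy_type_t *dst = toy_lookup(dst_id);
    return src->size + dst->size; /* placeholder for real conversion work */
}

/* New style: member types passed as pointers; no lookup needed. */
static size_t convert_member_by_ptr(const toy_type_t *src, const toy_type_t *dst)
{
    return src->size + dst->size;
}

/* Converts nelmts compound elements with nmembs members each and returns
 * the number of ID lookups performed. */
static size_t lookups_for_compound(size_t nelmts, size_t nmembs, int use_ids)
{
    static toy_type_t src = {4}, dst = {8};

    toy_registry[0] = &src;
    toy_registry[1] = &dst;
    toy_lookup_count = 0;
    for (size_t e = 0; e < nelmts; e++)
        for (size_t m = 0; m < nmembs; m++) {
            if (use_ids)
                (void)convert_member_by_id(0, 1);
            else
                (void)convert_member_by_ptr(&src, &dst);
        }
    return toy_lookup_count;
}
```

For example, `lookups_for_compound(1000, 3, 1)` returns 6000 (two lookups per member per element), while `lookups_for_compound(1000, 3, 0)` returns 0.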

Fixed a test issue in dt_arith where a library datatype conversion function was being cast to an application conversion function. Since the two have different prototypes, this started failing after the parameters for a library conversion function changed from hid_t to H5T_t * and an extra parameter was added. This appears to have worked coincidentally in the past since the only difference between a library conversion function and an application conversion function was an extra DXPL parameter at the end of the application conversion function.

Fixed an issue where memory wasn't being freed in the h5fc_chk_idx test program. Even though the program exits quickly after allocating the memory, it still causes failures when testing with -fsanitize=address.

jhendersonHDF added the labels Merge - To 1.14, Priority - 2. Medium, Component - C Library, Component - Testing, and Type - Improvement on Mar 10, 2024

jhendersonHDF (Collaborator Author) commented Mar 10, 2024

There are some other changes I'd like to make in the H5T code, but I wanted to keep this as small as possible and get it out of the way first. I have not yet added a RELEASE.txt entry because this change doesn't really fit under new feature or bugs fixed. Maybe we should have a "performance enhancements" section?

Some quick testing shows that this gives at least a 3x improvement in simple cases of compound conversions, but I'm still profiling to find other spots that can be improved on.

@@ -150,9 +150,17 @@ struct H5T_stats_t {
H5_timevals_t times; /*total time for conversion */
};

/* Context struct for information used during datatype conversions */
typedef struct H5T_conv_ctx_t {
Collaborator Author

New structure that holds relevant information during conversion to make conversions on compound and other container types faster. Similar to H5T_cdata_t, but that structure is public since it gets passed to an app conversion callback and the information we want here is partly private.
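A rough sketch of the public/private split described here, assuming only what this review thread mentions: the private context carries `src_type_id` and `dst_type_id` fields, while the public struct is the one an application callback sees. All `toy_*` names are hypothetical, and the exception-callback fields are illustrative rather than HDF5's actual layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; not HDF5's real hid_t, H5T_cdata_t or H5T_conv_ctx_t. */
typedef int64_t toy_hid_t;
#define TOY_INVALID_HID ((toy_hid_t)-1)

/* Public: the only conversion state an application callback ever sees. */
typedef struct {
    int command; /* e.g. init, convert or free */
    void *priv;  /* callback-owned private data */
} toy_cdata_t;

/* Hypothetical exception-callback shape, simplified for the sketch. */
typedef int (*toy_except_func_t)(int except_type, toy_hid_t src_id, toy_hid_t dst_id,
                                 void *user_data);

/* Private: library-only context handed to internal conversion functions
 * by const pointer, so it can carry details that shouldn't be public. */
typedef struct {
    toy_hid_t src_type_id;       /* created only when something needs an ID */
    toy_hid_t dst_type_id;
    toy_except_func_t except_cb; /* NULL when no exception callback is set */
    void *except_udata;
} toy_conv_ctx_t;

static toy_conv_ctx_t toy_conv_ctx_init(toy_except_func_t cb, void *udata)
{
    toy_conv_ctx_t ctx;

    ctx.src_type_id = TOY_INVALID_HID; /* no IDs up front */
    ctx.dst_type_id = TOY_INVALID_HID;
    ctx.except_cb = cb;
    ctx.except_udata = udata;
    return ctx;
}

/* Internal code can ask the context whether IDs are needed at all. */
static int toy_ctx_needs_ids(const toy_conv_ctx_t *ctx)
{
    return ctx->except_cb != NULL;
}
```

Keeping this struct private means fields can be added or changed later without touching the public callback interface.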

size_t buf_stride, size_t bkg_stride, void *buf, void *bkg);
H5_DLL herr_t H5T__conv_ldouble_ullong(hid_t src_id, hid_t dst_id, H5T_cdata_t *cdata, size_t nelmts,
size_t buf_stride, size_t bkg_stride, void *buf, void *bkg);
H5_DLL herr_t H5T__conv_noop(H5T_t *src, H5T_t *dst, H5T_cdata_t *cdata, const H5T_conv_ctx_t *conv_ctx,
Collaborator Author

All the conversion function prototypes here switched from hid_t src_id, hid_t dst_id -> H5T_t *src, H5T_t *dst and got an extra const H5T_conv_ctx_t *conv_ctx parameter.
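The shape of the change can be sketched with toy types. Only the leading parameter order (src, dst, cdata, conv_ctx) comes from the prototypes quoted in this PR; the trailing counts, strides and buffers are assumed unchanged, and the `toy_*` names and typedefs are hypothetical. Assigning the no-op to a pointer of the new typedef without a cast also lets the compiler check that the prototype matches exactly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; HDF5's real hid_t, H5T_t, H5T_cdata_t and
 * H5T_conv_ctx_t are different. */
typedef int64_t toy_hid_t;
typedef struct { size_t size; } toy_type_t;
typedef struct { int command; void *priv; } toy_cdata_t;
typedef struct { toy_hid_t src_type_id, dst_type_id; } toy_conv_ctx_t;

/* Before: library conversion functions received datatype IDs. */
typedef int (*toy_lib_conv_old_t)(toy_hid_t src_id, toy_hid_t dst_id, toy_cdata_t *cdata,
                                  size_t nelmts, size_t buf_stride, size_t bkg_stride,
                                  void *buf, void *bkg);

/* After: datatype pointers plus a read-only conversion context. */
typedef int (*toy_lib_conv_new_t)(toy_type_t *src, toy_type_t *dst, toy_cdata_t *cdata,
                                  const toy_conv_ctx_t *conv_ctx, size_t nelmts,
                                  size_t buf_stride, size_t bkg_stride, void *buf, void *bkg);

/* A no-op in the new shape: it ignores its arguments, so NULL types and
 * a NULL context are fine here. */
static int toy_conv_noop(toy_type_t *src, toy_type_t *dst, toy_cdata_t *cdata,
                         const toy_conv_ctx_t *conv_ctx, size_t nelmts, size_t buf_stride,
                         size_t bkg_stride, void *buf, void *bkg)
{
    (void)src; (void)dst; (void)cdata; (void)conv_ctx;
    (void)nelmts; (void)buf_stride; (void)bkg_stride; (void)buf; (void)bkg;
    return 0;
}

/* Compiles only because the prototype matches the new typedef exactly;
 * no cast is needed (or wanted). */
static const toy_lib_conv_new_t toy_noop_func = toy_conv_noop;
```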

@@ -479,357 +487,472 @@ H5_DLL herr_t H5T__commit_named(const H5G_loc_t *loc, const char *name, H5T_t *d
H5_DLL H5T_t *H5T__open_name(const H5G_loc_t *loc, const char *name);
H5_DLL hid_t H5T__get_create_plist(const H5T_t *type);

/* Helper function for H5T_convert that accepts a pointer to a H5T_conv_ctx_t structure */
H5_DLL herr_t H5T_convert_with_ctx(H5T_path_t *tpath, H5T_t *src_type, H5T_t *dst_type,
Collaborator Author

This is a new helper function for H5T_convert that avoids severe overhead during compound and other container datatype conversions, where H5T_convert is called for every member of every element. With the changes in this PR, the H5T_conv_ctx_t structure gets initialized in H5T_convert, meaning that any conversion function calling H5T_convert in a loop would initialize a new structure for every member of every element. When an application-registered conversion exception function is in use, this would cause the conversion function to register and then tear down IDs for the source and destination datatypes each time, which was very expensive. Container type conversion functions can instead copy the current H5T_conv_ctx_t structure into a temporary one, update its src_type_id and dst_type_id fields with the datatype IDs for the member currently being converted, and then call H5T_convert_with_ctx to avoid all of that overhead.
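The pattern above can be modeled in a few lines. In this toy sketch (hypothetical `toy_*` names, not HDF5's H5T/H5I internals) the only thing measured is how many IDs get registered when a compound is converted with an exception callback in play: once by calling a `toy_convert` wrapper per member per element, which builds a fresh context and registers IDs every time, and once by registering member IDs up front and then copying and retargeting the context before each `toy_convert_with_ctx` call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model; names are illustrative, not HDF5's H5T/H5I internals. */
typedef int64_t toy_hid_t;
typedef struct { int unused; } toy_type_t;
typedef struct {
    toy_hid_t src_type_id;
    toy_hid_t dst_type_id;
    int has_except_cb; /* an exception callback might need the IDs */
} toy_conv_ctx_t;

static size_t toy_registrations = 0; /* the expensive step being counted */
static toy_hid_t toy_next_id = 1;

static toy_hid_t toy_register(toy_type_t *type)
{
    (void)type;
    toy_registrations++;
    return toy_next_id++;
}

static void toy_unregister(toy_hid_t id) { (void)id; }

/* Does one (member) conversion with a ready-made context. */
static int toy_convert_with_ctx(toy_type_t *src, toy_type_t *dst, const toy_conv_ctx_t *ctx)
{
    (void)src;
    (void)dst;
    /* An exception callback would be handed ctx->src_type_id/dst_type_id. */
    return (ctx->has_except_cb && ctx->src_type_id <= 0) ? -1 : 0;
}

/* Convenience wrapper: builds a fresh context, registering and releasing
 * IDs on every call when an exception callback is in play. */
static int toy_convert(toy_type_t *src, toy_type_t *dst, int has_except_cb)
{
    toy_conv_ctx_t ctx = {0, 0, has_except_cb};
    int ret;

    if (has_except_cb) {
        ctx.src_type_id = toy_register(src);
        ctx.dst_type_id = toy_register(dst);
    }
    ret = toy_convert_with_ctx(src, dst, &ctx);
    if (has_except_cb) {
        toy_unregister(ctx.src_type_id);
        toy_unregister(ctx.dst_type_id);
    }
    return ret;
}

/* Compound conversion that calls the wrapper per member per element. */
static size_t regs_naive(size_t nelmts, size_t nmembs)
{
    toy_type_t src = {0}, dst = {0};

    toy_registrations = 0;
    for (size_t e = 0; e < nelmts; e++)
        for (size_t m = 0; m < nmembs; m++)
            (void)toy_convert(&src, &dst, 1);
    return toy_registrations;
}

/* Compound conversion that registers member IDs once, then copies and
 * retargets the outer context for each member. */
static size_t regs_with_ctx(size_t nelmts, size_t nmembs)
{
    toy_type_t src = {0}, dst = {0};
    toy_hid_t src_ids[8], dst_ids[8];
    const toy_conv_ctx_t outer = {0, 0, 1};

    if (nmembs > 8)
        return 0; /* toy limit */
    toy_registrations = 0;
    for (size_t m = 0; m < nmembs; m++) { /* "init" time */
        src_ids[m] = toy_register(&src);
        dst_ids[m] = toy_register(&dst);
    }
    for (size_t e = 0; e < nelmts; e++)
        for (size_t m = 0; m < nmembs; m++) {
            toy_conv_ctx_t tmp = outer; /* copy, then retarget */
            tmp.src_type_id = src_ids[m];
            tmp.dst_type_id = dst_ids[m];
            (void)toy_convert_with_ctx(&src, &dst, &tmp);
        }
    return toy_registrations;
}
```

With 1000 elements and 3 members, `regs_naive(1000, 3)` returns 6000 while `regs_with_ctx(1000, 3)` returns 6.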

/* Unregister the hard conversion from int to float. Verify the conversion
* is a soft conversion. */
H5Tunregister(H5T_PERS_HARD, NULL, H5T_NATIVE_INT, H5T_NATIVE_FLOAT,
(H5T_conv_t)((void (*)(void))H5T__conv_int_float));
Collaborator Author

jhendersonHDF, Mar 10, 2024

This test was previously casting H5T__conv_int_float from a library conversion function into an application conversion function. Since the two have different prototypes, the changes in this PR started tripping over this. It appears to have worked in the past because the only difference between application and library conversion functions was an extra DXPL parameter at the very end of the application function's parameter list.
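Why the old cast was fragile: in C, calling a function through a pointer whose type isn't compatible with the function's actual type is undefined behavior, even when a cast makes it compile. The PR text doesn't spell out how dt_arith was changed, so the sketch below only illustrates one general alternative, an adapter with the application prototype that forwards to the library-style function. The `toy_*` types and prototypes are hypothetical and much shorter than HDF5's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins: the real HDF5 conversion prototypes take more parameters. */
typedef int64_t toy_hid_t;
typedef struct { int unused; } toy_type_t;

/* Library-style: type pointers, no DXPL. */
typedef int (*toy_lib_conv_t)(toy_type_t *src, toy_type_t *dst, size_t nelmts, void *buf);

/* Application-style: IDs plus a trailing DXPL ID. */
typedef int (*toy_app_conv_t)(toy_hid_t src_id, toy_hid_t dst_id, size_t nelmts, void *buf,
                              toy_hid_t dxpl_id);

static int toy_lib_int_float(toy_type_t *src, toy_type_t *dst, size_t nelmts, void *buf)
{
    (void)src; (void)dst; (void)buf;
    return (int)nelmts; /* pretend to report how many elements were converted */
}

/* Fragile: (toy_app_conv_t)toy_lib_int_float compiles, but calling through
 * that pointer is undefined behavior because the parameter lists differ.
 *
 * Alternative: an adapter whose prototype really is the application one. */
static int toy_app_int_float(toy_hid_t src_id, toy_hid_t dst_id, size_t nelmts, void *buf,
                             toy_hid_t dxpl_id)
{
    (void)src_id; (void)dst_id; (void)dxpl_id;
    /* A real adapter would have to resolve the IDs to type pointers here. */
    return toy_lib_int_float(NULL, NULL, nelmts, buf);
}

static const toy_app_conv_t toy_app_func = toy_app_int_float;
static const toy_lib_conv_t toy_lib_func = toy_lib_int_float;
```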

Collaborator Author

The changes here just fix an issue when testing with -fsanitize=address. It isn't really a problematic memory leak, since the program exits quickly, but AddressSanitizer still reports it as a failure.

dset_op.u.app_op.op = H5D__vlen_get_buf_size_cb;
dset_op.u.app_op.type_id = type_id;
dset_op.op_type = H5S_SEL_ITER_OP_LIB;
dset_op.u.lib_op = H5D__vlen_get_buf_size_cb;
Collaborator Author

It didn't look like there was a need for the callback here to be an application iter op, as opposed to a library one. If I left this as an application op, the passed in type_id would have to be unwrapped for every single element so that the H5T_t * can be assigned to dset_info.mem_type in the changes above to H5D__vlen_get_buf_size_cb. Switching this to a library iter op passes an H5T_t * to the callback which is much simpler.
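A toy model of the difference (hypothetical `toy_*` names; HDF5's actual selection-iteration op struct is not reproduced here): an application-style op carries an ID that has to be turned back into a pointer for every element, whereas a library-style op is handed the pointer directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model; not HDF5's selection-iteration types. */
typedef int64_t toy_hid_t;
typedef struct { size_t size; } toy_type_t;

typedef int (*toy_app_op_t)(toy_hid_t type_id, size_t elmt); /* gets an ID */
typedef int (*toy_lib_op_t)(toy_type_t *type, size_t elmt);  /* gets a pointer */

typedef enum { TOY_OP_APP, TOY_OP_LIB } toy_op_kind_t;

typedef struct {
    toy_op_kind_t kind;
    union {
        struct { toy_app_op_t op; toy_hid_t type_id; } app;
        struct { toy_lib_op_t op; toy_type_t *type; } lib;
    } u;
} toy_iter_op_t;

static toy_type_t toy_mem_type = {16};
static size_t toy_unwraps = 0;

/* Stand-in for an ID -> pointer lookup. */
static toy_type_t *toy_unwrap(toy_hid_t id)
{
    toy_unwraps++;
    return id == 1 ? &toy_mem_type : NULL;
}

static int toy_app_cb(toy_hid_t type_id, size_t elmt)
{
    toy_type_t *type = toy_unwrap(type_id); /* paid on every element */
    (void)elmt;
    return type ? 0 : -1;
}

static int toy_lib_cb(toy_type_t *type, size_t elmt)
{
    (void)elmt;
    return type ? 0 : -1; /* pointer is usable as-is */
}

/* Runs the op over nelmts elements and returns how many unwraps happened. */
static size_t toy_count_unwraps(toy_op_kind_t kind, size_t nelmts)
{
    toy_iter_op_t op;

    op.kind = kind;
    if (kind == TOY_OP_APP) {
        op.u.app.op = toy_app_cb;
        op.u.app.type_id = 1;
    }
    else {
        op.u.lib.op = toy_lib_cb;
        op.u.lib.type = &toy_mem_type;
    }

    toy_unwraps = 0;
    for (size_t i = 0; i < nelmts; i++) {
        int ret = (op.kind == TOY_OP_APP) ? op.u.app.op(op.u.app.type_id, i)
                                          : op.u.lib.op(op.u.lib.type, i);
        if (ret < 0)
            break;
    }
    return toy_unwraps;
}
```

`toy_count_unwraps(TOY_OP_APP, 100)` returns 100 and `toy_count_unwraps(TOY_OP_LIB, 100)` returns 0.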

Contributor

Agree, good call

if ((tmp_sid = H5I_register(H5I_DATATYPE, tmp_stype, false)) < 0)
HGOTO_ERROR(H5E_DATATYPE, H5E_CANTREGISTER, FAIL,
"unable to register ID for source datatype");
if ((tmp_did = H5I_register(H5I_DATATYPE, tmp_dtype, false)) < 0)
Collaborator Author

jhendersonHDF, Mar 10, 2024

Only register an ID for the types here if we have to pass them to an application conversion function
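A minimal sketch of "only register an ID when the callee needs one", assuming a conversion record with an `is_app` flag and a union of library and application function pointers, as in the quoted code. The `toy_*` registry is hypothetical, not HDF5's H5I package. It also checks the results of the cleanup calls, in the spirit of the reviewer's request to stop ignoring error returns in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model; the toy_* registry is hypothetical, not HDF5's H5I package. */
typedef int64_t toy_hid_t;
#define TOY_INVALID_HID ((toy_hid_t)-1)
typedef struct { size_t size; } toy_type_t;

typedef int (*toy_lib_func_t)(toy_type_t *src, toy_type_t *dst);
typedef int (*toy_app_func_t)(toy_hid_t src_id, toy_hid_t dst_id);

typedef struct {
    bool is_app;
    union { toy_lib_func_t lib_func; toy_app_func_t app_func; } u;
} toy_conv_t;

static size_t toy_live_ids = 0;  /* IDs currently registered */
static size_t toy_total_ids = 0; /* IDs ever registered */

static toy_hid_t toy_register(toy_type_t *type)
{
    if (type == NULL)
        return TOY_INVALID_HID;
    toy_live_ids++;
    toy_total_ids++;
    return (toy_hid_t)toy_total_ids;
}

static int toy_dec_ref(toy_hid_t id)
{
    if (id == TOY_INVALID_HID || toy_live_ids == 0)
        return -1;
    toy_live_ids--;
    return 0;
}

static int toy_lib_noop(toy_type_t *src, toy_type_t *dst) { (void)src; (void)dst; return 0; }
static int toy_app_noop(toy_hid_t src_id, toy_hid_t dst_id) { (void)src_id; (void)dst_id; return 0; }

/* Wrap the types in IDs only when the callee is an application function. */
static int toy_call_conv(const toy_conv_t *conv, toy_type_t *src, toy_type_t *dst)
{
    toy_hid_t sid, did;
    int ret;

    if (!conv->is_app)
        return conv->u.lib_func(src, dst); /* pointers go straight through */

    if ((sid = toy_register(src)) == TOY_INVALID_HID)
        return -1;
    if ((did = toy_register(dst)) == TOY_INVALID_HID) {
        (void)toy_dec_ref(sid); /* already failing; best-effort cleanup */
        return -1;
    }

    ret = conv->u.app_func(sid, did);

    /* Don't assume the cleanup succeeds. */
    if (toy_dec_ref(sid) < 0)
        ret = -1;
    if (toy_dec_ref(did) < 0)
        ret = -1;
    return ret;
}

/* Returns how many IDs one conversion call created, or (size_t)-1 on error. */
static size_t toy_ids_created(bool is_app)
{
    toy_type_t src = {4}, dst = {8};
    toy_conv_t conv;

    conv.is_app = is_app;
    if (is_app)
        conv.u.app_func = toy_app_noop;
    else
        conv.u.lib_func = toy_lib_noop;

    toy_live_ids = toy_total_ids = 0;
    if (toy_call_conv(&conv, &src, &dst) < 0)
        return (size_t)-1;
    return toy_total_ids;
}
```

`toy_ids_created(false)` returns 0 and `toy_ids_created(true)` returns 2; in both cases `toy_live_ids` is back to 0 afterwards.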

if ((tmp_did = H5I_register(H5I_DATATYPE, tmp_dtype, false)) < 0)
HGOTO_ERROR(H5E_DATATYPE, H5E_CANTREGISTER, FAIL,
"unable to register ID for destination datatype");

if ((conv->u.app_func)(tmp_sid, tmp_did, &cdata, (size_t)0, (size_t)0, (size_t)0, NULL, NULL,
H5CX_get_dxpl()) < 0) {
H5I_dec_ref(tmp_sid);
Contributor

I know this is legacy code, but adding checks for error returns would be good throughout this section of code.

Collaborator Author

Will do.

path->cdata.command = H5T_CONV_INIT;
if (conv->is_app) {
if (tmp_stype && ((src_id = H5I_register(H5I_DATATYPE, tmp_stype, false)) < 0))
Contributor

How would tmp_stype or tmp_dtype be NULL here? Shouldn't you always register src_id and dst_id?

Collaborator Author

Based on the code above, which previously only copied the type and wrapped it in an ID if path->src and path->dst weren't NULL, I kept that same logic on the assumption that there's a case where the copying could get skipped. That said, if it made it into one of the following blocks, it would pass NULL or H5I_INVALID_HID depending on the type of conversion function. After studying the code for a while I couldn't really determine when path->src or path->dst would be NULL, so I tried to leave the functionality the same.

Contributor

Wouldn't the IDs be INVALID then?

Collaborator Author

Yes. That part of the code probably needs to be studied more since previously AND currently, if it hit that particular spot it would pass invalid stuff to the conversion functions.

Contributor

Sorry, re-read your earlier comment and you said it would pass INVALID. :-)

Contributor

I think the earlier version of the code was buggy and you can assume that the tmp_stype & tmp_dtype are non-NULL.

Collaborator Author

I may leave this as it is for now. That function is fairly complicated, but it looks like the datatypes in question could be NULL for the library's internal no-op function, as well as for an application-registered soft conversion no-op function. If that's the case, trying to copy a NULL type would trigger an assertion failure in the library, but these functions seem to happily accept NULL or H5I_INVALID_HID for the datatypes since they just ignore them. It seems like a contrived case, but I can't easily predict what people may do. The library's no-op conversion function should never hit that part of the code as it is currently written, but an application function might.

"can't copy source compound member datatype");
priv->src_memb[i] = type;

if ((tid = H5I_register(H5I_DATATYPE, type, false)) < 0)
Contributor

Why do we need IDs for these types still?

Collaborator Author

We still need them to pass to the member conversion functions in case a conversion exception function is being used. I'd like to allocate the arrays and create the IDs only when needed, but nothing currently passed to the conversion function tells me at init time whether I need to. The new conversion context is only available at conversion time, not at init time. I suppose we could add a field to the H5T_cdata_t structure, though since it's a public one I've tried to avoid that. It also wouldn't really be of interest to an application conversion callback.

Contributor

Couldn't you call H5T_convert here, instead of H5T_convert_ctx, and let it sort things out?

Contributor

Or add a helper routine (maybe in the _init() call) that determined which ones needed an ID and which could avoid making one?

Collaborator Author

jhendersonHDF, Mar 10, 2024

The ctx version of convert was actually created for this specific reason since calling H5T_convert would cause an enormous amount of overhead as it would be creating and destroying IDs for every member of every compound element when a conversion exception function is used. That said, maybe I can provide the context object at init time with a field that lets the function know if it needs these IDs and then separate which fields of the context will be valid based on the conversion command type specified in the cdata.
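One way the idea above could look, as a hedged sketch: the context is also passed at init time and carries a flag (the hypothetical `need_ids`, e.g. set when an exception callback is registered), so a compound conversion's init step can skip allocating the member ID arrays and registering IDs altogether. The squash-merge notes for #4113 on this page describe making the context available during initialization; the `toy_*` names and fields here are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: a context passed at both init and convert time. */
typedef int64_t toy_hid_t;

typedef struct {
    bool need_ids; /* e.g. true when a conversion exception callback is set */
} toy_conv_ctx_t;

typedef struct {
    size_t nmembs;
    toy_hid_t *src_memb_id; /* NULL unless need_ids was set at init */
    toy_hid_t *dst_memb_id;
} toy_compound_priv_t;

static toy_hid_t toy_next_id = 1;

static int toy_compound_init(toy_compound_priv_t *priv, size_t nmembs, const toy_conv_ctx_t *ctx)
{
    priv->nmembs = nmembs;
    priv->src_memb_id = NULL;
    priv->dst_memb_id = NULL;
    if (!ctx->need_ids || nmembs == 0)
        return 0; /* skip both the allocation and the ID registration */

    priv->src_memb_id = malloc(nmembs * sizeof *priv->src_memb_id);
    priv->dst_memb_id = malloc(nmembs * sizeof *priv->dst_memb_id);
    if (!priv->src_memb_id || !priv->dst_memb_id) {
        free(priv->src_memb_id);
        free(priv->dst_memb_id);
        priv->src_memb_id = priv->dst_memb_id = NULL;
        return -1;
    }
    for (size_t i = 0; i < nmembs; i++) {
        priv->src_memb_id[i] = toy_next_id++; /* stand-in for registering an ID */
        priv->dst_memb_id[i] = toy_next_id++;
    }
    return 0;
}

static void toy_compound_free(toy_compound_priv_t *priv)
{
    free(priv->src_memb_id);
    free(priv->dst_memb_id);
    priv->src_memb_id = priv->dst_memb_id = NULL;
}

/* Reports whether init allocated member ID arrays for a 4-member compound. */
static bool toy_init_allocates(bool need_ids)
{
    toy_conv_ctx_t ctx = {need_ids};
    toy_compound_priv_t priv;
    bool allocated;

    if (toy_compound_init(&priv, 4, &ctx) < 0)
        return false;
    allocated = priv.src_memb_id != NULL;
    toy_compound_free(&priv);
    return allocated;
}
```

`toy_init_allocates(true)` returns true and `toy_init_allocates(false)` returns false.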

Contributor

Yes, that's the direction I was thinking.

0)
HGOTO_ERROR(H5E_DATASET, H5E_CANTREGISTER, FAIL,
"unable to register types for conversion");
if (NULL == (tmp_type = H5T_copy(src_parent, H5T_COPY_ALL)))
Contributor

Why does this type need to be copied? Same question for tons of other copies - I think the only reason 90% of them were originally copied was because they needed an ID and decrementing the ID would delete the type object. Since we don't need the IDs, we shouldn't need the copies.

Collaborator Author

Agreed (and added a comment above after thinking for a bit). My goal would be to tackle that directly after this PR, I just didn't want to dive into that yet since it can be tricky and this is also already a fairly large set of changes.

Contributor

qkoziol left a comment

Super! Long overdue!

@@ -1498,8 +1498,8 @@ H5T_top_term_package(void)
} /* end if */
} /* end if */
else {
if ((path->conv.u.lib_func)((hid_t)FAIL, (hid_t)FAIL, &(path->cdata), (size_t)0,
(size_t)0, (size_t)0, NULL, NULL) < 0) {
if ((path->conv.u.lib_func)(NULL, NULL, &(path->cdata), NULL, (size_t)0, (size_t)0,
Member

Don't need noise casts for small integers

derobins merged commit ef401a5 into HDFGroup:develop on Mar 10, 2024 (48 checks passed)
jhendersonHDF deleted the dtype_conv_id_removal branch on March 11, 2024 00:57
lrknox pushed a commit to lrknox/hdf5 that referenced this pull request Mar 21, 2024
…FGroup#4104)

lrknox added a commit that referenced this pull request Mar 21, 2024
* Added new H5E with tests. (#4049)

Added Fortran H5E APIs:
h5eregister_class_f, h5eunregister_class_f, h5ecreate_msg_f, h5eclose_msg_f
h5eget_msg_f, h5epush_f, h5eget_num_f, h5ewalk_f, h5eget_class_name_f,
h5eappend_stack_f, h5eget_current_stack_f, h5eset_current_stack_f, h5ecreate_stack_f,
h5eclose_stack_f, h5epop_f, h5eprint_f (C h5eprint v2 signature)

Addresses Issue #3987

* Don't load toolchain file in CMake workflows (#4077)

* Add support for the new MSVC preprocessor (#4078)

Microsoft has added a new, standards-conformant preprocessor
to MSVC, which can be enabled with /Zc:preprocessor. This
preprocessor trips over our HDopen() function-like variadic
macro since it uses a hack that only works with the legacy
MSVC preprocessor.

This fix adds ifdefs to use the correct HDopen() macro
depending on the MSVC preprocessor selected.

Fixes #2515

* Increased H5FD_ROS3_MAX_SECRET_TOK_LEN to 4096 to accommodate long AWS secret tokens (#4064)

ros3: increased H5FD_ROS3_MAX_SECRET_TOK_LEN to 4096; stratified the debugging statements so there is more control over the output

* Close and reopen file during dset vlen IO API tests (#4050)

- Close/reopen file and file objects to prevent cache from being used instead of actual I/O.
- Moved vlen io test datasets under the dset container group instead of the root group
- Moved the PASSED() invocation to after individual test cleanup in case an error occurs during H5Treclaim

* New option for building with static CRT in Windows (#4062)

* addressed compilation errors with gfortran 4.8 (#4082)

* Fix bin/trace script w/ out params (#4074)

The bin/trace script adds TRACE macros to public API calls in the main
C library. This script had a parsing bug that caused functions that
were annotated with /*out*/, etc. to be labeled as void pointers
instead of typed pointers.

This is mainly a developer feature and not visible to consumers
of the public API.

The bin/trace script now annotates public API calls properly.

Fixes GH #3733

* Use H5T_STD_I32LE to create datatype in vds examples (#4070)

Fixes issues when VDS examples are tested on BE systems

* Remove printf debugging in H5HL code (#4086)

* Fixed asserts due to H5Pset_est_link_info() values (#4081)

* Fixed asserts due to H5Pset_est_link_info() values

If large values for est_num_entries and/or est_name_len were passed
to H5Pset_est_link_info(), the library would attempt to create an
object header NIL message to reserve enough space to hold the links in
compact form (i.e., concatenated), which could exceed allowable object
header message size limits and trip asserts in the library.

This bug only occurred when using the HDF5 1.8 file format or later and
required the product of the two values to be ~64k more than the size
of any links written to the group, which would cause the library to
write out a too-large NIL spacer message to reserve the space for the
unwritten links.

The library now inspects the phase change values to see if the group
is likely to be compact and checks the size to ensure any NIL spacer
messages won't be larger than the library allows.

Fixes GitHub #1632

* Fix copy-paste comments

* update macOS support statement (#4084)

* fixes compilation failures due to H5E additions (#4090)

* Remove extra whitespaces from nvhpc-cmake action. (#4091)

* Remove printf debugging in H5I package (#4088)

* Add subfiling for h5dump filedriver option help message (#3878)

* Merge HDF4 release workflow changes to hdf5 (#4093)

* Update long double test with correct values (#4060)

Update long double test with correct values

* virtual creates must use the same datatype as the main file (#4092)

* Fixed -Wdeprecated-copy-dtor warnings by implementing a copy assignment operator (#3306)

Example warning was:

warning: definition of implicit copy assignment operator for 'Group' is deprecated because it has a user-declared destructor [-Wdeprecated-copy-dtor]

* Expand check for variable-length or reference types when clearing datatype conversion paths (#4085)

When clearing out datatype conversion paths involving variable-length or reference datatypes
on file close, also check for these datatypes inside compound or array datatypes

* Remove H5B debug checks (#4089)

The H5B (version 1 B-tree) package would add some computationally
expensive integrity checks when H5B_DEBUG was defined. Due to their
negative effects on performance, this option was rarely turned on,
making the H5B__assert() check function stale, if not dead, code.

This change:

* Builds H5B__assert() when NDEBUG is not defined (the function
  relies on assert()) so it gets compiled more often.
* Removes some printf debugging statements in the B-tree code
* Removes all H5B "extra debug" checks that are leftover from
  past debugging sessions. Maintainers can add H5B__assert()
  selectively to perform integrity checks when debugging.
* Removes the HDF5_ENABLE_DEBUG_H5B CMake option

H5B_DEBUG now has no effect

* Fix the last C++ stack size warning (#4099)

* Clean up off_t usage (#4095)

* Add comments to C++ and Fortran API calls that use off_t
* Remove noise casts for small integers

* Correct missing change of source path for S3 build (#4100)

* Remove leading / from relative path. (#4101)

* msvc: don't declare `HAVE_TIMEZONE` for older MSVC (#3956)

It was introduced in MSVC 15 (Visual Studio 2017).

* Remove a few H5O printf debugging statements (#4096)

These were in H5Oint.c, were protected by H5O_DEBUG, and only dumped
to stdout if the HDF5_DEBUG environment variable were set to do so.

* Remove H5DEBUG() calls from H5Dmpio.c (#4103)

Just use stdout when a stream is needed.

* Remove printf debugging from H5Smpio.c (#4098)

* Change how stats are printed in H5Z (#4097)

H5Z used the soon-to-be-removed HDEBUG macro to decide if stats
would be dumped and to what stream. This is now handled by a
DUMP_DEBUG_STATS_g variable and the output is always sent to
stdout.

This is an internal change, not normally visible to users.

* Refactor datatype conversion code to use pointers rather than IDs (#4104)

* Minimize use of abort() (#4110)

The abort() call is used in several places where it probably shouldn't be.

* Clean up a few things in H5T.c (#4105)

* remove (size_t) noise casts
* replace (hid_t)FAIL with H5I_INVALID_HID

* Convert H5B__assert to use error checks (#4109)

Switches assert() calls to HGOTO_ERROR in H5B__assert() so it can be
used in production mode. Also renames it to H5B__verify_structure()
to better reflect what it checks.

* Move common variables out of cache test blocks (#4108)

Fixes a stack size warning w/ XCode

* Remove lint comments (#4107)

* Change compression tests reference files to use masking for compression ratios (#4083)

Rework TEST_FILTER tests to handle slightly different compression ratios

* Add Doxygen for HDFS VFD (#4106)

* Add Doxygen for HDFS VFD

* Fix Doxygen warning

* Update H5FDhdfs.h

* long double tests have problems setting precision with offset (#4102)

* long double tests have problems setting precision with offset

* Disable long double until more development fixes issues

* Fix up dsets test for some platforms with different long double format (#4114)

* Adjust aocc workflow to fit the autotools/cmake pattern of other callable workflows. (#4115)

* Implement ID creation optimization for container datatype conversions (#4113)

Makes the datatype conversion context object available during both the
initialization and conversion processes for a datatype conversion
function, allowing the compound, variable-length and array datatype
conversion functions to avoid creating IDs for the datatypes when they
aren't necessary

Adds internal H5CX_pushed routine to determine if an API context is
available to retrieve values from

Also adds error checking to several places in H5T.c and H5Tconv.c where
the code had previously assumed object close operations would succeed

* Handle IBM long double issues in dsets.c test_floattypes test (#4116)

* Handle IBM long double issues in dsets.c test_floattypes test

* Remove old cmake check (#4117)

* Use AC_SYS_LARGEFILE on Autotools (#4119)

We previously used a hack introduced in 1.8.5 to paper over differences
in off_t and off64_t when determining the type sizes. We no longer explicitly
support off64_t in the library and AC_SYS_LARGEFILE works fine.

* Initialize selection type in chunk struct (#4087)

* Overhaul CMake LFS support (#4122)

Externally visible:
* The HDF_ENABLE_LARGE_FILE option (advanced) has been removed
* We no longer run a test program to determine if LFS works, which
  will help with cross-compiling
* On Linux we now unilaterally set -D_LARGEFILE_SOURCE and
  -D_FILE_OFFSET_BITS=64, regardless of 32/64 bit system. CMake
  doesn't offer a nice equivalent to AC_SYS_LARGEFILE and since
  those options do nothing on 64-bit systems, this seems safe and
  covers all our bases. We don't set -D_LARGEFILE64_SOURCE since
  we don't use any of the POSIX 64-bit specific API calls like
  ftello64, as noted above.
* We didn't test for LFS support on non-Linux platforms. We've added
  comments for how LFS should probably be supported on AIX and Solaris,
  which seem to be alive, though uncommon. PRs would be appreciated if
  anyone wishes to test this.

Internal:
* Drops off64_t size checks since this is unused (as in Autotools)
* Remove HDF_EXTRA_FLAGS, which is now unused
* Remove hack around deprecated LINUX_LFS

Fixes #2395

* Update CMake comment about _POSIX_C_SOURCE (#4124)

Was missing the 2008 pread/write info

* Deprecate bin/cmakehdf5 (#4127)

* Deprecate bin/cmakehdf5

* Add reference text

* Don't set the rpath when linking statically (#4125)

* Remove invalid compile flag (#4129)

* Fix segfault in vlen io API test (#4130)

* Update URLs in RELEASE.txt (#4132)

* Add cygwin CI and update yaml files for consistency and accuracy (#4131)

* Add cygwin CI

* add cygwin packages

* Correct option names

* Cleanup yaml file and synch look and feel

* Synch CI look and feel and correct path issues

* Upgrade oneapi version

* pwsh needs env: for vars

* No continuation char for pwsh

* restore correct pwsh step

* Run subset of tests for cygwin workflow

* Remove space chars in regex

* restore full tests

* Remove ros3 and hdfs VFDs from Autotools VFD list (#4142)

These will never pass `make check` and would require a custom test
suite for more comprehensive testing.

* Skip part of dsets.c test for IBM long double type (#4136)

* Capitalize option message for consistency. (#4141)

* Fixed misc. H5E fortran failures due to previous PR (#4138)

* fixed promotion of integers and reals tests and check-passthrough-vol failure

* fixed cygwin issue

* Fix Autotools -Werror cleanup (#4144)

The Autotools temporarily scrub -Werror(=whatever) from CFLAGS, etc.
  so configure checks don't trip over warnings generated by configure
  check programs. The sed line originally only scrubbed -Werror but not
  -Werror=something, which would cause errors when the '=something' was
  left behind in CFLAGS.

  The sed line has been updated to handle -Werror=something lines.

  Fixes one issue raised in #3872
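
The sed expression itself isn't reproduced here. The sketch below expresses the same filtering rule in C, with hypothetical function names. The key point is that a token must be dropped when it is exactly `-Werror` or when it starts with `-Werror=`. Unrelated flags such as `-Wno-error` are kept.

```c
#include <stdlib.h>
#include <string.h>

/* Returns nonzero if the `len`-byte token `tok` is "-Werror" or
 * "-Werror=<something>". */
static int is_werror(const char *tok, size_t len)
{
    const size_t n = strlen("-Werror");

    if (len < n || strncmp(tok, "-Werror", n) != 0)
        return 0;
    return len == n || tok[n] == '=';
}

/* Hypothetical helper: copies space-separated `flags` into a newly allocated
 * string with every -Werror and -Werror=... token removed. The caller frees
 * the result; returns NULL on allocation failure. */
char *scrub_werror(const char *flags)
{
    char       *out = malloc(strlen(flags) + 1);
    char       *dst = out;
    const char *p   = flags;

    if (out == NULL)
        return NULL;

    while (*p) {
        const char *start;
        size_t      len;

        while (*p == ' ') /* skip separators */
            p++;
        if (*p == '\0')
            break;
        start = p;
        while (*p && *p != ' ') /* find end of token */
            p++;
        len = (size_t)(p - start);

        if (!is_werror(start, len)) {
            if (dst != out)
                *dst++ = ' ';
            memcpy(dst, start, len);
            dst += len;
        }
    }
    *dst = '\0';
    return out;
}
```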

* Fix doxygen link to example function usage (#4133)

* Remove useless headers (#4145)

Removes unnecessary headers from C library source files.

* Clean up some hbool_t/TRUE/FALSE stragglers (#4143)

It looks like most of these snuck in via selection I/O work

* Fix error when overwriting an indirectly nested vlen with a shorter sequence (#4140)

* defined CMAKE_H5_HAVE_DARWIN (#4146)

* Make the newsletter scheme work like HDF4 (#4149)

* Remove  at the end of list item. (#4151)

* Fix buffer size calculation in the deflate filter (#4147)

* Remove H5O header and friend status from H5A.c (#4150)

* Remove HDF from Fortran 2003 configuration check message. (#4157)

* Suppress H5Dmpio debugging output unless HDF5_DEBUG=d is set (#4155)

* Header cleanup in C library (#4154)

* Ensure H5FL header is included everywhere

* Ensure H5SL header is included everywhere

* Ensure H5MM header is included everywhere

* Add Doxygen to H5FDmirror.h (#4158)

* Remove lseek64 and stat64 symbols from CMake (#4163)

We don't use these in the library.

* Remove HAVE_IOEO checks from CMake (#4160)

This was intended to check for thread-safety functionality on Windows.
The required functionality has been standard since Windows Vista, so
these checks can be removed.

* Fix some minor warnings (#4165)

* Bump the size of the mirror VFD IP field (#4167)

The IP address string isn't big enough to hold an IPv4-mapped IPv6
address.
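
Here is the size problem in concrete terms. `INET_ADDRSTRLEN` (16 bytes) is enough for any dotted-quad IPv4 string. An IPv4-mapped IPv6 address such as `::ffff:255.255.255.255` needs 23 bytes, including the terminator. The sketch below builds such an address; the helper name is hypothetical, and the mirror VFD's actual field and its old size are not shown in this log. It shows `inet_ntop()` failing with an IPv4-sized buffer and succeeding with `INET6_ADDRSTRLEN`.

```c
#define _POSIX_C_SOURCE 200112L

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: formats the IPv4-mapped IPv6 form (::ffff:a.b.c.d)
 * of the dotted-quad `ipv4` into `out` of size `outlen`. Returns the text
 * length, or -1 if the input is invalid or `out` is too small. */
int format_v4_mapped(const char *ipv4, char *out, size_t outlen)
{
    struct in_addr  a4;
    struct in6_addr a6;

    if (inet_pton(AF_INET, ipv4, &a4) != 1)
        return -1;

    memset(&a6, 0, sizeof a6);
    a6.s6_addr[10] = 0xff;
    a6.s6_addr[11] = 0xff;
    memcpy(&a6.s6_addr[12], &a4.s_addr, 4); /* s_addr is in network order */

    /* inet_ntop() returns NULL (errno ENOSPC) if the text doesn't fit. */
    if (inet_ntop(AF_INET6, &a6, out, (socklen_t)outlen) == NULL)
        return -1;
    return (int)strlen(out);
}
```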

* Fix mirror VFD script (#4170)

This had directory problems when running locally.

* Fix an issue where the Subfiling VFD's context cache grows too large (#4159)

* Address code page issues w/ Windows file paths (#4172)

On Windows, HDF5 attempted to convert file paths passed to open() and
remove() to UTF-16 in order to handle Unicode file paths. This scheme
does not work when the system uses code pages to handle non-ASCII
file names.

As suggested in the forum post below, we now also try opening the file
with plain open(), which should handle systems where non-ASCII code
pages are in use.

https://forum.hdfgroup.org/t/open-create-hdf5-files-with-non-utf8-chars-such-as-shift-jis/11785

* Add Doxygen to API calls in H5VLnative.h (#4173)

* Allow H5Soffset_simple to accept NULL offsets (#4152)

The reference manual states that the offset parameter of H5Soffset_simple()
  can be set to NULL to reset the offset of a simple dataspace to 0. This
  has never been true, and passing NULL was regarded as an error.

  The library will now accept NULL for the offset parameter and will
  correctly set the offset to zero.

  Fixes HDFFV-9299
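
In application code, this change means that a call such as `H5Soffset_simple(space_id, NULL)` now succeeds instead of returning an error. The sketch below models only that contract, using a hypothetical toy dataspace type; it is not HDF5's internal H5S code. A non-NULL offset is copied per dimension, and NULL resets every dimension's offset to zero.

```c
#include <stddef.h>

#define TOY_MAX_RANK 32

/* Hypothetical stand-in for a simple dataspace; not HDF5's H5S_t layout. */
typedef struct {
    int       rank;
    long long offset[TOY_MAX_RANK];
} toy_space_t;

/* Models the documented H5Soffset_simple() behavior: copy `offset` (one
 * entry per dimension) or, when `offset` is NULL, reset every entry to 0.
 * Returns 0 on success, -1 on invalid arguments. */
int toy_offset_simple(toy_space_t *space, const long long *offset)
{
    if (space == NULL || space->rank < 0 || space->rank > TOY_MAX_RANK)
        return -1;
    for (int i = 0; i < space->rank; i++)
        space->offset[i] = (offset != NULL) ? offset[i] : 0;
    return 0;
}
```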

* Add filter plugin user guide text. Fix registered URL in docs (#4169)

* Add support for _Float16 16-bit floating point type (#4065)

Fixed some conversion issues with Clang due to problematic undefined
behavior when casting a negative floating-point value to an integer

Fixed a bug in the library's software integer to floating-point
conversion function where a user's conversion exception function
returning H5T_CONV_UNHANDLED in the case of overflows would result in
incorrect data after conversion

Added configure checks for functions and macros related to _Float16
usage since some compilers expose the datatype but not the functions or
macros

Fixed a dt_arith test failure when H5_WANT_DCONV_EXCEPTION isn't defined

Fixed a few warnings from not explicitly casting some _Float16 variables
upwards
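
Some background on the Clang issue: in C, converting a floating-point value whose integer part cannot be represented in the target type is undefined behavior. Converting -1.5 to an unsigned type is one example. The range therefore has to be checked before the cast. The sketch below shows that pattern. Its status names are hypothetical and only loosely echo the idea of a conversion exception; they are not the library's actual code or H5T_conv_except_t values.

```c
#include <limits.h>
#include <math.h>

/* Hypothetical outcome codes for an out-of-range conversion. */
typedef enum { CONV_OK, CONV_NEG_OVERFLOW, CONV_POS_OVERFLOW, CONV_NAN } conv_status;

/* Converts `v` to unsigned int, clamping out-of-range values instead of
 * casting them. Only values whose truncated integer part lies in
 * [0, UINT_MAX] are actually cast, so no undefined behavior occurs. */
conv_status double_to_uint(double v, unsigned int *out)
{
    if (isnan(v)) {
        *out = 0;
        return CONV_NAN;
    }
    if (v <= -1.0) { /* truncates to a negative integer: not representable */
        *out = 0;
        return CONV_NEG_OVERFLOW;
    }
    if (v >= (double)UINT_MAX + 1.0) { /* at or above 2^32 for 32-bit unsigned */
        *out = UINT_MAX;
        return CONV_POS_OVERFLOW;
    }
    *out = (unsigned int)v; /* safe: truncated value is in range */
    return CONV_OK;
}
```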

* Remove some H5T_copy calls that are now unnecessary (#4164)

Removes some datatype copying calls that are now unnecessary after
refactoring the datatype conversion code to use pointers internally
rather than IDs

Rewrites the enum conversion function so that it uses cached copies
of the source and destination datatypes in order to avoid modifying
the datatypes passed in

Adds a 'recursive' field to the datatype conversion context which
allows the conversion functions for members of a container datatype
to skip unnecessary repetitive conversion setup code

Changes internal datatype conversion callback functions so that the
source and destination datatype structure pointers are const

Removes some unused and unnecessary internal IDs registered with
H5I_register
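
Here is a rough sketch of the 'recursive' idea, under stated assumptions. The context struct, its counter, and the function names below are hypothetical. They only illustrate how a flag set by a container conversion lets the per-member conversions skip setup that has already run once.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical conversion context; only the `recursive` idea comes from the
 * change description, the counter is illustrative. */
typedef struct {
    bool recursive;   /* true while converting members of a container type */
    int  setup_calls; /* how often the (expensive) setup step ran */
} conv_ctx_t;

static void conv_setup(conv_ctx_t *ctx)
{
    ctx->setup_calls++; /* stands in for caching types, building tables, etc. */
}

/* Converts one member; setup runs only when called at the top level. */
static void convert_member(conv_ctx_t *ctx, const int *src, long *dst)
{
    if (!ctx->recursive)
        conv_setup(ctx);
    *dst = (long)*src;
}

/* Converts `nelmts` container elements of `nmembers` int members each.
 * Setup runs once; member conversions are then marked recursive. The
 * previous flag value is restored so nested containers also work. */
void convert_compound(conv_ctx_t *ctx, const int *src, long *dst,
                      size_t nelmts, size_t nmembers)
{
    bool saved = ctx->recursive;

    if (!saved)
        conv_setup(ctx);
    ctx->recursive = true;
    for (size_t i = 0; i < nelmts * nmembers; i++)
        convert_member(ctx, &src[i], &dst[i]);
    ctx->recursive = saved;
}
```

Without the flag, the example in the test below would run setup seven times (once for the container and once per member) instead of once.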

* Add RELEASE.txt note for cmpd segfault fix (#4175)

RELEASE notice for the fix in #3842

* Clean up CMake direct VFD handling (#4161)

There's no need to build and run programs, or even check the operating
system. We just need to check for O_DIRECT and posix_memalign().
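
The sketch below exercises the two features those checks look for. The helper names are hypothetical, and this is not the CMake check itself. It tests whether the headers define `O_DIRECT`, and whether `posix_memalign()` can provide the aligned buffers that O_DIRECT I/O typically requires.

```c
#define _GNU_SOURCE /* glibc declares O_DIRECT only with this */

#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if the platform headers define O_DIRECT, else 0. */
int have_o_direct(void)
{
#ifdef O_DIRECT
    return 1;
#else
    return 0;
#endif
}

/* Allocates `size` bytes aligned to `alignment` (a power of two and a
 * multiple of sizeof(void *)). Returns NULL on failure; release with free(). */
void *alloc_direct_buffer(size_t alignment, size_t size)
{
    void *buf = NULL;

    if (posix_memalign(&buf, alignment, size) != 0)
        return NULL;
    return buf;
}
```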

* Capitalize linux for consistency (#4178)

* Reworked H5Epush_f (#4153)

* Add const to new _Float16 conversion routine parameters (#4181)

* Update Release Specific Information link. (#4179)

* Filter plugins updates for registration URL (#4180)

* Update filter plugin URL to new location

* Adjust test array size

* Add daily VFD CI workflow (#4176)

Adds testing of Subfiling VFD

* Exclude shell tests from sanitizers (#4186)

* Add a missing period at the end of sentence. (#4184)

* last-file.txt should not be created for release workflow (#4185)

* Skip part of dtypes.c _Float16 file size check for certain VFDs (#4182)

* Fixes several MinGW + Autotools issues (#4190)

* Fixes detection of various Windows libraries, etc.
* Corrects alarm(2) configure checks
* Uses Win32 threads by default w/ Pthreads override, if desired
* Set _WIN32_WINNT correctly for MinGW
* Fix setenv(3) wrapper for MinGW, which does not have getenv_s()

MinGW Autotools support is still not amazing, but this at least
allows the library and tools to build and is better about thread-safety

* Add semicolons to the end of HSYS_GOTO_ERROR (#4193)

Looks like we forgot these when we did the other macros.
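
Some background on the convention: a function-like macro meant to be used like a statement is commonly wrapped in `do { ... } while (0)`. This makes the trailing semicolon mandatory and keeps the macro safe inside an unbraced `if`/`else`. The macro below is hypothetical and only illustrates the convention; it is not HDF5's HSYS_GOTO_ERROR definition.

```c
#include <stdio.h>

/* Hypothetical error macro: expands to exactly one statement, so each call
 * site must end with ';'. It assumes the enclosing function declares
 * `ret_value` and a `done:` label. */
#define GOTO_ERROR(msg)                            \
    do {                                           \
        fprintf(stderr, "error: %s\n", (msg));     \
        ret_value = -1;                            \
        goto done;                                 \
    } while (0)

/* Returns x if it is positive, otherwise reports an error and returns -1. */
int check_positive(int x)
{
    int ret_value = 0;

    if (x <= 0)
        GOTO_ERROR("value must be positive"); /* ';' required; else still binds */
    else
        ret_value = x;

done:
    return ret_value;
}
```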

* Remove broken links (#4187)

* Skip vlen IO API test for cache VOL (#4135)

* Fix cache VOL segfault in vlen io test
* Skip vlen IO API test

* Handle certain empty subfiling environment variables (#4038)
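
The commit title doesn't say which variables are affected or how they are handled. One common approach treats a variable that is set but empty the same as an unset one, so the caller's default applies. It is sketched here with a hypothetical helper and variable name.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv() in the usage example */

#include <stdlib.h>

/* Hypothetical helper: returns the value of environment variable `name`,
 * or NULL if it is unset or set to the empty string (e.g. `export VAR=`). */
const char *getenv_nonempty(const char *name)
{
    const char *val = getenv(name);

    return (val != NULL && val[0] != '\0') ? val : NULL;
}
```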

* h5diff compares attribute data like dataset data (#4191)

Updates tools docs to indicate that dataset and attribute data are compared in the same way

* A path component may include a dot with other characters (#4192)

* Add RELEASE.txt note for recent datatype conversion improvements (#4195)

* Add NEWSLETTER item about _Float16 support (#4197)

* Correct download link for develop doxygen (#4196)

* Update version in new .yml files.
Labels
Component - C Library: Core C library issues (usually in the src directory)
Component - Testing: Code in test or testpar directories, GitHub workflows
Priority - 2. Medium ⏹: It would be nice to have this in the next release
Type - Improvement: Improvements that don't add a new feature or functionality
Projects
Status: Needs Merged

3 participants