WIP: Allow users to set data alignment within netcdf4 files #2178

Closed
wants to merge 2 commits into from

hmaarrfk
Contributor

@hmaarrfk hmaarrfk commented Jan 8, 2022

Closes #2177

I'm really not sure how to make this "backward" compatible with older versions.

@hmaarrfk hmaarrfk requested a review from WardF as a code owner January 8, 2022 17:12
@DennisHeimbigner
Collaborator

A couple of immediate questions.

  1. What is the use case(s) for this?
  2. Can this be done by a separate (new) API function that must be called after nc_create, but before anything is written to the file?

@hmaarrfk
Contributor Author

hmaarrfk commented Jan 8, 2022

The main use case for this is to memory-map large arrays in an aligned fashion. This enables efficient transfer of data between SSDs, network interfaces, and RAM.

I'll look into whether this can be set later, after the file has been opened.

@hmaarrfk
Contributor Author

hmaarrfk commented Jan 8, 2022

I honestly tried to look through the documentation, but it isn't clear whether you can change the FAPL after creation.

In the examples, they seem to open a file, copy the FAPL, close the file, and then reopen it with a different FAPL.

I also posted the question on the HDF5 forum, but the post is awaiting moderation. It should appear at
https://forum.hdfgroup.org/c/hdf5
in a few days, I guess.

@hmaarrfk
Contributor Author

hmaarrfk commented Jan 8, 2022

Sorry, looking at the source, it is clear that it is a "copy" of the FAPL.
https://github.com/HDFGroup/hdf5/blob/develop/src/H5F.c#L152

@edwardhartnett
Contributor

No, you cannot change the FAPL in HDF5 after the file has been opened...

@DennisHeimbigner
Collaborator

I will have to think about this.

@edwardhartnett
Contributor

As we discussed in the other thread, alignment has to be set before the file open. When netcdf-c opens a file, it will use the alignment setting to call the HDF5 alignment function when it creates the FAPL...
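
For readers following along: the HDF5 call in question is `H5Pset_alignment`, which takes a threshold and an alignment on a file access property list. Objects of at least `threshold` bytes are placed at file addresses that are multiples of `alignment`. The sketch below only models that placement rule in plain C and does not call HDF5. The names `align_up` and `place_object` are hypothetical, and the real HDF5 allocator does considerably more than this.

```c
#include <assert.h>
#include <stdint.h>

/* Round `offset` up to the next multiple of `alignment` (alignment > 0). */
static uint64_t align_up(uint64_t offset, uint64_t alignment)
{
    assert(alignment > 0);
    uint64_t rem = offset % alignment;
    return rem == 0 ? offset : offset + (alignment - rem);
}

/*
 * Hypothetical model of the threshold/alignment rule:
 * objects smaller than `threshold` stay at the next free address,
 * larger objects are moved up to a multiple of `alignment`.
 * An alignment of 0 or 1 is treated here as "no alignment".
 */
static uint64_t place_object(uint64_t next_free, uint64_t size,
                             uint64_t threshold, uint64_t alignment)
{
    if (alignment <= 1 || size < threshold)
        return next_free;
    return align_up(next_free, alignment);
}
```

With a 4096-byte threshold and alignment, `place_object(100, 1 << 20, 4096, 4096)` yields 4096, while a 16-byte object at the same position stays at 100. Since `mmap` requires file offsets that are multiples of the page size, choosing the alignment as a multiple of the page size is what makes large datasets directly mappable.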

@hmaarrfk
Contributor Author

Closing this thread until we decide how the API will be designed. It seems a different approach is the stronger contender.

@hmaarrfk
Contributor Author

On the HDF5 forum, it is confirmed that it must be specified at open time:
https://forum.hdfgroup.org/t/dynamically-change-the-file-access-property-list/9314/3

DennisHeimbigner added a commit to DennisHeimbigner/netcdf-c that referenced this pull request Jan 29, 2022
re: Unidata#2177
re: Unidata#2178

Provide get/set functions to store global data alignment
information and apply it when a file is created.

The API is as follows:
````
int nc_set_alignment(int threshold, int alignment);
int nc_get_alignment(int* thresholdp, int* alignmentp);
````

If alignment has been set, then for every file created or opened
after the call to nc_set_alignment, the most recently set threshold
and alignment values will be applied to every new variable added
to that file.

The nc_get_alignment function returns the last values set by
nc_set_alignment.  If nc_set_alignment has not been called, then
it returns the value 0 for both threshold and alignment.

The alignment parameters are stored in the NCglobalstate object
(see below) for use as needed. Repeated calls to nc_set_alignment
will overwrite any existing values in NCglobalstate.
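
The get/set semantics described above can be sketched as a small self-contained model. This is not the netcdf-c implementation: `demo_state`, `demo_set_alignment`, `demo_get_alignment`, and `demo_usage` are hypothetical names that only mirror the documented behavior (zeros before the first set, last call wins), and the return value 0 stands in for a "no error" status.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the alignment fields kept in global state. */
struct demo_alignment_state {
    int defined;    /* nonzero once demo_set_alignment has been called */
    int threshold;
    int alignment;
};

static struct demo_alignment_state demo_state = {0, 0, 0};

/* Store the values globally; a later call overwrites an earlier one. */
static int demo_set_alignment(int threshold, int alignment)
{
    demo_state.threshold = threshold;
    demo_state.alignment = alignment;
    demo_state.defined = 1;
    return 0;
}

/* Report the last values set, or 0 for both if nothing was ever set. */
static int demo_get_alignment(int *thresholdp, int *alignmentp)
{
    if (thresholdp != NULL)
        *thresholdp = demo_state.defined ? demo_state.threshold : 0;
    if (alignmentp != NULL)
        *alignmentp = demo_state.defined ? demo_state.alignment : 0;
    return 0;
}

/* Usage sketch, assuming it runs before any other call in this model. */
static void demo_usage(void)
{
    int t = -1, a = -1;

    demo_get_alignment(&t, &a);
    assert(t == 0 && a == 0);        /* nothing set yet */

    demo_set_alignment(1024, 8192);
    demo_set_alignment(512, 4096);   /* overwrites the previous values */

    demo_get_alignment(&t, &a);
    assert(t == 512 && a == 4096);   /* last call wins */
}
```

In the real library the stored values would be consulted when a file is created or opened, so the set call has to come before `nc_create` or `nc_open` to have any effect on that file.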

The alignment parameters are applied in libhdf5/hdf5create.c
and libhdf5/hdf5open.c

The set/get alignment functions are defined in libsrc4/nc4internal.c.

A test program was added as nc_test4/tst_alignment.c.

## Misc. Changes Unrelated to Alignment

* The NCRCglobalstate type was renamed to NCglobalstate to
  indicate that it represented more general global state than
  just .rc data.  It was also moved to nc4internal.h.  This led
  to a large number of small changes: mostly renaming. The
  global state management functions were moved to nc4internal.c.

* The global chunk cache variables have been moved into
  NCglobalstate.  As warranted, other global state will be moved
  as well.

* Some misc. problems with the nczarr performance tests were corrected.