implement reading/writing HDF5 object references in attribute #96

Merged
merged 1 commit into from
Nov 8, 2021

ilia-kats
Contributor

h5writeAttribute(object_to_reference, object, attribute_name) will write a reference to object_to_reference into attribute_name of object. Similarly, if attr is an attribute containing an object reference, H5Aread(attr) will return an H5IdComponent of the referenced object.

@codecov

codecov bot commented Sep 9, 2021

Codecov Report

Merging #96 (d045d45) into master (996bbfb) will decrease coverage by 0.04%.
The diff coverage is 75.00%.

@@            Coverage Diff             @@
##           master      #96      +/-   ##
==========================================
- Coverage   74.84%   74.80%   -0.05%     
==========================================
  Files          34       34              
  Lines        1805     1814       +9     
==========================================
+ Hits         1351     1357       +6     
- Misses        454      457       +3     

| Impacted Files | Coverage Δ |
|---|---|
| R/h5writeAttr.R | 88.23% <71.42%> (-4.87%) ⬇️ |
| R/H5A.R | 73.68% <75.00%> (-0.29%) ⬇️ |
| R/h5create.R | 85.98% <100.00%> (+0.08%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@grimbough grimbough self-assigned this Sep 28, 2021
@grimbough
Owner

Thanks @ilia-kats for creating this pull request, and apologies for taking so long to get round to doing something with it.

I'm not against incorporating something like this in principle, but I wonder whether h5writeAttribute() is really the best place for this functionality. My general design principle is that the h5x() functions should be relatively simple wrappers around common operations and work with standard R datatypes, without too many options and arguments for users to interpret. If someone really wants to get into the deep HDF5 details then the H5X() functions map to the C-API and should be used for the lower-level operations, where you get the full range of esoteric stuff HDF5 allows. This feels like it falls into that latter category, but I'm happy to be persuaded otherwise.

I've never used HDF5 object references before, so I'm curious what you're doing with them. Do you have any example code or schematics for the file type you're developing?

@ilia-kats
Contributor Author

Thanks for your reply. I'm working on a pure-R implementation of the AnnData format for single-cell omics data. AnnData uses object references to handle categorical (factor) columns in data frames. The HDF5 object for the column stores the integer codes along with a reference to another HDF5 object storing the labels. AnnData has been around for a while and there are tons of these files around, so I'm not really flexible regarding the format.
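
For illustration, the codes-plus-reference layout described here can be sketched with the low-level H5R functions that appear later in this thread. This is a minimal sketch, not the actual AnnData layout: the paths `/obs/cell_type` and `/obs/cell_type_labels`, the attribute name `categories`, and the zero-based codes are all assumptions made for the example, and it needs an rhdf5 version that includes the H5R functions.

```r
library(rhdf5)

## Hypothetical layout: dataset and attribute names are illustrative only
file_name <- tempfile(fileext = ".h5")
h5createFile(file_name)
h5createGroup(file_name, "obs")
h5write(c("B cell", "T cell"), file = file_name, name = "/obs/cell_type_labels")
h5write(c(0L, 1L, 1L, 0L), file = file_name, name = "/obs/cell_type")  # assumed 0-based codes

## attach an object reference to the labels as an attribute of the codes dataset
fid <- H5Fopen(file_name)
did <- H5Dopen(fid, "/obs/cell_type")
ref <- H5Rcreate(fid, name = "/obs/cell_type_labels")
sid <- H5Screate_simple(1)
tid <- H5Tcopy(dtype_id = "H5T_STD_REF_OBJ")
aid <- H5Acreate(did, name = "categories", dtype_id = tid, h5space = sid)
H5Awrite(h5attribute = aid, buf = ref)
H5Aclose(aid)
H5Sclose(sid)

## read the column back: follow the reference to recover the labels
aid <- H5Aopen(h5obj = did, name = "categories")
label_ref <- H5Aread(h5attribute = aid)
labels_id <- H5Rdereference(ref = label_ref, h5loc = fid)
codes <- as.vector(H5Dread(did))
labels <- as.vector(H5Dread(labels_id))

## shift the assumed 0-based codes to R's 1-based indexing
cell_type <- factor(labels[codes + 1L], levels = labels)

## tidy up
H5Dclose(labels_id)
H5Aclose(aid)
H5Dclose(did)
H5Fclose(fid)
```

The column dataset only ever holds integers; the labels live in a separate object and are found through the reference, which is why reading a categorical column requires dereferencing support.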

I briefly looked into a low-level wrapper around HDF5 references when I was writing this PR, and wrapping the entire API would require quite some time, which is why I chose to implement this directly in h5writeAttribute. I can try to do a partial wrapper implementing only what is required to get this particular functionality to work, unless you have a better idea?

@grimbough grimbough changed the base branch from master to object-references November 8, 2021 09:29
@grimbough grimbough merged commit 359c8a3 into grimbough:object-references Nov 8, 2021
@grimbough
Owner

I've tried to make the complete H5R API from HDF5 1.10 available in the object-references branch. Thanks a lot for the starting point, it was helpful to build on your code. This now supports dataset region references too if you happen to need those at any point.

Having used the functions I can see why some wrapper functions that do the dereferencing automatically would be nice, and I'll probably add those fairly soon, but I don't have time right now. However this API should remain pretty stable if you want to work with that. I'll merge it into bioc-devel once I've written a few tests and the manual pages.

Hopefully the examples below are useful, but it looks like you know what you're doing. Let me know if anything is missing or doesn't behave as expected.

## create an example file with a group and a dataset
library(rhdf5)
file_name <- tempfile()
h5createFile(file_name)
h5createGroup(file = file_name, group = "/foo")
#> [1] TRUE
h5write(1:100, file=file_name, name="/foo/baa")

###################################################
## Writing references as an attribute #############
###################################################

## open file and create reference to /foo/baa dataset
fid <- H5Fopen(file_name)
ref_to_dataset <- H5Rcreate(fid, name = "/foo/baa")

## create an attribute to contain our object ref
sid <- H5Screate_simple( length(ref_to_dataset) )
tid <- H5Tcopy(dtype_id = "H5T_STD_REF_OBJ")
obj_ref_attr <- H5Acreate(fid, name = "object_refs", dtype_id = tid, h5space = sid)

## write our references to the attribute & close
H5Awrite(h5attribute = obj_ref_attr, buf = ref_to_dataset)
#> Object reference

## tidy up
H5Aclose(obj_ref_attr)
H5Sclose(sid)
H5Fclose(fid)

###################################################
## Reading reference & dereferencing dataset ######
###################################################

## open file and read attribute
fid <- H5Fopen(file_name)
aid <- H5Aopen(h5obj = fid, name = 'object_refs')
references <- H5Aread(h5attribute = aid)
## this is an H5Ref object
references
#> HDF5 REFERENCE
#> Type: H5R_OBJECT 
#> Length: 1

## apply the ref to the file handle and receive a dataset identifier
dset_from_ref <- H5Rdereference(ref = references, h5loc = fid)
H5Dread(dset_from_ref)
#>   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
#>  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
#>  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
#>  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
#>  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
#>  [91]  91  92  93  94  95  96  97  98  99 100

## tidy up
H5Aclose(aid)
H5Dclose(dset_from_ref)
H5Fclose(fid)
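
The read-and-dereference steps above could be bundled into a small convenience function of the kind mentioned earlier. The following is only a sketch under stated assumptions: `read_referenced_dataset()` is a hypothetical helper, not part of rhdf5, and it assumes the attribute sits on the file's root group and holds a single object reference to a dataset.

```r
library(rhdf5)

## Hypothetical helper (not an rhdf5 function): reads the dataset that a
## root-level object-reference attribute points to, closing every handle on exit.
read_referenced_dataset <- function(file, attr_name) {
  fid <- H5Fopen(file)
  on.exit(H5Fclose(fid), add = TRUE)
  aid <- H5Aopen(h5obj = fid, name = attr_name)
  on.exit(H5Aclose(aid), add = TRUE, after = FALSE)
  ref <- H5Aread(h5attribute = aid)
  did <- H5Rdereference(ref = ref, h5loc = fid)
  on.exit(H5Dclose(did), add = TRUE, after = FALSE)
  H5Dread(did)
}

## set up a file with one dataset and a reference to it, as in the example above
file_name <- tempfile()
h5createFile(file_name)
h5createGroup(file = file_name, group = "/foo")
h5write(1:100, file = file_name, name = "/foo/baa")

fid <- H5Fopen(file_name)
ref_to_dataset <- H5Rcreate(fid, name = "/foo/baa")
sid <- H5Screate_simple(length(ref_to_dataset))
tid <- H5Tcopy(dtype_id = "H5T_STD_REF_OBJ")
obj_ref_attr <- H5Acreate(fid, name = "object_refs", dtype_id = tid, h5space = sid)
H5Awrite(h5attribute = obj_ref_attr, buf = ref_to_dataset)
H5Aclose(obj_ref_attr)
H5Sclose(sid)
H5Fclose(fid)

## use the helper: one call instead of open / read / dereference / close
values <- read_referenced_dataset(file_name, "object_refs")
```

Registering the close calls with `on.exit()` means the handles are released even if one of the intermediate steps fails, which is the main thing a wrapper adds over the manual sequence.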
