improve html representation of datasets #1100
base: dev
for more information, see https://pre-commit.ci
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## dev #1100 +/- ##
=======================================
Coverage 89.08% 89.09%
=======================================
Files 45 45
Lines 9890 9944 +54
Branches 2816 2825 +9
=======================================
+ Hits 8811 8860 +49
- Misses 763 765 +2
- Partials 316 319 +3
☔ View full report in Codecov by Sentry.
This looks great! Thanks for the PR.
Could you add tests for the data html representation with hdf5 and zarr? I think we mainly have string equivalence tests for this kind of thing.
I'm also wondering if it would be nice to have the hdf5 dataset info displayed in a similar table format as the zarr arrays to make it more consistent across backends. I think we should be able to replicate this using the hdf5 dataset info as an input to a method like this: https://github.com/zarr-developers/zarr-python/blob/9d046ea0d2878af7d15b3de3ec3036fe31661340/zarr/util.py#L402
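A minimal sketch of the table idea above, assuming the dataset info is first collected into a plain dict. The helper names `dataset_info` and `info_to_html_table` are hypothetical and are not part of hdmf, h5py, or zarr. The attributes read from the dataset (`dtype`, `shape`, `chunks`, `compression`) are the ones exposed by `h5py.Dataset`; a small stand-in class is used here so the example runs without a file.

```python
from html import escape


def dataset_info(dataset):
    """Collect display fields from an h5py.Dataset-like object (hypothetical helper)."""
    return {
        "Data type": dataset.dtype,
        "Shape": dataset.shape,
        "Chunk shape": dataset.chunks,
        "Compression": dataset.compression,
    }


def info_to_html_table(info, title):
    """Render an info dict as a two-column HTML table (hypothetical helper)."""
    rows = "".join(
        f"<tr><th>{escape(str(key))}</th><td>{escape(str(value))}</td></tr>"
        for key, value in info.items()
    )
    return f"<table><caption>{escape(title)}</caption>{rows}</table>"


class FakeDataset:
    """Stand-in exposing the same attribute names as h5py.Dataset."""

    dtype = "int64"
    shape = (5,)
    chunks = None
    compression = None


html_table = info_to_html_table(dataset_info(FakeDataset()), "HDF5 Dataset")
```

Because the output is a plain string, it could be checked with the same kind of string equivalence test mentioned above.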
@stephprince
import numpy as np
from hdmf.container import Container
container = Container(name="Container")
container.__fields__ = {
    "name": "data",
    "description": "test data",
}
test_data = np.array([1, 2, 3, 4, 5])
setattr(container, "data", test_data)
container.fields
But the data is not added as a field. How can I move forward?
Related:
…f_data' into improve_html_repr_of_data
I added handling for division by zero. Check out what happens with external files (like Video):
From this example:
import remfile
import h5py
asset_path = "sub-CSHL049/sub-CSHL049_ses-c99d53e6-c317-4c53-99ba-070b26673ac4_behavior+ecephys+image.nwb"
recording_asset = dandiset.get_asset_by_path(path=asset_path)
url = recording_asset.get_content_url(follow_redirects=True, strip_query=True)
file_path = url
rfile = remfile.File(file_path)
file = h5py.File(rfile, 'r')
from pynwb import NWBHDF5IO
io = NWBHDF5IO(file=file, mode='r')
nwbfile = io.read()
nwbfile
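The division-by-zero case mentioned above can be sketched as follows. This is an illustration, not the code in the PR: `compression_ratio` is a hypothetical helper, and it assumes the ratio is computed as uncompressed bytes over bytes stored on disk, where the stored size may be reported as zero (for example when the data lives in external files).

```python
def compression_ratio(uncompressed_bytes, stored_bytes):
    """Return uncompressed/stored size, or None when nothing is stored (hypothetical helper)."""
    if stored_bytes == 0:
        # Guard against ZeroDivisionError when no bytes are stored in the file itself
        return None
    return uncompressed_bytes / stored_bytes
```

Returning `None` rather than raising lets the HTML representation skip the row or show a placeholder.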
@stephprince when you have time, can you review this?
Rereading through this discussion, I believe where we left off is that we want to remove the backend-specific logic from the
In this PR we:
In a separate PR on
In the
read_io = self.get_read_io()  # if the Container was read from file, this will give you the IO object that read it
if read_io is not None:
    html_repr = read_io.generate_dataset_html(my_dataset)
else:
    # The file was not read from disk so the dataset should be a numpy array or a list
@h-mayorquin did you want to do this? Otherwise I can go ahead and make the proposed changes to finish up this PR.
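The delegation pattern proposed in the comment above can be sketched in a self-contained way. `FakeIO` and `FakeContainer` are hypothetical stand-ins, not hdmf classes; only the method names `get_read_io` and `generate_dataset_html` come from the discussion, and the in-memory fallback shown here is an assumption about what a generic representation might look like.

```python
from html import escape


class FakeIO:
    """Stand-in for a backend IO class such as HDF5IO (hypothetical)."""

    @staticmethod
    def generate_dataset_html(dataset):
        return f"<div>backend repr of {len(dataset)} items</div>"


class FakeContainer:
    """Stand-in for a container that may or may not have been read from a file."""

    def __init__(self, data, read_io=None):
        self.data = data
        self._read_io = read_io

    def get_read_io(self):
        # None means the container was built in memory rather than read from a file
        return self._read_io

    def dataset_html(self):
        read_io = self.get_read_io()
        if read_io is not None:
            # Delegate to the backend that read the file
            return read_io.generate_dataset_html(self.data)
        # Not read from disk: assume a numpy array or list and use a generic repr
        return f"<div>{escape(repr(self.data))}</div>"


in_memory_html = FakeContainer([1, 2, 3]).dataset_html()
from_file_html = FakeContainer([1, 2, 3], read_io=FakeIO()).dataset_html()
```

With this shape, the container never needs to know which backend produced the dataset; each IO class supplies its own `generate_dataset_html`.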
Hi @stephprince, I think this is a good summary. I am not sure how to decouple HDF5IO.generate_dataset_html(h5py.Dataset) here, as hdmf seems super coupled with hdf5. Or is the idea that we only want to exclude zarr? This has been on the back of my mind for a while, but every time I had other priorities. It would be great if you have time to finish it.
@h-mayorquin yes, I can take a look at it and finish it up.
I have pushed the updates we discussed:
I tested a Zarr implementation that looks like this and can submit a PR in hdmf_zarr for that:
def generate_dataset_html(dataset):
    """Generates an html representation for a dataset for the ZarrIO class"""
    # get info from zarr array and generate html repr
    zarr_info_dict = {k: v for k, v in dataset.info_items()}
    repr_html = generate_array_html_repr(zarr_info_dict, dataset, "Zarr Array")
    return repr_html
@oruebel @h-mayorquin if you could please review and let me know if there are any remaining concerns
Looks good to me, thanks for taking this on.
def generate_dataset_html(dataset):
    """Generates an html representation for a dataset for the HDF5IO class"""
It would be useful to raise a corresponding issue on HDMF_ZARR to have generate_dataset_html implemented on ZarrIO as well (if we have not done this yet). @stephprince can you make an issue?
@staticmethod
def generate_dataset_html(dataset):
I believe this function would be triggered when using ZarrIO. Did we test that this indeed works with ZarrIO?
Motivation
Improve the display of the data in the HTML representation of containers. Note that this PR is focused on datasets that were already written. For in-memory representation, the question of what to do with data wrapped in an iterator or a DataIO subtype can be addressed in another PR, I think.
How to test the behavior?
HDF5
I have been using this script
Zarr
Checklist
Did you update CHANGELOG.md with your changes?