improve html representation of datasets #1100

h-mayorquin · 2024-04-19T16:01:05Z

Motivation

Improve the display of data in the HTML representation of containers. Note that this PR focuses on datasets that have already been written to disk. For in-memory representations, the question of what to do with data wrapped in an iterator or a DataIO subtype can, I think, be addressed in another PR.

How to test the behavior?

HDF5

I have been using this script:

from pynwb.testing.mock.ecephys import mock_ElectricalSeries
from pynwb.testing.mock.file import mock_NWBFile
from hdmf.backends.hdf5.h5_utils import H5DataIO
from pynwb.testing.mock.ophys import mock_ImagingPlane, mock_TwoPhotonSeries

import numpy as np

data = np.random.rand(500_000, 384)
timestamps = np.arange(500_000)
data = H5DataIO(data=data, compression=True, chunks=True)

nwbfile = mock_NWBFile()
electrical_series = mock_ElectricalSeries(data=data, nwbfile=nwbfile, rate=None, timestamps=timestamps)

imaging_plane = mock_ImagingPlane(grid_spacing=[1.0, 1.0], nwbfile=nwbfile)


data = H5DataIO(data=np.random.rand(2, 2, 2), compression=True, chunks=True)
two_photon_series = mock_TwoPhotonSeries(name="TwoPhotonSeries", imaging_plane=imaging_plane, data=data, nwbfile=nwbfile)


# Write to file
from pynwb import NWBHDF5IO
with NWBHDF5IO('ecephys_tutorial.nwb', 'w') as io:
    io.write(nwbfile)



from pynwb import NWBHDF5IO

io = NWBHDF5IO('ecephys_tutorial.nwb', 'r')
nwbfile = io.read()
nwbfile

Zarr

import os

import numpy as np
from numcodecs import Blosc, Delta
from hdmf_zarr import ZarrDataIO
from hdmf_zarr.nwb import NWBZarrIO
from pynwb.testing.mock.ecephys import mock_ElectricalSeries
from pynwb.testing.mock.file import mock_NWBFile

filters = [Delta(dtype="i4")]

data_with_zarr_data_io = ZarrDataIO(
    data=np.arange(100000000, dtype='i4').reshape(10000, 10000),
    chunks=(1000, 1000),
    compressor=Blosc(cname='zstd', clevel=3, shuffle=Blosc.SHUFFLE),
    # filters=filters,
)

timestamps = np.arange(10000)

data = data_with_zarr_data_io

nwbfile = mock_NWBFile()
electrical_series_name = "ElectricalSeries"
rate = None
electrical_series = mock_ElectricalSeries(name=electrical_series_name, data=data, nwbfile=nwbfile, timestamps=timestamps, rate=None)


path = "zarr_test.nwb.zarr"
absolute_path = os.path.abspath(path)
with NWBZarrIO(path=path, mode="w") as io:
    io.write(nwbfile)
    
from hdmf_zarr.nwb import NWBZarrIO

path = "zarr_test.nwb.zarr"

io = NWBZarrIO(path=path, mode="r")
nwbfile = io.read()
nwbfile

Checklist

Did you update CHANGELOG.md with your changes?
Does the PR clearly describe the problem and the solution?
Have you reviewed our Contributing Guide?
Does the PR use "Fix #XXX" notation to tell GitHub to close the relevant issue numbered XXX when the PR is merged?

for more information, see https://pre-commit.ci

src/hdmf/container.py

codecov · 2024-04-23T15:18:43Z

Codecov Report

Attention: Patch coverage is 87.50000% with 8 lines in your changes missing coverage. Please review.

Project coverage is 89.12%. Comparing base (06a62b9) to head (01f8f8f).
Report is 31 commits behind head on dev.

Files with missing lines	Patch %	Lines
src/hdmf/utils.py	87.50%	2 Missing and 2 partials ⚠️
src/hdmf/backends/hdf5/h5tools.py	84.61%	1 Missing and 1 partial ⚠️
src/hdmf/container.py	84.61%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##              dev    #1100      +/-   ##
==========================================
+ Coverage   89.08%   89.12%   +0.03%     
==========================================
  Files          45       45              
  Lines        9890     9944      +54     
  Branches     2816     2825       +9     
==========================================
+ Hits         8811     8863      +52     
+ Misses        763      762       -1     
- Partials      316      319       +3

stephprince

This looks great! Thanks for the PR.

Could you add tests for the data html representation with hdf5 and zarr? I think we mainly have string equivalence tests for this kind of thing.

I'm also wondering if it would be nice to have the hdf5 dataset info displayed in a similar table format as the zarr arrays to make it more consistent across backends. I think we should be able to replicate this using the hdf5 dataset info as an input to a method like this: https://github.com/zarr-developers/zarr-python/blob/9d046ea0d2878af7d15b3de3ec3036fe31661340/zarr/util.py#L402
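
A minimal sketch of the table idea suggested above, assuming the HDF5 dataset properties have already been collected into a plain dictionary. The helper name `info_to_html_table`, the markup, and the example values (including the chunk shape) are illustrative assumptions, not part of hdmf or zarr:

```python
from html import escape


def info_to_html_table(info, title):
    """Render a mapping of dataset properties as a small HTML table.

    Hypothetical helper: the name and the markup are illustrative only.
    """
    rows = "".join(
        f"<tr><th style='text-align: left'>{escape(str(key))}</th>"
        f"<td style='text-align: left'>{escape(str(value))}</td></tr>"
        for key, value in info.items()
    )
    return f"<div><b>{escape(title)}</b><table><tbody>{rows}</tbody></table></div>"


# Example values of the kind that could be read from an HDF5 dataset
# (made up for illustration).
hdf5_info = {
    "Data type": "float64",
    "Shape": (500000, 384),
    "Chunk shape": (977, 3),
    "Compression": "gzip",
}
html_table = info_to_html_table(hdf5_info, "HDF5 dataset")
```

Because the function only needs a dictionary, the same renderer could serve both backends: each backend would only be responsible for filling in its own key/value pairs.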

src/hdmf/container.py

h-mayorquin · 2024-04-26T23:06:25Z

OK, I added table formatting for hdf5:

h-mayorquin · 2024-04-26T23:07:46Z

@stephprince
Concerning the tests: yes, I can do it, but can you help me create a container that contains array data? I just don't have experience with the bare-bones object. This is my attempt:

import numpy as np
from hdmf.container import Container

container = Container(name="Container")
container.__fields__ = {
    "name": "data",
    "description": "test data",
}

test_data = np.array([1, 2, 3, 4, 5])
setattr(container, "data", test_data)
container.fields

But the data is not added as a field. How can I move forward?

h-mayorquin · 2024-04-29T14:30:33Z

tests/unit/test_container.py

h-mayorquin · 2024-04-30T20:51:23Z

I added handling for division by zero. Check out what happens with external files (like Video):

From this example:

import remfile
import h5py

asset_path = "sub-CSHL049/sub-CSHL049_ses-c99d53e6-c317-4c53-99ba-070b26673ac4_behavior+ecephys+image.nwb"
# `dandiset` is assumed to be a dandiset object obtained earlier via the DANDI API client
recording_asset = dandiset.get_asset_by_path(path=asset_path)
url = recording_asset.get_content_url(follow_redirects=True, strip_query=True)
file_path = url

rfile = remfile.File(file_path)
file = h5py.File(rfile, 'r')

from pynwb import NWBHDF5IO

io = NWBHDF5IO(file=file, mode='r')

nwbfile = io.read()
nwbfile
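
The division-by-zero case mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the function name `compression_ratio` is hypothetical, and it assumes the two sizes have already been read from the dataset (with h5py these could plausibly come from `dataset.nbytes` and `dataset.id.get_storage_size()`). A dataset whose data lives in external files can report zero bytes stored in the HDF5 file itself, which is where the guard matters:

```python
def compression_ratio(uncompressed_bytes, stored_bytes):
    """Return uncompressed size divided by stored size.

    Returns None when nothing is stored in the file (for example, data kept
    in external files), so the caller can display "undefined" instead of
    raising ZeroDivisionError.
    """
    if stored_bytes == 0:
        return None
    return uncompressed_bytes / stored_bytes


ratio_regular = compression_ratio(1000, 250)   # 4.0
ratio_external = compression_ratio(1000, 0)    # None
```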

rly · 2024-10-02T20:45:43Z

@stephprince when you have time, can you review this?

stephprince · 2024-10-17T16:54:07Z

Rereading through this discussion, I believe where we left off is that we want to remove the backend-specific logic from the Container class. To do so, it was proposed that:

In this PR we:

Add HDMFIO.generate_dataset_html(dataset) which would implement a minimalist representation
Implement HDF5IO.generate_dataset_html(h5py.Dataset) to represent an h5py.Dataset

In a separate PR on hdmf_zarr we would:

Implement ZarrIO.generate_dataset_html(Zarr.array)

In the Container class, it would look like this:

read_io = self.get_read_io()  # if the Container was read from file, this will give you the IO object that read it
if read_io is not None:
    html_repr = read_io.generate_dataset_html(my_dataset)
else:
    # The file was not read from disk, so the dataset should be a numpy array or a list
    ...
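
The dispatch proposed above can be sketched with stand-in classes. Everything here (`BaseIO`, `TableIO`, `Holder`, `data_html`) is a hypothetical simplification for illustration; it does not reproduce the real HDMFIO, HDF5IO, or Container classes:

```python
class BaseIO:
    """Stand-in for a generic IO class (hypothetical, not hdmf's HDMFIO)."""

    @staticmethod
    def generate_dataset_html(dataset):
        # Minimal, backend-agnostic fallback representation.
        return f"<div>{type(dataset).__name__}</div>"


class TableIO(BaseIO):
    """Stand-in for a backend IO class that knows its own dataset type."""

    @staticmethod
    def generate_dataset_html(dataset):
        return f"<table><tr><td>length</td><td>{len(dataset)}</td></tr></table>"


class Holder:
    """Stand-in for a container that remembers which IO object read it."""

    def __init__(self, data, read_io=None):
        self.data = data
        self._read_io = read_io

    def get_read_io(self):
        return self._read_io

    def data_html(self):
        read_io = self.get_read_io()
        if read_io is not None:
            # Read from file: let the backend describe its own dataset type.
            return read_io.generate_dataset_html(self.data)
        # In memory: fall back to the generic representation.
        return BaseIO.generate_dataset_html(self.data)


from_file = Holder([1, 2, 3], read_io=TableIO()).data_html()
in_memory = Holder([1, 2, 3]).data_html()
```

The point of the design is that the container never inspects the dataset type itself; adding a new backend only requires that backend's IO class to override `generate_dataset_html`.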

@h-mayorquin did you want to do this? Otherwise I can go ahead and make the proposed changes to finish up this PR.

h-mayorquin · 2024-10-17T17:06:10Z

Hi, @stephprince

I think this is a good summary.

I am not sure how to decouple HDF5IO.generate_dataset_html(h5py.Dataset) here as hdmf seems super coupled with hdf5. Or is it the idea that we only want to exclude zarr?

This has been in the back of my mind for a while, but every time I had other priorities. It would be great if you have time to finish it.

stephprince · 2024-10-17T21:18:27Z

@h-mayorquin yes I can take a look at it and finish it up

stephprince · 2024-10-30T21:43:32Z

I have pushed the updates we discussed:

added utility functions generate_array_html_repr and get_basic_array_info to the utils module to get basic array info and generate an array html table
added a static HDMFIO.generate_dataset_html() method; the HDF5/Zarr implementations collect information from the dataset and then generate the actual html representation
updated Container._generate_array_html() to use these methods
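
As a rough illustration of what collecting "basic array info" could involve, here is a minimal sketch. The names `format_bytes` and `basic_array_info` are hypothetical and the code is not the hdmf implementation; it only assumes an array-like object exposing `dtype`, `shape`, and `nbytes`, which is emulated below with a stand-in object so the example runs without NumPy:

```python
from types import SimpleNamespace


def format_bytes(num_bytes):
    """Format a byte count with binary (1024-based) unit suffixes."""
    suffixes = ["bytes", "KiB", "MiB", "GiB", "TiB"]
    size = float(num_bytes)
    index = 0
    while size >= 1024 and index < len(suffixes) - 1:
        size /= 1024
        index += 1
    return f"{size:.2f} {suffixes[index]}"


def basic_array_info(array):
    """Collect dtype, shape and in-memory size from an array-like object.

    Hypothetical helper; assumes `array` exposes `dtype`, `shape`, `nbytes`.
    """
    return {
        "Data type": str(array.dtype),
        "Shape": tuple(array.shape),
        "Array size": format_bytes(array.nbytes),
    }


# Stand-in for an array of shape (500000, 384) with 8-byte items.
fake_array = SimpleNamespace(
    dtype="float64", shape=(500000, 384), nbytes=500000 * 384 * 8
)
info = basic_array_info(fake_array)
```

A dictionary like `info` is backend-agnostic, so it can be merged with backend-specific entries (chunking, compression) before being rendered as a table.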

I tested a Zarr implementation that looks like this and can submit a PR in hdmf_zarr for that:

from hdmf.utils import generate_array_html_repr  # utility added to the utils module in this PR

def generate_dataset_html(dataset):
    """Generate an HTML representation of a dataset for the ZarrIO class."""

    # Collect the info items reported by the Zarr array and build the HTML table
    zarr_info_dict = dict(dataset.info_items())
    repr_html = generate_array_html_repr(zarr_info_dict, dataset, "Zarr Array")

    return repr_html

@oruebel @h-mayorquin could you please review and let me know if there are any remaining concerns?

h-mayorquin · 2024-10-31T21:01:37Z

Looks good to me, thanks for taking this on.

src/hdmf/backends/hdf5/h5tools.py

src/hdmf/backends/io.py

* small patch to html repr
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* comment request

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Steph Prince <40640337+stephprince@users.noreply.github.com>

h-mayorquin and others added 2 commits April 19, 2024 09:52

improve dev repr

649f141

[pre-commit.ci] auto fixes from pre-commit.com hooks

475cda9

for more information, see https://pre-commit.ci

h-mayorquin commented Apr 19, 2024

src/hdmf/container.py (outdated)

h-mayorquin commented Apr 19, 2024

src/hdmf/container.py

address ruff

7f3c94e

h-mayorquin marked this pull request as ready for review April 23, 2024 15:14

h-mayorquin and others added 2 commits April 23, 2024 09:16

add changelog

5128d53

[pre-commit.ci] auto fixes from pre-commit.com hooks

21ae3cf

for more information, see https://pre-commit.ci

rly requested a review from stephprince April 24, 2024 01:21

rly added the "category: enhancement" label Apr 24, 2024

rly added this to the 3.14.0 milestone Apr 24, 2024

stephprince reviewed Apr 24, 2024

src/hdmf/container.py (outdated)

src/hdmf/container.py

src/hdmf/container.py (outdated)

h-mayorquin and others added 2 commits April 26, 2024 17:05

add table representation for hdf5 info

4eb2635

[pre-commit.ci] auto fixes from pre-commit.com hooks

08292c6

for more information, see https://pre-commit.ci

h-mayorquin and others added 6 commits April 29, 2024 14:25

add test

59083c2

Merge remote-tracking branch 'refs/remotes/origin/improve_html_repr_o…

06a064e

…f_data' into improve_html_repr_of_data

[pre-commit.ci] auto fixes from pre-commit.com hooks

7ce5b3f

for more information, see https://pre-commit.ci

ruff

fc14d71

Merge remote-tracking branch 'refs/remotes/origin/improve_html_repr_o…

a2931e2

…f_data' into improve_html_repr_of_data

Merge branch 'dev' into improve_html_repr_of_data

96456a4

h-mayorquin commented Apr 29, 2024

tests/unit/test_container.py

handle division by zer

133e28d

stephprince added 3 commits April 30, 2024 17:58

add zarr, array, hdf5 repr tests

ae21b61

generalize array html table description

28449a3

remove zarr tests

6e6a84c

CodyCBakerPhD mentioned this pull request May 6, 2024

[Backend Configuration Va] Basic user documentation catalystneuro/neuroconv#802

Merged

Merge branch 'dev' into improve_html_repr_of_data

5b235e0

Merge branch 'dev' into improve_html_repr_of_data

3813723

rly modified the milestones: 3.14.5, 3.14.6 Oct 3, 2024

stephprince and others added 7 commits October 24, 2024 09:53

Merge branch 'dev' into improve_html_repr_of_data

0a929b3

add array html repr utils

2c967dd

add generate_dataset_html method to io objects

6d007d1

add tests for array html repr

3552923

[pre-commit.ci] auto fixes from pre-commit.com hooks

4bb38df

for more information, see https://pre-commit.ci

fix import style

f1afe81

update CHANGLEOG

495e626

Merge branch 'dev' into improve_html_repr_of_data

03c9f8f

oruebel reviewed Oct 31, 2024

src/hdmf/backends/hdf5/h5tools.py

oruebel reviewed Oct 31, 2024

src/hdmf/backends/io.py

oruebel previously approved these changes Oct 31, 2024

stephprince mentioned this pull request Oct 31, 2024

[Feature]: improve html representation of Zarr Arrays in NWB hdmf-dev/hdmf-zarr#224

Open

3 tasks

add test for base hdmfio

01f8f8f

stephprince dismissed oruebel’s stale review via 01f8f8f November 1, 2024 23:00

oruebel approved these changes Nov 5, 2024

stephprince merged commit be602e5 into hdmf-dev:dev Nov 5, 2024
29 checks passed

h-mayorquin deleted the improve_html_repr_of_data branch November 5, 2024 21:50

h-mayorquin mentioned this pull request Nov 6, 2024

Small patch to html repr in #1100 #1201

Merged

4 tasks
