POC: append output to input NWB file #764

Closed

Conversation

chrisroat
Contributor

This is a proof of concept. It changes the behavior of the NWB output -- if the input is also NWB, the data is appended to the original file.

The output is still written in suite2p's native format and then written again to NWB.

TODO: commit data file

@chrisroat
Contributor Author

@bendichter @carsen-stringer

This is a proof of concept to see if it's a way to integrate nwb better with suite2p. (The test I included passes locally, but fails in the CI because I have not included the test file via dvc).

This change allows the user to write the suite2p output directly back to the nwb file that had the input data.

Is this a recommended approach for NWB use? My preference would be to write a separate file, but then references have to be maintained across files and that ended up getting a bit janky (though better than the fake data that Suite2p created, which I left in for the case of non-NWB input). For example, DANDI required a Subject ID when I wrote a separate file and that wasn't straightforward to work with.

The change keeps the pattern of "write out data in native format" then "rewrite data to NWB". A future commit (possibly part of this PR) could remove that extra I/O.
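For concreteness, here is a minimal sketch of the append approach using pynwb; the file name and the containers added are placeholders, not the exact code in this PR:

from pynwb import NWBHDF5IO
from pynwb.ophys import Fluorescence, ImageSegmentation

# Open the original (input) NWB file in append mode, add an ophys processing
# module holding the segmentation results, and write back to the same file.
with NWBHDF5IO('input.nwb', mode='a') as io:
    nwbfile = io.read()
    ophys = nwbfile.create_processing_module(
        name='ophys', description='suite2p output'
    )
    ophys.add(ImageSegmentation(name='ImageSegmentation'))
    ophys.add(Fluorescence(name='Fluorescence'))
    # ...populate the plane segmentation and ROI response series from the
    # suite2p arrays (stat, F, Fneu, spks) before writing...
    io.write(nwbfile)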

Comment on lines +286 to +309
imaging_plane = nwbfile.create_imaging_plane(
    name='ImagingPlane',
    optical_channel=optical_channel,
    imaging_rate=ops['fs'],
    description='standard',
    device=device,
    excitation_lambda=600.,
    indicator='GCaMP',
    location='V1',
    grid_spacing=([2.0, 2.0, 30.0] if multiplane else [2.0, 2.0]),
    grid_spacing_unit='microns'
)

# link to external data
image_series = TwoPhotonSeries(
    name='TwoPhotonSeries',
    dimension=[ops['Ly'], ops['Lx']],
    external_file=(ops['filelist'] if 'filelist' in ops else ['']),
    imaging_plane=imaging_plane,
    starting_frame=[0],
    format='external',
    starting_time=0.0,
    rate=ops['fs'] * ops['nplanes']
)


Could we include a way of specifying this metadata?

@chrisroat
Contributor Author

Likely, though it's outside the scope here. I am not trying to change any existing behavior -- just trying to add functionality.
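As a purely hypothetical sketch (not part of this PR), the hard-coded fields above could later be overridden by an optional user-supplied dict, for example an 'nwb_metadata' key in ops:

# Hypothetical: let user-supplied values override the defaults shown above.
defaults = dict(description='standard', excitation_lambda=600.,
                indicator='GCaMP', location='V1')
plane_metadata = {**defaults, **ops.get('nwb_metadata', {})}  # user values win
imaging_plane = nwbfile.create_imaging_plane(
    name='ImagingPlane',
    optical_channel=optical_channel,
    imaging_rate=ops['fs'],
    device=device,
    grid_spacing=([2.0, 2.0, 30.0] if multiplane else [2.0, 2.0]),
    grid_spacing_unit='microns',
    **plane_metadata,
)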

@chrisroat
Contributor Author

Instead of this approach, the discussion on @bendichter's PR to this branch has made me rethink how to proceed.

Instead of adding to the input file, we can do a shallow copy of the input file to a new file (which will have links to data in the input file) and put the Suite2p output into this new file.

This would mean that using the NWB output format requires NWB as the input format. I think this is a worthwhile approach because otherwise a lot of metadata must be faked or input by the user. It also encourages upstream data to be in NWB format. This is a change in behavior, though, so I wanted to be sure @carsen-stringer is OK with it.
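A minimal sketch of the shallow-copy idea with pynwb (file names are placeholders; this follows pynwb's documented copy-and-link pattern rather than any code already in this PR):

from pynwb import NWBHDF5IO

with NWBHDF5IO('input.nwb', mode='r') as read_io:
    nwbfile_in = read_io.read()
    nwbfile_out = nwbfile_in.copy()  # shallow copy: containers stay as links to input.nwb
    # ...add the Suite2p segmentation/fluorescence output to nwbfile_out here...
    with NWBHDF5IO('processed.nwb', mode='w', manager=read_io.manager) as write_io:
        write_io.write(nwbfile_out, link_data=True)  # keep raw data as external links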

Also of note - there is a known deficiency in DANDI with regard to the links between files, so the input/output file pair cannot be used in DANDI. As a workaround, a user could do a deep copy of the output file to create a single DANDI-compatible file that will contain both the input and output data.
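A sketch of that workaround using pynwb's export with links resolved (again placeholders; this assumes NWBHDF5IO.export with write_args={'link_data': False} copies linked datasets into the new file):

from pynwb import NWBHDF5IO

with NWBHDF5IO('processed.nwb', mode='r') as read_io:
    nwbfile = read_io.read()
    with NWBHDF5IO('dandi_ready.nwb', mode='w') as export_io:
        # Resolve external links so the exported file is self-contained.
        export_io.export(src_io=read_io, nwbfile=nwbfile,
                         write_args={'link_data': False})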


bendichter commented Nov 22, 2021

@chrisroat I want to make sure this feature is not reduced to a mode that only works for a subset of users. I would prefer it to remain possible to save processed data in a different file from the source data. I agree that faking data in the processed file is bad, but I would prefer to solve that problem by allowing the user to specify such data on save. For users that do not want to copy this type of metadata, it should also be possible to save the processed data to the original file. Your proposed solution also has the problem that it is not currently compatible with DANDI.

@chrisroat
Contributor Author

I understand the desire to have many people be able to use this, but I worry it won't be sustainable to have each tool author maintain all of these different options. The boilerplate I'm seeing is making me think about alternative approaches.

In my personal experience, separating the I/O and the tool into layers is better than having the tool understand and be tested against every combination of I/O.

In a sense, I think of tool authors writing things like pandas or numpy functions, which are very focused and use simple data types, and shouldn't have to worry about the on-disk formats, external metadata structure, provenance, etc. With this sort of separation, tools can adapt to NWB updates in a much cleaner way.

We could leave the tool to operate directly on the optimal format (i.e. a numpy array). There would be an I/O translator layer with two implementations -- one that works with the current data formats, and one that works with NWB. The NWB wrapper could be maintained separately, and the other wrapper could be deprecated over a release cycle or two.
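A rough sketch of the layering I have in mind (names here are hypothetical, not suite2p or roiextractors API):

from typing import Protocol
import numpy as np

class SegmentationWriter(Protocol):
    def write_rois(self, masks: np.ndarray, traces: np.ndarray) -> None: ...

class NativeNpyWriter:
    """Would persist results in the current suite2p .npy layout."""
    def write_rois(self, masks: np.ndarray, traces: np.ndarray) -> None:
        np.save('F.npy', traces)  # simplified stand-in for stat.npy, F.npy, ...

class NwbWriter:
    """Would persist the same arrays into an NWB file (e.g. via pynwb)."""
    def write_rois(self, masks: np.ndarray, traces: np.ndarray) -> None:
        raise NotImplementedError('maintained separately from the core tool')

def run_pipeline(movie: np.ndarray, writer: SegmentationWriter) -> None:
    # Placeholder segmentation: the core algorithm only sees numpy arrays.
    masks = np.zeros(movie.shape[1:], dtype=bool)
    traces = movie.mean(axis=(1, 2))
    writer.write_rois(masks, traces)

run_pipeline(np.random.rand(100, 32, 32), NativeNpyWriter())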

@bendichter

@chrisroat that sounds a lot like what we have here. Example usage:

from roiextractors import NwbSegmentationExtractor, Suite2pSegmentationExtractor

seg_ex = Suite2pSegmentationExtractor("segmentation_datasets/suite2p")
NwbSegmentationExtractor.write_segmentation(seg_ex, "output_path.nwb")

This takes the standard suite2p output as input, can save to either an existing or a new file, and lets you supply a metadata dictionary if you want to.

@marius10p
Contributor

@chrisroat is correct -- we can barely maintain the code we write ourselves. Thanks for the PR though; sorry we couldn't take it.

marius10p closed this Oct 30, 2023