POC: append output to input NWB file #764

Closed

Conversation

chrisroat
Contributor

This is a proof of concept. It changes the behavior of the NWB output -- if the input is also NWB, the data is appended to the original file.

The output is still written in suite2p's native format and then written again to NWB.

TODO: commit data file

@chrisroat
Contributor Author

@bendichter @carsen-stringer

This is a proof of concept to see if it's a way to integrate nwb better with suite2p. (The test I included passes locally, but fails in the CI because I have not included the test file via dvc).

This change allows the user to write the suite2p output directly back to the nwb file that had the input data.

Is this a recommended approach for NWB use? My preference would be to write a separate file, but then references have to be maintained across files and that ended up getting a bit janky (though better than the fake data that Suite2p created, which I left in for the case of non-NWB input). For example, DANDI required a Subject ID when I wrote a separate file and that wasn't straightforward to work with.

The change keeps the pattern of "write out data in native format" then "rewrite data to NWB". A future commit (possibly part of this PR) could remove that extra I/O.
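For concreteness, here is a minimal sketch of the append approach using pynwb; the file name and the containers added are placeholders, not the exact code in this PR:

from pynwb import NWBHDF5IO
from pynwb.ophys import Fluorescence, ImageSegmentation

# Open the original (input) NWB file in append mode, add an ophys processing
# module holding the segmentation results, and write back to the same file.
with NWBHDF5IO('input.nwb', mode='a') as io:
    nwbfile = io.read()
    ophys = nwbfile.create_processing_module(
        name='ophys', description='suite2p output'
    )
    ophys.add(ImageSegmentation(name='ImageSegmentation'))
    ophys.add(Fluorescence(name='Fluorescence'))
    # ...populate the plane segmentation and ROI response series from the
    # suite2p arrays (stat, F, Fneu, spks) before writing...
    io.write(nwbfile)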

Comment on lines +286 to +309
imaging_plane = nwbfile.create_imaging_plane(
    name='ImagingPlane',
    optical_channel=optical_channel,
    imaging_rate=ops['fs'],
    description='standard',
    device=device,
    excitation_lambda=600.,
    indicator='GCaMP',
    location='V1',
    grid_spacing=([2.0, 2.0, 30.0] if multiplane else [2.0, 2.0]),
    grid_spacing_unit='microns'
)

# link to external data
image_series = TwoPhotonSeries(
    name='TwoPhotonSeries',
    dimension=[ops['Ly'], ops['Lx']],
    external_file=(ops['filelist'] if 'filelist' in ops else ['']),
    imaging_plane=imaging_plane,
    starting_frame=[0],
    format='external',
    starting_time=0.0,
    rate=ops['fs'] * ops['nplanes']
)


Could we include a way of specifying this metadata?

@chrisroat
Contributor Author

Likely, though it's outside the scope here. I am not trying to change any existing behavior -- just trying to add functionality.
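As a purely hypothetical sketch (not part of this PR), the hard-coded fields above could later be overridden by an optional user-supplied dict, for example an 'nwb_metadata' key in ops:

# Hypothetical: let user-supplied values override the defaults shown above.
defaults = dict(description='standard', excitation_lambda=600.,
                indicator='GCaMP', location='V1')
plane_metadata = {**defaults, **ops.get('nwb_metadata', {})}  # user values win
imaging_plane = nwbfile.create_imaging_plane(
    name='ImagingPlane',
    optical_channel=optical_channel,
    imaging_rate=ops['fs'],
    device=device,
    grid_spacing=([2.0, 2.0, 30.0] if multiplane else [2.0, 2.0]),
    grid_spacing_unit='microns',
    **plane_metadata,
)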

@chrisroat
Contributor Author

Instead of this approach, the discussion on @bendichter's PR to this branch has made me rethink how to proceed.

Instead of adding to the input file, we can do a shallow copy of the input file to a new file (which will have links to data in the input file) and put the Suite2p output into this new file.

This would mean that using the NWB output format requires NWB as the input format. I think this is a worthwhile approach because otherwise a lot of metadata must be faked or input by the user. It also encourages upstream data to be in NWB format. This is a change in behavior, though, so I wanted to be sure @carsen-stringer is OK with it.
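A minimal sketch of the shallow-copy idea with pynwb (file names are placeholders; this follows pynwb's documented copy-and-link pattern rather than any code already in this PR):

from pynwb import NWBHDF5IO

with NWBHDF5IO('input.nwb', mode='r') as read_io:
    nwbfile_in = read_io.read()
    nwbfile_out = nwbfile_in.copy()  # shallow copy: containers stay as links to input.nwb
    # ...add the Suite2p segmentation/fluorescence output to nwbfile_out here...
    with NWBHDF5IO('processed.nwb', mode='w', manager=read_io.manager) as write_io:
        write_io.write(nwbfile_out, link_data=True)  # keep raw data as external links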

Also of note - there is a known deficiency in DANDI with regard to the links between files, so the input/output file pair cannot be used in DANDI. As a workaround, a user could do a deep copy of the output file to create a single DANDI-compatible file that will contain both the input and output data.
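A sketch of that workaround using pynwb's export with links resolved (again placeholders; this assumes NWBHDF5IO.export with write_args={'link_data': False} copies linked datasets into the new file):

from pynwb import NWBHDF5IO

with NWBHDF5IO('processed.nwb', mode='r') as read_io:
    nwbfile = read_io.read()
    with NWBHDF5IO('dandi_ready.nwb', mode='w') as export_io:
        # Resolve external links so the exported file is self-contained.
        export_io.export(src_io=read_io, nwbfile=nwbfile,
                         write_args={'link_data': False})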


bendichter commented Nov 22, 2021

@chrisroat I want to make sure this feature is not reduced to a mode that only works for a subset of users. I would prefer it to remain possible to save processed data in a different file from the source data. I agree that faking data in the processed file is bad, but I would prefer to solve that problem by allowing the user to specify such data on save. For users that do not want to copy this type of metadata, it should also be possible to save the processed data to the original file. Your proposed solution also has the problem that it is not currently compatible with DANDI.

@chrisroat
Contributor Author

I understand the desire to have many people be able to use this, but I worry it won't be sustainable to have each tool author maintain all of these different options. The boilerplate I'm seeing is making me think about alternative approaches.

In my personal experience, separating the I/O and the tool into layers is better than having the tool understand and be tested against every combination of I/O.

In a sense, I think of tool authors writing things like pandas or numpy functions, which are very focused and use simple data types, and shouldn't have to worry about the on-disk formats, external metadata structure, provenance, etc. With this sort of separation, tools can adapt to NWB updates in a much cleaner way.

We could leave the tool to operate directly on the optimal format (i.e. a numpy array). There would be an I/O translator layer with two implementations -- one that works with the current data formats, and one that works with NWB. The NWB wrapper could be maintained separately, and the other wrapper could be deprecated over a release cycle or two.
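A rough sketch of the layering I have in mind (names here are hypothetical, not suite2p or roiextractors API):

from typing import Protocol
import numpy as np

class SegmentationWriter(Protocol):
    def write_rois(self, masks: np.ndarray, traces: np.ndarray) -> None: ...

class NativeNpyWriter:
    """Would persist results in the current suite2p .npy layout."""
    def write_rois(self, masks: np.ndarray, traces: np.ndarray) -> None:
        np.save('F.npy', traces)  # simplified stand-in for stat.npy, F.npy, ...

class NwbWriter:
    """Would persist the same arrays into an NWB file (e.g. via pynwb)."""
    def write_rois(self, masks: np.ndarray, traces: np.ndarray) -> None:
        raise NotImplementedError('maintained separately from the core tool')

def run_pipeline(movie: np.ndarray, writer: SegmentationWriter) -> None:
    # Placeholder segmentation: the core algorithm only sees numpy arrays.
    masks = np.zeros(movie.shape[1:], dtype=bool)
    traces = movie.mean(axis=(1, 2))
    writer.write_rois(masks, traces)

run_pipeline(np.random.rand(100, 32, 32), NativeNpyWriter())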

@bendichter

@chrisroat that sounds a lot like what we have here. Example usage:

from roiextractors import NwbSegmentationExtractor, Suite2pSegmentationExtractor

seg_ex = Suite2pSegmentationExtractor("segmentation_datasets/suite2p")
NwbSegmentationExtractor.write_segmentation(seg_ex, "output_path.nwb")

This takes the standard suite2p output as input, can save to either an existing or a new file, and lets you supply a metadata dictionary if you want to.

@marius10p
Contributor

@chrisroat is correct -- we can barely maintain the code we write ourselves. Thanks for the PR though; sorry we couldn't take it.

marius10p closed this Oct 30, 2023