
[New Format]: ThorLab's tiff file #995

Open

mohrabi opened this issue Aug 8, 2024 · 7 comments

Comments

@mohrabi

mohrabi commented Aug 8, 2024

What format would you like to see added to NeuroConv?

In an attempt to convert raw TIFF data recorded from ThorLabs' mesoscope, TiffImagingInterface raised an out-of-memory error because the TIFF file is not memory-mappable. Moreover, there is no support for multichannel recordings in this format: multi-channel data are stored in one giant file whose frames alternate between channels in round-robin fashion.
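For context, separating round-robin frames by channel is a strided-slicing operation along the time axis. A minimal sketch with hypothetical dimensions (the real files are far larger):

```python
import numpy as np

# Hypothetical dimensions: 6 frames of 4x4 pixels, 2 channels interleaved
# round-robin along the time axis (frame 0 -> ch0, frame 1 -> ch1, ...).
n_frames, height, width, n_channels = 6, 4, 4, 2
stack = np.arange(n_frames * height * width, dtype=np.int16).reshape(
    n_frames, height, width
)

# Strided slicing separates the channels; each channel keeps every
# n_channels-th frame of the interleaved stack.
channels = [stack[c::n_channels] for c in range(n_channels)]

print(channels[0].shape)
```

On a plain array this is a view, but on a very large file the same slice pattern forces a copy of the selected frames, which is part of the problem described above.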

Does the format have any documentation?

https://suite2p.readthedocs.io/en/latest/inputs.html

Existing APIs for format

The data format is supported by Suite2p, but I'm unaware of which libraries it uses to read the data.

Do you have any example files you are willing to share?

No response

Do you have any interest in helping implement the feature?

Yes.

@mohrabi
Author

mohrabi commented Aug 8, 2024

This is the traceback from using TiffImagingInterface. It is raised after about an hour, far longer than expected for a data interface module that should only be reading the file headers.

/home/mrabiei1/.conda/envs/kishorelab-to-nwb/lib/python3.11/site-packages/roiextractors/extractors/tiffimagingextractors/tiffimagingextractor.py:60: UserWarning: memmap of TIFF file could not be established. Reading entire matrix into memory. Consider using the ScanImageTiffExtractor for lazy data access.
  warn(
Traceback (most recent call last):
  File "/home/mrabiei1/.conda/envs/kishorelab-to-nwb/lib/python3.11/site-packages/roiextractors/extractors/tiffimagingextractors/tiffimagingextractor.py", line 58, in __init__
    self._video = tifffile.memmap(self.file_path, mode="r")
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mrabiei1/.conda/envs/kishorelab-to-nwb/lib/python3.11/site-packages/tifffile/tifffile.py", line 1578, in memmap
    raise ValueError('image data are not memory-mappable')
ValueError: image data are not memory-mappable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mrabiei1/kishorelab-to-nwb/dev.py", line 11, in <module>
    interface_tiff = TiffImagingInterface(file_path=file_path, sampling_frequency=17.0, verbose=True)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mrabiei1/.conda/envs/kishorelab-to-nwb/lib/python3.11/site-packages/neuroconv/datainterfaces/ophys/tiff/tiffdatainterface.py", line 37, in __init__
    super().__init__(
  File "/home/mrabiei1/.conda/envs/kishorelab-to-nwb/lib/python3.11/site-packages/neuroconv/datainterfaces/ophys/baseimagingextractorinterface.py", line 44, in __init__
    self.imaging_extractor = self.get_extractor()(**source_data)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mrabiei1/.conda/envs/kishorelab-to-nwb/lib/python3.11/site-packages/roiextractors/extractors/tiffimagingextractors/tiffimagingextractor.py", line 65, in __init__
    self._video = tif.asarray()
                  ^^^^^^^^^^^^^
  File "/home/mrabiei1/.conda/envs/kishorelab-to-nwb/lib/python3.11/site-packages/tifffile/tifffile.py", line 4525, in asarray
    result = stack_pages(
             ^^^^^^^^^^^^
  File "/home/mrabiei1/.conda/envs/kishorelab-to-nwb/lib/python3.11/site-packages/tifffile/tifffile.py", line 23060, in stack_pages
    out = create_output(out, shape, dtype)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mrabiei1/.conda/envs/kishorelab-to-nwb/lib/python3.11/site-packages/tifffile/tifffile.py", line 23205, in create_output
    return numpy.zeros(shape, dtype)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 272. GiB for an array with shape (60000, 1352, 1802) and data type int16

@h-mayorquin
Collaborator

Hi, thanks for raising the issue. We will take a look into it.

@h-mayorquin
Collaborator

So for some reason the memmap failed. Let's work through catalystneuro/roiextractors#352 to get a better error message that should tell us what's going on. Question: did you try the ScanImageExtractor as suggested in the warning?

@mohrabi
Author

mohrabi commented Aug 13, 2024

I'm encountering the following error when using the ScanImageExtractor:

ValueError: ScanImage version could not be determined from metadata.

This makes sense because the data I'm working with was not recorded using MBF's microscope, so the metadata format is different.

Regarding the link in the roiextractors repo: I'm working with a TIFF file that's approximately 300 GB, which means it cannot be fully loaded into memory on most systems. The primary advantage of using memmap is that it allows handling such large datasets without loading the entire file into memory.

Another issue is handling multi-channel recordings. Slicing a memory map entails loading the whole slice into memory, so even if we had a memmap of the TIFF file, it still wouldn't work.
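One common way around that constraint is to iterate over a memmap in bounded chunks, so only one chunk is resident at a time. A minimal sketch, using a small stand-in file (the path and shapes here are hypothetical):

```python
import numpy as np
import os
import tempfile

# Hypothetical small stand-in for a huge recording: a memmap-backed array.
path = os.path.join(tempfile.mkdtemp(), "frames.dat")
shape = (100, 16, 16)  # (frames, height, width)
writer = np.memmap(path, dtype=np.int16, mode="w+", shape=shape)
writer[:] = 1
writer.flush()

video = np.memmap(path, dtype=np.int16, mode="r", shape=shape)

# video[:] would materialize everything in RAM; iterate in frame chunks
# instead, so only `chunk` frames are loaded at once.
chunk = 10
total = 0
for start in range(0, shape[0], chunk):
    block = np.asarray(video[start:start + chunk])  # only this slice is read
    total += int(block.sum())

print(total)
```

This is the general pattern, of course; it does not by itself solve the channel-interleaving problem described above.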

Suite2p's approach to multi-plane, multi-channel recordings is to split the data by channel and plane and save them separately, so that each raw binary file has a shape of time by height by width. This structure handles large datasets effectively. After conversion, Suite2p can load data from these binary files using the BinaryFile class, which is a wrapper around a memmap.

Adding support for BinaryFile in NeuroConv or the PyNWB API could provide a straightforward solution for users dealing with similar issues, especially those working with large datasets like this.
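To illustrate why this is attractive: a BinaryFile-style reader over Suite2p's raw binaries is essentially a shaped memmap over a flat int16 file. A minimal sketch, with a tiny generated stand-in file (the file name and dimensions are assumptions for illustration, not Suite2p's API):

```python
import numpy as np
import os
import tempfile

# Write a small stand-in for a Suite2p-style binary: raw int16 frames
# stored back to back, shape (n_frames, Ly, Lx).
n_frames, Ly, Lx = 5, 8, 8
path = os.path.join(tempfile.mkdtemp(), "data.bin")
frames = np.arange(n_frames * Ly * Lx, dtype=np.int16).reshape(n_frames, Ly, Lx)
frames.tofile(path)

# Reading it back is just a shaped memmap over the raw file, so single
# frames can be accessed without loading the whole recording.
video = np.memmap(path, dtype=np.int16, mode="r", shape=(n_frames, Ly, Lx))
one_frame = np.asarray(video[2])
```

The only metadata a wrapper like this needs is the dtype and the (frames, height, width) shape, which is what makes the format easy to support.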

@h-mayorquin
Collaborator

Yeah, we aim to use memmaps as much as we can, but the library we use is failing to memmap your file, as you clearly understood.

I think the hack you pointed out is a good solution. It would be easy for us to create a memmap extractor in roiextractors, and then you could use that wrapper to write your data to NWB. The implementation looks really simple. Some questions to clarify:

  • Do you have a small file in Suite2p's binary format that we could use? After we do this in roiextractors we would like to have automated testing. If you only have big files, I could show you how to stub the memmap so we can test on it.
  • Do you know the difference between BinaryRWFile and BinaryFile? The Suite2p documentation refers to a BinaryRWFile, but this is nowhere to be found in their code base.

A separate discussion is whether we would like to do something like what Suite2p does for ThorLabs. They rely on the haussmeister code to do the conversion to binary; maybe we could use that code?

If we had a lot of resources we could probably look at that code base and build a better extractor directly from it. But meanwhile I think the hack you propose is a good workaround.
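For the stubbing idea mentioned above, one simple approach is to truncate the raw binary to its first few frames, since frames are fixed-size records. A sketch under assumed names and shapes (everything here is hypothetical, not roiextractors code):

```python
import numpy as np
import os
import tempfile

# Sketch of "stubbing" a big binary file for automated tests: keep only
# the first few frames so the test fixture stays small.
Ly, Lx = 8, 8
dtype = np.int16
tmpdir = tempfile.mkdtemp()
full_path = os.path.join(tmpdir, "full.bin")
stub_path = os.path.join(tmpdir, "stub.bin")

# Pretend full.bin is the large recording (60 frames here instead of 60000).
np.ones((60, Ly, Lx), dtype=dtype).tofile(full_path)

# Copy only the first n_stub frames' worth of bytes into the stub file.
n_stub = 3
frame_bytes = Ly * Lx * np.dtype(dtype).itemsize
with open(full_path, "rb") as src, open(stub_path, "wb") as dst:
    dst.write(src.read(n_stub * frame_bytes))

stub = np.memmap(stub_path, dtype=dtype, mode="r", shape=(n_stub, Ly, Lx))
print(stub.shape)
```

Because each frame occupies a fixed number of bytes, the stub is byte-for-byte identical to the head of the original file and can exercise the same reader code in CI.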

@mohrabi
Author

mohrabi commented Aug 14, 2024

They have renamed the BinaryRWFile to BinaryFile.

For sample data, I'll message on Slack.

@h-mayorquin
Collaborator

Thanks for the pointers.

Linking this, as there is some progress but it lacks data:
#846
