Inconsistent splits naming between `Raw` and `Epochs` when using BIDS schema (#11870)
Seems like a bug to me. @sappelhoff WDYT?
Thanks for the report! I agree, this looks like a bug. Furthermore, a minor point: the indexes (01, 02, ...) do not really need to be prefixed with a 0 if they never exceed 9 -- and I guess we can know that before writing the files and adjust the behaviour accordingly.
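For that zero-padding point, a rough illustration of how the number of splits (and hence the index width) could be estimated before anything is written; the helper below is hypothetical and not part of MNE:

```python
# Hypothetical helper: estimate how many splits a save would produce from the
# uncompressed data size, then derive how many digits the split index needs.
import math


def split_index_width(n_samples: int, n_channels: int,
                      split_size_bytes: int, bytes_per_sample: int = 4) -> int:
    """Digits needed to label every split (1 for <= 9 splits, 2 for <= 99, ...)."""
    total_bytes = n_samples * n_channels * bytes_per_sample
    n_splits = max(1, math.ceil(total_bytes / split_size_bytes))
    return len(str(n_splits))


# A recording that fits into 3 splits only needs single-digit indices (split-1 ... split-3).
assert split_index_width(n_samples=1_000_000, n_channels=300,
                         split_size_bytes=500_000_000) == 1
```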
Thanks for such a quick response from both of you! Frankly, now that I think of it, I like the behaviour of `Epochs` more.

Picture this: you have to write a data processing pipeline for a new experiment for which only part of the data were recorded. So far all the recorded data are under 2 GB, so you don't have to worry about splits. Then the new data arrive and for one subject the file size is above 2 GB, so when you save the intermediate preprocessed copy, the saved file gets the `split-01` entity in its name and the pipeline code that expects the original file name no longer finds it.

One way to avoid this would be to use the old "neuromag" split naming, but in that case I think BIDS checkers start complaining. Instead, if the main split didn't change its name, as happens for `Epochs`, the pipeline would keep working.

What do you guys think?
The BIDS specification says that when a file is split, all parts of that file need to contain the split entity, so the behaviour as it currently is for `Raw` is the BIDS-conformant one: https://bids-specification.readthedocs.io/en/latest/glossary.html#split-entities

I understand where you are coming from, but I think you'd have to build a little logic into your reading code to deal with this problem, and/or rely on BIDS reading software like mne-bids. Admittedly, though, we do have open issues about split files there.
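As an illustration of what "a little logic in your reading code" could mean, here is a minimal sketch assuming BIDS-style split names; the folder layout and the helper name are made up:

```python
# Sketch only: fall back to the first BIDS split when the un-split file name
# does not exist. Paths and the helper name are hypothetical.
from pathlib import Path

import mne


def read_maybe_split(folder: Path, stem: str, suffix: str = "meg") -> mne.io.Raw:
    """Read <stem>_<suffix>.fif, or its first split if the recording was split."""
    unsplit = folder / f"{stem}_{suffix}.fif"
    if unsplit.exists():
        return mne.io.read_raw_fif(unsplit)
    # With split_naming="bids" the parts are <stem>_split-01_<suffix>.fif, ...
    splits = sorted(folder.glob(f"{stem}_split-*_{suffix}.fif"))
    if not splits:
        raise FileNotFoundError(f"No FIF file for {stem} in {folder}")
    # Reading the first split is enough: FIF splits reference the next part,
    # so MNE follows the chain automatically.
    return mne.io.read_raw_fif(splits[0])


# raw = read_maybe_split(Path("tmp/sub-01/meg"), "sub-01")
```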
Ok, I see. I don't think mne-bids supports reading and writing splits with the same template at the moment, though, so it would only partly solve the problem. For example, this fails:

```python
from pathlib import Path

from mne_bids import BIDSPath, read_raw_bids
from mne.io import read_raw
from mne.datasets import sample

root_path = Path("./tmp")
subj_path = root_path / "sub-01" / "meg"
subj_path.mkdir(parents=True)

raw_fpath = sample.data_path() / "MEG" / "sample" / "sample_audvis_raw.fif"
raw = read_raw(raw_fpath)
raw.crop(0, 10)

bp = BIDSPath(subject="01", root=root_path, suffix="meg", extension=".fif", datatype="meg")
raw.save(bp.fpath, split_size="10MB", split_naming="bids")
# produces tmp/sub-01/meg/sub-01_split-01_meg.fif and tmp/sub-01/meg/sub-01_split-02_meg.fif

loaded_raw = read_raw_bids(bp)
# fails with:
# FileNotFoundError: File does not exist:
# tmp/sub-01/meg/sub-01_meg.fif
# Did you mean one of:
# sub-01_split-02_meg.fif
# sub-01_split-01_meg.fif
# instead of:
# sub-01_meg.fif
```

Something like this should work though, both with and without splits: `loaded_raw = read_raw_bids(sorted(bp.match(), key=str)[0])`.

Splits are a source of never ending joy, it seems :)
Ouch. It's really bad that `read_raw_bids` fails on split files like this.
BTW, is it possible to modify saving raw files in `.fif.gz` so all the splits live inside the same archive, to avoid this altogether?
That's not what `.gz` (gzip) does: in MNE it only compresses a single file (to make it smaller).

I think the "parse, not validate" rule applies here: it might be better to convert once to a format that doesn't use splits than to think about this corner case each time. Just a thought.

It's a trade-off. Using `.fif` guarantees that all measurement info / metadata are preserved for MEG data. I don't know any other format that guarantees this, already has readers for most programming languages, and will likely still be readable in more than 10 years. I would say we need to fix the way we handle split files to make it painless, even if it involves a bit of magic.
I see. Somehow I thought gzip was like zip archives. I'm definitely not suggesting moving away from `.fif`, merely having an option to wrap it into an archive as a potential solution to splits. You already do the compression/decompression in the io functions when the filename ends in `.fif.gz`; in the same way you could wrap/unwrap split files (or a single file) in a tarball when the extension is `.fif.tar.gz` or `.fif.tar`. It's not going to solve the splits problem completely, because you can't really drop support for vanilla `.fif`, so split bugs will reappear from time to time. But if tarballs are advertised in the docs as the recommended way of dealing with splits, a lot of these bugs will go away for the end user. `.fif` splits are not meant to be treated as separate files: any time you move them apart or even rename them, everything breaks. That won't happen if they are inside an archive.

I realise there are issues with this approach but still thought it was worth suggesting. Don't know, maybe it's all nonsense and will only complicate things further.
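For illustration only, the wrap/unwrap idea could look roughly like this with Python's tarfile module; this is a sketch of the proposal, not an existing MNE option, and the helper names and file patterns are assumptions:

```python
# Sketch of the tarball idea: bundle all FIF splits of one recording into a
# single archive and unpack it before reading. Not an MNE feature.
import tarfile
from pathlib import Path

import mne


def pack_splits(split_files: list[Path], archive: Path) -> None:
    """Store all split files in one tar archive, keeping only their base names."""
    with tarfile.open(archive, "w:gz") as tar:
        for f in split_files:
            tar.add(f, arcname=f.name)


def read_packed(archive: Path, workdir: Path) -> mne.io.Raw:
    """Extract the archive and read the recording starting from its first split."""
    with tarfile.open(archive, "r:*") as tar:
        tar.extractall(workdir)
    first_split = sorted(workdir.glob("*_split-01_*.fif"))[0]
    return mne.io.read_raw_fif(first_split)
```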
The issue with the tarball idea is that you would need to decompress the data in memory to be able to start loading data. So I fear it will be heavy both in computation and memory.
Yes, that's true, this would break lazy loading of the data.
Description of the problem
Not really a bug, more of an inconsistent behaviour.
Saving large `Raw` and `Epochs` data with `split_naming='bids'` uses different naming schemas. When I save a large `Raw`, I get the entities `split-01`, `split-02`, etc., but when saving `Epochs`, the first split is saved without a suffix, while the second one has `split-01`. See the code snippet below for details; all asserts pass for me. Is there a reason for such a difference?
Steps to reproduce
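The original reproduction snippet was not captured in this page. Below is a sketch of what it could look like, following the description above; the output file names, the tiny split size, and the fixed-length epoching are assumptions made for illustration:

```python
# Reproduction sketch (assumed file names and sizes; the original snippet is missing).
from pathlib import Path

import mne
from mne.datasets import sample

out = Path("./splits_tmp")
out.mkdir(exist_ok=True)

raw = mne.io.read_raw_fif(sample.data_path() / "MEG" / "sample" / "sample_audvis_raw.fif")
raw.crop(0, 60)

# Use a tiny split size so splitting happens without needing a > 2 GB recording.
raw.save(out / "sample_meg.fif", split_size="10MB", split_naming="bids", overwrite=True)
epochs = mne.make_fixed_length_epochs(raw, duration=1.0, preload=True)
epochs.save(out / "sample_epo.fif", split_size="10MB", split_naming="bids", overwrite=True)

# Actual behaviour reported here: Raw renames even its first split ...
assert (out / "sample_split-01_meg.fif").exists()
assert not (out / "sample_meg.fif").exists()
# ... while Epochs keeps the original name for the first split and starts
# numbering at split-01 with the second part. These are the two asserts that
# should fail if both classes followed the same schema.
assert (out / "sample_epo.fif").exists()
assert (out / "sample_split-01_epo.fif").exists()
```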
Link to data
No response
Expected results
These two asserts should fail
Actual results
All asserts pass
Additional information
Platform macOS-13.4.1-arm64-arm-64bit
Python 3.11.3 (main, May 15 2023, 18:01:31) [Clang 14.0.6 ]
Executable /Users/dmalt/Applications/miniconda3/envs/mne_cli_tools/bin/python3
CPU arm (8 cores)
Memory Unavailable (requires "psutil" package)
Core
├☑ mne 1.4.2
├☑ numpy 1.25.2 (OpenBLAS 0.3.23.dev with 8 threads)
├☑ scipy 1.9.3
├☑ matplotlib 3.7.2 (backend=MacOSX)
├☑ pooch 1.7.0
└☑ jinja2 3.1.2
Numerical (optional)
├☑ sklearn 1.3.0
├☑ pandas 1.5.3
└☐ unavailable numba, nibabel, nilearn, dipy, openmeeg, cupy
Visualization (optional)
└☐ unavailable pyvista, pyvistaqt, ipyvtklink, vtk, qtpy, ipympl, pyqtgraph, mne-qt-browser
Ecosystem (optional)
└☐ unavailable mne-bids, mne-nirs, mne-features, mne-connectivity, mne-icalabel, mne-bids-pipeline