Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX: don't use op.realpath when reading raw #9227

Merged
merged 3 commits into from
Mar 31, 2021

Conversation

drammock
Copy link
Member

@drammock drammock commented Mar 30, 2021

closes #9221

This one-line fix is almost certainly too simplistic and will probably break something, so I'm hoping for some intense scrutiny here. Here's my understanding of the problem and solution(?):

  1. datalad uses git-annex to manage large data files (datalad is the recommended access method for OpenNeuro, so it's really something we should support)
  2. for split raw files, this poses a problem because when the first raw is loaded, we call os.path.realpath() on the filename, so something like sub-01/ses-day1/meg/sub-01_ses-day1_task-AVLearn_run-01_part-01_meg.fif becomes /home/user/Desktop/test-datalad/ds002598/.git/annex/objects/Xz/5P/MD5E-s2146079382--dca9c1e4ee05b6b32bdcd4b5aef8c9bb.fif/MD5E-s2146079382--dca9c1e4ee05b6b32bdcd4b5aef8c9bb.fif
  3. that works fine for singleton raw files, but for split files, we determine whether it's split based on the original fname, but then go looking for the subsequent parts based on the realpath-transformed fname.

my solution is to just not call realpath(). Doing so doesn't seem to hurt much as far as I can tell; I can still load sample data, and after creating a symlink on my desktop to a random .fif file on another drive, passing the symlink into read_raw() loads it just fine.

The only case I can think of where this will break something that previously worked, is if you had this:

$ tree bar
bar
├── foo
│   ├── qux_part-01_meg.fif
│   └── qux_part-02_meg.fif
└── qux_part-01_meg.fif -> foo/qux_part-01_meg.fif

...and you tried to load the symlink in bar. This would fail because now we would look for ...part-02... in bar/ instead of in bar/foo/. But honestly that seems like an edge case that we shouldn't support anyway --- allowing users to access all parts of a split raw by providing a symlink to just the first part (when symlinks to the other parts aren't also present).

@agramfort
Copy link
Member

I think it’s the right fix

@drammock
Copy link
Member Author

@agramfort any idea why realpath was there in the first place?

@cbrnr
Copy link
Contributor

cbrnr commented Mar 31, 2021

This looks correct. I agree that all parts of a file need to be in the same folder, symlink or not. I don't see the reason why we resolved symlinks previously, this shouldn't be necessary at all.

@agramfort
Copy link
Member

agramfort commented Mar 31, 2021 via email

Copy link
Member

@agramfort agramfort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@larsoner I let you merge if happy

Copy link
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a regression test for the linked issue, then added a general io test for the chdir case I mentioned. It exposed problems in many readers, so I fixed those, too, by making more uniform use of _check_fname, which always returns an abspath now.

@agramfort
Copy link
Member

CIs complain

@larsoner
Copy link
Member

Failure is just a timeout, feel free to merge if you're happy @agramfort

@agramfort agramfort merged commit 53ce587 into mne-tools:main Mar 31, 2021
@drammock drammock deleted the fix-raw-symlink branch April 2, 2021 22:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Some trouble with version control tools for data
5 participants