Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TestSignalExtraction.test_signal_extr_traits_valid failing #2380

Closed
djarecka opened this issue Jan 19, 2018 · 36 comments
Closed

TestSignalExtraction.test_signal_extr_traits_valid failing #2380

djarecka opened this issue Jan 19, 2018 · 36 comments

Comments

@djarecka
Copy link
Collaborator

Summary

I just noticed that TestSignalExtraction.test_signal_extr_traits_valid is failing on my OSX with the message:

E           traits.trait_errors.TraitError: The trait 'out_file' of a SignalExtractionOutputSpec instance is an existing file name, but the path  '/private/var/folders/5s/hk2j7ylx54b2clm9bx1mpvj00000gq/T/tmp_2bektow/SignalExtraction/signals.tsv' does not exist.

I can see that the file signals.tsv is being created but is being deleted here.

@oesteban I'll try to figure out why this happens on my OSX and not on CircleCI (or other Linux we tried today), but if you have any suggestions, please let me know.

@oesteban
Copy link
Contributor

Looks a lot like nipreps/fmriprep#936

@oesteban
Copy link
Contributor

@djarecka could you try on you OSX the PR referenced above? Thanks!

@djarecka
Copy link
Collaborator Author

it could be related to the temporary directory paths IMO.

i'm debugging it right now, and I can see that needed file here contains signals.tsv as /private/var/folders/5s/hk2j7ylx54b2clm9bx1mpvj00000gq/T/tmpuuucmm6n/SignalExtraction/signals.tsv
but file from walks_files(cwd) has /var/folders/5s/hk2j7ylx54b2clm9bx1mpvj00000gq/T/tmpuuucmm6n/SignalExtraction/signals.tsv.

@oesteban
Copy link
Contributor

Okay, it definitely looks like nipreps/fmriprep#936. I bet this is only happening with interfaces based off of SimpleInterface.
Could you give #2383 a quick try?

@djarecka
Copy link
Collaborator Author

just run with your oesteban-maint/node-hash and the test is still failing

@djarecka
Copy link
Collaborator Author

@oesteban not sure if this helps, but only signals.tsv in needed_files has /private at the beginning, i.e.:

(Pdb) needed_files
['/private/var/folders/5s/hk2j7ylx54b2clm9bx1mpvj00000gq/T/tmp1fmidn70/SignalExtraction/signals.tsv', '/var/folders/5s/hk2j7ylx54b2clm9bx1mpvj00000gq/T/tmp1fmidn70/SignalExtraction/_0x074086d35e0945269f2581ef95cdb8c1_unfinished.json', '/var/folders/5s/hk2j7ylx54b2clm9bx1mpvj00000gq/T/tmp1fmidn70/SignalExtraction/_inputs.pklz', '/var/folders/5s/hk2j7ylx54b2clm9bx1mpvj00000gq/T/tmp1fmidn70/SignalExtraction/_node.pklz']

@oesteban
Copy link
Contributor

oesteban commented Jan 19, 2018

Ok, this is the problem

self._results['out_file'] = os.path.abspath(self.inputs.out_file)

which aligns well with this discussion and the issue in fmriprep.

The problem, basically, is that interfaces shouldn't use os.getcwd() and stick with runtime.cwd. In general this shouldn't be a problem, but apparently SimpleInterface circumvents the change of directory in Node.

Since you are running step-by-step, would you mind checking why this line:

os.chdir(outdir)

seems not to be run with SimpleInterfaces (like SignalExtraction)?

EDIT: the problem is not restricted to SimpleInterfaces :(

@oesteban
Copy link
Contributor

Got it. Working on a fix. Sorry for the issue.

@djarecka
Copy link
Collaborator Author

@oesteban sorry for late reply, had a lunch break. Let me know if there is something I can test for you

@djarecka
Copy link
Collaborator Author

#2384 solves the problem on my OSX!

@oesteban
Copy link
Contributor

Oh, you are fast!

@satra satra closed this as completed in 533c247 Jan 20, 2018
@djarecka
Copy link
Collaborator Author

djarecka commented Jan 21, 2018

@oesteban I've just updated my master and I have the issue again..:(

When I checked your branch oesteban-fix/2380 yesterday with the commit 653f690 it works, since you deleted this branch I'm not sure how to check the final changes introduced by your PR (master had a few more merges since yesterday)

@djarecka djarecka reopened this Jan 21, 2018
@oesteban
Copy link
Contributor

Okay, now the corresponding report.rst should have more info. What is the content of cwd and prevcwd when it breaks?

@djarecka
Copy link
Collaborator Author

djarecka commented Jan 21, 2018

in my _reprt/report.rst I can only find info about Node and Original Input:

Node: nilearn

Hierarchy : SignalExtraction
Exec ID : SignalExtraction

Original Inputs

  • class_labels : ['CSF', 'GrayMatter', 'WhiteMatter']
  • detrend : False
  • ignore_exception : False
  • in_file : /private/var/folders/5s/hk2j7ylx54b2clm9bx1mpvj00000gq/T/pytest-of-dorota/pytest-83/test0/fmri.nii
  • incl_shared_variance : False
  • include_global : False
  • label_files : ['/private/var/folders/5s/hk2j7ylx54b2clm9bx1mpvj00000gq/T/pytest-of-dorota/pytest-83/test0/labels.nii']
  • out_file : signals.tsv

can debug later today

@djarecka
Copy link
Collaborator Author

@oesteban
If runtime parameters were written in the report.rst that would be:
cwd: '/var/folders/5s/hk2j7ylx54b2clm9bx1mpvj00000gq/T/tmpjjlptgmu/SignalExtraction'
prevcwd: '/private/var/folders/5s/hk2j7ylx54b2clm9bx1mpvj00000gq/T/pytest-of-dorota/pytest-88/test0'

But I found which changes cause the test failing on my OSX(I tested your branch before this commit):
f4c3db7

@effigies
Copy link
Member

That suggests that the interface isn't being run in the runtime.cwd, which seems to be a bigger problem. I'll have a look at this first thing in the morning. Oscar, ignore this thread.

@djarecka
Copy link
Collaborator Author

@effigies, it also looks like the reports are not working as they should (or @oesteban suggested), my report.rst is pretty short when this fails (no Runtime info)

@effigies effigies added this to the 1.0 milestone Jan 22, 2018
@effigies
Copy link
Member

effigies commented Jan 22, 2018

Set this to 1.0 since this seems symptomatic of a larger problem. @satra @djarecka @mgxd if you think it's okay to release without resolving whether interfaces are running in their working directories, untag.

@djarecka
Copy link
Collaborator Author

@effigies just so you know why the f4c3db7 commit makes difference, when I check os.path.abspath(self.inputs.out_file) here I have:
'/private/var/folders/5s/hk2j7ylx54b2clm9bx1mpvj00000gq/T/tmp5uxkdu21/SignalExtraction/signals.tsv'

runtime.cwd doesn't have /private/ at the beginning, i.e.: '/var/folders/5s/hk2j7ylx54b2clm9bx1mpvj00000gq/T/tmptnlizxv7/SignalExtraction'

so that's why the test ended up in the same place as on Thursday, i.e. needed files contains:
'/private/var/folders/5s/hk2j7ylx54b2clm9bx1mpvj00000gq/T/tmptnlizxv7/SignalExtraction/signals.tsv' (with this extra private)

fyi: on my OSX I can search either private/var/folders/... or /var/folders/.., they are the same. Not sure why OSX has this kind of structure...

@effigies
Copy link
Member

@djarecka Can you run the following in a terminal?

ls -ld / /private /var /private/var /var/folders /private/var/folders

It would be good to know what the symlink situation is.

@djarecka
Copy link
Collaborator Author

Witalis:~ dorota$ ls -ld / /private /var /private/var /var/folders /private/var/folders
drwxr-xr-x 36 root wheel 1292 Jan 15 23:16 /
drwxr-xr-x@ 6 root wheel 204 Nov 10 2016 /private
drwxr-xr-x 28 root wheel 952 Feb 27 2017 /private/var
drwxr-xr-x 9 root wheel 306 Feb 27 2017 /private/var/folders
lrwxr-xr-x@ 1 root wheel 11 Nov 10 2016 /var -> private/var
drwxr-xr-x 9 root wheel 306 Feb 27 2017 /var/folders

@effigies
Copy link
Member

Okay. As I think about this I'm a little inclined just to revert f4c3db7. os.path.join(runtime.cwd, ...) is going to be robust to invalid chdirs and also doesn't have to use an unnecessary system call. And I'm having trouble seeing a case where this would go wrong that os.path.abspath wouldn't.

@djarecka
Copy link
Collaborator Author

It would solve problem for this test on OSX (at least my OSX, we might want to check with some other, more reliable OSX...)

But I still don't understand why for test_signal_extr_traits_valid my runtime.cwd is /var/folders/ and for other test_nilearn runtime.cwd is /private/var/folders.

@satra
Copy link
Member

satra commented Jan 22, 2018

on osx these two things differ, so it's possible that tmpdir is returning one of these in pytests:

In [65]: os.path.abspath('/var/folders/ff/ttgp9pp94_7gfs7_5xn9f90c0000gl/0/com.apple.LaunchSer
    ...: vices')
Out[65]: '/var/folders/ff/ttgp9pp94_7gfs7_5xn9f90c0000gl/0/com.apple.LaunchServices'

In [66]: os.path.realpath('/var/folders/ff/ttgp9pp94_7gfs7_5xn9f90c0000gl/0/com.apple.LaunchSe
    ...: rvices')
Out[66]: '/private/var/folders/ff/ttgp9pp94_7gfs7_5xn9f90c0000gl/0/com.apple.LaunchServices'

@effigies
Copy link
Member

It's almost certainly going to boil down to a os.path.realpath call, or possibly a quirk where os.getcwd() can result in expanding symlinks. But as long as we use runtime.cwd when it's available, then a rogue chdir or realpath won't affect the outputs.

Assuming that we aren't shooting for a deeper fix than this one line change (and I think we shouldn't), any objections to pushing a revert directly to master? Runs through tests once instead of twice.

@djarecka
Copy link
Collaborator Author

@satra I did check that changing to realpath here doesn solve the problem

@satra
Copy link
Member

satra commented Jan 22, 2018

@djarecka - i don't think realpath generally helps with a solution :)

@effigies - fine with me.

@djarecka
Copy link
Collaborator Author

@effigies can you also comment on the sub-issue that I couldn't get any runtime info from report.rst file when the test was failing.

@effigies
Copy link
Member

My guess would be that this is another function of a path mismatch, where the call to update the report.rst is also using a path that was canonicalized to include /private. (It's unclear to me why OSX would care whether the path had /private in it, as it's the exact same thing.)

@effigies
Copy link
Member

But perhaps we should open a new issue to correctly handle temporary directories in OS X. I wouldn't target 1.0, though. My feeling is that it's enough of an edge case that we can resolve by telling people to set a base_dir.

@oesteban
Copy link
Contributor

I'll keep track of nipreps/fmriprep#936 where something very similar is happening without pytest, on a linux setup.

@achetverikov
Copy link
Contributor

I had a lot of weird issues after updating to 1.01 before I realized that the nodes run from the main script directory instead of their own folders for some reason. Changing the base_dir to the absolute path without symlinks helped.

@oesteban
Copy link
Contributor

I think what @achetverikov said is still necessary.

I had a lot of weird issues after updating to 1.01 before I realized that the nodes run from the main script directory instead of their own folders for some reason. Changing the base_dir to the absolute path without symlinks helped.

@oesteban oesteban reopened this Jul 24, 2018
@djarecka
Copy link
Collaborator Author

djarecka commented Jul 24, 2018

@oesteban I recently wrote #2639 , can you please check if this doesn't solve the problem (solved some similar issues from heudiconv recently)

@oesteban
Copy link
Contributor

Cool, I'll reopen if that did not work.

@djarecka
Copy link
Collaborator Author

@oesteban - sure, we also have #2395 on our watchlist.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants