fix: remove switching directory to load results #3047

satra · 2019-09-24T15:40:46Z

Summary

Fixes #3014. @stilley2 - quick change to test directory changes.

List of changes proposed in this PR (pull-request)

remove indirectory

Acknowledgment

(Mandatory) I acknowledge that this contribution will be available under the Apache 2 license.

effigies · 2019-09-24T15:41:53Z

Wasn't the reason for changing directories to resolve path resolution if relative paths are saved?

effigies · 2019-09-24T15:44:37Z

nipype/utils/filemanip.py

@@ -707,34 +707,33 @@ def loadpkl(infile):
    pkl_metadata = None

    unpkl = None
-    with indirectory(infile.parent):
-        with SoftFileLock('%s.lock' % infile.name):


If we want to continue changing directories while correctly locking, what if you just switched (and locked the absolute path):

-    with indirectory(infile.parent):
-        with SoftFileLock('%s.lock' % infile.name):
+    with SoftFileLock('%s.lock' % infile):
+        with indirectory(infile.parent):

This still has potential race conditions when it comes to loading the actual results file.
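A minimal sketch of the ordering suggested above, using only the standard library. Both `indirectory` and `soft_lock` here are simplified stand-ins written for illustration (assumptions, not the nipype or filelock implementations), and the file name is made up. The point is only that a lock path built from the absolute `infile` stays the same no matter which directory is current.

```python
import contextlib
import os
import tempfile
from pathlib import Path


@contextlib.contextmanager
def indirectory(path):
    """Simplified stand-in: temporarily switch the working directory."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


@contextlib.contextmanager
def soft_lock(lock_path):
    """Hypothetical soft lock: create the lock file exclusively, remove it on exit."""
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield lock_path
    finally:
        os.close(fd)
        os.remove(lock_path)


def load_with_absolute_lock(infile):
    """Take the lock on the absolute path first, then change directory."""
    infile = Path(infile).resolve()
    with soft_lock('%s.lock' % infile) as lock_path:
        with indirectory(infile.parent):
            # The lock path does not depend on the current directory.
            return os.path.isabs(lock_path), os.path.exists(lock_path)


# Illustrative run against a throwaway directory.
workdir = Path(tempfile.mkdtemp())
results_file = workdir / 'result.pklz'
results_file.write_bytes(b'')
cwd_before = os.getcwd()
lock_is_absolute, lock_exists = load_with_absolute_lock(results_file)
```

This only illustrates the lock placement. It does nothing about another thread changing the directory while the inner block runs.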

satra · 2019-09-24T15:45:06Z

that's why i'm asking for @oesteban's input.

for loading, we can resolve the new path, without changing directories. i don't see the need for switching directories.
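A rough sketch of what "resolving the new path without changing directories" could look like as pure path arithmetic. `resolve_saved_path` and the file names are hypothetical, not part of nipype, and the sketch says nothing about any validation that runs while unpickling.

```python
from pathlib import PurePosixPath


def resolve_saved_path(saved_path, results_file):
    """Resolve a saved path against the folder of the results file.

    Hypothetical helper: absolute paths are returned unchanged, relative
    ones are joined to the parent directory of ``results_file``.
    """
    saved = PurePosixPath(saved_path)
    if saved.is_absolute():
        return saved
    return PurePosixPath(results_file).parent / saved


relative = resolve_saved_path('out.nii.gz', '/work/node/result_node.pklz')
absolute = resolve_saved_path('/data/in.nii.gz', '/work/node/result_node.pklz')
```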

stilley2 · 2019-09-24T16:12:10Z

Wouldn't a cleaner solution be to change the behavior of _async_callback?

nipype/nipype/pipeline/plugins/multiproc.py

Line 148 in 56a0335

def _async_callback(self, args):

Or at least don't call it asynchronously. Having an asynchronous chdir seems dangerous.

stilley2 · 2019-09-24T16:50:53Z

In fact, can we just remove that chdir in _async_callback? It was added in bcc25fd, but maybe it's not needed anymore.

satra · 2019-09-24T17:35:54Z

In fact, can we just remove that chdir in _async_callback? It was added in bcc25fd, but maybe it's not needed anymore.

i'll let @oesteban comment on this. indeed, i think we have to be careful of where we are switching directories and what impact it has on the main thread.

@stilley2 - regarding the race condition, i thought about it a little more, and i think it's unlikely we can do better.

the worker plugin basically indicates if a job has finished. this happens only after the result file is written (this is where the softlock comes into play). the issue we were getting was that the file was not yet visible to another process (because of filesystem latencies), even though multiproc had already passed info back to the main thread that the job had finished. because of those latencies, it can still be the case that even the lock file is not yet visible to the client, but this could only be handled with a timeout delay.

stilley2 · 2019-09-24T18:00:45Z

Sorry, I guess I shouldn't have said race condition. What I mean is that the directory can still be changed between entering the indirectory context manager and loading the results file itself. To recap, the only way to fix that is to either not change directories in loadpkl, or remove chdir from _async_callback. My personal preference is towards the latter, as I think all directory changes in the main process should be obvious and that having async directory changes will come back to bite us again in the future. However, I don't know why it was put there in the first place, or whether removing it will break something.
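To make the hazard concrete, here is a small sketch under the assumption that the callback runs in a helper thread of the main process (as `multiprocessing.Pool` callbacks do). The working directory is shared by all threads of a process, so a chdir in that thread is visible to the main thread. The `callback` function is a stand-in, not the actual `_async_callback`.

```python
import os
import tempfile
import threading

start_dir = os.getcwd()
other_dir = os.path.realpath(tempfile.mkdtemp())


def callback():
    # Stand-in for a callback executed in a helper thread of the main process.
    os.chdir(other_dir)


worker_thread = threading.Thread(target=callback)
worker_thread.start()
worker_thread.join()

# The main thread now sees the directory chosen by the callback thread.
cwd_seen_by_main = os.path.realpath(os.getcwd())

# Restore the original directory so the example leaves no side effects.
os.chdir(start_dir)
```

In real code the callback fires at an unpredictable moment, so any relative path handled by the main thread could be resolved against either directory.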

oesteban · 2019-09-24T23:38:31Z

for loading, we can resolve the new path, without changing directories. i don't see the need for switching directories.

Sorry for joining late to the party. Unfortunately, this is not possible because we are unpickling traits with the exists=True metadata set most of the time. If we considered a serialization method other than pickling, or worked around the set_value of FileTraits when setting state, then we could do it from some other directory.
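The constraint can be illustrated with a plain-Python sketch. `ExistingFile` is a hypothetical stand-in for a file trait declared with exists=True, not the traits implementation, and `restore` mimics what pickle does on load (create the object without `__init__`, then call `__setstate__`). A relative path stored in the state only validates when the current directory is the one it was saved from.

```python
import os
import tempfile


class ExistingFile:
    """Hypothetical stand-in for a file trait declared with exists=True."""

    def __init__(self, path):
        self._set(path)

    def _set(self, path):
        # Validation resolves relative paths against the current directory.
        if not os.path.exists(path):
            raise ValueError('%s does not exist' % path)
        self.path = path

    def __getstate__(self):
        return {'path': self.path}

    def __setstate__(self, state):
        # pickle calls this on load, so validation runs again at that point.
        self._set(state['path'])


def restore(state):
    """Mimic what pickle does on load: bypass __init__, then set the state."""
    obj = ExistingFile.__new__(ExistingFile)
    obj.__setstate__(state)
    return obj


start_dir = os.getcwd()
node_dir = tempfile.mkdtemp()
fd, full_path = tempfile.mkstemp(dir=node_dir, suffix='.txt')
os.close(fd)
name = os.path.basename(full_path)  # unique relative file name

os.chdir(node_dir)
try:
    state = ExistingFile(name).__getstate__()
    restored_inside = restore(state).path  # works inside the node folder
finally:
    os.chdir(start_dir)

try:
    restore(state)  # same state, different working directory
    failed_outside = False
except ValueError:
    failed_outside = True
```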

oesteban · 2019-09-24T23:44:05Z

In fact, can we just remove that chdir in _async_callback? It was added in bcc25fd, but maybe it's not needed anymore.

I would not remove this chdir for the reasons stated in that commit message.

Making it possible to save/load results without changing directories while keeping the rebase/resolve feature would be preferred. But that would entail a little refactor of how the BasePath traits are pickled (probably adding metadata to store the full path, and make it modifiable while unpickling).

stilley2 · 2019-09-25T13:51:49Z

So I was looking again at bcc25fd, and there seem to be two main changes

1. Initialize the worker by calling os.chdir(self._cwd)
2. After the worker finishes, call os.chdir(self._cwd)

Regarding 2, the comment says it is there to ensure that the "runtime is not left at a dubious working directory". How would this be possible, given that any chdir in the execution of the node/worker would not be reflected in the main process, where _async_callback is called?

Regarding 1, it is unclear to me why this is required: wouldn't the Pool start in the current directory anyway? And I can't see how the cwd would change between L141, where we have self._cwd = os.getcwd(), and L160, where that value is passed to the worker.

So I guess in summary I can't see how removing the chdir in _async_callback (2) would change anything, but I also don't see how the worker initialization (1) changes anything either, and clearly it must have if bcc25fd solved some bug, so I'm probably misunderstanding something.
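The claim that a chdir inside a worker is not reflected in the main process can be checked with a short sketch. A plain subprocess is used here as a stand-in for a Pool worker (an assumption made to keep the example self-contained); the working directory is per-process state either way.

```python
import os
import subprocess
import sys
import tempfile

parent_before = os.getcwd()
target = os.path.realpath(tempfile.mkdtemp())

# The child changes its own working directory and reports it.
child = subprocess.run(
    [sys.executable, '-c',
     'import os, sys; os.chdir(sys.argv[1]); print(os.getcwd())',
     target],
    capture_output=True, text=True, check=True,
)
child_cwd = os.path.realpath(child.stdout.strip())

# The parent's working directory is untouched by the child's chdir.
parent_after = os.getcwd()
```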

I hope my stubbornness isn't out of line, as I'm an outsider on this project, but I really feel the asynchronous chdir is asking for trouble, and as someone who has started using (and really liking) nipype for their pipelines, I would feel much better at the very least really understanding this issue.

Thanks!

oesteban · 2019-09-25T16:42:54Z

I hope my stubbornness isn't out of line

Not at all, thanks for your patience. I've been checking out the commit in more detail - and you are right, that chdir is asking for trouble.

oesteban · 2019-09-30T18:44:49Z

Hi @satra, any news on this? Could you find a way of unpickling from another dir?

satra · 2019-09-30T20:31:23Z

@oesteban - sorry i haven't had a chance to look at this.

oesteban

I'll try to think about how we can work around the unpickling in the interface's folder.

oesteban · 2019-10-01T01:17:08Z

nipype/utils/filemanip.py

+                # Could not get version info
+                pkl_file.seek(0)
+            try:
+                unpkl = pickle.load(pkl_file)


this still needs to be done in the folder where outputs required to exist are.

how about the other chdir?

fix: remove switching directory to load results

94ec45b

satra requested a review from oesteban September 24, 2019 15:40

effigies reviewed Sep 24, 2019

oesteban reviewed Oct 1, 2019

stilley2 mentioned this pull request Oct 1, 2019

FIX: Remove asynchronous chdir callback #3060

Merged

1 task

This was referenced Oct 2, 2019

FIX: Minimize scope for directory changes while loading results file #3061

Merged

RFC: Python and Numpy version support and release schedule #3036

Open

satra closed this Oct 2, 2019
