-
Notifications
You must be signed in to change notification settings - Fork 530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: remove switching directory to load results #3047
Conversation
Wasn't the reason for changing directories to resolve path resolution if relative paths are saved? |
@@ -707,34 +707,33 @@ def loadpkl(infile): | |||
pkl_metadata = None | |||
|
|||
unpkl = None | |||
with indirectory(infile.parent): | |||
with SoftFileLock('%s.lock' % infile.name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to continue changing directories while correctly locking, what if you just switched (and locked the absolute path):
- with indirectory(infile.parent):
- with SoftFileLock('%s.lock' % infile.name):
+ with SoftFileLock('%s.lock' % infile):
+ with indirectory(infile.parent):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still has potential race conditions when it comes to loading the actual results file.
that's why i'm asking for @oesteban's input. for loading, we can resolve the new path, without changing directories. i don't see the need for switching directories. |
Wouldn't a cleaner solution be to change the behavior of _async_callback? nipype/nipype/pipeline/plugins/multiproc.py Line 148 in 56a0335
|
In fact, can we just remove that chdir in _async_callback? It was added in bcc25fd, but maybe it's not needed anymore. |
i'll let @oesteban comment on this. indeed, i think we have to be careful of where we are switching directories and what impact it has on the main thread. @stilley2 - regarding the race condition, i thought about it a little more, and i think it's unlikely we can do better. the worker plugin basically indicates if a job has finished. this happens only after the result file is written (softlock comes into play). the issue we were getting was that the file on the filesystem is not available to another process (because of filesystem latencies), because multiproc has passed info back to the main thread that job has finished. because of filesystem latencies it can still be the case that even the lock file has not come back up to the client, but this could only be handled with a timeout delay. |
Sorry I guess I shouldn't have said race condition. What I mean is that the directory can still be changed between the indirectory context manager, and the loading of the results file itself. To recap, the only way to fix that is to either not change directories in loadpkl, or remove chdir from _async_callback. My personal preference is towards the latter, as I think all directory changes in the main process should be obvious and that having async directory changes will come back to bite us again in the future. However, I don't know why it was but there in the first place, and whether removing it will break something. |
Sorry for joining late to the party. Unfortunately, this is not possible because we are unpickling traits with the |
I would not remove this chdir for the reasons stated in that commit message. Making it possible to save/load results without changing directories while keeping the rebase/resolve feature would be preferred. But that would entail a little refactor of how the |
So I was looking again at bcc25fd, and there seem to be two main changes
Regarding 2, the comment is to ensure that the "runtime is not left at a dubious working directory". How would this be possible, as any chdir in the execution of the node/working would not be reflected in the main process, where _async_callback is called. Regarding 1, it is unclear to me why this is required, as wouldn't the Pool start in the current directory anyways? And I can't see how the cwd would change between L141 where we have So I guess in summary I can't see how removing the chdir in _async callback (2) would change anything, but I also don't see how the working initialization (1) changes anything either, and clearly it must have if bcc25fd solved some bug, so I'm probably misunderstanding something. I hope my stubbornness isn't out of line, as I'm an outsider on this project, but I really feel the asynchronous chdir is asking for trouble, and as someone who as started using (and really liking) nipype for their pipelines, I would feel much better at the very least really understanding this issue. Thanks! |
Not at all, thanks for your patience. I've been checking out the commit in more detail - and you are right, that chdir is asking for trouble. |
Hi @satra, any news on this? Could you find a way of unpickling from another dir? |
@oesteban - sorry i haven't had a chance to look at this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to think about how we can work around the unpickling in the interface's folder.
# Could not get version info | ||
pkl_file.seek(0) | ||
try: | ||
unpkl = pickle.load(pkl_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this still needs to be done in the folder where outputs required to exist are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about the other chdir
?
Summary
Fixes #3014 . @stilley2 - quick change to test directory changes.
List of changes proposed in this PR (pull-request)
Acknowledgment