[FIX] MultiProc starting workers at dubious wd #2368

Merged (10 commits) on Jan 19, 2018

Conversation

@oesteban (Contributor) commented Jan 10, 2018

When maxtasksperchild is set to a very low value (e.g. 1), workers are created very frequently, sometimes in working directories that have already been deleted by interface cleanup. That triggers an OSError when os.getcwd() is called during the nipype.config import.

This PR sets an initializer for the workers that simply changes to the appropriate working directory as each worker is spun up. Fixes nipreps/fmriprep#868

The reason one would be interested in maxtasksperchild=1 is memory consumption. Using this option together with multiprocessing's 'forkserver' start method ensures memory is freed after each interface has been run.
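
Below is a minimal, self-contained sketch of both ideas using only the standard library; the helper name `_init_worker` and the toy task are illustrative and not necessarily what this PR implements:

```python
import multiprocessing as mp
import os


def _init_worker(cwd):
    """Pool initializer (illustrative): move each freshly spawned worker
    into a directory that is known to exist before it picks up tasks."""
    os.chdir(cwd)


def _run_task(x):
    # Hypothetical stand-in for running one interface.
    return x * x


if __name__ == '__main__':
    # 'forkserver' plus maxtasksperchild=1 recycles every worker after a
    # single task, so memory allocated while running an interface is
    # released as soon as that worker process exits.
    ctx = mp.get_context('forkserver')
    with ctx.Pool(processes=2, maxtasksperchild=1,
                  initializer=_init_worker,
                  initargs=(os.getcwd(),)) as pool:
        print(pool.map(_run_task, range(8)))
```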

@oesteban requested review from satra and effigies on January 10, 2018 at 20:39
@effigies (Member) left a comment

Looks reasonable. Two questions.

self.pool = NipypePool(
    processes=self.processors,
    maxtasksperchild=maxtasks,
    initializer=_init_worker,

Member

Is there a reason this shouldn't just be os.chdir?

Contributor (author)

Not really.
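
For reference, the suggestion amounts to passing os.chdir directly as the pool initializer, since it already matches the initializer(*initargs) calling convention for a single path argument. A minimal standalone sketch (not the PR's code):

```python
import multiprocessing as mp
import os

if __name__ == '__main__':
    # os.chdir(path) can serve as the initializer itself, so no wrapper
    # function such as _init_worker is strictly needed.
    pool = mp.Pool(processes=2, initializer=os.chdir, initargs=(os.getcwd(),))
    pool.close()
    pool.join()
```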

@@ -128,6 +135,10 @@ def __init__(self, plugin_args=None):
        self._task_obj = {}
        self._taskid = 0

        # Cache current working directory and make sure we
        # change to it when workers are set up
        self._cwd = os.getcwd()

Member

Is this the workflow base directory or the CWD of the shell? Could this cause things to dump into the local directory?

Contributor (author)

It shouldn't, because interfaces handle their own working directories. I think this is fixing an edge case for fmriprep, where we are spinning up and killing workers all the time.

Contributor (author)

OK, it turns out that some SimpleInterfaces are writing to the workflow base directory :(

Member

Well that's... suboptimal.

Contributor (author)

It's actually worrying, because it means that, without this patch, those interfaces were being run in some other, unexpected path.

Contributor (author)

Apparently it is a problem only with SimpleInterface.
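
To illustrate the pitfall being discussed (a hypothetical example, not code from this PR or from nipype's codebase): a SimpleInterface that builds its output path from a relative filename resolves it against the worker process's current directory, so where the file lands depends on whatever the worker's cwd happens to be.

```python
import os

from nipype.interfaces.base import (
    BaseInterfaceInputSpec, SimpleInterface, TraitedSpec, traits)


class _TouchOutputSpec(TraitedSpec):
    out_file = traits.Str(desc='path of the file that was written')


class Touch(SimpleInterface):
    """Hypothetical interface that writes to a relative path."""
    input_spec = BaseInterfaceInputSpec
    output_spec = _TouchOutputSpec

    def _run_interface(self, runtime):
        # Resolves against os.getcwd(), i.e. wherever the worker happens
        # to be, not necessarily the node's own working directory.
        out_file = os.path.abspath('touched.txt')
        open(out_file, 'w').close()
        self._results['out_file'] = out_file
        return runtime
```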

@oesteban (Contributor, author) commented Jan 11, 2018

@satra I don't see potential side effects to this PR, but it'd be great if you took a peek through your window to the future, since the rest of us humans don't have one.

@oesteban (Contributor, author)

Hold off merging, I still need to refine a minor nit.

@effigies (Member)

@oesteban Still need to hold off?

@oesteban (Contributor, author)

No. Now that I have identified that the problem is with SimpleInterface, we will need to fix that separately. In the meantime I have taken care of it in FMRIPREP proper, so this is okay to go.
