Work on mac #53

Merged · 7 commits · Jun 13, 2024
Conversation

pattonw (Collaborator) commented on May 11, 2024

Make daisy run on Mac without requiring multiprocessing.set_start_method("fork", force=True).

Changes made:

  1. removed all lambda functions
  2. removed double-underscore (name-mangled) methods
  3. used dill to pickle spawn_function in the Worker class (performance impact unknown)
  4. improved signature parsing for the task process_function
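Item 4 can be sketched roughly as follows. This is a hypothetical helper, not daisy's actual implementation: it collects only the mandatory positional parameters of a process_function, ignoring any keyword arguments that have defaults.

```python
import inspect


def mandatory_args(func):
    # Keep positional parameters without defaults; parameters that
    # have defaults are ignored, since only mandatory args matter.
    sig = inspect.signature(func)
    return [
        name
        for name, param in sig.parameters.items()
        if param.default is inspect.Parameter.empty
        and param.kind in (param.POSITIONAL_ONLY, param.POSITIONAL_OR_KEYWORD)
    ]


def process_block(block, t, sigma=1.0):
    ...
```

For process_block above, mandatory_args returns ["block", "t"]: sigma has a default, so it is skipped.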

Running daisy with start method "spawn":

if __name__ == "__main__":

daisy.run_blockwise must be run from within an if __name__ == "__main__": block. Otherwise you will probably get this error:

RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module: ...

functools.partial

We can't use lambda functions as the process_function. If you have a function, e.g. smooth(block, sigma), that you want to run for all sigma in [1, 5, 9], you have to use functools.partial to bind the arguments you want: process_function = partial(smooth, sigma=1).
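To see why the lambda restriction matters under "spawn", here is a small sketch (smooth is a stand-in function, not part of daisy): the stdlib pickler serializes functions by qualified name, so it rejects anonymous lambdas, while a partial over a module-level function pickles its bound arguments along with a reference to the function.

```python
import pickle
from functools import partial


def smooth(block, sigma):
    # stand-in for a real processing function
    return (block, sigma)


# pickle serializes functions by name, so an anonymous lambda
# cannot be sent to a spawned subprocess
try:
    pickle.dumps(lambda b: smooth(b, sigma=1))
    lambda_picklable = True
except Exception:
    lambda_picklable = False  # this branch is taken

# a partial stores the function reference plus the bound arguments,
# which is exactly what the spawned worker needs to rebuild the call
process_function = partial(smooth, sigma=1)
```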

Here is an example that should run fine on Mac with start method "spawn":

import daisy

from functools import partial


def process_block(block, t):
    import time
    time.sleep(t)


if __name__ == "__main__":

    for t in range(3):

        task = daisy.Task(
            f"test_server_task_t{t}",
            total_roi=daisy.Roi((0,), (100,)),
            read_roi=daisy.Roi((0,), (10,)),
            write_roi=daisy.Roi((1,), (8,)),
            process_function=partial(process_block, t=t / 10),
            check_function=None,
            read_write_conflict=True,
            fit="valid",
            num_workers=1,
            max_retries=2,
            timeout=None,
        )

        daisy.run_blockwise([task])

- Use pytest functions instead of UnitTest classes.
- Remove unnecessary lambdas (`lambda b: func(b)` is equivalent to `func`); these cannot be easily pickled, so they can't be used with start_method="spawn".
- We don't care about kwargs with defaults, we only care about mandatory args.
- When creating a worker, we pass in a spawn function. dill is an improved version of pickle, so it can serialize many more functions than pickle can. We therefore use dill to serialize the spawn function to bytes, which can then be serialized/deserialized by pickle when spawning subprocesses. When the worker needs to run the spawn function, we use dill to deserialize the bytes back into the desired function.
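The spawn-function round trip described above can be sketched like this (a simplified illustration, not daisy's actual Worker code; it requires the third-party dill package):

```python
import pickle

import dill


def make_spawn_function(t):
    # a closure like this cannot be serialized by stdlib pickle
    def spawn_function():
        return t * 2

    return spawn_function


spawn = make_spawn_function(21)

# dill turns the closure into plain bytes ...
payload = dill.dumps(spawn)

# ... and bytes are trivially picklable, so they survive the pickling
# that multiprocessing performs when start_method is "spawn"
wire = pickle.dumps(payload)

# worker side: unpickle the bytes, then let dill rebuild the function
restored = dill.loads(pickle.loads(wire))
```

Calling restored() then behaves exactly like calling the original closure.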
cmalinmayor (Collaborator)

@pattonw Just putting the need for a dill dependency in writing.

@pattonw pattonw merged commit ad974c1 into master Jun 13, 2024
6 checks passed
@pattonw pattonw deleted the work-on-mac branch June 13, 2024 22:43