Create default Client during unit tests #2515

Closed
djhoese opened this issue Feb 7, 2019 · 8 comments

Comments


djhoese commented Feb 7, 2019

I have some code using distributed that hits the same error message as #516 when run on Travis CI. If the "user" does not explicitly provide a client, I create one inside the function they call, like this:

            try:
                # reuse the client the user already created, if any
                from dask.distributed import get_client
                client = get_client()
            except ImportError:
                # distributed is not installed, so run without a client
                client = None
            except ValueError:
                # no global client exists yet, so create a new one
                from dask.distributed import Client
                client = Client()

What is the suggested way to create a default client if the user hasn't created one? Or am I using this all wrong? My goal is to use distributed to compute things in parallel without the user needing to worry about how to create a client. If they want to get fancy they can provide their own; otherwise I'll use the existing one.

Author

djhoese commented Feb 7, 2019

FYI, I am using the stdlib unittest and starting the tests from a test suite that is run from the command line via python setup.py test. Interestingly, the Python 2 jobs pass while the Python 3 jobs fail (warning: long log messages): https://travis-ci.org/pytroll/satpy/builds/490259468


/home/travis/miniconda/envs/test/lib/python3.6/site-packages/distributed/bokeh/core.py:57: UserWarning: 
Port 8787 is already in use. 
Perhaps you already have a cluster running?
Hosting the diagnostics dashboard on a random port instead.
  warnings.warn('\n' + msg)
/home/travis/miniconda/envs/test/lib/python3.6/site-packages/distributed/bokeh/core.py:60: ResourceWarning: unclosed <socket.socket fd=28, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('0.0.0.0', 0)>
  raise
tornado.application - ERROR - Multiple exceptions in yield list
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/test/lib/python3.6/site-packages/tornado/gen.py", line 883, in callback
    result_list.append(f.result())
  File "/home/travis/miniconda/envs/test/lib/python3.6/site-packages/tornado/gen.py", line 1141, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/travis/miniconda/envs/test/lib/python3.6/site-packages/distributed/deploy/local.py", line 220, in _start_worker
    yield w._start()
  File "/home/travis/miniconda/envs/test/lib/python3.6/site-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/home/travis/miniconda/envs/test/lib/python3.6/site-packages/tornado/gen.py", line 1141, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/travis/miniconda/envs/test/lib/python3.6/site-packages/distributed/nanny.py", line 158, in _start
    response = yield self.instantiate()
  File "/home/travis/miniconda/envs/test/lib/python3.6/site-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/home/travis/miniconda/envs/test/lib/python3.6/site-packages/tornado/gen.py", line 1141, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/travis/miniconda/envs/test/lib/python3.6/site-packages/distributed/nanny.py", line 228, in instantiate
    self.process.start()
  File "/home/travis/miniconda/envs/test/lib/python3.6/site-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/home/travis/miniconda/envs/test/lib/python3.6/site-packages/tornado/gen.py", line 1141, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/travis/miniconda/envs/test/lib/python3.6/site-packages/distributed/nanny.py", line 375, in start
    yield self.process.start()
  File "/home/travis/miniconda/envs/test/lib/python3.6/site-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/home/travis/miniconda/envs/test/lib/python3.6/site-packages/distributed/process.py", line 35, in _call_and_set_future
    res = func(*args, **kwargs)
  File "/home/travis/miniconda/envs/test/lib/python3.6/site-packages/distributed/process.py", line 184, in _start
    process.start()
  File "/home/travis/miniconda/envs/test/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/home/travis/miniconda/envs/test/lib/python3.6/multiprocessing/context.py", line 291, in _Popen
    return Popen(process_obj)
  File "/home/travis/miniconda/envs/test/lib/python3.6/multiprocessing/popen_forkserver.py", line 35, in __init__
    super().__init__(process_obj)
  File "/home/travis/miniconda/envs/test/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/travis/miniconda/envs/test/lib/python3.6/multiprocessing/popen_forkserver.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/home/travis/miniconda/envs/test/lib/python3.6/multiprocessing/spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "/home/travis/miniconda/envs/test/lib/python3.6/multiprocessing/spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
ERROR

Member

mrocklin commented Feb 9, 2019

That is a common Python error when code starts new processes at import time. The usual fix is to put the process-starting code under an if __name__ == '__main__': block. It isn't Dask-specific; it comes from multiprocessing.

In your case you might bypass this by using a client that only uses threads:

client = Client(processes=False)

If you have any desire to help catch that error and have us print something nicer, that would be a very welcome contribution. This question comes up a lot.
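For readers running stdlib unittest, as in this issue, the threads-only suggestion above could be wired into a test class roughly as follows. This is only a sketch: the class name ThreadedClientTest and the helper _increment are made up for illustration, and the class is skipped when distributed is not installed.

```python
import unittest


def _increment(x):
    return x + 1


class ThreadedClientTest(unittest.TestCase):
    """Hypothetical test class sharing one threads-only client."""

    @classmethod
    def setUpClass(cls):
        try:
            from dask.distributed import Client
        except ImportError:
            raise unittest.SkipTest("distributed is not installed")
        # processes=False keeps the scheduler and workers in this process,
        # so no worker processes are started through multiprocessing.
        cls.client = Client(processes=False)

    @classmethod
    def tearDownClass(cls):
        # Only runs when setUpClass succeeded, so cls.client exists here.
        cls.client.close()

    def test_submit(self):
        future = self.client.submit(_increment, 1)
        self.assertEqual(future.result(), 2)
```

Creating the client once per class rather than once per test keeps start-up cost down; whether that sharing is acceptable depends on how isolated the tests need to be.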

Member

mrocklin commented Feb 9, 2019

There is not currently any blessed way to say "just get me a client whatever it takes". The rationale is that people may want different defaults. For example, as discussed above, you may want to avoid processes, even though processes are currently the default.

So for now, including code like what you shared above may be ideal.

Author

djhoese commented Feb 9, 2019

OK, thanks @mrocklin. I ended up mocking the Client object in my tests, since right now only a few of them use it. I decided to do this after looking at all the logic in distributed's pytest fixtures (since we aren't currently using pytest, it is harder to use them).
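The thread does not show the mocked tests themselves, so the following is a guess at what such a test could look like with only the standard library. The function double_all and the helper fake_distributed_modules are hypothetical; the stand-in modules are placed in sys.modules so the lazy import inside double_all picks up a MagicMock instead of the real Client.

```python
import sys
import types
import unittest
from unittest import mock


def _double(x):
    return 2 * x


def double_all(values):
    """Hypothetical library function that always starts its own Client."""
    # Imported lazily so a test can swap the module out beforehand.
    from dask.distributed import Client

    client = Client()
    try:
        return client.gather(client.map(_double, values))
    finally:
        client.close()


def fake_distributed_modules():
    """Build stand-in modules so the test never starts a real cluster."""
    fake_dask = types.ModuleType("dask")
    fake_distributed = types.ModuleType("dask.distributed")
    fake_distributed.Client = mock.MagicMock(name="Client")
    fake_dask.distributed = fake_distributed
    return {"dask": fake_dask, "dask.distributed": fake_distributed}


class DoubleAllTest(unittest.TestCase):
    def test_uses_and_closes_client(self):
        modules = fake_distributed_modules()
        fake_client = modules["dask.distributed"].Client.return_value
        fake_client.gather.return_value = [2, 4, 6]

        # patch.dict restores sys.modules when the block exits.
        with mock.patch.dict(sys.modules, modules):
            result = double_all([1, 2, 3])

        self.assertEqual(result, [2, 4, 6])
        fake_client.map.assert_called_once_with(_double, [1, 2, 3])
        fake_client.close.assert_called_once_with()
```

A test like this checks only how the code talks to the client (what it submits, and that it closes the client afterwards); it does not exercise any real scheduling.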

djhoese closed this as completed Feb 9, 2019

Member

mrocklin commented Feb 9, 2019

OK, any interest in writing up the multiprocessing error message as a separate issue? It would be nice to help users with that. Python's error message is confusing to many people.

Author

djhoese commented Feb 11, 2019

@mrocklin I thought about this more this weekend. Would you say it is "better" practice for software using dask to call get_client than to accept a client argument that the user passes? Or a combination of both? Is there a case where the global client returned by get_client wouldn't be what the user wants to use?

@mrocklin
Member

There are cases for everything :)

The concern I was referring to above was about creating a client if one doesn't already exist. We might want the user to have some say in how we do that. We might not, though. I'm totally open to a function that creates a default client if one doesn't already exist. That just doesn't currently exist, and we've been defaulting to "ask the user" so far.
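As a rough illustration of the kind of helper being discussed, one possible shape is sketched below. The name get_or_create_client is hypothetical and is not part of distributed; forwarding keyword arguments to Client is just one assumed way of giving the user "some say" in how the client is created.

```python
def get_or_create_client(**client_kwargs):
    """Return the global client, creating one if none exists.

    Hypothetical helper, not part of distributed. Returns None when
    distributed is not installed.
    """
    try:
        from dask.distributed import Client, get_client
    except ImportError:
        return None
    try:
        # Reuse whatever client the user has already set up.
        return get_client()
    except ValueError:
        # No global client yet: the caller's keyword arguments decide how
        # the new one is built, e.g. processes=False for threads only.
        return Client(**client_kwargs)
```

A caller who wants to avoid processes could then write get_or_create_client(processes=False), while a plain get_or_create_client() would keep Client's own defaults.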

Author

djhoese commented Feb 11, 2019

Makes sense. That is exactly what I ended up doing for the functions I've been playing with. The user can provide a client via keyword argument; otherwise the function defaults to using get_client. If no client is provided and there is no global one, or distributed is not installed, or the user passes client=False, then a simple version of the code is used (iterating over things instead of submitting them to a client). Thanks.
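The actual functions are not shown in the thread, so the sketch below is only one reading of the rules described above. The names _resolve_client and apply_to_all are hypothetical, and using client.map followed by client.gather is an assumption about how the work would be submitted.

```python
def _resolve_client(client):
    """Return a usable client, or None to request the serial code path."""
    if client is False:
        return None  # user explicitly opted out
    if client is not None:
        return client  # user supplied their own client
    try:
        from dask.distributed import get_client
        return get_client()  # reuse the global client if one exists
    except (ImportError, ValueError):
        return None  # distributed missing, or no global client


def apply_to_all(func, items, client=None):
    """Apply func to every item, in parallel when a client is available."""
    client = _resolve_client(client)
    if client is None:
        # Simple version: iterate instead of submitting to a client.
        return [func(item) for item in items]
    return client.gather(client.map(func, items))
```

Note that this version never creates a client on its own, which sidesteps the question of default settings raised earlier in the thread.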
