Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Polish parsing of worker-saturation from config #7255

Merged
merged 2 commits into from
Nov 4, 2022

Conversation

crusaderky
Copy link
Collaborator

@crusaderky crusaderky commented Nov 3, 2022

Explain and polish support for string inf.
Add check vs. zero or negative values and increase coverage %.

[ORIGINAL COMMENT]
Remove support for string worker-saturation that can be cast to a float.
It's a pattern that is nowhere else to be seen in the package, is unnecessary (not even when reading from env variables), and violates the json schema.

@crusaderky crusaderky requested a review from gjoseph92 November 3, 2022 17:52
@crusaderky crusaderky self-assigned this Nov 3, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Nov 3, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       15 files         15 suites   6h 28m 37s ⏱️
  3 169 tests   3 082 ✔️   84 💤 3
23 448 runs  22 542 ✔️ 901 💤 5

For more details on these failures, see this check.

Results for commit 4248873.

♻️ This comment has been updated with latest results.

@gjoseph92
Copy link
Collaborator

is unnecessary (not even when reading from env variables)

Prior to #7064, a user explicitly reported it not working with environment variables:

FWIW I set the env var DASK_DISTRIBUTED__SCHEDULER__WORKER_SATURATION in the docker image that the client/scheduler/workers use, but when I ran, the workers died before doing any computation

confirmed it works when not setting environment variable

| Traceback (most recent call last):                                                                                                                                                   |
|   File "/opt/conda/lib/python3.9/site-packages/distributed/utils.py", line 742, in wrapper                                                                                           |
|     return await func(*args, **kwargs)                                                                                                                                               |
|   File "/opt/conda/lib/python3.9/site-packages/distributed/scheduler.py", line 4051, in add_worker                                                                                   |
|     self.check_idle_saturated(ws)                                                                                                                                                    |
|   File "/opt/conda/lib/python3.9/site-packages/distributed/scheduler.py", line 2896, in check_idle_saturated                                                                         |
|     if math.isinf(self.WORKER_SATURATION)                                                                                                                                            |
| TypeError: must be real number, not str                                                                                                                                              |

@gjoseph92
Copy link
Collaborator

I can't reproduce that problem myself, but on this branch you can't set worker-saturation to inf with an env var:

(dask-distributed) gabe dev/distributed ‹worker-saturation-conf› » DASK_DISTRIBUTED__SCHEDULER__WORKER_SATURATION=inf dask-scheduler                                                                                                                                                                                                                                                                                           1 ↵
/Users/gabe/dev/distributed/distributed/cli/dask_scheduler.py:140: FutureWarning: dask-scheduler is deprecated and will be removed in a future release; use `dask scheduler` instead
  warnings.warn(
2022-11-03 13:22:11,047 - distributed.scheduler - INFO - -----------------------------------------------
2022-11-03 13:22:11,492 - distributed.http.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
2022-11-03 13:22:11,527 - distributed.scheduler - INFO - State start
2022-11-03 13:22:11,528 - distributed.scheduler - INFO - End scheduler
Traceback (most recent call last):
  File "/Users/gabe/miniconda3/envs/dask-distributed/bin/dask-scheduler", line 33, in <module>
    sys.exit(load_entry_point('distributed', 'console_scripts', 'dask-scheduler')())
  File "/Users/gabe/miniconda3/envs/dask-distributed/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/gabe/miniconda3/envs/dask-distributed/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/gabe/miniconda3/envs/dask-distributed/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/gabe/miniconda3/envs/dask-distributed/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/gabe/dev/distributed/distributed/cli/dask_scheduler.py", line 248, in main
    asyncio.run(run())
  File "/Users/gabe/miniconda3/envs/dask-distributed/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/Users/gabe/miniconda3/envs/dask-distributed/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/Users/gabe/dev/distributed/distributed/cli/dask_scheduler.py", line 208, in run
    scheduler = Scheduler(
  File "/Users/gabe/dev/distributed/distributed/scheduler.py", line 3615, in __init__
    SchedulerState.__init__(
  File "/Users/gabe/dev/distributed/distributed/scheduler.py", line 1656, in __init__
    raise ValueError(  # pragma: nocover
ValueError: `distributed.scheduler.worker-saturation` must be a float > 0; got 'inf'

@crusaderky
Copy link
Collaborator Author

This is frustrating:

$ DASK_TEST1=inf python -c 'import dask; x = dask.config.get("test1"); print(x, type(x))'
inf <class 'str'>
$ DASK_TEST1=.inf python -c 'import dask; x = dask.config.get("test1"); print(x, type(x))'
.inf <class 'str'>
$ DASK_TEST1=math.inf python -c 'import dask; x = dask.config.get("test1"); print(x, type(x))'
math.inf <class 'str'>
$ DASK_TEST1=1.2 python -c 'import dask; x = dask.config.get("test1"); print(x, type(x))'
1.2 <class 'float'>

Relevant code:
https://github.com/dask/dask/blob/c013287bbb708ac473f0c7fc00506306bcb8d873/dask/config.py#L229-L235

@crusaderky crusaderky force-pushed the worker-saturation-conf branch from 3fec109 to 4248873 Compare November 3, 2022 20:58
@crusaderky
Copy link
Collaborator Author

@gjoseph92 please review now

@gjoseph92
Copy link
Collaborator

LGTM. I'm still slightly concerned that a user reported that the environment variable wasn't parsed correctly before #7064, and then it worked after that was released.

@jgdwyer if it's straightforward for you to install git+https://github.com/crusaderky/distributed.git@worker-saturation-conf and see if setting DASK_DISTRIBUTED__SCHEDULER__WORKER_SATURATION=1.0 still works for you with this branch in your environment, that would be very helpful. If it would take you more than 10min, then don't bother. I have a feeling that maybe when you originally set it, some extra whitespace could have snuck in there?

@gjoseph92 gjoseph92 mentioned this pull request Nov 3, 2022
11 tasks
@jgdwyer
Copy link

jgdwyer commented Nov 4, 2022

LGTM. I'm still slightly concerned that a user reported that the environment variable wasn't parsed correctly before #7064, and then it worked after that was released.

@jgdwyer if it's straightforward for you to install git+https://github.com/crusaderky/distributed.git@worker-saturation-conf and see if setting DASK_DISTRIBUTED__SCHEDULER__WORKER_SATURATION=1.0 still works for you with this branch in your environment, that would be very helpful. If it would take you more than 10min, then don't bother. I have a feeling that maybe when you originally set it, some extra whitespace could have snuck in there?

@gjoseph92 I just gave this a try and I think it worked okay. I forgot to update the client image to use this version, but I think that is probably okay as the env var only applies to scheduler/workers if I understand correctly. It's certainly possible that I incorporated extra whitespace when I had tried it earlier. (I've been setting this env var in Docker using ENV DASK_DISTRIBUTED__SCHEDULER__WORKER_SATURATION=1.0).
Screen Shot 2022-11-04 at 8 39 48 AM

Copy link
Collaborator

@gjoseph92 gjoseph92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking @jgdwyer, I'm happy to have this cleaned up then.

@gjoseph92 gjoseph92 merged commit 4b00be1 into dask:main Nov 4, 2022
@crusaderky crusaderky deleted the worker-saturation-conf branch November 7, 2022 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants