
KeyError and Worker already exists #169

Closed
louisabraham opened this issue Oct 6, 2018 · 11 comments

@louisabraham
Contributor

I'm trying to set up dask with tpot.

My code looks like this:

from dask_jobqueue import LSFCluster
cluster = LSFCluster(cores=1, memory='3GB', job_extra=['-R rusage[mem=2048,scratch=8000]'],
                     local_directory='$TMPDIR',
                     walltime='12:00')

from dask.distributed import Client
client = Client(cluster)
cluster.scale(10)

from tpot import TPOTRegressor

reg = TPOTRegressor(max_time_mins=30, generations=20, population_size=96,
                    cv=5,
                    scoring='r2',
                    memory='auto', random_state=42, verbosity=10, use_dask=True)
reg.fit(X, y)

and I keep getting those annoying errors:

distributed.scheduler - ERROR - '74905774'
Traceback (most recent call last):
  File "/cluster/home/abrahalo/.local/lib64/python3.6/site-packages/distributed/scheduler.py", line 1306, in add_worker
    plugin.add_worker(scheduler=self, worker=address)
  File "/cluster/home/abrahalo/.local/lib64/python3.6/site-packages/dask_jobqueue/core.py", line 62, in add_worker
    self.running_jobs[job_id] = self.pending_jobs.pop(job_id)
KeyError: '74905774'

distributed.utils - ERROR - Worker already exists tcp://10.205.103.50:35780
Traceback (most recent call last):
  File "/cluster/home/abrahalo/.local/lib64/python3.6/site-packages/distributed/utils.py", line 648, in log_errors
    yield
  File "/cluster/home/abrahalo/.local/lib64/python3.6/site-packages/distributed/scheduler.py", line 1261, in add_worker
    raise ValueError("Worker already exists %s" % address)
ValueError: Worker already exists tcp://10.205.103.50:35780

I think there might be a problem with LSFCluster, because it puts a lot of jobs in cluster.finished_jobs that are still running according to bjobs and even to the dask.distributed web interface.

@guillaumeeb
Member

See #117: this error is seen when workers die and are restarted. This is often due to out-of-memory errors.

The message should be corrected on the master branch, but the underlying issue coming from your dask processes will remain.

Try increasing the memory per worker, and also use the dashboard to monitor your worker processes and see if you spot something wrong.
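
For example (just a sketch, the numbers are only illustrative; keep the LSF rusage request consistent with the memory argument, whose units depend on your LSF configuration):

from dask_jobqueue import LSFCluster

# Same cluster as in the report above, with more memory per worker.
cluster = LSFCluster(cores=1, memory='6GB',
                     job_extra=['-R rusage[mem=6144,scratch=8000]'],
                     local_directory='$TMPDIR',
                     walltime='12:00')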

@louisabraham
Contributor Author

louisabraham commented Oct 6, 2018

Thank you! Yes, I see the memory increasing a bit too much; I'll try increasing the limit in a few hours and hope it will solve my problem.

However, if a node unexpectedly encounters a problem that causes the worker to restart, shouldn't dask restart the computation as well? And shouldn't the Cluster object consider the jobs as running again?

@guillaumeeb
Member

Dask provides a mechanism that relaunches your failed tasks. I believe it tries to launch each task three times, but I'm not sure exactly in which cases. However, the memory problem in your case will probably show up every time, and eventually your computation should fail.
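
If I remember correctly, that retry count comes from the scheduler's allowed-failures setting (default 3). A rough sketch of raising it, assuming a distributed version that reads this key from dask.config:

import dask
from dask.distributed import Client

# Sketch only: more retries will not fix a task that always exceeds the
# memory budget; the computation will still fail eventually.
dask.config.set({'distributed.scheduler.allowed-failures': 10})
client = Client()  # a scheduler started after this picks up the setting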

Glad the problem is identified here; I will close this issue, but feel free to raise something upstream in dask or distributed if you believe there is a problem in the task retry mechanism!

@louisabraham
Contributor Author

louisabraham commented Oct 6, 2018

Yes, you were right: the used memory grows a lot before the KeyError happens.
It seems that tpot tries PolynomialFeatures on data with 800 columns :)
I think the problem comes from tpot, but at the same time I don't understand how it is handled when executing locally.
EDIT: it is handled here; if an exception happens without dask, the score is float('-inf'), but nothing handles an error coming from Dask, and it seems error_score=float('-inf') doesn't work.

Is there a way to signal that there is a memory error? Not a message but an exception or a special return type.

A good test to identify those issues is:

grep Restarting dask-worker.err

In my case it shows some

Worker exceeded 95% memory budget. Restarting

and exactly 4 of them, so I think you were right about the three restarts.

What is funny is that it will only crash after the 4 attempts, while executing other tasks in the meantime.

Thank you a lot for your help!

@guillaumeeb
Member

A pleasure to help!

Is there a way to signal that there is a memory error? Not a message but an exception or a special return type.

You should try to ask this upstream in distributed; I imagine there has been some thought put into this behavior.

@louisabraham
Contributor Author

louisabraham commented Oct 6, 2018

Yes, I can confirm it happens as well with a LocalCluster, so it has nothing to do with dask-jobqueue!

@mrocklin
Member

mrocklin commented Oct 7, 2018 via email

@louisabraham
Contributor Author

louisabraham commented Oct 7, 2018

I think the problem comes from tpot, which doesn't handle the memory errors.

The problem might come from dask_ml.model_selection._search.build_graph, because the argument error_score=float('-inf') was passed but didn't seem to have an effect.

https://github.com/EpistasisLab/tpot/blob/507b45db01e8f88651f4ce8e03b607a5b50146f5/tpot/gp_deap.py#L443-L454

A reproducible example with tpot that runs fast enough involves restricting the operators to force it to run PolynomialFeatures.
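
Something like this might do it (a rough sketch using TPOT's config_dict parameter; the operator set and hyperparameter values are only illustrative):

from tpot import TPOTRegressor

# Restrict TPOT to PolynomialFeatures plus a single cheap regressor, so every
# pipeline blows up the feature count on a wide dataset.
tpot_config = {
    'sklearn.preprocessing.PolynomialFeatures': {
        'degree': [2],
        'include_bias': [False],
        'interaction_only': [False],
    },
    'sklearn.linear_model.Ridge': {
        'alpha': [1.0],
    },
}

reg = TPOTRegressor(generations=2, population_size=8, cv=3,
                    config_dict=tpot_config, use_dask=True,
                    verbosity=2, random_state=42)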

I should be able to produce an example that causes an error in dask_ml. Maybe I should open an issue there?

However, I think the problem comes from dask.distributed, as the error also happens with the joblib backend provided by dask (see EpistasisLab/tpot#779).
My guess is that the memory errors do not raise exceptions.

@louisabraham
Contributor Author

I think I need some help to produce a reproducible example that runs without a cluster.

I am not sure how to trigger a memory error on my laptop. I fear that it will either use the swap memory or restart the computer.

I produced an error with a LocalCluster on a notebook started with LSF bsub with a soft memory limit.

Maybe using ulimit will work?

@mrocklin
Member

mrocklin commented Oct 7, 2018 via email

@louisabraham
Contributor Author

I think that setting up a dask-worker with the --memory-limit option will do the trick.

ulimit doesn't work at all on macOS and doesn't effectively limit memory on Linux.
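
For the record, a rough sketch of the programmatic counterpart (assuming a recent dask.distributed, where LocalCluster takes the same per-worker memory_limit as dask-worker --memory-limit; the sizes are arbitrary, just large enough to blow past the limit):

import numpy as np
from dask.distributed import Client, LocalCluster

# One worker with a deliberately small memory limit...
cluster = LocalCluster(n_workers=1, threads_per_worker=1, memory_limit='512MB')
client = Client(cluster)

def allocate_too_much():
    # ...and a task that allocates ~2 GB of float64, far above the 512MB budget,
    # so the nanny restarts the worker ("Worker exceeded 95% memory budget.").
    return np.ones((2_000_000, 128)).sum()

future = client.submit(allocate_too_much)
future.result()  # expected to fail only after several worker restarts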
