KeyError in plugin.add_worker #117
It looks like you've got a memory issue. We should probably handle this case better though, and do some error catching somewhere around https://github.com/dask/dask-jobqueue/blob/master/dask_jobqueue/core.py#L63.
I agree. There is a good chance your persist call is running out of memory. However, if line 63 raises a KeyError we should handle it more gracefully (dask-jobqueue/dask_jobqueue/core.py, lines 52 to 66 at 392a927).
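For context, here is a minimal sketch (not the actual dask-jobqueue code) of the kind of error catching being discussed: tolerating a job id that is missing from the plugin's pending-jobs mapping instead of letting the KeyError propagate out of add_worker. The attribute names, the way the job id is parsed from the worker name, and the fallback behaviour are all assumptions for illustration.

```python
# Hedged sketch only: how a scheduler plugin's add_worker could catch the
# KeyError instead of crashing when a worker reports in for a job that is
# no longer (or was never) listed as pending.  All names are illustrative.
import logging

logger = logging.getLogger(__name__)


class JobQueuePluginSketch:
    def __init__(self):
        self.pending_jobs = {}  # job_id -> dict of workers expected for that job
        self.running_jobs = {}  # job_id -> dict of workers seen for that job

    def add_worker(self, scheduler, worker=None, name=None, **kwargs):
        """Called by the scheduler when a new worker connects."""
        w = scheduler.workers[worker]
        # assumption: the batch job id is encoded at the end of the worker name
        job_id = w.name.split("-")[-1]

        if job_id not in self.running_jobs:
            try:
                # first worker for this job: move it from pending to running
                self.running_jobs[job_id] = self.pending_jobs.pop(job_id)
            except KeyError:
                # job id unknown (e.g. the job was already cleaned up after its
                # workers were killed for exceeding memory); log instead of crashing
                logger.warning("add_worker received unknown job id %r", job_id)
                self.running_jobs[job_id] = {}

        self.running_jobs[job_id][w.name] = w
```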
Yes, I think it is a memory issue as well. I got it to work by making my test smaller (one year instead of ten) and making some minor modifications. I think it's going to be quite common for me to only have a few jobs running (the pegasus queue load varies week-to-week) and lots pending, so I can see this popping up quite a bit for us at UM.
Not sure if it is possible, but my assumption was that:
Okay, that makes me think we should add a
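The comment above is truncated, so it is not clear exactly which keyword was meant; given the memory-limit e-mail below, one plausible reading is a way to state an explicit per-job memory budget when building the cluster. A hedged sketch under that assumption, using the present-day LSFCluster keyword arguments (queue name, sizes, and walltime are placeholders, and the API at the time of this issue may have differed):

```python
# Hypothetical example, not a recommendation: give each LSF job an explicit
# memory budget so the dask workers stay under what the batch scheduler allows.
from dask_jobqueue import LSFCluster

cluster = LSFCluster(
    queue="general",   # placeholder: the queue mentioned in the report
    cores=2,           # CPUs per job
    memory="8GB",      # memory per job; also used as the worker memory limit
    walltime="01:00",
)
cluster.scale(10)      # request 10 such jobs
```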
FYI, an automatic e-mail came through today regarding the memory limit:

subject: Process killed - Compute node exceeded memory limit
rxb826: 18 process killing(s) noticed on pegasus2
Thanks!
I finally got to kick the tires of the LSFCluster today. I created https://github.com/raybellwaves/dask-jobqueue_test_lsf/blob/master/dask-jobqueue_test_lsf.ipynb, which is adapted from https://www.youtube.com/watch?v=nH_AQo8WdKw
The general queue was heavily loaded today, so even though I set off 50 workers, only ~2/3 were running (the others remained pending).
When doing
df = df.persist()
an error was raised. This error appeared multiple times:
You may be more familiar with this, @jhamman.
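For reference, a hedged sketch of the workflow described in this report (the dataset, paths, and sizes are placeholders; the linked notebook has the real details). The persist call is the step that appears to push the workers over their memory limit:

```python
# Hypothetical reconstruction of the reported workflow.  Assume `cluster` is an
# LSFCluster set up roughly as in the linked notebook, with 50 jobs requested.
from dask.distributed import Client
import dask.dataframe as dd

client = Client(cluster)         # connect to the cluster's scheduler

df = dd.read_csv("data/*.csv")   # placeholder input; the real test covered ten years of data
df = df.persist()                # loads the data into distributed worker memory,
                                 # which is where the memory errors were triggered
```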