Resource specification on GridEngine Clusters #195
Thanks for the report @ericmjl. Can you show us the job script that jobqueue is submitting? (http://jobqueue.dask.org/en/latest/debug.html#checking-job-script) Also, showing us how you're calling SGECluster and your Dask configuration would be useful in debugging.
Thanks @jhamman! This is the job script:
And this is how I call the SGECluster:

```python
cluster = SGECluster(queue='default.q',
                     walltime="259200",
                     processes=1,
                     memory='8GB',
                     cores=1,
                     env_extra=['activate mpnn'])
```

It's in line with how I first figured out how to make this work (without knowing about resource specs), and hence it's identical to the example in the docs, which I PR-ed in. However, when I later inspected the source code, it looked like at least the

As for a Dask configuration, I don't have any config files at the moment.
Indeed, it looks like if users don't specify

Something similar to what you propose is done in

Would you be interested in submitting a PR that does the same for SGE? It would be very welcome!
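For illustration only, a minimal pure-Python sketch of what such a PR might do: derive an SGE resource spec from the `memory` keyword. The helper name `sge_resource_spec` is hypothetical (not dask-jobqueue API), and the memory complex name (`h_vmem`, `m_mem_free`, etc.) is site-dependent; check with your sysadmin or `qconf -sc`.

```python
from math import ceil

def sge_resource_spec(memory, mem_key="h_vmem"):
    """Hypothetical helper: turn a dask-style memory string (e.g. '8GB')
    into an SGE resource specification such as 'h_vmem=8G'.

    Simplified sketch: treats GB as GiB and rounds up to whole gigabytes;
    `mem_key` is site-dependent.
    """
    units = {"KB": 1, "MB": 2, "GB": 3, "TB": 4}
    mem = memory.strip().upper()
    for suffix, power in units.items():
        if mem.endswith(suffix):
            n_bytes = float(mem[: -len(suffix)]) * 1024 ** power
            break
    else:
        n_bytes = float(mem)  # no recognized suffix: assume plain bytes
    gigabytes = ceil(n_bytes / 1024 ** 3)
    return f"{mem_key}={gigabytes}G"
```

The resulting string could then be passed through to `qsub` via a `-l` directive in the generated job script, so the scheduler actually reserves the memory the user asked for.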
Yes, definitely, @guillaumeeb! Happy to tackle this later in the day.
@jhamman @guillaumeeb having had a few more chances to use the SGECluster, I have seen a few more scenarios where I think some PRs may be necessary. I am planning to close #197, and open a new discussion for this.
So are you still thinking of a PR related to this issue about resource specification? Should this issue stay open? |
@guillaumeeb given the flexibility of grid engine systems (which I read as "quite complicated"), I think that for now, the easiest PR is the documentation PR #220. This is one of those few scenarios where I think more talking is needed up-front. Let's keep exploring with @lesteve what the best way forward is for GE-like clusters.
Yes, 100%!
Hi Dask team!
My colleague @sntgluca and I have been very enthusiastic about the possibilities enabled by `dask-jobqueue` at NIBR! It's been a very productivity-enhancing tool for me. At the same time, we found something that we think might be a bug, but would like to disprove/confirm this before potentially working on a PR to fix it.

Firstly, we found that when using the `memory` keyword argument, the `SGECluster` will show that X GB per worker node is allocated. However, the amount of RAM that is actually allocated, according to the queueing status screen, is only the default amount specified by the sysadmins. Here is some evidence that I collected from the logs on our machines.

Firstly, Dask's worker logs show that 8GB is allocated to them:

However, for the same job ID, when I used `qstat -j JOBID`:

As you can see, the resources granted were only 4GB of memory, not the 8GB requested, but the Dask worker logs show 8GB being allocated.

I have a hunch that this is a bug, but both @sntgluca and I think that our end users shouldn't have to worry about GridEngine resource spec strings, and should be able to use the very nice `SGECluster` API to set these parameters correctly. Looking at the `SGECluster` source code, it looks doable with a small-ish PR to parse the `memory` (and other kwargs) into the correct resource specification string, if this is something you would be open to.

Please let us know!