combining multiple job starts #12
Comments
Note - it's unclear, in retrospect, what makes these remote job starts slow. Need to investigate further before determining how to increase rate.
Looks like the staging in of files and ssh qsub each take a non-negligible time (around 1s). Both would need to be batched to fully help.
Is this really an issue? I guess you are already batching individual configs, so it won't be the case that you'd want to qsub 10,000 individual jobs (many queueing systems would choke as well).
It is when you have 1000 jobs (one per config, to re-evaluate an entire fitting database with tighter DFT params), and each one takes 3 seconds to start, because the rsync to stage in files takes 1.5 s and the ssh to qsub takes 1.5 s. That is about 50 minutes in total. I guess I could set
I have a solution for this, where ExPyRe, system, and scheduler can all be told to store information in a buffer, and then start all the jobs in the buffer at once (one ssh to set up the directories, one rsync to stage in the run dirs, and one ssh to submit all the jobs). A PR will be available eventually - it'd be useful if people tested the SGE implementation, which I do not have access to.
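For anyone following along, here is a rough sketch of what the three-step buffered start described above could look like. It is not the actual ExPyRe code: the class `JobStartBuffer`, its methods, the `job.sh` script name, and the `cluster` / `/scratch/runs` values are all hypothetical, and it assumes `ssh`, `rsync`, and `qsub` are available and that run directories have distinct base names.

```python
import os
import shlex
import subprocess


class JobStartBuffer:
    """Hypothetical buffer: collect job starts, then flush them in three remote calls."""

    def __init__(self, host, remote_root, run=subprocess.run):
        self.host = host
        self.remote_root = remote_root
        self.run = run      # injectable, so the sketch can be dry-run without ssh
        self.jobs = []      # list of (local_run_dir, job_script_name)

    def add(self, local_dir, job_script="job.sh"):
        """Record a job instead of starting it immediately."""
        self.jobs.append((local_dir.rstrip("/"), job_script))

    def commands(self):
        """Return the three argv lists that a flush would execute."""
        root = self.remote_root
        # 1. one ssh to set up the remote parent directory
        mkdir = ["ssh", self.host, "mkdir -p " + shlex.quote(root)]
        # 2. one rsync to stage in every run dir (no trailing slash: copy the dir itself)
        local_dirs = [local for local, _ in self.jobs]
        stage = ["rsync", "-a", *local_dirs, f"{self.host}:{root}/"]
        # 3. one ssh to submit every job from inside its own run dir
        submits = [
            "(cd {} && qsub {})".format(
                shlex.quote(f"{root}/{os.path.basename(local)}"), shlex.quote(script)
            )
            for local, script in self.jobs
        ]
        submit = ["ssh", self.host, " && ".join(submits)]
        return [mkdir, stage, submit]

    def flush(self):
        """Start all buffered jobs, then empty the buffer."""
        if not self.jobs:
            return
        for argv in self.commands():
            self.run(argv, check=True)
        self.jobs = []
```

With this shape, 1000 buffered jobs cost three remote round trips rather than two per job. Joining the submits with `&&` stops at the first failed `qsub`; whether that or "submit as many as possible" is the better behavior is a design choice this sketch does not settle.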
On some remote machines just the ssh connection is somewhat slow. It would be nice if multiple job start commands could be combined, perhaps by gathering all the remote commands into an array of strings, and then running all of them in a single ssh connection.
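As a minimal illustration of the idea, the remote commands can be gathered into a list of strings and piped as one script to a single remote shell. The helper names `build_remote_script` and `run_batched` are made up for this sketch, and it assumes the remote account has a POSIX `sh`.

```python
import subprocess


def build_remote_script(commands):
    """Join commands into one shell script; 'set -e' aborts at the first failure."""
    return "set -e\n" + "\n".join(commands) + "\n"


def run_batched(host, commands, run=subprocess.run):
    """Run all commands over a single ssh connection by piping the script to 'sh -s'."""
    script = build_remote_script(commands)
    return run(["ssh", host, "sh", "-s"], input=script, text=True, check=True)
```

Usage would be something like `run_batched("cluster", ["mkdir -p run_0001", "cd run_0001", "qsub job.sh"])`, which pays the ssh connection cost once instead of once per command. One caveat: because the script arrives on stdin, any remote command that itself reads stdin could swallow the rest of the script, so such commands would need their input redirected.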