
WIP: Add sub dax #64

Open

wants to merge 4 commits into master
Conversation

@Cyberface
Contributor

Some of us think it would be good to have the option to both make a gwin workflow and submit the dax to a cluster with a single command. This PR attempts to do that; however, I don't think it's entirely suitable in its current form.

I would like some feedback from @spxiwh.

I found that the only way I could get this to work was if I used the shell=True option of make_external_call.

I also added the fail_on_error=False flag to make_external_call: when I ran gwin_make_workflow with the new --submit-dax flag, the code ran to completion and the dax was submitted successfully, but make_external_call still returned an error code for some reason.
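For concreteness, the call I ended up with looks roughly like this (a sketch, not the exact diff; the --dax flag and the exact make_external_call keyword arguments are my assumptions here):

```python
# Rough shape of the --submit-dax hook (illustrative, not the exact diff).
from pycbc.workflow.core import make_external_call

def submit_dax(dax_path, out_dir='submit_dax_logs'):
    # pycbc_submit_dax is a bash script, which is why (so far) this only
    # works with shell=True; fail_on_error=False papers over the spurious
    # non-zero exit code described above.
    make_external_call('pycbc_submit_dax --dax %s' % dax_path,
                       out_dir=out_dir,
                       out_basename='pycbc_submit_dax_call',
                       shell=True,
                       fail_on_error=False)
```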

Am I missing some tricks that would make this work more cleanly? I would like to add some checks, or a try/except around the dax submission, to catch any issues.

Perhaps we don't want to go down this route but I'd like to start the discussion on it here.

@coveralls

Pull Request Test Coverage Report for Build 232

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 41.589%

Totals (Coverage Status)
  • Change from base Build 218: 0.0%
  • Covered Lines: 1110
  • Relevant Lines: 2669

💛 - Coveralls

@spxiwh
Contributor

spxiwh commented Jul 23, 2018

Hi @Cyberface. Personally, I would prefer not to add this functionality. In the current PE workflow.py model, the creation of the dagman (in lalinference_pipe) has basically already specified all options, parameters, etc. There are no options when submitting the dag; you just do it.

In the pycbc.workflow model, workflow creation just creates the .dax file, which specifies what commands should be run, and in what order. Options specific to transforming the .dax file into a .dag file (or even a Slurm workflow, or something else) are specified during the "planning" and "submission" stages, which we've put together in pycbc_submit_dax (this includes things like the reuse cache, where to run the jobs, clustering jobs together into one condor node, and probably some other things that we've never used before). If we hide this within workflow generation, people may not be aware that this distinction exists, and it might be harder for people who want to make changes that should happen at the pycbc_submit_dax level (the "planning" level, to use pegasus terminology). Maybe this is less important for PE than for search applications (as the workflow is much simpler), but I think it isn't yet clear exactly what the PE workflow will look like (do we need the parallel universe, for instance?), which may also be relevant.

That aside, if we do want to proceed with this, you raise two points:

  • Be careful when setting shell=True! There are lots of well-written articles about when not to use it. The basic idea is that if you set WORKFLOW_NAME=sebs_silly_workflow; rm -rf ~sebastian.khan; echo LOL; you're going to be in trouble! (See the sketch after this list.)
  • pycbc_submit_dax (which is a bash script, EURGH!) seems to be returning exit code 1. It probably returns this after it has already submitted the dax. You'd probably have to check the log files and stdout and stderr (all of which get written to file by make_external_call) to figure out why this is happening. Some changes may be needed in this code to distinguish between a failure that prevented dax submission (raise an error) and a failure that only prevented some help text from being printed (don't return a non-zero exit code).
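To make the shell=True danger concrete, here's a minimal illustration using plain subprocess (make_external_call with shell=True behaves the same way underneath; the injected command is a harmless echo here):

```python
# Minimal demo of why shell=True plus untrusted input is dangerous.
import subprocess

# Imagine this value came from a config file or the command line:
workflow_name = 'sebs_silly_workflow; echo INJECTED'

# DANGEROUS: the whole string goes through /bin/sh, so the ';' splits it
# into two commands and 'echo INJECTED' actually runs.
subprocess.call('echo submitting %s' % workflow_name, shell=True)

# SAFE: with an argv list (shell=False), the value is passed as a single
# argument and ';' is just an ordinary character in it.
subprocess.call(['echo', 'submitting', workflow_name])
```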

@Cyberface
Contributor Author

@spxiwh Thanks for looking at this.

lalinference_pipe has the option --submit-condor which is what I'm trying to implement here.
The user will always have the option to make the PE workflow and not submit the dax automatically. Then they can always do fancier things when submitting the dax themselves.
Having said that, some of those fancier things could be turned into options in the PE workflow.

Side note:
Why is pycbc_submit_dax a bash script? Couldn't it be turned into a Python script?
It would be nice to be able to do import pycbc_submit_dax_command_line_options.
pycbc_submit_dax is a MONSTER. I have no idea how somebody wrote that.

On point 1:
I know that shell=True isn't good and would prefer not to have it. However, for whatever reason, I simply could not get Python to execute pycbc_submit_dax without setting shell=True, but I will play around with it more to try to get it to work.

On point 2:
Ah good point - I'll have a look at the error codes.

This is still WIP and also if there are alternative options I'd be happy to explore those.

Perhaps make_workflow could simply write a bash script at the end, save it to disk, and then call that bash script from python.
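Something along these lines, perhaps (a sketch; the script name and the --dax flag are assumptions on my part):

```python
# Sketch of the "write a submit script to disk, then call it" idea.
import os
import stat
import subprocess

def write_and_run_submit_script(dax_path, script_path='submit_dax.sh'):
    with open(script_path, 'w') as f:
        f.write('#!/bin/bash\n')
        f.write('set -e\n')  # stop at the first real failure
        f.write('pycbc_submit_dax --dax %s\n' % dax_path)
    # Leave the script executable so the user can rerun it by hand later.
    os.chmod(script_path, os.stat(script_path).st_mode | stat.S_IXUSR)
    subprocess.check_call(['bash', script_path])
```

This would also leave a record on disk of exactly what was run, which might help with the debugging problem above.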

@spxiwh
Contributor

spxiwh commented Jul 23, 2018

Hi @Cyberface. Yeah, I know that lalinference_pipe has a submit-me option, which makes sense in that case, as the only thing you can do with the generated dag file is condor_submit_dag. I think my point really boils down to: you're not rewriting lalinference_pipe, you're writing a new code. It isn't necessarily clear that what was better in the past is still better now, and I think it's worth at least understanding what sort of model we are going to have for PE workflows, and whether people will use a "standard" submit option 95% of the time. I think it's important that people really treat the pegasus model as a different model, and not just try to use this the way pipeline.py was used in the past. Anyway, I won't object further to adding this option (I certainly understand why you want it), but my objection stands.

To address the technical points. Yes, pycbc_submit_dax is a monster bash script. Duncan B likes monster bash scripts. If this could be rewritten in python it would be great (dibs not me). Be aware though that this will necessarily make a number of calls to non-python pegasus code.

The fact that this is a monster bash script is probably the reason it won't work with shell=False. But if you do use shell=True, maybe do some sanity checking that anything provided on the command line that will be passed to the script contains only lowercase letters, uppercase letters, or numbers (no spaces, no commas, no ....), as in the sketch below.
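Something like this, maybe (a sketch; the allowed character set is a first guess and deliberately allows a few path-ish characters beyond strict alphanumerics):

```python
# Sanity check before interpolating user-provided values into a shell
# command: allow only alphanumerics plus '.', '_', '/' and '-'.
import re

_SAFE_ARG = re.compile(r'^[A-Za-z0-9._/-]+$')

def check_shell_safe(args):
    for arg in args:
        if not _SAFE_ARG.match(arg):
            raise ValueError('refusing to pass %r through a shell' % (arg,))
    return args

# e.g. check_shell_safe([dax_path]) before building the
# pycbc_submit_dax command string.
```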

As make_external_call just runs this as a bash command, simply printing the command to file (or to screen!) might also work. You could then also highlight that this often isn't just copy-and-paste-able!
