Single batch job for multiple tasks #2754

Open
matthewrmshin opened this issue Aug 15, 2018 · 8 comments
Labels
efficiency For notable efficiency improvements

Comments

@matthewrmshin
Contributor

matthewrmshin commented Aug 15, 2018

We need the ability to group a number of related small tasks together to run as a single batch job, while still having the suite manage the tasks as separate entities.

Quick points:

  • Redesign the suite log/job/ file system to handle this.
  • Allow users to configure job run time independently of tasks, with tasks able to subscribe themselves to jobs.
  • Allow grouping of independent but similar tasks (e.g. a parallel group of tasks, each of which requires a similar amount of resources), as well as grouping of tasks in a sub-graph.
  • Change the logic and API to remove the one-task-to-many-jobs assumption (i.e. that each job belongs to exactly one task). For example, what do we do if we want to poll, kill or trigger a task/job?
  • Handle retries when only some of the tasks in a job fail.
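To make the last two points concrete, here is a minimal, hypothetical data model (not Cylc code) in which one batch job carries several tasks, each keeping its own status, and a retry resubmits only the tasks that failed. `BatchJob`, `make_retry_job` and the reuse of the job ID on retry are illustrative assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    WAITING = "waiting"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class BatchJob:
    """One submission of a batch job that runs several tasks (hypothetical model)."""
    job_id: str
    submit_num: int
    # Per-task status inside the job, so the suite can still manage
    # each task as a separate entity.
    task_status: dict = field(default_factory=dict)

    def set_status(self, task: str, status: Status) -> None:
        if task not in self.task_status:
            raise KeyError(f"{task} is not part of job {self.job_id}")
        self.task_status[task] = status

    def failed_tasks(self) -> list:
        return sorted(t for t, s in self.task_status.items() if s is Status.FAILED)


def make_retry_job(job: BatchJob) -> BatchJob:
    """Resubmit only the tasks that failed, leaving succeeded tasks alone."""
    return BatchJob(
        job_id=job.job_id,
        submit_num=job.submit_num + 1,
        task_status={t: Status.WAITING for t in job.failed_tasks()},
    )


job = BatchJob("job-1", 1, {t: Status.RUNNING for t in ("a", "b", "c")})
job.set_status("a", Status.SUCCEEDED)
job.set_status("b", Status.FAILED)
job.set_status("c", Status.SUCCEEDED)
retry = make_retry_job(job)
print(retry.submit_num, list(retry.task_status))  # 2 ['b']
```

Polling or killing such a job would then act on every task in `task_status`, which is exactly where the current one-job-per-task logic would need to change.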

See also:

@matthewrmshin matthewrmshin added this to the later milestone Aug 15, 2018
@matthewrmshin matthewrmshin added the efficiency For notable efficiency improvements label Aug 15, 2018
@oliver-sanders
Member

@dpmatthews
Contributor

@dwsutherland
Member

dwsutherland commented Nov 29, 2018

Perhaps families could gain the ability to run like tasks, subsuming the (possibly modified) job scripts of their children into a single job, integrated into a batch job in accordance with the batch system.

A family job could optionally be specified (single/batch, default None) via:

    [[BAR]]
        [[[family]]]
            job type  = batch

(I'm not entirely sure what advantage a family job running its task jobs in single mode, as a suite normally does, would have.)
Then, of course,

foo => BAR

could, together with the task-job relationship, mean the same as it currently does (preserving the behavior of family expansion and sparing the Family object from needing its own prereqs). However, it introduces a new pragmatic facet to the concept of a family (a family-job relationship, not just inheritance).

Submit numbers could be accumulated by the family, and also individually by the child tasks, as they already are.

For consistency, all families should be represented in the same new way in code, but only families with the batch/single option would instantiate and use an associated job object, job file and logs, and receive status signals.

Dependencies between children would be ignored (with a warning issued on run/restart/reload?), because a batch job differs from a sub-suite: the former is managed by the batch system.
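A rough, hypothetical sketch of the proposal: combine the scripts of a family's children into one batch job script and warn about (then ignore) any dependencies between children. `build_family_job`, its inputs and the script layout are illustrative assumptions, not Cylc's job file format.

```python
import warnings


def build_family_job(family, children, graph_edges):
    """Combine a family's child task scripts into one batch job script.

    children: mapping of child task name -> script text (hypothetical input).
    graph_edges: iterable of (upstream, downstream) task-name pairs.
    """
    members = set(children)
    internal = [(u, d) for u, d in graph_edges if u in members and d in members]
    if internal:
        # Dependencies between children are ignored (with a warning):
        # the batch system, not the suite, runs this job.
        warnings.warn(f"{family}: ignoring dependencies between children: {internal}")
    lines = ["#!/bin/bash", f"# Batch job for family {family}"]
    for name in sorted(children):
        # Run each child in a subshell so an `exit` in one child's script
        # does not end the whole batch job.
        lines += [f"# --- task {name} ---", "(", children[name], ")"]
    return "\n".join(lines) + "\n"


script = build_family_job(
    "BAR", {"bar1": "echo one", "bar2": "echo two"}, [("foo", "bar1")]
)
print(script)
```

Here `foo => bar1` crosses the family boundary, so no warning is raised; an edge such as `bar1 => bar2` would be reported and dropped.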

@dwsutherland
Member

I suppose this family job could also include child family jobs (i.e. be recursive).
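One simple, hypothetical way to handle nesting is to flatten a family job recursively into the tasks it would run, rejecting cyclic nesting. The `members` structure and `flatten_family` are assumptions made for illustration only.

```python
def flatten_family(name, members, _seen=frozenset()):
    """Return the tasks run by family job `name`, expanding nested family jobs.

    members: mapping of family name -> list of members, where a member is
    either a task name or another family name (hypothetical structure).
    """
    if name in _seen:
        raise ValueError(f"cyclic family nesting at {name}")
    tasks = []
    for member in members.get(name, []):
        if member in members:  # the member is itself a family job: recurse
            tasks.extend(flatten_family(member, members, _seen | {name}))
        else:
            tasks.append(member)
    return tasks


families = {"BAR": ["bar1", "BAZ"], "BAZ": ["baz1", "baz2"]}
print(flatten_family("BAR", families))  # ['bar1', 'baz1', 'baz2']
```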

@matthewrmshin
Copy link
Contributor Author

Hi @dwsutherland, thank you for the contribution.

Ultimately, I want a flexible system where tasks in a sub-graph (which may involve multiple subsets of multiple families) can be grouped into a single batch job that can run in a worker pool and still have the dependencies within the sub-graph honoured.

The problem is not very difficult to solve; it is very much like how we would run a build system on interdependent source files with a worker pool. However, it will require some agreement on how users will configure it in their suite configuration. Overloading the run time family setting may be one way of doing this, but it may also restrict us.
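The build-system analogy can be sketched as a dependency-aware worker pool, roughly how `make -j N` schedules targets. This is a minimal, hypothetical sketch using the standard library's `concurrent.futures`; `run_subgraph` and its arguments are illustrative names, not Cylc API, and reporting each task's status back to the suite is left out.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_subgraph(tasks, deps, max_workers=4):
    """Run callables in a worker pool, honouring dependencies between them.

    tasks: mapping task name -> zero-argument callable.
    deps: mapping task name -> set of upstream task names (within the sub-graph).
    Returns (succeeded names in completion order, failed names, unrun names).
    """
    remaining = {name: set(deps.get(name, ())) for name in tasks}
    done, failed, order = set(), set(), []
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining or running:
            # Submit every task whose upstream tasks have all succeeded.
            ready = [n for n, up in remaining.items() if up <= done]
            for name in ready:
                del remaining[name]
                running[pool.submit(tasks[name])] = name
            if not running:
                break  # nothing runnable: blocked by failures (or a cycle)
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in finished:
                name = running.pop(fut)
                if fut.exception() is None:
                    done.add(name)
                    order.append(name)
                else:
                    failed.add(name)  # downstream tasks will never become ready
    return order, failed, set(remaining)


if __name__ == "__main__":
    deps = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
    demo = {n: (lambda n=n: print("running", n)) for n in "abcd"}
    print(run_subgraph(demo, deps, max_workers=2))
```

A failed task simply leaves its downstream tasks unrun, which is one possible policy; retrying the failed subset would be a separate decision.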

@dwsutherland
Member

dwsutherland commented Nov 29, 2018

> Ultimately, I want a flexible system where tasks in a sub-graph (which may involve multiple subsets of multiple families) can be grouped into a single batch job that can run in a worker pool and still have the dependencies within the sub-graph honoured.

@matthewrmshin Yes, that would be ideal. There would be a number of ways to "skin this cat"; it would be best to avoid building the sub-graph dependencies into the batch job, and I suppose DB reads (polling style) are out of the question.

> The problem is not very difficult to solve - it is very much how we would run a build system on interdependent source files with a worker pool. However, it will require some agreement on how users will configure it in their suite configuration. Overloading the run time family setting may be one way of doing this, but may also restrict us as well.

Perhaps you could school me: how do a couple of lines (potentially one) per batch family overload the run time? (I can't imagine it would be declared that frequently, even in suites that use it.)

@matthewrmshin
Contributor Author

Hi @dwsutherland, I'm not sure why you are not keen on sub-graph dependencies in a batch job. We happen to have some good experience writing simple, efficient dependency manager + worker pool logic for running in batch jobs; Rose file installation and the fcm make build system are some recent examples. We can combine this with Cylc messaging, etc. Alternatively, we could consider using a modern framework like Dask, which has built-in DAG management and the capability to run logic in parallel.

When I said overload, I meant it in the sense of operator overloading: e.g. Python overloads the + operator so you can use it to add two numbers or concatenate two strings. Given the original purpose of the run time family logic and the way it is implemented, I would certainly consider the usage described as overloading its interface.
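For readers less familiar with the term, a small Python illustration of operator overloading: the same `+` means different things depending on the operand types, and a class can give it yet another meaning via `__add__`. `Duration` is a made-up example class.

```python
class Duration:
    """Toy class that overloads + for durations (illustrative only)."""

    def __init__(self, seconds):
        self.seconds = seconds

    def __add__(self, other):
        return Duration(self.seconds + other.seconds)


print(1 + 2)                                  # 3: numeric addition
print("ab" + "cd")                            # abcd: string concatenation
print((Duration(30) + Duration(90)).seconds)  # 120: a user-defined meaning
```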

I am not saying that we cannot do it this way, but my understanding of the problem tells me that it may restrict what we can do with the logic and how users may have to interact with the system. I would like to explore other possibilities and ideas before drawing any conclusions. My current feeling is that we should have much better separation between batch job settings (job submission and management, which are often site specific and less portable) and actual run time settings (the actual logic of the task, which is less site specific and more likely to be portable).
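One hypothetical way to picture this separation, combined with the earlier idea of tasks subscribing themselves to jobs: site-specific job settings live in one object, portable task logic in another, and moving to a new site swaps only the former. All class names, fields and directive strings below are assumptions for illustration, not a proposed Cylc schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class JobPlatform:
    """Site-specific batch job settings (hypothetical fields)."""
    batch_system: str
    directives: tuple = ()


@dataclass(frozen=True)
class TaskRuntime:
    """Portable task logic: what to run, not how it is submitted."""
    name: str
    script: str


@dataclass
class JobGroup:
    """A batch job definition that tasks subscribe themselves to."""
    name: str
    platform: JobPlatform
    tasks: list = field(default_factory=list)

    def subscribe(self, task: TaskRuntime) -> None:
        self.tasks.append(task)


def move_to(group: JobGroup, platform: JobPlatform) -> JobGroup:
    """Re-target a job group at another site; the task logic is untouched."""
    return JobGroup(group.name, platform, list(group.tasks))


group = JobGroup("small-tasks", JobPlatform("pbs", ("-l walltime=00:10:00",)))
for name in ("a", "b"):
    group.subscribe(TaskRuntime(name, f"echo {name}"))
ported = move_to(group, JobPlatform("slurm", ("--time=00:10:00",)))
```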

Finally, we have many users whose suites will use this feature heavily, so I am keen to explore different possibilities to get the user interface right.

@dwsutherland
Member

dwsutherland commented Nov 30, 2018

@matthewrmshin Thanks for the explanation. Also, I'm not against sub-graph dependencies; I was just speculating on their desirability from a place of ignorance (apologies; it would have been better phrased as a question about having them built into the job script) 😉

@matthewrmshin matthewrmshin modified the milestones: later, cylc-9 Aug 28, 2019
Projects
None yet
Development

No branches or pull requests

4 participants