More efficient/sophisticated work scheduler #1057

Closed
BoPeng opened this issue Sep 21, 2018 · 3 comments

Comments


BoPeng commented Sep 21, 2018

Although sos and ipyparallel have quite different design ideas, the ipyparallel scheduler, especially its zmq connections, is very interesting to us because we are migrating from native Python multiprocessing to zmq (a minimal sketch of such a zmq worker pattern follows the list below). Since ipyparallel has done a lot of work and testing to achieve better performance, perhaps we should study (really hard) what they have done. Note, however, that:

  1. sos spawns the task engine, DAG executor, and workers locally, so there is no need for users to start them.
  2. sos does not allow remote workers (clusters etc.); these are handled through external tasks.
  3. The SoS DAG is very complex because of the flexible multi-style design, concurrent substeps, external tasks, nested workflows, etc. It is therefore very difficult to improve the performance of SoS.
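
As a concrete illustration of the zmq migration mentioned above, here is a minimal sketch of a PUSH/PULL pattern for handing work to locally spawned workers. This is purely illustrative, not SoS or ipyparallel code: the endpoint, the message format, and the placeholder step execution are all made up for the example.

```python
import multiprocessing
import zmq

ENDPOINT = "tcp://127.0.0.1:5557"  # hypothetical local endpoint

def worker():
    """Pull tasks from the scheduler until a stop message arrives."""
    ctx = zmq.Context()
    pull = ctx.socket(zmq.PULL)
    pull.connect(ENDPOINT)
    while True:
        task = pull.recv_json()                  # blocks until work is sent
        if task.get("stop"):
            break
        print("running substep", task["step"])   # placeholder for real execution

def scheduler(tasks, n_workers=4):
    ctx = zmq.Context()
    push = ctx.socket(zmq.PUSH)
    push.bind(ENDPOINT)
    procs = [multiprocessing.Process(target=worker) for _ in range(n_workers)]
    for p in procs:
        p.start()
    for task in tasks:
        push.send_json(task)  # PUSH fair-queues messages across connected workers
    for _ in procs:
        # one stop message per worker; assumes round-robin delivery
        push.send_json({"stop": True})
    for p in procs:
        p.join()

if __name__ == "__main__":
    scheduler([{"step": i} for i in range(10)])
```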

BoPeng commented Sep 21, 2018

My understanding of the "performance" of workflows can be summarized as follows:

  1. The DAG is limited in complexity, at most a few hundred nodes:
    • The performance of interpreting the DAG and finding the right steps to execute might not matter much.
    • Idle workers (idle processes during execution #1056): performance can be bad if we have idle processes that prevent the full-speed execution of steps.
    • Load balancing: ???
  2. The biggest time-consuming parts should be the steps themselves, including:
    • Large steps or tasks: the performance of the interpreter does not matter.
    • Many substeps: we are using pool + async map, which is as fast as it can be (see the sketch after this list). The performance of signature checking matters.
    • Many tasks: this can be a problem because of the overhead of saving, executing, and monitoring external tasks. However, tasks are not recommended for local execution, and users are advised to use substeps in these cases.
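
For reference, the "pool + async map" pattern mentioned above amounts to something like the following minimal sketch. The `run_substep` body and the worker count are hypothetical placeholders; the real SoS code also performs signature checking, which is not shown here.

```python
from multiprocessing import Pool

def run_substep(item):
    # hypothetical per-substep work; in SoS this would also check the
    # substep's signature to decide whether execution can be skipped
    return item * item

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        result = pool.map_async(run_substep, range(100))  # dispatch without blocking
        # ... the scheduler could do other work here ...
        print(result.get())  # block until all substeps have finished
```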

So in the end I am not quite sure how much performance improvement we can achieve by optimizing the SoS execution engine.


gaow commented Sep 21, 2018

Sorry, but I am not sure I appreciate the big picture here. The goal is to "optimize the SoS execution engine" by borrowing from ipyparallel and extending it to fit the complicated DAG pattern of SoS, right? Then my understanding is that there are major limitations in the current SoS execution engine that we'd like to overcome. Here you have named #1056, which seems to be the only problem relevant to the execution pattern itself, but that ticket already proposed a solution that does not involve ipyparallel. So I'm not sure what exactly we would like to adopt ipyparallel for.


BoPeng commented Sep 21, 2018

No. I am just checking how others are using zmq in a task-execution setting, and ipyparallel seems to be a mature tool that fits the bill and may be more robust and/or efficient; at least they said that their current model is much more efficient than the model they used before. Whether or not we are suffering from the same set of problems they had is currently uncertain.

As I said in another thread, I will not adopt ipyparallel unless something seriously wrong with the SoS DAG execution engine is found.

BoPeng closed this as completed Feb 28, 2019