Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

engine: replace workerpool #412

Closed
josephjclark opened this issue Oct 27, 2023 · 3 comments · Fixed by #547
Closed

engine: replace workerpool #412

josephjclark opened this issue Oct 27, 2023 · 3 comments · Fixed by #547

Comments

@josephjclark
Copy link
Collaborator

josephjclark commented Oct 27, 2023

Workerpool is not going to fly for us because:

  • We don't have two-way communication with the worker, so we cannot lazy load credentials and dataclips
  • We do not have control over purging specific threads to clean memory leaks or isolate modules
  • A future worry but we have no capacity to re-use a specific thread, like re-running a workflow in the same thread twice to reduce module initialisation time. IF we can safely isolate modules (even in the sandbox), a warm thread/environment is a great optimisation
  • Running two workflows in the same pooled thread is insecure (and may be unstable, depending on module initialisation)
  • If a worker blows up, the parent thread - the whole engine! - also dies

We need to investigate threads and piscina (or whatever), or maybe other libraries, so that we get the control we need.

We need to consider performance, stability and security. I'd like to have some kind of benchmark for this (a benchmark integration test would be nice - run the same 10 workflows 50 times and measure the total time).

@josephjclark
Copy link
Collaborator Author

Some performance stats with workerpool:

Workers # attempts duration max job memory (mb) max system memory (mb)
50 100 4.555 23 1612
20 100 3.79 22 979
5 (default) 100 3.295 23 452
1 100 41.377 19 362

This is based on 0336d31

@josephjclark
Copy link
Collaborator Author

It's becoming increasingly apparent that workerpool or worker threads won't work for us. It is not secure or stable enough.

I've re-estimated (probably naively) based on this plan:

  • Add a quick-and-dirty "child_process" pool implementation
  • Compare benchmarks of a chlid-process pool vs a worker pool.
  • Unless the benchmarks are really really good (and I doubt it!), implement a pool of long-running child processes which spin off worker threads for each attempt. This balances isolation, stability and performance.

@josephjclark
Copy link
Collaborator Author

So the plan is:

  1. A quick implementation of a fire-and-destroy child process runner in the engine, with benchmarks
  2. We expect that to run a bit slowly so we're likely to go straight ahead and implement this model:

Untitled-2023-12-05-1318

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

1 participant