
Add workflow invocation grabbing with db-skip-locked #10177

Merged: 3 commits merged into galaxyproject:dev on Sep 22, 2020

Conversation

@mvdbeek (Member) commented Sep 2, 2020

Adds workflow invocation grabbing with db-skip-locked and db-transaction-isolation.
Closes #8209.

Needs some tests and the grabbing logic should be its own class that can be shared with the job grabber.
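
For context: the db-skip-locked method relies on the database's SELECT … FOR UPDATE SKIP LOCKED support, letting each handler atomically claim unassigned invocations without blocking on rows another handler's transaction has already locked. A minimal illustrative sketch of such a grab in SQL follows; the column names and values are assumptions based on the workflow_invocation table shown later in this thread, not the actual Galaxy query:

BEGIN;
-- Claim up to 10 unassigned invocations for this handler. Rows locked by a
-- concurrent handler's identical query are skipped rather than waited on,
-- so multiple handlers can grab work safely in parallel.
UPDATE workflow_invocation
SET handler = 'handler0'              -- assumed: this handler's server name
WHERE id IN (
    SELECT id
    FROM workflow_invocation
    WHERE handler = '_default_'       -- assumed: not yet claimed
      AND state = 'new'
    ORDER BY id
    LIMIT 10
    FOR UPDATE SKIP LOCKED
);
COMMIT;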

@galaxybot added this to the 20.09 milestone on Sep 2, 2020
@mvdbeek force-pushed the db_skip_locked branch 6 times, most recently from e142050 to 7f95da9, on September 2, 2020 18:19
@mvdbeek marked this pull request as ready for review on September 2, 2020 18:25
@mvdbeek requested a review from natefoo on September 2, 2020 18:25
@natefoo (Member) commented Sep 3, 2020

I had grabbable workflow scheduler assignment working back when I added it for jobs, and was told by @jmchilton not to enable it because workflow invocations would then interleave outputs when run in a single history. My recollection is that you could enable it by explicit configuration in workflow_schedulers_conf.xml; was that not working?

@mvdbeek (Member, Author) commented Sep 3, 2020

I don't think it was ever fully implemented; that's what #8209 is about.

history_local_serial_workflow_scheduling is optional and not on by default; I don't think that should prevent deployers from using db-skip-locked.
I also think history_local_serial_workflow_scheduling probably still works?
The logic for this is at https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/workflow/scheduling_manager.py#L306

That said, I think the history_local_serial_workflow_scheduling logic may be incompatible with subworkflows, where an intermediate invocation output is required to finish scheduling of an outer step, but I guess I am getting off-topic there.

@natefoo (Member) commented Sep 3, 2020

There is a bunch of effort to override db-skip-locked in a variety of handler assignment scenarios unless the admin explicitly sets it in the workflow schedulers config; it looks like you didn't touch that (although changing incompatible methods might have an effect on it)?

@mvdbeek (Member, Author) commented Sep 3, 2020

Yeah, I haven't checked whether just setting db-skip-locked in the job handlers works ... I guess not.

@natefoo (Member) commented Sep 3, 2020

I guess that override should probably follow the value of history_local_serial_workflow_scheduling? Not sure if I missed that option or if it didn't exist at the time.

@mvdbeek (Member, Author) commented Sep 18, 2020

> There is a bunch of effort to override db-skip-locked in a variety of handler assignment scenarios unless the admin explicitly sets it in the workflow schedulers config; it looks like you didn't touch that (although changing incompatible methods might have an effect on it)?

> Yeah, I haven't checked whether just setting db-skip-locked in the job handlers works ... I guess not.

Works fine; I've added a test case for this.

> I guess that override should probably follow the value of history_local_serial_workflow_scheduling? Not sure if I missed that option or if it didn't exist at the time.

Not sure I understand this. Do you agree with me that history_local_serial_workflow_scheduling works regardless of the workflow scheduling method? I'd add a test, but it seems a bit tricky to make sure everything happens serially (not impossible, though, if you insist). What probably doesn't work is parallelize_workflow_scheduling_within_histories: false (which is the default) ... I am still trying to wrap my head around this, but we could update the grabbing query to filter out invocations within histories that already have another active invocation scheduled by another handler (see the sketch after this comment).

But all these concerns also apply to standalone workflow schedulers, so maybe that can be a follow-up?
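
A rough sketch of what that filtered grabbing query could look like in SQL; the column names, states, and overall shape are assumptions for illustration, not the actual implementation:

-- Hypothetical: only grab invocations whose history has no other active
-- invocation already claimed by a (possibly different) handler.
SELECT id
FROM workflow_invocation wi
WHERE wi.handler = '_default_'                  -- assumed: not yet claimed
  AND wi.state = 'new'
  AND NOT EXISTS (
      SELECT 1
      FROM workflow_invocation other
      WHERE other.history_id = wi.history_id
        AND other.id != wi.id
        AND other.handler != '_default_'        -- already grabbed
        AND other.state IN ('new', 'ready')     -- assumed "active" states
  )
FOR UPDATE SKIP LOCKED;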

@natefoo (Member) commented Sep 21, 2020

👍

I think this is all ok and that in-history serialization is being addressed via other means?

@mvdbeek (Member, Author) commented Sep 21, 2020

That's my thinking!

@innovate-invent (Contributor) commented:

Can this be backported to 20.05?

@dannon (Member) commented Sep 22, 2020

@innovate-invent I'll leave that up to Marius; going to go ahead and get this into the dev branch, though.

@dannon merged commit 1a4052b into galaxyproject:dev on Sep 22, 2020
@mvdbeek (Member, Author) commented Sep 22, 2020

I don't think we'd want to make these large-ish changes with a couple of different consequences (see the discussion about serial workflow scheduling above) to an existing release. We're hoping to get 20.09 out in 2 weeks though, so this shouldn't be far away from appearing in a stable release.

@innovate-invent (Contributor) commented:

2 weeks sounds great! Thanks!

@innovate-invent (Contributor) commented:

This PR does not seem to work. Invocations are not being grabbed.

<handlers assign_with="db-skip-locked"></handlers>
galaxy.jobs DEBUG 2020-10-06 16:53:19,747 Loaded job runner 'galaxy.jobs.runners.kubernetes:KubernetesJobRunner' as 'k8s'
galaxy.jobs.handler DEBUG 2020-10-06 16:53:19,748 Loaded job runners plugins: local:k8s
galaxy.jobs.handler INFO 2020-10-06 16:53:19,753 Handler job grabber initialized with 'db-skip-locked' assignment method for handler 'galaxy-worker-99cd6f84d-drpwg', tag(s): _default_
galaxy.jobs.handler INFO 2020-10-06 16:53:19,757 job handler stop queue started
galaxy.jobs.handler DEBUG 2020-10-06 16:53:19,758 Handler queue starting for jobs assigned to handler: galaxy-worker-99cd6f84d-drpwg
galaxy.web_stack.message DEBUG 2020-10-06 16:53:19,812 Bound default message handler 'JobHandlerMessage.default_handler' to <bound method TaskMessage.default_handler of 
galaxy.jobs.handler INFO 2020-10-06 16:53:19,812 job handler queue started
galaxy.jobs.handler INFO 2020-10-06 16:53:19,812 job handler stop queue started
galaxy.web_stack DEBUG 2020-10-06 16:53:19,886 WorkflowSchedulingManager: No job handler assignment methods were configured but this server is configured to attach to the 'job-handlers' pool, automatically enabling the 'db-skip-locked' assignment method
galaxy.web_stack DEBUG 2020-10-06 16:53:19,887 WorkflowSchedulingManager: Removed 'db-self' from handler assignment methods due to use of mules
galaxy.web_stack DEBUG 2020-10-06 16:53:19,887 WorkflowSchedulingManager: handler assignment methods updated to: db-skip-locked
galaxy.web_stack.handlers INFO 2020-10-06 16:53:19,887 WorkflowSchedulingManager: No job handler assignment method is set, defaulting to 'db-skip-locked', set the `assign_with` attribute on <handlers> to override the default
galaxy.workflow.scheduling_manager INFO 2020-10-06 16:53:19,887 Workflow scheduling handler assignment method(s): db-skip-locked
galaxy.workflow.scheduling_manager INFO 2020-10-06 16:53:19,887 Tag [_default_] handlers: galaxy-worker-99cd6f84d-drpwg
galaxy.workflow.scheduling_manager DEBUG 2020-10-06 16:53:19,887 Starting workflow schedulers
galaxy.queue_worker INFO 2020-10-06 16:53:19,914 Binding and starting galaxy control worker for galaxy-worker-99cd6f84d-drpwg
galaxy.queue_worker INFO 2020-10-06 16:53:19,935 Queuing async task rebuild_toolbox_search_index for galaxy-worker-99cd6f84d-drpwg.
galaxy.app INFO 2020-10-06 16:53:20,069 Galaxy app startup finished (13154.525 ms)
galaxy.web_stack INFO 2020-10-06 16:53:20,070 Galaxy server instance 'galaxy-worker-99cd6f84d-drpwg' is running
galaxy.queue_worker INFO 2020-10-06 16:53:20,080 Instance 'galaxy-worker-99cd6f84d-drpwg' received 'rebuild_toolbox_search_index' task, executing now.
galaxy.queue_worker DEBUG 2020-10-06 16:53:20,081 App is not a webapp, not building a search index
galaxy.web_stack.handlers INFO 2020-10-06 17:08:43,250 [p:14,w:1,m:0] [uWSGIWorker1Core1] (WorkflowInvocation[unflushed]) Handler '_default_' assigned using 'db-skip-locked' assignment method
select * from workflow_invocation order by create_time desc limit 20;
 id  |        create_time         |        update_time         | workflow_id |   state   | scheduler |       handler       |               uuid               | history_id 
-----+----------------------------+----------------------------+-------------+-----------+-----------+---------------------+----------------------------------+------------
 332 | 2020-10-06 17:08:43.278402 | 2020-10-06 17:08:43.278406 |         109 |           |           |                     | 9672f30407f611eb98d5a25ce9e9badb |         97
 331 | 2020-10-06 17:08:43.277707 | 2020-10-06 17:08:43.27771  |         107 |           |           |                     | 967140a407f611eb98d5a25ce9e9badb |         97
 330 | 2020-10-06 17:08:43.276959 | 2020-10-06 17:08:43.276963 |         106 |           |           |                     | 966f5ba407f611eb98d5a25ce9e9badb |         97
 329 | 2020-10-06 17:08:43.275917 | 2020-10-06 17:08:43.275924 |         105 |           |           |                     | 966e482c07f611eb98d5a25ce9e9badb |         97
 328 | 2020-10-06 17:08:43.266406 | 2020-10-06 17:08:43.266412 |         114 | new       | core      | _default_           | 966d183007f611eb98d5a25ce9e9badb |         97

Tried adding --attach-to-pool=workflow-schedulers, to no effect.
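
For reference, the explicit per-scheduler configuration mentioned earlier in this thread would live in workflow_schedulers_conf.xml. A minimal sketch follows, assuming the assign_with attribute on <handlers> named in the galaxy.web_stack.handlers log line above; the surrounding element structure is an assumption, so verify it against the sample config shipped with Galaxy:

<?xml version="1.0"?>
<workflow_schedulers default="core">
    <core id="core" />
    <!-- assumed structure: assign_with as referenced in the
         log message quoted above -->
    <handlers assign_with="db-skip-locked" />
</workflow_schedulers>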

@pcm32 (Member) commented Feb 19, 2021

Did this work for you in the end, @innovate-invent? I think I use it in a similar way to you (not specifying handlers in the job conf and just joining them to the pool). I currently have to use a trick with gxadmin to assign those workflows to handlers, but I was hoping to get away from that trick.

@innovate-invent (Contributor) commented Feb 20, 2021

My job handlers run with --attach-to-pool=job-handlers and I just use the default workflow scheduler configs otherwise. The job handlers are configured to use db-skip-locked. There was an issue with separating the job handlers and workflow invocation handlers related to the maximum_workflow_jobs_per_scheduling_iteration config. I don't know if that was ever resolved.

Edit: Going back through the issues, it looks like I got this to work with #10371 but never left it enabled for some reason.

@mvdbeek deleted the db_skip_locked branch on March 1, 2021 08:44
Linked issue: Workflows not being scheduled when workflow handlers set to db-skip-locked