Workflows not being scheduled when workflow handlers set to db-skip-locked #8209

afgane · 2019-06-20T21:52:33Z

Running web (calling uwsgi directly) and job handlers (using scripts/galaxy-main) separately, with the following config/workflow_schedulers.xml (or without that file at all), leads to workflow invocations never being scheduled (they remain in the new state):

<?xml version="1.0"?>
<workflow_schedulers default="core">
    <core id="core" />
    <handlers assign_with="db-skip-locked" />
</workflow_schedulers>

Changing the handlers assignment method as follows triggers job scheduling.

   <handlers assign_with="db-self" />

Poring over the logs with @natefoo, everything looks like it should, including the handler value in the workflow_invocation table (which is _default_), so the missing link is somewhere deeper.


hexylena · 2019-06-21T07:30:20Z

Ok, so, same issue I saw @natefoo. Cool, glad it's a bug and not just our weird setup.

hexylena · 2019-06-21T07:32:39Z

@afgane for an interim solution I just have the following bash script running which makes things work well enough.

#!/bin/bash
while true; do
        psql -c "update workflow_invocation set handler = 'handler_main_' || (random() * 10)::integer where state = 'new' and handler = '_default_';" | grep -v 'UPDATE 0'
        sleep 1;
done

hexylena · 2019-06-28T11:51:01Z

I've added this to gxadmin

bgruening · 2019-08-03T11:12:27Z

bump this issue again. It seems to be a severe bug or we need to pull this option from the documentation.

pcm32 · 2019-11-03T09:47:48Z

Is it safe to use db-self for workflows while using db-skip-locked for normal job handlers in a multi-master webless setup with dynamic handlers? I came across this issue on the current tip of release_19.05. Thanks

pcm32 · 2019-11-03T10:09:33Z

Apparently the same happens when using db-transaction-isolation instead of db-skip-locked :-(.

pcm32 · 2019-11-03T12:07:21Z

Ok, I'm using the gxadmin call. However, I wonder: if this is called from multiple hosts at the same time (because handler prefixes are host dependent, so that the workflows are balanced across handlers on different hosts), is it transactionally safe from the database point of view? Thanks!

natefoo · 2019-11-05T19:57:30Z

I thought I did a better job documenting the deal with workflow schedulers and assignment methods but the only thing I see is what's in the sample config.

db-skip-locked and db-transaction-isolation are both supposed to work but are discouraged because they can't guarantee serial workflow execution in a single history. For that reason, either mules or db-preassign with statically configured <handlers> is preferred. If you can run a single static workflow scheduler with --server-name=whatever and <handlers><handler id="whatever"/></handlers> in your workflow schedulers config, that should solve the issue.

That said, this bug ought to be addressed, and I'll try to find the time this week to look at it.
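
For reference, the single static scheduler suggested above might look roughly like this as a complete config/workflow_schedulers.xml. This is only a sketch: the surrounding elements are borrowed from the file in the original report, and "whatever" is the placeholder id from the comment, not a required name.

```xml
<?xml version="1.0"?>
<workflow_schedulers default="core">
    <core id="core" />
    <!-- One statically defined scheduler. The id is a placeholder and is
         assumed to match the --server-name the scheduler process is started with. -->
    <handlers>
        <handler id="whatever" />
    </handlers>
</workflow_schedulers>
```

The scheduler process would then be started with --server-name=whatever so that its name matches the handler id.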

scholtalbers · 2020-03-03T12:01:19Z

I walked into this trap when doing the update to 20.01 and following the documentation ☹️

The preferred method depends on your deployment strategy:

    uWSGI + Mules - uWSGI Mule Messaging is preferred.
    uWSGI + Webless - Either Database SKIP LOCKED or Database Transaction Isolation is preferred.
    uWSGI + Hybrid - Either Database SKIP LOCKED or Database Transaction Isolation is preferred. If your mule and webless handlers are in non-overlapping pools (i.e. tags, or untagged), you can alternatively use both uWSGI Mule Messaging followed by either Database SKIP LOCKED or Database Transaction Isolation. If pools overlap, using uWSGI Mule Messaging would prevent any non-mule handlers in that pool from being assigned jobs.

hexylena · 2020-05-04T07:06:12Z

@natefoo

So then with a job conf like this:

        <handlers assign_with="db-skip-locked" max_grab="8">
                <handler id="handler_main_0"/>
                <handler id="handler_main_1"/>
                <handler id="handler_main_2"/>
                <handler id="handler_main_3"/>
                <handler id="handler_main_4"/>
                <handler id="handler_main_5"/>
                <handler id="handler_main_6"/>
                <handler id="handler_main_7"/>
        </handlers>

this is wrong? There should only be a single workflow scheduler? Then it works or?

<?xml version="1.0"?>
<workflow_schedulers default="core">
    <core id="core" />
    <handlers default="schedulers">
        <handler id="workflow_scheduler_main_0" tags="schedulers"/>
        <handler id="workflow_scheduler_main_1" tags="schedulers"/>
    </handlers>
</workflow_schedulers>

still an issue for EU

bgruening · 2020-05-23T20:55:08Z

@natefoo do you have any ideas here?

Is the following a valid and recommended config?

<?xml version="1.0"?>

<workflow_schedulers default="core">
    <core id="core" />
    <handlers assign_with="db-self" default="schedulers">
        <handler id="workflow_scheduler_main_0" tags="schedulers"/>
        <handler id="workflow_scheduler_main_1" tags="schedulers"/>
    </handlers>
</workflow_schedulers>

natefoo · 2020-05-26T20:21:34Z

Use assign_with="db-preassign" rather than db-self. You can use multiple workflow schedulers (.org does).

@hexylena we figured out in Barcelona what the issue was but I am not sure if we recorded that revelation - do you recall? Is the issue that a db-skip-locked job conf without a workflow scheduler conf is broken?

natefoo · 2020-05-26T20:27:32Z

Here is .org's workflow scheduler conf, job conf handlers section (individual handlers are only defined here for plugin loading restrictions), and the workflow scheduler and handler supervisor configs.

hexylena · 2020-05-27T11:01:52Z

Oh gosh, that revelation is lost to the 11 weeks of quarantine I've been in since Barcelona, sorry @natefoo.

So your workflow schedulers config matches ours, i.e. we have not specified db-self. But it works for you? Likewise, we're db-skip-locked in our job conf handlers section. So our configuration matches yours currently. Do we need to set it explicitly, like @bgruening did in usegalaxy-eu/infrastructure-playbook#187? Are the workflow handlers detecting that the job handlers are db-skip-locked and choosing to do the same? That wouldn't make sense, given your configuration.

natefoo · 2020-05-27T13:35:10Z

The code should default it to db-preassign. It doesn't hurt to be explicit, but I only saw that my workflow scheduler assignment method wasn't set after I made that suggestion, which I made because .eu's was set to db-self.

One thing I mentioned to Björn on Gitter yesterday - the web workers (uwsgi) must have the same workflow schedulers conf as the workflow schedulers and job handlers. Just as they do with jobs, the web workers create the invocation and set the handler column according to the assignment method and handler definitions, which they can only do properly if they have the workflow scheduler config.

innovate-invent · 2020-09-02T06:34:23Z

I am not sure I understand how everything is working but I get the following exception:

galaxy.workflow.run_request INFO 2020-09-02 06:25:38,301 [p:9,w:1,m:0] [uWSGIWorker1Core1] Creating a step_state for step.id 953
galaxy.workflow.run_request INFO 2020-09-02 06:25:38,302 [p:9,w:1,m:0] [uWSGIWorker1Core1] Creating a step_state for step.id 954
galaxy.workflow.run_request INFO 2020-09-02 06:25:38,302 [p:9,w:1,m:0] [uWSGIWorker1Core1] Creating a step_state for step.id 955
galaxy.web_stack.handlers ERROR 2020-09-02 06:25:38,302 [p:9,w:1,m:0] [uWSGIWorker1Core1] Caught exception in handler assignment method: db-preassign
Traceback (most recent call last):
  File "/srv/galaxy/lib/galaxy/web_stack/handlers.py", line 447, in assign_handler
    handler = self._handler_assignment_method_methods[method](
  File "/srv/galaxy/lib/galaxy/web_stack/handlers.py", line 370, in _assign_db_preassign_handler
    handler_id = self._get_single_item(self.handlers[handler], index=index)
KeyError: '_default_'
galaxy.web_stack.handlers ERROR 2020-09-02 06:25:38,303 [p:9,w:1,m:0] [uWSGIWorker1Core1] (WorkflowInvocation[unflushed]) Failed to select handler

<?xml version="1.0"?>
<workflow_schedulers default="core">
  <core id="core" />
  <handlers assign_with="db-preassign" />
</workflow_schedulers>

job_conf.xml:

...
<handlers assign_with="db-skip-locked" />
...

I have a uwsgi + webless setup.

bgruening · 2020-09-02T06:45:30Z

Use only db-preassign like:

<workflow_schedulers default="core">
    <core id="core" />
    <handlers assign_with="db-preassign" default="schedulers">
        <handler id="workflow_scheduler_main_0" tags="schedulers"/>
        <handler id="workflow_scheduler_main_1" tags="schedulers"/>
        <handler id="workflow_scheduler_main_2" tags="schedulers"/>
        <handler id="workflow_scheduler_main_3" tags="schedulers"/>
    </handlers>
</workflow_schedulers>

innovate-invent · 2020-09-02T06:49:33Z

How do I get this to work without listing the handlers? My handlers autoscale and I can't explicitly declare them.
Do workflow schedulers also handle sending the individual jobs to their destinations? Or do they simply manipulate the database to populate the workflow invocation?
This is a significant issue if I can't scale the workers.

mvdbeek · 2020-09-02T07:11:14Z

I don't think that's a scenario we support (#8209 (comment)). The only mode(s) that would work without knowing the available workers are db-skip-locked and db-transaction-isolation (and uwsgi-mule-messaging if running on the same host, in theory, but with autoscaling I guess that won't help you). In db-skip-locked and db-transaction-isolation mode handlers poll for new invocations. But as you discovered that doesn't work for workflow schedulers at this moment. I know @natefoo explained why that is earlier this year, but I have forgotten again.

If you can run a single static workflow scheduler with --server-name=whatever and <handlers><handler id="whatever"/></handlers> in your workflow schedulers config, that should solve the issue.

That would be one solution that should allow you to scale workflow handlers between 0 and 1.

afgane added the area/workflows and kind/bug labels Jun 20, 2019

bgruening assigned natefoo Aug 3, 2019

afgane mentioned this issue Jan 31, 2020: "Assign fixed handler name since we only support one handler for now" galaxyproject/galaxy-helm#105 (merged)

bgruening mentioned this issue May 26, 2020: "add assign_with="db-preassign" to workflow handler" usegalaxy-eu/infrastructure-playbook#187 (merged)

mvdbeek added commits cf8dc58 and ae9782e to mvdbeek/galaxy that referenced this issue Sep 2, 2020: "Add workflow invocation grabbing with db-skipped-lock and db-transaction-isolation. Closes galaxyproject#8209. Needs some tests and the grabbing logic should be its own class."

mvdbeek mentioned this issue Sep 2, 2020: "Add workflow invocation grabbing with db-skipped-lock" #10177 (merged)

mvdbeek added commit 6e74e23 to mvdbeek/galaxy that referenced this issue Sep 2, 2020: "Add workflow invocation grabbing with db-skipped-lock and db-transaction-isolation. Closes galaxyproject#8209. Needs some tests and the grabbing logic should be its own class."

mvdbeek added commits dddaf7c, aed6524 and fbe5d4f to mvdbeek/galaxy that referenced this issue Sep 2, 2020: "Add workflow invocation grabbing with db-skipped-lock and db-transaction-isolation. Closes galaxyproject#8209."

mvdbeek added commit 62dbf67 to mvdbeek/galaxy that referenced this issue Sep 18, 2020: "Add workflow invocation grabbing with db-skipped-lock and db-transaction-isolation. Closes galaxyproject#8209."

dannon closed this as completed in #10177 Sep 22, 2020
