[EmbeddedAnsible] Force embedded_ansible role for workflow #19187

NickLaMuro · 2019-08-22T00:23:04Z

The queue_signal method in AnsibleRunnerWorkflow forces the "ems_operations" role when it schedules a queue item, but this class (which inherits from AnsibleRunnerWorkflow) is assigned its job by a wrapper job that requires the "embedded_ansible" role. In addition, the previous job queues a job that is locked to the existing server GUID, so that server can take the first job but not the second unless it has both the "embedded_ansible" and "ems_operations" roles.

When a server exists that only has the "embedded_ansible" role, it is possible to get into a state where a playbook can be scheduled but is never run because no server matches. This fix simply always uses the "embedded_ansible" role for everything, but tries to modify only the lower-level classes to achieve that.
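The mismatch and the fix can be sketched in isolation. This is a minimal illustration, not the code in this PR: SketchRunnerWorkflow, SketchPlaybookWorkflow, queue_role, and server_can_run? are hypothetical stand-ins, and queue items are modeled as plain hashes rather than real queue records.

```ruby
# Hypothetical stand-ins; these are NOT the real ManageIQ classes.
class SketchRunnerWorkflow
  attr_reader :options

  def initialize(options = {})
    @options = options
  end

  # Mirrors the parent's behavior: fall back to "ems_operations".
  def queue_role
    options[:role] || "ems_operations"
  end

  # Returns the queue item as a plain hash instead of writing to a real queue.
  def queue_signal(*args)
    { :args => args, :role => queue_role }
  end
end

class SketchPlaybookWorkflow < SketchRunnerWorkflow
  # Always use the embedded_ansible role, ignoring the parent's default.
  def queue_role
    "embedded_ansible"
  end
end

# Assumption for this sketch: a server can only pick up an item whose role it has.
def server_can_run?(server_roles, item)
  server_roles.include?(item[:role])
end

embedded_only = ["embedded_ansible"]

stuck = SketchRunnerWorkflow.new.queue_signal(:start)
puts server_can_run?(embedded_only, stuck)

fixed = SketchPlaybookWorkflow.new.queue_signal(:start)
puts server_can_run?(embedded_only, fixed)
```

With the parent's default, the item is tagged with a role the embedded-ansible-only server lacks, so it is never picked up; the subclass override keeps both jobs on the same role.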

Alternative Solution

There is a "cleaner" solution for this that would instead modify the AnsibleRunnerWorkflow#queue_signal:

https://github.com/ManageIQ/manageiq/blob/master/app/models/manageiq/providers/ansible_runner_workflow.rb#L95

to default to "embedded_ansible" instead of "ems_operations". However, since I am not confident that this is only used for embedded ansible, I favored the approach in this PR. While much uglier, it felt a bit safer at this stage in the game. I agree that we should probably not go with this solution long term, but for a quick fix, I would argue it is a bit safer.

Links

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1742839

Steps for Testing/QA

Provided replication steps in the BZ:

https://bugzilla.redhat.com/show_bug.cgi?id=1742839#c7

And this can be easily replicated on a single appliance.


miq-bot · 2019-08-22T00:31:46Z

Checked commit NickLaMuro@78f9fe7 with ruby 2.4.6, rubocop 0.69.0, haml-lint 0.20.0, and yamllint 1.10.0
1 file checked, 0 offenses detected
Everything looks fine. 🍪

Fryguy · 2019-08-26T14:23:19Z

The various workflow classes should not default to any role so I don't think this direction works either. The main problem with the current code is that the caller defaults to ems_operations, since it was originally designed for providers. In time, we expect all callers (both provider, embedded ansible or other) to go through these workflow classes, so defaulting a role is not correct.

NickLaMuro · 2019-08-26T14:47:08Z

> The various workflow classes should not default to any role so I don't think this direction works either.

Does not work in what way? I tested against these replication steps:

https://bugzilla.redhat.com/show_bug.cgi?id=1742839#c7

And with the fix in place, the playbook worked as expected. Is there something else that I am not accounting for that you are talking about?

Fryguy · 2019-08-26T17:42:42Z

When another non-embedded_ansible thing uses this class in the future?

NickLaMuro · 2019-08-26T19:40:48Z

> When another non-embedded_ansible thing uses this class in the future?

@Fryguy okay, sure, but wouldn't that be made even worse if I were to set this in the superclass, where the role currently defaults to "ems_operations":

manageiq/app/models/manageiq/providers/ansible_runner_workflow.rb

Line 95 in 78f9fe7

role = options[:role] || "ems_operations"

So this was an attempt to apply this change where it would have the fewest side effects, specifically addressing what I think you are asking about above. However, if you think this isn't going far enough, I can see two alternatives:

1. Raise an error when this is used by a class that shouldn't use it, though I'm not sure how to accomplish that.
2. Create another subclass that is only used by EmbeddedAnsible.

That said, the second option takes more lines of code than the current solution, and it also seems unnecessary at this point since EmbeddedAnsible is the only user of this class (from what I can tell):

https://github.com/search?q=org%3AManageIQ+AnsiblePlaybookWorkflow&type=Code

So not sure what else I could do.
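For reference, the first alternative could look roughly like the following. This is only a sketch under assumptions: SketchBaseWorkflow, SketchEmbeddedWorkflow, and queue_role are hypothetical names, not existing ManageIQ code, and the base class here has no default role at all, so a caller or subclass has to pick one.

```ruby
# Hypothetical stand-ins; these are NOT the real ManageIQ classes.
class SketchBaseWorkflow
  def initialize(options = {})
    @options = options
  end

  # No default here: a caller or subclass has to choose the role explicitly.
  def queue_role
    @options.fetch(:role) do
      raise ArgumentError, "#{self.class.name} was queued without a role"
    end
  end
end

class SketchEmbeddedWorkflow < SketchBaseWorkflow
  # The embedded ansible subclass pins its own role.
  def queue_role
    "embedded_ansible"
  end
end

puts SketchEmbeddedWorkflow.new.queue_role

begin
  SketchBaseWorkflow.new.queue_role
rescue ArgumentError => e
  puts e.message
end
```

The trade-off is that every existing caller relying on the implicit "ems_operations" default would have to start passing a role.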

Fryguy · 2019-08-29T19:19:58Z

Going to merge for now to get it working. I think the best approach moving forward is something a bit more robust: making queue_signal more generic and moving it into Job itself, but I need to consider the design of that a bit more.


simaishi · 2019-09-25T20:47:30Z

Ivanchuk backport details:

$ git log -1
commit 9d24747ca9a3027b12de4b1f29ffdb1c56be580e
Author: Jason Frey <jfrey@redhat.com>
Date:   Thu Aug 29 15:20:00 2019 -0400

    Merge pull request #19187 from NickLaMuro/fix_role_mismatch_for_ansible_runner_embedded_ansible_jobs
    
    [EmbeddedAnsible] Force embedded_ansible role for workflow
    
    (cherry picked from commit b74fd8df9546db3b4bbd1273b59719c7791f9449)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1742839

bdunne requested a review from Fryguy August 23, 2019 13:36

Fryguy added the core/embedded ansible label Aug 26, 2019

Fryguy self-assigned this Aug 26, 2019

Fryguy merged commit b74fd8d into ManageIQ:master Aug 29, 2019

Fryguy added this to the Sprint 119 Ending Sep 2, 2019 milestone Aug 29, 2019

Fryguy added bug ivanchuk/yes blocker labels Aug 29, 2019

simaishi added ivanchuk/backported and removed ivanchuk/yes labels Sep 25, 2019
