-
Notifications
You must be signed in to change notification settings - Fork 898
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Queue ServiceAnsiblePlaybook#execute for the embedded_ansible role #19055
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few questions/concerns, but my guess is they just need clarification for my stupid self and this probably is fine as is.
:role => "embedded_ansible" | ||
} | ||
|
||
task_id = MiqTask.generic_action_with_callback(task_opts, queue_opts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is worth holding up the merge of this PR, but this looks very similar to what is available through EmbeddedAnsible::CrudCommon
:
def queue(zone, instance_id, method_name, args, action, auth_user) | |
task_opts = { | |
:action => action, | |
:userid => auth_user || "system" | |
} | |
queue_opts = { | |
:args => args, | |
:priority => MiqQueue::HIGH_PRIORITY, | |
:class_name => name, | |
:method_name => method_name, | |
:role => "embedded_ansible", | |
:zone => zone | |
} | |
queue_opts[:instance_id] = instance_id if instance_id | |
MiqTask.generic_action_with_callback(task_opts, queue_opts) | |
end |
So it would nice if there was a clean way to share this code without having to dig into the provider code to do it. But again, I think this is fine as a surgical fix and I think trying to DRY this up can be done at a later date.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, I guess my question for this new implementation is what is now happening with this line (which exists in _execute
now:
new_job = ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job.create_job(jt, decrypt_options(opts)) |
Is that just going to cause the same issue that we are seeing now, just with another queue item in between it?
(Note: You can just expand the code in the diff below to see what I talking about, just shared a permalink since I can't comment on code that hasn't changed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I don't think it's really worth DRYing this particular bit up. I think this is kinda how the task/queue stuff is meant to be used. Plus I think there's a fair chance that the args here could change separately from the provider CRUD stuff.
I was thinking that there could be a use case for creating an MiqTask helper to do the wait_for_taskid and raise if the status is no-good, but we're not doing that in crud_common so ... shrug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that just going to cause the same issue that we are seeing now, just with another queue item in between it?
No, at this point in the thread of execution we will queue new items for states in the job, but those are all queued for the same server here:
manageiq/app/models/manageiq/providers/ansible_runner_workflow.rb
Lines 62 to 78 in 82fe638
def queue_signal(*args, deliver_on: nil) | |
role = options[:role] || "ems_operations" | |
priority = options[:priority] || MiqQueue::NORMAL_PRIORITY | |
MiqQueue.put( | |
:class_name => self.class.name, | |
:method_name => "signal", | |
:instance_id => id, | |
:priority => priority, | |
:role => role, | |
:zone => zone, | |
:task_id => guid, | |
:args => args, | |
:deliver_on => deliver_on, | |
:server_guid => MiqServer.my_server.guid, | |
) | |
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry, I see now. It is a bit farther down then the preview showed:
:server_guid => MiqServer.my_server.guid, |
I was seeing the *args
and was trying to find where it was being passed in from 😩
Okay, cool, looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fat fingered the re-review request... not really knowing what it did.
Still good. 🐑 🇮🇹
} | ||
|
||
task_id = MiqTask.generic_action_with_callback(task_opts, queue_opts) | ||
task = MiqTask.wait_for_taskid(task_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this should wait for task here, because that is effectively a blocking call (sleep loop, if I recall). I expected a new state that uses queue based waiting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous implementation didn't have a queue call at all so the blocking implementation is the same as it was previously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe put another way, there's a check_complete method, and we should probably check for the task completion there, else bail for the next check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this stage the job itself was queued wasnt it? That was my intention at least, and then the "externally running" job was supposed to be checked in check_complete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this stage the job itself was queued wasnt it?
That's what I was trying to figure out, but got lost in automate. Here's puts caller.join("\n")
at the start of the old #execute
method:
/home/ncarboni/.gem/ruby/2.5.1/bundler/gems/manageiq-automation_engine-09c8bbdf6762/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:333:in `public_send'
/home/ncarboni/.gem/ruby/2.5.1/bundler/gems/manageiq-automation_engine-09c8bbdf6762/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:333:in `block in object_send'
/home/ncarboni/.gem/ruby/2.5.1/bundler/gems/manageiq-automation_engine-09c8bbdf6762/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:352:in `ar_method'
/home/ncarboni/.gem/ruby/2.5.1/bundler/gems/manageiq-automation_engine-09c8bbdf6762/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:362:in `ar_method'
/home/ncarboni/.gem/ruby/2.5.1/bundler/gems/manageiq-automation_engine-09c8bbdf6762/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:331:in `object_send'
/home/ncarboni/.gem/ruby/2.5.1/bundler/gems/manageiq-automation_engine-09c8bbdf6762/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:182:in `block (3 levels) in expose'
/home/ncarboni/Source/manageiq/app/models/user.rb:283:in `with_user'
/home/ncarboni/.gem/ruby/2.5.1/bundler/gems/manageiq-automation_engine-09c8bbdf6762/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:182:in `block (2 levels) in expose'
/home/ncarboni/.rubies/ruby-2.5.1/lib/ruby/2.5.0/drb/drb.rb:1630:in `perform_without_block'
/home/ncarboni/.rubies/ruby-2.5.1/lib/ruby/2.5.0/drb/drb.rb:1590:in `perform'
/home/ncarboni/.rubies/ruby-2.5.1/lib/ruby/2.5.0/drb/drb.rb:1674:in `block (2 levels) in main_loop'
/home/ncarboni/.rubies/ruby-2.5.1/lib/ruby/2.5.0/drb/drb.rb:1670:in `loop'
/home/ncarboni/.rubies/ruby-2.5.1/lib/ruby/2.5.0/drb/drb.rb:1670:in `block in main_loop'
So up to this point the service was queued for execution, but for the automate role. I now need the job itself to run on the appliance with the embedded_ansible role.
ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job.create_job
calls run on the passed job template which in turn creates the workflow and calls signal the first time, but the base signal implementation just calls the method in the same process and AnsibleRunnerWorkflow#queue_signal
is set up to run the method on the same server here.
So to make a long story short, I'm pretty sure this is what we need to do 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my expectation of what create_job did was wrong, but I still contend that this should be async. In good with merge for now though and talking about that for follow-up since it's technically the same before/after as you've said
@@ -14,6 +14,25 @@ def preprocess(action, add_options = {}) | |||
end | |||
|
|||
def execute(action) | |||
task_opts = { | |||
:action => "Executing Ansible Playbook", | |||
:userid => "system" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be the current user who ordered the service or the service owner?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe? I wasn't sure if this was relevant as the user didn't seem to be used, but I can change it. How to I get the userid for the user who ordered the service (I would assume that's the better option of the two)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should actually stay as system, because it's just an internal placement mechanism, not something we need user context for. I think the "Executing Ansible Playbook" action name is what makes that confusing and we should change that to "Launching Ansible Job" or something.
Before this change the actual playbook run could happen on any appliance with the automate role. We need the playbook run to happen specifically on the embedded_ansible appliance so that we can ensure the repo update/clone process will run successfully.
b08b2f9
to
640a8a5
Compare
Checked commits carbonin/manageiq@11d517d~...640a8a5 with ruby 2.4.6, rubocop 0.69.0, haml-lint 0.20.0, and yamllint 1.10.0 |
Okay, @Fryguy I think I made the changes you wanted for this first pass. I'll work on a followup that makes this async by saving off the task id into the service options and using the |
Queue ServiceAnsiblePlaybook#execute for the embedded_ansible role (cherry picked from commit d855e23)
Ivanchuk backport details:
|
Before this change the actual playbook run could happen on any
appliance with the automate role. We need the playbook run to happen
specifically on the embedded_ansible appliance so that we can ensure
the repo update/clone process will run successfully.
cc @gmcculloug @tinaafitz I'm not sure if this can be done in any better way. I wasn't able to find a higher level place to change how this was getting queued. It seems like the actual call stack for execute ends somewhere in automate engine and drb so I feel like that would be harder to tackle than this.
But if someone knows that there's somewhere else I can make this change, then I'm open for suggestions.