Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update playbook_log_stdout to handle possible nil job #20738

Merged
merged 2 commits into from
Nov 4, 2020
Merged

Update playbook_log_stdout to handle possible nil job #20738

merged 2 commits into from
Nov 4, 2020

Conversation

djberg96
Copy link
Contributor

Currently the AnsiblePlaybookMixin#playbook_log_stdout method assumes that a job will always be present. However, if we look at service_ansible_playbook.rb, we see this call:

playbook_log_stdout(log_option, job(action))

And the job(action) looks like this:

service_resources.find_by(:name => action, :resource_type => 'OrchestrationStack').try(:resource)

We know from the try call that this could thus be nil. And, in fact, that's what @d-m-u hit while testing a variant of #20645 where there's a network interruption:

ERROR -- : Q-task_id([r88_service_template_provision_task_88]) The following error occurred during instance method <on_error> for AR object <#<ServiceAnsiblePlaybook id: 88, name: "drew", <snip>
[----] E, [2020-10-26T14:09:27.389968 #1580547:2ab9680952a8] ERROR -- : Q-task_id([r88_service_template_provision_task_88]) MiqAeServiceModelBase.ar_method raised: <NoMethodError>: <undefined method `name' for nil:NilClass>
[----] E, [2020-10-26T14:09:27.390025 #1580547:2ab9680952a8] ERROR -- : Q-task_id([r88_service_template_provision_task_88]) /var/www/miq/vmdb/app/models/mixins/ansible_playbook_mixin.rb:40:in `rescue in playbook_log_stdout'
/var/www/miq/vmdb/app/models/mixins/ansible_playbook_mixin.rb:34:in `playbook_log_stdout'

So this PR updates the method so that it skips logging job.name info if there isn't actually a name.

No specs for now, there aren't any for this mixin currently, and I wasn't sure if they were moved somewhere else.

@djberg96
Copy link
Contributor Author

CC @NickLaMuro, @d-m-u

@@ -35,9 +35,11 @@ def playbook_log_stdout(log_option, job)
return unless %(on_error always).include?(log_option)
return if log_option == 'on_error' && job.raw_status.succeeded?

$log.info("Stdout from ansible job #{job.name}: #{job.raw_stdout('txt_download')}")
$log.info("Stdout from ansible job #{job.name}: #{job.raw_stdout('txt_download')}") if job
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the event we don't have a job, do we want to log that we're missing a job?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

who are the callers of this method and what is a reasonable expectation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @chessbyte,

This method gets called from the postprocess state.
https://github.com/ManageIQ/manageiq-content/blob/bb73854dd8d6b1e9647390538ef78046b123f608/content/automate/ManageIQ/Service/Generic/StateMachines/GenericLifecycle.class/__methods__/postprocess.rb#L19

The postprocess state is processed after the playbook has been launched. A valid job is a reasonable expectation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment here: #20738 (comment) . IMO callers should be expected to pass a valid job. I think the only thing that should change in this method is to add a guard clause:

raise ArgumentError, "must pass a valid job" if job.nil?

@Fryguy
Copy link
Member

Fryguy commented Oct 27, 2020

I'm surprised we don't have a job as we create the job first before we kick off the run. I think we need some more investigation into how we get into this situation before we band-aid it.

Copy link
Member

@NickLaMuro NickLaMuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some thoughts.

rescue StandardError => err
$log.error("Failed to get stdout from ansible job #{job.name}")
msg = "Failed to get stdout from ansible job"
msg << " #{job.name}" if job
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things here:

  • Would it make sense to rescue NoMethodError above here as well? We could avoid a few of the if calls above by doing that
  • To address @d-m-u 's concern, could we change the code here to this:
    msg = "Failed to get stdout from ansible job #{job ? job.name : '<UknownJob>'}"

That said, I think he concern raised by @chessbyte (here) and @Fryguy (here) are valid, so that might make the above suggesting unnecessary if we determine the root cause for this should normally not happen (I am not sure myself to be honest).

@djberg96
Copy link
Contributor Author

djberg96 commented Oct 27, 2020

@Fryguy, @NickLaMuro I'm actually curious why there's a rescue clause at all in there. All it does is return or log. So, unless the logging itself fails (because coincidentally the job is nil), I'm curious how it would ever reach that condition.

@NickLaMuro
Copy link
Member

@djberg96 not that this really answers your question, but looks like it has always been this way since it was introduced:

#16414

@tinaafitz
Copy link
Member

Dan and I discussed the issue and I don't think this PR will be necessary.

@djberg96
Copy link
Contributor Author

FWIW, I added some specs. So, if it turns out we just want a quick fix, this will still be here.

@Fryguy
Copy link
Member

Fryguy commented Oct 27, 2020

Dan and I discussed the issue and I don't think this PR will be necessary.

What was the conclusion?

@tinaafitz
Copy link
Member

Hi @Fryguy, It doesn't make sense that we don't have a valid job here. I'm starting to look at the environment now.

@tinaafitz
Copy link
Member

After further investigation, this code change is important.

Here's why:

It's important to note that customers will encounter this issue if they have custom Automate methods in any of the pre* states of this state machine that end in error.

Automate State Machine
/Service/Generic/StateMachines/GenericLifecycle/provision state machine

@Fryguy
Copy link
Member

Fryguy commented Oct 30, 2020

To me, then, this feels like the caller to playbook_log_stdout should be the one to change to check that the job is real. If anything changes in playbook_log_stdout, IMO, it should be to raise an ArgumentError if someone passes nil.

That is, since postprocess seems shared between good and bad runs, it also has to be resilient to good and bad runs. Ultimately postprocess calls log_stdouthere

def log_stdout(action)
log_option = options.fetch_path(:config_info, action.downcase.to_sym, :log_output) || 'on_error'
playbook_log_stdout(log_option, job(action))
end

So I think that method should be changed to account for a nil job....something like:

 def log_stdout(action) 
   log_option = options.fetch_path(:config_info, action.downcase.to_sym, :log_output) || 'on_error' 
   job = job(action)
   if job.nil?
     $log.info("No stdout available due to missing job") # or something to that effect.
   else
     playbook_log_stdout(log_option, job) 
   end
 end 

@djberg96
Copy link
Contributor Author

djberg96 commented Nov 2, 2020

@Fryguy ok, updated

@tinaafitz
Copy link
Member

Thanks @Fryguy for pointing that out. That makes sense.

@tinaafitz
Copy link
Member

Thanks @djberg96

@Fryguy
Copy link
Member

Fryguy commented Nov 2, 2020

@djberg96 Can you squash the commits? The commit history now shows a change to the method and then an unchange to the method.

Verify job exists in conditional.

Initial specs.

Add another spec.

Restore playbook_log_stdout, modify caller instead.
@djberg96
Copy link
Contributor Author

djberg96 commented Nov 2, 2020

@Fryguy commits squashed.

@miq-bot
Copy link
Member

miq-bot commented Nov 2, 2020

Checked commits https://github.com/djberg96/manageiq/compare/deb1ed05d8c3ef1bc8c40197039219b74eb8a5f6~...5a3d48e12890dba5cfa534d07b760c3bc3bf76f9 with ruby 2.6.3, rubocop 0.82.0, haml-lint 0.35.0, and yamllint
3 files checked, 11 offenses detected

spec/models/mixins/ansible_playbook_mixin_spec.rb

@kbrock kbrock merged commit 3493002 into ManageIQ:master Nov 4, 2020
@simaishi
Copy link
Contributor

simaishi commented Nov 9, 2020

@djberg96 Can this be ivanchuk/yes (which implies jansa/yes and kasparov/yes)?

@djberg96
Copy link
Contributor Author

djberg96 commented Nov 9, 2020

@dmetzger57 What say you?

@simaishi
Copy link
Contributor

simaishi commented Nov 9, 2020

@djberg96 I mean the compatibility. Are the changes in this PR compatible with ivanchuk branch that it's ok to backport?

@dmetzger57
Copy link
Contributor

The change should be ivanchuk/yes if cleanly backportable

@djberg96
Copy link
Contributor Author

djberg96 commented Nov 9, 2020

@miq-bot add_label ivanchuk/yes

simaishi pushed a commit that referenced this pull request Nov 9, 2020
Update playbook_log_stdout to handle possible nil job

(cherry picked from commit 3493002)

https://bugzilla.redhat.com/show_bug.cgi?id=1835226
@simaishi
Copy link
Contributor

simaishi commented Nov 9, 2020

Ivanchuk backport details:

$ git log -1
commit 3ae0f417dccf908881e1922117f6911f06a48f73
Author: Keenan Brock <keenan@thebrocks.net>
Date:   Wed Nov 4 11:36:54 2020 -0500

    Merge pull request #20738 from djberg96/ansible_playbook_log_stdout

    Update playbook_log_stdout to handle possible nil job

    (cherry picked from commit 349300251dfc90b0933daf3a35faf6abfe9653b2)

    https://bugzilla.redhat.com/show_bug.cgi?id=1835226

simaishi pushed a commit that referenced this pull request Nov 9, 2020
Update playbook_log_stdout to handle possible nil job

(cherry picked from commit 3493002)
@simaishi
Copy link
Contributor

simaishi commented Nov 9, 2020

Jansa backport details:

$ git log -1
commit dffb46b9e15736334522f21c765c78fc12000003
Author: Keenan Brock <keenan@thebrocks.net>
Date:   Wed Nov 4 11:36:54 2020 -0500

    Merge pull request #20738 from djberg96/ansible_playbook_log_stdout

    Update playbook_log_stdout to handle possible nil job

    (cherry picked from commit 349300251dfc90b0933daf3a35faf6abfe9653b2)

simaishi pushed a commit that referenced this pull request Nov 9, 2020
Update playbook_log_stdout to handle possible nil job

(cherry picked from commit 3493002)
@simaishi
Copy link
Contributor

simaishi commented Nov 9, 2020

Kasparov backport details:

$ git log -1
commit 949a51e4fb5f87cc2c62957f77b8031e2dc6648a
Author: Keenan Brock <keenan@thebrocks.net>
Date:   Wed Nov 4 11:36:54 2020 -0500

    Merge pull request #20738 from djberg96/ansible_playbook_log_stdout

    Update playbook_log_stdout to handle possible nil job

    (cherry picked from commit 349300251dfc90b0933daf3a35faf6abfe9653b2)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants