Don't scan after a failed Ansible remediation #290

comps · 2024-11-28T14:12:59Z

The original idea was to provide at least some scan results when (a few) Ansible tasks fail to remediate - to print the failed remediations as error + failing oscap rules as a nice double-check.

However RHEL-10 work has shown that this is not generally a good idea - we cannot tell if Ansible failed on something fatal (and thus about 100+ rules will fail) or if only 1-2 tasks failed and the scan won't be a wall of red lines.

So, to be on the safer side, don't do the scan when Ansible fails for any reason.

All the non-fatal remediation errors should still be reported (instead of only the first failing one) as check=True raises an exception only when all the output lines have been processed.

The original idea was to provide at least some scan results when (a few) Ansible tasks fail to remediate - to print the failed remediations as 'error' + failing oscap rules as a nice double-check. However RHEL-10 work has shown that this is not generally a good idea - we cannot tell if Ansible failed on something fatal (and thus about 100+ rules will fail) or if only 1-2 tasks failed and the scan won't be a wall of red lines. So, to be on the safer side, don't do the scan when Ansible fails for any reason. All the non-fatal remediation errors should still be reported (instead of only the first failing one) as check=True raises an exception only when all the output lines have been processed. Signed-off-by: Jiri Jaburek <comps@nomail.dom>

mildas · 2024-11-29T10:14:41Z

All the non-fatal remediation errors should still be reported (instead of only the first failing one) as check=True raises an exception only when all the output lines have been processed.

No errors will be reported by Contest as the reporting (ansible.report_from_output()) is done after playbook process.

Results will report that ansible-playbook returned non-zero return code and reviewer would need to go over output to find out which tasks failed.

Wouldn't it be better to just update the condition to

if proc.returncode != 0:
    raise RuntimeError(f"ansible-playbook failed with {proc.returncode}")

and not use check=True?

comps · 2024-11-29T12:14:08Z

No errors will be reported by Contest as the reporting (ansible.report_from_output()) is done after playbook process.

I don't think that's true, any errors found during processing Ansible output are actually recorded (written) to results.yaml in realtime (making tmt read it regardless of how we exit), and the raised exception is further caught by https://github.com/RHSecurityCompliance/contest/blob/main/lib/runtest.py#L73-L77 which does call report_and_exit().

Also, as I mentioned in PR description,

All the non-fatal remediation errors should still be reported (instead of only the first failing one) as check=True raises an exception only when all the output lines have been processed.

this works because the tests call Ansible via subprocess_stream(), which is a wrapper for subprocess.Popen(), which runs the process in the background. And only when the interprocess pipe is emptied does the process actually end (otherwise it remains a zombie as the kernel caches IO for it).

>>> import subprocess
>>> from lib import util
>>> proc, lines = util.subprocess_stream(['ls', '/', '/sdasds'], stderr=subprocess.STDOUT, check=True)
2024-11-29 13:04:58 <stdin>:1: running: ls / /sdasds
>>> proc
<Popen: returncode: None args: ['ls', '/', '/sdasds']>   # notice rc:None
>>> for line in lines:
...   print(line)
... 
ls: cannot access '/sdasds': No such file or directory
/:
afs
bin
boot
dev
etc
home
lib
lib64
media
mnt
opt
proc
root
run
sbin
srv
sys
tmp
usr
var
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jjaburek/gitit/contest/lib/util/subprocess.py", line 48, in generate_lines
    raise subprocess.CalledProcessError(cmd=cmd, returncode=code)
subprocess.CalledProcessError: Command '['ls', '/', '/sdasds']' returned non-zero exit status 2.
>>>

But maybe an explicit return code check provides a better explanation than 'ansible-playbook -abcd' returned non-zero exit status ?

comps changed the title ~~don't scan after a failed Ansible remediation~~ Don't scan after a failed Ansible remediation Nov 28, 2024

mildas approved these changes Nov 29, 2024

View reviewed changes

mildas merged commit 97e1bc8 into main Nov 29, 2024
3 checks passed

mildas deleted the dont_scan_failing_ansible branch November 29, 2024 14:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't scan after a failed Ansible remediation #290

Don't scan after a failed Ansible remediation #290

comps commented Nov 28, 2024

mildas commented Nov 29, 2024

comps commented Nov 29, 2024

Don't scan after a failed Ansible remediation #290

Don't scan after a failed Ansible remediation #290

Conversation

comps commented Nov 28, 2024

mildas commented Nov 29, 2024

comps commented Nov 29, 2024