Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

logs: Use journalctl -o with-unit for default logs #3346

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

cgwalters
Copy link
Contributor

@cgwalters cgwalters commented Aug 30, 2023

Before systemd was created, syslog was the dominant Unix logging system.

Later, systemd introduced the journal whose output intentionally exactly matched that of syslog for compatibility.

In systemd, the "unit" is the technical heart of things; yet the syslog output only includes the "syslog identifier" which might or might not look like the unit.

For many units that just run shell script, the syslog identifier is just "bash" which is not useful.

The -o with-unit is in RHEL8 and above; it's not in RHEL7. I don't know if it is a goal of this project to ship the latest releases on very old operating systems. If it is, we can try to do dynamic detection.


Please place an 'X' inside each '[]' to confirm you adhere to our Contributor Guidelines

  • Is the commit message split over multiple lines and hard-wrapped at 72 characters?
  • Is the subject and message clear and concise?
  • Does the subject start with [plugin_name] if submitting a plugin patch or a [section_name] if part of the core sosreport code?
  • Does the commit contain a Signed-off-by: First Lastname email@example.com?
  • Are any related Issues or existing PRs properly referenced via a Closes (Issue) or Resolved (PR) line?

@packit-as-a-service
Copy link

Congratulations! One of the builds has completed. 🍾

You can install the built RPMs by following these steps:

  • sudo yum install -y dnf-plugins-core on RHEL 8
  • sudo dnf install -y dnf-plugins-core on Fedora
  • dnf copr enable packit/sosreport-sos-3346
  • And now you can install the packages.

Please note that the RPMs should be used only in a testing environment.

@TurboTurtle
Copy link
Member

Latest sos on EL 7 is sos-3.9. sos-4.0+ requires python3, so we don't have an EL7 target for recent/new releases.

As far as this change goes, I can see the use case here but I wonder about two points

  1. Effect on grokability for human technical support engineer inspecting logs
  2. Automation that may be affected by the change in filename

Point 2 highlights the need for automation tools to use the included manifest.json, but this is such a highly visible file that we'd at least want to make a sanity check here. Tagging @pmoravec @mhradile @jcastill on the RH side for opinions.

@cgwalters
Copy link
Contributor Author

Effect on grokability for human technical support engineer inspecting logs

It's a change, but the premise is that "grokability" improves with this. In some cases, by a lot; the syslog identifier can just vary wildly in value. I gave a few examples in other PRs, but in a nutshell, e.g.:

[root@cosa-devsh ~]# systemd-run -u testunit1 /bin/bash -c 'echo some debugging output'
Running as unit: testunit1.service
[root@cosa-devsh ~]# systemd-run -u testunit2 /bin/bash -c 'echo other debugging output'
Running as unit: testunit2.service
[root@cosa-devsh ~]# journalctl -u testunit1 -u testunit2
Aug 30 17:06:24 cosa-devsh systemd[1]: Started testunit1.service - /bin/bash -c echo some debugging output.
Aug 30 17:06:24 cosa-devsh bash[1440]: some debugging output
Aug 30 17:06:24 cosa-devsh systemd[1]: testunit1.service: Deactivated successfully.
Aug 30 17:06:29 cosa-devsh systemd[1]: Started testunit2.service - /bin/bash -c echo other debugging output.
Aug 30 17:06:29 cosa-devsh bash[1442]: other debugging output
Aug 30 17:06:29 cosa-devsh systemd[1]: testunit2.service: Deactivated successfully.
[root@cosa-devsh ~]# journalctl -u testunit1 -u testunit2 -o with-unit
Wed 2023-08-30 17:06:24 UTC cosa-devsh init.scope[1]: Started testunit1.service - /bin/bash -c echo some debugging output.
Wed 2023-08-30 17:06:24 UTC cosa-devsh testunit1.service[1440]: some debugging output
Wed 2023-08-30 17:06:24 UTC cosa-devsh init.scope[1]: testunit1.service: Deactivated successfully.
Wed 2023-08-30 17:06:29 UTC cosa-devsh init.scope[1]: Started testunit2.service - /bin/bash -c echo other debugging output.
Wed 2023-08-30 17:06:29 UTC cosa-devsh testunit2.service[1442]: other debugging output
Wed 2023-08-30 17:06:29 UTC cosa-devsh init.scope[1]: testunit2.service: Deactivated successfully.
[root@cosa-devsh ~]# 

In this example I ran the units serially, but if the units are run in parallel it can be really hard to figure out which logs correspond to which unit, as they're just attributed to bash in the syslog output.

Automation that may be affected by the change in filename

Ohh, right I didn't realize that the filename is changed. Hmm, well, we could override that?

@TurboTurtle
Copy link
Member

Actually, I didn't realize at first this was nested in CosLogs, and not the broader LogsBase.

I'm not sure at all about what/any automation Google is using on sos report coming from COS systems, but as written this wouldn't have any effect on RHEL boxes.

@cgwalters
Copy link
Contributor Author

but as written this wouldn't have any effect on RHEL boxes.

Oops, I admit to not testing this locally...

@cgwalters cgwalters marked this pull request as draft August 30, 2023 22:05
@cgwalters cgwalters marked this pull request as ready for review August 30, 2023 22:23
@cgwalters
Copy link
Contributor Author

(Now lightly tested locally)

I started to add an option to fall back to default syslog, but ISTM that because CosLogs uses option_list it means parent classes can't do so?

@arif-ali
Copy link
Member

arif-ali commented Sep 1, 2023

We have some automation of checking logs as well in Canonical, so will need to check these items before I would give a +1.

comparing the 2 outputs, the format of the timestamps looks different too. I will liaise with our team as well, if that's ok

@cgwalters
Copy link
Contributor Author

We have some automation of checking logs as well in Canonical, so will need to check these items before I would give a +1.

Of course, there's no urgency on this, and I agree we should have a sense of the potential fallout.

(But I'd say nearly every single time I've read a sosreport journal I've come across at least one ambiguous case where I wanted this)

@arif-ali
Copy link
Member

We have some automation of checking logs as well in Canonical, so will need to check these items before I would give a +1.

Of course, there's no urgency on this, and I agree we should have a sense of the potential fallout.

(But I'd say nearly every single time I've read a sosreport journal I've come across at least one ambiguous case where I wanted this)

Based on our internal discussions, this looks good, and should not affect our automation

so, ack from me, we just need to resolve the one test that is failing in the stage one testing

Before systemd was created, syslog was the dominant Unix logging
system.

Later, systemd introduced the journal whose output intentionally
exactly matched that of syslog for compatibility.

In systemd, the "unit" is the technical heart of things; yet
the syslog output only includes the "syslog identifier" which
might or might not look like the unit.

For many units that just run shell script, the syslog
identifier is just "bash" which is not useful.

The `-o with-unit` is in RHEL8 and above; it's not in RHEL7.  I don't
know if it is a goal of this project to ship the latest releases
on very old operating systems.  If it is, we can try to do dynamic
detection.

Signed-off-by: Colin Walters <walters@verbum.org>
@cgwalters
Copy link
Contributor Author

Thanks! I'm not familiar with avocado; took a blind attempt at a fix, if that doesn't work will dig in.

@TurboTurtle
Copy link
Member

I've restarted the CI jobs to cycle back on this - the logs have been purged so I'm no longer sure what the initial problem was. Assuming these tests fail again, we can merge once the failing test case is resolved.

Overall I think the change is fine, especially with @arif-ali reporting no ill effect on the Canonical side of the house.

@arif-ali
Copy link
Member

@cgwalters the tests need fixing if this is going to go through, do you have interest in pursuing this PR?

@arif-ali arif-ali added the Status/Need More Info Feedback is required to reproduce issue or to continue work label Nov 20, 2024
@jcastill jcastill added the Status/RedHat Eng RedHat Engineering has been requested to review label Nov 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status/Need More Info Feedback is required to reproduce issue or to continue work Status/RedHat Eng RedHat Engineering has been requested to review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants