Print more log messages to enable tracking of SLIs #110
Force-pushed: 023ba48 to 30689cc
Changes look good to me. I added one nitpick and have one more general comment, although I think that you were trying to keep the changes as minimal as possible.
Since you added the named logger in the hub plugin, I wonder whether it wouldn't be better to rework the builder plugin to also get and use a logger with the same name, instead of using self.logger.info() directly... 🤔
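For illustration, a minimal sketch of what sharing a named logger could look like. The logger name and both helper functions are hypothetical and not taken from the koji-osbuild code; the only thing relied on is that Python's `logging.getLogger()` returns the same logger object for the same name.

```python
import logging

# Hypothetical name; the actual name used in the hub plugin is not shown in this thread.
LOGGER_NAME = "koji.plugin.osbuild"


def hub_logger():
    """Logger as the hub plugin might obtain it (illustrative helper)."""
    return logging.getLogger(LOGGER_NAME)


def builder_logger():
    """Logger as the builder plugin might obtain it (illustrative helper)."""
    return logging.getLogger(LOGGER_NAME)


# Both plugins end up with the very same logger object, so their
# messages share one name and one configuration.
same = hub_logger() is builder_logger()
```

With this approach both plugins' messages can be filtered by a single logger name.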
Force-pushed: 883a0fc to 8e80eb2
Indeed.
That might make sense!
LGTM, although I would still use the named logger in the builder plugin just for the sake of consistency 😇
Love this, thanks @ochosi! Will this allow us to measure the end-to-end success ratio in the builder (and the hub for that matter), i.e. not only that the build was started successfully but that it completed with success?
If a compose id is returned we assume that the compose worked, yes. There is no further evidence about the state of the compose in koji-osbuild, as far as I can tell. The rest would be in composer. We can hopefully somehow plot this side-by-side though, even if we cannot combine the metrics. In the hub we can only measure whether the (valid) requests that come in were correctly added to Koji's database, because that's really all the hub plugin does. (It validates the JSON and then creates a task in the db.)
Force-pushed: 5a60399 to aa5e8b4
I apologize for the many force pushes. I hope the PR is ok now.
Well, there is koji-osbuild/plugins/builder/osbuild.py Line 722 in 741be47
Very true, that's already appearing in the Splunk logs. The idea of tracking the compose id as a service level indicator separately is that we cannot guarantee that the compose will be successful. It seems interesting and worth tracking, but it is (as far as I understand this) not directly related to the plugin's functionality.
When everything works as expected, each 'Task id' corresponds to a 'Compose id'. To track both in Splunk and measure our first service level indicator (SLI), we need to explicitly log the 'Task id' when it is received by the plugin.
Log both the entrypoint and the return value from adding a task to Koji's database. We can measure both to ensure a task has been successfully added to the database as a service level indicator.
This is so that new pylint errors with the version in Fedora 37 can be fixed in a separate, subsequent PR.
Force-pushed: aa5e8b4 to 70ab44c
👍
Perfect :)
I think we should track both. If the "outer" SLI fails, it will be down to either the "inner" one or composer, so it is good to have the possibility of digging deeper. But I think the interesting thing to measure is the overall success rate, whether issues are due to the plugin or our dependencies.
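As a rough sketch of the success rate being discussed: given the task ids that were logged as received and the task ids for which a compose id was logged, the ratio could be computed as below. The function name and its inputs are hypothetical; in practice this would be a Splunk query over the ingested logs rather than Python.

```python
def success_ratio(received_task_ids, completed_task_ids):
    """Fraction of received tasks that also produced a completion message.

    Returns None when no task was received, since the ratio is undefined.
    """
    received = set(received_task_ids)
    if not received:
        return None
    # Only count completions that belong to a task we saw being received.
    completed = received & set(completed_task_ids)
    return len(completed) / len(received)


# Example: four tasks received, three of them returned a compose id.
ratio = success_ratio([1, 2, 3, 4], [1, 2, 4])
```

The same calculation applies to the "inner" and the "outer" SLI; only the pair of log messages being counted differs.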
We also have a
That sounds like a very good idea. I wonder if the retries are already logged in some way or if we have to log them explicitly.
@diaasami was tracking retries at some point, not sure if these are the same ones though.
Yes, these are still being tracked and the frequency has decreased a lot; we have a week without a single retry every few weeks.
They are not, the retries in
In order to track service level indicators, we need to print more logs.
For the Koji Hub plugin, this means we print a message each time a request is received by the plugin. A second log message is printed once the task has been successfully created in Koji Hub's database. This will also allow us to measure the time it takes for Koji Hub to insert a task into its database.
Example messages:
For the Koji Builder plugin, we print a log message once a task has been received. We print a second log message once composer returns a compose id. This means the plugin completed one iteration successfully.
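The two-message pattern described for the builder plugin could look roughly like the sketch below. The logger name, the message wording, and the `submit_compose` callable are all assumptions made for illustration; they are not the actual messages or functions in koji-osbuild.

```python
import logging

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("koji.plugin.osbuild")


def run_with_sli_logging(task_id, submit_compose):
    """Log when a task is received and again when a compose id comes back.

    `submit_compose` is a hypothetical stand-in for the call to composer.
    If it raises, the second message is never logged, which is what
    makes the pair of messages usable as a success indicator.
    """
    logger.info("Task received: task id %s", task_id)
    compose_id = submit_compose(task_id)
    logger.info("Compose created: task id %s, compose id %s", task_id, compose_id)
    return compose_id
```

The hub plugin would follow the same shape, with the second message logged after the task has been created in the database.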
These logs are ingested into Splunk. In a dashboard we can then track the two subsequent requests and the duration between them.
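To make the pairing logic concrete, here is a small Python sketch of how the duration between the two messages could be derived per task. The event tuple format and the `received`/`completed` labels are assumptions for this example; the real computation would be done by a Splunk search on the dashboard.

```python
from datetime import datetime


def durations_by_task(events):
    """Return {task_id: seconds} between the 'received' and 'completed' events.

    `events` is an iterable of (iso_timestamp, task_id, kind) tuples in
    chronological order, where kind is 'received' or 'completed'.
    Tasks without a completion event are left out of the result.
    """
    received = {}
    durations = {}
    for timestamp, task_id, kind in events:
        when = datetime.fromisoformat(timestamp)
        if kind == "received":
            received[task_id] = when
        elif kind == "completed" and task_id in received:
            durations[task_id] = (when - received[task_id]).total_seconds()
    return durations


# Made-up sample data: task 1 completes after five seconds, task 2 never completes.
sample = [
    ("2022-11-16T10:00:00", 1, "received"),
    ("2022-11-16T10:00:05", 1, "completed"),
    ("2022-11-16T10:01:00", 2, "received"),
]
result = durations_by_task(sample)
```

Tasks that appear only as `received` are the ones worth inspecting when the success ratio drops.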
The newer version of pylint in the just-released Fedora 37 fails on the current code base, so I pinned the container; this can be addressed in a separate, subsequent PR.