Cleaner log messages. #3647
With timestamps, these log lines are really verbose:
We could at least put the second timestamp ("received at"), which is normally the least relevant bit of info, at the end of the line:
And maybe even omit it if it is sufficiently similar to the log timestamp (within a few seconds, say):
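A minimal sketch of that rule, assuming a hypothetical `format_event` helper and a 5 second tolerance (neither is from the Cylc code base): the event time is appended at the end of the line only when it differs noticeably from the log timestamp.

```python
from datetime import datetime, timedelta


def format_event(log_time, event_time, message, tolerance=timedelta(seconds=5)):
    """Build a log line; append the event time only if it is not close to log_time.

    `format_event` and the default tolerance are illustrative assumptions.
    """
    line = f"{log_time.isoformat()} INFO - {message}"
    if abs(log_time - event_time) > tolerance:
        # The event happened noticeably earlier (or later) than it was logged,
        # so the second timestamp carries real information: put it last.
        line += f" (at {event_time.isoformat()})"
    return line


log_time = datetime.fromisoformat("2020-05-29T12:24:34+12:00")
near = datetime.fromisoformat("2020-05-29T12:24:33+12:00")
far = datetime.fromisoformat("2020-05-29T12:20:00+12:00")

print(format_event(log_time, near, "foo.1 #01 succeeded"))
print(format_event(log_time, far, "foo.1 #01 succeeded"))
```

The tolerance is a keyword argument so that "a few seconds" stays a tunable choice rather than a hard-coded one.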
|
We should definitely clean up log messages. The only real barrier to doing so has been the functional tests, which make exact comparisons to log messages and often do nasty things like
Here are my thoughts on cleaning the log files:
1) State change messages
- 2020-05-29T12:24:34+12:00 INFO - [foo.1] status=running: (received)succeeded at 2020-05-29T12:24:33+12:00 for job(01)
+ 2020-05-29T12:24:34+12:00 INFO - foo.1 #01 succeeded
2) Make submitting messages comprehensible
- 2020-06-04T09:54:10Z INFO - [foo.1] -submit-num=01, owner@host=<user>@<host>
+ 2020-06-04T09:54:10Z INFO - foo.1 #01 submitting on <host>
3) Only log client commands if they are actually informational
- 2020-06-04T10:14:21Z INFO - [client-command] put_messages <user>@<host>:cylc-message
- 2020-06-04T10:14:26Z INFO - [client-command] graphql <user>@<host>.local:cylc-tui
4) Consider demoting triggering items to DEBUG
- 2020-06-04T10:18:15Z INFO - [foot.20191209T1200Z] -triggered off []
+ 2020-06-04T10:18:15Z DEBUG - foot.20191209T1200Z - triggered off []
5) Demote task failure messages from CRITICAL to INFO
- 2020-06-04T09:54:10Z CRITICAL - [foo.1] status=running: (received)failed/EXIT at 2020-06-04T10:18:18Z for job(01)
- 2020-06-04T09:54:10Z CRITICAL - [foo.1] -job(01) failed
+ 2020-06-04T09:54:10Z INFO - foo.1 #01 failed (EXIT)
6) Consider demoting task messages to a new log level
- 2020-06-04T09:54:10Z INFO - [foo.1] status=running: (received)succeeded at 2020-05-29T12:24:33+12:00 for job(01)
+ 2020-06-04T09:54:10Z TASK - foo.1 #01 succeeded
- 2020-06-04T09:54:10Z INFO - [bar.1] status=running: (received)failed/EXIT at 2020-06-04T10:18:18Z for job(01)
+ 2020-06-04T09:54:10Z TASK - bar.1 #01 failed (EXIT)
7) Consider tidying health check messages
- 2020-06-04T10:16:36Z INFO - [foo.1] -health check settings: execution timeout=None
Or:
- 2020-06-04T10:16:36Z INFO - [foo.1] -health check settings: execution timeout=None
+ 2020-06-04T10:16:36Z DEBUG - foo.1 health check settings: execution timeout=None
8) Change
|
@oliver-sanders, I agree with all suggestions above. I like the idea of using a TASK logging level for task messages, so long as they are not "demoted" in the sense that a deliberate action is required to turn that level on, because task state changes are pretty much the whole purpose of the scheduler. |
Also, not very difficult to implement. It will break a bunch of functional tests, but they'll be relatively easy to fix. I'll optimistically attach a "small" label 😁 |
Agreed, it is a small task, it's just that there are lots of loose ends to tidy up!
When we create the suite loggers we would set the level to A curious side-effect would be the ability to create pure suite or task logs. |
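For illustration, a rough sketch of how a custom TASK level and a "pure task log" could look with the standard `logging` module. The numeric value 25, the logger name and the `OnlyLevel` filter are all assumptions made for this example, not anything decided in this thread.

```python
import io
import logging

# Assumption: TASK sits between INFO (20) and WARNING (30), so it is shown
# whenever INFO is shown and needs no deliberate action to turn on.
TASK = 25
logging.addLevelName(TASK, "TASK")


class OnlyLevel(logging.Filter):
    """Pass only records logged at exactly one level (hypothetical helper)."""

    def __init__(self, level):
        super().__init__()
        self.level = level

    def filter(self, record):
        return record.levelno == self.level


# A handler that receives nothing but TASK records: a "pure task log".
task_stream = io.StringIO()
task_handler = logging.StreamHandler(task_stream)
task_handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))
task_handler.addFilter(OnlyLevel(TASK))

logger = logging.getLogger("sketch.scheduler")  # illustrative logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(task_handler)

logger.log(TASK, "foo.1 #01 succeeded")
logger.info("a scheduler-level message")  # filtered out of the task log

print(task_stream.getvalue(), end="")
```

A second handler with the inverse filter would give the pure suite log in the same way.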
Oh, and another one:
9) Remove the debug line which accompanies state changes.
- 2020-06-11T15:24:27+01:00 INFO - [foo.1] status=ready: (internal)submitted at 2020-06-11T15:24:26+01:00 for job(01)
- 2020-06-11T15:24:27+01:00 DEBUG - [foo.1] -ready => submitted
+ 2020-06-11T15:24:27+01:00 TASK - foo.1 #01 submitted |
Post SoD I'll add another one:
10) Remove flow labels if only one flow is active.
- 2020-06-11T15:24:27+01:00 INFO - [foo.1] status=ready: (internal)submitted at 2020-06-11T15:24:26+01:00 for job(01) flow(X)
+ 2020-06-11T15:24:27+01:00 TASK - foo.1 #01 submitted |
So long as we're careful with the definition of "active" here. If we hold the main flow, then trigger a reflow, then resume the main flow later after the reflow is done, it should not look like (in the log) it was all the same flow because there was only ever one of them "active" at a time. |
Would have thought that |
Yes, that will be sufficient 👍 (Just saying, "active" has to mean "there are tasks present with this flow label" - but those tasks don't themselves have to be "active"!) |
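A small sketch of that definition of "active", assuming a hypothetical `task_log_prefix` helper that is handed the flow labels of every task currently present in the pool (the real task pool API is not shown in this thread). The label is printed only when more than one flow label is present, regardless of whether the tasks carrying them are themselves active.

```python
def task_log_prefix(task_id, submit_num, flow_label, flow_labels_in_pool):
    """Return the task part of a log line, adding the flow label only if needed.

    `flow_labels_in_pool` is assumed to hold the flow label of every task
    present in the pool, including held or otherwise inactive tasks.
    """
    prefix = f"{task_id} #{submit_num:02d}"
    if len(set(flow_labels_in_pool)) > 1:
        # Two or more flows have tasks present, so the label disambiguates.
        prefix += f" flow({flow_label})"
    return prefix


# Only one flow has tasks present: no label.
print(task_log_prefix("foo.1", 1, "X", ["X", "X"]))
# Main flow held (its tasks are still present) while a reflow runs: labels shown.
print(task_log_prefix("foo.1", 1, "Y", ["X", "Y"]))
```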
A new logging level between |
(In case anyone plans to work on this in the near future, note there are some cosmetic changes to task proxy logging already coming in #4300). |
Tagging this against 8.0.0, it would be good to tackle some of this during the 8.0.0 stabilisation period. |
I'll assign myself - I'd intended to poke this anyways |
Thoughts on updates to @oliver-sanders' proposal in the light of looking at the current code:
1) State change messages
- [foo.1 job:01 flows:1 preparing] (internal)submitted 2020-05-29T12:24:34+12:00
+ 2020-05-29T12:24:34+12:00 INFO - [flow:1 1/foo #01 submitted]
2) Make submitting messages comprehensible
- 2020-06-04T09:54:10Z INFO - [foo.1] -submit-num=01, owner@host=<user>@<host>
+ 2020-06-04T09:54:10Z INFO - [flow:1 foo.1 #01] submitting on <host>
5) Demote task failure messages from CRITICAL to INFO (@wxtim votes for WARNING)
- 2020-06-04T09:54:10Z CRITICAL - [foo.1] status=running: (received)failed/EXIT at 2020-06-04T10:18:18Z for job(01)
- 2020-06-04T09:54:10Z CRITICAL - [foo.1] -job(01) failed
+ 2020-06-04T09:54:10Z INFO - foo.1 #01 failed (EXIT) |
Ideas:
|
Updated terminology (for logging of failures):
|
True if the only flow number in the whole run is
It's not really duplicate info, because scanning back up the log to find the most recent previous state change is painful. And at state transitions we log
👍
👍
👍 |
Flow number logging proposal
On current master, flow numbers are not part of the task/job ID so should probably go outside of the square brackets.
Note |
Can we go for warning? I'm not sure I think that every path is equal, and would like to give this more weight... |
I'd say, failure is no more important than success for scheduling (for the graph), if handled. But for logging purposes failure should still be highlighted. It shows, by definition, that a task failed to do what it is supposed to do, and even if the failure is handled we might still want to know that handling was required. E.g. if an occasional failure becomes a regular event maybe the failing task should be modified rather than "handled". |
|
In light of my comment above, this needs arguing. IMO it stays CRITICAL or at least WARNING in the log. |
If we shorten |
This pattern of log messages is somewhat confusing:
This means that the task went through the queue and was then prepared for submission. But the But we should be able to go straight from |
Agreed, that's annoying, but it might be somewhat tricky to solve because tasks don't necessarily go direct from queued to preparing. A task waiting on other prerequisites, or xtriggers, may not be queued. It might be easier, and still an improvement to skip the initial plain waiting message:
... because (as I recall) all tasks go to queued at first, and are then immediately released to waiting if not currently queue-limited. |
[From offline chat] It would be really nice to have at least one "verbose" logging level between INFO and DEBUG. [This was already suggested somewhere above on this issue by @MetRonnie, so here's my take on it.] Note the 3rd party package: https://pypi.org/project/verboselogs and https://github.com/xolox/python-verboselogs. It is a trivial extension of Python logging, which we could easily do ourselves to avoid the dependency. Then I'd suggest something like this:
|
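For reference, a minimal sketch of doing the "trivial extension" ourselves with only the standard library. The level value 15 and the logger name are assumptions for this example; the verboselogs package is not used here.

```python
import io
import logging

# Assumption: VERBOSE sits midway between DEBUG (10) and INFO (20).
VERBOSE = 15
logging.addLevelName(VERBOSE, "VERBOSE")

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))

log = logging.getLogger("sketch.verbose")  # illustrative logger name
log.propagate = False
log.addHandler(handler)

# At the default INFO level, verbose messages are dropped.
log.setLevel(logging.INFO)
log.log(VERBOSE, "hidden at INFO")

# One step more detail shows VERBOSE but still hides DEBUG.
log.setLevel(VERBOSE)
log.log(VERBOSE, "shown at VERBOSE")
log.debug("still hidden at VERBOSE")

print(stream.getvalue(), end="")
```

Using `log.log(VERBOSE, ...)` avoids subclassing `logging.Logger`; a `verbose()` convenience method could be added later if the call sites get noisy.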
When we get a chance, this issue needs reviewing:
|
At INFO level, we should just log the timestamped (optionally: #3646) sequence of events in the simplest way possible, unless something goes wrong. There's too much usually-irrelevant crap in there.
E.g. (with two timestamps removed):
This should be something like:
No need to log current status unless it is not the expected status.
And no need to log "received" either, just modify slightly for the polled case:
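A rough sketch of both rules, assuming a hypothetical `event_line` helper and an invented output format (the `(polled)` marker and the bracketed status note are placeholders, not a proposed final wording): the current status is mentioned only when it is not the expected one, and "received" is never printed.

```python
def event_line(task_id, event, status, expected_status, polled=False):
    """Format a task event for the INFO log (illustrative format only).

    The status is mentioned only when it differs from `expected_status`;
    received messages get no marker, polled ones get "(polled)".
    """
    parts = [task_id]
    if polled:
        parts.append("(polled)")
    parts.append(event)
    if status != expected_status:
        # Something unusual happened, so the extra detail is worth the space.
        parts.append(f"[status was {status}, expected {expected_status}]")
    return " ".join(parts)


print(event_line("foo.1", "succeeded", "running", "running"))
print(event_line("foo.1", "succeeded", "running", "running", polled=True))
print(event_line("foo.1", "succeeded", "submitted", "running"))
```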
Pull requests welcome!