-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: Add new tests for Drain pattern matching #12655
Conversation
3d57d66
to
02faf08
Compare
pkg/pattern/drain/drain_test.go
Outdated
patterns: []string{ | ||
"<_> capacity <_>", | ||
"<_> capacity changes <_>", | ||
"{\"id\":\"D4Oh1ivB6cdLWa08\",\"level\":\"debug\",\"max-pool\":4,\"min-pool\":0,\"msg\":\"check capacity\",\"pending-builds\":0,\"running-builds\":0,\"server-buffer\":0,\"server-capacity\":0,\"server-count\":0,\"time\":\"2024-04-16T14:48:52Z\"}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use string with ` to avoid the pain of escaping.
pkg/pattern/drain/drain_test.go
Outdated
// At training, if at the depth of LogClusterDepth there is a cluster with | ||
// similarity coefficient greater that SimTh, then the log message is added | ||
// to that cluster. Otherwise, a new cluster is created. | ||
// | ||
// LogClusterDepth should be equal to the number of constant tokens from | ||
// the beginning of the message that likely determine the message contents. | ||
// | ||
// > In this step, Drain traverses from a 1-st layer node, which | ||
// > is searched in step 2, to a leaf node. This step is based on | ||
// > the assumption that tokens in the beginning positions of a log | ||
// > message are more likely to be constants. Specifically, Drain | ||
// > selects the next internal node by the tokens in the beginning | ||
// > positions of the log message | ||
LogClusterDepth: 8, | ||
// SimTh is basically a ratio of matching/total in the cluster. | ||
// Cluster tokens: "foo <*> bar fred" | ||
// Log line: "foo bar baz qux" | ||
// * * * x | ||
// Similarity of these sequences is 0.75 (the distance) | ||
// Both SimTh and MaxClusterDepth impact branching factor: the greater | ||
// MaxClusterDepth and SimTh, the less the chance that there will be | ||
// "similar" clusters, but the greater the footprint. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super cool thank you for that !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't take credit for this! I copied the descriptions and parameters over from
Line 30 in 491d251
var drainConfig = &drain.Config{ |
But now I think it makes more sense to unify it under the DefaultConfig function inside the drain package, at least until it becomes clear that we'd need different configs for different sets of logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love it, happy to merge as is to incrementally move forward.
I think we could add more and more format as we go. Specially popular ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
What this PR does / why we need it: