improve log deduplication by having a log parser do automatically discovery of log patterns #807

Closed
neoakris opened this issue Jul 28, 2019 · 6 comments
Labels
component/loki stale A stale issue or PR that will automatically be closed. type/feature Something new we should do

Comments

@neoakris

Is your feature request related to a problem? Please describe.
Datadog was too expensive for me, but I loved its log-pattern feature: https://www.datadoghq.com/blog/log-patterns/
(Basically, if I have 1000 logs in a timespan, it'll tell me I have ~30 unique patterns and give a frequency count of how often each pattern appears.)

Describe the solution you'd like
https://github.com/logpai/logparser
describes log parsers that can do pattern recognition.
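
To make the idea concrete, here is a minimal sketch of pattern discovery with frequency counts. It is not how Datadog or the logpai/logparser algorithms work: it simply masks tokens that look variable (IP addresses, hex values, integers) and counts the resulting templates. The masking rules, placeholder names, and function names are assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed masking rules: which token shapes count as "variable" is a guess.
# Real log parsers infer templates from the data instead of hard-coding them.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_pattern(line: str) -> str:
    """Replace variable-looking tokens with placeholders."""
    for regex, placeholder in _MASKS:
        line = regex.sub(placeholder, line)
    return line


def count_patterns(lines):
    """Return (pattern, frequency) pairs, most frequent first."""
    return Counter(to_pattern(line) for line in lines).most_common()


if __name__ == "__main__":
    sample = [
        "GET /api/users/42 took 13 ms",
        "GET /api/users/7 took 250 ms",
        "connection from 10.0.0.5 refused",
        "connection from 10.0.0.9 refused",
        "GET /api/users/99 took 8 ms",
    ]
    for pattern, freq in count_patterns(sample):
        print(freq, pattern)
```

With the sample above, five log lines collapse into two patterns, which is the kind of summary the feature request is after.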

Describe alternatives you've considered
Log deduplication is a step in the right direction, maybe it could be improved further without having to incorporate log pattern recognition.

Additional context
(I added a screenshot snippet from their webpage in case they change the webpage.)
image

@neoakris neoakris changed the title improve log deduplication by having a log parser to automatic discovery of log patterns improve log deduplication by having a log parser do automatically discovery of log patterns Jul 28, 2019
@cyriltovena
Contributor

cyriltovena commented Jul 29, 2019

Interesting feature indeed. However, I'm not sure whether we should implement this directly in Loki or within Grafana Explore.

WDYT @davkal ?

@cyriltovena cyriltovena added component/loki type/feature Something new we should do labels Jul 29, 2019
@neoakris
Author

neoakris commented Jul 29, 2019

That's a really good point. If you implement it within Grafana, then every data source (Loki, Elasticsearch, and potentially others) could benefit from the feature, rather than it being Loki-specific.

@cyriltovena
Contributor

related #28

@neoakris
Author

neoakris commented Jul 30, 2019

(I'm not sure whether the following belongs in a different ticket, but it's similar enough that I think it makes sense to include it here. If I'm wrong about this, I'll move it to a separate feature request.)

Feature request: It'd be great if we could show only the unique values in tables.

image
^
Notice the options for pages 1, 2, and 3 in the picture: there could be 200-300 instances of that repeated pattern, with some unique values mixed in.

(The table above uses:)
- datasource: Elasticsearch
- Table Transform: JSON Data
- Metric: Raw Document (it has no filter options, which makes sense for a raw document, but it'd be cool if we could apply filters after the JSON has been transformed into a table)
- (Note: there is a Metric for Unique Count, but it isn't very useful and doesn't give a good end-user experience.)

Background info about the data in elasticsearch:
I have an nginx ingress controller log shipper that enriches logs.
For example, let's say the original log message is:
"HTTP GET 304 from upstream pod monitoring-kube-grafana-80 "
It'll add tags to the log like Method = GET, Code = 304, upstream = monitoring-kube-grafana-80
(which makes them easier to search).
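
As a rough illustration of that enrichment step, the sketch below extracts the three tags from the example message with a regular expression. The message layout is assumed from the single example above, and `LOG_RE` and `enrich` are hypothetical names; the actual log shipper may work quite differently.

```python
import re

# Assumed layout, inferred from the one example message in this comment.
LOG_RE = re.compile(
    r"HTTP (?P<Method>[A-Z]+) (?P<Code>\d{3}) from upstream pod (?P<upstream>\S+)"
)


def enrich(message: str) -> dict:
    """Return the original message plus any tags that could be extracted."""
    record = {"message": message}
    match = LOG_RE.search(message)
    if match:
        # Named groups become the tag names: Method, Code, upstream.
        record.update(match.groupdict())
    return record


if __name__ == "__main__":
    print(enrich("HTTP GET 304 from upstream pod monitoring-kube-grafana-80 "))
```

Messages that do not match the assumed layout are passed through with only the `message` field, so nothing is dropped.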

Other Useful Info:
A Lucene query doesn't offer a way to return only the unique values, so I can't write a query for unique values; I have to do it on the Grafana side.
Ironically, Grafana's variable query syntax does let me get the unique values into a Grafana variable, but then I can't put the contents of that variable in a table, and even if I could, Grafana filters wouldn't work right:
$UniqueHTTPMethods = {"find": "terms", "field": "nginx.access.method", "query": "service.type:nginx"}
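
For illustration, this is roughly what "show only the unique values" could mean once the raw documents have been turned into table rows: collapse identical rows and keep a count. This is a client-side sketch in Python, not Grafana code; `unique_rows` and the extra `count` column are assumptions.

```python
from collections import Counter


def unique_rows(rows, columns):
    """Collapse rows that agree on `columns`; add a `count` of how many were merged."""
    counts = Counter(tuple(row.get(col) for col in columns) for row in rows)
    # Most frequent combinations first; ties keep first-seen order.
    return [dict(zip(columns, key), count=n) for key, n in counts.most_common()]


if __name__ == "__main__":
    rows = [
        {"Method": "GET", "Code": "304"},
        {"Method": "GET", "Code": "304"},
        {"Method": "POST", "Code": "500"},
    ]
    for row in unique_rows(rows, ["Method", "Code"]):
        print(row)
```

A table built this way would need one row per distinct combination instead of several pages of repeats, while the unusual values remain visible.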

@stale

stale bot commented Sep 3, 2019

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale A stale issue or PR that will automatically be closed. label Sep 3, 2019
@stale stale bot closed this as completed Sep 10, 2019
@alanhe

alanhe commented Oct 25, 2021

Datadog's blog post looks great. Can we reconsider this feature?

I'm thinking of a few ways this might work:

  1. Integrate custom dedup algorithms as Loki plugins. Make Loki implement Grafana's SPI to:
  • list the available (built-in and custom) dedup algorithms for users to choose from, and
  • send a bunch of logs to Loki for deduplication.
  2. Allow users to add some kind of "Dedup data source", so that users can choose whichever implementation they like as long as it sticks to the protocol.

(We are storing logs in Loki, so I guess Loki plugins might have more insight into the data and might be more performant.)
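
To sketch what such a protocol might look like, the example below defines the two operations listed above (list algorithms, deduplicate a batch of logs) as a Python `Protocol`, with a trivial exact-match implementation. Everything here is hypothetical: `DedupSource`, `ExactMatchDedup`, and the method names are not part of Grafana or Loki.

```python
from typing import Dict, List, Protocol, Tuple


class DedupSource(Protocol):
    """Hypothetical contract a 'Dedup data source' would have to satisfy."""

    def list_algorithms(self) -> List[str]:
        ...

    def deduplicate(self, lines: List[str], algorithm: str) -> List[Tuple[str, int]]:
        ...


class ExactMatchDedup:
    """Simplest possible implementation: identical lines are merged and counted."""

    def list_algorithms(self) -> List[str]:
        return ["exact"]

    def deduplicate(self, lines: List[str], algorithm: str) -> List[Tuple[str, int]]:
        if algorithm not in self.list_algorithms():
            raise ValueError(f"unknown algorithm: {algorithm}")
        counts: Dict[str, int] = {}
        for line in lines:
            counts[line] = counts.get(line, 0) + 1
        # dicts preserve insertion order, so results follow first appearance.
        return list(counts.items())


if __name__ == "__main__":
    source: DedupSource = ExactMatchDedup()
    print(source.list_algorithms())
    print(source.deduplicate(["a", "b", "a"], "exact"))
```

A pattern-based implementation could be swapped in behind the same two methods, which is the point of agreeing on a protocol first.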

If neither #1 nor #2 is available in the near future, would it be possible to do this as a Grafana plugin? Does Grafana have extension points that might help?


3 participants