Communicate pipeline parse failure or warnings to users #1373
Comments
@ypid-geberit I think these tags might be best defined downstream of ECS, as what works in an Ingest Pipeline might not match Logstash or even third-party Extract-Transform-Load tools. Could you provide more context?
I have a strong preference for defining this in ECS. I know multiple log parser "frameworks" (Perl, Logstash, Ingest Pipelines and my current vendor-neutral favorite http://vector.dev/). I don’t see a reason why something should work for one but not the other. All will have something like grok and so on.
Sure, let's look at a practical example. This event is part of the integration testing of my Vector config:
{
"message": "<134>1 2021-03-22T17:10:11+01:00 - - - - [meta sysUpTime=\"no int\"] Invalid sysUpTime."
}
It is transformed into this for indexing to ES:
{
"@timestamp": "2021-03-22T16:10:11Z",
"__": {
"event": {
"hash": "fa31847bd9f47fa13459dd3c6922de1924168ed7b175eeea97a39aad01f30c99"
},
"id": "v5_fa31847bd9f47fa13459dd3",
"index_name": "log_other__v1_2021"
},
"ecs": {
"version": "1.9.0"
},
"event": {
"ingested": "Fixed timestamp in test mode.",
"kind": "event",
"original": "<134>1 2021-03-22T17:10:11+01:00 - - - - [meta sysUpTime=\"no int\"] Invalid sysUpTime.",
"severity": 6
},
"host": {},
"log": {
"flags": [
[
"parse_warning: syslog: Drop non-int field meta.sysUpTime: function call error for \"to_int\" at (1938:1968): Invalid integer \"no int\": invalid digit found in string",
"parse_warning: host.name* missing: Neither host.name nor host.name_rdns are known.",
"parse_warning"
]
],
"level": "info",
"syslog": {
"facility": {
"name": "local0"
}
}
},
"message": "Invalid sysUpTime.",
"tags": [
"parse_warning: syslog",
"parse_warning: host.name* missing"
]
}
My idea for what goes into
Hi @ypid-geberit, sorry for the slow reply, I was out of office. These are truly some great ideas. The concept of working with multiple log parser frameworks is inherent to Elastic Observability (see https://www.elastic.co/guide/en/beats/filebeat/7.13/filebeat-modules.html), and there is a built-in processor to add tags (see https://www.elastic.co/guide/en/beats/filebeat/7.13/add-tags.html). So the goal would be to respect the diversity of these sources while leveraging their commonality.
Summary
Define a field that can hold and communicate detailed failure and warning sentences populated by the log parser (Ingest Pipelines or something else). The tags field should additionally contain only a summary of the failures and warnings (similar to how Logstash does it).
Motivation:
When parsing logs, various failures or warnings can occur. Consider a source log in JSON format. If decoding the JSON does not work, this is a failure that the log parser cannot really recover from, other than by leaving the undecoded JSON in the message field.
But there are multiple cases where the parser can still do something. For example, a single field such as the user agent cannot be parsed or normalized. Or, if some quality assurance check on @timestamp fails, event.created could be used instead.
Some keywords to make this issue better searchable: _dateparsefailure, _grokparsefailure, QA
Detailed Design:
#1372, #1379