
Communicate pipeline parse failure or warnings to users #1373

Open
ypid-geberit opened this issue Apr 21, 2021 · 3 comments
Labels
enhancement New feature or request


@ypid-geberit
Contributor

ypid-geberit commented Apr 21, 2021

Summary

Define a field that can hold and communicate detailed failure and warning messages populated by the log parser (Ingest Pipelines or others). The tags field should additionally contain only a summary of the failures and warnings (similar to how Logstash does it).
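A minimal Python sketch of the proposed split, assuming events are plain dicts shaped like ECS documents. The helper record_issue is hypothetical, and the "<level>: <summary>: <detail>" message layout is borrowed from the Vector example further down in this thread rather than from any agreed ECS convention. Unlike that example, this sketch keeps log.flags as a flat list of strings.

```python
from typing import Any, Dict


def record_issue(event: Dict[str, Any], level: str, summary: str, detail: str) -> None:
    """Record a pipeline problem on an ECS-like event (hypothetical helper).

    level:   e.g. "parse_warning" or "parse_failure" (assumed keywords)
    summary: short text that also goes into `tags`
    detail:  full explanation that only goes into `log.flags`
    """
    flags = event.setdefault("log", {}).setdefault("flags", [])
    # Detailed, human-readable message for troubleshooting.
    flags.append(f"{level}: {summary}: {detail}")
    # Bare level keyword, added once, so an exact-match search on it finds the event.
    if level not in flags:
        flags.append(level)
    # Short summary for end users, shown by default next to the event.
    tag = f"{level}: {summary}"
    tags = event.setdefault("tags", [])
    if tag not in tags:
        tags.append(tag)


event = {"message": "Invalid sysUpTime."}
record_issue(event, "parse_warning", "syslog", "Drop non-int field meta.sysUpTime")
record_issue(event, "parse_warning", "host.name* missing",
             "Neither host.name nor host.name_rdns are known.")
print(event["tags"])  # ['parse_warning: syslog', 'parse_warning: host.name* missing']
```

Appending the bare level keyword as its own element is what lets a search for parse_warning match without having to parse the detail strings.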

Motivation:

When parsing logs, various failures or warnings can occur. Consider a source log in JSON format. If decoding the JSON fails, this is a failure that the log parser cannot really recover from, other than by leaving the undecoded JSON in the message field.
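For the unrecoverable JSON case, a sketch in the same style might look like this. The parse_failure keyword, the json summary, and copying the raw text to event.original on success are assumptions for illustration, not the behaviour of any particular parser.

```python
import json
from typing import Any, Dict


def _flag_failure(event: Dict[str, Any], detail: str) -> None:
    # Detailed reason goes to log.flags, short summary to tags (assumed names).
    event.setdefault("log", {}).setdefault("flags", []).extend(
        [f"parse_failure: json: {detail}", "parse_failure"]
    )
    event.setdefault("tags", []).append("parse_failure: json")


def decode_json_message(event: Dict[str, Any]) -> Dict[str, Any]:
    """Decode `message` as a JSON object, or leave it untouched and flag the failure."""
    raw = event.get("message", "")
    try:
        decoded = json.loads(raw)
    except json.JSONDecodeError as err:
        # Unrecoverable: the undecoded JSON stays in `message`.
        _flag_failure(event, str(err))
        return event
    if not isinstance(decoded, dict):
        # Valid JSON, but nothing that can be merged into the event.
        _flag_failure(event, f"expected a JSON object, got {type(decoded).__name__}")
        return event
    # Success: keep the raw text as event.original, then merge the decoded fields.
    event.setdefault("event", {})["original"] = raw
    del event["message"]
    event.update(decoded)
    return event


bad = decode_json_message({"message": '{"user": "alice"'})
print(bad["tags"])  # ['parse_failure: json']
good = decode_json_message({"message": '{"user": "alice"}'})
```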

In many other cases, however, the parser can still do something useful. For example, a single field, such as the user agent, might fail to be parsed or normalized. Or, if a quality-assurance check on @timestamp fails, event.created could be used instead.
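A sketch of the recoverable @timestamp case. The QA rule used here (a source timestamp more than five minutes ahead of event.created is implausible) and its threshold are hypothetical. The code also assumes both values are ISO 8601 strings with explicit UTC offsets, since Python's datetime.fromisoformat only accepts a trailing Z from Python 3.11 on.

```python
from datetime import datetime, timedelta
from typing import Any, Dict

# Hypothetical QA threshold: a source timestamp further ahead of
# event.created than this is treated as implausible.
MAX_CLOCK_SKEW = timedelta(minutes=5)


def check_timestamp(event: Dict[str, Any]) -> Dict[str, Any]:
    """Fall back to event.created when @timestamp is missing, invalid, or implausible."""
    created = datetime.fromisoformat(event["event"]["created"])
    try:
        ts = datetime.fromisoformat(event["@timestamp"])
        skew = ts - created  # TypeError if only one side has a UTC offset
    except (KeyError, TypeError, ValueError) as err:
        reason = f"unusable @timestamp ({err!r})"
    else:
        if skew <= MAX_CLOCK_SKEW:
            return event  # passes QA, nothing to record
        reason = f"@timestamp is {skew} ahead of event.created"
    # Recoverable: use event.created instead and record a warning.
    event["@timestamp"] = event["event"]["created"]
    event.setdefault("log", {}).setdefault("flags", []).extend(
        [f"parse_warning: @timestamp QA: {reason}", "parse_warning"]
    )
    event.setdefault("tags", []).append("parse_warning: @timestamp QA")
    return event


ev = {"@timestamp": "2030-01-01T00:00:00+00:00",
      "event": {"created": "2021-03-22T16:10:12+00:00"}}
check_timestamp(ev)
print(ev["@timestamp"])  # 2021-03-22T16:10:12+00:00
```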

Some keywords to make this issue easier to find: _dateparsefailure, _grokparsefailure, QA

Detailed Design:

#1372, #1379

ypid-geberit added the enhancement (New feature or request) label on Apr 21, 2021
ypid-geberit changed the title from "Communicate parse failure or warnings to users" to "Communicate pipeline parse failure or warnings to users" on Apr 26, 2021
@djptek
Contributor

djptek commented May 4, 2021

@ypid-geberit I think these tags might be best defined downstream of ECS, as what works in an Ingest Pipeline might not match Logstash or even third-party Extract-Transform-Load (ETL) tools.

Could you provide more context?

@ypid-geberit
Contributor Author

ypid-geberit commented May 4, 2021

best defined downstream of ECS

I have a strong preference for defining this in ECS. I know of multiple log parser "frameworks" (Perl, Logstash, Ingest Pipelines, and my current vendor-neutral favorite, http://vector.dev/). I don't see why something should work for one but not the others. All of them have something like grok, and so on.

Could you provide more context?

Sure, let's look at a practical example. This event is part of the integration tests for my Vector config:

{
  "message": "<134>1 2021-03-22T17:10:11+01:00 - - - - [meta sysUpTime=\"no int\"] Invalid sysUpTime."
}

It is transformed into this for indexing to ES:

{
  "@timestamp": "2021-03-22T16:10:11Z",
  "__": {
    "event": {
      "hash": "fa31847bd9f47fa13459dd3c6922de1924168ed7b175eeea97a39aad01f30c99"
    },
    "id": "v5_fa31847bd9f47fa13459dd3",
    "index_name": "log_other__v1_2021"
  },
  "ecs": {
    "version": "1.9.0"
  },
  "event": {
    "ingested": "Fixed timestamp in test mode.",
    "kind": "event",
    "original": "<134>1 2021-03-22T17:10:11+01:00 - - - - [meta sysUpTime=\"no int\"] Invalid sysUpTime.",
    "severity": 6
  },
  "host": {},
  "log": {
    "flags": [
      [
        "parse_warning: syslog: Drop non-int field meta.sysUpTime: function call error for \"to_int\" at (1938:1968): Invalid integer \"no int\": invalid digit found in string",
        "parse_warning: host.name* missing: Neither host.name nor host.name_rdns are known.",
        "parse_warning"
      ]
    ],
    "level": "info",
    "syslog": {
      "facility": {
        "name": "local0"
      }
    }
  },
  "message": "Invalid sysUpTime.",
  "tags": [
    "parse_warning: syslog",
    "parse_warning: host.name* missing"
  ]
}

My idea for what goes into tags and what goes into log.flags: I would like tags to be shown to end users by default in Kibana (Discover saved searches, Logs app). Tags might contain other relevant information, not just pipeline issues. log.flags can then be consulted when details about a warning or error are needed. It can also be used for alerting, for example by searching for the term "parse_warning".

@djptek
Contributor

djptek commented Jun 15, 2021

Hi @ypid-geberit, sorry for the slow reply, I was out of office. These are truly some great ideas. The concept of working with multiple log parser frameworks is inherent to Elastic Observability (see https://www.elastic.co/guide/en/beats/filebeat/7.13/filebeat-modules.html), and there is a built-in processor to add tags (see https://www.elastic.co/guide/en/beats/filebeat/7.13/add-tags.html). So the goal would be to respect the diversity of these sources while leveraging their commonality.
