Using ECS to track the source of errors or exceptions #154

Open
webmat opened this issue Oct 26, 2018 · 21 comments
@webmat
Contributor

webmat commented Oct 26, 2018

I'd like to discuss the use case of tracking the source of program errors / exceptions (line numbers, function names, etc.) with ECS. These fields may end up under the file and/or error field sets.

I'd like to get feedback on how we could approach this. Information we may like to track about where the error was raised:

  • error source file
  • line number
  • error message
  • function/method name
  • class / module (or other structure) inside which that function/method lives, when appropriate
  • error class / standardized name (e.g. literally the class name in an OO language)
  • error fields, if any
  • error stack trace

Please let me know if there's anything I'm missing in terms of relevant data to support.

Here's a suggested mapping for the above (a rough example document follows this list):

  • error source file: file.path

  • error.line (long)

  • error.message (text)

  • error.function (keyword)

    • the word function is a bit broader than method, IMO, so it can serve both use cases.
  • Error class: error.id

  • Error fields

    • Either create a field similar to labels, where people can nest their key/value attributes, or don't add it to ECS at first.
  • stack trace: error.stack_trace (not indexed)

  • @graphaelli or @roncohen I'd like your POV on how you're doing it currently in APM

  • @ruflin I'd like your opinion on this as well
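
To make this concrete, here's a rough sketch of what an event using the proposed fields could look like, expressed as a Python dict purely for illustration. The path, line number, values, and the compute_total function are made up, and none of these field names are final:

```python
# Illustrative only: an event shaped according to the mapping proposed above.
proposed_error_event = {
    "@timestamp": "2018-10-26T12:00:00.000Z",
    "file": {"path": "/srv/app/billing/invoice.py"},  # error source file
    "error": {
        "line": 42,                     # line number where the error was raised
        "message": "division by zero",  # error message
        "function": "compute_total",    # function/method name
        "id": "ZeroDivisionError",      # error class / standardized name
        "stack_trace": "Traceback (most recent call last): ...",  # raw, not indexed
    },
}
```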

@graphaelli
Member

A fully filled (though contrived) error example can be found here with corresponding spec here. We're not thrilled about the error.exception / error.log model but haven't made concrete plans to change that.

Some notable differences:

  • error.message is a text field in APM. In the proposed schema, how is this different from the existing error.code keyword field?
  • error.code is in APM, as a long
  • error.function is error.culprit in APM
  • error.stack_trace is well defined and in use across a number of languages
  • error.fields appears to be error.attributes in APM
  • error.handled is a thing in APM, indicating whether the error was caught somewhere in the code

Have you considered an error level? log.level might be sufficient.
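
As a quick illustration of how error.handled and log.level could be captured together, here's a hedged Python sketch; handled_error_fields is a made-up helper and the layout is illustrative, not the APM schema:

```python
# Sketch only: record whether the error was caught, and reuse log.level as suggested.
def handled_error_fields(exc: Exception, handled: bool, level: str = "error") -> dict:
    return {
        "log": {"level": level},
        "error": {
            "type": type(exc).__name__,  # exception class name
            "message": str(exc),
            "handled": handled,          # True if the error was caught somewhere in the code
        },
    }

try:
    int("not-a-number")
except ValueError as exc:
    fields = handled_error_fields(exc, handled=True)
```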

@webmat
Contributor Author

webmat commented Oct 29, 2018

Excellent, thanks @graphaelli for the detailed answer and the example log :-)

  • error.message: mis-typed that one, it should have been text :-)
  • error.code: since error codes and IDs can contain letters in some systems, we have no choice but to make them keyword, even though they're numeric in most systems.
  • <3 error.attributes, a better name
  • error.handled => yes, very good addition!
  • I see how error.culprit avoids the mismatch of words loaded with other meanings, like method or function, depending on the language being monitored. Although I'm not a fan of the word culprit yet, I think it's a bit aggressive :-) But I'll give it some time...

Back to an important point that's not in the bullet list, though. What do you mean when you say the following?

We're not thrilled about the error.exception / error.log model

Is it because you feel there's a disconnect between the concept of error vs exception? Something else entirely?

@graphaelli
Member

I was referring to the actual naming, sorry that wasn't clear. Currently we don't have error.code or error.message; they're exception.error.code and exception.error.message, which seems like unnecessary nesting now. I still need to track down some background on how things ended up like this.

@webmat
Contributor Author

webmat commented Oct 29, 2018

Gotcha. Yeah I still need to find some time to get acquainted with APM, and see it from up close :-)

@ruflin
Member

ruflin commented Oct 30, 2018

Historically we put the failure message of the ingest pipeline into error.message. So this is not actually the error from the input but the error that happened during processing. I remember we discussed somewhere else that this is not optimal and the processing errors should probably end up somewhere else.

@webmat As we already have some error fields, I'm wondering if we hit some issues / limitations with these that prompted this issue? It would be good to discuss this against an actual use case we need to solve.

webmat added the discuss label Oct 30, 2018
@bmagistro

@ruflin I think some of this started from the log format we have from our apps (#153), where the log message (regardless of whether it is an error) includes source information: source file, function, and line number.

@webmat
Contributor Author

webmat commented Oct 30, 2018

@ruflin Yeah I don't think I've actually seen a concrete clash between userland errors vs pipeline errors.

But it's the first thing I thought of when I started using Ingest Node and saw that the convention was to put Ingest Node errors right in error.message...

This led me to noticing other places where pipeline details get mixed in with userland data, examples:

  • _grokparsefailure being added to tags by Logstash's grok filter. Mentioned yesterday here
  • Tracking the precise timing of various parts of a logging pipeline (edge agent, centralized receiver, queue, centralized processor, timing even per plugin/processor)

I haven't expressed my grand vision (haha) on this pipeline field set in an issue yet, because I think it's not as pressing as what we're currently working on. But I'm taking notes :-)

But to get back to the simplest part of this issue: if one tracks an error log and processes it with Ingest Node, then potentially their system's error.message gets overwritten by Ingest Node pipeline errors. No need to see this in the wild to see that there's a potential for clobbering userland data :-)
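
To illustrate one way around the clobbering, here's a sketch of an ingest pipeline definition (written as a Python dict) whose on_failure handler writes processing errors to a separate, hypothetical field instead of error.message. The field name pipeline_error.message is a placeholder, not ECS:

```python
# Sketch of an ingest pipeline body that keeps pipeline failures out of error.message.
pipeline_body = {
    "description": "Parse app logs without overwriting the app's own error.message",
    "processors": [
        {"grok": {"field": "message",
                  "patterns": ["%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:error.message}"]}},
    ],
    "on_failure": [
        # _ingest.on_failure_message is provided by Ingest Node inside on_failure handlers
        {"set": {"field": "pipeline_error.message",
                 "value": "{{ _ingest.on_failure_message }}"}},
    ],
}
# e.g. PUT _ingest/pipeline/app-logs with this body
```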

@ruflin
Member

ruflin commented Oct 31, 2018

Another place where we use error.message at the moment is in Metricbeat. In case we fail to fetch some data, we put the error message there with the rest of the event and send it out. I think that is the use case I expect for error.message, and it works well as there is no ingest pipeline involved.

I wonder if, for now, we could leave it up to the implementer to make sure error.message cannot show up twice / be overwritten. I would need to check if there are any Filebeat datasets that use error.message.

@webmat
Contributor Author

webmat commented Oct 31, 2018

Yeah I don't feel like it's pressing to get this right for GA. This whole idea of a pipeline field set can wait a bit.

@sporkmonger

Along these lines, some of the logging tools I use have the option to emit the filename and line number the log message came from (e.g. caller.go:42). That shouldn't be nested under error, since it happens for all log lines, and it might be problematic and ambiguous to put it at the top level under file when the log message in question is already about a file.
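
For illustration, here's a small sketch of splitting such a caller string into separate file/line values; the destination field names (origin.file.name, origin.file.line) are placeholders, since ECS does not define fields for this yet:

```python
import re

# Split a "caller.go:42"-style origin string into file name and line number.
# The destination field names below are placeholders, not defined ECS fields.
CALLER_RE = re.compile(r"^(?P<file>[^:]+):(?P<line>\d+)$")

def parse_caller(caller: str) -> dict:
    match = CALLER_RE.match(caller)
    if not match:
        return {}
    return {"origin": {"file": {"name": match.group("file"), "line": int(match.group("line"))}}}

parse_caller("caller.go:42")
# -> {"origin": {"file": {"name": "caller.go", "line": 42}}}
```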

@xkrt

xkrt commented Jun 24, 2019

@sporkmonger +1
Please note that some logging libraries print the source line for all log levels, for example glog.

@felixbarny
Member

With the current schema, is it reasonable to add the exception message and full stack trace to error.message? Even if there might be an ECS definition of a stack trace in the future, I think it's useful to have a field which just contains the stack trace as a raw string.
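
As a sketch of the raw-string approach (not an endorsed mapping), in Python this could be as simple as:

```python
import traceback

try:
    open("/nonexistent/config.yml")
except OSError as exc:
    event = {
        "error": {
            "message": str(exc),
            # the whole stack trace kept as a single raw string, no per-frame structure
            "stack_trace": traceback.format_exc(),
        }
    }
```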

@felixbarny
Member

More APM stack trace discussions going on in elastic/apm#40

@mustafacagataytulun

I think some of the suggested fields can have broader shared usage. For example, the line number can point to generic logging calls, not just errors. Let's assume I have a line in my code like this:

myLogger.logInfo("Books has been successfully retrieved.");

This is not an error, but I still have information about:

  • class name (in which class this logging command is?)
  • file name (in which file this logging command is?)
  • line number (on which line this logging command is?)
  • method name (in which method this logging command is?)

IMHO, these fields may be better candidates for the event field set. Actually, the action field in the event field set looks like it corresponds to the method name, and the dataset field looks like it corresponds to the class name. Of course, this is only a wild guess, since the documentation does not describe those fields like this.
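
For comparison, Python's standard logging already exposes this origin information for every record through LogRecord attributes, regardless of level; how it should map onto ECS fields is exactly the open question here (the example output line is invented):

```python
import logging

# pathname, lineno, module, and funcName are standard LogRecord attributes,
# available for every log call, not just errors.
logging.basicConfig(
    format="%(levelname)s %(pathname)s:%(lineno)d %(module)s.%(funcName)s %(message)s"
)

def retrieve_books():
    logging.getLogger("app").info("Books have been successfully retrieved.")

retrieve_books()
# e.g. INFO /srv/app/books.py:10 books.retrieve_books Books have been successfully retrieved.
```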

@felixbarny
Member

I have created #562 to add error.stack_trace as a plain-text field.

@felixbarny
Member

@Beawolf that should be covered in #563

@mustafacagataytulun

@felixbarny Great addition, thank you!

@axw
Member

axw commented Sep 18, 2019

It would be great if we could generalise parsed/structured stack traces so they're not only defined for errors/exceptions. The other use case I have in mind is profiling (proposal doc). The gist is that we would create documents that associate one or more metrics with a stack trace and metadata (labels, host name, etc.). This use case would require indexing the stack traces.
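
A hedged sketch of what such a profiling document might look like; every field name here (metrics.cpu.ns, stack.frames, ...) is a placeholder, not something defined by ECS or the proposal doc:

```python
# Illustrative only: one document associating a metric with an indexed,
# structured stack trace plus metadata.
profile_sample = {
    "@timestamp": "2019-09-18T00:00:00.000Z",
    "host": {"name": "worker-01"},
    "labels": {"service": "billing"},
    "metrics": {"cpu": {"ns": 1250000}},  # metric attributed to this stack
    "stack": {
        "frames": [  # would need to be indexed, e.g. as nested objects, innermost first
            {"function": "compute_total", "file": {"name": "invoice.py", "line": 42}},
            {"function": "handle_request", "file": {"name": "server.py", "line": 88}},
        ]
    },
}
```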

@ypid-geberit
Contributor

ypid-geberit commented Apr 22, 2021

Historically we put the failure message of the ingest pipeline into error.message.

Beats use log.flags (#191). This connection between the two issues was somehow missed. I created #1379 so that I can put my custom pipeline warnings and errors into a somewhat defined field.

Also, I created #1373 which I now found is related to this issue. You may close #1373 if you want to keep the two topics (errors from the event and errors from the pipeline while processing the event) together.

@ebeahan
Member

ebeahan commented Apr 23, 2021

Thanks, @ypid-geberit! Let's leave the two topics separated and discuss further in #1373.

@Mpdreamz
Member

Leaving a comment to connect two tickets: elastic/ecs-dotnet#158

In .NET we added stack frames as nested objects back in 2015 to our most popular logging integration, serilog.sinks.elasticsearch.

We are about to release an ECS logging variant of the sink that adheres to ECS and ECS logging more strictly: https://github.com/elastic/ecs-dotnet/tree/main/src/Elastic.CommonSchema.Serilog.Sink

Not being able to persist and query stack frames is definitely a big feature loss for folks looking to migrate. For example, while error.type holds the top-level exception, quite often this shadows the actual offending exception, since exceptions get wrapped/rethrown.
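
To illustrate the wrapping problem in a language-neutral way, here's a Python sketch (the .NET sink works differently, and innermost_exception is a made-up helper): taking error.type from the outermost exception hides the actual offending one, while walking the cause chain recovers it.

```python
# Sketch: error.type derived from the outermost exception shadows the real cause.
def innermost_exception(exc: BaseException) -> BaseException:
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc

try:
    try:
        1 / 0
    except ZeroDivisionError as inner:
        raise RuntimeError("invoice calculation failed") from inner
except RuntimeError as outer:
    outer_type = type(outer).__name__                      # "RuntimeError" (what error.type shows)
    offending = type(innermost_exception(outer)).__name__  # "ZeroDivisionError"
```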
