Using ECS to track the source of errors or exceptions #154

Open
webmat opened this issue Oct 26, 2018 · 21 comments
@webmat
Contributor

webmat commented Oct 26, 2018

I'd like to discuss the use case of tracking the source of program errors / exceptions (line numbers, function names, etc.) with ECS. These fields may end up under the file and/or error field sets.

I'd like to get feedback on how we could approach this. Information we may like to track about where the error was raised:

  • error source file
  • line number
  • error message
  • function/method name
  • class / module (or other structure) inside which that function/method lives, when appropriate
  • error class / standardized name (e.g. literally the class name in an OO language)
  • error fields, if any
  • error stack trace

Please let me know if there's anything I'm missing in terms of relevant data to support.

Here's a suggested mapping for the above (a rough example document follows this list):

  • error source file: file.path

  • error.line (long)

  • error.message (text)

  • error.function (keyword)

    • the word function is a bit broader than method, IMO, so it can serve both use cases.
  • Error class: error.id

  • Error fields

    • Either create a field similar to labels, where people can nest their key/value attributes, or don't add it to ECS at first.
  • stack trace: error.stack_trace (not indexed)

  • @graphaelli or @roncohen I'd like your POV on how you're doing it currently in APM

  • @ruflin I'd like your opinion on this as well
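
To make this concrete, here's a rough sketch of what an event using the proposed fields could look like, expressed as a Python dict purely for illustration. The path, line number, values, and the compute_total function are made up, and none of these field names are final:

```python
# Illustrative only: an event shaped according to the mapping proposed above.
proposed_error_event = {
    "@timestamp": "2018-10-26T12:00:00.000Z",
    "file": {"path": "/srv/app/billing/invoice.py"},  # error source file
    "error": {
        "line": 42,                     # line number where the error was raised
        "message": "division by zero",  # error message
        "function": "compute_total",    # function/method name
        "id": "ZeroDivisionError",      # error class / standardized name
        "stack_trace": "Traceback (most recent call last): ...",  # raw, not indexed
    },
}
```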

@graphaelli
Member

A fully filled (though contrived) error example can be found here with corresponding spec here. We're not thrilled about the error.exception / error.log model but haven't made concrete plans to change that.

Some notable differences:

  • error.message is a text field in APM. In the proposed schema, how is this different from the existing error.code keyword field?
  • error.code is in APM, as a long
  • error.function is error.culprit in APM
  • error.stack_trace is well defined and in use across a number of languages
  • error.fields appears to be error.attributes in APM
  • error.handled is a thing in APM, indicating whether the error was caught somewhere in the code

Have you considered an error level? log.level might be sufficient.
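
As a quick illustration of how error.handled and log.level could be captured together, here's a hedged Python sketch; handled_error_fields is a made-up helper and the layout is illustrative, not the APM schema:

```python
# Sketch only: record whether the error was caught, and reuse log.level as suggested.
def handled_error_fields(exc: Exception, handled: bool, level: str = "error") -> dict:
    return {
        "log": {"level": level},
        "error": {
            "type": type(exc).__name__,  # exception class name
            "message": str(exc),
            "handled": handled,          # True if the error was caught somewhere in the code
        },
    }

try:
    int("not-a-number")
except ValueError as exc:
    fields = handled_error_fields(exc, handled=True)
```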

@webmat
Contributor Author

webmat commented Oct 29, 2018

Excellent, thanks @graphaelli for the detailed answer and the example log :-)

  • error.message: mis-typed that one, it should have been text :-)
  • error.code: since error codes and IDs can contain letters in some systems, we have no choice but to make them keyword, even though they're numeric in most systems.
  • <3 error.attributes, a better name
  • error.handled => yes, very good addition!
  • I see how error.culprit avoids the mismatch of words loaded with other meanings, like method or function, depending on the language being monitored. Although I'm not a fan of the word culprit yet, I think it's a bit aggressive :-) But I'll give it some time...

Back to an important point that's not in the bullet list, though. What do you mean when you say the following?

We're not thrilled about the error.exception / error.log model

Is it because you feel there's a disconnect between the concept of error vs exception? Something else entirely?

@graphaelli
Member

I was referring to the actual naming, sorry that wasn't clear. Currently we don't have error.code or error.message; they're exception.error.code and exception.error.message, which seems like unnecessary nesting now. I still need to track down some background on how things ended up like this.

@webmat
Contributor Author

webmat commented Oct 29, 2018

Gotcha. Yeah I still need to find some time to get acquainted with APM, and see it from up close :-)

@ruflin
Member

ruflin commented Oct 30, 2018

Historically we put the failure message of the ingest pipeline into error.message. So this is not actually the error from the input but the error that happened during processing. I remember we discussed somewhere else that this is not optimal and the processing errors should probably end up somewhere else.

@webmat As we already have some error fields, I'm wondering if we hit some issues / limitations with these that prompted this issue? It would be good to discuss this against an actual use case we need to solve.

webmat added the discuss label Oct 30, 2018
@bmagistro

@ruflin I think some of this started from the log format we have from our apps (#153), where the log message (regardless of whether it is an error) includes source information: source file, function, and line number.

@webmat
Contributor Author

webmat commented Oct 30, 2018

@ruflin Yeah I don't think I've actually seen a concrete clash between userland errors vs pipeline errors.

But it's the first thing I thought of when I started using Ingest Node and saw that the convention was to put Ingest Node errors right in error.message...

This led me to noticing other places where pipeline details get mixed in with userland data, examples:

  • _grokparsefailure being added to tags by Logstash's grok filter. Mentioned yesterday here
  • Tracking the precise timing of various parts of a logging pipeline (edge agent, centralized receiver, queue, centralized processor, timing even per plugin/processor)

I haven't expressed my grand vision (haha) on this pipeline field set in an issue yet, because I think it's not as pressing as what we're currently working on. But I'm taking notes :-)

But to get back to the simplest part of this issue: if one tracks an error log and processes it with Ingest Node, then potentially their system's error.message gets overwritten by Ingest Node pipeline errors. No need to see this in the wild to see that there's a potential for clobbering userland data :-)
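
To illustrate one way around the clobbering, here's a sketch of an ingest pipeline definition (written as a Python dict) whose on_failure handler writes processing errors to a separate, hypothetical field instead of error.message. The field name pipeline_error.message is a placeholder, not ECS:

```python
# Sketch of an ingest pipeline body that keeps pipeline failures out of error.message.
pipeline_body = {
    "description": "Parse app logs without overwriting the app's own error.message",
    "processors": [
        {"grok": {"field": "message",
                  "patterns": ["%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:error.message}"]}},
    ],
    "on_failure": [
        # _ingest.on_failure_message is provided by Ingest Node inside on_failure handlers
        {"set": {"field": "pipeline_error.message",
                 "value": "{{ _ingest.on_failure_message }}"}},
    ],
}
# e.g. PUT _ingest/pipeline/app-logs with this body
```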

@ruflin
Member

ruflin commented Oct 31, 2018

Another place where we use error.message at the moment is in Metricbeat. In case we fail to fetch some data, we put the error message there with the rest of the event and send it out. I think that is the use case I expect for error.message, and it works well as there is no ingest pipeline involved.

I wonder if, for now, we could leave it up to the implementer to make sure error.message cannot show up twice / be overwritten. I would need to check if there are any Filebeat datasets that use error.message.

@webmat
Contributor Author

webmat commented Oct 31, 2018

Yeah I don't feel like it's pressing to get this right for GA. This whole idea of a pipeline field set can wait a bit.

@sporkmonger

Along these lines, some of the logging tools I use have the option to emit the filename and line number the log message came from (e.g. caller.go:42). That shouldn't be nested under error, since it happens for all log lines, and it might be problematic and ambiguous to put it at the top level under file when the log message in question is already about a file.
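
For illustration, here's a small sketch of splitting such a caller string into separate file/line values; the destination field names (origin.file.name, origin.file.line) are placeholders, since ECS does not define fields for this yet:

```python
import re

# Split a "caller.go:42"-style origin string into file name and line number.
# The destination field names below are placeholders, not defined ECS fields.
CALLER_RE = re.compile(r"^(?P<file>[^:]+):(?P<line>\d+)$")

def parse_caller(caller: str) -> dict:
    match = CALLER_RE.match(caller)
    if not match:
        return {}
    return {"origin": {"file": {"name": match.group("file"), "line": int(match.group("line"))}}}

parse_caller("caller.go:42")
# -> {"origin": {"file": {"name": "caller.go", "line": 42}}}
```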

@xkrt

xkrt commented Jun 24, 2019

@sporkmonger +1
Please note that some logging libraries print the source line for all log levels, for example glog.

@felixbarny
Member

With the current schema, is it reasonable to add the exception message and full stack trace to error.message? Even if there might be an ECS definition of a stack trace in the future, I think it's useful to have a field which just contains the stack trace as a raw string.
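
As a sketch of the raw-string approach (not an endorsed mapping), in Python this could be as simple as:

```python
import traceback

try:
    open("/nonexistent/config.yml")
except OSError as exc:
    event = {
        "error": {
            "message": str(exc),
            # the whole stack trace kept as a single raw string, no per-frame structure
            "stack_trace": traceback.format_exc(),
        }
    }
```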

@felixbarny
Member

More APM stack trace discussions going on in elastic/apm#40

@mustafacagataytulun

I think some of the suggested fields can have broader shared usage. For example, the line number can point to generic logging calls, not just errors. Let's assume I have a line in my code like this:

myLogger.logInfo("Books has been successfully retrieved.");

This is not an error, but I still have information about:

  • class name (in which class this logging command is?)
  • file name (in which file this logging command is?)
  • line number (on which line this logging command is?)
  • method name (in which method this logging command is?)

IMHO, these fields may be better candidates for the event field set. Actually, the action field in the event field set looks like it corresponds to the method name, and the dataset field looks like it corresponds to the class name. Of course, this is only a wild guess, since the documentation does not describe those fields like this.
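
For comparison, Python's standard logging already exposes this origin information for every record through LogRecord attributes, regardless of level; how it should map onto ECS fields is exactly the open question here (the example output line is invented):

```python
import logging

# pathname, lineno, module, and funcName are standard LogRecord attributes,
# available for every log call, not just errors.
logging.basicConfig(
    format="%(levelname)s %(pathname)s:%(lineno)d %(module)s.%(funcName)s %(message)s"
)

def retrieve_books():
    logging.getLogger("app").info("Books have been successfully retrieved.")

retrieve_books()
# e.g. INFO /srv/app/books.py:10 books.retrieve_books Books have been successfully retrieved.
```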

@felixbarny
Member

I have created #562 to add error.stack_trace as a plain-text field.

@felixbarny
Member

@Beawolf that should be covered in #563

@mustafacagataytulun

@felixbarny Great addition, thank you!

@axw
Member

axw commented Sep 18, 2019

It would be great if we could generalise parsed/structured stack traces so they're not only defined for errors/exceptions. The other use case I have in mind is profiling (proposal doc). The gist is that we would create documents that associate one or more metrics with a stack trace and metadata (labels, host name, etc.). This use case would require indexing the stack traces.
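
A hedged sketch of what such a profiling document might look like; every field name here (metrics.cpu.ns, stack.frames, ...) is a placeholder, not something defined by ECS or the proposal doc:

```python
# Illustrative only: one document associating a metric with an indexed,
# structured stack trace plus metadata.
profile_sample = {
    "@timestamp": "2019-09-18T00:00:00.000Z",
    "host": {"name": "worker-01"},
    "labels": {"service": "billing"},
    "metrics": {"cpu": {"ns": 1250000}},  # metric attributed to this stack
    "stack": {
        "frames": [  # would need to be indexed, e.g. as nested objects, innermost first
            {"function": "compute_total", "file": {"name": "invoice.py", "line": 42}},
            {"function": "handle_request", "file": {"name": "server.py", "line": 88}},
        ]
    },
}
```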

@ypid-geberit
Contributor

ypid-geberit commented Apr 22, 2021

Historically we put the failure message of the ingest pipeline into error.message.

Beats use log.flags (#191). This connection between the two issues was somehow missed. I created #1379 so that I can put my custom pipeline warnings and errors into a somewhat defined field.

Also, I created #1373 which I now found is related to this issue. You may close #1373 if you want to keep the two topics (errors from the event and errors from the pipeline while processing the event) together.

@ebeahan
Member

ebeahan commented Apr 23, 2021

Thanks, @ypid-geberit! Let's leave the two topics separated and discuss further in #1373.

@Mpdreamz
Member

Leaving a comment to connect two tickets: elastic/ecs-dotnet#158

In .NET we added stack frames as nested objects back in 2015 to our most popular logging integration, serilog.sinks.elasticsearch.

We are about to release an ECS logging variant of the sink that adheres to ECS and ECS logging more strictly: https://github.com/elastic/ecs-dotnet/tree/main/src/Elastic.CommonSchema.Serilog.Sink

Not being able to persist and query stack frames is definitely a big feature loss for folks looking to migrate. For example, while error.type holds the top-level exception, quite often this shadows the actual offending exception, since exceptions get wrapped/rethrown.
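
To illustrate the wrapping problem in a language-neutral way, here's a Python sketch (the .NET sink works differently, and innermost_exception is a made-up helper): taking error.type from the outermost exception hides the actual offending one, while walking the cause chain recovers it.

```python
# Sketch: error.type derived from the outermost exception shadows the real cause.
def innermost_exception(exc: BaseException) -> BaseException:
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc

try:
    try:
        1 / 0
    except ZeroDivisionError as inner:
        raise RuntimeError("invoice calculation failed") from inner
except RuntimeError as outer:
    outer_type = type(outer).__name__                      # "RuntimeError" (what error.type shows)
    offending = type(innermost_exception(outer)).__name__  # "ZeroDivisionError"
```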
