AbstractEcsLoggingTest should log keys as nested objects, not with dotted key names #51

Closed
andreaturli opened this issue Nov 18, 2019 · 15 comments


@andreaturli

andreaturli commented Nov 18, 2019

I've noticed that AbstractEcsLoggingTest checks for keys like process.thread.name, while I think the key should be a nested structure like `"process": { "thread": { "name": "main" } }` instead of something like

 {"@timestamp":"2019-11-18T13:42:33.333Z","log.level":"DEBUG","message":"test","service.name":"test","process.thread.name":"main","log.logger":"co.elastic.logging.logback.EcsEncoderTest","log.origin":{"file.name":"AbstractEcsEncoderTest.java","function":"debug","file.line":47}}
@webmat

webmat commented Nov 18, 2019

Agreed, the event above should look like this on disk (prettified here for readability):

{ "@timestamp":"2019-11-18T13:42:33.333Z",
  "message": "test",
  "log": {
    "level": "DEBUG",
    "logger": "co.elastic.logging.logback.EcsEncoderTest",
    "origin": {
      "file": {
        "name": "AbstractEcsEncoderTest.java",
        "line": 47
      },
      "function": "debug"
    }
  },
  "service": { "name": "test" }
}

There's no automatic transformation of dotted keys to nesting in Elasticsearch / ingest pipelines. The raw events have to be nested from the start.

The only way to change dots to nesting is Logstash's de_dot filter, which is O(n).

@felixbarny
Member

Elasticsearch will automatically create nested field mappings if a key contains a dot. It's just that the _source will not contain properly nested JSON. But when you do searches/aggregations, ES will treat them as nested fields.
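
For example, both of these documents end up with the same mapping for log.level, and a query on log.level matches either one:

{ "log.level": "DEBUG" }
{ "log": { "level": "DEBUG" } }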

@webmat

webmat commented Nov 18, 2019

The mappings may consider both as equivalent, but the ES mappings aren't the only thing at play here.

Dotted key names mean consumers of the raw documents (e.g. pipelines & apps accessing documents via the API) will not reliably be able to access the nested fields of these documents.

The structure must be nested objects only, no dots in key names. Otherwise these are not valid ECS documents.

@felixbarny
Member

Good points. The problem is that the MDC does not support nested fields. We use that, for example, for the APM/log correlation with the field trace.id. Does that mean we have to transform an MDC value of trace.id=72ace1d into a nested object like "trace": { "id": "72ace1d" }? Also, when there are labels like foo.bar=baz and foo.baz=bar, do those have to be transformed to "foo": {"bar": "baz", "baz": "bar"}? That would significantly increase the overhead.
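
For illustration, a minimal sketch of that transformation on the application side (a hypothetical helper, not part of ecs-logging-java), which also merges labels that share a prefix:

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: expand a flat MDC-style map with dotted keys into
// nested maps before serialization. Conflicting keys (e.g. "foo" and
// "foo.bar" both present) are not handled here.
public final class MdcDotExpander {

    @SuppressWarnings("unchecked")
    public static Map<String, Object> expand(Map<String, String> flat) {
        Map<String, Object> root = new HashMap<>();
        for (Map.Entry<String, String> entry : flat.entrySet()) {
            String[] parts = entry.getKey().split("\\.");
            Map<String, Object> current = root;
            for (int i = 0; i < parts.length - 1; i++) {
                // one lookup and a possible map allocation per dot, on every event
                current = (Map<String, Object>) current.computeIfAbsent(
                        parts[i], k -> new HashMap<String, Object>());
            }
            current.put(parts[parts.length - 1], entry.getValue());
        }
        return root;
    }
}

Calling expand(Map.of("trace.id", "72ace1d", "foo.bar", "baz", "foo.baz", "bar")) yields {trace={id=72ace1d}, foo={bar=baz, baz=bar}}; the extra key splitting and per-event map allocations are exactly the overhead in question.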

While using a nested structure for process.thread.name would be possible, "process": { "thread": { "name": "main" } } is significantly less readable.

Is there any Filebeat processor that could normalize that?

Maybe it's another reason why we should go with a custom log format that supports dots to denote nesting and multi-line strings, and does not require keys for timestamp, log.level, and message.

@webmat

webmat commented Nov 20, 2019

In the call yesterday, other good points were made about having dotted key names in logs.

  • If we want the beginning of the JSON log lines to be easy to read, log.level (with dots) would be important. Otherwise we have to bring in the whole log.* sub-object in that place, which means we can't start with a few predictable keys, e.g. timestamp, log.level, message, followed by the rest.
  • Some logging libraries don't give the flexibility to perform a change such as transforming dotted key names (e.g. from the MDC) to nesting. So it's not even a matter of "should we do this", but rather "we can't do this".

So it's starting to look like there isn't actually a choice: we have to accept the dotted key names in the raw JSON logs. This will be confusing to people, as the documents really must be converted to nesting ASAP, to avoid pipelines having to hunt around for the key names that are present. Trying nested and falling back to a dotted key name doesn't work, because when there's more than one level of nesting, it turns into a combinatorial explosion of possibilities (see the lookup sketched after this list):

log.origin.file.name
log.origin.file { name }
log.origin { file.name }
log.origin { file { name } }
log { origin.file.name }
log { origin.file { name } }
log { origin { file.name } }
log { origin { file { name } } }
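
To make this concrete: a hypothetical resolver that tolerates both forms has to try, at every dot, both "the remainder is one dotted key" and "descend one nested level", which is 2^(number of dots) probes in the worst case.

import java.util.Map;

// Hypothetical lookup over a document that may mix dotted keys and nesting.
// Worst case it explores every way of splitting the path into segments.
final class MixedKeyLookup {

    @SuppressWarnings("unchecked")
    static Object find(Map<String, Object> doc, String path) {
        if (doc.containsKey(path)) {
            return doc.get(path); // the whole remainder is one dotted key
        }
        int dot = path.indexOf('.');
        while (dot != -1) {
            // try every prefix ("log", "log.origin", ...) as a nested object
            Object child = doc.get(path.substring(0, dot));
            if (child instanceof Map) {
                Object found = find((Map<String, Object>) child, path.substring(dot + 1));
                if (found != null) {
                    return found;
                }
            }
            dot = path.indexOf('.', dot + 1);
        }
        return null;
    }
}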

On the other hand, if the documentation for our logging libraries mentions these dotted keys and explains how they're transformed into nesting ASAP to become ECS compliant, that can be fine.

Another way to think about this: if we step out of JSON and pretend we had gone with KV instead, the keys would probably have been dotted in the raw logs, and we would have converted them to nesting anyway. So it's fine to do this in JSON too, I guess. We just need to be clear about this conversion in the logging library docs.

There isn't currently a processor that performs this conversion in Filebeat or libbeat, but @urso is open to having this created. In the meantime, we can also explore doing this in JavaScript via the new script processor in Beats.

@pgomulka
Contributor

pgomulka commented Nov 22, 2019

I think the examples from #51 (comment) should be supported by Elasticsearch's dot_expander ingest processor: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/dot-expand-processor.html

In my view, we should only nest when there is a need to make a log line more compact and easier to read. For instance, I don't think there would be a need to nest just one field, i.e. process.thread.name, as per @felixbarny's comment, but fully nesting even more fields would often not be beneficial either.

For example, process.thread.name and process.thread.id would probably be easier to digest if logged as "process.thread": { "id": 1, "name": "thread-0" }.

I also support the comment made by @webmat that the beginning of a log line should be easy to read; otherwise app admins would have trouble investigating their logs visually.

Maybe this should be configurable?

@felixbarny
Member

The problem with the dot expander processor is that you have to explicitly state which field it should expand. Maybe we can extend this processor so that it discovers all dots and converts them to nested objects? I don't think that would be too computationally expensive, especially because we would apply it to documents that have minimal nesting to begin with.

@webmat

webmat commented Nov 27, 2019

Agreed, the dot expander should (perhaps optionally) support finding and converting all dotted key names without having to call them out explicitly. Otherwise the problem isn't resolved for anything under labels, whose keys we can't know in advance.

But inspecting the whole object is O(N), where N is the total number of keys (not just the number of dotted keys).

@Fr33Radical

Fr33Radical commented Dec 4, 2019

I was about to create this topic too.
Someone reported to me that they receive logs from the OS that, I think, contain dots and should not be nested.

I will ask for an example.
I don't think it is a good idea to have hybrid nested/flat JSON logs. At this time, they are creating a parser that does something like this for each field at the entry of the pipeline:

json {
  source => "[event][original]"
}

mutate {
  rename => {
    "message" => "[message]"
    "process.thread.name" => "[process][thread][name]"
  }
}

> Also, when there are labels like foo.bar=baz and foo.baz=bar, do those have to be transformed to "foo": {"bar": "baz", "baz": "bar"}? That would significantly increase the overhead.

For my part, I'd prefer to have the choice between performance and nested JSON.

My issue is about the format when logs are sent from the application to a remote server.

@felixbarny
Member

Ideally, I'd like the performance penalty to be in the processing pipeline and not in the application.

But there's another thing we didn't mention yet: jq-friendliness. When dealing with JSON logs, it can be quite powerful to search for specific log events via jq. So having them properly nested instead of dotted would be a big plus.
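
For example (a minimal illustration), a filter over nested keys reads naturally as jq 'select(.log.level == "ERROR")', whereas a dotted key has to be quoted, as in jq 'select(.["log.level"] == "ERROR")', and can't be navigated as a path.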

I'm wondering if we can just start the log messages with timestamp and message and put the "log": { "level": "...", "logger": "..." } object after them. We'd basically lose the visual alignment of the log level, which would be somewhat sad.

I'm really torn on this matter, but maybe we should not focus too much on readable JSON logs and instead document how to effectively handle JSON logs with jq and tail. And with our stack, of course. But the requirement to be human-readable is more for the potential case where the centralized logging is down for whatever reason and you have to fall back to ssh/less/grep.

But also in development, where you look at logs in your IDE, human-readable logs are just more pleasant.

Do we want to tell users they need to configure dedicated plain-text logging for that, or are we trying to cover both cases with human- and machine-readable logs?

@Fr33Radical

I'd just like to add: could the project/this feature stay backward compatible?

@webmat

webmat commented Jan 3, 2020

@Fr33Radical As long as the project is in major version 0 (0.1.3 as of this comment), backwards compatibility is best effort only. It's too early to guarantee until we hit 1.x.

@felixbarny
Member

Closing this as it's not feasible to guarantee all fields are nested, especially when allowing user-defined custom attributes.

This means that the JSON produced by ecs-logging libraries is, strictly speaking, not ECS compatible. Therefore, there has to be another processing step that takes the log file and converts dots to nested fields. I've created elastic/beats#17021 for this. Please add your 👍 there.

@Fr33Radical

If we can live with having nested JSON with no exceptions, is there a way to achieve this without Filebeat? (While waiting for the fix.)

@felixbarny
Member

Update:
Converting dotted fields to nested objects is now possible both in Beats, via the expand_keys option of the decode_json_fields processor, and, as of 7.15.0, via the dot_expander Elasticsearch ingest node processor.
This issue contains a POC for an ingest node pipeline that auto-detects and properly parses ECS JSON: elastic/integrations#1454
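
For reference, a minimal ingest pipeline sketch, assuming 7.15+, where dot_expander accepts "field": "*" to expand every dotted field in a document:

{
  "description": "Expand all dotted keys into nested objects",
  "processors": [
    { "dot_expander": { "field": "*" } }
  ]
}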
