AbstractEcsLoggingTest should log keys as nested objects, not with dotted key names #51
Agreed, the event above should look like this (once prettified) on disk:

```json
{
  "@timestamp": "2019-11-18T13:42:33.333Z",
  "message": "test",
  "log": {
    "level": "DEBUG",
    "logger": "co.elastic.logging.logback.EcsEncoderTest",
    "origin": {
      "file": {
        "name": "AbstractEcsEncoderTest.java",
        "line": 47
      },
      "function": "debug"
    }
  },
  "service": { "name": "test" }
}
```

There's no automatic transformation of dotted keys to nesting in Elasticsearch / ingest pipelines. The raw events have to be nested from the start. The only way to change dots to nesting is Logstash's `de_dot` filter, which is O(n).
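For reference, a minimal sketch of that filter's configuration, assuming the `de_dot` filter's `nested` option (which creates sub-fields instead of just replacing the dots); it runs over every event, which is where the O(n) cost shows up:

```
filter {
  de_dot {
    nested => true
  }
}
```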
Elasticsearch will automatically create nested documents if a key contains a dot. Just the …
The mappings may consider both as equivalent, but the ES mappings aren't the only thing at play here. Dotted key names mean consumers of the raw documents (e.g. pipelines and apps accessing documents via the API) will not reliably be able to access the nested fields of these documents. The structure must be nested objects only, with no dots in key names. Otherwise these are not valid ECS documents.
Good points. The problem is that the MDC does not support nested fields. We use that, for example, for the APM/log correlation with the field `trace.id`. While using a nested structure for … Is there any Filebeat processor that could normalize that? Maybe it's another reason why we should go with a custom log format that supports dots to denote nesting and multi-line strings, and does not require keys for …
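For illustration, a minimal sketch of that MDC limitation, assuming SLF4J's MDC (the ID values below are made up):

```java
import org.slf4j.MDC;

public class MdcCorrelation {
    public static void main(String[] args) {
        // The MDC is a flat Map<String, String>; there is no way to store a
        // nested object, so any "nesting" has to be encoded in the key itself.
        MDC.put("trace.id", "0af7651916cd43dd8448eb211c80319c");
        MDC.put("transaction.id", "b7ad6b7169203331");
    }
}
```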
In the call yesterday, other good points were made about having dotted key names in logs.

So it's starting to look like there isn't actually a choice but to accept the dotted key names in the raw JSON logs. This will be confusing to people, as the documents really must be converted to nesting ASAP, to avoid pipelines having to hunt around for the key names that are present. Trying nested and falling back to looking for a dotted key name doesn't work, because when there's more than one level of nesting, it turns into a combinatorial explosion of possibilities.
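For example (hypothetical value, key shapes reconstructed for illustration), a consumer looking for the log origin's file name would have to try each of these shapes, plus every partial split in between:

```json
{ "log": { "origin": { "file": { "name": "App.java" } } } }
{ "log.origin.file.name": "App.java" }
{ "log": { "origin.file.name": "App.java" } }
{ "log.origin": { "file.name": "App.java" } }
{ "log.origin.file": { "name": "App.java" } }
```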
On the other hand, if the documentation for our logging libraries mentions these dotted keys and how they're transformed into nesting ASAP to become ECS compliant, that can be fine. Another way to think about this: if we step out of JSON and pretend we had gone with KV instead, the keys would probably have been dotted in the raw logs, and we would have converted that to nesting anyway. So it's fine to do this in JSON too, I guess. We just need to be clear about this conversion in the logging library docs.

There isn't currently a processor that performs this conversion in Filebeat or libbeat, but @urso is open to having this created. We can also explore doing this in JavaScript via the new script Beats processor in the meantime.
I think the examples from #51 (comment) should be supported by Elasticsearch's dot-expand processor: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/dot-expand-processor.html

In my view we should only nest when there is a need to make a log line more compact and easier to read. For instance, I don't think there would be a need to nest just one field, i.e. like … I also support the comment made by @webmat that the beginning of a log line should be easy to read, as otherwise app admins would have trouble investigating their logs visually. Maybe this should be configurable?
The problem with the dot expander processor is that you have to explicitly state which field it should expand. Maybe we can extend this processor so that it discovers all dots and converts them to nested objects? I don't think that would be too computationally expensive, especially because we would apply it to documents that have minimal nesting to begin with.
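To show the limitation: an ingest pipeline definition today has to list every dotted field explicitly, along these lines (the field list here is just illustrative, taken from the examples in this thread):

```json
{
  "processors": [
    { "dot_expander": { "field": "log.level" } },
    { "dot_expander": { "field": "log.origin.file.name" } },
    { "dot_expander": { "field": "process.thread.name" } }
  ]
}
```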
Agreed, the dot expander should (perhaps optionally) support finding and converting all dotted key names without having to call them out explicitly. Otherwise the problem isn't resolved with anything under … But inspecting the whole object is O(N), where N is the total number of keys (not the number of dotted keys).
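As a rough illustration of that cost, here is a minimal Java sketch of such a generic expansion (a hypothetical helper, not an actual Elasticsearch processor); it touches every key exactly once, hence O(N):

```java
import java.util.HashMap;
import java.util.Map;

public class DotExpander {

    /**
     * Returns a copy of the given map in which every dotted key name has been
     * converted into nested maps, e.g. {"log.level": "DEBUG"} becomes
     * {"log": {"level": "DEBUG"}}. Conflicts (a key that is both a value and
     * a prefix of another key) are not handled in this sketch.
     */
    @SuppressWarnings("unchecked")
    public static Map<String, Object> expand(Map<String, Object> flat) {
        Map<String, Object> result = new HashMap<>();
        for (Map.Entry<String, Object> entry : flat.entrySet()) {
            // Recurse first so dotted keys at any depth are expanded too.
            Object value = entry.getValue() instanceof Map
                    ? expand((Map<String, Object>) entry.getValue())
                    : entry.getValue();
            String[] path = entry.getKey().split("\\.");
            Map<String, Object> current = result;
            for (int i = 0; i < path.length - 1; i++) {
                current = (Map<String, Object>) current.computeIfAbsent(path[i], k -> new HashMap<>());
            }
            current.put(path[path.length - 1], value);
        }
        return result;
    }
}
```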
I was about to create this topic too; I will ask for an example.

For my part, I'd prefer to have the choice between performance and nested JSON. My issue is about the format when logs are sent from the application to a remote server.
Ideally, I'd like the performance penalty to be in the processing pipeline and not in the application.

But there's another thing we didn't mention yet: jq-friendliness. When dealing with JSON logs, it can be quite powerful to search for specific log events via jq, so having them properly nested instead of dotted would be a big plus. I'm wondering if we can just start the log messages with the timestamp and message and put the …

I'm really torn on that matter, but maybe we should not focus too much on readable JSON logs and instead document how to effectively handle JSON logs with jq and tail. And with our stack, of course. The requirement to be human-readable is more for the potential case where the centralized logging is down for whatever reason and you have to fall back to ssh/less/grep. But also in development, where you look at logs in your IDE, human-readable logs are just more pleasant. Do we want to tell users they need to configure dedicated plain-text logging for that, or are we trying to cover both cases with logs that are both human- and machine-readable?
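For example (`app.log` is a hypothetical NDJSON log file): with nested objects jq's plain path syntax just works, whereas dotted key names force the generic index form on every access:

```sh
# nested keys: plain path expressions
jq 'select(.log.level == "ERROR") | .message' app.log

# dotted key names: every access needs the quoted index form
jq 'select(.["log.level"] == "ERROR") | .message' app.log
```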
I'd just like to add: could the project/this feature stay backward compatible?
@Fr33Radical As long as the project is in major version 0 (0.1.3 as of this comment), backwards compatibility is best effort only. It's too early to guarantee it until we hit 1.x.
Closing this as it's not feasible to guarantee all fields are nested, especially when allowing user-defined custom attributes. This means that the JSON produced by ecs-logging libraries is, strictly speaking, not ECS compatible. Therefore, there has to be another processing step that takes the log file and converts dots to nested fields. I've created elastic/beats#17021 for this. Please add your 👍 there.
If we can live with having nested JSON with no exceptions, is there a way to achieve this without Filebeat (while waiting for the fix)?
Update: …
I've noticed AbstractEcsLoggingTest checks for keys like `process.thread.name`, while I think the key should be something like `"process": { "thread": { "name": "co.elastic.logging.logback.EcsEncoderTest" } }` instead of something like `"process.thread.name": "co.elastic.logging.logback.EcsEncoderTest"`.
while I think the key should be something like `"process": { "thread": { "name": "co.elastic.logging.logback.EcsEncoderTest"}}" instead of something likeThe text was updated successfully, but these errors were encountered: