Improving Elasticsearch response logs #119062

Closed
Tracked by #134169
mshustov opened this issue Nov 18, 2021 · 6 comments
Labels
enhancement New value added to drive a business result Feature:elasticsearch Feature:Logging impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. loe:small Small Level of Effort Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@mshustov
Contributor

Problem

Whenever the Elasticsearch client receives a response, Kibana emits logs in the following format:
layout.type: 'json'

{
   "ecs": { "version": ... },
   "@timestamp": "2021-11-18T17:43:02.452+01:00",
   "message":"200\nPOST /.kibana_task_manager/_update_by_query?ignore_unavailable=true&refresh=true\n{\"query\":{\"bool\":{\"must\”:[…."
   "log":{
      "level":"DEBUG",
      "logger":"elasticsearch.query.data"
   },
   "process": ...,
   "span": ...,
   "trace": ...

layout.type: 'pattern'

[2021-11-18T17:35:38.820+01:00][DEBUG][elasticsearch.query.data] 200
POST /.kibana_task_manager/_update_by_query?ignore_unavailable=true&refresh=true
{"query":{"bool":{"must":[

As you can see, these logs are hard to use when investigating performance-related problems, because they contain no information about how long the request took, nor about the size of the response received.

Compare with http.server.response logs:
layout.type: 'json'

{
   "http":{
      "request":{
         "method":"GET",
         ...
      },
      "response":{
         "body":{
            "bytes":0
         },
         "status_code":304,
         "responseTime":4
      }
   }
}

layout.type: 'pattern'

[2021-11-18T17:54:11.909+01:00][DEBUG][http.server.response] GET /9007199254740991/bundles/plugin/ml/8.0.0/ml.plugin.js 304 23ms

Solution

Response time

Unfortunately, the elasticsearch-js client doesn't seem to measure how long a response takes. I'm erring on the side of adding this logic to the elasticsearch-js client since it manages the underlying connections. If @delvedor agrees, I will open an issue.
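In the meantime, a caller could approximate durations by pairing the client's request/response events. A minimal sketch against the 7.x client (the 'request'/'response' events and meta.request.id exist in elasticsearch-js; the bookkeeping around them is hypothetical, not the proposed implementation):

import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// meta.request.id identifies a request across the 'request' and
// 'response' events, so start times can be keyed on it.
const startTimes = new Map<string, number>();

client.on('request', (err, result) => {
  const id = result?.meta.request.id;
  if (id != null) startTimes.set(String(id), Date.now());
});

client.on('response', (err, result) => {
  const id = result?.meta.request.id;
  if (id == null) return;
  const startedAt = startTimes.get(String(id));
  startTimes.delete(String(id));
  if (startedAt != null) {
    const durationMs = Date.now() - startedAt;
    // Hand durationMs to the logger along with the status code, body size, etc.
  }
});

Note that the 'request' event fires once per attempt, so with retries this measures the last attempt rather than the total; one more reason the client itself is better placed to report an authoritative duration.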

Log format

layout.type: 'pattern'
The current pattern-layout format is influenced by the Dev Tools format, so that the statusCode \n method path \n request body output can be pasted into Kibana Dev Tools.
To ensure backward compatibility, we can add the response time and response size to the very first line, next to the statusCode, which is not part of the Dev Tools syntax either. The result will look like this:
layout.type: 'pattern'

[2021-11-18T17:35:38.820+01:00][DEBUG][elasticsearch.query.data] 200 3kB 123ms
POST /.kibana_task_manager/_update_by_query?ignore_unavailable=true&refresh=true
{"query":{"bool":{"must":[

The JSON format is a bit more complicated. Elasticsearch response logs do not conform to the ECS HTTP field set, nor is responseTime defined in ECS HTTP. We can start by keeping the response metadata in the message field.
layout.type: 'json'

{
   "ecs": { "version": ... },
   "@timestamp": "2021-11-18T17:43:02.452+01:00",
   "message":"200 3kB 123ms\nPOST /.kibana_task_manager/_update_by_query?ignore_unavailable=true&refresh=true\n{\"query\":{\"bool\":{\"must\”:[…."
   "log":{
      "level":"DEBUG",
      "logger":"elasticsearch.query.data"
   },
   "process": ...,
   "span": ...,
   "trace": ...

cc @pmuellr @kobelb on the question of improving the format of the logs to investigate Kibana performance problems.

@mshustov mshustov added Feature:elasticsearch Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc enhancement New value added to drive a business result Feature:Logging labels Nov 18, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-core (Team:Core)

@pmuellr
Member

pmuellr commented Nov 18, 2021

Moving in a good direction, I think! Thx!

I'd need to look at some cloud ES proxy logs (between Kibana -> ES), but I believe it's difficult to correlate entries between that log and the Kibana log, and it would be nice to be able to do that. I think we try to correlate by time right now. It would be easier if the ES proxy logs included the query string params, but perhaps that's a big ask. Would x-opaque-id or similar be available in both logs?
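For reference, elasticsearch-js already lets callers tag requests with an X-Opaque-Id header, which Elasticsearch echoes into several of its own logs, so the same id could in principle surface on both sides. A minimal sketch of the existing client API (the node URL and ids are placeholders; whether the cloud proxy records the header is the open question):

import { Client } from '@elastic/elasticsearch';

const client = new Client({
  node: 'http://localhost:9200',
  // Prefix prepended by the client to every per-request opaqueId.
  opaqueIdPrefix: 'kibana:',
});

// Tag a request with an id, e.g. derived from the originating Kibana HTTP request.
await client.search({ index: '.kibana' }, { opaqueId: 'request-abc123' });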

Elasticsearch response logs do not conform to the ECS HTTP field set, nor is responseTime defined in ECS HTTP. We can start by keeping the response metadata in the message field.

We "extend" ECS in the event log with a top-level kibana field, with a bunch of Kibana-related stuff. We could go the same route here. But we should check with the ECS team (if we haven't already) about the stuff we want to add and ask for some advice - perhaps there are already some RFCs that would be appropriate here.
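To make that concrete, an Elasticsearch response record extended that way might look as follows (a sketch only; the kibana.elasticsearch.response field names are hypothetical, not an agreed schema):

{
   "ecs": { "version": ... },
   "@timestamp": "2021-11-18T17:43:02.452+01:00",
   "message": "200 3kB 123ms\nPOST /.kibana_task_manager/_update_by_query?...",
   "log": { "level": "DEBUG", "logger": "elasticsearch.query.data" },
   "kibana": {
      "elasticsearch": {
         "response": { "status_code": 200, "bytes": 3072, "took_ms": 123 }
      }
   }
}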

@mshustov
Contributor Author

mshustov commented Nov 19, 2021

I think we try to correlate by time right now. Would be easier if ES proxy logs included the query string params, but perhaps that's a big ask. Would x-opaque-id or similar be available in both logs?

Kibana will attach tracing fields to the log records in ECS-JSON format from v8.1. The whole purpose of adding these fields is to support log correlation.
The next logical step is to talk to the Cloud team about whether it's possible for Cloud proxy logs to contain tracing fields
and to figure out if any additional work is required to access tracing fields in Cloud UI. Ideally, we should be able to perform log correlation across the Stack and Cloud infrastructure in a single place in Kibana UI.
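For concreteness, the tracing fields in an ECS-JSON record look along these lines (illustrative values; trace.id, transaction.id, and span.id are the standard ECS tracing fields):

{
   "@timestamp": "2021-11-19T10:12:01.123+01:00",
   "message": "200\nPOST /.kibana/_search\n...",
   "trace": { "id": "4bf92f3577b34da6a3ce929d0e0e4736" },
   "transaction": { "id": "00f067aa0ba902b7" },
   "span": { "id": "53ce929d0e0e4736" }
}

A proxy log entry carrying the same trace.id would make the correlation a simple query on that field.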
All these improvements seem to be out of the scope of the current task, but I can create an issue for the Cloud team to discuss the matter. @felixbarny maybe you already have plans to integrate APM more deeply into the Cloud setup?

We "extend" ECS in the event log with a top-level kibana field, with a bunch of Kibana-related stuff.

Yeah, we do that as well. As you can see in the case of http.server.response, it contains a custom http.response.responseTime field. Agreed, we can talk to the ECS team about how to approach the problem, since I don't want to introduce custom fields that might complicate maintenance in the long term.

@felixbarny
Member

Log correlation with cloud proxy logs would be fantastic. I'm not aware of any initiatives towards that, however. I guess @elastic/cloud-observability would know more.

@exalate-issue-sync exalate-issue-sync bot added impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. loe:small Small Level of Effort labels Dec 1, 2021
@lucasmoore

Log correlation with cloud proxy logs would be fantastic. I'm not aware of any initiatives towards that, however. I guess @elastic/cloud-observability would know more.

I agree that would be great, but I'm not aware of any planned work for that. The actual changes to the proxies themselves would be done by another team, and cloud observability would ensure that we properly parse and ingest the updated logs.

@pgayvallet
Contributor

In the infrastructures where it matters, there are other ways to achieve this (APM, proxy logs). Closing.

@pgayvallet closed this as not planned Jul 5, 2024