-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable logsdb for http_logs #646
base: master
Are you sure you want to change the base?
Conversation
* add new parameters for logsdb and data streams * remove index.json * add a composable template supporting standard indices, data streams, and logs data streams * Update README
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM from my side.
} | ||
"operation-type": {{ indexing_operation_type | default("create-index") | tojson }}, | ||
"settings": {%- if index_settings is defined %} {{index_settings | tojson}} {%- else %} { | ||
"index.sort.field": "@timestamp", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the template, not sure what else can be a good sorting field without knowing the data, I think just sorting by timestamp is good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is basically nothing there @timestamp
, message
, clientip
, request
, status
, size
and geoip
. At the end it depends on queries...the idea is that we want to sort to favor query latency. But sorting will also determine effective doc value compression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a result I think we can use this as an example to udnerstand what happens if host.name
is missing. I would just remove the index sorting configuration and rely on defaults. In practice since host.name
is missing it would result in just sorting on @timestamp
(the host.name
field is injected and will be empty). LogsDB handles gracefully the fact that the host.name
field might be missing.
@@ -42,6 +42,8 @@ This track allows to overwrite the following parameters with Rally 0.8.0+ using | |||
* `number_of_shards` (default: 5) | |||
* `source_enabled` (default: true): A boolean defining whether the `_source` field is stored in the index. | |||
* `index_settings`: A list of index settings. Index settings defined elsewhere (e.g. `number_of_replicas`) need to be overridden explicitly. | |||
* `index_mode` (default: unset): Set to `logsdb` to enable indexing to [logs data streams](https://www.elastic.co/guide/en/elasticsearch/reference/master/logs-data-stream.html). If not enabled, Rally will not use logs data streams. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively we could just enable logsdb on the plain index and not use data streams. This would make this change a little bit simpler, since this is an index oriented track?
But I think it is fine to use logsdb with data stream here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL logsdb can be enabled on a plain index. I took from https://www.elastic.co/guide/en/elasticsearch/reference/current/logs-data-stream.html it was required to be a data stream.
I would like to leave data stream support for serverless, though I think combining the template into one may have complicated things a bit, so I am thinking of splitting the template into two files: one for a plain index, the other for data stream. WDYT?
"dynamic": "strict", | ||
"_source": { | ||
"enabled": {{ source_enabled | default(true) | tojson }} | ||
"priority": 101, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to https://www.elastic.co/guide/en/elasticsearch/reference/current/index-templates.html
priority should be 100 I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used 101 to ensure the template took priority over the built-in one for logs-*
. Should it be 100 for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to the fact that the standard ones should have 100...so it should be fine for this to have 101 (and have higher priority). I saw some CI failures saying that there is a template priority issue while merging templates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see
Cannot run task [create-all-templates]: Request returned an error. Error type: api, Description: illegal_argument_exception ({'error': {'root_cause': [{'type': 'illegal_argument_exception', 'reason': 'index template [rally-http_logs] has index patterns [logs-*, reindexed-logs] matching patterns from existing templates [apm-index-template] with patterns (apm-index-template => [traces-apm*, logs-apm*, metrics-apm*]) that have the same priority [101], multiple index templates may not match during index creation, please use a different priority'}], 'type': 'illegal_argument_exception', 'reason': 'index template [rally-http_logs] has index patterns [logs-*, reindexed-logs] matching patterns from existing templates [apm-index-template] with patterns (apm-index-template => [traces-apm*, logs-apm*, metrics-apm*]) that have the same priority [101], multiple index templates may not match during index creation, please use a different priority'}, 'status': 400}), HTTP Status: 400
@inqueue Do you plan to merge this? I'm planning to run http_logs with logsdb. |
Add support for logs data streams (logsdb) and data streams to the http logs track. Logsdb was added in ES 8.15.