-
Notifications
You must be signed in to change notification settings - Fork 419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Stage 0 - TSDB Dimensions #2172
Conversation
Co-authored-by: Eric Beahan <ebeahan@gmail.com>
Co-authored-by: Eric Beahan <ebeahan@gmail.com>
This is quite concerning to me, you are asking us to bump the minimum version of all our packages to 8.7, that means no users of integrations below that version can ever receieve any hotfixes anymore.. |
This RFC does not introduce any such limitations. All packages can continue to choose their preferred version of ECS and ES version. Only if a package needs TSDB, it must satisfy two requirements (1. have the proposed ECS version , 2. ES Stack version 8.7 + ). |
Oh okay, I thought that the package installation would error out, please ignore my comment then! |
@ruflin , @lalit-satapathy , @ebeahan , @P1llus , @martijnvg , can you take a final look at the RFC ? If there are no further question / concerns, please approve the RFC. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the fields listed in this PR is not aligned with elastic/integrations#5193 (comment) ? Can you share some background on why field were left out and others were selected?
@lalit-satapathy, @ruflin ,can you please help by approving , if there are no further queries or changes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From an Elasticsearch perspective the changes in this PR look good to me.
Note that I'm not an ecs expert here, so my approval should not be weighted heavily.
I do not have this as a priority in front of me. The packages that are not yet TSDB enabled may be prioritised. Once done, packages that are TSDB enabled can be considered. If this must be changed, let me know. |
@lalit-satapathy , if the Stage -0 proposal looks good to you, can I have your approval? |
|
@ebeahan , Are any more approvals needed before going to Stage -1 of RFC ? From the RFC steps, i understand that the next step in the process is
Is this something that you can help me with ? |
rfcs/0000-tsdb-dimensions.md
Outdated
Host types include hardware, virtual machines, Docker containers, and Kubernetes nodes. | ||
type: group | ||
fields: | ||
- name: name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason why not to consider host.id
? https://www.elastic.co/guide/en/ecs/current/ecs-host.html#field-host-id
as mentioned in description of host.name
:
It can contain what hostname returns on Unix systems, the fully qualified domain name (FQDN), or a name specified by the user. The recommended value is the lowercase FQDN of the host.
since it can be specified by user - it might be not unique.
in combination with other fields dimension should be unique I guess though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proposal as-is will be merged as Stage 0, but this concern is worth discussing in Stage 1+ and capturing. I'll make a note this item is brought up in the next stage's PR.
cc @agithomas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason why not to consider
host.id
? https://www.elastic.co/guide/en/ecs/current/ecs-host.html#field-host-id as mentioned in description ofhost.name
:It can contain what hostname returns on Unix systems, the fully qualified domain name (FQDN), or a name specified by the user. The recommended value is the lowercase FQDN of the host.
since it can be specified by user - it might be not unique.
in combination with other fields dimension should be unique I guess though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason why not to consider
host.id
? https://www.elastic.co/guide/en/ecs/current/ecs-host.html#field-host-id as mentioned in description ofhost.name
:It can contain what hostname returns on Unix systems, the fully qualified domain name (FQDN), or a name specified by the user. The recommended value is the lowercase FQDN of the host.
since it can be specified by user - it might be not unique.
in combination with other fields dimension should be unique I guess though
Based on my discussion with ElasticSearch team, host.name
is preferred over host.ip
. host.ip values change often and hence it is not a preferred dimension field.
host.name
, in combination with other dimension fields provides a unique identifier of the asset / resource getting monitored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please ignore my previous comment, i mis-read it as host.ip
.
I shall go through the host.id
, its values & cardinality and its qualification of it becoming dimension field during RFC-1 phase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tetianakravchenko , @lalit-satapathy ,
It appears that host.id
is not unique enough . Reference
The below scenario i could recreate and validate. I have two VMs in GCP created out of same image. Both share the same host.id
, but they have different host.name
My personal belief and opinion is - When defining a dimension, we may not be concerned of meta data such as host.name
getting changed or replaced. If such a change occur, it may be an intentional change by the infra admin - wanting the replaced asset taking the place of the original asset. I am not sure if there exist a situation where we need to bring the insight to a customer if an IT asset is replaced by another (especially in an on-prem environment). Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agithomas thanks for the investigation and bringing to my attention mentioned issue, I agree that host.id
is not the better option and host.name
should be used instead.
The below scenario i could recreate and validate. I have two VMs in GCP created out of same image. Both share the same
host.id
, but they have differenthost.name
in GCP there is no way to create an instance with an existing in the same project name, so it should be safe to assume that host.name
will be unique within one project in such case
Yes I'll merge. |
Hey, Sorry, I forgot to look for this PR after we talked about this last week. Are these dimensions intended to be imported automatically in packages? If that's the case, I have a concern. Changing the set of dimensions for a time series will create a new time series. For example, if In general I think that we shouldn't automatically include dimensions in packages, given that they are quite specific to each use case. As mentioned in the RFC itself, it is important to select an adequate number of fields as dimensions. In my opinion, package developers should take this decision for each case. It can be great though to document in ECS what fields are good candidates to be considered dimensions. |
Thanks for the feedbacks, can discuss this further (offline if needed?). Do you see an overlap among the 7 defined common ECS fields defined by @agithomas (its now host.name instead of host.ip) |
I think we should. A package dev should not have to think about the complexity of dimensions for standard cases. For example, if someone builds a package to run on its host and adds host.id as dimension, now suddenly it's also run in k8s and it might not work because container.id is missing. There are of course cases like "disk metrics" where additional dimensions are needed and the user has to think about it. @felixbarny You might have an opinion here too, as I think you are thinking in the direction of "most keyword fields" are a dimension.
I had a similar discussion in the past with @martijnvg and if I remember correctly, he wasn't too concern. If we already start with this basic set of dimension, the above should not happen. It only happens when the data changes and then I guess it could be argued, it might be a new time series? But I agree, something we should also discuss with Lens team in more details to not have unexpected results. |
Yes, this is a good point to have something like this, along with the automatic inclusion of full ECS with dynamic mappings (elastic/elastic-package#1073). To support metadata added by autodiscover or processors. I am more concerned about the other way around, data streams that already have enough dimensions, but after upgrading ECS, new time series are created. This would be for example the case of container metrics that can have
It will happen every time a new dimension is added, even if the data doesn't change. This is what I find misleading. Package developers may not be aware that by updating ECS they are creating different time series.
This wouldn't be an issue only with Lens, it will affect any feature that makes use of dimensions (query languages, downsampling...). |
I missed the opt-out problem. Very good point. A user should be able to add ECS but opt out of the magical dimensions. |
I have done a quick check with elastic-package and it won't import dimensions from ECS as it is now. So as we have to do some specific implementation for this we can take this concern into account there. We can maybe have it disabled by default, but enabled in new packages. I am going to open a quick PR to add this concern to the list of concerns, mentioning that it needs to be taken into account by implementations consuming ECS. And regarding the implementation for Elastic Packages we can continue with the discussion on elastic-package/package-spec. Thanks! |
I don't think that creating a new time series would be breaking or confusing in any way. The actual _tsid is an internal implementation detail and changes to it shouldn't leak to to user. If that statement is true, we don't need to be too strict about dimensions we're adding after the fact. The only limiting factor is the dimension limit, which we will hopefully get rid of soon:
I guess for integrations we can be specific about which fields we want to map as dimensions. But for custom metrics ( |
This is actually also a burden for integrations development. It is time well spent in the sense that in theory this is time saved to users. Anything we can do to avoid needing to define mappings in TSDB use cases will be welcome.
If that's the case, then I agree that we shouldn't care about having too many dimensions, but I guess we are not on this situation yet? |
Yes, that's just aspirational.
Sure, the integration mappings need to work out-of-the-box for TSDB with no user configuration. That's a given. What I meant was when we're in this aspirational world where we don't have limits on dimensions, we can choose to not spend too many brain cycles agonizing about what the minimum set of dimensions is. Or we could opt-out instead of opt-in. |
Ok, let's forget about my concern then, and think about opt-in or opt-out approaches on the specific implementations handling these dimensions. Thanks! |
make test
?make
and committed those changes?