Add some dimensions to the kubernetes integration #2076
77031c2 to 1f45022
It looks reasonable to me.
Very interesting @jsoriano ! I find your additions sane, I only left one comment/question about the |
f7a91e3 to 88f18e4
Pinging @elastic/integrations (Team:Integrations) |
I went through all the data streams in the kubernetes integration and marked dimensions accordingly, and I am opening this for review. In the proxy data stream I haven't marked any dimension, as I was not sure which one to use. Should we use one that identifies the node? I wonder if this data stream should collect the Regarding the use of the cluster and node identifiers as dimensions, I may not have been very consistent; I have tried to identify the cases where having them as dimensions is more relevant. An extra eye/opinion on that would be welcome.
For proxy maybe https://github.com/elastic/beats/blob/95626b8f1690344312c0831ab2bdcbccffe4d089/metricbeat/module/kubernetes/proxy/proxy.go#L47 or some of those labels like host+handler? Regarding cluster identifier I would avoid it since it's not a field that will always be there, we populate it in a best effort way. |
Looks good to me, left just 2 comments:
- what is this -next tag for the version
- let's try to be consistent on the container's ecs fields and the place we locate them. If we need to move some of them to the ecs specific file we need to do it for all of them.
About the orchestrator fields, I saw that they are used in combination with node.name, so I think we are just fine if those fields are missing since we can rely on the node.name? Would that somehow make things tricky if we have several clusters and only some of them have the identifiers (like some on AWS and some on GKE)?
packages/kubernetes/manifest.yml
@@ -1,7 +1,7 @@
 format_version: 1.0.0
 name: kubernetes
 title: Kubernetes
-version: 1.4.1
+version: 1.4.2-next
Could you provide more background on what is special about this release? It's the first time I'm seeing this, and I'm wondering what the difference is and whether it would require any special handling in the release process (promote etc).
With semver, any suffix added this way is considered a prerelease. With this we can accumulate changes for snapshot packages, and can decide later to release 1.4.2 (or 1.5.0).
This is something supported in the spec since elastic/package-spec#193.
In principle a "prerelease" version shouldn't be used in production environments, and shouldn't be promoted to production. We are discussing more about this in elastic/package-spec#225
But I can tag a proper release and promote it if preferred.
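To illustrate the ordering this relies on, here is a minimal Python sketch of SemVer 2.0.0 precedence. It is not the code used by the package registry or package-spec; `precedence_key` is a hypothetical helper written for this example.

```python
def precedence_key(version):
    """Sort key following the SemVer 2.0.0 precedence rules (hypothetical helper)."""
    version = version.partition("+")[0]  # build metadata does not affect precedence
    core, _, prerelease = version.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    identifiers = []
    if prerelease:
        for ident in prerelease.split("."):
            # Numeric identifiers compare numerically and sort below alphanumeric ones.
            identifiers.append((0, int(ident), "") if ident.isdigit() else (1, 0, ident))
    # A version without a prerelease sorts after any prerelease of the same core version.
    return (major, minor, patch, 0 if prerelease else 1, identifiers)


versions = ["1.5.0", "1.4.2", "1.4.2-next", "1.4.1"]
ordered = sorted(versions, key=precedence_key)
print(ordered)
```

Under these rules 1.4.2-next sorts after 1.4.1 and before 1.4.2, which is what lets changes accumulate before deciding on the final version number.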
I'm fine with both options. However, if we have new releases while this one is still a prerelease, then we will have to manually make it an actual release by pushing directly to the registry, right?
I added them to have a way to distinguish nodes with the same name on different pods. It is OK if they are empty; they just won't help to identify time series. If a user needs to distinguish resources between clusters that don't provide this info automatically, they will need to set these fields manually. I wonder if they should be dimensions in all cases: many resources can have the same name on different clusters and not all of them have randomized uids.
Well, thinking again about the orchestrator fields, I guess that it depends on how this data is used. If data from different clusters (that cannot be identified otherwise) is shown together, then I think that it will make sense. Otherwise, users can still filter their data per cluster. I am going to remove it. |
/test |
Changes look good! Not sure if CI failure is relevant.
Waiting for #2177, that will include checks for the dimensions fields. |
All good with CI after including the checks for dimensions in |
In time series databases, dimensions are the set of values that identifies a single time series. For efficiency this set should be minimal, but large enough to avoid mixing unrelated data.
It is important to select a good set of dimensions, because later changes to them would be breaking.
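As a rough illustration of the trade-off, here is a plain Python sketch (not how Elasticsearch implements time series) that groups samples by the tuple of their dimension values. The field names, sample documents and the `group_by_dimensions` helper are assumptions made for this example.

```python
from collections import defaultdict


def group_by_dimensions(samples, dimensions):
    """Group samples into time series keyed by the values of their dimensions."""
    series = defaultdict(list)
    for sample in samples:
        # A missing dimension is modelled as None (an assumption of this sketch).
        key = tuple(sample.get(name) for name in dimensions)
        series[key].append(sample)
    return dict(series)


# Hypothetical samples: two pods with the same name in different namespaces.
samples = [
    {"kubernetes.namespace": "default", "kubernetes.pod.name": "web", "cpu": 0.2},
    {"kubernetes.namespace": "staging", "kubernetes.pod.name": "web", "cpu": 0.7},
    {"kubernetes.namespace": "default", "kubernetes.pod.name": "web", "cpu": 0.3},
]

# Too few dimensions: both pods end up mixed in one series.
too_few = group_by_dimensions(samples, ["kubernetes.pod.name"])
# Adding the namespace keeps the two pods apart.
enough = group_by_dimensions(samples, ["kubernetes.namespace", "kubernetes.pod.name"])
print(len(too_few), len(enough))
```

With only the pod name as dimension, unrelated pods are mixed into one series; adding the namespace separates them without adding more fields than needed.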
As an initial exercise before starting to use it more broadly in integrations, I have selected the dimensions for some kubernetes data streams:
In other data streams I have set as dimensions the identifiers related to the monitored resource, and the namespace if the resource is at the namespace level.
I wonder if we should also add cluster identifiers as dimensions. On the one hand, there can be pods with the same name and same namespace on different clusters; on the other hand, I wonder how common this is, as they usually have randomized suffixes.
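The cluster question can be sketched in the same plain Python model. This is only an illustration under assumed field names and sample data, with `series_keys` as a hypothetical helper; it also covers the case where the cluster identifier is not populated.

```python
def series_keys(samples, dimensions):
    """Return the distinct dimension tuples (one per time series) in the samples."""
    return {tuple(sample.get(name) for name in dimensions) for sample in samples}


POD_DIMENSIONS = ["kubernetes.namespace", "kubernetes.pod.name"]
WITH_CLUSTER = ["orchestrator.cluster.name"] + POD_DIMENSIONS

# Hypothetical samples: a pod without a randomized suffix on two clusters.
identified = [
    {"orchestrator.cluster.name": "prod-eu", "kubernetes.namespace": "default", "kubernetes.pod.name": "web-0"},
    {"orchestrator.cluster.name": "prod-us", "kubernetes.namespace": "default", "kubernetes.pod.name": "web-0"},
]
# The same pods when the cluster identifier could not be populated.
unidentified = [
    {key: value for key, value in sample.items() if key != "orchestrator.cluster.name"}
    for sample in identified
]

print(len(series_keys(identified, POD_DIMENSIONS)))   # pods from both clusters collide
print(len(series_keys(identified, WITH_CLUSTER)))     # the cluster name separates them
print(len(series_keys(unidentified, WITH_CLUSTER)))   # no identifier, so they collide again
```

The cluster dimension only helps when the identifier is actually present, which is why a field populated on a best-effort basis is a weak candidate.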
@ChrsMark can you think of other fields that would make sense having as dimensions?
@nik9000 could you please take a look here to see if the selected dimensions make sense?
Related links: