Add some dimensions to the kubernetes integration #2076

jsoriano · 2021-11-01T18:30:38Z

On time series databases, dimensions are the set of values that identifies a single time series. For efficiency this set should be minimal, but enough to avoid mixing unrelated data.
It is important to select a good set of dimensions, because changes on them would be breaking.

As an initial exercise before starting to use it more broadly in integrations, I have selected the dimensions for some kubernetes data stream:

For pods: the pod uid, the name and the namespace. The uid could be actually enough, but I have found that it is not present for all pods.
For containers: same as for pods, plus the container name.
For apiserver: most things that can be different between requests. I wonder if we should select all things that can be different between requests.

In other data streams I have set as dimensions the identifiers related to the monitored resource, and the namespace if the resource is at the namespace level.

I wonder if we should also add cluster identifiers as dimensions. On one side there can be pods with the same name and same namespace on different clusters, but on the other side I wonder how common this is, they usually have randomized suffixes.

@ChrsMark can you think of other fields that would make sense having as dimensions?

@nik9000 could you please take a look here to see if the selected dimensions make sense?

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS

Expand to view the summary

Build stats

Start Time: 2021-11-19T11:17:15.903+0000
Duration: 31 min 10 sec
Commit: 8f082c9

Test stats 🧪

Test	Results
Failed	0
Passed	114
Skipped	0
Total	114

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.

imotov

It looks reasonable to me.

packages/kubernetes/data_stream/container/fields/base-fields.yml

ChrsMark · 2021-11-02T15:39:34Z

Very interesting @jsoriano ! I find your additions sane, I only left one comment/question about the container one.

elasticmachine · 2021-11-15T15:41:39Z

Pinging @elastic/integrations (Team:Integrations)

jsoriano · 2021-11-15T15:45:20Z

I went through all the data streams in the kubernetes integration and marked dimensions accordingly, I am opening this for review.

In the proxy data stream I haven't marked any dimension, I was not sure which one to use, should we use one that identifies the node? I wonder if this data stream should collect the kubernetes.node.name then. @ChrsMark wdyt?

Regarding the use of the cluster and node identifiers as dimensions, I may not been very consistent, I have tried to identify the cases where having them as dimensions are more relevant. An extra eye/opinion regarding that would be interesting.

ChrsMark · 2021-11-16T09:55:33Z

For proxy maybe https://github.com/elastic/beats/blob/95626b8f1690344312c0831ab2bdcbccffe4d089/metricbeat/module/kubernetes/proxy/proxy.go#L47 or some of those labels like host+handler?

Regarding cluster identifier I would avoid it since it's not a field that will always be there, we populate it in a best effort way.

ChrsMark

Looks good to me, left just 2 comments:

what it this -next tag for the version
let's try to be consistent on the container's ecs fields and the place we locate them. If we need to move some of them to the ecs specific file we need to do it for all of them.

About the orchestator fields I saw that they are used in combination with node.name so I think we are just fine if those fields are missing since we can rely on the node.name? Would that somehow make things tricky if we have several clusters and only some of them have the identifiers (like some one AWS and some on GKE)?

packages/kubernetes/data_stream/container/fields/ecs.yml

ChrsMark · 2021-11-16T10:00:48Z

packages/kubernetes/manifest.yml

@@ -1,7 +1,7 @@
 format_version: 1.0.0
 name: kubernetes
 title: Kubernetes
-version: 1.4.1
+version: 1.4.2-next


Could you provide more background in what special this release has? It's first time I'm seeing this and wondering what is the difference and if it would require any special handling in the release process (promote etc).

With semver, any suffix added this way is considered a prerelease. With this we can accumulate changes for snapshot packages, and can decide later to release 1.4.2 (or 1.5.0).
This is something supported in the spec since elastic/package-spec#193.
In principle a "prerelease" version shouldn't be used in production environments, and shouldn't be promoted to production. We are discussing more about this in elastic/package-spec#225

But I can tag a proper release and promote it if prefered.

I'm fine with both options, however if we have new releases while this one is still a pre-lease then we will have to manually make it an actual release by pushing directly to the registry, right?

jsoriano · 2021-11-16T13:03:52Z

About the orchestator fields I saw that they are used in combination with node.name so I think we are just fine if those fields are missing since we can rely on the node.name? Would that somehow make things tricky if we have several clusters and only some of them have the identifiers (like some one AWS and some on GKE)?

I added them to have a way to distinguish nodes with the same name on different pods. It is ok if it is empty, it will only not be meaningful to identify timeseries. If a user needs to distinguish resources between clusters that don't provide this info automatically, they will need to manually set these fields.

I wonder if they should be dimensions in all cases, many resources can have the same name on different clusters and not all of them have randomized uids.

jsoriano · 2021-11-16T13:18:43Z

Well, thinking again about the orchestrator fields, I guess that it depends on how this data is used. If data from different clusters (that cannot be identified otherwise) is shown together, then I think that it will make sense. Otherwise, users can still filter their data per cluster. I am going to remove it.

jsoriano · 2021-11-16T18:34:53Z

/test

ChrsMark

Changes look good! Not sure if CI failure is relevant.

jsoriano · 2021-11-18T17:48:49Z

Waiting for #2177, that will include checks for the dimensions fields.

jsoriano · 2021-11-22T10:38:14Z

All good with CI after including the checks for dimensions in elastic-package, merging, thanks!

jsoriano added Team:Integrations Label for the Integrations team Team:Ecosystem Label for the Packages Ecosystem team [elastic/ecosystem] labels Nov 1, 2021

jsoriano requested review from nik9000 and ChrsMark November 1, 2021 18:30

jsoriano self-assigned this Nov 1, 2021

nik9000 reviewed Nov 1, 2021

View reviewed changes

packages/kubernetes/data_stream/container/fields/base-fields.yml Show resolved Hide resolved

jsoriano force-pushed the kubernetes-dimensions branch from 77031c2 to 1f45022 Compare November 1, 2021 19:00

imotov reviewed Nov 1, 2021

View reviewed changes

packages/kubernetes/data_stream/container/fields/base-fields.yml Show resolved Hide resolved

ChrsMark reviewed Nov 2, 2021

View reviewed changes

packages/kubernetes/data_stream/container/fields/base-fields.yml Show resolved Hide resolved

jsoriano mentioned this pull request Nov 9, 2021

[Change Proposal] Support external fields as dimensions elastic/package-spec#247

Closed

2 tasks

jsoriano added 6 commits November 15, 2021 16:03

Add some dimensions to kubernetes integration

9fdba0e

Use container.id as dimension too

61efffe

container_logs

682f889

event, controller manager

7b66939

Node

61022f5

More dimensions

88f18e4

jsoriano force-pushed the kubernetes-dimensions branch from f7a91e3 to 88f18e4 Compare November 15, 2021 15:40

jsoriano marked this pull request as ready for review November 15, 2021 15:41

jsoriano requested a review from ChrsMark November 15, 2021 15:45

Fix fmt

834d9cb

ChrsMark reviewed Nov 16, 2021

View reviewed changes

jsoriano added 3 commits November 16, 2021 14:05

Merge remote-tracking branch 'origin/master' into kubernetes-dimensions

842dda7

Don't move container fields to ecs.yml

b0289bc

Use a stable version

7645308

jsoriano added 2 commits November 16, 2021 14:14

Add dimensions to proxy data stream

fbdb974

Remove orchestrator fields as dimensions

51a374b

jsoriano requested a review from ChrsMark November 16, 2021 13:19

jsoriano added 2 commits November 16, 2021 14:20

Revert field level changed by mistake

7cdaac2

Recover wrong lines removed

f753d85

ChrsMark approved these changes Nov 17, 2021

View reviewed changes

Merge remote-tracking branch 'origin/master' into kubernetes-dimensions

8f082c9

jsoriano merged commit f58f321 into elastic:master Nov 22, 2021

jsoriano deleted the kubernetes-dimensions branch November 22, 2021 10:38

eyalkraft pushed a commit to build-security/integrations that referenced this pull request Mar 30, 2022

Add dimensions to the kubernetes integration (elastic#2076)

c1a24ab

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add some dimensions to the kubernetes integration #2076

Add some dimensions to the kubernetes integration #2076

jsoriano commented Nov 1, 2021 •

edited

Loading

elasticmachine commented Nov 1, 2021 •

edited

Loading

Build stats

Test stats 🧪

imotov left a comment

ChrsMark commented Nov 2, 2021

elasticmachine commented Nov 15, 2021

jsoriano commented Nov 15, 2021

ChrsMark commented Nov 16, 2021

ChrsMark left a comment

ChrsMark Nov 16, 2021

jsoriano Nov 16, 2021

ChrsMark Nov 17, 2021

jsoriano commented Nov 16, 2021

jsoriano commented Nov 16, 2021 •

edited

Loading

jsoriano commented Nov 16, 2021

ChrsMark left a comment

jsoriano commented Nov 18, 2021

jsoriano commented Nov 22, 2021

Add some dimensions to the kubernetes integration #2076

Add some dimensions to the kubernetes integration #2076

Conversation

jsoriano commented Nov 1, 2021 • edited Loading

elasticmachine commented Nov 1, 2021 • edited Loading

💚 Build Succeeded

Build stats

Test stats 🧪

🤖 GitHub comments

imotov left a comment

Choose a reason for hiding this comment

ChrsMark commented Nov 2, 2021

elasticmachine commented Nov 15, 2021

jsoriano commented Nov 15, 2021

ChrsMark commented Nov 16, 2021

ChrsMark left a comment

Choose a reason for hiding this comment

ChrsMark Nov 16, 2021

Choose a reason for hiding this comment

jsoriano Nov 16, 2021

Choose a reason for hiding this comment

ChrsMark Nov 17, 2021

Choose a reason for hiding this comment

jsoriano commented Nov 16, 2021

jsoriano commented Nov 16, 2021 • edited Loading

jsoriano commented Nov 16, 2021

ChrsMark left a comment

Choose a reason for hiding this comment

jsoriano commented Nov 18, 2021

jsoriano commented Nov 22, 2021

jsoriano commented Nov 1, 2021 •

edited

Loading

elasticmachine commented Nov 1, 2021 •

edited

Loading

jsoriano commented Nov 16, 2021 •

edited

Loading