-
Notifications
You must be signed in to change notification settings - Fork 456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AWS][Kinesis] Add dimension fields for TSDB support #5891
Conversation
Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
🌐 Coverage report
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review feedback shared.
@@ -5,6 +5,7 @@ | |||
type: group | |||
fields: | |||
- name: StreamName | |||
dimension: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add here the reason for adding the specific field as a dimension field.Adding the reason is among the best practices for TSDB enablement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By "here" you mean in the manifest @agithomas? It's explained in "Details" in the PR description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be added as the inline comment. Reference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have certain thoughts around a better way to handle aws dimensions. We have a length limitation in dimension and AWS permits 30 dimensions
If all 30 names and values are fully used to max limit, the 32KB dimension field length limitation would reach.
Can we have fingerprint processor applied on all aws dimensions and use the new field (having fingerprint) used as a dimension field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I will try to add it to the document without being too confused. I don't understand the other part though, there are only 3 fields set as dimension, why would we have the need for a processor? @agithomas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please validate the proposal against
- how often aws change the dimensions of a managed service
- feasibility of including ingest pipeline and inclusion of new field only for implementing TSDB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't understand: is there a reason to create a new dimension using the dimensions.*
field? The 3 fields set to dimension right now should be enough @agithomas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The advantages are mentioned here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above proposal is based only on the convenience of TSDB. Please compare and choose the best approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agithomas from my understanding, aws dimensions shouldn't be a concern at least for this data_stream, since we set StreamName
as a dimension
(dimension
in TSDB scope) - here is an sample of event.
So we set aws.dimensions.StreamName
(the field of type keyword
) , not the aws.dimensions.*
(the field of type object
)
@@ -2,6 +2,7 @@ | |||
name: cloud | |||
- external: ecs | |||
name: cloud.account.id | |||
dimension: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here was a suggestion to align on list of fields - elastic/ecs#2172 and was suggested to use cloud.project.id
. Is this field available for AWS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't checked, but if we added that as a dimension it would be redundant, as we don't really need it set as a dimension @tetianakravchenko
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we checked with @agithomas that cloud.project.id
does not exist for all cloud providers, it only present for gcp.
@@ -5,6 +5,7 @@ | |||
type: group | |||
fields: | |||
- name: StreamName | |||
dimension: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@constanca-m how this name is defined? as I see this name is not set in configuration
what if 2 kinesis data_stream in the same region will be created?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we connect to the AWS, it fetches data from the existent data streams. We don't create any data stream when we add the integration. The stream name is unique per region, and since region is a dimension, it shouldn't be a problem @tetianakravchenko
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As agreed: it might be needed to add agent.id
field as a dimension in future, for now keeping as is.
@constanca-m , please confirm if the below tests are conducted ? Verification and validation
|
Yes, for dimensions they are correct. I didn't check the tasks because they are influenced by the two PRs, one for dimensions and one for metrics. @agithomas |
Ok, Thanks. Please include this validation for metric_type mapping PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Package aws - 1.34.2 containing this change is available at https://epr.elastic.co/search?package=aws |
What does this PR do?
Add dimension fields to Kinesis datastream.
Details
To uniquely identify a Kinesis stream, we need the combination of stream name (unique per AWS region) + account ID + account region. There are no metrics split by labels, so no more dimensions should be needed. The tests with TSDB enabled and disabled did not show a change on the number of documents received.
Checklist
changelog.yml
file.How to test this PR locally
Refer to #5864
Related issues