[AWS][Kinesis] Add dimension fields for TSDB support #5891

Merged
merged 3 commits into from
May 4, 2023

@constanca-m constanca-m commented Apr 17, 2023

What does this PR do?

Add dimension fields to the Kinesis data stream.

Details

To uniquely identify a Kinesis stream, we need the combination of stream name (unique per AWS region), account ID, and account region. No metrics are split by labels, so no further dimensions should be needed. Tests with TSDB enabled and disabled showed no change in the number of documents received.
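
The uniqueness argument above can be checked mechanically. The following is a minimal sketch, not code from this PR: it groups sample documents by timestamp plus the proposed dimension fields and reports any key shared by more than one document. The `find_collisions` helper, the flat dotted field layout, the sample documents, and the `cloud.region` field name are assumptions made for illustration.

```python
from collections import defaultdict

# Dimension fields discussed in this PR; "cloud.region" is an assumed name
# for the region field.
DIMENSIONS = ("aws.dimensions.StreamName", "cloud.account.id", "cloud.region")


def find_collisions(docs, dimensions=DIMENSIONS):
    """Return keys (timestamp + dimension values) shared by more than one doc."""
    groups = defaultdict(list)
    for doc in docs:
        key = (doc["@timestamp"],) + tuple(doc.get(d) for d in dimensions)
        groups[key].append(doc)
    return {key: grp for key, grp in groups.items() if len(grp) > 1}


# Hypothetical sample: the same stream name in two regions at the same time.
docs = [
    {"@timestamp": "2023-04-17T07:00:00Z", "aws.dimensions.StreamName": "orders",
     "cloud.account.id": "111111111111", "cloud.region": "us-east-1"},
    {"@timestamp": "2023-04-17T07:00:00Z", "aws.dimensions.StreamName": "orders",
     "cloud.account.id": "111111111111", "cloud.region": "eu-west-1"},
]

collisions_all = find_collisions(docs)
collisions_no_region = find_collisions(docs, DIMENSIONS[:2])
print(len(collisions_all), len(collisions_no_region))
```

With all three fields the two sample documents stay distinct; dropping the region from the key makes them collide, which is why the stream name alone is not enough.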

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

How to test this PR locally

Refer to #5864

Related issues

Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
@constanca-m constanca-m added enhancement New feature or request Integration:aws AWS labels Apr 17, 2023
@constanca-m constanca-m self-assigned this Apr 17, 2023
@constanca-m constanca-m requested a review from a team as a code owner April 17, 2023 07:21
Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
@constanca-m constanca-m mentioned this pull request Apr 17, 2023
5 tasks

elasticmachine commented Apr 17, 2023

💚 Build Succeeded

Build stats

  • Start Time: 2023-05-04T07:06:25.496+0000

  • Duration: 52 min 53 sec

Test stats 🧪

Test Results

  • Failed: 0
  • Passed: 188
  • Skipped: 4
  • Total: 192

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

elasticmachine commented Apr 17, 2023

🌐 Coverage report

| Name | Metrics % (covered/total) | Diff |
|---|---|---|
| Packages | 100.0% (15/15) | 💚 |
| Files | 93.75% (15/16) | 👎 -3.421 |
| Classes | 93.75% (15/16) | 👎 -3.421 |
| Methods | 86.131% (236/274) | 👎 -6.502 |
| Lines | 85.925% (7387/8597) | 👎 -6.45 |
| Conditionals | 100.0% (0/0) | 💚 |

@agithomas agithomas left a comment

Review feedback shared.

@@ -5,6 +5,7 @@
   type: group
   fields:
     - name: StreamName
+      dimension: true
Contributor

Please add the reason for making this specific field a dimension field here. Adding the reason is one of the best practices for TSDB enablement.

Contributor Author

By "here", do you mean in the manifest, @agithomas? It's explained under "Details" in the PR description.

Contributor

It can be added as an inline comment. Reference

Contributor

I have some thoughts on a better way to handle AWS dimensions. Dimension fields have a length limit, and AWS permits up to 30 dimensions.

If all 30 names and values are used up to their maximum lengths, the 32KB dimension field length limit would be reached.

Can we apply a fingerprint processor to all AWS dimensions and use the resulting fingerprint field as the dimension field?

cc @tetianakravchenko , @lalit-satapathy
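
The fingerprint idea in the comment above can be sketched as follows. This is only an illustration of the technique in Python, not the Elasticsearch fingerprint ingest processor and not code from this PR: it hashes the sorted dimension name/value pairs with SHA-256, so the result has a fixed length no matter how many dimensions there are or how long their values get. The function name `dimension_fingerprint` and the sample dimension values are hypothetical.

```python
import hashlib


def dimension_fingerprint(dimensions):
    """Return a fixed-length hex digest for a dict of dimension names/values."""
    hasher = hashlib.sha256()
    # Sort so the digest does not depend on insertion order.
    for name, value in sorted(dimensions.items()):
        # Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
        for part in (name, str(value)):
            data = part.encode("utf-8")
            hasher.update(len(data).to_bytes(4, "big"))
            hasher.update(data)
    return hasher.hexdigest()


# Hypothetical sample dimensions.
a = dimension_fingerprint({"StreamName": "orders", "ShardId": "shard-0"})
b = dimension_fingerprint({"ShardId": "shard-0", "StreamName": "orders"})
c = dimension_fingerprint({"StreamName": "orders", "ShardId": "shard-1"})
print(len(a), a == b, a == c)
```

The trade-off is that a single hashed field keeps the dimension size bounded but is not human-readable, so the original fields would still be needed for filtering and display.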

Contributor Author

OK, I will try to add it to the document without being too confused. I don't understand the other part, though: only 3 fields are set as dimensions, so why would we need a processor? @agithomas

Contributor

Please validate the proposal against:

  • how often AWS changes the dimensions of a managed service
  • the feasibility of adding an ingest pipeline and a new field solely to implement TSDB.

Contributor Author

@constanca-m constanca-m Apr 26, 2023

Sorry, I don't understand: is there a reason to create a new dimension using the dimensions.* field? The 3 fields currently set as dimensions should be enough, @agithomas.

Contributor

The advantages are mentioned here.

Contributor

The above proposal is based only on the convenience of TSDB. Please compare and choose the best approach.

Contributor

@agithomas, from my understanding, AWS dimensions shouldn't be a concern, at least for this data_stream, since we set StreamName as a dimension (in the TSDB sense); here is a sample event.
So we set aws.dimensions.StreamName (a field of type keyword), not aws.dimensions.* (a field of type object).

@@ -2,6 +2,7 @@
   name: cloud
 - external: ecs
   name: cloud.account.id
+  dimension: true
Contributor

Here there was a suggestion to align on a list of fields (elastic/ecs#2172), and cloud.project.id was suggested. Is this field available for AWS?

Contributor Author

I haven't checked, but if we added that as a dimension it would be redundant, as we don't really need it set as a dimension @tetianakravchenko

Contributor

We checked with @agithomas: cloud.project.id does not exist for all cloud providers; it is only present for GCP.

@@ -5,6 +5,7 @@
   type: group
   fields:
     - name: StreamName
+      dimension: true
Contributor

@constanca-m, how is this name defined? As far as I can see, it is not set in the configuration.
[Screenshot 2023-04-27 at 09 40 35]

What if two Kinesis data streams are created in the same region?

Contributor Author

When we connect to AWS, the integration fetches data from the existing data streams. We don't create any data stream when we add the integration. The stream name is unique per region, and since region is a dimension, it shouldn't be a problem, @tetianakravchenko.

@tetianakravchenko tetianakravchenko left a comment

As agreed: it might be necessary to add the agent.id field as a dimension in the future; for now we are keeping it as is.

@agithomas
Contributor

@constanca-m, please confirm whether the tests below have been conducted.

Verification and validation

  • Verify the data in visualisations after enabling the TSDB flag in Kibana.
  • Verify the count of documents (before and after TSDB enablement) in the Discover interface.
  • Verify that the field mapping is correct in the data stream template.
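
For the field-mapping check, one possible approach is to walk the mapping and list the fields flagged as dimensions. The sketch below assumes the mapping has already been fetched and uses the usual nested `properties` layout, with `time_series_dimension: true` on dimension fields. The `dimension_fields` helper and the sample mapping are illustrative and not taken from the actual data stream template.

```python
def dimension_fields(mapping, prefix=""):
    """Recursively collect dotted names of fields with time_series_dimension."""
    found = []
    for name, spec in mapping.get("properties", {}).items():
        path = f"{prefix}{name}"
        if spec.get("time_series_dimension") is True:
            found.append(path)
        # Object fields nest their children under their own "properties".
        found.extend(dimension_fields(spec, prefix=f"{path}."))
    return sorted(found)


# Hypothetical mapping fragment for illustration only.
sample_mapping = {
    "properties": {
        "aws": {"properties": {"dimensions": {"properties": {
            "StreamName": {"type": "keyword", "time_series_dimension": True},
        }}}},
        "cloud": {"properties": {
            "account": {"properties": {
                "id": {"type": "keyword", "time_series_dimension": True},
            }},
            "region": {"type": "keyword", "time_series_dimension": True},
            "provider": {"type": "keyword"},
        }},
    }
}

fields = dimension_fields(sample_mapping)
print(fields)
```

Comparing the returned list with the fields marked `dimension: true` in the package would show whether any dimension was lost or added unexpectedly.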

@constanca-m
Contributor Author

> @constanca-m , please confirm if the below tests are conducted ?
>
> Verification and validation
>
> * [ ]  Verification of data in visualisation after enabling TSDB flag in kibana
> * [ ]  Verification of the count of documents (before & after TSDB enablement) in Discover Interface
> * [ ]  Verify if field mapping is correct in the data stream template.

Yes, for dimensions they are correct. I didn't check the tasks because they are influenced by the two PRs, one for dimensions and one for metrics. @agithomas

@agithomas
Contributor

> > @constanca-m , please confirm if the below tests are conducted ?
> >
> > Verification and validation
> >
> > * [ ]  Verification of data in visualisation after enabling TSDB flag in kibana
> > * [ ]  Verification of the count of documents (before & after TSDB enablement) in Discover Interface
> > * [ ]  Verify if field mapping is correct in the data stream template.
>
> Yes, for dimensions they are correct. I didn't check the tasks because they are influenced by the two PRs, one for dimensions and one for metrics. @agithomas

OK, thanks. Please include this validation in the metric_type mapping PR.

@agithomas agithomas left a comment

LGTM!

@constanca-m constanca-m merged commit a904bba into elastic:main May 4, 2023
@constanca-m constanca-m deleted the tsdb-dimensions-kinesis branch May 4, 2023 08:09
@elasticmachine

Package aws - 1.34.2 containing this change is available at https://epr.elastic.co/search?package=aws
