Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AWS][Kinesis] Add dimension fields for TSDB support #5891

Merged
merged 3 commits into from
May 4, 2023
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions packages/aws/changelog.yml
Original file line number Diff line number Diff line change
@@ -1,4 +1,9 @@
# newer versions go on top
- version: "1.34.0"
changes:
- description: Add dimension fields to Kinesis datastream.
type: enhancement
link: https://github.com/elastic/integrations/pull/5891
- version: "1.33.3"
changes:
- description: Add number_of_workers and latency to all CloudWatch Logs based integrations.
Expand Down
2 changes: 2 additions & 0 deletions packages/aws/data_stream/kinesis/fields/ecs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
name: cloud
- external: ecs
name: cloud.account.id
dimension: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here was a suggestion to align on list of fields - elastic/ecs#2172 and was suggested to use cloud.project.id. Is this field available for AWS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't checked, but if we added that as a dimension it would be redundant, as we don't really need it set as a dimension @tetianakravchenko

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we checked with @agithomas that cloud.project.id does not exist for all cloud providers, it only present for gcp.

- external: ecs
name: cloud.account.name
- external: ecs
Expand All @@ -18,6 +19,7 @@
name: cloud.provider
- external: ecs
name: cloud.region
dimension: true
- external: ecs
name: container.image.name
- external: ecs
Expand Down
1 change: 1 addition & 0 deletions packages/aws/data_stream/kinesis/fields/fields.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
type: group
fields:
- name: StreamName
dimension: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add here the reason for adding the specific field as a dimension field.Adding the reason is among the best practices for TSDB enablement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "here" you mean in the manifest @agithomas? It's explained in "Details" in the PR description.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be added as the inline comment. Reference

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have certain thoughts around a better way to handle aws dimensions. We have a length limitation in dimension and AWS permits 30 dimensions

If all 30 names and values are fully used to max limit, the 32KB dimension field length limitation would reach.

Can we have fingerprint processor applied on all aws dimensions and use the new field (having fingerprint) used as a dimension field?

cc @tetianakravchenko , @lalit-satapathy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will try to add it to the document without being too confused. I don't understand the other part though, there are only 3 fields set as dimension, why would we have the need for a processor? @agithomas

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please validate the proposal against

  • how often aws change the dimensions of a managed service
  • feasibility of including ingest pipeline and inclusion of new field only for implementing TSDB.

Copy link
Contributor Author

@constanca-m constanca-m Apr 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't understand: is there a reason to create a new dimension using the dimensions.* field? The 3 fields set to dimension right now should be enough @agithomas

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The advantages are mentioned here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above proposal is based only on the convenience of TSDB. Please compare and choose the best approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agithomas from my understanding, aws dimensions shouldn't be a concern at least for this data_stream, since we set StreamName as a dimension (dimension in TSDB scope) - here is an sample of event.
So we set aws.dimensions.StreamName (the field of type keyword) , not the aws.dimensions.* (the field of type object)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@constanca-m how this name is defined? as I see this name is not set in configuration
Screenshot 2023-04-27 at 09 40 35

what if 2 kinesis data_stream in the same region will be created?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we connect to the AWS, it fetches data from the existent data streams. We don't create any data stream when we add the integration. The stream name is unique per region, and since region is a dimension, it shouldn't be a problem @tetianakravchenko

type: keyword
description: The name of the Kinesis stream. All available statistics are filtered by StreamName.
- name: kinesis.metrics
Expand Down
2 changes: 1 addition & 1 deletion packages/aws/manifest.yml
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
format_version: 1.0.0
name: aws
title: AWS
version: 1.33.3
version: 1.34.0
license: basic
description: Collect logs and metrics from Amazon Web Services with Elastic Agent.
type: integration
Expand Down