[AWS] improve S3 input states copy by only storing filtered entries #41869

@Kavindu-Dodan Kavindu-Dodan commented Dec 3, 2024

Proposed commit message

Improves how the S3 polling mode copies the state registry by taking bucket_list_prefix into account. Before this change, the input stored every entry loaded from the underlying registry (for example, when storing, restarting Beats, or upgrading while pointing to the same registry). With this improvement, when bucket_list_prefix is set, the input's state registry copy holds only entries that match the configured prefix.

This improvement benefits the state registry clean-up planned in #41694. It also reduces the memory used by the input-specific state copy after a Beats restart, because the copy keeps only entries that match the configured prefix.
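
The filtering described above can be sketched in Go as follows. This is a minimal illustration and not the code from this PR: the `state` type, the `filterByPrefix` helper, and the sample keys are hypothetical names chosen for the example. It assumes the prefix check is a plain string-prefix match on the object key.

```go
package main

import (
	"fmt"
	"strings"
)

// state is a hypothetical, simplified stand-in for one registry entry.
type state struct {
	Bucket string
	Key    string
	Stored bool
}

// filterByPrefix returns a new map holding only the entries whose object key
// starts with prefix. An empty prefix matches every key, because
// strings.HasPrefix(s, "") is always true.
func filterByPrefix(registry map[string]state, prefix string) map[string]state {
	filtered := make(map[string]state)
	for id, st := range registry {
		if strings.HasPrefix(st.Key, prefix) {
			filtered[id] = st
		}
	}
	return filtered
}

func main() {
	// Hypothetical registry content loaded at startup.
	registry := map[string]state{
		"my-bucket/logs/app/1.json": {Bucket: "my-bucket", Key: "logs/app/1.json", Stored: true},
		"my-bucket/logs/app/2.json": {Bucket: "my-bucket", Key: "logs/app/2.json", Stored: true},
		"my-bucket/logs/db/1.json":  {Bucket: "my-bucket", Key: "logs/db/1.json", Stored: true},
		"my-bucket/readme.txt":      {Bucket: "my-bucket", Key: "readme.txt", Stored: true},
	}

	fmt.Println(len(filterByPrefix(registry, "logs/app/"))) // prints 2
	fmt.Println(len(filterByPrefix(registry, "")))          // prints 4
}
```

In this sketch the prefixed copy holds two of the four entries, which is the memory saving the description refers to.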

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

None. This change only affects the input-specific copy of the registry entries.

How to test this PR locally

Requires a local Filebeat build and an S3 bucket containing objects both with and without key prefixes.

  • Generate a mix of S3 bucket entries with and without prefixes. You may use the data-gen tool (see footnote 1).
  • Build Filebeat from this branch and configure multiple inputs with different prefixes.
  • Observe how the state registry copy fills at startup. You may first run without a prefix to store all entries, then restart Filebeat with a prefix to observe the loading behavior.

Related issues

#39116

Footnotes

  1. https://github.com/Kavindu-Dodan/data-gen

@Kavindu-Dodan Kavindu-Dodan requested a review from a team as a code owner December 3, 2024 17:26
@botelastic botelastic bot added the needs_team label Dec 3, 2024
@Kavindu-Dodan Kavindu-Dodan added the Team:obs-ds-hosted-services label Dec 3, 2024
@elasticmachine

Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)

@botelastic botelastic bot removed the needs_team label Dec 3, 2024
@Kavindu-Dodan Kavindu-Dodan added the enhancement, needs_team, and backport-8.x labels and removed the needs_team label Dec 3, 2024
@Kavindu-Dodan Kavindu-Dodan changed the title from "[AWS] improve S3 registry states copy by only storing filtered entries" to "[AWS] improve S3 input states copy by only storing filtered entries" Dec 3, 2024
@Kavindu-Dodan Kavindu-Dodan force-pushed the chore/improve-s3-store-when-using-prefix branch from 20a8edf to e6296f6 December 3, 2024 18:41
Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co>
@Kavindu-Dodan Kavindu-Dodan force-pushed the chore/improve-s3-store-when-using-prefix branch from e6296f6 to f128cf0 December 3, 2024 18:47
@kaiyan-sheng kaiyan-sheng left a comment

The change looks good. Just one question: if we have two S3 inputs running, one with prefix A and one without any prefix, will the state be missing entries?

@Kavindu-Dodan

> The change looks good. Just one question: if we have two S3 inputs running, one with prefix A and one without any prefix, will the state be missing entries?

No, there won't be missing entries. The prefixed input's state copy stores and handles only the entries that match its prefix. The non-prefixed input maintains all entries, including prefixed ones. This holds even across restarts and upgrades that point to the same registry.
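
The two-input scenario from the question can be modeled with a short Go sketch. It is an illustration under stated assumptions and not the input's real code: the `input` type, its `load` method, and the sample keys are hypothetical, and the shared registry is reduced to a list of object keys.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// input is a hypothetical model of one aws-s3 input and its private state copy.
type input struct {
	prefix string   // value of bucket_list_prefix; "" means no prefix is set
	states []string // object keys held in this input's state copy
}

// load rebuilds the input's state copy from the shared registry keys,
// keeping only the keys that start with the input's prefix.
func (in *input) load(registryKeys []string) {
	in.states = in.states[:0]
	for _, key := range registryKeys {
		if strings.HasPrefix(key, in.prefix) {
			in.states = append(in.states, key)
		}
	}
	sort.Strings(in.states)
}

func main() {
	// Keys assumed to be present in the shared registry after a restart.
	shared := []string{"A/1.log", "A/2.log", "B/1.log", "root.log"}

	prefixed := &input{prefix: "A/"}
	unprefixed := &input{}

	prefixed.load(shared)
	unprefixed.load(shared)

	fmt.Println(prefixed.states)   // prints [A/1.log A/2.log]
	fmt.Println(unprefixed.states) // prints [A/1.log A/2.log B/1.log root.log]
}
```

Each input filters its own copy, so the non-prefixed input still sees every entry, and the prefixed input drops only entries it would never list.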

@Kavindu-Dodan Kavindu-Dodan merged commit 91070bf into elastic:main Dec 3, 2024
22 checks passed
mergify bot pushed a commit that referenced this pull request Dec 3, 2024
…41869)

* s3 state imporvement with prefix filtering

Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co>

* add changelog entry

Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co>

---------

Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co>
(cherry picked from commit 91070bf)
Kavindu-Dodan added a commit that referenced this pull request Dec 4, 2024
…41869) (#41883)

Co-authored-by: Kavindu Dodanduwa <Kavindu-Dodan@users.noreply.github.com>
@Kavindu-Dodan Kavindu-Dodan added the backport-8.16 label Dec 5, 2024
mergify bot pushed a commit that referenced this pull request Dec 5, 2024
…41869)

@Kavindu-Dodan Kavindu-Dodan added the backport-8.17 label Dec 5, 2024
mergify bot pushed a commit that referenced this pull request Dec 5, 2024
…41869)

Kavindu-Dodan added a commit that referenced this pull request Dec 6, 2024
…oring filtered entries (#41922)

* [AWS] improve S3 input states copy by only storing filtered entries (#41869)

* Update CHANGELOG.next.asciidoc

---------

Co-authored-by: Kavindu Dodanduwa <Kavindu-Dodan@users.noreply.github.com>
Kavindu-Dodan added a commit that referenced this pull request Dec 6, 2024
…oring filtered entries (#41921)

* [AWS] improve S3 input states copy by only storing filtered entries (#41869)

* Update CHANGELOG.next.asciidoc

---------

Co-authored-by: Kavindu Dodanduwa <Kavindu-Dodan@users.noreply.github.com>