Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CORE-6992] cloud_storage_clients/abs_client: use item filter in list_objects #24439

Merged
merged 2 commits into from
Dec 5, 2024

Conversation

andrwng
Copy link
Contributor

@andrwng andrwng commented Dec 4, 2024

This seems like an oversight. Noticed in an ABS run of a test:

INFO  2024-11-12 02:16:37,966 [shard 1:main] cloud_storage - topic_manifest_downloader.cc:102 - Labeled topic manifest download resulted in 2 matching manifests with prefix '', printing first 2
INFO  2024-11-12 02:16:37,966 [shard 1:main] cloud_storage - topic_manifest_downloader.cc:112 - Match for hint '': meta/kafka/topic-a/83123841-40c1-47e4-92ed-5ac8d1f536c7/8/topic_manifest.bin
INFO  2024-11-12 02:16:37,966 [shard 1:main] cloud_storage - topic_manifest_downloader.cc:112 - Match for hint '': meta/kafka/topic-a/eb0359d1-a0fb-45aa-8995-d938a34caeef/8_lifecycle.bin
WARN  2024-11-12 02:16:37,966 [shard 1:main] kafka - create_topics.cc:185 - Failed to create topic(s) {{kafka/topic-a}} error_code observed: cluster::errc::topic_operation_error

In this snippet, the list results contain a lifecycle marker, despite
the filter passed in during manifest download being
ends_with("topic_manifest.bin").

Auditing the codebase a bit, there don't appear to be any other usages
of list_objects() with a filter.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Bug Fixes

  • Fixes a bug that could prevent topic recovery on ABS object storage when there are objects in a bucket from multiple clusters (e.g. following a whole cluster restore).

This seems like an oversight. Noticed in an ABS run of a test:

```
INFO  2024-11-12 02:16:37,966 [shard 1:main] cloud_storage - topic_manifest_downloader.cc:102 - Labeled topic manifest download resulted in 2 matching manifests with prefix '', printing first 2
INFO  2024-11-12 02:16:37,966 [shard 1:main] cloud_storage - topic_manifest_downloader.cc:112 - Match for hint '': meta/kafka/topic-a/83123841-40c1-47e4-92ed-5ac8d1f536c7/8/topic_manifest.bin
INFO  2024-11-12 02:16:37,966 [shard 1:main] cloud_storage - topic_manifest_downloader.cc:112 - Match for hint '': meta/kafka/topic-a/eb0359d1-a0fb-45aa-8995-d938a34caeef/8_lifecycle.bin
WARN  2024-11-12 02:16:37,966 [shard 1:main] kafka - create_topics.cc:185 - Failed to create topic(s) {{kafka/topic-a}} error_code observed: cluster::errc::topic_operation_error
```

In this snippet, the list results contain a lifecycle marker, despite
the filter passed in during manifest download being
ends_with("topic_manifest.bin").

Auditing the codebase a bit, there don't appear to be any other usages
of list_objects() with a filter.
Testing with ABS would have caught a bug in the ABS client during topic
recovery when there are multiple objects within a topic's manifest path
prefix (e.g. from different clusters).
Copy link
Contributor

@WillemKauf WillemKauf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😨 very good catch

@andrwng andrwng merged commit 8143737 into redpanda-data:dev Dec 5, 2024
18 checks passed
@vbotbuildovich
Copy link
Collaborator

/backport v24.3.x

@vbotbuildovich
Copy link
Collaborator

/backport v24.2.x

@nvartolomei
Copy link
Contributor

@andrwng does the updated test catch the bug now?

@andrwng
Copy link
Contributor Author

andrwng commented Dec 6, 2024

@andrwng does the updated test catch the bug now?

Yea it fails without this change

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants