CI Failure (BLL: Failed to hydrate chunk start 0, error: NotFound) in ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy #17846

Closed
vbotbuildovich opened this issue Apr 12, 2024 · 6 comments
Labels: area/cloud-storage (Shadow indexing subsystem), auto-triaged (used to know which issues have been opened from a CI job), ci-failure, ci-rca/test (CI Root Cause Analysis - Test Issue)

Comments

vbotbuildovich commented Apr 12, 2024

https://buildkite.com/redpanda/vtools/builds/12842

Module: rptest.tests.e2e_shadow_indexing_test
Class: ShadowIndexingWhileBusyTest
Method: test_create_or_delete_topics_while_busy
Arguments: {
    "short_retention": true,
    "cloud_storage_type": 1
}
test_id:    ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy
status:     FAIL
run time:   914.928 seconds

<BadLogLines nodes=ip-172-31-15-134(10) example="ERROR 2024-04-12 10:24:40,464 [shard 1:fetc] cloud_storage - [fiber4776 4d0945a0/kafka/topic-rqfxuuiaxi/18_52/2922-2961-20977360-1-v1.log.1] - segment_chunk_api.cc:226 - Failed to hydrate chunk start 0, error: NotFound">
Traceback (most recent call last):
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 276, in run_test
    return self.test_context.function(self.test)
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/mark/_mark.py", line 535, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 177, in wrapped
    redpanda.raise_on_bad_logs(
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1523, in raise_on_bad_logs
    lsearcher.search_logs(_searchable_nodes)
  File "/home/ubuntu/redpanda/tests/rptest/services/utils.py", line 156, in search_logs
    raise BadLogLines(bad_loglines)
rptest.services.utils.BadLogLines: <BadLogLines nodes=ip-172-31-15-134(10) example="ERROR 2024-04-12 10:24:40,464 [shard 1:fetc] cloud_storage - [fiber4776 4d0945a0/kafka/topic-rqfxuuiaxi/18_52/2922-2961-20977360-1-v1.log.1] - segment_chunk_api.cc:226 - Failed to hydrate chunk start 0, error: NotFound">

JIRA Link: CORE-2352

@vbotbuildovich added the auto-triaged and ci-failure labels Apr 12, 2024
@travisdowns changed the title from "CI Failure (key symptom) in ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy" to "CI Failure (BLL: Failed to hydrate chunk start 0, error: NotFound) in ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy" Apr 17, 2024
@travisdowns added the area/cloud-storage label Apr 17, 2024

nvartolomei commented Apr 22, 2024

There appear to be race conditions between archival (the write path) and cloud storage (the read path), including the one that produces "Failed to hydrate chunk start 0, error: NotFound": the archiver garbage-collects a segment from cloud storage while a reader is still trying to fetch it. I believe there are 2 (possibly 3) flavors of this problem: one in the context of RRR (harder to solve, see #17857), possibly one with follower fetching, and a local one (likely easier to solve), which is the one in this issue.
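
For anyone reading along, here is a toy Python model of the interleaving described above. It is only a sketch: `ObjectStore`, `reader_lookup`, `archiver_gc`, `reader_hydrate` and the segment names are hypothetical and are not Redpanda code. It shows just the ordering problem, where the reader picks a segment, garbage collection removes it, and the later download fails with NotFound.

```python
# Toy model of the read/GC race. All names here are hypothetical
# illustrations, not Redpanda APIs.


class NotFound(Exception):
    """Raised when the requested object is no longer in the bucket."""


class ObjectStore:
    """Minimal in-memory stand-in for a cloud storage bucket."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def delete(self, key):
        self._objects.pop(key, None)

    def get(self, key):
        if key not in self._objects:
            raise NotFound(key)
        return self._objects[key]


def reader_lookup(manifest):
    # Read path, step 1: choose a segment from the current manifest view.
    return manifest[0]


def archiver_gc(store, manifest):
    # Write path: retention drops the oldest segment from the manifest
    # and deletes the object from the bucket.
    key = manifest.pop(0)
    store.delete(key)
    return key


def reader_hydrate(store, key):
    # Read path, step 2: download the first chunk of the chosen segment.
    return store.get(key)


def simulate_race():
    store = ObjectStore()
    manifest = ["segment-a", "segment-b"]
    for key in manifest:
        store.put(key, b"payload")

    chosen = reader_lookup(manifest)   # reader decides on segment-a
    archiver_gc(store, manifest)       # GC runs before the download starts
    try:
        reader_hydrate(store, chosen)
    except NotFound:
        return f"Failed to hydrate {chosen}: NotFound"
    return "hydrated"


if __name__ == "__main__":
    print(simulate_race())
```

If `archiver_gc` runs after `reader_hydrate` instead, the read succeeds, so the failure depends purely on timing, which would be consistent with it showing up only occasionally in CI.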

@nvartolomei

Fixed by #17828

@piyushredpanda added the ci-rca/test label May 16, 2024