bug: hummock error NoSuchKey #7232

Closed
Tracked by #6640
zwang28 opened this issue Jan 6, 2023 · 1 comment

zwang28 commented Jan 6, 2023

Describe the bug

Namespace: rwc-3-longevity-20230106-065737
Pod: risingwave-compute-1

2023-01-06T07:02:32.250276Z ERROR risingwave_storage::monitor::monitored_store: Failed in iter: Hummock error: ObjectStore failed with IO error Internal error: read "rls-apse1-eks-a-rwc-3-longevity-20230106-065737/240/54.data" in block Some(BlockLocation { offset: 14477472, size: 65902 }) failed, error: NoSuchKey: The specified key does not exist.

cn-1.csv

To Reproduce

No response

Expected behavior

No response

Additional context

Meanwhile, compute-node-0 hits #7002.

zwang28 commented Jan 7, 2023

This issue is not a kernel bug. It is caused by the following sequence:

  1. First, the driving test script (test_runner.sh) fails because create mv times out.
    subprocess.TimeoutExpired: Command '['psql', 'postgres://dev:dev@rls-apse1-eks-a.risingwave-cloud.xyz:4566/dev?options=--tenant%3Drwc-3-longevity-20230107-045111', '-a', '-f', './nexmark/queries/q101.sql']' timed out after 89.999904184 seconds
  2. Then the test script cleans up by running rwc delete tenant, which shuts down the risingwave cluster and empties the S3 bucket.

So I think what happens is that, after rwc delete tenant is called, the risingwave cluster is still alive while the S3 bucket is being emptied, which results in risingwave reading a nonexistent S3 object. @mikechesterwang Can you help clarify the order in which rwc delete tenant deletes the risingwave cluster and the S3 bucket?
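To illustrate the suspected ordering problem, here is a minimal in-memory sketch. It does not use rwc, RisingWave, or the S3 SDK; `FakeBucket`, `FakeCluster`, `NoSuchKey`, and both `delete_tenant_*` functions are hypothetical stand-ins, and the real order of operations inside rwc delete tenant is still the open question above.

```python
class NoSuchKey(Exception):
    """Stand-in for the S3 'NoSuchKey' error (hypothetical, not the SDK's class)."""


class FakeBucket:
    """In-memory stand-in for an S3 bucket."""

    def __init__(self, objects):
        self.objects = dict(objects)

    def read(self, key):
        if key not in self.objects:
            raise NoSuchKey(key)
        return self.objects[key]

    def empty(self):
        self.objects.clear()


class FakeCluster:
    """Stand-in for a running cluster that reads objects from the bucket."""

    def __init__(self, bucket):
        self.bucket = bucket
        self.running = True

    def read_sst(self, key):
        # A stopped cluster issues no reads at all.
        if not self.running:
            return None
        return self.bucket.read(key)

    def shutdown(self):
        self.running = False


def delete_tenant_unordered(cluster, bucket, key):
    """Empty the bucket while the cluster is still alive (the suspected race)."""
    bucket.empty()
    try:
        cluster.read_sst(key)
        outcome = "ok"
    except NoSuchKey:
        outcome = "NoSuchKey"
    cluster.shutdown()
    return outcome


def delete_tenant_ordered(cluster, bucket, key):
    """Stop the cluster first, then empty the bucket."""
    cluster.shutdown()
    result = cluster.read_sst(key)  # no read is issued once stopped
    bucket.empty()
    return "ok" if result is None else "unexpected read"
```

In this toy model the unordered teardown surfaces `NoSuchKey`, while the ordered one never issues a read against the emptied bucket.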

To confirm my guess, I ran additional tests (with the S3 SDK's connection timeout increased as a temporary workaround for #7002).

  • With the original test script, this issue reproduces every time.
  • With the test script changed to not delete the tenant after a failure, this issue no longer appears.
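The second variant can be sketched roughly as follows. This is not the actual test_runner.sh logic; `run_step`, `run_test`, `keep_on_failure`, and the `delete_tenant` callback are hypothetical names used only for illustration. Only `subprocess.run` and its `timeout` behaviour are real.

```python
import subprocess
import sys


def run_step(cmd, timeout_s):
    """Run one test step; return True on success, False on failure or timeout."""
    try:
        subprocess.run(cmd, check=True, timeout=timeout_s)
        return True
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        return False


def run_test(cmd, timeout_s, delete_tenant, keep_on_failure=True):
    """Run the step; skip tenant deletion after a failure unless told otherwise."""
    ok = run_step(cmd, timeout_s)
    if ok or not keep_on_failure:
        delete_tenant()  # hypothetical cleanup callback
    return ok
```

Keeping the tenant after a failure also leaves the cluster and its S3 objects in place for debugging.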

@zwang28 zwang28 closed this as completed Jan 8, 2023