Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use exponential backoff retry strategy for glue client #23039

Conversation

oskar-szwajkowski
Copy link
Contributor

Description

After merging #22814, @ebyhr reporter that TestIcebergGlueCatalogConnectorSmokeTest#testDeleteRowsConcurrently became flaky

After running it several times locally, I found that commit 4b28d9b seems to be problematic

Default retry strategy, which brings default backoff strategy, wasn't good enough when dealing with such concurrent use cases. After changing to exponential backoff strategy it feels much more stable now.

Additional context and related issues

#22814

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(X) Release notes are required, with the following suggested text:

# Section
* Use exponential backoff in glue client retries

@cla-bot cla-bot bot added the cla-signed label Aug 14, 2024
@ebyhr
Copy link
Member

ebyhr commented Aug 14, 2024

/test-with-secrets sha=61c3c5e23162890cb76fb3b2340fdfdd2c2a8b27

@github-actions github-actions bot added the hive Hive connector label Aug 14, 2024
Copy link

github-actions bot commented Aug 14, 2024

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/10388756346

@oskar-szwajkowski
Copy link
Contributor Author

oskar-szwajkowski commented Aug 14, 2024

@ebyhr is there anything I can do to retry job, or you need to schedule retry?

I see there seems to be 1 clickhouse flaky test

@ebyhr
Copy link
Member

ebyhr commented Aug 19, 2024

@oskar-szwajkowski We can ignore the failure because we're sure that ClickHouse connector is unrelated to this change.

@electrum Could you review this PR?

Copy link
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this

@oskar-szwajkowski oskar-szwajkowski force-pushed the osz/glue-client-better-backoff-strategy branch 2 times, most recently from c1ca644 to 8dd12b4 Compare August 27, 2024 15:47
Default strategy was not good enough for handling
concurrent modifications, which led to flaky behavior
in concurrent modifications tests like
TestIcebergGlueCatalogConnectorSmokeTest#testDeleteRowsConcurrently
@oskar-szwajkowski oskar-szwajkowski force-pushed the osz/glue-client-better-backoff-strategy branch from 8dd12b4 to f4b839f Compare August 27, 2024 20:12
@ebyhr
Copy link
Member

ebyhr commented Aug 27, 2024

Let me merge to fix flaky tests.

@ebyhr ebyhr merged commit 784a131 into trinodb:master Aug 27, 2024
56 checks passed
@github-actions github-actions bot added this to the 455 milestone Aug 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed hive Hive connector
Development

Successfully merging this pull request may close these issues.

3 participants