Skip to content

Comments

roachtest: increase the token return time with disk bandwidth limit#137019

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
andrewbaptist:2024-12-09-slow-token-return
Dec 10, 2024
Merged

roachtest: increase the token return time with disk bandwidth limit#137019
craig[bot] merged 1 commit intocockroachdb:masterfrom
andrewbaptist:2024-12-09-slow-token-return

Conversation

@andrewbaptist
Copy link

Previously the test would wait 10m for tokens to be returned. Without the disk bandwidth limit set, they typically return almost immediately but with a limit they can take ~30m to return in some cases even after the workload is stopped and the system is idle. This change fixes some of the perturbation/metamorphic/* tests that are hitting this slow token return.

Epic: none
Fixes: #136982
Fixes: #136553
Informs: #137017

Release note: None

@andrewbaptist andrewbaptist requested a review from kvoli December 9, 2024 15:21
@andrewbaptist andrewbaptist requested a review from a team as a code owner December 9, 2024 15:21
@andrewbaptist andrewbaptist requested review from csgourav and shailendra-patel and removed request for a team December 9, 2024 15:21
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@kvoli kvoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @csgourav, and @shailendra-patel)


-- commits line 4 at r1:
nit: it is referenced below, but the test [sic] isn't known here yet, consider mentioning it here or updating the commit header.


pkg/cmd/roachtest/tests/admission_control_elastic_mixed_version.go line 136 at r1 (raw file):

			// TODO(pav-kv): also validate that the write throughput was kept under
			// control, and the foreground traffic was not starved.
			roachtestutil.ValidateTokensReturned(ctx, t, c, c.CRDBNodes(), time.Minute)

The commit header:

roachtest: increase the token return time with disk bandwidth limit

The wait duration is decreasing from 10 minutes, to 1 minute unless the diskBandwidthLimit is enabled.

I suggest either updating the commit message/header, or leaving the token return time unchanged for non-disk-bw-limit variations.

Also, do you have any stats handy that confirm the decrease from 10 minutes, to 1 minute, is sufficient to prevent flakes?

Previously the perturbation/* tests would wait 10m for tokens to be
returned. Without the disk bandwidth limit set, they typically return
almost immediately but with a limit they can take ~30m to return in some
cases even after the workload is stopped and the system is idle. This
change fixes some of the perturbation/metamorphic/* tests that are
hitting this slow token return. Additionally this change reduces the
token wait time for the test
admission-control/elastic-workload/mixed-version to 1 minute as this
test doesn't typically wait more then a few seconds for token return.

Epic: none
Fixes: cockroachdb#136982
Fixes: cockroachdb#136553
Informs: cockroachdb#137017

Release note: None
@andrewbaptist andrewbaptist force-pushed the 2024-12-09-slow-token-return branch from 039a2d2 to 87b463a Compare December 10, 2024 14:24
Copy link
Author

@andrewbaptist andrewbaptist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Changes made, let me know if you have additional comments.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @csgourav, @kvoli, and @shailendra-patel)


-- commits line 4 at r1:

Previously, kvoli (Austen) wrote…

nit: it is referenced below, but the test [sic] isn't known here yet, consider mentioning it here or updating the commit header.

Updated to add the test name here.


pkg/cmd/roachtest/tests/admission_control_elastic_mixed_version.go line 136 at r1 (raw file):

Previously, kvoli (Austen) wrote…

The commit header:

roachtest: increase the token return time with disk bandwidth limit

The wait duration is decreasing from 10 minutes, to 1 minute unless the diiskBandwidthLimit is enabled.

I suggest either updating the commit message/header, or leaving the token return time unchanged for non-disk-bw-limit variations.

Also, do you have any stats handy that confirm the decrease from 10 minutes, to 1 minute, is sufficient to prevent flakes?

I left this as 1 minute after looking through the last several runs for this test where it was always in the 1-2s range. It might be easier to catch any regressions in this behavior on this test with a lower bound.

Copy link
Contributor

@kvoli kvoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andrewbaptist, @csgourav, and @shailendra-patel)


pkg/cmd/roachtest/tests/admission_control_elastic_mixed_version.go line 136 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

I left this as 1 minute after looking through the last several runs for this test where it was always in the 1-2s range. It might be easier to catch any regressions in this behavior on this test with a lower bound.

ack

@andrewbaptist
Copy link
Author

TFTR!

bors r=kvoli

@craig craig bot merged commit 95d95a6 into cockroachdb:master Dec 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

roachtest: perturbation/metamorphic/addNode failed roachtest: perturbation/metamorphic/restart failed

3 participants