kvflowcontrol: token return can take a long time with diskBandwidthLimit set #137017

@andrewbaptist

Describe the problem

A cluster with diskBandwidthLimit enabled can take tens of minutes to return elastic tokens on an idle system.

To Reproduce

Run many of the perturbation tests with the disk bandwidth limit set to 350MiB and notice that token return can take 10+ minutes.

e.g.

PERTURBATION_OVERRIDE=acMode=diskBandwidthLimit=350MiB roachtest run perturbation/dev/addNode -l

Expected behavior

During the final period of the test there is no longer any IO, and the CPU and disks are sitting idle (<5% utilization). The tokens are expected to be returned much sooner than they are.

It's not clear whether the problem is with the disk bandwidth limit itself or with the RACv2 handling of it.

This issue is intended to track the workaround of waiting longer for token returns when this setting is enabled.
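
For illustration only, a minimal sketch of what such a workaround could look like: pick a longer deadline when the disk bandwidth limit is set, then poll until the tokens are back. This is not the roachtest code. The function names, the timeout values, and the `tokens_returned` callback are all hypothetical, and the language (Python) was chosen only to keep the sketch short.

```python
import time
from typing import Callable

# Hypothetical values, not taken from the CockroachDB test framework.
DEFAULT_TOKEN_RETURN_TIMEOUT_S = 60.0
EXTENDED_TOKEN_RETURN_TIMEOUT_S = 30 * 60.0


def token_return_timeout(disk_bandwidth_limit_enabled: bool) -> float:
    """Pick how long to wait for all tokens to be returned."""
    if disk_bandwidth_limit_enabled:
        # Token return has been observed to take 10+ minutes in this mode.
        return EXTENDED_TOKEN_RETURN_TIMEOUT_S
    return DEFAULT_TOKEN_RETURN_TIMEOUT_S


def wait_for_token_return(
    tokens_returned: Callable[[], bool],
    timeout_s: float,
    poll_interval_s: float = 1.0,
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until tokens_returned() is true or timeout_s elapses."""
    deadline = now() + timeout_s
    while True:
        if tokens_returned():
            return True
        if now() >= deadline:
            return False
        sleep(poll_interval_s)
```

The clock and sleep functions are parameters so the wait can be exercised with a fake clock instead of actually sleeping for minutes.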

Jira issue: CRDB-45358

Metadata

Assignees

No one assigned

    Labels

    A-replication-admission-control-v2: Related to introduction of replication AC v2
    C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
    O-perturbation: Bugs found by the perturbation framework
    T-kv: KV Team
    branch-master: Failures and bugs on the master branch.
