CI Failure (Timeout waiting for partitions to move) in NodesDecommissioningTest.test_decommissioning_finishes_after_manual_cancellation #11365

Closed
michael-redpanda opened this issue Jun 12, 2023 · 5 comments · Fixed by #11471
Assignees
Labels
ci-failure kind/bug Something isn't working sev/low Bugs which are non-functional paper cuts, e.g. typos, issues in log messages

Comments

@michael-redpanda
Contributor

https://buildkite.com/redpanda/vtools/builds/8050#0188a6e9-85d8-48ed-81ef-8307498c7772

Module: rptest.tests.nodes_decommissioning_test
Class:  NodesDecommissioningTest
Method: test_decommissioning_finishes_after_manual_cancellation
Arguments:
{
  "delete_topic": false
}
test_id:    rptest.tests.nodes_decommissioning_test.NodesDecommissioningTest.test_decommissioning_finishes_after_manual_cancellation.delete_topic=False
status:     FAIL
run time:   3 minutes 51.443 seconds


    TimeoutError('')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 79, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/tests/nodes_decommissioning_test.py", line 593, in test_decommissioning_finishes_after_manual_cancellation
    wait_until(lambda: self._partitions_moving(node=survivor_node),
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError
@michael-redpanda michael-redpanda added kind/bug Something isn't working ci-failure labels Jun 12, 2023

@ztlpn ztlpn self-assigned this Jun 14, 2023
@ztlpn
Contributor

ztlpn commented Jun 14, 2023

I'll chase this one

@ztlpn ztlpn added the sev/low Bugs which are non-functional paper cuts, e.g. typos, issues in log messages label Jun 14, 2023
@ztlpn
Contributor

ztlpn commented Jun 14, 2023

This is a test error: the test first decommissions, then sets the recovery rate:

        self.logger.info(f"decommissioning node: {node_id}", )
        self._decommission(node_id)

        self._set_recovery_rate(100)

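The decommission manages to finish before the recovery rate is lowered, so by the time the test calls wait_until on _partitions_moving there are no movements left to observe and it times out :)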

ztlpn added a commit to ztlpn/redpanda that referenced this issue Jun 15, 2023
If we throttle after we issue the decom command, decom can finish before
the throttling takes effect. The correct order (if we want the
partition movements to get stuck) is throttle-then-decom.

fixes redpanda-data#11365
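
To illustrate why the ordering matters, here is a minimal toy model of the race. It is only a sketch: `FakeCluster`, `tick`, and the byte counts are hypothetical names and numbers invented for this illustration, not Redpanda's API or the actual test code. It assumes the cluster makes some progress between the two calls the test issues.

```python
class FakeCluster:
    """Hypothetical stand-in for a cluster; not Redpanda's real API."""

    def __init__(self, bytes_to_move, recovery_rate):
        self.bytes_to_move = bytes_to_move
        self.recovery_rate = recovery_rate  # bytes moved per tick
        self.decommissioning = False

    def set_recovery_rate(self, rate):
        self.recovery_rate = rate

    def decommission(self):
        self.decommissioning = True

    def tick(self):
        # Models the cluster making progress between two test calls.
        if self.decommissioning:
            self.bytes_to_move = max(0, self.bytes_to_move - self.recovery_rate)

    def partitions_moving(self):
        return self.decommissioning and self.bytes_to_move > 0


def decom_then_throttle():
    # Order from the failing test: decommission first, throttle second.
    cluster = FakeCluster(bytes_to_move=1_000, recovery_rate=10_000)
    cluster.decommission()
    cluster.tick()  # unthrottled progress before the throttle lands
    cluster.set_recovery_rate(100)
    return cluster.partitions_moving()


def throttle_then_decom():
    # Order from the fix: throttle first, then decommission.
    cluster = FakeCluster(bytes_to_move=1_000, recovery_rate=10_000)
    cluster.set_recovery_rate(100)
    cluster.decommission()
    cluster.tick()  # progress is already limited by the throttle
    return cluster.partitions_moving()


if __name__ == "__main__":
    print("decom then throttle, still moving:", decom_then_throttle())
    print("throttle then decom, still moving:", throttle_then_decom())
```

In this model the first ordering drains all pending bytes in one unthrottled tick, so a later check for moving partitions finds nothing, while the second ordering leaves movements in flight for the test to observe.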