CI Failure (TimeoutError: Node docker-rp-xx draining leaderships) in BasicAuthUpgradeTest.test_upgrade_and_enable_basic_auth #10136

Closed
dotnwat opened this issue Apr 17, 2023 · 31 comments
Labels: area/kafka, area/tests, ci-failure, kind/bug, sev/low

Comments

@dotnwat (Member) commented Apr 17, 2023

https://buildkite.com/redpanda/redpanda/builds/27277#01878dc4-9515-414e-88a6-89d717325c82

Module: rptest.tests.pandaproxy_test
Class:  BasicAuthUpgradeTest
Method: test_upgrade_and_enable_basic_auth
Arguments:
{
  "base_release": [
    22,
    3
  ],
  "next_release": [
    23,
    1
  ]
}
====================================================================================================
test_id:    rptest.tests.pandaproxy_test.BasicAuthUpgradeTest.test_upgrade_and_enable_basic_auth.base_release=.22.3.next_release=.23.1
status:     FAIL
run time:   54.194 seconds


    TimeoutError('Node docker-rp-23 draining leaderships')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/root/tests/rptest/services/cluster.py", line 49, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/pandaproxy_test.py", line 2127, in test_upgrade_and_enable_basic_auth
    self.redpanda.rolling_restart_nodes(self.redpanda.nodes)
  File "/root/tests/rptest/services/redpanda.py", line 2289, in rolling_restart_nodes
    restarter.restart_nodes(nodes,
  File "/root/tests/rptest/services/rolling_restarter.py", line 88, in restart_nodes
    wait_until(lambda: has_drained_leaders(node),
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError: Node docker-rp-23 draining leaderships

JIRA Link: CORE-1276
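
For reference, the wait in the traceback above is ducktape's standard polling helper: rolling_restarter.py puts the broker into maintenance mode and then polls the Admin API until the node reports that leadership draining has finished. Below is a minimal sketch of that pattern, not the exact code; the function name, timeout, and backoff values are illustrative.

from ducktape.utils.util import wait_until

def wait_for_leadership_drain(admin, redpanda, node, timeout_sec=90):
    # Illustrative sketch of the wait performed by rolling_restarter.py:
    # after the maintenance-mode PUT succeeds, poll the Admin API until the
    # broker reports that leadership draining has started and finished.
    def has_drained_leaders():
        node_id = redpanda.idx(node)
        status = admin.get_broker(node_id, node=node)["maintenance_status"]
        return status["draining"] and status["finished"]

    wait_until(has_drained_leaders,
               timeout_sec=timeout_sec,
               backoff_sec=1,
               err_msg=f"Node {node.account.hostname} draining leaderships")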

@dotnwat added the kind/bug and ci-failure labels on Apr 17, 2023
@dotnwat (Member, Author) commented Apr 17, 2023

Probably related to #9049.

@dlex (Contributor) commented May 20, 2023

A very similar failure in https://buildkite.com/redpanda/redpanda/builds/29432#01883193-77a7-4f46-bac1-be8fac94ff52

test_id:    rptest.tests.cluster_bootstrap_test.ClusterBootstrapUpgrade.test_change_bootstrap_configs_during_upgrade.empty_seed_starts_cluster=False
status:     FAIL
run time:   1 minute 31.113 seconds


    TimeoutError('Node docker-rp-23 draining leaderships')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/root/tests/rptest/services/cluster.py", line 49, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/cluster_bootstrap_test.py", line 137, in test_change_bootstrap_configs_during_upgrade
    self.redpanda.rolling_restart_nodes(self.redpanda.nodes,
  File "/root/tests/rptest/services/redpanda.py", line 953, in rolling_restart_nodes
    restarter.restart_nodes(nodes,
  File "/root/tests/rptest/services/rolling_restarter.py", line 88, in restart_nodes
    wait_until(lambda: has_drained_leaders(node),
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError: Node docker-rp-23 draining leaderships

Log excerpts from docker-rp-23 (node_id 1):

[DEBUG - 2023-05-19 02:04:33,890 - admin - _request - lineno:332]: Dispatching put http://docker-rp-23:9644/v1/brokers/1/maintenance
DEBUG 2023-05-19 02:04:33,893 [shard 0] admin_api_server - admin_server.cc:365 - [admin] PUT http://docker-rp-23:9644/v1/brokers/1/maintenance 
INFO  2023-05-19 02:04:33,894 [shard 0] cluster - members_table.cc:221 - marking node 1 in maintenance state 
INFO  2023-05-19 02:04:33,894 [shard 1] cluster - members_table.cc:221 - marking node 1 in maintenance state    
...
INFO  2023-05-19 02:04:33,894 [shard 0] cluster - drain_manager.cc:49 - Node draining is starting                                                                                                                              
INFO  2023-05-19 02:04:33,894 [shard 0] cluster - drain_manager.cc:143 - Node draining has started                                                                                                                             
INFO  2023-05-19 02:04:33,894 [shard 0] raft - [group_id:0, {redpanda/controller/0}] consensus.cc:2946 - Starting leadership transfer from {id: {1}, revision: {0}} to {id: {3}, revision: {0}} in term 1                      
TRACE 2023-05-19 02:04:33,894 [shard 0] raft - [group_id:0, {redpanda/controller/0}] consensus.cc:2801 - transfer leadership: preparing target={id: {3}, revision: {0}}, dirty_offset=21                                       
TRACE 2023-05-19 02:04:33,894 [shard 0] raft - [group_id:0, {redpanda/controller/0}] consensus.cc:2807 - transfer leadership: cleared oplock                                                                                   
DEBUG 2023-05-19 02:04:33,894 [shard 0] raft - [group_id:0, {redpanda/controller/0}] consensus.cc:2839 - transfer leadership: node {id: {3}, revision: {0}} doesn't need recovery or is already recovering (is_recovering false dirty offset 21)
DEBUG 2023-05-19 02:04:33,894 [shard 0] raft - [group_id:0, {redpanda/controller/0}] consensus.cc:2872 - transfer leadership: node {id: {3}, revision: {0}} is not recovering, proceeding (dirty offset 21)                    
INFO  2023-05-19 02:04:33,894 [shard 0] cluster - drain_manager.cc:207 - Draining leadership from 1 groups                                                                                                                     
INFO  2023-05-19 02:04:33,894 [shard 1] cluster - drain_manager.cc:49 - Node draining is starting                                                                                                                              
INFO  2023-05-19 02:04:33,894 [shard 1] cluster - drain_manager.cc:143 - Node draining has started                                                                                                                             
INFO  2023-05-19 02:04:33,894 [shard 1] cluster - drain_manager.cc:256 - Node draining has completed on shard 1                                                                                                                
TRACE 2023-05-19 02:04:33,894 [shard 0] storage - readers_cache.cc:328 - {redpanda/controller/0} - removing reader: [0,21] lower_bound: 22                                                                                     
[DEBUG - 2023-05-19 02:04:33,895 - admin - _request - lineno:361]: Response OK                                                                                                                                                 
[INFO  - 2023-05-19 02:04:33,895 - rolling_restarter - restart_nodes - lineno:86]: Waiting for node docker-rp-23 leadership drain                                                                                              
[DEBUG - 2023-05-19 02:04:33,895 - admin - _request - lineno:332]: Dispatching get http://docker-rp-23:9644/v1/brokers/1                                                                                                       
INFO  2023-05-19 02:04:33,895 [shard 0] raft - [group_id:0, {redpanda/controller/0}] consensus.cc:187 - [leadership_transfer] Stepping down as leader in term 1, dirty offset 21                                 
DEBUG 2023-05-19 02:04:33,895 [shard 0] raft - [group_id:0, {redpanda/controller/0}] consensus.cc:2618 - triggering leadership notification with term: 1, new leader: {nullopt}                                  
DEBUG 2023-05-19 02:04:33,895 [shard 0] cluster - health_monitor_backend.cc:395 - aborting current refresh request to 1                                                                                          
DEBUG 2023-05-19 02:04:33,895 [shard 0] cluster - feature_manager.cc:117 - Controller leader notification term 1                                                                                                 
TRACE 2023-05-19 02:04:33,895 [shard 0] cluster - partition_leaders_table.cc:160 - updated partition: {redpanda/controller/0} leader: {term: 1, current leader: {nullopt}, previous leader: {1}, revision: 0}    
INFO  2023-05-19 02:04:33,895 [shard 0] cluster - drain_manager.cc:239 - Draining leadership from 1 groups 1 succeeded                                                                                           
INFO  2023-05-19 02:04:33,895 [shard 0] cluster - drain_manager.cc:256 - Node draining has completed on shard 0                                                                                                  
TRACE 2023-05-19 02:04:33,895 [shard 1] cluster - partition_leaders_table.cc:160 - updated partition: {redpanda/controller/0} leader: {term: 1, current leader: {nullopt}, previous leader: {1}, revision: 0}    
TRACE 2023-05-19 02:04:33,898 [shard 0] request_auth - request_auth.cc:126 - Authenticated user {admin}                                                                                                          
DEBUG 2023-05-19 02:04:33,898 [shard 0] admin_api_server - admin_server.cc:365 - [admin] GET http://docker-rp-23:9644/v1/brokers/1                                                                               
[DEBUG - 2023-05-19 02:04:33,899 - admin - _request - lineno:355]: Response OK, JSON: {                 
    'node_id': 1,                       
    'num_cores': 2,                     
    'membership_status': 'active',      
    'maintenance_status': {             
        'draining': False,              
        'finished': False,              
        'errors': False,                
        'partitions': 0,                
        'eligible': 0,                  
        'transferring': 0,              
        'failed': 0                     
    }                                   
}                                       

Whereas the check in rolling_restarter.py is:

        def has_drained_leaders(node):
            try:
                node_id = self.redpanda.idx(node)
                broker_resp = admin.get_broker(node_id, node=node)
                maintenance_status = broker_resp["maintenance_status"]
                return maintenance_status["draining"] and maintenance_status["finished"]
            except Exception:
                # The original excerpt is truncated here; the except clause is
                # assumed to treat request errors as "not drained yet".
                return False

So it looks like node draining finished before the test fetched the draining status for the first time. The test wants to observe the draining state before moving forward, but it misses that window and times out.
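
One way to make the check robust against this race (a sketch only, not necessarily the fix that was adopted; the helper name is hypothetical): since the restarter has already issued the maintenance PUT before it starts polling, the predicate could also accept the state captured in the JSON above, where the drain has already run to completion and nothing is left to transfer, instead of insisting on catching the transient draining window.

def leadership_drain_settled(admin, redpanda, node):
    # Hypothetical, more tolerant predicate (illustrative, not the actual
    # change): accept either the draining/finished state the original check
    # waits for, or a status showing the drain already completed with
    # nothing left to move.
    node_id = redpanda.idx(node)
    status = admin.get_broker(node_id, node=node)["maintenance_status"]
    if status["draining"] and status["finished"]:
        return True
    # Matches the "Response OK" JSON captured above: draining and finished
    # both read False after the drain completed, with zero partitions
    # eligible, transferring, or failed.
    return (status["partitions"] == 0 and status["eligible"] == 0
            and status["transferring"] == 0 and status["failed"] == 0)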

@dlex added the area/tests and sev/medium labels on May 20, 2023
@piyushredpanda (Contributor) commented:

Sounds more like a test failure, and hence sev/low, to me. @michael-redpanda, could you keep me honest, please?

@piyushredpanda added the sev/low label and removed the sev/medium label on Jun 23, 2023
@oleiman self-assigned this on Oct 24, 2023
@michael-redpanda (Contributor) commented:

Reported as stale by pandatriage.

@vbotbuildovich reopened this on Nov 3, 2023
@oleiman removed their assignment on Nov 29, 2023
@piyushredpanda (Contributor) commented:

Closing older-bot-filed CI issues as we transition to a more reliable system.
