Ignoring raft_topology errors during rolling upgrade #9511

Closed
timtimb0t opened this issue Dec 9, 2024 · 4 comments · Fixed by #9584
Labels
on_core_qa tasks that should be solved by Core QA team

Comments

@timtimb0t
Contributor

Packages

Base Scylla version: 6.2.1-20241106.a3a0ffbcd015 with build-id 94e4419682b4191f7c37e2d6bf02f4fa7988dff3
Target Scylla version (or git commit hash): 6.3.0~dev-20241206.7e2875d6489d with build-id 5227dd2a3fce4d2beb83ec6c17d47ad2e8ba6f5c

Kernel Version: 6.8.0-1017-gcp

Issue description

Errors like the following, which occur during the rolling upgrade test, can be ignored:

2024-12-07 06:44:14.987 <2024-12-07 06:44:12.541>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=afd96633-fe01-4632-a902-62cbc0a645e4: type=RUNTIME_ERROR regex=std::runtime_error line_number=135539 node=rolling-upgrade--ubuntu-focal-db-node-0bb842d4-0-3
2024-12-07T06:44:12.541+00:00 rolling-upgrade--ubuntu-focal-db-node-0bb842d4-0-3      !ERR | scylla[13464]:  [shard  0: gms] raft_topology - drain rpc failed, proceed to fence old writes: std::runtime_error (raft topology: exec_global_command(barrier_and_drain) failed with seastar::rpc::closed_error (connection is closed))

SCT needs a workaround to handle or ignore such errors.
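To illustrate the kind of filter being asked for, here is a minimal sketch in plain Python. It does not use SCT's actual event-filter API; the names `IGNORABLE_UPGRADE_ERRORS` and `is_ignorable` are hypothetical, and the regex is an assumption derived only from the log line quoted above.

```python
import re

# Hypothetical ignore list; not taken from SCT's real code.
# The pattern is built only from the message quoted in this issue.
IGNORABLE_UPGRADE_ERRORS = [
    re.compile(
        r"raft_topology - drain rpc failed, proceed to fence old writes:"
        r".*connection is closed"
    ),
]


def is_ignorable(line: str) -> bool:
    """Return True if the log line matches any known-benign upgrade error."""
    return any(pattern.search(line) for pattern in IGNORABLE_UPGRADE_ERRORS)


# The database log line reported in this issue.
sample = (
    "2024-12-07T06:44:12.541+00:00 rolling-upgrade--ubuntu-focal-db-node-0bb842d4-0-3"
    "      !ERR | scylla[13464]:  [shard  0: gms] raft_topology - drain rpc failed, "
    "proceed to fence old writes: std::runtime_error (raft topology: "
    "exec_global_command(barrier_and_drain) failed with "
    "seastar::rpc::closed_error (connection is closed))"
)

if __name__ == "__main__":
    print(is_ignorable(sample))
```

Keeping the pattern tied to both the `drain rpc failed` prefix and the `connection is closed` cause is a deliberate choice in this sketch: a drain failure with a different cause would still be reported.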

Impact

No impact; only an SCT enhancement is needed.

How frequently does it reproduce?

Describe the frequency with how this issue can be reproduced.

Installation details

Cluster size: 4 nodes (n2-highmem-32)

Scylla Nodes used in this run:

  • rolling-upgrade--ubuntu-focal-db-node-0bb842d4-0-4 (34.138.125.36 | 10.142.0.229) (shards: 30)
  • rolling-upgrade--ubuntu-focal-db-node-0bb842d4-0-3 (34.73.88.37 | 10.142.0.223) (shards: 30)
  • rolling-upgrade--ubuntu-focal-db-node-0bb842d4-0-2 (34.75.166.53 | 10.142.0.220) (shards: 30)
  • rolling-upgrade--ubuntu-focal-db-node-0bb842d4-0-1 (34.148.43.129 | 10.142.0.204) (shards: 30)

OS / Image: https://www.googleapis.com/compute/v1/projects/scylla-images/global/images/scylladb-6-2-1 (gce: undefined_region)

Test: rolling-upgrade-gce-image-test
Test id: 0bb842d4-dff5-49a9-aa6a-abe328dcd1aa
Test name: scylla-master/rolling-upgrade/rolling-upgrade-gce-image-test
Test method: upgrade_test.UpgradeTest.test_rolling_upgrade
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 0bb842d4-dff5-49a9-aa6a-abe328dcd1aa
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 0bb842d4-dff5-49a9-aa6a-abe328dcd1aa

Logs:

Jenkins job URL
Argus

@timtimb0t timtimb0t added the on_core_qa tasks that should be solved by Core QA team label Dec 11, 2024
@kbr-scylla

Wasn't this addressed by #9352?
cc @enaydanov @aleksbykov

@aleksbykov
Contributor

This error was not included in the ignore list. A new PR is needed to update it.
@timtimb0t, please add this template and run the job several times to catch any other possible error messages.
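One way to review repeated runs for other messages is to tally the distinct `raft_topology` error texts found in the collected database logs. The following is a rough sketch, not part of SCT: `RAFT_TOPOLOGY_ERR` and `summarize_raft_errors` are hypothetical names, and the line layout is assumed to match the single log line quoted in this issue.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed layout: "... !ERR | scylla[pid]:  [shard  N: group] raft_topology - <message>"
RAFT_TOPOLOGY_ERR = re.compile(r"!ERR \|.*?\] (raft_topology - .*)$")


def summarize_raft_errors(lines: Iterable[str]) -> Counter:
    """Count each distinct raft_topology error message seen in the log lines."""
    counts: Counter = Counter()
    for line in lines:
        match = RAFT_TOPOLOGY_ERR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    log_lines = [
        "2024-12-07T06:44:12.541+00:00 node-3 !ERR | scylla[13464]:  [shard  0: gms] "
        "raft_topology - drain rpc failed, proceed to fence old writes: "
        "std::runtime_error (connection is closed)",
        "2024-12-07T06:44:13.000+00:00 node-3 !INFO | scylla[13464]:  [shard  0: gms] "
        "raft_topology - unrelated informational line",
    ]
    for message, count in summarize_raft_errors(log_lines).items():
        print(count, message)
```

Each distinct message in the summary is then a candidate for the ignore list, after confirming that it is benign during an upgrade.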

@roydahan
Contributor

@timtimb0t are you working on a "fix" for it?

@timtimb0t
Contributor Author

@roydahan, yes, the fix itself is ready; I'm testing it now.

timtimb0t added a commit to timtimb0t/scylla-cluster-tests that referenced this issue Dec 25, 2024
…rade ignoration

to avoid redundant error messages similar to:
[raft_topology - drain rpc failed, proceed to fence old writes: std::runtime_error ...]
and new unit test for new functionality
Fixes: scylladb#9511
@fruch fruch closed this as completed in f867079 Dec 25, 2024
mergify bot pushed a commit that referenced this issue Dec 25, 2024
…rade ignoration

to avoid redundant error messages similar to:
[raft_topology - drain rpc failed, proceed to fence old writes: std::runtime_error ...]
and new unit test for new functionality
Fixes: #9511

(cherry picked from commit f867079)
fruch pushed a commit that referenced this issue Dec 25, 2024
…rade ignoration

to avoid redundant error messages similar to:
[raft_topology - drain rpc failed, proceed to fence old writes: std::runtime_error ...]
and new unit test for new functionality
Fixes: #9511

(cherry picked from commit f867079)
5 participants