storage_service freeze on rebuild tablet replicas of replaced node #20885

Open
soyacz opened this issue Sep 30, 2024 · 3 comments
Labels: area/tablets, P1 Urgent, triage/master (Looking for assignee)

Comments

@soyacz (Contributor) commented Sep 30, 2024

Packages

Scylla version: 6.3.0~dev-20240927.c17d35371846 with build-id a9b08d0ce1f3cf99eb39d7a8372848fa2840dc1d

Kernel Version: 6.8.0-1016-aws

Issue description

  • This issue is a regression.
  • It is unknown if this issue is a regression.

During the 'disrupt_terminate_and_replace_node' nemesis, after terminating a node we add a new node to replace the terminated one.
storage_service appears to be stuck at "storage_service - Waiting for tablet replicas from the replaced node to be rebuilt".
This happened while the 'create_index' nemesis was running in parallel; an illustrative sketch of the replace sequence follows.
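For illustration, here is a minimal, self-contained sketch of the sequence this nemesis drives. The Cluster/Node helpers are invented stand-ins, not the scylla-cluster-tests API, and the use of replace_node_first_boot is an assumption about how the replacement node is pointed at the terminated one.

```python
# Hypothetical model of the terminate-and-replace flow; not the real SCT API.

class Node:
    def __init__(self, name, host_id):
        self.name, self.host_id = name, host_id
        self.config, self.log = {}, []

    def start_scylla(self):
        # The replacing node rebuilds the tablet replicas that lived on the
        # dead node; in the reported run the second message is never logged.
        self.log.append("Waiting for tablet replicas from the replaced node to be rebuilt")
        self.log.append("Tablet replicas from the replaced node have been rebuilt")


class Cluster:
    def __init__(self, nodes):
        self.nodes = list(nodes)

    def terminate(self, node):
        self.nodes.remove(node)          # stand-in for terminating the instance

    def add_node(self, name):
        node = Node(name, host_id=None)  # fresh instance, no identity yet
        self.nodes.append(node)
        return node


def disrupt_terminate_and_replace_node(cluster, target):
    dead_host_id = target.host_id
    cluster.terminate(target)
    replacement = cluster.add_node(target.name + "-replacement")
    # Assumption: the replacement boots with the dead node's host ID so that it
    # takes over that node's tablet/token ownership.
    replacement.config["replace_node_first_boot"] = dead_host_id
    replacement.start_scylla()
    return replacement


cluster = Cluster([Node(f"node-{i}", host_id=f"hid-{i}") for i in range(3)])
new_node = disrupt_terminate_and_replace_node(cluster, cluster.nodes[0])
print(new_node.log)
```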

I can see in the logs:

Sep 28 10:08:33.565248 parallel-topology-schema-changes-mu-db-node-857012f9-20 scylla[6491]:  [shard 0:strm] range_streamer - Tablet rebuild with 2a05:d018:12e3:f002:d162:20b4:ff27:d412 for keyspace=keyspace1 succeeded, took 46.41925 seconds
Sep 28 10:09:05.970253 parallel-topology-schema-changes-mu-db-node-857012f9-20 scylla[6491]:  [shard 0:strm] range_streamer - Tablet rebuild succeeded, took 78 seconds, nr_ranges_remaining=0

but it never reached "Tablet replicas from the replaced node have been rebuilt".

In the meantime, there were plenty of reader_concurrency_semaphore errors like the following (only the first one is posted):

Sep 28 09:54:11.595812 parallel-topology-schema-changes-mu-db-node-857012f9-20 scylla[6491]:  [shard 0:strm] stream_session - [Stream #9cc06db4-7d7f-11ef-8503-e96b423adb7e] Executing streaming plan for Tablet rebuild-keyspace1-index-0 with peers={2a05:d01c:964:7d02:9d87:362:74a9:47b}, master
Sep 28 09:54:21.628252 parallel-topology-schema-changes-mu-db-node-857012f9-20 scylla[6491]:  [shard 0:strm] reader_concurrency_semaphore - Semaphore streaming with 10/10 count and 339792/175951052 memory resources: timed out, dumping permit diagnostics:
                                                                                             Trigger permit: count=0, memory=0, table=system.view_build_status_v2, operation=data-query, state=waiting_for_admission
                                                                                             Identified bottleneck(s): disk
                                                                                             
                                                                                             permits        count        memory        table/operation/state
                                                                                             5        5        181K        keyspace1.standard1_c2_nemesis_index/stream-session/active
                                                                                             3        3        82K        keyspace1.standard2/stream-session/active
                                                                                             2        2        68K        keyspace1.sec_ind_c3_index/stream-session/active
                                                                                             3        0        0B        keyspace1.standard2/stream-session/waiting_for_admission
                                                                                             34        0        0B        system.view_build_status_v2/data-query/waiting_for_admission
                                                                                             1        0        0B        keyspace1.sec_ind_c2_index/stream-session/waiting_for_admission
                                                                                             
                                                                                             48        10        332K        total
                                                                                             
                                                                                             Stats:
                                                                                             permit_based_evictions: 0
                                                                                             time_based_evictions: 0
                                                                                             inactive_reads: 0
                                                                                             total_successful_reads: 51
                                                                                             total_failed_reads: 0
                                                                                             total_reads_shed_due_to_overload: 0
                                                                                             total_reads_killed_due_to_kill_limit: 0
                                                                                             reads_admitted: 8828
                                                                                             reads_enqueued_for_admission: 55
                                                                                             reads_enqueued_for_memory: 0
                                                                                             reads_admitted_immediately: 8811
                                                                                             reads_queued_because_ready_list: 6
                                                                                             reads_queued_because_need_cpu_permits: 13
                                                                                             reads_queued_because_memory_resources: 36
                                                                                             reads_queued_because_count_resources: 0
                                                                                             reads_queued_with_eviction: 0
                                                                                             total_permits: 8866
                                                                                             current_permits: 48
                                                                                             need_cpu_permits: 0
                                                                                             awaits_permits: 0
                                                                                             disk_reads: 0
                                                                                             sstables_read: 0
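To make the dump above easier to read, here is a toy model of the admission accounting it reflects (an assumption of the general mechanism, not Scylla code): a read becomes active only if a count slot and enough memory are both available, otherwise it queues in waiting_for_admission. The limits mirror the dump (10 count slots, ~168 MiB of memory).

```python
# Simplified model of reader_concurrency_semaphore admission; numbers taken
# from the diagnostics dump above, behaviour is an illustrative assumption.

class ReaderSemaphore:
    def __init__(self, max_count=10, max_memory=175_951_052):
        self.count = 0
        self.memory = 0
        self.max_count, self.max_memory = max_count, max_memory
        self.waiting = []

    def try_admit(self, permit):
        fits = (self.count + 1 <= self.max_count and
                self.memory + permit["memory"] <= self.max_memory)
        if fits:
            self.count += 1
            self.memory += permit["memory"]
            permit["state"] = "active"
        else:
            permit["state"] = "waiting_for_admission"
            self.waiting.append(permit)
        return permit["state"]


sem = ReaderSemaphore()
# Ten streaming reads take every count slot, matching "10/10 count" above.
for _ in range(10):
    sem.try_admit({"operation": "stream-session", "memory": 34_000})
# A data-query into system.view_build_status_v2 now has to queue, as in the dump.
query = {"operation": "data-query", "table": "system.view_build_status_v2", "memory": 0}
print(sem.try_admit(query))   # -> waiting_for_admission
```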

and, right after the semaphore timeouts, streaming errors:

Sep 28 09:54:21.645950 parallel-topology-schema-changes-mu-db-node-857012f9-20 scylla[6491]:  [shard 0:strm] stream_session - [Stream #9cc046a7-7d7f-11ef-8503-e96b423adb7e] Failed to handle STREAM_MUTATION_FRAGMENTS (receive and distribute phase) for ks=keyspace1, cf=standard2, peer=2a05:d018:12e3:f002:240a:f773:d2a7:4ac3: seastar::nested_exception: exceptions::read_timeout_exception (Operation timed out for system.view_build_status_v2 - received only 0 responses from 1 CL=LOCAL_ONE.) (while cleaning up after std::runtime_error (Dangling queue_reader_handle_v2))
Sep 28 09:54:21.778716 parallel-topology-schema-changes-mu-db-node-857012f9-20 scylla[6491]:  [shard 0:strm] stream_session - [Stream #9cc046a7-7d7f-11ef-8503-e96b423adb7e] Received failed complete message, peer=2a05:d018:12e3:f002:240a:f773:d2a7:4ac3
Sep 28 09:54:21.780570 parallel-topology-schema-changes-mu-db-node-857012f9-20 scylla[6491]:  [shard 0:strm] stream_session - [Stream #9cc046a7-7d7f-11ef-8503-e96b423adb7e] Streaming plan for Tablet rebuild-keyspace1-index-0 failed, peers={2a05:d018:12e3:f002:240a:f773:d2a7:4ac3}, tx=0 KiB, 0.00 KiB/s, rx=21 KiB, 2.10 KiB/s
Sep 28 09:54:21.783397 parallel-topology-schema-changes-mu-db-node-857012f9-20 scylla[6491]:  [shard 0:strm] range_streamer - Tablet rebuild with 2a05:d018:12e3:f002:240a:f773:d2a7:4ac3 for keyspace=keyspace1 failed, took 10.186183 seconds: streaming::stream_exception (Stream failed)
Sep 28 09:54:21.803372 parallel-topology-schema-changes-mu-db-node-857012f9-20 scylla[6491]:  [shard 6:strm] stream_session - [Stream #9cc046ae-7d7f-11ef-8503-e96b423adb7e] Failed to handle STREAM_MUTATION_FRAGMENTS (receive and distribute phase) for ks=keyspace1, cf=standard2, peer=2a05:d01c:964:7d02:c910:8f45:40c7:dc51: seastar::nested_exception: exceptions::read_timeout_exception (Operation timed out for system.view_build_status_v2 - received only 0 responses from 1 CL=LOCAL_ONE.) (while cleaning up after exceptions::read_timeout_exception (Operation timed out for system.view_build_status_v2 - received only 0 responses from 1 CL=LOCAL_ONE.))
Sep 28 09:54:21.833559 parallel-topology-schema-changes-mu-db-node-857012f9-20 scylla[6491]:  [shard 0:strm] stream_session - [Stream #9cc046aa-7d7f-11ef-8503-e96b423adb7e] Failed to handle STREAM_MUTATION_FRAGMENTS (receive and distribute phase) for ks=keyspace1, cf=standard2, peer=2a05:d018:12e3:f002:d162:20b4:ff27:d412: seastar::nested_exception: exceptions::read_timeout_exception (Operation timed out for system.view_build_status_v2 - received only 0 responses from 1 CL=LOCAL_ONE.) (while cleaning up after std::runtime_error (Dangling queue_reader_handle_v2))
Sep 28 09:54:21.834102 parallel-topology-schema-changes-mu-db-node-857012f9-20 scylla[6491]:  [shard 4:strm] stream_session - [Stream #9cc046a3-7d7f-11ef-8503-e96b423adb7e] Failed to handle STREAM_MUTATION_FRAGMENTS (receive and distribute phase) for ks=keyspace1, cf=standard2, peer=2a05:d01c:964:7d02:c910:8f45:40c7:dc51: seastar::nested_exception: exceptions::read_timeout_exception (Operation timed out for system.view_build_status_v2 - received only 0 responses from 1 CL=LOCAL_ONE.) (while cleaning up after exceptions::read_timeout_exception (Operation timed out for system.view_build_status_v2 - received only 0 responses from 1 CL=LOCAL_ONE.))
Sep 28 09:54:21.835576 parallel-topology-schema-changes-mu-db-node-857012f9-20 scylla[6491]:  [shard 0:strm] stream_session - [Stream #9cbff885-7d7f-11ef-8503-e96b423adb7e] Failed to handle STREAM_MUTATION_FRAGMENTS (receive and distribute phase) for ks=keyspace1, cf=standard2, peer=2a05:d018:12e3:f002:c4ae:fbd0:95b:1d89: seastar::nested_exception: exceptions::read_timeout_exception (Operation timed out for system.view_build_status_v2 - received only 0 responses from 1 CL=LOCAL_ONE.) (while cleaning up after std::runtime_error (Dangling queue_reader_handle_v2))
Sep 28 09:54:21.885776 parallel-topology-schema-changes-mu-db-node-857012f9-20 scylla[6491]:  [shard 3:strm] stream_session - [Stream #9cc01f92-7d7f-11ef-8503-e96b423adb7e] Failed to handle STREAM_MUTATION_FRAGMENTS (receive and distribute phase) for ks=keyspace1, cf=standard2, peer=2a05:d01c:964:7d02:c910:8f45:40c7:dc51: seastar::nested_exception: exceptions::read_timeout_exception (Operation timed out for system.view_build_status_v2 - received only 0 responses from 1 CL=LOCAL_ONE.) (while cleaning up after std::runtime_error (Dangling queue_reader_handle_v2))
Sep 28 09:54:21.891135 parallel-topology-schema-changes-mu-db-node-857012f9-20 scylla[6491]:  [shard 6:strm] stream_session - [Stream #9cc046ad-7d7f-11ef-8503-e96b423adb7e] Failed to handle STREAM_MUTATION_FRAGMENTS (receive and distribute phase) for ks=keyspace1, cf=standard2, peer=2a05:d01c:964:7d02:f310:b95:f215:1cd4: seastar::nested_exception: exceptions::read_timeout_exception (Operation timed out for system.view_build_status_v2 - received only 0 responses from 1 CL=LOCAL_ONE.) (while cleaning up after exceptions::read_timeout_exception (Operation timed out for system.view_build_status_v2 - received only 0 responses from 1 CL=LOCAL_ONE.))
Sep 28 09:54:21.894502 parallel-topology-schema-changes-mu-db-node-857012f9-20 scylla[6491]:  [shard 1:strm] stream_session - [Stream #9cc046ab-7d7f-11ef-8503-e96b423adb7e] Failed to handle STREAM_MUTATION_FRAGMENTS (receive and distribute phase) for ks=keyspace1, cf=standard2, peer=2a05:d01c:964:7d02:3f73:3f0b:6cb2:8a36: seastar::nested_exception: exceptions::read_timeout_exception (Operation timed out for system.view_build_status_v2 - received only 0 responses from 1 CL=LOCAL_ONE.) (while cleaning up after std::runtime_error (Dangling queue_reader_handle_v2))
Sep 28 09:54:21.916727 parallel-topology-schema-changes-mu-db-node-857012f9-20 scylla[6491]:  [shard 3:strm] stream_session - [Stream #9cbff886-7d7f-11ef-8503-e96b423adb7e] Failed to handle STREAM_MUTATION_FRAGMENTS (receive and distribute phase) for ks=keyspace1, cf=standard2, peer=2a05:d01c:964:7d02:9d87:362:74a9:47b: seastar::nested_exception: exceptions::read_timeout_exception (Operation timed out for system.view_build_status_v2 - received only 0 responses from 1 CL=LOCAL_ONE.) (while cleaning up after exceptions::read_timeout_exception (Operation timed out for system.view_build_status_v2 - received only 0 responses from 1 CL=LOCAL_ONE.))
Sep 28 09:54:21.923108 parallel-topology-schema-changes-mu-db-node-857012f9-20 scylla[6491]:  [shard 4:strm] stream_session - [Stream #9cbff887-7d7f-11ef-8503-e96b423adb7e] Failed to handle STREAM_MUTATION_FRAGMENTS (receive and distribute phase) for ks=keyspace1, cf=standard2, peer=2a05:d01c:964:7d02:3f73:3f0b:6cb2:8a36: seastar::nested_exception: exceptions::read_timeout_exception (Operation timed out for system.view_build_status_v2 - received only 0 responses from 1 CL=LOCAL_ONE.) (while cleaning up after std::runtime_error (Dangling queue_reader_handle_v2))
Sep 28 09:54:21.938386 parallel-topology-schema-changes-mu-db-node-857012f9-20 scylla[6491]:  [shard 6:strm] stream_session - [Stream #9cc06db2-7d7f-11ef-8503-e96b423adb7e] Failed to handle STREAM_MUTATION_FRAGMENTS (receive and distribute phase) for ks=keyspace1, cf=standard2, peer=2a05:d01c:964:7d02:3f73:3f0b:6cb2:8a36: seastar::nested_exception: exceptions::read_timeout_exception (Operation timed out for system.view_build_status_v2 - received only 0 responses from 1 CL=LOCAL_ONE.) (while cleaning up after std::runtime_error (Dangling queue_reader_handle_v2))
Sep 28 09:54:21.941617 parallel-topology-schema-changes-mu-db-node-857012f9-20 scylla[6491]:  [shard 5:strm] stream_session - [Stream #9cc046a6-7d7f-11ef-8503-e96b423adb7e] Failed to handle STREAM_MUTATION_FRAGMENTS (receive and distribute phase) for ks=keyspace1, cf=standard2, peer=2a05:d01c:964:7d02:c910:8f45:40c7:dc51: seastar::nested_exception: exceptions::read_timeout_exception (Operation timed out for system.view_build_status_v2 - received only 0 responses from 1 CL=LOCAL_ONE.) (while cleaning up after std::runtime_error (Dangling queue_reader_handle_v2))
Sep 28 09:54:21.993141 parallel-topology-schema-changes-mu-db-node-857012f9-20 scylla[6491]:  [shard 5:strm] stream_session - [Stream #9cc046a8-7d7f-11ef-8503-e96b423adb7e] Failed to handle STREAM_MUTATION_FRAGMENTS (receive and distribute phase) for ks=keyspace1, cf=standard2, peer=2a05:d01c:964:7d02:f310:b95:f215:1cd4: seastar::nested_exception: exceptions::read_timeout_exception (Operation timed out for system.view_build_status_v2 - received only 0 responses from 1 CL=LOCAL_ONE.) (while cleaning up after std::runtime_error (Dangling queue_reader_handle_v2))
Sep 28 09:54:22.004539 parallel-topology-schema-changes-mu-db-node-857012f9-20 scylla[6491]:  [shard 5:strm] stream_session - [Stream #9cc01f94-7d7f-11ef-8503-e96b423adb7e] Failed to handle STREAM_MUTATION_FRAGMENTS (receive and distribute phase) for ks=keyspace1, cf=standard2, peer=2a05:d01c:964:7d02:3f73:3f0b:6cb2:8a36: seastar::nested_exception: exceptions::read_timeout_exception (Operation timed out for system.view_build_status_v2 - received only 0 responses from 1 CL=LOCAL_ONE.) (while cleaning up after std::runtime_error (Dangling queue_reader_handle_v2))

Impact

Stuck node replacement

How frequently does it reproduce?

First time seen. The list of nemeses executed by this test was changed this week, and this is the first run of this nemesis in this scenario.

Installation details

Cluster size: 12 nodes (i3en.2xlarge)

Scylla Nodes used in this run:

  • parallel-topology-schema-changes-mu-db-node-857012f9-9 (35.179.179.146 | 2a05:d01c:0964:7d02:c910:8f45:40c7:dc51) (shards: 7)
  • parallel-topology-schema-changes-mu-db-node-857012f9-8 (18.134.150.131 | 2a05:d01c:0964:7d02:b1e9:f032:91d9:6850) (shards: 7)
  • parallel-topology-schema-changes-mu-db-node-857012f9-7 (18.175.232.42 | 2a05:d01c:0964:7d02:f310:0b95:f215:1cd4) (shards: 7)
  • parallel-topology-schema-changes-mu-db-node-857012f9-6 (3.249.47.218 | 2a05:d018:12e3:f002:d5b3:4556:9b3c:d32b) (shards: 7)
  • parallel-topology-schema-changes-mu-db-node-857012f9-5 (3.249.119.158 | 2a05:d018:12e3:f002:f9d1:36c6:92e1:60fa) (shards: 7)
  • parallel-topology-schema-changes-mu-db-node-857012f9-4 (18.201.152.173 | 2a05:d018:12e3:f002:240a:f773:d2a7:4ac3) (shards: 7)
  • parallel-topology-schema-changes-mu-db-node-857012f9-3 (34.254.179.97 | 2a05:d018:12e3:f002:2178:024f:88fc:4b76) (shards: 7)
  • parallel-topology-schema-changes-mu-db-node-857012f9-20 (34.243.40.166 | 2a05:d018:12e3:f002:4fda:c88f:72d8:7820) (shards: 7)
  • parallel-topology-schema-changes-mu-db-node-857012f9-2 (3.254.61.190 | 2a05:d018:12e3:f002:afab:4199:25e9:a1fd) (shards: 7)
  • parallel-topology-schema-changes-mu-db-node-857012f9-19 (18.134.131.156 | 2a05:d01c:0964:7d02:f49c:2dcc:1413:93e0) (shards: -1)
  • parallel-topology-schema-changes-mu-db-node-857012f9-18 (18.200.246.108 | 2a05:d018:12e3:f002:d162:20b4:ff27:d412) (shards: 7)
  • parallel-topology-schema-changes-mu-db-node-857012f9-17 (35.179.110.98 | 2a05:d01c:0964:7d02:9d87:0362:74a9:047b) (shards: 7)
  • parallel-topology-schema-changes-mu-db-node-857012f9-16 (3.250.37.156 | 2a05:d018:12e3:f002:c4ae:fbd0:095b:1d89) (shards: 7)
  • parallel-topology-schema-changes-mu-db-node-857012f9-15 (18.170.67.165 | 2a05:d01c:0964:7d02:1ac6:c11e:d1cc:864f) (shards: -1)
  • parallel-topology-schema-changes-mu-db-node-857012f9-14 (13.40.167.202 | 2a05:d01c:0964:7d02:3f73:3f0b:6cb2:8a36) (shards: 7)
  • parallel-topology-schema-changes-mu-db-node-857012f9-13 (54.216.147.124 | 2a05:d018:12e3:f002:233f:cb0d:1f58:cdc7) (shards: 7)
  • parallel-topology-schema-changes-mu-db-node-857012f9-12 (35.177.95.3 | 2a05:d01c:0964:7d02:01e0:0044:1056:08b2) (shards: 7)
  • parallel-topology-schema-changes-mu-db-node-857012f9-11 (3.8.164.97 | 2a05:d01c:0964:7d02:3126:afc7:84c8:58f0) (shards: 7)
  • parallel-topology-schema-changes-mu-db-node-857012f9-10 (13.40.145.113 | 2a05:d01c:0964:7d02:f25a:eb37:0bb6:29a5) (shards: 7)
  • parallel-topology-schema-changes-mu-db-node-857012f9-1 (34.249.124.154 | 2a05:d018:12e3:f002:9b5d:20cb:abf5:de90) (shards: 7)

OS / Image: ami-087d814d9b6773015 ami-09b7fc7e317549f14 (aws: undefined_region)

Test: longevity-multidc-schema-topology-changes-12h-test
Test id: 857012f9-ef88-402f-a211-0318be24be7f
Test name: scylla-master/tier1/longevity-multidc-schema-topology-changes-12h-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 857012f9-ef88-402f-a211-0318be24be7f
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 857012f9-ef88-402f-a211-0318be24be7f

Logs:

Jenkins job URL
Argus

@soyacz soyacz added the triage/master Looking for assignee label Sep 30, 2024
@mykaul (Contributor) commented Sep 30, 2024

"Semaphore streaming with 10/10 count"

You ran out of semaphores. Interesting - haven't seen this for some time.
@denesb - please assign someone to investigate it.

@denesb (Contributor) commented Oct 1, 2024

The semaphore is flooded with reads for system.view_build_status_v2/data-query/waiting_for_admission.
This table is read during streaming to determine whether sstables need to be put in the staging directory or not.
It is possible that we have a deadlock caused by the migration of the view_build_status table to raft. Previously, this was a distributed table, so multiple (remote) replicas were available to answer queries into it. Now it is a local table, so all queries into it are answered by shard 0 of the local node. If there is enough ongoing streaming activity that all permits are gone, any query these streaming processes initiate into this table will be stuck behind the very streaming session which initiated it. Deadlock.
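A minimal sketch of that suspected cycle, under the assumption described above (every permit of the shard-0 semaphore is held by a streaming read, and each such read must get one more read admitted through the same semaphore before it can finish and release its permit). The timeout stands in for the indefinite wait in the real system; this is a model, not Scylla code.

```python
import threading

# Model of the suspected deadlock: workers that together hold all permits of a
# semaphore each need one more acquisition of the same semaphore to finish.

PERMITS = 10
sem = threading.BoundedSemaphore(PERMITS)
barrier = threading.Barrier(PERMITS)           # ensure all permits are held first

def stream_session(results, i):
    with sem:                                  # the streaming read holds a permit
        barrier.wait()                         # all 10 permits are now taken
        # Sub-read into the local view_build_status table must be admitted by
        # the same, already-exhausted semaphore; in the real system this waits
        # forever, here we time out so the example terminates.
        admitted = sem.acquire(timeout=1.0)
        if admitted:
            sem.release()
        results[i] = admitted

results = [None] * PERMITS
threads = [threading.Thread(target=stream_session, args=(results, i))
           for i in range(PERMITS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)   # all False: no sub-read is ever admitted -> deadlock
```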

I think this is a regression introduced by #19745.

@denesb (Contributor) commented Oct 1, 2024

As for how to solve it -- I have no good ideas off the top of my head.

We can make the streaming reader evictable -- but then we will have the same kind of thrashing on shard 0 that we have with mixed-shard repairs (#18269).

Another solution is to detach the internal code from the interface presented by the view_build_status_v2 table, and instead read the view build status directly from group0. I don't know if this would avoid a sub-read though.

@piodul @kbr-scylla

@mykaul mykaul added the P1 Urgent label Oct 1, 2024