Failed to apply mutation after restart-and-repair: seastar::gate_closed_exception (gate closed) #5740

Closed
yarongilor opened this issue Feb 6, 2020 · 2 comments

yarongilor commented Feb 6, 2020

Installation details
Scylla version (or git commit hash): 3.3 - Scylla version 666.development-0.20200123.e1b22b6a4c5 with build-id 98f8288b4a56c70373b863b3f012bd7e6a4a45dd
Cluster size: 4
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-0e0bc3eb9e39a481d eu-west-1

scenario:
longevity-large-partition-200k-pks-4days at 2020-02-03 13:03:44.

Test: longevity_large_partition_test.LargePartitionLongevityTest.test_large_partition_longevity
Test-id: dd772ace-bb57-4b43-a737-cfae50b78267
Test details
Start time: 2020-02-03 13:03:44
End time: 2020-02-05 22:11:45
Build URL: link
Test run by User: yarongilor
System under test
Scylla version: (ami-0dbad0053cd68fcfe)
Instance type: i3en.3xlarge
Number of scylladb nodes: 5

Nemesis Details
Running nemesis: ChaosMonkey

Name	Count	Runs	Failures
ValidateHintedHandoffShortDowntime	1	0	1
RestartThenRepairNode	1	1	0
MultipleHardRebootNode	1	1	0

Hydra commands:
Restore Monitor Stack command: $ hydra investigate show-monitor dd772ace-bb57-4b43-a737-cfae50b78267
Show all stored logs command: $ hydra investigate show-logs dd772ace-bb57-4b43-a737-cfae50b78267
Running instances

Name	Ip address	Current State	Cloud	Region
longevity-large-partitions-200k-pks-db-node-dd772ace-4	52.50.10.159	running	aws	eu-west-1
longevity-large-partitions-200k-pks-db-node-dd772ace-1	52.209.193.53	running	aws	eu-west-1
longevity-large-partitions-200k-pks-db-node-dd772ace-3	34.240.52.33	running	aws	eu-west-1
longevity-large-partitions-200k-pks-monitor-node-dd772ace-1	63.33.206.104	running	aws	eu-west-1
longevity-large-partitions-200k-pks-db-node-dd772ace-7	34.242.62.175	running	aws	eu-west-1
longevity-large-partitions-200k-pks-db-node-dd772ace-8	54.76.132.190	running	aws	eu-west-1

200pk_large_partitions

scenario failures:

2020-02-04 01:32:24.000: (DatabaseLogEvent Severity.CRITICAL): type=DATABASE_ERROR regex=Exception  line_number=77420 node=Node longevity-large-partitions-200k-pks-db-node-dd772ace-4 [34.242.84.132 | 10.0.58.160] (seed: False)
2020-02-04T01:32:24+00:00  longevity-large-partitions-200k-pks-db-node-dd772ace-4 !INFO    | scylla: [shard 0] cql_server - exception while processing connection: std::system_error (error system:32, sendmsg: Broken pipe)
?? ??:0
2020-02-04 01:42:24.000: (DatabaseLogEvent Severity.CRITICAL): type=DATABASE_ERROR regex=Exception  line_number=81904 node=Node longevity-large-partitions-200k-pks-db-node-dd772ace-1 [52.209.193.53 | 10.0.243.15] (seed: True)
2020-02-04T01:42:24+00:00  longevity-large-partitions-200k-pks-db-node-dd772ace-1 !INFO    | scylla: [shard 0] cql_server - exception while processing connection: std::system_error (error system:32, sendmsg: Broken pipe)
?? ??:0
2020-02-05 01:25:24.000: (DatabaseLogEvent Severity.CRITICAL): type=NO_SPACE_ERROR regex=No space left on device line_number=251959 node=Node longevity-large-partitions-200k-pks-db-node-dd772ace-3 [34.240.52.33 | 10.0.16.116] (seed: False)
2020-02-05T01:25:24+00:00  longevity-large-partitions-200k-pks-db-node-dd772ace-3 !WARNING | scylla: [shard 3] commitlog - Exception in segment reservation: storage_io_error (Storage I/O error: 28: No space left on device)
?? ??:0
?? ??:0
2020-02-05 03:31:37.000: (DatabaseLogEvent Severity.CRITICAL): type=DATABASE_ERROR regex=Exception  line_number=278884 node=Node longevity-large-partitions-200k-pks-db-node-dd772ace-4 [52.50.10.159 | 10.0.58.160] (seed: False)
2020-02-05T03:31:37+00:00  longevity-large-partitions-200k-pks-db-node-dd772ace-4 !WARNING | scylla: [shard 2] storage_proxy - Failed to apply mutation from 10.0.243.15#2: seastar::gate_closed_exception (gate closed)
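
To triage a list of events like the one above, it can help to count them per error type and node. The following is a minimal sketch, not part of SCT: the regex is inferred only from the four event header lines in this report, and `EVENT_RE`, `SAMPLE`, and `summarize_events` are hypothetical names chosen for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the event header lines in this report (an assumption,
# not an official SCT format specification).
EVENT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\(DatabaseLogEvent Severity\.(?P<severity>\w+)\): "
    r"type=(?P<type>\w+) regex=.*?\s+line_number=(?P<line>\d+) "
    r"node=Node (?P<node>\S+)"
)

# The four event header lines copied from the scenario failures above.
SAMPLE = [
    "2020-02-04 01:32:24.000: (DatabaseLogEvent Severity.CRITICAL): type=DATABASE_ERROR regex=Exception  line_number=77420 node=Node longevity-large-partitions-200k-pks-db-node-dd772ace-4 [34.242.84.132 | 10.0.58.160] (seed: False)",
    "2020-02-04 01:42:24.000: (DatabaseLogEvent Severity.CRITICAL): type=DATABASE_ERROR regex=Exception  line_number=81904 node=Node longevity-large-partitions-200k-pks-db-node-dd772ace-1 [52.209.193.53 | 10.0.243.15] (seed: True)",
    "2020-02-05 01:25:24.000: (DatabaseLogEvent Severity.CRITICAL): type=NO_SPACE_ERROR regex=No space left on device line_number=251959 node=Node longevity-large-partitions-200k-pks-db-node-dd772ace-3 [34.240.52.33 | 10.0.16.116] (seed: False)",
    "2020-02-05 03:31:37.000: (DatabaseLogEvent Severity.CRITICAL): type=DATABASE_ERROR regex=Exception  line_number=278884 node=Node longevity-large-partitions-200k-pks-db-node-dd772ace-4 [52.50.10.159 | 10.0.58.160] (seed: False)",
]


def summarize_events(lines):
    """Count events per (error type, node name); non-matching lines are skipped."""
    counts = Counter()
    for line in lines:
        m = EVENT_RE.match(line)
        if m:
            counts[(m["type"], m["node"])] += 1
    return counts


if __name__ == "__main__":
    for (etype, node), n in sorted(summarize_events(SAMPLE).items()):
        print(f"{etype:16} {node} x{n}")
```

Raw Scylla log lines and the `?? ??:0` lines do not match the pattern and are ignored, so the whole failures section can be fed in unchanged.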

apply mutation error:

< t:2020-02-05 03:26:55,576 f:nemesis.py      l:636  c:sdcm.nemesis         p:INFO  > sdcm.nemesis.ChaosMonkey: >>>>>>>>>>>>>Started random_disrupt_method restart_then_repair_node
< t:2020-02-05 03:26:55,577 f:nemesis.py      l:383  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.ChaosMonkey: Set current_disruption -> RestartThenRepairNode Node longevity-large-partitions-200k-pks-db-node-dd772ace-4 [34.242.84.132 | 10.0.58.160] (seed: False)
< t:2020-02-05 03:31:48,696 f:cluster.py      l:1369 c:sdcm.cluster         p:DEBUG > 2020-02-05T03:31:37+00:00  longevity-large-partitions-200k-pks-db-node-dd772ace-4 !WARNING | scylla: [shard 2] storage_proxy - Failed to apply mutation from 10.0.243.15#2: seastar::gate_closed_exception (gate closed)
< t:2020-02-05 03:31:48,696 f:cluster.py      l:1369 c:sdcm.cluster         p:DEBUG > 2020-02-05T03:31:37+00:00  longevity-large-partitions-200k-pks-db-node-dd772ace-4 !WARNING | scylla: [shard 2] storage_proxy - Failed to apply mutation from 10.0.243.15#2: seastar::gate_closed_exception (gate closed)
< t:2020-02-05 03:31:48,697 f:cluster.py      l:1369 c:sdcm.cluster         p:DEBUG > 2020-02-05T03:31:37+00:00  longevity-large-partitions-200k-pks-db-node-dd772ace-4 !WARNING | scylla: [shard 2] storage_proxy - Failed to apply mutation from 10.0.243.15#2: seastar::gate_closed_exception (gate closed)
< t:2020-02-05 03:31:48,697 f:cluster.py      l:1369 c:sdcm.cluster         p:DEBUG > 2020-02-05T03:31:37+00:00  longevity-large-partitions-200k-pks-db-node-dd772ace-4 !WARNING | scylla: [shard 2] storage_proxy - Failed to apply mutation from 10.0.243.15#2: seastar::gate_closed_exception (gate closed)
< t:2020-02-05 03:31:48,698 f:cluster.py      l:1369 c:sdcm.cluster         p:DEBUG > 2020-02-05T03:31:37+00:00  longevity-large-partitions-200k-pks-db-node-dd772ace-4 !WARNING | scylla: [shard 2] storage_proxy - Failed to apply mutation from 10.0.243.15#2: seastar::gate_closed_exception (gate closed)
< t:2020-02-05 03:31:48,698 f:cluster.py      l:1369 c:sdcm.cluster    
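
For readers unfamiliar with the exception: a Seastar gate tracks in-flight operations, and trying to enter a gate after it has been closed throws `seastar::gate_closed_exception`. The toy model below illustrates only that general behavior. It is not Scylla or Seastar code, it does not establish the root cause of this report, and the names `Gate`, `GateClosedError`, and `apply_mutation` are hypothetical stand-ins.

```python
class GateClosedError(Exception):
    """Stand-in for seastar::gate_closed_exception (hypothetical name)."""


class Gate:
    """Toy, synchronous model of a gate guarding in-flight operations."""

    def __init__(self):
        self._closed = False
        self.in_flight = 0

    def enter(self):
        # Entering a closed gate is refused.
        if self._closed:
            raise GateClosedError("gate closed")
        self.in_flight += 1

    def leave(self):
        self.in_flight -= 1

    def close(self):
        # The real close() returns a future that resolves once all entered
        # operations have left; this toy only flips the flag.
        self._closed = True


def apply_mutation(gate, store, key, value):
    """Apply a write only while the gate is open (illustrative helper)."""
    gate.enter()
    try:
        store[key] = value
    finally:
        gate.leave()


if __name__ == "__main__":
    gate, store = Gate(), {}
    apply_mutation(gate, store, "pk1", "v1")  # accepted while the gate is open
    gate.close()                              # e.g. a component shutting down
    try:
        apply_mutation(gate, store, "pk2", "v2")
    except GateClosedError as exc:
        print(f"Failed to apply mutation: {exc}")
```

In this model, a write that arrives after `close()` is rejected rather than applied, which is the shape of the warning logged on shard 2 above.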

logs:
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Log links for testrun with test id dd772ace-bb57-4b43-a737-cfae50b78267 |
+-----------------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Date | Log type | Link |
+-----------------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 20200205_220127 | grafana | https://cloudius-jenkins-test.s3.amazonaws.com/dd772ace-bb57-4b43-a737-cfae50b78267/20200205_220127/grafana-screenshot-overview-20200205_220216-longevity-large-partitions-200k-pks-monitor-node-dd772ace-1.png |
| 20200205_220127 | grafana | https://cloudius-jenkins-test.s3.amazonaws.com/dd772ace-bb57-4b43-a737-cfae50b78267/20200205_220127/grafana-screenshot-scylla-per-server-metrics-nemesis-20200205_220137-longevity-large-partitions-200k-pks-monitor-node-dd772ace-1.png |
| 20200205_220800 | grafana | https://cloudius-jenkins-test.s3.amazonaws.com/dd772ace-bb57-4b43-a737-cfae50b78267/20200205_220800/grafana-screenshot-overview-20200205_220844-longevity-large-partitions-200k-pks-monitor-node-dd772ace-1.png |
| 20200205_220800 | grafana | https://cloudius-jenkins-test.s3.amazonaws.com/dd772ace-bb57-4b43-a737-cfae50b78267/20200205_220800/grafana-screenshot-scylla-per-server-metrics-nemesis-20200205_220802-longevity-large-partitions-200k-pks-monitor-node-dd772ace-1.png |
| 20200205_221153 | db-cluster | https://cloudius-jenkins-test.s3.amazonaws.com/dd772ace-bb57-4b43-a737-cfae50b78267/20200205_221153/db-cluster-dd772ace.zip |
| 20200205_221153 | loader-set | https://cloudius-jenkins-test.s3.amazonaws.com/dd772ace-bb57-4b43-a737-cfae50b78267/20200205_221153/loader-set-dd772ace.zip |
| 20200205_221153 | monitor-set | https://cloudius-jenkins-test.s3.amazonaws.com/dd772ace-bb57-4b43-a737-cfae50b78267/20200205_221153/monitor-set-dd772ace.zip |
| 20200205_221153 | sct-runner | https://cloudius-jenkins-test.s3.amazonaws.com/dd772ace-bb57-4b43-a737-cfae50b78267/20200205_221153/sct-runner-dd772ace.zip |
+-----------------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

roydahan commented Feb 9, 2020

@yarongilor please add more details

Contributor

slivne commented Apr 14, 2020

I am closing this issue; it's stale.

slivne closed this as completed Apr 14, 2020