Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Starting scylla server and populating keyspace causes "Storage I/O error: 28: No space left on device" and shutdown #22020

Open
2 tasks
yarongilor opened this issue Dec 22, 2024 · 18 comments

Comments

@yarongilor
Copy link

yarongilor commented Dec 22, 2024

Packages

Scylla version: 2024.3.0~dev-20241215.811c9ccb7f91 with build-id cf31bbad95480fbbafa9cb498cf0a54cd58c7485

Kernel Version: 6.8.0-1020-aws

Issue description

  • This issue is a regression.
  • It is unknown if this issue is a regression.

Describe your issue in detail and steps it took to produce it.
Run a setup of 90% disk utilization. The massive write load is stopped once getting to 90%.
Ran an SCT nemesis of disrupt_stop_wait_start_scylla_server that stops scylla for 5 minutes then start it back.
No significant load on cluster at that time - mainly a read load and perhaps a 3 writes per second.
So no disk utilization growth is expected at this time. Yet, once scylla started - the utilization started increasing as well, unexpectedly.
The scylla service is started and the node disk utilization is gradually increased until reaches 100% and the start command fails after 10 minutes.

log of node-3 event is:

< t:2024-12-17 17:32:08,905 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.4.2.150>: Command "sudo systemctl stop scylla-server.service" finished with status 0
< t:2024-12-17 17:32:08,906 f:nemesis.py      l:694  c:sdcm.nemesis         p:INFO  > sdcm.nemesis.SisyphusMonkey: Sleep for 300 seconds
< t:2024-12-17 17:37:09,067 f:remote_base.py  l:560  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.4.2.150>: Running command "sudo systemctl start scylla-server.service"...
< t:2024-12-17 17:37:09,503 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-17T17:37:09.425+00:00 elasticity-test-nemesis-master-db-node-2c23b329-3   !NOTICE | sudo[9891]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/systemctl start scylla-server.service
< t:2024-12-17 17:37:09,808 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-17T17:37:09.759+00:00 elasticity-test-nemesis-master-db-node-2c23b329-3     !INFO | scylla_prepare[9905]: Restarting irqbalance via systemctl...
< t:2024-12-17 17:47:07,204 f:base.py         l:231  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.4.2.150>: See "systemctl status scylla-server.service" and "journalctl -xeu scylla-server.service" for details.
< t:2024-12-17 17:47:07,204 f:base.py         l:147  c:RemoteLibSSH2CmdRunner p:ERROR > <10.4.2.150>: Error executing command: "sudo systemctl start scylla-server.service"; Exit status: 1

node-3 failure:

Dec 17 17:47:06.640044 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9949]:  [shard 1:main] compaction - [Split keyspace1.standard1 e3a4fcb0-bc9e-11ef-a4cc-4055562ebd29] Split 1 sstables to [/var/lib/scylla/data/keyspace1/standard1-cbf5
1600bc7f11ef88000a39228301fa/me-3gm5_1de0_2vb4w2i7xdv75bg209-big-Data.db:level=0,/var/lib/scylla/data/keyspace1/standard1-cbf51600bc7f11ef88000a39228301fa/me-3gm5_1de8_4ogtc2i7xdv75bg209-big-Data.db:level=0]. 1GB to 1GB (~99% of original) in 18453
ms = 57MB/s. ~940544 total partitions merged to 940470.
Dec 17 17:47:06.642875 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9949]:  [shard 0:main] database - Exception while populating keyspace 'keyspace1' with column family 'standard1' from '/var/lib/scylla/data/keyspace1/standard1-cbf5160
0bc7f11ef88000a39228301fa': storage_io_error (Storage I/O error: 28: No space left on device)
Dec 17 17:47:06.644380 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9949]:  [shard 0:main] init - Shutting down compaction manager API

grafana shows the node's disk utilization clibms up to 100%:
Image

SCT failure error event:

disrupt_stop_wait_start_scylla_server	elasticity-test-nemesis-master-db-node-2c23b329-3	Failed	2024-12-17 17:32:04	2024-12-17 17:47:08
Nemesis Information
Class: Sisyphus
Name: disrupt_stop_wait_start_scylla_server
Status: Failed
Failure reason
Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 5446, in wrapper
    result = method(*args[1:], **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 210, in wrapper
    return func(*args, **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 700, in disrupt_stop_wait_start_scylla_server
    self.target_node.start_scylla_server(verify_up=True, verify_down=False)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster.py", line 2472, in start_scylla_server
    self.start_service(service_name='scylla-server', timeout=timeout * 4)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster.py", line 2455, in start_service
    self._service_cmd(service_name=service_name, cmd='start', timeout=timeout, ignore_status=ignore_status)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster.py", line 2445, in _service_cmd
    return self.remoter.run(cmd, timeout=timeout, ignore_status=ignore_status)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 653, in run
    result = _run()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/decorators.py", line 72, in inner
    return func(*args, **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 644, in _run
    return self._run_execute(cmd, timeout, ignore_status, verbose, new_session, watchers)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 577, in _run_execute
    result = connection.run(**command_kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 625, in run
    return self._complete_run(channel, exception, timeout_reached, timeout, result, warn, stdout, stderr)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 660, in _complete_run
    raise UnexpectedExit(result)
sdcm.remote.libssh2_client.exceptions.UnexpectedExit: Encountered a bad command exit code!

Command: 'sudo systemctl start scylla-server.service'

Exit code: 1

Stdout:



Stderr:

Job for scylla-server.service failed because the control process exited with error code.
See "systemctl status scylla-server.service" and "journalctl -xeu scylla-server.service" for details.

Impact

node is down.
Describe the impact this issue causes to the user.

How frequently does it reproduce?

Describe the frequency with how this issue can be reproduced.

Installation details

Cluster size: 3 nodes (i4i.large)

Scylla Nodes used in this run:

  • elasticity-test-nemesis-master-db-node-2c23b329-3 (54.220.97.44 | 10.4.14.87) (shards: 2)
  • elasticity-test-nemesis-master-db-node-2c23b329-2 (54.247.16.111 | 10.4.14.91) (shards: 2)
  • elasticity-test-nemesis-master-db-node-2c23b329-1 (54.73.180.119 | 10.4.13.78) (shards: 2)

OS / Image: ami-0a3508c8059b5dc39 (aws: undefined_region)

Test: byo-longevity-test-yg2
Test id: 2c23b329-4757-4f06-b60c-fc222590dcf4
Test name: scylla-staging/yarongilor/byo-longevity-test-yg2
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 2c23b329-4757-4f06-b60c-fc222590dcf4
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 2c23b329-4757-4f06-b60c-fc222590dcf4

Logs:

Jenkins job URL
Argus

@mykaul
Copy link
Contributor

mykaul commented Dec 22, 2024

@yarongilor - I'm not sure what was the expected behavior, without increasing the cluster size?

@roydahan
Copy link

First, @yarongilor I think that the title is not good enough.
Second, the most interesting part of the the logs is between the logs you showed in the description.
The relevant part is to know what's running during the "scylla start", from quick look, it looks like compactions and tabelts splitting.

@yarongilor
Copy link
Author

@yarongilor - I'm not sure what was the expected behavior, without increasing the cluster size?

@mykaul , the scenario description is rephrased to:

Run a setup of 90% disk utilization. The massive write load is stopped once getting to 90%.
Ran an SCT nemesis of disrupt_stop_wait_start_scylla_server that stops scylla for 5 minutes then start it back.
No significant load on cluster at that time - mainly a read load and perhaps a 3 writes per second.
So no disk utilization growth is expected at this time. Yet, once scylla started - the utilization started increasing as well, unexpectedly.
The scylla service is started and the node disk utilization is gradually increased until reaches 100% and the start command fails after 10 minutes.

So the bottom line question is why scylla service start triggers something like 64GB additional disk space.

@yarongilor
Copy link
Author

First, @yarongilor I think that the title is not good enough. Second, the most interesting part of the the logs is between the logs you showed in the description. The relevant part is to know what's running during the "scylla start", from quick look, it looks like compactions and tabelts splitting.

it's all about populating keyspace1 and specifically the compaction splitting sstables:

Dec 17 17:37:11.784007 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9949]:  [shard 0:main] database - Populating Keyspace keyspace1
Dec 17 17:44:37.596644 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9949]:  [shard 0:main] compaction - [Split keyspace1.standard1 95ce61c0-bc9e-11ef-8e8f-4056562ebd29] Splitting [/var/lib/scylla/data/keyspace1/standard1-cbf51600bc7f11ef88000a39228301fa/me-3gm5_174b_0cuyo2wkdp0xc5farf-big-Data.db:level=0:origin=compaction]
Dec 17 17:44:40.926614 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9949]:  [shard 0:main] compaction - [Split keyspace1.standard1 80e5e760-bc9e-11ef-8e8f-4056562ebd29] Split 1 sstables to [/var/lib/scylla/data/keyspace1/standard1-cbf51600bc7f11ef88000a39228301fa/me-3gm5_1d9e_54yy8261n5cehtrpyh-big-Data.db:level=0,/var/lib/scylla/data/keyspace1/standard1-cbf51600bc7f11ef88000a39228301fa/me-3gm5_1d9g_0n5c0261n5cehtrpyh-big-Data.db:level=0]. 736MB to 736MB (~100% of original) in 38408ms = 19MB/s. ~650624 total partitions merged to 650619.
Dec 17 17:44:40.930460 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9949]:  [shard 0:main] compaction - [Split keyspace1.standard1 97cb1c20-bc9e-11ef-8e8f-4056562ebd29] Splitting [/var/lib/scylla/data/keyspace1/standard1-cbf51600bc7f11ef88000a39228301fa/me-3gm5_1b0g_4dyq82wkdp0xc5farf-big-Data.db:level=0:origin=split]
Dec 17 17:44:42.824809 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9949]:  [shard 1:main] compaction - [Split keyspace1.standard1 83c2a180-bc9e-11ef-a4cc-4055562ebd29] Split 1 sstables to [/var/lib/scylla/data/keyspace1/standard1-cbf51600bc7f11ef88000a39228301fa/me-3gm5_1d9j_4qtow2i7xdv75bg209-big-Data.db:level=0,/var/lib/scylla/data/keyspace1/standard1-cbf51600bc7f11ef88000a39228301fa/me-3gm5_1da1_2pisw2i7xdv75bg209-big-Data.db:level=0]. 674MB to 674MB (~99% of original) in 35504ms = 19MB/s. ~598144 total partitions merged to 598058.
Dec 17 17:44:42.827192 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9949]:  [shard 1:main] compaction - [Split keyspace1.standard1 98ec91b0-bc9e-11ef-a4cc-4055562ebd29] Splitting [/var/lib/scylla/data/keyspace1/standard1-cbf51600bc7f11ef88000a39228301fa/me-3gm5_1bxs_25sxc263ine9asqa0r-big-Data.db:level=0:origin=split]
Dec 17 17:44:51.665197 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9949]:  [shard 1:main] compaction - [Split keyspace1.standard1 8e335c40-bc9e-11ef-a4cc-4055562ebd29] Split 1 sstables to [/var/lib/scylla/data/keyspace1/standard1-cbf51600bc7f11ef88000a39228301fa/me-3gm5_1da1_1bd8w2i7xdv75bg209-big-Data.db:level=0,/var/lib/scylla/data/keyspace1/standard1-cbf51600bc7f11ef88000a39228301fa/me-3gm5_1dae_1yqao2i7xdv75bg209-big-Data.db:level=0]. 508MB to 507MB (~99% of original) in 26829ms = 18MB/s. ~450432 total partitions merged to 450334.
Dec 17 17:44:51.668259 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9949]:  [shard 1:main] compaction - [Split keyspace1.standard1 9e319940-bc9e-11ef-a4cc-4055562ebd29] Splitting [/var/lib/scylla/data/keyspace1/standard1-cbf51600bc7f11ef88000a39228301fa/me-3gm5_1cfa_2viuo263ine9asqa0r-big-Data.db:level=0:origin=memtable]
Dec 17 17:44:52.148025 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9949]:  [shard 1:main] sstable - Rebuilding bloom filter /var/lib/scylla/data/keyspace1/standard1-cbf51600bc7f11ef88000a39228301fa/me-3gm5_1dar_5drb42i7xdv75bg209-big-Filter.db: resizing bitset from 352 bytes to 240 bytes. sstable origin: split
Dec 17 17:44:52.444784 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9949]:  [shard 1:main] sstable - Rebuilding bloom filter /var/lib/scylla/data/keyspace1/standard1-cbf51600bc7f11ef88000a39228301fa/me-3gm5_1das_1inlc2i7xdv75bg209-big-Filter.db: resizing bitset from 352 bytes to 264 bytes. sstable origin: split
Dec 17 17:44:52.551453 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9949]:  [shard 1:main] compaction - [Split keyspace1.standard1 9e319940-bc9e-11ef-a4cc-4055562ebd29] Split 1 sstables to [/var/lib/scylla/data/keyspace1/standard1-cbf51600bc7f11ef88000a39228301fa/me-3gm5_1dar_5drb42i7xdv75bg209-big-Data.db:level=0,/var/lib/scylla/data/keyspace1/standard1-cbf51600bc7f11ef88000a39228301fa/me-3gm5_1das_1inlc2i7xdv75bg209-big-Data.db:level=0]. 448kB to 454kB (~101% of original) in 883ms = 508kB/s. ~512 total partitions merged to 392.
Dec 17 17:44:52.553084 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9949]:  [shard 1:main] compaction - [Split keyspace1.standard1 9eb87c80-bc9e-11ef-a4cc-4055562ebd29] Splitting [/var/lib/scylla/data/keyspace1/standard1-cbf51600bc7f11ef88000a39228301fa/me-3gm5_1bho_33gcg263ine9asqa0r-big-Data.db:level=0:origin=split]
Dec 17 17:44:59.080024 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9949]:  [shard 1:main] compaction - [Split keyspace1.standard1 9eb87c80-bc9e-11ef-a4cc-4055562ebd29] Split 1 sstables to [/var/lib/scylla/data/keyspace1/standard1-cbf51600bc7f11ef88000a39228301fa/me-3gm5_1das_4u1fk2i7xdv75bg209-big-Data.db:level=0,/var/lib/scylla/data/keyspace1/standard1-cbf51600bc7f11ef88000a39228301fa/me-3gm5_1dav_55m3k2i7xdv75bg209-big-Data.db:level=0]. 113MB to 113MB (~100% of original) in 6526ms = 17MB/s. ~100608 total partitions merged to 100585.
Dec 17 17:44:59.082542 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9949]:  [shard 1:main] compaction - [Split keyspace1.standard1 a29ce2a0-bc9e-11ef-a4cc-4055562ebd29] Splitting [/var/lib/scylla/data/keyspace1/standard1-cbf51600bc7f11ef88000a39228301fa/me-3gm5_1b1h_36o34263ine9asqa0r-big-Data.db:level=0:origin=split]
Dec 17 17:44:59.446138 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9949]:  [shard 1:main] compaction - [Split keyspace1.standard1 820d9e80-bc9e-11ef-a4cc-4055562ebd29] Split 1 sstables to [/var/lib/scylla/data/keyspace1/standard1-cbf51600bc7f11ef88000a39228301fa/me-3gm5_1d9h_0ifkw2i7xdv75bg209-big-Data.db:level=0,/var/lib/scylla/data/keyspace1/standard1-cbf51600bc7f11ef88000a39228301fa/me-3gm5_1dau_1m31s2i7xdv75bg209-big-Data.db:level=0]. 1GB to 1GB (~99% of original) in 54990ms = 19MB/s. ~940032 total partitions merged to 940027.
Dec 17 17:44:59.450053 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9949]:  [shard 1:main] compaction - [Split keyspace1.standard1 a2d4e290-bc9e-11ef-a4cc-4055562ebd29] Splitting [/var/lib/scylla/data/keyspace1/standard1-cbf51600bc7f11ef88000a39228301fa/me-3gm5_1ayu_5r9f4263ine9asqa0r-big-Data.db:level=0:origin=split]
Dec 17 17:45:00.451887 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9949]:  [shard 0:main] storage_service - Shutting down communications due to I/O errors until operator intervention: Disk error: std::system_error (error system:28, No space left on device)

@yarongilor
Copy link
Author

A different (but possibly related?) problem is found on another run during nemesis of restart_then_repair:

  • node-3 was restarted.
  • few minutes later node-2 got no-space-left error.
  • node-3 disk utilization dropped from ~ 90% to 21% after restart (why? expected?)

Image

< t:2024-12-22 12:33:57,811 f:db_log_reader.py l:125  c:sdcm.db_log_reader   p:DEBUG > 2024-12-22T12:33:57.736+00:00 elasticity-test-nemesis-master-db-node-5038e5d7-3   !NOTICE | sudo[2747]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/systemctl start scylla-server.service
< t:2024-12-22 12:34:00,444 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.4.1.149>: Command "sudo systemctl start scylla-server.service" finished with status 0
Dec 22 12:51:55.968176 elasticity-test-nemesis-master-db-node-5038e5d7-2 scylla[5309]:  [shard 0:strm] compaction - [Split keyspace1.standard1 863e2650-c063-11ef-9093-a94150c8ccbe] Splitting [/var/lib/scylla/data/keyspace1/standard1-96cea520c04b11
efb04c5e014984fbec/me-3gma_0xyy_5w6w0275edwk4n95tq-big-Data.db:level=0:origin=memtable]
Dec 22 12:51:56.182642 elasticity-test-nemesis-master-db-node-5038e5d7-2 scylla[5309]:  [shard 0:comp] sstable - Rebuilding bloom filter /var/lib/scylla/data/keyspace1/standard1-96cea520c04b11efb04c5e014984fbec/me-3gma_0zno_03uw0275edwk4n95tq-big-
Filter.db: resizing bitset from 1125216 bytes to 1174264 bytes. sstable origin: compaction
Dec 22 12:51:59.176725 elasticity-test-nemesis-master-db-node-5038e5d7-2 scylla[5309]:  [shard 1:comp] sstable - Rebuilding bloom filter /var/lib/scylla/data/keyspace1/standard1-96cea520c04b11efb04c5e014984fbec/me-3gma_0znz_2aiog2n00uvpyn1qri-big-
Filter.db: resizing bitset from 1150416 bytes to 1173904 bytes. sstable origin: compaction
Dec 22 12:52:02.255126 elasticity-test-nemesis-master-db-node-5038e5d7-2 scylla[5309]:  [shard 1:comp] sstable - Rebuilding bloom filter /var/lib/scylla/data/keyspace1/standard1-96cea520c04b11efb04c5e014984fbec/me-3gma_0zo5_1b5j42n00uvpyn1qri-big-
Filter.db: resizing bitset from 1101864 bytes to 1174128 bytes. sstable origin: compaction
Dec 22 12:52:09.005271 elasticity-test-nemesis-master-db-node-5038e5d7-2 scylla[5309]:  [shard 0:comp] sstable - Rebuilding bloom filter /var/lib/scylla/data/keyspace1/standard1-96cea520c04b11efb04c5e014984fbec/me-3gma_0zo5_0spy8275edwk4n95tq-big-
Filter.db: resizing bitset from 790464 bytes to 1173768 bytes. sstable origin: compaction
Dec 22 12:52:19.234425 elasticity-test-nemesis-master-db-node-5038e5d7-2 scylla[5309]:  [shard 0:comp] storage_service - Shutting down communications due to I/O errors until operator intervention: Disk error: std::system_error (error system:28, No
 space left on device)

Packages

Scylla version: 2024.3.0~dev-20241220.e8463f6b719f with build-id bb749b8a21072b70e631576259d767ecfe654583

Kernel Version: 6.8.0-1021-aws

Issue description

  • This issue is a regression.
  • It is unknown if this issue is a regression.

Describe your issue in detail and steps it took to produce it.

Impact

Describe the impact this issue causes to the user.

How frequently does it reproduce?

Describe the frequency with how this issue can be reproduced.

Installation details

Cluster size: 3 nodes (i4i.large)

Scylla Nodes used in this run:

  • elasticity-test-nemesis-master-db-node-5038e5d7-3 (3.250.253.7 | 10.4.12.56) (shards: 2)
  • elasticity-test-nemesis-master-db-node-5038e5d7-2 (3.254.45.100 | 10.4.13.235) (shards: 2)
  • elasticity-test-nemesis-master-db-node-5038e5d7-1 (52.215.77.215 | 10.4.13.250) (shards: 2)

OS / Image: ami-0af6c6b0a814f34a7 (aws: undefined_region)

Test: byo-longevity-test-yg2
Test id: 5038e5d7-3e04-43aa-b281-82b80e350a90
Test name: scylla-staging/yarongilor/byo-longevity-test-yg2
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 5038e5d7-3e04-43aa-b281-82b80e350a90
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 5038e5d7-3e04-43aa-b281-82b80e350a90

Logs:

Jenkins job URL
Argus

@pehala
Copy link
Contributor

pehala commented Dec 23, 2024

Run a setup of 90% disk utilization. The massive write load is stopped once getting to 90%. Ran an SCT nemesis of disrupt_stop_wait_start_scylla_server that stops scylla for 5 minutes then start it back. No significant load on cluster at that time - mainly a read load and perhaps a 3 writes per second. So no disk utilization growth is expected at this time. Yet, once scylla started - the utilization started increasing as well, unexpectedly. The scylla service is started and the node disk utilization is gradually increased until reaches 100% and the start command fails after 10 minutes.

So the bottom line question is why scylla service start triggers something like 64GB additional disk space.

Based on the scenario you are describing, it should be reasonably easy to create a non-SCT reproducer for this issue, right?

@pehala pehala added the bug label Dec 23, 2024
@yarongilor
Copy link
Author

Run a setup of 90% disk utilization. The massive write load is stopped once getting to 90%. Ran an SCT nemesis of disrupt_stop_wait_start_scylla_server that stops scylla for 5 minutes then start it back. No significant load on cluster at that time - mainly a read load and perhaps a 3 writes per second. So no disk utilization growth is expected at this time. Yet, once scylla started - the utilization started increasing as well, unexpectedly. The scylla service is started and the node disk utilization is gradually increased until reaches 100% and the start command fails after 10 minutes.
So the bottom line question is why scylla service start triggers something like 64GB additional disk space.

Based on the scenario you are describing, it should be reasonably easy to create a non-SCT reproducer for this issue, right?

Not sure it's too easy/available

@pehala
Copy link
Contributor

pehala commented Dec 23, 2024

Not sure it's too easy/available

what would be the problematic part, limiting storage space? I think, we could demonstrate the issue with lower utilization but visible spike in storage space after restart

@avikivity
Copy link
Member

@yarongilor - I'm not sure what was the expected behavior, without increasing the cluster size?

@mykaul , the scenario description is rephrased to:

Run a setup of 90% disk utilization. The massive write load is stopped once getting to 90%. Ran an SCT nemesis of disrupt_stop_wait_start_scylla_server that stops scylla for 5 minutes then start it back. No significant load on cluster at that time - mainly a read load and perhaps a 3 writes per second. So no disk utilization growth is expected at this time. Yet, once scylla started - the utilization started increasing as well, unexpectedly. The scylla service is started and the node disk utilization is gradually increased until reaches 100% and the start command fails after 10 minutes.

So the bottom line question is why scylla service start triggers something like 64GB additional disk space.

Could be commitlog replay.

Did you shut down the server cleanly?

@elcallio clean shutdown marks all commitlog segments as free, does it not?

@avikivity
Copy link
Member

64GB is exactly the amount of space allocated to commitlog on i4i.4xlarge. However, half of it should be free under normal operation (and all of it after a clean shutdown).

@yarongilor
Copy link
Author

@yarongilor - I'm not sure what was the expected behavior, without increasing the cluster size?

@mykaul , the scenario description is rephrased to:
Run a setup of 90% disk utilization. The massive write load is stopped once getting to 90%. Ran an SCT nemesis of disrupt_stop_wait_start_scylla_server that stops scylla for 5 minutes then start it back. No significant load on cluster at that time - mainly a read load and perhaps a 3 writes per second. So no disk utilization growth is expected at this time. Yet, once scylla started - the utilization started increasing as well, unexpectedly. The scylla service is started and the node disk utilization is gradually increased until reaches 100% and the start command fails after 10 minutes.
So the bottom line question is why scylla service start triggers something like 64GB additional disk space.

Could be commitlog replay.

Did you shut down the server cleanly?

@elcallio clean shutdown marks all commitlog segments as free, does it not?

The shutdown was basically clean like:

Dec 17 17:32:01.935285 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:strm] compaction - [Split keyspace1.standard1 d14a4400-bc9c-11ef-84ab-552edb593d46] Split 1 sstables to [/var/lib/scylla/data/keyspace1/standard1-cbf51600bc7f11ef88000a39228301fa/me-3gm5_1cpa_2fg5c20mpwaxp970w6-big-Data.db:level=0,/var/lib/scylla/data/keyspace1/standard1-cbf51600bc7f11ef88000a39228301fa/me-3gm5_1cpc_14q1s20mpwaxp970w6-big-Data.db:level=0]. 93MB to 93MB (~100% of original) in 3531ms = 26MB/s. ~82432 total partitions merged to 82421.
Dec 17 17:32:01.938423 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:strm] compaction - [Split keyspace1.standard1 d365d100-bc9c-11ef-84ab-552edb593d46] Splitting [/var/lib/scylla/data/keyspace1/standard1-cbf51600bc7f11ef88000a39228301fa/me-3gm5_1bq2_1w5pc2wkdp0xc5farf-big-Data.db:level=0:origin=split]
Dec 17 17:32:02.505110 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla-audit[9378]: "10.4.14.87:0", "AUTH", "", "", "", "", "10.4.15.166:0", "scylla-qa", "false"
Dec 17 17:32:02.519934 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla-audit[9378]: "10.4.14.87:0", "AUTH", "", "", "", "", "10.4.15.166:0", "scylla-qa", "false"
Dec 17 17:32:02.523593 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla-audit[9378]: "10.4.14.87:0", "AUTH", "", "", "", "", "10.4.15.166:0", "scylla-qa", "false"
Dec 17 17:32:05.413164 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:main] compaction_manager - Asked to stop
Dec 17 17:32:05.413191 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:main] compaction_manager - Stopping 3 tasks for 3 ongoing compactions due to shutdown
Dec 17 17:32:05.413200 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:main] view_update_generator - Terminating background fiber
Dec 17 17:32:05.413276 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:main] init - Signal received; shutting down
Dec 17 17:32:05.413280 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:main] init - Shutting down thrift controller API
Dec 17 17:32:05.413290 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:strm] view_update_generator - leaving 0 unstaged sstables unprocessed
Dec 17 17:32:05.414718 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 1:main] compaction_manager - Asked to stop
Dec 17 17:32:05.414756 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 1:main] compaction_manager - Stopping 3 tasks for 3 ongoing compactions due to shutdown
Dec 17 17:32:05.414762 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 1:main] view_update_generator - Terminating background fiber
Dec 17 17:32:05.414811 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 1:strm] view_update_generator - leaving 0 unstaged sstables unprocessed
Dec 17 17:32:05.414842 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:main] init - Shutting down thrift controller API was successful
Dec 17 17:32:05.414846 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:main] init - Shutting down transport controller API
Dec 17 17:32:05.414936 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:strm] gossip - failure_detector_loop: Got error in the loop, live_nodes={10.4.14.91, 10.4.13.78}: seastar::sleep_aborted (Sleep is aborted)
Dec 17 17:32:05.414939 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:strm] gossip - failure_detector_loop: Finished main loop
Dec 17 17:32:05.414955 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:main] init - Shutting down transport controller API was successful
Dec 17 17:32:05.414958 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:main] init - Shutting down local storage
Dec 17 17:32:05.414966 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:main] storage_service - Stop transport: starts
Dec 17 17:32:05.414981 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:main] migration_manager - stopping migration service
Dec 17 17:32:05.414995 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 1:main] migration_manager - stopping migration service
Dec 17 17:32:05.415005 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:main] storage_service - Shutting down native transport server
Dec 17 17:32:05.415043 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:main] gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested)
Dec 17 17:32:05.415773 elasticity-test-nemesis-master-db-node-2c23b329-3 systemd[1]: Stopping scylla-server.service - Scylla Server...
Dec 17 17:32:05.417205 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:main] cql_server_controller - CQL server stopped
Dec 17 17:32:05.418301 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:main] storage_service - Shutting down native transport server was successful
Dec 17 17:32:05.418380 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:main] storage_service - Stop transport: shutdown rpc and cql server done
Dec 17 17:32:05.418454 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:main] gossip - My status = NORMAL
Dec 17 17:32:05.418502 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:main] gossip - Announcing shutdown
Dec 17 17:32:05.418607 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:main] gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested)
Dec 17 17:32:05.418658 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:main] gossip - Sending a GossipShutdown to 10.4.14.91 with generation 1734456368
Dec 17 17:32:05.418703 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:main] gossip - Sending a GossipShutdown to 10.4.13.78 with generation 1734456368
Dec 17 17:32:05.428723 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:strm] compaction - [Split keyspace1.standard1 d365d100-bc9c-11ef-84ab-552edb593d46] Splitting of 1 sstables interrupted due to: sstables::compaction_stopped_exception (Compaction for keyspace1/standard1 was stopped due to: user requested abort), at 0x25a3a1e 0x25a34b0 0x1b8f718 0x272d499 0x27449c2 0x1f26d1d
Dec 17 17:32:05.429089 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:strm] compaction_manager - Split task 0x600023493c00 for table keyspace1.standard1 compaction_group=3 [0x600008d20f20]: Compaction for keyspace1/standard1 was stopped due to: user requested abort: stopping
Dec 17 17:32:05.438720 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 1:strm] compaction - [Split keyspace1.standard1 b98a3fa0-bc9c-11ef-89f5-552ddb593d46] Splitting of 1 sstables interrupted due to: sstables::compaction_stopped_exception (Compaction for keyspace1/standard1 was stopped due to: user requested abort), at 0x25a3a1e 0x25a34b0 0x1b8f718 0x272d499 0x27449c2 0x1f26d1d
Dec 17 17:32:05.438875 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 1:strm] compaction_manager - Split task 0x60102256c100 for table keyspace1.standard1 compaction_group=10 [0x6010086cd240]: Compaction for keyspace1/standard1 was stopped due to: user requested abort: stopping
Dec 17 17:32:05.439023 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:strm] storage_service - Failed to complete splitting of table cbf51600-bc7f-11ef-8800-0a39228301fa due to abort requested
Dec 17 17:32:05.489374 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 1:comp] compaction - [Compact keyspace1.standard1 c67f8260-bc9c-11ef-89f5-552ddb593d46] Compacting of 21 sstables interrupted due to: sstables::compaction_stopped_exception (Compaction for keyspace1/standard1 was stopped due to: user requested abort), at 0x25a3a1e 0x25a34b0 0x1b8f718 0x272d499 0x27449c2 0x1f26d1d
Dec 17 17:32:05.489707 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 1:comp] compaction_manager - Compaction task 0x601021d04000 for table keyspace1.standard1 compaction_group=13 [0x60100862bac0]: Compaction for keyspace1/standard1 was stopped due to: user requested abort: stopping
Dec 17 17:32:05.490313 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 1:comp] compaction - [Compact keyspace1.standard1 c67f8262-bc9c-11ef-89f5-552ddb593d46] Compacting of 7 sstables interrupted due to: sstables::compaction_stopped_exception (Compaction for keyspace1/standard1 was stopped due to: user requested abort), at 0x25a3a1e 0x25a34b0 0x1b8f718 0x272d499 0x27449c2 0x1f26d1d
Dec 17 17:32:05.490457 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 1:comp] compaction_manager - Compaction task 0x601021d28000 for table keyspace1.standard1 compaction_group=14 [0x60100862ba60]: Compaction for keyspace1/standard1 was stopped due to: user requested abort: stopping
Dec 17 17:32:05.490593 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 1:main] compaction_manager - Stopped
Dec 17 17:32:05.493920 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:comp] compaction - [Compact keyspace1.standard1 208fcdb0-bc9c-11ef-84ab-552edb593d46] Compacting of 10 sstables interrupted due to: sstables::compaction_stopped_exception (Compaction for keyspace1/standard1 was stopped due to: user requested abort), at 0x25a3a1e 0x25a34b0 0x1b8f718 0x272d499 0x27449c2 0x1f26d1d
Dec 17 17:32:05.494280 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:comp] compaction_manager - Compaction task 0x600009005c00 for table keyspace1.standard1 compaction_group=1 [0x600008d20fe0]: Compaction for keyspace1/standard1 was stopped due to: user requested abort: stopping
Dec 17 17:32:05.498225 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:comp] compaction - [Compact keyspace1.standard1 0096af10-bc9c-11ef-84ab-552edb593d46] Compacting of 10 sstables interrupted due to: sstables::compaction_stopped_exception (Compaction for keyspace1/standard1 was stopped due to: user requested abort), at 0x25a3a1e 0x25a34b0 0x1b8f718 0x272d499 0x27449c2 0x1f26d1d
Dec 17 17:32:05.498662 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:comp] compaction_manager - Compaction task 0x600009005400 for table keyspace1.standard1 compaction_group=4 [0x600000df5240]: Compaction for keyspace1/standard1 was stopped due to: user requested abort: stopping
Dec 17 17:32:05.498721 elasticity-test-nemesis-master-db-node-2c23b329-3 scylla[9378]:  [shard 0:main] compaction_manager - Stopped

@yarongilor
Copy link
Author

yarongilor commented Dec 23, 2024

The issue is reproduced on another "rolling restart" nemesis of disrupt_disable_enable_ldap_authorization

Image

DatabaseLogEvent
ERROR
disrupt_disable_enable_ldap_authorization
2024-12-22 23:50:02.903
Received: 2024-12-22 23:50:02.836
one-time
elasticity-test-nemesis-master-db-node-26f080d5-3
2024-12-22 23:50:02.903 <2024-12-22 23:50:02.836>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=78e78fb3-7f38-408a-b1b9-63e8a6e590b3: type=NO_SPACE_ERROR regex=No space left on device line_number=9438 node=elasticity-test-nemesis-master-db-node-26f080d5-3
2024-12-22T23:50:02.836+00:00 elasticity-test-nemesis-master-db-node-26f080d5-3      !ERR | scylla[1441]:  [shard 0:main] storage_service - Shutting down communications due to I/O errors until operator intervention: Disk error: std::system_error (error system:28, No space left on device)

Packages

Scylla version: 2024.3.0~dev-20241220.e8463f6b719f with build-id bb749b8a21072b70e631576259d767ecfe654583

Kernel Version: 6.8.0-1021-aws

Issue description

  • This issue is a regression.
  • It is unknown if this issue is a regression.

Describe your issue in detail and steps it took to produce it.

Impact

Describe the impact this issue causes to the user.

How frequently does it reproduce?

Describe the frequency with how this issue can be reproduced.

Installation details

Cluster size: 3 nodes (i4i.large)

Scylla Nodes used in this run:

  • elasticity-test-nemesis-master-db-node-26f080d5-3 (54.228.255.2 | 10.4.12.138) (shards: 2)
  • elasticity-test-nemesis-master-db-node-26f080d5-2 (3.253.207.156 | 10.4.14.76) (shards: 2)
  • elasticity-test-nemesis-master-db-node-26f080d5-1 (52.31.204.116 | 10.4.15.112) (shards: 2)

OS / Image: ami-0af6c6b0a814f34a7 (aws: undefined_region)

Test: byo-longevity-test-yg2
Test id: 26f080d5-3839-425d-9f20-c97a620e7509
Test name: scylla-staging/yarongilor/byo-longevity-test-yg2
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 26f080d5-3839-425d-9f20-c97a620e7509
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 26f080d5-3839-425d-9f20-c97a620e7509

Logs:

Jenkins job URL
Argus

@paszkow
Copy link
Contributor

paszkow commented Dec 23, 2024

We are constantly hitting here #20054. For the original report, when data ingestion stops, which happens around 16:25, we have 4 tablets with the average size of 83.6 GB. This particular issue is mentioned in the test plan and can be overcome by setting higher initial tablet count.

Note also that it is not only the node-3's storage utilization that gradually increased, but all of them:

  • node-1 reached its storage utilization peak of 98.4% at 17:15;
  • node-2 reached its storage utilization peak of 97.1% at 17:07;
  • node-3 reached its storage utilization peak of 100% at 17:46;

Furthermore, node-1 also hit 100%, reported the error and got restarted:

Dec 17 17:15:15.310355 elasticity-test-nemesis-master-db-node-2c23b329-1 scylla[9625]: [shard 1:main] storage_service - Shutting down communications due to I/O errors until operator intervention: Disk error: std::system_error (error system:28, No space left on device)

@yarongilor Can you re-run the test with 32/64 initial tablet counts in order to avoid tablet splits? Once applied, I suspect it will work fine.

@yarongilor
Copy link
Author

We are constantly hitting here #20054. For the original report, when data ingestion stops, which happens around 16:25, we have 4 tablets with the average size of 83.6 GB. This particular issue is mentioned in the test plan and can be overcome by setting higher initial tablet count.

Note also that it is not only the node-3's storage utilization that gradually increased, but all of them:

  • node-1 reached its storage utilization peak of 98.4% at 17:15;
  • node-2 reached its storage utilization peak of 97.1% at 17:07;
  • node-3 reached its storage utilization peak of 100% at 17:46;

Furthermore, node-1 also hit 100%, reported the error and got restarted:

Dec 17 17:15:15.310355 elasticity-test-nemesis-master-db-node-2c23b329-1 scylla[9625]: [shard 1:main] storage_service - Shutting down communications due to I/O errors until operator intervention: Disk error: std::system_error (error system:28, No space left on device)

@yarongilor Can you re-run the test with 32/64 initial tablet counts in order to avoid tablet splits? Once applied, I suspect it will work fine.

the issue is not reproduced where setting 64 initial tablets.
@paszkow , according to your above comment, i assume this issue could be a dup of #20054

@pehala
Copy link
Contributor

pehala commented Dec 27, 2024

the issue is not reproduced where setting 64 initial tablets.
@paszkow , according to your above comment, i assume this issue could be a dup of #20054

Thats great! I think we should increase priority of #20054 to P0, this error proves that it can easily happen in production and I do not think we can recommend tablets for production with this behaviour. @bhalevy @paszkow @swasik WDYT?

@paszkow
Copy link
Contributor

paszkow commented Dec 27, 2024

Thats great! I think we should increase priority of #20054 to P0, this error proves that it can easily happen in production and I do not think we can recommend tablets for production with this behaviour. @bhalevy @paszkow @swasik WDYT?

We indeed hit it constantly testing 90% utilization. Note, however, that in all our tests we start from an empty table with 1 tablet/shard and in production, you will have more mature servers. So the problem might not be that severe out there. We shall definitely fix it.

@raphaelsc
Copy link
Member

Thats great! I think we should increase priority of #20054 to P0, this error proves that it can easily happen in production and I do not think we can recommend tablets for production with this behaviour. @bhalevy @paszkow @swasik WDYT?

We indeed hit it constantly testing 90% utilization. Note, however, that in all our tests we start from an empty table with 1 tablet/shard and in production, you will have more mature servers. So the problem might not be that severe out there. We shall definitely fix it.

This problem will be less severe in practice with ongoing work here: #22024

@pehala
Copy link
Contributor

pehala commented Dec 27, 2024

We indeed hit it constantly testing 90% utilization. Note, however, that in all our tests we start from an empty table with 1 tablet/shard and in production, you will have more mature servers. So the problem might not be that severe out there. We shall definitely fix it.

I am not sure we will, as far as I know, we dont have migration from vnodes, so customers wanting to try out tablets will need to start with fresh cluster and thus have increased chance of hitting this

This problem will be less severe in practice with ongoing work here: #22024

In that case, I think either #22024 or #20054 should be P0, in my opinion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants