large_data_handler: Add metrics for large cells and rows #7354

amnonh · 2020-10-07T09:09:21Z

This patch adds counters and metrics for large cells and large rows.
As with large partitions, these counters are updated whenever a
large cell or large row is identified.

This allows monitoring big rows and cells.

After this patch, large rows will look like this in the monitor:
HELP scylla_database_large_row_exceeding_threshold Number of large rows exceeding compaction_large_row_warning_threshold_mb. Large rows have performance impact and should be avoided, check the documentation for details.
scylla_database_large_row_exceeding_threshold{shard="0"} 0
scylla_database_large_row_exceeding_threshold{shard="1"} 0
scylla_database_large_row_exceeding_threshold{shard="2"} 1

And large cells will look like this:
scylla_database_large_cell_exceeding_threshold{shard="0"} 0
scylla_database_large_cell_exceeding_threshold{shard="1"} 0
scylla_database_large_cell_exceeding_threshold{shard="2"} 1
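
As a rough illustration of how a consumer might read these per-shard values, here is a minimal Python sketch. It parses only lines of the exact shape shown above and sums each metric across shards. The helper names (`parse_per_shard`, `shards_over_zero`) are hypothetical and are not part of Scylla or of any monitoring stack.

```python
import re
from collections import defaultdict

# Sample lines copied from the metric output shown in this description.
SAMPLE = """\
scylla_database_large_row_exceeding_threshold{shard="0"} 0
scylla_database_large_row_exceeding_threshold{shard="1"} 0
scylla_database_large_row_exceeding_threshold{shard="2"} 1
scylla_database_large_cell_exceeding_threshold{shard="0"} 0
scylla_database_large_cell_exceeding_threshold{shard="1"} 0
scylla_database_large_cell_exceeding_threshold{shard="2"} 1
"""

# Matches only the simple form name{shard="N"} value used in the sample.
LINE_RE = re.compile(
    r'^(?P<name>\w+)\{shard="(?P<shard>\d+)"\}\s+(?P<value>\d+(?:\.\d+)?)$'
)


def parse_per_shard(text):
    """Return {metric_name: {shard: value}} for lines of the form above."""
    result = defaultdict(dict)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip HELP text and anything else that does not match
        result[match["name"]][int(match["shard"])] = float(match["value"])
    return dict(result)


def shards_over_zero(per_shard):
    """Return {metric_name: sorted list of shards whose counter is non-zero}."""
    return {
        name: sorted(shard for shard, value in shards.items() if value > 0)
        for name, shards in per_shard.items()
    }


metrics = parse_per_shard(SAMPLE)
totals = {name: sum(shards.values()) for name, shards in metrics.items()}
print(totals)
print(shards_over_zero(metrics))
```

With the sample above, each metric totals 1 and only shard 2 is reported as non-zero.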

Fixes #7353

Signed-off-by: Amnon Heiman <amnon@scylladb.com>

avikivity · 2020-10-07T10:09:51Z

These metrics are problematic. They don't count the number of large cells; they count the number of times an sstable writer encountered a large cell since the last restart. Neither the value nor its derivative has any meaning.

Can't you read the large data tables from grafana? I hear it can do anything these days.

amnonh · 2020-10-07T10:19:35Z

The problem this issue is solving is alerting: whenever the value is incremented, an additional row is added to the large_cells/large_rows table.
We can use that to send an alert telling the user they should look at it.
This is the same implementation the large_partition counter has.
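
To make the alerting idea concrete, here is a minimal Python sketch, assuming the monitor keeps the previous per-shard sample and compares it with the current one. The function names and the sample values are hypothetical. Because the counter restarts from zero when the node restarts, a drop in value is treated as a reset rather than as a negative increase.

```python
def counter_increase(previous, current):
    """Increase of a monotonically increasing counter between two samples.

    If the current value is lower than the previous one, the counter is
    assumed to have restarted from zero (for example after a node restart),
    so the whole current value is counted as new.
    """
    if current < previous:
        return current
    return current - previous


def shards_to_alert(previous, current):
    """Return the sorted shards whose counter grew between the two samples.

    Both arguments are {shard: counter_value} dictionaries. A shard missing
    from the previous sample is treated as having started at zero.
    """
    alerts = []
    for shard in sorted(current):
        if counter_increase(previous.get(shard, 0), current[shard]) > 0:
            alerts.append(shard)
    return alerts


# Hypothetical samples: shard 1 saw two new large rows between scrapes.
before = {0: 0, 1: 0, 2: 1}
after = {0: 0, 1: 2, 2: 1}
print(shards_to_alert(before, after))
```

An alert raised this way only says that a new large row or cell was written on that shard since the last sample; the details would still have to be looked up in the large_rows/large_cells tables.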

> Can't you read the large data tables from grafana? I hear it can do anything these days.

scylladb/scylla-monitoring#1070

avikivity · 2020-10-07T15:02:01Z

Good point. But let's explore other options; I don't want to bloat the metrics with single-use items.

Can we stream the logs (via rsyslog) to grafana? Then it can alert on any log events, not just large data warnings.

slivne · 2020-10-08T08:09:57Z

@avikivity if we change the metric name to large_parition_row_cell and increment it in every case, would that be acceptable? (In my view it should be.)

Please note that the system.large_* tables hold entries only as long as an sstable with such a large_* item exists. If the data was deleted and compacted, there will be no record that it ever existed (aside from logs and metrics).

I do not mind if we look for something that searches the logs. Enterprise users already have such systems (as we do in the cloud with logz.io), but we may not have full control of those aspects and would still want to catch such events.

tzach · 2020-10-08T08:43:17Z

@slivne logs can be streamed to more than one target (as can metrics).
We do need log collection (for example, scylladb/scylla-monitoring#617).

avikivity · 2020-10-08T09:30:38Z

Yes, rsyslog can send logs to multiple targets.

I think using logs has much greater potential. See the integration QA did.

slivne · 2020-10-11T07:32:48Z

@noamha / @AdamNuclear how do we integrate logz.io? Can you please reference what we do, to make sure we can have two endpoints?

mykaul · 2024-08-18T13:11:50Z

I believe we have these already?

mykaul closed this Aug 18, 2024
