large_data_handler: Add metrics for large cells and rows #7354
This patch adds counters and metrics for large cells and large rows.
Similar to large partitions, those counters are updated whenever a
large cell or large row is identified.
This allows monitoring big rows and cells.
After this patch, large rows will look like this in the monitor:
HELP scylla_database_large_row_exceeding_threshold Number of large rows exceeding compaction_large_row_warning_threshold_mb. Large rows have performance impact and should be avoided, check the documentation for details.
scylla_database_large_row_exceeding_threshold{shard="0"} 0
scylla_database_large_row_exceeding_threshold{shard="1"} 0
scylla_database_large_row_exceeding_threshold{shard="2"} 1
And large cells will look like this:
scylla_database_large_cell_exceeding_threshold{shard="0"} 0
scylla_database_large_cell_exceeding_threshold{shard="1"} 0
scylla_database_large_cell_exceeding_threshold{shard="2"} 1
Fixes scylladb#7353
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
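As a rough illustration of how a monitoring script might consume the per-shard samples quoted in the description, the sketch below sums each metric across shards. The sample text is copied from the description; the function name `total_per_metric` and the regular expression are assumptions made for this example and are not part of the patch.

```python
import re

# Sample lines copied from the PR description (per-shard counters).
SAMPLE = """\
scylla_database_large_row_exceeding_threshold{shard="0"} 0
scylla_database_large_row_exceeding_threshold{shard="1"} 0
scylla_database_large_row_exceeding_threshold{shard="2"} 1
scylla_database_large_cell_exceeding_threshold{shard="0"} 0
scylla_database_large_cell_exceeding_threshold{shard="1"} 0
scylla_database_large_cell_exceeding_threshold{shard="2"} 1
"""

# Assumption: every sample line carries exactly one label, shard="N",
# followed by an integer value, as in the output quoted above.
LINE_RE = re.compile(r'^(?P<name>\w+)\{shard="(?P<shard>\d+)"\}\s+(?P<value>\d+)$')


def total_per_metric(text):
    """Sum each metric over all shards; lines that do not match are skipped."""
    totals = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # e.g. HELP/TYPE lines or blank lines
        name = match["name"]
        totals[name] = totals.get(name, 0) + int(match["value"])
    return totals


if __name__ == "__main__":
    for name, total in total_per_metric(SAMPLE).items():
        print(name, total)
```

Summing across shards gives a node-level view; whether a node-level or per-shard view is more useful for alerting is left open here.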
These metrics are problematic. They don't count the number of large cells, they count the number of times an sstable writer encountered a large cell since the last restart. Neither the value nor its derivative has any meaning. Can't you read the large data tables from grafana? I hear it can do anything these days.
The problem this issue is solving is alerting: whenever the value is incremented, an additional line has been added to the large_cells/large_rows table.
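To make the alerting argument concrete, here is a minimal sketch of how a poller could turn two successive readings of such a counter into a "new large row/cell seen" signal. It assumes the counter starts again from zero after a node restart, which follows from the observation above that it counts events since the last restart; the function name `new_events` is hypothetical.

```python
def new_events(previous, current):
    """Return how many events occurred between two scrapes of a counter.

    Assumption: the counter only grows while the node is up and starts
    again from zero after a restart, so a drop means a reset happened.
    """
    if current < previous:
        # Counter was reset; everything counted so far is new.
        return current
    return current - previous


def should_alert(previous, current):
    """Alert whenever at least one new event was counted."""
    return new_events(previous, current) > 0


if __name__ == "__main__":
    print(should_alert(0, 1))  # a large row was written between scrapes
    print(should_alert(1, 1))  # nothing new
```

This only signals that something large was written; the details would still have to come from the large_cells/large_rows tables or the logs.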
Good point. But let's explore other options, I don't want to bloat the metrics with single-use items. Can we stream the logs (via rsyslog) to grafana? Then it can alert on any log events, not just large data warnings.
@avikivity if we change the metric name to large_partition_row_cell and increment it in every case, will that be acceptable? (In my view it should be.) Please note that the system.large_* tables hold entries only as long as an sstable with such large_* data exists; if the data was deleted and compacted, there will be no record that it ever existed (aside from logs / metrics). I do not mind if we look for something that searches the logs - enterprise users already have such systems (as we do in the cloud with logz.io) - but we may not have full control of those aspects and would still want to catch such events.
@slivne logs can be streamed to more than one target (as metrics)
Yes, rsyslog can send logs to multiple targets. I think using logs has much greater potential. See the integration QA did.
@noamha / @AdamNuclear how do we integrate logz.io? Can you please reference what we do, to make sure we can have two endpoints?
I believe we have these already?