large_data_handler: Add metrics for large cells and rows #7354

Closed
wants to merge 1 commit into from


amnonh
Contributor

@amnonh amnonh commented Oct 7, 2020

This patch adds counters and metrics for large cells and large rows.
Similar to large partitions, these counters are updated whenever a
large cell or large row is identified.

This allows monitoring big rows and cells.

After this patch, large rows will look like this in the monitor:
# HELP scylla_database_large_row_exceeding_threshold Number of large rows exceeding compaction_large_row_warning_threshold_mb. Large rows have performance impact and should be avoided, check the documentation for details.
scylla_database_large_row_exceeding_threshold{shard="0"} 0
scylla_database_large_row_exceeding_threshold{shard="1"} 0
scylla_database_large_row_exceeding_threshold{shard="2"} 1

And large cells will look like this:
scylla_database_large_cell_exceeding_threshold{shard="0"} 0
scylla_database_large_cell_exceeding_threshold{shard="1"} 0
scylla_database_large_cell_exceeding_threshold{shard="2"} 1
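For anyone consuming these samples outside of Prometheus, here is a minimal Python sketch of reading the per-shard lines shown above. The function name `parse_shard_samples` is hypothetical, and the sketch assumes every sample carries only a `shard` label, as in the output above; it is not part of this patch.

```python
import re

# Matches lines such as:
#   scylla_database_large_row_exceeding_threshold{shard="2"} 1
# Assumption: the only label on each sample is `shard`.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{shard="(?P<shard>\d+)"\}\s+(?P<value>\S+)$'
)


def parse_shard_samples(text):
    """Return {metric_name: {shard: value}} for the per-shard samples in text."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines such as "# HELP ...".
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # Not a per-shard sample line; ignore it.
        shards = result.setdefault(match.group("name"), {})
        shards[int(match.group("shard"))] = float(match.group("value"))
    return result
```

Summing the per-shard values of one metric gives a node-wide count, which is usually what a dashboard panel would show.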

Fixes #7353

Signed-off-by: Amnon Heiman <amnon@scylladb.com>

@avikivity
Member

These metrics are problematic. They don't count the number of large cells; they count the number of times an sstable writer encountered a large cell since the last restart. Neither the value nor its derivative has any meaning.

Can't you read the large data tables from grafana? I hear it can do anything these days.

@amnonh
Contributor Author

amnonh commented Oct 7, 2020

The problem this issue is solving is alerting: if the value is incremented, an additional line is added to the large_cells/large_rows table.
We can use that to send an alert telling the user they should look at that table.
This is the same implementation the large_partition counter has.

> Can't you read the large data tables from grafana? I hear it can do anything these days.

scylladb/scylla-monitoring#1070
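The alerting idea described in this comment (fire when a counter grows between two scrapes) can be sketched as follows. This is an illustration only: `new_large_data_events` is a hypothetical helper, not part of the patch or of scylla-monitoring, and it assumes that a value lower than the previous one means the node restarted and the counter started again from zero.

```python
def new_large_data_events(previous, current):
    """Return {shard: delta} for shards whose counter grew between two scrapes.

    `previous` and `current` map a shard number to the counter value seen in
    the earlier and the later scrape, respectively.
    """
    deltas = {}
    for shard, value in current.items():
        before = previous.get(shard, 0)
        if value >= before:
            delta = value - before
        else:
            # Assumption: a drop means the counter was reset by a restart,
            # so everything counted since the reset is treated as new.
            delta = value
        if delta > 0:
            deltas[shard] = delta
    return deltas


# An alert would fire for any non-empty result:
# new_large_data_events({0: 0, 1: 0, 2: 0}, {0: 0, 1: 0, 2: 1}) returns {2: 1}
```

In a real deployment this comparison would more likely be expressed as an alert rule in the monitoring stack than as custom code; the sketch only shows the logic the alert relies on.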

@avikivity
Member

Good point. But let's explore other options, I don't want to bloat the metrics with single-use items.

Can we stream the logs (via rsyslog) to grafana? Then it can alert on any log events, not just large data warnings.

@slivne
Contributor

slivne commented Oct 8, 2020

@avikivity if we change the metric name to large_parition_row_cell and increment it in every case, would that be acceptable? (In my view it should be.)

Please note that the system.large_* tables hold entries only as long as an sstable with such large_* data exists. If the data was deleted and compacted, there will be no record that it ever existed (aside from logs / metrics).

I do not mind if we look for something that searches the logs. Enterprise users already have such systems (as we do in the cloud with logz.io). We may not have full control of those aspects, and would still want to catch such events.

@tzach
Contributor

tzach commented Oct 8, 2020

@slivne logs can be streamed to more than one target (as metrics can).
We do need log collection (for example, scylladb/scylla-monitoring#617).

@avikivity
Member

Yes, rsyslog can send logs to multiple targets.

I think using logs has much greater potential. See the integration QA did.

@slivne
Contributor

slivne commented Oct 11, 2020

@noamha / @AdamNuclear how do we integrate logz.io? Can you please reference what we do, to make sure we can have two endpoints?

@avikivity avikivity force-pushed the next branch 5 times, most recently from 744fc19 to 9d91d38 Compare November 3, 2020 11:43
@avikivity avikivity force-pushed the next branch 2 times, most recently from 24ef2e4 to 32fd38f Compare January 14, 2021 12:56
@avikivity avikivity force-pushed the next branch 2 times, most recently from 2e049ab to c3216ae Compare February 24, 2022 12:52
@avikivity avikivity force-pushed the next branch 7 times, most recently from 8896be2 to d450a14 Compare October 30, 2023 12:24
@mykaul
Contributor

mykaul commented Aug 18, 2024

I believe we have these already?

@mykaul mykaul closed this Aug 18, 2024