Add storage related metrics #2044

amnonh · 2023-08-15T17:28:11Z

Add disk and sstable related metrics either to an existing dashboard or to a new dashboard

amnonh · 2023-08-15T17:40:30Z

@raphaelsc : scylla_sstables_currently_open_for_reading add to detailed

avikivity · 2023-08-16T12:02:47Z

sum(rate(scylla_sstables_index_page_misses[120s])) by (instance) / (sum(rate(scylla_sstables_single_partition_reads[120s])) by (instance)) -> read amplification due to promoted index reads. We may need to add range scans to the divisor, since they also generate index reads.
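To make the arithmetic in the expression above concrete, here is a minimal Python sketch of the same ratio. It is an illustration only: the function names and the input layout (per-shard first and last counter values for each instance over the window) are assumptions made for this example, and the plain difference over the window is a simplification of what Prometheus `rate()` does (no counter-reset handling or extrapolation). Range scans are not included in the divisor.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout: instance -> list of per-shard (first, last)
# counter values observed at the start and end of the window.
Samples = Dict[str, List[Tuple[float, float]]]


def summed_rate(samples: Samples, window_s: float) -> Dict[str, float]:
    """Approximate sum(rate(counter[window])) by (instance)."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return {
        instance: sum((last - first) / window_s for first, last in shards)
        for instance, shards in samples.items()
    }


def index_read_amplification(
    page_misses: Samples,
    partition_reads: Samples,
    window_s: float = 120.0,
) -> Dict[str, float]:
    """Index page misses per single-partition read, per instance."""
    miss_rate = summed_rate(page_misses, window_s)
    read_rate = summed_rate(partition_reads, window_s)
    result: Dict[str, float] = {}
    for instance, reads in read_rate.items():
        # Skip instances with no reads in the window instead of dividing by zero.
        if instance in miss_rate and reads > 0:
            result[instance] = miss_rate[instance] / reads
    return result


# Two shards on one node: 10 misses/s against 20 reads/s in total.
example = index_read_amplification(
    {"node1": [(0, 600), (100, 700)]},
    {"node1": [(0, 1200), (0, 1200)]},
)
print(example)  # → {'node1': 0.5}
```

A value of 0.5 here would mean one index page miss for every two single-partition reads on that instance.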

denesb · 2023-09-06T06:02:58Z

I would like to export the sstables_read and disk_reads stats from the reader concurrency semaphore. This would give us a per-scheduling group view on these metrics. The corresponding metrics are scylla_database_sstables_read and scylla_database_disk_reads.

denesb · 2023-09-06T06:04:11Z

> @raphaelsc : scylla_sstables_currently_open_for_reading add to detailed

#2044 (comment) would already give us this, on a per-scheduling group basis.

raphaelsc · 2023-09-08T21:45:33Z

> > @raphaelsc : scylla_sstables_currently_open_for_reading add to detailed
>
> #2044 (comment) would already give us this, on a per-scheduling group basis.

The name of the metric is a bit misleading. It's actually sstables_currently_available_for_reading (i.e. the total number of sstables in the system).

denesb · 2023-09-11T06:44:41Z

> > > @raphaelsc : scylla_sstables_currently_open_for_reading add to detailed
> >
> > #2044 (comment) would already give us this, on a per-scheduling group basis.
>
> The name of the metric is a bit misleading. It's actually sstables_currently_available_for_reading (i.e. the total number of sstables in the system).

I see, so it is something else then. Maybe both are valuable then, to see how many of the total sstables we need for each read.

raphaelsc · 2023-09-14T17:02:10Z

> > > > @raphaelsc : scylla_sstables_currently_open_for_reading add to detailed
> > >
> > > #2044 (comment) would already give us this, on a per-scheduling group basis.
> >
> > The name of the metric is a bit misleading. It's actually sstables_currently_available_for_reading (i.e. the total number of sstables in the system).
>
> I see, so it is something else then. Maybe both are valuable then, to see how many of the total sstables we need for each read.

Indeed. In my latest adventures, I have been using it a lot to correlate growth in non-LSA memory with the number of sstables (e.g. after a node op).

amnonh · 2023-11-16T11:24:23Z

scylla_database_sstables_read

I see that the class label is not the same as the scheduling_group_name label (tested with Scylla 2023.1.2). What is the relation between them (if any), and what does the user understand?

@denesb

amnonh · 2023-11-19T12:09:58Z

@michoecho can you please look at 2044#issuecomment-1680476322? I saw that you are the last one who touched the relevant code. I need a clear explanation of what the calculation should be, if possible with reasoning.

michoecho · 2023-11-20T15:09:21Z

> @michoecho can you please look at 2044#issuecomment-1680476322? I saw that you are the last one who touched the relevant code. I need a clear explanation of what the calculation should be, if possible with reasoning.

I can't give a clear explanation for the calculation without a clear understanding of what is being calculated.
@avikivity What exactly do you want to get out of #2044 (comment)?
(Whatever it is, it probably will need more server-side metrics if we want it to handle range reads properly).

amnonh · 2023-12-04T13:14:59Z

@michoecho @avikivity ping

amnonh · 2023-12-15T21:56:02Z

@michoecho @avikivity ping. I'm about to branch 4.6 and would like to add this to the release.

amnonh · 2024-02-08T09:27:29Z

@michoecho @avikivity ping

amnonh · 2024-02-13T06:01:02Z

@michoecho @avikivity ping

michoecho · 2024-02-13T10:24:33Z

What are you pinging me for?

amnonh · 2024-02-13T10:33:55Z

@michoecho pinging you and @avikivity

amnonh · 2024-03-07T11:35:21Z

@avikivity @denesb @raphaelsc @michoecho I will branch 4.7 soon and would like to have it in the release.

Can we make a decision on what to include?

denesb · 2024-03-11T12:11:11Z

> @avikivity @denesb @raphaelsc @michoecho I will branch 4.7 soon and would like to have it in the release.
>
> Can we make a decision on what to include?

I have nothing more, other than my existing comments: #2044 (comment)

As for the decision, I don't know what you mean. Which decision?

amnonh · 2024-03-11T14:23:24Z

@denesb I'm looking for a bottom line regarding what to add. That should be an actual metric/calculation; too many options are floating around with no concrete resolution.

denesb · 2024-03-11T14:30:57Z

I mentioned two metrics in the comment: scylla_database_sstables_read and scylla_database_disk_reads.

amnonh · 2024-03-11T14:40:32Z

And @raphaelsc had his thoughts about them. I'm looking for the bottom line after all the conversations end.

denesb · 2024-03-12T06:26:19Z

Right, he requested that sstables_currently_available_for_reading also be added, and I agree, it could be valuable.

amnonh · 2024-03-17T13:02:35Z

> I see that the class label is not the same as the scheduling_group_name label (tested with Scylla 2023.1.2). What is the relation between them (if any), and what does the user understand?

@denesb @raphaelsc

denesb · 2024-03-18T06:31:58Z

Class is one of user, system or maintenance. These refer to the context in which some work (query) is being done. User refers to all work done on behalf of a user (driver) request. Maintenance refers to all work done as background maintenance (think compaction, repair, streaming, etc.). System is everything else.

amnonh added the enhancement label Aug 15, 2023

amnonh added this to the Monitoring 4.6 milestone Nov 6, 2023

amnonh modified the milestones: Monitoring 4.6, Monitoring 4.7 Dec 27, 2023

amnonh mentioned this issue Mar 21, 2024

Storage metrics #2235

Merged

amnonh closed this as completed in #2235 Mar 21, 2024
