Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add storage related metrics #2044

Closed
amnonh opened this issue Aug 15, 2023 · 24 comments · Fixed by #2235
Closed

Add storage related metrics #2044

amnonh opened this issue Aug 15, 2023 · 24 comments · Fixed by #2235
Labels
enhancement New feature or request

Comments

@amnonh
Copy link
Collaborator

amnonh commented Aug 15, 2023

Add disk and sstable related metrics either to an existing dashboard or to a new dashboard

@amnonh amnonh added the enhancement New feature or request label Aug 15, 2023
@amnonh
Copy link
Collaborator Author

amnonh commented Aug 15, 2023

@raphaelsc : scylla_sstables_currently_open_for_reading add to detiled

@avikivity
Copy link
Member

sum(rate(scylla_sstables_index_page_misses[120s])) by (instance) / (sum(rate(scylla_sstables_single_partition_reads[120s])) by (instance)) -> read amplification due to promoted index reads. Maybe need to add range scans to the divisor since they also generate index reads.

@denesb
Copy link
Contributor

denesb commented Sep 6, 2023

I would like to export the sstables_read and disk_reads stats from the reader concurrency semaphore. This would give us a per-scheduling group view on these metrics. The corresponding metrics are scylla_database_sstables_read and scylla_database_disk_reads.

@denesb
Copy link
Contributor

denesb commented Sep 6, 2023

@raphaelsc : scylla_sstables_currently_open_for_reading add to detiled

#2044 (comment) would already give us this, on a per-scheduling group basis.

@raphaelsc
Copy link
Member

@raphaelsc : scylla_sstables_currently_open_for_reading add to detiled

#2044 (comment) would already give us this, on a per-scheduling group basis.

the name of the metric is a bit misleading. it's actually sstables_currently_available_for_reading (i.e total number of sstables in the system).

@denesb
Copy link
Contributor

denesb commented Sep 11, 2023

@raphaelsc : scylla_sstables_currently_open_for_reading add to detiled

#2044 (comment) would already give us this, on a per-scheduling group basis.

the name of the metric is a bit misleading. it's actually sstables_currently_available_for_reading (i.e total number of sstables in the system).

I see, so it is something else then. Maybe both are valuable then, see how much of the total sstables we need for each read.

@raphaelsc
Copy link
Member

@raphaelsc : scylla_sstables_currently_open_for_reading add to detiled

#2044 (comment) would already give us this, on a per-scheduling group basis.

the name of the metric is a bit misleading. it's actually sstables_currently_available_for_reading (i.e total number of sstables in the system).

I see, so it is something else then. Maybe both are valuable then, see how much of the total sstables we need for each read.

indeed. In my latest adventures, I have been using it a lot to correlate growth in non lsa with number of sstables (e.g. after a node op).

@amnonh amnonh added this to the Monitoring 4.6 milestone Nov 6, 2023
@amnonh
Copy link
Collaborator Author

amnonh commented Nov 16, 2023

scylla_database_sstables_read

I see that the class label is not the same as the scheduling_group_name label (tested with scylla 2023.1.2) what's the relation between them (if any) and what does the user understand?

@denesb

@amnonh
Copy link
Collaborator Author

amnonh commented Nov 19, 2023

@michoecho can you please look 2044#issuecomment-1680476322 I saw that you are the last one who touched the relevant code. I need a clear explenation on what the calculation should be, if possible with reasoning.

@michoecho
Copy link
Contributor

@michoecho can you please look 2044#issuecomment-1680476322 I saw that you are the last one who touched the relevant code. I need a clear explenation on what the calculation should be, if possible with reasoning.

I can't give a clear explanation for the calculation without a clear understanding of what is being calculated.
@avikivity What exactly do you want to get out of #2044 (comment)?
(Whatever it is, it probably will need more server-side metrics if we want it to handle range reads properly).

@amnonh
Copy link
Collaborator Author

amnonh commented Dec 4, 2023

@michoecho @avikivity ping

@amnonh
Copy link
Collaborator Author

amnonh commented Dec 15, 2023

@michoecho @avikivity ping I'm about to branch 4.6 and would like to add it to the release

@amnonh amnonh modified the milestones: Monitoring 4.6, Monitoring 4.7 Dec 27, 2023
@amnonh
Copy link
Collaborator Author

amnonh commented Feb 8, 2024

@michoecho @avikivity ping

1 similar comment
@amnonh
Copy link
Collaborator Author

amnonh commented Feb 13, 2024

@michoecho @avikivity ping

@michoecho
Copy link
Contributor

What are you pinging me for?

@amnonh
Copy link
Collaborator Author

amnonh commented Feb 13, 2024

@michoecho pining you and @avikivity

@amnonh
Copy link
Collaborator Author

amnonh commented Mar 7, 2024

@avikivity @denesb @raphaelsc @michoecho I will branch 4.7 soon and would like to have it in the release.

Can we make a decision on what to include?

@denesb
Copy link
Contributor

denesb commented Mar 11, 2024

@avikivity @denesb @raphaelsc @michoecho I will branch 4.7 soon and would like to have it in the release.

Can we make a decision on what to include?

I have nothing more, other than my existing comments: #2044 (comment)

As for the decision, I don't know what you mean, which decision?

@amnonh
Copy link
Collaborator Author

amnonh commented Mar 11, 2024

@denesb I'm looking for a bottom line regarding what to add. That should be an actual metric/calculation; too many options are floating around with no concrete resolution.

@denesb
Copy link
Contributor

denesb commented Mar 11, 2024

I mentioned two metrics in the comment: scylla_database_sstables_read and scylla_database_disk_reads.

@amnonh
Copy link
Collaborator Author

amnonh commented Mar 11, 2024

and @raphaelsc had his thoughts around them, I'm looking for the bottom line after all conversations end.

@denesb
Copy link
Contributor

denesb commented Mar 12, 2024

Right, he requested sstables_currently_available_for_reading to be also added, and I agree, it could be valuable.

@amnonh
Copy link
Collaborator Author

amnonh commented Mar 17, 2024

I see that the class label is not the same as the scheduling_group_name label (tested with scylla 2023.1.2) what's the relation between them (if any) and what does the user understand?

@denesb @raphaelsc

@denesb
Copy link
Contributor

denesb commented Mar 18, 2024

Class is one of user, system or maintenance. These refer to the context in which some work (query) is being done. User refers to all work done on behalf of a user (driver) request. Maintenance refers to all work done as a background, maintenance work (think compaction, repair, streaming etc.). System is everything else.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants