[RFC] Optimized Prefix Pattern for Shard-Level Files for Efficient Snapshots #15146
Thanks @ashking94 for the proposal.
Currently, the entire snapshot contents (data + metadata) are stored within the provided base path. With the new approach, this is no longer true (correct me if I am wrong). This may change the way users organise snapshots (for example, I might be running 5 clusters, each using the repository-s3 plugin for snapshots within the same bucket, separated by base path). To keep the existing behaviour consistent, does it make sense to introduce the proposed changes as a new type of snapshot?
That's right.
We will have a fixed path where we will upload the list of all the different paths under which a cluster's data is stored. Also, this will be disabled by default on a cluster so as not to break backward compatibility.
I am still debating in my head how to support the new mode with data that has already been uploaded under the fixed path. Apart from that, we would also need the capability to enable the new mode only for repository r1 but not for repository r2. Let me cover these in the PRs.
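A minimal sketch of what such a fixed-path manifest could look like (purely illustrative; the file name, format, and use of local files are assumptions, not taken from the PRs):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class PrefixManifest {
    public static void main(String[] args) throws IOException {
        // In a real repository these would be remote blob-store paths;
        // local files stand in for them here.
        List<String> hashedPrefixes = List.of(
                "a1b2c3/base-path/indices/Xy1234-z_x/0",
                "d4e5f6/base-path/indices/Xy1234-z_x/1");
        // The manifest itself lives at one fixed, well-known path per cluster,
        // so every spread-out data path stays discoverable.
        Path manifest = Path.of("cluster-paths-manifest.txt"); // assumed name
        Files.write(manifest, hashedPrefixes); // one prefix per line
        System.out.println(Files.readAllLines(manifest));
    }
}
```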
Thanks @ashking94 for the proposal. On similar lines to what Sachin mentioned: today, an admin-level user can grant path-level access to different users in a bucket, and those users can provide those paths as base paths and use them across multiple clusters. It looks like this feature would not work for those use cases; to use it, a user must have root-level access on the bucket.
Thanks for your comment, @harishbhakuni. The access would need to be provided to the cluster at the bucket level. Spreading data across more prefixes would allow the provider's autoscaling to work even better. We definitely need to build some mechanism to segregate access to the domain-level paths based on the base-path substring in the key path.
Problem statement
Snapshots are backups of a cluster's indexes and state, including cluster settings, node information, index metadata, and shard allocation information. They are used to recover from failures, such as a red cluster, or to move data between clusters without loss. Snapshots are stored in a repository in a hierarchical manner that mirrors the composition of shards, indexes, and the cluster. However, this structure poses a scaling challenge when there are numerous shards, because remote stores limit concurrent operations over a fixed prefix; Amazon S3, for instance, supports roughly 3,500 write requests per second per prefix, so thousands of shards writing under a single prefix can trigger throttling. In this RFC, we discuss various aspects of a solution that scales well with a high number of shards.
Current repository structure
Below is the current structure of the snapshot repository:
[Image: snapshot repository layout, from https://opensearch.org/blog/snapshot-operations/ — see the Appendix for a sample structure.]
The files created once per snapshot, or once per index per snapshot, are relatively immune to throttling because they are few in number and are uploaded by the active cluster manager using only five snapshot threads. These files include:

- `index-N` and `index.latest` at the repository root (repository generation metadata)
- `meta-<snapshot-uuid>.dat` at the repository root (global cluster metadata, once per snapshot)
- `snap-<snapshot-uuid>.dat` at the repository root (snapshot info, once per snapshot)
- `indices/<index-uuid>/meta-<snapshot-uuid>.dat` (index metadata, once per index per snapshot)
Files susceptible to throttling are created on data nodes, generally per primary shard per snapshot:

- `indices/<index-uuid>/<shard-id>/__<blob-uuid>` (segment data files)
- `indices/<index-uuid>/<shard-id>/snap-<snapshot-uuid>.dat` (shard-level snapshot metadata)
- `indices/<index-uuid>/<shard-id>/index-<generation>` (shard generation file listing the blobs)
The example above shows the layout for an index with UUID Xy1234-z_x; similar folders exist for each additional index.
Issue with current structure
The existing structure leads to throttling in clusters with a high shard count, resulting in longer snapshot creation and deletion times due to retries. In worst-case scenarios, this can lead to partial or failed snapshots.
Requirements
Proposed solution
Introduce a prefix pattern, supported by multiple repository providers (e.g., AWS S3, GCP Storage, Azure Blob Storage), that maximizes the spread of data across prefixes; the general recommendation from these providers is to spread data over as many prefixes as possible so their backends can scale. I propose to apply this prefix strategy to shard-level files, as shown in the sketch below. The same strategy has already been introduced in #12567 for remote store shard-level data and metadata files.
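As a rough illustration of the idea (a minimal sketch with hypothetical helper names and hash choice, not the implementation from #12567), a hashed prefix can be derived from the shard's identity and placed ahead of the base path, so shard-level writes fan out across many prefixes:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class HashedPrefixPath {
    // Hash the logical shard identity (index UUID + shard id) and encode the
    // hash in base 36 to obtain a short, well-distributed leading prefix.
    static String hashedPrefix(String indexUuid, int shardId) {
        CRC32 crc = new CRC32();
        crc.update((indexUuid + "/" + shardId).getBytes(StandardCharsets.UTF_8));
        return Long.toString(crc.getValue(), 36);
    }

    public static void main(String[] args) {
        String indexUuid = "Xy1234-z_x"; // example index UUID from this RFC
        int shardId = 0;
        // Current layout:  <base-path>/indices/<index-uuid>/<shard-id>/...
        // Sketched layout: <hash>/<base-path>/indices/<index-uuid>/<shard-id>/...
        String blobPath = hashedPrefix(indexUuid, shardId)
                + "/base-path/indices/" + indexUuid + "/" + shardId + "/snap-uuid.dat";
        System.out.println(blobPath);
    }
}
```

Because the hash depends only on stable shard identity, the same prefix can be recomputed deterministically at read time.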
Key changes in the proposed structure:
High level approach
Store the path type in the customData map within IndexMetadata, which is already persisted during snapshot creation; see the sketch below.
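A toy sketch of the idea (illustrative only; the key and attribute names below are assumptions, not the exact OpenSearch API):

```java
import java.util.Map;

public class PathTypeCustomData {
    static final String CUSTOM_KEY = "remote_store";  // assumed custom-data key
    static final String PATH_TYPE_KEY = "path_type";  // assumed attribute name

    public static void main(String[] args) {
        // At snapshot-creation time the resolved path type is recorded as a
        // small string map attached to the IndexMetadata; at restore time it
        // is read back so the restored shards resolve the same prefix layout
        // that was used for the upload.
        Map<String, String> customData = Map.of(PATH_TYPE_KEY, "HASHED_PREFIX");
        System.out.println(CUSTOM_KEY + " -> " + customData);
    }
}
```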
Cloud neutral solution
This approach is supported by multiple cloud providers:
Proposed repository structure
Appendix
Sample current repository structure