[Feature Request] Make StreamContext and WriteContext more extensible #17245

Closed
jed326 opened this issue Feb 4, 2025 · 7 comments
Labels: enhancement (Enhancement or improvement to existing feature or request), Storage:Remote, untriaged

jed326 (Collaborator) commented Feb 4, 2025

Is your feature request related to a problem? Please describe

I am working on a feature in the k-NN plugin (see: opensearch-project/k-NN#2465) for which I want to upload flat vectors to a remote repository using the asyncBlobUpload path, which requires both a WriteContext and a corresponding StreamContext.

Today asyncBlobUpload is used by the remote store to copy segment files to the remote store after a refresh. In my use case, however, I would like to write the flat vector files to remote object storage during flush or merge operations, so there is no existing file on disk yet; an existing on-disk file is one of the assumptions WriteContext makes.

Describe the solution you'd like

I would like to convert both WriteContext and StreamContext into interfaces and refactor the current concrete implementations into RemoteStoreWriteContext and RemoteStoreStreamContext so I can provide my own custom interface implementations in the k-NN plugin without the same assumptions made by the existing classes.
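
A rough sketch of the shape I have in mind (method names are illustrative only, loosely based on the attributes WriteContext carries today, not a final API):

// Illustrative sketch only -- not the actual OpenSearch API. WriteContext becomes an
// interface, the current behavior moves into a concrete RemoteStoreWriteContext, and
// the k-NN plugin supplies its own implementation without the on-disk-file assumptions.
public interface WriteContext {
    String getFileName();
    long getFileSize();
    boolean isFailIfAlreadyExists();
    WritePriority getWritePriority();
    StreamContext getStreamProvider(long partSize);
}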

Related component

Storage:Remote

Describe alternatives you've considered

Both WriteContext and StreamContext are already public classes today, so I can still extend them from the k-NN plugin. However, this isn't ideal: for example, I would still need to provide a fileName to the super() call of WriteContext. Additionally, these classes are annotated as @opensearch.internal today, which implies plugins probably should not be extending them this way.

Additional context

Other related design docs for the k-NN feature:

jed326 (Collaborator, Author) commented Feb 4, 2025

Tagging some folks who worked on #7000 for feedback: @raghuvanshraj @vikasvb90 @Bukhtawar @ashking94 @reta @gbbafna

vikasvb90 (Contributor) commented Feb 4, 2025

> there is no existing file on disk yet; an existing on-disk file is one of the assumptions WriteContext makes

@jed326 This statement is incorrect. There's a StreamContextSupplier, which is a supplier that creates streams based on the provided part sizes. Users are free to use it for a single part or multiple parts. Also, since these interfaces are stream-based, it doesn't matter whether the streams are backed by a file or by an in-memory buffer.
As for why there are file-specific attributes: file name, size, and part sizes are all attributes of a remote store and are mandatory before the upload starts. The file size is needed upfront to allocate resources for the upcoming upload; without it, the entire content may be loaded into memory before the upload even begins. I suggest going through the existing remote store constraints before designing your uploads.
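
For reference, the supplier extension point has roughly this shape (paraphrased; check the actual StreamContextSupplier interface in OpenSearch core for the exact package and signature):

@FunctionalInterface
public interface StreamContextSupplier {
    // Given the part size chosen by the repository implementation, return the
    // StreamContext that knows how to open a stream for each part of the upload.
    StreamContext supplyStreamContext(long partSize);
}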

jed326 (Collaborator, Author) commented Feb 4, 2025

Thanks @vikasvb90! I somehow missed that StreamContextSupplier is defined as a functional interface so I can use that extension point instead of rolling my own StreamContext.

> The file size is needed upfront to allocate resources for the upcoming upload; without it, the entire content may be loaded into memory before the upload even begins.

The rest I am already handling: like you said, instead of backing the stream by a file I am backing it with a memory buffer to limit the memory consumption (for example).
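
Roughly what I have in mind, as a sketch only (a byte[] buffer stands in for the real DocIdSetIterator-backed stream, the helper class is hypothetical, and the StreamContext / InputStreamContainer constructor shapes and package names are my reading of the current code, so they may need adjusting):

import java.io.ByteArrayInputStream;

import org.opensearch.common.StreamContext;
import org.opensearch.common.blobstore.stream.write.StreamContextSupplier;
import org.opensearch.common.io.InputStreamContainer;

// Hypothetical helper, not existing k-NN or OpenSearch code.
public final class InMemoryVectorStreamSupplier {

    // Returns a supplier whose parts are slices of an in-memory buffer, so no file
    // needs to exist on disk before the upload starts.
    public static StreamContextSupplier of(byte[] buffer) {
        return partSize -> {
            long fileSize = buffer.length;
            int numberOfParts = (int) ((fileSize + partSize - 1) / partSize);
            long lastPartSize = fileSize - (long) (numberOfParts - 1) * partSize;
            return new StreamContext(
                // (part number, part size, offset) -> stream over that slice of the buffer
                (partNumber, size, position) -> new InputStreamContainer(
                    new ByteArrayInputStream(buffer, Math.toIntExact(position), Math.toIntExact(size)),
                    size,
                    position),
                partSize,
                lastPartSize,
                numberOfParts);
        };
    }
}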

To that point, though, today the part size in question is calculated by the specific vendor implementation:

/**
 * Calculates the optimal part size of each part request if the upload operation is carried out as multipart upload.
 */
public long calculateOptimalPartSize(long contentLengthOfSource, WritePriority writePriority, boolean uploadRetryEnabled) {
    if (contentLengthOfSource < ByteSizeUnit.MB.toBytes(5)) {
        return contentLengthOfSource;
    }
    if (uploadRetryEnabled && (writePriority == WritePriority.HIGH || writePriority == WritePriority.URGENT)) {
        return new ByteSizeValue(5, ByteSizeUnit.MB).getBytes();
    }
    double optimalPartSize = contentLengthOfSource / (double) MAX_UPLOAD_PARTS;
    optimalPartSize = Math.ceil(optimalPartSize);
    return (long) Math.max(optimalPartSize, minimumPartSize);
}

Are there any other ways to control this part size? Since our InputStream is ultimately backed by a DocIdSetIterator, it may not be as performant to create 10k parts, as that would require 10k iterators, although that is something we will only know concretely after more benchmarking.
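
For concreteness, plugging a 40 GB upload into the formula above (assuming MAX_UPLOAD_PARTS = 10,000 and a 5 MB minimum part size, which is what the snippet suggests for the S3 plugin):

long contentLength = 40L * 1024 * 1024 * 1024;                     // ~40 GB of flat vectors
long minimumPartSize = 5L * 1024 * 1024;                           // assumed 5 MB minimum
long optimalPartSize = (long) Math.ceil(contentLength / 10_000d);  // ~4.3 MB per part
long partSize = Math.max(optimalPartSize, minimumPartSize);        // clamped up to 5 MB
long numberOfParts = (contentLength + partSize - 1) / partSize;    // 8,192 parts

So a file of that size would end up at the 5 MB floor with roughly 8K parts.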

vikasvb90 (Contributor) commented

Talking specifically about the S3 plugin:

  1. If 10K parts are created, that means the file size is 10K * 5 MB (the minimum part size) = ~50 GB. Are you expecting to upload a 50 GB file? With the default merge policy, 5 GB is the max size of a segment file.
  2. Even if you end up with a very large segment file, say as a result of a force merge, not all parts are created at once. They are blocked if there isn't sufficient capacity to accommodate the uploads. Please check this code pointer.

vikasvb90 (Contributor) commented

To add, there is a priority order defined to distinguish between uploads of remote cluster state (URGENT), remote translog (HIGH), remote segments (NORMAL), and large files (>15 GB) plus background jobs (LOW). A semaphore is acquired for NORMAL and LOW priority transfers. I believe your use case falls into NORMAL priority, so you should be good to create any number of parts.

jed326 (Collaborator, Author) commented Feb 4, 2025

> Are you expecting to upload a 50 GB file?

As you said, this would only happen in the force merge case, and we wouldn't expect it to be a common occurrence. Rough math: a 1k-dimension fp32 vector takes 4 KB to store, so with 10M documents the vectors would take up roughly 40 GB.
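
Spelled out (sizes assumed for illustration, just to show where the 40 GB comes from):

long bytesPerVector = 1_000L * Float.BYTES;       // 1k dims * 4 bytes (fp32) = 4,000 bytes per vector
long totalBytes = 10_000_000L * bytesPerVector;   // 10M docs -> 40,000,000,000 bytes, i.e. ~40 GB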

Thanks for the pointers on the semaphore and the priority transfers as well; we will keep those in mind in our perf tuning and analysis.

jed326 (Collaborator, Author) commented Feb 4, 2025

I think my questions are answered here, so I'm closing this issue. Thanks again @vikasvb90!

jed326 closed this as completed on Feb 4, 2025
github-project-automation bot moved this from 🆕 New to ✅ Done in the Storage Project Board on Feb 4, 2025