Starting a multi-input-stream large file upload, and adding the input streams later one-by-one. #96

Open
bemehiser opened this issue Dec 4, 2019 · 8 comments

Comments

@bemehiser

bemehiser commented Dec 4, 2019

I'm trying to

  1. start a large file upload, then
  2. add parts later as I receive them.

I get the parts as input streams, and one would assume from the REST API that I should be able to upload them one by one. However, it looks like I either need to have all input streams when I start the upload, or copy the data from each input stream to another input stream which the upload is using, and let it manage the parts.

Is it possible to use generic part upload functionality like the REST API b2_upload_part with each input stream as I acquire it, or can that functionality be added or exposed?
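For reference, a rough sketch of what a single `b2_upload_part` request needs at the REST level. This is not SDK code: `UploadPartHeaders` and `forPart` are hypothetical names, and the upload URL and authorization token are assumed to come from an earlier `b2_get_upload_part_url` call. The sketch only builds the request headers for one part held in memory.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the B2 SDK.
final class UploadPartHeaders {

    /** Hex-encoded SHA-1 of the part body, the format used by X-Bz-Content-Sha1. */
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Headers for one b2_upload_part POST; the body is sent separately. */
    static Map<String, String> forPart(String uploadAuthToken, int partNumber, byte[] body) {
        if (partNumber < 1 || partNumber > 10000) {
            throw new IllegalArgumentException("part number must be between 1 and 10000");
        }
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Authorization", uploadAuthToken);
        headers.put("X-Bz-Part-Number", Integer.toString(partNumber));
        headers.put("Content-Length", Integer.toString(body.length));
        headers.put("X-Bz-Content-Sha1", sha1Hex(body));
        return headers;
    }
}
```

Each part is independent at this level, which is why uploading parts one by one as they arrive looks natural from the REST side.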

I'm currently using revision 3dfc97 (2019-10-08)

@certainmagic
Contributor

Hi Bruce --

Great question. As you point out, there's definitely an API for uploading parts and we definitely have code in the SDK that does that. It could be exposed. Before we dive into that, we should talk more about your use case to make sure that whatever we decide upon will actually help you.

Because the SDK may need to retry uploads, our ContentSource objects must be able to create an InputStream on demand that starts at the beginning of your content. When you say you already have an input stream, are you able to create it on demand, or is someone posting it to you outside your control?
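To illustrate why retries need a stream that can be recreated, here is a minimal sketch of a retry loop. It is not the SDK's implementation: `RetryingUpload`, `Uploader`, and `uploadWithRetry` are hypothetical names, and the `Supplier<InputStream>` stands in for whatever creates a fresh stream positioned at the start of the content.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Supplier;

// Hypothetical sketch, not the SDK's retry logic.
final class RetryingUpload {

    /** Stand-in for the code that actually sends the bytes somewhere. */
    interface Uploader {
        void upload(InputStream in) throws IOException;
    }

    /**
     * Tries the upload up to maxAttempts times. Every attempt asks the
     * supplier for a new stream, because a failed attempt may already have
     * consumed part of the previous one.
     *
     * @return the attempt number that succeeded
     */
    static int uploadWithRetry(Supplier<InputStream> source, Uploader uploader, int maxAttempts)
            throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try (InputStream in = source.get()) {
                uploader.upload(in);
                return attempt;
            } catch (IOException e) {
                last = e; // remember the failure and try again with a fresh stream
            }
        }
        throw last != null ? last : new IOException("maxAttempts must be at least 1");
    }
}
```

If the caller only holds a single one-shot stream, the second attempt would resume mid-content, which is the situation the question above is probing.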

thanks,
ab

@bemehiser
Author

Thanks for the prompt response!

The part input stream source is mostly out of my control. It comes from a different library that supports generic cloud storage implementations, and that library assumes we manage the parts and the retry logic ourselves.

I appreciate the ease of use which the wrapper allows, but I'd also like to be able to access the low level implementation, maybe as a "not recommended, but here you go anyway - use this like you would the REST API" type of thing.
I'd be fine implementing retry myself.

Many thanks!

@certainmagic
Contributor

certainmagic commented Dec 4, 2019 via email

@bemehiser
Author

bemehiser commented Dec 10, 2019

@certainmagic, I got the functionality I wanted to work by

  • adding a method to the B2StorageClient which allows me to upload a single part.
  • duplicating and modifying the B2LargeFileStorer to B2LargeFilePartStorer, which allows me to manage the part size instead of breaking an input stream into multiple parts.

I can mark the input stream for a given part, and reset it if the B2ContentSource requests a new input stream, so letting the client manage the retry for any given part is fine.
I also tried wrapping all the incoming part input streams in a single input stream that the b2 client's large file upload could read to upload all parts. That didn't work: finished part input streams weren't released, so I couldn't reclaim memory (the application stores incoming parts in memory), and very large files couldn't be uploaded.
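The mark-and-reset idea for a single part can be sketched roughly as follows. This is an illustration only, not the code in the linked commit: `ResettablePartSource` is a hypothetical class, and it assumes the part length is known up front and small enough to buffer in memory.

```java
import java.io.BufferedInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of handing out the same part again after a failed attempt.
final class ResettablePartSource {
    private final InputStream in;

    ResettablePartSource(InputStream raw, int partLength) {
        // One byte of slack keeps the mark valid when a reader probes for EOF
        // after consuming exactly partLength bytes.
        int limit = partLength + 1;
        this.in = raw.markSupported() ? raw : new BufferedInputStream(raw, limit);
        this.in.mark(limit);
    }

    /** Rewinds to the start of the part and returns a view of it. */
    InputStream createInputStream() throws IOException {
        in.reset();
        // Closing the returned view must not close the underlying stream,
        // otherwise a later retry could not reset it.
        return new FilterInputStream(in) {
            @Override
            public void close() {
                // intentionally a no-op; call release() when the part is done
            }
        };
    }

    /** Closes the underlying stream once the part has been uploaded. */
    void release() throws IOException {
        in.close();
    }
}
```

Calling `release()` after a part succeeds is what lets its memory be reclaimed before the next part arrives.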

I'm sure it's not the prettiest way, but it seems to work fine for my use case, and it would be nice functionality to have in the SDK (assuming it doesn't exist already, and I've managed to miss it.)

bemehiser@22be8f6

@certainmagic
Contributor

certainmagic commented Dec 10, 2019 via email

@certainmagic
Contributor

Hi Bruce --

Thanks for making that version.

I'd like to be able to figure out the right way to add this feature and my first stab or two got a little more complicated than I was expecting.

Now that you have your version, are you unblocked? Or are you stuck waiting for official support for this feature?

thanks,
ab

@bemehiser
Author

@certainmagic,

I'm unblocked.
I'd ideally like this feature to be officially supported, but official support is not required at the moment.

Many thanks.

@certainmagic
Contributor

hi --

btw, i haven't forgotten about this. i took a few stabs at making a clean version in mid-December, but haven't gotten anything good yet. it could be a while before i get back to it.

ttfn,
ab
