feat: chunk stream upload endpoint #2230
Conversation
why not support stream for files first?
I actually implemented this as part of something else I was working on. I was using the If we are okay with the approach I can implement this for files as well.
@zelig streaming of files can be done today on the
Hey, yeah, to have a possibility to stream chunks upload seems like a crucial thing to have the more we will go into the direction of "lean Bee", so I definitely welcome this idea!
@acud I believe what you are mentioning are JS native APIs that are used for streaming in JS, but they do not directly affect the transport protocol. I see two ways here. As @acud mentioned there is already support for

From this point of view actually, @aloknerurkar's approach is kind of innovative, and even though non-standard it would bypass this browser limitation, which would allow us in the future to do for example uploading of big files from the browser using only chunks (e.g. chunking, BMT construction and/or manifests all on the client side). So I am actually in favor of this approach!
I don't completely understand this statement, or it is just not true. Therefore

is not connected to the transport protocol and its handshakes/headers question, like in the case of the stream and fetch APIs that you pointed out correctly. Nevertheless, the websocket implementation seems a good approach too, because it may save some unnecessary header data between messages, but the interpretation of the response messages about the successful chunk uploads may be cumbersome on the client side. Because of the latter, I don't see why it would be better than a traditional
Maybe the
This resonates with why I implemented this in the first place. I am using bee manifests (in golang) outside of bee and hence when I interact with bee, I can already format the data I want to upload into a series of chunks.
Putting chunks into FileList seems hacky and deviating from the standard.
The example code in JS (websockets) that I have seen has callbacks for messages coming from upstream, so I don't understand why it would be complicated. The approach I have taken is a request-response scheme, similar to what we do in libp2p protocols: each response is read before we send the next request.
@nugaon yes, you are right, the browser can handle Streams, but not completely. As you have guessed, the problem is that it will load all the data into memory, which is the main problem here.
Downloading a stream is not the problem here. That is already supported; only the upload is problematic. Or do you mean that they can upload big files? My understanding is that the limitation I am talking about here only applies to XHR/AJAX/fetch requests. I think that when submitting a normal form with large files, the browser handles it internally differently and actually streams the file. To my knowledge, only Chrome recently shipped this feature, but only as a closed experiment (you have to register somewhere for a given URL and only then Chrome actually enables it for your site). See for example here, here and the open issue here.
Yes, with multipart you could upload multiple chunks, but you would have to load them into memory, which is the problem I am trying to raise here :-) With the

I guess the question is what the end-game is here. As I see it, from the direction of "lean Bee", a lot of operations with data will in the future happen on the client side, and it is then crucial for the client side to be able to work with data efficiently, which requires for example not loading all the data into memory. To my knowledge, from the JS API point of view there is already support for everything we need (like having streams, having a way to stream a file from the FS) except streaming the data to the Bee node. I know that the focus of Swarm/Bee now is not long-term file storage for big data and that we are not promoting uploading big files, but I would argue that if/once we make this shift the tools should be ready for it. Or in other words, the tools/libraries should not be the problem for a task that is already possible but discouraged.
38cf3cc to 166086d
Looks good, I have a few minor comments
Reviewed 7 of 7 files at r2.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @aloknerurkar)
pkg/api/chunk_stream.go, line 1 at r2 (raw file):
package api
copyright missing
pkg/api/chunk_stream.go, line 164 at r2 (raw file):
if tag != nil {
    // indicate that the chunk is stored
    err = tag.Inc(tags.StateStored)
if the chunk has been seen before, no need to increment this again
pkg/api/chunk_stream.go, line 166 at r2 (raw file):
err = tag.Inc(tags.StateStored)
if err != nil {
    s.logger.Debugf("chunk upload: increment tag", err)
i would also add an Error here, as we do in the other endpoints. This applies also to the other error checks in this file
pkg/api/chunk_stream_test.go, line 1 at r2 (raw file):
package api_test
code copyright missing
Dismissed @AuHau and @acud from 6 discussions.
Reviewable status: 5 of 8 files reviewed, all discussions resolved (waiting on @acud)
pkg/api/chunk_stream.go, line 1 at r2 (raw file):
Previously, acud (acud) wrote…
copyright missing
Done.
pkg/api/chunk_stream.go, line 164 at r2 (raw file):
Previously, acud (acud) wrote…
if the chunk has been seen before, no need to increment this again
Done.
pkg/api/chunk_stream.go, line 166 at r2 (raw file):
Previously, acud (acud) wrote…
i would also add an Error here, as we do in the other endpoints. This applies also to the other error checks in this file
Done.
pkg/api/chunk_stream_test.go, line 1 at r2 (raw file):
Previously, acud (acud) wrote…
code copyright missing
Done.
pkg/api/router.go, line 66 at r1 (raw file):
Previously, AuHau (Adam Uhlíř) wrote…
I would maybe suggest a /chunks/stream URL?
Done.
pkg/api/router.go, line 67 at r1 (raw file):
Previously, AuHau (Adam Uhlíř) wrote…
Why is it supposed to be disabled on the gateway?
Done.
Reviewed 2 of 3 files at r3.
Reviewable status: 7 of 8 files reviewed, 2 unresolved discussions (waiting on @acud and @aloknerurkar)
pkg/api/chunk_stream.go, line 185 at r3 (raw file):
if err := s.pinning.CreatePin(ctx, chunk.Address(), false); err != nil {
    s.logger.Debugf("chunk stream handler: creation of pin for %q failed: %v", chunk.Address(), err)
    s.logger.Error("chunk stream handler: creation of pin failed")
for consistency: if this happens but the pinning on the localstore really did happen correctly, you would now need to unpin the chunk again from the localstore
pkg/api/chunk_stream.go, line 191 at r3 (raw file):
}
err = sendMsg(websocket.TextMessage, successWsMsg)
nit, but since the API is handling binary data, a string success message might not be needed. you could utilize some response code which is purely binary too, like 0x0 for clean exit (like exit codes), or 0x1 for error, etc
Reviewable status: 7 of 8 files reviewed, 2 unresolved discussions (waiting on @acud)
pkg/api/chunk_stream.go, line 185 at r3 (raw file):
Previously, acud (acud) wrote…
for consistency consideration if this happens and if the pinning on localstore really did happen correctly, you would need to now unpin the chunk again from the localstore
In this case, pinning has failed, right? Is there a way we could get an error from CreatePin even if pinning succeeded? Or you meant if we fail to send the response?
pkg/api/chunk_stream.go, line 191 at r3 (raw file):
Previously, acud (acud) wrote…
nit, but since the API is handling binary data, a string success message might not be needed. you could utilize some response code which is purely binary too, like 0x0 for clean exit (like exit codes), or 0x1 for error, etc
For the error, we use sendErrorClose which sends a code and a message. So we only need this message for confirmation of chunk upload. I have used a 0 byte now instead of the string message.
Dismissed @acud from a discussion.
Reviewable status: 4 of 8 files reviewed, all discussions resolved (waiting on @acud)
pkg/api/chunk_stream.go, line 185 at r3 (raw file):
Previously, aloknerurkar wrote…
In this case, pinning has failed, right? Is there a way we could get an error from CreatePin even if pinning succeeded? Or you meant if we fail to send the response?
Done.
I think a global WS endpoint with RPC-like messages would be more effective for all ws services, as we will need this streaming on the download part as well later. Also, I don't see why requests and responses need to be blocked in a queue instead of simply uploading chunks independently of each other and handling the responses by their IDs.
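The ID-based correlation suggested above amounts to wrapping each message in an envelope carrying a request id, so acks can arrive out of order and still be matched to their chunk. A hypothetical framing sketch (not an actual Bee protocol):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// envelope is a hypothetical RPC-style frame: the caller-chosen ID lets
// responses arrive in any order and still be matched to their request,
// instead of blocking the stream on every ack.
type envelope struct {
	ID      uint32
	Payload []byte
}

// encode lays the frame out as [id:4][payload-len:4][payload].
func encode(e envelope) []byte {
	buf := make([]byte, 8+len(e.Payload))
	binary.BigEndian.PutUint32(buf[0:4], e.ID)
	binary.BigEndian.PutUint32(buf[4:8], uint32(len(e.Payload)))
	copy(buf[8:], e.Payload)
	return buf
}

func decode(buf []byte) envelope {
	n := binary.BigEndian.Uint32(buf[4:8])
	return envelope{ID: binary.BigEndian.Uint32(buf[0:4]), Payload: buf[8 : 8+n]}
}

func main() {
	// two uploads in flight; acks come back out of order
	pending := map[uint32]string{1: "chunk-a", 2: "chunk-b"}
	acks := [][]byte{
		encode(envelope{ID: 2, Payload: []byte{0}}), // 0 = stored
		encode(envelope{ID: 1, Payload: []byte{0}}),
	}
	for _, raw := range acks {
		e := decode(raw)
		fmt.Println("acked:", pending[e.ID])
		delete(pending, e.ID)
	}
}
```

The trade-off versus the strict request-response scheme in the PR is pipelining (many chunks in flight) against a more complex client that must track pending IDs.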
Reviewed 3 of 3 files at r4, 3 of 3 files at r6.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @aloknerurkar)
pkg/api/chunk.go, line 156 at r6 (raw file):
err = s.storer.Set(ctx, storage.ModeSetUnpin, chunk.Address())
if err != nil {
    s.logger.Debugf("chunk upload: deletion of pin for %q failed: %v", chunk.Address(), err)
nit, not sure if %q will pick up the Address's String method
Reviewed 4 of 7 files at r2, 2 of 3 files at r4, 1 of 3 files at r6, 2 of 2 files at r7.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @aloknerurkar)
pkg/api/chunk.go, line 60 at r7 (raw file):
putter, err = newStamperPutter(s.storer, s.post, s.signer, batch)
if err != nil {
    s.logger.Debugf("chunk upload: putter:%v", err)
Missing space after the : in the debug message.
pkg/api/chunk.go, line 68 at r7 (raw file):
    return nil, nil, nil, errors.New("batch not usable")
}
return nil, nil, nil, errors.New("")
The returned error errors.New("") doesn't tell much. Don't you wanna return err instead?
pkg/api/chunk_stream.go, line 106 at r7 (raw file):
for {
    select {
You should also check for ctx.Done().
pkg/api/pss.go, line 26 at r7 (raw file):
)

var (
You can change the variables into constants.
pkg/api/chunk_stream.go, line 106 at r7 (raw file):
Previously, mrekucci wrote…
This is not required, as the gorilla websocket library ignores the http request context. The request context can be canceled prematurely. Ideally we should use a new context here, but none of the function calls which need the context actually check on
Dismissed @mrekucci from a discussion.
Reviewable status: 7 of 9 files reviewed, all discussions resolved (waiting on @acud and @mrekucci)
pkg/api/chunk.go, line 68 at r7 (raw file):
Previously, mrekucci wrote…
The returned error errors.New("") doesn't tell much. Don't you wanna return err instead?
Done.
Reviewed 2 of 2 files at r8.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @aloknerurkar)
Currently in the /chunks API we only support single chunk upload. Applications using the /chunks endpoint need to loop while sending multiple chunks. This can cause the server to hit issues due to too many connections (or, on the client side while re-using connections, we might see an EOF).

As we already use the gorilla/websocket library, I think it would be better to have a client-side streaming endpoint to upload multiple chunks. This will be easier on the server side and would also re-use all the resources initialised for the first chunk upload.