
feat: chunk stream upload endpoint #2230

Merged
merged 16 commits into from
Aug 3, 2021

Conversation

aloknerurkar
Contributor

@aloknerurkar aloknerurkar commented Jun 28, 2021

Currently, the /chunks API only supports single-chunk uploads.

Applications using the /chunks endpoint need to loop while sending multiple chunks. This can cause the server to hit issues due to too many connections (or, on the client side, we might see an EOF while re-using connections).

As we already use the gorilla/websocket library, I think it would be better to have a client-side streaming endpoint to upload multiple chunks. This would be easier on the server side and would re-use all the resources initialised for the first chunk upload.



@aloknerurkar aloknerurkar marked this pull request as draft June 28, 2021 09:34
@zelig
Member

zelig commented Jun 28, 2021

Why not support streaming for files first?
In other words, why not use a similar approach for the file endpoint?

@aloknerurkar
Contributor Author

Why not support streaming for files first?
In other words, why not use a similar approach for the file endpoint?

I actually implemented this as part of something else I was working on. I was using the /chunks endpoint, as I am using all our other packages in the code. (https://github.com/aloknerurkar/bee-fs)

If we are okay with the approach I can implement this for files as well.

@acud
Member

acud commented Jul 8, 2021

Why not support streaming for files first?
In other words, why not use a similar approach for the file endpoint?

@zelig streaming of files can be done today on the /bzz endpoint using multipart, AFAIK. This is indeed not the most up-to-date method of streaming; that would be using the Streams API in conjunction with the Fetch API.
@alok I would consider altering the implementation so that it is compatible with the Streams and Fetch APIs, since it is currently implemented in a non-standardized manner.
It would also be good to have the people from the JS team comment on which way to go, since it would be mainly leveraged and implemented by them as a primary stakeholder, at least at the beginning. cc @agazso @AuHau @nugaon

@AuHau
Contributor

AuHau commented Jul 9, 2021

Hey,

yeah, having the possibility to stream chunk uploads seems like a crucial thing the further we go in the direction of "lean Bee", so I definitely welcome this idea!

That would be using the streams API in conjunction with the Fetch API.

@acud I believe what you are mentioning are JS-native APIs that are used for streaming in JS but do not directly affect the transport protocol. Streams provide an abstraction for handling streaming data, and fetch is a tool for making HTTP requests. So here we still have to choose the transport protocol and its usage.

I see two ways here.

As @acud mentioned, there is already support for /bzz using multipart. We could utilize the same principle here. But this has disadvantages for us in browsers, because browsers do not support streaming requests.

From this point of view, @aloknerurkar's approach is actually kind of innovative, and even though it is non-standard it would bypass this browser limitation, which would allow us in the future to, for example, upload big files from the browser using only chunks (e.g. chunking, BMT construction and/or manifests all on the client side). So I am actually in favor of this approach!

@nugaon
Member

nugaon commented Jul 9, 2021

As @acud mentioned, there is already support for /bzz using multipart. We could utilize the same principle here. But this has disadvantages for us in browsers, because browsers do not support streaming requests.

I don't completely understand this statement, or it is just not true. multipart/form-data does have browser support, even for HTML forms.
Also, on the browser side with JS you can stream, but AFAIK the data will be copied into RAM before uploading via any stream solution (see FileReader usages). [Though file stream reading should somehow be possible, as YouTube also solved it somehow ;) if you know how, pls let me know.]

Therefore

From this point of view, @aloknerurkar's approach is actually kind of innovative, and even though it is non-standard it would bypass this browser limitation, which would allow us in the future to, for example, upload big files from the browser using only chunks (e.g. chunking, BMT construction and/or manifests all on the client side). So I am actually in favor of this approach!

is not connected to the transport protocol and its handshakes/headers question, like in the case of the stream and fetch APIs that you pointed out correctly.
I mean that with multipart you can also upload as many chunks as you want in one upload process, by putting the chunks as files into the FileList and sending it to the Bee client, but it still reads the whole file(s) into memory first, so I don't see how it would solve the "browser issue". Can you clarify this, @AuHau?

Nevertheless, the websocket implementation seems like a good approach too, because it may save some unnecessary header data between messages; but interpreting the response messages about successful chunk uploads may be cumbersome on the client side. Because of the latter, I don't see why it would be better than traditional multipart uploading with the extension I just described.

@nugaon
Member

nugaon commented Jul 9, 2021

[Though file stream reading should somehow be possible, as YouTube also solved it somehow ;) if you know how, pls let me know.]

Maybe the File's stream or slice method? https://developer.mozilla.org/en-US/docs/Web/API/File#instance_methods

@aloknerurkar
Contributor Author

aloknerurkar commented Jul 9, 2021

From this point of view, @aloknerurkar's approach is actually kind of innovative, and even though it is non-standard it would bypass this browser limitation, which would allow us in the future to, for example, upload big files from the browser using only chunks (e.g. chunking, BMT construction and/or manifests all on the client side). So I am actually in favor of this approach!

This resonates with why I implemented this in the first place. I am using bee manifests (in golang) outside of bee and hence when I interact with bee, I can already format the data I want to upload into a series of chunks.

Because of the latter I don't see why would it be better than a traditional multipart uploading with the extension that I just described.

Putting chunks into a FileList seems hacky and deviates from the standard.

but the interpretation of the response messages about the successful chunk uploads may be cumbersome on client side

The example JS (websockets) code I have seen has callbacks for messages coming from upstream, so I don't understand why it would be complicated. The approach I have taken is a request-response scheme, similar to how we do it in libp2p protocols: each response is read before we send the next request.

@AuHau
Contributor

AuHau commented Jul 15, 2021

Also, on the browser side with JS you can stream, but AFAIK the data will be copied into RAM before uploading via any stream solution (see FileReader usages). [Though file stream reading should somehow be possible, as YouTube also solved it somehow ;) if you know how, pls let me know.]

@nugaon yes, you are right, browsers can handle Streams, but not completely. As you have guessed, the problem is that it will load all the data into memory, which is the main problem here.

[Though file stream reading should somehow be possible, as YouTube also solved it somehow ;) if you know how, pls let me know.]

Downloading a stream is not the problem here; that is already supported, only the upload is problematic. Or do you mean that they can upload big files? My understanding is that the limitation I am talking about here only applies to XHR/AJAX/fetch requests. I think that when submitting a normal form with large files, the browser handles it internally differently and actually streams the file.

To my knowledge, only Chrome recently shipped this feature, but only as a closed experiment (you have to register somewhere for a given URL, and only then does Chrome actually enable it for your site). See for example here, here and the open issue here.

Therefore ... is not connected to the transport protocol and its handshakes/headers question, like in the case of the stream and fetch APIs that you pointed out correctly.
I mean that with multipart you can also upload as many chunks as you want in one upload process, by putting the chunks as files into the FileList and sending it to the Bee client, but it still reads the whole file(s) into memory first, so I don't see how it would solve the "browser issue". Can you clarify this, @AuHau?

Yes, with multipart you could upload multiple chunks, but you would have to load them into memory, which is the problem I am trying to raise here :-)

With websockets you can assemble chunks on the fly and send them (= stream them), as you have a persistent connection with the server. You can do a similar thing now, only by making a new request for each chunk, which is what @aloknerurkar is trying to fix. With multipart you would have to load all the data into memory before sending it.


I guess the question is what the end-game is here. As I see it, from the direction of "lean Bee", a lot of operations on data will in the future happen on the client side, and it is then crucial for the client side to be able to work with data efficiently, which requires, for example, not loading all the data into memory. To my knowledge, from the JS API point of view there is already support for everything we need (like streams, and a way to stream a file from the FS) except streaming the data to the Bee node.

I know that the focus of Swarm/Bee right now is not long-term storage of big data, and that we are not promoting uploading big files, but I would argue that if/once we make this shift the tools should be ready for it. Or in other words, the tools/libraries should not be the blocker for a task that is already possible but discouraged.

@aloknerurkar aloknerurkar marked this pull request as ready for review July 21, 2021 14:31
@aloknerurkar aloknerurkar requested a review from acud July 21, 2021 14:32
Member

@acud acud left a comment


Looks good, I have a few minor comments

Reviewed 7 of 7 files at r2.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @aloknerurkar)


pkg/api/chunk_stream.go, line 1 at r2 (raw file):

package api

copyright missing


pkg/api/chunk_stream.go, line 164 at r2 (raw file):

		if tag != nil {
			// indicate that the chunk is stored
			err = tag.Inc(tags.StateStored)

if the chunk has been seen before, no need to increment this again


pkg/api/chunk_stream.go, line 166 at r2 (raw file):

			err = tag.Inc(tags.StateStored)
			if err != nil {
				s.logger.Debugf("chunk upload: increment tag", err)

i would also add an Error here, as we do in the other endpoints. This applies also to the other error checks in this file


pkg/api/chunk_stream_test.go, line 1 at r2 (raw file):

package api_test

code copyright missing

Contributor Author

@aloknerurkar aloknerurkar left a comment


Dismissed @AuHau and @acud from 6 discussions.
Reviewable status: 5 of 8 files reviewed, all discussions resolved (waiting on @acud)


pkg/api/chunk_stream.go, line 1 at r2 (raw file):

Previously, acud (acud) wrote…

copyright missing

Done.


pkg/api/chunk_stream.go, line 164 at r2 (raw file):

Previously, acud (acud) wrote…

if the chunk has been seen before, no need to increment this again

Done.


pkg/api/chunk_stream.go, line 166 at r2 (raw file):

Previously, acud (acud) wrote…

i would also add an Error here, as we do in the other endpoints. This applies also to the other error checks in this file

Done.


pkg/api/chunk_stream_test.go, line 1 at r2 (raw file):

Previously, acud (acud) wrote…

code copyright missing

Done.


pkg/api/router.go, line 66 at r1 (raw file):

Previously, AuHau (Adam Uhlíř) wrote…

I would maybe suggest /chunks/stream URL?

Done.


pkg/api/router.go, line 67 at r1 (raw file):

Previously, AuHau (Adam Uhlíř) wrote…

Why is it supposed to be disabled on the gateway?

Done.

Member

@acud acud left a comment


Reviewed 2 of 3 files at r3.
Reviewable status: 7 of 8 files reviewed, 2 unresolved discussions (waiting on @acud and @aloknerurkar)


pkg/api/chunk_stream.go, line 185 at r3 (raw file):

			if err := s.pinning.CreatePin(ctx, chunk.Address(), false); err != nil {
				s.logger.Debugf("chunk stream handler: creation of pin for %q failed: %v", chunk.Address(), err)
				s.logger.Error("chunk stream handler: creation of pin failed")

for consistency consideration if this happens and if the pinning on localstore really did happen correctly, you would need to now unpin the chunk again from the localstore


pkg/api/chunk_stream.go, line 191 at r3 (raw file):

		}

		err = sendMsg(websocket.TextMessage, successWsMsg)

nit, but since the API is handling binary data, a string success message might not be needed. you could utilize some response code which is purely binary too, like 0x0 for clean exit (like exit codes), or 0x1 for error, etc

Contributor Author

@aloknerurkar aloknerurkar left a comment


Reviewable status: 7 of 8 files reviewed, 2 unresolved discussions (waiting on @acud)


pkg/api/chunk_stream.go, line 185 at r3 (raw file):

Previously, acud (acud) wrote…

for consistency consideration if this happens and if the pinning on localstore really did happen correctly, you would need to now unpin the chunk again from the localstore

In this case, pinning has failed, right? Is there a way we could get an error from CreatePin even if pinning succeeded? Or you meant if we fail to send the response?


pkg/api/chunk_stream.go, line 191 at r3 (raw file):

Previously, acud (acud) wrote…

nit, but since the API is handling binary data, a string success message might not be needed. you could utilize some response code which is purely binary too, like 0x0 for clean exit (like exit codes), or 0x1 for error, etc

For the error, we use sendErrorClose which sends a code and a message. So we only need this message for confirmation of chunk upload. I have used a 0 byte now instead of the string message.

Contributor Author

@aloknerurkar aloknerurkar left a comment


Dismissed @acud from a discussion.
Reviewable status: 4 of 8 files reviewed, all discussions resolved (waiting on @acud)


pkg/api/chunk_stream.go, line 185 at r3 (raw file):

Previously, aloknerurkar wrote…

In this case, pinning has failed, right? Is there a way we could get an error from CreatePin even if pinning succeeded? Or you meant if we fail to send the response?

Done.

@nugaon
Member

nugaon commented Aug 2, 2021

The approach I have taken is a Request-Response scheme. So similar to how we do in libp2p protocols. Each response is read before we send the next request.

I think a global WS endpoint with RPC-like messages would be more effective for all ws services, as we will need this streaming on the download side as well later.
That way, if we get an auth layer on HTTP requests (which we should have sooner rather than later), this authentication could happen once, at the initialization of the websocket connection (like new BeeWs('ws://localhost:1636/ws/api-key')). If every service gets its own ws endpoint later, things may get complex (understanding each endpoint's message syntax one by one, and also handling authorization in every handler on the bee side) and not unified.

Also, I don't see why requests and responses need to be blocked in a queue, instead of simply uploading chunks independently of each other and handling the responses by their IDs.

@aloknerurkar aloknerurkar requested a review from acud August 2, 2021 16:42
Member

@acud acud left a comment


Reviewed 3 of 3 files at r4, 3 of 3 files at r6.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @aloknerurkar)


pkg/api/chunk.go, line 156 at r6 (raw file):

			err = s.storer.Set(ctx, storage.ModeSetUnpin, chunk.Address())
			if err != nil {
				s.logger.Debugf("chunk upload: deletion of pin for %q failed: %v", chunk.Address(), err)

nit, not sure if %q will catch the Address to String method

Contributor

@mrekucci mrekucci left a comment


Reviewed 4 of 7 files at r2, 2 of 3 files at r4, 1 of 3 files at r6, 2 of 2 files at r7.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @aloknerurkar)


pkg/api/chunk.go, line 60 at r7 (raw file):

	putter, err = newStamperPutter(s.storer, s.post, s.signer, batch)
	if err != nil {
		s.logger.Debugf("chunk upload: putter:%v", err)

Missing space after : in the debug message.


pkg/api/chunk.go, line 68 at r7 (raw file):

			return nil, nil, nil, errors.New("batch not usable")
		}
		return nil, nil, nil, errors.New("")

The returned error errors.New("") doesn't tell much. Don't you wanna return err instead?


pkg/api/chunk_stream.go, line 106 at r7 (raw file):

	for {
		select {

You should also check for ctx.Done().


pkg/api/pss.go, line 26 at r7 (raw file):

)

var (

You can change the variables into constants.

@aloknerurkar
Contributor Author


pkg/api/chunk_stream.go, line 106 at r7 (raw file):

Previously, mrekucci wrote…

You should also check for ctx.Done().

This is not required, as the gorilla websocket library ignores the HTTP request context. The gone channel ensures we stop if the client goes away, and the quit channel ensures we stop if the application is stopped. Apart from that, we don't need any more synchronization.

The request context can be canceled prematurely. Ideally we should use a new context here, but none of the function calls which need the context actually check ctx.Done() in this code path.

Contributor Author

@aloknerurkar aloknerurkar left a comment


Dismissed @mrekucci from a discussion.
Reviewable status: 7 of 9 files reviewed, all discussions resolved (waiting on @acud and @mrekucci)


pkg/api/chunk.go, line 68 at r7 (raw file):

Previously, mrekucci wrote…

The returned error errors.New("") doesn't tell much. Don't you wanna return err instead?

Done.

@aloknerurkar aloknerurkar requested a review from mrekucci August 3, 2021 08:00
Contributor

@mrekucci mrekucci left a comment


Reviewed 2 of 2 files at r8.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @aloknerurkar)

@aloknerurkar aloknerurkar merged commit 0b244e9 into master Aug 3, 2021
@aloknerurkar aloknerurkar deleted the traversal.1 branch August 3, 2021 10:29
6 participants