Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adopting byte range PATCH in resumable upload #2501

Closed
guoye-zhang opened this issue Apr 5, 2023 · 41 comments
Closed

Adopting byte range PATCH in resumable upload #2501

guoye-zhang opened this issue Apr 5, 2023 · 41 comments

Comments

@guoye-zhang
Copy link
Contributor

Let's look into adopting byte range PATCH proposed in httpapi WG by @awwright

@Acconut
Copy link
Member

Acconut commented Apr 5, 2023

For reference, here is @awwright's presentation from IETF 116: https://datatracker.ietf.org/meeting/116/materials/slides-116-httpapi-byte-range-patch
And this should be the related draft: https://datatracker.ietf.org/doc/draft-wright-http-patch-byterange/

@Acconut
Copy link
Member

Acconut commented Apr 5, 2023

I am open to the idea of using a byte range instead of the Upload-Offset header field for signaling the current upload progress. However, I am not sure if we would want to support the full byte range syntax. Our current draft specifies append-only uploads, but byte range allows for non-sequential writes and even overwriting previous data. Some upload servers might be interested in implementing this full functionality, but many applications do not need more than an append-only upload. Implementing the full byte range syntax would be just more effort.

So, if we say a server must support append operations and can optionally support non-sequential writes and overwrites, I can also live with the byte range syntax. That might open the door for more sophisticated upload schemes in the future.

@kazuho
Copy link
Contributor

kazuho commented Apr 5, 2023

@Acconut

So, if we say a server must support append operations and can optionally support non-sequential writes and overwrites, I can also live with the byte range syntax. That might open the door for more sophisticated upload schemes in the future.

+1.

Assuming that we'd switch to using byte-range PATCH for appending the upload, are we also going to switch to using content-length for determining the length of the amount of data that has already been uploaded?

@Acconut
Copy link
Member

Acconut commented Apr 5, 2023

Assuming that we'd switch to using byte-range PATCH for appending the upload, are we also going to switch to using content-length for determining the length of the amount of data that has already been uploaded?

I am not sure if Content-Length would be appropriate then. If clients can use byte range PATCH for uploading, they might upload non-contiguous chunks (e.g. two range: 0-100 bytes and 200-300 bytes). These ranges should then also be correctly present in the response for HEAD requests. But Content-Length cannot capture these ranges, so we would need a response header similar to the Range header field, which can capture multiple ranges: https://httpwg.org/specs/rfc9110.html#field.range

Of course, if the server only allows appending, the returned ranges with always be contiguous and start at 0.

@awwright
Copy link

awwright commented Apr 5, 2023

However, I am not sure if we would want to support the full byte range syntax. Our current draft specifies append-only uploads, but byte range allows for non-sequential writes and even overwriting previous data.

Yes, when using a byte range PATCH, this is acceptable; a server may choose to accept only writes that append; or only the writes that overwrite.

You may also require that the data be contiguous, and reject uploads that contain multiple parts.

If clients can use byte range PATCH for uploading, they might upload non-contiguous chunks (e.g. two range: 0-100 bytes and 200-300 bytes).

Clients should not attempt this, and servers that receive this ought to return an error, so in the event of a network problem, clients may re-synchronize their state with the server.

For example, suppose the client uploads bytes 0-99, and suppose the client never receives acknowledgement. The client could retry 0-99, or go ahead and try 100-199. (Or, the client could re-synchronize and find out the last received data ended at byte 55.) Regardless of how the client handles it, the server should error if the upload would repeat data that has already been uploaded but not acknowledged, or if this would skip over data that was sent, but lost on the way to the server.

These ranges should then also be correctly present in the response for HEAD requests. But Content-Length cannot capture these ranges, so we would need a response header similar to the Range header field, which can capture multiple ranges

Byte range PATCH discusses "sparse documents", however I don't think this makes sense in resumable uploads. The server should simply reject requests that would require support for sparse or non-continuous data.

@guoye-zhang
Copy link
Contributor Author

I think Content-Length for offset retrieving is reasonable. It roughly translates to file size on disk. Assuming a typical file system implementation, the holes will be 0 filled, and the size would be the location of the last byte of the file. In the future, we could add more info like "first hole in document" but I think the use case is very niche.

We can take this opportunity to reduce the number of new headers introduced in this draft. Upload-Offset can be replaced, but I think Upload-Incomplete/Complete will have to stay since we need a way to mark the end of an upload.

Servers are only required to support appending. Any other types of operations would be optional pending agreement between clients and servers or negotiation mechanism not defined in this draft.

@awwright
Copy link

awwright commented Apr 6, 2023

but I think Upload-Incomplete/Complete will have to stay since we need a way to mark the end of an upload

The Content-Range header indicates this if you specify the final, intended length of the file. For example,

Content-Range: bytes 200-299/300

This indicates the upload is providing the last byte of a file that's 300 bytes long, and so, this must be the final upload. (Since the final byte of a 300 byte large file will be byte number 299.)

@guoye-zhang
Copy link
Contributor Author

That requires the length to be known when we start sending the request. If we streaming compress a file for example, we do not know if the body will end in 10 bytes or in 100MB.

@awwright
Copy link

awwright commented Apr 6, 2023

@guoye-zhang If the final length is unknown then you would use the indeterminate length form, for example:

Content-Range: 0-99/*

This would indicate an upload of the first 100 bytes, and some unknown amount of bytes is to follow.

@Acconut
Copy link
Member

Acconut commented Apr 6, 2023

I think Content-Length for offset retrieving is reasonable.

I disagree on that point. I think we should either go full byte range mode or just stick to the current offset. Meaning that either we use byte ranges for the PATCH and HEAD requests, even if our draft only defines append-only modification. Or, we stick to the method of having an offset that is acknowledged by the client in the PATCH request. However, mixing between these approaches in the PATCH and HEAD requests would be confusing and inconsistent.

In both cases, I do not mind if we reuse existing headers or have to introduce new one. Reusing fields is always great, of course.

@guoye-zhang
Copy link
Contributor Author

@awwright How do we say the length is unknown but it's the upload is complete?

@Acconut
Copy link
Member

Acconut commented Apr 6, 2023

In what situations would the upload be complete but the client or server do not know its length? Both should be able to keep track of the upload's length until it is complete.

@guoye-zhang
Copy link
Contributor Author

The example is that we are streaming compressing a file. We don't know how big the compressed file will be, but we need to tell the server that the end of the request body is the end of the upload.

@Acconut
Copy link
Member

Acconut commented Apr 6, 2023

Yeah, right. At the beginning of the request we do not know its length. That is a good point. We could circumvent this by leaving the length undeclared in the first request and then using a second request with an empty body to effectively declare the length. That is similar to how we handle it in tus right now.

@awwright
Copy link

awwright commented Apr 6, 2023

@guoye-zhang If the upload is complete, then you'll know the size of the upload, and you can specify it at that time. For example, here is three requests uploading a 300 byte file:

Content-Range: 0-99/*

Content-Range: 100-199/*

Content-Range: 200-299/300

@Acconut
Copy link
Member

Acconut commented Apr 6, 2023

@awwright I think @guoye-zhang is talking about something else: Assume we want to upload a stream in a single request but we do not know the full size of the stream at the beginning of the request (e.g. because the stream is the result of a compression). What range would we use in this case where we do not know the request body size upfront?

We would need something like

Range: bytes=0-

but even that does not convey the information that the upload will be complete if the entire body has been received. Currently, we convey that information using the Upload-(In)complete header field.

@awwright
Copy link

awwright commented Apr 6, 2023

We would need something like

Range: bytes=0-

This is the syntax for requesting a range in the response, would that be used in resumable uploads? Or are you looking for a syntax like Content-Range: bytes 0-*/* when making the upload?

In any event, if you want to begin uploading a "last piece" of indeterminate length, this poses a logical contradiction: If you make the "final" upload, but then that last piece is interrupted, how do you resume it?

You shouldn't be able to mark an upload as finished, until the last byte is actually received by the server. The most straightforward way to do this is to tell the server how long the final length will be.

@guoye-zhang
Copy link
Contributor Author

One of the main goals of the current draft is that we can transparently upgrade a regular upload into a resumable upload. The Upload-Complete: ?1 header means that if the server doesn't support resumable upload, or if the upload didn't get interrupted at all, it essentially works the same way as a regular upload.

We can resumable an interrupted upload by sending a HEAD request asking for an offset. I don't think there is a different whether it's the last piece or not. In automatic upgrade mode, we are always sending the last piece but we don't always know the final length until the upload body ends.

@awwright
Copy link

An upload with a header like Upload-Offset: n ought to be the same as a PATCH body containing Content-Range: n-*/* (for some positive integer n). All the same functionality is retained, and you will gain the ability to send this to servers even if they don't support, or have removed support for, resumable uploads (it will cause an error, usually 415 Unsupported Media Type).

What I was wondering is what happens if a continuation is interrupted... if I resume the upload, how does the server know that the end of my stream is the conclusion of my upload, and not another network interruption?

In HTTP/1.1 you can typically detect the end of the message, and distinguish it from an interruption. But in some cases (I think limited to HTTP/1.0, hopefully rare nowadays), a network error is indistinguishable from the true end of the message. I don't see any language warning about this possibility. But it sounds like this is a separate concern that should be looked at separately.

@guoye-zhang
Copy link
Contributor Author

The discussion was about the necessity of Upload-Complete/Incomplete header. I think it's still required since Content-Range: n-*/* + Upload-Complete: ?1 and Content-Range: n-*/* + Upload-Complete: ?0 mean different things.

I'm not aware of HTTP/1.0 interruption issues. HTTP/1.0 supports chunked encoding and can properly end the request with an empty chunk. Even if such an issue exists, I don't think it's within the scope of this protocol to address since we don't want to reinvent a way to identify the end of a message because we don't trust HTTP to do so. We can always say that this protocol can't be used with HTTP/1.0.

@kazuho
Copy link
Contributor

kazuho commented Apr 11, 2023

FWIW use of chunked encoding with HTTP/1.0 is forbidden; see https://httpwg.org/specs/rfc9112.html#field.transfer-encoding and the related discussion in httpwg/http-core#879.

That said, I do not think the problem of handling of chunked encoding in HTTP/1.0 has an effect on our outcome here, because Resumable Upload relies on an informational response. I do not think we'd want to send informational responses to arbitrary HTTP/1.x clients, because doing so is known to cause interoperability issues.

Back to the topic, I tend to believe that the necessity of having upload-completedepends on if we can assume all implementations to use Content-Range: x-y/z where z will be something other than * at least for the final chunk that the clients attempts to upload.

I see others pointing out that at the moment a client tries to upload the final chunk by generating the content-range header, the client might not know the true size of the request body, because the client might be streaming the request body, doing decompression, etc.

I think that is a valid argument and therefore my +1 to retaining upload-complete header field.

@Acconut
Copy link
Member

Acconut commented Apr 11, 2023

because Resumable Upload relies on an informational response.

That is not entirely true. If the client is not able to receive informational responses, it can also perform an empty Upload Creation Procedure without any body. The (normal) response will then also contain the Location header with the upload URL, just as the informational response would. Then, the client can start sending PATCH requests to this endpoint without relying on informational responses.

I see others pointing out that at the moment a client tries to upload the final chunk by generating the content-range header, the client might not know the true size of the request body, because the client might be streaming the request body, doing decompression, etc.

I think that is a valid argument and therefore my +1 to retaining upload-complete header field.

I agree on this point. Upload-Complete seems to be necessary (or at least useful in many cases).

@awwright
Copy link

awwright commented May 3, 2023

I just published draft-wright-http-patch-byterange-02 which ought to address some of the considerations here, particularly, specifying how to make writes of indeterminate length.

I believe this has everything required to replace the Upload-Offset header. For example, the following patch represents data of an unknown length, but starting at a 200 byte offset:

Content-Range: bytes 200-/*

data...

I would suggest saying that servers MUST support this indeterminate length form, and MAY support other forms.

@Acconut
Copy link
Member

Acconut commented Jul 14, 2023

@awwright I was currently looking into your latest draft and was wondering if it is already adopted by any working group or you intend to bring it any WG?

@awwright
Copy link

@Acconut I believe a Call for Adoption with the HTTP APIs WG is on the IETF 117 schedule.

@Acconut
Copy link
Member

Acconut commented Jul 19, 2023

I just read through -03, in which a Content-Range header field and message/byterange media type is defined, and I would to recap on three things:

  1. The Upload-Offset header field could be replaced by Content-Range, for example Upload-Offset: 123 would become Content-Range: bytes 123-/* in the Upload Creation or Upload Appending Procedure.
    However, I am wondering if Content-Range is also suitable to be used in the response for the Offset Retrieving Procedure (HEAD request). Can we put Content-Range: bytes 123-/* in a HEAD response? If so, would this have the same meaning as the current Upload-Offset: 123 header field? Could it lead to semantic issues because a (ranged) GET request may not be allowed to this upload URL by the server?
  2. The Upload-Incomplete header can not be replaced by Content-Range or Content-Length because the client may not know the final length when it is starting to send the request. For example, when uploading a streaming resource. Upload-Incomplete: ?0 without a Content-Length allows the client to say that it tries to upload the resource in one request, but it does not know the entire size yet. Or is there a way to get rid of Upload-Incomplete as well?
  3. Your draft also talks about message/byterange, but this issue has mostly talked about Content-Range. Is message/byterange also suitable for our Upload Appending Procedure?

@jrflat
Copy link

jrflat commented Jul 19, 2023

However, I am wondering if Content-Range is also suitable to be used in the response for the Offset Retrieving Procedure (HEAD request).

Guoye mentioned we could use Content-Length in the response to the Offset Retrieving Procedure to replace Upload-Offset. E.g. for resuming at an offset of 100, the pattern could be:

--> HEAD /resumeURL
<-- 204, Content-Length: 100
If the upload is complete/total size is known to be 200 bytes...
--> PATCH /resumeURL, Upload-Complete: ?1, Content-Range: bytes 100-199/200
Or if the data is chunked/streamed...
--> PATCH /resumeURL, Upload-Complete: ?0, Content-Range: bytes 100-/*

@awwright
Copy link

However, I am wondering if Content-Range is also suitable to be used in the response for the Offset Retrieving Procedure (HEAD request). Can we put Content-Range: bytes 123-/* in a HEAD response? If so, would this have the same meaning as the current Upload-Offset: 123 header field? Could it lead to semantic issues because a (ranged) GET request may not be allowed to this upload URL by the server?

I would think that if you're making a HEAD request, you would see the number of bytes received in a Content-Length response header, e.g. Content-Length: 123. You could also use the unsatisfied-range form of a Content-Range response header, e.g. Content-Range: */123

The Upload-Incomplete header can not be replaced by Content-Range or Content-Length because the client may not know the final length when it is starting to send the request.

This ought to be possible if you use the unsatisfied-range form, to set the size of the target resource without writing any data. I will add this to the draft once I-D submissions open up (or see the changes here), since I realized there was no way to set the total length of the target document, which is a common filesystem operation that should be supported. A typical use of this would be to truncate the document (e.g. Content-Range: */0 would truncate the document to zero bytes).

In resumable uploads, you would use this to send a "trailer" that indicates the size of the upload once it becomes known, indicating that the upload is complete.

For example,

PATCH /transfers/1 HTTP/1.1
Content-Type: multipart/byteranges; boundary=PART

--PART
Content-Range: bytes 200-/*

01234
--PART
Content-Range: bytes */205

--PART--

(Note in HTTP/1.1 this would be sent with Transfer-Encoding: chunked but I've stripped that out for simplicity.)

In the first part, the client is sending the remainder of the upload starting at the 200th byte; it continues to byte 204, which is the last byte of the entire request.

To signal that it is the last byte, the client sends one final part, saying the complete length of the document is 205 bytes—i.e. the 204th byte is the last, and is therefore complete (as opposed to a larger number, which would indicate the upload has not finished.)

Your draft also talks about message/byterange, but this issue has mostly talked about Content-Range. Is message/byterange also suitable for our Upload Appending Procedure.

Content-Range is used inside one of the media types, like message/byterange.

However, for resumable uploads, most clients will probably want to use the application/byteranges media type, which is binary, and somewhat easier to parse. Each part in that patch will contain a Content-Range field.

@guoye-zhang
Copy link
Contributor Author

guoye-zhang commented Jul 22, 2023

We want to avoid the overhead of multipart framing, especially since the request might already be multipart itself, having nested multipart body will be a serious test of some parser implementations.

Upload creation is also a concern. We need it to be fully compatible with a regular upload when upload-complete: ?1, regardless of whether or not we know the final size, so we wouldn't be able to do multipart encoding like this.

Ultimately, we are just adopting ranged PATCH to perform mutations on the resource, but whether or not this is the last mutation doesn't need to be inferred by what mutation we are performing. If the client and the server agree, you can even use this protocol to perform live editing of a document and save it in the end by setting upload-complete: ?1.

@awwright
Copy link

We want to avoid the overhead of multipart framing, especially since the request might already be multipart itself, having nested multipart body will be a serious test of some parser implementations.

There is no reason this should prose a problem. There is no overhead with the binary application/byteranges format.

And nested multipart responses do not pose a problem to the multipart format; the contents of a part body are completely opaque, and can be anything, including another multipart document.

whether or not this is the last mutation doesn't need to be inferred by what mutation we are performing.

If this is so, then I'm struggling to figure out what Upload-Incomplete (or Upload-Complete) actually indicates. If you say "This upload is complete" but the request is indeterminate length, what happens if that upload is interrupted? It seems to me the server will mistakenly think that's the end of the upload, when in fact you still have more data to upload, and you will need to make one more request to send it all off.

On the other hand, if the server can detect that the upload was interrupted or cleanly finished, even in indeterminate length uploads, why is this flag necessary at all?

What does the error handling look like if a continued upload is itself interrupted?

@guoye-zhang
Copy link
Contributor Author

There is no reason this should prose a problem. There is no overhead with the binary application/byteranges format.

I was talking about the multipart/byteranges example above.

And nested multipart responses do not pose a problem to the multipart format; the contents of a part body are completely opaque, and can be anything, including another multipart document.

That is true in theory, and request smuggling is also impossible in theory. But parsers are not perfect in reality so we would want to defend against it by not putting ourselves in such a situation.

If this is so, then I'm struggling to figure out what Upload-Incomplete (or Upload-Complete) actually indicates. If you say "This upload is complete" but the request is indeterminate length, what happens if that upload is interrupted? It seems to me the server will mistakenly think that's the end of the upload, when in fact you still have more data to upload, and you will need to make one more request to send it all off.

We either have Content-Length header or chunked encoding / framing to indicate the end of the request body. These are existing mechanisms in HTTP today used by regular uploads / downloads.

On the other hand, if the server can detect that the upload was interrupted or cleanly finished, even in indeterminate length uploads, why is this flag necessary at all?

This flag is for an advanced use case where you need to chunk uploads to a certain limit, e.g. to get around the CDN's 100MB request body size limit. As is implemented today on Apple platforms, the flag always set it to complete and we don't support chunked uploads which is allowed by the draft.

What does the error handling look like if a continued upload is itself interrupted?

There is no difference (in the draft today) between the interruption of the creation procedure and appending procedure. You query the offset and continue appending.

@awwright
Copy link

I was talking about the multipart/byteranges example above.

Yes, but I only wrote the example using multipart/byteranges because it's human-readable. I believe application/byteranges would be the best patch format to use.

But parsers are not perfect in reality so we would want to defend against it by not putting ourselves in such a situation.

This argument could be made of any parser, including resumable uploads itself; I'm not sure why multipart/byteranges would uniquely be a concern—but if implementation complexity is a concern, application/byteranges would probably be the right pick here, as it's somewhat simpler to implement (you must tag the length of variable-length strings, you don't parse the contents at all—there's no risk whatsoever that the contents "leak out" due to an error searching for the multipart boundary).

Also note, the PATCH method must use some media type, otherwise the server has no way to know what the body of the PATCH payload means. I have no way to implement a PATCH method in an HTTP server without also defining a media type and its behavior.

The only alternative I know of is create a new method (perhaps call it APPEND).

This flag is for an advanced use case where you need to chunk uploads to a certain limit, e.g. to get around the CDN's 100MB request body size limit.

So then, what about clients who have no way of knowing this ahead of time? If you start uploading a request, and then halfway through you realize it's going to be 200M and you need to split it up—you can't do this if you've already specified in the headers "upload complete".

I think it makes the whole request more complicated if you have to declare this in advance. You have to process the end of the request differently depending on the value for Upload-Incomplete (or Upload-Complete?).

There's at least two options to avoid this: use an HTTP trailer, or application/byteranges which, by design, provides all of the information necessary to determine if the upload is complete.

Or put it another way, the Upload-Incomplete header is a work-around for the fact that the end of the resumed upload is tied to the end of the original upload, and this inflexibility results in added complexity elsewhere. Instead, the end of the original request body should be signaled separately from the end of resumed upload request.

@guoye-zhang
Copy link
Contributor Author

Yes, but I only wrote the example using multipart/byteranges because it's human-readable. I believe application/byteranges would be the best patch format to use.

OK, I'll look into it.

This argument could be made of any parser, including resumable uploads itself; I'm not sure why multipart/byteranges would uniquely be a concern—but if implementation complexity is a concern, application/byteranges would probably be the right pick here, as it's somewhat simpler to implement (you must tag the length of variable-length strings, you don't parse the contents at all—there's no risk whatsoever that the contents "leak out" due to an error searching for the multipart boundary).

We are proposing the minimum set of features to support a resumable upload. You have to define an offset and whether it is the end of the upload, the length of the upload is already provided by HTTP. Arguable even signaling the upload completion is unnecessary, but chucked upload is a commonly requested and used feature, and people will just reinvent it poorly if we don't define it. My objection is having additional chunking inside a single append operation.

The only alternative I know of is create a new method (perhaps call it APPEND).

A new method is an additional implementation hurdle since there are HTTP libraries that don't support that.

So then, what about clients who have no way of knowing this ahead of time? If you start uploading a request, and then halfway through you realize it's going to be 200M and you need to split it up—you can't do this if you've already specified in the headers "upload complete".

Well if you know about the specific limitation and you might go over, you should always set upload to incomplete. Otherwise, it's OK to do optimistically do complete uploads, and attempt a resumption when CDN terminates the oversized upload.

Or put it another way, the Upload-Incomplete header is a work-around for the fact that the end of the resumed upload is tied to the end of the original upload, and this inflexibility results in added complexity elsewhere. Instead, the end of the original request body should be signaled separately from the end of resumed upload request.

Upload complete makes the upload creation procedure compatible with a non-resumable upload, and allows feature detection and transparent upgrade to resumable uploads. Upload incomplete is an advanced feature for people who cannot depend on 1xx responses, or need chunked uploads. The design tradeoff has been extensively debated and iterated to meet the requirements of various adopters.

We are willing to switch to a mechanism which has direct translations of our current operation mode, with an equivalent or strictly more powerful feature set. "Append this request body to an existing upload, treat this as the final operation or not" isn't a very complicated semantics.

@Acconut
Copy link
Member

Acconut commented Nov 6, 2023

We talked about this again after the httpapi session at IETF 118 and there are two relevant points.

draft-wright-http-patch-byterange might move away from changing the Content-Range header to instead using a new header, for example Content-Offset (or Upload-Offset or some other name). This could be the same header that is used in resumable uploads, allowing both drafts to use a common header field without running into issues with Content-Range.

In addition, an explicit design goal of the message/byterange media type from draft-wright-http-patch-byterange is to include the range (or offset) of the partial content in the request body itself. That's why the request body consists of a set of headers before the actual partial content (notice the additional empty line and the Content-Length: 272):

   PATCH /uploads/foo HTTP/1.1
   Content-Type: message/byterange
   Content-Length: 272
   If-Match: "xyzzy"
   If-Unmodified-Since: Sat, 29 Oct 1994 19:43:31 GMT

   Content-Range: bytes 100-299/600
   Content-Type: text/plain

   [200 bytes...]

The purpose is that the message is an encapsulated patch document that can be stored on disk or sent to clients in responses to GET requests. This document not only contains the partial chunk but alse the range at which is should be applied. That's why it is preferable to include the range in the message body instead of a header field. While this might make sense for some applications, I don't think this is a good approach for resumable uploads. Here we should rather include the offset in an actual header field instead of having to parse it from the request body.

@awwright Is this a good summary of our discussion? Let me know if I missed something.

@awwright
Copy link

awwright commented Nov 7, 2023

@Acconut Yes, this is an excellent summary; using Content-Offset as a Partial PUT would look like:

PUT /uploads/foo HTTP/1.1
Content-Offset: bytes 5000
Transfer-Encoding: Chunked

[bytes...]

Additionally, I was thinking about your proposal on the list, what about using "application/octet-stream" as the Content-Type in the manner you were proposing, where the upload body is only the data to write, and the metadata is provided in HTTP headers?

I think this would be reasonable as there's not very many things that "application/octet-stream" in a PATCH upload could possibly do. Naturally, you can assume the upload body is an opaque binary blob, and so the only place the metadata could possibly be would be out-of-band, i.e. the HTTP headers:

PATCH /uploads/foo HTTP/1.1
Content-Type: application/octet-stream
Content-Offset: bytes 5000
Transfer-Encoding: Chunked

[bytes...]

The rules for using "application/octet-stream" like this might be "the body MUST be an HTTP partial response, a message without recognized partial headers (Content-Range, Content-Offset) MUST be an error."

@reschke
Copy link
Contributor

reschke commented Nov 7, 2023

Well. Using PUT with a new header field risks that existing servers will ignore it.

@awwright
Copy link

awwright commented Nov 7, 2023

@reschke Indeed that's the whole motivation for Byte Range PATCH. However for resumable uploads, where the server establishes support for resumption with a 1xx code, that's unlikely to be a problem.

@Acconut
Copy link
Member

Acconut commented Nov 9, 2023

@awwright I see little benefit of using PUT over PATCH here. Yes, if the server returns an upload resource URL to the client, it should make sure that that endpoint does support the method and the needed headers properly. However, with PATCH we have a more explicit approach to this, where the media type ensures that the server knows how to apply the patch. With this, we can avoid any possible compatibility issues with PUT in existing infrastructure.

My current preference would be to add an application/octet-stream+offset media type. This would indicate that the request body is an opaque byte stream, but it starts at an offset, which is further defined in a header field. We could reuse application/octet-stream, but I don't see any downsides to using a new media type, which would make the appending procedure much more explicit.

For example:

PATCH /uploads/foo HTTP/1.1
Content-Type: application/octet-stream+offset
Upload-Offset: 5000

[bytes...]

@MikeBishop
Copy link
Contributor

Having thought about this a bit since the session, I find myself coming back to the definitions of PUT vs. PATCH. PUT creates a new resource out of the thing being uploaded in the body; PATCH transforms an existing resource using some description defined by the media type.

If resumable upload were specified as a special-case of coming back to the same upload endpoint with special headers to indicate it's a continuation of the same PUT, then I think a PUT with appropriate decoration would be appropriate. But it's not -- it creates an "upload resource" to represent the pending upload, and the client is able to act on that temporary upload resource in subsequent calls.

Semantically, I think PATCH is appropriate -- the client is transforming the upload resource into what should have been the complete body of the interrupted request, and at the end the server processes the completed resource as if it were the original request.

@Acconut
Copy link
Member

Acconut commented Nov 23, 2023

But it's not -- it creates an "upload resource" to represent the pending upload, and the client is able to act on that temporary upload resource in subsequent calls.

Exactly, that is how the current draft is laid out. This is also how most people implementing the protocol on the server-side will approach their implementation. A temporary resource is created, which gets modified over time until the upload is considered complete.

Your argument is a great formulation about why PATCH is the correct choice here. Thanks for sharing it. I agree that PUT is not the best fit here. Now we just need to think about the media type to properly describe the semantics of the PATCH request.

@Acconut
Copy link
Member

Acconut commented Mar 4, 2024

Closing in favor of #2610.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

7 participants