Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http4s-blaze-client: File upload (Multipart) fails if server doesn't support "Transfer-Encoding: chunked" #639

Open
kevin-lee opened this issue Jan 30, 2021 · 6 comments

Comments

@kevin-lee
Copy link
Contributor

Issue in http4s-blaze-client

Summary

A POST request with Multipart body for uploading a file to GitHub Release using GitHub API fails with the following error message.

{
  "message": "Bad Content-Length: ",
  "request_id": "SOME_ID_HERE"
}

Cause

It looks like GitHub API doesn't support Transfer-Encoding: chunked header.
It's confirmed by the following curl command.

curl -v -i  -F upload=@"$(pwd)/path/to/some.jar" \
  -H "Authorization: token ${github_api_token}" \
  -H "Transfer-Encoding: chunked" \
  https://uploads.github.com/repos/USER/REPO/releases/RELEASE_ID/assets?name=some.jar
# fails with "message":"Bad Content-Length: "

If I remove -H "Transfer-Encoding: chunked from the header, it succeeds. curl sets the Content-Lengthwhen Transfer-Encoding: chunked is not set.

Detailed Description

When I do it using http4s-blaze-client, it fails since Multipart.headers has Transfer-Encoding: chunked.
If I remove Transfer-Encoding: chunked from the Multipart.headers before adding it to Request, it can successfully upload a file when the file size is small. I don't know how small is small (maybe default buffer size?), but the file with about 719k (719,018 bytes) was OK while the one with about 2.3M (2,301,910 bytes) was not.

If I remove the Transfer-Encoding: chunked header and set the Content-Length manually, it can successfully upload the file.

I don't think it matters but these are the Scala and http4s versions.
Scala version: 2.12.12
http4s version: 0.21.16

My guess for GitHub API not supporting Transfer-Encoding is probably because of this?
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Transfer-Encoding
Since HTTP 2 doesn't support it and the current GitHub API is designed for HTTP 2 so it has probably dropped any HTTP 1/1.1 specific things?


Is there any way to make http4s-blaze-client automatically set Content-Length?

@kevin-lee
Copy link
Contributor Author

Setting the Content-Length manually might be problematic because the length is not just about the file. The length should be the length of the entire request content so it is bigger than the file size.

@rossabaker
Copy link
Member

Entity Encoders will set a Content-Length header when they know the content length in advance. Because Multipart is full of parts with streams, and because we don't know the length of a stream before we run it, Multipart doesn't. And that's why we fall back to chunked encoding.

Workarounds:

  1. If you know the exact length of your encoded body (including boundaries and such!) you can set your own Content-Length header after you call withEntity. (I can't remember if we explicitly set chunked encoding, or wait for the backend to do it. You might need to strip that, too.)
  2. If you don't know the exact length of your body, and don't care about streaming, compile your request body to a ByteChunk, and then .withEntity(body = Stream.chunk(theCompiledChunk)) and set the header yourself. (That might reset your Content-Type. Check that, too. We can add API to make this more convenient if this is the way.)
  3. If every Part in your Multipart has a defined contentLength header, we could plausibly add them up (plus other prat headers plus boundaries) and set a Content-Length automatically through the entity encoder. This would have to be an enhancement in the library.

@rossabaker
Copy link
Member

All three of those are bad answers and we should make this better.

@kevin-lee
Copy link
Contributor Author

@rossabaker Thanks for your answer.
I think it should be able to figure out the size when the stream source of the Multipart if the source is a file as we can get the size of the file easily (file.length()).

I think I can probably try the second option which is compiling to ByteChunk for now.

Regarding a proper solution, what about adding an optional length info to org.http4s.multipart.Part?
e.g.)

final case class Part[F[_]](headers: Headers, body: Stream[F, Byte], length: Option[Long]) extends Media[F]

When a Part is created, it can be set if it's a File. However, I'm not sure if this would helpful to solve the problem. I can check if the Multipart or MultipartEncoder can use it to set the its headers. If you think it's worth trying, I can do it and create a PR once I confirm that it's working. At this point, I'm not sure if it's a good idea since I don't know the reasons behind the design decisions.

@kevin-lee
Copy link
Contributor Author

Sorry, never mind what I said about Part. It doesn't look so helpful.

@kevin-lee
Copy link
Contributor Author

kevin-lee commented Jan 30, 2021

Using Chunk[Byte] as a Request body with removal of Transfer-Encoding: chunked works. Thank you!
I'll use it for now and will also think about any better ways.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants