Replace java.net.HttpURLConnection with java.net.http.HttpClient (500USD Bounty) #150
I'll take a stab at this now |
HttpURLConnection has you write the request body by exposing an OutputStream, and RequestBlob is designed around this. OTOH with HttpClient you pass a BodyPublisher, an abstraction analogous to RequestBlob (minus the headers) that has helpers for when you have a byte array, an InputStream, or other common cases. Otherwise it's based on Can I change the RequestBlob API, or do we have to work with it? In particular I'm not sure how to adapt it to a BodyPublisher without either loading it all into memory or starting another thread. TBH I'm not sure why it has to take something as powerful as Publisher, which can have multiple subscribers. |
@nafg does |
I guess you mean BodyPublisher (BodyHandler would be for the response body). It does "the right thing" as you say, but I don't know how to adapt a RequestBlob to it (without reading it into memory or starting another thread). |
As I said, the cleanest implementation would be to completely break the RequestBlob API. (Of course we would still support implicitly adapting the same types of values.) |
Maybe I could just create a connected PipedOutputStream and PipedInputStream and pass the former to |
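For reference, the piped-stream idea could look roughly like the sketch below. It is only an illustration: `BodyWriter` is a hypothetical stand-in for whatever "write yourself to this OutputStream" method RequestBlob exposes, and the helper thread is exactly the cost being debated in this thread. `BodyPublishers.ofInputStream` is the real JDK helper; everything else is made up for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.net.http.HttpRequest.BodyPublisher;
import java.net.http.HttpRequest.BodyPublishers;
import java.nio.charset.StandardCharsets;

final class PipedBody {
    /** Hypothetical stand-in for a RequestBlob-style "write to this stream" callback. */
    interface BodyWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Runs the writer on a helper thread and returns the readable end of the pipe. */
    static InputStream pipe(BodyWriter writer) {
        try {
            PipedInputStream in = new PipedInputStream();
            PipedOutputStream out = new PipedOutputStream(in);
            Thread t = new Thread(() -> {
                // Closing `out` (via try-with-resources) signals end-of-stream to the reader.
                try (out) {
                    writer.writeTo(out);
                } catch (IOException e) {
                    // Simplification: a failed write just ends the stream early.
                    // A real adapter would need to surface the failure to the reader.
                }
            }, "body-writer");
            t.setDaemon(true);
            t.start();
            return in;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Streams the writer's output; the length is unknown, so contentLength() is -1. */
    static BodyPublisher toPublisher(BodyWriter writer) {
        return BodyPublishers.ofInputStream(() -> pipe(writer));
    }

    /** Test helper: collects everything the writer produces into a String. */
    static String drain(BodyWriter writer) {
        try (InputStream in = pipe(writer)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The supplier passed to `ofInputStream` creates a fresh pipe (and thread) on each call, so this avoids buffering the whole body in memory but still pays for one thread per request body.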
Sorry, I mixed up If I understand the problem right, it is that Making Spawning a thread is an option, but an expensive one. What if we instead use the Java HttpClient's I'm actually not sure exactly how the async concurrency story works with java.net.http.HttpClient. Does it need an Executor, ExecutionContext, or thread pool of some sort? We need to sort that out as part of this ticket so we can understand what the consequences of this replacement will be. |
Yeah, the builder takes an Executor IIRC |
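To make the Executor point concrete, here is a minimal sketch of wiring one into the client and issuing an async request. The class and method names (`AsyncClientSketch`, `newClient`, `getAsync`) are invented for illustration; `HttpClient.Builder.executor`, `sendAsync`, and `BodyHandlers.ofString` are the JDK calls. Per the JDK docs, if no executor is set on the builder, a default one is created for each client.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

final class AsyncClientSketch {
    /** Builds a client whose asynchronous and dependent tasks run on the given executor. */
    static HttpClient newClient(Executor executor) {
        return HttpClient.newBuilder().executor(executor).build();
    }

    /** Sends a GET without blocking the caller; the future completes with the body text. */
    static CompletableFuture<String> getAsync(HttpClient client, URI uri) {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                     .thenApply(HttpResponse::body);
    }
}
```

`client.executor()` returns an `Optional<Executor>` that is empty when the default is in use, which may be handy when deciding who owns shutdown.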
Does that mean that we give it an InputStream that will block until we write to the OutputStream? Not sure I follow. |
I'm not totally sure myself TBH. But if you said you can solve the problem by spawning a thread, it seems like you should be able to solve the problem by using sendAsync and taking advantage of the calling thread. |
Do I need to handle chunkedUpload==false? |
I don't see an equivalent of connection.getResponseMessage |
@nafg You'll have to dig through the code and tell me what the tradeoffs and necessary changes are |
Re response message (reason phrase), see https://stackoverflow.com/a/63576759/333643 Some points from there: HTTP/2 and some HTTP/1 servers don't provide it, HttpClient has no API to get it, and it doesn't really have any purpose. Two options:
|
I guess we'll need more breaking changes or fake compatibility: https://stackoverflow.com/q/53617574/333643 (can't control keep-alive) |
It doesn't let you set Content-Length manually; it should be determined by the BodyPublisher. Which means RequestBlob can't define it among an arbitrary list of headers. If I can make breaking changes as desired the most natural replacement would be

    trait RequestBlob {
      def bodyPublisher: BodyPublisher
      def contentType: String
    }

There are less-breaking ways to do it though. I guess that's really the theme of all my questions. Is this supposed to be "preserve the interface, swap the implementation" or is this going to be a V2 designed around HttpClient? If it's somewhere in the middle then I have to consult on all the specifics I guess. |
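The header restriction is easy to observe in isolation. The sketch below (the helper name `isRejected` is made up) probes whether `HttpRequest.Builder.header` refuses a given header name; with default JDK settings, Content-Length and Connection are on the restricted list and trigger an IllegalArgumentException, while ordinary headers such as Content-Type are accepted.

```java
import java.net.http.HttpRequest;

final class RestrictedHeaders {
    /** Returns true if the request builder refuses to accept this header. */
    static boolean isRejected(String name, String value) {
        try {
            // The check happens when the header is added, before any URI is set.
            HttpRequest.newBuilder().header(name, value);
            return false;
        } catch (IllegalArgumentException e) {
            // Thrown for restricted (or syntactically invalid) header names/values.
            return true;
        }
    }
}
```

This is why a RequestBlob that reports Content-Length as just another header cannot be passed through unchanged: the length has to come from the BodyPublisher instead.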
I think it's somewhere in the middle:
At least for now, it seems we have "use |
I have all tests passing except For some reason it returns a 400. I don't know why, and I don't know what it has to do with gzip. Any thoughts? |
@nafg not sure, you'll have to dig into it to take a look. Maybe try setting up a proxy to intercept the requests and compare the difference between the old and new? |
Just logging this here to remember it. I used webhook.site to see what the gzipError test was doing differently. I saved each as HAR JSON and compared with https://jsoncompare.com/#!/diff/id=a5efdd49952712265bd06d4e8be477cb&fullscreen&sortAlphabetically/ and got this: |
so:
I'm guessing the issue (if any of the above and not something more subtle) is (4). But I still don't know what the purpose of the test is. |
So using a BodyPublisher with a fixed length when contentLength is known (as would be correct anyway) fixes the 400. But it creates another issue, which I think is a bug in the current behavior that now crashes: Content-Length should be the size after compression. Since there's no way to know the length after compression without compressing it, that basically means that known-content length RequestBlobs must be written into memory, which means there's no point in them specifying their content length at all. They should just say if they should be streamed or not. |
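A small sketch of the point about compressed length: the only way to hand HttpClient a correct fixed length for a gzipped body is to compress first and measure the result. The class and method names are illustrative only; `GZIPOutputStream` and `BodyPublishers.ofByteArray` are standard JDK APIs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.http.HttpRequest.BodyPublisher;
import java.net.http.HttpRequest.BodyPublishers;
import java.util.zip.GZIPOutputStream;

final class GzipBody {
    /** Compresses the whole payload in memory. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the gzip stream flushes the trailer into the buffer.
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Fixed-length publisher whose contentLength() is the size AFTER compression. */
    static BodyPublisher fixedLengthGzip(byte[] raw) {
        return BodyPublishers.ofByteArray(gzip(raw));
    }
}
```

The resulting publisher reports the compressed size, not the size the RequestBlob originally declared, which is the mismatch described above.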
The gzip + content length thing seems like it might need an API change to make it work. What if we defaulted to I don't see an easy way to make the decision automatically in a principled fashion, so maybe the best we can do is let the user choose |
I didn't fully understand that. I thought we needed an API change anyway and that was okay. I'll implement my last idea in the near future and push my code up. I think that should help us communicate ;) |
Regarding the last issue, what I did for now is this: the request body is fixed-length if it's empty or has a known length and is uncompressed. If it is compressed or its length is unknown then it will be streamed. In both cases the request is first initiated with (Their API is a bit high level but IIUC fixed-length would mean it includes a Content-Length header and streaming means it will be Transfer-Encoding: chunked.) This means that uploading even a large file without compression will load it all (compressed) into memory (using the provided implicit conversion). I don't know if there is an alternative (other than perhaps picking a maximum size above which we always stream, or else saying that certain RequestBlobs such as files should always stream). Similarly for adapting an arbitrary Writable that defines a |
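The policy described here can be sketched as below. This is not the code from the PR; `BodyPolicy.choose` and its parameters are hypothetical, and treating an empty body as fixed-length regardless of compression is an assumption about how the rule is meant to be read. A `knownLength` of -1 stands for "unknown", matching the convention `BodyPublisher.contentLength()` itself uses.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.http.HttpRequest.BodyPublisher;
import java.net.http.HttpRequest.BodyPublishers;
import java.util.function.Supplier;

final class BodyPolicy {
    /**
     * Picks a fixed-length or streamed publisher.
     * knownLength: 0 = empty, positive = known size, -1 = unknown.
     * compressed: true if the supplied stream yields already-compressed bytes
     * whose final size the caller cannot know up front.
     */
    static BodyPublisher choose(Supplier<InputStream> body, long knownLength, boolean compressed) {
        if (knownLength == 0) {
            return BodyPublishers.noBody();            // contentLength() == 0
        }
        if (knownLength > 0 && !compressed) {
            // Fixed length: buffers the whole body in memory.
            try (InputStream in = body.get()) {
                return BodyPublishers.ofByteArray(in.readAllBytes());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        // Unknown or compressed: stream it; contentLength() == -1.
        return BodyPublishers.ofInputStream(body);
    }
}
```

The in-memory branch is where the large-file concern shows up: a known-length, uncompressed upload is read fully before the request is sent.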
@nafg I cut 0.9.0-RC1. Will leave it for a few weeks before cutting 0.9.0 final. If you email me your bank transfer details I can close out the bounty |
I don't know your email address |
***@***.***
…On Mon, 24 Jun 2024 at 5:22 PM, Naftoli Gugenheim ***@***.***> wrote:
I don't know your email address
|
@nafg https://github.com/lihaoyi in case the email address is censored, because I see this as the response |
huh, github censors emails??? |
Maybe only in email responses? 🤷 |
This would fix #123, and put us in a better state going forward.
HttpClient was standardized in Java 11, which is probably old enough that we can rely on it already.
HttpClient provides an async API, and we should expose it as requests.get.async, requests.post.async, etc.
Apart from that, the rest of the API and test suite should be unchanged. All existing tests should pass, with any unavoidable failures or necessary changes in the tests called out in the ticket.
This should also resolve #123
To incentivize contribution, I'm putting a 500USD bounty on resolving this ticket. This is payable via bank transfer, and at my discretion in case of ambiguity.