This repository has been archived by the owner on Nov 5, 2022. It is now read-only.

Consider Using a WebSocket for Publish Endpoint #57

Open
jtgeibel opened this issue Oct 29, 2019 · 2 comments

Comments

@jtgeibel
Member

So this may be a crazy idea, and it would probably take some effort to work into our existing middleware on the crates.io side, but it may be a path to removing a few papercuts in the publishing workflow. Those issues are:

  • Slow clients may not upload an entire crate within the 30 second response timeout on Heroku, causing the connection to be dropped.
  • Now that index updating is done in a background job, a crate isn't immediately available in the index. When publishing a group of crates from a script, the first crate will be published, but the second crate (which depends on the first) fails to publish because the background job hasn't completed yet.

The idea occurred to me when I was reading the Heroku docs on timeouts. As long as the server starts to send some response data within the first 30 seconds, longer term connections can be maintained. However, the server typically doesn't know the correct HTTP status code to respond with until the crate is accepted (all checks passed and the crate added to the background queue), so it isn't always possible to send any response data within the initial timeout.

However, if new clients request a connection upgrade, the server can respond with a 101 header and switch to a WebSocket connection. The server could send a message when the crate is accepted, and then another one once the index update is complete. If there is an unexpected delay in running the background job, the server could send a more descriptive message notifying the user that no further action is needed on their part, and then close the connection.
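To make the message flow concrete, here is a minimal sketch of the events the server might push over the upgraded connection. All names here are illustrative assumptions, not an actual crates.io API:

```rust
// Hypothetical events the server could push over the WebSocket during a
// publish. The variant names and text framing are made up for this sketch.
#[derive(Debug, PartialEq)]
enum PublishEvent {
    /// All checks passed; the crate has been added to the background queue.
    Accepted { name: String, version: String },
    /// The background job finished; the crate is usable as a dependency.
    IndexUpdated,
    /// The job is delayed, but no action is needed from the client.
    Delayed { detail: String },
}

/// Render an event as the text frame a client would receive.
fn frame_text(event: &PublishEvent) -> String {
    match event {
        PublishEvent::Accepted { name, version } => {
            format!("accepted {} v{}", name, version)
        }
        PublishEvent::IndexUpdated => "index-updated".to_string(),
        PublishEvent::Delayed { detail } => format!("delayed: {}", detail),
    }
}

fn main() {
    // A client publishing crate A, then crate B (which depends on A), would
    // wait for `IndexUpdated` on A's connection before publishing B.
    let accepted = PublishEvent::Accepted {
        name: "serde".into(),
        version: "1.0.0".into(),
    };
    println!("{}", frame_text(&accepted));
    println!("{}", frame_text(&PublishEvent::IndexUpdated));
}
```

The key point is the second event: a scripted multi-crate publish could block on `IndexUpdated` instead of retrying until the index catches up.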

Alternatives

The main alternative I'm aware of is to use a scheme where:

  • cargo sends metadata and crates.io sends an S3 signature that cargo uses to POST the crate directly to S3,
  • cargo notifies crates.io that the upload is complete, and
  • the server verifies the crate and enqueues the background job. (I guess the server would have to delete the upload if verification fails.)
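The three steps above can be sketched roughly as follows. Everything here is hypothetical (crates.io has no such endpoints, and the URL shape is invented); it only illustrates the shape of the scheme:

```rust
// Sketch of the direct-to-S3 alternative. All names and URLs are
// assumptions for illustration, not a real crates.io or S3 API surface.

struct UploadTicket {
    /// Pre-signed URL the client would POST the crate file to directly.
    presigned_url: String,
}

/// Step 1: the client sends metadata; the server answers with a
/// pre-signed upload location.
fn request_upload(crate_name: &str) -> UploadTicket {
    UploadTicket {
        presigned_url: format!(
            "https://example-bucket.s3.amazonaws.com/crates/{}",
            crate_name
        ),
    }
}

/// Step 3: after the client reports the upload complete, the server
/// verifies the object and either enqueues the background job or, on
/// failure, must clean up the uploaded object.
fn verify_and_enqueue(checksum_ok: bool) -> Result<&'static str, &'static str> {
    if checksum_ok {
        Ok("enqueued background index job")
    } else {
        Err("verification failed; deleting uploaded object")
    }
}

fn main() {
    let ticket = request_upload("serde");
    println!("POST crate file to: {}", ticket.presigned_url);
    // Step 2 (the completion notification) happens between these calls.
    println!("{:?}", verify_and_enqueue(true));
}
```

Note the failure path: because the bytes land in S3 before verification, the server takes on the extra responsibility of deleting rejected uploads.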

Drawbacks

The main drawback is that this adds complexity, especially in the server. The current design assumes a complete response value is returned up the middleware layers, and there is no mechanism for maintaining a long term connection. It may be possible to bolt something on in a reasonable way, and it may be blocked on switching to hyper in production.

@smarnach
Contributor

Just for completeness, another alternative would be to move away from Heroku. I'm not saying we should, but I do notice that we are fighting Heroku's limitations rather often (e.g. having to move the whole app behind CloudFront, noisy neighbours, having to worry about the memory increase caused by Fastboot, runtime configuration, the 30-second limit, among other problems I currently don't remember).

Currently we don't have the capacity to move to anything else, nor would we want to, but I expect that the growing traffic will force us to move to a cheaper option within the next few years.

@ehuss mentioned this issue May 8, 2020
@Nemo157
Member

Nemo157 commented May 25, 2020

Another, another alternative would be to send 100 Continue responses periodically while the upload is in progress (though this is also something I haven't seen supported by any Rust web servers, since they all want a single response per request).
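For reference, here is what that would look like on the wire: the server writes one or more interim `100 Continue` status lines while bytes are still arriving, followed by the single final response. This is a sketch of hypothetical server behavior; as noted, typical Rust HTTP servers don't expose a way to emit repeated interim responses:

```rust
// Wire-format sketch of periodic interim responses during a long upload.
// Hypothetical server behavior; HTTP/1.1 defines 1xx responses as interim,
// but emitting them repeatedly like this is not a standard server feature.

/// The interim status line written while the upload is still in flight.
fn interim_continue() -> &'static str {
    "HTTP/1.1 100 Continue\r\n\r\n"
}

/// The single final response once the crate is accepted and queued.
fn final_response(body: &str) -> String {
    format!(
        "HTTP/1.1 200 OK\r\nContent-Length: {}\r\n\r\n{}",
        body.len(),
        body
    )
}

fn main() {
    // Keep the Heroku router happy by writing something every so often...
    for _ in 0..3 {
        print!("{}", interim_continue());
    }
    // ...then send the real status once the outcome is known.
    print!("{}", final_response("ok"));
}
```

The appeal is that, unlike the WebSocket idea, the client needs no protocol change beyond tolerating interim responses; the drawback is the same framework limitation called out above.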
