S3: scrounging bandwidth for uploads #2103
Comments
It'd also be super nifty if we had a way to track the progress of background tasks in the admin console.
Good idea, this would be helpful!
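A minimal sketch of one way such progress tracking could surface in the Django admin; the model and field names here are hypothetical, not existing PhysioNet code:

```python
# Hypothetical model for exposing background-task progress in the Django admin.
from django.contrib import admin
from django.db import models


class BackgroundTask(models.Model):
    name = models.CharField(max_length=200)
    total_items = models.PositiveIntegerField(default=0)
    completed_items = models.PositiveIntegerField(default=0)
    updated_at = models.DateTimeField(auto_now=True)

    def progress(self):
        # Render a percentage for the admin list view.
        if not self.total_items:
            return "n/a"
        return f"{100 * self.completed_items // self.total_items}%"


@admin.register(BackgroundTask)
class BackgroundTaskAdmin(admin.ModelAdmin):
    list_display = ("name", "progress", "updated_at")
    readonly_fields = ("total_items", "completed_items")
```

The upload worker would then increment completed_items as it finishes each project.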
Any thoughts on how we should manage the network traffic? Uploading hundreds of projects will require some automation in any event. But I'd prefer to do so using Chrystinne's code rather than trying to script it in some other way.
Sorry, not my area of expertise! Personally I think I'd just take a short-term hit on our network, perhaps alongside a news item explaining why downloads are slow.
Thinking about this a little more, my preference would be:
This seems like an approach that may be useful in the longer term (rather than a one-off, just for the initial batch of uploads to AWS).
I don't think that's practical, though. When I say "competing for bandwidth", I mean that uploading to Amazon would be limited to the same speed as everyone else; uploading 30 TB would take months. It's true that in theory we could monkey with the traffic control settings to prioritize certain connections over others, but that's difficult and finicky and I don't want to try to deal with it.
To set a custom proxy, something like this should work:
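A minimal sketch of that kind of configuration, assuming boto3; the proxy address, bucket, and file names are placeholders:

```python
# Minimal sketch: point only the S3 client at an HTTP proxy.
# The proxy address and bucket/key names below are hypothetical.
import boto3
from botocore.config import Config

s3 = boto3.client(
    "s3",
    config=Config(proxies={"https": "http://proxy.example.org:3128"}),
)

# Uploads made through this client go out via the proxy's network link,
# while other outbound traffic keeps using the default route.
s3.upload_file("/data/demo-project.zip", "example-bucket", "demo-project.zip")
```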
In pull request #2086, one thing that worries me a little is that uploading projects will initially be very slow. At present, the upload process would have to compete for bandwidth with all of the clients currently downloading data.
I can think of a couple of workarounds:
1. We could do the uploads from the backup (physionet-production) server. One advantage is that it's in a completely different physical location. However, it would be a messy manual process and we would probably need to manually update the database on physionet-live.
2. We could configure the S3 client to use an HTTP proxy (via a separate network link, albeit from the same building). In fact we could set a single proxy server for everything (GCP, DataCite, ORCID, as well as AWS), but I think it might be preferable to configure S3 separately (the difference is sketched below).
One thing I don't want to do is to prioritize uploads over client requests.
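To illustrate the trade-off (a sketch only; the proxy URL is a placeholder): a proxy set through the standard environment variables is honored by libraries such as requests and botocore, so it would route GCP, DataCite, and ORCID traffic too, whereas a proxy passed in the S3 client's config applies only to the AWS uploads.

```python
# Sketch contrasting a global proxy with an S3-only proxy (URL is a placeholder).
import os

import boto3
from botocore.config import Config

PROXY_URL = "http://proxy.example.org:3128"

# Option 1: one proxy for everything. Libraries such as requests and botocore
# honor these environment variables, so DataCite, ORCID, GCP, and AWS calls
# would all go through the proxy.
os.environ["HTTPS_PROXY"] = PROXY_URL

# Option 2: proxy only the S3 client. Other outbound traffic is unaffected.
s3 = boto3.client("s3", config=Config(proxies={"https": PROXY_URL}))
```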