Adapt S3 TransferUtility for unstable network conditions #616
Comments
@hipwelljo Sorry for the delayed response. I am splitting this question into two separate issues:
Issue-1 (FEATURE-REQUEST): TransferUtility does not retry failed transfers. This feature is currently not supported by TransferUtility in Android. I will take this feature request to the team for prioritization. As a workaround, I suggest retrying the upload/download when you encounter a failure. You can decide the condition for success as:
Issue-2 (BUG): There is an inconsistency in the error/state/progress reporting mechanism in TransferUtility. If I understand correctly, the transfer progress is 100%, with bytesTransferred equal to bytesTotal, yet the upload failed with onStateChanged reporting the FAILED state and onError was not invoked. If it is a multi-part upload (>= 5MB), my suspicion is that the multi-part complete request failed because the network connection was poor. In that case it is a bug and I will look into the reporting mechanism. Can I know the size of the file being uploaded? |
Thanks for the info on item 1! For item 2, yes that is exactly correct - the transfer progress is reported to be 100% where transferred equals the total even though the upload did fail, onStateChanged reported the FAILED state, and onError was not invoked. It is however not a multipart upload since the file size is under 1 MB - only 122 KB in that example. |
@hipwelljo Thank you for the response. Can you paste the logs from Logcat when the item#2 occurs? I couldn't reproduce the issue so it would be beneficial if you could share the logs to see where the failure is. |
Oh interesting it looks like it's trying to do a multipart even though this photo was just 58 KB.
|
FWIW, I reproduced that last one using the Network Link Conditioner in macOS with the android emulator, not a physical device this time. |
@hipwelljo From the stacktrace, I see Do you have |
Ok good catch. Yes, TransferService is in the manifest like so: Yes, it is started from the first activity - see |
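@hipwelljo I would suggest starting the service from an Application subclass:
import android.app.Application;
import android.content.Intent;
import com.amazonaws.mobileconnectors.s3.transferutility.TransferService;

public class MyApplication extends Application {
    // Overriding this method is totally optional!
    @Override
    public void onCreate() {
        super.onCreate();
        // Required initialization logic here!
        // Start the transfer (network) service once, when the app process starts.
        getApplicationContext().startService(new Intent(getApplicationContext(), TransferService.class));
    }
}
Can you try with this? |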
Okay I've moved that startService call into an Application subclass, but I continue to get the same outcome. Do you recommend moving the initialize call into there as well, or keep that in the first activity? |
In reproducing this issue more today, I am seeing the upload does in fact succeed, which explains why it reports 100% progress, so it seems the only issue is the incorrect state update of FAILED when it should be COMPLETED. |
@hipwelljo We can debug this issue in the call. |
To document this for anyone following along, as discussed on the call, it was suspected maybe the poor network results in a failure to verify the photo was uploaded to s3 due to a timeout - s3 has a 20s socket timeout. That could maybe explain why the photo was uploaded successfully yet the sdk couldn't confirm that and thus it reports it FAILED. Karthik is investigating to see if another request is made for single-part uploads. As a temporary workaround, it was suggested to check for a FAILED state and then find out if the object exists in s3 via a metadata HEAD request and if not then retry it. |
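The temporary workaround described above (on FAILED, check whether the object exists via a metadata HEAD request, and retry only if it does not) could be sketched roughly as follows. This is a minimal illustration, not SDK code: `FailedUploadReconciler`, `Action`, and the `existsInBucket` predicate are hypothetical names. In an app, the predicate is assumed to wrap a metadata lookup against S3.

```java
import java.util.function.Predicate;

/** Hypothetical helper (not part of the SDK): decides what to do after a transfer reports FAILED. */
final class FailedUploadReconciler {
    enum Action { MARK_COMPLETED, RETRY }

    private FailedUploadReconciler() {}

    /**
     * @param key            object key of the upload that reported FAILED
     * @param existsInBucket assumed to wrap a metadata (HEAD) lookup; true if the object is in S3
     */
    static Action onFailed(String key, Predicate<String> existsInBucket) {
        try {
            // The upload may have landed even though the SDK could not confirm it.
            return existsInBucket.test(key) ? Action.MARK_COMPLETED : Action.RETRY;
        } catch (RuntimeException lookupFailed) {
            // On a poor connection the lookup itself can fail; retrying is the safe default
            // in this sketch, since re-uploading the same key just overwrites the object.
            return Action.RETRY;
        }
    }
}
```

The lookup is kept behind a predicate so the decision logic can be exercised without a network connection.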
I can confirm that we do not make an additional request for single-part uploads. The retry functionality is a feature request for |
Thanks for checking into that @kvasukib! Do you have any other ideas as to why sometimes the state changes to FAILED even though the photo did upload to S3 on a poor connection like edge? |
@hipwelljo I would check the device, API level and network conditions. Can you try the same experiment on a different device or under different network conditions? You could also check whether there are any firewall policies set on the network. |
@kvasukib From what I've seen, the issue only occurs under poor network conditions such as Edge data speeds (but many of our customers will be in that scenario). I only have one physical device available to test with, but I can reproduce the issue on emulators as well (Pixel and Pixel 2; let me know if you would like me to try others). I can confirm there aren't any firewall policies set on the network, and I have reproduced it on two different WiFi networks. |
@hipwelljo @kvasukib Hi guys, first of all, sorry for my English. I have the same issue when simulating 3G conditions (1024 kbps upload bandwidth). I receive the "Failed" signal, but observing with a proxy I can see that all parts have been "Completed". Very strange. However, I stop receiving "InProgress" updates from the SDK. The size of the uploaded video is 110 MB. Is there any solution or alternative workaround? Thanks in advance. Timeout log |
@gilbertdigio Sorry for the inconvenience caused. |
@kvasukib Every time I repeat the test under those conditions. Also increasing the bandwidth to 2048 kbps. I'm using 2.11.0 version of the SDK. |
I have fixed this issue in Root cause analysis:
Description of the fix: The fix involves
See 2.11.1 for more information. |
Hi @kvasukib, I have updated to 2.11.1. I have also instantiated the network loss handler:
TransferNetworkLossHandler.getInstance(context.applicationContext)
sTransferUtility = TransferUtility.builder().context(context.applicationContext)
    .s3Client(getS3Client(context.applicationContext))
    .build()
I expected to receive the WAITING_FOR_NETWORK state, but I receive the FAILED state instead. Maybe I'm missing something. I receive multiple timeout errors from the pending parts of the transfers. However, I'm watching the transfer with a proxy and none of the pending parts is marked as an error. Timeout log
Any suggestions? Thanks in advance, and sorry for my English. |
@hipwelljo Thank you for the detailed feedback. We only detect when the network disconnects and currently |
Okay. Are you working towards a fix for these issues we're encountering reported here?
|
@hipwelljo When the state is reported as FAILED, do you get the |
Hello @kvasukib
|
@kvasukib Yes, |
@hipwelljo @dmillermx From the logs (onProgressChanged: 6, bytesCurrent=765862, bytesTotal=765862) I can see that the bytesCurrent and bytesTotal are equal and the file has been uploaded successfully but the TransferUtility reported the transfer as When you get the onError callback, what is the exception that is being passed into the onError callback? |
@kvasukib, in the onError callback we see the following in the log: Thank you |
@dmillermx Does this happen for all transfers (single-part, multi-part uploads and downloads)? Currently, our logic when the transfer errors out with an exception is the following:
We don't have a mechanism at this point to detect a poor network condition when the transfer errors out. We are only able to detect when the network is completely disconnected. |
Yes, that same exception dmillermx posted. This is single part uploads since all our photo uploads are small file sizes. We don't utilize the others in this app. This scenario where the state changes to FAILED even though it may have actually succeeded on a bad connection isn't as worrying to me as the issue where sometimes retrying the photo upload will neither fail nor succeed, just gets lost. |
@hipwelljo Can you elaborate on what you mean when the upload gets lost? |
As shown in the logs in this reply, from my 6th test, I retried the upload but nothing ever happened after that (I left it for 10 minutes) - state never changed to IN_PROGRESS or FAILED. So the end result is the UI shows a progress bar filled to 100% but it will never complete. |
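One app-level way to surface a retry that "neither fails nor succeeds" is to track when the last listener callback arrived and flag the transfer as stalled after a quiet period. The sketch below is only an illustration under that assumption: `StallDetector` is a hypothetical class, not an SDK feature, and the timeout value is up to the app.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper: flags a transfer as stalled when no callback has arrived for a while. */
final class StallDetector {
    private final long timeoutMillis;
    private final LongSupplier clockMillis;
    private long lastEventMillis;

    StallDetector(long timeoutMillis, LongSupplier clockMillis) {
        this.timeoutMillis = timeoutMillis;
        this.clockMillis = clockMillis;
        this.lastEventMillis = clockMillis.getAsLong();
    }

    /** Call from every state-change or progress callback of the transfer. */
    synchronized void recordEvent() {
        lastEventMillis = clockMillis.getAsLong();
    }

    /** True once at least timeoutMillis have passed since the last recorded callback. */
    synchronized boolean isStalled() {
        return clockMillis.getAsLong() - lastEventMillis >= timeoutMillis;
    }
}
```

In an app, the clock would typically be `System::currentTimeMillis`, and a periodic check could cancel and re-issue a transfer that reports as stalled instead of leaving the progress bar at 100% indefinitely.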
During testing with emulation of bad connection it happens always - 100% for single-part and for multi-part uploads. |
@dmillermx @hipwelljo I will take this to the team as a feature request, since we may have to implement a mechanism that retries the transfer under poor network conditions by detecting the request timeout. Meanwhile, we suggest that you retry the transfer based on the exception thrown in the |
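An app-level retry keyed on the exception, as suggested above, might look roughly like the sketch below. The assumptions are loud ones: the exact exception from the thread's logs is not shown here, so this sketch treats a `java.net.SocketTimeoutException` anywhere in the cause chain as the "request timeout" signal. `TimeoutRetry` is a hypothetical helper, and the synchronous `Callable` stands in for re-issuing an asynchronous transfer from the error callback.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

/** Hypothetical helper: retries an operation a bounded number of times when it times out. */
final class TimeoutRetry {
    private TimeoutRetry() {}

    /** Assumption: a SocketTimeoutException anywhere in the cause chain means "request timed out". */
    static boolean isTimeout(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof SocketTimeoutException) {
                return true;
            }
        }
        return false;
    }

    /** Runs the attempt up to maxAttempts times, backing off exponentially between timeouts. */
    static <T> T callWithRetry(Callable<T> attempt, int maxAttempts, long baseDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int i = 1; ; i++) {
            try {
                return attempt.call();
            } catch (Exception e) {
                // Give up on the last attempt, or immediately on non-timeout errors.
                if (i >= maxAttempts || !isTimeout(e)) {
                    throw e;
                }
                // Exponential backoff; the shift is capped to avoid overflow.
                Thread.sleep(baseDelayMillis << Math.min(i - 1, 10));
            }
        }
    }
}
```

Bounding the attempts keeps a permanently bad connection from retrying forever; non-timeout errors are rethrown immediately so they are not masked.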
@kvasukib Thank you, we'll try. |
@kvasukib I am already retrying the transfer manually when the state changes to I retry the upload like so:
|
@hipwelljo @dmillermx Sorry for the delayed response. Can you try attaching a new My solution for this problem would be to retry the transfer when this exception occurs. The retry can be done at the SDK level or at the app level with the resume functionality. Regardless of whether the solution is in the SDK or the app, the best we can do is retry the transfer N times or until it eventually succeeds. On a side note, can you find the number of available processors on the device where you are experiencing this problem? When you do a multi-part upload, we create two thread pools: Pool-1 of size N + 1 and Pool-2 of size N + 1, where N is the number of available processors on the mobile device found through ( |
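For readers who want to check the N + 1 sizing on their own device, the arithmetic can be sketched as below. This is not the SDK's code: the call the SDK uses is cut off in the comment above, so `Runtime.getRuntime().availableProcessors()` (the standard Java API for this value) is an assumption, and `PoolSizing` is a hypothetical name.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration of "N + 1" pool sizing; not taken from the SDK source. */
final class PoolSizing {
    private PoolSizing() {}

    /** N + 1, where N is assumed to come from the standard Runtime API. */
    static int poolSize() {
        return Runtime.getRuntime().availableProcessors() + 1;
    }

    /** A fixed-size pool with poolSize() threads and an unbounded work queue. */
    static ThreadPoolExecutor newPool() {
        int size = poolSize();
        return new ThreadPoolExecutor(
                size, size, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    }
}
```

Logging `PoolSizing.poolSize()` on the affected device gives the figure asked for here, plus one.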
Hi @kvasukib I was able to look at this again. I should always be performing single-part uploads because the file sizes are so small (<200 KB). I attached a new TransferListener for each retry, instead of |
This still seems to be an issue. Setting an emulator into a poor network state will make the transfer fail (which is expected). Once a good network is reestablished the transfer will resume itself and complete. However, the |
…when network disconnects (aws-amplify#865)
Hi @Shakezulla57,
This is an unrelated multipart upload bug that may have been resolved recently in #1536 . Would you be able to check if you are experiencing this issue with our newest SDK release and submit a new ticket? Thank you. |
Describe the bug
I am testing uploading photos to S3 on a very bad network connection. I am finding that the transfer starts, reports 0% progress then 100% but then the upload state changes to FAILED and yet onError is not called. This is surprising to me. I can confirm the upload did not succeed. It never appears to attempt a retry or otherwise report any other information about this transfer.
What is the expected behavior here? How can I ensure that photo uploads don't get "lost" - do I need to manually retry them? The iOS SDK automatically retries them and I configured the retryLimit to be 10, is there a similar approach with the Android SDK? Thanks!
To Reproduce
Steps to reproduce the behavior:
Which AWS service(s) are affected?
S3 TransferUtility
Expected behavior
The upload should succeed, and retry if it fails, eventually succeeding. It should not report 100% progress unless the upload succeeded.
Environment(please complete the following information):
Device Information (please complete the following information):
Logs
Code
If helpful, this is how I initialize it in my startup activity: