Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adapt S3 TransferUtility for unstable network conditions #616

Closed
hipwelljo opened this issue Dec 13, 2018 · 60 comments
Closed

Adapt S3 TransferUtility for unstable network conditions #616

hipwelljo opened this issue Dec 13, 2018 · 60 comments
Labels
feature-request Request a new feature s3 Issues with the AWS Android SDK for Simple Storage Service (S3).

Comments

@hipwelljo
Copy link

hipwelljo commented Dec 13, 2018

Describe the bug
I am testing uploading photos to S3 on a very bad network connection. I am finding that the transfer starts, reports 0% progress then 100% but then the upload state changes to FAILED and yet onError is not called. This is surprising to me. I can confirm the upload did not succeed. It never appears to attempt a retry or otherwise report any other information about this transfer.

What is the expected behavior here? How can I ensure that photo uploads don't get "lost" - do I need to manually retry them? The iOS SDK automatically retries them and I configured the retryLimit to be 10, is there a similar approach with the Android SDK? Thanks!

To Reproduce
Steps to reproduce the behavior:

  1. Connect to iPhone hotspot with Network Link Conditioner enabled and set to Edge
  2. Begin uploading photo
  3. Switch the Network Link Conditioner profile to Very Bad Network

Which AWS service(s) are affected?
S3 TransferUtility

Expected behavior
The upload should succeed, and retry if it fails, eventually succeeding. It should not report 100% progress unless the upload succeeded.

Environment(please complete the following information):

  • SDK Version: 2.9.0

Device Information (please complete the following information):

  • Device: Google pixel
  • Android Version: Nougat 7.1.2
  • Specific to simulators: No

Logs

2018-12-13 16:12:47.594 13106-13106/com.x.x D/RestHelper: ID:7 bytesCurrent: 0 bytesTotal: 125024 0%
2018-12-13 16:13:11.970 13106-13106/com.x.x D/RestHelper: ID:7 bytesCurrent: 125024 bytesTotal: 125024 100%
2018-12-13 16:13:27.002 13106-13106/com.x.x D/RestHelper: ID:7 bytesCurrent: 125024 bytesTotal: 125024 100%
2018-12-13 16:13:27.027 13106-13106/com.x.x D/RestHelper: Upload state changed to FAILED

Code

        TransferUtility transferUtility =
                TransferUtility.builder()
                        .context(context.getApplicationContext())
                        .awsConfiguration(AWSMobileClient.getInstance().getConfiguration())
                        .s3Client(new AmazonS3Client(AWSMobileClient.getInstance()))
                        .build();

        final String filename = "image_" + randomAlphaNumeric(10) + ".jpg";

        File file = scaleImage(uri, context.getCacheDir().getPath() + File.separator + filename);

        TransferObserver transferObserver = transferUtility.upload("bucketname", filename, file);
        transferObserver.setTransferListener(new TransferListener() {
            @Override
            public void onStateChanged(int id, TransferState state) {
                Log.d(TAG, "Upload state changed to " + state);
            }

            @Override
            public void onProgressChanged(int id, long bytesCurrent, long bytesTotal) {
                float percentDonef = ((float) bytesCurrent / (float) bytesTotal) * 100;
                int percentDone = (int)percentDonef;

                Log.d(TAG, "ID:" + id + " bytesCurrent: " + bytesCurrent
                        + " bytesTotal: " + bytesTotal + " " + percentDone + "%");
            }

            @Override
            public void onError(int id, Exception ex) {
                Log.e(TAG, ex);
            }
        });

If helpful, this is how I initialize it in my startup activity:

        AWSMobileClient.getInstance().initialize(getApplicationContext(), new Callback<UserStateDetails>() {
            @Override
            public void onResult(UserStateDetails result) {
                Log.d(TAG, "AWSMobileClient initialized. User State is " + result.getUserState());

                TransferUtility transferUtility =
                        TransferUtility.builder()
                                .context(getApplicationContext())
                                .awsConfiguration(AWSMobileClient.getInstance().getConfiguration())
                                .s3Client(new AmazonS3Client(AWSMobileClient.getInstance()))
                                .build();
                transferUtility.resumeAllWithType(UPLOAD); //resume any uploads that were paused on app termination
            }

            @Override
            public void onError(Exception e) {
                Log.e(TAG, "Initialization error.", e);
            }
        });

        getApplicationContext().startService(new Intent(getApplicationContext(), TransferService.class));
@sunchunqiang sunchunqiang added s3 Issues with the AWS Android SDK for Simple Storage Service (S3). feature-request Request a new feature labels Dec 14, 2018
@mutablealligator
Copy link
Contributor

mutablealligator commented Dec 23, 2018

@hipwelljo Sorry for the delayed response. I am splitting this question into two separate issues:

Issue-1: (FEATURE-REQUEST) TransferUtility does not retry the failed transfers. This is a feature that is currently not supported by TransferUtility in Android. I will take this feature request to the team for prioritization. As a workaround, I would suggest you to retry the upload/download when you encounter a failure. You can decide the condition for success as: STATE == TransferState.COMPLETED && PROGRESS_PERCENTAGE == 100 and if it's not met it should be a failure. In the failure case, you can retry the transfer by calling the upload and download methods again.

Issue-2: (BUG) There is inconsistency in the error/state/progress reporting mechanism in TransferUtility. If I understand correctly, the transfer progress is 100% where bytesTransferred is equal to bytesTotal, however the upload failed with onStateChanged reporting FAILED state and onError was not invoked. If it's multi-part upload (>= 5MB), my suspicion would be that the multi-part complete request might have failed when the network connection is poor. In this case, it should be a bug and I will look into the reporting mechanism.

Can I know the size of the file being uploaded?

@mutablealligator mutablealligator added the pending-community-response Issue is pending response from the issue requestor label Dec 27, 2018
@hipwelljo
Copy link
Author

Thanks for the info on item 1! For item 2, yes that is exactly correct - the transfer progress is reported to be 100% where transferred equals the total even though the upload did fail, onStateChanged reported the FAILED state, and onError was not invoked. It is however not a multipart upload since the file size is under 1 MB - only 122 KB in that example.

@mutablealligator
Copy link
Contributor

mutablealligator commented Dec 28, 2018

@hipwelljo Thank you for the response. Can you paste the logs from Logcat when the item#2 occurs? I couldn't reproduce the issue so it would be beneficial if you could share the logs to see where the failure is.

@hipwelljo
Copy link
Author

Oh interesting it looks like it's trying to do a multipart even though this photo was just 58 KB.

12-28 16:22:02.513 24459-24459/com.mycompany.myapp D/RestHelper: Upload state changed to IN_PROGRESS
12-28 16:22:02.518 24459-24500/com.mycompany.myapp D/EGL_emulation: eglMakeCurrent: 0xb4369100: ver 2 0
12-28 16:22:02.530 24459-24500/com.mycompany.myapp D/EGL_emulation: eglMakeCurrent: 0xb4369100: ver 2 0
12-28 16:22:02.530 24459-24459/com.mycompany.myapp D/RestHelper: ID:2 bytesCurrent: 0 bytesTotal: 59380 0%
12-28 16:22:02.537 24459-24500/com.mycompany.myapp D/EGL_emulation: eglMakeCurrent: 0xb4369100: ver 2 0
12-28 16:22:02.538 24459-24500/com.mycompany.myapp D/OpenGLRenderer: endAllStagingAnimators on 0xa33e5e00 (RippleDrawable) with handle 0xa31f4fd0
12-28 16:22:02.844 24459-24500/com.mycompany.myapp D/EGL_emulation: eglMakeCurrent: 0xb4369100: ver 2 0
12-28 16:22:06.551 24459-24459/com.mycompany.myapp D/RestHelper: ID:2 bytesCurrent: 59380 bytesTotal: 59380 100%
12-28 16:22:21.578 24459-24459/com.mycompany.myapp D/RestHelper: ID:2 bytesCurrent: 59380 bytesTotal: 59380 100%
12-28 16:22:21.580 24459-24683/com.mycompany.myapp E/UploadTask: Error encountered during multi-part upload: 2 due to Unable to execute HTTP request: Read timed out
    com.amazonaws.AmazonClientException: Unable to execute HTTP request: Read timed out
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:441)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:212)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4659)
        at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1752)
        at com.amazonaws.mobileconnectors.s3.transferutility.UploadTask.uploadSinglePartAndWaitForCompletion(UploadTask.java:221)
        at com.amazonaws.mobileconnectors.s3.transferutility.UploadTask.call(UploadTask.java:99)
        at com.amazonaws.mobileconnectors.s3.transferutility.UploadTask.call(UploadTask.java:55)
        at java.util.concurrent.FutureTask.run(FutureTask.java:237)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587)
        at java.lang.Thread.run(Thread.java:818)
     Caused by: java.net.SocketTimeoutException: Read timed out
        at com.android.org.conscrypt.NativeCrypto.SSL_read(Native Method)
        at com.android.org.conscrypt.OpenSSLSocketImpl$SSLInputStream.read(OpenSSLSocketImpl.java:699)
        at com.android.okio.Okio$2.read(Okio.java:113)
        at com.android.okio.RealBufferedSource.indexOf(RealBufferedSource.java:147)
        at com.android.okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:94)
        at com.android.okhttp.internal.http.HttpConnection.readResponse(HttpConnection.java:179)
        at com.android.okhttp.internal.http.HttpTransport.readResponseHeaders(HttpTransport.java:101)
        at com.android.okhttp.internal.http.HttpEngine.readResponse(HttpEngine.java:628)
        at com.android.okhttp.internal.http.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:388)
        at com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:332)
        at com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponseMessage(HttpURLConnectionImpl.java:496)
        at com.android.okhttp.internal.http.DelegatingHttpsURLConnection.getResponseMessage(DelegatingHttpsURLConnection.java:109)
        at com.android.okhttp.internal.http.HttpsURLConnectionImpl.getResponseMessage(HttpsURLConnectionImpl.java:25)
        at com.amazonaws.http.UrlHttpClient.createHttpResponse(UrlHttpClient.java:92)
        at com.amazonaws.http.UrlHttpClient.execute(UrlHttpClient.java:85)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:371)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:212) 
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4659) 
        at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1752) 
        at com.amazonaws.mobileconnectors.s3.transferutility.UploadTask.uploadSinglePartAndWaitForCompletion(UploadTask.java:221) 
        at com.amazonaws.mobileconnectors.s3.transferutility.UploadTask.call(UploadTask.java:99) 
        at com.amazonaws.mobileconnectors.s3.transferutility.UploadTask.call(UploadTask.java:55) 
        at java.util.concurrent.FutureTask.run(FutureTask.java:237) 
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) 
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) 
        at java.lang.Thread.run(Thread.java:818) 
12-28 16:22:21.603 24459-24459/com.mycompany.myapp D/RestHelper: Upload state changed to FAILED

@hipwelljo
Copy link
Author

FWIW, I reproduced that last one using the Network Link Conditioner in macOS with the android emulator, not a physical device this time.

@mutablealligator
Copy link
Contributor

@hipwelljo From the stacktrace, I see uploadSinglePartAndWaitForCompletion which means its single-part. The error message might be misleading as it has the word multi-part. Sorry about that. I will check on the error message.

Do you have TransferService declared in the AndroidManifest? Are you starting the service from your app?

@mutablealligator mutablealligator added the bug Something isn't working label Dec 28, 2018
@hipwelljo
Copy link
Author

Ok good catch.

Yes, TransferService is in the manifest like so:
<service android:name="com.amazonaws.mobileconnectors.s3.transferutility.TransferService" android:enabled="true" />

Yes, it is started from the first activity - see startService in the original report in this thread - the last line there.

@mutablealligator
Copy link
Contributor

@hipwelljo I would suggest to start the service from Application.onCreate()

public class MyApplication extends Application {
    // Overriding this method is totally optional!
    @Override
    public void onCreate() {
        super.onCreate();
        // Required initialization logic here!

        // Network service
        getApplicationContext().startService(new Intent(getApplicationContext(), TransferService.class));
    }
}

Can you try with this?

@hipwelljo
Copy link
Author

Okay I've moved that startService call into an Application subclass, but I continue to get the same outcome. Do you recommend moving the initialize call into there as well, or keep that in the first activity?

@hipwelljo
Copy link
Author

In reproducing this issue more today, I am seeing the upload does in fact succeed, which explains why it reports 100% progress, so it seems the only issue is the incorrect state update of FAILED when it should be COMPLETED.

@mutablealligator
Copy link
Contributor

@hipwelljo We can debug this issue in the call.

@minbi minbi removed the pending-community-response Issue is pending response from the issue requestor label Dec 31, 2018
@hipwelljo
Copy link
Author

To document this for anyone following along, as discussed on the call, it was suspected maybe the poor network results in a failure to verify the photo was uploaded to s3 due to a timeout - s3 has a 20s socket timeout. That could maybe explain why the photo was uploaded successfully yet the sdk couldn't confirm that and thus it reports it FAILED. Karthik is investigating to see if another request is made for single-part uploads. As a temporary workaround, it was suggested to check for a FAILED state and then find out if the object exists in s3 via a metadata HEAD request and if not then retry it.

@mutablealligator
Copy link
Contributor

I can confirm that we do not make additional request for single-part uploads. The retry functionality is a feature request for TransferUtility in Android. Meanwhile as @hipwelljo pointed out, one could check for the FAILED state and resume or start a new upload transfer.

@hipwelljo
Copy link
Author

Thanks for checking into that @kvasukib! Do you have any other ideas as to why sometimes the state changes to FAILED even though the photo did upload to S3 on a poor connection like edge?

@mutablealligator
Copy link
Contributor

@hipwelljo I would check the device, API level and network conditions. Can you try the same experiment on a different device or network condition? You could also check if there any firewall policies set on the network?

@hipwelljo
Copy link
Author

hipwelljo commented Jan 10, 2019

@kvasukib The issue only occurs with a poor network condition like Edge data speeds (but many of our customers will be in that scenario) that I've seen. I do only have the one physical device available to test with, besides emulators which I can reproduce this issue on those as well (Pixel and Pixel 2, let me know if you would like me to try others). I can confirm there's not any firewall policies set on the network and have reproduced it on two different WiFi networks.

@gilbertdigio
Copy link

@hipwelljo @kvasukib Hi guys, first of all sorry for my English.

I have the same issue simulating 3G conditions (1024 kbps Upload bandwidth).
I think resume or restart the failed upload is not worth it. Under the same conditions timeout will take place again.

I receive "Failed" signal but observing with a proxy I can see that all parts have been "Completed". Very strange. However I stop receiving "InProgress" updates from the SDK.

The size of the uploaded video is 110 Mb.

Any solution or alternative workaround? Thanks in advance.

Timeout log
01-25 10:49:35.041 31095-31521/io.n3n.cubicam E/UploadPartTask: Encountered error uploading part com.amazonaws.AmazonClientException: Unable to execute HTTP request: timeout at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:441) at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:212) at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4655) at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3672) at com.amazonaws.mobileconnectors.s3.transferutility.UploadPartTask.call(UploadPartTask.java:51) at com.amazonaws.mobileconnectors.s3.transferutility.UploadPartTask.call(UploadPartTask.java:29) at java.util.concurrent.FutureTask.run(FutureTask.java:237) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1133) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:607) at java.lang.Thread.run(Thread.java:761) Caused by: java.net.SocketTimeoutException: timeout at com.android.okhttp.okio.Okio$3.newTimeoutException(Okio.java:212) at com.android.okhttp.okio.AsyncTimeout.exit(AsyncTimeout.java:250) at com.android.okhttp.okio.AsyncTimeout$2.read(AsyncTimeout.java:217) at com.android.okhttp.okio.RealBufferedSource.indexOf(RealBufferedSource.java:306) at com.android.okhttp.okio.RealBufferedSource.indexOf(RealBufferedSource.java:300) at com.android.okhttp.okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:196) at com.android.okhttp.internal.http.HttpConnection.readResponse(HttpConnection.java:191) at com.android.okhttp.internal.http.HttpTransport.readResponseHeaders(HttpTransport.java:80) at com.android.okhttp.internal.http.HttpEngine.readNetworkResponse(HttpEngine.java:906) at com.android.okhttp.internal.http.HttpEngine.readResponse(HttpEngine.java:782) at com.android.okhttp.internal.huc.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:463) at com.android.okhttp.internal.huc.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:405) at com.android.okhttp.internal.huc.HttpURLConnectionImpl.getResponseMessage(HttpURLConnectionImpl.java:517) at com.android.okhttp.internal.huc.DelegatingHttpsURLConnection.getResponseMessage(DelegatingHttpsURLConnection.java:109) at com.android.okhttp.internal.huc.HttpsURLConnectionImpl.getResponseMessage(HttpsURLConnectionImpl.java) at com.amazonaws.http.UrlHttpClient.createHttpResponse(UrlHttpClient.java:92) at com.amazonaws.http.UrlHttpClient.execute(UrlHttpClient.java:85) at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:371) at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:212)  at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4655)  at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3672)  at com.amazonaws.mobileconnectors.s3.transferutility.UploadPartTask.call(UploadPartTask.java:51)  at com.amazonaws.mobileconnectors.s3.transferutility.UploadPartTask.call(UploadPartTask.java:29)  at java.util.concurrent.FutureTask.run(FutureTask.java:237)  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1133)  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:607)  at java.lang.Thread.run(Thread.java:761) 

@mutablealligator
Copy link
Contributor

@gilbertdigio Sorry for the inconvenience caused.
Do this happen every time you upload the file?
What version of the SDK are you using?

@gilbertdigio
Copy link

@kvasukib Every time I repeat the test under those conditions. Also increasing the bandwidth to 2048 kbps.

I'm using 2.11.0 version of the SDK.

@mutablealligator
Copy link
Contributor

@hipwelljo @gilbertdigio

I have fixed this issue in 2.11.1 version of the SDK. It contains improved exception handling and pause/resume handling and fixes the progress reporting issue when a transfer is paused and resumed. Please upgrade to 2.11.1 and let us know if it fixes the issue.

Root cause analysis:

  1. When a transfer is paused by the user or due to network drop, the thread that is managing the multipart transfer is interrupted. When the thread is interrupted and encounters an exception, there is a case where we skip setting the transfer to PAUSED or WAITING_FOR_NETWORK, and instead moves to FAILED.

  2. When a transfer is paused by the user or due to network drop, the thread that is managing the multipart transfer is interrupted. When a multi-part transfer (N parts) is interrupted, there are X number of parts that may have been completed, Y number of parts that may been in progress and Z number of parts that are yet to be started (X + Y + Z = N). When a part that is in progress is interrupted, and then resumed at a later point, the part is retried from the beginning, however the progress made in the previous attempt that is interrupted is not reset from the total progress. This caused the progress reported to go over 100%.

Description of the fix:

The fix involves

  1. improving the exception handling of transfers:
    Improved the state, progress and error reporting when the transfers are interrupted.
  • When the transfer is paused or cancelled by the user, the state is reported correctly.
  • When the transfer is interrupted because of a network drop, the state is set to WAITING_FOR_NETWORK when the TransferNetworkLossHandler is used.
  • When the transfer is interrupted otherwise, the transfer is set to FAILED and the exception is reported via TransferListener.onError callback.
  1. Fix the progress reporting to not go beyond 100% and reset when a part failed.

See 2.11.1 for more information.

@mutablealligator mutablealligator added the pending-community-response Issue is pending response from the issue requestor label Feb 4, 2019
@gilbertdigio
Copy link

Hi @kvasukib , I have updated to 2.11.1.

I have also instantiated TransferNetworkLossHandler before retrieving TransferUtility but I still receive timeout error under the same scenario. (simulating 3G conditions 1024 kbps Upload bandwidth.)

TransferNetworkLossHandler.getInstance(context.applicationContext)

sTransferUtility = TransferUtility.builder().context(context.applicationContext)
                    .s3Client(getS3Client(context.applicationContext))
                    .build()

It`s supposed that I should receive WAITING_FOR_NETWORK state but I receive FAILED state instead. Maybe I'm missing something.

I receive multiple timeout errors from the pending parts of the transfers. However I'm watching the transfer with a proxy and none of the pending parts is marked as error.

Timeout log

02-05 14:05:20.321 29475-2010/io.n3n.cubicam.develop E/UploadPartTask: Upload part interrupted: com.amazonaws.AmazonClientException: Unable to execute HTTP request: timeout
02-05 14:05:20.378 29475-2010/io.n3n.cubicam.develop E/UploadPartTask: Encountered error uploading part 
    com.amazonaws.AmazonClientException: Unable to execute HTTP request: timeout
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:441)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:212)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4655)
        at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3672)
        at com.amazonaws.mobileconnectors.s3.transferutility.UploadPartTask.call(UploadPartTask.java:60)
        at com.amazonaws.mobileconnectors.s3.transferutility.UploadPartTask.call(UploadPartTask.java:30)
        at java.util.concurrent.FutureTask.run(FutureTask.java:237)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1113)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:588)
        at java.lang.Thread.run(Thread.java:818)
     Caused by: java.net.SocketTimeoutException: timeout
        at com.android.okhttp.okio.Okio$3.newTimeoutException(Okio.java:207)
        at com.android.okhttp.okio.AsyncTimeout.exit(AsyncTimeout.java:250)
        at com.android.okhttp.okio.AsyncTimeout$2.read(AsyncTimeout.java:217)
        at com.android.okhttp.okio.RealBufferedSource.indexOf(RealBufferedSource.java:306)
        at com.android.okhttp.okio.RealBufferedSource.indexOf(RealBufferedSource.java:300)
        at com.android.okhttp.okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:196)
        at com.android.okhttp.internal.http.HttpConnection.readResponse(HttpConnection.java:191)
        at com.android.okhttp.internal.http.HttpTransport.readResponseHeaders(HttpTransport.java:80)
        at com.android.okhttp.internal.http.HttpEngine.readNetworkResponse(HttpEngine.java:905)
        at com.android.okhttp.internal.http.HttpEngine.readResponse(HttpEngine.java:789)
        at com.android.okhttp.internal.huc.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:443)
        at com.android.okhttp.internal.huc.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:388)
        at com.android.okhttp.internal.huc.HttpURLConnectionImpl.getResponseMessage(HttpURLConnectionImpl.java:497)
        at com.android.okhttp.internal.huc.DelegatingHttpsURLConnection.getResponseMessage(DelegatingHttpsURLConnection.java:109)
        at com.android.okhttp.internal.huc.HttpsURLConnectionImpl.getResponseMessage(HttpsURLConnectionImpl.java:25)
        at com.amazonaws.http.UrlHttpClient.createHttpResponse(UrlHttpClient.java:92)
        at com.amazonaws.http.UrlHttpClient.execute(UrlHttpClient.java:85)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:371)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:212) 
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4655) 
        at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3672) 
        at com.amazonaws.mobileconnectors.s3.transferutility.UploadPartTask.call(UploadPartTask.java:60) 
        at com.amazonaws.mobileconnectors.s3.transferutility.UploadPartTask.call(UploadPartTask.java:30) 
        at java.util.concurrent.FutureTask.run(FutureTask.java:237) 
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1113) 
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:588) 
        at java.lang.Thread.run(Thread.java:818) 

Any suggestions? thanks in advance, and sorry for my English.

@stale stale bot removed the closing soon Issue will auto-close if there is no additional activity within 7 days. label Apr 6, 2019
@mutablealligator
Copy link
Contributor

@hipwelljo Thank you for the detailed feedback. We only detect when the network disconnects and currently TransferUtility does not detect poor network condition. We set the state to WAITING_FOR_NETWORK only when the network disconnects and we get a callback from Android about the network disconnect.

@hipwelljo
Copy link
Author

Okay. Are you working towards a fix for these issues we're encountering reported here?

  1. On a really slow connection a photo may successfully upload 100% but the SDK may then report it failed instead of completed
  2. Sometimes retrying the photo upload will not fail or succeed - it'll get stuck

@mutablealligator
Copy link
Contributor

@hipwelljo When the state is reported as FAILED, do you get the onError callback with the exception?

@dmillermx
Copy link

dmillermx commented Apr 8, 2019

Hello @kvasukib
We've tested the last version 2.13.1 and it still fails on bad network connection:

The error:  com.amazonaws.AmazonClientException: Unable to execute HTTP request: timeout
We uploading 2 files: first one is picture, the second one is video file, here the log if it will help you.
Uploading file /data/user/0/com.truvideo.staging/files/Camera/Videos/ed666241cf2fd5c9d1ba2b781769987_mp4_thumb.png
TransferStatusUpdater: Creating a new progress listener for transfer: 5
onStateChanged: 5, state=IN_PROGRESS
Error encountered while uploading thumb: com.amazonaws.AmazonClientException: Unable to execute HTTP request: timeout
onStateChanged: 5, state=FAILED
Uploading video [/data/user/0/com.truvideo.staging/files/Camera/Videos/ed666241cf2fd5c9d1ba2b781769987.mp4]
onStateChanged: 6, state=IN_PROGRESS
TransferStatusUpdater: Creating a new progress listener for transfer: 6
com.amazonaws.AmazonClientException: Unable to execute HTTP request: timeout
onProgressChanged: 6, bytesCurrent=765862, bytesTotal=765862
onStateChanged: 6, state=FAILED

@hipwelljo
Copy link
Author

@kvasukib Yes, onError is where I log Log.e(TAG, "Failed to upload " + id + ": " + ex);

@mutablealligator
Copy link
Contributor

@hipwelljo @dmillermx From the logs (onProgressChanged: 6, bytesCurrent=765862, bytesTotal=765862) I can see that the bytesCurrent and bytesTotal are equal and the file has been uploaded successfully but the TransferUtility reported the transfer as FAILED.

When you get the onError callback, what is the exception that is being passed into the onError callback?

@dmillermx
Copy link

@kvasukib, onError callback we see next in log:
com.amazonaws.AmazonClientException: Unable to execute HTTP request: timeout

Thank you

@mutablealligator
Copy link
Contributor

@dmillermx Does this happen for all transfers (single-part, multi-part uploads and downloads)?

Currently our logic when the transfer errors out with an exception is the following:-

  • transfer is canceled, set to CANCELED
  • transfer is paused, set to PAUSED
  • a valid transfer network loss handler is registered and network is not connected, set to WAITING_FOR_NETWORK
  • transfer is interrupted, set to FAILED
  • all other exceptions, set to FAILED

We don't have a mechanism at this point to detect a poor network condition when the transfer errors out. We are only able to detect when the network is completely disconnected.

@hipwelljo
Copy link
Author

hipwelljo commented Apr 8, 2019

Yes, that same exception dmillermx posted. This is single part uploads since all our photo uploads are small file sizes. We don't utilize the others in this app.

This scenario where the state changes to FAILED even though it may have actually succeeded on a bad connection isn't as worrying to me as the issue where sometimes retrying the photo upload will neither fail nor succeed, just gets lost.

@mutablealligator
Copy link
Contributor

@hipwelljo Can you elaborate on what you mean when the upload gets lost?

@hipwelljo
Copy link
Author

hipwelljo commented Apr 8, 2019

As shown in the logs in this reply, from my 6th test, I retried the upload but nothing ever happened after that (I left it for 10 minutes) - state never changed to IN_PROGRESS or FAILED. So the end result is the UI shows a progress bar filled to 100% but it will never complete.

@dmillermx
Copy link

Does this happen for all transfers (single-part, multi-part uploads and downloads)?

During testing with emulation of bad connection it happens always - 100% for single-part and for multi-part uploads.

@mutablealligator mutablealligator removed the bug Something isn't working label Apr 9, 2019
@mutablealligator
Copy link
Contributor

@dmillermx @hipwelljo I will take this as a feature request to the team as we may have to implement a retry mechanism to retry the transfer when the network condition is poor by detecting the request timeout.

Meanwhile, we would suggest you to retry the transfer based on the exception thrown in the onError and check if the object exists on Amazon S3 by using AmazonS3Client.doesObjectExist(bucket, key) to successfully upload an object.

@dmillermx
Copy link

@kvasukib Thank you, we'll try.

@hipwelljo
Copy link
Author

@kvasukib I am already retrying the transfer manually when the state changes to FAILED, which is called right after onError. After doing that though it sometimes gets stuck and nothing happens after that.

I retry the upload like so:

@Override
            public void onStateChanged(int id, TransferState state) {
                Log.d(TAG, "Upload state changed to " + state + " for " + id);

                if (state == TransferState.COMPLETED) {
                    //...
                } else if (state == TransferState.FAILED) {
                    //try again - sdk doesn't automatically retry
                    Log.d(TAG, "Retrying photo upload for " + id);
                    TransferUtility transferUtility = getTransferUtility(context);
                    TransferObserver newTransferObserver = transferUtility.resume(id);
                    newTransferObserver.setTransferListener(this);
                }
            }

@mutablealligator
Copy link
Contributor

@hipwelljo @dmillermx Sorry for the delayed response. Can you try attaching a new TransferListener in the observer for every retry? The transfer listeners registered for a transfer will be cleared when the transfer terminates (successfully or not).

My solution for this problem would be to retry the transfer when this exception occurs. The retry can be done at the SDK level or at the app level with the resume functionality. Regardless of the solution being in SDK or app, the best we could do is to retry the transfer for N number of times or until it eventually succeeds.

On a side note, can you find the number of available processors on the device where you are experiencing this problem? When you do a multi-part upload, we create two thread pools: Pool-1 of size N + 1 and Pool-2 of size N + 1, where N is the number of available processors on the mobile device found through (Runtime.getRuntime().availableProcessors()). Pool-1 is used for single-part uploads, downloads and Pool-2 is used for multi-part uploads. When the network condition is not stable (poor but still connected and available) and there are a lot of transfers competing for the network resources, there is a lot of contention in the network and the connection may be left idle for more than 20 seconds. My suggestion would be to lower the number of threads in the thread-pool using TransferUtilityOptions or try single-part uploads in the retry mechanism.

@mutablealligator mutablealligator changed the title S3 TransferUtility: State changes to failed on very bad network, but onError is not called and progress is reported to be 100% S3 TransferUtility: State changes to failed on very bad network Apr 25, 2019
@mutablealligator mutablealligator changed the title S3 TransferUtility: State changes to failed on very bad network Adapt S3 TransferUtility for unstable network conditions Apr 25, 2019
@palpatim palpatim added the pending-community-response Issue is pending response from the issue requestor label Apr 29, 2019
@hipwelljo
Copy link
Author

Hi @kvasukib I was able to look at this again. I should always be performing single-part uploads because the file sizes are so small (<200 KB). I attached a new TransferListener for each retry, instead of this, and so far I have not seen the issue where retrying the upload causes it to get lost and not ever fail nor succeed. It is always failing on repeat until it eventually succeeds in my tests.

@palpatim palpatim removed the pending-community-response Issue is pending response from the issue requestor label May 9, 2019
@Shakezulla57
Copy link

This still seems to be an issue. Setting an emulator into a poor network state will make the transfer fail (which is expected). Once a good network is reestablished the transfer will resume itself and complete. However, the TransferObserver won't have its TransfertState updated. The issue is that we can't seem to accurately tell if an upload has actually failed (for real) or has failed initially but retried by the SDK. I've also setup a RetryPolicy to not retry and have 0 max retries, but yet the SDK still retries to upload on timeouts.

@mutablealligator mutablealligator removed their assignment Dec 16, 2019
awsmobilesdk pushed a commit to awsmobilesdk/aws-sdk-android that referenced this issue Apr 12, 2020
@raphkim
Copy link
Contributor

raphkim commented Apr 15, 2020

Hi @Shakezulla57,

TransferNetworkLossHandler currently only supports recovering from complete loss of network, and does not support retrying upon slow/unstable connection.

I've also setup a RetryPolicy to not retry and have 0 max retries, but yet the SDK still retries to upload on timeouts.

This is an unrelated multipart upload bug that may have been resolved recently in #1536 . Would you be able to check if you are experiencing this issue with our newest SDK release and submit a new ticket? Thank you.

@raphkim raphkim closed this as completed Apr 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request Request a new feature s3 Issues with the AWS Android SDK for Simple Storage Service (S3).
Projects
None yet
Development

No branches or pull requests