Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IoTHubDeviceClient_UploadMultipleBlocksToBlob_Async can miss some errors #2569

Closed
pulkomandy opened this issue Jan 30, 2024 · 14 comments
Closed
Labels

Comments

@pulkomandy
Copy link

** Environment **

  • Development system: Yocto Kirkstone, ARM64, using custom hardware.
  • SDK version: LTS_08_2823

** Problem description **

We use IoTHubDeviceClient_UploadMultipleBlocksToBlob_Async to start an upload. However, there is an error during the upload setup. Logs extract:

[warning]: [azure] Failure in HTTP communication: server reply code is 404
[info]: [azure] HTTP Response:<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN""http://www.w3.org/TR/html4/strict.dtd">
                                                         <HTML><HEAD><TITLE>Not Found</TITLE>
                                                         <META HTTP-EQUIV="Content-Type" Content="text/html; charset=us-ascii"></HEAD>
                                                         <BODY><h2>Not Found</h2>
                                                         <hr><p>HTTP Error 404. The requested resource is not found.</p>
                                                         </BODY></HTML>
[warning]: [azure] HTTP failed response code was 404
[warning]: [azure] unable to HTTPAPIEX_ExecuteRequest
[warning]: [azure] error in IoTHubClient_LL_UploadToBlob_GetBlobCredentialsFromIoTHub
[warning]: [azure] Failed initializing upload in IoT Hub

These error come from

LogError("error in IoTHubClient_LL_UploadToBlob_GetBlobCredentialsFromIoTHub");
and
LogError("Failed initializing upload in IoT Hub");

In our case we are using asynchronous upload so this is called from here inside uploadMultipleBlock_thread:

https://github.com/Azure/azure-iot-sdk-c/blob/main/iothub_client/src/iothub_client_core.c#L2437

The thread function returns an error, but this is not propagated to the caller function that started the thread: https://github.com/Azure/azure-iot-sdk-c/blob/main/iothub_client/src/iothub_client_core.c#L2475 (and it can't be, because it's happening asynchronously in another thread).

There is no way for the caller code to know that the upload has failed. IoTHubClientCore_UploadMultipleBlocksToBlobAsync but the upload callback is never called. I think it could make sense in this case to call the callback (IOTHUB_CLIENT_FILE_UPLOAD_GET_DATA_CALLBACK_EX getDataCallbackEx) with an IOTHUB_CLIENT_FILE_UPLOAD_RESULT error code indicating a failed upload. Otherwise, our software has no way to know that the upload is failed and should be retried. In our case, we make sure to only have one upload at a time, so we stay blocked and never upload anything anymore if this happens.

@ewertons
Copy link
Contributor

ewertons commented Mar 6, 2024

Hi @pulkomandy , could you please take a look at these new more-granular upload-to-blob API functions?
https://github.com/Azure/azure-iot-sdk-c/blob/main/iothub_client/samples/iothub_client_sample_upload_to_blob_with_retry/iothub_client_sample_upload_to_blob_with_retry.c

They would give you more control over detecting intermediary errors and attemping retries.

@ewertons
Copy link
Contributor

ewertons commented Mar 6, 2024

But it's well noted, your assessment of the issue is correct. We will look into fixing it. Update coming soon.

@pulkomandy
Copy link
Author

Hello,

We have been using this patch to fix the problem:

From f9d46850bc024989c16110f7c7f8b1a25d51823f Mon Sep 17 00:00:00 2001
From: Nicolas Savatier <nicolas.savatier@viveris.fr>
Date: Thu, 1 Feb 2024 10:22:46 +0000
Subject: [PATCH] [LCB-1144] Add callback call on upload failure in Azure
 Worker Thread

---
 iothub_client/src/iothub_client_core.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/iothub_client/src/iothub_client_core.c b/iothub_client/src/iothub_client_core.c
index 6da8b7216..08b911b7f 100644
--- a/iothub_client/src/iothub_client_core.c
+++ b/iothub_client/src/iothub_client_core.c
@@ -2431,10 +2431,22 @@ static int uploadMultipleBlock_thread(void* data)
     if (threadInfo->uploadBlobMultiblockSavedData.getDataCallback != NULL)
     {
         result = IoTHubClientCore_LL_UploadMultipleBlocksToBlob(llHandle, threadInfo->destinationFileName, threadInfo->uploadBlobMultiblockSavedData.getDataCallback, threadInfo->context);
+        if(result != IOTHUB_CLIENT_OK)
+        {
+            //An error happened during IoTHubClientCore_LL_UploadMultipleBlocksToBlob(),
+            //notify the caller by calling its callback function.
+            threadInfo->uploadBlobMultiblockSavedData.getDataCallback(result, NULL, NULL, threadInfo->context);
+        }
     }
     else
     {
         result = IoTHubClientCore_LL_UploadMultipleBlocksToBlobEx(llHandle, threadInfo->destinationFileName, threadInfo->uploadBlobMultiblockSavedData.getDataCallbackEx, threadInfo->context);
+        if(result != IOTHUB_CLIENT_OK)
+        {
+            //An error happened during IoTHubClientCore_LL_UploadMultipleBlocksToBlobEx(),
+            //notify the caller by calling its callback function.
+            threadInfo->uploadBlobMultiblockSavedData.getDataCallbackEx(result, NULL, NULL, threadInfo->context);
+        }
     }
     (void)markThreadReadyToBeGarbageCollected(threadInfo);
 
-- 
2.39.2

Sorry, we forgot to update the issue with the patch. We have confirmed that it works for us and solves the problem.

We know of the new granular API, but if we can avoid rewriting our code with a simple fix like this, it saves us a lot of time.

@ewertons
Copy link
Contributor

ewertons commented Mar 7, 2024

Thank you a lot, @pulkomandy .
I thought about the fix following these guidelines, but there is a catch there. It could result in the callback to the user app being called twice (in case the failure happens in IoTHubClient_LL_UploadToBlob_UploadMultipleBlocks). It will need a callback into the iothub_client_core layer first that can be used to control if a callback from iothub_client_ll_uploadtoblob hasn't been invoked already.

ewertons added a commit that referenced this issue Mar 9, 2024
…2593)

* Float upload to blob errors in convenience layer (gh issue #2569)

* Add unit tests

* Address CR comments
@ewertons
Copy link
Contributor

ewertons commented Mar 9, 2024

Hi @pulkomandy , we merged a fix for this issue. Thank you very much for reporting it!
We will close this issue early next week. If you have a chance to verify the fix from your side, we would appreciate it.
Thanks,
Azure IoT SDK Team.

@pulkomandy
Copy link
Author

Hello,

Thanks for the fix. I don't know if we can fully verify it, since we noticed the error after a problem on the Azure IoT cloud side, which we can't reproduce predictably.

We will try to update the version of the SDK we use whenever possible (possibly waiting for the next release).

@ewertons
Copy link
Contributor

Perfect. Thank you @pulkomandy , we will close this issue for now then.

@pulkomandy
Copy link
Author

Hello @ewertons, we have started using this patch and it improves things, unfortunately we have hit another problematic case.

The following scenario happens:

  • Start uploading a file
  • The callbacks are called to send data, and the data sending phase eventually completes without problems
  • The callback is called one last time to indicate an upload success (result == FILE_UPLOAD_OK, data or size == NULL)

At this point, our code decides that the upload is succesful, we delete the corresponding local file and the upload context.

However, later on there is an error in IotHubClient_LL_UploadToBlob_NotifyCompletion (this happens 1 minute later in our case since it is a timeout error):

 [azure] curl_easy_perform() failed: Timeout was reached
 [azure] (result = HTTPAPI_OPEN_REQUEST_FAILED (4))
 [azure] unable to recover sending to a working state
 [azure] unable to HTTPAPIEX_ExecuteRequest
 [azure] unable to execute send_http_request
 [azure] IoTHubClient_LL_UploadToBlob_NotifyIoTHubOfUploadCompletion failed
 [azure] Failed completing upload to blob.

since there were no errors during the upload phase, isCallbackInvokedWithError is not set, and so the error callback is called.

When using a synchronous upload function such as IoTHubClientCore_LL_UploadMultipleBlobsToBlockEx, this would be equivalent to the callback reporting a success, but the function eventually returning an error code.

As a result of this, we can't know for sure when an upload is really fully completed. Even after we received a notification with FILE_UPLOAD_OK and data or size == NULL, which should mean the upload is succesfully completed, we can still receive an error callback later on. So, we don't know when it is safe to consider an upload complete and successful.

I'm not sure what's the best option here, some ideas are:

  • Intercept the "upload success" callback, not forwarding it to the application, and call the application completion callback from the upload thread only after the upload function has returned. This would preserve the current API contract for the callback (when it's called to report a successful upload, the upload is complete)
  • Always call the callback at the end with the result of the upload function, either success or failure, no matter if the callback has been called before in the "normal" way. This would allow to be really sure that the upload is complete in all cases, but would require a specific way to call the callback (maybe a new status code?)

@pulkomandy
Copy link
Author

Hello,

I made some sequence diagrams to explain the problem more clearly:

image

And the modification I made:

image

@ewertons
Copy link
Contributor

ewertons commented May 3, 2024

Hi @pulkomandy , I was investigating this latest issue and I think I got it wrong the first time.
For iothub_client_core, this should be the logic:

  • Capture the last callback invocation from IoTHubClientCore_LL_UploadMultipleBlocksToBlob (uploadToBlobMultiblockCallbackWrapper) or IoTHubClientCore_LL_UploadMultipleBlocksToBlobEx (uploadToBlobMultiblockCallbackWrapperEx), no matter the result - name it blobStorageUploadResult
  • Save the result of the upload to blob API call (IoTHubClientCore_LL_UploadMultipleBlocksToBlob or IoTHubClientCore_LL_UploadMultipleBlocksToBlobEx) - name it result
  • In uploadMultipleBlock_thread, after the upload to blob API call, if result is IOTHUB_CLIENT_OK and blobStorageUploadResult is FILE_UPLOAD_OK, invoke threadInfo->uploadBlobMultiblockSavedData.getDataCallback (or Ex) with FILE_UPLOAD_OK. Otherwise invoke it with FILE_UPLOAD_ERROR

That way, for the convenience layer the final callback will always be called after the whole upload to blob logic is completed. I'll post a PR with those changes, once I do that could you please validate it as well before we merged it?

@pulkomandy
Copy link
Author

pulkomandy commented May 3, 2024 via email

@ewertons
Copy link
Contributor

ewertons commented May 3, 2024

This is the initial PR.
https://github.com/Azure/azure-iot-sdk-c/tree/ewertons/FixThreadedUploadToBlob

I could not run our pipeline against it yet, but it worked locally in my sample run.

@ewertons
Copy link
Contributor

ewertons commented May 8, 2024

PR #2615 posted and pipeline tests run. Please review if possible, we will be merging it soon.

@ewertons
Copy link
Contributor

HI @pulkomandy,
The PR with the fix has been merged.
Please let us know if you have any other feedback. Feel free to reopen this case if you need to.
And thank you once more for your contributions, we really value your feedback and investment in this project.
Thanks,
Azure IoT SDKs Team

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants