Skip to content
This repository has been archived by the owner on Dec 8, 2022. It is now read-only.

MQTT Publish message with QoS0 always return success even if there is some error at network level #2155

Closed
vrnrao opened this issue Jun 19, 2020 · 2 comments

Comments

@vrnrao
Copy link
Contributor

vrnrao commented Jun 19, 2020

Describe the bug
While Publishing MQTT QoS0 message using Sync or Async API it always return SUCCESS status because the operation is actually async internally and uses MQTT operation scheduling.

System information

  • Which hardware board or part numbers?
    We are using custom NRF52 based system BG96 WAN module connected to it to communicate with AWS IoT.
  • IDE used
  • Operating System [Windows|Linux|MacOS]
    MacOS
  • Version of FreeRTOS (run git describe --tags to find it)
    We are using master branch at commit 7878b52
  • Project [Custom Application]
  • If your project is a Custom Application, please add the relevant code snippet in the section
    We are using Amazon FreeRTOS network interface implementation (iot_network_freertos.c) with MQTT lib. Implemented custom socket API so that it can be used in iot_network_freertos.c. This should be reproducible whenever underlying socket is closed abruptly.

Expected behavior
Ideally MQTT lib should be aware of the internal socket closer, but I did not find any mechanism in AFR IoT Network implementation where in can notify about this to the MQTT lib. In function which is dedicated for network recv task (_networkReceiveTask) it just exits if the socketStatus value is negative without notifying this to to the MQTT lib.
If we land up in the situation mentioned above and try to send QoS0 message will always get the SUCCESS status from MQTT API, but internally it actually fails at MQTT operation level.
This gets resolved if keep alive job kicks in and closes the connection after there is network error while sending the PING request.
But we have set keep alive at MAX value because it most of the times leads to disconnection even if there is active traffic on the MQTT connection. That is separate issue.

Screenshots or console output
Since we are using TLS as well we get mbedTLS internal error:

1865 529394 [OTA Agent Task] [INFO ][MQTT][529394] (MQTT connection 0x2000bba8) MQTT PUBLISH operation queued.
1866 529394 [OTA Agent Task] [prvRequestFileBlock_Mqtt] OK: $aws/things/<ThingName>/streams/AFR_OTA-e381ce72-e75f-43d0-b05f-787afd91fa4a/get/cbor
1867 529395 [OTA Agent Task] [prvOTAAgentTask] ,Heap low watermark: 23176. Stack high watermark: 1349.
1869 529395 [iot_thread] [ERROR][NET][529395] Error -27648 while sending data.
1870 588225 [iot_thread] [WARN ][THREAD][588225] Timeout waiting on semaphore 0x2002a014.
1871 589394 [OTA Agent Task] [INFO ][MQTT][589394] (MQTT connection 0x2000bba8) MQTT PUBLISH operation queued.
1872 589394 [OTA Agent Task] [prvRequestFileBlock_Mqtt] OK: $aws/things/<ThingName>/streams/AFR_OTA-e381ce72-e75f-43d0-b05f-787afd91fa4a/get/cbor
1873 589395 [OTA Agent Task] [prvOTAAgentTask] ,Heap low watermark: 23176. Stack high watermark: 1349.
1875 589395 [iot_thread] [ERROR][NET][589395] Error -27648 while sending data.
1876 648225 [iot_thread] [WARN ][THREAD][648225] Timeout waiting on semaphore 0x2002a014.
1877 649394 [OTA Agent Task] [INFO ][MQTT][649394] (MQTT connection 0x2000bba8) MQTT PUBLISH operation queued.
1878 649394 [OTA Agent Task] [prvRequestFileBlock_Mqtt] OK: $aws/things/<ThingName>/streams/AFR_OTA-e381ce72-e75f-43d0-b05f-787afd91fa4a/get/cbor
1879 649395 [OTA Agent Task] [prvOTAAgentTask] ,Heap low watermark: 23176. Stack high watermark: 1349.
1881 649395 [iot_thread] [ERROR][NET][649395] Error -27648 while sending data.
1882 708225 [iot_thread] [WARN ][THREAD][708225] Timeout waiting on semaphore 0x2002a014.
1883 709394 [OTA Agent Task] [INFO ][MQTT][709394] (MQTT connection 0x2000bba8) MQTT PUBLISH operation queued.
1884 709394 [OTA Agent Task] [prvRequestFileBlock_Mqtt] OK: $aws/things/<ThingName>/streams/AFR_OTA-e381ce72-e75f-43d0-b05f-787afd91fa4a/get/cbor
1885 709395 [OTA Agent Task] [prvOTAAgentTask] ,Heap low watermark: 23176. Stack high watermark: 1349.
1887 709395 [iot_thread] [ERROR][NET][709395] Error -27648 while sending data.
1888 768225 [iot_thread] [WARN ][THREAD][768225] Timeout waiting on semaphore 0x2002a014.
1889 769394 [OTA Agent Task] [INFO ][MQTT][769394] (MQTT connection 0x2000bba8) MQTT PUBLISH operation queued.
1890 769394 [OTA Agent Task] [prvRequestFileBlock_Mqtt] OK: $aws/things/<ThingName>/streams/AFR_OTA-e381ce72-e75f-43d0-b05f-787afd91fa4a/get/cbor
1891 769395 [OTA Agent Task] [prvOTAAgentTask] ,Heap low watermark: 23176. Stack high watermark: 1349.
1893 769395 [iot_thread] [ERROR][NET][769395] Error -27648 while sending data.

To reproduce
Steps to reproduce the behavior:

  1. Use MQTT with AFR IoT Network implementation.
  2. Set Keep Alive to MAX (1200).
  3. Periodically send QoS0 messages.
  4. Try to close the underlying socket directly before Keep Alive job kicks in.
  5. Observe the behavior

Code to reproduce the bug
The code should be wrapped in the cpp tag in order to be displayed clearly. For example:

printf("Hello World")

If the code is longer than 30 lines, GIST is preferred.

Additional context
Add any other context about the problem here.

Thank you!

@aggarw13
Copy link
Contributor

aggarw13 commented Jun 20, 2020

Hi @varun9sr, thank you for letting us know of the issue.
We will look into a fix for it, and will provide an update when we are ready with the fix.

@ravibhagavandas
Copy link
Contributor

@varun9sr Changes for the fix is merged to v4_beta branch. Please let us know if this fixes your issue.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants