Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[client-websocket] Random crashes, possible root cause from valgrind (IDFGH-13642) #645

Closed
3 tasks done
Sean-Der opened this issue Sep 6, 2024 · 11 comments · Fixed by espressif/esp-idf#14536
Closed
3 tasks done

Comments

@Sean-Der
Copy link

Sean-Der commented Sep 6, 2024

Answers checklist.

  • I have read the documentation for esp-protocols components and the issue is not addressed there.
  • I have updated my esp-protocols branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

General issue report

I am trying to use esp-websocket-client and seeing crashes on the Linux build. It isn't 100% reproducible yet, but I am seeing these.

I am also able to reproduce these errors with the example linux application.

==36952== Conditional jump or move depends on uninitialised value(s)
==36952==    at 0x128247: esp_event_loop_run (esp_event.c:581)
==36952==    by 0x11B995: esp_websocket_client_dispatch_event (esp_websocket_client.c:224)
==36952==    by 0x11C234: esp_websocket_client_task (esp_websocket_client.c:977)
==36952==    by 0x1232F1: pthread_task (freertos_linux.c:210)
==36952==    by 0x4BBDA93: start_thread (pthread_create.c:447)
==36952==    by 0x4C4AA33: clone (clone.S:100)
==36952==
==36952== Conditional jump or move depends on uninitialised value(s)
==36952==    at 0x117C13: app_websocket_event_handler(void*, char const*, int, void*) (websocket.cpp:31)
==36952==    by 0x127A3E: handler_execute (esp_event.c:137)
==36952==    by 0x128275: esp_event_loop_run (esp_event.c:584)
==36952==    by 0x11B995: esp_websocket_client_dispatch_event (esp_websocket_client.c:224)
==36952==    by 0x11C234: esp_websocket_client_task (esp_websocket_client.c:977)
==36952==    by 0x1232F1: pthread_task (freertos_linux.c:210)
==36952==    by 0x4BBDA93: start_thread (pthread_create.c:447)
==36952==    by 0x4C4AA33: clone (clone.S:100)
==36952==
==36952== Conditional jump or move depends on uninitialised value(s)
==36952==    at 0x117C1C: app_websocket_event_handler(void*, char const*, int, void*) (websocket.cpp:31)
==36952==    by 0x127A3E: handler_execute (esp_event.c:137)
==36952==    by 0x128275: esp_event_loop_run (esp_event.c:584)
==36952==    by 0x11B995: esp_websocket_client_dispatch_event (esp_websocket_client.c:224)
==36952==    by 0x11C234: esp_websocket_client_task (esp_websocket_client.c:977)
==36952==    by 0x1232F1: pthread_task (freertos_linux.c:210)
==36952==    by 0x4BBDA93: start_thread (pthread_create.c:447)
==36952==    by 0x4C4AA33: clone (clone.S:100)
==36952==
==36952== Conditional jump or move depends on uninitialised value(s)
==36952==    at 0x117C94: app_websocket_event_handler(void*, char const*, int, void*) (websocket.cpp:31)
==36952==    by 0x127A3E: handler_execute (esp_event.c:137)
==36952==    by 0x128275: esp_event_loop_run (esp_event.c:584)
==36952==    by 0x11B995: esp_websocket_client_dispatch_event (esp_websocket_client.c:224)
==36952==    by 0x11C234: esp_websocket_client_task (esp_websocket_client.c:977)
==36952==    by 0x1232F1: pthread_task (freertos_linux.c:210)
==36952==    by 0x4BBDA93: start_thread (pthread_create.c:447)
==36952==    by 0x4C4AA33: clone (clone.S:100)
==36952==
==36952== Conditional jump or move depends on uninitialised value(s)
==36952==    at 0x1282CE: post_instance_delete (esp_event.c:433)
==36952==    by 0x1282CE: esp_event_loop_run (esp_event.c:606)
==36952==    by 0x11B995: esp_websocket_client_dispatch_event (esp_websocket_client.c:224)
==36952==    by 0x11C234: esp_websocket_client_task (esp_websocket_client.c:977)
==36952==    by 0x1232F1: pthread_task (freertos_linux.c:210)
==36952==    by 0x4BBDA93: start_thread (pthread_create.c:447)
==36952==    by 0x4C4AA33: clone (clone.S:100)
==36952==
==36952== Conditional jump or move depends on uninitialised value(s)
==36952==    at 0x4849845: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==36952==    by 0x1282D4: post_instance_delete (esp_event.c:434)
==36952==    by 0x1282D4: esp_event_loop_run (esp_event.c:606)
==36952==    by 0x11B995: esp_websocket_client_dispatch_event (esp_websocket_client.c:224)
==36952==    by 0x11C234: esp_websocket_client_task (esp_websocket_client.c:977)
==36952==    by 0x1232F1: pthread_task (freertos_linux.c:210)
==36952==    by 0x4BBDA93: start_thread (pthread_create.c:447)
==36952==    by 0x4C4AA33: clone (clone.S:100)
==36952==
==36952== Conditional jump or move depends on uninitialised value(s)
==36952==    at 0x128247: esp_event_loop_run (esp_event.c:581)
==36952==    by 0x11B995: esp_websocket_client_dispatch_event (esp_websocket_client.c:224)
==36952==    by 0x11C633: esp_websocket_client_task (esp_websocket_client.c:991)
==36952==    by 0x1232F1: pthread_task (freertos_linux.c:210)
==36952==    by 0x4BBDA93: start_thread (pthread_create.c:447)
==36952==    by 0x4C4AA33: clone (clone.S:100)
==36952==
==36952== Conditional jump or move depends on uninitialised value(s)
==36952==    at 0x1282CE: post_instance_delete (esp_event.c:433)
==36952==    by 0x1282CE: esp_event_loop_run (esp_event.c:606)
==36952==    by 0x11B995: esp_websocket_client_dispatch_event (esp_websocket_client.c:224)
==36952==    by 0x11C633: esp_websocket_client_task (esp_websocket_client.c:991)
==36952==    by 0x1232F1: pthread_task (freertos_linux.c:210)
==36952==    by 0x4BBDA93: start_thread (pthread_create.c:447)
==36952==    by 0x4C4AA33: clone (clone.S:100)
@github-actions github-actions bot changed the title [client-websocket] Random crashes, possible root cause from valgrind [client-websocket] Random crashes, possible root cause from valgrind (IDFGH-13642) Sep 6, 2024
@Sean-Der
Copy link
Author

Sean-Der commented Sep 6, 2024

Is it possible to force only one thread? This code works on my esp32s3

my assumption is the multi threading on Linux is the root cause?

@david-cermak
Copy link
Collaborator

I can reproduce the reported warnings, and looking into it. My initial guess is that they're false positives, but will take a deeper look.

seeing crashes on the Linux build.

Could you please check if you still seeing the problems after applying the patch from #647 ?

Is it possible to force only one thread? This code works on my esp32s3

I'm afraid that wouldn't be very easy, as we use FreeRTOS simulator. If you'd like to experiment with that you can use FreeRTOS mock in IDF, but that's not very straight-forward.

@Sean-Der
Copy link
Author

Sean-Der commented Sep 9, 2024

@david-cermak I was able to get a reproduce on this!

On Linux if I read a websocket message <= 127 chars I get the segfault. I will paste a diff so you get it also.

I don't understand fully yet why it happens

@Sean-Der
Copy link
Author

Sean-Der commented Sep 9, 2024

@david-cermak Try this

The follow WILL NOT crash

diff --git a/components/esp_websocket_client/examples/linux/main/websocket_linux.c b/components/esp_websocket_client/examples/linux/main/websocket_linux.c
index 3329274..a8f0ba3 100644
--- a/components/esp_websocket_client/examples/linux/main/websocket_linux.c
+++ b/components/esp_websocket_client/examples/linux/main/websocket_linux.c
@@ -84,11 +84,11 @@ static void websocket_app_start(void)
     esp_websocket_register_events(client, WEBSOCKET_EVENT_ANY, websocket_event_handler, (void *)client);

     esp_websocket_client_start(client);
-    char data[32];
+    char data[256];
     int i = 0;
     while (i < 1) {
         if (esp_websocket_client_is_connected(client)) {
-            int len = sprintf(data, "hello %04d", i++);
+            int len = sprintf(data, "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvw");
             ESP_LOGI(TAG, "Sending %s", data);
             esp_websocket_client_send_text(client, data, len, portMAX_DELAY);
         }

Now try adding one more character to that long string. It should then segfault.

@Sean-Der
Copy link
Author

Sean-Der commented Sep 9, 2024

Non-crash output

I (734196109) websocket: WEBSOCKET_EVENT_DATA
I (734196109) websocket: Received opcode=1
W (734196109) websocket: Received=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvw
W (734196109) websocket: Total payload length=127, data_len=127, current payload offset=0

Crash output

I (734285615) websocket: Sending abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwyz
I (734285739) websocket: WEBSOCKET_EVENT_DATA
I (734285739) websocket: Received opcode=1
W (734285739) websocket: Received=echo.websocket.events sponsored by Lob.com
W (734285739) websocket: Total payload length=42, data_len=42, current payload offset=0

@Sean-Der
Copy link
Author

Sean-Der commented Sep 9, 2024

The issue starts here

The value of payload_len is -128 when it should be 128. Not sure why that bit is flipped yet

@david-cermak
Copy link
Collaborator

Thanks for sharing the reproducer. it's because of integer promotions (issue only with the linux target).

a quick fix would be:

diff --git a/components/tcp_transport/transport_ws.c b/components/tcp_transport/transport_ws.c
index 4aaa1845ed9..76d9c502770 100644
--- a/components/tcp_transport/transport_ws.c
+++ b/components/tcp_transport/transport_ws.c
@@ -497,14 +498,15 @@ static int ws_read_header(esp_transport_handle_t t, char *buffer, int len, int t
     if (payload_len == 126) {
         // headerLen += 2;
         if ((rlen = esp_transport_read_internal(ws, data_ptr, header, timeout_ms)) <= 0) {
             ESP_LOGE(TAG, "Error read data");
             return rlen;
         }
-        payload_len = data_ptr[0] << 8 | data_ptr[1];
+        payload_len = data_ptr[0] << 8 | (0xFF & data_ptr[1]);
     } else if (payload_len == 127) {
         // headerLen += 8;

@Sean-Der
Copy link
Author

Sean-Der commented Sep 9, 2024

@david-cermak nice! That fixed it. Should I open a PR for that. So exciting to be able to run CI/dev locally again.

@david-cermak
Copy link
Collaborator

Should I open a PR for that. So exciting to be able to run CI/dev locally again.

@Sean-Der of course, feel free to submit a PR; this code is in IDF, though, and it might take some time till this gets reviewed and merged.

PS: also looking into other similar issues, the patch I posted was just the first "naive" shot to fix the issue.

@Sean-Der
Copy link
Author

Sean-Der commented Sep 9, 2024

Sorry I don't fully understand what the next steps are.

Are you working on a larger fix/patch @david-cermak and I should leave this alone?

I am totally ok if it takes a long time to get merged into IDF! I just want to make sure it gets fixed.

@david-cermak
Copy link
Collaborator

Yes, please feel free to submit a PR.

I'm still investigating other similar issues, but this particular bug definitely needs to be patched, and we'd be happy to merge your PR. Even though it might take a little time to get merged into IDF, your contribution is valuable, and it'll help move things forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants