Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wifi-UDP: sendto return ENOMEM only with short buffer or if a client is disconnected (IDFGH-5068) #6852

Closed
pbruynbroeck opened this issue Apr 9, 2021 · 13 comments
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally

Comments

@pbruynbroeck
Copy link

pbruynbroeck commented Apr 9, 2021

Environment

  • Development Kit: none
  • Module or chip used: ESP32-WROVER-IE
  • IDF version: v4.0.1 AND v4.2-rc
  • Build System: idf.py
  • Compiler version: xtensa-esp32-elf-gcc (crosstool-NG esp-2019r2) 8.2.0
  • Operating System: Linux
  • Using an IDE?: [no]
  • Power Supply: external 3.3V

Problem Description

In this project, the ESP send UDP data to two clients. Every second, we send 15 packets of data at each client.

As is done in the iperf.c example, when sendto return -1, we add a delay and re-call sendto if errno == ENOMEM:

static int udp_send_data(ip4_addr_t ip ,char* buf, size_t buf_len)
{
    cliaddr.sin_addr.s_addr = ip.addr;
    return sendto(udp_server_socket, (const char *)buf, buf_len, 0, (const struct sockaddr *) &cliaddr, socklen);;
}

int udp_send_data_blocking(ip4_addr_t ip ,char* buf, size_t buf_len, uint32_t timeout_ms)
{
    int64_t start = esp_timer_get_time()/1000;
    int64_t now = start;

    int err_code = 0;
    int32_t actual_send = 0;
    bool timeout = false;

    int tries = 0;

    do
    {
        actual_send = udp_send_data(ip, buf, buf_len);
        tries++;

        err_code = (actual_send < 0) ? errno : 0;

        now = esp_timer_get_time()/1000;
        timeout = (now-start) > timeout_ms;

        if(!err_code || (err_code != ENOMEM))
            break;

        vTaskDelay(1);

    }while(!timeout);

    if(timeout || err_code)
        ESP_LOGE(TAG, "SENDTO - %s (%d) (timeout: %d, socket: %d, ip_addr: %s, socklen: %d)", strerror(err_code), err_code, timeout, udp_server_socket, ip4addr_ntoa(&ip), socklen);

    if((tries > 1) || (actual_send != buf_len))
        ESP_LOGW(TAG, "SENDTO in %d tries and %d ms (size:%d, buf_len:%d)", tries, (uint32_t)(now-start), actual_send, buf_len);

    return (timeout || err_code || (actual_send != buf_len) || (tries > 1));
}

We have observed different behavior between the esp-idf 4.2-rc and 4.0.1 if we change the packet size. (The number of packet per second doesn't change !)

packet size (bytes) esp-idf average sendto time (us) Comment
256 4.0.1 360~440 many retries
256 4.2-rc 260~300 many retries
1024 4.0.1 400~470 some retries
1024 4.2-rc 415~480 some retries
2048 4.0.1 900~1100 never return -1 ! No trouble !
2048 4.2-rc 500~600 never return -1 ! No trouble !

A. Why when the size of the buffer increase (but the number of packet doesn't change), sendto never return -1 ?

B. Only on 4.2 (not every time), when a client is disconnected (power off), no event has observed, sendto return -1 and errno returnENOMEM...:

  • Why there is a difference of behaviour with 4.0.1 ?
  • How to managed a disconnection ?
E (1236391) UDP_MANAGER: SENDTO - Not enough space (12) (timeout: 1, socket: 54, ip_addr: 192.168.4.2, socklen: 16)
W (1236392) UDP_MANAGER: SENDTO in 127 tries and 252 ms (size:-1, buf_len:24)
E (1236653) UDP_MANAGER: SENDTO - Not enough space (12) (timeout: 1, socket: 54, ip_addr: 192.168.4.2, socklen: 16)
W (1236654) UDP_MANAGER: SENDTO in 127 tries and 251 ms (size:-1, buf_len:24)
E (1236915) UDP_MANAGER: SENDTO - Not enough space (12) (timeout: 1, socket: 54, ip_addr: 192.168.4.2, socklen: 16)
W (1236916) UDP_MANAGER: SENDTO in 127 tries and 252 ms (size:-1, buf_len:24)
E (1237177) UDP_MANAGER: SENDTO - Not enough space (12) (timeout: 1, socket: 54, ip_addr: 192.168.4.2, socklen: 16)
W (1237178) UDP_MANAGER: SENDTO in 127 tries and 252 ms (size:-1, buf_len:24)
E (1237439) UDP_MANAGER: SENDTO - Not enough space (12) (timeout: 1, socket: 54, ip_addr: 192.168.4.2, socklen: 16)
W (1237440) UDP_MANAGER: SENDTO in 127 tries and 252 ms (size:-1, buf_len:24)
E (1237701) UDP_MANAGER: SENDTO - Not enough space (12) (timeout: 1, socket: 54, ip_addr: 192.168.4.2, socklen: 16)
W (1237702) UDP_MANAGER: SENDTO in 127 tries and 251 ms (size:-1, buf_len:24)
E (1237963) UDP_MANAGER: SENDTO - Not enough space (12) (timeout: 1, socket: 54, ip_addr: 192.168.4.2, socklen: 16)
W (1237964) UDP_MANAGER: SENDTO in 127 tries and 252 ms (size:-1, buf_len:24)
E (1238225) UDP_MANAGER: SENDTO - Not enough space (12) (timeout: 1, socket: 54, ip_addr: 192.168.4.2, socklen: 16)

Code to reproduce this issue

Code for esp-idf 4.0.1: wifi_udp_4_0_1.tar.gz

Code for esp-idf 4.2-rc: wifi_udp_4_2.tar.gz

@espressif-bot espressif-bot added the Status: Opened Issue is new label Apr 9, 2021
@github-actions github-actions bot changed the title Wifi-UDP: sendto return ENOMEM only with short buffer or if a client is disconnected Wifi-UDP: sendto return ENOMEM only with short buffer or if a client is disconnected (IDFGH-5068) Apr 9, 2021
@Alvin1Zhang
Copy link
Collaborator

Thanks for the very detailed report, and sorry for the inconvenience, we will look into.

@pbruynbroeck
Copy link
Author

pbruynbroeck commented Apr 13, 2021

Hi @Alvin1Zhang,

I have additional info to help you in the debugging. With esp-idf 4.2, in lwip component (https://github.com/espressif/esp-lwip/blob/889c3875bf2154356aab10b36025add28909a933), when the client is disconnected:

@Alvin1Zhang
Copy link
Collaborator

@pbruynbroeck Thanks for sharing the extra details.

@MaxwellAlan
Copy link
Collaborator

hi @pbruynbroeck thanks for reporting.

A. Why when the size of the buffer increase (but the number of packet doesn't change), sendto never return -1 ?
Yeah, just like you last comments, the difference is because ip fragment.

B. Only on 4.2 (not every time), when a client is disconnected (power off), no event has observed, sendto return -1 and errno returnENOMEM...:
Why there is a difference of behaviour with 4.0.1 ?
How to managed a disconnection ?

This issue is a little complicated.
Firstly, the differences between v4.0 with v4.2 may because some reasons that we not backport this fix to v4.0.
Secondly, the root cause of this issue is that, when station connect to softap, station go to sleep and poweroff, the softap will buffer the station's pkts(the power save buffer of softap is only 16), however the station already poweroff, so the softap will keep buffer this station's pkts until pkts timeout, which will cause always return ENOMEM.

@pbruynbroeck
Copy link
Author

Hi @MaxwellAlan,

A. Why do not return the error when the packet is fragmented ?

B.1 So, what its your advice to detecting a disconnection ?
B.2 How is it managed in broadcast mode ?

@pbruynbroeck
Copy link
Author

@MaxwellAlan,

the softap will keep buffer this station's pkts until pkts timeout

How long this timeout last ? Can we reduce it ?

@MaxwellAlan
Copy link
Collaborator

MaxwellAlan commented Apr 21, 2021

hi @pbruynbroeck , you can try esp_wifi_set_ps(WIFI_PS_NONE) to disable power save and test, both sta and softap.
It is not easy for softap to know when sta go to sleep and power off.

@Alvin1Zhang
Copy link
Collaborator

@pbruynbroeck Thanks for reporting, would you please help share if any updates for this issue? Thanks.

@Alvin1Zhang
Copy link
Collaborator

Thanks for reporting, will close due to short of feedback, feel free to reopen.

@espressif-bot espressif-bot added Resolution: Done Issue is done internally Status: Done Issue is done internally and removed Status: Opened Issue is new labels Sep 9, 2021
@yihuangong
Copy link

we have the same problem, I want to ask that you have solved the problem or not

@fredy0219
Copy link

I have the same problem too. And during a while of sendto() cycle, it always report ENOMEM and crash.
I'm using esp_wifi_set_ps(WIFI_PS_MAX_MODEM) for power save. How to avoid such issue.

@nopnop2002
Copy link

nopnop2002 commented Sep 24, 2022

@yihuangong @fredy0219

The simplest easy fix is to add a vTaskDelay() after sendto() in the loop.
This gives the TCP/IP task some time to send out UDP packets.
However, this reduces the maximum transfer rate.
This reduces the failure rate by a lot, although there are still some ENOMEM results due to timing (sometimes the TCP/IP task doesn't get a chance to clear it's backlog due to a higher priority task running instead).

ENOMEM occurs more on single-core ESP32.
This is because TCP/IP tasks and user tasks use one core at the same time.

while(1) {
	ret = sendto(sock, buffer, 960, 0, (struct sockaddr*)&toAddr, sizeof(toAddr));
	ESP_LOGI(TAG, "ret=%d errno=%d", ret, errno);
	if (ret < 0) {
	    if (errno == ENOMEM) {
	        ESP_LOGW(TAG, "lwip_sendto fail. %d", errno);
	        vTaskDelay(10);
	    } else {
	        ESP_LOGE(TAG, "lwip_sendto fail. %d", errno);
                break;
	    }
	}
}

@colinb2048
Copy link

The default wifi station config (WIFI_INIT_CONFIG_DEFAULT) uses static TX buffer type (CONFIG_ESP32_WIFI_TX_BUFFER_TYPE = 0) whether you're using PSRAM or not. The IDF documentation recommends dynamic TX buffer for non PSRAM use, and static for when you have access to this extra memory.

In higher latency scenarios this caused enomem issues for me. Changing the TX buffer type to dynamic seems to have solved this issue. Also increasing the number of dynamic TX buffers can also help, at the expense of higher mem usage.

Redefining :
#define CONFIG_ESP32_WIFI_TX_BUFFER_TYPE 1
#define WIFI_DYNAMIC_TX_BUFFER_NUM 16

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally
Projects
None yet
Development

No branches or pull requests

8 participants