Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport patch from lwip2 to fix lwip1.4 behaviour with MTU<1500 #5902

Closed
6 tasks done
mcspr opened this issue Mar 21, 2019 · 5 comments
Closed
6 tasks done

Backport patch from lwip2 to fix lwip1.4 behaviour with MTU<1500 #5902

mcspr opened this issue Mar 21, 2019 · 5 comments
Milestone

Comments

@mcspr
Copy link
Collaborator

mcspr commented Mar 21, 2019

Basic Infos

  • This issue complies with the issue POLICY doc.
  • I have read the documentation at readthedocs and the issue is not addressed there.
  • I have tested that the issue is present in current master branch (aka latest git).
  • I have searched the issue tracker for a similar issue.
  • If there is a stack dump, I have decoded it.
  • I have filled out all fields below.

Platform

  • Hardware: ESP-12
  • Core Version: [2019/03/21 git 8961bba]
  • Development Env: Arduino IDE
  • Operating System: Any

Settings in IDE

  • Module: Generic ESP8266 Module
  • lwip Variant: v1.4
  • Flash Mode: dout
  • Flash Size: 4MB
  • CPU Frequency: 80Mhz
  • Upload Using: SERIAL
  • Upload Speed: 115200 (serial upload only)

Problem Description

Using ESPAsyncTCP, ESPAsyncWebServer, AsyncWebSocket, lwip v1.4
It is impossible to receive data-chunk larger than MSS size via websockets if MTU of the client device is different from 1500. I can see data is sent to the router, but fragmenting does not work properly, so router refuses to send the data further. Surprisingly, HTML is sent properly.

I was able to find one related issue:
#2437

This was already fixed in lwip2. Related issue + fix:
http://savannah.nongnu.org/bugs/?46384
http://git.savannah.gnu.org/cgit/lwip.git/commit/src?id=8e8571da6a6771f9d2d82bbd0b5a6c27474ce0fc
Can we apply this patch to the local tree here and rebuild liblwip_gcc.a so pre built option works too?

MCVE Sketch

Gist of the sketch, to avoid blowing up the issue with long strings:
https://gist.github.com/mcspr/00e95d3ea8d680aeaa04cbe8a707ba8d

Debug Messages

On Windows machine with MTU of 1492 (meaning, MSS is auto calculated to be 1452)

> netsh interface ipv4 set subinterface "WLAN" mtu=1492

In browser, using lwip v1.4:

SDK:2.2.2-dev(c0eb301)/Core:unix-2.6.0-dev=-194000/lwIP:1.4.0rc2/BearSSL:6778687
client connected

Same log with lwip2 low memory:

SDK:2.2.2-dev(c0eb301)/Core:unix-2.6.0-dev=-194000/lwIP:STABLE-2_1_2_RELEASE/glue:1.1-2-ga501b57/BearSSL:6778687
client connected
aaaaaaaaaaaaaa...
aaaaaaaaaaaaaa...
@d-a-v
Copy link
Collaborator

d-a-v commented Mar 21, 2019

Out of curiosity, what is preventing you from using lwIP-v2 ?

TBH, the primary goal of lwip2 was exactly to avoid your situation:
being able to use latest sources, update when bug-fix are posted in the original library.

@mcspr
Copy link
Collaborator Author

mcspr commented Mar 22, 2019

I happened to notice this when trying to fix issues with the app running 2.3.0, and testing with other versions happen to pinpoint LWIP1 as the culprit. Since it is a (still) selectable menu option , I thought it might warrant a patch. (although, I have missed how old lwip source actually is...)

Also, TBH, I would agree that using a single library source is much easier :) Amazing work on that.

@d-a-v
Copy link
Collaborator

d-a-v commented Mar 22, 2019

@earlephilhower @devyte Should we mark lwIP-v1.4 as deprecated for the next release ?
(rtos-sdk, esp32, both have lwip-2.0.3)

@devyte
Copy link
Collaborator

devyte commented Mar 22, 2019

Yes, I'd like to get rid of it in next major.
But how do you deprecate a lib?

d-a-v added a commit that referenced this issue Apr 5, 2019
ref: d-a-v/esp82xx-nonos-linklayer#31
origin: #5902
me-no-dev/ESPAsyncTCP#108

Following the links above is instructive.
To summarize:

    * currently and from a long time lwIP tcp client connections always uses the same tcp source port number right after boot
    * this port number is increased everytime a new one is needed (= new tcp client connection)
      (to be noted, linux has the same increasing behavior)
    * when connecting to the same server (right after boot), the triplet (esp-ip-address, source port, destination port) are the same, and may hit remote server list of sockets in time-wait-state (previous connection unproperly closed from the same esp). Consequently the new connection fails when it happens.
    * this is happening only when debugging (esp reboots often, in less time than time-wait expiration), so the nasty effect is amplified especially when bugs are being chased
    * efforts had been done when espressif's lwIP implementation wasn't open source, with WiFiClient::setLocalPortStart() #632 but it must be explicitely called with a different random number at every reboot. Efficient but not ideal.

This PR uses espressif firmware's r_rand() everytime a new local source port is needed. A different source port number is now showed by tcpdump right after boot. Source port range and duplication is verified everytime in lwIP's src/core/tcp.c:tcp_new_port(). It is implemented as a local patch for upstream lwIP so it is valid not only with WiFiClient but also with @me-no-dev's Async libraries (they don't use WiFiClient).

WiFiClient::setLocalPortStart() is still usable with the same effects as before.
@devyte devyte added this to the 3.0.0 milestone Nov 20, 2019
@mcspr
Copy link
Collaborator Author

mcspr commented Sep 1, 2020

Resolved via #7436, no more lwip1.4

@mcspr mcspr closed this as completed Sep 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants