Test: etharp_gratuitous #1877

Merged
merged 10 commits into from
Aug 30, 2019

Conversation

mcspr
Collaborator

@mcspr mcspr commented Aug 27, 2019

fix #614 ?
esp8266/Arduino#5998 (comment)

@arihantdaga can you try this one? That is, if I understood the required process correctly. It can be adjusted a bit later.
edit: not seeing any resets on initial connection. wifi.reset, wifi.ap commands work too.

If you can reliably reproduce lost IP addresses (either modem or none sleep), it would be interesting to see if this helps

@arihantdaga
Contributor

arihantdaga commented Aug 27, 2019

@mcspr It's working. Didn't see any resets. I also checked at the same time: n->ip_addr.addr was always 0 on the first iteration. I think this condition for checking whether the netif is up works well. 👍

@TD-er

TD-er commented Aug 27, 2019

Looks OK to me.
I will also include it in ESPeasy to see if this also may have been triggering some of the WDT reboots.

@TD-er

TD-er commented Aug 27, 2019

For ESP32 at least, you need to check the hwaddr_len, or else it will crash with an assert as soon as AP+STA is enabled.

So it will be something like this:

    if ((n->hwaddr_len == ETH_HWADDR_LEN) && 
        (n->flags & NETIF_FLAG_ETHARP) && 
        ((n->flags & NETIF_FLAG_LINK_UP) || (n->flags & NETIF_FLAG_UP))) {

I am not sure if a check for the NETIF_FLAG_ETHARP flag is needed (or maybe it is even counterproductive?).
See LWIP NETIF_FLAG_

@mcspr
Collaborator Author

mcspr commented Aug 28, 2019

The ...UP check should be && btw, assuming the lwip authors know something that we don't and expect that LINK_UP can be missing independently of UP
https://git.savannah.nongnu.org/cgit/lwip.git/tree/src/core/netif.c?h=STABLE-2_1_2_RELEASE#n880
Old lwip1 code only has UP condition
edit: also note ipv6 branch

ETH flag is set by lwip-glue layer, so check is safe with 8266 too:
https://github.com/d-a-v/esp82xx-nonos-linklayer/blob/af6738a22332a92256471e3d29d37a2700ec1298/glue-lwip/lwip-git.c#L397
https://github.com/espressif/esp-idf/blob/fdab15dc76a3464ed10ce03e9b9cf1a5cd2aa0d1/components/lwip/port/esp32/netif/wlanif.c#L76
Probably can be avoided, but you can never be too safe.
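
For reference, the combined condition discussed above (hardware address length, the ETHARP flag, and both UP flags joined with &&) could be sketched as below. This is only an illustration that compiles off-device: the `mock` namespace, the reduced `netif` struct and the `garpEligible` helper are hypothetical, and the flag values are stand-ins that are assumed to mirror lwIP 2.x rather than taken from this PR.

```cpp
#include <cstdint>

namespace mock {

// Stand-in definitions so the sketch builds without lwIP headers.
// On a real build these come from lwip/netif.h; the values are assumptions.
constexpr uint8_t NETIF_FLAG_UP      = 0x01;
constexpr uint8_t NETIF_FLAG_LINK_UP = 0x04;
constexpr uint8_t NETIF_FLAG_ETHARP  = 0x08;
constexpr uint8_t ETH_HWADDR_LEN     = 6;

// Reduced mock of lwIP's struct netif, only the fields the check reads.
struct netif {
    uint8_t hwaddr_len;
    uint8_t flags;
};

// Hypothetical helper: true when sending a gratuitous ARP on `n` looks safe.
inline bool garpEligible(const netif* n) {
    if (n == nullptr) {
        return false;
    }
    // Require both UP and LINK_UP, i.e. the && variant of the check.
    const uint8_t up = NETIF_FLAG_UP | NETIF_FLAG_LINK_UP;
    return (n->hwaddr_len == ETH_HWADDR_LEN)
        && ((n->flags & NETIF_FLAG_ETHARP) != 0)
        && ((n->flags & up) == up);
}

} // namespace mock
```

With this shape, an interface that is administratively up but has no link (or one without a 6-byte hardware address) is simply skipped instead of reaching the ARP code.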

And while I read the https://espeasy.readthedocs.io/en/latest/Tools/Tools.html#periodical-send-gratuitous-arp explanation for ARP, in what case should it be on? Are we doing this so that the router does not kick us, as a sort of keep-alive?
Is it safe to leave always on? Is 5000msec interval something based on testing or can be more random?

@TD-er

TD-er commented Aug 28, 2019

I have been using the code I mentioned in my last comment, since that comment.
The ESP32 unit I have here is still very responsive now. It used to be taking up to a minute to reply after some time.
The ESP8266 ones do not appear to react differently from before this change.

It is not really a keep-alive, but more to make sure the MAC tables in switches/AP/router are not forgetting about the node.

It is giving an answer to a question never asked.

For normal ethernet traffic, each switch/router etc. needs to know what MAC address is behind what port.
Since these tables in switches etc. are not infinitely large, they have to renew their tables every now and then.
This is done via ARP packets.
So someone asks "who has 192.168.1.1" and the answer is also used by every other host or switch on the network to update its MAC tables.
If the ESP does not reply to such ARP requests, it may become impossible to send data (back) to the ESP, since the switches do not know how to route it.
So therefore we send Gratuitous ARP packets to answer "Mac AA:BB:CC:DD:EE:FF has 192.168.1.1" even though no-one asked for it.
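
To make the "answer to a question never asked" concrete, here is a rough sketch of what such a frame looks like on the wire. It is not code from this PR: `buildGratuitousArp`, `Mac` and `Ip4` are hypothetical names, and the sketch assumes the common announcement form (an ARP request whose sender and target IP are both the node's own address, sent to the Ethernet broadcast address). On the device, lwIP builds this frame itself.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

using Mac = std::array<uint8_t, 6>;
using Ip4 = std::array<uint8_t, 4>;

// Builds a 42-byte frame: 14-byte Ethernet header + 28-byte ARP payload.
std::array<uint8_t, 42> buildGratuitousArp(const Mac& mac, const Ip4& ip) {
    std::array<uint8_t, 42> f{};                // zero-initialized
    std::memset(f.data(), 0xFF, 6);             // destination MAC: broadcast
    std::memcpy(f.data() + 6, mac.data(), 6);   // source MAC
    f[12] = 0x08; f[13] = 0x06;                 // EtherType: ARP
    f[14] = 0x00; f[15] = 0x01;                 // hardware type: Ethernet
    f[16] = 0x08; f[17] = 0x00;                 // protocol type: IPv4
    f[18] = 6;                                  // hardware address length
    f[19] = 4;                                  // protocol address length
    f[20] = 0x00; f[21] = 0x01;                 // opcode: request
    std::memcpy(f.data() + 22, mac.data(), 6);  // sender MAC
    std::memcpy(f.data() + 28, ip.data(), 4);   // sender IP
    // bytes 32..37 (target MAC) stay zero
    std::memcpy(f.data() + 38, ip.data(), 4);   // target IP == sender IP
    return f;
}
```

Because the frame is broadcast and carries the MAC/IP pair in the sender fields, every host and switch that sees it can refresh its tables, which is the effect described above.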

Sending ICMP (ping) packets to the ESP may also have a similar effect, since ICMP packets appear to be handled differently by the AP compared to other packets.
Also, an ICMP packet does seem to have an effect on the power management of the ESP at core level.
Try running an ESP node for a while (running delay() when not performing a task) and you will notice the power consumption does decrease after a few minutes.
This lower power consumption does seem to be correlated with not receiving some packets like ARP packets.
However, if you start sending ping packets to the ESP, the first one may take a while (up to 900 msec sometimes), but it will always be answered. This will cause the ESP to increase its power consumption and the next ping packets will be answered much faster.

This simple test does show us a number of things happening here:

  • Lower power consumption is very likely caused by the WiFi radio not listening on every beacon interval of the AP (102.4 msec on most APs)
  • Some packet types (ARP/UDP/etc) are not being repeated or buffered by the AP to overcome the actual listening interval of the ESP (at least the ones not required to be answered)
  • ICMP packets are being repeated long enough to be received by the ESP (N.B. even the first ping you send will get a reply eventually)
  • ESP nodes may become hard/impossible to reach after a while, which can be explained by their MAC address not being known any more. You can also test this on your PC to look for the known ARP table, or clearing it yourself.

TL;DR
I don't think the Gratuitous ARP is the same as a keep-alive; it is more a way to prevent the node from being forgotten.

In ESPeasy I use a more dynamic interval for sending these ARP packets.
If a connection (or DNS lookup) attempt fails, I will make the interval short (and send one right away) and every new call to Gratuitous ARP will increase the interval to about 5 sec again.
Also right after getting connected ("Got IP" event) I send one out.
The actual implementation of clearing MAC tables in switches may differ from brand to brand.
Some may remove an entry after it has not been seen for a while and others may just clear the entire table periodically (or when it is full). I have seen reports of 30 seconds, but also lots of "hard to give a single number".
It is hard to know what implementation is being used, but I guess some interval below the watchdog timer may be a good idea.
Maybe also good to have some kind of randomness in the intervals to prevent all nodes sending out ARP packets at the same time (e.g. after some power outage)
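
The dynamic interval described above could look roughly like the following. This is a sketch under assumptions, not the ESPeasy implementation: the `GarpInterval` class, the 100 ms floor and the doubling step are made up for illustration; only the "short after a failure, growing back to about 5 sec" behaviour comes from the description.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper tracking the delay until the next gratuitous ARP.
class GarpInterval {
public:
    explicit GarpInterval(uint32_t minMs = 100, uint32_t maxMs = 5000)
        : _min(minMs), _max(maxMs), _current(maxMs) {}

    // Call when a connection or DNS lookup attempt fails:
    // the next interval becomes short again.
    void onFailure() { _current = _min; }

    // Call each time a gratuitous ARP was sent. Returns the delay (ms)
    // to wait before the next one and grows the interval toward the maximum.
    uint32_t next() {
        const uint32_t delay = _current;
        _current = std::min<uint32_t>(_max, _current * 2);
        return delay;
    }

private:
    uint32_t _min;
    uint32_t _max;
    uint32_t _current;
};
```

A caller would send one packet immediately on failure, call `onFailure()`, and then schedule each following packet with the value returned by `next()`.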

@mcspr
Collaborator Author

mcspr commented Aug 28, 2019

Wow, thanks for the detailed explanation!

It is surprising that esp32 build works differently, there's this exact functionality enabled by default:
https://github.com/espressif/esp-idf/blob/aa087667dffb8f024a5bf162cef4d5c0ffb152ab/components/lwip/Kconfig#L138-L154
https://github.com/espressif/arduino-esp32/blob/7a574399b17e552800459e3eafb72d9158c05968/tools/sdk/sdkconfig#L601 (i.e. pre-built idf has this flag on)
https://github.com/espressif/esp-lwip/blob/61d840ff4778f4946c8743f7e412345abcd537f1/src/core/ipv4/etharp.c#L142-L154
I guess, 60s is too long? (edit: "It used to be taking up to a minute to reply after some time." <<<)
And it has been in Espressif's lwip package since last year: espressif/esp-lwip@e6bb433

I think I'd settle with idf approach and randomize arp time between 15 and 30s (and maybe kick in sooner if any connection routine fails, that's an interesting idea)
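
A minimal sketch of the "randomize between 15 and 30s" idea, assuming a generic C++ random engine so it can run off-device. `randomGarpDelayMs` is a hypothetical name; on the ESP itself, Arduino's `random(min, max)` would probably be the more natural choice.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: uniformly random delay in [minMs, maxMs] milliseconds.
template <typename Rng>
uint32_t randomGarpDelayMs(Rng& rng, uint32_t minMs = 15000, uint32_t maxMs = 30000) {
    std::uniform_int_distribution<uint32_t> dist(minMs, maxMs);
    return dist(rng);
}
```

Picking a fresh delay after every send, rather than once at boot, should also keep nodes from lining up after a shared power outage.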

//
// Help solve compatibility issues with some routers.
// If the ARP table of the AP is old, and the AP doesn't send ARP request to update its ARP table,
// this will lead to the STA sending IP packet fail. Thus we send gratuitous ARP periodically to let AP update its ARP table.

Strictly speaking, it is not the sending of a packet that fails, but the receiving.
The other hosts in the network may no longer know how to retrieve what MAC address to use as destination when sending a packet to some IP-address. Also the other components on the network, like switch or AP may have the MAC address removed from their tables and thus no longer be able to route the packet to the correct port.


The wording is weird, true... that was a blatant c/p from the esp-lwip Kconfig entry. Reworded to mention ARP tables.

    #if WIFI_GRATUITOUS_ARP_SUPPORT
        // Only send out gratuitous ARP when in STA mode
        if (_wifi_gratuitous_arp_interval && ((WiFi.getMode() & WIFI_AP) == 0)) {
            _wifiSendGratuitousArp(_wifi_gratuitous_arp_interval);

With STA+AP mode the WiFi radio is always on, which makes it less likely ARP requests may be missed.
But it may still be useful to send it if you get a (new) IP address.
Just make sure to only send it to the STATION_IF if you want to distinguish this.

N.B. the AP and STA use different MAC addresses, so it should be no problem to send out Gratuitous ARP to the AP interface. (as long as you're sending the right IP of course :) )


This part is copied from the idf, basically trying to emulate GARP flag that they check. No flag here, so STATION_IF seems like a solution.

mcspr added a commit to mcspr/ESPEasy that referenced this pull request Aug 30, 2019
See this comment and further discussion in the issue:
xoseperez/espurna#1877 (comment)

Use the same condition as esp-lwip.
In 2.3.0/lwip1 builds, netif->num increments for each sta or ap.
lwip2 keeps those constant, but that seems like an implementation detail that
might break in the future anyway...
@mcspr mcspr merged commit 35ce687 into xoseperez:dev Aug 30, 2019
@mcspr mcspr deleted the etharp_gratuitous branch August 30, 2019 07:58
@mcspr mcspr mentioned this pull request Sep 9, 2019
Development

Successfully merging this pull request may close these issues.

Sonoff Dual R2 IP Address not reachable after a while
3 participants