at86rf231 initialization loses race against auto_init_gnrc_rpl #16359

Closed
rgrunbla opened this issue Apr 20, 2021 · 4 comments · Fixed by #16527
Labels
Area: network Area: Networking Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

Comments

@rgrunbla
Contributor

Hello!

Description

I'm trying to auto-initialize RPL on iotlab-m3 nodes using the gnrc_networking example, but it is not working. With debug output enabled and the interface specified, I get this error:

auto_init_gnrc_rpl: could not initialize RPL on interface 6 - interface does not exist
main(): This is RIOT! (Version: UNKNOWN (builddir: /nix/store/k1p1qr1m0zir5k55c6hdxl0s3sfpdb47-RIOT-master))
RIOT network stack example application
All up, running the shell now
> ifconfig
ifconfig
Iface  6  HWaddr: 1F:60  Channel: 26  Page: 0  NID: 0x23  PHY: O-QPSK

          Long HWaddr: A6:DA:E9:93:B3:CC:9F:60
           TX-Power: 0dBm  State: IDLE  max. Retrans.: 3  CSMA Retries: 4
          AUTOACK  ACK_REQ  CSMA  L2-PDU:102  MTU:1280  HL:64  RTR
          6LO  IPHC
          Source address length: 8
          Link type: wireless
          inet6 addr: fe80::a4da:e993:b3cc:9f60  scope: link  VAL
          inet6 group: ff02::2
          inet6 group: ff02::1
          inet6 group: ff02::1:ffcc:9f60

          Statistics for Layer 2
            RX packets 2  bytes 86
            TX packets 2 (Multicast: 2)  bytes 86
            TX succeeded 2 errors 0
          Statistics for IPv6
            RX packets 2  bytes 128
            TX packets 2 (Multicast: 2)  bytes 128
            TX succeeded 2 errors 0

When not specifying the interface, I get a similar error:

Unable to auto-initialize RPL. No interfaces found.
main(): This is RIOT! (Version: UNKNOWN (builddir: /nix/store/k1p1qr1m0zir5k55c6hdxl0s3sfpdb47-RIOT-master))
RIOT network stack example application
All up, running the shell now
> rpl
rpl
instance table: [ ]
parent table:   [ ]     [ ]     [ ]

> ifconfig
ifconfig
Iface  6  HWaddr: 0B:1F  Channel: 26  Page: 0  NID: 0x23  PHY: O-QPSK

          Long HWaddr: 5A:EC:05:B2:53:D9:8B:1F
           TX-Power: 0dBm  State: IDLE  max. Retrans.: 3  CSMA Retries: 4
          AUTOACK  ACK_REQ  CSMA  L2-PDU:102  MTU:1280  HL:64  RTR
          6LO  IPHC
          Source address length: 8
          Link type: wireless
          inet6 addr: fe80::58ec:5b2:53d9:8b1f  scope: link  VAL
          inet6 group: ff02::2
          inet6 group: ff02::1
          inet6 group: ff02::1:ffd9:8b1f

          Statistics for Layer 2
            RX packets 2  bytes 86
            TX packets 5 (Multicast: 5)  bytes 215
            TX succeeded 5 errors 0
          Statistics for IPv6
            RX packets 2  bytes 128
            TX packets 5 (Multicast: 5)  bytes 320
            TX succeeded 5 errors 0

I modified netif_register in sys/net/netif/netif.c to add DEBUG statements just after its start and just before its return. With those in place, the message about auto-initialization failing ("Unable to auto-initialize RPL. No interfaces found.") is interleaved with the "netif register start" and "netif register end" messages (which I added):

Unable to aat86rf2xx_reset(): reset complete.
netif register start
netif register end
uto-initialize RPL. No interfaces found.
main(): This is RIOT! (Version: UNKNOWN (builddir: /nix/store/km3r0mng7msn87v1p5vmlyfyxr7l8xjk-RIOT-master))
RIOT network stack example application
All up, running the shell now

It seems to me there is some problem with the auto-initialization for this board (or I didn't quite get how to do it).

Expected results

I expect auto initialization for RPL to work with the gnrc_networking example on FIT-IOT m3 nodes.

Versions

Linux (NixOS), building everything from the master branch of RIOT-OS. A first step would be to reproduce the problem on FIT-IOT with another toolchain, to rule out any problem on my side.

Thanks,
Rémy

@chrysn
Member

chrysn commented Apr 20, 2021

There was an earlier discussion about making an auto init component blocking around #15207, but there it was sufficient to just do a check on thread priorities.

Here, my best shot would be to make the auto_init_at86rf2xx function block until the interfaces are really up. Judging from a look at the call tree, it looks to me like pretty much all netifs are potentially prone to this (gnrc_netif_create only spawns the thread), and their initialization only completes in time if they never block (because then their higher priority means they can complete before auto init continues).

I suppose that it's the res = dev->driver->init(dev) call that blocks. Given that this might fail, pulling netif_register(&netif->netif) up to before the first blocking point is probably not such a good idea. It'd be tempting to, in auto init, briefly gnrc_netif_acquire/release the newly created netif (for it gets locked with high priority initially and then only released after successful completion), but that'd make auto init (and thus the full application) come to a full halt if the initialization fails.

So, which options do we have?

  • I see adding a mutex that's released on both successful and unsuccessful initialization, or some msg ping-pong (auto init sends GNRC_NETAPI_MSG_TYP_GET on some common property, and waits for a reply) -- neither sounds overly tempting to me.
  • Require that ->init does not block. I don't know how feasible this is in general, especially given that it's supposed to be fallible and that a netif connected via I2C will necessarily have a blocking operation before it can even be determined whether it's physically there.
  • The "do nothing" option is not too tempting either, though: the at86 interface is already out there and silently slips past any later auto init that looks at the interfaces present, and an implementation change in any device's init function could move that device into the slipping-through category.
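
For illustration, the first option (a lock that is released on both outcomes) could look roughly like the sketch below. It is a host-side model built on POSIX threads, not RIOT code: fake_netif_t, fake_netif_setup, fake_netif_thread and fake_auto_init_netif are hypothetical names, and the init_should_fail flag is an assumption standing in for a driver whose init fails.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one network interface; none of this is RIOT API. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  done_cond;
    bool init_done;         /* set on success AND on failure */
    bool registered;        /* set only when init succeeded */
    bool init_should_fail;  /* stand-in for a failing driver init */
} fake_netif_t;

static void fake_netif_setup(fake_netif_t *n, bool init_should_fail)
{
    pthread_mutex_init(&n->lock, NULL);
    pthread_cond_init(&n->done_cond, NULL);
    n->init_done = false;
    n->registered = false;
    n->init_should_fail = init_should_fail;
}

/* Stand-in for the interface thread: driver init, then register. */
static void *fake_netif_thread(void *arg)
{
    fake_netif_t *n = arg;

    /* A real driver could block here (SPI/I2C traffic, reset delays). */
    bool ok = !n->init_should_fail;

    pthread_mutex_lock(&n->lock);
    n->registered = ok;     /* register only on success */
    n->init_done = true;    /* but report the outcome either way */
    pthread_cond_signal(&n->done_cond);
    pthread_mutex_unlock(&n->lock);
    return NULL;
}

/* Stand-in for auto init: spawn the thread, then wait for its outcome.
 * Returns true if the interface ended up registered. */
static bool fake_auto_init_netif(fake_netif_t *n)
{
    pthread_t tid;

    if (pthread_create(&tid, NULL, fake_netif_thread, n) != 0) {
        return false;
    }

    pthread_mutex_lock(&n->lock);
    while (!n->init_done) {
        pthread_cond_wait(&n->done_cond, &n->lock);
    }
    bool ok = n->registered;
    pthread_mutex_unlock(&n->lock);

    pthread_join(tid, NULL);
    return ok;
}
```

In this model fake_auto_init_netif() returns only after the interface thread has reported an outcome, so a later step (RPL auto init, say) would find the interface exactly when registration succeeded, and a failed init does not halt everything. The price is that auto init waits for the slowest driver.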

@chrysn
Member

chrysn commented Apr 20, 2021

There's one more option:

  • We could move the auto inits that currently run during the auto init phase from "at the end of auto-init" to "whenever an interface gets added". The hooks could retain their semantics -- trigger when an interface is added if its number matches, rather than when init is done if the interface is present. It would also make it easier to auto-start something whenever any interface is added.
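
A minimal sketch of that hook idea, as plain C that runs on a host. All names here (iface_hook_add, iface_registered, ANY_IFACE, start_rpl_on) are hypothetical and only model the proposal; they are not existing RIOT functions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HOOKS  4
#define ANY_IFACE  (-1)   /* hypothetical wildcard: match every interface */

typedef void (*iface_hook_cb_t)(int iface_id);

typedef struct {
    int wanted_iface;     /* interface number to match, or ANY_IFACE */
    iface_hook_cb_t cb;   /* what to start once that interface exists */
} iface_hook_t;

static iface_hook_t hooks[MAX_HOOKS];
static size_t hook_count;

/* Register a hook; returns 0 on success, -1 if the table is full
 * or no callback was given. */
static int iface_hook_add(int wanted_iface, iface_hook_cb_t cb)
{
    if (cb == NULL || hook_count >= MAX_HOOKS) {
        return -1;
    }
    hooks[hook_count].wanted_iface = wanted_iface;
    hooks[hook_count].cb = cb;
    hook_count++;
    return 0;
}

/* Would be called once an interface has been registered successfully.
 * Runs every hook whose interface number matches. */
static void iface_registered(int iface_id)
{
    for (size_t i = 0; i < hook_count; i++) {
        if (hooks[i].wanted_iface == ANY_IFACE ||
            hooks[i].wanted_iface == iface_id) {
            hooks[i].cb(iface_id);
        }
    }
}

/* Example hook: records where "RPL" was started (0 = not started). */
static int rpl_started_on = 0;

static void start_rpl_on(int iface_id)
{
    rpl_started_on = iface_id;
}
```

With a hook added via iface_hook_add(6, start_rpl_on), nothing happens when some other interface is registered, and start_rpl_on runs as soon as interface 6 is. Since the trigger is the registration itself, it no longer matters whether the driver's init blocked or how long it took.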

@chrysn chrysn added Area: network Area: Networking Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) labels Apr 21, 2021
@benpicco
Contributor

benpicco commented Apr 27, 2021

Something similar seems to happen with uhcpd on samr21-xpro with UPLINK=slip and examples/gnrc_border_router:

gnrc_uhcpc: Using 6 as border interface and 0 as wireless interface.
gnrc_uhcpc: only one interface found, skipping setup

As a workaround, this 'fixes' the issue there:

--- a/sys/net/gnrc/application_layer/uhcpc/gnrc_uhcpc.c
+++ b/sys/net/gnrc/application_layer/uhcpc/gnrc_uhcpc.c
@@ -34,6 +34,8 @@ static void set_interface_roles(void)
 {
     gnrc_netif_t *netif = NULL;
 
+    xtimer_sleep(1);
+
     while ((netif = gnrc_netif_iter(netif))) {
         kernel_pid_t dev = netif->pid;
         int is_wired = gnrc_netapi_get(dev, NETOPT_IS_WIRED, 0, NULL, 0);

@benpicco
Contributor

Does #16527 fix this?
