Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gnrc_rpl: do not assert netif on auto-init #10895

Merged
merged 2 commits into from
Mar 27, 2019

Conversation

miri64
Copy link
Member

@miri64 miri64 commented Jan 29, 2019

Contribution description

gnrc_networking is unusable when compiled for boards that do not have any network devices on-board due to an assertion in RPL's auto-init. I think this is pretty harsh. A friendly info message is enough, as it might not even be an error. Also, if one expects RPL to work without network interfaces they are a fool ;-). Also also, the GNRC_NETIF_NUMOF != 1 case doesn't have this assertion either.

Testing procedure

Compile gnrc_networking for a board without any on-board radios. Without this PR the board will run into a failed assertion on start-up. With it, it will just continue and print a message before the normal welcome message: "Unable to auto-initialize RPL. No interfaces found."

Issues/PRs references

None, but discovered in #10412.

`gnrc_networking` is unusable when compiled for boards that do not have
any network devices on-board due to an assertion in RPL's auto-init. I
think this is pretty harsh. A friendly info message is enough, as it
might not even be an error. Also, if one expects RPL to work without
network interfaces they are a fool ;-).
@miri64 miri64 requested a review from cgundogan January 29, 2019 17:01
@miri64 miri64 added Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation Area: network Area: Networking labels Jan 29, 2019
@miri64 miri64 added this to the Release 2019.04 milestone Jan 29, 2019
@cgundogan
Copy link
Member

@MichelRottleuthner and me discussed this a couple of days ago. IMO it's really strange that we set GNRC_NETIF_NUMOF to 1 per default on each board, no matter if it has a network device or not. Having RPL to assert a network device is IMO correct. If a firmware is not including any network device, then the only solution I see is: set GNRC_NETIF_NUMOF correctly to 0 and do not compile and run any network related stuff (like this RPL auto_init)

@miri64
Copy link
Member Author

miri64 commented Jan 29, 2019

IMO it's really strange that we set GNRC_NETIF_NUMOF to 1 per default on each board, no matter if it has a network device or not.

One thing is a compile time configuration, the other a runtime configuration. Once #10622 is in effect, this could be normal behavior.

Having RPL to assert a network device is IMO correct.

IMO not. Just because GNRC_NETIF_NUMOF is set to 1 doesn't mean there has to be one (see the cases at hand).

If a firmware is not including any network device, then the only solution I see is: set GNRC_NETIF_NUMOF correctly to 0 and do not compile and run any network related stuff (like this RPL auto_init)

This is a possible optimization (or rather: do not include gnrc_netif at all would be the right answer), but the default case should also not crash and burn, just because the routing protocol doesn't find its interface.

@cgundogan
Copy link
Member

My ceterum censeo is that we should not bloat the code with run-time checks, but rather get the compile-time dependencies straight. Adding a run-time check here is a very simple, but palliative solution. This check is unnecessary for correctly compiled applications, so that we add superfluous overhead for properly configured apps (which should be the majority of all compilations) ...

@miri64
Copy link
Member Author

miri64 commented Jan 29, 2019

My ceterum censeo is that we should not bloat the code with run-time checks, but rather get the compile-time dependencies straight. Adding a run-time check here is a very simple, but palliative solution. This check is unnecessary for correctly compiled applications, so that we add superfluous overhead for properly configured apps (which should be the majority of all compilations) ...

The problem is as e.g. in #10412 that this error was also caused by runtime errors (e.g. the radio failed to initialize for whatever reason). Basically, this assert (again: the routing protocol making assumptions about the lower layer) hardstops any attempt the lower layer might try to resolve the situation. IMHO the assert is completely in the wrong place.

@cgundogan
Copy link
Member

Basically, this assert (again: the routing protocol making assumptions about the lower layer) hardstops any attempt the lower layer might try to resolve the situation.

It's not the routing protocol; the assert is in the context of auto_init. Don't auto_init, if device initialization can fail, because there is no way to re-run all auto_init functions "after lower layers resolve the situation" (which there is nothing like that in RIOT that I know of).

In your example, the correct behaviour would be to "resolve" the situation on the lower layer, i.e., in a blocking way, such that upper layers don't even start to auto_init. Our auto_init does not support a sophisticated error handling and re-running of auto_init functions (yet?). It merely is a sequence of auto_init function calls without any dependency checks. My fear is that putting run-time checks here and there only "shadows" actual problems. Makes it disappear for now, only to come back and haunt us during refactoring attempts or alike.

I consider putting an if here as a workaround, and thus it should be marked as such, e.g. in a big fat comment. I am not happy with this solution, but I also cannot provide other suggestions currently.

@miri64
Copy link
Member Author

miri64 commented Jan 30, 2019

It's not the routing protocol; the assert is in the context of auto_init. Don't auto_init, if device initialization can fail, because there is no way to re-run all auto_init functions "after lower layers resolve the situation" (which there is nothing like that in RIOT that I know of).

Fine, then it is the auto_init of gnrc_rpl not gnrc_rpl itself. It's still pretty confusing when a filename that is associated with a routing protocol is throwing an assertion at you that basically means: "network device was not initialized". Even the network devices themselves have that as a crash condition. But to come to the conclusion you

  1. Have to go to the failed assertion (to figure out what's wrong), for which you have to know how to translate the (per default) printed address into a line,
  2. Need to know, that gnrc_netif_iter() return NULL means "there is no network interface".

Most users will just assess: "Welp, networking doesn't work on my Arduino, because this RPL thing is crashing the system, please help!"

Our auto_init does not support a sophisticated error handling and re-running of auto_init functions (yet?). It merely is a sequence of auto_init function calls without any dependency checks.

Yes it doesn't support it yet, but this assert blocks future efforts towards this IMHO.

My fear is that putting run-time checks here and there only "shadows" actual problems. Makes it disappear for now, only to come back and haunt us during refactoring attempts or alike.

It already steps in at a place where it is not supposed to. So removing it doesn't shadow the problem, it keeps problem to the module responsible for it. E.g. IPv6 doesn't fail either when there are no network interfaces, because it can work with that, and so does RPL (naïve solution would be to type rpl init once an interface becomes online). Since the current solution is completely breaking the layered abstraction I rather believe that the current situation will haunt us if we refactor things, not the other way around.

I consider putting an if here as a workaround, and thus it should be marked as such, e.g. in a big fat comment. I am not happy with this solution, but I also cannot provide other suggestions currently.

But so is the assert. It causes a crash due to a problem in the lower layer. At least the if handles the situation gracefully (and gives the user actual feedback about whats wrong).


If a firmware is not including any network device, then the only solution I see is: set GNRC_NETIF_NUMOF correctly to 0 and do not compile and run any network related stuff (like this RPL auto_init)

This is a possible optimization (or rather: do not include gnrc_netif at all would be the right answer), but the default case should also not crash and burn, just because the routing protocol doesn't find its interface.

On that note after a night's sleep: If we somehow manage to exclude gnrc_netif in the case of no actual network device being there then we also shouldn't include gnrc_ipv6 (no interfaces, no IPv6) and also, among others, gnrc_rpl (no IPv6, no RPL).


Finally, there is another valid reason why there might be no network interface at startup: the device might need "manual" bootstrapping (or bootstrapping provided by the dynamic configuration, but which still might need some time). Classic examples: WiFi or Bluetooth.

@cgundogan
Copy link
Member

I am not saying that assert is the ultimate solution, neither is a runtime check. Currently, the rpl auto_init is one of its kind in the observable RIOT universe: no other auto_init requires an immediately working netif interface during auto_init. Another candidate could've been nib, but it is directly hooking into the netif creation routine, so there's no problem for nib in such cases.

There is no clean way to solve this with our current approach of auto_init.
A run-time check with a log statement will only work for debugging on-site. What happens if the firmware is deployed in production? Nobody will ever see this log statement, but the binary will still contain all runtime checks.

IMO, the most appropriate solution would be to have interface up/down events and let rpl auto_init be called on up events. But we are not there yet. This is why I propose: 1) move RPL auto_init to the same place, where nib is initialized, 2) disable RPL auto_init by default, or 3) add a comment telling a user who stumbles across the auto_init function that this sort of error handling (the runtime check you provide) is just a workaround.

FWIW, the quickest solution in this case is adding the comment. And I would really appreciate it if you make this warning as flashy as you can, so 1) other developers don't see these kind of checks as the normal way to handle such scenarios, and 2) that we have a reminder for ourselves to replace this auto_init with something more appropriate in the future (interface events, etc.)

@miri64
Copy link
Member Author

miri64 commented Feb 26, 2019

FWIW, the quickest solution in this case is adding the comment. And I would really appreciate it if you make this warning as flashy as you can, so 1) other developers don't see these kind of checks as the normal way to handle such scenarios, and 2) that we have a reminder for ourselves to replace this auto_init with something more appropriate in the future (interface events, etc.)

Done

Copy link
Member

@cgundogan cgundogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @miri64. That's exactly what I wanted (: ACK

@cgundogan cgundogan added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Reviewed: 1-fundamentals The fundamentals of the PR were reviewed according to the maintainer guidelines Reviewed: 2-code-design The code design of the PR was reviewed according to the maintainer guidelines Reviewed: 3-testing The PR was tested according to the maintainer guidelines Reviewed: 5-documentation The documentation details of the PR were reviewed according to the maintainer guidelines labels Mar 27, 2019
@miri64 miri64 merged commit 7931d66 into RIOT-OS:master Mar 27, 2019
@miri64 miri64 deleted the gnrc_rpl/enh/do-not-assert-netif branch March 27, 2019 17:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: network Area: Networking CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Reviewed: 1-fundamentals The fundamentals of the PR were reviewed according to the maintainer guidelines Reviewed: 2-code-design The code design of the PR was reviewed according to the maintainer guidelines Reviewed: 3-testing The PR was tested according to the maintainer guidelines Reviewed: 5-documentation The documentation details of the PR were reviewed according to the maintainer guidelines Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants