Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extensebility of netdev_driver_t API #12469

Closed
maribu opened this issue Oct 16, 2019 · 44 comments
Closed

Extensebility of netdev_driver_t API #12469

maribu opened this issue Oct 16, 2019 · 44 comments
Assignees
Labels
Discussion: needs consensus Marks a discussion that needs a (not necessarily moderated) consensus Discussion: RFC The issue/PR is used as a discussion starting point about the item of the issue/PR State: stale State: The issue / PR has no activity for >185 days

Comments

@maribu
Copy link
Member

maribu commented Oct 16, 2019

The Issue

In PR #12294 the feature to let the transceiver sleep and wake up again is developed. I strongly think that there should be a common API for this functionality, so this feature can be used in the same way for all transceiver supported by RIOT.

There seems however not a an easy way to extend the API, except for the netdev_driver_t::get() and netdev_driver_t::set() function - those have been proven to be very well extensible without having compatibility issues.

Using netdev_driver_t::set() for sleeping

One could use netdev_driver_t::set(dev, NETOPT_STATE, NETOPT_STATE_SLEEP, sizeof(NETOPT_STATE_SLEEP)) as a conical way to get the device sleeping. But what should be the way to wake it up again?

One of the following? Any of the following?

  1. netdev_driver_t::set(dev, NETOPT_STATE, NETOPT_STATE_IDLE, sizeof(NETOPT_STATE_IDLE))
  2. netdev_driver_t::set(dev, NETOPT_STATE, NETOPT_STATE_RX, sizeof(NETOPT_STATE_RX))
  3. netdev_driver_t::set(dev, NETOPT_STATE, NETOPT_STATE_RESET, sizeof(NETOPT_STATE_RESET))

TX Preloading

A similar approach is used for TX preloading: When preloading is active, netdev_driver_t::send() will not send but preload the given frame. And netdev_driver_t::set(dev, NETOPT_STATE, NETOPT_STATE_TX, sizeof(NETOPT_STATE_TX))

The issue with netdev_driver_t::set() for actions

To me the fact that netdev_driver_t::set() and netdev_driver_t::get() have been proven to be so versatile proves that this API is well designed. But this doesn't seem to reflect actions well.

E.g. with the state of the driver: Normally, this is and implementation detail. The internal state machine of the driver can either reflect the states in netopt_state_t very well, or could be vastly more complex, or anything in between. Similar, the transitions between the states greatly depend on the properties of both the specific hardware used and the design decisions in the driver made. Providing an API that looks like force-setting the internal state seems very wrong to me.

The road forward

So, what should we do?

Should we simple keep using the get() / set() not only for things like configuration knobs (for which the API - at least to my personal opinion - has served us very well), but also for actions?

Or should we try to add a second API that should ideally as extensible as the get() / set() API while also allowing netdev_driver_ts to only support a subset of the features?

Very likely there are more options that I haven't thought about

Collecting ideas

Adding an action API

How about adding a specific action API, so that netdev_driver_t becomes:

typedef struct netdev_driver {
    int (*action)(netdev_t *dev, netdev_action_t action, void *value, size_t value_len);
    int (*init)(netdev_t *dev); 
    void (*isr)(netdev_t *dev);
    int (*get)(netdev_t *dev, netopt_t opt,  void *value, size_t max_len);
    int (*set)(netdev_t *dev, netopt_t opt, const void *value, size_t value_len);
} netdev_driver_t;

with

typdef enum {
    NETDEV_ACTION_SEND, /**< Sends a frame. Value is of type `const iolist_t *` and `value_len` is `sizeof(iolist_t)`. */
    NETDEV_ACTION_TX_PRELOAD, /**< Preloads a frame for TX. Value is of type `const iolist_t *` and `value_len` is `sizeof(iolist_t)`. A subsequent call with `NETDEV_ACTION_SEND` and `value == NULL` will send it */
    NETDEV_ACTION_RX_SIZE, /**< Get the size of the received frame without dropping it. `value` and `value_len` are ignored, but should be set to zero by the caller */
    NETDEV_ACTION_DROP, /**< Drops the received frame. `value` and `value_len` are ignored, but should be set to zero by the caller */
    NETDEV_ACTION_RECV, /**< Retrieves the received frame */
    NETDEV_ACTION_SLEEP, /**< Power down the netdev to conserve power. No frames are received any more */
    NETDEV_ACTION_WAKE_UP, /**< Powers up the netdev to be able to receive frames */
    NETDEV_ACTION_PEEK, /**< Retrieve the beginning of the received or incoming frame without dropping it, cropping it to the provided buffer size if needed */
} netdev_action_t;

(The details of NETDEV_ACTION_RECV are purposely left out, as e.g. just using value as the destination buffer and value_len as the buffer size would lack the feature of the RX info; so either a struct that can be used to pass the buffer and retrieve the RX info should be used, or a second action or a get() would be needed to get the RX info. But this in-detail discussion is misplaced here until the general idea of this approach should be agreed upon first.)

This proposal tries to learn some lessons from previous discussions and ideas:

  • The get()/set() API has proved to be very flexible and extensible. This proposed API tries to copy this feature
  • Previous tries in extending the API had some flaws:
    • The recv() function has been overloaded to support the drop() and get_size() functionality by providing magic values as arguments. This lead to confusion about the API and a lot of bugs in the implementations (see Misleading API in netdev driver #9805). The explicit enum value provided here should be more obvious
    • Adding additional functions increases the size of the struct and the ROM size of the implementations as well (see numbers provided in Misleading API in netdev driver #9805). This approach does the opposite by replacing the existing send() and recv() functions with a more generic function
@maribu maribu added Discussion: RFC The issue/PR is used as a discussion starting point about the item of the issue/PR Discussion: needs consensus Marks a discussion that needs a (not necessarily moderated) consensus labels Oct 16, 2019
@maribu
Copy link
Member Author

maribu commented Oct 16, 2019

@RIOT-OS/maintainers: Please have a look at this and discuss :-)

Please feel invited to directly add alternative ideas as subsections to the "Collecting ideas" section in the description above.

@jia200x
Copy link
Member

jia200x commented Oct 16, 2019

hi @maribu

This is aligned with the PHY/MAC/netif rework. There was a presentation for the RIOT summit that unfortunately couldn't present because of lack of time. We discussed some insights with @PeterKietzmann, @miri64, @bergzand, @haukepetersen and @kaspar030, but the idea was to have a wide discussion about this topic.

I will try to record the original session asap so we are all in sync.

@jia200x
Copy link
Member

jia200x commented Nov 21, 2019

Hi @maribu

After opening #12741, I think it's easier to explain my point of view here.

I think considering netdev is an interface that encapsulates several actions, the proposed approach can make sense.

However, I would like to point the problem the netif rework is trying to resolve (in the mid-long term, third phase of #12688 )

(sorry for the Wall Of Text)

The problem

Netdev is documented as a network interface:

This interface provides a uniform API for network stacks to interact with network device drivers. This interface is designed in a way, that it is completely agnostic to the used network stack. This way, device drivers for network devices (e.g. IEEE802.15.4 radios, Ethernet devices, ...) have to implemented once and can be used with any supported network stack in RIOT.

But in practice is closer to a PHY layer (the MAC layer is on top, e.g it's called by the IEEE802.15.4 MAC in gnrc_netif_ieee802154):

But also abstracts some transceiver operations (read packet, set hardware acceleration params, etc).

So it would be ideal to have different layers to have a MAC, PHY and whatever layer is needed in the middle (proposed in #7736).
But in practice, the APIs of each MAC and PHY layer has unique API (an ethernet PHY has no idea how to set an IEEE802.15.4 channel...) . So we could benefit if we define specific APIs for each layer. Something like:

(this is partially shown in #12741).

E.g for the proposed actions:

  • NETDEV_ACTION_SEND: This can be replaced by the PHY layer send function (e.g ieee802154_phy_send).
  • NETDEV_ACTION_TX_PRELOAD: In theory the transceiver always pre-loads a packet. Then someone triggers send (TX_START flag). This option was introduced because of MAC layers with duty cycling (TSCH), where it's needed to preload a packet before sending in the exact time. However, this is not needed if there's a dedicated PHY layer (or if the MAC layer has direct access to the transceiver, e.g OpenWSN).
  • NETDEV_ACTION_RX_SIZE: This is a PHY layer attribute. It can be implemented in the corresponding PHY (e.g indication message in IEEE802.15.4 packet).
  • NETDEV_ACTION_DROP: The transceiver usually doesn't know how to drop a packet. We need to add this command because we call netdev->driver->recv function twice (one to get the packet size, another one to put the packet in a buffer). If we don't implement "drop", there's no way to tell the radio "please receive more packets". But same as before, this is PHY/transceiver HAL dependent.
  • NETDEV_ACTION_RECV: This is a transceiver indication. It can be implemented in the transceiver HAL (or in a component of the PHY layer).
  • NETDEV_ACTION_SLEEP: This is a transceiver operation. It can be implemented in the transceiver HAL.
  • NETDEV_ACTION_WAKE_UP: Ditto.
  • NETDEV_ACTION_PEEK: This can also be modeled as a transceiver HAL operation.

In fact, send, recv, get, set are "network interface" operations. Check how similar is the API of gnrc_netif_ops and netdev_driver_t.

Note that having MAC/PHY specific APIs doesn't interfere with users sending data directly using the low layers:

  • A IEEE802.15.4 PHY user (probably someone designing an IEEE802.15.4 MAC layer or an enthusiast) will use the IEEE802.15.4 PHY layer, no abstraction needed there.
  • Someone that wants to send messages between IEEE802.15.4 radios, will need to use the IEEE802.15.4 MAC layer.
  • Someone that wants to send data regardless of the MAC layer below, needs a network interface (netif).

However, I'm fine to continue with this proposed approach, since the netif rework will take some time and it doesn't collides with these principles.
If we all agree to have dedicated PHY and MAC layers (#11483) then we would only need to move code.

I hope everything is clear! Please let me know if I miss something

@maribu
Copy link
Member Author

maribu commented Nov 21, 2019

I fail to fully understand this. Could you sketch the API of e.g. an IEEE 802.15.4 Radio HAL? How would it differ (except for names) from the proposed netdev API? (I get that you want to provide HALs specific to the network technology used. But isn't this already the case? The netdev_driver_t::get() and netdev_driver_t::set() APIs already only provided what is relevant to the technology. E.g. setting TX power on a netdev_driver_t for an Ethernet device will simply return -ENOTSUP.)

NETDEV_ACTION_SEND: This can be replaced by the PHY layer send function (e.g ieee802154_phy_send).

Hardly. The PHY layer send function will need some API to implement the send feature. Something like NETDEV_ACTION_SEND. But I like the idea very much to split PHY and bare device driver APIs. This would likely allow some significant code-deduplication

@maribu
Copy link
Member Author

maribu commented Nov 21, 2019

Let's say the different HALs are implemented with the three basic get(), set(), action() (and some stuff for handling ISRs, and initialization). And every HAL for the different technology only differs in the enums used to specify the options to set() or get(), or the action to perform. Would that already be something that would match the concept you proposed?

That would certainly make the compilers job easier to generate efficient machine code for a switch() of the enums.

@jia200x
Copy link
Member

jia200x commented Nov 22, 2019

I fail to fully understand this. Could you sketch the API of e.g. an IEEE 802.15.4 Radio HAL? How would it differ (except for names) from the proposed netdev API?

Without any cross layer optimizations, a pure theoretical IEEE802.15.4 HAL would need:

  • An interface to the FIFO of the transceiver
  • A function to trigger send
  • A function to set the transceiver state
  • Functions to access hardware acceleration features (L2 filters, auto-ACK, retransmissions, CSMA params, etc).
  • A function to set the carrier frequency and modulation (QPSK, FSK, ASK, etc)

A pure PHY layer on top would:

  • Calculate the carrier frequency and modulation from the channel number + page and set it via the HAL interface
  • Function to set PHY states (e.g note "sleep" is not a PHY state)
  • Provide a send function using the FIFO interface, set transceiver state and trigger send functions
  • Etc

(I'm not assuming it's the smartest idea to have pure theoretical PHY and HAL because most radios provide some components of PHY, but it's just to show the differences)

Indeed it's possible to encapsulate everything under get and set. I would just try to see if it makes sense to encapsulate under these when there's a set of function that must be implemented in a given technology.
E.g, ALL IEEE802.15.4 radios MUST implement set_channel functions.

We could benefit from type checks. E.g

int channel = 16;
ieee802154_set_channel(dev, channel);

instead of

uint8_t channel = 16; /* This MUST be uint8_here */
netdev->driver->set(NETOPT_CHANNEL, &channel, sizeof(channel));

But yes, this is just an implementation decision :)

Hardly. The PHY layer send function will need some API to implement the send feature. Something like NETDEV_ACTION_SEND

It depends on the scope of the transceiver HAL. If we add a FIFO interface, then it's possible to implement the "send" function on top.

Let's say the different HALs are implemented with the three basic get(), set(), action() (and some stuff for handling ISRs, and initialization). And every HAL for the different technology only differs in the enums used to specify the options to set() or get(), or the action to perform. Would that already be something that would match the concept you proposed?

As said before, for sure it would match. The only question that comes to my mind is, if it makes sense to encapsulate under these three basics when the actions are well known for each technology. We could think about vtables. But again, this is just an implementation detail and it shouldn't interfere with the architecture.

That would certainly make the compilers job easier to generate efficient machine code for a switch() of the enums.

That's for sure!

@jia200x
Copy link
Member

jia200x commented Nov 22, 2019

I just found this: https://github.com/RIOT-OS/RIOT/blob/master/drivers/at86rf2xx/at86rf2xx_netdev.c#L489

This check is not device dependent (the PHY channels are PHY dependent. IEEE802.15.4 2.4GHz always uses page 0).
With a transceiver HAL, we can move the validation to the PHY layer and assume we would never set a wrong channel

@maribu
Copy link
Member Author

maribu commented Nov 28, 2019

Reducing the number of "actual" functions in the HAL has proven to reduce the ROM requirements significantly. The benefit of proper type checks and range changes could be provided upon a slim API via static inline functions that encapsulate e.g. the proposed action(), get() and set() functions. I'd even say it would be nice to have the action(), get() and set() interface as "private" interface, and let users only use static inline functions to access them. That would yield an API with easy to use and more descriptive functions that provide both type safety and range check; but without the increased ROM requirements.

From the perspective of an implementer of a device driver it is actually quite nice to have few functions like get(), set() and action(): The boilerplate code needed for every communication with the device (e.g. spi_acquire() and spi_release()) could be shared, which can reduce the lines of code needed to implement it a bit.

@jia200x
Copy link
Member

jia200x commented Nov 29, 2019

Reducing the number of "actual" functions in the HAL has proven to reduce the ROM requirements significantly. The benefit of proper type checks and range changes could be provided upon a slim API via static inline functions that encapsulate e.g. the proposed action(), get() and set() functions. I'd even say it would be nice to have the action(), get() and set() interface as "private" interface, and let users only use static inline functions to access them. That would yield an API with easy to use and more descriptive functions that provide both type safety and range check; but without the increased ROM requirements.

I like the idea of having static inline functions. In fact, it would be nice to have them beforehand so we can measure performance differences between different implementations of HAL.

From the perspective of an implementer of a device driver it is actually quite nice to have few functions like get(), set() and action(): The boilerplate code needed for every communication with the device (e.g. spi_acquire() and spi_release()) could be shared, which can reduce the lines of code needed to implement it a bit.

Agreed. That's the point of separating these sub-layers

@jia200x
Copy link
Member

jia200x commented Nov 29, 2019

Going back to the interface, I think I would also remove the isr and init functions.

The isr can be replaced with OS signals(e.g #12474 ). A driver can register its ISR processing function directly without the need of an extra interface.
The init can be omitted considering there's always a device specific auto_init (e.g auto_init_at86rf2xx inits gnrc_netif, who then will call the driver init function using the netdev interface. Why don't just call it directly in the auto_init function?)

Check e.g the AT86RF2xx test. Note that:

  • It uses stuff from the radio API directly. E.g the device descriptor
  • It calls init and isr via an interface, although we don't need to abstract something there because we know beforehand we are using an Atmel radio.

@benpicco
Copy link
Contributor

A driver can register its ISR processing function directly without the need of an extra interface.

Will that still work if an interrupt is shared between two interfaces?
But since you then need something other than PID to address interfaces anyway, it might even be easier.

@jia200x
Copy link
Member

jia200x commented Nov 29, 2019

Will that still work if an interrupt is shared between two interfaces?

Yes. In fact, it's easier because the driver is the one that knows what to do with the interrupt.
The ISR processing would indirectly call the rx_done of the PHY interface on top (and the driver is the one that de-multiplexes)

@jia200x
Copy link
Member

jia200x commented Nov 29, 2019

I proposed to define the PHY and HAL layer for each device type with static inline wrappers of netdev, so we can proceed with the proposed changes.

Does this sound reasonable?

@maribu
Copy link
Member Author

maribu commented Nov 29, 2019

The init can be omitted considering there's always a device specific auto_init (e.g auto_init_at86rf2xx inits gnrc_netif, who then will call the driver init function using the netdev interface. Why don't just call it directly in the auto_init function?)

I never fully understood why there were to functions for doing the initialization. To me, this makes the process more complex and increases the ROM size, as function calls add (depending on calling convention) some boilerplate instructions. But maybe there was some specific reason for that, so it might be a good idea to ask why this was chosen that way. (And ideally we can find a better solution to that, rather than splitting the setup phase into two functions.)

@jia200x
Copy link
Member

jia200x commented Nov 29, 2019

maybe @kaspar030 or @haukepetersen can say something about it

@kaspar030
Copy link
Contributor

I never fully understood why there were to functions for doing the initialization.

The idea was that at some point, the network stack needs to know "this device can be used". That time has come when "init()" has been called. It is basically the initialization function of the netdev layer (which might be independend of the underlying device.).
Maybe some state within netdev must be set, some device(type) specific fields populated...

@maribu
Copy link
Member Author

maribu commented Nov 29, 2019

I have three alternatives to the two-step initialisation that might work:

  1. Pass a void-pointer to the init function pointing the foo_params_t struct. This has the disadvantage, that the foo_params.h headers need to be included from the network stack, which can get a maintenance issue.
  2. Pass the index within the foo_params array to the init function. Just comparing if the netdev_driver_t-pointer is equal to the previous device (increment index) or not (reset index to zero) should be sufficient. It has the disadvantage, that network interfaces need to be ordered by driver.
  3. Drop the init function from the netdev driver. Let the auto_init function of the driver call into the network stack if initialisation succeeded. (The network stack has to maintain a RAM allocated array of usuable network interfacaes anyway, which can just be populated in this function.) For a RIOT node with a single transceiver, it drops one function and adds one, so no benefit here. But for n netdevs this replaces n functions by 1.

I'm personally in favour of option 3. But maybe this discussion leads to more and better ideas than those three :-)

@jia200x
Copy link
Member

jia200x commented Nov 29, 2019

Drop the init function from the netdev driver. Let the auto_init function of the driver call into the network stack if initialisation succeeded. (The network stack has to maintain a RAM allocated array of usuable network interfacaes anyway, which can just be populated in this function.) For a RIOT node with a single transceiver, it drops one function and adds one, so no benefit here. But for n netdevs this replaces n functions by 1.

+1 to this approach.
Actually the network stack shouldn't see devices but network interfaces. So I would just let the drivers init themselves.

@jia200x
Copy link
Member

jia200x commented Nov 29, 2019

Even there are cases where a network interface is not associated to a device driver (e.g loopback interface)

@kaspar030
Copy link
Contributor

3\. Drop the init function from the netdev driver. Let the auto_init function of the driver call into the network stack if initialisation succeeded.

where? How? Is the network stack already "up"?
We should rather extend "init()" to take event_callback and context as parameters.

netdev->init() is the initialization synchronization point between netdev and a network stack.

@maribu
Copy link
Member Author

maribu commented Nov 29, 2019

netdev->init() is the initialization synchronization point between netdev and a network stack.

First, netdev->init() depends on netdev_foo_init(foo_t *, foo_params_t *) being called prior to its call. And thus forces a specific boot up sequence, so that it doesn't solve the synchronisation requirements.

Second, this is something that is totally unique in RIOT. All other drivers have a foo_init(foo_t *, foo_params_t *) as initialisation, and that's it. Is there anything unique to netdev, that requires or benefits from this unique initialisation approach?

Let's look e.g. at a sensor driver for an I2C sensor. That driver needs the I2C bus to work. But there is no i2c_dec->init() as synchronisation point. The proper initialisation is done by enforcing that periph_init() is done before auto_init(). Then SAUL requires sensors (and actuators) to only get registered if and after they have been successfully initialised. That is simply handled by having each sensor (and actuator) call into SAUL after the initialisation has succeeded.

To me, netdevs behave to the network stack very similar to how sensor drivers behave to SAUL, when only looking at the initialisation requirements. So why having different approaches there? If one solution was preferred over the other for good reasons, those reasons should apply in the other context as well. Also: Consisted look and feel was stated as explicit design goal of RIOT. So to me, there should be sound arguments to justify the inconsistency here; if not, the inconsistency should be addressed.

Finally, the split of the initialisation increases complexity and ROM requirements. But so far noone could explain me the benefit of that split. A simple callback like used in SAUL is enough to let the network stack know that initialisation succeeded. (Let's be honest about the synchronisation: It depends on init functions being called in the correct order with zero concurrency, even with the current approach. But for an OS without SMP support, that is a sane design choice. Especially when low complexity and footprint are design goals.)

@jia200x
Copy link
Member

jia200x commented Dec 2, 2019

o me, netdevs behave to the network stack very similar to how sensor drivers behave to SAUL, when only looking at the initialisation requirements. So why having different approaches there? If one solution was preferred over the other for good reasons, those reasons should apply in the other context as well.

I totally agree. All synchronization should be done in the boot sequence without an init interface.
In any case, the network stack shouldn't be the one who initializes the device driver. Not even the network interface. There are several cases where a device is needed without a network interface nor network stack (jammers, MAC layers, tests, etc).

@kaspar030
Copy link
Contributor

Is there anything unique to netdev, that requires or benefits from this unique initialisation approach?

yes. a network device is by definition asynchronous and triggers events by itself. A random saul driver will not trigger asynchronous events before being configured (initialized by the application and configured to trigger events).

So either the network stack initializes the network device, or the network interface is being initialized before and then being told by the network stack that it is now "active".

Without this extra activation / synchronization step, the driver's ISR doesn't know what to do in case of an ISR (as e.g., event_callback is not set up). The idea is that once "init()" has been called, the driver is from that point on free to send events, e.g., about arriving packets, link-state, ...

In any case, the network stack shouldn't be the one who initializes the device driver.

Exactly. But a netdev without "network stack" cannot handle its own ISRs. netdev->init() is a method to tell the netdev layer that this is now possible.

@maribu
Copy link
Member Author

maribu commented Dec 2, 2019

yes. a network device is by definition asynchronous and triggers events by itself. A random saul driver will not trigger asynchronous events before being configured (initialized by the application and configured to trigger events).

Your description of the sensor matches quite perfectly the behavior of a network device: It will not generate any event until the driver configures it to do so (upon driver initialization).

So why not bring up the network stack, with only the loop back device configured. Then bring up the network devices one by one, which call back into the network stack to register themselves (just like in SAUL) right after the driver is successfully initialized. After that registration, interrupts of the netdev can be enabled and served.

@kaspar030
Copy link
Contributor

Then bring up the network devices one by one,

How? With a function called bringup()? Or maybe init()? Or how do you let the driver know where to call back?

So why not bring up the network stack, with only the loop back device configured.

That might work with gnrc. A more specialized stack might not want to dynamically add new network devices, just expect some to be there and use them.

I'm sure there is a way to get rid of init(), but IMO it remains to be shown that that would be an improvement.

@maribu
Copy link
Member Author

maribu commented Dec 2, 2019

How? With a function called bringup()? Or maybe init()?

The common approach is to call foo_init(foo_t *dev, foo_params_t *params) to initialize a driver. As stated before, I strongly believe that this should be done consistently everywhere, unless valid reasons can be provided to do it differently.

That might work with gnrc. A more specialized stack might not want to dynamically add new network devices, just expect some to be there and use them.

Please elaborate on how this would work with the current API. Summarizing only your own words, the constraints are:

  1. The network device must not be used prior to netdev_driver_t::init() returned a success code
  2. The network device can start to emit events from the point the netdev_driver_t::init() is called. (This explicitly means before that function returns.)
  3. The networks stack expects some network device to just be there and be usable

Dropping netdev_driver_t::isr() as proposed by @jia200x and using the event handler thread would by the way allow using such a static network stack, with two adaptions: The callback into the network stack is not used, and RIOT panics when netdev_foo_init(netdev_foo_t *, netdev_foo_params_t *) fails.

IMO it remains to be shown that that would be an improvement.

Start reading form #12469 (comment), and I'm sure you'll find the arguments you're looking for ;-) But showing the improvement in ROM size with a prototype might actually be more convincing ;-)

@jia200x
Copy link
Member

jia200x commented Dec 2, 2019

So either the network stack initializes the network device, or the network interface is being initialized before and then being told by the network stack that it is now "active".

The network stack shouldn't initialize the device driver. The OS should.
This "hey network stack, you can use this interface" is the "up" status of a network interface.
In fact, the network stack cannot send data if a PPP connection is not established, if LoRaWAN is not joined or if IEEE802.15.4 didn't configure the PAN. So, the network stack shouldn't even be aware of the concept of "device"

Without this extra activation / synchronization step, the driver's ISR doesn't know what to do in case of an ISR (as e.g., event_callback is not set up). The idea is that once "init()" has been called, the driver is from that point on free to send events, e.g., about arriving packets, link-state, ...

This is strictly a responsibility of the PHY layer. It's not even close to the network interface IMO.

Exactly. But a netdev without "network stack" cannot handle its own ISRs. netdev->init() is a method to tell the netdev layer that this is now possible.

In fact, I would expect the driver the startup sequence of the driver (auto_init_xxx) to initialize the PHY layer on top, not the other way around...

@jia200x
Copy link
Member

jia200x commented Dec 2, 2019

An extra note. The device driver shouldn't receive interrupts out of nothing (we are doing something wrong if so). The device driver will receive events only if:

  • The PHY layer set the transceiver state to RX
  • There was a response to an event triggered by the user (Tx Done, Framebuffer overflow, etc) or an indication, that can be handled by the device.

All of them can be handled if the device drivers inits before the PHY layer.
If we start receiving packets in the init function, I'm afraid to say we might be doing something wrong :/

@jia200x
Copy link
Member

jia200x commented Dec 2, 2019

The common approach is to call foo_init(foo_t *dev, foo_params_t *params) to initialize a driver. As stated before, I strongly believe that this should be done consistently everywhere, unless valid reasons can be provided to do it differently.

Strong +1 to this. After all, a network device is just another device. We can use that devices for other stuff that are not even close to networks (RNG, temperature sensor, crypto acceleration, etc)

@bergzand
Copy link
Member

bergzand commented Dec 2, 2019

This discussion seems to be getting stuck around assumptions on both sides. Is there a sequence diagram available somewhere explaining the proposed idea and stating the responsibilities of all functions?

@kaspar030
Copy link
Contributor

kaspar030 commented Dec 2, 2019

The common approach is to call foo_init(foo_t *dev, foo_params_t *params) to initialize a driver.

Yes, nothing wrong with that. But a netdev (logically) sits on top of that. The mentioned calls says nothing about netdev->event_callback. Those are only usable when a network stack takes the properly set up device descriptor and calls it's netdev->init().

Please elaborate on how this would work with the current API.

It works perfectly well.

Summarizing only your own words, the constraints are:

  1. Whatever auto init mechanism sets up the device specific configuration (bus, pins, ...), as in, driver_setup(descriptor, params_t).
1. The network device must not be used prior to `netdev_driver_t::init()` returned a success code

1.1. the network stack initializes its event handling mechanism (set up thread with message loop, ...)
1.2. the network stack takes the array of "netdev_t", iterates them, sets up it's own structures, populates the netdevs' "callback_event" and "context", then calls device->netdev->init().

2. The network device can start to emit events from the point the `netdev_driver_t::init()` is called. (This explicitly means before that function returns.)

Yes, calling init() means the network stack event handling has been set up before.

3. The networks stack expects some network device to just be there and be usable

No, the network stack expects a bunch of "netdev_t" that are ready to use after calling their init().

If there's no explicit "init()", it is implicitly assumed that the network device does not do anything requiring e.g., SPI communication (after its setup function) until the network stack issues an action that enables it (setting to RX state).

Please, treat network drivers as two-stage: their actual implementation and the netdev part. Just because most drivers in RIOT are netdev only doesn't mean we should assume or even force that.

@jia200x
Copy link
Member

jia200x commented Dec 2, 2019

Yes, nothing wrong with that. But a netdev (logically) sits on top of that. The mentioned calls says nothing about netdev->event_callback. Those are only usable when a network stack takes the properly set up device descriptor and calls it's netdev->init().

I fail to understand something here. What do you call a network stack in this context? Are you considering PHY and MAC a part of the network stack?

@kaspar030
Copy link
Contributor

I fail to understand something here. What do you call a network stack in this context? Are you considering PHY and MAC a part of the network stack?

Yes.

@maribu
Copy link
Member Author

maribu commented Dec 2, 2019

I fail to understand something here. What do you call a network stack in this context? Are you considering PHY and MAC a part of the network stack?

Yes.

I refer to @jia200x post above and cite the image of the architecture:

architecutre

What we want to discuss is who a network device driver interfaces with the PHY layer on top. Do we now all agree that a single initialization function is sufficient?

@kaspar030
Copy link
Contributor

kaspar030 commented Dec 2, 2019

Do we now all agree that a single initialization function is sufficient?

No, unless we want to change netdev->init() so it takes device-specific parameters (void *params), in which case the upper layer would use that as sole initialization function for a network device (driver).

Which IMO, we don't want.

Why are you so opposed on having an explicit function for hooking up a netdev to its users?

Network drivers get set up as regular devices (as in, at86rf2xx_setup(dev, params)). At that point, there's a "netdev" that can be used by the network stack, after a call to netdev->init().
That function might be empty, or, especially when stacked netdev is used to share common code, there might be additional initialization that needs to be done.

It is a simple two stage setup procedure, one stage that has all device specific information, one that doesn't but can expect the upper layer to be ready. of course we can fold that together somehow, but that would only tie the layers unnecessarily together.

@jia200x
Copy link
Member

jia200x commented Dec 2, 2019

What we want to discuss is who a network device driver interfaces with the PHY layer on top. Do we now all agree that a single initialization function is sufficient?

Also, this shows the only one that requires the radios to be initialized is the PHY layer.
As mentioned before, network stacks are not the only users of a network device.

Thus, initialization should be carried by boot sequence and not by the network stack.

@jia200x
Copy link
Member

jia200x commented Dec 2, 2019

No, unless we want to change netdev->init() so it takes device-specific parameters (void *params), in which case the upper layer would use that as sole initialization function for a network device (driver).
Which IMO, we don't want.

Of course we don't want that, but I don't see why we need to pass device specific params there

@kaspar030
Copy link
Contributor

Of course we don't want that, but I don't see why we need to pass device specific params there

Because the device cannot be initialized without its params_t.

Should we set up a meeting to discuss this? IMO there are limits of github comments to discuss complex things with differing definitions...

@jia200x
Copy link
Member

jia200x commented Dec 2, 2019

Network drivers get set up as regular devices (as in, at86rf2xx_setup(dev, params)). At that point, there's a "netdev" that can be used by the network stack

Could you please define network stack and in which layer wold you put netdev in terms of:

  • HAL
  • PHY
  • MAC

I think that would help to focus the discussion

@jia200x
Copy link
Member

jia200x commented Dec 2, 2019

Because the device cannot be initialized without its params_t.
Should we set up a meeting to discuss this? IMO there are limits of github comments to discuss complex things with differing definitions...

Yes, I think it's a good idea.

@kaspar030
Copy link
Contributor

Yes, I think it's a good idea.

Now is a little late in the day, but maybe sometime tomorrow? maybe 11am (CET)?

@jia200x
Copy link
Member

jia200x commented Dec 2, 2019

Now is a little late in the day, but maybe sometime tomorrow? maybe 11am (CET)?

Ok for me. Is it ok for you @maribu ?

@maribu
Copy link
Member Author

maribu commented Dec 2, 2019

Sorry, I won't be able to make it. But being honest, in addition to implement most parts of the interface between network device driver and the PHY layer via static inline functions on top of a very slim interface (e.g. action(), set() and get() and the event handler thread), I haven't anything to add anyway to add to @jia200x proposal anyway. So I'd say just go ahead without me.

@stale
Copy link

stale bot commented Jun 4, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

@stale stale bot added the State: stale State: The issue / PR has no activity for >185 days label Jun 4, 2020
@stale stale bot closed this as completed Jul 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Discussion: needs consensus Marks a discussion that needs a (not necessarily moderated) consensus Discussion: RFC The issue/PR is used as a discussion starting point about the item of the issue/PR State: stale State: The issue / PR has no activity for >185 days
Projects
None yet
Development

No branches or pull requests

6 participants