Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kernel/device: Update to multi API device model #49374

Conversation

bjarki-andreasen
Copy link
Collaborator

@bjarki-andreasen bjarki-andreasen commented Aug 22, 2022

This update adds support for creating multiple struct
devices pr devicetree node, differentiated by API type.

The new device model can exist with the old device model
until drivers and subsystems have been updated to use
the new model, at which point, the old model may be
depricated. This model can completely replace the the
current device model once it is completely stable.

The new device model can be used to update the shell
backends. The new device model groups devices by
API using properties similar to devicetree bindings files,
which negates the need for using device_get_binding()
and device string names to identify device API type.
On top of this, the macro DEVICE_DT_API_FOREACH()
allows for compile time iteration of enabled devices, where
additional properties defined in the properties files can
be obtained, like sensor type, vendor id, vendor name etc.

The new device model also allows for tailored APIs like
location and PPS, which are secondary functionalities which
are shared between different types of devices. The device
interface will be similar to a C++ class inheriting multiple
smaller specific APIs, fully utilizing complex devices, which
would otherwise have to have functionalities moved to
driver specific headers, which is not portable.

The specific changes made are listed and described here:

Scripts have been added which look through a zephyr
specific properties files. These files contain zephyr
specific properties related to devices, like which API
they support. They are linked to the dts bindings
with the compatible property.

gen_device_properties.py walks through all the properties
files stored in drivers/properties, parses and verifies
them, and stores the parsed data in a PICKLE formatted
binary file.

gen_device_header.py uses the parsed properties and data
from dts.cmake (EDT_PICKLE) to find nodes, and generate
appropriate macros and extern declarations, which are
stored in device_generated.h"

@galak
Copy link
Collaborator

galak commented Aug 22, 2022

Been thinking about where do we keep the association of API and driver compatible. I'm thinking a new YAML file that describes the driver features. The idea here is to separate hw description in dts & dts binding YAML and software description of driver in a driver YAML.

(For a few I2C controllers that support PM)

compatibles:
  - ti,cc13xx-cc26xx-i2c
  - nordic,nrf-twi
  - nordic,nrf-twim

properties:
  apis:
     i2c:
       features:
        - stats
        - pm

(For nxp,i2c-mcux that has callback support, but not-PM)

compatibles:
  - nxp,i2c-mcux

properties:
  apis:
     i2c:
       features:
        - stats
        - callback

@bjarki-andreasen
Copy link
Collaborator Author

bjarki-andreasen commented Aug 22, 2022

Been thinking about where do we keep the association of API and driver compatible. I'm thinking a new YAML file that describes the driver features. The idea here is to separate hw description in dts & dts binding YAML and software description of driver in a driver YAML.

(For a few I2C controllers that support PM)

compatibles:
  - ti,cc13xx-cc26xx-i2c
  - nordic,nrf-twi
  - nordic,nrf-twim

properties:
  apis:
     i2c:
       features:
        - stats
        - pm

(For nxp,i2c-mcux that has callback support, but not-PM)

compatibles:
  - nxp,i2c-mcux

properties:
  apis:
     i2c:
       features:
        - stats
        - callback

This PR provides a top level key for apis

compatible: zephyr,driver

api: [i2c, i3c, ...]

The apis are simply the __subsystem struct i2c_driver_api structs with the _driver_api removed. This is how gen_defines
knows which API devices to declare for a node. The structs are also used to create the sections in ROM for the device
structs. The APIs themselves are already well described in the header files, and since they are linked directly with the api
in the bindings file, I think an additional yaml or other file describing the same or similar information is redundant. Unless you want to have some other information in the YAML file which isn't already in the header file?

@galak
Copy link
Collaborator

galak commented Aug 22, 2022


The apis are simply the __subsystem struct i2c_driver_api structs with the _driver_api removed. This is how gen_defines
knows which API devices to declare for a node. The structs are also used to create the sections in ROM for the device
structs. The APIs themselves are already well described in the header files, and since they are linked directly with the api
in the bindings file, I don't think an additional yaml or other file to describing the same information is redundant. Unless you want to have some other information in the YAML file which isn't already in the header file?

Its a meta question for people to discuss about having more than just the API information.

@henrikbrixandersen henrikbrixandersen self-requested a review August 23, 2022 07:29
Copy link
Member

@gmarull gmarull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you mind creating a test/sample that illustrates how this works in practice?

@bjarki-andreasen bjarki-andreasen force-pushed the device_update_to_multi_api_dev_model branch from 262efeb to 01303fc Compare August 23, 2022 09:09
@gmarull
Copy link
Member

gmarull commented Aug 23, 2022

Note regarding bindings api field: I think our goal should be to align with dt-schema, so the more we customize our own flavor of bindings, the more difficult it will be to replace them. IMO, bindings need to describe the DT content (schema), not software properties. The relationship between DT and devices is really a Zephyr decision, e.g. the fact we now have a 1:1 link between dt nodes and devices is a Zephyr architectural decision, this is not true on Linux.

@galak
Copy link
Collaborator

galak commented Aug 23, 2022

Note regarding bindings api field: I think our goal should be to align with dt-schema, so the more we customize our own flavor of bindings, the more difficult it will be to replace them. IMO, bindings need to describe the DT content (schema), not software properties. The relationship between DT and devices is really a Zephyr decision, e.g. the fact we now have a 1:1 link between dt nodes and devices is a Zephyr architectural decision, this is not true on Linux.

As @gmarull said, I'm not keen on keeping this information in the devicetree binding as its a pure software construct of Zephyr. The devicetree is meant to describe the hardware and some level of configuration of that hardware. What API implements what driver is a pure software decision of Zephyr and should be described elsewhere. I suggested one option of introducing a new driver/software oriented 'binding' that describes relevant software features that have been implemented.

@bjarki-andreasen
Copy link
Collaborator Author

bjarki-andreasen commented Aug 23, 2022

Note regarding bindings api field: I think our goal should be to align with dt-schema, so the more we customize our own flavor of bindings, the more difficult it will be to replace them. IMO, bindings need to describe the DT content (schema), not software properties. The relationship between DT and devices is really a Zephyr decision, e.g. the fact we now have a 1:1 link between dt nodes and devices is a Zephyr architectural decision, this is not true on Linux.

As @gmarull said, I'm not keen on keeping this information in the devicetree binding as its a pure software construct of Zephyr. The devicetree is meant to describe the hardware and some level of configuration of that hardware. What API implements what driver is a pure software decision of Zephyr and should be described elsewhere. I suggested one option of introducing a new driver/software oriented 'binding' that describes relevant software features that have been implemented.

Okay, so we will have a dts/bindings for hardware, and device/bindings or similar for software, both sharing the compatible property to bind them together, since the software and hardware properties are tied to each other.

I see no issue in splitting it up like this, to keep dts/bindings compatible with Linux, and the device/bindings or similar for Zephyr additions, however i think it is simpler from Zephyr perspective to keep them together.

@galak
Copy link
Collaborator

galak commented Aug 23, 2022

Okay, so we will have a dts/bindings for hardware, and device/bindings or similar for software, both sharing the compatible property to bind them together, since the software and hardware properties are tied to each other.

I see no issue in splitting it up like this, to keep dts/bindings compatible with Linux, and the device/bindings or similar for Zephyr additions, however i think it is simpler from Zephyr perspective to keep them together.

I get how its easier to have the info in one place, but I think long term its better to keep the separation since as @gmarull one of our longer terms goals is to try and utilize the same dt-schema format that linux uses for better sharing of tooling, bindings etc. So introducing Zephyr specific's in that area will make it difficult to make that switch.

@bjarki-andreasen
Copy link
Collaborator Author

bjarki-andreasen commented Aug 23, 2022

Okay, so we will have a dts/bindings for hardware, and device/bindings or similar for software, both sharing the compatible property to bind them together, since the software and hardware properties are tied to each other.
I see no issue in splitting it up like this, to keep dts/bindings compatible with Linux, and the device/bindings or similar for Zephyr additions, however i think it is simpler from Zephyr perspective to keep them together.

I get how its easier to have the info in one place, but I think long term its better to keep the separation since as @gmarull one of our longer terms goals is to try and utilize the same dt-schema format that linux uses for better sharing of tooling, bindings etc. So introducing Zephyr specific's in that area will make it difficult to make that switch.

Is this an issue I should undertake in this PR? And if so, can we agree on the schema so i can implement it?
I propose zephyr/bindings so my dummy example would be in zephyr/bindings/dummy/dummy_driver1.yaml for software desc
and zephyr/dts/bindings/dummy/dummy_driver1.yaml for hardware.

Copy link
Collaborator

@tbursztyka tbursztyka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I quickly stopped when I realized this approach would create a schism in how devices are handled.

I see why you went that way, to save memory and runtime processing. But from a design point of view this making stuff all messed up: now we would have devices existing in an array depending of their primary API, fully separated to the others. Only for sake of multi-API use case.
That breaks too many things (parent/child ordering, thus all that relies on it such as PM etc...)

We should treat the devices for what they are, whether they expose an API (or multiple ones) or not is irrelevant in this matter.

Perhaps you should attack the problem from another stand point: the api property itself, leaving out of the picture how the device is treated.

It's wet finger thinking so it's nothing well thought though but:
what a bout having a macro that would return the requested API pointer by it's dev and api type, which would be used by the API domain helpers.
For instance, let's take the Counter API

static inline uint32_t z_impl_counter_get_frequency(const struct device *dev)
{
        const struct counter_driver_api *api = (struct counter_driver_api *)dev->api;

Would become:

static inline uint32_t z_impl_counter_get_frequency(const struct device *dev)
{
        const struct counter_driver_api *api = API_GET(dev, counter);

Obviously it would need to check if the returned pointer is not NULL to proceed, etc...

I guess it would require that all defined APIs are then inserted in a dedicated ROM section, and the extra APIs one or more device would expose as well.
Overall, I am not entirely sure you can avoid runtime processing and using a bit of memory, but it would be quite limited anyway. At least way more than the current usage which is: creating a device per-exposed API.

include/zephyr/device.h Outdated Show resolved Hide resolved
include/zephyr/device.h Outdated Show resolved Hide resolved
@bjarki-andreasen
Copy link
Collaborator Author

bjarki-andreasen commented Sep 15, 2022

@tbursztyka @dleach02

That breaks too many things (parent/child ordering, thus all that relies on it such as PM etc...)

This is incorrect. The point of the multi API device model is to be interchangeable with the current API wrappers, PM, etc. The multi API device model should not break anything, if it does, please detail what is breaking so it can be handled.

I quickly stopped when I realized this approach would create a schism in how devices are handled.

The point of the multi api device model is to replace the current implementation.

There will only be two different ways of defining devices in the transition period; the new one for which we should update the drivers and shells to use, and the current one, which will be kept to support drivers and apps which have not yet been updated to the new device model. The current one can then be deprecated when transition is done.

Regarding PM, using the runtime device PM, you can treat the device structs which belong to the same node as independent devices (which they aren't of course, but the application will not and won't need to know that).

Using the PM action, calling it on one or the other will result in the same action taken by the device, it will not break anything, it just means the user must be conscious of the fact that it is the same device they are interacting with when manually changing power states.

The elegant solution to this in the future is to create a PM device API which devices which support PM can implement, moving the pm context from the device struct to the drivers themselves. This is another cool feature of having multiple APIs, no bloating the device struct context with potentially unused sub contexts like pm.

Only for sake of multi-API use case.

The multi API device model fixes many issues, and as stated before, is a complete replacement for the current device model. It supporting multiple APIs is a necessary feature the current model lacks, but it also brings other features and cleanup:

Take for example the shells: We can now create a list at compile time containing every device which supports a specific API by node_id, and get additional properties defined for that device using the new properties files (see zephyr/samples/multi_Api_shell/sensor_shell/src/main.c).

And the sensors: We can now define vendor, model and other properties which can be fetched at compile time, which is a good mechanism to fix #49294 for example.

The devicetree can be cleaned up since it no longer needs to generate extern declarations for devices since this is now handled by the new properties files and scripts.

now we would have devices existing in an array depending of their primary API, fully separated to the others

There are two lists of device structs in ROM, one which holds the current DEVICE_DEFINE devices, and one which holds all device structs created using DEVICE_API_DEFINE. The reason for this is that device_get_binding(), which is not necessary with the new multi api device model, iterates through that section. All devices defined using DEVICE_API_DEFINE are placed in a single section in ROM, regardless of API. They are not iterated through by design.

The point of the new API model is to allow it to coexist alongside the current device model, without altering it.

Perhaps you should attack the problem from another stand point: the api property itself, leaving out of the picture how the device is treated.

This was one of my first proposed solutions to the issue, which was shut down pretty much immediately (see early comments in this PR #48817), and for good reason. Runtime lookup of APIs is slow and cumbersome, the multi API device model handles everything at compile time, completely removing the need for runtime lookup of devices in general as stated earlier (say goodbye device_get_binding :))

static inline uint32_t z_impl_counter_get_frequency(const struct device *dev)
{
        const struct counter_driver_api *api = API_GET(dev, counter);

The problem here is the runtime lookup of the API, as stated previously, this is not feasible. It would also break every Zephyr API, which is not realistic, and as stated earlier "The multi API device model should not break anything"

I hope you will find the time to review the proposal in more detail, if not, i will go through it in the dev meeting thursday september 22.

@mbolivar-nordic
Copy link
Contributor

The devicetree can be cleaned up since it no longer needs to generate extern declarations for devices since this is now handled by the new properties files and scripts.

Curious what you mean here ever since 87e148e was merged.

@bjarki-andreasen
Copy link
Collaborator Author

bjarki-andreasen commented Sep 18, 2022

The devicetree can be cleaned up since it no longer needs to generate extern declarations for devices since this is now handled by the new properties files and scripts.

Curious what you mean here ever since 87e148e was merged.

I was referencing the last section in the device.h file using the devicetree foreach to declare externs which can be removed, while forgetting that the DT_FOREACH_STATUS_OKAY_NODE may be used for other functionality than declaring externs, so there will be no impact on devicetree.

Turning it around, the properties files and device dedicated script generating macros will allow for zephyr specific device features to be implemented outside of the devicetree and edtlib (see zephyr/drivers/properties/), while using nodes and properties from the devicetree. So it will keep pollution away from the devicetree in the future, moving us to a timeline where devicetree is cleaner than it otherwise could have been :)

@bjarki-andreasen bjarki-andreasen force-pushed the device_update_to_multi_api_dev_model branch from 6a39514 to 2805741 Compare September 18, 2022 19:40
Bjarki Arge Andreasen added 3 commits September 18, 2022 21:43
This update adds support for creating multiple struct
devices pr devicetree node, differentiated by API type.
device.h has been updated with new macros which make use
of the new multi API device subsystem.

The specific changes made are listed and described here:

Scripts have been added which look through a zephyr
specific properties files. These files contain zephyr
specific properties related to devices, like which API
they support. They are linked to the dts bindings
with the compatible property.

gen_device_properties.py walks through all the properties
files stored in drivers/properties, parses and verifies
them, and stores the parsed data in a PICKLE formatted
binary file.

gen_device_header.py uses the parsed properties and data
from dts.cmake (EDT_PICKLE) to find nodes, and generate
appropriate macros and extern declarations, which are
stored in device_generated.h.

The properties files support a properties property
similar to the dts bindings. A property supports the
following keys:

properties:
  dummy:
    type: <string or int>
    required: <true or false>
    default: <value if value not set>
    value: <value which overwrites default>

These can be fetched during compile time using
DEVICE_DT_PROPERTY(node_id, property) and
DEVICE_DT_PROPERTY_OR(node_id, property, default)

This is useful for metadata like sensor vendor and
model, which can be fetched during compile time,
preferably using the DEVICE_DT_API_FOREACH(fn) macro

Lastly, a KConfig option has been added which allows for
using the new properties files and scripts with the old
device model, and with the new macros for defining device
instances, which allow for updating the drivers to the new
device model without breaking compatibility, allowing for
a later deprecation of that option, followed by removal of
the old device model.

Signed-off-by: Bjarki Arge Andreasen <bjarkix123@gmail.com>
Added test for defining, getting and listing device APIs

Signed-off-by: Bjarki Arge Andreasen <bjarkix123@gmail.com>
Added simple POC which uses the properties files and new device
macros to create a shell, which doesn't require runtime device
lookup and can warn if no sensors are present

Signed-off-by: Bjarki Arge Andreasen <bjarkix123@gmail.com>
@bjarki-andreasen bjarki-andreasen force-pushed the device_update_to_multi_api_dev_model branch from 2805741 to 74d6c96 Compare September 18, 2022 19:43
@bjarki-andreasen
Copy link
Collaborator Author

bjarki-andreasen commented Sep 18, 2022

PR update:

The naming of macros have been optimized for readability and understanding:

The updated versions of DEVICE_DEFINE and DEVICE_DT_DEFINE have been renamed to DEVICE_NEW_DEFINE DEVICE_DT_NEW_DEFINE. This should indicate that they are new versions of the legacy macros, which shall replace DEVICE_DEFINE and DEVICE_DT_DEFINE.

The device struct get macros keep the names DEVICE_API_GET and DEVICE_DT_API_GET which should indicate that you are getting a device instance based on the API.

An updating procedure from the current device model to the new device model has been implemented:

The KConfig option LEGACY_DEVICE_MODEL has been added to Kconfig.zephyr with the default value y. The option excludes all multi API specific macros like DEVICE_API_GET and turns DEVICE_DT_NEW_DEFINE into a wrapper for the legacy DEVICE_DT_DEFINE, which can be used for devices which only have one API (which currently is all of them). This allows for updating the drivers to use the DEVICE_DT_NEW_DEFINE to prepare for the switch to the new multi API device model.

The update procedure follows:

  1. Merge new device model and properties file feature with upstream (will not have any impact on existing app/drivers)
    • Update in-tree drivers to use DEVICE_DT_NEW_DEFINE
    • Add properties files for in-tree drivers to zephyr/drivers/properties
    • This still has no impact on existing app/drivers, but will enable the use of DEVICE_DT_API_FOREACH and properties from properties files
  2. Update subsystems and shells to use DEVICE_DT_API_FOREACH and properties from device properties files
  3. Deprecate LEGACY_DEVICE_MODEL, DEVICE_DT_DEFINE and DEVICE_DEFINE
  4. Remove LEGACY_DEVICE_MODEL and legacy device model macros / functions from device.h
  5. Rename DEVICE_DT_NEW_DEFINE to DEVICE_DT_DEFINE, turning DEVICE_DT_NEW_DEFINE into a wrapper for the new DEVICE_DT_DEFINE
  6. Deprecate DEVICE_DT_NEW_DEFINE
  7. Success

Following this, the PM context should be removed from the struct device and into a PM subsystem, and the PM API should become an API that can be added to a device instead using the new multi API device model. This is a larger task that will get its own PR naturally.

@gmarull
Copy link
Member

gmarull commented Sep 20, 2022

static inline uint32_t z_impl_counter_get_frequency(const struct device *dev)
{
        const struct counter_driver_api *api = (struct counter_driver_api *)dev->api;

Would become:

static inline uint32_t z_impl_counter_get_frequency(const struct device *dev)
{
        const struct counter_driver_api *api = API_GET(dev, counter);

Obviously it would need to check if the returned pointer is not NULL to proceed, etc...

I guess it would require that all defined APIs are then inserted in a dedicated ROM section, and the extra APIs one or more device would expose as well. Overall, I am not entirely sure you can avoid runtime processing and using a bit of memory, but it would be quite limited anyway. At least way more than the current usage which is: creating a device per-exposed API.

I'm uncertain if this would work (I assume API_GET() is a macro expected to resolve at compile time). At this point you just have a struct device pointer, so you either have a field resolvable with the counter token, or you're left with a runtime lookup. I think that runtime overhead should be avoided as much as possible, and this change would add such overhead, even if small, to every single API call.

Copy link
Member

@gmarull gmarull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that what this PR proposes goes in the right direction. The main reason I say that is because complexity is increased significantly (e.g. a new set of yaml files), but I don't see topics such as PM, dependencies, etc. being mentioned. Tests are nice, and show things compile, but, what happens in practice? What if this model is applied in-tree? Does it scale? For example, some driver classes expect device data/config to have a specific structure as a first field (gpio), what will happen with those if they need to share state with another driver class that has the same requirements?

Added a few comments as well.

* DEVICE_DT_PROPERTY() etc. are available.
*/

#define DEVICE_NEW_DEFINE(dev_name, drv_name, init_fn, pm_dev_ptr, data_ptr, \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's wrong to use NEW in APIs that are expected to replace the "old" ones.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any suggestions?

Comment on lines +9 to +25
compatible = "dummy_driver";
status = "okay";
};

dummy2: dummy2 {
compatible = "dummy_driver";
status = "okay";
};

dummy_low: dummy_low {
compatible = "dummy_driver_uart_low_prio";
status = "okay";
};

dummy_high: dummy_high {
compatible = "dummy_driver_uart_high_prio";
status = "okay";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use valid compatible strings

Copy link
Collaborator Author

@bjarki-andreasen bjarki-andreasen Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember getting an error using "dummy,driver" for example since dummy is not a valid vendor. Or do you mean something else?

tests/kernel/device_new/src/main.c Show resolved Hide resolved
Comment on lines +4 to +15
properties:
vendor:
type: string
required: true

serial:
type: int
required: true

model:
type: string
default: DefaultDummyModel
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the purpose of all such properties? For example, why isn't DT model standard property used?

Comment on lines +5 to +7
CONFIG_HTS221=y
CONFIG_LIS2DH=y
CONFIG_BME280=y
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is? where else is it defined?

@tbursztyka
Copy link
Collaborator

That breaks too many things (parent/child ordering, thus all that relies on it such as PM etc...)

This is incorrect. The point of the multi API device model is to be interchangeable with the current API wrappers, PM, etc. The multi API device model should not break anything, if it does, please detail what is breaking so it can be handled.

I said it in parenthesis. And you actually pointed out in your code: "The multi API device model doesn't support device handles".

I quickly stopped when I realized this approach would create a schism in how devices are handled.

The point of the multi api device model is to replace the current implementation.

If we have to replace the old way, it will need to be made all at once without a period of transition since during that transitions some devices will be able to follow old way, and the others won't and use another way.

The multi API device model fixes many issues, and as stated before, is a complete replacement for the current device model. It supporting multiple APIs is a necessary feature the current model lacks, but it also brings other features and cleanup:

It does not fix any issue. Don't get me wrong, I know your PR is making some memory savings, though not as much as I first thought. Anyway, I mean for real it's not fixing any issue: we already have multi-API device drivers: see cc2520, or the Ethernet drivers exposing ptp features. Current device driver model is multi-API ready.

From you PR I had the false assumption the first time I reviewed it that you would expose 1 device and 1+ APIs for it. But I was wrong, you actually do the same as currently: as many devices as there are APIs.

The only difference being that only the "main" device gets an init structure, not the others. Which is nice, it's always nice to gain a bit of ROM.

Take for example the shells: We can now create a list at compile time containing every device which supports a specific API by node_id, and get additional properties defined for that device using the new properties files (see zephyr/samples/multi_Api_shell/sensor_shell/src/main.c).

The example is not really interesting in itself. But I see some more use cases where one could shutoff all devices of a certain API domain, or it could as well help in gathering statistics, etc...

The devicetree can be cleaned up since it no longer needs to generate extern declarations for devices since this is now handled by the new properties files and scripts.

Before we fully move to DTS macros etc... we studied the possibility to generate device instances via python scripts instead. Anyway, decision was made to go for DTS macros. But you are now introducing external python script that replaces/extends partially some of the feature of these DTS macros. At least these are external, so I guess it's fine. Any thoughts on that @mbolivar-nordic @galak ?

now we would have devices existing in an array depending of their primary API, fully separated to the others

The point of the new API model is to allow it to coexist alongside the current device model, without altering it.

Sure it does coexist, but both do not know about each other. See my comment above. I don't think we should have such coexistence phase (unless the new get along with old and vice versa, and old features still work etc...)

The problem here is the runtime lookup of the API, as stated previously, this is not feasible. It would also break every Zephyr API, which is not realistic, and as stated earlier "The multi API device model should not break anything"

Yes because your approach is not a multi-API device model: each and every API has one unique device attached to it. Exactly as the current model.
I think it's ok to break things if the gain would be worthy.

I hope you will find the time to review the proposal in more detail, if not, i will go through it in the dev meeting thursday september 22.

Yes, I won't be able to join however, let me summarize my view on this:

  • existing and proposed models expose multi API per-hardware device the same way: one struct device for one API.
  • proposed model has a really tiny bit of ROM improvement as it tightens a struct init only to the main struct device, and not the others.
  • proposed model breaks the device handles, thus parent/child link. Bad for PM (at least, don't know any other feature that requires it). Any way to fix that?
  • proposed model offers better flexibility on handling group of devices from same API domain. Can be a nice feature, but is that sufficient to break the existing model?
  • proposed model makes the assumption that all devices are DTS based. Which is not the case, unfortunately. It renders the name property useless, but that's just a side effect of it. It would be already possible in the existing model to get rid of the name property if all device instances would be DTS based.

In conclusion: I still think this should be investigated more. Best of all would be to have one struct device, and 1+ API linked to it + the added feature of that PR.

@carlescufi
Copy link
Member

carlescufi commented Sep 20, 2022

API meeting:

  • This PR suggests multiple struct device but they share config and data from the "primary" struct device. Each "secondary" struct device implements one API, and links back to the config and data to the "primary" one.
  • @galak proposes a model where we could potentially have multiple class-specific instances (eg. struct spi_dev) pointing to a single struct device instance that represents the physical device
  • A requirement has been added to the Device Driver Model issue: Refining Zephyr's Device Driver Model - Take II #22941 (comment)

@carlescufi
Copy link
Member

static inline uint32_t z_impl_counter_get_frequency(const struct device *dev)
{
        const struct counter_driver_api *api = (struct counter_driver_api *)dev->api;

Would become:

static inline uint32_t z_impl_counter_get_frequency(const struct device *dev)
{
        const struct counter_driver_api *api = API_GET(dev, counter);

Obviously it would need to check if the returned pointer is not NULL to proceed, etc...
I guess it would require that all defined APIs are then inserted in a dedicated ROM section, and the extra APIs one or more device would expose as well. Overall, I am not entirely sure you can avoid runtime processing and using a bit of memory, but it would be quite limited anyway. At least way more than the current usage which is: creating a device per-exposed API.

I'm uncertain if this would work (I assume API_GET() is a macro expected to resolve at compile time). At this point you just have a struct device pointer, so you either have a field resolvable with the counter token, or you're left with a runtime lookup. I think that runtime overhead should be avoided as much as possible, and this change would add such overhead, even if small, to every single API call.

@tbursztyka can you weigh in here?

@tbursztyka
Copy link
Collaborator

I'm uncertain if this would work (I assume API_GET() is a macro expected to resolve at compile time). At this point you just have a struct device pointer, so you either have a field resolvable with the counter token, or you're left with a runtime lookup. I think that runtime overhead should be avoided as much as possible, and this change would add such overhead, even if small, to every single API call.

@tbursztyka can you weigh in here?

Well as I said it in that comment, it was wet finger thinking, I don't think getting a compile time resolved API finding where it was proposed is going to be feasible.

That said, it is still possible to go for 1 struct device and 1+ API(s), which would be the most logical architecture change:

  • It would not break any existing features and add the one this PR proposes as well (grouping API per-domain in dedicated sections that is, which enables manipulations around it. Note however that this feature is not critical really, but it's nice to get).
  • It would be the way that saves most ROM
  • Remember also we need something that works also with non DTS based instances. It would be great to get only DTS ones, but it's not the case (yet, at least).

There is, I think, 2 possible paths to for it:

  1. We run a device_get_api() inside each API calls.
  2. Or we change how we address device and api when it's about API calls (the user APIs that is, not the device driver API)

Let me detail a bit, for both solution we could get that change:

struct device {
         ...
         const struct device_api *api; /* in case of multiple API this would be the primary one.*/
         ...
};

where:

struct device_api {
        const struct device *dev;
        const void *api;
};

Obviously device_api structures are the one going into dedicated API domain sections.

Now device_api_get() could be something like (pseudo code):

if (dev->api) {
    if (api_on_spec(dev->api, api_spec))
         return dev->api->api; 
    else
       return loop_on_api_spec_section(dev, api_spec) /* where first match of api of relevant sections that owns dev should be returned. */
}

return NULL;

Obviously since it's fully run-time, the api_spec is not the same as it would in macros (probably a generated enum then).

As noted, the drawback is the overhead for each API calls. But this can be reduced: most device expose only one API. So we could have a marker like in device_state that tells about non-multi/multi API. And thus replace the api_on_spec() by !dev->state->multi_api. (But that's only valid as long as we require runtime resolution, see below)

Now on the 2 paths:

  1. device_api_get() would be runtime called inside each and every API calls. Adding the overhead... not such a huge one imo, but a little bit anyway. I am not in favor of it, since there is the second option below.
  2. we change ALL user API calls so it would take const struct device_api *dev_api, instead of the const struct device *dev. So we could do a dev_api = DEVICE_API_GET(node_id) for DTS based instances (so it would be compile time resolved), and for non DTS instance dev_api = device_api_get(). Which could get removed when we'll eventually get rid of all non-DTS based instance. (so device_api_get() would disappear as well). (note that only the user API would see such change, not the device driver API itself who does not care at all about the API pointer)

Conclusion

As stated previously, current model can already handle multiple API per-hw device. So there is nothing broken calling for a fix here. Now, the idea it to reduce the memory usage, and possibly the CPU usage as well. I don't think this PR goes far enough to solve this, moreover that it does break some features.

In short, either we make a bolder move to address these optimizations: 1 struct device, 1+ API per-struct device, possibly no name attribute once all instances are DTS based (and also no runtime api/dev resolution) etc... or we do nothing yet. It would be a waste of effort to do something in between in my opinion. (going for devices that share data/config pointers is an in-between solution for instance)

@tbursztyka
Copy link
Collaborator

I'll run some numbers. Taking into account all possibilities: system with 0 vs a few vs a lot of multi-API device and compare the possible gains on all solutions so far (as well as existing model). Let's see

@bjarki-andreasen
Copy link
Collaborator Author

bjarki-andreasen commented Sep 21, 2022

@tbursztyka The solution with runtime lookup of APIs in tables as stated was the second solution i proposed to the multi api device issue, check this PR #48817 and the comments to see what the conclusion was at that time :)

I even made the exact same argument as you have :D

Me:

Yeah, but realistically, the list is gonna be 1 to 5 entries long pr API, so with the features it brings and the simplicity of it I will argue it is an acceptable overhead.

str4t0m:

The main issue here is probably that the runtime overhead is not introduced for multi-iface devices, but for interfaces with several instances like GPIO ports.
STM32 devices have up to 11 ports, so it can be necessary to integrate over all of them just to set the state of a single pin.

This PR proposes the solution which was initially suggested by Kumar:

So I wonder about changing the model to have "typed" devices instead as a solution.

So having a struct location_device that embeds a struct device.

Than having something like DEVICE_DT_GET_SUBSYS(, ) so something like:
DEVICE_DT_GET(, location)

With the only difference being the type of device returned, his embeds a struct device alongside API, this PR overwrites api in struct device, otherwise identical approach.

Do keep in mind that the PM is the only feature being "broken" with this approach, because it is the only feature which has a nested context inside the device struct, which is a non scalable solution (basically just adding an extra api pointer to the struct device for all existing APIs...) It has already been pointed out that this is a bad implementation (See point 10 in #22941 (comment)), which probably materialized because there was no good way to implement multiple APIs at the time, PM should just be an API.

Please reevaluate the approach in this PR with this in mind. Devices are statically defined, APIs are available at compile time without any runtime lookup, and it doesn't break either the drivers or the APIs, just the already shady PM which is set to be revisited in any case.

Your conclusion that "there is nothing broken calling for a fix here" is simply not correct. Yours and my identical earlier solution, this PRs solution, and weeks of discussion regarding how to solve the issue of adding multiple APIs is proof that there is an issue.

@keith-zephyr
Copy link
Contributor

I see one immediate benefit from this proposal - adding an "api" or "class" attribute to the devicetree model and creating FOREACH iterators. This can be used by a large number of shell commands to qualify the device parameter. This includes at least the shell commands can, dac, eeprom, flash, fpga, gpio, i2c, and others.

Is it possible to make that addition to the devicetree independent of the larger multi API changes?

@bjarki-andreasen
Copy link
Collaborator Author

I see one immediate benefit from this proposal - adding an "api" or "class" attribute to the devicetree model and creating FOREACH iterators. This can be used by a large number of shell commands to qualify the device parameter. This includes at least the shell commands can, dac, eeprom, flash, fpga, gpio, i2c, and others.

Is it possible to make that addition to the devicetree independent of the larger multi API changes?

Absolutely #50441 is one solution which isolates the changes in this PR which create the API macros and define them, It has previously been proposed adding the api property to the bindings files and generating the macros in gen_defines.py as well, this was shut down at the time but might become the solution after all. This is currently being discussed in the dev meetings and API meetings if you want to contribute to the discussion :)

@tbursztyka
Copy link
Collaborator

tbursztyka commented Sep 23, 2022

Got some numbers

Context:

  1. I took the most optimistic context of all: all device instances are DTS based. (which is not the case in real, yet).
  2. I compared the current model, the proposed model of this PR and the one I exposed above (API are in a separate structure) without runtime lookup (thanks to previous point 1). I optimized it also by getting rid of the API pointer in the struct device since it's basically useless (again, due to 1).
  3. I also made a comparison when the PM pointer is removed and exist as an API, as @bjarki-trackunit reminded it.
  4. I took 3 use cases, as you'll see shortly, which illustrates how this would impact memory consumption. As there is no runtime lookup for any of the 3 compared models, I just calculated the ROM occupation (RAM one is unchanged).
  5. I assumed this is on a 32bit system
  6. I proposed 2 calculations: one which keeps the name attribute, and one that removes it as when all is DTS based, it becomes a useless attribute (no need to run device_get_bindings())

Uses case

To make things shorter:
Ds : Device with one Simple API
Dma(x): Device with x Muli-API

With PM pointer hold by struct device:

  • A) 15 Ds
  • B) 8 Ds + 2 Dma(2) + 1 Dma(3)
  • C) 2 Ds + 3 Dma(2) + 1 Dma(3) + 1 Dma(4)

Same case but PM becomes an API in itself (so the PM pointer gets out of struct device in all models):

  • A') 15 Dma(2)
  • B') 8 Dma(2) + 2 Dma(3) + 1 Dma(4)
  • C') 2 Dma(2) + 3 Dma(3) + 1 Dma(4) + 1 Dma(5)

Results

I calculated the amount of Device, init entries, and - at least for the last model - the api structure in total.
Results follow this format, in bytes: total (same minus name attribute) - delta vs current model (or, when different, delta vs current model without name attribute)

Current model

  • A) 540 (480) -> 0
  • B) 540 (480) -> 0
  • C) 540 (480) -> 0
  • A') 540 (480) -> 0 (PM cannot be detached)
  • B') 540 (480) -> 0 (PM cannot be detached)
  • C') 540 (480) -> 0 (PM cannot be detached)

Proposed model of this PR

  • A) 540 (480) -> 0
  • B) 508 (448) -> -32
  • C) 476 (416) -> -64
  • A') 840 (720) -> +300 (or +240)
  • B') 712 (608) -> +172 (or +128)
  • C') 584 (496) -> +44 (or +16)

Proposed model with split structure for APIs

  • A) 600 (540) -> +60 (or 0)
  • B) 472 (428) -> -8 (or -52)
  • C) 344 (316) -> -196 (or -164)
  • A') 660 (600) -> +120
  • B') 540 (496) -> 0 (or +16)
  • C') 364 (336) -> -176 (or -144)

Conclusion

Well, the current model is not that bad after all, at least for now. (more in the end)

Let me explain: in my opinion (I could be wrong, but the existing device drivers base we have tends to enforce this), multi API devices are not going to be the norm. I mean, we are not going to get a system that owns more multi API devices than simple API devices. So, to me, The C/C' use cases are unrealistic, I anyway wanted to show them for a good comparison.
The most realistic use cases are A/A' and seldomly B/B' (which have almost 50% multi API devices).

The model I proposed is not to be considered if the name attribute is kept: bloats too much for A/A', I did not calculated the tipping point where it starts to be an interesting one but B/B' are already - as noted above - quite optimistic about the presence of multi API devices. It would be interesting if only the name attribute goes do /dev/null, then the added ROM usage from struct api_dev per-struct device is nullified. And once we get mult API device, the gain is much better than this PR's model. Drawback being that all user API (not the device driver API) need to be changed to take the struct api_dev as first parameter instead of the struct device.
The plus: it does not break any feature, and adds the api sections features this PR does too.

Now, about the proposed model of this PR. At least letting the name attribute generates no negative impact. But the gain is very limited since only some of the struct init entry disappears. Fact that for each API we always need a struct device is limiting the gain. It gets really worse when PM is extracted from struct device and put as an API. Drawback being that it kills the parent/child device view (now only used by PM, but it's not limited to PM).
The plus: is does not need any changes on user API front.

Anyway to me, none of the proposed models should be considered, at least for now: mostly because not all device instances are DTS based. If we get rid of that limitation, then yes considering changes could be valuable. But until then, no.

@gmarull
Copy link
Member

gmarull commented Sep 23, 2022

Added to the issue, but adding a link here: #48934 (comment). It is a proposal to solve MFD with current device model, no changes required. I think we should go to the origin of the discussion and set clear requirements. Like "I have X device. I need to expose its multiple functionality through multiple Zephyr APIs using the device model". Then, is it possible to do this with what we have? If no, then propose changes. If yes, we may just document such usecase and move on. If yes but unhappy, propose a solution where we can balance pros/cons. As I said in the issue, I don't think it's worth breaking all device drivers/device model for a case that doesn't seem to be the common rule. More if we have a way to solve it with current model.

@bjarki-andreasen
Copy link
Collaborator Author

bjarki-andreasen commented Sep 23, 2022

@gmarull

Do keep in mind that the issue discussed here is the multiple APIs pr instance of a device, not multi functional devices. As far as I was aware we agreed to use that solution for multi functional devices months ago, the solution 1 in the issue is just the documentation for it #48932, which got sidelined because I moved my focus to this issue. The MFD PR is not contested to my knowledge.

As discussed at the time, the solution for MFDs can also be abused to provide multiple APIs to a single instance of a device by sharing the context, but that is essentially what the solution in this PR is doing, just a bit more efficiently (by not cluttering the devicetree, using the same node to get the different device instances).

@bjarki-andreasen
Copy link
Collaborator Author

bjarki-andreasen commented Sep 23, 2022

I will close the pull request here. It seems the solution to the issue of multiple APIs will be to use the initial multi functional device solution documented IO proposed in this PR #48932 to solve the issue without any changes to kernel or devicetree, at the cost of having APIs show up in the devicetree as device nodes.

This is an OK solution as it means we can have multiple instances of the same API pr device in the devicetree, at the cost of a little devicetree clutter, but at least it will be the same solution used whether it is truly a child device or simply a handle for an API.

I will move focus to using Kconfig to define which APIs a device driver supports, and maybe I can find a way to link the Kconfig file to a compatible to, make some magic happen :) Otherwise placing device types in specific sections will be the solution for iterating through devices of specific types for the APIs and subsystems which need that capability.

We can discuss this contemporary solution at the next API meeting

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants