scripts: add Zephyr inline code generation with Python #6762

b0661 · 2018-03-23T07:13:19Z

Overview

Extend Zephyr build system by inline code generation with Python snippets in source files.

The PR is a cut-out from PR #5799 for review but can be applied independently.

Motivation for or Use Case

The prime motivation is to be able to create pin controller drivers that can configure the pins on startup by the information from the device tree in a standardized way. Therefor there must be means to easily generate structured data from device tree information at build time.

Device tree information is extracted to generated_dts_boards.[h,conf]. Inline code generation with Python snippets is used to get this information and generate structured data to be compiled.

Design Details

Python snippets that are inlined in a source file are use as generators. The Zephyr tool to scan the source file for the Python snippets and process them is Codegen. Codegen is itself written in Python. The Codegen class uses code from Cog and extends Cog by Zephyr specific generator functions. Cog is distributed under the MIT license which should be in line with the Zephyr license policy.

The processing of source files is controlled by two CMake extension functions: zephyr_sources_codegen(..) and zephyr_sources_codegen_ifdef(..). During CMake configuration the source files are processed by Codegen and the generated source files are written to the CMake binary directory. The generated source files are added to the Zephyr sources.

The inlined Python snippets can contain any Python code, they are regular Python scripts. All Python snippets in a source file and all Python snippets of included template files are treated as a python script with a common set of global Python variables. Global data created in one snippet can be used in another snippet that is processed later on. This feature is e.g. used to customize included template files.

An inlined Python snippet can always access the codegen module. The codegen module encapsulates and provides all the functions to retrieve information (options, device tree properties, CMake variables, config properties) and to put out the generated code.

Alternatives

Instead of code generation by a python tool the C preprocessor and a macro library may be used. This was the first approach in PR #5799. Due to the complexity of the chosen macro library CHAOS_PP and the general policy to use Python as a tooling language there are objections against this approach.

Test Strategy

A working proof of concept for STM32F0x pinctrl, gpio and spi drivers is available in PR #5799. PR#5799 also includes the unit test of the pinctrl driver template which also uses inline code generation.

Signed-off-by: Bobby Noelte b0661n0e17e@gmail.com

codecov-io · 2018-03-23T07:37:36Z

Codecov Report

Merging #6762 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #6762   +/-   ##
=======================================
  Coverage   52.15%   52.15%           
=======================================
  Files         212      212           
  Lines       25916    25916           
  Branches     5582     5582           
=======================================
  Hits        13517    13517           
  Misses      10149    10149           
  Partials     2250     2250

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7983606...6bd9606. Read the comment docs.

SebastianBoe

Disapproving to block this from being merged until I have had time to review.

dbkinder · 2018-03-25T16:29:43Z

doc/subsystems/codegen/codegen.rst

@@ -0,0 +1,381 @@
+..
+    Copyright (c) 2004-2015 Ned Batchelder
+    SPDX-License-Identifier: MIT


@nashif Will we need to account for this use of MIT licensed material?

dbkinder · 2018-03-25T16:31:50Z

doc/subsystems/codegen/codegen.rst

+
+    /* This is my C file. */
+    ...
+    /**


Hmmm... will this conflict with doxygen processing (that also uses the /** comment convention.

This does not conflict with doxygen. In this example it is by purpose to include the python code snippet into doxygen output. The markers for code @code{.codegen}, .. are chosen in a way to support documentation by doxygen. /** is a marker for doxygen only and can be replaced by /* to leave the code snippet out of doxygen output..

OK. I'm curious how this will play with the document generating process, and if we'll need to add a codegen pass over source files before presenting them to doxygen.

how this will play with the document generating process

Currently none of the files is parsed by doxygen because the driver directory is not included in the doxygen documentation. make htmldocs works without any errors.

if we'll need to add a codegen pass over source

There is no need for an extra pass. Inline code is within comments and clearly marked with the doxygen special commands @code/@Endcode. Doxygen does not know the {.codegen} type. It ignores the unknown code type specification (see https://www.stack.nl/~dimitri/doxygen/manual/commands.html#cmdcode) and "just show the output as-is".

As an improvement {.codegen} could be made known to doxygen to be treated as python code.

Thanks for clarifying!

dbkinder · 2018-03-25T16:33:26Z

doc/subsystems/codegen/codegen.rst

+   :local:
+   :backlinks: top
+
+Description


Seems like we should have a problem statement first. What problem is codegen addressing and why is it needed?

Added problem statement

dbkinder · 2018-03-25T19:27:43Z

doc/subsystems/codegen/codegen.rst

+    Raises an exception with msg as the text. No traceback is included, so
+    that non-Python programmers using your code generators won’t be scared.
+
+codegen.inFile


for presentation consistency with the function documentation above you should use the attribute directive for these attributes:

.. attribute:: codegen.infile

tbursztyka · 2018-03-26T08:08:33Z

This seems really promising, code is clear and well documented

dbkinder · 2018-03-27T21:06:51Z

doc/subsystems/codegen/codegen.rst

+Inline Code Generation
+######################
+
+Inline code generation allows you to build code (basically any text) using


How about something like this:

For some repetitive or parameterized coding tasks, it's convenient to use a code generating tool to build C code fragments, instead of writing (or editing) that source code by hand. Such a tool can also access CMake build parameters to generate project source code tailored and tuned to a specific project configuration without you editing the source files directly. The Zephyr project supports a code generating tool that processes embedded Python "snippets" inlined in your C source files, creating source code that and can be used, for example, to generate source code that creates and fills data structures, adapts programming logic, creates configuration-specific code fragments, and more.

@dbkinder

How about something like this

Updated the codegen intro using your text.

dbkinder

Thanks!

SebastianBoe

It is not desirable to have code generators inline with C code.

I believe it will cause a myriad of problems, but I only know of two:

The build system slows down because it can not distinguish between an edit of
the code generator or the C source code and is forced to re-run the code-generator
unnecessarily.

Neither compilers nor static analyzer are able to work with the intermixed file
and must work on the generated file instead. This causes several problems,
for instance: the compiler errors use line numbers that do not reflect
the line numbers of the file that the programmer is working on.

b0661 · 2018-04-12T14:12:03Z

@SebastianBoe

The build system slows down because it can not distinguish between an edit of
the code generator or the C source code and is forced to re-run the code-generator
unnecessarily.

Sorry this is not true. Any code geneneration depends on the generation commands whether they are inlined or not. The build system always has to do something when you change the generation commands.

Neither compilers nor static analyzer are able to work with the intermixed file
and must work on the generated file instead.

Sorry this is not true either.

Static analyzer can work with the intermixed file as the generator code is embedded into comments which are not analyzed. But you should not do this to have the analyzer get the full picture. Even if you do not generate code with inline commands you would have to feed the generated code to the static analyzer to enable it to have the full view.

Compilers always must work with the generated file - otherwise you would not generate them. This is independent whether you embed the generation commands in the source code comments or have some other generation command files. In the end you have to feed the generated code to the compiler.

the compiler errors use line numbers that do not reflect the line numbers of the file that the programmer is working on.

Yes it uses the line numbers of the generated file which is available to the programmer. So no more guessing which shitty part of the code generation commands is the culprit. If your programmer can not map these two files you may have more problems than just line numbers.

My conclusion from your review

Zephyr is unable to fulfill the promise to be configured with device tree data. And there is no solution in sight.

SebastianBoe · 2018-04-12T14:36:55Z

Sorry this is not true. Any code geneneration depends on the generation commands whether they are inlined or not. The build system always has to do something when you change the generation commands.

Sorry for being terse, allow me to explain in detail. When the
generated code is 'included' by the preprocessor the build system can
be encoded with this dependency graph:

                      generator
                           +
                           |
                           v
                      generated code
                           +
                           |
                           v
C source code +-----> object file

In this graph, editing the C source code does not trigger a rebuild of
the generator, but when the generator and the C source code is
intermixed in the same file the build system will have to re-run the
generator even if it was just the source C code that was edited.

tbursztyka · 2018-04-12T14:37:13Z

@SebastianBoe I have to disagree with your comment. We really need something like this to solve device instance generation in relation with device tree. If there are any issue with the built system, let's fix the build system.

b0661 · 2018-04-12T15:03:07Z

In this graph, editing the C source code does not trigger a rebuild of
the generator, but when the generator and the C source code is
intermixed in the same file the build system will have to re-run the
generator even if it was just the source C code that was edited.

The inline code generation will most probably only be used for drivers.

During application development you have to generate the driver code one time. There is no difference in the calls to the generator compared to not inlined generation commands.

During a specific driver development this is different. But then you will want the generated code to be regenerated to be shure that it is in sync with your changes. So also here is no real benefit for your scenario.

SebastianBoe · 2018-04-12T15:28:38Z

Static analyzer can work with the intermixed file as the generator code is embedded into comments which are not analyzed. But you should not do this to have the analyzer get the full picture. Even if you do not generate code with inline commands you would have to feed the generated code to the static analyzer to enable it to have the full view.

Right, I thought I had identified a case where a static analyzer was able to analyze a C source file that used #include's but when comments were used it could not. But I can not reproduce this. So never mind the point about static analyzers.

SebastianBoe · 2018-04-12T15:46:25Z

Compilers always must work with the generated file - otherwise you would not generate them. This is independent whether you embed the generation commands in the source code comments or have some other generation command files. In the end you have to feed the generated code to the compiler.

Right. The difference between #including and inlining a code generator is that a compile error in the non-generated part will not reflect the source when inlining, but it will when #including because the compiler knows that an #include does not affect the line number. This is one case (of many i suspect) where you will get better feedback from the compiler if you separate the code generation from the source code with the preprocessor than if you intermix them.

erwango · 2018-04-12T15:56:59Z

Looks nice but still I'm not quite comfortable to have python snippets in C code, for many reasons, one being just "I don't like it" (I recognize this not quite elaborated but let us be honest and recognize this will be also a quite frequent remark we'll have to face, so let's be ahead of it)
Is there any way to provide a layer/module on top of codegen that would help making this reality less crude?
For instance, since most of the time we'll need to generate quite formatted and similar code,
call generic generation functions based on formatted comments in C code,
so that we get understandable comments in C files and generated parts as expected.
Since this would be based on codegen, interested people would still be able to use
"raw" codegen.

@b0661 , from your understanding of codegen, do you think this could be feasible?
(providing one has the time to do it)

b0661 · 2018-04-12T16:01:58Z

This is one case (of many i suspect) where you will get better feedback from the compiler if you separate the code generation from the source code with the preprocessor than if you intermix them.

This depends on what you regard your primary artefact. If you take the generated code file as the primary artefact you get accurate feedback of the compiler with no distraction by include hirarchies. You have a clear view of the generated code and of the generation commands.

It might be a problem to work benches that want to map the reported error lines to the source code of the original file. In the error listing the generated file name is listed which can or can not be distinguished from the original file. You can give hints to to the compiler about the line numbering in the generated file. I did not dive into that because I do not use a work bench.

SebastianBoe · 2018-04-12T16:03:39Z

Zephyr is unable to fulfill the promise to be configured with device tree data. And there is no solution in sight.

I believe this is an accurate assesment of todays situation.

About the alternatives, the original proposal of using the preprocessor, or at the very least using a supported source language (C, C preprocessor, C++), is not affected by these issues.

Also, retaining the codegen infrastructure pretty much as-is, but enforcing some guidelines for how to use it (no inlining) would also not be affected by these issues.

The guidelines could perhaps look something like this:

Don't do anything with the preprocessor that can be done in C.
Don't do anything with codegen that can be done with the preprocessor.
Don't do anything with CMake that can be done with codegen.

Generated code, whether generated by CPP, python, or other, should limit itself to just
the problem that cannot be solved in the source language, and nothing more.

Generators should limit themselves to generating declarative data, and avoid generating
C control logic whenever possible.

SebastianBoe · 2018-04-12T16:09:27Z

The inline code generation will most probably only be used for drivers.

I doubt it. The use-case that drivers have is shared by many. Essentially,
in embedded we need to do a lot of processing at build-time, because our
runtime environment is so constrained. This puts strain on the source language's
metalanguage / macrolanguage / preprocessing language, a strain that obviously
C is not able to deal with. But perhaps modern C++ (which is already a dependency)
can deal with better. E.g. populating pinctrl_stm32_gpio_data could perhaps be done with a constexpr.

dbkinder · 2018-04-12T16:11:11Z

Sounds like we may need to capture something here in our coding standards guidelines

b0661 · 2018-04-12T16:16:01Z

@erwango

Is there any way to provide a layer/module on top of codegen that would help making this reality less crude?

Codegen is already a layer on top of pure python abstracting away a lot of the complex interaction with the device tree info. It also provides access to CMake variables, ... Please have a look at the doumentation that is included in the PR.

For example you can write:

codegen.if_device_tree_controller_compatible('SPI', 'st,stm32-spi-fifo')

and code generation will stop if there is no SPI node of type 'st,stm32-spi-fifo' activated in the device tree.

Codegen can be expanded easily to support even more complex generation tasks by providing a simple command for them. It is Python.
One might complain about the naming of the commands but they already abstract away a lot of complexity.

b0661 · 2018-04-12T16:34:55Z

@SebastianBoe

C is not able to deal with. But perhaps modern C++ (which is already a dependency)
can deal with better.

Cog (the basis of codegen) was developed by a guy who wanted something that is easier to use than C++ macros and more flexible. In the end you can achieve the same goal by different solutions.

What I learned from the discussion about the C macro library originally used in #5799 is that the tool must fit to the users/ developers knowledge/ experience. I currently see not much C++ but a lot of Python.

b0661 · 2018-04-12T16:48:02Z

The guidelines could perhaps look something like this:
Don't do anything with the preprocessor that can be done in C.
Don't do anything with codegen that can be done with the preprocessor.
Don't do anything with CMake that can be done with codegen.
Generated code, whether generated by CPP, python, or other, should limit itself to just
the problem that cannot be solved in the source language, and nothing more.
Code generators should limit themselves to generating declarative data, and avoid generating
C control logic whenever possible.

This is a perfect summary of the experience I made when using codegen to implement the drivers in #5799. In the end it is a variant of "keep it as simple as possible".

pfalcon · 2018-10-16T16:49:16Z

@b0661 : Well, there're 241 pull requests open, all interesting to everyone. Some changes may take longer than the other to ripe and be ready for the mainline. For one, @erwango had a presentation about this at Linaro Connect, and it definitely looked interesting (though not without the issues to consider).

b0661 · 2018-10-17T12:59:21Z

@pfalcon, 'interest' is more about the activities in PR #9876 which is a prerequisite to codegen and was originally developed by me. @erwango and @galak used it as a playground for Linaro/ Intel interest/ideas and constantly denied my comments for focusing on what is important to me/ codegen. As I´m the original author of the EDTS database and DTS extraction to it I was expecting to at least be informed about breaking changes. This was not the case - there were such "important" changes as to prepend get_ to all getter functions without notice and in the end to remove functions that are essential to codegen usage also without notice or discussion. As I´m the only real world user of codegen it costs a lot of effort to remedy all these toying around changes and to make DTS extraction and codegen work for my project again. This is especially frustrating as a lot of additions I originally had in the code were removed on request of @erwango to not add something new which would question inclusion in mainline. Obviously there is a difference between additions by Linaro/ Intel and mine.

There were some comments in the mean time from @erwango and @galak. It seems to change now. I will see.

pfalcon · 2018-10-17T13:28:54Z

@b0661: I'm sorry to hear about possible uncoordinated changes. At the same time, the changes proposed are quite substantial and have many implications, so I wouldn't be surprised if some reduction of scope would be requested (meaning that you'd need to rework additional bits you need on top of a partial solution, which I agree is a pain, but is a part of the compromise maintainable for everyone). What I can say that even less brave ideas of mine were turned down/shunned, so, what to do, just continuing with ones which fit. So, I hope some compromise will be found, without people pulling to solve those issues, it's all bleak.

The Extended Device Tree Specification database collates device tree (dts) information with information taken from the device tree bindings. The EDTS database may be loaded from a json file, stored to a json file or extracted from the DTS files and the bindings yaml files. The commit integrates database development done in zephyrproject-rtos#9876 which was based on zephyrproject-rtos#6762. Major differences/ improvements to zephyrproject-rtos#9876 are: - the database now has an own extraction function that can be used instead of e.g. extract_dts_includes. The extraction function follows the design of the extract_dts_includes script and the additions that were done in zephyrproject-rtos#9876. It is restructured and several globals are now classes and objects. All functionality of extract_dts_includes related to the generation of defines is not part of the database extract function. It's sole purpose is to fill the database directly from the compiled DTS file. - the database got itś own directory 'edtsdb' to structure all files related to the database. - The EDTSDevice class from zephyrproject-rtos#9876 was enhanced to allow devices to access the database they are taken from. Mayor compatibility issues to zephyrproject-rtos#9876. - the consumer, provider API and the internal structure of the databse is copied from zephyrproject-rtos#9876. API should be fully compatible. - the EDTSDevice class is copied from zephyrproject-rtos#9876. The device API should be fully compatible except for the unique id feature. Device name can be used instead. Signed off from zephyrproject-rtos#9876 added to attribute for the changes done there and copied. Signed-off-by: Erwan Gouriou <erwan.gouriou@linaro.org> Signed-off-by: Kumar Gala <kumar.gala@linaro.org> Signed-off-by: Bobby Noelte <b0661n0e17e@gmail.com>

The Extended Device Tree Specification database collates device tree (dts) information with information taken from the device tree bindings. The EDTS database may be loaded from a json file, stored to a json file or extracted from the DTS files and the bindings yaml files. The database is integrated into cogeno as a module. The commit integrates database development done in zephyrproject-rtos#9876 which was based on zephyrproject-rtos#6762. Major differences/ improvements to zephyrproject-rtos#9876 are: - the database now has an own extraction function that can be used instead of e.g. extract_dts_includes. The extraction function follows the design of the extract_dts_includes script and the additions that were done in zephyrproject-rtos#9876. It is restructured and several globals are now classes and objects. All functionality of extract_dts_includes related to the generation of defines is not part of the database extract function. It's sole purpose is to fill the database directly from the compiled DTS file. - the database got itś own directory 'edtsdb' to structure all files related to the database. - The EDTSDevice class from zephyrproject-rtos#9876 was enhanced to allow devices to access the database they are taken from. Mayor compatibility issues to zephyrproject-rtos#9876. - The consumer, provider API and the internal structure of the database is copied from zephyrproject-rtos#9876. - API should be fully compatible. - Extraction of children was replaced as it broke the concept of the devices struct as a list of devices. The functions device.get_children() and device.get_parent() may be used to acess the parent<->child relation. - The EDTSDevice class is copied from zephyrproject-rtos#9876. - The device API should be compatible except for - the constructor which now needs the EDTS database and - the unique id feature. To ge an unique id the device.get_name() function can be used instead. Signed off from zephyrproject-rtos#9876 added to attribute for the changes done there and copied. Signed-off-by: Erwan Gouriou <erwan.gouriou@linaro.org> Signed-off-by: Kumar Gala <kumar.gala@linaro.org> Signed-off-by: Bobby Noelte <b0661n0e17e@gmail.com>

b0661 requested review from dbkinder, nashif and SebastianBoe as code owners March 23, 2018 07:13

b0661 mentioned this pull request Mar 23, 2018

drivers: pinctrl device interface (Linux-like) and STM32F0 driver #5799

Closed

IvanSanchez mentioned this pull request Mar 23, 2018

usb: allow UTF16LE characters in string descriptors #6649

Closed

SebastianBoe requested changes Mar 23, 2018

View reviewed changes

dbkinder suggested changes Mar 25, 2018

View reviewed changes

tbursztyka requested review from erwango and tbursztyka March 26, 2018 08:05

b0661 force-pushed the pr_pinctrl_codegen branch from b0cb303 to 46f9325 Compare March 27, 2018 19:49

dbkinder suggested changes Mar 27, 2018

View reviewed changes

b0661 force-pushed the pr_pinctrl_codegen branch from 46f9325 to 6303127 Compare April 4, 2018 09:12

dbkinder approved these changes Apr 4, 2018

View reviewed changes

SebastianBoe requested changes Apr 12, 2018

View reviewed changes

b0661 closed this Oct 16, 2018

b0661 deleted the pr_pinctrl_codegen branch October 16, 2018 16:49

b0661 mentioned this pull request Oct 26, 2018

scripts: inline code generation with cogeno (and edts) #10885

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

scripts: add Zephyr inline code generation with Python #6762

scripts: add Zephyr inline code generation with Python #6762

b0661 commented Mar 23, 2018 •

edited

Loading

codecov-io commented Mar 23, 2018 •

edited

Loading

SebastianBoe left a comment

dbkinder Mar 25, 2018

dbkinder Mar 25, 2018

b0661 Mar 26, 2018 •

edited

Loading

dbkinder Mar 26, 2018

b0661 Mar 27, 2018 •

edited

Loading

dbkinder Mar 27, 2018

dbkinder Mar 25, 2018

b0661 Mar 27, 2018

dbkinder Mar 25, 2018

tbursztyka commented Mar 26, 2018

dbkinder Mar 27, 2018

b0661 Apr 4, 2018

dbkinder left a comment

SebastianBoe left a comment

b0661 commented Apr 12, 2018 •

edited

Loading

SebastianBoe commented Apr 12, 2018

tbursztyka commented Apr 12, 2018

b0661 commented Apr 12, 2018

SebastianBoe commented Apr 12, 2018

SebastianBoe commented Apr 12, 2018

erwango commented Apr 12, 2018

b0661 commented Apr 12, 2018

SebastianBoe commented Apr 12, 2018 •

edited

Loading

SebastianBoe commented Apr 12, 2018

dbkinder commented Apr 12, 2018

b0661 commented Apr 12, 2018

b0661 commented Apr 12, 2018

b0661 commented Apr 12, 2018

pfalcon commented Oct 16, 2018

b0661 commented Oct 17, 2018

pfalcon commented Oct 17, 2018

scripts: add Zephyr inline code generation with Python #6762

scripts: add Zephyr inline code generation with Python #6762

Conversation

b0661 commented Mar 23, 2018 • edited Loading

Overview

Motivation for or Use Case

Design Details

Alternatives

Test Strategy

codecov-io commented Mar 23, 2018 • edited Loading

Codecov Report

SebastianBoe left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

b0661 Mar 26, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

b0661 Mar 27, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tbursztyka commented Mar 26, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dbkinder left a comment

Choose a reason for hiding this comment

SebastianBoe left a comment

Choose a reason for hiding this comment

b0661 commented Apr 12, 2018 • edited Loading

My conclusion from your review

SebastianBoe commented Apr 12, 2018

tbursztyka commented Apr 12, 2018

b0661 commented Apr 12, 2018

SebastianBoe commented Apr 12, 2018

SebastianBoe commented Apr 12, 2018

erwango commented Apr 12, 2018

b0661 commented Apr 12, 2018

SebastianBoe commented Apr 12, 2018 • edited Loading

SebastianBoe commented Apr 12, 2018

dbkinder commented Apr 12, 2018

b0661 commented Apr 12, 2018

b0661 commented Apr 12, 2018

b0661 commented Apr 12, 2018

pfalcon commented Oct 16, 2018

b0661 commented Oct 17, 2018

pfalcon commented Oct 17, 2018

b0661 commented Mar 23, 2018 •

edited

Loading

codecov-io commented Mar 23, 2018 •

edited

Loading

b0661 Mar 26, 2018 •

edited

Loading

b0661 Mar 27, 2018 •

edited

Loading

b0661 commented Apr 12, 2018 •

edited

Loading

SebastianBoe commented Apr 12, 2018 •

edited

Loading