maintainability: properly document the build process #723
We never edit generated code (
This is nothing particular to the Zephyr SDK. Upstream GCC does this, and so do all the other projects that use the GNU build system.
What I (and many others working with the GNU build system) do is to have the various common autoconf versions installed under their own prefix (e.g.
What part of it becomes unpredictable? I understand that it can be hard to follow at first for the people who are not familiar with the GNU tooling; but, it is a fairly standard and very predictable process.
There is nothing particular to Zephyr SDK about how this works. This is just how the GNU build system works. It is not pretty, it is extremely outdated and far from ideal; but, I am afraid nobody has time to overhaul the entire GCC codebase to a different build system ...
It certainly is not good to check in generated code into VCS in general; but, that is the standard process for the upstream GCC, and we are not going to deviate from that.
Once again, we do not apply any manual patching to the generated code (
Sorry, but we are not going to deviate from the upstream GCC process for this.
Patching via git is ~equivalent to manually editing generated code. In particular, given that the output of autoconf can and does vary from one machine to another, not only based on the release of autoconf but also based on the presence of other tools that it uses, it's a bit of a slippery slope.
That's a fallacy. Most projects that use the GNU build system only generate the configure script when creating a release tarball (see https://stackoverflow.com/a/3291181). My guess as to why GNU started doing this for GCC / Binutils is that enough previous tarball users complained that the script wasn't there after they had switched to checking the sources out of revision control. Generally, it's bad, but it has clearly snowballed well out of control.
^^ This should be documented somewhere. Actually, so should the entire process.
The last time I had to fix the build, it was because I had no way of predicting what was in the (private) AWS caches that are used by Zephyr's SDK builder. This was after painstakingly trying to reproduce what was done in CI for some time: weeks of effort for something that should have been easy to reproduce by following some simple steps. If it's as predictable as you suggest, then please document the steps to manually reproduce builds.
There is domain specific knowledge (see your paragraph above). There is no need to overhaul anything. Autotools may not be pretty but they do work. However, currently, the documented process to build it is to make a PR to the Zephyr project. That effectively creates a black box (due to insufficient diagnostics / privileged access / private AWS caches).
Submitting patches to the configure script via git is equivalent to manually editing the generated code. It's bad practice in any case, whether upstream is doing it or not. It (at least) doubles the amount of work that needs to be done for changes to the SDK. Likely far more than 2x though, e.g. it took me maybe a couple of hours to edit the necessary .ac / .m4 files, and now it's going on several days of debugging the build (in CI as a black box).

The last time I had to fix something that was broken in the SDK, it took me weeks. Eventually, I realized it was due to a deprecated release of zlib or something like that. The tarballs still existed in Zephyr's AWS cache though, so the build actually succeeded in unpredictable ways.

I've been building autotools packages for close to 20 years. If it isn't obvious to me how to build the SDK, then how do you expect it to be obvious to a newcomer? Please document the manual build process, even if that is only for a single host / target.
It is not. The script is checked in as is without any modifications/patches into git.
That is not true. The
That is arguable. Many projects still include the pre-generated configure script in tree for convenience as well as for "predictability" (because you do not want a bunch of bugs saying "build fails because every developer is using a different version of autoconf").
I am having a very hard time understanding how that works; but, either way, this is not a decision made by me or anyone else working on the Zephyr SDK -- it is the decision made by upstream GCC and, as a downstream project using GCC, sdk-ng is not going to deviate from that. If you have a problem with this, please email the GCC mailing list.
I do not understand why the AWS cache matters here. The source tarball cache is literally a directory with the tarballs downloaded by crosstool-ng (that is uploaded after a crosstool-ng run). If it does not exist locally, crosstool-ng will download everything from scratch (i.e. it will be 100% locally reproducible as long as you have a working internet connection and none of the mirrors are broken -- see below).
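For anyone who wants to take the remote cache out of the equation entirely, crosstool-ng can keep its own local tarball directory. A minimal sketch, assuming a standard crosstool-ng setup (the directory path is just an example; verify the exact config symbols against the crosstool-ng version used by sdk-ng):

```sh
# Keep a local tarball directory so repeated crosstool-ng runs do not depend
# on any remote cache being populated (path is an example).
mkdir -p "$HOME/src/ct-ng-tarballs"

# In the crosstool-ng configuration (.config), point the build at it and ask
# crosstool-ng to save anything it downloads there for the next run:
#   CT_LOCAL_TARBALLS_DIR="${HOME}/src/ct-ng-tarballs"
#   CT_SAVE_TARBALLS=y
```

With that in place, a re-run only touches the network for tarballs that are not already present locally.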
I think you are mixing up CI and crosstool-ng. The CI itself is pretty much just a wrapper around crosstool-ng (and Yocto for building host tools). All the toolchain builds are done through crosstool-ng with the configs located inside the sdk-ng tree. Anyone familiar with crosstool-ng should be able to build the sdk-ng toolchains using the crosstool-ng toolchain config files (
If you are talking about the local crosstool-ng run failing to download the source tarballs from broken mirrors, that happens. In fact, that was one of the reasons why the cache was introduced in the first place, aside from the download speed. I am afraid no amount of documentation is going to fix a broken third party mirror ...
I think the missing link here is crosstool-ng. You may be familiar with how autotools work; but, you do not seem to be very familiar with crosstool-ng, which sdk-ng uses to build toolchains -- if you were, you would have probably looked at the crosstool-ng output logs and manually invoked the gcc configure script with the exact command line that was used by crosstool-ng (yes, it is there in the logs); in which case, you do not have to go through the whole ordeal of waiting for CI (or a local crosstool-ng run, for that matter) to re-build everything from scratch -- instead, you can just check out https://github.com/zephyrproject-rtos/gcc/ and directly build and debug GCC locally.

I can try to document hints like these in the FAQ for those who are not familiar with crosstool-ng. I suppose this should lessen the amount of frustration for newcomers who do not have much experience working with embedded toolchains -- though, crosstool-ng is a fairly standard tool for generating embedded cross compiler toolchains; so, many people contributing to sdk-ng tend to already have working knowledge of it, which I suppose is why we have not had much problem in the past with third-party PRs to sdk-ng from many people ...

As for documenting the whole process, I am afraid "take a look at what the CI does" is about the best I can offer for now.

As for things seemingly randomly breaking, I am afraid no amount of documentation is going to ease the pain with that. Even I, as a maintainer of sdk-ng, sometimes spend days troubleshooting weird CI, crosstool-ng, gcc build system, binutils build system, third party mirrors, or whatever-other-crap-in-the-whole-chain breakages.
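To make those hints concrete, here is a rough sketch of what a local crosstool-ng run might look like. Everything below is illustrative: the crosstool-ng repository URL is the upstream one (sdk-ng uses its own fork), and the config file path and target name are assumptions rather than the exact sdk-ng layout, so check them against the sdk-ng tree and CI workflow:

```sh
# Build crosstool-ng for in-tree use (standard upstream procedure).
git clone https://github.com/crosstool-ng/crosstool-ng
cd crosstool-ng
./bootstrap
./configure --enable-local
make
cd ..

# Use one of the sdk-ng toolchain config files as the crosstool-ng .config
# (the path below is hypothetical; locate the real config for your target
# inside the sdk-ng tree).
mkdir build-arm && cd build-arm
cp /path/to/sdk-ng/configs/arm-zephyr-eabi.config .config
../crosstool-ng/ct-ng oldconfig
../crosstool-ng/ct-ng build

# crosstool-ng writes a build.log in the working directory; the exact gcc
# configure invocation it used is in there and can be re-run by hand against
# a local checkout of zephyrproject-rtos/gcc.
grep -n -- '--target=' build.log
```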
I'm not sure if one or two lines in a FAQ are sufficient. It would be nice to know the exact steps to build a toolchain. What is maybe obvious to you likely is not obvious to others.
For reference, the following patches were required when building the 0.15.2 SDK manually. The CI build only worked because of deprecated packages (some with security vulnerabilities) being in the AWS cache. Not that I'm saying the documentation should include transient patches, but it would be nice if someone didn't need to extrapolate everything out to a bash script to make SDK builds easily reproducible.

https://github.com/cfriedt/zephyr-sdk-builder

0000-crosstool-ng-update-to-zlib-1.2.13.patch
The FAQ could be more comprehensive. Here we already have a few candidates from the above.
The problem is that it is not obvious to me what is not obvious to others, and it is very difficult to decide where the documentation should begin and end (e.g. should the documentation cover crosstool-ng 101, working with GCC, or even fixing the problem with a mirror that replaced an existing source tarball with the same exact filename/version number?). The only fundamental solution to that is to provide very detailed documentation on the whole process; which, as I said above, will require a significant amount of effort from a willing party -- I just do not have the bandwidth to write such detailed documentation (or a book). At least, the (somewhat implicit) expectation for sdk-ng contributors up until now has been that they have some experience working with embedded toolchains (and hence likely with crosstool-ng) in one way or another; and, if they had any questions specific to sdk-ng, I have answered them in the issues/PRs or privately in chat.
As someone who maintains gcc-based toolchains for other projects (debian), and has been hacking autotools-based projects for well over 20 years, you're experiencing how people commonly used autotools 'back in the day'. You'd ship a generated configure script because that's what was expected. And that often meant that the generated script was checked into the VCS so that a bare check-out would exactly match the distributed tarballs. GCC is about as legacy a project as you will ever see, and they've stuck to this practice for a very long time.

Most other autotools-based projects changed to delivering an 'autogen.sh' script and expected users to run that to get the required configure script. Heck, there's even 'autoreconf' these days for this job. However, GCC has very strict requirements about which autotools version you can use to generate the scripts; older or newer versions often simply fail because autotools doesn't guarantee backwards compatibility. Because of this, GCC is usually many versions behind the default autotools versions provided on most systems. For someone simply building the compiler, it's far more reliable to use the provided scripts than attempt to generate them locally.

Yes, this places a huge burden on anyone hacking on the compiler; as @stephanosio says, you end up installing the precise autotools versions required for GCC so that the generated scripts match what's in the VCS. But, once you've got it set up, things are fairly straightforward, if a bit icky -- you hack the source code, re-build the generated scripts and commit both together. With luck, the diffs to the generated scripts are easy to manually verify.

And, yes, there is a strong temptation for those doing a drive-by change to simply manually edit both the source scripts and the generated scripts. Which means that when you review patches to the autotools scripts, the best practice is to apply the source script patch and then verify that the generated script patch matches.

If you've ever looked at the autotools scripts that gcc uses, you'll probably understand why there hasn't been any serious attempt to replace them with cmake or meson. For every horribly ugly little kludge, there's someone who depends upon the existing behavior to get their work done.
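For concreteness, the "pinned autotools under their own prefix" workflow might look roughly like this. The autoconf version, prefix, and directory below are examples only; check which version your GCC tree actually requires, and note that some GCC subdirectories also need a matching automake:

```sh
# Build the exact autoconf release GCC expects under its own prefix
# (2.69 is an example; check what your GCC tree requires).
wget https://ftp.gnu.org/gnu/autoconf/autoconf-2.69.tar.gz
tar xf autoconf-2.69.tar.gz
cd autoconf-2.69
./configure --prefix=/opt/autoconf-2.69
make && sudo make install
cd ..

# Put the pinned version first on PATH only while regenerating.
export PATH=/opt/autoconf-2.69/bin:$PATH

# Edit the source script, regenerate, and check that the diff to the
# generated script is only what you expect before committing both.
cd gcc/libstdc++-v3          # example directory; any dir with a configure.ac
$EDITOR configure.ac
autoconf                     # some directories need autoreconf / automake too
git diff configure
git add configure.ac configure
```

Done this way, the generated script in git stays reproducible from the committed configure.ac, which is what makes reviewing such patches tractable.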
@keith-packard - as someone who has maintained gcc-based toolchains for other projects for the last 20 years (Gentoo based, Yocto based), I'm fairly confident in labeling my experiences. Again, the point of this issue isn't trying to categorize the user. It's simply asking for better documentation and / or to improve the build process.
The source-based distros that I use typically (almost always) regenerate generated code as part of the build process. As a result, it is significantly easier to maintain the toolchain, as the process is (again) linear and does not really hide any skeletons. So whether or not a particular project checks in configure to revision control is mostly irrelevant to the people building it on a regular basis.
Yes, I've been told by both Stephanos and by our version of GCC that very specific autoconf versions need to be used. There are 2 problems there:
If only there were a sequence of documented instructions .. 🤔
Yes, which is why GCC ships generated code / checks it into version control. Most autotools projects only do this when creating a release tarball.
Exactly - so why not lessen that burden?
The latter suggestion was where this issue started. While it would make everyone's lives significantly easier, that was deemed too much work by @stephanosio, so now we are left with door number 1.
Again, there is this misconception that I haven't also been working with gcc fairly intimately for the last 20 years. The only reason I've done the latter is because the suggested ways have not worked.
I'm perfectly comfortable with autotools and the autotools scripts in gcc and (again) have been working with autotools projects and gcc for 20 years. I am far more familiar with autotools than CMake or meson.
Sure... I guess my argument here is that life can be made significantly easier with proper documentation. Personally, when I contribute to a project, if the instructions are:
I'm going to be skeptical about it. Since it became significantly more complicated than that, and since I needed to manually set up a build environment to match what was in CI so that I could manually diagnose what the problem was, I thought it would be wise to ask for some documentation about how to manually set up a build environment to match what was in CI. It was essentially the same gripe I had when I needed to build the SDK manually last time.

The correct resolution of this issue isn't about "maybe you've never contributed to an autotools project / gcc before", or "what reasons are there to not write proper documentation?" It's more along the lines of, "yes, there is a conventional build flow, and here is a page that describes that". With that, there is at least some starting point at an intuitive location for people, and a place to put knowledge that is otherwise maybe only in @stephanosio's head at the moment.
Fair enough. I would suggest starting from first principles with some assumptions. Try to solve a much smaller version of the bigger problem: e.g. the user has an Ubuntu Linux environment, the build/host is x86_64, and the target is arm. Must install these .deb's, must manually build this version of that tool... Maybe even document an existing container image to run that already has some of these things built?
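As a sketch of what that first step might look like on an Ubuntu host, here is the generic crosstool-ng prerequisite set; this is taken from crosstool-ng's general requirements rather than an sdk-ng-verified list, so treat the exact packages (and the Ubuntu release) as assumptions:

```sh
# Ubuntu x86_64 build host (example); generic crosstool-ng build dependencies.
sudo apt-get update
sudo apt-get install -y \
    autoconf automake bison bzip2 flex g++ gawk gcc git gperf help2man \
    libncurses-dev libtool libtool-bin make patch python3-dev rsync \
    texinfo unzip wget xz-utils
```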
Crosstool-ng has decent documentation already, so a link could be sufficient.
There are already links to GCC and they have docs already.
Well, that's one option. Maybe a detailed doc like that would be good overall, conceptually, but it's probably more work than necessary. But why not simply write down a sequence of exact steps (i.e. commands) necessary to build one toolchain? Ideally, snippets could even be factored out into external scripts that can be used by both CI and by users. People can extrapolate from there. If someone wants to build the macOS tools, some optional steps could be added later.
Please, feel free to continue making that assumption or not. It should be mostly irrelevant though.
I am not really sure where you got the idea that it was "deemed too much work" to regenerate generated sources as part of the build process. All I said was "this is not a decision made by me or anyone else working on the Zephyr SDK -- it is the decision made by upstream GCC and, as a downstream project using GCC, sdk-ng is not going to deviate from that." It is just as much of a good practice for a downstream project not to make arbitrary decisions deviating from the way the upstream project does things. It really has nothing to do with how much work it would be to regenerate these generated sources as part of the build process.
First of all, this issue was initially opened for "regenerating generated sources [in GCC] as part of the build process," and later changed to "properly documenting the build process" -- these two are completely different and independent topics; so, let us try not to mix these up. Regarding "generating generated sources [in GCC] as part of the build process," this is a deviation from the upstream GCC development process and I have voiced negative opinions about it for the aforementioned reasons. Regarding "properly documenting the build process," I have already clarified in #723 (comment) that there is room for improvement (e.g. providing an FAQ); but, for a detailed "full" documentation, a willing party will need to dedicate a significant amount of their time for it to happen.
Which part of
Sure, that could be a good starting point; though, keeping it up to date and making it actually work locally would be easier said than done. It should be quite doable targeting a very specific environment though, as you have mentioned.
Actually, this used to be the case (there used to be a script that was used by CI and could also be used locally to invoke crosstool-ng and Yocto build processes). That script was removed with the addition of macOS and Windows host support because the CI infrastructure and the build process were too closely coupled for this to be practical (and, at the time of writing the CI workflow for all three major host operating systems, I did not have a very good idea of what it would look like at the end). Now that At this time, I do not have any spare bandwidth to take on such an endeavour; but, if someone is willing to put their effort looking into it, I would be more than glad to review and provide feedback. |
This would be a good doc to link to. Might be good to include the part about
Probably would be good to mention |
Was "maintainability: do not check in generated code"
Currently, the version of gcc that is contained in the Zephyr SDK (https://github.com/zephyrproject-rtos/gcc) contains some generated code that is checked in (e.g. ./configure scripts). This requires an additional manual step of regenerating the ./configure script from configure.ac (and many other support files) via autoreconf that may or may not be easily reproducible (e.g. the default autoreconf in Ubuntu might not work; it might be necessary to get the latest from GNU).

It's generally bad to check generated code into version control and generally worse to require either manually patching the generated code or some specialized knowledge about how to do it.

The main issue is sustainability; rather than the build process being predictable and linear, it becomes unpredictable, non-linear, and not really sustainable. Without specialized tools, domain-specific knowledge, or a particular build machine or version, it is difficult for developers to make successful PRs to the SDK.

So I would like to just request that we do not check in generated code (in the form of ./configure scripts and so on), and instead insert (or populate) a dedicated step in the build process to simply regenerate those scripts.