Revert 73978 & 74096 due to regressions #75244

Merged
nashif merged 18 commits into zephyrproject-rtos:main from revert_73978
Jul 3, 2024

aescolar
Member

@aescolar aescolar commented Jul 1, 2024

PR #73978 introduced a regression.
Unfortunately this PR cannot be reverted without reverting also
#74096.
Let's revert both PRs to stabilize main again towards the 3.7 release.

For more details on the issue see
#75205

Note: during the revert, cea6bf5 conflicted; the conflict was resolved by keeping the change from cea6bf5.

Fixes #75205
Fixes #75364


Note the checkpatch error "Error: lib/posix/options/fs.c:89:WARNING: Violation to rule 21.2 (Should not used a reserved identifier) - rename" is exactly the same as in the original PR, as the diff contains the same lines, just reverted.

aescolar added 18 commits July 1, 2024 12:05
PR zephyrproject-rtos#73978 introduced a regression.
Unfortunately this PR cannot be reverted without reverting also
zephyrproject-rtos#74096.
Let's revert both PRs to stabilize main again towards the 3.7 release.

For more details on the issue see
zephyrproject-rtos#75205

This reverts commit e9b676a.

Signed-off-by: Alberto Escolar Piedras <alberto.escolar.piedras@nordicsemi.no>
This reverts commit b18cad1.
…iour"
This reverts commit b10f1ca.
…oup"
This reverts commit 308322e.
…port"
This reverts commit b2243af.
This reverts commit be086f1.
This reverts commit b82b5b0.
This reverts commit d9855da.
This reverts commit 499a633.
This reverts commit 48dff55.
This reverts commit 581a0f5.
This reverts commit 305ec62.
This reverts commit 49ac191.
This reverts commit 93973e2.
This reverts commit 2d72966.
This reverts commit 86b9293.
This reverts commit 6f62292.
This reverts commit a9a909c.
@aescolar aescolar marked this pull request as ready for review July 1, 2024 10:19
@aescolar aescolar added the Hotfix Fix for issues blocking development, i.e. CI issues, tests failing in CI, etc. label Jul 1, 2024
@zephyrbot zephyrbot added area: C Library C Standard Library area: Tracing Tracing area: Networking area: Base OS Base OS Library (lib/os) area: LWM2M area: Sockets Networking sockets area: POSIX POSIX API Library labels Jul 1, 2024
@nashif nashif removed the DNM This PR should not be merged (Do Not Merge) label Jul 3, 2024
@nashif nashif dismissed cfriedt’s stale review July 3, 2024 12:15

The revert is needed to unblock development and testing, please resubmit with a proper fix and more testing, we will then reconsider that. If you need to escalate, please bring this to the TSC today.

@nashif nashif added the TSC Topics that need TSC discussion label Jul 3, 2024
@cfriedt
Member

cfriedt commented Jul 3, 2024

The work is not lost anyhow. It can be resubmitted once the issues have been ironed out.

@aescolar - that's reassuring, although re-submitting might require fixing bugs in other code as well, and that could take time. What is the deadline for resubmitting prior to tagging v3.7.0? Will there be some flexibility?

Care to list some change requests?
e.g.

  • deprecating NET_SOCKETS_POLL_MAX
  • fixing type usage in inappropriate places

Member

@carlescufi carlescufi left a comment

+1 to reverting this late in the game

@nashif nashif added the Hotfix Fix for issues blocking development, i.e. CI issues, tests failing in CI, etc. label Jul 3, 2024
@cfriedt
Member

cfriedt commented Jul 3, 2024

I would be happy to remove my change request to this PR when there is a concrete answer to these questions.

#75244 (comment)

@rlubos has already messaged me and confirmed that the NET_SOCKETS_POLL_MAX change is relatively trivial. Thanks for that.

@cfriedt
Member

cfriedt commented Jul 3, 2024

Addressed @nashif's remarks in private.

My position is that certainly if

  • manual runs of twister on tier0 platforms pass
  • manual runs of twister are run in multiple test areas (posix, c, net, etc)
  • manual runs of twister for all boards pass
  • tens or hundreds of thousands of test permutations pass
  • weeks of tests running in CI that all pass

Then there is something wrong in testing.

Please feel free to suggest a test plan, and as maintainer of the testing area and twister, you should be one of the more qualified individuals to make that recommendation.

@nashif
Member

nashif commented Jul 3, 2024

Addressed @nashif's remarks in private.

There is no reason to address my remarks in private. I have seen them coming in on Discord, but I will not read them. The place for doing this should be the PR and not private messages. Sorry.

@cfriedt
Member

cfriedt commented Jul 3, 2024

@nashif

The TSC approved an exception, the TSC did not approve the PR unconditionally.

The many reviewers who approved the PRs approved them without a condition attached. The TSC also approved the PRs for 3.7.0.

Due diligence was done by myself as author, tester, all of the reviewers who reviewed these PRs as well as the TSC.

At any given time, any PR that is causing such issues would be reverted if no fix is provided in time. We are at a critical time right now and we can't be spending time on something that was not tested well enough.

"Not tested well enough"? Sorry, I don't have access to every board supported by Zephyr. However, I run tests on the platforms I have access to, and then build for all platforms with twister. Then I submit the PR to CI, which takes 4 hours per run, and rerun, and iterate.

Who knows how many hundreds of thousands of test permutations are executed.

Please define "not tested well enough". Maybe you could elaborate on what you feel proper testing is with concrete suggestions?

A fix has been provided in #75348 and is significantly less destructive than this PR.

It's quite compact. I invited @aescolar to find a test or sample that breaks. So far, nothing.

Spare time or not, why does this make a difference? Doing something in your spare time is not really my or anyone's concern. We all dedicate spare time to the project; that does not mean this needs to be treated differently.

It makes a difference because you, the Linux Foundation, and others should probably care about maintainer work-life balance. It really should be a concern - for all of us. We have had community members away for medical reasons recently. A friend in the LF community that does similar work died of a heart attack a few days ago. We have community members from all walks of life, in all circumstances. 🌈

People are different. People can be treated differently. It's OK to do that 👍

Why did CI (and manual twister runs) fail to catch these issues? I think there will be some work required to actually understand why.

You need to tell us why.

I need to write the tests and run them on platforms which I have access to, and then build them for all platforms that I can build them for, and then submit PRs for review.

I'm not maintainer of our CI, testsuite, or tooling. I don't pretend to know every detail about why twister is not working as expected.

CI did exactly what it does for all PRs: the failing tests were built/run on the default platform, so it seems there is a gap in coverage, but that is a known "issue". Let's not blame CI; something like that also needs to be verified and tested outside of CI by the developer.

It was verified outside over and over by myself and others, it was verified in CI several times spanning weeks.

As mentioned yesterday in the release meeting, clearly, tests were getting skipped when they should not have been.

What twister invocation do you suggest I run? Because running on dozens of platforms with thousands upon thousands of permutations of tests really seems like sufficient testing to me.

I'm really not kidding; please, actually respond with the twister invocation / testcase.yaml you suggest that I use when testing manually, and I will use that diligently.

You are the maintainer of it so you should have a much better idea than me.

The changes will form the basis of several companies' products going forward with LTSv3.

Well, in that case effort should have been made to get this in the tree long before the feature freeze,

Again, there are only 24 hours in the day / 365 days in the year.

Patches welcome!

I've been cleaning up technical debt in the posix, C, C++, kernel, net, and other areas what - 4 years I think? If it could have been done overnight, I would have done that. However, it often takes weeks / months to get a single PR reviewed.

Please do not be so dismissive.

People have different amounts of bandwidth that they can spend working on Zephyr.

if that is important enough and more time and effort should have been put into making this happen outside of "spare time" allocated.

You seem to have a lot of insight into how my time should be managed 🤣

As mentioned last week in the TSC meeting, without these in LTS, it will be difficult to backport several bug-fixes in multiple areas.
Just some food for thought. I'm out for the day unfortunately.

to unblock development, we do not have the luxury to experiment with fixes that are "half baked". You can resubmit the changes again with a proper fix and more testing.

Half-baked? 🤣 ... Wow. It's a good thing I know you don't intend to come off as dismissive.

@nashif
Member

nashif commented Jul 3, 2024

My position is that certainly if

  • manual runs of twister on tier0 platforms pass
  • manual runs of twister are run in multiple test areas (posix, c, net, etc)
  • manual runs of twister for all boards pass
  • tens or hundreds of thousands of test permutations pass
  • weeks of tests running in CI that all pass

Then there is something wrong in testing.

Please feel free to suggest a test plan, and as maintainer of the testing area and twister, you should be one of the more qualified individuals to make that recommendation.

Your position is flawed. You are making this about testing. The fact is, testing DID catch the issue and a bug was reported; that is why we have a multi-layered test approach.
Our CI is by design very selective depending on where CI runs. The PR being reverted introduced something new that is not being caught by current test scenarios. You introduced the change, so you probably know better what new scenario is to be added; I do not know, and CI does not know. This had nothing to do with the testing area, and if you know better, please make us all happy and make it better.

Whatever it is, the fact is that this PR is causing issues, whether caught by CI or not, the fix provided as commented by you...

this was the "path of least resistance" change that is non-destructive. The better change would be to simply not use inappropriate types in the base OS (i.e. simply change return values from ssize_t to long). However, that would have greater repercussions as several function signatures in and out of tree would need to be corrected.

so, there is a better change, hence my comment about half-baked fix above.

@nashif nashif removed TSC Topics that need TSC discussion labels Jul 3, 2024
@nashif nashif merged commit 846556f into zephyrproject-rtos:main Jul 3, 2024
48 of 53 checks passed
@dleach02
Member

dleach02 commented Jul 3, 2024

The importance of this from @cfriedt's perspective is a general concern I have, and have voiced in another forum: the importance of this particular release. There are consumers of Zephyr that tend to stay on an LTS until the next LTS. Given our current guidelines, this branch is now frozen for the next 2+ years with only bug/security fixes being allowed. For any other release it is easy to say "we will get it in the next release"... not such an easy thing to hear on an LTS.

Problems/lessons I see:

  1. This change came in very late in this cycle. It was in the Release Plan roadmap for 3.7 (POSIX Roadmap for LTSv3 #51211) but I don't remember if we reviewed this particular ticket to determine readiness for LTS3.
  2. LTS updates are very constrained.
  3. CI missed the problems with this PR. Chris' own testing missed this as well. Yes, there are a lot of reviewers but we have limited time so we need CI to help identify any larger regression issues.

@aescolar aescolar deleted the revert_73978 branch July 3, 2024 19:34
@cfriedt
Member

cfriedt commented Jul 3, 2024

Please feel free to suggest a test plan, and as maintainer of the testing area and twister, you should be one of the more qualified individuals to make that recommendation.

Your position is flawed. You are making this about testing. The fact is, testing DID catch the issue and a bug was reported; that is why we have a multi-layered test approach.

@nashif - sure after the fact

Our CI is by design very selective depending on where CI runs.

It seems testing is a little too selective.

this had nothing to do with the testing area, and if you know better

It really did. Why would it have passed countless iterations of manual testing, as well as many iterations of CI, even after multiple rebases?

So please, since you are the expert, tell me the manual testing to do so that I don't need to waste twice as much of my time doing this again.

However, that would have greater repercussions as several function signatures in and out of tree would need to be corrected.

so, there is a better change

I can do that change too, I mean there is a clear path forward, I just want acceptance criteria to be agreed on in advance.

@kartben
Collaborator

kartben commented Jul 3, 2024

The many reviewers who approved the PRs approved them without a condition attached. The TSC also approved the PRs for 3.7.0.

Just to clarify, the TSC approved the PRs for inclusion in 3.7 provided they are approved as per the usual review process -- it didn't give a blanket approval to the PRs themselves. The same goes for all the PRs/exceptions voted upon during the June 26 TSC meeting.

@andyross
Contributor

andyross commented Jul 3, 2024

This seems kinda... energetic. Just to jump in with an opinion where we already have too many: what's the harm of waiting a few days to stabilize and merge the fix? Others are pointing out this stuff too, but just to summarize:

  • It's a comparatively large and reasonably useful feature
  • It's an LTS release that will otherwise not get it
  • The bug itself is just a boring header dependency cycle. It's not like we need to be bugging a hardware vendor for docs or working around toolchain UB or whatever. We all agree it's a well-understood/well-constrained fix, if perhaps tedious.
  • It's just a few days!

I mean, stuff breaks, people mess up. It happens, we've all been there. But I don't see much of an argument for ejecting the feature?

@nashif
Member

nashif commented Jul 3, 2024

I mean, stuff breaks, people mess up. It happens, we've all been there. But I don't see much of an argument for ejecting the feature?

As I already said above, the reverted PR can be resubmitted with a proper fix that addresses all issues, and it will be reconsidered. The exception granted by the TSC is still valid, but this needs to be done within the next few days.

@nashif
Member

nashif commented Jul 3, 2024

Your position is flawed. You are making this about testing. The fact is, testing DID catch the issue and a bug was reported; that is why we have a multi-layered test approach.

@nashif - sure after the fact

yes, if you have a better way of doing this, please submit a proposal. Our CI has been operating this way for years.

  • PR (subset based on changes)-> Push (all default platforms and all tests) -> Weekly (everything is built on all platforms)
  • Hardware testing is conducted by vendors (that is where the first (build) issue was caught)
  • Features should be tested and verified mostly in emulation and what we define as default test platforms.

We can't build every sample/test variant on each of the 800 platforms we have; this does not scale, and each PR would take days to complete. That is where developers are tasked with picking the right coverage and defining where tests need to run.

In this particular case the build issue can be reproduced on qemu_x86 by adding CONFIG_MBEDTLS=y, so maybe all those posix tests need to enable more features that end up depending on posix headers and features to increase the coverage, at least the building part.
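
A minimal sketch of that reproduction, assuming a Zephyr workspace with `west` on the PATH. The application path `tests/posix/common` is borrowed from the twister command quoted elsewhere in this thread and is an assumption here; the comment above names only the board and the Kconfig option. The `run` helper and the `DRY_RUN` variable are hypothetical additions so that the snippet only prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical helper: print the command by default, execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Pristine build for qemu_x86 with mbedTLS enabled on top of the test's own config.
# Arguments after "--" go to CMake; -DCONFIG_MBEDTLS=y sets the Kconfig option.
# The application path is an assumption, not taken from the bug report.
run west build -p always -b qemu_x86 tests/posix/common -- -DCONFIG_MBEDTLS=y
```

Run with `DRY_RUN=0` inside a workspace to perform the actual build.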

Our CI is by design very selective depending on where CI runs.

It seems testing is a little too selective.

see above...

It really did. Why would it have passed countless iterations of manual testing, as well as many iterations of CI, even after multiple rebases?

Because you did not modify the tests for the new feature, the additions, and the change in header hierarchy? Of course CI will pass if there is no test coverage for something that was just introduced.

So please, since you are the expert, tell me the manual testing to do so that I don't need to waste twice as much of my time doing this again.

If you do not know how to test your own features, no expert (and I am not claiming to be one) will be able to help. But I want to be productive here: I already spent a few minutes to find that enabling CONFIG_MBEDTLS or any other "users" might expose such issues in the future and would be good to have as a scenario. As for the socket issue, I do not know; someone else needs to provide their input.

@aescolar
Member Author

aescolar commented Jul 4, 2024

I only respond here because there are clearly some quite big misunderstandings driving some of the comments, and I hope that by clarifying them we can all move on and continue working on improving things.

Why did CI (and manual twister runs) fail to catch these issues?
...
I run tests on platforms I have access to, and then build for all platforms with twister

Fetching the PR branch and running, for example:
twister -T tests/posix/common/ -s portability.posix.common.newlib --all
shows the same build errors as we saw in main.
Note that by default twister does not build for all platforms, only for the default test platforms (depending on extra settings in the tests' yaml). You need to pass it --all to build for all platforms (not filtered by the tests' yaml).

Please note on top of what Anas said:

  • These tests are setting one integration platform. That limits what they will run on PRs to just that platform. ( https://docs.zephyrproject.org/latest/develop/test/twister.html#tests )
  • Ultimately it is the responsibility of a PR submitter to ensure the code is working and properly tested. And that includes that tests and CI will provide appropriate coverage. Reviews or the previous tests cannot catch it all.
  • Regarding components which have too many interdependencies with other areas: The only option is to try to reduce those interdependencies and do a lot of testing.

I honestly feel that we should not take such a willy-nilly approach to reverting PRs .. A simple ping on discord would have helped.

This revert PR was submitted: 4 days after the first bug report on this issue, 2 days after @cfriedt queued a draft PR. That draft PR was failing in several ways in CI and from my review it did not seem like something we could merge for 3.7. Several more bug reports had been raised. And there had been multiple reports of PRs and developers blocked by these issues.
I had, before submitting this PR, triaged those bug reports, analyzed the extent of the issue and condensed all reports in one with all platforms and failing tests. I checked what commit introduced it, and like others realized the PR was not bisectable. I checked if an easy solution existed and there did not seem to be one.
I concluded that queuing a PR with a revert was appropriate, as that would have it ready while @cfriedt continued investigating. There was nothing willy-nilly about this process, and it actually took a lot of time.
@cfriedt was set as assignee in this PR (so he was certainly pinged), and this revert PR was not rushed by anybody.

In spite of that, I came up with a hotfix that is certainly less destructive than this PR.

That 2nd hotfix was rejected in its own PR as it did not seem appropriate. Other issues which the original PR had introduced and which were unrelated to the hotfix had also been raised at that point.

the development, review, and approval.. process was followed

Please note the following (this is not to blame, but to provide pointers which hopefully will help us all):

  • As the PR was not bisectable, identifying the issue and doing a smaller revert was not possible. By not bisectable I mean that commits 1..N of the PR did not even build without the rest of the PR. We should always aim to submit PRs whose commits build on their own and are as isolated and independently applicable as possible.
  • Several closely related and (merge) conflicting PRs were submitted simultaneously. This means that if one PR is merged, the others need to be rebased. This slows reviews and increases the likelihood of errors and of CI not detecting issues. It also means that if one of those PRs is found to have issues which require a revert (like this one did), the revert becomes much more difficult and ends up requiring reverting more code. It is best to avoid submitting PRs like this in parallel when possible.
  • These PRs were submitted too late in the release cycle given their scope and size. To ensure one has proper reviews and a good chance to merge the code, and minimize disruption to others, PRs like this should be sent much earlier.
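
One way to check the bisectability point before submitting is to run a build at every commit of the branch with `git rebase --exec`. This is a minimal sketch, not a project requirement: the function name `check_each_commit` is hypothetical, and the base branch and build command in the usage comment are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: replay the current branch on top of a base and run a
# command after each commit. git stops at the first commit where the command
# fails, which identifies a commit that does not build on its own.
check_each_commit() {
    base="$1"
    shift
    git rebase --exec "$*" "$base"
}

# Example usage (assumed base branch and smoke-test build, not run here):
#   check_each_commit upstream/main west build -p always -b qemu_x86 tests/posix/common
```

If the command fails at some commit, git leaves the rebase stopped there; fix the commit and continue with `git rebase --continue`, or back out with `git rebase --abort`.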

Moreover, as @kartben and @nashif already mentioned, all the TSC did was approve an exception to the rule which prevents PRs with new features from being merged after rc1. The TSC did not vote on the content or merit of the PRs. Moreover, the TSC chair warned about the risk of merging PRs like this so late, a risk which materialized. The approval from the TSC was only through a lack of objections. I don't recall any TSC member expressing interest in these PRs being merged.

The changes will form the basis of several companies' products going forward with LTSv3.

In that case those companies should have allocated resources to ensure these changes made it well in time for the release. The rc1 deadline has always been known.
This is not to say the maintainers should have worked faster. It is simply to say that not everything can make it to the release.
For this particular subsystem, all we can see relating to these features is that the maintainer of this area has set his own targets for this release. That does not imply a commitment or need by anybody else.
There were many warnings about the incoming feature freeze, and a couple of discussions during release meetings with the maintainer about that plan for the subsystem and the (un)likelihood of all those changes making it in before freeze.
Nonetheless, the release managers actively pinged the author and maintainer of these PRs when rebases were needed, and waited an extra day to allow more of these area PRs in before freeze.
There was a public and repeated complaint from the author after freeze about one of those PRs not having been merged. That PR was not meeting the merge criteria even 1 day after the deadline due to needed rebases.
More PRs were submitted after freeze. When they were brought to the TSC to be treated as exceptions, some were not meeting the merge criteria even though it was requested that would be the case, and one was submitted just a few hours earlier.
In my eyes, this subsystem's late PRs have received a lot of preferential treatment, taking a lot of attention and time from the release managers and TSC.

Please note all subsystems have features they would like to get in. And we'd all like as many features as possible in.
But eventually we need to make a release. We cannot simply continue forever advancing an unstable main.
Our releases are time based, as was agreed long ago; not feature based. Those who have an interest in having a feature in need to do the work in time.
In any case, there will be more releases.

Care to list some change requests?

@cfriedt At the very least the bugs that were reported and linked against the original PR must be fixed. Please also try to improve the test coverage.
But note it is not the role or responsibility of the release managers to tell you this. As release managers we are just trying to ensure that others can continue working on stabilizing for 3.7.

what's the harm of waiting a few days to stabilize and merge the fix?

@andyross Several other PRs and developers were blocked before merging this revert. We are just 1.5 weeks from final freeze. There was more than one distinct regression introduced by this PR. There was no reasonable fix in sight even several days after the issues were reported.
We cannot just block a lot of other people who are doing necessary fixes for the release, especially when a final deadline is very close, and even more so when summer holidays will prevent some from working before that.
And as @nashif said, PRs can be resubmitted again when fixed.

Please, let's just move on and continue fixing bugs. This whole issue has taken way too much time from too many people.

@cfriedt
Member

cfriedt commented Jul 5, 2024

Several other PRs and developers were blocked before merging this revert. We are just 1.5 weeks from final freeze. There was more than one distinct regression introduced by this PR. There was no reasonable fix in sight even several days after the issues were reported.

@aescolar - there fully was a reasonable fix in sight (@rlubos even had a PR ready to go for the network config). You need to be willing to see the solution without dismissing it because it came from an engineer you dislike 🤷‍♂️

Labels
  • area: Base OS (Base OS Library (lib/os))
  • area: C Library (C Standard Library)
  • area: LWM2M
  • area: Networking
  • area: POSIX (POSIX API Library)
  • area: Sockets (Networking sockets)
  • area: Tracing (Tracing)
  • Hotfix (Fix for issues blocking development, i.e. CI issues, tests failing in CI, etc.)
Projects
Status: Done