Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Situation with Zephyr POSIX subsystem and ARCH_POSIX (targets running on the underlying POSIX OS) #13054

Closed
pfalcon opened this issue Feb 5, 2019 · 16 comments
Assignees
Labels
area: native port Host native arch port (native_sim) area: POSIX POSIX API Library RFC Request For Comments: want input from the community

Comments

@pfalcon
Copy link
Contributor

pfalcon commented Feb 5, 2019

Subj is a cause of continued cases of failures and frustrating workarounds in Zephyr. Something needs to be done about it. Two obvious choices are:

  1. Say that POSIX subsys of Zephyr is not compatible with ARCH_POSIX. This is very obvious and right solution.
  2. Call for all-hands work from the maintainers of both parts, work hard on them, and then it's still unclear how far we'll get with that.
@pfalcon pfalcon added the bug The issue is a bug, or the PR is fixing a bug label Feb 5, 2019
@pfalcon
Copy link
Contributor Author

pfalcon commented Feb 5, 2019

cc: @nashif

@aescolar
Copy link
Member

aescolar commented Feb 5, 2019

@pfalcon : Having the same POSIX APIs both above and below Zephyr does certainly require some cheating to make it work.
But I hope it does not lead to too much frustration overall.
We need to be aware about how that is made to work (how those cheats work) and ensure we don't break them.
I think giving up on supporting the POSIX API on top of the POSIX arch is a bit too dramatic at this point.

Could you be a bit more specific about what are those continued failures and frustrations? (so we can address them)
If the problem is understanding how it is made to work today, we can fix that.
Is there any current broken functionality in this regard? (that warrants the bug Label?)

@aescolar aescolar added area: native port Host native arch port (native_sim) area: POSIX POSIX API Library labels Feb 5, 2019
@pfalcon
Copy link
Contributor Author

pfalcon commented Feb 5, 2019

So, the problem is that there's too much magic in each of the subsystems above. I'm working on de-magicize POSIX API subsystem, but I get blocked by native_posix failures on each step.

Here's an example case transcript:

  1. [WIP][DNM] lib: posix: Elaborate for wider yet more selective POSIX coverage #12984 fails with:
/home/buildslave/src/github.com/zephyrproject-rtos/zephyr/arch/posix/include/posix_cheats.h:36:32: error: unknown type name ‘zap_pthread_attr_t’
 #define pthread_attr_t         zap_pthread_attr_t
  1. Looking at that file, it says:
/* For the include/posix/pthreads.h provided with Zephyr,
 * in case somebody would use it, we rename all symbols here adding
 * some prefix, and we ensure this header is included
 */
  1. we rename all symbols here - Ok, rename to what? Grepping thru the codebase, I don't fine definition of "zap_pthread_attr_t". Is "zap" here a pun? I'd say it's rather unclear then, should be named something like "dontuse_pthread_attr_t". In either case, should be properly explained in the comment.
  2. and we ensure this header is included - Ok, how? Grepping thru the codebase, I see that the file is never included. Grepping more, I see that it's references from CMake*.txt. Again, should be better documented.
  3. See no docs in comments, I do git log posix_cheats.h to see if commit messages had more detailed information, and instead find tests/posix/fs/ segfaults randomly in POSIX arch #13011, which both not cc'ed to make (even though it fixes the commit I made), and causes random segfaults. All that only proves the points made about.

So, the point that there should be much better synchronization between CONFIG_POSIX_API and CONFIG_ARCH_POSIX developers, which first requires cleaning up and documenting better both. There're (as usual) currently no resources for that, so better for now would be just separate one from another until it's more stabilized and we can work on a specific task on integrating both together (to the extent possible, which is so far unknown and requires separate research task).

So, I'll be looking into blacklisting native_posix for tests/posix/ in the meantime, but comments are welcome.

@pfalcon
Copy link
Contributor Author

pfalcon commented Feb 5, 2019

But I hope it does not lead to too much frustration overall.

Well, It does, for quite some time now. Any change to the POSIX subsys leads to breakage of CI re: native_posix.

I think giving up on supporting the POSIX API on top of the POSIX arch is a bit too dramatic at this point.

It's not like "giving up", it's more like accepting that it's a separate task to integrate both, which so far severely interferes with the development of POSIX subsys itself. After last working streak on it in September or so, I tried to stay away from it. Now I need to work on it again, and the same breakage-on-each-step story repeats.

@pfalcon pfalcon added the RFC Request For Comments: want input from the community label Feb 5, 2019
@aescolar
Copy link
Member

aescolar commented Feb 5, 2019

Ok, you raise several points which require separate answers:

  • Regarding not having CC'ed you: Actually that was a silly mistake from my side. When fixing it, I had incorrectly thought it was a change by Ulf, so I had added him as reviewer instead of you.
  • The symbols which require renaming are those the linker may get confused about. And those are the functions we (may) use both from the POSIX arch and the POSIX API.
    The types do not really need to be renamed. You should be able to just remove those #defines from the posix_cheats.h header. (Having the types also renamed may help somebody inspect the compiler output to know what he is doing).
  • ZAP: Initially it came from "Zephyr APPlication", (before the POSIX API, only main() needed to be renamed. "zap_" is just as good as any unlikely random string, which hopefully does not cause collisions with other symbol names.
  • You cannot find anywhere in the codebase any other symbol matching zap_* for that reason. The idea is that, wherever you include posix_cheats.h (which should be everywhere automatically, except where it is on purpose excluded (POSIX arch private files), your function definition or function call (say pthread_attr_init( ) ) gets renamed by the C pre-processor (to zap_pthread_attr_init( )), so it does not collide with the host one on link time.
  • Apart from this, you need to be careful to include the Zephyr POSIX API headers and not the host ones.when compiling for native_posix. Say by including "posix/banana.h" instead of <banana.h>.
  • That comment at the beginning of posix_cheats needs to be updated. At the time that was written only the pthreads.h one had been added to Zephyr. Now it applies for all "POSIX compatible APIs in Zephyr"
  • Yes, we probably should better document this in the posix_cheats.h header itself (at least).

aescolar added a commit to aescolar/zephyr that referenced this issue Feb 5, 2019
There is no need to rename the POSIX types, so let's not do it to
simplify things
Also remove an unnecessary guard (POSIX_ARCH) to avoid mystifying
this any more than necessary

Related to zephyrproject-rtos#13054

Signed-off-by: Alberto Escolar Piedras <alpi@oticon.com>
@pfalcon
Copy link
Contributor Author

pfalcon commented Feb 6, 2019

So, while this ticket as about "POSIX subsystem vs ARCH_POSIX" matters, to give a wider perspective, currently POSIX subsystem conflicts with:

  • ARCH_POSIX, as pulling system headers.
  • libc, as having its own headers
  • some ports, as having their own headers (such ports also may conflict with libc)
  • Socket API, as being both part of POSIX, and having a requirement to be usable standalone, without it.

As you may imagine, that's too much conflicts to take care of. Of these, conflict with ARCH_POSIX is the most tangled one, least clear how it should be handled at all, and not much useful in practice. That's why it's brought up, and why temporary solution is proposed to just decouple POSIX subsys and ARCH_POSIX for now, and concentrate on its development per se, while dealing with remaining conflicts, which are themselves quite enough.

@aescolar
Copy link
Member

aescolar commented Feb 6, 2019

@pfalcon : With regard to giving up the support of the POSIX API on top of the POSIX ARCH:
Me or my company do not care. I guess that, whoever (whichever company is driving adding the POSIX API to Zephyr), would need to comment in their interest for such a combination.
If they do not care to have it, or consider it is not worthwhile the trouble to keep that combination working, and other main maintainers of Zephyr have no issue with dropping that support, then it could be just dropped.

@pfalcon
Copy link
Contributor Author

pfalcon commented Feb 6, 2019

With regard to giving up the support of the POSIX API on top of the POSIX ARCH:

I'd say you actually should welcome it, because any attempts to "fix" it would likely adversely affect your main usecases for it (simulation, etc.)

And again, the talk is about to temporary decouple them, until POSIX subsys matures enough. Then it can be revisited.

nashif pushed a commit that referenced this issue Feb 6, 2019
There is no need to rename the POSIX types, so let's not do it to
simplify things
Also remove an unnecessary guard (POSIX_ARCH) to avoid mystifying
this any more than necessary

Related to #13054

Signed-off-by: Alberto Escolar Piedras <alpi@oticon.com>
@aescolar aescolar removed their assignment Feb 11, 2019
@aescolar
Copy link
Member

I understand there is currently no action to be taken by me here, so I unassigned myself

@pfalcon
Copy link
Contributor Author

pfalcon commented Feb 11, 2019

Yes, the expected action is approving a PR with blacklisting of native_posix from POSIX tests, but I don't haste with it, for as long as there're other PRs to push from that work.

@nashif
Copy link
Member

nashif commented Feb 19, 2019

Me or my company do not care. I guess that, whoever (whichever company is driving adding the POSIX API to Zephyr), would need to comment in their interest for such a combination.

this is useful for many reason, the primary being able to test with something as simple as native_posix and also get coverage data for that subsystem.

Yes, the expected action is approving a PR with blacklisting of native_posix from POSIX tests, but I don't haste with it, for as long as there're other PRs to push from that work.

I fully understand the issue and frustration here, been that route myself before.

I propose to do it this way:

  • disable posix subsystem on posix arch for now, with a clear REVERTME
  • Fix the posix subsystem and clean it up without the hacks and get it working on other architectures
  • One we have a clean implementation and without the hacks, go back to enabling on native_posix and implement cleanly.

@nashif nashif added priority: medium Medium impact/importance bug and removed RFC Request For Comments: want input from the community labels Feb 19, 2019
@nashif
Copy link
Member

nashif commented Feb 19, 2019

it is either a bug or an RFC, not both...

@pfalcon pfalcon added RFC Request For Comments: want input from the community and removed bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug labels Feb 19, 2019
@pfalcon
Copy link
Contributor Author

pfalcon commented Feb 19, 2019

it is either a bug or an RFC, not both...

For the purpose of 1.14 release planning, it's RFC indeed.

@galak
Copy link
Collaborator

galak commented Feb 19, 2019

I've been digging into this today, not on purpose, but we need to decide where the header information is suppose to come from. We have a number of different combinations at play now:

  • Zephyr's min-libc
  • newlib
  • newlib-xtensa
  • x86_64
  • ARCH_posix native

@pfalcon
Copy link
Contributor Author

pfalcon commented Feb 19, 2019

but we need to decide where the header information is suppose to come from

Yes, that would be subject of a separate RFC, but I lean towards following option:

  1. As libc is unalienable part of POSIX, POSIX subsystem should depend on newlib. But then it only makes sense to reuse headers already supplied by newlib, instead of trying to override some of them, which never works, as real-world apps pull in wide array of headers, so include one non-overriden, and everything crashes.

The only alternative to that would be:

  1. Override, i.e. write from scratch/duplicate all newlib configs within Zephyr's include/posix/ .

But I assume that totally makes no sense, and such an option was already considered, and newlib is for a reason part of the SDK and not Zephyr itself.

But p.1 of course leaves open question of POSIX subsys support for platforms/toolchains without newlib. And actually, to get most of newlib, we likely would need to patch it, as currently it has some parts #ifdef'ed for Cygwin and RTEMS.

@nashif
Copy link
Member

nashif commented Apr 13, 2023

@aescolar can you please take a look and see if this is still relevant and close if not?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: native port Host native arch port (native_sim) area: POSIX POSIX API Library RFC Request For Comments: want input from the community
Projects
Status: Done
Development

No branches or pull requests

5 participants