Support NVidia HPC SDK (PGI) #2475
tests/CMakeLists.txt
if (WIN32)
  target_compile_options(${target} PRIVATE -Wc,--pending_instantiations=0)
else()
  target_compile_options(${target} PRIVATE -Wc,--pending_instantiations=0 -D__GCC_ATOMIC_TEST_AND_SET_TRUEVAL=1)
Are these always required, or just for the tests? If always, maybe they should go into pybind11Common for pybind11 or some similar target.
-Wc,--pending_instantiations=0 is about the recursion level in templates, if I recall correctly. So in some code it will not be needed, but in other code the compilation will fail without this option. Note that this option makes compilation significantly slower.
-D__GCC_ATOMIC_TEST_AND_SET_TRUEVAL=1 was deduced from experiments with old compilers on Linux.
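To make the "recursion level in templates" remark concrete, here is a minimal sketch of code whose compilation keeps many instantiations of one template pending at the same time. The names `Sum` and `sum_to_100` and the depth of 100 are illustrative assumptions, not taken from pybind11 or from this PR, and whether a given PGI version rejects this exact depth without the flag has not been checked here.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative recursive template (hypothetical, not from pybind11):
// evaluating Sum<N>::value requires Sum<N-1>, Sum<N-2>, ... down to
// Sum<0>, so all of these instantiations are in flight at once.
template <std::size_t N>
struct Sum {
    static constexpr std::size_t value = N + Sum<N - 1>::value;
};

// Base case that terminates the recursion.
template <>
struct Sum<0> {
    static constexpr std::size_t value = 0;
};

// Depth 100 is an arbitrary choice for illustration.
constexpr std::size_t sum_to_100() { return Sum<100>::value; }

// Checked at compile time: 1 + 2 + ... + 100.
static_assert(Sum<100>::value == 5050, "unexpected sum");

// Runtime check of the same value, callable from any test driver.
inline void check_sum() { assert(sum_to_100() == 5050); }
```

Heavily templated libraries tend to nest instantiations far more deeply than hand-written code, which would be consistent with the comment that only some code needs the option.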
Hi @henryiii, just to avoid misunderstanding, I will quote myself from #2407. Best regards,
e47625b to cfcfe76
.github/workflows/ci.yml
- 7 # GCC 4.8
- 8
Any reason you are building twice, with identical compilers, just with a different base system? You aren't using the host GCC.
Hi @henryiii,
There are differences, e.g. Python and CMake. But in general, I would consider CI to be an "environment test" rather than a test of some specific compiler version. The latter has little practical value, in my opinion.
Best regards,
Andrii
6918c91 to 0a82626
Have you tried running the tests? It seems like many of them error out, mostly with things like "SystemError: Exception escaped from default exception translator!". Looks like this is pretty broken without this.
Looked into PGI a bit more and realized we are probably making a fundamental mistake. Will fix tomorrow.
fdbbb8a to 5a62da8
Not the tests with exceptions. I see no hope for simple fixes for them.
I'm intrigued. Do you mean that it is not possible to support their implementation of the STL? Or is the approach wrong?
I looked into PGI and it has complete C++11 support since 2015. So I then added I re-enabled all the tests, and now only one test fails:
There are still quite a few warnings - we manually silence quite a few warnings for GCC/Clang, which obviously doesn't scale very well. I'm not sure how to manually silence them for PGI in-source.
Yes. That is why "more different compilers" is good for debugging.
https://forums.developer.nvidia.com/t/disable-unreachable-warnings/133575 Unfortunately that is "all or nothing". Best regards, Andrii
879c973 to a490194
Added new CI config
Added new trigger
Changed CI workflow name
Debug CI
Debug CI
Debug CI
Debug CI
Added flags for PGI
Disable Eigen
Removed tests that fail
Uncomment lines
fix: minor style cleanup
tests: support skipping
ci: remove and tighten a bit
fix: try msvc workaround for pgic
a490194 to 20ea47f
I left the verbose in, wasn't too long. I've reordered that test and now pytest is no longer a FATAL_ERROR, so you can build without it.
been there.
Also, grepping the NVIDIA installation shows that "exception_ptr" is located only in libstdc++.ipl.
I think it's getting it from libstdc++ on your system (GCC 4.8), not from the install.
I mean the definition. There is just one header with the exception_ptr definition, and that definition is behind some very odd conditional expressions.
Could you please try this code locally on CentOS 7, with and without the defines?
with -std=c++11
I was able to reproduce locally (in docker). Very weird, no idea why this is different compared to docker on GHA! Try this:
You are running natively, right (not containerized)?
Yes. Always! (No, sometimes in VB.) Ok, now more insanity. PGI 19.10 works perfectly.
That compiles for me. Is "3" even a valid number for that? https://en.cppreference.com/w/c/atomic/ATOMIC_LOCK_FREE_consts
Yes. Basically, <bits/exception_ptr.h> has a condition bla-bla>2.
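On the question of valid values: the standard lock-free macros such as `ATOMIC_INT_LOCK_FREE` expand to 0 (never lock-free), 1 (sometimes lock-free), or 2 (always lock-free), so 3 is outside the documented range. The sketch below reads the value the toolchain actually provides. The helper names `int_lock_free_level` and `is_valid_lock_free_level` are hypothetical, introduced only for illustration. If the macro is not defined correctly by the compiler and standard library combination, this should fail to compile, which is itself a useful diagnostic.

```cpp
#include <atomic>
#include <cassert>

// Value of the standard ATOMIC_INT_LOCK_FREE macro from <atomic>:
// 0 = never lock-free, 1 = sometimes lock-free, 2 = always lock-free.
constexpr int int_lock_free_level() { return ATOMIC_INT_LOCK_FREE; }

// Hypothetical helper: true only for the three documented values.
inline bool is_valid_lock_free_level(int level) {
    return level >= 0 && level <= 2;
}

// Sanity check that the toolchain reports a documented value.
inline void check_lock_free_level() {
    assert(is_valid_lock_free_level(int_lock_free_level()));
}
```

Printing `int_lock_free_level()` with and without the command-line defines is one way to compare the two CentOS 7 setups discussed here.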
I think PGI 20.7 doesn't support GCC 4.8 (even though it probably should if they are providing a repo for it; I bet it can be filed as a bug). The compiler is supposed to define that, and I think that's been dropped in 20.7. Except when running in GHA, in which case it magically doesn't drop that. That's about where I am. :S
In short, since there is a command line workaround, I don't think we should hack the source. Probably report it as a bug with 20.7 and CentOS 7 and move on.
We should probably link to this PR somewhere, so if it breaks we can add the workaround to the CI.
Container magic as is. Ok. Agree to your suggestions. Basically to refine them:
What is your opinion? Best regards, Andrii
I would not add the changes to CMake. Maybe put a note in. But this is a user workaround that has nothing to do with pybind11 (it's needed to compile the example code from cppref above), and it's OS (probably stdlib) + compiler specific. Just a note in the changelog, perhaps, saying this is supported but on old systems you may need the workaround in issue #2475.
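The cppreference example mentioned above is not reproduced in this thread. As a rough stand-in, here is a minimal sketch of the kind of `std::exception_ptr` usage that needs working exception propagation support in the standard library. The function names `capture` and `message_of` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper: throw an error and capture it into an exception_ptr.
inline std::exception_ptr capture(const std::string& msg) {
    try {
        throw std::runtime_error(msg);
    } catch (...) {
        return std::current_exception();
    }
}

// Hypothetical helper: rethrow a captured exception and return its message.
// Exceptions not derived from std::exception propagate to the caller.
inline std::string message_of(std::exception_ptr p) {
    assert(p != nullptr);  // a null exception_ptr must not be rethrown
    std::string result;
    try {
        std::rethrow_exception(p);
    } catch (const std::exception& e) {
        result = e.what();
    }
    return result;
}
```

If this fails to compile with `-std=c++11` on a given system but compiles once the command-line defines are added, the problem lies in the compiler and standard library setup rather than in pybind11, which matches the conclusion reached here.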
(I'll try this locally on CentOS 8 later tonight)
Yes, so that should not appear in the code, but it is ok to put it in the test suite. I also think this is a pure bug, not an "unsupported version".
We don't understand the details of the bug, what versions it affects, or why the workaround is needed/what it affects. Hardcoding it into the test suite hides that the workaround is needed. Let's leave it as a note. (Unless another maintainer feels otherwise).
Ok. Note is ok. One can also issue a message during CMake configuration in case PGI is detected.
Once we understand it better, that's a possibility.
When I did this by hand, I got, in CentOS 8:
(Again, this is a problem out of the box, not with pybind11.) Looking here: https://forums.developer.nvidia.com/t/support-for-atomic-in-libstdc-missing/135403/4 it looks to me like it was designed for an HPC admin to create a siterc with defines like this that describe the HPC hardware you will be targeting (since you generally compile on a host node, not a compute node, and the nodes may not be homogeneous). That is also likely why it uses environment-modules; this is really not intended for "normal" users to drop in and go. The docker container is set up properly, but the rest is "standard PGI" configuration that you have to do yourself. (I could be totally wrong here. I also don't think they are targeting normal users yet, just HPC clusters.)
Let me know / CC me on the issue with NVIDIA! For now, this should be better / compilable via flags or site configuration instead of source changes.
The rpm repo seems to be a little unstable. If it causes the build to fail occasionally in the long run, I'll move to the container for the main test suite and make the two CentOS builds weekly or something like that.
In this PR:
Closes #2214 and closes #2407.