HWLOC 2.4 and HWLOC 2.5 are binary incompatible #477
Comments
We had one possibly related report last year with hwloc 2.2, but we never understood what was going on: https://www.mail-archive.com/hwloc-users@lists.open-mpi.org/msg01565.html There shouldn't be any ABI break above, since topology is just a pointer that is filled by init() and then used by load(). Things would break if init() from one hwloc version were used with load() from another version (but even this should work with hwloc 2.4 and 2.5). Can you check the return value of init() in case it failed? The faulting location reported, 0x000000000000000d0, makes me think that the topology pointer is NULL. |
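To illustrate the reasoning in the comment above: a fault at a small address such as 0xd0 is what one would expect when a field is read through a NULL struct pointer, because the faulting address is then simply the field's offset. The sketch below uses a made-up struct (FakeTopology and address_of_flags are hypothetical and do not reflect hwloc's real layout) only to show the arithmetic.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout, NOT hwloc's real struct: the padding is chosen
// so that `flags` sits at byte offset 0xd0.
struct FakeTopology {
    unsigned char earlier_fields[0xd0];
    unsigned long flags;
};

// Address that would be accessed when reading `flags` through a
// FakeTopology pointer whose numeric value is `base`.
std::uintptr_t address_of_flags(std::uintptr_t base) {
    return base + offsetof(FakeTopology, flags);
}
```

With base equal to 0 (a NULL pointer), address_of_flags returns the bare offset, which is why a very low faulting address usually points at a NULL or uninitialized handle rather than at corrupted data.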
I also ran the same experiments with the following HWLOC versions: 2.0, 2.1, 2.2, 2.3, 2.4, 2.5. It seems that all of these versions are incompatible with each other. However, the issue reproduces only on Windows; on Linux everything seems to work fine (although I did not check every version there). |
I can reproduce the issue when building a program inside an MSVC project using hwloc 2.5 headers/libs [1], and then running it from either the hwloc 2.4 zipball bin directory (fails) or the 2.5 one (works). When building with cygwin, I don't see any issue. Unfortunately, I don't get anything useful from dumpbin, ldd, objdump, or nm on the binary, hence I can't check whether msvc hardwired some symbol address, etc. I am not sure I am doing this as expected. Here's my config: |
The case that you describe above is about forward binary compatibility, but my case is about backward binary compatibility. You should compile the application with HWLOC 2.4 but run it with the HWLOC 2.5 DLL. I do it in the following way:
Also, this looks like an entry-point mapping issue. To demonstrate it, consider the following example:
std::cout << "version is: " << hwloc_get_api_version() << std::endl;
hwloc_topology_t topology{ nullptr };
auto init_result = hwloc_topology_init(&topology);
std::cout << "init result: " << init_result << " topology now is: " << topology << std::endl;
auto load_result = hwloc_topology_load(topology);
std::cout << "load result: " << load_result << " topology now is: " << topology << std::endl;
When it is executed with matching DLL and header versions, the return code is 0 and the output is "version is: 132096". The assembly of the
But if I use the DLL for HWLOC 2.5 then it crashes during the
I don't know why it behaves this way, but the generated assembly is different and it seems that we simply call an incorrect address. So I assume that the root cause of the issue is incorrect entry-point mapping. |
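As a side note on the "version is: 132096" output mentioned above, the number can be decoded by hand. The sketch below assumes the (X << 16) + (Y << 8) + Z encoding that hwloc.h describes for HWLOC_API_VERSION; the helper names decode_api_version and same_major_minor are hypothetical and not part of hwloc.

```cpp
// Components of an hwloc API version number.
struct ApiVersion {
    unsigned major_version;
    unsigned minor_version;
    unsigned release_version;
};

// Assumes the (X << 16) + (Y << 8) + Z encoding of HWLOC_API_VERSION.
constexpr ApiVersion decode_api_version(unsigned v) {
    return ApiVersion{(v >> 16) & 0xffu, (v >> 8) & 0xffu, v & 0xffu};
}

// Hypothetical helper: true when the headers used at build time and the
// library loaded at run time report the same major.minor version.
constexpr bool same_major_minor(unsigned header_version,
                                unsigned runtime_version) {
    return (header_version >> 8) == (runtime_version >> 8);
}
```

Under that assumption, 132096 is 0x20400, which decodes to 2.4.0. In a real program the two arguments of same_major_minor would be HWLOC_API_VERSION (from the headers) and hwloc_get_api_version() (from the loaded DLL). Such a check only reports which versions are being mixed; it does not by itself explain the crash discussed here.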
It looks like cygwin fails too if I link with libhwloc.lib explicitly instead of passing -L../lib -lhwloc to gcc. Do you happen to know what libhwloc.lib is compared to the DLL? |
No, I don't. Could you please clarify the question? |
It looks like linking against libhwloc.lib triggers the issue (this is what MSVC does, and cygwin can be forced to do it), while linking against libhwloc.dll doesn't (cygwin uses that one by default). But I don't know what libhwloc.lib is, at least compared to libhwloc.dll, hence I don't know why linking against the former would cause an error and not the latter. |
I hope I understand your question correctly; I will try to provide some helpful information. HWLOC is a dynamic library, so in our case But it seems I found some helpful information here. I hope I am not giving you wrong information. |
Hello, I have news regarding this problem. It seems that the root cause lies in the build system used to build the HWLOC packages on Windows. As I understand it, the Windows packages are built using some Linux-like environment such as MinGW or Cygwin, am I right? I make this assumption because I have built several HWLOC versions using |
Yes prebuilt zipballs are built using MSYS2 and MinGW (we support cygwin but it's only used in the CI). But this may change in the future because the recently-added CMake support makes things muuuuuch easier (and it generates a hwloc.sln that seems much better, at least not outdated). I'll see if I can test the compatibility across MSVC-built libs. |
I can confirm I still get the issue with our prebuilt 2.4/2.5 libraries but no incompatibility between 2.4/2.5 built from hwloc.sln |
Did I understand correctly that you are planning to change the way of building the packages to solve this issue? |
I don't know yet. In the past, building was MSYS/MinGW only. This environment isn't easy to install, and building takes a very long time. That's why we provided pre-built binaries in ZIPs. Then cygwin support was added and made things slightly easier, but cygwin has some other drawbacks. CMake clearly changes things. Building is easy and fast, and many Windows developers already have CMake installed. Things that may happen in the future, from most likely to unlikely
|
What version of hwloc are you using?
Which operating system and hardware are you running on?
Windows
Details of the problem
HWLOC 2.5 breaks binary backward compatibility with HWLOC 2.4.
To reproduce the issue, compile the simple HWLOC example using HWLOC 2.4:
Then run it with the DLL from HWLOC 2.5.
You will see the following error: