linux-libnuma test failing in SLES 12 build root #213

Closed
dgloe opened this issue Oct 28, 2016 · 7 comments

Comments

@dgloe

dgloe commented Oct 28, 2016

In a SLES 12 build root the linux-libnuma make check test is failing with this output:

linux-libnuma: linux-libnuma.c:70: main: Assertion `numa_bitmask_equal(bitmask, numa_all_nodes_ptr)' failed.
./wrapper.sh: line 14: 25529 Aborted (core dumped) "$@"

This is using hwloc 1.11.4.

@dgloe
Author

dgloe commented Oct 28, 2016

When I try again with debugging information turned on I get a different error before linux-libnuma (moved to #214).

linux-libnuma: linux-libnuma.c:64: main: Assertion `hwloc_bitmap_isequal(set, set2)' failed.
./wrapper.sh: line 14: 21061 Aborted                 (core dumped) "$@"
#4  0x0000000000401d3f in main () at linux-libnuma.c:64
        topology = 0x604960
        set = 0x607ca0
        set2 = 0x606fc0
        nocpunomemnodeset = 0x6073e0
        nocpubutmemnodeset = 0x607370
        nomembutcpunodeset = 0x606d70
        nomembutcpucpuset = 0x608040
        node = <optimized out>
        bitmask = <optimized out>
        bitmask2 = <optimized out>
        mask = 1
        maxnode = 4204333
        i = <optimized out>
        __PRETTY_FUNCTION__ = "main"

@bgoglin
Contributor

bgoglin commented Oct 28, 2016

get_last_cpu_location is a completely different function, so I am ignoring it here.
Please add this patch:

diff --git a/tests/linux-libnuma.c b/tests/linux-libnuma.c
index 9750c6e..7a49bfb 100644
--- a/tests/linux-libnuma.c
+++ b/tests/linux-libnuma.c
@@ -67,6 +67,7 @@ int main(void)
   bitmask = hwloc_cpuset_to_linux_libnuma_bitmask(topology, set);
   /* numa_all_nodes_ptr contains NODES with no CPU but with memory */
   hwloc_bitmap_foreach_begin(i, nocpubutmemnodeset) { numa_bitmask_setbit(bitmask, i); } hwloc_bitmap_foreach_end();
+printf("%lu != %lu\n", *bitmask->maskp, *numa_all_nodes_ptr->maskp);
   assert(numa_bitmask_equal(bitmask, numa_all_nodes_ptr));
   numa_bitmask_free(bitmask);

Then rebuild the test with make -C tests linux-libnuma, run it manually with tests/linux-libnuma, and report the output.

Also please send the file foo.xml generated by lstopo foo.xml so that we know what your machine looks like.

@dgloe
Author

dgloe commented Oct 28, 2016

When I run it again it fails earlier, at hwloc_bitmap_isequal(set, set2). So the additional printf doesn't help.

foo.xml.txt

@bgoglin
Contributor

bgoglin commented Oct 28, 2016

Do you have NUMA enabled in your kernel? (grep CONFIG_NUMA /boot/config-*)
If not, we may want to just disable this test in that case.

Otherwise:

diff --git a/tests/linux-libnuma.c b/tests/linux-libnuma.c
index 9750c6e..8992e47 100644
--- a/tests/linux-libnuma.c
+++ b/tests/linux-libnuma.c
@@ -61,6 +61,14 @@ int main(void)
   hwloc_cpuset_from_linux_libnuma_bitmask(topology, set2, numa_all_nodes_ptr);
   /* numa_all_nodes_ptr doesn't contain NODES with CPU but no memory */
   hwloc_bitmap_or(set2, set2, nomembutcpucpuset);
+{
+  char *a, *b, *c;
+  hwloc_bitmap_asprintf(&a, set);
+  hwloc_bitmap_asprintf(&b, set2);
+  hwloc_bitmap_asprintf(&c, nomembutcpucpuset);
+  printf("nomembutcpucpuset %s\n", c);
+  printf("numa_all_nodes_ptr gave %s != %s\n", b, a);
+}
   assert(hwloc_bitmap_isequal(set, set2));
   hwloc_bitmap_free(set2);

@dgloe
Author

dgloe commented Oct 28, 2016

The host system has

CONFIG_NUMA=y
CONFIG_NUMA_EMU=y

The chroot doesn't have a /boot/config-* file.

With /proc mounted it's getting past the first assert. The first printf you suggested is giving:

abuild@cfosbld03:tests $ ./linux-libnuma
1 != 3
linux-libnuma: linux-libnuma.c:72: main: Assertion `numa_bitmask_equal(bitmask, numa_all_nodes_ptr)' failed.
Aborted (core dumped)

@bgoglin
Contributor

bgoglin commented Oct 28, 2016

Your lstopo output and the printf say there is a single core and hyperthread in the chroot, while libnuma says there are two of them. Is the chroot somehow restricted to a single core and hyperthread while the machine actually has two of them?

@bgoglin
Contributor

bgoglin commented Jul 12, 2017

I am going to ignore/close this issue since it looks like libnuma doesn't properly report things in that chroot, and I don't think we care about libnuma<->hwloc interoperability regression checks in such a build environment anyway.

@bgoglin bgoglin closed this as completed Jul 12, 2017