Segfault with Slurm (14.03.8), 2+ tasks, and OpenSees #235
Comments
Vader is an alternative, potentially faster, shared memory BTL. I'm not sure why it would be segfaulting on your machine, but it is probably missing some startup info when direct launched by srun instead of mpirun. For those cases, you can just put OMPI_MCA_btl=^vader in your environment to turn it off. We'll take a look and see if we can ID the problem. Have you tried 1.8.3 to see if the problem exists there too? On Oct 14, 2014, at 3:34 PM, Kilian Cavalotti notifications@github.com wrote:
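For reference, the workaround described above might look like the following in a POSIX shell. This is only a sketch: the binary name `./send_recv` is borrowed from later in this thread, and the `srun` options are illustrative rather than required.

```shell
# Exclude the vader BTL. In an MCA component list, a leading ^ means
# "use every available component except the ones named here".
export OMPI_MCA_btl='^vader'

# Direct-launch under Slurm, but only where srun and the test binary exist.
if command -v srun >/dev/null 2>&1 && [ -x ./send_recv ]; then
    srun -N 1 -n 2 ./send_recv
fi
```

Because the setting is an environment variable, it applies to direct launch with srun without changing the job script's command line.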
|
@rhc54: thanks for the explanation. Setting OMPI_MCA_btl=^vader indeed prevents the segfault from happening. I've installed 1.8.3 and I'll work with the user to see if we can reproduce the problem with that version. I'll keep you posted. |
@hjelmn Are you aware of any direct-launch problems with Vader? |
Quick update: the segfault happens under the same circumstances with Open MPI 1.8.3, and produces the same exact backtrace. |
Are you using PMI-1 or PMI-2? On Oct 15, 2014, at 11:29 AM, Kilian Cavalotti notifications@github.com wrote:
|
Using the head of the 1.8 series (to be released soon as 1.8.4) and Slurm 2.5.4 (which means PMI-1), it works just fine. On Oct 15, 2014, at 11:39 AM, Ralph Castain rhc@open-mpi.org wrote:
|
@rhc54: I'm actually not sure. Slurm 14.03.8 provides both PMI-1 and PMI-2, and I compiled Open MPI with |
If I run
--------------------------------------------------------------------------
PMI2 failed to initialize, returning an error code of 14.
We cannot use PMI2 at this time, and your job will likely abort.
--------------------------------------------------------------------------
So it does indeed look like it occurs in conjunction with PMI-2. |
Nope - Slurm defaults to using PMI-1 unless you explicitly tell it to use pmi2 on the srun cmd line. We just link so we have support for either one you choose to use. FWIW: I can direct launch an MPI "hello" and an MPI ring example just fine using PMI-1 or PMI-2 under what was released as Slurm 14 without problem, including letting vader run. On Oct 15, 2014, at 11:57 AM, Kilian Cavalotti notifications@github.com wrote:
|
@rhc54: Oh ok. I was under the impression that PMI-2 would be used over PMI-1 by reading the FAQ, especially this: "When the --with-pmi option is given, OMPI will automatically determine if PMI-2 support was built and use it in place of PMI-1" in https://www.open-mpi.org/faq/?category=slurm#slurm-direct-srun-mpi-apps I was also able to run a simple MPI hello world program without a segfault, so it seems to be somewhat related to the OpenSees application, except the stack trace really seems to point at vader, which confuses me. |
On Oct 15, 2014, at 1:19 PM, Kilian Cavalotti notifications@github.com wrote:
We can only decide that for build purposes - Slurm needs to know which one to actually use when it runs. So if you want Slurm to use PMI-2, you have to tell it to do so. Either on the srun cmd line, or you can set it as the default in the slurm.conf file.
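As a rough illustration of the two options mentioned here: `srun` accepts `--mpi=pmi2` on the command line, and `slurm.conf` accepts `MpiDefault=pmi2` as the site-wide default. The helper name `srun_cmd_for_pmi` below is hypothetical and only prints the command it would run; the binary name is borrowed from later in this thread.

```shell
# Hypothetical helper: print the srun command line for a given PMI flavor.
# "pmi2" adds --mpi=pmi2; any other value leaves srun on its configured default.
srun_cmd_for_pmi() {
    flavor=$1
    shift
    if [ "$flavor" = "pmi2" ]; then
        printf 'srun --mpi=pmi2 %s\n' "$*"
    else
        printf 'srun %s\n' "$*"
    fi
}

# Show the command for an explicit PMI-2 launch of a 2-task job.
srun_cmd_for_pmi pmi2 -N 1 -n 2 ./send_recv

# To make PMI-2 the default instead, slurm.conf would contain:
#   MpiDefault=pmi2
```

Running `srun --mpi=list` on the cluster shows which plugins that Slurm installation actually provides.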
|
@jsquyres vader runs cleanly for me when launching directly with either srun or aprun. Can you provide more details? A line number at a minimum would help finger the culprit. It is very unlikely to be a vader issue; it is more likely an application issue. The sendi function does nothing more than copy from the user's pointer to a shared memory region. The region is allocated in exactly the same way it is in btl/sm (except each process owns its own segment). |
i am able to reproduce the issue at this stage of the investigation :
|
PR #238 makes the master work again, i am now investigating the latest v1.8 |
I tried several things but I am unable to reproduce the issue with v1.8. @rhc54 could you please clarify pmi1 vs pmi2 for me?
@kcgthb your hello world program must include at least one send/recv to ensure you get a chance to hit the issue |
On Oct 16, 2014, at 12:25 AM, Gilles Gouaillardet notifications@github.com wrote:
Yes - if you configure for pmi2, then we only support pmi2 operations
Ah - that would explain why my tests were passing. Thanks!
|
@ggouaillardet Thanks for tracking that down. I don't know why opal/shmem uses O_CREAT there. I will ask the original author of that code to see what's going on. I assumed segment_attach would fail if the file didn't exist, disqualifying vader. btl/sm was only working because it doesn't send the filename through the modex. |
Nathan: could you please take a look at the proposed fix? I flagged you, Howard, and Elena on it for review as I'm not sure of the impact it will have on minimizing #keys pushed to pmi. On Oct 16, 2014, at 6:59 AM, Nathan Hjelm notifications@github.com wrote:
|
This will send fewer keys (from a pure PMI point of view), but all the keys will be sent at the last minute. If the goal was to send keys at regular intervals, then slicing was fine, but the logic to reassemble the slices must be reviewed. rhc54 notifications@github.com wrote:
|
@ggouaillardet: Hi Gilles! Still in Rokkasho? @rhc54: regarding memory corruption, I tried this on a bunch of different machines, and the segfault behavior is consistent. |
On Oct 16, 2014, at 12:55 PM, Kilian Cavalotti notifications@github.com wrote:
Yes, I think Gilles hit the key point and fixed the problem. We are looking at the fix to ensure it doesn't have some unintended consequences, and then will push into 1.8.4 which will release soon.
|
@rhc54: that looks awesome, thanks. |
@rhc54 the point I hit applies only to master; I did not see any such thing in the v1.8 branch. @kcgthb can you tell me more about your config? By default (configure && make install), Slurm 14.03.8 provides only PMI1 support. When the crash occurs under gdb, can you run thanks Gilles PS |
I configured OMPI with the
Yes.
$ ldd /share/sw/free/openmpi/1.8.3/gcc/4.4/lib/openmpi/mca_ess_pmi.so
linux-vdso.so.1 => (0x00007fffcf7ff000)
libcr_run.so => /usr/lib64/libcr_run.so (0x00007f03a8884000)
libmca_common_pmi.so.1 => /share/sw/free/openmpi/1.8.3/gcc/4.4/lib/libmca_common_pmi.so.1 (0x00007f03a8681000)
libpmi2.so.0 => /usr/lib64/libpmi2.so.0 (0x00007f03a8469000)
libpmi.so.0 => /usr/lib64/libpmi.so.0 (0x00007f03a8264000)
libslurm.so.27 => /usr/lib64/libslurm.so.27 (0x00007f03a7f30000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f03a7d2c000)
libhwloc.so.5 => /usr/lib64/libhwloc.so.5 (0x00007f03a7afb000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f03a78dd000)
librt.so.1 => /lib64/librt.so.1 (0x00007f03a76d5000)
libm.so.6 => /lib64/libm.so.6 (0x00007f03a7451000)
libutil.so.1 => /lib64/libutil.so.1 (0x00007f03a724d000)
libc.so.6 => /lib64/libc.so.6 (0x00007f03a6eb9000)
/lib64/ld-linux-x86-64.so.2 (0x0000003b6aa00000)
libxml2.so.2 => /usr/lib64/libxml2.so.2 (0x00007f03a6b66000)
libz.so.1 => /lib64/libz.so.1 (0x00007f03a6950000)
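A quick way to pull just the PMI libraries out of output like the above is sketched here. The function name `pmi_libs` is hypothetical, the component path is the one quoted in this thread, and `grep -o` is assumed to be available (it is in GNU and BSD grep).

```shell
# Hypothetical helper: read ldd-style output on stdin and print the distinct
# libpmi / libpmi2 shared-object names it mentions, one per line.
pmi_libs() {
    grep -Eo 'libpmi2?\.so[^ ]*' | LC_ALL=C sort -u
}

# Path taken from the ldd command in this thread; adjust for your install.
comp=/share/sw/free/openmpi/1.8.3/gcc/4.4/lib/openmpi/mca_ess_pmi.so
if [ -e "$comp" ]; then
    ldd "$comp" | pmi_libs
fi
```

Seeing both names in the result only shows that the component was linked against both libraries; which one is used at run time is still decided by how srun is invoked.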
Mmmh, I don't see any file name:
$ OMPI_MCA_shmem_base_verbose=255 srun ./send_recv
[sh-5-35.local:29589] mca: base: components_register: registering shmem components
[sh-5-35.local:29588] mca: base: components_register: registering shmem components
[sh-5-35.local:29588] mca: base: components_register: found loaded component mmap
[sh-5-35.local:29588] mca: base: components_register: component mmap register function successful
[sh-5-35.local:29588] mca: base: components_register: found loaded component posix
[sh-5-35.local:29589] mca: base: components_register: found loaded component mmap
[sh-5-35.local:29588] mca: base: components_register: component posix register function successful
[sh-5-35.local:29589] mca: base: components_register: component mmap register function successful
[sh-5-35.local:29589] mca: base: components_register: found loaded component posix
[sh-5-35.local:29589] mca: base: components_register: component posix register function successful
[sh-5-35.local:29589] mca: base: components_register: found loaded component sysv
[sh-5-35.local:29589] mca: base: components_register: component sysv register function successful
[sh-5-35.local:29589] mca: base: components_open: opening shmem components
[sh-5-35.local:29589] mca: base: components_open: found loaded component mmap
[sh-5-35.local:29589] mca: base: components_open: component mmap open function successful
[sh-5-35.local:29589] mca: base: components_open: found loaded component posix
[sh-5-35.local:29589] mca: base: components_open: component posix open function successful
[sh-5-35.local:29589] mca: base: components_open: found loaded component sysv
[sh-5-35.local:29589] mca: base: components_open: component sysv open function successful
[sh-5-35.local:29589] shmem: base: runtime_query: Auto-selecting shmem components
[sh-5-35.local:29589] shmem: base: runtime_query: (shmem) Querying component (run-time) [mmap]
[sh-5-35.local:29589] shmem: base: runtime_query: (shmem) Query of component [mmap] set priority to 50
[sh-5-35.local:29589] shmem: base: runtime_query: (shmem) Querying component (run-time) [posix]
[sh-5-35.local:29589] shmem: base: runtime_query: (shmem) Query of component [posix] set priority to 40
[sh-5-35.local:29589] shmem: base: runtime_query: (shmem) Querying component (run-time) [sysv]
[sh-5-35.local:29589] shmem: base: runtime_query: (shmem) Query of component [sysv] set priority to 30
[sh-5-35.local:29589] shmem: base: runtime_query: (shmem) Selected component [mmap]
[sh-5-35.local:29589] mca: base: close: unloading component posix
[sh-5-35.local:29588] mca: base: components_register: found loaded component sysv
[sh-5-35.local:29588] mca: base: components_register: component sysv register function successful
[sh-5-35.local:29588] mca: base: components_open: opening shmem components
[sh-5-35.local:29589] mca: base: close: unloading component sysv
[sh-5-35.local:29588] mca: base: components_open: found loaded component mmap
[sh-5-35.local:29588] mca: base: components_open: component mmap open function successful
[sh-5-35.local:29588] mca: base: components_open: found loaded component posix
[sh-5-35.local:29588] mca: base: components_open: component posix open function successful
[sh-5-35.local:29588] mca: base: components_open: found loaded component sysv
[sh-5-35.local:29588] mca: base: components_open: component sysv open function successful
[sh-5-35.local:29588] shmem: base: runtime_query: Auto-selecting shmem components
[sh-5-35.local:29588] shmem: base: runtime_query: (shmem) Querying component (run-time) [mmap]
[sh-5-35.local:29588] shmem: base: runtime_query: (shmem) Query of component [mmap] set priority to 50
[sh-5-35.local:29588] shmem: base: runtime_query: (shmem) Querying component (run-time) [posix]
[sh-5-35.local:29588] shmem: base: runtime_query: (shmem) Query of component [posix] set priority to 40
[sh-5-35.local:29588] shmem: base: runtime_query: (shmem) Querying component (run-time) [sysv]
[sh-5-35.local:29588] shmem: base: runtime_query: (shmem) Query of component [sysv] set priority to 30
[sh-5-35.local:29588] shmem: base: runtime_query: (shmem) Selected component [mmap]
[sh-5-35.local:29588] mca: base: close: unloading component posix
[sh-5-35.local:29588] mca: base: close: unloading component sysv
srun: error: sh-5-35: task 0: Segmentation fault
They are 4.1MB:
# ps aux | grep send
kilian 30093 0.0 0.0 323444 4804 pts/0 Sl+ 09:10 0:00 srun gdb ./send_recv
kilian 30094 0.0 0.0 43380 784 pts/0 S+ 09:10 0:00 srun gdb ./send_recv
kilian 30108 0.4 0.0 215536 31900 ? S 09:10 0:00 /usr/bin/gdb ./send_recv
kilian 30109 0.4 0.0 212068 30864 ? S 09:10 0:00 /usr/bin/gdb ./send_recv
kilian 30117 0.1 0.0 323068 11080 ? TLl 09:10 0:00 /home/kilian/tests/mpi_send_recv/send_recv
kilian 30118 98.5 0.0 323024 11076 ? RLl 09:10 1:01 /home/kilian/tests/mpi_send_recv/send_recv
root 30433 0.0 0.0 105312 884 pts/1 S+ 09:11 0:00 grep send
# pmap 30117 | grep vader_seg
00007fffe6d73000 4100K rw-s- /tmp/openmpi-sessions-kilian@sh-5-35_0/16356/3/0/vader_segment.sh-5-35.0
# pmap 30118 | grep vader_seg
00007fffeed74000 4100K rw-s- /tmp/openmpi-sessions-kilian@sh-5-35_0/16356/3/1/vader_segment.sh-5-35.1
PS: Yeah, I can't complain. :) |
@kcgthb your config looks good Could you please run I also found that some errors are not correctly reported by vader and i will fix that on monday |
@kcgthb you do not see any filenames because ompi was not configure'd with --enable-debug |
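A debug-enabled rebuild along those lines might be configured roughly as follows. Only `--enable-debug` and `--with-pmi` come from this thread; the install prefix is a placeholder, and the snippet prints the command rather than running it.

```shell
# Placeholder install prefix (an assumption); keep it separate from the
# production install so the two builds do not overwrite each other.
prefix="$HOME/opt/openmpi-1.8.3-debug"

# --with-pmi and --enable-debug are the flags discussed in this thread.
configure_args="--prefix=$prefix --with-pmi --enable-debug"

# Print the command to run from the top of the Open MPI source tree.
echo "./configure $configure_args"
```

With a debug build in place, rerunning with `OMPI_MCA_shmem_base_verbose=255` should include the backing file names in the shmem output.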
Doesn't seem to be any
$ srun -N 1 -n 2 strace -f -e getpid ./send_recv
Process 24058 attached
Process 24059 attached
Process 24060 attached
Process 24061 attached
Process 24064 attached
Process 24065 attached
[pid 24052] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
[pid 24052] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
[pid 24061] +++ killed by SIGSEGV +++
[pid 24059] +++ killed by SIGSEGV +++
[pid 24065] +++ killed by SIGSEGV +++ |
@kcgthb i cannot reproduce the issue with a similar environment here is attached a patch for openmpi 1.8.3 OMPI_MCA_shmem_base_verbose=255 srun ./send_recv
diff -ruN orig/openmpi-1.8.3/opal/mca/shmem/mmap/shmem_mmap_module.c openmpi-1.8.3/opal/mca/shmem/mmap/shmem_mmap_module.c
@@ -417,9 +417,19 @@
|
the patch can be downloaded at https://gist.github.com/ggouaillardet/007876c338ba26ca9d7b |
@ggouaillardet still got pretty much the same segfault with the patch applied (at least the backtrace looks the same): $ OMPI_MCA_shmem_base_verbose=255 srun ./send_recv [sh-5-35.local:42511] mca: base: components_register: registering shmem components [sh-5-35.local:42512] mca: base: components_register: registering shmem components [sh-5-35.local:42512] mca: base: components_register: found loaded component mmap [sh-5-35.local:42512] mca: base: components_register: component mmap register function successful [sh-5-35.local:42512] mca: base: components_register: found loaded component posix [sh-5-35.local:42512] mca: base: components_register: component posix register function successful [sh-5-35.local:42512] mca: base: components_register: found loaded component sysv [sh-5-35.local:42512] mca: base: components_register: component sysv register function successful [sh-5-35.local:42512] mca: base: components_open: opening shmem components [sh-5-35.local:42512] mca: base: components_open: found loaded component mmap [sh-5-35.local:42512] mca: base: components_open: component mmap open function successful [sh-5-35.local:42512] mca: base: components_open: found loaded component posix [sh-5-35.local:42512] mca: base: components_open: component posix open function successful [sh-5-35.local:42512] mca: base: components_open: found loaded component sysv [sh-5-35.local:42512] mca: base: components_open: component sysv open function successful [sh-5-35.local:42512] shmem: base: runtime_query: Auto-selecting shmem components [sh-5-35.local:42512] shmem: base: runtime_query: (shmem) Querying component (run-time) [mmap] [sh-5-35.local:42512] shmem: base: runtime_query: (shmem) Query of component [mmap] set priority to 50 [sh-5-35.local:42512] shmem: base: runtime_query: (shmem) Querying component (run-time) [posix] [sh-5-35.local:42512] shmem: base: runtime_query: (shmem) Query of component [posix] set priority to 40 [sh-5-35.local:42512] shmem: base: runtime_query: (shmem) 
Querying component (run-time) [sysv] [sh-5-35.local:42512] shmem: base: runtime_query: (shmem) Query of component [sysv] set priority to 30 [sh-5-35.local:42512] shmem: base: runtime_query: (shmem) Selected component [mmap] [sh-5-35.local:42511] mca: base: components_register: found loaded component mmap [sh-5-35.local:42512] mca: base: close: unloading component posix [sh-5-35.local:42512] mca: base: close: unloading component sysv [sh-5-35.local:42511] mca: base: components_register: component mmap register function successful [sh-5-35.local:42511] mca: base: components_register: found loaded component posix [sh-5-35.local:42511] mca: base: components_register: component posix register function successful [sh-5-35.local:42511] mca: base: components_register: found loaded component sysv [sh-5-35.local:42511] mca: base: components_register: component sysv register function successful [sh-5-35.local:42511] mca: base: components_open: opening shmem components [sh-5-35.local:42511] mca: base: components_open: found loaded component mmap [sh-5-35.local:42511] mca: base: components_open: component mmap open function successful [sh-5-35.local:42511] mca: base: components_open: found loaded component posix [sh-5-35.local:42511] mca: base: components_open: component posix open function successful [sh-5-35.local:42511] mca: base: components_open: found loaded component sysv [sh-5-35.local:42511] mca: base: components_open: component sysv open function successful [sh-5-35.local:42511] shmem: base: runtime_query: Auto-selecting shmem components [sh-5-35.local:42511] shmem: base: runtime_query: (shmem) Querying component (run-time) [mmap] [sh-5-35.local:42511] shmem: base: runtime_query: (shmem) Query of component [mmap] set priority to 50 [sh-5-35.local:42511] shmem: base: runtime_query: (shmem) Querying component (run-time) [posix] [sh-5-35.local:42511] shmem: base: runtime_query: (shmem) Query of component [posix] set priority to 40 [sh-5-35.local:42511] shmem: base: 
runtime_query: (shmem) Querying component (run-time) [sysv] [sh-5-35.local:42511] shmem: base: runtime_query: (shmem) Query of component [sysv] set priority to 30 [sh-5-35.local:42511] shmem: base: runtime_query: (shmem) Selected component [mmap] [sh-5-35.local:42511] mca: base: close: unloading component posix [sh-5-35.local:42511] mca: base: close: unloading component sysv [sh-5-35.local:42511] shmem: mmap: shmem_ds_resetting (id: 0, size: 0, name: ) [sh-5-35.local:42511] shmem: mmap: backing store base directory: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_pool.sh-5-35 [sh-5-35.local:42511] shmem: mmap: create successful (id: 23, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_pool.sh-5-35) [sh-5-35.local:42511] segment_attach: my_pid=42511 seg_cpid=42511 [sh-5-35.local:42511] segment_attach: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_pool.sh-5-35 [sh-5-35.local:42511] shmem: mmap: attach successful (id: 23, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_pool.sh-5-35) [sh-5-35.local:42511] shmem: mmap: ds_copy complete from: (id: 23, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_pool.sh-5-35 flags: 0x01) to: (id: 23, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_pool.sh-5-35 flags: 0x01) [sh-5-35.local:42511] shmem: mmap: shmem_ds_resetting (id: 0, size: 0, name: ) [sh-5-35.local:42511] shmem: mmap: backing store base directory: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_btl_module.sh-5-35 [sh-5-35.local:42511] shmem: mmap: create successful (id: 23, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_btl_module.sh-5-35) [sh-5-35.local:42511] segment_attach: my_pid=42511 seg_cpid=42511 [sh-5-35.local:42511] segment_attach: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_btl_module.sh-5-35 [sh-5-35.local:42511] shmem: mmap: attach successful (id: 23, 
size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_btl_module.sh-5-35) [sh-5-35.local:42511] shmem: mmap: ds_copy complete from: (id: 23, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_btl_module.sh-5-35 flags: 0x01) to: (id: 23, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_btl_module.sh-5-35 flags: 0x01) [sh-5-35.local:42511] shmem: mmap: shmem_ds_resetting (id: 0, size: 0, name: ) [sh-5-35.local:42511] shmem: mmap: backing store base directory: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/0/vader_segment.sh-5-35.0 [sh-5-35.local:42511] shmem: mmap: create successful (id: 24, size: 4194312, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/0/vader_segment.sh-5-35.0) [sh-5-35.local:42511] segment_attach: my_pid=42511 seg_cpid=42511 [sh-5-35.local:42511] segment_attach: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/0/vader_segment.sh-5-35.0 [sh-5-35.local:42511] shmem: mmap: attach successful (id: 24, size: 4194312, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/0/vader_segment.sh-5-35.0) [sh-5-35.local:42512] shmem: mmap: shmem_ds_resetting (id: 0, size: 0, name: ) [sh-5-35.local:42512] shmem: mmap: backing store base directory: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/1/vader_segment.sh-5-35.1 [sh-5-35.local:42512] shmem: mmap: create successful (id: 24, size: 4194312, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/1/vader_segment.sh-5-35.1) [sh-5-35.local:42512] segment_attach: my_pid=42512 seg_cpid=42512 [sh-5-35.local:42512] segment_attach: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/1/vader_segment.sh-5-35.1 [sh-5-35.local:42512] shmem: mmap: attach successful (id: 24, size: 4194312, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/1/vader_segment.sh-5-35.1) [sh-5-35.local:42511] segment_attach: my_pid=42511 seg_cpid=42511 [sh-5-35.local:42511] segment_attach: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_pool.sh-5-35 
[sh-5-35.local:42511] shmem: mmap: attach successful (id: 23, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_pool.sh-5-35) [sh-5-35.local:42511] shmem: mmap: ds_copy complete from: (id: 23, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_pool.sh-5-35 flags: 0x01) to: (id: 23, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_pool.sh-5-35 flags: 0x01) [sh-5-35.local:42511] shmem: mmap: unlinking(id: 23, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_btl_module.sh-5-35) [sh-5-35.local:42511] shmem: mmap: unlinking(id: 23, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_pool.sh-5-35) [sh-5-35.local:42512] segment_attach: my_pid=42512 seg_cpid=42511 [sh-5-35.local:42512] segment_attach: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_pool.sh-5-35 [sh-5-35.local:42512] shmem: mmap: attach successful (id: 24, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_pool.sh-5-35) [sh-5-35.local:42512] shmem: mmap: ds_copy complete from: (id: 24, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_pool.sh-5-35 flags: 0x01) to: (id: 24, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_pool.sh-5-35 flags: 0x01) [sh-5-35.local:42512] segment_attach: my_pid=42512 seg_cpid=42511 [sh-5-35.local:42512] segment_attach: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_btl_module.sh-5-35 [sh-5-35.local:42512] shmem: mmap: attach successful (id: 29, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_btl_module.sh-5-35) [sh-5-35.local:42512] shmem: mmap: ds_copy complete from: (id: 29, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_btl_module.sh-5-35 flags: 0x01) to: (id: 29, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/37920/4/shared_mem_btl_module.sh-5-35 flags: 
0x01)
srun: error: sh-5-35: task 0: Segmentation fault
Backtrace:
#0 0x00007ffff20788ea in mca_btl_vader_sendi () from /share/sw/free/openmpi/1.8.3/gcc/4.4/lib/openmpi/mca_btl_vader.so
#1 0x00007ffff1a4008f in mca_pml_ob1_send_inline () from /share/sw/free/openmpi/1.8.3/gcc/4.4/lib/openmpi/mca_pml_ob1.so
#2 0x00007ffff1a41031 in mca_pml_ob1_send () from /share/sw/free/openmpi/1.8.3/gcc/4.4/lib/openmpi/mca_pml_ob1.so
#3 0x00007ffff7b788ca in PMPI_Send () from /share/sw/free/openmpi/1.8.3/gcc/4.4/lib/libmpi.so.1
#4 0x00000000004009de in main () |
@kcgthb the patch did not contain any bug fix, the goal was to collect some more outputs. i was able to note that tasks do not mmap the vader_segment from the other task. could you please confirm i made a new patch (the previous one must be reverted before it can be applied) available at could you please apply it and run OMPI_MCA_btl_base_verbose=255 OMPI_MCA_shmem_base_verbose=255 OMPI_MCA_btl=sm,vader,self srun -N 1 -n 2 ./send_recv hopefully we will get a better picture of what went wrong |
You'll laugh: no. $ OMPI_MCA_btl=vader,self srun -N 1 -n 2 ./send_recv Process 1 received number -1 from process 0 And with your new patch (235.2), the results are: $ OMPI_MCA_btl_base_verbose=255 OMPI_MCA_shmem_base_verbose=255 OMPI_MCA_btl=sm,vader,self srun -N 1 -n 2 ./send_recv [sh-5-35.local:03853] mca: base: components_register: registering shmem components [sh-5-35.local:03852] mca: base: components_register: registering shmem components [sh-5-35.local:03852] mca: base: components_register: found loaded component mmap [sh-5-35.local:03852] mca: base: components_register: component mmap register function successful [sh-5-35.local:03852] mca: base: components_register: found loaded component posix [sh-5-35.local:03853] mca: base: components_register: found loaded component mmap [sh-5-35.local:03852] mca: base: components_register: component posix register function successful [sh-5-35.local:03852] mca: base: components_register: found loaded component sysv [sh-5-35.local:03853] mca: base: components_register: component mmap register function successful [sh-5-35.local:03853] mca: base: components_register: found loaded component posix [sh-5-35.local:03853] mca: base: components_register: component posix register function successful [sh-5-35.local:03853] mca: base: components_register: found loaded component sysv [sh-5-35.local:03853] mca: base: components_register: component sysv register function successful [sh-5-35.local:03853] mca: base: components_open: opening shmem components [sh-5-35.local:03853] mca: base: components_open: found loaded component mmap [sh-5-35.local:03853] mca: base: components_open: component mmap open function successful [sh-5-35.local:03853] mca: base: components_open: found loaded component posix [sh-5-35.local:03853] mca: base: components_open: component posix open function successful [sh-5-35.local:03853] mca: base: components_open: found loaded component sysv [sh-5-35.local:03853] mca: base: components_open: component sysv open 
function successful [sh-5-35.local:03853] shmem: base: runtime_query: Auto-selecting shmem components [sh-5-35.local:03853] shmem: base: runtime_query: (shmem) Querying component (run-time) [mmap] [sh-5-35.local:03853] shmem: base: runtime_query: (shmem) Query of component [mmap] set priority to 50 [sh-5-35.local:03853] shmem: base: runtime_query: (shmem) Querying component (run-time) [posix] [sh-5-35.local:03853] shmem: base: runtime_query: (shmem) Query of component [posix] set priority to 40 [sh-5-35.local:03853] shmem: base: runtime_query: (shmem) Querying component (run-time) [sysv] [sh-5-35.local:03853] shmem: base: runtime_query: (shmem) Query of component [sysv] set priority to 30 [sh-5-35.local:03853] shmem: base: runtime_query: (shmem) Selected component [mmap] [sh-5-35.local:03853] mca: base: close: unloading component posix [sh-5-35.local:03852] mca: base: components_register: component sysv register function successful [sh-5-35.local:03852] mca: base: components_open: opening shmem components [sh-5-35.local:03853] mca: base: close: unloading component sysv [sh-5-35.local:03852] mca: base: components_open: found loaded component mmap [sh-5-35.local:03852] mca: base: components_open: component mmap open function successful [sh-5-35.local:03852] mca: base: components_open: found loaded component posix [sh-5-35.local:03852] mca: base: components_open: component posix open function successful [sh-5-35.local:03852] mca: base: components_open: found loaded component sysv [sh-5-35.local:03852] mca: base: components_open: component sysv open function successful [sh-5-35.local:03852] shmem: base: runtime_query: Auto-selecting shmem components [sh-5-35.local:03852] shmem: base: runtime_query: (shmem) Querying component (run-time) [mmap] [sh-5-35.local:03852] shmem: base: runtime_query: (shmem) Query of component [mmap] set priority to 50 [sh-5-35.local:03852] shmem: base: runtime_query: (shmem) Querying component (run-time) [posix] [sh-5-35.local:03852] shmem: 
base: runtime_query: (shmem) Query of component [posix] set priority to 40 [sh-5-35.local:03852] shmem: base: runtime_query: (shmem) Querying component (run-time) [sysv] [sh-5-35.local:03852] shmem: base: runtime_query: (shmem) Query of component [sysv] set priority to 30 [sh-5-35.local:03852] shmem: base: runtime_query: (shmem) Selected component [mmap] [sh-5-35.local:03852] mca: base: close: unloading component posix [sh-5-35.local:03852] mca: base: close: unloading component sysv [sh-5-35.local:03852] mca: base: components_register: registering btl components [sh-5-35.local:03852] mca: base: components_register: found loaded component self [sh-5-35.local:03853] mca: base: components_register: registering btl components [sh-5-35.local:03852] mca: base: components_register: component self register function successful [sh-5-35.local:03852] mca: base: components_register: found loaded component sm [sh-5-35.local:03852] mca: base: components_register: component sm register function successful [sh-5-35.local:03852] mca: base: components_register: found loaded component vader [sh-5-35.local:03852] mca: base: components_register: component vader register function successful [sh-5-35.local:03852] mca: base: components_open: opening btl components [sh-5-35.local:03852] mca: base: components_open: found loaded component self [sh-5-35.local:03852] mca: base: components_open: component self open function successful [sh-5-35.local:03852] mca: base: components_open: found loaded component sm [sh-5-35.local:03852] mca: base: components_open: component sm open function successful [sh-5-35.local:03852] mca: base: components_open: found loaded component vader [sh-5-35.local:03852] mca: base: components_open: component vader open function successful [sh-5-35.local:03853] mca: base: components_register: found loaded component self [sh-5-35.local:03853] mca: base: components_register: component self register function successful [sh-5-35.local:03853] mca: base: components_register: 
found loaded component sm [sh-5-35.local:03853] mca: base: components_register: component sm register function successful [sh-5-35.local:03853] mca: base: components_register: found loaded component vader [sh-5-35.local:03853] mca: base: components_register: component vader register function successful [sh-5-35.local:03853] mca: base: components_open: opening btl components [sh-5-35.local:03853] mca: base: components_open: found loaded component self [sh-5-35.local:03853] mca: base: components_open: component self open function successful [sh-5-35.local:03853] mca: base: components_open: found loaded component sm [sh-5-35.local:03853] mca: base: components_open: component sm open function successful [sh-5-35.local:03853] mca: base: components_open: found loaded component vader [sh-5-35.local:03853] mca: base: components_open: component vader open function successful [sh-5-35.local:03853] select: initializing btl component self [sh-5-35.local:03852] select: initializing btl component self [sh-5-35.local:03852] select: init of component self returned success [sh-5-35.local:03852] select: initializing btl component sm [sh-5-35.local:03852] shmem: mmap: shmem_ds_resetting (id: 0, size: 0, name: ) [sh-5-35.local:03852] shmem: mmap: backing store base directory: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_pool.sh-5-35 [sh-5-35.local:03853] select: init of component self returned success [sh-5-35.local:03853] select: initializing btl component sm [sh-5-35.local:03853] select: init of component sm returned success [sh-5-35.local:03853] select: initializing btl component vader [sh-5-35.local:03853] shmem: mmap: shmem_ds_resetting (id: 0, size: 0, name: ) [sh-5-35.local:03853] shmem: mmap: backing store base directory: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/1/vader_segment.sh-5-35.1 [sh-5-35.local:03852] shmem: mmap: create successful (id: 13, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_pool.sh-5-35) 
[sh-5-35.local:03853] shmem: mmap: create successful (id: 12, size: 4194312, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/1/vader_segment.sh-5-35.1) [sh-5-35.local:03853] segment_attach: my_pid=3853 seg_cpid=3853 [sh-5-35.local:03853] segment_attach: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/1/vader_segment.sh-5-35.1 [sh-5-35.local:03853] shmem: mmap: attach successful (id: 12, size: 4194312, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/1/vader_segment.sh-5-35.1) [sh-5-35.local:03853] select: init of component vader returned success [sh-5-35.local:03852] segment_attach: my_pid=3852 seg_cpid=3852 [sh-5-35.local:03852] segment_attach: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_pool.sh-5-35 [sh-5-35.local:03852] shmem: mmap: attach successful (id: 13, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_pool.sh-5-35) [sh-5-35.local:03852] shmem: mmap: ds_copy complete from: (id: 13, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_pool.sh-5-35 flags: 0x01) to: (id: 13, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_pool.sh-5-35 flags: 0x01) [sh-5-35.local:03852] shmem: mmap: shmem_ds_resetting (id: 0, size: 0, name: ) [sh-5-35.local:03852] shmem: mmap: backing store base directory: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_btl_module.sh-5-35 [sh-5-35.local:03852] shmem: mmap: create successful (id: 13, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_btl_module.sh-5-35) [sh-5-35.local:03852] segment_attach: my_pid=3852 seg_cpid=3852 [sh-5-35.local:03852] segment_attach: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_btl_module.sh-5-35 [sh-5-35.local:03852] shmem: mmap: attach successful (id: 13, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_btl_module.sh-5-35) [sh-5-35.local:03852] shmem: mmap: ds_copy complete from: (id: 13, size: 140, name: 
/tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_btl_module.sh-5-35 flags: 0x01) to: (id: 13, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_btl_module.sh-5-35 flags: 0x01) [sh-5-35.local:03852] select: init of component sm returned success [sh-5-35.local:03852] select: initializing btl component vader [sh-5-35.local:03852] shmem: mmap: shmem_ds_resetting (id: 0, size: 0, name: ) [sh-5-35.local:03852] shmem: mmap: backing store base directory: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/0/vader_segment.sh-5-35.0 [sh-5-35.local:03852] shmem: mmap: create successful (id: 13, size: 4194312, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/0/vader_segment.sh-5-35.0) [sh-5-35.local:03852] segment_attach: my_pid=3852 seg_cpid=3852 [sh-5-35.local:03852] segment_attach: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/0/vader_segment.sh-5-35.0 [sh-5-35.local:03852] shmem: mmap: attach successful (id: 13, size: 4194312, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/0/vader_segment.sh-5-35.0) [sh-5-35.local:03852] select: init of component vader returned success [sh-5-35.local:03852] mca_bml_r2_add_procs 0/3 calling add_procs with btl self [sh-5-35.local:03852] mca: bml: Using self btl to [[21711,9],0] on node sh-5-35 [sh-5-35.local:03852] mca_bml_r2_add_procs 1/3 calling add_procs with btl vader [sh-5-35.local:03852] vader_add_procs(nprocs=2) [sh-5-35.local:03852] vader_add_procs: init_vader_endpoint(proc=0, local_rank=0) => 0 [sh-5-35.local:03852] init_vader_endpoint: modex recv success, seg_name=/tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/1/vader_segment.sh-5-35.1 [sh-5-35.local:03852] segment_attach: my_pid=3852 seg_cpid=3853 [sh-5-35.local:03852] segment_attach: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/1/vader_segment.sh-5-35.1 [sh-5-35.local:03852] shmem: mmap: attach successful (id: 13, size: 4194312, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/1/vader_segment.sh-5-35.1) [sh-5-35.local:03852] 
vader_add_procs: init_vader_endpoint(proc=1, local_rank=1) => 0 [sh-5-35.local:03852] mca: bml: Using vader btl to [[21711,9],1] on node sh-5-35 [sh-5-35.local:03852] mca_bml_r2_add_procs 2/3 calling add_procs with btl sm [sh-5-35.local:03852] segment_attach: my_pid=3852 seg_cpid=3852 [sh-5-35.local:03852] segment_attach: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_pool.sh-5-35 [sh-5-35.local:03852] shmem: mmap: attach successful (id: 13, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_pool.sh-5-35) [sh-5-35.local:03852] shmem: mmap: ds_copy complete from: (id: 13, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_pool.sh-5-35 flags: 0x01) to: (id: 13, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_pool.sh-5-35 flags: 0x01) [sh-5-35.local:03852] shmem: mmap: unlinking(id: 13, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_btl_module.sh-5-35) [sh-5-35.local:03852] shmem: mmap: unlinking(id: 13, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_pool.sh-5-35) [sh-5-35.local:03853] mca_bml_r2_add_procs 0/3 calling add_procs with btl self [sh-5-35.local:03853] mca: bml: Using self btl to [[21711,9],1] on node sh-5-35 [sh-5-35.local:03853] mca_bml_r2_add_procs 1/3 calling add_procs with btl vader [sh-5-35.local:03853] vader_add_procs(nprocs=2) [sh-5-35.local:03853] init_vader_endpoint: modex recv success, seg_name=/tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/0/vader_segment.sh-5-35.0 [sh-5-35.local:03853] segment_attach: my_pid=3853 seg_cpid=3852 [sh-5-35.local:03853] segment_attach: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/0/vader_segment.sh-5-35.0 [sh-5-35.local:03853] shmem: mmap: attach successful (id: 12, size: 4194312, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/0/vader_segment.sh-5-35.0) [sh-5-35.local:03853] vader_add_procs: init_vader_endpoint(proc=0, local_rank=0) => 0 
[sh-5-35.local:03853] vader_add_procs: init_vader_endpoint(proc=1, local_rank=1) => 0 [sh-5-35.local:03853] mca: bml: Using vader btl to [[21711,9],0] on node sh-5-35 [sh-5-35.local:03853] mca_bml_r2_add_procs 2/3 calling add_procs with btl sm [sh-5-35.local:03853] segment_attach: my_pid=3853 seg_cpid=3852 [sh-5-35.local:03853] segment_attach: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_pool.sh-5-35 [sh-5-35.local:03853] shmem: mmap: attach successful (id: 12, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_pool.sh-5-35) [sh-5-35.local:03853] shmem: mmap: ds_copy complete from: (id: 12, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_pool.sh-5-35 flags: 0x01) to: (id: 12, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_pool.sh-5-35 flags: 0x01) [sh-5-35.local:03853] segment_attach: my_pid=3853 seg_cpid=3852 [sh-5-35.local:03853] segment_attach: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_btl_module.sh-5-35 [sh-5-35.local:03853] shmem: mmap: attach successful (id: 14, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_btl_module.sh-5-35) [sh-5-35.local:03853] shmem: mmap: ds_copy complete from: (id: 14, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_btl_module.sh-5-35 flags: 0x01) to: (id: 14, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_btl_module.sh-5-35 flags: 0x01) [sh-5-35.local:03853] mca: bml: Not using sm btl to [[21711,9],0] on node sh-5-35 because vader btl has higher exclusivity (65536 > 65535) [sh-5-35.local:03852] mca: bml: Not using sm btl to [[21711,9],1] on node sh-5-35 because vader btl has higher exclusivity (65536 > 65535) [sh-5-35.local:03852] shmem: mmap: detaching (id: 13, size: 4194312, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/1/vader_segment.sh-5-35.1) [sh-5-35.local:03852] shmem: mmap: shmem_ds_resetting (id: 13, size: 
4194312, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/1/vader_segment.sh-5-35.1) [sh-5-35.local:03852] shmem: mmap: unlinking(id: 13, size: 4194312, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/0/vader_segment.sh-5-35.0) [sh-5-35.local:03852] shmem: mmap: detaching (id: -1, size: 4194312, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/0/vader_segment.sh-5-35.0) [sh-5-35.local:03852] shmem: mmap: shmem_ds_resetting (id: -1, size: 4194312, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/0/vader_segment.sh-5-35.0) [sh-5-35.local:03852] mca: base: close: component self closed [sh-5-35.local:03852] mca: base: close: unloading component self [sh-5-35.local:03852] shmem: mmap: detaching (id: -1, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_btl_module.sh-5-35) [sh-5-35.local:03852] shmem: mmap: shmem_ds_resetting (id: -1, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_btl_module.sh-5-35) [sh-5-35.local:03852] mca: base: close: component sm closed [sh-5-35.local:03852] mca: base: close: unloading component sm [sh-5-35.local:03852] mca: base: close: component vader closed [sh-5-35.local:03852] mca: base: close: unloading component vader [sh-5-35.local:03852] shmem: mmap: detaching (id: -1, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_pool.sh-5-35) [sh-5-35.local:03852] shmem: mmap: shmem_ds_resetting (id: -1, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_pool.sh-5-35) [sh-5-35.local:03853] shmem: mmap: detaching (id: 12, size: 4194312, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/0/vader_segment.sh-5-35.0) [sh-5-35.local:03853] shmem: mmap: shmem_ds_resetting (id: 12, size: 4194312, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/0/vader_segment.sh-5-35.0) [sh-5-35.local:03853] shmem: mmap: unlinking(id: 12, size: 4194312, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/1/vader_segment.sh-5-35.1) 
[sh-5-35.local:03853] shmem: mmap: detaching (id: -1, size: 4194312, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/1/vader_segment.sh-5-35.1) [sh-5-35.local:03853] shmem: mmap: shmem_ds_resetting (id: -1, size: 4194312, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/1/vader_segment.sh-5-35.1) [sh-5-35.local:03853] mca: base: close: component self closed [sh-5-35.local:03853] mca: base: close: unloading component self [sh-5-35.local:03853] shmem: mmap: detaching (id: 14, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_btl_module.sh-5-35) [sh-5-35.local:03853] shmem: mmap: shmem_ds_resetting (id: 14, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_btl_module.sh-5-35) [sh-5-35.local:03853] mca: base: close: component sm closed [sh-5-35.local:03853] mca: base: close: unloading component sm [sh-5-35.local:03853] mca: base: close: component vader closed [sh-5-35.local:03853] mca: base: close: unloading component vader [sh-5-35.local:03853] shmem: mmap: detaching (id: 12, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_pool.sh-5-35) [sh-5-35.local:03853] shmem: mmap: shmem_ds_resetting (id: 12, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/9/shared_mem_pool.sh-5-35) [sh-5-35.local:03853] mca: base: close: component mmap closed [sh-5-35.local:03852] mca: base: close: component mmap closed [sh-5-35.local:03852] mca: base: close: unloading component mmap [sh-5-35.local:03853] mca: base: close: unloading component mmap Process 1 received number -1 from process 0 |
And can you now run
OMPI_MCA_btl_base_verbose=255 OMPI_MCA_shmem_base_verbose=255 srun -N 1 -n 2 ./send_recv
(I "hope" a crash will occur...) |
@ggouaillardet Yeah! $ OMPI_MCA_btl_base_verbose=255 OMPI_MCA_shmem_base_verbose=255 srun -N 1 -n 2 ./send_recv [sh-5-35.local:06353] mca: base: components_register: registering shmem components [sh-5-35.local:06353] mca: base: components_register: found loaded component mmap [sh-5-35.local:06353] mca: base: components_register: component mmap register function successful [sh-5-35.local:06353] mca: base: components_register: found loaded component posix [sh-5-35.local:06353] mca: base: components_register: component posix register function successful [sh-5-35.local:06353] mca: base: components_register: found loaded component sysv [sh-5-35.local:06353] mca: base: components_register: component sysv register function successful [sh-5-35.local:06353] mca: base: components_open: opening shmem components [sh-5-35.local:06353] mca: base: components_open: found loaded component mmap [sh-5-35.local:06353] mca: base: components_open: component mmap open function successful [sh-5-35.local:06353] mca: base: components_open: found loaded component posix [sh-5-35.local:06353] mca: base: components_open: component posix open function successful [sh-5-35.local:06353] mca: base: components_open: found loaded component sysv [sh-5-35.local:06353] mca: base: components_open: component sysv open function successful [sh-5-35.local:06353] shmem: base: runtime_query: Auto-selecting shmem components [sh-5-35.local:06353] shmem: base: runtime_query: (shmem) Querying component (run-time) [mmap] [sh-5-35.local:06353] shmem: base: runtime_query: (shmem) Query of component [mmap] set priority to 50 [sh-5-35.local:06353] shmem: base: runtime_query: (shmem) Querying component (run-time) [posix] [sh-5-35.local:06353] shmem: base: runtime_query: (shmem) Query of component [posix] set priority to 40 [sh-5-35.local:06353] shmem: base: runtime_query: (shmem) Querying component (run-time) [sysv] [sh-5-35.local:06354] mca: base: components_register: registering shmem components [sh-5-35.local:06354] 
mca: base: components_register: found loaded component mmap [sh-5-35.local:06354] mca: base: components_register: component mmap register function successful [sh-5-35.local:06354] mca: base: components_register: found loaded component posix [sh-5-35.local:06354] mca: base: components_register: component posix register function successful [sh-5-35.local:06354] mca: base: components_register: found loaded component sysv [sh-5-35.local:06354] mca: base: components_register: component sysv register function successful [sh-5-35.local:06354] mca: base: components_open: opening shmem components [sh-5-35.local:06354] mca: base: components_open: found loaded component mmap [sh-5-35.local:06354] mca: base: components_open: component mmap open function successful [sh-5-35.local:06354] mca: base: components_open: found loaded component posix [sh-5-35.local:06354] mca: base: components_open: component posix open function successful [sh-5-35.local:06354] mca: base: components_open: found loaded component sysv [sh-5-35.local:06354] mca: base: components_open: component sysv open function successful [sh-5-35.local:06354] shmem: base: runtime_query: Auto-selecting shmem components [sh-5-35.local:06354] shmem: base: runtime_query: (shmem) Querying component (run-time) [mmap] [sh-5-35.local:06354] shmem: base: runtime_query: (shmem) Query of component [mmap] set priority to 50 [sh-5-35.local:06354] shmem: base: runtime_query: (shmem) Querying component (run-time) [posix] [sh-5-35.local:06354] shmem: base: runtime_query: (shmem) Query of component [posix] set priority to 40 [sh-5-35.local:06354] shmem: base: runtime_query: (shmem) Querying component (run-time) [sysv] [sh-5-35.local:06353] shmem: base: runtime_query: (shmem) Query of component [sysv] set priority to 30 [sh-5-35.local:06353] shmem: base: runtime_query: (shmem) Selected component [mmap] [sh-5-35.local:06353] mca: base: close: unloading component posix [sh-5-35.local:06353] mca: base: close: unloading component sysv 
[sh-5-35.local:06354] shmem: base: runtime_query: (shmem) Query of component [sysv] set priority to 30 [sh-5-35.local:06354] shmem: base: runtime_query: (shmem) Selected component [mmap] [sh-5-35.local:06354] mca: base: close: unloading component posix [sh-5-35.local:06354] mca: base: close: unloading component sysv [sh-5-35.local:06353] mca: base: components_register: registering btl components [sh-5-35.local:06353] mca: base: components_register: found loaded component openib [sh-5-35.local:06354] mca: base: components_register: registering btl components [sh-5-35.local:06354] mca: base: components_register: found loaded component openib [sh-5-35.local:06353] mca: base: components_register: component openib register function successful [sh-5-35.local:06354] mca: base: components_register: component openib register function successful [sh-5-35.local:06354] mca: base: components_register: found loaded component self [sh-5-35.local:06354] mca: base: components_register: component self register function successful [sh-5-35.local:06353] mca: base: components_register: found loaded component self [sh-5-35.local:06354] mca: base: components_register: found loaded component sm [sh-5-35.local:06353] mca: base: components_register: component self register function successful [sh-5-35.local:06354] mca: base: components_register: component sm register function successful [sh-5-35.local:06354] mca: base: components_register: found loaded component tcp [sh-5-35.local:06353] mca: base: components_register: found loaded component sm [sh-5-35.local:06354] mca: base: components_register: component tcp register function successful [sh-5-35.local:06354] mca: base: components_register: found loaded component usnic [sh-5-35.local:06354] mca: base: components_register: component usnic register function successful [sh-5-35.local:06353] mca: base: components_register: component sm register function successful [sh-5-35.local:06354] mca: base: components_register: found loaded component 
vader [sh-5-35.local:06353] mca: base: components_register: found loaded component tcp [sh-5-35.local:06354] mca: base: components_register: component vader register function successful [sh-5-35.local:06354] mca: base: components_open: opening btl components [sh-5-35.local:06354] mca: base: components_open: found loaded component openib [sh-5-35.local:06354] mca: base: components_open: component openib open function successful [sh-5-35.local:06354] mca: base: components_open: found loaded component self [sh-5-35.local:06354] mca: base: components_open: component self open function successful [sh-5-35.local:06354] mca: base: components_open: found loaded component sm [sh-5-35.local:06353] mca: base: components_register: component tcp register function successful [sh-5-35.local:06354] mca: base: components_open: component sm open function successful [sh-5-35.local:06353] mca: base: components_register: found loaded component usnic [sh-5-35.local:06354] mca: base: components_open: found loaded component tcp [sh-5-35.local:06354] mca: base: components_open: component tcp open function successful [sh-5-35.local:06354] mca: base: components_open: found loaded component usnic [sh-5-35.local:06354] mca: base: components_open: component usnic open function successful [sh-5-35.local:06354] mca: base: components_open: found loaded component vader [sh-5-35.local:06354] mca: base: components_open: component vader open function successful [sh-5-35.local:06353] mca: base: components_register: component usnic register function successful [sh-5-35.local:06353] mca: base: components_register: found loaded component vader [sh-5-35.local:06353] mca: base: components_register: component vader register function successful [sh-5-35.local:06353] mca: base: components_open: opening btl components [sh-5-35.local:06353] mca: base: components_open: found loaded component openib [sh-5-35.local:06353] mca: base: components_open: component openib open function successful [sh-5-35.local:06353] 
mca: base: components_open: found loaded component self [sh-5-35.local:06353] mca: base: components_open: component self open function successful [sh-5-35.local:06353] mca: base: components_open: found loaded component sm [sh-5-35.local:06353] mca: base: components_open: component sm open function successful [sh-5-35.local:06353] mca: base: components_open: found loaded component tcp [sh-5-35.local:06353] mca: base: components_open: component tcp open function successful [sh-5-35.local:06353] mca: base: components_open: found loaded component usnic [sh-5-35.local:06353] mca: base: components_open: component usnic open function successful [sh-5-35.local:06353] mca: base: components_open: found loaded component vader [sh-5-35.local:06353] mca: base: components_open: component vader open function successful [sh-5-35.local:06354] select: initializing btl component openib [sh-5-35.local:06353] select: initializing btl component openib [sh-5-35.local:06353] openib BTL: rdmacm CPC available for use on mlx4_0:1 [sh-5-35.local:06354] openib BTL: rdmacm CPC available for use on mlx4_0:1 [sh-5-35.local:06353] [rank=0] openib: using port mlx4_0:1 [sh-5-35.local:06353] select: init of component openib returned success [sh-5-35.local:06353] select: initializing btl component self [sh-5-35.local:06353] select: init of component self returned success [sh-5-35.local:06353] select: initializing btl component sm [sh-5-35.local:06353] shmem: mmap: shmem_ds_resetting (id: 0, size: 0, name: ) [sh-5-35.local:06353] shmem: mmap: backing store base directory: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_pool.sh-5-35 [sh-5-35.local:06353] shmem: mmap: create successful (id: 23, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_pool.sh-5-35) [sh-5-35.local:06353] segment_attach: my_pid=6353 seg_cpid=6353 [sh-5-35.local:06353] segment_attach: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_pool.sh-5-35 [sh-5-35.local:06353] shmem: mmap: 
attach successful (id: 23, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_pool.sh-5-35) [sh-5-35.local:06353] shmem: mmap: ds_copy complete from: (id: 23, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_pool.sh-5-35 flags: 0x01) to: (id: 23, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_pool.sh-5-35 flags: 0x01) [sh-5-35.local:06353] shmem: mmap: shmem_ds_resetting (id: 0, size: 0, name: ) [sh-5-35.local:06353] shmem: mmap: backing store base directory: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_btl_module.sh-5-35 [sh-5-35.local:06353] shmem: mmap: create successful (id: 23, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_btl_module.sh-5-35) [sh-5-35.local:06353] segment_attach: my_pid=6353 seg_cpid=6353 [sh-5-35.local:06353] segment_attach: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_btl_module.sh-5-35 [sh-5-35.local:06353] shmem: mmap: attach successful (id: 23, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_btl_module.sh-5-35) [sh-5-35.local:06353] shmem: mmap: ds_copy complete from: (id: 23, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_btl_module.sh-5-35 flags: 0x01) to: (id: 23, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_btl_module.sh-5-35 flags: 0x01) [sh-5-35.local:06353] select: init of component sm returned success [sh-5-35.local:06353] select: initializing btl component tcp [sh-5-35.local:06354] [rank=1] openib: using port mlx4_0:1 [sh-5-35.local:06354] select: init of component openib returned success [sh-5-35.local:06354] select: initializing btl component self [sh-5-35.local:06354] select: init of component self returned success [sh-5-35.local:06354] select: initializing btl component sm [sh-5-35.local:06354] select: init of component sm returned success [sh-5-35.local:06354] select: initializing btl 
component tcp [sh-5-35.local:06354] btl: tcp: Searching for exclude address+prefix: 127.0.0.1 / 8 [sh-5-35.local:06354] btl: tcp: Found match: 127.0.0.1 (lo) [sh-5-35.local:06354] select: init of component tcp returned success [sh-5-35.local:06354] select: initializing btl component usnic [sh-5-35.local:06354] found 1 verbs interface [sh-5-35.local:06354] examining verbs interface: mlx4_0 [sh-5-35.local:06353] btl: tcp: Searching for exclude address+prefix: 127.0.0.1 / 8 [sh-5-35.local:06353] btl: tcp: Found match: 127.0.0.1 (lo) [sh-5-35.local:06353] select: init of component tcp returned success [sh-5-35.local:06353] select: initializing btl component usnic [sh-5-35.local:06353] found 1 verbs interface [sh-5-35.local:06353] examining verbs interface: mlx4_0 [sh-5-35.local:06354] found 1 verbs interface [sh-5-35.local:06354] examining verbs interface: mlx4_0 [sh-5-35.local:06353] found 1 verbs interface [sh-5-35.local:06353] examining verbs interface: mlx4_0 [sh-5-35.local:06353] btl:usnic: no usNICs found [sh-5-35.local:06353] select: init of component usnic returned failure [sh-5-35.local:06353] mca: base: close: component usnic closed [sh-5-35.local:06353] mca: base: close: unloading component usnic [sh-5-35.local:06354] btl:usnic: no usNICs found [sh-5-35.local:06354] select: init of component usnic returned failure [sh-5-35.local:06354] mca: base: close: component usnic closed [sh-5-35.local:06354] mca: base: close: unloading component usnic [sh-5-35.local:06354] select: initializing btl component vader [sh-5-35.local:06354] shmem: mmap: shmem_ds_resetting (id: 0, size: 0, name: ) [sh-5-35.local:06354] shmem: mmap: backing store base directory: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/1/vader_segment.sh-5-35.1 [sh-5-35.local:06354] shmem: mmap: create successful (id: 24, size: 4194312, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/1/vader_segment.sh-5-35.1) [sh-5-35.local:06354] segment_attach: my_pid=6354 seg_cpid=6354 [sh-5-35.local:06354] 
segment_attach: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/1/vader_segment.sh-5-35.1 [sh-5-35.local:06354] shmem: mmap: attach successful (id: 24, size: 4194312, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/1/vader_segment.sh-5-35.1) [sh-5-35.local:06354] select: init of component vader returned success [sh-5-35.local:06353] select: initializing btl component vader [sh-5-35.local:06353] shmem: mmap: shmem_ds_resetting (id: 0, size: 0, name: ) [sh-5-35.local:06353] shmem: mmap: backing store base directory: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/0/vader_segment.sh-5-35.0 [sh-5-35.local:06353] shmem: mmap: create successful (id: 24, size: 4194312, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/0/vader_segment.sh-5-35.0) [sh-5-35.local:06353] segment_attach: my_pid=6353 seg_cpid=6353 [sh-5-35.local:06353] segment_attach: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/0/vader_segment.sh-5-35.0 [sh-5-35.local:06353] shmem: mmap: attach successful (id: 24, size: 4194312, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/0/vader_segment.sh-5-35.0) [sh-5-35.local:06353] select: init of component vader returned success [sh-5-35.local:06353] mca_bml_r2_add_procs 0/6 calling add_procs with btl self [sh-5-35.local:06354] mca_bml_r2_add_procs 0/6 calling add_procs with btl self [sh-5-35.local:06353] mca: bml: Using self btl to [[21711,10],0] on node sh-5-35 [sh-5-35.local:06353] mca_bml_r2_add_procs 1/6 calling add_procs with btl vader [sh-5-35.local:06353] vader_add_procs(nprocs=2) [sh-5-35.local:06354] mca: bml: Using self btl to [[21711,10],1] on node sh-5-35 [sh-5-35.local:06354] mca_bml_r2_add_procs 1/6 calling add_procs with btl vader [sh-5-35.local:06353] vader_add_procs: init_vader_endpoint(proc=0, local_rank=0) => 0 [sh-5-35.local:06354] vader_add_procs(nprocs=2) [sh-5-35.local:06353] init_vader_endpoint: modex recv failed -48 [sh-5-35.local:06353] vader_add_procs: init_vader_endpoint(proc=1, local_rank=1) => -48 [sh-5-35.local:06353] 
mca: bml: Using vader btl to [[21711,10],1] on node sh-5-35 [sh-5-35.local:06353] mca_bml_r2_add_procs 2/6 calling add_procs with btl sm [sh-5-35.local:06353] segment_attach: my_pid=6353 seg_cpid=6353 [sh-5-35.local:06353] segment_attach: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_pool.sh-5-35 [sh-5-35.local:06353] shmem: mmap: attach successful (id: 23, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_pool.sh-5-35) [sh-5-35.local:06353] shmem: mmap: ds_copy complete from: (id: 23, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_pool.sh-5-35 flags: 0x01) to: (id: 23, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_pool.sh-5-35 flags: 0x01) [sh-5-35.local:06354] init_vader_endpoint: modex recv failed -48 [sh-5-35.local:06354] vader_add_procs: init_vader_endpoint(proc=0, local_rank=0) => -48 [sh-5-35.local:06354] vader_add_procs: init_vader_endpoint(proc=1, local_rank=1) => 0 [sh-5-35.local:06354] mca: bml: Using vader btl to [[21711,10],0] on node sh-5-35 [sh-5-35.local:06354] mca_bml_r2_add_procs 2/6 calling add_procs with btl sm [sh-5-35.local:06354] segment_attach: my_pid=6354 seg_cpid=6353 [sh-5-35.local:06354] segment_attach: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_pool.sh-5-35 [sh-5-35.local:06354] shmem: mmap: attach successful (id: 24, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_pool.sh-5-35) [sh-5-35.local:06354] shmem: mmap: ds_copy complete from: (id: 24, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_pool.sh-5-35 flags: 0x01) to: (id: 24, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_pool.sh-5-35 flags: 0x01) [sh-5-35.local:06354] segment_attach: my_pid=6354 seg_cpid=6353 [sh-5-35.local:06354] segment_attach: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_btl_module.sh-5-35 [sh-5-35.local:06354] shmem: 
mmap: attach successful (id: 29, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_btl_module.sh-5-35) [sh-5-35.local:06354] shmem: mmap: ds_copy complete from: (id: 29, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_btl_module.sh-5-35 flags: 0x01) to: (id: 29, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_btl_module.sh-5-35 flags: 0x01) [sh-5-35.local:06353] shmem: mmap: unlinking(id: 23, size: 140, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_btl_module.sh-5-35) [sh-5-35.local:06354] mca: bml: Not using sm btl to [[21711,10],0] on node sh-5-35 because vader btl has higher exclusivity (65536 > 65535) [sh-5-35.local:06353] shmem: mmap: unlinking(id: 23, size: 134217736, name: /tmp/openmpi-sessions-kilian@sh-5-35_0/21711/10/shared_mem_pool.sh-5-35) [sh-5-35.local:06354] mca_bml_r2_add_procs 3/6 calling add_procs with btl openib [sh-5-35.local:06353] mca: bml: Not using sm btl to [[21711,10],1] on node sh-5-35 because vader btl has higher exclusivity (65536 > 65535) [sh-5-35.local:06353] mca_bml_r2_add_procs 3/6 calling add_procs with btl openib [sh-5-35.local:06353] mca: bml: Not using openib btl to [[21711,10],1] on node sh-5-35 because vader btl has higher exclusivity (65536 > 1024) [sh-5-35.local:06353] mca_bml_r2_add_procs 4/6 calling add_procs with btl tcp [sh-5-35.local:06354] mca: bml: Not using openib btl to [[21711,10],0] on node sh-5-35 because vader btl has higher exclusivity (65536 > 1024) [sh-5-35.local:06354] mca_bml_r2_add_procs 4/6 calling add_procs with btl tcp [sh-5-35.local:06354] mca: bml: Not using tcp btl to [[21711,10],0] on node sh-5-35 because vader btl has higher exclusivity (65536 > 100) [sh-5-35.local:06354] mca_bml_r2_add_procs 5/6 calling add_procs with btl tcp [sh-5-35.local:06354] mca: bml: Not using tcp btl to [[21711,10],0] on node sh-5-35 because vader btl has higher exclusivity (65536 > 100) [sh-5-35.local:06353] mca: bml: Not 
using tcp btl to [[21711,10],1] on node sh-5-35 because vader btl has higher exclusivity (65536 > 100) [sh-5-35.local:06353] mca_bml_r2_add_procs 5/6 calling add_procs with btl tcp [sh-5-35.local:06353] mca: bml: Not using tcp btl to [[21711,10],1] on node sh-5-35 because vader btl has higher exclusivity (65536 > 100) srun: error: sh-5-35: task 0: Segmentation fault |
@kcgthb can you please run the following three commands, and attach the logs only if a crash occurs:
OMPI_MCA_btl_base_verbose=255 OMPI_MCA_shmem_base_verbose=255 OMPI_MCA_btl=^usnic srun -N 1 -n 2 ./send_recv
OMPI_MCA_btl_base_verbose=255 OMPI_MCA_shmem_base_verbose=255 OMPI_MCA_btl=^openib srun -N 1 -n 2 ./send_recv
OMPI_MCA_btl_base_verbose=255 OMPI_MCA_shmem_base_verbose=255 OMPI_MCA_btl=^tcp srun -N 1 -n 2 ./send_recv |
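The request above (run once per excluded BTL, keep the logs only when something crashes) could be scripted roughly as follows. This is only a sketch: the function name try_without_btl, the log file names, and the LAUNCH override are hypothetical, and it assumes the launcher exits non-zero when a task segfaults.

```shell
# Run one launch with a single BTL excluded; keep the log only on failure.
# LAUNCH is a hypothetical override; by default the srun line from the
# thread is used.
try_without_btl() {
    btl="$1"
    log="exclude_${btl}.log"
    if OMPI_MCA_btl_base_verbose=255 \
       OMPI_MCA_shmem_base_verbose=255 \
       OMPI_MCA_btl="^${btl}" \
       ${LAUNCH:-srun -N 1 -n 2 ./send_recv} >"$log" 2>&1; then
        # Clean run: the log is not needed.
        rm -f "$log"
        echo "${btl} excluded: ok"
    else
        # Non-zero exit (for example a segfault): keep the log to attach.
        echo "${btl} excluded: failed, log kept in ${log}"
    fi
}
```

Calling try_without_btl once each for usnic, openib, and tcp reproduces the three runs; whichever exclusion makes the failure disappear points at the BTL involved.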
@kcgthb if you do not have any usnic adapter, the simplest workaround is to disable the usnic BTL. I am now looking for the root cause of the issue |
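The exact workaround command did not survive in the comment above. A minimal sketch, assuming it amounts to excluding the usnic BTL with the same OMPI_MCA_btl=^usnic setting used in the test commands earlier in the thread:

```shell
# Exclude the usnic BTL for every Open MPI process started from this shell.
# The leading ^ means "use all BTLs except the ones listed".
export OMPI_MCA_btl='^usnic'

# Direct launch as in the thread; skipped when srun or the test binary
# is not available on this machine.
if command -v srun >/dev/null 2>&1 && [ -x ./send_recv ]; then
    srun -N 1 -n 2 ./send_recv
fi
```

Setting the variable in the job script or shell profile keeps the exclusion in effect for direct launches with srun until a fixed Open MPI build is installed.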
This has been fixed on master by commit 7508c6f. If the usnic BTL is used on a system where no usnic is detected, a NULL modex object is sent. @hjelmn the vader BTL is not the root cause of the crash, even if some errors were not correctly reported to the upper layer |
@ggouaillardet I can confirm that patch 42 (commit 7508c6f) fixes the problem. No segfault anymore. |
Hi,
One of our users reported a segfault that appears under specific circumstances.
When using Slurm (14.03.8) and srun to launch an OpenSees job (http://opensees.berkeley.edu/index.php, compiled and executed with Open MPI 1.8.2) with 2 or more tasks, the execution fails with a segmentation fault that seems to occur in the vader BTL. I have no idea why it's being used; we're running an x86 Linux cluster on Red Hat 6.5.
The reproducer is as follows:
The stack is:
If I run only 1 task (srun -n1 OpenSeesMP), no segfault occurs.
If I use mpirun instead of srun, no segfault.
If I move aside mca_btl_vader.so, no segfault either.
So I have 2 questions:
Thanks!