libnbc nonuniform type failures in ibcast, iallgather(v)

When testing `coll/libnbc` with nonuniform data types three of the algorithms failed. I'll post a link to the test case in the comments. The unit test only works with `-np 5` some of the tests pass fine with `-np 4`.
- `ibcast` Falls into an infinite error loop in `libnbc`
- `iallgather` and `iallgatherv` produce wrong answers. Might be the same underlying problem.
### `ibcast` Failure

**Note**: This test will enter an infinite loop displaying `MPI Error in MPI_Testall() (18)` until PR #2245 is resolved.

``` shell
shell$ mpirun -np 5 --mca coll ^hcoll  ./test-nbc-dt 0 
 0 /  5) Running MPI_Ibcast...
 1 /  5) Running MPI_Ibcast...
 2 /  5) Running MPI_Ibcast...
 3 /  5) Running MPI_Ibcast...
 4 /  5) Running MPI_Ibcast...
MPI Error in MPI_Testall() (18)
MPI Error in MPI_Testall() (18)
...
```
### `iallgather` Failure

``` shell
shell$ mpirun -np 5 --mca coll ^hcoll  ./test-nbc-dt 1
 0 /  5) Running MPI_Iallgather...
 1 /  5) Running MPI_Iallgather...
 2 /  5) Running MPI_Iallgather...
 3 /  5) Running MPI_Iallgather...
 4 /  5) Running MPI_Iallgather...
 3 /  5) buf[1] : actual 2 [2], expected 1 [1]. from 0
 3 /  5) Failed in chkbuf. err = -1
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
with errorcode -2.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
 2 /  5) buf[1] : actual 2 [2], expected 1 [1]. from 0
 2 /  5) Failed in chkbuf. err = -6
 1 /  5) buf[2] : actual 2 [2], expected 1 [1]. from 0
 1 /  5) Failed in chkbuf. err = -2
 4 /  5) buf[1] : actual 2 [2], expected 1 [1]. from 0
 4 /  5) Failed in chkbuf. err = -6
```
### `iallgatherv` Failure

``` shell
shell$ mpirun -np 5 --mca coll ^hcoll  ./test-nbc-dt 2
 0 /  5) Running MPI_Iallgatherv...
 1 /  5) Running MPI_Iallgatherv...
 2 /  5) Running MPI_Iallgatherv...
 3 /  5) Running MPI_Iallgatherv...
 4 /  5) Running MPI_Iallgatherv...
 0 /  5) buf[1] : actual 3 [3], expected 2 [2]. from 1
 0 /  5) Failed in chkbuf. err = -6
 3 /  5) buf[1] : actual 3 [3], expected 2 [2]. from 1
 3 /  5) Failed in chkbuf. err = -6
 4 /  5) buf[1] : actual 3 [3], expected 2 [2]. from 1
 4 /  5) Failed in chkbuf. err = -6
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
with errorcode -2.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
 2 /  5) buf[1] : actual 3 [3], expected 2 [2]. from 1
 2 /  5) Failed in chkbuf. err = -6
```


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

libnbc nonuniform type failures in ibcast, iallgather(v) #2256

`ibcast` Failure

`iallgather` Failure

`iallgatherv` Failure

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

libnbc nonuniform type failures in ibcast, iallgather(v) #2256

Description

ibcast Failure

iallgather Failure

iallgatherv Failure

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

`ibcast` Failure

`iallgather` Failure

`iallgatherv` Failure