@bwbarrett
Member

This patch series backports all the significant changes to the collectives components from master to v4.1.x. The remaining differences between master and v4.1.x are repository-wide code cleanups that weren't worth pulling in.

mkurnosov and others added 21 commits June 25, 2020 23:06
Calling MPI_Allgatherv with the sendbuf and sendtype parameters set to MPI_IN_PLACE and NULL, respectively, produces a segmentation fault.

The problem is that sendtype is used even when sendbuf is MPI_IN_PLACE. According to the standard, the sendtype and sendcount parameters should be ignored in this case.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit b45e190)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
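
To illustrate the MPI_Allgatherv fix described above, here is a minimal sketch in plain Python (not Open MPI code) of the guard the commit message implies: when the send buffer is the in-place marker, the send count and send type are never touched. The `IN_PLACE` sentinel and the function `effective_send_args` are hypothetical names chosen for this illustration.

```python
# Hypothetical stand-in for MPI_IN_PLACE; not the real MPI constant.
IN_PLACE = object()


def effective_send_args(sendbuf, sendcount, sendtype,
                        recvbuf, recvcounts, displs, recvtype, rank):
    """Return (buffer, offset, count, datatype) for this rank's contribution."""
    if sendbuf is IN_PLACE:
        # sendcount and sendtype are ignored: the contribution already
        # sits in recvbuf at this rank's displacement.
        return recvbuf, displs[rank], recvcounts[rank], recvtype
    # Regular case: the send arguments are used as given.
    return sendbuf, 0, sendcount, sendtype
```

With this shape, passing `None` as the send type together with `IN_PLACE` is harmless, because the in-place branch never reads it.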
always initialize 'size'.

Only the a2a_sched_diss() alltoall algorithm is impacted,
and this algo is currently unused, so there is no need
to backport nor update the NEWS file for now.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit ff48e92)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
(cherry picked from commit 466217f)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Gcc 8 identified hb_tree_csearch() as an infinite recursion, and it
turns out that we never call this function, anyway.  So just remove
it.

Fixes open-mpi#5670.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 06c1bf7)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Implements a recursive doubling algorithm for MPI_Iscan. The algorithm preserves the order of operations, so it can be used with both commutative and non-commutative operations.

The MCA parameter 'coll_libnbc_iscan_algorithm' was added for dynamic algorithm selection.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit 3d43ff0)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
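
As a rough illustration of the recursive doubling scan in the MPI_Iscan commit above, the following sketch simulates all ranks inside one Python process. It is not the libnbc implementation; the function name `recursive_doubling_scan` and the list-based "communication" are assumptions made for the example. The point it shows is the ordering rule: data received from a lower rank is always combined on the left, which is what keeps non-commutative operations correct.

```python
def recursive_doubling_scan(values, op):
    """Simulate an inclusive scan over len(values) ranks."""
    p = len(values)
    partial = list(values)  # reduction over the block of ranks seen so far
    result = list(values)   # inclusive prefix for each rank
    mask = 1
    while mask < p:
        # Snapshot what each rank receives before anyone updates.
        received = {}
        for rank in range(p):
            peer = rank ^ mask
            if peer < p:
                received[rank] = partial[peer]
        for rank, incoming in received.items():
            peer = rank ^ mask
            if rank > peer:
                # Incoming block covers lower ranks: it goes on the left.
                partial[rank] = op(incoming, partial[rank])
                result[rank] = op(incoming, result[rank])
            else:
                # Incoming block covers higher ranks: only partial grows.
                partial[rank] = op(partial[rank], incoming)
        mask <<= 1
    return result
```

String concatenation is a convenient non-commutative test operation: calling the function on `["a", "b", "c", "d", "e"]` with `lambda x, y: x + y` yields the prefixes in rank order.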
Implements a recursive doubling algorithm for MPI_Iexscan.
The algorithm preserves the order of operations, so it can be used with
both commutative and non-commutative operations.

The MCA parameter 'coll_libnbc_iexscan_algorithm' was added for dynamic
algorithm selection.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit dfe203e)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
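
For the MPI_Iexscan commit above, the exclusive variant can be sketched in the same simulated style. This is an illustration under assumptions, not the libnbc code: `recursive_doubling_exscan` is a hypothetical name, and `None` stands for the undefined result on rank 0. Compared with an inclusive scan, a rank's own value never enters its result; the result is seeded by the first block received from a lower rank.

```python
def recursive_doubling_exscan(values, op):
    """Simulate an exclusive scan; rank 0's result stays None (undefined)."""
    p = len(values)
    partial = list(values)  # reduction over the block of ranks seen so far
    result = [None] * p     # exclusive prefix, filled in as data arrives
    mask = 1
    while mask < p:
        # Snapshot what each rank receives before anyone updates.
        received = {}
        for rank in range(p):
            peer = rank ^ mask
            if peer < p:
                received[rank] = partial[peer]
        for rank, incoming in received.items():
            peer = rank ^ mask
            if rank > peer:
                partial[rank] = op(incoming, partial[rank])
                if result[rank] is None:
                    # First block from lower ranks seeds the result.
                    result[rank] = incoming
                else:
                    # Later blocks cover even lower ranks: combine on the left.
                    result[rank] = op(incoming, result[rank])
            else:
                partial[rank] = op(partial[rank], incoming)
        mask <<= 1
    return result
```

The sketch assumes the input values themselves are never `None`, since `None` is used as the "not yet defined" marker.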
Remove dead code that was causing warnings about unused static
functions.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit 2e24e6e)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
An implementation of R. Rabenseifner's algorithm for MPI_Ireduce.
This algorithm is a combination of a reduce-scatter, implemented with recursive vector halving
and recursive distance doubling, followed by a gather.

Limitations:
-- count >= 2^{\lfloor \log_2 p \rfloor}
-- commutative operations only
-- intra-communicators only

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit 7bd63e7)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
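
The limitations listed in the MPI_Ireduce commit above can be read as a simple applicability test. The sketch below is an assumption about how such a check could look, not the selection logic in libnbc; the function name `rabenseifner_applicable` and its parameters are hypothetical, and what the library does when the test fails is not stated in the commit message.

```python
def rabenseifner_applicable(count, nprocs, commutative, intracomm):
    """Check the three stated limitations for a communicator of nprocs >= 1."""
    # Largest power of two not exceeding nprocs, i.e. 2^floor(log2 p).
    pof2 = 1 << (nprocs.bit_length() - 1)
    return count >= pof2 and commutative and intracomm
```

For example, with 6 processes the threshold is 4 elements, so a 3-element reduction would not qualify while an 8-element one would.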
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit b0429d2)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Implements a recursive doubling algorithm for MPI_Iallgather.
The algorithm can be used only when the number of processes is a power of two.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit a7386c1)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
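
To show why the MPI_Iallgather commit above restricts recursive doubling to a power-of-two number of processes, here is a single-process simulation. It is a sketch, not the libnbc code; `recursive_doubling_allgather` is a hypothetical name. In each round every rank swaps everything it holds with the rank whose number differs in one bit, so the held data doubles per round and every rank needs a valid partner in every round.

```python
def recursive_doubling_allgather(blocks):
    """Simulate an allgather; blocks[r] is the contribution of rank r."""
    p = len(blocks)
    if p < 1 or p & (p - 1):
        raise ValueError("recursive doubling needs a power-of-two process count")
    # Each rank starts with only its own block, keyed by source rank.
    held = [{rank: blocks[rank]} for rank in range(p)]
    mask = 1
    while mask < p:
        # Snapshot so that all exchanges in a round happen "simultaneously".
        snapshot = [dict(h) for h in held]
        for rank in range(p):
            held[rank].update(snapshot[rank ^ mask])
        mask <<= 1
    # Every rank now holds all p blocks; return them in rank order.
    return [[h[i] for i in range(p)] for h in held]
```

After log2(p) rounds each simulated rank holds the full list of blocks in rank order.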
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 66182a2)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
An implementation of R. Rabenseifner's algorithm for MPI_Iallreduce.

This algorithm is a combination of a reduce-scatter, implemented with recursive vector halving
and recursive distance doubling, followed by an allgather.

Limitations:
-- count >= 2^{\lfloor \log_2 p \rfloor}
-- commutative operations only
-- intra-communicators only

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit 73e048b)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
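
The two phases named in the MPI_Iallreduce commit above can be sketched as a single-process simulation. This is an illustration under stated assumptions, not the libnbc implementation: the name `rabenseifner_allreduce` is hypothetical, the process count is assumed to be a power of two (the extra step real implementations need for other process counts is omitted), and the operation is assumed commutative, matching the listed limitation.

```python
def rabenseifner_allreduce(vectors, op):
    """Simulate allreduce over len(vectors) ranks; op must be commutative."""
    p, n = len(vectors), len(vectors[0])
    if p & (p - 1) or n < p:
        raise ValueError("sketch needs power-of-two ranks and count >= ranks")
    data = [list(v) for v in vectors]
    lo, hi = [0] * p, [n] * p  # index range each rank is responsible for

    # Phase 1: reduce-scatter (vector halving, distance doubling).
    mask = 1
    while mask < p:
        old = [list(d) for d in data]
        for rank in range(p):
            peer = rank ^ mask
            mid = (lo[rank] + hi[rank]) // 2
            # Lower rank keeps the first half, higher rank the second.
            keep = (lo[rank], mid) if rank < peer else (mid, hi[rank])
            for i in range(*keep):
                data[rank][i] = op(old[rank][i], old[peer][i])
            lo[rank], hi[rank] = keep
        mask <<= 1

    # Phase 2: allgather (vector doubling, distance halving).
    mask = p >> 1
    while mask >= 1:
        old = [list(d) for d in data]
        old_lo, old_hi = list(lo), list(hi)
        for rank in range(p):
            peer = rank ^ mask
            # Copy the peer's fully reduced range next to our own.
            for i in range(old_lo[peer], old_hi[peer]):
                data[rank][i] = old[peer][i]
            lo[rank] = min(old_lo[rank], old_lo[peer])
            hi[rank] = max(old_hi[rank], old_hi[peer])
        mask >>= 1
    return data
```

The `n < p` check mirrors the count limitation: after log2(p) halvings each rank must still own at least one element.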
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit 8b511c7)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
1. Remove debug output in iallgather (I forgot to remove it earlier).
2. Remove an incorrect comment in the description of ibcast.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit 64abd0f)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Signed-off-by: Alex Anenkov <anenkov.ru@gmail.com>
(cherry picked from commit 77d466e)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
(cherry picked from commit 6ea920e)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit 531171c)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit f2cbd48)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Signed-off-by: William Zhang <wilzhang@amazon.com>
(cherry picked from commit 5064040)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Signed-off-by: William Zhang <wilzhang@amazon.com>

cr https://code.amazon.com/reviews/CR-23837553

(cherry picked from commit 771f9c0)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
@bwbarrett bwbarrett force-pushed the backports/v4.1.x-collectives-updates branch from 587bd85 to 339ee63 Compare June 25, 2020 23:18
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit f64c30e)
@jsquyres jsquyres force-pushed the backports/v4.1.x-collectives-updates branch from 26487ae to 7987a7f Compare June 26, 2020 14:57
@bwbarrett
Member Author

bot:aws:retest

@bwbarrett
Member Author

bot:aws:retest

@bwbarrett
Member Author

bot:aws:retest

Not sure what happened; it looks like an SSL CA error (meaning curl couldn't read the local CA bundle, I think):

+ curl https://raw.githubusercontent.com/open-mpi/ompi-scripts/master/jenkins/open-mpi-build-script.sh -o open-mpi-build-script.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (77) Problem with the SSL CA cert (path? access rights?)
Build step 'Execute shell' marked build as failure
Finished: FAILURE

@bwbarrett bwbarrett merged commit 25abbb2 into open-mpi:v4.1.x Jun 26, 2020
@bwbarrett bwbarrett deleted the backports/v4.1.x-collectives-updates branch June 26, 2020 19:01
9 participants