Sync to ORTE/OPAL master #21
Merged
In the multithreaded case, it is expensive to release the lock, call the slow match, and retake the lock just to queue the frag. This patch reduces the number of lock acquisitions by queueing the frag right away and returning. Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>
LSF running on top of CSM does not provide LSF daemons on the compute nodes. Signed-off-by: Matt Ezell <ezellma@ornl.gov>
If not, the pvars will remain valid after the OB1 PML is unloaded, and any access will segfault (the callbacks associated with the pvar will point to the memory of the dlclosed module). Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Store the pointer to the object handle and not the pointer to the pointer. We should not assert(0) in the code! Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
values to FI_PROGRESS_UNSPEC so each provider will use its default. Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
Rework the logic to handle the out-of-sequence fragments on the receiver side. A large number of OOS messages are still arriving even in single threaded scenarios. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
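The idea of queueing an out-of-sequence fragment immediately and draining the queue once the expected fragment arrives can be sketched as follows. This is a minimal illustration only: `frag_t`, `peer_t`, and the three functions are hypothetical names, not the ob1 data structures, and locking and sequence-number wraparound are deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fragment record; not the ob1 structure. */
typedef struct frag {
    uint32_t seq;
    struct frag *next;
} frag_t;

/* Hypothetical per-peer state. */
typedef struct {
    uint32_t expected;  /* next sequence number that may be matched */
    frag_t *oos;        /* out-of-sequence fragments, sorted by seq */
} peer_t;

/* Queue a fragment that arrived ahead of its turn, keeping the list sorted. */
void peer_queue_oos(peer_t *p, frag_t *f)
{
    frag_t **cur = &p->oos;
    while (*cur != NULL && (*cur)->seq < f->seq) {
        cur = &(*cur)->next;
    }
    f->next = *cur;
    *cur = f;
}

/* Returns 1 if the fragment was in sequence (and advances `expected`),
 * 0 if it was queued for later. */
int peer_receive(peer_t *p, frag_t *f)
{
    if (f->seq != p->expected) {
        peer_queue_oos(p, f);
        return 0;
    }
    p->expected++;
    return 1;
}

/* Pop the head of the queue if it is now the expected fragment. */
frag_t *peer_next_ready(peer_t *p)
{
    frag_t *head = p->oos;
    if (head == NULL || head->seq != p->expected) {
        return NULL;
    }
    p->oos = head->next;
    head->next = NULL;
    p->expected++;
    return head;
}
```

After an in-sequence fragment is handled, a caller would loop on `peer_next_ready()` until it returns `NULL` to deliver everything that has become eligible.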
The following issues have been fixed for `mindist`:
- computing the job map on the backend nodes
- using slots count (`-host node1:<s1>,nodeN:<sN>`)
- fixed `dist:span` job mapping method
- fixed `oversubscribe` option with `-host`
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
rmaps/mindist: reworked the job map binding
…d have higher priority than rdma and default to psm2. Context: the Intel Omni-path driver (hfi1) has verbs support, so the openib btl is available to use, but it performs poorly. Without this change, osc rdma using btl openib is the default choice when running on Intel Omni-path, with lower performance than osc pt2pt over mtl psm2. Signed-off-by: Matias A Cabral <matias.a.cabral@intel.com>
so the bool type is defined when using old compilers that do not support gcc builtin atomics (such as gcc 4.1.x from CentOS 5) Fixes open-mpi/ompi#4478 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Per https://www.mail-archive.com/users@lists.open-mpi.org/msg31758.html, only output unknown frames when we're outputting verbose BTL messages. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
…n-verbose-mode usnic: only output unknown frames in verbose mode
pml/ob1: match callback will now queue wrong sequence frag and return.
No code or logic changes. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
No code or logic changes. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
The value of ret is negative (e.g., -61), but it is displayed in the help message as `%zd`, which renders as unsigned (i.e., a giant positive value). So make sure to negate the negative value before rendering it (e.g., so we display "61", not "4294967235"). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
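One way to avoid this kind of surprise is to negate the code and print it with a specifier that matches its type. The sketch below assumes `ret` is a plain `int` and is never `INT_MIN`; `format_error_code` is a hypothetical helper, and the actual help-message machinery is not shown here.

```c
#include <stdio.h>

/* Hypothetical helper: writes the magnitude of an error code into buf.
 * Assumes ret is never INT_MIN, so negating it cannot overflow. */
int format_error_code(char *buf, size_t len, int ret)
{
    /* Negate negative codes so "-61" is shown as "61". */
    int shown = (ret < 0) ? -ret : ret;

    /* "%d" matches the int argument, so no width or sign mismatch occurs. */
    return snprintf(buf, len, "%d", shown);
}
```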
Before this commit, the presence of usNIC devices -- which will (currently) return no data when fi_getinfo() is queried for tagged matching providers -- would cause an error message to be displayed. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
gcc 5.2 complains:
```
mtl_ofi_component.c: In function ‘ompi_mtl_ofi_finalize’:
mtl_ofi_component.c:613:5: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
     if (ret = fi_close((fid_t)ompi_mtl_ofi.fabric)) {
     ^
```
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
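A common way to silence `-Wparentheses` in this situation is to move the assignment out of the condition. The sketch below shows that pattern with hypothetical stand-in functions (`sketch_close`, `sketch_finalize`); it is not the change made in the commit, which is not shown on this page.

```c
/* Hypothetical stand-in for a close call that returns 0 on success. */
int sketch_close(int handle)
{
    return (handle >= 0) ? 0 : -1;
}

int sketch_finalize(int handle)
{
    int ret;

    /* Assign first, then test: no assignment inside the condition. */
    ret = sketch_close(handle);
    if (0 != ret) {
        return ret;
    }
    return 0;
}
```

Wrapping the assignment in a second pair of parentheses, as the warning text suggests, also works; separating the two steps tends to read more clearly.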
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Sync to PMIx master
MTL OFI updates
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Some minor cleanups of the DVM
mtl/ofi: Set data and control progress options default values to FI_PROGRESS_UNSPEC
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
pmix: pack pointer to object (vs. pointer to pointer)
Sometimes, Ethernet interfaces can get quite high kernel indices. struct ifreq (see netdevice(7)) defines ifr_ifindex as an int. The OOB component used int16_t internally for matching (in the case of -mca oob_tcp_if_[in|ex]clude), which meant that any interface index > 32767 would never be matched, because the integer was truncated to int16_t upon return from the function. OOB would then refuse to work because it could not find any usable interfaces, and the MPI job would abort. Signed-off-by: Wojtek Wasko <wwasko@nvidia.com>
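The truncation problem can be reproduced in isolation. In this sketch, `matches_narrow` and `matches_wide` are hypothetical helpers that only mimic storing the index in `int16_t` versus `int`; they are not the OOB component's functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: stores the kernel index in an int16_t before
 * comparing, as the old code effectively did. For values above 32767 the
 * conversion result is implementation-defined and can never equal the
 * original index. */
bool matches_narrow(int kernel_index, int wanted)
{
    int16_t stored = (int16_t)kernel_index;
    return stored == wanted;
}

/* Hypothetical helper: keeps the full int, so large indices still match. */
bool matches_wide(int kernel_index, int wanted)
{
    int stored = kernel_index;
    return stored == wanted;
}
```

With an index such as 40000, the wide version matches while the narrow one does not; small indices behave the same in both.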
of "unset". mtl/psm2: Update some shadow mca parameters to use the default "unset". mtl/psm2: Add new shadow parameter to allow specifying the service level. Signed-off-by: Matias A Cabral <matias.a.cabral@intel.com>
Make interface's kernel index an int instead of int16_t
return true if the datatype has non-negative and monotonically nondecreasing displacements, and false otherwise. Thanks George for the guidance. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Per MPI 3.1 chapter 13.3: "Derived etypes can be constructed by using any of the MPI datatype constructor routines, provided all resulting typemap displacements are non-negative and monotonically nondecreasing." The same restriction applies to ftypes. Add the OMPI_DATATYPE_CHECK_FOR_VIEW() macro, which checks that the underlying opal_datatype_t is monotonic, on top of all the checks performed in OMPI_DATATYPE_CHECK_FOR_RECV(). Since checking monotonicity is expensive, the check is only performed when needed, and the result is cached by ompi_datatype_is_monotonic(). Thanks Wei-keng Liao for the valuable feedback. Thanks George for the guidance. Refs. open-mpi/ompi#4682 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
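The quoted rule (non-negative, monotonically nondecreasing displacements) together with a cached result can be sketched as below. `sketch_type_t` and `sketch_is_monotonic` are hypothetical; the real check operates on opal_datatype_t, whose layout is not shown here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a datatype: only the displacements matter here. */
typedef struct {
    const long *disps;    /* typemap displacements, in typemap order */
    size_t count;
    int monotonic_cache;  /* -1 = not computed yet, 0 = false, 1 = true */
} sketch_type_t;

bool sketch_is_monotonic(sketch_type_t *t)
{
    /* Reuse the cached answer if the check already ran. */
    if (t->monotonic_cache >= 0) {
        return t->monotonic_cache == 1;
    }

    bool ok = true;
    for (size_t i = 0; i < t->count; i++) {
        /* Reject negative displacements and any step backwards. */
        if (t->disps[i] < 0 || (i > 0 && t->disps[i] < t->disps[i - 1])) {
            ok = false;
            break;
        }
    }

    t->monotonic_cache = ok ? 1 : 0;
    return ok;
}
```

Equal consecutive displacements are accepted, since the requirement is nondecreasing rather than strictly increasing.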
MPI_File_set_view(): check datatypes are monotonic
Fix type of mpi_f08 MPI_ERRCODES_IGNORE
Signed-off-by: Philip Kovacs <pkdevel@yahoo.com>
Fix DIR, DIR/include search for --with-pmix
If we abnormally terminate, then we still want any cleanups to be executed. Remove debug Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Ensure the epilog gets executed in PMIx server
Handle the need for different regex generator/parsers by moving the orte/util/nidmap and orte/util/regex code into a new "regx" framework. Use the original code to complete a "fwd" component, and create a scaffold for IBM's "reverse" component. Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Convert nidmap to regx framework
Resolve a race condition between registering for a file to be removed upon termination and actual creation of that file by providing attributes that identify whether the path is a file or directory. This removes the need for PMIx to detect the difference. Refs #4686 Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Ensure cleanup of registered files/dirs
Use asprintf in description message to avoid missing default values Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
Refs. open-mpi/ompi#4689 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
typedef int (*orte_regx_base_module_extract_node_names_fn_t)(char *regexp, char ***names); among other things, that will make testing way easier. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Search for the digits to be compressed from the end of the node names. For example, if the nodelist is c712f6n01,c712f6n02,c712f6n03, the regx/fwd component generates c[3:712]f6n01,c[3:712]f6n02,c[3:712]f6n03@(3), whereas the regx/reverse component generates c712f6n[2:1-3]@0(3), which is a better fit here. Josh Hursey authored the changes and must be credited. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
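The first step described above, locating the digits from the end of a node name, can be sketched as follows. This is only an illustration; `trailing_digits_offset` is a hypothetical name, not a function from the regx framework, and the range-compression step is not shown.

```c
#include <ctype.h>
#include <string.h>

/* Returns the offset where the trailing run of digits starts in `name`,
 * or strlen(name) if the name does not end in a digit. */
size_t trailing_digits_offset(const char *name)
{
    size_t i = strlen(name);

    /* Walk backwards while the preceding character is a digit. */
    while (i > 0 && isdigit((unsigned char)name[i - 1])) {
        i--;
    }
    return i;
}
```

For `c712f6n01` the offset points at `01`, leaving `c712f6n` as the common prefix shared by all three names in the example.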
Ensure that prun doesn't exit until notified that its own child job terminated. Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Ensure that prun doesn't prematurely exit
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
orte/regx: fix, revamp and enhancement
osc/rdma: fix missing opal_argv_free in mtls search.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
@ggouaillardet Looks like Travis changed something, and now GCC v6 cannot be found. I've captured it here in case the detailed log is lost.
hppritcha referenced this pull request in hppritcha/prrte on Oct 1, 2024: Use the PMIx functions to check params
No description provided.