
Commit ebad34f

Adding -mca comm_method to print table of communication methods
This is closely related to Platform-MPI's old -prot feature.

The long format of the tables it prints could look like this:
>  Host 0 [myhost001] ranks 0 - 1
>  Host 1 [myhost002] ranks 2 - 3
>  Host 2 [myhost003] ranks 4
>  Host 3 [myhost004] ranks 5
>  Host 4 [myhost005] ranks 6
>  Host 5 [myhost006] ranks 7
>  Host 6 [myhost007] ranks 8
>  Host 7 [myhost008] ranks 9
>  Host 8 [myhost009] ranks 10
>
>   host | 0    1    2    3    4    5    6    7    8
>  ======|==============================================
>      0 : sm   tcp  tcp  tcp  tcp  tcp  tcp  tcp  tcp
>      1 : tcp  sm   tcp  tcp  tcp  tcp  tcp  tcp  tcp
>      2 : tcp  tcp  self tcp  tcp  tcp  tcp  tcp  tcp
>      3 : tcp  tcp  tcp  self tcp  tcp  tcp  tcp  tcp
>      4 : tcp  tcp  tcp  tcp  self tcp  tcp  tcp  tcp
>      5 : tcp  tcp  tcp  tcp  tcp  self tcp  tcp  tcp
>      6 : tcp  tcp  tcp  tcp  tcp  tcp  self tcp  tcp
>      7 : tcp  tcp  tcp  tcp  tcp  tcp  tcp  self tcp
>      8 : tcp  tcp  tcp  tcp  tcp  tcp  tcp  tcp  self
>
>  Connection summary:
>    on-host:  all connections are sm or self
>    off-host: all connections are tcp

In this example hosts 0 and 1 had multiple ranks, so "sm" was more meaningful than "self" for identifying how the ranks on the host are talking to each other. Hosts 2..8 had one rank per host, so "self" was more meaningful as their btl.

Above a certain number of hosts (12 by default) the full table gets too big, so we shrink to a more abbreviated table that has the same data:
>   host | 0 1 2 3 4       8
>  ======|====================
>      0 : A C C C C C C C C
>      1 : C A C C C C C C C
>      2 : C C B C C C C C C
>      3 : C C C B C C C C C
>      4 : C C C C B C C C C
>      5 : C C C C C B C C C
>      6 : C C C C C C B C C
>      7 : C C C C C C C B C
>      8 : C C C C C C C C B
>  key: A == sm
>  key: B == self
>  key: C == tcp

Then above 36 hosts we stop printing the 2d table entirely and just print the summary:
>  Connection summary:
>    on-host:  all connections are sm or self
>    off-host: all connections are tcp

The options to control it are
    -mca comm_method 1   : print the above table at the end of MPI_Init
    -mca comm_method 2   : print the above table at the beginning of MPI_Finalize
    -mca comm_method_max <n> : number of hosts <n> for which to print a full size 2d table
    -mca comm_method_brief 1 : only print summary output, no 2d table
    -mca comm_method_fakefile <filename> : for debugging only

* printing at init vs finalize:

The most important difference between these two is that when printing the table during MPI_Init(), we send extra messages to make sure all hosts are connected to each other. So the table ends up working against the idea of on-demand connections (although it only forces n^2 connections in the number of hosts, not in the total number of ranks). If printing at MPI_Finalize(), we don't create any connections that aren't already established, so the table is more likely to have "n/a" entries if some hosts never connected to each other.

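The letter key in the abbreviated table can be illustrated with a small sketch. Everything named here (method_keys_t, key_for, assign_keys) is hypothetical and not taken from hook_comm_method_fns.c. The sketch also assumes that letters are handed out to on-host (diagonal) entries first and off-host entries second; that ordering is only a guess, chosen because it reproduces the key shown in the example.

```c
#include <assert.h>
#include <string.h>

#define MAX_KEYS 26  /* one letter per distinct method, A..Z */

/* Hypothetical helper type: remembers which method name owns which letter. */
typedef struct {
    const char *names[MAX_KEYS];
    int count;
} method_keys_t;

/* Return the letter for `name`, registering it if it has not been seen yet.
 * Returns '?' for a NULL name or when all 26 letters are already in use. */
static char key_for(method_keys_t *keys, const char *name)
{
    assert(keys != NULL);
    if (name == NULL) {
        return '?';
    }
    for (int i = 0; i < keys->count; i++) {
        if (strcmp(keys->names[i], name) == 0) {
            return (char)('A' + i);
        }
    }
    if (keys->count == MAX_KEYS) {
        return '?';
    }
    keys->names[keys->count] = name;
    return (char)('A' + keys->count++);
}

/* Assign letters for an n x n row-major matrix of method names.
 * Assumption: diagonal (on-host) entries are keyed before off-host ones. */
static void assign_keys(method_keys_t *keys, const char **methods, int n)
{
    keys->count = 0;
    for (int i = 0; i < n; i++) {
        (void) key_for(keys, methods[i * n + i]);
    }
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (i != j) {
                (void) key_for(keys, methods[i * n + j]);
            }
        }
    }
}
```

For a matrix whose diagonal holds "sm" and "self" and whose off-diagonal entries are all "tcp", calling assign_keys() and then key_for() gives A for sm, B for self and C for tcp, matching the key lines in the example output.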
* how many hosts <n> for which to print a full size 2d table

The option -mca comm_method_max <n> can be used to specify a number of hosts <n> (default 12) that controls at what host count the unabbreviated / abbreviated 2d tables get printed:
    1 - n      : full size 2d table
    n+1 - 3n   : shortened 2d table
    3n+1 - inf : summary only, no 2d table

* brief

The option -mca comm_method_brief 1 can be used to skip the printing of the 2d table and only show the short summary.

* fakefile

This is a debugging option that allows easier testing of all the printout routines by letting all the detected communication methods between the hosts be overridden by fake data from a file.

The source of the information used in the table is the .mca_component_name

In the case of BTLs, the module always had a .btl_component linking back to the component. The vars mca_pml_base_selected_component and ompi_mtl_base_selected_component offer similar functionality for pml/mtl.

So with the ability to identify the component, we can then access the component name with code like this
    mca_pml_base_selected_component.pmlm_version.mca_component_name

See the three lookup_{pml,mtl,btl}_name() functions in hook_comm_method_fns.c, and their use in comm_method() to parse the strings and produce an integer to represent the connection type being used.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
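The host-count thresholds described in the commit message (1 to n, n+1 to 3n, above 3n) can be sketched as a small selection function. The enum and function names below are hypothetical and not taken from the commit. The sketch also assumes that comm_method_brief simply forces the summary-only output regardless of host count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
typedef enum {
    TABLE_FULL,         /* full size 2d table with method names */
    TABLE_ABBREVIATED,  /* shortened 2d table with letter keys */
    TABLE_SUMMARY_ONLY  /* connection summary, no 2d table */
} table_mode_t;

/* nhosts: number of hosts in the job
 * max:    value of comm_method_max (12 by default)
 * brief:  true when comm_method_brief is set (assumed to win over max) */
static table_mode_t choose_table_mode(int nhosts, int max, bool brief)
{
    assert(nhosts >= 0 && max >= 0);
    if (brief) {
        return TABLE_SUMMARY_ONLY;
    }
    if (nhosts <= max) {
        return TABLE_FULL;          /* 1 .. n */
    }
    if (nhosts <= 3 * max) {
        return TABLE_ABBREVIATED;   /* n+1 .. 3n */
    }
    return TABLE_SUMMARY_ONLY;      /* 3n+1 and above */
}
```

With the default of 12, this yields the full table up to 12 hosts, the abbreviated table from 13 to 36 hosts, and the summary alone from 37 hosts upward, which agrees with the "above 36 hosts" remark in the commit message.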
1 parent 804a517 commit ebad34f

14 files changed: +1080 -8 lines changed

Diff for: ompi/mca/hook/comm_method/Makefile.am

@@ -0,0 +1,20 @@
#
# Copyright (c) 2018 IBM Corporation. All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#

sources = \
        hook_comm_method.h \
        hook_comm_method_component.c \
        hook_comm_method_fns.c

# This component will only ever be built statically -- never as a DSO.

noinst_LTLIBRARIES = libmca_hook_comm_method.la

libmca_hook_comm_method_la_SOURCES = $(sources)
libmca_hook_comm_method_la_LDFLAGS = -module -avoid-version

Diff for: ompi/mca/hook/comm_method/configure.m4

@@ -0,0 +1,25 @@
#
# Copyright (c) 2018 IBM Corporation. All rights reserved.
#
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#

# Make this a static component
AC_DEFUN([MCA_ompi_hook_comm_method_COMPILE_MODE], [
    AC_MSG_CHECKING([for MCA component $2:$3 compile mode])
    $4="static"
    AC_MSG_RESULT([$$4])
])

# MCA_hook_comm_method_CONFIG([action-if-can-compile],
#                             [action-if-cant-compile])
# ------------------------------------------------
AC_DEFUN([MCA_ompi_hook_comm_method_CONFIG],[
    AC_CONFIG_FILES([ompi/mca/hook/comm_method/Makefile])

    $1
])

Diff for: ompi/mca/hook/comm_method/hook_comm_method.h

@@ -0,0 +1,37 @@
/*
 * Copyright (c) 2016-2018 IBM Corporation. All rights reserved.
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
 *
 * $HEADER$
 */
#ifndef MCA_HOOK_COMM_METHOD_H
#define MCA_HOOK_COMM_METHOD_H

#include "ompi_config.h"

#include "ompi/constants.h"

#include "ompi/mca/hook/hook.h"
#include "ompi/mca/hook/base/base.h"

BEGIN_C_DECLS

OMPI_MODULE_DECLSPEC extern const ompi_hook_base_component_1_0_0_t mca_hook_comm_method_component;

extern int mca_hook_comm_method_verbose;
extern int mca_hook_comm_method_output;
extern bool hook_comm_method_enable_mpi_init;
extern bool hook_comm_method_enable_mpi_finalize;
extern int hook_comm_method_max;
extern int hook_comm_method_brief;
extern char *hook_comm_method_fakefile;

void ompi_hook_comm_method_mpi_init_bottom(int argc, char **argv, int requested, int *provided);

void ompi_hook_comm_method_mpi_finalize_top(void);

END_C_DECLS

#endif /* MCA_HOOK_COMM_METHOD_H */

Diff for: ompi/mca/hook/comm_method/hook_comm_method_component.c

@@ -0,0 +1,179 @@
/*
 * Copyright (c) 2016-2018 IBM Corporation. All rights reserved.
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
 *
 * $HEADER$
 */

#include "ompi_config.h"

#include "hook_comm_method.h"

static int ompi_hook_comm_method_component_open(void);
static int ompi_hook_comm_method_component_close(void);
static int ompi_hook_comm_method_component_register(void);

/*
 * Public string showing the component version number
 */
const char *mca_hook_comm_method_component_version_string =
    "Open MPI 'comm_method' hook MCA component version " OMPI_VERSION;

/*
 * Instantiate the public struct with all of our public information
 * and pointers to our public functions in it
 */
const ompi_hook_base_component_1_0_0_t mca_hook_comm_method_component = {

    /* First, the mca_component_t struct containing meta information
     * about the component itself */
    .hookm_version = {
        OMPI_HOOK_BASE_VERSION_1_0_0,

        /* Component name and version */
        .mca_component_name = "comm_method",
        MCA_BASE_MAKE_VERSION(component, OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION,
                              OMPI_RELEASE_VERSION),

        /* Component open and close functions */
        .mca_open_component = ompi_hook_comm_method_component_open,
        .mca_close_component = ompi_hook_comm_method_component_close,
        .mca_register_component_params = ompi_hook_comm_method_component_register,

        // Force this component to always be considered - component must be static
        //.mca_component_flags = MCA_BASE_COMPONENT_FLAG_ALWAYS_CONSIDER,
    },
    .hookm_data = {
        /* The component is checkpoint ready */
        MCA_BASE_METADATA_PARAM_CHECKPOINT
    },

    /* Component functions */
    .hookm_mpi_initialized_top = NULL,
    .hookm_mpi_initialized_bottom = NULL,

    .hookm_mpi_finalized_top = NULL,
    .hookm_mpi_finalized_bottom = NULL,

    .hookm_mpi_init_top = NULL,
    .hookm_mpi_init_top_post_opal = NULL,
    .hookm_mpi_init_bottom = ompi_hook_comm_method_mpi_init_bottom,
    .hookm_mpi_init_error = NULL,

    .hookm_mpi_finalize_top = ompi_hook_comm_method_mpi_finalize_top,
    .hookm_mpi_finalize_bottom = NULL,
};

int mca_hook_comm_method_verbose = 0;
int mca_hook_comm_method_output = -1;
bool hook_comm_method_enable_mpi_init = false;
bool hook_comm_method_enable_mpi_finalize = false;
int hook_comm_method_max = 12;
int hook_comm_method_brief = 0;
char *hook_comm_method_fakefile = NULL;

static int ompi_hook_comm_method_component_open(void)
{
    // Nothing to do
    return OMPI_SUCCESS;
}

static int ompi_hook_comm_method_component_close(void)
{
    // Nothing to do
    return OMPI_SUCCESS;
}

static int ompi_hook_comm_method_component_register(void)
{

    /*
     * Component verbosity level
     */
    // Inherit the verbosity of the base framework, but also allow this to be overridden
    if( ompi_hook_base_framework.framework_verbose > MCA_BASE_VERBOSE_NONE ) {
        mca_hook_comm_method_verbose = ompi_hook_base_framework.framework_verbose;
    }
    else {
        mca_hook_comm_method_verbose = MCA_BASE_VERBOSE_NONE;
    }
    (void) mca_base_component_var_register(&mca_hook_comm_method_component.hookm_version, "verbose",
                                           NULL,
                                           MCA_BASE_VAR_TYPE_INT, NULL,
                                           0, 0,
                                           OPAL_INFO_LVL_9,
                                           MCA_BASE_VAR_SCOPE_READONLY,
                                           &mca_hook_comm_method_verbose);

    mca_hook_comm_method_output = opal_output_open(NULL);
    opal_output_set_verbosity(mca_hook_comm_method_output, mca_hook_comm_method_verbose);

    /*
     * If the component is active for mpi_init / mpi_finalize
     */
    hook_comm_method_enable_mpi_init = false;
    (void) mca_base_component_var_register(&mca_hook_comm_method_component.hookm_version, "enable_mpi_init",
                                           "Enable comm_method behavior on mpi_init",
                                           MCA_BASE_VAR_TYPE_BOOL, NULL,
                                           0, 0,
                                           OPAL_INFO_LVL_3,
                                           MCA_BASE_VAR_SCOPE_READONLY,
                                           &hook_comm_method_enable_mpi_init);

    hook_comm_method_enable_mpi_finalize = false;
    (void) mca_base_component_var_register(&mca_hook_comm_method_component.hookm_version, "enable_mpi_finalize",
                                           "Enable comm_method behavior on mpi_finalize",
                                           MCA_BASE_VAR_TYPE_BOOL, NULL,
                                           0, 0,
                                           OPAL_INFO_LVL_3,
                                           MCA_BASE_VAR_SCOPE_READONLY,
                                           &hook_comm_method_enable_mpi_finalize);

    // User can set the comm_method mca variable too
    int hook_comm_method = -1;
    (void) mca_base_var_register("ompi", NULL, NULL, "comm_method",
                                 "Enable comm_method behavior (1) mpi_init or (2) mpi_finalize",
                                 MCA_BASE_VAR_TYPE_INT, NULL,
                                 0, 0,
                                 OPAL_INFO_LVL_3,
                                 MCA_BASE_VAR_SCOPE_READONLY,
                                 &hook_comm_method);

    if( 1 == hook_comm_method ) {
        hook_comm_method_enable_mpi_init = true;
    }
    else if( 2 == hook_comm_method ) {
        hook_comm_method_enable_mpi_finalize = true;
    }

    // comm_method_max
    (void) mca_base_var_register("ompi", NULL, NULL, "comm_method_max",
                                 "Number of hosts for which to print unabbreviated 2d table of comm methods.",
                                 MCA_BASE_VAR_TYPE_INT, NULL,
                                 0, 0,
                                 OPAL_INFO_LVL_3,
                                 MCA_BASE_VAR_SCOPE_READONLY,
                                 &hook_comm_method_max);
    // comm_method_brief
    (void) mca_base_var_register("ompi", NULL, NULL, "comm_method_brief",
                                 "Only print the comm method summary, skip the 2d table.",
                                 MCA_BASE_VAR_TYPE_INT, NULL,
                                 0, 0,
                                 OPAL_INFO_LVL_3,
                                 MCA_BASE_VAR_SCOPE_READONLY,
                                 &hook_comm_method_brief);

    // comm_method_fakefile is just for debugging, allows complete override of all the
    // comm method in the table
    (void) mca_base_var_register("ompi", NULL, NULL, "comm_method_fakefile",
                                 "For debugging only: read comm methods from a file",
                                 MCA_BASE_VAR_TYPE_STRING, NULL,
                                 0, 0,
                                 OPAL_INFO_LVL_3,
                                 MCA_BASE_VAR_SCOPE_READONLY,
                                 &hook_comm_method_fakefile);

    return OMPI_SUCCESS;
}
