Topic/monitoring #3109

Merged Jun 26, 2017 (83 commits)
5e5f269
Add a monitoring PML. This PML tracks all data exchanges by the processes
bosilca Mar 5, 2015
673c766
Fix a conversion problem and add a comment about the lack of component
bosilca Jul 25, 2015
a4a9b39
Add ability to query pml monitoring results with MPI Tools interface
gpapaure May 6, 2015
7316eb2
Allow the pvar to be written by invoking the associated callback.
bosilca Sep 15, 2015
d50394b
Various fixes for the monitoring.
bosilca Sep 16, 2016
617e3c5
Cleanup for the monitoring module.
bosilca Sep 21, 2016
8b0af75
Adding documentation about how to use pml_monitoring component.
clementFoyer Sep 23, 2016
c3dbc7e
Change rank into MPI_COMM_WORLD and size(MPI_COMM_WORLD) to global va…
clementFoyer Sep 26, 2016
3689a01
Improve monitoring support (including integration with MPI_T)
clementFoyer Sep 26, 2016
462e59a
Add overhead benchmark, with script to use data and create graphs out…
clementFoyer Oct 13, 2016
30ac0a3
Fix segfault error at end when not loading pml
clementFoyer Oct 19, 2016
c9c9739
Start create common monitoring module. Factorise version numbering
clementFoyer Oct 21, 2016
e1b35a6
Fix microbenchmarks script
clementFoyer Oct 21, 2016
dbbd563
Resolve brutal segfault when double freeing filename
clementFoyer Oct 21, 2016
0cfa1d3
Improve readability of code
clementFoyer Oct 21, 2016
5d2725f
Add osc monitoring component
clementFoyer Oct 21, 2016
9894ba6
Add error checking if running out of memory in osc_monitoring
clementFoyer Oct 21, 2016
ef0ccca
Moving to ompi/mca/common the proper parts of the monitoring system
clementFoyer Oct 26, 2016
e63532a
Fix linking library with mca/common
clementFoyer Oct 27, 2016
886b180
Add calls to record monitored data from osc. Use common function to t…
clementFoyer Oct 26, 2016
8ca765e
Fix test_overhead benchmark script distribution
clementFoyer Oct 27, 2016
8458be7
Add passive operations in monitoring_test
clementFoyer Oct 27, 2016
7ad29cf
Fix from rank calculation. Add more detailed error messages
clementFoyer Oct 27, 2016
9ec0be4
Fix alignments. Fix common_monitoring_get_world_rank function. Remove…
clementFoyer Oct 27, 2016
ce6a7cb
Fix osc_monitoring mget_message_count function call
clementFoyer Oct 28, 2016
3ff6e7c
Change common_monitoring function names to respect the naming convent…
clementFoyer Oct 28, 2016
ec6c426
Add monitoring common output system
clementFoyer Nov 2, 2016
88835f6
Add error message when trying to flush to a file, and open fails. Rem…
clementFoyer Nov 4, 2016
7ddc0bc
Consistent output file name (with and without MPI_T).
clementFoyer Nov 8, 2016
edc6863
Always output to a file when flushing at pvar_stop(flush).
clementFoyer Nov 9, 2016
f944638
Update the monitoring documentation.
clementFoyer Nov 9, 2016
f6e1ba8
Use the world_rank for printf's.
clementFoyer Nov 9, 2016
a4014ec
Clean potential previous runs, but keep the results at the end in ord…
clementFoyer Nov 9, 2016
ed6f1fd
Add security check for unique initialization for osc monitoring
clementFoyer Nov 10, 2016
de5577f
Clean the amount of symbols available outside mca/common/monitoring
clementFoyer Nov 10, 2016
c6dd8e0
Remove use of __sync_* built-ins. Use opal_atomic_* instead.
clementFoyer Nov 10, 2016
52dfb1c
Add histogram distribution of message sizes
clementFoyer Nov 21, 2016
549d2ef
Allocate the hashtable on common/monitoring component initialization.…
clementFoyer Nov 18, 2016
545f64c
Deleting now useless file : moved to common/monitoring
clementFoyer Nov 18, 2016
d1339da
Add histogram array of 2-based log of message sizes. Use simple call …
clementFoyer Nov 21, 2016
e95ea72
Add informations in dumping file. Separate per category (pt2pt/osc/co…
clementFoyer Nov 21, 2016
55ebb61
Add coll component for collectives communications monitoring
clementFoyer Nov 21, 2016
003c92e
Fix warning messages : use c_name as the magic id is not always defin…
clementFoyer Nov 22, 2016
b5becc0
Fix log10_2 constant initialization. Fix index calculation for histog…
clementFoyer Nov 22, 2016
56063c9
Add debug info messages to follow more easily initialization steps.
clementFoyer Nov 22, 2016
3c96586
Group all the var/pvar definitions to common_monitoring. Separate ini…
clementFoyer Nov 24, 2016
76dac4d
Fix invalid memory allocation. Initialize initial_filename to empty s…
clementFoyer Nov 25, 2016
2f1c06b
Fix missing procs in hashtable. Cache coll monitoring data.
clementFoyer Nov 28, 2016
f31cf18
Don't install the test scripts.
bosilca Nov 28, 2016
0d9235e
Use intermediate variable to avoid invalid write while retrieving ran…
clementFoyer Nov 29, 2016
32c90eb
Add missing release of the last element in flush_all. Add release of …
clementFoyer Nov 29, 2016
070ef41
Use a linked list instead of a hashtable to keep tracks of communicat…
clementFoyer Nov 29, 2016
08fa473
Set world_rank from hashtable only if found
clementFoyer Nov 30, 2016
8f7cbb4
Use predefined symbol from opal system to print int
clementFoyer Nov 30, 2016
d8c95e6
Move collective monitoring data to a hashtable. Add pvar to access th…
clementFoyer Nov 30, 2016
ce76a15
Fix pvar registration. Use OMPI_ERROR instead of -1 as returned error…
clementFoyer Dec 6, 2016
dd3480e
Add automated check (with MPI_Tools) of monitoring.
clementFoyer Dec 6, 2016
5d59aa8
Fix procs list caching in common_monitoring_coll_data_t
clementFoyer Dec 6, 2016
dd51986
Add linking to Fortran applications for LD_PRELOAD usage of monitorin…
clementFoyer Dec 13, 2016
a059c17
Add PVAR's handles. Clean up code (visibility, add comments...). Star…
clementFoyer Dec 14, 2016
6ae15ea
Fix coll operations monitoring. Update check_monitoring accordingly t…
clementFoyer Dec 15, 2016
770d1e0
Documentation update.
clementFoyer Dec 15, 2016
6ee2d62
Aggregate monitoring COLL data to the generated matrix. Update docume…
clementFoyer Dec 15, 2016
9530757
Fix monitoring_prof (bad variable.vector used, and wrong array in PMP…
clementFoyer Dec 15, 2016
145a52f
Add reduce_scatter and reduce_scatter_block monitoring. Reduce memory…
clementFoyer Dec 16, 2016
c365c30
Add the use of a machine file for overhead benchmark
clementFoyer Jan 17, 2017
8376ad3
Check for out-of-bound write in histogram
clementFoyer Jan 18, 2017
b8d6825
Fix common_monitoring_cache object init for MPI_COMM_WORLD
clementFoyer Jan 19, 2017
08fa850
Add RDMA benchmarks to test_overhead
clementFoyer Feb 1, 2017
2914158
Add computation of average and median of overheads. Add comments and …
clementFoyer Feb 17, 2017
cbf9bf3
Add dumping histogram in edge case
clementFoyer Mar 3, 2017
716dbb8
Add technical documentation
clementFoyer Feb 24, 2017
5dbf25d
Update expected output in test/monitoring/monitoring_test.c
clementFoyer Mar 2, 2017
db3f3c4
Adding a reduce(pml_monitoring_messages_count, MPI_MAX) example
clementFoyer Mar 6, 2017
e7b154f
Adapt to the new definition of communicators
clementFoyer Mar 1, 2017
ee939b9
Add consistency in header inclusion.
clementFoyer Mar 7, 2017
b139f2b
misc monitoring fixes
ggouaillardet Mar 9, 2017
4ac5ac9
Cleanups.
bosilca Mar 16, 2017
b93f217
Changing int64_t to size_t.
clementFoyer Mar 23, 2017
00c896b
Add parameter for RMA test case
clementFoyer Apr 5, 2017
65884bf
Clean the maximum bound computation for proc list dump.
clementFoyer May 10, 2017
a0c31de
Add communicator-specific monitored collective data reset
clementFoyer Jun 8, 2017
d87cba3
Add monitoring scripts to the 'make dist'
bosilca Jun 21, 2017
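Commits 52dfb1c and d1339da in the list above add a histogram of message sizes indexed by the base-2 logarithm of the size, and 8376ad3 adds a check against out-of-bound writes. The binning idea can be sketched as follows. This is a minimal illustration and not the code from this PR: HIST_BUCKETS, hist_bucket and hist_record are hypothetical names, and reserving bucket 0 for empty messages is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bucket count, chosen small so the clamp below is visible. */
#define HIST_BUCKETS 32

/*
 * Map a message size to a histogram bucket:
 *   bucket 0     -> empty messages (assumed convention)
 *   bucket k + 1 -> sizes in [2^k, 2^(k+1))
 * The result is clamped so that a very large size can never index
 * past the end of the histogram array.
 */
static size_t hist_bucket(uint64_t msg_size)
{
    size_t bucket = 0;

    /* Count how many times the size can be halved: floor(log2) + 1. */
    while (msg_size > 0) {
        msg_size >>= 1;
        bucket++;
    }
    if (bucket >= HIST_BUCKETS) {
        bucket = HIST_BUCKETS - 1;
    }
    return bucket;
}

/* Record one message of the given size in the histogram. */
static void hist_record(uint64_t hist[HIST_BUCKETS], uint64_t msg_size)
{
    hist[hist_bucket(msg_size)]++;
}
```

With this convention, sizes 4 through 7 land in bucket 3, and every size at or above 2^30 bytes shares the last bucket.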
4 changes: 4 additions & 0 deletions configure.ac
@@ -1409,6 +1409,10 @@ AC_CONFIG_FILES([
     test/util/Makefile
 ])
 m4_ifdef([project_ompi], [AC_CONFIG_FILES([test/monitoring/Makefile])])
+m4_ifdef([project_ompi], [
+    m4_ifdef([MCA_BUILD_ompi_pml_monitoring_DSO_TRUE],
+        [AC_CONFIG_LINKS(test/monitoring/profile2mat.pl:test/monitoring/profile2mat.pl
+            test/monitoring/aggregate_profile.pl:test/monitoring/aggregate_profile.pl)])])

 AC_CONFIG_FILES([contrib/dist/mofed/debian/rules],
                 [chmod +x contrib/dist/mofed/debian/rules])
49 changes: 19 additions & 30 deletions ompi/mca/coll/base/coll_base_find_available.c
@@ -2,7 +2,7 @@
  * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
  *                         University Research and Technology
  *                         Corporation.  All rights reserved.
- * Copyright (c) 2004-2005 The University of Tennessee and The University
+ * Copyright (c) 2004-2017 The University of Tennessee and The University
  *                         of Tennessee Research Foundation.  All rights
  *                         reserved.
  * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
@@ -46,9 +46,6 @@
 static int init_query(const mca_base_component_t * ls,
                       bool enable_progress_threads,
                       bool enable_mpi_threads);
-static int init_query_2_0_0(const mca_base_component_t * ls,
-                            bool enable_progress_threads,
-                            bool enable_mpi_threads);

 /*
  * Scan down the list of successfully opened components and query each of
Expand Down Expand Up @@ -105,6 +102,20 @@ int mca_coll_base_find_available(bool enable_progress_threads,
}


/*
* Query a specific component, coll v2.0.0
*/
static inline int
init_query_2_0_0(const mca_base_component_t * component,
bool enable_progress_threads,
bool enable_mpi_threads)
{
mca_coll_base_component_2_0_0_t *coll =
(mca_coll_base_component_2_0_0_t *) component;

return coll->collm_init_query(enable_progress_threads,
enable_mpi_threads);
}
/*
* Query a component, see if it wants to run at all. If it does, save
* some information. If it doesn't, close it.
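The init_query_2_0_0 helper moved by this hunk casts the generic component pointer to the coll-specific component type and then calls its collm_init_query callback. A standalone sketch of that downcast-and-dispatch idiom is given here. All type and function names in it are hypothetical stand-ins and not Open MPI definitions, and it assumes, as the cast implies, that the generic component is the first member of the specific one.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a generic component descriptor. */
typedef struct base_component {
    const char *name;
} base_component_t;

/* Hypothetical framework-specific component. The generic part must be
 * the first member so that a pointer to it is also a valid pointer to
 * the enclosing structure. */
typedef struct coll_component {
    base_component_t base;
    int (*init_query)(bool enable_progress_threads, bool enable_mpi_threads);
} coll_component_t;

/* Downcast the generic pointer and forward the query to the callback. */
static inline int query_component(const base_component_t *component,
                                  bool enable_progress_threads,
                                  bool enable_mpi_threads)
{
    const coll_component_t *coll = (const coll_component_t *) component;

    return coll->init_query(enable_progress_threads, enable_mpi_threads);
}

/* Example callback: accepts any configuration without MPI threads. */
static int no_mpi_threads_only(bool enable_progress_threads,
                               bool enable_mpi_threads)
{
    (void) enable_progress_threads;
    return enable_mpi_threads ? -1 : 0;
}
```

Defining the helper before its only caller, as the PR does, also removes the need for the forward declaration deleted in the earlier hunk.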
Expand Down Expand Up @@ -138,33 +149,11 @@ static int init_query(const mca_base_component_t * component,
}

/* Query done -- look at the return value to see what happened */

if (OMPI_SUCCESS != ret) {
opal_output_verbose(10, ompi_coll_base_framework.framework_output,
"coll:find_available: coll component %s is not available",
component->mca_component_name);
} else {
opal_output_verbose(10, ompi_coll_base_framework.framework_output,
"coll:find_available: coll component %s is available",
component->mca_component_name);
}

/* All done */
opal_output_verbose(10, ompi_coll_base_framework.framework_output,
"coll:find_available: coll component %s is %savailable",
component->mca_component_name,
(OMPI_SUCCESS == ret) ? "": "not ");

return ret;
}


/*
* Query a specific component, coll v2.0.0
*/
static int init_query_2_0_0(const mca_base_component_t * component,
bool enable_progress_threads,
bool enable_mpi_threads)
{
mca_coll_base_component_2_0_0_t *coll =
(mca_coll_base_component_2_0_0_t *) component;

return coll->collm_init_query(enable_progress_threads,
enable_mpi_threads);
}
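The hunk above collapses two nearly identical opal_output_verbose calls into one by putting a %s in front of "available" and passing either "" or "not " for it. The same technique is shown here with plain snprintf so it can be compiled on its own. SKETCH_SUCCESS and format_availability are hypothetical names used only for this illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical success code standing in for OMPI_SUCCESS. */
#define SKETCH_SUCCESS 0

/*
 * Write the availability message for a component into buf.
 * A single format string serves both outcomes: the "%s" before
 * "available" expands to "" on success and to "not " on failure.
 * Returns the value of snprintf.
 */
static int format_availability(char *buf, size_t len,
                               const char *component_name, int ret)
{
    return snprintf(buf, len,
                    "coll:find_available: coll component %s is %savailable",
                    component_name,
                    (SKETCH_SUCCESS == ret) ? "" : "not ");
}
```

Keeping one format string means the two messages cannot drift apart when the wording is edited later.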
53 changes: 53 additions & 0 deletions ompi/mca/coll/monitoring/Makefile.am
@@ -0,0 +1,53 @@
+#
+# Copyright (c) 2016 Inria.  All rights reserved.
+# $COPYRIGHT$
+#
+# Additional copyrights may follow
+#
+# $HEADER$
+#
+
+monitoring_sources = \
+    coll_monitoring.h \
+    coll_monitoring_allgather.c \
+    coll_monitoring_allgatherv.c \
+    coll_monitoring_allreduce.c \
+    coll_monitoring_alltoall.c \
+    coll_monitoring_alltoallv.c \
+    coll_monitoring_alltoallw.c \
+    coll_monitoring_barrier.c \
+    coll_monitoring_bcast.c \
+    coll_monitoring_component.c \
+    coll_monitoring_exscan.c \
+    coll_monitoring_gather.c \
+    coll_monitoring_gatherv.c \
+    coll_monitoring_neighbor_allgather.c \
+    coll_monitoring_neighbor_allgatherv.c \
+    coll_monitoring_neighbor_alltoall.c \
+    coll_monitoring_neighbor_alltoallv.c \
+    coll_monitoring_neighbor_alltoallw.c \
+    coll_monitoring_reduce.c \
+    coll_monitoring_reduce_scatter.c \
+    coll_monitoring_reduce_scatter_block.c \
+    coll_monitoring_scan.c \
+    coll_monitoring_scatter.c \
+    coll_monitoring_scatterv.c
+
+if MCA_BUILD_ompi_coll_monitoring_DSO
+component_noinst =
+component_install = mca_coll_monitoring.la
+else
+component_noinst = libmca_coll_monitoring.la
+component_install =
+endif
+
+mcacomponentdir = $(ompilibdir)
+mcacomponent_LTLIBRARIES = $(component_install)
+mca_coll_monitoring_la_SOURCES = $(monitoring_sources)
+mca_coll_monitoring_la_LDFLAGS = -module -avoid-version
+mca_coll_monitoring_la_LIBADD = \
+    $(OMPI_TOP_BUILDDIR)/ompi/mca/common/monitoring/libmca_common_monitoring.la
+
+noinst_LTLIBRARIES = $(component_noinst)
+libmca_coll_monitoring_la_SOURCES = $(monitoring_sources)
+libmca_coll_monitoring_la_LDFLAGS = -module -avoid-version