Topic/monitoring #3109

Merged
merged 83 commits into from
Jun 26, 2017


bosilca
Member

@bosilca bosilca commented Mar 6, 2017

Add a new capability to Open MPI. With this patch, OMPI gains the capability to monitor all communications and to dump the communication heat map (including differences between internal and user communications) for point-to-point, collective, and RMA operations. The entire framework is driven through the new, MPI Forum-blessed MPI_T interface.

Fully fledged documentation and scripts to manipulate the generated data are also provided.

collectives are already decomposed in send and receive calls.

The monitoring is strored internally by each process and output on stderr at the end of the
>>>>>>> Add a monitoring PML. This PML track all data exchanges by the processes
Member


merge leftovers

Member Author


thanks. fixed.

@bosilca bosilca force-pushed the topic/monitoring branch 13 times, most recently from 3ad72ce to 869b60e Compare March 7, 2017 03:22
@bosilca bosilca added this to the v3.0.0 milestone Mar 7, 2017
@hjelmn
Member

hjelmn commented Mar 8, 2017

:bot:retest:

@hppritcha
Member

lanl distcheck doesn't like this

make[3]: Entering directory `/home/hppritcha/jenkins/workspace/ompi_master_pr_distcheck/openmpi-gitclone/_build/ompi/mca/common/monitoring'
  CC       libmca_common_monitoring_la-common_monitoring.lo
  CC       libmca_common_monitoring_la-common_monitoring_coll.lo
  LN_S     libmca_common_monitoring.la
../../../../../ompi/mca/common/monitoring/common_monitoring.c:17:31: fatal error: common_monitoring.h: No such file or directory
 #include "common_monitoring.h"
                               ^
compilation terminated.
../../../../../ompi/mca/common/monitoring/common_monitoring_coll.c:17:31: fatal error: common_monitoring.h: No such file or directory
 #include "common_monitoring.h"
                               ^
compilation terminated.

@hppritcha
Member

botany bay doesn't like this either:

  CC       monitoring_prof.lo
monitoring_prof.c:385:6: warning: no previous prototype for function 'monitoring_prof_mpi_init_f2c' [-Wmissing-prototypes]
void monitoring_prof_mpi_init_f2c( MPI_Fint *ierr ) { 
     ^
monitoring_prof.c:394:6: warning: no previous prototype for function 'monitoring_prof_mpi_finalize_f2c' [-Wmissing-prototypes]
void monitoring_prof_mpi_finalize_f2c( MPI_Fint *ierr ) { 
     ^
In file included from monitoring_prof.c:417:
../../ompi/mpi/fortran/mpif-h/bindings.h:67:2: error: Unrecognized Fortran name mangling scheme
#error Unrecognized Fortran name mangling scheme
 ^
monitoring_prof.c:424:28: error: expected identifier
                           (MPI_Fint *ierr),
                           ^
monitoring_prof.c:427:1: error: expected function body after function declarator
OMPI_GENERATE_F77_BINDINGS (MPI_FINALIZE,
^
2 warnings and 3 errors generated.
make[2]: *** [monitoring_prof.lo] Error 1
make[1]: *** [install-recursive] Error 1
make: *** [install-recursive] Error 1
Build step 'Execute shell' marked build as failure

@hppritcha
Member

the sign-off checker doesn't like a lot of these commits either.
Please squash down at least some of the commits in this PR.

@bosilca
Member Author

bosilca commented Mar 8, 2017

I don't see how it cannot find the "common_monitoring.h" file. We are supposed to automatically include "." so the file should be readily available.

I am really sorry about the sign-off checker. The commits will be squashed anyway once merged.

@ggouaillardet
Contributor

@bosilca that was just a typo that causes common_monitoring.h to be missing from the tarball.
Another error is related to the lack of weak symbols and no Fortran bindings.

These should both be fixed in #3131.
Once it passes CI, I will open a PR against your branch so you can simply merge it.

@bosilca
Member Author

bosilca commented Mar 9, 2017

Thanks @ggouaillardet, I hadn't noticed the failure was due to "make dist".

@ggouaillardet
Contributor

@bosilca i filed bosilca#2 to fix all the glitches with this PR

@bosilca
Member Author

bosilca commented Mar 9, 2017

Thanks @ggouaillardet, I merged your PR.

@ggouaillardet
Contributor

@bosilca i filed bosilca#3 to silence some misc warnings

@hppritcha hppritcha modified the milestones: Future, v3.0.0 Mar 13, 2017
clementFoyer and others added 25 commits June 26, 2017 14:24
    * Fix monitoring_coll_data type definition.
    * Use size(COMM_WORLD)-1 to determine max number of digits.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
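
Editorial note on the commit above: ranks in a communicator run from 0 to size - 1, so the widest rank that can be printed is size - 1, not size. A minimal sketch of that digit count follows. The function name `max_rank_digits` is hypothetical and is not taken from the patch; the actual code in the monitoring component may compute this differently.

```c
#include <assert.h>

/* Hypothetical helper (not from the PR): number of decimal digits needed
 * to print the largest rank of a communicator with comm_size processes.
 * Ranks run from 0 to comm_size - 1, hence the "- 1". */
int max_rank_digits(int comm_size)
{
    assert(comm_size >= 1);

    int max_rank = comm_size - 1;
    int digits = 1;               /* rank 0 still needs one digit */

    while (max_rank >= 10) {
        max_rank /= 10;
        digits++;
    }
    return digits;
}
```

With 10 processes the largest rank is 9, so one digit is enough; using the size itself would reserve two. The result could be used, for example, as a field width when building per-rank file names or aligned output.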
…g_prof

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
…t updating the documentation

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
…o the added pvar. Fix monitoring array allocation.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
Update and then move the latex and README documentation to a more logical place

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
…ntation accordingly.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
…I_Gather).

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
… footprint of monitoring_prof. Unify OSC related outputs.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
Add error file output. Add MPI_Put and MPI_Get results analysis. Add overhead computation for complete sending (pingpong / 2).

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
…copyrights to the test_overhead script

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
Include ompi/mpi/fortran/mpif-h/bindings.h only if needed.
Add sanity check before emptying hashtable.
Fix typos in documentation.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
* test/monitoring: fix test when weak symbols are not available
* monitoring: fix a typo and add a missing file in Makefile.am
and have monitoring_common.h and monitoring_common_coll.h included in the distro
* test/monitoring: cleanup all tests and make distclean a happy panda
* test/monitoring: use gettimeofday() if clock_gettime() is unavailable
* monitoring: silence misc warnings (#3)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
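
Editorial note on the `gettimeofday()` fallback listed above: the usual pattern is to select the timer at compile time. The sketch below assumes a configure-style macro named `HAVE_CLOCK_GETTIME` and a helper named `now_ns`; both names are illustrative assumptions, and the test code in the PR may be structured differently.

```c
#define _POSIX_C_SOURCE 200809L   /* request clock_gettime() declarations */

#include <stdint.h>
#include <stddef.h>
#include <time.h>
#include <sys/time.h>

/* Hypothetical helper (not from the PR): current time in nanoseconds.
 * HAVE_CLOCK_GETTIME is assumed to be set by the build system when
 * clock_gettime() is available; otherwise gettimeofday() is used. */
uint64_t now_ns(void)
{
#if defined(HAVE_CLOCK_GETTIME)
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
#else
    /* Fallback: microsecond resolution, wall-clock time. */
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * 1000000000ULL
         + (uint64_t)tv.tv_usec * 1000ULL;
#endif
}
```

The fallback has coarser resolution and follows the wall clock, so it can be affected by clock adjustments; that is generally acceptable for a test that only needs approximate timings.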
Keep the size_t used across all monitoring components.
Adapt the documentation.
Remove useless MPI_Request and MPI_Status from monitoring_test.c.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
Use ptrdiff_t instead of OPAL_PTRDIFF_TYPE to reflect the changes from commit fa5cd0d.

Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
@bosilca bosilca force-pushed the topic/monitoring branch from 68c0163 to cfb6dec Compare June 26, 2017 12:25
Also install them in the build and the install directories.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
@bosilca bosilca force-pushed the topic/monitoring branch from cfb6dec to d87cba3 Compare June 26, 2017 13:39
@bosilca bosilca merged commit d55b666 into open-mpi:master Jun 26, 2017
@bosilca bosilca deleted the topic/monitoring branch June 26, 2017 16:21
8 participants