-
Notifications
You must be signed in to change notification settings - Fork 895
Remove inter-dependencies between OSC modules. #4527
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Can one of the admins verify this patch? |
OSC_MONITORING_MODULE_TEMPLATE_GENERATE(portals4, ompi_osc_portals4_module_t, comm) | ||
#undef GET_MODULE | ||
#endif /* OMPI_WITH_OSC_PORTALS4 */ | ||
OSC_MONITORING_MODULE_TEMPLATE_GENERATE(rdma) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still feels pretty brittle (although it is a step up). There probably not a generic solution, but a less-brittle version would be to have an autogenerated file from configure with these OSC_MONITORING_MODULE_TEMPLATE_GENERATE() statements, which was generated during configure time by looping over mca_ompi_osc_m4_config_component_list
and mca_ompi_osc_no_config_component_list
.
As an example, I notice that there's no template for the ucx
component.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed I have not came back for a few few months, and I missed this new component. You are totally right on how careful this whole component have to be handle. I had not thought of this m4 solution, but that's a very good idea.
ok to test |
5d6207a
to
f058ac3
Compare
Another possible approach is to unify the different OSC modules structs by moving the ompi_communicator_t pointer from the specialized modules down into the base ompi_osc_base_module_t structure (in MPI every window has it's corresponding communicator). |
Maybe this PR should not be merged yet, as it may be followed by another change to remove the introduced hashtable. struct ompi_osc_base_module_3_0_0_t {
struct ompi_communicator_t *comm;
ompi_osc_base_module_win_shared_query_fn_t osc_win_shared_query;
ompi_osc_base_module_win_attach_fn_t osc_win_attach;
ompi_osc_base_module_win_detach_fn_t osc_win_detach;
ompi_osc_base_module_free_fn_t osc_free;
ompi_osc_base_module_put_fn_t osc_put;
ompi_osc_base_module_get_fn_t osc_get;
ompi_osc_base_module_accumulate_fn_t osc_accumulate;
ompi_osc_base_module_compare_and_swap_fn_t osc_compare_and_swap;
ompi_osc_base_module_fetch_and_op_fn_t osc_fetch_and_op;
ompi_osc_base_module_get_accumulate_fn_t osc_get_accumulate;
ompi_osc_base_module_rput_fn_t osc_rput;
ompi_osc_base_module_rget_fn_t osc_rget;
ompi_osc_base_module_raccumulate_fn_t osc_raccumulate;
ompi_osc_base_module_rget_accumulate_fn_t osc_rget_accumulate;
ompi_osc_base_module_fence_fn_t osc_fence;
ompi_osc_base_module_start_fn_t osc_start;
ompi_osc_base_module_complete_fn_t osc_complete;
ompi_osc_base_module_post_fn_t osc_post;
ompi_osc_base_module_wait_fn_t osc_wait;
ompi_osc_base_module_test_fn_t osc_test;
ompi_osc_base_module_lock_fn_t osc_lock;
ompi_osc_base_module_unlock_fn_t osc_unlock;
ompi_osc_base_module_lock_all_fn_t osc_lock_all;
ompi_osc_base_module_unlock_all_fn_t osc_unlock_all;
ompi_osc_base_module_sync_fn_t osc_sync;
ompi_osc_base_module_flush_fn_t osc_flush;
ompi_osc_base_module_flush_all_fn_t osc_flush_all;
ompi_osc_base_module_flush_local_fn_t osc_flush_local;
ompi_osc_base_module_flush_local_all_fn_t osc_flush_local_all;
};
typedef struct ompi_osc_base_module_3_0_0_t ompi_osc_base_module_3_0_0_t;
typedef ompi_osc_base_module_3_0_0_t ompi_osc_base_module_t; or struct ompi_osc_base_module_3_0_0_fn_t {
ompi_osc_base_module_win_shared_query_fn_t osc_win_shared_query;
ompi_osc_base_module_win_attach_fn_t osc_win_attach;
ompi_osc_base_module_win_detach_fn_t osc_win_detach;
ompi_osc_base_module_free_fn_t osc_free;
ompi_osc_base_module_put_fn_t osc_put;
ompi_osc_base_module_get_fn_t osc_get;
ompi_osc_base_module_accumulate_fn_t osc_accumulate;
ompi_osc_base_module_compare_and_swap_fn_t osc_compare_and_swap;
ompi_osc_base_module_fetch_and_op_fn_t osc_fetch_and_op;
ompi_osc_base_module_get_accumulate_fn_t osc_get_accumulate;
ompi_osc_base_module_rput_fn_t osc_rput;
ompi_osc_base_module_rget_fn_t osc_rget;
ompi_osc_base_module_raccumulate_fn_t osc_raccumulate;
ompi_osc_base_module_rget_accumulate_fn_t osc_rget_accumulate;
ompi_osc_base_module_fence_fn_t osc_fence;
ompi_osc_base_module_start_fn_t osc_start;
ompi_osc_base_module_complete_fn_t osc_complete;
ompi_osc_base_module_post_fn_t osc_post;
ompi_osc_base_module_wait_fn_t osc_wait;
ompi_osc_base_module_test_fn_t osc_test;
ompi_osc_base_module_lock_fn_t osc_lock;
ompi_osc_base_module_unlock_fn_t osc_unlock;
ompi_osc_base_module_lock_all_fn_t osc_lock_all;
ompi_osc_base_module_unlock_all_fn_t osc_unlock_all;
ompi_osc_base_module_sync_fn_t osc_sync;
ompi_osc_base_module_flush_fn_t osc_flush;
ompi_osc_base_module_flush_all_fn_t osc_flush_all;
ompi_osc_base_module_flush_local_fn_t osc_flush_local;
ompi_osc_base_module_flush_local_all_fn_t osc_flush_local_all;
};
typedef struct ompi_osc_base_module_3_0_0_fn_t ompi_osc_base_module_3_0_0_fn_t;
struct ompi_osc_base_module_3_0_0_t {
struct ompi_communicator_t *comm;
ompi_osc_base_module_3_0_0_fn_t module;
};
typedef ompi_osc_base_module_3_0_0_t ompi_osc_base_module_t; |
OS-X is not happy:
|
The IBM CI (GNU Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/5692e25afdb5edea04a1451fad841cf2 |
The IBM CI (PGI Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/a6c633a1019927effca8fdabc35289eb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of these commits should be squashed.
8726fd3
to
2011660
Compare
ompi/mca/osc/monitoring/Makefile.am
Outdated
@@ -17,7 +17,8 @@ monitoring_sources = \ | |||
osc_monitoring_active_target.h \ | |||
osc_monitoring_dynamic.h \ | |||
osc_monitoring_module.h \ | |||
osc_monitoring_template.h | |||
osc_monitoring_template.h \ | |||
osc_monitoring_template_gen.h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
template_gen.h shouldn't be in monitoring_sources. Since it's generated at every run of configure, it shouldn't be distributed in the tarball (which monitoring_sources will be). You probably need to add a nodist_mca_osc_monitoring_la_SOURCES line (I think that's right, my Automake foo is getting rusty).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in commit 2011660 .
I'm not a automake guru, but it seems like adding osc_monitoring_template_gen.h to the nodist_mcacomponent_HEADERS target worked as intended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I saw that in the next commit (I review commits independently, because changes should be bisectable). You need to merge those two commits into one before merging this PR to master.
ompi/mca/osc/monitoring/configure.m4
Outdated
done | ||
echo '' >>[$1] | ||
cat <<EOF >>[$1] | ||
return OMPI_ERR_NOT_SUPPORTED; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to say that this shell script is really ugly to read. It would be a lot clearer (in my opinion) if you didn't try to get the else if on the same line and just had some slightly uglier code generated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to say I totally agree with you :-D And it actually led to issue with OS-X, and replacing "echo -n" with "printf" did not worked either. Commit 2011660 fixes the issue. And at this point I start feeling like I should not have kept the wrong commits, and only keep once everything was fixed. I'll do that tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option you might want to do here is not generate the inline code here in the .h
file, but rather just instantiate an argv-style array of the component names. That's the only thing that needs to be generated (the array of component names), right? Then the code can be beautifully indented in a .c
file. 😄
ompi/mca/osc/monitoring/configure.m4
Outdated
[$1], | ||
[$2]) | ||
|
||
mca_ompi_osc_monitoring_generate_templates([$OMPI_TOP_SRCDIR/ompi/mca/osc/monitoring/osc_monitoring_template_gen.h], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised this worked with "make distcheck"; you should be writing into the build directory, not the source directory. You actually shouldn't need the prefix ($OMPI_TOP_SRCDIR/
) at all; configure is by definition run at the top of the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It did not. I think the cleancheck failed (but I did not know that at that point. I fought latter with targets to make it succeed. And I reached beyond despair at some point... But now it seems fine)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sigh; this commit should be squashed into the previous one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be squashed into 08ec754
As we can use the group to translate ranks why do we need the knowledge about the other OSC modules ? |
The OSC component specific module is kept in the window. As it keeps all the structured data needed by a component to be effective, it cannot be replaced in the window structure, because some component are expecting these information to be there, and are accessing directly through the window given as parameter. I would have love to "extend" these module to add my own informations, but I would have to be able to know their module sizes, and thus, I would have to include their module headers. Otherwise I could create a module big enough to contain any OSC module but that is quite fragile and it would get us back to the original issue that had me to create this PR. |
704d6e1
to
8127b2d
Compare
bot:ompi:retest |
Everything here is squashed into two commits. The first one does the first quick solution that translate rank from windows' group to MPI_COMM_WORLD rank. The second commit enables the automatic generation at config time. IMO, it is relevant to keep these two commits separated. From the differents tests I've made, all seems to work smoothly. As the monitoring was introduced into v3.1, this PR may be also added to the target v3.1.x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you get a chance, it would be awesome to fix the one Makefile bit, but let's ship it...
ompi/mca/osc/monitoring/Makefile.am
Outdated
|
||
noinst_LTLIBRARIES = $(component_noinst) | ||
libmca_osc_monitoring_la_SOURCES = $(monitoring_sources) | ||
libmca_osc_monitoring_la_LDFLAGS = -module -avoid-version | ||
|
||
# C file depends on the header file generated at configure time | ||
$(srcdir)/osc_monitoring_component.c: $(builddir)/osc_monitoring_template_gen.h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of this rule? You don't have any generating rules, so all it's doing is expressing a dependency, which should be picked up in dependency generation. It's not harmful, but it's odd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expressing the dependency was the only point. I wasn't sure whether or not it would have been expressed otherwise. I think I tried to check that if the generated .h file was edited by hand (should not be, but who knows...) osc_monitoring_component.c was compiled again, and I think it was not. If it should be here, I'll remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can look at the .deps/... file to make sure, but the compiler should generate a dependency line. In general we don't add extra dependency information unless we're adding a build rule. So it's not required, but it's also not harmful. If you can change it, that would be great. But I'm also fine if you change it in a follow-up commit.
@jsquyres, I think @clementFoyer has addressed your squash concern; mind dismissing your review (or doing another review)? |
8127b2d
to
fd1b6c4
Compare
There was a prototype of a function that became useless and needed to be removed. I did it and removed the useless dependency explicitation from Makefile.am |
ompi/mca/osc/monitoring/configure.m4
Outdated
#define MCA_OSC_MONITORING_GEN_TEMPLATE_H | ||
|
||
#include <string.h> | ||
#include <ompi_config.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<ompi_config.h>
should be first.
ompi/mca/osc/monitoring/configure.m4
Outdated
AC_CONFIG_COMMANDS([$1], | ||
[filename="$1" | ||
components=`echo "$2" | sed -e 's/,/ /g' -e 's/monitoring//'` | ||
cat <<EOF >$filename |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit: this indenting is inconsistent with the indenting below (i.e., the indenting below is nice 4-space indenting, but this is much larger than that).
ompi/mca/osc/monitoring/configure.m4
Outdated
# | ||
AC_DEFUN([mca_ompi_osc_monitoring_generate_templates],[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
m4 macros as usually named with upper case, so that we know they are macros.
/* Include template generating macros */ | ||
#include "osc_monitoring_template.h" | ||
/**************************************/ | ||
/* Include templated macros */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not just macros, right? There's also inline functions.
ompi/mca/osc/monitoring/configure.m4
Outdated
done | ||
echo '' >>[$1] | ||
cat <<EOF >>[$1] | ||
return OMPI_ERR_NOT_SUPPORTED; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option you might want to do here is not generate the inline code here in the .h
file, but rather just instantiate an argv-style array of the component names. That's the only thing that needs to be generated (the array of component names), right? Then the code can be beautifully indented in a .c
file. 😄
fd1b6c4
to
fe0c02c
Compare
fe0c02c
to
70d6dfa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason that iterating over the already-existing ompi_osc_base_framework.framework_components
list isn't sufficient such that you need to build your own? I'm sorry I didn't make that clear in my initial review...
Also, you don't need to keep all the commits in this PR.
The 1st one adds some hard-coded component names that you then remove in later commits. That kind of stuff can be squashed together so that we only merge the final/desired/good code (multiple commits on a PR is fine -- it's just a little silly to have ABC code in commit X, and then remove ABC code in commit Y on the same PR. Know what I mean?).
@@ -40,10 +40,10 @@ | |||
|
|||
/* Define and set the ompi_osc_monitoring_## template ##_template | |||
* variable. The functions recorded here are linked to the original | |||
* functions of the original {template} module that were replaced. | |||
* functions of the original {template} module that weas replaced. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo.
70d6dfa
to
48221f7
Compare
Not sure where you ask about the |
@clementFoyer This list should already exist and be fully populated by the time your monitoring component is opened. I.e., you shouldn't have to do anything to create this list -- you should be able to just use it. It should contain the names of all the components that are available (but not necessarily selected). |
I wouldn't have squashed all the commits into one commit (I disagree with Jeff there). But since you did, you should make the commit message read like one commit. I'm not sure I understand Jeff's comment about iterating over the framework list, because that's a run-time construct. Since the code needs the list at compile time (to build the templates), that list won't do what you want. Now, there's the question of whether you still need the templates, but I'm pretty sure the answer is yes, because you need to know the size of the module for the underlying transport in order to do the replace logic. So I think this is the best you can do... |
If needed I still have a version somewhere where everything is not squashed, if that would be required. Two things about the list : I need the list to omit the monitoring component, and it is used at compilation time as it is used in the macros. Saddly, I still need the templates as I cannot simply replace the modules in the windows structures. Without it, I cannot know wich was the underlying osc omponent. I originally wanted to be able to replace the module pointers by inheriting from them, but that would require whether to know the size of the module, or to include the other headers, which led to the original issue. |
Sorry, I wasn't advocating for squashing into a single commit; multiple commits are both fine and good. But I do dislike seeing:
I also agree with Brian -- I forgot to mention that one of your commit messages looks like a squash of 2 commit messages. If you're going to squash, please make the commit message look like a single commit message. In terms of iterating over the list of components, let me go read the code again; perhaps I misunderstood what it was trying to do... |
Ok, I understand your code now -- you did it all at compile time with macros and m4 (I think my bias would have been to do it at run time iterating over the framework component list and swapping around function pointers, which is why my brain got confused when I read your code). |
BTW, I still left the "X" review so that the commits can be cleaned up per Brian's and my comments. Thanks! |
48221f7
to
efecde9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, I missed that a new commit was put here to be reviewed. I have super minor nit-pick requests. Other than that, I think this is good.
@@ -94,6 +94,13 @@ static inline int mca_common_monitoring_get_world_rank(int dst, struct ompi_comm | |||
return ret; | |||
} | |||
|
|||
/* Translate the rank from the given communicator of a process to its rank in MPI_COMM_RANK. */ | |||
static inline int mca_common_monitoring_get_world_rank(int dst, struct ompi_communicator_t*comm, | |||
int*world_rank) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why rename this function, and leave a 1-line inline alias function to get to the new function? Why not just call the new function name directly?
ompi/mca/osc/monitoring/configure.m4
Outdated
# | ||
# Additional copyrights may follow | ||
# Append to file $1 an array of components by listing component names in $2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify this comment: this function does not append to the end of an existing file -- it overwrites file $1.
ompi/mca/osc/monitoring/Makefile.am
Outdated
|
||
noinst_LTLIBRARIES = $(component_noinst) | ||
libmca_osc_monitoring_la_SOURCES = $(monitoring_sources) | ||
libmca_osc_monitoring_la_LDFLAGS = -module -avoid-version | ||
|
||
distclean-local: | ||
rm -f $(builddir)/osc_monitoring_template_gen.h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit: you can add this file to DISTCLEANFILES
and not have to make a distclean-local
rule.
The osc monitoring component needed to include other OSC components header in order to be able tu access communicator through the component specific ompi_osc_*_module_t structures. This commit remove the dependency, and resolve the issue open-mpi#4523. Extend the common monitoring API. * Now it's possible to translate from local rank to world rank from both the communicator and the group. * Remove useless hashtable as we directly use the w_group contained in window structure. Add automatic generation at config time. The templates are expanded at configure time. It creates a new header file that generates all the variables/functions needed. Adding this during the autogen automagicaly generates for each of the available modules the proper functions. Only keep a generated argv-style array. Following Jeff's advice, the configure.m4 file generate a simple array of module variables to be iterated over to find the proper module. Signed-off-by: Clement Foyer <clement.foyer@inria.fr>
efecde9
to
f5b4fc0
Compare
@jsquyres Every thing seems to pass. The requested changes have been made. |
@bwbarrett A bunch of things have changed since you reviewed. Care to review again, or are you good / we should merge? |
I'll take Brian's silence and his prior approval as a positive sign. Let's merge. Thanks for your patience, @clementFoyer! |
The osc monitoring component needed to include other OSC components header in order to be able to access communicator through the component specific ompi_osc_*_module_t structures. Now we keep a hashtable to keep track of the window specific communicator. This commit remove the dependency, and resolve the issue #4523 opened by @tkordenbrock .
@bosilca @jsquyres Please review.