MTL OFI: add support for FI_REMOTE_CQ_DATA. #5004
I can't really review the majority of this code, but I did have a small number of comments.
ompi/mca/mtl/ofi/mtl_ofi.h
if (OPAL_UNLIKELY(0 > ret)) {
    char *fi_api;
    if (ompi_mtl_ofi.fi_cq_data) {
        asprintf( &fi_api, "fi_tinjectdata");
Any reason not to use a string constant here? I know this is an error path, and it doesn't really matter, but it might be slightly simpler to:
const char *fi_api = ompi_mtl_ofi.fi_cq_data ? "fi_tinjectdata" : "fi_tinject";
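A minimal sketch of the suggestion, written as standalone functions so it compiles on its own; tinject_api_name() and report_send_error() are hypothetical names, not the MTL's actual error path. Because string literals have static storage duration, a const char * can point at one directly, which removes both the asprintf() allocation and the free() it would require.

```c
#include <stdbool.h>
#include <stdio.h>

/* Pick the name of the libfabric call that failed. String literals have
 * static storage duration, so no asprintf()/free() pair is needed. */
static const char *tinject_api_name(bool use_cq_data)
{
    return use_cq_data ? "fi_tinjectdata" : "fi_tinject";
}

/* Hypothetical stand-in for the MTL's error reporting. */
void report_send_error(bool use_cq_data, int ret)
{
    fprintf(stderr, "%s failed: %d\n", tinject_api_name(use_cq_data), ret);
}
```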
No limitation and indeed looks nicer. Will update.
config/opal_check_ofi.m4
fi

AC_DEFINE_UNQUOTED([MTL_OFI_ALTERNATIVE_DEFAULT_TAG],$ALTERNATIVE_TAG,
                   [Use ofi alternative no FI_REMOTE_CQ_DATA tag])
I don't quite understand what this configure CLI option is for. It sounds like if the provider does not support FI_REMOTE_CQ_DATA, you use a different bit mapping scheme for tag matching. Is that right?
If so, why isn't that just detected and used at run time? Why does it require a configure-time argument?
There is indeed runtime detection of FI_REMOTE_CQ_DATA, with a fallback to a default OFI tag field distribution (mtl_ofi_component.c, lines 415 to 458). However, the original implementation offered very few bits of the OFI tag to pack the source rank (16), considerably limiting scalability. @bosilca suggested having 2 options for the default/fallback OFI tag field distribution, selectable at configure time, with the "default default" having more bits for the source rank than for the user tag (as the openib BTL does). See mtl_ofi_types.h, lines 78-81. This option selects DEFAULT_2 at build time.
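As an illustration of how a configure-time define such as MTL_OFI_ALTERNATIVE_DEFAULT_TAG could switch between two fallback layouts, here is a sketch. The bit widths are hypothetical placeholders, chosen only so that each layout plus 4 protocol bits fills the 64-bit OFI tag; they are not the values in mtl_ofi_types.h.

```c
#include <stdint.h>

/* Hypothetical widths; the real values live in mtl_ofi_types.h. */
#define PROTOCOL_BITS 4

#ifndef MTL_OFI_ALTERNATIVE_DEFAULT_TAG
#define MTL_OFI_ALTERNATIVE_DEFAULT_TAG 0
#endif

#if MTL_OFI_ALTERNATIVE_DEFAULT_TAG
/* "Old" layout: full-width MPI tag, few bits for the source rank. */
#define CID_BITS    12
#define SOURCE_BITS 16
#define TAG_BITS    32
#else
/* Default layout: more bits for the source rank than for the MPI tag. */
#define CID_BITS    12
#define SOURCE_BITS 28
#define TAG_BITS    20
#endif

/* Either layout must use exactly the 64 bits of the OFI tag. */
_Static_assert(PROTOCOL_BITS + CID_BITS + SOURCE_BITS + TAG_BITS == 64,
               "fallback layout must fill the 64-bit OFI tag");

/* Number of distinct source ranks the selected layout can encode. */
static inline uint64_t max_addressable_ranks(void)
{
    return UINT64_C(1) << SOURCE_BITS;
}
```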
Is there a reason to keep the old / unscalable-number-of-bits-for-source default values?
Should the number of source bits be dynamically determined by the job size / size of MPI_COMM_WORLD?
> Is there a reason to keep the old / unscalable-number-of-bits-for-source default values?
Here is my understanding: OMPI "needs" 96 bits (32 x 3) for the source rank, communicator ID, and MPI tag. However, OFI offers 64 bits in the tag, plus 32 (or more) in FI_REMOTE_CQ_DATA when supported. When it is not supported, 64 bits is all you get and everything must fit. The "old_unscalable" tag was intended to provide the full range of MPI tags. A user who still needs this can build with --enable-mtl-ofi-alternative-tag.
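The 96-versus-64 arithmetic can be made concrete with a small packing routine. This sketch assumes a hypothetical fallback split of [4 protocol | 12 cid | 28 source | 20 tag] bits and illustrative function names; the point is that each field is truncated, so some valid MPI values have to be rejected.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fallback layout (no FI_REMOTE_CQ_DATA), high to low bits:
 * [4 protocol | 12 cid | 28 source | 20 tag]. */
enum { CID_BITS = 12, SOURCE_BITS = 28, TAG_BITS = 20 };
#define SOURCE_SHIFT     TAG_BITS
#define CID_SHIFT        (TAG_BITS + SOURCE_BITS)
#define FIELD_MASK(bits) ((UINT64_C(1) << (bits)) - 1)

/* Pack three 32-bit MPI values into one 64-bit tag. Returns false when a
 * value needs more bits than the layout reserves for its field. */
static bool pack_tag(uint32_t cid, uint32_t source, uint32_t tag,
                     uint64_t *out)
{
    if (cid > FIELD_MASK(CID_BITS) || source > FIELD_MASK(SOURCE_BITS) ||
        tag > FIELD_MASK(TAG_BITS)) {
        return false; /* 96 bits of input cannot always fit in 60 */
    }
    *out = ((uint64_t)cid << CID_SHIFT) |
           ((uint64_t)source << SOURCE_SHIFT) |
           (uint64_t)tag;
    return true;
}

/* Recover the source rank from a packed tag. */
static uint32_t unpack_source(uint64_t bits)
{
    return (uint32_t)((bits >> SOURCE_SHIFT) & FIELD_MASK(SOURCE_BITS));
}
```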
> Should the number of source bits be dynamically determined by the job size / size of MPI_COMM_WORLD?
Maybe, but there would still be cases that fall outside that logic when spawning new ranks and creating new communicators. An alternative might be reading the total number of available slots (is it actually possible to see this value in an MTL?), but we would still have to implement logic that either leaves some cases out (as long as 64 < 96 😮) or over-reserves for slots that are never used. So a build-time option may still be helpful. The question is: what do apps really need?
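A sketch of the dynamic-sizing idea: derive the width of the source field from the job size at startup. The helper is hypothetical (not part of this PR), and it only covers the ranks known at that moment, so it does not account for peers added later by spawn.

```c
#include <stdint.h>

/* Smallest number of bits that can encode every rank in [0, nranks).
 * Hypothetical helper; ranks added later (e.g. by spawn) are not counted. */
static unsigned bits_for_ranks(uint64_t nranks)
{
    unsigned bits = 0;
    while (bits < 64 && (UINT64_C(1) << bits) < nranks) {
        bits++;
    }
    return bits;
}
```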
@jsquyres Would it be possible to merge the PR as is? We have a few other patches we want to contribute that depend on this one. I understand and agree that a dynamic approach for defining the tag bits is the optimal way to go. However, a) most providers today support FI_REMOTE_CQ_DATA, and b) as George shared, the risk of the default tag not being enough is very low; the build-time flag is just an alternative for when that odd case happens. We can revisit the dynamic approach if we see the concern grow. Thanks,
Honestly, this is effectively your+Cray's module, so if you can get @hppritcha's approval, you can go ahead and merge. But I still think:
- Having a compile-time decision like this is somewhat un-OMPI-ish. IMHO, it would be more natural to have a decision like this be a run-time / MCA-var-driven decision. But I'm not quite sure what decision you're trying to get from the user -- isn't it all driven by what the underlying libfabric provider provides? I.e., if the fi provider supports 32 bits in FI_REMOTE_CQ_DATA, then you use scheme A. If it doesn't, you use scheme B. Does this need to be a user-driven decision at all?
- Does the OFI MTL convey the max supported tag value back up to the CM PML?
I agree that dynamically sizing the number of bits based on the size of the job could be considered outside the scope of this PR. Random question, though: does the OFI MTL support spawn (i.e., adding more peers)? If not, then you could consider the job size as constant.
I really don't like the name. It doesn't give any indication of why you might want to use the alternative logic. Much preferred would be --enable/disable-remote-cq-data-matching or some such.
Strictly speaking, it can be made a runtime decision. However, given the low chance of occurrence, @bosilca suggested making it a build-time option.
Clarification of the decision being made by the option (this also answers @bwbarrett's question): OMPI needs to send 96 bits with each message (32 for the source rank, 32 for the communicator ID, 32 for the user tag), but the OFI tag only offers 64. When the OFI provider supports FI_REMOTE_CQ_DATA (detected at runtime), there is no limitation, since 32 bits is the minimum required by the OFI spec. However, when it is not supported, some bits need to be trimmed (96 -> 64). That is when this option comes into play: it offers two alternative "fallback tags" that distribute the 64 available bits differently.
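A sketch of the FI_REMOTE_CQ_DATA case under a hypothetical layout: the source rank travels as 32 bits of remote CQ data (for example the data argument of fi_tinjectdata(), which the receiver reads from the data field of its completion entry), so the 64-bit OFI tag only has to hold protocol bits, the communicator ID and the MPI tag. Receives that name a specific source then rely on FI_DIRECTED_RECV rather than on tag bits. The struct, function and field widths are illustrative.

```c
#include <stdint.h>

/* Hypothetical layout when the provider supports FI_REMOTE_CQ_DATA. */
struct ofi_msg_bits {
    uint64_t tag;     /* [4 protocol | 28-bit cid | 32-bit MPI tag] */
    uint32_t cq_data; /* full 32-bit source rank, sent as remote CQ data */
};

static struct ofi_msg_bits make_bits_cq_data(uint32_t cid, uint32_t source,
                                             uint32_t mpi_tag)
{
    struct ofi_msg_bits b;
    /* cid is truncated to 28 bits; the MPI tag keeps all 32 bits. */
    b.tag = ((uint64_t)(cid & 0x0FFFFFFFu) << 32) | mpi_tag;
    b.cq_data = source; /* no truncation: every 32-bit rank fits */
    return b;
}
```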
Yes, this is shown in the comments below.
> Random question, though: does the OFI MTL support spawn (i.e., adding more peers)?
Yes.
ompi/mca/mtl/ofi/mtl_ofi.c
(1UL << 30), /* max tag value - must allow negatives */
(int)((1ULL << MTL_OFI_CID_BIT_COUNT ) - 1), /* max cid */
(int)(1ULL << (MTL_OFI_TAG_BIT_COUNT - 2)),/* max tag value - must allow negatives */
// (1UL << 30), /* max tag value - must allow negatives */
No need to commit a commented-out line like this.
I don't think OFI_TAG_BIT_COUNT is right; numbers can be negative. That doesn't mean - 2, that means / 2 for max tag value. I could be wrong; the allocation is hard to figure out. But pretty sure it's / 2.
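The arithmetic behind this comment, as a sketch with hypothetical helper names and assuming MTL_OFI_TAG_BIT_COUNT is the full width of a two's-complement tag field: reserving half of an N-bit field for negative tags leaves 2^(N-1) - 1 as the largest positive tag, while the expression in the patch yields 2^(N-2). For N = 32 the latter equals the old (1UL << 30) constant, so it is conservative and gives up one bit of positive range.

```c
#include <stdint.h>

/* Largest positive value of a two's-complement field of `bits` bits
 * (valid for 2 <= bits <= 63): 2^(bits-1) - 1. */
static int64_t max_signed_value(unsigned bits)
{
    return (int64_t)((UINT64_C(1) << (bits - 1)) - 1);
}

/* The expression used in the patch, for comparison: 2^(bits-2). */
static int64_t patch_max_tag(unsigned bits)
{
    return (int64_t)(UINT64_C(1) << (bits - 2));
}
```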
> No need to commit a commented-out line like this.
Oops, this slipped through. Will remove it.
> I don't think OFI_TAG_BIT_COUNT is right;
I agree! I asked about this on the devel list. I just took a conservative approach and followed what was already there. Let me test this further.
ompi/mca/mtl/ofi/mtl_ofi_types.h
} else { \
    match_bits |= (MTL_OFI_TAG_MASK & tag); \
} \
#define MTL_OFI_SET_RECV_BITS(match_bits, mask_bits, comm_id, source, tag) \
This is quite long for a macro. Is there a reason to not make this an inline function? (some of the others above are also a little long)
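For reference, a SET_RECV_BITS-style macro rewritten as a static inline function might look like the sketch below. The layout constants (EX_*) and the wildcard values are hypothetical stand-ins, not the definitions in mtl_ofi_types.h. Either form is expanded at each call site; the function version adds argument type checking and evaluates each argument exactly once.

```c
#include <stdint.h>

/* Hypothetical layout: [4 protocol | 12 cid | 28 source | 20 tag]. */
#define EX_TAG_BITS    20
#define EX_SOURCE_BITS 28
#define EX_TAG_MASK    ((UINT64_C(1) << EX_TAG_BITS) - 1)
#define EX_SOURCE_MASK (((UINT64_C(1) << EX_SOURCE_BITS) - 1) << EX_TAG_BITS)
#define EX_ANY_SOURCE  (-1) /* stand-in for MPI_ANY_SOURCE */
#define EX_ANY_TAG     (-1) /* stand-in for MPI_ANY_TAG */

/* Build the bits a posted receive must match and the bits it ignores. */
static inline void set_recv_bits(uint64_t *match_bits, uint64_t *mask_bits,
                                 uint32_t comm_id, int source, int tag)
{
    *mask_bits = 0;
    *match_bits = (uint64_t)comm_id << (EX_TAG_BITS + EX_SOURCE_BITS);

    if (EX_ANY_SOURCE == source) {
        *mask_bits |= EX_SOURCE_MASK; /* ignore the source field */
    } else {
        *match_bits |= ((uint64_t)source << EX_TAG_BITS) & EX_SOURCE_MASK;
    }

    if (EX_ANY_TAG == tag) {
        *mask_bits |= EX_TAG_MASK;    /* ignore the tag field */
    } else {
        *match_bits |= EX_TAG_MASK & (uint64_t)tag;
    }
}
```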
No really good reason; I just extended the existing approach. Does OMPI prefer inline functions over longer macros? Sure, I can change it.
I thought we were cautious about using inline for long operations because of the jitter it can cause as compilers can't optimize the instruction cache as much as usual? @bwbarrett ?
Macros vs. inline in this case is probably to-MAY-to vs. to-MAH-to: i.e., you're shoving a bunch of code inline, regardless of the mechanism.
To merge this patch, I think we should really document (in, say, a README.md in the mtl directory) how this all works. I'm confused, and it's all based on Portals ideas that I've been working on for years. A description of how matching works for the different modes supported in a README would go a long way towards maintainability.
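Any such README would likely start from the libfabric tagged-matching rule that all of these modes build on: a posted receive with tag recv_tag and ignore mask ignore matches an incoming message tag when every bit outside the ignore mask agrees. A minimal sketch of that rule (the function name is hypothetical; libfabric applies the rule internally):

```c
#include <stdbool.h>
#include <stdint.h>

/* A tagged receive matches an incoming tag when all bits that are not
 * set in `ignore` are equal. */
static bool ofi_tag_matches(uint64_t send_tag, uint64_t recv_tag,
                            uint64_t ignore)
{
    return (send_tag | ignore) == (recv_tag | ignore);
}
```

With a layout that puts the source rank above the MPI tag, MPI_ANY_SOURCE becomes "ignore the source field" and MPI_ANY_TAG becomes "ignore the tag field".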
ompi/mca/mtl/ofi/mtl_ofi.h
@@ -244,6 +244,7 @@ ompi_mtl_ofi_send_start(struct mca_mtl_base_module_t *mtl,
ompi_proc_t *ompi_proc = NULL;
mca_mtl_ofi_endpoint_t *endpoint = NULL;
ompi_mtl_ofi_request_t *ack_req = NULL; /* For synchronous send */
fi_addr_t src_addr=0;
src_addr = 0;
(spaces are important)
ompi/mca/mtl/ofi/mtl_ofi.h
src_addr = endpoint->peer_fiaddr;
} else {
MTL_OFI_SET_SEND_BITS(match_bits, comm->c_contextid,
comm->c_my_rank, tag);
I think the indentation is wrong here.
config/opal_check_ofi.m4
@@ -100,6 +101,17 @@ AC_DEFUN([OPAL_CHECK_OFI],[
AC_SUBST($1_CPPFLAGS)
AC_SUBST($1_LDFLAGS)
AC_SUBST($1_LIBS)
AC_ARG_ENABLE(mtl-ofi-alternative-tag,
This should be in the MTL's configure.m4, not the toplevel opal_check_ofi. This flag does nothing for OSC or a potential OOB, for example.
Makes sense, I will move.
I agree on adding a README.md. However, I would like to reach general agreement on the build-time option being discussed, and document it accordingly. Thanks for the feedback.
@matcabral, honestly, I am having a hard time sorting through the protocol differences without the README to comment on the high-level part of the software. That's why I poked at the README. I think the general idea is OK, although clearly I'd prefer runtime selection. We also need the non-cq_data case to work by default because of packaging in distros.
@bwbarrett @jsquyres I'm fine with moving it to a runtime option, but I would like some feedback on the following before making the change. AFAIK, the openib BTL only has 16 bits for the user tag and that has not been a problem. So, would it be safe to just follow the same approach for the OFI fallback tag (and not use any option at all)? I'm honestly trying not to add runtime options unless necessary, to avoid overwhelming users with settings that are only needed in very odd cases. README: I will create one and post it for review, but would appreciate some input on the above question. The non-cq_data case is definitely expected to work by default, provided that 16 bits are enough for the user tag. Alternatively, we can go back to the previous fallback tag that had more bits for the tag and fewer for the source rank. Thanks!
The BTL design pushes tag sizing into the OB1 PML. The OB1 PML has a max tag size of MAX_INT. The Portals4, PSM, and PSM2 MTLs all limit tags to 2^30. I have trouble seeing us being OK with a max tag of only 15 bits; yes, that's all the standard requires, but when Portals 3.3 used that as MPI_TAG_UB, there was considerable pushback from users.
@bwbarrett Thanks for the input; I will go ahead and create the runtime MCA parameters. Now, these are the two fallback distributions I'm proposing: Thoughts? Note that I am showing 2 more bits for the source than what was shared on this PR, since we optimized the sync send protocol to use only 2 bits (instead of 4). However, that is in a different patch. Thanks,
I think 18 bits for the source rank is so far above the average job size that I'd push to make that 2^15 or 2^16. The places that don't have cq_data can't scale as large; that's a reasonable tradeoff.
Just being paranoid and wanting to avoid confusion: both of the fallback tags I shared are for the case when cq_data is not supported. So, when no cq_data is available, the MTL will offer 2^20 by default for the source rank, and move to 2^18 when (no cq_data is available and) the MCA parameter is passed. From your suggestion, I take it I should be reducing this.
@matcabral could you update with some of this feedback incorporated? Then I'll run against the GNI provider again and review.
Hi @hppritcha, I'm actively working on the comments. I will have an update posted in the next couple of days. Thanks,
Summary of the changes in new patch:
NOTE: We have a patch that applies on top of this one that optimizes the MPI_Ssend protocol to use 2 bits instead of 4. So there will be 2 more bits available in the fallback tags.
Hi @bwbarrett, would you please take a look at the latest patch? Thanks,
The only thing I really didn't like in this patch was the lack of a mode-set parameter, which makes it hard to ensure all three cases work in our nightly testing.
ompi/mca/mtl/ofi/mtl_ofi_component.c
fi_strerror(-ret), -ret);
goto error;
}
else if ((NULL != prov_cq_data) &&
Open MPI style is "} else if (...." on one line. So the "}" on line 453 should be on line 454.
Also, why an else if, if your previous test is going to goto error? It's a little defensive, but it's also a bit hard to follow.
Will move the else if to line 453.
Regarding the else if: the first test checks for (-FI_ENODATA != ret) to consider it an error, so at that point you still don't know whether the provider supports CQ_DATA.
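The reasoning above, reduced to a sketch of the decision logic. The inputs are hypothetical: ret stands for the result of the query that asks for FI_REMOTE_CQ_DATA, cq_data_match says whether that query returned the provider already selected, and EX_ENODATA is a placeholder for -FI_ENODATA, not libfabric's actual value.

```c
#define EX_ENODATA (-1) /* placeholder for -FI_ENODATA, not libfabric's value */

enum tag_scheme { SCHEME_ERROR, SCHEME_CQ_DATA, SCHEME_FALLBACK };

static enum tag_scheme pick_scheme(int ret, int cq_data_match)
{
    if (0 != ret && EX_ENODATA != ret) {
        return SCHEME_ERROR;    /* a real failure of the query */
    }
    if (0 == ret && cq_data_match) {
        return SCHEME_CQ_DATA;  /* same provider, with remote CQ data */
    }
    /* -FI_ENODATA is not an error: it only means no provider offered
     * FI_REMOTE_CQ_DATA, so the rank must be packed into the tag. */
    return SCHEME_FALLBACK;
}
```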
@@ -136,7 +137,23 @@ ompi_mtl_ofi_component_register(void)
MCA_BASE_VAR_SCOPE_READONLY,
&ompi_mtl_ofi.ofi_progress_event_count);

free(desc);
free(desc);
Remove the unnecessary change
this was a minor indentation fix
ompi/mca/mtl/ofi/mtl_ofi_component.c
free(desc);

fallback_alternative_tag = false;
asprintf(&desc, "Use alternative ofi tag bits distribution for providers that do not support FI_REMOTE_CQ_DATA:"
I don't love this. I've read it three times and I'm still confused. I also don't like that there's no way to force using cq_data (or fail if it's not there). Maybe a 3-option flag that sets a particular mode (would be easier to test as well, wouldn't need a device that doesn't support cq_data to test, could just run through each mode).
There are a few different things here:
- The README file was intended to provide further clarity on the fallback tag. I'm open to adding any other mechanism that makes it clearer; please share how.
- I understand the complications for testing, but I'm not convinced that CQ_DATA selection should be exposed through an MCA parameter, since this is just a testing requirement with no benefit for the end user (in fact, not using CQ_DATA limits scalability). That is, unless there is a provider that may benefit from intentionally not using CQ_DATA, e.g. because of the overhead of supporting FI_DIRECTED_RECV, which is required when the rank is not included in the tag. Thoughts?
ompi/mca/mtl/ofi/mtl_ofi_component.c
@@ -392,7 +410,7 @@ ompi_mtl_ofi_component_init(bool enable_progress_threads,
if (FI_ENODATA == -ret) {
// It is not an error if no information is returned.
goto error;
} else if (0 != ret) {
} else if (OPAL_UNLIKELY(0 != ret)) {
Remove the compiler hint; this is initialization, not the critical path.
Ok, will remove all compiler hints from init time.
ompi/mca/mtl/ofi/mtl_ofi_component.c
@@ -310,11 +327,12 @@ ompi_mtl_ofi_component_init(bool enable_progress_threads,
{
int ret, fi_version;
struct fi_info *hints;
struct fi_info *providers = NULL, *prov = NULL;
struct fi_info *providers = NULL, *prov = NULL, *prov_cq_data = NULL ;
multiple lines make code easier to read :).
sure
ompi/mca/mtl/ofi/mtl_ofi_component.c
 * FI_DIRECTED_RECV is also needed so receives can discriminate the source
 */
prov_tmp_name = strdup(prov->fabric_attr->prov_name);
if(!prov_tmp_name){
In OMPI, we explicitly check against NULL.
I also don't understand why you're swapping provider names here; a comment would go a long way.
Not swapping provider names, just making a copy to avoid issues with fi_freeinfo(). Will add a comment.
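A sketch of that pattern with hypothetical types: struct prov_info stands in for libfabric's struct fi_info, and free_prov_list() for fi_freeinfo(). The name is copied before the list that owns it is freed, and the copy is checked explicitly against NULL. The MTL uses strdup(); the sketch spells the copy out with malloc() and memcpy() so it builds as plain ISO C.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a provider-info list node. */
struct prov_info {
    char *prov_name;
    struct prov_info *next;
};

/* Portable equivalent of strdup(): returns NULL if malloc() fails. */
static char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (NULL != copy) {
        memcpy(copy, s, len);
    }
    return copy;
}

/* Plays the role of fi_freeinfo(): frees every node and its name. */
static void free_prov_list(struct prov_info *p)
{
    while (NULL != p) {
        struct prov_info *next = p->next;
        free(p->prov_name);
        free(p);
        p = next;
    }
}

/* Copies the first provider's name, then frees the list. The copy stays
 * valid afterwards; the caller checks it against NULL and frees it. */
static char *take_provider_name(struct prov_info *list)
{
    char *name = dup_string(list->prov_name);
    free_prov_list(list); /* list->prov_name is no longer valid here */
    return name;
}
```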
ompi/mca/mtl/ofi/mtl_ofi_component.c
else if ((NULL != prov_cq_data) &&
         (0 == strncmp (prov_tmp_name, prov_cq_data->fabric_attr->prov_name,
                        strlen(prov_tmp_name)))) {
    prov=prov_cq_data;
A comment of which case we're in would make this way more readable.
added
unsigned long long source_rank_mask;
unsigned long long mpi_tag_mask;
int num_bits_mpi_tag;

Why the extra newline?
removed
ompi/mca/mtl/ofi/mtl_ofi_types.h
#define MTL_OFI_SYNC_SEND_DATA (0x0000000100000000ULL)
#define MTL_OFI_SYNC_SEND_ACK_DATA (0x0000000900000000ULL)

__opal_attribute_always_inline__ static inline uint64_t
does this have to be inlined? Thinking you're smarter than the inline is generally a path to madness...
This function creates the send tag. Please check previous comments where it was asked to move the macros that create tags to inline functions.
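A sketch of small tag helpers built on the two protocol constants from the excerpt above, using plain static inline so the compiler keeps the inlining decision. The helper names are hypothetical. Note that 0x9 sets bits 32 and 35, so an ACK also carries the SYNC_SEND bit; both predicates therefore compare against the full ACK pattern.

```c
#include <stdbool.h>
#include <stdint.h>

/* Protocol bits as they appear in the patch excerpt. */
#define MTL_OFI_SYNC_SEND_DATA     (0x0000000100000000ULL)
#define MTL_OFI_SYNC_SEND_ACK_DATA (0x0000000900000000ULL)

/* Mark a send tag as a synchronous send. */
static inline uint64_t set_sync_send(uint64_t match_bits)
{
    return match_bits | MTL_OFI_SYNC_SEND_DATA;
}

/* True for a sync send that is not an ACK (bit 32 set, bit 35 clear). */
static inline bool is_sync_send(uint64_t bits)
{
    return (bits & MTL_OFI_SYNC_SEND_ACK_DATA) == MTL_OFI_SYNC_SEND_DATA;
}

/* True for a sync-send ACK (bits 32 and 35 both set). */
static inline bool is_sync_send_ack(uint64_t bits)
{
    return (bits & MTL_OFI_SYNC_SEND_ACK_DATA) == MTL_OFI_SYNC_SEND_ACK_DATA;
}
```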
@bwbarrett, please review the updated patch. The main change is the tag selection logic and the corresponding MCA parameter. It now addresses all your requirements. Details are explained in the README file.
Minor pending cleanup done in the last patch.
@jsquyres, would you consider that the latest patch addresses your comments? Requesting merge. Thanks,
Looks great; thanks for all the changes.
My only quibble is with the help message for the MCA var. You can accept my suggestion or not -- my only goal with the suggestion is to provide a help message that is oriented towards an end user (not a developer who knows things about libfabric). No need to have me re-review if you update the help message.
ompi/mca/mtl/ofi/mtl_ofi_component.c

ofi_tag_mode = MTL_OFI_TAG_AUTO;
asprintf(&desc, " Mode for OFI tag."
         " 1 auto (default): detect if the provider supports FI_REMOTE_CQ_DATA or fallback to 2."
I'm not quite sure what the explanation for 1 means. Does it mean:
- If FI_REMOTE_CQ_DATA is supported, use 4.
- Otherwise, use 2.
- 3 is available just for the heckuvit (i.e., different bit counts than 2)
- How many bits are used for 4? I only ask because they're specified / shown for 2 and 3, but not 4 (or 1).
All answers are correct. 4 and 1 do not show this info because there are no limitations: 32 bits. Should it still show it, or at least mention that there are no limitations when using it?
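The answers above amount to the following resolution logic, sketched with hypothetical identifiers for the four values in the help text (this is not the MTL's code). Because the two fallback layouts do not depend on CQ data, they can be forced on any provider, which also helps with exercising every mode in nightly testing.

```c
#include <stdbool.h>

/* Hypothetical C names for the four values listed in the help text. */
enum ofi_tag_mode {
    TAG_MODE_AUTO = 1,          /* CQ data if available, else ofi_tag_1 */
    TAG_MODE_OFI_TAG_1 = 2,     /* fallback layout 1 */
    TAG_MODE_OFI_TAG_2 = 3,     /* fallback layout 2 */
    TAG_MODE_FORCE_CQ_DATA = 4  /* CQ data or fail */
};

/* Resolve the requested mode against the provider's capabilities.
 * Returns the effective mode, or 0 when the request cannot be met. */
static int resolve_tag_mode(enum ofi_tag_mode requested, bool prov_has_cq_data)
{
    switch (requested) {
    case TAG_MODE_AUTO:
        return prov_has_cq_data ? TAG_MODE_FORCE_CQ_DATA : TAG_MODE_OFI_TAG_1;
    case TAG_MODE_FORCE_CQ_DATA:
        return prov_has_cq_data ? TAG_MODE_FORCE_CQ_DATA : 0;
    case TAG_MODE_OFI_TAG_1:
    case TAG_MODE_OFI_TAG_2:
        return requested; /* fallback layouts need no CQ data */
    }
    return 0; /* out-of-range value */
}
```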
ompi/mca/mtl/ofi/mtl_ofi_component.c
" 1 auto (default): detect if the provider supports FI_REMOTE_CQ_DATA or fallback to 2."
" 2 ofi_tag_1: %d bits com_id,%d bits source rank,%d bits mpi_tag."
" 3 ofi_tag_2: %d bits com_id,%d bits source rank,%d bits mpi_tag."
" 4 force_fi_cq_data: try FI_REMOTE_CQ_DATA or fail if not supported.",
Is the user supposed to specify a value of auto, ofi_tag_1, ...etc., or a value of 1, 2, ...etc.? From reading this help message, I think a user might assume that they are supposed to use the number, but I think that since you're registering this as an MCA var enum, they're supposed to specify the names (auto, etc.), right?
Here's my suggestion for the help message:
Mode specifying how many bits to use for various MPI values in OFI/Libfabric communications. Some Libfabric provider network types can support as many bits as Open MPI needs; others can only supply a limited number of bits, which then must be split across the MPI communicator ID, MPI source rank, and MPI tag. Three different splitting schemes are available: ofi_tag_full (%d bits for the communicator, %d bits for the source rank, and %d bits for the tag), ofi_tag_2 (%d bits communicator, %d bits source rank, %d bits tag), or ofi_tag_3 (%d bits communicator, %d bits source rank, %d bits tag). By default, this MCA variable is set to "auto", which will first try to use "ofi_tag_full", and if that fails, fall back to "ofi_tag_2".
Where "ofi_tag_full" -- I think -- is the equivalent to your force_fi_cq_data (FI_REMOTE_CQ_DATA and friends have no meaning to the end user).
Passing numbers vs. strings: both will work (since the enum is registered), but (please correct me if I'm wrong) most MCA parameters are documented with their text names. I'll take your suggestion, update the README accordingly, and also update the parameter help message. Thanks
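To illustrate accepting either form, here is a sketch of a lookup that takes a mode name or its number, using the names from the help-text excerpt. It is a hypothetical standalone parser, not Open MPI's MCA enum implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical table pairing each mode name with its number. */
static const struct { const char *name; int value; } tag_modes[] = {
    { "auto", 1 }, { "ofi_tag_1", 2 }, { "ofi_tag_2", 3 },
    { "force_fi_cq_data", 4 },
};

/* Accept either the name ("auto") or the number ("1"); -1 if unknown. */
static int parse_tag_mode(const char *s)
{
    size_t n = sizeof tag_modes / sizeof tag_modes[0];

    for (size_t i = 0; i < n; i++) {
        if (0 == strcmp(s, tag_modes[i].name)) {
            return tag_modes[i].value;
        }
    }

    char *end;
    long v = strtol(s, &end, 10);
    if (end != s && '\0' == *end) {          /* whole string is a number */
        for (size_t i = 0; i < n; i++) {
            if (tag_modes[i].value == v) {
                return (int)v;
            }
        }
    }
    return -1;
}
```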
Cool, thanks. I had honestly forgotten that we allow both numbers or names for enum values of MCA vars. If you find my suggested text helpful, awesome. 😄
Extend number of supported ranks with providers that support FI_REMOTE_CQ_DATA.
Add README file to OFI MTL
Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
New patch including @jsquyres's feedback for the README and parameter help message. Renamed the force_cq_data option to ofi_tag_full.
MTL OFI: add support for FI_REMOTE_CQ_DATA.
Extend number of supported ranks with providers that support
FI_REMOTE_CQ_DATA.
Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>