-
Notifications
You must be signed in to change notification settings - Fork 365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define threading env-vars only for threaded builds #1549
Conversation
@amametjanov , this is a clever implementation, but I doubt it would be accepted in its current implementation, but I think it's pretty close. I think the better way to do this is to add support for matching arbitrary case values to _match_attribs. I will try to be more specific in the comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, but needs refactor.
<arg name="xl_bg_spreadlayout"> --envs XL_BG_SPREADLAYOUT=YES</arg> | ||
<arg name="omp_stacksize"> --envs OMP_STACKSIZE=64M</arg> | ||
<arg name="thread_count"> --envs OMP_NUM_THREADS=$ENV{OMP_NUM_THREADS}</arg> | ||
<arg SMP_PRESENT="TRUE" name="bg_threadlayout"> --envs BG_THREADLAYOUT=1</arg> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead, do this:
<arg name="extra_smp_stuff"> $ENV{ACME_EXTRA_SMP_STUFF}</arg>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The config file is already in acme/machines so does ACME have to be in the ENV variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rljacob I was mostly just trying to ensure that the variable name didn't conflict with anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Matching at the arg
tag is only limited to compiler, debug and mpilib variables. Why not extend the matching to other variables too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amametjanov , CIME doesn't do any matching for individual mpirun arguments:
# Now that we know the best match, compute the arguments
if not exe_only:
arg_node = self.get_optional_node("arguments", root=the_match)
if arg_node is not None:
arg_nodes = self.get_nodes("arg", root=arg_node)
for arg_node in arg_nodes:
arg_value = transform_vars(arg_node.text,
case=case,
subgroup=job,
check_members=check_members,
default=arg_node.get("default"))
args.append(arg_value)
^ snippet from env_mach_specific.get_mpirun
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mpirun args can be lengthy and this one-line change avoids repeating the whole mpirun-xml block for a different combination of args.
@@ -1075,8 +1075,8 @@ | |||
</module_system> | |||
<environment_variables> | |||
<env name="MPI_TYPE_MAX">10000</env> | |||
<env name="OMP_DYNAMIC">FALSE</env> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead, do this:
<environment_variables>
<env name="MPI_TYPE_MAX">10000</env>
</environment_variables>
<environment_variables SMP_PRESENT="TRUE">
<env name="OMP_DYNAMIC">FALSE</env>
<env name="OMP_STACKSIZE">64M</env>
<env name="ACME_EXTRA_SMP_STUFF">--envs XL_BG_SPREADLAYOUT=YES --envs OMP_STACKSIZE=64M --envs OMP_NUM_THREADS=$ENV{OMP_NUM_THREADS}</env>
</environment_variables>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing for actual/resolved values of $ENV{OMP_NUM_THREADS}
at case setup or build time shows that this variable's value isn't known until job actually runs. So $ENV{ACME_EXTRA_SMP_STUFF}
will also be unresolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Interesting problem. Maybe if you used $OMP_NUM_THREADS instead of $ENV{OMP_NUM_THREADS} it would leave it unresolved until the shell executed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlike vars like $COBALT_PARTNAME
and $LID
, $OMP_NUM_THREADS
is known at case-setup-time.
@@ -183,21 +183,26 @@ def _compute_actions(self, nodes, child_tag, compiler, debug, mpilib): | |||
|
|||
return result | |||
|
|||
def _match_attribs(self, attribs, compiler, debug, mpilib): | |||
if ("compiler" in attribs and | |||
def _match_attribs(self, attribs, compiler=None, debug=None, mpilib=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the =None changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Matching is hard-wired to only look at compiler, debug and mpilib vars: need to generalize. Also, making these parameters optional removes the need to supply actual values at call-site.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we need to generalize.
not self._match("TRUE" if debug else "FALSE", attribs["debug"].upper())): | ||
return False | ||
elif ("unit_testing" in attribs and | ||
not self._match("TRUE" if self._unit_testing else "FALSE", | ||
attribs["unit_testing"].upper())): | ||
return False | ||
|
||
# check for matches with env-vars | ||
for attrib in attribs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep this, except refactor to look at case object instead of environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like env_mach_specific.py initializes/uses case object, and didn't want to introduce unexpected side effects of loading a case object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The call chain that uses this is:
case.load_env ->
env_mach_specific.load_env ->
env_mach_specific._get_modules_for_case ->
env_mach_specific._compute_module_actions ->
env_mach_specific._compute_actions ->
env_mach_specific._match_attribs
So, you have a case object in that call chain that can be passed down. I don't see any other way to do this as we cannot expect every case setting to be in the environment.
check_members=check_members, | ||
default=arg_node.get("default")) | ||
args.append(arg_value) | ||
if (self._match_attribs(arg_node.attrib)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Undo this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
XML nodes under
<mpirun mpilib="default">
<executable>/usr/bin/runjob</executable>
<arguments>
are matched/resolved/replaced in this function.
Side note: this section can also process elements with env-vars like
<arg name="thread_count"> --envs OMP_NUM_THREADS=$ENV{OMP_NUM_THREADS}</arg>
and a call to preview_run
returns
MPIRUN:
ERROR: Undefined env var 'OMP_NUM_THREADS'
That's why it is necessary to refer to env-vars at build-time on login nodes.
cime/scripts/lib/CIME/case.py
Outdated
return bool(force_threaded) or self.thread_count > 1 | ||
smp_present = bool(force_threaded) or self.thread_count > 1 | ||
self.set_value("SMP_PRESENT", stringify_bool(smp_present)) | ||
os.environ["SMP_PRESENT"] = stringify_bool(smp_present) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this line (add SMP_PRESENT to env)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jim, env_mach_specific.py appears to be limited to the env-vars.
not self._match("TRUE" if debug else "FALSE", attribs["debug"].upper())): | ||
return False | ||
elif ("unit_testing" in attribs and | ||
not self._match("TRUE" if self._unit_testing else "FALSE", | ||
attribs["unit_testing"].upper())): | ||
return False | ||
|
||
# check for matches with env-vars | ||
for attrib in attribs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like env_mach_specific.py initializes/uses case object, and didn't want to introduce unexpected side effects of loading a case object.
check_members=check_members, | ||
default=arg_node.get("default")) | ||
args.append(arg_value) | ||
if (self._match_attribs(arg_node.attrib)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
XML nodes under
<mpirun mpilib="default">
<executable>/usr/bin/runjob</executable>
<arguments>
are matched/resolved/replaced in this function.
Side note: this section can also process elements with env-vars like
<arg name="thread_count"> --envs OMP_NUM_THREADS=$ENV{OMP_NUM_THREADS}</arg>
and a call to preview_run
returns
MPIRUN:
ERROR: Undefined env var 'OMP_NUM_THREADS'
That's why it is necessary to refer to env-vars at build-time on login nodes.
@@ -183,21 +183,26 @@ def _compute_actions(self, nodes, child_tag, compiler, debug, mpilib): | |||
|
|||
return result | |||
|
|||
def _match_attribs(self, attribs, compiler, debug, mpilib): | |||
if ("compiler" in attribs and | |||
def _match_attribs(self, attribs, compiler=None, debug=None, mpilib=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Matching is hard-wired to only look at compiler, debug and mpilib vars: need to generalize. Also, making these parameters optional removes the need to supply actual values at call-site.
<arg name="xl_bg_spreadlayout"> --envs XL_BG_SPREADLAYOUT=YES</arg> | ||
<arg name="omp_stacksize"> --envs OMP_STACKSIZE=64M</arg> | ||
<arg name="thread_count"> --envs OMP_NUM_THREADS=$ENV{OMP_NUM_THREADS}</arg> | ||
<arg SMP_PRESENT="TRUE" name="bg_threadlayout"> --envs BG_THREADLAYOUT=1</arg> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Matching at the arg
tag is only limited to compiler, debug and mpilib variables. Why not extend the matching to other variables too.
@@ -1075,8 +1075,8 @@ | |||
</module_system> | |||
<environment_variables> | |||
<env name="MPI_TYPE_MAX">10000</env> | |||
<env name="OMP_DYNAMIC">FALSE</env> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing for actual/resolved values of $ENV{OMP_NUM_THREADS}
at case setup or build time shows that this variable's value isn't known until job actually runs. So $ENV{ACME_EXTRA_SMP_STUFF}
will also be unresolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jgfouca the update generalizes to case-vars now.
<arg name="xl_bg_spreadlayout"> --envs XL_BG_SPREADLAYOUT=YES</arg> | ||
<arg name="omp_stacksize"> --envs OMP_STACKSIZE=64M</arg> | ||
<arg name="thread_count"> --envs OMP_NUM_THREADS=$ENV{OMP_NUM_THREADS}</arg> | ||
<arg SMP_PRESENT="TRUE" name="bg_threadlayout"> --envs BG_THREADLAYOUT=1</arg> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mpirun args can be lengthy and this one-line change avoids repeating the whole mpirun-xml block for a different combination of args.
@@ -1075,8 +1075,8 @@ | |||
</module_system> | |||
<environment_variables> | |||
<env name="MPI_TYPE_MAX">10000</env> | |||
<env name="OMP_DYNAMIC">FALSE</env> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlike vars like $COBALT_PARTNAME
and $LID
, $OMP_NUM_THREADS
is known at case-setup-time.
@amametjanov I made some changes to this branch that I think will make it more more likely for the ESMCI folks to accept. I also briefly tested it:
|
@jgfouca thanks for the updates. I noticed an additional |
@amametjanov will do, thanks. |
Define threading env-vars only for threaded builds * Adds SMP_PRESENT case-var, which is true if (BUILD_THREADED or NTHRDS_* >1) * Enable/disable env_mach_specific XML variables based on attributes set in the case variables [BFB] * azamat/mira/add-smp-present: Cleanup BGQ SMP vars Fixes for cetus jgfouca mods to this branch Match env_mach_specific.xml tag attributes with any case-var value Define threading env-vars only for threaded builds
Merged to next. |
reverted off of next. Bisect pointed to this one as causing recent build fails. |
Merge #2 for this PR. * azamat/mira/add-smp-present: Fix race condition
The attempt to fix the build problems didn't solve all of them so reverted from next again. |
Had to "revert revert revert" to fully remove this from next. |
1cd98fd
to
e3a6fba
Compare
Merge #3 for this PR * azamat/mira/add-smp-present: Refactor a non-pythonic list join Remove very dangerous side effect from get_build_threaded Fix ERP Fix race condition Cleanup BGQ SMP vars Fixes for cetus jgfouca mods to this branch Match env_mach_specific.xml tag attributes with any case-var value Define threading env-vars only for threaded builds
It is NOT safe to do a set_value in just any old place in the code. [BFB]
Merge #4 for this PR. * azamat/mira/add-smp-present: More fixes
Updated mpi-serial from MCSclimate/mpi-serial Merged recent changes from MCSclimate/mpi-serial into CIME. Includes new hvector functions. Test suite: scripts_regression_test.py SMS_D_Mmpi-serial_Ld1.T42_T42.FSCM5A97.hobart_nag Test baseline: NA Test namelist changes: NA Test status: bit for bit Fixes #1549 User interface changes?: NA Code review:jedwards
Define threading env-vars only for threaded builds (PR #1549) * Adds SMP_PRESENT case-var, which is true if (BUILD_THREADED or NTHRDS_* >1) * Enable/disable env_mach_specific XML variables based on attributes set in the case variables [BFB]
Define threading env-vars only for threaded builds (PR #1549) * Adds SMP_PRESENT case-var, which is true if (BUILD_THREADED or NTHRDS_* >1) * Enable/disable env_mach_specific XML variables based on attributes set in the case variables [BFB]
Define threading env-vars only for threaded builds (PR #1549) * Adds SMP_PRESENT case-var, which is true if (BUILD_THREADED or NTHRDS_* >1) * Enable/disable env_mach_specific XML variables based on attributes set in the case variables [BFB]
Define threading env-vars only for threaded builds (PR #1549) * Adds SMP_PRESENT case-var, which is true if (BUILD_THREADED or NTHRDS_* >1) * Enable/disable env_mach_specific XML variables based on attributes set in the case variables [BFB]
Define threading env-vars only for threaded builds (PR #1549) * Adds SMP_PRESENT case-var, which is true if (BUILD_THREADED or NTHRDS_* >1) * Enable/disable env_mach_specific XML variables based on attributes set in the case variables [BFB]
Define threading env-vars only for threaded builds (PR #1549) * Adds SMP_PRESENT case-var, which is true if (BUILD_THREADED or NTHRDS_* >1) * Enable/disable env_mach_specific XML variables based on attributes set in the case variables [BFB]
attributes set in the case variables
[BFB]