Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define threading env-vars only for threaded builds #1549

Merged
merged 10 commits into from
Jun 8, 2017

Conversation

amametjanov
Copy link
Member

@amametjanov amametjanov commented May 20, 2017

  • Adds SMP_PRESENT case-var, which is true if (BUILD_THREADED or NTHRDS_* >1)
  • Enable/disable env_mach_specific XML variables based on
    attributes set in the case variables

[BFB]

@jgfouca
Copy link
Member

jgfouca commented May 21, 2017

@amametjanov , this is a clever implementation, but I doubt it would be accepted in its current implementation, but I think it's pretty close. I think the better way to do this is to add support for matching arbitrary case values to _match_attribs. I will try to be more specific in the comments.

Copy link
Member

@jgfouca jgfouca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, but needs refactor.

<arg name="xl_bg_spreadlayout"> --envs XL_BG_SPREADLAYOUT=YES</arg>
<arg name="omp_stacksize"> --envs OMP_STACKSIZE=64M</arg>
<arg name="thread_count"> --envs OMP_NUM_THREADS=$ENV{OMP_NUM_THREADS}</arg>
<arg SMP_PRESENT="TRUE" name="bg_threadlayout"> --envs BG_THREADLAYOUT=1</arg>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead, do this:

<arg name="extra_smp_stuff"> $ENV{ACME_EXTRA_SMP_STUFF}</arg>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config file is already in acme/machines so does ACME have to be in the ENV variable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rljacob I was mostly just trying to ensure that the variable name didn't conflict with anything.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matching at the arg tag is only limited to compiler, debug and mpilib variables. Why not extend the matching to other variables too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amametjanov , CIME doesn't do any matching for individual mpirun arguments:

        # Now that we know the best match, compute the arguments                                                                                                                                                                                                                   
        if not exe_only:
            arg_node = self.get_optional_node("arguments", root=the_match)
            if arg_node is not None:
                arg_nodes = self.get_nodes("arg", root=arg_node)
                for arg_node in arg_nodes:
                    arg_value = transform_vars(arg_node.text,
                                               case=case,
                                               subgroup=job,
                                               check_members=check_members,
                                               default=arg_node.get("default"))
                    args.append(arg_value)

^ snippet from env_mach_specific.get_mpirun

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mpirun args can be lengthy and this one-line change avoids repeating the whole mpirun-xml block for a different combination of args.

@@ -1075,8 +1075,8 @@
</module_system>
<environment_variables>
<env name="MPI_TYPE_MAX">10000</env>
<env name="OMP_DYNAMIC">FALSE</env>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead, do this:

<environment_variables>
   <env name="MPI_TYPE_MAX">10000</env>
</environment_variables>
<environment_variables SMP_PRESENT="TRUE">
    <env name="OMP_DYNAMIC">FALSE</env>
    <env name="OMP_STACKSIZE">64M</env>
    <env name="ACME_EXTRA_SMP_STUFF">--envs XL_BG_SPREADLAYOUT=YES --envs OMP_STACKSIZE=64M  --envs OMP_NUM_THREADS=$ENV{OMP_NUM_THREADS}</env>
</environment_variables>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing for actual/resolved values of $ENV{OMP_NUM_THREADS} at case setup or build time shows that this variable's value isn't known until job actually runs. So $ENV{ACME_EXTRA_SMP_STUFF} will also be unresolved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Interesting problem. Maybe if you used $OMP_NUM_THREADS instead of $ENV{OMP_NUM_THREADS} it would leave it unresolved until the shell executed it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike vars like $COBALT_PARTNAME and $LID, $OMP_NUM_THREADS is known at case-setup-time.

@@ -183,21 +183,26 @@ def _compute_actions(self, nodes, child_tag, compiler, debug, mpilib):

return result

def _match_attribs(self, attribs, compiler, debug, mpilib):
if ("compiler" in attribs and
def _match_attribs(self, attribs, compiler=None, debug=None, mpilib=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the =None changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matching is hard-wired to only look at compiler, debug and mpilib vars: need to generalize. Also, making these parameters optional removes the need to supply actual values at call-site.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we need to generalize.

not self._match("TRUE" if debug else "FALSE", attribs["debug"].upper())):
return False
elif ("unit_testing" in attribs and
not self._match("TRUE" if self._unit_testing else "FALSE",
attribs["unit_testing"].upper())):
return False

# check for matches with env-vars
for attrib in attribs:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep this, except refactor to look at case object instead of environment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like env_mach_specific.py initializes/uses case object, and didn't want to introduce unexpected side effects of loading a case object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call chain that uses this is:

case.load_env ->
  env_mach_specific.load_env ->
    env_mach_specific._get_modules_for_case ->
      env_mach_specific._compute_module_actions ->
        env_mach_specific._compute_actions ->
          env_mach_specific._match_attribs

So, you have a case object in that call chain that can be passed down. I don't see any other way to do this as we cannot expect every case setting to be in the environment.

check_members=check_members,
default=arg_node.get("default"))
args.append(arg_value)
if (self._match_attribs(arg_node.attrib)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undo this change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

XML nodes under

  <mpirun mpilib="default">
    <executable>/usr/bin/runjob</executable>
    <arguments>

are matched/resolved/replaced in this function.

Side note: this section can also process elements with env-vars like

      <arg name="thread_count"> --envs OMP_NUM_THREADS=$ENV{OMP_NUM_THREADS}</arg>

and a call to preview_run returns

MPIRUN:
ERROR: Undefined env var 'OMP_NUM_THREADS'

That's why it is necessary to refer to env-vars at build-time on login nodes.

return bool(force_threaded) or self.thread_count > 1
smp_present = bool(force_threaded) or self.thread_count > 1
self.set_value("SMP_PRESENT", stringify_bool(smp_present))
os.environ["SMP_PRESENT"] = stringify_bool(smp_present)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this line (add SMP_PRESENT to env)

Copy link
Member Author

@amametjanov amametjanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jim, env_mach_specific.py appears to be limited to the env-vars.

not self._match("TRUE" if debug else "FALSE", attribs["debug"].upper())):
return False
elif ("unit_testing" in attribs and
not self._match("TRUE" if self._unit_testing else "FALSE",
attribs["unit_testing"].upper())):
return False

# check for matches with env-vars
for attrib in attribs:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like env_mach_specific.py initializes/uses case object, and didn't want to introduce unexpected side effects of loading a case object.

check_members=check_members,
default=arg_node.get("default"))
args.append(arg_value)
if (self._match_attribs(arg_node.attrib)):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

XML nodes under

  <mpirun mpilib="default">
    <executable>/usr/bin/runjob</executable>
    <arguments>

are matched/resolved/replaced in this function.

Side note: this section can also process elements with env-vars like

      <arg name="thread_count"> --envs OMP_NUM_THREADS=$ENV{OMP_NUM_THREADS}</arg>

and a call to preview_run returns

MPIRUN:
ERROR: Undefined env var 'OMP_NUM_THREADS'

That's why it is necessary to refer to env-vars at build-time on login nodes.

@@ -183,21 +183,26 @@ def _compute_actions(self, nodes, child_tag, compiler, debug, mpilib):

return result

def _match_attribs(self, attribs, compiler, debug, mpilib):
if ("compiler" in attribs and
def _match_attribs(self, attribs, compiler=None, debug=None, mpilib=None):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matching is hard-wired to only look at compiler, debug and mpilib vars: need to generalize. Also, making these parameters optional removes the need to supply actual values at call-site.

<arg name="xl_bg_spreadlayout"> --envs XL_BG_SPREADLAYOUT=YES</arg>
<arg name="omp_stacksize"> --envs OMP_STACKSIZE=64M</arg>
<arg name="thread_count"> --envs OMP_NUM_THREADS=$ENV{OMP_NUM_THREADS}</arg>
<arg SMP_PRESENT="TRUE" name="bg_threadlayout"> --envs BG_THREADLAYOUT=1</arg>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matching at the arg tag is only limited to compiler, debug and mpilib variables. Why not extend the matching to other variables too.

@@ -1075,8 +1075,8 @@
</module_system>
<environment_variables>
<env name="MPI_TYPE_MAX">10000</env>
<env name="OMP_DYNAMIC">FALSE</env>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing for actual/resolved values of $ENV{OMP_NUM_THREADS} at case setup or build time shows that this variable's value isn't known until job actually runs. So $ENV{ACME_EXTRA_SMP_STUFF} will also be unresolved.

Copy link
Member Author

@amametjanov amametjanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jgfouca the update generalizes to case-vars now.

<arg name="xl_bg_spreadlayout"> --envs XL_BG_SPREADLAYOUT=YES</arg>
<arg name="omp_stacksize"> --envs OMP_STACKSIZE=64M</arg>
<arg name="thread_count"> --envs OMP_NUM_THREADS=$ENV{OMP_NUM_THREADS}</arg>
<arg SMP_PRESENT="TRUE" name="bg_threadlayout"> --envs BG_THREADLAYOUT=1</arg>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mpirun args can be lengthy and this one-line change avoids repeating the whole mpirun-xml block for a different combination of args.

@@ -1075,8 +1075,8 @@
</module_system>
<environment_variables>
<env name="MPI_TYPE_MAX">10000</env>
<env name="OMP_DYNAMIC">FALSE</env>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike vars like $COBALT_PARTNAME and $LID, $OMP_NUM_THREADS is known at case-setup-time.

@jgfouca
Copy link
Member

jgfouca commented May 23, 2017

@amametjanov I made some changes to this branch that I think will make it more more likely for the ESMCI folks to accept. I also briefly tested it:

% ./create_test SMS.f19_g16_rx1.A --no-build
% cd $CASEDIR
% ./preview_run 
BATCH SUBMIT:
    case.test -> qsub   --cwd /projects/HiRes_EarthSys_2/acmetest/SMS.f19_g16_rx1.A.cetus_ibm.20170523_193017_g3q61f -A HiRes_EarthSys_2 -t 01:00:00 -n 16 -q default --mode script case.test 

MPIRUN: /usr/bin/runjob  --label short  --ranks-per-node 4  --np 64  --block $COBALT_PARTNAME $LOCARGS  --envs BG_THREADLAYOUT=1 --envs XL_BG_SPREADLAYOUT=YES --envs OMP_STACKSIZE=64M --envs OMP_NUM_THREADS=16 :  /projects/HiRes_EarthSys_2/acmetest/SMS.f19_g16_rx1.A.cetus_ibm.20170523_193017_g3q61f/bld/acme.exe  >> acme.log.$LID 2>&1 

@amametjanov
Copy link
Member Author

@jgfouca thanks for the updates. I noticed an additional export EXTRA_SMP_STUFF=None prior to an actual export in .env_mach_specific.sh, but we'll have to know that it's needed so that the variable is defined in both contexts. Did a little bit of a cleanup. Please proceed. Thank you.

@jgfouca
Copy link
Member

jgfouca commented May 23, 2017

@amametjanov will do, thanks.

jgfouca added a commit that referenced this pull request May 23, 2017
Define threading env-vars only for threaded builds

* Adds SMP_PRESENT case-var, which is true if (BUILD_THREADED or NTHRDS_* >1)
* Enable/disable env_mach_specific XML variables based on
  attributes set in the case variables

[BFB]

* azamat/mira/add-smp-present:
  Cleanup BGQ SMP vars
  Fixes for cetus
  jgfouca mods to this branch
  Match env_mach_specific.xml tag attributes with any case-var value
  Define threading env-vars only for threaded builds
@jgfouca
Copy link
Member

jgfouca commented May 23, 2017

Merged to next.

rljacob added a commit that referenced this pull request May 26, 2017
This reverts commit f08d778, reversing
changes made to 9233c4b.

Bisect pointed to this as causing fails on melvin-next.
@rljacob
Copy link
Member

rljacob commented May 26, 2017

reverted off of next. Bisect pointed to this one as causing recent build fails.

jgfouca added a commit that referenced this pull request May 26, 2017
jgfouca added a commit that referenced this pull request May 26, 2017
Merge #2 for this PR.

* azamat/mira/add-smp-present:
  Fix race condition
rljacob added a commit that referenced this pull request May 27, 2017
This reverts commit aabde09, reversing
changes made to 5a48e63.

Revert the second merge of this branch because still causing
build fails on next.
@rljacob
Copy link
Member

rljacob commented May 27, 2017

The attempt to fix the build problems didn't solve all of them so reverted from next again.

rljacob added a commit that referenced this pull request May 28, 2017
…to next (PR #1549)"""

This reverts commit 5a48e63.

Finish removing PR #1549
@rljacob
Copy link
Member

rljacob commented May 28, 2017

Had to "revert revert revert" to fully remove this from next.

@jgfouca jgfouca force-pushed the azamat/mira/add-smp-present branch from 1cd98fd to e3a6fba Compare May 30, 2017 19:09
jgfouca added a commit that referenced this pull request May 30, 2017
Merge #3 for this PR

* azamat/mira/add-smp-present:
  Refactor a non-pythonic list join
  Remove very dangerous side effect from get_build_threaded
  Fix ERP
  Fix race condition
  Cleanup BGQ SMP vars
  Fixes for cetus
  jgfouca mods to this branch
  Match env_mach_specific.xml tag attributes with any case-var value
  Define threading env-vars only for threaded builds
It is NOT safe to do a set_value in just any old place in the code.

[BFB]
jgfouca added a commit that referenced this pull request May 31, 2017
Merge #4 for this PR.

* azamat/mira/add-smp-present:
  More fixes
jgfouca pushed a commit that referenced this pull request Jun 2, 2017
Updated mpi-serial from MCSclimate/mpi-serial
Merged recent changes from MCSclimate/mpi-serial into CIME.
Includes new hvector functions.

Test suite: scripts_regression_test.py
SMS_D_Mmpi-serial_Ld1.T42_T42.FSCM5A97.hobart_nag
Test baseline: NA
Test namelist changes: NA
Test status: bit for bit

Fixes #1549

User interface changes?: NA

Code review:jedwards
@jgfouca jgfouca merged commit b3fbd18 into master Jun 8, 2017
@jgfouca jgfouca deleted the azamat/mira/add-smp-present branch June 8, 2017 20:42
jgfouca added a commit that referenced this pull request Oct 25, 2017
Define threading env-vars only for threaded builds (PR #1549)

* Adds SMP_PRESENT case-var, which is true if (BUILD_THREADED or NTHRDS_* >1)
* Enable/disable env_mach_specific XML variables based on attributes set in the case variables

[BFB]
jgfouca added a commit that referenced this pull request Feb 27, 2018
Define threading env-vars only for threaded builds (PR #1549)

* Adds SMP_PRESENT case-var, which is true if (BUILD_THREADED or NTHRDS_* >1)
* Enable/disable env_mach_specific XML variables based on attributes set in the case variables

[BFB]
jgfouca added a commit that referenced this pull request Mar 14, 2018
Define threading env-vars only for threaded builds (PR #1549)

* Adds SMP_PRESENT case-var, which is true if (BUILD_THREADED or NTHRDS_* >1)
* Enable/disable env_mach_specific XML variables based on attributes set in the case variables

[BFB]
rljacob pushed a commit that referenced this pull request Apr 12, 2021
Define threading env-vars only for threaded builds (PR #1549)

* Adds SMP_PRESENT case-var, which is true if (BUILD_THREADED or NTHRDS_* >1)
* Enable/disable env_mach_specific XML variables based on attributes set in the case variables

[BFB]
rljacob pushed a commit that referenced this pull request Apr 12, 2021
Define threading env-vars only for threaded builds (PR #1549)

* Adds SMP_PRESENT case-var, which is true if (BUILD_THREADED or NTHRDS_* >1)
* Enable/disable env_mach_specific XML variables based on attributes set in the case variables

[BFB]
rljacob pushed a commit that referenced this pull request May 6, 2021
Define threading env-vars only for threaded builds (PR #1549)

* Adds SMP_PRESENT case-var, which is true if (BUILD_THREADED or NTHRDS_* >1)
* Enable/disable env_mach_specific XML variables based on attributes set in the case variables

[BFB]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants