problem rebuilding if case.build fails after a successful first build #1971

Closed
mvertens opened this issue Oct 18, 2017 · 10 comments
@mvertens
Contributor

I am encountering the following failure.

  1. I invoke case.build successfully.
  2. I realize I want to change a file, so I modify one (say, in CLM).
  3. I run case.build again, and there is a compilation failure.
  4. I fix the file so that it should now compile, and I rerun case.build. I then get the following:
Setting build complete to False
ERROR:
ERROR env_build HAS CHANGED
  A manual clean of your obj directories is required
  You should execute the following:
    ./case.build --clean-all

I know that I don't want to run case.build --clean-all since everything should work except for the file that I had changed. It turns out that to fix this I need to run the following two steps:

rm LockedFiles/env_build.xml
./xmlchange BUILD_STATUS=0

This is clearly a problem that users would run into when making changes to a case.

@gold2718

I just had this happen on Hobart but don't know what changed. The sequence was:

  1. create_newcase, case.setup, case.build, case.submit, case.run (run crashed)
  2. Edit Fortran file
  3. case.build (build crashed)
  4. Fix error in edit
  5. case.build
Building case in directory /scratch/cluster/goldy/QPC6_mg16
sharedlib_only is False
model_only is False
File /scratch/cluster/goldy/QPC6_mg16/LockedFiles/env_build.xml has been modified
  found difference in BUILD_COMPLETE : case False locked True
Setting build complete to False
ERROR: 
ERROR env_build HAS CHANGED
  A manual clean of your obj directories is required
  You should execute the following:
    ./case.build --clean-all

Could it be some sort of race condition that locks up the process?

@gold2718

Maybe it is the sequence of a complete build, a failed build, and then another build attempt that triggers this. My diff:

hobart: diff env_build.xml LockedFiles/
264c264
<     <entry id="BUILD_COMPLETE" value="FALSE">
---
>     <entry id="BUILD_COMPLETE" value="TRUE">
289c289
<     <entry id="BUILD_STATUS" value="1">
---
>     <entry id="BUILD_STATUS" value="0">
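
The same comparison can be sketched in Python for anyone who wants only the differing entries rather than raw diff output. This is a minimal sketch, not CIME code: the helper names `entry_values` and `diff_entries` are hypothetical, and the sample XML is a trimmed, assumed stand-in for a real env_build.xml, which contains more structure than shown here.

```python
import xml.etree.ElementTree as ET


def entry_values(xml_text):
    """Map each <entry id="..."> element to its value attribute."""
    root = ET.fromstring(xml_text)
    return {e.get("id"): e.get("value") for e in root.iter("entry")}


def diff_entries(case_xml, locked_xml):
    """Return {id: (case_value, locked_value)} for entries that differ."""
    case, locked = entry_values(case_xml), entry_values(locked_xml)
    return {
        key: (case.get(key), locked.get(key))
        for key in sorted(set(case) | set(locked))
        if case.get(key) != locked.get(key)
    }


if __name__ == "__main__":
    # Trimmed, illustrative stand-ins for env_build.xml and its locked copy.
    case_xml = (
        '<file>'
        '<entry id="BUILD_COMPLETE" value="FALSE"/>'
        '<entry id="BUILD_STATUS" value="1"/>'
        '<entry id="DEBUG" value="TRUE"/>'
        '</file>'
    )
    locked_xml = (
        '<file>'
        '<entry id="BUILD_COMPLETE" value="TRUE"/>'
        '<entry id="BUILD_STATUS" value="0"/>'
        '<entry id="DEBUG" value="TRUE"/>'
        '</file>'
    )
    for name, (case_val, locked_val) in diff_entries(case_xml, locked_xml).items():
        print(f"{name}: case {case_val} locked {locked_val}")
```

With the sample strings above, only BUILD_COMPLETE and BUILD_STATUS are reported, matching the two hunks in the diff.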

mvertens pushed a commit that referenced this issue Oct 23, 2017
@jedwards4b
Contributor

I am closing this issue since no one seems to have come up with a reproducer.

@gold2718

gold2718 commented Nov 2, 2017

I did come up with a reproducer: the comment above from about Oct. 22.
To restate it:

  1. Build the model
  2. Make a code change that will not compile
  3. Try another build (build fails)
  4. Make another change and try a build again (triggers issue)

Does this not trigger the error for you?

@gold2718 gold2718 reopened this Nov 2, 2017
@jedwards4b
Contributor

No, it does not. Perhaps I am not making the right change, but I tried several times.

@gold2718

gold2718 commented Nov 2, 2017

Hmm. Well, all I can say is that I have been making bad code changes in CAM (I'm particularly talented at that) and getting failed compiles (./case.build). Then, whenever I try to ./case.build again, I get the message. Could it depend on the component being modified?

@jedwards4b
Contributor

Can you give me instructions to reproduce in which you tell me exactly how to generate this error in a reproducible fashion? Otherwise I would again request that you close it.

@mvertens
Contributor Author

mvertens commented Nov 2, 2017 via email

@jedwards4b
Contributor

Since no reproducer has been presented for this issue, I still think that it should be closed. I have removed the critical tag.

@gold2718

Reproducer using cime-only clone:

# Create, set up, and build a case (the first build succeeds)
./create_newcase --case /scratch/cluster/${USER}/A_mg17 --compset A --res f19_f19_mg17 --compiler nag --run-unsupported
cd /scratch/cluster/${USER}/A_mg17
./xmlchange DEBUG=TRUE
./case.setup
./case.build
# Break the driver source so that the second build fails
sed -i -e 's/SHR_KIND_R8/SHR_KIND_FOODIDOO/' `./xmlquery --value CIMEROOT`/src/drivers/mct/main/cime_comp_mod.F90
./case.build
# Restore the source; the third build triggers the env_build error
sed -i -e 's/SHR_KIND_FOODIDOO/SHR_KIND_R8/' `./xmlquery --value CIMEROOT`/src/drivers/mct/main/cime_comp_mod.F90
./case.build

jgfouca added a commit that referenced this issue Nov 10, 2017
locked files needs to handle BUILD_COMPLETE better

If BUILD_COMPLETE is the only variable changed in env_build.xml, check_lockedfiles should not flag it.
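
The rule described above can be illustrated with a short Python sketch. This is an assumption-laden illustration, not the actual check_lockedfiles implementation: the function name `needs_manual_clean`, the `ignorable` parameter, and the shape of the `diffs` mapping (variable name to a pair of case and locked values) are all hypothetical.

```python
def needs_manual_clean(diffs, ignorable=frozenset({"BUILD_COMPLETE"})):
    """Return True if any changed variable is outside the ignorable set.

    diffs maps a variable name to (case_value, locked_value); this shape
    is assumed for illustration only.
    """
    return any(name not in ignorable for name in diffs)


if __name__ == "__main__":
    # Only BUILD_COMPLETE differs: no clean should be demanded.
    only_complete = {"BUILD_COMPLETE": ("FALSE", "TRUE")}
    print(needs_manual_clean(only_complete))  # False

    # Another variable differs too: the clean is still required.
    with_debug = {"BUILD_COMPLETE": ("FALSE", "TRUE"), "DEBUG": ("TRUE", "FALSE")}
    print(needs_manual_clean(with_debug))  # True
```

Keeping the ignorable names in a set makes it easy to experiment with treating other bookkeeping variables the same way, though whether that is appropriate is a decision for the maintainers.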

Test suite: scripts_regression_tests.py, hand testing - see test in issue #1971
Test baseline:
Test namelist changes:
Test status: [bit for bit, roundoff, climate changing]

Fixes #1971

User interface changes?:

Update gh-pages html (Y/N)?:

Code review:
jgfouca pushed a commit that referenced this issue Jan 30, 2018
Update MPAS submodules: Reduce thread barriers in halo exchanges

This PR brings in a new version of the MPAS framework, and updates the
MPAS components. In the MPAS framework, it reduces unneeded thread
barriers in halo exchanges. See MPAS-Dev/MPAS#1459.

The measured speedup from this change in the MPAS-Ocean stand-alone for
the RRS30to10 mesh on 256 KNL nodes using 2 threads was ~8%.

The updated MPAS submodules also bring in a MPAS sea ice change that
adds namelist entries.

Tested with:
* PET_Ln9.T62_oQU240.GMPAS-IAF.cori-knl_intel
* SMS_Ln9.T62_oQU120_ais20.MPAS_LISIO_TEST.theta_intel
* PET_Ln9.T62_oQU240.GMPAS-IAF.theta_intel
* PET_Ln9.T62_oQU240.GMPAS-IAF.theta_gnu
* a range of DTESTM runs on cori-knl

[NML]
[BFB]