Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provenance appears not to be handling relative paths correctly #4251

Closed
xylar opened this issue May 7, 2022 · 13 comments · Fixed by #4254
Closed

Provenance appears not to be handling relative paths correctly #4251

xylar opened this issue May 7, 2022 · 13 comments · Fixed by #4254
Assignees

Comments

@xylar
Copy link

xylar commented May 7, 2022

The compass software for testing MPAS-Ocean and MALI contains E3SM as a submodule. A common workflow is to create a worktree of compass and then to test from the E3SM submodule within the compass worktree.

Today, I encountered an error I have not previously seen with this workflow:

$ ./create_test --wait --walltime 01:00:00 ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel
No project info available
create_test will do up to 1 tasks simultaneously
create_test will use up to 160 cores simultaneously
Creating test directory /lcrc/group/e3sm/ac.xylar/scratch/chrys/ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel.20220507_115857_puvq8g
RUNNING TESTS:
  ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel
Starting CREATE_NEWCASE for test ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel with 1 procs
Finished CREATE_NEWCASE for test ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel in 0.594735 seconds (PASS)
Starting XML for test ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel with 1 procs
Finished XML for test ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel in 0.146541 seconds (PASS)
Starting SETUP for test ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel with 1 procs
Finished SETUP for test ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel in 5.213597 seconds (PASS)
Starting SHAREDLIB_BUILD for test ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel with 1 procs
Finished SHAREDLIB_BUILD for test ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel in 186.838065 seconds (PASS)
Starting MODEL_BUILD for test ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel with 6 procs
Finished MODEL_BUILD for test ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel in 994.970013 seconds (FAIL). [COMPLETED 1 of 1]
    Case dir: /lcrc/group/e3sm/ac.xylar/scratch/chrys/ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel.20220507_115857_puvq8g
    Errors were:
        Building test for ERS in directory /lcrc/group/e3sm/ac.xylar/scratch/chrys/ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel.20220507_115857_puvq8g
        Traceback (most recent call last):
          File "./case.build", line 256, in <module>
            _main_func(__doc__)
          File "./case.build", line 236, in _main_func
            separate_builds=separate_builds,
          File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/test_stream_read_input_only/E3SM-Project/cime/CIME/SystemTests/system_tests_common.py", line 126, in build
            model_only=(phase_name == MODEL_BUILD_PHASE),
          File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/test_stream_read_input_only/E3SM-Project/cime/CIME/SystemTests/system_tests_common.py", line 161, in build_phase
            self.build_indv(sharedlib_only=sharedlib_only, model_only=model_only)
          File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/test_stream_read_input_only/E3SM-Project/cime/CIME/SystemTests/system_tests_common.py", line 176, in build_indv
            separate_builds=self._user_separate_builds,
          File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/test_stream_read_input_only/E3SM-Project/cime/CIME/build.py", line 1285, in case_build
            return run_and_log_case_status(functor, cb, caseroot=caseroot)
          File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/test_stream_read_input_only/E3SM-Project/cime/CIME/utils.py", line 2417, in run_and_log_case_status
            rv = func()
          File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/test_stream_read_input_only/E3SM-Project/cime/CIME/build.py", line 1278, in <lambda>
            dry_run,
          File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/test_stream_read_input_only/E3SM-Project/cime/CIME/build.py", line 1220, in _case_build_impl
            save_build_provenance=save_build_provenance,
          File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/test_stream_read_input_only/E3SM-Project/cime/CIME/build.py", line 1244, in post_build
            save_build_provenance_sub(case, lid=lid)
          File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/test_stream_read_input_only/E3SM-Project/cime/CIME/provenance.py", line 284, in save_build_provenance
            _save_build_provenance_e3sm(case, lid)
          File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/test_stream_read_input_only/E3SM-Project/cime/CIME/provenance.py", line 181, in _save_build_provenance_e3sm
            _record_git_provenance(srcroot, exeroot, lid)
          File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/test_stream_read_input_only/E3SM-Project/cime/CIME/provenance.py", line 151, in _record_git_provenance
            safe_copy(config_src, config_prov, preserve_meta=False)
          File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/test_stream_read_input_only/E3SM-Project/cime/CIME/utils.py", line 1375, in safe_copy
            preserve_times=preserve_meta,
          File "/usr/lib64/python3.6/distutils/file_util.py", line 105, in copy_file
            "can't copy '%s': doesn't exist or not a regular file" % src)
        distutils.errors.DistutilsFileError: can't copy '../../compass/.git/config': doesn't exist or not a regular file

My hunch is that this relative path ../../compass/.git/config is not being handled correctly because it is relative to the E3SM root rather than the cime/scripts subdirectory where I am running ./create_test. But this is only a hunch.

Steps to reproduce the problem on Chrysalis:

git clone git@github.com:MPAS-Dev/compass.git
cd compass
git worktree add ../test
cd ../test
git submodule update --init --recursive
cd E3SM-Project/cime/scripts
./create_test --wait --walltime 01:00:00 ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel
@xylar
Copy link
Author

xylar commented May 7, 2022

@jgfouca, are there any recent changes to CIME provenance (in E3SM) that might explain this?

@jgfouca
Copy link
Contributor

jgfouca commented May 9, 2022

@xylar , I'm looking at the git log for this file and seeing only minimal changes in recent months. One significant thing that did happen was the CIME reorganization which moved many of the files around, but, looking at the implementation of _record_git_provenance:

def _record_git_provenance(srcroot, exeroot, lid):
    """Records git provenance                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                  
    Records git status, diff and logs for main repo and all submodules.                                                                                                                                                                           
    """
    # Git Status                                                                                                                                                                                                                                  
    status_prov = os.path.join(exeroot, "GIT_STATUS.{}".format(lid))
    _run_git_cmd_recursively("status", srcroot, status_prov)

    # Git Diff                                                                                                                                                                                                                                    
    diff_prov = os.path.join(exeroot, "GIT_DIFF.{}".format(lid))
    _run_git_cmd_recursively("diff", srcroot, diff_prov)

    # Git Log                                                                                                                                                                                                                                     
    log_prov = os.path.join(exeroot, "GIT_LOG.{}".format(lid))
    cmd = "log --first-parent --pretty=oneline -n 5"
    _run_git_cmd_recursively(cmd, srcroot, log_prov)

    # Git remote                                                                                                                                                                                                                                  
    remote_prov = os.path.join(exeroot, "GIT_REMOTE.{}".format(lid))
    _run_git_cmd_recursively("remote -v", srcroot, remote_prov)

    gitroot = _find_git_root(srcroot)
    gitroot = _parse_dot_git_path(gitroot)

    # Git config                                                                                                                                                                                                                                  
    config_src = os.path.join(gitroot, "config")
    config_prov = os.path.join(exeroot, "GIT_CONFIG.{}".format(lid))
    safe_copy(config_src, config_prov, preserve_meta=False)

... it doesn't look like anything should be sensitive to the location of the current source file. The error indicates that gitroot is incorrect, which means there could be a problem with srcroot (unlikely), _find_get_root, or _parse_dot_git_path. Could you maybe add some print statements around this code:

    gitroot = _find_git_root(srcroot)
    gitroot = _parse_dot_git_path(gitroot)

to help us see what's going on?

@xylar
Copy link
Author

xylar commented May 9, 2022

@jgfouca, sure, I can do that. It will be easiest to wait for LCRC to come back up, since that's where I was testing.

@xylar
Copy link
Author

xylar commented May 10, 2022

@jgfouca, now that Chrysalis is back, I rad again. Here's what I'm seeing:

srcroot: /gpfs/fs1/home/ac.xylar/mpas-work/compass/test_stream_read_input_only/E3SM-Project
gitroot: ../../compass/.git/worktrees/test_stream_read_input_only/modules/E3SM-Project
gitroot: ../../compass/.git

I assume this is obvious but the first gitroot is after _find_git_root() and the second is after _parse_dot_git_path(). While I don't fully understand the intended outcome of _parse_dot_git_path(), I'd hazard a guess that that isn't producing the expected result.

@xylar
Copy link
Author

xylar commented May 10, 2022

Let me know if I can add more prints to help. It would also be helpful for me to know if there's a quicker way to reproduce this than rerunning the full test, since that takes about 20 minutes.

@xylar
Copy link
Author

xylar commented May 10, 2022

Actually, looking at the code, I think what is missing is something to convert a relative path to an absolute path using the original srcroot. At least that seems the most likely problem based on my limited understanding.

@xylar
Copy link
Author

xylar commented May 10, 2022

config_src = os.path.join(srcroot, gitroot, "config")

would seem to take care of this whether gitroot is a relative or absolute path, though it would be more intuitive to so:

if not os.path.isabs(gitroot):
    gitroot = os.path.abspath(os.path.join(srcroot, gitroot))
config_src = os.path.join(gitroot, "config")

@xylar
Copy link
Author

xylar commented May 10, 2022

Indeed, that seems to fix the problem. @jgfouca, would you like me to make a PR?

@jasonb5
Copy link
Collaborator

jasonb5 commented May 10, 2022

@xylar @jgfouca

From the debug text

srcroot: /gpfs/fs1/home/ac.xylar/mpas-work/compass/test_stream_read_input_only/E3SM-Project
gitroot: ../../compass/.git/worktrees/test_stream_read_input_only/modules/E3SM-Project
gitroot: ../../compass/.git

the output of

gitroot = _find_git_root(srcroot)
gitroot = _parse_dot_git_path(gitroot)

should be /gpfs/fs1/home/ac.xylar/mpas-work/compass/test_stream_read_input_only/E3SM-Project not /gpfs/fs1/home/ac.xylar/mpas-work/compass.

The above fix should be added but I think there might be a bigger issue with determining the correct gitroot.

@xylar
Copy link
Author

xylar commented May 10, 2022

@jasonb5, I don't think that's right. The code is correctly detecting that /gpfs/fs1/home/ac.xylar/mpas-work/compass/test_stream_read_input_only/E3SM-Project/.git is a file not a directory. It needs to go to /gpfs/fs1/home/ac.xylar/mpas-work/compass/compass/.git to find a config file. There is no such file within the worktree.

@jasonb5
Copy link
Collaborator

jasonb5 commented May 10, 2022

@xylar The E3SM config is located in ../../compass/.git/worktrees/test_stream_read_input_only/modules/E3SM-Project, the provenance should be recording this config not the one under /gpfs/fs1/home/ac.xylar/mpas-work/compass/.git.

@xylar
Copy link
Author

xylar commented May 10, 2022

Ah, I see.

@xylar
Copy link
Author

xylar commented May 10, 2022

Yes, I think you're absolutely right.

@jgfouca jgfouca self-assigned this May 10, 2022
jgfouca added a commit that referenced this issue May 11, 2022
Fixes _find_gitroot for E3SM provenance

The function _find_gitroot would error when trying to determine the
models .git directory when being used as a submodule of another
project.

The error would occur when trying to copy a git metadata
file for provenance whose path was relative to the git repository.

This required two fixes, the first was to resolve the relative path and
the second was handling a third scenario for determining the model's
.git directory.

Test suite: scripts_regression_tests
Test baseline: n/a
Test namelist changes: n/a
Test status: n/a

Fixes #4251
User interface changes?: n
Update gh-pages html (Y/N)?: n
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants