Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add some changes to vcs #1203

Draft
wants to merge 12 commits into
base: release
Choose a base branch
from
30 changes: 29 additions & 1 deletion src/esm_runscripts/helpers.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,6 @@
import esm_parser
import esm_plugin_manager
import esm_tools
from loguru import logger
from esm_profile import print_profile_summary


Expand Down Expand Up @@ -447,3 +446,32 @@ def get_all_git_info(path):
"diffs": get_git_diffs(path, add_colors=False),
}
return git_info


class CachedFile:
"""
Represents a file that might already have saved information somewhere. Can be used to check if a cache is valid
"""

def __init__(self, path):
self.path = path

def is_younger_than(self, other, check_by="mtime"):
if check_by == "mtime":
return os.path.getmtime(self.path) < os.path.getmtime(other.path)
else:
raise ValueError("Invalid check_by value. Only 'mtime' is supported.")

def is_older_than(self, other, check_by="mtime"):
if check_by == "mtime":
return os.path.getmtime(self.path) > os.path.getmtime(other.path)
else:
raise ValueError("Invalid check_by value. Only 'mtime' is supported.")

def read(self):
with open(self.path, "r") as f:
return f.read()

def load_cache(self):
data = self.read()
return yaml.safe_load(data)
10 changes: 8 additions & 2 deletions src/esm_runscripts/prepare.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,12 +5,12 @@

import questionary
import yaml
from loguru import logger

import esm_parser
import esm_utilities
from esm_calendar import Calendar, Date
from esm_plugin_manager import install_missing_plugins
from loguru import logger

from . import batch_system, helpers

Expand Down Expand Up @@ -695,6 +695,7 @@ def add_vcs_info(config):
The experiment configuration
"""
exp_vcs_info_file = f"{config['general']['thisrun_log_dir']}/{config['general']['expid']}_vcs_info.yaml"
esm_vcs_info_file_cache = helpers.CachedFile(exp_vcs_info_file)
logger.debug("Experiment information is being stored for usage under:")
logger.debug(f">>> {exp_vcs_info_file}")
vcs_versions = {}
Expand All @@ -708,7 +709,12 @@ def add_vcs_info(config):
vcs_versions[model] = f"Unable to locate model_dir for {model}."
continue
if helpers.is_git_repo(model_dir):
vcs_versions[model] = helpers.get_all_git_info(model_dir)
if esm_vcs_info_file_cache.is_younger_than(model_dir):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to think about this, maybe we want to compare it with the age of the binary: is not until one compiles it and uses the binary that the diff needs to be refreshed?

At the same time we need to check whether the binary in the model dir and in the work folder are the same, because ESM-Tools does not update the bin in a simulation unless you run esm_runscripts with the option --update-files=bin (or some argument like that)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wasn't entirely sure about this, I'll think more about it. At the moment, it checks the modification time of the directory. The naive assumption that "changing a file in a directory will change the modification time of the directory" is wrong, apparently:

$ mkdir foo
$ gstat foo  # gstat because Darwin != Linux
  File: foo
  Size: 64        	Blocks: 0          IO Block: 4096   directory
Device: 1,14	Inode: 24969841    Links: 2
Access: (0755/drwxr-xr-x)  Uid: (952471260/  pgierz)   Gid: (    0/   wheel)
Access: 2024-08-09 15:19:30.833895558 +0200
Modify: 2024-08-09 15:19:30.833895558 +0200
Change: 2024-08-09 15:19:30.834023725 +0200
 Birth: 2024-08-09 15:19:30.833895558 +0200
$ sleep 10
$ touch foo/a_file
$ gstat foo
  File: foo
  Size: 96        	Blocks: 0          IO Block: 4096   directory
Device: 1,14	Inode: 24970749    Links: 3
Access: (0755/drwxr-xr-x)  Uid: (952471260/  pgierz)   Gid: (    0/   wheel)
Access: 2024-08-09 15:21:26.892522871 +0200
Modify: 2024-08-09 15:21:55.836772107 +0200
Change: 2024-08-09 15:21:55.836772107 +0200
 Birth: 2024-08-09 15:21:26.892522871 +0200
$ echo "my_change" > foo/a_file
$ sleep 3
$ gstat foo
  File: foo
  Size: 96        	Blocks: 0          IO Block: 4096   directory
Device: 1,14	Inode: 24970749    Links: 3
Access: (0755/drwxr-xr-x)  Uid: (952471260/  pgierz)   Gid: (    0/   wheel)
Access: 2024-08-09 15:22:46.780003463 +0200
Modify: 2024-08-09 15:21:55.836772107 +0200
Change: 2024-08-09 15:21:55.836772107 +0200
 Birth: 2024-08-09 15:21:26.892522871 +0200

...so I'll need to think of something else. But that is just a matter of changing the logic inside is_younger_than. (I like this name, it makes it easy to read 😉)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The access time changes, maybe we can use that...

logger.debug(f"Using cached VCS info for {model}")
cached_info = esm_vcs_info_file_cache.load_cache()
vcs_versions[model] = cached_info[model]
else:
vcs_versions[model] = helpers.get_all_git_info(model_dir)
else:
vcs_versions[model] = "Not a git-controlled model!"

Expand Down
Loading