Use gcs cache in ci #1858

Merged: 6 commits, merged on Aug 23, 2022

.github/workflows/build-deploy-pudl.yml: 2 changes (1 addition, 1 deletion)
@@ -72,7 +72,7 @@ jobs:
       - id: "auth"
         uses: "google-github-actions/auth@v0"
         with:
-          credentials_json: "${{ secrets.GCE_SA_KEY }}"
+          credentials_json: "${{ secrets.DEPLOY_PUDL_SA_KEY }}"
 
       # Setup gcloud CLI
       - name: Set up Cloud SDK

.github/workflows/tox-pytest.yml: 8 changes (7 additions, 1 deletion)
@@ -55,11 +55,17 @@ jobs:
           conda run -n pudl-test which sqlite3
           conda run -n pudl-test sqlite3 --version
 
+      - name: Set default gcp credentials
+        id: gcloud-auth
+        uses: "google-github-actions/auth@v0"
+        with:
+          credentials_json: "${{ secrets.TOX_PYTEST_SA_KEY }}"
+
       - name: Run PyTest with Tox
         env:
           API_KEY_EIA: ${{ secrets.API_KEY_EIA }}
         run: |
-          conda run -n pudl-test tox
+          conda run -n pudl-test tox -- --gcs-cache-path gs://zenodo-cache.catalyst.coop
 
       - name: Log post-test Zenodo datastore contents
         run: find ~/pudl-work/data/

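The new `tox -- --gcs-cache-path gs://zenodo-cache.catalyst.coop` invocation relies on tox forwarding everything after `--` as `{posargs}`; presumably PUDL's tox.ini hands `{posargs}` on to pytest, which then receives the option. How the test suite registers that option is not shown in this diff, so the sketch below uses `argparse` as a stand-in and a hypothetical helper, `parse_gcs_cache_path`, to show how a `gs://` path splits into a bucket name and an optional object prefix.

```python
import argparse
from typing import Optional
from urllib.parse import urlparse


def parse_gcs_cache_path(argv: list[str]) -> Optional[tuple[str, str]]:
    """Return (bucket, prefix) for --gcs-cache-path, or None if it was not given.

    Hypothetical helper: argparse stands in for however the real test suite
    registers the option.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--gcs-cache-path",
        default=None,
        help="Read-through cache location, e.g. gs://bucket/optional/prefix",
    )
    args = parser.parse_args(argv)
    if args.gcs_cache_path is None:
        return None
    parsed = urlparse(args.gcs_cache_path)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"Expected a gs://bucket path, got {args.gcs_cache_path!r}")
    # netloc is the bucket name; the rest of the path (minus its leading slash)
    # is the object prefix, empty when the cache lives at the bucket root.
    return parsed.netloc, parsed.path.lstrip("/")


print(parse_gcs_cache_path(["--gcs-cache-path", "gs://zenodo-cache.catalyst.coop"]))
# ('zenodo-cache.catalyst.coop', '')
```

Omitting the flag yields `None`, which corresponds to running without a GCS cache layer at all.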
docs/dev/nightly_data_builds.rst: 2 changes (1 addition, 1 deletion)
@@ -31,7 +31,7 @@ The ``gcloud`` command in ``build-deploy-pudl`` requires certain Google Cloud
 Platform (GCP) permissions to start and update the GCE instance. The
 ``gcloud`` command authenticates using a service account key for the
 ``deploy-pudl-github-action`` service account stored in PUDL's GitHub secrets
-as ``GCE_SA_KEY``. The ``deploy-pudl-github-action`` service account has
+as ``DEPLOY_PUDL_SA_KEY``. The ``deploy-pudl-github-action`` service account has
 the `Compute Instance Admin (v1) IAM <https://cloud.google.com/iam/docs/understanding-roles#compute-engine>`__
 role on the GCE instances to update the container and start the instance.
 

src/pudl/workspace/datastore.py: 24 changes (16 additions, 8 deletions)
@@ -15,6 +15,7 @@
 import coloredlogs
 import datapackage
 import requests
+from google.auth.exceptions import DefaultCredentialsError
 from requests.adapters import HTTPAdapter
 from requests.packages.urllib3.util.retry import Retry
 
@@ -288,9 +289,16 @@ def __init__(
         if local_cache_path:
             self._cache.add_cache_layer(resource_cache.LocalFileCache(local_cache_path))
         if gcs_cache_path:
-            self._cache.add_cache_layer(
-                resource_cache.GoogleCloudStorageCache(gcs_cache_path)
-            )
+            try:
+                self._cache.add_cache_layer(
+                    resource_cache.GoogleCloudStorageCache(gcs_cache_path)
+                )
+            except DefaultCredentialsError:
+                logger.info(
+                    f"Unable to obtain credentials for GCS Cache at {gcs_cache_path}. "
+                    "Falling back to Zenodo if necessary."
+                )
+                pass
 
         self._zenodo_fetcher = ZenodoFetcher(sandbox=sandbox, timeout=timeout)
 
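With this change, missing GCP credentials no longer abort `Datastore` construction: the `DefaultCredentialsError` (presumably raised by google-auth's Application Default Credentials lookup inside `GoogleCloudStorageCache`) is logged and the GCS layer is skipped, so reads fall back to Zenodo. The sketch below reproduces that control flow without needing google-auth installed. `DefaultCredentialsError` here is a local stand-in class, and `find_credentials_file`, `GcsCacheStub`, and `build_cache_layers` are hypothetical. The lookup is deliberately simplified to the first two ADC steps (the `GOOGLE_APPLICATION_CREDENTIALS` variable, then gcloud's well-known file on Linux/macOS) and omits the metadata-server check.

```python
import logging
import os
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


class DefaultCredentialsError(Exception):
    """Stand-in for google.auth.exceptions.DefaultCredentialsError."""


def find_credentials_file() -> Path:
    """Hypothetical, simplified version of the first two ADC lookup steps."""
    env_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if env_path:
        path = Path(env_path)
        if not path.exists():
            raise DefaultCredentialsError(f"File {env_path} was not found.")
        return path
    # gcloud's well-known location on Linux/macOS (Windows uses %APPDATA% instead).
    well_known = Path.home() / ".config" / "gcloud" / "application_default_credentials.json"
    if well_known.exists():
        return well_known
    raise DefaultCredentialsError("Could not automatically determine credentials.")


class GcsCacheStub:
    """Hypothetical GCS cache layer that looks up credentials when constructed."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.credentials_file = find_credentials_file()


def build_cache_layers(gcs_cache_path: Optional[str]) -> list:
    """Mirror the diff: add the GCS layer if possible, otherwise log and carry on."""
    layers: list = []
    if gcs_cache_path:
        try:
            layers.append(GcsCacheStub(gcs_cache_path))
        except DefaultCredentialsError:
            # Without this layer, resource lookups fall through to Zenodo.
            logger.info(
                "Unable to obtain credentials for GCS Cache at %s. "
                "Falling back to Zenodo if necessary.",
                gcs_cache_path,
            )
    return layers
```

With `GOOGLE_APPLICATION_CREDENTIALS` pointing at a missing file, `build_cache_layers("gs://zenodo-cache.catalyst.coop")` returns an empty list instead of raising, which is the behavior local runs without GCP access depend on.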
@@ -299,7 +307,7 @@ def get_known_datasets(self) -> list[str]:
         return self._zenodo_fetcher.get_known_datasets()
 
     def get_datapackage_descriptor(self, dataset: str) -> DatapackageDescriptor:
-        """Fetch datapackage descriptor for given dataset either from cache or from zenodo."""
+        """Fetch datapackage descriptor for dataset either from cache or Zenodo."""
         doi = self._zenodo_fetcher.get_doi(dataset)
         if doi not in self._datapackage_descriptors:
             res = PudlResourceKey(dataset, doi, "datapackage.json")
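The context lines above show descriptors memoized in `self._datapackage_descriptors`, keyed by DOI rather than by dataset name. A minimal sketch of that pattern follows, assuming plain dicts for descriptors. `DescriptorStore`, `fake_fetch`, and the `10.0000/example.*` DOIs are hypothetical. Keying on the DOI means that a dataset whose archive moves to a new DOI triggers a fresh fetch instead of serving the old descriptor.

```python
from typing import Callable


class DescriptorStore:
    """Hypothetical sketch: fetch each datapackage descriptor at most once per DOI."""

    def __init__(self, get_doi: Callable[[str], str], fetch: Callable[[str], dict]) -> None:
        self._get_doi = get_doi
        self._fetch = fetch
        self._descriptors: dict[str, dict] = {}

    def get_descriptor(self, dataset: str) -> dict:
        doi = self._get_doi(dataset)
        if doi not in self._descriptors:
            # Only the first request for this DOI reaches the (slow) cache/Zenodo fetch.
            self._descriptors[doi] = self._fetch(doi)
        return self._descriptors[doi]


fetched: list[str] = []


def fake_fetch(doi: str) -> dict:
    """Record each remote fetch so repeated lookups are visible."""
    fetched.append(doi)
    return {"doi": doi}


store = DescriptorStore(get_doi=lambda name: f"10.0000/example.{name}", fetch=fake_fetch)
store.get_descriptor("eia860")
store.get_descriptor("eia860")
print(fetched)  # ['10.0000/example.eia860']
```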
@@ -325,9 +333,9 @@ def get_resources(
         """Return content of the matching resources.
 
         Args:
-            dataset (str): name of the dataset to query.
-            cached_only (bool): if True, only retrieve resources that are present in the cache.
-            skip_optimally_cached (bool): if True, only retrieve resources that are not optimally
+            dataset: name of the dataset to query.
+            cached_only: if True, only retrieve resources that are present in the cache.
+            skip_optimally_cached: if True, only retrieve resources that are not optimally
                 cached. This triggers attempt to optimally cache these resources.
             filters (key=val): only return resources that match the key-value mapping in their
                 metadata["parts"].
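The `filters (key=val)` argument documented above selects resources whose `metadata["parts"]` matches every supplied pair. A minimal sketch of that matching rule follows, assuming resources are plain dicts carrying a `"parts"` mapping and that matching is exact equality. PUDL's real resource objects, and whether it coerces types (for example `"2020"` vs `2020`), are not shown in this diff. `filter_by_parts` and the sample file names are hypothetical.

```python
from typing import Any, Iterable, Iterator


def filter_by_parts(resources: Iterable[dict[str, Any]], **filters: Any) -> Iterator[dict[str, Any]]:
    """Yield resources whose "parts" mapping matches every key=val in filters."""
    for res in resources:
        parts = res.get("parts", {})
        # Exact equality on each requested key; keys missing from parts never match.
        if all(key in parts and parts[key] == val for key, val in filters.items()):
            yield res


resources = [
    {"name": "eia860-2019.zip", "parts": {"year": 2019}},
    {"name": "eia860-2020.zip", "parts": {"year": 2020}},
]
print([r["name"] for r in filter_by_parts(resources, year=2020)])  # ['eia860-2020.zip']
```

With no filters, `all()` over an empty sequence is `True`, so every resource is returned, matching the "no constraints" reading of the docstring.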
@@ -349,7 +357,7 @@ def get_resources(
                 self._cache.add(res, contents)
             yield (res, contents)
 
-    def remove_from_cache(self, res: PudlResourceKey):
+    def remove_from_cache(self, res: PudlResourceKey) -> None:
         """Remove given resource from the associated cache."""
         self._cache.delete(res)
 

tox.ini: 2 changes (1 addition, 1 deletion)
@@ -259,7 +259,7 @@ addopts = --verbose --pdbcls=IPython.terminal.debugger:TerminalPdb
 log_format = %(asctime)s [%(levelname)8s] %(name)s:%(lineno)s %(message)s
 log_date_format= %Y-%m-%d %H:%M:%S
 log_cli = true
-log_cli_level = info
+log_cli_level = debug
 doctest_optionflags = NORMALIZE_WHITESPACE IGNORE_EXCEPTION_DETAIL ELLIPSIS
 filterwarnings =
     ignore:distutils Version classes are deprecated:DeprecationWarning
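`log_cli_level` sets the minimum level pytest shows in live logs, so lowering it from `info` to `debug` surfaces DEBUG records in the CI output as well. The sketch below illustrates the same threshold effect with the standard `logging` module: a handler's level plays the role of `log_cli_level`. `ListHandler` and `capture` are hypothetical names; this is not how pytest implements live logging internally.

```python
import logging


class ListHandler(logging.Handler):
    """Collect messages in memory so the effect of the handler level is visible."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(f"{record.levelname}:{record.getMessage()}")


def capture(level: int) -> list[str]:
    """Log one DEBUG and one INFO message; return what a handler at `level` keeps."""
    logger = logging.getLogger(f"demo.{logging.getLevelName(level)}")
    logger.propagate = False
    logger.setLevel(logging.DEBUG)  # the logger itself lets everything through
    handler = ListHandler()
    handler.setLevel(level)  # stands in for log_cli_level
    logger.addHandler(handler)
    logger.debug("debug detail")
    logger.info("info message")
    logger.removeHandler(handler)
    return handler.messages


print(capture(logging.INFO))   # ['INFO:info message']
print(capture(logging.DEBUG))  # ['DEBUG:debug detail', 'INFO:info message']
```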