Address loose ends in versioned release mechanics #3421

Merged
9 commits merged Feb 25, 2024


zaneselvans (Member) commented Feb 23, 2024

Overview

Deal with some minor issues in the release mechanics that came up in the last release.

Closes #3375

Testing

How did you make sure this worked? How can a reviewer verify this?

To-do list

@zaneselvans zaneselvans self-assigned this Feb 23, 2024
@zaneselvans zaneselvans added the metadata and nightly-builds labels Feb 23, 2024
@zaneselvans zaneselvans changed the title from "Improve build script" to "Address loose ends in versioned release mechanics" Feb 23, 2024
@zaneselvans zaneselvans added this to the v2024.02 milestone Feb 23, 2024
-pytest ${pytest_args} -n auto --live-dbs test/validate
+pytest ${pytest_args} -n 4 --no-cov --live-dbs test/validate
Member Author

These tests aren't used to generate coverage, but coverage collection is turned on by default in our coverage configuration. Explicitly disable it here with --no-cov to avoid a "failure" after the tests run.

Also, cap the run at 4 worker processes (-n 4 instead of -n auto) to avoid running out of memory, since our system for running the data validations is currently very memory-inefficient.
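The tradeoff behind replacing `-n auto` with `-n 4` can be sketched in Python. With pytest-xdist, `-n auto` starts roughly one worker per CPU, so peak memory grows with the core count. Both helpers below are hypothetical and not part of the PUDL build script, and the per-worker memory figure is an assumed input rather than a measured value.

```python
def choose_worker_count(
    cpu_count: int, total_mem_gb: float, mem_per_worker_gb: float, cap: int = 4
) -> int:
    """Hypothetical helper: pick an xdist worker count that fits in memory.

    Takes the smallest of the CPU count, the number of workers that fit in
    RAM, and a hard cap (4, mirroring the `-n 4` in the build script).
    Always returns at least 1.
    """
    fits_in_memory = int(total_mem_gb // mem_per_worker_gb)
    return max(1, min(cpu_count, fits_in_memory, cap))


def validation_pytest_args(workers: int) -> list[str]:
    """Assemble the validation command from the diff (minus ${pytest_args})."""
    return [
        "pytest",
        "-n", str(workers),
        "--no-cov",  # coverage is on by default, but these tests don't feed it
        "--live-dbs",
        "test/validate",
    ]


if __name__ == "__main__":
    # Assumed machine: 16 CPUs, 64 GB RAM, ~12 GB per validation worker.
    n = choose_worker_count(cpu_count=16, total_mem_gb=64, mem_per_worker_gb=12)
    print(" ".join(validation_pytest_args(n)))
    # prints: pytest -n 4 --no-cov --live-dbs test/validate
```

A fixed cap is simpler than sizing workers from measured memory, at the cost of leaving some cores idle on large machines.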

@@ -71,7 +71,7 @@ function run_pudl_etl() {

function save_outputs_to_gcs() {
echo "Copying outputs to GCP bucket $PUDL_GCS_OUTPUT" && \
-gsutil -m cp -r "$PUDL_OUTPUT" "$PUDL_GCS_OUTPUT" && \
+gsutil -q -m cp -r "$PUDL_OUTPUT" "$PUDL_GCS_OUTPUT" && \
Member Author

I added -q and --quiet flags to the GCS and S3 commands, since otherwise they produce verbose interactive terminal output that we don't want in the logs.
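A minimal Python sketch of assembling the quieted copy command. `-q` and `-m` are top-level gsutil options (quiet output and parallel operations) and must precede the `cp` subcommand, while `-r` belongs to `cp`. The function name and the paths in the usage lines are hypothetical placeholders; the command is only built here, not executed.

```python
import shlex


def gsutil_copy_cmd(src: str, dest: str, quiet: bool = True) -> list[str]:
    """Build a recursive, parallel gsutil copy command (hypothetical helper).

    Top-level options such as -q and -m go before the subcommand name;
    -r is an option of cp itself.
    """
    cmd = ["gsutil"]
    if quiet:
        cmd.append("-q")  # suppress interactive progress output in CI logs
    cmd += ["-m", "cp", "-r", src, dest]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; the real script uses $PUDL_OUTPUT and $PUDL_GCS_OUTPUT.
    print(shlex.join(gsutil_copy_cmd("/tmp/pudl_output", "gs://example-bucket/nightly")))
    # prints: gsutil -q -m cp -r /tmp/pudl_output gs://example-bucket/nightly
```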

Comment on lines +286 to +291
# If running a tagged release, ensure that outputs can't be accidentally deleted
# It's not clear that an object lock can be applied in S3 with the AWS CLI
if [[ "$GITHUB_ACTION_TRIGGER" == "push" && "$BUILD_REF" == v20* ]]; then
gcloud storage objects update "gs://pudl.catalyst.coop/$BUILD_REF/*" --temporary-hold 2>&1 | tee -a "$LOGFILE"
GCLOUD_TEMPORARY_HOLD_SUCCESS=${PIPESTATUS[0]}
fi
Member Author

This should prevent accidental deletion of versioned outputs after they've been distributed, at least on GCS.
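Two details of the hold snippet are easy to miss. Inside `[[ ... ]]`, an unquoted right-hand side such as `v20*` is a glob pattern, so the hold applies to any ref starting with `v20`. And after `cmd | tee`, `$?` reports the exit status of `tee`, which is why the script reads `${PIPESTATUS[0]}` to get the status of `gcloud`. The Python sketch below mirrors both behaviors under stated assumptions: the function names are hypothetical, `fnmatchcase` only approximates bash pattern matching (it is equivalent for a simple pattern like this one), and output is buffered rather than streamed as `tee` would.

```python
import subprocess
from fnmatch import fnmatchcase


def is_versioned_release(trigger: str, build_ref: str) -> bool:
    """Mirror of: [[ "$GITHUB_ACTION_TRIGGER" == "push" && "$BUILD_REF" == v20* ]]"""
    return trigger == "push" and fnmatchcase(build_ref, "v20*")


def run_and_tee(cmd: list[str], logfile: str) -> int:
    """Run cmd, append its combined output to logfile, echo it, and return
    cmd's own exit status (the analogue of ${PIPESTATUS[0]}, not tee's)."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # like 2>&1
        text=True,
    )
    with open(logfile, "a") as log:  # like tee -a
        log.write(result.stdout)
    print(result.stdout, end="")
    return result.returncode
```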

@@ -285,14 +285,14 @@ filterwarnings = [
"ignore:Subclassing validator classes is not intended to be part of their public API.:DeprecationWarning",
"ignore:Subclassing validator classes:DeprecationWarning:tableschema",
"ignore:The Shapely GEOS version:UserWarning:geopandas[.*]",
-"ignore:Unknown extension:UserWarning:openpyxl.worksheet[.*]",
Member Author

Removed this filter since we aren't using the openpyxl engine anymore.
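Each `filterwarnings` entry uses the colon-separated `action:message:category:module` form. The rough Python approximation below (not pytest's actual parser, which also handles dotted category paths and a trailing line number) shows how the removed openpyxl entry maps onto `warnings.filterwarnings`, where the message and module fields act as regular expressions matched from the start of the string. `apply_filter_spec` and `emit` are hypothetical names.

```python
import builtins
import warnings


def apply_filter_spec(spec: str) -> None:
    """Hypothetical, simplified translation of an ini-style filter string.

    Handles "action:message:category:module" with builtin warning
    categories only; message and module are treated as regexes.
    """
    parts = spec.split(":")
    parts += [""] * (4 - len(parts))  # pad missing optional fields
    action, message, category_name, module = parts[:4]
    category = getattr(builtins, category_name) if category_name else Warning
    warnings.filterwarnings(action, message=message, category=category, module=module)


def emit(message: str, module: str) -> list[warnings.WarningMessage]:
    """Emit a UserWarning attributed to `module`; return the ones not filtered."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        apply_filter_spec("ignore:Unknown extension:UserWarning:openpyxl.worksheet[.*]")
        warnings.warn_explicit(message, UserWarning, "reader.py", 1, module=module)
    return caught
```

Because the entry only matches warnings raised from `openpyxl.worksheet` modules, it has nothing left to suppress once that engine is no longer used.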

-asset_check_from_schema(asset_key, _package) for asset_key in _asset_keys
+asset_check_from_schema(asset_key, _package)
+for asset_key in _asset_keys
+if asset_key.to_user_string() != "core_epacems__hourly_emissions"
Member Author

The asset check was failing on core_epacems__hourly_emissions because that table is written out very differently from the other tables. It also has a billion rows, so if we wanted to check it, we would need a different approach.
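The exclusion can be sketched without Dagster. `FakeAssetKey` below is a hypothetical stand-in that only mimics the `to_user_string()` method used in the diff, and the `make_check` callable stands in for `asset_check_from_schema`. Keeping skipped tables in a set is a design choice of this sketch: it would make excluding another table a one-line change, should any need separate handling later.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class FakeAssetKey:
    """Hypothetical stand-in for a Dagster asset key."""
    path: tuple[str, ...]

    def to_user_string(self) -> str:
        return "/".join(self.path)


# Tables whose schema-based asset checks are skipped.
EXCLUDED_TABLES = {"core_epacems__hourly_emissions"}


def build_checks(
    asset_keys: Iterable[FakeAssetKey], make_check: Callable[[FakeAssetKey], str]
) -> list[str]:
    """Build one check per asset key, skipping excluded tables."""
    return [
        make_check(asset_key)
        for asset_key in asset_keys
        if asset_key.to_user_string() not in EXCLUDED_TABLES
    ]


if __name__ == "__main__":
    keys = [
        FakeAssetKey(("out_example_table",)),  # hypothetical table name
        FakeAssetKey(("core_epacems__hourly_emissions",)),
    ]
    print(build_checks(keys, lambda key: f"check for {key.to_user_string()}"))
    # prints: ['check for out_example_table']
```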

-"valid_till_date": {
+"valid_until_date": {
Member Author

Use the standard word "until" rather than the informal "till".
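Renaming a metadata key can break consumers that still read the old spelling. A hypothetical shim like the one below, which is not part of this PR, would accept either key during a transition period and prefer the new one.

```python
from typing import Any, Mapping, Optional


def get_valid_until(metadata: Mapping[str, Any]) -> Optional[Any]:
    """Return the validity end date, preferring the renamed key.

    Falls back to the legacy "valid_till_date" key so that metadata written
    before the rename can still be read. Returns None if neither key exists.
    """
    if "valid_until_date" in metadata:
        return metadata["valid_until_date"]
    return metadata.get("valid_till_date")
```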

Comment on lines -109 to +110
-path.open("wb").write(content)
+with path.open("wb") as file:
+    file.write(content)
Member Author

This was generating a warning about the file being left open, so I switched to a context manager, which closes the file even if the write fails.
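A short Python illustration of the change. Leaving the `with` block closes the file deterministically, whether or not `write()` raised, whereas `path.open("wb").write(content)` leaves closing to garbage collection, which is when CPython emits its unclosed-file `ResourceWarning`. `Path.write_bytes()` is a shorter alternative that also opens and closes the file itself. The function name `save_content` is hypothetical.

```python
import tempfile
from pathlib import Path


def save_content(path: Path, content: bytes) -> None:
    """Write bytes to path, closing the file deterministically."""
    with path.open("wb") as file:
        file.write(content)
    # Leaving the with block closed the file, whether or not write() raised.


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "example.bin"
        save_content(target, b"hello")
        # Equivalent one-liner that also manages the file handle itself:
        target.write_bytes(b"hello")
        print(target.read_bytes())  # prints: b'hello'
```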

@zaneselvans zaneselvans marked this pull request as ready for review February 24, 2024 20:22
@zaneselvans zaneselvans added the release label Feb 24, 2024
@zaneselvans zaneselvans added this pull request to the merge queue Feb 25, 2024
@pudlbot pudlbot removed the request for review from bendnorman February 25, 2024 00:26
Merged via the queue into main with commit 730cb52 Feb 25, 2024
12 checks passed
@zaneselvans zaneselvans deleted the improve-build-script branch February 25, 2024 01:33
Labels
metadata Anything having to do with the content, formatting, or storage of metadata. Mostly datapackages. nightly-builds Anything having to do with nightly builds or continuous deployment. release Tasks directly related to data and software releases.
Projects
Archived in project
2 participants