PUDL Release v2022.11.30 #2077

Closed
33 tasks done
zaneselvans opened this issue Nov 18, 2022 · 7 comments · Fixed by #1681
Assignees
Labels
inframundo release Tasks directly related to data and software releases.

Comments

@zaneselvans
Member

zaneselvans commented Nov 18, 2022

As soon as the nightly builds succeed on dev we'll be ready to merge into main and tag a new PUDL release that includes all 2021 data for all of our covered datasets.

Release Checklist / Notes

Since we want to automate the release process I'm trying to catalog everything I do here...

  • Sync Zenodo archives to GCS cache for nightly builds
    • censusdp1tract
    • eia860
    • eia860m
    • eia923
    • eia_bulk_elec
    • epacamd_eia
    • epacems
    • ferc1
    • ferc2
    • ferc6
    • ferc60
    • ferc714
  • After a passing nightly build, check that https://data.catalyst.coop is working as expected
  • Check that the PUDL Intake Catalog tests pass with the new dev archive.
  • Check that all the RMI stuff that depends on dev is working.
  • Merge "Release 2022.11.XX" (#1681) into main
  • Update the data release build script in devtools/databeta.sh to use the new data (and pull it from GCS?)
  • Re-run a nightly build that uses the official Arelle release.
  • Package the official Arelle distribution & catalystcoop.ferc-xbrl-extractor on conda-forge. See this PR
  • Tag the release on main, using CalVer (YYYY.MM.DD)
  • Wait for the tagged build to complete successfully
  • Release catalystcoop.pudl v2022.11.30 on PyPI.
  • Convert tag to release on GitHub & upload PyPI distribution outputs.
  • Manually update metadata for Zenodo archive of the PUDL GitHub repo.
  • Release catalystcoop.pudl on conda-forge
  • Update the PUDL Examples repo environment to use new software versions.
  • Do a versioned release of the PUDL Intake Catalog to point at the newly released data version. Need to merge "Add s3 bucket support" (pudl-catalog#70) first.
  • Update the PUDL Examples repo environment to install new version of pudl-catalog with the v2022.11.30 data.
  • Update PUDL Examples notebooks to run locally with the newly released data + software versions.
  • Build a PUDL data release tarball including Docker + Data with pudl-data-release.sh & upload it to Zenodo
  • Update version of Docker container on the 2i2c JupyterHub
  • Upload the newly released data to the 2i2c JupyterHub (downloaded from S3 release bucket)
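
Since the goal is to automate this checklist, the CalVer tagging step is one of the easier pieces to script. The following is only a minimal sketch, not part of the PUDL tooling: the function names `calver_tag` and `parse_calver_tag` are hypothetical, and zero-padding the month and day is an assumption (the only tag shown in this issue is v2022.11.30).

```python
import re
from datetime import date

# Assumed tag shape: "v" followed by YYYY.MM.DD, as in v2022.11.30.
CALVER_TAG = re.compile(r"^v(\d{4})\.(\d{1,2})\.(\d{1,2})$")


def calver_tag(release_date: date) -> str:
    """Format a date as a vYYYY.MM.DD tag (hypothetical helper)."""
    return f"v{release_date:%Y.%m.%d}"


def parse_calver_tag(tag: str) -> date:
    """Parse a vYYYY.MM.DD tag back into a date, rejecting anything else."""
    match = CALVER_TAG.match(tag)
    if match is None:
        raise ValueError(f"not a CalVer tag: {tag!r}")
    year, month, day = (int(part) for part in match.groups())
    # date() raises ValueError for impossible dates such as 2022.11.31.
    return date(year, month, day)


if __name__ == "__main__":
    print(calver_tag(date(2022, 11, 30)))
```

Parsing the tag back into a date gives a cheap sanity check before pushing a tag, since a typo like v2022.11.31 fails immediately.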
@zaneselvans zaneselvans added the release Tasks directly related to data and software releases. label Nov 18, 2022
@zaneselvans zaneselvans self-assigned this Nov 18, 2022
@zaneselvans zaneselvans linked a pull request Nov 18, 2022 that will close this issue
@zaneselvans zaneselvans reopened this Nov 29, 2022
@zaneselvans zaneselvans changed the title PUDL Release 2022.11.XX PUDL Release 2022.11.30 Nov 29, 2022
@zaneselvans zaneselvans changed the title PUDL Release 2022.11.30 PUDL Release v2022.11.30 Nov 29, 2022
@jdangerx
Member

jdangerx commented Jan 6, 2023

@zaneselvans - seems like v2022.11.30 is released, at least on Zenodo. Are we good to merge dev into main, close this issue, etc? Do you still need to do 2i2c stuff?

Also, is this the process you want help streamlining?

@zaneselvans
Member Author

@jdangerx yes, this is a big part of the semi-manual release process that needs streamlining. The other big piece, which @zschira has been looking at, is on the data acquisition end with the pudl-archiver repository and #1418.

I'm torn on the JupyterHub. If we're not going to update it, then we should remove it from the documentation. I do think some resource like this is / would be useful. I should just go ahead and update it. It should only take 10 minutes. I just ran out of steam.

@jdangerx
Member

Tada!

Image

@zaneselvans
Member Author

zaneselvans commented Feb 10, 2023

However, I'm pretty sure we still need to upload the new data to the JupyterHub. If you attempt to run the example notebooks in the new Docker container I believe they'll fail, since the data on the hub is from the prior release.

But this should hopefully be much faster & easier now that I can pull it down directly from the S3 bucket without needing to do any authentication.

@jdangerx
Member

@zaneselvans how do we do that upload?

@zaneselvans
Member Author

zaneselvans commented Feb 10, 2023

I usually log in to the JupyterHub, open a terminal within JupyterLab, and download the files from wherever they are on the internet. Historically this has been from Zenodo which has been flaky and slow. But now that we've got the build outputs in a publicly accessible bucket with no authentication required, it should be much faster and easier. Still need to install the AWS CLI on the hub to do recursive downloads. Should probably add that to the Docker container rather than needing to do it manually.

I've got it downloaded to the hub now, and am mopping up the old versions and putting the files in the right places.
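
For anyone scripting this later, here is a rough sketch of the unauthenticated recursive download described above. The bucket name, prefix, and destination directory are placeholders, since the actual values are not given in this thread, and `build_s3_download_command` is a hypothetical helper. It only assembles the AWS CLI invocation; `--recursive` and `--no-sign-request` are existing AWS CLI options, the latter skipping credential lookup for public buckets.

```python
import shlex
from typing import List


def build_s3_download_command(bucket: str, prefix: str, dest: str) -> List[str]:
    """Build an `aws s3 cp` command for an anonymous recursive download.

    bucket, prefix, and dest are all caller-supplied; nothing here is
    specific to the real release bucket.
    """
    source = f"s3://{bucket}/{prefix.strip('/')}/"
    return ["aws", "s3", "cp", source, dest, "--recursive", "--no-sign-request"]


if __name__ == "__main__":
    # Placeholder values only: substitute the real bucket and release prefix.
    cmd = build_s3_download_command(
        "example-release-bucket", "v2022.11.30", "./pudl-data"
    )
    # Print the command for review; pass `cmd` to subprocess.run(cmd, check=True)
    # to execute it once the AWS CLI is installed on the hub.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems if it is later handed to `subprocess.run`.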

@zaneselvans
Member Author

Okay, it's all updated now.
