This repository has been archived by the owner on Jan 12, 2024. It is now read-only.

Add s3 bucket support #70

Merged: 16 commits merged into dev on Dec 20, 2022

@bendnorman (Member) commented Dec 15, 2022

This PR adds the new s3://intake.catalyst.coop bucket and sets it as the default. I also removed the requester pays documentation, given we don't want users to be using the GCS requester pays bucket.

@bendnorman linked an issue (18 tasks) on Dec 15, 2022 that may be closed by this pull request

codecov bot commented Dec 15, 2022

Codecov Report

Base: 100.0% // Head: 100.0% // No change to project coverage 👍

Coverage data is based on head (d027d82) compared to base (7915cb4).
Patch coverage: 100.0% of modified lines in pull request are covered.

Additional details and impacted files
@@           Coverage Diff           @@
##              dev      #70   +/-   ##
=======================================
  Coverage   100.0%   100.0%           
=======================================
  Files           2        2           
  Lines          44       44           
=======================================
  Hits           44       44           

| Impacted Files | Coverage Δ |
|---|---|
| src/pudl_catalog/__init__.py | 100.0% <100.0%> (ø) |

@zaneselvans (Member) commented

I think the failing tests here are because we were using an old version of the EPA CEMS outputs previously, which had a few additional columns that have since been removed by @aesharpe, so I am updating the expectations in the integration tests now.

@zaneselvans (Member) left a comment

One minor ask -- can you migrate the gcloud setup / nightly build access docs into our main PUDL docs? Folks have found it useful internally.

Review threads: docs/requester_pays.rst (resolved); src/pudl_catalog/__init__.py (outdated, resolved)

This way people can get access to the dev data by
installing the pudl_catalog package from git.
This value will be updated for tagged releases
so people can access the release data from pypi
and conda.
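
A minimal sketch of how a package-level default like the one this commit message describes might be resolved. The bucket name comes from the PR description; the `/dev` suffix, the `PUDL_INTAKE_PATH` variable name, and the `resolve_base_url` helper are assumptions for illustration, not necessarily what `src/pudl_catalog/__init__.py` does.

```python
import os
from typing import Mapping

# Bucket name is from the PR description. The "/dev" suffix and the
# environment variable name are assumptions made for this sketch.
DEFAULT_BASE_URL = "s3://intake.catalyst.coop/dev"
ENV_VAR = "PUDL_INTAKE_PATH"


def resolve_base_url(env: Mapping[str, str] = os.environ) -> str:
    """Return the catalog base URL, letting an environment variable win."""
    # An empty or whitespace-only override falls back to the default.
    override = env.get(ENV_VAR, "").strip()
    base = override or DEFAULT_BASE_URL
    # Normalize so callers can always append "/<filename>".
    return base.rstrip("/")
```

Under this sketch, a tagged release would only need to change `DEFAULT_BASE_URL`, while a user could still point at a different location by setting the environment variable.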
@zaneselvans (Member) commented

Oh thank goodness nobody will have to authenticate. That was gonna be annoying.
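
For context, unauthenticated reads are usually requested through filesystem-specific options. The sketch below maps a URL scheme to the anonymous-access option accepted by s3fs (`anon=True`) and gcsfs (`token="anon"`). The helper name `anonymous_storage_options` is hypothetical, and whether a given bucket actually permits anonymous reads depends on its configuration.

```python
from urllib.parse import urlparse


def anonymous_storage_options(url: str) -> dict:
    """Pick storage options that request unauthenticated access for a URL."""
    scheme = urlparse(url).scheme
    if scheme in ("s3", "s3a"):
        # s3fs: skip the credential lookup entirely.
        return {"anon": True}
    if scheme in ("gs", "gcs"):
        # gcsfs: use anonymous access instead of Google credentials.
        return {"token": "anon"}
    # Other schemes (local paths, https) need no special options here.
    return {}
```

The returned dictionary could then be passed as `storage_options` to a reader such as `pandas.read_parquet`, assuming the matching filesystem package is installed.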

bendnorman and others added 5 commits December 19, 2022 15:57
I removed the setting that disabled caching because fsspec started
throwing unexpected keyword argument errors when making requests to s3
with caching disabled. See intake/intake-parquet#26
for the full explanation.
@bendnorman (Member, Author) commented

If the CI passes, this PR should be good to go, @zaneselvans.

@@ -25,10 +28,45 @@
 "hourly_emissions_epacems/epacems-2020-FL.parquet",
 ],
 )
-def test_file_exists(filename: str) -> None:
+def test_gcs_file_exists(filename: str) -> None:

So we're going to keep testing both S3 and GCS, which seems like a good idea if we actually want GCS to keep working as a fallback.
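
The idea of checking the same files on both backends can be sketched as follows. The S3 bucket name is from the PR description and the filename is from the diff above; the `/dev` suffix, the GCS URL, and the `find_missing` helper are assumptions for illustration. The existence check is injected so the logic can be exercised without network access.

```python
from typing import Callable, Dict, Iterable, List

# The S3 bucket name comes from the PR description. The "/dev" suffix and
# the GCS URL are assumptions made for this sketch.
BASE_URLS: Dict[str, str] = {
    "s3": "s3://intake.catalyst.coop/dev",
    "gcs": "gs://intake.catalyst.coop/dev",
}


def find_missing(
    filenames: Iterable[str],
    exists: Callable[[str], bool],
    base_urls: Dict[str, str] = BASE_URLS,
) -> List[str]:
    """Return every backend URL for which the existence check fails."""
    names = list(filenames)
    missing = []
    for base in base_urls.values():
        for name in names:
            url = f"{base.rstrip('/')}/{name}"
            if not exists(url):
                missing.append(url)
    return missing
```

In a real integration test, `exists` might wrap `fsspec.core.url_to_fs(url)` and call `fs.exists(path)` on the result, which needs network access plus the s3fs and gcsfs packages.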

@zaneselvans merged commit e81ed3d into dev on Dec 20, 2022

Linked issue (may be closed by this pull request): Replace storage_option with AWS S3 bucket