Releases: catalyst-cooperative/pudl
PUDL v2024.10.0
This is a special early release to publish the new VCE Resource Adequacy Renewable Energy (RARE) dataset. It also includes final releases of EIA 860 and 923 data for 2023 and the FERC Form 714 data for 2021-2023, which had previously been integrated from the XBRL data published by FERC. See the release notes for more narrative detail.
What's Changed
New & Updated Data
- Extract 714 xbrl by @aesharpe in #3822
- FERC 714: transform of hourly demand table (dbf +xbrl) by @cmgosnell in #3842
- Add source metadata for vceregen by @aesharpe in #3887
- Integrate 2023 EIA 923 final release data by @e-belfer in #3903
- Extract VCE renewable generation profiles and remove deprecated gsutil from workflows by @e-belfer in #3893
- Transform vceregen renewable generation profiles by @aesharpe in #3898
Other Changes
- Update release process documentation by @zaneselvans in #3807
- added description of Data Source Heading by @Nancy9ice in #3780
- Make an EIA plant part association table with generators by @cmgosnell in #3774
- Remove record_id_eia foreign key relationship to out_pudl__yearly_assn_eia_ferc1_plant_parts by @cmgosnell in #3819
- Add PUDL pronunciation to docs and README by @aesharpe in #3817
- Fix Excel and CSV column mapping errors by @cmgosnell in #3820
- Try to fix CodeCov report uploads in pytest workflow. by @zaneselvans in #3827
- If docs build interrupted, don't commit leftover autoapi files by @e-belfer in #3826
- added primary key column in data dictionary page by @Nancy9ice in #3821
- Raise error if FERC1 column renames don't match expected data by @e-belfer in #3791
- edited data dictionary description by @Nancy9ice in #3781
- Fix bugs in allocate_gen_fuel by @grgmiller in #3690
- update min-max rows for gen-fuel allocation* assets by @cmgosnell in #3831
- Add pudl usage metrics gcp infrastructure by @bendnorman in #3841
- Increase nightly build VM disk from 80GB to 100GB by @zaneselvans in #3853
- Adjust superset row limits by @bendnorman in #3843
- Add documentation section to PR template by @aesharpe in #3859
- Move dagster modules to pyproject.toml instead of CLI command by @bendnorman in #3865
- added introduction by @Nancy9ice in #3848
- FERC 714: Fix defensive check by @cmgosnell in #3869
- added descriptions of the EIA861 data by @Nancy9ice in #3808
- Integrate final 2023 EIA 860 data by @e-belfer in #3871
- Create an asset factory for FERC 1 output tables by @e-belfer in #3883
- Migrate where/when we filter for the freshest XBRL data by @cmgosnell in #3861
- fix rolling fuel cost average bug by @cmgosnell in #3892
- Add Mapbox key to superset by @bendnorman in #3854
- Update gridpathratoolkit and vceregen metadata by @e-belfer in #3897
- Lint GHA workflows in pre-commit. by @jdangerx in #3870
- Update the source.py file with new name for vceregen dataset: vcerare by @aesharpe in #3907
- Update language in vcerare datasource description by @aesharpe in #3909
- Close out release notes and repo metadata for v2024.10.0 by @zaneselvans in #3916
Full Changelog: v2024.8.0...v2024.10.0
PUDL v2024.8.0
See the release notes for v2024.8.0 in our docs for a more narrative explanation of what has changed.
What's Changed
New & Updated Data
- Merge EIA861 short-form transform by @zaneselvans in #3660
- Update expected row counts in service territory tables. by @zaneselvans in #3673
- Add EIA AEO fuel cost projections by @jdangerx in #3656
- Integrate 2023 EIA 860 Early Release by @cmgosnell in #3681
- Integrate 2024 NREL ATB data by @e-belfer in #3719
- Update xbrl extraction to use new version by @zschira in #3710
- Transform FERC-714 load forecast table by @seeess1 in #3670
- Integrate EIA 923 ER by @aesharpe in #3721
- Eia923 er validation test fix by @aesharpe in #3734
- Add title to quarterly updates issue template by @e-belfer in #3765
- Add Q2 2024 CEMS data by @e-belfer in #3769
- Update DOI for NREL ATB to get error corrections by @zaneselvans in #3778
- Integrate Eia923 Q2 2024 Data by @aesharpe in #3768
- Update 2024 Q2 EIA bulk electricity data by @e-belfer in #3785
- Integrate 2024 Half 2 data for EIA 930 by @e-belfer in #3789
- Integrate Eia860M Q2 2024 Data by @aesharpe in #3767
- Update all FERC dataset DOIs by @zaneselvans in #3790
Other Changes
- Add staging environment by @jdangerx in #3666
- Add retry logic to datastore.get_zipfile_resource. by @jdangerx in #3658
- SoD Docs Updates by @e-belfer in #3672
- Add data-update github label to the release notes by @aesharpe in #3683
- Update nightly build script to quietly publish public Parquet outputs. by @zaneselvans in #3680
- Update conda lockfile by @zaneselvans in #3685
- Update docs to link to .zip not .gz S3 archives by @zaneselvans in #3686
- Take the most recently reported generator operating date when there's no 70%+ consistently reported date by @e-belfer in #3697
- Add organization using PUDL section to readme by @bendnorman in #3671
- Log datasette access IPs by @jdangerx in #3669
- Make asset checks matter to integration tests by @jdangerx in #3687
- Fix EIA 860m Changelog bugs by @cmgosnell in #3694
- Move pudl_etl_job_factory back to pudl.etl.cli.py by @bendnorman in #3711
- Automate issue creation for Quarterly Updates by @aesharpe in #3709
- Disable rolling avg to fill missing fuel prices in EIA923 FRC table by @zaneselvans in #3716
- Update allowable not water limited capacity ratio. by @zaneselvans in #3723
- Remove bad respondent ID 2 by @zaneselvans in #3724
- If branch is a fork, skip the release.yml workflow by @e-belfer in #3727
- Add direct_support keys in core_eia860__scd_generators_energy_storage as foreign keys to core_eia__entity_generators by @e-belfer in #3699
- Update all documentation URLs to point at nightly not latest by @zaneselvans in #3740
- Remove additional generator from expected gens_eia860 row count by @e-belfer in #3743
- Add generator_operating_date to 860M changelog table by @e-belfer in #3751
- Add mozilla repo to wif pool by @zschira in #3753
- Update some dataset lists and date ranges. by @zaneselvans in #3754
- Fix release.yml fork behaviors by @e-belfer in #3788
- repaired ferc broken links by @Nancy9ice in #3787
- Superset deployment by @bendnorman in #3715
- Close out v2024.8.0 release notes. by @zaneselvans in #3801
New Contributors
- @seeess1 made their first contribution in #3670
- @Nancy9ice made their first contribution in #3787
Full Changelog: v2024.5.0...v2024.8.0
PUDL v2024.5.0
What's Changed
New Data
- Update EIA Bulk Electricity archive DOI by @zaneselvans in #3353
- 3313 Q4 2023 eia860 update by @aesharpe in #3367
- Add Q4 CEMS data by @e-belfer in #3379
- Extract raw 923 Schedule 8 A-D by @e-belfer in #3373
- Integrate monthly EIA923 data through November 2023 by @zaneselvans in #3422
- Add EIA Thermoelectric Cooling Water dataset DOI to datastore. by @zaneselvans in #3457
- Transform EIA860 and EIA923 Cooling System Tables by @aesharpe in #3405
- Add manual GridPath RA Toolkit renewable profile data source. by @zaneselvans in #3489
- eia860 solar: extract by @cmgosnell in #3482
- Extract EIA860 Energy Storage tables by @aesharpe in #3488
- NREL ATB extraction by @cmgosnell in #3498
- Extract EIA 930 data, refactor extractors to handle different date partitions by @e-belfer in #3497
- Extract EIA923 energy storage table by @aesharpe in #3516
- Transform EIA860 Wind by @cmgosnell in #3522
- Transform and harvesting eia860 solar table by @cmgosnell in #3524
- WIP: GridPath RA Toolkit wind and solar generation profiles by @zaneselvans in #3514
- Transform and harvesting the eia860 Energy Storage table by @aesharpe in #3526
- EIA 923 energy storage transform by @aesharpe in #3546
- Extract AEO Table 54, with bonus 13/15/20. by @jdangerx in #3538
- Transform NREL ATB by @cmgosnell in #3570
- EIA-930 initial transform by @zaneselvans in #3584
- Extract Net Summer Electricity Generating Capacity from AEO Table 54 by @jdangerx in #3582
- Update EIA Bulk Electricity archive/DOI. by @zaneselvans in #3615
- Add electric sales transformation. by @jdangerx in #3613
- Add EPA CEMS 2024Q1 by @cmgosnell in #3624
- Q1 2024 eia860m eia923 by @aesharpe in #3625
Other Changes
- Fix (more) v2024.02.03 release issues by @zaneselvans in #3346
- Output Parquet files as well as SQLite in PUDL ETL by @zschira in #3296
- Split monolithic ferc_to_sqlite ops into per-dataset pieces by @rousik in #3098
- Add a simple test coverage check. by @zaneselvans in #3352
- Add a simple pytest coverage check on workflow_dispatch or merge queue by @zaneselvans in #3371
- Provide CodeCov token in pytest workflow. by @zaneselvans in #3374
- Update docs + add release template by @jdangerx in #3361
- Stop using live DB in unit tests!! by @jdangerx in #3377
- Add sec10k metadata to sources by @zschira in #3378
- Force --no-cov in nightly build by @jdangerx in #3382
- Use context managers for opening zipfiles by @bendnorman in #3369
- Update expected row count for EIA tables post 860m quarterly update by @aesharpe in #3380
- Skip batch job if build was skipped as a whole. by @jdangerx in #3390
- Update nightly build script to distribute parquet by @zaneselvans in #3399
- Make an EIA860m Changelog table by @cmgosnell in #3331
- Parameterize adding a column in the FERC1 transform & ensure _correction records end up in the calculation component table by @cmgosnell in #3409
- Simplify pytest-cov configuration. by @jdangerx in #3391
- Prototype dagster-pandera integration by @jdangerx in #3282
- Fix small plants input table to FERC all plants table by @katie-lamb in #3415
- Standardize process for merging tagged commits into persistent branches automatically by @zaneselvans in #3347
- Restore individual FERC 1 plant output tables. by @zaneselvans in #3417
- Experiment tracking by @zschira in #3289
- Address loose ends in versioned release mechanics by @zaneselvans in #3421
- Close out release notes for PUDL v2024.2.6 by @zaneselvans in #3427
- Fix minor issues that arose in v2024.2.6 release by @zaneselvans in #3432
- Harvest generator operating dates when they're within a year of one another by @e-belfer in #3419
- Add RMI beta access to parquet.catalyst.coop by @jdangerx in #3434
- Add new citations of Catalyst / PUDL by @zaneselvans in #3435
- Add BA codes and EIA sector IDs to EIA-860M changelog table by @zaneselvans in #3442
- Very minor but widespread formatting changes from ruff 0.3.0 by @zaneselvans in #3445
- Get multiple years of EIA 176/191/757A CSV data by @davidmudrauskas in #3402
- Delete unused try/except Excel read-in method in pudl.extract.excel by @e-belfer in #3454
- Update pull_request_template.md to improve full ETL instructions by @e-belfer in #3446
- Fix broken links and rendering failure in PR template by @e-belfer in #3458
- Add metadata for ATB, EIA 930 and AEO data by @e-belfer in #3474
- Add PUDL citation for Grid Strategies load growth report. by @zaneselvans in #3483
- Clean EIA 860 and 923 FGD operation and maintenance data by @e-belfer in #3403
- Fix nightly build FK failure by @e-belfer in #3491
- Add logline that tells us more about BadZipFile. by @jdangerx in #3493
- Add total -> subtotal calculation correction & fix hard-coded plant-in-service table name by @cmgosnell in #3450
- Fix indent error in nightly builds by @e-belfer in #3521
- add two new correction records into plant_in_service table by @cmgosnell in #3525
- Ferc1 rate base tag updates by @cmgosnell in #3517
- Schema cleanup by @zaneselvans in #3529
- Refactor etl/__init__.py to make adding new modules easier. by @jdangerx in #3539
- Attempt to limit _out_ferc714__hourly_demand_matrix concurrency by @bendnorman in #3541
- Manage concurrency of high-memory processes by @zaneselvans in #3543
- Tag additional assets as high memory usage by @zaneselvans in #3548
- Rename BA & Utility service territory tables to use conventions by @zaneselvans in #3552
- Pin ferc-xbrl-extractor<1.4 to facilitate frictionless v5 update by @zaneselvans in #3566
- Draft of package-level field encoding, applied to EIA by @zaneselvans in #3558
- Get last non-null value instead of latest XBRL filing. by @jdangerx in #3545
- Update expected row counts for FERC 1 tables by @zaneselvans in #3574
- Create beta access SA's for gridpath and zerolab. by @jdangerx in #3577
- Allow beta service accounts to access Parquet bucket by @jdangerx in #3586
- Speed up nb-output...
PUDL v2024.2.6
What's Changed
New Data
- Update EIA Bulk Electricity archive DOI by @zaneselvans in #3353
- 3313 Q4 2023 eia860 update by @aesharpe in #3367
- Add Q4 CEMS data by @e-belfer in #3379
- Extract raw 923 Schedule 8 A-D by @e-belfer in #3373
- Integrate monthly EIA923 data through November 2023 by @zaneselvans in #3422
Other Changes
- Fix (more) v2024.02.03 release issues by @zaneselvans in #3346
- Output Parquet files as well as SQLite in PUDL ETL by @zschira in #3296
- Split monolithic ferc_to_sqlite ops into per-dataset pieces by @rousik in #3098
- Add a simple test coverage check. by @zaneselvans in #3352
- Add a simple pytest coverage check on workflow_dispatch or merge queue by @zaneselvans in #3371
- Provide CodeCov token in pytest workflow. by @zaneselvans in #3374
- Update docs + add release template by @jdangerx in #3361
- Stop using live DB in unit tests!! by @jdangerx in #3377
- Add sec10k metadata to sources by @zschira in #3378
- Force --no-cov in nightly build by @jdangerx in #3382
- Use context managers for opening zipfiles by @bendnorman in #3369
- Update expected row count for EIA tables post 860m quarterly update by @aesharpe in #3380
- Skip batch job if build was skipped as a whole. by @jdangerx in #3390
- Update nightly build script to distribute parquet by @zaneselvans in #3399
- Make an EIA860m Changelog table by @cmgosnell in #3331
- Parameterize adding a column in the FERC1 transform & ensure _correction records end up in the calculation component table by @cmgosnell in #3409
- Simplify pytest-cov configuration. by @jdangerx in #3391
- Prototype dagster-pandera integration by @jdangerx in #3282
- Fix small plants input table to FERC all plants table by @katie-lamb in #3415
- Standardize process for merging tagged commits into persistent branches automatically by @zaneselvans in #3347
- Restore individual FERC 1 plant output tables. by @zaneselvans in #3417
- Experiment tracking by @zschira in #3289
- Address loose ends in versioned release mechanics by @zaneselvans in #3421
- Close out release notes for PUDL v2024.2.6 by @zaneselvans in #3427
Full Changelog: v2024.02.04...v2024.2.6
PUDL v2024.02.05
What's Changed
New Data
- Extract EIA923 emissions control table and add 2022 final release dat… by @aesharpe in #3100
- Update CEMS partitions to handle year-quarter files by @e-belfer in #3096
- Start EIA-176 pipelines: company data by @davidmudrauskas in #3227
- WIP: Extract raw PHMSA transmission data (A-D, H, I) by @e-belfer in #3235
- Update EIA bulk electricity data archive. by @zaneselvans in #3249
- Extract raw PHMSA distribution and start of transmission data (Table A-D, H, I) by @e-belfer in #2932
- Extract raw tables for PHMSA transmission data Part F & G by @e-belfer in #3242
- Map PHMSA Natural Gas Transmission Part R columns by @e-belfer in #3269
- Map PHMSA Natural Gas Transmission Part S columns by @e-belfer in #3262
- Map PHMSA Natural Gas Transmission Part T columns by @e-belfer in #3267
- Map PHMSA Natural Gas Transmission Part N-O columns by @e-belfer in #3260
- 3243 PHMSA transmission part J by @jdangerx in #3266
- Map PHMSA Natural Gas Transmission Part L columns by @e-belfer in #3254
- Map PHMSA Natural Gas Transmission Part M columns by @cmgosnell in #3270
- Map PHMSA Natural Gas Transmission Part Q columns by @cmgosnell in #3280
- PHMSA: fix to Part Q - fix the column names for the other materials (NOT MILES!) by @cmgosnell in #3291
- PHMSA transmission part P by @jdangerx in #3279
- Clean up and standardize PHMSA raw assets by @e-belfer in #3295
- Add EIA Forms 191 and 757 to sources in PUDL metadata by @e-belfer in #3304
Other Changes
- Sort DBs Fly Datasette; don't distribute Datasette's metadata.yml by @zaneselvans in #3106
- Add support for choosing between multiprocess and inprocess executors via cli flag by @rousik in #2895
- Improve flexibility for publishing options by @rousik in #2964
- Hide diffs in lock files by default. by @jdangerx in #3103
- Clean up PUDL CLI tools; use Click framework by @zaneselvans in #3086
- Create new issue template for adding a new year of data by @aesharpe in #3089
- Reorganize contributing docs + add process description. by @jdangerx in #3044
- add data corrections for "sizable minority" utilities by @cmgosnell in #3078
- Skip slow tests in pre-commit hook by @jdangerx in #3132
- Replace ferc714 @multi_asset with asset factory by @rousik in #3123
- WIP Transition CEMS partitions to year_quarter from year and quarter by @cmgosnell in #3139
- Table diff tools by @jdangerx in #3128
- Refactor calculation of annualized_respondents_ferc714 by @rousik in #3024
- Include sub-annual updates in annual_updates docs by @aesharpe in #3129
- Script to sync a local directory up to a Zenodo DOI by @zaneselvans in #3127
- Attempt to create sandbox data release in nightly builds. by @zaneselvans in #3158
- Alter output path for the nightly builds by @rousik in #3157
- Add EPA CEMS concurrency limit to pudl_etl by @bendnorman in #3160
- Remove obsolete Docker data access instructions. by @zaneselvans in #3156
- Feature branch: Rename core + output assets to match new naming protocols by @e-belfer in #2818
- Knowledge contribution docs by @jdangerx in #3151
- Update EPA CEMS docs for quarterly data, new table name. by @zaneselvans in #3178
- Improve nightly build and deployment logic by @zaneselvans in #3164
- Clean up nightly build/deploy w/o nightly branch update. by @zaneselvans in #3188
- Update FERC rate base tags with RMI guidance by @cmgosnell in #3162
- update the 860m doi by @cmgosnell in #3189
- Update ferc-ferc plant matching with ccai implementation. by @zschira in #3007
- Tell setuptools_scm to ignore non-version tags by @zaneselvans in #3193
- Clarify consistency check and lower threshold from 75% to 74% by @zaneselvans in #3194
- Reduce FERC1 match threshold in test to 85% by @zaneselvans in #3197
- Update nightly branch after successful build by @zaneselvans in #3195
- Use digest instead of tag so VM uses right image. by @jdangerx in #3206
- It's german. It means "the dev, the" by @zaneselvans in #3212
- Switch from dev to main/nightly/stable branch structure. by @zaneselvans in #3216
- Suppress excessive numba logs by @zschira in #3217
- Make CEMS extraction handle new listed year_quarter partitions by @e-belfer in #3187
- Tweak metadata and pyarrow schema methods to work for all tables by @zaneselvans in #3222
- Add --build-only flag to datasette deploy script. by @jdangerx in #3231
- Generalize passing args through to flyctl deploy by @zaneselvans in #3236
- Create directory for local only notebooks by @bendnorman in #3230
- fill in the annual columns with quarterly balances by @cmgosnell in #3234
- FERC to EIA Entity Matching Refactor by @katie-lamb in #3184
- Apply new naming conventions to devtool notebooks by @bendnorman in #3228
- Update gas price upper bound; Enable null-row check for MCOE. by @zaneselvans in #3252
- Tiny fix to make FERC to EIA tests xfail by @katie-lamb in #3257
- PHMSA gas extract step for transmission part k by @cmgosnell in #3258
- Fix dataframe embedder op factory and tune ferc-ferc model by @zschira in #3247
- Nightly build quality of life improvements by @jdangerx in #3287
- Push to prod Zenodo sometimes, fix pypi release flow, update docs by @jdangerx in #3292
- Add issue template for nightly build by @jdangerx in #3286
- Parameterize the FERC1 transform step what transfers quarterly filed data into annual columns by @cmgosnell in #3300
- Add retries, and use fsspec to handle GCS by @jdangerx in #3299
- Use Google Batch for full ETL runs by @jdangerx in #3211
- Hide _out tables and restore metadata in Datasette by @bendnorman in #3226
- Update build ref by @jdangerx in #3321
- Rename some straggler assets by @bendnorman in #3294
- Resource field description cleanup by @aesharpe in #3283
- Fix nightly builds 2024-01-31 by @zaneselvans in #3329
- Trigger CI on merge group; only trigger integration on merge group. by @jdangerx in #3332
- Fix nightly build failure 2024-02-01 by @zaneselvans in #3334
- ensure all the cor...
v2023.12.01
What's Changed
- Dbf xbrl mapping by @zaneselvans in #2088
- eia860m september update by @cmgosnell in #2079
- integrate the electric energy source dbf & xbrl tbl by @cmgosnell in #2094
- fix ferc1 record_id validation errors by @cmgosnell in #2102
- Electric Dispositions Table by @cmgosnell in #2100
- Use app token for auto-merging bot PRs when CI passes. by @zaneselvans in #2106
- fix table name in record_id test by @cmgosnell in #2111
- Transform f1 xmssn line by @aesharpe in #2103
- Map f1_bal_sheet_cr by @aesharpe in #2113
- Utility plant summary by @cmgosnell in #2105
- Allow Tox v4+ in the dev extras environment. by @zaneselvans in #2117
- Bump ferc-xbrl-extractor version to avoid Arelle locale issue by @zschira in #2118
- Map f1_elc_op_mnt_expn table by @aesharpe in #2114
- Map f1_comp_balance_db table by @aesharpe in #2112
- Merge release branch updates into our working dev branch. by @zaneselvans in #2133
- Add the balance_sheet_assets_ferc1 table by @cmgosnell in #2127
- Xbrl metadata restructuring by @zaneselvans in #2136
- Add AWS creds to build-deploy-pudl action and copy outputs to s3 bucket by @bendnorman in #2137
- Mitigate zenodo dependency in docs build by @zaneselvans in #2150
- Update to new version of FERC XBRL Extractor by @zaneselvans in #2151
- Transform f1_dacs_epda by @zschira in #2143
- Add depreciation_amortization_summary_ferc1 to non-unique record ID's… by @zschira in #2154
- Integrate income_statement_ferc1 table by @cmgosnell in #2147
- Move awscli from pudl package to docker image by @bendnorman in #2163
- Ferc1 xbrl table release notes by @zaneselvans in #2157
- Transform f1_bal_sheet_cr by @aesharpe in #2134
- Transform f1_retained_erng xbrl + dbf by @cmgosnell in #2155
- Transform f1_elc_op_mnt_expn by @aesharpe in #2162
- Transform f1_accumdepr_prvsn dbf + xbrl by @zaneselvans in #2119
- Replace lingering transmission_ferc1 w/ transmission_statistics_ferc1 by @zaneselvans in #2178
- Make transform params stricter 2 by @aesharpe in #2177
- Validate the raw ferc1 tables in the settings by @cmgosnell in #2168
- Update & simplify PR template formatting / language. by @zaneselvans in #2181
- Integrate f1_cash_flow FERC1 table by @cmgosnell in #2184
- Transform electric_plant_depreciation_functional_ferc1 DBF + XBRL by @zaneselvans in #2183
- Delete old notebooks by @jdangerx in #2186
- Transform f1_elctrc_oper_rev by @zschira in #2192
- Add direct S3 nightly build download links to README. by @zaneselvans in #2199
- Update documentation to refer to archiver and not scrapers or zenodo-storage by @jdangerx in #2190
- Notify community-dev channel by @bendnorman in #2211
- Transform f1 othr reg liab by @zschira in #2215
- Add other_regulatory_liabilities_ferc1 to the list of non-unique reco… by @zschira in #2222
- Change FuelFix for nuclear from mmmbtu to mmbtu by @aesharpe in #2233
- Xbrl test speedups by @zschira in #2229
- Restrict the FERC1 output tables with the PudlTabl's start/end date by @cmgosnell in #2238
- Fix fuel ferc1 expected values by @aesharpe in #2241
- add methods to PudlTabl so it can be serialized and de-serialized (v2) by @arengel in #2251
- Retain all reported EIA sector codes for harvesting by @knordback in #2200
- Pin SQLAlchemy<2.0 and allow pip 23 by @zaneselvans in #2268
- Use Workload Identity Federation in GH Actions by @jdangerx in #2259
- Retain all EIA sector IDs for harvesting by @zaneselvans in #2270
- Split EIA extract steps and add field types to dataset_settings by @bendnorman in #2263
- Dagster cli wrapper by @zschira in #2272
- Add design process documentation by @jdangerx in #2282
- Remove tables from settings by @zschira in #2286
- Add more service accounts to Workload Identity Federation by @jdangerx in #2273
- Add EIA 176 to sources.py by @e-belfer in #2258
- Docs updates for Annual Updates by @aesharpe in #2089
- Add electricity_sales_by_rate_schedule_ferc1 table by @aesharpe in #2205
- Rework of FERC to EIA logistic regression model by @katie-lamb in #2276
- Add fuel allocation release notes by @zaneselvans in #2308
- Extract 860 EnviroAssoc and EnviroEquip Tables in PUDL by @e-belfer in #2281
- Integrate FERC-EIA record linkage into PUDL by @cmgosnell in #2224
- Update unit tests for allocate net gen by @jdangerx in #2297
- Fix google auth error in tox-pytest by @jdangerx in #2311
- Ferc to EIA match release notes by @katie-lamb in #2313
- Convert FERC1 -> EIA missing ID validation ET[L] to Dagster by @jdangerx in #2309
- Update integration tests to work with Dagster ETL by @zschira in #2299
- Convert epacems_to_parquet command to run dagster asset by @bendnorman in #2300
- Add spot fix function/class by @e-belfer in #2254
- Update previous balancing_authority_code_eia fixes for plants_eia860 by @e-belfer in #2312
- Breakpoints in Dagster by @jdangerx in #2322
- Fix balancing_authority_name update in cases where no ba_name_to_code_map() by @e-belfer in #2323
- Update doi to point to new epacamd_eia archive by @aesharpe in #2316
- Update local cache when using --gcs-cache-path by @jdangerx in #2326
- Resolve dev -> dagster merge fixes by @bendnorman in #2318
- Configure dagster env vars from settings if not set already by @zschira in #2332
- Remove code deprecated by dagster by @zschira in #2341
- Run nightly builds in dagster-world by @jdangerx in #2344
- Update s3 bucket urls to use https by @bendnorman in #2351
- Add jobs for excluding EPA CEMS assets by @bendnorman in #2343
- Merge s3 readme url changes into dev by @bendnorman in #2355
- Add boiler-associated attributes from EIA 860 6.2 EnvrEquip tables to ETL by @e-belfer in #2319
- Parameterize reconstructable jobs to set loglevel by @bendnorman in #2348
- Merge dev into dagster-asset-etl once again by @jdangerx ...
PUDL v2022.11.30
See the release notes for v2022.11.30 in our docs for a more narrative explanation of what has changed.
What's Changed
These are all the PRs that were merged since the last release, excluding those made by the @dependabot and @pre-commit-ci bots.
- Apply black autoformatting by @zaneselvans in #1543
- Apply black formatting by @zaneselvans in #1548
- Update to pip 22, setuptools 61. Add nbconvert to pudl-dev by @zaneselvans in #1565
- Add installation_year and construction_year to PPL by @katie-lamb in #1554
- Modify EPA CEMS ETL to facilitate Intake Catalog by @zaneselvans in #1563
- Hub EIA transition by @cmgosnell in #1575
- Add office hours scheduling links to README by @zaneselvans in #1582
- Rename tox virtualenv dir from .env_pudl to .env_tox by @zaneselvans in #1586
- Add rstcheck to our collection of linters by @zaneselvans in #1587
- Bring in year_state_filter tests & improvements from pudl_catalog by @zaneselvans in #1589
- Update maximum allowed version of setuptools to 62. by @zaneselvans in #1590
- Use partial function in map of EPA CEMS ETL by @zaneselvans in #1591
- Additional code formatting/linting without modernizing Python syntax by @zaneselvans in #1598
- Refactor labeling of true granularities with plant part to generator match function by @katie-lamb in #1447
- Re-gigger backfilling technology_description & make prime_mover_code an annually harvested column by @cmgosnell in #1600
- Remove some seldom used dependencies from pudl-dev environment.yml by @zaneselvans in #1615
- Fix breakage resulting from dask v2022.4.2 by @zaneselvans in #1618
- Cinco de Mayo 🇲🇽 by @zaneselvans in #1616
- Clean operational_status_code using metadata encoder by @cmgosnell in #1624
- add opex_nonfuel column to all FERC1 plant tables in output layer for all_plants_ferc1 table by @aesharpe in #1626
- Add installation_year and construction_year as plant part level by @katie-lamb in #1578
- Add ML for sustainable energy citation to bibliography by @zaneselvans in #1641
- Dependabot auto merge by @zaneselvans in #1655
- Small docs updates by @aesharpe in #1642
- Dynamically generate RSTs with new DataSource metadata by @katie-lamb in #1532
- Add DataSource Metadata for EPA-EIA Crosswalk by @aesharpe in #1676
- Address issue where 861 ETL fails w/o all years of data by @arengel in #1671
- update ferc-eia glue with fixes found from the FERC plant-ID-er by @cmgosnell in #1678
- Rework clean_merge_asof func by @katie-lamb in #1550
- Switch to the Furo Sphinx theme by @zaneselvans in #1680
- Require Python 3.10 and update to modern syntax by @zaneselvans in #1685
- Release notes for date_merge and default columns change for PPL and MCOE by @katie-lamb in #1690
- Apply Yaml pre-commit formatter by @bendnorman in #1689
- Fix all plants ferc1 by @aesharpe in #1656
- Require Python 3.10 in the pudl-dev conda environment by @zaneselvans in #1697
- Draft: add capacity mw to mcoe defaults and fix row counts in validation tests for eia tables by @katie-lamb in #1695
- GCE Deploy by @zaneselvans in #1627
- Add workflow_dispatch support to nightly builds by @bendnorman in #1702
- Add build-deploy-pudl.yml to main by @bendnorman in #1703
- Fix Github Ref bug by @bendnorman in #1704
- Unpin apt-get packages by @bendnorman in #1725
- Remove unnecessary packages from build system; specify backend. by @zaneselvans in #1743
- Update setuptools numpy by @zaneselvans in #1745
- Add gcs and bypass cache args to datastore cli by @bendnorman in #1740
- Fix dependabot automerge by @zaneselvans in #1753
- Bring new bot-auto-merge workflow into main by @zaneselvans in #1756
- Move slowly varying plant attributes from entity to annual plants table by @zaneselvans in #1749
- Fix bug in gens mega and plant part list creation by @katie-lamb in #1759
- Integrate EPA CEMS hourly emissions data for 2021 by @zaneselvans in #1778
- EIA923 early release, EIA860 early release, and 860m 2022-06 by @cmgosnell in #1834
- Avoid using Shapely v1.8.3 due to upstream bug / incompatibility by @zaneselvans in #1848
- Update eia923 raw inputs to include revisions made by EIA on 2022-08-11 by @zaneselvans in #1846
- Add missing columns and update EIA860, EIA860M and EIA923 data for 2021 by @cmgosnell in #1836
- Use gcs cache in ci by @zaneselvans in #1858
- Patch nightly build flakiness by @bendnorman in #1856
- Create a data_maturity label for EIA data by @cmgosnell in #1855
- Update bug report issue template by @zaneselvans in #1869
- Update ETL settings files to work with XBRL+DBF and new Ferc1Settings by @cmgosnell in #1886
- Xbrl steam but really by @cmgosnell in #1881
- Use internal zenodo-cache bucket for nightly builds by @bendnorman in #1880
- Encode balancing authority codes by @cmgosnell in #1897
- Fill in some null BA codes using BA names by @cmgosnell in #1906
- Implement drop_invalid_rows() for fuel_ferc1 table by @zaneselvans in #1903
- Split TableTransformer.transform() into 3 phases by @zaneselvans in #1900
- Prepare raw FERC XBRL DB's for publication with Datasette by @zschira in #1831
- Aggregate data_maturity in gfn_eia923, update EIA ETL debugging Notebook by @zaneselvans in #1915
- Use provision-micromamba and remove ferc1_solo ETL to speed up CI. by @zaneselvans in #1913
- Fill in pre-2013 BA codes by @cmgosnell in #1911
- Update the name of the EPA CAMD to EIA crosswalk data source. by @zaneselvans in #1918
- Add metadata & DOIs for EIA Bulk Electricity data source by @zaneselvans in #1922
- Integrate EIA-861 2021 Early Release data by @zaneselvans in #1921
- Updating 861 package_data for 2021 early release by @arengel in #1920
- Add epacems crosswalk to etl by @aesharpe in #1692
- Re-add and update the epacamd-eia crosswalk analysis module by @aesharpe in #1934
- Add updated crosswalk analysis back into dev by @aesharpe in #1938
- Ensure PUDL works with Pandas 1.5.0 by @zaneselvans in #1902
- Plant part updates to fix RMI CI memory issues by @katie-lamb in #1865
- Fix build error for epacamd_eia_test by @aesharpe in https://github.com/catalyst-cooperative/pudl/...
PUDL v0.6.0
See the more extensive narrative release notes in our documentation.
What's Changed
- Fix release notes formatting and tox -e release warning by @zaneselvans in #1346
- Minor changes associated w/ data release for v0.5.0 by @zaneselvans in #1348
- Widen allowable Jinja versions to 2-3 by @zaneselvans in #1360
- Bb fips fix by @cmgosnell in #1364
- Allow PUDL SQLite DB to be loaded into Postgres by @zaneselvans in #1361
- Add support for Python 3.10 by @zaneselvans in #1373
- Better preserve dtypes in allocate_net_gen process by @cmgosnell in #1370
- Fill missing technology_description values in generators_eia860 by @aesharpe in #1075
- Merge dev into main before 2021-12-20 by @zaneselvans in #1375
- Constrain setuptools to <60.0.0 in environment.yml and pyproject.toml to avoid breaking changes by @Wheelspawn in #1384
- Implement PyArrow schemas in Pydantic metadata classes by @zaneselvans in #1377
- Use pd.NA where appropriate for ENUM and categorical fields by @katie-lamb in #1376
- Separate resource definitions by data source by @zaneselvans in #1386
- Export code static labeling to documentation by @katie-lamb in #1388
- Replace simple label substitutions with coding tables by @zaneselvans in #1416
- Remove COLUMN_DTYPES and switch to field metadata dictionary by @zaneselvans in #1408
- Add test for EIA generator technology_description backfilling by @aesharpe in #1389
- Fix county FIPS codes string type in fuel_receipts_costs_eia923 by @katie-lamb in #1405
- Add pudl id mapping rst docs by @aesharpe in #1387
- Minor changes to make pandas 1.4.0 work by @zaneselvans in #1421
- Correct time interval in etl_fast.yml description and correct typos in data_access.rst and intro.rst documentation by @Wheelspawn in #1428
- Address geopandas deprecation warnings by @zaneselvans in #1444
- Update numba to v0.55 which works w/ Python 3.10 by @zaneselvans in #1449
- Update ci-environment.yml to match pudl-dev environment.yml by @zaneselvans in #1450
- Consolidate data source metadata using a Pydantic model by @zschira in #1446
- Clean up FIPS codes and use same method for ZIP codes by @zaneselvans in #1476
- Fix a few incorrectly mapped PUDL IDs by @aesharpe in #1458
- Fix mismapped wheeling power company by @zaneselvans in #1480
- Valentines Day Merge ❤️ 💞 💘 by @zaneselvans in #1445
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #1482
- Customize CodeCov configuration file by @zaneselvans in #1481
- Use fixed random seeds for timeseries cleaning tests by @zaneselvans in #1483
- Attempt to resume notifying Slack on CI failures by @zaneselvans in #1484
- Implement DataSource to raw datapackage method by @zschira in #1475
- Remove lingering data package refs in docs by @zaneselvans in #1489
- Propagate Pydantic settings classes throughout the entire ETL by @zschira in #1506
- Enrich Datasette with new metadata by @katie-lamb in #1479
- Integrate eia860m data through 2021-12 by @zaneselvans in #1510
- Recombine nuke/non-nuke gen fuel in output functions by @zaneselvans in #1518
- Move the EIA plant-parts list into PUDL outputs by @cmgosnell in #1157
- Update release notes with changes since v0.5.0 by @zaneselvans in #1524
- Adjust expected row counts for eia860m-2012-12 by @zaneselvans in #1528
- Remove prefect dependency by @zaneselvans in #1529
- Potential v0.6.0 by @zaneselvans in #1526
New Contributors
- @Wheelspawn made their first contribution in #1384
- @katie-lamb made their first contribution in #1376
- @zschira made their first contribution in #1446
- @pre-commit-ci made their first contribution in #1482
Full Changelog: v0.5.0...v0.6.0
PUDL 0.5.0
Update to include 2020 annual data
See the more extensive release notes in our documentation.
Merged Pull Requests
- make generation allocation output mirror the standard generation table. by @cmgosnell in #1134
- Dependency and data release script updates by @zaneselvans in #1135
- Dependencies by @zaneselvans in #1150
- End of sprint merge of dev by @zaneselvans in #1158
- Epic template. by @bendnorman in #1164
- Hourly state demand by @zaneselvans in #1175
- EIA860 2001-2003 by @bendnorman in #1122
- Redesign metadata and harvest process by @ezwelty in #806
- Basic epa cems output by @TrentonBush in #1227
- Map small gen pudl ids by @aesharpe in #1231
- Dev PR for sprint ending 2021-09-24 by @zaneselvans in #1228
- Build all generated documentation dynamically by @zaneselvans in #1235
- Update dependencies, mostly related to testing, plus sklearn 1.0. by @zaneselvans in #1236
- Eliminate null values in generation_eia923 primary key fields by @bendnorman in #1248
- Drop rows with null generator_id in ownership_eia860 by @zaneselvans in #1258
- Add FERC1 output table that combines key FERC1 subtables by @aesharpe in #1209
- Deduplicate and re-organize metadata from constants.py by @zaneselvans in #1230
- Fix utility_id_eia issues in ownership & plants tables by @zaneselvans in #1268
- remove the data package cruft by @cmgosnell in #1267
- Updated xlsx_maps for eia860 2020 data by @bendnorman in #1273
- 2020 ferc1 by @aesharpe in #1274
- Defer validation of PudlTabl datastore to eia861/ferc714 ETL methods by @zaneselvans in #1275
- 2020 Harvest and load by @bendnorman in #1277
- Crosswalk analysis by @TrentonBush in #1256
- Beginnings of a PUDL bibliography by @zaneselvans in #1294
- add plant_id_pudl to small generators field by @aesharpe in #1293
- Deduplicate natural key fields of generation_fuel_eia923 by @bendnorman in #1296
- Integrate 2020 data for ferc1, eia860, eia923 by @zaneselvans in #1297
- Respond to CG's PR comments. Mostly docs. by @zaneselvans in #1308
- EIA-861 FERC-714 2020 by @zaneselvans in #1309
- Boiler fuel duplicate aggregation by @TrentonBush in #1306
- Fix errors with EIA861 output tables by @aesharpe in #1312
- Add missing output tables to EIA861 by @aesharpe in #1313
- 2020 Data Integration by @cmgosnell in #1255
- Update to flake8 v4.0; always install pudl for Tox by @zaneselvans in #1322
- Use pydantic for ETL settings validation by @bendnorman in #1292
- Update generation_fuel_eia923 documentation with nuclear unit change. by @bendnorman in #1323
- fix pandas API deprecation (issue #1173) by @TrentonBush in #1332
- Static metadata tables and automatic recoding by @zaneselvans in #1272
- Validate v0.5.0 by @zaneselvans in #1345
- PUDL v0.5.0 release candidate by @zaneselvans in #1334
New Contributors
- @bendnorman made their first contribution in #1164 🎉
Full Changelog: v0.4.0...v0.5.0
PUDL 0.4.0
This is our first release in more than a year and a half, and it contains lots of new data and analyses (and breaking changes...) but it doesn't yet include 2020 datasets for FERC and EIA.
See the complete v0.4.0 release notes for details.
Merged Pull Requests
- Unified logic for excel extraction by @rousik in #566
- fuel cost output to ref 860 generators. by @cmgosnell in #574
- Ferc714 by @yashkumar1803 in #594
- Ei mcoe by @aesharpe in #592
- Transform function for distribution systems and other edits by @aesharpe in #643
- Add manually compiled balancing authority id fixes by @zaneselvans in #646
- Transform function for AMI EIA861 by @aesharpe in #647
- Transform function for EIA 861 Dynamic Pricing Table by @aesharpe in #649
- Normalize the Balancing Authority Table and add a BA Association Table by @zaneselvans in #651
- Transform func for Eia861 Green Pricing table by @aesharpe in #653
- Net metering table eia861 by @aesharpe in #671
- Service territories by @zaneselvans in #670
- Non net metering function eia861 by @aesharpe in #680
- Categorize eia codes with either Util or BA priority by @zaneselvans in #687
- Add a new FERC 714 Output Module by @zaneselvans in #699
- 635: Datastore passes travis tests by @ptvirgo in #701
- Operational data table eia861 by @aesharpe in #691
- Add limit_by_state option to utility territory generation by @zaneselvans in #707
- Simplify datapkg_to_sqlite script by @zaneselvans in #712
- Clobber datapackage bundles not single datapackages by @zaneselvans in #714
- Reliability and utility data eia861 by @aesharpe in #710
- Datastore improvements by @ptvirgo in #715
- Distributed generation eia861 by @aesharpe in #724
- Set up GitHub Actions to run Tox/PyTest by @zaneselvans in #727
- Restore utility_assn() and other code wiped out by PR 724 by @zaneselvans in #730
- Energy efficiency eia861 by @aesharpe in #731
- Demand mapping by @yashkumar1803 in #717
- Ferc714 by @ptvirgo in #733
- Demand side management eia861 by @aesharpe in #732
- Some tweaks to table columns and data types by @aesharpe in #743
- get_census2010_gdf uses datastore by @zaneselvans in #764
- Datastore data package validation and updated DOIs by @zaneselvans in #761
- More robust flake8 linting by @zaneselvans in #768
- Validate new dois by @cmgosnell in #773
- Draft of ferc1 + eia860 + eia923 data integration for 2019 by @zaneselvans in #788
- Merge Sprint25 into dev branch by @zaneselvans in #800
- Add DOIs for production archives on Zenodo by @zaneselvans in #804
- Zipcode fix by @aesharpe in #820
- Better help messages and default to verbose logging by @zaneselvans in #825
- Add docker build scripts by @rousik in #826
- Fix few issues surfaced in the previous PR by @rousik in #827
- Automate docker image builds by @rousik in #829
- Bump build-push-action to @v2 and fix arguments. by @rousik in #831
- Draft documentation framework for data sources by @aesharpe in #821
- Eia epa crosswalk by @aesharpe in #822
- Integrate EIA-860 2008 data by @aesharpe in #838
- Integrate EIA 860 M into ETL by @cmgosnell in #824
- Add basic Datasette metadata and deployment script by @zaneselvans in #841
- Notebook land: intro notebooks for CEMS and output tables by @aesharpe in #823
- add output and access notebooks by @cmgosnell in #844
- Allocate generation_fuel_eia923 table data to generators by @cmgosnell in #785
- Notebook land by @zaneselvans in #853
- Ensure deterministic checksums on csv.gz outputs by @rousik in #856
- Add output methods for all remaining EIA 861 tables. by @zaneselvans in #862
- EIA860 old years (through 2004) by @aesharpe in #849
- Add high-performance timeseries anomaly detection and imputation module by @ezwelty in #871
- Speed up FERC 714 hourly demand transform by @ezwelty in #873
- Always run interim ETL tests b/c they're fast now. by @zaneselvans in #874
- Alaska is a thing by @rousik in #876
- Specify min/max versions for all dependencies in setup.py by @zaneselvans in #875
- Fix broken links in README by @kyleries in #864
- Datastore refactoring by @rousik in #880
- Add unit test environment that runs quick tests under src/pudl by @rousik in #867
- Regex future warning by @rousik in #883
- Adjust FERC 714 service territories by using modified versions of EIA 861 tables by @ezwelty in #881
- Timeseries unittest by @zaneselvans in #885
- Bugfixes for states=[ALL] and SQLite DB clobber check by @zaneselvans in #890
- Consolidate interim ETL / output tests by @zaneselvans in #892
- Jupyterhub beta by @zaneselvans in #894
- Clean up PyTest config, coverage generation, unit tests by @zaneselvans in #896
- Implementation of DataFrameCollection by @rousik in #887
- Sprint29 by @zaneselvans in #897
- Dev by @zaneselvans in #898
- Pyarrow v3 by @zaneselvans in #912
- Eia860 validation by @aesharpe in #911
- Improvements to the DataFrameCollection by @rousik in #916
- pudl_datastore --list-partitions by @rousik in #925
- Pudl rmi by @zaneselvans in #926
- Pytest scripts by @zaneselvans in #913
- Sprint30 by @zaneselvans in #933
- Integrate EIA-860m through Nov. 2020 + fixed PUDL Plant IDs by @zaneselvans in #934
- Update PUDL Development Docs by @zaneselvans in #940
- Metadata docs by @aesharpe in #907
- Update transform documentation by @aesharpe in #939
- Convert Census DP1 to SQLite by @zaneselvans in #948
- Sprint31 by @zaneselvans in #951
- Databeta by @zaneselvans in #956
- Dev by @zaneselvans in #957
- Dev docs setup updates by @cmgosnell in https://github.com/catalyst-cooperative/pudl...