Commit 26c53d6

Merge branch 'catalyst-cooperative:main' into edit-data-sources-information

Nancy9ice authored Sep 6, 2024
2 parents b9d9b00 + 1d6363d
Showing 54 changed files with 4,126 additions and 2,405 deletions.
54 changes: 42 additions & 12 deletions .github/workflows/pytest.yml
@@ -50,11 +50,19 @@ jobs:
           pip install --no-deps --editable .
           make docs-build
+      - name: Coverage debugging output
+        run: |
+          coverage debug config
+          coverage debug sys
+          coverage report --fail-under=0
+          ls -a
       - name: Upload docs coverage artifact
         uses: actions/upload-artifact@v4
         with:
           name: coverage-docs
-          path: ./*coverage*
+          include-hidden-files: true
+          path: .coverage

   ci-unit:
     runs-on: ubuntu-latest
@@ -97,16 +105,24 @@ jobs:
           pip install --no-deps --editable .
           make pytest-unit
+      - name: Coverage debugging output
+        run: |
+          coverage debug config
+          coverage debug sys
+          coverage report --fail-under=0
+          ls -a
       - name: Upload unit tests coverage artifact
         uses: actions/upload-artifact@v4
         with:
           name: coverage-unit
-          path: ./*coverage*
+          include-hidden-files: true
+          path: .coverage

   ci-integration:
     runs-on:
       group: large-runner-group
-      labels: ubuntu-22.04-4core
+      labels: ubuntu-latest-4core
     if: github.event_name == 'workflow_dispatch' || github.event.merge_group
     permissions:
       contents: read
@@ -174,11 +190,19 @@ jobs:
           pudl_datastore --dataset epacems --partition year_quarter=2022q1
           make pytest-integration
+      - name: Coverage debugging output
+        run: |
+          coverage debug config
+          coverage debug sys
+          coverage report --fail-under=0
+          ls -a
       - name: Upload integration test coverage artifact
         uses: actions/upload-artifact@v4
         with:
           name: coverage-integration
-          path: ./*coverage*
+          include-hidden-files: true
+          path: .coverage

       - name: Log post-test Zenodo datastore contents
         run: find ${{ env.PUDL_INPUT }}

@@ -199,22 +223,28 @@
       - name: List downloaded files
         run: |
           find coverage -type f
-      - name: Upload test coverage report to CodeCov
-        uses: codecov/codecov-action@v4
-        with:
-          directory: coverage
-          token: ${{ secrets.CODECOV_TOKEN }}
       - name: Install Micromamba
         uses: mamba-org/setup-micromamba@v1
         with:
           init-shell: bash
           environment-name: coverage
           create-args: >-
             python=3.12
-            coverage>=7.4.1
-      - name: Combine coverage data and check that we have required coverage
-        # Required coverage is set in pyproject.toml section [tool.coverage.report]
+            coverage>=7.6.1
+      - name: Combine coverage data and output XML report
         run: |
           micromamba run -n coverage coverage combine coverage/*/.coverage
+          micromamba run -n coverage coverage xml --fail-under=0
+      - name: Upload XML coverage report to CodeCov
+        uses: codecov/codecov-action@v4
+        with:
+          disable_search: true
+          file: ./coverage.xml
+          token: ${{ secrets.CODECOV_TOKEN }}
+          fail_ci_if_error: true # optional (default = false)
+          plugin: noop
+          verbose: true
+      - name: Display coverage report and ensure it meets required minimum
+        # Required coverage is set in pyproject.toml section [tool.coverage.report]
+        run: |
+          micromamba run -n coverage coverage report
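The combine-and-report sequence in this job can also be driven from coverage.py's Python API. A minimal sketch, assuming the per-job data files were downloaded into coverage/<artifact-name>/ directories as in the workflow (the paths are illustrative, not taken from this commit):

import coverage

cov = coverage.Coverage()
# Merge the per-job .coverage data files downloaded from the artifact uploads.
cov.combine(data_paths=[
    "coverage/coverage-docs/.coverage",
    "coverage/coverage-unit/.coverage",
    "coverage/coverage-integration/.coverage",
])
cov.save()
cov.xml_report(outfile="coverage.xml")  # XML report for the Codecov upload
total = cov.report()                    # terminal report; returns the total %
print(f"combined coverage: {total:.1f}%")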
1 change: 1 addition & 0 deletions .gitignore
@@ -6,6 +6,7 @@
 **/*.*.swp
 **/__pycache__/*
 docs/data_dictionaries/pudl_db.rst
+docs/autoapi/*
 .ipynb_checkpoints/
 .cache/
 .ruff_cache/
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -29,7 +29,7 @@ repos:
   # Formatters: hooks that re-write Python & documentation files
   ####################################################################################
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.6.1
+    rev: v0.6.2
     hooks:
       - id: ruff
         args: [--fix, --exit-non-zero-on-fix]

@@ -53,7 +53,7 @@ repos:

   # Check for errors in restructuredtext (.rst) files under the doc hierarchy
   - repo: https://github.com/PyCQA/doc8
-    rev: v1.1.1
+    rev: v1.1.2
     hooks:
       - id: doc8
         args: [--config, pyproject.toml]
6 changes: 1 addition & 5 deletions Makefile
@@ -79,17 +79,13 @@ docs-clean:
 	rm -f docs/data_sources/ferc*.rst
 	rm -f docs/data_sources/gridpathratoolkit*.rst
 	rm -f docs/data_sources/phmsagas*.rst
-	rm -f coverage.xml

 # Note that there's some PUDL code which only gets run when we generate the docs, so
-# we want to generate coverage from the docs build. Then we need to convert that
-# coverage output to XML so it's the coverage reports generated by pytest below, and can
-# be combined into a single unified coverage report.
+# we want to generate coverage from the docs build.
 .PHONY: docs-build
 docs-build: docs-clean
 	doc8 docs/ README.rst
 	coverage run ${covargs} -- ${CONDA_PREFIX}/bin/sphinx-build --jobs auto -v -W -b html docs docs/_build/html
-	coverage xml --fail-under=0

 ########################################################################################
 # Running the Full ETL
5 changes: 3 additions & 2 deletions README.rst
@@ -37,8 +37,9 @@ The Public Utility Data Liberation Project (PUDL)
 What is PUDL?
 -------------

-The `PUDL <https://catalyst.coop/pudl/>`__ Project is an open source data processing
-pipeline that makes US energy data easier to access and use programmatically.
+The `PUDL <https://catalyst.coop/pudl/>`__ Project (pronounced puddle) is an open source
+data processing pipeline that makes US energy data easier to access and use
+programmatically.

 Hundreds of gigabytes of valuable data are published by US government agencies, but it's
 often difficult to work with. PUDL takes the original spreadsheets, CSV files, and
2 changes: 1 addition & 1 deletion devtools/zenodo/zenodo_data_release.py
@@ -272,7 +272,7 @@ def _open_fsspec_file(self, openable_file: fsspec.core.OpenFile) -> IO[bytes]:
         if "local" in openable_file.fs.protocol:
             return openable_file.open()

-        tmpfile = tempfile.NamedTemporaryFile()
+        tmpfile = tempfile.NamedTemporaryFile()  # noqa: SIM115
         openable_file.fs.get(openable_file.path, tmpfile.name)
         return tmpfile
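For context, ruff's SIM115 check flags file objects opened outside a context manager. Here the temporary file must outlive the function (the open handle is returned to the caller), so the rule is suppressed rather than "fixed". A standalone sketch of the same pattern, using a hypothetical function rather than PUDL code:

import tempfile

def fetch_to_tempfile(data: bytes):
    # A `with` block would close (and delete) the temp file before the
    # caller could use it, so the handle is returned open instead.
    tmpfile = tempfile.NamedTemporaryFile()  # noqa: SIM115
    tmpfile.write(data)
    tmpfile.flush()
    tmpfile.seek(0)
    return tmpfile

handle = fetch_to_tempfile(b"example")
print(handle.read())  # b"example"
handle.close()        # the file is unlinked from disk on close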
2 changes: 1 addition & 1 deletion docker/Dockerfile
@@ -1,4 +1,4 @@
-FROM mambaorg/micromamba:1.5.8
+FROM mambaorg/micromamba:1.5.9

 ENV PGDATA=${CONTAINER_HOME}/pgdata
Check warning on line 3 in docker/Dockerfile (GitHub Actions / Test building the PUDL ETL Docker image):
UndefinedVar: Usage of undefined variable '$CONTAINER_HOME'. Variables should be defined before their use. More info: https://docs.docker.com/go/dockerfile/rule/undefined-var/
2 changes: 1 addition & 1 deletion docs/data_dictionaries/ferc1_db.rst
@@ -1,5 +1,5 @@
 ===============================================================================
-FERC Form 1 Data Dictionary
+Raw FERC Form 1 Data Dictionary
 ===============================================================================

 We have mapped the Visual FoxPro DBF files to their corresponding FERC Form 1
28 changes: 25 additions & 3 deletions docs/data_dictionaries/index.rst
@@ -3,20 +3,42 @@
 Data Dictionaries
 =================

+Data Processed & Cleaned by PUDL
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The PUDL data dictionary provides detailed metadata for the tables
+in the PUDL database. This includes table descriptions,
+field names, field descriptions, and field datatypes.
+
 .. toctree::
-   :caption: Data Processed & Cleaned by PUDL
    :maxdepth: 1
    :titlesonly:

    pudl_db

+Raw, Unprocessed Data
+^^^^^^^^^^^^^^^^^^^^^
+Certain raw datasets (e.g. FERC Form 1) require additional effort to process.
+We load these raw sources into SQLite databases before feeding them into
+the PUDL data pipeline. The dictionaries below provide key metadata on
+these raw sources including table name, table
+description, links to corresponding database tables,
+raw file names, page numbers, and data reporting frequency.
+
 .. toctree::
-   :caption: Raw, Unprocessed Data
    :maxdepth: 1
    :titlesonly:

    ferc1_db

+Code Descriptions & Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+This section contains mappings of codes in the raw tables to their
+corresponding labels and descriptions in the processed PUDL database tables.
+For example, the code, NV, represents the full description "Never to exceed"
+in the core_eia_codes_averaging_periods table.
+
 .. toctree::
-   :caption: Code Descriptions & Metadata
    :maxdepth: 1
    :titlesonly:

    codes_and_labels
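As an illustration of the code-to-description mapping the added docs describe, one could look a code up directly in a local copy of the PUDL SQLite database. This is a hypothetical sketch: the database path and the column names are assumptions, and the table name is taken from the prose above rather than verified against the published schema:

import sqlite3

conn = sqlite3.connect("pudl.sqlite")  # path to a local PUDL database
row = conn.execute(
    "SELECT code, description FROM core_eia_codes_averaging_periods "
    "WHERE code = ?",
    ("NV",),
).fetchone()
print(row)  # expected: ('NV', 'Never to exceed')
conn.close()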
2 changes: 2 additions & 0 deletions docs/data_sources/index.rst
@@ -3,6 +3,8 @@
 Data Sources
 ============

+The following data sources serve as the foundation for our data pipeline.
+
 .. toctree::
    :caption: Currently Available Data
    :maxdepth: 1
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -2,7 +2,7 @@
 The Public Utility Data Liberation Project
 ===============================================================================

-PUDL is a data processing pipeline created by `Catalyst Cooperative
+PUDL (pronounced puddle) is a data processing pipeline created by `Catalyst Cooperative
 <https://catalyst.coop/>`__ that cleans, integrates, and standardizes some of the most
 widely used public energy datasets in the US. The data serve researchers, activists,
 journalists, and policy makers that might not have the technical expertise to access it
11 changes: 11 additions & 0 deletions docs/release_notes.rst
@@ -6,6 +6,17 @@ PUDL Release Notes
 v2024.X.x (2024-XX-XX)
 ---------------------------------------------------------------------------------------

+Schema Changes
+^^^^^^^^^^^^^^
+* Added :ref:`out_eia__yearly_assn_plant_parts_plant_gen` table. This table associates
+  records from the :ref:`out_eia__yearly_plant_parts` with ``plant_gen`` records from
+  that same plant parts table. See issue :issue:`3773` and PR :pr:`3774`.
+
+Bug Fixes
+^^^^^^^^^
+* Include more retiring generators in the net generation and fuel consumption
+  allocation. Thanks to :user:`grgmiller` for this contirbution :pr:`3690`.
+
 .. _release-v2024.8.0:

 ---------------------------------------------------------------------------------------
9 changes: 9 additions & 0 deletions docs/templates/resource.rst.jinja
@@ -4,6 +4,15 @@

 {{ resource.description | wordwrap(78) if resource.description else 'No table description available.' }}

+{% if resource.schema.primary_key -%}
+**The table has the following primary key columns:**
+{% for key in resource.schema.primary_key %}
+* {{ key }}
+{% endfor %}
+{% else -%}
+**This table has no primary key.**
+{%- endif %}
+
 {% if resource.create_database_schema -%}
 `Browse or query this table in Datasette. <https://data.catalyst.coop/pudl/{{ resource.name }}>`__
 {% else -%}
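To see what the added primary-key fragment produces, it can be rendered standalone with Jinja2. A minimal sketch using a stand-in resource object; the column names are invented for illustration, while the real template receives PUDL's resource metadata:

from types import SimpleNamespace
from jinja2 import Template

# The fragment added in this commit, copied verbatim.
fragment = """
{% if resource.schema.primary_key -%}
**The table has the following primary key columns:**
{% for key in resource.schema.primary_key %}
* {{ key }}
{% endfor %}
{% else -%}
**This table has no primary key.**
{%- endif %}
"""

resource = SimpleNamespace(
    schema=SimpleNamespace(primary_key=["plant_id_eia", "report_date"])
)
print(Template(fragment).render(resource=resource))
# **The table has the following primary key columns:**
#
# * plant_id_eia
#
# * report_date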