Integrate XBRL taxonomy metadata into `plant_in_service` transform #2058

zaneselvans · 2022-11-11T22:40:07Z

No description provided.

codecov · 2022-11-11T23:39:22Z

Codecov Report

Base: 84.6% // Head: 85.1% // Increases project coverage by +0.5% 🎉

Coverage data is based on head (f708893) compared to base (e5c530b).
Patch coverage: 100.0% of modified lines in pull request are covered.

Additional details and impacted files

@@                Coverage Diff                 @@
##           xbrl_integration   #2058     +/-   ##
==================================================
+ Coverage              84.6%   85.1%   +0.5%     
==================================================
  Files                    72      72             
  Lines                  8114    8119      +5     
==================================================
+ Hits                   6865    6910     +45     
+ Misses                 1249    1209     -40

Impacted Files	Coverage Δ
src/pudl/glue/ferc1_eia.py	`96.0% <ø> (ø)`
src/pudl/metadata/fields.py	`100.0% <ø> (ø)`
src/pudl/metadata/resources/ferc1.py	`100.0% <ø> (ø)`
src/pudl/transform/classes.py	`93.9% <ø> (ø)`
src/pudl/etl.py	`89.8% <100.0%> (+<0.1%)`	⬆️
src/pudl/extract/ferc1.py	`87.6% <100.0%> (+0.2%)`	⬆️
src/pudl/transform/ferc1.py	`95.6% <100.0%> (+8.7%)`	⬆️

zaneselvans · 2022-11-12T03:14:32Z

Welp, made it most of the way through, but then ran into an error with the plant mapping it looks like, in this test:

test_for_fk_validation_and_unmapped_ids[missing_utility_id_pudl_in_utilities_ferc1]

I would assume this is something from upstream since I haven't touched any of the plant mapping... but it looks like the xbrl_integration branch PR #1665 is passing? Ugh.

I added this unit test while debugging a bunch of glue test failures, which were somehow being triggered by the addition of the XBRL taxonomy metadata. Whatever the glue tests are reading in as the XBRL metadata can't be parsed as JSON, resulting in a NotImplementedError. This problem does *not* arise with the new XBRL metadata integration test, so it seems there's something weird going on with the glue test environment in particular.

Another strange thing I noticed is that even if you run the tests with --live-dbs, the glue tests run the ETL anyway! Not sure why this would be happening. Running

```
pytest --live-dbs test/integration/etl_test.py::test_ferc1_etl
```

also doesn't exhibit this behavior, and it finishes in just a few seconds.

I split the FERC 1 ETL test into two tests, one for DBF and one for XBRL, and renamed them:

* test_ferc1_dbf2sqlite()
* test_ferc1_xbrl2sqlite()

They're named "2sqlite" rather than "etl" since they're only doing the conversions of the raw data. This lets you test just one or the other, which is more convenient when the problem you're working on is only on one side.

zaneselvans · 2022-11-13T05:51:20Z

@cmgosnell I think something odd is going on with the glue tests you recently set up. They're running the ETL independently even when the tests are run with --live-dbs, and they're failing on the XBRL metadata extraction step, even though that step works fine in the main ETL tests. You can run the following to see the difference:

# Tries to run the FERC1 XBRL ETL and fails:
pytest --live-dbs test/integration/glue_test.py::test_unmapped_utils_eia

# Also tries to run the FERC 1 XBRL ETL (as expected) and fails:
pytest test/integration/glue_test.py::test_unmapped_utils_eia

# Runs the FERC 1 XBRL extraction, and the JSON normalization in isolation and succeeds:
pytest test/integration/etl_test.py::test_ferc1_xbrl2sqlite

# Runs the whole PUDL ETL including extracting FERC 1 DBF + XBRL and succeeds:
pytest test/integration/etl_test.py::test_pudl_engine

My guess is that in this unexpected (to me) separate ETL process that the glue tests are running, either the input parameters or the output location is different than in the main ETL, and it's reading something unexpected in (or maybe nothing) when it tries to ingest the XBRL taxonomy metadata, and that's why the call to pd.json_normalize() is failing.
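A minimal sketch of the failure mode guessed at above, assuming the metadata arrives as a list of nested records. The helper name `normalize_taxonomy_metadata` and the record fields are hypothetical and are not taken from the PUDL code. In the pandas versions I'm aware of, `pd.json_normalize()` raises a bare `NotImplementedError` when it is handed something that is neither a dict nor an iterable of records (for example `None`), so guarding the empty case makes that situation explicit:

```python
import pandas as pd


def normalize_taxonomy_metadata(raw):
    """Flatten nested metadata records into a DataFrame (hypothetical helper)."""
    # Treat None or an empty container as "no metadata" rather than passing
    # it on to pd.json_normalize().
    if not raw:
        return pd.DataFrame()
    return pd.json_normalize(raw)


# Made-up records; the real taxonomy metadata will have different fields.
records = [
    {"name": "land_and_land_rights", "balance": "debit", "meta": {"account": "310"}},
    {"name": "retirements", "balance": "credit", "meta": {"account": "n/a"}},
]

flat = normalize_taxonomy_metadata(records)  # nested keys become "meta.account"
empty = normalize_taxonomy_metadata(None)  # empty DataFrame instead of an error
```

Whether returning an empty frame or raising a more descriptive error is preferable depends on whether the caller can legitimately run without metadata.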

I forgot that the glue tests have their own abbreviated version of the transformers, and they don't need any XBRL metadata. This commit allows a table transformer to be instantiated without any metadata.

zaneselvans · 2022-11-13T06:28:58Z

Ah @cmgosnell I figured it out. I forgot that the glue tests have their own abbreviated XBRL transformers that just do the pre-processing so they can grab the plant / utility IDs. I fixed it so that the XBRL taxonomy metadata inputs are optional.

Really it should only be fed into the transformers that need it, like plant_in_service. Or alternatively I guess it could be a class attribute that's stored in the Ferc1AbstractTableTransformer class -- then the same metadata wouldn't end up getting stored in lots of different classes. But then that abstract class would have to know how to read it in. Maybe that would be okay / better?
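The "optional metadata" fix described above could look roughly like the sketch below. The class `SketchTableTransformer`, its methods, and the row fields are hypothetical stand-ins rather than the actual PUDL transformer classes; only the parameter name `xbrl_metadata_json` is borrowed from this PR.

```python
from typing import Any, Dict, List, Optional


class SketchTableTransformer:
    """Hypothetical stand-in for a table transformer; not the PUDL class."""

    def __init__(
        self, xbrl_metadata_json: Optional[List[Dict[str, Any]]] = None
    ) -> None:
        # Default to an empty list so later code never has to handle None.
        self.xbrl_metadata_json = xbrl_metadata_json or []

    def collect_utility_ids(self, rows: List[Dict[str, Any]]) -> List[int]:
        """Pre-processing step that needs no metadata (the glue-test case)."""
        return sorted({row["utility_id"] for row in rows})

    def transform(self, rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Full transform; this one refuses to run without metadata."""
        if not self.xbrl_metadata_json:
            raise ValueError("This transform needs XBRL taxonomy metadata.")
        balance_by_name = {m["name"]: m["balance"] for m in self.xbrl_metadata_json}
        return [
            {**row, "balance": balance_by_name.get(row["ferc_account_label"])}
            for row in rows
        ]


# Glue-style usage: no metadata supplied, only the pre-processing is used.
glue_transformer = SketchTableTransformer()
utility_ids = glue_transformer.collect_utility_ids(
    [{"utility_id": 2}, {"utility_id": 1}, {"utility_id": 2}]
)
```

Keeping the metadata as a constructor argument leaves each instance self-contained. The class-attribute alternative mentioned above would avoid storing copies, at the cost of the abstract class having to know how to load the metadata.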

Initial draft of a process that allows the XBRL taxonomy metadata to be accessed in the table transforms, and also allows the metadata to be transformed as appropriate for the individual tables (which is necessary when we are doing reshaping, renaming, etc.). Currently the only way that the metadata is being utilized is in changing the sign convention of reported values so they can be aggregated using sum() directly.

Some outstanding questions:

* Should the metadata be stored in a separate normalized table, and combined with the data only in output methods / DB views?
* What metadata columns do we actually want to retain?
* Where should the bulk normalized metadata that pertains to all of the FERC 1 tables be stored during the transforms? Really it seems like it pertains to the Ferc1AbstractTableTransformer() since it's relevant to *all* the tables. But that would mean making it a class attribute, and somehow making that class capable of reading in the metadata independently. Not sure that's a good idea.
* Should we keep the calculated values in the table until we're able to aggregate and reproduce them? If so then they need to be flagged as calculated so they can be removed from other calculations.

Known Issues:

* It looks like FERC accounts that have to get renamed (many in the general plant categories) aren't getting renamed everywhere. Their account numbers aren't showing up in the metadata we merge in...
* Actually it looks like the *data* isn't coming through. Need to dig into this wtf.

cmgosnell

i don't love that you are using apply_sign_conventions to both apply the sign and merge the metadata into the table. can you add a separate merge method and then an apply-sign method?

you're also storing xbrl_metadata_normalized in a few places and editing it, and that feels a little bad

if we make the table-specific metadata, we can pass in and cache just the table-specific metadata into the transformer class. Then you could have a merge-in-the-metadata method that grabs the cached metadata, normalizes it and then merges it.

src/pudl/transform/ferc1.py

@cmgosnell

Changes in response to comments from @cmgosnell on PR #2058

* Get rid of confusing code that turned empty calculations into NA values.
* Use a single consistent name for xbrl_metadata_json throughout.
* Separate metadata merging and application of sign conventions.
* Also moved the translation of "credit" and "debit" into numerical weights into the `apply_sign_conventions` method so it's clearer where those numbers are coming from (and since that translation isn't really metadata normalization...)
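To illustrate the separation described in that commit message, here is a rough sketch with one function that only merges metadata and another that only applies signs. The column names (`ferc_account_label`, `balance`, `amount`), the weight values, and the function bodies are assumptions made for illustration. The name `apply_sign_conventions` comes from the PR, but this is not its actual implementation.

```python
import pandas as pd

# Assumed mapping: debits count toward the total, credits count against it.
BALANCE_WEIGHTS = {"debit": 1.0, "credit": -1.0}


def merge_metadata(df: pd.DataFrame, meta: pd.DataFrame) -> pd.DataFrame:
    """Attach metadata columns to each data row without changing any values."""
    return df.merge(
        meta, on="ferc_account_label", how="left", validate="many_to_one"
    )


def apply_sign_conventions(df: pd.DataFrame) -> pd.DataFrame:
    """Scale reported amounts by a weight derived from the balance column."""
    # Rows without a known balance keep their sign (an assumption of this sketch).
    weights = df["balance"].map(BALANCE_WEIGHTS).fillna(1.0)
    return df.assign(amount=df["amount"] * weights)


# Made-up data: both values are reported as positive numbers.
data = pd.DataFrame(
    {"ferc_account_label": ["additions", "retirements"], "amount": [100, 30]}
)
meta = pd.DataFrame(
    {"ferc_account_label": ["additions", "retirements"], "balance": ["debit", "credit"]}
)

signed = apply_sign_conventions(merge_metadata(data, meta))
net_change = signed["amount"].sum()  # can now be aggregated with sum() directly
```

Keeping the "credit" / "debit" lookup next to the multiplication makes it clear where the weights come from, which is the motivation given in the last bullet above.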

zaneselvans requested a review from cmgosnell November 11, 2022 22:40

zaneselvans added the `ferc1` (Anything having to do with FERC Form 1) and `xbrl` (Related to the FERC XBRL transition) labels Nov 11, 2022

zaneselvans linked an issue Nov 11, 2022 that may be closed by this pull request

Transform plant_in_srvce xbrl + dbf #1807

Closed

14 tasks

cmgosnell approved these changes Nov 11, 2022

zaneselvans added 2 commits November 11, 2022 17:27

Merge changes from upstream and resolve conflicts.

7546b8e

Merge branch 'xbrl_integration' into agg-pis-xbrl

fece75d

Update _etl_ferc1() to use the new metadata inputs.

abe5fb4

zaneselvans changed the title ~~Fix bad multi-index construction that was scrambling XBRL columns.~~ Integrated XBRL taxonomy metadata into plant_in_service transform Nov 12, 2022

zaneselvans changed the title ~~Integrated XBRL taxonomy metadata into plant_in_service transform~~ Integrate XBRL taxonomy metadata into plant_in_service transform Nov 12, 2022

zaneselvans added 2 commits November 11, 2022 21:37

Merge branch 'xbrl_integration' into agg-pis-xbrl

e920f8d

zaneselvans added 3 commits November 13, 2022 01:55

Add non-FERC1 forms back into XBRL extraction.

0f5c9e1

Rename metadata ferc account categories to match plant_in_service tab…

0455db3

…le cols.

cmgosnell reviewed Nov 14, 2022

src/pudl/transform/ferc1.py (outdated, resolved)

src/pudl/transform/ferc1.py (outdated, resolved)

cmgosnell reviewed Nov 14, 2022

src/pudl/transform/ferc1.py (outdated, resolved)

cmgosnell reviewed Nov 14, 2022

src/pudl/transform/ferc1.py (resolved)

cmgosnell reviewed Nov 14, 2022

src/pudl/transform/ferc1.py (resolved)

zaneselvans added 6 commits November 14, 2022 18:11

Add clarifying comment on metadata row selections.

d37d07c

Rename ferc1_xbrl_raw_meta to xbrl_metadata_json in pudl.etl._etl_fer…

9281a51

…c1()

Merge branch 'xbrl_integration' into agg-pis-xbrl

c7c48c6

Remove obsolete FERC 1 transform functions & helpers

b3285b1

Remove an obsolete unused constant.

f708893

zaneselvans merged commit 14bd607 into xbrl_integration Nov 15, 2022

zaneselvans deleted the agg-pis-xbrl branch November 15, 2022 06:03
