Integrate XBRL taxonomy metadata into plant_in_service transform #2058
Conversation
Codecov Report
Base: 84.6% // Head: 85.1% // Increases project coverage by +0.5%.

Additional details and impacted files:

@@           Coverage Diff            @@
##    xbrl_integration   #2058   +/- ##
========================================
+ Coverage       84.6%   85.1%    +0.5%
========================================
  Files             72      72
  Lines           8114    8119       +5
========================================
+ Hits            6865    6910      +45
+ Misses          1249    1209      -40

View full report at Codecov.
Welp, made it most of the way through, but then ran into an error with the plant mapping, it looks like, in this test:

I would assume this is something from upstream, since I haven't touched any of the plant mapping... but it looks like the …
I added this unit test while debugging a bunch of glue test failures, which were somehow being triggered by the addition of the XBRL taxonomy metadata. Whatever the glue tests are reading in from the XBRL metadata, it's unable to parse it as JSON, resulting in a NotImplementedError. This problem does *not* arise with the new XBRL metadata integration test, so it seems there's something weird going on with the glue test environment in particular.

Another strange thing I noticed: even if you run the tests with --live-dbs, the glue tests run the ETL anyway! Not sure why this would be happening. Running

```
pytest --live-dbs test/integration/etl_test.py::test_ferc1_etl
```

also doesn't exhibit this behavior, and it finishes in just a few seconds.

I split the FERC 1 ETL test into two tests, one for DBF and one for XBRL, and renamed them:

* test_ferc1_dbf2sqlite()
* test_ferc1_xbrl2sqlite()

Rather than "etl", since they're only doing the conversions of the raw data. This also lets you test just one or the other, which is more convenient when the problem you're working on is only on one side (a sketch of how this split might look follows below).
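A minimal sketch of the split, where dbf2sqlite() and xbrl2sqlite() are hypothetical stand-ins for the real raw-data converters (the actual entry points and their signatures may differ):

```python
# Hypothetical sketch only: dbf2sqlite() and xbrl2sqlite() stand in
# for the real converters; the actual functions differ.
def dbf2sqlite(years: list[int]) -> None:
    """Convert raw FERC Form 1 DBF files into an SQLite DB."""
    ...  # real conversion logic lives elsewhere


def xbrl2sqlite(years: list[int]) -> None:
    """Convert raw FERC Form 1 XBRL filings into an SQLite DB."""
    ...  # real conversion logic lives elsewhere


def test_ferc1_dbf2sqlite():
    """Exercise only the DBF side of the raw FERC 1 conversion."""
    dbf2sqlite(years=[2020])


def test_ferc1_xbrl2sqlite():
    """Exercise only the XBRL side of the raw FERC 1 conversion."""
    xbrl2sqlite(years=[2021])
```

With the tests split this way, `pytest -k dbf2sqlite` or `pytest -k xbrl2sqlite` runs just the side you're debugging.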
@cmgosnell I think something odd is going on with the glue tests you recently set up. They're running the ETL independently, even when the tests are run with --live-dbs.
My guess is that in this (to me) unexpected separate ETL process that the glue tests are running, either the input parameters or the output location differ from the main ETL, so when it tries to ingest the XBRL taxonomy metadata it's reading in something unexpected (or maybe nothing), and that's why the call to …
I forgot that the glue tests have their own abbreviated version of the transformers, which doesn't need any XBRL metadata. This commit allows a table transformer to be instantiated without any metadata, without issue.
Ah @cmgosnell, I figured it out. I forgot that the glue tests have their own abbreviated XBRL transformers that just do the pre-processing so they can grab the plant / utility IDs. I fixed it so that the XBRL taxonomy metadata inputs are optional. Really it should only be fed into the transformers that need it, like …
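A minimal sketch of what making the metadata optional could look like, using the Ferc1AbstractTableTransformer name that appears elsewhere in this thread; the attribute and method names here are assumptions, not the project's actual layout:

```python
# Sketch only: the real Ferc1AbstractTableTransformer has much more
# machinery; names here are illustrative assumptions.
class Ferc1AbstractTableTransformer:
    def __init__(self, xbrl_metadata_json: list[dict] | None = None):
        # Glue-test transformers only do pre-processing and never
        # touch the metadata, so a missing value is fine by default.
        self.xbrl_metadata_json = xbrl_metadata_json

    def process_xbrl_metadata(self):
        """Fail loudly only if the metadata is actually required."""
        if self.xbrl_metadata_json is None:
            raise ValueError(
                "Transformer was instantiated without XBRL metadata."
            )
        # ... normalize and return the table-specific metadata here ...
```

This keeps the glue-test instantiations working unchanged while still raising a clear error if a metadata-dependent transform is run without its inputs.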
Initial draft of a process that allows the XBRL taxonomy metadata to be accessed in the table transforms, and also allows the metadata to be transformed as appropriate for the individual tables (which is necessary when we are doing reshaping, renaming, etc.). Currently the only way the metadata is being utilized is in changing the sign convention of reported values so they can be aggregated using sum() directly (a toy illustration of this sign flip appears after the lists below).

Some outstanding questions:

* Should the metadata be stored in a separate normalized table, and combined with the data only in output methods / DB views?
* What metadata columns do we actually want to retain?
* Where should the bulk normalized metadata that pertains to all of the FERC 1 tables be stored during the transforms? Really it seems like it pertains to Ferc1AbstractTableTransformer(), since it's relevant to *all* the tables. But that would mean making it a class attribute, and somehow making that class capable of reading in the metadata independently. Not sure that's a good idea.
* Should we keep the calculated values in the table until we're able to aggregate and reproduce them? If so, they need to be flagged as calculated so they can be removed from other calculations.

Known issues:

* It looks like FERC accounts that have to get renamed (many in the general plant categories) aren't getting renamed everywhere. Their account numbers aren't showing up in the metadata we merge in...
* Actually, it looks like the *data* isn't coming through. Need to dig into this, wtf.
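To make the sign-convention step concrete, here's a toy, self-contained illustration of the idea: translate the "credit" / "debit" balance flags from the XBRL metadata into numeric weights and flip the reported values so a plain sum() yields the right aggregate. The column names (xbrl_factoid, ending_balance, balance) are illustrative, not necessarily the actual PUDL schema:

```python
import pandas as pd

# Toy data and metadata; column names are assumptions for illustration.
data = pd.DataFrame(
    {
        "xbrl_factoid": ["plant_additions", "plant_retirements"],
        "ending_balance": [100.0, 25.0],
    }
)
meta = pd.DataFrame(
    {
        "xbrl_factoid": ["plant_additions", "plant_retirements"],
        "balance": ["debit", "credit"],
    }
)

# Merge the metadata in, translate balance flags to numeric weights,
# and flip the signs so reported values aggregate correctly.
merged = data.merge(meta, on="xbrl_factoid", how="left")
merged["row_sign"] = merged["balance"].map({"debit": 1, "credit": -1})
merged["ending_balance"] *= merged["row_sign"]

print(merged["ending_balance"].sum())  # 75.0 -- net change via plain sum()
```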
I don't love that you are using apply_sign_conventions to both apply the sign and merge the metadata into the table. Can you add a merge-it method and then an apply-sign method?

You're also storing xbrl_metadata_normalized in a few places and editing it, and that feels a little bad.

If we make the table-specific metadata, we can pass in and cache just that table's metadata in the transformer class. Then you could have a merge-in-the-metadata method that grabs the cached metadata, normalizes it, and then merges it in.
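One way the suggested caching could look, sketched with functools.cached_property; the class name, the merge_xbrl_metadata method, and the xbrl_factoid merge key are assumptions for illustration, not the project's actual API:

```python
# Sketch of the reviewer's suggestion: cache only this table's
# normalized metadata on the transformer, and merge it in on demand.
from functools import cached_property

import pandas as pd


class PlantInServiceTableTransformer:
    def __init__(self, xbrl_metadata_json: list[dict]):
        self.xbrl_metadata_json = xbrl_metadata_json

    @cached_property
    def xbrl_metadata(self) -> pd.DataFrame:
        """Normalize this table's metadata once and cache the result."""
        return pd.json_normalize(self.xbrl_metadata_json)

    def merge_xbrl_metadata(self, df: pd.DataFrame) -> pd.DataFrame:
        """Merge the cached metadata into the data without mutating it."""
        return df.merge(self.xbrl_metadata, on="xbrl_factoid", how="left")
```

Because the cached property is read-only and merge_xbrl_metadata returns a new frame, the normalized metadata is stored and edited in exactly one place.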
Changes in response to comments from @cmgosnell on PR #2058:

* Get rid of confusing code that turned empty calculations into NA values.
* Use the single consistent name xbrl_metadata_json throughout.
* Separate metadata merging from the application of sign conventions.
* Also moved the translation of "credit" and "debit" into numerical weights into the `apply_sign_conventions` method, so it's clearer where those numbers are coming from (and since that translation isn't really metadata normalization...).