-
-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parametermize adding a column in the FERC1 transform & ensure _correction
records end up in the calculation compoent table
#3409
Conversation
_correcition
records end up in the calculation compoent table_correction
records end up in the calculation compoent table
… table and into the infering of dimensionsfunc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few clarifying questions and some minor nits. Otherwise mostly looks good to go and everything generates as expected.
src/pudl/transform/ferc1.py
Outdated
@@ -1520,7 +1549,9 @@ def add_corrections( | |||
f"{num_records} original records." | |||
) | |||
|
|||
return pd.concat([calculated_df, corrections], axis="index") | |||
return pd.concat( | |||
[calculated_df.convert_dtypes(), corrections.convert_dtypes()], axis="index" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to do dtypes conversions here and not further upstream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this has to do with #3125 but i actually didn't even do this "right". i converted this to just astype corrections
@@ -2161,11 +2215,6 @@ def add_calculation_corrections( | |||
weight=1, | |||
) | |||
) | |||
# for every calc component, also make the parent-only version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why this gets dropped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes i should have said something about this. number 1: this assign wasn't being saved '_' it was just happen in the void, so removing this does not change and outcome. number 2: we went back and forth in early development about whether the calculation component table should have all the factoids as "parent" factoids. this in effect negated the need for the metadata table with all the facts. we moved away this a while back
i just noticed it being in here in a ghostly way while trying to track down why the corrections where not showing up in the rate base table
@@ -6475,6 +6422,22 @@ def make_calculation_dimensions_explicit( | |||
the parental dimensions or the child dimensions. | |||
""" | |||
logger.info(f"Adding {dimensions=} into calculation component table.") | |||
# even though we don't actually get correction records for all of the | |||
# factiods, we are still going to force them to exist here so all of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be for all factoids, or all calculated factoids? If factoids only ever exist as components I'm not sure why we would expect a correction record for them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a very good point. this is part of why i wanted to move this into this function instead of do this in _core_ferc1__table_dimensions
. because these _correction
records are only ever generated here & they are only used to as the left df in a left merge down below when defining calc_comps_w_implied_dims
, if they don't show up null_dim
(which is made out of the calculation components) they will just be correctly lost in the merge.
but that feels pretty opaque. so I'll add another part of the mask to grab only table dim factoids that show up as parents (== calculated values) in the calculation components
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm this is less straightforward than i had thought. because this function should reallly be called make_xbrl_factoid_dimensions_explicit
because this is being used to make dimension explict for dfs that are not the calc comps table. I think it is possible to only add corrections for calculated values but it will be a little complicated. it seems simpler to keep the existing behavior and just explain the left merge drop here.
"added_dim": AddColumnWithUniformValue( | ||
column_value="i'm dim", is_dimension=True | ||
), | ||
"not_a_dim": AddColumnWithUniformValue( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to assert that this column got added to the dataframe, but not as a dimension column?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in here i'm testing the dimension_columns
parameter class but not i'm not actually doing the transform at all.
in the test_add_columns_with_uniform_value
above, we're actually doing the transform. in that context i didn't set the is_dimension
bool so by default they are both False
and are being added. I could make that explicit above and test it up there or add a new test that does that mixed is_dimension
transform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay i added a second step in test_add_columns_with_uniform_value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing my comments.
Overview
Closes #3072. Working on #3404.
What problem does this address?
_correction
records into thedetailed
and rate base tables because the correction records in the_core_ferc1_xbrl__calculation_components
table were not showing up with the same dimensionsWhat did you change?
_correction
records to the_core_ferc1__table_dimensions
table even if there aren't correction records in the data so we can have them in the_core_ferc1_xbrl__calculation_components
with the same exact dimensions as their parentsexplosion_args
into a global variable. this is entirely for notebook-able testing. it's been a pain ergonomically for a while to have that dictionary inside the factory instead of accessible.forest_as_table
: keep the columns with fully null dimensions so that it is much easier to set indexes on a particular layer and select records using the nodes (which have all dimensions even the fully null ones)Testing
How did you make sure this worked? How can a reviewer verify this?
To-do list