Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parametermize adding a column in the FERC1 transform & ensure _correction records end up in the calculation compoent table #3409

Merged
merged 8 commits into from
Feb 20, 2024

Conversation

cmgosnell
Copy link
Member

@cmgosnell cmgosnell commented Feb 19, 2024

Overview

Closes #3072. Working on #3404.

What problem does this address?

  • it was previously pretty hard to keep track of when we added dimension columns into the ferc1 transform like `utility_type="electric"
  • we weren't getting very many _correction records into the detailed and rate base tables because the correction records in the _core_ferc1_xbrl__calculation_components table were not showing up with the same dimensions

What did you change?

  • see task list in Make a transformer to add a column w/ a standard value #3072 for parameterization
  • added in _correction records to the _core_ferc1__table_dimensions table even if there aren't correction records in the data so we can have them in the _core_ferc1_xbrl__calculation_components with the same exact dimensions as their parents
  • ensure the correction records don't get grabbed as total -> sub-total records
  • moved the big explosion_args into a global variable. this is entirely for notebook-able testing. it's been a pain ergonomically for a while to have that dictionary inside the factory instead of accessible.
  • light changes to the forest_as_table: keep the columns with fully null dimensions so that it is much easier to set indexes on a particular layer and select records using the nodes (which have all dimensions even the fully null ones)

Testing

How did you make sure this worked? How can a reviewer verify this?

To-do list

@cmgosnell cmgosnell self-assigned this Feb 19, 2024
@cmgosnell cmgosnell added ferc1 Anything having to do with FERC Form 1 rmi labels Feb 19, 2024
@cmgosnell cmgosnell marked this pull request as ready for review February 19, 2024 20:44
@e-belfer e-belfer changed the title Parametermize adding a column in the FERC1 transform & ensure _correcition records end up in the calculation compoent table Parametermize adding a column in the FERC1 transform & ensure _correction records end up in the calculation compoent table Feb 19, 2024
Copy link
Member

@e-belfer e-belfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few clarifying questions and some minor nits. Otherwise mostly looks good to go and everything generates as expected.

src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Show resolved Hide resolved
src/pudl/transform/ferc1.py Outdated Show resolved Hide resolved
src/pudl/transform/ferc1.py Outdated Show resolved Hide resolved
@@ -1520,7 +1549,9 @@ def add_corrections(
f"{num_records} original records."
)

return pd.concat([calculated_df, corrections], axis="index")
return pd.concat(
[calculated_df.convert_dtypes(), corrections.convert_dtypes()], axis="index"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to do dtypes conversions here and not further upstream?

Copy link
Member Author

@cmgosnell cmgosnell Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this has to do with #3125 but i actually didn't even do this "right". i converted this to just astype corrections

src/pudl/transform/ferc1.py Outdated Show resolved Hide resolved
@@ -2161,11 +2215,6 @@ def add_calculation_corrections(
weight=1,
)
)
# for every calc component, also make the parent-only version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why this gets dropped?

Copy link
Member Author

@cmgosnell cmgosnell Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes i should have said something about this. number 1: this assign wasn't being saved '_' it was just happen in the void, so removing this does not change and outcome. number 2: we went back and forth in early development about whether the calculation component table should have all the factoids as "parent" factoids. this in effect negated the need for the metadata table with all the facts. we moved away this a while back

i just noticed it being in here in a ghostly way while trying to track down why the corrections where not showing up in the rate base table

@@ -6475,6 +6422,22 @@ def make_calculation_dimensions_explicit(
the parental dimensions or the child dimensions.
"""
logger.info(f"Adding {dimensions=} into calculation component table.")
# even though we don't actually get correction records for all of the
# factiods, we are still going to force them to exist here so all of the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be for all factoids, or all calculated factoids? If factoids only ever exist as components I'm not sure why we would expect a correction record for them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a very good point. this is part of why i wanted to move this into this function instead of do this in _core_ferc1__table_dimensions. because these _correction records are only ever generated here & they are only used to as the left df in a left merge down below when defining calc_comps_w_implied_dims, if they don't show up null_dim (which is made out of the calculation components) they will just be correctly lost in the merge.

but that feels pretty opaque. so I'll add another part of the mask to grab only table dim factoids that show up as parents (== calculated values) in the calculation components

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm this is less straightforward than i had thought. because this function should reallly be called make_xbrl_factoid_dimensions_explicit because this is being used to make dimension explict for dfs that are not the calc comps table. I think it is possible to only add corrections for calculated values but it will be a little complicated. it seems simpler to keep the existing behavior and just explain the left merge drop here.

"added_dim": AddColumnWithUniformValue(
column_value="i'm dim", is_dimension=True
),
"not_a_dim": AddColumnWithUniformValue(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to assert that this column got added to the dataframe, but not as a dimension column?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in here i'm testing the dimension_columns parameter class but not i'm not actually doing the transform at all.

in the test_add_columns_with_uniform_value above, we're actually doing the transform. in that context i didn't set the is_dimension bool so by default they are both False and are being added. I could make that explicit above and test it up there or add a new test that does that mixed is_dimension transform.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay i added a second step in test_add_columns_with_uniform_value

@e-belfer e-belfer self-requested a review February 20, 2024 16:29
Copy link
Member

@e-belfer e-belfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my comments.

@e-belfer e-belfer added this pull request to the merge queue Feb 20, 2024
Merged via the queue into main with commit 292b1ff Feb 20, 2024
12 checks passed
@e-belfer e-belfer deleted the ferc1-rate-base-debug branch February 20, 2024 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ferc1 Anything having to do with FERC Form 1 rmi
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants