Reconcile multiple years of data in XBRL instant tables #2021
At least in the …

```python
import pudl

# Assumes `pudl_settings` (the local PUDL workspace settings) is already defined.
ferc1_settings = pudl.settings.Ferc1Settings(
    tables=["plant_in_service_ferc1"],
    years=[2020, 2021],
)
raw_dbf = pudl.extract.ferc1.extract_dbf(
    ferc1_settings,
    pudl_settings,
)
raw_xbrl = pudl.extract.ferc1.extract_xbrl(
    ferc1_settings,
    pudl_settings,
)
# 414 records: 206 from 2021-12-31 and 208 from 2020-12-31.
pis_xbrl_instant = raw_xbrl["plant_in_service_ferc1"]["instant"]
assert (pis_xbrl_instant.groupby(["entity_id", "date"]).nunique() > 1).sum().sum() == 0
# Only 202 records, which aren't actually duplicated before the tables get merged.
pis_xbrl_duration = raw_xbrl["plant_in_service_ferc1"]["duration"]
assert (pis_xbrl_duration.groupby(["entity_id", "start_date"]).nunique() > 1).sum().sum() == 0
```

Update: I am a dumbass and was grouping by unique rows, so of course everything was the same. Doing this correctly, I find that 394 of the 414 records experienced some year-to-year change in the overall aggregate value:

```python
# Flag entities whose plant-in-service total differs between the two year-end dates.
pis_xbrl_instant["changed"] = (
    pis_xbrl_instant.groupby(["entity_id"])["electric_plant_in_service"].transform("nunique")
    > 1
)
pis_xbrl_instant[pis_xbrl_instant.changed]
```
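To illustrate the pitfall described in the update, here is a minimal sketch on made-up toy data (not real FERC values; the column names just mirror the ones above). In an instant table each (`entity_id`, `date`) pair is already unique, so grouping by it leaves one row per group and `nunique()` can never exceed 1. Grouping by `entity_id` alone compares the two year-end balances for each entity.

```python
import pandas as pd

# Hypothetical toy instant table: two year-end balances per entity.
# Values are invented for illustration only.
instant = pd.DataFrame(
    {
        "entity_id": ["A", "A", "B", "B"],
        "date": pd.to_datetime(["2020-12-31", "2021-12-31", "2020-12-31", "2021-12-31"]),
        "electric_plant_in_service": [100.0, 110.0, 50.0, 50.0],
    }
)

# Grouping by a key that is unique per row: every group holds a single row,
# so this check always comes out as 0 and tells us nothing.
vacuous = (instant.groupby(["entity_id", "date"]).nunique() > 1).sum().sum()

# Grouping by entity alone: flag entities whose balance differs across dates.
instant["changed"] = (
    instant.groupby("entity_id")["electric_plant_in_service"].transform("nunique") > 1
)
changed_entities = sorted(instant.loc[instant["changed"], "entity_id"].unique())
print(vacuous, changed_entities)
```

Here entity `A` is flagged because its balance moved from 100.0 to 110.0, while `B` is not, even though the first check reports nothing for either.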
Weird missing data issues aside, I think it's clear what the multiple years of data mean: end-of-year balances for this year and last year, which are reported in the same year but have different instantaneous timestamps/dates associated with them. We're not currently adopting the instant vs. duration model of time. Maybe we should think about that at some point, but in our current data model we want to be able to easily aggregate data within or potentially across …
This arrangement will allow us to look at a given …

It seems like working this into the …

This was mostly a research and exploration issue. I'm going to close it and move on to implementation in #2014.
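As a rough illustration of the interpretation above (this year's and last year's end-of-year balances reported together in one filing), here is one hypothetical way to fold the two instants into a single row per entity and report year. The previous year-end value is treated as the starting balance and the current year-end value as the ending balance. The `report_year` column, the `starting_balance`/`ending_balance` labels, and the `label_balances` helper are assumptions for this sketch, not PUDL's actual schema or implementation, and the data is made up.

```python
import pandas as pd

# Hypothetical toy instant facts: each 2021 filing carries both the
# 2020-12-31 and 2021-12-31 year-end balances. Values are invented.
instant = pd.DataFrame(
    {
        "entity_id": ["A", "A", "B", "B"],
        "report_year": [2021, 2021, 2021, 2021],
        "date": pd.to_datetime(["2020-12-31", "2021-12-31", "2020-12-31", "2021-12-31"]),
        "electric_plant_in_service": [100.0, 110.0, 50.0, 55.0],
    }
)


def label_balances(df: pd.DataFrame) -> pd.DataFrame:
    """Pivot two instants per report into starting and ending balance columns.

    Assumes every instant is dated either in the report year itself or in the
    prior year. Any other offset raises an error instead of being guessed at.
    """
    offset = df["report_year"] - df["date"].dt.year
    if not offset.isin([0, 1]).all():
        raise ValueError("Found instants outside the report year and prior year.")
    df = df.assign(balance_type=offset.map({0: "ending_balance", 1: "starting_balance"}))
    return df.pivot(
        index=["entity_id", "report_year"],
        columns="balance_type",
        values="electric_plant_in_service",
    ).reset_index()


wide = label_balances(instant)
print(wide)
```

Keeping one row per (entity, report year) fits a data model that aggregates by report year, while the prior-year instant is kept as a starting balance instead of being thrown away.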
In some of the XBRL tables representing `instant` facts, there's more than one value for `date` associated with a single `report_year`. For example:

How should we deal with this? What does it mean?
In the DBF tables, "this year" and "last year" data are sometimes reported next to each other for comparison. Is that what's going on here? If so, then we only need to keep the current year of data. We need to identify which DBF tables report data this way and compare them to the XBRL tables that have more than one year of data, to see whether they correspond.
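A minimal sketch of the two steps suggested above, assuming the raw XBRL instant table has `entity_id`, `report_year`, and `date` columns (the column layout, the helper names `multi_date_reports` and `keep_current_year`, and the toy values are all hypothetical). The first helper finds which (entity, report year) pairs carry more than one `date`. The second keeps only the instants dated within the report year itself.

```python
import pandas as pd


def multi_date_reports(instant: pd.DataFrame) -> pd.DataFrame:
    """Return (entity_id, report_year) pairs that carry more than one instant date."""
    counts = instant.groupby(["entity_id", "report_year"])["date"].nunique()
    return counts[counts > 1].reset_index(name="n_dates")


def keep_current_year(instant: pd.DataFrame) -> pd.DataFrame:
    """Drop prior-year instants, keeping only facts dated within the report year."""
    is_current = instant["date"].dt.year == instant["report_year"]
    return instant[is_current].reset_index(drop=True)


# Hypothetical toy data: entity A reports two year-end dates, B reports one.
instant = pd.DataFrame(
    {
        "entity_id": ["A", "A", "B"],
        "report_year": [2021, 2021, 2021],
        "date": pd.to_datetime(["2020-12-31", "2021-12-31", "2021-12-31"]),
        "value": [100.0, 110.0, 55.0],
    }
)

multi = multi_date_reports(instant)
current = keep_current_year(instant)
print(multi)
print(current)
```

Running the first helper over each XBRL instant table would give the list of tables to check against their DBF counterparts. The second helper only makes sense if that comparison confirms the prior-year values are just repeated for reference.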
… `start_date` and `end_date`.