Allow concatenation of tall DBF with wide XBRL data #2014
Comments
@zschira in compiling this column mapping I noticed that the XBRL data has no totals. I know XBRL lets you specify which collections of values should be added up to create totals and subtotals, so I suspect that they've encoded the totals and subtotals that way. Did you see anything like that in the XBRL metadata / taxonomies when you were poking around? I think they might be called "calculation arcs"?

Also, compiling these maps will definitely be much easier when we can get programmatic access to the FERC Account numbers associated with each of the XBRL columns. I think it'll cover more than 90% of the columns in many tables.
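To make the question about totals concrete: in an XBRL calculation linkbase, each arc from a parent concept to a child concept carries a weight (typically 1 or -1), and the parent is the weighted sum of its children. Whether the FERC taxonomy actually encodes the plant totals this way is unconfirmed here. The sketch below only illustrates the idea, and the concept names, hierarchy, and values are all made up for the example.

```python
def roll_up(concept, values, arcs):
    """Return a leaf concept's reported value, or the weighted sum of its children."""
    children = arcs.get(concept, [])
    if not children:
        return values[concept]
    return sum(weight * roll_up(child, values, arcs) for child, weight in children)


# Hypothetical parent -> [(child, weight)] relationships, in the shape that
# calculation arcs would provide. Not taken from the real FERC taxonomy.
arcs = {
    "utility_plant_total": [
        ("distribution_plant", 1.0),
        ("transmission_plant", 1.0),
        ("electric_plant_sold", -1.0),
    ],
}

# Made-up reported values for the leaf concepts.
values = {
    "distribution_plant": 100.0,
    "transmission_plant": 50.0,
    "electric_plant_sold": 5.0,
}

total = roll_up("utility_plant_total", values, arcs)
```

If relationships like these can be extracted, the same mapping could in principle be used to group the old DBF rows into the subtotals they used to report.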
Notes from chat with @zschira
Progress on #2012 #2014

* Fixed a bug in how the DBF row numbers that need to be mapped are identified. Now it looks for any time the row_literal associated with a row number has changed from one year to the next, rather than selecting the first instance of each distinct combination of row_literal and row_number.
* Also discovered that there's an obscure row_status field that differentiates between annual (A) and quarterly (Q) row literals, and is part of the f1_row_lit_tbl primary key, but it only shows up in association with the f1_schedules_list table. I integrated it but... maybe that table should just be excluded from the row mapping template?
* Added some (janky) helper functions to pudl.transform.ferc1 to manage the generation of the row maps. This location is temporary. They should probably become methods of a Ferc1 abstract transformer class for reshaped tables, or maybe end up in a different module. Not sure how they'll end up getting used yet though.
* Updated the dbf_to_xbrl.csv file to include all of the possible rows that could need mapping (4270 in total).
* Removed the XBRL specific metadata fields from the dbf_to_xbrl.csv file, since they should (hopefully) be available programmatically from the metadata @zschira is extracting from the XBRL taxonomies, and can be joined to this table based on the xbrl_column_stem.
* Updated the plant_in_service transform to use the new row map. Need to test on all of the years.
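The row-identification fix described above can be illustrated with a small pandas sketch. This is not the helper that was added to pudl.transform.ferc1. The function name is hypothetical, the row literals are invented example data, and only the column names (sched_table_name, report_year, row_number, row_literal) come from this issue.

```python
import pandas as pd


def rows_needing_mapping(row_lit: pd.DataFrame) -> pd.DataFrame:
    """Keep records whose row_literal differs from the previous year's literal
    for the same table and row number (the first year is always kept)."""
    key = ["sched_table_name", "row_number"]
    df = row_lit.sort_values(key + ["report_year"]).copy()
    # Previous year's literal within each (table, row number) group.
    prev = df.groupby(key)["row_literal"].shift()
    changed = df["row_literal"].ne(prev)
    return df.loc[changed].reset_index(drop=True)


# Invented example: row 5 changes meaning in 2003 and reverts in 2005.
row_lit = pd.DataFrame(
    {
        "sched_table_name": ["f1_plant_in_srvce"] * 4,
        "report_year": [2002, 2003, 2004, 2005],
        "row_number": [5, 5, 5, 5],
        "row_literal": ["Organization", "Franchises", "Franchises", "Organization"],
    }
)

changed = rows_needing_mapping(row_lit)
```

In this example the year-over-year comparison keeps 2002, 2003, and 2005, whereas taking the first instance of each distinct (row_literal, row_number) pair would keep only 2002 and 2003 and miss the 2005 reversion.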
Tasks
* [...] (report_year, row_number) should be associated with what XBRL column name for a given table. This file will need to store all unique combinations of (sched_table_name, report_year, row_number, row_literal).
* [...] _additions, _retirements etc.) for the f1_plant_in_srvce table. Across all the FERC 1 DBF data, there are 4260 unique combinations. The f1_plant_in_srvce table has 198 combos. So this is tedious but very doable, even for all tables.
* plant_in_service_ferc1 version of merge_instant_and_duration_tables_xbrl() that converts the multiple years of data into starting/ending balance columns (see Reconcile multiple years of data in XBRL instant tables #2021 for notes).
* plant_in_service_ferc1 version of process_xbrl() that reshapes from wide to tidy format for concatenation with the DBF data.
* plant_in_service_ferc1 version of process_dbf() that aligns report_year and row_number with the account IDs that are used in the XBRL data based on our manual mapping.
* [...] transform_main() but potentially referring to the same parameters/mapping/metadata as was used for the reshaping and alignment.
* [...] record_id values when they're found in a reshaped table.

Notes from CSV Compilation
* The plant_in_service DBF table has 198 different combinations of year, row number, and row literal. The XBRL table has 96 columns. This DBF table changed once in 2003 and again in 2006.
* [...] plant_in_service table include no totals, which makes me suspect that there's some kind of structural metadata in the XBRL that says which values should be added up to generate the previously calculated subtotals / totals. This could be really useful if we can use it to programmatically group the old DBF values too.
* [...] energy_storage_equipment_distribution_plant. Having the FERC Account numbers would remove a lot of potential ambiguity.
* [...] plant_in_service table there are 2 levels of headers, but only Production Plant is subdivided (by type of generation -- hydro, steam, nuclear, etc.) while all of the other numbers are reported either at the top level (not under any header, for FERC Account 102 and 103) or the first level (a big utility plant category like distribution_plant).
* [...] electric_plant_purchased and electric_plant_sold). The plant sold is a credit rather than a debit to the plant balance, so all values in that category need to be negated. This is important because in the normal categories the individual columns also have different credit/debit categorizations -- additions, transfers, and adjustments are booked as "debits" and retirements are booked as "credits". It'll be much better if we can [...]
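The wide-to-tidy reshaping from the task list and the sign handling from the notes above can be sketched roughly as follows. This is an illustration under assumptions, not the eventual process_xbrl() implementation. The function name, the entity_id column, the amount_type and value column names, and all data values are hypothetical. Only the _additions / _retirements suffixes, the xbrl_column_stem name, and the statement that the plant sold category needs to be negated come from this issue.

```python
import pandas as pd

ID_COLS = ["entity_id", "report_year"]   # hypothetical identifying columns
SUFFIXES = ["additions", "retirements"]  # suffixes named in the task list
NEGATED_STEMS = {"electric_plant_sold"}  # category the notes say must be negated


def wide_to_tidy(wide: pd.DataFrame) -> pd.DataFrame:
    """Melt one-column-per-(stem, suffix) data into one row per value."""
    tidy = wide.melt(id_vars=ID_COLS, var_name="xbrl_column", value_name="value")
    # Split e.g. "distribution_plant_additions" into stem and amount type.
    pattern = rf"^(?P<xbrl_column_stem>.+)_(?P<amount_type>{'|'.join(SUFFIXES)})$"
    parts = tidy["xbrl_column"].str.extract(pattern)
    return pd.concat([tidy[ID_COLS], parts, tidy[["value"]]], axis=1)


# Made-up wide XBRL-style record.
wide = pd.DataFrame(
    {
        "entity_id": ["C000001"],
        "report_year": [2021],
        "distribution_plant_additions": [100.0],
        "distribution_plant_retirements": [30.0],
        "electric_plant_sold_additions": [5.0],
    }
)

tidy = wide_to_tidy(wide)
# Flip the sign of every value in the negated category.
mask = tidy["xbrl_column_stem"].isin(NEGATED_STEMS)
tidy.loc[mask, "value"] *= -1

# Made-up tall DBF-derived record that has already been mapped to a column stem.
dbf_tidy = pd.DataFrame(
    {
        "entity_id": ["C000001"],
        "report_year": [2020],
        "xbrl_column_stem": ["distribution_plant"],
        "amount_type": ["additions"],
        "value": [90.0],
    }
)

combined = pd.concat([dbf_tidy, tidy], ignore_index=True)
```

The sketch only applies the category-level negation. How that should interact with the per-column debit/credit categorization (retirements versus additions) is left open, as it is in the notes.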