- This pipeline takes raw `companies`, `shuttles` and `reviews` data and creates typed parquet mirrors on the `intermediate` level.
- To create the `primary` domain level data, we aggregate the `companies` data so we have one row per company. We then merge all 3 sources and create two new `primary` tables:
  - The `prm_spine_table` contains just the relevant ID columns at the required grain going forward. All `feature` and `model_input` tables will include these columns and have the same number of rows.
  - The `prm_shuttle_company_reviews` table includes metrics which will be used later in the pipeline as features ready to be used in the model.
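
The aggregate-then-merge step described above can be sketched with pandas roughly as follows. This is a minimal illustration, not the pipeline's actual node code: the toy data, every column name (`company_id`, `shuttle_id`, `review_id`, `company_rating`, `engines`, `review_score`), the choice of `mean` as the aggregation, and the assumption that the required grain is one row per review are all hypothetical.

```python
import pandas as pd

# Toy inputs; all column names and values are hypothetical stand-ins.
companies = pd.DataFrame({
    "company_id": [1, 1, 2],
    "company_rating": [0.8, 0.9, 0.5],
})
shuttles = pd.DataFrame({
    "shuttle_id": [10, 11, 12],
    "company_id": [1, 2, 2],
    "engines": [2, 4, 1],
})
reviews = pd.DataFrame({
    "review_id": [100, 101, 102],
    "shuttle_id": [10, 11, 12],
    "review_score": [4.5, 3.0, 5.0],
})

# Aggregate companies so there is exactly one row per company.
companies_agg = companies.groupby("company_id", as_index=False).agg(
    company_rating=("company_rating", "mean")
)

# Merge all three sources, keeping one row per review (assumed grain).
merged = (
    reviews
    .merge(shuttles, on="shuttle_id", how="left")
    .merge(companies_agg, on="company_id", how="left")
)

# Spine table: only the ID columns at the chosen grain.
id_columns = ["review_id", "shuttle_id", "company_id"]
prm_spine_table = merged[id_columns].drop_duplicates().reset_index(drop=True)

# Metrics table: the ID columns plus the merged metrics.
prm_shuttle_company_reviews = merged.copy()
```

Because both tables are derived from the same merged frame, they share the same ID columns and row count, which is the property the spine is meant to guarantee for downstream tables.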
- The