[Backfill corrections] Account for differences in fields in daily input files; convert date fields #1758
Description
Daily and rollup (covering ~4 weeks) input files are formatted slightly differently. Daily input files don't contain the `lag` or `issue_date` fields that are necessary for data filtering and modeling. In the pipeline, we combine rollup and daily files using `bind_rows`, whose output includes the union of all fields seen in the component dfs. If a given field is missing from one of the component dfs, those entries are filled with `NA`. This happens to the `lag` and `issue_date` fields for daily files, as the small sketch below illustrates.
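A toy illustration of that `bind_rows` behavior (the data here is made up, not taken from the real input files):

```r
library(dplyr)

rollup <- tibble(
  geo_value = "01001",
  time_value = as.Date("2022-01-01"),
  issue_date = as.Date("2022-01-03"),
  lag = 2L
)
daily <- tibble(
  geo_value = "01001",
  time_value = as.Date("2022-01-02")
)

# Daily rows get NA for the fields they are missing.
bind_rows(rollup, daily)
#> # A tibble: 2 x 4
#>   geo_value time_value issue_date   lag
#>   <chr>     <date>     <date>     <int>
#> 1 01001     2022-01-01 2022-01-03     2
#> 2 01001     2022-01-02 NA            NA
```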
In the current version of the pipeline, we check whether the `lag` and `issue_date` fields are entirely missing. However, even if this check passes, the missing values from daily files cause problems later in the pipeline.

Fix: add the `issue_date` field to daily dfs on read and, from it, derive the `lag` field (a sketch follows).
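A minimal sketch of that fix, assuming daily files are named for their issue date and that `lag` is the number of days between `time_value` and `issue_date`; the file-naming scheme, the `read_daily_file` helper, and the exact `lag` definition are illustrative, not the pipeline's actual code:

```r
library(dplyr)

# Hypothetical helper: read one daily input file and fill in the missing fields.
read_daily_file <- function(path) {
  arrow::read_parquet(path) %>%
    mutate(
      # Attach the issue date that daily files lack, taken here from the filename
      # (e.g. ".../2022-01-02.parquet" -> 2022-01-02).
      issue_date = as.Date(gsub("\\.parquet$", "", basename(path))),
      # Derive lag as the number of days between the reference date and the issue date.
      lag = as.integer(issue_date - as.Date(time_value))
    )
}
```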
Date fields
`time_value` and `issue_date`, when available, are read in as `datetime` class with the local timezone. We expect these to be dates. All of the datetimes correspond to UTC midnight, so when converted to a timezone behind UTC they fall on the previous local day (the intended UTC date is one day later than the local date). In the Python pipelines that produce the input files, these date fields are actually formatted as `datetime64[ns]` (timezone-naive datetimes). It appears that R's `arrow::read_parquet` assumes these are in UTC and converts them to the host's timezone. Convert these back to UTC and then to dates for appropriate handling (sketched below).
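A hedged sketch of that conversion, assuming the columns arrive as POSIXct in the host timezone; the file path and pipe structure are illustrative, not the pipeline's exact code:

```r
library(dplyr)
library(lubridate)

df <- arrow::read_parquet("input/rollup.parquet") %>%  # path is illustrative
  mutate(
    # Reinterpret each instant in UTC, then keep only the date part.
    time_value = as.Date(with_tz(time_value, tzone = "UTC")),
    issue_date = as.Date(with_tz(issue_date, tzone = "UTC"))
  )
```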
The pipeline expects a field called `geo_value`; input files call it `fips` instead, so rename it (a one-line sketch follows).
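For illustration, the rename could look like this (the df here is a toy stand-in for any freshly read input file):

```r
library(dplyr)

# Toy stand-in for a freshly read input file.
df <- tibble(fips = "01001", value = 0.5)

# The pipeline expects `geo_value`, so rename the `fips` column.
df <- rename(df, geo_value = fips)
```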
Changelog
- io.R
- main.R
- utils.R