@nmdefries
Contributor

Description

Daily and rollup (covering ~4 weeks) input files are formatted slightly differently. Daily input files don't contain the lag or issue_date fields, which are necessary for data filtering and modeling. In the pipeline, we combine rollup and daily files using bind_rows, whose output includes the union of all fields seen in the component data frames. If a field is missing from one of the component data frames, its entries for those rows are filled with NA. This is what happens to the lag and issue_date fields for rows from daily files.

In the current version of the pipeline, we only check whether the lag and issue_date fields are entirely missing. However, even when this check passes, the NA values from daily files cause problems later in the pipeline.

  1. Add an issue_date field to daily data frames on read, and derive the lag field from it.
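The NA-filling behavior and the proposed fix can be sketched outside of R. The snippet below is a minimal Python illustration, not the pipeline's code: `bind_rows` here is a hypothetical stand-in that mimics the union-of-fields behavior of dplyr's bind_rows on lists of dict rows, and `add_issue_date_and_lag` is a hypothetical helper. It assumes the issue date for a daily file is supplied by the caller and that lag is the number of days between issue_date and time_value; neither detail is stated in this description.

```python
from datetime import date


def bind_rows(*frames):
    """Stand-in for dplyr::bind_rows: union of all fields, None where a field is missing."""
    columns = []
    for frame in frames:
        for row in frame:
            for key in row:
                if key not in columns:
                    columns.append(key)
    return [{col: row.get(col) for col in columns} for frame in frames for row in frame]


def add_issue_date_and_lag(rows, issue_date):
    """Return copies of rows with issue_date set and lag derived from it.

    Assumption: lag is the number of days from time_value to issue_date.
    """
    return [
        {**row, "issue_date": issue_date, "lag": (issue_date - row["time_value"]).days}
        for row in rows
    ]


# Made-up example rows: a rollup row has lag and issue_date, a daily row does not.
rollup = [{"time_value": date(2023, 1, 1), "issue_date": date(2023, 1, 5), "lag": 4, "fips": "01001"}]
daily = [{"time_value": date(2023, 1, 9), "fips": "01001"}]

# Combining as-is leaves lag and issue_date empty for the daily row.
unfilled = bind_rows(rollup, daily)

# Filling the daily rows before combining avoids the missing values.
filled = bind_rows(rollup, add_issue_date_and_lag(daily, issue_date=date(2023, 1, 12)))
```

In `unfilled`, the daily row carries `None` for both fields even though the fields themselves exist, which is why a check for entirely missing fields does not catch the problem.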

Date fields time_value and issue_date, when available, are read in as datetime class with the local timezone, but we expect them to be dates. All of the datetimes correspond to midnight UTC (such that the UTC date is one day later than the local date). In the Python pipelines that produce the input files, these date fields are actually formatted as datetime64[ns] (timezone-naive datetimes). It appears that R's arrow::read_parquet assumes these are in UTC and converts them to the host's timezone.

  1. Convert these back to UTC and then to dates for appropriate handling.

  2. The pipeline expects a field called geo_value; input files call it fips instead, so rename it.
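The date shift described above, and the two fixes in this list, can be reproduced with a small Python sketch. This is an illustration under stated assumptions rather than the R code in this PR: the host timezone is assumed to be five hours behind UTC, `to_utc_date` is a hypothetical helper, and the row values are made up.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration: the host runs five hours behind UTC.
HOST_TZ = timezone(timedelta(hours=-5))


def to_utc_date(local_dt):
    """Convert a timezone-aware datetime back to UTC and keep only the date."""
    return local_dt.astimezone(timezone.utc).date()


# The input file stores a timezone-naive midnight datetime that stands for a date.
stored = datetime(2023, 1, 12, 0, 0)

# What the reader appears to do: treat the value as UTC, then express it in host time.
as_read = stored.replace(tzinfo=timezone.utc).astimezone(HOST_TZ)

naive_date = as_read.date()        # date taken directly from the host-local value
fixed_date = to_utc_date(as_read)  # date after converting back to UTC first

# Rename fips to geo_value, as the pipeline expects.
row = {"fips": "01001", "time_value": fixed_date}
row["geo_value"] = row.pop("fips")
```

Taking the date directly from the host-local value gives the previous calendar day on a host behind UTC, while converting back to UTC first recovers the stored date.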

Changelog

  • NAMESPACE
  • io.R
  • main.R
  • utils.R
  • documentation files

@nmdefries nmdefries requested a review from jingjtang January 12, 2023 17:50
@nmdefries
Contributor Author

nmdefries commented Jan 12, 2023

Superseded by #1760 and #1761

@nmdefries nmdefries closed this Jan 12, 2023
@nmdefries nmdefries deleted the ndefries/fill-issue-date branch January 20, 2023 21:48
@nmdefries nmdefries restored the ndefries/fill-issue-date branch February 15, 2023 17:30
@nmdefries nmdefries deleted the ndefries/fill-issue-date branch February 15, 2023 17:35
