Add percent_of_expected_deaths signal and dry-run mode to NCHS mortality data pipeline #233
Issue to the API weekly, but track daily updates in S3 using the diff-based archive utility (weekly updates would be tracked in S3 anyway). This will require extending the utility to handle this weird case.
"Get up-to-date utils"
Separate diff tracking into daily diffs and weekly diffs.
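For readers new to the idea, here is a minimal sketch of what a diff-based archive step does: compare the current export against a cached previous export and keep only the rows that are new or changed. The same comparison could be run against yesterday's cache for daily diffs and against the last issued export for weekly diffs. The function name `changed_rows` and the `geo_id` key are assumptions for illustration; this is not the archive utility's actual API.

```python
import pandas as pd


def changed_rows(previous, current, key="geo_id"):
    """Return the rows of `current` that are new or differ from `previous`.

    Hypothetical helper for illustration only. Assumes both frames share
    the same columns and that `key` is unique within each frame.
    """
    # Left-join so every current row is kept; previous values get a suffix.
    merged = current.merge(previous, on=key, how="left", suffixes=("", "_prev"))
    value_cols = [c for c in current.columns if c != key]

    changed = pd.Series(False, index=merged.index)
    for col in value_cols:
        new, old = merged[col], merged[col + "_prev"]
        # Treat two missing values as equal so unchanged gaps are not re-issued.
        same = new.eq(old) | (new.isna() & old.isna())
        changed |= ~same

    return merged.loc[changed, list(current.columns)].reset_index(drop=True)
```

Rows that appear only in `current` have no previous value to match, so they are reported as changed.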
Checking code coverage in the tests:
|
Not part of this set of commits, so maybe this belongs to another issue, but I'm wondering if the lines 63-67 in
|
This is added in case the values for some states are missing for certain dates.
I see, so this is to make sure that every state has the same number of reported dates?
Yes.
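To make the exchange above concrete, here is a minimal sketch of padding a frame so that every state has a row for every reported date, with missing values left as NaN. The helper name `pad_missing_dates` and the `state` and `timestamp` column names are assumptions for illustration; the pipeline's own code may do this differently.

```python
import pandas as pd


def pad_missing_dates(df, value_cols):
    """Give every state a row for every date seen in the data.

    Hypothetical helper for illustration only. Assumes `df` has `state`
    and `timestamp` columns and at most one row per (state, timestamp).
    """
    states = sorted(df["state"].unique())
    dates = df["timestamp"].drop_duplicates().sort_values()
    # Full grid of (state, date) pairs; absent pairs become NaN rows.
    full_index = pd.MultiIndex.from_product(
        [states, dates], names=["state", "timestamp"]
    )
    return (
        df.set_index(["state", "timestamp"])[value_cols]
        .reindex(full_index)
        .reset_index()
    )
```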
Should
|
Mostly small linter fixes here.
My only other comment is that the weekly vs daily updates change was tough to understand from reading this thread or the code. I think adding some elaboration to the documentation for that would help people coming to this codebase later.
Other than that, all tests pass, so after the linter fixes, I think this PR is good to go.
nchs_mortality/tests/test_pull.py
'pneumonia_deaths', 'pneumonia_and_covid_deaths',
'influenza_deaths', 'pneumonia_influenza_or_covid_19_deaths',
"timestamp", "geo_id", "population"]).all()
The linter asks for a newline at the end of this file.
nchs_mortality/tests/test_run.py
@@ -1,49 +1,67 @@
import pytest

from os import listdir
import datetime as dt
from os import listdir, remove
Linter says `remove` is no longer needed.
nchs_mortality/tests/test_run.py
]
metrics = [
'covid_deaths', 'total_deaths', 'pneumonia_deaths',
'pneumonia_and_covid_deaths', 'influenza_deaths',
Trailing space here.
@dshemetov Agree on documentation -- @jingjtang can you add a section to DETAILS.md describing the details of:
|
@jingjtang See my first comment for code coverage concerns. I think we should try to make sure we have tests that cover all the code lines, so we don't have surprises down the road. In particular:
If you're absolutely sure those work, I can relent on this issue. Just trying to enforce some testing consistency. |
Added a new test case for missing cols. As for others:
|
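For context, a test for missing columns could look roughly like the sketch below. It is only an illustration: `check_required_columns` and the shortened `REQUIRED_COLUMNS` list are hypothetical stand-ins, not the functions or the test that were actually added in this PR.

```python
import pandas as pd

# Illustrative subset of the column names quoted in this thread.
REQUIRED_COLUMNS = ["covid_deaths", "total_deaths", "timestamp", "geo_id"]


def check_required_columns(df, required=REQUIRED_COLUMNS):
    """Raise ValueError if any required column is absent (hypothetical helper)."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")


def test_missing_cols():
    # A frame that lacks the death-count columns should be rejected.
    df = pd.DataFrame({"geo_id": ["al"], "timestamp": ["2020-08-01"]})
    try:
        check_required_columns(df)
    except ValueError as err:
        assert "covid_deaths" in str(err)
    else:
        raise AssertionError("expected ValueError for missing columns")
```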
LGTM! |
Closes #119
percent_of_expected_deaths signal
num or prop) for this signal, just report the raw values
token
I noticed that they actually update the dataset every weekday (the data_as_of column is always today's date). Maybe we want to run the pipeline every day in order to capture the backfill information; each run takes less than 10 seconds.