-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add NCHS mortality geo aggregation at the HHS and nation levels #1243
base: main
Are you sure you want to change the base?
Conversation
@krivard - ready for review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a bonus, resolving the code-duplication questions may get the linter to hush.
I could be convinced otherwise though if I've misread what's going on, wow that is a monster.
@@ -18,7 +19,7 @@ | |||
from .pull import pull_nchs_mortality_data | |||
|
|||
|
|||
def run_module(params: Dict[str, Any]): | |||
def run_module(params: Dict[str, Any]): # pylint: disable=too-many-branches, too-many-statements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this better than splitting it up into more-specific functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely not, just pushed some changes to pull out some helper methods!
df = df_pull.copy() | ||
df["se"] = np.nan | ||
df["sample_size"] = np.nan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be pulled out of the for loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually it can probably be pulled out of the if block as well
if geo in ["hhs", "nation"]: | ||
df = gmpr.replace_geocode( | ||
df, "state_id", "state_code", from_col="geo_id", date_col="timestamp") | ||
df = gmpr.replace_geocode( | ||
df, "state_code", geo, date_col="timestamp").rename(columns={geo: "geo_id"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we reduce some of the duplication with percent_of_expected_deaths
here at all?
@@ -4,6 +4,7 @@ | |||
disable=logging-format-interpolation, | |||
too-many-locals, | |||
too-many-arguments, | |||
fixme, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this for a TODO I left in the code. Let me know if this is something we'd want to keep consistent across all indicators and I can add it to others.
Description
Add NCHS mortality geo aggregation at the HHS and nation levels.
Note that because one of the columns being aggregated is a percentage, we're doing a weighted sum with the geomapper
Also there are some pylint disables for too many branches, this is because of the slight difference in logic for handling the percentage vs. count variables. This could be refactored later for the entire run.py function, but is not in scope for this change.
Fixes