Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor doctor visits to use geo utils package #401

Merged
merged 914 commits into from
Nov 9, 2020
Merged

Conversation

chinandrew
Copy link
Contributor

@chinandrew chinandrew commented Oct 28, 2020

Refactor doctor visits to use the new geoutils package. Closes #385

Summary of changes:

  • county to msa is now strings and not floats, so outputs are '11500' and not 11500.0, which should ingest fine.
  • count to state is now lowercase, which should also ingest fine.
  • county to HRR has an updated mapping table from the utils package so some weights are slightly different
  • county to megacounty now removes some unnecessary columns and adjusts the order of the rows (megacounty now at the top instead of bottom), both of which I don't believe affect anything downstream. Also, I think there was a bug where rows that were meant to be excluded were not, which I've added a comment for below.

I branched this off dv-package but had to merge main in to get the most up to date geo utils, so the changes are...unreadable. Could remedy this by merging main into dv-package. Pretty sure there are only 2 files I touched, doctor_visits/delphi_doctor_visits/geo_maps.py and doctor_visits/tests/test_geomap.py, both of which I've left a comment on below so they can be found easily.

jingjtang and others added 30 commits August 28, 2020 18:31
[GHT] try until pull the new data successfully
* Add updated encrypted credentials

* Add wip_signal to production params template

* Add newline

* Run on 12 cores

* Add shell script to run the indicator
* Add updated encrypted credentials

* Add wip_signal to production params template

* Add newline

* Run on 12 cores

* Add shell script to run the indicator

* Update run script
- Use production ingestion dir
- Remove hard fail to work around an issue with `cp`.
  - `cp` fails when files don't exist. I thought we could squash the
    error by sending stderr to /dev/null, but `set -eo` still catches
    and fails the script. Even worse, it happens silently. There should
    be a better way to handle this so will add it to a future task.
Merge the SafeGraph deployment branch updates to main
Propagate NYC and pylint hotfix for JHU
.sum()
.reset_index()
)
megacounty_df["ToExclude"] = False
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the values where ToExclue=True were meant to be filtered out but weren't here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. In line 185, we subset by the ["ToExclude"] column, which should have removed all the cases we wanted to filter out. The line at 190 may just be redundant?


def test_county_to_megacounty(self):

out, name = GM.county_to_megacounty(DATA, 10, 10)
out, name = GM.county_to_megacounty(DATA, 100000, 10)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous threshold led to no megacounties being made

@mariajahja
Copy link
Member

@chinandrew Thanks for the breakdown. Are any state-level changes expected? Most of the values are the same, but I do see some differences (I noticed Alaska). Also, we have more HRRs than before. I don't think these changes are the result of this PR, but maybe the geo utils themselves?

@chinandrew
Copy link
Contributor Author

@chinandrew Thanks for the breakdown. Are any state-level changes expected? Most of the values are the same, but I do see some differences (I noticed Alaska). Also, we have more HRRs than before. I don't think these changes are the result of this PR, but maybe the geo utils themselves?

For HRR's, @dshemetov and I discussed how the mapping files were updated and some very minor changes were to be expected (e.g. fips 01011 got converted to 146,0.003024 and 7,0.996976 (HRR,weight) in the new mapping, while the original gets converted to 7,1.0000). Not sure about the state, it's possible that mapping file was updated? how large is the difference?

Copy link
Member

@mariajahja mariajahja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, OK. The differences are 0.01-0.15 (I see another state is also different). Again, I don't think it's this PR, so I'll hit approve with the understanding that the only changes I've checked are in the two files.

@krivard krivard merged commit fbd87a2 into dv-package Nov 9, 2020
@krivard krivard deleted the dv-georefactor branch February 9, 2021 18:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.