GeoMapper produces invalid counties >80000 for JHU #318

Closed
krivard opened this issue Oct 15, 2020 · 7 comments · Fixed by #335
Labels: bug, data quality, Engineering

Comments

@krivard

krivard commented Oct 15, 2020

Specifically, 88888 and 99999. COVIDcast fails any county file with a geo over 80000, so that's not allowed.

This might be related to the 800XX and 900XX codes referenced here and here; JHU plays pretty fast and loose with their geo coding. It's totally fine for GeoMapper to use 8xxxx and 9xxxx codes internally for geo aggregation, we just can't let them escape into CSV files destined for COVIDcast ingestion.
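A minimal sketch of the kind of guard this implies, run on the county codes just before a CSV is written. The function name `find_invalid_counties` and the `limit` parameter are hypothetical, not part of GeoMapper; the strict `>` comparison follows the ">80000" in the issue title.

```python
def find_invalid_counties(fips_codes, limit=80000):
    """Return the county codes whose numeric value exceeds `limit`.

    `fips_codes` is assumed to be an iterable of 5-digit FIPS strings.
    """
    return [code for code in fips_codes if int(code) > limit]


# Codes taken from this thread: two valid counties, one state-level code,
# and the two offending values.
codes = ["25007", "25019", "72000", "88888", "99999"]
bad = find_invalid_counties(codes)
if bad:
    print(f"county codes not allowed in COVIDcast output: {bad}")
```

Codes at or below the limit pass through, so internal 8xxxx and 9xxxx codes stay usable for aggregation as long as the check runs only at export time.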

@dshemetov would you introduce @sgsmob to the relevant code and documentation here and on JHU's github?

@krivard krivard added the bug Something isn't working label Oct 15, 2020
@dshemetov

Ah these are likely coming from the Puerto Rico 72888 and 72999 UIDs, will take a look.

@dshemetov

I've been meaning to make a unified map of all the hand modifications we use in JHU, it might be time.

@sgsmob

sgsmob commented Oct 16, 2020

So do we need to enforce this in GeoMapper or do we need to have some/all indicators check this later?

@dshemetov

@sgsmob Since this is a global requirement for all indicators, I think we should enforce it once in GeoMapper.

@dshemetov

Ah I see, these are not Puerto Rico, but are the Princess Cruises ships. @sgsmob this is the JHU UID lookup file from this repo.

84088888,US,USA,840,88888,,Diamond Princess,US,,,"Diamond Princess, US",
84099999,US,USA,840,99999,,Grand Princess,US,,,"Grand Princess, US",

We just need to make sure they are dropped from the JHU UID -> FIPS mapping file. Should be a few lines.
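Roughly, the drop could look like the sketch below. It assumes the mapping is held as a plain dict of UID string to FIPS string; the names `CRUISE_SHIP_UIDS` and `drop_cruise_ships` are hypothetical, and the real mapping file may be built differently.

```python
# UIDs of the two Princess Cruises ships, from the lookup rows above.
CRUISE_SHIP_UIDS = {"84088888", "84099999"}


def drop_cruise_ships(uid_to_fips):
    """Return a copy of the UID -> FIPS mapping without the cruise ship rows."""
    return {
        uid: fips
        for uid, fips in uid_to_fips.items()
        if uid not in CRUISE_SHIP_UIDS
    }


# Example mapping using only UIDs that appear in this thread.
mapping = {
    "84088888": "88888",  # Diamond Princess
    "84099999": "99999",  # Grand Princess
    "84025007": "25007",  # Dukes
    "84025019": "25019",  # Nantucket
}
cleaned = drop_cruise_ships(mapping)
```

Filtering on the UID rather than on the FIPS value keeps the change narrow: only the two ships are removed, and any other code above 80000 would still surface as an error instead of disappearing silently.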

@dshemetov

Also the high-level GeoMapper docs are here. The section on geo codes provides general info on the codes we support and the section on source files describes the authorities we get the lookup tables from.

@dshemetov

dshemetov commented Oct 20, 2020

Reproducing my table of custom JHU UID -> FIPS mappings:

| Name | JHU UID | Delphi FIPS | Notes |
| --- | --- | --- | --- |
| Dukes&Nantucket | 84070002 | 25007 | Weighted split |
| Dukes&Nantucket | 84070002 | 25019 | Weighted split |
| Dukes | 84025007 | 25007 | JHU unused but kept in case of future changes |
| Nantucket | 84025019 | 25019 | JHU unused but kept in case of future changes |
| Kansas City | 84070003 | 29095 | Weighted split |
| Kansas City | 84070003 | 29165 | Weighted split |
| Kansas City | 84070003 | 29037 | Weighted split |
| Kansas City | 84070003 | 29047 | Weighted split |
| Alaska | 84002158 | 02270 | Recoding due to FIPS table differences |
| Oglala | 84046102 | 46113 | Recoding due to FIPS table differences |
| States | 840000XX | XX000 | JHU unused state FIPS code, kept in case of future changes |
| Unassigned | 840800XX | XX000 | Probable cases/deaths for the state level |
| Out of State | 840900XX | XX000 | Miscellaneous cases/deaths for the state level |
| Puerto Rico Unassigned | 63072888 | 72000 | Probable cases/deaths for the state level |
| Puerto Rico Out of State | 63072999 | 72000 | Miscellaneous cases/deaths for the state level |
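The pattern rows in the table (840000XX, 840800XX, 840900XX, plus the two Puerto Rico UIDs) can be sketched as a small lookup. This is an illustration under assumptions, not GeoMapper's code: the names `STATE_LEVEL_UID`, `PUERTO_RICO_UIDS`, and `state_level_fips` are hypothetical, and the weighted splits are left out because the thread does not give the weights.

```python
import re

# 840 + (000 | 800 | 900) + two-digit state code XX  ->  XX000
STATE_LEVEL_UID = re.compile(r"^840(?:000|800|900)(\d{2})$")

# Puerto Rico uses the 630 prefix, so its two rows are listed explicitly.
PUERTO_RICO_UIDS = {"63072888": "72000", "63072999": "72000"}


def state_level_fips(uid):
    """Map a state-level JHU UID to its XX000 code, or return None."""
    if uid in PUERTO_RICO_UIDS:
        return PUERTO_RICO_UIDS[uid]
    match = STATE_LEVEL_UID.match(uid)
    return match.group(1) + "000" if match else None


for uid in ["84080025", "84090046", "63072888", "84025007", "84088888"]:
    print(uid, "->", state_level_fips(uid))
```

County-level UIDs such as 84025007 and the cruise ship UIDs fall through to `None`, so they would have to be handled by the regular county mapping or dropped separately.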

@nmdefries nmdefries added the data quality Missing data, weird data, broken data label Nov 10, 2020
@SumitDELPHI SumitDELPHI added the Engineering Used to filter issues when synching with Asana label Dec 6, 2020
@krivard krivard closed this as completed Dec 7, 2020
5 participants