Fix keypin column in commercial valuation data #648
base: master
sep = "-"
),
keypin
),
I could not for the life of me get `ccao::pin_format_pretty()` to work here.
What error were you getting? IMO it's worth using our builtins whenever possible.
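For context on the thread above, here is a minimal base-R sketch of the kind of manual fallback the diff fragment suggests (a `paste(..., sep = "-")` call that leaves other values untouched). The helper name `format_pin_fallback` is hypothetical and is not part of the `ccao` package, and the 2-2-3-3-4 digit grouping is an assumption about the intended pretty format for 14-digit PINs.

```r
# Hypothetical helper, NOT the ccao implementation: insert dashes into
# 14-digit PINs and leave every other value (notes, NA) unchanged.
format_pin_fallback <- function(pin, sep = "-") {
  # Only values made of exactly 14 digits are reformatted
  ok <- !is.na(pin) & grepl("^[0-9]{14}$", pin)
  out <- pin
  out[ok] <- paste(
    substr(pin[ok], 1, 2),
    substr(pin[ok], 3, 4),
    substr(pin[ok], 5, 7),
    substr(pin[ok], 8, 10),
    substr(pin[ok], 11, 14),
    sep = sep
  )
  out
}

formatted <- format_pin_fallback(c("13264290020000", "TOTAL PINS", NA))
print(formatted)
```

If the builtin can be made to work, it is probably preferable to a hand-rolled helper like this one, as the reviewer suggests.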
…ao-data/data-architecture into fix-commercial-valuation-data
@@ -194,5 +212,6 @@ list.files(
select(all_of(sort(names(.)))) %>%
relocate(c(keypin, pins, township, year)) %>%
relocate(c(file, sheet), .after = last_col()) %>%
distinct() %>%
Oddly, there are some rows that are duplicated across all columns.
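One way to check how many rows a step like `distinct()` would drop is to count the rows that are duplicated across every column. This is a small sketch using base R's `duplicated()` on made-up data. The column values are illustrative only and do not come from the commercial valuation files.

```r
# Toy data (hypothetical values): the first two rows match on all columns
df <- data.frame(
  keypin = c("13264290020000", "13264290020000", "13264290030000"),
  year = c(2021, 2021, 2021),
  stringsAsFactors = FALSE
)

# duplicated() on a data frame flags rows identical to an earlier row
n_dupes <- sum(duplicated(df))

# Keeping only the non-duplicated rows mirrors deduplicating on all columns
deduped <- df[!duplicated(df), , drop = FALSE]

print(n_dupes)
print(nrow(deduped))
```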
filter(
check.numeric(excesslandval),
!is.na(keypin),
!str_detect(keypin, "[:alpha:]"),
This is going to drop _any_ key PIN with an alphabetical character. Are there any cases like `13264290020000A` where we don't want that to happen?
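To make the concern concrete, the sketch below applies an equivalent filter to a few sample values. It uses base R's `grepl("[[:alpha:]]", ...)` as a stand-in for `str_detect(keypin, "[:alpha:]")`, which is an assumption made so the example runs without `stringr`. Apart from `13264290020000A` and "TOTAL PINS", which are mentioned in this PR, the sample values are made up.

```r
# Sample values: a clean PIN, a PIN with a trailing letter, a stray note,
# a missing value, and a dash-formatted PIN
keypins <- c(
  "13264290020000",
  "13264290020000A",
  "TOTAL PINS",
  NA,
  "13-26-429-002-0000"
)

# TRUE where the value contains at least one letter (NA yields FALSE)
has_alpha <- grepl("[[:alpha:]]", keypins)

# Same logic as the filter: drop NAs and anything containing a letter
kept <- keypins[!is.na(keypins) & !has_alpha]

print(kept)
```

Under this filter the value with a trailing `A` is dropped along with the stray note, so whether such PINs are legitimate determines whether the pattern is too aggressive.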
While adding commercial data documentation to the wiki, I was trying to determine whether there is a way to define primary keys for the data (I couldn't). I did, however, find some pretty bad values for `keypin` in the data. We try to clean this data as little as possible, but given the importance of `keypin`, it feels prudent to make sure we don't have rows with random notes or "TOTAL PINS" as `keypin`.

This PR reduces the size of the dataset by about 500 rows and cleans the `keypin` column in about 1000 more.