Fix keypin column in commercial valuation data #648

Open
wants to merge 4 commits into
base: master

Conversation

wrridgeway (Member) commented Nov 18, 2024

While adding commercial data documentation to the wiki I was trying to determine if there is a way to define primary keys for the data (I couldn't). I did, however, determine that there were some pretty bad values for keypin in the data. We try to clean this data as little as possible, but given the importance of keypin, it feels prudent to make sure we don't have rows with random notes or "TOTAL PINS" as keypin.

This PR reduces the size of the dataset by about 500 rows and cleans the keypin column in about 1000 more.

sweatyhandshake and others added 2 commits November 18, 2024 15:07
sep = "-"
),
keypin
),
Member Author

I could not for the life of me get ccao::pin_format_pretty to work here.

Member

What error were you getting? IMO it's worth using our builtins whenever possible.
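
For reference while debugging, the manual dash formatting in the diff above can be sketched in base R. This is only an illustration: `format_pin_dashed` is a hypothetical helper, not the `ccao` builtin, and it assumes a key PIN is a 14-digit string split as 2-2-3-3-4.

```r
# Hypothetical helper (not part of ccao): format a 14-digit PIN as
# XX-XX-XXX-XXX-XXXX and leave every other value untouched.
format_pin_dashed <- function(pin) {
  # Only strings made of exactly 14 digits are treated as PINs
  is_pin <- !is.na(pin) & grepl("^[0-9]{14}$", pin)
  out <- pin
  out[is_pin] <- paste(
    substr(pin[is_pin], 1, 2),
    substr(pin[is_pin], 3, 4),
    substr(pin[is_pin], 5, 7),
    substr(pin[is_pin], 8, 10),
    substr(pin[is_pin], 11, 14),
    sep = "-"
  )
  out
}

formatted <- format_pin_dashed(c("13264290020000", "TOTAL PINS", NA))
print(formatted)
```

Values that are not 14-digit strings (notes, "TOTAL PINS", missing values) pass through unchanged, so they can still be inspected or filtered in a later step.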

@@ -194,5 +212,6 @@ list.files(
select(all_of(sort(names(.)))) %>%
relocate(c(keypin, pins, township, year)) %>%
relocate(c(file, sheet), .after = last_col()) %>%
distinct() %>%
Member Author

Oddly, there are some rows that are duplicated across all columns.
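
A quick way to count such rows before dropping them, sketched in base R on made-up data (the `toy` data frame and its values are assumptions for illustration, not rows from the commercial valuation files). `duplicated()` on a data frame flags rows that repeat an earlier row in every column, which is the same set of rows that `distinct()` with no arguments removes.

```r
# Toy data: rows 1 and 3 are identical in every column.
toy <- data.frame(
  keypin = c("13-26-429-002-0000", "13-26-429-003-0000", "13-26-429-002-0000"),
  year = c(2021, 2021, 2021),
  stringsAsFactors = FALSE
)

# Number of rows that repeat an earlier row across all columns
n_full_dupes <- sum(duplicated(toy))

# Keep the first occurrence of each fully duplicated row
deduped <- unique(toy)

print(n_full_dupes)
print(nrow(deduped))
```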

@wrridgeway wrridgeway marked this pull request as ready for review November 18, 2024 21:31
@wrridgeway wrridgeway requested a review from a team as a code owner November 18, 2024 21:31
filter(
check.numeric(excesslandval),
!is.na(keypin),
!str_detect(keypin, "[:alpha:]"),
Member

This is going to drop _any_ key PIN with an alphabetical character. Are there any cases like 13264290020000A where we don't want that to happen?
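
To make the question concrete, here is a small base R sketch comparing the strict filter with one possible alternative. The sample values are made up apart from the example PIN above, whether such values actually occur in the data is not known, and the "strip one trailing letter" rule is an assumption, not something the PR implements. Base R's `grepl("[[:alpha:]]", ...)` stands in for the `str_detect()` call in the diff.

```r
keypins <- c("13264290020000", "13264290020000A", "TOTAL PINS", "see notes", NA)

# Strict approach, as in the diff: drop anything containing a letter.
has_letter <- grepl("[[:alpha:]]", keypins)
kept_strict <- keypins[!is.na(keypins) & !has_letter]

# Lenient alternative (assumption): first strip a single trailing letter
# that follows exactly 14 digits, then apply the same filter.
stripped <- sub("^([0-9]{14})[[:alpha:]]$", "\\1", keypins)
kept_lenient <- stripped[!is.na(stripped) & !grepl("[[:alpha:]]", stripped)]

print(kept_strict)
print(kept_lenient)
```

Under the strict filter only the first value survives. The lenient version also recovers the PIN with the trailing "A", while notes and "TOTAL PINS" are still dropped.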

2 participants