Be more forgiving of exterior ranges? #42
Comments
I agree that it is better just to trim the range than stop entirely. Can you try the version I just pushed? Note that Orestes doesn't appear until 1828, so this might be a better test:
Super, thanks. Edge case note: the behavior is now unclear when both dates are outside the allowed range.
> gender("James",years=c(1930,1930),method="ipums")
Source: local data frame [1 x 6]
name proportion_male proportion_female gender year_min year_max
<chr> <dbl> <dbl> <chr> <dbl> <dbl>
1 James 0.9902 0.0098 male 1930 1930
> gender("James",years=c(1960,1980),method="ipums")
Source: local data frame [0 x 6]
Variables not shown: name <chr>, proportion_male <dbl>, proportion_female <dbl>, gender <lgl>, year_min
<dbl>, year_max <dbl>.
Warning message:
In gender("James", years = c(1960, 1980), method = "ipums") :
The year range provided has been trimmed to fit within 1789 to 1930.
Hmm. Good point. As it stands, dates which are completely outside the range
On Sat, Sep 10, 2016 at 3:58 PM, Benjamin Schmidt notifications@github.com
Lincoln Mullen
Based on the output of
Yeah, I wasn't thinking clearly about how the range was set for odd inputs. The whole code for setting ranges should be refactored. Will fix.
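One way to make the odd inputs explicit when refactoring is to classify the requested range against the dataset bounds before doing anything else. This is only a sketch: `classify_range` is a hypothetical name, not a function in the package, and the 1789 to 1930 defaults are taken from the warning message above.

```r
# Hypothetical helper (not part of the gender package): decide how a
# requested c(min, max) year range relates to a dataset's bounds.
classify_range <- function(years, lower = 1789, upper = 1930) {
  stopifnot(length(years) == 2, years[1] <= years[2])
  if (years[2] < lower || years[1] > upper) {
    "outside"   # no overlap at all: there is nothing sensible to trim to
  } else if (years[1] < lower || years[2] > upper) {
    "partial"   # overlaps the bounds: could be trimmed with a warning
  } else {
    "inside"    # already within bounds: use as given
  }
}

classify_range(c(1930, 1930))  # "inside"
classify_range(c(1788, 1818))  # "partial"
classify_range(c(1960, 1980))  # "outside"
```

With the cases separated like this, the fully outside case can get its own message or error rather than falling through the trimming path.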
If I have someone named "Orestes" in 1831, I can't match it in the IPUMS sample
No problem, right? Just broaden the net when you have a rare name
Super. But if I want to do a batch test on many names, I'd like to be able to just set the years for each of them at c(year-30,year+30). But this is going to raise loads of errors for anyone near the edge of the range.
Of course I can muck up my code with a lot of maxes and mins for each of the datasets I'm using. But why not just clip c(1788,1818) to c(1789, 1818) and write a warning instead of raising an error?
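The clipping proposed here could look roughly like the following. `clip_years` is a hypothetical helper, not part of the gender package, and the 1789 to 1930 defaults are an assumption based on the IPUMS bounds quoted in the warning message in this thread.

```r
# Hypothetical helper (not part of the gender package): clip a c(min, max)
# year range to a dataset's bounds and warn instead of stopping.
clip_years <- function(years, lower = 1789, upper = 1930) {
  stopifnot(length(years) == 2, years[1] <= years[2])
  clipped <- c(max(years[1], lower), min(years[2], upper))
  if (!identical(as.numeric(clipped), as.numeric(years))) {
    warning("The year range provided has been trimmed to fit within ",
            lower, " to ", upper, ".")
  }
  clipped
}

# Batch use: a window of 30 years either side of each person's year.
person_years <- c(1800, 1831, 1915)
ranges <- lapply(person_years, function(y) {
  suppressWarnings(clip_years(c(y - 30, y + 30)))
})
```

Note that this sketch only handles ranges that overlap the bounds. A range lying entirely outside them, such as c(1960, 1980), comes back inverted as c(1960, 1930), so that case needs a separate check.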