Feature request: assertions on attributes of data frame #98
Comments
Hi @JordanGutterman, I stumbled across a similar issue and came up with the following solutions:
library(dplyr)
library(assertr)
# Make toy dataframes
my_cars <-
  mtcars %>%
  mutate(id = row_number())
cars_info <-
  my_cars %>%
  select(id) %>%
  mutate(color = "purple", year = 1974)
# Option 1: check then join
my_cars %>%
  verify(nrow(anti_join(., cars_info, by = "id")) == 0) %>%
  left_join(cars_info, by = "id") %>%
  head()
#> mpg cyl disp hp drat wt qsec vs am gear carb id color year
#> 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 1 purple 1974
#> 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 2 purple 1974
#> 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 3 purple 1974
#> 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 4 purple 1974
#> 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 5 purple 1974
#> 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 6 purple 1974
# Option 2: join then check
my_cars %>%
  left_join(cars_info, by = "id") %>%
  verify(nrow(.) == nrow(my_cars)) %>%
  head()
#> mpg cyl disp hp drat wt qsec vs am gear carb id color year
#> 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 1 purple 1974
#> 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 2 purple 1974
#> 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 3 purple 1974
#> 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 4 purple 1974
#> 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 5 purple 1974
#> 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 6 purple 1974
Created on 2020-07-29 by the reprex package (v0.3.0)
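To illustrate what these checks guard against, here is a minimal sketch in which the lookup table contains a duplicated key. The `dup_info` table and the `row_check` / `key_check` names are hypothetical and only serve the illustration; the `verify()` calls follow the same pattern as Option 2 above, plus an assumed alternative that tests the key for duplicates before joining.

```r
library(dplyr)
library(assertr)

my_cars <- mtcars %>% mutate(id = row_number())

# Hypothetical lookup table in which id 1 appears twice
dup_info <- data.frame(
  id = c(1L, 1L, 2L),
  color = c("purple", "green", "purple")
)

# The duplicated key makes left_join() return one extra row
joined <- left_join(my_cars, dup_info, by = "id")
extra_rows <- nrow(joined) - nrow(my_cars)

# The row-count check from Option 2 now stops the pipeline;
# tryCatch() only captures the error so the script can continue
row_check <- tryCatch(
  {
    joined %>% verify(nrow(.) == nrow(my_cars))
    "passed"
  },
  error = function(e) "failed"
)

# Alternative: test the key for duplicates before joining
key_check <- tryCatch(
  {
    dup_info %>% verify(anyDuplicated(id) == 0)
    "passed"
  },
  error = function(e) "failed"
)
```

In a real pipeline the `tryCatch()` wrappers would be dropped so that a failed `verify()` halts execution.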
Hi,
Which will fail if
I previously used this package to check that joins in a pipeline do not introduce new rows via duplicates using the following pattern:
Similarly, I checked other attributes of the data frame at that point in the pipeline by passing the current state of the frame using the . placeholder.
This commit added a check that columns are passed to assert(), which makes sense per the current documentation but breaks my use case. So this is a request to allow passing logical checks to predicates that do not operate on columns, or to provide another way to check attributes of the data frame being built at that point in the pipeline.
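As a rough sketch of the kind of frame-level check being requested, a small pipe-friendly helper can apply a predicate to the whole data frame and pass the data through unchanged on success. `check_frame()` is a hypothetical name, not part of assertr, and the toy data simply mirrors the `mtcars` example from the comments.

```r
library(dplyr)

# Hypothetical helper (not part of assertr): apply a predicate to the whole
# data frame, stop on failure, and return the data unchanged on success
check_frame <- function(data, predicate, msg = "data frame check failed") {
  if (!isTRUE(predicate(data))) {
    stop(msg, call. = FALSE)
  }
  data
}

my_cars <- mtcars %>% mutate(id = row_number())
cars_info <- my_cars %>% select(id) %>% mutate(color = "purple")

# Record the row count up front, then confirm the join did not change it
n_before <- nrow(my_cars)
joined <- my_cars %>%
  left_join(cars_info, by = "id") %>%
  check_frame(function(d) nrow(d) == n_before, "join changed the row count")
```

Because the predicate receives the entire frame, the same helper could check column names, types, or other attributes, not just the row count.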