Flag potential improper use of by variables #46

Closed
mstackhouse opened this issue Apr 15, 2022 · 1 comment

Comments

@mstackhouse
Contributor

Description

When by variables are provided to Tplyr, a `complete()` call is run to fill NA values. We've made an inherent assumption that you won't provide two by variables that are 1:1, for example VISITNUM and VISIT. This is because Tplyr can automatically detect that AVISITN should be used for sorting AVISIT, based on ADaM assumptions.

If you do provide both VISITNUM and VISIT, Tplyr will duplicate all of those records and essentially cartesian join the results because of the `tidyr::complete()` calls we run. This is necessary to an extent, because we want to provide the 0 rows for factor combinations with no results in the data, but it makes these scenarios unintuitive and confusing.
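A minimal sketch of the effect, outside of Tplyr. The `counts` data frame and its values are made up for illustration, and this calls `tidyr::complete()` directly rather than reproducing Tplyr's internal code path:

```r
library(tidyr)

# Hypothetical summary data: VISITNUM and VISIT map 1:1 (three visits)
counts <- data.frame(
  VISITNUM = c(1, 2, 3),
  VISIT    = c("Baseline", "Week 2", "Week 4"),
  n        = c(10, 8, 7)
)

# Completing over both variables crosses every VISITNUM with every VISIT,
# so the 3 real rows become 3 x 3 = 9 rows
crossed <- complete(counts, VISITNUM, VISIT, fill = list(n = 0))

nrow(counts)         # 3
nrow(crossed)        # 9
sum(crossed$n == 0)  # 6 filler rows, e.g. VISITNUM 1 paired with "Week 2"
```

The six extra rows are pairings that can never occur in the data, which is what makes the output confusing.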

As a preventative measure, we should introduce a warning if we notice a large proportional increase in records due to the complete. Gauging what that ratio should be is a little tough, but maybe something like: if we notice that rows increase by 50%, then produce a warning.
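One way the check might look, as a rough sketch only. `warn_on_expansion()` and its `threshold` argument are hypothetical names, not part of Tplyr, and the 50% default is simply the figure floated above:

```r
# Hypothetical helper (not a Tplyr function): compare row counts before and
# after completion and warn when the proportional increase hits the threshold
warn_on_expansion <- function(before, after, threshold = 0.5) {
  n_before <- nrow(before)
  n_after  <- nrow(after)
  increase <- (n_after - n_before) / n_before

  # Guard against empty input, where the ratio is undefined
  if (n_before > 0 && increase >= threshold) {
    warning(sprintf(
      "Completing by variables grew the data from %d to %d rows (%.0f%% increase). Are two by variables 1:1?",
      n_before, n_after, 100 * increase
    ), call. = FALSE)
  }
  invisible(increase)
}

# Made-up example: 3 rows before, 9 rows after crossing VISITNUM with VISIT
visits <- c("Baseline", "Week 2", "Week 4")
before <- data.frame(VISITNUM = 1:3, VISIT = visits)
after  <- expand.grid(VISITNUM = 1:3, VISIT = visits, stringsAsFactors = FALSE)

warn_on_expansion(before, after)  # warns: 3 to 9 rows is a 200% increase
```

A fixed threshold will also fire on legitimately sparse data, where many real factor combinations have no records, so the cutoff would likely need tuning or a way to opt out.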

@mstackhouse mstackhouse added the enhancement New feature or request label Apr 15, 2022
@mstackhouse
Contributor Author

Closed via #174
