Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: derive_vars_period() errors unexpectedly when dataset_ref has unused variables #2231

Closed
manciniedoardo opened this issue Nov 14, 2023 · 1 comment · Fixed by #2245
Closed
Assignees
Labels
bug Something isn't working programming

Comments

@manciniedoardo
Copy link
Collaborator

manciniedoardo commented Nov 14, 2023

What happened?

When dataset_ref contains variables that are then not mentioned within the new_vars call, derive_vars_period() errors with the unhelpful error message:

Error in `signal_duplicate_records()`:
! Dataset `dataset_add` contains duplicate records with respect to `STUDYID` and `USUBJID`
Run `get_duplicates_dataset()` to access the duplicate records

when really one would at least expect the error message to flag that this is occurring because every variable in dataset_ref must either be a key variable or be mentioned in new_vars. Alternatively the function should just ignore unused variables. See reproducible example for details.

Session Information

Not relevant

Reproducible Example

adsl <- tibble(STUDYID = "xyz", USUBJID = c("1", "2"))

# Add period variables to ADSL
period_ref <- tribble(
  ~USUBJID, ~APERIOD, ~APERSDT,     ~APEREDT,     ~APERSDTM,                ~APEREDTM,
  "1",             1, "2021-01-04", "2021-02-06", "2021-01-04 10:10:00", "2021-02-07 11:20:01", 
  "1",             2, "2021-02-07", "2021-03-07", "2021-02-07 11:20:02", "2021-03-07 10:10:00",
  "2",             1, "2021-02-02", "2021-03-02", "2021-02-02 12:20:01", "2021-03-03 13:10:01",
  "2",             2, "2021-03-03", "2021-04-01", "2021-03-03 13:10:02", "2021-04-01 10:10:00"
) %>%
  mutate(
    STUDYID = "xyz",
    APERIOD = as.integer(APERIOD),
    APERSDT = ymd(APERSDT),
    APERSDT = ymd(APEREDT),
    APERSDTM = ymd_hms(APERSDTM),
    APEREDTM = ymd_hms(APEREDTM),
  )

# This errors because we are not mentioning APERSDTM, APEREDTM anywhere
derive_vars_period(
  adsl,
  dataset_ref = period_ref,
  new_vars = exprs(APxxSDT = APERSDT, APxxEDT = APEREDT)
)

# This doesn't error because we are mentioning APERSDTM, APEREDTM
derive_vars_period(
  adsl,
  dataset_ref = period_ref,
  new_vars = exprs(APxxSDT = APERSDT, APxxEDT = APEREDT, APxxSDTM = APERSDTM, APxxEDTM = APEREDTM)
)

# This doesn't error because we are removing APERSDTM, APEREDTM from period_ref
derive_vars_period(
  adsl,
  dataset_ref = period_ref %>% select(-ends_with("M")),
  new_vars = exprs(APxxSDT = APERSDT, APxxEDT = APEREDT)
)
@manciniedoardo manciniedoardo added bug Something isn't working programming future Issue to be implemented after release labels Nov 14, 2023
@manciniedoardo
Copy link
Collaborator Author

Decision from core meeting: behaviour should be changed so that derive_vars_period doesn't throw an error at all, just ignores variables in period_ref that are not used in the function call

@manciniedoardo manciniedoardo added release Q4-2023 and removed future Issue to be implemented after release labels Nov 18, 2023
@manciniedoardo manciniedoardo self-assigned this Nov 18, 2023
manciniedoardo added a commit that referenced this issue Nov 18, 2023
bms63 added a commit that referenced this issue Nov 21, 2023
…olumns (#2245)

* #2231 added check in `derive_vars_period()` to remove unused columns

* #2231 Chore: styler

* Update tests/testthat/test-period_dataset.R

Co-authored-by: Ben Straub <ben.x.straub@gsk.com>

* #2231 updates to `assert_data_frame` calls following review

---------

Co-authored-by: Ben Straub <ben.x.straub@gsk.com>
Co-authored-by: Stefan Bundfuss <80953585+bundfussr@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working programming
Development

Successfully merging a pull request may close this issue.

1 participant