Error in cohort method in export results from inside strategus #161

Closed
jhardin29 opened this issue Mar 15, 2024 · 5 comments

Comments

@jhardin29

Log file shows "2024-03-15 13:44:50 [Main thread] FATAL tibble Can't assign rows with toCensor. x Subscript toCensor can't contain missing values. x It has missing values at locations 1, 2, 3, 4, 5, etc."

@schuemie this is for an ASSURE project and I can share the link if helpful.

@schuemie
Member

Yes, please send me the link, along with anything else that can help me reproduce this error. I will need to debug this.

@azimov
Contributor

azimov commented Mar 18, 2024

Some more context on the issue, from my attempt to reproduce it:

! Can't assign rows with `toCensor`.
x Subscript `toCensor` can't contain missing values.
x It has missing values at locations 1, 2, 3, 4, 5, etc.
Backtrace:
     x
  1. +-global execute(jobContext)
  2. | \-CohortMethod::exportToCsv(...)
  3. |   \-CohortMethod:::exportCovariateBalance(...) at OHDSI-CohortMethod-d83c543/R/Export.R:167:2
  4. |     \-... %>% cross_join(tidyBalance(balance, minCellCount)) at OHDSI-CohortMethod-d83c543/R/Export.R:660:4
  5. +-dplyr::cross_join(., tidyBalance(balance, minCellCount))
  6. +-dplyr:::cross_join.data.frame(., tidyBalance(balance, minCellCount)) at dplyr/R/join-cross.R:48:2
  7. | \-dplyr::auto_copy(x, y, copy = copy) at dplyr/R/join-cross.R:59:2
  8. |   +-dplyr::same_src(x, y) at dplyr/R/copy-to.R:45:2
  9. |   \-dplyr:::same_src.data.frame(x, y) at dplyr/R/src.R:49:2
 10. |     \-base::is.data.frame(y) at dplyr/R/src.R:54:2
 11. +-CohortMethod:::tidyBalance(balance, minCellCount) at dplyr/R/join-cross.R:59:2
 12. | \-... %>% return() at OHDSI-CohortMethod-d83c543/R/Export.R:728:2
 13. +-dplyr::mutate(...)
 14. +-CohortMethod:::enforceMinCellValue(...) at dplyr/R/mutate.R:146:2
 15. | \-dplyr::pull(data, fieldName) at OHDSI-CohortMethod-d83c543/R/Export.R:229:2
 16. +-CohortMethod:::enforceMinCellValue(...) at OHDSI-CohortMethod-d83c543/R/Export.R:229:2
 17. | +-base::`[<-`(`*tmp*`, toCensor, fieldName, value = `<dbl>`) at OHDSI-CohortMethod-d83c543/R/Export.R:243:4
 18. | \-tibble:::`[<-.tbl_df`(`*tmp*`, toCensor, fieldName, value = `<dbl>`) at OHDSI-CohortMethod-d83c543/R/Export.R:243:4
 19. |   \-tibble:::tbl_subassign(x, i, j, value, i_arg, j_arg, substitute(value))
 20. |     \-tibble:::vectbl_as_new_row_index(i, x, i_arg, call = call)
 21. |       \-tibble:::vectbl_as_row_location(...)
 22. |         +-tibble:::subclass_row_index_errors(...)
 23. |         | \-base::withCallingHandlers(...)
 24. |         \-[truncated]tion(...)
 25. \-vctrs (local) `<fn>`() at vctrs/R/subscript-loc.R:80:2
 26.   \-vctrs:::stop_subscript_missing(i = i, call = call)
 27.     \-rlang::cnd_signal(...) at vctrs/R/subscript-loc.R:411:2
Execution halted

The error seems to originate inside the enforceMinCellValue function, but I have been unable to construct a data.frame that triggers it outside of Strategus.

@schuemie
Member

You can debug inside the CohortMethodModule, right? (Strategus will output the code to start an R session inside the failing module.)

@azimov
Contributor

azimov commented Mar 19, 2024

On further inspection, the line in question is here.

The error appears to occur when the value of inferredComparatorAfterSize is NaN and the meanBefore column contains double values > 0.

Reproducible example:

x <- data.frame(a=c(0.1, 0.002))
enforceMinCellValue(x, "a", NaN)

   censoring NA values (NA%) from a because value below minimum
Error in `[<-.data.frame`(`*tmp*`, toCensor, fieldName, value = NaN) : 
  missing values are not allowed in subscripted assignments of data frames

When values in this column are 0 or NA, no issue occurs:

x <- data.frame(a=c(0, NA))
enforceMinCellValue(x, "a", NaN)

   censoring 0 values (0%) from a because value below minimum
   a
1  0
2 NA
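
The difference between the two cases can be reproduced without CohortMethod. The following is a minimal sketch that assumes the censoring mask combines an NA check, a comparison against the minimum, and a non-zero check; `buildMask` is a hypothetical stand-in and not the package's actual code. Under that assumption, a positive value compared with NaN yields NA in the mask, while 0 and NA both collapse to FALSE:

```r
# Hypothetical mask, assumed to resemble the one built inside enforceMinCellValue
buildMask <- function(values, minValue) {
  !is.na(values) & values < minValue & values != 0
}

maskPositive <- buildMask(c(0.1, 0.002), NaN) # TRUE & NA & TRUE  -> NA for both
maskZeroOrNa <- buildMask(c(0, NA), NaN)      # a FALSE operand wins -> FALSE for both

# Assigning through a logical subscript that contains NA fails
x <- data.frame(a = c(0.1, 0.002))
result <- tryCatch(
  {
    x[maskPositive, "a"] <- -1
    "assigned"
  },
  error = function(e) conditionMessage(e)
)

print(maskPositive)
print(maskZeroOrNa)
print(result)
```

This relies on R's three-valued logic: `NA & FALSE` is `FALSE`, but `NA & TRUE` stays `NA`, so only rows with a non-missing, non-zero value end up with a missing subscript.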

I'm not sure what the desired behaviour should be in this setting, but an is.nan check on the minValues parameter is likely required.

In this case the NaN arises because all the values of balance$afterMatchingSumTarget and balance$afterMatchingMeanTarget are NA, but the values passed to minValues may need to be adjusted on a case-by-case basis.
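
To illustrate how all-NA columns can turn into NaN rather than NA, here is a small sketch. How CohortMethod actually infers the size is not shown in this thread, so the formula below (an average of sum / mean ratios with `na.rm = TRUE`) is an assumption, and the `balance` table is made up for the example:

```r
# Hypothetical balance table in which every after-matching value is missing
balance <- data.frame(
  afterMatchingSumTarget = c(NA_real_, NA_real_),
  afterMatchingMeanTarget = c(NA_real_, NA_real_)
)

# Assumed inference: average the sum / mean ratios, dropping NA first.
# With every ratio dropped, this is the mean of an empty vector.
inferredSize <- mean(
  balance$afterMatchingSumTarget / balance$afterMatchingMeanTarget,
  na.rm = TRUE
)

print(is.nan(inferredSize))
```

The mean of an empty numeric vector is NaN, which would explain why the minimum ends up as NaN instead of NA when nothing survives the `na.rm` step.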

@schuemie
Member

The NaN is probably ultimately caused by the cohort being really small after matching, and therefore any number is probably too small to share without violating the min cell requirement.

Do you know how big the comparator cohort after matching is in this case?

My proposed solution is: if the inferred size is NaN and minCellCount > 0, set the corresponding values in the results to NA (R) / NULL (SQL). (All these fields are nullable according to the data model.)
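
A rough sketch of what that proposal could look like in R. `censorOrBlank` is a hypothetical helper written for this comment, not a function in CohortMethod, and the convention of replacing censored values with `-minValue` is an assumption:

```r
# Hypothetical helper; not part of CohortMethod
censorOrBlank <- function(data, fieldName, minValue, minCellCount) {
  if (is.nan(minValue)) {
    # Inferred size unknown: blank the field when a min cell count applies
    if (minCellCount > 0) {
      data[[fieldName]] <- rep(NA_real_, nrow(data))
    }
    return(data)
  }
  values <- data[[fieldName]]
  # Mask is never NA here: missing values are excluded by the first term
  toCensor <- !is.na(values) & values != 0 & values < minValue
  # Assumed convention: censored cells become the negated minimum
  data[toCensor, fieldName] <- -minValue
  data
}

x <- data.frame(a = c(0.1, 0.002))
blanked <- censorOrBlank(x, "a", NaN, minCellCount = 5)
censored <- censorOrBlank(x, "a", 0.01, minCellCount = 5)
print(blanked)
print(censored)
```

Checking for NaN before the mask is built keeps the subscript free of missing values, so the assignment cannot hit the error reported above.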
