perf() on pls unable to handle near zero variance features #196

Closed
Max-Bladen opened this issue Mar 27, 2022 · 0 comments · Fixed by #197

Max-Bladen commented Mar 27, 2022

🐞 Describe the bug:
When running the perf() function on a pls object (using any PLS mode), if there are features which have near zero variance (nzv), the following error is raised:

Error in Ypred[omit, , h] <- Y.hat[, , 1] :
number of items to replace is not a multiple of replacement length
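
The mechanism behind this message can be shown outside mixOmics. The following is a minimal base R sketch, not the code inside perf(): the array sizes are made up, and only the names Ypred, Y.hat and omit are borrowed from the error message. It assigns a slice with fewer columns than the target slice expects.

```r
# Toy stand-ins for the objects named in the error message (sizes are invented).
n_samples  <- 6
n_features <- 5
n_comp     <- 2

# Prediction array with one column per feature.
Ypred <- array(0, dim = c(n_samples, n_features, n_comp))

# Predictions for two held-out samples, but with one feature column missing.
omit  <- c(1, 2)
Y.hat <- array(1, dim = c(length(omit), n_features - 1, 1))

# The target slice has 2 x 5 = 10 cells; the replacement supplies 2 x 4 = 8.
result <- tryCatch(
  {
    Ypred[omit, , 1] <- Y.hat[, , 1]
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(result)
```

The captured message should match the one reported above, since R refuses an array sub-assignment when the number of target cells is not a multiple of the replacement length.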

This bug was raised by two users on the discourse forum:
https://mixomics-users.discourse.group/t/pls-and-diablo-tuning/742/4


🔍 reprex results from reproducible example including sessioninfo():

library(mixOmics)

data("liver.toxicity")

# reducing number of features to reduce run time
X <- liver.toxicity$gene[, 1:1000]
Y <- liver.toxicity$clinic

# to reproduce error, we need to induce some features to have near zero variance
X[, c(1, 23, 62, 234, 789)] <- 0

pls.obg <- pls(Y, X, ncomp = 4)
#> Warning in cor(A[[k]], variates.A[[k]]): the standard deviation is zero
pls.perf.obj <- perf(pls.obg, validation = "Mfold", folds = 4,
                     progressBar = FALSE,
                     nrepeat = 3)
#> Error in Ypred[omit, , h] <- Y.hat[, , 1]: number of items to replace is not a multiple of replacement length

Created on 2022-03-28 by the reprex package (v2.0.1)

Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value
#>  version  R version 4.1.2 Patched (2021-11-16 r81220)
#>  os       Windows 10 x64 (build 19044)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_Australia.1252
#>  ctype    English_Australia.1252
#>  tz       Australia/Sydney
#>  date     2022-03-28
#>  pandoc   2.14.2 @ C:/Users/Work/AppData/Local/Pandoc/ (via rmarkdown)
#> 
#> - Packages -------------------------------------------------------------------
#>  package      * version date (UTC) lib source
#>  assertthat     0.2.1   2019-03-21 [1] CRAN (R 4.1.3)
#>  BiocParallel   1.28.3  2021-12-09 [1] Bioconductor
#>  cli            3.2.0   2022-02-14 [1] CRAN (R 4.1.2)
#>  colorspace     2.0-3   2022-02-21 [1] CRAN (R 4.1.2)
#>  corpcor        1.6.10  2021-09-16 [1] CRAN (R 4.1.1)
#>  crayon         1.5.0   2022-02-14 [1] CRAN (R 4.1.2)
#>  DBI            1.1.2   2021-12-20 [1] CRAN (R 4.1.3)
#>  digest         0.6.29  2021-12-01 [1] CRAN (R 4.1.2)
#>  dplyr          1.0.8   2022-02-08 [1] CRAN (R 4.1.2)
#>  ellipse        0.4.2   2020-05-27 [1] CRAN (R 4.1.2)
#>  ellipsis       0.3.2   2021-04-29 [1] CRAN (R 4.1.2)
#>  evaluate       0.15    2022-02-18 [1] CRAN (R 4.1.2)
#>  fansi          1.0.2   2022-01-14 [1] CRAN (R 4.1.2)
#>  fastmap        1.1.0   2021-01-25 [1] CRAN (R 4.1.2)
#>  fs             1.5.2   2021-12-08 [1] CRAN (R 4.1.2)
#>  generics       0.1.2   2022-01-31 [1] CRAN (R 4.1.2)
#>  ggplot2      * 3.3.5   2021-06-25 [1] CRAN (R 4.1.2)
#>  ggrepel        0.9.1   2021-01-15 [1] CRAN (R 4.1.2)
#>  glue           1.6.2   2022-02-24 [1] CRAN (R 4.1.2)
#>  gridExtra      2.3     2017-09-09 [1] CRAN (R 4.1.2)
#>  gtable         0.3.0   2019-03-25 [1] CRAN (R 4.1.2)
#>  highr          0.9     2021-04-16 [1] CRAN (R 4.1.2)
#>  htmltools      0.5.2   2021-08-25 [1] CRAN (R 4.1.2)
#>  igraph         1.2.11  2022-01-04 [1] CRAN (R 4.1.2)
#>  knitr          1.37    2021-12-16 [1] CRAN (R 4.1.2)
#>  lattice      * 0.20-45 2021-09-22 [2] CRAN (R 4.1.2)
#>  lifecycle      1.0.1   2021-09-24 [1] CRAN (R 4.1.2)
#>  magrittr       2.0.2   2022-01-26 [1] CRAN (R 4.1.2)
#>  MASS         * 7.3-54  2021-05-03 [2] CRAN (R 4.1.2)
#>  Matrix         1.3-4   2021-06-01 [2] CRAN (R 4.1.2)
#>  matrixStats    0.61.0  2021-09-17 [1] CRAN (R 4.1.2)
#>  mixOmics     * 6.18.1  2021-11-18 [1] Bioconductor (R 4.1.2)
#>  munsell        0.5.0   2018-06-12 [1] CRAN (R 4.1.2)
#>  pillar         1.7.0   2022-02-01 [1] CRAN (R 4.1.2)
#>  pkgconfig      2.0.3   2019-09-22 [1] CRAN (R 4.1.2)
#>  plyr           1.8.6   2020-03-03 [1] CRAN (R 4.1.2)
#>  purrr          0.3.4   2020-04-17 [1] CRAN (R 4.1.2)
#>  R.cache        0.15.0  2021-04-30 [1] CRAN (R 4.1.2)
#>  R.methodsS3    1.8.1   2020-08-26 [1] CRAN (R 4.1.1)
#>  R.oo           1.24.0  2020-08-26 [1] CRAN (R 4.1.1)
#>  R.utils        2.11.0  2021-09-26 [1] CRAN (R 4.1.2)
#>  R6             2.5.1   2021-08-19 [1] CRAN (R 4.1.2)
#>  rARPACK        0.11-0  2016-03-10 [1] CRAN (R 4.1.2)
#>  RColorBrewer   1.1-2   2014-12-07 [1] CRAN (R 4.1.1)
#>  Rcpp           1.0.8.2 2022-03-11 [1] CRAN (R 4.1.2)
#>  reprex         2.0.1   2021-08-05 [1] CRAN (R 4.1.2)
#>  reshape2       1.4.4   2020-04-09 [1] CRAN (R 4.1.2)
#>  rlang          1.0.2   2022-03-04 [1] CRAN (R 4.1.3)
#>  rmarkdown      2.13    2022-03-10 [1] CRAN (R 4.1.3)
#>  RSpectra       0.16-0  2019-12-01 [1] CRAN (R 4.1.2)
#>  rstudioapi     0.13    2020-11-12 [1] CRAN (R 4.1.2)
#>  scales         1.1.1   2020-05-11 [1] CRAN (R 4.1.2)
#>  sessioninfo    1.2.2   2021-12-06 [1] CRAN (R 4.1.2)
#>  stringi        1.7.6   2021-11-29 [1] CRAN (R 4.1.2)
#>  stringr        1.4.0   2019-02-10 [1] CRAN (R 4.1.2)
#>  styler         1.7.0   2022-03-13 [1] CRAN (R 4.1.2)
#>  tibble         3.1.6   2021-11-07 [1] CRAN (R 4.1.2)
#>  tidyr          1.2.0   2022-02-01 [1] CRAN (R 4.1.2)
#>  tidyselect     1.1.2   2022-02-21 [1] CRAN (R 4.1.2)
#>  utf8           1.2.2   2021-07-24 [1] CRAN (R 4.1.2)
#>  vctrs          0.3.8   2021-04-29 [1] CRAN (R 4.1.2)
#>  withr          2.5.0   2022-03-03 [1] CRAN (R 4.1.2)
#>  xfun           0.30    2022-03-02 [1] CRAN (R 4.1.2)
#>  yaml           2.3.5   2022-02-21 [1] CRAN (R 4.1.2)
#> 
#>  [1] C:/Users/Work/Documents/R/win-library/4.1
#>  [2] C:/Program Files/R/R-4.1.2patched/library
#> 
#> ------------------------------------------------------------------------------

🤔 Expected behavior:
The perf() function is built to handle nzv features. Hence, it should ideally raise a warning that some features have nzv and then run to completion.
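
As an illustration of the "warn, then proceed" behaviour described above, here is a small base R sketch. The helper drop_zero_variance() is hypothetical and not part of mixOmics, and it only flags columns whose variance is exactly zero, which is a simplification of a true near-zero-variance check.

```r
# Hypothetical helper: flag zero-variance columns, warn, and report what is kept.
drop_zero_variance <- function(mat) {
  vars    <- apply(mat, 2, stats::var)
  flagged <- which(vars == 0)
  if (length(flagged) > 0) {
    warning(length(flagged), " feature(s) have zero variance and were dropped")
  }
  list(kept = setdiff(seq_len(ncol(mat)), flagged), dropped = flagged)
}

# Toy data: 8 samples, 5 features, two of which are forced to be constant.
set.seed(1)
toy <- matrix(rnorm(40), nrow = 8, ncol = 5)
toy[, c(2, 4)] <- 0

# Capture the warning as a message so the call still returns its result.
res <- withCallingHandlers(
  drop_zero_variance(toy),
  warning = function(w) {
    message("warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(res$kept)
```

Downstream code would then work on toy[, res$kept] and could map results back to the original column positions through res$kept.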


💡 Possible solution:
The error occurs at line 542. It is likely caused by Ypred containing a column for every feature, whereas Y.hat contains columns only for the non-nzv features. Hence, adjusting this line to:

Ypred[omit, nzv.Y, h] = Y.hat[, , 1]

may solve the bug.
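
The idea behind the proposed line can be sketched on toy arrays in base R. This is not the perf() source: the sizes are invented, and the index keep is a hypothetical stand-in for nzv.Y, assumed here to select exactly the columns that Y.hat contains.

```r
# Toy prediction array: 6 samples, 5 features, 2 components, initially empty.
n_samples  <- 6
n_features <- 5
Ypred <- array(NA_real_, dim = c(n_samples, n_features, 2))

# Held-out samples for the current fold.
omit <- c(1, 2)

# Assumption: feature 3 was removed as nzv, so only these columns were predicted.
keep  <- c(1, 2, 4, 5)
Y.hat <- array(1, dim = c(length(omit), length(keep), 1))

# Indexing the target by the retained columns makes both sides 2 x 4.
Ypred[omit, keep, 1] <- Y.hat[, , 1]

print(Ypred[omit, , 1])
```

With the column index in place the assignment succeeds, and the column for the removed feature is simply left as NA for the held-out rows.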

Max-Bladen added the bug label Mar 27, 2022
Max-Bladen self-assigned this Mar 27, 2022
Max-Bladen added a commit that referenced this issue Mar 27, 2022
Max-Bladen linked a pull request Mar 27, 2022 that will close this issue
Max-Bladen added the rapid-review and ready-to-review labels Aug 1, 2022
Max-Bladen added a commit that referenced this issue Aug 9, 2022
Max-Bladen added a commit that referenced this issue Sep 19, 2022
fix: resolved issue preventing `perf()` from properly accounting for near-zero-variance features
Max-Bladen removed the rapid-review and ready-to-review labels Sep 22, 2022