Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Devtools::check() working on Windows! (+ add / fix some tensor stuff) #314

Merged
merged 17 commits into from
Sep 30, 2024

Conversation

brendanlu
Copy link
Contributor

Hey, just some stuff on new tensor functions here.

  • Added tplsda function and unit tests
  • Added some additional unit tests for existing tensor functions
  • Fix up my terrible docstrings that were causing the constellation of roxygen warnings :)
  • Generate a bunch of .Rd files (they are very rough at the moment, will have to update them and actually write useful docstrings later)
  • Added magick to the renv.lock, I saw you added this to DESCRIPTION so I installed locally and sync'ed the lockfile

Local testing and building status

Test status is unchanged. Just more passing tests as more unit tests have been introduced. Might open a separate PR in the near future to try and refactor some of the parallel code so that it runs, although keen to get your opinion!

Thanks for adding magick into DESCRIPTION! With this, and installing pandoc locally into PATH, devtools::check can move forward a bit more now on Windows before crashing again...

devtools::check() error:

Quitting from lines 914-939 [04-spls2-tuning] (vignette.Rmd)
Error: processing vignette 'vignette.Rmd' failed with diagnostics:
BiocParallel errors
 8 remote errors, element index: 1, 11, 21, 31, 41, 51, ...
 72 unevaluated and other errors
 first remote error:
Error in spls(X = X, Y = Y, keepX = c(choice.keepX, keepX), keepY = c(choice.keepY, : could not find function "spls"

Which is complaining about the following example code in the vignette:

# This code may take several min to run, parallelisation option is possible
list.keepX <- c(seq(5, 50, 5))
list.keepY <- c(3:10)

# added this to avoid errors where num_workers exceeded limits set by devtools::check()
    chk <- Sys.getenv("_R_CHECK_LIMIT_CORES_", "")
    if (nzchar(chk) && chk == "TRUE") {
      # use 2 cores in CRAN/Travis/AppVeyor
      num_workers <- 2L
    } else {
      # use all cores in devtools::test()
      num_workers <- parallel::detectCores()
    }

set.seed(33)  # For reproducibility with this handbook, remove otherwise
tune.spls.liver <- tune.spls(X, Y, test.keepX = list.keepX, 
                             test.keepY = list.keepY, ncomp = 2, 
                             nrepeat = 1, folds = 10, mode = 'regression', 
                             measure = 'cor', 
                            #   the following uses two CPUs for faster computation
                            # it can be commented out
                            BPPARAM = BiocParallel::SnowParam(workers = num_workers)
                            )

plot(tune.spls.liver)

When I briefly checked the test failures, the error seemed to be related to the fact that the workers in SnowParam are unable to find the installed mixOmics library (because I think devtools just does some partial import when we invoke the various commands) and as a result the workers just crash.

Surely this must be a common testing / documenting problem for libraries that have parallel code, so I'm hoping to have a look sometime in the future and maybe there's a somewhat straightforward fix.

I'm also keen to refactor as much parallel code as possible using BiocParallel...
For instance in perf.diablo.R there's still a cluster being created manually (which is also causing a test failure, although I think it's for the same reason as SnowParam() in the biocparallel code)

    if (parallel) {
      if (progressBar ==  TRUE) {
        message("progressBar unavailable in parallel computing mode for perf.block.splsda ...")
      }
      
      ## for some reason, FORK freezes when auc == TRUE
      if (.onUnix()) {
        cl <- makeForkCluster(cpus)
      } else {
        cl <- makePSOCKcluster(cpus)
      }
      
      on.exit(stopCluster(cl))
      clusterEvalQ(cl, library(mixOmics))
      clusterExport(cl, ls(), environment())
      if (!is.null(list(...)$seed)) { ## for unit tests purpose
        RNGversion(.mixo_rng()) ## bc different R versions can have different RNGs and we want reproducible unit tests
        clusterSetRNGStream(cl = cl, iseed = list(...)$seed)
      }
      repeat_cv_perf.diablo_res <- parLapply(cl, nrep_list, function(nrep) repeat_cv_perf.diablo(nrep))
    } 

which I am hoping can be rewritten using BiocParallel and snowparam() and multicoreparam() but probably need to sort out this BiocParallel issue first!

@brendanlu
Copy link
Contributor Author

UPDATE: devtools::check() can pass on Windows now!!! Turns out it was an easy fix

For the parallel workers, I think the partial import that devtools does during devtools::test() and devtools::check() is not good enough as they cannot find it in .libPaths() is my guess, so one has to explicitly run devtools::install() first so that there is a copy of mixOmics the workers can find. This way, all the tests that were previously failing now pass too. I feel rather silly because all the tests were passing locally on my computer a few weeks ago when I was working on this, and then suddenly stopped working, this must be why.

In the Github tests, is there a way to make sure devtools::install() (or something equivalent) is automatically invoked before any testing / checking? Let me know if this is possible and if it fixes any of the failures!

Sorry my terrible function documenting is producing a chain of warnings, but heres the output from the completed devtools::check()!

── R CMD check results ───────────────────────────────────────────────────────────────────── mixOmics 6.25.1 ────
Duration: 10m 36s

❯ checking Rd \usage sections ... WARNING
  Undocumented arguments in Rd file 'facewise_product.Rd'
    'a' 'b'

  Undocumented arguments in Rd file 'matrix_to_m_transforms.Rd'
    'm_mat'
  Documented arguments not in \usage in Rd file 'matrix_to_m_transforms.Rd':
    'mat'

  Undocumented arguments in Rd file 'tpca.Rd'
    'x' 'ncomp' 'm' 'minv' 'center' 'matrix_output' 'bpparam'

  Undocumented arguments in Rd file 'tpls.Rd'
    'x' 'y' 'ncomp' 'm' 'minv' 'mode' 'center' 'matrix_output' 'bpparam'

  Undocumented arguments in Rd file 'tplsda.Rd'
    'x' 'y' 'multilevel' 'ncomp' 'm' 'minv' 'center' 'bpparam'

  Undocumented arguments in Rd file 'tsvdm.Rd'
    'x' 'm' 'minv' 'transform' 'keep_hats' 'svals_matrix_form' 'bpparam'

  Functions with \usage entries need to have the appropriate \alias
  entries, and all their arguments documented.
  The \usage entries must correspond to syntactically valid R code.
  See chapter 'Writing R documentation files' in the 'Writing R
  Extensions' manual.

❯ checking package dependencies ... NOTE
  Package suggested but not available for checking: 'microbenchmark'

❯ checking installed package size ... NOTE
    installed size is 12.9Mb
    sub-directories of 1Mb or more:
      R      1.4Mb
      data   3.3Mb
      doc    6.6Mb

0 errors ✔ | 1 warning ✖ | 2 notes ✖

Let me know if this works for you too. If so I'll try work on refactoring some of the parallel code in the future when I have time, and once we are sure all the old tests pass as intended on all environments.

@brendanlu brendanlu changed the title Some updates on tensor stuff Devtools::check() working on Windows! (+ add / fix some tensor stuff) Sep 29, 2024
@evaham1 evaham1 merged commit df7d5ea into mixOmicsTeam:github-actions-fix Sep 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants