Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up quantile_ensemble_flex by forming model$A with one sparseMatrix call and specially treating tau_group runs #8

Merged
merged 8 commits into from
Jul 30, 2021

Conversation

brookslogan
Copy link

This PR builds on #6. It provides two extra efficiency gains for quantile_ensemble_flex:

  • Speed up model$A formation even further by building up parts of i, j, and x vectors to feed into sparseMatrix beforehand, then calling sparseMatrix only once. This avoids extra checks and conversions between Csparse and Tsparse matrices. An Rsparse-inspired approach might be marginally faster, but harder to read.
  • Improve performance when tau_groups contains runs (consecutive duplicated values): eliminates "duplicate" optimization variables, and when nonneg is TRUE and q0 doesn't contain crossings, eliminates noncrossing constraints within these runs that appear unnecessary. It may be possible to have a competitive or faster alternative to the current quantile_ensemble_stand by delegating to / postprocessing this version of quantile_ensemble_flex.

Additionally, this adds some characterization tests for quantile_ensemble using testthat.

Exploratory testing & benchmarking code:

benchmark-flex-matrix-format.md

Selected benchmark results for a 400 x 20 x 7 q matrix with three tau groups, no crossings within qarr, and default parameters:

Unit: milliseconds
                            expr       min        lq      mean    median
                        prealloc 2541.4974 2548.8273 2562.0451 2555.2931
                           rbind  863.5922  866.2385  875.5932  870.2080
         rbind_listTsparse_parts  186.2303  189.6228  197.0454  191.7732
 rbind_listTsparse_parts_rle_tau  143.2163  145.7374  147.7879  148.1336
          rbind_Rsparse_pairwise  191.3534  194.3831  195.8514  195.7886
      rbind_listRsparse_pairwise  183.8296  185.3404  187.0080  186.4660
        rbind_listRsparse_blocks  181.9815  183.4725  196.5011  184.8695
         rbind_modRsparse_blocks  181.3435  183.5542  184.4445  184.2028
             Rsparse_gurobi_only  170.1477  172.4180  173.4431  173.0580
        uq       max neval
 2563.6260 2657.1050    20
  873.0159  979.1753    20
  193.5336  304.1660    20
  149.4263  152.1581    20
  197.3839  199.1315    20
  188.7764  191.4906    20
  186.8797  300.5560    20
  185.4671  188.5675    20
  174.2836  176.3394    20

elray1 and others added 4 commits February 17, 2021 16:12
Keep track of the i,j,x parts using lists of vectors rather than Matrix objects,
then call sparseMatrix only once, in order to reduce the amount of checking
performed, and to prevent unintended conversions between Tsparse and Csparse
forms.
When `tau_groups` has runs (duplicate values that appear consecutively), use
only `p` variables to represent the coefficients within each run. Remove
noncrossing constraints within the runs when one criterion (nonneg is TRUE and
q0 doesn't contain crossings) indicates that removing these contraints should
not affect the output.

This approach may make quantile_ensemble_flex's `model` look the same or similar
to `quantile_ensemble_stand`'s, and `quantile_ensemble_stand` might be sped up
by delegating to / post-processing `quantile_ensemble_flex` to take advantage of
the A matrix formation changes.
@ryantibs
Copy link
Owner

@lcbrooks This looks wonderful. Great job and thank you!

One thing: please add yourself as a package author!

Another thing: I forget what's the easiest way for me to see the results of the tests. I can see the code you've added but can you remind what's the easiest way for me to look at the test outputs (the checks that the old code and the new code match up)? Thanks.

lcbrooks added 2 commits July 23, 2021 19:42
Update `sparseMatrix` call to use `repr = "T"` rather than deprecated equivalent
`giveCsparse=FALSE`.  Also format adjacent line.
@brookslogan
Copy link
Author

Thanks!

I updated the DESCRIPTION above (is "aut" the right "role"?).

For running tests, use devtools::test(), Ctrl/Cmd-Shift-T in RStudio, or C-c C-w C-t in Emacs with ESS. Ideally, GitHub Actions would run these tests on everything automatically as well, but I'm haven't worked with these before... maybe usethis::use_github_action_check_standard() or usethis::use_github_action_check_full() would take care of the setup?

@ryantibs
Copy link
Owner

Yep "aut" is the right role.

I just ran devtools::test() and get:

> devtools::test()
Loading quantgen
Testing quantgen
✔ |  OK F W S | Context
✖ |   2 1     | characterization-quantile_ensemble [0.5 s]                      
────────────────────────────────────────────────────────────────────────────────
Error (test-characterization-quantile_ensemble.R:77:3): quantile_ensemble's A matrix (for Rgplk) encodes identical entries to previous version and that the output is equal
Error: unused argument (repr = "T")
Backtrace:
 1. withr::with_rng_version(...) test-characterization-quantile_ensemble.R:77:2
 4. quantgen::quantile_ensemble(...) test-characterization-quantile_ensemble.R:138:10
 5. quantgen::quantile_ensemble_flex(...) /Users/ryantibshirani/Dropbox/quantgen/R-package/quantgen/R/quantile_ensemble.R:134:4
────────────────────────────────────────────────────────────────────────────────

══ Results ═════════════════════════════════════════════════════════════════════
Duration: 0.6 s

[ FAIL 1 | WARN 0 | SKIP 0 | PASS 2 ]

I'm not sure if the only error is due to the last commit (use of repr = "T" ... by the way did you really mean to put quotations as in "T" here? Just checking because T, without quotations, is encoded as TRUE in R.). Please fix the errors? Thanks!

lcbrooks added 2 commits July 26, 2021 11:22
Context:

In the `Matrix` package v1.3-0, `sparseMatrix` was
[updated](https://cran.r-project.org/web/packages/Matrix/news.html) with a new
argument `repr` and deprecation of `giveCsparse=FALSE`.  Commit
9f8fa00 avoided deprecation warnings when using
`Matrix` package versions >= 1.3-0, but generated errors with versions < 1.3-0.

Action:

Avoid both errors for earlier versions and deprecation warnings for later
versions by constructing Tsparse matrices by directly constructing an
appropriate S4 object.
@brookslogan brookslogan force-pushed the ensemble-speed-up-A-matrix branch from 0174dac to 42e269c Compare July 26, 2021 16:22
@brookslogan
Copy link
Author

Thanks for catching this! It should be fixed now. See the commit message here / above for context. (I did mean "T" rather than T, but trying to use the argument repr at all broke things when using Matrix package versions prior to 1.3-0.)

@ryantibs
Copy link
Owner

Sorry for the delay here. I'm happy to merge this now (reran tests and all 87 passed!).

@elray1 You might consider running this for your use case now and see how fast it is? (No pressure either way, but just curious to see if it's competitive with your Python code now.)

@ryantibs ryantibs merged commit 99e9d7b into ryantibs:master Jul 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants