Speed up `quantile_ensemble_flex` by forming `model$A` with one `sparseMatrix` call and specially treating `tau_group` runs #8

brookslogan · 2021-07-21T12:14:14Z

This PR builds on #6. It provides two extra efficiency gains for quantile_ensemble_flex:

Speed up model$A formation even further by building up parts of i, j, and x vectors to feed into sparseMatrix beforehand, then calling sparseMatrix only once. This avoids extra checks and conversions between Csparse and Tsparse matrices. An Rsparse-inspired approach might be marginally faster, but harder to read.
Improve performance when tau_groups contains runs (consecutive duplicated values): eliminates "duplicate" optimization variables, and when nonneg is TRUE and q0 doesn't contain crossings, eliminates noncrossing constraints within these runs that appear unnecessary. It may be possible to have a competitive or faster alternative to the current quantile_ensemble_stand by delegating to / postprocessing this version of quantile_ensemble_flex.

Additionally, this adds some characterization tests for quantile_ensemble using testthat.

Exploratory testing & benchmarking code:

Selected benchmark results for a 400 x 20 x 7 q matrix with three tau groups, no crossings within qarr, and default parameters:

Unit: milliseconds
                            expr       min        lq      mean    median
                        prealloc 2541.4974 2548.8273 2562.0451 2555.2931
                           rbind  863.5922  866.2385  875.5932  870.2080
         rbind_listTsparse_parts  186.2303  189.6228  197.0454  191.7732
 rbind_listTsparse_parts_rle_tau  143.2163  145.7374  147.7879  148.1336
          rbind_Rsparse_pairwise  191.3534  194.3831  195.8514  195.7886
      rbind_listRsparse_pairwise  183.8296  185.3404  187.0080  186.4660
        rbind_listRsparse_blocks  181.9815  183.4725  196.5011  184.8695
         rbind_modRsparse_blocks  181.3435  183.5542  184.4445  184.2028
             Rsparse_gurobi_only  170.1477  172.4180  173.4431  173.0580
        uq       max neval
 2563.6260 2657.1050    20
  873.0159  979.1753    20
  193.5336  304.1660    20
  149.4263  152.1581    20
  197.3839  199.1315    20
  188.7764  191.4906    20
  186.8797  300.5560    20
  185.4671  188.5675    20
  174.2836  176.3394    20

Keep track of the i,j,x parts using lists of vectors rather than Matrix objects, then call sparseMatrix only once, in order to reduce the amount of checking performed, and to prevent unintended conversions between Tsparse and Csparse forms.

When `tau_groups` has runs (duplicate values that appear consecutively), use only `p` variables to represent the coefficients within each run. Remove noncrossing constraints within the runs when one criterion (nonneg is TRUE and q0 doesn't contain crossings) indicates that removing these contraints should not affect the output. This approach may make quantile_ensemble_flex's `model` look the same or similar to `quantile_ensemble_stand`'s, and `quantile_ensemble_stand` might be sped up by delegating to / post-processing `quantile_ensemble_flex` to take advantage of the A matrix formation changes.

ryantibs · 2021-07-23T01:13:46Z

@lcbrooks This looks wonderful. Great job and thank you!

One thing: please add yourself as a package author!

Another thing: I forget what's the easiest way for me to see the results of the tests. I can see the code you've added but can you remind what's the easiest way for me to look at the test outputs (the checks that the old code and the new code match up)? Thanks.

Update `sparseMatrix` call to use `repr = "T"` rather than deprecated equivalent `giveCsparse=FALSE`. Also format adjacent line.

brookslogan · 2021-07-24T01:51:59Z

Thanks!

I updated the DESCRIPTION above (is "aut" the right "role"?).

For running tests, use devtools::test(), Ctrl/Cmd-Shift-T in RStudio, or C-c C-w C-t in Emacs with ESS. Ideally, GitHub Actions would run these tests on everything automatically as well, but I'm haven't worked with these before... maybe usethis::use_github_action_check_standard() or usethis::use_github_action_check_full() would take care of the setup?

ryantibs · 2021-07-25T04:17:42Z

Yep "aut" is the right role.

I just ran devtools::test() and get:

> devtools::test()
Loading quantgen
Testing quantgen
✔ |  OK F W S | Context
✖ |   2 1     | characterization-quantile_ensemble [0.5 s]                      
────────────────────────────────────────────────────────────────────────────────
Error (test-characterization-quantile_ensemble.R:77:3): quantile_ensemble's A matrix (for Rgplk) encodes identical entries to previous version and that the output is equal
Error: unused argument (repr = "T")
Backtrace:
 1. withr::with_rng_version(...) test-characterization-quantile_ensemble.R:77:2
 4. quantgen::quantile_ensemble(...) test-characterization-quantile_ensemble.R:138:10
 5. quantgen::quantile_ensemble_flex(...) /Users/ryantibshirani/Dropbox/quantgen/R-package/quantgen/R/quantile_ensemble.R:134:4
────────────────────────────────────────────────────────────────────────────────

══ Results ═════════════════════════════════════════════════════════════════════
Duration: 0.6 s

[ FAIL 1 | WARN 0 | SKIP 0 | PASS 2 ]

I'm not sure if the only error is due to the last commit (use of repr = "T" ... by the way did you really mean to put quotations as in "T" here? Just checking because T, without quotations, is encoded as TRUE in R.). Please fix the errors? Thanks!

Context: In the `Matrix` package v1.3-0, `sparseMatrix` was [updated](https://cran.r-project.org/web/packages/Matrix/news.html) with a new argument `repr` and deprecation of `giveCsparse=FALSE`. Commit 9f8fa00 avoided deprecation warnings when using `Matrix` package versions >= 1.3-0, but generated errors with versions < 1.3-0. Action: Avoid both errors for earlier versions and deprecation warnings for later versions by constructing Tsparse matrices by directly constructing an appropriate S4 object.

brookslogan · 2021-07-26T16:30:59Z

Thanks for catching this! It should be fixed now. See the commit message here / above for context. (I did mean "T" rather than T, but trying to use the argument repr at all broke things when using Matrix package versions prior to 1.3-0.)

ryantibs · 2021-07-30T13:59:30Z

Sorry for the delay here. I'm happy to merge this now (reran tests and all 87 passed!).

@elray1 You might consider running this for your use case now and see how fast it is? (No pressure either way, but just curious to see if it's competitive with your Python code now.)

elray1 and others added 4 commits February 17, 2021 16:12

build A matrix for quantile_ensemble_flex by combining small pieces

0845ac4

Add quantile_ensemble characterization tests

d5ce940

This was referenced Jul 21, 2021

Eliminate duplication between quantile_ensemble_stand and quantile_ensemble_flex #9

Open

Benchmark and maybe speed up quantile regression LP matrix formation #10

Open

lcbrooks added 2 commits July 23, 2021 19:42

Add Logan Brooks to Authors@R

7122569

Use repr = "T" instead of deprecated giveCsparse=FALSE

9f8fa00

Update `sparseMatrix` call to use `repr = "T"` rather than deprecated equivalent `giveCsparse=FALSE`. Also format adjacent line.

lcbrooks added 2 commits July 26, 2021 11:22

Add checks on tau in quantile_ensemble_flex

42e269c

brookslogan force-pushed the ensemble-speed-up-A-matrix branch from 0174dac to 42e269c Compare July 26, 2021 16:22

ryantibs merged commit 99e9d7b into ryantibs:master Jul 30, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speed up `quantile_ensemble_flex` by forming `model$A` with one `sparseMatrix` call and specially treating `tau_group` runs #8

Speed up `quantile_ensemble_flex` by forming `model$A` with one `sparseMatrix` call and specially treating `tau_group` runs #8

brookslogan commented Jul 21, 2021

ryantibs commented Jul 23, 2021

brookslogan commented Jul 24, 2021

ryantibs commented Jul 25, 2021

brookslogan commented Jul 26, 2021

ryantibs commented Jul 30, 2021

Speed up quantile_ensemble_flex by forming model$A with one sparseMatrix call and specially treating tau_group runs #8

Speed up quantile_ensemble_flex by forming model$A with one sparseMatrix call and specially treating tau_group runs #8

Conversation

brookslogan commented Jul 21, 2021

ryantibs commented Jul 23, 2021

brookslogan commented Jul 24, 2021

ryantibs commented Jul 25, 2021

brookslogan commented Jul 26, 2021

ryantibs commented Jul 30, 2021

Speed up `quantile_ensemble_flex` by forming `model$A` with one `sparseMatrix` call and specially treating `tau_group` runs #8

Speed up `quantile_ensemble_flex` by forming `model$A` with one `sparseMatrix` call and specially treating `tau_group` runs #8