Commit

[cicd] test adding documents
immanuelazn committed Oct 28, 2024
1 parent ec9fa0e commit d378044
Showing 3 changed files with 34 additions and 1 deletion.
1 change: 1 addition & 0 deletions r/pkgdown/_pkgdown.yml
@@ -50,6 +50,7 @@ articles:
- "web-only/programming-efficiency"
- "web-only/programming-philosophy"
- "web-only/developer-notes"
- "web-only/test_extra_document"

# cosmo, flatly, united, sandstone all look reasonable
# pulse, lumen, zephyr
2 changes: 1 addition & 1 deletion r/vignettes/web-only/benchmarks.Rmd
@@ -10,7 +10,7 @@ output:
Note: these performance benchmarks are preliminary while our manuscript is in
preparation, though every effort has been made to present a fair comparison with
other tools. Fortunately, it is straightforward to install and test BPCells on
your own dataset to replicate our claims.
your own dataset to replicate our claims. testing.

## RNA-seq normalization + PCA
Because BPCells can perform all operations streaming from disk, it is able to
32 changes: 32 additions & 0 deletions r/vignettes/web-only/test_extra_document.Rmd
@@ -0,0 +1,32 @@
---
title: "Test extra doc"
output:
html_document:
toc: true
toc_depth: 2
toc_float: true
theme: simplex
---

## Normalizations and PCA
- Avoid dense matrices whenever possible. Put normalizations
that preserve sparsity (0 values stay 0) before normalizations
that break sparsity (e.g. adding values to each row/column).
A typical RNA-seq matrix has <5% non-zero entries, so a dense
representation forces your code to operate on over 20x more entries.

- For most operations, we recommend using lazy evaluation to avoid
creating intermediate matrices. The one common exception to this
rule is when running PCA. Because PCA requires looping through the
matrix several hundred times, it is often faster to write the matrix to
disk once just before PCA rather than recalculating the entries on each
PCA iteration.
  - For storage efficiency, keep any sparsity-breaking normalizations
  delayed, but store all the sparse normalizations in a temporary
  location with `write_matrix_dir()`, then apply the sparsity-breaking
  normalizations.

- Adding values to the rows/columns of a matrix adds very little overhead to
PCA, because it translates into a pre- or post-processing step around each
matrix-vector multiply iteration. However, as a sparsity-breaking operation,
adding a vector to the matrix makes most other operations more expensive.
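Taken together, the notes above amount to a small pipeline: sparse normalizations first, one write to disk just before PCA, sparsity-breaking steps left delayed. A minimal sketch, with the caveat that `mat`, the `10000` scaling factor, `k = 50`, and the temporary directory name are illustrative assumptions, and that `multiply_cols()` and `svds()` should be checked against the BPCells reference docs for exact signatures:

```r
library(BPCells)

# Assumed input: `mat` is a raw counts matrix (genes x cells) already opened
# with BPCells; the object name is illustrative.

# 1. Sparsity-preserving normalizations (0 stays 0): per-cell scaling + log1p.
mat_norm <- multiply_cols(mat, 1 / colSums(mat))
mat_norm <- log1p(mat_norm * 10000)

# 2. Materialize the sparse normalizations once before PCA, since PCA streams
#    over the matrix several hundred times.
mat_norm <- write_matrix_dir(mat_norm, tempfile("mat_norm"))

# 3. Leave any sparsity-breaking step (e.g. centering) delayed rather than
#    stored, then run PCA via truncated SVD.
svd_res <- svds(mat_norm, k = 50)
```

The key design point is step 2: the lazy normalization queue is collapsed to disk exactly once, so each of the hundreds of matrix-vector multiplies inside the SVD reads precomputed entries instead of re-evaluating the normalization chain.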
