Commit

[cicd] test adding documents
immanuelazn committed Oct 28, 2024
1 parent ec9fa0e commit d378044
Showing 3 changed files with 34 additions and 1 deletion.
1 change: 1 addition & 0 deletions r/pkgdown/_pkgdown.yml
@@ -50,6 +50,7 @@ articles:
- "web-only/programming-efficiency"
- "web-only/programming-philosophy"
- "web-only/developer-notes"
- "web-only/test_extra_document"

# cosmo, flatly, united, sandstone all look reasonable
# pulse, lumen, zephyr
2 changes: 1 addition & 1 deletion r/vignettes/web-only/benchmarks.Rmd
@@ -10,7 +10,7 @@ output:
Note: these performance benchmarks are preliminary while our manuscript is in
preparation, though every effort has been made to present a fair comparison with
other tools. Fortunately, it is straightforward to install and test BPCells on
your own dataset to replicate our claims.
your own dataset to replicate our claims. testing.

## RNA-seq normalization + PCA
Because BPCells can perform all operations streaming from disk, it is able to
32 changes: 32 additions & 0 deletions r/vignettes/web-only/test_extra_document.Rmd
@@ -0,0 +1,32 @@
---
title: "Test extra doc"
output:
html_document:
toc: true
toc_depth: 2
toc_float: true
theme: simplex
---

## Normalizations and PCA
- Avoid dense matrices whenever possible. Put normalizations
that preserve sparsity (0 values stay 0) before normalizations
that break sparsity (e.g. adding values to each row/column).
A typical RNA-seq matrix has <5% non-zero entries, so a dense
representation forces your code to operate on over 20x more entries.

- For most operations, we recommend using lazy evaluation to avoid
creating intermediate matrices. The one common exception to this
rule is when running PCA. Because PCA requires looping through the
matrix several hundred times, it is often faster to write the matrix to
disk once just before PCA rather than recalculating the entries on each
PCA iteration.
  - For storage efficiency, keep any sparsity-breaking normalizations
  delayed, but store all the sparse normalizations in a temporary
  location with `write_matrix_dir()`, then apply the sparsity-breaking
  normalizations.

- Adding values to the rows/columns of a matrix adds very little overhead to
PCA, because it translates into a pre- or post-processing step around each
matrix-vector multiply iteration. However, as a sparsity-breaking operation,
adding a vector to the matrix makes most other operations more expensive.
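Taken together, the notes above amount to a small pipeline: sparse normalizations first, one write to disk just before PCA, sparsity-breaking steps left delayed. A minimal sketch, with the caveat that `mat`, the `10000` scaling factor, `k = 50`, and the temporary directory name are illustrative assumptions, and that `multiply_cols()` and `svds()` should be checked against the BPCells reference docs for exact signatures:

```r
library(BPCells)

# Assumed input: `mat` is a raw counts matrix (genes x cells) already opened
# with BPCells; the object name is illustrative.

# 1. Sparsity-preserving normalizations (0 stays 0): per-cell scaling + log1p.
mat_norm <- multiply_cols(mat, 1 / colSums(mat))
mat_norm <- log1p(mat_norm * 10000)

# 2. Materialize the sparse normalizations once before PCA, since PCA streams
#    over the matrix several hundred times.
mat_norm <- write_matrix_dir(mat_norm, tempfile("mat_norm"))

# 3. Leave any sparsity-breaking step (e.g. centering) delayed rather than
#    stored, then run PCA via truncated SVD.
svd_res <- svds(mat_norm, k = 50)
```

The key design point is step 2: the lazy normalization queue is collapsed to disk exactly once, so each of the hundreds of matrix-vector multiplies inside the SVD reads precomputed entries instead of re-evaluating the normalization chain.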
