[r] Add iterator classes #1274
Codecov Report
Patch coverage has no change; project coverage change: -0.33%.
Additional details and impacted files:
@@            Coverage Diff             @@
##             main    #1274      +/-   ##
==========================================
- Coverage   52.13%   51.81%   -0.33%
==========================================
  Files          68       73       +5
  Lines        5726     5690      -36
==========================================
- Hits         2985     2948      -37
- Misses       2741     2742       +1
@pablo-gar can you talk me through the need for arithmetic ops on matrixZeroBasedView? I had a rationale for supporting only essential operations on it, namely that this forces the user to explicitly use I'm open to the possibilities that (i) the zero-based arithmetic ops are essential, or (ii) I should give up on that rationale -- I would just like to hear more to either end.
@pablo-gar: On `git checkout pablo-gar/r_read_iterators; ./cleanup; R CMD INSTALL .` I hit an immediate speed bump:
installing to /usr/local/lib/R/site-library/00LOCK-r/00new/tiledbsoma/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
Error : EphemeralCollection.Rd:7: undefined exports: DenseReadIter
ERROR: installing Rd objects failed for package ‘tiledbsoma’
* removing ‘/usr/local/lib/R/site-library/tiledbsoma’
* restoring previous ‘/usr/local/lib/R/site-library/tiledbsoma’
Warning message:
In install.packages(pkgs = f, lib = lib, repos = if (isMatchingFile(f)) NULL else repos) :
installation of package ‘.’ had non-zero exit status
edd@rob:~/git/tiledb-soma/apis/r(pablo-gar/r_read_iterators)$
Also:
edd@rob:~/git/tiledb-soma/apis/r(pablo-gar/r_read_iterators)$ ag -l DenseReadIter
man/DenseReadIter.Rd
NAMESPACE
edd@rob:~/git/tiledb-soma/apis/r(pablo-gar/r_read_iterators)$
Did you forget to add a file? I see this in your TODO list, so maybe your use in documentation 'got ahead'?
Easy enough to fix via
edd@rob:~/git/tiledb-soma/apis/r(pablo-gar/r_read_iterators)$ git diff
diff --git a/apis/r/NAMESPACE b/apis/r/NAMESPACE
index 76adcc03..9dfd9b77 100644
--- a/apis/r/NAMESPACE
+++ b/apis/r/NAMESPACE
@@ -22,7 +22,7 @@ S3method(write_soma,TsparseMatrix)
S3method(write_soma,data.frame)
S3method(write_soma,matrix)
export(ConfigList)
-export(DenseReadIter)
+#export(DenseReadIter)
export(EphemeralCollection)
export(EphemeralExperiment)
export(EphemeralMeasurement)
edd@rob:~/git/tiledb-soma/apis/r(pablo-gar/r_read_iterators)$
It's a large PR, and it currently borks tests so no way to approve it yet -- but I like it!
I am wondering if we should simplify and reduce its scope a little and take e.g. the changes to @mlin's zero-based view out into a different PR?
The addition of the iterator looks good (in principle) to me and does deliver what it set out to do.
apis/r/R/SparseReadIter.R
Outdated
# Get max soma dims for indices via tiledb
tiledb_array <- tiledb::tiledb_array(uri)
tiledb::tiledb_array_open(tiledb_array, type = "READ")
max_soma_dim_0 <- as.integer(max(tiledb::tiledb_array_get_non_empty_domain_from_index(tiledb_array, 1)))
Assert that this fits in a 32-bit integer, AND
use max_soma_dim_0 - min_soma_dim_0 as the worst-case cardinality.
obsolete comment
Yes, apologies! It was the SOMADenseArray iterator that I did not include in the commit, which we will not implement in any case. Will fix!
Not all of them are essential; only addition is essential, to be able to seamlessly concatenate chunks of the iterator. A user can quickly do TileDB-SOMA/apis/r/R/SparseReadIter.R Line 62 in 4534226
I can reduce the scope to only support iteration, but I think it leads to a poor user experience to not allow for the others given the relatively easy lift.
apis/r/R/SOMASparseNDArray.R
Outdated
dims <- self$dimensions()
attr <- self$attributes()
shape <- self$shape()

stopifnot("'repr' must be a sinlge character string" = length(repr) == 1 | mode(repr) == "character",
Typo: 'single'
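For reference, a minimal sketch of the check with the typo fixed. It assumes the intent was that both conditions must hold, so it uses `&&` where the snippet above uses `|`. `validate_repr` is a hypothetical helper name and `"T"` an illustrative value; neither is taken from the PR.

```r
# Hypothetical helper (not in the PR): validate a 'repr' argument.
# Assumption: 'repr' must be a character vector AND have length one.
validate_repr <- function(repr) {
  stopifnot(
    "'repr' must be a single character string" =
      is.character(repr) && length(repr) == 1
  )
  invisible(repr)
}

validate_repr("T")  # passes silently

# A length-two vector is rejected; capture the outcome instead of stopping.
rejected <- inherits(try(validate_repr(c("T", "C")), silent = TRUE), "try-error")
```

With `|`, a length-two character vector would pass the check because the second condition alone is true.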
#' @description Concatenate remainder of iterator
# to be refined in derived classes
concat = function() {
Sorry for the very belated review -- @aaronwolen asked me to help review, and I'm happy to -- I believe my review is speaking for the both of us.
There are definitely multiple ways to go here. However, given that on the Python side we have a single `read`, options of `.tables`, etc., and `concat` -- e.g. `sdf.read().tables().concat()` -- we should do the same here. It won't be ideal for everyone but it will work, it will parallel the Python implementation, and we can add some keystroke-saving syntactic-sugar functions on top of these later if we want.
Let's get this `concat` going, and get rid of the `iterated` argument to `read`. Then `read` will always return an iterator (low complexity), `tables()` et al. will transform an iterator of one type to an iterator of another, and `concat` will have the single job of doing concatenation.
Note that `apis/r/R/utils-readerTransformers.R` on this PR already has reader-transformers, so (I think and hope) it should be easy to connect them as methods on the iterator objects.
Likewise, `concat()` exists in non-stub form elsewhere on this PR and should be able to be connected as a method on the iterator objects.
If `read_next` remains, it needs to remain as a helper method, not as public API.
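To make the proposed shape concrete, here is a rough sketch in base R, using closures as a stand-in for the R6 classes on this PR. Only `read_next` and `concat` are names from the discussion; `new_iter`, `read_chunks`, `transform_iter` and `read_complete` are hypothetical, and plain data frames stand in for Arrow tables.

```r
# Sketch only: closure-based stand-in for the iterator classes.
# An iterator is a list of functions sharing state through their environment.
new_iter <- function(read_next, read_complete) {
  # concat() has one job: bind together everything not yet read.
  concat <- function() {
    rest <- list()
    while (!read_complete()) rest[[length(rest) + 1L]] <- read_next()
    do.call(rbind, rest)
  }
  list(read_next = read_next, read_complete = read_complete, concat = concat)
}

# Hypothetical reader: always returns an iterator over pre-made chunks.
read_chunks <- function(chunks) {
  pos <- 0L
  new_iter(
    read_next = function() {
      if (pos >= length(chunks)) return(NULL)
      pos <<- pos + 1L
      chunks[[pos]]
    },
    read_complete = function() pos >= length(chunks)
  )
}

# Hypothetical transformer: turns an iterator of one type into another.
transform_iter <- function(iter, f) {
  new_iter(
    read_next = function() {
      x <- iter$read_next()
      if (is.null(x)) NULL else f(x)
    },
    read_complete = iter$read_complete
  )
}

chunks <- list(
  data.frame(soma_joinid = 0:1, x = c(1, 2)),
  data.frame(soma_joinid = 2:3, x = c(3, 4))
)
it <- transform_iter(read_chunks(chunks), function(df) df[df$x > 1, , drop = FALSE])
first <- it$read_next()  # first chunk, transformed
rest <- it$concat()      # remainder of the iterator, concatenated
```

The chain `read -> transform -> concat` mirrors `sdf.read().tables().concat()`, and each piece keeps a single responsibility.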
Agreed with everything and I will try to get this done tonight.
> If read_next remains, it needs to remain as a helper method, not as public API.
@johnkerl I'm not sure what the suggestion here is. Unfortunately in R there's no generic `next()` function, so I'm not sure how we could give users the ability to iterate themselves unless we:
- make `read_next` a public method, OR
- create a function `next()` that internally calls `read_next`
The second one is a little less intuitive imo.
> read will always return an iterator
I'm not sure I understand the benefit of this. While R does have a package that implements iterators, it isn't widely used (or used at all in the single-cell world). Because iterators don't exist natively in R, all this does is add complexity for R users. While I understand the desire to make the R API act like the Python API, we do need to keep in mind that R is not Python and there are things that Python handles natively that just don't work as well in R.
I think the key is here:
> we can add some keystroke-saving syntactic-sugar functions on top of these later if we want
and imo it's not an "if we want" but more a "we need to". I think we can keep the low-level functionality at the R6 level as proposed and then provide wrapper functions like `as_arrow_table(soma_obj)` (see here), `as(soma_obj, "dgCMatrix")`, `as.data.frame(soma_obj)`, etc.
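A rough sketch of what such a sugar layer could look like, assuming the low-level object exposes `read()` returning an iterator with `concat()`. The `toy_soma` class and `new_toy_soma` constructor are hypothetical stand-ins, not the tiledbsoma R6 classes, and data frames stand in for Arrow tables.

```r
# Hypothetical stand-in for a SOMA object: a classed list whose read()
# returns an iterator exposing concat(). Not the actual tiledbsoma classes.
new_toy_soma <- function(chunks) {
  read <- function() {
    list(concat = function() do.call(rbind, chunks))
  }
  structure(list(read = read), class = "toy_soma")
}

# S3 method on the base generic: users who never want to see the iterator
# call as.data.frame() and get the fully concatenated result.
as.data.frame.toy_soma <- function(x, ...) {
  x$read()$concat()
}

obj <- new_toy_soma(list(data.frame(a = 1:2), data.frame(a = 3:4)))
df <- as.data.frame(obj)
```

The iterator stays available for chunked workflows, while the wrapper covers the common "give me everything" case in one call.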
@johnkerl I have addressed your comments with the exception of:
> If read_next remains, it needs to remain as a helper method, not as public API.
I also updated the usage section of the top-level comment. The branch still needs to merge main, but I wanted to get your early thoughts on this.
@pablo-gar looking this morning -- thank you! :)
@pablo-gar sorry to confuse.
This looks good to me -- thank you!! :)
Awesome! I will update the docs and merge main into the PR branch
Summary
address #1285 and #1348
Matrix::DenseMatrix
SOMASparseNDArray$shape()
Notes for reviewers
@eddelbuettel @aaronwolen Please review changes to:
- SOMADataFrame
- SOMASparseNDArray
- SOMADenseNDArray
- SOMAArrayBase
- ReadIter
- SparseReadIter
- TableReadIter
- utils-readerTransformers
@mlin please review changes to:
- SOMASparseNDArray, and in particular `$read_sparse_matrix_zero_based`.
- utils-readerTransformers, and in particular `arrow_table_to_sparse`
- utils-matrixZeroBasedView
Usage example (May 24)