scrapper #3536

Closed
10 tasks done
LTLA opened this issue Sep 10, 2024 · 30 comments
Labels
3a. accepted (will be ingested into Bioconductor daily builder for distribution), OK

Comments

@LTLA

LTLA commented Sep 10, 2024

Update the following URL to point to the GitHub repository of
the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

  • I understand that by submitting my package to Bioconductor,
    the package source and all review commentary are visible to the
    general public.

  • I have read the Bioconductor Package Submission
    instructions. My package is consistent with the Bioconductor
    Package Guidelines.

  • I understand Bioconductor Package Naming Policy and acknowledge
    Bioconductor may retain use of package name.

  • I understand that a minimum requirement for package acceptance
    is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS.
    Passing these checks does not result in automatic acceptance. The
    package will then undergo a formal review and recommendations for
    acceptance regarding other Bioconductor standards will be addressed.

  • My package addresses statistical or bioinformatic issues related
    to the analysis and comprehension of high throughput genomic data.

  • I am committed to the long-term maintenance of my package. This
    includes monitoring the support site for issues that users may
    have, subscribing to the bioc-devel mailing list to stay aware
    of developments in the Bioconductor community, responding promptly
    to requests for updates from the Core team in response to changes in
    R or underlying software.

  • I am familiar with the Bioconductor code of conduct and
    agree to abide by it.

I am familiar with the essential aspects of Bioconductor software
management, including:

  • The 'devel' branch for new packages and features.
  • The stable 'release' branch, made available every six
    months, for bug fixes.
  • Bioconductor version control using Git
    (optionally via GitHub).

For questions/help about the submission process, including questions about
the output of the automatic reports generated by the SPB (Single Package
Builder), please use the #package-submission channel of our Community Slack.
Follow the link on the home page of the Bioconductor website to sign up.

@bioc-issue-bot
Collaborator

Hi @LTLA

Thanks for submitting your package. We are taking a quick
look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: scrapper
Version: 0.99.0
Date: 2024-09-09
Authors@R: person("Aaron", "Lun", role=c("cre", "aut"), email="infinite.monkeys.with.keyboards@gmail.com")
Title: Bindings to C++ Libraries for Single-Cell Analysis
Description: 
    Implements R bindings to C++ code for analyzing single-cell (expression) data, mostly from various libscran libraries. 
    Each function performs an individual step in the single-cell analysis workflow, ranging from quality control to clustering and marker detection.
    It is mostly intended for other Bioconductor package developers to build more user-friendly end-to-end workflows.
License: MIT + file LICENSE
Imports:
    methods,
    Rcpp,
    beachmat,
    DelayedArray,
    BiocNeighbors,
    parallel
Suggests:
    testthat,
    knitr,
    rmarkdown,
    BiocStyle,
    MatrixGenerics,
    igraph,
    Matrix,
    scRNAseq
LinkingTo:
    Rcpp,
    assorthead,
    beachmat,
    BiocNeighbors
biocViews: 
    Normalization, 
    RNASeq, 
    Software, 
    GeneExpression,
    Transcriptomics, 
    SingleCell, 
    BatchEffect, 
    QualityControl,
    DifferentialExpression,
    FeatureExtraction,
    PrincipalComponent,
    Clustering
VignetteBuilder: knitr
Encoding: UTF-8
RoxygenNote: 7.3.2

@bioc-issue-bot added the "1. awaiting moderation" label (submitted and waiting clearance to access resources) on Sep 10, 2024
@lshep added the "pre-check passed" label (pre-review performed and ready to be added to git) on Sep 17, 2024
@bioc-issue-bot
Collaborator

Your package has been added to git.bioconductor.org to continue the
pre-review process. A build report will be posted shortly. Please
fix any ERROR and WARNING in the build report before a reviewer is
assigned or provide a justification on why you feel the ERROR or
WARNING should be granted an exception.

IMPORTANT: Please read this documentation for setting
up remotes to push to git.bioconductor.org. All changes should be
pushed to git.bioconductor.org moving forward. It is required to push a
version bump to git.bioconductor.org to trigger a new build report.

Bioconductor utilized your GitHub ssh-keys for git.bioconductor.org
access. To manage keys and future access you may want to activate your
Bioconductor Git Credentials Account

@bioc-issue-bot added the "pre-review" label (on bioconductor git and access to on demand build but not assigned reviewer until build report clean) and removed the "1. awaiting moderation" and "pre-check passed" labels on Sep 17, 2024
@bioc-issue-bot
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "skipped, ERROR".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder:
ERROR before build products produced.

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/scrapper to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@LTLA
Author

LTLA commented Sep 17, 2024

Do we have cmake installed on the build machines?

@lshep
Contributor

lshep commented Sep 17, 2024

We'll check on the broken cmake on nebbiolo2. Please make sure to list this system dependency in the DESCRIPTION under SystemRequirements and provide the accompanying INSTALL file. See system requirements. Having this requirement will make the package trickier for end users to install; is it needed? Also, we are transitioning builders; for now, please use the teran2 report as the one to clean up. It seems to have picked up cmake there but is failing for other reasons.

@LTLA
Author

LTLA commented Sep 17, 2024

> Please make sure to list this system dependency in the DESCRIPTION under SystemRequirements and provide the accompanying INSTALL file. See system requirements.

I've never heard of this INSTALL file before. Your link provides no information on the expected format. WRE mentions this but does not specify the format.

> Having this requirement will make the package trickier for end users to install; is it needed?

Depends on how seriously we want consistency between languages.

If not, we can ditch the entire CMake dependency and use the existing igraph R package.

Otherwise, we need this to guarantee we're using the intended version of the igraph C library with the same seed, as the R package doesn't expose the C-level seed setters.

I suppose I technically don't need CMake, as I could just extract the small parts of the igraph C library that I actually do need. This would require some modest reverse engineering. Edit: just tried and gave up, too difficult.

> Also, we are transitioning builders; for now, please use the teran2 report as the one to clean up. It seems to have picked up cmake there but is failing for other reasons.

Forgot to push an update to assorthead to BioC-devel, this should resolve soon enough.

@vjcitn
Collaborator

vjcitn commented Sep 19, 2024

We discussed this part of the configure process

trying URL 'https://github.com/igraph/igraph/releases/download/0.10.13/igraph-0.10.13.tar.gz'

in core and would prefer that igraph (C) be regarded as a system dependency. Package installation code that depends on retrievals from GitHub is not allowed.

@vjcitn
Collaborator

vjcitn commented Sep 19, 2024

I had a clean install of scrapper on a Linux box only after updating BiocNeighbors and beachmat. My laptop with ccache would not build it.

@LTLA
Author

LTLA commented Sep 19, 2024

Hmm.

I suppose we could create an Rlibigraph package in the same vein as Rhdf5lib, allowing downstream packages like scrapper to link to the static libigraph library. I have also begun working on https://github.com/LTLA/biocmake/ that would eliminate the need for a user-managed SystemRequirements: cmake in BioC packages. I probably won't have enough time to get this into the upcoming release, though.

In the meantime, I could just switch to relying purely on the igraph R package. This would not give consistent results across languages due to differences in seeds/versions, but perhaps that's not so big an issue - yet. My real concern with relying on CRAN packages is that updates could change my package's results outside of the usual release schedule; this is a contributing factor to the fragility of the book builds, for example.

In general, scrapper's development policy - all the way down to the C++ code - has been to eliminate as many dependencies as possible, especially those that cannot guarantee back-compatible results. Hence the vendoring of C++ libraries in assorthead that are ostensibly provided by other CRAN packages, or the complete rewrite of Rtsne, uwot and irlba in C++. Using the igraph R package would be a (hopefully temporary) exception to this rule.

Anyway, while it might not be relevant in the next push attempt, I'd still like to know why your ccache build failed. I semi-regularly use ccache and it hasn't been an issue.

@bioc-issue-bot
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: ca6019cb8d7f8970521329b40784ddbd28b4e850

@LTLA
Author

LTLA commented Sep 20, 2024

Did this end up building?

@vjcitn
Collaborator

vjcitn commented Sep 20, 2024

Probably not. I will try to sign on and diagnose later tonight.

@LTLA
Author

LTLA commented Sep 20, 2024

FWIW the igraph problem can be solved in the next dev cycle with:

This won't affect the current submission for scrapper but just FYI.

@vjcitn
Collaborator

vjcitn commented Sep 21, 2024

I was able to build scrapper on our provisional SPB (teran2). My hypothesis on why it seemed to hang on your push: the system had no ExperimentHub cache and hung on the scRNAseq requests. @lshep is that plausible? As an isolated user, once I authorized creation of the cache, the vignette code succeeded. It is important to be using the very latest checkouts of BiocNeighbors and beachmat, otherwise there are header-related errors. I wonder if you should use >= version specs for those imports.

@vjcitn
Collaborator

vjcitn commented Sep 21, 2024

R CMD check produced:

── Error ('test-modelGeneVariances.R:10:5'): modelGeneVariances works without blocking ──
Error in `MatrixGenerics:::.load_next_suggested_package_to_search(x)`: Failed to find a rowVars() method for dgCMatrix objects.
  However, the following packages are likely to contain the missing method
  but are not installed: sparseMatrixStats, DelayedMatrixStats.
  Please install them (with 'BiocManager::install(...)') and try again.
  Alternatively, if you know where the missing method is defined, install
  only that package.
Backtrace:
    ▆
 1. ├─testthat::expect_equal(out$variances, rowVars(x)) at test-modelGeneVariances.R:10:5
 2. │ └─testthat::quasi_label(enquo(expected), expected.label, arg = "expected")
 3. │   └─rlang::eval_bare(expr, quo_get_env(quo))
 4. ├─MatrixGenerics::rowVars(x)
 5. └─MatrixGenerics::rowVars(x)
 6.   └─MatrixGenerics:::.load_next_suggested_package_to_search(x)
── Error ('test-modelGeneVariances.R:36:9'): modelGeneVariances works with blocking ──
Error in `MatrixGenerics:::.load_next_suggested_package_to_search(x)`: Failed to find a rowVars() method for dgCMatrix objects.
  However, the following packages are likely to contain the missing method
  but are not installed: sparseMatrixStats, DelayedMatrixStats.
  Please install them (with 'BiocManager::install(...)') and try again.
  Alternatively, if you know where the missing method is defined, install
  only that package.
                                            

@vjcitn
Collaborator

vjcitn commented Sep 21, 2024

Once those were installed it passed with one note.
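
For context, here is a minimal sketch of how a check like this could avoid a hard failure when the optional packages are missing. It is not taken from scrapper's test suite: the toy matrix and the variable name `can_check_sparse` are assumptions for illustration, and the sketch relies only on `requireNamespace()`, `Matrix::rsparsematrix()` and `sparseMatrixStats::rowVars()`.

```r
# Sketch only: run the sparse rowVars() comparison when the optional
# packages are available, and skip it with a message otherwise.
can_check_sparse <- requireNamespace("Matrix", quietly = TRUE) &&
    requireNamespace("sparseMatrixStats", quietly = TRUE)

if (can_check_sparse) {
    # Random sparse matrix; rsparsematrix() returns a dgCMatrix by default.
    x <- Matrix::rsparsematrix(50, 20, density = 0.2)

    # Row variances from the sparse-aware implementation.
    ref <- sparseMatrixStats::rowVars(x)

    # Dense base-R computation used as an independent reference.
    dense <- apply(as.matrix(x), 1, var)

    stopifnot(isTRUE(all.equal(unname(ref), unname(dense))))
} else {
    message("sparseMatrixStats not installed; skipping sparse rowVars() check")
}
```

Inside a testthat file, `skip_if_not_installed("sparseMatrixStats")` would play the same role. Alternatively, the package could be listed under Suggests so that the builders install it.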

@bioc-issue-bot
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: cff86319c3c297b9cd3048f223e0cb3b076ce9b1

@bioc-issue-bot
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

Congratulations! The package built without errors or warnings
on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder:
Linux (Ubuntu 22.04.3 LTS): scrapper_0.99.2.tar.gz
Linux (Ubuntu 24.04.1 LTS): scrapper_0.99.2.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/scrapper to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@bioc-issue-bot added the "OK" label and removed the "ERROR" label on Sep 21, 2024
@lshep added the "2. review in progress" label (assign a reviewer and a more thorough review of package code and documentation taking place) and removed the "pre-review" label on Sep 23, 2024
@bioc-issue-bot
Collaborator

A reviewer has been assigned to your package for an in-depth review.
Please respond accordingly to any further comments from the reviewer.

@vjcitn
Collaborator

vjcitn commented Oct 1, 2024

@hpages have you had a look?

@hpages
Contributor

hpages commented Oct 2, 2024

Hi @LTLA,

libscran/scrapper@ca6019c <-- Thanks for that!

> My real concern with relying on CRAN packages is that updates could change my package's results outside of the usual release schedule; this is a contributing factor to the fragility of the book builds, for example.

Yep, that definitely brings back bad memories.

Anyways, like @vjcitn, I'm not a big fan of pulling external GitHub assets during installation. Your Rlibigraph proposal sounds like the way to go here.

Taking a look at scrapper now...

@hpages
Contributor

hpages commented Oct 2, 2024

Not sure what's going on here but scrapper redefines aggregateAcrossCells(), which is already defined as a generic function in package scuttle and as a regular function in package epiregulon. Very confusing given that you are also a co-author and/or maintainer of the latter two packages.

It also redefines normalizeCounts() which is already defined as a regular function in packages epigraHMM, tweeDEseq, MDTS, celda, STdeconvolve, and as a generic function (with various methods) in package scuttle. That's a lot of normalizeCounts() functions!

Same for scoreMarkers(): it's already defined as a generic function in package scran but redefined as a regular function in scrapper.

Not really satisfying for an ecosystem like Bioconductor where we put such a great emphasis on coordination and harmonization.

What am I missing? Is there a plan to tidy things a little?

Other than that, the package is good to go.

@LTLA
Author

LTLA commented Oct 2, 2024

Indeed, the plan is to slowly shift users onto scrapper, and then eventually retire some of the older packages. More specifically, the plan with @lgeistlinger was to develop two packages: scrapper provides the underlying bare-bones utilities, while a second package would provide all of the user-friendly bells and whistles (e.g., plotting, one-click workflows, integration with other BioC data structures and packages) for actual end users and the book. I'm not sure where that plan is at, given that we didn't get any money to do it, but scrapper represents my end of the deal.

@hpages
Contributor

hpages commented Oct 2, 2024

Thanks for clarifying. Glad to hear there's a plan.

Package accepted.

@hpages closed this as completed on Oct 2, 2024
@bioc-issue-bot removed the "2. review in progress" label (assign a reviewer and a more thorough review of package code and documentation taking place) on Oct 2, 2024
@hpages added the "3a. accepted" label (will be ingested into Bioconductor daily builder for distribution) on Oct 2, 2024
@bioc-issue-bot
Collaborator

Your package has been accepted. It will be added to the
Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

Reviewers for Bioconductor packages are volunteers from the Bioconductor
community. If you are interested in becoming a Bioconductor package
reviewer, please see Reviewers Expectations.

@hpages reopened this on Oct 2, 2024
@bioc-issue-bot
Collaborator

Dear @LTLA ,

We have reopened the issue to continue the review process.
Please remember to push a version bump to git.bioconductor.org
to trigger a new build.

@bioc-issue-bot added the "pre-review" label (on bioconductor git and access to on demand build but not assigned reviewer until build report clean) on Oct 2, 2024
@hpages removed the "pre-review" label (on bioconductor git and access to on demand build but not assigned reviewer until build report clean) on Oct 2, 2024
@hpages
Contributor

hpages commented Oct 2, 2024

My mistake, I closed this inadvertently. Reopened now. Sorry for the confusion.

@lshep
Contributor

lshep commented Oct 7, 2024

The default branch of your GitHub repository has been added to Bioconductor's
git repository as branch devel.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/LTLA.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/
https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nightly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2 pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, your package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("scrapper"). The package 'landing page' will be created at

https://bioconductor.org/packages/scrapper

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.

@lshep closed this as completed on Oct 7, 2024
@DarioS

DarioS commented Oct 20, 2024

Ah, bummer about the funding. It takes a couple of steps for the basics e.g. t(t(features[["sums"]]) / features[["counts"]]).
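
The one-liner above can be unpacked with a toy example. This is a sketch under assumptions: `features` is taken to be a list whose `sums` element is a gene-by-group matrix and whose `counts` element holds the number of cells in each group. The names `means` and `means_sweep` and all the numbers are made up for illustration.

```r
# Toy stand-in for an aggregation result (assumed layout, made-up numbers).
features <- list(
    sums = matrix(
        c(10, 4, 6, 30, 12, 18),
        nrow = 3,
        dimnames = list(c("gene1", "gene2", "gene3"), c("groupA", "groupB"))
    ),
    counts = c(2, 6)  # cells per group, in the same order as the columns
)

# Arithmetic between a matrix and a vector recycles the vector down each
# column, so entry i lines up with row i. Transposing first puts the groups
# on the rows; dividing and transposing back gives per-group means with
# genes on the rows again.
means <- t(t(features[["sums"]]) / features[["counts"]])

# sweep() expresses the same per-column division without the double transpose.
means_sweep <- sweep(features[["sums"]], 2, features[["counts"]], "/")

print(means)
```

Either form is fine for small inputs; the `sweep()` version may read more clearly because the margin being divided is stated explicitly.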

@LTLA
Author

LTLA commented Oct 22, 2024

Well, I suppose you could always start putting something together, and see if any of the interested parties (mostly book authors) would be willing to help out. I don't see any money on the horizon, so if it's going to be a volunteer effort anyway, one might as well start now.

6 participants