submission: taxadb #344

karinorman · 2019-09-30T17:22:57Z

Submitting Author: Kari Norman (@karinorman)
Repository: https://github.com/cboettig/taxadb
Version submitted: v1.0.0
Editor: @ldecicco-USGS (Guest Editor)
Reviewer 1: @mcsiple
Reviewer 2: @lindsayplatt
Archive: TBD
Version accepted: TBD

Paste the full DESCRIPTION file inside a code block below:

Package: taxadb
Version: 0.0.1
Title: A High-Performance Local Taxonomic Database Interface
Description: Creates a local database of many commonly used taxonomic authorities
             and provides functions that can quickly query this data.  
Authors@R: c(
            person("Carl", "Boettiger", , "cboettig@gmail.com", c("aut", "cre"),
                   comment = c(ORCID = "0000-0002-1642-628X")),
            person("Kari", "Norman", role= "aut",
                   comment = c(ORCID = "0000-0002-2029-2325")),       
            person("Jorrit", "Poelen", role ="aut", 
                   comment = c(ORCID="0000-0003-3138-4118")),
            person("Scott", "Chamberlain", role ="aut", 
                   comment = c(ORCID="0000-0003-1444-9135")),
            person("Noam", "Ross", , "ross@ecohealthalliance.org", role ="ctb",
                   comment = c(ORCID = "0000-0002-2136-0000"))
            )
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
ByteCompile: true
Depends: R (>= 2.10)
Imports:
    dplyr,
    DBI,
    arkdb,
    tibble,
    memoise,
    readr,
    rappdirs,
    rlang,
    magrittr,
    stringi,
    progress,
    utils,
    dbplyr
Roxygen: list(markdown = TRUE)
RoxygenNote: 6.1.1
Suggests:
    spelling,
    testthat,
    covr,
    knitr,
    rmarkdown,
    purrr,
    RSQLite,
    MonetDBLite
Language: en-US
VignetteBuilder: knitr
URL: https://github.com/cboettig/taxadb
BugReports: https://github.com/cboettig/taxadb/issues
Remotes: MonetDB/MonetDBLite-R, cwida/duckdb/tools/rpkg

Scope

Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
- data retrieval
- data extraction
- database access
- data munging
- data deposition
- reproducibility
- geospatial data
- text analysis
Explain how and why the package falls under these categories (briefly, 1-2 sentences):

The package accesses existing data sets, adapts them to a standard format, and provides a set of functions for accessing the resulting database.

Who is the target audience and what are scientific applications of this package?

Anyone using data including species names, especially for datasets that may have taxonomic inconsistencies.

Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

taxaize provides accesses to many of the same datasets via an API interface. Our package is specifically designed to work around the API to offer the faster access necessary for large datasets, as well as provide a standardized format for all datasets.

If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Technical checks

Confirm each of the following by checking the box. This package:

does not violate the Terms of Service of any service it interacts with.
has a CRAN and OSI accepted license.
contains a README with instructions for installing the development version.
includes documentation with examples for all functions.
contains a vignette with examples of its essential functions and uses.
has a test suite.
has continuous integration, including reporting of test coverage using services such as Travis CI, Coveralls and/or CodeCov.

Publication options

Do you intend for this package to go on CRAN?
Do you wish to automatically submit to the Journal of Open Source Software? If so:

JOSS Options

The package has an obvious research application according to JOSS's definition.
- The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
- The package is deposited in a long-term repository with the DOI:
- (Do not submit your package separately to JOSS)

Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options

The package is novel and will be of interest to the broad readership of the journal.
The manuscript describing the package is no longer than 3000 words.
You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
(Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
(Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
(Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

The text was updated successfully, but these errors were encountered:

melvidoni · 2019-10-03T21:08:56Z

Hello @karinorman, your editor will be @ldecicco-USGS, acting as a Guest Editor for rOpenSci.

ldecicco-USGS · 2019-10-07T18:18:41Z

Editor checks:

Fit: The package meets criteria for fit and overlap
Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
License: The package has a CRAN or OSI accepted license
Repository: The repository link resolves correctly
Archive (JOSS only, may be post-review): The repository DOI resolves correctly
Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Hello @karinorman ...this is my first "guest edit"...so bear with me! The following is the output of goodpractice::gp(). I'm not terribly concerned about the messages, but they would be easy messages to get rid of, and then future pull request could use the gp() function from a clean slate.

── GP taxadb ──────────────────────────────────────────────────────

It is good practice to

  ✖ write unit tests for all functions, and all
    package code in general. 83% of code lines are covered
    by test cases.

    R/clean_names.R:42:NA
    R/clean_names.R:43:NA
    R/clean_names.R:44:NA
    R/clean_names.R:45:NA
    R/clean_names.R:46:NA
    ... and 60 more lines

  ✖ use '<-' for assignment instead of '='. '<-'
    is the standard, and R users and developers are used it
    and it is easier to read your code for them if you use
    '<-'.

    R\handling-duplicates.R:39:11
    R\mutate_db.R:56:24
    R\mutate_db.R:69:25

  ✖ avoid long code lines, it is bad for
    readability. Also, many people prefer editor windows
    that are about 80 characters wide. Try make your lines
    shorter than 80 characters

    inst\examples\docker.R:23:1
    R\by_common.R:4:1
    R\clean_names.R:3:1
    R\clean_names.R:20:1
    R\filter_by.R:12:1
    ... and 24 more lines

  ✖ avoid 1:length(...), 1:nrow(...),
    1:ncol(...), 1:NROW(...) and 1:NCOL(...) expressions.
    They are error prone and result 1:0 if the expression
    on the right hand side is zero. Use seq_len() or
    seq_along() instead.

    R\filter_by.R:60:50
    R\filter_by.R:65:49
    tests\testthat\test-mutate-db.R:16:19

──────────────────────────────────────────────────────────────

I'm sending email requests now to find 2 reviewers.

Reviewers: @mcsiple @lindsayplatt
Due date: October 28, 2019

ldecicco-USGS · 2019-10-09T15:50:55Z

Alright @karinorman ! We have 2 reviewers @mcsiple and @lindsayplatt. Please try to submit the reviews through the GitHub issue by Oct. 28. Let me know if you have any questions on the review process.

lindsayplatt · 2019-10-28T03:33:45Z

Hello! It was fun to review this package. I have specific comments within the ROpenSci checklist (they are noted as a quotation starting with my initials {LP}). I also added a section of comments near the end that didn't fit the checklist perfectly.

Best,
Lindsay

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

A statement of need clearly stating problems the software is designed to solve and its target audience in README
Installation instructions: for the development version of package and any non-standard dependencies in README
Vignette(s) demonstrating major functionality that runs successfully locally.

{LP} I see a vignettes folder on GitHub, but I don't see the vignettes when I go to the Packages tab and click on the package name or when I run vignette(package="taxadb"), I get no vignettes found. There seems to be an issue with finding the vignettes that are in the folder. It might be that you have them tucked under an additional folder (articles) rather than the .Rmd files just existing in the vignettes folder.

Function Documentation: for all exported functions in R help
Examples for all exported functions in R Help that run successfully locally.

{LP} mutate_db() & taxa_tbl() do not have examples. td_connect() and td_create() have examples but using functions called db_connect() and db_create(); perhaps outdated function names? All others have examples and they run locally.

Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

{LP} I do not see anything that mentions how to contribute to the package in either the README, in a file called CONTRIBUTING, or mentioned anywhere in the package files. Please see this part of the ROpenSci package documentation (https://devguide.ropensci.org/collaboration.html#friendlyfiles) for info on including a contribution section. The DESCRIPTION file has the required fields.

Have not heard if this is being submitted to JOSS, so assuming no and skipping the JOSS section of review.

Functionality

Installation: Installation succeeds as documented.
Functionality: Any functional claims of the software been confirmed.

{LP} Yes, connects to local DB of taxonomic data for querying consistently. Unknown information about the source of this data, which is useful in case you need to update frequently by running td_create("col"). You should site the "Catalogue of Life" somewhere, include links to documentation for that, and then tell the user how often (if at all) they should update that database by running td_create.

Performance: Any performance claims of the software been confirmed.

{LP} Since the database is available locally, it seems like this package could be faster than it currently is. Not bad right now though.

Automated tests: Unit tests cover essential functions of the package
and a reasonable range of inputs and conditions. All tests pass on the local machine.

{LP} I used devtools::check() - it completed with zero issues (I have Windows 10) and took 7.5 min. I also ran devtools::test() and that took 10 min with 0 failures, 5 warnings, and 2 skips. With covr::package_coverage(), I'm seeing low coverage for code within mutate_db.R and I also suggest that you add tests for quick_db() and has_tbl on their own. I ran goodpractice::gp() and only got one note - it was about test coverage (87% lines covered) with similar information about some missing lines as I noted previously. Finally, I ran spelling::spell_check_package() and spelling::spell_check_files("README.Rmd") with no real spelling errors.

Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

{LP} Two things, both of which have been noted above. All else check out! One issue - vignettes are not available to the user since they aren't in the correct folder. One suggestion - watch the naming convention. Your by_* functions are a bit confusing and could be made clearer with more descriptive names.

Final approval (post-review)

The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 3 hrs as of 10/27/2019

Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

README Documentation

In the installation instructions, it's unclear which of these steps is required as part of the setup and which is an example. It seems like if the package needs dplyr and readr that they would be loaded when taxadb is loaded, rather than requiring additional library commands. You might want to explain why those are needed. Additionally, the local copy of the Catalogue of Life seems like a necessary step, but the following code about reading in a species list seems like an example and not required for all usecases. I would just clarify when/why the user might take those steps.
Using "col" as a reference to "Catalogue of Life" is potentially confusing. I like the brevity of it, but it has a lot of other uses in R such as "column" in colnames or specifying "color" in base graphics. Consider using something more descriptive, such as "cat_of_life".
For people that may be new to the Catalogue of Life, it would be useful to have some documentation describing the common column names or pointing people out to existing documentation for this database.
When I installed this, I received some warnings. Better warnings/errors to help me understand if I need to take action or try again would be useful.

td_create("col")

Importing dwc_col.tsv.bz2 in 100000 line chunks:
[-] chunk 6	...Done! (in 47.58484 secs)
Importing common_col.tsv.bz2 in 100000 line chunks:
Warning: 1 parsing failure.
  row col   expected    actual         file
23215  -- 20 columns 2 columns literal data

	...Done! (in 2.237027 secs)
Warning messages:
1: In readLines(con, n = n, encoding = encoding, warn = FALSE) :
  invalid input found on input connection 'C:\Users\lcarr\AppData\Local\taxadb\taxadb/dwc_col.tsv.bz2'
2: In readLines(con, n = n, encoding = encoding, warn = FALSE) :
  invalid input found on input connection 'C:\Users\lcarr\AppData\Local\taxadb\taxadb/common_col.tsv.bz2'

Functions

When I ran the by_* functions, most of the examples took time to download information. If you submit to CRAN, these would not pass checks because they take too long for the system to run. I recommend that you use \dontrun{} in your ROXYGEN documentation for the examples that take a minute to download or hit some kind of website.
I would suggest having the by_* family of functions be more descriptive. Typically, functions are named with a verb indicating what they do. For instance, filter_catalogue_by*. It is longer, but brevity can cause more confusion because it is not as clear what action the function is actually taking. It could be "grouping by" or "summarizing by", etc.
It is unclear to me how filter_by is different from by_*. I can run the following two lines and seem to get the same result:

by_name(c("Trochalopteron henrici gucenense", "Trochalopteron elliotii"))

filter_by(c("Trochalopteron henrici gucenense", "Trochalopteron elliotii"), "scientificName")

It could be a little confusing that get_ids has an argument that can completely change the output format (the format = "uri" argument). Typically, no matter the arguments, you can expect the same type of information back. Returning a URL that you need to visit to get the IDs might be surprising to users. You could make it a separate function. Additionally, if I try that same argument for get_names, I don't get a url back - get_names(c("ITIS:180092", "ITIS:179913"), format = "uri") returns "Homo sapiens" "Mammalia".
I see code for null_tibble() but I don't see it being used at all or exported. Might be old and you don't need it anymore?

Miscellaneous

If I run code in the console and then try to knit an RMD with taxadb commands, I always get the error about the database being locked since there is already a session using it. It isn't super clear to a user how to remedy this without completely closing R, opening it up, and trying to knit the RMD without running any code in the console. It may make sense to include a function that the user can run to "close" the connection manually and also explain this behavior in documentation somewhere.
I don't know how to recreate or describe (and it has not happened to me before), but every once in a while as I am executing commands from this package, RStudio seems to freeze. I can't click on anything or type any code. I have to close the window and reopen to keep going. I seem to be able to close the window just fine; no need to force quit, so it isn't totally frozen. Haven't seen this happen to me before. I am using RStudio version 1.2.5001

ldecicco-USGS · 2019-10-28T14:25:27Z

Thanks @lindsayplatt !

It looks like the vignettes are in a sub-directory "articles", which is why they are not being properly indexed when created via the remotes package.

If the developers want the vignettes included in the package itself (and not just a pkgdown page), they would need to both move the files out of the "articles" folder, and change the installation instructions to:

remotes::install_github("cboettig/taxadb", 
                         build_vignettes = TRUE, 
                         build_opts = c("--no-resave-data", "--no-manual"))

I'm constantly having to remind myself of the build_opts argument.

mcsiple · 2019-10-29T02:21:44Z

Hi everyone,

This package is very cool. I tried to follow similar conventions in my review to @lindsayplatt, for consistency. I've also added some comments on the end. Thank you for including me as a reviewer for this nice new tool.

Package Review

As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

A statement of need clearly stating problems the software is designed to solve and its target audience in README
Installation instructions: for the development version of package and any non-standard dependencies in README

One quick question about the README -- I wasn't sure after reading it whether there is a way for users to know where the local database is located. If this option is available, I think it would be good to add.

Vignette(s) demonstrating major functionality that runs successfully locally

browseVignettes(package="taxadb") does not return any vignettes but maybe this is normal for packages not yet on CRAN? I see that @lindsayplatt ran into a similar issue and looks like it will be resolved with @ldecicco-USGS 's suggestion.

Function Documentation: for all exported functions in R help
Examples for all exported functions in R Help that run successfully locally
Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

I don't see any CONTRIBUTING guidelines in the documentation.

Functionality

Installation: Installation succeeds as documented.

Comment on operating systems

I tested installation from GitHub and locally from two different operating systems. Where possible, I tested other functionality was also tested on both a Mac and a Windows OS.

I ran into some issues with functionality when using different machines. For example, I had namespace errors for rlang and other dependencies when testing on a Windows OS. This may be unrelated to the package itself (maybe has to do with package versions?) but just something to be aware of.

Local install

Local install in Windows got stuck in a loop of error messages every time I tried it, and I wasn't able to figure out what was causing the issue, but all the error messages were related to MonetDBLite. Installation of MonetDBLite from GitHub was smooth (though lengthy) on a Mac machine.
Local installation on a Mac ran smoothly. I'm not sure if that's an issue of different versions of the imported packages on those two machines, or something else.
Prior to running test(), I installed the dev version of MonetDBLite from GitHub. The most current version did not install from CRAN (i.e., install.packages("MonetDBLite") returned an error). It is possible that if folks try to download and run taxadb, they may have to download the dev version of MonetDBLite in order to install taxadb. That was my experience.

Install from GitHub

devtools::install_github("cboettig/taxadb") was fine and fast on both Windows and Mac OS.

Functionality: Any functional claims of the software been confirmed.

Functionality notes

td_create() returned some errors when I used one machine, including some parsing failures. After the parsing failures I switched to just testing on Mac OS, and this step works fine. I wrote more below about specific functions.

Performance: Any performance claims of the software been confirmed.
Automated tests: Unit tests cover essential functions of the package
and a reasonable range of inputs and conditions. All tests pass on the local machine.

Initially test() returned errors because of the issues with MonetDBLite. After installing the dev version of MonetDBLite it works (and quickly!) in both operating systems.
I used goodpractice::gp() too and encountered two goodpractice errors:

Writing unit tests for all functions: "87% of code lines are covered by test cases"
Installation fail for R CMD check error. gp output says "Installation failed. See ~/taxadb.Rcheck/00install.out for details"

I never figured out what the latter issue was, but it did not appear to affect functionality for me.
There must be some coverage issues, and covr::package_coverage() returns a more detailed description:

taxadb Coverage: 87.09%
R/mutate_db.R: 0.00%
R/taxa_tbl.R: 79.17%
R/td_connect.R: 86.96%
R/get_ids.R: 87.50%
R/get_names.R: 90.00%
R/fuzzy_filter.R: 94.00%
R/td_create.R: 94.74%
R/clean_names.R: 95.45%
R/synonyms.R: 97.06%
R/by_common.R: 100.00%
R/by_id.R: 100.00%
R/by_name.R: 100.00%
R/by_rank.R: 100.00%
R/filter_by.R: 100.00%
R/handling-duplicates.R: 100.00%
R/utils.R: 100.00%

Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

The author has responded to my review and made changes to my satisfaction. I recommend approving this package.
Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Estimated hours spent reviewing: 10 hours as of 10/28/2019

General comments

As someone who works mostly with one group of species and needs access to multiple databases, this package could be incredibly helpful. I think some changes to the documentation could make it more accessible for someone like myself, who is not very well-versed in the diversity of taxonomic databases, how to access them, and which ones have what I need.

Review Comments

Documentation

The documentation is generally good and I just had a couple of small comments.

The breeding bird survey data is used for all the intro examples, but I would love a couple of other examples that are more general. For example, looking up another set of taxa, or a set of taxa downstream from a specific taxonomic node.
An example where taxadb interacts with another package would be helpful
I think some documentation could be particularly effective if it is built in as more specific error/warning messages. With all the taxonomy packages I've used, I often find it hard to match exactly what the package expects in terms of the spelling and syntax of species names, especially common names. It would be great to have a little more built-in functionality that provides additional guideance (e.g., "this species is in ITIS but not in IUCN - did you mean "itis"?")

Help files

Running help(package = "taxadb") did not return a package index or any help files when I was using Windows; on Mac OS it was fine.

Functions

I think I may have missed something about how taxadb handles common names. I am always looking for taxonomic packages that can deal with slight differences in how common names are spelled/punctuated. It would be good to have some more flexibility w/r/t names, so that punctuation can be not an "exact match." For example:

by_common("Stellers jay")
by_common("Steller jay")
by_common("Steller's jay") # only this one returns results
# Based on what I understand from clean_names, maybe this works? 
by_common(clean_names(names = "Steller jay"))  # but this does not return results either

Maybe there is a way to set tolerance on punctuation-- if so, it would be useful for me as a future user.

In the by_rank() function, is there a way to return suggestions about what went wrong, if by_rank() doesn't return a tibble? For example, if I, a fish newb, am looking for all Chondrichtyans,

by_rank("Osteichthyes", "class", "itis")

returns an empty tibble. It would be great to have something that tells people, "[insert name of taxon you entered] is defined as an [order/kingdom/genus]-- is that the classification you meant?" It doesn't have to be that literal, but would be good as a checker if people have errors in classifying the taxa they're interested in.

Session Info

In Windows:

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] readr_1.3.1  dplyr_0.8.3  taxadb_0.0.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2        dbplyr_1.4.2      compiler_3.6.1    pillar_1.4.2      prettyunits_1.0.2 remotes_2.1.0    
 [7] tools_3.6.1       progress_1.2.2    bit_1.1-14        testthat_2.2.1    zeallot_0.1.0     digest_0.6.22    
[13] pkgbuild_1.0.6    pkgload_1.0.2     RSQLite_2.1.2     memoise_1.1.0     tibble_2.1.3      pkgconfig_2.0.3  
[19] rlang_0.4.1       rex_1.1.2         cli_1.1.0         DBI_1.0.0         yaml_2.2.0        pkgreviewr_0.1.2 
[25] arkdb_0.0.5       withr_2.1.2       rappdirs_0.3.1    desc_1.2.0        fs_1.3.1          vctrs_0.2.0      
[31] devtools_2.2.1    hms_0.5.1         bit64_0.9-7       tidyselect_0.2.5  rprojroot_1.3-2   glue_1.3.1       
[37] R6_2.4.0          processx_3.4.1    fansi_0.4.0       sessioninfo_1.1.1 blob_1.2.0        purrr_0.3.3      
[43] callr_3.3.2       covr_3.3.2        magrittr_1.5      backports_1.1.5   ps_1.3.0          ellipsis_0.3.0   
[49] usethis_1.5.1     assertthat_0.2.1  utf8_1.1.4        stringi_1.4.3     lazyeval_0.2.2    crayon_1.3.4

On Mac:

R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] devtools_2.2.1 usethis_1.5.1  magrittr_1.5   taxadb_0.0.1   readr_1.3.1    dplyr_0.8.3   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2         whoami_1.3.0       prettyunits_1.0.2  ps_1.3.0           clisymbols_1.2.0   utf8_1.1.4        
 [7] assertthat_0.2.1   zeallot_0.1.0      rprojroot_1.3-2    digest_0.6.22      R6_2.4.0           backports_1.1.5   
[13] RSQLite_2.1.2      evaluate_0.14      httr_1.4.1         pillar_1.4.2       rlang_0.4.1        progress_1.2.2    
[19] curl_4.2           lazyeval_0.2.2     rstudioapi_0.10    praise_1.0.0       MonetDBLite_0.6.1  pkgreviewr_0.1.3  
[25] callr_3.3.2        blob_1.2.0         rmarkdown_1.16     desc_1.2.0         stringr_1.4.0      bit_1.1-14        
[31] compiler_3.6.1     lintr_2.0.0        xfun_0.10          pkgconfig_2.0.3    base64enc_0.1-3    pkgbuild_1.0.6    
[37] htmltools_0.4.0    tidyselect_0.2.5   tibble_2.1.3       codetools_0.2-16   fansi_0.4.0        crayon_1.3.4      
[43] dbplyr_1.4.2       withr_2.1.2        rappdirs_0.3.1     jsonlite_1.6       DBI_1.0.0          covr_3.3.2        
[49] cli_1.1.0          stringi_1.4.3      fs_1.3.1           remotes_2.1.0      rex_1.1.2          testthat_2.2.1    
[55] xml2_1.2.2         ellipsis_0.3.0     vctrs_0.2.0        cyclocomp_1.1.0    arkdb_0.0.5        tools_3.6.1       
[61] rcmdcheck_1.3.3    bit64_0.9-7        glue_1.3.1         purrr_0.3.3        hms_0.5.1          processx_3.4.1    
[67] pkgload_1.0.2      yaml_2.2.0         xmlparsedata_1.0.3 xopen_1.0.0        sessioninfo_1.1.1  memoise_1.1.0     
[73] goodpractice_1.0.2 knitr_1.25

rOpenSci guidelines:

Does the code comply with general principles in the Mozilla reviewing guide?
Does the package comply with the rOpenSci packaging guide?

ldecicco-USGS · 2019-10-29T15:36:04Z

Awesome, thanks so much @mcsiple and @lindsayplatt . Now... @karinorman , it's back to you!

karinorman · 2019-10-29T22:07:39Z

Hi @lindsayplatt & @mcsiple, thanks for taking the time to review and for your helpful comments! A couple general responses below about issues you both ran into with more detailed and individualized responses to come!

Vignettes

Thanks @ldecicco-USGS for pointing out the vignette location. They are currently in the articles subdirectory to prevent timing out on CRAN (pkgdown still detects and builds vignettes in subdirectories). Prebuilt vignettes are linked at the bottom of the README, but we could also put a lighter weight version in vingettes/ so they're accessible by browseVignettes(package="taxadb").

Backend Documentation

It's clear from both of your experiences with the package that an upfront description of the package backends (i.e. database hosts) is necessary. We will expand on that in the schema.rmd vignette, but also wanted to point to the article draft in paper/manuscript.rmd, which we would also love feedback on! It lays out the backends in a lot more detail and should lend some clarity.

@lindsayplatt, backend problems are likely the source of your performance issues. td_create("col") only installs the database for Catalogue of Life, whereas many of the examples reference other providers (such as ITIS, which is the default provider). If you try to use a provider that has not been installed the default is to install that database on the fly, which takes a while! Running td_create("all") will install all databases ahead of time. This is obviously an issue of clarity on our part in the README, but just wanted to explain what was likely going on there.

Also, the freezing behavior Lindsay observed is almost surely due to the computer running low on memory. We suspect this is because RSQLite or MonetDBLite were not installed? We made RSQLite an optional dependency and allowed taxadb to work in-memory if it was not found, but this example demonstrates that was a poor choice, since users can easily run out of memory with these giant data tables! We've making RSQLite a required dependency now, and adding a vignette to explain how to choose between these. (Basically RSQLite is easier to install but not as fast as MonetDBLite!) You can check sessionInfo() or just run taxadb::td_connect() to see what database connection is being used.

cboettig · 2019-10-30T15:22:28Z

Wow, thanks @lindsayplatt & @mcsiple for the excellent reviews, this has been really helpful in improving the docs and implementations! Very sorry about the vignettes being hard to find! While we work through this, I just wanted to drop a few links in here to the updated vignettes, since hopefully they go some ways to clarifying some of the issues you both raise and we'd really like your feedback on those parts as well.

Overview of the name providers & column names (darwin core) -- this was previously called the 'schema vignette', but it also discusses a bit more about each of the 10 data providers (Catalogue of Life, ITIS, etc). It could probably include more specifics about each one but we'd love to hear what you think is missing!
new vignette on Database backends. We hope this clarifies a bit more about these options and how to control them!
There's also this intro vignette which is meant as a slightly longer / richer set of examples to complement the README, though I'm not sure if it serves that purpose. Perhaps it is unnecessary? Or perhaps some of the current stuff in the README should just be moved into this to keep the README really short and sweet?
Lastly, we have a draft of a paper for MEE, discusses the context and motivation in a bit more detail. (ropensci partners with MEE for some reviews, so this still needs polish and will get separate review if/when it goes to MEE, but it could be helpful in better understanding the package so we wanted to mention it now and get all the feedback we can!)

Really appreciate all the help!

ldecicco-USGS · 2019-11-05T17:25:11Z

Great!
@lindsayplatt and @mcsiple ....do these response sound acceptable to you? Any further questions/comments?

karinorman · 2019-11-05T19:07:12Z

@ldecicco-USGS We're also still working on a few of the comments made by both reviewers, including examples and test for a couple functions, better common names examples, etc. Hoping to have that finished up in the next couple days but would love responses to the above in the meantime. Thanks everyone!

mcsiple · 2019-11-06T20:01:36Z

Hi all -- apologies for the delayed response. Thank you for adding the extended documentation.

The added database backends vignette is great! Thank you for that.
I quite like the extended vignette you added @cboettig . The only potential issue I ran into was that the current chunk inside system.time({}) took a while for me to run. If other people besides me report really long running times, it might be worth finding an example with a shorter run time. Mine completed in the following:

   user  system elapsed 
 448.79  227.97  686.84

Which is much longer than it says the example took in the vignette. A second attempt took about the same amount of time. Besides that it is great and I liked the added length/detail.

Just to clarify after looking over the manuscript draft - is taxadb intended to replace taxize ? If not, maybe it's worth discussing in the ms how they can be used together. I am imagining a user who is brand new to taxonomic databases and wants to use the most rigorous workflow to match a list of species to these databases and extract some info about IUCN or CITES listing (for example).

ldecicco-USGS · 2019-11-25T17:09:50Z

@karinorman checking it to make sure I understand correctly, we're still waiting on updates from you?

karinorman · 2019-12-02T20:10:13Z

@ldecicco-USGS Yes! Just wanted to address some of the general changes we've now made based on comments from both reviewers.

contribution guidelines added
changed naming convention from by_ to filter_ to reflect the intuition that these functions filter underlying tables based on specified input
updated examples and function tests, we now have 94% coverage based on codev
added function td_disconnect() to manually close the database connection
added more detailed examples for working with messy common names, for example see mutate_db documentation

We are currently still working on a way to deal with database versioning so that multiple versions can be installed on a single machine, with the ability to move between versions. We plan to have that finished up in the next couple weeks.

@mcsiple and @lindsayplatt, let us know if there is anything else you would like to discuss further and thanks again for your helpful feedback!

ldecicco-USGS · 2020-01-08T23:22:48Z

Approved! Thanks @karinorman for submitting and @lindsayplatt and @mcsiple for your reviews!

To-dos:

Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I'm getting the rOpenSci folks the info get a "taxadb" repo set up here. You'll be made admin that's set up and you accept and invitation to join a taxadb team.
Add the rOpenSci footer to the bottom of your README
" [![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)"
Fix all links to the GitHub repo to point to the repo under the ropensci organization.
If you already had a pkgdown website, fix its URL to point to https://docs.ropensci.org/package_name and deactivate the automatic deployment you might have set up, since it will not be built centrally like for all rOpenSci packages, see http://devdevguide.netlify.com/#docsropensci. In addition, in your DESCRIPTION file, include the docs link in the URL field alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar (website) https://github.com/ropensci/foobar
Add a mention of the review in DESCRIPTION via rodev::add_ro_desc().
Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to ropensci Appveyor account so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/pkgname?branch=master&svg=true)](https://ci.appveyor.com/project/individualaccount/pkgname).
We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this here.

Welcome aboard! We'd love to host a blog post about your package - either a short introduction to it with one example or a longer post with some narrative about its development or something you learned, and an example of its use. If you are interested, review the instructions, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.

We've put together an online book with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved, the corresponding repo is here.

maelle · 2020-01-09T08:24:26Z

@karinorman I've invited you to the rOpensci GitHub organization. Once you've accepted it, you can [transfer the repo](https://help.github.com/en/github/administering-a-repository/transferring-a-repository9 as @ldecicco-USGS said, from the Settings tab of your repo. Once you've transferred the repo, please ping me here, so I can give you admin rights to the repo again. Thank you!

stefaniebutland · 2020-01-09T21:40:29Z

Congratulations @karinorman! Given that the subject area of taxadb is an important one for rOpenSci and our user/developer community, would you consider writing a blog post?

ldecicco-USGS · 2020-01-09T22:50:19Z

To echo @stefaniebutland 's request...I think the development of this package would make a good blog story. Others may have different ideas about why taxadb is neat, but for me it was how to make a relatively complicated database accessible to users who aren't necessarily subject matter experts. The logistics of creating the local database for the fast connections are really pertinent to other rOpenSci users/developers.

karinorman · 2020-01-10T02:57:35Z

@ldecicco-USGS Wonderful to hear - thank you so much for making this a smooth process, and thanks again to reviewers @lindsayplatt and @mcsiple for your feedback that greatly improved the package! I'm out of the office until next week but will go through the onboarding steps then. @stefaniebutland I would also be happy to write a blog post, thanks for the invitation!

stefaniebutland · 2020-01-10T16:51:34Z

Awesome :-) and thank you.
When you're back in the office Kari, please suggest a date by which you'd like to submit a draft post. Typically you submit via pull request, I review the post, and we publish one week after submission.

Instructions are here https://github.com/ropensci/roweb2#contributing-a-blog-post
and I'm happy to answer any questions.

karinorman · 2020-01-17T18:46:34Z

@stefaniebutland Would Jan 31st be good?

stefaniebutland · 2020-01-17T19:08:55Z

Yes!!

karinorman · 2020-01-23T19:05:47Z

@lindsayplatt and @mcsiple, do you mind being acknowledged as our reviewers?

lindsayplatt · 2020-01-24T16:48:37Z

👍 don't mind one bit!

stefaniebutland · 2020-02-03T16:28:38Z

Hi @karinorman. Will you be able to submit a draft blog post this week?
Happy to answer any questions.

mcsiple · 2020-02-03T21:30:44Z

Eek so sorry for the delay. I don't mind being acknowledged either!

karinorman · 2020-02-03T21:40:35Z

@stefaniebutland Sorry for the lack of update! I got hit with a nasty virus last week and wasn't quite able to finish it up. Look for it in the next couple days!

stefaniebutland · 2020-02-03T22:30:31Z

Ack! No pressure from over here @karinorman. Get well soon and submit when you're ready

cboettig · 2020-02-17T20:21:59Z

@maelle Should this thread now be closed? Also, not sure why the 'peer reviewed' badge is showing up as 'status unknown?' https://badges.ropensci.org/344_status.svg

maelle · 2020-02-18T16:31:04Z

The badges server seems to have problems at the moment but yep this should be closed for the badge to become green I think.

melvidoni added topic:data-access topic:data-extraction topic:data-retrieval labels Oct 1, 2019

ldecicco-USGS added 2/seeking-reviewer(s) 3/reviewer(s)-assigned and removed 2/seeking-reviewer(s) labels Oct 7, 2019

ldecicco-USGS added 4/review(s)-in-awaiting-changes and removed 3/reviewer(s)-assigned labels Oct 29, 2019

maelle assigned ldecicco-USGS Jan 6, 2020

ldecicco-USGS added 6/approved and removed 4/review(s)-in-awaiting-changes labels Jan 8, 2020

maelle closed this as completed Feb 18, 2020

submission: taxadb #344

submission: taxadb #344

Comments

karinorman commented Sep 30, 2019 • edited by ldecicco-USGS Loading

Scope

Technical checks

Publication options

Code of conduct

melvidoni commented Oct 3, 2019

ldecicco-USGS commented Oct 7, 2019 • edited Loading

Editor checks:

Editor comments

ldecicco-USGS commented Oct 9, 2019

lindsayplatt commented Oct 28, 2019 • edited Loading

Package Review

Documentation

Functionality

Final approval (post-review)

Review Comments

README Documentation

Functions

Miscellaneous

ldecicco-USGS commented Oct 28, 2019 • edited Loading

mcsiple commented Oct 29, 2019 • edited Loading

Package Review

Documentation

Functionality

Comment on operating systems

Local install

Install from GitHub

Functionality notes

Final approval (post-review)

General comments

Review Comments

Documentation

Help files

Functions

Session Info

ldecicco-USGS commented Oct 29, 2019

karinorman commented Oct 29, 2019

cboettig commented Oct 30, 2019

ldecicco-USGS commented Nov 5, 2019

karinorman commented Nov 5, 2019

mcsiple commented Nov 6, 2019

ldecicco-USGS commented Nov 25, 2019

karinorman commented Dec 2, 2019

ldecicco-USGS commented Jan 8, 2020

maelle commented Jan 9, 2020

stefaniebutland commented Jan 9, 2020

ldecicco-USGS commented Jan 9, 2020

karinorman commented Jan 10, 2020

stefaniebutland commented Jan 10, 2020

karinorman commented Jan 17, 2020

stefaniebutland commented Jan 17, 2020

karinorman commented Jan 23, 2020

lindsayplatt commented Jan 24, 2020

stefaniebutland commented Feb 3, 2020

mcsiple commented Feb 3, 2020

karinorman commented Feb 3, 2020

stefaniebutland commented Feb 3, 2020

cboettig commented Feb 17, 2020

maelle commented Feb 18, 2020

karinorman commented Sep 30, 2019 •

edited by ldecicco-USGS

Loading

ldecicco-USGS commented Oct 7, 2019 •

edited

Loading

lindsayplatt commented Oct 28, 2019 •

edited

Loading

ldecicco-USGS commented Oct 28, 2019 •

edited

Loading

mcsiple commented Oct 29, 2019 •

edited

Loading