submission: taxadb #344
Hello @karinorman, your editor will be @ldecicco-USGS, acting as a Guest Editor for rOpenSci. |
Editor checks:
Editor comments
Hello @karinorman ...this is my first "guest edit", so bear with me! The following is the output of goodpractice::gp(). I'm not terribly concerned about the messages, but they would be easy to get rid of, and then future pull requests could use the
I'm sending email requests now to find 2 reviewers. Reviewers: @mcsiple @lindsayplatt |
Alright @karinorman ! We have 2 reviewers @mcsiple and @lindsayplatt. Please try to submit the reviews through the GitHub issue by Oct. 28. Let me know if you have any questions on the review process. |
Hello! It was fun to review this package. I have specific comments within the rOpenSci checklist (they are noted as quotations starting with my initials).
Best,
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
Documentation
The package includes all the following forms of documentation:
I have not heard whether this is being submitted to JOSS, so I am assuming not and skipping the JOSS section of the review.
Functionality
Final approval (post-review)
Estimated hours spent reviewing: 3 hrs as of 10/27/2019
Review Comments
README Documentation
td_create("col")
Importing dwc_col.tsv.bz2 in 100000 line chunks:
[-] chunk 6 ...Done! (in 47.58484 secs)
Importing common_col.tsv.bz2 in 100000 line chunks:
Warning: 1 parsing failure.
row col expected actual file
23215 -- 20 columns 2 columns literal data
...Done! (in 2.237027 secs)
Warning messages:
1: In readLines(con, n = n, encoding = encoding, warn = FALSE) :
invalid input found on input connection 'C:\Users\lcarr\AppData\Local\taxadb\taxadb/dwc_col.tsv.bz2'
2: In readLines(con, n = n, encoding = encoding, warn = FALSE) :
invalid input found on input connection 'C:\Users\lcarr\AppData\Local\taxadb\taxadb/common_col.tsv.bz2'
Functions
by_name(c("Trochalopteron henrici gucenense", "Trochalopteron elliotii"))
filter_by(c("Trochalopteron henrici gucenense", "Trochalopteron elliotii"), "scientificName")
Miscellaneous
|
Thanks @lindsayplatt! It looks like the vignettes are in a subdirectory, "articles", which is why they are not being properly indexed when created via the
If the developers want the vignettes included in the package itself (and not just on a pkgdown page), they would need to both move the files out of the "articles" folder and change the installation instructions to:
remotes::install_github("cboettig/taxadb",
                        build_vignettes = TRUE,
                        build_opts = c("--no-resave-data", "--no-manual"))
I'm constantly having to remind myself of the build_opts argument. |
Hi everyone,
This package is very cool. I tried to follow conventions similar to @lindsayplatt's in my review, for consistency. I've also added some comments at the end. Thank you for including me as a reviewer for this nice new tool.
Package Review
Documentation
The package includes all the following forms of documentation:
taxadb Coverage: 87.09%
R/mutate_db.R: 0.00%
R/taxa_tbl.R: 79.17%
R/td_connect.R: 86.96%
R/get_ids.R: 87.50%
R/get_names.R: 90.00%
R/fuzzy_filter.R: 94.00%
R/td_create.R: 94.74%
R/clean_names.R: 95.45%
R/synonyms.R: 97.06%
R/by_common.R: 100.00%
R/by_id.R: 100.00%
R/by_name.R: 100.00%
R/by_rank.R: 100.00%
R/filter_by.R: 100.00%
R/handling-duplicates.R: 100.00%
R/utils.R: 100.00%
Final approval (post-review)
Estimated hours spent reviewing: 10 hours as of 10/28/2019
General comments
As someone who works mostly with one group of species and needs access to multiple databases, I think this package could be incredibly helpful. Some changes to the documentation could make it more accessible for someone like me, who is not very well-versed in the diversity of taxonomic databases, how to access them, and which ones have what I need.
Review Comments
Documentation
The documentation is generally good and I just had a couple of small comments.
Help files
Running Functions
I think I may have missed something about how taxadb handles common names. I am always looking for taxonomic packages that can deal with slight differences in how common names are spelled or punctuated. It would be good to have more flexibility with respect to names, so that punctuation does not have to be an exact match. For example:
by_common("Stellers jay")
by_common("Steller jay")
by_common("Steller's jay") # only this one returns results
# Based on what I understand from clean_names, maybe this works?
by_common(clean_names(names = "Steller jay")) # but this does not return results either
Maybe there is a way to set a tolerance on punctuation; if so, it would be useful for me as a future user.
In the by_rank() function, is there a way to return suggestions about what went wrong if by_rank("Osteichthyes", "class", "itis") returns an empty tibble? It would be great to have something that tells people, "[insert name of taxon you entered] is defined as an [order/kingdom/genus]; is that the classification you meant?" It doesn't have to be that literal, but it would be a good check for people who have made errors in classifying the taxa they're interested in.
Session Info
In Windows:
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] readr_1.3.1 dplyr_0.8.3 taxadb_0.0.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.2 dbplyr_1.4.2 compiler_3.6.1 pillar_1.4.2 prettyunits_1.0.2 remotes_2.1.0
[7] tools_3.6.1 progress_1.2.2 bit_1.1-14 testthat_2.2.1 zeallot_0.1.0 digest_0.6.22
[13] pkgbuild_1.0.6 pkgload_1.0.2 RSQLite_2.1.2 memoise_1.1.0 tibble_2.1.3 pkgconfig_2.0.3
[19] rlang_0.4.1 rex_1.1.2 cli_1.1.0 DBI_1.0.0 yaml_2.2.0 pkgreviewr_0.1.2
[25] arkdb_0.0.5 withr_2.1.2 rappdirs_0.3.1 desc_1.2.0 fs_1.3.1 vctrs_0.2.0
[31] devtools_2.2.1 hms_0.5.1 bit64_0.9-7 tidyselect_0.2.5 rprojroot_1.3-2 glue_1.3.1
[37] R6_2.4.0 processx_3.4.1 fansi_0.4.0 sessioninfo_1.1.1 blob_1.2.0 purrr_0.3.3
[43] callr_3.3.2 covr_3.3.2 magrittr_1.5 backports_1.1.5 ps_1.3.0 ellipsis_0.3.0
[49] usethis_1.5.1 assertthat_0.2.1 utf8_1.1.4 stringi_1.4.3 lazyeval_0.2.2 crayon_1.3.4
On Mac:
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] devtools_2.2.1 usethis_1.5.1 magrittr_1.5 taxadb_0.0.1 readr_1.3.1 dplyr_0.8.3
loaded via a namespace (and not attached):
[1] Rcpp_1.0.2 whoami_1.3.0 prettyunits_1.0.2 ps_1.3.0 clisymbols_1.2.0 utf8_1.1.4
[7] assertthat_0.2.1 zeallot_0.1.0 rprojroot_1.3-2 digest_0.6.22 R6_2.4.0 backports_1.1.5
[13] RSQLite_2.1.2 evaluate_0.14 httr_1.4.1 pillar_1.4.2 rlang_0.4.1 progress_1.2.2
[19] curl_4.2 lazyeval_0.2.2 rstudioapi_0.10 praise_1.0.0 MonetDBLite_0.6.1 pkgreviewr_0.1.3
[25] callr_3.3.2 blob_1.2.0 rmarkdown_1.16 desc_1.2.0 stringr_1.4.0 bit_1.1-14
[31] compiler_3.6.1 lintr_2.0.0 xfun_0.10 pkgconfig_2.0.3 base64enc_0.1-3 pkgbuild_1.0.6
[37] htmltools_0.4.0 tidyselect_0.2.5 tibble_2.1.3 codetools_0.2-16 fansi_0.4.0 crayon_1.3.4
[43] dbplyr_1.4.2 withr_2.1.2 rappdirs_0.3.1 jsonlite_1.6 DBI_1.0.0 covr_3.3.2
[49] cli_1.1.0 stringi_1.4.3 fs_1.3.1 remotes_2.1.0 rex_1.1.2 testthat_2.2.1
[55] xml2_1.2.2 ellipsis_0.3.0 vctrs_0.2.0 cyclocomp_1.1.0 arkdb_0.0.5 tools_3.6.1
[61] rcmdcheck_1.3.3 bit64_0.9-7 glue_1.3.1 purrr_0.3.3 hms_0.5.1 processx_3.4.1
[67] pkgload_1.0.2 yaml_2.2.0 xmlparsedata_1.0.3 xopen_1.0.0 sessioninfo_1.1.1 memoise_1.1.0
[73] goodpractice_1.0.2 knitr_1.25
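Both by_common() examples in the review come down to comparing strings that differ only in punctuation, case, or a single letter. As a minimal sketch of the kind of tolerance requested (not taxadb's implementation), the base R code below normalizes names before comparing them and can optionally fall back to approximate matching with agrepl(). The helpers normalize_common() and match_common(), the toy table, and its identifiers are all hypothetical.

```r
# Hypothetical helpers (not part of taxadb): make common-name lookups
# insensitive to case, punctuation, and extra whitespace.
normalize_common <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]", "", x)    # drop apostrophes, hyphens, periods, etc.
  x <- gsub("[[:space:]]+", " ", x)  # collapse runs of whitespace
  trimws(x)
}

# Toy lookup table standing in for a common-name table; the IDs are made up.
common <- data.frame(
  vernacularName = c("Steller's jay", "Blue jay"),
  taxonID = c("X:1", "X:2"),
  stringsAsFactors = FALSE
)

# Exact match on normalized names by default (single query string).
# With max_distance >= 1, fall back to approximate substring matching
# that allows up to that many edits.
match_common <- function(query, table, max_distance = 0) {
  key <- normalize_common(table$vernacularName)
  q <- normalize_common(query)
  hit <- if (max_distance == 0) {
    key == q
  } else {
    agrepl(q, key, max.distance = max_distance)
  }
  table[hit, , drop = FALSE]
}

match_common("Stellers jay", common)                   # matches "Steller's jay"
match_common("Steller jay", common)                    # no exact match
match_common("Steller jay", common, max_distance = 1)  # matches within one edit
```

Normalization alone handles "Stellers jay", but "Steller jay" is missing a letter and only matches through the max_distance fallback. Because agrepl() also matches substrings, a loose threshold on real data could return unrelated names, so approximate hits would need to be checked.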
rOpenSci guidelines:
|
Awesome, thanks so much @mcsiple and @lindsayplatt. Now... @karinorman, it's back to you! |
Hi @lindsayplatt & @mcsiple, thanks for taking the time to review and for your helpful comments! A couple of general responses below about issues you both ran into, with more detailed and individualized responses to come!
Vignettes
Thanks @ldecicco-USGS for pointing out the vignette location. They are currently in the
Backend Documentation
It's clear from both of your experiences with the package that an upfront description of the package backends (i.e. database hosts) is necessary. We will expand on that in the
@lindsayplatt, backend problems are likely the source of your performance issues. Also, the freezing behavior Lindsay observed is almost surely due to the computer running low on memory. We suspect this is because |
Wow, thanks @lindsayplatt & @mcsiple for the excellent reviews; this has been really helpful in improving the docs and implementations! Very sorry about the vignettes being hard to find! While we work through this, I just wanted to drop a few links in here to the updated vignettes, since hopefully they go some way toward clarifying some of the issues you both raise, and we'd really like your feedback on those parts as well.
Really appreciate all the help! |
Great! |
@ldecicco-USGS We're also still working on a few of the comments made by both reviewers, including examples and tests for a couple of functions, better common-name examples, etc. Hoping to have that finished up in the next couple of days, but we would love responses to the above in the meantime. Thanks everyone! |
Hi all, apologies for the delayed response. Thank you for adding the extended documentation.
Which is much longer than it says the example took in the vignette. A second attempt took about the same amount of time. Besides that it is great and I liked the added length/detail.
|
@karinorman, checking in to make sure I understand correctly: we're still waiting on updates from you? |
@ldecicco-USGS Yes! Just wanted to address some of the general changes we've now made based on comments from both reviewers.
We are currently still working on a way to deal with database versioning so that multiple versions can be installed on a single machine, with the ability to move between versions. We plan to have that finished up in the next couple weeks. @mcsiple and @lindsayplatt, let us know if there is anything else you would like to discuss further and thanks again for your helpful feedback! |
Approved! Thanks @karinorman for submitting and @lindsayplatt and @mcsiple for your reviews! To-dos:
Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them
Welcome aboard! We'd love to host a blog post about your package, either a short introduction to it with one example or a longer post with some narrative about its development or something you learned, and an example of its use. If you are interested, review the instructions, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.
We've put together an online book with our best practice and tips; this chapter starts the third section, which is about guidance for after onboarding. Please tell us what could be improved; the corresponding repo is here. |
@karinorman I've invited you to the rOpenSci GitHub organization. Once you've accepted it, you can [transfer the repo](https://help.github.com/en/github/administering-a-repository/transferring-a-repository) as @ldecicco-USGS said, from the Settings tab of your repo. Once you've transferred the repo, please ping me here, so I can give you admin rights to the repo again. Thank you! |
Congratulations @karinorman! Given that the subject area of taxadb is an important one for rOpenSci and our user/developer community, would you consider writing a blog post? |
To echo @stefaniebutland's request... I think the development of this package would make a good blog story. Others may have different ideas about why |
@ldecicco-USGS Wonderful to hear - thank you so much for making this a smooth process, and thanks again to reviewers @lindsayplatt and @mcsiple for your feedback that greatly improved the package! I'm out of the office until next week but will go through the onboarding steps then. @stefaniebutland I would also be happy to write a blog post, thanks for the invitation! |
Awesome :-) and thank you. Instructions are here https://github.com/ropensci/roweb2#contributing-a-blog-post |
@stefaniebutland Would Jan 31st be good? |
Yes!! |
@lindsayplatt and @mcsiple, do you mind being acknowledged as our reviewers? |
👍 don't mind one bit! |
Hi @karinorman. Will you be able to submit a draft blog post this week? |
Eek so sorry for the delay. I don't mind being acknowledged either! |
@stefaniebutland Sorry for the lack of update! I got hit with a nasty virus last week and wasn't quite able to finish it up. Look for it in the next couple days! |
Ack! No pressure from over here @karinorman. Get well soon and submit when you're ready |
@maelle Should this thread now be closed? Also, not sure why the 'peer reviewed' badge is showing up as 'status unknown?' https://badges.ropensci.org/344_status.svg |
The badges server seems to have problems at the moment but yep this should be closed for the badge to become green I think. |
Submitting Author: Kari Norman (@karinorman)
Repository: https://github.com/cboettig/taxadb
Version submitted: v1.0.0
Editor: @ldecicco-USGS (Guest Editor)
Reviewer 1: @mcsiple
Reviewer 2: @lindsayplatt
Archive: TBD
Version accepted: TBD
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
The package accesses existing data sets, adapts them to a standard format, and provides a set of functions for accessing the resulting database.
Anyone using data including species names, especially for datasets that may have taxonomic inconsistencies.
taxize provides access to many of the same datasets via an API. Our package is specifically designed to work around the API to offer the faster access necessary for large datasets, as well as to provide a standardized format for all datasets.
Technical checks
Confirm each of the following by checking the box. This package:
Publication options
JOSS Options
paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
MEE Options
Code of conduct