-
-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Submission: comtradr #141
Comments
Sorry, meant to also post the output of
|
Editor checks:
Editor commentsThanks for your submission @ChrisMuir ! Currently seeking reviewers. It's a good fit and not overlapping. Thanks for including Reviewers: @AliciaSchep @rtaph |
thanks @AliciaSchep and @rtaph for reviewing |
Great, I look forward to the review process! @AliciaSchep and @rtaph if you have any questions for me just let me know. Thanks everyone! |
👋 @rtaph @AliciaSchep - reviews due next monday |
Package ReviewPlease check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
DocumentationThe package includes all the following forms of documentation:
Functionality
Final approval (post-review)
Estimated hours spent reviewing: 3.5 hrs Review CommentsPackage was easy and fun to use! 🎉 Function documentation was thorough and informative. I liked the example plots and manipulations in the README, although those might be more appropriate for the vignette rather than README. ExamplesExamples in package documentation did not run successfully using Package man pageThe package should ideally have a man page for the entire package, which might describe briefly what the package does. A good reference for how to make this in roxygen2 is at http://r-pkgs.had.co.nz/man.html#man-packages. TestsThe tests include many expectations each, especially for ct_search. This can make it harder to figure out exactly why the test is failing, because you get the message from the "test_that" call but then have to do some digging to figure out exactly which expectation(s) failed. My understanding is that it is better practice to have more test_that statements, so each one is reporting on a particular behavior. For example, your "ex1" and "ex2" related expectations should probably be in separate tests if those examples are trying to capture different use cases. Also, things like the following block:
should also probably be a separate test block. The first time I ran the tests, all the expectations for "ex2" failed. Trying again multiple times, it seemed to work. I am not sure what went wrong the first time, so don't really have any suggestion for that one, just figured I'd mention it. Continuous IntegrationIt would be good to include the Travis Badge on the README, and to additionally test the package on MacOs and Windows. For Mac, this can be specified in the .travis.yml file. For Windows, you can use AppVeyor. Additional minor suggestions:These suggestions could affect existing code using package and are also pretty minor so I can understand if you want to ignore! Name of country_lookup and commodity_lookup functionsMost of the package names start with "ct_", except the "country_lookup" and "commodity_lookup" functions. I think it would be nice to go fully with the convention of starting all the functions with the same prefix, and add "ct_" to those two functions as well. Argument ordering for country_lookup functionFor the country_lookup function, I think it would be nice for the order of the arguments to be re-arranged so that "type" was optional without having to write-out the "lookuptable" argument. Currently, if you want just accept the default for type, then you would need to do:
Reordering arguments for type to be last would allow you do:
Having the argument be named "countrytable" for consistency with ct_search might also be nice. Then commodity_lookup could have the argument be "commoditytable". |
Hi Alicia, Thanks so much for taking the time to review and for the great feedback! I will hold off on writing out an official reply to all review comments until @rtaph has replied with his review. In the meantime, I will start working on some of your suggestions, like creating man page, expanding CI, etc. Just a quick note on the |
Package Review
DocumentationThe package includes all the following forms of documentation:
Functionality
Final approval (post-review)
Estimated hours spent reviewing: 8.5 Review CommentsOverall ApproachLet me start by saying that this is a great package and I enjoyed reviewing it. I wish more people in the field used R, and packages like yours make it a little bit easier to do so. I like that that virtually all the functionality of the original API is available in the package, and your integration of regex into the package philosophy means that you provide a flexible approach to working with the original API from the U.N. My thoughts below may be thought of as feature upgrades rather than bugs, but I am including them here because some would break the current approach you have adopted. As a result, you could consider implementing them (if you agree with them) before a major release. It's up to you as a package author. At a high level, my main recommendation would be to simplify the package philosophy to make data access even easier and faster. I think the package could benefit from design changes that a) reduce the number of steps needed to get at the final data, and b) reduce the need to open up help files to remember how to do things or recall how things are named. I found that each time I loaded up this package, I would need to re-read the help pages and do more setup than I cared for in order to get the data I wanted. I think these difficulties can be addressed in a number of ways.
Package Help PageAs mentioned above, it would be useful to add a Form and StyleYour code was easy to follow and well annotated (thanks!). Some small thoughts:
DocumentationThe documentation is generally good with plenty of examples. A few topics I think could be added to the documention include:
ModularizationThe modularization on this package is generally great. The one thing I would split out is the part of the function that constructs the API call from the part that actually calls it. It would be nice to be able to generate the API url for troubleshooting purposes, even if this is not exported, and also save the API url along with the result (especially if caching). User IdentificationIt is good web etiquette to identify oneself when accessing APIs and scraping the web. This can be done via the user agent variable in Along the lines of [Munzert et al.](Automated Data Collection with R), I would set the user agent to something generic like: paste(Sys.info()[["user"]], R.version$version.str, version$platform, sep = ", ")
#> [1] "rtaph, R version 3.4.1 (2017-06-30), x86_64-apple-darwin15.6.0" Data FormatIs there any utility in allowing the user to specify API version checkFrom what I can tell, there is no call to check the API version (is that right?). If there is, it would be nice assert the version in the return data so that if the API changes, the user gets a warning about using an older version of NamesThis is really a matter of preference, but I would simplify by removing the option of "human readable" names in the formals, instead providing a dictionary and examples of how to do this outside of the function. This lets the user go back and forth with renaming columns once the data frame is returned. Another thought is that the "country name" header has a space, and some people may not know how to use backticks to refer to names like this. Again, maybe think about standardizing the naming convention for data frame headers, argument names, and argument parameters. Status 200 CheckWhy not have an asserter to ensure that your receive Spelling and GrammarTiny mistakes:
Unit TestsGreat test coverage! These are especially important for API wrappers since the upstream protocols can change. A few ideas on tests:
Package DataWhy not save Handling of MetadataIt would be great if the following metadata was stored somewhere along the way, including:
I think the best would be to store it as an attribute of the return object, or perhaps as a colum in the data frame. You have to be a bit careful here though because some functions outside the tidyverse drop attributes in their return objects. Alternatively, metadata can be stored in a special purpose namespace for internal reference, however this is likley to be less accessible to end users. Other
Questions
References and ResourcesArticle and Books:
|
@AliciaSchep and @rtaph, Thank you both so much for the thorough reviews! I've already begun working on suggestions made by @AliciaSchep, I will now start to address all suggestions made by @rtaph. I'll compile a reply that addresses each point made within the reviews. This may take some time, but I hope to have a reply to you both in 1-2 weeks. Also, while I'm working on package updates, I have two additional changes that I came up with recently, but have held off as I didn't want to make changes during the review process. Let me know your thoughts on these:
Thanks again! |
thanks for your reviews @AliciaSchep and @rtaph ! plan sounds good @ChrisMuir |
Hi @ChrisMuir, Happy to help. If you are only relying on the pipe, then I think your suggestion to import As for the second point-- well everyone has their opinion and you could do it in base R with if you stick away from |
Got it, thanks for sharing your thoughts on those. Definitely leaning towards implementing them both. I'll look into Thanks! |
@AliciaSchep and @rtaph, So just to update, I've been working on implementing changes and edits to Before getting into it, I just wanted to say I've learned A LOT during this review and while working on refining the package. Thank you both again for taking the time. The tl:dr version of changes is here. Overview of Major Changes
Replies to review comments
This makes good sense, the two examples using
Yeah this was originally done for CRAN submission, but I've removed the
This has been added.
I split up as many of the tests as I could, so they're now more manageable.
@rtaph 's assessment was correct, the
I've expanded Travis CI to now build on Linux and Mac, and have added AppVeyor for Windows CI (plus all the badges!).
This was a great idea, all exported functions now have the
Function
This has been addressed.
This has been fixed,
I had a question about this and adding a date to
This has been added.
Done, I've added
I thought this was great, I had always felt kinda weird about returning a list. I made this change.
This change has been made. I'm checking the response from
This has been implemented. The package now features the country reference table and commodity reference table saved as .rda files to
This is done. Upon package load, three variables are created in pkg env
I fixed this.
I've added a section in the vignette on rate limits and using function
I have the tasks of create the API url and actually make the request split into two separate functions, but they aren't as modular as what you're getting at here. I did add to
This is done, using the method that was suggested: paste(Sys.info()[["user"]], R.version$version.str, version$platform, sep = ", ")
Nope there was no upside to using the "csv" option previously. I've removed it, and thus removed arg
Is this suggestion in regard to the API version or the version of the
I edited all col names within I also edited the column names approach slightly, the args are now
I'm checking for status code 200 in function
These are fixed.
Yeah at some point after getting on CRAN I was having issues with the tests being skipped on Travis, and commenting out
Again, I'm not sure I understand. If I include the col names as package data, I assume I would also export a function for accessing and using it?
This has been done. I've added these three items as attributes to the return data frame from
See my earlier comment on adding date to
I reordered the function args in
I wasn't sure about this, and have left it as is for now. I think I prefer the way it's set up now, but that's just me (if I knew that most users disagreed, I would certainly be inclined to change it). Is this typically handled using options in the major R packages?
I initially misread this comment and am just now realizing what you're getting at, so I haven't done anything to address this. Just to be clear, and using Germany as an example, you're saying go from this: > country_df[country_df$`country name` == "Germany", ]
code country name type
101 276 Germany reporter
373 276 Germany partner to this: > country_df[country_df$`country name` == "Germany", ]
code country name reporter partner
1 276 Germany TRUE TRUE
I haven't implemented this. I like the idea, but for both of the ct_commodity_lookup(search_terms, return_code = FALSE, return_char = FALSE, verbose = TRUE, ignore.case = TRUE, ...) That seems weird to me though, to say "
In all of the testing I've done, it's never been an issue. In the cases in which there is legitimately no data to return based on the input values (and thus no error will be thrown), within internal function Edits I'm planning to make
|
Props @ChrisMuir on implementing so many major changes so quickly! I installed ropensci/comtradr@724d723 and took your package for spin. Everything works nicely! Based on your comments, I have some follow-on remarks:
Yes, my point was related to the API version (not the
Personally I like your choice of setting
Yes, I am thinking of python with my use of 'dictionary'-- namely, a list (or vector) of uniquely named key-value pairs. My point, really, was two-fold:
To illustrate with a 2-entry example # API dictionary, for internal use since the user does not need to interact
# with the the original API naming
h1 <- list(classification = "pfCode",
trade_flow = "rgDesc")
devtools::use_data(h1, internal = TRUE)
# Human-readable dictionary, exported
h2 <- list(`Classification` = "classification",
`Trade Flow` = "trade_flow")
h2 <- c(`Classification` = "classification", # or as a vector
`Trade Flow` = "trade_flow")
devtools::use_data(h2) I guess I liked that you previously had human-readable variants of names, but thought the implementation would be stronger if this mapping was exposed to the user to use as they saw fit. A dictionary would let users do something like Additionally, setting
Okay great! I just have worked with (scraped) HTML and XML data in the past where dataframe simplification arguments failed. You know the API well so if it's not a problem you have encountered, then it's a non-issue!
Ok!
Yes, exactly. It's a small change, but I think makes it easier for a new user to grasp the data structure and filter with logicals.
Interesting about the date field... I did not know about the CRAN behaviour! I defer to @sckott on adding the date field to the A few last suggestions:
|
Great work @ChrisMuir! You addressed all my comments in the review, and tests & examples now run without issue on my computer. I like all the changes you made to the function names, arguments, output format, etc. |
Hi @AliciaSchep and @rtaph , Thanks for the latest comments, I have a few more edits to make. I'm going out of town Wednesday thru Monday, so I may not finish up until next week. I will reply back when I have the next set of edits done, or if I have any further questions, though I think I have everything I need. Thanks so much! |
Hi @rtaph and @AliciaSchep , I've finished up the second round of edits and changes to the package.
I've added
I added internal pkg data to apply snake_case col headers to API data as it's returned from I've removed function
This change has been made, the country reference table has been simplified, it now has dimensions 294 x 4.
I've added this to the
This has been changed.
This was a great suggestion, I added a step within
Yeah, I've never seen "period_desc" return different values than variable "period", and I'm not sure exactly what the difference is between the two variables. However, I'm hesitant to start cutting variables from the API data, so I've kept it for now. In addition to these edits, I've updated the package vignette to go over the use of package data, and the functions related to the package data. Overall, I feel great about the status of the package (especially compared to pre-review!), if there's any additional comments/thoughts/changes I'm happy to hear them. Thanks! |
Let me start by saying great work, @ChrisMuir ! It's easy to for us reviewers to pick apart a package, much harder to implement changes. I am recommending If @sckott has any citation guidance, perhaps your README should change, but to me your proposed approach sounds reasonable. Last suggestion (I hope): Removing 'inst/doc' from your |
👍 👍 👍 Thanks!
Oh these are both great ideas, I will look into implementing them after work today. Thanks! |
So I've been working on the two vignette changes, but I'm having trouble getting the the HTML vignette included with the package. I've tried a few different things, using this page and this page as references, but I keep running into the "rdb is corrupt" issue described here and discussed here. I'll keep working at it. |
I've updated the |
Sorry for delay, was at a conference last week. Nice work on all changes. We don't have any policies specifically about relationship between review and cran submissions. Some pkgs submitted are already on cran even. It does kinda make sense to wait to submit if you think the review will be done soon :) So about DATE in citation: date is filled in on package build step, so AFAIK it's best practice to not fill that in yourself because you may forget to update it. That does mean that if a user does e.g., About the vignette: i'll check it out tmrw |
for the vignette issue: Once it's on CRAN, vignette should be installed by default, so it won't be an issue unless users are installing dev version. linking to vignette from a manual page is a constant problem, but can put a link to the online (github/CRAN) version on the man page 😄 Another option is to use pkgdown, but that's in addition to the vignette. |
Hi Scott, Thanks for the input. The package is currently on CRAN, this would just be submitting the updated version with all of the pkg review updates. I'll hold off on submission though, as the review is (I believe) close to being done. I worked more on the vignette issue tonight, I think it's now fixed and all is working properly....vignette is now available after running So I think all package edits have either been implemented or addressed 😄 |
Okay, on CRAN already, got it. Great, will have one last quick look |
Approved!
|
Great to hear! Thanks @sckott . I've transferred the package to On the blog post, I'd be interested, but my wife and I are moving to Nashville, TN in two weeks, and will be super busy for the next 3-4 weeks. So it would have to wait until after the move. One question, should I add the "peer reviewed" badge to the repo? |
Also, just want to say again to @rtaph and @AliciaSchep, I truly appreciate the help and the time that was put in. I think I learned more about building packages in the last month than I had learned in the previous two years 😄 , and the Thanks so much!! |
Great, thanks @ChrisMuir We'll get in touch with you after the next month or so about the blog post. yes! do add the peer review badge. like
|
Hi @ChrisMuir . I'm rOpenSci's Community Manager. Following up to ask if you're ready to write a blog post about For blog post we ask for a draft via pull request one week prior to the publication date. You can find technical and editorial help here: https://github.com/ropensci/roweb2#contributing-a-blog-post. Perhaps we can continue this discussion via DM on rOpenSci Slack, when you have time. |
Hi @stefaniebutland, good to hear from you! Yes I'm still interested in doing a blog post, but I'm thinking sometime in the new year might be best.....My wife and I are finally getting to a point in which we're settled after our move, we were out of town for Thanksgiving and will be visiting family for a week over Christmas.....busy busy :) I have a couple questions about the process, I will send you a DM on Slack. Thanks so much! |
Hi @AliciaSchep and @rtaph , hope you're both doing well! I just saw this rOpenSci blog post about adding reviewers to a pkg.....I've been thinking So I wanted to see if this was okay with you both. If so, let me know if you also have an orcid.org ID that you'd like me to include (like in this example). Thanks! |
Hi Chris! Sure, I appreciate the consideration! My orcid is 0000-0002-3092-3493. Cheers! |
Hi Chris, Happy to be included -- my ORCID is 0000-0002-3915-0618. Thank you! |
Awesome, this is great. Thanks to you both! |
Hi @rtaph , for your name in the |
Hi Chris,
If it's all the same, I would prefer the full name. Thanks again!
On Mar 22, 2018 20:17, "Chris Muir" <notifications@github.com> wrote:
Hi @rtaph <https://github.com/rtaph> , for your name in the DESCRIPTION
file, should I use person("Rafael", "P.H.", role = "rev", etc...), or would
you prefer I use your full last name?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#141 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AI131Q87YUt1Wa4NvmhpqkJb0pTu55K_ks5tg-q8gaJpZM4OwM1b>
.
|
Summary
API wrapper pkg for the UN Comtrade Database, which features inter-country trade data dating back to the early 1990's. Full API documentation can be found here. This package allows users to interact with the API directly from R, and features functions for making queries and importing data.
https://github.com/ChrisMuir/comtradr
Data retrieval, this is a API package.
Anyone that could benefit from easy access to historical inter-country shipping data. This can include scientists studying supply chains, economists, journalists, etc.
yours differ or meet our criteria for best-in-category?
The Comtrade website has a "Getting API Data with R" post (here), but there are a number of issues with the code:
rjson
package.I've found a few R packages online, but they're all just the code from the Comtrade tutorial wrapped up in a package. There's a Python API package, tradedownloader.
Requirements
Confirm each of the following by checking the box. This package:
Publication options
paper.md
with a high-level description in the package root or ininst/
.Detail
Does
R CMD check
(ordevtools::check()
) succeed? Paste and describe any errors or warnings:Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:
The text was updated successfully, but these errors were encountered: