May I contribute the complete Gapminder data set to your repo? #20
Jenny,

I've written an "R" script that joins the entire Gapminder database into a single csv file, given the Gapminder github repository of said data, which is organized as one file per variable.

Since you already have the name "gapminder" on CRAN, would you mind terribly taking ownership of this, the complete set of Gapminder data?

Thanks!

-- Jim

PS: Let me know how I can be of any further assistance.
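The joining script itself isn't reproduced in this copy of the thread, but a minimal sketch of the approach described might look like this, assuming each per-variable file is a csv holding `country`, `year`, and one value column named after its indicator (the directory and column names are illustrative, not the repository's actual layout):

```r
library(dplyr)
library(readr)

# One csv per variable, e.g. gapminder-data/lifeExp.csv with columns
# country, year, lifeExp. (Illustrative layout, per the note above.)
files <- list.files("gapminder-data", pattern = "\\.csv$", full.names = TRUE)

# Read every indicator file, then full-join them all on country/year so
# a country/year present in only some indicators is kept (NA elsewhere).
tbls <- lapply(files, read_csv, show_col_types = FALSE)
gapminder_full <- Reduce(
  function(x, y) full_join(x, y, by = c("country", "year")),
  tbls
)

write_csv(gapminder_full, "gapminder_full.csv")
```

A full join, rather than an inner join, is the safer default here: it keeps country/year combinations that appear in only some indicators instead of silently dropping them.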
If it came in, I'd want to have clean R scripts that show how it is produced from its inputs, as I've done with the small excerpt that's already here; look in the repo for those scripts. Based on the three variables I worked with, there was lots of fussing with inconsistent country names, missing and inconsistent continents, etc. I assume all this is much worse when trying to unify multiple datasets. What's the situation on that front? And what is the final file size of the csv? Thanks!
Although the file's geography is almost entirely ISO 3166-1 alpha-3 compliant, I'll need to do some "fussing" of my own to get it into full compliance. From there it can be put through conversion to whatever standard you like. The scripts to do this can be included, so the compliant file is built directly from Gapminder's github repository. If necessary it can be an "R" script that calls out to other tools to do the conversion. The problem isn't as bad as it may at first appear; click on the link I gave to the "Gapminder github repository" above.
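A sketch of what that "fussing" could look like, using the countrycode package to push stray labels into alpha-3 compliance; the data frame `raw` and its `country` column are illustrative names, not the actual Gapminder file:

```r
library(countrycode)

# Map free-form country labels to ISO 3166-1 alpha-3 codes.
raw$iso_alpha3 <- countrycode(raw$country,
                              origin      = "country.name",
                              destination = "iso3c")

# Labels countrycode could not match still need manual fixes:
unique(raw$country[is.na(raw$iso_alpha3)])
```

Once everything carries an alpha-3 code, conversion to another standard is just a second countrycode() call with a different destination.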
I will look into the feasibility of including such a compressed file. I haven't done so in the past ...
It looks like the dataset might be an acceptable size, if included as a compressed R data file. Apparently it's borderline, with 5 MB being the current ceiling these days, so I can't guarantee anything. If you want to pursue it, though, the next step would be a PR with one or more scripts that enact dataset construction and that verify or enforce the consistency of countries/continents, as is done with the current excerpt. Then I'd need to think about how to make it most useful to others, since a data frame with >500 variables is a bit unwieldy.
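Whether it clears that ceiling can be checked directly. A minimal sketch, assuming the joined data frame is called `gapminder_full` (object and file names are assumptions):

```r
# Save as a compressed R data file; xz tends to compress long, highly
# redundant tables like this one well.
save(gapminder_full, file = "gapminder_full.rdata", compress = "xz")

# Compare against the ~5 MB ceiling mentioned above.
size_mb <- file.size("gapminder_full.rdata") / 1024^2
size_mb < 5   # TRUE means it fits
```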
I'll fork/clone your repo and make the mods for your requested Pull Request. Using the R script included below, I have filtered out all rows that are not alpha-3 compliant. An 8 MB bz2 of the resulting tsv is at this link. I note your iso file has 188 country codes; the source I used to filter for compliance has 248, in this json file. I can construct a country code tsv file congruent with yours from that json file.
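The filtering script itself didn't survive in this copy of the thread. A rough sketch of the idea, assuming the json code list holds records with a `name` and an `alpha-3` field (file names, the tsv's column names, and the json schema are all assumptions):

```r
library(jsonlite)

# The 248-entry code list mentioned above, saved locally.
codes <- fromJSON("country-codes.json")
valid <- codes[["alpha-3"]]

# Keep only rows whose country code appears in the code list.
gap <- read.delim("gapminder_full.tsv", check.names = FALSE)
gap <- gap[gap$iso_alpha3 %in% valid, ]
```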
I've submitted pull request #21. Let me know if there is anything I can do to help you evaluate it and/or improve the request.
Thanks @jabowery. I'm about to enter a period of travel. If this falls off my radar, please don't hesitate to ping me here sometime next week.
While you're gone, I'll be putting it through some paces and expect to do some more cleaning. For instance, I discovered that one of the indicators has a name starting with a numeral.
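Names starting with a digit aren't syntactically valid in R, which is why this matters for a data frame's columns. A small sketch of detecting and repairing such a name; the indicator names here are invented:

```r
# make.names() prefixes "X" to names that start with a digit and
# replaces other invalid characters with ".".
nms <- c("2000_dollars_gdp", "pop", "lifeExp")
grepl("^[0-9]", nms)   # TRUE FALSE FALSE
make.names(nms)        # "X2000_dollars_gdp" "pop" "lifeExp"
```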
One pretty urgent inconsistency is that North Korea (Korea, Dem. Rep.) has an ISO code of KOR, when in fact it should be PRK.
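The fix itself is a one-line recode. A sketch, assuming a data frame with `country` and `iso_alpha3` columns (both column names are assumptions):

```r
# KOR is South Korea (Korea, Rep.); North Korea must carry PRK.
gap$iso_alpha3[gap$country == "Korea, Dem. Rep."] <- "PRK"
```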
(The country code for North Korea has been fixed in 003f98f, and I plan to make a release.) I just touched this repo for the first time in many years, so realistically I'm going to close the remaining issues and PRs; it's quite clear that I need to treat this package as "finished". I think it's just going to be a little time capsule.
I came here to comment on including the full dataset. May I suggest that you, @jabowery, spin it off into another github-only package with an appropriate name?