rdata, read R datasets from Python

Submitting Author: (@vnmabus)
All current maintainers: (@vnmabus)
Package Name: rdata
One-Line Description of Package: Read R datasets from Python.
Repository Link: https://github.com/vnmabus/rdata 
Version submitted: 0.9.2.dev1 
Editor: @isabelizimm   
Reviewer 1: @rich-iannone
Reviewer 2: @has2k1
Archive: [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10776842.svg)](https://doi.org/10.5281/zenodo.10776842)
JOSS DOI: [![DOI](https://joss.theoj.org/papers/10.21105/joss.07540/status.svg)](https://doi.org/10.21105/joss.07540)
Version accepted: [0.11.0](https://github.com/vnmabus/rdata/releases/tag/0.11.0)
Date accepted (month/day/year): 2/29/2024

---

## Code of Conduct & Commitment to Maintain Package

- [x] I agree to abide by [pyOpenSci's Code of Conduct][PyOpenSciCodeOfConduct] during the review process and in maintaining my package afterwards, should it be accepted.
- [x] I have read and will commit to package maintenance after the review as per the [pyOpenSci Policies Guidelines][Commitment].

## Description

- The rdata package parses `.rda` and `.rds` files, which contain serialized R objects, and converts them to Python objects. Users can influence this conversion and provide conversion routines for custom classes.
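
As a rough illustration of the custom conversion hook, here is a minimal sketch. It assumes the `rdata.parser.parse_file`, `rdata.conversion.convert`, and `rdata.conversion.DEFAULT_CLASS_MAP` names and the `(obj, attrs)` constructor signature shown in the package README at the time of writing. The helper names `factor_to_labels` and `load_with_custom_factor` are hypothetical and exist only for this example.

```python
from typing import Any, Mapping, Sequence


def factor_to_labels(obj: Sequence[int], attrs: Mapping[str, Any]) -> list:
    """Turn R factor codes (1-based) into their level labels.

    Assumption: the converter calls this with the already-converted
    integer codes and a mapping of R attributes that has a "levels" entry.
    """
    levels = attrs["levels"]
    # Codes below 1 are treated as missing values in this sketch.
    return [levels[code - 1] if code >= 1 else None for code in obj]


def load_with_custom_factor(path: str) -> Any:
    """Parse an R data file and convert it with the custom factor routine."""
    # Imported lazily so the helper above can be exercised without rdata.
    import rdata

    # Start from the default class map and override only the "factor" entry.
    class_map = {
        **rdata.conversion.DEFAULT_CLASS_MAP,
        "factor": factor_to_labels,
    }
    parsed = rdata.parser.parse_file(path)
    return rdata.conversion.convert(parsed, class_map)
```

Extending the default map, rather than replacing it, keeps the built-in handling for every other R class while changing only how factors are converted.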


## Scope

- Please indicate which category or categories. 
Check out our [package scope page][PackageCategories] to learn more about our 
scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):

	- [ ] Data retrieval
	- [ ] Data extraction
	- [x] Data processing/munging
	- [ ] Data deposition
	- [ ] Data validation and testing
	- [ ] Data visualization[^1]
	- [ ] Workflow automation
	- [ ] Citation management and bibliometrics
	- [ ] Scientific software wrappers
	- [ ] Database interoperability

Domain Specific & Community Partnerships 

	- [ ] Geospatial
	- [ ] Education
	- [ ] Pangeo
	

## Community Partnerships
If your package is associated with an 
existing community please check below:

- [ ] [Pangeo][pangeoWebsite]
	- [ ] My package adheres to the [Pangeo standards listed in the pyOpenSci peer review guidebook][PangeoCollaboration]

> [^1]: Please fill out a pre-submission inquiry before submitting a data visualization package.

- **For all submissions**, explain how and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):

Its main purpose is to read `.rda` and `.rds` files, the formats used for storing data in the R programming language, and convert their contents to Python objects for further processing.

  - Who is the target audience and what are scientific applications of this package?

The target audience includes users who want to open, in Python, datasets created in R. These include scientists working in both Python and R, scientists who want to compare results between the two languages using the same data, and Python users who simply want access to the numerous datasets available on CRAN, the R package repository.

  - Are there other Python packages that accomplish the same thing? If so, how does yours differ?

The package rpy2 can be used to interact with R from Python. This includes the ability to load data in the RData format and to convert these data to equivalent Python objects. Although this is arguably the best package for interaction between the two languages, it has several disadvantages if one wants to use it just to load RData datasets. First, the package requires an R installation, as it relies on launching an R interpreter and communicating with it. Second, launching R just to load data is inefficient, both in time and memory. Finally, the package inherits the GPL license from the R language, whose copyleft terms are more restrictive than the permissive licenses under which most Python packages are released.

The more recent package pyreadr also provides functionality to read some R datasets. It relies on the C library librdata to parse the RData format. This adds a dependency on C build tools and requires the package to be compiled for every target operating system. Moreover, the package is limited to the functionality available in librdata, which at the time of writing does not include parsing of common objects such as R lists and S4 objects. The license can also be a problem, as it belongs to the GPL family, whose copyleft terms restrict use in proprietary software.

  - If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or `@tag` the editor you contacted:

https://github.com/pyOpenSci/software-submission/issues/143

## Technical checks

For details about the pyOpenSci packaging requirements, see our [packaging guide][PackagingGuide]. Confirm each of the following by checking the box. This package:

- [x] does not violate the Terms of Service of any service it interacts with. 
- [x] uses an [OSI approved license][OsiApprovedLicense].
- [x] contains a README with instructions for installing the development version. 
- [x] includes documentation with examples for all functions.
- [x] contains a tutorial with examples of its essential functions and uses.
- [x] has a test suite.
- [x] has continuous integration set up, such as GitHub Actions, CircleCI, and/or others.

## Publication Options

- [x] Do you wish to automatically submit to the [Journal of Open Source Software][JournalOfOpenSourceSoftware]? If so:

<details>
 <summary>JOSS Checks</summary>  

- [x] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
- [x] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
- [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`.
- [x] The package is deposited in a long-term repository with the DOI: 10.5281/zenodo.6382237

*Note: JOSS accepts our review as theirs. You will NOT need to go through another full review. JOSS will only review your paper.md file. Be sure to link to this pyOpenSci issue when a JOSS issue is opened for your package. Also be sure to tell the JOSS editor that this is a pyOpenSci reviewed package once you reach this step.*
  
</details>

## Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PRs rather than submitting a denser text-based review. It will also allow you to demonstrate addressing the issue via PR links.

- [x] Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Confirm each of the following by checking the box.

- [x] I have read the [author guide](https://www.pyopensci.org/software-peer-review/how-to/author-guide.html). 
- [x] I expect to maintain this package for at least 2 years and can help find a replacement for the maintainer (team) if needed.

## Please fill out our survey

- [x] [Last but not least, please fill out our pre-review survey](https://forms.gle/F9mou7S3jhe8DMJ16). This helps us track
submissions and improve our peer review process. We will also ask our reviewers
and editors to fill this out.

**P.S.** Have feedback/comments about our review process? Leave a comment [here][Comments]

- I received feedback in the presubmission inquiry indicating that the package could benefit from more detailed examples in vignette format. The infrastructure is in place (I use sphinx-gallery to create the online notebooks), but I would like to know more concretely which examples you think would be most beneficial to highlight.
- Although I want to submit the paper to JOSS if possible, I have not written the paper file yet (I hope this is not necessary at this step).

## Editor and Review Templates

The [editor template can be found here][Editor Template].

The [review template can be found here][Review Template].

[PackagingGuide]: https://www.pyopensci.org/python-package-guide/

[PackageCategories]: https://www.pyopensci.org/software-peer-review/about/package-scope.html

[JournalOfOpenSourceSoftware]: http://joss.theoj.org/

[JossSubmissionRequirements]: https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements

[JossPaperRequirements]: https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain

[PyOpenSciCodeOfConduct]: https://www.pyopensci.org/governance/CODE_OF_CONDUCT

[OsiApprovedLicense]: https://opensource.org/licenses

[Editor Template]: https://www.pyopensci.org/software-peer-review/appendices/templates.html#editor-s-template

[Review Template]: https://www.pyopensci.org/software-peer-review/appendices/templates.html#peer-review-template

[Comments]: https://pyopensci.discourse.group/

[PangeoCollaboration]: https://www.pyopensci.org/software-peer-review/partners/pangeo

[pangeoWebsite]: https://www.pangeo.io
[Commitment]: https://www.pyopensci.org/software-peer-review/our-process/policies.html#after-acceptance-package-ownership-and-maintenance

