Package maintenance #61

Open
ejolly opened this issue May 6, 2020 · 3 comments

@ejolly
Owner

ejolly commented May 6, 2020

Looking towards the future

Please contribute if you can!

Developing and maintaining this package has so far been performed primarily by myself. I've had some helpful contributions from various folks, but currently any new changes, issues, bugs, etc. ultimately fall to me.

While this is to be expected when starting an open-source project, it has become increasingly difficult to maintain. This is primarily because subtle changes in new versions of rpy2, the library that pymer4 depends upon, result in breaking changes more often than not.

Currently the issue list for rpy2 is 200+ between their GitHub repo and their Bitbucket repo. Many of these are installation issues, but many are not the fault of the rpy2 developers; rather, they stem from numerous changes to R packages (e.g. lme4) and the R language itself. While R itself doesn't change that often, version updates and how attributes, etc. are stored in various packages do (e.g. switching from lsmeans to emmeans). I don't envy their job keeping up with these changes, but I can't help but feel like I'm getting a taste of it too. The way many issues manifest for pymer4 users often entails hunting down how some data structure or function call has been updated on the R side of things (e.g. #60).

I'd love to keep maintaining this package, but I'm trying to think about the most sustainable way to do so moving forward. If you have suggestions, even ones that don't involve direct contributions, I'd love to hear them! In the meantime, an updated release will require dealing with the compatibility issues noted in #58 at minimum; some of that work has already been started in PR #62.

A few options I've considered (I'll add more as I think of them and receive suggestions):

Option 1

Create a Docker container with a known working version of pymer4, Python, R, and all their dependencies (a rough sketch follows the pros/cons below)

Pros

  • It would be relatively easy to install without any unexpected breaking changes since everything would exist in a frozen isolated environment
  • Bug fixes and additions to pymer4 would still be possible

Cons

  • There would be infrequent or no new updates from R, lme4, Python, etc.
  • Being isolated from the rest of one's system install means that any other packages or tools (e.g. Jupyter notebooks) would either have to be included in the container or linked to from outside (if possible), adding complications and preventing pymer4 from being an easy drop-in addition to one's existing analysis workflow.
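
As a rough illustration, the frozen image might look something like the following Dockerfile sketch. The base image tag and every version pin here are hypothetical and would need to be replaced with an actual known-good combination:

```dockerfile
# Hypothetical frozen environment; all pins below are illustrative, not tested.
FROM continuumio/miniconda3:4.8.2

# Pin R, lme4, and the Python scientific stack at one known-good combination
# so that upstream releases can never break the image.
RUN conda install -y -c conda-forge \
        python=3.7 r-base=3.6 r-lme4 r-lmertest \
        rpy2=3.3 pandas=1.0 numpy=1.18 && \
    pip install pymer4==0.7.0

CMD ["python"]
```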

Option 2

Rewrite the code base in a way that is robust to future changes. The current code base could already use some improvement and is very reflective of my learning trajectory in Python package development. The primary way I can see this happening now is to make direct calls to the R language by writing R code from within Python, rather than relying on rpy2 objects, methods, and classes (see the sketch after the pros/cons below).

Pros

  • This could be relatively robust, assuming there are no syntax changes in how to perform certain operations in R itself. Even if there were, those would be relatively easy to change.

Cons

  • This would take a significant amount of time for me alone, but could go faster with community contributions
  • We would still need to figure out the optimal way of extracting the required outputs from the R calls, in a manner that is similarly robust to version updates
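
To make the idea concrete, here is a minimal sketch of the string-evaluation approach, assuming rpy2 3.x with its pandas converter; the function name and the choice of what to extract are hypothetical:

```python
# Hypothetical sketch of Option 2; assumes rpy2 3.x with pandas2ri available.
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter

def fit_lmer(formula, df):
    """Fit an lme4 model by evaluating literal R source instead of rpy2 wrappers."""
    converter = ro.default_converter + pandas2ri.converter
    with localconverter(converter):
        ro.globalenv["dat"] = df  # push the pandas DataFrame into R
    # All lme4-facing logic lives in plain R strings, so only this thin
    # evaluation layer touches rpy2; R-side API changes mean editing R
    # code, not hunting through rpy2 classes and methods.
    ro.r(f"model <- lme4::lmer({formula}, data=dat)")
    with localconverter(converter):
        # Coerce to an R data.frame so the trip back to pandas is simple.
        return ro.conversion.rpy2py(
            ro.r("as.data.frame(summary(model)$coefficients)")
        )
```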

Option 3

Freeze the rpy2 dependency only, and sync major pymer4 updates with rpy2 updates (see the example pin after the pros/cons below).

Pros

  • This would make maintenance much easier

Cons

  • It kind of kicks the can down the road, as eventually big updates will have to occur to account for changes in rpy2
  • At the same time, it could delay critical updates in cases where rpy2 pushes changes because something in core R has changed
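
In practice this could be as small as a compatible-release pin in setup.py; the version numbers below are purely illustrative:

```python
# setup.py (excerpt): hypothetical pin tying each pymer4 release to one rpy2 series.
from setuptools import setup

setup(
    name="pymer4",
    install_requires=[
        "rpy2>=3.3.0,<3.4",  # bump only alongside a major pymer4 release
        "pandas",
        "numpy",
    ],
)
```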

Option 4

Archive the project in its current state 😢

Pros

  • Similar to Option 1, there would at least be a fully functional version on GitHub for the community, but it would be left up to users to figure out the best way to perform a conflict-free install on their own machines
  • Time cost is minimal

Cons

  • Clearly my least favorite option and one that essentially abandons the project
@turbach
Contributor

turbach commented May 30, 2020

Hi all, first off thanks @ejolly for all the hard work; I would like to help out.

I recently started unpinning from Py 3.6 and numpy 1.18 and moving up to pandas 1.0+ and rpy2 3+. While troubleshooting a pymer4.Lmer multiprocessing.Pool issue in 0.7.0, I may have come across the cause (robjects.NA_Character) and a simple fix related to #57; I will test further in the coming days and post about those issues separately from this.

edit: see PR #63, #64

As for general maintenance, our lab runs a (very) mixed Python and R stack on CentOS Linux that includes an open-source project, fitgrid, that depends on pymer4. So we are quite familiar with dependency whack-a-mole. Our approach is a bit different, though: we have gradually migrated away from pip and now use conda (miniconda3, conda-build) for virtual envs, installation, and packaging (and are not the only ones ... NVIDIA Rapids).

The pymer4 installation docs recommend conda installs for r-base and friends and then switch to pip for pymer4. Our past experience using conda and pip installs in the same env has not been good, and we no longer mix the two. In practice this means we only conda install into conda envs. Conda envs have some of the virtues of containerization, but the main selling point for us has been the conda install dependency solver, precisely because it helps negotiate consistent versions of Python (pandas, numpy, scipy, rpy2, jupyter, ...) and R (lme4, lmerTest, tidyverse, rstudio, ...) packages in one env.
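
For example (the package list and versions are illustrative, not a tested recipe), a single solver call can negotiate both stacks at once:

```bash
# Hypothetical single-solve env: conda resolves the Python and R sides together.
conda create -n pymer4-env -c conda-forge \
    python=3.7 pandas numpy scipy rpy2 jupyter \
    r-base r-lme4 r-lmertest
conda activate pymer4-env
```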

If you're on board with conda for R, what about conda packaging pymer4 as well? We packaged a version in-house to get our stack to work (pymer4; details on request).

We also carry the conda-ification through the CI. Instead of having Travis pip install and run pytests, we have Travis conda-build the package and then conda install it into a fresh conda environment and run the pytests there. So the CI checks the pytests in an env populated with the latest, greatest versions of whatever compatible dependencies the conda dependency solver comes up with that night, with the package installed from the same .tar.bz2 binary a user would conda install from Anaconda Cloud. For releasing, Travis deploy is set up to trigger on v.N.N.N tags, rebuild the sphinx docs, and upload the freshly built and tested packages to Anaconda Cloud and PyPI. So dropping a semantically versioned release on GitHub automagically updates and syncs the docs and repo packages with the right version number.
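
Schematically, that flow looks something like the following hypothetical .travis.yml excerpt (not our actual config; the recipe path and deploy script are assumptions):

```yaml
# Hypothetical .travis.yml excerpt sketching the build -> install -> test -> deploy flow.
install:
  - conda build ./conda -c conda-forge          # build the recipe into a .tar.bz2
  - conda create -n testenv -c local -c conda-forge pymer4
script:
  - source activate testenv                     # test the installed binary, not the source tree
  - pytest --pyargs pymer4
deploy:
  provider: script
  script: bash ./ci/deploy.sh                   # docs + Anaconda Cloud + PyPI uploads
  on:
    tags: true                                  # fires only on v.N.N.N release tags
```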

The CI also helps with dependency whack-a-mole. Travis sends an email if there is a breaking change somewhere out there in the unpinned dependency world, which is what we really want to know. We hope this doesn't happen too often. When it does, diagnosis is to check the conda list dump in the Travis log to see what has changed since the last good run. Triage the problem. If it is obvious and easily fixed in the project code, fix it. If it is hard or in some dependency we don't control, do a minimal pin to the last working version of the offending dependency(ies) in conda/meta.yaml. Either way, pull the hotfix to master and do a GitHub v.N.N.N+1 patch release to refresh the repos.
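
A minimal pin of that sort might look like this (hypothetical conda/meta.yaml excerpt; the rpy2 bound is illustrative):

```yaml
# conda/meta.yaml (excerpt): hypothetical minimal pin after an upstream break.
requirements:
  run:
    - python >=3.6
    - rpy2 <3.4     # pinned to the last known-good series; everything else floats
    - pandas
    - numpy
    - r-lme4
```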

This keeps pytests passing in Travis and in the latest release in the public repos while buying time to defer major overhauls which might be tackled better down the road, say after a big dependency like pandas does a major release. A potential advantage of minimal pinning vs. freezing and containers is that the many dependencies with non-breaking changes are allowed to evolve so the rest of the environment stays as current as possible.

Obviously none of this fixes bugs or solves the problem of keeping up with breaking changes in rpy2 and R. And I should add that this is only for Py 3.6+ and linux64. Perhaps OSX is in reach, but I don't have time for that, let alone Windows or 64-bit exotics or 32-bit anything. And there's no free lunch: conda packaging brings new problems like wrangling channel priority, and official conda packaged versions tend to lag PyPI. But my take is that conda does offload some of the dependency headaches to the conda solver, and it makes the package much more likely to mix and match and install smoothly and reliably with other packages, at least in conda envs.

My 2c. Thanks again.

Tom

@ejolly
Owner Author

ejolly commented May 30, 2020

First off @turbach WOW. I can't express enough how much I appreciate your offer to help and this detailed comment!

I haven't run across many labs that actively maintain a mixed Python and R stack, so your experience here is immensely helpful. A conda package for pymer4 was actually a long-term goal for me! However, since I don't have much experience with building a conda package, my original plan was to essentially make one by converting the pip setup I already had, based on the documentation here. After thinking about how to incorporate the R side of things, though, I realized that a quick conversion wasn't really going to be quick at all, so I didn't pursue it any further and haven't had a chance to revisit it since. The fact that your lab already has a conda package for pymer4 in house blows me away! (Side note: apologies if I'm gushing a bit here, but I'm always a little amazed when pymer4 is so actively incorporated into other libraries/setups, since I'm mostly only aware of end-users of the library. Seeing it in software stacks like yours is so galvanizing as an independent OSS developer.)

Suffice it to say I'm very much on board with conda packaging for pymer4 as well. I know many users already rely on conda for their scientific computing stacks, so this would give them the flexibility to install it into their existing environments, or create separate ones if desired. Your CI workflow sounds amazing as well! My current flow is very inefficient, especially with regard to documentation updates. For other projects I've typically used sphinx + readthedocs, but again the mixed-language nature of the project means that the RTD config now requires an R installation to build the tutorials, etc., which was a huge pain. Instead I just build the docs locally and push them to a gh-pages branch on pymer4's project repo, which redirects to my personal website URL.

I'd love to see the details of the config files and any notification hooks from Travis or receive any assistance on this. Offloading as much of this as possible, as you described, to Anaconda's solver + Travis would make things a heck of a lot easier.

On an unrelated note, I was also completely unaware of fitgrid, which looks fantastic, as does the general "mass-univariate" approach applied to EEG in Smith & Kutas, 2015. The Cosan Lab, in which I work, primarily focuses on fMRI but also does some intracranial EEG, so I'm going to share this with some lab mates who may be interested in using multi-level modeling across sensor channels. We maintain a separate library primarily for fMRI analyses called nltools, but we've had discussions about extending it in different ways, and multi-level modeling always comes up. It's exceedingly rare in fMRI settings but definitely seems important, particularly with respect to incorporating item/trial-level random-effects terms, e.g. Westfall, Nichols, & Yarkoni, 2016.

@turbach
Contributor

turbach commented May 31, 2020 via email
