Package maintenance #61
Comments
Hi all, first off thanks @ejolly for all the hard work; I would like to help out. I recently started unpinning from Py 3.6 and numpy 1.18 and moving up to pandas 1.0+ and rpy2 3+. While troubleshooting a pymer4.Lmer multiprocessing.Pool issue in 0.7.0 I may have come across the cause (robjects.NA_Character) and a simple fix related to #57; I will test further in the coming days and post about those issues separately from this.

As for general maintenance, our lab runs a (very) mixed Python and R stack on CentOS Linux that includes an open-source project, fitgrid, that depends on pymer4, so we are quite familiar with dependency whack-a-mole. Our approach is a bit different though: we have gradually migrated away from pip and now use conda (miniconda3, conda-build) for virtual envs, installation, and packaging (and we are not the only ones ... NVIDIA Rapids). The pymer4 installation docs recommend conda installs for r-base and friends and then switch to pip for pymer4 itself. Our past experience using conda and pip installs in the same env has not been good and we no longer mix the two; in practice this means we only conda install into conda envs. Conda envs have some of the virtues of containerization, but the main selling point for us has been the conda install dependency solver, precisely because it helps negotiate consistent versions of Python (pandas, numpy, scipy, rpy2, jupyter, ...) and R (lme4, lmerTest, tidyverse, rstudio, ...) packages in one env.

If you're on board with conda for R, what about conda packaging pymer4 as well? We packaged a version in house to get our stack to work (pymer4, details on request). We also carry the conda-ification through the CI. Instead of having Travis pip install and run pytests, we have Travis conda-build the package and then conda install it into a fresh conda environment and run the pytests in that. So the CI checks the pytests in an env populated with the latest, greatest versions of whatever compatible dependencies the conda dependency solver comes up with that night, with the package installed from the same .tar.bz2 binary a user would conda install from Anaconda Cloud.

For releasing, Travis deploy is set up to trigger on v.N.N.N tags, rebuild the sphinx docs, and upload the freshly built and tested packages to Anaconda Cloud and PyPI. So dropping a semantically versioned release on github automagically updates and syncs the docs and repo packages with the right version number.

The CI also helps with dependency whack-a-mole. Travis sends an email if there is a breaking change somewhere out there in the unpinned dependency world, which is what we really want to know. We hope this doesn't happen too often; when it does, diagnosis is to check the failing build and pin around the offending dependency as narrowly as possible. This keeps pytests passing in Travis and in the latest release in the public repos while buying time to defer major overhauls which might be tackled better down the road, say after a big dependency like pandas does a major release. A potential advantage of minimal pinning vs. freezing and containers is that the many dependencies with non-breaking changes are allowed to evolve, so the rest of the environment stays as current as possible.

Obviously none of this fixes bugs or solves the problem of keeping up with breaking changes in rpy2 and R. And I should add that this is only for Py 3.6+ and linux64. Perhaps OSX is in reach but I don't have time for that, let alone Windows or 64-bit exotics or 32-bit anything.
And there's no free lunch: conda packaging brings new problems, like wrangling channel priority, and official conda packaged versions tend to lag PyPI. But my take is that conda does offload some of the dependency headaches to the conda solver, and it makes the package much more likely to mix and match and install smoothly and reliably with other packages, at least in conda envs. My 2c. Thanks again. Tom
First off @turbach WOW. I can't express enough how much I appreciate your offer to help and this detailed comment! I haven't run across many labs that actively maintain a mixed Python and R stack, so your experience here is immensely helpful. A conda package for pymer4 was actually a long-term goal for me! However, since I don't have much experience with building a conda package, my original plan was to essentially make one by converting the pip setup I already had, based on the documentation here: https://docs.conda.io/projects/conda-build/en/latest/user-guide/tutorials/build-pkgs-skeleton.html. After thinking about how to incorporate the R side of things, though, I realized that a quick conversion wasn't really going to be quick at all, so I didn't pursue it any further and haven't had a chance to revisit it since. The fact that your lab already has a conda package for pymer4 in house blows me away! (Side note: apologies if I'm gushing a bit here, but I'm always a little amazed when pymer4 is so actively incorporated into other libraries/setups, since I'm mostly only aware of end-users of the library. Seeing it in software stacks like yours is so galvanizing as an independent OSS developer.)

Suffice to say I'm very much on board with conda packaging for pymer4 as well. I know many users already rely on conda for their scientific computing stacks, so this would give them the flexibility to install it into their existing environments, or create separate ones if desired. Your CI workflow seems amazing as well! My current flow is very inefficient, especially with regards to documentation updates. For other projects I've typically used sphinx + readthedocs (https://readthedocs.org/), but again the mixed-language nature of the project means that the RTD config now requires an R installation to build the tutorials etc., which was a huge pain. Instead I just build the docs locally and push them to a gh-pages branch (https://github.com/ejolly/pymer4/tree/gh-pages) on pymer4's project repo which redirects to my personal website url.

I'd love to see the details of the config files and any notification hooks from Travis, or receive any assistance on this. Offloading as much of this as possible, as you described, to Anaconda's solver + Travis would make things a heck of a lot easier.

On an unrelated note, I was also completely unaware of fitgrid, which looks fantastic, as does the general "mass-univariate" approach applied to EEG in Smith & Kutas, 2015. The Cosan lab (http://cosanlab.com/), in which I work, primarily focuses on fMRI but also some intracranial EEG, so I'm going to share with some lab mates who may be interested in using multi-level modeling across sensor channels. We maintain a separate library primarily for fMRI analyses called nltools (https://neurolearn.readthedocs.io/en/latest/), but we've had discussions about extending it in different ways and multi-level modeling always comes up. It's exceedingly rare in fMRI settings but definitely seems important, particularly with respect to incorporating item/trial-level random effects terms, e.g. Westfall, Nichols, & Yarkoni, 2016 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5428747/).
OK great, and thanks for the quick reply. It all sounds good; I wasn't sure if you were committed to pure pip installs on principle. As for going the conda route, there are some interesting choice points ... food for thought, you don't have to decide right away.
What channel does pymer4 live on? Building the package and uploading to Anaconda Cloud personal or organization channels is pretty painless. For conda-forge there are some hoops to jump through (contributing packages: https://conda-forge.org/docs/maintainer/adding_pkgs.html). And, I assume, some control is ceded to maintainers, and it may bring more responsibility. I haven't paid much attention, it's not worth the effort to me for our lab's stuff, but this is your baby, maybe you want it on conda-forge.
What architectures/platforms are supported for conda installation? Actually? Potentially? If asked, conda will autobuild package binaries for everything under the sun, but who knows what really works? So shotgun build everything in case someone with a PowerPC might discover, hey, it works? Or just build for what you really care about and actually test on? Dunno.
What about support for pip installs and PyPI? Keep uploading packages that are easy to install but difficult to make work? Or make a clean break and cut the pip cord? I used to think there was no harm in uploading to PyPI, but if easy pip installs into who-knows-what env with who-knows-what Python and R churn misery in Issues, I'm not so sure.
So these are questions about what to do. How to do it should be pretty straightforward. You already have Travis running and building conda envs, which is great. The extra bits are building the conda package (a one-liner) and adding the .travis.yml deploy bits to build docs and upload packages on command.

I haven't looked at what you're doing with the pymer4 docs yet, will check it out.
Part of the trick for syncing the packages and docs to github releases is ensuring the version numbers sprinkled around the package in odd places ... conda/meta.yaml, setup.py, docs ... etc. agree. I don't have a particularly good system for this but have whittled it down to two manual changes ... one in meta.yaml and one in pkg/__init__.py that have to agree. The rest of the complication is telling Travis when to deploy and making sure things are in order when it does ... PyPI is especially picky about version numbers, more on that if/when we get there.
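For what it's worth, here is a minimal sketch (not your actual setup.py; the file layout and regex are just assumptions) of how setup.py can read the version straight out of pkg/__init__.py so it drops off the manual edit list. conda/meta.yaml would still carry its own copy that has to be bumped to match.

```python
# setup.py -- illustrative sketch only, not pymer4's real setup script.
# Assumes a line like `__version__ = "0.7.1"` lives in pymer4/__init__.py.
import re
from setuptools import setup, find_packages

with open("pymer4/__init__.py") as f:
    # Pull the version string out of the package so setup.py never needs a manual edit
    version = re.search(r"""__version__\s*=\s*['"]([^'"]+)['"]""", f.read()).group(1)

setup(
    name="pymer4",
    version=version,
    packages=find_packages(),
)
```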
You have it exactly right for seeding the conda build for pymer4: conda skeleton pypi pymer4 ... that translates requirements.txt into conda/meta.yaml, which can be used as is or tweaked to tune the conda build. Then it's just conda build conda and finding where the binaries are hiding (or setting the output directory). Then you can install them on your local machine for testing with conda install -c local and upload to a conda channel. Or, if you want to build for other platforms, conda convert -p builds the binary for every platform conda knows about and you upload those too. That's it. It is a useful trick for pulling a PyPI package into conda if nobody else has, and it is exactly how we wound up with kutaslab/pymer4.
The rest is just automating this in the CI. For particulars, this .travis.yml does local conda building and installing during CI. The deployment scripts in ci do the conda package conversion and uploads and build and deploy the sphinx docs.

https://github.com/kutaslab/spudtr/blob/master/.travis.yml
https://github.com/kutaslab/spudtr/tree/master/ci

These are early efforts where I was trying out various things ... so overly complicated, and I can't say they are the best way. Travis has built-in integration for gh-pages deployment. I don't recall why I wanted to work around it, perhaps just to make sure I could if necessary.
So thanks for checking in. If you have questions or want to chat let me know ... we're all about Zoom these days and it can be more efficient than email for discussions.
Best,
- Tom
Looking towards the future
Please contribute if you can!
Developing and maintaining this package has so far been performed primarily by myself. I've had some helpful contributions from various folks, but currently any new changes, issues, bugs, etc. ultimately fall to me.

While this is to be expected when starting an open-source project, it has become increasingly difficult to maintain. This is primarily because subtle changes in new versions of rpy2, the library that pymer4 depends upon, result in breaking changes more often than not. Currently the issue list for rpy2 is 200+ between their github repo and their bitbucket repo. Many of these are installation issues, but many are not the fault of the rpy2 developers, but rather stem from numerous changes to R packages (e.g. lme4) and the R language itself. While R itself doesn't change that often, version updates and how attributes, etc. are stored in various packages do (e.g. switching from lsmeans to emmeans). I don't envy their job keeping up with these changes, but I can't help but feel like I'm getting a taste of it too. The way that many issues manifest for pymer4 users often entails hunting down how some data structure or function call has been updated on the R side of things (e.g. #60).

I'd love to keep maintaining this package, but I'm trying to think about the most sustainable way to do so moving forward. If you have suggestions, even if that doesn't involve direct contributions, I'd love to hear them! In the meantime an updated release will require dealing with compatibility issues noted in #58 at minimum. Some of these have already been started in PR #62.
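To give a flavor of what that hunting looks like, here's a rough illustration (not pymer4's actual code; the element names and how results convert back to Python are exactly the parts that tend to shift between rpy2 and lme4 releases):

```python
# Illustrative only: the kind of cross-language object access that goes stale.
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr

lme4 = importr("lme4")  # raises if lme4 isn't installed in the linked R

# Fit the stock lme4 example model entirely on the R side
robjects.r("m <- lme4::lmer(Reaction ~ Days + (Days | Subject), data = lme4::sleepstudy)")

# Reaching into the fitted object from Python is where breakage shows up:
# summary() returns an R list, rx2() indexes it by name, and both the names
# and the converted return types have changed across versions.
summ = robjects.r("summary(m)")
coefs = summ.rx2("coefficients")
print(robjects.r["rownames"](coefs))
```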
A few options I've considered (I'll add more as I think of them and receive suggestions):

Option 1
Create a docker container with a known working version of pymer4, Python, R, and all their dependencies.
Pros
Using pymer4 would still be possible
Cons
Locks users to specific versions of lme4, Python, etc.
No longer makes pymer4 an easy solution to simply drop in to one's existing analysis workflow.

Option 2
Rewrite the code base in a way that is robust to future changes. The current code base can already use some improvement and is very reflective of my learning trajectory in Python package development. The primary way I can see this happening now is to simply make direct calls to the R language by writing R code from within Python, rather than relying on rpy2 objects, methods, and classes (see the sketch below).
Pros
Cons
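As a rough sketch of what that string-based style could look like (this is not working pymer4 code; the data, formula, and variable names are made up), the data frame gets handed to R once and everything else is plain R source:

```python
# Illustrative only: drive lme4 with R source strings instead of rpy2 classes.
import pandas as pd
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter

# Toy data purely for illustration
df = pd.DataFrame({
    "DV": [1.2, 2.3, 3.1, 2.8, 3.9, 4.4, 1.0, 2.1],
    "IV": [0, 1, 2, 3, 0, 1, 2, 3],
    "Group": ["a", "a", "a", "a", "b", "b", "b", "b"],
})

# Hand the data frame to R once...
with localconverter(robjects.default_converter + pandas2ri.converter):
    robjects.globalenv["df"] = robjects.conversion.py2rpy(df)

# ...then everything else is plain R code that R users would recognize
robjects.r("""
library(lme4)
model <- lmer(DV ~ IV + (1 | Group), data = df)  # may warn about a singular fit on toy data
fe <- fixef(model)
""")
print(list(robjects.r("fe")))
```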
Option 3
Freeze rpy2 dependencies only and sync major pymer4 updates with rpy2 updates.
Pros
Cons
Breaking changes can still land whenever rpy2 pushes changes because something in core R changes

Option 4
Archive the project in its current state 😢
Pros
Cons