Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove pystan backend, move to cmdstanpy #2041

Closed
2 of 3 tasks
tcuongd opened this issue Oct 4, 2021 · 18 comments · Fixed by #2148
Closed
2 of 3 tasks

Remove pystan backend, move to cmdstanpy #2041

tcuongd opened this issue Oct 4, 2021 · 18 comments · Fixed by #2148

Comments

@tcuongd
Copy link
Collaborator

tcuongd commented Oct 4, 2021

Context

Pystan 3+ makes some breaking changes to Prophet python package:

  • MAP estimation is removed.
  • Windows support is removed.

We've been continuing to use Pystan 2.19.1.1, but this is only supported up to Python 3.8 and no longer receives development. Many of our issues also relate to pystan issues, which will become harder to debug over time and may need hacky patches.

The current installation process with pystan is as follows:

Mac OS / Linux

  • End users usually use prophet from PyPi.
  • End user must install pystan=2.19.1.1 first, as it is required for building the wheel
  • End user installs prophet, which includes compiling the pystan model on their machine.

Windows

  • End users usually use prophet from conda-forge.

Proposal

  • Use cmdstan + cmdstanpy to build binary distributions for Pypi (using Github Actions) and conda-forge, across all platforms.
    • Packaging pre-compiled stan models per platform + python version means the end-user does not need to perform the build process on their machine. They won't even need their own cmdstan installation since we will package the necessary executables into Prophet itself.
    • Only the lightweight wrapper library cmdstanpy is required as a dependency.
  • Remove the dependency on pystan altogether.

This should not change how end-users typically interact with the Prophet package. The general flow of

m = Prophet()
m.fit(df)
future = m.make_future_dataframe()
m.predict(future)

will remain unchanged, and all other diagnostic functions will work as normal.

It's just that behind the scenes, the model object will now be a cmdstanpy object rather than a pystan one. The argument stan_backend='PYSTAN' will also be deprecated. This will affect end users who retrieve the actual pystan object and work with it directly, but we expect this to be the minority.

We have already refactored the setup.py script to accomodate this new process (it currently supports both pystan 2.19.1.1 and cmdstanpy for backwards compatibility).
We have also built and tested PyPi wheels for macOS and Linux for Python versions 3.6-3.8. If you follow the instructions here, you can test these binary distributions on your machine.

Work required

For the next minor release (prophet 1.1.0), we should consider the following:

  • Update existing conda build scripts to use the cmdstan + cmdstanpy backend.
  • Remove the pystan model compilation from the build process and remove pystan as a Prophet dependency.
  • (Optional) Add support for a Windows wheel on PyPi.
@tcuongd tcuongd changed the title Removing pystan backend, moving to cmdstanpy Remove pystan backend, move to cmdstanpy Oct 4, 2021
@mitzimorris
Copy link
Contributor

mitzimorris commented Oct 8, 2021

excited to be the Python backend!

CmdStanPy pr stan-dev/cmdstanpy#461 changed behavoir of sampler so that CmdStanModel method sample argument show_progress defaults to True, previously default was False.

also, we're hoping that next release of CmdStanPy is going to be the 1.0 release.

@mitzimorris
Copy link
Contributor

mitzimorris commented Oct 12, 2021

FYI - CmdStanPy PR stan-dev/cmdstanpy#460 is removing deprecated functions.
I checked class IStanBackend - this won't affect it.

@mitzimorris
Copy link
Contributor

FYI, CmdStanPy 1.0.0rc1 is up. Methods, arguments, and properties which were previously deprecated have been removed. The addition of progress bars to the sample method and the cmdstan_install methods will change the console output behavoir; these progress bars use the tqdm library.

More information is available here: https://discourse.mc-stan.org/t/cmdstanpy-1-0-0rc1/24973

@tcuongd
Copy link
Collaborator Author

tcuongd commented Nov 22, 2021

Thanks heaps @mitzimorris! Sorry I haven't had the time to take a look. Do you have any examples of other packages using cmdstanpy on conda btw? It'll save me some time -- if not, all good.

Would be keen to know when the 1.0 release goes on PyPI as well! Given I've been slow on this I think we might as well wait for that and refactor the setup script.

@mitzimorris
Copy link
Contributor

answer from Brian who put together the conda package:
stan-dev/cmdstanpy#503 (comment)

@akosfurton
Copy link

@tcuongd and @mitzimorris , FYI the 1.0 release of cmdstanpy is available on pypi

https://pypi.org/project/cmdstanpy/

related discussion: stan-dev/cmdstanpy#475

@WardBrian
Copy link
Collaborator

Are you looking to compile the .stan files as part of the conda build? Or just depend on cmdstanpy and actually compile the executables at install or run time?

I'm happy to help with this if I can provide value

@akosfurton
Copy link

@WardBrian , yes that would be the idea. Would love the help since this issue is blocking a new release to pypi

Packaging pre-compiled stan models per platform + python version means the end-user does not need to perform the build process on their machine. They won't even need their own cmdstan installation since we will package the necessary executables into Prophet itself.

@WardBrian
Copy link
Collaborator

I'm not sure that it would even need to be per-Python version, since there is no direct interface between those programs and the python environment, it's all done via subshells etc.

I think this is relatively simple to do via conda - cmdstan can just be a build-time requirement and then you can build the models and store them with your package. This can be done either with cmdstanpy and then saving the executable, or actually just running the cmdstan commands directly inside bld.bat/build.sh. The only 'issue' I forsee is if you plan on using cmdstanpy as a runtime requirement, conda will install cmdstan anyway, even if you don't technically need it.

Is there an existing PR to the feedstock to tackle this?

@tcuongd
Copy link
Collaborator Author

tcuongd commented Dec 12, 2021

Thanks Brian! Ideally we'd compile the stan files as a part of the conda build. There's no PR to the feedstock yet, I haven't started on it.

@WardBrian
Copy link
Collaborator

Reading through the feedstock and setup.py, I think you may have a bit of a chicken-egg problem about when you actually remove pystan. The feedstock uses the pypi source tarball rather than something like a github tag, so it can only depend on a version which is already released to pypi.

I think it's possible to build a working cmdstanpy version based off 1.0.1 (and if you remove pip check from the tests you probably won't even need pystan), but then I think you'll want to fairly rapidly think about the next version where pystan is not the default and clean up some of the code in setup.py as it relates to cmdstan (I noticed there are already some TODO comments about this). If you remove PyStan completely I believe it makes the conda build much simpler as you'll no longer need all the run time packages in the build environment

I'll make a few attempts at updating the recipe today and see if I can get anywhere

@WardBrian
Copy link
Collaborator

I should note that I think the changes to setup.py made in #2010 will need to be re-thought or reverted. The assumptions it makes about the cmdstan path will not hold true in the conda environment (you cannot in general hard-code ~/.cmdstan as the path), and the extra work it does (rebuilding cmdstan, 'pruning' the cmdstan folder after the fact) will also be problematic I believe

I think the previous method of the very simple build_model method would work better. If you're concerned about previously not having cmdstan installed, you can call something like cmdstanpy.cmdstan_path, catch its exception and call cmdstan.install_cmdstan, and then proceed

@WardBrian
Copy link
Collaborator

Tests over at conda-forge/prophet-feedstock#21 are green, so as a proof of concept this works for 1.0.1

I think given the current state of master, this would break in the next release. The functionality in setup.py that moves/repackages CmdStan should be configurable (something like an environment flag REPKG_CMDSTAN would be great), such that disabling it would just assume you have a working cmdstanpy installation and uses that instead (like is done in 1.0.1's released version). The logic in the actual Backend class would need to check that the folder exists before setting cmdstan to that version, but that's also reasonable.

If you set it up in that way you could build wheels with it enabled, but it is disabled for conda or power users building from source. Plus, it should make testing with different CmdStan versions a lot easier if you allow the model to be arbitrarily rebuilt, so everybody wins in the end.

@tcuongd , let me know if there is anything I can do to help with that reorganization. I'm not as familiar with building wheels as I am with conda recipes, but the logic in setup.py should be pretty simple. If you'd like, we can also have the conda recipe target git tags from github to test before the release of the next version.

@akosfurton
Copy link

@WardBrian , I think that makes sense for setup.py. Please feel free to re-organize setup.py. The logic for a setup.py file is usually straightforward.

I like the environment flag to build the wheel by default, with an override if a power user wants to build from source.

@WardBrian
Copy link
Collaborator

I am currently waiting to hear back from my employer about whether the Facebook CLA is compatible with our IP policy.

A question that may be related to this change: what originated the use of a different model on windows versus unix? I believe CmdStan should be able to compile the 'unix' model on all platforms

@akosfurton
Copy link

I don't believe there was ever a compelling reason for a different Windows vs Unix model. Probably just general Windows difficulties. By pre-compiling the model vs having the user compile the model, all of the Windows issues with C++ compilers are sidestepped. If you look through the Prophet issues, there's loads of "can't install on Windows" issues related to model compilation.

If you are able to use the same model for all platforms, that would be the best way forward and easiest to maintain.

@jlopezpena
Copy link

With #2088 now merged, I guess the next step would be getting rid of the pytstan dependency and imports? Is there much work to do there or is it just a matter of removing all pystan references in the code (and eventually documentation)?

@WardBrian
Copy link
Collaborator

In #2088 we discussed the next steps. Beyond just removing the PyStan references, there is also some refactoring work which can be done since the abstraction of a “backend” is not really needed any longer. I don’t think that would need to happen right away though, it’s probably fine to leave IStanBackend (and just only have one option for it) for the time being

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants