Update dependencies and ensure all notebooks are working #206

Closed
bryanwweber opened this issue Jan 31, 2022 · 5 comments · Fixed by #218
@bryanwweber
Contributor

Hi dask-examples maintainers! As I'm learning Dask after starting @coiled I thought about adding an example here, or at least doing something that could work in that direction. While getting set up, I noticed that a number of the dependencies of the repo are out-of-date or not available for the aarch64/arm64 (Mac M1) platform. Specifically:

  • dask is pinned to 2.20.0, but 2022.1.1 was released last week
  • dask-image is pinned to 0.2.0 but 2021.12.0 is available
  • dask-ml is pinned to 1.6.0 but 2022.1.22 is available
  • python is pinned to 3.8 (perhaps because of pystan below?)
  • py-xgboost and dask-xgboost are either deprecated or not available
  • prophet requires pystan==2.19.1.1, but pystan 3.3.0 is released (this is a known issue with prophet)
  • pystan, httpstan, and simdjson are not available for aarch64 via conda or PyPI
  • scikit-learn is pinned to 0.23
  • scikit-image is installed via pip in the example notebook

I'd like to propose updating all these dependencies and putting them into binder/environment.yml. Before I get in too deep, especially with prophet which is likely to be the most complicated one, I'd like some feedback on whether this is desirable and which versions of dependencies (especially dask-* libraries) have known conflicts or requirements.
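As a rough illustration of how far behind the pins are, here is a minimal sketch that compares the pinned and released versions listed above. The `REPORTED` table is copied by hand from this issue and nothing queries a package index; `version_key` and `outdated` are hypothetical helper names, not part of any Dask tooling.

```python
from typing import Dict, List, Tuple

# (pinned, latest) pairs copied from the list in this issue.
# Assumption: purely numeric dotted versions, which holds for these three.
REPORTED: Dict[str, Tuple[str, str]] = {
    "dask": ("2.20.0", "2022.1.1"),
    "dask-image": ("0.2.0", "2021.12.0"),
    "dask-ml": ("1.6.0", "2022.1.22"),
}


def version_key(version: str) -> Tuple[int, ...]:
    """Turn a numeric dotted version such as '2022.1.1' into a sortable tuple."""
    return tuple(int(part) for part in version.split("."))


def outdated(pins: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return the package names whose pinned version is older than the latest."""
    return sorted(
        name
        for name, (pinned, latest) in pins.items()
        if version_key(pinned) < version_key(latest)
    )


print(outdated(REPORTED))  # ['dask', 'dask-image', 'dask-ml']
```

The tuple comparison only works for plain numeric versions; something like `packaging.version.Version` would be a safer choice if pre-release or local version tags are involved.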

On a related, but separate, note, I've been experimenting with creating a devcontainer for this repo so that developers can run a full build of the site as they're working. Essentially, this is an add-on for VS Code that mounts a local directory into a pre-configured Docker container that is started by VS Code. All commands and terminals are run inside the container, so it can more-or-less replicate the environment on GH Actions to build the examples via Sphinx and serve them locally.

I think this would be a nice helper for anyone who wanted to add a new example, but it would require committing VS Code specific configuration to the repo. Feedback on this idea is welcome as well! Thanks 😃

@bryanwweber
Contributor Author

  • prophet requires pystan==2.19.1.1, but pystan 3.3.0 is released (this is a known issue with prophet)
  • pystan, httpstan, and simdjson are not available for aarch64 via conda or PyPI

Fortunately, it seems as though prophet has the ability to use the cmdstan backend, which does not require pystan at all. At the moment, this requires manually installing dependencies, but hopefully this will improve. xref facebook/prophet#2041 and facebook/prophet#2088

@bryanwweber
Contributor Author

So I put together a devcontainer for VS Code Remote; it's living over on this branch: https://github.com/bryanwweber/dask-examples/tree/devcontainer. That makes it possible to use the VS Code Remote Development extension and run all the notebooks in a Docker container with all the dependencies already installed. That will hopefully be quite useful for anyone wanting to add/extend examples, but I don't think it needs to be merged here.

With that container, I ran pytest and there were 4 notebooks that failed, presumably due to syntax changes in Dask/Pandas since Dask v2.20? My next task here is to debug those notebooks and fix them up:

  • FAILED dataframes/02-groupby.ipynb
  • FAILED dataframes/01-data-access.ipynb
  • FAILED machine-learning/xgboost.ipynb
  • FAILED dataframes/04-reading-messy-data-into-dataframes.ipynb

One question I had for anyone following along: what to do about the Binder image, in terms of including dependencies of the Notebooks? Several dependencies are installed with pip inside the relevant Notebook example. I think I understand that we are trying to keep the image relatively small; however, I don't think it's very good practice to install with pip inside a Notebook, although I don't have specific evidence for that thought...
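One possible alternative to running pip inside a Notebook, sketched here under the assumption that the dependencies would instead be listed in binder/environment.yml: a first cell that only checks whether the required modules are importable and reports the ones that are not. `missing_modules` is a hypothetical helper, and `skimage` (the import name of scikit-image) is used purely as an example.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [
        name
        for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Example check for a notebook that needs scikit-image.
missing = missing_modules(["skimage"])
if missing:
    print("Missing from the environment:", ", ".join(missing))
```

This keeps the Notebook from modifying the environment it runs in, at the cost of a larger image if every example's dependencies are installed up front.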

@bryanwweber
Contributor Author

Another question I have is, what is the purpose of the repo2docker CI job? Does that image get uploaded anywhere, or is it just to see that repo2docker continues to work? I also note that the build-and-deploy CI job has failed on main with the last two pushes, although it succeeded on the two branches that were merged. https://github.com/dask/dask-examples/actions/runs/1786946502 https://github.com/dask/dask-examples/actions/runs/1653328956

@bryanwweber
Contributor Author

FWIW here, VS Code offers the ability to set up an external folder to store devcontainer configuration: https://code.visualstudio.com/docs/remote/create-dev-container#_alternative-repository-configuration-folders
