Add packages for everyday usage #2

Merged: 3 commits, Nov 25, 2022

fperez (Contributor) commented Nov 24, 2022

Add the full complement of scientific python packages I use on a fairly regular basis.

As per @yuvipanda's comment, I went through the various environment files in images I use, and I tried to capture all the packages I actually use in practice on a reasonably regular basis, leaving out more obscure things that might end up going stale or not actually be used.

All the packages here are things I've used in some capacity for research or teaching, either at Berkeley or in the JMTE hub, in the last couple of years.

github-actions commented

Binder 👈 Test this PR on Binder

fperez (Contributor, Author) commented Nov 24, 2022

BTW, I ran a quick check of versions and imports off the Binder link, and everything looks good so far:

Python info on this system:

3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:23:14) [GCC 10.4.0]

Current package versions

IPython                  : 8.6.0
PIL                      : 9.2.0
bokeh                    : 3.0.2
bs4                      : 4.11.1
cartopy                  : 0.21.0
cython                   : 0.29.32
dask                     : 2022.11.0
geopandas                : 0.12.1
h5py                     : 3.7.0
intake                   : 0.6.6
ipyleaflet               : 0.17.2
ipympl                   : 0.9.2
ipywidgets               : 7.7.2
jedi                     : 0.18.1
jupyter_book             : 0.13.1
jupyterlab               : 3.5.0
jupyterlab_favorites     : 3.1.0
jupyterlab_git           : 0.39.3
jupyterlab_widgets       : 1.1.1
lxml                     : 4.9.1
matplotlib               : 3.6.2
matplotlib_inline        : 0.1.6
myst_nb                  : 0.13.2
myst_parser              : 0.15.2
networkx                 : 2.8.8
numba                    : 0.56.4
numpy                    : 1.23.5
pandas                   : 1.5.2
pep8                     : 1.7.1
plotly                   : 5.11.0
pooch                    : v1.6.0
pyflakes                 : 2.5.0
pytest                   : 7.2.0
pytest_cov               : 4.0.0
requests                 : 2.28.1
scipy                    : 1.9.3
seaborn                  : 0.12.1
skimage                  : 0.19.3
sklearn                  : 1.1.3
sphinx                   : 4.5.0
sphinx_jupyterbook_latex : 0.4.7
statsmodels              : 0.13.5
sympy                    : 1.11.1
xarray                   : 2022.11.0
yaml                     : 6.0
zarr                     : 2.13.3

Thanks!
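
A version report like the one above can be produced with a few lines of Python. The following is a minimal sketch and not the script actually used for this check: the function names `collect_versions` and `format_report` are hypothetical, and it assumes each module exposes its version as `__version__`, which not every package does.

```python
import importlib


def collect_versions(module_names):
    """Map each module name to its reported version string."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The import itself is part of the check, so record the failure.
            versions[name] = "NOT INSTALLED"
            continue
        # Assumption: the module exposes __version__; otherwise use a placeholder.
        versions[name] = str(getattr(module, "__version__", "unknown"))
    return versions


def format_report(versions):
    """Render the versions as aligned 'name : version' lines, sorted by name."""
    if not versions:
        return ""
    width = max(len(name) for name in versions)
    return "\n".join(
        f"{name:<{width}} : {version}" for name, version in sorted(versions.items())
    )


if __name__ == "__main__":
    # Hypothetical subset of the import names listed above.
    print(format_report(collect_versions(["numpy", "pandas", "xarray", "yaml"])))
```

Note that the names passed in are import names (`skimage`, `sklearn`, `yaml`), which can differ from the package names used in an environment file.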

yuvipanda (Contributor) commented

Thank you so much, @fperez! We should find ways to automate this even further, and with more people being hired at 2i2c I'm hopeful that can happen :)

yuvipanda merged commit f837420 into main on Nov 25, 2022

fperez (Contributor, Author) commented Nov 25, 2022

Awesome, thanks so much! It took me a bit to figure out how to get the script to edit the YAML in place while preserving its structure (ruyaml is essentially undocumented in this respect, and no methods have docstrings, so it's kind of hunting in the dark). But the script now works well for this purpose, so we can eventually put it in a more automated pipeline if needed.
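
For readers unfamiliar with this kind of round-trip editing, here is a minimal sketch of the general idea. It is not the script from this PR. It assumes that ruyaml, as a fork of ruamel.yaml, exposes the same `YAML` round-trip class, and that the file has a top-level `dependencies` list whose items are indented two spaces; the helper name `add_dependency` is hypothetical.

```python
import io


def add_dependency(yaml_text, package):
    """Append `package` to the top-level `dependencies` list, keeping comments."""
    # Imported here so the third-party requirement of this sketch is explicit.
    from ruyaml import YAML  # assumption: same round-trip API as ruamel.yaml

    yaml = YAML()  # default round-trip mode keeps comments and key order
    # Assumed layout: list items indented two spaces under their key.
    yaml.indent(mapping=2, sequence=4, offset=2)

    data = yaml.load(yaml_text)
    deps = data["dependencies"]
    if package not in deps:
        deps.append(package)

    # Dump to an in-memory buffer and return the edited document as text.
    buffer = io.StringIO()
    yaml.dump(data, buffer)
    return buffer.getvalue()
```

The indentation settings matter: without them, the dumped file may use a different list indentation than the input, which shows up as a noisy diff even though the data is unchanged.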

BTW - building the env on the hub took forever. I timed it on my laptop, where it took 1 min 25 s, while on the hub it easily took ~20 minutes (I didn't time it precisely). I know those filesystem-intensive operations are slow, but this seemed unusually slow. Just figured I'd let you know.
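
To get a precise number on both machines, the build can be wrapped in a small timer. This is a sketch only: `time_command` is a hypothetical helper, and the command shown in the comment assumes the environment is built with `conda env create` from an `environment.yml` file.

```python
import subprocess
import sys
import time


def time_command(command):
    """Run `command` (a list of arguments) and return (exit_code, seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(command, check=False)
    elapsed = time.perf_counter() - start
    return completed.returncode, elapsed


if __name__ == "__main__":
    # Placeholder command; substitute the real environment build, for example
    # ["conda", "env", "create", "-f", "environment.yml"].
    code, seconds = time_command([sys.executable, "-c", "pass"])
    print(f"exit code {code}, {seconds:.1f} s")
```

Running the same command on local disk and on the NFS-backed home directory would give directly comparable wall-clock times.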

yuvipanda (Contributor) commented

@fperez yeah, that's sort of the performance I'd expect on NFS honestly! Another reason I recommend against putting user environments on NFS... :D

fperez (Contributor, Author) commented Nov 26, 2022

I hear you @yuvipanda! TBH I was (naïvely) expecting the gap to be a bit less brutal. In the cloud, it really seems to be horrid (on a local, "classic" linux cluster the penalties are manageable).

In any case - given this, do you have any other suggestions for users who need to build an env and keep it around? No matter what we do with the base env, users will always need to experiment with new environments, they'll have a few one-off packages they need to build, etc.

What is being recommended to other 2i2c users in general?

weiji14 deleted the fperez-packages branch on December 4, 2022, 02:42

weiji14 added a commit that referenced this pull request on Dec 5, 2022:

Bump xarray from 2022.11.0 to 2022.12.0 which contains a
[bugfix](pydata/xarray#7304) useful for reading
multiple groups from ICESat-2 HDF5 files in AWS S3 buckets.

Also taking the opportunity to sort packages alphabetically in the
`environment.yml` and reorganize some sections originally added in #2.
Hopefully this will make it clearer where new conda packages can be
added in the future!
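
The alphabetical ordering described in that commit message can be checked or reproduced mechanically. The sketch below is illustrative and not taken from the referenced commit: `package_name` and `sort_specs` are hypothetical helpers, and it assumes each dependency is a plain string such as `xarray=2022.12.0` (nested `pip:` sections are not handled).

```python
import re


def package_name(spec):
    """Return the lowercase package name from a spec such as 'xarray=2022.12.0'."""
    # Split at the first version operator or space; the name precedes it.
    return re.split(r"[=<>!~ ]", spec.strip(), maxsplit=1)[0].lower()


def sort_specs(specs):
    """Sort dependency specs alphabetically by package name, ignoring case."""
    return sorted(specs, key=package_name)
```

Sorting on the extracted name rather than the raw string keeps version pins from affecting the order.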