-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DEV: Add gitpod files #48107
DEV: Add gitpod files #48107
Conversation
|
||
# ----------------------------------------------------------------------------- | ||
# ---- Installing mamba ---- | ||
RUN wget -q -O mambaforge3.sh \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we even need to use conda / mamba within Docker? Wondering if it’s worth having an extra layer of virtualization versus simplification not going through all these steps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Maybe related), if we are not expecting users to be able to alter the dev environment when using gitpod
, maybe mamba/conda can be removed entirely? https://pythonspeed.com/articles/conda-docker-image-size/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to keep using conda / mamba, another option could also be to use micromamba, which avoids having an additional base environment (the second point mentioned in the linked blog post for why the resulting docker images become big)
And we should probably also clean up the cached downloaded packages with conda.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking we could simply pip install from requirements.txt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I put some comparisons on the mamba within Docker versus without in #49981
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I put some comparisons on the mamba within Docker versus without in #49981
Thanks, that's useful. I will comment about the usage of mamba there further.
|
||
# -------------------------------------------------------- | ||
# some useful extensions to have | ||
vscode: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this common to add for gitpod? I don’t think all of our developers use VSCode so am somewhat hesitant to have a say on what “good” extensions are
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think all of our developers use VSCode
We might rather want to look at contributors that do use VSCode, and see which extensions those typically use, to ensure we can give a good base experience here (if you don't use VSCode, you also won't have expectations about which extension should be enabled, so as long we provide a useful set, those users will be happy I think)
In any case I suppose we want to enable some extensions. For example at least the python extension? Further of course it's a bit the question how far we want to go (eg restructuredtext highlighting can be useful since our docs use that a lot, maybe cython highlighting could be added as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these seem useful, but I'd remove autodocstring
as docstrings in pandas are often created via decorators
RUN conda activate ${CONDA_ENV} && \ | ||
mamba install ccache -y && \ | ||
# needed for docs rendering later on | ||
python -m pip install --no-cache-dir sphinx-autobuild && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't be opposed to adding this to environment.yml
git config --global alias.co checkout | ||
git config --global alias.ci commit | ||
git config --global alias.st status | ||
git config --global alias.br branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add those aliases? I can imagine that those will typically different for contributors (eg I use git s
instead of git st
for status ;))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this could be quite user-specific, so at first, I thought we sould remove them.
But on second thought, I can see how for very new users, it can be useful to put something in for sprints. They can be quickly introduced to using aliases without learning how to set them, and more experienced users can over-write them easily by setting their own aliases, no?!
WDYT, keep or remove?
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
I updated the docker files after local testing. The resulting images are on: I tried to optimize using micromamba for a while, but couldn't make it work yet. For now, I prefer to have a working gitpod for the upcoming sprints. I am happy to give this another try at a later point in time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this, curious to see how people will find using it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this, curious to see how people find using it
|
||
# -------------------------------------------------------- | ||
# some useful extensions to have | ||
vscode: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these seem useful, but I'd remove autodocstring
as docstrings in pandas are often created via decorators
.gitpod.yml
Outdated
|
||
# assuming we use dockerhub: name of the docker user, docker image, tag, e.g. https://hub.docker.com/r/pandas/pandas-gitpod/tags | ||
image: | ||
file: pandas-dev/pandas-gitpod:latest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be updated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Once we have the automation, the latest tag will always have the latest image, and otherwise we will also have an archive of previous images with a date tag. See https://hub.docker.com/r/numpy/numpy-gitpod/tags. For now, I have been manually over-riding the latest tag with pushes from my local machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I wasn't very specific. I meant the "pandas-dev" org name (instead of "pythonpandas"), assuming this is the dockerhub org/image:version naming?
# are visible in the host and container. | ||
# The docker image is retrieved from the pandas dockerhub repository | ||
# | ||
# docker run --rm -it -v $(pwd):/home/pandas pandas/pandas-dev:<image-tag> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
FROM gitpod/workspace-base:latest as clone | ||
|
||
# the clone should be deep enough for versioneer to work | ||
RUN git clone https://github.com/pandas-dev/pandas --depth 12 /tmp/pandas |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is depth of 12 then deep enough? Or will this still automatically fetch tags? (otherwise you can also explicitly fetch tags in a separate command)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it should be when we last chatted about this?!
gitpod/settings.json
Outdated
"restructuredtext.linter.disabledLinters": ["doc8","rst-lint", "rstcheck"], | ||
"python.defaultInterpreterPath": "/home/gitpod/mambaforge3/envs/pandas-dev/bin/python", | ||
"esbonio.sphinx.buildDir": "${workspaceRoot}/doc/build/html", | ||
"esbonio.sphinx.confDir": "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
esbonio is not in the list of extensions in gitpod.yml, would that be needed? (or those settings removed) Or is this a default extension?
RUN mamba env create -f /tmp/environment.yml | ||
# ---- Create conda environment ---- | ||
RUN conda activate $CONDA_ENV && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RUN mamba env create -f /tmp/environment.yml | |
# ---- Create conda environment ---- | |
RUN conda activate $CONDA_ENV && \ | |
# ---- Create conda environment ---- | |
RUN mamba env create -f /tmp/environment.yml && \ | |
conda activate $CONDA_ENV && \ |
Ensuring this is done in a single step makes this more efficient because then the conda clean also happens directly in the same layer
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
I am planning to merge this, so that people can test it in practice and get some experience with it. We can consolidate/improve the docker files in subsequent PRs (eg #48108) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's give this a try
xref #47790
gitpod.docker
copies the base docker, which will speed things up for any Gitpod user, and includes a shallow clone of the repository (more speed)..profile
and.bashrc
. Some things in there are finicky to order of operations (also in relation to the GitHub action which is coming in the next PR), so I recommend avoiding rearranging that file 😉root
because there isn't atools
library. Maybe in the future, we can discuss renamingscripts
totools
to avoid adding another folder? I don't have a strong opinion on this, I just felt awkward about adding a folder there.Sidenote: I'm going on a few days vacation, and will be back to test the actions next week so I made this a draft PR for now. If someone with docker/actions experience wants to leave a quick review—awesome. When I'm back, I will be testing it with my private account until I can access a pandas docker hub account.
CC @jorisvandenbossche @mroeschke