Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEV: Add gitpod files #48107

Merged
merged 13 commits into from
Dec 2, 2022
Merged

DEV: Add gitpod files #48107

merged 13 commits into from
Dec 2, 2022

Conversation

noatamir
Copy link
Member

xref #47790

  1. I split the Docker file into 2 Dockers
  2. The base Docker includes the pandas environment, and has a new way to run Mamba in Docker, and builds on my machine (M1 mac).
  3. The gitpod.docker copies the base docker, which will speed things up for any Gitpod user, and includes a shallow clone of the repository (more speed).
  4. The workspace file defines some nice VS Code features, and the caches for C, C++, and the creation of .profile and .bashrc. Some things in there are finicky to order of operations (also in relation to the GitHub action which is coming in the next PR), so I recommend avoiding rearranging that file 😉
  5. I added the Gitpod files in a new library, currently under root because there isn't a tools library. Maybe in the future, we can discuss renaming scripts to tools to avoid adding another folder? I don't have a strong opinion on this, I just felt awkward about adding a folder there.

Sidenote: I'm going on a few days vacation, and will be back to test the actions next week so I made this a draft PR for now. If someone with docker/actions experience wants to leave a quick review—awesome. When I'm back, I will be testing it with my private account until I can access a pandas docker hub account.

CC @jorisvandenbossche @mroeschke


# -----------------------------------------------------------------------------
# ---- Installing mamba ----
RUN wget -q -O mambaforge3.sh \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we even need to use conda / mamba within Docker? Wondering if it’s worth having an extra layer of virtualization versus simplification not going through all these steps

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Maybe related), if we are not expecting users to be able to alter the dev environment when using gitpod, maybe mamba/conda can be removed entirely? https://pythonspeed.com/articles/conda-docker-image-size/

Copy link
Member

@jorisvandenbossche jorisvandenbossche Aug 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to keep using conda / mamba, another option could also be to use micromamba, which avoids having an additional base environment (the second point mentioned in the linked blog post for why the resulting docker images become big)

And we should probably also clean up the cached downloaded packages with conda.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking we could simply pip install from requirements.txt

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I put some comparisons on the mamba within Docker versus without in #49981

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I put some comparisons on the mamba within Docker versus without in #49981

Thanks, that's useful. I will comment about the usage of mamba there further.


# --------------------------------------------------------
# some useful extensions to have
vscode:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this common to add for gitpod? I don’t think all of our developers use VSCode so am somewhat hesitant to have a say on what “good” extensions are

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think all of our developers use VSCode

We might rather want to look at contributors that do use VSCode, and see which extensions those typically use, to ensure we can give a good base experience here (if you don't use VSCode, you also won't have expectations about which extension should be enabled, so as long we provide a useful set, those users will be happy I think)

In any case I suppose we want to enable some extensions. For example at least the python extension? Further of course it's a bit the question how far we want to go (eg restructuredtext highlighting can be useful since our docs use that a lot, maybe cython highlighting could be added as well)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these seem useful, but I'd remove autodocstring as docstrings in pandas are often created via decorators

RUN conda activate ${CONDA_ENV} && \
mamba install ccache -y && \
# needed for docs rendering later on
python -m pip install --no-cache-dir sphinx-autobuild && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't be opposed to adding this to environment.yml

@mroeschke mroeschke added the Build Library building on various platforms label Aug 16, 2022
Comment on lines +10 to +13
git config --global alias.co checkout
git config --global alias.ci commit
git config --global alias.st status
git config --global alias.br branch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add those aliases? I can imagine that those will typically different for contributors (eg I use git s instead of git st for status ;))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this could be quite user-specific, so at first, I thought we sould remove them.
But on second thought, I can see how for very new users, it can be useful to put something in for sprints. They can be quickly introduced to using aliases without learning how to set them, and more experienced users can over-write them easily by setting their own aliases, no?!
WDYT, keep or remove?

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Oct 15, 2022
@noatamir
Copy link
Member Author

noatamir commented Nov 9, 2022

I updated the docker files after local testing. The resulting images are on:
https://hub.docker.com/r/pythonpandas/pandas-dev
https://hub.docker.com/r/pythonpandas/pandas-gitpod

I tried to optimize using micromamba for a while, but couldn't make it work yet. For now, I prefer to have a working gitpod for the upcoming sprints. I am happy to give this another try at a later point in time.

@noatamir noatamir marked this pull request as ready for review November 9, 2022 22:49
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this, curious to see how people will find using it

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this, curious to see how people find using it


# --------------------------------------------------------
# some useful extensions to have
vscode:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these seem useful, but I'd remove autodocstring as docstrings in pandas are often created via decorators

.gitpod.yml Outdated

# assuming we use dockerhub: name of the docker user, docker image, tag, e.g. https://hub.docker.com/r/pandas/pandas-gitpod/tags
image:
file: pandas-dev/pandas-gitpod:latest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be updated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Once we have the automation, the latest tag will always have the latest image, and otherwise we will also have an archive of previous images with a date tag. See https://hub.docker.com/r/numpy/numpy-gitpod/tags. For now, I have been manually over-riding the latest tag with pushes from my local machine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I wasn't very specific. I meant the "pandas-dev" org name (instead of "pythonpandas"), assuming this is the dockerhub org/image:version naming?

# are visible in the host and container.
# The docker image is retrieved from the pandas dockerhub repository
#
# docker run --rm -it -v $(pwd):/home/pandas pandas/pandas-dev:<image-tag>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

FROM gitpod/workspace-base:latest as clone

# the clone should be deep enough for versioneer to work
RUN git clone https://github.com/pandas-dev/pandas --depth 12 /tmp/pandas
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is depth of 12 then deep enough? Or will this still automatically fetch tags? (otherwise you can also explicitly fetch tags in a separate command)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it should be when we last chatted about this?!

"restructuredtext.linter.disabledLinters": ["doc8","rst-lint", "rstcheck"],
"python.defaultInterpreterPath": "/home/gitpod/mambaforge3/envs/pandas-dev/bin/python",
"esbonio.sphinx.buildDir": "${workspaceRoot}/doc/build/html",
"esbonio.sphinx.confDir": ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

esbonio is not in the list of extensions in gitpod.yml, would that be needed? (or those settings removed) Or is this a default extension?

Comment on lines +89 to +91
RUN mamba env create -f /tmp/environment.yml
# ---- Create conda environment ----
RUN conda activate $CONDA_ENV && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
RUN mamba env create -f /tmp/environment.yml
# ---- Create conda environment ----
RUN conda activate $CONDA_ENV && \
# ---- Create conda environment ----
RUN mamba env create -f /tmp/environment.yml && \
conda activate $CONDA_ENV && \

Ensuring this is done in a single step makes this more efficient because then the conda clean also happens directly in the same layer

noatamir and others added 3 commits December 1, 2022 15:24
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Dec 1, 2022

I am planning to merge this, so that people can test it in practice and get some experience with it.

We can consolidate/improve the docker files in subsequent PRs (eg #48108)

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's give this a try

@jorisvandenbossche jorisvandenbossche merged commit 39e0964 into pandas-dev:main Dec 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Build Library building on various platforms Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants