DEV: Add gitpod files #48107

Merged
merged 13 commits into from
Dec 2, 2022
58 changes: 58 additions & 0 deletions .gitpod.yml
@@ -0,0 +1,58 @@
# Building pandas on init
# Might delegate this later to prebuild with Q2 improvements on gitpod
# https://www.gitpod.io/docs/config-start-tasks/#configuring-the-terminal
# -------------------------------------------------------------------------

# assuming we use dockerhub: name of the docker user, docker image, tag, e.g. https://hub.docker.com/r/pandas/pandas-gitpod/tags
image: pythonpandas/pandas-gitpod:latest
tasks:
  - name: Prepare development environment
    init: |
      mkdir -p .vscode
      cp gitpod/settings.json .vscode/settings.json
      conda activate pandas-dev
      git pull --unshallow  # need to force this else the prebuild fails
      git fetch --tags
      python setup.py build_ext -j 4
      python -m pip install -e . --no-build-isolation
      echo "🛠 Completed rebuilding Pandas!! 🛠 "
      echo "✨ Pre-build complete! You can close this terminal ✨ "

# --------------------------------------------------------
# exposing ports for liveserve
ports:
  - port: 5500
    onOpen: notify

# --------------------------------------------------------
# some useful extensions to have
vscode:
Member

Is this common to add for Gitpod? I don’t think all of our developers use VSCode, so I am somewhat hesitant to have a say on what “good” extensions are.

Member

> I don’t think all of our developers use VSCode

We might rather want to look at contributors who do use VSCode and see which extensions they typically use, to ensure we can give a good base experience here. (If you don't use VSCode, you also won't have expectations about which extensions should be enabled, so as long as we provide a useful set, those users will be happy, I think.)

In any case, I suppose we want to enable some extensions, for example at least the Python extension? Beyond that, it's a question of how far we want to go (e.g. reStructuredText highlighting can be useful since our docs use that a lot; maybe Cython highlighting could be added as well).

Member

these seem useful, but I'd remove autodocstring as docstrings in pandas are often created via decorators

  extensions:
    - ms-python.python
    - yzhang.markdown-all-in-one
    - eamodio.gitlens
    - lextudio.restructuredtext
    # add or remove what you think is generally useful to most contributors
    # avoid adding too many. they each open a pop-up window

# --------------------------------------------------------
# using prebuilds for the container
# With this configuration the prebuild will happen on push to main
github:
  prebuilds:
    # enable for main/default branch
    main: true
    # enable for other branches (defaults to false)
    branches: false
    # enable for pull requests coming from this repo (defaults to true)
    pullRequests: false
    # enable for pull requests coming from forks (defaults to false)
    pullRequestsFromForks: false
    # add a check to pull requests (defaults to true)
    addCheck: false
    # add a "Review in Gitpod" button as a comment to pull requests (defaults to false)
    addComment: false
    # add a "Review in Gitpod" button to the pull request's description (defaults to false)
    addBadge: false
    # add a label once the prebuild is ready to pull requests (defaults to false)
    addLabel: false
100 changes: 100 additions & 0 deletions gitpod/Dockerfile
@@ -0,0 +1,100 @@
#
# Dockerfile for pandas development
#
# Usage:
# -------
#
# To make a local build of the container, from the root of the pandas repository
# (the COPY steps below expect gitpod/ and environment.yml in the build context):
# docker build --rm -f gitpod/Dockerfile -t <build-tag> .
#
# To use the container use the following command. It assumes that you are in
# the root folder of the pandas git repository, making it available as
# /home/pandas in the container. Whatever changes you make to that directory
# are visible in the host and container.
# The docker image is retrieved from the pandas dockerhub repository
#
# docker run --rm -it -v $(pwd):/home/pandas pandas/pandas-dev:<image-tag>
Member

same here

#
# By default the container will activate the conda environment pandas-dev
# which contains all the dependencies needed for pandas development
#
# To build and install pandas run:
# python setup.py build_ext -j 4
# python -m pip install -e . --no-build-isolation
#
# This image is based on: Ubuntu 20.04 (focal)
# https://hub.docker.com/_/ubuntu/?tab=tags&name=focal
# OS/ARCH: linux/amd64
FROM gitpod/workspace-base:latest

ARG MAMBAFORGE_VERSION="22.9.0-1"
ARG CONDA_ENV=pandas-dev
ARG PANDAS_HOME="/home/pandas"


# ---- Configure environment ----
ENV CONDA_DIR=/home/gitpod/mambaforge3 \
SHELL=/bin/bash
ENV PATH=${CONDA_DIR}/bin:$PATH \
WORKSPACE=/workspace/pandas

# -----------------------------------------------------------------------------
# ---- Creating as root - note: make sure to change to gitpod in the end ----
USER root

# Avoid warnings by switching to noninteractive
ENV DEBIAN_FRONTEND=noninteractive

# Configure apt and install packages
RUN apt-get update \
&& apt-get -y install --no-install-recommends apt-utils dialog 2>&1 \
#
# Install tzdata and configure timezone (fix for tests which try to read from "/etc/localtime")
&& apt-get -y install tzdata \
&& ln -fs /usr/share/zoneinfo/Etc/UTC /etc/localtime \
&& dpkg-reconfigure -f noninteractive tzdata \
#
# Verify git, process tools, lsb-release (common in install instructions for CLIs) installed
&& apt-get -y install git iproute2 procps lsb-release \
#
# cleanup
&& apt-get autoremove -y \
&& apt-get clean -y \
&& rm -rf /var/lib/apt/lists/*

# Switch back to dialog for any ad-hoc use of apt-get
ENV DEBIAN_FRONTEND=dialog

# Allows this Dockerfile to activate conda environments
SHELL ["/bin/bash", "--login", "-o", "pipefail", "-c"]

# -----------------------------------------------------------------------------
# ---- Installing mamba ----
RUN wget -q -O mambaforge3.sh \
Member

Do we even need to use conda / mamba within Docker? Wondering if it’s worth having an extra layer of virtualization versus the simplification of not going through all these steps.

Member

(Maybe related), if we are not expecting users to be able to alter the dev environment when using gitpod, maybe mamba/conda can be removed entirely? https://pythonspeed.com/articles/conda-docker-image-size/

Member @jorisvandenbossche, Aug 16, 2022

If we want to keep using conda / mamba, another option could also be to use micromamba, which avoids having an additional base environment (the second point mentioned in the linked blog post for why the resulting docker images become big)

And we should probably also clean up the cached downloaded packages with conda.

Member

I was thinking we could simply pip install from requirements.txt

Member

FWIW I put some comparisons on the mamba within Docker versus without in #49981

Member

> FWIW I put some comparisons on the mamba within Docker versus without in #49981

Thanks, that's useful. I will comment about the usage of mamba there further.

"https://github.com/conda-forge/miniforge/releases/download/$MAMBAFORGE_VERSION/Mambaforge-$MAMBAFORGE_VERSION-Linux-x86_64.sh" && \
bash mambaforge3.sh -p ${CONDA_DIR} -b && \
rm mambaforge3.sh

# -----------------------------------------------------------------------------
# ---- Copy needed files ----
# basic workspace configurations
COPY ./gitpod/workspace_config /usr/local/bin/workspace_config

RUN chmod a+rx /usr/local/bin/workspace_config && \
workspace_config

# Copy the environment file into the container to create a conda environment from it
COPY environment.yml /tmp/environment.yml

RUN mamba env create -f /tmp/environment.yml
# ---- Create conda environment ----
RUN conda activate $CONDA_ENV && \
Comment on lines +89 to +91
Member

Suggested change
- RUN mamba env create -f /tmp/environment.yml
- # ---- Create conda environment ----
- RUN conda activate $CONDA_ENV && \
+ # ---- Create conda environment ----
+ RUN mamba env create -f /tmp/environment.yml && \
+ conda activate $CONDA_ENV && \

Ensuring this is done in a single step makes this more efficient because then the conda clean also happens directly in the same layer

mamba install ccache -y && \
# needed for docs rendering later on
python -m pip install --no-cache-dir sphinx-autobuild && \
Member

I wouldn't be opposed to adding this to environment.yml

conda clean --all -f -y && \
rm -rf /tmp/*

# -----------------------------------------------------------------------------
# Always make sure we are not root
USER gitpod
46 changes: 46 additions & 0 deletions gitpod/gitpod.Dockerfile
@@ -0,0 +1,46 @@
# Doing a local shallow clone - keeps the container secure
# and much slimmer than using COPY directly or making a
# remote clone
ARG BASE_CONTAINER="pythonpandas/pandas-dev:latest"
FROM gitpod/workspace-base:latest as clone

# the clone should be deep enough for versioneer to work
RUN git clone https://github.com/pandas-dev/pandas --depth 12 /tmp/pandas
Member

Is depth of 12 then deep enough? Or will this still automatically fetch tags? (otherwise you can also explicitly fetch tags in a separate command)

Member Author

I thought it should be when we last chatted about this?!
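
To make the depth question concrete, here is a minimal shell sketch that is not part of this PR. It builds a throwaway repository (the temporary paths, the tag name `v1.0`, and the commit counts are all made up for illustration) and shows that `git describe --tags` finds nothing when the nearest tag lies beyond the clone depth, and that it works again after the `--unshallow` and `--tags` fetches used in the `.gitpod.yml` init task. It assumes `git` is installed and that cloning over `file://` is permitted.

```shell
set -eu

work=$(mktemp -d)
upstream="$work/upstream"
shallow="$work/shallow"

# Throwaway "upstream": one tagged commit followed by 20 untagged commits.
git init -q "$upstream"
git -C "$upstream" config user.name "Example Dev"
git -C "$upstream" config user.email "dev@example.com"
git -C "$upstream" commit -q --allow-empty -m "release"
git -C "$upstream" tag v1.0
i=1
while [ "$i" -le 20 ]; do
    git -C "$upstream" commit -q --allow-empty -m "commit $i"
    i=$((i + 1))
done

# Shallow clone with the same depth as the Dockerfile; the tagged commit
# is 20 commits back, so it is not part of the clone.
git clone -q --depth 12 "file://$upstream" "$shallow"
if git -C "$shallow" describe --tags >/dev/null 2>&1; then
    shallow_described=yes
else
    shallow_described=no
fi

# Recovery along the lines of the init task: full history first, then tags.
git -C "$shallow" fetch -q --unshallow
git -C "$shallow" fetch -q --tags
full_description=$(git -C "$shallow" describe --tags)

echo "shallow clone describable: $shallow_described"
echo "after unshallow + tags:    $full_description"
rm -rf "$work"
```

In this toy setup the shallow clone cannot be described at all, which is the situation a tag-based version lookup would run into whenever the most recent release tag is more than 12 commits behind the tip.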


# -----------------------------------------------------------------------------
# Using the pandas-dev Docker image as a base
# This way, we ensure we have all the needed compilers and dependencies
# while reducing the build time
FROM ${BASE_CONTAINER} as build

# -----------------------------------------------------------------------------
USER root

# -----------------------------------------------------------------------------
# ---- ENV variables ----
# ---- Directories needed ----
ENV WORKSPACE=/workspace/pandas/ \
CONDA_ENV=pandas-dev

# Allows this Dockerfile to activate conda environments
SHELL ["/bin/bash", "--login", "-o", "pipefail", "-c"]

# Copy over the shallow clone
COPY --from=clone --chown=gitpod /tmp/pandas ${WORKSPACE}

# Everything happens in the /workspace/pandas directory
WORKDIR ${WORKSPACE}

# Build pandas to populate the cache used by ccache
RUN git config --global --add safe.directory /workspace/pandas
RUN conda activate ${CONDA_ENV} && \
python setup.py build_ext --inplace && \
ccache -s

# Gitpod will load the repository into /workspace/pandas. We remove the
# directory from the image to prevent conflicts
RUN rm -rf ${WORKSPACE}

# -----------------------------------------------------------------------------
# Always return to non privileged user
USER gitpod
6 changes: 6 additions & 0 deletions gitpod/settings.json
@@ -0,0 +1,6 @@
{
"restructuredtext.updateOnTextChanged": "true",
"restructuredtext.updateDelay": 300,
"restructuredtext.linter.disabledLinters": ["doc8","rst-lint", "rstcheck"],
"python.defaultInterpreterPath": "/home/gitpod/mambaforge3/envs/pandas-dev/bin/python"
}
54 changes: 54 additions & 0 deletions gitpod/workspace_config
@@ -0,0 +1,54 @@
#!/bin/bash
# Basic configurations for the workspace

set -e

# gitpod/workspace-base needs at least one file here
touch /home/gitpod/.bashrc.d/empty

# Add git aliases
git config --global alias.co checkout
git config --global alias.ci commit
git config --global alias.st status
git config --global alias.br branch
Comment on lines +10 to +13
Member

Should we add those aliases? I can imagine that those will typically be different for contributors (e.g. I use git s instead of git st for status ;))

Member Author

I agree that this could be quite user-specific, so at first I thought we should remove them.
But on second thought, I can see how, for very new users, it can be useful to put something in for sprints. They can be quickly introduced to using aliases without learning how to set them, and more experienced users can easily overwrite them by setting their own aliases, no?!
WDYT, keep or remove?
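
As a small illustration of the "experienced users can overwrite them" point, the sketch below sets one of the proposed defaults and then replaces it. It is only a sketch: it points `HOME` and `XDG_CONFIG_HOME` at a temporary directory so that no real `~/.gitconfig` is touched, and the replacement values (`status --short`, the `s` alias) are arbitrary examples, not something proposed in this PR.

```shell
set -eu

# Use a throwaway HOME so the real global git config is left alone.
HOME=$(mktemp -d)
XDG_CONFIG_HOME="$HOME/.config"
export HOME XDG_CONFIG_HOME

# Default as shipped by the workspace script.
git config --global alias.st status

# A contributor with other preferences sets the alias again;
# the new value replaces the old one.
git config --global alias.st "status --short"
git config --global alias.s status

st_alias=$(git config --global --get alias.st)
s_alias=$(git config --global --get alias.s)
echo "git st -> git $st_alias"
echo "git s  -> git $s_alias"
```

Setting an alias a second time replaces the earlier value, so shipping defaults does not lock anyone in.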

git config --global alias.hist "log --pretty=format:'%h %ad | %s%d [%an]' --graph --date=short"
git config --global alias.type 'cat-file -t'
git config --global alias.dump 'cat-file -p'

# Enable basic vim defaults in ~/.vimrc
echo "filetype plugin indent on" >>~/.vimrc
echo "set colorcolumn=80" >>~/.vimrc
echo "set number" >>~/.vimrc
echo "syntax enable" >>~/.vimrc

# Vanity custom bash prompt - makes it more legible
echo "PS1='\[\e]0;\u \w\a\]\[\033[01;36m\]\u\[\033[m\] > \[\033[38;5;141m\]\w\[\033[m\] \\$ '" >>~/.bashrc

# Enable prompt color in the skeleton .bashrc
# hadolint ignore=SC2016
sed -i 's/^#force_color_prompt=yes/force_color_prompt=yes/' /etc/skel/.bashrc

# .gitpod.yml is configured to install pandas from /workspace/pandas
echo "export PYTHONPATH=${WORKSPACE}" >>~/.bashrc

# make conda activate command available from /bin/bash (login and interactive)
if [[ ! -f "/etc/profile.d/conda.sh" ]]; then
ln -s ${CONDA_DIR}/etc/profile.d/conda.sh /etc/profile.d/conda.sh
fi
echo ". ${CONDA_DIR}/etc/profile.d/conda.sh" >>~/.bashrc
echo "conda activate pandas-dev" >>~/.bashrc

# Enable prompt color in the skeleton .bashrc
# hadolint ignore=SC2016
sed -i 's/^#force_color_prompt=yes/force_color_prompt=yes/' /etc/skel/.bashrc

# .gitpod.yml is configured to install pandas from /workspace/pandas
echo "export PYTHONPATH=/workspace/pandas" >>~/.bashrc

# Set up ccache for compilers for this Dockerfile
# REF: https://github.com/conda-forge/compilers-feedstock/issues/31
echo "conda activate pandas-dev" >>~/.startuprc
echo "export CC=\"ccache \$CC\"" >>~/.startuprc
echo "export CXX=\"ccache \$CXX\"" >>~/.startuprc
echo "source ~/.startuprc" >>~/.profile
echo "source ~/.startuprc" >>~/.bashrc
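
Because `~/.startuprc` is sourced from both `~/.profile` and `~/.bashrc`, it may be worth making the `ccache` wrapping idempotent. The following is a hedged sketch, not what the PR does: `wrap_with_ccache` is a hypothetical helper, and `gcc` / `g++` are placeholder values standing in for whatever the activated conda environment exports as `CC` and `CXX`.

```shell
# Prefix the named compiler variable with "ccache" unless it is unset,
# empty, or already wrapped. (Hypothetical helper, for illustration only.)
wrap_with_ccache() {
    var_name=$1
    eval "current=\${$var_name:-}"
    case "$current" in
        "" | ccache\ *) ;;   # nothing to wrap, or already wrapped
        *) export "$var_name=ccache $current" ;;
    esac
}

# Placeholder compiler names; a conda compiler package would set these.
CC="gcc"
CXX="g++"

wrap_with_ccache CC
wrap_with_ccache CXX
wrap_with_ccache CC   # a second call leaves the value unchanged

echo "CC=$CC"
echo "CXX=$CXX"
```

Whether double sourcing actually produces a doubled prefix in the Gitpod image depends on how the base image's `.profile` is written and on whether re-running `conda activate` resets `CC`; neither is established in this thread, so treat the guard as a precaution.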