Problem Description
Hi,
as a user of Visual Studio Code, I was happy to see that pandas supports a devcontainer, i.e., a way to develop and run pandas with all its dependencies in an isolated docker container with vscode integration.
My assumption is that the devcontainer configuration should allow contributors, new and experienced, to automate tedious tasks away (install dependencies, build pandas, configure vscode), and accelerate pandas development.
However, I felt that the current implementation had some shortcomings:
It did not let me, as a new contributor to pandas, jump right into pandas development: it did not build pandas automatically (or check for preconditions, such as git tags being present in the fork), and it suffered from filesystem permission issues because it was acting as root within the container.
I wasn't sure how close the environment I would be developing in would be to the one used in the GitHub Actions CI workflows, which could complicate troubleshooting when a commit works in the devcontainer but not on GitHub, or vice versa.
It didn't fully configure vscode to be compatible with pandas' style guidelines (e.g., 88 character line limit) and toolchain (e.g., pre-commit hooks).
Many of the vscode configuration settings were deprecated and due to be replaced.
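To make the first point more concrete, here is a minimal sketch of the kind of precondition check I have in mind, written in Python. It is not what PR #54845 does. The function names `list_git_tags` and `find_problems` are made up for illustration, and the assumption that release tags start with `v` is mine.

```python
import os
import subprocess


def list_git_tags(repo_dir="."):
    """Return the tag names that git reports for the repository at repo_dir."""
    result = subprocess.run(
        ["git", "tag", "--list"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def find_problems(tags, euid):
    """Return human-readable warnings for the given tag list and effective user id."""
    problems = []
    # Assumption: release tags start with "v"; a fork cloned without tags has none.
    if not any(tag.startswith("v") for tag in tags):
        problems.append("no version tags found; fetch the tags from the upstream repository")
    # An effective user id of 0 means the process runs as root.
    if euid == 0:
        problems.append("running as root; files created in the workspace may end up owned by root")
    return problems


if __name__ == "__main__":
    # os.geteuid() is only available on Unix-like systems, which fits a Linux container.
    for problem in find_problems(list_git_tags(), os.geteuid()):
        print(f"warning: {problem}")
```

Something along these lines could run when the devcontainer is first created, before attempting to build pandas, so that a missing precondition is reported up front rather than surfacing as a confusing build failure.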
Feature Description / Request for Comments
I opened PR #54845 to address some of these points, though I don't claim to have solved them completely.
@mroeschke made reference to the history of pandas' Dockerfile, e.g., #49981, where docker image size and build times were of concern.
The image size of PR #54845 stands at roughly 1.2 GB, and build time at a few minutes for the docker image itself, plus on the order of ten minutes to have it install the python mamba environment and build pandas when first launched as a vscode devcontainer.
To better understand how to improve devcontainer support for pandas contributors, I'd like to ask the following:
What is the Dockerfile use case today, aside from being part of the devcontainer setup? Which workflows is the docker image a part of, and what are their constraints?
Is there an official docker image built and published by pandas? If so, when are new builds triggered?
Alternative Solutions
Depending on the discussion, one could have, e.g.,
Separate Dockerfiles for the vscode devcontainer use case and for the other use cases.
A common Dockerfile containing largely-static OS-level setup (e.g., C++ build dependencies), while the rest that evolves more quickly in the pandas repo is left for devcontainer-specific files and/or the docker image user
A Dockerfile that contains a pre-built pandas distribution sourced from the main branch -- in this case I think one would want to automatically build and publish docker images with each merge, so that users (i.e., contributors) can fetch a ready-made image and start developing.
What are your thoughts on this?
DavidToneian changed the title from "ENH: devcontainer & docker: use patterns and improvements" to "ENH: devcontainer & docker: use cases and improvements" on Aug 29, 2023