
ENH: devcontainer & docker: use cases and improvements #54862


Open
1 of 3 tasks
DavidToneian opened this issue Aug 29, 2023 · 0 comments
Labels
Enhancement, Needs Triage (Issue that has not been reviewed by a pandas team member)

Comments

@DavidToneian
Contributor

DavidToneian commented Aug 29, 2023

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Hi,

As a Visual Studio Code user, I was happy to see that pandas supports devcontainers, i.e., a way to develop and run pandas with all its dependencies in an isolated Docker container with VS Code integration.

My assumption is that the devcontainer configuration should let contributors, new and experienced, automate away tedious tasks (installing dependencies, building pandas, configuring VS Code) and thereby accelerate pandas development.
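To make that automation goal concrete, here is a minimal, hypothetical Python sketch of what a devcontainer post-create step could do. The script, its function names, and the exact commands are assumptions modeled on the pandas contributing guide as of mid-2023 (a `pandas-dev` environment created from `environment.yml`, followed by an editable `pip install`). They are not what PR #54845 or the pandas repository actually ships. The VS Code settings assume Black as the formatter and use the `editor.rulers` key and the `ms-python.black-formatter` extension ID.

```python
import json
import shlex
import subprocess
from pathlib import Path

# Hypothetical setup steps, modeled on the pandas contributing guide (mid-2023);
# check the current guide before relying on these exact commands.
SETUP_COMMANDS = [
    # 1. Install dependencies: create the "pandas-dev" environment from environment.yml.
    ["mamba", "env", "create", "--file", "environment.yml"],
    # 2. Build pandas: editable install inside that environment.
    ["conda", "run", "-n", "pandas-dev",
     "python", "-m", "pip", "install", "-ve", ".", "--no-build-isolation"],
]

# 3. Configure VS Code (assumed settings, not the ones from PR #54845).
VSCODE_SETTINGS = {
    # Draw a vertical guide at the 88-character line limit.
    "editor.rulers": [88],
    # Use the Black extension instead of the deprecated python.formatting.* settings.
    "[python]": {"editor.defaultFormatter": "ms-python.black-formatter"},
}


def render_vscode_settings() -> str:
    """Serialize the VS Code settings as the contents of .vscode/settings.json."""
    return json.dumps(VSCODE_SETTINGS, indent=4) + "\n"


def run_setup(workspace: Path, dry_run: bool = True) -> list[str]:
    """Run the setup steps in order, or only list them if dry_run is True.

    Returns the commands as shell-quoted strings, in execution order.
    """
    commands = [shlex.join(cmd) for cmd in SETUP_COMMANDS]
    if not dry_run:
        for cmd in SETUP_COMMANDS:
            # check=True stops at the first failing step.
            subprocess.run(cmd, cwd=workspace, check=True)
        # Overwrites any existing workspace settings file.
        settings_path = workspace / ".vscode" / "settings.json"
        settings_path.parent.mkdir(parents=True, exist_ok=True)
        settings_path.write_text(render_vscode_settings())
    return commands


if __name__ == "__main__":
    # Dry run: print the steps without executing anything.
    for line in run_setup(Path.cwd(), dry_run=True):
        print(line)
```

Keeping the steps as data makes it easy to print them in a dry run or reuse them elsewhere. In an actual devcontainer, the editor settings would more likely live under `customizations.vscode.settings` in `devcontainer.json` than in a generated `settings.json`.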

However, I felt that the current implementation had some shortcomings:

  1. It did not let me, as a new contributor to pandas, jump right into pandas development: it neither built pandas automatically nor checked preconditions such as git tags being present in the fork. It also suffered from filesystem permission issues because it ran as root inside the container.
  2. I wasn't sure how closely my development environment would match the one used in the GitHub Actions CI workflows, which could complicate troubleshooting when a commit works in the devcontainer but not on GitHub, or vice versa.
  3. It didn't fully configure VS Code for pandas' style guidelines (e.g., the 88-character line limit) and toolchain (e.g., pre-commit hooks).
  4. Many of the VS Code settings it used were deprecated and due to be replaced.
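The precondition checks from point 1 could look like the following minimal Python sketch, run for example as a post-create step. The function names are hypothetical, the remote name `upstream` in the suggested fix is an assumption based on the usual fork setup, and the script only warns rather than fixing anything. Tags matter because pandas derives its version string from git tags, so a fork cloned without them may build with a placeholder version.

```python
import os
import subprocess


def list_git_tags(repo: str = ".") -> list[str]:
    """Return the tags known to the local clone (empty if none, or if git fails)."""
    try:
        result = subprocess.run(
            ["git", "tag", "--list"], cwd=repo, capture_output=True, text=True
        )
    except FileNotFoundError:  # git is not installed
        return []
    return result.stdout.split() if result.returncode == 0 else []


def check_preconditions(tags: list[str], euid: int) -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if not tags:
        problems.append(
            "No git tags found; run `git fetch upstream --tags` so the pandas "
            "version can be derived from the repository history."
        )
    if euid == 0:
        problems.append(
            "Running as root; files created in the bind-mounted workspace will be "
            "owned by root on the host. Consider a non-root container user."
        )
    return problems


if __name__ == "__main__":
    # os.geteuid() is POSIX-only, which holds inside a Linux container.
    for problem in check_preconditions(list_git_tags(), os.geteuid()):
        print(f"WARNING: {problem}")
```

Checking `os.geteuid()` only detects the permission problem; the usual remedy is to run as a non-root user, e.g. via the `remoteUser` property in `devcontainer.json`.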

Feature Description / Request for Comments

I opened PR #54845 to address some of these points, though I don't claim to have solved them completely.

@mroeschke pointed to the history of pandas' Dockerfile, e.g., #49981, where Docker image size and build times were a concern.

The Docker image from PR #54845 is roughly 1.2 GB. Building the image itself takes a few minutes, and when the container is first launched as a VS Code devcontainer, installing the Python mamba environment and building pandas takes roughly another ten minutes.

To better understand how to improve devcontainer support for pandas contributors, I'd like to ask the following:

  1. What is the Dockerfile's use case today, aside from being part of the devcontainer setup? Which workflows is the Docker image part of, and what are their constraints?
  2. Is there an official Docker image built and published by pandas? If so, what triggers new builds?

Alternative Solutions

Depending on the discussion, one could have, e.g.,

  1. Separate Dockerfiles for the VS Code devcontainer use case and for the other use cases
  2. A common Dockerfile containing largely static OS-level setup (e.g., C++ build dependencies), leaving the faster-evolving parts of the pandas setup to devcontainer-specific files and/or the Docker image user
  3. A Dockerfile that contains a pre-built pandas distribution sourced from the main branch. In this case, I think one would want to automatically build and publish Docker images with each merge, so that users (i.e., contributors) can fetch a ready-made image and start developing.

What are your thoughts on this?

DavidToneian added the Enhancement and Needs Triage labels Aug 29, 2023
DavidToneian changed the title from "ENH: devcontainer & docker: use patterns and improvements" to "ENH: devcontainer & docker: use cases and improvements" Aug 29, 2023