Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streamline docker usage #49981

Merged
merged 6 commits into from
Dec 2, 2022
Merged

Conversation

WillAyd
Copy link
Member

@WillAyd WillAyd commented Dec 1, 2022

Took a look at this setup today and found a few things that could be improved to make it easier for new contributors. Some quick comparisons:

  • Image size current 6.1 GB vs new 2.75 GB
  • Image build time of current somewhere in the 30-45 minute range versus new at 2.5 min
  • Current image blurs the lines of responsibility between a Docker image and a container
  • Current image puts another layer of virtualization in with mamba that is arguably unnecessary with Docker

The remaining bottleneck with image creation is #48828

&& python setup.py build_ext -j 4 \
&& python -m pip install --no-build-isolation -e .
RUN python -m pip install --upgrade pip
RUN python -m pip install --use-deprecated=legacy-resolver \
Copy link
Member Author

@WillAyd WillAyd Dec 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--use-deprecated=legacy-resolver is in place because the new pip resolver seems to download multiple package versions to (I assume) check dependencies via setup.py or pyproject.yml. Some of our dependencies (ex: boto3) have a ton of releases, so it makes for a painfully slow process

Possible resolutions to use the newer solver I think are

  1. CI: add minimal requirements file #48828
  2. https://pip.pypa.io/en/stable/topics/dependency-resolution/#possible-ways-to-reduce-backtracking

So either reduce the number of dependencies, add a floor to their version or set up a constraint file once after solving

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or use mamba, which has a decent dependency resolver ...

@WillAyd WillAyd mentioned this pull request Dec 1, 2022
@mroeschke
Copy link
Member

Is this compatible with VSCode's .devcontainer.json setup? I don't use VSCode personally but not sure if that assumes the Dockerfile needs to be configured a certain way

@WillAyd
Copy link
Member Author

WillAyd commented Dec 1, 2022

Good question...at the very least it should match whatever the previous image had in terms of compatibility, but also not a VSCode user. Maybe @MarcoGorelli knows?

@MarcoGorelli
Copy link
Member

@natmokval I think you said you were using Docker? Was it via vscode? If so, just tagging in case you felt like having a look

Good question...at the very least it should match whatever the previous image had in terms of compatibility, but also not a VSCode user. Maybe @MarcoGorelli knows?

I don't remember any specific requirements in terms of compatibility when I tried it years' ago, but I'll give this a go

@MarcoGorelli
Copy link
Member

MarcoGorelli commented Dec 2, 2022

I think this line will need changing:

"python.pythonPath": "/opt/conda/bin/python",

Anyway, I tried out the rest in vscode, and it worked perfectly, well done!

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's ship this, thanks @WillAyd !

@MarcoGorelli MarcoGorelli added this to the 2.0 milestone Dec 2, 2022
@MarcoGorelli MarcoGorelli added Build Library building on various platforms Docs labels Dec 2, 2022
@MarcoGorelli MarcoGorelli merged commit 9e6bbde into pandas-dev:main Dec 2, 2022

# Configure apt and install packages
RUN apt-get update \
&& apt-get -y install --no-install-recommends apt-utils git tzdata dialog 2>&1 \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if those packages are not actually needed anymore? Especially tzdata seems to have been added on purpose to have the timezone tests working: #46219

(we copied this for the gitpod docker file as well, but so if it's no longer needed, we can clean that up there as well)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tzdata is already installed on the image. I'm guessing either from the base python image or may be included as part of the base ubuntu

@natmokval
Copy link
Contributor

@natmokval I think you said you were using Docker? Was it via vscode? If so, just tagging in case you felt like having a look

Hi @MarcoGorelli , I use docker with VSCode, but without .devcontainer.json. I run docker in terminal and mount my repository into container by -v /Users/natalia/pandas-natmokval:/home/pandas. Haven't tried dev containers so far.

@WillAyd WillAyd deleted the docker-simplify branch December 2, 2022 15:47
@WillAyd
Copy link
Member Author

WillAyd commented Dec 2, 2022

Awesome thanks @MarcoGorelli for the VSCode help

@MarcoGorelli
Copy link
Member

Is it OK to backport this so that the correct instructions show up in the stable version of the docs?

@WillAyd
Copy link
Member Author

WillAyd commented Dec 23, 2022

I am fine with that

@MarcoGorelli
Copy link
Member

@meeseeksdev please backport to 1.5.x

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Dec 23, 2022
MarcoGorelli pushed a commit that referenced this pull request Dec 23, 2022
Backport PR #49981: Streamline docker usage

Co-authored-by: William Ayd <will_ayd@innobi.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Build Library building on various platforms Docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants