Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: optimize docker-compose up for faster boot time #27953

Merged
merged 6 commits into from
Apr 12, 2024

Conversation

mistercrunch
Copy link
Member

@mistercrunch mistercrunch commented Apr 9, 2024

SUMMARY

When firing the docker-compose up command, which is pretty much the first step of most developer workflows, we kick off up 2 chromium downloads: one for PLAYWRIGHT and one for PUPPETEER by default. Those are used for optional features like alerts/reports/thumbnails/ and for CI.

My approach here to prevent all this work when running docker-compose up is to:

  • playwright: move the download/install to the Dockerfile in the dev layer
  • pupeteer: turn off by default. From my understanding this is only really required for a CI step, and is implemented there. I don't think developers are meant to have this locally. But note that you can still turn it on by switching a env var.

As a side mission, since I needed it while working on this, I'm also introducing support for defining environment variables overrides for docker-compose in docker/.env-local while adding this to .gitignore, which developers can use without the fear/confusion around committing local environment specific settings to the repo.


About the core feature in this PR here, this saves minutes upon firing docker-compose up, and reduce confusion
around "what the heck is it doing!?", in a phase where we should just be pulling and starting docker images.

More about PUPPETEER

About PUPPETEER: in #22623, I found that it is used by Applitool in two GHAs:

  • (superset-applitool-cypress)[https://github.com/apache/superset/blob/master/.github/workflows/superset-applitool-cypress.yml]
  • (superset-applitools-storybook)[https://github.com/apache/superset/blob/master/.github/workflows/superset-applitools-storybook.yml]

Seems PUPPETEER is probably never really useful locally.

More about PLAYWRIGHT

About PLAYWRIGHT, this is what powers alerts and reports as well thumbnail-generation.

Potential follow up work

  • decide on Chromium VS Firefox as the primary webdriver to power those features, and remove the other one from the docker image as it bloats it with whichever one is not in use
  • use playwright to download the Firefox webdriver as right now it's a hot mess of wget tarballs, and it seemed playwright wraps this up nicely
  • Dockerfile refactor: currently this layer is low in the docker pipeline, any layer rebuild before this one will retrigger the download of the browser. Eventually we could compose docker images to prevent this. I've been talking about setting up some "fit" and "fat" images for a while, which would include the web driver(s). Currently only dev has this.

@github-actions github-actions bot added the doc Namespace | Anything related to documentation label Apr 9, 2024
@mistercrunch
Copy link
Member Author

@supersetbot rebase

When firing the `docker-compose up` command, which is pretty much the first step
of most developer workflows, we kick up up 2 chromium downloads: one
for PLAYWRIGHT and one for PUPPETEER by default. Those are used for optional features
like alerts/reports/thumbnails/ and for CI.

For convenience, I'm also introducing support for defining
environment variables overrides for docker-compose in `docker/.env-local` while
adding this to `.gitignore`, which developers can use without the
fear/confusion around committing local envrionment specific settings to
the repo.

-----------------

About the core feature in this PR here,
this saves minutes upon firing `docker-compose up`, and reduce confusion
around "what the heck is it doing!?", in a phase where we should just be
pulling and starting docker images.

Most developer workflows don't require either of those, and if it
were the case, we should bake this into the docker image as opposed to
installing during the bootstrap phase. In any case, devs can switch those on
easily by tweaking the env vars specified in `docker/.env`

About PUPPETEER: in #22623, I
found that it is used by Applitool in two GHAs:
-
https://github.com/apache/superset/blob/master/.github/workflows/superset-applitool-cypress.yml
-
https://github.com/apache/superset/blob/master/.github/workflows/superset-applitools-storybook.yml

About PLAYWRIGHT, this is what powers alerts and reports as well
thumbnail-generation. This is more common, and can be flipped on
by setting `ENABLE_PLAYWRIGHT=true` in `docker/.env`
@github-actions github-actions bot added the github_actions Pull requests that update GitHub Actions code label Apr 11, 2024
@mistercrunch mistercrunch changed the title feat: disable chromium downloads by default in docker-compose feat: optimize docker-compose up for faster boot time Apr 11, 2024
@@ -21,7 +21,7 @@ jobs:
steps:
- id: set_matrix
run: |
MATRIX_CONFIG=$(if [ "${{ github.event_name }}" == "pull_request" ]; then echo '["ci"]'; else echo '["dev", "lean", "py310", "websocket", "dockerize"]'; fi)
MATRIX_CONFIG=$(if [ "${{ github.event_name }}" == "pull_request" ]; then echo '["dev"]'; else echo '["dev", "lean", "py310", "websocket", "dockerize"]'; fi)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit cryptic, but the short story is we built only the ci layer in PRs (and a wider list on master push events), and I decided that building dev makes much more sense. I thought ci was all-encompassing when I first set this matrix, but turns out it's just a small layer on top of lean. Testing dev captures the bulk of the dockerfile.

RUN playwright install chromium

# Install GeckoDriver WebDriver
RUN wget -q https://github.com/mozilla/geckodriver/releases/download/${GECKODRIVER_VERSION}/geckodriver-${GECKODRIVER_VERSION}-linux64.tar.gz -O - | tar xfz - -C /usr/local/bin \
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could simply playright install firefox here instead of this line and the following. For now I decided not to touch it.

@mistercrunch mistercrunch merged commit 40e77be into master Apr 12, 2024
43 checks passed
@mistercrunch mistercrunch deleted the disable_playwright branch April 12, 2024 23:07
@artofcomputing artofcomputing mentioned this pull request Apr 15, 2024
3 tasks
EnxDev pushed a commit to EnxDev/superset that referenced this pull request Apr 15, 2024
mistercrunch added a commit that referenced this pull request Apr 15, 2024
As reported here #28026, in
#27953 I added support for a
supplemental `docker/.env-local` override file, not thinking it was
going to be required upon starting docker-compose. I thought I had
tested that while switching branches but somehow didn't test that
docker-compose would simply boot up in the absence of that file.

Since this is confusing to most, and we don't really have a script
I can latch onto to `touch docker/.env-local` for now, i decided
to backtrack and simply  remove the convenience "feature".
People should find an alternate way to alter their environment
somehow before starting the service.

I added a bit of documentation to point to an existing mechanism
around providing a `docker/pythonpath_dev/superset_config_docker.py`
file, which existed in the logic but wasn't clearly documented. This
solves for my use case where I was working a PR adding features
around Slack, where I wanted to provide an API key configuration
without risks of commiting/pushing it back to the repo.
@@ -33,7 +33,11 @@ services:
- redis:/data

db:
env_file: docker/.env
env_file:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these changes for env_file seems make i can not start superset following the documentation by
docker-compose -f docker-compose-image-tag.yml up

the error:

services.db.env_file contains {"path": "docker/.env", "required": true}, which is an invalid type, it should be a string
services.superset.env_file contains {"path": "docker/.env", "required": true}, which is an invalid type, it should be a string
services.superset-init.env_file contains {"path": "docker/.env", "required": true}, which is an invalid type, it should be a string
services.superset-worker.env_file contains {"path": "docker/.env", "required": true}, which is an invalid type, it should be a string
services.superset-worker-beat.env_file contains {"path": "docker/.env", "required": true}, which is an invalid type, it should be a string```

I am on Ubuntu 22.04.4 LTS 

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @shaoxp , this got fixed in #28039 , but note that you need docker-compose 2.24+ for this config to work. I tried to lookup if there's a version of the yaml I can setup to inform users to upgrade but couldn't find anything.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the info. i am able to start locally.

maybe you can update the document to include the required version.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I'm thinking a "check_env.py" script could be could, that script would make sure all external tools are meeting the minimum version requirements and such (npm, node, docker, docker-compose, git-lfs, ...).

qleroy pushed a commit to qleroy/superset that referenced this pull request Apr 28, 2024
jzhao62 pushed a commit to jzhao62/superset that referenced this pull request May 16, 2024
vinothkumar66 pushed a commit to vinothkumar66/superset that referenced this pull request Nov 11, 2024
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 4.1.0 labels Nov 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels doc Namespace | Anything related to documentation github_actions Pull requests that update GitHub Actions code preset-io size/L 🚢 4.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants