Publish a new Docker image containing Chrome binary #185

Closed
3 tasks
brunoocasali opened this issue Feb 9, 2022 · 7 comments · Fixed by #235
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@brunoocasali
Member

brunoocasali commented Feb 9, 2022

To solve issue #139 for Docker users, we need to publish a new version of the base Dockerfile that contains the Chrome binary.

To complete this issue, we need to:

  • Create a new Dockerfile that adds the Chrome binary
  • Configure GitHub Actions to release a new image, getmeili/docs-scraper-with-chrome
  • Update the README sections on how to use this new image, which is only required by users who need the Chrome binary.

After this addition, we will be able to point users to the new image when they need it, without imposing an unrequested size increase on current users of the getmeili/docs-scraper image.

Hint: we could base this new image on Algolia's image https://hub.docker.com/layers/algolia/docsearch-scraper/latest/ or on this comment #139 (comment)

@brunoocasali brunoocasali added enhancement New feature or request good first issue Good for newcomers labels Feb 9, 2022
@meili-bors meili-bors bot closed this as completed in 5863acb Aug 2, 2022
@brunoocasali
Member Author

@alallema, now that #235 is merged, our Docker image has grown in size:

[image: screenshot of the Docker image size]

This is not ideal, and if we stick strictly to what this issue says, the job is not done yet 😅.

Two choices:

  • Reopen this issue and close it only once the new Docker image is published.
  • Leave things as they are and wait for future users to ask for improvements.

What are your thoughts about that?

@alallema
Contributor

alallema commented Aug 4, 2022

@brunoocasali, I think it's better to keep this one open, no?

@alallema alallema reopened this Aug 4, 2022
@mdraevich
Contributor

@alallema @brunoocasali
It seems the larger image size is caused by installing chromium-driver.
Its dependencies explain the increase: chromium-browser alone, which is pulled in as a dependency, has an installed size of 100MB to 200MB depending on the architecture.

In any case, it would be great to find a way to reduce the overall image size.
BTW, algolia/docsearch-scraper serves the same purpose and has an even larger image size... Things are not that bad 😀

@mdraevich
Contributor

mdraevich commented Aug 25, 2022

@brunoocasali @alallema
I spent a little time building different Dockerfiles to compare their final uncompressed sizes.
I have three Dockerfiles:

  • with_chromium.Dockerfile -- the current Dockerfile, where chromium-driver & chromium are installed from the Debian repository.
  • with_chrome.Dockerfile -- a modified Dockerfile where chrome-driver & chrome are installed from the official repository.
  • without_pipenv.Dockerfile -- a modified Dockerfile where chromium-driver & chromium are installed from the Debian repository, but pipenv is removed from the image.

After building the images, this is what I got:

admin@docker-lab:~/meilisearch-demo/docs-scraper-fork$ docker image ls | grep test/
test/scraper-without-pipenv       latest      88568ec55e5c   6 seconds ago        1.73GB
test/scraper-with-chrome          latest      44c5bb15c055   9 minutes ago        1.95GB
test/scraper-with-chromium        latest      8bbeb6236cb9   About an hour ago    1.84GB
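
To make the comparison explicit, here is a small sketch that computes each image's difference from the current Chromium-based image. The sizes are copied from the listing above; the function name `size_deltas` and the choice of baseline are only illustrative assumptions.

```python
# Uncompressed sizes in GB, copied from the `docker image ls` listing above.
sizes_gb = {
    "test/scraper-without-pipenv": 1.73,
    "test/scraper-with-chrome": 1.95,
    "test/scraper-with-chromium": 1.84,
}

# Assumption: the Chromium image is the baseline, since it matches the current Dockerfile.
baseline = "test/scraper-with-chromium"


def size_deltas(sizes, baseline_name):
    """Return {image: (difference in GB, difference in percent)} relative to the baseline."""
    base = sizes[baseline_name]
    deltas = {}
    for name, size in sizes.items():
        diff_gb = round(size - base, 2)
        diff_pct = round(100 * (size - base) / base, 1)
        deltas[name] = (diff_gb, diff_pct)
    return deltas


deltas = size_deltas(sizes_gb, baseline)
for name, (diff_gb, diff_pct) in deltas.items():
    print(f"{name}: {diff_gb:+.2f} GB ({diff_pct:+.1f}%)")
```

With these numbers, dropping pipenv saves about 0.11 GB (roughly 6% of the current image), and switching to official Chrome adds about the same amount.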

So, to sum up:

  • switching to official Chrome brings no benefit in terms of image size. It's still worth checking whether Chromium is maintained & updated.
  • the image size can be reduced by removing pipenv from the final image, since I believe there is no need for pipenv inside the Docker image. To install the required packages, we would convert the Pipfile to a requirements.txt and install it with pip.
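
As a rough illustration of the Pipfile-to-requirements.txt idea, the sketch below turns the "default" section of a parsed Pipfile.lock into pinned requirement lines. It assumes a Pipfile.lock already exists; the function name `lock_to_requirements` and the example packages and versions are hypothetical, not the project's real dependencies. Recent pipenv releases can also export this directly (`pipenv requirements`), which is likely the simpler route in practice.

```python
import json


def lock_to_requirements(lock_data):
    """Turn the "default" section of a parsed Pipfile.lock into pinned requirement lines."""
    lines = []
    for name, spec in sorted(lock_data.get("default", {}).items()):
        version = spec.get("version")  # e.g. "==2.6.1"
        if version is None:
            # Entries without a pinned version (git or path dependencies)
            # are skipped in this sketch.
            continue
        line = f"{name}{version}"
        markers = spec.get("markers")
        if markers:
            # Keep environment markers so pip applies the same conditions.
            line += f"; {markers}"
        lines.append(line)
    return lines


# Hypothetical lock content, for illustration only.
example_lock = json.loads("""
{
  "default": {
    "scrapy": {"version": "==2.6.1"},
    "selenium": {"version": "==4.3.0", "markers": "python_version >= '3.7'"}
  },
  "develop": {
    "pytest": {"version": "==7.1.2"}
  }
}
""")

requirements = lock_to_requirements(example_lock)
print("\n".join(requirements))
```

Only the "default" section is exported, so development-only packages stay out of the image. The resulting lines could be written to requirements.txt and installed with `pip install -r requirements.txt` during the image build.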

What do you think about that?

@brunoocasali
Member Author

Hello @mdraevich! Thanks for working on this. Your thoughts are really helpful!

My initial idea for reducing the image size was to split it into two different images:

  • getmeili/docs-scraper: just what is required to run the scraping
  • getmeili/docs-scraper-with-chrome: everything from the previous image + the entire chrome/chromium binary

This adds complexity, because we would have to publish and manage two images. Still, from my POV it is the best alternative, since it serves both the people who don't need the Chrome binary and the people who do, without costing the first group extra disk space.

That said, I'm not sure it is worth removing only pipenv from the image.

What do you think @alallema?

@alallema
Contributor

alallema commented Sep 7, 2022

Thank you so much @mdraevich for your research! @brunoocasali I agree with you, I'm not sure it's worth it for now. Maybe we could close this issue in the meantime, don't you think?

@brunoocasali
Member Author

Yeah, let's do it then. If somebody asks for a smaller image, we can reopen this one.
