add Dockerfile #70

Closed
wants to merge 5 commits into from
Conversation

mpadge
@mpadge mpadge commented May 12, 2023

Thanks for the invitation, @imartinez. This is only a draft for now, because we need to decide whether it should have endpoints exposed.

@thebigbone
thebigbone commented May 12, 2023

I would suggest using an Alpine image instead of Ubuntu. Alpine is lightweight, has no bloatware, and is a lot faster than Ubuntu.

@Polpetta

I'd also suggest using the COPY directive instead of cloning this repository with git. It makes the build faster and doesn't need to connect to GitHub's servers; instead, it sources directly from the local copy.

@mpadge mpadge marked this pull request as ready for review May 12, 2023 12:45
@mpadge
Author

mpadge commented May 12, 2023

@imartinez The current form works, and provides at least a simple start. Would you like me to update the docs as well before merging, or after?

@Polpetta Great idea, feel free to add it to the PR.

@thebigbone Ubuntu is a safe fallback because it's an image that people are far more likely to already have than any other. Size is not an issue because the image ends up > 15GB anyway, so starting sizes are completely irrelevant. But it's @imartinez's repo, and they ultimately get to decide what base image to use.

@vilaca
Contributor

vilaca commented May 12, 2023

The problem with using Ubuntu images is that, being bigger, they also introduce more potential vulnerabilities.

@Polpetta Polpetta left a comment

These are my 2 cents. I'd also suggest creating a .dockerignore file to exclude files and folders that could bloat the build context and slow the build (like models, db).
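
As a minimal sketch of the .dockerignore idea, the shell snippet below writes one that excludes the models and db folders mentioned in this thread. The .git entry and the CONTEXT_DIR variable are assumptions added for illustration, not something this PR defines.

```shell
set -eu

# Hypothetical variable: the directory used as the Docker build context.
CONTEXT_DIR="${CONTEXT_DIR:-.}"

# Write a .dockerignore so these paths are never sent to the Docker daemon.
cat > "$CONTEXT_DIR/.dockerignore" <<'EOF'
# Large model weights (several GB each)
models/
# Local vector store (PERSIST_DIRECTORY=db)
db/
# Assumption: VCS metadata is not needed inside the image
.git/
EOF

# Print the result for a quick visual check.
cat "$CONTEXT_DIR/.dockerignore"
```

Run it from the repository root before docker build. Note that it overwrites any existing .dockerignore in that directory.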

Comment on lines +10 to +21
RUN cd home \
&& git clone https://github.com/imartinez/privateGPT.git \
&& cd privateGPT \
&& pip install -r requirements.txt

RUN echo "PERSIST_DIRECTORY=db\nLLAMA_EMBEDDINGS_MODEL=models/ggml-model-q4_0.bin\nMODEL_TYPE=GPT4All\nMODEL_PATH=models/ggml-gpt4all-j-v1.3-groovy.bin\nMODEL_N_CTX=1000" > home/privateGPT/.env \
&& chmod a+x home/privateGPT/.env

RUN mkdir home/privateGPT/models \
&& cd home/privateGPT/models \
&& wget https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin \
&& wget https://huggingface.co/Pi3141/alpaca-native-7B-ggml/resolve/397e872bf4c83f4c642317a5bf65ce84a105786e/ggml-model-q4_0.bin

I'd also check the order of the statements, to make subsequent builds quicker by leveraging Docker's build cache.

Suggested change
-RUN cd home \
-&& git clone https://github.com/imartinez/privateGPT.git \
-&& cd privateGPT \
-&& pip install -r requirements.txt
-RUN echo "PERSIST_DIRECTORY=db\nLLAMA_EMBEDDINGS_MODEL=models/ggml-model-q4_0.bin\nMODEL_TYPE=GPT4All\nMODEL_PATH=models/ggml-gpt4all-j-v1.3-groovy.bin\nMODEL_N_CTX=1000" > home/privateGPT/.env \
-&& chmod a+x home/privateGPT/.env
-RUN mkdir home/privateGPT/models \
-&& cd home/privateGPT/models \
-&& wget https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin \
-&& wget https://huggingface.co/Pi3141/alpaca-native-7B-ggml/resolve/397e872bf4c83f4c642317a5bf65ce84a105786e/ggml-model-q4_0.bin
+WORKDIR /privateGPT/
+RUN mkdir models \
+&& cd models \
+&& wget https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin \
+&& wget https://huggingface.co/Pi3141/alpaca-native-7B-ggml/resolve/397e872bf4c83f4c642317a5bf65ce84a105786e/ggml-model-q4_0.bin
+RUN echo "PERSIST_DIRECTORY=db\nLLAMA_EMBEDDINGS_MODEL=models/ggml-model-q4_0.bin\nMODEL_TYPE=GPT4All\nMODEL_PATH=models/ggml-gpt4all-j-v1.3-groovy.bin\nMODEL_N_CTX=1000" > .env \
+&& chmod a+x .env
+COPY . .
+RUN pip install -r requirements.txt
+ENTRYPOINT ["/usr/bin/python", "/privateGPT/privateGPT.py"]

I went by heart so please check it locally! 👍
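
One common way to get more out of the build cache, sketched here as an assumption rather than as part of the suggestion above, is to copy requirements.txt on its own before the rest of the source, so the pip install layer is rebuilt only when the dependency list changes. The snippet writes an illustrative Dockerfile; the python:3.10-slim base image and the Dockerfile.cache-sketch file name are hypothetical choices.

```shell
set -eu

# Hypothetical output file, kept separate from the PR's real Dockerfile.
OUT="${OUT:-Dockerfile.cache-sketch}"

cat > "$OUT" <<'EOF'
# Assumption: any base image that ships Python and pip would do here.
FROM python:3.10-slim
WORKDIR /privateGPT
# Copy only the dependency list first: this layer and the pip layer below
# are reused from cache until requirements.txt itself changes.
COPY requirements.txt .
RUN pip install -r requirements.txt
# Copy the rest of the source last, so code edits do not invalidate
# the dependency layers above.
COPY . .
EOF

# List the ordering-sensitive steps with their line numbers.
grep -n -E '^(COPY|RUN)' "$OUT"
```

The file can then be tried with docker build -f Dockerfile.cache-sketch . from the repository root. The slow model downloads from the suggestion would sit before the COPY steps for the same reason.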

Author

@Polpetta I didn't add an entrypoint yet, because that would require exposing the source_directory in this repo as a .env var, so the whole thing could be run with a local volume mounted to source_directory, or elsewhere. Any clever ideas for how to expose that while allowing a local mount to fill or replace source_directory?

You can always mount a local path or a volume of your choice when running docker run, like docker run -v /your/local/sources:/privateGPT/source_directory imartinez/privateGPT, so there's no need to set a .env var imo.

Co-authored-by: Davide Polonio <poloniodavide@gmail.com>
@tgh19
tgh19 commented May 13, 2023

Doing the Lord's work here, @mpadge

@imartinez
Collaborator

@mpadge thanks for the work! Please update the README to present this as an alternative way of getting the project running. Try to make it super clear, taking into account that there are a lot of people checking out this repo, and not everyone is an experienced SW dev. We can merge it once you've got it. Thanks!!

@hanwsf
hanwsf commented May 14, 2023

#90 (comment)
ubuntu:latest doesn't work, tested.

@JulienA
JulienA commented May 14, 2023

Hello, I made another one that uses a Dockerfile and Compose: #120

@mpadge
Author

mpadge commented May 15, 2023

@imartinez I'm going to close this in favour of #120 from @JulienA. docker-compose is definitely the way to go, to separate the install and ingest steps.

@mpadge mpadge closed this May 15, 2023
8 participants