add Dockerfile #70
Conversation
I would suggest using an Alpine image instead of Ubuntu. Alpine is lightweight, ships with no bloatware, and builds considerably faster than Ubuntu.
I'd also suggest using the
@imartinez The current form works, as it provides at least a simple start. Would you like me to update the docs as well before merging, or after?

@Polpetta Great idea, feel free to add to the PR.

@thebigbone Ubuntu is a safe fallback because it's an image people are far more likely to already have than any other. Size is not an issue because the image ends up over 15 GB anyway, so starting sizes are completely irrelevant. But it's @imartinez's repo, so he ultimately gets to decide what base image to use.
The problem with using Ubuntu images is that, being bigger, they also introduce more possible vulnerabilities.
These are my 2 cents. I'd also suggest creating a `.dockerignore` file to ignore files and folders that could slow down the build context (like `models` and `db`).
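A minimal `.dockerignore` along these lines would keep the large local artifacts out of the build context (the `models` and `db` entries match this repo's defaults; the other patterns are illustrative):

```
models/
db/
.git/
__pycache__/
```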
```dockerfile
RUN cd home \
    && git clone https://github.com/imartinez/privateGPT.git \
    && cd privateGPT \
    && pip install -r requirements.txt

RUN echo "PERSIST_DIRECTORY=db\nLLAMA_EMBEDDINGS_MODEL=models/ggml-model-q4_0.bin\nMODEL_TYPE=GPT4All\nMODEL_PATH=models/ggml-gpt4all-j-v1.3-groovy.bin\nMODEL_N_CTX=1000" > home/privateGPT/.env \
    && chmod a+x home/privateGPT/.env

RUN mkdir home/privateGPT/models \
    && cd home/privateGPT/models \
    && wget https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin \
    && wget https://huggingface.co/Pi3141/alpaca-native-7B-ggml/resolve/397e872bf4c83f4c642317a5bf65ce84a105786e/ggml-model-q4_0.bin
```
I'd also check the order of the statements, to make subsequent builds quicker by leveraging Docker's build cache.
Suggested change:

```dockerfile
WORKDIR /privateGPT/
RUN mkdir models \
    && cd models \
    && wget https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin \
    && wget https://huggingface.co/Pi3141/alpaca-native-7B-ggml/resolve/397e872bf4c83f4c642317a5bf65ce84a105786e/ggml-model-q4_0.bin
RUN echo "PERSIST_DIRECTORY=db\nLLAMA_EMBEDDINGS_MODEL=models/ggml-model-q4_0.bin\nMODEL_TYPE=GPT4All\nMODEL_PATH=models/ggml-gpt4all-j-v1.3-groovy.bin\nMODEL_N_CTX=1000" > .env \
    && chmod a+x .env
COPY . .
RUN pip install -r requirements.txt
ENTRYPOINT ["/usr/bin/python", "/privateGPT/privateGPT.py"]
```
I wrote this from memory, so please check it locally! 👍
@Polpetta I didn't put an entrypoint yet, because that would then need to expose the `source_directory` in this repo as a `.env` var so the whole thing could be run with a local volume mounted to `source_directory`, or elsewhere. Any clever ideas how to expose that while enabling a local mount to fill or replace `source_directory`?
You can always bind-mount a local path or a volume of your choice when running `docker run`, like `docker run -v /your/local/sources:/privateGPT/source_directory imartinez/privateGPT`, so no need to set a `.env` var imo.
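For example, building and running with local documents mounted over `source_directory` might look like this (the image tag `privategpt` is illustrative, and `source_directory` is the path discussed above):

```shell
# Build the image from the PR's Dockerfile
docker build -t privategpt .

# Run with local documents bind-mounted (read-only) into the container
docker run --rm \
  -v /path/to/your/docs:/privateGPT/source_directory:ro \
  privategpt
```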
Co-authored-by: Davide Polonio <poloniodavide@gmail.com>
Doing the lord's work here @mpadge
@mpadge thanks for the work! Please update the readme presenting this as an alternative way of getting the project running. Try to make it super clear, taking into account that a lot of people are checking out this repo, and not everyone is an experienced SW dev. We can merge it once you've got it. Thanks!!
#90 (comment)
Hello, I made another one using a Dockerfile and compose: #120
@imartinez I'm going to close this in favour of #120 from @JulienA. docker-compose is definitely the way to go, to separate the install and ingest steps. |
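A compose file separating the two steps could be sketched roughly as follows (the service names, commands, and volume layout here are illustrative assumptions, not taken from #120):

```yaml
version: "3"
services:
  # One-shot service: ingest local documents into the shared db volume
  ingest:
    build: .
    command: python ingest.py
    volumes:
      - ./source_directory:/privateGPT/source_directory:ro
      - db:/privateGPT/db
  # Main service: query against the ingested db
  privategpt:
    build: .
    command: python privateGPT.py
    depends_on:
      - ingest
    volumes:
      - db:/privateGPT/db
volumes:
  db:
```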
Thanks for the invitation @imartinez . This is only a draft for now, because we need to decide whether it should have endpoints exposed.