VERY BIG performance improvement and beautiful features #521
Changes from all commits: b482d8f, 0fe977f, c2a9235, 7be527c, c1939db, afe8abb, dee2dba, 562a718, 3809682, f992b9c, f7aa817, d9e88d3, 1f2e39c, 4c290d2, 02dc09b, a0d565c
check_lang.py (new file):
@@ -0,0 +1,20 @@
from dotenv import load_dotenv
import os
from deep_translator import GoogleTranslator
import langdetect
load_dotenv()

auto_translate = os.environ.get("AUTO_TRANSLATE")

def translate(text):
    if auto_translate == None or auto_translate == "false" or auto_translate == "False" or auto_translate == "0":
        return text
    else:
        if langdetect.detect(text) == "en":
            return text
        new_text = GoogleTranslator(source="auto", target="en").translate(text)
        print(f"Translated '{text}' to '{new_text}'")
        return new_text

if __name__ == "__main__":
    print(translate("Qual è la massa di un elettrone?"))

example.env:
@@ -1,6 +1,11 @@
 PERSIST_DIRECTORY=db
-MODEL_TYPE=GPT4All
-MODEL_PATH=models/ggml-gpt4all-j-v1.3-groovy.bin
-EMBEDDINGS_MODEL_NAME=all-MiniLM-L6-v2
-MODEL_N_CTX=1000
-TARGET_SOURCE_CHUNKS=4
+MODEL_TYPE=LlamaCpp
+MODEL_PATH=/path/for/model
+#best english embeddings model
+#best italian efederici/sentence-it5-base
+EMBEDDINGS_MODEL_NAME=all-mpnet-base-v2

Review comment: I think that this one uses …

Reply: I was also concerned about this, but it seems to work well in my tests. I am using it to take formulas from my physics book while studying, hahaha.

+MODEL_N_CTX=4096
+N_GPU_LAYERS=12
+USE_MLOCK=1
+TARGET_SOURCE_CHUNKS=8
+N_BATCH=1024
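
For context, EMBEDDINGS_MODEL_NAME names the sentence-transformers model used for the vector store. A small sketch of how either of the models mentioned in the comments above would be plugged in, assuming the usual langchain HuggingFaceEmbeddings wrapper (a sketch, not part of the diff):

import os
from dotenv import load_dotenv
from langchain.embeddings import HuggingFaceEmbeddings

load_dotenv()

# "all-mpnet-base-v2" (English) or "efederici/sentence-it5-base" (Italian),
# as suggested in the .env comments above.
embeddings_model_name = os.environ.get("EMBEDDINGS_MODEL_NAME", "all-mpnet-base-v2")
embeddings = HuggingFaceEmbeddings(model_name=embeddings_model_name)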

Setup script (new shell script):
@@ -0,0 +1,7 @@
export LLAMA_CUBLAS=1

Review comment: So in your case,
    set CMAKE_ARGS=-DLLAMA_CUBLAS=on
    set FORCE_CMAKE=1
are not needed?

Reply: I think those flags are for Windows; this script works with Linux. I already tested it :)

#check if venv exists
if [ ! -d "venv" ]; then
    python3 -m venv venv
fi
source venv/bin/activate
pip install -r requirements.txt

privateGPT.py:
@@ -5,9 +5,10 @@
 from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
 from langchain.vectorstores import Chroma
 from langchain.llms import GPT4All, LlamaCpp
 import os
 import argparse
 
+from check_lang import translate
 load_dotenv()
 
 embeddings_model_name = os.environ.get("EMBEDDINGS_MODEL_NAME")
@@ -17,6 +18,9 @@
 model_path = os.environ.get('MODEL_PATH')
 model_n_ctx = os.environ.get('MODEL_N_CTX')
 target_source_chunks = int(os.environ.get('TARGET_SOURCE_CHUNKS',4))
+n_gpu_layers = os.environ.get('N_GPU_LAYERS')
+use_mlock = os.environ.get('USE_MLOCK')
+n_batch = os.environ.get('N_BATCH') if os.environ.get('N_BATCH') != None else 512
 
 from constants import CHROMA_SETTINGS
 
@@ -31,12 +35,13 @@ def main():
     # Prepare the LLM
     match model_type:
         case "LlamaCpp":
-            llm = LlamaCpp(model_path=model_path, n_ctx=model_n_ctx, callbacks=callbacks, verbose=False)
+            llm = LlamaCpp(model_path=model_path, n_ctx=model_n_ctx, callbacks=callbacks, verbose=False, n_gpu_layers=n_gpu_layers, use_mlock=use_mlock, top_p=0.9, n_batch=n_batch)
         case "GPT4All":
             llm = GPT4All(model=model_path, n_ctx=model_n_ctx, backend='gptj', callbacks=callbacks, verbose=False)
         case _default:
             print(f"Model {model_type} not supported!")
             exit;
+
     qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, return_source_documents= not args.hide_source)
     # Interactive questions and answers
     while True:
@@ -45,20 +50,20 @@ def main():
             break
 
         # Get the answer from the chain
-        res = qa(query)
+        res = qa(translate(query))
         answer, docs = res['result'], [] if args.hide_source else res['source_documents']
 
+        # Print the relevant sources used for the answer
+        for document in docs:
+            print("\n> " + document.metadata["source"] + ":")
+            print(document.page_content)

Review comment: Maybe add some nice colors? Also, something like:

    translate_src = os.environ.get('TRANSLATE_SRC_LANG', "en")
    translate_dst = os.environ.get('TRANSLATE_DST_LANG', "fr")

    for document in docs:
        print(f"\n\033[31m Source: {document.metadata['source']} \033[0m")
        if translate_ans:
            document.page_content = GoogleTranslator(source=translate_src, target=translate_dst).translate(document.page_content)
        print(f"\033[32m\033[2m : {document.page_content} \033[0m")

Reply: Yes, I was already thinking about it. I was going to implement it as soon as I had some free time :)

+
         # Print the result
         print("\n\n> Question:")
         print(query)
         print("\n> Answer:")
         print(answer)
 
-        # Print the relevant sources used for the answer
-        for document in docs:
-            print("\n> " + document.metadata["source"] + ":")
-            print(document.page_content)
-
 def parse_arguments():
     parser = argparse.ArgumentParser(description='privateGPT: Ask questions to your documents without an internet connection, '
                                                  'using the power of LLMs.')
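
One caveat worth noting: N_GPU_LAYERS, USE_MLOCK and N_BATCH are read with os.environ.get() and therefore arrive as strings (or None). A hedged sketch, not part of the PR, of converting them explicitly before building the model, reusing the variable names from the diff above:

# Sketch only: explicit casts for the environment-derived values used above.
llm = LlamaCpp(
    model_path=model_path,
    n_ctx=int(model_n_ctx),
    n_gpu_layers=int(n_gpu_layers) if n_gpu_layers else 0,
    n_batch=int(n_batch),
    use_mlock=bool(int(use_mlock)) if use_mlock else False,
    top_p=0.9,
    callbacks=callbacks,
    verbose=False,
)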

Review comment: Personally, I'd remove the translation feature from this PR so that the great core improvements can be reviewed separately, since this feature might be a bit controversial (it requires internet access, uses Google Translate, ...). If you decide to leave it in, though, it would be awesome if you could mention it in the README and add AUTO_TRANSLATE to the example.env.

Reply: To be honest, all the changes I have made in this PR are changes I had to make to get privateGPT working for me, and I thought that, just as they are useful for me, they might be useful for someone else. In any case, the translation function, as useful as it is, totally goes against the purpose of this project, so yes, I will remove it.

I was thinking, however, of having the model translate the prompt locally. For example, asking Vicuna: "If this text is not English, translate it into English." After Vicuna does the translation, you use Vicuna's response to execute the prompt. This would ensure privacy.
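
A minimal sketch of that idea, reusing the llm object built in privateGPT.py and the langdetect check from check_lang.py (the prompt wording and helper name are only placeholders, not a settled design):

import langdetect

def translate_locally(llm, text: str) -> str:
    # Leave English queries untouched; otherwise ask the local model itself to
    # translate, so the text never leaves the machine.
    if langdetect.detect(text) == "en":
        return text
    prompt = ("If the following text is not in English, translate it into English "
              "and reply with the translation only:\n\n" + text)
    return llm(prompt).strip()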