How can I merge UserData and LLM / ChatLLM data? #203
Hi,

I generated a db from my emails downloaded from Gmail, and I can use the UserData data source, or the LLM / ChatLLM data sources. Is there any option to combine them? UserData contains only my emails and nothing else, while LLM / ChatLLM contains everything but my content.

Thanks

Comments
Hi! UserData assumes you are chatting with the UserData docs, so it's like ChatLLM plus those docs. If you generated the db externally, you probably followed the readme, in which case your docs are part of UserData, yes.
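For context, the collection is chosen when launching the server. A minimal sketch in the readme's command style; the model name is just an example, and the flag values follow the UI mode names discussed in this thread:

```bash
# Chat with your docs (LLM answers grounded in the UserData collection):
python generate.py --base_model=h2oai/h2ogpt-oig-oasst1-512-6b --langchain_mode=UserData

# Chat with the bare model only, no docs:
python generate.py --base_model=h2oai/h2ogpt-oig-oasst1-512-6b --langchain_mode=ChatLLM
```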
Well, I'm a bit confused :) I generated the db using this command:

Thanks
@slavag I know it's a bit confusing. Especially without tooltips or other documentation, those names are not entirely clear. Happy to hear if you have suggestions. This may help: https://github.com/h2oai/h2ogpt/blob/main/FAQ.md#explain-things-in-ui
Thanks, I found this: "To Chat with your docs, choose, e.g. UserData. To avoid including docs, and just chat with LLM, choose ChatLLM." But when I try to chat with UserData, I don't see the results that I get with LLM. How can this be fixed?
Hi @slavag, can you show me what you mean by "I don't see results that I'm getting with LLM"? I can't quite follow. Thanks.
@slavag Thanks for explaining. I think it might be related to #192, which I'll fix tomorrow. Basically, the prompt is getting truncated away when the added doc chunks are too long, because of a rough estimate of the number of characters per token. A temporary work-around is to run generate.py with --top_k_docs=3. Let me know if that helps; it fixes the cases I've seen. A more general solution will be done soon.
BTW, you can also start the UI with --chunk_size=256 to use smaller chunks.
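Putting both suggestions together, the work-around launch would look something like the sketch below; the flag values come from this thread and the model name is a placeholder:

```bash
# Fewer (--top_k_docs) and smaller (--chunk_size) doc chunks reduce the chance
# that the doc context truncates the prompt:
python generate.py --base_model=h2oai/h2ogpt-oig-oasst1-512-6b \
    --langchain_mode=UserData --top_k_docs=3 --chunk_size=256
```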
@pseudotensor Thanks, --top_k_docs=3 solved the issue; --chunk_size=256 didn't.
Hi @slavag Great. Will fix the general issue soon. chunk_size would only help if one remade the db, so it doesn't apply once the database has already been created. top_k_docs is a good choice for now; I'll even make it the default until I resolve that issue.
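To actually benefit from a smaller chunk_size, the db would need to be rebuilt. A sketch based on the readme's make_db.py flow; the exact flag names (`--user_path`, `--collection_name`, `--chunk_size`) are assumptions to verify against the current repo:

```bash
# Re-ingest the source documents into a fresh UserData collection with smaller chunks:
python make_db.py --user_path=user_path --collection_name=UserData --chunk_size=256
```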
How did you generate the db?
I don't think the title of this issue quite matches OP's concern, but it's exactly what I'm looking for. @pseudotensor Is there a way I can use UserData with LLM mode? That is to say: if a user asks a question while in the LLM collection, but there is a document in UserData that contains relevant information, I would like the bot to output a mix of the two responses.

I was thinking as a workaround I could upload a document to the LLM collection, but it seems to switch to MyData on upload. I've also tried using 'All' (which, at least as I've conceptualized it, is not what I'm looking for), but I get an error.

I did just find #447 (comment), which works great on the front end. How can I use it with the Gradio client? After some tinkering, it seems that as long as that flag is passed in while generating, that's the default behaviour both on the front end and when using the Gradio client. Can you confirm?
I couldn't quite follow what is meant by "bot to output a mix of the two responses". Can you explain? FYI, yes, for the gradio client the gradio server would just need to have that parameter set. I confirm.
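On the client side, a minimal sketch using the gradio client against a running h2ogpt server; the `/submit_nochat_api` endpoint and dict keys follow h2ogpt's client examples, and the server URL and query text are placeholders:

```python
import ast

from gradio_client import Client

# Point at the running h2ogpt gradio server (placeholder URL).
client = Client("http://localhost:7860")

# The server's startup flags (e.g. the collection flag discussed above)
# determine the default behaviour; the client just sends a query,
# optionally naming a collection via langchain_mode.
kwargs = dict(
    instruction_nochat="Summarize my recent emails about travel plans.",
    langchain_mode="UserData",  # which collection to chat against
)
res = client.predict(str(kwargs), api_name="/submit_nochat_api")

# The endpoint returns a string-encoded dict; 'response' holds the answer.
res_dict = ast.literal_eval(res)
print(res_dict["response"])
```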
I would most likely make things worse by trying to explain. This is all very new to me, and I'm still very much learning the terminology.