
How can I merge UserData and LLM / ChatLLM data ? #203

Closed
slavag opened this issue May 29, 2023 · 14 comments

Comments

@slavag

slavag commented May 29, 2023

Hi,
I generated a db with my emails downloaded from Gmail, and I can use the UserData data source, or the LLM / ChatLLM data sources. Is there any option to combine them? UserData contains only my emails without anything else, while LLM / ChatLLM contains everything but my content.

Thanks

@pseudotensor
Collaborator

Hi! UserData assumes you are chatting with the UserData docs, so it's like ChatLLM plus those docs.

If you generated the db externally, perhaps following the readme, then yes, your docs are part of UserData.

@slavag
Author

slavag commented May 29, 2023

Well, I'm a bit confused :) I generated the db using this command:
python generate.py --base_model=gptj --score_model=None --langchain_mode='UserData' --user_path=user_path
And once it was generated I could chat with the UserData source, but answers seem to come only from UserData (often just an empty answer). If I select LLM / ChatLLM instead, I get a different result, which looks OK. So maybe I'm doing something wrong?

Thanks

@pseudotensor
Collaborator

@slavag I know it's a bit confusing. Especially without tooltips or other documentation, those names are not entirely clear. Happy to hear suggestions.

This may help: https://github.com/h2oai/h2ogpt/blob/main/FAQ.md#explain-things-in-ui

@slavag
Author

slavag commented May 30, 2023

Thanks, I found this: "To Chat with your docs, choose, e.g. UserData. To avoid including docs, and just chat with LLM, choose ChatLLM." But when I try to chat with UserData, I don't see the results I get with LLM. So how can this be fixed?

@pseudotensor
Collaborator

Hi @slavag can you show me what you mean by "I don't see results that I'm getting with LLM"? I can't quite follow. Thanks.

@pseudotensor pseudotensor reopened this May 30, 2023
@slavag
Author

slavag commented May 30, 2023

Hi,

Sure. In this screenshot I'm using UserData and the result is empty:

[screenshot: empty response with UserData]

And in this one I'm using ChatLLM, which does produce a result:

[screenshot: normal response with ChatLLM]

Thanks

@pseudotensor
Collaborator

pseudotensor commented May 30, 2023

@slavag Thanks for explaining. I think it might be related to #192, which I'll fix tomorrow. Basically, the prompt gets truncated away when the added doc chunks are too long, because of a rough estimate of the number of characters per token.

A temporary workaround is to run generate.py with `--top_k_docs=3` or `--chunk_size=256` to reduce the amount of data in the context. `top_k_docs` can also be controlled from the UI via expert settings.

Let me know if that helps. It fixes the cases I've seen. A more general solution will be done soon.
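For reference, the workaround above would look something like this, reusing the launch command from earlier in the thread (a sketch; the model and paths are just the ones this thread happens to use):

```shell
# Relaunch with fewer retrieved chunks per query, so the prompt
# is not truncated away by the context-length estimate.
python generate.py --base_model=gptj --score_model=None \
    --langchain_mode='UserData' --user_path=user_path \
    --top_k_docs=3
```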

@pseudotensor
Collaborator

BTW, you can also start the UI with `--h2ocolors=False` to get better colors in non-dark mode.

@slavag
Author

slavag commented May 30, 2023

@pseudotensor Thanks, `--top_k_docs=3` solved the issue; `--chunk_size=256` doesn't.
And special thanks for `--h2ocolors=False` :)

@pseudotensor
Collaborator

Hi @slavag, great. Will fix the general issue soon.

`chunk_size` applies when the database is created, so if you already created the database, it would only help if you remade the db.

`top_k_docs` is a good choice for now. I'll even make it the default until I resolve that issue.
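To illustrate the distinction (a sketch using the flags from this thread): `--chunk_size` takes effect only while the db is being built from `--user_path`, so changing it means remaking the db, whereas `--top_k_docs` is applied at query time against whatever db already exists:

```shell
# Rebuild the db with smaller chunks; this assumes the existing db
# is removed or remade, since chunk_size is baked in at creation time.
python generate.py --base_model=gptj --score_model=None \
    --langchain_mode='UserData' --user_path=user_path --chunk_size=256

# Or keep the existing db and simply retrieve fewer chunks per query.
python generate.py --base_model=gptj --score_model=None \
    --langchain_mode='UserData' --user_path=user_path --top_k_docs=3
```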

@Sanjana8888

How did you generate the db?

@JettScythe

JettScythe commented Jul 27, 2023

I don't think the title of this issue quite matches the OP's concern, but it's exactly what I'm looking for. @pseudotensor Is there a way I can use UserData with LLM mode? That is to say: if a user asks a question while in the LLM collection, but there is a document in UserData that contains relevant information, I would like the bot to output a mix of the two responses. As a workaround I thought I could upload a document to the LLM collection, but it seems to switch to MyData on upload.

[screenshots: uploading a document switches the selected collection to MyData]

I've also tried using 'All' (which, at least as I've conceptualized it, is not what I'm looking for), but I get `Did not generate db since no sources` and then a normal-looking LLM response.

I did just find #447 (comment) which works great on the front end. How can I use it with the Gradio Client?

After some tinkering, it seems that as long as that flag is passed in while generating, that's the default behaviour both in the front end and when using the Gradio client. Can you confirm?

@pseudotensor
Collaborator

I couldn't quite follow what is meant by "bot to output a mix of the two responses". Can you explain?

FYI `--use_llm_if_no_docs=True` is the default again; for a while I had it False, but it's a bit odd to just get "No related documents found" by default.

Yes, for the gradio client, the gradio server would just need to have that parameter set. Confirmed.
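A minimal sketch of what querying such a server from Python might look like, assuming the server was launched with `--use_llm_if_no_docs=True` and is reachable at `localhost:7860`. The payload keys and the `/submit_nochat_api` endpoint name are assumptions about the server's API, not confirmed in this thread:

```python
import json

def build_payload(prompt, langchain_mode="UserData"):
    """Build a JSON payload for a hypothetical h2oGPT no-chat endpoint.

    The key names here are placeholders; the exact schema depends on
    the server version, so check the running server's API docs.
    """
    kwargs = {
        "instruction_nochat": prompt,
        # With --use_llm_if_no_docs=True on the server, this mode falls
        # back to a plain LLM answer when no relevant docs are found.
        "langchain_mode": langchain_mode,
    }
    return json.dumps(kwargs)

payload = build_payload("What did my emails say about travel plans?")

# Sending it would use the gradio client, e.g.:
# from gradio_client import Client
# client = Client("http://localhost:7860")
# res = client.predict(payload, api_name="/submit_nochat_api")
print(json.loads(payload)["langchain_mode"])  # prints "UserData"
```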

@JettScythe

I would most likely make things worse by trying to explain. This is all very new to me and I'm still very much learning the terminology.
`--use_llm_if_no_docs=True` is exactly what I was looking for. Thank you very much :)
