I see issues with node deduplication. When ingesting transcripts of informal conversations, I get duplicate nodes for what is clearly the same person, e.g., "John Doe" and "John T. Doe". Is there a way to have more control over this, or even, post-training, a capability to collapse these nodes into a single one?
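To make the "collapse these nodes" request concrete, here is a minimal sketch of what such a post-hoc merge could look like on a plain list of edges. It does not use any LightRAG API. The `(source, relation, target)` triple layout, the `collapse_nodes` helper, and the hand-written `alias` map are all assumptions made for illustration.

```python
def collapse_nodes(edges, alias):
    """Rewrite edge endpoints through `alias`, then drop duplicates and self-loops.

    `edges` is a list of (source, relation, target) triples (hypothetical layout).
    `alias` maps a duplicate node name to the name that should be kept.
    """
    seen = set()
    merged = []
    for src, relation, dst in edges:
        # Redirect both endpoints to their canonical node, if one is defined.
        src = alias.get(src, src)
        dst = alias.get(dst, dst)
        if src == dst:
            continue  # merging can turn an edge into a self-loop; skip it
        triple = (src, relation, dst)
        if triple not in seen:
            seen.add(triple)
            merged.append(triple)
    return merged


edges = [
    ("John Doe", "works_at", "Acme"),
    ("John T. Doe", "works_at", "Acme"),
    ("John T. Doe", "knows", "Jane Roe"),
]
alias = {"John Doe": "John T. Doe"}  # chosen by hand for this example
merged_edges = collapse_nodes(edges, alias)
```

After the merge, the two `works_at` edges become one, and every remaining edge points at "John T. Doe". A real graph store would also need to merge node descriptions and any embeddings, which this sketch leaves out.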
I posted the same question on Discord yesterday. I'm testing LightRAG with scientific papers, which very often use abbreviated names. The same goes for entities that are rephrased.
@rabner - I guess I could try to preprocess the raw text and normalize the names before training, but being able to have some control over the deduping seems like a basic capability that is still missing.
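A minimal sketch of the preprocessing idea, assuming the list of name variants is already known (for example, collected by hand or from a first extraction pass). The helpers `name_key`, `build_alias_map`, and `normalize_text` are hypothetical names, not part of LightRAG. The matching rule (same first and last token, middle names and initials ignored) is only one possible heuristic: it will wrongly merge two different people who share a first and last name, and it will not link an abbreviated form such as "J. Doe" to "John Doe".

```python
import re


def name_key(name):
    """Comparison key: lowercase first and last token, middle tokens dropped."""
    tokens = re.findall(r"[A-Za-z]+", name.lower())
    if len(tokens) < 2:
        return " ".join(tokens)
    return f"{tokens[0]} {tokens[-1]}"


def build_alias_map(names):
    """Map every variant to the longest variant that shares its key."""
    groups = {}
    for name in names:
        groups.setdefault(name_key(name), []).append(name)
    alias = {}
    for variants in groups.values():
        canonical = max(variants, key=len)
        for variant in variants:
            alias[variant] = canonical
    return alias


def normalize_text(text, alias):
    """Replace each known variant with its canonical form in a single pass."""
    if not alias:
        return text
    # Longest variants first, so a short variant never pre-empts a longer one.
    variants = sorted(alias, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(v) for v in variants) + r")(?!\w)"
    )
    return pattern.sub(lambda m: alias[m.group(0)], text)


alias = build_alias_map(["John Doe", "John T. Doe", "Jane Roe"])
clean = normalize_text("John Doe met Jane Roe. John T. Doe left early.", alias)
```

Here `clean` reads "John T. Doe met Jane Roe. John T. Doe left early.", so the ingestion step only ever sees one spelling. Doing the substitution in one regex pass avoids rewriting text that an earlier replacement has already produced.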
@jbartot Hi, I am having the same issue, and like @rabner I am dealing with scientific papers. Did either of you find a solution? I am also trying to ensure that references are consistent, which seems to be a hot topic in the RAG world. One obvious issue is that papers are referenced in different styles depending on the publisher. I have tried using Marker with an LLM and made some changes to the joint metadata.json, so each paper now has several fields, including the equations and references. I am not sure this is the right way of doing it, since I am also creating opportunities for introducing errors.
What are you using for converting pdf to txt?