Biocypher KG rag prompt construction need query result examples #204

Open
winternewt opened this issue Sep 10, 2024 · 0 comments

Issue description

When the KG RAG agent constructs a query against a knowledge graph, it uses the schema and a series of prompts to build the query from the available entities, relationships, and properties. The result is a formally correct query. For example, one of the prompts looks like this:

Generate a database query in Cypher that answers the user's question. You can use the following entities: ['Drug'], relationships: ['DrugInteraction'], and properties: {'Entities': {'Drug': {'name': 'metformin'}}, 'Relationships': {'DrugInteraction': {'level': None, 'class': None}}}. Given the following valid combinations of source, relationship, and target: '(:None)-(:DrugInteraction)->(:None)', generate a Cypher query using one of these combinations. Only return the query, without any additional text, symbols or characters --- just the query statement.

However, entities in a knowledge graph may follow a naming convention that the agent does not take into account, because that context is not supplied to it. I have already run into the same issue with database RAG agents: the database contained Latin species names, say "M. musculus", so it was essential for the LLM to convert, and sometimes reformulate, user queries containing common names, plurals, generalizations, and even other taxa such as "mice", "rodent", or "insects" into the specific set of Latin species names present in the "Latin Names" column.
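
To make the species example concrete, here is a toy sketch of the mapping involved. In the real agent the LLM did this conversion; the lookup table, the `to_latin_names` helper, and every entry other than "M. musculus" are illustrative assumptions, not data from that database.

```python
# Hypothetical mapping from user vocabulary to values of the "Latin Names" column.
# Entries other than "M. musculus" are illustrative assumptions.
LATIN_NAMES = {
    "mouse": ["M. musculus"],
    "mice": ["M. musculus"],
    "rodent": ["M. musculus", "R. norvegicus"],
}


def to_latin_names(term):
    """Return the Latin names a user term expands to, or [] if the term is unknown."""
    return LATIN_NAMES.get(term.strip().lower(), [])


mice = to_latin_names("Mice")        # plural common name, any capitalization
unknown = to_latin_names("insects")  # not in the table, so nothing to query for
```

Without knowing which values the column actually holds, neither a table like this nor an LLM can produce a term that matches.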

Steps to reproduce

https://drugs.longevity-genie.info/

Query A:

What drug interactions of Metformin are you aware of? What are these interactions?
In query A the drug name "Metformin" happens to match the unstated capitalized-first-letter naming convention of the drug interactions graph, so the query yields proper results and a correct answer.

Query B:

What drug interactions of metformin are you aware of? What are these interactions?
In query B the generated Cypher query is exactly the same; the only difference is 'm' instead of 'M'.
The query is formally correct, but the number of results is 0. The LLM has no way of knowing that a capitalized-first-letter naming convention is in play, and it can't infer that from a result that contains no examples.
Reflection doesn't solve this at all, for the same reason: 0 results is technically a valid result, and recognizing it as a false negative requires either trial and error or prior knowledge.
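
The difference between the two queries can be reproduced without a database. The sketch below is a stand-in, assuming the graph stores the name as "Metformin" and the generated query compares names with plain string equality (which is case-sensitive in Cypher). The `DRUG_NAMES` list and both helper functions are hypothetical.

```python
# Hypothetical stand-in for the Drug nodes in the graph; the real data lives in the KG.
DRUG_NAMES = ["Metformin", "Aspirin", "Warfarin"]


def exact_match(name, names=DRUG_NAMES):
    """Mimics a case-sensitive filter such as `WHERE d.name = $name`."""
    return [n for n in names if n == name]


def case_insensitive_match(name, names=DRUG_NAMES):
    """Mimics a normalized filter such as `WHERE toLower(d.name) = toLower($name)`."""
    return [n for n in names if n.lower() == name.lower()]


query_a = exact_match("Metformin")                       # matches the stored value
query_b = exact_match("metformin")                       # formally valid, but empty
query_b_normalized = case_insensitive_match("metformin")  # finds the node again
```

The empty list from `query_b` looks the same as a genuine "no interactions" answer, which is why reflection on the result alone cannot detect the problem.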

Expected result

This can be mitigated on a case-by-case basis with some prompting, but it requires a more robust solution.
Either a more detailed schema or few-shot output examples of * are essential for the LLM to comprehend the unspecified conventions present in the contents; even better, both.
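
As a rough sketch of the few-shot idea, sampled property values could be appended to the query-generation prompt. The `add_value_examples` function, its parameters, and the hint wording are assumptions for illustration, not the project's actual prompt code. The sample values are assumed to come from a separate lookup such as `MATCH (d:Drug) RETURN d.name LIMIT 5`.

```python
def add_value_examples(base_prompt, entity, prop, sample_values):
    """Append example stored values so the LLM can see the naming convention."""
    examples = ", ".join(repr(value) for value in sample_values)
    hint = (
        f"Example values of {entity}.{prop} as stored in the graph: {examples}. "
        "Match the spelling and capitalization of stored values when filtering."
    )
    return f"{base_prompt}\n{hint}"


prompt = add_value_examples(
    base_prompt="Generate a database query in Cypher that answers the user's question.",
    entity="Drug",
    prop="name",
    sample_values=["Metformin", "Aspirin"],  # illustrative values, not real query output
)
```

With the examples in the prompt, the model has a chance to write 'Metformin' in the query even when the user typed 'metformin'.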
