Biocypher KG rag prompt construction need query result examples #204

winternewt · 2024-09-10T23:50:55Z

Issue description

When the KG RAG agent constructs a query against a knowledge graph, it uses the schema and a series of prompts to build the query from the available entities, relationships, and fields, which results in a formally correct query. An example of such a prompt:

Generate a database query in Cypher that answers the user's question. You can use the following entities: ['Drug'], relationships: ['DrugInteraction'], and properties: {'Entities': {'Drug': {'name': 'metformin'}}, 'Relationships': {'DrugInteraction': {'level': None, 'class': None}}}. Given the following valid combinations of source, relationship, and target: '(:None)-(:DrugInteraction)->(:None)', generate a Cypher query using one of these combinations. Only return the query, without any additional text, symbols or characters --- just the query statement.

However, entities in a knowledge graph may follow a naming convention that the agent does not take into account, because that context is not supplied to it. I have run into the same issue with database RAG agents before: the database contained Latin species names, say "M. musculus", so it was essential for the LLM to convert, and sometimes reformulate, user queries containing common names, plurals, generalizations, and even other taxa such as "mice", "rodent", or "insects" into the specific set of Latin species names present in the "Latin Names" column.
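To illustrate the kind of conversion described above, here is a minimal sketch. The lookup table, its entries other than "M. musculus", and the function name `to_latin_names` are hypothetical and only stand in for whatever the LLM or a preprocessing step would do; they are not part of any existing agent.

```python
# Hypothetical mapping from user-facing terms to the Latin names assumed to be
# stored in the "Latin Names" column. Entries are illustrative only.
COMMON_TO_LATIN = {
    "mouse": ["M. musculus"],
    "mice": ["M. musculus"],
    "rodent": ["M. musculus", "R. norvegicus"],
}


def to_latin_names(term: str) -> list[str]:
    """Return the stored Latin names for a user term, or [] if unknown."""
    # Normalize whitespace and case before the lookup, since users write
    # "Mice", "mice ", etc.
    return COMMON_TO_LATIN.get(term.strip().lower(), [])
```

An unknown term such as "insects" returns an empty list here, which a caller could treat as a signal to ask the LLM to reformulate rather than run a query that silently matches nothing.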

Steps to reproduce

https://drugs.longevity-genie.info/

Query A:

What drug interactions of Metformin are you aware of? What are these interactions?
In query A, the drug name "Metformin" matches the unspecified 'first capital letter' naming convention of the drug interactions graph, and therefore yields proper results and a correct answer.

Query B:

What drug interactions of metformin are you aware of? What are these interactions?
In query B, the Cypher query is exactly the same; the only difference is 'm' instead of 'M'.
The query is formally correct, but the number of results is 0. The LLM has no means of knowing that a 'first capital letter' naming convention is in play, and it can't infer that from a result that contains no examples.
Reflection doesn't solve this at all, for the same reason: 0 results is technically a valid result, and recognizing it as a false negative requires either trial and error or prior knowledge.
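The trial-and-error route could look roughly like the sketch below: run the exact-match query first and, on 0 results, retry with a case-insensitive comparison using Cypher's `toLower()`. The two query strings are assumptions about what the agent generates (the actual generated query is not shown in this issue), and `run_query` is a hypothetical callable standing in for the real database driver.

```python
from typing import Callable

# Assumed shape of the generated query; the real one may differ.
EXACT_QUERY = (
    "MATCH (d:Drug)-[i:DrugInteraction]-(other) "
    "WHERE d.name = $name RETURN i, other"
)
# Same query, but comparing names case-insensitively.
CASE_INSENSITIVE_QUERY = (
    "MATCH (d:Drug)-[i:DrugInteraction]-(other) "
    "WHERE toLower(d.name) = toLower($name) RETURN i, other"
)


def query_with_fallback(
    run_query: Callable[[str, dict], list], name: str
) -> list:
    """Try an exact match; on an empty result, retry ignoring case."""
    rows = run_query(EXACT_QUERY, {"name": name})
    if rows:
        return rows
    # 0 rows may be a false negative caused by a naming convention.
    return run_query(CASE_INSENSITIVE_QUERY, {"name": name})
```

This only covers capitalization. It would not help with conventions such as Latin versus common names, which is why supplying the convention to the LLM up front is the more general fix.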

Expected result

This can be mitigated on a case-by-case basis with some prompting, but it requires a more robust solution.
Either a more detailed schema or few-shot examples of query output are essential for the LLM to comprehend the unspecified conventions present in the contents; even better, both.
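As a rough sketch of the few-shot idea, sample property values could be appended to the query-generation prompt so the LLM sees the stored spelling. The function name `with_value_examples`, the wording of the added instruction, and the `"Drug.name"` key format are all hypothetical; how the samples would be fetched from the graph is left open.

```python
def with_value_examples(prompt: str, samples: dict[str, list[str]]) -> str:
    """Append example stored values to a query-generation prompt."""
    lines = [
        prompt.rstrip(),
        "",
        "Example values stored in the graph "
        "(match their spelling and capitalization exactly):",
    ]
    # One line per property, e.g. "- Drug.name: 'Metformin'".
    for prop, values in samples.items():
        lines.append(f"- {prop}: " + ", ".join(repr(v) for v in values))
    return "\n".join(lines)
```

For example, `with_value_examples(prompt, {"Drug.name": ["Metformin"]})` would show the LLM that drug names start with a capital letter, without the convention having to be stated explicitly.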


winternewt mentioned this issue Sep 11, 2024

Graph nodes Semantic search biocypher/biocypher#374

Open
