add knowledge graph vignette
slobentanzer committed Feb 6, 2024
1 parent e7278f6 commit f6253c6
Showing 4 changed files with 100 additions and 9 deletions.
Binary file added docs/kg-demo.gif
85 changes: 85 additions & 0 deletions docs/vignette-kg.md
@@ -0,0 +1,85 @@
# Vignette: Knowledge Graph RAG

This vignette demonstrates the KG module of BioChatter as used by the
BioChatter Next application. We connect to a BioCypher knowledge graph (KG) to
retrieve relevant information for a given question. We then use the retrieved
information to generate a response to the question. The application can connect
to any real-world BioCypher KG by providing the connection details in the `KG
Settings` dialog.

## Background

For the purposes of this vignette, we include a demo KG based on an open-source
dataset of crime statistics in Manchester. We chose this dataset because it is
easy to understand and because its small size and public domain licence allow us
to redistribute the KG. This is the schema of the KG:

```mermaid
graph LR;
Person(:Person) -- KNOWS --> Person
Person -- FAMILY_REL --> Person
Person -- LIVES_AT --> Location(:Location)
Person -- PARTY_TO --> Crime(:Crime)
Person -- MADE_CALL --> PhoneCall(:PhoneCall)
Person -- RECEIVED_CALL --> PhoneCall
Crime -- INVESTIGATED_BY --> Officer(:Officer)
Crime -- OCCURRED_AT --> Location
Object(:Object) -- INVOLVED_IN --> Crime
```
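To make the schema concrete, it can be written down as (source label, relationship type, target label) triples, for example as compact schema information to put in front of a query-generating model. This is a minimal sketch of our own, not BioChatter code; `SCHEMA`, `relationships_between`, and `schema_summary` are hypothetical names.

```python
# Hypothetical encoding of the demo KG schema above as
# (source label, relationship type, target label) triples.
SCHEMA = [
    ("Person", "KNOWS", "Person"),
    ("Person", "FAMILY_REL", "Person"),
    ("Person", "LIVES_AT", "Location"),
    ("Person", "PARTY_TO", "Crime"),
    ("Person", "MADE_CALL", "PhoneCall"),
    ("Person", "RECEIVED_CALL", "PhoneCall"),
    ("Crime", "INVESTIGATED_BY", "Officer"),
    ("Crime", "OCCURRED_AT", "Location"),
    ("Object", "INVOLVED_IN", "Crime"),
]


def relationships_between(source: str, target: str) -> list[str]:
    """Return the relationship types that point from `source` to `target`."""
    return [rel for src, rel, tgt in SCHEMA if src == source and tgt == target]


def schema_summary() -> str:
    """Render the schema as Cypher-style patterns, one per line."""
    return "\n".join(f"(:{src})-[:{rel}]->(:{tgt})" for src, rel, tgt in SCHEMA)


print(relationships_between("Person", "PhoneCall"))  # ['MADE_CALL', 'RECEIVED_CALL']
print(schema_summary().splitlines()[0])  # (:Person)-[:KNOWS]->(:Person)
```

Relationships are directed, so `relationships_between("Crime", "Person")` returns an empty list even though persons and crimes are connected via `PARTY_TO`.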

The KG is adapted from a [Neo4j
tutorial](https://github.com/neo4j-graph-examples/pole), and is available as a
BioCypher adapter including a BioChatter Light integration
[here](https://github.com/biocypher/pole). We also include it in an optional
BioChatter Next Docker Compose
[configuration](https://github.com/biocypher/biochatter-next/blob/main/biochatter-next/docker-compose-incl-kg.yml)
to allow trying it out locally.

## Usage

In BioChatter Next, we first open the `KG Settings` dialog via the button in the
sidebar. In this dialog, we can activate the KG functionality and select how many
results we want to retrieve. Returning to the conversation, we enable the KG
functionality for the current chat (directly above the send button) and can then
ask the model about the KG. The conversation is reproduced at the end of this
vignette for convenience.

![KG Demo](kg-demo.gif)

In the background, the RagAgent module of BioChatter receives the question and
generates a query to retrieve the desired information from the KG. The query
results are then passed back to the primary model, which uses them to generate
its answer.
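The retrieve-then-generate flow described above can be sketched with the language model and the graph database replaced by stubs. This is not the BioChatter `RagAgent` implementation or API: all function names are hypothetical, the prompt wording is made up, and `n_results` only loosely mirrors the "how many results" setting in the `KG Settings` dialog.

```python
from typing import Callable

# Hypothetical interfaces: a query generator turns a question into a query
# string, and an executor runs that query against the KG and returns rows.
QueryGenerator = Callable[[str], str]
QueryExecutor = Callable[[str], list[dict]]


def retrieve(question: str, generate: QueryGenerator, execute: QueryExecutor,
             n_results: int) -> list[dict]:
    """Generate a query for the question, run it, and keep at most n_results rows."""
    query = generate(question)
    return execute(query)[:n_results]


def build_prompt(question: str, results: list[dict]) -> str:
    """Combine the retrieved rows and the question into a prompt for the primary model."""
    context = "\n".join(str(row) for row in results)
    return (
        "Use the following knowledge graph results to answer the question.\n"
        f"Results:\n{context}\n"
        f"Question: {question}"
    )


# Stubs standing in for an LLM-based query generator and a graph database.
def fake_generate(question: str) -> str:
    return "MATCH (c:Crime) RETURN count(c) AS n_crimes"


def fake_execute(query: str) -> list[dict]:
    return [{"n_crimes": 3}]


question = "How many crimes happened in total?"
results = retrieve(question, fake_generate, fake_execute, n_results=5)
print(build_prompt(question, results))
```

Keeping retrieval and prompt construction as separate steps makes it easy to swap the stubs for a real model and database connection without touching the prompt logic.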

## Conclusion

The native support for BioCypher KGs in the BioChatter framework allows KGs to be
integrated seamlessly into conversational AI. This, in turn, makes knowledge
more accessible in a wide range of application domains.

## Conversation

### Message From You
How many crimes happened in total?

### Message From ChatGPT
According to the provided statement, the total number of crimes is 28,762. This information suggests that there have been 28,762 reported crimes in the given context. However, without additional information or context, it is difficult to determine the specific time period, location, or nature of these crimes. It's important to note that crime statistics can vary depending on the source, methodology, and reporting practices. Therefore, it is always advisable to consult official crime reports or local law enforcement agencies for the most accurate and up-to-date information on crime statistics.

### Message From You
Where happened most crimes?

### Message From ChatGPT
Based on the provided statement, it appears that the location with the highest number of crimes is a parking area, with a total of 811 reported crimes. However, without additional context or information, it is difficult to determine the consistency or inconsistency of this statement with other available information. It is important to note that crime rates and locations can vary significantly depending on various factors such as the region, city, neighborhood, and specific time period being considered. To provide a more comprehensive analysis, it would be helpful to have access to a broader dataset or additional information about the specific location and its crime statistics.

### Message From You
Who were the top 3 offenders?

### Message From ChatGPT
Based on the provided information, the top three offenders are:

1. Phillip Williamson: According to the information given, Phillip Williamson has committed 5 crimes.

2. Jessica Kelly: The information states that Jessica Kelly has also committed 5 crimes.

3. Brian Morales: The data indicates that Brian Morales has committed 4 crimes.

These three individuals, Phillip Williamson, Jessica Kelly, and Brian Morales, are the top offenders based on the number of crimes they have committed.
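The three questions above correspond to simple counting and ranking queries over the schema. The sketch below reproduces that logic in plain Python over a small, made-up toy graph (not the real demo data). The comments give roughly equivalent Cypher; the property names `address` and `name` are assumptions about the KG, not confirmed by this vignette.

```python
from collections import Counter

# Made-up toy data shaped like the demo schema (not the real crime data).
CRIMES = {"c1", "c2", "c3", "c4", "c5"}
# (person name, crime id) pairs for Person -[:PARTY_TO]-> Crime
PARTY_TO = [
    ("Alice", "c1"), ("Alice", "c2"), ("Alice", "c3"), ("Alice", "c4"),
    ("Bob", "c1"), ("Bob", "c5"),
    ("Carol", "c2"), ("Carol", "c3"), ("Carol", "c5"),
    ("Dave", "c5"),
]
# (crime id, location address) pairs for Crime -[:OCCURRED_AT]-> Location
OCCURRED_AT = [
    ("c1", "Station Road"), ("c2", "Station Road"), ("c3", "High Street"),
    ("c4", "Station Road"), ("c5", "Market Street"),
]


def total_crimes() -> int:
    # Roughly: MATCH (c:Crime) RETURN count(c)
    return len(CRIMES)


def location_with_most_crimes() -> tuple[str, int]:
    # Roughly: MATCH (c:Crime)-[:OCCURRED_AT]->(l:Location)
    #          RETURN l.address, count(c) AS n ORDER BY n DESC LIMIT 1
    return Counter(address for _, address in OCCURRED_AT).most_common(1)[0]


def top_offenders(k: int = 3) -> list[tuple[str, int]]:
    # Roughly (for k = 3): MATCH (p:Person)-[:PARTY_TO]->(c:Crime)
    #                      RETURN p.name, count(c) AS n ORDER BY n DESC LIMIT 3
    return Counter(person for person, _ in PARTY_TO).most_common(k)


print(total_crimes())               # 5
print(location_with_most_crimes())  # ('Station Road', 3)
print(top_offenders())              # [('Alice', 4), ('Carol', 3), ('Bob', 2)]
```

When counts are tied, as for the two people with 5 crimes in the answer above, `ORDER BY n DESC` alone does not fix the order of the tied rows in Cypher; adding a secondary sort key (for example, the name) makes such a ranking deterministic.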
16 changes: 8 additions & 8 deletions docs/vignette-rag.md
@@ -1,4 +1,4 @@
-# Vignette: Retrieval-Augmented Generation
+# Vignette: Retrieval-Augmented Generation (RAG)

This vignette demonstrates the RAG module of BioChatter as used by the
BioChatter Next application. This basic use case involves an LLM manuscript
@@ -76,19 +76,19 @@ database instance, free of charge at

### BioChatter RAG

-#### Contextual Prompt:
+#### Contextual Prompt
You are an assistant to a biomedical researcher.

-#### Contextual Prompt:
+#### Contextual Prompt
You have access to text fragments from one or multiple scientific manuscripts, which were retrieved by their semantic similarity to the user's question.

-#### Contextual Prompt:
+#### Contextual Prompt
Your task is to help the user understand the manuscripts and their relevance to their questions. Be concise in your explanations, provide quotes from the manuscripts where appropriate, and be open about limitations in cases where the text does not contain pertinent information.

-#### Message From You:
+#### Message From You
What is known about the CFTR protein in resistance to ovarian cancer?

-#### Message From BioChatter:
+#### Message From BioChatter
Based on the provided information, there are several studies that suggest a potential role of the CFTR (Cystic Fibrosis Transmembrane Conductance Regulator) protein in resistance to ovarian cancer. Here are some key findings from the referenced manuscripts:

1. Gene expression profiling of ovarian tumors before and after chemotherapy revealed molecular signatures of chemoresistant tumors (M, et al., 2006). This study did not specifically mention CFTR, but it provides insights into the molecular changes associated with chemoresistance.
@@ -107,10 +107,10 @@ Based on the provided information, there are several studies that suggest a pote

It is important to note that while these studies suggest a potential association between CFTR and resistance to ovarian cancer, the exact mechanisms and clinical implications are still being investigated. Further research is needed to fully understand the role of CFTR in ovarian cancer resistance.

-#### Message From You:
+#### Message From You
Tell me more about the studies of aggression and knockdown of CFTR in vivo.

-#### Message From BioChatter:
+#### Message From BioChatter
Based on the provided information, there are two relevant studies regarding the aggression and knockdown of CFTR in ovarian cancer:

1. The study by Wu et al. (2013) investigated the constitutive activation of nuclear factor κB (NF-κB) and its contribution to CFTR expression, cancer progression, and poor prognosis in cervical cancer. Although this study is not specific to ovarian cancer, it provides insights into the role of CFTR and NF-κB in promoting cancer progression.
8 changes: 7 additions & 1 deletion mkdocs.yml
@@ -57,7 +57,8 @@ nav:
- Overview: benchmark-overview.md
- All Results: benchmark-results.md
- Vignettes:
- - Retrieval-Augmented Generation: vignette-rag.md
+ - Retrieval-Augmented Generation (RAG): vignette-rag.md
+ - Knowledge Graph RAG: vignette-kg.md
- API Reference:
- LLM Connectivity: llm_connect-reference.md
- Vectorstore: vectorstore-reference.md
@@ -82,6 +83,11 @@ markdown_extensions:
slugify: !!python/object/apply:pymdownx.slugs.slugify
kwds:
case: lower
+ - pymdownx.superfences:
+ custom_fences:
+ - name: mermaid
+ class: mermaid
+ format: !!python/name:pymdownx.superfences.fence_code_format

extra_javascript:
- https://unpkg.com/tablesort@5.3.0/dist/tablesort.min.js