Add AI Ecosystem page #995

Merged · 18 commits · Oct 29, 2024

1 change: 1 addition & 0 deletions pages/_meta.json
@@ -2,6 +2,7 @@
"index": "Home",
"getting-started": "Getting started",
"client-libraries": "Client libraries",
"ai-ecosystem": "AI ecosystem",
"fundamentals": "Fundamentals",
"data-migration": "Data migration",
"querying": "Querying",
41 changes: 41 additions & 0 deletions pages/ai-ecosystem.mdx
@@ -0,0 +1,41 @@
---
title: Memgraph's AI Ecosystem
description: Explore key features, such as community detection, node embeddings, and graph neural networks, alongside integrations with popular AI libraries like LangChain and LlamaIndex, to create powerful, data-driven GenAI solutions.
---

import { Card, Cards } from 'nextra/components'
import { Callout } from 'nextra/components'

# Memgraph's AI Ecosystem

To learn about Memgraph's key features for building AI apps, explore the following
pages:

- [GraphRAG](/ai-ecosystem/graph-rag)
- [Machine learning](/ai-ecosystem/machine-learning)


AI spans multiple areas like machine learning (ML), natural language processing
(NLP), and knowledge representation and reasoning (KRR), often overlapping to
create advanced systems. A key example is **Generative AI (GenAI)**, which generates
new content like text or images. Large Language Models (LLMs) power many GenAI
apps, but getting them to work with your custom data can be challenging.

Fine-tuning LLMs to incorporate custom data is often complex, slow, and costly.
Plus, frequent updates make it inefficient.

**Retrieval-Augmented Generation (RAG)** solves this by enhancing LLMs with
external data sources, enabling dynamic, scalable knowledge updates. Traditional
RAG relies on vector search over a vector database, and it has proven to be
a great solution in many use cases. Still, it often falls short when retrieving
crucial knowledge from complex datasets. That is where GraphRAG excels.

**GraphRAG** improves on this by using knowledge graphs and graph features (e.g.,
community detection, neighborhood analysis) for more accurate retrieval and
data-rich insights. This hybrid approach provides better context and performance
for GenAI applications.

Memgraph has been a popular choice in AI, especially for use cases that rely on
[machine learning](/ai-ecosystem/machine-learning). It is also a great fit for
building a [GraphRAG](/ai-ecosystem/graph-rag).

4 changes: 4 additions & 0 deletions pages/ai-ecosystem/_meta.json
@@ -0,0 +1,4 @@
{
"graph-rag": "GraphRAG",
"machine-learning": "Machine learning"
}
286 changes: 286 additions & 0 deletions pages/ai-ecosystem/graph-rag.mdx
@@ -0,0 +1,286 @@
---
title: GraphRAG
description: Learn how Memgraph fits into the architecture of your RAG system.
---
import { Callout } from 'nextra/components'
import { Card, Cards } from 'nextra/components'

# GraphRAG with Memgraph


LLMs have the knowledge they were trained on. By building a RAG, you're
expanding that knowledge with your data. It is important to understand how to
**structure and model** the data and how to **find and extract relevant
information** so the LLM can provide more accurate responses personalized to your
specific data.

![graphrag-memgraph](/pages/ai-ecosystem/graphrag-memgraph.png)

**GraphRAG is a RAG system that combines the strengths of knowledge graphs and
LLMs**. Knowledge graphs are a structured representation of information where
entities and their relationships are organized to enable reasoning and insights.

Here are the main strengths of a graph in a RAG system:
- **Relational context** - The knowledge graph structure captures the semantics
of the data.
- **Improved retrieval accuracy** - Graph-specific retrieval strategies, such as
community detection and impact analytics.
- **Multi-hop reasoning** - The ability to traverse through data neighborhoods.
- **Efficient information navigation** - Scanning subgraphs instead of full
datasets.
- **Dynamically evolving knowledge** - Updating the graph in real time.

**Graph structure is a prerequisite for GraphRAG, and a graph database is even
better**. A GraphRAG application running in production needs **scalability**,
**real-time performance**, **incremental updates** and **persistence**. Having a
graph database as part of the GraphRAG stack is especially useful if other parts
of the application also rely on it.


## Key Memgraph features

Memgraph is a graph database that stores your knowledge graph, and it ensures
[durability](/fundamentals/data-durability) of stored data for backup and
recovery. Refer to our [graph modeling guide](/fundamentals/graph-modeling) for
tips and tricks on building a knowledge graph.

Since Memgraph is an in-memory graph database, you can quickly traverse your
graph with [deep path traversals](/advanced-algorithms/deep-path-traversal)
without worrying about latency.
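
As a minimal sketch of what such a traversal can look like from application
code, the snippet below runs a breadth-first expansion over Bolt using the
neo4j Python driver (Memgraph speaks the Bolt protocol); the `Person`/`KNOWS`
schema and the connection settings are hypothetical placeholders.

```python
from neo4j import GraphDatabase

# Minimal sketch: assumes a local Memgraph instance without authentication
# and a hypothetical (:Person)-[:KNOWS]->(:Person) graph.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("", ""))

# Breadth-first expansion up to 5 hops from the starting node.
query = """
MATCH path = (p:Person {name: $name})-[:KNOWS *BFS ..5]->(other:Person)
RETURN DISTINCT other.name AS name;
"""

with driver.session() as session:
    for record in session.run(query, name="Alice"):
        print(record["name"])

driver.close()
```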

You can [ingest streaming data](/data-streams) into Memgraph from Kafka,
Redpanda or Pulsar and then query it with (dynamic) MAGE algorithms or your
custom procedures. That allows you to maintain a growing knowledge graph that's
updated on the fly.
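
For illustration, here is a rough sketch of wiring up such a stream from
Python; the topic name, the transformation module (`my_transform.events_to_graph`)
and the broker address are hypothetical placeholders, and the transformation
module must already be loaded into Memgraph.

```python
from neo4j import GraphDatabase

# Rough sketch only: the topic, transformation module and broker address are
# placeholders; a transformation module with that name must already exist.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("", ""))

with driver.session() as session:
    # Connect Memgraph to a Kafka topic and map incoming messages to graph updates.
    session.run(
        "CREATE KAFKA STREAM events_stream "
        "TOPICS events "
        "TRANSFORM my_transform.events_to_graph "
        "BOOTSTRAP_SERVERS 'localhost:9092';"
    )
    # Start consuming messages.
    session.run("START STREAM events_stream;")

driver.close()
```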

Here are the most useful Memgraph features for building a GraphRAG (a short
usage sketch follows the list):

- [Deep-path traversals](/advanced-algorithms/deep-path-traversal)
- [Louvain community
detection](/advanced-algorithms/available-algorithms/community_detection)
- [Dynamic community
detection](/advanced-algorithms/available-algorithms/community_detection_online)
- [PageRank](/advanced-algorithms/available-algorithms/pagerank)
- [Dynamic PageRank](/advanced-algorithms/available-algorithms/pagerank_online)
- [Text search](/querying/text-search)
- [Run-time schema tracking](/querying/schema#run-time-schema-tracking)
- **Coming soon - Memgraph 2.21** - Vector search - Usually used in the first
step of finding and extracting relevant information (pivot search)
- **Coming soon - Memgraph 2.21** - Leiden community detection - Proven to be a
better and faster version of Louvain community detection, guaranteeing
well-connected communities. It is usually used in the second step of finding
and extracting relevant information (relevance expansion).
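
As a rough usage sketch, the snippet below calls the Louvain community
detection and PageRank MAGE procedures over Bolt; it assumes a local Memgraph
instance with MAGE installed and a graph whose nodes carry a `name` property.

```python
from neo4j import GraphDatabase

# Assumes a local Memgraph instance with the MAGE library, no authentication,
# and nodes that have a 'name' property.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("", ""))

with driver.session() as session:
    # Louvain community detection: group nodes into communities.
    for record in session.run(
        "CALL community_detection.get() YIELD node, community_id "
        "RETURN node.name AS name, community_id;"
    ):
        print(record["name"], record["community_id"])

    # PageRank: surface the most influential nodes first.
    for record in session.run(
        "CALL pagerank.get() YIELD node, rank "
        "RETURN node.name AS name, rank ORDER BY rank DESC LIMIT 10;"
    ):
        print(record["name"], record["rank"])

driver.close()
```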

Here is how those features fit into the GraphRAG architecture:

![graphrag](/pages/ai-ecosystem/graphrag.png)

## Tools

[GraphChat](/data-visualization/user-manual/graphchat) is a Memgraph Lab feature
that allows users to extract insights from a graph database by asking questions
in plain English. It incorporates elements of GraphRAG. This two-phase
Generative AI app first generates Cypher queries from the text and then
summarizes the query results in the final response.

![graphchat](/pages/ai-ecosystem/graphchat-rag.png)

## Integrations

Memgraph offers several integrations with popular AI frameworks to help you
customize and build your own GenAI application from scratch. Below are some of
the libraries integrated with Memgraph.

{<h4>LangChain</h4>}

LangChain is a framework for developing applications powered by large language
models (LLMs). Currently, Memgraph's LangChain integration lets you query your
graph database in natural language. An example can be found in the [LangChain
documentation](https://python.langchain.com/docs/integrations/graphs/memgraph/).

<Callout type="info">
We are in the process of updating and improving the integration. We added
support for building a knowledge graph from unstructured data and improved schema
generation speed. To track progress and speed things up, please upvote the [PR on LangChain
GitHub](https://github.com/langchain-ai/langchain/pull/27017) 👍.
</Callout>
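
For a rough idea of what querying Memgraph in natural language with LangChain
looks like, here is a minimal sketch based on the community integration; import
paths and required arguments can change between LangChain releases, so treat it
as illustrative rather than definitive.

```python
import os

from langchain_community.graphs import MemgraphGraph
from langchain.chains import GraphCypherQAChain
from langchain_openai import ChatOpenAI

os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>"  # Replace with your OpenAI API key

# Connect to a local Memgraph instance (default: no authentication).
graph = MemgraphGraph(url="bolt://localhost:7687", username="", password="")

# The chain turns the question into a Cypher query, runs it against Memgraph
# and lets the LLM phrase the final answer.
chain = GraphCypherQAChain.from_llm(
    ChatOpenAI(model="gpt-4", temperature=0),
    graph=graph,
    verbose=True,
    allow_dangerous_requests=True,  # required by recent LangChain releases
)

print(chain.invoke({"query": "Which nodes are connected to Alice?"}))
```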

{<h4>LlamaIndex</h4>}

LlamaIndex is a simple, flexible data framework for connecting custom data
sources to large language models. Currently, [Memgraph's
integration](https://docs.llamaindex.ai/en/stable/api_reference/storage/graph_stores/memgraph/)
supports creating a knowledge graph from unstructured data and querying with
natural language. You can follow the example on [LlamaIndex
docs](https://docs.llamaindex.ai/en/stable/examples/property_graph/property_graph_memgraph/)
or go through the quick start below.

{<h5>Installation</h5>}

To install LlamaIndex and Memgraph graph store, run:

```shell
pip install llama-index llama-index-graph-stores-memgraph
```

{<h5>Environment setup</h5>}
Before you get started, make sure you have [Memgraph](/getting-started) running
in the background.

To use Memgraph as the underlying graph store for LlamaIndex, define your graph
store by providing the credentials used for your database:

```python
from llama_index.graph_stores.memgraph import MemgraphPropertyGraphStore

username = "" # Enter your Memgraph username (default "")
password = "" # Enter your Memgraph password (default "")
url = "" # Specify the connection URL, e.g., 'bolt://localhost:7687'

graph_store = MemgraphPropertyGraphStore(
username=username,
password=password,
url=url,
)
```

Additionally, a working OpenAI key is required:

```python
import os
os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>" # Replace with your OpenAI API key
```

{<h5>Dataset</h5>}
For the dataset, we'll use a text about Charles Darwin stored in the
`/data/charles_darwin/charles.txt` file:

```
Charles Robert Darwin was an English naturalist, geologist, and biologist,
widely known for his contributions to evolutionary biology. His proposition that
all species of life have descended from a common ancestor is now generally
accepted and considered a fundamental scientific concept. In a joint publication
with Alfred Russel Wallace, he introduced his scientific theory that this
branching pattern of evolution resulted from a process he called natural
selection, in which the struggle for existence has a similar effect to the
artificial selection involved in selective breeding. Darwin has been described
as one of the most influential figures in human history and was honoured by
burial in Westminster Abbey.
```

```python
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data/charles_darwin/").load_data()
```

The data is now loaded into the `documents` variable, which we'll pass as an
argument in the next step of index creation and graph construction.


{<h5>Graph construction</h5>}

LlamaIndex provides multiple [graph
constructors](https://docs.llamaindex.ai/en/latest/module_guides/indexing/lpg_index_guide/#construction).
In this example, we'll use the
[SchemaLLMPathExtractor](https://docs.llamaindex.ai/en/latest/module_guides/indexing/lpg_index_guide/#schemallmpathextractor),
which lets you either predefine the schema or use the one the LLM infers without
explicitly defining entities.

```python
from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

index = PropertyGraphIndex.from_documents(
documents,
embed_model=OpenAIEmbedding(model_name="text-embedding-ada-002"),
kg_extractors=[
SchemaLLMPathExtractor(
llm=OpenAI(model="gpt-4", temperature=0.0),
)
],
property_graph_store=graph_store,
show_progress=True,
)
```

In the image below, you can see how the text was transformed into a knowledge
graph and stored in Memgraph.

![llama-index](/pages/ai-ecosystem/llamaindex-kg.png)

{<h5>Querying</h5>}

Labeled property graphs can be queried in several ways to retrieve nodes and
paths, and LlamaIndex lets you combine several node retrieval methods at once.

If no sub-retrievers are provided, the default is the
[LLMSynonymRetriever](https://docs.llamaindex.ai/en/latest/module_guides/indexing/lpg_index_guide/#default-llmsynonymretriever).

```python
query_engine = index.as_query_engine(include_text=True)

response = query_engine.query("Who did Charles Robert Darwin collaborate with?")
print(str(response))
```

In the image below, you can see what's happening under the hood to get the answer.

![llama-retriever](/pages/ai-ecosystem/llama-retriever.png)
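
If you want more control over retrieval than the defaults, sub-retrievers can
be passed explicitly. The sketch below configures the `LLMSynonymRetriever` by
hand and reuses the `index` built above; constructor arguments may differ
slightly between LlamaIndex versions.

```python
from llama_index.core.indices.property_graph import LLMSynonymRetriever
from llama_index.llms.openai import OpenAI

# Sketch: configure the synonym-based retriever explicitly instead of relying
# on the defaults. 'index' is the PropertyGraphIndex created above.
synonym_retriever = LLMSynonymRetriever(
    index.property_graph_store,
    llm=OpenAI(model="gpt-4", temperature=0.0),
    include_text=True,
)

retriever = index.as_retriever(sub_retrievers=[synonym_retriever])
nodes = retriever.retrieve("Who did Charles Robert Darwin collaborate with?")

for node in nodes:
    print(node.text)
```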

## Resources
- [Cedars-Sinai: Using Graph Databases for Knowledge-Aware Automated Machine
Learning](https://memgraph.com/webinars/cedars-sinai-using-graph-databases-for-knowledge-aware-automated-machine-learning):
A webinar on utilizing AutoML and GraphRAG to improve predictions and drug
discovery for Alzheimer's disease.
- [Optimizing Insulin Management: The Role of GraphRAG in Patient
Care](https://memgraph.com/webinars/optimizing-insulin-management-the-role-of-graphrag-in-patient-care):
A webinar on GraphRAG with Memgraph which enhances AI decision-making to
improve healthcare outcomes.
- [Enhancing LLM Chatbot Efficiency with
GraphRAG](https://memgraph.com/user-stories/white-paper/enhancing-llm-chatbot-efficiency-with-graphrag):
A white paper on how Microchip Technology leveraged Memgraph to improve the
efficiency of their LLM-powered chatbot.
- [LLMs, Memgraph and Orbit: Modeling Community
Networks](https://memgraph.com/webinars/modeling-online-community-networks-with-llms-and-memgraph):
A webinar on how Orbit leverages LLMs with Memgraph to model, simulate, and
enrich dynamic online community networks.
- [GenAI
Stack](https://memgraph.com/blog/building-gen-ai-applications-with-memgraph-gpt-llama):
A blog post about Memgraph's demo app showcasing natural language querying
utilizing LangChain integration with various LLMs.
- [Querying Memgraph through an
LLM](https://www.youtube.com/watch?v=okmk357t9W8&list=PL7Eotag2rRhZssS4f11PKAHuCykMCljg3):
A community call with Brett Brewer, former VP at Microsoft, demoing the
LangChain integration with Memgraph for querying in natural language.

## Want to learn more?

To learn more, check out the [Enhancing AI with graph databases and LLMs
bootcamp](https://memgraph.com/academy/enhancing-ai-with-graph-databases-and-llms)
and [on-demand resources](https://memgraph.com/on-demand). Stay up to date with
[Memgraph events](https://memgraph.com/events) and watch videos from the [AI,
LLMs and GraphRAG YouTube
playlist](https://youtube.com/playlist?list=PL7Eotag2rRhYX6lZNbk7SPOcqREF7xzyU&si=RHDKio8o31KQ2QmV).

If you have questions regarding Memgraph or want to provide feedback, [join our
Discord](https://www.discord.gg/memgraph) community.

If you prefer a call, schedule a 30-minute session with one of our engineers to
discuss how Memgraph fits with your architecture. Our engineers are highly
experienced in helping companies of all sizes integrate and get the most out
of Memgraph in their projects. Talk to us about data modeling, optimizing
queries, defining infrastructure requirements or migrating from your existing
graph database. No nonsense or sales pitch, just tech.

![](/pages/getting-started/memgraph-office-hours.svg)

<Cards>
<Card
title="Book a call"
href="https://memgraph.com/office-hours"
/>
</Cards>