Add AI Ecosystem page #995

Draft
wants to merge 9 commits into
base: main
1 change: 1 addition & 0 deletions pages/_meta.json
@@ -2,6 +2,7 @@
"index": "Home",
"getting-started": "Getting started",
"client-libraries": "Client libraries",
"ai-ecosystem": "AI ecosystem",
"fundamentals": "Fundamentals",
"data-migration": "Data migration",
"querying": "Querying",
354 changes: 354 additions & 0 deletions pages/ai-ecosystem.mdx
@@ -0,0 +1,354 @@
---
title: Memgraph's AI Ecosystem
description: Explore key features, such as community detection, node embeddings, and graph neural networks, alongside integrations with popular AI libraries like LangChain and LlamaIndex, to create powerful, data-driven GenAI solutions.
---

# Memgraph's AI Ecosystem

**Artificial intelligence (AI)** involves various areas, such as machine
learning (ML), natural language processing (NLP) and knowledge representation
and reasoning (KRR), each focusing on different aspects and strategies for
creating intelligent systems. Usually, these areas overlap, which leads to the
development of more advanced AI systems. One example of such overlap is
**Generative AI (GenAI)**.

GenAI focuses on developing AI systems that generate new content, such as text
or images. **Large language models (LLMs)** are a core technology behind many
GenAI apps. LLMs are trained on massive datasets to understand and generate
natural language. Still, if you want the LLM to work with your custom data,
you need to find the best way to transfer that knowledge to the LLM.

One way of doing that is **fine-tuning**, which can be hard to set up, slow to
get running, and expensive. Also, if the knowledge you want to transfer keeps
growing, fine-tuning has to be repeated for every update, which isn't viable.

That is where **retrieval-augmented generation (RAG)** comes to the rescue. RAG
aims to improve LLMs' accuracy and personalization by augmenting them with
external data sources that the LLM doesn't know by default. It is easy to set
up, flexible, and scalable, and it supports dynamic knowledge updates.

Traditional RAG systems enhance GenAI, but they often fall short when
retrieving crucial knowledge from **complex datasets**. That is where GraphRAG
excels. **GraphRAG** is a type of RAG that utilizes knowledge graphs and graph
features (e.g., finding communities and analyzing neighborhoods) to enhance
retrieval. When graphs are combined with other data structures, the approach is
sometimes called HybridRAG.

By using a graph-based or hybrid approach to retrieval, GenAI applications can
access more structured, contextually rich, and relational data, leading to
better performance, more accurate generation, and richer insights.

Memgraph has been a popular choice in AI, especially for cases that utilize
[machine learning](#machine-learning). It also proves to be a great choice to
build a [GraphRAG](#graphrag-with-memgraph).

## GraphRAG with Memgraph

LLMs have the knowledge they were trained on. By building a RAG, you're
expanding that knowledge with your data. It is important to understand how to
**structure and model** the data and how to **find and extract relevant
information** so that the LLM can provide more accurate responses personalized
to your specific data.

**GraphRAG is a RAG system that combines the strengths of knowledge graphs and
LLMs**. Knowledge graphs are a structured representation of information where
entities and their relationships are organized to enable reasoning and insights.

Here are the main strengths of a graph in a RAG system:
- **Relational context** - The knowledge graph structure holds information
about semantics.
- **Improved retrieval accuracy** - Retrieval strategies specific to graphs,
such as community detection and impact analytics, become available.
- **Multi-hop reasoning** - The ability to traverse data neighborhoods.
- **Efficient information navigation** - Scanning subgraphs instead of full
datasets.
- **Dynamically evolving knowledge** - Updating the graph in real time.
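
As a rough illustration of multi-hop reasoning and subgraph scanning, the
sketch below collects everything within a fixed number of hops of a starting
entity. The toy graph, the `k_hop_neighborhood` helper, and the relationship
names are hypothetical; in practice, a traversal like this would run inside the
graph database rather than in application code.

```python
from collections import deque

# Hypothetical toy knowledge graph: entity -> list of (relationship, neighbor).
GRAPH = {
    "Charles Darwin": [
        ("WROTE", "On the Origin of Species"),
        ("BORN_IN", "Shrewsbury"),
    ],
    "On the Origin of Species": [("DESCRIBES", "Natural selection")],
    "Shrewsbury": [("LOCATED_IN", "England")],
    "Natural selection": [],
    "England": [],
}


def k_hop_neighborhood(graph, start, max_hops):
    """Return {entity: hop distance} for entities within max_hops of start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand past the hop limit
        for _relationship, neighbor in graph.get(node, []):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


one_hop = k_hop_neighborhood(GRAPH, "Charles Darwin", 1)
two_hops = k_hop_neighborhood(GRAPH, "Charles Darwin", 2)
print(sorted(two_hops))
```

Only the entities reachable from the starting point are visited, which is the
sense in which a graph lets retrieval scan a subgraph instead of the full
dataset.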

Graph structure is a prerequisite for GraphRAG, and a graph database is even
better. A GraphRAG application running in production needs **scalability**,
**real-time performance**, **incremental updates** and **persistence**. Having a
graph database as a part of the GraphRAG is especially useful if other
application parts also rely on the graph database.


### Key Memgraph features

Several features in Memgraph are useful when building a GraphRAG:

- [Louvain community
detection](/advanced-algorithms/available-algorithms/community_detection)
- [Dynamic community
detection](/advanced-algorithms/available-algorithms/community_detection_online)
- [PageRank](/advanced-algorithms/available-algorithms/pagerank)
- [Dynamic PageRank](/advanced-algorithms/available-algorithms/pagerank_online)
- [Deep-path traversals](/advanced-algorithms/deep-path-traversal)
- [Text search](/querying/text-search)
- [Run-time schema tracking](/schema#run-time-schema-tracking)

Here is how those features fit into the GraphRAG architecture:

![memgraph-graphrag](/pages/ai-ecosystem/memgraph-graphrag.png)

{<h4>Coming soon</h4>}

Two new important features for building a GraphRAG with Memgraph will be out in
the next release, Memgraph 2.21, in early November:

- **Vector search** - Usually used in the first step of finding and extracting
relevant information (pivot search).
- **Leiden community detection** - Proven to be a better and faster version of
Louvain community detection, guaranteeing well-connected communities. It is
usually used in the second step of finding and extracting relevant information
(relevance expansion).
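
The two retrieval steps mentioned above can be sketched in plain Python. This
is only a toy model under stated assumptions: the embeddings are hand-made
three-dimensional vectors, the community labels are assumed to be precomputed,
and the helper names (`pivot_search`, `expand_by_community`) are hypothetical
rather than part of Memgraph.

```python
import math

# Hypothetical, hand-made embeddings; real ones come from an embedding model.
EMBEDDINGS = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
    "d": [0.0, 0.0, 1.0],
}
# Hypothetical community labels, as a community detection run might assign them.
COMMUNITY = {"a": 0, "b": 0, "c": 1, "d": 1}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def pivot_search(query_vector, embeddings, top_k=1):
    """Step 1: pick the nodes whose embeddings are closest to the query."""
    ranked = sorted(
        embeddings,
        key=lambda node: cosine_similarity(query_vector, embeddings[node]),
        reverse=True,
    )
    return ranked[:top_k]


def expand_by_community(pivots, community):
    """Step 2: widen the context to every node sharing a pivot's community."""
    wanted = {community[pivot] for pivot in pivots}
    return sorted(node for node, label in community.items() if label in wanted)


pivots = pivot_search([0.0, 0.9, 0.2], EMBEDDINGS, top_k=1)
context_nodes = expand_by_community(pivots, COMMUNITY)
print(pivots, context_nodes)
```

The expanded node set is what would then be turned into context for the LLM.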

### Tools

- [GraphChat](/data-visualization/user-manual/graphchat): GenAI-powered feature
in Memgraph Lab

**TODO: add visual and explain**

### Integrations

Memgraph offers several integrations with popular AI libraries to help you
customize and build your own GenAI application from scratch. Below are some of
the libraries integrated with Memgraph:

{<h4>LangChain</h4>}

**TODO: add how LangChain integration works once it's merged**

![langchain](/pages/ai-ecosystem/langchain-memgraph.png)

{<h4>LlamaIndex</h4>}

**TODO: add about LlamaIndex**

{<h5>Installation</h5>}

To install LlamaIndex and Memgraph graph store, run:

```shell
pip install llama-index llama-index-graph-stores-memgraph
```

{<h5>Environment setup</h5>}
Before you get started, make sure you have [Memgraph](/getting-started) running
in the background.

To use Memgraph as the underlying graph store for LlamaIndex, define your graph
store by providing the credentials used for your database:

```python
from llama_index.graph_stores.memgraph import MemgraphPropertyGraphStore

username = "" # Enter your Memgraph username (default "")
password = "" # Enter your Memgraph password (default "")
url = "" # Specify the connection URL, e.g., 'bolt://localhost:7687'

graph_store = MemgraphPropertyGraphStore(
    username=username,
    password=password,
    url=url,
)
```
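
If you'd rather not hardcode credentials, one option is to read them from
environment variables. The sketch below is an assumption-laden example: the
variable names `MEMGRAPH_USERNAME`, `MEMGRAPH_PASSWORD`, and `MEMGRAPH_URL`, as
well as the `memgraph_settings_from_env` helper, are hypothetical and not
defined by Memgraph or LlamaIndex.

```python
import os


def memgraph_settings_from_env(environ=os.environ):
    """Collect connection settings, falling back to local defaults."""
    return {
        # Hypothetical variable names; pick whatever fits your deployment.
        "username": environ.get("MEMGRAPH_USERNAME", ""),
        "password": environ.get("MEMGRAPH_PASSWORD", ""),
        "url": environ.get("MEMGRAPH_URL", "bolt://localhost:7687"),
    }


settings = memgraph_settings_from_env()
# The dictionary keys match the keyword arguments used above, so it could be
# passed on as: MemgraphPropertyGraphStore(**settings)
print(settings["url"])
```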

Additionally, a working OpenAI key is required:

```python
import os
os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>" # Replace with your OpenAI API key
```

{<h5>Dataset</h5>}
For the dataset, we'll use a text about Charles Darwin stored in the
`./data/charles_darwin/charles.txt` file. For the content of the text file,
check the image below.

```python
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data/charles_darwin/").load_data()
```

The data is now loaded in the `documents` variable, which we'll pass as an
argument in the next step of index creation and graph construction.

{<h5>Graph construction</h5>}

LlamaIndex provides multiple [graph
constructors](https://docs.llamaindex.ai/en/latest/module_guides/indexing/lpg_index_guide/#construction).
In this example, we'll use the
[SchemaLLMPathExtractor](https://docs.llamaindex.ai/en/latest/module_guides/indexing/lpg_index_guide/#schemallmpathextractor),
which lets you either predefine the schema or use the one the LLM provides
without explicitly defining entities.

```python
from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=OpenAIEmbedding(model_name="text-embedding-ada-002"),
    kg_extractors=[
        SchemaLLMPathExtractor(
            llm=OpenAI(model="gpt-4", temperature=0.0),
        )
    ],
    property_graph_store=graph_store,
    show_progress=True,
)
```
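
To make the idea of a predefined schema more concrete, here is a small,
library-independent sketch of what constraining extraction to a schema means.
It is not LlamaIndex code: the labels, the `VALIDATION_SCHEMA` mapping, and the
`filter_triplets` helper are hypothetical. The linked guide describes the
arguments SchemaLLMPathExtractor actually accepts for this purpose.

```python
# Hypothetical schema: entity label -> relationship types allowed to start there.
VALIDATION_SCHEMA = {
    "PERSON": ["WROTE", "BORN_IN"],
    "BOOK": ["DESCRIBES"],
    "PLACE": ["LOCATED_IN"],
}


def filter_triplets(triplets, schema):
    """Keep (head_label, relation, tail_label) triplets that fit the schema."""
    return [
        (head, relation, tail)
        for head, relation, tail in triplets
        if relation in schema.get(head, [])
    ]


# Hypothetical extractor output, including two triplets the schema rejects.
extracted = [
    ("PERSON", "WROTE", "BOOK"),
    ("BOOK", "WROTE", "PERSON"),      # WROTE is not allowed for BOOK
    ("PERSON", "BORN_IN", "PLACE"),
    ("THEORY", "DESCRIBES", "BOOK"),  # THEORY is not a known label
]
valid = filter_triplets(extracted, VALIDATION_SCHEMA)
print(valid)
```

A stricter schema trades recall for a cleaner, more predictable graph, while
letting the LLM choose labels freely does the opposite.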

{<h5>Querying</h5>}

Labeled property graphs can be queried in several ways to retrieve nodes and
paths. In LlamaIndex, several node retrieval methods can be combined at once.

If no sub-retrievers are provided, the defaults are
[LLMSynonymRetriever](https://docs.llamaindex.ai/en/latest/module_guides/indexing/lpg_index_guide/#default-llmsynonymretriever)
and, if embeddings are enabled, `VectorContextRetriever`.

```python
query_engine = index.as_query_engine(include_text=True)

response = query_engine.query("Who is Charles Robert Darwin?")
print(str(response))
```
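
Conceptually, combining retrieval methods means running each one and merging
what they return. The sketch below shows that idea with plain functions. The
two retrievers and the `combine_retrievers` helper are hypothetical stand-ins,
not LlamaIndex classes; in LlamaIndex itself, the linked guide describes
passing sub-retrievers when creating a retriever from the index.

```python
def combine_retrievers(query, retrievers):
    """Run every retriever and merge results, first-seen order, no duplicates."""
    seen = set()
    merged = []
    for retrieve in retrievers:
        for item in retrieve(query):
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


# Hypothetical stand-in for a keyword- or synonym-based retriever.
def keyword_retriever(query):
    facts = {
        "darwin": [("Charles Darwin", "WROTE", "On the Origin of Species")],
    }
    return [
        triplet
        for word, triplets in facts.items()
        if word in query.lower()
        for triplet in triplets
    ]


# Hypothetical stand-in for an embedding-similarity retriever.
def similarity_retriever(query):
    return [
        ("Charles Darwin", "WROTE", "On the Origin of Species"),
        ("Charles Darwin", "BORN_IN", "Shrewsbury"),
    ]


results = combine_retrievers(
    "Who is Charles Robert Darwin?",
    [keyword_retriever, similarity_retriever],
)
print(results)
```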

**TODO: add a bit of an explanation**
In the image below...

![llamaindex](/pages/ai-ecosystem/llama-memgraph.png)


### Resources
- [Enhancing LLM Chatbot Efficiency with
GraphRAG](https://memgraph.com/user-stories/white-paper/enhancing-llm-chatbot-efficiency-with-graphrag):
A webinar from which you can learn how Microchip Technology leveraged Memgraph
to improve the efficiency of their LLM-powered chatbot.
- [LLMs, Memgraph and Orbit: Modeling Community
Networks](https://memgraph.com/webinars/modeling-online-community-networks-with-llms-and-memgraph):
A webinar on how Orbit leverages LLMs with Memgraph to model, simulate, and
enrich these dynamic conversational ecosystems.
- [GenAI
Stack](https://memgraph.com/blog/building-gen-ai-applications-with-memgraph-gpt-llama):
A blog post about Memgraph's demo app showcasing natural language querying
utilizing LangChain integration with various LLMs.
- [Querying Memgraph through an
LLM](https://www.youtube.com/watch?v=okmk357t9W8&list=PL7Eotag2rRhZssS4f11PKAHuCykMCljg3):
Community call with Brett Brewer, former VP at Microsoft, demoing LangChain
integration with Memgraph to query with natural language.

## Machine learning

Memgraph has been a popular choice in the AI world for a while now, especially
for the use cases around machine learning (ML). The MAGE library aims to provide
the most commonly used graph algorithms, and that includes [graph ML
algorithms](/advanced-algorithms/available-algorithms#graph-ml-algorithms) as
well.

### Node embeddings

Supervised machine learning is a subset of ML where algorithms learn from
labeled data. Modeling the interactions between entities as graphs has enabled
researchers to understand various networks systematically. For a computer to
work with these networks, a large graph has to be embedded in a low-dimensional
space, which produces node embeddings. Graph embeddings have been shown to work
well in many supervised learning tasks, such as **node classification** and
**link prediction**, which are usually used for friendship or content
recommendations and advertisement.

{<h4>Algorithms</h4>}

Here are the MAGE algorithms which create node embeddings:
- [node2vec](/advanced-algorithms/available-algorithms/node2vec): An algorithm
for calculating node embeddings on a static graph.
- [node2vec_online](/advanced-algorithms/available-algorithms/node2vec_online):
An algorithm for calculating node embeddings on a dynamic graph.
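
For intuition, node2vec builds embeddings from biased random walks over the
graph. The sketch below covers only the walk-generation step, on a hypothetical
toy graph; it is not the MAGE implementation, and the step that turns walks
into vectors (training a skip-gram model on them) is left out.

```python
import random

# Hypothetical undirected toy graph as adjacency lists.
GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """Second-order biased random walk in the style of node2vec.

    p controls how likely the walk is to step back to the previous node,
    q controls how likely it is to move away from the previous node.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = graph[current]
        if not neighbors:
            break  # dead end, stop early
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))  # first step is unbiased
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbors:
            if candidate == previous:
                weights.append(1.0 / p)  # return to where we came from
            elif candidate in graph[previous]:
                weights.append(1.0)      # stay close to the previous node
            else:
                weights.append(1.0 / q)  # explore further away
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


walk = node2vec_walk(GRAPH, "a", length=6, p=0.5, q=2.0, rng=random.Random(42))
print(walk)
```

A low `p` keeps walks local by encouraging backtracking, while a low `q` pushes
them outward, so the two parameters shape which nodes end up with similar
embeddings.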

{<h4>Resources</h4>}

In case you'd like to learn more about the topic and see
practical examples, check out the following resources:
- [Introduction to Node
Embedding](https://memgraph.com/blog/introduction-to-node-embedding): A blog
post covering the basics of node embeddings.
- [How Node2Vec Works – A Random Walk-Based Node Embedding
Method](https://memgraph.com/blog/how-node2vec-works): A blog post from which
you can learn all about the node2vec algorithm.
- [Understanding How Dynamic node2vec Works on Streaming
Data](https://memgraph.com/blog/dynamic-node2vec-on-streaming-data): A blog
post from which you can learn all about the dynamic node2vec algorithm.
- [Link Prediction With node2vec in Physics Collaboration
Network](https://memgraph.com/blog/link-prediction-with-node2vec-in-physics-collaboration-network):
A demo on building a recommendation system using link predictions calculated
with the node2vec algorithm.
- [Recommendation System Using Online Node2Vec With Memgraph
MAGE](https://memgraph.com/blog/online-node2vec-recommendation-system): A demo
on building an online recommendation system using k-means clustering and
node2vec algorithm.

### Graph neural networks

Using the node2vec algorithm to determine node embeddings works well, but graph
neural networks (GNNs) can produce more accurate representations. GNNs aim to
get the node representations automatically and efficiently by iteratively
aggregating the representations of node neighbors and combining them with the
node's own representation from the previous iteration. GNNs can inductively
learn about your dataset, which means that after training is complete, you can
apply the trained model to a similar use case without retraining it from
scratch.
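
The aggregate-and-combine step described above can be sketched without any ML
library. This is a deliberately simplified, hypothetical example: it uses plain
averaging on a toy graph, whereas real GNN layers apply learned weights and
nonlinearities, and it is not how the MAGE modules are implemented.

```python
def message_passing_step(features, graph):
    """One round: average neighbor vectors, then average with the own vector."""
    updated = {}
    for node, own in features.items():
        neighbors = graph.get(node, [])
        if neighbors:
            # Aggregate: element-wise mean of the neighbors' representations.
            aggregated = [
                sum(features[neighbor][i] for neighbor in neighbors) / len(neighbors)
                for i in range(len(own))
            ]
        else:
            aggregated = list(own)
        # Combine: mix the aggregate with the node's previous representation.
        updated[node] = [(o + a) / 2 for o, a in zip(own, aggregated)]
    return updated


# Hypothetical toy graph and two-dimensional input features.
GRAPH = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
FEATURES = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 3.0]}

h1 = message_passing_step(FEATURES, GRAPH)  # sees direct neighbors
h2 = message_passing_step(h1, GRAPH)        # sees neighbors of neighbors
print(h1)
```

Each additional round lets information travel one hop further, which is why
stacking layers widens the neighborhood a node's representation reflects.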

![gnn](/pages/ai-ecosystem/memgraph-gnn.png)

{<h4>Algorithms</h4>}

Here are the MAGE algorithms that use GNNs:
- [Link prediction with
GNN](/advanced-algorithms/available-algorithms/gnn_link_prediction): Module
for predicting links in the graph by using GNNs.
- [Node classification with
GNN](/advanced-algorithms/available-algorithms/gnn_node_classification):
GNN-based node classification module.

{<h4>Resources</h4>}

In case you'd like to learn more about the topic and see
practical examples, check out the following resources:
- [Building a Recommendation System for Telecommunication Packages Using Graph
Neural
Networks](https://memgraph.com/blog/building-a-recommendation-system-for-telecommunication-packages-using-graph-neural-networks):
A demo on how to build a recommendation system using Memgraph's link
prediction with GNN.
- [Become an Inspector for a Day and Detect Fraudsters With Graph ML on
Memgraph](https://memgraph.com/blog/become-an-inspector-for-a-day-and-detect-fraudsters-with-graph-ml-on-memgraph):
A demo on how to build a fraud detection system using node classification with
GNN.

### Temporal graph networks

Temporal graph networks (TGNs) are GNNs that operate on temporal graphs,
meaning they deal with continuous-time dynamic graphs.
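
A continuous-time dynamic graph can be thought of as a stream of timestamped
edge events rather than a fixed edge list. The sketch below shows only that
data model, with hypothetical event data and a hypothetical `edges_up_to`
helper; it does not implement a TGN.

```python
from bisect import bisect_right

# Hypothetical interaction events as (timestamp, source, target), sorted by time.
EVENTS = [
    (1.0, "user_1", "item_a"),
    (2.5, "user_2", "item_a"),
    (4.0, "user_1", "item_b"),
    (7.2, "user_3", "item_b"),
]


def edges_up_to(events, timestamp):
    """Return the edges observed at or before the given timestamp."""
    times = [time for time, _source, _target in events]
    cutoff = bisect_right(times, timestamp)
    return [(source, target) for _time, source, target in events[:cutoff]]


snapshot = edges_up_to(EVENTS, 4.0)
print(snapshot)
```

A model working on such data can only use events that happened before the
moment it makes a prediction, which is what separates it from a GNN trained on
a single static snapshot.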


{<h4>Algorithms</h4>}

Here is the MAGE algorithm that uses TGNs:
- [Link prediction and node classification with
TGN](/advanced-algorithms/available-algorithms/tgn): A TGN-based module for
link prediction and node classification.

![tgn](/pages/ai-ecosystem/memgraph-tgn.png)

{<h4>Resources</h4>}

In case you'd like to learn more about the topic, check out
the following resource:
- [Temporal Graph Neural Networks With Pytorch - How to Create a Simple
Recommendation Engine on an Amazon
Dataset](https://memgraph.com/blog/amazon-user-item-recommender-with-tgn-and-memgraph):
An under-the-hood blog post explaining TGN in Memgraph and showcasing a
recommendation system demo using link prediction with TGN.
Binary file added public/pages/ai-ecosystem/llama-memgraph.png
Binary file added public/pages/ai-ecosystem/memgraph-gnn.png
Binary file added public/pages/ai-ecosystem/memgraph-graphrag.png
Binary file added public/pages/ai-ecosystem/memgraph-tgn.png