cypher-playground

NOT YET FULLY IMPLEMENTED

📚 Learning and exploring the Cypher query language for graph databases like Apache AGE.

Cypher is a declarative graph query language that allows for expressive and efficient data querying in a property graph.

-- https://en.wikipedia.org/wiki/Cypher_(query_language)

Apache AGE: Graph Processing & Analytics for RDBs

-- https://age.apache.org/

Overview

This project explores the ergonomics of evolving an existing relational data model to a graph model. Specifically, we explore migrating existing relational data in a Postgres database to an Apache AGE graph model in the same Postgres database. We want to answer these questions:

What is the developer experience of Cypher?
- How do Cypher queries compare to SQL queries? I expect recursive queries to be the main draw of Cypher. But what are the awkward parts of Cypher?
What is the data migration story for Apache AGE?
- What tools does Apache AGE offer to migrate existing relational data to a graph model? Better yet, can we skip a migration altogether? That is, can we write Cypher queries over relational data? (The answer is no, but I know Apache AGE is evolving rapidly and is strategically invested in relational databases, so I'm hopeful this area will mature over time.)
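To make the recursive-query question concrete, here is a sketch of one question asked both ways: which states are within two borders of Minnesota? The table, column, label, and relationship names (`state_adjacencies`, `state_code`, `adjacent_state_code`, `State`, `ADJACENT_TO`) are hypothetical and not taken from this repository. The queries are held as Java strings and should be treated as untested sketches.

```java
// Hypothetical names: a relational table state_adjacencies(state_code, adjacent_state_code)
// and a graph with :State vertices joined by :ADJACENT_TO edges. Neither comes from this repo.
final class RecursiveQueryComparison {

    // SQL: a recursive CTE walks the adjacency table outward, tracking depth to stop at two hops.
    // The ::text casts keep the column type identical in the non-recursive and recursive terms.
    static final String SQL = """
            WITH RECURSIVE reachable(code, depth) AS (
                SELECT 'MN'::text, 0
                UNION
                SELECT a.adjacent_state_code::text, r.depth + 1
                FROM reachable r
                JOIN state_adjacencies a ON a.state_code = r.code
                WHERE r.depth < 2
            )
            SELECT DISTINCT code FROM reachable WHERE code <> 'MN'
            """;

    // Cypher: a variable-length relationship pattern (*1..2) expresses the same bounded traversal.
    static final String CYPHER = """
            MATCH (:State {code: 'MN'})-[:ADJACENT_TO*1..2]-(s:State)
            WHERE s.code <> 'MN'
            RETURN DISTINCT s.code
            """;

    public static void main(String[] args) {
        System.out.println("SQL:\n" + SQL);
        System.out.println("Cypher:\n" + CYPHER);
    }
}
```

The SQL version needs explicit depth bookkeeping inside a recursive CTE, while the Cypher version states the traversal as a pattern.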

This project uses US geographies as its data domain. Specifically, we'll model ZIP codes, their containing cities, and their containing states. This creates a tree-like structure. On its own, that model is not complex enough to warrant a graph data model, so let's make it more interesting by also modeling "state adjacencies". For example, Minnesota neighbors Wisconsin.
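A minimal in-memory sketch of that shape, using hypothetical `State`, `City`, and `Zip` records: the ZIP-to-city-to-state links form a tree, while state adjacency is a general undirected graph that a breadth-first search can walk. The adjacency pairs are a small illustrative subset, not the project's data.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class GeographyGraph {

    // ZIP -> city -> state is a tree: each ZIP code has one city, each city has one state.
    record State(String code) {}
    record City(String name, State state) {}
    record Zip(String code, City city) {}

    // State adjacency is a general graph, stored here as an undirected adjacency list.
    private final Map<String, Set<String>> adjacency = new HashMap<>();

    // Adjacency is symmetric, so record the edge in both directions.
    void addAdjacency(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
    }

    // Breadth-first search: all states reachable from start within maxHops, excluding start.
    Set<String> withinHops(String start, int maxHops) {
        Set<String> seen = new TreeSet<>(Set.of(start));
        Deque<String> frontier = new ArrayDeque<>(List.of(start));
        for (int hop = 0; hop < maxHops && !frontier.isEmpty(); hop++) {
            Deque<String> next = new ArrayDeque<>();
            for (String code : frontier) {
                for (String neighbor : adjacency.getOrDefault(code, Set.of())) {
                    if (seen.add(neighbor)) {
                        next.add(neighbor);
                    }
                }
            }
            frontier = next;
        }
        seen.remove(start);
        return seen;
    }

    public static void main(String[] args) {
        GeographyGraph graph = new GeographyGraph();
        graph.addAdjacency("MN", "WI");
        graph.addAdjacency("MN", "IA");
        graph.addAdjacency("WI", "IL");
        graph.addAdjacency("IA", "IL");

        Zip zip = new Zip("55401", new City("Minneapolis", new State("MN")));
        System.out.println(zip.code() + " -> " + zip.city().name() + " -> " + zip.city().state().code());
        System.out.println("Within 1 hop of MN: " + graph.withinHops("MN", 1));  // [IA, WI]
        System.out.println("Within 2 hops of MN: " + graph.withinHops("MN", 2)); // [IA, IL, WI]
    }
}
```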

This project uses Docker to run a Postgres database pre-installed with Apache AGE.

This project is a multi-module Gradle project. Its Java programs load the initial domain data, migrate the data from a relational model to a graph model, and query the data.
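Because data-migrator is not yet implemented, here is only one way the migration step could look: read rows from assumed relational `states` and `state_adjacencies` tables (modeled as hypothetical `StateRow` and `AdjacencyRow` records) and emit Cypher `CREATE` statements, vertices first and then the edges that reference them. Values are escaped for single-quoted Cypher string literals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical migration sketch; the table, record, label, and edge names are assumptions.
final class MigrationSketch {

    record StateRow(String code, String name) {}
    record AdjacencyRow(String stateCode, String adjacentStateCode) {}

    // Escape backslashes and single quotes so a value is safe inside a single-quoted Cypher string.
    static String literal(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    static String createState(StateRow row) {
        return "CREATE (:State {code: " + literal(row.code()) + ", name: " + literal(row.name()) + "})";
    }

    // Match both endpoints by code, then connect them with an ADJACENT_TO edge.
    static String createAdjacency(AdjacencyRow row) {
        return "MATCH (a:State {code: " + literal(row.stateCode()) + "}), (b:State {code: "
                + literal(row.adjacentStateCode()) + "}) CREATE (a)-[:ADJACENT_TO]->(b)";
    }

    static List<String> migrate(List<StateRow> states, List<AdjacencyRow> adjacencies) {
        List<String> statements = new ArrayList<>();
        states.forEach(s -> statements.add(createState(s)));          // vertices first...
        adjacencies.forEach(a -> statements.add(createAdjacency(a))); // ...then edges that reference them
        return statements;
    }

    public static void main(String[] args) {
        List<String> statements = migrate(
                List.of(new StateRow("MN", "Minnesota"), new StateRow("WI", "Wisconsin")),
                List.of(new AdjacencyRow("MN", "WI")));
        statements.forEach(System.out::println);
    }
}
```

Each statement would still need to be wrapped in AGE's `cypher()` SQL function before Postgres can execute it.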

Here is a breakdown of the components of this project:

docker-compose.yml and postgres-init/
- This is the Docker-related stuff. The Docker Compose file defines the Postgres container and mounts the postgres-init/ directory into the container. The postgres-init/ directory contains the SQL scripts that initialize the database with the relational schema and the US state data.
data-loader/
- data-loader/ is a Gradle module. It defines a Java program that loads the ZIP code and city data from the zips.jsonl file.
data-migrator/
- NOT YET IMPLEMENTED
- data-migrator/ is a Gradle module. It defines a Java program that migrates the relational data to a graph data model.
data-queryer/
- NOT YET IMPLEMENTED
- data-queryer/ is a Gradle module. It defines a Java program that queries the graph data using Cypher.
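Since data-loader reads both the ZIP code data and the city data from the single zips.jsonl file, city rows have to be derived from the ZIP records somehow. A sketch of one such derivation, assuming each record carries a ZIP code, a city name, and a state code, and that a city is identified by its (name, state) pair. `ZipRecord`, `CityKey`, and `zipsByCity` are hypothetical names, not the data-loader's actual code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CityDerivation {

    record ZipRecord(String zip, String city, String state) {}
    record CityKey(String name, String state) {}

    // Group ZIP codes by (city name, state) so each distinct pair becomes one city row.
    // The state is part of the key because the same city name appears in many states.
    static Map<CityKey, List<String>> zipsByCity(List<ZipRecord> records) {
        Map<CityKey, List<String>> cities = new LinkedHashMap<>();
        for (ZipRecord r : records) {
            cities.computeIfAbsent(new CityKey(r.city(), r.state()), k -> new ArrayList<>()).add(r.zip());
        }
        return cities;
    }

    public static void main(String[] args) {
        // Illustrative records, not read from zips.jsonl.
        List<ZipRecord> records = List.of(
                new ZipRecord("55401", "MINNEAPOLIS", "MN"),
                new ZipRecord("55402", "MINNEAPOLIS", "MN"),
                new ZipRecord("53703", "MADISON", "WI"),
                new ZipRecord("62701", "SPRINGFIELD", "IL"),
                new ZipRecord("01101", "SPRINGFIELD", "MA"));
        Map<CityKey, List<String>> cities = zipsByCity(records);
        System.out.println(cities.size() + " cities from " + records.size() + " ZIP codes");
        cities.forEach((city, zips) -> System.out.println(city.name() + ", " + city.state() + " " + zips));
    }
}
```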

Background

I'm interested in learning graph-based query languages. While I love SQL, the ability to express a pattern-matching query over a graph of data and get a serialized "object graph" response is something I often pine for when I'm otherwise stuck with a SQL query full of joins. I've been eyeing graph databases for a long time (but also cautiously eyeing them because you don't want to get tangled up with a technology that gets abandoned). I've had some brief but good experience using Cypher queries and now I want to learn more in-depth. Graph databases have gone through the hype cycle and hopefully we are nearing the "plateau of productivity", but we're not there yet. There are competing technologies, none of which have cemented a lead. Still, there is a lot of activity in the space.

GQL (Graph Query Language) is a graph query language proposed by a standards body. It is heavily inspired by Cypher, but it is not yet a finished standard. Cypher™️ proper is actually a Neo4j-specific language. Neo4j graciously supported an open specification called openCypher, which is basically Cypher but is meant to be implemented by different vendors and open source projects. openCypher is what I am exploring in this playground repository.

Apache AGE

For this project, I have to choose a database that supports openCypher. Apache AGE is a Postgres extension that brings graph capabilities to the very mature and wildly popular Postgres database. AGE is an acronym for "A Graph Extension". AGE is very promising because it is tied to Postgres (a sign of stability and maturity), it reached a 1.0 release in 2022 and it is under the Apache umbrella (another sign of durability). The project has good momentum. Plus it has a Java client. I'll use AGE for this playground repository.

This playground repository is effectively a playground for both Cypher and Apache AGE.

Forward Looking: GQL

Lastly, I have my eyes on SQL/PGQ, a proposed extension to the SQL standard that would allow for graph queries. This is by far the most conservative leap from the SQL world to the graph world, and it is what I'm most interested in. But SQL/PGQ is extremely early, so there's nothing to play with yet.

Instructions

Follow these instructions to get up and running with a graph database, some sample data, and some Cypher queries.

Pre-requisite: Docker

Start the Postgres database with the AGE extension.

```
docker-compose up --detach
```

- As part of the startup procedure, the relational schema is created and the US state data gets loaded.

Load the ZIP code and city data.

```
./gradlew :data-loader:run
```

It will look something like the following.

```
00:19:46 [main] INFO  dataloader.Main - Loading ZIP code data from the local file into Postgres ...
00:20:22 [main] INFO  dataloader.Main - Loaded 25,701 cities and 29,353 ZIP codes.
```

Migrate the relational data to a graph model.
- NOT YET IMPLEMENTED

```
./gradlew :data-migrator:run
```

Query the graph data.
- NOT YET IMPLEMENTED

```
./gradlew :data-queryer:run
```

- Read the Java source code to understand the Cypher queries.

When you're done, stop the database.

```
docker-compose down
```

Notes

The AGE manual is great. Here are some quotes.

Cypher uses a Postgres namespace for every individual graph. It is recommended that no DML or DDL commands are executed in the namespace that is reserved for the graph.

AGE uses a custom data type called agtype, which is the only data type returned by AGE. Agtype is a superset of Json and a custom implementation of JsonB.
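Because agtype is JSON-like, a client can often hand its text form to an ordinary JSON parser. Vertices and edges are the exception: AGE prints them with a trailing type annotation such as `::vertex` or `::edge`. A minimal sketch, assuming that text format, that strips the annotation from a top-level vertex or edge value. `AgtypeText.toJson` is a hypothetical helper, and paths (`::path`, which carry nested annotations) are deliberately not handled.

```java
// Sketch assuming AGE's text form of agtype: scalars and maps print as JSON, while top-level
// vertices and edges carry a trailing annotation such as "::vertex" or "::edge".
final class AgtypeText {

    private static final String[] SUFFIXES = {"::vertex", "::edge"};

    // Returns the JSON portion of a top-level vertex or edge; other values pass through trimmed.
    static String toJson(String agtypeText) {
        String trimmed = agtypeText.strip();
        for (String suffix : SUFFIXES) {
            if (trimmed.endsWith(suffix)) {
                return trimmed.substring(0, trimmed.length() - suffix.length());
            }
        }
        return trimmed;
    }

    public static void main(String[] args) {
        // Illustrative value only; real ids are assigned by AGE.
        String vertex = "{\"id\": 844424930131969, \"label\": \"State\", \"properties\": {\"code\": \"MN\"}}::vertex";
        System.out.println(toJson(vertex));
        System.out.println(toJson("42"));
    }
}
```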

Cypher cannot be used in an expression, the query must exist in the FROM clause of a query. However, if the cypher query is placed in a Subquery, it will behave as any SQL style query.
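In practice, a Java client therefore sends Cypher wrapped in SQL. A minimal sketch that builds those SQL strings: the `LOAD 'age'` and `search_path` setup, `create_graph`, and the `cypher(graph, $$ ... $$) AS (... agtype)` shape follow the AGE documentation, while the `AgeSql` class, the `geography` graph name, and the column names are hypothetical. Graph names are treated as trusted identifiers, and Cypher text containing `$$` is rejected because it would end the dollar-quoted string early.

```java
import java.util.List;
import java.util.stream.Collectors;

final class AgeSql {

    // Session setup described in the AGE docs: load the extension and put ag_catalog on the search path.
    static final List<String> SESSION_SETUP = List.of(
            "LOAD 'age'",
            "SET search_path = ag_catalog, \"$user\", public");

    // create_graph() creates the graph along with its own Postgres namespace (schema).
    static String createGraph(String graph) {
        return "SELECT create_graph('" + graph + "')";
    }

    // cypher() must appear in a FROM clause, with every returned column declared as agtype.
    static String wrap(String graph, String cypher, List<String> columns) {
        if (cypher.contains("$$")) {
            throw new IllegalArgumentException("Cypher text must not contain the $$ dollar-quote delimiter");
        }
        String columnDefs = columns.stream()
                .map(column -> column + " agtype")
                .collect(Collectors.joining(", "));
        return "SELECT * FROM cypher('" + graph + "', $$ " + cypher + " $$) AS (" + columnDefs + ")";
    }

    // Inside a subquery, the Cypher result behaves like any other SQL relation.
    static String countRows(String wrappedQuery) {
        return "SELECT count(*) FROM (" + wrappedQuery + ") AS t";
    }

    public static void main(String[] args) {
        SESSION_SETUP.forEach(System.out::println);
        System.out.println(createGraph("geography"));
        String sql = wrap("geography", "MATCH (s:State) RETURN s.code", List.of("code"));
        System.out.println(sql);
        System.out.println(countRows(sql));
    }
}
```

The `countRows` helper illustrates the subquery form from the quote: once wrapped, the Cypher result composes with ordinary SQL.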

Wish List

General clean-ups, TODOs and things I wish to implement for this project:

Reference

openCypher GitHub repository
- openCypher is a specification of the Cypher query language
Wikipedia: Graph Query Language
- GQL (Graph Query Language) is a proposed standard. It borrows heavily from Cypher.
Hacker News comment about Apache AGE, SQL/PGQ, and GQL
- Thanks to this person for leaving the comment. It's a great concise summary.
Apache AGE docs


dgroomes/cypher-playground
