[GIE/docs] reorganize gremlin docs #2862

Merged
merged 6 commits into from
Jun 13, 2023
6 changes: 3 additions & 3 deletions docs/index.rst
@@ -64,11 +64,11 @@ and the vineyard store that offers efficient in-memory data transfers.
  interactive_engine/getting_started
  interactive_engine/deployment
  interactive_engine/tinkerpop_eco
- interactive_engine/guide_and_examples
+ .. interactive_engine/guide_and_examples
  interactive_engine/design_of_gie
- interactive_engine/supported_gremlin_steps
+ .. interactive_engine/supported_gremlin_steps
  interactive_engine/dev_and_test.md
- interactive_engine/faq.md
+ .. interactive_engine/faq.md

.. toctree::
:maxdepth: 1
25 changes: 25 additions & 0 deletions docs/interactive_engine/faq.md
@@ -1,5 +1,30 @@
# FAQs for GIE Gremlin Usage

## Compatibility with TinkerPop
GIE supports the property graph model and the Gremlin traversal language defined by Apache TinkerPop,
and provides a Gremlin WebSocket server that supports TinkerPop version 3.4.
In addition to the original Gremlin queries, we introduce some syntactic sugar to allow
more succinct expressions. However, because of GIE's distributed nature and practical considerations, note the following limitations of our Gremlin implementation.

- Functionalities
  - Graph mutations.
  - Lambda and Groovy expressions and functions, such as `.map{<expression>}`, `.by{<expression>}`, `.filter{<expression>}`, and `System.currentTimeMillis()`. As an alternative, GIE provides the `expr()` [syntactic sugar](../interactive_engine/supported_gremlin_steps.md) to handle complex expressions.
  - Gremlin traversal strategies.
  - Transactions.
  - Secondary indexes are not currently available. Primary keys are indexed automatically.

- Gremlin Steps: See [here](supported_gremlin_steps.md) for the complete list of supported and unsupported Gremlin steps.

## Property Graph Constraints
The current release of GIE supports two graph stores: one leverages [Vineyard](https://v6d.io/) to supply an in-memory store for immutable
graph data, and the other, called [groot](../storage_engine/groot.md), is built on top of [RocksDB](https://rocksdb.org/) and also provides real-time writes and data consistency via [snapshot isolation](https://en.wikipedia.org/wiki/Snapshot_isolation). Both stores support partitioning graph data across multiple servers. By design, the following constraints apply to both stores:
- Each graph has a schema consisting of the edge labels, property keys, and vertex labels used therein.
- Each vertex type or label has a primary key (property) defined by the user. The system automatically
generates a String-typed unique identifier for each vertex and edge, encoding the label information
as well as the user-defined primary key (for vertices).
- Each vertex or edge property can have one of the following data types: `int`, `long`, `float`, `double`,
`String`, `List<int>`, `List<long>`, and `List<String>`.

## What's the difference between Inner ID and Property ID?

The main difference between Inner ID and Property ID is that Inner ID is a system-assigned identifier used internally by the graph engine for efficient data storage and retrieval, while Property ID is a user-defined property within a specific entity type.
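
The distinction can be illustrated with a toy in-memory model. This is only a sketch: the `ToyGraph` class, its method names, and the integer inner IDs are hypothetical and do not reflect how GIE actually stores or encodes identifiers. It shows that two vertices of different types may share the same user-defined `id` property while their system-assigned inner IDs stay distinct.

```python
from itertools import count


class ToyGraph:
    """Hypothetical model: NOT GIE's storage layout or ID encoding."""

    def __init__(self):
        self._next_inner_id = count()  # system-side ID generator
        self.vertices = {}             # inner ID -> (label, properties)

    def add_vertex(self, label, **properties):
        inner_id = next(self._next_inner_id)  # assigned by the "system"
        self.vertices[inner_id] = (label, properties)
        return inner_id

    def by_inner_id(self, inner_id):
        # Direct lookup on the system-assigned identifier.
        return self.vertices[inner_id]

    def by_property_id(self, label, value):
        # Filter on the user-defined "id" property within one vertex type.
        return [
            props
            for (lbl, props) in self.vertices.values()
            if lbl == label and props.get("id") == value
        ]


graph = ToyGraph()
person = graph.add_vertex("person", id=1, name="marko")
software = graph.add_vertex("software", id=1, name="lop")

# Same property ID (1) in both types, but different inner IDs.
print(person != software)
print(graph.by_property_id("software", 1))
```

In this sketch the inner ID is the dictionary key used for direct lookup, whereas the property ID is ordinary user data that only becomes unambiguous once the vertex type is given.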
161 changes: 16 additions & 145 deletions docs/interactive_engine/tinkerpop_eco.md
@@ -1,147 +1,18 @@
# Apache TinkerPop Ecosystem
[Apache TinkerPop](http://tinkerpop.apache.org/) is an open framework for developing interactive graph applications using the Gremlin query language. GIE implements TinkerPop's [Gremlin Server](https://tinkerpop.apache.org/docs/current/reference/#gremlin-server) interface so that the system can seamlessly interact with the TinkerPop ecosystem, including development tools such as [Gremlin Console](https://tinkerpop.apache.org/docs/current/reference/#gremlin-console) and language wrappers such as Java and Python.

All you need to connect to the existing TinkerPop ecosystem is the GIE Frontend service endpoint.
To obtain it:
- Follow these [instructions](./deployment.md#deploy-your-first-gie-service) if you deploy GIE in a K8s cluster.
- Follow these [instructions](./dev_and_test.md#manually-start-the-gie-services) if you start GIE on a local machine.

## Connecting Gremlin within Python

GIE makes it easy to connect to a loaded graph with TinkerPop's [Gremlin-Python](https://pypi.org/project/gremlinpython/).

```Python
from gremlin_python.structure.graph import Graph
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

graph = Graph()
# Replace with the GIE Frontend service endpoint you've obtained ("host:port");
# "localhost:8182" is only a placeholder.
gremlin_endpoint = "localhost:8182"
remoteConn = DriverRemoteConnection('ws://' + gremlin_endpoint + '/gremlin', 'g')
g = graph.traversal().withRemote(remoteConn)

# Count all vertices; the loaded example graph is assumed to contain 6 vertices.
res = g.V().count().next()
assert res == 6

remoteConn.close()
```

````{hint}
A simpler option is to use the `gremlin` object for submitting Gremlin queries through
[GraphScope's Python SDK](./getting_started.md), which wraps TinkerPop's
Gremlin-Python and acquires the endpoint automatically.
````

When a query returns a large volume of data, stream the results rather than loading them all at once. Streaming avoids Out of Memory (OOM) issues and lets you process results incrementally as they arrive. The following example collects results in a streaming way with the Python SDK.
```Python
from gremlin_python.driver.client import Client

# Replace with the GIE Frontend service endpoint you've obtained;
# the URL below is only a placeholder.
graph_url = "ws://localhost:8182/gremlin"
client = Client(graph_url, "g")

ret = []
q = client.submit('g.V()')
# Each call to q.next() returns the next batch of results;
# StopIteration signals that the stream is exhausted.
while True:
    try:
        ret.extend(q.next())
    except StopIteration:
        break

print(ret)
client.close()
```
Furthermore, the following parameter configures the streaming batch size on the server side.
```yaml
# interactive_engine/compiler/src/main/resources/conf/gremlin-server.yaml
...
# number of results in each streaming batch returned by the compiler service
resultIterationBatchSize: 64
...
```

## Connecting Gremlin within Java
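See [Gremlin-Java](https://tinkerpop.apache.org/docs/current/reference/#gremlin-java) for connecting Gremlin
within the Java language.

The following example collects results in a streaming way with the Java SDK.
```java
import java.util.Iterator;

import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;
import org.apache.tinkerpop.gremlin.driver.Result;
import org.apache.tinkerpop.gremlin.driver.ResultSet;

// Place the statements below inside a method of your own class.
Cluster cluster = Cluster.build()
        .addContactPoint("localhost") // use your host ip
        .port(8182)                   // use your port
        .create();
Client client = cluster.connect();
ResultSet resultSet = client.submit("g.V()"); // use your query
// Results are consumed one at a time as they arrive from the server.
Iterator<Result> results = resultSet.iterator();
while (results.hasNext()) {
    display(results.next()); // display() is a placeholder for your own handling
}
client.close();
cluster.close();
```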
GIE integrates seamlessly with the [Apache TinkerPop](http://tinkerpop.apache.org/) ecosystem,
an open framework for developing interactive graph applications using the Gremlin query language.
By implementing TinkerPop's [Gremlin Server](https://tinkerpop.apache.org/docs/current/reference/#gremlin-server) interface,
GIE allows for easy interaction with client and development tools such as the Gremlin Console and language
wrappers such as Java and Python. If you're interested in empowering your existing Gremlin applications
with the distributed capability of GIE, the following documents will guide you through the process.

```{toctree} arguments
---
caption: GIE For Tinkerpop Ecosystem
maxdepth: 2
---
tinkerpop_gremlin
guide_and_examples
supported_gremlin_steps
faq
```

## Gremlin Console
1. Download the Gremlin Console and unpack it to a local directory.
```bash
# if the given version (3.6.4) is not found, try to access https://dlcdn.apache.org to
# download an available version.
curl -LO https://dlcdn.apache.org/tinkerpop/3.6.4/apache-tinkerpop-gremlin-console-3.6.4-bin.zip && \
unzip apache-tinkerpop-gremlin-console-3.6.4-bin.zip && \
cd apache-tinkerpop-gremlin-console-3.6.4
```

2. In the Gremlin Console directory, set `hosts` and `port` in `conf/remote.yaml` to the GIE Frontend service endpoint, as follows:
```yaml
hosts: [your_endpoint_address]
port: [your_endpoint_port]
serializer: { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
```

3. Open the Gremlin Console:
```bash
chmod +x bin/gremlin.sh
bin/gremlin.sh
```

4. At the `gremlin>` prompt, enter the following to connect to the GraphScope session and switch to remote mode, so that all
subsequent Gremlin queries are sent to the remote connection automatically.
```bash
gremlin> :remote connect tinkerpop.server conf/remote.yaml
gremlin> :remote console
gremlin> g.V().count()
==> 6
gremlin>
```

5. You are now ready to submit Gremlin queries via either the Python SDK or the Gremlin Console.

6. When you are finished, enter the following to exit the Gremlin Console:
```bash
gremlin> :exit
```

## Compatibility with TinkerPop
GIE supports the property graph model and the Gremlin traversal language defined by Apache TinkerPop,
and provides a Gremlin WebSocket server that supports TinkerPop version 3.4.
In addition to the original Gremlin queries, we introduce some syntactic sugar to allow
more succinct expressions. However, because of GIE's distributed nature and practical considerations, note the following limitations of our Gremlin implementation.

- Functionalities
  - Graph mutations.
  - Lambda and Groovy expressions and functions, such as `.map{<expression>}`, `.by{<expression>}`, `.filter{<expression>}`, and `System.currentTimeMillis()`. As an alternative, GIE provides the `expr()` [syntactic sugar](../interactive_engine/supported_gremlin_steps.md) to handle complex expressions.
  - Gremlin traversal strategies.
  - Transactions.
  - Secondary indexes are not currently available. Primary keys are indexed automatically.

- Gremlin Steps: See [here](supported_gremlin_steps.md) for the complete list of supported and unsupported Gremlin steps.

## Property Graph Constraints
The current release of GIE supports two graph stores: one leverages [Vineyard](https://v6d.io/) to supply an in-memory store for immutable
graph data, and the other, called [groot](../storage_engine/groot.md), is built on top of [RocksDB](https://rocksdb.org/) and also provides real-time writes and data consistency via [snapshot isolation](https://en.wikipedia.org/wiki/Snapshot_isolation). Both stores support partitioning graph data across multiple servers. By design, the following constraints apply to both stores:
- Each graph has a schema consisting of the edge labels, property keys, and vertex labels used therein.
- Each vertex type or label has a primary key (property) defined by the user. The system automatically
generates a String-typed unique identifier for each vertex and edge, encoding the label information
as well as the user-defined primary key (for vertices).
- Each vertex or edge property can have one of the following data types: `int`, `long`, `float`, `double`,
`String`, `List<int>`, `List<long>`, and `List<String>`.
122 changes: 122 additions & 0 deletions docs/interactive_engine/tinkerpop_gremlin.md
@@ -0,0 +1,122 @@
# GIE For Gremlin
This document provides step-by-step guidance on connecting your Gremlin applications to GIE's
Frontend service, which offers functionality similar to the official TinkerPop service.

Your first step is to obtain the endpoint of the GIE Frontend service:
- Follow these [instructions](./deployment.md#deploy-your-first-gie-service) if you deploy GIE in a K8s cluster.
- Follow these [instructions](./dev_and_test.md#manually-start-the-gie-services) if you start GIE on a local machine.

## Connecting Gremlin within Python

GIE makes it easy to connect to a loaded graph with TinkerPop's [Gremlin-Python](https://pypi.org/project/gremlinpython/).

```Python
from gremlin_python.structure.graph import Graph
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

graph = Graph()
# Replace with the GIE Frontend service endpoint you've obtained ("host:port");
# "localhost:8182" is only a placeholder.
gremlin_endpoint = "localhost:8182"
remoteConn = DriverRemoteConnection('ws://' + gremlin_endpoint + '/gremlin', 'g')
g = graph.traversal().withRemote(remoteConn)

# Count all vertices; the loaded example graph is assumed to contain 6 vertices.
res = g.V().count().next()
assert res == 6

remoteConn.close()
```

````{hint}
A simpler option is to use the `gremlin` object for submitting Gremlin queries through
[GraphScope's Python SDK](./getting_started.md), which wraps TinkerPop's
Gremlin-Python and acquires the endpoint automatically.
````
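
The connection example builds the WebSocket URL by joining `ws://`, the endpoint, and the `/gremlin` path. A minimal sketch of a helper that does the same and rejects malformed input may be handy; the function name `gremlin_ws_url` and the assumption that the endpoint is a `host:port` string are illustrative only and are not part of GIE or Gremlin-Python.

```python
def gremlin_ws_url(endpoint: str) -> str:
    """Return the WebSocket URL for a 'host:port' endpoint string (assumed format)."""
    host, sep, port = endpoint.strip().rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected 'host:port', got {endpoint!r}")
    return f"ws://{host}:{port}/gremlin"


url = gremlin_ws_url("localhost:8182")
print(url)  # ws://localhost:8182/gremlin
```

The returned string can be passed wherever the examples in this document expect the full `ws://.../gremlin` URL.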

When a query returns a large volume of data, stream the results rather than loading them all at once. Streaming avoids Out of Memory (OOM) issues and lets you process results incrementally as they arrive. The following example collects results in a streaming way with the Python SDK.
```Python
from gremlin_python.driver.client import Client

# Replace with the GIE Frontend service endpoint you've obtained;
# the URL below is only a placeholder.
graph_url = "ws://localhost:8182/gremlin"
client = Client(graph_url, "g")

ret = []
q = client.submit('g.V()')
# Each call to q.next() returns the next batch of results;
# StopIteration signals that the stream is exhausted.
while True:
    try:
        ret.extend(q.next())
    except StopIteration:
        break

print(ret)
client.close()
```
Furthermore, the following parameter configures the streaming batch size on the server side.
```yaml
# interactive_engine/compiler/src/main/resources/conf/gremlin-server.yaml
...
# number of results in each streaming batch returned by the compiler service
resultIterationBatchSize: 64
...
```
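
To make the effect of the batch size concrete, the following sketch simulates batching locally, without any server. It is an illustration under assumptions: `iter_batches` is a hypothetical helper, not a GIE or Gremlin-Python function, and it only mimics the idea that results arrive in chunks of at most `resultIterationBatchSize` items.

```python
from itertools import islice


def iter_batches(results, batch_size=64):
    """Yield lists of at most `batch_size` items, mimicking server-side batching."""
    iterator = iter(results)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 150 fake results split with the batch size shown in the configuration above.
collected = []
batch_sizes = []
for batch in iter_batches(range(150), batch_size=64):
    batch_sizes.append(len(batch))
    collected.extend(batch)  # plays the role of ret.extend(q.next()) on the client

print(batch_sizes)  # [64, 64, 22]
```

A smaller batch size means more round trips but less data held per batch; a larger one has the opposite trade-off.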

## Connecting Gremlin within Java
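See [Gremlin-Java](https://tinkerpop.apache.org/docs/current/reference/#gremlin-java) for connecting Gremlin
within the Java language.

The following example collects results in a streaming way with the Java SDK.
```java
import java.util.Iterator;

import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;
import org.apache.tinkerpop.gremlin.driver.Result;
import org.apache.tinkerpop.gremlin.driver.ResultSet;

// Place the statements below inside a method of your own class.
Cluster cluster = Cluster.build()
        .addContactPoint("localhost") // use your host ip
        .port(8182)                   // use your port
        .create();
Client client = cluster.connect();
ResultSet resultSet = client.submit("g.V()"); // use your query
// Results are consumed one at a time as they arrive from the server.
Iterator<Result> results = resultSet.iterator();
while (results.hasNext()) {
    display(results.next()); // display() is a placeholder for your own handling
}
client.close();
cluster.close();
```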

## Gremlin Console
1. Download the Gremlin Console and unpack it to a local directory.
```bash
# if the given version (3.6.4) is not found, try to access https://dlcdn.apache.org to
# download an available version.
curl -LO https://dlcdn.apache.org/tinkerpop/3.6.4/apache-tinkerpop-gremlin-console-3.6.4-bin.zip && \
unzip apache-tinkerpop-gremlin-console-3.6.4-bin.zip && \
cd apache-tinkerpop-gremlin-console-3.6.4
```

2. In the Gremlin Console directory, set `hosts` and `port` in `conf/remote.yaml` to the GIE Frontend service endpoint, as follows:
```yaml
hosts: [your_endpoint_address]
port: [your_endpoint_port]
serializer: { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
```

3. Open the Gremlin Console:
```bash
chmod +x bin/gremlin.sh
bin/gremlin.sh
```

4. At the `gremlin>` prompt, enter the following to connect to the GraphScope session and switch to remote mode, so that all
subsequent Gremlin queries are sent to the remote connection automatically.
```bash
gremlin> :remote connect tinkerpop.server conf/remote.yaml
gremlin> :remote console
gremlin> g.V().count()
==> 6
gremlin>
```

5. You are now ready to submit Gremlin queries via either the Python SDK or the Gremlin Console.

6. When you are finished, enter the following to exit the Gremlin Console:
```bash
gremlin> :exit
```
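
Step 2 above can also be scripted. The sketch below renders the three lines of `conf/remote.yaml` from a host and port; the function name `render_remote_yaml` and the port range check are assumptions made for illustration, while the serializer line is copied from the configuration shown in step 2.

```python
# Serializer entry copied verbatim from the remote.yaml shown in step 2.
SERIALIZER = (
    "{ className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, "
    "config: { serializeResultToString: true }}"
)


def render_remote_yaml(host: str, port: int) -> str:
    """Return the contents of conf/remote.yaml for the given endpoint."""
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    lines = [
        f"hosts: [{host}]",
        f"port: {port}",
        f"serializer: {SERIALIZER}",
    ]
    return "\n".join(lines) + "\n"


config_text = render_remote_yaml("localhost", 8182)
print(config_text)
```

Writing `config_text` to `conf/remote.yaml` inside the Gremlin Console directory, for example with `pathlib.Path.write_text`, would have the same effect as editing the file by hand.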