feat: ResultSet& ValueWrapper sugars for Graph Visualisation & Data Sci #323

Merged
merged 19 commits into from
Apr 22, 2024
271 changes: 212 additions & 59 deletions README.md
@@ -1,48 +1,37 @@
# nebula-python

This repository holds the official Python API for NebulaGraph.
# NebulaGraph Python Client

[![pdm-managed](https://img.shields.io/badge/pdm-managed-blueviolet)](https://pdm.fming.dev)
[![pypi-version](https://img.shields.io/pypi/v/nebula3-python)](https://pypi.org/project/nebula3-python/)
[![python-version](https://img.shields.io/badge/python-3.6.2+%20|%203.7%20|%203.8%20|%203.9%20|%203.10%20|%203.11%20|%203.12-blue)](https://www.python.org/)

## Getting Started

## Before you start
**Note**: Ensure you are using the correct version. Refer to the [Capability Matrix](#Capability-Matrix) to see which Python client version corresponds to which NebulaGraph Database version.

Before you start, please read this section to choose the right branch for you. The compatibility between the API and NebulaGraph service can be found in [How to choose nebula-python](#How-to-choose-nebula-python). The current master branch is compatible with NebulaGraph 3.x.
### Accessing NebulaGraph

## The directory structure
- If you are trying out the Python client for the **first time**, go through [Quick Example: Connecting to GraphD Using Graph Client](#Quick-Example:-Connecting-to-GraphD-Using-Graph-Client).

```text
|--nebula-python
|
|-- nebula3 // client code
| |-- fbthrift // the fbthrift lib code
| |-- common
| |-- data
| |-- graph
| |-- meta
| |-- net // the net code for graph client
| |-- storage
| |-- Config.py // the pool config
| |__ Exception.py // the define exception
|
|-- examples
| |-- GraphClientMultiThreadExample.py // the multi thread example
| |-- GraphClientSimpleExample.py // the simple example
| |__ ScanVertexEdgeExample.py
|
|-- tests // the test code
|
|-- setup.py // used to install or package
|
|__ README.md // the introduction of nebula3-python

```

## How to get nebula3-python

### Option one: install with pip
- If your graph application is a **web service** dedicated to one graph space, use a singleton **Session Pool**; see [Using the Session Pool: A Guide](#Using-the-Session-Pool:-A-Guide).

- If you're building graph analysis tools (scanning instead of querying), you may want to use the **Storage Client** to scan vertices and edges; see [Quick Example: Using Storage Client to Scan Vertices and Edges](#Quick-Example:-Using-Storage-Client-to-Scan-Vertices-and-Edges).

### Handling Query Results

- To load a query result into a **Pandas DataFrame**, see [Example: Fetching Query Results into a Pandas DataFrame](#Example:-Fetching-Query-Results-into-a-Pandas-DataFrame).

- To render or visualize a query result, see [Example: Extracting Edge and Vertex Lists from Query Results](#Example:-Extracting-Edge-and-Vertex-Lists-from-Query-Results), which demonstrates how to extract lists of edges and vertices from any query result using the `ResultSet.dict_for_vis()` method.

### Jupyter Notebook Integration

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/wey-gu/ipython-ngql/blob/main/examples/get_started.ipynb)


If you want to access NebulaGraph from a Jupyter Notebook, consider the [NebulaGraph Jupyter Extension](https://pypi.org/project/ipython-ngql/), which provides a more interactive way to work with NebulaGraph. You can also try it on Google Colab: [NebulaGraph on Google Colab](https://colab.research.google.com/github/wey-gu/ipython-ngql/blob/main/examples/get_started.ipynb).

## Obtaining nebula3-python

### Method 1: Installation via pip

```bash
# for v3.x
Expand All @@ -51,7 +40,10 @@ pip install nebula3-python==$version
pip install nebula2-python==$version
```

### Option two: install from the source code
### Method 2: Installation via source

<details>
<summary>Click to expand</summary>

- Clone from GitHub

Expand All @@ -74,7 +66,9 @@ pip install .
python3 setup.py install
```

## Quick example to use graph-client to connect graphd
</details>

## Quick Example: Connecting to GraphD Using Graph Client

```python
from nebula3.gclient.net import ConnectionPool
Expand All @@ -93,7 +87,7 @@ ok = connection_pool.init([('127.0.0.1', 9669)], config)
session = connection_pool.get_session('root', 'nebula')

# select space
session.execute('USE nba')
session.execute('USE basketballplayer')

# show tags
result = session.execute('SHOW TAGS')
Expand All @@ -104,25 +98,131 @@ session.release()

# option 2 with session_context, session will be released automatically
with connection_pool.session_context('root', 'nebula') as session:
session.execute('USE nba')
session.execute('USE basketballplayer')
result = session.execute('SHOW TAGS')
print(result)

# close the pool
connection_pool.close()
```

## Example of using session pool
## Using the Session Pool: A Guide

The session pool is a collection of sessions that are managed by the pool. It is designed to improve the efficiency of session management and to reduce the overhead of session creation and destruction.

Session Pool comes with the following assumptions:

1. A space must already exist in the database prior to the initialization of the session pool.
2. Each session pool is associated with a single user and a single space to ensure consistent access control for the user. For instance, a user may possess different access permissions across various spaces. To execute queries in multiple spaces, consider utilizing several session pools.
3. Whenever `sessionPool.execute()` is invoked, the session executes the query within the space specified in the session pool configuration.
4. It is imperative to avoid executing commands through the session pool that would alter passwords or remove users.

For more details, see [SessionPoolExample.py](example/SessionPoolExample.py).
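
Because each session pool is bound to a single user and a single space, an application that works with several spaces needs one pool per space. The following is a minimal sketch of that idea, not part of nebula3-python: `PoolRegistry` and `make_pool` are hypothetical names, and `make_pool` is assumed to wrap whatever code creates and initializes a real session pool for the given space (see `SessionPoolExample.py` for the actual construction).

```python
import threading
from typing import Any, Callable, Dict


class PoolRegistry:
    """Keep one pool object per graph space (hypothetical helper, not a nebula3 API)."""

    def __init__(self, make_pool: Callable[[str], Any]) -> None:
        # Assumption: make_pool(space_name) returns an initialized pool for that space.
        self._make_pool = make_pool
        self._pools: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def get(self, space_name: str) -> Any:
        # Create the pool on first use, then reuse it for every later call.
        with self._lock:
            if space_name not in self._pools:
                self._pools[space_name] = self._make_pool(space_name)
            return self._pools[space_name]

    def close_all(self) -> None:
        # Close every pool that exposes close(), then forget them all.
        with self._lock:
            for pool in self._pools.values():
                close = getattr(pool, "close", None)
                if callable(close):
                    close()
            self._pools.clear()
```

In a web service, a single module-level `PoolRegistry` would typically be shared by all request handlers, so each space gets exactly one pool for the lifetime of the process.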

## Example: Extracting Edge and Vertex Lists from Query Results

For graph visualization purposes, the following code snippet demonstrates how to effortlessly extract lists of edges and vertices from any query result by utilizing the `ResultSet.dict_for_vis()` method.

```python
result = session.execute(
'GET SUBGRAPH WITH PROP 2 STEPS FROM "player101" YIELD VERTICES AS nodes, EDGES AS relationships;')

data_for_vis = result.dict_for_vis()
```

You can then pass `data_for_vis` to a front-end visualization library such as `vis.js`, `d3.js` or Apache ECharts. There is an Apache ECharts example in [example/apache_echarts.html](example/apache_echarts.html).

The dict/JSON structure with `dict_for_vis()` is as follows:

<details>
<summary>Click to expand</summary>

```python
{
'nodes': [
{
'id': 'player100',
'labels': ['player'],
'props': {
'name': 'Tim Duncan',
'age': '42',
'id': 'player100'
}
},
{
'id': 'player101',
'labels': ['player'],
'props': {
'age': '36',
'name': 'Tony Parker',
'id': 'player101'
}
}
],
'edges': [
{
'src': 'player100',
'dst': 'player101',
'name': 'follow',
'props': {
'degree': '95'
}
}
],
'nodes_dict': {
'player100': {
'id': 'player100',
'labels': ['player'],
'props': {
'name': 'Tim Duncan',
'age': '42',
'id': 'player100'
}
},
'player101': {
'id': 'player101',
'labels': ['player'],
'props': {
'age': '36',
'name': 'Tony Parker',
'id': 'player101'
}
}
},
'edges_dict': {
('player100', 'player101', 0, 'follow'): {
'src': 'player100',
'dst': 'player101',
'name': 'follow',
'props': {
'degree': '95'
}
}
},
'nodes_count': 2,
'edges_count': 1
}
```

</details>
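
As an illustration of consuming this structure, the sketch below reshapes the `nodes` and `edges` lists into a node list plus a `source`/`target` link list, a shape many graph-charting libraries accept. The function name `to_graph_series` and the output field names are assumptions chosen for this example, and the sample input is a trimmed copy of the structure shown above rather than the output of a live query.

```python
from typing import Any, Dict, List


def to_graph_series(data_for_vis: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
    """Reshape the 'nodes' and 'edges' lists into 'data' and 'links' lists."""
    data = []
    for node in data_for_vis.get("nodes", []):
        labels = node.get("labels", [])
        data.append({
            "id": node["id"],
            # Assumption: display the 'name' property when present, else the vertex id.
            "name": node.get("props", {}).get("name", node["id"]),
            # Assumption: use the first tag as the display category.
            "category": labels[0] if labels else "",
        })
    links = [
        {"source": edge["src"], "target": edge["dst"], "name": edge["name"]}
        for edge in data_for_vis.get("edges", [])
    ]
    return {"data": data, "links": links}


# Trimmed sample in the shape documented above.
sample = {
    "nodes": [
        {"id": "player100", "labels": ["player"], "props": {"name": "Tim Duncan", "age": "42"}},
        {"id": "player101", "labels": ["player"], "props": {"name": "Tony Parker", "age": "36"}},
    ],
    "edges": [
        {"src": "player100", "dst": "player101", "name": "follow", "props": {"degree": "95"}},
    ],
}
graph = to_graph_series(sample)
```

The exact field names a given front-end library expects may differ, so treat the mapping as a starting point to adapt.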

## Example: Fetching Query Results into a Pandas DataFrame

> For `nebula3-python>=3.6.0`:

Assuming you have pandas installed, you can use the following code to fetch query results into a pandas DataFrame:

```bash
pip3 install pandas
```
There are some limitations while using the session pool:

1. There MUST be an existing space in the DB before initializing the session pool.
2. Each session pool is corresponding to a single USER and a single Space. This is to ensure that the user's access control is consistent. i.g. The same user may have different access privileges in different spaces. If you need to run queries in different spaces, you may have multiple session pools.
3. Every time when sessinPool.execute() is called, the session will execute the query in the space set in the session pool config.
4. Commands that alter passwords or drop users should NOT be executed via session pool.
```python
result = session.execute('<your query>')
df = result.as_data_frame()
```
see /example/SessinPoolExample.py
## Quick example to fetch result to dataframe

<details>
<summary>For `nebula3-python<3.6.0`:</summary>

```python
from nebula3.gclient.net import ConnectionPool
Expand All @@ -142,7 +242,7 @@ def result_to_df(result: ResultSet) -> pd.DataFrame:
col_name = columns[col_num]
col_list = result.column_values(col_name)
d[col_name] = [x.cast() for x in col_list]
return pd.DataFrame.from_dict(d, orient='columns')
return pd.DataFrame(d)

# define a config
config = Config()
Expand All @@ -165,7 +265,14 @@ connection_pool.close()

```

## Quick example to use storage-client to scan vertex and edge
</details>

## Quick Example: Using Storage Client to Scan Vertices and Edges

The Storage Client lets you scan vertices and edges directly from the storage service instead of querying the graph service with nGQL/Cypher. This is useful when you need to scan a large amount of data.

<details>
<summary>Click to expand</summary>

Make sure the scan client can connect to the storage addresses shown by `SHOW HOSTS`.

Expand Down Expand Up @@ -205,7 +312,11 @@ while resp.has_next():
print(edge_data)
```

## How to choose nebula-python
</details>

See [ScanVertexEdgeExample.py](example/ScanVertexEdgeExample.py) for more details.

## Capability Matrix

| Nebula-Python Version | NebulaGraph Version |
| --------------------- | ------------------- |
Expand All @@ -220,34 +331,76 @@ while resp.has_next():
| 3.5.0 | >=3.4.0 |
| master | master |

## How to contribute to nebula-python

[Fork](https://github.com/vesoft-inc/nebula-python/fork) this repo, then clone it locally
(be sure to replace the `{username}` in the repo URL below with your GitHub username):
## Directory Structure Overview

```text
.
└──nebula-python
   │
   ├── nebula3 // client source code
   │   ├── fbthrift // the RPC code generated from thrift protocol
   │   ├── common
   │   ├── data
   │   ├── graph
   │   ├── meta
   │   ├── net // the net code for graph client
   │   ├── storage // the storage client code
   │   ├── Config.py // the pool config
   │   └── Exception.py // the exceptions
   │
   ├── examples
   │   ├── FormatResp.py // the format response example
   │   ├── SessionPoolExample.py // the session pool example
   │   ├── GraphClientMultiThreadExample.py // the multi thread example
   │   ├── GraphClientSimpleExample.py // the simple example
   │   └── ScanVertexEdgeExample.py // the scan vertex and edge example(storage client)
   │
   ├── tests // the test code
   │
   ├── setup.py // used to install or package
   └── README.md // the introduction of nebula3-python

```


## Contribute to Nebula-Python

<details>
<summary>Click to expand</summary>

To contribute, start by [forking](https://github.com/vesoft-inc/nebula-python/fork) the repository. Next, clone your forked repository to your local machine. Remember to substitute `{username}` with your actual GitHub username in the URL below:

```bash
git clone https://github.com/{username}/nebula-python.git
cd nebula-python
```
For package management, we utilize [PDM](https://github.com/pdm-project/pdm). Please begin by installing it:

We use [PMD](https://github.com/pdm-project/pdm) to manage the package, install it first:

```
```bash
pipx install pdm
```

Visit the [PDM documentation](https://pdm-project.org) for alternative installation methods.

Install the package and all dev dependencies:
```

```bash
pdm install
```

Make sure the Nebula server is running, then run the tests with pytest:
```

```bash
pdm test
```

The default formatter is [black](https://github.com/psf/black).

Please run `pdm fmt` to format your Python code before submitting.

See [How to contribute](https://github.com/vesoft-inc/nebula-community/blob/master/Contributors/how-to-contribute.md) for the general process of contributing to Nebula projects.

</details>
