Improve the deploy_with_existing_vineyard_cluster doc for easy understanding #2930

Merged
merged 2 commits into from
Jun 27, 2023
67 changes: 60 additions & 7 deletions docs/deployment/deploy_with_existing_vineyard_cluster.md
@@ -1,8 +1,34 @@
# Depoly with Existing Vineyard Cluster
# Deploy with Existing Vineyard Cluster

If you have already deployed a vineyard cluster, you can easily deploy GraphScope on the existing cluster and reuse the vineyard data such as graph with several GraphScope sessions. This will allow you to load a graph to the existing vineyard cluster and then reuse it with multiple GraphScope sessions, without needing to deploy a separate vineyard cluster for each session.
If you have already deployed a vineyard cluster, you can easily deploy GraphScope on the existing cluster and reuse vineyard data such as graphs across several GraphScope sessions. This allows you to load a graph into the existing vineyard cluster once and then reuse it with multiple GraphScope sessions, without the need to deploy a separate vineyard cluster for each session.

dashanji marked this conversation as resolved.
This doc provides step-by-step instructions on how to do this.
:::{figure-md}

<img src="../images/default_session.png"
alt="GraphScope default session"
width="80%">

Create a default GraphScope session
:::

If you create a default GraphScope session, all engines including Vineyard are bundled in the same pod, so that they can be deployed on
any node within the Kubernetes cluster. However, this creates a closed Vineyard cluster, which is only accessible to the GraphScope session. When the session is closed, the Vineyard cluster is also deleted, and it cannot be accessed by other GraphScope sessions.


:::{figure-md}

<img src="../images/session_with_vineyard_cluster.png"
alt="GraphScope sessions connect to an existing vineyard cluster"
width="80%">

Connecting GraphScope sessions to an existing vineyard cluster for data sharing
:::

The figure above shows that GraphScope sessions can share the data in the same vineyard cluster, because the engines of different sessions are deployed on the same node within the Kubernetes cluster and connect to the same vineyard socket. Multiple sessions can reuse the same graph as long as the vineyard cluster is alive. This is a common usage pattern of vineyard on Kubernetes.
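
As a rough sketch of this pattern, each session can be pointed at the same vineyard deployment by passing identical `k8s_namespace` and `k8s_vineyard_deployment` values, the two parameters used later in this document. The helper names `shared_session_kwargs` and `open_shared_session` are hypothetical and introduced only for illustration; calling `open_shared_session` assumes GraphScope is installed and a Kubernetes cluster is reachable.

```python
# Hypothetical helpers: only `k8s_namespace` and `k8s_vineyard_deployment`
# come from this document; everything else is an illustrative assumption.


def shared_session_kwargs(namespace="vineyard-system", deployment="vineyardd-sample"):
    """Build the keyword arguments that point a session at an existing vineyard cluster."""
    return {
        "k8s_namespace": namespace,
        "k8s_vineyard_deployment": deployment,
    }


def open_shared_session(**extra):
    """Create a GraphScope session bound to the shared vineyard deployment.

    graphscope is imported lazily so the helper above works without it installed.
    """
    import graphscope

    return graphscope.session(**shared_session_kwargs(), **extra)


# Two sessions built from the same kwargs would connect to the same vineyard cluster.
first = shared_session_kwargs()
second = shared_session_kwargs()
```

Keeping the namespace and deployment name in one place makes it harder for a second session to drift to a different name and silently create its own vineyard cluster.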

If you don't want to keep the vineyard cluster running for a long time, you can persist the graphs from the vineyard cluster to persistent storage, and then load them back into the vineyard cluster when you need them. For more details, please refer to [Persistent storage of graphs on the Kubernetes cluster](./persistent_storage_of_graphs_on_k8s.md).

The following sections provide step-by-step instructions on how to do this.

## Prerequisites

@@ -36,16 +62,28 @@ python3 -m pip install vineyard
```

By default, the Vineyard cluster consists of three Vineyard instances and three etcd instances.
However, since we only have one node in the Kubernetes cluster, we need to specify the number of Vineyard instances and etcd instances using the `vineyard_replicas` and `vineyard_etcd_replicas` parameters.
However, since we only have one node in the Kubernetes cluster, we need to specify the number of Vineyard instances and etcd instances using the `vineyard_replicas` and `vineyard_etcd_replicas` parameters. Do not set the number of Vineyard instances or etcd instances to be greater than the number of nodes in the Kubernetes cluster. The number of vineyard replicas and the number of engine pods, on the other hand, can be set independently of each other.

Create and check the namespace `vineyard-system` as follows.

```bash
$ kubectl create namespace vineyard-system
namespace/vineyard-system created
$ kubectl get namespace vineyard-system
NAME              STATUS   AGE
vineyard-system   Active   33s
```

Collaborator

etcd replica could have a reasonable default, and don't let the user care those details in the getting started tutorial.

Collaborator

Could we raise error when the number of vineyard instances greater than the number of nodes?

Collaborator Author

> Could we raise error when the number of vineyard instances greater than the number of nodes?

It should be managed by the users rather than vineyardctl. Also, it won't block the creation of the vineyard cluster. If the number of vineyard instances is greater than the number of nodes, the other vineyard pods will be pending.
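
Since the check is left to the user, one possible way to do it is to count the nodes before choosing a replica count. The sketch below is an assumption-laden illustration, not part of vineyardctl: the helpers `count_nodes`, `fetch_node_count`, and `cap_replicas` are hypothetical names, and `fetch_node_count` assumes `kubectl` is on the PATH and can reach the cluster.

```python
import subprocess


def count_nodes(kubectl_output):
    """Count nodes in the output of `kubectl get nodes -o name`.

    That command prints one `node/<name>` entry per line.
    """
    return sum(
        1 for line in kubectl_output.splitlines() if line.strip().startswith("node/")
    )


def fetch_node_count():
    """Ask kubectl for the node list (requires kubectl and a reachable cluster)."""
    output = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "name"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return count_nodes(output)


def cap_replicas(requested, node_count):
    """Limit the requested replica count to the number of nodes, with a minimum of one."""
    return max(1, min(requested, node_count))
```

On a single-node cluster, `cap_replicas(3, fetch_node_count())` would reduce the default of three instances to one, which avoids the pending pods described above.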

To deploy a simple Vineyard cluster with one Vineyard instance and one etcd instance, run the following:

```python
import vineyard

# The default deployment name is `vineyardd-sample` and the default namespace
# is `vineyard-system`. You can override them with the `name` and `namespace`
# parameters. For more details about the parameters, see the vineyardctl doc:
# https://github.com/v6d-io/v6d/blob/main/k8s/cmd/README.md
# Note: every `-` in a vineyardctl parameter name becomes `_` in the Python API.
vineyard.deploy.vineyardctl.deploy.vineyard_deployment(
    vineyard_replicas=1,
Collaborator

Only leave the vineyard_replicas is okay, the others could use the default.
Also since you already use the name vineyard_deployment, why you prefix every parameters with vineyard_? for example, just vineyard_deployment(replicas=1, etcd_replicas=2) is meaningful enough for me.

Collaborator Author

> Only leave the vineyard_replicas is okay, the others could use the default. Also since you already use the name vineyard_deployment, why you prefix every parameters with vineyard_? for example, just vineyard_deployment(replicas=1, etcd_replicas=2) is meaningful enough for me.

Thanks for the feedback, I will change the parameters in the next version.

    vineyard_etcd_replicas=1,
    create_namespace=True
)
```

@@ -60,6 +98,17 @@ etcd0 1/1 Running 0 73m
vineyardd-sample-5db59987f-vr2fg 1/1 Running 0 73m
```

## The lifecycle of a vineyard cluster

If you deploy the vineyard cluster with the vineyardctl API, it will persist until you manually delete it. The vineyard cluster will not be affected by quitting the GraphScope session. You can delete the vineyard cluster with the following command:

```python
import vineyard
vineyard.deploy.vineyardctl.delete.vineyard_deployment()
Collaborator

delete doesn't require the deployment name?

Collaborator Author

There is a default deployment name vineyardd-sample, as we create the vineyard deployment with the default name.

Collaborator

emm, I think better let the delete be explicit.

```

However, if you do not deploy the vineyard cluster beforehand, it will be created when you create a GraphScope session with the specified vineyard deployment name and namespace. The vineyard cluster will be deleted when you close the GraphScope session.
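
Because the two cases behave so differently, it can help to check whether the deployment already exists before opening a session. The following is a minimal sketch under stated assumptions: the helpers `get_deployment_command` and `deployment_exists` are hypothetical names, the defaults are the deployment name and namespace used in this document, and the check relies on the standard `kubectl get deployment` command being available.

```python
import subprocess


def get_deployment_command(name="vineyardd-sample", namespace="vineyard-system"):
    """Return the kubectl command that looks up the vineyard deployment."""
    return ["kubectl", "get", "deployment", name, "--namespace", namespace]


def deployment_exists(name="vineyardd-sample", namespace="vineyard-system"):
    """Return True if kubectl finds the deployment, False otherwise."""
    try:
        result = subprocess.run(
            get_deployment_command(name, namespace),
            capture_output=True,
            check=False,
        )
    except FileNotFoundError:
        # kubectl is not installed or not on PATH.
        return False
    return result.returncode == 0
```

If `deployment_exists()` returns `False`, a session created with that deployment name would start its own vineyard cluster and remove it again on close. Note that the function also returns `False` when the cluster cannot be reached, so a negative result is only a hint.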

## Load the dataset to the Kubernetes cluster

Depending on how the Kubernetes cluster was created, you may need to take different steps to make your dataset available within the cluster. If the cluster was not created using minikube, you will need to either copy the dataset to the nodes of the Kubernetes cluster or mount it onto them. On the other hand, if the cluster was created using minikube, you can directly mount the dataset to the minikube VM, without the need for further copying or mounting operations.
@@ -103,6 +152,8 @@ k8s_volumes = {
}

# This step may take a while because it creates a GraphScope cluster.
# Make sure the vineyard cluster is created before creating the GraphScope session.
# If it does not exist, a new vineyard cluster will be created and the graph will be loaded into it.
sess = graphscope.session(
    k8s_namespace='vineyard-system',
    k8s_vineyard_deployment='vineyardd-sample',
@@ -125,6 +176,7 @@ then load the graph with the vineyard id in the new GraphScope session.

```python
import graphscope
import vineyard

# This step may take a while because it creates a GraphScope cluster.
new_sess = graphscope.session(
@@ -134,7 +186,7 @@ new_sess = graphscope.session(

# Use the vineyard id of the graph that the previous GraphScope session loaded.
# Assuming the vineyard id is 22731319746904674, you can load it as follows.
graph = new_sess.load_from(vineyard_id=22731319746904674)
graph = new_sess.load_from(vineyard.ObjectID(22731319746904674))
```
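
One way to carry the vineyard id from the session that loaded the graph to a later session is to write it to a small file. The sketch below uses only the standard library; the helpers `save_graph_id` and `read_graph_id` and the idea of a JSON file are hypothetical and not part of GraphScope or vineyard. The integer read back would then be wrapped in `vineyard.ObjectID(...)` before calling `load_from`, as in the snippet above.

```python
import json
from pathlib import Path


def save_graph_id(path, vineyard_id):
    """Store the integer vineyard id of a loaded graph as JSON."""
    Path(path).write_text(json.dumps({"vineyard_id": int(vineyard_id)}))


def read_graph_id(path):
    """Read back the integer vineyard id stored by save_graph_id."""
    return int(json.loads(Path(path).read_text())["vineyard_id"])
```

Python's `json` module keeps large integers exact, but other JSON readers may not, so storing the id as a string could be the safer choice if the file is shared with other tools.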

Check the graph as follows.
@@ -156,6 +208,7 @@ If you see the output above, that means you have successfully reused the existin
Delete the Vineyard cluster as follows:

```python
# The default vineyard deployment name is `vineyardd-sample` and the default
# namespace is `vineyard-system`. If you did not specify these arguments when
# creating the vineyard cluster, you can delete it as follows.
vineyard.deploy.vineyardctl.delete.vineyard_deployment()
```
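
Following the review suggestion to make the delete explicit, a sketch could pass the deployment name and namespace instead of relying on the defaults. This assumes that `delete.vineyard_deployment` accepts the same `name` and `namespace` parameters described for the deploy call, which this page does not confirm. The wrapper `delete_explicitly` is a hypothetical name; it takes the delete function as an argument so the sketch can be read and tried without a cluster.

```python
def delete_explicitly(delete_fn, name="vineyardd-sample", namespace="vineyard-system"):
    """Call the given delete function with an explicit name and namespace.

    ASSUMPTION: the delete function accepts `name` and `namespace` keyword
    arguments, mirroring the deploy call; verify this against the vineyardctl doc.
    """
    return delete_fn(name=name, namespace=namespace)


# Example with a stand-in function that simply returns the arguments it received.
recorded = delete_explicitly(lambda **kwargs: kwargs)
```

With vineyard installed, the call would look like `delete_explicitly(vineyard.deploy.vineyardctl.delete.vineyard_deployment)`, again subject to the assumption above.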

Binary file added docs/images/default_session.png
Binary file added docs/images/session_with_vineyard_cluster.png