-
Notifications
You must be signed in to change notification settings - Fork 14
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Signed-off-by: SimFG <bang.fu@zilliz.com>
- Loading branch information
Showing
6 changed files
with
333 additions
and
13 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,41 @@ | ||
# Milvus-CDC | ||
|
||
Milvus-CDC is a change data capture tool for Milvus. It can capture the changes of upstream Milvus collections and sink them to downstream Milvus. | ||
CDC is "Change Data Capture", and Milvus-CDC is a change data capture tool for Milvus. It can capture the changes of upstream Milvus collections and sink them to downstream Milvus. This will bring the following benefits: | ||
|
||
1. Data reliability is improved and the probability of data loss is reduced; | ||
2. Based on CDC, the active-standby disaster recovery feature of milvus can be implemented to ensure that even if the milvus-source cluster fails, it can be quickly switched to ensure the availability of upper-layer services; | ||
|
||
## Quick Start | ||
|
||
**Please use the source code to compile and use it now**, because milvus-cdc is undergoing rapid iteration, the image currently provided is not stable, and the implementation method has also changed a lot. | ||
|
||
```bash | ||
git clone https://github.com/zilliztech/milvus-cdc.git | ||
|
||
make build | ||
``` | ||
|
||
After successfully building, the `cdc` bin file will be generated in the `server` directory. | ||
|
||
**DON'T execute it directly.** If you do it, I think you must get a error because you need to configure it before using it. How to configure and use cdc, refer to: [Milvus-CDC Usage](doc/cdc-usage.md) | ||
|
||
## Basic Components | ||
|
||
At present, cdc mainly consists of two parts: http server and corelib. | ||
|
||
- The http server, is responsible for accepting user-side requests, controlling task execution, and maintaining meta-information; | ||
- corelib, is responsible for synchronizing the execution of tasks, including reader and writer: | ||
- reader reads relevant information from etcd and mq of source Milvus; | ||
- The writer converts the msg in mq into Milvus api parameters and sends the request to the target Milvus; | ||
|
||
![components](doc/pic/milvus-cdc-components.png) | ||
|
||
## CDC Data Processing Flow | ||
|
||
1. User creates cdc task through http interface; | ||
2. Obtain collection-related meta-information through etcd in Milvus-source, such as the channel information and checkpoint information corresponding to the collection, etc; | ||
3. After obtaining the meta-information related to the collection, connect to mq(message queue) to subscribe to the data; | ||
4. Read the data in mq, parse the data and forward it through go-sdk or perform the same operation as milvus-source; | ||
|
||
![flow](doc/pic/milvus-cdc-data.png) | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,291 @@ | ||
# Milvus-CDC Usage | ||
|
||
## Limitation | ||
|
||
1. Only the cluster dimension can be synchronized, that is, the collection name can only be specified as `*` when creating a task. The collection dimension will also be supported in the future. | ||
2. Supported operations | ||
- Create/Drop Collection | ||
- Insert/Delete | ||
- Create/Drop Partition | ||
- Create/Drop Index | ||
- Load/Release/Flush | ||
- Create/Drop Database | ||
|
||
**Anything not mentioned is not supported;** | ||
|
||
3. Milvus cdc only supports synchronized data. If you need active and backup disaster recovery functions, please contact us; | ||
|
||
## Configuration | ||
|
||
The configuration consists of two main parts: the Milvus configuration for Milvus-Target and the CDC startup configuration. | ||
|
||
Furthermore, please ensure that both the source and target Milvus versions are **2.3.2 or higher**. Any other versions are not supported. | ||
|
||
1. Milvus-Target Config | ||
|
||
Set the value of `common.ttMsgEnabled` in the milvus.yaml file in the milvus-target cluster to `false`. | ||
|
||
2. CDC Startup Config | ||
|
||
The following is the cdc startup configuration file template. You can see this file in the `server/configs` directory. | ||
|
||
```yaml | ||
# cdc server address | ||
address: 0.0.0.0:8444 | ||
# max task num | ||
maxTaskNum: 100 | ||
# max task name length | ||
maxNameLength: 256 | ||
|
||
# cdc meta data config | ||
metaStoreConfig: | ||
# the metastore type, available value: etcd, mysql | ||
storeType: etcd | ||
# etcd address | ||
etcdEndpoints: | ||
- localhost:2379 | ||
# mysql connection address | ||
mysqlSourceUrl: root:root@tcp(127.0.0.1:3306)/milvus-cdc?charset=utf8 | ||
# meta data prefix, if multiple cdc services use the same store service, you can set different rootPaths to achieve multi-tenancy | ||
rootPath: cdc | ||
|
||
# milvus-source config, these settings are basically the same as the corresponding configuration of milvus.yaml in milvus source. | ||
sourceConfig: | ||
# etcd config | ||
etcdAddress: | ||
- localhost:2379 | ||
etcdRootPath: by-dev | ||
etcdMetaSubPath: meta | ||
# default partition name | ||
defaultPartitionName: _default | ||
# read buffer length, mainly used for buffering if writing data to milvus-target is slow. | ||
readChanLen: 10 | ||
# milvus-target mq config, which is pulsar or kafka | ||
pulsar: | ||
address: pulsar://localhost:6650 | ||
webAddress: localhost:80 | ||
maxMessageSize: 5242880 | ||
tenant: public | ||
namespace: default | ||
# authPlugin: org.apache.pulsar.client.impl.auth.AuthenticationToken | ||
# authParams: token:xxx | ||
# kafka: | ||
# address: 127.0.0.1:9092 | ||
``` | ||
|
||
After completing these two steps of configuration, do not rush to start CDC. Need to ensure the following points: | ||
|
||
1. Whether the `common.ttMsgEnabled` configuration value of milvus-target is `false``; | ||
2. Confirm that the `mq` type configured by cdc is the same as the type configured by milvus-source; | ||
3. Ensure that the network environment where the cdc service is located can correctly connect to the mq and etcd addresses in the milvus-source in the configuration; | ||
|
||
## Usage | ||
|
||
All http APIs need to comply with the following rules: | ||
|
||
- Request method: POST | ||
- Request path:/cdc | ||
- Request body: | ||
|
||
```json | ||
{ | ||
"request_type": "", | ||
"request_data": {} | ||
} | ||
``` | ||
|
||
Different requests are distinguished by `request_type`, which currently includes: create, delete, pause, resume, get and list. If the request fails, a non-200 http status code will be returned. | ||
|
||
### create request | ||
|
||
- milvus_connect_param, the connection params of the milvus-target server; | ||
- collection_infos, the collection information that needs to be synchronized, which currently only supports `*`; | ||
- rpc_channel_info, the corresponding name value is composed of the two values of `common.chanNamePrefix.cluster` and `common.chanNamePrefix.replicateMsg` in **milvus-source**, connected by the symbol `-` | ||
|
||
```http | ||
POST localhost:8444/cdc | ||
Content-Type: application/json | ||
body: | ||
{ | ||
"request_type":"create", | ||
"request_data":{ | ||
"milvus_connect_param":{ | ||
"host":"localhost", | ||
"port":19530, | ||
"username":"root", | ||
"password":"Milvus", | ||
"enable_tls":true, | ||
"connect_timeout":10 | ||
}, | ||
"collection_infos":[ | ||
{ | ||
"name":"*" | ||
} | ||
], | ||
"rpc_channel_info": { | ||
"name": "by-dev-replicate-msg" | ||
} | ||
} | ||
} | ||
``` | ||
|
||
After success, the task_id will be returned, such as: | ||
|
||
```json | ||
{"task_id":"6623ae52d35842a5a2c9d89b16ed7aa1"} | ||
``` | ||
|
||
If there is an exception, an http error will appear. | ||
|
||
### delete request | ||
|
||
delete a cdc task. | ||
|
||
**request** | ||
|
||
```http | ||
POST localhost:8444/cdc | ||
Content-Type: application/json | ||
body: | ||
{ | ||
"request_type":"delete", | ||
"request_data": { | ||
"task_id": "f84605ae48fb4170990ab80addcbd71e" | ||
} | ||
} | ||
``` | ||
|
||
**response** | ||
|
||
```json | ||
{} | ||
``` | ||
|
||
### pause request | ||
|
||
pause a cdc task. | ||
|
||
**request** | ||
|
||
```http | ||
POST localhost:8444/cdc | ||
Content-Type: application/json | ||
body: | ||
{ | ||
"request_type":"pause", | ||
"request_data": { | ||
"task_id": "4d458a58b0f74e85b842b1244dc69546" | ||
} | ||
} | ||
``` | ||
|
||
**response** | ||
|
||
```json | ||
{} | ||
``` | ||
|
||
### resume request | ||
|
||
resume a cdc task. | ||
|
||
**request** | ||
|
||
```http | ||
POST localhost:8444/cdc | ||
Content-Type: application/json | ||
body: | ||
{ | ||
"request_type":"resume", | ||
"request_data": { | ||
"task_id": "4d458a58b0f74e85b842b1244dc69546" | ||
} | ||
} | ||
``` | ||
|
||
**response** | ||
|
||
```json | ||
{} | ||
``` | ||
|
||
### get request | ||
|
||
get a cdc task info | ||
|
||
**request** | ||
|
||
```http | ||
POST localhost:8444/cdc | ||
Content-Type: application/json | ||
body: | ||
{ | ||
"request_type":"get", | ||
"request_data": { | ||
"task_id": "4d458a58b0f74e85b842b1244dc69546" | ||
} | ||
} | ||
``` | ||
|
||
**response** | ||
|
||
```json | ||
{ | ||
"task_id":"4d458a58b0f74e85b842b1244dc69546", | ||
"Milvus_connect_param":{ | ||
"host":"localhost", | ||
"port":19530, | ||
"connect_timeout":10 | ||
}, | ||
"collection_infos":[ | ||
{ | ||
"name":"*" | ||
} | ||
], | ||
"state":"Running" | ||
} | ||
``` | ||
|
||
### list request | ||
|
||
list the info of all cdc tasks. | ||
|
||
**request** | ||
|
||
```http | ||
POST localhost:8444/cdc | ||
Content-Type: application/json | ||
body: | ||
{ | ||
"request_type":"list" | ||
} | ||
``` | ||
|
||
**response** | ||
|
||
```json | ||
{ | ||
"tasks": [ | ||
{ | ||
"task_id": "4d458a58b0f74e85b842b1244dc69546", | ||
"Milvus_connect_param": { | ||
"host": "localhost", | ||
"port": 19530, | ||
"connect_timeout": 10 | ||
}, | ||
"collection_infos": [ | ||
{ | ||
"name": "*" | ||
} | ||
], | ||
"state": "Running" | ||
} | ||
] | ||
} | ||
``` |
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters