# Exchange update 2 #2214

Merged (2 commits) on Aug 10, 2023

@@ -52,6 +52,8 @@ Exchange has the following advantages:

## Version compatibility

Exchange supports Spark 2.2.x, 2.4.x, and 3.x.x. The corresponding Exchange packages are named `nebula-exchange_spark_2.2`, `nebula-exchange_spark_2.4`, and `nebula-exchange_spark_3.0`, respectively.

The correspondence between the NebulaGraph Exchange version (the JAR version), the NebulaGraph core version and the Spark version is as follows.

| Exchange version | NebulaGraph version | Spark version |
@@ -98,6 +98,8 @@ For different data sources, the vertex configurations are different. There are m
|`tags.name`|string|-|Yes|The tag name defined in NebulaGraph.|
|`tags.type.source`|string|-|Yes|Specify a data source. For example, `csv`.|
|`tags.type.sink`|string|`client`|Yes|Specify an import method. Optional values are `client` and `SST`.|
|`tags.writeMode`|string|`INSERT`|No|The type of batch operation to perform on the data. Optional values are `INSERT`, `UPDATE`, and `DELETE`.|
|`tags.deleteEdge`|string|`false`|No|Whether to also delete the incoming and outgoing edges of the vertices during a batch delete operation. This parameter takes effect only when `tags.writeMode` is `DELETE` (see the sketch after this table).|
|`tags.fields`|list\[string\]|-|Yes|The header or column name of the column corresponding to properties. If there is a header or a column name, please use that name directly. If a CSV file does not have a header, use the form of `[_c0, _c1, _c2]` to represent the first column, the second column, the third column, and so on.|
|`tags.nebula.fields`|list\[string\]|-|Yes|Property names defined in NebulaGraph, the order of which must correspond to `tags.fields`. For example, `[_c1, _c2]` corresponds to `[name, age]`, which means that values in the second column are the values of the property `name`, and values in the third column are the values of the property `age`.|
|`tags.vertex.field`|string|-|Yes|The column of vertex IDs. For example, when a CSV file has no header, users can use `_c0` to indicate values in the first column are vertex IDs.|
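
To make the two new parameters concrete, here is a minimal, hedged sketch of a tag block that batch-deletes vertices and their attached edges. The tag name `player`, the CSV path, and the column mapping are illustrative assumptions, not values taken from this PR.

```conf
{
  # Sketch only: a CSV-sourced tag configured for batch deletion.
  name: player
  type: {
    source: csv
    sink: client
  }
  path: "hdfs://192.168.*.*:9000/data/vertex_player.csv"
  fields: [_c1, _c2]
  nebula.fields: [age, name]
  vertex: {
    field: _c0
  }
  separator: ","
  header: false

  # Delete the vertices read from the source instead of inserting them.
  writeMode: DELETE
  # Also remove the incoming and outgoing edges of those vertices.
  deleteEdge: true

  batch: 256
  partition: 32
}
```
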
@@ -245,6 +247,7 @@ For the specific parameters of different data sources for edge configurations, p
|`edges.name`| string|-|Yes|The edge type name defined in NebulaGraph.|
|`edges.type.source`|string|-|Yes|The data source of edges. For example, `csv`.|
|`edges.type.sink`|string|`client`|Yes|The method specified to import data. Optional values are `client` and `SST`.|
|`edges.writeMode`|string|`INSERT`|No|The type of batch operation to perform on the data. Optional values are `INSERT`, `UPDATE`, and `DELETE` (see the sketch after this table).|
|`edges.fields`|list\[string\]|-|Yes|The header or column name of the column corresponding to properties. If there is a header or column name, please use that name directly. If a CSV file does not have a header, use the form of `[_c0, _c1, _c2]` to represent the first column, the second column, the third column, and so on.|
|`edges.nebula.fields`|list\[string\]|-|Yes|Property names defined in NebulaGraph, the order of which must correspond to `edges.fields`. For example, `[_c2, _c3]` corresponds to `[start_year, end_year]`, which means that values in the third column are the values of the start year, and values in the fourth column are the values of the end year.|
|`edges.source.field`|string|-|Yes|The column of source vertices of edges. For example, `_c0` indicates a value in the first column that is used as the source vertex of an edge.|
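
`edges.writeMode` behaves the same way for edges (there is no `deleteEdge` counterpart). The following hedged sketch updates existing `follow` edges rather than inserting new ones; the edge type name and column mapping are assumptions.

```conf
{
  # Sketch only: a CSV-sourced edge type configured for batch updates.
  name: follow
  type: {
    source: csv
    sink: client
  }
  path: "hdfs://192.168.*.*:9000/data/edge_follow.csv"
  fields: [_c2]
  nebula.fields: [degree]
  source: {
    field: _c0
  }
  target: {
    field: _c1
  }
  separator: ","
  header: false

  # Update the listed properties of edges that already exist.
  writeMode: UPDATE

  batch: 256
  partition: 32
}
```
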
@@ -170,6 +170,12 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf`
# policy:hash
}

# Batch operation types, including INSERT, UPDATE, and DELETE. Defaults to INSERT.
#writeMode: INSERT

# Whether or not to delete the related incoming and outgoing edges of the vertices when performing a batch delete operation. This parameter takes effect when `writeMode` is `DELETE`.
#deleteEdge: false

# The number of data written to NebulaGraph in a single batch.
batch: 256

@@ -254,6 +260,9 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf`
# (Optional) Specify a column as the source of the rank.
#ranking: rank

# Batch operation types, including INSERT, UPDATE, and DELETE. Defaults to INSERT.
#writeMode: INSERT

# The number of data written to NebulaGraph in a single batch.
batch: 256

@@ -202,6 +202,12 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf`
# If the CSV file does not have a header, set the header to false. The default value is false.
header: false

# Batch operation types, including INSERT, UPDATE, and DELETE. Defaults to INSERT.
#writeMode: INSERT

# Whether or not to delete the related incoming and outgoing edges of the vertices when performing a batch delete operation. This parameter takes effect when `writeMode` is `DELETE`.
#deleteEdge: false

# The number of data written to NebulaGraph in a single batch.
batch: 256

@@ -317,6 +323,9 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf`
# If the CSV file does not have a header, set the header to false. The default value is false.
header: false

# Batch operation types, including INSERT, UPDATE, and DELETE. Defaults to INSERT.
#writeMode: INSERT

# The number of data written to NebulaGraph in a single batch.
batch: 256

@@ -203,6 +203,11 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf`
# }
}

# Batch operation types, including INSERT, UPDATE, and DELETE. Defaults to INSERT.
#writeMode: INSERT

# Whether or not to delete the related incoming and outgoing edges of the vertices when performing a batch delete operation. This parameter takes effect when `writeMode` is `DELETE`.
#deleteEdge: false

# Number of pieces of data written to NebulaGraph in a single batch.
batch: 256
@@ -283,6 +288,9 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf`
# (Optional) Specify a column as the source of the rank.
#ranking: rank

# Batch operation types, including INSERT, UPDATE, and DELETE. Defaults to INSERT.
#writeMode: INSERT

# The number of data written to NebulaGraph in a single batch.
batch: 256

@@ -245,6 +245,12 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf`
# }
}

# Batch operation types, including INSERT, UPDATE, and DELETE. Defaults to INSERT.
#writeMode: INSERT

# Whether or not to delete the related incoming and outgoing edges of the vertices when performing a batch delete operation. This parameter takes effect when `writeMode` is `DELETE`.
#deleteEdge: false

# The number of data written to NebulaGraph in a single batch.
batch: 256

@@ -318,6 +324,9 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf`
# (Optional) Specify a column as the source of the rank.
#ranking: rank

# Batch operation types, including INSERT, UPDATE, and DELETE. Defaults to INSERT.
#writeMode: INSERT

# The number of data written to NebulaGraph in a single batch.
batch: 256

docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-jdbc.md (41 changes: 36 additions & 5 deletions)
@@ -76,6 +76,12 @@ Before importing data, you need to confirm the following information:

- Learn about the Schema created in NebulaGraph, including names and properties of Tags and Edge types, and more.

- The Hadoop service has been installed and started.

## Precautions

nebula-exchange_spark_2.2 supports only single-table queries; multi-table queries are not supported.
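
For example (a hedged sketch; the table and column names are assumptions based on the basketballplayer dataset), a single-table `sentence` works with every package, while a multi-table query is only accepted by `nebula-exchange_spark_2.4` and `nebula-exchange_spark_3.0`:

```conf
# Accepted by nebula-exchange_spark_2.2, 2.4, and 3.0: a single-table query.
sentence: "select playerid, age, name from player order by playerid"

# Accepted only by nebula-exchange_spark_2.4 and nebula-exchange_spark_3.0;
# nebula-exchange_spark_2.2 rejects multi-table queries.
# sentence: "select playerid, age, name from player, team order by playerid"
```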

## Steps

### Step 1: Create the Schema in NebulaGraph
@@ -190,11 +196,18 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf`
driver:"com.mysql.cj.jdbc.Driver"

# Database user name and password
user:root
user:"root"
password:"12345"

table:player
sentence:"select playerid, age, name from player order by playerid"
# Scan a single table to read data.
# For nebula-exchange_spark_2.2, this parameter is required, and sentence can be configured in addition.
# For nebula-exchange_spark_2.4 and nebula-exchange_spark_3.0, this parameter is optional, but it cannot be configured together with sentence.
table:"basketball.player"

# Use a query statement to read data.
# For nebula-exchange_spark_2.2, this parameter is optional. Multi-table queries are not supported, only a table name can follow FROM, and the `db.table` form is not supported.
# For nebula-exchange_spark_2.4 and nebula-exchange_spark_3.0, this parameter is optional, but it cannot be configured together with table. Multi-table queries are supported.
# sentence:"select playerid, age, name from player, team order by playerid"

# (Optional) Parameters for reading data through multiple connections. See https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
partitionColumn:playerid # optional. Must be a numeric, date, or timestamp column from the table in question.
@@ -221,6 +234,12 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf`
# }
}

# Batch operation types, including INSERT, UPDATE, and DELETE. Defaults to INSERT.
#writeMode: INSERT

# Whether or not to delete the related incoming and outgoing edges of the vertices when performing a batch delete operation. This parameter takes effect when `writeMode` is `DELETE`.
#deleteEdge: false

# The number of data written to NebulaGraph in a single batch.
batch: 256

@@ -278,8 +297,17 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf`
driver:"com.mysql.cj.jdbc.Driver"
user:root
password:"12345"
table:follow
sentence:"select src_player,dst_player,degree from follow order by src_player"

# Scan a single table to read data.
# For nebula-exchange_spark_2.2, this parameter is required, and sentence can be configured in addition.
# For nebula-exchange_spark_2.4 and nebula-exchange_spark_3.0, this parameter is optional, but it cannot be configured together with sentence.
table:"basketball.follow"

# Use a query statement to read data.
# For nebula-exchange_spark_2.2, this parameter is optional. Multi-table queries are not supported, only a table name can follow FROM, and the `db.table` form is not supported.
# For nebula-exchange_spark_2.4 and nebula-exchange_spark_3.0, this parameter is optional, but it cannot be configured together with table. Multi-table queries are supported.
# sentence:"select src_player,dst_player,degree from follow order by src_player"

partitionColumn:src_player
lowerBound:1
upperBound:5
@@ -315,6 +343,9 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf`
# (Optional) Specify a column as the source of the rank.
#ranking: rank

# Batch operation types, including INSERT, UPDATE, and DELETE. Defaults to INSERT.
#writeMode: INSERT

# The number of data written to NebulaGraph in a single batch.
batch: 256

@@ -222,6 +222,12 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf`
# }
}

# Batch operation types, including INSERT, UPDATE, and DELETE. Defaults to INSERT.
#writeMode: INSERT

# Whether or not to delete the related incoming and outgoing edges of the vertices when performing a batch delete operation. This parameter takes effect when `writeMode` is `DELETE`.
#deleteEdge: false

# The number of data written to NebulaGraph in a single batch.
batch: 256

@@ -323,6 +329,9 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf`
# (Optional) Specify a column as the source of the rank.
#ranking: rank

# Batch operation types, including INSERT, UPDATE, and DELETE. Defaults to INSERT.
#writeMode: INSERT

# The number of data written to NebulaGraph in a single batch.
batch: 256

docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-kafka.md (12 changes: 11 additions & 1 deletion)
@@ -46,7 +46,9 @@ Before importing data, you need to confirm the following information:

## Precautions

Only client mode is supported when importing Kafka data, i.e. the value of parameters `tags.type.sink` and `edges.type.sink` is `client`.
- Only client mode is supported when importing Kafka data, that is, the parameters `tags.type.sink` and `edges.type.sink` must be set to `client` (see the sketch after this list).

- When importing Kafka data, do not use Exchange version 3.4.0, which adds caching of imported data and does not support streaming data import. Use Exchange versions 3.0.0, 3.3.0, or 3.5.0.
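
A minimal sketch of the first point is shown below; the topic name, field mapping, and addresses are placeholders, not values from this PR.

```conf
{
  # Sketch only: Kafka data can only be written through the client.
  name: player
  type: {
    source: kafka
    sink: client   # SST is not supported for Kafka sources.
  }
  service: "127.0.0.1:9092"
  topic: "topic_name1"
  fields: [key, value]
  nebula.fields: [name, age]
  vertex: {
    field: key
  }
  batch: 10
  partition: 10
  interval.seconds: 10
}
```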

## Steps

@@ -186,6 +188,11 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf`
# }
}

# Batch operation types, including INSERT, UPDATE, and DELETE. Defaults to INSERT.
#writeMode: INSERT

# Whether or not to delete the related incoming and outgoing edges of the vertices when performing a batch delete operation. This parameter takes effect when `writeMode` is `DELETE`.
#deleteEdge: false

# The number of data written to NebulaGraph in a single batch.
batch: 10
@@ -252,6 +259,9 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf`
# # (Optional) Specify a column as the source of the rank.
# #ranking: rank

# # Batch operation types, including INSERT, UPDATE, and DELETE. Defaults to INSERT.
# #writeMode: INSERT

# # The number of data written to NebulaGraph in a single batch.
# batch: 10

@@ -179,6 +179,12 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf`
# }
}

# Batch operation types, including INSERT, UPDATE, and DELETE. Defaults to INSERT.
#writeMode: INSERT

# Whether or not to delete the related incoming and outgoing edges of the vertices when performing a batch delete operation. This parameter takes effect when `writeMode` is `DELETE`.
#deleteEdge: false

# The number of data written to NebulaGraph in a single batch.
batch: 256

@@ -277,6 +283,9 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf`
# (Optional) Specify a column as the source of the rank.
#ranking: rank

# Batch operation types, including INSERT, UPDATE, and DELETE. Defaults to INSERT.
#writeMode: INSERT

# The number of Spark partitions.
partition:10

docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-mysql.md (43 changes: 37 additions & 6 deletions)
@@ -78,6 +78,12 @@ Before importing data, you need to confirm the following information:

- Learn about the Schema created in NebulaGraph, including names and properties of Tags and Edge types, and more.

- The Hadoop service has been installed and started.

## Precautions

nebula-exchange_spark_2.2 supports only single-table queries; multi-table queries are not supported.
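
For example (a hedged sketch; the database and table names are assumptions), `nebula-exchange_spark_2.2` can only scan the table named in `table`, while `nebula-exchange_spark_2.4` and `nebula-exchange_spark_3.0` may use a multi-table `sentence` instead:

```conf
# nebula-exchange_spark_2.2: scan one table; sentence is not supported.
table: "basketball.player"

# nebula-exchange_spark_2.4 / 3.0: a query (optionally joining multiple tables)
# can be used instead of table, but not together with it.
# sentence: "select * from player, team"
```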

## Steps

### Step 1: Create the Schema in NebulaGraph
@@ -187,11 +193,19 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf`

host:192.168.*.*
port:3306
database:"basketball"
table:"player"
user:"test"
password:"123456"
sentence:"select playerid, age, name from player order by playerid;"
database:"basketball"

# Scan a single table to read data.
# For nebula-exchange_spark_2.2, this parameter is required; sentence is not supported.
# For nebula-exchange_spark_2.4 and nebula-exchange_spark_3.0, this parameter is optional, but it cannot be configured together with sentence.
table:"basketball.player"

# Use a query statement to read data.
# This parameter is not supported by nebula-exchange_spark_2.2.
# For nebula-exchange_spark_2.4 and nebula-exchange_spark_3.0, this parameter is optional, but it cannot be configured together with table. Multi-table queries are supported.
# sentence: "select * from people, player, team"

# Specify the column names in the player table in fields, and their corresponding values are specified as properties in the NebulaGraph.
# The sequence of fields and nebula.fields must correspond to each other.
@@ -209,6 +223,12 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf`
# }
}

# Batch operation types, including INSERT, UPDATE, and DELETE. Defaults to INSERT.
#writeMode: INSERT

# Whether or not to delete the related incoming and outgoing edges of the vertices when performing a batch delete operation. This parameter takes effect when `writeMode` is `DELETE`.
#deleteEdge: false

# The number of data written to NebulaGraph in a single batch.
batch: 256

@@ -260,11 +280,19 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf`

host:192.168.*.*
port:3306
database:"basketball"
table:"follow"
user:"test"
password:"123456"
sentence:"select src_player,dst_player,degree from follow order by src_player;"
database:"basketball"

# Scan a single table to read data.
# For nebula-exchange_spark_2.2, this parameter is required; sentence is not supported.
# For nebula-exchange_spark_2.4 and nebula-exchange_spark_3.0, this parameter is optional, but it cannot be configured together with sentence.
table:"basketball.follow"

# Use a query statement to read data.
# This parameter is not supported by nebula-exchange_spark_2.2.
# For nebula-exchange_spark_2.4 and nebula-exchange_spark_3.0, this parameter is optional, but it cannot be configured together with table. Multi-table queries are supported.
# sentence: "select * from follow, serve"

# Specify the column names in the follow table in fields, and their corresponding values are specified as properties in the NebulaGraph.
# The sequence of fields and nebula.fields must correspond to each other.
@@ -295,6 +323,9 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf`
# (Optional) Specify a column as the source of the rank.
#ranking: rank

# Batch operation types, including INSERT, UPDATE, and DELETE. Defaults to INSERT.
#writeMode: INSERT

# The number of data written to NebulaGraph in a single batch.
batch: 256

docs-2.0/nebula-exchange/use-exchange/ex-ug-import-from-neo4j.md (11 changes: 11 additions & 0 deletions)
@@ -190,6 +190,13 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf`
# newColName:new-field
# }
}

# Batch operation types, including INSERT, UPDATE, and DELETE. Defaults to INSERT.
#writeMode: INSERT

# Whether or not to delete the related incoming and outgoing edges of the vertices when performing a batch delete operation. This parameter takes effect when `writeMode` is `DELETE`.
#deleteEdge: false

partition: 10
batch: 1000
check_point_path: /tmp/test
@@ -250,6 +257,10 @@ After Exchange is compiled, copy the conf file `target/classes/application.conf`
# }
}
#ranking: rank

# Batch operation types, including INSERT, UPDATE, and DELETE. Defaults to INSERT.
#writeMode: INSERT

partition: 10
batch: 1000
check_point_path: /tmp/test