[feature](cold-hot) support s3 resource #8808

Merged
merged 6 commits into from
Apr 13, 2022
3 changes: 3 additions & 0 deletions docs/.vuepress/sidebar/en.js
@@ -632,6 +632,7 @@ module.exports = [
"CREATE INDEX",
"CREATE MATERIALIZED VIEW",
"CREATE REPOSITORY",
"CREATE RESOURCE",
"CREATE TABLE LIKE",
"CREATE TABLE",
"CREATE VIEW",
@@ -641,6 +642,7 @@ module.exports = [
"DROP INDEX",
"DROP MATERIALIZED VIEW",
"DROP REPOSITORY",
"DROP RESOURCE",
"DROP TABLE",
"DROP VIEW",
"HLL",
@@ -649,6 +651,7 @@ module.exports = [
"REFRESH TABLE",
"RESTORE",
"SHOW ENCRYPTKEYS",
"SHOW RESOURCES",
"TRUNCATE TABLE",
"create-function",
"drop-function",
217 changes: 110 additions & 107 deletions docs/en/sql-reference/sql-statements/Administration/ALTER SYSTEM.md
@@ -25,114 +25,117 @@ under the License.
-->

# ALTER SYSTEM
## Description

This statement is used to operate on nodes in a system. (Administrator only!)
Grammar:
1) Adding nodes (without multi-tenant functionality, add in this way)
ALTER SYSTEM ADD BACKEND "host:heartbeat_port"[,"host:heartbeat_port"...];
2) Adding idle nodes (that is, adding BACKEND that does not belong to any cluster)
ALTER SYSTEM ADD FREE BACKEND "host:heartbeat_port"[,"host:heartbeat_port"...];
3) Adding nodes to a cluster
ALTER SYSTEM ADD BACKEND TO cluster_name "host:heartbeat_port"[,"host:heartbeat_port"...];
4) Delete nodes
ALTER SYSTEM DROP BACKEND "host:heartbeat_port"[,"host:heartbeat_port"...];
5) Node offline
ALTER SYSTEM DECOMMISSION BACKEND "host:heartbeat_port"[,"host:heartbeat_port"...];
6) Add Broker
ALTER SYSTEM ADD BROKER broker_name "host:port"[,"host:port"...];
7) Drop Broker
ALTER SYSTEM DROP BROKER broker_name "host:port"[,"host:port"...];
8) Delete all Brokers
ALTER SYSTEM DROP ALL BROKER broker_name;
9) Set up a Load error hub for centralized display of import error information
ALTER SYSTEM SET LOAD ERRORS HUB PROPERTIES ("key" = "value"[, ...]);
10) Modify property of BE
ALTER SYSTEM MODIFY BACKEND "host:heartbeat_port" SET ("key" = "value"[, ...]);

Explain:
1) Host can be hostname or IP address
2) heartbeat_port is the heartbeat port of the node
3) Adding and deleting nodes are synchronous operations. Neither takes the existing data on the node into account: a dropped node is removed from the metadata immediately, so use these operations with caution.
4) The decommission (offline) operation takes a node offline safely. It is asynchronous: if it succeeds, the node is eventually removed from the metadata; if it fails, the node is not taken offline.
5) A decommission operation can be cancelled manually. See CANCEL DECOMMISSION for details.
6) Load error hub:
Currently, two types of hub are supported: MySQL and Broker. Specify "type" = "mysql" or "type" = "broker" in PROPERTIES.
To delete the current load error hub, set type to null.
1) With the MySQL type, error information generated during import is inserted into the specified MySQL database table and can then be viewed directly with the SHOW LOAD WARNINGS statement.

A hub of MySQL type requires the following parameters:
host: mysql host
port: mysql port
user: mysql user
password: mysql password
database: mysql database
table: mysql table

2) With the Broker type, error information generated during import is written as a file to the designated remote storage system through the broker. Make sure that the corresponding broker is deployed.
A hub of Broker type requires the following parameters:
Broker: Name of the broker
Path: Remote storage path
Other properties: Other information necessary to access remote storage, such as authentication information.

7) MODIFY BACKEND currently supports the following BE node attributes:
1. tag.location: Resource tag
2. disable_query: Query disabled attribute
3. disable_load: Load disabled attribute

## example

1. Add a node
ALTER SYSTEM ADD BACKEND "host:port";

2. Adding an idle node
ALTER SYSTEM ADD FREE BACKEND "host:port";

3. Delete two nodes
ALTER SYSTEM DROP BACKEND "host1:port", "host2:port";

4. Decommission two nodes
ALTER SYSTEM DECOMMISSION BACKEND "host1:port", "host2:port";

5. Add two HDFS brokers
ALTER SYSTEM ADD BROKER hdfs "host1:port", "host2:port";

6. Add a load error hub of MySQL type
ALTER SYSTEM SET LOAD ERRORS HUB PROPERTIES
("type"= "mysql",
"host" = "192.168.1.17",
"port" = "3306",
"user" = "my_name",
"password" = "my_passwd",
"database" = "doris_load",
"table" = "load_errors"
);

7. Add a load error hub of Broker type
ALTER SYSTEM SET LOAD ERRORS HUB PROPERTIES
("type"= "broker",
"name" = "BOS",
"path" = "bos://backup-cmy/logs",
"bos_endpoint" ="http://gz.bcebos.com",
"bos_accesskey" = "069fc278xxxxxx24ddb522",
"bos_secret_accesskey"="700adb0c6xxxxxx74d59eaa980a"
);

8. Delete the current load error hub
ALTER SYSTEM SET LOAD ERRORS HUB PROPERTIES
("type"= "null");

9. Modify BE resource tag

ALTER SYSTEM MODIFY BACKEND "host1:9050" SET ("tag.location" = "group_a");

10. Modify the query disabled attribute of BE

ALTER SYSTEM MODIFY BACKEND "host1:9050" SET ("disable_query" = "true");
11. Modify the load disabled attribute of BE

ALTER SYSTEM MODIFY BACKEND "host1:9050" SET ("disable_load" = "true");

## Description

This statement is used to operate on nodes in a system. (Administrator only!)

Syntax:
1) Adding nodes (without multi-tenant functionality, add in this way)
ALTER SYSTEM ADD BACKEND "host:heartbeat_port"[,"host:heartbeat_port"...];
2) Adding idle nodes (that is, adding BACKEND that does not belong to any cluster)
ALTER SYSTEM ADD FREE BACKEND "host:heartbeat_port"[,"host:heartbeat_port"...];
3) Adding nodes to a cluster
ALTER SYSTEM ADD BACKEND TO cluster_name "host:heartbeat_port"[,"host:heartbeat_port"...];
4) Delete nodes
ALTER SYSTEM DROP BACKEND "host:heartbeat_port"[,"host:heartbeat_port"...];
5) Node offline
ALTER SYSTEM DECOMMISSION BACKEND "host:heartbeat_port"[,"host:heartbeat_port"...];
6) Add Broker
ALTER SYSTEM ADD BROKER broker_name "host:port"[,"host:port"...];
7) Drop Broker
ALTER SYSTEM DROP BROKER broker_name "host:port"[,"host:port"...];
8) Delete all Brokers
ALTER SYSTEM DROP ALL BROKER broker_name;
9) Set up a Load error hub for centralized display of import error information
ALTER SYSTEM SET LOAD ERRORS HUB PROPERTIES ("key" = "value"[, ...]);
10) Modify property of BE
ALTER SYSTEM MODIFY BACKEND "host:heartbeat_port" SET ("key" = "value"[, ...]);

Explain:
1) Host can be hostname or IP address
2) heartbeat_port is the heartbeat port of the node
3) Adding and deleting nodes are synchronous operations. Neither takes the existing data on the node into account: a dropped node is removed from the metadata immediately, so use these operations with caution.
4) The decommission (offline) operation takes a node offline safely. It is asynchronous: if it succeeds, the node is eventually removed from the metadata; if it fails, the node is not taken offline.
5) A decommission operation can be cancelled manually. See CANCEL DECOMMISSION for details.
6) Load error hub:
Currently, two types of hub are supported: MySQL and Broker. Specify "type" = "mysql" or "type" = "broker" in PROPERTIES.
To delete the current load error hub, set type to null.
1) With the MySQL type, error information generated during import is inserted into the specified MySQL database table and can then be viewed directly with the SHOW LOAD WARNINGS statement.

A hub of MySQL type requires the following parameters:
host: mysql host
port: mysql port
user: mysql user
password: mysql password
database: mysql database
table: mysql table

2) With the Broker type, error information generated during import is written as a file to the designated remote storage system through the broker. Make sure that the corresponding broker is deployed.
A hub of Broker type requires the following parameters:
Broker: Name of the broker
Path: Remote storage path
Other properties: Other information necessary to access remote storage, such as authentication information.

7) MODIFY BACKEND currently supports the following BE node attributes:
1. tag.location: Resource tag
2. disable_query: Query disabled attribute
3. disable_load: Load disabled attribute

## Example

1. Add a node
ALTER SYSTEM ADD BACKEND "host:port";

2. Adding an idle node
ALTER SYSTEM ADD FREE BACKEND "host:port";

3. Delete two nodes
ALTER SYSTEM DROP BACKEND "host1:port", "host2:port";

4. Decommission two nodes
ALTER SYSTEM DECOMMISSION BACKEND "host1:port", "host2:port";

5. Add two HDFS brokers
ALTER SYSTEM ADD BROKER hdfs "host1:port", "host2:port";

6. Add a load error hub of MySQL type
ALTER SYSTEM SET LOAD ERRORS HUB PROPERTIES
("type"= "mysql",
"host" = "192.168.1.17",
"port" = "3306",
"user" = "my_name",
"password" = "my_passwd",
"database" = "doris_load",
"table" = "load_errors"
);

7. Add a load error hub of Broker type
ALTER SYSTEM SET LOAD ERRORS HUB PROPERTIES
("type"= "broker",
"name" = "BOS",
"path" = "bos://backup-cmy/logs",
"bos_endpoint" ="http://gz.bcebos.com",
"bos_accesskey" = "069fc278xxxxxx24ddb522",
"bos_secret_accesskey"="700adb0c6xxxxxx74d59eaa980a"
);

8. Delete the current load error hub
ALTER SYSTEM SET LOAD ERRORS HUB PROPERTIES
("type"= "null");

9. Modify BE resource tag

ALTER SYSTEM MODIFY BACKEND "host1:9050" SET ("tag.location" = "group_a");

10. Modify the query disabled attribute of BE

ALTER SYSTEM MODIFY BACKEND "host1:9050" SET ("disable_query" = "true");

11. Modify the load disabled attribute of BE

ALTER SYSTEM MODIFY BACKEND "host1:9050" SET ("disable_load" = "true");
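
The MODIFY BACKEND syntax takes a comma-separated property list, so several attributes can presumably be set in one statement. The following is a minimal sketch, not taken from the original document: it reuses the placeholder backend host1:9050 from the examples above and assumes that disable_query and disable_load may be combined and that "false" re-enables them.

```sql
-- Stop routing both queries and loads to this backend in one statement
-- (assumption: the two attributes can be set together).
ALTER SYSTEM MODIFY BACKEND "host1:9050" SET ("disable_query" = "true", "disable_load" = "true");

-- Re-enable both once maintenance is finished
-- (assumption: "false" clears the disabled state).
ALTER SYSTEM MODIFY BACKEND "host1:9050" SET ("disable_query" = "false", "disable_load" = "false");
```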

## keyword
ALTER,SYSTEM,BACKEND,BROKER,FREE

ALTER, SYSTEM, BACKEND, BROKER, FREE
@@ -71,6 +71,8 @@ under the License.
1) The following attributes of the modified partition are currently supported.
- storage_medium
- storage_cooldown_time
- storage_cold_medium
Contributor:

storage_cold_medium is not needed. If the user specifies a storage resource, that implies the cold storage is S3 or HDFS.

Contributor Author:

There is also the old data-migration logic from SSD to HDD; we use both storage_cold_medium and remote_storage_resource to distinguish the two cases.

Contributor:

yeah

Contributor:

If a BE has SSD, HDD, and S3 resources, how do we define the migration from SSD -> HDD -> S3?

- remote_storage_resource
- replication_num
- in_memory
2) For single-partition tables, partition_name is the same as the table name.
@@ -0,0 +1,132 @@
---
{
"title": "CREATE RESOURCE",
"language": "en"
}
---

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# CREATE RESOURCE

## Description

This statement is used to create a resource. Only the root or admin user can create resources. Spark, ODBC, and S3 external resources are currently supported.
In the future, other external resources may be added to Doris for use, such as Spark/GPU for query, HDFS/S3 for external storage, MapReduce for ETL, etc.

Syntax:
CREATE [EXTERNAL] RESOURCE "resource_name"
PROPERTIES ("key"="value", ...);

Explanation:
1. Specify the resource type in PROPERTIES with "type" = "[spark|odbc_catalog|s3]". The supported types are currently spark, odbc_catalog, and s3.
2. The remaining PROPERTIES vary by resource type; see the examples for details.

## Example

1. Create a Spark resource named spark0 in yarn cluster mode.

````
CREATE EXTERNAL RESOURCE "spark0"
PROPERTIES
(
"type" = "spark",
"spark.master" = "yarn",
"spark.submit.deployMode" = "cluster",
"spark.jars" = "xxx.jar,yyy.jar",
"spark.files" = "/tmp/aaa,/tmp/bbb",
"spark.executor.memory" = "1g",
"spark.yarn.queue" = "queue0",
"spark.hadoop.yarn.resourcemanager.address" = "127.0.0.1:9999",
"spark.hadoop.fs.defaultFS" = "hdfs://127.0.0.1:10000",
"working_dir" = "hdfs://127.0.0.1:10000/tmp/doris",
"broker" = "broker0",
"broker.username" = "user0",
"broker.password" = "password0"
);
````

Spark related parameters are as follows:
- spark.master: Required, currently supports yarn, spark://host:port.
- spark.submit.deployMode: The deployment mode of the Spark program, required, supports both cluster and client.
- spark.hadoop.yarn.resourcemanager.address: Required when master is yarn.
- spark.hadoop.fs.defaultFS: Required when master is yarn.
- Other parameters are optional, refer to http://spark.apache.org/docs/latest/configuration.html

working_dir and broker must be specified when Spark is used for ETL:
- working_dir: The directory used by the ETL. Required when Spark is used as an ETL resource. For example: hdfs://host:port/tmp/doris.
- broker: The broker name. Required when Spark is used as an ETL resource. The broker must be configured in advance with the `ALTER SYSTEM ADD BROKER` command.
- broker.property_key: The authentication information that the broker needs when reading the intermediate files generated by ETL.
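
Because the broker has to exist before a resource can reference it, the order of the statements matters. The following is a minimal sketch of that ordering, using only the syntax shown in this PR; the broker address broker_host:8000 is a hypothetical placeholder, and the name broker0 is chosen to match the example above.

```sql
-- Register the broker first (host and port are placeholders).
ALTER SYSTEM ADD BROKER broker0 "broker_host:8000";

-- Then reference the broker by name from the Spark ETL resource.
CREATE EXTERNAL RESOURCE "spark0"
PROPERTIES
(
"type" = "spark",
"spark.master" = "yarn",
"spark.submit.deployMode" = "cluster",
"spark.hadoop.yarn.resourcemanager.address" = "127.0.0.1:9999",
"spark.hadoop.fs.defaultFS" = "hdfs://127.0.0.1:10000",
"working_dir" = "hdfs://127.0.0.1:10000/tmp/doris",
"broker" = "broker0"
);
```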

2. Create an ODBC resource

````
CREATE EXTERNAL RESOURCE `oracle_odbc`
PROPERTIES (
"type" = "odbc_catalog",
"host" = "192.168.0.1",
"port" = "8086",
"user" = "test",
"password" = "test",
"database" = "test",
"odbc_type" = "oracle",
"driver" = "Oracle 19 ODBC driver"
);
````

The relevant parameters of ODBC are as follows:
- host: IP address of the external database
- driver: The driver name for the ODBC external table, which must match the Driver name in be/conf/odbcinst.ini.
- odbc_type: The type of the external database; currently supports oracle, mysql, and postgresql
- user: Username for the external database
- password: Password of the corresponding user

3. Create an S3 resource

````
CREATE RESOURCE "remote_s3"
PROPERTIES
(
"type" = "s3",
"s3_endpoint" = "http://bj.s3.com",
"s3_region" = "bj",
"s3_root_path" = "/path/to/root",
"s3_access_key" = "bbb",
"s3_secret_key" = "aaaa",
"s3_max_connections" = "50",
"s3_request_timeout_ms" = "3000",
"s3_connection_timeout_ms" = "1000"
);
````

S3 related parameters are as follows:
- s3_endpoint: s3 endpoint
- s3_region: s3 region
- s3_root_path: s3 root directory
- s3_access_key: s3 access key
- s3_secret_key: s3 secret key
- s3_max_connections: the maximum number of s3 connections, the default is 50
- s3_request_timeout_ms: s3 request timeout, in milliseconds, the default is 3000
- s3_connection_timeout_ms: s3 connection timeout, in milliseconds, the default is 1000
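
When the default connection settings are acceptable, the definition can probably be shortened. The following is a minimal sketch, assuming that the three parameters with documented defaults are optional (the list above implies this but does not state it). The resource name remote_s3_minimal is hypothetical and the credentials are placeholders. SHOW RESOURCES and DROP RESOURCE are the statements this PR adds to the sidebar; the exact form used here is an assumption.

```sql
-- Rely on the documented defaults for s3_max_connections,
-- s3_request_timeout_ms, and s3_connection_timeout_ms
-- (assumption: these three properties may be omitted).
CREATE RESOURCE "remote_s3_minimal"
PROPERTIES
(
"type" = "s3",
"s3_endpoint" = "http://bj.s3.com",
"s3_region" = "bj",
"s3_root_path" = "/path/to/root",
"s3_access_key" = "bbb",
"s3_secret_key" = "aaaa"
);

-- Check that the resource was registered.
SHOW RESOURCES;

-- Remove it again when it is no longer needed.
DROP RESOURCE "remote_s3_minimal";
```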


## keyword

CREATE, RESOURCE