[feature](cold-hot) support s3 resource #8808
…ble with it Change-Id: Ic8b80b1a48d7d5224aff5fe61cb67366f28c6c1c
LGTM
PR approved by anyone and no changes requested.
Could you please provide the SQL reference? For example, how to define an S3 resource and how to use it to create a table.
@@ -296,9 +296,11 @@ Syntax:
PROPERTIES (
    "storage_medium" = "[SSD|HDD]",
    ["storage_cold_medium" = "[HDD|S3]"],
    ["remote_storage_resource" = "xxx"],
Just use storage_resource = xxx. If it is defined, that implies the cold storage medium type is S3 or HDD.
remote_storage_resource must be used together with storage_cold_medium; standalone use is not supported. There is a check for this in PropertyAnalyzer.
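A minimal sketch of the rule just described, assuming table properties arrive as a plain string map. The class and method names are hypothetical and this is not the actual PropertyAnalyzer code; it only shows the one-directional check "remote_storage_resource requires storage_cold_medium".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the check, NOT Doris's PropertyAnalyzer.
class ColdHotPropertyCheck {
    static final String COLD_MEDIUM = "storage_cold_medium";
    static final String REMOTE_RESOURCE = "remote_storage_resource";

    // Throws IllegalArgumentException when remote_storage_resource is set without storage_cold_medium.
    static void check(Map<String, String> props) {
        boolean hasResource = props.containsKey(REMOTE_RESOURCE);
        boolean hasColdMedium = props.containsKey(COLD_MEDIUM);
        if (hasResource && !hasColdMedium) {
            throw new IllegalArgumentException(REMOTE_RESOURCE + " must be used with " + COLD_MEDIUM);
        }
    }

    // Convenience wrapper: true if the properties pass the check.
    static boolean isValid(Map<String, String> props) {
        try {
            check(props);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> ok = new HashMap<>();
        ok.put("storage_medium", "SSD");
        ok.put(COLD_MEDIUM, "S3");
        ok.put(REMOTE_RESOURCE, "remote_s3");
        System.out.println(isValid(ok)); // true

        Map<String, String> standalone = new HashMap<>();
        standalone.put("storage_medium", "SSD");
        standalone.put(REMOTE_RESOURCE, "remote_s3");
        System.out.println(isValid(standalone)); // false
    }
}
```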
I don't think we need the concept of remote_storage_resource: if there is a remote_storage_resource, there may also need to be a local_storage_resource.
We could also create a table directly on S3 or HDFS, not in the hot/cold scenario but in a decoupled compute-and-storage scenario.
"storage_medium" = "SSD", "storage_cooldown_time" = "2015-06-04 00:00:00"
"storage_medium" = "SSD",
"storage_cold_medium" = "S3",
"remote_storage_resource" = "remote_s3",
cold_storage_resource = "xxx,xxxx,xxxx" should support an array, because the bandwidth of a single S3 bucket is too small; we may need multiple buckets to scale the bandwidth.
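A hypothetical sketch of the array proposal: parse a comma-separated cold_storage_resource value into a list and spread writes across the listed resources. The round-robin policy, the class name, and the bucket names are assumptions for illustration only; the comment does not say how load would be distributed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical: a multi-valued cold_storage_resource with round-robin selection.
class MultiResourceSelector {
    private final List<String> resources;
    private final AtomicInteger next = new AtomicInteger(); // thread-safe counter

    MultiResourceSelector(String propertyValue) {
        this.resources = parse(propertyValue);
        if (resources.isEmpty()) {
            throw new IllegalArgumentException("no resource given");
        }
    }

    // Splits " a, b,,c " into [a, b, c]: trims names and skips empty entries.
    static List<String> parse(String value) {
        List<String> out = new ArrayList<>();
        for (String part : value.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                out.add(name);
            }
        }
        return out;
    }

    // Returns the resources in round-robin order; floorMod keeps the index valid after int overflow.
    String nextResource() {
        int i = Math.floorMod(next.getAndIncrement(), resources.size());
        return resources.get(i);
    }

    public static void main(String[] args) {
        MultiResourceSelector s = new MultiResourceSelector("s3_bucket_a,s3_bucket_b,s3_bucket_c");
        for (int i = 0; i < 4; i++) {
            // prints s3_bucket_a, s3_bucket_b, s3_bucket_c, s3_bucket_a (one per line)
            System.out.println(s.nextResource());
        }
    }
}
```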
Use the same resource when creating a table, to facilitate data management.
Since remote_storage_resource is a partition-level property, the user can change the resource when creating a new partition, or modify it with an ALTER TABLE clause. In this way, a table can use multiple buckets.
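To make the partition-level argument concrete, here is a small model, assuming each partition simply records its own resource name. PartitionedTableSketch and its methods are hypothetical stand-ins, not the Doris catalog API; alterPartitionResource only imitates the effect of an ALTER TABLE on one partition.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model, not the Doris catalog: each partition keeps its own
// remote_storage_resource, so one table can span several S3 buckets.
class PartitionedTableSketch {
    private final String defaultResource; // resource given in CREATE TABLE
    private final Map<String, String> resourceByPartition = new LinkedHashMap<>();

    PartitionedTableSketch(String defaultResource) {
        this.defaultResource = defaultResource;
    }

    // A new partition uses the table default unless it names its own resource.
    void addPartition(String partition, String resourceOrNull) {
        resourceByPartition.put(partition, resourceOrNull != null ? resourceOrNull : defaultResource);
    }

    // Stand-in for changing one partition's resource with ALTER TABLE.
    void alterPartitionResource(String partition, String resource) {
        if (!resourceByPartition.containsKey(partition)) {
            throw new IllegalArgumentException("unknown partition: " + partition);
        }
        resourceByPartition.put(partition, resource);
    }

    String resourceOf(String partition) {
        return resourceByPartition.get(partition);
    }

    // Distinct resources (buckets) the table currently uses, sorted by name.
    Set<String> resourcesInUse() {
        return new TreeSet<>(resourceByPartition.values());
    }

    public static void main(String[] args) {
        PartitionedTableSketch t = new PartitionedTableSketch("remote_s3");
        t.addPartition("p1", null);          // inherits remote_s3
        t.addPartition("p2", "remote_s3_b"); // new partition on another bucket
        t.addPartition("p3", null);
        t.alterPartitionResource("p3", "remote_s3_c");
        System.out.println(t.resourcesInUse()); // [remote_s3, remote_s3_b, remote_s3_c]
    }
}
```

The trade-off raised in the next comment shows up here: the table only spans several buckets if someone assigns them per partition.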
If the user could specify multiple resources, they would not need to worry about bandwidth in the future. With only one resource, they have to monitor and optimize resource usage themselves with ALTER TABLE clauses, which is too hard for the user.
You can find the syntax in issue #8807
@@ -71,6 +71,8 @@ under the License.
1) The following attributes of the modified partition are currently supported.
- storage_medium
- storage_cooldown_time
- storage_cold_medium
We do not need storage_cold_medium. If the user specifies a storage resource, that already implies the cold storage is S3 or HDFS.
There is also the old logic that migrates data from SSD to HDD, so we use both storage_cold_medium and remote_storage_resource to distinguish the two cases.
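A sketch of how the two properties could tell the legacy SSD-to-HDD migration apart from migration to S3. Two rules here are assumptions, not taken from the PR: an absent storage_cold_medium is treated as the legacy HDD target, and remote_storage_resource is rejected when the cold medium is HDD. ColdTarget and ColdTargetResolver are hypothetical names.

```java
import java.util.Locale;

// Hypothetical cooldown targets.
enum ColdTarget { LOCAL_HDD, REMOTE_S3 }

class ColdTargetResolver {
    // coldMedium: value of storage_cold_medium or null; remoteResource: remote_storage_resource or null.
    static ColdTarget resolve(String coldMedium, String remoteResource) {
        // Assumption: no storage_cold_medium means the legacy SSD -> HDD behavior.
        String medium = coldMedium == null ? "HDD" : coldMedium.toUpperCase(Locale.ROOT);
        switch (medium) {
            case "HDD":
                if (remoteResource != null) {
                    throw new IllegalArgumentException("remote_storage_resource is not used with HDD");
                }
                return ColdTarget.LOCAL_HDD;
            case "S3":
                if (remoteResource == null) {
                    throw new IllegalArgumentException("S3 cold medium needs remote_storage_resource");
                }
                return ColdTarget.REMOTE_S3;
            default:
                throw new IllegalArgumentException("unknown storage_cold_medium: " + coldMedium);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, null));        // LOCAL_HDD
        System.out.println(resolve("S3", "remote_s3")); // REMOTE_S3
    }
}
```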
yeah
If a BE has SSD, HDD, and S3 resources, how do we define the migration SSD -> HDD -> S3?
@@ -75,11 +94,20 @@ public static DataProperty read(DataInput in) throws IOException {
public void write(DataOutput out) throws IOException {
Use Gson to serialize and deserialize.
DataProperty is an already existing class. I cannot switch its serialization/deserialization to Gson because of backward compatibility.
....
For the read method, you could use:
if (meta version > 108) {
    gson.readxxx
} else {
    read field
    read field
    ...
}
For the write method, you could use:
if (meta version > 108) {
    gson.writexxx
} else {
    write field
    write field
    ...
}
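The version-gated read/write suggested in this comment, written as a runnable Java sketch. DataPropertySketch, its fields, and NEW_FORMAT_AFTER_VERSION are hypothetical (the cutoff reuses the 108 from the comment). To stay dependency-free, a small key=value text written with DataOutput.writeUTF stands in for the Gson JSON string, so this is neither the real DataProperty code nor the Gson API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for DataProperty with version-gated serialization.
class DataPropertySketch {
    static final int NEW_FORMAT_AFTER_VERSION = 108; // assumption taken from the review comment

    String storageMedium = "HDD";
    long cooldownTimeMs = 0L;
    String remoteStorageResource = ""; // not present in the old layout

    void write(DataOutput out, int metaVersion) throws IOException {
        if (metaVersion > NEW_FORMAT_AFTER_VERSION) {
            out.writeUTF(encode()); // one self-describing string, where Gson JSON would go
        } else {
            out.writeUTF(storageMedium); // old layout: fixed field order
            out.writeLong(cooldownTimeMs);
        }
    }

    static DataPropertySketch read(DataInput in, int metaVersion) throws IOException {
        DataPropertySketch p = new DataPropertySketch();
        if (metaVersion > NEW_FORMAT_AFTER_VERSION) {
            p.decode(in.readUTF());
        } else {
            p.storageMedium = in.readUTF();
            p.cooldownTimeMs = in.readLong();
        }
        return p;
    }

    // Assumes values contain no newline characters.
    private String encode() {
        return "storageMedium=" + storageMedium
                + "\ncooldownTimeMs=" + cooldownTimeMs
                + "\nremoteStorageResource=" + remoteStorageResource;
    }

    private void decode(String text) {
        for (String line : text.split("\n")) {
            int eq = line.indexOf('=');
            String key = line.substring(0, eq);
            String value = line.substring(eq + 1);
            switch (key) {
                case "storageMedium": storageMedium = value; break;
                case "cooldownTimeMs": cooldownTimeMs = Long.parseLong(value); break;
                case "remoteStorageResource": remoteStorageResource = value; break;
                default: break; // unknown keys are ignored, as Gson ignores unknown JSON fields
            }
        }
    }

    // Writes then reads back through an in-memory stream.
    static DataPropertySketch roundTrip(DataPropertySketch p, int metaVersion) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        p.write(new DataOutputStream(bytes), metaVersion);
        return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())), metaVersion);
    }

    public static void main(String[] args) throws IOException {
        DataPropertySketch p = new DataPropertySketch();
        p.storageMedium = "SSD";
        p.cooldownTimeMs = 1433376000000L;
        p.remoteStorageResource = "remote_s3";
        System.out.println(roundTrip(p, 109).remoteStorageResource);           // remote_s3
        System.out.println(roundTrip(p, 108).remoteStorageResource.isEmpty()); // true
    }
}
```

Reading with version 108 yields an empty remoteStorageResource because the old layout never stored it, which is the backward-compatibility constraint the author points out; only images written after the cutoff carry the new field.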
fe/fe-core/src/main/java/org/apache/doris/catalog/OdbcCatalogResource.java
Change-Id: I8c1132651d528e2a929717ce02717a8e5206f018
fe/fe-core/src/main/java/org/apache/doris/clone/DynamicPartitionScheduler.java
@@ -25,17 +25,19 @@ under the License.
-->

# SHOW RESOURCES
I think we should use a system table instead of a SHOW command.
For example: select * from information_schema.resources where xxxx;
This is a good suggestion. This PR mainly adds the S3 resource, using the existing resource design.
We can refactor SHOW RESOURCES after this is done.
That keeps the changes independent and makes the review easier.
docs/zh-CN/sql-reference/sql-statements/Data Definition/CREATE RESOURCE.md
SHOW RESOURCES, RESOURCES
Adding a system table is a better solution for resources.
fe/fe-core/src/main/java/org/apache/doris/catalog/DataProperty.java
fe/fe-core/src/main/java/org/apache/doris/catalog/DataProperty.java
Change-Id: I58dcf3f65d4ccf4d14f38863c474bb5d89749032
Change-Id: Ief7a1b2fad60de30efe72c1ccb80282c5856a29e
LGTM
PR approved by at least one committer and no changes requested.
LGTM
Add cold hot support in FE meta, support alter resource DDL in FE
Proposed changes
Issue: #8807
remote_storage_resource and remote_storage_cooldown_time in DataProperty.
remote_storage_resource