Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature](cold-hot) support s3 resource #8808

Merged
merged 6 commits into from
Apr 13, 2022
Merged

Conversation

qidaye
Copy link
Contributor

@qidaye qidaye commented Apr 1, 2022

Proposed changes

issue:#8807

  1. Support create s3 resource
  2. Support alter resource
  3. Add check whether the resource is in use before deleting it
  4. Support create OLAP table with s3 resource
  5. Add remote_storage_resource and remote_storage_cooldown_time in DataProperty.
  6. Support for modifying a single data property, do not support modify remote_storage_resource

Problem Summary:

Describe the overview of changes.

Checklist(Required)

  1. Does it affect the original behavior: (Yes/No/I Don't know)
  2. Has unit tests been added: (Yes/No/No Need)
  3. Has document been added or modified: (Yes/No/No Need)
  4. Does it need to update dependencies: (Yes/No)
  5. Are there any changes that cannot be rolled back: (Yes/No)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

…ble with it

Change-Id: Ic8b80b1a48d7d5224aff5fe61cb67366f28c6c1c
pengxiangyu
pengxiangyu previously approved these changes Apr 1, 2022
Copy link
Contributor

@pengxiangyu pengxiangyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions
Copy link
Contributor

github-actions bot commented Apr 1, 2022

PR approved by anyone and no changes requested.

@yiguolei
Copy link
Contributor

yiguolei commented Apr 1, 2022

Could you please provide the sql references? For example how to define S3 resource and how to use it to create table

@@ -296,9 +296,11 @@ Syntax:
```
PROPERTIES (
"storage_medium" = "[SSD|HDD]",
["storage_cold_medium" = "[HDD|S3]"],
["remote_storage_resource" = "xxx"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just use storage_resource= xxx , if it is defined then it implies that the cold storage medium type is s3 or hdd.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remote_storage_resource must be used with storage_cold_medium. Standalone use is not supported.
There is a check in PropertyAnalyzer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need the conception of remote_storage_resource, if there is remote_storage_resource, there maybe a local_storage_resource.
And we could create a table on S3 or HDFS directly not in cold and hot scenario but in decouple compute and storage scenario.

"storage_medium" = "SSD", "storage_cooldown_time" = "2015-06-04 00:00:00"
"storage_medium" = "SSD",
"storage_cold_medium" = "S3",
"remote_storage_resource" = "remote_s3",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cold_storage_resource = "xxx,xxxx,xxxx" it should support a array, because S3 bucket's bandwidth is too small, maybe we need multiple bucket to extend the bandwidth.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the same resource when creating a table to facilitate data management.
Since the remote_storage_resource is partition level, user can change the resource when creating a new partition or modify by ALTER TABEL clause. In this way, a table can use multiple buckets.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If user could specify multi resources then he do not care the bandwidth in the future. If there is only one resource he need to monitor and optimize the resource usage by using alter table clause. It is too hard for the user.

@qidaye
Copy link
Contributor Author

qidaye commented Apr 1, 2022

Could you please provide the sql references? For example how to define S3 resource and how to use it to create table

You can find the syntax in issue #8807

@@ -71,6 +71,8 @@ under the License.
1) The following attributes of the modified partition are currently supported.
- storage_medium
- storage_cooldown_time
- storage_cold_medium
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not need storage cold medium, If user specified storage resource then it implies the cold storage is S3 or HDFS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also a old logic data migration from SSD to HDD, we use both storage_cold_medium and remote_storage_resource to distinguish them

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If BE have ssd,hdd,and s3 resource, how to define the operation from ssd->hdd->s3?

@@ -75,11 +94,20 @@ public static DataProperty read(DataInput in) throws IOException {
public void write(DataOutput out) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use gson to do serialize and deserialize

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataProperty is a already exist class. I can not change the serialize/deserialize to GSON since the backward compatible

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

....
for read method, you could use
if (meta version > 108)
{
gson.readxxx
}
else
{
read field
read filed
...
}

for write method, you could use
if (meta version > 108)
{
gson.writexxx
}
else
{
write field
write field
....
}

@qidaye qidaye requested a review from yiguolei April 8, 2022 03:06
@morningman morningman added kind/meta-version-change Categorizes issue or PR as related to changing meta version area/remote-storage labels Apr 8, 2022
Change-Id: I8c1132651d528e2a929717ce02717a8e5206f018
@@ -25,17 +25,19 @@ under the License.
-->

# SHOW RESOURCES
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use system table instead of show command....
For example, select * from information_schema.resources where xxxx;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good suggestion, this pr is mainly for adding s3 resource, using the historical resource design.
We can do a code refactoring for show resource after this is done.
This would make the code more independent and would make the review easier


SHOW RESOURCES, RESOURCES
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a system table is better solution for resources

Change-Id: I58dcf3f65d4ccf4d14f38863c474bb5d89749032
@github-actions github-actions bot added area/planner Issues or PRs related to the query planner kind/docs Categorizes issue or PR as related to documentation. labels Apr 11, 2022
Change-Id: Ief7a1b2fad60de30efe72c1ccb80282c5856a29e
Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 11, 2022
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

Copy link
Contributor

@yiguolei yiguolei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yiguolei yiguolei merged commit bca1213 into apache:master Apr 13, 2022
@qidaye qidaye deleted the remote_resource branch April 13, 2022 02:07
weizhengte pushed a commit to weizhengte/incubator-doris that referenced this pull request Apr 22, 2022
Add cold hot support in FE meta, support alter resource DDL in FE
zhengshiJ pushed a commit to zhengshiJ/incubator-doris that referenced this pull request Apr 27, 2022
Add cold hot support in FE meta, support alter resource DDL in FE
starocean999 pushed a commit to starocean999/incubator-doris that referenced this pull request May 19, 2022
Add cold hot support in FE meta, support alter resource DDL in FE
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. area/planner Issues or PRs related to the query planner area/remote-storage kind/docs Categorizes issue or PR as related to documentation. kind/meta-version-change Categorizes issue or PR as related to changing meta version reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants