[Improvement]: Build a rule of relationship between table and optimizer/resource group #1865

majin1102 · 2023-08-21T14:43:40Z

Search before asking

I have searched in the issues and found no similar issues.

What would you like to be improved?

Right now, there are two ways to declare which optimizer group optimizes a table:

Set a default optimizer group for the catalog, and don't declare an optimizer group in the table properties.
Declare the 'self-optimizing.group' property in the table properties (in a CREATE TABLE or ALTER TABLE statement).

In practice, the default optimizer group gives a better experience but is not flexible when multiple groups are needed in one catalog. The table property provides more flexibility but sacrifices user experience and security: imagine that every table owner needs to know the resources behind AMS and has the authority to allocate them. This could be a disaster.

This doesn't seem like a big deal, because in many cases there is only one external/default optimizer group and no need to consider security or isolation. But it is never too late to provide better user experience, isolation, and security for self-optimizing.

How should we improve?

Better user experience
Users should declare relationships in one place and use them everywhere. Defining a property on the table is a bad idea, because it means the table owner must know the optimizer group concepts and instances.

It is a better idea to declare properties in the optimizer group and use an extensible rule such as a regex.

Better security
Relationships between tables and resources should be fixed and should not be modifiable without the permission of the resource owner. Declaring properties in the optimizer group clearly fulfills this criterion.

Better isolation
When we declare or modify relationships between tables and resources, the rules must be mutually exclusive.

In conclusion, I propose declaring regex rules in the optimizer group to define the relationships between tables and resources. For example:

catalog1.db1.*
catalog2.*.*

This gives a clear definition: the optimizer group can be used by these tables, and only by them.
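One way to make such rules mutually exclusive is sketched below. It does not check general regex disjointness; instead it assumes a restricted pattern form with exactly three dot-separated segments (catalog.database.table), each either a literal name or `*`. Under that assumption, two patterns overlap exactly when every pair of segments is equal or contains a `*`, so a conflicting rule can be rejected when it is registered. The names `GroupRuleRegistry`, `register`, and `resolve` are hypothetical and are not part of Amoro.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical registry of "pattern -> optimizer group" rules; not an Amoro API. */
class GroupRuleRegistry {
  // Insertion-ordered map from a three-segment pattern to the group that owns it.
  private final Map<String, String> rules = new LinkedHashMap<>();

  /** Splits "catalog.db.table" (or a pattern of that shape) into three segments. */
  private static String[] segments(String identifier) {
    String[] parts = identifier.split("\\.", -1);
    if (parts.length != 3) {
      throw new IllegalArgumentException("Expected catalog.db.table: " + identifier);
    }
    return parts;
  }

  /** Two patterns overlap when each segment pair is equal or contains a wildcard. */
  static boolean overlaps(String a, String b) {
    String[] x = segments(a);
    String[] y = segments(b);
    for (int i = 0; i < 3; i++) {
      if (!x[i].equals("*") && !y[i].equals("*") && !x[i].equals(y[i])) {
        return false;
      }
    }
    return true;
  }

  /** A pattern matches a table when each segment is "*" or equals the table's segment. */
  static boolean matches(String pattern, String table) {
    String[] p = segments(pattern);
    String[] t = segments(table);
    for (int i = 0; i < 3; i++) {
      if (!p[i].equals("*") && !p[i].equals(t[i])) {
        return false;
      }
    }
    return true;
  }

  /** Registers a rule, rejecting it if it overlaps a rule owned by another group. */
  void register(String pattern, String group) {
    for (Map.Entry<String, String> rule : rules.entrySet()) {
      if (!rule.getValue().equals(group) && overlaps(rule.getKey(), pattern)) {
        throw new IllegalArgumentException(
            pattern + " overlaps " + rule.getKey() + " owned by " + rule.getValue());
      }
    }
    rules.put(pattern, group);
  }

  /** Returns the group whose rule matches the table, or empty if no rule applies. */
  Optional<String> resolve(String table) {
    for (Map.Entry<String, String> rule : rules.entrySet()) {
      if (matches(rule.getKey(), table)) {
        return Optional.of(rule.getValue());
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    GroupRuleRegistry registry = new GroupRuleRegistry();
    registry.register("catalog1.db1.*", "groupA");
    registry.register("catalog2.*.*", "groupA");
    System.out.println(registry.resolve("catalog1.db1.orders"));
    try {
      // Overlaps "catalog1.db1.*" (for example on catalog1.db1.orders), so it is rejected.
      registry.register("catalog1.*.orders", "groupB");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Because the rules live with the group and are checked on registration, only whoever may edit the group can change which tables it serves, which is the security property argued for above.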

Are you willing to submit PR?

Yes I am willing to submit a PR!

Subtasks

No response

Code of Conduct

I agree to follow this project's Code of Conduct

github-actions · 2024-08-21T00:04:08Z

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions · 2024-09-04T00:04:26Z

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Aireed · 2024-09-06T02:16:33Z

I believe the intended effect of this feature should be as follows:
Priority of group configuration: Table-level configuration > Regex rule configuration > Catalog default configuration.

If a rule is manually configured on the table, it should take precedence.

Rule persistence

Regex rule configurations should not be written to the underlying table's properties.
The rules are stored in the catalog properties.

Rule change

Changes to regex rules will result in group changes for all affected tables.
After a regex rule is deleted, the catalog's default configuration should take effect.

These changes can take effect through TableRuntimeRefresh.

Rule queries:

1. When displaying the optimizer group list, show the rules affecting the tables, which are collected from the catalog properties.
2. Also display these configuration rules in the catalog's properties.

IMO, based on the issue description, the initial intention was to configure this rule at the group level. I agree with this, but from an implementation standpoint it would involve extensive changes in every properties call.

If we configure the regex rules in the catalog's properties, the effect of this property can be consolidated in the BasicUnkeyedTable::properties call within the MixedCatalogUtil::mergeCatalogPropertiesToTable method, making it convenient to implement.
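The precedence described in this comment (table-level configuration > regex rule > catalog default) can be sketched as a single resolution function. This is only an illustration under stated assumptions: `GroupResolver` and `resolveGroup` are hypothetical names, the rules are assumed to be available as an ordered map from a Java regex to a group name, and the catalog default is passed in directly. Only the `self-optimizing.group` table property comes from the issue itself; the sketch does not call or reproduce `MixedCatalogUtil::mergeCatalogPropertiesToTable`.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper showing the proposed precedence; not an Amoro API. */
class GroupResolver {
  // Table property named in the issue.
  static final String TABLE_GROUP_KEY = "self-optimizing.group";

  /**
   * Resolves the optimizer group for a table.
   *
   * @param tableIdentifier identifier the regex rules are matched against (assumed "db.table")
   * @param tableProperties properties set on the table itself
   * @param regexRules ordered map of regex -> group (assumed to be read from catalog properties)
   * @param catalogDefaultGroup default group configured on the catalog
   */
  static String resolveGroup(
      String tableIdentifier,
      Map<String, String> tableProperties,
      Map<String, String> regexRules,
      String catalogDefaultGroup) {
    // 1. An explicit table-level setting always wins.
    String explicit = tableProperties.get(TABLE_GROUP_KEY);
    if (explicit != null) {
      return explicit;
    }
    // 2. Otherwise the first regex rule that matches the whole identifier applies.
    for (Map.Entry<String, String> rule : regexRules.entrySet()) {
      if (Pattern.matches(rule.getKey(), tableIdentifier)) {
        return rule.getValue();
      }
    }
    // 3. With no table setting and no matching rule, fall back to the catalog default.
    return catalogDefaultGroup;
  }

  public static void main(String[] args) {
    Map<String, String> rules = new LinkedHashMap<>();
    rules.put("db1\\..*", "groupA");

    Map<String, String> noTableSetting = new HashMap<>();
    Map<String, String> explicitSetting = new HashMap<>();
    explicitSetting.put(TABLE_GROUP_KEY, "groupB");

    System.out.println(resolveGroup("db1.orders", explicitSetting, rules, "default"));
    System.out.println(resolveGroup("db1.orders", noTableSetting, rules, "default"));
    System.out.println(resolveGroup("db2.orders", noTableSetting, rules, "default"));
  }
}
```

Since the group is computed on each call and never written into the table's own properties, deleting a rule means the next resolution falls back to the catalog default, which fits the idea of picking up rule changes during a refresh.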

@XBaith @majin1102 @zhoujinsong @nicochen WDYT.

majin1102 · 2024-09-06T02:55:52Z

I don't think rules in the catalog properties are necessary if we can use the optimizer group.

klion26 · 2024-09-07T14:06:18Z

Will it be possible for multiple types of optimizers to exist in one OptimizerGroup in the future? For example, the same OptimizerGroup may contain both Flink and Spark optimizers. The OptimizerGroup is similar to a logical resource pool, and different types of optimizers will occupy some resources.

majin1102 · 2024-09-18T11:39:52Z

> Will it be possible for multiple types of optimizers to exist in one OptimizerGroup in the future? For example, the same OptimizerGroup may contain both Flink and Spark optimizers. The OptimizerGroup is similar to a logical resource pool, and different types of optimizers will occupy some resources.

What scenarios would this hybrid resource model be helpful for?
I believe this will introduce considerable complexity.

klion26 · 2024-09-20T02:30:49Z

@majin1102 Thanks for the reply. I'm asking because of the following scenarios: when the Flink optimizer is used for merging, it may stop or need to catch up on data, or there may be sudden bursts of merging demand. However, the Flink optimizer is not particularly good at automatic scaling (at least on Yarn).

In addition, from a resource point of view, an OptimizerGroup is just a resource pool and the optimizer is an application running in it (similar to the OptimizerGroup being a Yarn queue and the optimizer being an application). Viewed this way, would it really add that much complexity, or is there something I missed here? Thanks.

majin1102 added the type:improvement, good first issue, and module:ams-server labels Aug 21, 2023

github-actions bot added the stale label Aug 21, 2024

github-actions bot closed this as not planned Sep 4, 2024

Aireed reopened this Sep 6, 2024

github-actions bot removed the stale label Sep 7, 2024

Aireed mentioned this issue Oct 28, 2024

[AMORO-1865]build a rule of relationship between table and resource group #3300

Closed

3 tasks
