-
Notifications
You must be signed in to change notification settings - Fork 755
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: functional dependency optimization #7438
Comments
@leiysky HI. I would like to take this issue. Is there any one working on this? I swear I will finish this on my own ~~~ :( |
We have a plan to implement this after #6547 is done. You are still welcome to take the issue, but I suggest you make a proposal first so we can review it and give some advice since this is an important infrastructure for the optimizer. |
Great. I will write a proposal on this. |
Can this function be optimized? |
I think so. I will consider it inside the proposal. |
This is about CSE optimization. Is it included in functional dependency optimization? |
HI. I think we could do this via functional dependency. What do you mean by CSE optimization? |
CSE is abbr of Common Subexpression Elimination. Let's explain with an example: Given query I'm not familiar with functional dependency, but from the very limited knowledge I've learned from the wiki, functional dependency is discovered from the relation between data of different columns and can be broken during updates on data. |
@andylokandy @BohuTANG Sorry. After some investigations, I think I was wrong about doing CSE optimization (the duplicate json_parse elimination) via functional dependency analysis. It's basically another problem to solve. For CSE optimization, I think we could build a common sub-expression collector and rewrite project list or predicate conditions in a select-where-from block via discovered CSEs. The basic idea of the CSE collector is to rewrite each expression in a canonical form which uniquely identify a set of semantically equal expressions, search CSE as visiting it. If the leaf node of an expression is a column during canonical form rewriting, rewrite it to its equivalent column with smallest column id according to an equality inferrer which could be used to infer whether two columns are equal. |
@AngleNet I've read your proposal, but would you mind putting the document into a GitHub issue so we can review it without login Tencent account? |
Good. I will put it in a issue later. |
A functional dependency is a relationship between two attributes in the same database.
If an attribute determines another attribute, we will call it a determinant, and the determined attribute is functionally dependent on the determinant.
We will denote the relationship with$X \rightarrow Y$ if $X$ is a determinant of $Y$ , where both $X$ and $Y$ can be a set of attributes.
For example, given a SQL:
Here the attribute$(a) \rightarrow (b)$ .
a
is a determinant ofb
becauseb
is derived froma
. Thus we can get a functional dependencyFunctional dependency is helpful in query optimization, with it we can eliminate outer joins, eliminate useless
DISTINCT
, eliminate cross joins and etc.Reference wiki and the thesis for more details.
The functional dependencies can be derived from relational operators, so it's possible to encapsulate them into
RelationalProperty
.The text was updated successfully, but these errors were encountered: