-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: track fine-grained view dependencies #49872
sql: track fine-grained view dependencies #49872
Conversation
Please let me know if I missed any column references where view dependencies should be added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not entirely convinced the logic covers all cases; even if it does, it seems fragile. We should add a test (similar to TestBuilder) which adds a "view-deps" directive which builds a query and just prints out the view deps. That would allow us to much more easily test all sorts of cases.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RichardJCai)
pkg/sql/opt/view_dependencies.go, line 31 at r1 (raw file):
ColumnOrdinals util.FastIntSet ColumnIDToOrd map[ColumnID]int
Any ColumnID that is coming from a TableID can be converted to a table ordinal using TableID.ColumnOrdinal, can we use that instead of adding this map?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 7 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RichardJCai)
pkg/sql/logictest/testdata/logic_test/views, line 697 at r1 (raw file):
# Should be able to drop columns views don't depend on. statement ok CREATE TABLE tv8 (x INT, y INT);
[nit] did we already have tv1-tv7? I don't see them above...
pkg/sql/logictest/testdata/logic_test/views, line 707 at r1 (raw file):
statement ok CREATE TABLE tv9(x INT, y INT); CREATE VIEW v9 AS SELECT x FROM tv9 GROUP BY x,y
you should add some tests where you are also projecting an expression, such as "SELECT x+1 ..."
pkg/sql/logictest/testdata/logic_test/views, line 728 at r1 (raw file):
statement error pq: cannot drop column \"y\" because view \"v10\" depends on it ALTER TABLE tv10a DROP COLUMN y
tv10a -> tv10b
(would help if the table names were a bit more friendly to make this sort of duplicate test easier to spot)
pkg/sql/logictest/testdata/logic_test/views, line 778 at r1 (raw file):
ALTER TABLE tv13 DROP COLUMN y # Dependencies should be tracked in window functions.
[nit] aggregate and window functions
(your first example below is not a window function)
pkg/sql/logictest/testdata/logic_test/views, line 789 at r1 (raw file):
statement ok CREATE TABLE tv15(x INT, y INT); CREATE VIEW v15 AS SELECT sum(x) OVER (PARTITION by y) FROM tv15;
what about adding ORDER BY with a different column to the partition too?
pkg/sql/opt/optbuilder/builder.go, line 342 at r1 (raw file):
} // TrackReferencedColumnForViews is used to add a column to a views dependencies.
[nit] a views dependencies
pkg/sql/opt/optbuilder/builder.go, line 348 at r1 (raw file):
for i := range b.viewDeps { dep := b.viewDeps[i] for id, ord := range dep.ColumnIDToOrd {
Since this is a map you shouldn't need to iterate through the whole thing. Can't you just do:
if ord, ok := dep.ColumnIDToOrd[col.id]; ok {
dep.ColumnOrdinals.Add(ord)
}
I agree that this seems fragile, do you have any suggestions to make it less so? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RichardJCai)
pkg/sql/opt/view_dependencies.go, line 31 at r1 (raw file):
Previously, RichardJCai (Richard Cai) wrote…
I agree that this seems fragile, do you have any suggestions to make it less so? Also I like the idea of having a view-deps test directive. Although we still can't exhaustively test all the cases, what kind of testing would we need to be confident?
Not really, other than the test. Just being able to see all the deps at once instead of having to probe specific columns by trying to remove them would make a big difference. Some conditions that we should test in addition to what you already tested:
- cases where the same table is used multiple times (with different column sets)
- cases where a column is only used by an ORDER BY (eg
SELECT x FROM t ORDER BY y
), including in ORDER BY inside aggregates - cases with an ordered-set aggregate function (eg
SELECT percentile_cont(0.50) WITHIN GROUP (ORDER BY c) FROM t
)
I'm also wondering if |
We need to include all columns referenced in the query, not just what ends up being used after optimization. If we don't, we can remove columns and then the view query won't work anymore. That is #17269, see an example in there. |
Ah ok got it -- never mind then! |
0a0871e
to
2754061
Compare
I started tracking column names in view deps so they could be printed out using TestBuilder and moved the tests over to use TestBuilder to show the columns, let me know what you think about this, I was also unclear on how to remove the map and use TableIDs. Also addressed Rebecca's comments and added the suggested tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RichardJCai and @rytaft)
pkg/sql/opt/view_dependencies.go, line 31 at r1 (raw file):
Previously, RichardJCai (Richard Cai) wrote…
I started tracking column names in view deps so they could be printed out using TestBuilder and moved the tests over to use TestBuilder to show the columns, let me know what you think about this, I was also unclear on how to remove the map and use TableIDs.
Also addressed Rebecca's comments and added the suggested tests.
I see, we don't have a TableID handy. It's fine to keep the map, just add a comment explaining what it is (and make it clear that it holds all columns of the data source, not just the used ones).
We should be able to get the names using DataSource.(cat.Table).Column(ord).ColName()
or DataSource.(cat.View).ColumnName(ord)
(this could be a ViewDeps method) instead of storing them separately.
pkg/sql/opt/view_dependencies.go, line 27 at r2 (raw file):
// ColumnOrdinals is the set of column ordinals that are referenced by the // view for this table. In most cases, this consists of all "public" columns
This comment needs updating.
pkg/sql/opt/memo/expr_format.go, line 556 at r2 (raw file):
} if !dep.ColumnOrdinals.Empty() { fmt.Fprintf(f.Buffer, " [column ordinals: %s]", dep.ColumnOrdinals)
I don't think we need to print both the ordinals and the names. If we have names, we should use them.
pkg/sql/opt/optbuilder/builder.go, line 342 at r2 (raw file):
} // TrackReferencedColumnForViews is used to add a column to a views dependencies.
[nit] to the view dependencies
pkg/sql/opt/optbuilder/testdata/create_view, line 93 at r2 (raw file):
└── dependencies ├── ab [column ordinals: (0)] [column names: [a]] ├── ab
does this mean we don't depend on any columns? maybe change the formatter to print that explicitly somehow (e.g. ab [no columns]
).
pkg/sql/opt/optbuilder/testdata/create_view, line 106 at r2 (raw file):
error (42601): CREATE VIEW specifies 1 column name, but data source has 2 columns build
Why did you remove this comment?
2754061
to
3d8d5d4
Compare
Addressed these comments. Also regarding the [no columns] I added this in but theres a small difference where column dependencies aren't tracked when selecting from a view, ie view dependency on another view. I think this makes sense since adding a dependency on the view itself but not the view's columns makes sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RichardJCai and @rytaft)
pkg/sql/opt/optbuilder/testdata/create_view, line 106 at r2 (raw file):
Previously, RichardJCai (Richard Cai) wrote…
Addressed these comments.
Also regarding the [no columns] I added this in but theres a small difference where column dependencies aren't tracked when selecting from a view, ie view dependency on another view. I think this makes sense since adding a dependency on the view itself but not the view's columns makes sense.
Sounds like something we will need to fix at some point too. We can make GetColumnNames
return an ok
flag and we would only show "no columns" if it returned ok=true and empty list.
I'm not sure we would have to "fix" it right? Since we don't really depend on another view's columns? I don't think we can change a views column set (not 100% sure), but even renaming view dependent columns in Postgres doesn't change the column set of the view. But yeah I can add that check where it doesn't show columns if the dependency is another view. |
3d8d5d4
to
dd82d5f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 8 of 8 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @RichardJCai)
pkg/sql/opt/view_dependencies.go, line 31 at r3 (raw file):
ColumnOrdinals util.FastIntSet ColumnIDToOrd map[ColumnID]int
Still need a comment for this
pkg/sql/opt/view_dependencies.go, line 39 at r3 (raw file):
} // GetColumnNames returns a sorted list of the name of the column dependencies
[nit] name -> names
pkg/sql/opt/optbuilder/testdata/create_view, line 72 at r3 (raw file):
└── dependencies ├── av └── ab [column names: [a]]
I would simplify this formatting to look like this:
└── ab [columns: a]
pkg/sql/opt/optbuilder/testdata/create_view, line 93 at r3 (raw file):
└── dependencies ├── ab [column names: [a]] ├── ab [no columns]
Will it be a problem that we have two different dependencies listed for ab
with different columns?
pkg/sql/opt/optbuilder/testdata/create_view, line 325 at r3 (raw file):
├── columns: sum:5 └── dependencies └── abc [column names: [a b]]
Where's the dependency on c?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 6 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @RichardJCai)
dd82d5f
to
c8dfb20
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)
pkg/sql/opt/view_dependencies.go, line 31 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Still need a comment for this
Added this comment.
pkg/sql/opt/view_dependencies.go, line 39 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] name -> names
Done.
pkg/sql/opt/optbuilder/testdata/create_view, line 72 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I would simplify this formatting to look like this:
└── ab [columns: a]
Done.
pkg/sql/opt/optbuilder/testdata/create_view, line 93 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Will it be a problem that we have two different dependencies listed for
ab
with different columns?
I don't think it will be a problem, the same table is used as a data source multiple times where different column sets are used.
pkg/sql/opt/optbuilder/testdata/create_view, line 325 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Where's the dependency on c?
Oops missed this check. Fixed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @RichardJCai)
pkg/sql/opt/memo/expr_format.go, line 557 at r4 (raw file):
colNames, isTable := dep.GetColumnNames() if len(colNames) > 0 { fmt.Fprintf(f.Buffer, " [column names:")
column names -> columns
(shorter is better given how must is printed in these opt plans)
pkg/sql/opt/optbuilder/window.go, line 410 at r4 (raw file):
col := outScope.findExistingCol(e, false /* allowSideEffects */) if col != nil { b.TrackReferencedColumnForViews(col)
Do you need to call b.TrackReferencedColumnForViews
every time the function findExistingCol
is used? In that case, maybe you should move this call inside that function. I think that would be a bit less error-prone, and also help avoid future errors.
pkg/sql/opt/optbuilder/testdata/create_view, line 93 at r3 (raw file):
Previously, RichardJCai (Richard Cai) wrote…
I don't think it will be a problem, the same table is used as a data source multiple times where different column sets are used.
But have you confirmed (maybe with some logic tests) that those dependencies are respected when you try to drop those columns? Looking at the code in opt_exec_factory.go
it looks like the dependencies are correctly combined for each table, but it would be good to add some tests for future-proofing.
c8dfb20
to
4f8f31f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)
pkg/sql/opt/memo/expr_format.go, line 557 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
column names -> columns
(shorter is better given how must is printed in these opt plans)
Done.
pkg/sql/opt/optbuilder/window.go, line 410 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Do you need to call
b.TrackReferencedColumnForViews
every time the functionfindExistingCol
is used? In that case, maybe you should move this call inside that function. I think that would be a bit less error-prone, and also help avoid future errors.
Good call, I checked and this made sense to me, I moved the TrackReferencedColumnsForViews call into findExistingCol and added a comment about findExistingCol will add a view dep if a column is found.
pkg/sql/opt/optbuilder/testdata/create_view, line 93 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
But have you confirmed (maybe with some logic tests) that those dependencies are respected when you try to drop those columns? Looking at the code in
opt_exec_factory.go
it looks like the dependencies are correctly combined for each table, but it would be good to add some tests for future-proofing.
Sounds good, I added some logic tests for the case where the table is used as a data source multiple times with different column sources
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 8 of 8 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @RichardJCai)
pkg/sql/logictest/testdata/logic_test/views, line 693 at r5 (raw file):
# Ensure a table that is referenced multiple times with different column sets # depends on the correct columns. Depended on columns should not be droppable.
[nit] I'd change this to "Ensure a view that contains a table that is referenced multiple times...."
pkg/sql/opt/optbuilder/orderby.go, line 106 at r5 (raw file):
// projection. if col := projectionsScope.findExistingCol( b,
you shouldn't need to pass in b
-- scope already contains builder
as a data member.
4f8f31f
to
c21d18c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)
pkg/sql/logictest/testdata/logic_test/views, line 693 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] I'd change this to "Ensure a view that contains a table that is referenced multiple times...."
Done.
pkg/sql/opt/optbuilder/orderby.go, line 106 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
you shouldn't need to pass in
b
-- scope already containsbuilder
as a data member.
You're right, updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working through all these changes!
Reviewed 5 of 5 files at r6.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde and @RichardJCai)
pkg/sql/opt/optbuilder/builder.go, line 345 at r6 (raw file):
// dependencies. This should be called whenever a column reference is made in a // view query. func (b *Builder) TrackReferencedColumnForViews(col *scopeColumn) {
[nit] does this function need to be exported? Seems like we're not using it outside of optbuilder, right?
Fixes cockroachdb#29021 Only create view dependencies on columns if the column is referenced in the view query. Release note (sql change): Views now only create a dependency on a table's column if the column is referenced in the view definition. Previously, all columns were added as a dependency meaning if a table was referenced in a view, all columns regardless of if the column was actually referenced would be added to the view's dependencies.
c21d18c
to
62bf6a2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)
pkg/sql/opt/optbuilder/builder.go, line 345 at r6 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] does this function need to be exported? Seems like we're not using it outside of optbuilder, right?
Good point, updated to not export it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r7.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde)
bors r=rytaft |
Thanks for the reviews! |
Build succeeded |
Fixes #29021
Only create view dependencies on columns if the column is referenced in the view query.
Note: this does not fix the dependencies of old views.
Release note (sql change): Views now only create a dependency on a table's column if the
column is referenced in the view definition. Previously, all columns were added as a dependency
meaning if a table was referenced in a view, all columns regardless of if the column was actually
referenced would be added to the view's dependencies.