[SPARK-16311][SQL] Metadata refresh should work on temporary views #14009
[SPARK-16311][SQL] Improve metadata refresh
```scala
   */
  def invalidateTable(name: TableIdentifier): Unit = { /* no-op */ }
  def refreshTable(name: TableIdentifier): Unit = {
    // Go through temporary tables and invalidate them.
```
In the test case in HiveMetadataCacheSuite.scala, users might refresh the base table with `spark.catalog.refreshTable("view_table")`. Normally they do not specify the current database name, so the identifier's database is empty, and this table will therefore be treated as a temporary table. This comment might need a correction.
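The reviewer's point is about how an unqualified identifier is resolved: with no database it *may* name a temporary view, but if no temporary view has that name it refers to a base table in the current database. The following is a minimal, self-contained Scala sketch of that lookup, together with the "go through temporary tables and invalidate them" step. Everything in it (`Plan`, `FileScan`, `Project`, `TableId`, `MiniCatalog`) is hypothetical and simplified; it is not Spark's `SessionCatalog` or `LogicalPlan` code.

```scala
import scala.collection.mutable

// Hypothetical, simplified plan tree (not Spark's LogicalPlan).
trait Plan {
  def children: Seq[Plan]
  // Default behaviour: refresh every child recursively.
  def refresh(): Unit = children.foreach(_.refresh())
}

// Hypothetical leaf that caches an expensive metadata lookup, e.g. a file listing.
final class FileScan(listFiles: () => Seq[String]) extends Plan {
  val children: Seq[Plan] = Nil
  private var cached: Option[Seq[String]] = None
  def files: Seq[String] = cached.getOrElse {
    val fresh = listFiles()
    cached = Some(fresh)
    fresh
  }
  // Drop the cached listing so the next access lists the files again.
  override def refresh(): Unit = { cached = None }
}

// Hypothetical inner node; it inherits the recursive refresh().
final case class Project(child: Plan) extends Plan {
  val children: Seq[Plan] = Seq(child)
}

// Hypothetical stand-in for TableIdentifier: database is None when unqualified.
final case class TableId(table: String, database: Option[String] = None)

// Hypothetical catalog that models only the lookup order discussed above.
final class MiniCatalog(currentDb: String) {
  val tempViews = mutable.Map.empty[String, Plan]
  // Records which base tables would have their metastore metadata invalidated.
  val invalidatedTables = mutable.Buffer.empty[String]

  def refreshTable(id: TableId): Unit = id.database match {
    case None if tempViews.contains(id.table) =>
      // Unqualified name matching a temporary view: refresh the view's plan.
      tempViews(id.table).refresh()
    case db =>
      // Qualified name, or unqualified with no such temporary view:
      // it names a base table, resolved against the current database if needed.
      invalidatedTables += s"${db.getOrElse(currentDb)}.${id.table}"
  }
}

var files = Seq("part-0", "part-1")
val scan = new FileScan(() => files)
val catalog = new MiniCatalog(currentDb = "default")
catalog.tempViews("view_refresh") = Project(scan)

val before = scan.files                       // lists and caches both files
files = Seq("part-0")                         // the underlying data changes
val stale = scan.files                        // cached listing is returned unchanged
catalog.refreshTable(TableId("view_refresh")) // refreshes the temporary view's plan
val after = scan.files                        // listed again after the refresh
catalog.refreshTable(TableId("view_table"))   // unqualified base table in "default"
```

Checking temporary views before falling back to the current database follows the usual rule that a temporary view shadows a table with the same unqualified name, while a database-qualified name always bypasses the temporary views.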
LGTM except for a minor comment.
Test build #61599 has finished for PR 14009 at commit
Test build #61600 has finished for PR 14009 at commit
```scala
      val newCount = sql("select count(*) from view_refresh").first().getLong(0)
      assert(newCount > 0 && newCount < 100)
    }
  }}
```
This style (closing two blocks with `}}` on one line) is pretty weird...
LGTM except for a minor styling issue.
Thanks, I addressed the two comments. Going to merge this into master/2.0.
## What changes were proposed in this pull request?

This patch fixes the bug where the refresh command does not work on temporary views. It is based on #13989, but removes the public Dataset.refresh() API and improves test coverage.

Note that I actually think the public refresh() API is very useful. In the future we can implement it by also invalidating the lazy vals in QueryExecution (or, alternatively, by simply creating a new QueryExecution).

## How was this patch tested?

Re-enabled a previously ignored test and added a new Hive test suite that tests the behavior of temporary views against MetastoreRelation.

Author: Reynold Xin <rxin@databricks.com>
Author: petermaxlee <petermaxlee@gmail.com>

Closes #14009 from rxin/SPARK-16311.

(cherry picked from commit 16a2a7d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
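A Scala `lazy val` is computed at most once per instance, and the language offers no built-in way to reset it, which is why the description mentions either invalidating the lazy vals in QueryExecution or creating a new QueryExecution. The following hedged sketch illustrates only the second option; `Execution` and `MiniDataset` are hypothetical stand-ins, not Spark's `QueryExecution` or `Dataset`.

```scala
// Hypothetical stand-in for QueryExecution: `analyzed` is computed on first
// access and then reused for the lifetime of this instance.
final class Execution(val query: String, analyze: String => String) {
  lazy val analyzed: String = analyze(query)
}

// Hypothetical Dataset-like wrapper that owns one Execution at a time.
final class MiniDataset(query: String, analyze: String => String) {
  private var execution = new Execution(query, analyze)
  def analyzed: String = execution.analyzed
  // Because the lazy val cannot be reset, a refresh swaps in a fresh Execution;
  // its lazy val is recomputed on the next access.
  def refresh(): Unit = { execution = new Execution(query, analyze) }
}

var schemaVersion = 1
var analyzeCalls = 0
// Hypothetical analyzer whose result depends on mutable external state.
val analyzer: String => String = { q =>
  analyzeCalls += 1
  s"$q @ v$schemaVersion"
}

val ds = new MiniDataset("select * from t", analyzer)
val first = ds.analyzed      // computed once: "select * from t @ v1"
schemaVersion = 2            // the external metadata changes
val cached = ds.analyzed     // still the v1 result: the lazy val is not recomputed
ds.refresh()
val refreshed = ds.analyzed  // recomputed on the new instance: "select * from t @ v2"
```

Recreating the object keeps each instance immutable once initialized, at the cost of recomputing every lazy val rather than only the stale ones.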
Test build #61773 has finished for PR 14009 at commit
Hi, @rxin.

Thank you for the fast fix!