Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AQUMV]Enable answer query using Materialized View for external table. #702

Merged
merged 2 commits into from
Dec 4, 2024

Conversation

avamingli
Copy link
Contributor

@avamingli avamingli commented Nov 6, 2024

Allow answer query using materialized views which have external or foreign tables.
Since we don't know if the data is up to date of externel table outside CBDB, introduce a new GUC:

aqumv_allow_foreign_table

Let user decide if they want to use matview instead of query on external tables.

create readable external table aqumv_ext_r(id int) 
location ('demoprot://aqumvtextfile.txt') format 'text';
create materialized view aqumv_ext_mv as
  select * from aqumv_ext_r;

explain (costs off, verbose)
select * from aqumv_ext_r;
               QUERY PLAN
------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   Output: id
   ->  Seq Scan on aqumv.aqumv_ext_mv
         Output: id
 Optimizer: Postgres query optimizer

Index could also be used if there were on matviews.

create index on aqumv_ext_mv(id);
explain (costs off, verbose)
select * from aqumv_ext_r where id = 5;
                            QUERY PLAN
----------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)
   Output: id
   ->  Index Only Scan using aqumv_ext_mv_id_idx on aqumv.aqumv_ext_mv
         Output: id
         Index Cond: (aqumv_ext_mv.id = 5)
 Optimizer: Postgres query optimizer

fix #ISSUE_Number


Change logs

Describe your change clearly, including what problem is being solved or what feature is being added.

If it has some breaking backward or forward compatibility, please clary.

Why are the changes needed?

Describe why the changes are necessary.

Does this PR introduce any user-facing change?

If yes, please clarify the previous behavior and the change this PR proposes.

How was this patch tested?

Please detail how the changes were tested, including manual tests and any relevant unit or integration tests.

Contributor's Checklist

Here are some reminders and checklists before/when submitting your pull request, please check them:

  • Make sure your Pull Request has a clear title and commit message. You can take git-commit template as a reference.
  • Sign the Contributor License Agreement as prompted for your first-time contribution(One-time setup).
  • Learn the coding contribution guide, including our code conventions, workflow and more.
  • List your communication in the GitHub Issues or Discussions (if has or needed).
  • Document changes.
  • Add tests for the change
  • Pass make installcheck
  • Pass make -C src/test installcheck-cbdb-parallel
  • Feel free to request cloudberrydb/dev team for review and approval when your PR is ready🥳

@avamingli
Copy link
Contributor Author

see more details in #693

my-ship-it
my-ship-it previously approved these changes Nov 6, 2024
@reshke
Copy link
Contributor

reshke commented Nov 6, 2024

aqumv_allow_foreign_table

PostgreSQL-style GUC would have name like enable_XXX, huh? So, maybe enable_aqumv_foreign_table

@avamingli
Copy link
Contributor Author

aqumv_allow_foreign_table

PostgreSQL-style GUC would have name like enable_XXX, huh? So, maybe enable_aqumv_foreign_table

Not sure.. I follow this one: allow_system_table_mods

@avamingli
Copy link
Contributor Author

@my-ship-it
As we have refresh fast path at #682, but for external tables we don't know the status(always up to date in gp_maview_aux).
This will make REFRESH command fail to do the real thing from external data.

So, we should skip fast path for the views have external tables. That need catalog change to record if a view has external tables outside CBDB.

@my-ship-it
Copy link
Contributor

@my-ship-it As we have refresh fast path at #682, but for external tables we don't know the status(always up to date in gp_maview_aux). This will make REFRESH command fail to do the real thing from external data.

So, we should skip fast path for the views have external tables. That need catalog change to record if a view has external tables outside CBDB.

Yes, I think it is a reasonable behavior. Thanks!

@avamingli
Copy link
Contributor Author

@my-ship-it As we have refresh fast path at #682, but for external tables we don't know the status(always up to date in gp_maview_aux). This will make REFRESH command fail to do the real thing from external data.
So, we should skip fast path for the views have external tables. That need catalog change to record if a view has external tables outside CBDB.

Yes, I think it is a reasonable behavior. Thanks!

Done in 273dae1,
add has_foreign to identify a mv's Query has foreign tables.
If true, we will avoid Refresh fast path no matter what the datastatus of mv is.
This needs a catalog change, create a label for it.

my-ship-it
my-ship-it previously approved these changes Nov 29, 2024
Allow answer query using materialized views which have external
or foreign tables.
Since we don't know if the data is up to date of externel table
outside CBDB, introduce a new GUC:

  aqumv_allow_foreign_table

Let user decide if they want to use matview instead of query on
external tables.

create materialized view aqumv_ext_mv as
  select * from aqumv_ext_r;

explain (costs off, verbose)
select * from aqumv_ext_r;
               QUERY PLAN
------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   Output: id
   ->  Seq Scan on aqumv.aqumv_ext_mv
         Output: id
 Optimizer: Postgres query optimizer

Index could also be used if there were on matviews.
create index on aqumv_ext_mv(id);
explain (costs off, verbose)
select * from aqumv_ext_r where id = 5;
                            QUERY PLAN
----------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)
   Output: id
   ->  Index Only Scan using aqumv_ext_mv_id_idx on aqumv.aqumv_ext_mv
         Output: id
         Index Cond: (aqumv_ext_mv.id = 5)
 Optimizer: Postgres query optimizer
For matview has foreign tables, we never know the data
status.
Since we usually create a matview has read external tables,
the initial data status is Up-to-date.
That will make REFRESH fail to do the real thing with
fast path feature, the worst case is we never try to get
the external data during REFRESH.

Authored-by: Zhang Mingli avamingli@gmail.com
@avamingli
Copy link
Contributor Author

Rebased to resolve conflicts.

Copy link
Member

@yjhjstz yjhjstz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@my-ship-it my-ship-it merged commit 47f626c into apache:main Dec 4, 2024
11 checks passed
@avamingli avamingli deleted the aqumv_ext branch December 4, 2024 08:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants