Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use PostgreSQL nodes for scanning tables #477

Merged
merged 53 commits into from
Jan 10, 2025
Merged

Use PostgreSQL nodes for scanning tables #477

merged 53 commits into from
Jan 10, 2025

Conversation

mkaruza
Copy link
Collaborator

@mkaruza mkaruza commented Dec 4, 2024

Idea is to reconstruct query based on duckdb filtering information, for each table and use that information to plan postgres execution. This plan will, potentially exeucute with parallel workers. If no workers are available we will scan this local to thread.

This approach has advantage that it also will support all other scan nodes that are available (and postgresql thinks are best - index/index only/bitmap scan, also partitioned tables should be possible)

Fixes #243

@mkaruza mkaruza requested review from Y-- and JelteF December 4, 2024 13:29
@mkaruza mkaruza marked this pull request as draft December 4, 2024 13:29
Copy link
Collaborator

@JelteF JelteF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial round of review. I really like the direction of the code. One general feedback is that this definitely needs a bunch of tests too.

src/pgduckdb.cpp Outdated Show resolved Hide resolved
src/pgduckdb_duckdb.cpp Outdated Show resolved Hide resolved
src/pgduckdb_types.cpp Show resolved Hide resolved
src/scan/postgres_scan.cpp Outdated Show resolved Hide resolved
src/scan/postgres_scan.cpp Outdated Show resolved Hide resolved
src/scan/postgres_table_reader.cpp Outdated Show resolved Hide resolved
src/scan/postgres_table_reader.cpp Outdated Show resolved Hide resolved
include/pgduckdb/scan/postgres_table_reader.hpp Outdated Show resolved Hide resolved
src/scan/postgres_table_reader.cpp Outdated Show resolved Hide resolved
src/scan/postgres_table_reader.cpp Outdated Show resolved Hide resolved
@lepsa0321
Copy link

I've been looking into the PR you submitted, and I'm curious about its impact on index-scan functionality. Could you please clarify if this PR addresses the issue where index-scans were not being used when expected?

If it does resolve this problem, could you also provide guidance on how to enforce the use of an index-scan? Specifically:

Are there any configuration settings or parameters that can be adjusted to prefer index-scans over other scan methods?
Is there a way to hint or force the query planner to choose an index-scan for certain queries?
Any additional steps or considerations we should be aware of when trying to ensure that index-scans are utilized?
Your insights would be greatly appreciated as they will help us better understand the improvements and how to leverage them effectively in our environment.

Thank you

@mkaruza mkaruza marked this pull request as ready for review December 14, 2024 10:28
@mkaruza mkaruza force-pushed the postgres-native-scan branch from de068c6 to 03b1e3a Compare December 14, 2024 10:33
@mkaruza
Copy link
Collaborator Author

mkaruza commented Dec 14, 2024

I've been looking into the PR you submitted, and I'm curious about its impact on index-scan functionality. Could you please clarify if this PR addresses the issue where index-scans were not being used when expected?

If it does resolve this problem, could you also provide guidance on how to enforce the use of an index-scan? Specifically:

Are there any configuration settings or parameters that can be adjusted to prefer index-scans over other scan methods? Is there a way to hint or force the query planner to choose an index-scan for certain queries? Any additional steps or considerations we should be aware of when trying to ensure that index-scans are utilized? Your insights would be greatly appreciated as they will help us better understand the improvements and how to leverage them effectively in our environment.

Thank you

Hi, i have added regression test that shows that Bitmap/Index/IndexOnly scan is used when appropriate - hopefully that will help you answer your question.

@lepsa0321
Copy link

I've been looking into the PR you submitted, and I'm curious about its impact on index-scan functionality. Could you please clarify if this PR addresses the issue where index-scans were not being used when expected?
If it does resolve this problem, could you also provide guidance on how to enforce the use of an index-scan? Specifically:
Are there any configuration settings or parameters that can be adjusted to prefer index-scans over other scan methods? Is there a way to hint or force the query planner to choose an index-scan for certain queries? Any additional steps or considerations we should be aware of when trying to ensure that index-scans are utilized? Your insights would be greatly appreciated as they will help us better understand the improvements and how to leverage them effectively in our environment.
Thank you

Hi, i have added regression test that shows that Bitmap/Index/IndexOnly scan is used when appropriate - hopefully that will help you answer your question.

Thank you very much~

include/pgduckdb/catalog/pgduckdb_table.hpp Outdated Show resolved Hide resolved
include/pgduckdb/catalog/pgduckdb_table.hpp Show resolved Hide resolved
include/pgduckdb/pgduckdb_process_latch.hpp Outdated Show resolved Hide resolved
include/pgduckdb/utility/rename_ruleutils.h Outdated Show resolved Hide resolved
src/scan/postgres_table_reader.cpp Outdated Show resolved Hide resolved
include/pgduckdb/scan/postgres_scan.hpp Show resolved Hide resolved
src/scan/postgres_scan.cpp Outdated Show resolved Hide resolved
src/scan/postgres_scan.cpp Outdated Show resolved Hide resolved
include/pgduckdb/scan/postgres_scan.hpp Outdated Show resolved Hide resolved
src/scan/postgres_table_reader.cpp Outdated Show resolved Hide resolved
src/pgduckdb_hooks.cpp Outdated Show resolved Hide resolved
@mkaruza mkaruza force-pushed the postgres-native-scan branch from 6a726f9 to 570f34f Compare January 7, 2025 15:31
# Conflicts:
#	include/pgduckdb/pgduckdb_guc.h

# Conflicts:
#	src/scan/heap_reader.cpp
#	src/scan/postgres_seq_scan.cpp
Added WaitLatch that can be used with multiple threads in process.
JelteF and others added 13 commits January 9, 2025 16:00
At the very least SlotGetAllAttrs should should be called while holding
the lock.
If GlobaclLock is held for duration of populating output vector we don't
need any special WaitLatch wrappers and we can call directly this
function.
We should now support all types with native postgres scan
* Setting this variable to `0` disables parallelization
* Cardinality of table less than 65536 use only single parallel process
* Higher cardinality will try to use `max_workers_per_postgres_scan`
  parallel processes with upper limit of `max_parallel_workers`
@JelteF JelteF force-pushed the postgres-native-scan branch from 7b0685b to e0135e8 Compare January 9, 2025 15:00
@JelteF JelteF merged commit 5a69a04 into main Jan 10, 2025
5 checks passed
@JelteF JelteF deleted the postgres-native-scan branch January 10, 2025 14:44
ritwizsinha pushed a commit to ritwizsinha/pg_duckdb that referenced this pull request Jan 11, 2025
Idea is to reconstruct query based on duckdb filtering information, for
each table and use that information to plan postgres execution. This
plan will, potentially exeucute with parallel workers. If no workers are
available we will scan this local to thread.

This approach has advantage that it also will support all other scan
nodes that are available (and postgresql thinks are best - index/index
only/bitmap scan, also partitioned tables should be possible)

Fixes duckdb#243

---------

Co-authored-by: Yves <yves@motherduck.com>
Co-authored-by: Jelte Fennema-Nio <github-tech@jeltef.nl>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support index scans and index only scans on postgres data
4 participants