-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Closed
Labels
enhancementNew feature or requestNew feature or request
Description
Is your feature request related to a problem or challenge?
# make testdata
import pandas as pd
n_rows = 100000
data = {'id': range(1, n_rows + 1)}
df = pd.DataFrame(data)
df.to_parquet('test1.parquet')
df.to_parquet('test2.parquet')
yimo@YideMacBook-Pro PythonProject % datafusion-cli
DataFusion CLI v46.0.1
> select * from 'test1.parquet' as t1 join 'test2.parquet' as t2 on t1.id < t2.id limit 10;
zsh: killed datafusion-cli
Describe the solution you'd like
same sql in postgresql
create table t1
(
id integer
);
create table t2
(
id integer
);
truncate t1;
truncate t2;
insert into t1(select *
from generate_series(0, 100000));
insert into t2(select *
from generate_series(0, 100000));
explain analyse select * from t1 join t2 on t1.id <= t2.id limit 10;
Limit (cost=0.00..0.45 rows=10 width=8) (actual time=0.021..36.926 rows=10 loops=1)
-> Nested Loop (cost=0.00..169452460.14 rows=3765537655 width=8) (actual time=0.021..36.924 rows=10 loops=1)
Join Filter: (t1.id <= t2.id)
Rows Removed by Join Filter: 299997
-> Seq Scan on t2 (cost=0.00..1572.65 rows=112965 width=4) (actual time=0.007..0.011 rows=4 loops=1)
-> Materialize (cost=0.00..1943.02 rows=100001 width=4) (actual time=0.008..6.338 rows=75002 loops=4)
-> Seq Scan on t1 (cost=0.00..1443.01 rows=100001 width=4) (actual time=0.003..4.574 rows=100001 loops=1)
Planning Time: 0.044 ms
Execution Time: 37.334 ms
Describe alternatives you've considered
I think should merge limit and join to one physical plan;
Additional context
No response
2010YOUY01
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or request