Performance regression on timestamp range join. #9755

Closed
my-vegetable-has-exploded opened this issue Mar 23, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@my-vegetable-has-exploded
Contributor

Describe the bug

For the query in #8393, datafusion-cli v34 takes 10s to finish, but datafusion-cli v36 takes 40s.

To Reproduce

Mostly the same as #8393.

Produce the data:

CREATE
OR REPLACE TABLE pricing AS
SELECT
    t,
    RANDOM() as v
FROM
    range(
        '2022-01-01' :: TIMESTAMP,
        '2023-01-01' :: TIMESTAMP,
        INTERVAL 1 DAY
    ) ts(t);

COPY pricing to 'pricing.parquet' (format 'parquet');

CREATE
OR REPLACE TABLE timestamps AS
SELECT
    t
FROM
    range(
        '2022-01-01' :: TIMESTAMP,
        '2023-01-01' :: TIMESTAMP,
        INTERVAL 10 SECOND
    ) ts(t);

COPY timestamps to 'timestamps.parquet' (format 'parquet');

Run the query:

EXPLAIN ANALYZE WITH pricing_state AS (
    SELECT
        t as valid_from,
        COALESCE(
            LEAD(t, 1) OVER (
                ORDER BY
                    t
            ),
            '2077-12-31'
        ) as valid_to,
        v
    FROM
        'pricing.parquet'
)

SELECT
    t.t,
    p.v
FROM
    'timestamps.parquet' t
    LEFT JOIN pricing_state p ON t.t BETWEEN p.valid_from
    AND p.valid_to;

Expected behavior

No response

Additional context

The flamegraphs are also quite different.

v36
[image]

v34
[image]

See https://gist.github.com/my-vegetable-has-exploded/ba16c59c96c81fa20f52b56f254ea8be for more information.

my-vegetable-has-exploded added the bug label Mar 23, 2024
@alamb
Contributor

alamb commented Mar 24, 2024

Did the join order change?

We could find out by looking at the explain plan.
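
A minimal sketch of that check, assuming the two Parquet files from the reproduction steps are in the working directory: run plain EXPLAIN (without ANALYZE) on the reported query in each datafusion-cli version and compare the join operator and the order of its inputs in the two plans.

```sql
-- Plan only, nothing is executed. Run this in both datafusion-cli versions
-- and diff the output to see whether the join inputs changed sides.
EXPLAIN WITH pricing_state AS (
    SELECT
        t AS valid_from,
        COALESCE(
            LEAD(t, 1) OVER (
                ORDER BY
                    t
            ),
            '2077-12-31'
        ) AS valid_to,
        v
    FROM
        'pricing.parquet'
)

SELECT
    t.t,
    p.v
FROM
    'timestamps.parquet' t
    LEFT JOIN pricing_state p ON t.t BETWEEN p.valid_from
    AND p.valid_to;
```

The query body is unchanged from the report; only ANALYZE is dropped so the plan is printed without running the join.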

@Omega359
Contributor

Note that the first statement in this ticket does not run out of the box on main:

CREATE
OR REPLACE TABLE pricing AS
SELECT
    t,
    RANDOM() as v
FROM
    range(
        '2022-01-01' :: TIMESTAMP,
        '2023-01-01' :: TIMESTAMP,
        INTERVAL 1 DAY
    ) ts(t);
Error during planning: table function 'range' not found

I'll file a new issue for this shortly.

@my-vegetable-has-exploded
Contributor Author

Sorry for the late response. I found that I had mistakenly used the debug build of the CLI. In fact, the new version performs better.
Sorry about that. 😭
