Question: can we collect statistics if the table didn't collect statistics when it was created? #15455

@xudong963

I noticed that if we create a table, then enable collect_statistics, and then scan the table, we don't get statistics for the table.

We must enable collect_statistics before creating the table; only then does scanning the table return statistics.

set datafusion.execution.collect_statistics = true;

CREATE EXTERNAL TABLE t2 (id INT not null, date DATE) STORED AS PARQUET LOCATION './data/' PARTITIONED BY (date) WITH ORDER (id ASC);

INSERT INTO t2 VALUES (4, '2025-03-01'), (3, '2025-3-02'), (2, '2025-03-03'), (1, '2025-03-04');

SELECT * FROM t2 ORDER BY id ASC; -- we'll get the statistics

If the option is set only after the table is created:

CREATE EXTERNAL TABLE t2 (id INT not null, date DATE) STORED AS PARQUET LOCATION './data/' PARTITIONED BY (date) WITH ORDER (id ASC);

INSERT INTO t2 VALUES (4, '2025-03-01'), (3, '2025-3-02'), (2, '2025-03-03'), (1, '2025-03-04');

set datafusion.execution.collect_statistics = true;

SELECT * FROM t2 ORDER BY id ASC; -- we won't get the statistics
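
One way to check whether statistics were collected in either case is to print them in the physical plan. This is only a sketch: it assumes the `datafusion.explain.show_statistics` option is available in the DataFusion version in use, and the exact plan output may differ between versions.

```sql
-- Assumption: datafusion.explain.show_statistics exists in this DataFusion version.
set datafusion.explain.show_statistics = true;

-- The physical plan should then carry per-operator statistics.
-- When nothing was collected for t2, the row and byte counts are
-- expected to be reported as unknown rather than as exact values.
EXPLAIN SELECT * FROM t2 ORDER BY id ASC;
```

Running this after each of the two sequences above should make the difference between them visible without relying on query results.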

Is there a way to get statistics if collect_statistics was not enabled before the table was created?
