I noticed that if I create a table, then enable collect_statistics, and then scan the table, no statistics are collected for the table.
The setting has to be enabled before the table is created; only then does a scan of the table return statistics.
Setting enabled before the table is created:
set datafusion.execution.collect_statistics = true;
CREATE EXTERNAL TABLE t2 (id INT not null, date DATE) STORED AS PARQUET LOCATION './data/' PARTITIONED BY (date) WITH ORDER (id ASC);
INSERT INTO t2 VALUES (4, '2025-03-01'), (3, '2025-03-02'), (2, '2025-03-03'), (1, '2025-03-04');
SELECT * FROM t2 ORDER BY id ASC; -- statistics are collected
Setting enabled after the table is created:
CREATE EXTERNAL TABLE t2 (id INT not null, date DATE) STORED AS PARQUET LOCATION './data/' PARTITIONED BY (date) WITH ORDER (id ASC);
INSERT INTO t2 VALUES (4, '2025-03-01'), (3, '2025-03-02'), (2, '2025-03-03'), (1, '2025-03-04');
set datafusion.execution.collect_statistics = true;
SELECT * FROM t2 ORDER BY id ASC; -- statistics are not collected
Is there a way to get statistics if collect_statistics was not enabled before the table was created?
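One workaround that follows from the behavior above is to drop the table and register it again after enabling the setting. The following is only a sketch, not a confirmed fix: it assumes that DROP TABLE on an external table removes the catalog entry without deleting the files under ./data/, and it uses datafusion.explain.show_statistics only as a convenient way to inspect the result.

```sql
-- Assumption: dropping an external table only deregisters it;
-- the Parquet files under ./data/ are left in place.
DROP TABLE t2;

-- Enable statistics collection before the table is registered again.
SET datafusion.execution.collect_statistics = true;

-- Re-create the table over the same files, with the same definition as above.
CREATE EXTERNAL TABLE t2 (id INT NOT NULL, date DATE) STORED AS PARQUET LOCATION './data/' PARTITIONED BY (date) WITH ORDER (id ASC);

-- Optional check: include statistics in the EXPLAIN output.
SET datafusion.explain.show_statistics = true;
EXPLAIN SELECT * FROM t2 ORDER BY id ASC;
```

This still requires re-creating the table, so it does not answer whether statistics can be collected for a table that is already registered.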