Describe the bug
I added an external table and mistakenly gave the wrong name for one of its partition columns. The DDL operation returned successfully, and some basic queries on the table were successful, but others resulted in panics.
To Reproduce
Using the example nyctaxi data set but typing monht instead of month for one of the partition columns yields no error when the table is created, nor when its records are counted:
$ RUST_BACKTRACE=1 python3
Python 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:53:32) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import datafusion as df
>>> ctx = df.SessionContext()
>>> ctx.sql("""
... CREATE EXTERNAL TABLE taxi
... STORED AS PARQUET
... PARTITIONED BY (year, monht)
... LOCATION '/path/to/nyctaxi'
... """)
DataFrame()
++
++
>>> ctx.sql("SELECT COUNT(*) FROM taxi")
DataFrame()
+----------+
| COUNT(*) |
+----------+
| 2964624 |
+----------+
Instead, the first error is a panic while reading data from the table:
>>> ctx.sql("SELECT * FROM taxi")
thread 'tokio-runtime-worker' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/datafusion-36.0.0/src/datasource/physical_plan/file_scan_config.rs:248:54:
index out of bounds: the len is 0 but the index is 0
stack backtrace:
thread 'tokio-runtime-worker' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/datafusion-36.0.0/src/datasource/physical_plan/file_scan_config.rs:248:54:
index out of bounds: the len is 0 but the index is 0
thread 'tokio-runtime-worker' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/datafusion-36.0.0/src/datasource/physical_plan/file_scan_config.rs:248:54:
index out of bounds: the len is 0 but the index is 0
0: rust_begin_unwind
at /rustc/5119208fd78a77547c705d1695428c88d6791263/library/std/src/panicking.rs:645:5
1: core::panicking::panic_fmt
at /rustc/5119208fd78a77547c705d1695428c88d6791263/library/core/src/panicking.rs:72:14
2: core::panicking::panic_bounds_check
at /rustc/5119208fd78a77547c705d1695428c88d6791263/library/core/src/panicking.rs:208:5
3: datafusion::datasource::physical_plan::file_scan_config::PartitionColumnProjector::project
4: <datafusion::datasource::physical_plan::file_stream::FileStream<F> as futures_core::stream::Stream>::poll_next
5: datafusion_physical_plan::stream::RecordBatchReceiverStreamBuilder::run_input::{{closure}}
6: tokio::runtime::task::raw::poll
7: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
8: tokio::runtime::task::raw::poll
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
stack backtrace:
0: rust_begin_unwind
at /rustc/5119208fd78a77547c705d1695428c88d6791263/library/std/src/panicking.rs:645:5
1: core::panicking::panic_fmt
at /rustc/5119208fd78a77547c705d1695428c88d6791263/library/core/src/panicking.rs:72:14
2: core::panicking::panic_bounds_check
at /rustc/5119208fd78a77547c705d1695428c88d6791263/library/core/src/panicking.rs:208:5
3: datafusion::datasource::physical_plan::file_scan_config::PartitionColumnProjector::project
4: <datafusion::datasource::physical_plan::file_stream::FileStream<F> as futures_core::stream::Stream>::poll_next
5: datafusion_physical_plan::stream::RecordBatchReceiverStreamBuilder::run_input::{{closure}}
6: tokio::runtime::task::raw::poll
7: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
8: tokio::runtime::task::raw::poll
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Traceback (most recent call last):
stack backtrace:
File "<stdin>", line 1, in <module>
pyo3_runtime.PanicException: index out of bounds: the len is 0 but the index is 0
0: rust_begin_unwind
at /rustc/5119208fd78a77547c705d1695428c88d6791263/library/std/src/panicking.rs:645:5
>>> 1: core::panicking::panic_fmt
at /rustc/5119208fd78a77547c705d1695428c88d6791263/library/core/src/panicking.rs:72:14
2: core::panicking::panic_bounds_check
at /rustc/5119208fd78a77547c705d1695428c88d6791263/library/core/src/panicking.rs:208:5
3: datafusion::datasource::physical_plan::file_scan_config::PartitionColumnProjector::project
4: <datafusion::datasource::physical_plan::file_stream::FileStream<F> as futures_core::stream::Stream>::poll_next
5: datafusion_physical_plan::stream::RecordBatchReceiverStreamBuilder::run_input::{{closure}}
6: tokio::runtime::task::raw::poll
7: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
8: tokio::runtime::task::raw::poll
Expected behavior
Ideally this misconfiguration would result in an error message when creating the table (as I understand, some initial scan of the filesystem or object store is performed as part of this DDL operation, and so there is an opportunity to validate the supplied partition columns), or barring that I would at least expect an error message during queries, and not a panic.
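To illustrate the kind of check I have in mind, here is a minimal sketch in plain Python. It is not DataFusion's implementation. It assumes the data set uses a hive-style directory layout (for example year=2024/month=01/ under the table location), and the function names and example paths are hypothetical, made up for this illustration only.

```python
from pathlib import PurePosixPath


def infer_partition_columns(file_path, table_root):
    """Return the hive-style partition keys found between table_root and the file.

    Hypothetical helper: assumes directories are named key=value.
    """
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(table_root))
    # Every directory component shaped like key=value contributes one column;
    # the last component is the file name itself and is skipped.
    return [part.split("=", 1)[0] for part in relative.parts[:-1] if "=" in part]


def validate_partition_columns(declared, file_paths, table_root):
    """Raise ValueError if the declared columns differ from those inferred from any path."""
    declared = list(declared)
    for path in file_paths:
        inferred = infer_partition_columns(path, table_root)
        if inferred != declared:
            raise ValueError(
                f"declared partition columns {declared} do not match "
                f"{inferred} inferred from {path}"
            )


# Example paths are assumptions, not taken from the real data set.
files = ["/path/to/nyctaxi/year=2024/month=01/part-0.parquet"]
validate_partition_columns(["year", "month"], files, "/path/to/nyctaxi")  # passes
try:
    validate_partition_columns(["year", "monht"], files, "/path/to/nyctaxi")
except ValueError as err:
    print(err)
```

Running a comparison like this against the files discovered while the table is being created would turn the typo into an ordinary error instead of a panic at read time.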
Additional context
Using datafusion 36.0.0 module for Python 3.11.
Thank you for the report! @MohamedAbdeen21 noticed a similar issue and in #9912 added validation which should raise an error in this scenario during the CREATE EXTERNAL TABLE statement execution. This feature should be included in the 38.0.0 release and is available now on the main branch.
Belatedly confirmed that this issue is now resolved: using the wrong partition column name yields the error Exception: DataFusion error: Plan("Inferred partitions to be ...")