index out of range error from datafusion_row::write::write_field #2910

Closed
thomas-k-cameron opened this issue Jul 14, 2022 · 8 comments · Fixed by #2968
Labels
bug Something isn't working

Comments

@thomas-k-cameron
Contributor

thomas-k-cameron commented Jul 14, 2022

Describe the bug
A slice index-out-of-range panic is raised from datafusion_row::writer::write_field (module path as shown in the backtrace below).

To Reproduce
The panic happened when I ran a query against a proprietary data set.
The SQL is:

SELECT 
    tag(column_1),
    COUNT(column_1)
FROM 
    csv
GROUP BY
    column_1

I haven't been able to reproduce it without that data set.

Expected behavior
The query should complete without panicking.

Additional context
cargo 1.62.0 (a748cf5a3 2022-06-08)

thread 'tokio-runtime-worker' panicked at 'range end index 153 out of range for slice of length 152', library/core/src/slice/index.rs:73:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/panicking.rs:142:14
   2: core::slice::index::slice_end_index_len_fail_rt
             at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/slice/index.rs:73:5
   3: core::ops::function::FnOnce::call_once
             at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/ops/function.rs:248:5
   4: core::intrinsics::const_eval_select
             at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/intrinsics.rs:2372:5
   5: core::slice::index::slice_end_index_len_fail
             at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/slice/index.rs:67:9
   6: datafusion_row::writer::write_field
   7: datafusion_row::writer::write_row
   8: <datafusion::physical_plan::aggregates::row_hash::GroupedHashAggregateStreamV2 as futures_core::stream::Stream>::poll_next
   9: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  10: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
  11: tokio::runtime::task::harness::Harness<T,S>::poll
  12: std::thread::local::LocalKey<T>::with
  13: tokio::runtime::thread_pool::worker::Context::run_task
  14: tokio::runtime::thread_pool::worker::Context::run
  15: tokio::macros::scoped_tls::ScopedKey<T>::set
  16: tokio::runtime::thread_pool::worker::run
  17: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
  18: tokio::runtime::task::harness::Harness<T,S>::poll
  19: tokio::runtime::blocking::pool::Inner::run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ArrowError(ExternalError(Execution("Join Error: task 15 panicked")))', src/main.rs:47:46
stack backtrace:
   0: rust_begin_unwind
             at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/panicking.rs:142:14
   2: core::result::unwrap_failed
             at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/result.rs:1785:5
   3: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   4: std::thread::local::LocalKey<T>::with
   5: tokio::park::thread::CachedParkThread::block_on
   6: tokio::runtime::Runtime::block_on
   7: my_app::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
thread 'tokio-runtime-worker' panicked at 'range end index 128 out of range for slice of length 120', library/core/src/slice/index.rs:73:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/panicking.rs:142:14
   2: core::slice::index::slice_end_index_len_fail_rt
             at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/slice/index.rs:73:5
   3: core::ops::function::FnOnce::call_once
             at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/ops/function.rs:248:5
   4: core::intrinsics::const_eval_select
             at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/intrinsics.rs:2372:5
   5: core::slice::index::slice_end_index_len_fail
             at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/slice/index.rs:67:9
   6: datafusion_row::writer::write_field
   7: datafusion_row::writer::write_row
   8: <datafusion::physical_plan::aggregates::row_hash::GroupedHashAggregateStreamV2 as futures_core::stream::Stream>::poll_next
   9: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  10: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
  11: tokio::runtime::task::harness::Harness<T,S>::poll
  12: std::thread::local::LocalKey<T>::with
  13: tokio::runtime::thread_pool::worker::Context::run_task
  14: tokio::runtime::thread_pool::worker::Context::run
  15: tokio::macros::scoped_tls::ScopedKey<T>::set
  16: tokio::runtime::thread_pool::worker::run
  17: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
  18: tokio::runtime::task::harness::Harness<T,S>::poll
  19: tokio::runtime::blocking::pool::Inner::run
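
For readers unfamiliar with this panic message: Rust raises "range end index N out of range for slice of length M" when a range slice such as `buf[start..end]` has `end` greater than the slice length. The sketch below reproduces that kind of failure with a plain byte buffer and shows a bounds-checked alternative. It is only an illustration under assumptions: `write_field_unchecked` and `try_write_field` are hypothetical helpers written for this example, not DataFusion's row writer, and the sizes merely echo the numbers in the first backtrace.

```rust
/// Hypothetical stand-in for a row-writer step (NOT DataFusion's code):
/// copies `value` into `buf` starting at `offset`.
/// Panics with a "range end index ... out of range" message when
/// `offset + value.len()` exceeds `buf.len()`.
fn write_field_unchecked(buf: &mut [u8], offset: usize, value: &[u8]) {
    buf[offset..offset + value.len()].copy_from_slice(value);
}

/// Bounds-checked variant: returns an error instead of panicking
/// when the field does not fit in the row buffer.
fn try_write_field(buf: &mut [u8], offset: usize, value: &[u8]) -> Result<(), String> {
    let len = buf.len();
    let end = offset
        .checked_add(value.len())
        .ok_or_else(|| "offset + field length overflows usize".to_string())?;
    match buf.get_mut(offset..end) {
        Some(dst) => {
            dst.copy_from_slice(value);
            Ok(())
        }
        None => Err(format!(
            "field range {}..{} exceeds row buffer of length {}",
            offset, end, len
        )),
    }
}
```

With a 152-byte buffer, writing an 8-byte field at offset 144 fits exactly, while the same write at offset 145 needs bytes 145..153 and fails. That pattern would be consistent with a row buffer sized one byte too small for its fields, although the thread does not confirm this as the cause.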
thomas-k-cameron added the bug label on Jul 14, 2022
@comphead
Contributor

Hi @thomas-k-cameron, what is the tag function?

@thomas-k-cameron
Contributor Author

@comphead
It's my own UDF.
It still generates the same error even without it.

I was also able to reproduce the error without the proprietary data set I mentioned.
I uploaded the code here:

https://github.com/thomas-k-cameron/df_rs_error_find

@comphead
Contributor

Thanks for providing the data. I was able to reproduce it locally. It looks related to #2877, but here it occurs even without a join.

@thomas-k-cameron
Contributor Author

Glad to hear that.
Is there anything else that I can do?

@comphead
Contributor

> Glad to hear that. Is there anything else that I can do?

Nothing more is needed for now. I'll try to investigate why it's happening. Thanks for reporting such a weird thing.
My guess is that the problem is somewhere in the hasher, which is used in both GROUP BY and hash joins.
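
As background for the hasher hypothesis: the backtrace shows the query running through a hash-based aggregate (GroupedHashAggregateStreamV2), where each distinct group key is hashed to locate its running aggregate state. The sketch below shows that general idea for `COUNT(column_1) ... GROUP BY column_1` using only the standard library. It is a simplified illustration, not DataFusion's implementation; `group_count` is a hypothetical name, and `None` is assumed here to model a SQL NULL.

```rust
use std::collections::HashMap;

/// Hypothetical illustration of hash aggregation (NOT DataFusion's code).
/// Groups by the column value and counts non-NULL values per group,
/// roughly what `SELECT k, COUNT(k) FROM t GROUP BY k` computes.
fn group_count<'a>(column: &[Option<&'a str>]) -> HashMap<Option<&'a str>, u64> {
    let mut counts: HashMap<Option<&'a str>, u64> = HashMap::new();
    for value in column {
        // Hash the group key to find (or create) its aggregate state.
        let slot = counts.entry(*value).or_insert(0);
        // COUNT(k) skips NULLs, so the NULL group keeps a count of 0.
        if value.is_some() {
            *slot += 1;
        }
    }
    counts
}
```

A hash join builds a similar key-to-state table on one input and probes it with the other, which is why a defect in shared hashing code could surface in both operators.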

@comphead
Contributor

It looks like there are a couple of issues in the row writer. Interestingly, they occur only in very specific scenarios. Still investigating.

@thomas-k-cameron
Contributor Author

Lovely. Thanks a lot!

@comphead
Contributor

comphead commented Jul 26, 2022

Well, maybe the problem occurs before the writer, because the same data behaves differently when read from CSV than when constructed manually: the manually constructed data works, while the CSV data fails.
