
streaming agg panics on error (e.g. overflow) #11256

Closed
xxchan opened this issue Jul 26, 2023 · 7 comments
Labels: no-issue-activity, type/bug (Something isn't working)


xxchan (Member) commented Jul 26, 2023

Describe the bug

No response

Error message/log

2023-07-26T21:02:30.816376+02:00  INFO risingwave_frontend::scheduler::snapshot: unpin snapshot with RPC min_epoch=4794807074488320
2023-07-26T21:02:33.884322+02:00 DEBUG local_execute{query_id="97433ab6-ea48-4b13-8f39-fa533e86e38a" epoch=BatchQueryEpoch { epoch: Some(Current(4794807270965248)) }}: risingwave_frontend::scheduler::local: Starting to run query self.query.query_id=QueryId:97433ab6-ea48-4b13-8f39-fa533e86e38a self.sql=""
2023-07-26T21:02:33.88455+02:00 DEBUG local_execute{query_id="97433ab6-ea48-4b13-8f39-fa533e86e38a" epoch=BatchQueryEpoch { epoch: Some(Current(4794807270965248)) }}: risingwave_frontend::scheduler::local: Local execution mode converts a plan with two stages
2023-07-26T21:02:33.887459+02:00 DEBUG risingwave_batch::task::task_execution: Task TaskId { task_id: 0, stage_id: 1, query_id: "97433ab6-ea48-4b13-8f39-fa533e86e38a" } state changed to Running
2023-07-26T21:02:33.888094+02:00 DEBUG batch_execute{task_id=0 stage_id=1 query_id="97433ab6-ea48-4b13-8f39-fa533e86e38a"}: risingwave_batch::task::task_execution: Batch task TaskId { task_id: 0, stage_id: 1, query_id: "97433ab6-ea48-4b13-8f39-fa533e86e38a" } finished successfully.
2023-07-26T21:02:33.888142+02:00 DEBUG batch_execute{task_id=0 stage_id=1 query_id="97433ab6-ea48-4b13-8f39-fa533e86e38a"}: risingwave_batch::task::task_execution: Task TaskId { task_id: 0, stage_id: 1, query_id: "97433ab6-ea48-4b13-8f39-fa533e86e38a" } state changed to Finished
2023-07-26T21:02:38.6515+02:00 ERROR risingwave_stream::task::stream_manager: actor exit actor=20 error=failed to send message to actor 17: Barrier(Barrier { epoch: EpochPair { curr: 4794807467507712, prev: 4794807401971712 }, mutation: None, kind: Barrier, tracing_context: TracingContext(Context { entries: 0 }), passed_actors: [3, 20] })
2023-07-26T21:02:40.877175+02:00  INFO risingwave_frontend::scheduler::snapshot: unpin snapshot with RPC min_epoch=4794807205363712
2023-07-26T21:02:43.141439+02:00 ERROR risingwave_stream::task::stream_manager: actor exit actor=18 error=failed to send message to actor 17: Barrier(Barrier { epoch: EpochPair { curr: 4794807533043712, prev: 4794807467507712 }, mutation: None, kind: Checkpoint, tracing_context: TracingContext(Context { entries: 0 }), passed_actors: [1, 18] })
2023-07-26T21:02:44.650369+02:00 ERROR risingwave_stream::task::stream_manager: actor exit actor=25 error=failed to send message to actor 17: Barrier(Barrier { epoch: EpochPair { curr: 4794807533043712, prev: 4794807467507712 }, mutation: None, kind: Checkpoint, tracing_context: TracingContext(Context { entries: 0 }), passed_actors: [8, 25] })
2023-07-26T21:02:47.699683+02:00 ERROR risingwave_stream::task::stream_manager: actor exit actor=21 error=failed to send message to actor 17: Barrier(Barrier { epoch: EpochPair { curr: 4794807467507712, prev: 4794807401971712 }, mutation: None, kind: Barrier, tracing_context: TracingContext(Context { entries: 0 }), passed_actors: [4, 21] })
2023-07-26T21:02:50.717527+02:00 ERROR risingwave_stream::task::stream_manager: actor exit actor=23 error=failed to send message to actor 17: Barrier(Barrier { epoch: EpochPair { curr: 4794807467507712, prev: 4794807401971712 }, mutation: None, kind: Barrier, tracing_context: TracingContext(Context { entries: 0 }), passed_actors: [6, 23] })
2023-07-26T21:02:56.214871+02:00 ERROR risingwave_stream::task::stream_manager: actor exit actor=24 error=failed to send message to actor 17: Barrier(Barrier { epoch: EpochPair { curr: 4794807467507712, prev: 4794807401971712 }, mutation: None, kind: Barrier, tracing_context: TracingContext(Context { entries: 0 }), passed_actors: [7, 24] })
2023-07-26T21:02:58.751606+02:00 ERROR risingwave_compute::rpc::service::stream_service: failed to collect barrier: Actor 20 exit unexpectedly: failed to send message to actor 17: Barrier(Barrier { epoch: EpochPair { curr: 4794807467507712, prev: 4794807401971712 }, mutation: None, kind: Barrier, tracing_context: TracingContext(Context { entries: 0 }), passed_actors: [3, 20] })
  backtrace of `StreamError`:
   0: std::backtrace_rs::backtrace::libunwind::trace
             at /rustc/f0411ffcebcd7f75ac02ed45feb53ffd07b75398/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1: std::backtrace_rs::backtrace::trace_unsynchronized
             at /rustc/f0411ffcebcd7f75ac02ed45feb53ffd07b75398/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2: std::backtrace::Backtrace::create
             at /rustc/f0411ffcebcd7f75ac02ed45feb53ffd07b75398/library/std/src/backtrace.rs:332:13
   3: <risingwave_stream::error::StreamError as core::convert::From<risingwave_stream::error::Inner>>::from
             at ./src/stream/src/error.rs:47:10
   4: <T as core::convert::Into<U>>::into
             at /rustc/f0411ffcebcd7f75ac02ed45feb53ffd07b75398/library/core/src/convert/mod.rs:717:9
   5: <risingwave_stream::error::StreamError as core::convert::From<anyhow::Error>>::from
             at ./src/stream/src/error.rs:110:9
   6: <T as core::convert::Into<U>>::into
             at /rustc/f0411ffcebcd7f75ac02ed45feb53ffd07b75398/library/core/src/convert/mod.rs:717:9
   7: <risingwave_stream::executor::exchange::output::LocalOutput as risingwave_stream::executor::exchange::output::Output>::send::{{closure}}::{{closure}}
             at ./src/stream/src/executor/exchange/output.rs:78:17
   8: core::result::Result<T,E>::map_err
             at /rustc/f0411ffcebcd7f75ac02ed45feb53ffd07b75398/library/core/src/result.rs:828:27
   9: <risingwave_stream::executor::exchange::output::LocalOutput as risingwave_stream::executor::exchange::output::Output>::send::{{closure}}
             at ./src/stream/src/executor/exchange/output.rs:73:9
......

2023-07-26T21:02:46.177167+02:00 ERROR risingwave_stream::task::stream_manager: actor exit actor=17 error=Executor error: Chunk operation error: Numeric out of range
  backtrace of `StreamExecutorError`:
   0: std::backtrace_rs::backtrace::libunwind::trace
             at /rustc/f0411ffcebcd7f75ac02ed45feb53ffd07b75398/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1: std::backtrace_rs::backtrace::trace_unsynchronized
             at /rustc/f0411ffcebcd7f75ac02ed45feb53ffd07b75398/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2: std::backtrace::Backtrace::create
             at /rustc/f0411ffcebcd7f75ac02ed45feb53ffd07b75398/library/std/src/backtrace.rs:332:13
   3: <risingwave_stream::executor::error::StreamExecutorError as core::convert::From<risingwave_stream::executor::error::Inner>>::from
             at ./src/stream/src/executor/error.rs:102:10
   4: <T as core::convert::Into<U>>::into
             at /rustc/f0411ffcebcd7f75ac02ed45feb53ffd07b75398/library/core/src/convert/mod.rs:717:9
   5: <risingwave_stream::executor::error::StreamExecutorError as core::convert::From<risingwave_expr::error::ExprError>>::from
             at ./src/stream/src/executor/error.rs:152:9
   6: <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
             at /rustc/f0411ffcebcd7f75ac02ed45feb53ffd07b75398/library/core/src/result.rs:1961:27
   7: <risingwave_stream::executor::aggregation::agg_impl::foldable::PrimitiveSummable<S,I> as risingwave_stream::executor::aggregation::agg_impl::foldable::StreamingFoldable<S,I>>::accumulate
             at ./src/stream/src/executor/aggregation/agg_impl/foldable.rs:107:17
   8: <risingwave_stream::executor::aggregation::agg_impl::foldable::StreamingFoldAgg<R,I,S> as risingwave_stream::executor::aggregation::agg_impl::StreamingAggInput<I>>::apply_batch_concrete
             at ./src/stream/src/executor/aggregation/agg_impl/foldable.rs:324:43
   9: <risingwave_stream::executor::aggregation::agg_impl::foldable::StreamingFoldAgg<risingwave_common::array::primitive_array::PrimitiveArray<risingwave_common::types::decimal::Decimal>,risingwave_common::array::primitive_array::PrimitiveArray<risingwave_common::types::decimal::Decimal>,S> as risingwave_stream::executor::aggregation::agg_impl::StreamingAggImpl>::apply_batch
             at ./src/stream/src/executor/aggregation/agg_impl/foldable.rs:421:17
  10: risingwave_stream::executor::aggregation::value::ValueState::apply_chunk
             at ./src/stream/src/executor/aggregation/value.rs:64:9
  11: risingwave_stream::executor::aggregation::agg_state::AggState<S>::apply_chunk
             at ./src/stream/src/executor/aggregation/agg_state.rs:122:17
  12: risingwave_stream::executor::aggregation::agg_group::AggGroup<S,Strtg>::apply_chunk
             at ./src/stream/src/executor/aggregation/agg_group.rs:271:13
  13: risingwave_stream::executor::simple_agg::SimpleAggExecutor<S>::apply_chunk::{{closure}}
             at ./src/stream/src/executor/simple_agg.rs:226:9
  14: risingwave_stream::executor::simple_agg::SimpleAggExecutor<S>::execute_inner::{{closure}}
             at ./src/stream/src/executor/simple_agg.rs:337:68
......

2023-07-26T21:03:00.770534+02:00 ERROR risingwave_compute::rpc::service::stream_service: failed to collect barrier: Actor 20 exit unexpectedly: failed to send message to actor 17: Barrier(Barrier { epoch: EpochPair { curr: 4794807467507712, prev: 4794807401971712 }, mutation: None, kind: Barrier, tracing_context: TracingContext(Context { entries: 0 }), passed_actors: [3, 20] })
  backtrace of `StreamError`:
   0: std::backtrace_rs::backtrace::libunwind::trace
             at /rustc/f0411ffcebcd7f75ac02ed45feb53ffd07b75398/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1: std::backtrace_rs::backtrace::trace_unsynchronized
             at /rustc/f0411ffcebcd7f75ac02ed45feb53ffd07b75398/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2: std::backtrace::Backtrace::create
             at /rustc/f0411ffcebcd7f75ac02ed45feb53ffd07b75398/library/std/src/backtrace.rs:332:13
   3: <risingwave_stream::error::StreamError as core::convert::From<risingwave_stream::error::Inner>>::from
             at ./src/stream/src/error.rs:47:10
   4: <T as core::convert::Into<U>>::into
             at /rustc/f0411ffcebcd7f75ac02ed45feb53ffd07b75398/library/core/src/convert/mod.rs:717:9
   5: <risingwave_stream::error::StreamError as core::convert::From<anyhow::Error>>::from
             at ./src/stream/src/error.rs:110:9
   6: <T as core::convert::Into<U>>::into
             at /rustc/f0411ffcebcd7f75ac02ed45feb53ffd07b75398/library/core/src/convert/mod.rs:717:9
   7: <risingwave_stream::executor::exchange::output::LocalOutput as risingwave_stream::executor::exchange::output::Output>::send::{{closure}}::{{closure}}
             at ./src/stream/src/executor/exchange/output.rs:78:17
   8: core::result::Result<T,E>::map_err
             at /rustc/f0411ffcebcd7f75ac02ed45feb53ffd07b75398/library/core/src/result.rs:828:27
   9: <risingwave_stream::executor::exchange::output::LocalOutput as risingwave_stream::executor::exchange::output::Output>::send::{{closure}}
             at ./src/stream/src/executor/exchange/output.rs:73:9
.....

2023-07-26T21:03:03.593814+02:00 ERROR risingwave_stream::task::stream_manager: actor exit actor=1 error=failed to send message to actor 18: Barrier(Barrier { epoch: EpochPair { curr: 4794807598514176, prev: 4794807533043712 }, mutation: None, kind: Barrier, tracing_context: TracingContext(Context { entries: 0 }), passed_actors: [1] })
2023-07-26T21:03:03.593807+02:00 ERROR risingwave_compute::rpc::service::stream_service: failed to collect barrier: Actor 20 exit unexpectedly: failed to send message to actor 17: Barrier(Barrier { epoch: EpochPair { curr: 4794807467507712, prev: 4794807401971712 }, mutation: None, kind: Barrier, tracing_context: TracingContext(Context { entries: 0 }), passed_actors: [3, 20] })
  backtrace of `StreamError`:
   0: std::backtrace_rs::backtrace::libunwind::trace
             at /rustc/f0411ffcebcd7f75ac02ed45feb53ffd07b75398/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1: std::backtrace_rs::backtrace::trace_unsynchronized
             at /rustc/f0411ffcebcd7f75ac02ed45feb53ffd07b75398/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2: std::backtrace::Backtrace::create
             at /rustc/f0411ffcebcd7f75ac02ed45feb53ffd07b75398/library/std/src/backtrace.rs:332:13
   3: <risingwave_stream::error::StreamError as core::convert::From<risingwave_stream::error::Inner>>::from
             at ./src/stream/src/error.rs:47:10
   4: <T as core::convert::Into<U>>::into
             at /rustc/f0411ffcebcd7f75ac02ed45feb53ffd07b75398/library/core/src/convert/mod.rs:717:9
   5: <risingwave_stream::error::StreamError as core::convert::From<anyhow::Error>>::from
             at ./src/stream/src/error.rs:110:9
   6: <T as core::convert::Into<U>>::into
             at /rustc/f0411ffcebcd7f75ac02ed45feb53ffd07b75398/library/core/src/convert/mod.rs:717:9
   7: <risingwave_stream::executor::exchange::output::LocalOutput as risingwave_stream::executor::exchange::output::Output>::send::{{closure}}::{{closure}}
             at ./src/stream/src/executor/exchange/output.rs:78:17
   8: core::result::Result<T,E>::map_err
.....
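In the log above, only actor 17 fails with the underlying cause (`Numeric out of range` from the simple agg executor). Actors 18, 20, 21, 23, 24 and 25 then exit with `failed to send message to actor 17`, and actor 1 fails sending to actor 18, because their downstream is gone. A minimal sketch of that propagation pattern using `std::sync::mpsc`, assuming (this is not RisingWave's actual exchange code) that an actor's exit drops the receiving end of its input channel: once the receiver is dropped, `Sender::send` returns `Err(SendError(msg))`, handing the undelivered message back. `Message`, `simulate` and the channel layout are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for messages exchanged between actors.
#[derive(Debug)]
enum Message {
    Chunk(i64),
    Barrier(u64),
}

/// Runs a "downstream" actor that fails on its first chunk, then lets the
/// "upstream" side try to forward a barrier. Returns the downstream result
/// and whether the barrier send succeeded.
fn simulate() -> (Result<(), String>, bool) {
    let (tx, rx) = mpsc::channel::<Message>();

    // Downstream actor: errors on the first chunk and returns, dropping `rx`.
    let downstream = thread::spawn(move || -> Result<(), String> {
        match rx.recv() {
            Ok(Message::Chunk(_)) => Err("Numeric out of range".to_string()),
            other => Err(format!("unexpected message: {:?}", other)),
        }
    });

    tx.send(Message::Chunk(1)).expect("receiver is still alive");
    let downstream_result = downstream.join().expect("downstream thread panicked");

    // Upstream actor forwards the next barrier; the receiver is gone, so this fails.
    let barrier_sent = match tx.send(Message::Barrier(2)) {
        Ok(()) => true,
        Err(mpsc::SendError(msg)) => {
            println!("failed to send message: {:?}", msg);
            false
        }
    };
    (downstream_result, barrier_sent)
}

fn main() {
    let (downstream_result, barrier_sent) = simulate();
    println!("downstream exited with {:?}", downstream_result);
    println!("barrier sent: {}", barrier_sent);
}
```

When reading such logs, the useful line is the one whose error is not a send failure; the rest are consequences of it.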

To Reproduce

Use the following sqllogictest case:

statement ok
create table t(d decimal);
statement ok
insert into t values (9000000000000000000000000000),
(9000000000000000000000000000),
(9000000000000000000000000000),
(9000000000000000000000000000),
(9000000000000000000000000000),
(9000000000000000000000000000),
(9000000000000000000000000000),
(9000000000000000000000000000);
query T
select sum(d) from t;
----
72000000000000000000000000000
statement ok
insert into t values (9000000000000000000000000000);
statement error Expr error: Numeric out of range
select sum(d) from t;
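Why eight rows succeed and nine fail: 8 × 9·10^27 = 7.2·10^28 fits, while 9 × 9·10^27 = 8.1·10^28 does not, assuming the decimal type is bounded like `rust_decimal::Decimal`, whose maximum is 2^96 − 1 = 79228162514264337593543950335 (a 96-bit mantissa). The sketch below models only scale-0 values with `i128` and reports overflow as an error rather than panicking, loosely mirroring the `accumulate` frame in the backtrace. `DECIMAL_MAX`, `SumError` and `checked_decimal_sum` are hypothetical names, not RisingWave APIs.

```rust
/// Assumed upper bound: a 96-bit unsigned mantissa at scale 0, i.e. 2^96 - 1
/// (the maximum of `rust_decimal::Decimal`).
const DECIMAL_MAX: i128 = (1i128 << 96) - 1;

#[derive(Debug, PartialEq)]
enum SumError {
    NumericOutOfRange,
}

/// Sums integral decimal values, returning an error on overflow.
fn checked_decimal_sum(values: &[i128]) -> Result<i128, SumError> {
    let mut acc: i128 = 0;
    for &v in values {
        // `checked_add` guards the i128 itself; the explicit bound check
        // models the narrower 96-bit mantissa.
        acc = acc.checked_add(v).ok_or(SumError::NumericOutOfRange)?;
        if acc > DECIMAL_MAX || acc < -DECIMAL_MAX {
            return Err(SumError::NumericOutOfRange);
        }
    }
    Ok(acc)
}

fn main() {
    let nine_e27: i128 = 9_000_000_000_000_000_000_000_000_000;
    println!("{:?}", checked_decimal_sum(&vec![nine_e27; 8])); // Ok(72000000000000000000000000000)
    println!("{:?}", checked_decimal_sum(&vec![nine_e27; 9])); // Err(NumericOutOfRange)
}
```

The batch query can surface this as a statement error; the issue is that the streaming executor has no such per-query boundary and the error takes down the actor instead.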

Expected behavior

No response

How did you deploy RisingWave?

No response

The version of RisingWave

latest main

Additional context

Note: it doesn't panic immediately; after a while, it finally panics at "failed to execute barrier".

xxchan added the type/bug (Something isn't working) label Jul 26, 2023
github-actions bot added this to the release-1.1 milestone Jul 26, 2023
xxchan changed the title from "streaming agg panics on error" to "streaming agg panics on error (e.g. overflow)" Jul 26, 2023
TennyZhuang (Contributor) commented:

We shouldn't panic here; however, there is nothing we can do except suspend the mview :)

xxchan (Member, Author) commented Jul 27, 2023

risingwavelabs/rfcs#54 ? 🥵

BugenZhao (Member) commented Jul 28, 2023

> Note: It doesn't panic immediately, but wait a while, and finally panic at failed to execute barrier

Just FYI: this is expected for executor failure without recovery enabled in the meta service.

A contributor commented:

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

stdrc (Member) commented Jun 12, 2024

This seems to be fixed already.

stdrc closed this as completed Jun 12, 2024
stdrc (Member) commented Jun 12, 2024

Oh, sorry, it's not "fixed". It just recovers, and the last insertion does not succeed.

xiangjinwu (Contributor) commented:

> Oh sorry it's not "fixed". It's just recovered and the last insertion will not succeed.

Adding an example:

dev=> create table t (k varchar, v varchar);
CREATE_TABLE

dev=> create materialized view mv as select array_agg(k order by k), jsonb_object_agg(k, v), jsonb_agg(k order by k) from t;
CREATE_MATERIALIZED_VIEW

dev=> insert into t values ('foo', 'bar');
INSERT 0 1

dev=> select * from mv;
 array_agg | jsonb_object_agg | jsonb_agg 
-----------+------------------+-----------
 {foo}     | {"foo": "bar"}   | ["foo"]
(1 row)

dev=> insert into t values (null, null); -- `jsonb_object_agg` errors when the key is null
INSERT 0 1

dev=> select * from mv;
 array_agg | jsonb_object_agg | jsonb_agg 
-----------+------------------+-----------
 {foo}     | {"foo": "bar"}   | ["foo"]
(1 row)

dev=> insert into t values ('foo2', 'bar2');
INSERT 0 1

dev=> select * from mv;
 array_agg  |        jsonb_object_agg        |    jsonb_agg    
------------+--------------------------------+-----------------
 {foo,foo2} | {"foo": "bar", "foo2": "bar2"} | ["foo", "foo2"]
(1 row)

dev=> insert into t values (null, null);
INSERT 0 1

dev=> select * from mv;
 array_agg  |        jsonb_object_agg        |    jsonb_agg    
------------+--------------------------------+-----------------
 {foo,foo2} | {"foo": "bar", "foo2": "bar2"} | ["foo", "foo2"]
(1 row)
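In the session above, each insert that makes `jsonb_object_agg` fail leaves every column of `mv` unchanged, including `array_agg`. A hypothetical sketch of state that produces the same observable result: each row is validated against all aggregates before any of them is mutated, so an erroring row is skipped as a whole and counted instead of failing the actor. This only models the output shown; it does not claim to describe how RisingWave recovers. `MvState`, `apply_row` and `apply_row_or_skip` are illustrative names, the error text is made up, and `jsonb_agg` is omitted because it would mirror `array_agg` here.

```rust
use std::collections::BTreeMap;

/// Hypothetical materialized state for two of the aggregates in the example.
#[derive(Debug, Default)]
struct MvState {
    array_agg: Vec<String>,                       // array_agg(k order by k)
    object_agg: BTreeMap<String, Option<String>>, // jsonb_object_agg(k, v)
    skipped_rows: usize,
}

impl MvState {
    /// Applies one inserted row to every aggregate, or to none of them.
    fn apply_row(&mut self, k: Option<&str>, v: Option<&str>) -> Result<(), String> {
        // Validate first: the object aggregate rejects a NULL key.
        let key = k.ok_or_else(|| "null key (hypothetical message)".to_string())?;
        // Mutate only after every aggregate is known to accept the row.
        self.array_agg.push(key.to_string());
        self.array_agg.sort(); // keep `order by k`
        self.object_agg.insert(key.to_string(), v.map(str::to_string));
        Ok(())
    }

    /// Skips a failing row instead of propagating the error.
    fn apply_row_or_skip(&mut self, k: Option<&str>, v: Option<&str>) {
        if let Err(e) = self.apply_row(k, v) {
            eprintln!("skipping row: {}", e);
            self.skipped_rows += 1;
        }
    }
}

fn main() {
    let mut mv = MvState::default();
    mv.apply_row_or_skip(Some("foo"), Some("bar"));
    mv.apply_row_or_skip(None, None);
    mv.apply_row_or_skip(Some("foo2"), Some("bar2"));
    mv.apply_row_or_skip(None, None);
    println!("{:?}", mv);
}
```

The design choice illustrated is validate-then-apply: without it, a row could update some aggregates and not others, leaving the columns of a single output row inconsistent with each other.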

6 participants