Add StreamingWindowExec to DataFusion physical plan to support aggregations over unbounded data #11366
Comments
Actually existing
pub enum InputOrderMode {
    /// There is no partial permutation of the expressions satisfying the
    /// existing ordering.
    Linear,
    /// There is a partial permutation of the expressions satisfying the
    /// existing ordering. Indices describing the longest partial permutation
    /// are stored in the vector.
    PartiallySorted(Vec<usize>),
    /// There is a (full) permutation of the expressions satisfying the
    /// existing ordering.
    Sorted,
}
When mode is either |
To add to the comment above -- |
@mustafasrepo & @ozankabak thanks for the feedback. The target use cases we are going for are Flink-style workloads, with data read from Kafka that is generally not ordered and thus needs to be watermarked. We tried the vanilla aggregates and ran into PipelineBreaking panics. An example workload we're trying to compute is of the following nature; let me know if this can already be expressed with the current operators as is --
|
To help as best as I can, let me first reiterate my understanding of your use case: You have a streaming source, which has some columns like In such a case, what you can do is to use a projection to add an order defining column based on processing time, and use |
Is your feature request related to a problem or challenge?
Currently DataFusion has partial support for computations over unbounded data, with SymmetricHashJoinExec being able to join unbounded streams of data. However, the ability to aggregate over unbounded data seems to be missing from the project so far. In stream processing, aggregating over windows of streaming data is a common concept. A streaming window, however, behaves more like an AggregationExec than a WindowExec. We have a POC StreamingWindowExec built off a fork of DataFusion 39.0. We would like to collaborate with the community to upstream this operator as well as improve the design.
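To make the distinction concrete, here is a minimal, std-only sketch of what "aggregating over windows of streaming data" could mean: a tumbling-window sum that closes a window once an event-time watermark passes the window's end. This is not the POC StreamingWindowExec and not a DataFusion API; the type TumblingWindowSum, its fields, and the watermark rule (largest event time seen minus an allowed lateness) are all assumptions chosen for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical tumbling-window sum driven by an event-time watermark.
/// All names are illustrative; none of them are DataFusion APIs.
pub struct TumblingWindowSum {
    window_size_ms: i64,
    allowed_lateness_ms: i64,
    /// Largest event time observed so far.
    max_event_time_ms: i64,
    /// Running sums of still-open windows, keyed by window start.
    open: BTreeMap<i64, f64>,
}

impl TumblingWindowSum {
    pub fn new(window_size_ms: i64, allowed_lateness_ms: i64) -> Self {
        assert!(window_size_ms > 0, "window size must be positive");
        Self {
            window_size_ms,
            allowed_lateness_ms,
            max_event_time_ms: i64::MIN,
            open: BTreeMap::new(),
        }
    }

    /// Assumed watermark rule: largest event time seen minus allowed lateness.
    fn watermark(&self) -> i64 {
        self.max_event_time_ms.saturating_sub(self.allowed_lateness_ms)
    }

    /// Ingest one event and return every window it closes as (window_start, sum).
    pub fn push(&mut self, event_time_ms: i64, value: f64) -> Vec<(i64, f64)> {
        let size = self.window_size_ms;
        let start = event_time_ms.div_euclid(size) * size;

        // Accept the event only if its window has not already been closed
        // by the current watermark; otherwise it is too late and is dropped.
        if start + size > self.watermark() {
            *self.open.entry(start).or_insert(0.0) += value;
        }

        // Advance the watermark, then emit windows whose end it has passed.
        self.max_event_time_ms = self.max_event_time_ms.max(event_time_ms);
        let watermark = self.watermark();
        let ready: Vec<i64> = self
            .open
            .keys()
            .copied()
            .take_while(|s| s + size <= watermark)
            .collect();

        let mut closed = Vec::new();
        for s in ready {
            if let Some(sum) = self.open.remove(&s) {
                closed.push((s, sum));
            }
        }
        closed
    }
}
```

Unlike a SQL window function, which emits one output row per input row, this sketch emits one row per group (here, per time window), which is why a streaming window resembles an aggregation. Emission is triggered by the watermark rather than by end of input, so the operator never has to see the whole stream.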
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response