[SPARK-26655] [SS] Support multiple aggregates in append mode #23576
Conversation
This patch proposes to add support for multiple aggregates in append mode. In append mode, the aggregates are emitted only after the watermark passes the threshold (e.g. the window boundary), and the emitted value is not affected by further late data. This allows chaining multiple aggregates in 'Append' output mode without worrying about retractions etc.
However, the current event time watermarks in Structured Streaming are tracked at a global level, and this does not work when aggregates are chained. The downstream watermarks usually lag the upstream ones, and a global (min or max) watermark will not let the stages make progress independently.
The patch tracks the watermarks at each (stateful) operator so that the aggregate outputs are generated when the watermark passes the threshold at the corresponding stateful operator. The values are also saved into the commit/offset logs (similar to the global watermark). Each aggregate should have a corresponding watermark defined while creating the query (e.g. via withWatermark), and this is used to track the progress of event time for the corresponding stateful operator.
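As a rough illustration of the kind of query this is meant to enable (a sketch only; the column names, window sizes and delay thresholds are made up, and the key point is simply a withWatermark before each groupBy):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, sum, window}

// `events` is assumed to be a streaming DataFrame with an `eventTime` timestamp column.
def chainedAggregates(events: DataFrame): DataFrame =
  events
    .withWatermark("eventTime", "10 seconds")                 // watermark for the first aggregate
    .groupBy(window(col("eventTime"), "5 seconds"))           // counts per 5-second window
    .count()
    .select(col("window.end").as("windowTime"), col("count"))
    .withWatermark("windowTime", "10 seconds")                // watermark for the second aggregate
    .groupBy(window(col("windowTime"), "1 minute"))           // roll up into 1-minute windows
    .agg(sum("count").as("total"))
```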
Test build #101380 has finished for PR 23576 at commit
HeartSaVioR left a comment
I would like to spend more time taking a deeper look at the code change, but the concept looks good to me. Left some comments on the test part.
It would also be better to describe some of the use cases here.
// Latest watermark value is more than that used in this previous executed plan
val watermarkHasChanged =
-  eventTimeWatermark.isDefined && newMetadata.batchWatermarkMs > eventTimeWatermark.get
+  eventTimeWatermark.isDefined &&  getWatermark(newMetadata) > eventTimeWatermark.get
nit: two spaces after &&
testStream(windowedAggregation)(
  AddData(inputData, 10, 11, 11, 12, 12),
  CheckNewAnswer(),
  AddData(inputData, 25), // watermark -> group1 = 15, group2 = 10
It might be better to explain which rows are emitted from the first aggregation, to help developers (reviewers for now) verify the value of the watermark for the second aggregation.
(I can imagine that would be (10, 15, 5), but better to show it explicitly.)
Now I see you're adding the current states in the other test. It would be ideal to have a similar level of explanation here.
Added more comments
  CheckNewAnswer(),
  AddData(inputData, 25), // watermark -> group1 = 15, group2 = 10
  CheckNewAnswer(),
  assertNumTotalStateRows(3),
Same here: better to explain them, especially since we have two different state operators.
  AddData(inputData, 26, 26, 27),
  CheckNewAnswer(),
  AddData(inputData, 40), // watermark -> group1 = 30, group2 = 25
  CheckNewAnswer((15, 1, 1), (15, 2, 2))
This query doesn't look intuitive and is a bit hard to track, since the second aggregation groups on an aggregated value from the first aggregation.
Could we change the query a bit? I guess one example would be having user as a column, grouping by window and user in the first aggregation (so aggregated per window and user), and grouping by window in the second aggregation (so aggregated per window, across all users).
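A sketch of the suggested query shape (illustrative names only, not actual test code; `userEvents` stands in for a streaming DataFrame with `user` and `eventTime` columns):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, sum, window}

def perWindowAcrossUsers(userEvents: DataFrame): DataFrame = {
  val perUserCounts = userEvents
    .withWatermark("eventTime", "10 seconds")
    .groupBy(window(col("eventTime"), "5 seconds"), col("user"))   // per window and user
    .count()

  perUserCounts
    .select(col("window.end").as("windowTime"), col("count"))
    .withWatermark("windowTime", "10 seconds")
    .groupBy(window(col("windowTime"), "5 seconds"))               // per window, across users
    .agg(sum("count").as("totalCount"))
}
```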
I wanted to check with multiple keys as part of the second group as well. I have added comments so that it's easy to follow. Also added an additional test case like the one you suggested, with the required comments.
| test("multiple aggregates in append mode recovery") { | ||
| val inputData = MemoryStream[Int] | ||
|
|
||
| val windowedAggregation = inputData.toDF() |
Same here: I would ask you to consider changing the query.
Test build #101415 has finished for PR 23576 at commit
HeartSaVioR left a comment
Looks good overall. Left some comments.
val watermarkAttributes = aggregate.groupingExpressions.collect {
  case a: Attribute if a.metadata.contains(EventTimeWatermark.delayKey) => a
}
aggregates.foreach(aggregate => {
nit: foreach { aggregate =>
https://github.com/databricks/scala-style-guide/blob/master/README.md#anonymous-methods
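For reference, a minimal sketch of the two forms the nit contrasts, shown on a plain Seq (illustrative only):

```scala
val items = Seq(1, 2, 3)

// Preferred per the style guide: braces, with the parameter on the opening line.
items.foreach { item =>
  println(item)
}

// Rather than parentheses wrapping an anonymous function with a block body:
items.foreach(item => {
  println(item)
})
```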
  // since watermark of group2 is at 10
  CheckNewAnswer(),
  assertNumTotalStateRows(3), // {[25-30],25} -> 1 in state1 and
  // {[30-35],1} -> 1, {[30-35],1} -> 1 {[30-35],2} -> 2 in state2
window2 [15 - 20] -> (1,1), (2,2) is retained in state2, and the explanation here is different. Looks like the former is correct.
yes correct
  // window1 [40 - 45] -> (40,1) is emitted down
  // window1 [55 - 60] -> (55, 1) is retained in state1
  // window2 [30 - 35] -> (1,2), (2,1) is emitted out
  // window2 [40 - 45] -> (40, 1) is retained in state2
(1, 1)
yes, thanks for noticing.
  case s: StatefulOperator => s
}

statefulOperators.foreach(statefulOperator => {
Same here for nit: foreach { statefulOperator =>
}

// compute watermark for the stateful operator node
statefulOperator.stateInfo.foreach(state => {
Same here for nit: foreach { state =>
updateWaterMarkMap(eventTimeExecs,
  statefulOperatorToEventTimeMap.getOrElseUpdate(state.operatorId,
    new mutable.HashMap[Int, Long]()))
val newWatermarkMs = statefulOperatorToEventTimeMap(state.operatorId).values.toSeq.min
We may want to apply the watermark policy here as well, like policy.chooseGlobalWatermark(statefulOperatorToEventTimeMap(state.operatorId).values.toSeq)
I think with a global watermark there needs to be a way to make progress across all stateful operators, and it looks like min did not work in all cases.
But I am not sure it would make sense to choose max for the individual operator-level watermark. If the event times in one of the inputs are lagging, the best option is to not advance the watermark beyond it. Watermarks should ideally advance only when all input data has been observed, and choosing max would cause more events to be discarded as late data.
IMO we can just choose min here, but I would like to hear opinions from the other reviewers as well.
OK, I would also defer to the other reviewers. Thanks for sharing your thoughts on this.
Just to record here, I agree this should be just min.
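For reference, a minimal sketch of the min-based combination being agreed on here (the helper name and signature are illustrative, not the PR's code; the "max" branch would mirror the behaviour of the spark.sql.streaming.multipleWatermarkPolicy=max setting mentioned in the policy discussion above):

```scala
// Combine the event-time values tracked for a single stateful operator into that
// operator's watermark. "min" (the default) never advances past the slowest input;
// "max" advances with the fastest. Assumes a non-empty input sequence.
def combineOperatorWatermark(eventTimesMs: Seq[Long], policy: String = "min"): Long =
  policy match {
    case "max" => eventTimesMs.max
    case _     => eventTimesMs.min
  }
```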
cc. to @zsxwing and @gaborgsomogyi as well
HeartSaVioR left a comment
LGTM except for the deferred decision: whether to always use min, or to use the user-configured policy (max or min), for the watermark. I guess other reviewers will provide opinions, so LGTM here.
Test build #101655 has finished for PR 23576 at commit
(Just leaving a note that I am looking at this; it's taking me a while to think through the details of the watermarking.)
@jose-torres, did you get a chance to take a look? Let me know how to take it forward.
jose-torres left a comment
I think this needs to be a subset of a more general proposal to make watermarks non-global. It’s not obvious to me that it’s valid for a stateful operator to reach through an arbitrary child and grab a watermark from the other side.
statefulOperators.foreach { statefulOperator =>
  // find the first event time child node(s)
  val eventTimeExecs = statefulOperator match
I’m not sure I understand this. Don’t we throw away this val immediately when we leave the foreach scope?
We use this to finally update the statefulOperatorToWatermark below (it's passed to updateWaterMarkMap).
So basically we collect all the event time inputs (via EventTimeWatermarkExec) to a stateful operator. The (input) watermark of that stateful operator is then the minimum of all the input watermarks (the event times minus the lag) coming into that node.
But what's the value of eventTimeExecs when we reach line 145? I don't understand how it's in scope at all, and if it is it seems that it would have only the value computed for the last stateful operator in statefulOperators. Maybe there's some Scala magic I'm missing here.
Actually, line 145 is inside the statefulOperators.foreach. Maybe I need to refactor it out into a separate method to make it more readable.
@jose-torres, we don't pick an arbitrary child. We consider all the event time inputs (children) to the stateful operator and compute the watermark as the minimum of all the input watermarks (which is the event time minus the lag of each child EventTimeWatermarkExec).
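To make the described computation concrete, a rough sketch (the types and names are illustrative, not the PR's code; maxEventTimeMs/delayMs stand in for the stats each watermark-defining child reports):

```scala
// Per-input stats: the max event time observed on that input and the
// delay ("lag") configured by its withWatermark.
case class InputEventTimeStats(maxEventTimeMs: Long, delayMs: Long)

// The stateful operator's input watermark is the minimum of (max event time - delay)
// over all such inputs; None if the operator has no watermark-defining child.
def operatorInputWatermarkMs(inputs: Seq[InputEventTimeStats]): Option[Long] =
  if (inputs.isEmpty) None
  else Some(inputs.map(s => s.maxEventTimeMs - s.delayMs).min)
```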
Do you have some idea or plan in mind for non-global watermarks? Just curious, because that might be one of the major conceptual changes.
I am not sure if you mean the way the watermarks are computed or how they propagate... anyway, that seems orthogonal. Here we compute the operator watermark as the minimum of the input watermarks, which should hold regardless. I don't see another reasonable way to make progress at each operator.
I'd agree that min is the only reasonable way to compute an operator watermark. What I think we need a design for is operator watermarks in general, and how they slot into the rest of Spark. Questions I worry can't be addressed by a PR include:
Typically C cannot, since A is the input watermark of B and, assuming it does some aggregation, it needs to emit a new watermark. There's a new check for a query like:

input.withWatermark("ts", ...)
  .groupBy(window($"ts", ...), $"key").count()
  .select($"window.end" as "windowts", $"count")
  .withWatermark("windowts", ...)
  .groupBy(window($"windowts", ...), $"count").count()

Need to check if it would interfere with multiple watermarks or whether we need any new rules.
I thought
I have slightly modified the watermark computation logic and the details are documented here and attached to the JIRA. Please review.
Test build #102523 has finished for PR 23576 at commit
Test build #102564 has finished for PR 23576 at commit
ping @jose-torres, @tdas for further reviews.
I'm not going to give a hard no, but my review is that I don't think we should move forward with this PR without a full design for the new concept of operator-local watermarks.
@jose-torres, thanks for the comment. The current approach allows users to chain aggregates in append mode as long as they define a watermark per aggregate. It's a reasonable approach that I could figure out within the existing framework. We could relax this restriction, but we'd need a way to choose the timestamp field per aggregate if the user does not explicitly specify it. The other alternative I explored would require a separate watermark channel and would need more disruptive changes. Could you review the doc and comment on the areas where you would like more info and/or alternatives that can be explored, so that we can agree on a more solid design to proceed further?
I would want a proposal that's a commit rather than a diff, if that makes sense. Something in the form of:
What I'm worried about is possibilities like this. Suppose we decide that we want to support multiple aggregates in complete mode in 3.1, and realize that we need a separate watermark channel in that case. Then we'll be stuck; we will be forced to either break the semantic we just added, or establish a weird piecemeal semantic where you specify watermarks differently depending on the shape of your query.
The modes and watermarks should be independent, but anyway let me explore it a bit further and also try to address your other points in the design.
This PR is definitely very interesting, thanks! Having support for multiple aggregations in streaming mode is very important. I saw the design document for wider watermark support got no comments. I added a comment.
The output watermark can be computed as some function of the input watermark and the timestamps of events at that operator (e.g. min(input watermarks, timestamp of the oldest event at that node)), so we could compute one from the other by storing only the input watermark. For now, we require the user to provide a timestamp column + lag using "withWatermark()" before each aggregate operation. Here the window.end of the first groupBy is the output watermark, which becomes the input watermark of the second groupBy. Also note that the input watermark of an operator is propagated to the next operator only in the next batch, so that it processes the events first and then the watermark. Let me know the specific cases where you found issues.
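A minimal sketch of that relationship (a hypothetical helper, not the PR's actual code):

```scala
// An operator's output watermark is bounded by its input watermark and by the oldest
// event still retained in its state; with nothing retained it forwards the input watermark.
def outputWatermarkMs(inputWatermarkMs: Long, oldestRetainedEventTimeMs: Option[Long]): Long =
  oldestRetainedEventTimeMs.fold(inputWatermarkMs)(math.min(inputWatermarkMs, _))
```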
I've left comments in the doc. Sorry, I shouldn't leave comments here and cause confusion. Let's talk in the doc. Btw, I guess this patch is not addressing the doc yet, so you may want to mark this patch as
@arunmahadevan any updates on this feature or on the watermark design document?
I revisited and thought about this briefly, and felt that the watermark and output modes Spark provides are different from other frameworks. Append mode is tricky if you are familiar with other frameworks. In Append mode, Spark tries to ensure there's only one output for each key, and the "delay threshold" is taken into consideration as well. AFAIK, Flink emits another output if a late-but-allowed tuple comes later than the watermark and updates the output, hence dealing with "upsert" is necessary. (Not sure about Beam, but I guess Flink follows the Beam model, so I would expect something similar.) In Spark, "upsert" is not yet defined for DSv2, and hence UPDATE mode will be disabled for Spark 3. (#23859)

Suppose there's a stateful operator OP1 with batch B2, and a watermark is defined before OP1 with the delay threshold set to 1hr. The range of outputs OP1 can emit in B2 is the following:

as it denotes the outputs which were not evicted (emitted) in the previous batch but match the condition for evicting (emitting) in this batch. If we have OP2 with OP1 as upstream, it will retrieve outputs as above, and to not drop any intermediate outputs, either 1) OP2 should inherit WM(OP1B1) as WM(OP2B2) and also have an equal or bigger delay threshold, or 2) OP2 should define WM(OP2B2) as

Maybe that's less important, as I can't think of a safe approach in the current state of Spark. I think Spark may need to make some changes before introducing advanced features. I think the main issue of Spark Structured Streaming is being "flexible" on watermarks, flexible enough to let end users mess up their query easily. I assume other frameworks have a special field for "event time" and prevent modifying that field, but in Spark it's just the same as any other column and open for modification. If the event time is modified, it's no longer in line with the watermark and the result would be indeterministic. Same for

Similarly, which is the event time column for a stream-stream joined output where an event time column is defined per input? I'm not seeing a clear definition of this.

I'd be in favor of letting the streaming engine manage event time and watermark once the value of event time is defined, and restricting end users from modifying event time (one-time update). To achieve this, each row should have a meta-column of "event time", and once it's defined, further updates should be done only from the Spark side - each stateful operator needs to decide the event time of its output according to its input(s) and its watermark. (E.g. for windowed aggregation, "window.start" should be used for the "event time" and it shouldn't be changed.) That's the only way Spark could ensure event time and watermark are in sync across multiple stateful operations.
Hi @HeartSaVioR, thanks for your feedback.

Regarding late data with Beam: indeed, when an element comes behind the watermark but before the allowed lateness, it delays window closing. So an element that comes in that range counts in the output data. If it comes after the allowed lateness, it is dropped.

Regarding output mode: most of the Beam runners (Spark for example) support discarding output, in which elements from different windows are independent and previous states are dropped. It seems very similar to Spark's append mode.

Regarding event time: indeed, Beam forbids modifying it, and there is an event time per element, the same way you suggest an event time per row in Spark. Also, our answer to the stream-stream join watermark is: we take the minimum of the output watermarks of the previous stages. But that is because Beam's watermark is based on the minimum event timestamp seen. Also, stateful operators do not change the event timestamp; nobody does. That is why we defined input and output watermarks to introduce this delay.

Still, there is something I do not understand. With the previous Spark DStream framework, multiple aggregations were supported. What has changed in Spark's watermark behavior that makes it unsupported now with Structured Streaming?
I'm not sure I understand it correctly. The point of Append mode is that the output for a specific key (the key doesn't have to be windowed, but should include the "event time" column) will be provided only once in any case (this is orthogonal to fault tolerance and doesn't mean "exactly-once" here), regardless of allowed lateness; there is no case of "upsert". If Beam doesn't close the window when the watermark passes by (but the watermark hasn't yet passed the allowed lateness) and instead triggers the window and emits its output so far (so output could be emitted multiple times), it's not compatible with Spark's Append mode.

Stream-stream join should decide which "event time" should be taken even if we change the way event time is stored, as there are two rows being joined. How does Beam decide the "event time" of the new record formed from two records? With column-based event time (current Spark), it would be hard to choose the "min" or "max" of the event times, as which column to pick as the event time has to be decided at query planning phase.
Sorry, I'm not aware of DStream's behavior; I kind of started from Structured Streaming and didn't pay much attention to DStream. But as there's no notion of event time and watermark in the DStream docs, I'd rather avoid dealing with DStream for event-time processing. When you're playing with DStream, you'll likely be doing processing time, limited by the batch duration. http://spark.apache.org/docs/latest/streaming-programming-guide.html
Beam does not trigger output unless the watermark passes the end of the window + allowed lateness. There is no triggering between the end of the window and the allowed lateness. Closing and output happen at the same time.
Ah, I thought we were talking about the watermark. For choosing the event timestamp, Beam uses a TimestampCombiner whose default policy is to set the resulting timestamp of the new record to the end of the window.
Ah, OK, I see. That looks similar to Append mode. That's a bit different from what I read in a book about Flink, so I assume there are some differences between Beam and Flink... (BTW, I also read "Streaming Systems", though it mostly explains theory and doesn't have many details on Beam.)
That seems to only explain the case where a window is applied. How does it work for other cases? Does it keep the original event timestamp as it is? For a windowed stream-stream join it also makes sense, but there are non-windowed stream-stream joins as well, and then the output should have only one event time whereas there are two inputs.
Windows are mandatory in streaming mode in Beam (otherwise there is no trigger time and no output). But if you are in batch mode (the only case where you can have no window), then the timestamps of all elements are set to +INF. PS: I'm simplifying a bit; in reality we can replace windows with configured triggers that can be based on the number of elements or processing time, but as they don't exist in Spark I did not mention them here.
@HeartSaVioR will you be at the ApacheCon in September so that we can meet in person and discuss these topics?
Maybe I need to go through both Beam and Flink to understand the details and discuss them at a detailed level. That might take some time, as I may need to spend my own time to cover it. And @echauchot, thanks for asking, but I won't be at the ApacheCon - I might consider attending the event eventually when ApacheCon plans to be held in east Asia (I'm in S. Korea). You may also want to consider that I'm just one of the contributors to the Spark project, and without long-term support (a shepherd) from the community (committers/PMC members), I couldn't put effort into this huge major feature. (So if you have a chance to meet some PMC members of Apache Spark in person, that would be a better chance for you.) Moreover, the necessary effort seems to be beyond what I could spend in my own time, so persuading my employer might also be needed.
cc. @tdas @zsxwing @jose-torres to see whether committers in this area are interested in this topic or not.
@HeartSaVioR thanks for pinging the right guys!
We're closing this PR because it hasn't been updated in a while. If you'd like to revive this PR, please reopen it!
I checked the Structured Streaming programming guide for 3.0, in case this was addressed by another PR and the guide missed an update with 3.0. But it seems it is still unsupported. Any luck reopening this PR so that we can have multiple aggregations in Spark in streaming mode? It is a valuable feature, IMHO!
I too agree that this PR is very valuable; there are a lot of real-time streaming use cases where Spark could be used efficiently.
This PR is reopened based on the community request (from @echauchot and @balajij81).
Ping @arunmahadevan since he is the author. If @HeartSaVioR wants this PR, I believe @HeartSaVioR can take over this when the main author, @arunmahadevan, is busy. Of course, @arunmahadevan is the main author and @HeartSaVioR will be the co-author in the commit. It's up to him and @arunmahadevan. Also, cc @tdas, @zsxwing, @jose-torres, @cloud-fan, @gatorsmile, @dbtsai.
Test build #128921 has finished for PR 23576 at commit
Unfortunately, I would say the conceptual change to watermarks should go through some sort of discussion (or even an SPIP), which means it's a no-op unless we have enough committers who are interested and willing to support it. That said, if @arunmahadevan is willing to revisit this, then I can add myself as a reviewer of the design doc and the following PR, but I alone probably wouldn't be sufficient. I might take it up if I see at least 3 committers in this area (or committers who at least want to follow up on the area and approve once they feel qualified) promise to be committed to this topic. But before that, I really want to drive my own discussion topics on the dev mailing list, for which I don't have any input from committers.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
How was this patch tested?
New and existing unit tests
Please review http://spark.apache.org/contributing.html before opening a pull request.