[SPARK-18124] Observed delay based Event Time Watermarks #15702
Conversation
```scala
    child: SparkPlan) extends SparkPlan {

  // TODO: Use Spark SQL Metrics?
  val maxEventTime = new MaxLong
```
@zsxwing am I doing this right?
```java
  public final long microseconds;

  public final long milliseconds() {
    return this.microseconds / MICROS_PER_MILLI;
```
2 space indent
```scala
    )(sparkSession)).as[T]
  }

  /**
```
need a tag here for experimental
```scala
   * @since 2.1.0
   */
  @Experimental
  @InterfaceStability.Evolving
```
you'd need one that takes in a column wouldn't you?
Test build #67839 has finished for PR 15702 at commit

@ericl - flaky test... Should we turn it off for now? retest this please

I'm still trying to find a failure that includes https://github.com/apache/spark/pull/15701/files. Until then it's hard to debug. Another option might be turning off or adding a retry around this particular test; I'll make another PR for that.

Test build #67866 has finished for PR 15702 at commit
```scala
  override def toString: String = s"$name#${exprId.id}$typeSuffix"

  /** Used to signal the column used to calculate an eventTime watermark (e.g. a#1-T{delayMs}) */
  private def delaySuffix = if (metadata.contains(EventTimeWatermark.delayKey)) {
    s"-T${metadata.getLong(EventTimeWatermark.delayKey)}"
```
is this in milliseconds or microseconds like timestamp type?
```scala
      if (a semanticEquals eventTime) {
        val updatedMetadata = new MetadataBuilder()
          .withMetadata(a.metadata)
          .putLong(EventTimeWatermark.delayKey, delay.milliseconds)
```
I'm a bit confused. Normally Spark SQL uses microsecond precision for TimestampType. When it converts it to LongType, it uses second precision. Here we're using milliseconds. Wouldn't that be super confusing to reason about?
I switched it to using CalendarInterval to make it clearer what units were being used where. I chose milliseconds because it seemed like the right granularity: microseconds are too short for the global coordination required, and seconds lack granularity. It should be easy to change, though, and I'm open to that if there's consensus that this is too confusing.
Updating the key to include Ms
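For concreteness, a small sketch of how a delay expressed as an interval maps to milliseconds. The `milliseconds()` helper is the one added in the diff above; the `fromString` parsing call is my assumption about the `CalendarInterval` API, not code from this PR.

```scala
import org.apache.spark.unsafe.types.CalendarInterval

// "interval 5 minutes" -> 300,000 ms; milliseconds are the granularity stored
// under the watermark delay metadata key.
val delay: CalendarInterval = CalendarInterval.fromString("interval 5 minutes")
val delayMs: Long = delay.milliseconds()
```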
A very dumb question (I apologize): there is nothing stopping a user from actually using processing time as the watermark with this API either; one could easily do that. My biggest confusion here, which I couldn't find documented, was the type of the watermark column. Does it need to be timestamp type or can it be LongType?

Not a dumb question! You can certainly use processing time if those are the semantics you require. I do think there is a little bit of work we need to do to ensure determinism for these functions. Good point on the documentation. The thing you are missing is that it must be used in a window function, which does require a TimestampType column.
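To make that concrete, a minimal sketch (the `events` DataFrame and its `ts` column are hypothetical) of casting a long epoch column to a timestamp so it can be used with `withWatermark` and `window`:

```scala
import org.apache.spark.sql.functions.{col, window}

// Hypothetical streaming DataFrame `events` with a LongType column "ts" (epoch seconds).
// Cast it to TimestampType so it can serve as the event-time / watermark column.
val withEventTime = events.withColumn("eventTime", col("ts").cast("timestamp"))

val counts = withEventTime
  .withWatermark("eventTime", "10 seconds")       // tolerate data up to ~10s late
  .groupBy(window(col("eventTime"), "1 minute"))  // window() requires a TimestampType column
  .count()
```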
Test build #67939 has finished for PR 15702 at commit
```scala
 * - To know when a given time window aggregation can be finalized and thus can be emitted when
 *   using output modes that do not allow updates.
 * - To minimize the amount of state that we need to keep for on-going aggregations.
 *
```
- Should this be "The current watermark is computed..." ?
- what is an epoch, it isn't mentioned in the docs or elsewhere in the PR
Changed to watermark. For epoch, I really just mean "during some period of time where we decide to coordinate across the partitions". This happens at batch boundaries now, but that is not part of the contract we are promising. I just removed that word to avoid confusion.
```scala
 *
 * Spark will use this watermark for several purposes:
 * - To know when a given time window aggregation can be finalized and thus can be emitted when
 *   using output modes that do not allow updates.
```
For append, this sounds like the intention is emit only once watermark has passed, and drop state.
But for other output modes, it's not clear from reading this what the effect of the watermark on emission and dropping state is.
```scala
 *
 * @param eventTime the name of the column that contains the event time of the row.
 * @param delayThreshold the minimum delay to wait to data to arrive late, relative to the latest
 *                       record that has been processed in the form of an interval
```
Should this make it clear what the minimum useful granularity is (ms)?
That seems like more of an implementation detail, rather than a contract of the API. The real contract is stated above as the actual watermark used is only guaranteed to be at least 'delayThreshold' behind the actual event time. There aren't really any bounds we can promise without knowing more about the query (even ms).
```scala
        }

      // Update and output modified rows from the StateStore.
      case Some(Update) =>
```
I'm not clear on why the semantics of Update mean that watermarks shouldn't be used to remove state.
@koeninger, Update should allow the late data to correct the previous results even if they are later than the threshold; a similar implementation is in http://cdn.oreillystatic.com/en/assets/1/event/160/Triggers%20in%20Apache%20Beam%20_incubating_%20Presentation.pdf (search for withLateFirings)... correct me if I'm wrong.
To put it the other way, do the docs in this PR tell you as a user that for any output method other than Append, you are potentially keeping unlimited aggregate state in memory, regardless of whether you set a watermark?
The only output modes that are supported publicly are Complete and Append (update is only available internally for tests). When we add support for Update (I'd like to do this soon), it should also evict tuples which can no longer be updated due to their group falling beneath the watermark. I thought that it was fairly clear that Complete would need to retain the complete set of aggregate state, but I'm happy to make this more explicit if others are confused by this.
Yes, I think it's a good idea to explicitly say for each output mode whether watermarks affect emit and evict. Just so I'm clear, the intention is
Append: affects emit, affects evict
Update: doesn't affect emit, affects evict
Complete: doesn't affect emit, no eviction
Is that right?
That is correct.
Generally, updates should be able to take late arrivals into account (with respect to EndOfWindow) and allow acting upon a user-defined strategy, such as: update for each following element.
Given the concerns Ofir raised about a single far future event screwing up monotonic event time, do you want to document that problem even if there isn't an enforced filter for it?

Test build #67998 has finished for PR 15702 at commit
zsxwing left a comment:

Looks good overall. My comments can be addressed later.
```scala
import org.apache.spark.unsafe.types.CalendarInterval
import org.apache.spark.util.AccumulatorV2

class MaxLong(protected var currentValue: Long = 0)
```
nit: protected -> private
nit: Could you document that this one only supports positive longs?
```scala
class MaxLong(protected var currentValue: Long = 0)
  extends AccumulatorV2[Long, Long]
  with Serializable {
```
nit: not needed. AccumulatorV2 is already Serializable.
```scala
case class ValueUpdated(key: UnsafeRow, value: UnsafeRow) extends StoreUpdate

case class KeyRemoved(key: UnsafeRow) extends StoreUpdate
case class ValueRemoved(key: UnsafeRow, value: UnsafeRow) extends StoreUpdate
```
Any special reason to change this? It seems weird to add an unused field `value`.
It is used. We need the value to emit the result upon eviction.
```scala
        streamMetrics.reportTriggerDetail(EVENT_TIME_WATERMARK, newWatermark)
        currentEventTimeWatermark = newWatermark
      } else {
        logTrace(s"Event time didn't move: $newWatermark < $currentEventTimeWatermark")
```
We need to call streamMetrics.reportTriggerDetail(EVENT_TIME_WATERMARK, newWatermark) here. Otherwise, the trigger details won't have EVENT_TIME_WATERMARK for this batch.
```scala
    }.headOption.foreach { newWatermark =>
      if (newWatermark > currentEventTimeWatermark) {
        logInfo(s"Updating eventTime watermark to: $newWatermark ms")
        streamMetrics.reportTriggerDetail(EVENT_TIME_WATERMARK, newWatermark)
```
Is it fine to just set EVENT_TIME_WATERMARK to 0 if the first batch doesn't have any data (E.g., the filter specified by the user drops all data)?
I think that's okay?
I suggest just fixing it since it's pretty easy. Just `if (newWatermark == 0) "-" else newWatermark.toString`.
I see, that makes sense. I actually just moved it out so we only report it if it's non-zero.
```scala
    child.execute().mapPartitions { iter =>
      val getEventTime = UnsafeProjection.create(eventTime :: Nil, child.output)
      iter.map { row =>
        maxEventTime.add(getEventTime(row).getLong(0))
```
Just a small question: which place will check the eventTime type? I guess getLong just throws an exception if the format is wrong. Can we fail it before starting the spark job?
Added to checkAnalysis.
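For illustration, a simplified sketch of the kind of analysis-time check being described (the method name and error-message wording are my assumptions, not the exact code added to checkAnalysis):

```scala
import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.plans.logical.{EventTimeWatermark, LogicalPlan}
import org.apache.spark.sql.types.{StructType, TimestampType}

// Fail the query before execution if the watermark column is not a timestamp
// (or a window struct whose "end" field is a timestamp), rather than letting
// getLong throw at runtime inside the Spark job.
def validateEventTime(plan: LogicalPlan): Unit = plan.foreach {
  case etw: EventTimeWatermark =>
    etw.eventTime.dataType match {
      case TimestampType => // ok
      case s: StructType if s.find(_.name == "end").exists(_.dataType == TimestampType) =>
        // result of a window() operation, ok
      case other =>
        throw new AnalysisException(
          s"Event time must be defined on a window or a timestamp, but " +
            s"${etw.eventTime.name} is of type $other")
    }
  case _ => // other operators: nothing to check
}
```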
```scala
    CheckAnswer((10, 3)),
    AddData(inputData, 10),   // 10 is later than 15 second watermark
    CheckAnswer((10, 3)),
    AddData(inputData, 25),   // 10 is later than 15 second watermark
```
nit: the comment is wrong
Test build #3397 has finished for PR 15702 at commit

Test build #3398 has finished for PR 15702 at commit
```scala
      case etw: EventTimeWatermark =>
        etw.eventTime.dataType match {
          case s: StructType
              if s.find(_.name == "start").map(_.dataType).contains(TimestampType) =>
```
nit: Option.contains is not in Scala 2.10.
really? lame...
Oh... it should also check the end of the window, not the start...
Test build #68023 has finished for PR 15702 at commit

Test build #68025 has finished for PR 15702 at commit
tdas left a comment:

Major feedback - Python API for withWatermark()?
Other than that it's looking good.
```scala
    operator match {
      case etw: EventTimeWatermark =>
        etw.eventTime.dataType match {
          case s: StructType
```
Which high level case is caught by this condition?
The result of a window operation.
```scala
  }

  override def add(v: Long): Unit = {
    if (value < v) { currentValue = v }
```
nit: less confusing to read as `if (currentValue < v) { currentValue = v }`.
In fact, why not use math.max?
```scala
  }

  override def merge(other: AccumulatorV2[Long, Long]): Unit = {
    if (currentValue < other.value) {
```
nit: same as above, why not use math.max
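For reference, a minimal sketch of the accumulator written with the suggested math.max (a simplified stand-in for the class in this PR; as noted above, it is only meaningful for non-negative values because the initial value is 0):

```scala
import org.apache.spark.util.AccumulatorV2

// Tracks the maximum Long value seen across tasks; used here to find MAX(eventTime).
class MaxLong(private var currentValue: Long = 0L) extends AccumulatorV2[Long, Long] {
  override def isZero: Boolean = currentValue == 0L
  override def copy(): MaxLong = new MaxLong(currentValue)
  override def reset(): Unit = { currentValue = 0L }
  override def add(v: Long): Unit = { currentValue = math.max(currentValue, v) }
  override def merge(other: AccumulatorV2[Long, Long]): Unit = {
    currentValue = math.max(currentValue, other.value)
  }
  override def value: Long = currentValue
}
```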
Test build #68029 has finished for PR 15702 at commit
LGTM, pending tests.

Test build #68496 has finished for PR 15702 at commit

jenkins, test this please

Test build #68504 has finished for PR 15702 at commit

Test build #68631 has finished for PR 15702 at commit

jenkins test this please

Test build #68637 has finished for PR 15702 at commit

I am merging this to master and 2.1
This PR adds a new method `withWatermark` to the `Dataset` API, which can be used to specify an _event time watermark_. An event time watermark allows the streaming engine to reason about the point in time after which we no longer expect to see late data. This PR also augments `StreamExecution` to use this watermark for several purposes:
- To know when a given time window aggregation is finalized and thus results can be emitted when using output modes that do not allow updates (e.g. `Append` mode).
- To minimize the amount of state that we need to keep for on-going aggregations, by evicting state for groups that are no longer expected to change. We do still maintain all state if the query requires it (i.e., if the event time is not present in the `groupBy` or when running in `Complete` mode).
An example that emits windowed counts of records, waiting up to 5 minutes for late data to arrive.
```scala
df.withWatermark("eventTime", "5 minutes")
.groupBy(window($"eventTime", "1 minute") as 'window)
.count()
.writeStream
.format("console")
.mode("append") // In append mode, we only output finalized aggregations.
.start()
```
### Calculating the watermark.
The current watermark is computed by looking at the `MAX(eventTime)` seen this epoch across all of the partitions in the query, minus some user-defined _delayThreshold_. An additional constraint is that the watermark must increase monotonically.
Note that since we must coordinate this value across partitions occasionally, the actual watermark used is only guaranteed to be at least `delay` behind the actual event time. In some cases we may still process records that arrive more than delay late.
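A minimal sketch of the update rule this describes (names are illustrative, not the exact code in `StreamExecution`):

```scala
// Computed once per coordination point (currently a batch boundary).
// maxEventTimeMs: MAX(eventTime) observed across all partitions this batch, in ms.
// delayMs: the user-supplied delayThreshold, in ms.
def nextWatermark(currentWatermarkMs: Long, maxEventTimeMs: Long, delayMs: Long): Long =
  // Trail the max observed event time by the delay, and never move backwards.
  math.max(currentWatermarkMs, maxEventTimeMs - delayMs)
```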
This mechanism was chosen for the initial implementation over processing time for two reasons:
- it is robust to downtime that could affect processing delay
- it does not require syncing of time or timezones between the producer and the processing engine.
### Other notable implementation details
- A new trigger metric `eventTimeWatermark` outputs the current value of the watermark.
- We mark the event time column in the `Attribute` metadata using the key `spark.watermarkDelay`. This allows downstream operations to know which column holds the event time. Operations like `window` propagate this metadata (see the sketch after this list).
- `explain()` marks the watermark with a suffix of `-T${delayMs}` to ease debugging of how this information is propagated.
- Currently, we don't filter out late records, but instead rely on the state store to avoid emitting records that are both added and filtered in the same epoch.
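To illustrate the metadata tagging, a minimal sketch based on the diff snippet earlier in this conversation (attribute resolution is elided; `a` stands for the resolved event-time attribute, and the exact key string is whatever `EventTimeWatermark.delayKey` defines):

```scala
import org.apache.spark.sql.types.MetadataBuilder

// Tag the resolved event-time attribute `a` with the watermark delay in milliseconds,
// so downstream operators (e.g. window) know which column carries the event time.
val updatedMetadata = new MetadataBuilder()
  .withMetadata(a.metadata)
  .putLong(EventTimeWatermark.delayKey, delay.milliseconds)
  .build()
val taggedEventTime = a.withMetadata(updatedMetadata)
```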
### Remaining in this PR
- [ ] The test for recovery is currently failing as we don't record the watermark used in the offset log. We will need to do so to ensure determinism, but this is deferred until #15626 is merged.
### Other follow-ups
There are some natural additional features that we should consider for future work:
- Ability to write records that arrive too late to some external store in case any out-of-band remediation is required.
- `Update` mode so you can get partial results before a group is evicted.
- Other mechanisms for calculating the watermark. In particular a watermark based on quantiles would be more robust to outliers.
Author: Michael Armbrust <michael@databricks.com>
Closes #15702 from marmbrus/watermarks.
(cherry picked from commit c071878)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>