[SPARK-19719][SS] Kafka writer for both structured streaming and batch queries #17043
Conversation
Test build #73436 has finished for PR 17043 at commit

Test build #73438 has finished for PR 17043 at commit

Test build #73443 has finished for PR 17043 at commit
zsxwing left a comment:
Made one pass. Most of my comments are minor.
import org.apache.spark.sql.execution.streaming.Sink

private[kafka010] class KafkaSink(
sqlContext: SQLContext,
nit: 4 spaces
sqlContext: SQLContext,
executorKafkaParams: ju.Map[String, Object],
defaultTopic: Option[String]) extends Sink with Logging {
var latestBatchId = -1L
nit: private
val STARTING_OFFSETS_OPTION_KEY = "startingoffsets"
val ENDING_OFFSETS_OPTION_KEY = "endingoffsets"
val FAIL_ON_DATA_LOSS_OPTION_KEY = "failondataloss"
val TOPIC_OPTION_KEY = "topic"
nit: looks like only this one needs to be public
partitionColumns: Seq[String],
outputMode: OutputMode): Sink = {
val caseInsensitiveParams = parameters.map { case (k, v) => (k.toLowerCase, v) }
val defaultTopic = caseInsensitiveParams.get(TOPIC_OPTION_KEY).map(_.trim.toLowerCase)
Remove toLowerCase. Kafka topics are case-sensitive.
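i.e., something like this (a sketch of the suggested change):

```scala
// Kafka topic names are case-sensitive, so only trim, don't lowercase.
val defaultTopic = caseInsensitiveParams.get(TOPIC_OPTION_KEY).map(_.trim)
```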
outputMode: OutputMode): Sink = {
val caseInsensitiveParams = parameters.map { case (k, v) => (k.toLowerCase, v) }
val defaultTopic = caseInsensitiveParams.get(TOPIC_OPTION_KEY).map(_.trim.toLowerCase)
val specifiedKafkaParams =
Need to throw an exception if the user specifies a serializer, like the source does for deserializers. Also need to add tests. For reference, the source's check:

```scala
if (caseInsensitiveParams.contains(s"kafka.${ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG}")) {
  throw new IllegalArgumentException(
    s"Kafka option '${ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG}' is not supported as keys "
      + "are deserialized as byte arrays with ByteArrayDeserializer. Use DataFrame operations "
      + "to explicitly deserialize the keys.")
}
if (caseInsensitiveParams.contains(s"kafka.${ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG}")) {
  throw new IllegalArgumentException(
    s"Kafka option '${ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG}' is not supported as "
      + "value are deserialized as byte arrays with ByteArrayDeserializer. Use DataFrame "
      + "operations to explicitly deserialize the values.")
}
```
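A sketch of the corresponding producer-side check (the ProducerConfig constants and the exact messages are my assumption, mirroring the source's check above):

```scala
import org.apache.kafka.clients.producer.ProducerConfig

// Reject user-supplied serializers: the sink always writes byte arrays.
if (caseInsensitiveParams.contains(s"kafka.${ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG}")) {
  throw new IllegalArgumentException(
    s"Kafka option '${ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG}' is not supported as keys "
      + "are serialized with ByteArraySerializer.")
}
if (caseInsensitiveParams.contains(s"kafka.${ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG}")) {
  throw new IllegalArgumentException(
    s"Kafka option '${ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG}' is not supported as "
      + "values are serialized with ByteArraySerializer.")
}
```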
| "save mode overwrite not allowed for kafka")) | ||
| } | ||
|
|
||
| test("write big data with small producer buffer") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you clarify what's the purpose of this test?
}

test("write batch with null topic field value, and no topic option") {
val df = spark
nit: val df = Seq[(String, String)](null -> "1").toDF("topic", "value")
| test("write batch unsupported save modes") { | ||
| val topic = newTopic() | ||
| testUtils.createTopic(topic) | ||
| val df = spark |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: val df = Seq[(String, String)](null -> "1").toDF("topic", "value")
| test("write batch to kafka") { | ||
| val topic = newTopic() | ||
| testUtils.createTopic(topic) | ||
| val df = spark |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: val df = Seq("1", "2", "3", "4", "5").map(v => (topic, v)).toDF("topic", "value")
override def runAction(): Unit = {
ms.addData(values)
q.processAllAvailable()
Thread.sleep(5000) // wait for data to appear in Kafka
Don't use sleep. How about passing a latest offset to AddMoreData and waiting until you can see it via KafkaTestUtils.getLatestOffsets?
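A rough sketch of that wait (the expected-offset bookkeeping is hypothetical; getLatestOffsets is the KafkaTestUtils helper mentioned above):

```scala
import org.scalatest.concurrent.Eventually._
import org.scalatest.time.SpanSugar._

// Instead of Thread.sleep, poll until the written records are visible in Kafka.
// `expectedTotalOffset` is a hypothetical value that AddMoreData would carry along.
eventually(timeout(30.seconds)) {
  val latestTotal = testUtils.getLatestOffsets(Set(topic)).values.sum
  assert(latestTotal >= expectedTotalOffset, "written data is not yet visible in Kafka")
}
```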
@tcondie see my higher-level comments about refactoring the tests.
It looks quite good, but some changes are still needed.
- Tests can be improved a lot; they are too verbose.
- Validation of the schema needs to be done before creating the sink, so that an AnalysisException is thrown before the query has started.
import org.apache.spark.sql.execution.streaming.Sink

private[kafka010] class KafkaSink(
sqlContext: SQLContext,
incorrect indent
sqlContext: SQLContext,
executorKafkaParams: ju.Map[String, Object],
defaultTopic: Option[String]) extends Sink with Logging {
var latestBatchId = -1L
make this volatile, just in case we ever parallelize things.
}

}
revert
private[kafka010] class KafkaSink(
sqlContext: SQLContext,
executorKafkaParams: ju.Map[String, Object],
defaultTopic: Option[String]) extends Sink with Logging {
change to topic.
data.queryExecution, executorKafkaParams, defaultTopic)
latestBatchId = batchId
}
}
Set a good toString() so that it shows up nicely in the StreamingQueryProgress.
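For example, something along these lines (a sketch, not the final wording, assuming the sink keeps the topic option around):

```scala
override def toString: String = s"KafkaSink(topic=$topic)"
```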
}

test("write data with bad schema") {
The bad-schema error should be thrown immediately when start() is called, rather than later after the query has started. You have to validate before creating the sink object for this.
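A sketch of what validating before creating the sink could look like (the helper name and the rules are assumptions based on the topic/key/value schema used in this PR):

```scala
import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.expressions.Attribute

// Hypothetical pre-start validation: fail fast with AnalysisException
// instead of failing inside addBatch after the query has started.
private def validateQuery(schema: Seq[Attribute], topic: Option[String]): Unit = {
  if (schema.find(_.name == "topic").isEmpty && topic.isEmpty) {
    throw new AnalysisException("topic option required when no 'topic' attribute is present")
  }
  if (schema.find(_.name == "value").isEmpty) {
    throw new AnalysisException("required attribute 'value' not found")
  }
}
```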
private def newTopic(): String = s"topic-${topicId.getAndIncrement()}"

test("write to stream with topic field") {
This is a hugely verbose and roundabout way of testing Kafka; using the streaming source to test the streaming sink is complicated and hard to debug when things go wrong. I think we should investigate a better way. How about you try this, similar to the FileStreamSinkSuite:
- Use the memory stream as you are doing, just use AddData (no need for AddMoreData).
- Define a function called checkKafka(expectedData) which will
  - processAllAvailable
  - read the data in Kafka as a batch query and verify the results.

Then the test would look simpler:

```scala
testStream(
  AddData(...),
  checkKafka(...),
  AddData(...),
  checkKafka(...),
  ...
)
```

Here is what the checkKafka method would look like. See
https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/streaming/StateStoreMetricsTest.scala#L22

```scala
def checkKafka[T](expectedResults: T*): AssertOnQuery = AssertOnQuery { q =>
  q.processAllAvailable()
  val kafkaData = spark.read.format("kafka") ....
  checkDataset(expectedResults, kafkaData)
}
```
private val topicId = new AtomicInteger(0)

private def newTopic(): String = s"topic-${topicId.getAndIncrement()}"
I would organize the tests as follows:
- tests for batch queries first, as they exercise fewer code paths
  - names should start with "batch - "
  - these would test all the corner cases (topic not there, etc.)
- tests for streaming queries
  - names should start with "streaming - "
| .selectExpr(s"'$topic' as topic", "CAST(value as INT) as key", "value") | ||
| .writeStream | ||
| .format("kafka") | ||
| .option("checkpointLocation", checkpointDir.getCanonicalPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need a checkpoint location for this test? Simplify each test as much as possible, with as few requirements as possible. Also, a lot of this code is duplicated; make internal functions to reduce duplication.
writer = input.toDF()
.writeStream
.format("kafka")
.option("checkpointLocation", checkpointDir.getCanonicalPath)
why do you need checkpointing?
count += 1
val fieldTypes: Array[DataType] = Array(BinaryType)
val converter = UnsafeProjection.create(fieldTypes)
nit: extra line. And you can make this code simpler:

```scala
val fieldTypes: Array[DataType] = Array(BinaryType)
val converter = UnsafeProjection.create(fieldTypes)
val row = new SpecificInternalRow(fieldTypes)
row.update(0, data)
val iter = Seq.fill(1000)(converter.apply(row)).iterator
writeTask.execute(iter)
```
Test build #73614 has finished for PR 17043 at commit
private[kafka010] class KafkaSink(
sqlContext: SQLContext,
executorKafkaParams: ju.Map[String, Object],
defaultTopic: Option[String]) extends Sink with Logging {
nit: this is still called defaultTopic; it's not a default one.
Test build #73870 has finished for PR 17043 at commit
zsxwing left a comment:
Overall looks good. Left some minor comments.
| s"${SaveMode.ErrorIfExists} (default).") | ||
| case _ => // good | ||
| } | ||
| val defaultTopic = parameters.get(TOPIC_OPTION_KEY).map(_.trim.toLowerCase) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove toLowerCase
* We cannot support this for Kafka. Therefore, in order to make things consistent,
* we return an empty base relation.
*/
new BaseRelation {
Looks like this return value is used by CreateDataSourceTableAsSelectCommand, and Kafka cannot support that. I think it's better to make the methods of this special BaseRelation throw UnsupportedOperationException, in case the returned relation is used by mistake.
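A possible shape for that relation (a sketch; the exact message is an assumption):

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.BaseRelation
import org.apache.spark.sql.types.StructType

// Return a relation whose methods fail loudly if anything ever tries to use it.
new BaseRelation {
  override def sqlContext: SQLContext = unsupportedException
  override def schema: StructType = unsupportedException
  override def needConversion: Boolean = unsupportedException
  override def sizeInBytes: Long = unsupportedException
  private def unsupportedException =
    throw new UnsupportedOperationException(
      "BaseRelation from Kafka write operation is not usable.")
}
```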
.keySet
.filter(_.toLowerCase.startsWith("kafka."))
.map { k => k.drop(6).toString -> parameters(k) }
.toMap + (ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> classOf[BytesSerializer].getName,
ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG should be ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, and ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG should be ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG.
After fixing them, KafkaWriteTask doesn't need to set these configs.
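With those constants swapped in, the producer params would end up roughly like this (a sketch; ByteArraySerializer as the fixed serializer mirrors the source's use of ByteArrayDeserializer):

```scala
import org.apache.kafka.clients.producer.ProducerConfig
import org.apache.kafka.common.serialization.ByteArraySerializer

// Strip the "kafka." prefix from user options and pin both serializers to byte arrays.
parameters
  .keySet
  .filter(_.toLowerCase.startsWith("kafka."))
  .map { k => k.drop(6) -> parameters(k) }
  .toMap + (
    ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG -> classOf[ByteArraySerializer].getName,
    ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG -> classOf[ByteArraySerializer].getName)
```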
inputSchema: Seq[Attribute],
topic: Option[String]) {
// used to synchronize with Kafka callbacks
@volatile var failedWrite: Exception = null
nit: private
topic: Option[String]) {
// used to synchronize with Kafka callbacks
@volatile var failedWrite: Exception = null
val projection = createProjection
nit: private
// used to synchronize with Kafka callbacks
@volatile var failedWrite: Exception = null
val projection = createProjection
var producer: KafkaProducer[Array[Byte], Array[Byte]] = _
nit: private
private def kafkaParamsForProducer(parameters: Map[String, String]): Map[String, String] = {
val caseInsensitiveParams = parameters.map { case (k, v) => (k.toLowerCase, v) }
if (caseInsensitiveParams.contains(s"kafka.${ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG}")) {
ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG should be ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG. Please also fix the exception message.
| + "to explicitly deserialize the keys.") | ||
| } | ||
|
|
||
| if (caseInsensitiveParams.contains(s"kafka.${ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG}")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG should be ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG. Please also fix the exception message.
| .format("kafka") | ||
| .option("kafka.bootstrap.servers", testUtils.brokerAddress) | ||
| .option("topic", topic) | ||
| .save() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test should read from Kafka and compare the results in order to verify the results were written correctly.
Test build #74034 has finished for PR 17043 at commit
private def newTopic(): String = s"topic-${topicId.getAndIncrement()}"

private def createKafkaReader(topic: String): DataFrame = {
super nit: can you move these utility functions (topicId, newTopic, createKafka*) to the end of the class, so that the tests are earlier in the class.
| .option("kafka.bootstrap.servers", testUtils.brokerAddress) | ||
| .option("topic", topic) | ||
| .save() | ||
| checkAnswer(createKafkaReader(topic).selectExpr("CAST(value as STRING) value"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit: formatting.
checkAnswer(
  createKafkaReader(topic).selectExpr("CAST(value as STRING) value"),
  Row("1") :: Row("2") :: Row("3") :: Row("4") :: Row("5") :: Nil)
input: DataFrame,
withTopic: Option[String] = None,
withOutputMode: Option[OutputMode] = None,
withOptions: Option[Map[String, String]] = None)
this can default to empty instead of using option around a map.
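i.e. (a sketch):

```scala
withOptions: Map[String, String] = Map.empty
```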
LGTM to the code except minor nits in tests. Please update the title (remove WIP) and description (add design) and then we can merge this.

Test build #74043 has finished for PR 17043 at commit
Merging this to master. Thank you very much @tcondie
…h queries
## What changes were proposed in this pull request?
Add a new Kafka Sink and Kafka Relation for writing streaming and batch queries, respectively, to Apache Kafka.
### Streaming Kafka Sink
- When addBatch is called (see the sketch below):
-- If batchId is greater than the last written batch
--- Write batch to Kafka
---- Topic will be taken from the record, if present, or from a topic option, which overrides the topic in the record.
-- Else ignore
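A minimal sketch of that addBatch logic (the method signature comes from the Sink interface; the write call mirrors the one visible in the review diff above, and field names are illustrative):

```scala
override def addBatch(batchId: Long, data: DataFrame): Unit = {
  if (batchId <= latestBatchId) {
    // The batch was already written in a previous attempt; skip it on recovery/retry.
    logInfo(s"Skipping already committed batch $batchId")
  } else {
    KafkaWriter.write(sqlContext.sparkSession, data.queryExecution, executorKafkaParams, topic)
    latestBatchId = batchId
  }
}
```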
### Batch Kafka Sink
- KafkaSourceProvider will implement CreatableRelationProvider
- CreatableRelationProvider#createRelation will write the passed-in DataFrame to Kafka
- Topic will be taken from the record, if present, or from the topic option, which overrides the topic in the record.
- Save modes Append and ErrorIfExists are supported with identical semantics; other save modes result in an AnalysisException (see the sketch below).
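A sketch of the save-mode handling (mirroring the snippet visible in the review diff above; the exact message wording is an assumption):

```scala
mode match {
  case SaveMode.Overwrite | SaveMode.Ignore =>
    throw new AnalysisException(s"Save mode $mode not allowed for Kafka. " +
      s"Allowed save modes are ${SaveMode.Append} and ${SaveMode.ErrorIfExists} (default).")
  case _ => // Append and ErrorIfExists fall through and are handled identically.
}
```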
@tdas @zsxwing
## How was this patch tested?
### The following unit tests will be included
- write to stream with topic field: valid stream write with data that includes an existing topic in the schema
- write structured streaming aggregation w/o topic field, with default topic: valid stream write with data that does not include a topic field, but the configuration includes a default topic
- write data with bad schema: various cases of writing data that does not conform to a proper schema e.g., 1. no topic field or default topic, and 2. no value field
- write data with valid schema but wrong types: data with a complete schema but wrong types e.g., key and value types are integers.
- write to non-existing topic: write a stream to a topic that does not exist in Kafka, which has been configured to not auto-create topics.
- write batch to kafka: simple write batch to Kafka, which goes through the same code path as streaming scenario, so validity checks will not be redone here.
### Examples
```scala
// Structured Streaming
val writer = inputStringStream.map(s => s.get(0).toString.getBytes()).toDF("value")
  .selectExpr("value as key", "value as value")
  .writeStream
  .format("kafka")
  .option("checkpointLocation", checkpointDir)
  .outputMode(OutputMode.Append)
  .option("kafka.bootstrap.servers", brokerAddress)
  .option("topic", topic)
  .queryName("kafkaStream")
  .start()

// Batch
val df = spark
  .sparkContext
  .parallelize(Seq("1", "2", "3", "4", "5"))
  .map(v => (topic, v))
  .toDF("topic", "value")

df.write
  .format("kafka")
  .option("kafka.bootstrap.servers", brokerAddress)
  .option("topic", topic)
  .save()
```
Please review http://spark.apache.org/contributing.html before opening a pull request.
Author: Tyson Condie <tcondie@gmail.com>
Closes #17043 from tcondie/kafka-writer.