
Add mu kafka consumer producer #793

Merged: 25 commits merged into master from add-mu-kafka-consumer-producer on Jun 8, 2020
Conversation

@naree (Contributor) commented Feb 17, 2020

What does this do?

https://github.com/47deg/marlow/issues/515
https://github.com/47deg/marlow/issues/542

This PR adds APIs for building a Kafka consumer and producer with minimal effort, using fs2-kafka and models defined and generated with sbt-mu-srcgen.

The Kafka PoC has been updated to use these APIs:

https://github.com/naree/mu-kafka-sandbox

The IT test contains an example that is a good entry point to the API:

https://github.com/higherkindness/mu-scala/blob/add-mu-kafka-consumer-producer/modules/kafka/src/test/scala/higherkindness/mu/kafka/it/example/MuKafkaServiceSpec.scala#L88-L95

Two types of developers were considered as the target users:

  1. FP developers with cats-effect experience only
  2. FP developers with experience in both cats-effect and fs2 streams

The idea is not to shield developers from accessing fs2 streams, but to provide simple APIs to start from.

Additionally, I assumed that developers would be familiar with concurrent programming and understand how to select the correct ContextShift.
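
As a rough illustration, the two levels might look something like this (a sketch only: the names and signatures are reconstructed from snippets quoted later in this review, not the final API):

import cats.effect.IO
import fs2.{Pipe, Stream}

// Stand-in for a model generated by sbt-mu-srcgen (hypothetical).
final case class UserAdded(id: String)

// Level 1: cats-effect-only users hand the library a processing pipe and get
// back an effect that runs the consumer to completion.
def consumer[F[_], A](topic: String, groupId: String)(pipe: Pipe[F, A, A]): F[Unit] = ???

// Level 2: fs2 users get the underlying stream and compose it freely.
def consumerStream[F[_], A](topic: String, groupId: String): Stream[F, A] = ???

// Level-1 usage: log each decoded message.
val logEach: Pipe[IO, UserAdded, UserAdded] =
  _.evalTap(u => IO(println(s"consumed $u")))

val run: IO[Unit] = consumer[IO, UserAdded]("user-added", "group-1")(logEach)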

This PR delivers one of the planned features for mu-kafka. Still to come in later PRs:

  • Documentation
  • Message key generation and propagation strategy
  • IT testing with Docker
  • Multiple message types per topic
  • At-least-once message processing
  • Transactions
  • Configuration
  • Metrics

Checklist

  • Reviewed the diff to look for typos, printlns, and format errors.
  • Updated the docs accordingly.

@naree requested review from serras and tzimisce012 on February 17, 2020 at 11:44
.eval(
  Logger[F].info(
    result.records.head
      .fold("Error: ProducerResult contained empty records.")(a => s"Published $a")
@naree (Contributor, Author) commented Feb 17, 2020:

@pirita This is an example of how an error can be logged inside the fs2 Stream, inside the IO monad. This should become available on Kibana via the normal log aggregation path, as we discussed with @pepegar.

codecov bot commented Feb 17, 2020

Codecov Report

Merging #793 into master will increase coverage by 0.16%.
The diff coverage is 93.54%.


@@            Coverage Diff             @@
##           master     #793      +/-   ##
==========================================
+ Coverage   88.71%   88.87%   +0.16%     
==========================================
  Files          58       61       +3     
  Lines         833      863      +30     
  Branches        2        1       -1     
==========================================
+ Hits          739      767      +28     
- Misses         94       96       +2     
Impacted Files                                          Coverage Δ
...scala/higherkindness/mu/kafka/ProducerStream.scala   85.71% <85.71%> (ø)
...cala/higherkindness/mu/format/AvroWithSchema.scala   100.00% <100.00%> (ø)
...scala/higherkindness/mu/kafka/ConsumerStream.scala   100.00% <100.00%> (ø)
...herkindness/mu/rpc/kafka/KafkaManagementImpl.scala   100.00% <0.00%> (ø)

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 42e5749...59675ea.

  .unsafeRunAsyncAndForget()

kafka
  .consumer(topic, consumerGroup, putConsumeMessageIntoFuture)
@naree (Contributor, Author) commented Feb 17, 2020:

@raulraja putConsumeMessageIntoFuture is the function for processing the message. Devs experienced in fs2 can use higherkindness.mu.kafka.ConsumerStream directly.

}
(consumed, processor)
}

@naree (Contributor, Author) commented:

Below are the two functions for creating a Kafka producer and consumer wrapped in IO monads.

@raulraja (Contributor) left a comment:

😊 looks great

): F[Unit] =
  ConsumerStream(topic, consumerSettings.atLeastOnceFromEarliest(groupId, brokers))
    .through(messageProcessingPipe)
    .compile

A reviewer asked:

Shouldn't we keep the result of the consumer as Stream[F, A]? That way, the user could consume a message, process it, and then produce the new message on a new topic 🤔
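
For instance, keeping Stream[F, A] would enable a composition like this (a sketch; consumerStream and producerPipe are hypothetical helpers, not this PR's API):

import cats.effect.IO
import fs2.{Pipe, Stream}

final case class InputMsg(id: String)
final case class OutputMsg(id: String, enriched: Boolean)

// Hypothetical helpers for this sketch only.
def consumerStream[A](topic: String, groupId: String): Stream[IO, A] = ???
def producerPipe[A](topic: String): Pipe[IO, A, Unit] = ???

// Consume, transform each message, and publish the result to another topic.
val relay: Stream[IO, Unit] =
  consumerStream[InputMsg]("input-topic", "group-1")
    .map(m => OutputMsg(m.id, enriched = true))
    .through(producerPipe[OutputMsg]("output-topic"))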

def consumer[F[_], A](
    topic: String,
    groupId: String,
    messageProcessingPipe: Pipe[F, A, A]

A reviewer asked:

Does the signature Pipe[F, A, A] make sense? Say I consume a message (UserAdded); with its id I go to the DB and get something else (UserInfo), and I want to return a mix of those two data models (UserAddedInfo). Would that be possible?

@naree (Contributor, Author) replied:

@tzimisce012 Yes, I was considering allowing the return type to be different.

A member commented:

Since we're calling .drain, the return type of the pipe doesn't matter, right? We may as well make it Pipe[F, A, _].
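
A sketch of the enrichment scenario with an output type that differs from the input (lookupUser and the model names are hypothetical, following the example above):

import cats.effect.IO
import fs2.Pipe

final case class UserAdded(id: String)
final case class UserInfo(id: String, name: String)
final case class UserAddedInfo(id: String, name: String)

// Hypothetical DB lookup.
def lookupUser(id: String): IO[UserInfo] = ???

// A Pipe whose output type differs from its input type; since the consumer
// drains the stream, any output type works, as noted above.
val enrichPipe: Pipe[IO, UserAdded, UserAddedInfo] =
  _.evalMap { added =>
    lookupUser(added.id).map(info => UserAddedInfo(added.id, info.name))
  }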

val a = decoder.decode(message.record.value)
for {
  _ <- fs2.Stream.eval(logger.info(a.toString))
} yield a

A reviewer asked:

Where do you commit the read messages? 🤔

@naree (Contributor, Author) replied:

@tzimisce012 With the h.m.k.consumerSetting.atLeastOnceFromEarliest settings, I am leaving it on auto-commit.

Oh, I just saw this: .withEnableAutoCommit(true)

A member commented:

I think we'll need to iterate on this later. Right now I only see atLeastOnceFromEarliest, which turns on auto-commit. We'll probably want to provide more flexibility in commit strategies, possibly by exposing fs2-kafka's CommittableConsumerRecord or something equivalent to it.
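
For reference, a manual-commit variant along these lines might look roughly like this with fs2-kafka 1.x (broker address, batch sizes, and names are illustrative):

import scala.concurrent.ExecutionContext
import scala.concurrent.duration._
import cats.effect.{ContextShift, IO, Timer}
import cats.syntax.functor._
import fs2.kafka._

implicit val cs: ContextShift[IO] = IO.contextShift(ExecutionContext.global)
implicit val timer: Timer[IO]     = IO.timer(ExecutionContext.global)

val settings =
  ConsumerSettings[IO, String, Array[Byte]]
    .withBootstrapServers("localhost:9092")
    .withGroupId("group-1")
    .withEnableAutoCommit(false) // commit manually below

def process(record: ConsumerRecord[String, Array[Byte]]): IO[Unit] = IO.unit

val consumed: fs2.Stream[IO, Unit] =
  consumerStream[IO]
    .using(settings)
    .evalTap(_.subscribeTo("topic"))
    .flatMap(_.stream) // CommittableConsumerRecord values
    .mapAsync(16)(committable => process(committable.record).as(committable.offset))
    .through(commitBatchWithin(500, 15.seconds)) // batch the offset commits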

@naree (Contributor, Author) replied:

Yes, at the moment it is quite rigid. I wanted to avoid making this PR longer than it already is 😄, so I added only a simple case.

I want to develop the example project, mu-kafka-sandbox, further by adding more complicated scenarios to feed into mu-kafka development.

    logger: Logger[F]
): Stream[F, A] =
  kafkaConsumerStream
    .evalTap(_.subscribeTo(topic))

A reviewer asked:

Have you considered the partitionedStream option? That way you get a stream of streams, but within each inner stream the messages are ordered. Each inner stream can also be processed in parallel (if you wish).
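
A sketch of that approach, reusing the settings and process helper from the previous sketch (assuming fs2-kafka's partitionedStream):

// One inner stream per assigned partition: ordering is preserved within a
// partition while partitions are processed in parallel.
val perPartition: fs2.Stream[IO, Unit] =
  consumerStream[IO]
    .using(settings)
    .evalTap(_.subscribeTo("topic"))
    .flatMap(_.partitionedStream)
    .map(_.evalMap(committable => process(committable.record)))
    .parJoinUnbounded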

@naree (Contributor, Author) replied:

@tzimisce012 Yes, I want to tackle more complicated scenarios later. With this PR, I wanted to make sure that everyone is happy with the general direction :)

object AvroWithSchema {
  implicit def encoder[T: SchemaFor: ToRecord]: Encoder[T] = new Encoder[T] {
    override def encode(t: T): Array[Byte] = {
      val bOut = new ByteArrayOutputStream()
@fedefernandez (Contributor) commented Feb 18, 2020:

This looks pretty similar to this:

val baos: ByteArrayOutputStream = new ByteArrayOutputStream()
val output: AvroBinaryOutputStream[A] = AvroOutputStream.binary[A](baos)
output.write(value)
output.close()
new ByteArrayInputStream(baos.toByteArray)

Would it be possible to put the code in one place and reuse it?

@naree (Contributor, Author) replied:

Yes, I thought about this as well. I will add a ticket to refactor and share the common code.
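
For reference, the shared helper could look roughly like this (a sketch built from the avro4s calls already used in both places; the name and final location are left to the refactoring ticket):

import java.io.ByteArrayOutputStream
import com.sksamuel.avro4s.{AvroOutputStream, SchemaFor, ToRecord}

object AvroBytes {
  // Serialize a value to Avro binary, closing the stream this helper created.
  def write[A: SchemaFor: ToRecord](value: A): Array[Byte] = {
    val baos   = new ByteArrayOutputStream()
    val output = AvroOutputStream.binary[A](baos)
    output.write(value)
    output.close() // flushes buffered bytes before we read them
    baos.toByteArray
  }
}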

override def decode(bytes: Array[Byte]): T = {
  val in = AvroInputStream.data[T](bytes)
  in.close()
  in.iterator.toSet.head
A contributor commented:

Same as above. In this case, it seems we're not closing the stream in the internal one:

val input: AvroBinaryInputStream[A] = AvroInputStream.binary[A](stream)
input.iterator.toList.head

A member commented:

In this file, I don't think the in.close() is doing anything, because the AvroInputStream is wrapping a SeekableByteArrayInput, whose close() method is a no-op.

In the file linked by @fedefernandez, I think we're doing the right thing. The gRPC Javadoc doesn't say anything about whether we should close the stream, but the standard Java idiom is that you don't close a resource you were passed as an argument; closing the stream is the responsibility of its creator.


object ConsumerStream {

  def apply[F[_]: Sync, A](topic: String, settings: ConsumerSettings[F, String, Array[Byte]])(
A contributor commented:

I think we don't need Sync, since we have the ConcurrentEffect instance.

    concurrentEffect: ConcurrentEffect[F],
    timer: Timer[F],
    decoder: Decoder[A],
    sync: Sync[F]
A contributor commented:

Not needed at this point.

A member commented:

The : Sync context bound is also not needed.

val a = decoder.decode(message.record.value)
for {
  _ <- fs2.Stream.eval(logger.info(a.toString))
} yield a
A contributor commented:

We can use evalTap for the logger call:

kafkaConsumerStream
  .evalTap(_.subscribeTo(topic))
  .flatMap(_.stream.map(msg => decoder.decode(msg.record.value)))
  .evalTap(a => logger.info(a.toString))

A contributor commented:

On the other hand, I don't know whether this (printing the message) is something we want to do at the library level or at the library-client level.

@naree (Contributor, Author) replied:

Yes, I know what you mean. In general it would be better if the library did less, but this would be very useful for debugging, to ensure that everything is working at the library level. We could put it at DEBUG or even TRACE level.

A contributor commented:

Setting the log level to DEBUG looks good to me. Thanks!

A member commented:

I would not log the message contents at the library level, even at DEBUG. The message contents might be huge, and may contain sensitive data that shouldn't be logged. Best to let the user decide what they want to log.

    implicit contextShift: ContextShift[F],
    concurrentEffect: ConcurrentEffect[F],
    timer: Timer[F],
    sync: Sync[F],

A contributor suggested a change: remove the line sync: Sync[F],

    implicit contextShift: ContextShift[F],
    concurrentEffect: ConcurrentEffect[F],
    timer: Timer[F],
    sync: Sync[F],

A contributor suggested a change: remove the line sync: Sync[F],

    implicit contextShift: ContextShift[F],
    concurrentEffect: ConcurrentEffect[F],
    timer: Timer[F],
    sync: Sync[F],

A contributor suggested a change: remove the line sync: Sync[F],

      .fold("Error: ProducerResult contained empty records.")(a => s"Published $a")
    )
  )
  .flatMap(_ => Stream.eval(sync.delay(result)))

A contributor suggested a change: replace
  .flatMap(_ => Stream.eval(sync.delay(result)))
with
  .evalMap(_ => sync.delay(result))

  )
  .covary[F]
  .through(publishToKafka)
  .flatMap(result =>

A contributor commented: evalMap?

@@ -3,7 +3,7 @@ import microsites.MicrositesPlugin.autoImport._
 import microsites._
 import sbt.Keys._
 import sbt.ScriptedPlugin.autoImport._
-import sbt._
+import sbt.{compilerPlugin, _}
A contributor suggested a change: replace import sbt.{compilerPlugin, _} with import sbt._

@fedefernandez (Contributor) left a comment:

@naree the overall implementation looks good to me.

* limitations under the License.
*/

package higherkindness.mu.format
A member commented:

Maybe Encoder and Decoder should be moved somewhere under the higherkindness.mu.kafka package? Even though they are already namespaced by being in the kafka module, just looking at the FQN higherkindness.mu.format.Decoder doesn't give you any hint that it's Kafka-related.

@naree (Contributor, Author) commented Feb 18, 2020:

Well, I don't think they need to be related. I put the h.m.format package inside the kafka project for now, but this package only contains serialisation-related code. In fact, as pointed out by @fedefernandez, the serialisation code is duplicated elsewhere in mu-scala. We should keep just one copy and put it somewhere more appropriate. mu-format? mu-marshaller?

I think h.m.format.Decoder and h.m.format.Encoder should be treated as general interfaces rather than specific to mu-kafka.

@fedefernandez (Contributor) commented Feb 18, 2020:

Can we use the internal module? If not, I'm OK with mu-format, but I'd avoid extra modules as much as possible (we have a lot of them already 😅).

@naree (Contributor, Author) commented Feb 18, 2020:

@fedefernandez true. I am happy to move it to internal and remove the duplicated code as pointed out. We can always break it out into its own module as and when necessary.

package higherkindness.mu.format

trait Decoder[A] {
  def decode(a: Array[Byte]): A
A member commented:

I would expect this to return something like Either[DecodeError, A].

@naree (Contributor, Author) replied:

Yes, I agree. This PR is missing error-handling code. We will need to add appropriate actions for when decoding fails. My suggestion is that we log it and throw away the message, i.e. commit the offset. All the other approaches I have seen, such as a dead-message topic, bring their own complications. Or we can leave it to the end user.
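
A sketch of the Either-returning shape together with the log-and-skip handling suggested above (DecodeError and the other names are hypothetical):

import fs2.Stream
import io.chrisdavenport.log4cats.Logger

final case class DecodeError(message: String, cause: Throwable)

trait SafeDecoder[A] {
  def decode(bytes: Array[Byte]): Either[DecodeError, A]
}

// On failure: log the error and drop the message, keeping the stream alive.
def decodeOrSkip[F[_], A](bytes: Array[Byte])(implicit
    decoder: SafeDecoder[A],
    logger: Logger[F]
): Stream[F, A] =
  decoder.decode(bytes) match {
    case Right(a)  => Stream.emit(a)
    case Left(err) => Stream.eval(logger.error(err.cause)(err.message)).drain
  }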

    sync: Sync[F]
): Stream[F, A] =
  for {
    implicit0(logger: Logger[F]) <- fs2.Stream.eval(Slf4jLogger.create[F])
A member commented:

Nit: there are a few places where you can replace fs2.Stream with just Stream, as you have it imported.

@naree (Contributor, Author) commented Feb 18, 2020:

Free nit-picking is always welcome 😄. That's what the review is for!

@pepegar (Member) left a comment:

The general design looks good to me :D

@raulraja (Contributor) commented:

Can this be merged? Thanks!

@naree (Contributor, Author) commented May 26, 2020:

> Can this be merged? Thanks!

Yes, I will rebase first and merge it.

@cb372 (Member) commented Jun 8, 2020:

I'll merge this now. We'll need some follow-up PRs to iterate on the design, resolve some of the TODOs in the code and the things pointed out in the comments above, and add documentation for the feature.

I think the next thing to try would be making use of fs2-kafka's Serializer and Deserializer type classes so we don't have to define our own.
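
A sketch of that bridge, assuming fs2-kafka 1.x exposes Serializer.lift/Deserializer.lift constructors (treat the exact constructor names as an assumption) and using the Encoder/Decoder traits from this PR:

import cats.effect.Sync
import fs2.kafka.{Deserializer, Serializer}
import higherkindness.mu.format.{Decoder, Encoder}

// Wrap this PR's byte-array codecs as fs2-kafka serializers, suspending the
// (potentially throwing) conversion in F.
def muSerializer[F[_], A](implicit F: Sync[F], e: Encoder[A]): Serializer[F, A] =
  Serializer.lift[F, A](a => F.delay(e.encode(a)))

def muDeserializer[F[_], A](implicit F: Sync[F], d: Decoder[A]): Deserializer[F, A] =
  Deserializer.lift[F, A](bytes => F.delay(d.decode(bytes)))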

@cb372 merged commit 7417378 into master on Jun 8, 2020
@cb372 deleted the add-mu-kafka-consumer-producer branch on June 8, 2020 at 12:03
@cb372 (Member) commented Jun 8, 2020:

Thanks @naree!
