
Add v2 record batch format for FetchResponse V4 support. #1185

Merged
merged 1 commit into from
Oct 24, 2017

Conversation

@tvoinarovskyi (Collaborator)

@tvoinarovskyi tvoinarovskyi commented Aug 25, 2017

Changed the layout of record parsing to allow V2 integration with the batch concept. For now I only moved the abstract implementation and wrapped LegacyBatch using the existing MessageSet and Message constructs. What I want to do next is:

  • Add an implementation for v2 records. A somewhat dirty implementation is already done here
  • Add a builder class implementation and fix the Producer to use it.
  • Refactor LegacyRecordBatch to not use the Message and MessageSet constructs (unneeded overhead)

@dpkp What do you think? It's quite a hard change, but I did not find a good way to abstract batches using the existing constructs. Moreover, this is based on the current Java client version, though with not as many classes.

@jeffwidman jeffwidman requested a review from dpkp August 28, 2017 18:03
@tvoinarovskyi (Collaborator, Author)

Wow, that was a bit rough... Ended up with 2-3 hacks to support old versions of clients, but this should do for now. I plan to refactor the LegacyRecord part to not use the Message/MessageSet constructs, but for now I need a break. In the meantime, feedback would be great.
Will finish up next week.

@jeffwidman (Collaborator)

Ended up with 2-3 hacks to support old versions of clients

Do you mean the old SimpleClient stacks? Or just so new KafkaClient can support old brokers?

If the former, then I started working on #1193 last night; if I get that finished up, it would make your life simpler now and future code maintenance more straightforward...

@tvoinarovskyi (Collaborator, Author)

@jeffwidman Yea, I meant the SimpleClient stacks. If we remove those, I could also remove some hacks from this PR.

@jeffwidman (Collaborator)

Then let's remove those first so that we don't have to worry about maintaining/cleaning up those hacks later. I started working on removing them last night, will try to push up a PR later tonight...

@tvoinarovskyi tvoinarovskyi changed the title [WIP] Add v2 record batch format for FetchResponse V4 support. Add v2 record batch format for FetchResponse V4 support. Sep 12, 2017
@tvoinarovskyi (Collaborator, Author)

@dpkp I'm satisfied with the changes, but I want your feedback on this. Also, we need to decide where this change will be merged: is it going into 1.3.X or 2.X.X? I'm more for the second, as we can clean up the hacks for old clients.

@dpkp (Owner)

dpkp commented Oct 2, 2017

Very impressive! I may have a few style suggestions that I'll write up tomorrow, but generally it looks very good. Re: merge location, I prefer just landing on master and deferring the choice of what to label the next release (1.4 / 2.0 / etc.).



class Records(Bytes):

dpkp (Owner):

This class seems out of place in fetch.py. Why not put it in message.py?


@classmethod
def encode(cls, buffer):
# For old producer, backward compat
dpkp (Owner):

What other alternatives do we have here? Will we need to maintain MessageSet, or is there a way to transition the old code off of it entirely?

tvoinarovskyi (Collaborator, Author):

Now that I think of it, let's just leave the field as Bytes in the protocol and wrap it into the respective classes in Fetcher and the old client. No hacks that way.
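That edge-wrapping approach might look roughly like this (a sketch with hypothetical names; both legacy message sets and v2 batches begin with an int64 base offset followed by an int32 length, which is what makes a format-agnostic wrapper possible):

```python
class MemoryRecords:
    """Sketch: views raw fetched bytes as a sequence of record batches."""

    LOG_OVERHEAD = 12  # int64 base offset + int32 batch length

    def __init__(self, buffer):
        self._buffer = memoryview(buffer)
        self._pos = 0

    def has_next(self):
        return self._pos + self.LOG_OVERHEAD <= len(self._buffer)

    def next_batch(self):
        # The int32 at bytes 8..12 gives the size of the rest of the batch.
        start = self._pos
        length = int.from_bytes(self._buffer[start + 8:start + 12], "big")
        end = start + self.LOG_OVERHEAD + length
        self._pos = end
        return self._buffer[start:end]
```

The protocol layer then only ever sees an opaque Bytes field; the Fetcher decides how to interpret the payload.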

from .types import Array, Int8, Int16, Int32, Int64, Schema, String, Bytes


class Records(Bytes):
dpkp (Owner):

Worth a docstring explaining how this class/struct fits in compared with Bytes and MemoryRecords.
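For instance, such a docstring might read as follows (wording is only a suggestion; the Bytes stand-in below exists just to keep the snippet self-contained):

```python
class Bytes:  # stand-in for kafka.protocol.types.Bytes in this sketch
    pass


class Records(Bytes):
    """Protocol field holding a sequence of record batches.

    On the wire this is identical to Bytes (an int32-length-prefixed blob).
    Unlike Bytes, the payload is not parsed eagerly into a MessageSet;
    it is kept raw so the consumer can wrap it in a MemoryRecords view
    and iterate batches lazily, regardless of magic/format version.
    """
```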

@@ -216,6 +216,7 @@ def decode_fetch_response(cls, response):

@classmethod
def decode_message_set(cls, messages):
dpkp (Owner):

Should document what type we expect here for messages. It seems like this changes the signature and could cause breakage.

return ord(memview[pos])


def encode_varint(num):
dpkp (Owner):

Are VARINTs used elsewhere in the new Kafka protocol APIs? Perhaps these functions belong under kafka.protocol.types ?

tvoinarovskyi (Collaborator, Author):

No, currently only the Record structure uses it. If other APIs use it we can restructure later. I want this module to be very isolated behind a good facade. I want to Cythonize this part later; I already did some work on that in aiokafka, and it gives very good results so far.
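For reference, Kafka's v2 record format stores lengths and deltas as zigzag-encoded varints (the same scheme protobuf uses, so small absolute values take few bytes). An illustrative pure-Python pair, matching the function names in the snippet above but not the PR's optimized implementation:

```python
def encode_varint(value):
    """Return the zigzag varint encoding of a signed 64-bit int as bytes."""
    value = (value << 1) ^ (value >> 63)   # zigzag: -1 -> 1, 1 -> 2, -2 -> 3 ...
    out = bytearray()
    while (value & ~0x7F) != 0:
        out.append((value & 0x7F) | 0x80)  # high bit set: continuation byte
        value >>= 7
    out.append(value)
    return bytes(out)


def decode_varint(buffer, pos=0):
    """Return (value, new_pos) for the zigzag varint starting at buffer[pos]."""
    result = 0
    shift = 0
    while True:
        byte = buffer[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            break
        shift += 7
    return (result >> 1) ^ -(result & 1), pos  # undo zigzag


print(encode_varint(300))  # -> b'\xd8\x04'
```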

@tvoinarovskyi (Collaborator, Author)

tvoinarovskyi commented Oct 3, 2017

@dpkp Say, now that I look at it, the PR is massive. Instead of merging the whole thing, let me split it into 2 parts:

  • LegacyRecords refactoring. I will just replace the current message.py with the legacy_records.py of this PR. The folder layout will be the same as of this PR.
  • DefaultRecords implementation. Will add the V2 structures.

That way we can first make sure we don't have a regression from the LegacyRecords classes, and only afterwards merge the v2 records. The changeset will also be more suited for review.

@dpkp (Owner)

dpkp commented Oct 3, 2017 via email

if self._compression_type == self.CODEC_GZIP:
compressed = gzip_encode(data)
elif self._compression_type == self.CODEC_SNAPPY:
compressed = snappy_encode(data)
dpkp (Owner):

When I test locally and snappy libraries are not available, this will raise a NotImplementedError and crash the producer IO thread:

ERROR:kafka.producer.sender:Uncaught error in kafka producer I/O thread

But this error is only logged, and producer.send() still returns a future. If you try to resolve the future with .get(), it hangs forever. We should probably improve this so that the exception is raised to the user, or at least does not crash the IO thread and is used to resolve the future. My preference is the former.

@@ -353,7 +354,7 @@ def __init__(self, **configs):
if self.config['compression_type'] == 'lz4':
dpkp (Owner):

Since compression_type is a producer configuration, we could put the check for compression-library imports in __init__ here.
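A sketch of that idea: fail fast in the constructor instead of in the IO thread. The availability map and helper below are hypothetical; a real version would call kafka-python's has_snappy()/has_lz4()-style helpers from kafka.codec.

```python
def check_codec_available(compression_type, available):
    """Raise at producer construction time if the codec's library is missing."""
    if compression_type is not None and not available.get(compression_type, False):
        raise ValueError(
            "Libraries for %s compression codec not found" % compression_type)


# Hypothetical availability map; gzip is stdlib (zlib), the others need C libs.
AVAILABLE = {"gzip": True, "snappy": False, "lz4": False}

check_codec_available("gzip", AVAILABLE)   # ok
try:
    check_codec_available("snappy", AVAILABLE)
except ValueError as exc:
    print("caught:", exc)
```

This surfaces the error in the user's thread, at send-path setup, rather than killing the sender loop later.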

@dpkp (Owner)

dpkp commented Oct 4, 2017

Should also note that this PR changes the error raised when compression libraries are not available -- it was AssertionError, now NotImplementedError. Related to #1168.

# we did not read a single message from a non-empty
# buffer because that message's size is larger than
# fetch size, in this case record this exception
self._record_too_large_partitions[tp] = fetch_offset
Collaborator:

According to the Java consumer docs, this is actually now a soft-limit so that the consumer can make progress. That change is likely outside the scope of this PR, so I filed #1232 to track that.

@dpkp (Owner)

dpkp commented Oct 4, 2017

Running quick benchmarks suggests that KafkaProducer performance is reduced significantly when using the new record format. My laptop shows a drop from 9K records/sec to 5.5 records/sec. On the other hand, KafkaConsumer performance improved from 25K/sec to 35K/sec. This was against a local, 3-broker cluster running 0.11.0.0:

KAFKA_VERSION=0.11.0.0 PYTHONPATH=. ./benchmarks/producer_performance.py --brokers 3

When testing against 0.10.2.1 (the old message format), this PR performed basically the same as master (perhaps slightly better).

My gut says that this is ok for now; folks that care about consumer performance will benefit, folks that care about producer performance will be sad. But since we allow users to select the api version they want, they can always pin to 0.10 in their producers.

@jeffwidman (Collaborator)

jeffwidman commented Oct 4, 2017

9K records/sec to 5.5 records/sec

Is that truly only 5.5 records/sec, or did you mean 5.5K/sec? A 50% reduction is very different from a 99% reduction in throughput...

I'm also curious if the Java producer saw such a drastic throughput drop... ?

@dpkp (Owner)

dpkp commented Oct 5, 2017

Ha, yea. 5.5K records/sec. From about 1.2MB/sec to 600KB/sec.

@tvoinarovskyi (Collaborator, Author)

@dpkp I do expect a drop in the V2 format; I described the problems in #1119 (comment). For now, don't worry about those, I will do heavy benchmarking on that part before the merge. The main problem is probably the pure-Python varint implementation; it's a bad fit for the language.

@tvoinarovskyi (Collaborator, Author)

@jeffwidman The JVM's JIT probably made it barely visible. Java's fast for this type of problem, that's all. I'm positive that PyPy will also yield quite good numbers even on the current implementation.

@dpkp (Owner)

dpkp commented Oct 7, 2017

I'm going to cut a release this weekend but would like to not include this PR yet. Mostly this is because memoryview does not work on Python 2.6. And although we say that we do not support 2.6, it still does work -- it is just not actively tested. I think if we're going to break Python 2.6 affirmatively, we should do it loudly / with a large version bump.

@tvoinarovskyi (Collaborator, Author)

@dpkp Seems like there's a bug here and the reader falls back to v1 on read, because the read speed should also have dropped on v2. I probably know where the bug is.

@tvoinarovskyi (Collaborator, Author)

Latest microbenchmarks for the V2 format:

(.kafka-python-osx)MacBook-Pro-Taras:kafka-python taras$ python benchmarks/record_batch_read.py 
.....................
batch_read_v0: Mean +- std dev: 1.33 ms +- 0.03 ms
.....................
batch_read_v1: Mean +- std dev: 1.35 ms +- 0.03 ms
.....................
batch_read_v2: Mean +- std dev: 3.52 ms +- 0.06 ms
(.kafka-python-osx)MacBook-Pro-Taras:kafka-python taras$ python benchmarks/record_batch_compose.py 
.....................
batch_append_v0: Mean +- std dev: 1.16 ms +- 0.03 ms
.....................
batch_append_v1: Mean +- std dev: 1.18 ms +- 0.03 ms
.....................
batch_append_v2: Mean +- std dev: 4.35 ms +- 0.08 ms
(.kafka-python-osx)MacBook-Pro-Taras:kafka-python taras$ 

Will need to get that down to no more than 2x slower...
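The numbers above come from the repo's benchmarks/ scripts, which use the perf module (hence the "Mean +- std dev" output). A stdlib-only sketch of the same kind of microbenchmark, with a stand-in workload rather than the real record encoder:

```python
import timeit


def batch_append_stub():
    """Stand-in workload: frame 1000 toy records into one buffer."""
    buf = bytearray()
    for offset in range(1000):
        buf += offset.to_bytes(8, "big")          # int64 offset
        buf += len(b"value").to_bytes(4, "big")   # int32 length
        buf += b"value"
    return buf


# 5 runs of 100 calls each; report the best run, roughly what perf summarizes.
timings = timeit.repeat(batch_append_stub, number=100, repeat=5)
print("batch_append_stub: %.3f ms per call" % (min(timings) / 100 * 1e3))
```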

@tvoinarovskyi (Collaborator, Author)

And hereby I declare myself a magician and thus will perform magic on the latest commit.
In other words, I spent some hours looking at the opcodes for the varint functions and did some optimisations:

(.kafka-python-osx)MacBook-Pro-Taras:kafka-python taras$ python benchmarks/record_batch_read.py 
.....................
batch_read_v0: Mean +- std dev: 1.32 ms +- 0.03 ms
.....................
batch_read_v1: Mean +- std dev: 1.35 ms +- 0.03 ms
.....................
batch_read_v2: Mean +- std dev: 2.56 ms +- 0.04 ms
(.kafka-python-osx)MacBook-Pro-Taras:kafka-python taras$ python benchmarks/record_batch_compose.py 
.....................
batch_append_v0: Mean +- std dev: 1.20 ms +- 0.04 ms
.....................
batch_append_v1: Mean +- std dev: 1.21 ms +- 0.03 ms
.....................
batch_append_v2: Mean +- std dev: 3.15 ms +- 0.04 ms
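The optimisation being referred to (guessing at specifics here; the pattern itself is standard CPython tuning) is to bind frequently used globals into local names, e.g. via default arguments, so the hot loop executes LOAD_FAST instead of a dict-lookup LOAD_GLOBAL on every iteration:

```python
import dis


def _mask(byte):
    return byte & 0x7F


def checksum_slow(data):
    out = 0
    for b in data:
        out += _mask(b)       # _mask resolved via LOAD_GLOBAL each iteration
    return out


def checksum_fast(data, _mask=_mask):  # default arg binds the global once
    out = 0
    for b in data:
        out += _mask(b)       # now a LOAD_FAST: no dict lookup in the loop
    return out


assert checksum_slow(range(300)) == checksum_fast(range(300))
# dis.dis(checksum_fast) shows no LOAD_GLOBAL instructions at all
```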

@jeffwidman (Collaborator)

jeffwidman commented Oct 13, 2017

Very impressive.

FYI, since you're a collaborator on the project, you can manually re-trigger the travis builds for any PR yourself within the Travis GUI, no need to push commits to force them: https://stackoverflow.com/questions/17606874/trigger-a-travis-ci-rebuild-without-pushing-a-commit

@tvoinarovskyi (Collaborator, Author)

@jeffwidman Yea, but it bugged out on me. It did not want to restart from the interface, so I had to create another job.

@jeffwidman (Collaborator)

Is this ready to be merged? I see merge conflicts...

…ilder.

Added bytecode optimization for varint and append/read_msg functions. Mostly based on avoiding LOAD_GLOBAL calls.
@tvoinarovskyi (Collaborator, Author)

@dpkp @jeffwidman I'll go straight to the point: I want to merge this. This PR is quite big even now, and dancing around it will not bring better performance. To get better, it needs a C extension, which should not be part of this PR. Any objections?

self.config['fetch_min_bytes'],
self.config['fetch_max_bytes'],
partition_data)
if version == 3:
dpkp (Owner):

I think this should be <= 3 ?

tvoinarovskyi (Collaborator, Author):

No, versions 0, 1 and 2 are handled above.

dpkp (Owner):

yep -- i missed that!

@dpkp (Owner)

dpkp commented Oct 24, 2017

I'm +1 for merge after fixing the FetchRequest api version check. The performance hit is OK with me in order to support newer features, especially headers. I'm not super excited about C extensions; PyPy's JIT is quite good.

@jeffwidman (Collaborator)

jeffwidman commented Oct 24, 2017

+1 from me. Agree with @dpkp: at the end of the day, core features like this trump throughput concerns. Plus, in the majority of consumers that I support, most of the time is spent processing the contents of the message, not in kafka-python.

While the speed of C extensions is nice, I'd rather direct those who want that to confluent-kafka-python; no sense in us re-inventing that wheel. A big reason I chose kafka-python was the simplicity of debugging production issues without having to worry about C extensions. When I get called in to firefight an emergency, being able to just drop a pdb breakpoint and poke around is very valuable.

@tvoinarovskyi tvoinarovskyi merged commit 8b05ee8 into master Oct 24, 2017
@tvoinarovskyi tvoinarovskyi deleted the v2_records branch October 24, 2017 22:28
@dpkp (Owner)

dpkp commented Oct 24, 2017

Great work!
