
Add v2 record batch format for FetchResponse V4 support. #1185

Merged
merged 1 commit into from
Oct 24, 2017

Conversation

@tvoinarovskyi (Collaborator)

@tvoinarovskyi tvoinarovskyi commented Aug 25, 2017

Changed the layout of record parsing to allow V2 integration with the batch concept. For now I only moved the abstract implementation and wrapped LegacyBatch using the existing MessageSet and Message constructs. What I want to do next is:

  • Add an implementation for v2 records. A somewhat dirty implementation is already done here
  • Add a builder class implementation and fix the Producer to use it.
  • Refactor LegacyRecordBatch to not use the Message and MessageSet constructs (unneeded overhead)

@dpkp What do you think? It's quite a hard change, but I did not find a good way to abstract batches using the existing constructs. Moreover, this is based on the current Java client version, though with not as many classes.

@jeffwidman jeffwidman requested a review from dpkp August 28, 2017 18:03
@tvoinarovskyi (Collaborator, Author)

Wow, that was a bit rough... Ended up with 2-3 hacks to support old versions of clients, but this should do for now. I plan to refactor the LegacyRecord part to not use the Message/MessageSet constructs, but for now I need a break. In the meantime, feedback would be great.
Will finish up next week.

@jeffwidman (Collaborator)

Ended up with 2-3 hacks to support old versions of clients

Do you mean the old SimpleClient stacks? Or just so new KafkaClient can support old brokers?

If the former, then I started working on #1193 last night; if I get that finished up, it would make your life simpler now and future code maintenance more straightforward...

@tvoinarovskyi (Collaborator, Author)

@jeffwidman Yea, I meant the SimpleClient stacks. If we remove those, I could also remove some hacks from this PR.

@jeffwidman (Collaborator)

Then let's remove those first so that we don't have to worry about maintaining/cleaning up those hacks later. I started working on removing them last night, will try to push up a PR later tonight...

@tvoinarovskyi tvoinarovskyi changed the title [WIP] Add v2 record batch format for FetchResponse V4 support. Add v2 record batch format for FetchResponse V4 support. Sep 12, 2017
@tvoinarovskyi (Collaborator, Author)

@dpkp I'm satisfied with the changes, but I want your feedback on this. Also, we need to decide where this change will be merged: is it going into 1.3.X or 2.X.X? I'm more for the second, as we can clean up the hacks for old clients.

@dpkp (Owner)

dpkp commented Oct 2, 2017

Very impressive! I may have a few style suggestions that I'll write up tomorrow, but generally it looks very good. Re: merge location, I prefer just landing on master and deferring the choice of what to label the next release (1.4 / 2.0 / etc.).



class Records(Bytes):

dpkp (Owner):

This class seems out of place in fetch.py. Why not put it in message.py?


@classmethod
def encode(cls, buffer):
# For old producer, backward compat
dpkp (Owner):

What other alternatives do we have here? Will we need to maintain MessageSet, or is there a way to transition the old code off of it entirely?

tvoinarovskyi (Collaborator, Author):

Now that I think of it, let's just leave the field as Bytes in the protocol and wrap it into the respective classes in Fetcher and the old client. No hacks that way.
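That edge-wrapping approach might look roughly like this (a sketch with hypothetical names; both legacy message sets and v2 batches begin with an int64 base offset followed by an int32 length, which is what makes a format-agnostic wrapper possible):

```python
class MemoryRecords:
    """Sketch: views raw fetched bytes as a sequence of record batches."""

    LOG_OVERHEAD = 12  # int64 base offset + int32 batch length

    def __init__(self, buffer):
        self._buffer = memoryview(buffer)
        self._pos = 0

    def has_next(self):
        return self._pos + self.LOG_OVERHEAD <= len(self._buffer)

    def next_batch(self):
        # The int32 at bytes 8..12 gives the size of the rest of the batch.
        start = self._pos
        length = int.from_bytes(self._buffer[start + 8:start + 12], "big")
        end = start + self.LOG_OVERHEAD + length
        self._pos = end
        return self._buffer[start:end]
```

The protocol layer then only ever sees an opaque Bytes field; the Fetcher decides how to interpret the payload.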

from .types import Array, Int8, Int16, Int32, Int64, Schema, String, Bytes


class Records(Bytes):
dpkp (Owner):

Worth a docstring explaining how this class/struct fits in compared with Bytes and MemoryRecords.
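For instance, such a docstring might read as follows (wording is only a suggestion; the Bytes stand-in below exists just to keep the snippet self-contained):

```python
class Bytes:  # stand-in for kafka.protocol.types.Bytes in this sketch
    pass


class Records(Bytes):
    """Protocol field holding a sequence of record batches.

    On the wire this is identical to Bytes (an int32-length-prefixed blob).
    Unlike Bytes, the payload is not parsed eagerly into a MessageSet;
    it is kept raw so the consumer can wrap it in a MemoryRecords view
    and iterate batches lazily, regardless of magic/format version.
    """
```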

@@ -216,6 +216,7 @@ def decode_fetch_response(cls, response):

@classmethod
def decode_message_set(cls, messages):
dpkp (Owner):

Should document what type we expect here for messages. It seems like this changes the signature and could cause breakage.

return ord(memview[pos])


def encode_varint(num):
dpkp (Owner):

Are VARINTs used elsewhere in the new Kafka protocol APIs? Perhaps these functions belong under kafka.protocol.types ?

tvoinarovskyi (Collaborator, Author):

No, currently only the Record structure uses it. If other APIs use it we can restructure later. I want this module to be very isolated behind a good facade. I want to Cythonize this part later; I already did some work on that in aiokafka, and it gives very good results so far.
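For reference, Kafka's v2 record format stores lengths and deltas as zigzag-encoded varints (the same scheme protobuf uses, so small absolute values take few bytes). An illustrative pure-Python pair, matching the function names in the snippet above but not the PR's optimized implementation:

```python
def encode_varint(value):
    """Return the zigzag varint encoding of a signed 64-bit int as bytes."""
    value = (value << 1) ^ (value >> 63)   # zigzag: -1 -> 1, 1 -> 2, -2 -> 3 ...
    out = bytearray()
    while (value & ~0x7F) != 0:
        out.append((value & 0x7F) | 0x80)  # high bit set: continuation byte
        value >>= 7
    out.append(value)
    return bytes(out)


def decode_varint(buffer, pos=0):
    """Return (value, new_pos) for the zigzag varint starting at buffer[pos]."""
    result = 0
    shift = 0
    while True:
        byte = buffer[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            break
        shift += 7
    return (result >> 1) ^ -(result & 1), pos  # undo zigzag


print(encode_varint(300))  # -> b'\xd8\x04'
```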

@tvoinarovskyi (Collaborator, Author)

tvoinarovskyi commented Oct 3, 2017

@dpkp Say, now that I look at it, the PR is massive. Instead of merging the whole thing, let me split it into 2 parts:

  • LegacyRecords refactoring. I will just replace the current message.py with the legacy_records.py of this PR. The folder layout will be the same as of this PR.
  • DefaultRecords implementation. Will add the V2 structures.

That way we can first make sure we don't have a regression from the LegacyRecords classes, and only afterwards merge the v2 records. The changeset will also be more suited for review.

@dpkp (Owner)

dpkp commented Oct 3, 2017 via email

if self._compression_type == self.CODEC_GZIP:
compressed = gzip_encode(data)
elif self._compression_type == self.CODEC_SNAPPY:
compressed = snappy_encode(data)
dpkp (Owner):

When I test locally and snappy libraries are not available, this will raise a NotImplementedError and crash the producer IO thread:

ERROR:kafka.producer.sender:Uncaught error in kafka producer I/O thread

But this error is only logged, and producer.send() still returns a future. If you try to resolve the future with .get(), it hangs forever. We should probably improve this so that the exception is raised to the user, or at least does not crash the IO thread and is used to resolve the future. My preference is the former.

@@ -353,7 +354,7 @@ def __init__(self, **configs):
if self.config['compression_type'] == 'lz4':
dpkp (Owner):

Since compression_type is a producer configuration, we could put the check for compression-library imports in __init__ here.
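A sketch of that idea: fail fast in the constructor instead of in the IO thread. The availability map and helper below are hypothetical; a real version would call kafka-python's has_snappy()/has_lz4()-style helpers from kafka.codec.

```python
def check_codec_available(compression_type, available):
    """Raise at producer construction time if the codec's library is missing."""
    if compression_type is not None and not available.get(compression_type, False):
        raise ValueError(
            "Libraries for %s compression codec not found" % compression_type)


# Hypothetical availability map; gzip is stdlib (zlib), the others need C libs.
AVAILABLE = {"gzip": True, "snappy": False, "lz4": False}

check_codec_available("gzip", AVAILABLE)   # ok
try:
    check_codec_available("snappy", AVAILABLE)
except ValueError as exc:
    print("caught:", exc)
```

This surfaces the error in the user's thread, at send-path setup, rather than killing the sender loop later.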

@dpkp (Owner)

dpkp commented Oct 4, 2017

Should also note that this PR changes the error raised when compression libraries are not available -- it was AssertionError, now NotImplementedError. Related to #1168.

# we did not read a single message from a non-empty
# buffer because that message's size is larger than
# fetch size, in this case record this exception
self._record_too_large_partitions[tp] = fetch_offset
Collaborator:

According to the Java consumer docs, this is actually now a soft-limit so that the consumer can make progress. That change is likely outside the scope of this PR, so I filed #1232 to track that.

@dpkp (Owner)

dpkp commented Oct 4, 2017

Running quick benchmarks suggests that KafkaProducer performance is reduced significantly when using the new record format. My laptop shows a drop from 9K records/sec to 5.5 records/sec. On the other hand, KafkaConsumer performance improved from 25K/sec to 35K/sec. This was against a local, 3-broker cluster running 0.11.0.0:

KAFKA_VERSION=0.11.0.0 PYTHONPATH=. ./benchmarks/producer_performance.py --brokers 3

When testing against 0.10.2.1 (the old message format), this PR performed basically the same as master (perhaps slightly better).

My gut says that this is ok for now; folks that care about consumer performance will benefit, folks that care about producer performance will be sad. But since we allow users to select the api version they want, they can always pin to 0.10 in their producers.

@jeffwidman (Collaborator)

jeffwidman commented Oct 4, 2017

9K records/sec to 5.5 records/sec

Is that truly only 5.5 records/sec, or did you mean 5.5K/sec? A 50% reduction is very different from a 99% reduction in throughput...

I'm also curious if the Java producer saw such a drastic throughput drop... ?

@dpkp (Owner)

dpkp commented Oct 5, 2017

Ha, yea. 5.5K records/sec. From about 1.2MB/sec to 600KB/sec.

@tvoinarovskyi (Collaborator, Author)

@dpkp I do expect a drop in the V2 format; I described the problems in #1119 (comment). For now, don't worry about those, I will do heavy benchmarking on that part before the merge. The main problem is probably the pure-Python varint implementation; it's a bad fit for the language.

@tvoinarovskyi (Collaborator, Author)

@jeffwidman The JVM's JIT probably made it barely visible. Java's fast for this type of problem, that's all. I'm positive that PyPy will also yield quite good numbers even on the current implementation.

@dpkp (Owner)

dpkp commented Oct 7, 2017

I'm going to cut a release this weekend but would like to not include this PR yet. Mostly this is because memoryview does not work on Python 2.6. And although we say that we do not support 2.6, it still does work -- it is just not actively tested. I think if we're going to break Python 2.6 affirmatively, we should do it loudly / with a large version bump.

@tvoinarovskyi (Collaborator, Author)

@dpkp Seems like there's a bug here and the reader falls back to v1 on read, because the read speed should also have dropped on v2. I probably know where the bug is.

@tvoinarovskyi (Collaborator, Author)

Latest microbenchmarks for the V2 format:

(.kafka-python-osx)MacBook-Pro-Taras:kafka-python taras$ python benchmarks/record_batch_read.py 
.....................
batch_read_v0: Mean +- std dev: 1.33 ms +- 0.03 ms
.....................
batch_read_v1: Mean +- std dev: 1.35 ms +- 0.03 ms
.....................
batch_read_v2: Mean +- std dev: 3.52 ms +- 0.06 ms
(.kafka-python-osx)MacBook-Pro-Taras:kafka-python taras$ python benchmarks/record_batch_compose.py 
.....................
batch_append_v0: Mean +- std dev: 1.16 ms +- 0.03 ms
.....................
batch_append_v1: Mean +- std dev: 1.18 ms +- 0.03 ms
.....................
batch_append_v2: Mean +- std dev: 4.35 ms +- 0.08 ms
(.kafka-python-osx)MacBook-Pro-Taras:kafka-python taras$ 

Will need to get that down to no more than 2x slower...
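The numbers above come from the repo's benchmarks/ scripts, which use the perf module (hence the "Mean +- std dev" output). A stdlib-only sketch of the same kind of microbenchmark, with a stand-in workload rather than the real record encoder:

```python
import timeit


def batch_append_stub():
    """Stand-in workload: frame 1000 toy records into one buffer."""
    buf = bytearray()
    for offset in range(1000):
        buf += offset.to_bytes(8, "big")          # int64 offset
        buf += len(b"value").to_bytes(4, "big")   # int32 length
        buf += b"value"
    return buf


# 5 runs of 100 calls each; report the best run, roughly what perf summarizes.
timings = timeit.repeat(batch_append_stub, number=100, repeat=5)
print("batch_append_stub: %.3f ms per call" % (min(timings) / 100 * 1e3))
```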

@tvoinarovskyi (Collaborator, Author)

And hereby I declare myself a magician and thus will perform magic on the latest commit.
In other words, I spent some hours looking at the opcodes for the varint functions and did some optimisations:

(.kafka-python-osx)MacBook-Pro-Taras:kafka-python taras$ python benchmarks/record_batch_read.py 
.....................
batch_read_v0: Mean +- std dev: 1.32 ms +- 0.03 ms
.....................
batch_read_v1: Mean +- std dev: 1.35 ms +- 0.03 ms
.....................
batch_read_v2: Mean +- std dev: 2.56 ms +- 0.04 ms
(.kafka-python-osx)MacBook-Pro-Taras:kafka-python taras$ python benchmarks/record_batch_compose.py 
.....................
batch_append_v0: Mean +- std dev: 1.20 ms +- 0.04 ms
.....................
batch_append_v1: Mean +- std dev: 1.21 ms +- 0.03 ms
.....................
batch_append_v2: Mean +- std dev: 3.15 ms +- 0.04 ms
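The optimisation being referred to (guessing at specifics here; the pattern itself is standard CPython tuning) is to bind frequently used globals into local names, e.g. via default arguments, so the hot loop executes LOAD_FAST instead of a dict-lookup LOAD_GLOBAL on every iteration:

```python
import dis


def _mask(byte):
    return byte & 0x7F


def checksum_slow(data):
    out = 0
    for b in data:
        out += _mask(b)       # _mask resolved via LOAD_GLOBAL each iteration
    return out


def checksum_fast(data, _mask=_mask):  # default arg binds the global once
    out = 0
    for b in data:
        out += _mask(b)       # now a LOAD_FAST: no dict lookup in the loop
    return out


assert checksum_slow(range(300)) == checksum_fast(range(300))
# dis.dis(checksum_fast) shows no LOAD_GLOBAL instructions at all
```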

@jeffwidman (Collaborator)

jeffwidman commented Oct 13, 2017

Very impressive.

FYI, since you're a collaborator on the project, you can manually re-trigger the travis builds for any PR yourself within the Travis GUI, no need to push commits to force them: https://stackoverflow.com/questions/17606874/trigger-a-travis-ci-rebuild-without-pushing-a-commit

@tvoinarovskyi (Collaborator, Author)

@jeffwidman Yea, but it bugged out on me. It did not want to restart from the interface, so I had to create another job.

@jeffwidman (Collaborator)

Is this ready to be merged? I see merge conflicts...

…ilder.

Added bytecode optimization for varint and append/read_msg functions. Mostly based on avoiding LOAD_GLOBAL calls.
@tvoinarovskyi (Collaborator, Author)

@dpkp @jeffwidman I'll go straight to the point: I want to merge this. This PR is quite big even now, and dancing around it will not bring better performance. To get better, it needs a C extension, which should not be part of this PR. Any objections?

self.config['fetch_min_bytes'],
self.config['fetch_max_bytes'],
partition_data)
if version == 3:
dpkp (Owner):

I think this should be <= 3 ?

tvoinarovskyi (Collaborator, Author):

No, versions 0, 1 and 2 are handled above.

dpkp (Owner):

yep -- i missed that!

@dpkp (Owner)

dpkp commented Oct 24, 2017

I'm +1 for merge after fixing the FetchRequest api version check. The performance hit is OK with me in order to support newer features, especially headers. I'm not super excited about C extensions; PyPy's JIT is quite good.

@jeffwidman (Collaborator)

jeffwidman commented Oct 24, 2017

+1 from me. Agree with @dpkp: at the end of the day, core features like this trump throughput concerns. Plus, in the majority of consumers that I support, most of the time is spent processing the contents of the message, not in kafka-python.

While the speed of C extensions is nice, I'd rather direct those who want that to confluent-kafka-python; no sense in us re-inventing that wheel. A big reason I chose kafka-python was the simplicity of debugging production issues without having to worry about C extensions. When I get called in to firefight an emergency, being able to just drop a pdb breakpoint and poke around is very valuable.

@tvoinarovskyi tvoinarovskyi merged commit 8b05ee8 into master Oct 24, 2017
@tvoinarovskyi tvoinarovskyi deleted the v2_records branch October 24, 2017 22:28
@dpkp (Owner)

dpkp commented Oct 24, 2017

Great work!
