This repository contains a Kafka partition stress test. Its goal is to make it easier to validate changes to Kafka with respect to how many concurrent replicated partitions it can support.
We want to ensure that our Kafka users have the following guarantees:
- Any topic or partition is available for consuming or producing at any time
- Consume and produce latencies stay below certain SLAs
- Users can reset the consumer offset to the beginning and re-consume at any time
Given that Kafka currently (as of 02/02/18) doesn't limit the number of topics you are allowed to create, this tool helped us answer, "how many topics and partitions can we place on our multitenant Kafka systems before things start going downhill?"
This will create a runnable jar in the target directory called kafka_partition_availability_benchmark.jar:

mvn package
You can see all configurable parameters in src/main/resources/kafka-partition-availability-benchmark.properties
The defaults are set to something you can run against a local single-broker installation of Kafka. In most cases, you probably only need to set four things to put stress on a real test environment:
cat > ~/.kafka-partition-availability-benchmark.properties << EOF
kafka.replication.factor = 3
num_topics = 4000
num_concurrent_consumers = 4000
kafka.brokers = my_test_kafka:9092
EOF
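With the properties file in place, you can point the benchmark at your test cluster by launching the jar built earlier. This is a minimal sketch; it assumes the tool picks up ~/.kafka-partition-availability-benchmark.properties from your home directory, as the examples here suggest:

java -jar target/kafka_partition_availability_benchmark.jar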
Depending on how powerful your test runner host is, you might be able to bump the number of topics past 4000. In our testing, 4000 was what an i3.2xlarge instance could bear before test results started to get skewed. To get past 4000 topics, we ran this tool on multiple test runners. We recommend setting the default topic prefix to something unique per test runner by doing something like this:
echo "default_topic_prefix = `hostname`" >> ~/.kafka-partition-availability-benchmark.properties
By default, the benchmark will only produce one message per partition and re-consume those messages every second. To produce continuously in the same fashion instead of relying on re-consuming the same messages, set the following configuration option:
cat >> ~/.kafka-partition-availability-benchmark.properties << EOF
keep_producing = true
EOF
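For reference, after appending that option the combined properties file from the examples above would look something like this (illustrative values only; a per-runner default_topic_prefix line would also appear if you set one as described earlier):

kafka.replication.factor = 3
num_topics = 4000
num_concurrent_consumers = 4000
kafka.brokers = my_test_kafka:9092
keep_producing = true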
You can use your favorite configuration management tool, such as Ansible, to make the following more elegant. From a central host you can do something like this:
# ${test_runners} is assumed to be a space-separated list of test runner hostnames
for test_runner in ${test_runners}; do
    # Copy the shared properties file and the benchmark jar to the runner's home directory
    rsync ~/.kafka-partition-availability-benchmark.properties target/kafka_partition_availability_benchmark.jar ${test_runner}:
    # Give each runner a unique topic prefix based on its hostname
    ssh ${test_runner} 'echo "default_topic_prefix = `hostname`" >> ~/.kafka-partition-availability-benchmark.properties'
    # Start the benchmark in the background and capture its output in a log file
    ssh ${test_runner} 'nohup java -jar kafka_partition_availability_benchmark.jar &> kafka_partition_availability_benchmark.log &';
    # Raise the open file descriptor limit for the benchmark process (many topics means many connections and files)
    ssh ${test_runner} 'for pr in `pgrep -f kafka_partition_availability_benchmark`; do sudo prlimit -n50000:50000 -p $pr; done';
done
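Once the runners are going, a simple way to keep an eye on them is to tail the log file that the nohup command above writes on each host (a small sketch reusing the same ${test_runners} list):

for test_runner in ${test_runners}; do
    ssh ${test_runner} 'tail -n 20 kafka_partition_availability_benchmark.log'
done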