Use Kafka Binaries for Integration Tests (Issue #176) #193

Merged
merged 10 commits into from
Aug 14, 2014
3 changes: 3 additions & 0 deletions .gitignore
@@ -5,3 +5,6 @@ build
dist
MANIFEST
env
servers/*/kafka-bin
.coverage
.noseids
6 changes: 0 additions & 6 deletions .gitmodules
@@ -1,6 +0,0 @@
[submodule "servers/0.8.0/kafka-src"]
path = servers/0.8.0/kafka-src
url = https://github.com/apache/kafka.git
[submodule "servers/0.8.1/kafka-src"]
path = servers/0.8.1/kafka-src
url = https://github.com/apache/kafka.git
9 changes: 6 additions & 3 deletions .travis.yml
@@ -5,8 +5,13 @@ python:
- 2.7
- pypy

env:
-
- KAFKA_VERSION=0.8.0
- KAFKA_VERSION=0.8.1
- KAFKA_VERSION=0.8.1.1

before_install:
- git submodule update --init --recursive
- sudo apt-get install libsnappy-dev
- ./build_integration.sh

@@ -19,5 +24,3 @@ install:

script:
- tox -e `./travis_selector.sh $TRAVIS_PYTHON_VERSION`
- KAFKA_VERSION=0.8.0 tox -e `./travis_selector.sh $TRAVIS_PYTHON_VERSION`
- KAFKA_VERSION=0.8.1 tox -e `./travis_selector.sh $TRAVIS_PYTHON_VERSION`
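The empty `env` entry above gives one build with no `KAFKA_VERSION` set, which the build script's comments describe as the "skip integration tests" case. The test-side code is not part of this diff, so the following is only a minimal sketch of how such a skip could be written; the decorator name `integration_test` is hypothetical.

```python
import os
import unittest


def integration_test(environ=os.environ):
    """Hypothetical decorator: skip the test unless KAFKA_VERSION is set.

    An unset or empty KAFKA_VERSION is treated as "no broker available",
    matching the build matrix entry that leaves the variable out.
    """
    return unittest.skipUnless(
        environ.get("KAFKA_VERSION"),
        "KAFKA_VERSION is not set; skipping integration test",
    )


class ExampleIntegrationTest(unittest.TestCase):
    @integration_test()
    def test_round_trip(self):
        # A real test would start brokers here; this body is a placeholder.
        self.assertTrue(os.environ["KAFKA_VERSION"])
```

The condition is evaluated when the decorator is applied, so the variable has to be set before the test module is imported.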
29 changes: 15 additions & 14 deletions README.md
@@ -190,32 +190,33 @@ pip install python-snappy
tox
```

## Run a single unit test
```shell
tox -e py27 -- -v --with-id 102
```

## Run the integration tests

The integration tests start a real local Zookeeper
instance and Kafka brokers, and send messages through them using the client.

Note that you may want to add this to your global gitignore:
First, get the kafka binaries for integration testing:
```shell
.gradle/
clients/build/
contrib/build/
contrib/hadoop-consumer/build/
contrib/hadoop-producer/build/
core/build/
core/data/
examples/build/
perf/build/
./build_integration.sh
```

First, check out and the Kafka source:
By default, the build_integration.sh script downloads binary
distributions for all supported Kafka versions.
To test against the latest source build, set KAFKA_VERSION=trunk
and optionally set SCALA_VERSION (defaults to 2.8.0, but 2.10.1 is recommended):
```shell
git submodule update --init
./build_integration.sh
SCALA_VERSION=2.10.1 KAFKA_VERSION=trunk ./build_integration.sh
```

Then run the tests against supported Kafka versions:
```shell
KAFKA_VERSION=0.8.0 tox
KAFKA_VERSION=0.8.1 tox
KAFKA_VERSION=0.8.1.1 tox
KAFKA_VERSION=trunk tox
```

62 changes: 59 additions & 3 deletions build_integration.sh
@@ -1,5 +1,61 @@
#!/bin/bash

git submodule update --init
(cd servers/0.8.0/kafka-src && ./sbt update package assembly-package-dependency)
(cd servers/0.8.1/kafka-src && ./gradlew jar)
# Versions available for testing via binary distributions
OFFICIAL_RELEASES="0.8.0 0.8.1 0.8.1.1"

# Useful configuration vars, with sensible defaults
if [ -z "$SCALA_VERSION" ]; then
SCALA_VERSION=2.8.0
fi

# On Travis CI, an empty KAFKA_VERSION means skip the integration tests,
# so we don't try to get binaries.
# Otherwise it means test all official releases, so we get all of them!
if [ -z "$KAFKA_VERSION" -a -z "$TRAVIS" ]; then
KAFKA_VERSION=$OFFICIAL_RELEASES
fi

# By default look for binary releases at archive.apache.org
if [ -z "$DIST_BASE_URL" ]; then
DIST_BASE_URL="https://archive.apache.org/dist/kafka/"
fi

# When testing against source builds, use this git repo
if [ -z "$KAFKA_SRC_GIT" ]; then
KAFKA_SRC_GIT="https://github.com/apache/kafka.git"
fi

pushd servers
mkdir -p dist
pushd dist
for kafka in $KAFKA_VERSION; do
if [ "$kafka" == "trunk" ]; then
if [ ! -d "$kafka" ]; then
git clone $KAFKA_SRC_GIT $kafka
fi
pushd $kafka
git pull
./gradlew -PscalaVersion=$SCALA_VERSION -Pversion=$kafka releaseTarGz -x signArchives
popd
# Not sure how to construct the .tgz name accurately, so use a wildcard (ugh)
tar xzvf $kafka/core/build/distributions/kafka_*.tgz -C ../$kafka/
rm $kafka/core/build/distributions/kafka_*.tgz
mv ../$kafka/kafka_* ../$kafka/kafka-bin
else
echo "-------------------------------------"
echo "Checking kafka binaries for ${kafka}"
echo
# Use DIST_BASE_URL (which must end with a slash) so the override above takes effect
wget -N "${DIST_BASE_URL}${kafka}/kafka_${SCALA_VERSION}-${kafka}.tgz" || wget -N "${DIST_BASE_URL}${kafka}/kafka_${SCALA_VERSION}-${kafka}.tar.gz"
echo
if [ ! -d "../$kafka/kafka-bin" ]; then
echo "Extracting kafka binaries for ${kafka}"
tar xzvf kafka_${SCALA_VERSION}-${kafka}.t* -C ../$kafka/
mv ../$kafka/kafka_${SCALA_VERSION}-${kafka} ../$kafka/kafka-bin
else
echo "$kafka/kafka-bin directory already exists -- skipping tgz extraction"
fi
fi
echo
done
popd
popd
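The version defaulting and download naming in the script above can be restated as a short Python sketch. It is an illustration of the same rules, not code from this pull request; the function names `versions_to_fetch` and `candidate_urls` are hypothetical.

```python
OFFICIAL_RELEASES = ["0.8.0", "0.8.1", "0.8.1.1"]
DEFAULT_BASE_URL = "https://archive.apache.org/dist/kafka/"


def versions_to_fetch(environ):
    """Mirror the script's defaulting rule for KAFKA_VERSION.

    - KAFKA_VERSION set: use exactly those space-separated versions.
    - Unset on Travis: fetch nothing (integration tests are skipped).
    - Unset elsewhere: fetch every official release.
    """
    requested = environ.get("KAFKA_VERSION", "")
    if requested:
        return requested.split()
    if environ.get("TRAVIS"):
        return []
    return list(OFFICIAL_RELEASES)


def candidate_urls(kafka_version, scala_version="2.8.0",
                   base_url=DEFAULT_BASE_URL):
    """Return the archive URLs to try, in the script's order.

    The script tries the .tgz name first and falls back to .tar.gz.
    base_url is assumed to end with a slash, as the default does.
    """
    stem = "{0}{1}/kafka_{2}-{1}".format(base_url, kafka_version, scala_version)
    return [stem + ".tgz", stem + ".tar.gz"]
```

The `trunk` case is left out on purpose: the script builds it from a git clone rather than downloading an archive.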
1 change: 0 additions & 1 deletion servers/0.8.0/kafka-src
Submodule kafka-src deleted from 15bb39
118 changes: 118 additions & 0 deletions servers/0.8.1.1/resources/kafka.properties
@@ -0,0 +1,118 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see kafka.server.KafkaConfig for additional details and defaults

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id={broker_id}

############################# Socket Server Settings #############################

# The port the socket server listens on
port={port}

# Hostname the broker will bind to. If not set, the server will bind to all interfaces
host.name={host}

# Hostname the broker will advertise to producers and consumers. If not set, it uses the
# value for "host.name" if configured. Otherwise, it will use the value returned from
# java.net.InetAddress.getCanonicalHostName().
#advertised.host.name=<hostname routable by clients>

# The port to publish to ZooKeeper for clients to use. If this is not set,
# it will publish the same port that the broker binds to.
#advertised.port=<port accessible by clients>

# The number of threads handling network requests
num.network.threads=2

# The number of threads doing disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=1048576

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=1048576

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600


############################# Log Basics #############################

# A comma separated list of directories under which to store log files
log.dirs={tmp_dir}/data

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions={partitions}
default.replication.factor={replicas}

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
# 1. Durability: Unflushed data may be lost if you are not using replication.
# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=536870912

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=60000

# By default the log cleaner is disabled and the log retention policy will default to just delete segments after their retention expires.
# If log.cleaner.enable=true is set the cleaner will be enabled and individual logs can then be marked for log compaction.
log.cleaner.enable=false

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated list of host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect={zk_host}:{zk_port}/{zk_chroot}

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=1000000
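The `{broker_id}`, `{port}`, and similar fields in this file are placeholders that the test harness fills in before starting a broker. The harness code is not shown in this diff, so the sketch below only assumes plain `str.format` substitution on an excerpt of the template; `render_properties` and the sample values are hypothetical.

```python
# Excerpt of the placeholder lines from kafka.properties above.
TEMPLATE = """\
broker.id={broker_id}
port={port}
host.name={host}
log.dirs={tmp_dir}/data
num.partitions={partitions}
default.replication.factor={replicas}
zookeeper.connect={zk_host}:{zk_port}/{zk_chroot}
"""


def render_properties(template, **bindings):
    """Fill every {name} placeholder; raises KeyError if one is missing."""
    return template.format(**bindings)


if __name__ == "__main__":
    print(render_properties(
        TEMPLATE,
        broker_id=0,
        port=9092,
        host="127.0.0.1",
        tmp_dir="/tmp/kafka-test",  # illustrative path only
        partitions=2,
        replicas=1,
        zk_host="127.0.0.1",
        zk_port=2181,
        zk_chroot="kafka-python",
    ))
```

This approach works only while the template contains no other literal braces, which holds for the file as shown.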
24 changes: 24 additions & 0 deletions servers/0.8.1.1/resources/log4j.properties
@@ -0,0 +1,24 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

log4j.rootLogger=INFO, stdout

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n

log4j.logger.kafka=DEBUG, stdout
log4j.logger.org.I0Itec.zkclient.ZkClient=INFO, stdout
log4j.logger.org.apache.zookeeper=INFO, stdout
21 changes: 21 additions & 0 deletions servers/0.8.1.1/resources/zookeeper.properties
@@ -0,0 +1,21 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# the directory where the snapshot is stored.
dataDir={tmp_dir}
# the port at which the clients will connect
clientPort={port}
clientPortAddress={host}
# disable the per-ip limit on the number of connections since this is a non-production config
maxClientCnxns=0
1 change: 0 additions & 1 deletion servers/0.8.1/kafka-src
Submodule kafka-src deleted from 150d0a