This repository has been archived by the owner on Apr 4, 2019. It is now read-only.

Merging kafka-connect-hdfs 3.3.0 to support Kafka 0.11 #49

Open: wants to merge 105 commits into base: master
Commits (105)
24fcba4
"Version bump to 3.2.0-SNAPSHOT"
Oct 3, 2016
c8a2a83
Version bump to 0.10.2.0-SNAPSHOT
Oct 5, 2016
76e40c8
fix version to 3.2 in docs
Oct 5, 2016
b254849
Merge branch '3.1.x'
ewencp Oct 11, 2016
7b5dfa1
fix a typo
aseigneurin Oct 13, 2016
b8de69c
fix a typo
aseigneurin Oct 13, 2016
9e0c9c1
Merge pull request #135 from aseigneurin/fix-typos
ewencp Oct 13, 2016
c058019
fix typo
hjespers Oct 14, 2016
43379dc
Merge remote-tracking branch 'origin/3.1.x'
ewencp Oct 18, 2016
0f6173c
CC-273: connect deps should be scoped as provided
shikhar Nov 4, 2016
e42429d
Switch to toEnrichedRst() for config opts docgen
shikhar Nov 4, 2016
f68beb4
CC-391: Fix configuration with DefaultPartitioner
kkonstantine Nov 15, 2016
be9a84e
Merge branch 'CC-391-Configuration-with-DefaultPartitioner-compares-a…
kkonstantine Nov 17, 2016
f38410a
HOTFIX: Override new method in MockSinkTaskContext.
kkonstantine Dec 2, 2016
e26514e
Merge branch 'HOTFIX-Override-newly-added-method-in-SinkTaskContext-i…
kkonstantine Dec 6, 2016
e3f6641
Fix topics with periods for hive #136 #137
ig-michaelpearce Jan 14, 2017
fb2b0ab
accidental un-needed format change, revert
ig-michaelpearce Jan 14, 2017
0f754b8
remove accidental new line added
ig-michaelpearce Jan 14, 2017
413a169
Merge pull request #164 from IG-Group/ISSUE-136
ewencp Jan 17, 2017
b0c4839
Updated Readme for Kafka connect FAQ (#75)
rekhajoshm Jan 19, 2017
9841a90
updated avro-tools instructions in doc (#65)
coughman Jan 19, 2017
3090fa7
Merge with docs branch
Jan 24, 2017
9838b61
Merge branch '3.1.x'
Jan 24, 2017
5b21d21
Bump version to 3.3.0-SNAPSHOT
Jan 25, 2017
cf40f3b
Bump Confluent to 3.3.0-SNAPSHOT, Kafka to 0.10.3.0-SNAPSHOT
Jan 25, 2017
9196b6c
Bump version to 3.2.0-SNAPSHOT
Jan 25, 2017
34c9088
MINOR: Upgrade Hadoop version to 2.7.3 and joda-time to 2.9.7
kkonstantine Feb 9, 2017
55716d4
Merge branch '3.2.x'
kkonstantine Feb 17, 2017
4e357b5
Handle primitive types in AvroRecordWriterProvider. (#176)
ewencp Feb 27, 2017
24cb9c0
Merge remote-tracking branch 'origin/3.0.x' into 3.1.x
ewencp Feb 27, 2017
545f2f3
Merge remote-tracking branch 'origin/3.1.x' into 3.2.x
ewencp Feb 27, 2017
2c8522e
Merge branch '3.2.x'
ewencp Feb 27, 2017
535be47
Set Confluent to 3.2.0, Kafka to 0.10.2.0-cp1.
Feb 27, 2017
da94c38
Bump version to 3.2.1-SNAPSHOT
Mar 2, 2017
58f6c85
Bump Confluent to 3.2.1-SNAPSHOT
Mar 2, 2017
8c0596b
Bump Kafka to 0.10.2.1-SNAPSHOT
Mar 2, 2017
9afe713
Bump Kafka to 0.11.0.0-SNAPSHOT
Mar 4, 2017
1abe85c
Merge remote-tracking branch 'origin/3.2.x'
norwood Mar 9, 2017
ef68aa6
CC-437: Update 3.2.0 changelog
ewencp Mar 13, 2017
241eb77
Add missing PR-170 to release notes.
ewencp Mar 15, 2017
2a55cd9
Merge remote-tracking branch 'origin/3.2.0-post' into 3.2.x
ewencp Mar 15, 2017
60a5323
Merge remote-tracking branch 'origin/3.2.x'
ewencp Mar 15, 2017
25b10bb
Merge pull request #179 from confluentinc/cc-437-3.2.0-changelog
ewencp Mar 15, 2017
713de9a
Merge remote-tracking branch 'origin/3.2.0-post' into 3.2.x
ewencp Mar 15, 2017
4e12b94
Merge remote-tracking branch 'origin/3.2.x'
ewencp Mar 15, 2017
449ef58
Add 3.2.1 changelog.
ewencp Apr 11, 2017
54e0943
Merge pull request #184 from confluentinc/3.2.1-changelog
ewencp Apr 12, 2017
0b53750
Merge remote-tracking branch 'origin/3.2.x'
ewencp Apr 12, 2017
8b439a0
Set Confluent to 3.2.1, Kafka to 0.10.2.1-cp1.
May 1, 2017
64598ed
Bump version to 3.2.2-SNAPSHOT
May 4, 2017
cee729b
Bump Confluent to 3.2.2-SNAPSHOT, Kafka to 0.10.2.2-SNAPSHOT
May 4, 2017
7116cef
Merge remote-tracking branch 'origin/3.2.1-post' into 3.2.x
norwood May 4, 2017
7efda02
Merge remote-tracking branch 'origin/3.2.x'
norwood May 4, 2017
e786248
Fix HdfsSinkConnector to extend from SinkConnector instead of Connector.
ewencp May 19, 2017
33023cb
Merge pull request #194 from confluentinc/extend-sink-connector
ewencp May 19, 2017
d52155f
Merge remote-tracking branch 'origin/3.0.x' into 3.1.x
ewencp May 19, 2017
01af529
Merge remote-tracking branch 'origin/3.1.x' into 3.2.x
ewencp May 19, 2017
d89f97a
Merge remote-tracking branch 'origin/3.2.x'
ewencp May 19, 2017
cc37611
CC-491: Consolidate and simplify unit tests of HDFS connector
kkonstantine Apr 20, 2017
02d1bb7
Move verification methods and members to TestWithMiniDFSCluster
kkonstantine Apr 20, 2017
e939f75
Consolidate records creation methods in TopicPartitionWriterTest too.
kkonstantine Apr 21, 2017
b87115f
Fix hive tests.
kkonstantine Apr 22, 2017
66ddb58
Add tests with interleaved (non consecutive) records between partitions.
kkonstantine Apr 22, 2017
815a693
Code style fixes.
kkonstantine Apr 22, 2017
e57fed9
Increase maximum perm size to accommodate failing tests.
kkonstantine Nov 28, 2016
d449c4f
Upgrade surefire parameters for test forking.
kkonstantine Dec 14, 2016
b8a335b
Clean commented out code and fix typo.
kkonstantine Apr 26, 2017
be14311
Remove debugging statement from parquet test.
kkonstantine May 24, 2017
fd28e5e
Merge branch 'CC-491-Consolidate-unit-tests-of-HDFS-connector'
kkonstantine May 24, 2017
6970a12
Fix incorrect licensing and webpage info.
ewencp May 26, 2017
5d252e9
Merge pull request #200 from confluentinc/remove-incorrect-license
ewencp May 26, 2017
08d8f1a
Merge remote-tracking branch 'origin/3.0.x' into 3.1.x
ewencp May 26, 2017
6ccc55b
Merge remote-tracking branch 'origin/3.1.x' into 3.2.x
ewencp May 26, 2017
5ac7268
Merge remote-tracking branch 'origin/3.2.x' into 3.3.x
ewencp May 26, 2017
3592793
Upgrade avro to 1.8.0
mageshn Jun 6, 2017
c1d1f97
Add 3.2.2 changelog
ewencp Jun 6, 2017
78b79a7
Merge pull request #204 from confluentinc/3.2.2-changelog
ewencp Jun 7, 2017
95b100b
Merge remote-tracking branch 'origin/3.2.x' into 3.3.x
ewencp Jun 7, 2017
a1fd0c4
Upgrade avro to 1.8.2
mageshn Jun 7, 2017
c464ef2
Upgrade avro to 1.8.2
mageshn Jun 8, 2017
437485a
jenkinsfile time
norwood Jun 12, 2017
2afec6a
Merge pull request #205 from confluentinc/avro-upgrade
ewencp Jun 13, 2017
5337079
Set Confluent to 3.2.2, Kafka to 0.10.2.1-cp2.
Jun 15, 2017
28eb13b
Add 3.3.0 changelog
kkonstantine Jun 21, 2017
428f211
Merge pull request #209 from kkonstantine/3.3.0-changelog
kkonstantine Jun 21, 2017
367da71
Update quickstart to use Confluent CLI
kkonstantine Jul 27, 2017
2e8694d
Add stop instructions at the end of the quickstart.
kkonstantine Jul 27, 2017
e2d9b26
Improve wording around 'confluent log connect'
kkonstantine Jul 27, 2017
bb6996d
Set Confluent to 3.3.0, Kafka to 0.11.0.0-cp1.
Jul 27, 2017
ddddfd3
Merge pull request #217 from kkonstantine/Update-quickstart-to-use-CLI
kkonstantine Jul 27, 2017
6af158f
Bump version to 3.3.1-SNAPSHOT
Jul 28, 2017
4a7803f
Bump Confluent to 3.3.1-SNAPSHOT
Jul 28, 2017
d04f924
Merge remote-tracking branch 'origin/3.3.0-post' into 3.3.x
Jul 28, 2017
326bdcd
Bump Kafka to 0.11.0.1-SNAPSHOT
Jul 28, 2017
4f73c33
Merge remote-tracking branch 'origin/3.2.2-post' into 3.2.x
Jul 31, 2017
79cbee7
Merge branch '3.2.x' into 3.3.x
Jul 31, 2017
44e0d85
Bump version to 3.2.3-SNAPSHOT
Jul 31, 2017
fb41464
Bump Confluent to 3.2.3-SNAPSHOT
Jul 31, 2017
d890d61
Merge remote-tracking branch 'origin/3.2.x' into 3.3.x
Jul 31, 2017
4ec9fe2
Set Confluent to 3.3.0-hotfix.1, Kafka to 0.11.0.0-hotfix.1.
ConfluentJenkins Aug 17, 2017
a4c6874
Merging upstream v3.3.0-hotfix.1 tag to support Kafka 0.11.0.0.
Sep 16, 2017
91322bf
Merging upstream v3.3.0-hotfix.1 tag to support Kafka 0.11.0.0.
lewisdawson Sep 16, 2017
e1041b5
Added ability to connector to convert CSV data to Parquet before uplo…
Sep 20, 2017
03eb245
Added ability to connector to convert CSV data to Parquet before uplo…
lewisdawson Sep 20, 2017
227de21
Merge branch 'master' of github.com:lewisdawson/streamx
Sep 20, 2017
4 changes: 4 additions & 0 deletions Jenkinsfile
@@ -0,0 +1,4 @@
#!/usr/bin/env groovy
common {
slackChannel = '#connect-eng'
}

Review comment: Is the Jenkinsfile needed?

4 changes: 2 additions & 2 deletions NOTICE
@@ -285,8 +285,8 @@ The following libraries are included in packaged versions of this project:

* Pentaho aggdesigner
* COPYRIGHT: Copyright 2006 - 2013 Pentaho Corporation
* LICENSE: licenses/LICENSE.gpl2.txt
* HOMEPAGE: https://github.com/pentaho/pentaho-aggdesigner
* LICENSE: licenses/LICENSE.apache2.txt
* HOMEPAGE: https://github.com/julianhyde/aggdesigner/tree/master/pentaho-aggdesigner-algorithm

* SLF4J
* COPYRIGHT: Copyright (c) 2004-2013 QOS.ch
23 changes: 23 additions & 0 deletions docs/changelog.rst
@@ -3,6 +3,29 @@
Changelog
=========

Version 3.3.0
-------------

* `PR-187 <https://github.com/confluentinc/kafka-connect-hdfs/pull/187>`_ - CC-491: Consolidate and simplify unit tests of HDFS connector.
* `PR-205 <https://github.com/confluentinc/kafka-connect-hdfs/pull/205>`_ - Upgrade avro to 1.8.2.

Version 3.2.2
-------------

* `PR-194 <https://github.com/confluentinc/kafka-connect-hdfs/pull/194>`_ - Fix HdfsSinkConnector to extend from SinkConnector instead of Connector.
* `PR-200 <https://github.com/confluentinc/kafka-connect-hdfs/pull/200>`_ - Fix incorrect licensing and webpage info.

Version 3.2.1
-------------
No changes

Version 3.2.0
-------------

* `PR-135 <https://github.com/confluentinc/kafka-connect-hdfs/pull/135>`_ - Fix typos
* `PR-164 <https://github.com/confluentinc/kafka-connect-hdfs/pull/164>`_ - Issue 136 - Support topic with dots in hive.
* `PR-170 <https://github.com/confluentinc/kafka-connect-hdfs/pull/170>`_ - MINOR: Upgrade Hadoop version to 2.7.3 and joda-time to 2.9.7

Version 3.1.1
-------------
No changes
4 changes: 2 additions & 2 deletions docs/conf.py
@@ -57,9 +57,9 @@ def setup(app):
# built documents.
#
# The short X.Y version.
version = '3.0'
version = '3.3'
# The full version, including alpha/beta/rc tags.
release = '3.1.3-SNAPSHOT'
release = '3.3.0-hotfix.1'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
150 changes: 123 additions & 27 deletions docs/hdfs_connector.rst
@@ -5,8 +5,8 @@ The HDFS connector allows you to export data from Kafka topics to HDFS files in various formats
and integrates with Hive to make data immediately available for querying with HiveQL.

The connector periodically polls data from Kafka and writes them to HDFS. The data from each Kafka
topic is partitioned by the provided partitioner and divided into chucks. Each chunk of data is
represented as an HDFS file with topic, kafka partition, start and end offsets of this data chuck
topic is partitioned by the provided partitioner and divided into chunks. Each chunk of data is
represented as an HDFS file with topic, kafka partition, start and end offsets of this data chunk
in the filename. If no partitioner is specified in the configuration, the default partitioner which
preserves the Kafka partitioning is used. The size of each data chunk is determined by the number of
records written to HDFS, the time written to HDFS and schema compatibility.
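
For illustration, the chunking behavior described above is governed by a handful of connector
properties. A minimal sketch (the values here are placeholders, not recommendations):

.. sourcecode:: bash

# Roll a new HDFS file once this many records have been written
flush.size=3
# Optionally also roll on a wall-clock interval, in milliseconds
rotate.interval.ms=60000
# The default partitioner preserves the Kafka partitioning
partitioner.class=io.confluent.connect.hdfs.partitioner.DefaultPartitioner

With ``flush.size=3`` and the default partitioner, the three records written in the quickstart
below end up in a single file such as ``test_hdfs+0+0000000000+0000000002.avro``.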
@@ -20,9 +20,8 @@ Quickstart
In this Quickstart, we use the HDFS connector to export data produced by the Avro console producer
to HDFS.

Start Zookeeper, Kafka and SchemaRegistry if you haven't done so. The instructions on how to start
these services are available at the Confluent Platform QuickStart. You also need to have Hadoop
running locally or remotely and make sure that you know the HDFS url. For Hive integration, you
Before you start the Confluent services, make sure Hadoop is
running locally or remotely and that you know the HDFS url. For Hive integration, you
need to have Hive installed and to know the metastore thrift uri.

This Quickstart assumes that you started the required services with the default configurations and
@@ -39,12 +38,41 @@ Also, this Quickstart assumes that security is not configured for HDFS and Hive
please make the necessary configurations change following `Secure HDFS and Hive Metastore`_
section.

First, start the Avro console producer::
First, start all the necessary services using Confluent CLI.

.. tip::

If not already in your PATH, add Confluent's ``bin`` directory by running: ``export PATH=<path-to-confluent>/bin:$PATH``

.. sourcecode:: bash

$ confluent start

Every service will start in order, printing a message with its status:

.. sourcecode:: bash

Starting zookeeper
zookeeper is [UP]
Starting kafka
kafka is [UP]
Starting schema-registry
schema-registry is [UP]
Starting kafka-rest
kafka-rest is [UP]
Starting connect
connect is [UP]

Next, start the Avro console producer to import a few records to Kafka:

.. sourcecode:: bash

$ ./bin/kafka-avro-console-producer --broker-list localhost:9092 --topic test_hdfs \
--property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"f1","type":"string"}]}'

Then in the console producer, type in::
Then in the console producer, type in:

.. sourcecode:: bash

{"f1": "value1"}
{"f1": "value2"}
@@ -54,42 +82,102 @@ The three records entered are published to the Kafka topic ``test_hdfs`` in Avro

Before starting the connector, please make sure that the configurations in
``etc/kafka-connect-hdfs/quickstart-hdfs.properties`` are properly set to your configurations of
Hadoop, e.g. ``hdfs.url`` points to the proper HDFS and using FQDN in the host. Then run the
following command to start Kafka connect with the HDFS connector::
Hadoop, e.g. ``hdfs.url`` points to the proper HDFS and uses an FQDN for the host. Then start the connector by loading its
configuration with the following command:

.. sourcecode:: bash

$ confluent load hdfs-sink -d etc/kafka-connect-hdfs/quickstart-hdfs.properties
{
"name": "hdfs-sink",
"config": {
"connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
"tasks.max": "1",
"topics": "test_hdfs",
"hdfs.url": "hdfs://localhost:9000",
"flush.size": "3",
"name": "hdfs-sink"
},
"tasks": []
}

$ ./bin/connect-standalone etc/schema-registry/connect-avro-standalone.properties \
etc/kafka-connect-hdfs/quickstart-hdfs.properties
To check that the connector started successfully, view the Connect worker's log by running:

You should see that the process starts up and logs some messages, and then exports data from Kafka
to HDFS. Once the connector finishes ingesting data to HDFS, check that the data is available
in HDFS::
.. sourcecode:: bash

$ confluent log connect

Towards the end of the log you should see that the connector starts, logs a few messages, and then exports
data from Kafka to HDFS.
Once the connector finishes ingesting data to HDFS, check that the data is available in HDFS:

.. sourcecode:: bash

$ hadoop fs -ls /topics/test_hdfs/partition=0

You should see a file with the name ``/topics/test_hdfs/partition=0/test_hdfs+0+0000000000+0000000002.avro``.
The file name is encoded as ``topic+kafkaPartition+startOffset+endOffset.format``.

You can use ``avro-tools-1.7.7.jar``
(available in `Apache mirrors <http://mirror.metrocast.net/apache/avro/avro-1.7.7/java/avro-tools-1.7.7.jar>`_)
to extract the content of the file. Run ``avro-tools`` directly on Hadoop as::
You can use ``avro-tools-1.8.2.jar``
(available in `Apache mirrors <http://mirror.metrocast.net/apache/avro/avro-1.8.2/java/avro-tools-1.8.2.jar>`_)
to extract the content of the file. Run ``avro-tools`` directly on Hadoop as:

.. sourcecode:: bash

$ hadoop jar avro-tools-1.8.2.jar tojson \
hdfs://<namenode>/topics/test_hdfs/partition=0/test_hdfs+0+0000000000+0000000002.avro

where "<namenode>" is the HDFS name node hostname.

$ hadoop jar avro-tools-1.7.7.jar tojson \
/topics/test_hdfs/partition=0/test_hdfs+0+0000000000+0000000002.avro
or, if you experience issues, first copy the avro file from HDFS to the local filesystem and try again with java:

or, if you experience issues, first copy the avro file from HDFS to the local filesystem and try again with java::
.. sourcecode:: bash

$ hadoop fs -copyToLocal /topics/test_hdfs/partition=0/test_hdfs+0+0000000000+0000000002.avro \
/tmp/test_hdfs+0+0000000000+0000000002.avro

$ java -jar avro-tools-1.7.7.jar tojson /tmp/test_hdfs+0+0000000000+0000000002.avro
$ java -jar avro-tools-1.8.2.jar tojson /tmp/test_hdfs+0+0000000000+0000000002.avro

You should see the following output::
You should see the following output:

.. sourcecode:: bash

{"f1":"value1"}
{"f1":"value2"}
{"f1":"value3"}

Finally, stop the Connect worker as well as all the rest of the Confluent services by running:

.. sourcecode:: bash

$ confluent stop
Stopping connect
connect is [DOWN]
Stopping kafka-rest
kafka-rest is [DOWN]
Stopping schema-registry
schema-registry is [DOWN]
Stopping kafka
kafka is [DOWN]
Stopping zookeeper
zookeeper is [DOWN]

or stop all the services and additionally wipe out any data generated during this quickstart by running:

.. sourcecode:: bash

$ confluent destroy
Stopping connect
connect is [DOWN]
Stopping kafka-rest
kafka-rest is [DOWN]
Stopping schema-registry
schema-registry is [DOWN]
Stopping kafka
kafka is [DOWN]
Stopping zookeeper
zookeeper is [DOWN]
Deleting: /tmp/confluent.w1CpYsaI

.. note:: If you want to run the Quickstart with Hive integration, before starting the connector,
you need to add the following configurations to
@@ -144,7 +232,9 @@ description of the available configuration options.

Example
~~~~~~~
Here is the content of ``etc/kafka-connect-hdfs/quickstart-hdfs.properties``::
Here is the content of ``etc/kafka-connect-hdfs/quickstart-hdfs.properties``:

.. sourcecode:: bash

name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
@@ -166,18 +256,22 @@ Format and Partitioner
~~~~~~~~~~~~~~~~~~~~~~
You need to specify the ``format.class`` and ``partitioner.class`` if you want to write other
formats to HDFS or use other partitioners. The following example configuration demonstrates how to
write Parquet format and use hourly partitioner::
write Parquet format and use hourly partitioner:

.. sourcecode:: bash

format.class=io.confluent.connect.hdfs.parquet.ParquetFormat
partitioner.class=io.confluent.connect.hdfs.partitioner.HourlyPartitioner

.. note:: If you want ot use the field partitioner, you need to specify the ``partition.field.name``
.. note:: If you want to use the field partitioner, you need to specify the ``partition.field.name``
configuration as well to specify the field name of the record.
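
For instance, a field-based layout could be sketched as follows; ``userid`` is a hypothetical
field name and must be replaced with a field that actually exists in your records:

.. sourcecode:: bash

partitioner.class=io.confluent.connect.hdfs.partitioner.FieldPartitioner
# hypothetical field to partition by
partition.field.name=userid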

Hive Integration
~~~~~~~~~~~~~~~~
At minimum, you need to specify ``hive.integration``, ``hive.metastore.uris`` and
``schema.compatibility`` when integrating Hive. Here is an example configuration::
``schema.compatibility`` when integrating Hive. Here is an example configuration:

.. sourcecode:: bash

hive.integration=true
hive.metastore.uris=thrift://localhost:9083 # FQDN for the host part
@@ -203,7 +297,9 @@ latest Hive table schema. Please find more information on schema compatibility i
Secure HDFS and Hive Metastore
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To work with secure HDFS and Hive metastore, you need to specify ``hdfs.authentication.kerberos``,
``connect.hdfs.principal``, ``connect.keytab``, ``hdfs.namenode.principal``::
``connect.hdfs.principal``, ``connect.keytab``, ``hdfs.namenode.principal``:

.. sourcecode:: bash

hdfs.authentication.kerberos=true
connect.hdfs.principal=connect-hdfs/_HOST@YOUR-REALM.COM