Example of Summingbird in hybrid mode.

I'm currently improving it so that it can serve as a production-ready bootstrap for any job.

See the logged issues for planned improvements, and feel free to add your own!


 - Get Kafka 0.7.2 here:

Unzip it anywhere and follow the instructions up to step 4:

To test it, run a producer and a consumer in two separate shell windows and type anything into the producer window. The same text should appear in the consumer window.

Note: the example is currently incompatible with Kafka 0.8.x because KafkaSpout is only available for version 0.7.x. Some examples are available but are not production ready.

 - Get Memcached here:

On OSX you can use brew to get it:

   brew install memcached


 - First get summingbird from my cloned repository:
This clone uses version 0.8.0 of the Storehaus library because of a compatibility issue with the Memcached store.

 - Build the project using the following commands:
    cd summingbird
    ./sbt update compile

 - Compile the summingbird-example project:
    cd summingbird-example
    ./sbt update compile


Configuration strings are located in src/main/resources/ file.

All configuration strings are retrieved from package objects.

 - Change all the paths for Scalding. The Storm and memcached configuration values can be left unchanged (they default to local mode).

Running everything

 - Start ZooKeeper, the Kafka server and a Kafka producer: follow steps 2 and 3 of the Kafka instructions.

 - Start the memcached service: go to the memcached folder and run the ```memcached``` command

 - Start example "service":
    ./sbt "company-hybrid-example/run --local"

This starts both the Storm and Scalding platforms.

 - Start the example console in another terminal window to send data and test Kafka (this will soon be replaced by a real test producer):
    ./sbt "company-hybrid-example/console"

You can now send messages from your Kafka producer which will be consumed by Summingbird.

To test the Storm consumer, type in the Scala REPL:
    scala> import
    scala> HybridClient.stormLookup("timestamp") // This tests the Storm store linked to the ClientStore
    res1: Option[Long] = Some(2)
    scala> HybridClient.processHadoop // This inserts a file containing 2 lines with timestamp and 1 line with another JSON key/value pair
    scala> HybridClient.lookup("timestamp")
    [... lot of logs ...]
    scala> HybridClient.lookup("timestamp") // This queries the ClientStore to retrieve the merged values from the Storm and Scalding stores
    res2: Option[Long] = Some(4)

To test the Scalding consumer, type in the Scala REPL (to be replaced by a real consumer app as well):
    scala> HybridClient.processHadoop
    [... some logging ...]
    scala> HybridClient.hadoopLookup
    Results : Right((BatchID.11636597,ReaderFn(<function1>)))
    scala> ScaldingRunner.queryFiles()
    14/04/02 00:36:35 INFO compress.CodecPool: Got brand-new decompressor
    14/04/02 00:36:35 INFO compress.CodecPool: Got brand-new decompressor
    14/04/02 00:36:35 INFO compress.CodecPool: Got brand-new decompressor
    lolstamp : 1
    timestamp : 2


For Storm: make sure everything is already running (Kafka and memcached).

For Scalding: change the directories in the Scalding package object and create a file in JSON format somewhere for the job to consume.
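
As a rough sketch of the Scalding input step, the commands below create a small JSON file with two `timestamp` lines and one line with another key, matching the REPL session above. The directory, the file name `events.json`, the values and the one-object-per-line layout are all assumptions made for illustration; the example's real schema may differ, so adapt them to whatever the Scalding package object points at.

```shell
# Hypothetical input directory; use the path configured in the Scalding package object.
INPUT_DIR="${TMPDIR:-/tmp}/summingbird-example-input"
mkdir -p "$INPUT_DIR"

# Three JSON events, one per line: two with a "timestamp" key, one with another key.
# The values and the exact schema are assumptions, not taken from the project.
cat > "$INPUT_DIR/events.json" <<'EOF'
{"timestamp": 1396395395}
{"timestamp": 1396395396}
{"lolstamp": 1396395397}
EOF

# Count how often each leading key occurs, to compare with the job's results.
sed -E 's/^\{"([^"]+)".*/\1/' "$INPUT_DIR/events.json" | sort | uniq -c
```

The final command should report one `lolstamp` and two `timestamp` entries, which is the shape of result shown by `ScaldingRunner.queryFiles()` above.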

