services | platforms | author |
---|---|---|
hdinsight | java | blackmist |
A basic example of a Java-based Apache Storm topology that can be used with Storm on HDInsight. This project demonstrates two ways of defining a Java-based Storm topology: one defines the topology programmatically in Java, while the other defines the topology using Flux.
The primary difference between the two projects is that defining a topology using Flux separates configuration from implementation. With Flux, the topology (including its configuration parameters) is defined in a YAML file that is provided when you start the topology. This allows you to change the configuration without recompiling the project.
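To illustrate what separating configuration from implementation can look like, here is a minimal sketch in plain Java. It builds a component from a class name and a constructor argument that are supplied as data instead of being hard-coded. The `ConfigDrivenSketch` and `Counter` names are hypothetical, and the sketch only loosely models the idea; it is not Flux's actual implementation and does not use the Storm API.

```java
import java.lang.reflect.Constructor;

public class ConfigDrivenSketch {

    /** Hypothetical component whose emit interval is supplied at construction time. */
    public static class Counter {
        private final int emitFrequencySeconds;

        public Counter(Integer emitFrequencySeconds) {
            this.emitFrequencySeconds = emitFrequencySeconds;
        }

        public int getEmitFrequencySeconds() {
            return emitFrequencySeconds;
        }
    }

    /** Creates an object from a class name and a single Integer constructor argument. */
    public static Object build(String className, Integer constructorArg) {
        try {
            Class<?> type = Class.forName(className);
            Constructor<?> constructor = type.getConstructor(Integer.class);
            return constructor.newInstance(constructorArg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not build " + className, e);
        }
    }

    public static void main(String[] args) {
        // Stand-ins for values that would normally come from a configuration file.
        String className = Counter.class.getName();
        Integer interval = args.length > 0 ? Integer.valueOf(args[0]) : 10;

        Counter counter = (Counter) build(className, interval);
        System.out.println("Emit interval: " + counter.getEmitFrequencySeconds() + " seconds");
    }
}
```

Because the interval arrives as an argument, running the sketch with a different value changes its behavior without a rebuild, which is the same benefit the YAML definition is meant to provide.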
NOTE: Flux is available with Storm 0.10.x, which is included with Storm on HDInsight 3.3 and 3.4. If you are using an older version of Storm on HDInsight, you cannot use Flux and should instead use the project in the Java directory.
See Develop a Java topology for Storm on HDInsight for a walkthrough of the steps used to create this project.
NOTE: This project assumes Storm 1.0.1, which is available with Storm on HDInsight cluster version 3.5.
- Fork/Clone the repository to your development environment.
- Install Java JDK 7 or higher. This was tested with Oracle Java 7 and 8, but should also work with other distributions such as OpenJDK.
- Install Maven.
- Assuming Java and Maven are both in the path and JAVA_HOME is configured correctly, use the following command to build the topology in the development environment:
mvn compile package
- If you have installed Storm in your development environment, you can use the following command to run the topology in local mode for testing:
storm jar target/WordCount-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --local -R /topology.yaml
The --local parameter runs the topology in local mode on your development environment. The -R /topology.yaml parameter uses the topology.yaml file resource from the jar file to define the topology.

As it runs, the topology displays startup information. It then begins to display lines similar to the following as sentences are emitted from the spout and processed by the bolts:
    17:33:27 [Thread-12-count] INFO com.microsoft.example.WordCount - Emitting a count of 56 for word snow
    17:33:27 [Thread-12-count] INFO com.microsoft.example.WordCount - Emitting a count of 56 for word white
    17:33:27 [Thread-12-count] INFO com.microsoft.example.WordCount - Emitting a count of 112 for word seven
    17:33:27 [Thread-16-count] INFO com.microsoft.example.WordCount - Emitting a count of 195 for word the
    17:33:27 [Thread-30-count] INFO com.microsoft.example.WordCount - Emitting a count of 113 for word and
    17:33:27 [Thread-30-count] INFO com.microsoft.example.WordCount - Emitting a count of 57 for word dwarfs
There is a 10-second delay between batches of logged information because the WordCount component waits for a tick tuple before emitting, and the default interval defined in the YAML file is 10 seconds.
IMPORTANT!
If you are using Storm on a Windows development machine, you may see errors similar to the following:
    2017-12-11 16:28:44,792 main ERROR Unable to create file C:\tools\apache-storm-1.1.1\logs/access-web-${sys:daemon.name}.log
    java.io.IOException: The filename, directory name, or volume label syntax is incorrect
To work around this error, go to your local Storm development installation and edit the log4j2\cluster.xml file. Find the line that begins with <RollingFile name="WEB-ACCESS", and remove the string -${sys:daemon.name} from the fileName property.

On Windows, if no output is generated to the console, you can find it stored in the <storm installation directory>\logs\jar.log file.

- Make a copy of the topology.yaml file from the project. Call it something like newtopology.yaml. In the file, find the following section and change the value of 10 to 5. This changes the interval between emitting batches of word counts from 10 seconds to 5.

    - id: "counter-bolt"
      className: "com.microsoft.example.WordCount"
      constructorArgs:
        - 10
      parallelism: 1
- To run the topology in local mode, use the following command:
storm jar target/WordCount-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --local /path/to/newtopology.yaml
Change /path/to/newtopology.yaml to the path of the newtopology.yaml file you created in the previous step. This command uses newtopology.yaml as the topology definition.

Once the topology starts, you should notice that the time between emitted batches has changed to reflect the value in newtopology.yaml. This shows that you can change the configuration through a YAML file without having to recompile the topology.
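The effect of the interval value on the logged batches can be sketched in plain Java. This is a simplified simulation under stated assumptions, not the project's WordCount bolt: the `TickCounterSketch` class, its `onWord` and `onSecond` methods, and the driver loop are hypothetical, and a simple loop stands in for Storm's delivery of tick tuples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TickCounterSketch {
    private final int emitFrequencySeconds;
    private final Map<String, Integer> counts = new TreeMap<>();

    /** The interval plays the role of the constructor argument in the YAML section. */
    public TickCounterSketch(int emitFrequencySeconds) {
        if (emitFrequencySeconds <= 0) {
            throw new IllegalArgumentException("Interval must be positive");
        }
        this.emitFrequencySeconds = emitFrequencySeconds;
    }

    /** Records one occurrence of a word; nothing is emitted yet. */
    public void onWord(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    /** Returns the lines to log, which is non-empty only when a tick is due. */
    public List<String> onSecond(int elapsedSeconds) {
        List<String> lines = new ArrayList<>();
        if (elapsedSeconds > 0 && elapsedSeconds % emitFrequencySeconds == 0) {
            for (Map.Entry<String, Integer> entry : counts.entrySet()) {
                lines.add("Emitting a count of " + entry.getValue()
                        + " for word " + entry.getKey());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Simulate ten seconds with a 5-second interval, one word per second.
        TickCounterSketch counter = new TickCounterSketch(5);
        for (int second = 1; second <= 10; second++) {
            counter.onWord("snow");
            for (String line : counter.onSecond(second)) {
                System.out.println("t=" + second + "s " + line);
            }
        }
    }
}
```

Counts accumulate continuously, but they are only reported when a tick is due, so a smaller interval produces more frequent batches.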
- Fork/Clone the repository to your development environment.
- Install Java JDK 7 or higher. This was tested with Oracle Java 7 and 8, but should also work with other distributions such as OpenJDK.
- Install Maven.
- Assuming Java and Maven are in the path and JAVA_HOME is configured correctly, use the following command to build and run the topology in the development environment:
mvn compile exec:java -Dstorm.topology=com.microsoft.example.WordCountTopology
As it runs, the topology displays startup information. It then begins to display lines similar to the following as sentences are emitted from the spout and processed by the bolts:
    17:33:27 [Thread-12-count] INFO com.microsoft.example.WordCount - Emitting a count of 56 for word snow
    17:33:27 [Thread-12-count] INFO com.microsoft.example.WordCount - Emitting a count of 56 for word white
    17:33:27 [Thread-12-count] INFO com.microsoft.example.WordCount - Emitting a count of 112 for word seven
    17:33:27 [Thread-16-count] INFO com.microsoft.example.WordCount - Emitting a count of 195 for word the
    17:33:27 [Thread-30-count] INFO com.microsoft.example.WordCount - Emitting a count of 113 for word and
    17:33:27 [Thread-30-count] INFO com.microsoft.example.WordCount - Emitting a count of 57 for word dwarfs
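The flow from sentences to per-word counts can be sketched without Storm. The following is a plain-Java approximation of the data flow only: the `WordCountPipelineSketch` class and its `split` and `count` methods are hypothetical stand-ins for the spout and bolts, the sample sentences are made up, and nothing here runs in parallel or uses the Storm API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountPipelineSketch {

    /** Stands in for a splitting stage: breaks one sentence into words. */
    public static List<String> split(String sentence) {
        return Arrays.asList(sentence.trim().split("\\s+"));
    }

    /** Stands in for a counting stage: keeps a running total per word. */
    public static Map<String, Integer> count(List<String> sentences) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String sentence : sentences) {
            for (String word : split(sentence)) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sentences standing in for what a spout would emit.
        List<String> sentences = Arrays.asList(
                "snow white and the seven dwarfs",
                "the cow jumped over the moon");

        for (Map.Entry<String, Integer> entry : count(sentences).entrySet()) {
            System.out.println("Emitting a count of " + entry.getValue()
                    + " for word " + entry.getKey());
        }
    }
}
```

In the running topology the counts keep growing because sentences are emitted continuously, which is why the logged values are much larger than in this two-sentence run.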
While you can package and deploy this topology to an HDInsight cluster, there is little to observe because it doesn't generate any output files. You can see it running and creating multiple instances, but that's about it.
Use the following command to create a .jar package for the topology.
mvn package
This will create a file named WordCount-1.0-SNAPSHOT.jar in the target directory.
Use one of the following links to learn how to deploy the jar file to a Storm on HDInsight cluster:
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.