- Configure the HDFS path of the input files in config.properties (only Parquet is supported for now)
- Configure the input text column in config.properties
- Configure the HDFS output path in config.properties
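
As a rough sketch, config.properties might look like the following. The key names (`input.path`, `input.text.column`, `output.path`) and all values are hypothetical placeholders, not taken from the project; check the config.properties shipped with the source for the actual keys.

```properties
# Hypothetical key names and values, for illustration only.

# HDFS location of the input files (Parquet only for now)
input.path=hdfs:///data/documents/parquet

# Name of the column that holds the input text
input.text.column=text

# HDFS location where the output is written
output.path=hdfs:///data/topics/output
```
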
-
spark-submit --class org.opentools.extraction.ExtractTopics --master yarn --deploy-mode cluster ExtractTopics-1.0.jar
mvn clean compile
Uber JAR (a single JAR that bundles all dependencies)
mvn compile assembly:single