Commit 698f449

Create README.md
1 parent bf3879b commit 698f449

1 file changed, +31 -0 lines changed

README.md

@@ -0,0 +1,31 @@
# Spark-ETL-Pipeline

This project is an example of reading data from Kafka as a Spark DataFrame, transforming it, and writing the result from Spark into a Hive database.

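The flow described above (Kafka to DataFrame, transformation, Hive) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the object name `KafkaToHiveSketch`, the broker address, the topic, the added `loaded_at` column, and the table name `example_db.example_table` are all hypothetical placeholders, and a batch read is assumed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, current_timestamp}

// Hypothetical sketch: names, broker, topic, and table are placeholders.
object KafkaToHiveSketch {
  def main(args: Array[String]): Unit = {
    // Hive support is needed so that saveAsTable writes to the Hive metastore.
    val spark = SparkSession.builder()
      .appName("SPARK_ETL")
      .enableHiveSupport()
      .getOrCreate()

    // Batch read of a Kafka topic as a DataFrame (assumed broker and topic).
    val raw = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "example_topic")
      .option("startingOffsets", "earliest")
      .load()

    // Example transformation: decode key/value to strings and stamp the load time.
    val transformed = raw
      .select(
        col("key").cast("string").as("key"),
        col("value").cast("string").as("value"),
        col("timestamp")
      )
      .withColumn("loaded_at", current_timestamp())

    // Append the result to a Hive table (assumed database and table).
    transformed.write
      .mode("append")
      .saveAsTable("example_db.example_table")

    spark.stop()
  }
}
```

In a Kerberos-enabled cluster, the Kafka source would likely also need security options such as `kafka.security.protocol`; the exact values depend on the cluster setup.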
## Version Compatibility

Scala | Spark | sbt
--- | --- | ---
2.11.12 | 2.4.0 | 1.3.13

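For orientation, a `build.sbt` matching the versions in the table might look like the sketch below. The project name, the dependency list, and the sbt-assembly plugin version are assumptions for illustration; the repository's actual build definition may differ.

```scala
// build.sbt (hypothetical sketch; adjust to the real project)
name := "spark-etl-pipeline"

// Scala version from the compatibility table.
scalaVersion := "2.11.12"

// Spark version from the compatibility table.
val sparkVersion = "2.4.0"

libraryDependencies ++= Seq(
  // Provided by the cluster at runtime, so excluded from the assembly jar.
  "org.apache.spark" %% "spark-sql"  % sparkVersion % "provided",
  "org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
  // The Kafka source is not part of the Spark distribution, so it is bundled.
  "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion
)

// The sbt version goes in project/build.properties:
//   sbt.version=1.3.13
// The assembly task needs the plugin in project/plugins.sbt, for example:
//   addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
```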
## Build from Source

```bash
$ sbt assembly
```
## Run

To run this application in a Kerberos-enabled environment, use the following command.
You have to create your own JAAS configuration file (`spark_jaas.conf` in the command) based on your production configuration.

```bash
# Replace the placeholder file names and paths with your own.
# All files are shipped through a single --files option, because a repeated
# option would override the earlier one.
spark-submit \
  --deploy-mode cluster \
  --files config.ini,log4j.properties,spark_jaas.conf,your_keytabfile.keytab \
  --driver-java-options "-Djava.security.auth.login.config=./spark_jaas.conf" \
  --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./spark_jaas.conf" \
  --conf spark.yarn.submit.waitAppCompletion=false \
  --driver-memory 16G \
  --name SPARK_ETL \
  path_to_your_jar_file.jar path_to_log4j.properties path_to_config.ini
```
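
As a starting point, the JAAS file referenced by the command above could look roughly like this. It is only a sketch: the principal and realm are made-up placeholders, the keytab name must match the file shipped with `--files`, and `serviceName` is assumed to be `kafka`, which depends on how the brokers are configured.

```
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="./your_keytabfile.keytab"
  principal="your_principal@YOUR.REALM"
  serviceName="kafka";
};
```

The relative `./` paths work because files passed with `--files` are placed in the working directory of the driver (in cluster mode) and of each executor.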
