@@ -8,31 +8,29 @@ A Python library that can submit Spark jobs to a Spark YARN cluster using the REST API
### Getting Started:
#### Use the library
- **Spark Job Handler**:
- **jobName:** name of the Spark Job
- **jar:** location of the Jar (local/hdfs)
- **run_class:** Entry class of the appliaction
- **hadoop_rm:** hadoop resource manager host ip
- **hadoop_web_hdfs:** hadoop web hdfs ip
- **hadoop_nn:** hadoop name node ip (Normally same as of web_hdfs)
- **env_type**: env type is CDH or HDP
- **local_jar:** flag to define if a jar is local (Local jar gets uploaded to hdfs)
- **spark_properties:** custom properties that need to be set
-
```python
- # Import the SparkJobHandler
- from spark_job_handler import SparkJobHandler
+ # Import the SparkJobHandler
+ from spark_job_handler import SparkJobHandler

- ...
+ ...

- logger = logging.getLogger('TestLocalJobSubmit')
- # Create a spark JOB
- sparkJob = SparkJobHandler(logger=logger, job_name="test_local_job_submit",
-            jar="./simple-project/target/scala-2.10/simple-project_2.10-1.0.jar",
-            run_class="IrisApp", hadoop_rm='rma', hadoop_web_hdfs='nn', hadoop_nn='nn',
-            env_type="CDH", local_jar=True, spark_properties=None)
- trackingUrl = sparkJob.run()
- print "Job Tracking URL: %s" % trackingUrl
+ logger = logging.getLogger('TestLocalJobSubmit')
+ # Create a Spark job
+ # jobName: name of the Spark Job
+ # jar: location of the Jar (local/hdfs)
+ # run_class: entry class of the application
+ # hadoop_rm: hadoop resource manager host ip
+ # hadoop_web_hdfs: hadoop web hdfs ip
+ # hadoop_nn: hadoop name node ip (normally the same as web_hdfs)
+ # env_type: env type is CDH or HDP
+ # local_jar: flag to define if a jar is local (a local jar gets uploaded to hdfs)
+ # spark_properties: custom properties that need to be set
+ sparkJob = SparkJobHandler(logger=logger, job_name="test_local_job_submit",
+            jar="./simple-project/target/scala-2.10/simple-project_2.10-1.0.jar",
+            run_class="IrisApp", hadoop_rm='rma', hadoop_web_hdfs='nn', hadoop_nn='nn',
+            env_type="CDH", local_jar=True, spark_properties=None)
+ trackingUrl = sparkJob.run()
+ print "Job Tracking URL: %s" % trackingUrl
```
The above code starts a Spark application using the local jar (simple-project/target/scala-2.10/simple-project_2.10-1.0.jar).
For more examples, see [test_spark_job_handler.py](https://github.com/s8sg/spark-py-submit/blob/master/test_spark_job_handler.py)
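If the application jar has already been uploaded to HDFS, the handler can be pointed at the HDFS path instead. The snippet below is a minimal sketch, not taken from the repo: the HDFS jar path and the `spark.executor.memory` setting are illustrative, and it assumes that `local_jar=False` means the jar is read from HDFS and that `spark_properties` accepts a dict of Spark configuration key/value pairs; check the library source for the exact format it expects.

```python
# Minimal sketch (assumptions noted above, not from the repo):
# submit a jar that already lives on HDFS and pass a custom Spark property.
import logging

from spark_job_handler import SparkJobHandler

logger = logging.getLogger('TestHdfsJobSubmit')
sparkJob = SparkJobHandler(logger=logger, job_name="test_hdfs_job_submit",
           jar="/user/spark/jobs/simple-project_2.10-1.0.jar",  # hypothetical HDFS path
           run_class="IrisApp", hadoop_rm='rma', hadoop_web_hdfs='nn', hadoop_nn='nn',
           env_type="CDH", local_jar=False,
           spark_properties={'spark.executor.memory': '2g'})  # format is an assumption
trackingUrl = sparkJob.run()
print("Job Tracking URL: %s" % trackingUrl)
```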
@@ -68,11 +66,12 @@ Run the test:
$ python test_spark_job_handler.py
```
- #### Utility:
+ ### Utility:
* upload_to_hdfs.py: upload local file to hdfs file system
- #### Notes:
- The Library is still in early stage and need testing, fixing and documentation
+ ### Notes:
+ The library is still at an early stage and needs testing, bug-fixing, and documentation.
Before running, follow the steps below:
- * Update the Port if required in settings.py
- * Make the spark-jar available in hdfs as: `/user/spark/share/lib/spark-assembly.jar`
+ * Update the ResourceManager, NameNode, and WebHDFS ports if required in settings.py
+ * Make the spark-jar available in hdfs as: `hdfs:/user/spark/share/lib/spark-assembly.jar` (one way to do this is sketched below)
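One way to stage the assembly jar is the standard `hdfs dfs -put` command (the repository's upload_to_hdfs.py utility is another option). The sketch below simply wraps the CLI call in Python; the local path to spark-assembly.jar is a placeholder and depends on your Spark installation.

```python
# Sketch: copy a local spark-assembly jar to the HDFS location the library expects.
# The local path is a placeholder; adjust it to where your Spark distribution keeps the jar.
import subprocess

local_jar = "/usr/lib/spark/lib/spark-assembly.jar"  # placeholder path
hdfs_dir = "/user/spark/share/lib"

subprocess.check_call(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir])
subprocess.check_call(["hdfs", "dfs", "-put", "-f", local_jar, hdfs_dir + "/spark-assembly.jar"])
```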
+ For contributions, please create an issue and a corresponding PR.