Commit e6200d1

Update README.md
1 parent 4cbfe65 commit e6200d1

1 file changed: README.md (+26 -27 lines)
@@ -8,31 +8,29 @@ A python library that can submit spark job to spark yarn cluster using rest API
 ### Getting Started:
 
 #### Use the library
-**Spark Job Handler**:
-**jobName:** name of the Spark Job
-**jar:** location of the Jar (local/hdfs)
-**run_class:** Entry class of the appliaction
-**hadoop_rm:** hadoop resource manager host ip
-**hadoop_web_hdfs:** hadoop web hdfs ip
-**hadoop_nn:** hadoop name node ip (Normally same as of web_hdfs)
-**env_type**: env type is CDH or HDP
-**local_jar:** flag to define if a jar is local (Local jar gets uploaded to hdfs)
-**spark_properties:** custom properties that need to be set
-
 ```python
 # Import the SparkJobHandler
 from spark_job_handler import SparkJobHandler
 
 ...
 
 logger = logging.getLogger('TestLocalJobSubmit')
 # Create a Spark job
+# jobName: name of the Spark job
+# jar: location of the jar (local/hdfs)
+# run_class: entry class of the application
+# hadoop_rm: hadoop ResourceManager host IP
+# hadoop_web_hdfs: hadoop WebHDFS IP
+# hadoop_nn: hadoop NameNode IP (normally the same as web_hdfs)
+# env_type: environment type, CDH or HDP
+# local_jar: flag marking the jar as local (a local jar gets uploaded to hdfs)
+# spark_properties: custom properties that need to be set
 sparkJob = SparkJobHandler(logger=logger, job_name="test_local_job_submit",
                            jar="./simple-project/target/scala-2.10/simple-project_2.10-1.0.jar",
                            run_class="IrisApp", hadoop_rm='rma', hadoop_web_hdfs='nn', hadoop_nn='nn',
                            env_type="CDH", local_jar=True, spark_properties=None)
 trackingUrl = sparkJob.run()
 print "Job Tracking URL: %s" % trackingUrl
 ```
 The above code starts a Spark application using the local jar (simple-project/target/scala-2.10/simple-project_2.10-1.0.jar).
 For more examples, see [test_spark_job_handler.py](https://github.com/s8sg/spark-py-submit/blob/master/test_spark_job_handler.py).
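For reference, a minimal sketch of the same call with a jar that already lives on HDFS. This is not part of the commit: it assumes SparkJobHandler accepts exactly the keyword arguments shown in the diff above, that `local_jar=False` skips the upload step, and that `spark_properties` takes a dict of standard Spark property names.

```python
# A minimal sketch, not from this commit: submit a jar already on HDFS.
# Assumes the SparkJobHandler signature shown in the diff above; whether
# spark_properties keys are forwarded verbatim to Spark is an assumption.
import logging

from spark_job_handler import SparkJobHandler

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger('TestHdfsJobSubmit')

props = {
    'spark.executor.memory': '2g',  # standard Spark property names
    'spark.executor.cores': '2',
}

sparkJob = SparkJobHandler(logger=logger, job_name="test_hdfs_job_submit",
                           jar="hdfs:/user/spark/share/lib/simple-project_2.10-1.0.jar",
                           run_class="IrisApp", hadoop_rm='rma', hadoop_web_hdfs='nn',
                           hadoop_nn='nn', env_type="CDH", local_jar=False,
                           spark_properties=props)
trackingUrl = sparkJob.run()
print "Job Tracking URL: %s" % trackingUrl  # Python 2 print, matching the README
```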
@@ -68,11 +66,12 @@ Run the test:
 $ python test_spark_job_handler.py
 ```
 
-#### Utility:
+### Utility:
 * upload_to_hdfs.py: upload local file to hdfs file system
 
-#### Notes:
-The Library is still in early stage and need testing, fixing and documentation
+### Notes:
+The library is still at an early stage and needs testing, bug-fixing, and documentation.
 Before running, follow the below steps:
-* Update the Port if required in settings.py
-* Make the spark-jar available in hdfs as: `/user/spark/share/lib/spark-assembly.jar`
+* Update the ResourceManager, NameNode, and WebHDFS ports if required in settings.py
+* Make the spark-jar available in hdfs as: `hdfs:/user/spark/share/lib/spark-assembly.jar`
+For contributions, please create an issue and a corresponding PR.
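The diff does not show what settings.py actually contains, so purely as a hedged illustration: the ports it refers to would typically default to Hadoop's standard ones. The variable names below are invented; only the port numbers are the usual Hadoop/YARN defaults.

```python
# Hypothetical sketch of the port settings the Notes refer to; the names
# are assumptions, only the numbers are standard Hadoop/YARN defaults.
RESOURCE_MANAGER_PORT = 8088   # YARN ResourceManager REST API default
NAMENODE_PORT = 8020           # HDFS NameNode RPC default
WEBHDFS_PORT = 50070           # WebHDFS (NameNode HTTP) default on Hadoop 2.x
```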
