Commit 7e01e47

Added basic build script and PIP package info
1 parent e6200d1 commit 7e01e47

6 files changed, +249 -0 lines changed


.pypirc

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
[distutils]
index-servers=pypi

[pypi]
repository = https://upload.pypi.org/legacy/
username = s8sg
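
With this configuration copied to ~/.pypirc, a built release could be uploaded roughly as follows (a sketch only, assuming twine is installed; see setup.py and setup.cfg below for building the distribution):

$ twine upload dist/*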

.travis.yml

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
language: python
sudo: false
python:
  - "2.6"
  - "2.7"
  - "3.2"
  - "3.3"
  - "3.4"
  - "3.5"
  - "3.5-dev"  # 3.5 development branch
  - "3.6"
  - "3.6-dev"  # 3.6 development branch
  - "3.7-dev"  # 3.7 development branch
  - "nightly"  # currently points to 3.7-dev
# command to install dependencies
install: "pip install -e ."

README.rst

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
Spark Py Submit
===============

A Python library that submits Spark jobs to a Spark YARN cluster using the REST API.

| **Note: it currently supports CDH (5.6.1) and HDP (2.3.2.0-2950, 2.4.0.0-169).**
| The library is inspired by ``github.com/bernhard-42/spark-yarn-rest-api``.

Getting Started:
~~~~~~~~~~~~~~~~

Use the library
^^^^^^^^^^^^^^^

.. code:: python

    import logging

    # Import the SparkJobHandler
    from spark_job_handler import SparkJobHandler

    ...

    logger = logging.getLogger('TestLocalJobSubmit')
    # Create a Spark job
    # job_name: name of the Spark job
    # jar: location of the jar (local/HDFS)
    # run_class: entry class of the application
    # hadoop_rm: Hadoop ResourceManager host IP
    # hadoop_web_hdfs: Hadoop WebHDFS IP
    # hadoop_nn: Hadoop NameNode IP (normally the same as WebHDFS)
    # env_type: environment type, either CDH or HDP
    # local_jar: flag marking the jar as local (a local jar gets uploaded to HDFS)
    # spark_properties: custom properties that need to be set
    sparkJob = SparkJobHandler(logger=logger, job_name="test_local_job_submit",
                               jar="./simple-project/target/scala-2.10/simple-project_2.10-1.0.jar",
                               run_class="IrisApp", hadoop_rm='rma', hadoop_web_hdfs='nn', hadoop_nn='nn',
                               env_type="CDH", local_jar=True, spark_properties=None)
    trackingUrl = sparkJob.run()
    print("Job Tracking URL: %s" % trackingUrl)

| The above code starts a Spark application using the local jar
  (``simple-project/target/scala-2.10/simple-project_2.10-1.0.jar``).
| For more examples, see
  `test_spark_job_handler.py <https://github.com/s8sg/spark-py-submit/blob/master/test_spark_job_handler.py>`__.

Build the simple-project
^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

    $ cd simple-project
    $ sbt package; cd ..

The above steps create the target jar at
``./simple-project/target/scala-2.10/simple-project_2.10-1.0.jar``.

Update the node IPs in the tests:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Add the node IPs for the Hadoop ResourceManager and NameNode in the test cases:

* rm: ResourceManager
* nn: NameNode

Load the data and make it available to HDFS:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

    $ wget https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data

Upload the data to HDFS:

.. code:: bash

    $ python upload_to_hdfs.py <name_node_ip> iris.data /tmp/iris.data

Run the test cases:
^^^^^^^^^^^^^^^^^^^^

Make the simple-project jar available in HDFS to test the remote-jar case:

.. code:: bash

    $ python upload_to_hdfs.py <name_node_ip> simple-project/target/scala-2.10/simple-project_2.10-1.0.jar /tmp/test_data/simple-project_2.10-1.0.jar

Run the tests:

.. code:: bash

    $ python test_spark_job_handler.py

Utility:
~~~~~~~~

- upload_to_hdfs.py: upload a local file to the HDFS file system

Notes:
~~~~~~

| The library is still at an early stage and needs testing, bug fixing, and documentation.
| Before running, follow the steps below:

* Update the ResourceManager, NameNode, and WebHDFS ports in settings.py if required
* Make the Spark assembly jar available in HDFS as ``hdfs:/user/spark/share/lib/spark-assembly.jar`` (see the sketch below)

To contribute, please create an issue and a corresponding PR.
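
One way to stage the assembly jar (a sketch only; the local path to ``spark-assembly.jar`` varies by Spark/CDH/HDP installation and is just a placeholder):

.. code:: bash

    $ hdfs dfs -mkdir -p /user/spark/share/lib
    $ hdfs dfs -put /path/to/spark-assembly.jar /user/spark/share/lib/spark-assembly.jar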

setup.cfg

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
[bdist_wheel]
universal=1
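
With universal=1, building a wheel produces a single py2.py3 wheel usable on both Python 2 and Python 3; for example (a sketch, assuming the wheel package is installed):

$ python setup.py sdist bdist_wheel
# dist/ now contains a universal wheel, e.g. spark_yarn_submit-1.0.0-py2.py3-none-any.whl (name illustrative)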

setup.py

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
# Always prefer setuptools over distutils
from setuptools import setup, find_packages
# To use a consistent encoding
from codecs import open
from os import path

here = path.abspath(path.dirname(__file__))

# Get the long description from the README file
with open(path.join(here, 'README.rst'), encoding='utf-8') as f:
    long_description = f.read()

setup(
    name='spark-yarn-submit',

    # Versions should comply with PEP440. For a discussion on single-sourcing
    # the version across setup.py and the project code, see
    # https://packaging.python.org/en/latest/single_source_version.html
    version='1.0.0',

    description='library to handle spark job submit in a yarn cluster in different environment',
    long_description=long_description,

    # The project's main homepage.
    url='https://github.com/s8sg/spark-py-submit',

    # Author details
    author='Swarvanu Sengupta (s8sg)',
    author_email='swarvanusg@gmail.com',

    # Choose your license
    license='MIT',

    # See https://pypi.python.org/pypi?%3Aaction=list_classifiers
    classifiers=[
        # How mature is this project? Common values are
        #   3 - Alpha
        #   4 - Beta
        #   5 - Production/Stable
        'Development Status :: 3 - Alpha',

        # Indicate who your project is intended for
        'Intended Audience :: Developers',
        'Topic :: Software Development :: Build Tools',

        # Pick your license as you wish (should match "license" above)
        'License :: OSI Approved :: MIT License',

        # Specify the Python versions you support here. In particular, ensure
        # that you indicate whether you support Python 2, Python 3 or both.
        'Programming Language :: Python :: 2',
        'Programming Language :: Python :: 2.6',
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.3',
        'Programming Language :: Python :: 3.4',
        'Programming Language :: Python :: 3.5',
    ],

    # What does your project relate to?
    keywords='spark yarn submit bigdata hadoop',

    # You can just specify the packages manually here if your project is
    # simple. Or you can use find_packages().
    packages=find_packages(),

    # This project ships individual modules rather than a package,
    # so list them here:
    py_modules=["spark_job_handler", "settings"],

    # List run-time dependencies here. These will be installed by pip when
    # your project is installed. For an analysis of "install_requires" vs pip's
    # requirements files see:
    # https://packaging.python.org/en/latest/requirements.html

    # List additional groups of dependencies here (e.g. development
    # dependencies). You can install these using the following syntax,
    # for example:
    #   $ pip install -e .[dev,test]

    # If there are data files included in your packages that need to be
    # installed, specify them here. If using Python 2.6 or less, then these
    # have to be included in MANIFEST.in as well.

    # Although 'package_data' is the preferred approach, in some cases you may
    # need to place data files outside of your packages. See:
    # http://docs.python.org/3.4/distutils/setupscript.html#installing-additional-files  # noqa
    # In this case, 'data_file' will be installed into '<sys.prefix>/my_data'

    # To provide executable scripts, use entry points in preference to the
    # "scripts" keyword. Entry points provide cross-platform support and allow
    # pip to create the appropriate form of executable for the target platform.
)
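
With this setup.py, the package can be installed from a local checkout, or from PyPI once it is published under the name above (commands are a sketch):

$ pip install .
$ pip install spark-yarn-submit   # after the package is published to PyPI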

tox.ini

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
[tox]
envlist = py{26,27,33,34}

[testenv]
basepython =
    py26: python2.6
    py27: python2.7
    py33: python3.3
    py34: python3.4
deps =
    check-manifest
    {py27,py33,py34}: readme_renderer
    flake8
    pytest
commands =
    check-manifest --ignore tox.ini,tests*
    # py26 doesn't have "setup.py check"
    {py27,py33,py34}: python setup.py check -m -r -s
    flake8 .
    py.test tests

[flake8]
exclude = .tox,*.egg,build,data
select = E,W,F
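
The environments above can be run with tox, for example:

$ pip install tox
$ tox             # run every environment in envlist
$ tox -e py27     # run a single environment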
