Commit 7e01e47

Added basic build script and PIP package info
1 parent e6200d1 commit 7e01e47

6 files changed, +249 -0 lines changed


.pypirc

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
[distutils]
index-servers=pypi

[pypi]
repository = https://upload.pypi.org/legacy/
username = s8sg
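
With this configuration copied to ~/.pypirc, a built release could be uploaded roughly as follows (a sketch only, assuming twine is installed; see setup.py and setup.cfg below for building the distribution):

$ twine upload dist/*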

.travis.yml

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
language: python
sudo: false
python:
  - "2.6"
  - "2.7"
  - "3.2"
  - "3.3"
  - "3.4"
  - "3.5"
  - "3.5-dev"  # 3.5 development branch
  - "3.6"
  - "3.6-dev"  # 3.6 development branch
  - "3.7-dev"  # 3.7 development branch
  - "nightly"  # currently points to 3.7-dev
# command to install dependencies
install: "pip install -e ."

README.rst

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
Spark Py Submit
===============

A Python library that submits Spark jobs to a Spark YARN cluster using the REST API.

| **Note: it currently supports CDH (5.6.1) and HDP (2.3.2.0-2950, 2.4.0.0-169).**
| The library is inspired by ``github.com/bernhard-42/spark-yarn-rest-api``.

Getting Started:
~~~~~~~~~~~~~~~~

Use the library
^^^^^^^^^^^^^^^

.. code:: python

    import logging

    # Import the SparkJobHandler
    from spark_job_handler import SparkJobHandler

    ...

    logger = logging.getLogger('TestLocalJobSubmit')
    # Create a Spark job
    # job_name: name of the Spark job
    # jar: location of the jar (local/HDFS)
    # run_class: entry class of the application
    # hadoop_rm: Hadoop ResourceManager host IP
    # hadoop_web_hdfs: Hadoop WebHDFS IP
    # hadoop_nn: Hadoop NameNode IP (normally the same as WebHDFS)
    # env_type: environment type, either CDH or HDP
    # local_jar: flag marking the jar as local (a local jar gets uploaded to HDFS)
    # spark_properties: custom properties that need to be set
    sparkJob = SparkJobHandler(logger=logger, job_name="test_local_job_submit",
                               jar="./simple-project/target/scala-2.10/simple-project_2.10-1.0.jar",
                               run_class="IrisApp", hadoop_rm='rma', hadoop_web_hdfs='nn', hadoop_nn='nn',
                               env_type="CDH", local_jar=True, spark_properties=None)
    trackingUrl = sparkJob.run()
    print("Job Tracking URL: %s" % trackingUrl)

| The above code starts a Spark application using the local jar
  (``simple-project/target/scala-2.10/simple-project_2.10-1.0.jar``).
| For more examples, see
  `test_spark_job_handler.py <https://github.com/s8sg/spark-py-submit/blob/master/test_spark_job_handler.py>`__.

Build the simple-project
^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

    $ cd simple-project
    $ sbt package; cd ..

The above steps create the target jar at
``./simple-project/target/scala-2.10/simple-project_2.10-1.0.jar``.

Update the node IPs in the tests:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Add the node IPs for the Hadoop ResourceManager and NameNode in the test cases:

* rm: ResourceManager
* nn: NameNode

Load the data and make it available to HDFS:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

    $ wget https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data

Upload the data to HDFS:

.. code:: bash

    $ python upload_to_hdfs.py <name_node_ip> iris.data /tmp/iris.data

Run the test cases:
^^^^^^^^^^^^^^^^^^^^

Make the simple-project jar available in HDFS to test the remote-jar case:

.. code:: bash

    $ python upload_to_hdfs.py <name_node_ip> simple-project/target/scala-2.10/simple-project_2.10-1.0.jar /tmp/test_data/simple-project_2.10-1.0.jar

Run the tests:

.. code:: bash

    $ python test_spark_job_handler.py

Utility:
~~~~~~~~

- upload_to_hdfs.py: upload a local file to the HDFS file system

Notes:
~~~~~~

| The library is still at an early stage and needs testing, bug fixing, and documentation.
| Before running, follow the steps below:

* Update the ResourceManager, NameNode, and WebHDFS ports in settings.py if required
* Make the Spark assembly jar available in HDFS as ``hdfs:/user/spark/share/lib/spark-assembly.jar`` (see the sketch below)

To contribute, please create an issue and a corresponding PR.
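
One way to stage the assembly jar (a sketch only; the local path to ``spark-assembly.jar`` varies by Spark/CDH/HDP installation and is just a placeholder):

.. code:: bash

    $ hdfs dfs -mkdir -p /user/spark/share/lib
    $ hdfs dfs -put /path/to/spark-assembly.jar /user/spark/share/lib/spark-assembly.jar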

setup.cfg

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
[bdist_wheel]
universal=1
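
With universal=1, building a wheel produces a single py2.py3 wheel usable on both Python 2 and Python 3; for example (a sketch, assuming the wheel package is installed):

$ python setup.py sdist bdist_wheel
# dist/ now contains a universal wheel, e.g. spark_yarn_submit-1.0.0-py2.py3-none-any.whl (name illustrative)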

setup.py

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
# Always prefer setuptools over distutils
from setuptools import setup, find_packages
# To use a consistent encoding
from codecs import open
from os import path

here = path.abspath(path.dirname(__file__))

# Get the long description from the README file
with open(path.join(here, 'README.rst'), encoding='utf-8') as f:
    long_description = f.read()

setup(
    name='spark-yarn-submit',

    # Versions should comply with PEP440. For a discussion on single-sourcing
    # the version across setup.py and the project code, see
    # https://packaging.python.org/en/latest/single_source_version.html
    version='1.0.0',

    description='library to handle spark job submit in a yarn cluster in different environment',
    long_description=long_description,

    # The project's main homepage.
    url='https://github.com/s8sg/spark-py-submit',

    # Author details
    author='Swarvanu Sengupta (s8sg)',
    author_email='swarvanusg@gmail.com',

    # Choose your license
    license='MIT',

    # See https://pypi.python.org/pypi?%3Aaction=list_classifiers
    classifiers=[
        # How mature is this project? Common values are
        #   3 - Alpha
        #   4 - Beta
        #   5 - Production/Stable
        'Development Status :: 3 - Alpha',

        # Indicate who your project is intended for
        'Intended Audience :: Developers',
        'Topic :: Software Development :: Build Tools',

        # Pick your license as you wish (should match "license" above)
        'License :: OSI Approved :: MIT License',

        # Specify the Python versions you support here. In particular, ensure
        # that you indicate whether you support Python 2, Python 3 or both.
        'Programming Language :: Python :: 2',
        'Programming Language :: Python :: 2.6',
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.3',
        'Programming Language :: Python :: 3.4',
        'Programming Language :: Python :: 3.5',
    ],

    # What does your project relate to?
    keywords='spark yarn submit bigdata hadoop',

    # You can just specify the packages manually here if your project is
    # simple. Or you can use find_packages().
    packages=find_packages(),

    # This project ships individual modules rather than a package,
    # so list them here:
    py_modules=["spark_job_handler", "settings"],

    # List run-time dependencies here. These will be installed by pip when
    # your project is installed. For an analysis of "install_requires" vs pip's
    # requirements files see:
    # https://packaging.python.org/en/latest/requirements.html

    # List additional groups of dependencies here (e.g. development
    # dependencies). You can install these using the following syntax,
    # for example:
    #   $ pip install -e .[dev,test]

    # If there are data files included in your packages that need to be
    # installed, specify them here. If using Python 2.6 or less, then these
    # have to be included in MANIFEST.in as well.

    # Although 'package_data' is the preferred approach, in some cases you may
    # need to place data files outside of your packages. See:
    # http://docs.python.org/3.4/distutils/setupscript.html#installing-additional-files  # noqa
    # In this case, 'data_file' will be installed into '<sys.prefix>/my_data'

    # To provide executable scripts, use entry points in preference to the
    # "scripts" keyword. Entry points provide cross-platform support and allow
    # pip to create the appropriate form of executable for the target platform.
)
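
With this setup.py, the package can be installed from a local checkout, or from PyPI once it is published under the name above (commands are a sketch):

$ pip install .
$ pip install spark-yarn-submit   # after the package is published to PyPI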

tox.ini

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
[tox]
envlist = py{26,27,33,34}

[testenv]
basepython =
    py26: python2.6
    py27: python2.7
    py33: python3.3
    py34: python3.4
deps =
    check-manifest
    {py27,py33,py34}: readme_renderer
    flake8
    pytest
commands =
    check-manifest --ignore tox.ini,tests*
    # py26 doesn't have "setup.py check"
    {py27,py33,py34}: python setup.py check -m -r -s
    flake8 .
    py.test tests

[flake8]
exclude = .tox,*.egg,build,data
select = E,W,F
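
The environments above can be run with tox, for example:

$ pip install tox
$ tox             # run every environment in envlist
$ tox -e py27     # run a single environment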
