
Example repository for NLTK execution on a PySpark cluster with Cloudera Data Science Workbench.


NLTK-example

This example shows how to distribute Python packages to a PySpark cluster. It is based on this blog.

How to use

  1. Open a Python session in the workbench and run setup.sh.
  2. Set the environment variable PYSPARK_PYTHON to ./NLTK/nltk_env/bin/python.
  3. Reopen the workbench and run pyspark_nltk.py (a minimal sketch of such a job follows this list).
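
The repository's actual pyspark_nltk.py is not reproduced here, so the following is only a sketch of what such a job might look like. The archive name nltk_env.zip, the #NLTK unpack alias, and the sample sentences are assumptions for illustration:

```python
# Sketch of a job like pyspark_nltk.py (assumed contents, not the repo's file).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("nltk-example")
    # Ship the zipped conda environment to every executor. The "#NLTK"
    # suffix is the directory the archive is unpacked under, which is why
    # PYSPARK_PYTHON points at ./NLTK/nltk_env/bin/python.
    .config("spark.yarn.dist.archives", "nltk_env.zip#NLTK")
    .getOrCreate()
)
sc = spark.sparkContext

def tokenize(line):
    # Import inside the function so the import happens on the executors,
    # inside the shipped conda environment where nltk is installed.
    # Assumes the punkt tokenizer data was also installed into the
    # environment (e.g. by setup.sh), since word_tokenize depends on it.
    import nltk
    return nltk.word_tokenize(line)

rdd = sc.parallelize(["NLTK runs on every executor.",
                      "Each worker uses the shipped conda env."])
print(rdd.flatMap(tokenize).collect())
spark.stop()
```

Importing nltk inside the function rather than at the top of the file matters: top-level imports run on the driver, while the function body runs on the executors, which only have nltk available through the distributed environment.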

Key points for distributing Python packages with conda

  • Create a conda environment, install your packages into it, and zip it.
  • Set spark.yarn.appMasterEnv.PYSPARK_PYTHON to the Python interpreter inside your conda environment in spark-defaults.conf.
    • e.g. spark.yarn.appMasterEnv.PYSPARK_PYTHON=./NLTK/nltk_env/bin/python
  • Set the environment variable PYSPARK_PYTHON=./NLTK/nltk_env/bin/python (the same settings can also be applied per job, as sketched below).
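
Entries in spark-defaults.conf apply cluster-wide; for a single application the equivalent options can be set in code. This is a sketch under the same paths as above; the archive name nltk_env.zip and the app name are assumptions:

```python
# Per-job equivalent of the spark-defaults.conf entries above (a sketch).
import os
from pyspark import SparkConf, SparkContext

# Must be set before the SparkContext is created so executors pick it up.
os.environ["PYSPARK_PYTHON"] = "./NLTK/nltk_env/bin/python"

conf = (
    SparkConf()
    .setAppName("nltk-example")  # illustrative app name
    # Python interpreter used by the YARN application master.
    .set("spark.yarn.appMasterEnv.PYSPARK_PYTHON",
         "./NLTK/nltk_env/bin/python")
    # Ship the zipped conda environment; "#NLTK" is the directory name
    # it is unpacked under in each container, matching the path above.
    .set("spark.yarn.dist.archives", "nltk_env.zip#NLTK")
)
sc = SparkContext(conf=conf)
```

Note that the relative path ./NLTK/nltk_env/bin/python only resolves inside YARN containers, where the shipped archive has been unpacked into the container's working directory.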
