fix 'can not find /tmp/xxx/yyy.tar.gz' error when using Spark cluster mode #3111
When Spark uses cluster deploy-mode, the run_path will be created on the submitting host instead of the host where the driver is located. This causes the error below:
```
Traceback (most recent call last):
  File "pyspark_runner.py", line 143, in <module>
    _get_runner_class()(*sys.argv[1:]).run()
  File "pyspark_runner.py", line 119, in run
    self.job.setup_remote(sc)
  File "/opt/tiger/ss_lib/python_package/lib/python2.7/site-packages/luigi/contrib/spark.py", line 307, in setup_remote
    self._setup_packages(sc)
  File "/opt/tiger/ss_lib/python_package/lib/python2.7/site-packages/luigi/contrib/spark.py", line 364, in _setup_packages
    tar = tarfile.open(tar_path, "w:gz")
  File "/usr/lib/python2.7/tarfile.py", line 1693, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/usr/lib/python2.7/tarfile.py", line 1740, in gzopen
    fileobj = gzip.GzipFile(name, mode, compresslevel, fileobj)
  File "/usr/lib/python2.7/gzip.py", line 94, in __init__
    fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
IOError: [Errno 2] No such file or directory: '/tmp/xxxYcUXC/yyy.tar.gz'
```
Description
In this PR, we create the parent directory before compressing and uploading the packages, on the host where the driver is located. The driver is the role that runs the `_setup_packages` function, but not the `run` function of class `SparkSubmitTask`.
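For illustration, a minimal sketch of the idea (the helper name `_open_packages_archive` is hypothetical; the actual change belongs in `_setup_packages` in `luigi/contrib/spark.py`):

```python
import os
import tarfile


def _open_packages_archive(tar_path):
    # Hypothetical sketch. In cluster deploy-mode the driver runs on an
    # arbitrary host, so the temporary run_path (e.g. /tmp/xxxYcUXC) created
    # on the submitting host does not exist there. Creating the parent
    # directory first avoids the IOError when opening the archive.
    parent_dir = os.path.dirname(tar_path)
    if parent_dir and not os.path.exists(parent_dir):
        os.makedirs(parent_dir)  # create the run_path locally on the driver
    return tarfile.open(tar_path, "w:gz")
```

The check-then-create form keeps the sketch compatible with Python 2.7 (shown in the traceback above), which lacks `os.makedirs(..., exist_ok=True)`.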