PySparkTask fix for bytes / str type error and import error #2168

Merged: 4 commits into spotify:master on Jul 13, 2017

Conversation

@ntim (Contributor) commented Jun 30, 2017

Description

On master, the following exception occurs when executing, e.g., the "pyspark_wc.py" example:

ERROR: [pid 18936] Worker Worker(salt=451680745, workers=1, host=..., username=..., pid=18936) failed    InlinePySparkWordCount()
Traceback (most recent call last):
  File ".../luigi/worker.py", line 191, in run
    new_deps = self._run_get_new_deps()
  File ".../luigi/worker.py", line 129, in _run_get_new_deps
    task_gen = self.task.run()
  File ".../luigi/contrib/spark.py", line 281, in run
    self._dump(fd)
  File ".../luigi/contrib/spark.py", line 292, in _dump
    d = d.replace(b'(c__main__', "(c" + module_name)
TypeError: can't concat bytes to str
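
The failure comes from mixing a bytes object (the pickled task) with a str under Python 3. The snippet below is a minimal illustration of that kind of fix, not the exact code merged into luigi/contrib/spark.py; `d` and `module_name` are stand-ins for the values handled in PySparkTask._dump:

```python
import pickle

# Stand-ins for the values handled in PySparkTask._dump: the pickled task
# (bytes) and the module name of the task's class (str).
d = pickle.dumps(object())
module_name = "pyspark_wc"

# Python 2 allowed mixing str and bytes; Python 3 raises the TypeError above:
#   d.replace(b'(c__main__', "(c" + module_name)
# Keeping both replacement operands as bytes avoids the error:
d = d.replace(b'(c__main__', b'(c' + module_name.encode('ascii'))
```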

Motivation and Context

Resolves the issue with Python 3.6 and Python 2.7.

Have you tested this? If so, how?

This stage can now be successfully executed, given that the pyspark_wc.py file is on the PYTHONPATH (see #1576).

Resolves #1988

@mention-bot

@ntim, thanks for your PR! By analyzing the history of the files in this pull request, we identified @jthi3rry, @ehdr and @steenzout to be potential reviewers.

… the run directory and add the run directory to the PYTHON_PATH.
@ntim changed the title from "PySparkTask fix for bytes / str type error" to "PySparkTask fix for bytes / str type error and import error" on Jun 30, 2017
@ntim (Contributor, Author) commented Jun 30, 2017

Fixes #1576 by copying the Python file in which the class of the job instance is defined to the run path where the instance is pickled. pyspark_runner.py then adds the run path to the PYTHONPATH at run time.
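
A rough sketch of that approach, assuming illustrative names (the helper functions and the pickle filename below are not the actual luigi identifiers):

```python
import os
import pickle
import shutil
import sys


def copy_module_file(task, run_path):
    """Copy the file defining the task's class next to the pickled job instance."""
    module_file = os.path.abspath(sys.modules[task.__module__].__file__)
    shutil.copy(module_file, run_path)


def load_task(run_path, pickle_name):
    """Runner side: make the run directory importable before unpickling."""
    sys.path.insert(0, run_path)  # same effect as putting run_path on the PYTHONPATH
    with open(os.path.join(run_path, pickle_name), "rb") as fd:
        return pickle.load(fd)
```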

@Tarrasch (Contributor) commented:
Would it be possible to unit test this? Even if you can't run a Spark task (that would be ideal), you could ensure xyz is pickle-able?

@ntim (Contributor, Author) commented Jun 30, 2017

I am now able to run Spark tasks with luigi; I will look into the existing Spark unit tests.

@ntim (Contributor, Author) commented Jul 3, 2017

@Tarrasch I added some simple checks to verify that unpickling of the task instance works.
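
A minimal sketch of such a check, assuming a made-up task class (the actual tests exercise PySparkTask; the class below is only a stand-in to show the pickle round-trip idea):

```python
import pickle
import unittest

import luigi


class _ExampleTask(luigi.Task):
    """Hypothetical task used only for this sketch."""
    param = luigi.Parameter(default="x")


class UnpickleTaskTest(unittest.TestCase):
    def test_pickle_round_trip(self):
        task = _ExampleTask()
        # Pickle and unpickle the task instance, then check its parameter survived.
        restored = pickle.loads(pickle.dumps(task))
        self.assertEqual(restored.param, task.param)


if __name__ == "__main__":
    unittest.main()
```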

@Tarrasch (Contributor) commented Jul 3, 2017

@ntim, cool. Just to double-check: the test case failed before this patch, right?

@Tarrasch (Contributor) commented Jul 3, 2017

It seems there is also a flake8 error: https://travis-ci.org/spotify/luigi/jobs/249591300

@ntim (Contributor, Author) commented Jul 4, 2017

Ah, sorry, I forgot to uncomment an assertion; tests are all green now.

@ntim (Contributor, Author) commented Jul 13, 2017

@Tarrasch can you please merge?

@dlstadther merged commit 484095b into spotify:master on Jul 13, 2017