
Conversation

@vundela commented Apr 21, 2017

What changes were proposed in this pull request?

In PySpark, when multiple threads are used, broadcast variables are pickled with the wrong PythonRDD wrap functions (because of a race condition between the threads on the Java side with Py4J), which leads to the following exception:

16/01/08 17:10:20 ERROR Executor: Exception in task 0.0 in stage 9.0 (TID 9)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/Network/Servers/mother.adverplex.com/Volumes/homeland/Users/walker/.spark/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 98, in main
command = pickleSer._read_with_length(infile)
File "/Network/Servers/mother.adverplex.com/Volumes/homeland/Users/walker/.spark/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 164, in _read_with_length
return self.loads(obj)
File "/Network/Servers/mother.adverplex.com/Volumes/homeland/Users/walker/.spark/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 422, in loads
return pickle.loads(obj)
File "/Network/Servers/mother.adverplex.com/Volumes/homeland/Users/walker/.spark/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/broadcast.py", line 39, in _from_id
raise Exception("Broadcast variable '%s' not loaded!" % bid)
Exception: (Exception("Broadcast variable '6' not loaded!",), <function _from_id at 0xce7a28>, (6L,))

at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

This change fixes the race condition in branch-1.6 by making sure that broadcast variables are pickled with the same PythonRDD wrap function.

How was this patch tested?

Reproduced the issue mentioned in SPARK-12717, following the instructions specified in the JIRA.
Verified that the issue is fixed with these changes.
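
For reference, here is a minimal sketch of the kind of multi-threaded PySpark job that exercises this race. This is an illustrative assumption, not the exact reproduction steps from the JIRA; the app name, thread count, and job sizes are made up.

import threading
from pyspark import SparkContext

# Hypothetical reproduction sketch: several Python threads submit jobs that
# capture the same broadcast variable, which could hit the
# "Broadcast variable '%s' not loaded!" error before this fix.
sc = SparkContext(appName="broadcast-race-repro")
bvar = sc.broadcast(list(range(100)))

def run_job(n):
    # Dereferencing bvar.value inside the closure forces the broadcast
    # variable to be pickled together with the job's command.
    rdd = sc.parallelize(range(n))
    print(n, rdd.map(lambda x: x + len(bvar.value)).sum())

threads = [threading.Thread(target=run_job, args=(1000 * (i + 1),)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
sc.stop()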

Please review http://spark.apache.org/contributing.html before opening a pull request.

@AmplabJenkins

Can one of the admins verify this patch?

@maver1ck
Contributor

Hi,
What about this issue?

@HyukjinKwon
Member

One thing I am sure of is that @BryanCutler proposed what is, IMHO, a better approach, and it has been merged into master, 2.2, and 2.1. I guess this PR should be closed, at least.

srowen added a commit to srowen/spark that referenced this pull request Sep 12, 2017
@srowen mentioned this pull request Sep 12, 2017
@asfgit closed this in dd88fa3 Sep 13, 2017
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
Closes apache#18522
Closes apache#17722
Closes apache#18879
Closes apache#18891
Closes apache#18806
Closes apache#18948
Closes apache#18949
Closes apache#19070
Closes apache#19039
Closes apache#19142
Closes apache#18515
Closes apache#19154
Closes apache#19162
Closes apache#19187
Closes apache#19091

Author: Sean Owen <sowen@cloudera.com>

Closes apache#19203 from srowen/CloseStalePRs3.
