[SPARK-3148] Update global variables of HttpBroadcast so that multiple SparkContexts can coexist #2059
QA tests have started for PR 2059 at commit
QA tests have finished for PR 2059 at commit
Could you provide some more background in order to help us review this PR? Did older Spark versions support multiple SparkContexts? Is this fixing a regression from an earlier release? Please add this information to the pull request description so that it's incorporated in the final commit message if we merge this. Thanks!
Hi @JoshRosen. SparkContext1 creates a broadcastManager, which initializes the HttpBroadcast object. HttpBroadcast in turn creates the HTTP server, broadcastDir, and so on. When SparkContext2 in the same process creates its own broadcastManager, the HttpBroadcast object is not initialized again, because it is already marked as initialized. SparkContext1 and SparkContext2 therefore share the same HttpBroadcast object. When SparkContext1 stops HttpBroadcast, it is stopped for SparkContext2 as well. When HttpBroadcast cleans up files on behalf of SparkContext1, files owned by SparkContext2 may also be removed, since both contexts use the same object.
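To make the failure mode above concrete, here is a minimal toy model of it. It is not Spark code: `HttpBroadcastGlobal` and `ToyContext` are hypothetical stand-ins, and Python is used only to keep the sketch short. It assumes nothing beyond what is described above: a process-wide object guarded by an "initialized" flag, shared by every context in the process.

```python
class HttpBroadcastGlobal:
    """Hypothetical stand-in for a process-wide object guarded by an 'initialized' flag."""

    initialized = False
    server_running = False
    files = set()

    @classmethod
    def initialize(cls):
        if cls.initialized:
            # A second context lands here: no new server, no new directory.
            return
        cls.initialized = True
        cls.server_running = True
        cls.files = set()

    @classmethod
    def stop(cls):
        # Stopping tears down the shared state for every context in the process.
        cls.server_running = False
        cls.files.clear()
        cls.initialized = False


class ToyContext:
    """Hypothetical stand-in for a context that relies on the global object."""

    def __init__(self, name):
        self.name = name
        HttpBroadcastGlobal.initialize()

    def broadcast(self, block_id):
        HttpBroadcastGlobal.files.add(f"{self.name}/{block_id}")

    def stop(self):
        HttpBroadcastGlobal.stop()


sc1 = ToyContext("sc1")
sc2 = ToyContext("sc2")      # skips initialization, shares sc1's state
sc2.broadcast("broadcast_0")

sc1.stop()                   # only sc1 is stopped by the caller...

# ...but sc2's server and files are gone too.
shared_server_alive = HttpBroadcastGlobal.server_running
sc2_files_left = sorted(HttpBroadcastGlobal.files)
```

After `sc1.stop()`, `shared_server_alive` is `False` and `sc2_files_left` is empty, even though `sc2` was never stopped.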
I think that we should close this issue for now, since there are other blockers to multiple SparkContexts in the same JVM (namely, the global SparkEnv). We'll consider this fix when addressing the larger "multiple SparkContexts issue", though.
Update global variables of HttpBroadcast so that multiple SparkContexts can coexist.
SparkContext1 creates a broadcastManager, which initializes the HttpBroadcast object. HttpBroadcast in turn creates the HTTP server, broadcastDir, and so on. When SparkContext2 in the same process creates its own broadcastManager, the HttpBroadcast object is not initialized again, because it is already marked as initialized. SparkContext1 and SparkContext2 therefore share the same HttpBroadcast object. When SparkContext1 stops HttpBroadcast, it is stopped for SparkContext2 as well. When HttpBroadcast cleans up files on behalf of SparkContext1, files owned by SparkContext2 may also be removed, since both contexts use the same object.
The latest Spark version still has this problem.
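As a rough illustration of the direction described here (state owned per context rather than globally), the sketch below gives each context its own broadcast directory and lifecycle. It is not the patch in this PR and not Spark's API: `PerContextBroadcast` and its methods are hypothetical names, and Python is used only for brevity.

```python
import shutil
import tempfile
from pathlib import Path


class PerContextBroadcast:
    """Hypothetical per-context replacement for a shared global object."""

    def __init__(self):
        # Each instance owns a private directory, so cleanup cannot touch
        # files that belong to another context.
        self.broadcast_dir = Path(tempfile.mkdtemp(prefix="broadcast-"))
        self.running = True

    def write(self, block_id, data):
        path = self.broadcast_dir / block_id
        path.write_bytes(data)
        return path

    def stop(self):
        # Removes only this instance's directory.
        shutil.rmtree(self.broadcast_dir, ignore_errors=True)
        self.running = False


a = PerContextBroadcast()
b = PerContextBroadcast()
b_file = b.write("broadcast_0", b"payload")

a.stop()                                  # stopping one context...
b_survives = b.running and b_file.exists()  # ...leaves the other intact

b.stop()                                  # clean up the second context as well
```

Here stopping `a` leaves `b` running with its file still on disk, which is the behavior the shared object cannot provide.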