[SPARK-17336][PYSPARK] Fix appending multiple times to PYTHONPATH from spark-config.sh #15028
…ed more than once
@srowen mind taking a look? This seems to do the trick for me. Thanks!
Seems reasonable, LGTM
Test build #65158 has finished for PR 15028 at commit
My only concern is that if someone has two different versions of Spark, this might result in weird behaviour when they want to launch jobs against separate versions in the same shell (which might not seem likely, but think about something like performance testing between Spark versions).
I think the current behavior might be worse on that dimension: you might get several different versions of things on the classpath at once, not just redundant copies.
I'll go ahead with this because I don't think it makes anything any worse.
…m spark-config.sh

## What changes were proposed in this pull request?
During startup of Spark standalone, the script file spark-config.sh appends to the PYTHONPATH and can be sourced many times, causing duplicates in the path. This change adds an env flag that is set when the PYTHONPATH is appended so it will happen only one time.

## How was this patch tested?
Manually started standalone master/worker and verified PYTHONPATH has no duplicate entries.

Author: Bryan Cutler <cutlerb@gmail.com>

Closes #15028 from BryanCutler/fix-duplicate-pythonpath-SPARK-17336.

(cherry picked from commit c76baff)
Signed-off-by: Sean Owen <sowen@cloudera.com>
Merged to master/2.0
Since the search order is defined, the old behavior probably worked across versions (albeit in an ugly fashion). I'll follow up with some checks for spark-perf and fix it there if necessary, since I think that's the main place that might have depended on this behavior.
What changes were proposed in this pull request?
During startup of Spark standalone, the script spark-config.sh appends to PYTHONPATH and can be sourced many times, causing duplicate entries in the path. This change adds an environment flag that is set the first time PYTHONPATH is appended to, so the append happens only once.
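A minimal sketch of the "append once" guard described above, assuming a bash-style config script. The flag name `PYSPARK_PYTHONPATH_SET`, the function `add_pyspark_to_pythonpath`, and the install location `/opt/spark` are illustrative assumptions and are not necessarily what the patch uses. Each call to the function stands in for one time the config script is sourced.

```shell
#!/usr/bin/env bash
# Sketch of a source-once guard for PYTHONPATH.
# Assumptions (hypothetical): flag name PYSPARK_PYTHONPATH_SET,
# install location /opt/spark, helper name add_pyspark_to_pythonpath.

SPARK_HOME=/opt/spark   # hypothetical install location for the demo

# Stands in for the PYTHONPATH section of the config script.
add_pyspark_to_pythonpath() {
  # Only prepend the PySpark entry the first time; later calls see the flag.
  if [ -z "${PYSPARK_PYTHONPATH_SET}" ]; then
    # ${PYTHONPATH:+:...} avoids a trailing ':' when PYTHONPATH was empty.
    export PYTHONPATH="${SPARK_HOME}/python${PYTHONPATH:+:${PYTHONPATH}}"
    export PYSPARK_PYTHONPATH_SET=1
  fi
}

# Start clean, then "source" the config three times.
unset PYTHONPATH PYSPARK_PYTHONPATH_SET
add_pyspark_to_pythonpath
add_pyspark_to_pythonpath
add_pyspark_to_pythonpath

echo "PYTHONPATH=${PYTHONPATH}"   # prints PYTHONPATH=/opt/spark/python
```

Because the flag is exported, child processes that source the script again inherit it and skip the append as well. The trade-off, raised in the conversation, is that a second Spark installation configured later in the same shell will not get its own entry added while the flag is set.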
How was this patch tested?
Manually started standalone master/worker and verified PYTHONPATH has no duplicate entries.
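The manual check above can be approximated with a small helper that counts repeated entries in a colon-separated path. This is a sketch; the function name `count_duplicate_entries` is hypothetical and not part of Spark.

```shell
#!/usr/bin/env bash
# Sketch of a duplicate-entry check for a colon-separated path variable.
# count_duplicate_entries is a hypothetical helper, not part of Spark.

# Prints how many distinct entries occur more than once in the given
# colon-separated list. Empty entries are ignored.
count_duplicate_entries() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -v '^$' | sort | uniq -d | wc -l | tr -d ' '
}

count_duplicate_entries "/opt/spark/python:/opt/spark/python/lib/py4j.zip"   # prints 0
count_duplicate_entries "/opt/spark/python:/x:/opt/spark/python"              # prints 1
```

Running it as `count_duplicate_entries "$PYTHONPATH"` in the environment the master/worker was started from should print `0` once the guard is in place.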