@BryanCutler
Member

@BryanCutler BryanCutler commented Sep 9, 2016

What changes were proposed in this pull request?

During startup of Spark standalone, the script file spark-config.sh appends to the PYTHONPATH and can be sourced many times, causing duplicate entries in the path. This change adds an env flag that is set when the PYTHONPATH is appended, so the append happens only once.
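
A minimal sketch of the guard-flag idea described above, not the actual diff. The names `DEMO_SPARK_HOME` and `DEMO_PYTHONPATH_SET` and the paths are hypothetical and chosen only for illustration; the function stands in for the script being sourced repeatedly.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the part of spark-config.sh that extends PYTHONPATH.
# DEMO_SPARK_HOME and DEMO_PYTHONPATH_SET are illustrative names, not taken from the patch.
DEMO_SPARK_HOME="/opt/spark"
PYTHONPATH="/existing/lib"
unset DEMO_PYTHONPATH_SET

configure_pythonpath() {
  # Extend the path only when the guard flag has not been set yet.
  if [ -z "${DEMO_PYTHONPATH_SET}" ]; then
    # Append, inserting a ':' separator only if PYTHONPATH is non-empty.
    export PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}${DEMO_SPARK_HOME}/python"
    export DEMO_PYTHONPATH_SET=1
  fi
}

# Simulate the script being sourced several times during startup.
configure_pythonpath
configure_pythonpath
configure_pythonpath

echo "${PYTHONPATH}"
```

Because the flag is exported, child processes that source the same script inherit it and skip the append as well.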

How was this patch tested?

Manually started standalone master/worker and verified PYTHONPATH has no duplicate entries.
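
One way such a manual check could be scripted, as a sketch only. The helper name `list_duplicate_entries` is hypothetical and not part of Spark; it prints any path entry that occurs more than once, so empty output means no duplicates.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each entry that appears more than once
# in a colon-separated path string.
list_duplicate_entries() {
  printf '%s\n' "$1" | tr ':' '\n' | sort | uniq -d
}

# Usage: inspect the PYTHONPATH of the current shell.
duplicates="$(list_duplicate_entries "${PYTHONPATH}")"
if [ -z "${duplicates}" ]; then
  echo "no duplicate entries"
else
  echo "duplicate entries:"
  echo "${duplicates}"
fi
```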

@BryanCutler
Member Author

@srowen mind taking a look? This seems to do the trick for me. Thanks!

@srowen
Member

srowen commented Sep 9, 2016

Seems reasonable, LGTM

@SparkQA

SparkQA commented Sep 9, 2016

Test build #65158 has finished for PR 15028 at commit 180a17f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk
Contributor

holdenk commented Sep 9, 2016

My only concern is that if someone has two different versions of Spark, this might result in weird behaviour when they want to launch jobs against separate versions in the same shell (which might not seem likely, but think about doing something like performance testing between Spark versions).
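
A small sketch of that scenario, assuming a guard flag of the kind the PR describes. The flag name `DEMO_PYTHONPATH_SET` and the `/opt/spark-a` and `/opt/spark-b` paths are hypothetical; the point is only that once the flag is set, a second install's config no longer touches the path.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: two Spark installs configured in one shell session.
# DEMO_PYTHONPATH_SET and the /opt/spark-* paths are illustrative names only.
PYTHONPATH=""
unset DEMO_PYTHONPATH_SET

configure_pythonpath() {
  local spark_home="$1"
  # The guard does not record which install set it.
  if [ -z "${DEMO_PYTHONPATH_SET}" ]; then
    export PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}${spark_home}/python"
    export DEMO_PYTHONPATH_SET=1
  fi
}

configure_pythonpath "/opt/spark-a"   # first version: path is added
configure_pythonpath "/opt/spark-b"   # second version: skipped, flag already set

echo "${PYTHONPATH}"
```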

@srowen
Member

srowen commented Sep 10, 2016

I think the current behavior might be worse on that dimension ... you might get several different versions of things at once on the classpath, not just redundant copies.

@srowen
Member

srowen commented Sep 11, 2016

I'll go ahead with this because I don't think it makes anything any worse.

asfgit pushed a commit that referenced this pull request Sep 11, 2016
…m spark-config.sh

## What changes were proposed in this pull request?
During startup of Spark standalone, the script file spark-config.sh appends to the PYTHONPATH and can be sourced many times, causing duplicate entries in the path. This change adds an env flag that is set when the PYTHONPATH is appended, so the append happens only once.

## How was this patch tested?
Manually started standalone master/worker and verified PYTHONPATH has no duplicate entries.

Author: Bryan Cutler <cutlerb@gmail.com>

Closes #15028 from BryanCutler/fix-duplicate-pythonpath-SPARK-17336.

(cherry picked from commit c76baff)
Signed-off-by: Sean Owen <sowen@cloudera.com>
@srowen
Member

srowen commented Sep 11, 2016

Merged to master/2.0

@asfgit asfgit closed this in c76baff Sep 11, 2016
@holdenk
Contributor

holdenk commented Sep 11, 2016

Since the search order is defined, the old behavior probably worked across versions (albeit in an ugly fashion). I'll follow up with some checks for spark-perf and fix things there if necessary, since I think that's really the main place that might have depended on this behavior.

wgtmac pushed a commit to wgtmac/spark that referenced this pull request Sep 19, 2016
…m spark-config.sh

## What changes were proposed in this pull request?
During startup of Spark standalone, the script file spark-config.sh appends to the PYTHONPATH and can be sourced many times, causing duplicate entries in the path. This change adds an env flag that is set when the PYTHONPATH is appended, so the append happens only once.

## How was this patch tested?
Manually started standalone master/worker and verified PYTHONPATH has no duplicate entries.

Author: Bryan Cutler <cutlerb@gmail.com>

Closes apache#15028 from BryanCutler/fix-duplicate-pythonpath-SPARK-17336.
@BryanCutler BryanCutler deleted the fix-duplicate-pythonpath-SPARK-17336 branch December 2, 2016 01:01