@xkrogen
Contributor

@xkrogen xkrogen commented Jun 19, 2020

What changes were proposed in this pull request?

This PR removes references to the terms "blacklist" and "whitelist", except for the blacklisting feature as a whole, which can be handled in a separate JIRA/PR.

This touches quite a few files, but the changes are straightforward (variable/method/etc. name changes) and mostly self-contained.

Why are the changes needed?

As per discussion on the Spark dev list, it will be beneficial to remove references to problematic language that can alienate potential community members. The terms "blacklist" and "whitelist" are one such case. While it seems to me that there is some valid debate as to whether these terms have racist origins, the cultural connotations are inescapable in today's world.

Does this PR introduce any user-facing change?

In the test file HiveQueryFileTest, a developer can set the system property spark.hive.whitelist to specify a list of Hive query files that should be tested. This system property has been renamed to spark.hive.includelist. The old property has been kept for compatibility, but will log a warning if used. I am open to feedback from others on whether keeping a deprecated property here is unnecessary given that this is just for developers running tests.
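
To illustrate the fallback described above, here is a minimal sketch in Python. It is not the code from this PR (that code lives in the Scala test class). A plain dict stands in for the JVM system properties, and the function name, logger name, and the assumption that the property value is comma-separated are all hypothetical.

```python
import logging

# Hypothetical logger name; the real test class uses its own logging.
logger = logging.getLogger("HiveQueryFileTest")

NEW_KEY = "spark.hive.includelist"
OLD_KEY = "spark.hive.whitelist"  # deprecated name, kept for compatibility


def resolve_include_list(props):
    """Return the query file names to test, preferring the new property.

    `props` is a dict standing in for JVM system properties (an assumption
    made for this sketch). The value is assumed to be comma-separated.
    """
    if NEW_KEY in props:
        raw = props[NEW_KEY]
    elif OLD_KEY in props:
        # The old property still works, but its use is flagged.
        logger.warning(
            "System property %s is deprecated; use %s instead.", OLD_KEY, NEW_KEY
        )
        raw = props[OLD_KEY]
    else:
        return []
    return [name.strip() for name in raw.split(",") if name.strip()]
```

With this shape, the new property wins whenever both are set, and the warning only fires when the old name is the one actually being read.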

How was this patch tested?

Existing tests should be suitable since no behavior changes are expected as a result of this PR.

@xkrogen
Contributor Author

xkrogen commented Jun 19, 2020

@holdenk can you help review?

@holdenk
Contributor

holdenk commented Jun 20, 2020

Sure, I'm taking this weekend away from coding so I'll get to this early next week.

@xkrogen
Contributor Author

xkrogen commented Jun 22, 2020

Updated to address @venkata91 's comments and fix one class I missed in the hive-thriftserver module.

@tgravescs
Contributor

ok to test

@SparkQA

SparkQA commented Jun 23, 2020

Test build #124427 has finished for PR 28874 at commit 5963b54.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xkrogen xkrogen force-pushed the xkrogen-SPARK-32036-rename-blacklists branch from 5963b54 to 32ccda7 Compare June 24, 2020 16:53
@xkrogen
Contributor Author

xkrogen commented Jun 24, 2020

Updated to resolve whitespace errors and some typos per suggestions by @tgravescs . Thanks!

@SparkQA

SparkQA commented Jun 24, 2020

Test build #124492 has finished for PR 28874 at commit 32ccda7.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xkrogen
Contributor Author

xkrogen commented Jun 24, 2020

I'm not sure why the Jenkins build failed. I see this message:

Using `mvn` from path: /home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.6.3/bin/mvn
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-install-plugin:3.0.0-M1:install (default-cli) on project spark-parent_2.12: ArtifactInstallerException: Failed to install metadata org.apache.spark:spark-parent_2.12/maven-metadata.xml: Could not parse metadata /home/jenkins/.m2/repository/org/apache/spark/spark-parent_2.12/maven-metadata-local.xml: in epilog non whitespace content is not allowed but got > (position: END_TAG seen ...</metadata>\n>... @13:2) -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

I do not understand how it could be related to my patch, which doesn't touch anything Maven-related. Any suggestions?

@holdenk
Contributor

holdenk commented Jun 24, 2020

Yeah, I've been running into that too. I reached out to Shane and he took the worker that has the corrupted m2 cache out of rotation. So if we do "Jenkins retest this please", hopefully the other Jenkins workers have a better m2 cache :)
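
For anyone who wants to check a local cache for this kind of corruption, here is a minimal sketch in Python. It only tests whether a metadata file's text parses as well-formed XML. The sample strings are hypothetical, modeled on the stray `>` after `</metadata>` reported in the log above, and this is not necessarily how the Jenkins worker was diagnosed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for, among other things, non-whitespace content after the root element.
        return False


# Hypothetical sample content, not the actual file from the Jenkins worker.
good = "<metadata><groupId>org.apache.spark</groupId></metadata>\n"
# Same content with a stray '>' after the closing root tag, as in the error message.
bad = good + ">\n"
```

Running `is_well_formed` over the text of each `maven-metadata-local.xml` under `~/.m2/repository` would flag files that a parser rejects, which could then be deleted so Maven regenerates them.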

Contributor

@holdenk holdenk left a comment

Thanks for working on this, I'm happy that we are cleaning this up :)

@SparkQA

SparkQA commented Jun 24, 2020

Test build #124497 has finished for PR 28874 at commit 32ccda7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 24, 2020

Test build #124500 has finished for PR 28874 at commit c3214de.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 26, 2020

Test build #124552 has finished for PR 28874 at commit 09ec2e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@tgravescs tgravescs left a comment

looks good to me. @holdenk any other comments?

Member

@srowen srowen left a comment

Looks like it's all internal changes, with no behavior changes, so seems fine to me.

Member

@srowen srowen left a comment

Looks OK to me, any more comments? I can merge if not.
Let me run tests one more time to be sure.

@srowen
Member

srowen commented Jul 11, 2020

Jenkins test this please

@SparkQA

SparkQA commented Jul 11, 2020

Test build #125689 has finished for PR 28874 at commit 09ec2e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Jul 12, 2020

LGTM but we have a merge conflict now

@tgravescs
Contributor

@xkrogen can you upmerge?

@xkrogen xkrogen force-pushed the xkrogen-SPARK-32036-rename-blacklists branch from 09ec2e8 to 5cd7d81 Compare July 13, 2020 16:44
@xkrogen
Contributor Author

xkrogen commented Jul 13, 2020

Thanks a lot for the new eyes @srowen and @tgravescs ! I've just pushed up a resolution to the conflict.

@SparkQA

SparkQA commented Jul 13, 2020

Test build #125780 has finished for PR 28874 at commit 5cd7d81.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Contributor

@xkrogen unfortunately it appears there is another conflict. Could you update again?

@xkrogen xkrogen force-pushed the xkrogen-SPARK-32036-rename-blacklists branch from 5cd7d81 to 1ba58fb Compare July 14, 2020 15:31
@xkrogen
Contributor Author

xkrogen commented Jul 14, 2020

Playing a game of whack-a-mole here :) Thanks for the heads up @tgravescs . Pushed up another conflict resolution.

@SparkQA

SparkQA commented Jul 14, 2020

Test build #125843 has finished for PR 28874 at commit 1ba58fb.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xkrogen
Contributor Author

xkrogen commented Jul 14, 2020

I think the pip packaging test failure is not related to my changes, but I cannot really tell.

Installing collected packages: py4j, pyspark
  Found existing installation: py4j 0.10.9
    Uninstalling py4j-0.10.9:
      Successfully uninstalled py4j-0.10.9
  Found existing installation: pyspark 3.1.0.dev0
Exception:
Traceback (most recent call last):
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main
    status = self.run(options, args)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs
    auto_confirm=True
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist
    '(at %s)' % (link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder@2/python does not match installed location of pyspark (at /home/jenkins/workspace/SparkPullRequestBuilder/python)
Cleaning up temporary directory - /tmp/tmp.CjjxmjJmax
[error] running /home/jenkins/workspace/SparkPullRequestBuilder/dev/run-pip-tests ; received return code 2

@tgravescs
Contributor

All GitHub Actions tests passed, so I think we are good. I'll kick it once more just to check.

@tgravescs
Contributor

test this please

@SparkQA

SparkQA commented Jul 14, 2020

Test build #125852 has finished for PR 28874 at commit 1ba58fb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Contributor

@xkrogen unfortunately I think we are still playing whack-a-mole: it doesn't seem mergeable. Could you upmerge again? I'll try to watch closely and get it merged.

@xkrogen
Contributor Author

xkrogen commented Jul 15, 2020

@tgravescs GitHub seems to be reporting no conflicts, and when I just put together a rebase, there were also no conflicts. Can you check again?

I'm happy to push up a rebased branch, but don't want to have to wait for another test cycle if it's not necessary :)

@tgravescs
Contributor

Weird, let me try to commit again. It was reporting that it couldn't do it, but I also saw that this PR was stuck and wouldn't say whether it was mergeable.

@tgravescs
Contributor

Thanks @xkrogen, merged to master.

@xkrogen
Contributor Author

xkrogen commented Jul 15, 2020

Thanks a lot @tgravescs !

@asfgit asfgit closed this in cf22d94 Jul 15, 2020
@xkrogen xkrogen deleted the xkrogen-SPARK-32036-rename-blacklists branch July 15, 2020 17:11
- PY2_CLASS_DICT_BLACKLIST = (PY2_METHOD_WRAPPER_TYPE,
-                             PY2_WRAPPER_DESCRIPTOR_TYPE)
+ PY2_CLASS_DICT_SKIP_PICKLE_METHOD_TYPE = (PY2_METHOD_WRAPPER_TYPE,
+                                           PY2_WRAPPER_DESCRIPTOR_TYPE)
Member

@HyukjinKwon HyukjinKwon Jul 16, 2020

Hey, let's not change this file but keep it as released (and fix it in their release, then port it together). It's an exact copy of the cloudpickle release. We just port their fixes so that we don't get conflicts here, for management purposes. Linters skip this file too, and we had to put a lot of effort into resolving conflicts here before.

Member

I am going to upgrade this file in #29114. It seems that the release version doesn't have this issue.

Contributor

ok so since you are upgrading, we are good to leave it as is, correct? Your merge will fix it up.

Member

Yup, I think we're all good

Contributor Author

Thanks for addressing this @HyukjinKwon ! I avoided making changes in other files that I could tell were direct copies (e.g. stuff in hive-thriftserver) but did not realize this was one of those cases.

@HyukjinKwon
Member

+1 the changes look good
