@cloud-fan
Contributor

@cloud-fan cloud-fan commented Oct 27, 2018

What changes were proposed in this pull request?

After #22775 was backported to 2.4, the 2.4 sbt Jenkins QA job broke; see https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.4-test-sbt-hadoop-2.7/147/console

This PR adds `if sys.version >= '3': basestring = str`, which currently exists only in master.

How was this patch tested?

existing test

@cloud-fan cloud-fan changed the title from "[SPARK-24709][SQL][2.4] use str instead of basestring" to "[SPARK-24709][SQL][2.4] use str instead of basestring in isinstance" Oct 27, 2018
@cloud-fan
Contributor Author

cc @HyukjinKwon @gatorsmile

@cloud-fan
Contributor Author

BTW, the from_csv added in 3.0 also uses basestring; maybe we should update it in the master branch as well.

@SparkQA

SparkQA commented Oct 27, 2018

Test build #98126 has finished for PR 22858 at commit 2917acd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Wenchen, this is because the following is missing:

if sys.version >= '3':
    basestring = str

Python 3 does not have basestring.
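
As a minimal sketch of how the shim behaves, the snippet below defines the alias and uses it in a type check. The helper name `is_string_like` is hypothetical and not part of PySpark; only the two-line `basestring` alias comes from this thread.

```python
import sys

# On Python 3 the builtin name `basestring` no longer exists, so alias it to `str`.
# On Python 2 the condition is False and the builtin `basestring` is used unchanged.
if sys.version >= '3':
    basestring = str


def is_string_like(value):
    """Hypothetical helper: report whether `value` passes the basestring check."""
    return isinstance(value, basestring)


print(is_string_like("struct<a:bigint>"))
print(is_string_like(42))
```

On Python 2, `basestring` is the common base of `str` and `unicode`, so the same check accepts both there; checking against `str` directly would reject `unicode` values on Python 2.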

[Row(json=u'struct<a:bigint>')]
"""
-    if isinstance(json, basestring):
+    if isinstance(json, str):
Member

The problem here is that we would no longer support unicode in Python 2.

Contributor Author

Shall we apply it to 2.4? I'm not aware of the background. Why did we not put

if sys.version >= '3':
    basestring = str

in 2.4?

Member

@HyukjinKwon HyukjinKwon Oct 28, 2018

Yea, we should. These aliases are added only where they are needed, because there are so many cases like that (for instance, imap in Python 2 and map in Python 3).

It looks like that was added in another PR, in the master branch only.

Member

@felixcheung felixcheung left a comment

Agreed with @HyukjinKwon; let's do this instead: #22858 (comment)

@cloud-fan
Contributor Author

@HyukjinKwon thanks for the information! Shall we replace str with basestring in functions.py in the master branch?

@HyukjinKwon
Member

HyukjinKwon commented Oct 28, 2018

Yup, strictly speaking I think we should change it. It looks like there are two occurrences of isinstance(..., str), in udf and pandas_udf.

Another problem in PySpark is inconsistent type comparison, such as type(...) == t vs isinstance(..., t). For instance, with type(...) == dict vs isinstance(..., dict), the former does not accept OrderedDict but the latter does.

Another problem is that some Python types, such as bool, inherit from int. In this case, isinstance(...) might produce "unexpected" results, for instance:
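
The difference between the two comparison styles can be sketched as follows. The helper names `is_exact_dict` and `is_dict_like` are hypothetical and chosen only for illustration; they are not PySpark functions.

```python
from collections import OrderedDict


def is_exact_dict(value):
    """Hypothetical helper: exact type comparison, which rejects subclasses of dict."""
    return type(value) == dict


def is_dict_like(value):
    """Hypothetical helper: isinstance check, which accepts dict and its subclasses."""
    return isinstance(value, dict)


ordered = OrderedDict(a=1)

print(is_exact_dict(ordered))  # False: OrderedDict is a subclass, not dict itself
print(is_dict_like(ordered))   # True: the subclass passes the isinstance check
print(is_exact_dict({"a": 1}), is_dict_like({"a": 1}))  # True True
```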

>>> isinstance(True, int)
True

I was nervous about the cases above and haven't fixed them so far.
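
One common way to guard against the bool case is to exclude bool explicitly. This is only a sketch under that assumption, not what PySpark does; the helper name `is_strict_int` is hypothetical.

```python
def is_strict_int(value):
    """Hypothetical helper: accept int values but reject bool, which subclasses int."""
    return isinstance(value, int) and not isinstance(value, bool)


print(isinstance(True, int))  # True, because bool inherits from int
print(is_strict_int(True))    # False, bool is excluded explicitly
print(is_strict_int(7))       # True
```

Whether such a guard is appropriate depends on the call site, which is part of why these comparisons are risky to change in bulk.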

@SparkQA

SparkQA commented Oct 28, 2018

Test build #98151 has finished for PR 22858 at commit 1837449.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Merged to branch-2.4.

asfgit pushed a commit that referenced this pull request Oct 28, 2018
## What changes were proposed in this pull request?

After #22775 was backported to 2.4, the 2.4 sbt Jenkins QA job broke; see https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.4-test-sbt-hadoop-2.7/147/console

This PR adds `if sys.version >= '3': basestring = str`, which currently exists only in master.

## How was this patch tested?

existing test

Closes #22858 from cloud-fan/python.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
@HyukjinKwon
Member

@cloud-fan, thanks for doing this backport!

@HyukjinKwon
Member

Oops, mind fixing the PR title too?

@cloud-fan cloud-fan changed the title from "[SPARK-24709][SQL][2.4] use str instead of basestring in isinstance" to "[SPARK-24709][SQL][2.4] map basestring to str for python 3" Oct 28, 2018
@cloud-fan
Contributor Author

title updated

@cloud-fan cloud-fan closed this Oct 28, 2018
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Aug 1, 2019