Skip to content

Conversation

@HeartSaVioR
Copy link
Contributor

What changes were proposed in this pull request?

This patch fixes examples of Like/RLike to test its origin intention correctly. The example doesn't consider the default value of spark.sql.parser.escapedStringLiterals: it's false by default.

Please take a look at current example of Like:

examples = """
Examples:
> SELECT '%SystemDrive%\Users\John' _FUNC_ '\%SystemDrive\%\Users%';
true
""",
note = """
Use RLIKE to match with standard regular expressions.
""",
since = "1.0.0")
case class Like(left: Expression, right: Expression) extends StringRegexExpression {

If spark.sql.parser.escapedStringLiterals=false, then it should fail as there's \U in pattern (spark.sql.parser.escapedStringLiterals=false by default) but it doesn't fail.

The escape character is '\'. If an escape character precedes a special symbol or another
escape character, the following character is matched literally. It is invalid to escape
any other character.

For the query

SET spark.sql.parser.escapedStringLiterals=false;
SELECT '%SystemDrive%\Users\John' like '\%SystemDrive\%\Users%';

SQL parser removes single \ (not sure that is intended) so the expressions of Like are constructed as following (I've printed out expression of left and right for Like/RLike):

LIKE - left %SystemDrive%UsersJohn / right \%SystemDrive\%Users%

which are no longer having origin intention (see left).

Below query tests the origin intention:

SET spark.sql.parser.escapedStringLiterals=false;
SELECT '%SystemDrive%\\Users\\John' like '\%SystemDrive\%\\\\Users%';

LIKE - left %SystemDrive%\Users\John / right \%SystemDrive\%\\Users%

Note that \\\\ is needed in pattern as StringUtils.escapeLikeRegex requires \\ to represent normal character of \.

Same for RLIKE:

SET spark.sql.parser.escapedStringLiterals=true;
SELECT '%SystemDrive%\Users\John' rlike '%SystemDrive%\\Users.*';

RLIKE - left %SystemDrive%\Users\John / right %SystemDrive%\\Users.*

which is OK, but

SET spark.sql.parser.escapedStringLiterals=false;
SELECT '%SystemDrive%\Users\John' rlike '%SystemDrive%\Users.*';

RLIKE - left %SystemDrive%UsersJohn / right %SystemDrive%Users.*

which no longer haves origin intention.

Below query tests the origin intention:

SET spark.sql.parser.escapedStringLiterals=true;
SELECT '%SystemDrive%\\Users\\John' rlike '%SystemDrive%\\\\Users.*';

RLIKE - left %SystemDrive%\Users\John / right %SystemDrive%\\Users.*

Why are the changes needed?

Because the example doesn't test the origin intention. Spark is now running automated tests from these examples, so now it's not only documentation issue but also test issue.

Does this PR introduce any user-facing change?

No, as it only corrects documentation.

How was this patch tested?

Added debug log (like above) and ran queries from spark-sql.

@HeartSaVioR
Copy link
Contributor Author

Relevant tests could fail as we discovered flaky tests, and the fix seems to be submitted via #25956

@SparkQA
Copy link

SparkQA commented Sep 28, 2019

Test build #111524 has finished for PR 25957 at commit f50a718.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 28, 2019

Test build #111525 has finished for PR 25957 at commit cc43cb6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member

MaxGekk commented Sep 28, 2019

I think of ignoring sql commands in examples by prefix > SET to avoid this #25957 (comment)

@HeartSaVioR
Copy link
Contributor Author

Thanks, yes I had to exclude Like as well in other test as same reason RLike was excluded. Just updated.

@SparkQA
Copy link

SparkQA commented Sep 28, 2019

Test build #111529 has finished for PR 25957 at commit a87a75b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Merged to master.

@HeartSaVioR
Copy link
Contributor Author

Thanks all for reviewing and merging!

@HeartSaVioR HeartSaVioR deleted the SPARK-29281 branch September 28, 2019 22:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants