
@wangyum
Member

wangyum commented Jun 13, 2019

What changes were proposed in this pull request?

This PR ports text.sql from the PostgreSQL regression tests: https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/text.sql

The expected results can be found at: https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/text.out

When porting the test cases, I found one PostgreSQL-specific feature that does not exist in Spark SQL:
SPARK-28037: Add built-in String Functions: quote_literal

I also found three inconsistent behaviors:
SPARK-27930: Spark SQL's format_string can not fully support PostgreSQL's format
SPARK-28036: Built-in udf left/right has inconsistent behavior
SPARK-28033: String concatenation should have lower precedence than other operators

How was this patch tested?

N/A

@SparkQA

SparkQA commented Jun 13, 2019

Test build #106464 has finished for PR 24862 at commit 1068a10.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

wangyum changed the title from [WIP][SPARK-28038][SQL][TEST] Port text.sql to [SPARK-28038][SQL][TEST] Port text.sql on Jul 19, 2019
@SparkQA

SparkQA commented Jul 19, 2019

Test build #107897 has finished for PR 24862 at commit 4be0f9e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 19, 2019

Test build #107899 has finished for PR 24862 at commit c9d3d16.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please.

-- As of 8.3 we have removed most implicit casts to text, so that for example
-- this no longer works:
-- Spark SQL implicit cast integer to string
select length(42);
Member

FYI: if we strictly follow ANSI SQL, this implicit cast is not allowed, which is consistent with PostgreSQL.
cc: @gengliangwang

Member

Is this casting an integer to a string? If so, I think it is allowed in ANSI SQL as an up-cast.

-- an unknown literal. So these work:
-- [SPARK-28033] String concatenation low priority than other arithmeticBinary
select string('four: ') || 2+2;
select string('four: ') || 2+2;
Member

nit: duplicate test
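To make the SPARK-28033 precedence question concrete, here is a minimal Python sketch of a precedence-climbing parser run over the tokens of 'four: ' || 2+2 with two hypothetical precedence tables. The numeric levels are assumptions for illustration: SAME_LEVEL assumes Spark's grammar at the time placed || on the same level as + and - (the behavior the test comment flags), while PG_LIKE puts || below + and -, matching PostgreSQL's documented ordering where "any other operator" binds more loosely than addition. The function and table names are hypothetical, not Spark or PostgreSQL code.

```python
from typing import Dict, List


def parse(tokens: List[str], prec: Dict[str, int]) -> str:
    """Precedence climbing over a flat list of operand/operator tokens.

    Returns the expression fully parenthesized so the grouping chosen by a
    given precedence table is visible. All binary operators are treated as
    left-associative.
    """
    pos = 0

    def parse_expr(min_prec: int) -> str:
        nonlocal pos
        left = tokens[pos]  # operands are single tokens in this sketch
        pos += 1
        while pos < len(tokens) and prec[tokens[pos]] >= min_prec:
            op = tokens[pos]
            pos += 1
            # Left-associative: the right operand may only absorb operators
            # that bind strictly tighter than `op`.
            right = parse_expr(prec[op] + 1)
            left = f"({left} {op} {right})"
        return left

    return parse_expr(0)


# Hypothetical precedence tables (higher number = binds tighter).
SAME_LEVEL = {"*": 2, "/": 2, "+": 1, "-": 1, "||": 1}  # || shares the +/- level
PG_LIKE = {"*": 3, "/": 3, "+": 2, "-": 2, "||": 1}     # || binds more loosely than +/-

expr = ["'four: '", "||", "2", "+", "2"]
print(parse(expr, SAME_LEVEL))  # (('four: ' || 2) + 2)
print(parse(expr, PG_LIKE))     # ('four: ' || (2 + 2))
```

Under the first table the addition is applied to the concatenated string 'four: 2' rather than to 2+2, whereas PostgreSQL's grouping yields four: 4.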

Member Author

select concat_ws(NULL,10,20,null,30) is null;
select reverse('abcde');
-- [SPARK-28036] Built-in udf left/right has inconsistent behavior
-- select i, left('ahoj', i), right('ahoj', i) from range(-5, 5) t(i) order by i;
Member

Why did you comment this out? (I just want to check the current output.)

Member Author

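Because of ANSI mode. With spark.sql.parser.ansi.enabled=true, left is treated as a reserved keyword, so the query no longer parses: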

spark-sql> select left('12345', 2);
12
spark-sql> set spark.sql.parser.ansi.enabled=true;
spark.sql.parser.ansi.enabled	true
spark-sql> select left('12345', 2);
Error in query:
no viable alternative at input 'left'(line 1, pos 7)

== SQL ==
select left('12345', 2)
-------^^^

https://issues.apache.org/jira/browse/SPARK-28479

The output with ANSI mode disabled:

Spark SQL:

spark-sql> select i, left('ahoj', i), right('ahoj', i) from range(-5, 6) t(i) order by i;
-5
-4
-3
-2
-1
0
1	a	j
2	ah	oj
3	aho	hoj
4	ahoj	ahoj
5	ahoj	ahoj

PostgreSQL:

postgres=# select i, left('ahoj', i), right('ahoj', i) from generate_series(-5, 5) t(i) order by i;
 i  | left | right
----+------+-------
 -5 |      |
 -4 |      |
 -3 | a    | j
 -2 | ah   | oj
 -1 | aho  | hoj
  0 |      |
  1 | a    | j
  2 | ah   | oj
  3 | aho  | hoj
  4 | ahoj | ahoj
  5 | ahoj | ahoj
(11 rows)
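The two result tables above differ only for non-positive i. A minimal Python sketch that models both behaviors as observed (a model of the outputs shown, not either engine's implementation; the function names are hypothetical): PostgreSQL documents that a negative n means "all but the last |n| characters" for left and "all but the first |n| characters" for right, while the Spark output shows an empty result for any length <= 0.

```python
def pg_left(s: str, n: int) -> str:
    # n >= 0: first n chars; n < 0: all but the last |n| chars.
    # Python's s[:n] already has exactly these semantics.
    return s[:n]


def pg_right(s: str, n: int) -> str:
    # n > 0: last n chars; n < 0: all but the first |n| chars; n == 0: empty.
    # n == 0 is special-cased because s[-0:] would return the whole string.
    if n == 0:
        return ""
    return s[-n:]


def spark_left(s: str, n: int) -> str:
    # Modeled on the Spark output above: any length <= 0 yields "".
    return s[:n] if n > 0 else ""


def spark_right(s: str, n: int) -> str:
    # Same rule as spark_left, taking characters from the end instead.
    return s[-n:] if n > 0 else ""


for i in range(-5, 6):
    print(i, repr(pg_left("ahoj", i)), repr(pg_right("ahoj", i)),
          repr(spark_left("ahoj", i)), repr(spark_right("ahoj", i)))
```

For positive i both models agree, which is why only rows -3 through -1 differ between the two outputs.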

Member

Ah, I see. Can you temporarily turn off the mode for this query?

Member Author

Done

@SparkQA

SparkQA commented Jul 30, 2019

Test build #108355 has finished for PR 24862 at commit c9d3d16.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 30, 2019

Test build #108382 has finished for PR 24862 at commit 10204c6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Merged to master.

wangyum deleted the SPARK-28038 branch on Jul 31, 2019 02:42