Skip to content

Conversation

@yaooqinn
Copy link
Member

@yaooqinn yaooqinn commented Dec 9, 2019

What changes were proposed in this pull request?

Currently, COMMENT and LOCATION are reserved properties for Datasource v2 namespaces. They can be set via specific clauses and via properties. And the ones specified in clauses take precede of properties. Since they are reserved, which means they are not able to visit directly. They should be used in COMMENT/LOCATION clauses ONLY.

Why are the changes needed?

make reserved properties be reserved.

Does this PR introduce any user-facing change?

yes, 'location', 'comment' are not allowed use in db properties

How was this patch tested?

UNIT tests.

@yaooqinn
Copy link
Member Author

yaooqinn commented Dec 9, 2019

cc @cloud-fan, I think this a separate issue from SET OWNER. As to COMMENT, methinks if have defined it as a reserved property, it should be treat as same as LOCATION. Thanks for reviewing this in advance.

@cloud-fan
Copy link
Contributor

The problem is, we don't have a specific clause to alter COMMENT, but I do agree that it's weird to write both COMMENT and the "comment" property in CREATE TABLE.

We should either treat comment specially, and allow to set it via properties, or create a special clause for altering comment. What does presto and RDBMS do?

@yaooqinn
Copy link
Member Author

yaooqinn commented Dec 9, 2019

presto has no comment as database attribute https://prestodb.io/docs/current/sql/create-schema.html
postgresql has comment on syntax https://www.postgresql.org/docs/9.1/sql-comment.html

Mysql allows comments only for tables and columns. ALTER TABLE t1 COMMENT = 'New table comment'; https://dev.mysql.com/doc/refman/8.0/en/alter-database.html

@SparkQA
Copy link

SparkQA commented Dec 9, 2019

Test build #115022 has finished for PR 26806 at commit c3ee986.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

presto also has the COMMENT ON command: https://prestosql.io/docs/current/sql/comment.html

Can we add it first? Then it's safe to say that these properties are reserved.

@yaooqinn
Copy link
Member Author

OK, I'll add that first

@yaooqinn
Copy link
Member Author

Hi @cloud-fan, do we need an umbrella for the COMMENT ON syntaxes? According to PostgreSQL we may have several matched cases. https://www.postgresql.org/docs/9.1/sql-comment.html

@cloud-fan
Copy link
Contributor

I think we only need to support TABLE and NAMESPACE

@yaooqinn
Copy link
Member Author

How about COLUMN / VIEW?

@cloud-fan
Copy link
Contributor

We have SQL syntax to set comment of a column, and view is not supported by v2 yet. At least we only need to support table and namespace in the near future.

@yaooqinn
Copy link
Member Author

I got it.

@yaooqinn
Copy link
Member Author

yaooqinn commented Jan 3, 2020

Applied the new resolving framework to CreateNamespace, cc @cloud-fan, thanks for reviewing

@SparkQA
Copy link

SparkQA commented Jan 3, 2020

Test build #116092 has finished for PR 26806 at commit c7c2ebf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

case class CreateNamespace(
catalog: SupportsNamespaces,
namespace: Seq[String],
child: LogicalPlan,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we delay this change? We may want to create the rule to convert to v1 commands when migrating to the new framework. Let's focus on disallowing reserved properties in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@yaooqinn yaooqinn changed the title [SPARK-30183][SQL] Disallow to specify reserved properties in CREATE … [SPARK-30183][SQL] Disallow to specify reserved properties in CREATE NAMESPACE syntax Jan 6, 2020
Option(ctx.comment).map(string).map {
properties += SupportsNamespaces.PROP_COMMENT -> _

if (properties.keySet.intersect(RESERVED_PROPERTIES.asScala.toSet).nonEmpty) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

properties.keySet.exists(RESERVED_PROPERTIES.contains)

import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.connector.catalog.{CatalogManager, CatalogPlugin, LookupCatalog, SupportsNamespaces, Table, TableCatalog, TableChange, V1Table}
import org.apache.spark.sql.connector.catalog.CatalogV2Util.isSessionCatalog
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary change

@SparkQA
Copy link

SparkQA commented Jan 6, 2020

Test build #116123 has finished for PR 26806 at commit 5abbfc2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class CreateNamespaceStatement(

@SparkQA
Copy link

SparkQA commented Jan 6, 2020

Test build #116129 has finished for PR 26806 at commit ae9a0eb.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Jan 6, 2020

Test build #116142 has finished for PR 26806 at commit ae9a0eb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


- Since Spark 3.0, the function `percentile_approx` and its alias `approx_percentile` only accept integral value with range in `[1, 2147483647]` as its 3rd argument `accuracy`, fractional and string types are disallowed, e.g. `percentile_approx(10.0, 0.2, 1.8D)` will cause `AnalysisException`. In Spark version 2.4 and earlier, if `accuracy` is fractional or string value, it will be coerced to an int value, `percentile_approx(10.0, 0.2, 1.8D)` is operated as `percentile_approx(10.0, 0.2, 1)` which results in `10.0`.

- Since Spark 3.0, the namespace properties `location` and `comment` become reserved, it will fail with `ParseException` if we use them as members of `DBPROTERTIES` in `CREATE NAMESPACE` and `ALTER NAMESPACE ... SET PROPERTIES(...)`. We need their specific clauses to specify them, e.g. `CREATE NAMESPACE a.b.c COMMENT 'any comment' LOCATION 'some path'`. We can set `spark.sql.legacy.property.nonReserved` to `true` to ignore the `ParseException`, but notice that in this case, these properties will produce side effects, e.g `SET DBPROTERTIES('location'='/tmp')` might change the location of the database. In Spark version 2.4 and earlier, these properties are neither reserved nor have side effects, e.g. `SET DBPROTERTIES('location'='/tmp')` will not change the location of the database but create a headless property just like `'a'='b'`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's the migration guide, I think it's better to use the syntax of Spark 2.4

it will fail with `ParseException` if we use them in `CREATE DATABASE ... DBPROPERTIES` and `ALTER DATABASE ... SET DBPROPERTIES` ...

@SparkQA
Copy link

SparkQA commented Jan 7, 2020

Test build #116224 has finished for PR 26806 at commit 499579c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.


- Since Spark 3.0, the function `percentile_approx` and its alias `approx_percentile` only accept integral value with range in `[1, 2147483647]` as its 3rd argument `accuracy`, fractional and string types are disallowed, e.g. `percentile_approx(10.0, 0.2, 1.8D)` will cause `AnalysisException`. In Spark version 2.4 and earlier, if `accuracy` is fractional or string value, it will be coerced to an int value, `percentile_approx(10.0, 0.2, 1.8D)` is operated as `percentile_approx(10.0, 0.2, 1)` which results in `10.0`.

- Since Spark 3.0, the words `location` and `comment` become reserved database properties, it will fail with `ParseException` if we use them as members of `DBPROTERTIES` in `CREATE DATABASE` and `ALTER DATABASE ... SET PROPERTIES(...)`. We need their specific clauses to specify them, e.g. `CREATE DATABASE test COMMENT 'any comment' LOCATION 'some path'`. We can set `spark.sql.legacy.property.nonReserved` to `true` to ignore the `ParseException`, but notice that in this case, these properties will be ignored too, e.g `SET DBPROTERTIES('location'='/tmp')` will affect nothing. In Spark version 2.4 and earlier, these properties are neither reserved nor have side effects, e.g. `SET DBPROTERTIES('location'='/tmp')` will not change the location of the database but only create a headless property just like `'a'='b'`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

notice that in this case, these properties will be ignored too ... We don't need to highlight it as it's the same with 2.4

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not exactly the same, 2.4 will create a dbproperty, we will not do that. I'd change this to your suggestion as is in SQLConf

}
}

private def checkNamespaceProperties(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not just check now, how about cleanNamespaceProperties?

.internal()
.doc("When true, all database and table properties are not reserved and available for " +
"create/alter syntaxes. But please be aware that the reserved properties will still be " +
"used by Spark internally and will ignore their user specified values.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reserved properties will be silently removed.

@SparkQA
Copy link

SparkQA commented Jan 7, 2020

Test build #116227 has finished for PR 26806 at commit ff0054f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 7, 2020

Test build #116236 has finished for PR 26806 at commit a16a15e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 7, 2020

Test build #116249 has finished for PR 26806 at commit 384d531.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class MultilayerPerceptronClassificationModel(JavaProbabilisticClassificationModel,
  • case class ShowTablesStatement(
  • case class ShowNamespacesStatement(

@yaooqinn yaooqinn changed the title [SPARK-30183][SQL] Disallow to specify reserved properties in CREATE NAMESPACE syntax [SPARK-30183][SQL] Disallow to specify reserved properties in CREATE/ALTER NAMESPACE syntax Jan 8, 2020
@SparkQA
Copy link

SparkQA commented Jan 8, 2020

Test build #116276 has finished for PR 26806 at commit 1e05f05.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 8, 2020

Test build #116313 has finished for PR 26806 at commit 57541fd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • logDebug(s\"isMulticlass = $
  • logDebug(s\"isMulticlass = $
  • case class CreateFunctionStatement(
  • case class AlterNamespaceSetLocation(

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in c373123 Jan 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants