-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-26652][SQL] Use Proleptic Gregorian Calendar in Literal.fromString #23596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -41,6 +41,8 @@ import org.apache.spark.sql.AnalysisException | |
| import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow, ScalaReflection} | ||
| import org.apache.spark.sql.catalyst.expressions.codegen._ | ||
| import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} | ||
| import org.apache.spark.sql.catalyst.util.DateTimeUtils.{getTimeZone, stringToDate, stringToTimestamp} | ||
| import org.apache.spark.sql.internal.SQLConf | ||
| import org.apache.spark.sql.types._ | ||
| import org.apache.spark.unsafe.types._ | ||
| import org.apache.spark.util.Utils | ||
|
|
@@ -139,6 +141,11 @@ object Literal { | |
| * Constructs a Literal from a String | ||
| */ | ||
| def fromString(str: String, dataType: DataType): Literal = { | ||
| def parse[T](f: UTF8String => Option[T]): T = { | ||
| f(UTF8String.fromString(str)).getOrElse { | ||
| throw new AnalysisException(s"Cannot parse the ${dataType.catalogString} value: $str") | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shouldn't we just throw a ParseException here?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks this can be called somewhere else not by the parser .. in that case, parse exception might not make much sense. It maybe has to be illigal argument exception or runtime exception as well. |
||
| } | ||
| } | ||
| val value = dataType match { | ||
| case BooleanType => str.toBoolean | ||
| case ByteType => str.toByte | ||
|
|
@@ -148,8 +155,10 @@ object Literal { | |
| case FloatType => str.toFloat | ||
| case DoubleType => str.toDouble | ||
| case StringType => UTF8String.fromString(str) | ||
| case DateType => java.sql.Date.valueOf(str) | ||
| case TimestampType => java.sql.Timestamp.valueOf(str) | ||
| case DateType => parse(stringToDate) | ||
| case TimestampType => | ||
| val timeZone = getTimeZone(SQLConf.get.sessionLocalTimeZone) | ||
| parse(stringToTimestamp(_, timeZone)) | ||
| case CalendarIntervalType => CalendarInterval.fromString(str) | ||
| case t: DecimalType => | ||
| val d = Decimal(str) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -22,6 +22,7 @@ import javax.xml.bind.DatatypeConverter | |
|
|
||
| import scala.collection.JavaConverters._ | ||
| import scala.collection.mutable.ArrayBuffer | ||
| import scala.util.Try | ||
|
|
||
| import org.antlr.v4.runtime.{ParserRuleContext, Token} | ||
| import org.antlr.v4.runtime.tree.{ParseTree, RuleNode, TerminalNode} | ||
|
|
@@ -36,10 +37,9 @@ import org.apache.spark.sql.catalyst.expressions.aggregate.{First, Last} | |
| import org.apache.spark.sql.catalyst.parser.SqlBaseParser._ | ||
| import org.apache.spark.sql.catalyst.plans._ | ||
| import org.apache.spark.sql.catalyst.plans.logical._ | ||
| import org.apache.spark.sql.catalyst.util.DateTimeUtils.{getTimeZone, stringToDate, stringToTimestamp} | ||
| import org.apache.spark.sql.internal.SQLConf | ||
| import org.apache.spark.sql.types._ | ||
| import org.apache.spark.unsafe.types.{CalendarInterval, UTF8String} | ||
| import org.apache.spark.unsafe.types.CalendarInterval | ||
| import org.apache.spark.util.random.RandomSampler | ||
|
|
||
| /** | ||
|
|
@@ -1552,17 +1552,15 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging | |
| override def visitTypeConstructor(ctx: TypeConstructorContext): Literal = withOrigin(ctx) { | ||
| val value = string(ctx.STRING) | ||
| val valueType = ctx.identifier.getText.toUpperCase(Locale.ROOT) | ||
| def toLiteral[T](f: UTF8String => Option[T], t: DataType): Literal = { | ||
| f(UTF8String.fromString(value)).map(Literal(_, t)).getOrElse { | ||
| def toLiteral(t: DataType): Literal = { | ||
| Try {Literal.fromString(value, t)}.getOrElse { | ||
| throw new ParseException(s"Cannot parse the $valueType value: $value", ctx) | ||
| } | ||
| } | ||
| try { | ||
| valueType match { | ||
| case "DATE" => toLiteral(stringToDate, DateType) | ||
| case "TIMESTAMP" => | ||
| val timeZone = getTimeZone(SQLConf.get.sessionLocalTimeZone) | ||
| toLiteral(stringToTimestamp(_, timeZone), TimestampType) | ||
| case "DATE" => toLiteral(DateType) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can we just do
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I did that in another PR but @hvanhovell doesn't like that.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks the current way is cleaner (see 016c696). I'm not sure where he said that in the PR though. |
||
| case "TIMESTAMP" => toLiteral(TimestampType) | ||
| case "X" => | ||
| val padding = if (value.length % 2 != 0) "0" else "" | ||
| Literal(DatatypeConverter.parseHexBinary(padding + value)) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -243,7 +243,7 @@ class LiteralExpressionSuite extends SparkFunSuite with ExpressionEvalHelper { | |
| checkEvaluation(Literal.fromString("Databricks", StringType), "Databricks") | ||
| val dateString = "1970-01-01" | ||
| checkEvaluation(Literal.fromString(dateString, DateType), java.sql.Date.valueOf(dateString)) | ||
| val timestampString = "0000-01-01 00:00:00" | ||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The year
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does that mean a behavior change introduced by SPARK-26178, SPARK-26243, SPARK-26424 in 3.0.0? I'm expecting that you already documented that before changing this test case.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would say it is bug fix. I guess old (current) implementation wasn't strong enough in checking its input. |
||
| val timestampString = "2000-01-01 00:00:00.123" | ||
| checkEvaluation(Literal.fromString(timestampString, TimestampType), | ||
| java.sql.Timestamp.valueOf(timestampString)) | ||
| val calInterval = new CalendarInterval(1, 1) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait actually, is it ever properly called somewhere previously? If not, we could just remove it. Since all classes in catalyst are considered an internal API, we could just remove if that's not called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan I remember you mentioned this method can be used somewhere like logging. If not, I will remove it. Also cc @gatorsmile as you added tests for the method recently: #22345
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an explicit plan to use this somewhere? If so, that's okay.
Otherwise, let's remove. In general, we shouldn't keep the codes for internal purpose that are not being called. Removed codes remaining in the commit and other branches - we can always revive them when it's actually used or in order to deduplicate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove it now. It was there for
TreeNode.fromJSON, which has been removed long time ago.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. Here is the PR to remove the methods: #23603