Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,8 @@ import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow, ScalaReflection}
import org.apache.spark.sql.catalyst.expressions.codegen._
import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData}
import org.apache.spark.sql.catalyst.util.DateTimeUtils.{getTimeZone, stringToDate, stringToTimestamp}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types._
import org.apache.spark.unsafe.types._
import org.apache.spark.util.Utils
Expand Down Expand Up @@ -139,6 +141,11 @@ object Literal {
* Constructs a Literal from a String
*/
def fromString(str: String, dataType: DataType): Literal = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait actually, is it ever properly called somewhere previously? If not, we could just remove it. Since all classes in catalyst are considered an internal API, we could just remove if that's not called.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan I remember you mentioned this method can be used somewhere like logging. If not, I will remove it. Also cc @gatorsmile as you added tests for the method recently: #22345

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an explicit plan to use this somewhere? If so, that's okay.

Otherwise, let's remove. In general, we shouldn't keep the codes for internal purpose that are not being called. Removed codes remaining in the commit and other branches - we can always revive them when it's actually used or in order to deduplicate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove it now. It was there for TreeNode.fromJSON, which has been removed long time ago.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. Here is the PR to remove the methods: #23603

def parse[T](f: UTF8String => Option[T]): T = {
f(UTF8String.fromString(str)).getOrElse {
throw new AnalysisException(s"Cannot parse the ${dataType.catalogString} value: $str")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we just throw a ParseException here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks this can be called somewhere else not by the parser .. in that case, parse exception might not make much sense. It maybe has to be illigal argument exception or runtime exception as well.

}
}
val value = dataType match {
case BooleanType => str.toBoolean
case ByteType => str.toByte
Expand All @@ -148,8 +155,10 @@ object Literal {
case FloatType => str.toFloat
case DoubleType => str.toDouble
case StringType => UTF8String.fromString(str)
case DateType => java.sql.Date.valueOf(str)
case TimestampType => java.sql.Timestamp.valueOf(str)
case DateType => parse(stringToDate)
case TimestampType =>
val timeZone = getTimeZone(SQLConf.get.sessionLocalTimeZone)
parse(stringToTimestamp(_, timeZone))
case CalendarIntervalType => CalendarInterval.fromString(str)
case t: DecimalType =>
val d = Decimal(str)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ import javax.xml.bind.DatatypeConverter

import scala.collection.JavaConverters._
import scala.collection.mutable.ArrayBuffer
import scala.util.Try

import org.antlr.v4.runtime.{ParserRuleContext, Token}
import org.antlr.v4.runtime.tree.{ParseTree, RuleNode, TerminalNode}
Expand All @@ -36,10 +37,9 @@ import org.apache.spark.sql.catalyst.expressions.aggregate.{First, Last}
import org.apache.spark.sql.catalyst.parser.SqlBaseParser._
import org.apache.spark.sql.catalyst.plans._
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.catalyst.util.DateTimeUtils.{getTimeZone, stringToDate, stringToTimestamp}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types._
import org.apache.spark.unsafe.types.{CalendarInterval, UTF8String}
import org.apache.spark.unsafe.types.CalendarInterval
import org.apache.spark.util.random.RandomSampler

/**
Expand Down Expand Up @@ -1552,17 +1552,15 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
override def visitTypeConstructor(ctx: TypeConstructorContext): Literal = withOrigin(ctx) {
val value = string(ctx.STRING)
val valueType = ctx.identifier.getText.toUpperCase(Locale.ROOT)
def toLiteral[T](f: UTF8String => Option[T], t: DataType): Literal = {
f(UTF8String.fromString(value)).map(Literal(_, t)).getOrElse {
def toLiteral(t: DataType): Literal = {
Try {Literal.fromString(value, t)}.getOrElse {
throw new ParseException(s"Cannot parse the $valueType value: $value", ctx)
}
}
try {
valueType match {
case "DATE" => toLiteral(stringToDate, DateType)
case "TIMESTAMP" =>
val timeZone = getTimeZone(SQLConf.get.sessionLocalTimeZone)
toLiteral(stringToTimestamp(_, timeZone), TimestampType)
case "DATE" => toLiteral(DateType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just do Cast(Literal(str), DateType)? Or Literal.create(Cast(...).eval, DateType)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that in another PR but @hvanhovell doesn't like that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks the current way is cleaner (see 016c696). I'm not sure where he said that in the PR though.

case "TIMESTAMP" => toLiteral(TimestampType)
case "X" =>
val padding = if (value.length % 2 != 0) "0" else ""
Literal(DatatypeConverter.parseHexBinary(padding + value))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -243,7 +243,7 @@ class LiteralExpressionSuite extends SparkFunSuite with ExpressionEvalHelper {
checkEvaluation(Literal.fromString("Databricks", StringType), "Databricks")
val dateString = "1970-01-01"
checkEvaluation(Literal.fromString(dateString, DateType), java.sql.Date.valueOf(dateString))
val timestampString = "0000-01-01 00:00:00"
Copy link
Member Author

@MaxGekk MaxGekk Jan 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The year 0000 did not exist in our current era, and it is considered as an invalid year by java.time implementations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean a behavior change introduced by SPARK-26178, SPARK-26243, SPARK-26424 in 3.0.0? I'm expecting that you already documented that before changing this test case.

it is considered as an invalid year by java.time implementations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say it is bug fix. I guess old (current) implementation wasn't strong enough in checking its input.
Year zero does not exists in current (common) era (CE) and before common era (BCE) as well (see https://en.wikipedia.org/wiki/Year_zero).

val timestampString = "2000-01-01 00:00:00.123"
checkEvaluation(Literal.fromString(timestampString, TimestampType),
java.sql.Timestamp.valueOf(timestampString))
val calInterval = new CalendarInterval(1, 1)
Expand Down