Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -617,7 +617,9 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
case s: String => JString(s)
case u: UUID => JString(u.toString)
case dt: DataType => dt.jsonValue
case m: Metadata => m.jsonValue
// SPARK-17356: In usage of mllib, Metadata may store a huge vector of data, transforming
Copy link
Contributor Author

@clockfly clockfly Sep 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current implementation of toJSON recursively searches the Map and Seq, and try to convert every field to JSON.

It is quite risky, since we don't know what data is stored in unknown Seq and Map, and it may easily trigger OOM if the Seq or Map is a huge object.

Maybe we should disable converting Seq and Map?

// it to JSON may trigger OutOfMemoryError.
case m: Metadata => Metadata.empty.jsonValue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we use JNothing instead of Metadata.empty.jsonValue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we should not. JNothing is to map scala.Option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh sorry, I mean JNull

case s: StorageLevel =>
("useDisk" -> s.useDisk) ~ ("useMemory" -> s.useMemory) ~ ("useOffHeap" -> s.useOffHeap) ~
("deserialized" -> s.deserialized) ~ ("replication" -> s.replication)
Expand Down
10 changes: 9 additions & 1 deletion sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ import org.apache.spark.sql.execution.aggregate.TypedAggregateExpression
import org.apache.spark.sql.execution.columnar.InMemoryRelation
import org.apache.spark.sql.execution.datasources.LogicalRelation
import org.apache.spark.sql.execution.streaming.MemoryPlan
import org.apache.spark.sql.types.ObjectType
import org.apache.spark.sql.types.{Metadata, ObjectType}


abstract class QueryTest extends PlanTest {
Expand Down Expand Up @@ -274,6 +274,14 @@ abstract class QueryTest extends PlanTest {
val normalized1 = logicalPlan.transformAllExpressions {
case udf: ScalaUDF => udf.copy(function = null)
case gen: UserDefinedGenerator => gen.copy(function = null)
// After SPARK-17356: the JSON representation no longer has the Metadata. We need to remove
// the Metadata from the normalized plan so that we can compare this plan with the
// JSON-deserialzed plan.
case a @ Alias(child, name) if a.explicitMetadata.isDefined =>
Alias(child, name)(a.exprId, a.qualifier, Some(Metadata.empty), a.isGenerated)
case a: AttributeReference if a.metadata != Metadata.empty =>
AttributeReference(a.name, a.dataType, a.nullable, Metadata.empty)(a.exprId, a.qualifier,
a.isGenerated)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be good to have a test. The test can check that the json representation does not have metadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yhuai

It is covered by existing QueryTest. In QueryTest, it will call toJSON and then call fromJSON to convert JSON back to a plan, and do comparison like this:

    if (normalized1 != normalized2) {
      fail(
        s"""
           |== FAIL: the logical plan parsed from json does not match the original one ===
           |${sideBySide(logicalPlan.treeString, normalized2.treeString).mkString("\n")}
          """.stripMargin)
    }

If the Meta data is not empty, the comparison will fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Let's explain why we do remove the metadata at here. Basically, I think we need to explain because json string does not have metadata, we remove the metadata from the normalized plan. So, we can do the check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(just make the comment a little bit more clear)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

}

// RDDs/data are not serializable to JSON, so we need to collect LogicalPlans that contains
Expand Down