Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -482,15 +482,15 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
case (fromField, toField) => cast(fromField.dataType, toField.dataType)
}
// TODO: Could be faster?
val newRow = new GenericInternalRow(from.fields.length)
buildCast[InternalRow](_, row => {
val newRow = new GenericInternalRow(from.fields.length)
var i = 0
while (i < row.numFields) {
newRow.update(i,
if (row.isNullAt(i)) null else castFuncs(i)(row.get(i, from.apply(i).dataType)))
i += 1
}
newRow.copy()
Copy link
Contributor

@hvanhovell hvanhovell Jun 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it better to just fix GenericInternalRow.copy? I think I broke it when I removed MutableRow (see #15333).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of having a same row instance and copy it every time, I think it makes more sense to create a different row everytime.

Besides, I also had a PR to fix this: #15082 . Maybe I should reopen it...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hvanhovell I found the semantics of the original GenericMutableRow.copy() pretty dangerous since it didn't properly implement deep copy semantics. In fact, it's impossible to do a proper deep copy there. We only copy the underlying field array but may still share field objects accidentally.

If we do want to "fix" GenericInternalRow.copy(), I'd prefer throwing an exception instead of following the old GenericMutableRow.copy() method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree that there are problems with GenericInternalRow and we should document the current shallow copy semantivs. However, since the contract of InternalRow currently allows update, I do think we need to fix copy in order to make it less broken. The

newRow
})
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -345,4 +345,25 @@ class InsertSuite extends DataSourceTest with SharedSQLContext {
)
}
}

test("SPARK-21203 wrong results of insertion of Array of Struct") {
val tabName = "tab1"
withTable(tabName) {
spark.sql(
"""
|CREATE TABLE `tab1`
|(`custom_fields` ARRAY<STRUCT<`id`: BIGINT, `value`: STRING>>)
|USING parquet
""".stripMargin)
spark.sql(
"""
|INSERT INTO `tab1`
|SELECT ARRAY(named_struct('id', 1, 'value', 'a'), named_struct('id', 2, 'value', 'b'))
""".stripMargin)

checkAnswer(
spark.sql("SELECT custom_fields.id, custom_fields.value FROM tab1"),
Row(Array(1, 2), Array("a", "b")))
}
}
}