Conversation

@alberskib

Invocation of getters for types extending AnyVal returns the default value (if the field value is null) instead of throwing an NPE. Please check the comments on the SPARK-11553 issue for more details.
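
A minimal sketch of the reported symptom (the row and its single null field are hypothetical; this shows the behavior before the fix, not a guarantee for every Spark version):

import org.apache.spark.sql.Row

// A Row whose only field is null
val row = Row.fromSeq(Seq(null))

row.isNullAt(0) // true
row.getInt(0)   // returns 0 (the Int default) instead of throwing a NullPointerException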

@rxin
Contributor

rxin commented Nov 12, 2015

This change breaks binary compatibility. I think we should just update the documentation to say that callers should use isNullAt to check for null primitive values.
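
For reference, the defensive pattern the documentation would point to - check isNullAt before touching a primitive getter (readInt below is a hypothetical helper, not part of the API):

import org.apache.spark.sql.Row

// Check for null before calling the primitive getter
def readInt(row: Row, i: Int): Option[Int] =
  if (row.isNullAt(i)) None else Some(row.getInt(i))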

@alberskib
Author

We could also change only the implementation of the methods for primitive types (getInt, getDouble, etc.) to do a null check internally, leave the code of getAs unchanged, and update the documentation as you said.
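
A rough sketch of what such a guarded primitive getter could look like inside the Row trait (getAnyValAs is a hypothetical helper name, not necessarily the shape of a final patch):

// Hypothetical guard shared by the primitive getters: throw instead of
// silently unboxing null to the type's default value
private def getAnyValAs[T <: AnyVal](i: Int): T =
  if (isNullAt(i)) throw new NullPointerException(s"Value at index $i is null")
  else getAs[T](i)

def getInt(i: Int): Int = getAnyValAs[Int](i)
def getDouble(i: Int): Double = getAnyValAs[Double](i)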

@marmbrus
Contributor

ok to test

@marmbrus
Contributor

Please change the PR title to something more descriptive: [SPARK-11553] [SQL] Primitive Row accessors should not convert null to default value

@SparkQA

SparkQA commented Nov 12, 2015

Test build #45772 has finished for PR 9642 at commit f498f4a.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

alberskib changed the title from "SPARK-11553 Fix issue related to Row from DataFrame API" to "[SPARK-11553] [SQL] Primitive Row accessors should not convert null to default value" on Nov 12, 2015
@SparkQA

SparkQA commented Nov 12, 2015

Test build #45777 has finished for PR 9642 at commit eef5778.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 13, 2015

Test build #45852 has finished for PR 9642 at commit 858b9a1.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 13, 2015

Test build #45853 has finished for PR 9642 at commit be382bc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@alberskib
Author

The only failing test is 'Sessions of SQLContext' in org.apache.spark.sql.SQLContextSuite. Unfortunately I am not able to reproduce the problem on my machine - it passes locally. Moreover, I do not think it is related to my change.

@SparkQA

SparkQA commented Nov 13, 2015

Test build #45894 has finished for PR 9642 at commit 181b075.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@alberskib
Author

After rebasing onto the current master, all tests pass.

@hvanhovell
Contributor

UnsafeRow and SpecificMutableRow have similar problems. Shouldn't we fix those as well? For example:

import org.apache.spark.sql.types.IntegerType
import org.apache.spark.sql.catalyst.expressions.SpecificMutableRow
import org.apache.spark.sql.catalyst.expressions.UnsafeRow

val srow = new SpecificMutableRow(IntegerType :: Nil)
srow.isNullAt(0)
srow.getInt(0)

val urow = new UnsafeRow()
urow.pointTo(new Array[Byte](16), 1, 16)
urow.setNullAt(0)
urow.isNullAt(0)
urow.getInt(0)

// Result:
import org.apache.spark.sql.types.IntegerType
import org.apache.spark.sql.catalyst.expressions.SpecificMutableRow
import org.apache.spark.sql.catalyst.expressions.UnsafeRow
srow: org.apache.spark.sql.catalyst.expressions.SpecificMutableRow = [null]
res129: Boolean = true
res130: Int = 0
urow: org.apache.spark.sql.catalyst.expressions.UnsafeRow = []
res133: Boolean = true
res134: Int = 0

I'd actually rather not touch this at all. When you are using an internal API you should be more careful and expect some quirkiness.

I can currently think of only one place in which this causes some problems: UDFs with primitive parameters. The engine will pass in default values instead of nulls. Are there any other situations in which this causes problems?
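
A hypothetical illustration of that UDF case - with a primitive Int parameter the engine cannot hand the function a null, so a null input cell arrives as the default 0 and the result is silently wrong:

import org.apache.spark.sql.functions.{col, udf, when}

// Primitive parameter: a null value in the input column shows up here as 0, not null
val plusOne = udf((x: Int) => x + 1)

// One common workaround is to guard the call so the UDF never sees a null input:
// df.select(when(col("x").isNotNull, plusOne(col("x"))))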

@alberskib
Author

Hey @hvanhovell! Thanks for the comment. I agree with you that if we want to introduce this change we need to take care of UnsafeRow and SpecificMutableRow as well.
I have a couple of questions for you:

I'd actually rather not touch this at all ...

You mean discard the change and only update the documentation? Yes, that is one option - I even think it is not so bad.
On the other hand it could be misleading: many users will not notice the note in the documentation, which leads them to make the same errors. I think there is already a lot of code silently computing wrong results because of this bug.

I can currently think of only one place in which this causes some problems

You mean problems caused by not introducing this change?

I think we should analyse the pros and cons and decide how to proceed.

@hvanhovell
Contributor

You mean discard the change and only update the documentation? Yes, that is one option - I even think it is not so bad.

I would only update the documentation. Internal mutable rows are among the most performance-critical classes in Spark SQL, so I am not that keen to add (potentially unnecessary) branching to every primitive getter. Anyone using InternalRow is using a Spark SQL internal API anyway, and really should know what they are doing.

You mean problems caused by not introducing this change?

Yes, do you know any other problems caused by this?

@alberskib
Author

Yes, do you know any other problems caused by this?

So far I have not found any other potential problems.

@alberskib
Author

There is no Scaladoc for the primitive getters in UnsafeRow and SpecificMutableRow (or their superclasses) - everything there remains as is.

@hvanhovell
Contributor

So I have been making a lot of fuss about internal classes, which you are not touching. Sorry about that.

This change is much more benign, but I still wonder whether you need to start throwing NPEs. I'd prefer to update the documentation.

@alberskib
Author

No problem.
I am not sure which option is better. At first I thought an exception being thrown would be easier to notice than a change in the documentation, but now it looks like changing only the documentation would be more consistent.
It would be good if more people expressed an opinion.

@marmbrus
Contributor

The original behavior of these classes was to throw if you tried to retrieve a null primitive value, and I think that's a lot less confusing. Unfortunately it seems that the Tungsten refactoring changed this behavior, but since it was the original behavior I'm in favor of returning to it.

@marmbrus
Contributor

Regarding the internal classes, my preferred option would be to add assertions that we can elide for performance in production.
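
For illustration: Scala's built-in assert is marked @elidable, so a guard along these lines (CheckedIntRow is a hypothetical sketch, not the actual InternalRow code) costs nothing in builds compiled with -Xdisable-assertions or -Xelide-below:

// Hypothetical sketch of an assertion-guarded getter; the assert call is
// removed entirely when scalac is run with -Xdisable-assertions
class CheckedIntRow(values: Array[Any]) {
  def isNullAt(ordinal: Int): Boolean = values(ordinal) == null

  def getInt(ordinal: Int): Int = {
    assert(!isNullAt(ordinal), s"Field $ordinal is null; check isNullAt before reading a primitive")
    values(ordinal).asInstanceOf[Int]
  }
}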

@marmbrus
Contributor

Thanks, I'm going to merge this to master and 1.6.

asfgit closed this in 3129662 on Nov 16, 2015
asfgit pushed a commit that referenced this pull request Nov 16, 2015
… default value

Invocation of getters for types extending AnyVal returns the default value (if the field value is null) instead of throwing an NPE. Please check the comments on the SPARK-11553 issue for more details.

Author: Bartlomiej Alberski <bartlomiej.alberski@allegrogroup.com>

Closes #9642 from alberskib/bugfix/SPARK-11553.

(cherry picked from commit 3129662)
Signed-off-by: Michael Armbrust <michael@databricks.com>