[SPARK-11553] [SQL] Primitive Row accessors should not convert null to default value #9642
Conversation
This change breaks binary compatibility. I think we should just update the documentation to say that you should use isNull to check for nulls when reading primitive types.
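For reference, a minimal sketch of the pattern such documentation would recommend (the row contents here are made up):

```scala
import org.apache.spark.sql.Row

// A row whose first field (an Int column) happens to be null.
val row: Row = Row(null, "abc")

// Check isNullAt before calling a primitive accessor, since getInt
// cannot represent a missing value as an Int.
val maybeValue: Option[Int] =
  if (row.isNullAt(0)) None else Some(row.getInt(0))
```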
We could also change only the implementations of the primitive-type methods (getInt, getDouble, etc.) to do the null check internally, leave the code of getAs unchanged, and update the documentation as you said.
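A rough sketch of that alternative, using a hypothetical wrapper class rather than the real Row implementations:

```scala
import org.apache.spark.sql.Row

// Illustration only: the primitive getters check for null themselves and
// fail fast, while the generic getAs[T] stays untouched.
class NullCheckingRow(underlying: Row) {
  def getInt(i: Int): Int = {
    if (underlying.isNullAt(i)) {
      throw new NullPointerException(s"Value at index $i is null")
    }
    underlying.getInt(i)
  }
  // getLong, getDouble, getBoolean, ... would follow the same pattern.
}
```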
ok to test
Please change the PR title to something more descriptive:
Test build #45772 has finished for PR 9642 at commit
f498f4a to eef5778
Test build #45777 has finished for PR 9642 at commit
eef5778 to 858b9a1
Test build #45852 has finished for PR 9642 at commit
858b9a1 to be382bc
Test build #45853 has finished for PR 9642 at commit
The only failing test is 'Sessions of SQLContext' in org.apache.spark.sql.SQLContextSuite. Unfortunately, I am not able to reproduce the problem on my machine - there it works fine. Moreover, I do not think it is related to my change.
be382bc to 181b075
Test build #45894 has finished for PR 9642 at commit
After rebasing onto the current master, all tests are passing.
I'd actually rather not touch this at all. When you are using an internal API you should be more careful and expect some quirkiness. I can currently think of only one place in which this causes problems: UDFs with primitive parameters. The engine will pass in default values instead of nulls. Are there any other situations in which this causes problems?
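To make the UDF case concrete, a hedged example (the column and function names are invented):

```scala
import org.apache.spark.sql.functions.udf

// Hypothetical UDF with a primitive Int parameter. With the current
// behavior, a null value in the "age" column reaches the function as the
// default value 0, so a missing age is indistinguishable from a real 0.
val isAdult = udf((age: Int) => age >= 18)

// df.select(isAdult(df("age")))  // null ages silently behave like age 0
```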
Hey @hvanhovell! Thanks for the comment. I agree with you that if we want to introduce this change we need to take care of
You mean discard the change and only update the documentation? Yes, this is one option - I even think it is not so bad.
You mean a problem caused by not introducing this change? I think we should analyse the pros and cons and decide how to proceed.
I would only update the documentation. Internal mutable rows are among the most performance-critical classes in Spark SQL, so I am not that keen to add (potentially unnecessary) branching to every primitive getter. When someone is using
Yes, do you know of any other problems caused by this?
Right now I have not found any other potential problem.
There is no scaladoc for primitive getters in
So I have been making a lot of fuss about internal classes, which you are not touching. Sorry about that. This change is much more benign, but I still wonder if you need to start throwing
No problem.
The original behavior of these classes was to throw if you tried to retrieve a null primitive value, and I think that's a lot less confusing. Unfortunately it seems that the Tungsten refactoring changed this behavior, but since it was the original behavior I'm in favor of returning to it.
Regarding the internal classes, my preferred option would be to add assertions that we can elide for performance in production.
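For illustration, Scala's @elidable annotation supports exactly that; a minimal sketch with a made-up internal column class:

```scala
import scala.annotation.elidable
import scala.annotation.elidable.ASSERTION

// Hypothetical internal class, only to show the elidable-assertion idea.
class IntColumn(values: Array[Int], nulls: Array[Boolean]) {

  // Calls to this method can be removed entirely by the compiler when
  // building with scalac's -Xelide-below set above the ASSERTION level,
  // so production builds pay no extra branching cost.
  @elidable(ASSERTION)
  private def assertNotNull(i: Int): Unit =
    assert(!nulls(i), s"Value at index $i is null")

  def getInt(i: Int): Int = {
    assertNotNull(i)
    values(i)
  }
}
```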
Thanks, I'm going to merge this to master and 1.6.
… default value

Invocation of getters for type extending AnyVal returns default value (if field value is null) instead of throwing NPE. Please check comments for SPARK-11553 issue for more details.

Author: Bartlomiej Alberski <bartlomiej.alberski@allegrogroup.com>

Closes #9642 from alberskib/bugfix/SPARK-11553.

(cherry picked from commit 3129662)
Signed-off-by: Michael Armbrust <michael@databricks.com>
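A small illustration of the behavior change described in the commit message (hypothetical row contents):

```scala
import org.apache.spark.sql.Row

val row = Row(null)

row.isNullAt(0)  // true
// Before this patch: row.getInt(0) returned 0, the default Int value.
// After this patch:  row.getInt(0) throws a NullPointerException, so a
// missing value can no longer be mistaken for a real 0.
```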