Fix for NPE in forall. #68
Conversation
@gorros - Thanks for the PR! I think the examples below show the problem:

```scala
// how forall works by default
Seq("hi", "hello").forall(_.startsWith("h")) // true
Seq("hi", "hello", null).forall(_.startsWith("h")) // NullPointerException

// returns false correctly
Seq("hi", "hello", "bye").forall(_.startsWith("h"))

// incorrectly returns false
scala.util.Try(Seq("hi", "hello", "bye", null).forall(_.startsWith("h"))).toOption.getOrElse(false)
```

So I think we either need to throw the NullPointerException or return null. It seems like Spark always prefers to return null instead of throwing a NullPointerException. Let me know what you think! Also, if you agree, can you please update the test? Thanks again so much for the help!!! You're motivating me to keep working on this project and make it better.
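A minimal, self-contained sketch (outside Spark) of the "return null instead of throwing" option described above. `forallOrNull` is an illustrative name, not spark-daria's actual API; it uses `java.lang.Boolean` so that null is a legal return value.

```scala
// Sketch only: models "return null rather than throw a NullPointerException",
// mirroring how Spark propagates nulls. Not the PR's real implementation.
def forallOrNull[A](xs: Seq[A])(p: A => Boolean): java.lang.Boolean =
  if (xs == null || xs.exists(_ == null)) null // null input or null element => null
  else xs.forall(p)                            // otherwise behave like plain forall

forallOrNull(Seq("hi", "hello"))(_.startsWith("h"))       // true
forallOrNull(Seq("hi", "hello", null))(_.startsWith("h")) // null, no exception
```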
@MrPowers Well, it is not straightforward. But on the other hand, that is why
@gorros - good point. Here's a related question I asked on Stack Overflow that helped me answer this question. Do you have any thoughts on how to rework the function? If you don't have any ideas, I can just ask another question on Stack Overflow. Let me know. Thanks!
@MrPowers, so by reworking, do you mean to make
@MrPowers there is a conflict with the return type when I try to return
@gorros - Here's how I think the function should behave (input on the left // => desired output on the right). Let me know what you think!

```scala
null // => null
Seq("hi", "hello").forall(_.startsWith("h")) // => true
Seq("hi", "hello", null).forall(_.startsWith("h")) // => true
Seq("hi", "hello", "bye").forall(_.startsWith("h")) // => false
Seq("hi", "hello", "bye", null).forall(_.startsWith("h")) // => false
Seq().forall(_.startsWith("h")) // => null
Seq(null).forall(_.startsWith("h")) // => null
```

So if the input is null, Seq(), or Seq(null), the function should return null. This ended up being a lot more complicated than I thought when I initially opened the issue - sorry about that! I really like how Spark never throws NullPointerExceptions. I need to go through all the spark-daria functions again and make sure that none of them ever throw NullPointerExceptions - it's something we need to always watch out for, especially when we use UDFs!
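A sketch of the behavior proposed above, using `Option[Boolean]` to stand in for a nullable column value (`None` plays the role of Spark's null). `forallNullSafe` is a hypothetical name for illustration, not the PR's actual implementation.

```scala
// Encodes the proposed semantics: null/empty/all-null input => None ("null"),
// null elements are otherwise skipped, and the predicate decides the rest.
def forallNullSafe[A](xs: Seq[A])(p: A => Boolean): Option[Boolean] =
  Option(xs) match {
    case None => None // null input => null
    case Some(seq) =>
      val nonNull = seq.filter(_ != null)
      if (nonNull.isEmpty) None      // Seq() and Seq(null) => null
      else Some(nonNull.forall(p))   // null elements are skipped
  }

forallNullSafe(Seq("hi", "hello", null))(_.startsWith("h")) // Some(true)
forallNullSafe(Seq(null: String))(_.startsWith("h"))        // None
```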
@MrPowers I think there is already a question regarding UDFs returning
@MrPowers But on the other hand, previously I wrote simple UDFs that return
@MrPowers I think the main issue is that our return type is Boolean, which is not nullable.
Related to pulling in support for higher order functions: #80 (comment)

If we do what I propose in that comment, then we can implement this in terms of `exists`. The signature of `exists`, however, is this:

```scala
def exists(array: Column, predicate: Column => Column): Column = ???
// the predicate must return a boolean-valued column
```

So naively it doesn't work with UDFs. However, we can define an implicit conversion from UDFs to functions that operate on columns like so:

```scala
implicit def udf2columnar(f: UserDefinedFunction): Column => Column = x => f(x)
```

I've used the above solutions in the past, and I think we can take a similar approach like so:

```scala
implicit def scalaPredicate2udf[T](f: T => Boolean)(implicit e: TypeTag[Boolean], e2: TypeTag[T]): Column => Column =
  x => udf[Boolean, T](f)(e, e2)(x)
```

We'll need to test this, however, to make sure it works in practice.
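The implicit-conversion trick above depends on Spark's `Column` and `udf`, so here is a dependency-free analogue of the same mechanism: an implicit view lifts a plain Scala predicate into the wrapped-function type an API expects. `Box`, `liftPredicate`, and `existsBoxed` are hypothetical names used only for this sketch.

```scala
import scala.language.implicitConversions

// Box stands in for Spark's Column; in the real code the wrapper would be a UDF.
case class Box[A](value: A)

// The implicit view: any T => Boolean can be used where Box[T] => Box[Boolean] is expected.
implicit def liftPredicate[T](f: T => Boolean): Box[T] => Box[Boolean] =
  x => Box(f(x.value))

// An API written against the "columnar" function type, like exists above.
def existsBoxed[T](xs: Seq[Box[T]])(predicate: Box[T] => Box[Boolean]): Boolean =
  xs.exists(x => predicate(x).value)

// Thanks to the implicit view, a plain predicate can be passed directly:
existsBoxed(Seq(Box("hi"), Box("bye")))((s: String) => s.startsWith("h")) // true
```

The design point this illustrates: the caller writes an ordinary Scala predicate, and the compiler inserts the conversion at the call site, so the column-oriented API stays unchanged.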
#67