-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix order clause empty df handling #394
Conversation
d2b1971
to
212e8cb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonderfully small PR.
@@ -187,6 +187,9 @@ private void setAllLocalResults() { | |||
throw new SparksoniqRuntimeException("Invalid orderby clause."); | |||
} | |||
Dataset<Row> df = _child.getDataFrame(context, getProjection(parentProjection)); | |||
if (df.count() == 0) { | |||
return df; | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect.
Here I think performance measurements would tell which is better. For an empty DF, the closures would not be called anyway, so that the count() action may be more costly than the gain. Maybe you could open an issue on this? What is the current status of empty DFs on other clauses? Are there any crashes? |
Fortunately, there were no crashes with the other clauses. I have generated the issue for performance investigations of empty DFs for later. |
212e8cb
to
6cd0689
Compare
Based on #393. This PR introduces a tiny change that fixes the crashes caused by empty dataframe being passed as input to order by clause. This would result in an empty sql statement which causes the crash.
Tests added.
@ghislainfourny Should the short circuit mentioned above (seen in OrderByClauseSparkIterator.java file) be added to other FLWOR clauses as well? It could save a bit of unnecessary computation and querying when operating on empty sequences. However this introduces an additional count() action which might be costly.