Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix order clause empty df handling #394

Merged
merged 1 commit into from
Dec 3, 2019

Conversation

CanBerker
Copy link
Collaborator

Based on #393. This PR introduces a tiny change that fixes the crashes caused by empty dataframe being passed as input to order by clause. This would result in an empty sql statement which causes the crash.

Tests added.

@ghislainfourny Should the short circuit mentioned above (seen in OrderByClauseSparkIterator.java file) be added to other FLWOR clauses as well? It could save a bit of unnecessary computation and querying when operating on empty sequences. However this introduces an additional count() action which might be costly.

Copy link
Member

@ghislainfourny ghislainfourny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderfully small PR.

@@ -187,6 +187,9 @@ private void setAllLocalResults() {
throw new SparksoniqRuntimeException("Invalid orderby clause.");
}
Dataset<Row> df = _child.getDataFrame(context, getProjection(parentProjection));
if (df.count() == 0) {
return df;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect.

@ghislainfourny
Copy link
Member

ghislainfourny commented Dec 2, 2019

Based on #393. This PR introduces a tiny change that fixes the crashes caused by empty dataframe being passed as input to order by clause. This would result in an empty sql statement which causes the crash.

Tests added.

@ghislainfourny Should the short circuit mentioned above (seen in OrderByClauseSparkIterator.java file) be added to other FLWOR clauses as well? It could save a bit of unnecessary computation and querying when operating on empty sequences. However this introduces an additional count() action which might be costly.

Here I think performance measurements would tell which is better. For an empty DF, the closures would not be called anyway, so that the count() action may be more costly than the gain. Maybe you could open an issue on this?

What is the current status of empty DFs on other clauses? Are there any crashes?

@CanBerker
Copy link
Collaborator Author

Fortunately, there were no crashes with the other clauses.

I have generated the issue for performance investigations of empty DFs for later.

@CanBerker CanBerker force-pushed the fixOrderClauseEmptyDFHandling branch from 212e8cb to 6cd0689 Compare December 3, 2019 11:37
@CanBerker CanBerker removed the Question label Dec 3, 2019
@CanBerker CanBerker merged commit 01984a7 into master Dec 3, 2019
@CanBerker CanBerker deleted the fixOrderClauseEmptyDFHandling branch December 3, 2019 11:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants