Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

order by implementation #554

Merged
merged 8 commits into from
Apr 4, 2022
Merged

Conversation

onurkuruu
Copy link
Contributor

@onurkuruu onurkuruu commented Mar 19, 2022

*Issue partiql/partiql-lang#20 *

  • Support for NULLS FIRST, NULLS LAST keywords
  • Support for ordering null/missing, bool, number, date/timestamp/time, string types
  • A bug fixed which SqlParser cannot parse if the statement contains ORDER BY which comes after GROUP BY
  • Support for ordering the results;
    • Normal results
    • Aggregated results without grouping (GROUP BY)
    • Grouped results

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Copy link
Member

@alancai98 alancai98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution and putting together this PR! This is definitely a great start to the implementation! I noticed a few things that need addressed:

  • changing default behavior when ordering spec (i.e. ASC/DESC) is not specified
  • using NaturalExprValueComparators to perform the ORDER BY comparison between the different types
  • some more tests (parsing, evaluation)

Other than that, only had some nits/typos.

lang/resources/org/partiql/type-domains/partiql.ion Outdated Show resolved Hide resolved
lang/src/org/partiql/lang/ast/ast.kt Outdated Show resolved Hide resolved
lang/src/org/partiql/lang/ast/StatementToExprNode.kt Outdated Show resolved Hide resolved
lang/src/org/partiql/lang/ast/ExprNodeToStatement.kt Outdated Show resolved Hide resolved
lang/src/org/partiql/lang/syntax/SqlParser.kt Outdated Show resolved Hide resolved
override fun getErrorMessage(errorContext: PropertyValueMap?): String = ""
},

EVALUATOR_ORDER_BY_ENVIRONMENT_CANNOT_RESOLVED(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add "BE" -- EVALUATOR_ORDER_BY_ENVIRONMENT_CANNOT_BE_RESOLVED

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed here with EVALUATOR_ENVIRONMENT_CANNOT_BE_RESOLVED since I have changed the logic of the code which use this error message.

Comment on lines 2005 to 2032
private fun compileOrderByExpression(sortSpecs: List<PartiqlAst.SortSpec>): List<CompiledOrderByItem> =
sortSpecs.map {
it.orderingSpec
?: errNoContext(
"SortSpec.orderingSpec was not specified",
errorCode = ErrorCode.INTERNAL_ERROR,
internal = true
)

it.nulls
?: errNoContext(
"SortSpec.nulls was not specified",
errorCode = ErrorCode.INTERNAL_ERROR,
internal = true
)

val orderSpec = when (it.orderingSpec) {
is PartiqlAst.OrderingSpec.Asc -> OrderingType.ASC
is PartiqlAst.OrderingSpec.Desc -> OrderingType.DESC
}

val nullsSpec = when (it.nulls) {
is PartiqlAst.NullsSpec.NullsFirst -> NullsType.FIRST
is PartiqlAst.NullsSpec.NullsLast -> NullsType.LAST
}

CompiledOrderByItem(orderSpec, nullsSpec, compileAstExpr(it.expr))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think NaturalExprValueComparators can be used here with some slight modification to case on the OrderingSpec. Using that could remove the need for CompareExtentions.kt and provide the implementation for some of the type comparisons.

Also, NaturalExprValueComparators has a lot of tests in NaturalExprValueComparatorsTest.kt, which can reduce the amount of tests needed to be written.

onurkuruu and others added 2 commits March 24, 2022 19:51
Co-authored-by: Alan Cai <caialan@amazon.com>
* added asc/desc support to NaturalExprValueComparators
* using NaturalExprValueComparators when ordering rows
* getQueryThunk returning list if order by specified otherwise bag
* added more test cases to SqlParserTest and EvaluatingCompilerOrderByTests
@onurkuruu
Copy link
Contributor Author

Hi @alancai98, I have pushed a commit to the branch. I tried to use NaturalExprValueComparators to order the rows but I am not sure about the code changes if this is what you wanted. If this is the correct approach you wanted, I can add more test cases to NaturalExprValueComparatorsTest to cover new changes.

Here is the what the commit contains;

  • fixed some typo
  • added asc/desc support to NaturalExprValueComparators
  • using NaturalExprValueComparators when ordering rows
  • getQueryThunk returning list if order by specified otherwise bag
  • added more test cases to SqlParserTest and EvaluatingCompilerOrderByTests

Copy link
Member

@alancai98 alancai98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes since the initial commit look good. Identified a few nits/typos and there could be some more evaluation tests with other data types.

I think the approach using NaturalExprValueComparators looks reasonable. What other tests were you considering adding?

@onurkuruu
Copy link
Contributor Author

Changes since the initial commit look good. Identified a few nits/typos and there could be some more evaluation tests with other data types.

I think the approach using NaturalExprValueComparators looks reasonable. What other tests were you considering adding?

I thought , I could add some tests to cover ASC/DESC feature of NaturalExprValueComparators to see if everything working as expected. Actually, EvaluatingCompilerOrderByTests covers ASC/DESC feature of NaturalExprValueComparators but I thought, It would be good also adding some test cases for NULLS_FIRST_DESC and NULLS_LAST_DESC.

@alancai98
Copy link
Member

I thought , I could add some tests to cover ASC/DESC feature of NaturalExprValueComparators to see if everything working as expected. Actually, EvaluatingCompilerOrderByTests covers ASC/DESC feature of NaturalExprValueComparators but I thought, It would be good also adding some test cases for NULLS_FIRST_DESC and NULLS_LAST_DESC.

Yeah I think those would be good to add.


Regarding the tests for EvaluatingCompilerOrderByTests.kt, we think more tests should be added for the end-to-end evaluator behavior (even though there are unit tests for NaturalExprValueComparators). Tests should ensure the ORDER BY ordering follows what's specified in section 12.2 of the spec. As a summary, should provide the following tests:

  • tests between different data types
    • assuming ASC and NULLS FIRST, check end-to-end evaluation follows the order: unknowns (null, missing), bool, numbers, date, time, timestamp, text, LOB types, lists, structs, bags)
  • edge case tests for values of the same data type
    • false before true
    • nan before -inf, then numeric values, then +inf
    • timestamps compared by absolute point of time
    • text types compared by lexicographical ordering of Unicode scalar
    • LOB types follow their lexicographical ordering by octet
    • lists compared lexicographically based on comparison of elements, recursively
      • also, shorter arrays come first
    • structs compared lexicographically based on sorted attributes (first by key then by value)
    • bags compared as sorted lists

Feel free to use the data provided in the tests from NaturalExprValueComparatorsTest.k. If it's any easier, you can also include the data directly in the from source rather than the evaluation session. E.g.:

SELECT * FROM [{ 'a': false }, { 'a': true }] ORDER BY a

onurkuruu and others added 2 commits March 30, 2022 10:27
Co-authored-by: Alan Cai <caialan@amazon.com>
* added more tests for EvaluatingCompilerOrderByTests
* added tests that cover desc feature of NaturalExprValueComparatorsTest
@onurkuruu
Copy link
Contributor Author

Thanks for your review @alancai98 , I have pushed a new commit for your comments.

  • Added necessary kdoc comments
  • Fixed some typo
  • Added more test cases which covers different data types and edge cases as defined at specifications for EvaluatingCompilerOrderByTests
  • Refactored NaturalExprValueComparatorsTest to cover desc feature of NaturalExprValueComparators

Copy link
Member

@alancai98 alancai98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@onurkuruu, thanks for adding so many tests! While going through them, I noticed that the semantics of ORDER BY's DESC are not entirely correct when comparing between different data types. I tried to explain some of the reasoning and provide a fix in the comments.

Feel free to let me know if anything's confusing or if you have further questions!

Comment on lines 25 to 31
// the lists below represent the expected ordering of values for asc/desc ordering
// grouped by lists of equivalent values.
// used reversed type of boolean, number, date, time, timestamp, string and lob exprs when preparing
// [basicExprsDescOrder] but cannot used reversed type of list, sexp, struct and bag exprs since they
// can contain different data type at the same time and asc/desc ordering don't change priority of
// comparing data types (nulls first then bools then numbers then date ...)
// so needed to create separate exprs for those types
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the corpus of tests here and in EvaluatingCompilerOrderByTests. As I was reviewing it, I realized the behavior is not implemented entirely correctly for the DESC case comparing between different data types. The current DESC behavior is correct when comparing values of the same data type (e.g. 123 comes before 1 for DESC), but we also want this behavior for comparing between different data types (e.g. numbers comes before booleans for DESC).

From the SQL-92 spec, see page 373:

a) Each <sort specification> specifies the sort direction for the corresponding sort key Ki. If DESC is not specified in the i-th <sort specification>, then the sort direction for Ki is ascending and the applicable <comp op> is the <less than operator>. Otherwise, the sort direction for Ki is descending and the applicable <comp op> is the <greater than operator>.

Thus, we need to use a "greater than operator" which would have the converse/reverse ordering behavior of the "less than operator" defined in the PartiQL spec (section 12.2). There's some ambiguity in our current PartiQL spec, so I created an issue to track for adding more clarification: https://github.com/partiql/partiql-spec/issues/27.

I left a comment with some of my thoughts on how the correct semantics can be implemented in NaturalExprValueComparator.kt.

This semantic change should also obviate the need for the separate list/sexp/struct/bag ExprsDescOrder variables.

@@ -157,6 +170,13 @@ enum class NaturalExprValueComparators(private val nullOrder: NullOrder) : Compa
}
}

private fun <T> orderLeftRight(order: Order, left: T, right: T): Pair<T, T> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swapping the left and right side ordering is what we want when comparing between two of the same data types for DESC. But since we also want values of different types to be sorted in descending order, this approach won't work. To resolve this without repeating too much of the logic, we could break the old version of compare into its own function and return the inverse result.

See this gist with one approach to reverse the result of comparison for DESC: https://gist.github.com/alancai98/c69fc04f8c4c41177c67c8ffa064012c

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have committed the code from your gist and changed the tests a bit. Now we are comparing different data types with the following order (Nulls/Missing can be changed by ordering and nulls spec)

  • bools, numbers, timestamp, text ... for ascending order
  • bag, tuple, array, lob for descending order

Copy link
Member

@alancai98 alancai98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only have a few small nits/style fixes. Should be ready to approve once addressed

onurkuruu and others added 2 commits April 4, 2022 22:36
Co-authored-by: Alan Cai <caialan@amazon.com>
@onurkuruu
Copy link
Contributor Author

@alancai98 thanks for your review again. It was pleasure to contribute with your support. What are the steps of releasing new features of PartiQL? When we can use the PartiQL with order by support with official release?

@alancai98
Copy link
Member

Thanks again for the contribution and the quick responses to feedback!

What are the steps of releasing new features of PartiQL? When we can use the PartiQL with order by support with official release?

We typically publish new PartiQL versions through maven. There's currently no release schedule in place, but we can perform the release whenever a customer/user needs the latest version. Do you need this latest version released to maven?

@alancai98 alancai98 merged commit e171858 into partiql:main Apr 4, 2022
@onurkuruu
Copy link
Contributor Author

Yes, we need. It would be great to have it.

@alancai98
Copy link
Member

Yes, we need. It would be great to have it.

Got it. We should be able to start the release process today. I'll followup here w/ any updates.

@alancai98
Copy link
Member

@onurkuruu the latest release, v0.6.0, was published to Maven earlier today. Feel free to let us know if you have any issues using it!

@onurkuruu
Copy link
Contributor Author

Thanks @alancai98, I can use Order By now. I also want to add alias support for Order By and Group By. Currently, following queries are not working because of missing bindings.

  • SELECT a as b FROM table ORDER BY b
  • SELECT a as b FROM table GROUP BY b

Do you mind if I open a new PR to support for aliases, if can implement? or Do I need to open an issue for this?

@alancai98
Copy link
Member

Hey @onurkuruu. Sorry have been away due to a bout with COVID. I think it would be good to have select aliases available to GROUP BY and ORDER BY clauses. I believe PostgreSQL, MySQL, and SQLite support this. I created a GitHub issue #571 from your previous comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants