Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Count Expressions #4530

Merged
merged 17 commits into from
Dec 11, 2023

Conversation

adamretter
Copy link
Contributor

@adamretter adamretter commented Aug 30, 2022

This is currently passing 12 out of 13 XQTS tests. The 1 failing test is: prod-CountClause/count-009.

Screenshot 2023-06-06 at 23 20 43

Acceptance of Failing XQTS Test

The 1 failing test occurs due to the existing design and implementation of FLWOR expressions in eXist-db.

Specifically, eXist-db's implementation of the XQuery order by clause does not follow the W3C specification. The W3C XQuery 3.1 Order By Clause specification states:

The purpose of an order by clause is to impose a value-based ordering on the tuples in the tuple stream. The output tuple stream of the order by clause contains the same tuples as its input tuple stream, but the tuples may be in a different order.

This means that any tuples in a tuple stream after an Order By Clause should be in the order defined by the order by clause. In the eXist-db implementation that is not the case. In eXist-db any expressions/clauses within a FLWOR expression that occur after the order by clause, but before the return clause, will not see an ordered tuple stream; instead they see the tuple stream prior to the order by clause. The reason for this is that in eXist-db the operation of executing the sort for the Order By clause is always deferred until just before the return clause of the FLWOR expression. In eXist-db this is known as a postEval phase, see:

  1. FLWORClause#postEval(Sequence)
  2. ForExpr#eval(Sequence, Item)
  3. OrderByClause#postEval(Sequence)

Unfortunately at this time it appears that modifying eXist-db to correctly implement the Order By Clause semantics would be a huge undertaking, and may require a rewrite of all FLWOR expressions. Therefore we consider it out of scope for this PR.

On an tangentially related note, we suspect there may be other problems with all other FLWOR expressions in eXist-db that implement postEval with regards to compliance with W3C XQuery 3.1.


In Further Detail

Given the XQuery:

for $x in ('a', 'b')

  count $index1
  
  let $remainder := $index1 mod 2
  order by $remainder, $index1
  
  count $index2

return
    <x index1="{$index1}" index2="{$index2}">{$x}</x>

The result should be computed as:

<x index1="2" index2="1">b</x>
<x index1="1" index2="2">a</x>

However eXist-db produces the incorrect result due to its incorrect implementation of the order by clause:

<x index1="2" index2="2">b</x>
<x index1="1" index2="1">a</x>

This occurs because eXist-db evaluates the query by effectively executing the following steps:

  1. Bind $x := 'a'
  2. Bind $index1 := 1
  3. Bind $remainder := 1
  4. Bind $index2 := 1
  5. Add <x index1="1" index2="1">a</x> to the result sequence.
  6. Bind $x := 'b'
  7. Bind $index1 := 2
  8. Bind $remainder := 0
  9. Bind $index2 := 2
  10. Add <x index1="2" index2="2">b</x> to the result sequence.
  11. Sort the result sequence based by $remainder ascending and then $index1 ascending
  12. Return the sorted result sequence.

The problem is that eXist-db applies the sorting (i.e. the order by clause) after the fact.

I have created a detailed UML Sequence Diagram of the above XQuery execution in eXist-db that shows the exact problem:
eXist-db Count and Order By - UML Sequence Diagram

In case it is helpful for anyone else, the code for the UML Sequence Diagram:

participant user as "User"
participant xquery as "XQuery"
participant pathexpr as "PathExpr"
participant x as "$x: ForExpr"
participant index1 as "$index1: CountExpr"
participant remainder as "$remainder: LetExpr"
participant orderby as "OrderByClause"
participant index2 as "$index2: CountExpr"
participant element as "ElementConstructor"

activate user

'XQuery Compilation Phase
note over user: XQuery Compilation phase starts here
activate xquery
user -> xquery: compile(...)
xquery -> xquery: analyzeAndOptimizeIfModulesChanged(...)
activate pathexpr
xquery -> pathexpr: analyze(...)
activate x
pathexpr -> x: analyze(...)
activate index1
x -> index1: analyze(...)
activate remainder
index1 -> remainder: analyze(...)
activate orderby
remainder -> orderby: analyze(...)
activate index2
orderby -> index2: analyze(...)
activate element
index2 -> element: analyze(...)
element -> index2
deactivate element
index2 -> orderby
deactivate index2
orderby -> remainder
deactivate orderby
remainder -> index1
deactivate remainder
index1 -> x
deactivate index1
x -> pathexpr
deactivate x
pathexpr -> xquery
deactivate pathexpr
xquery -> user
deactivate xquery

'XQuery Execution Phase
note over user: XQuery Execution phase starts here
activate xquery
user -> xquery: execute(...)
activate pathexpr
xquery -> pathexpr: eval(null)
activate x
pathexpr -> x: eval(null)
x -> x: isOuterFor == true
x -> x: $index1 is FLWORClause == true
activate index1
x -> index1: preEval(('a', 'b'))
index1 -> index1: hasPreviousOrderByDescending() == false
index1 -> index1: calcStartPosition == 0
activate remainder
index1 --> remainder: preEval(('a', 'b'))
activate orderby
remainder --> orderby: preEval(('a', 'b')) 
activate index2
orderby -> index2: preEval(('a', 'b')) 
index2 -> index2: hasPreviousOrderByDescending() == false
index2 -> index2: calcStartPosition == 0
index2 -> orderby
orderby --> remainder
remainder --> index1
index1 -> x

'Process first Item 'a'
x -> x: processItem('a')
x -> index1: eval(null, null)
index1 -> index1: increment/decrement count == 1
index1 -> remainder: eval(null, null)
remainder -> orderby: eval(null, null)
note over orderby, index2, element: Error: No ordering of the Tuple Stream as been performed by `order by` before\n the `count $index2` is evaluated and subsequently used by the `x` Element Constructor
orderby -> index2: eval(null, null)
index2 -> index2: increment/decrement count == 1
activate element
index2 -> element: eval(null, null)
element-> index2: (index1=1 index2=1 x='a')
index2-> orderby: (index1=1 index2=1 x='a')
orderby -> orderby: orderedResult == \n  [\n    (index1=1 index2=1 x='a')\n  ]
note over orderby: orderedResult is not yet sorted!
orderby -> remainder: (index1=1 index2=1 x='a')
remainder -> index1: (index1=1 index2=1 x='a')
index1 -> x: (index1=1 index2=1 x='a')

'Process second Item 'b'
x -> x: processItem('b')
x -> index1: eval(null, null)
index1 -> index1: increment/decrement count == 2
index1 -> remainder: eval(null, null)
note over orderby, index2, element: Error: No ordering of the Tuple Stream as been performed by `order by` before\n the `count $index2` is evaluated and subsequently used by the `x` Element Constructor
remainder -> orderby: eval(null, null)
orderby -> index2: eval(null, null)
index2 -> index2: increment/decrement count == 2
index2 -> element: eval(null, null)
element-> index2: (index1=2 index2=2 x='b')
deactivate element
index2-> orderby: (index1=2 index2=2 x='b')
orderby -> orderby: orderedResult == \n  [\n    (index1=1 index2=1 x='a')\n    (index1=2 index2=2 x='b')\n  ]
note over orderby: orderedResult is not yet sorted!
orderby -> remainder: (index1=2 index2=2 x='b')
remainder -> index1: (index1=2 index2=2 x='b')
index1 -> x: (index1=2 index2=2 x='b')

'Post Eval phase
x -> x: callPostEval == true
note over x: Post Eval Phase starts here...
x --> index1: postEval([\n    (index1=1 index2=1 x='a')\n    (index1=2 index2=2 x='b')\n  ])
index1 --> remainder: postEval([\n    (index1=1 index2=1 x='a')\n    (index1=2 index2=2 x='b')\n  ])
remainder --> orderby: postEval([\n    (index1=1 index2=1 x='a')\n    (index1=2 index2=2 x='b')\n  ])
orderby -> orderby: orderedResult.sort
orderby -> orderby: orderedResult == \n  [\n    (index1=2 index2=2 x='b')\n    (index1=1 index2=1 x='a')\n  ]
note over orderby: orderedResult is now sorted!
note over orderby: Error: The Tuple Stream was sorted after the `count $index2`,\n it should have been sorted before!
orderby -> index2: postEval([\n    (index1=2 index2=2 x='b')\n    (index1=1 index2=1 x='a')\n  ])
index2 -> orderby: [\n    (index1=2 index2=2 x='b')\n    (index1=1 index2=1 x='a')\n  ]
deactivate index2
orderby --> remainder: [\n    (index1=2 index2=2 x='b')\n    (index1=1 index2=1 x='a')\n  ]
deactivate orderby
remainder --> index1: [\n    (index1=2 index2=2 x='b')\n    (index1=1 index2=1 x='a')\n  ]
deactivate remainder
index1 --> x: [\n    (index1=2 index2=2 x='b')\n    (index1=1 index2=1 x='a')\n  ]
deactivate index1
x -> pathexpr: [\n    (index1=2 index2=2 x='b')\n    (index1=1 index2=1 x='a')\n  ]
deactivate x
pathexpr -> xquery: [\n    (index1=2 index2=2 x='b')\n    (index1=1 index2=1 x='a')\n  ]
deactivate pathexpr
note over user: The orderedResult overwrites the result sequence,\n and is returned to the User
xquery -> user: [\n    (index1=2 index2=2 x='b')\n    (index1=1 index2=1 x='a')\n  ]
deactivate xquery

deactivate user

This open source contribution to the eXist-db project was commissioned by the Office of the Historian, U.S. Department of State, https://history.state.gov/.

@adamretter adamretter added the enhancement new features, suggestions, etc. label Aug 30, 2022
@adamretter adamretter added this to the eXist-7.0.0 milestone Aug 30, 2022
@adamretter adamretter added the bug issue confirmed as bug label Aug 30, 2022
@adamretter adamretter force-pushed the feature/count-expression branch 2 times, most recently from ce1aef4 to e667cfd Compare September 3, 2022 16:21
@adamretter adamretter force-pushed the feature/count-expression branch 2 times, most recently from 49fda27 to 4a8f21c Compare December 10, 2022 21:31
@adamretter adamretter force-pushed the feature/count-expression branch from 4a8f21c to f6fea18 Compare March 13, 2023 21:59
@adamretter adamretter force-pushed the feature/count-expression branch from f6fea18 to 0ec88af Compare June 11, 2023 10:39
@adamretter adamretter force-pushed the feature/count-expression branch from 0ec88af to dd84a0b Compare June 11, 2023 10:45
@adamretter adamretter marked this pull request as ready for review June 11, 2023 10:46
@adamretter adamretter requested review from reinhapa and dizzzz June 11, 2023 10:46
@dizzzz
Copy link
Member

dizzzz commented Jun 13, 2023

I missed the review request; I will have a look coming days.
Is there an easy way to seeing an list of review requests?

@adamretter
Copy link
Contributor Author

Is there an easy way to seeing an list of review requests?

@dizzzz Not that I know of. I think I normally get notifications via email myself.

@adamretter adamretter requested a review from reinhapa June 13, 2023 20:36
@adamretter adamretter requested a review from reinhapa June 14, 2023 14:38
@sonarqubecloud
Copy link

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 13 Code Smells

69.4% 69.4% Coverage
0.5% 0.5% Duplication

@reinhapa reinhapa requested a review from wolfgangmm June 15, 2023 07:30
@adamretter adamretter force-pushed the feature/count-expression branch from 477e413 to 1e5993e Compare October 29, 2023 21:24
@reinhapa
Copy link
Member

@adamretter would you mind changing the this PR to draft mode until it is ready?

@adamretter
Copy link
Contributor Author

adamretter commented Oct 30, 2023

@adamretter would you mind changing the this PR to draft mode until it is ready?

@reinhapa Erm... but... it is ready to be merged!

@reinhapa
Copy link
Member

@adamretter could you rebase as the tests fail...

@adamretter adamretter force-pushed the feature/count-expression branch from 76fb045 to 643be29 Compare October 31, 2023 12:23
@sonarqubecloud
Copy link

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 13 Code Smells

71.0% 71.0% Coverage
0.5% 0.5% Duplication

idea Catch issues before they fail your Quality Gate with our IDE extension sonarlint SonarLint

@adamretter
Copy link
Contributor Author

could you rebase as the tests fail...

@reinhapa Done :-)

@reinhapa
Copy link
Member

reinhapa commented Nov 7, 2023

@adamretter there are a some simple code smells left, that could be easy solved. Would you mind to address those? (use of new instanceof and some usage of protected instead of public visibility)

@line-o line-o merged commit 1029a51 into eXist-db:develop Dec 11, 2023
9 of 11 checks passed
@adamretter adamretter deleted the feature/count-expression branch November 28, 2024 16:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug issue confirmed as bug enhancement new features, suggestions, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants