Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Internal] Query: Fixes ORDER BY issue when partial partition key is specified in RequestOptions in a query to sub-partitioned container #4587

Merged
merged 4 commits into from
Jul 18, 2024

Conversation

adityasa
Copy link
Contributor

@adityasa adityasa commented Jul 11, 2024

Pull Request Template

Description

This changes addresses the issue that when an ORDER BY query is issued to a sub-partitioned container with partial partition key in the request options, documents outside the specified logical partition can also be retrieved. This happens if the logical partition range specified by partition key is split across multiple physical partitions (as with all HPK related issues so far) and one of the physical partitions contains data from other logical partitions (which is very likely, especially if a split has taken place).
With the previous fix made in following PR, which is still a required fix, we ensured that in cases when logical partition is split across multiple physical partitions, we perform a cross-partition ordering in the client:
#4507

However, as a result of this change, information regarding the logical partition key is not honored and each physical partition gets queried fully. This is because enumerator is scoped to full partition and when request is issued, epk range header is not specified in the request (only partition id is specified)

This change refactors the existing logic (used in another pipeline) to limit the FeedRangeEpk used by the enumerator to the intersection of physical partition range and logical partition feed range. This effectively limits the query epk range (by including correct header) for cases when a physical partition contains data from other logical partitions as well.

As far as testing is concerned, the InMemoryContainer was already able to repro this case as part of previous change. And it was originally thought to be the issue with the test infra in the previous change and a fix was applied to suppress this issue. The change removes this suppression and it was confirmed that existing ORDER BY queries in SubPartitionedTests repro the issue.

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • [] New feature (non-breaking change which adds functionality)
  • [] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [] This change requires a documentation update

To automatically close an issue: closes #4563

@neildsh
Copy link
Contributor

neildsh commented Jul 15, 2024

Great description! So just to confirm the change is already covered by existing tests in SubPartitionTests ?

@adityasa
Copy link
Contributor Author

Correct. I confirmed that the test fails with same same symptoms as reported by cx without the call to HierarchicalPartitionutils from OrderBy Inner Enumerator.

@adityasa adityasa force-pushed the users/adityasa/OrderByWrongResults branch from 8feb272 to ba93330 Compare July 16, 2024 00:37
@adityasa adityasa added the auto-merge Enables automation to merge PRs label Jul 16, 2024
@adityasa
Copy link
Contributor Author

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@adityasa adityasa added auto-merge Enables automation to merge PRs and removed auto-merge Enables automation to merge PRs labels Jul 17, 2024
@kirankumarkolli kirankumarkolli merged commit f277ce3 into master Jul 18, 2024
21 checks passed
@kirankumarkolli kirankumarkolli deleted the users/adityasa/OrderByWrongResults branch July 18, 2024 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge Enables automation to merge PRs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Query on one partition returns data from other partitions on multi-partition container
4 participants