Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug Fix] Avoid reusing shared metrics evaluator across threads #1664

Merged
merged 1 commit into from
Feb 18, 2025

Conversation

rachel88888
Copy link
Contributor

Hello!

I have noticed the same issue as Issue #1506 where the number of results retrieved is inconsistent across reads and traced the issue to the reuse of the same metrics evaluator across threads when reading manifests. Because the metrics evaluator is stateful, this will result in the wrong results being retrieved nondeterministically, depending on the execution order of the threads.

This PR addresses the issue by creating a single metrics evaluator per thread, which I have tested locally. Please let me know if there are any tests I can add, and I am happy to receive feedback.

Thank you!

Closes #1506

@Fokko Fokko added this to the PyIceberg 0.9.0 release milestone Feb 18, 2025
Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch @rachel88888, thanks for fixing this 👍

@Fokko Fokko merged commit d26d1e4 into apache:main Feb 18, 2025
7 checks passed
@rachel88888 rachel88888 deleted the bugfix-1506/scan-plan branch February 18, 2025 15:47
@rachel88888
Copy link
Contributor Author

Thank you for the quick review @Fokko!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Explore potential issue with scan returning the incorrect results
2 participants