[Bug Fix] Avoid reusing shared metrics evaluator across threads #1664

rachel88888 · 2025-02-14T20:58:55Z

Hello!

I ran into the same problem as Issue #1506: the number of results retrieved is inconsistent across reads. I traced it to the same metrics evaluator being reused across threads when reading manifests. Because the metrics evaluator is stateful, the wrong results can be retrieved nondeterministically, depending on the execution order of the threads.
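
To illustrate the failure mode, here is a minimal sketch of how a stateful evaluator can return the wrong answer when two callers interleave. `ToyMetricsEvaluator`, its `load`/`decide` split, and the `upper_bound` field are hypothetical stand-ins invented for this example and are not PyIceberg's actual classes. The interleaving is replayed by hand in a single thread so the outcome is deterministic.

```python
class ToyMetricsEvaluator:
    """Hypothetical stand-in for a stateful evaluator (not PyIceberg's API)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.upper_bound = None  # per-file state, overwritten on every call

    def load(self, data_file):
        # Step 1: stash the current file's statistics on the instance.
        self.upper_bound = data_file["upper_bound"]

    def decide(self):
        # Step 2: decide from whatever statistics are currently stashed.
        return self.upper_bound >= self.threshold

    def eval(self, data_file):
        self.load(data_file)
        return self.decide()


file_a = {"upper_bound": 50}  # should match a threshold of 10
file_b = {"upper_bound": 5}   # should not match

shared = ToyMetricsEvaluator(threshold=10)

# Replay one unlucky schedule: "thread 1" loads file_a, "thread 2" loads
# file_b, and only then does "thread 1" reach its decision step.
shared.load(file_a)
shared.load(file_b)
result_for_a = shared.decide()

# A private evaluator gives the intended answer for file_a.
correct_for_a = ToyMetricsEvaluator(threshold=10).eval(file_a)
```

With this schedule the shared instance decides about `file_a` using `file_b`'s statistics, so a file that should be kept is dropped. Under real threads the schedule varies from run to run, which would be consistent with the varying result counts described above.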

This PR addresses the issue by creating a separate metrics evaluator for each thread instead of sharing one. I have tested the change locally. Please let me know if there are any tests I should add; I am happy to receive feedback.
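
The general shape of such a fix can be sketched as follows. This is only an illustration under assumptions, not the code in this PR: `ToyMetricsEvaluator`, `plan_manifest`, and the dictionary-based "manifests" are hypothetical names and data made up for the example. Each task builds its own evaluator, so no stateful instance is ever visible to two threads.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyMetricsEvaluator:
    """Hypothetical stateful evaluator (not PyIceberg's API)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.upper_bound = None  # per-file state kept on the instance

    def eval(self, data_file):
        self.upper_bound = data_file["upper_bound"]
        return self.upper_bound >= self.threshold


def plan_manifest(manifest, threshold):
    # Build a fresh evaluator inside the task rather than capturing a
    # shared one, so its mutable state is private to this call.
    evaluator = ToyMetricsEvaluator(threshold)
    return [f["path"] for f in manifest if evaluator.eval(f)]


# Made-up manifests: each is a list of data-file entries.
manifests = [
    [{"path": "a.parquet", "upper_bound": 50}, {"path": "b.parquet", "upper_bound": 5}],
    [{"path": "c.parquet", "upper_bound": 10}, {"path": "d.parquet", "upper_bound": 9}],
    [{"path": "e.parquet", "upper_bound": 1}],
]

with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map preserves input order, so results line up with manifests.
    results = list(pool.map(lambda m: plan_manifest(m, 10), manifests))
```

Constructing one evaluator per task costs a little more than reusing a single instance, but it removes the shared mutable state without needing a lock. A `threading.local` holder would be another way to get one instance per thread.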

Thank you!

Closes #1506

Fokko

Good catch @rachel88888, thanks for fixing this 👍

rachel88888 · 2025-02-18T15:47:49Z

Thank you for the quick review @Fokko!

[Bug Fix] Avoid reusing shared metrics evaluator across threads

fbe0417

Fokko added this to the PyIceberg 0.9.0 release milestone Feb 18, 2025

Fokko approved these changes Feb 18, 2025

Fokko merged commit d26d1e4 into apache:main Feb 18, 2025
7 checks passed

rachel88888 deleted the bugfix-1506/scan-plan branch February 18, 2025 15:47
