Update 'balance probabilities test' and 'get_embeddings' test #589
Conversation
Code Review
This pull request addresses issues in two tests. The get_embeddings test is updated to correctly handle 'thinking tokens' by adjusting the slice for training embeddings. The balance_probabilities test is rewritten with a more robust criterion, checking for increased uniformity in probability distributions rather than expecting a perfectly uniform one. This change also involves refactoring the balancing logic into a new utility function, balance_probas_by_class_counts, and adding a corresponding unit test. The changes are logical and improve the test suite. I've provided a critical fix for a new test that would otherwise fail due to incorrect parameters, and a suggestion to improve the robustness of the new utility function.
Pull Request Overview
This PR refactors the probability balancing logic by extracting it into a standalone utility function and improves the corresponding tests. The changes focus on making the code more maintainable and testable.
Key Changes:
- Extracted the `balance_probas_by_class_counts` utility function from classifier-specific code
- Added a dedicated unit test for the probability balancing function
- Improved test coverage for balanced probabilities with more realistic imbalanced dataset scenarios
- Fixed train embeddings extraction in the transformer to account for the thinking tokens offset
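The extracted helper reweights predicted probabilities by the class counts seen in training. A minimal sketch of what such a function might look like, assuming the standard inverse-frequency approach (the exact signature and implementation in `src/tabpfn/utils.py` may differ):

```python
import numpy as np

def balance_probas_by_class_counts(probas: np.ndarray, class_counts: np.ndarray) -> np.ndarray:
    """Reweight predicted probabilities by inverse class frequency and renormalize.

    probas: shape (n_samples, n_classes), each row summing to 1.
    class_counts: shape (n_classes,), counts of each class in the training data.
    """
    probas = np.asarray(probas, dtype=float)
    counts = np.asarray(class_counts, dtype=float)
    # Down-weight over-represented classes, then renormalize each row to sum to 1.
    balanced = probas / counts
    return balanced / balanced.sum(axis=1, keepdims=True)
```

For example, a prediction of `[0.9, 0.1]` under class counts `[9, 1]` is balanced back to `[0.5, 0.5]`, since the raw 9:1 skew exactly mirrors the class imbalance.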
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/tabpfn/utils.py | Added new balance_probas_by_class_counts function to centralize probability balancing logic |
| src/tabpfn/classifier.py | Refactored to use the extracted utility function instead of inline implementation |
| tests/test_utils.py | Added unit test for the new balance_probas_by_class_counts function |
| tests/test_classifier_interface.py | Updated test to use synthetic imbalanced dataset and improved assertion logic |
| src/tabpfn/architectures/base/transformer.py | Fixed train embeddings extraction to properly skip thinking token rows |
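The "increased uniformity" criterion mentioned in the review can be expressed as an assertion that balancing moves the distribution closer to uniform, rather than requiring it to be exactly uniform. The sketch below illustrates one way to phrase such a check; the actual assertion in `tests/test_classifier_interface.py` may use a different distance measure:

```python
import numpy as np

def distance_to_uniform(probas: np.ndarray) -> float:
    """Mean absolute deviation of each probability from the uniform value 1/n_classes."""
    n_classes = probas.shape[1]
    return float(np.abs(probas - 1.0 / n_classes).mean())

# A robust test asserts that balancing reduces this distance, e.g.:
#   assert distance_to_uniform(balanced_probas) < distance_to_uniform(raw_probas)
```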
Comments suppressed due to low confidence (1)
src/tabpfn/architectures/base/transformer.py:574
- This assignment to `train_encoder_out` is unnecessary, as it is redefined before the value is used:
  `train_encoder_out = self.encoder_compression_layer(`
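The embeddings fix itself amounts to offsetting the slice of encoder output so that the leading thinking-token rows are skipped before the train rows are extracted. A rough sketch under assumed names and tensor layout (the real `transformer.py` code operates on PyTorch tensors and may order dimensions differently):

```python
import numpy as np

def extract_train_embeddings(encoder_out: np.ndarray, n_thinking: int, n_train: int) -> np.ndarray:
    """Slice out train-row embeddings, skipping leading thinking-token rows.

    encoder_out: shape (seq_len, emb_dim), where the first `n_thinking` rows
    are thinking tokens, followed by `n_train` train rows, then test rows.
    """
    return encoder_out[n_thinking : n_thinking + n_train]
```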
Code Review
This pull request effectively addresses two testing issues. The get_embeddings logic is corrected to account for thinking tokens, and the balance_probabilities test is significantly improved to be more robust and meaningful. The refactoring of the balancing logic into a separate utility function with its own unit test is a great improvement for code organization and maintainability.
I've provided a few suggestions to enhance code conciseness, improve numerical stability, and ensure test reproducibility. Overall, these are solid changes.
LeoGrin left a comment:
LGTM! Agree that the previous balance probability test was too strong. (The new one might be a bit weak, but I don't have a great solution for that.)
Issue
This fixes 3 tests for upcoming V2.5: