Add support for numpy 2.0.0 #2269
Task linked: CU-86b0y0uu0 SDV - Add support for numpy 2.0.0 #2078
Codecov Report
All modified and coverable lines are covered by tests ✅

Coverage Diff
            main     #2269    +/-
Coverage    98.63%   98.63%
Files       58       58
Lines       6016     6026     +10
Hits        5934     5944     +10
Misses      82       82
The reason why *** pyarrow.lib.ArrowInvalid: Integer value 640058449 not in range: -16777216 to 16777216 This also raises a new issue about the benchmarking: it does not currently check whether the sampled range is the expected one, so the same number or error is generated and the test fails. We fall back to PS: We can disable
Left a minor comment but otherwise it looks good 👍🏻
sdv/constraints/tabular.py
Outdated
# To make the NaN to None mapping work for pd.Categorical data, we need to convert
# the columns to object before replacing NaNs with None.
for column in self._columns:
    if pd.api.types.is_categorical_dtype(table_data[column]):
        table_data[column] = table_data[column].astype(object)
Adding np.bytes revealed an actual bug in FixedCombinations with pd.Categorical and NaNs. It was not caught before because of reject sampling: previously enough rows were generated, but none with a combination that included NaNs in the categorical column. After adding np.bytes, and somewhat at random, the synthesizer was only able to generate 9 of the 10 rows. Changing the number of rows to sample also made the test pass but did not fix the bug, haha. This change fixes the bug, and I added a test for it. Let me know if it makes sense.
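For readers following the thread, here is a minimal standalone sketch of the conversion being described: categorical columns are cast to object before NaNs are mapped to None. The helper name `nan_to_none` and the sample frame are hypothetical and are not SDV code. The sketch also uses `isinstance(dtype, pd.CategoricalDtype)` in place of the PR's `pd.api.types.is_categorical_dtype`, since the latter is deprecated in recent pandas releases.

```python
import numpy as np
import pandas as pd


def nan_to_none(table_data, columns):
    """Return a copy of ``table_data`` with NaNs in ``columns`` mapped to None.

    Hypothetical helper for illustration only; not part of SDV.
    """
    table_data = table_data.copy()
    # Cast only the categorical columns to object; other dtypes are left alone.
    categorical = {
        col: object
        for col in columns
        if isinstance(table_data[col].dtype, pd.CategoricalDtype)
    }
    table_data = table_data.astype(categorical)
    for col in columns:
        # The dict form of replace maps NaN to None on object columns.
        table_data[col] = table_data[col].replace({np.nan: None})
    return table_data


# Made-up sample data: one categorical column and one object column with NaNs.
example = pd.DataFrame({
    'a': pd.Categorical(['x', np.nan, 'y']),
    'b': pd.Series(['u', 'v', np.nan], dtype=object),
})
converted = nan_to_none(example, ['a', 'b'])
```

The input frame is copied, so `example` keeps its categorical dtype while `converted` holds object columns with None in place of NaN.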
I'm confused why adding bytes revealed this when the check is for is_categorical_dtype. Bytes are not a categorical dtype
Yes, the change about the bug you found makes sense.
Looks good, just one comment
table_data[self._columns] = table_data[self._columns].astype({
    col: object
    for col in self._columns
    if pd.api.types.is_categorical_dtype(table_data[col])
})
Based on this discussion, should we just use fillna instead of replace? Then we don't have to convert
@R-Palazzo I forgot that fillna cannot be used with None which is probably why we have this line of code in the first place. This solution is good
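The fillna limitation mentioned here can be checked with a small sketch. It assumes pandas 2.x behaviour, where `fillna(None)` is rejected because None is treated as "no fill value given"; the exact exception may differ in other versions, so the helper (a hypothetical name, not SDV code) only reports what was raised.

```python
import numpy as np
import pandas as pd

# Made-up object series with one missing value.
series = pd.Series(['a', np.nan], dtype=object)


def fillna_none_error(values):
    """Return the name of the exception raised by ``fillna(None)``, if any."""
    try:
        values.fillna(None)
    except (TypeError, ValueError) as error:
        return type(error).__name__
    return None


# Records what the installed pandas does; no particular outcome is assumed.
fillna_error = fillna_none_error(series)

# The dict form of replace maps NaN to None on an object series.
replaced = series.replace({np.nan: None})
```

This is consistent with keeping `replace` plus the object conversion rather than switching to `fillna`.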
Resolve #2078
CU-86b0y0uu0