Issue a warning when converting DF to DT and cols have large whole numbers (numeric types) to convert to bit64 with reference to setNumericRounding #1463
Comments
I tried using
See Feel free to reopen if that doesn't fix the issue. There's an FR under vignettes to explain numeric rounding.
Ah. Thank you. Although I have been using data tables for quite some time, I never came across setNumericRounding and int64. Since large integers are quite common, often being used as ids, shouldn't a warning be issued when converting a data.frame to a data.table in the presence of large integers? It's an easy trap, especially when base methods return the right answer. Now, I just hope that this didn't introduce mistakes in previous projects... Working collaboratively, I often have to convert data frames to data tables on the fly, with the data being read and processed as data frames before my part of the code.
Good point. We should be able to do that.
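The check requested above could be sketched roughly as follows. This is only an illustration in base R, not something data.table does: `has_large_whole_numbers()` and `warn_large_whole_numbers()` are hypothetical helper names, and the threshold (`.Machine$integer.max`) is an assumption about what counts as a "large" whole number.

```r
# Hypothetical helpers, NOT part of data.table: flag plain double columns
# that hold whole numbers too large for R's 32-bit integer type.
has_large_whole_numbers <- function(x) {
  # !is.object(x) skips classed doubles such as integer64, Date or POSIXct
  is.double(x) && !is.object(x) &&
    any(is.finite(x) & x == round(x) & abs(x) > .Machine$integer.max)
}

warn_large_whole_numbers <- function(df) {
  flagged <- names(df)[vapply(df, has_large_whole_numbers, logical(1))]
  if (length(flagged) > 0) {
    warning(
      "Columns with large whole numbers stored as double: ",
      paste(flagged, collapse = ", "),
      ". Consider bit64::as.integer64() or setNumericRounding(0).",
      call. = FALSE
    )
  }
  invisible(flagged)
}

# Made-up data: 'id' mimics the 12-digit ids discussed in this issue
df <- data.frame(id = c(201400000001, 201400000002), value = c(1.5, 2))
flagged <- suppressWarnings(warn_large_whole_numbers(df))
```

Calling `warn_large_whole_numbers(df)` before converting the data frame would emit the warning for `id` only; `value` is not flagged because it is neither whole nor large.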
The default is not to do any rounding, for now.
If OP had used
I think we can close here. |
The result from `dt[, .N, by=x]` can be wrong when `dt$x` contains large integers. That bit me today, and I was surprised not to get the same counts as with `table(dt$x)`.
Example:
Now get the number of occurrences of each Num_Acc:
The table() version, `head(table(dt$Num_Acc))`, returns, as expected:
But the data.table count version, `head(dt[, .N, by = Num_Acc])`, returns:
In the latter version, the sum of all N is equal to the number of rows in dt, which is right, but the even numbers seem to be aggregated with the odd numbers. This is not right.
It definitely has something to do with large numbers, since it returns the right answer when Num_Acc is converted to character, or shifted to a smaller number (e.g. by subtracting 201400000000).
Is it possible to make it right?...
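For readers landing here, a minimal sketch of the workarounds mentioned in this thread. The ids below are made up, modelled on the 12-digit Num_Acc values described above. As far as I understand it, the collisions come from data.table's numeric rounding, which used to default to 2 bytes when this was reported, so treat that explanation as an assumption rather than a diagnosis of the original data.

```r
library(data.table)

# Made-up ids stored as doubles, modelled on Num_Acc in the report
ids <- 201400000000 + c(1, 1, 2, 3, 3, 3)
dt <- data.table(Num_Acc = ids)

# Option 1: switch numeric rounding off so doubles are compared exactly
old_rounding <- getNumericRounding()
setNumericRounding(0L)
counts_exact <- dt[, .N, by = Num_Acc]   # N: 2, 1, 3
setNumericRounding(old_rounding)         # restore the previous setting

# Option 2: group on a character version of the id
counts_chr <- dt[, .N, by = .(Num_Acc = as.character(Num_Acc))]

# Option 3: store the ids as 64-bit integers (only if bit64 is installed)
if (requireNamespace("bit64", quietly = TRUE)) {
  dt64 <- data.table(Num_Acc = bit64::as.integer64(ids))
  counts_i64 <- dt64[, .N, by = Num_Acc]
}
```

With rounding off, the counts should match `table(dt$Num_Acc)`. Converting to character or integer64 avoids depending on the rounding setting at all.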