general approach to recognise non canonical memory structures ? #362

Open
moodymudskipper opened this issue Apr 12, 2024 · 3 comments · Fixed by #371

@moodymudskipper
Collaborator

moodymudskipper commented Apr 12, 2024

R can create negative zeros, NAs, and NaNs that most R functions do not recognise as distinct from their regular counterparts.

https://twitter.com/antoine_fabri/status/1778467270819213778

Should we take care of those? That part is not too hard. The only drawbacks are that it might confuse the user, and that we won't compress c(0, -0, 0, 0) into rep(0, 4), for instance.

However, the following shows that the sign does matter:

sign(1/(-0))
#> [1] -1
sign(1/0)
#> [1] 1
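As a rough sketch of how such zeros could be detected in base R: identical() has a num.eq argument that switches doubles to bitwise comparison, and the reciprocal trick above can locate the offending elements. The variable names here are only illustrative.

```r
x <- c(0, -0, 0, 0)

# The default comparison uses `==` on doubles, so the sign of zero is ignored.
identical(x, rep(0, 4))
#> [1] TRUE

# Bitwise comparison of doubles distinguishes -0 from 0.
identical(x, rep(0, 4), num.eq = FALSE)
#> [1] FALSE

# Locate the negative zeros: they equal 0 but their reciprocal is -Inf.
is_neg_zero <- x == 0 & 1 / x < 0
which(is_neg_zero)
#> [1] 2
```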

This byte-level issue also comes up with bit64 integers: 0 and NA are considered identical, and all negative values are considered identical to each other, because the package does some bit hacking.

Defining row.names as c(NA, -n) rather than 1:n also creates "identical" objects with a different serialisation.

We could also have other types of corruption, like below:

serialize(TRUE, NULL)
#>  [1] 58 0a 00 00 00 03 00 04 02 01 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
#> [26] 00 0a 00 00 00 01 00 00 00 01
serialize(FALSE, NULL)
#>  [1] 58 0a 00 00 00 03 00 04 02 01 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
#> [26] 00 0a 00 00 00 01 00 00 00 00
serialize(NA, NULL)
#>  [1] 58 0a 00 00 00 03 00 04 02 01 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
#> [26] 00 0a 00 00 00 01 80 00 00 00
true_s <- serialize(TRUE, NULL)

true_s2 <- true_s
true_s2[35] <- as.raw(2)
true_s2
#>  [1] 58 0a 00 00 00 03 00 04 02 01 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
#> [26] 00 0a 00 00 00 01 00 00 00 02
TRUE2 <- unserialize(true_s2)
TRUE2
#> [1] TRUE
identical(TRUE, TRUE2)
#> [1] FALSE
isTRUE(TRUE2)
#> [1] TRUE
rlang::is_true(TRUE2)
#> [1] FALSE

Created on 2024-04-12 with reprex v2.0.2

In that case it's interesting that identical() actually sees the difference, so we have 2 different TRUE values.
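A minimal sketch of a check built on that observation, for attribute-free scalars only. is_noncanonical_true() is a hypothetical helper, not something constructive provides. The corrupted value is built by patching the last byte of the serialisation rather than a fixed index, since the header length may differ between setups.

```r
true_s <- serialize(TRUE, NULL)

# The logical payload is the final 4-byte integer, so patch its last byte.
true_s[length(true_s)] <- as.raw(2)
TRUE2 <- unserialize(true_s)

# Hypothetical helper: the value behaves like TRUE,
# yet identical() does not see it as the canonical TRUE.
is_noncanonical_true <- function(x) {
  isTRUE(x) && !identical(x, TRUE)
}

is_noncanonical_true(TRUE2)
#> [1] TRUE
is_noncanonical_true(TRUE)
#> [1] FALSE
```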

Encoding hell is another issue.

x <- "É"
y <- iconv(x, from="UTF-8", to="latin1")
identical(x, y)
#> [1] TRUE
serialize(x, NULL)
#>  [1] 58 0a 00 00 00 03 00 04 02 01 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
#> [26] 00 10 00 00 00 01 00 00 80 09 00 00 00 02 c3 89
serialize(y, NULL)
#>  [1] 58 0a 00 00 00 03 00 04 02 01 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
#> [26] 00 10 00 00 00 01 00 00 40 09 00 00 00 01 c9
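Short of serialising, the difference can also be surfaced with base R tools. The sketch below uses Encoding() and charToRaw() to show the declared encoding and the underlying bytes. It writes the string with a Unicode escape so that the example does not depend on the encoding of the source file.

```r
x <- "\u00c9"  # "É", marked as UTF-8
y <- iconv(x, from = "UTF-8", to = "latin1")

# identical() compares the translated strings, so it sees no difference.
identical(x, y)
#> [1] TRUE

# The declared encodings differ...
Encoding(x)
#> [1] "UTF-8"
Encoding(y)
#> [1] "latin1"

# ...and so do the bytes actually stored.
charToRaw(x)
#> [1] c3 89
charToRaw(y)
#> [1] c9
```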

I'm afraid that if we're too aggressive about serialising everything, it will slow down the package. But I also really want this package to be helpful in these difficult corner cases. Maybe we can have an argument for deep checks, and solve some specific cases with special casing.

@moodymudskipper
Collaborator Author

moodymudskipper commented Apr 12, 2024

Also, waldo doesn't see those. Ultimately we really need our own waldo, with:

  • More rigorous use of subsetters, .subset() and .subset2()
  • An actual comparison object that we can subset and navigate
  • More control over identity, actually as much control as identical() (attributes as sets etc.) + bitwise comparison

I suppose the output of construct_issues() is not used in snapshots so this should not be a breaking change in practice.

@moodymudskipper
Collaborator Author

was closed by mistake

@moodymudskipper
Collaborator Author

Maybe we test whether the serialisation is correct, and if it's not, we rerun more carefully?
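That two-pass idea could look roughly like the sketch below. Everything here is hypothetical: same_bytes() and rebuild_with_fallback() are illustrative names, and fast and careful are stand-ins for a cheap and a thorough way of recreating an object, not functions from the package.

```r
# Hypothetical helper: TRUE when two objects serialise to the same bytes.
same_bytes <- function(x, y) {
  identical(serialize(x, NULL), serialize(y, NULL))
}

# Hypothetical two-pass strategy: try the cheap path first,
# and only fall back to the careful path when the bytes differ.
rebuild_with_fallback <- function(x, fast, careful) {
  candidate <- fast(x)
  if (same_bytes(x, candidate)) {
    return(candidate)
  }
  careful(x)
}

# Stand-ins for illustration only: the fast path loses the sign of zero.
fast <- function(x) x + 0
careful <- function(x) x

res <- rebuild_with_fallback(-0, fast, careful)
sign(1 / res)
#> [1] -1
```

Byte equality is a strict test and may flag differences that do not matter in practice (for example, objects that are stored in different internal representations), so a real implementation would probably need some special casing on top.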
