Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Second-last step towards migrating from legacy (Kb, Mb, ...) to IEC (KiB, MiB, ...) units #28

Closed
4 of 5 tasks
HenrikBengtsson opened this issue Jul 11, 2024 · 4 comments
Labels
Misc Issues that cannot be classified otherwise needs patch Implement the agreed fix and prepare a patch for review PLUS 2024 Issues reserved for R Dev Day @ PLUS 2024

Comments

@HenrikBengtsson
Copy link

HenrikBengtsson commented Jul 11, 2024

Background

R uses home-made ("legacy") units for object and file sizes that, e.g.

> x <- double(1e12)
Error: cannot allocate vector of size 7450.6 Gb

Note that it uses 'Gb', which is actually Gigabits (=1000^3 bits).
It should really be using 'GiB' per the official IEC standard, which are defined as KiB (1024 bytes), MiB (1024^2 bytes), GiB (1024^3 bytes), TiB (1024^4 bytes). FWIW, there is also an SI standard: kB (1000 bytes), MB (1000^2 bytes), GB (1000^3 bytes), TB (1000^4 bytes).

Actions

Gist:

  • utils:::format.object_size() method already supports outputting legacy, IEC and SI units via argument legacy, IEC, and SI.
  • Identify functions that output strings with "legacy" units (mostly done)
  • Update above strings to use specific "unit string constants" instead of hard-coding the units, e.g. error(_("cannot allocate memory block of size %0.f Tb"), size) -> error(_("cannot allocate memory block of size %0.f %s"), size, "Tb")
  • Rebuild R and verify that above updates does not break anything, e.g. make check.

Special care has to be taken to avoid triggering additional memory allocations when producing these error messages. This is because in several cases these are triggered when we run out of memory.

After we know the above works as it should, it can be pushed to the R source. After this, we should:

  • update translation messages to use %s, e.g. from "cannot allocate memory block of size %0.f Tb" to "cannot allocate memory block of size %0.f %s"

Eventually, we can switch to IEC units updating the above unit string constants to the IEC standard, e.g. sed -i 's/"Tb"/"TiB"/'.

Note that with this strategy, if we later on would like to move from IEC to SI units, we don't have to update the message format strings and also not the translations - just the unit string constants and how we calculate the values.

R-Core Member

  • Martin Mächler (who approved on this strategy 2024-07-11)

See also

For full background and more details, see HenrikBengtsson/Wishlist-for-R#6.

@hturner hturner added needs patch Implement the agreed fix and prepare a patch for review Misc Issues that cannot be classified otherwise PLUS 2024 Issues reserved for R Dev Day @ PLUS 2024 labels Jul 12, 2024
@HenrikBengtsson
Copy link
Author

Patch

The following patch passes make check-all:

https://github.com/r-devel/r-svn/pull/171/files

@HenrikBengtsson
Copy link
Author

@hturner
Copy link
Member

hturner commented Jul 16, 2024

Corresponding bug is now CLOSED FIXED 🎉

@hturner hturner closed this as completed Jul 16, 2024
@hturner
Copy link
Member

hturner commented Jul 16, 2024

Note: the translation task is not suited to a Dev Day issue as we rarely have a broad range of languages represented at one event. This task could be promoted another way, e.g. in an online translatathon, or simply by contacting translation teams.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Misc Issues that cannot be classified otherwise needs patch Implement the agreed fix and prepare a patch for review PLUS 2024 Issues reserved for R Dev Day @ PLUS 2024
Projects
None yet
Development

No branches or pull requests

2 participants