Firmware translations #3206

Merged
merged 23 commits into from
Feb 12, 2024

grdddj (Contributor) commented Aug 11, 2023

Creates a proof of concept of translating firmware into various languages.

Currently includes English and Czech translations (done by ChatGPT), mainly for model R.

Build process

  • TREZOR_LANG=cs sets the language to Czech in both MicroPython and Rust

Translations infrastructure:

  • translations are defined in cs.json, in individual sections connected to their context
  • cs.{py,rs}.mako takes data from the JSON and creates a {py,rs} file with all the translation symbols as constant string variables/attributes (so we have a static check that all strings are defined)
  • layouts import the translation module/object and access some attributes on it
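The generation step described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual template: it uses plain f-strings instead of Mako, and the JSON content, the `SECTION__KEY` naming scheme, and the helper names `flatten` and `render_module` are all assumptions made for the example.

```python
import json

# Hypothetical stand-in for cs.json: sections group strings by context.
CS_JSON = """
{
  "buttons": {"confirm": "Potvrdit", "cancel": "Zrušit"},
  "pin": {"enter": "Zadejte PIN"}
}
"""


def flatten(sections):
    """Turn {"section": {"key": text}} into {"SECTION__KEY": text}."""
    flat = {}
    for section, entries in sections.items():
        for key, text in entries.items():
            flat[f"{section}__{key}".upper()] = text
    return flat


def render_module(flat):
    """Render Python source that defines one constant per translation."""
    lines = ["# generated file, do not edit"]
    for name in sorted(flat):
        lines.append(f"{name} = {flat[name]!r}")
    return "\n".join(lines) + "\n"


source = render_module(flatten(json.loads(CS_JSON)))

# Executing the generated source stands in for importing the module.
namespace = {}
exec(source, namespace)
```

Because every string becomes a named constant, a layout that refers to a missing key fails at import or type-check time rather than at runtime.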

TODO:

  • make sure the strings are not stored in memory but in flash
  • check the flash size consumption (experiment with micropython::const and rust::trait approaches)
  • decide whether to store all the languages inside one binary (and have messages to change the language), or not to allow changing the language
  • include font characters for the Czech language
  • handle plural forms
  • solve micropython vs Rust division - there should be only one place of defining the translations mappings
  • solve differences between TT and TR
  • do the translation in C as well (there are certainly some strings, like error messages)
  • do the translation in bootloader?
  • make sure all the languages contain the same keys (done on Rust side already)

Fixes #991

prusnak (Member) left a comment

Good work! Left some suggestions mainly for the C part.

Review threads (all outdated and resolved):

  • core/SConscript.firmware
  • core/SConscript.unix
  • core/embed/lib/fonts/fonts.c (two threads)

prusnak (Member) commented Aug 16, 2023

I'd prefer if convert_char_utf8(const uint8_t c) returned uint32_t - so we'd support full UTF-8. Of course, we will never support 4 billion glyphs, but at least the UTF-8 support would be complete.
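To illustrate why a 32-bit return type is enough: UTF-8 sequences are one to four bytes long and encode code points up to U+10FFFF. The sketch below is a minimal decoder written in Python for illustration only. It is not the firmware's convert_char_utf8, which is C and takes one byte at a time; the name `decode_utf8` is an assumption, and the sketch does not reject overlong encodings or surrogates.

```python
from typing import List


def decode_utf8(data: bytes) -> List[int]:
    """Decode UTF-8 bytes into a list of integer code points."""
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte determines the sequence length and its payload bits.
        if lead < 0x80:
            length, value = 1, lead
        elif lead >> 5 == 0b110:
            length, value = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:
            length, value = 3, lead & 0x0F
        elif lead >> 3 == 0b11110:
            length, value = 4, lead & 0x07
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + length > len(data):
            raise ValueError("truncated sequence")
        # Each continuation byte (10xxxxxx) contributes six more bits.
        for byte in data[i + 1 : i + length]:
            if byte >> 6 != 0b10:
                raise ValueError("invalid continuation byte")
            value = (value << 6) | (byte & 0x3F)
        code_points.append(value)
        i += length
    return code_points
```

A glyph table keyed by these code points can then cover Czech characters such as "ž" (two bytes) as well as anything in the higher planes (four bytes).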

grdddj (Contributor, Author) commented Aug 17, 2023

> I'd prefer if convert_char_utf8(const uint8_t c) returned uint32_t - so we'd support full UTF-8. Of course, we will never support 4 billion glyphs, but at least the UTF-8 support would be complete.

Thanks for the feedback, I noted this down.

@grdddj grdddj self-assigned this Aug 24, 2023
grdddj (Contributor, Author) commented Sep 7, 2023

Recent changes include:

  • moving all the translation data into Rust - so we have only one {en,cs}.json file
  • got rid of the .py files, which were taking quite some space - currently MicroPython requests translations via the trezortranslate::tr function exported from Rust
    • we have static type-check that we only request keys that really exist
    • (I have failed to export some dict-like structure from Rust (did not even manage to return a static string), so went for the function approach, where we can return strings dynamically)

Size considerations:

  • it is currently around 70 kB bigger than the state before translations - which is very unfortunate and should be improved
  • to my surprise, the current state with everything in Rust is about the same size as with the majority of translations in MicroPython
    • it is probably because MicroPython still needs to define all the strings as keys to the translate function - and they are quite long (not sure whether shortening them would take less space)
  • TranslationsGeneral::get_text() takes around 20 kB; it is currently one big match statement, so maybe it can be improved

The biggest question currently:

  • how do we allow for changing the language in a built binary without rebuilding and without the binary containing all the languages?

@grdddj grdddj force-pushed the grdddj/fw_translations branch 4 times, most recently from eeb9b3c to 4c2db33 Compare September 14, 2023 10:14
@grdddj grdddj force-pushed the grdddj/fw_translations branch 2 times, most recently from 52632ba to ff7f8b5 Compare October 3, 2023 10:08
@grdddj grdddj force-pushed the grdddj/fw_translations branch 3 times, most recently from 4f91a14 to f46e7e6 Compare October 17, 2023 07:57
grdddj (Contributor, Author) commented Oct 17, 2023

Converting this into a regular PR, so it is ready for review.

There is still quite a lot of work left, but I think it makes sense to stop now, analyze the status, and create clear requirements for what should be done.

Current high-level status:

  • contains translations into Czech and French, both done by ChatGPT, so the quality is rather poor
  • translations are centralized in JSON files - en.json, cs.json, fr.json - in core/embed/rust/src/ui/translations
  • English translations are embedded/hardcoded in the code; they will be there all the time, acting as a backup/default
  • foreign translations are stored in two 16 kB sectors, which is enough right now for the translations data
  • fonts/glyphs are currently hardcoded into the firmware, so both Czech and French fonts are there all the time
  • trezorctl is responsible for generating the translation data blob (will be replaced by a custom tool)
  • the structure of the blob is a 256-byte header with metadata, followed by the translations data delimited by a 0x00 byte
  • the translations data is around 22 kB for each language; the extra fonts for Czech and French are around 12 kB each on TT (on TR, about 2 kB)
  • on TT hardware, it takes at most 2.5 ms to load the translation (even with the "inefficient" delimiter storing)
  • it increases the firmware size quite a lot: on TT with both the Czech and French fonts, the firmware is 90 kB bigger - mostly because of storing all the translation keys
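The blob layout described above (a 256-byte header followed by 0x00-delimited strings) can be sketched like this. Only the header size and the delimiter come from the description; the header contents (a zero-padded language tag), the choice to terminate every string with 0x00, and the function names are assumptions for illustration.

```python
HEADER_SIZE = 256  # header length stated in the status list


def build_blob(language, strings):
    """Build a blob: fixed-size header, then each string followed by 0x00."""
    # Hypothetical header layout: ASCII language tag, zero-padded to 256 bytes.
    header = language.encode("ascii").ljust(HEADER_SIZE, b"\x00")
    body = b"".join(s.encode("utf-8") + b"\x00" for s in strings)
    return header + body


def get_string(blob, index):
    """Return the index-th string by scanning delimiters from the start."""
    pos = HEADER_SIZE
    # Skip over `index` strings; this linear scan is the cost of having
    # no offset table.
    for _ in range(index):
        pos = blob.index(b"\x00", pos) + 1
    end = blob.index(b"\x00", pos)
    return blob[pos:end].decode("utf-8")


blob = build_blob("cs", ["Potvrdit", "Zrušit", "Zadejte PIN"])
```

UTF-8 never produces a 0x00 byte for any character other than U+0000, so the delimiter cannot collide with translated text.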

Some things/considerations/questions that are missing and should be prioritized:

  • we might want to include an offset table in the translations blob (it would increase the size by around 1 kB)
  • the translations are not being signed/verified
  • we do not yet have logic to append new translations to the existing ones (they should always be at the end)
  • the firmware update process is not fully accounted for; it should make sure the translations are up-to-date with the firmware (and tell the user that they should update the translations as well)
  • it might make sense to not translate altcoins at all, to save space and a lot of translating work (and confusion for users)
  • we might want to load fonts dynamically with each language
  • with bigger glyph sets, like Cyrillic, we might have just one font instead of four or five
  • TT has one extra 16 kB sector (occupied by the secret on TR), which might be used specifically for the font data (fonts on TR are much smaller, so they may fit together with the translations)
  • on TR, what is the status of the last 16 kB sector? Could we safely use it (also on TT)?
  • it would be nice to use some data compression (for both translations and font data), but on the Rust side we are limited to no-std libraries and no dynamic memory allocation ... also, it would mean some runtime CPU and RAM overhead
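The offset-table idea from the first point could look roughly like the sketch below, which replaces delimiter scanning with direct lookups. The field widths (little-endian 16-bit count and offsets), the layout, and the function names are assumptions; the thread does not specify a format.

```python
import struct


def pack_with_offsets(strings):
    """Pack strings as: count, (count + 1) offsets, then concatenated data."""
    encoded = [s.encode("utf-8") for s in strings]
    offsets = [0]
    for chunk in encoded:
        offsets.append(offsets[-1] + len(chunk))
    # 16-bit offsets limit the data section to 65535 bytes.
    table = struct.pack(f"<{len(offsets)}H", *offsets)
    return struct.pack("<H", len(strings)) + table + b"".join(encoded)


def lookup(blob, index):
    """Fetch one string in constant time using the offset table."""
    (count,) = struct.unpack_from("<H", blob, 0)
    if not 0 <= index < count:
        raise IndexError(index)
    # Two neighbouring table entries give the start and end of the string.
    start, end = struct.unpack_from("<2H", blob, 2 + 2 * index)
    data_start = 2 + 2 * (count + 1)
    return blob[data_start + start : data_start + end].decode("utf-8")


packed = pack_with_offsets(["Potvrdit", "Zrušit", ""])
```

With this layout the 0x00 delimiters are no longer needed, which recovers one byte per string against the two bytes each table entry costs.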

@grdddj grdddj marked this pull request as ready for review October 17, 2023 09:16
@grdddj grdddj changed the title WIP - firmware translations Firmware translations Oct 17, 2023
@grdddj grdddj force-pushed the grdddj/fw_translations branch 2 times, most recently from e2f2c5c to c9e8dad Compare November 6, 2023 09:46
grdddj (Contributor, Author) commented Jan 3, 2024

Translation consideration with Solana - there is a huge number of English strings in solana/transaction/instructions.py - should we also translate these or not?

for now, we use sha256 and a little of ed25519 for CoSi purposes

also add the Merkle root algorithm
the previous spelling of "aliases" created completely new enum entries

per Enum documentation:

> However, an enum member can have other names associated with it.
> Given two entries A and B with the same value (and A defined first),
> B is an alias for the member A. By-value lookup of the value of A will
> return the member A. By-name lookup of A will return the member A.
> By-name lookup of B will also return the member A.
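The alias behaviour quoted above can be checked with a small example. The class and member names here are hypothetical and are not taken from the codebase.

```python
from enum import Enum


class Device(Enum):
    # Hypothetical members, for illustration only.
    T2T1 = "T2T1"
    TT = "T2T1"  # same value as T2T1, so TT is an alias, not a new member


# By-name lookup of the alias returns the original member.
alias_is_member = Device["TT"] is Device.T2T1

# Iteration yields only canonical members; aliases are skipped.
members = list(Device)

# Aliases are still visible through __members__.
all_names = list(Device.__members__)
```

Giving the second name a different value instead would create a separate enum entry, which is the problem the commit message describes.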
we've had multiple copies of this function all over the codebase, it's time to move it where it belongs
grdddj (Contributor, Author) commented Feb 12, 2024

@prusnak can you please check the requested changes? We want to merge it today or tomorrow, and it is blocked by them.

mmilata (Member) left a comment

rust side ACK

prusnak (Member) left a comment

changes from my earlier review were done

@grdddj grdddj merged commit 9414082 into main Feb 12, 2024
58 of 61 checks passed
@grdddj grdddj deleted the grdddj/fw_translations branch February 12, 2024 13:49
bosomt commented Mar 7, 2024

QA OK

tested on TT and S3
tested reverting to en-US too

❯ ls -laF | grep 2.7.0
-rw-r--r--@   1 bosomt  staff     1516544 Mar  3 13:45 firmware-T2B1-2.7.0-1f67c1e.bin
-rw-r--r--@   1 bosomt  staff     1516544 Mar  7 19:22 firmware-T2B1-2.7.0-45e8a842a-signed.bin
-rw-r--r--@   1 bosomt  staff     1230336 Mar  3 13:45 firmware-T2B1-btconly-2.7.0-1f67c1e.bin
-rw-r--r--@   1 bosomt  staff     1230336 Mar  7 19:22 firmware-T2B1-btconly-2.7.0-45e8a842a-signed.bin
-rw-r--r--@   1 bosomt  staff     1608704 Mar  3 13:45 firmware-T2T1-2.7.0-1f67c1e.bin
-rw-r--r--@   1 bosomt  staff     1608704 Mar  7 19:22 firmware-T2T1-2.7.0-45e8a842a-signed.bin
-rw-r--r--@   1 bosomt  staff     1290240 Mar  3 13:45 firmware-T2T1-btconly-2.7.0-1f67c1e.bin
-rw-r--r--@   1 bosomt  staff     1289728 Mar  7 19:22 firmware-T2T1-btconly-2.7.0-45e8a842a-signed.bin
-rw-r--r--@   1 bosomt  staff       22222 Mar  7 19:39 translation-T2B1-de-DE-2.7.0.bin
-rw-r--r--@   1 bosomt  staff       23166 Mar  7 19:39 translation-T2B1-es-ES-2.7.0.bin
-rw-r--r--@   1 bosomt  staff       24476 Mar  7 19:39 translation-T2B1-fr-FR-2.7.0.bin
-rw-r--r--@   1 bosomt  staff       24742 Mar  7 19:39 translation-T2T1-de-DE-2.7.0.bin
-rw-r--r--@   1 bosomt  staff       28144 Mar  7 19:39 translation-T2T1-es-ES-2.7.0.bin
-rw-r--r--@   1 bosomt  staff       35738 Mar  7 19:39 translation-T2T1-fr-FR-2.7.0.bin

Projects
Status: Approved
Development

Successfully merging this pull request may close these issues.

Replace all English strings with identifiers and collect them
6 participants