feat(developer): output new TrieModel format when compiling 💾 #12129
User Test Results: user tests are not required.
I like this. Before we get any further, can you provide some more detail on this:
What does this mean?
As the changes from #12128 are required to interpret the new code, it will only work with 18.0 at this time. We could consider embedding the decompression functions and a bit more rigging (within the model and the worker) to selectively auto-decompress it if not 18.0+, but per a recent discussion we had, I don't think we'll be planning to do this. The decompression methods should minify pretty well, though.
All models compiled with 12.0 - 17.0 will continue to operate normally, though.
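To make the compatibility boundary above concrete, here is a minimal sketch of a version gate. The names `MIN_COMPRESSED_TRIE_MAJOR` and `supportsCompressedTrie` are hypothetical and are not taken from the Keyman codebase; the only fact assumed from this thread is that the new format needs 18.0 or later.

```typescript
// Hypothetical constant: the first engine major version assumed to
// understand the new encoded-string Trie format (per this thread, 18).
const MIN_COMPRESSED_TRIE_MAJOR = 18;

// Returns true when an engine version string such as "18.0.123" is new
// enough to read the compressed format. parseInt reads the leading
// integer, so only the major component is compared.
function supportsCompressedTrie(engineVersion: string): boolean {
  const major = Number.parseInt(engineVersion, 10);
  return Number.isInteger(major) && major >= MIN_COMPRESSED_TRIE_MAJOR;
}
```

A loader that embedded the decompression functions could branch on a check like this, though as noted above that is not currently planned.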
Linked issue: #10336
I ended up entirely dropping the stored
Outside of that, it should be fairly precise. There are a few differences, though:
Could I get some more specifics here? What sort of scenarios and comparisons are you expecting for these tables? Would it basically be "the same comparisons," just boiled down to raw text tables, but for more models?
HYPOTHETICAL_TEST:
That's a single test using two separate build artifacts (Developer, Android) plus a locally-built one (a model package, built via Developer).
A small excerpt of the formatting (with line breaks added to loosely simulate word-wrapping):
For a minor breakdown:
The "random" characters before and between each section are used to encode character lengths and the
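As a rough illustration of how a single character can carry a length, here is a minimal length-prefix sketch. It is not the actual encoding used by this PR: the `OFFSET` value, the one-character prefix, and all function names are assumptions made purely for illustration.

```typescript
// Hypothetical offset that keeps encoded lengths out of the control-character
// range; the real format's number encoding may differ entirely.
const OFFSET = 0x20;
// Stay below the UTF-16 surrogate range so each length is one valid code unit.
const MAX_LENGTH = 0xd7ff - OFFSET;

// Encode a non-negative integer as a single UTF-16 code unit.
function encodeLength(n: number): string {
  if (!Number.isInteger(n) || n < 0 || n > MAX_LENGTH) {
    throw new RangeError(`length out of range: ${n}`);
  }
  return String.fromCharCode(n + OFFSET);
}

// Prefix a payload with its length, measured in UTF-16 code units.
function encodeSection(payload: string): string {
  return encodeLength(payload.length) + payload;
}

// Walk the encoded string, reading one length character and then that many
// code units of payload, until the input is exhausted.
function decodeSections(encoded: string): string[] {
  const sections: string[] = [];
  let i = 0;
  while (i < encoded.length) {
    const len = encoded.charCodeAt(i) - OFFSET;
    sections.push(encoded.slice(i + 1, i + 1 + len));
    i += 1 + len;
  }
  return sections;
}
```

Because the prefix is itself a printable character, it looks "random" when the encoded string is viewed as text, which matches the appearance described above.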
For more fun, I locally checked out the original
Here's some data for the first model in the repo I found with a primarily non-BMP script:
Old format: 548 KB
New format: 129 KB
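For a quick sense of scale, the two sizes above work out to the new file being a bit over four times smaller, or roughly three quarters of the original size saved. A small sketch of that arithmetic (the helper name `sizeReduction` is hypothetical):

```typescript
// Compare two file sizes given in the same unit (here, KB).
function sizeReduction(oldSize: number, newSize: number) {
  return {
    // How many times smaller the new file is.
    ratio: oldSize / newSize,
    // Share of the original size that was saved, as a percentage.
    percentSaved: (1 - newSize / oldSize) * 100,
  };
}

const result = sizeReduction(548, 129);
console.log(result.ratio.toFixed(2), result.percentSaved.toFixed(1));
```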
So, for the lexical models I've tested so far... granted with one reload each...
…e-trie-compression' into feat/developer/compress-compiled-tries
Co-authored-by: Marc Durdin <marc@durdin.net>
Fixes #10336.
With these changes in place, Developer will output the new, comparatively-compressed encoded-string Trie format for Trie-based lexical model projects.
For an obvious first trial, when compiling the nrc.en.mtnt model, version 0.3.2:
Current multi-model comparison table: #12129 (comment)
@keymanapp-test-bot skip
Though, conceivably, we could request compiling an already-existing model with the Developer artifact, then loading it with the Android or iOS artifact.