Change icuexportdata trie format to improve normalizer performance #5813

hsivonen · 2024-11-13T18:56:22Z

With the fast trie type, I see this kind of performance improvement:

el_nfc_to_nfc_utf16/icu4x                                                                             
                        time:   [3.0115 µs 3.0127 µs 3.0141 µs]
                        thrpt:  [679.47 Melem/s 679.78 Melem/s 680.06 Melem/s]
                 change:
                        time:   [-35.114% -35.083% -35.049%] (p = 0.00 < 0.05)
                        thrpt:  [+53.963% +54.042% +54.117%]
                        Performance has improved.

el_nfc_to_nfd_utf16/icu4x                                                                             
                        time:   [4.4824 µs 4.4837 µs 4.4851 µs]
                        thrpt:  [456.62 Melem/s 456.77 Melem/s 456.90 Melem/s]
                 change:
                        time:   [-30.365% -30.238% -30.102%] (p = 0.00 < 0.05)
                        thrpt:  [+43.065% +43.344% +43.605%]
                        Performance has improved.

el_nfd_to_nfd_utf16/icu4x                                                                             
                        time:   [4.4836 µs 4.4848 µs 4.4859 µs]
                        thrpt:  [456.54 Melem/s 456.66 Melem/s 456.78 Melem/s]
                 change:
                        time:   [-31.927% -31.836% -31.751%] (p = 0.00 < 0.05)
                        thrpt:  [+46.522% +46.705% +46.901%]
                        Performance has improved.

el_nfd_to_nfc_utf16/icu4x                                                                             
                        time:   [11.465 µs 11.491 µs 11.514 µs]
                        thrpt:  [194.89 Melem/s 195.29 Melem/s 195.72 Melem/s]
                 change:
                        time:   [-14.115% -14.021% -13.925%] (p = 0.00 < 0.05)
                        thrpt:  [+16.177% +16.307% +16.435%]
                        Performance has improved.

en_nfc_to_nfc_utf16/icu4x                                                                             
                        time:   [990.20 ns 990.50 ns 990.80 ns]
                        thrpt:  [2.0670 Gelem/s 2.0676 Gelem/s 2.0683 Gelem/s]
                 change:
                        time:   [-2.0851% -1.9873% -1.8175%] (p = 0.00 < 0.05)
                        thrpt:  [+1.8512% +2.0275% +2.1295%]
                        Performance has improved.

en_nfc_to_nfd_utf16/icu4x                                                                             
                        time:   [704.59 ns 705.45 ns 706.47 ns]
                        thrpt:  [2.8989 Gelem/s 2.9031 Gelem/s 2.9066 Gelem/s]
                 change:
                        time:   [-30.362% -30.311% -30.265%] (p = 0.00 < 0.05)
                        thrpt:  [+43.401% +43.494% +43.599%]
                        Performance has improved.

en_nfd_to_nfd_utf16/icu4x                                                                             
                        time:   [704.27 ns 704.57 ns 705.05 ns]
                        thrpt:  [2.9048 Gelem/s 2.9067 Gelem/s 2.9080 Gelem/s]
                 change:
                        time:   [-30.268% -30.188% -30.087%] (p = 0.00 < 0.05)
                        thrpt:  [+43.035% +43.242% +43.406%]
                        Performance has improved.

en_nfd_to_nfc_utf16/icu4x                                                                             
                        time:   [991.99 ns 992.27 ns 992.55 ns]
                        thrpt:  [2.0634 Gelem/s 2.0640 Gelem/s 2.0645 Gelem/s]
                 change:
                        time:   [-2.0092% -1.9614% -1.9088%] (p = 0.00 < 0.05)
                        thrpt:  [+1.9460% +2.0006% +2.0504%]
                        Performance has improved.

fr_nfc_to_nfc_utf16/icu4x                                                                             
                        time:   [984.23 ns 984.47 ns 984.72 ns]
                        thrpt:  [2.0798 Gelem/s 2.0803 Gelem/s 2.0808 Gelem/s]
                 change:
                        time:   [-2.0970% -1.9296% -1.8348%] (p = 0.00 < 0.05)
                        thrpt:  [+1.8691% +1.9675% +2.1419%]
                        Performance has improved.

fr_nfc_to_nfd_utf16/icu4x                                                                             
                        time:   [1.5205 µs 1.5215 µs 1.5223 µs]
                        thrpt:  [1.3453 Gelem/s 1.3460 Gelem/s 1.3469 Gelem/s]
                 change:
                        time:   [-24.128% -23.976% -23.831%] (p = 0.00 < 0.05)
                        thrpt:  [+31.286% +31.538% +31.801%]
                        Performance has improved.

fr_nfd_to_nfd_utf16/icu4x                                                                             
                        time:   [1.5139 µs 1.5159 µs 1.5177 µs]
                        thrpt:  [1.3494 Gelem/s 1.3510 Gelem/s 1.3528 Gelem/s]
                 change:
                        time:   [-22.127% -22.036% -21.950%] (p = 0.00 < 0.05)
                        thrpt:  [+28.123% +28.265% +28.414%]
                        Performance has improved.

fr_nfd_to_nfc_utf16/icu4x                                                                             
                        time:   [3.5719 µs 3.5739 µs 3.5760 µs]
                        thrpt:  [588.65 Melem/s 588.99 Melem/s 589.33 Melem/s]
                 change:
                        time:   [-4.9182% -4.8589% -4.7968%] (p = 0.00 < 0.05)
                        thrpt:  [+5.0385% +5.1070% +5.1726%]
                        Performance has improved.

ja_nfc_to_nfc_utf16/icu4x                                                                             
                        time:   [3.3380 µs 3.3388 µs 3.3394 µs]
                        thrpt:  [613.28 Melem/s 613.40 Melem/s 613.53 Melem/s]
                 change:
                        time:   [-42.084% -42.055% -42.027%] (p = 0.00 < 0.05)
                        thrpt:  [+72.495% +72.578% +72.664%]
                        Performance has improved.

ja_nfc_to_nfd_utf16/icu4x                                                                             
                        time:   [4.6673 µs 4.6767 µs 4.6874 µs]
                        thrpt:  [436.91 Melem/s 437.92 Melem/s 438.79 Melem/s]
                 change:
                        time:   [-28.384% -28.291% -28.174%] (p = 0.00 < 0.05)
                        thrpt:  [+39.226% +39.453% +39.633%]
                        Performance has improved.

ja_nfd_to_nfd_utf16/icu4x                                                                             
                        time:   [4.8018 µs 4.8065 µs 4.8115 µs]
                        thrpt:  [425.65 Melem/s 426.09 Melem/s 426.50 Melem/s]
                 change:
                        time:   [-27.291% -27.215% -27.147%] (p = 0.00 < 0.05)
                        thrpt:  [+37.262% +37.391% +37.534%]
                        Performance has improved.

ja_nfd_to_nfc_utf16/icu4x                                                                             
                        time:   [8.8205 µs 8.8228 µs 8.8250 µs]
                        thrpt:  [246.01 Melem/s 246.07 Melem/s 246.13 Melem/s]
                 change:
                        time:   [-14.915% -14.811% -14.716%] (p = 0.00 < 0.05)
                        thrpt:  [+17.255% +17.386% +17.530%]
                        Performance has improved.

kn_nfc_to_nfc_utf16/icu4x                                                                             
                        time:   [8.1968 µs 8.2000 µs 8.2032 µs]
                        thrpt:  [249.66 Melem/s 249.76 Melem/s 249.85 Melem/s]
                 change:
                        time:   [-12.150% -12.094% -12.035%] (p = 0.00 < 0.05)
                        thrpt:  [+13.681% +13.758% +13.831%]
                        Performance has improved.

kn_nfc_to_nfd_utf16/icu4x                                                                             
                        time:   [4.4757 µs 4.4765 µs 4.4774 µs]
                        thrpt:  [457.41 Melem/s 457.50 Melem/s 457.59 Melem/s]
                 change:
                        time:   [-26.836% -26.756% -26.660%] (p = 0.00 < 0.05)
                        thrpt:  [+36.352% +36.529% +36.679%]
                        Performance has improved.

kn_nfd_to_nfd_utf16/icu4x                                                                             
                        time:   [3.7885 µs 3.7893 µs 3.7901 µs]
                        thrpt:  [540.35 Melem/s 540.47 Melem/s 540.59 Melem/s]
                 change:
                        time:   [-31.691% -31.619% -31.551%] (p = 0.00 < 0.05)
                        thrpt:  [+46.094% +46.239% +46.394%]
                        Performance has improved.

kn_nfd_to_nfc_utf16/icu4x                                                                             
                        time:   [10.406 µs 10.411 µs 10.417 µs]
                        thrpt:  [202.08 Melem/s 202.18 Melem/s 202.29 Melem/s]
                 change:
                        time:   [-9.8301% -9.7583% -9.6878%] (p = 0.00 < 0.05)
                        thrpt:  [+10.727% +10.814% +10.902%]
                        Performance has improved.

ko_nfc_to_nfc_utf16/icu4x                                                                             
                        time:   [2.7431 µs 2.7435 µs 2.7440 µs]
                        thrpt:  [746.36 Melem/s 746.48 Melem/s 746.60 Melem/s]
                 change:
                        time:   [-33.624% -33.575% -33.534%] (p = 0.00 < 0.05)
                        thrpt:  [+50.454% +50.547% +50.658%]
                        Performance has improved.

ko_nfc_to_nfd_utf16/icu4x                                                                             
                        time:   [18.572 µs 18.579 µs 18.587 µs]
                        thrpt:  [110.19 Melem/s 110.23 Melem/s 110.27 Melem/s]
                 change:
                        time:   [-5.1016% -4.9844% -4.8827%] (p = 0.00 < 0.05)
                        thrpt:  [+5.1334% +5.2459% +5.3758%]
                        Performance has improved.

ko_nfd_to_nfd_utf16/icu4x                                                                             
                        time:   [6.5094 µs 6.5145 µs 6.5199 µs]
                        thrpt:  [314.12 Melem/s 314.37 Melem/s 314.62 Melem/s]
                 change:
                        time:   [-38.693% -38.636% -38.575%] (p = 0.00 < 0.05)
                        thrpt:  [+62.801% +62.961% +63.113%]
                        Performance has improved.

ko_nfd_to_nfc_utf16/icu4x                                                                             
                        time:   [39.109 µs 39.145 µs 39.181 µs]
                        thrpt:  [102.86 Melem/s 102.95 Melem/s 103.05 Melem/s]
                 change:
                        time:   [-3.7017% -3.6061% -3.5037%] (p = 0.00 < 0.05)
                        thrpt:  [+3.6309% +3.7410% +3.8440%]
                        Performance has improved.

vi_nfc_to_nfc_utf16/icu4x                                                                             
                        time:   [1.3298 µs 1.3313 µs 1.3331 µs]
                        thrpt:  [1.5363 Gelem/s 1.5384 Gelem/s 1.5401 Gelem/s]
                 change:
                        time:   [-14.921% -14.827% -14.696%] (p = 0.00 < 0.05)
                        thrpt:  [+17.228% +17.408% +17.538%]
                        Performance has improved.

vi_nfc_to_nfd_utf16/icu4x                                                                             
                        time:   [7.3388 µs 7.3408 µs 7.3428 µs]
                        thrpt:  [278.91 Melem/s 278.99 Melem/s 279.06 Melem/s]
                 change:
                        time:   [-10.183% -10.060% -9.9402%] (p = 0.00 < 0.05)
                        thrpt:  [+11.037% +11.185% +11.337%]
                        Performance has improved.

vi_nfd_to_nfd_utf16/icu4x                                                                             
                        time:   [6.7638 µs 6.7926 µs 6.8147 µs]
                        thrpt:  [300.53 Melem/s 301.51 Melem/s 302.79 Melem/s]
                 change:
                        time:   [-5.9909% -5.4787% -4.8880%] (p = 0.00 < 0.05)
                        thrpt:  [+5.1392% +5.7963% +6.3727%]
                        Performance has improved.

vi_nfd_to_nfc_utf16/icu4x                                                                             
                        time:   [21.887 µs 21.897 µs 21.908 µs]
                        thrpt:  [119.09 Melem/s 119.15 Melem/s 119.21 Melem/s]
                 change:
                        time:   [-6.6439% -6.5689% -6.4944%] (p = 0.00 < 0.05)
                        thrpt:  [+6.9454% +7.0308% +7.1167%]
                        Performance has improved.

vi_orthographic_to_nfc_utf16/icu4x                                                                             
                        time:   [19.753 µs 19.766 µs 19.780 µs]
                        thrpt:  [120.63 Melem/s 120.71 Melem/s 120.79 Melem/s]
                 change:
                        time:   [-2.8159% -2.7400% -2.6556%] (p = 0.00 < 0.05)
                        thrpt:  [+2.7280% +2.8172% +2.8974%]
                        Performance has improved.

vi_orthographic_to_nfd_utf16/icu4x                                                                             
                        time:   [7.0146 µs 7.0182 µs 7.0223 µs]
                        thrpt:  [339.78 Melem/s 339.97 Melem/s 340.15 Melem/s]
                 change:
                        time:   [-12.492% -12.445% -12.397%] (p = 0.00 < 0.05)
                        thrpt:  [+14.151% +14.214% +14.275%]
                        Performance has improved.

zh_nfc_to_nfc_utf16/icu4x                                                                             
                        time:   [3.2568 µs 3.2577 µs 3.2586 µs]
                        thrpt:  [628.49 Melem/s 628.67 Melem/s 628.83 Melem/s]
                 change:
                        time:   [-35.288% -35.198% -35.146%] (p = 0.00 < 0.05)
                        thrpt:  [+54.194% +54.317% +54.530%]
                        Performance has improved.

zh_nfc_to_nfd_utf16/icu4x                                                                             
                        time:   [2.8441 µs 2.8452 µs 2.8464 µs]
                        thrpt:  [719.50 Melem/s 719.80 Melem/s 720.09 Melem/s]
                 change:
                        time:   [-38.993% -38.911% -38.836%] (p = 0.00 < 0.05)
                        thrpt:  [+63.495% +63.696% +63.914%]
                        Performance has improved.

zh_nfd_to_nfd_utf16/icu4x                                                                             
                        time:   [2.8525 µs 2.8540 µs 2.8555 µs]
                        thrpt:  [717.21 Melem/s 717.59 Melem/s 717.97 Melem/s]
                 change:
                        time:   [-39.014% -38.907% -38.811%] (p = 0.00 < 0.05)
                        thrpt:  [+63.429% +63.685% +63.971%]
                        Performance has improved.

zh_nfd_to_nfc_utf16/icu4x                                                                             
                        time:   [3.2835 µs 3.2847 µs 3.2860 µs]
                        thrpt:  [623.56 Melem/s 623.80 Melem/s 624.02 Melem/s]
                 change:
                        time:   [-34.955% -34.935% -34.913%] (p = 0.00 < 0.05)
                        thrpt:  [+53.641% +53.693% +53.739%]
                        Performance has improved.
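Criterion prints each change twice, once as a change in time and once as a change in throughput. Because throughput is elements divided by time, the two rows are not mirror images: a relative time change `t` corresponds to a relative throughput change of `1 / (1 + t) - 1`. The minimal Rust sketch below (the function name is ours, not part of Criterion or ICU4X) recomputes two rows of the output above from their midpoint time estimates. Any difference in the last printed digit comes from the rounding of the time figures.

```rust
/// Converts a relative change in time (e.g. -0.35083 for -35.083%)
/// into the corresponding relative change in throughput.
/// Throughput = elements / time, so new/old throughput = old/new time.
fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

fn main() {
    // (benchmark, midpoint time change, midpoint throughput change as reported)
    let rows = [
        ("el_nfc_to_nfc_utf16", -0.35083, 0.54042),
        ("ko_nfd_to_nfd_utf16", -0.38636, 0.62961),
    ];
    for (name, time_change, reported) in rows {
        let computed = throughput_change(time_change);
        println!(
            "{name}: time {:+.3}% -> thrpt {:+.3}% (reported {:+.3}%)",
            time_change * 100.0,
            computed * 100.0,
            reported * 100.0
        );
    }
}
```

This is why a 35% reduction in time shows up as a roughly 54% increase in throughput.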

hsivonen · 2024-11-13T19:01:01Z

ICU4C issue: https://unicode-org.atlassian.net/browse/ICU-22968

hsivonen · 2024-11-13T19:05:43Z

ICU4C PR: unicode-org/icu#3269

Manishearth

Landable for the purpose of 2.0, but I think this could have a couple more pointers in the docs and be more encapsulated.

components/normalizer/trie-value-format.md

Manishearth · 2024-11-13T21:18:42Z

components/normalizer/src/provider.rs

-    /// Getting a zero from this trie means that you need
-    /// to make another lookup from `DecompositionDataV1::trie`.
+pub struct DecompositionDataV2<'data> {
+    /// Trie for decomposition.
    #[cfg_attr(feature = "serde", serde(borrow))]
    pub trie: CodePointTrie<'data, u32>,


issue: I feel like the packed code logic is all scattered. Can we use a structured NormalizationTrieValue(pub u32) type that has convenience methods for getting all the fields?

I agree that what you suggest would be better for encapsulation. However, given that prior to this PR there was no such encapsulation and I'm already way over my time budget for this, I would very much prefer landing this ASAP (before 2.0 and before this bitrots) without such a refactoring and leaving the refactoring as a follow-up.
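To illustrate the kind of encapsulation suggested in the review, here is a minimal Rust sketch of a `NormalizationTrieValue(pub u32)` newtype that hides the masking behind accessor methods. The bit positions and method names are hypothetical placeholders chosen for the example; the real layout is the one documented in components/normalizer/trie-value-format.md, which this sketch does not reproduce.

```rust
/// Hypothetical wrapper around a packed normalization trie value.
/// NOTE: the masks below are illustrative only and do NOT match
/// the layout described in trie-value-format.md.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct NormalizationTrieValue(pub u32);

impl NormalizationTrieValue {
    // Hypothetical: the top bit marks some boolean property.
    const FLAG_MASK: u32 = 1 << 31;
    // Hypothetical: the low 16 bits hold an index or code unit.
    const PAYLOAD_MASK: u32 = 0xFFFF;

    /// Returns whether the (hypothetical) flag bit is set.
    pub fn has_flag(self) -> bool {
        (self.0 & Self::FLAG_MASK) != 0
    }

    /// Returns the (hypothetical) 16-bit payload field.
    pub fn payload(self) -> u16 {
        (self.0 & Self::PAYLOAD_MASK) as u16
    }

    /// Returns whether the raw value is zero.
    pub fn is_zero(self) -> bool {
        self.0 == 0
    }
}

fn main() {
    let v = NormalizationTrieValue(0x8000_0042);
    println!("flag = {}, payload = {:#x}", v.has_flag(), v.payload());
}
```

With a type like this, call sites read fields through methods instead of repeating shifts and masks, so a later layout change touches one place.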


hsivonen · 2024-12-11T08:54:11Z

Once the ICU4C side lands, this PR needs an update to take a newer export zip in datagen.


sffc · 2024-12-16T21:18:49Z

I realized while updating the tag that I think this change makes datagen incompatible with older ICU tags. The data coming from icuexportdata changed structure; it wasn't just some new additions. Are we ok with that?

sffc · 2024-12-16T21:38:21Z

I pushed 4 commits to the branch (ignore the force-push; it was to fix a merge issue I made and did not change any of hsivonen's commits). CI is now all green except for a clippy issue.

Manishearth · 2024-12-16T23:27:51Z

I'm fine with this breaking for 2.0.

hsivonen · 2024-12-17T08:17:53Z

> I realized while updating the tag that I think this change makes datagen incompatible with older ICU tags. The data coming from icuexportdata changed structure; it wasn't just some new additions. Are we ok with that?

I think we pretty much have to be. The performance improvement here is just too good to reject in order to enable the use of old data (which would seem like a very niche use case if there even is a use case). Also, my understanding from prior discussions was that we had agreed we're OK with a data compatibility break like this for 2.0.

Thanks for updating this with the new data export.

hsivonen · 2024-12-18T12:25:31Z

Thanks! Landed.

robertbastian · 2025-01-07T15:04:23Z

Why was half of this change done in ICU4C? Couldn't this transformation have been applied in datagen? I'm asking this for two reasons:

1. I'm not at all familiar with the ICU4C code, so I cannot follow the other half of this change (ICU-22968 Rearrange bits in trie values in normalization data export for ICU4X, icu#3269). As an ICU4X expert, I'd like to be able to follow these changes without being an ICU4C expert.
2. This would have kept the icuexportdata format stable.

Edit: and

3. In the long term, we want this data to come from UCD without going through ICU4C (Reduce ICU4X's dependence on ICU4C data #4602), at which point an approach like this is no longer possible anyway.

hsivonen · 2025-01-07T16:15:29Z

> Why was half of this change done in ICU4C?

Because we started icuexportdata early on by putting the trie builder side in the ICU4C repo.

> Couldn't this transformation have been applied in datagen?

That would have involved keeping around at least part of the ICU4X 1.5 code for interpreting the old data while rewriting parts of icuexportdata in Rust.

Implementing #4602 for the normalizer could make sense as a RIIR project, but then it would make sense to work from UCD and not from whatever the shape of runtime data happened to be at ICU4X 1.5. (But, as discussed previously, the hardest part that makes ICU4X dependent on ICU4C is the collation data builder.)

Part of the problem that this PR fixed is that the ICU4X 1.x normalizer data format was a decomposing normalizer format plus hacks to enable a composing normalizer, instead of being designed to support a composing normalizer. It wasn't at all designed to support transformation by datagen.

> This would have kept the icuexportdata format stable

I think freezing the icuexportdata format for normalization as it happened to be at ICU4X 1.5 would have been anti-useful, since the format isn't meant to be transformed by datagen.

hsivonen added 11 commits November 11, 2024 09:33

Rearrange the trie value bits

d6f37cb

Get rid of normalization data supplements

c7fb1bb

Perform a trie lookup by UTF-16 code unit on the fast path

6fc18fc

Perform trie lookup with UTF-16 code unit in the composing case

82b90d6

Get rid of undecomposed_starter_valid

07ab6ec

Optimize UTF-8 error cases

9bbc2aa

Add a marker value for Hangul syllables

2a7f490

Cleanup

b3c9743

Add documentation for the trie value format

6c21106

Sync data

1cea9d0

Sync the collator with the normalizer data changes

8fe9ff9
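Two of the commits above perform the trie lookup directly with the UTF-16 code unit. The general technique can be sketched as follows, assuming a hypothetical `lookup` function that stands in for the real trie (it is not the ICU4X API): a code unit outside the surrogate range U+D800..U+DFFF is already a complete BMP code point, so it can be used as the lookup key without decoding, and only surrogates take the slower path. Treating unpaired surrogates as U+FFFD is likewise an assumption of this sketch, not a description of the code in this PR.

```rust
/// Hypothetical stand-in for a code point trie lookup:
/// only U+0300..U+036F (combining diacritics) get a non-zero value.
fn lookup(code_point: u32) -> u32 {
    if (0x0300..=0x036F).contains(&code_point) { 1 } else { 0 }
}

/// Looks up the value for each code point in `units`, using the
/// code unit directly as the key when it is not a surrogate.
fn lookup_utf16(units: &[u16]) -> Vec<u32> {
    let mut out = Vec::with_capacity(units.len());
    let mut i = 0;
    while i < units.len() {
        let u = units[i];
        if !(0xD800..=0xDFFF).contains(&u) {
            // Fast path: a non-surrogate code unit is its own code point.
            out.push(lookup(u32::from(u)));
            i += 1;
        } else {
            // Slow path: decode a surrogate pair, if there is one.
            let end = (i + 2).min(units.len());
            let mut decoded = char::decode_utf16(units[i..end].iter().copied());
            match decoded.next() {
                Some(Ok(c)) => {
                    out.push(lookup(u32::from(c)));
                    i += c.len_utf16();
                }
                // Unpaired surrogate: treated as U+FFFD in this sketch.
                _ => {
                    out.push(lookup(0xFFFD));
                    i += 1;
                }
            }
        }
    }
    out
}

fn main() {
    // "e", U+0301 COMBINING ACUTE ACCENT, then U+1F600 (a surrogate pair).
    let text: Vec<u16> = "e\u{301}\u{1F600}".encode_utf16().collect();
    println!("{:?}", lookup_utf16(&text)); // prints [0, 1, 0]
}
```

Since most text in the benchmarked languages is BMP-only, the surrogate branch is rarely taken and the common case skips UTF-16 decoding entirely.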

hsivonen requested review from sffc, robertbastian, Manishearth, echeran and a team as code owners November 13, 2024 18:56

hsivonen added A-performance Area: Performance (CPU, Memory) C-collator Component: Collation, normalization 2.0-breaking Changes that are breaking API changes labels Nov 13, 2024

hsivonen mentioned this pull request Nov 13, 2024

ICU-22968 Rearrange bits in trie values in normalization data export for ICU4X unicode-org/icu#3269

Merged


Manishearth previously approved these changes Nov 13, 2024


hsivonen mentioned this pull request Nov 14, 2024

Trie-based normalization passthrough is slower than in ICU4C #2431

Open

Update icu_harfbuzz

505f1cf

hsivonen dismissed Manishearth’s stale review via 505f1cf November 14, 2024 06:23

hsivonen added 3 commits November 14, 2024 08:23

Merge branch 'main' into normalizerdata

380df55

Mention trie-value-format.md in various places

e5faa23

Avoid a doc comment where a normal comment is needed

852834c

hsivonen added the discuss-priority Discuss at the next ICU4X meeting label Nov 14, 2024

Make the documentation for singleton decomposition more precise

fa0f538

hsivonen added 4 commits December 11, 2024 10:54

Merge branch 'main' into normalizerdata

849c1c9

Correct the remark about the REPLACEMENT CHARACTER in properties.rs

1d9d656

Correct the remark about the REPLACEMENT CHARACTER in properties.rs even more

d710462

Remove the remark about the REPLACEMENT CHARACTER in properties.rs as redundant

70c0706

Manishearth previously approved these changes Dec 11, 2024


hsivonen mentioned this pull request Dec 12, 2024

Experiment with putting most of potential_passthrough_impl in an cold function #4858

Closed

sffc removed the discuss-priority Discuss at the next ICU4X meeting label Dec 12, 2024

sffc dismissed Manishearth’s stale review via f21e0ce December 16, 2024 21:19

Merge branch 'main' into normalizerdata

677f00f

sffc force-pushed the normalizerdata branch from f21e0ce to 677f00f December 16, 2024 21:20

sffc added 3 commits December 16, 2024 13:20

Update ICU tag

4534db0

cargo make download-repo-sources

9f6f754

cargo make testdata; cargo make bakeddata

d2b5c66

sffc changed the title from "Improve normalizer performance by adjusting the trie value format" to "Change icuexportdata trie format to improve normalizer performance" Dec 16, 2024

hsivonen added 2 commits December 17, 2024 10:21

Address clippy lint

b23c598

Merge branch 'main' into normalizerdata

abf8b9c

hsivonen requested a review from Manishearth December 17, 2024 09:00

Merge branch 'main' into normalizerdata

28e024d

sffc approved these changes Dec 18, 2024


hsivonen merged commit 5f103cb into unicode-org:main Dec 18, 2024
28 checks passed

hsivonen deleted the normalizerdata branch December 18, 2024 12:25
