Change icuexportdata trie format to improve normalizer performance #5813

Merged
hsivonen merged 33 commits into unicode-org:main from normalizerdata on Dec 18, 2024

Conversation

hsivonen
Member

With the fast trie type, I see this kind of performance improvement:

el_nfc_to_nfc_utf16/icu4x                                                                             
                        time:   [3.0115 µs 3.0127 µs 3.0141 µs]
                        thrpt:  [679.47 Melem/s 679.78 Melem/s 680.06 Melem/s]
                 change:
                        time:   [-35.114% -35.083% -35.049%] (p = 0.00 < 0.05)
                        thrpt:  [+53.963% +54.042% +54.117%]
                        Performance has improved.

el_nfc_to_nfd_utf16/icu4x                                                                             
                        time:   [4.4824 µs 4.4837 µs 4.4851 µs]
                        thrpt:  [456.62 Melem/s 456.77 Melem/s 456.90 Melem/s]
                 change:
                        time:   [-30.365% -30.238% -30.102%] (p = 0.00 < 0.05)
                        thrpt:  [+43.065% +43.344% +43.605%]
                        Performance has improved.

el_nfd_to_nfd_utf16/icu4x                                                                             
                        time:   [4.4836 µs 4.4848 µs 4.4859 µs]
                        thrpt:  [456.54 Melem/s 456.66 Melem/s 456.78 Melem/s]
                 change:
                        time:   [-31.927% -31.836% -31.751%] (p = 0.00 < 0.05)
                        thrpt:  [+46.522% +46.705% +46.901%]
                        Performance has improved.

el_nfd_to_nfc_utf16/icu4x                                                                             
                        time:   [11.465 µs 11.491 µs 11.514 µs]
                        thrpt:  [194.89 Melem/s 195.29 Melem/s 195.72 Melem/s]
                 change:
                        time:   [-14.115% -14.021% -13.925%] (p = 0.00 < 0.05)
                        thrpt:  [+16.177% +16.307% +16.435%]
                        Performance has improved.

en_nfc_to_nfc_utf16/icu4x                                                                             
                        time:   [990.20 ns 990.50 ns 990.80 ns]
                        thrpt:  [2.0670 Gelem/s 2.0676 Gelem/s 2.0683 Gelem/s]
                 change:
                        time:   [-2.0851% -1.9873% -1.8175%] (p = 0.00 < 0.05)
                        thrpt:  [+1.8512% +2.0275% +2.1295%]
                        Performance has improved.

en_nfc_to_nfd_utf16/icu4x                                                                             
                        time:   [704.59 ns 705.45 ns 706.47 ns]
                        thrpt:  [2.8989 Gelem/s 2.9031 Gelem/s 2.9066 Gelem/s]
                 change:
                        time:   [-30.362% -30.311% -30.265%] (p = 0.00 < 0.05)
                        thrpt:  [+43.401% +43.494% +43.599%]
                        Performance has improved.

en_nfd_to_nfd_utf16/icu4x                                                                             
                        time:   [704.27 ns 704.57 ns 705.05 ns]
                        thrpt:  [2.9048 Gelem/s 2.9067 Gelem/s 2.9080 Gelem/s]
                 change:
                        time:   [-30.268% -30.188% -30.087%] (p = 0.00 < 0.05)
                        thrpt:  [+43.035% +43.242% +43.406%]
                        Performance has improved.

en_nfd_to_nfc_utf16/icu4x                                                                             
                        time:   [991.99 ns 992.27 ns 992.55 ns]
                        thrpt:  [2.0634 Gelem/s 2.0640 Gelem/s 2.0645 Gelem/s]
                 change:
                        time:   [-2.0092% -1.9614% -1.9088%] (p = 0.00 < 0.05)
                        thrpt:  [+1.9460% +2.0006% +2.0504%]
                        Performance has improved.

fr_nfc_to_nfc_utf16/icu4x                                                                             
                        time:   [984.23 ns 984.47 ns 984.72 ns]
                        thrpt:  [2.0798 Gelem/s 2.0803 Gelem/s 2.0808 Gelem/s]
                 change:
                        time:   [-2.0970% -1.9296% -1.8348%] (p = 0.00 < 0.05)
                        thrpt:  [+1.8691% +1.9675% +2.1419%]
                        Performance has improved.

fr_nfc_to_nfd_utf16/icu4x                                                                             
                        time:   [1.5205 µs 1.5215 µs 1.5223 µs]
                        thrpt:  [1.3453 Gelem/s 1.3460 Gelem/s 1.3469 Gelem/s]
                 change:
                        time:   [-24.128% -23.976% -23.831%] (p = 0.00 < 0.05)
                        thrpt:  [+31.286% +31.538% +31.801%]
                        Performance has improved.

fr_nfd_to_nfd_utf16/icu4x                                                                             
                        time:   [1.5139 µs 1.5159 µs 1.5177 µs]
                        thrpt:  [1.3494 Gelem/s 1.3510 Gelem/s 1.3528 Gelem/s]
                 change:
                        time:   [-22.127% -22.036% -21.950%] (p = 0.00 < 0.05)
                        thrpt:  [+28.123% +28.265% +28.414%]
                        Performance has improved.

fr_nfd_to_nfc_utf16/icu4x                                                                             
                        time:   [3.5719 µs 3.5739 µs 3.5760 µs]
                        thrpt:  [588.65 Melem/s 588.99 Melem/s 589.33 Melem/s]
                 change:
                        time:   [-4.9182% -4.8589% -4.7968%] (p = 0.00 < 0.05)
                        thrpt:  [+5.0385% +5.1070% +5.1726%]
                        Performance has improved.

ja_nfc_to_nfc_utf16/icu4x                                                                             
                        time:   [3.3380 µs 3.3388 µs 3.3394 µs]
                        thrpt:  [613.28 Melem/s 613.40 Melem/s 613.53 Melem/s]
                 change:
                        time:   [-42.084% -42.055% -42.027%] (p = 0.00 < 0.05)
                        thrpt:  [+72.495% +72.578% +72.664%]
                        Performance has improved.

ja_nfc_to_nfd_utf16/icu4x                                                                             
                        time:   [4.6673 µs 4.6767 µs 4.6874 µs]
                        thrpt:  [436.91 Melem/s 437.92 Melem/s 438.79 Melem/s]
                 change:
                        time:   [-28.384% -28.291% -28.174%] (p = 0.00 < 0.05)
                        thrpt:  [+39.226% +39.453% +39.633%]
                        Performance has improved.

ja_nfd_to_nfd_utf16/icu4x                                                                             
                        time:   [4.8018 µs 4.8065 µs 4.8115 µs]
                        thrpt:  [425.65 Melem/s 426.09 Melem/s 426.50 Melem/s]
                 change:
                        time:   [-27.291% -27.215% -27.147%] (p = 0.00 < 0.05)
                        thrpt:  [+37.262% +37.391% +37.534%]
                        Performance has improved.

ja_nfd_to_nfc_utf16/icu4x                                                                             
                        time:   [8.8205 µs 8.8228 µs 8.8250 µs]
                        thrpt:  [246.01 Melem/s 246.07 Melem/s 246.13 Melem/s]
                 change:
                        time:   [-14.915% -14.811% -14.716%] (p = 0.00 < 0.05)
                        thrpt:  [+17.255% +17.386% +17.530%]
                        Performance has improved.

kn_nfc_to_nfc_utf16/icu4x                                                                             
                        time:   [8.1968 µs 8.2000 µs 8.2032 µs]
                        thrpt:  [249.66 Melem/s 249.76 Melem/s 249.85 Melem/s]
                 change:
                        time:   [-12.150% -12.094% -12.035%] (p = 0.00 < 0.05)
                        thrpt:  [+13.681% +13.758% +13.831%]
                        Performance has improved.

kn_nfc_to_nfd_utf16/icu4x                                                                             
                        time:   [4.4757 µs 4.4765 µs 4.4774 µs]
                        thrpt:  [457.41 Melem/s 457.50 Melem/s 457.59 Melem/s]
                 change:
                        time:   [-26.836% -26.756% -26.660%] (p = 0.00 < 0.05)
                        thrpt:  [+36.352% +36.529% +36.679%]
                        Performance has improved.

kn_nfd_to_nfd_utf16/icu4x                                                                             
                        time:   [3.7885 µs 3.7893 µs 3.7901 µs]
                        thrpt:  [540.35 Melem/s 540.47 Melem/s 540.59 Melem/s]
                 change:
                        time:   [-31.691% -31.619% -31.551%] (p = 0.00 < 0.05)
                        thrpt:  [+46.094% +46.239% +46.394%]
                        Performance has improved.

kn_nfd_to_nfc_utf16/icu4x                                                                             
                        time:   [10.406 µs 10.411 µs 10.417 µs]
                        thrpt:  [202.08 Melem/s 202.18 Melem/s 202.29 Melem/s]
                 change:
                        time:   [-9.8301% -9.7583% -9.6878%] (p = 0.00 < 0.05)
                        thrpt:  [+10.727% +10.814% +10.902%]
                        Performance has improved.

ko_nfc_to_nfc_utf16/icu4x                                                                             
                        time:   [2.7431 µs 2.7435 µs 2.7440 µs]
                        thrpt:  [746.36 Melem/s 746.48 Melem/s 746.60 Melem/s]
                 change:
                        time:   [-33.624% -33.575% -33.534%] (p = 0.00 < 0.05)
                        thrpt:  [+50.454% +50.547% +50.658%]
                        Performance has improved.

ko_nfc_to_nfd_utf16/icu4x                                                                             
                        time:   [18.572 µs 18.579 µs 18.587 µs]
                        thrpt:  [110.19 Melem/s 110.23 Melem/s 110.27 Melem/s]
                 change:
                        time:   [-5.1016% -4.9844% -4.8827%] (p = 0.00 < 0.05)
                        thrpt:  [+5.1334% +5.2459% +5.3758%]
                        Performance has improved.

ko_nfd_to_nfd_utf16/icu4x                                                                             
                        time:   [6.5094 µs 6.5145 µs 6.5199 µs]
                        thrpt:  [314.12 Melem/s 314.37 Melem/s 314.62 Melem/s]
                 change:
                        time:   [-38.693% -38.636% -38.575%] (p = 0.00 < 0.05)
                        thrpt:  [+62.801% +62.961% +63.113%]
                        Performance has improved.

ko_nfd_to_nfc_utf16/icu4x                                                                             
                        time:   [39.109 µs 39.145 µs 39.181 µs]
                        thrpt:  [102.86 Melem/s 102.95 Melem/s 103.05 Melem/s]
                 change:
                        time:   [-3.7017% -3.6061% -3.5037%] (p = 0.00 < 0.05)
                        thrpt:  [+3.6309% +3.7410% +3.8440%]
                        Performance has improved.

vi_nfc_to_nfc_utf16/icu4x                                                                             
                        time:   [1.3298 µs 1.3313 µs 1.3331 µs]
                        thrpt:  [1.5363 Gelem/s 1.5384 Gelem/s 1.5401 Gelem/s]
                 change:
                        time:   [-14.921% -14.827% -14.696%] (p = 0.00 < 0.05)
                        thrpt:  [+17.228% +17.408% +17.538%]
                        Performance has improved.

vi_nfc_to_nfd_utf16/icu4x                                                                             
                        time:   [7.3388 µs 7.3408 µs 7.3428 µs]
                        thrpt:  [278.91 Melem/s 278.99 Melem/s 279.06 Melem/s]
                 change:
                        time:   [-10.183% -10.060% -9.9402%] (p = 0.00 < 0.05)
                        thrpt:  [+11.037% +11.185% +11.337%]
                        Performance has improved.

vi_nfd_to_nfd_utf16/icu4x                                                                             
                        time:   [6.7638 µs 6.7926 µs 6.8147 µs]
                        thrpt:  [300.53 Melem/s 301.51 Melem/s 302.79 Melem/s]
                 change:
                        time:   [-5.9909% -5.4787% -4.8880%] (p = 0.00 < 0.05)
                        thrpt:  [+5.1392% +5.7963% +6.3727%]
                        Performance has improved.

vi_nfd_to_nfc_utf16/icu4x                                                                             
                        time:   [21.887 µs 21.897 µs 21.908 µs]
                        thrpt:  [119.09 Melem/s 119.15 Melem/s 119.21 Melem/s]
                 change:
                        time:   [-6.6439% -6.5689% -6.4944%] (p = 0.00 < 0.05)
                        thrpt:  [+6.9454% +7.0308% +7.1167%]
                        Performance has improved.

vi_orthographic_to_nfc_utf16/icu4x                                                                             
                        time:   [19.753 µs 19.766 µs 19.780 µs]
                        thrpt:  [120.63 Melem/s 120.71 Melem/s 120.79 Melem/s]
                 change:
                        time:   [-2.8159% -2.7400% -2.6556%] (p = 0.00 < 0.05)
                        thrpt:  [+2.7280% +2.8172% +2.8974%]
                        Performance has improved.

vi_orthographic_to_nfd_utf16/icu4x                                                                             
                        time:   [7.0146 µs 7.0182 µs 7.0223 µs]
                        thrpt:  [339.78 Melem/s 339.97 Melem/s 340.15 Melem/s]
                 change:
                        time:   [-12.492% -12.445% -12.397%] (p = 0.00 < 0.05)
                        thrpt:  [+14.151% +14.214% +14.275%]
                        Performance has improved.

zh_nfc_to_nfc_utf16/icu4x                                                                             
                        time:   [3.2568 µs 3.2577 µs 3.2586 µs]
                        thrpt:  [628.49 Melem/s 628.67 Melem/s 628.83 Melem/s]
                 change:
                        time:   [-35.288% -35.198% -35.146%] (p = 0.00 < 0.05)
                        thrpt:  [+54.194% +54.317% +54.530%]
                        Performance has improved.

zh_nfc_to_nfd_utf16/icu4x                                                                             
                        time:   [2.8441 µs 2.8452 µs 2.8464 µs]
                        thrpt:  [719.50 Melem/s 719.80 Melem/s 720.09 Melem/s]
                 change:
                        time:   [-38.993% -38.911% -38.836%] (p = 0.00 < 0.05)
                        thrpt:  [+63.495% +63.696% +63.914%]
                        Performance has improved.

zh_nfd_to_nfd_utf16/icu4x                                                                             
                        time:   [2.8525 µs 2.8540 µs 2.8555 µs]
                        thrpt:  [717.21 Melem/s 717.59 Melem/s 717.97 Melem/s]
                 change:
                        time:   [-39.014% -38.907% -38.811%] (p = 0.00 < 0.05)
                        thrpt:  [+63.429% +63.685% +63.971%]
                        Performance has improved.

zh_nfd_to_nfc_utf16/icu4x                                                                             
                        time:   [3.2835 µs 3.2847 µs 3.2860 µs]
                        thrpt:  [623.56 Melem/s 623.80 Melem/s 624.02 Melem/s]
                 change:
                        time:   [-34.955% -34.935% -34.913%] (p = 0.00 < 0.05)
                        thrpt:  [+53.641% +53.693% +53.739%]
                        Performance has improved.
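
As a reading aid for the Criterion output above: the `thrpt` change is the reciprocal of the `time` change, and multiplying an absolute time by its throughput gives the number of elements processed per iteration. A minimal sketch of that arithmetic follows; the function names are illustrative and are not part of the benchmark harness.

```rust
/// Converts a relative change in time (e.g. -0.35083 for -35.083%)
/// into the corresponding relative change in throughput.
fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

/// Recovers the number of elements processed per iteration from
/// a time in seconds and a throughput in elements per second.
fn elements_per_iteration(time_s: f64, elems_per_s: f64) -> f64 {
    time_s * elems_per_s
}

fn main() {
    // el_nfc_to_nfc_utf16: time changed by -35.083%.
    let el = throughput_change(-0.35083);
    println!("el_nfc_to_nfc thrpt change: {:+.3}%", el * 100.0);

    // en_nfc_to_nfc_utf16: time changed by -1.9873%.
    let en = throughput_change(-0.019873);
    println!("en_nfc_to_nfc thrpt change: {:+.4}%", en * 100.0);

    // el_nfc_to_nfc_utf16: 3.0127 µs at 679.78 Melem/s.
    let n = elements_per_iteration(3.0127e-6, 679.78e6);
    println!("elements per iteration: {:.0}", n);
}
```

This is why a 35% reduction in time shows up as a throughput gain of roughly 54% rather than 35%, and it suggests each of these benchmark inputs is about 2048 elements long.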

@hsivonen hsivonen added the A-performance (Area: Performance (CPU, Memory)), C-collator (Component: Collation, normalization), and 2.0-breaking (Changes that are breaking API changes) labels Nov 13, 2024
@hsivonen
Member Author

ICU4C PR: unicode-org/icu#3269

Manishearth
Manishearth previously approved these changes Nov 13, 2024
Member

@Manishearth Manishearth left a comment

Landable for the purpose of 2.0, but I think this could have a couple more pointers in the docs and be more encapsulated.

components/normalizer/trie-value-format.md (resolved)
/// Getting a zero from this trie means that you need
/// to make another lookup from `DecompositionDataV1::trie`.
pub struct DecompositionDataV2<'data> {
/// Trie for decomposition.
#[cfg_attr(feature = "serde", serde(borrow))]
pub trie: CodePointTrie<'data, u32>,
Member

issue: I feel like the packed code logic is all scattered. Can we use a structured NormalizationTrieValue(pub u32) type that has convenience methods for getting all the fields?
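
To make the suggestion concrete, here is a rough sketch of what such a wrapper could look like. The bit layout and the method names are invented for illustration only; the real field layout is whatever `components/normalizer/trie-value-format.md` documents. The `lookup` function and its `HashMap` stand-ins for the two tries are likewise hypothetical and are not ICU4X API.

```rust
use std::collections::HashMap;

/// Hypothetical newtype over the packed trie value.
/// ASSUMPTION: the split into two 16-bit halves is illustrative only
/// and does not reflect the actual ICU4X trie value format.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NormalizationTrieValue(pub u32);

impl NormalizationTrieValue {
    /// A zero value means "no answer here; consult the other trie",
    /// mirroring the doc comment in the snippet under review.
    pub fn needs_fallback(self) -> bool {
        self.0 == 0
    }

    /// Upper 16 bits of the packed value (hypothetical field).
    pub fn high_half(self) -> u16 {
        (self.0 >> 16) as u16
    }

    /// Lower 16 bits of the packed value (hypothetical field).
    pub fn low_half(self) -> u16 {
        (self.0 & 0xFFFF) as u16
    }
}

/// Looks up `c` in `first`; on zero, falls back to `fallback`.
/// The maps stand in for code point tries whose default value is zero.
fn lookup(
    first: &HashMap<char, u32>,
    fallback: &HashMap<char, u32>,
    c: char,
) -> NormalizationTrieValue {
    let v = NormalizationTrieValue(first.get(&c).copied().unwrap_or(0));
    if v.needs_fallback() {
        NormalizationTrieValue(fallback.get(&c).copied().unwrap_or(0))
    } else {
        v
    }
}

fn main() {
    let mut first: HashMap<char, u32> = HashMap::new();
    first.insert('a', 0x0001_0002);
    let mut fallback: HashMap<char, u32> = HashMap::new();
    fallback.insert('b', 0x0003_0004);

    for c in ['a', 'b', 'c'] {
        let v = lookup(&first, &fallback, c);
        println!("{c}: high={:#06x} low={:#06x}", v.high_half(), v.low_half());
    }
}
```

The point of the newtype is that callers ask for named fields instead of repeating shifts and masks at each use site, so a later change to the packing would only touch the accessor methods.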

Member Author

I agree that what you suggest would be better for encapsulation. However, given that prior to this PR there was no such encapsulation and I'm already way over my time budget for this, I would very much prefer landing this ASAP (before 2.0 and before this bitrots) without such a refactoring and leaving the refactoring as a follow-up.

components/normalizer/src/provider.rs (resolved)
@hsivonen hsivonen added the discuss-priority (Discuss at the next ICU4X meeting) label Nov 14, 2024
@hsivonen
Member Author

hsivonen commented Dec 11, 2024

Once the ICU4C side lands, this PR needs an update to take a newer export zip in datagen.

Manishearth
Manishearth previously approved these changes Dec 11, 2024
@sffc
Member

sffc commented Dec 16, 2024

I realized while updating the tag that I think this change makes datagen incompatible with older ICU tags. The data coming from icuexportdata changed structure; it wasn't just some new additions. Are we ok with that?

@sffc sffc changed the title from "Improve normalizer performance by adjusting the trie value format" to "Change icuexportdata trie format to improve normalizer performance" Dec 16, 2024
@sffc
Member

sffc commented Dec 16, 2024

I pushed 4 commits to the branch (ignore the force-push; it was to fix a merge issue I made and did not change any of hsivonen's commits). CI is now all green except for a clippy issue.

@Manishearth
Member

I'm fine with this breaking for 2.0.

@hsivonen
Member Author

hsivonen commented Dec 17, 2024

> I realized while updating the tag that I think this change makes datagen incompatible with older ICU tags. The data coming from icuexportdata changed structure; it wasn't just some new additions. Are we ok with that?

I think we pretty much have to be. The performance improvement here is just too good to reject in order to enable the use of old data (which would seem like a very niche use case if there even is a use case). Also, my understanding from prior discussions was that we had agreed we're OK with a data compatibility break like this for 2.0.

Thanks for updating this with the new data export.

@hsivonen hsivonen requested a review from Manishearth December 17, 2024 09:00
@hsivonen hsivonen merged commit 5f103cb into unicode-org:main Dec 18, 2024
28 checks passed
@hsivonen hsivonen deleted the normalizerdata branch December 18, 2024 12:25
@hsivonen
Member Author

Thanks! Landed.

@robertbastian
Member

robertbastian commented Jan 7, 2025

Why was half of this change done in ICU4C? Couldn't this transformation have been applied in datagen? I'm asking this for two reasons:

Edit: and

@hsivonen
Member Author

hsivonen commented Jan 7, 2025

> Why was half of this change done in ICU4C?

Because we started icuexportdata early on by putting the trie builder side in the ICU4C repo.

> Couldn't this transformation have been applied in datagen?

That would have involved keeping around at least part of the ICU4X 1.5 code for interpreting the old data while rewriting parts of icuexportdata in Rust.

Implementing #4602 for the normalizer could make sense as a RIIR project, but then it would make sense to work from UCD and not from whatever the shape of runtime data happened to be at ICU4X 1.5. (But, as discussed previously, the hardest part that makes ICU4X dependent on ICU4C is the collation data builder.)

Part of the problem that this PR fixed is that the ICU4X 1.x normalizer data format was a decomposing normalizer format plus hacks to enable a composing normalizer, instead of being designed to support a composing normalizer. It wasn't at all designed to support transformation by datagen.

> • This would have kept the icuexportdata format stable

I think freezing the icuexportdata format for normalization as it happened to be at ICU4X 1.5 would have been anti-useful, since the format isn't meant to be transformed by datagen.
