feat(NODE-5909): optimize writing basic latin strings #645

nbbeeken · 2024-02-08T19:05:43Z

Description

What is changing?

This is the inverse opimization made in: #642

Should have thought of it sooner :)

Removed fromUTF8 dead code, we always call encodeInto which is BYOB (bring your own buffer 🍻)

Is there new documentation needed for these changes?

No

What is the motivation for this change?

When strings are short and are only basic latin, it is faster to iterate length of the string and call charCodeAt to obtain the UTF8 byte value.

Release Highlight

BSON short basic latin string writing performance improved!

The BSON library's string encoding logic now attempts to optimize for basic latin (ASCII) characters. This will apply to both BSON keys and BSON values that are or contain strings. If strings are less than 6 bytes we observed approximately 100% increase in speed while around 24 bytes the performance was about 33% better. For any non-basic latin bytes or at 25 bytes or greater the BSON library will continue to use Node.js' Buffer.toString API.

The intent is to generally target the serialization of BSON keys which are often short and only use basic latin.

Double check the following

Ran npm run check:lint script
Self-review completed using the steps outlined here
PR title follows the correct format: type(NODE-xxxx)[!]: description
- Example: feat(NODE-1234)!: rewriting everything in coffeescript
Changes are covered by tests
New TODOs have a related JIRA ticket

nbbeeken · 2024-02-09T20:54:02Z

The test case was repeating 'a' fed to BSON like so:

BSON.serialize({ _id: string });

These benchmarks were run without a string length limit

Node.js v20.11.0
OS: linux
CPUs: 8
Arch: x64
RAM: 33.105743872 GB

See basic latin benchmarks!

testing string of length: 0
do_nothing x 850,495,953 ops/sec ±0.07% (196 runs sampled)
bson_serialize_current x 1,440,612 ops/sec ±0.86% (190 runs sampled)
bson_serialize_latin_writing x 2,465,289 ops/sec ±0.80% (194 runs sampled)
Fastest to slowest is do_nothing, bson_serialize_latin_writing, bson_serialize_current

testing string of length: 5
do_nothing x 851,197,943 ops/sec ±0.07% (196 runs sampled)
bson_serialize_current x 1,409,583 ops/sec ±0.34% (192 runs sampled)
bson_serialize_latin_writing x 2,256,285 ops/sec ±0.88% (193 runs sampled)
Fastest to slowest is do_nothing, bson_serialize_latin_writing, bson_serialize_current

testing string of length: 10
do_nothing x 851,116,772 ops/sec ±0.08% (196 runs sampled)
bson_serialize_current x 1,386,497 ops/sec ±0.33% (191 runs sampled)
bson_serialize_latin_writing x 2,174,389 ops/sec ±0.96% (192 runs sampled)
Fastest to slowest is do_nothing, bson_serialize_latin_writing, bson_serialize_current

testing string of length: 15
do_nothing x 566,973,633 ops/sec ±17.59% (134 runs sampled)
bson_serialize_current x 1,362,566 ops/sec ±0.39% (191 runs sampled)
bson_serialize_latin_writing x 2,070,277 ops/sec ±0.87% (192 runs sampled)
Fastest to slowest is do_nothing, bson_serialize_latin_writing, bson_serialize_current

testing string of length: 25
do_nothing x 150,902,887 ops/sec ±8.73% (115 runs sampled)
bson_serialize_current x 1,374,990 ops/sec ±0.36% (191 runs sampled)
bson_serialize_latin_writing x 1,921,850 ops/sec ±0.82% (192 runs sampled)
Fastest to slowest is do_nothing, bson_serialize_latin_writing, bson_serialize_current

testing string of length: 50
do_nothing x 133,791,776 ops/sec ±2.94% (177 runs sampled)
bson_serialize_current x 771,222 ops/sec ±0.96% (177 runs sampled)
bson_serialize_latin_writing x 804,194 ops/sec ±1.11% (177 runs sampled)
Fastest to slowest is do_nothing, bson_serialize_latin_writing, bson_serialize_current

testing string of length: 500
do_nothing x 129,510,424 ops/sec ±3.59% (175 runs sampled)
bson_serialize_current x 488,674 ops/sec ±1.24% (156 runs sampled)
bson_serialize_latin_writing x 334,235 ops/sec ±1.23% (181 runs sampled)
Fastest to slowest is do_nothing, bson_serialize_current, bson_serialize_latin_writing

testing string of length: 1000
do_nothing x 138,587,441 ops/sec ±3.02% (123 runs sampled)
bson_serialize_current x 411,147 ops/sec ±1.16% (164 runs sampled)
bson_serialize_latin_writing x 204,873 ops/sec ±1.62% (184 runs sampled)
Fastest to slowest is do_nothing, bson_serialize_current, bson_serialize_latin_writing

Ran the same tests again sticking a é at the end of each string, demonstrating the cost of bailing out and redoing some of the work in Buffer.toString

See advanced latin benchmarks!

... omitted smaller cases ... demonstrated no difference because strings were so small

testing string of length: 501
do_nothing x 134,189,343 ops/sec ±3.65% (116 runs sampled)
bson_serialize_current x 471,620 ops/sec ±1.18% (173 runs sampled)
bson_serialize_latin_writing x 278,600 ops/sec ±0.87% (185 runs sampled)
Fastest to slowest is do_nothing, bson_serialize_current, bson_serialize_latin_writing

testing string of length: 1001
do_nothing x 139,626,049 ops/sec ±3.39% (124 runs sampled)
bson_serialize_current x 414,861 ops/sec ±1.00% (167 runs sampled)
bson_serialize_latin_writing x 167,419 ops/sec ±1.27% (185 runs sampled)
Fastest to slowest is do_nothing, bson_serialize_current, bson_serialize_latin_writing

Between these two benchmarks results, I decided to take this improvement the same route as the parsing one and focus on small strings only. I had considered that we could stay in the charCodeAt code all the way until we encounter a non basic latin character, and slice the string at that index and let Node.js do the rest. However, the 500 and 1000 character case in the first set of tests show it does not pay to only use charCodeAt, for "large" strings (500 can hardly be called "large"). So again we're mainly targeting BSON strings getting a boost.

nbbeeken · 2024-02-09T21:19:27Z

All the BSON granular benchmarks run with this change

src/utils/latin.ts

nbbeeken force-pushed the NODE-5909-latin-write branch from a5300ad to 2e612cf Compare February 9, 2024 16:40

nbbeeken marked this pull request as ready for review February 9, 2024 20:54

nbbeeken requested a review from addaleax February 9, 2024 20:54

nbbeeken added 2 commits February 9, 2024 15:55

chore: remove dead fromUTF8 code

00e62a8

feat(NODE-5909): optimize writing basic latin strings

1214643

nbbeeken force-pushed the NODE-5909-latin-write branch from 2e612cf to 1214643 Compare February 9, 2024 20:55

addaleax previously approved these changes Feb 9, 2024

View reviewed changes

src/utils/latin.ts Outdated Show resolved Hide resolved

refactor: rename try basic latin functions to be consistent

3aca300

nbbeeken dismissed addaleax’s stale review via 3aca300 February 9, 2024 21:45

addaleax approved these changes Feb 9, 2024

View reviewed changes

durran self-assigned this Feb 12, 2024

durran added the Primary Review In Review with primary reviewer, not yet ready for team's eyes label Feb 12, 2024

durran approved these changes Feb 12, 2024

View reviewed changes

durran added Team Review Needs review from team and removed Primary Review In Review with primary reviewer, not yet ready for team's eyes labels Feb 12, 2024

durran merged commit ec51256 into main Feb 13, 2024
4 checks passed

durran deleted the NODE-5909-latin-write branch February 13, 2024 09:10

github-actions bot mentioned this pull request Feb 13, 2024

chore(main): release 6.4.0 [skip-ci] #644

Merged

billouboq mentioned this pull request Feb 13, 2024

Why do Mongoose v6 and v7 cause a significant increase in garbage collections and event loop latency compared to v5? Automattic/mongoose#13456

Closed

2 tasks

github-actions bot mentioned this pull request May 7, 2024

chore(main): release 7.0.0 [skip-ci] #687

Closed

This was referenced May 25, 2024

[Snyk] Upgrade bson from 6.3.0 to 6.7.0 boytur/server-posyayee-v1#51

Closed

[Snyk] Upgrade bson from 6.3.0 to 6.7.0 boytur/server-posyayee-v1#54

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(NODE-5909): optimize writing basic latin strings #645

feat(NODE-5909): optimize writing basic latin strings #645

nbbeeken commented Feb 8, 2024 •

edited

Loading

nbbeeken commented Feb 9, 2024

nbbeeken commented Feb 9, 2024

feat(NODE-5909): optimize writing basic latin strings #645

feat(NODE-5909): optimize writing basic latin strings #645

Conversation

nbbeeken commented Feb 8, 2024 • edited Loading

Description

What is changing?

Is there new documentation needed for these changes?

What is the motivation for this change?

Release Highlight

BSON short basic latin string writing performance improved!

Double check the following

nbbeeken commented Feb 9, 2024

nbbeeken commented Feb 9, 2024

nbbeeken commented Feb 8, 2024 •

edited

Loading