deps: add simdutf dependency #45803

Closed
@anonrig wants to merge 3 commits

@anonrig (Member) commented Dec 9, 2022

simdutf provides faster UTF-8 operations by using SIMD instructions. The @nodejs/undici team was looking for a way to validate UTF-8 input, and this dependency makes that possible.

Edit: I propose exposing the following functionality either through a new module (such as node:encoding) or through util.types or buffer:

  • validate_ascii(string)
  • validate_utf8(string)
  • count_utf8(string)

PS: simdutf supports more features; depending on the need, it may make more sense to expose them through a new module rather than through util.types or buffer.
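
To make the proposed surface concrete, here is a minimal JavaScript sketch of what the three functions compute. The camelCase names are hypothetical stand-ins for the proposal, not an API added by this PR, and the loops are plain-JS reference versions rather than simdutf's SIMD code. The sketch assumes the inputs are byte buffers rather than JS strings.

```javascript
'use strict';

// Hypothetical JavaScript equivalents of the three proposed functions.
// They operate on bytes (Buffer / Uint8Array), not on JS strings.

// True when every byte is in the 7-bit ASCII range (0x00-0x7F).
function validateAscii(bytes) {
  for (const b of bytes) {
    if (b > 0x7f) return false;
  }
  return true;
}

// True when the bytes are well-formed UTF-8. A fatal TextDecoder throws
// a TypeError on malformed input instead of substituting U+FFFD.
const strictDecoder = new TextDecoder('utf-8', { fatal: true });
function validateUtf8(bytes) {
  try {
    strictDecoder.decode(bytes);
    return true;
  } catch {
    return false;
  }
}

// Number of code points in (already validated) UTF-8 input. Each code
// point has exactly one non-continuation byte; continuation bytes have
// the bit pattern 10xxxxxx.
function countUtf8(bytes) {
  let count = 0;
  for (const b of bytes) {
    if ((b & 0xc0) !== 0x80) count++;
  }
  return count;
}

const sample = Buffer.from('héllo €', 'utf8'); // 10 bytes, 7 code points
console.log(validateAscii(sample)); // false
console.log(validateUtf8(sample)); // true
console.log(countUtf8(sample)); // 7
console.log(validateUtf8(Uint8Array.of(0xc3, 0x28))); // false
```

A native implementation would return the same results; the point of the dependency is doing this work many bytes at a time.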

@nodejs-github-bot (Collaborator) commented

Review requested:

  • @nodejs/gyp

@nodejs-github-bot added the build, dependencies, needs-ci, and tools labels Dec 9, 2022
@anonrig force-pushed the deps/simdutf branch 2 times, most recently from 2c20c9a to 6daa546, Dec 9, 2022 21:09
@anonrig changed the title from "dep: add simdutf dependency" to "deps: add simdutf dependency" Dec 9, 2022
@KhafraDev (Member) commented

This would help speed up both ws and undici's WebSocket implementation (which is still WIP). When we receive a text frame, or a close frame with a reason, we need to validate that the buffer contains valid UTF-8.

There are currently a few ways of doing so: a JS implementation used by default in both undici and ws, and optionally a package such as utf-8-validate. Note that simdutf is many times faster than the C++ version of utf-8-validate in the benchmark above, and the JS fallback version is the slowest.

Here is a PR from @lpinca that shows massive speedups when using simdutf: websockets/utf-8-validate#101. Considering how widely ws is used, exposing a very fast way to validate UTF-8 would benefit a large part of the ecosystem.
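
As a rough illustration of the validation step described above, the sketch below parses a WebSocket close frame payload and rejects a reason that is not valid UTF-8. It assumes the `buffer.isUtf8` function listed in the release notes later in this thread is available, and falls back to a fatal `TextDecoder` otherwise. `parseCloseFrame` and `isValidUtf8` are hypothetical names, not code from ws or undici.

```javascript
'use strict';
const buffer = require('node:buffer');

// Prefer the native validator when this Node.js version provides it.
// The TextDecoder fallback is an assumption made for this sketch, not
// the fallback that ws or undici actually ship.
const fatalDecoder = new TextDecoder('utf-8', { fatal: true });
const isValidUtf8 = typeof buffer.isUtf8 === 'function'
  ? (bytes) => buffer.isUtf8(bytes)
  : (bytes) => {
    try {
      fatalDecoder.decode(bytes);
      return true;
    } catch {
      return false;
    }
  };

// Hypothetical helper. Per RFC 6455, a non-empty close frame body starts
// with a 2-byte big-endian status code, optionally followed by a UTF-8
// reason. Returns null when the payload is malformed.
function parseCloseFrame(payload) {
  if (payload.length === 0) return { code: 1005, reason: '' }; // no status received
  if (payload.length === 1) return null; // a status code needs 2 bytes
  const code = payload.readUInt16BE(0);
  const reasonBytes = payload.subarray(2);
  if (!isValidUtf8(reasonBytes)) return null;
  return { code, reason: reasonBytes.toString('utf8') };
}

const ok = Buffer.concat([Buffer.from([0x03, 0xe8]), Buffer.from('bye', 'utf8')]);
console.log(parseCloseFrame(ok)); // { code: 1000, reason: 'bye' }

const bad = Buffer.from([0x03, 0xe8, 0xff]); // 0xff can never appear in UTF-8
console.log(parseCloseFrame(bad)); // null
```

Text frames would take the same path: the whole payload goes through `isValidUtf8` before it is turned into a string.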

@anonrig force-pushed the deps/simdutf branch 2 times, most recently from 5027cae to e94ba5f, Dec 9, 2022 21:29
@richardlau added the request-ci label (starts a Jenkins CI run on a PR) Dec 9, 2022
@github-actions bot removed the request-ci label Dec 9, 2022
@anonrig force-pushed the deps/simdutf branch 2 times, most recently from bed88cc to 4269faf, Dec 9, 2022 23:26
Review thread on deps/simdutf/simdutf.gyp (outdated, resolved)
@anonrig force-pushed the deps/simdutf branch 3 times, most recently from ced7ef2 to 5566c99, Dec 10, 2022 02:53
@anonrig added the request-ci label Dec 10, 2022
@github-actions bot removed the request-ci label Dec 10, 2022
@anonrig added the request-ci label Dec 10, 2022
@github-actions bot removed the request-ci label Dec 10, 2022
RafaelGSS added a commit that referenced this pull request Jan 2, 2023
Notable changes:

buffer:
  * (SEMVER-MINOR) add buffer.isUtf8 for utf8 validation (Yagiz Nizipli) #45947
deps:
  * disable avx512 for simutf on benchmark ci (Yagiz Nizipli) #45803
  * add simdutf dependency (Yagiz Nizipli) #45803
http:
  * (SEMVER-MINOR) improved timeout defaults handling (Paolo Insogna) #45778
net
  * add autoSelectFamily global getter and setter (Paolo Insogna) #45777
os:
  * (SEMVER-MINOR) add availableParallelism() (Colin Ihrig) #45895
util:
  * add fast path for text-decoder fatal flag (Yagiz Nizipli) #45803

PR-URL: #46061
RafaelGSS pushed a commit that referenced this pull request Jan 4, 2023
PR-URL: #45803
Reviewed-By: Robert Nagy <ronagy@icloud.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Michael Dawson <midawson@redhat.com>
RafaelGSS pushed a commit that referenced this pull request Jan 4, 2023
PR-URL: #45803
Reviewed-By: Robert Nagy <ronagy@icloud.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Michael Dawson <midawson@redhat.com>
RafaelGSS pushed a commit that referenced this pull request Jan 4, 2023
Co-authored-by: Daniel Lemire <daniel@lemire.me>
PR-URL: #45803
Reviewed-By: Robert Nagy <ronagy@icloud.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Michael Dawson <midawson@redhat.com>
RafaelGSS added a commit that referenced this pull request Jan 4, 2023
Notable changes:

buffer:
  * (SEMVER-MINOR) add buffer.isUtf8 for utf8 validation (Yagiz Nizipli) #45947
http:
  * (SEMVER-MINOR) improved timeout defaults handling (Paolo Insogna) #45778
net
  * add autoSelectFamily global getter and setter (Paolo Insogna) #45777
os:
  * (SEMVER-MINOR) add availableParallelism() (Colin Ihrig) #45895
util:
  * add fast path for text-decoder fatal flag (Yagiz Nizipli) #45803

PR-URL: #46061
juanarbol pushed a commit that referenced this pull request Jan 26, 2023
PR-URL: #45803
Reviewed-By: Robert Nagy <ronagy@icloud.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Michael Dawson <midawson@redhat.com>
juanarbol pushed a commit that referenced this pull request Jan 26, 2023
PR-URL: #45803
Reviewed-By: Robert Nagy <ronagy@icloud.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Michael Dawson <midawson@redhat.com>
juanarbol pushed a commit that referenced this pull request Jan 26, 2023
Co-authored-by: Daniel Lemire <daniel@lemire.me>
PR-URL: #45803
Reviewed-By: Robert Nagy <ronagy@icloud.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Michael Dawson <midawson@redhat.com>
juanarbol added a commit that referenced this pull request Jan 28, 2023
Notable changes:

* buffer
  * (SEMVER-MINOR) add buffer.isUtf8 for utf8 validation (Yagiz Nizipli) #45947
* deps:
  * disable avx512 for simutf on benchmark ci (Yagiz Nizipli) #45803
  * add simdutf dependency (Yagiz Nizipli) #45803
  * upgrade npm to 9.1.3 (npm team) #45693
* util:
  * add fast path for text-decoder fatal flag (Yagiz Nizipli) #45803

PR-URL: TBD
@juanarbol juanarbol mentioned this pull request Jan 28, 2023
juanarbol added a commit that referenced this pull request Jan 28, 2023
Notable changes:

* buffer
  * (SEMVER-MINOR) add buffer.isUtf8 for utf8 validation (Yagiz Nizipli) #45947
* deps:
  * disable avx512 for simutf on benchmark ci (Yagiz Nizipli) #45803
  * add simdutf dependency (Yagiz Nizipli) #45803
  * upgrade npm to 9.1.3 (npm team) #45693
* util:
  * add fast path for text-decoder fatal flag (Yagiz Nizipli) #45803

PR-URL: #46396
juanarbol added a commit that referenced this pull request Jan 30, 2023
Notable changes:

* buffer
  * (SEMVER-MINOR) add buffer.isUtf8 for utf8 validation (Yagiz Nizipli) #45947
* deps:
  * add simdutf dependency (Yagiz Nizipli) #45803
  * upgrade npm to 9.1.3 (npm team) #45693
* util:
  * add fast path for text-decoder fatal flag (Yagiz Nizipli) #45803

PR-URL: #46396
Labels: author ready, build, commit-queue-rebase, dependencies, needs-ci, notable-change, performance, review wanted, tools