Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add noAssert to Buffer::write #4126

Closed
kyriosli opened this issue Dec 3, 2015 · 7 comments
Closed

Add noAssert to Buffer::write #4126

kyriosli opened this issue Dec 3, 2015 · 7 comments
Labels
buffer Issues and PRs related to the buffer subsystem. feature request Issues that request new features to be added to Node.js.

Comments

@kyriosli
Copy link

kyriosli commented Dec 3, 2015

Currently the new Buffer(string) constructor calls utf8Length and write, which can be very slow, because Buffer::write calls StringBytes::Write and which calls v8::String::WriteUtf8 with a buffer address and its capacity same as the utf8Length of the string. But v8::String::WriteUtf8 will be very fast if passed with a capacity equals to or larger than three times of the string length (see here).

So I think, why don't we manually make sure the capacity of the buffer is large enough, and pass a larger capacity (for example, three times of the string length) to the WriteUtf8 method, to make it faster?

@mscdex mscdex added buffer Issues and PRs related to the buffer subsystem. feature request Issues that request new features to be added to Node.js. labels Dec 3, 2015
@kyriosli
Copy link
Author

kyriosli commented Dec 3, 2015

I read the v8 source again and found the trick is that WriteUtf8 calls itself again with NO_NULL_TERMINATION set to the option bits, when the buffer capacity is precisely same as utf8Length of the string. So the good part is it does not need to call String::Flatten to flatten string parts, but the bad part is it calls Utf8Length again which may be time consuming.

@trevnorris
Copy link
Contributor

No doubt there's more we can do here. For example we could short circuit the String::Utf8Length() call by checking String::IsOneByte() first, and remove the need to allocate more memory than necessary. The same check could be used in StringBytes::Write() with UTF8 encoding to use String::WriteOneByte() instead. A simple mock-up I did showed good performance gains doing this.

Thanks for posting the issue. I'll look further into what can be done here.

@trevnorris
Copy link
Contributor

Nevermind about partially what I said. IsOneByte() and ContainsOnlyOneByte() won't work b/c utf8 will encode the > 128 bytes of characters differently. It would have worked if the call had been something like IsAscii().

@trevnorris
Copy link
Contributor

Just tested the IsAscii() idea and in the case that the text contains only ASCII characters, but user uses utf8 encoding, the write is 20% faster. Problem is if the string contains a multibyte character then writing takes up to 15% longer.

@kyriosli
Copy link
Author

kyriosli commented Dec 6, 2015

So the problem is we call byteLength repeatedly. Maybe a slight modification will solve the problem, what if we check the encoding right after the byteLength is calculated? For example:

// src/lib/buffer.js
function fromString(string, encoding) {
  ...

  var length = byteLength(string, encoding);
  if(length === string.length && encoding === 'utf8') {
    encoding = 'binary'
  }

And it should make a difference because this call is avoided.

The next two problems are:

  • createFromString calls node::Buffer::New(Isolate*, Local<String>, encoding) which calls StringBytes::Size and StringBytes::Write, and those two methods also call utf8Length
  • Buffer::write calls StringBytes::Write which calls utf8Length

@jasnell
Copy link
Member

jasnell commented Mar 22, 2016

What's the status on this one?

@Trott
Copy link
Member

Trott commented Jul 7, 2017

Pull request demonstrating an improved implementation would be appreciated, of course, Otherwise, this issue has been inactive for sufficiently long that it seems like perhaps it should be closed. Feel free to re-open (or leave a comment requesting that it be re-opened) if you disagree. I'm just tidying up and not acting on a super-strong opinion or anything like that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buffer Issues and PRs related to the buffer subsystem. feature request Issues that request new features to be added to Node.js.
Projects
None yet
Development

No branches or pull requests

5 participants