Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Nan::Utf8String over String::Utf8Value #54

Closed
springmeyer opened this issue Aug 1, 2017 · 8 comments
Closed

Use Nan::Utf8String over String::Utf8Value #54

springmeyer opened this issue Aug 1, 2017 · 8 comments

Comments

@springmeyer
Copy link
Contributor

springmeyer commented Aug 1, 2017

A common way to get data from a v8::String to pass to C++ is:

std::string str(*v8::String::Utf8Value(info[0].As<String>());

While concise (by v8/c++ terms) this can be improved. Instead of v8::String::Utf8Value we can use Nan::Utf8String and we should validate the input before trying to create a std::string by:

  • ensuring the info[0] actually is a string
  • ensuring that if it is a string, then the length is > 0 or the string data is null (more on this below)

The Nan::Utf8String api both validates strings internally and is fast. It has an optimized conversion routine similar to node core. And it works across all recent node versions.

So we should convert all usage of String::Utf8Value to Nan::Utf8String. Like the String::Utf8Value interface, the Nan::Utf8String API exposes is the ability to check the length of the string before accessing in the value: this is a very good idea for safety and robustness. We don't want to risk passing a null string around, which might create unpredictable behavior or a crash like in mapbox/node-fontnik#124.

To keep 100% test coverage, after adding lines that check the line, we will need to add a test that tries to pass an empty string, and will need to assert reasonable behavior in this case. So, let's move to code like this:

  Nan::Utf8String utf8_value(info[0]);
  int len = utf8_value.length();
  if (len <= 0) {
     return Nan::ThrowTypeError("arg must be a non-empty string");
  }
  /// work with string data here....
  // e.g. convert to a std::string
  std::string string_copy(*utf8_value, len);

An additional (albeit minor unless the string is massive) performance win of getting the len of the string is that we can pass it to the std::string constructor. Doing std::string string_copy(*utf8_value); would work, but providing the length allows the std::string constructor to avoid calculating the length internally and should be faster since it skips an operation. The std::string class has a bewildering number of different constructors. This one we are using that accepts a const char * as the first arg and an int as the second argument is the (4) from http://en.cppreference.com/w/cpp/string/basic_string/basic_string.

/cc @GretaCB

This was referenced Aug 1, 2017
@GretaCB
Copy link
Contributor

GretaCB commented Aug 3, 2017

Awesome, thanks for this writeup @springmeyer .

Qvestion

Currently the skel does this...

std::string name = *v8::String::Utf8Value(info[0]->ToString());

Does the implementation in your comment above differ in any way re: allocation, since it seems as if Nan::Utf8String utf8_value(info[0]); is allocating on the stack? But I suppose this is trivial since it's using a pointer to pass into the C++ string instantiation?

Also curious where exactly allocation is happening. At std::string name = ?

My real question is...is Nan::Utf8String utf8_value(info[0]); allocating memory?

@GretaCB
Copy link
Contributor

GretaCB commented Aug 3, 2017

Also trying to figure out why ToString() is no longer necessary in your example. Because of Nan?

@GretaCB GretaCB mentioned this issue Aug 4, 2017
@springmeyer
Copy link
Contributor Author

Currently the skel does this...

Sorry, made a copy/paste error. You are right - I've fixed the post above to note what we are currently doing and should avoid.

Also trying to figure out why ToString() is no longer necessary in your example. Because of Nan?

Really insightful! That is exactly the kind of question that will help you be cautious and write secure and robust code. In this case Nan has all the smarts needed internally to validate, in a fast way, that the string is valid, see https://github.com/nodejs/nan/blob/7291008d755f658343bcb61f456cb0a6744a6235/nan.h#L911. So if any of those fail then the length will be 0.

@springmeyer
Copy link
Contributor Author

My real question is...is Nan::Utf8String utf8_value(info[0]); allocating memory?

GREAT question. You are thinking like a performance engineer!!!. So the answer is maybe. This is why NanUtf8String is so fast. Because it uses a kind of SSO (discussed at the bottom of https://github.com/mapbox/cpp/blob/master/glossary.md#instantiate). What that means is that it uses stack allocation for short strings and only falls back to dynamic allocation for strings longer than 1024 bytes. You can see this in action at https://github.com/nodejs/nan/blob/7291008d755f658343bcb61f456cb0a6744a6235/nan.h#L915-L918

@GretaCB
Copy link
Contributor

GretaCB commented Aug 4, 2017

Sorry, made a copy/paste error. You are right

Ah, wasnt saying you were wrong. It tied into my initial comment which was digging into the comparison between the old way and the Nan way (the strike-through above), but then I altered my qvestion.

The reason I asked about whether or not Nan::Utf8String utf8_value(info[0]); is allocating memory is because I was pondering if it was possible to bypass this allocation and just set std::string var to some pointer of info[0]...but I'm realizing now that's probably impossible since std::string can only take a value it understands, which is the v8/nan type. And we would want to check length anyway.

@springmeyer
Copy link
Contributor Author

springmeyer commented Aug 4, 2017

but then I altered my qvestion.

👍

The reason I asked about whether or not Nan::Utf8String utf8_value(info[0]); is allocating memory is because I was pondering if it was possible to bypass this allocation and just set std::string var to some pointer of info[0]...but I'm realizing now that's probably impossible since std::string can only take a value it understands, which is the v8/nan type. And we would want to check length anyway.

Right, std::string needs to own the data it contains. It cannot point to memory managed elsewhere. So I think the only way to create a std::string that bypass's allocation is to create a std::string by moving another std::string. That basically moves ownership of the memory from one variable to the next.

While we are thinking about bypassing allocation, another alternative would be to pass the pointer that references a V8 object directly into the threadpool (instead of converting to a std::string). This is the ideal. I've not tried this with Nan::Utf8String yet but it is commonly done with node::Buffer objects. Instead of converting the node::Buffer to a std::string or some other kind of C++ object to hold the bytes, we pass the buffer object directly to the threadpool and access its internal memory directly, without copies. This is done over at https://github.com/mapnik/node-mapnik/blob/9f0d98c685e87f7aef8df24fb1e3fc1112726f3b/src/mapnik_image.cpp#L3452-L3454. That 1) creates variables to point at the buffer data pointer (without copying anything but the pointer) and 2) marks the JS buffer object as persistent to keep it alive while working with its data in the threadpool.

@GretaCB
Copy link
Contributor

GretaCB commented Aug 8, 2017

Nan string PR merged at #55

@springmeyer Good to close this and carry future improvement convo per your last comment to separate ticket?

@springmeyer
Copy link
Contributor Author

Good to close 👍 As far as last convo about moving to a node::Buffer over std::string, my feelins that would be a decent lift, so perhaps something to keep in mind, but not critical to ticket.

@GretaCB GretaCB closed this as completed Aug 9, 2017
pdufour pushed a commit to lob/pdffonts that referenced this issue Dec 6, 2019
pdufour added a commit to lob/pdffonts that referenced this issue Dec 9, 2019
* Add support for node 12

See mapbox/node-cpp-skel#54
nodejs/nan#849

* updates

* Only need this

* Use compatible versions

* use dockerfile

* updates

* caching

* Run commands from docker

* Fix cov report

* Fixes

* lowercase

* re-add cov check

* change back to yarn run
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants