Discuss a possible redesign of NanNew(...) #210

agnat · 2014-10-30T12:57:48Z

This is not a real PR. It's just a ticket with code.

UPDATE: Seems to be a real PR after all.
UPDATE 2: tldr? Read the Design Notes.

I took a minute and scribbled out a possible redesign of NanNew(...). It's more flexible and allows for very fine grained control but at the same time adds more structure. It also is totally non-greedy and very portable. Please note the generic handling of all v8 integer types. Also note that v8::External now behaves as expected. The replacement is currently called NanNew2(...) so I can mix and match.

It kind of evolved out of #209 and addresses concerns raised in the NAN 2 Discussion.

Sorry about the indentation. But ignoring that what do you guys think?

kkoopa · 2014-10-30T13:20:03Z

Wow, you've been busy. I like it, it does look cleaner. I'm mostly concerned with corner cases, and cannot straight away tell if they will work or not, just by looking at the code.

Can it distinguish between wanting to create a Date and a Number?

v8::Local<v8::Date> NanNew(double time) {
  return v8::Date::New(v8::Isolate::GetCurrent(), time).As<v8::Date>();
}

v8::Local<v8::Number> NanNew(double val) {
  return v8::Number::New(v8::Isolate::GetCurrent(), val);
}

in code like

v8::Local<v8::Value> v = NanNew<v8::Date>(123456.7);

How are default arguments handled? E.g. FunctionTemplate. Luckily there is a finite upper bound on number of possible arguments, probably around four, definitely no more than six. Will it just be n specializations depended on the number of arguments?

In the beginning everything seemed nice and elegant with the current approach, but as more and more stuff was added, ambiguities were observed and needed resolving somehow.

Can you handle creation of bound and unbound scripts consistently, keeping in mind that they were the same on 0.10. I concluded that it was not possible, and had to break one constructor out (NanCompileScript).

What about NanNewContextHandle? I wanted it baked into NanNew, but never got around to it.

What about NanNewBufferHandle I refrained from baking it in, as it returns a Local<Object> but should be constructed with NanNew<node::Buffer> and thought it might be confusing to have it return another type than the one asked for (which no other NanNew would do).

kkoopa · 2014-10-30T13:27:40Z

If you have not seen it, #86 Contains most of the original concerns.

agnat · 2014-10-30T15:06:15Z

Does this work?
[...]
v8::Local<v8::Value> v = NanNew(123456.7);

Hehe... given the code above what do you suppose it should do? No, I'm afraid but overloading by return type is not part of the C++ language. The type of an expression is an inherent feature of the expression. Looking beyond the expression to deduce a type is not C++. For the same reason the ternary operator has to return one single type. ;)

How are default arguments handled?

Although I haven't tried it yet, default arguments should just work as expected. The compiler will pick the NanNew2(...) with the right number of arguments and the matching, defaulted version of Factory<T>::New(...). If you are asking about automatically forwarding the default arguments used by v8 to nan, that again is just not possible. It is the main reason why many C++ devs don't use default arguments at all.

Will it just be n specializations depended on the number of arguments?

Yes, I think so. There are techniques to deal with that but I think, given the small n, we should KISS. We want nan to be easy to use and easy to maintain. Let's not turn this into some template-hyper-macro-hotch-potch... until we have to. ;)

BTW, they are not specializations but independent template functions. (I know, it's a PITA).

In the beginning...

Relax. You really don't have to explain this. I know... believe me... I do...

What about NanNewBufferHandle I refrained from baking it in [...]

I think we should throw it in. If I'm not mistaken all integer constructors return Local<Integer> (the base class), too. Returning an object seems OK to me.

I'll look into the other stuff.

kkoopa · 2014-10-30T15:12:39Z

I updated the code in question to make sense, it was a copypaste from another
issue.

On Thursday 30 October 2014 08:06:17 David Siegel wrote:

Does this work?

[...]
v8::Local<v8::Value> v = NanNew(123456.7);
Hehe... given the code above what do you suppose it should do? No, I'm
afraid but overloading by return type is not part of the C++ language. The
type of an expression is an inherent feature of the expression. Looking
beyond the expression to deduce a type is not C++. For the same reason the
ternary operator has to return one single type. ;)

How are default arguments handled?

Although I haven't tried it yet, default arguments should just work as
expected. The compiler will pick the NanNew2(...) with the right number
of arguments and the matching, defaulted version of Factory<T>::New(...).
If you are asking about automatically forwarding the default arguments used
by v8 to nan, that again is just not possible. It is the main reason why
many C++ devs don't use default arguments at all.

Will it just be n specializations depended on the number of arguments?

Yes, I think so. There are techniques to deal with that but I think, given
the small n, we should KISS. We want nan to be easy to use and easy to
maintain. Let's not turn this into some template-hyper-macro-hotch-potch...
until we have to. ;)

BTW, they are not specializations but independent template functions. (I
know, it's a PITA).

In the beginning...

Relax. You really don't have to explain this. I know... believe me... I
do...

What about NanNewBufferHandle I refrained from baking it in [...]

I think we should throw it in. If I'm not mistaken all integer constructors
return Local<Integer> (the base class), too. Returning an object seems OK
to me.

I'll look into the other stuff.

Reply to this email directly or view it on GitHub:
#210 (comment)

agnat · 2014-10-30T15:31:44Z

Like this?

Date vs. Number

kkoopa · 2014-10-30T15:35:59Z

Yes, it looks good.

On Thursday 30 October 2014 08:31:46 David Siegel wrote:

Like this?

[Date vs.
Number](https://github.com/rvagg/nan/blob/260753e596c28dd24e1476225d166b190
6b25c9e/test/cpp/nan_sketch.cpp#L159-L165)

Reply to this email directly or view it on GitHub:
#210 (comment)

kkoopa · 2014-10-30T15:47:14Z

And oh, I made the Integer constructors of NAN return the proper types, same with Date and everything else that was weird for no good reason. Every NanNew<T> returns a T, as it should be. That is why having the buffer constructor be the odd one out is unsettling and I decided to leave it out, with the excuse that it is not a v8 type, but Node's own.

agnat · 2014-10-30T16:16:13Z

Hm, not sure NAN should interfere like that. Have to think about it.

agnat · 2014-11-01T10:11:52Z

Scripts - bound and unbound

@kkoopa, could you tell me a bit about the string constructors? It looks like uint8_t * is treated as ASCII/Latin-1 and char * is used for UTF-8. Why is that? Also, a lot of manual copying into uint16_t buffers is going on in the pre v0.11 implementation. What's up with that?

kkoopa · 2014-11-01T14:30:24Z

They are mainly treated that way because that's how v8 does it.

ASCII is only a 7-bit subset of Latin1, without the high bit set. The name used here is ASCII for legacy reasons, v8 removed it precisely because it never was conformant ASCII. It always was sort of Latin1 (no particular encoding was actually used), just a 1-byte charset using all 8 bits. Treating them as signed makes no sense whatsoever, a character cannot be negative. The main point is that they really should not be used, if you absolutely have to and cannot make your program correct, then make it explicit and jump through the hoop. "This is a raw (unsigned) byte stream, I really, really, do want this."

ASCII is a proper subset of UTF8, so if you actually have a proper ASCII-encoded string, it is also a proper UTF-8 encoded string, so you would just use that. However, if the high bit is set, it means it is a multi-byte character in UTF-8. It uses the char type just because that is the common way. A string literal is a const char *, this is the normal string, which should be used, so make it easy.

Old versions had no proper way of doing wide strings. That horrible loop is the only way, which is why later versions made sure to fix the problem.

kkoopa · 2014-11-01T14:36:45Z

Does the script test pass with both old and new version? I had the problem that bound and unbound scripts were the same in old v8, and doing typedef v8::Script NanUnboundScript typedef v8::Script NanBoundScript led to ambiguity at some point and the compiler could not distinguish between them, as it thought they were the same.

agnat · 2014-11-01T14:39:32Z

Yes, it works. The key is not to do anything about unbound scripts in old versions. The typedef is enough to make the "normal" script functions match... which is what we want.

kkoopa · 2014-11-01T14:44:22Z

Aha, but is it then impossible to make the names match?

agnat · 2014-11-01T14:48:59Z

Regarding strings: I do get the difference between the encodings. So the idea is to treat everything as UTF8/ASCII except the user casts to uint8_t*? Still don't get it... sorry...

What do you mean by "impossible to make the names match"? Like have a

typedef v8::Script NanBoundScript;

Sure. Why not? It's still there. I just don't use it because I don't like it. :-D

kkoopa · 2014-11-01T15:05:10Z

uint16_t * is a wide string, that naturally maps to UCS2 (or UTF-16 or whatever it is called) in v8, which is what is used internally by the v8::String class.

int8_t * is char * on every sane platform (every one supported by Node), that maps to UTF-8, which is the best encoding ever, this is what you almost always want to use.

uint8_t * maps to StringFromOneByte and Latin1, this should not be used accidentally, so do not feed it normal char data, as it will mostly likely not produce the correct result. A char defaults to signed, so don't treat it as unsigned.

These are the correct signatures from v8:

/** Allocates a new string from UTF-8 data.*/
  static Local<String> NewFromUtf8(Isolate* isolate,
                                  const char* data,
                                  NewStringType type = kNormalString,
                                  int length = -1);

  /** Allocates a new string from Latin-1 data.*/
  static Local<String> NewFromOneByte(
      Isolate* isolate,
      const uint8_t* data,
      NewStringType type = kNormalString,
      int length = -1);

  /** Allocates a new string from UTF-16 data.*/
  static Local<String> NewFromTwoByte(
      Isolate* isolate,
      const uint16_t* data,
      NewStringType type = kNormalString,
      int length = -1);

From an end-user perspective it seems more intuitive to be able to use the name NanBoundScript if there is a NanUnboundScript. I like symmetry and consistency, that way you only need to remember one half, plus the fact that it is symmetric.

kkoopa · 2014-11-01T15:10:02Z

Regarding the Scripts, you also need to correctly combine them with NanRunScript (or any other way of correctly executing both kinds of scripts across versions, this was the only way I came up with). Both bound and unbound scripts existed in old v8 too, they were just not distinguished with different types.

agnat · 2014-11-03T20:59:03Z

So, that's it. The old implementation is still in place and the new implementation is in separate files. The new implementation is selected using #define NAN_NEW_NAN_NEW like here. All tests pass using both implementations. I also added about 80 tests. @kkoopa, can you take it from here?

kkoopa · 2014-11-03T21:21:58Z

Sure, I can probably do that. Will take a closer look tomorrow. Thanks a million. But do add yourself to the list of contributors, this is certainly a major contribution.

On November 3, 2014 10:59:05 PM EET, David Siegel notifications@github.com wrote:

So, that's it. The old implementation is still in place and the new
implementation is in separate files. The new implementation is selected
using #define NAN_NEW_NAN_NEW like
here.
All tests pass using both implementations. I also added about 80 tests.
@kkoopa, can you take it from here?

Reply to this email directly or view it on GitHub:
#210 (comment)

rvagg · 2014-11-04T00:34:50Z

@kkoopa if this is solid enough to make it in then we should add @agnat properly as a collaborator / owner. I'm just going to have to defer to you on whether this is worth replacing our existing NanNew code with.

kkoopa · 2014-11-04T09:34:00Z

Yes, I had that in mind. It most definitely is worth replacing the existing hack with. It is easier to maintain, easier to read and simply more correct.

agnat · 2014-11-04T13:46:32Z

To help you guys to gain a little more trust in this here are a few...

Design Notes:

Let's ignore the convenience functions for a second and get right down to the core. Looking at NanNew<T>(...) from an outside perspective it is a single entry point API to call constructors of T. The possible signatures of NanNew depend on T but other than that, they are quite arbitrary and ... colorful.

The first approach was to use one, massively overloaded name. I don't have to explain this. You all have seen the header file. Now, what's the problem with that? Let the author speak for himself. @kkoopa said:

In the beginning everything seemed nice and elegant with the current approach, but as more and more stuff was added, ambiguities were observed and needed resolving somehow.

In other words: It was bad

for the user, because of the large candidate lists in case of typos
for the maintainer, because it was harder and harder to find ways to sneak things through that pinhole
for the compiler, because it had to sieve through a massive amount of combinations trying to discern what the user wants.

So. scratch that. The massively overloaded name approach is an anti-pattern like the big ball of spaghetti. Back to the drawing board. Interestingly the key to this is the third point. If we can help the compiler to get a better grasp on what we want we improve the situation for both users and maintainers.

Now, how do we do that? Let's start with the entry points. We define a cluster of template functions like so:

template <typename T> v8::Local<T> NanNew() {}
template <typename T, typename A0> v8::Local<T> NanNew(A0 arg0) {}
template <typename T, typename A0, typename A1> v8::Local<T> NanNew(A0 arg0, A1 arg1) {}

and so on. We now have a function that takes a type argument and has an arbitrary signature. Now, recall that the set of legal signatures depends on type T. We're going to model that relation by adding a little template class:

template <typename T> struct Factory;

template <typename T> v8::Local<T> NanNew() {
  return Factory<T>::New();
}
template <typename T, typename A0> v8::Local<T> NanNew(A0 arg0) {
  return Factory<T>::New(arg0);
}

That changes things drastically. The compiler now first looks up Factory<T> and only then starts looking for a version of New that matches the arguments. We have implemented a compile time type dispatch. Now maintainers are able to route calls cleanly by providing implementations of New for certain types T:

template <>
struct Factory<v8::String> {
  static inline v8::Local<v8::String> New(const char* s) { /* ... */ }
};

The code above enables calls like

v8::Local<v8::String> s = NanNew<v8::String>("plonk");

On the user side things improve, too. If invoked with an invalid T the compiler will complain that Factory<T> is undefined. If the arguments are wrong the compiler will only list the candidates found in that factory and not all of them. Another advantage of this design is that Factory<T>::New is an overloaded function and not a template. Overloads provide a better user experience because all the "normal" implicit conversion rules apply. On the maintainer side it is no longer necessary to provide const and non-const versions. It just feels more ordinary and that is a good thing.

kkoopa · 2014-11-04T13:55:03Z

What he said.

rvagg · 2014-11-04T13:58:55Z

Ah, great explanation. Thanks for that.

kkoopa · 2014-11-04T14:06:43Z

I think the best action plan would be: Release 1.4.1 with 9d8b270 and kkoopa/nan@e15c720 asap. Then I'll integrate this and #207 and we can call it 1.5.0.

kkoopa · 2014-11-06T19:25:50Z

Am I correct in reading this that your pre-12 implementation always creates bound scripts? It should create unbound scripts (built through Script::New).

agnat · 2014-11-06T19:28:21Z

Possible. Let me check.

agnat · 2014-11-06T19:32:54Z

Good catch. Fixed.

kkoopa · 2014-11-07T09:34:52Z

Does the old caveat on NanNew<v8::String>(std::string) still apply?

Note: Using NanNew with an std::string is possible, however, you should ensure to use the overload version (NanNew(std::string)) rather than the template version (NanNew<v8::String>(std::string)) as there is an unnecessary performance penalty to using the template version because of the inability for compilers to appropriately deduce to reference types on template specialization.

agnat · 2014-11-07T14:45:09Z

First things first: No such penalty. Calling NanNew<std::string>(...) and calling NanNew(std::string()) is the same thing, down to the assembler level. I'd be totally nonplussed if it isn't.

That said, I'm not quite sure what the note is referring to:

[...] because of the inability for compilers to appropriately deduce to reference types on template specialization.

It doesn't matter much, because it's a thing of the past. But if you have an example/article/stackoverflow at hand...

kkoopa · 2014-11-07T14:52:19Z

It comes from here: e79e3d4
Wanted a reference to a string, got a copy.

agnat · 2014-11-07T15:19:52Z

Hehe, yeah...

Thing is: There is no implicit conversion in template land. std::string and std::string const& are very different things. The compiler has a std::string, though, copied from the string literal. It picks the right specialization and... boom... pass by value. :-/

If you don't specialize on std::string but only on std::string const& it won't compile, because...? Exactly! There is no implicit conversion in template land. Now you're on it.

Good thing we're using overloads now.

kkoopa · 2014-11-07T15:28:15Z

Yeah, that's what I figured. Good, I can update the documentation in the onepointfive branch.

agnat · 2014-11-07T15:52:49Z

I just noticed that I eliminated another superfluous copy by replacing s.c_str() with &*s.begin(). :-D

kkoopa · 2014-11-07T16:34:55Z

Did you fully duplicate the (distinct) tests in news.cpp, or might there be some left out? So that I may remove the old nannew tests.

agnat · 2014-11-07T16:54:40Z

No, I wrote my own test suite without looking at the other tests. Some types are missing in my test, though (RegExp, Object).

I wouldn't drop any test right now. Having a few dups doesn't hurt. It will be a lot easier to sort things out after/while refactoring the tests. You wanted to do that anyway at some point, right?

kkoopa · 2014-11-07T17:03:57Z

Ok, good. I'll leave them in then. Maybe refactor tests for the next major version.

agnat · 2014-11-07T17:16:54Z

Now that I think of it, I don't believe we have (m)any duplicates. Your tests make sure the right value arrives on the JavaScript side, while my tests focus on the C++ side. They do call the same functions but they test very different things.

Regarding NanNew() the tests are probably better than we think they are.

agnat · 2014-11-13T12:50:43Z

I just noticed that I eliminated another superfluous copy by replacing s.c_str() with &*s.begin().

On second thought that wasn't a good idea. The right thing to do is to use s.data().

kkoopa · 2014-11-20T09:10:24Z

I should remember to change this the 1.5 branch.

agnat · 2014-11-20T13:50:09Z

Most def. It probably segfaults with some STLs... :-/

agnat added 2 commits October 30, 2014 13:15

Added a sketch to discuss a redesign of NanNew(...)

4600596

Cosmetics.

9e883ea

Added v8::Date construction.

260753e

agnat added 11 commits October 30, 2014 19:44

Added integer type conversion.

8e5d245

Fixed indentation.

298d4c3

Added v0.10 support and moved things into implementation files

40dc82a

Tell windows about π.

a263261

Fixed a typo.

b18b364

Moved API to header, added tap test helper and script support [WIP]

5bfe710

Added some more C++ tests.

220380d

Another attempt to fix the windows build,

830d360

Wrote some more tests.

880ea4b

Moved NanTap testing adapter

1d7bf7c

Added tests for arrays, dates and (unbound) scripts

4f22214

NanNew redesign: Fixed build using the old implementation.

a3d40e6

agnat mentioned this pull request Nov 6, 2014

Use latest NAN after NanNew(...) redesign lands agnat/node_mdns#102

Closed

NanNew redesign: Fix - Use Script::New instead of Script::Compile

090a6e6

kkoopa mentioned this pull request Nov 7, 2014

NAN 1.5 WIP #216

Merged

agnat mentioned this pull request Nov 8, 2014

WeakPersistent Redesign #218

Closed

kkoopa merged commit 090a6e6 into nodejs:master Jan 14, 2015

Discuss a possible redesign of NanNew(...) #210

Discuss a possible redesign of NanNew(...) #210

Conversation

agnat commented Oct 30, 2014

kkoopa commented Oct 30, 2014

kkoopa commented Oct 30, 2014

agnat commented Oct 30, 2014

kkoopa commented Oct 30, 2014

agnat commented Oct 30, 2014

kkoopa commented Oct 30, 2014

kkoopa commented Oct 30, 2014

agnat commented Oct 30, 2014

agnat commented Nov 1, 2014

kkoopa commented Nov 1, 2014

kkoopa commented Nov 1, 2014

agnat commented Nov 1, 2014

kkoopa commented Nov 1, 2014

agnat commented Nov 1, 2014

kkoopa commented Nov 1, 2014

kkoopa commented Nov 1, 2014

agnat commented Nov 3, 2014

kkoopa commented Nov 3, 2014

rvagg commented Nov 4, 2014

kkoopa commented Nov 4, 2014

agnat commented Nov 4, 2014

Design Notes:

kkoopa commented Nov 4, 2014

rvagg commented Nov 4, 2014

kkoopa commented Nov 4, 2014

kkoopa commented Nov 6, 2014

agnat commented Nov 6, 2014

agnat commented Nov 6, 2014

kkoopa commented Nov 7, 2014

agnat commented Nov 7, 2014

kkoopa commented Nov 7, 2014

agnat commented Nov 7, 2014

kkoopa commented Nov 7, 2014

agnat commented Nov 7, 2014

kkoopa commented Nov 7, 2014

agnat commented Nov 7, 2014

kkoopa commented Nov 7, 2014

agnat commented Nov 7, 2014

agnat commented Nov 13, 2014

kkoopa commented Nov 20, 2014

agnat commented Nov 20, 2014