Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discuss a possible redesign of NanNew(...) #210

Merged
merged 24 commits into from
Jan 14, 2015

Conversation

agnat
Copy link
Collaborator

@agnat agnat commented Oct 30, 2014

This is not a real PR. It's just a ticket with code.

UPDATE: Seems to be a real PR after all.
UPDATE 2: tldr? Read the Design Notes.

I took a minute and scribbled out a possible redesign of NanNew(...). It's more flexible and allows for very fine grained control but at the same time adds more structure. It also is totally non-greedy and very portable. Please note the generic handling of all v8 integer types. Also note that v8::External now behaves as expected. The replacement is currently called NanNew2(...) so I can mix and match.

It kind of evolved out of #209 and addresses concerns raised in the NAN 2 Discussion.

Sorry about the indentation. But ignoring that what do you guys think?

@kkoopa
Copy link
Collaborator

kkoopa commented Oct 30, 2014

Wow, you've been busy. I like it, it does look cleaner. I'm mostly concerned with corner cases, and cannot straight away tell if they will work or not, just by looking at the code.

Can it distinguish between wanting to create a Date and a Number?

v8::Local<v8::Date> NanNew(double time) {
  return v8::Date::New(v8::Isolate::GetCurrent(), time).As<v8::Date>();
}
v8::Local<v8::Number> NanNew(double val) {
  return v8::Number::New(v8::Isolate::GetCurrent(), val);
}

in code like

v8::Local<v8::Value> v = NanNew<v8::Date>(123456.7);

How are default arguments handled? E.g. FunctionTemplate. Luckily there is a finite upper bound on number of possible arguments, probably around four, definitely no more than six. Will it just be n specializations depended on the number of arguments?

In the beginning everything seemed nice and elegant with the current approach, but as more and more stuff was added, ambiguities were observed and needed resolving somehow.

Can you handle creation of bound and unbound scripts consistently, keeping in mind that they were the same on 0.10. I concluded that it was not possible, and had to break one constructor out (NanCompileScript).

What about NanNewContextHandle? I wanted it baked into NanNew, but never got around to it.

What about NanNewBufferHandle I refrained from baking it in, as it returns a Local<Object> but should be constructed with NanNew<node::Buffer> and thought it might be confusing to have it return another type than the one asked for (which no other NanNew would do).

@kkoopa
Copy link
Collaborator

kkoopa commented Oct 30, 2014

If you have not seen it, #86 Contains most of the original concerns.

@agnat
Copy link
Collaborator Author

agnat commented Oct 30, 2014

Does this work?
[...]

v8::Local<v8::Value> v = NanNew(123456.7);

Hehe... given the code above what do you suppose it should do? No, I'm afraid but overloading by return type is not part of the C++ language. The type of an expression is an inherent feature of the expression. Looking beyond the expression to deduce a type is not C++. For the same reason the ternary operator has to return one single type. ;)

How are default arguments handled?

Although I haven't tried it yet, default arguments should just work as expected. The compiler will pick the NanNew2(...) with the right number of arguments and the matching, defaulted version of Factory<T>::New(...). If you are asking about automatically forwarding the default arguments used by v8 to nan, that again is just not possible. It is the main reason why many C++ devs don't use default arguments at all.

Will it just be n specializations depended on the number of arguments?

Yes, I think so. There are techniques to deal with that but I think, given the small n, we should KISS. We want nan to be easy to use and easy to maintain. Let's not turn this into some template-hyper-macro-hotch-potch... until we have to. ;)

BTW, they are not specializations but independent template functions. (I know, it's a PITA).

In the beginning...

Relax. You really don't have to explain this. I know... believe me... I do...

What about NanNewBufferHandle I refrained from baking it in [...]

I think we should throw it in. If I'm not mistaken all integer constructors return Local<Integer> (the base class), too. Returning an object seems OK to me.

I'll look into the other stuff.

@kkoopa
Copy link
Collaborator

kkoopa commented Oct 30, 2014

I updated the code in question to make sense, it was a copypaste from another
issue.

On Thursday 30 October 2014 08:06:17 David Siegel wrote:

Does this work?

[...]

v8::Local<v8::Value> v = NanNew(123456.7);

Hehe... given the code above what do you suppose it should do? No, I'm
afraid but overloading by return type is not part of the C++ language. The
type of an expression is an inherent feature of the expression. Looking
beyond the expression to deduce a type is not C++. For the same reason the
ternary operator has to return one single type. ;)

How are default arguments handled?

Although I haven't tried it yet, default arguments should just work as
expected. The compiler will pick the NanNew2(...) with the right number
of arguments and the matching, defaulted version of Factory<T>::New(...).
If you are asking about automatically forwarding the default arguments used
by v8 to nan, that again is just not possible. It is the main reason why
many C++ devs don't use default arguments at all.

Will it just be n specializations depended on the number of arguments?

Yes, I think so. There are techniques to deal with that but I think, given
the small n, we should KISS. We want nan to be easy to use and easy to
maintain. Let's not turn this into some template-hyper-macro-hotch-potch...
until we have to. ;)

BTW, they are not specializations but independent template functions. (I
know, it's a PITA).

In the beginning...

Relax. You really don't have to explain this. I know... believe me... I
do...

What about NanNewBufferHandle I refrained from baking it in [...]

I think we should throw it in. If I'm not mistaken all integer constructors
return Local<Integer> (the base class), too. Returning an object seems OK
to me.

I'll look into the other stuff.


Reply to this email directly or view it on GitHub:
#210 (comment)

@agnat
Copy link
Collaborator Author

agnat commented Oct 30, 2014

Like this?

Date vs. Number

@kkoopa
Copy link
Collaborator

kkoopa commented Oct 30, 2014

Yes, it looks good.

On Thursday 30 October 2014 08:31:46 David Siegel wrote:

Like this?

[Date vs.
Number](https://github.com/rvagg/nan/blob/260753e596c28dd24e1476225d166b190
6b25c9e/test/cpp/nan_sketch.cpp#L159-L165)


Reply to this email directly or view it on GitHub:
#210 (comment)

@kkoopa
Copy link
Collaborator

kkoopa commented Oct 30, 2014

And oh, I made the Integer constructors of NAN return the proper types, same with Date and everything else that was weird for no good reason. Every NanNew<T> returns a T, as it should be. That is why having the buffer constructor be the odd one out is unsettling and I decided to leave it out, with the excuse that it is not a v8 type, but Node's own.

@agnat
Copy link
Collaborator Author

agnat commented Oct 30, 2014

Hm, not sure NAN should interfere like that. Have to think about it.

@agnat
Copy link
Collaborator Author

agnat commented Nov 1, 2014

Scripts - bound and unbound

@kkoopa, could you tell me a bit about the string constructors? It looks like uint8_t * is treated as ASCII/Latin-1 and char * is used for UTF-8. Why is that? Also, a lot of manual copying into uint16_t buffers is going on in the pre v0.11 implementation. What's up with that?

@kkoopa
Copy link
Collaborator

kkoopa commented Nov 1, 2014

They are mainly treated that way because that's how v8 does it.

ASCII is only a 7-bit subset of Latin1, without the high bit set. The name used here is ASCII for legacy reasons, v8 removed it precisely because it never was conformant ASCII. It always was sort of Latin1 (no particular encoding was actually used), just a 1-byte charset using all 8 bits. Treating them as signed makes no sense whatsoever, a character cannot be negative. The main point is that they really should not be used, if you absolutely have to and cannot make your program correct, then make it explicit and jump through the hoop. "This is a raw (unsigned) byte stream, I really, really, do want this."

ASCII is a proper subset of UTF8, so if you actually have a proper ASCII-encoded string, it is also a proper UTF-8 encoded string, so you would just use that. However, if the high bit is set, it means it is a multi-byte character in UTF-8. It uses the char type just because that is the common way. A string literal is a const char *, this is the normal string, which should be used, so make it easy.

Old versions had no proper way of doing wide strings. That horrible loop is the only way, which is why later versions made sure to fix the problem.

@kkoopa
Copy link
Collaborator

kkoopa commented Nov 1, 2014

Does the script test pass with both old and new version? I had the problem that bound and unbound scripts were the same in old v8, and doing typedef v8::Script NanUnboundScript typedef v8::Script NanBoundScript led to ambiguity at some point and the compiler could not distinguish between them, as it thought they were the same.

@agnat
Copy link
Collaborator Author

agnat commented Nov 1, 2014

Yes, it works. The key is not to do anything about unbound scripts in old versions. The typedef is enough to make the "normal" script functions match... which is what we want.

@kkoopa
Copy link
Collaborator

kkoopa commented Nov 1, 2014

Aha, but is it then impossible to make the names match?

@agnat
Copy link
Collaborator Author

agnat commented Nov 1, 2014

Regarding strings: I do get the difference between the encodings. So the idea is to treat everything as UTF8/ASCII except the user casts to uint8_t*? Still don't get it... sorry...

What do you mean by "impossible to make the names match"? Like have a

typedef v8::Script NanBoundScript;

Sure. Why not? It's still there. I just don't use it because I don't like it. :-D

@kkoopa
Copy link
Collaborator

kkoopa commented Nov 1, 2014

uint16_t * is a wide string, that naturally maps to UCS2 (or UTF-16 or whatever it is called) in v8, which is what is used internally by the v8::String class.

int8_t * is char * on every sane platform (every one supported by Node), that maps to UTF-8, which is the best encoding ever, this is what you almost always want to use.

uint8_t * maps to StringFromOneByte and Latin1, this should not be used accidentally, so do not feed it normal char data, as it will mostly likely not produce the correct result. A char defaults to signed, so don't treat it as unsigned.

These are the correct signatures from v8:

/** Allocates a new string from UTF-8 data.*/
  static Local<String> NewFromUtf8(Isolate* isolate,
                                  const char* data,
                                  NewStringType type = kNormalString,
                                  int length = -1);

  /** Allocates a new string from Latin-1 data.*/
  static Local<String> NewFromOneByte(
      Isolate* isolate,
      const uint8_t* data,
      NewStringType type = kNormalString,
      int length = -1);

  /** Allocates a new string from UTF-16 data.*/
  static Local<String> NewFromTwoByte(
      Isolate* isolate,
      const uint16_t* data,
      NewStringType type = kNormalString,
      int length = -1);

From an end-user perspective it seems more intuitive to be able to use the name NanBoundScript if there is a NanUnboundScript. I like symmetry and consistency, that way you only need to remember one half, plus the fact that it is symmetric.

@kkoopa
Copy link
Collaborator

kkoopa commented Nov 1, 2014

Regarding the Scripts, you also need to correctly combine them with NanRunScript (or any other way of correctly executing both kinds of scripts across versions, this was the only way I came up with). Both bound and unbound scripts existed in old v8 too, they were just not distinguished with different types.

@agnat
Copy link
Collaborator Author

agnat commented Nov 3, 2014

So, that's it. The old implementation is still in place and the new implementation is in separate files. The new implementation is selected using #define NAN_NEW_NAN_NEW like here. All tests pass using both implementations. I also added about 80 tests. @kkoopa, can you take it from here?

@kkoopa
Copy link
Collaborator

kkoopa commented Nov 3, 2014

Sure, I can probably do that. Will take a closer look tomorrow. Thanks a million. But do add yourself to the list of contributors, this is certainly a major contribution.

On November 3, 2014 10:59:05 PM EET, David Siegel notifications@github.com wrote:

So, that's it. The old implementation is still in place and the new
implementation is in separate files. The new implementation is selected
using #define NAN_NEW_NAN_NEW like
here.
All tests pass using both implementations. I also added about 80 tests.
@kkoopa, can you take it from here?


Reply to this email directly or view it on GitHub:
#210 (comment)

@rvagg
Copy link
Member

rvagg commented Nov 4, 2014

@kkoopa if this is solid enough to make it in then we should add @agnat properly as a collaborator / owner. I'm just going to have to defer to you on whether this is worth replacing our existing NanNew code with.

@kkoopa
Copy link
Collaborator

kkoopa commented Nov 4, 2014

Yes, I had that in mind. It most definitely is worth replacing the existing hack with. It is easier to maintain, easier to read and simply more correct.

@agnat
Copy link
Collaborator Author

agnat commented Nov 4, 2014

To help you guys to gain a little more trust in this here are a few...

Design Notes:

Let's ignore the convenience functions for a second and get right down to the core. Looking at NanNew<T>(...) from an outside perspective it is a single entry point API to call constructors of T. The possible signatures of NanNew depend on T but other than that, they are quite arbitrary and ... colorful.

The first approach was to use one, massively overloaded name. I don't have to explain this. You all have seen the header file. Now, what's the problem with that? Let the author speak for himself. @kkoopa said:

In the beginning everything seemed nice and elegant with the current approach, but as more and more stuff was added, ambiguities were observed and needed resolving somehow.

In other words: It was bad

  • for the user, because of the large candidate lists in case of typos
  • for the maintainer, because it was harder and harder to find ways to sneak things through that pinhole
  • for the compiler, because it had to sieve through a massive amount of combinations trying to discern what the user wants.

So. scratch that. The massively overloaded name approach is an anti-pattern like the big ball of spaghetti. Back to the drawing board. Interestingly the key to this is the third point. If we can help the compiler to get a better grasp on what we want we improve the situation for both users and maintainers.

Now, how do we do that? Let's start with the entry points. We define a cluster of template functions like so:

template <typename T> v8::Local<T> NanNew() {}
template <typename T, typename A0> v8::Local<T> NanNew(A0 arg0) {}
template <typename T, typename A0, typename A1> v8::Local<T> NanNew(A0 arg0, A1 arg1) {}

and so on. We now have a function that takes a type argument and has an arbitrary signature. Now, recall that the set of legal signatures depends on type T. We're going to model that relation by adding a little template class:

template <typename T> struct Factory;

template <typename T> v8::Local<T> NanNew() {
  return Factory<T>::New();
}
template <typename T, typename A0> v8::Local<T> NanNew(A0 arg0) {
  return Factory<T>::New(arg0);
}

That changes things drastically. The compiler now first looks up Factory<T> and only then starts looking for a version of New that matches the arguments. We have implemented a compile time type dispatch. Now maintainers are able to route calls cleanly by providing implementations of New for certain types T:

template <>
struct Factory<v8::String> {
  static inline v8::Local<v8::String> New(const char* s) { /* ... */ }
};

The code above enables calls like

v8::Local<v8::String> s = NanNew<v8::String>("plonk");

On the user side things improve, too. If invoked with an invalid T the compiler will complain that Factory<T> is undefined. If the arguments are wrong the compiler will only list the candidates found in that factory and not all of them. Another advantage of this design is that Factory<T>::New is an overloaded function and not a template. Overloads provide a better user experience because all the "normal" implicit conversion rules apply. On the maintainer side it is no longer necessary to provide const and non-const versions. It just feels more ordinary and that is a good thing.

@kkoopa
Copy link
Collaborator

kkoopa commented Nov 4, 2014

What he said.

@rvagg
Copy link
Member

rvagg commented Nov 4, 2014

Ah, great explanation. Thanks for that.

@kkoopa
Copy link
Collaborator

kkoopa commented Nov 4, 2014

I think the best action plan would be: Release 1.4.1 with 9d8b270 and kkoopa/nan@e15c720 asap. Then I'll integrate this and #207 and we can call it 1.5.0.

@kkoopa
Copy link
Collaborator

kkoopa commented Nov 6, 2014

Am I correct in reading this that your pre-12 implementation always creates bound scripts? It should create unbound scripts (built through Script::New).

@agnat
Copy link
Collaborator Author

agnat commented Nov 6, 2014

Possible. Let me check.

@agnat
Copy link
Collaborator Author

agnat commented Nov 6, 2014

Good catch. Fixed.

@kkoopa kkoopa mentioned this pull request Nov 7, 2014
@kkoopa
Copy link
Collaborator

kkoopa commented Nov 7, 2014

Does the old caveat on NanNew<v8::String>(std::string) still apply?

Note: Using NanNew with an std::string is possible, however, you should ensure to use the overload version (NanNew(std::string)) rather than the template version (NanNew<v8::String>(std::string)) as there is an unnecessary performance penalty to using the template version because of the inability for compilers to appropriately deduce to reference types on template specialization.

@agnat
Copy link
Collaborator Author

agnat commented Nov 7, 2014

First things first: No such penalty. Calling NanNew<std::string>(...) and calling NanNew(std::string()) is the same thing, down to the assembler level. I'd be totally nonplussed if it isn't.

That said, I'm not quite sure what the note is referring to:

[...] because of the inability for compilers to appropriately deduce to reference types on template specialization.

It doesn't matter much, because it's a thing of the past. But if you have an example/article/stackoverflow at hand...

@kkoopa
Copy link
Collaborator

kkoopa commented Nov 7, 2014

It comes from here: e79e3d4
Wanted a reference to a string, got a copy.

@agnat
Copy link
Collaborator Author

agnat commented Nov 7, 2014

Hehe, yeah...

Thing is: There is no implicit conversion in template land. std::string and std::string const& are very different things. The compiler has a std::string, though, copied from the string literal. It picks the right specialization and... boom... pass by value. :-/

If you don't specialize on std::string but only on std::string const& it won't compile, because...? Exactly! There is no implicit conversion in template land. Now you're on it.

Good thing we're using overloads now.

@kkoopa
Copy link
Collaborator

kkoopa commented Nov 7, 2014

Yeah, that's what I figured. Good, I can update the documentation in the onepointfive branch.

@agnat
Copy link
Collaborator Author

agnat commented Nov 7, 2014

I just noticed that I eliminated another superfluous copy by replacing s.c_str() with &*s.begin(). :-D

@kkoopa
Copy link
Collaborator

kkoopa commented Nov 7, 2014

Did you fully duplicate the (distinct) tests in news.cpp, or might there be some left out? So that I may remove the old nannew tests.

@agnat
Copy link
Collaborator Author

agnat commented Nov 7, 2014

No, I wrote my own test suite without looking at the other tests. Some types are missing in my test, though (RegExp, Object).

I wouldn't drop any test right now. Having a few dups doesn't hurt. It will be a lot easier to sort things out after/while refactoring the tests. You wanted to do that anyway at some point, right?

@kkoopa
Copy link
Collaborator

kkoopa commented Nov 7, 2014

Ok, good. I'll leave them in then. Maybe refactor tests for the next major version.

@agnat
Copy link
Collaborator Author

agnat commented Nov 7, 2014

Now that I think of it, I don't believe we have (m)any duplicates. Your tests make sure the right value arrives on the JavaScript side, while my tests focus on the C++ side. They do call the same functions but they test very different things.

Regarding NanNew() the tests are probably better than we think they are.

@agnat agnat mentioned this pull request Nov 8, 2014
@agnat
Copy link
Collaborator Author

agnat commented Nov 13, 2014

I just noticed that I eliminated another superfluous copy by replacing s.c_str() with &*s.begin().

On second thought that wasn't a good idea. The right thing to do is to use s.data().

@kkoopa
Copy link
Collaborator

kkoopa commented Nov 20, 2014

I should remember to change this the 1.5 branch.

@agnat
Copy link
Collaborator Author

agnat commented Nov 20, 2014

Most def. It probably segfaults with some STLs... :-/

@kkoopa kkoopa merged commit 090a6e6 into nodejs:master Jan 14, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants