Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discussion] FFI - Giving Buffer more low-level C functionality #1750

Closed
wants to merge 61 commits into from

Conversation

TooTallNate
Copy link
Contributor

This is a PR, with the intention of starting/continuing the discussion about bringing FFI into node/iojs core.

Overall, the biggest benefit to integrating an FFI interface to node-core is that we can avoid C++ compiled addons, for the common case (where extreme performance requirements would be the uncommon case). No more compiling is huge for the UX, especially on Windows.

As of now, this PR adds some low-level functions to the Buffer object. I believe that these are the building blocks necessary to remove the C++ compiled portion of ref, and leaves room to implement other kinds of native interfaces (we could build js-ctypes API on top of these low-level functions as well… I'm planning on doing exactly that if this gains momentum!)

New Buffer APIs

readInt64BE(), readInt64LE(), readUInt64BE(), and readUInt64LE()

Reads an (u)int64 value from the Buffer. Similar to the other read* functions, but if the int64 value falls outside the JS Number max value (±9007199254740992) then a String will be returned instead of a Number. This falls in-line with most Int64 library APIs, including the ctypes Int64 constructor, which accepts either a Number or String.


writeInt64BE(), writeInt64LE(), writeUInt64BE(), and writeUInt64LE()

Complimentary to the read* functions, these allow int64 range integers to be written to Buffer instances. A JS Number may be passed in, and it will be written accordingly. Alternatively, a JS String may be passed in and it will be converted to an (u)int64 via strtoll/strtoull and then written to the Buffer instance (there would probably be value in accepting a Buffer instance that already has a (u)int64 written to it, but that's not currently implemented).


readPointerBE() and readPointerLE()

These functions create a Buffer (pointer) instance from the memory address written to the parent buffer. This is essentially the "dereference" operator in C. Very important for interacting with C libraries via FFI!

Not implemented at the moment, but we could add an optional callback function here that would be invoked during the Buffer free callback, which the user could hook on to to free allocated memory from a C library, for example. This is easy to add if there's no objections.


writePointerBE() and writePointerLE()

The compliment to the readPointer* functions. Essentially the "reference" operator in C. Allows the user to write the memory address of one Buffer instance to an offset in another buffer.


address()

Returns a String representation of the hex address of the pointer in memory.


inspect()

The REPL inspect() function for Buffer gets a cute little update as well thanks to the new address() function. We include the hex pointer address after the constructor name:

> new Buffer(8)
<Buffer@0x10187aa08 00 20 00 00 00 00 00 00>

Missing pieces

So like mentioned above, this is really just the low-level C functionality that we need added as methods to Buffer. There's still some new interfaces we'll need to expose in order to implement a fully-featured FFI interface, or implement js-ctypes. Perhaps we can hash out the details of these in this thread as well:

(not sure about the _ prefix on ffi and dl... just wanted to be conscious about not trampling over the existing npm package names)


Obviously, we'll need to compliment this with good docs and tests (some of which I could pull from ref most likely) before it's ready for merge. I just wanted to get this PR open first to see if there's momentum, and other comments/adjustments that will be needed to be addressed as well. Let's get the 🔮 rolling!

Thanks for reading. Cheers! 🍻

/cc @bnoordhuis @piscisaureus @rvagg @indutny @isaacs @kkoopa @trevnorris @rauchg @saghul @tracker1 @nodejs/build et al.

// or String, based on whether or not a JS Number will lose precision.
// http://stackoverflow.com/q/307179/376773
#define JS_MAX_INT +9007199254740992LL
#define JS_MIN_INT -9007199254740992LL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of these should be 9007199254740991 instead, since var n = 9007199254740992; console.log(n + 1); will output 9007199254740992 also.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto to this. Number.{MAX,MIN}_SAFE_INTEGER return the same as @mscdex suggested.

@mscdex
Copy link
Contributor

mscdex commented May 20, 2015

I'm not particularly a fan of 'return a string or number depending on the value' for the read*() 64-bit integer functions. Keeping to one type avoids having to do conditionals everywhere.

@TooTallNate
Copy link
Contributor Author

Keeping to one type avoids having to do conditionals everywhere.

The idea is that usually you'd be passing the return value to an Int64 JS implementation, for example the ctypes Int64 constructor or node-int64 or node-bigint, etc., which accept both numbers and strings.

@kkoopa
Copy link

kkoopa commented May 20, 2015

Buffer will be completely rewritten based on typed arrays due to #1451.

@@ -422,6 +426,16 @@ Buffer.prototype.fill = function fill(val, start, end) {
};


Buffer.prototype.address = function address() {
return binding.address(this);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have preliminary checks here that check that this is a Buffer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a note, both this check and the HasInstance() checks are necessary. Because the object must be an instance of Buffer, and it must contain external array data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JS-side instanceof Buffer checks seem to current only happen in the write*() functions.
fill(), slice(), toString(), etc. don't seem to have these checks. Is that intentional?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. They should in fact have these checks.

EDIT: I'll take care of the other cases in a different PR. Thanks for bringing them up.

@trevnorris
Copy link
Contributor

Pointed out a couple things, but I believe more discussion will need to be had about how this fits in with the soon upcoming massive internal change to Buffer.

@Fishrock123 Fishrock123 added buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. discuss Issues opened for discussions and feedbacks. labels May 20, 2015
@chrisdickinson
Copy link
Contributor

cc'ing @feross into this thread, since he maintains a browser compatibility shim for buffers and may have thoughts!

@TooTallNate TooTallNate changed the title [Idea/Discussion] Giving Buffer more low-level C functionality [Discussion] FFI - Giving Buffer more low-level C functionality May 21, 2015
@TooTallNate TooTallNate added the semver-minor PRs that contain new features and should be released in the next minor version. label May 22, 2015

#ifdef _WIN32
#define __alignof__ __alignof
#define snprintf(buf, bufSize, format, arg) _snprintf_s(buf, bufSize, _TRUNCATE, format, arg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also seems to be completely unnecessary. Whoops! Removing.

The errno EINVAL behavior is a BSDism, and won't work
cross platform. Instead, pass in the char* endptr value
and check where it ended up.
@TooTallNate
Copy link
Contributor Author

Pointer-related functions removed; will be moved to the ffi module in #1865.

strto(u)ll() usage updated to use an endptr variable, and base 10 radix as per @bnoordhuis' requests.

I think this one is ready for squashing!

@trevnorris
Copy link
Contributor

@TooTallNate Great work. Glad to see the 64 bit operations added. Though I would like to see the Buffer address removed from the console output. The hex values from multiple buffers will be misaligned if address length is different.

@rvagg @bnoordhuis Does .address() fit under the same scope of pointer concerns that was discussed, or is the information benign enough that it doesn't present any other security concern?

On the side, I'd like to handle merging all buffer changes from master to next. Can that be done at any time, or should wait for the next release that includes these changes?

const char* formatter) {
Environment* env = Environment::GetCurrent(args);

ARGS_THIS(args[0].As<Object>());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to check CHECK(HasInstance(args[0])); before doing this. It's easy enough to manipulate the prototype from JS to allow bypassing the JS check and then fatally error here. I don't mind the abort that CHECK() would cause because identifying the issue is much easier.

@trevnorris
Copy link
Contributor

@TooTallNate Think it's a usability issue that I couldn't b.writeUint64LE(b.address())? Doesn't concern me either way, but I see people wanting to potentially do this.

Also, I'll do a follow up PR adding benchmarks for the new read/write methods. Been meaning to give those some love anyway, so don't worry about that here.

@TooTallNate
Copy link
Contributor Author

Think it's a usability issue that I couldn't b.writeUint64LE(b.address())? Doesn't concern me either way, but I see people wanting to potentially do this.

Without being able to readPointer() the memory address back out to a new Buffer instance, I don't think it's exposing much.

@jasnell
Copy link
Member

jasnell commented Nov 16, 2015

@TooTallNate @trevnorris ... any updates on this one?

@aayushkapoor206
Copy link

👍

@tracker1
Copy link

Not seeing anything recent in this thread.. has there been any continued effort towards this?

@TooTallNate
Copy link
Contributor Author

I'm still here and can rebase, etc. but I think I need another core committer to champion this effort or I'm afraid it'll code-rot again.

@tracker1
Copy link

@TooTallNate thanks... honestly, I wish I were more skilled with this level of programming, as I feel getting ffi mainlined would benefit node a lot. IMHO It should have been a core feature much earlier on.

@trevnorris
Copy link
Contributor

@TooTallNate Because of the size of this feature, I may suggest writing a proposal in https://github.com/nodejs/node-eps. If it gets hammered out there and is merged I'll volunteer to help get the patch into core.

@jasnell jasnell added the stalled Issues and PRs that are stalled. label Mar 22, 2016
@jasnell
Copy link
Member

jasnell commented Mar 22, 2016

Marking as stalled. If there's no action on this one in the next few weeks I recommend closing

@jasnell
Copy link
Member

jasnell commented Apr 4, 2016

Closing due to lack of further activity. @TooTallNate ... this is likely something that would still be worthwhile. As @trevnorris has discussed, however, using nodejs/node-eps would likely be the better forum to get things moving.

@jasnell jasnell closed this Apr 4, 2016
@seishun
Copy link
Contributor

seishun commented Aug 31, 2017

@jasnell I would like to add {read|write}[U]Int64{LE|BE} methods to Buffer, which would basically be a small subset of this PR. Does that require a node-eps proposal?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. discuss Issues opened for discussions and feedbacks. semver-minor PRs that contain new features and should be released in the next minor version. stalled Issues and PRs that are stalled.
Projects
None yet
Development

Successfully merging this pull request may close these issues.