buffer: add buffer.transcode #9038
const Buffer = require('buffer').Buffer;
const normalizeEncoding = require('internal/util').normalizeEncoding;

if (process.binding('config').hasIntl) {
What about placing this in lib/internal/buffer-transcode.js and conditionally require()'ing it? Purely cosmetic, though, to prevent an extra level of indentation. Or you could just return early. :)
if (!process.binding('config').hasIntl)
return;
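The return-early option works because Node.js evaluates each CommonJS module inside a function wrapper, so a top-level `return` is legal and simply stops evaluating the rest of the file. A minimal demonstration with a throwaway module; `hasIntl` is a hypothetical hard-coded stand-in for `process.binding('config').hasIntl`, and the temp-file setup is only there to keep the demo self-contained.

```javascript
'use strict';
// Demonstrates that a CommonJS module may `return` at top level: Node wraps
// each module in a function, so the return just ends module evaluation.
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write a throwaway module to a fresh temp directory (demo scaffolding only).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'early-return-'));
const file = path.join(dir, 'transcode-module.js');
fs.writeFileSync(file, [
  "'use strict';",
  // Hypothetical stand-in for process.binding('config').hasIntl.
  'const hasIntl = false;',
  'exports.loaded = true;',
  'if (!hasIntl)',
  '  return;',
  'exports.transcode = function transcode() {};',
].join('\n'));

const mod = require(file);
console.log(mod.loaded);           // prints: true
console.log(typeof mod.transcode); // prints: undefined
```

Anything exported before the `return` stays exported, which is why the guard has to come before the `transcode` definition.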
// Buffer instance.
exports.transcode = function transcode(source, from_enc, to_enc) {
  if (!source || !(source.buffer instanceof ArrayBuffer))
    throw new TypeError('"source" argument must be a Buffer');
Are we going to get complaints about not supporting SharedArrayBuffer for this?
Eventually, perhaps. Not too worried about that for now.
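For context on the SharedArrayBuffer question: a Buffer can be a view over a SharedArrayBuffer, and a SharedArrayBuffer is not an ArrayBuffer, so the `source.buffer instanceof ArrayBuffer` check rejects such Buffers. A small sketch, assuming Node.js's `Buffer.from(arrayBuffer)` overload (which accepts a SharedArrayBuffer); `isUint8View` is a hypothetical alternative check, not what this PR uses.

```javascript
'use strict';
// A Buffer viewing shared memory rather than a plain ArrayBuffer.
const shared = Buffer.from(new SharedArrayBuffer(4));

// The check used in the draft under review.
const passesDraftCheck = (source) =>
  Boolean(source) && source.buffer instanceof ArrayBuffer;

// Hypothetical alternative: accept any Uint8Array view (Buffer subclasses
// Uint8Array), regardless of what kind of memory backs it.
const isUint8View = (source) => source instanceof Uint8Array;

console.log(passesDraftCheck(shared)); // prints: false
console.log(isUint8View(shared));      // prints: true
```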
Updated
Returns a new `Buffer` instance.

Throws if transcoding is not possible or if one of the specified encodings is
invalid or unknown.
Maybe give an example of when transcoding is not possible?
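One possible pair of examples, assuming a Node.js build with ICU where `require('buffer').transcode` exists. In the released API, a character that has no representation in the target encoding is replaced with a substitution character rather than rejected, whereas a target that is not a supported character encoding (here `'hex'`) makes the call throw.

```javascript
'use strict';
// Assumes a Node.js build with ICU, where buffer.transcode is defined.
const { transcode } = require('buffer');

// Lossy but allowed: '€' has no ASCII representation, so the converter
// emits a substitution character instead of failing.
const lossy = transcode(Buffer.from('€', 'utf8'), 'utf8', 'ascii');
console.log(lossy.toString('ascii')); // prints: ?

// Not possible: 'hex' is a binary-to-text encoding, not a supported
// transcoding target, so the call throws.
try {
  transcode(Buffer.from('abc', 'utf8'), 'utf8', 'hex');
} catch (err) {
  console.log(err.message);
}
```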
  return e;
}

#define THROW_ICU_ERROR(env, status, msg) \
Hm – this could be a ThrowICUError function, right?
MaybeLocal<Object> AsBuffer(Isolate* isolate,
                            MaybeStackBuffer<char>* buf,
                            size_t len) {
This is fine, but at some point this might become a member of the MaybeStackBuffer class? I realize that would conflict a bit with the MaybeStackBuffer<UChar> overload, so maybe leave a TODO here?
const icu = process.binding('icu');

// Maps the supported transcoding conversions. The top key is the from_enc,
// the child key is the to_enc. The value is the transcoding function to.
Is that final "to" residue from editing?
Yeah, slight brain malfunction there I think ;-)
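The nested map described in that code comment can be sketched as follows. The table entries and the `transcodeSketch` dispatcher are hypothetical illustrations of the structure (outer key = source encoding, inner key = target encoding, value = conversion function), not the PR's actual code.

```javascript
'use strict';
// Hypothetical nested conversion table: outer key = source encoding,
// inner key = target encoding, value = function doing that conversion.
const conversions = {
  ascii: {
    // Valid ASCII bytes (0x00-0x7F) mean the same thing in Latin-1 and UTF-8,
    // so these entries just copy the bytes.
    latin1: (source) => Buffer.from(source),
    utf8: (source) => Buffer.from(source),
  },
  latin1: {
    // Decode Latin-1 into a JS string, then re-encode that string as UTF-8.
    utf8: (source) => Buffer.from(source.toString('latin1'), 'utf8'),
  },
};

const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Hypothetical dispatcher: look up from -> to, throw if the pair is missing.
function transcodeSketch(source, fromEncoding, toEncoding) {
  if (!hasOwn(conversions, fromEncoding) ||
      !hasOwn(conversions[fromEncoding], toEncoding)) {
    throw new Error(`Unsupported conversion: ${fromEncoding} -> ${toEncoding}`);
  }
  return conversions[fromEncoding][toEncoding](source);
}

const out = transcodeSketch(Buffer.from([0xe9]), 'latin1', 'utf8');
console.log(out.toString('utf8')); // prints: é
```

The own-property checks keep inherited names such as `toString` from being mistaken for table entries.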
    return source.toString('base64');
  },
  'hex': (source) => {
    return source.toString('hex');
Mhhh, this returns a string or a Buffer depending on the target encoding? I don't think binary-to-text encodings should be allowed here; .toString() is the right method for them.
Yeah, you're right. I'll pull these back out.
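A quick illustration of the reviewer's point: binary-to-text encodings such as hex and base64 turn bytes into a string, which `Buffer#toString()` already does, so routing them through a Buffer-returning `transcode()` would give that function two different return types.

```javascript
'use strict';
// Binary-to-text encodings produce strings from bytes.
const buf = Buffer.from('hi', 'utf8'); // bytes 0x68 0x69

console.log(buf.toString('hex'));    // prints: 6869
console.log(buf.toString('base64')); // prints: aGk=

// The reverse direction is Buffer.from(string, encoding).
console.log(Buffer.from('6869', 'hex').equals(buf)); // prints: true
```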
5b65f74 to 62423a0
@addaleax ... updated! PTAL
LGTM, nice!
  const uint8_t hi = static_cast<uint8_t>(ts_obj_data[n + 0]);
  const uint8_t lo = static_cast<uint8_t>(ts_obj_data[n + 1]);
  swapspace[i] = (hi << 8) | lo;
}
Could this use SwapBytes16?
  const uint8_t hi = static_cast<uint8_t>(ts_obj_data[n + 0]);
  const uint8_t lo = static_cast<uint8_t>(ts_obj_data[n + 1]);
  swapspace[i] = (hi << 8) | lo;
}
(ditto)
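On the JavaScript side, the counterpart of a dedicated 16-bit byte-swap helper is `Buffer#swap16()`. This sketch compares a hand-rolled byte-pair loop (the hypothetical `swapManually`) against it. `swap16()` works in place and throws a RangeError when the length is odd, so the example swaps a copy.

```javascript
'use strict';
// Hypothetical hand-rolled swap: put the second byte of each pair first.
function swapManually(bytes) {
  const out = Buffer.alloc(bytes.length);
  for (let n = 0; n + 1 < bytes.length; n += 2) {
    out[n] = bytes[n + 1];
    out[n + 1] = bytes[n];
  }
  return out;
}

// 'hi' encoded as UTF-16BE: 0x00 0x68 0x00 0x69.
const utf16be = Buffer.from([0x00, 0x68, 0x00, 0x69]);

const viaLoop = swapManually(utf16be);
const viaHelper = Buffer.from(utf16be).swap16(); // copy first: swap16 mutates

console.log(viaLoop.equals(viaHelper));     // prints: true
console.log(viaHelper.toString('utf16le')); // prints: hi
```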
MaybeStackBuffer<char> buf;
int32_t len;

u_strToUTF8(*buf, 1024, &len, source, length, &status);
The 1024 seems kind of magic here, although I realize that is largely my fault. 😄 (Not sure if there's anything to do about that.)
Should be fixed now!
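One way to make such a constant less magic is to derive the output capacity from the input: every UTF-16 code unit becomes at most 3 UTF-8 bytes (a surrogate pair is two units and becomes 4 bytes). ICU also supports preflighting, i.e. calling `u_strToUTF8` with a zero-capacity destination to learn the required length. The sketch checks the 3-bytes-per-unit bound in JavaScript; `maxUtf8Length` is a hypothetical helper, not something from this PR.

```javascript
'use strict';
// Hypothetical helper: upper bound on the UTF-8 size of UTF-16 input.
// Each UTF-16 code unit needs at most 3 UTF-8 bytes; a surrogate pair is two
// code units and needs 4 bytes, which is still under its 6-byte bound.
function maxUtf8Length(ucs2ByteLength) {
  const codeUnits = Math.floor(ucs2ByteLength / 2);
  return codeUnits * 3;
}

for (const s of ['a', 'é', '€', '😀']) {
  const ucs2Bytes = Buffer.byteLength(s, 'utf16le');
  const utf8Bytes = Buffer.byteLength(s, 'utf8');
  console.log(`${s}: ${utf8Bytes} UTF-8 bytes, bound ${maxUtf8Length(ucs2Bytes)}`);
}
```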
// Transcodes the Buffer from one encoding to another, returning a new
// Buffer instance.
exports.transcode = function transcode(source, from_enc, to_enc) {
Style: s/from_enc/fromEncoding/ and s/to_enc/toEncoding/. Ditto for cnv_from and cnv_to.
    msg = "Unspecified ICU Exception";
  Local<String> cons =
      String::Concat(estring, FIXED_ONE_BYTE_STRING(env->isolate(), ", "));
  cons = String::Concat(cons, OneByteString(env->isolate(), msg));
I realize you adapted this code from elsewhere, but using snprintf() to format the error message will be much more efficient.
      String::Concat(estring, FIXED_ONE_BYTE_STRING(env->isolate(), ", "));
  cons = String::Concat(cons, OneByteString(env->isolate(), msg));

  Local<Value> e = Exception::Error(cons);
Needs an if (e.IsEmpty()) return Local<Value>(); check.
                            size_t len) {
  if (buf->IsAllocated()) {
    MaybeLocal<Object> ret = Buffer::New(isolate, buf->out(), len);
    buf->Release();
I think this should be if (!ret.IsEmpty()) buf->Release(); right now it leaks memory when the buffer can't be created.
  if (buf->IsAllocated()) {
    MaybeLocal<Object> ret =
        Buffer::New(isolate, reinterpret_cast<char*>(buf->out()), len);
    buf->Release();
Ditto.
  UChar* source = nullptr;
  MaybeStackBuffer<UChar> swapspace;
  if (IsLittleEndian()) {
    source = reinterpret_cast<UChar*>(ts_obj_data);
Same as above: strict aliasing violation and prone to crashing.
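In C++ the usual fix for this kind of aliasing problem is to copy the bytes into properly typed (and aligned) storage, for example with memcpy, instead of reinterpret_cast-ing the pointer. JavaScript's typed arrays make the alignment half of the hazard visible, as this sketch shows: a `Uint16Array` view starting at an odd byte offset is rejected, while a `DataView` can read 16-bit units at any offset.

```javascript
'use strict';
// Buffer.alloc() is not pooled, so `backing` starts at byteOffset 0.
const backing = Buffer.alloc(5);
backing.set([0x68, 0x00, 0x69, 0x00], 1); // 'hi' as UTF-16LE, at offset 1
const odd = backing.subarray(1);          // odd.byteOffset === 1

// Reinterpreting misaligned memory as 16-bit units fails.
let reinterpretError = null;
try {
  new Uint16Array(odd.buffer, odd.byteOffset, odd.length / 2);
} catch (err) {
  reinterpretError = err; // RangeError: offset is not a multiple of 2
}

// Safe alternative: read each unit explicitly as little-endian.
const view = new DataView(odd.buffer, odd.byteOffset, odd.length);
const units = [];
for (let i = 0; i < odd.length; i += 2)
  units.push(view.getUint16(i, true));

console.log(reinterpretError instanceof RangeError); // prints: true
console.log(String.fromCharCode(...units));         // prints: hi
```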
  } else {
    ThrowICUError(env, status, "Unable to transcode buffer");
  }
}
It seems like there is ample opportunity to share code between Ucs2FromUtf8 and Utf8FromUcs2, they are 80% identical.
Perhaps. For now I'm more inclined to keep these separate as it makes finding and tweaking bugs a bit easier. I'll take another pass in a separate PR to condense things down.
  }

  void Release() {
    buf_ = buf_st_;
Shouldn't this also reset length_?
      return env->ThrowTypeError("argument should be a Buffer"); \
  } while (0)

#define SPREAD_ARG(val, name) \
If you move this into a common header, you might want to give it a slightly less generic name; e.g. SPREAD_BUFFER_ARG.
@@ -21,7 +21,7 @@
       'toolsets': [ 'target' ],
       'direct_dependent_settings': {
         'defines': [
-          'UCONFIG_NO_CONVERSION=1',
+          #'UCONFIG_NO_CONVERSION=1',
Just remove the 'defines' block instead of commenting it out.
9e0464b to 2da087e
eecad5e to d7340ec
CI looks good. @bnoordhuis PTAL... LGTY?
ping @bnoordhuis
                         buf, v8::NewStringType::kNormal,
                         len).ToLocalChecked());
  if (e.IsEmpty())
    return Local<Value>();
If you're returning empty handles as an indicator, it's probably more "V8-ish" to have the return signature be MaybeLocal<Value> instead. I've been trying to do that in other locations myself.
  char buf[kStorageSize];
  int len = snprintf(buf, sizeof(buf), "%s [%s]", msg, u_errorName(status));

  Local<Value> e =
This creates Local<Value>s, but there's no HandleScope. If this callback is expected to always be called within an existing HandleScope (like MakeCallback), mind putting a comment at the top, also like MakeCallback (see src/node.h)?
                         len).ToLocalChecked());
  if (e.IsEmpty())
    return Local<Value>();
  Local<Object> obj = e->ToObject(env->isolate());
If we know this is a v8::Object, then we can use e.As<Object>(). That's also more explicit that no extra handle is being created.
  obj->Set(env->code_string(),
           String::NewFromUtf8(env->isolate(),
                               u_errorName(status), v8::NewStringType::kNormal)
               .ToLocalChecked());
The new v8::Maybe<T> API for v8::Object::Set() is annoying and ugly, but if we're going to use some of the new API, we might as well use all of it.
    if (!ret.IsEmpty()) buf->Release();
    return ret;
  }
  return Buffer::Copy(isolate, dst, len);
If you're manipulating the original memory, why bother taking a copy?
  if (U_SUCCESS(status)) {
    len = target - *buf;
    args.GetReturnValue().Set(
        AsBuffer(env->isolate(), &buf, len).ToLocalChecked());
if this operation fails, do we want to abort or throw?
  for (size_t n = 0, i = 0; i < length; n += 2, i += 1) {
    const uint8_t hi = static_cast<uint8_t>(ts_obj_data[n + 0]);
    const uint8_t lo = static_cast<uint8_t>(ts_obj_data[n + 1]);
    swapspace[i] = (lo << 8) | hi;
For a future performance enhancement: detect the alignment of the pointer and perform as many swaps as can be done in a single go.
const conversions = {
  'ascii': {
    'latin1': (source) => {
      return Buffer.from(source);
I'm a little confused by this whole object, but right here, if we're converting ascii to latin1, shouldn't we be passing 'latin1' as the encoding argument to Buffer.from()?
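For reference, this is what that entry does with a Buffer argument: `Buffer.from(buffer)` makes a byte-for-byte copy, and Node.js documents the encoding argument only for the string form, `Buffer.from(string[, encoding])`. Because every valid ASCII byte (0x00-0x7F) means the same character in Latin-1, a plain copy appears sufficient for this particular pair. A small sketch:

```javascript
'use strict';
// A Buffer holding valid ASCII text.
const source = Buffer.from('hello', 'ascii');

// Buffer.from(buffer) copies the bytes into new memory.
const copy = Buffer.from(source);

console.log(copy.equals(source)); // prints: true
console.log(copy === source);     // prints: false

// For ASCII-range bytes, decoding as Latin-1 gives the same text.
console.log(copy.toString('latin1') === source.toString('ascii')); // prints: true
```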
6f70820 to 65144bb
@bnoordhuis ... ok, reworked the implementation with an eye towards simplification and reducing duplication. PTAL
Thank you for the follow-up review @addaleax. Will get this landed tomorrow if there are no further objections.
Add buffer.transcode(source, from, to) method. Primarily uses ICU to transcode a buffer's content from one of Node.js' supported encodings to another. Originally part of a proposal to add a new unicode module. Decided to refactor the approach towards individual PRs without a new module.
Refs: nodejs#8075
65144bb to 4d7472b
New CI run after squashing: https://ci.nodejs.org/job/node-test-pull-request/4665/
green except for unrelated failures. landing
Add buffer.transcode(source, from, to) method. Primarily uses ICU to transcode a buffer's content from one of Node.js' supported encodings to another. Originally part of a proposal to add a new unicode module. Decided to refactor the approach towards individual PRs without a new module.
Refs: #8075
PR-URL: #9038
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Landed in e8eaaa7
If this is backported to any of the other release lines, it needs to come with #9838.
Checklist
make -j8 test (UNIX), or vcbuild test nosign (Windows) passes
Affected core subsystem(s)
buffer
Description of change
Add buffer.transcode(source, from, to) method. Primarily uses ICU to transcode a buffer's content from one of Node.js' supported encodings to another.
Originally part of a proposal to add a new unicode module. Decided to refactor the approach towards individual PRs without a new module.
Refs: #8075
/cc @trevnorris @addaleax
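A short usage sketch of the proposed buffer.transcode(source, from, to), assuming a Node.js build with ICU, where the method is defined. It converts the same UTF-8 input to UCS-2 and to Latin-1 and shows how the byte lengths change while the decoded text stays the same.

```javascript
'use strict';
// Assumes a Node.js build with ICU, where buffer.transcode is defined.
const { transcode } = require('buffer');

const utf8 = Buffer.from('héllo', 'utf8');        // 6 bytes: 'é' takes 2

const ucs2 = transcode(utf8, 'utf8', 'ucs2');     // 2 bytes per character
const latin1 = transcode(utf8, 'utf8', 'latin1'); // 1 byte per character

console.log(ucs2.length, ucs2.toString('ucs2'));       // prints: 10 héllo
console.log(latin1.length, latin1.toString('latin1')); // prints: 5 héllo
```

Transcoding the UCS-2 result back with `transcode(ucs2, 'ucs2', 'utf8')` should reproduce the original UTF-8 bytes.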