u8ArrayToString creates invalid utf8 strings #78
It's more of a documentation problem. u8ArrayToString produces bytes, even though internally the result is still a JS String object. So it doesn't guarantee UTF-8 validity, just as you can write any binary sequence with the host function write_storage in rust-sdk. Maybe this should be renamed u8ArrayToBytes, with a checked version provided as u8ArrayToString.
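The split proposed here (an unchecked bytes conversion plus a checked text conversion) can be sketched as follows. The names mirror the proposal but are illustrative only, not near-sdk-js exports. The checked path assumes a runtime that provides the standard `TextDecoder` (Node or a browser), which may not be available inside the contract VM.

```javascript
// Illustrative sketch of the proposed split; these are not near-sdk-js exports.

// Unchecked: each byte becomes one UTF-16 code unit (0-255), so the result is
// a "byte string" and is not guaranteed to be meaningful text.
function u8ArrayToBytes(array) {
  let result = "";
  for (const byte of array) {
    result += String.fromCharCode(byte);
  }
  return result;
}

// Checked: decodes the bytes as UTF-8; with fatal: true, malformed input
// throws a TypeError instead of being replaced with U+FFFD.
const strictUtf8Decoder = new TextDecoder("utf-8", { fatal: true });
function u8ArrayToString(array) {
  return strictUtf8Decoder.decode(array);
}

const emojiUtf8 = new Uint8Array([0xf0, 0x9f, 0x98, 0x87]);
console.log(u8ArrayToString(emojiUtf8)); // 😇
console.log(u8ArrayToBytes(emojiUtf8).length); // 4

try {
  u8ArrayToString(new Uint8Array([0xff])); // 0xFF never appears in valid UTF-8
} catch (err) {
  console.log(err instanceof TypeError); // true
}
```

Note that a checked decoder only catches malformed UTF-8; it cannot recover a string whose bytes were already corrupted on the encoding side.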
No, it's not just documentation; you can repro the overflow pretty easily:

```js
let initial = "😇";
console.log(initial);
// 😇
let arr = stringToU8Array(initial);
console.log(arr);
// [61, 7]
let s = u8ArrayToString(arr);
console.log(s);
// =�
```
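The `[61, 7]` output is consistent with keeping only the low byte of each UTF-16 code unit: "😇" (U+1F607) is stored as the surrogate pair 0xD83D 0xDE07, and the low bytes are 0x3D (61, "=") and 0x07. The sketch below reconstructs that behavior under this assumption; `truncatingStringToU8Array` is a hypothetical name, not the library's source.

```javascript
// Assumption: the old stringToU8Array kept only the low byte of each UTF-16
// code unit. This reconstruction reproduces the [61, 7] output; it is not the
// library's actual implementation.
function truncatingStringToU8Array(str) {
  const out = new Uint8Array(str.length); // one slot per UTF-16 code unit
  for (let i = 0; i < str.length; i++) {
    out[i] = str.charCodeAt(i); // Uint8Array stores the value modulo 256
  }
  return out;
}

const initial = "😇"; // U+1F607, stored as the surrogate pair 0xD83D 0xDE07
console.log([initial.charCodeAt(0), initial.charCodeAt(1)].map((u) => u.toString(16)));
// [ 'd83d', 'de07' ]

const arr = truncatingStringToU8Array(initial);
console.log(Array.from(arr)); // [ 61, 7 ]

const back = String.fromCharCode(...arr);
console.log(back === "=\u0007"); // true: the emoji is gone
```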
If you consider u8ArrayToString to actually be u8ArrayToBytes, this is the expected behavior.
@austinabell Please check the fix, thanks! Also, the new interface indicates that this is no longer valid input:
Bytes must be a string containing only
Reopened because the bug was not fixed:
For more context, JS strings are UTF-16, so forcing them into UTF-8 by truncation will lead to bugs. Maybe not directly from our tools, but indirectly, definitely.
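For comparison, the standard `TextEncoder`/`TextDecoder` APIs (available in Node and browsers; availability inside the contract VM is not assumed here) encode a JS string as real UTF-8 and decode it back losslessly. The emoji is 2 UTF-16 code units but 4 UTF-8 bytes, which is why a one-byte-per-code-unit conversion cannot be correct.

```javascript
// TextEncoder always produces UTF-8; TextDecoder (default "utf-8") reverses it.
const s = "😇";
console.log(s.length); // 2 UTF-16 code units

const utf8 = new TextEncoder().encode(s);
console.log(Array.from(utf8)); // [ 240, 159, 152, 135 ]

const decoded = new TextDecoder().decode(utf8);
console.log(decoded === s); // true: the round trip is lossless
```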
@austinabell
This is like every other JS library that needs to enforce types. JS doesn't have compile-time type checks, so types must be enforced either via documentation or via runtime checks. For smart contracts, runtime checks everywhere can be expensive for experienced users, so we provide both a safe and an unsafe path.
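A minimal sketch of what a safe/unsafe pair for byte strings could look like, assuming a "bytes" string is one whose code units are all in the range 0-255. Both function names are hypothetical and the real near-sdk-js functions may differ. The checked variant pays for one extra pass over the input, which is the runtime cost mentioned above.

```javascript
// Hypothetical names; the real near-sdk-js functions may differ.

// Unsafe path: trusts the caller that every code unit is already a byte.
function bytesToU8ArrayUnchecked(bytes) {
  const out = new Uint8Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) {
    out[i] = bytes.charCodeAt(i); // silently wraps values above 255
  }
  return out;
}

// Safe path: same conversion, but rejects any code unit that is not a byte.
function bytesToU8ArrayChecked(bytes) {
  for (let i = 0; i < bytes.length; i++) {
    const unit = bytes.charCodeAt(i);
    if (unit > 0xff) {
      throw new RangeError(`code unit 0x${unit.toString(16)} at index ${i} is not a byte`);
    }
  }
  return bytesToU8ArrayUnchecked(bytes);
}

console.log(Array.from(bytesToU8ArrayChecked("\u0000\u00ff"))); // [ 0, 255 ]
console.log(bytesToU8ArrayUnchecked("😇")[0]); // 61 (wrapped)
```

Calling `bytesToU8ArrayChecked("😇")` throws a `RangeError` instead of silently producing `[61, 7]`.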
Also, as you noticed, the function for the other direction is now named as shown at line 81 in commit 37965b8.
Since we intentionally dropped u8ArrayToString and the behavior bytesToU8Array provides is correct, I'm closing this issue. As Austin Abell said, none of the bytes functions, including bytesToU8Array, can prevent misuse when they are passed an arbitrary string; this is tracked in #117.
Since it's unchecked, the strings it creates will be invalid. I'm unsure whether this is lossy if JS keeps the internal bytes, but it seems unsafe.
There also seems to be an issue in the other direction of this conversion: trying to parse a non-Unicode/UTF-8 character causes an overflow.
I was unsure how the JSVM handles this; after testing, it seems to just overflow, so someone could potentially overwrite values they don't have permission for.
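The overwrite risk can be illustrated with a collision: under the low-byte truncation assumed above, different strings map to identical bytes. For example, "Ľ" (U+013D) and "=" (U+003D) both become 61. The sketch below uses a plain `Map` as a hypothetical stand-in for contract storage keyed by raw bytes; it is not the NEAR storage API.

```javascript
// Assumption: bytes are produced by keeping the low 8 bits of each code unit,
// matching the [61, 7] behavior reported in this thread.
function truncateToBytes(str) {
  const out = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) out[i] = str.charCodeAt(i);
  return out;
}

// Hypothetical storage keyed by the truncated bytes (not the NEAR storage API).
const storage = new Map();
function storageKey(str) {
  return Array.from(truncateToBytes(str)).join(",");
}

storage.set(storageKey("balance:="), "victim");
// "\u013d" (Ľ) truncates to 0x3D, the same byte as "="
storage.set(storageKey("balance:\u013d"), "attacker");

console.log(storage.size); // 1: both keys collided
console.log(storage.get(storageKey("balance:="))); // attacker
```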