u8ArrayToString creates invalid utf8 strings #78

austinabell · 2022-06-03T00:31:58Z

Since it's unchecked, the strings created by it will be invalid strings. Unsure if this is lossy if JS keeps the internal bytes, but this seems unsafe to do.

The issue seems to be also if you're trying to parse a non unicode/utf8 character for the other side of this conversion that there will be an overflow. ~~Unsure how the jsvm handles this~~ tested and seems this just overflows, so someone could potentially overwrite values they don't have permissions for

ailisp · 2022-06-06T07:29:49Z

It's more like a problem of the documentation. u8ArrayToString is creating bytes, although internally it's still JS String object. So it doesn't guarantee UTF-8 validity, just like you can write any binary sequence with host function write_storage in rust-sdk. Maybe this should be named u8ArrayToBytes and provide a checked version that does u8ArrayToString

austinabell · 2022-06-06T14:52:12Z

No it's not just documentation, you can repro the overflow pretty easily:

let initial = "😇";
console.log(initial);
// 😇
let arr = stringToU8Array(initial);
console.log(arr);
// [61, 7]
let s = u8ArrayToString(arr);
console.log(s);
// =�

ailisp · 2022-06-07T06:05:25Z

If you consider u8ArrayToString is actually u8ArrayToBytes, you know it's the expected behavior

ailisp · 2022-06-07T06:12:00Z

@austinabell Please check the fix, thanks! Also, the new interface will indicate this is no longer a valid input:

let initial = "😇";
console.log(initial);
// 😇
let arr = bytesToU8Array(initial);

Bytes must be a string containing only \x00 to \xff chars.

austinabell · 2022-06-17T20:33:20Z

re-opened because the bug was not fixed:

let initial = "😇";
console.log(initial);
// 😇
let arr = bytesToU8Array(initial);
console.log(arr);
// [61, 7]
let s = u8ArrayToBytes(arr);
console.log(s);
// =�

For more context, JS strings are utf-16 so forcing them to be utf8 by truncation will lead to bugs. Maybe not directly from our tools, but indirectly definitely.

ailisp · 2022-06-20T01:04:29Z

@austinabell
No it's fixed, the snippet above is the unexpected way to use it (I should have this documented in README),

bytesToU8Array expect a bytes, not a JS String. With current API, the safe mode is user should wrap bytes with bytes API: bytesToU8Array(bytes(initial)), and this will fail (bytes will check if every byte in the initial is between --255). For the unsafe use, user need to ensure what passed to bytesToU8Array is really a valid bytes, while the above snippet doesn't hold.

This is like all other JS library which needs to enforce the type. JS doesn't have compile time type check, so either it can be enforced via documentation or runtime checks. For smart contract, runtime check everywhere can be expensive to experienced users, so we provide the both safe and unsafe path.

ailisp · 2022-06-20T01:10:48Z

Also, as you noticed, the function in other way is now called: u8ArrayToBytes instead of u8ArrayToString. It's by design to return bytes instead of string. So the reverse way is also not a bug.

See:
https://github.com/near/near-sdk-js/pull/103/files#diff-3274f1a37032fb0ae4e2823def0007c634e869ae0dfc304ff6a12c36513c3a52
and

near-sdk-js/README.md

Line 81 in 37965b8

    
           - `Bytes` in both arguments and return represent a byte buffer, internally it's a JavaScript String Object. Any binary data `0x00-0xff` is stored as the char '\x00-\xff'. This is because QuickJS doesn't have ArrayBuffer in C API. If the bytes happens to have only 1-byte chars, it happens to be same as the the same content string.

ailisp · 2022-07-20T07:52:13Z

As we intentionally dropped u8ArrayToString, and what bytesToU8Array provided behavior is right, I'm closing this issue. As Austin Abell said, all bytes function can't prevent misuse by passing a string, include bytesToU8Array, which is tracked on #117

austinabell added the bug Something isn't working label Jun 3, 2022

ailisp mentioned this issue Jun 7, 2022

Fix inappropriate naming of u8Array<>bytes #98

Merged

volovyks closed this as completed in #98 Jun 7, 2022

austinabell reopened this Jun 7, 2022

This was referenced Jun 23, 2022

feat: Switch SDK to TypeScript #82

Merged

Change Bytes from string alias to avoid misuse #117

Closed

ailisp closed this as completed Jul 20, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

u8ArrayToString creates invalid utf8 strings #78

u8ArrayToString creates invalid utf8 strings #78

austinabell commented Jun 3, 2022 •

edited by exalate-issue-sync bot

Loading

ailisp commented Jun 6, 2022

austinabell commented Jun 6, 2022

ailisp commented Jun 7, 2022

ailisp commented Jun 7, 2022

austinabell commented Jun 17, 2022 •

edited

Loading

ailisp commented Jun 20, 2022

ailisp commented Jun 20, 2022 •

edited

Loading

ailisp commented Jul 20, 2022

u8ArrayToString creates invalid utf8 strings #78

u8ArrayToString creates invalid utf8 strings #78

Comments

austinabell commented Jun 3, 2022 • edited by exalate-issue-sync bot Loading

ailisp commented Jun 6, 2022

austinabell commented Jun 6, 2022

ailisp commented Jun 7, 2022

ailisp commented Jun 7, 2022

austinabell commented Jun 17, 2022 • edited Loading

ailisp commented Jun 20, 2022

ailisp commented Jun 20, 2022 • edited Loading

ailisp commented Jul 20, 2022

austinabell commented Jun 3, 2022 •

edited by exalate-issue-sync bot

Loading

austinabell commented Jun 17, 2022 •

edited

Loading

ailisp commented Jun 20, 2022 •

edited

Loading