Use uint8array #308

ailisp · 2022-11-22T10:00:55Z

This PR will fix the issues mentioned in #117 and #195.

- Implement "Add UTF-8 encoding / decoding and native conversion between Uint8Array and string" (#314).
- Breaking change: redesign all relevant APIs to take Uint8Array instead of Bytes, and drop Bytes.
- Redesign all collections to take string keys. A key can now be any UTF-8 string, not only a latin1 string, and all collection APIs remain backward compatible.
- Low-level API tests, bytes tests, and collection tests work.
- Fix all tests.
- Add tests covering every conversion/encoding case between string and Uint8Array:
  - any Uint8Array (UTF-8 sequence or non-UTF-8 sequence) to a latin1 string
  - any latin1 string (UTF-8 sequence or non-UTF-8 sequence) to Uint8Array
  - any Uint8Array (UTF-8 sequence or non-UTF-8 sequence) to a UTF-8 string
  - any UTF-8 string (UTF-8 chars, latin1 chars that happen to form a UTF-8 sequence, latin1 chars) to Uint8Array
  - collections (JSON.stringify state) and JSON.stringify of UTF-8 input
- Fix all examples.
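The conversion cases listed above can be illustrated with a minimal sketch. It uses the standard TextEncoder / TextDecoder for UTF-8 and two hand-written helpers for latin1. The helper names `latin1Encode` and `latin1Decode` are hypothetical and are not claimed to be the functions this PR adds.

```typescript
// Hypothetical helpers for illustration; not the SDK's actual API.

// latin1: each char code (0..255) maps to exactly one byte.
function latin1Encode(s: string): Uint8Array {
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    if (code > 255) {
      throw new Error(`char code ${code} at index ${i} is not latin1`);
    }
    out[i] = code;
  }
  return out;
}

// latin1: each byte maps back to the char with the same code.
function latin1Decode(bytes: Uint8Array): string {
  let s = "";
  for (let i = 0; i < bytes.length; i++) {
    s += String.fromCharCode(bytes[i]);
  }
  return s;
}

const utf8Encoder = new TextEncoder();
const utf8Decoder = new TextDecoder(); // defaults to UTF-8

// A byte sequence that is not valid UTF-8 still round-trips through latin1.
const arbitrary = new Uint8Array([0x00, 0x7f, 0x80, 0xff]);
const viaLatin1 = latin1Encode(latin1Decode(arbitrary));

// A string with a char outside latin1 round-trips through UTF-8.
const text = "h\u00e9llo \u6c34";
const viaUtf8 = utf8Decoder.decode(utf8Encoder.encode(text));
```

The latin1 pair is byte-preserving for any input bytes, while the UTF-8 pair is string-preserving for any well-formed string; that difference is presumably why both directions need separate test cases.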


ailisp · 2022-12-08T10:03:49Z

All CI tasks pass locally, so I'm marking this ready for review. @aesilevich, please review the changes in the C code.

volovyks

Great work @ailisp ! Huge effort.
My main suggestion is to move all the conversions to the API level. I left a few comments about that. But we can discuss it separately, feel free to merge this.

examples/__tests__/test-cross-contract-call.ava.js

packages/near-sdk-js/src/api.ts

volovyks · 2022-12-09T14:56:23Z

packages/near-sdk-js/src/api.ts

-  storage_remove(key: Bytes, register: Register): bigint;
+  storage_read(key: Uint8Array, register: Register): bigint;
+  storage_has_key(key: Uint8Array): bigint;
+  storage_write(key: Uint8Array, value: Uint8Array, register: Register): bigint;


We talked about it, but again, what defines if we have string or Uint8Array here? And is it possible to do the conversion in api.ts so users can use strings? I guess they expect "key" to be a string.

If these functions also did the high-level task of encoding / decoding strings to and from Uint8Array, users would have no API for dealing with raw bytes. In the current form they can compose the high-level functionality themselves: write a string to state by combining storage_write with bytes/encode.

> what defines if we have string or Uint8Array here

For APIs like current_account_id(), we are sure the return value is a string. APIs like storage_read take and return arrays of arbitrary bytes, which fit Uint8Array better.

> And is it possible to do the conversion in api.ts so users can use strings?

Yes. This is discussed further under your next comment.

Actually the user has it, because they can use env.function() without the API wrapper function. But it's less convenient, I agree.

Maybe we can separate this into two issues: function params and return types. Function params can accept string | Uint8Array, and return types can be Uint8Array if really necessary. Having both functionRaw and 'function' is also an option.

It looks like a purely DevEx problem, let's get input from DevRel team.

I feel the function and functionRaw distinction makes sense. At the same time, I haven't seen devs using these functions or asking about them often, so implementing two functions when they aren't popular feels contradictory. In any case, having a function that accepts a single param type is closer to the single-responsibility principle.

Supporting points:

> ... it's less convenient, I agree.
> functionRaw and 'function' is also an option.
> function and functionRaw distinction makes sense
> having a function that accepts a single param type is closer to single resp principle.

About the alternative:

> Actually user has it, because they can use env.function() without API wrapper function

Right now env is not exported, and we need more discussion on whether to export it (it would duplicate the host function APIs).

> Function params can accept string | Uint8Array and return types can be Uint8Array if really necessary

I thought about it. function(string | Uint8Array) -> Uint8Array would still be a breaking change because the return type was string, and returning string | Uint8Array based on the param type is kind of magic.
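To make the "kind of magic" concern concrete, here is a minimal sketch of a function whose return type follows its parameter type, written with TypeScript overloads. The `read` function, the in-memory `store`, and the `mapKey` helper are all hypothetical stand-ins for illustration, not part of the SDK.

```typescript
// Hypothetical in-memory stand-in for contract storage, for illustration only.
const store = new Map<string, Uint8Array>();
const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Build a one-char-per-byte Map key so arbitrary key bytes stay distinct.
function mapKey(key: Uint8Array): string {
  return Array.from(key, (b) => String.fromCharCode(b)).join("");
}

// Overloads: the return type silently follows the parameter type.
function read(key: string): string | null;
function read(key: Uint8Array): Uint8Array | null;
function read(key: string | Uint8Array): string | Uint8Array | null {
  const rawKey = typeof key === "string" ? encoder.encode(key) : key;
  const value = store.get(mapKey(rawKey));
  if (value === undefined) return null;
  // String in, string out; bytes in, bytes out.
  return typeof key === "string" ? decoder.decode(value) : value;
}

store.set(mapKey(encoder.encode("greeting")), encoder.encode("hi"));
const asString = read("greeting"); // typed as string | null
const asBytes = read(encoder.encode("greeting")); // typed as Uint8Array | null
```

The same stored value comes back in two different shapes depending on how the key was passed, which is the implicit behavior the comment above argues against in favor of two explicitly named functions.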

> At the same time, haven't seen devs using these functions or asking about them often,

Yeah, it is not often. I've seen one ask: #195 (comment)

Combining your thoughts, I'll go ahead with function + functionRaw. (I assume Serhii is okay with either functionRaw or functionBytes and @idea404 prefers functionRaw; the naming of this less popular low-level API can be changed or aliased later.)

volovyks · 2022-12-09T14:57:29Z

packages/near-sdk-js/src/collections/lookup-map.ts

    const storageKey = this.keyPrefix + key;
-    return near.storageHasKey(storageKey);
+    return near.storageHasKey(encode(storageKey));


I'm talking about places like this. Can we move this conversion to the API layer?

What do you think of this: have two APIs, storageHasKey and storageHasKeyRaw, where storageHasKey(k) = storageHasKeyRaw(encode(k)). The benefit is that it is backward compatible, and in most cases the string version is what the user wants. They can still manipulate raw bytes with the raw version.

The disadvantage is that this API might hide the fact that state is binary. Anyone attempting to use the low-level API is expected to already know the nomicon. And to me it's unexpected that near.storageHasKey is not env.storage_has_key but near.storageHasKeyRaw is, which makes the API naming inconsistent. For example, input still returns Uint8Array; should there be inputRaw and input?
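The relationship storageHasKey(k) = storageHasKeyRaw(encode(k)) can be sketched as below. This is only an illustration under stated assumptions: the `state` map and `toHex` helper are hypothetical stand-ins for the host storage, and `encode` is assumed to be UTF-8 encoding via the standard TextEncoder.

```typescript
// Hypothetical stand-in for the host storage; keys are kept as hex strings
// so that arbitrary key bytes map to distinct Map keys.
const state = new Map<string, Uint8Array>();
const toHex = (bytes: Uint8Array): string =>
  Array.from(bytes, (b) => (b < 16 ? "0" : "") + b.toString(16)).join("");

// Assumed encoding step: UTF-8 via the standard TextEncoder.
const encode = (s: string): Uint8Array => new TextEncoder().encode(s);

// Raw version: works directly on bytes.
function storageHasKeyRaw(key: Uint8Array): boolean {
  return state.has(toHex(key));
}

// String version: a thin wrapper that encodes, then delegates.
function storageHasKey(key: string): boolean {
  return storageHasKeyRaw(encode(key));
}

state.set(toHex(encode("prefix:alice")), new Uint8Array([1]));

const foundByString = storageHasKey("prefix:alice");
const foundByBytes = storageHasKeyRaw(encode("prefix:alice"));
const missing = storageHasKey("prefix:bob");
```

With this layering, call sites such as the lookup-map change above would not need their own encode call, while byte-level callers keep a direct path.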

Hm... maybe not raw, maybe inputBytes() ?
(more comments above)

inputBytes sounds good to me

tests/__tests__/test_promise_api.ava.js

volovyks · 2022-12-09T15:13:33Z

tests/src/highlevel-promise.js

@@ -120,7 +120,7 @@ export class HighlevelPromiseContract {
    return {
      ...callingData(),
      promiseResults: arrayN(near.promiseResultsCount()).map((i) =>
-        near.promiseResult(i)
+        str(near.promiseResult(i))


Are there any use cases where we want to represent promiseResult as something other than a string?
(this one is about moving conversion to the API level again)

One example would be a promise that returns someone's public key, with the callback function taking that public key to do something. The public keys that host functions return and take are in binary format.

So, if they both have string in their API, it will work? But we will do the conversion two times (gas issue)?

Yes. Converting twice adds unnecessary gas consumption.
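The double conversion can be sketched as follows. The bytes are made up and do not represent a real public key, and the sketch assumes the string conversion is UTF-8 as performed by the standard TextDecoder / TextEncoder; a different encoding would behave differently.

```typescript
// Made-up bytes standing in for a binary value such as a public key.
const rawResult = new Uint8Array([0x00, 0xff, 0xfe, 0x41]);

// Conversion 1: bytes -> string when the promise result is exposed as a string.
const asText = new TextDecoder().decode(rawResult);

// Conversion 2: string -> bytes when the callback passes the value on
// to an API that takes binary input.
const backToBytes = new TextEncoder().encode(asText);

// Under the UTF-8 assumption, invalid bytes are replaced with U+FFFD while
// decoding, so the round trip does not return the original bytes.
const roundTripPreserved =
  backToBytes.length === rawResult.length &&
  backToBytes.every((b, i) => b === rawResult[i]);
```

Beyond the extra work that the thread attributes to gas cost, this suggests that under a UTF-8 conversion a string-only API could also alter binary data, which a version returning Uint8Array avoids.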

tests/yarn-error.log

examples/src/clean-state.js

examples/yarn-error.log

ailisp · 2022-12-13T09:45:44Z

Based on the above discussion, I made the storage / valueReturn / input / promise APIs come in two versions. One is the string version: it works with any string and UTF-8 encodes the string to fit NEAR's binary state / arguments interface. The other is the raw version, which works directly on Uint8Array and lets the developer handle custom encodings.

Note that even though the string version has the same name and the same interface as before, it is not fully backward compatible with the original APIs. Taking storageWrite as an example, the comparison is:

- If the string is fully ASCII, the behavior is the same.
- If the string contains a Unicode char with charCode > 255, the original API doesn't work, while the current string API works as expected.
- If the string contains chars with 127 < charCode <= 255, the original API stores exactly one byte per char, while the current string API treats the charCode as a Unicode code point and encodes it in UTF-8, using two bytes to store one char. This is not as storage efficient as the raw version (storageWriteRaw) or the original version, but the result is guaranteed to be correct when read back.
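The three cases can be checked by counting bytes. This is a minimal sketch that uses the standard TextEncoder as a stand-in for the string API's UTF-8 step; `utf8ByteLength` is a hypothetical helper, and no SDK function is called.

```typescript
const utf8 = new TextEncoder();

// Number of bytes a string occupies once UTF-8 encoded.
function utf8ByteLength(s: string): number {
  return utf8.encode(s).length;
}

const ascii = "hello"; // every char code <= 127
const latin1Range = "\u00e9"; // char code 233, i.e. 127 < code <= 255
const beyondLatin1 = "\u6c34"; // char code 27700, i.e. code > 255

const sizes = {
  ascii: utf8ByteLength(ascii), // 5: same as one byte per char
  latin1Range: utf8ByteLength(latin1Range), // 2: one byte per char would give 1
  beyondLatin1: utf8ByteLength(beyondLatin1), // 3: does not fit in a single byte
};

// Whatever the size, the UTF-8 bytes decode back to the original string.
const readBack = new TextDecoder().decode(utf8.encode(latin1Range + beyondLatin1));
```

Only the middle case changes what is stored for previously working input, which is where the "not fully backward compatible" caveat applies.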

ailisp added 14 commits November 22, 2022 17:47

cherry pick from arraybuffer branch

dd800eb

reset builder

a78156d

update builder.c, fix git diff

5187700

nit

a947408

merge develop

ac487d1

resolve conflict

78ed2a6

strictly different from uint8array and string, fix near-bindgen and utils

c1336a5

Merge branch 'develop' into use-uint8array

48a1023

make collections use uint8array instead of bytes

c49802a

commit build

b933287

fix build and some tests

e927de7

add text encoding decoding in c side, add TextEncoder and TextDecoder, fix promise api

c09a8db

refactor utf8 and latin1 api

4adcb8f

fix near bindgen and collections utf 8 char issue

46304fc

ailisp mentioned this pull request Dec 2, 2022

Use Uint8Array as internal repr of Bytes and fix #117 #195

Closed

ailisp added 12 commits December 5, 2022 15:42

fix bytes tests

18f3b61

fix public key tests

eec8cb4

merge develop

8197aa6

add all test cases of string<>uint8array conversion

2826581

lint format

f6b837c

fix clean-state cross contract call loop, ft and programmatic update example

9263d61

Merge branch 'develop' into use-uint8array

65d97ed

fixing my-nft.ts build

4e85aba

fix all examples

493ce71

merge develop and resolve conflict

5d05999

remove unused file

0d734ac

fix test

2d4b3a5

ailisp marked this pull request as ready for review December 8, 2022 10:03

ailisp requested a review from volovyks as a code owner December 8, 2022 10:03

volovyks approved these changes Dec 9, 2022


ailisp added 3 commits December 12, 2022 14:47

address Serhii comments

56a7a99

keep string APIs mostly backward compatible, make raw apis, fix tests

e8f9c08

fix tests

ea494c6

ailisp merged commit d1ca261 into develop Dec 13, 2022

ailisp deleted the use-uint8array branch December 13, 2022 09:46

This was referenced Dec 13, 2022

Change Bytes from string alias to avoid misuse #117

Closed

unicode support in near-sdk-js's automatic serialization #237

Closed

Add UTF-8 encoding / decoding and native conversion between Uint8Array and string #314

Closed
